# Profitable App Profiles for the App Store and Google Play Markets

## Business Problem

We intend to build a new app that is free to users and contains ads. Our goal is to maximize revenue from this app. Because the app is monetized through ads, it earns more as more users use it and engage with the ads. We would like to discover which types of apps have the most users so that we can make a better-informed decision about what type of app to build.

## Solution

Investigate datasets of current apps on both the App Store and Google Play. Determine which types of apps have the most users.


---

## Turn CSVs into Lists `Helper`

Create a function that takes the path to a CSV file as an argument and returns the header and body of the dataset as lists.


In [1]:
from csv import reader


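def create_list_from_csv(csv):
    # Open the file in a context manager so it is closed after reading.
    with open(csv) as opened_file:
        read_file = reader(opened_file)
        new_list = list(read_file)
    # The first row is the header; the remaining rows are the body.
    return new_list[0], new_list[1:]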
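
To see what the helper returns without the Kaggle files on disk, here is a minimal sketch that applies the same header/body split to a small in-memory CSV. The sample rows and the `split_header_and_body` name are made up for illustration; `csv.reader` accepts any iterable of lines, so `io.StringIO` stands in for an opened file.

```python
from csv import reader
from io import StringIO

# Hypothetical sample data, not taken from either dataset.
sample_csv = StringIO('App,Price\n"Notes, Lists & More",0\nPuzzle Pro,$1.99\n')


def split_header_and_body(lines):
    """Same split as create_list_from_csv, but for any iterable of CSV lines."""
    rows = list(reader(lines))
    return rows[0], rows[1:]


header, body = split_header_and_body(sample_csv)
print(header)  # ['App', 'Price']
print(body)    # [['Notes, Lists & More', '0'], ['Puzzle Pro', '$1.99']]
```

Note that the quoted app name keeps its comma: the reader treats it as one field, which is why parsing with `csv.reader` is safer than splitting each line on commas.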


## Basic Exploration of Data `Helper`

Create a function that takes in a dataset (as a list of lists), a starting row, an ending row, and an optional `row_and_column_count` toggle, and displays basic information about the dataset. It prints the rows from the starting index up to, but not including, the ending index. If the toggle is on, it also displays the number of rows and columns.


In [2]:
def explore_data(dataset, start, end, row_and_column_count=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')

    if row_and_column_count:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))


## Initial Exploration

For this exploration, two datasets from Kaggle are used. More information on each can be found here: [Apple Store Dataset](https://www.kaggle.com/lava18/google-play-store-apps) and [Google Play Store Dataset](https://www.kaggle.com/lava18/google-play-store-apps).

Open both datasets and turn them into lists. Print the first three rows of each and display the row and column counts.


In [3]:
[apple_header, apple_data] = create_list_from_csv('./raw-data/apple-store.csv')
print("Apple data:")
print(apple_header)
print('\n')
explore_data(apple_data, 0, 3, True)


Apple data:
['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1']


Number of rows: 7197
Number of columns: 17


In [4]:

[android_header, android_data] = create_list_from_csv(
    './raw-data/google-play-store.csv')
print("Android data:")
print(android_header)
print('\n')
explore_data(android_data, 0, 3, True)


Android data:
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


## Remove Row with Missing Data

One of the rows in the Google Play dataset is missing a value. As the printed row shows, it has no category, so it holds 12 values instead of 13 and every later value sits one column to the left. More information can be found in this [Discussion](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015). This row will be removed to avoid issues with missing data.


In [5]:
print(android_data[10472])
android_data_without_missing = android_data[:]
del android_data_without_missing[10472]


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
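
Rather than relying on a known index, a row like this can also be found by comparing each row's length with the header's. The following is a minimal sketch on made-up rows; `find_malformed_rows` is a hypothetical helper, not part of the original notebook.

```python
def find_malformed_rows(header, dataset):
    """Return (index, row) pairs whose column count differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(dataset) if len(row) != expected]


# Hypothetical three-column data; the second row is missing its category.
toy_header = ['App', 'Category', 'Rating']
toy_data = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '3.9'],
    ['App C', 'GAME', '4.7'],
]

malformed = find_malformed_rows(toy_header, toy_data)
print(malformed)  # [(1, ['App B', '3.9'])]
```

Applied to `android_header` and `android_data`, a check like this should flag row 10472, since it has 12 values against 13 header columns.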


## Remove Duplicate Entries

A number of apps appear more than once in the dataset. On inspection, the duplicate entries differ only in their number of reviews. Only the duplicate entry with the most reviews will be kept, on the assumption that it is the most recent entry.


In [6]:
unique_apps = []
duplicate_apps = []

for app in android_data_without_missing:
    app_name = app[0]

    if app_name in unique_apps:
        duplicate_apps.append(app_name)
    else:
        unique_apps.append(app_name)

print("Number of unique apps:", len(unique_apps))
print("Number of duplicate apps:", len(duplicate_apps))


Number of unique apps: 9659
Number of duplicate apps: 1181


### Create a dictionary with max reviews value
- Loop through the `android_data_without_missing` dataset
- Create a variable for the app name
- Create a variable for the number of app reviews, *converting it to a float*
- Check whether the dictionary has no entry for the app name, or has one whose stored value is less than the current number of reviews
  - If so, set the value for the app name to the current number of reviews

### Use dictionary to remove duplicate entries with lower reviews values
- Create two empty lists: `android_clean` and `already_added`
- Loop through the `android_data_without_missing` dataset
- Check that the current app name is not in the `already_added` list and that the current review count matches the max value stored in the dictionary
  - If so, add the current row to `android_clean` and add the app name to `already_added` (this ensures that multiple duplicates that match the max value don't sneak in)

In [7]:
reviews_max = {}

for app in android_data_without_missing:
    app_name = app[0]
    n_reviews = float(app[3])

    if app_name not in reviews_max or reviews_max[app_name] < n_reviews:
        reviews_max[app_name] = n_reviews

android_clean = []
already_added = []

for app in android_data_without_missing:
    app_name = app[0]
    n_reviews = float(app[3])

    if reviews_max[app_name] == n_reviews and app_name not in already_added:
        android_clean.append(app)
        already_added.append(app_name)
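
To make the two passes concrete, here is a small sketch that runs the same logic on made-up rows. The `dedupe_by_max_reviews` wrapper, its column parameters, and the sample data are illustrative assumptions. It also tracks added names in a set rather than a list, a swapped-in choice that speeds up the membership test without changing the result.

```python
def dedupe_by_max_reviews(dataset, name_column=0, reviews_column=1):
    """Keep one row per app name: the first row holding the highest review count."""
    # First pass: record the highest review count seen for each app name.
    reviews_max = {}
    for app in dataset:
        name = app[name_column]
        n_reviews = float(app[reviews_column])
        if name not in reviews_max or reviews_max[name] < n_reviews:
            reviews_max[name] = n_reviews

    # Second pass: keep the first row that matches the recorded maximum.
    clean = []
    already_added = set()
    for app in dataset:
        name = app[name_column]
        n_reviews = float(app[reviews_column])
        if reviews_max[name] == n_reviews and name not in already_added:
            clean.append(app)
            already_added.add(name)
    return clean


# Hypothetical rows: [name, reviews]. 'Chat' appears three times, twice at the max.
toy_rows = [
    ['Chat', '100'],
    ['Chat', '250'],
    ['Maps', '40'],
    ['Chat', '250'],
]

deduped = dedupe_by_max_reviews(toy_rows)
print(deduped)  # [['Chat', '250'], ['Maps', '40']]
```

The second `['Chat', '250']` row is dropped by the `already_added` check, which is the case the tie-handling step is meant to cover.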


## Is String English `Helper`
This function takes in a string and loops over each character. It increments an internal count every time it finds a character whose code point is greater than 127, which puts it outside the ASCII range. If the resulting count is higher than 3 it returns False; otherwise it returns True.

In [8]:
def is_english(string):
    # Count characters whose code point falls outside the ASCII range (0-127).
    foreign_count = 0
    for char in string:
        if ord(char) > 127:
            foreign_count += 1

    return foreign_count <= 3

## Filter For English `Helper`
Create an empty list to hold the English apps. Loop through the cleaned data and run each app name through the `is_english` helper function. If the function returns True, add the app to the new list.

In [9]:
def english_apps_filter(dataset, name_column):
    english_apps = []

    for app in dataset:
        app_name = app[name_column]
        if is_english(app_name):
            english_apps.append(app)

    return english_apps

## Remove Non English Apps
Use the helper functions to remove non-English apps from both datasets.

In [10]:
android_apps_english = english_apps_filter(android_clean, 0)

print(len(android_apps_english))

9614


In [11]:
ios_apps_english = english_apps_filter(apple_data, 2)

print(len(ios_apps_english))

6183


## Format Price and Get Free Apps `Helpers`
`format_price` takes in a string, removes the dollar sign, and converts what remains into a float.  
`get_free_apps` loops through a dataset, converts each price into a float, adds the app to a list if the value is `0.0`, and returns the list after the loop is done.

In [12]:
def format_price(price):
    price = price.replace('$', '')
    return float(price)

def get_free_apps(dataset, price_column):
    free_apps = []

    for app in dataset:
        price = format_price(app[price_column])
        if price == 0:
            free_apps.append(app)

    return free_apps
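
The price helper has to cope with more than one format. In the sample rows shown earlier, prices appear as plain numbers such as `'0'` and `'3.99'`. The dollar-sign handling suggests that paid Google Play prices carry a `$` prefix, which is an assumption here since no such row is shown. A minimal sketch of the conversion:

```python
def format_price(price):
    """Remove any dollar signs and convert the rest to a float."""
    return float(price.replace('$', ''))


# '0' and '3.99' match the sample rows; '$4.99' is a hypothetical paid price.
print(format_price('0'))      # 0.0
print(format_price('3.99'))   # 3.99
print(format_price('$4.99'))  # 4.99
```

Because the result is a float, a free app compares equal to `0` whether its price was written as `'0'` or `'0.0'`.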

In [13]:
andriod_free = get_free_apps(android_apps_english, 7)
ios_free = get_free_apps(ios_apps_english, 5)

print(len(andriod_free))
print(len(ios_free))


8864
3222


## Go To Market Strategy
1. Build Android MVP and add to Google Play Store
2. Assess and Increment or Pivot
3. If the app is profitable at six months, build iOS version and add to the App Store

## Method to Determine Best Genre For New App
In order to build a successful app with ads in both markets, it is important to know which genres of apps do well in each market. A good starting point is to build a frequency table of genres for each platform to see which genres are most common and how they differ between platforms.

In [16]:
def freq_table(dataset, index):
    table = {}

    for row in dataset:
        item = row[index]

        if item in table:
            table[item] += 1
        else:
            table[item] = 1

    return table

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []

    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse=True)
    
    for entry in table_sorted:
        print(entry[1], ":", entry[0])
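
As a quick check of the counting logic, the sketch below runs a compact restatement of `freq_table` on made-up rows and compares it with `collections.Counter` from the standard library. `Counter` is a swapped-in alternative shown for comparison, not what the notebook uses, and the sample rows are hypothetical.

```python
from collections import Counter


def freq_table(dataset, index):
    """Count how often each value occurs in the given column."""
    table = {}
    for row in dataset:
        item = row[index]
        table[item] = table.get(item, 0) + 1
    return table


# Hypothetical rows: [name, genre].
toy_rows = [
    ['App A', 'Games'],
    ['App B', 'Games'],
    ['App C', 'Weather'],
    ['App D', 'Games'],
]

table = freq_table(toy_rows, 1)
print(table)  # {'Games': 3, 'Weather': 1}

# Counter produces the same counts, and most_common() returns them
# sorted from most to least frequent.
counts = Counter(row[1] for row in toy_rows)
print(counts.most_common())  # [('Games', 3), ('Weather', 1)]
```

Either approach gives the same counts. Building `(count, key)` tuples and sorting them, as `display_table` does, breaks ties by comparing the keys in reverse order.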

In [17]:

display_table(ios_free, 12)

Games : 1874
Entertainment : 254
Photo & Video : 160
Education : 118
Social Networking : 106
Shopping : 84
Utilities : 81
Sports : 69
Music : 66
Health & Fitness : 65
Productivity : 56
Lifestyle : 51
News : 43
Travel : 40
Finance : 36
Weather : 28
Food & Drink : 26
Reference : 18
Business : 17
Book : 14
Navigation : 6
Medical : 6
Catalogs : 4


In [18]:
display_table(andriod_free, 9)

Tools : 749
Entertainment : 538
Education : 474
Business : 407
Productivity : 345
Lifestyle : 345
Finance : 328
Medical : 313
Sports : 307
Personalization : 294
Communication : 287
Action : 275
Health & Fitness : 273
Photography : 261
News & Magazines : 248
Social : 236
Travel & Local : 206
Shopping : 199
Books & Reference : 190
Simulation : 181
Dating : 165
Arcade : 164
Video Players & Editors : 157
Casual : 156
Maps & Navigation : 124
Food & Drink : 110
Puzzle : 100
Racing : 88
Role Playing : 83
Libraries & Demo : 83
Auto & Vehicles : 82
Strategy : 81
House & Home : 73
Weather : 71
Events : 63
Adventure : 60
Comics : 54
Beauty : 53
Art & Design : 53
Parenting : 44
Card : 40
Casino : 38
Trivia : 37
Educational;Education : 35
Board : 34
Educational : 33
Education;Education : 30
Word : 23
Casual;Pretend Play : 21
Music : 18
Racing;Action & Adventure : 15
Puzzle;Brain Games : 15
Entertainment;Music & Video : 15
Casual;Brain Games : 12
Casual;Action & Adventure : 12
Arcade;Action & Advent

In [19]:
display_table(andriod_free, 1)

FAMILY : 1676
GAME : 862
TOOLS : 750
BUSINESS : 407
LIFESTYLE : 346
PRODUCTIVITY : 345
FINANCE : 328
MEDICAL : 313
SPORTS : 301
PERSONALIZATION : 294
COMMUNICATION : 287
HEALTH_AND_FITNESS : 273
PHOTOGRAPHY : 261
NEWS_AND_MAGAZINES : 248
SOCIAL : 236
TRAVEL_AND_LOCAL : 207
SHOPPING : 199
BOOKS_AND_REFERENCE : 190
DATING : 165
VIDEO_PLAYERS : 159
MAPS_AND_NAVIGATION : 124
FOOD_AND_DRINK : 110
EDUCATION : 103
ENTERTAINMENT : 85
LIBRARIES_AND_DEMO : 83
AUTO_AND_VEHICLES : 82
HOUSE_AND_HOME : 73
WEATHER : 71
EVENTS : 63
PARENTING : 58
ART_AND_DESIGN : 57
COMICS : 55
BEAUTY : 53
