# Profitable App Profiles for the App Store and Google Play Markets
Our aim in this project is to find mobile app profiles that are profitable for the App Store and Google Play markets. We're working as data analysts for a company that builds Android and iOS mobile apps, and our job is to enable our team of developers to make data-driven decisions with respect to the kind of apps they build.

At our company, we only build apps that are free to download and install, and our main source of revenue consists of in-app ads. This means that our revenue for any given app is mostly influenced by the number of users that use our app. Our goal for this project is to analyze data to help our developers understand what kinds of apps are likely to attract more users.

In [1]:
from csv import reader
import re

### The Google Play data set ###
opened_file = open('googleplaystore.csv')
read_file = reader(opened_file)
tmp_list = list(read_file)
android_header = tmp_list[0]
android = tmp_list[1:]

### The Apple App Store data set ###
opened_file = open('AppleStore.csv')
read_file = reader(opened_file)
tmp_list = list(read_file)
ios_header = tmp_list[0]
ios = tmp_list[1:]
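
The loading steps above repeat the same pattern for both files, so they could be wrapped in a small helper. This is only a sketch: `load_csv` is a hypothetical name that does not appear in the notebook, and the `utf8` encoding is an assumption about how the files are stored.

```python
from csv import reader

def load_csv(path, encoding="utf8"):
    """Return (header, rows) for the CSV file at `path`.

    Hypothetical helper; the encoding default is an assumption.
    """
    # The `with` block closes the file even if reading fails.
    # newline="" is what the csv module docs recommend for file objects.
    with open(path, encoding=encoding, newline="") as opened_file:
        rows = list(reader(opened_file))
    return rows[0], rows[1:]

# Possible usage with the files from the cell above:
# android_header, android = load_csv('googleplaystore.csv')
# ios_header, ios = load_csv('AppleStore.csv')
```

Returning the header separately mirrors the `android_header` / `android` split used in the notebook.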

## Explore Data function
Define an `explore_data()` function to help preview the data easily.

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [3]:
explore_data(android, 0, 5, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


Number of rows: 10841
Number of columns: 13


In [4]:
explore_data(ios, 0, 5, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows: 7197
Number of columns: 16


## Find bad rows
Find any rows that are not the same length (missing fields):

In [5]:
for row in android[0:]:
    if len(row) != len(android[0]):
        print(android.index(row))
        print(len(row))

10472
12
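
The length check can also be written with `enumerate`, which yields each row's position directly. `list.index(row)` returns the position of the first row equal to `row`, so it could report the wrong index if a malformed row appeared more than once. A minimal sketch, with `find_bad_rows` as a hypothetical helper name and toy data in place of the real dataset:

```python
def find_bad_rows(dataset, expected_len):
    """Return (index, length) pairs for rows whose length differs from expected_len."""
    bad_rows = []
    for i, row in enumerate(dataset):
        if len(row) != expected_len:
            bad_rows.append((i, len(row)))
    return bad_rows

# Toy data: the second row is missing one field.
sample = [['a', 'X', '1'], ['b', '2'], ['c', 'Y', '3']]
print(find_bad_rows(sample, 3))  # [(1, 2)]
```

For the real data, the expected length could be taken from the header, for example `find_bad_rows(android, len(android_header))`.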


Row index 10472 is missing the 'Category' (2nd) field. Its rating is low (1.9), so it is unlikely to affect our analysis; let's just remove it. We check the app name before deleting so that re-running the cell does not remove a different row:

In [6]:
print(android[10472])
if android[10472][0] == 'Life Made WI-Fi Touchscreen Photo Frame':
    del android[10472]
print(android[10472])    

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']


The ios dataset does not have any rows missing fields. All are the same length:

In [7]:
for row in ios[0:]:
    if len(row) != len(ios[0]):
        print(ios.index(row))
        print(len(row))

## Identify if dupe apps exist
Create function to find duplicate apps by their app_name. Accept the index of the app name field so it can be used for multiple datasets. Return the list of dupe app names:

In [8]:
def find_dupes(dataset, app_name_index):
    unique = []
    dupes = []
    for row in dataset:
        app_name = row[app_name_index]
        if app_name in unique:
            dupes.append(app_name)
        else:            
            unique.append(app_name)
    return dupes

In [9]:
ios_dupes = find_dupes(ios,1)
print(ios_dupes)
### found 2 dupes in ios dataset:

['Mannequin Challenge', 'VR Roller Coaster']


In [10]:
android_dupes = find_dupes(android,0)
print(len(android_dupes))
print(android_dupes[0:15])
### android dataset has 1181 dupe apps

1181
['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']
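
With over ten thousand rows, checking `app_name in unique` against a list rescans the list for every row. A set makes each membership test roughly constant time. The sketch below is an alternative under that assumption, not the notebook's own function; `find_dupes_fast` is a hypothetical name.

```python
def find_dupes_fast(dataset, app_name_index):
    """Return every repeated app name, once per extra occurrence."""
    seen = set()   # names encountered so far
    dupes = []     # names seen more than once, in encounter order
    for row in dataset:
        app_name = row[app_name_index]
        if app_name in seen:
            dupes.append(app_name)
        else:
            seen.add(app_name)
    return dupes

# Toy data: 'Box' appears three times, so it is reported twice.
sample = [['Box'], ['Slack'], ['Box'], ['Box']]
print(find_dupes_fast(sample, 0))  # ['Box', 'Box']
```

The returned list has the same shape as the one from `find_dupes`, so `len()` of the result still counts the rows that would be removed.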


## Determine how dupes can be fixed
Create a function to view all the rows with a single app_name

In [11]:
def print_app_by_value(dataset, index_of_field, field_value):
    for row in dataset:
        app_value = row[index_of_field]
        if app_value == field_value:
            print(row)


In [12]:
print_app_by_value(ios,1,'VR Roller Coaster')

['952877179', 'VR Roller Coaster', '169523200', 'USD', '0.0', '107', '102', '3.5', '3.5', '2.0.0', '4+', 'Games', '37', '5', '1', '1']
['1089824278', 'VR Roller Coaster', '240964608', 'USD', '0.0', '67', '44', '3.5', '4.0', '0.81', '4+', 'Games', '38', '0', '1', '1']


In [13]:
print_app_by_value(android,0,'Instagram')

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


Based on what we see above, the duplicate rows differ mainly in their review counts, so we can choose to keep the row with the most reviews for each app.

Write a function to delete dupes, keeping the app row with the most reviews:

In [14]:
def create_max_reviews_dict(dataset, app_name_index, reviews_index):
    reviews_max = {}
    for row in dataset:
        app_name = row[app_name_index]
        n_reviews = float(row[reviews_index])
        if app_name in reviews_max and reviews_max[app_name] < n_reviews:
            reviews_max[app_name] = n_reviews
        if app_name not in reviews_max:
            reviews_max[app_name] = n_reviews
    return reviews_max

## Confirm results for dedupe strategy
Verify a value from each dataset. Make sure the reviews amount is indeed the largest for that app_name. Also verify the length of reviews_max is `total - dupes`.

In [15]:
# index mapping
# ios[1] = "track_name": App Name
# ios[5] = "rating_count_tot": User Rating counts (for all versions)
ios_review_max = create_max_reviews_dict(ios,1,5)

print (ios_review_max['VR Roller Coaster'])
print ('Total with max reviews: ', len(ios_review_max))
print ('Expected Total - Dupes: ', len(ios) - len(ios_dupes))

107.0
Total with max reviews:  7195
Expected Total - Dupes:  7195


In [16]:
# index mapping
# android[0] = "App": Application name
# android[3] = "Reviews": Number of user reviews for the app (as when scraped)
android_review_max = create_max_reviews_dict(android,0,3)

print (android_review_max['Instagram'])
print ('Total with max reviews: ', len(android_review_max))
print ('Expected Total - Dupes: ', len(android) - len(android_dupes))

66577446.0
Total with max reviews:  9659
Expected Total - Dupes:  9659


## Create unique datasets
Create a function that takes in the original dataset and the review_max dataset and returns a clean dataset of unique apps with all the same fields as the original.

In [17]:
def create_clean_dataset(dataset_orig, dict_review_max, app_name_index, reviews_index):
    dataset_clean = []
    already_added = []
    
    for row in dataset_orig:
        app_name = row[app_name_index]
        n_reviews = float(row[reviews_index])
        
        if n_reviews == dict_review_max[app_name] and app_name not in already_added:
            dataset_clean.append(row)
            already_added.append(app_name)
    
    return dataset_clean
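
The two-step approach (build the maximum-reviews dictionary, then filter) can also be collapsed into a single pass. The sketch below is an alternative, not the notebook's method: `dedupe_keep_max` is a hypothetical name, and the output order follows the first appearance of each app name rather than the position of the kept row.

```python
def dedupe_keep_max(dataset, app_name_index, reviews_index):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}  # app name -> best row seen so far
    for row in dataset:
        app_name = row[app_name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' means the earliest row wins when counts are tied.
        if app_name not in best or n_reviews > float(best[app_name][reviews_index]):
            best[app_name] = row
    return list(best.values())

# Toy data modeled on the Instagram rows shown earlier.
sample = [
    ['Instagram', '66577313'],
    ['Box', '159'],
    ['Instagram', '66577446'],
    ['Instagram', '66509917'],
]
print(dedupe_keep_max(sample, 0, 1))
# [['Instagram', '66577446'], ['Box', '159']]
```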

In [18]:
# index mapping
# ios[1] = "track_name": App Name
# ios[5] = "rating_count_tot": User Rating counts (for all versions)
ios_clean = create_clean_dataset(ios,ios_review_max,1,5)

print(len(ios_clean))
explore_data(ios_clean, 0, 5)

7195
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']




In [19]:
# index mapping
# android[0] = "App": Application name
# android[3] = "Reviews": Number of user reviews for the app (as when scraped)
android_clean = create_clean_dataset(android,android_review_max,0,3)

print(len(android_clean))
explore_data(android_clean, 0, 5)

9659
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']




## Find non-English apps
Our analysis should not include apps that are clearly not in English.

In [20]:
print(ios[813][1])
print(ios[6731][1])
print(ord('爱'))

爱奇艺PPS -《欢乐颂2》电视剧热播
【脱出ゲーム】絶対に最後までプレイしないで 〜謎解き＆ブロックパズル〜
29233


Create a function that takes in a string and returns `False` if any character falls outside the ASCII range that covers common English characters (`ord(character) > 127`); otherwise it returns `True`.

In [21]:
def is_english(string_input):
    for character in string_input:
        if ord(character) > 127:
            return False
    return True

In [22]:
#test function with some values
print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

True
False
False
False


In [23]:
# index mapping
# ios[1] = "track_name": App Name
ios_non_english_apps = []

for row in ios_clean:
    if not is_english(row[1]):
        ios_non_english_apps.append(row[1])

print(len(ios_non_english_apps))      
explore_data(ios_non_english_apps,1,5)        

1490
The Sims™ FreePlay


8 Ball Pool™


Lose It! – Weight Loss Program and Calorie Counter


▻Sudoku




In [24]:
# index mapping
# android[0] = "App": Application name
android_non_english_apps = []

for row in android_clean:
    if not is_english(row[0]):
        android_non_english_apps.append(row[0])

print(len(android_non_english_apps))      
explore_data(android_non_english_apps,1,5)  

542
CarMax – Cars for Sale: Search Used Car Inventory


AutoScout24 Switzerland – Find your new car


Zona Azul Digital Fácil SP CET - OFFICIAL São Paulo


ReadEra – free ebook reader




The current version of the `is_english()` function is excluding apps that we want to keep:
- 'The Sims™ FreePlay'
- 'Lose It! – Weight Loss Program and Calorie Counter'
- '▻Sudoku'
- 'CarMax – Cars for Sale: Search Used Car Inventory'

Let's adjust it so that a name is flagged as non-English only when more than half of its characters fall outside the ASCII range:

In [25]:
# after redefining this function, check the results again
def is_english(string_input):
    string_len = float(len(string_input))
    non_english_count = 0
    percent_of_total = 0.0
    
    for character in string_input:
        if ord(character) > 127:
            non_english_count += 1
            percent_of_total = non_english_count / string_len
        if percent_of_total > 0.5:
            return False
    return True
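
To make the threshold easier to inspect, the ratio can be computed separately from the decision. This is a sketch under the same rule as the function above (more than half non-ASCII characters means non-English); `non_ascii_ratio` and `is_mostly_ascii` are hypothetical names, not part of the notebook.

```python
def non_ascii_ratio(text):
    """Fraction of characters in `text` whose code point is above 127."""
    if not text:
        return 0.0  # an empty name has no non-ASCII characters
    non_ascii = sum(1 for character in text if ord(character) > 127)
    return non_ascii / len(text)

def is_mostly_ascii(text, max_ratio=0.5):
    """True unless more than `max_ratio` of the characters are non-ASCII."""
    return non_ascii_ratio(text) <= max_ratio

print(is_mostly_ascii('Instagram'))                      # True
print(is_mostly_ascii('Docs To Go™ Free Office Suite'))  # True
print(is_mostly_ascii('Instachat 😜'))                   # True
print(is_mostly_ascii('爱奇艺PPS -《欢乐颂2》电视剧热播'))    # False
```

Names with a single trademark sign, dash, or emoji now pass, while names written mostly in another script are still rejected.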

In [26]:
# index mapping
# ios[1] = "track_name": App Name
ios_non_english_apps = []

for row in ios_clean:
    if not is_english(row[1]):
        ios_non_english_apps.append(row[1])

print(len(ios_non_english_apps))      
explore_data(ios_non_english_apps,1,5) 

958
聚力视频HD-人民的名义,跨界歌王全网热播


优酷视频


网易新闻 - 精选好内容，算出你的兴趣


淘宝 - 随时随地，想淘就淘




In [27]:
# index mapping
# android[0] = "App": Application name
android_non_english_apps = []

for row in android_clean:
    if not is_english(row[0]):
        android_non_english_apps.append(row[0])

print(len(android_non_english_apps))      
explore_data(android_non_english_apps,1,5) 

24
РИА Новости


صور حرف H


RMEduS - 음성인식을 활용한 R 프로그래밍 실습 시스템


AJ렌터카 법인 카셰어링




The first version of the function flagged many apps as "non-English", but on investigation, some were apps we want to keep:

- ios: 1490
- android: 542

With the updated function we now have better results:

- ios: 958
- android: 24

And our sample shows:
- 聚力视频HD-人民的名义,跨界歌王全网热播
- 网易新闻 - 精选好内容，算出你的兴趣
- صور حرف H
- RMEduS - 음성인식을 활용한 R 프로그래밍 실습 시스템

In [28]:
def create_english_dataset(dataset_orig, app_name_index):
    dataset_english = []
    
    for row in dataset_orig:
        app_name = row[app_name_index]
        
        if is_english(app_name):
            dataset_english.append(row)
    
    return dataset_english

In [29]:
# index mapping
# ios[1] = "track_name": App Name
ios_english = create_english_dataset(ios_clean,1)

# confirm the count of ios_english
print('English length: ',len(ios_english))
print('Expected length: ',len(ios_clean) - len(ios_non_english_apps))
explore_data(ios_english, 0, 5)

English length:  6237
Expected length:  6237
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']




In [30]:
# index mapping
# android[0] = "App": Application name
android_english = create_english_dataset(android_clean,0)

# confirm the count of android_english
print('English length: ',len(android_english))
print('Expected length: ',len(android_clean) - len(android_non_english_apps))
explore_data(android_english, 0, 5)

English length:  9635
Expected length:  9635
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']




## Restricting to Free Apps
So far in the data cleaning process, we've done the following:

- Removed inaccurate data
- Removed duplicate app entries
- Removed non-English apps

As we mentioned in the introduction, we only build apps that are free to download and install, and our main source of revenue consists of in-app ads. Our datasets contain both free and non-free apps, so let's filter down to just free apps.

In [31]:
def create_free_dataset(dataset_orig, price_index):
    dataset_free = []
    
    for row in dataset_orig:
        price_string = row[price_index]
        # strip everything except digits and the decimal point before converting to float
        price = float(re.sub(r'[^\d.]', '', price_string))
        
        if price == 0:
            dataset_free.append(row)
    
    return dataset_free
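
The price cleaning step can be checked in isolation. The sketch below assumes prices look like `'0'`, `'0.0'`, or `'$4.99'`, as in the rows previewed earlier; `parse_price` is a hypothetical helper, and returning `None` for values that cannot be parsed is an assumption made here so that malformed rows are not silently treated as free.

```python
import re

def parse_price(price_string):
    """Return the price as a float, or None if no number can be extracted."""
    # Keep only digits and the decimal point, e.g. '$4.99' -> '4.99'.
    cleaned = re.sub(r'[^\d.]', '', price_string)
    try:
        return float(cleaned)
    except ValueError:
        # Raised for '' (no digits at all) or strings such as '1.2.3'.
        return None

print(parse_price('0'))         # 0.0
print(parse_price('$4.99'))     # 4.99
print(parse_price('Everyone'))  # None
```

A row would then count as free only when `parse_price(row[price_index]) == 0`.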

In [32]:
# index mapping
# ios[4] = "price": Price amount
ios_free = create_free_dataset(ios_english,4)

# show count of free apps and confirm a few
print('free length: ',len(ios_free))
explore_data(ios_free, 0, 5)

free length:  3263
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']




In [33]:
# index mapping
# android[7] = "Price": Price of the app (as when scraped)
android_free = create_free_dataset(android_english,7)

# show count of free apps and confirm a few
print('free length: ',len(android_free))
explore_data(android_free, 0, 5)

free length:  8882
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']




We have the remaining number of clean free apps:
- ios_free:     3,263
- android_free: 8,882

## Start the Analysis
### Find most popular apps by genre
As mentioned before, our company only builds free to install apps and makes revenue from ads. Therefore, we want to build an app that attracts the most users. 

Our strategy for developing a new app will follow these steps:
1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we develop it further.
3. If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

### Part 1 - common genres
Since our app needs to be successful on both Android and iOS, we start by looking at the most common genres in both datasets and look for commonalities. Let's start by developing a function that builds a frequency table for a given field index.

In [34]:
def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage 
    
    return table_percentages

In [35]:
# index mapping
# ios_free[11] = "prime_genre": Primary Genre
ios_freq_table_genre = freq_table(ios_free, 11)

In [36]:
print(ios_freq_table_genre)

{'Social Networking': 3.2178976402083976, 'Photo & Video': 4.96475635917867, 'Games': 57.98345081213607, 'Music': 2.0226785167024213, 'Reference': 0.5516395954642966, 'Health & Fitness': 1.9920318725099602, 'Weather': 0.8581060373889058, 'Utilities': 2.5130248237817963, 'Travel': 1.2565124118908981, 'Shopping': 2.63561140055164, 'News': 1.3484523444682808, 'Navigation': 0.18387986515476554, 'Lifestyle': 1.6242721422004291, 'Entertainment': 8.029420778424763, 'Food & Drink': 0.9193993257738278, 'Sports': 2.1146184492798037, 'Book': 0.4596996628869139, 'Finance': 1.1032791909285933, 'Education': 3.6163040147103893, 'Productivity': 1.7468587189702727, 'Business': 0.5209929512718358, 'Catalogs': 0.12258657676984369, 'Medical': 0.21452650934722645}


This gives us the percentages, but it is difficult to see the most common genre when they are out of order. Let's build another function to order our frequency tables.

In [37]:
def display_table(freq_table):
    table_display = []
    for key in freq_table:
        key_val_as_tuple = (freq_table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
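
The standard library's `collections.Counter` can do the counting and the ordering in one step. This is a sketch of an alternative, not the notebook's implementation; `freq_table_sorted` is a hypothetical name, and ties are ordered by first appearance rather than by key.

```python
from collections import Counter

def freq_table_sorted(dataset, index):
    """Return (value, percentage) pairs, most frequent first."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    # most_common() yields (value, count) pairs sorted by descending count.
    return [(value, count * 100 / total) for value, count in counts.most_common()]

# Toy data: three games and one music app.
sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
for genre, percentage in freq_table_sorted(sample, 1):
    print(genre, ':', percentage)
# Games : 75.0
# Music : 25.0
```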

In [38]:
display_table(ios_freq_table_genre)

Games : 57.98345081213607
Entertainment : 8.029420778424763
Photo & Video : 4.96475635917867
Education : 3.6163040147103893
Social Networking : 3.2178976402083976
Shopping : 2.63561140055164
Utilities : 2.5130248237817963
Sports : 2.1146184492798037
Music : 2.0226785167024213
Health & Fitness : 1.9920318725099602
Productivity : 1.7468587189702727
Lifestyle : 1.6242721422004291
News : 1.3484523444682808
Travel : 1.2565124118908981
Finance : 1.1032791909285933
Food & Drink : 0.9193993257738278
Weather : 0.8581060373889058
Reference : 0.5516395954642966
Business : 0.5209929512718358
Book : 0.4596996628869139
Medical : 0.21452650934722645
Navigation : 0.18387986515476554
Catalogs : 0.12258657676984369


It appears that over 57% of the iOS apps (free and English) are games and another 8% are for entertainment.

Let's do the same analysis for the Android `Category` field.

In [39]:
# index mapping
# android_free[1] = "Category:": Category the app belongs to
display_table(freq_table(android_free,1))

FAMILY : 18.937176311641522
GAME : 9.716280117090745
TOOLS : 8.444044134204008
BUSINESS : 4.59356000900698
LIFESTYLE : 3.9292952037829316
PRODUCTIVITY : 3.8955190272461158
FINANCE : 3.6928619680252197
MEDICAL : 3.5239810853411395
SPORTS : 3.388876379193875
PERSONALIZATION : 3.3213240261202435
COMMUNICATION : 3.231254222022067
HEALTH_AND_FITNESS : 3.073632064850259
PHOTOGRAPHY : 2.938527358702995
NEWS_AND_MAGAZINES : 2.814681378068003
SOCIAL : 2.6570592208961945
TRAVEL_AND_LOCAL : 2.3305561810403064
SHOPPING : 2.24048637694213
BOOKS_AND_REFERENCE : 2.161675298356226
DATING : 1.8576897095248817
VIDEO_PLAYERS : 1.7901373564512497
MAPS_AND_NAVIGATION : 1.4073406890340014
FOOD_AND_DRINK : 1.238459806349921
EDUCATION : 1.159648727764017
ENTERTAINMENT : 0.956991668543121
LIBRARIES_AND_DEMO : 0.9344742175185768
AUTO_AND_VEHICLES : 0.9232154920063049
HOUSE_AND_HOME : 0.8218869623958567
WEATHER : 0.7993695113713127
EVENTS : 0.7092997072731367
PARENTING : 0.6530060797117767
ART_AND_DESIGN : 0.641

For Android, it appears that fewer than 10% of the apps are games. However, if we go to Google Play and look at the FAMILY category, we see that these are mostly kids' games, so if we lump the two together, up to about 29% (18.9% + 9.7%) of the Android apps are games.

There was a second field in the android data called `Genres`, so let's see if that field gives us any more clarity:

In [40]:
# index mapping
# android_free[9] = "Genres:": An app can belong to multiple genres (apart from its main category)
display_table(freq_table(android_free,9))

Tools : 8.432785408691736
Entertainment : 6.079711776626886
Education : 5.359153343841477
Business : 4.59356000900698
Lifestyle : 3.91803647827066
Productivity : 3.8955190272461158
Finance : 3.6928619680252197
Medical : 3.5239810853411395
Sports : 3.4564287322675074
Personalization : 3.3213240261202435
Communication : 3.231254222022067
Action : 3.096149515874803
Health & Fitness : 3.073632064850259
Photography : 2.938527358702995
News & Magazines : 2.814681378068003
Social : 2.6570592208961945
Travel & Local : 2.319297455528034
Shopping : 2.24048637694213
Books & Reference : 2.161675298356226
Simulation : 2.037829317721234
Dating : 1.8576897095248817
Arcade : 1.8464309840126099
Video Players & Editors : 1.7676199054267057
Casual : 1.7563611799144336
Maps & Navigation : 1.4073406890340014
Food & Drink : 1.238459806349921
Puzzle : 1.125872551227201
Racing : 0.990767845079937
Role Playing : 0.9344742175185768
Libraries & Demo : 0.9344742175185768
Strategy : 0.9232154920063049
Auto & Vehic

This appears to be much more detailed and breaks down our "Games" category into a bunch of sub-genres. This is not helpful in comparing to iOS, so we'll just use the analysis from the `Category` field going forward.

### Part 2 - Check user counts per genre
Our frequency tables above simply tell us what share of the free English apps fall into each genre. That doesn't necessarily mean those apps draw the most users. Remember, since the app we are developing will be ad-revenue based, we want as many people using the app as possible.

The Android dataset has a field called `Installs`, but the iOS dataset does not have a comparable field. Instead, let's use user review counts to gauge the popularity of an app. Regardless of whether a review is good or bad, we can infer that the user likely used the app long enough to be able to give a review.

For iOS, we'll use `rating_count_tot` to count the ratings for all versions of the app. On Android, the field will be `Reviews`.

For each genre, we'll calculate the average number of reviews per app. This way, the sheer number of apps in a genre will not skew our totals.

To do this, we'll create a new function that takes in our previous frequency tables (dictionaries). We'll loop through each genre key, looking for apps that match that genre. For each app found, we'll add its count to the genre's total user ratings and keep track of how many apps we encountered for that genre. At the end, we'll divide the total by the number of apps to get the average per genre.

In [41]:
def avg_user_counts(dataset, genre_index, count_index, freq_table):
    genre_avg_user_counts = {}
    
    for genre in freq_table:
        total = 0
        len_genre = 0
        
        for row in dataset:
            genre_app = row[genre_index]
            
            if genre_app == genre:
                ratings_count = float(row[count_index])
                total += ratings_count
                len_genre += 1
        
        avg_user_counts = total / len_genre
        genre_avg_user_counts[genre] = avg_user_counts
    
    return genre_avg_user_counts
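
The function above rescans the whole dataset once per genre. The same averages can be accumulated in a single pass by keeping a running total and an app count for each genre. This is a sketch of that alternative; `avg_count_by_genre` is a hypothetical name, and it does not need the frequency table as an argument.

```python
def avg_count_by_genre(dataset, genre_index, count_index):
    """Return {genre: average count per app} using one pass over the rows."""
    totals = {}  # genre -> sum of counts
    n_apps = {}  # genre -> number of apps seen
    for row in dataset:
        genre = row[genre_index]
        totals[genre] = totals.get(genre, 0.0) + float(row[count_index])
        n_apps[genre] = n_apps.get(genre, 0) + 1
    return {genre: totals[genre] / n_apps[genre] for genre in totals}

# Toy data: two games and one music app.
sample = [['a', 'Games', '100'], ['b', 'Games', '300'], ['c', 'Music', '50']]
print(avg_count_by_genre(sample, 1, 2))  # {'Games': 200.0, 'Music': 50.0}
```

Every genre in the result has at least one app, so the division cannot hit zero.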
            



In [42]:
# index mapping
# ios_free[11] = "prime_genre": Primary Genre
# ios_free[5] = "rating_count_tot": User Rating counts (for all version)
display_table(avg_user_counts(ios_free, 11, 5, freq_table(ios_free, 11)))

Navigation : 86090.33333333333
Reference : 74942.11111111111
Social Networking : 72222.84761904762
Music : 57326.530303030304
Weather : 52279.892857142855
Book : 37107.933333333334
Finance : 31468.027777777777
Food & Drink : 28891.033333333333
Photo & Video : 28090.413580246914
Travel : 27551.219512195123
Shopping : 26293.732558139534
Health & Fitness : 23298.015384615384
Sports : 23008.898550724636
Games : 22572.40539112051
News : 20765.113636363636
Productivity : 20659.491228070176
Utilities : 18467.256097560974
Lifestyle : 15872.924528301886
Entertainment : 13602.141221374046
Business : 7491.117647058823
Education : 7003.983050847458
Catalogs : 4004.0
Medical : 525.4285714285714


In [43]:
# index mapping
# android_free[1] = "Category:": Category the app belongs to
# android_free[3] = "Reviews": Number of user reviews for the app (as when scraped)
display_table(avg_user_counts(android_free, 1, 3, freq_table(android_free, 1)))

COMMUNICATION : 995608.4634146341
SOCIAL : 965830.9872881356
GAME : 682733.6234067207
VIDEO_PLAYERS : 425350.08176100627
PHOTOGRAPHY : 404081.3754789272
TOOLS : 305732.8973333333
ENTERTAINMENT : 301752.24705882353
SHOPPING : 223887.34673366835
PERSONALIZATION : 180508.3525423729
WEATHER : 171250.77464788733
PRODUCTIVITY : 160170.91040462427
MAPS_AND_NAVIGATION : 141775.68
TRAVEL_AND_LOCAL : 129484.42512077295
SPORTS : 116938.6146179402
FAMILY : 112837.6694411415
NEWS_AND_MAGAZINES : 92343.584
BOOKS_AND_REFERENCE : 87087.75520833333
HEALTH_AND_FITNESS : 78094.9706959707
FOOD_AND_DRINK : 57478.79090909091
EDUCATION : 56293.09708737864
COMICS : 42585.61818181818
FINANCE : 38535.8993902439
LIFESTYLE : 33767.031518624644
HOUSE_AND_HOME : 26435.465753424658
ART_AND_DESIGN : 24699.42105263158
BUSINESS : 24181.105392156864
DATING : 21953.272727272728
PARENTING : 16378.706896551725
AUTO_AND_VEHICLES : 14140.280487804877
LIBRARIES_AND_DEMO : 10925.807228915663
BEAUTY : 7476.226415094339
MEDICAL 

Now that we are looking at the popularity of the apps, we can see the top three genres for iOS are:
1. Navigation : 86090
2. Reference : 74942
3. Social Networking : 72222

And for Android:
1. COMMUNICATION : 995608
2. SOCIAL : 965830
3. GAME : 682733

Social apps appear in both top threes (`Social Networking` on iOS, `SOCIAL` on Android), but I suspect those numbers are dominated by a few very well-established social media apps like Facebook and Twitter.

### Part 3 - Check for apps skewing the averages

Let's create another function to see the ratings counts of each app in a specific genre.

In [44]:
def app_percent_of_user_counts(dataset, genre_index, genre_filter, count_index, app_name_index):
    """Return {app name: percent of the genre's total ratings count}."""
    app_counts_dict = {}
    total_ratings = 0

    for row in dataset:
        app_genre = row[genre_index]

        if app_genre == genre_filter:
            ratings_count = float(row[count_index])
            total_ratings += ratings_count
            app_name = row[app_name_index]
            app_counts_dict[app_name] = ratings_count

    # avoid dividing by zero when the genre is missing or has no ratings at all
    if total_ratings == 0:
        return app_counts_dict

    # convert each app's ratings count to a percentage of total
    for key, value in app_counts_dict.items():
        app_counts_dict[key] = round(value / total_ratings * 100.00, 2)

    return app_counts_dict
    

In [45]:
# index mapping
# ios_free[11] = "prime_genre": Primary Genre
# ios_free[5] = "rating_count_tot": User Rating counts (for all versions)
# ios_free[1] = "track_name": App Name
# top three iOS genres by average ratings count, for reference:
# Navigation : 86090.33333333333
# Reference : 74942.11111111111
# Social Networking : 72222.84761904762
# change the genre filter to inspect each genre; the output below is for 'Weather'
display_table(app_percent_of_user_counts(ios_free, 11, 'Weather', 5, 1))

The Weather Channel: Forecast, Radar & Alerts : 33.86
The Weather Channel App for iPad – best local forecast, radar map, and storm tracking : 14.25
WeatherBug - Local Weather, Radar, Maps, Alerts : 12.88
MyRadar NOAA Weather Radar Forecast : 10.26
AccuWeather - Weather for Life : 9.85
Yahoo Weather : 7.69
Weather Underground: Custom Forecast & Local Radar : 3.36
NOAA Weather Radar - Weather Forecast & HD Radar : 3.12
Weather Live Free - Weather Forecast & Alerts : 2.44
Storm Radar : 1.56
QuakeFeed Earthquake Map, Alerts, and News : 0.42
Moji Weather - Free Weather Forecast : 0.16
Hurricane by American Red Cross : 0.08
Forecast Bar : 0.03
iWeather - World weather forecast : 0.01
Weather - Radar - Storm with Morecast App : 0.01
Hurricane Tracker WESH 2 Orlando, Central Florida : 0.01
FEMA : 0.01
wetter.com : 0.0
Yurekuru Call : 0.0
Weather & Radar : 0.0
WarnWetter : 0.0
WRAL Weather Alert : 0.0
TodayAir : 0.0
Météo-France : 0.0
JaxReady : 0.0
Freddy the Frogcaster's Weather Station : 0.0

In [46]:
# index mapping
# android_free[1] = "Category": Category the app belongs to
# android_free[3] = "Reviews": Number of user reviews for the app (as of when scraped)
# android_free[0] = "App": Application name
# top three Android categories by average review count, for reference:
# COMMUNICATION : 995608.4634146341
# SOCIAL : 965830.9872881356
# GAME : 682733.6234067207
display_table(app_percent_of_user_counts(android_free, 1, 'COMMUNICATION', 3, 0))

WhatsApp Messenger : 24.19
Messenger – Text and Video Chat for Free : 19.82
UC Browser - Fast Download Private & Secure : 6.2
BBM - Free Calls & Messages : 4.49
Viber Messenger : 3.97
LINE: Free Calls & Messages : 3.78
Skype - free IM & video calls : 3.67
Google Chrome: Fast & Secure : 3.37
Truecaller: Caller ID, SMS spam blocking & Dialer : 2.74
WeChat : 1.89
Opera Mini - fast web browser : 1.8
imo free video calls and chat : 1.67
Gmail : 1.61
Yahoo Mail – Stay Organized : 1.47
UC Browser Mini -Tiny Fast Private & Secure : 1.28
Hangouts : 1.2
Telegram : 1.09
Firefox Browser fast & private : 1.08
GO SMS Pro - Messenger, Free Themes, Emoji : 1.01
KakaoTalk: Free Calls & Text : 0.89
Dolphin Browser - Fast, Private & Adblock🐬 : 0.88
Opera Browser: Fast and Secure : 0.87
Who : 0.86
Kik : 0.86
CM Browser - Ad Blocker , Fast Download , Privacy : 0.79
Google Duo - High Quality Video Calls : 0.73
Messenger Lite: Free Calls & Messages : 0.5
WhatsCall Free Global Phone Call App & Cheap Calls : 0

In [47]:
## Why is Twitter not in Social?
for app in android_free:
    if app[0] == 'Twitter':
        print(app)
        break

for app in ios_free:
    if app[1] == 'Twitter':
        print(app)
        break

['Twitter', 'NEWS_AND_MAGAZINES', '4.3', '11667403', 'Varies with device', '500,000,000+', 'Free', '0', 'Mature 17+', 'News & Magazines', 'August 6, 2018', 'Varies with device', 'Varies with device']
['333903271', 'Twitter', '210569216', 'USD', '0.0', '354058', '452', '3.5', '4.0', '6.79.1', '17+', 'News', '37', '2', '33', '1']


Above, we checked the top 3 genres for both datasets to see if specific apps are skewing the averages.

On iOS, both Navigation and Reference are dominated by one or two apps: Waze and Google Maps together account for about 97% of Navigation's user ratings, and Bible alone accounts for about 73% of Reference's.

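In the social genres, two apps make up over 50% of the user ratings on each platform: Facebook and Pinterest on iOS (39.23% + 14.0%), and Facebook and Instagram on Android (34.29% + 29.21%). I was surprised not to see Twitter in that list, but it turns out that both stores categorize it as news (`News` on iOS, `NEWS_AND_MAGAZINES` on Android).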

On Android, the only top 3 genre that didn't show skewing was `GAME`.
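
One way to put a single number on this kind of dominance is to compute the share of a genre's ratings held by its largest apps. The sketch below is an illustration only: `top_n_share` is a hypothetical helper, not part of this notebook, and the sample rows are made up. It assumes rows are lists with the ratings count stored as a string, as in both data sets.

```python
def top_n_share(dataset, genre_index, genre_filter, count_index, n=2):
    """Return the percentage of a genre's ratings held by its n largest apps."""
    counts = [float(row[count_index])
              for row in dataset
              if row[genre_index] == genre_filter]
    total = sum(counts)
    if total == 0:
        return 0.0  # genre not found, or no ratings at all
    top = sorted(counts, reverse=True)[:n]
    return round(sum(top) / total * 100, 2)


# Hypothetical rows in the form [app_name, genre, ratings_count]
sample = [
    ['Big Maps', 'Navigation', '700'],
    ['Other Maps', 'Navigation', '250'],
    ['Tiny Maps', 'Navigation', '50'],
    ['Some Game', 'Games', '400'],
]

print(top_n_share(sample, 1, 'Navigation', 2, n=2))
```

On the real data, a call such as `top_n_share(ios_free, 11, 'Navigation', 5)` would summarize the percentage table for that genre in one value.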

### Remove skewing apps and check average ratings again

Let's remove some of the above apps that skew averages and check the genre ratings again.

In [48]:
# iOS skewing apps:
# Waze - GPS Navigation, Maps & Real-time Traffic : 66.8
# Google Maps - Navigation & Transit : 29.99
# Bible : 73.09
# Dictionary.com Dictionary & Thesaurus : 14.83
# Facebook : 39.23
# Pinterest : 14.0
app_delete_list_names = ['Waze - GPS Navigation, Maps & Real-time Traffic'
                         ,'Google Maps - Navigation & Transit'
                         ,'Bible'
                         ,'Dictionary.com Dictionary & Thesaurus'
                         ,'Facebook'
                         ,'Pinterest']

ios_no_skew = []
# ios_free[1] = "track_name": App Name
for row in ios_free:
    if row[1] not in app_delete_list_names:
        ios_no_skew.append(row)

print('Length of ios_free: ', len(ios_free))
print('Expected length of ios_no_skew: ', len(ios_free) - len(app_delete_list_names))
print('Length of ios_no_skew: ', len(ios_no_skew))

Length of ios_free:  3263
Expected length of ios_no_skew:  3257
Length of ios_no_skew:  3257


In [55]:
# android skewing apps:
# Facebook : 34.29
# Instagram : 29.21
# WhatsApp Messenger : 24.19
# Messenger – Text and Video Chat for Free : 19.82
app_delete_list_names = ['Facebook'
                         ,'Instagram'
                         ,'WhatsApp Messenger'
                         ,'Messenger – Text and Video Chat for Free'
                        ]

android_no_skew = []
# android_free[0] = "App": Application name
for row in android_free:
    if row[0] not in app_delete_list_names:
        android_no_skew.append(row)

print('Length of android_free: ', len(android_free))
print('Expected length of android_no_skew: ', len(android_free) - len(app_delete_list_names))
print('Length of android_no_skew: ', len(android_no_skew))


Length of android_free:  8882
Expected length of android_no_skew:  8878
Length of android_no_skew:  8878
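
The delete lists above were built by hand from the percentage tables. As a rough alternative, the same idea could be automated by dropping every app whose share of its genre's ratings reaches a cutoff. This is only a sketch under assumptions: `drop_dominant_apps` and the `threshold` value are hypothetical, and the sample rows are made up.

```python
def drop_dominant_apps(dataset, genre_index, count_index, threshold=10.0):
    """Return the rows whose share of their genre's ratings is below threshold (in %)."""
    # first pass: total ratings per genre
    genre_totals = {}
    for row in dataset:
        genre = row[genre_index]
        genre_totals[genre] = genre_totals.get(genre, 0.0) + float(row[count_index])

    # second pass: keep only the rows under the cutoff
    kept = []
    for row in dataset:
        total = genre_totals[row[genre_index]]
        share = float(row[count_index]) / total * 100 if total else 0.0
        if share < threshold:
            kept.append(row)
    return kept


# Hypothetical rows in the form [app_name, genre, ratings_count]
sample = [
    ['Giant Chat', 'SOCIAL', '900'],
    ['Small Chat', 'SOCIAL', '60'],
    ['Tiny Chat', 'SOCIAL', '40'],
    ['Puzzle A', 'GAME', '50'],
    ['Puzzle B', 'GAME', '50'],
]

kept = drop_dominant_apps(sample, 1, 2, threshold=60.0)
print([row[0] for row in kept])
```

A fixed cutoff has a drawback: in a genre with only a handful of apps, every app can exceed it, so the hand-picked lists used in this notebook remain the safer choice for small genres.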


In [50]:
#recheck average ratings
# index mapping
# ios_free[11] = "prime_genre": Primary Genre
# ios_free[5] = "rating_count_tot": User Rating counts (for all versions)
display_table(avg_user_counts(ios_no_skew, 11, 5, freq_table(ios_no_skew, 11)))

Music : 57326.530303030304
Weather : 52279.892857142855
Book : 37107.933333333334
Social Networking : 34437.854368932036
Finance : 31468.027777777777
Food & Drink : 28891.033333333333
Photo & Video : 28090.413580246914
Travel : 27551.219512195123
Shopping : 26293.732558139534
Health & Fitness : 23298.015384615384
Sports : 23008.898550724636
Games : 22572.40539112051
News : 20765.113636363636
Productivity : 20659.491228070176
Utilities : 18467.256097560974
Lifestyle : 15872.924528301886
Entertainment : 13602.141221374046
Reference : 10186.9375
Business : 7491.117647058823
Education : 7003.983050847458
Navigation : 4146.25
Catalogs : 4004.0
Medical : 525.4285714285714


In [57]:
#recheck average ratings
# index mapping
# android_free[1] = "Category:": Category the app belongs to
# android_free[3] = "Reviews": Number of user reviews for the app (as of when scraped)
display_table(avg_user_counts(android_no_skew, 1, 3, freq_table(android_no_skew, 1)))

GAME : 682733.6234067207
COMMUNICATION : 561311.350877193
VIDEO_PLAYERS : 425350.08176100627
PHOTOGRAPHY : 404081.3754789272
SOCIAL : 355557.0982905983
TOOLS : 305732.8973333333
ENTERTAINMENT : 301752.24705882353
SHOPPING : 223887.34673366835
PERSONALIZATION : 180508.3525423729
WEATHER : 171250.77464788733
PRODUCTIVITY : 160170.91040462427
MAPS_AND_NAVIGATION : 141775.68
TRAVEL_AND_LOCAL : 129484.42512077295
SPORTS : 116938.6146179402
FAMILY : 112837.6694411415
NEWS_AND_MAGAZINES : 92343.584
BOOKS_AND_REFERENCE : 87087.75520833333
HEALTH_AND_FITNESS : 78094.9706959707
FOOD_AND_DRINK : 57478.79090909091
EDUCATION : 56293.09708737864
COMICS : 42585.61818181818
FINANCE : 38535.8993902439
LIFESTYLE : 33767.031518624644
HOUSE_AND_HOME : 26435.465753424658
ART_AND_DESIGN : 24699.42105263158
BUSINESS : 24181.105392156864
DATING : 21953.272727272728
PARENTING : 16378.706896551725
AUTO_AND_VEHICLES : 14140.280487804877
LIBRARIES_AND_DEMO : 10925.807228915663
BEAUTY : 7476.226415094339
MEDICAL :

After removing the skewing apps from the top three genres by user counts, we see that iOS has an entirely new top three (`Music`, `Weather`, and `Book`), while on Android `GAME` moves to the top.

On iOS, `Games` is in the middle of the pack, but let's check this genre for skewing in both data sets.

### Closer look at Games genre

In [51]:
print('===ios-Games===')
display_table(app_percent_of_user_counts(ios_free, 11, 'Games', 5, 1))


===ios-Games===
Clash of Clans : 4.99
Temple Run : 4.04
Candy Crush Saga : 2.25
Angry Birds : 1.93
Subway Surfers : 1.65
Solitaire : 1.59
CSR Racing : 1.59
Crossy Road - Endless Arcade Hopper : 1.57
Injustice: Gods Among Us : 1.43
Hay Day : 1.33
PAC-MAN : 1.19
DragonVale : 1.18
Head Soccer : 1.13
Despicable Me: Minion Rush : 1.09
The Sims™ FreePlay : 1.05
Sonic Dash : 0.98
8 Ball Pool™ : 0.98
Tiny Tower - Free City Building : 0.97
Jetpack Joyride : 0.95
Bike Race - Top Motorcycle Racing Games : 0.95
Kim Kardashian: Hollywood : 0.93
WordBrain : 0.92
Trivia Crack : 0.92
Sniper 3D Assassin: Shoot to Kill Gun Game : 0.91
Flow Free : 0.88
Geometry Dash Lite : 0.87
▻Sudoku : 0.84
Fruit Ninja® : 0.77
Pixel Gun 3D : 0.71
Temple Run 2 : 0.69
My Horse : 0.69
Word Cookies! : 0.67
Dragon City Mobile : 0.65
The Simpsons™: Tapped Out : 0.64
Plants vs. Zombies™ 2 : 0.63
Clash Royale : 0.63
Pokémon GO : 0.6
CSR Racing 2 : 0.6
Star Wars™: Commander : 0.59
Boom Beach : 0.57
MARVEL Contest of Champions :

Drive Ahead! : 0.05
Design Home : 0.05
Crazy Taxi™ City Rush : 0.05
Colorfly : Coloring Book for Adults - Free Games : 0.05
Century City : 0.05
Castle Crush: Epic Strategy Game : 0.05
Angry Birds Star Wars HD : 0.05
Amazing Thief : 0.05
100 PICS Quiz - guess the picture trivia games : 0.05
Zombie Catchers : 0.04
Zombie Castaways : 0.04
X-War: Clash of Zombies : 0.04
Willy Wonka Slots: Vegas Casino Slot Machines : 0.04
Warhammer 40,000: Freeblade : 0.04
The Walking Dead: Road to Survival - Strategy Game : 0.04
The Silent Age : 0.04
The Line Zen : 0.04
Tap Tap Fish - Abyssrium : 0.04
Talking Tom Gold Run: Fun & Endless Running Game : 0.04
Swamp Attack : 0.04
Sonic CD : 0.04
Shooty Skies - Endless Arcade Flyer : 0.04
Rodeo Stampede - Sky Zoo Safari : 0.04
Racing in Car : 0.04
Plants vs. Zombies™ Heroes : 0.04
Ninja Kid Run VR: Runner & Racing Games For Free : 0.04
Merged! : 0.04
MARVEL Avengers Academy : 0.04
Letter Soup - Word Game : 0.04
Kitchen Scramble: Cooking Game : 0.04
Hovercraft:

Loop Drive 2 : 0.01
Lollipop: Sweet Taste Match3 : 0.01
Legacy of Discord - Furious Wings : 0.01
Leap Day : 0.01
League of War: Mercenaries : 0.01
Last Empire – War Z: Zombie Strategy Game : 0.01
Lara Croft: Relic Run : 0.01
Land Sliders : 0.01
LEGO® Star Wars™: The Force Awakens : 0.01
LEGO® Juniors Create & Cruise : 0.01
Kritika: The White Knights : 0.01
King of Avalon: Dragon Warfare : 0.01
Kids Shave Salon Spa Games (Boys & Girls) : 0.01
Ketchapp Summer Sports : 0.01
Ketchapp Basketball : 0.01
KINGDOM HEARTS Union χ[Cross] : 0.01
Just Dance Now : 0.01
Jupiter Jump : 0.01
Juju on the Beat : 0.01
Jelly Blast: New Exciting Match 3 : 0.01
Javelin Masters 2 : 0.01
Jam League : 0.01
It Girl Story - Styling, Fashion & Celebrity Life : 0.01
Iron Maiden: Legacy of the Beast : 0.01
Invasion: Modern Empire : 0.01
Ingress : 0.01
Icon Pop Song : 0.01
Hot Wheels: Race Off : 0.01
Horizon Chase - World Tour : 0.01
Higher Higher! : 0.01
He-Man™ Tappers of Grayskull™ : 0.01
Hand of God - Top Clicker

Splat Wars : 0.0
Splashy Cats: Endless Zigzag Arcade Water Game : 0.0
Splash Pop : 0.0
Splash Cars : 0.0
Splash : 0.0
Spirit Rush : 0.0
Spiral Tower : 0.0
Spiral Splatter : 0.0
Spinny Phone : 0.0
Spinning Rings : 0.0
Spike Run : 0.0
Spider Solitaire ∙ : 0.0
Spellbinders : 0.0
Sparkwave : 0.0
Spa Birthday Party - Nails, Hair, Dress Up & Cake : 0.0
Soundtrack Attack - Steven Universe Rhythm Runner : 0.0
Sophia - My Little Sis : 0.0
Solitaire ▪ : 0.0
Solitaire ∘ : 0.0
Solitaire One : 0.0
Soda World - Your Soda Inc : 0.0
Soda Coin Party: Free Casino Pusher : 0.0
Soccer Stadium Sports Car & Bus Parking Simulator 3D Driving Sim : 0.0
Soccer Puzzle for Brain Training -SoccerStrike- : 0.0
Soccer Hit : 0.0
Soccer Cup Championship 2016 : 0.0
Snowboarding The Fourth Phase : 0.0
Snow Roll : 0.0
Sniper: Traffic Hunter : 0.0
Snakebird : 0.0
Snake splix : 0.0
Snail Bob 2 : 0.0
Smurfs Epic Run - Fun Platform Adventure : 0.0
Smelly Baby - Farty Party : 0.0
Smash Slots : 0.0
Smash Fu - Endless Arcade Sm

Multi Level Car Parking 6 Shopping Mall Garage Lot : 0.0
Multi Level Car Parking 5 a Real Airport Driving Test Simulator : 0.0
Multi Level 4 Car Parking Simulator a Real Driving Test Run Racing Games : 0.0
Muhammad Ali: Puzzle King : 0.0
Mucho Party : 0.0
Mr. Crab 2 : 0.0
Move the Match - Matchstick Puzzles for Free : 0.0
Motor World: Bike Factory : 0.0
Moonvale 2: Puzzle Adventure : 0.0
Monster Zombie Plague War - Virtual Reality (VR) : 0.0
Monster Tail : 0.0
Monster Strike : 0.0
Monster Raid™ : 0.0
Monster Mania : 0.0
Monster Busters: Link Flash : 0.0
Monster & Commander : 0.0
Monkey Swingers : 0.0
Money Multiplier : 0.0
Mixels Rush - Use Mixes, Maxes and Murps to Outrun the Nixels : 0.0
Minescape : 0.0
MineSweeper　マインスイーパ無料 : 0.0
Mimpi : 0.0
Mikey Jumps : 0.0
Midnight Calling: Jeronimo : 0.0
Microtrip : 0.0
Micromon : 0.0
Microgolf Masters : 0.0
Micro Machines : 0.0
Medal Masters : Call of destiny : 0.0
Maze Walk VR - Virtual Reality Game Puzzle Apps : 0.0
Maximum Car : 0.0
Matchsti

Escape Game  Escape from Lost Memory : 0.0
Escape Alice House2 : 0.0
Escalate : 0.0
Epic Orchestra : 0.0
Epic Flail : 0.0
Epic Battle Simulator : 0.0
Endless Ducker : 0.0
Emoji Blitz : 0.0
Emily’s Polar Adventure : 0.0
Emergency Surgery Simulator - Doctor Game FOR FREE : 0.0
Eden: The Game - Build Your Village! : 0.0
EatMe.io: Hungry Fish Attack! : 0.0
Eat Game : 0.0
Earth Defender S - Trillion Battle Game : 0.0
Earn to Die 2 Lite : 0.0
Ear Age Diagnosis : 0.0
EXORUN : 0.0
EVERYTOWN : 0.0
ENYO : 0.0
EA SPORTS™ FIFA 17 Companion : 0.0
Dustoff Heli Rescue 2 : 0.0
Dunkers : 0.0
Dungeon Witcher : 0.0
Dungeon Monsters RPG : 0.0
Dulp : 0.0
Duel Master - Best Duelist : 0.0
Ducky Fuzz - Chain Reaction : 0.0
Duck Roll : 0.0
Droppy Balls! : 0.0
Dropple : 0.0
DropEraser : 0.0
Drop Out! : 0.0
Driving Zone : 0.0
Driving Evolution Parking Sim Real Car Driver Test : 0.0
Drive Ahead! Sports : 0.0
Drifty Chase : 0.0
Dream Diary - My Life, My Adventure! : 0.0
Dream Defense : 0.0
Dragon Heroes: Shooter R

In [52]:
print('===android-Games===')
display_table(app_percent_of_user_counts(android_free, 1, 'GAME', 3, 0))

===android-Games===
Clash of Clans : 7.62
Subway Surfers : 4.71
Clash Royale : 3.93
Candy Crush Saga : 3.81
My Talking Tom : 2.53
8 Ball Pool : 2.41
Shadow Fight 2 : 1.86
Pou : 1.78
Pokémon GO : 1.77
Yes day : 1.71
My Talking Angela : 1.68
Dream League Soccer 2018 : 1.68
Hill Climb Racing : 1.51
Asphalt 8: Airborne : 1.42
Mobile Legends: Bang Bang : 1.4
Temple Run 2 : 1.38
Piano Tiles 2™ : 1.38
Sniper 3D Gun Shooter: Free Shooting Games - FPS : 1.3
Trivia Crack : 1.09
Geometry Dash Lite : 1.05
Candy Crush Soda Saga : 1.05
Hungry Shark Evolution : 1.03
Garena Free Fire : 0.94
Angry Birds Classic : 0.94
Score! Hero : 0.92
Traffic Racer : 0.91
slither.io : 0.89
Fruit Ninja® : 0.86
Zombie Tsunami : 0.84
Dr. Driving : 0.84
Gangstar Vegas - mafia game : 0.82
Jetpack Joyride : 0.79
Pixel Gun 3D: Survival shooter & Battle Royale : 0.76
Crossy Road : 0.72
Smash Hit : 0.7
Gardenscapes : 0.7
Plants vs. Zombies FREE : 0.69
Temple Run : 0.68
DEER HUNTER CLASSIC : 0.67
Angry Birds 2 : 0.66
Angry Bir

SHADOWGUN LEGENDS : 0.02
Rolly Vortex : 0.02
Red Hands – 2-Player Games : 0.02
Power Pop Bubbles : 0.02
Navy Gunner Shoot War 3D : 0.02
Moto Rider : 0.02
Mini Motor Racing WRT : 0.02
Mini DAYZ: Zombie Survival : 0.02
Last Shelter: Survival : 0.02
Identity V : 0.02
IV Go（get IV for Pokemon） : 0.02
Hole.io : 0.02
Hello Stars : 0.02
Hambo : 0.02
Guns of Glory : 0.02
Gun Strike Shoot : 0.02
Gun Builder ELITE : 0.02
Gear.Club - True Racing : 0.02
Game of Thrones: Conquest™ : 0.02
Fruit Block - Puzzle Legend : 0.02
Four In A Line Free : 0.02
Fast like a Fox : 0.02
Evil Apples: A Dirty Card Game : 0.02
Drag'n'Boom : 0.02
Dr. Chess : 0.02
Disney Heroes: Battle Mode : 0.02
CrossFire: Legends : 0.02
Crazy Wheels : 0.02
Counter Terrorist Attack : 0.02
Counter Attack - Multiplayer FPS : 0.02
Color by Number – New Coloring Book : 0.02
Classic Words Solo : 0.02
Cheating Tom 3 - Genius School : 0.02
CONTRACT KILLER: ZOMBIES : 0.02
Bunny Skater : 0.02
Block Gun 3D: Ghost Ops : 0.02
Beauty Idol: Fashio

Extreme Match : 0.0
Ew, the small alien : 0.0
Endless Ducker : 0.0
EmuBox - Fast Retro Emulator : 0.0
Elite Commando Shooting War : 0.0
Eh Amego! : 0.0
EP Gem Hunter : 0.0
EF Jumper : 0.0
EC Mover : 0.0
E.G. Chess Free : 0.0
Dubai Racing : 0.0
Driving Cars Simulator Citroen : 0.0
Drive 4x4 Luxury SUV Jeep : 0.0
Drink-O-Tron The Drinking Game : 0.0
Dreamland Arcade - Steven Universe : 0.0
Draw A Stickman : 0.0
Dragon X Adventure: Warrior Z : 0.0
Dr.Slender Ep 1 Guide (Eng) : 0.0
Dr. Shogi : 0.0
Dr. Parker : Real car parking simulation : 0.0
Dr. Parker : Parking Simulator : 0.0
Dr. Dominoes : 0.0
Dr Driving Racer : 0.0
Dr Dre - Beatmaker : 0.0
Don't Hug Me I'm So Scared : 0.0
Dinosaur Simulator: Dino World : 0.0
Dino War: Rise of Beasts : 0.0
Dino Defends king 3 – Dinosaur T rex Hunter Games : 0.0
Destroy Gunners Σ : 0.0
Destiny Ninja Shall we date otome games love story : 0.0
Decay: The Mare - Ep.1 (Trial) : 0.0
Dead Zombie Evil Killer:Axe : 0.0
Dead Target Zombie Shooting US Sniper Kil

For both datasets, the games genre is fairly evenly spread: the top app, Clash of Clans, holds only about 5% of the genre's ratings on iOS and under 8% on Android. I would suggest a deeper investigation into the top games to find one whose mechanics could be reused with a new, fresh theme. Make sure to use some very catchy ads to draw users to your game, and offer codes that provide free goodies.
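
To make "evenly spread" a little more concrete, one option is to count how many of the largest apps it takes to reach half of a genre's ratings. The sketch below is illustrative: `apps_needed_for_share` is a hypothetical helper and the counts are made up, not taken from either data set.

```python
def apps_needed_for_share(counts, target_percent=50.0):
    """Return how many of the largest apps are needed to reach target_percent of all ratings."""
    total = sum(counts)
    if total == 0:
        return 0
    running = 0.0
    for i, count in enumerate(sorted(counts, reverse=True), start=1):
        running += count
        if running / total * 100 >= target_percent:
            return i
    return len(counts)


concentrated = [900, 50, 30, 20]   # hypothetical genre dominated by one app
even = [10] * 10                   # hypothetical genre with ten equal apps

print(apps_needed_for_share(concentrated))
print(apps_needed_for_share(even))
```

The larger this count is relative to the number of apps in the genre, the less the genre depends on a few giants.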

## Conclusion

In this project, we cleaned two mobile store datasets and filtered them down until they contained only apps similar to what we want to develop. In our case, that meant only free apps in English.

After that, we wanted to find out which apps were the most popular, since our new app will earn its revenue through ads: the more users our free app has, the more ad displays it generates.

We decided to first use the genre of the apps to narrow our scope. We found the most popular genres by average number of user ratings, but when we investigated the top apps of those genres, we found a few apps dominated the top. This caused skewing in our averages since we know our brand new app is not going to instantly compete with those giants.

After removing the top skewing apps, the averages looked quite different on Android and iOS. Even though the "Games" genre was only middle of the pack on iOS, its ratings were fairly evenly distributed across apps on both platforms, meaning it should be easier for our new app to get a piece of the pie.

The final suggestion to the app developers is to create a game similar to one of the top games (both datasets have similar lists). With a core game decided on, give it a fresh theme and some new features to lure new users. The people playing the top games for a long time may tire and want something new, so if you present them with something they are familiar with, they can jump right over to your new app.