# Most Profitable App Categories on the Apple App Store and Google Play Store



The goal of this project is to find the most profitable categories of free apps on the Apple App Store and the Google Play Store. We are working as data analysts for a hypothetical company, and our findings will help its app developers make data-driven decisions about which kinds of apps to build.

The main source of revenue in free-to-download apps comes from in-app ads, so the revenue for any given app is most influenced by its number of users. At the end of this analysis we will know what kinds of apps are most likely to attract the highest number of users. 

## First Things First - Opening and Exploring the Data

We use two data sets for our analysis. The first contains approximately seven thousand iOS apps from the Apple App Store. The second contains approximately ten thousand Android apps from the Google Play Store. Each marketplace offers more than 2 million apps, but analyzing that much data would require significant time and resources, so we work with these smaller samples, which still contain enough data for our analysis.

* [Apple App Store Dataset](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home)
* [Google Play Store Dataset](https://www.kaggle.com/lava18/google-play-store-apps/home)

First we start by opening each data set, and then we continue by exploring the data. 

In [1]:
from csv import reader

# encoding='utf8' is an assumption about how the CSV files were saved;
# the with-blocks close each file as soon as it has been read.

# Importing Apple App Store Data
with open('AppleStore.csv', encoding='utf8') as open_file:
    apple_apps = list(reader(open_file))
apple_header = apple_apps[0]
apple_apps = apple_apps[1:]

# Importing Google Play Store Data
with open('googleplaystore.csv', encoding='utf8') as open_file:
    google_apps = list(reader(open_file))
google_header = google_apps[0]
google_apps = google_apps[1:]

To make it easier to explore the two data sets, we will write a function named `explore_data()` so that we can view the rows in a more readable way. This function also has an option to show the number of rows and columns for the data set.

In [2]:
# Function for Exploring the Data
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n')
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [3]:
# Apple App Store Data Sample
print(apple_header)
print('\n')
explore_data(apple_apps, 0, 3, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


The Apple App Store data set has 7,197 apps and 16 columns. We'll likely be most interested in the `'track_name'`, `'currency'`, `'price'`, `'rating_count_tot'`, `'rating_count_ver'`, and `'prime_genre'` columns for our analysis. To view the details about each column visit the [data set documentation](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home).

In [4]:
# Google Play Store Data Sample
print(google_header)
print('\n')
explore_data(google_apps, 0, 3, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


The Google Play data set has 10,841 apps and 13 columns. We'll probably be most interested in the `'App'`, `'Category'`, `'Reviews'`, `'Installs'`, `'Type'`, `'Price'`, and `'Genres'` columns for our analysis.


## Clean Your Data

In the [discussion section](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015) of the Google Play data set, we learn that row 10472 has an error.

When we print this row, we see that it has only 12 values instead of 13: the `'Category'` value is missing, so every later value is shifted one column to the left. As a result, the `'Rating'` column (the third column) holds 19, which is impossible because ratings only go up to 5. We'll delete that row, and afterwards we can see that the number of apps in the data set has decreased by one.

Note: Only run the `del` statement once, otherwise you will delete more than one row.
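
A row like this can also be located programmatically instead of relying on the forum. The sketch below assumes that a malformed row is one whose length differs from the header's; `find_malformed_rows` and the toy rows are hypothetical and only illustrate the idea.

```python
def find_malformed_rows(dataset, header):
    """Return (index, row) pairs whose length differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(dataset) if len(row) != expected]


# Toy data, not taken from the real file: the second row lacks a category.
toy_header = ['App', 'Category', 'Rating']
toy_rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '4.9'],
    ['App C', 'GAME', '3.8'],
]

for index, row in find_malformed_rows(toy_rows, toy_header):
    print(index, row)
```

On the real data, the equivalent call would be `find_malformed_rows(google_apps, google_header)`.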

In [5]:
# On the discussion forums we learn that row 10472 is missing a value.
print(google_apps[10472])
print(len(google_apps))
del google_apps[10472]  # don't run more than once
print(len(google_apps))


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
10841
10840


### Removing Duplicate Entries

If you explore the Google Play data set, you'll notice it has some duplicate entries listed for certain apps.

In [6]:
for app in google_apps:
    name = app[0]
    if name == 'Facebook':
        print(app)

['Facebook', 'SOCIAL', '4.1', '78158306', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'August 3, 2018', 'Varies with device', 'Varies with device']
['Facebook', 'SOCIAL', '4.1', '78128208', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'August 3, 2018', 'Varies with device', 'Varies with device']


We need to remove these duplicate listings, but first we have to decide which entry to keep for each app. We would like to keep the most recent data, and one criterion that tells us which entry is the most recent is the `'Reviews'` column: the more reviews an entry has, the later it was collected. We'll keep the entry with the highest number of reviews for each app.

In [7]:
# finds the duplicate Google Play app entries
duplicate_apps = []
unique_apps = []

for app in google_apps:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print('Number of Duplicate Apps: ', len(duplicate_apps))
print('\n')
print('Examples of Duplicate Apps: ', duplicate_apps[:10])

Number of Duplicate Apps:  1181


Examples of Duplicate Apps:  ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']


In [8]:
# deletes the duplicate Google Play app entries
print('Expected number of entries after deletion: ', len(google_apps) - len(duplicate_apps))

# creates a dictionary that keeps the apps with the highest number of reviews
reviews_max = {}

for app in google_apps:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews 
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
        
print('Actual number of entries after deletion: ', len(reviews_max))

Expected number of entries after deletion:  9659
Actual number of entries after deletion:  9659


Now we'll use the `reviews_max` dictionary to remove the duplicates.

In [9]:
google_clean = []
already_added = []

for app in google_apps:
    name = app[0]
    n_reviews = float(app[3])
    
    if (n_reviews == reviews_max[name]) and (name not in already_added):
        google_clean.append(app)
        already_added.append(name)

Now that we have removed the duplicate entries, we'll use the `explore_data()` function to confirm that the result looks as expected.

In [10]:
explore_data(google_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


We have 9,659 rows as expected, so it looks like the duplicates have been successfully removed. 
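
The two-step removal above (first `reviews_max`, then `google_clean`) can also be sketched as a single pass that stores the whole row instead of only the review count. This is an alternative under stated assumptions, not the method used in this notebook: `dedupe_by_max_reviews` is a hypothetical helper, the toy rows are shortened to four columns, and the `Box` review count is invented for illustration.

```python
def dedupe_by_max_reviews(dataset, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}  # app name -> row with the most reviews seen so far
    for row in dataset:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Toy rows: name, category, rating, reviews.
toy_rows = [
    ['Facebook', 'SOCIAL', '4.1', '78158306'],
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Facebook', 'SOCIAL', '4.1', '78128208'],
]

deduped = dedupe_by_max_reviews(toy_rows)
print(len(deduped))
```

Note that the result is ordered by the first appearance of each app name, which can differ from the row order of `google_clean`.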

### Removing Non-English Apps

Next, we need to deal with the non-English apps in our data sets. We will remove them because they do not represent our target audience.

We will use the following function to determine whether an app name contains only characters commonly used in English text.

In [11]:
def in_english(string):
    
    for char in string:
        if ord(char) > 127: #ASCII range for regular English characters
            return False
    
    return True

print(in_english('Instachat'))
print(in_english('电视剧热播'))

True
False


This function appears to work, but some English app names contain emojis or other symbols that fall outside the ASCII range we used. This could lead us to remove apps that are useful for our analysis.

In [12]:
print(in_english('Business™'))
print(in_english('Emoji🤯'))

print(ord('™'))
print(ord('🤯'))

False
False
8482
129327


One way to avoid discarding more entries than necessary is to tolerate up to three characters outside the ASCII range (code points above 127) before treating a name as non-English.

In [13]:
def in_english(string):
    not_ascii = 0
    
    for char in string:
        if ord(char) > 127:
            not_ascii += 1
            
    if not_ascii > 3:
        return False
    else:
        return True
    
print(in_english('Business™'))
print(in_english('Emoji🤯'))
print(in_english('Insta 电视剧热'))

True
True
False


Now we'll use our `in_english()` function to filter out the non-English apps from both of our data sets.

In [14]:
# filtering the non-English apps out of our Apple data set
apple_apps_eng = []
for app in apple_apps:
    if in_english(app[1]):
        apple_apps_eng.append(app)
explore_data(apple_apps_eng, 0, 3, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 6183
Number of columns: 16


In [15]:
# filtering the non-English apps out of our Google data set
google_clean_eng = []
for app in google_clean:
    if in_english(app[0]):
        google_clean_eng.append(app)
explore_data(google_clean_eng, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


### Isolating the Free Apps

Next we'll filter out all of the apps that are not free. For our purposes we only need to analyze the free apps since our revenue comes from in-app ads.

In [16]:
# keeping only the free apps from the Apple data set
apple_free = []

for app in apple_apps_eng:
    price = app[4]
    if price == '0.0':
        apple_free.append(app)
        
print(len(apple_free))

3222


In [17]:
# keeping only the free apps from the Google data set
google_free = []

for app in google_clean_eng:
    price = app[7]
    if price == '0':
        google_free.append(app)
        
print(len(google_free))

8864


It looks like we are left with 3,222 apps in the Apple App Store, and 8,864 apps from Google Play. This should still be enough for our analysis.
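
The two filters above compare the price against a different string for each store (`'0.0'` and `'0'`). A numeric comparison avoids depending on the exact text. The sketch below is a hypothetical `is_free` helper; it assumes that paid prices may carry a leading `$` and that any value that cannot be parsed should count as not free.

```python
def is_free(price_text):
    """Return True when a price string such as '0', '0.0' or '$0.00' is zero."""
    cleaned = price_text.strip().lstrip('$')
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Unparseable prices are treated as not free rather than guessed at.
        return False


print(is_free('0.0'), is_free('0'), is_free('$4.99'), is_free('Everyone'))
```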

## Most Common Apps by Category

Our aim is to determine the types of apps that are likely to attract more users. A good strategy for developing a new app might be to first build a minimal Android version of the app. If that app gets a good response rate, then we could develop it further. Finally, if the app we develop is profitable after six months, we can then go and build an iOS version. 

Because we want to add our apps to both marketplaces, we need to find app profiles that are likely to be successful on both markets. We'll begin by building frequency tables for the Genre and Category columns of our data sets to get a sense of the most common app types for each market. We'll write two functions to analyze the frequency tables.

In [18]:
# this is our frequency table function that will show percentages
def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
            
    table_percentage = {}
    
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentage[key] = percentage
        
    return table_percentage

In [19]:
# this function will help us sort our frequency table percentages in descending order
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
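
For comparison, the same percentage table can be sketched with `collections.Counter` from the standard library, which counts the values and returns them most common first. `freq_table_counter` and the toy rows are hypothetical; the functions above remain the ones used in the rest of this notebook.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Return {value: percentage} for one column, most common value first."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return {value: count / total * 100 for value, count in counts.most_common()}


# Toy rows for illustration only: name, genre.
toy_rows = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]

for value, percentage in freq_table_counter(toy_rows, 1).items():
    print(value, ':', percentage)
```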

### Apple App Store

Here we'll examine the `prime_genre` column of the Apple App Store data set.

In [20]:
display_table(apple_free, -5)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


It looks like Games are by far the most common category of free English apps on the Apple App Store at 58.16%. The next most common categories of apps are Entertainment at around 8%, Photo & Video at around 5%, Education at 3.66%, and Social Networking at 3.29%.

The free English segment of the Apple App Store seems to be dominated by apps made for fun (games, entertainment, photo and video), while utility-focused genres such as Productivity, Business, Reference, and Navigation are much less common.

However, we are only looking at how many of these apps are actually on the app store, and that may not correlate to having the greatest number of users.


### Google Play

We'll continue by examining the `Category` and `Genres` columns of the Google Play data set.

In [21]:
display_table(google_free, 1) # Category Column

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

The most common categories of free English apps on Google Play look quite different. The Apple App Store has no Family category, but our first guess is that it consists mostly of games for kids. If so, games are still the most common type of app on Google Play, but the landscape is a little different. Utility-related categories such as Tools, Business, Productivity, and Finance are better represented than on the Apple App Store, and entertainment-related apps seem to be less common.

Let's look at the `Genres` column as well, since it seems to be closely related to `Category`.

In [22]:
display_table(google_free, -4) # Genre Column

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

The `Genres` column has many more types of apps than the `Category` column. It's more specific than what we need for our analysis, so we'll stick to the `Category` column and ignore `Genres`.

Next we'll look at the categories of apps that actually have the most users.


## Most Popular Apps by Category

### Apple App Store

One way to estimate how popular each genre is on the Apple App Store is to compute the average number of user ratings per app in that genre, using the `rating_count_tot` column.

In [24]:
genre_apple = freq_table(apple_free, -5)

for genre in genre_apple:
    total = 0
    len_genre = 0
    for app in apple_free:
        genre_app = app[-5]
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    print(genre, " : ", avg_n_ratings)

Social Networking  :  71548.34905660378
Photo & Video  :  28441.54375
Games  :  22788.6696905016
Music  :  57326.530303030304
Reference  :  74942.11111111111
Health & Fitness  :  23298.015384615384
Weather  :  52279.892857142855
Utilities  :  18684.456790123455
Travel  :  28243.8
Shopping  :  26919.690476190477
News  :  21248.023255813954
Navigation  :  86090.33333333333
Lifestyle  :  16485.764705882353
Entertainment  :  14029.830708661417
Food & Drink  :  33333.92307692308
Sports  :  23008.898550724636
Book  :  39758.5
Finance  :  31467.944444444445
Education  :  7003.983050847458
Productivity  :  21028.410714285714
Business  :  7491.117647058823
Catalogs  :  4004.0
Medical  :  612.0


Navigation apps average the most ratings at 86,090, followed by Reference, Social Networking, and Music. We need to look into these genres further, because the averages may be heavily influenced by just a few giants like Facebook, Spotify, Google Translate, and Google Maps.
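
The per-genre averages above were computed by looping over the whole data set once for every genre. As an alternative sketch, the same averages can be gathered in a single pass by keeping a running sum and count per group. `average_by_group` and the toy rows are hypothetical names and data.

```python
def average_by_group(dataset, group_index, value_index):
    """Return {group: mean of the numeric column} using one pass over the rows."""
    totals = {}  # group -> [running sum, row count]
    for row in dataset:
        entry = totals.setdefault(row[group_index], [0.0, 0])
        entry[0] += float(row[value_index])
        entry[1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


# Toy rows for illustration only: genre, number of ratings.
toy_rows = [['Games', '100'], ['Games', '300'], ['Music', '50']]
print(average_by_group(toy_rows, 0, 1))
```

For the Apple data, the equivalent call would be `average_by_group(apple_free, -5, 5)`.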

In [25]:
# Navigation
for app in apple_free:
    if app[-5] == 'Navigation':
        print(app[1], ':', app[5]) # prints name of app and number of ratings

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


In [26]:
# Social Networking
for app in apple_free:
    if app[-5] == 'Social Networking':
        print(app[1], ':', app[5])

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 2

In [27]:
# Music
for app in apple_free:
    if app[-5] == 'Music':
        print(app[1], ':', app[5])

Pandora - Music & Radio : 1126879
Spotify Music : 878563
Shazam - Discover music, artists, videos & lyrics : 402925
iHeartRadio – Free Music & Radio Stations : 293228
SoundCloud - Music & Audio : 135744
Magic Piano by Smule : 131695
Smule Sing! : 119316
TuneIn Radio - MLB NBA Audiobooks Podcasts Music : 110420
Amazon Music : 106235
SoundHound Song Search & Music Player : 82602
Sonos Controller : 48905
Bandsintown Concerts : 30845
Karaoke - Sing Karaoke, Unlimited Songs! : 28606
My Mixtapez Music : 26286
Sing Karaoke Songs Unlimited with StarMaker : 26227
Ringtones for iPhone & Ringtone Maker : 25403
Musi - Unlimited Music For YouTube : 25193
AutoRap by Smule : 18202
Spinrilla - Mixtapes For Free : 15053
Napster - Top Music & Radio : 14268
edjing Mix:DJ turntable to remix and scratch music : 13580
Free Music - MP3 Streamer & Playlist Manager Pro : 13443
Free Piano app by Yokee : 13016
Google Play Music : 10118
Certified Mixtapes - Hip Hop Albums & Mixtapes : 9975
TIDAL : 7398
YouTube Mu

In [28]:
# Reference
for app in apple_free:
    if app[-5] == 'Reference':
        print(app[1], ':', app[5]) # prints name of app and number of ratings

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


Above we see that hugely popular apps like Waze, Facebook, Pandora, and Bible skew the data in their respective categories just because they have so many ratings.
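
One way to see the size of this skew is to compare the mean with the median, which is far less sensitive to a few very large values. The sketch below uses the six Navigation rating counts printed above; choosing the median as the comparison is our own illustration, not a step taken in the original analysis.

```python
from statistics import mean, median

# Rating counts copied from the Navigation listing above.
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

print('mean  :', mean(navigation_ratings))
print('median:', median(navigation_ratings))
```

The mean reproduces the Navigation average of about 86,090, while the median is 8,196.5, roughly a tenth of that.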

If we look at the other categories like Weather, Food & Drink, Finance, or Games, we might find something worth developing. However, people don't spend much time in weather apps, so the chances of making a profit there are low. Food and Drink apps seem to only be popular for actual stores and restaurants like Starbucks. And something like a finance app could require too much domain expertise for us to even get started on building it.

Before we draw any conclusions, let's look at the Google Play store to see whether we find similar patterns.


### Google Play Store

In the Google Play data set, we have a column that shows the number of installs for each app.

In [30]:
display_table(google_free, 5)

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


The install counts aren't precise: each value is a threshold such as `100,000+` rather than an exact number. They can still tell us which app categories attract the most users.
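
To do arithmetic on these values, the commas and the plus sign have to be stripped before converting to a number. A minimal sketch, assuming every value looks like the ones in the table above; `parse_installs` is a hypothetical helper name, and the result is only the lower bound of each threshold.

```python
def parse_installs(text):
    """Convert a string such as '1,000,000+' to the float 1000000.0."""
    return float(text.replace(',', '').replace('+', ''))


for raw in ['1,000,000+', '500+', '0']:
    print(raw, '->', parse_installs(raw))
```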

Let's examine the number of installs for each category. 

In [31]:
# uses the frequency table's categories to calculate the average number of installs per category

category_google = freq_table(google_free, 1)

for category in category_google:
    total = 0
    len_category = 0
    for app in google_free:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace(',', '')    # cleaning up the number of installs so we can convert to a float
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    print(category, " : ", avg_n_installs)

ART_AND_DESIGN  :  1986335.0877192982
AUTO_AND_VEHICLES  :  647317.8170731707
BEAUTY  :  513151.88679245283
BOOKS_AND_REFERENCE  :  8767811.894736841
BUSINESS  :  1712290.1474201474
COMICS  :  817657.2727272727
COMMUNICATION  :  38456119.167247385
DATING  :  854028.8303030303
EDUCATION  :  1833495.145631068
ENTERTAINMENT  :  11640705.88235294
EVENTS  :  253542.22222222222
FINANCE  :  1387692.475609756
FOOD_AND_DRINK  :  1924897.7363636363
HEALTH_AND_FITNESS  :  4188821.9853479853
HOUSE_AND_HOME  :  1331540.5616438356
LIBRARIES_AND_DEMO  :  638503.734939759
LIFESTYLE  :  1437816.2687861272
GAME  :  15588015.603248259
FAMILY  :  3695641.8198090694
MEDICAL  :  120550.61980830671
SOCIAL  :  23253652.127118643
SHOPPING  :  7036877.311557789
PHOTOGRAPHY  :  17840110.40229885
SPORTS  :  3638640.1428571427
TRAVEL_AND_LOCAL  :  13984077.710144928
TOOLS  :  10801391.298666667
PERSONALIZATION  :  5201482.6122448975
PRODUCTIVITY  :  16787331.344927534
PARENTING  :  542603.6206896552
WEATHER  :  50

Above we see that communication apps have the most installs, with an average of 38,456,119. However, just like on the Apple App Store, this number is likely skewed upward by a few apps with a massive number of installs.

Let's check how many communication apps have very high install counts. If there are only a few such outliers, it might be better to exclude them from our analysis.

In [34]:
for app in google_free:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                    or app[5] == '500,000,000+'
                                    or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

If we remove the apps with 100 million or more installs, the average becomes more representative of the kinds of apps we could develop.

In [36]:
under_100M = []

for app in google_free:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'COMMUNICATION') and (float(n_installs) < 100000000):
        under_100M.append(float(n_installs))
        
sum(under_100M) / len(under_100M)

3603485.3884615386

## Conclusion
What kind of app should we build next?
