# Understanding App Popularity in the iOS and Google Play Stores

Our aim with this project is to understand the characteristics of the apps that are likely to attract more users on the App Store and Google Play. 

We will use Kaggle datasets containing a sample of apps from both app stores, focusing specifically on free apps intended for an English-speaking audience. Our process consists of cleaning and organizing the data, finding criteria to analyze it with, and finally exploring the results in more depth.

Please find the sources and documentation for the datasets below:
- iOS App Store (https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps)
- Google Play Store (https://www.kaggle.com/lava18/google-play-store-apps)

Let's begin by importing our datasets and taking an initial look at how each one is structured.

In [136]:
def explore_data(dataset, start, end, rows_and_columns = False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new empty line after each row
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [137]:
from csv import reader

### The Google Play data set ###
# Open with an explicit encoding and close the file once it has been read
with open('googleplaystore.csv', encoding='utf8') as opened_file:
    android = list(reader(opened_file))
android_header = android[0]
android = android[1:]

### The App Store data set ###
with open('AppleStore.csv', encoding='utf8') as opened_file:
    ios = list(reader(opened_file))
ios_header = ios[0]
ios = ios[1:]

In [138]:
print("Google Store")
explore_data(android, 0, 5, True)
print("\n")
print("Apple Store")
explore_data(ios, 0, 5, True)

Google Store
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


Number of rows: 10841
Number of columns: 13


Apple Store
['284882

In [139]:
print(android_header)
print("\n")
print(ios_header)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


## Step 1 - Cleaning the Data
### Omitting Errors and Duplicates
We notice that an entry in the Google Play dataset has a different number of columns than the rest of the entries. This might cause problems in the future, so we choose to omit it through the code below.

In [140]:
error_row = android[10472]

error_check = len(error_row) == len(android_header)
print(error_check)

del android[10472]

False
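
Rather than relying on a known row index, the same length check can be run over every row. The sketch below is a minimal illustration on made-up rows; `find_malformed_rows` and the sample data are hypothetical and not part of the original notebook.

```python
def find_malformed_rows(header, rows):
    """Return (index, row) pairs whose length differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(rows) if len(row) != expected]


# Hypothetical sample: the second row is missing one field.
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],
    ['App C', 'GAME', '4.5'],
]

malformed = find_malformed_rows(sample_header, sample_rows)
for index, row in malformed:
    print(index, row)
```

If several rows were flagged, deleting them from the highest index down would keep the remaining indices valid.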


At this point, we also notice that the Google Play dataset contains duplicate entries. The code below serves as an example, displaying four different entries for the same app - Instagram.

In [141]:
for app in android:
    app_name = app[0]
    if app_name == "Instagram":
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


In [142]:
unique_apps = []
duplicate_apps = []

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

print("The number of unique apps is:",len(unique_apps))
print("The number of duplicate apps is:",len(duplicate_apps))
print("Sample of duplicate apps:", duplicate_apps[0:15])

The number of unique apps is: 9659
The number of duplicate apps is: 1181
Sample of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']
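
The same counts can also be obtained with `collections.Counter` from the standard library, which avoids repeatedly searching a list. This is a minimal sketch on hypothetical app names, not the code used to produce the output above.

```python
from collections import Counter

# Hypothetical app names standing in for the first column of the dataset.
sample_names = ['Box', 'Slack', 'Box', 'Instagram', 'Box', 'Slack']

name_counts = Counter(sample_names)

# Number of distinct names
unique_count = len(name_counts)
# Number of rows beyond the first occurrence of each name
extra_entries = sum(count - 1 for count in name_counts.values())
# Names that appear more than once, with their counts
repeated = {name: count for name, count in name_counts.items() if count > 1}

print(unique_count, extra_entries, repeated)
```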


From the Instagram duplicates, we can see that the entries differ only in their number of reviews. We can assume that the entry with the highest number of reviews is the most recently collected data point.

Now, we will do a quick check to find out whether this or any other pattern is present in the rest of the duplicates. This will give us a starting point for deciding how to remove the duplicates. First, we examine a sample of five apps from the duplicate list.

In [143]:
sample_names = ["Quick PDF Scanner + OCR FREE", "Box", "Google My Business",
                "ZOOM Cloud Meetings", "join.me - Simple Meetings"]

# Print every entry of each sampled app, one app at a time
for sample_name in sample_names:
    for app in android:
        app_name = app[0]
        if app_name == sample_name:
            print(app)

['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']
['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']
['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80804', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']
['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,000+', 'Fr

From the above, we can see that the pattern is present for "Quick PDF Scanner + OCR FREE", so it is likely present for other apps as well. We also notice that the duplicate entries of the other sample apps share the same number of reviews, so the pattern does not hold for every set of duplicates. With this in mind, we will keep only the entry with the highest number of reviews for each app.

In order to clean the data, we will now create a dictionary where each key is a unique app name, and its corresponding value is the highest number of reviews of that app. 

In [144]:
reviews_max = {}
for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
        
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

print(len(reviews_max))


9659
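
The same dictionary can be built a little more compactly with `dict.get` and `max`. The sketch below uses hypothetical rows with only four columns, so the column index differs from the real dataset.

```python
# Hypothetical rows: [name, category, rating, reviews]
sample_rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Box', 'BUSINESS', '4.2', '159872'],
]

reviews_max_sample = {}
for row in sample_rows:
    name = row[0]
    n_reviews = float(row[3])
    # Keep the larger of the stored value (0.0 if unseen) and the current one
    reviews_max_sample[name] = max(reviews_max_sample.get(name, 0.0), n_reviews)

print(reviews_max_sample)
```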


We can now use this dictionary to filter the original dataset and build a new list containing only the highest-review entry for each app. To do this, we will:
- Create two empty lists: `android_clean` and `already_added`
- Loop through the dataset, and for every iteration we:
    - Define the `name` of the app and its number of reviews (`n_reviews`)
    - If the number of reviews equals the maximum recorded in our dictionary, and we have not yet added the app or one of its duplicates, append the entire row to `android_clean` and the name of the app to `already_added`. The latter acts as a running record of the apps we have already added.
        - This second condition is needed because some apps reach their maximum number of reviews in several identical entries, such as the apps in our five-app sample; without it, each of those entries would be appended.

In [145]:
android_clean = []
already_added = []

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if (n_reviews == reviews_max[name]) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)

print(len(android_clean))
print(len(already_added))

9659
9659
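
Checking `name not in already_added` against a list scans the list each time. A set gives the same result with faster lookups. The sketch below shows the idea on hypothetical two-column rows; `seen_names` and the sample data are illustrative names only.

```python
# Hypothetical rows: [name, reviews]
sample_rows = [
    ['Box', '159872'],
    ['Instagram', '66577313'],
    ['Box', '159872'],
    ['Instagram', '66577446'],
]
sample_reviews_max = {'Box': 159872.0, 'Instagram': 66577446.0}

clean_rows = []
seen_names = set()  # membership tests on a set are O(1) on average

for row in sample_rows:
    name, n_reviews = row[0], float(row[1])
    # Keep the first row that carries the maximum review count for this app
    if n_reviews == sample_reviews_max[name] and name not in seen_names:
        clean_rows.append(row)
        seen_names.add(name)

print(clean_rows)
```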


### Omitting Non-English Apps
For the purpose of this analysis, we want to focus on apps aimed at an English-speaking audience. Looking through the data, however, we find several entries for non-English apps, such as those listed below:

In [146]:
print(ios[813][1])
print(ios[6731][1])
print("\n")
print(android_clean[4412][0])
print(android_clean[7940][0])

爱奇艺PPS -《欢乐颂2》电视剧热播
【脱出ゲーム】絶対に最後までプレイしないで 〜謎解き＆ブロックパズル〜


中国語 AQリスニング
لعبة تقدر تربح DZ


Characters commonly used in English text have code points in the ASCII range, 0 to 127, so we can use this range to exclude entries whose titles are not in English.

To filter the apps, we will only keep those whose titles contain 3 or fewer characters outside this range. Allowing up to 3 such characters, rather than rejecting any title that contains one, ensures that we still keep apps whose titles include emojis or other symbols outside the range.

In [147]:
def english_check(string):

    non_english_counter = 0
    
    for character in string:
        if ord(character) > 127:
            non_english_counter += 1
        
    if non_english_counter > 3:
        return False
    else:
        return True

print(english_check('Instagram'))
print(english_check('爱奇艺PPS -《欢乐颂2》电视剧热播'))

print(english_check('Docs To Go™ Free Office Suite'))
print(english_check('Instachat 😜😜😜'))

True
False
True
True
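
The same heuristic can be written more compactly with the threshold exposed as a parameter. This is a sketch only; `is_probably_english` and `max_non_ascii` are hypothetical names introduced here for illustration.

```python
def is_probably_english(name, max_non_ascii=3):
    """True if at most `max_non_ascii` characters fall outside ASCII (0-127)."""
    non_ascii = sum(1 for character in name if ord(character) > 127)
    return non_ascii <= max_non_ascii


print(is_probably_english('Instagram'))
print(is_probably_english('Docs To Go™ Free Office Suite'))
# A stricter threshold rejects any title containing a non-ASCII symbol
print(is_probably_english('Docs To Go™ Free Office Suite', max_non_ascii=0))
```

Note that this remains a heuristic: a short title written entirely in another script, with three or fewer non-ASCII characters, would still pass the check.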


In [148]:
android_english = []
for app in android_clean:
    name = app[0]
    if english_check(name):
        android_english.append(app)

print(len(android_english))
print(len(android_clean))

ios_english = []
for app in ios:
    name = app[1]
    if english_check(name):
        ios_english.append(app)

print(len(ios_english))
print(len(ios))

9614
9659
6183
7197


We have now isolated those apps that can be reasonably expected to be intended for an English-speaking audience. Let's also isolate the apps that are free.

In [149]:
ios_final = []
android_final = []

for app in ios_english:
    price = app[4]
    if price == "0.0":
        ios_final.append(app)
        
for app in android_english:
    price = app[7]
    if price == "0":
        android_final.append(app)

print(len(ios_final))
print(len(android_final))

3222
8864
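
Comparing prices as strings works only while free apps are always written exactly as `"0"` or `"0.0"`. A slightly more defensive sketch parses the price as a number first. The helper `is_free` is hypothetical, and the assumption that paid prices may carry a leading `$` is made here for illustration.

```python
def is_free(price_text):
    """Treat a price string as free if it parses to zero.

    Assumes prices look like '0', '0.0' or '$4.99'; a leading dollar
    sign is stripped before parsing (an assumption for illustration).
    """
    cleaned = price_text.strip().lstrip('$')
    try:
        return float(cleaned) == 0.0
    except ValueError:
        return False  # unparseable prices are treated as not free


for price in ['0', '0.0', '$4.99', 'Everyone']:
    print(price, is_free(price))
```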


## Step 2 - Analyzing the Data
### Finding Criteria to Compare Apps

We now have a clean dataset containing only free apps intended for an English-speaking audience. Let's now proceed to analyze what criteria we have available to understand the types of apps that are most popular.

Below, we have identified three columns that could help guide our analysis: `prime_genre` in the App Store dataset, and `Category` and `Genres` in the Google Play dataset.

In [150]:
print(ios_header[11])

print(android_header[1],"\n",android_header[9])

prime_genre
Category 
 Genres


We will now create a function that loops through the dataset and finds the relative frequency, as a percentage, of each value in a given column, such as the app genre. We also create a function that displays this table sorted from most to least frequent.

In [151]:
def freq_table_percentages(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage
    
    return table_percentages

def display_table(dataset, index):
    table = freq_table_percentages(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
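
For comparison, the two functions above can be approximated in a few lines with `collections.Counter`. This is a sketch on hypothetical rows, with `percentage_table` as an illustrative name; it is not the code that produced the tables in this notebook.

```python
from collections import Counter


def percentage_table(rows, index):
    """Return (value, percentage) pairs sorted from most to least frequent."""
    counts = Counter(row[index] for row in rows)
    total = sum(counts.values())
    return [(value, count / total * 100) for value, count in counts.most_common()]


# Hypothetical rows: [name, genre]
sample_rows = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
sample_table = percentage_table(sample_rows, 1)

for genre, percentage in sample_table:
    print(genre, ':', round(percentage, 2))
```

One difference to be aware of: `most_common` keeps tied values in the order they were first seen, whereas sorting `(percentage, key)` tuples in reverse breaks ties by key.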

In [152]:
display_table(ios_final,11)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


### App Store Frequency Analysis

As we can see in the frequency table above, Games is by far the most common app category in the App Store, with over half (58.16%) of apps falling into it. We also notice an interesting trend: most of the top categories are leisure and hobby-related (i.e. Games, Entertainment, Photo & Video, Social Networking, Shopping). However, this does not necessarily mean that gaming apps are the most popular, as they could have fewer users per app than apps in other categories.

Let's keep this in mind as we look through the Google Play apps.

In [153]:
display_table(android_final,9)

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

In [154]:
display_table(android_final,1)

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

### Google Play Frequency Analysis

For starters, the difference between `Category` and `Genres` is not readily apparent, other than `Genres` (the first table) being more granular, with a larger number of distinct values. In `Genres`, we notice some differences from the App Store: many of the top genres are related to productivity, with Tools, Education, Business, Productivity and Finance near the top of the list. No single genre dominates here, unlike the App Store, where over half of the apps were games.

For Categories, we notice that many of the top Genres from the previous table are also present, with notable additions in Family and Game as the top two categories. We can see that the Google Play store's most common categories of apps are more of a mix between productivity and entertainment. 

### Isolating The Most Popular Apps

Let's now create a function that shows the popularity of each category based on the average number of downloads per app. For this, we first create a function that returns a dictionary with the absolute frequency of each category in the dataset.

Then, we create a function that loops through each category in that dictionary, loops through the apps in the dataset that belong to it, sums their download counts, and returns the average number of downloads per app.

In [155]:
def freq_table(dataset, index):
    table = {}
    
    for row in dataset:
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1

    return table

In [156]:
### New Code with Sorted Table
table_ios = freq_table(ios_final, 11)

def freq_table_categories(dataset, index):
    # Average number of ratings (column 5, rating_count_tot) per value of column `index`
    table = {}
    
    # Use the function's own arguments rather than the global iOS variables
    for genre in freq_table(dataset, index):
        total = 0
        len_genre = 0
    
        for app in dataset:
            genre_app = app[index]
        
            if genre_app == genre:
                app_ratings = float(app[5])
                total += app_ratings
                len_genre += 1
    
        avg_ratings = total / len_genre
        
        table[genre] = avg_ratings
        
    return table

def display_table_categories(dataset, index):
    table = freq_table_categories(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])


display_table_categories(ios_final, 11)

Navigation : 86090.33333333333
Reference : 74942.11111111111
Social Networking : 71548.34905660378
Music : 57326.530303030304
Weather : 52279.892857142855
Book : 39758.5
Food & Drink : 33333.92307692308
Finance : 31467.944444444445
Photo & Video : 28441.54375
Travel : 28243.8
Shopping : 26919.690476190477
Health & Fitness : 23298.015384615384
Sports : 23008.898550724636
Games : 22788.6696905016
News : 21248.023255813954
Productivity : 21028.410714285714
Utilities : 18684.456790123455
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Business : 7491.117647058823
Education : 7003.983050847458
Catalogs : 4004.0
Medical : 612.0
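
The nested loops above scan the whole dataset once per genre. The same averages can be computed in a single pass by keeping a running total and count for each genre. The sketch below uses hypothetical three-column rows; `average_by_key` is an illustrative name, not part of the original notebook.

```python
def average_by_key(rows, key_index, value_index):
    """Average the numeric column `value_index` within each `key_index` group."""
    totals = {}
    counts = {}
    for row in rows:
        key = row[key_index]
        totals[key] = totals.get(key, 0.0) + float(row[value_index])
        counts[key] = counts.get(key, 0) + 1
    return {key: totals[key] / counts[key] for key in totals}


# Hypothetical rows: [name, genre, rating_count]
sample_rows = [
    ['App A', 'Navigation', '100'],
    ['App B', 'Navigation', '300'],
    ['App C', 'Music', '50'],
]
sample_averages = average_by_key(sample_rows, 1, 2)
print(sample_averages)
```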


## Step 3 - Exploring Results
### Most Popular iOS Apps

The iOS dataset does not have a dedicated column with the number of downloads per app, but we can use the total number of user ratings (`rating_count_tot`) as a proxy. We can see that Navigation, Reference, Social Networking and Music are among the most popular categories. Let's explore each of these categories in more depth.

In [157]:
for app in ios_final:
    if app[11] == 'Navigation':
        print(app[1], ':', app[5]) # print name and number of ratings

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


In [158]:
for app in ios_final:
    if app[11] == 'Reference':
        print(app[1], ':', app[5]) # print name and number of ratings

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


In [159]:
for app in ios_final:
    if app[11] == 'Social Networking':
        print(app[1], ':', app[5]) # print name and number of ratings

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 2

In [160]:
for app in ios_final:
    if app[11] == 'Music':
        print(app[1], ':', app[5]) # print name and number of ratings

Pandora - Music & Radio : 1126879
Spotify Music : 878563
Shazam - Discover music, artists, videos & lyrics : 402925
iHeartRadio – Free Music & Radio Stations : 293228
SoundCloud - Music & Audio : 135744
Magic Piano by Smule : 131695
Smule Sing! : 119316
TuneIn Radio - MLB NBA Audiobooks Podcasts Music : 110420
Amazon Music : 106235
SoundHound Song Search & Music Player : 82602
Sonos Controller : 48905
Bandsintown Concerts : 30845
Karaoke - Sing Karaoke, Unlimited Songs! : 28606
My Mixtapez Music : 26286
Sing Karaoke Songs Unlimited with StarMaker : 26227
Ringtones for iPhone & Ringtone Maker : 25403
Musi - Unlimited Music For YouTube : 25193
AutoRap by Smule : 18202
Spinrilla - Mixtapes For Free : 15053
Napster - Top Music & Radio : 14268
edjing Mix:DJ turntable to remix and scratch music : 13580
Free Music - MP3 Streamer & Playlist Manager Pro : 13443
Free Piano app by Yokee : 13016
Google Play Music : 10118
Certified Mixtapes - Hip Hop Albums & Mixtapes : 9975
TIDAL : 7398
YouTube Mu

In [161]:
for app in ios_final:
    if app[11] == 'Weather':
        print(app[1], ':', app[5]) # print name and number of ratings

The Weather Channel: Forecast, Radar & Alerts : 495626
The Weather Channel App for iPad – best local forecast, radar map, and storm tracking : 208648
WeatherBug - Local Weather, Radar, Maps, Alerts : 188583
MyRadar NOAA Weather Radar Forecast : 150158
AccuWeather - Weather for Life : 144214
Yahoo Weather : 112603
Weather Underground: Custom Forecast & Local Radar : 49192
NOAA Weather Radar - Weather Forecast & HD Radar : 45696
Weather Live Free - Weather Forecast & Alerts : 35702
Storm Radar : 22792
QuakeFeed Earthquake Map, Alerts, and News : 6081
Moji Weather - Free Weather Forecast : 2333
Hurricane by American Red Cross : 1158
Forecast Bar : 375
Hurricane Tracker WESH 2 Orlando, Central Florida : 203
FEMA : 128
iWeather - World weather forecast : 80
Weather - Radar - Storm with Morecast App : 78
Yurekuru Call : 53
Weather & Radar : 37
WRAL Weather Alert : 25
Météo-France : 24
JaxReady : 22
Freddy the Frogcaster's Weather Station : 14
Almanac Long-Range Weather Forecast : 12
TodayAir

In [162]:
for app in ios_final:
    if app[1] == 'Yo':
        print(app[1], ':', app[5])

We can see that the Navigation average is driven largely by Waze and Google Maps, with their competitors being far less popular. The other categories also have a handful of dominant apps that pull up their averages. Notably, Social Networking has the most apps of the categories examined here, but it also contains some of the most popular apps in the App Store, such as Facebook and Pinterest, which keeps its average high.
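
One way to see how strongly a few apps affect an average is to compare the mean with the median. The sketch below uses the rating counts of the six Navigation apps printed earlier; comparing the two is an illustration added here, not a step from the original analysis.

```python
from statistics import mean, median

# Rating counts of the six Navigation apps listed in the output above.
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

navigation_mean = mean(navigation_ratings)
navigation_median = median(navigation_ratings)

print('mean:', round(navigation_mean, 2))
print('median:', navigation_median)
```

The mean matches the Navigation average reported earlier (about 86,090), while the median is 8,196.5, which suggests that the two largest apps account for most of the category's apparent popularity.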

Let's keep this in mind as we proceed to look at the Google Play Store.

### Most Popular Android Apps

The Google Play dataset has data on the number of installs, but it reports them as open-ended ranges (e.g. 100+, 1,000+, 10,000+).

In [163]:
display_table(android_final, 5)

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


We can still use this data for our purposes, as we are mainly interested in relative popularity and the ranges are granular enough. Let's now loop through the dataset and convert each install count from a string to a float; otherwise, summing the values would raise an error.
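
The conversion itself can be isolated in a small helper, which makes it easier to check on its own. `parse_installs` is a hypothetical name introduced here for illustration.

```python
def parse_installs(installs_text):
    """Convert an install range such as '1,000,000+' to a float.

    The result is the lower bound of the range, so averages built
    from it underestimate the true number of installs.
    """
    return float(installs_text.replace('+', '').replace(',', ''))


for text in ['1,000,000+', '500+', '0']:
    print(text, '->', parse_installs(text))
```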

In [164]:
table_android = freq_table(android_final, 1)

for category in table_android:
    total = 0
    len_category = 0
    
    for app in android_final:
        category_app = app[1]
        
        if category_app == category:
            app_installs = app[5]
            app_installs = app_installs.replace('+','')
            app_installs = app_installs.replace(',','')
            app_installs = float(app_installs)
            total += app_installs
            len_category += 1
    
    avg_installs = total / len_category
    
    print(category, ':', avg_installs)

COMMUNICATION : 38456119.167247385
FOOD_AND_DRINK : 1924897.7363636363
LIFESTYLE : 1437816.2687861272
BOOKS_AND_REFERENCE : 8767811.894736841
EDUCATION : 1833495.145631068
ART_AND_DESIGN : 1986335.0877192982
TOOLS : 10801391.298666667
GAME : 15588015.603248259
FINANCE : 1387692.475609756
SHOPPING : 7036877.311557789
AUTO_AND_VEHICLES : 647317.8170731707
FAMILY : 3695641.8198090694
PHOTOGRAPHY : 17840110.40229885
HOUSE_AND_HOME : 1331540.5616438356
PERSONALIZATION : 5201482.6122448975
LIBRARIES_AND_DEMO : 638503.734939759
MEDICAL : 120550.61980830671
MAPS_AND_NAVIGATION : 4056941.7741935486
PRODUCTIVITY : 16787331.344927534
COMICS : 817657.2727272727
SPORTS : 3638640.1428571427
NEWS_AND_MAGAZINES : 9549178.467741935
EVENTS : 253542.22222222222
WEATHER : 5074486.197183099
TRAVEL_AND_LOCAL : 13984077.710144928
BUSINESS : 1712290.1474201474
ENTERTAINMENT : 11640705.88235294
PARENTING : 542603.6206896552
VIDEO_PLAYERS : 24727872.452830188
SOCIAL : 23253652.127118643
DATING : 854028.83030303

We can see that Communication apps are the most popular, with an average of 38,456,119 installs. Other popular categories include Video Players (24,727,872), Social (23,253,652), Photography (17,840,110), Productivity (16,787,331) and Games (15,588,015).

The Communication category overlaps with the Social Networking category from the App Store, and we can also see that the top apps (e.g. WhatsApp, Skype) have an outsized influence on the average number of downloads. Note that some of the apps included in this category, such as Chrome, Gmail and Yahoo Mail, might not be placed in the corresponding category in the iOS dataset.

In [165]:
for app in android_final:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

If we examine the Social category, we can see that the usual suspects (Facebook, Instagram, Snapchat, etc.) also have a considerable influence on the average number of installs.

In [166]:
for app in android_final:
    if app[1] == 'SOCIAL' and (app[5] == '1,000,000,000+'
                               or app[5] == '500,000,000+'
                               or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

Facebook : 1,000,000,000+
Facebook Lite : 500,000,000+
Tumblr : 100,000,000+
Pinterest : 100,000,000+
Google+ : 1,000,000,000+
Badoo - Free Chat & Dating App : 100,000,000+
Tango - Live Video Broadcast : 100,000,000+
Instagram : 1,000,000,000+
Snapchat : 500,000,000+
LinkedIn : 100,000,000+
Tik Tok - including musical.ly : 100,000,000+
BIGO LIVE - Live Stream : 100,000,000+
VK : 100,000,000+


For some perspective, the most common category (Family) has fewer highly popular apps, which results in a much lower average of 3,695,641 installs. Below we can also see popular apps from Tools and Games, categories which are among the most popular. We notice that several highly popular apps are from Google, and it is possible that some of them come pre-installed on Android phones and are being counted as installs.

In [167]:
# Print the FAMILY apps with at least 100,000,000 installs.
for app in android_final:
    if app[1] == 'FAMILY' and app[5] in ('1,000,000,000+',
                                         '500,000,000+',
                                         '100,000,000+'):
        print(app[0], ':', app[5])

Minion Rush: Despicable Me Official Game : 100,000,000+
ROBLOX : 100,000,000+
Hay Day : 100,000,000+
Talking Tom Cat 2 : 100,000,000+
Talking Ginger : 100,000,000+
Where's My Water? Free : 100,000,000+
Talking Tom Cat : 100,000,000+
Tom Loves Angela : 100,000,000+
Talking Tom & Ben News : 100,000,000+
Google Play Games : 1,000,000,000+
Bitmoji – Your Personal Emoji : 100,000,000+
Bubble Witch 2 Saga : 100,000,000+
Duolingo: Learn Languages Free : 100,000,000+
Farm Heroes Saga : 100,000,000+

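The category averages quoted in this section come from turning install buckets such as `'100,000,000+'` into numbers and averaging them. A minimal sketch of that calculation follows, assuming the category sits at index 1 and the installs at index 5 as in the cells above. The helper names `parse_installs` and `average_installs` and the sample rows are hypothetical, and because each bucket is only a lower bound, the result is an approximation.

```python
def parse_installs(installs):
    """Convert an install bucket such as '1,000,000+' to a float."""
    return float(installs.replace(',', '').replace('+', ''))


def average_installs(dataset, category, category_idx=1, installs_idx=5):
    """Average the install buckets of every app in one category."""
    counts = [parse_installs(app[installs_idx])
              for app in dataset
              if app[category_idx] == category]
    # Guard against categories with no apps.
    return sum(counts) / len(counts) if counts else 0.0


# Hypothetical rows, padded so index 1 is the category and index 5 the installs.
sample = [
    ['App A', 'FAMILY', '', '', '', '100,000,000+'],
    ['App B', 'FAMILY', '', '', '', '1,000+'],
    ['App C', 'FAMILY', '', '', '', '500+'],
    ['App D', 'SOCIAL', '', '', '', '1,000,000,000+'],
]

family_average = average_installs(sample, 'FAMILY')
print('FAMILY average installs:', round(family_average))
```

In the sample, a single app in the 100,000,000+ bucket dominates the average of the other two, which is the same effect the big-name apps have on their categories here.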

In [168]:
# Print the TOOLS apps with at least 100,000,000 installs.
for app in android_final:
    if app[1] == 'TOOLS' and app[5] in ('1,000,000,000+',
                                        '500,000,000+',
                                        '100,000,000+'):
        print(app[0], ':', app[5])

Google : 1,000,000,000+
Google Translate : 500,000,000+
Calculator : 100,000,000+
Device Help : 100,000,000+
Account Manager : 100,000,000+
SHAREit - Transfer & Share : 500,000,000+
Samsung Calculator : 100,000,000+
Gboard - the Google Keyboard : 500,000,000+
Google Korean Input : 100,000,000+
Share Music & Transfer Files - Xender : 100,000,000+
Tiny Flashlight + LED : 100,000,000+
GO Keyboard - Cute Emojis, Themes and GIFs : 100,000,000+
Speedtest by Ookla : 100,000,000+
CM Locker - Security Lockscreen : 100,000,000+
Applock : 100,000,000+
Clean Master- Space Cleaner & Antivirus : 500,000,000+
Lookout Security & Antivirus : 100,000,000+
Google Now Launcher : 100,000,000+
360 Security - Free Antivirus, Booster, Cleaner : 100,000,000+
Samsung Smart Switch Mobile : 100,000,000+
Avast Mobile Security 2018 - Antivirus & App Lock : 100,000,000+
AppLock : 100,000,000+
AVG AntiVirus 2018 for Android Security : 100,000,000+
Security Master - Antivirus, VPN, AppLock, Booster : 500,000,000+
Batt

In [169]:
# Print the GAME apps with at least 100,000,000 installs.
for app in android_final:
    if app[1] == 'GAME' and app[5] in ('1,000,000,000+',
                                       '500,000,000+',
                                       '100,000,000+'):
        print(app[0], ':', app[5])

Sonic Dash : 100,000,000+
PAC-MAN : 100,000,000+
Roll the Ball® - slide puzzle : 100,000,000+
Piano Tiles 2™ : 100,000,000+
Pokémon GO : 100,000,000+
Extreme Car Driving Simulator : 100,000,000+
Trivia Crack : 100,000,000+
Angry Birds 2 : 100,000,000+
Candy Crush Saga : 500,000,000+
8 Ball Pool : 100,000,000+
Subway Surfers : 1,000,000,000+
Candy Crush Soda Saga : 100,000,000+
Clash Royale : 100,000,000+
Clash of Clans : 100,000,000+
Plants vs. Zombies FREE : 100,000,000+
Pou : 500,000,000+
Flow Free : 100,000,000+
My Talking Angela : 100,000,000+
slither.io : 100,000,000+
Cooking Fever : 100,000,000+
Yes day : 100,000,000+
Score! Hero : 100,000,000+
Dream League Soccer 2018 : 100,000,000+
My Talking Tom : 500,000,000+
Sniper 3D Gun Shooter: Free Shooting Games - FPS : 100,000,000+
Zombie Tsunami : 100,000,000+
Helix Jump : 100,000,000+
Crossy Road : 100,000,000+
Temple Run 2 : 500,000,000+
Talking Tom Gold Run : 100,000,000+
Agar.io : 100,000,000+
Bus Rush: Subway Edition : 100,000,00

In [170]:
# Print the VIDEO_PLAYERS apps with at least 100,000,000 installs.
for app in android_final:
    if app[1] == 'VIDEO_PLAYERS' and app[5] in ('1,000,000,000+',
                                                '500,000,000+',
                                                '100,000,000+'):
        print(app[0], ':', app[5])

YouTube : 1,000,000,000+
Motorola Gallery : 100,000,000+
VLC for Android : 100,000,000+
Google Play Movies & TV : 1,000,000,000+
MX Player : 500,000,000+
Dubsmash : 100,000,000+
VivaVideo - Video Editor & Photo Movie : 100,000,000+
VideoShow-Video Editor, Video Maker, Beauty Camera : 100,000,000+
Motorola FM Radio : 100,000,000+

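The cells above repeat the same filter with a different category each time. One way to avoid the repetition is a small helper, sketched below. The name `top_apps`, the `TOP_BUCKETS` constant and the sample rows are hypothetical, and the column positions (name at index 0, category at index 1, installs at index 5) are taken from the cells above.

```python
# Install buckets treated as "highly popular" in the cells above.
TOP_BUCKETS = ('1,000,000,000+', '500,000,000+', '100,000,000+')


def top_apps(dataset, category, buckets=TOP_BUCKETS):
    """Return (name, installs) pairs for apps in `category` whose bucket is in `buckets`."""
    return [(app[0], app[5])
            for app in dataset
            if app[1] == category and app[5] in buckets]


# Hypothetical rows, padded so index 1 is the category and index 5 the installs.
sample = [
    ['App A', 'SOCIAL', '', '', '', '1,000,000,000+'],
    ['App B', 'SOCIAL', '', '', '', '10,000+'],
    ['App C', 'TOOLS', '', '', '', '500,000,000+'],
]

for category in ('SOCIAL', 'TOOLS'):
    print(category)
    for name, installs in top_apps(sample, category):
        print(name, ':', installs)
    print()
```

With the real data, the same loop could be run as `top_apps(android_final, 'GAME')` and so on, one call per category.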

# Summary, Conclusions and Next Steps

#### Summary

In this project, we looked at sample datasets from the iOS and Google Play Stores to determine the characteristics of the apps that are more likely to attract users.

#### Conclusions
We saw that both the iOS and Google Play stores have *Social Networks*, *Messaging* and *Games* among their most popular app categories. However, we also noticed that the relative popularity of *Navigation* apps was lower in the iOS store. This could reflect the popularity of Google Maps among iOS users, as well as the app coming pre-installed on some Android devices.

The *Photography* category was also considerably less popular in the iOS Store than in Google Play, which could be related to Apple products having more built-in photo editing functionality than Android devices.

Finally, the *Video Player* category was significantly more popular in Google Play than the *Photo & Video* category in iOS, which could be related to a high number of unpopular Photo & Video apps in the iOS store.

Some limitations of this analysis include using reviews as a proxy for installs in the iOS store. This could give us misleading data, as certain types of "essential" apps (e.g. YouTube, Google Maps) could have a proportionally lower number of reviews than highly involved apps such as games, which rely more heavily on user input. Another comparability issue between the datasets is that some apps are classified under different categories in the iOS Store and the Google Play Store.

#### Next Steps
For more granular analyses in the future, we could pinpoint and exclude the popular apps that have an outsize influence on the average number of reviews for their respective category, which would help us define the popularity of an average app in that category with more precision.
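
As a rough illustration of that next step, the sketch below compares a category's average review count with and without the apps above a chosen cutoff. The function name `genre_averages`, the `(name, genre, reviews)` row layout, the cutoff value and the sample data are all assumptions made for the example, not part of the datasets used above.

```python
def mean(values):
    """Arithmetic mean, returning 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def genre_averages(apps, genre, cutoff):
    """Return (average over all apps, average over apps at or below `cutoff`)."""
    reviews = [count for _, app_genre, count in apps if app_genre == genre]
    trimmed = [count for count in reviews if count <= cutoff]
    return mean(reviews), mean(trimmed)


# Hypothetical (name, genre, review count) rows.
sample_apps = [
    ('Giant App', 'Navigation', 1000000),
    ('Small App 1', 'Navigation', 2000),
    ('Small App 2', 'Navigation', 4000),
    ('Other App', 'Games', 50000),
]

overall, without_outliers = genre_averages(sample_apps, 'Navigation', cutoff=100000)
print('All apps:', round(overall))
print('Excluding apps above the cutoff:', round(without_outliers))
```

The cutoff is a judgment call; comparing the mean with the median of the same category would be another way to see how much a few very large apps skew the picture.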