# My First Project in Python - Guided Project from Dataquest: App Recommendation

## Introduction
In this notebook, we take a project from raw data to analysis and insights. Developers want to understand what types of apps are likely to attract more users on Google Play and the App Store. Each store holds roughly 2 million apps, so instead of analyzing all of them we work with a sample of the data.

The Android dataset has about 10,000 apps, and the iOS dataset has about 7,000 apps.

In [1]:
#Dataquest created a function to explore our datasets
def explore_data(dataset, start, end, rows_and_columns=True):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

We will now use the "explore_data" function to explore the iOS and Android datasets.

In [2]:
from csv import reader

### The Google Play data set ###
opened_file = open('googleplaystore.csv', encoding='utf8')  # explicit encoding: app names contain non-ASCII characters
read_file = reader(opened_file)
android = list(read_file)
android_header = android[0]
android = android[1:]

### The App Store data set ###
opened_file = open('AppleStore.csv', encoding='utf8')  # explicit encoding: app names contain non-ASCII characters
read_file = reader(opened_file)
ios = list(read_file)
ios_header = ios[0]
ios = ios[1:]
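
The two loading blocks above follow the same steps, so they could be folded into one helper. The sketch below is only an illustration: the name `load_csv` is hypothetical (it is not part of the Dataquest material), and it assumes the files are UTF-8 encoded. A `with` block closes the file once reading is done.

```python
import csv

def load_csv(path, encoding='utf8'):
    """Return (header, rows) for a CSV file and close the file afterwards."""
    # newline='' is what the csv module documentation recommends for file objects
    with open(path, encoding=encoding, newline='') as f:
        rows = list(csv.reader(f))
    return rows[0], rows[1:]

# Hypothetical usage with the files from this project:
# android_header, android = load_csv('googleplaystore.csv')
# ios_header, ios = load_csv('AppleStore.csv')
```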

In [3]:
print(android_header)
explore_data(android, 0, 5)
print(ios_header)
explore_data(ios, 0, 5)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Every

## Data Cleaning
From the exploration, we will identify data that needs to be removed or edited. 

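For example, row 10472 has values that don't make sense for the columns they are in. The row has only 12 values instead of 13 because the Category value is missing, so every later value is shifted one column to the left. It's best to just remove it.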

In [4]:
print(android[10472])
del android[10472]

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
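
Rather than spotting a bad row by eye, we could compare the length of every row with the length of the header. The sketch below runs on toy data, and the helper name `find_malformed_rows` is an assumption made for illustration.

```python
def find_malformed_rows(dataset, header):
    """Return (index, row) pairs whose length differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(dataset) if len(row) != expected]

# Toy data: the second row is missing its 'Category' value.
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['Sketch', 'ART_AND_DESIGN', '4.5'],
    ['Photo Frame', '1.9'],
]
bad = find_malformed_rows(sample_rows, sample_header)
print(bad)
```

Run against the real data, a call such as `find_malformed_rows(android, android_header)` would be expected to flag any row with a missing value like the one above.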


Now we're going to look for duplicates in the dataset and remove all but one entry per app. First, we build a list of unique app names and a list of duplicate names so we can count the duplicates. Then we build a dictionary that maps each app name to its highest number of reviews. For each app we keep the row with the most reviews, since it is likely the most recent entry.

In [5]:
duplicate_apps = []
unique_apps = []

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicates apps:', duplicate_apps[:15])

Number of duplicate apps: 1181


Examples of duplicates apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']
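
Checking `name in unique_apps` scans a list on every iteration, which gets slow as the data grows. As an alternative sketch (not the approach used in this notebook), `collections.Counter` can count each name in one pass. The helper name `count_duplicates` and the toy rows are assumptions.

```python
from collections import Counter

def count_duplicates(dataset, name_index=0):
    """Return (number of duplicate rows, {name: occurrences} for repeated names)."""
    counts = Counter(row[name_index] for row in dataset)
    # Every occurrence beyond the first counts as a duplicate.
    n_duplicates = sum(c - 1 for c in counts.values())
    repeated = {name: c for name, c in counts.items() if c > 1}
    return n_duplicates, repeated

# Toy data: 'Box' appears three times, so there are two duplicates.
sample = [['Box'], ['Slack'], ['Box'], ['Box'], ['Zoom']]
n_duplicates, repeated = count_duplicates(sample)
print(n_duplicates, repeated)
```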


In [6]:
for app in android:
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


In [7]:
reviews_max = {}

for app in android:
    name = app[0]
    n_reviews = float(app[3])  # Index 3 is the number of reviews
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
        
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

From the previous code we found that there are 1,181 duplicate apps. The next code will tell us what the expected number of apps should be after we remove all the duplicate apps.

In [8]:
print('Expected length:', len(android)-1181)
print('Actual length:', len(reviews_max))

Expected length: 9659
Actual length: 9659


Both numbers match, which shows that our for loop worked correctly: it built a dictionary of unique apps that stores the highest number of reviews for each one.

In [9]:
android_clean = []
already_added = []

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if (reviews_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)
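
The two passes above (one to find the maximum, one to pick the rows) can also be done in a single pass by storing the best row itself in a dictionary. This is a sketch on toy data, and the helper name `keep_most_reviewed` is hypothetical. One difference to be aware of: the result is ordered by the first time each name appears, which may differ from the order of `android_clean`.

```python
def keep_most_reviewed(dataset, name_index=0, reviews_index=3):
    """Keep one row per app name: the row with the highest review count."""
    best = {}
    for row in dataset:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' keeps the first row seen when review counts tie.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

# Toy rows (name, category, rating, reviews); the review counts are made up.
sample = [
    ['Instagram', 'SOCIAL', '4.5', '100'],
    ['Box', 'BUSINESS', '4.2', '50'],
    ['Instagram', 'SOCIAL', '4.5', '300'],
    ['Instagram', 'SOCIAL', '4.5', '200'],
]
cleaned = keep_most_reviewed(sample)
print(cleaned)
```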

Now, let's check that our new list of lists has the expected number of rows.

In [10]:
explore_data(android_clean, 0, 3)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


The number of rows is 9,659, which is what we expected from the previous check.

## Removing non-English apps from the dataset

Since our target audience is English speakers, we can remove the apps that are not in English.

In [11]:
print(ios[813][1]) #Prints the name of the app in row 813
print(ios[6731][1])
print('\n')
print(android_clean[4412][0])
print(android_clean[7940][0])

爱奇艺PPS -《欢乐颂2》电视剧热播
【脱出ゲーム】絶対に最後までプレイしないで 〜謎解き＆ブロックパズル〜


中国語 AQリスニング
لعبة تقدر تربح DZ


Behind the scenes, each character has an assigned number (its code point). We might be able to use this to remove non-English apps. We can look up the number with the built-in function 'ord()'. The characters commonly used in English text belong to ASCII (American Standard Code for Information Interchange), which covers the numbers 0 to 127.

In [12]:
print(ord('a'))
print(ord('A'))
print(ord('5'))
print(ord('+'))
print(ord('爱'))

97
65
53
43
29233


In [13]:
def English_yes(string):
    for char in string:
        if ord(char) > 127:
            return False
    # Only reached when every character is in the ASCII range
    return True

print(English_yes('Instagram'))
print(English_yes('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(English_yes('Docs To Go™ Free Office Suite'))
print(English_yes('Instachat 😜'))

True
False
False
False


Symbols such as ™ and emojis fall outside the ASCII range, so this strict check also rejects English names like 'Docs To Go™ Free Office Suite' and 'Instachat 😜'. Note that the final 'return True' must sit after the loop: if it is placed in an 'else' branch inside the loop, the function returns after looking at the first character only. To avoid losing English apps with a symbol or emoji in the name, the next version rejects a name only when it contains more than three non-ASCII characters.

In [14]:
def English_yes(string):
    ascii_m3 = 0
    
    for char in string:
        if ord(char) > 127:
            ascii_m3 += 1
            
    if ascii_m3 > 3:
        return False
    else:
        return True

print(English_yes('Instagram'))
print(English_yes('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(English_yes('Docs To Go™ Free Office Suite'))
print(English_yes('Instachat 😜'))

True
False
True
True
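
The same rule can be written more compactly with the threshold as a parameter. This is a sketch, not the notebook's function: the name `is_probably_english` and the `max_non_ascii` parameter are assumptions. The heuristic is also imperfect in the other direction, since a non-English name written only with ASCII letters would still pass.

```python
def is_probably_english(name, max_non_ascii=3):
    """Return True when the name has at most max_non_ascii non-ASCII characters."""
    non_ascii = sum(1 for char in name if ord(char) > 127)
    return non_ascii <= max_non_ascii

print(is_probably_english('Instagram'))
print(is_probably_english('Docs To Go™ Free Office Suite'))
# Setting the threshold to 0 reproduces a strict ASCII-only check.
print(is_probably_english('Instachat 😜', max_non_ascii=0))
```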


Using the function above, we can now loop through both datasets to exclude non-English apps.

First, we create two empty lists. Then we loop through each dataset, extract the name of the app, and, if 'English_yes()' returns True for that name, append the app's row to the corresponding list.

In [15]:
android_english = []
ios_english = []

for app in android_clean:
    name = app[0]
    if English_yes(name):
        android_english.append(app)

for app in ios:
    name = app[1]
    if English_yes(name):
        ios_english.append(app)
        
explore_data(android_english, 0, 3)
explore_data(ios_english, 0, 3)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'G

After removing duplicates and non-English apps, 9,614 Android apps and 6,183 iOS apps remain.

## Isolating the free apps in our datasets

We're going to create new empty lists for both data sets. Then we're going to loop through the data and if the price is equal to 0, we'll add it to the new list of lists.

In [16]:
android_free = []
ios_free = []

for app in android_english:
    price = app[7]
    
    if price == '0':
        android_free.append(app)

for app in ios_english:
    price = app[4]
    
    if price == '0.0':
        ios_free.append(app)

print(len(android_free))
print(len(ios_free))

8864
3222


Our final dataset for Android has 8,864 apps and for iOS has 3,222 apps.
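
The comparisons above rely on the exact strings '0' and '0.0'. A more forgiving sketch is to convert the price to a number first. The helper name `is_free` is hypothetical, and the handling of a leading '$' is an assumption about how paid prices might be written, not something shown in the output above.

```python
def is_free(price_text):
    """Return True when the price string represents zero."""
    cleaned = price_text.strip().lstrip('$')  # assumed format for paid apps: '$4.99'
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Anything that is not a number is treated as not free.
        return False

print(is_free('0'), is_free('0.0'), is_free('$4.99'))
```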

## Most Popular Genres for the Top Free Apps

The developers want to release apps on both Android and iOS, so we want to find the most popular genres on both platforms.

Their plan is to test an app on Android first. If it gets a good response, they'll develop it further, and if it is profitable after six months, they'll build the same app for iOS.

The code below defines two functions. The first, 'freq_table()', builds a frequency table for any column. It creates an empty dictionary, loops through the dataset, counts the rows, and increments the count for each value, adding the value to the dictionary the first time it appears. After the loop, it builds a second dictionary that converts each count into a percentage of the total.

The second function, 'displayfreq()', prints that frequency table in descending order. It creates an empty list, loops through the frequency table, stores each entry as a (percentage, value) tuple, appends the tuple to the list, and then sorts the list in reverse order.

In [23]:
def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage 
    
    return table_percentages


def displayfreq(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
    return table
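
For comparison, the same percentage table can be sketched with `collections.Counter` from the standard library. The function name `freq_table_percent` is an assumption and the rows are toy data. `most_common()` already orders values from most to least frequent, with ties kept in first-seen order, which differs slightly from the tuple sort used above.

```python
from collections import Counter

def freq_table_percent(dataset, index):
    """Map each value in a column to its share of rows, in percent."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    # Dictionaries keep insertion order, so the result stays sorted by frequency.
    return {value: count / total * 100 for value, count in counts.most_common()}

# Toy rows: (app name, genre)
sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
table = freq_table_percent(sample, 1)
for genre, percentage in table.items():
    print(genre, ':', percentage)
```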

In [18]:
displayfreq(ios_free, -5)  # Index -5 is the genre in the iOS data

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


{'Book': 0.4345127250155183,
 'Business': 0.5276225946617008,
 'Catalogs': 0.12414649286157665,
 'Education': 3.662321539416512,
 'Entertainment': 7.883302296710118,
 'Finance': 1.1173184357541899,
 'Food & Drink': 0.8069522036002483,
 'Games': 58.16263190564867,
 'Health & Fitness': 2.0173805090006205,
 'Lifestyle': 1.5828677839851024,
 'Medical': 0.186219739292365,
 'Music': 2.0484171322160147,
 'Navigation': 0.186219739292365,
 'News': 1.3345747982619491,
 'Photo & Video': 4.9658597144630665,
 'Productivity': 1.7380509000620732,
 'Reference': 0.5586592178770949,
 'Shopping': 2.60707635009311,
 'Social Networking': 3.2898820608317814,
 'Sports': 2.1415270018621975,
 'Travel': 1.2414649286157666,
 'Utilities': 2.5139664804469275,
 'Weather': 0.8690254500310366}

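Games are by far the most common genre among the free English iOS apps, at roughly 58%. The next most common are Entertainment, Photo & Video, Education, and Social Networking. Note that this table counts how many apps exist in each genre, not how many users they have. Based on this, the developers could look into creating a game or another casual app.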

In [19]:
displayfreq(android_free, 1)  # Index 1 is the Category in the Android data

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

{'ART_AND_DESIGN': 0.6430505415162455,
 'AUTO_AND_VEHICLES': 0.9250902527075812,
 'BEAUTY': 0.5979241877256317,
 'BOOKS_AND_REFERENCE': 2.1435018050541514,
 'BUSINESS': 4.591606498194946,
 'COMICS': 0.6204873646209386,
 'COMMUNICATION': 3.2378158844765346,
 'DATING': 1.861462093862816,
 'EDUCATION': 1.1620036101083033,
 'ENTERTAINMENT': 0.9589350180505415,
 'EVENTS': 0.7107400722021661,
 'FAMILY': 18.907942238267147,
 'FINANCE': 3.7003610108303246,
 'FOOD_AND_DRINK': 1.2409747292418771,
 'GAME': 9.724729241877256,
 'HEALTH_AND_FITNESS': 3.0798736462093865,
 'HOUSE_AND_HOME': 0.8235559566787004,
 'LIBRARIES_AND_DEMO': 0.9363718411552346,
 'LIFESTYLE': 3.9034296028880866,
 'MAPS_AND_NAVIGATION': 1.3989169675090252,
 'MEDICAL': 3.531137184115524,
 'NEWS_AND_MAGAZINES': 2.7978339350180503,
 'PARENTING': 0.6543321299638989,
 'PERSONALIZATION': 3.3167870036101084,
 'PHOTOGRAPHY': 2.944494584837545,
 'PRODUCTIVITY': 3.892148014440433,
 'SHOPPING': 2.2450361010830324,
 'SOCIAL': 2.662454873646

The most common categories among the free Android apps look quite different from iOS. The top three categories are FAMILY, GAME, and TOOLS. Since games rank high on both platforms, the developers' best bet may be to develop a game for both markets.

In [20]:
displayfreq(android_free, -4)  # Index -4 is the Genres column in the Android data

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

{'Action': 3.1024368231046933,
 'Action;Action & Adventure': 0.10153429602888085,
 'Adventure': 0.6768953068592057,
 'Adventure;Action & Adventure': 0.033844765342960284,
 'Adventure;Education': 0.01128158844765343,
 'Arcade': 1.8501805054151623,
 'Arcade;Action & Adventure': 0.12409747292418773,
 'Arcade;Pretend Play': 0.01128158844765343,
 'Art & Design': 0.5979241877256317,
 'Art & Design;Action & Adventure': 0.01128158844765343,
 'Art & Design;Creativity': 0.06768953068592057,
 'Art & Design;Pretend Play': 0.01128158844765343,
 'Auto & Vehicles': 0.9250902527075812,
 'Beauty': 0.5979241877256317,
 'Board': 0.3835740072202166,
 'Board;Action & Adventure': 0.02256317689530686,
 'Board;Brain Games': 0.078971119133574,
 'Books & Reference': 2.1435018050541514,
 'Books & Reference;Education': 0.01128158844765343,
 'Business': 4.591606498194946,
 'Card': 0.45126353790613716,
 'Card;Action & Adventure': 0.01128158844765343,
 'Casino': 0.42870036101083037,
 'Casual': 1.7599277978339352,
 '

The top free Android app genres are Tools, Entertainment, Education, and Business. The Android data has both a Category and a Genres column. Genres is much more granular, with many combined labels such as 'Arcade;Action & Adventure'. The install analysis below works with the Category column (index 1), which gives a broader view.

## Extrapolating Average Number of Users That Have Installed an App per Genre

We might want to see how many users have installed the apps in each genre. For iOS, we'll approximate this with the total number of user ratings: for each genre, we sum the rating counts of its apps and divide by the number of apps in that genre.

In [21]:
genres_ios = displayfreq(ios_free, -5)

for genre in genres_ios:
    total = 0
    len_genre = 0
    
    for app in ios_free:
        genre_app = app[-5]
        if genre_app == genre:
            tot_ratings = float(app[5])
            total += tot_ratings
            len_genre += 1
    avg_tot_rating = total / len_genre
    print(genre, 'with average total rating of', avg_tot_rating) 

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665
Catalogs with average total rating of 4004.0
Education with average total rating of 7003.983050847458
Shopping with average total rating of 26919.690476190477
Finance with average total rating of 31467.944444444445
Productivity with average total rating of 21028.410714285714
Games with average t
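
The nested loop above rescans the whole dataset once per genre. As a sketch of an alternative, the averages can be collected in a single pass with two dictionaries. The helper name `average_by_group` is hypothetical, and the rows and numbers below are toy data, not values from the dataset.

```python
from collections import defaultdict

def average_by_group(dataset, group_index, value_index):
    """Average a numeric column for each distinct value of a grouping column."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}

# Toy rows: (app name, genre, number of ratings)
sample = [
    ['App A', 'Navigation', '300'],
    ['App B', 'Navigation', '100'],
    ['App C', 'Reference', '50'],
]
averages = average_by_group(sample, 1, 2)
print(averages)
```

With the real data, a call such as `average_by_group(ios_free, -5, 5)` should give the same averages as the loop above.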

Reference, Navigation, and Social Networking have the highest average number of user ratings. These averages are probably pulled up by a few very large apps such as Facebook and Google Maps, so we look at each of these genres more closely.

In [30]:
for app in ios_free:
    if app[-5] == 'Reference':
        print(app[1], ':', app[5]) 

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


In Reference, the Bible and Dictionary.com apps are by far the most popular. The remaining apps have far fewer ratings, so we may not want to develop a Reference app.

In [28]:
for app in ios_free:
    if app[-5] == 'Navigation':
        print(app[1], ':', app[5]) 

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


From this, we see that Waze and Google Maps are the most popular navigation apps. Their large rating counts explain why Navigation showed up as one of the top genres.

In [29]:
for app in ios_free:
    if app[-5] == 'Social Networking':
        print(app[1], ':', app[5])

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 2

The top social networking apps are Facebook, Pinterest, Skype, and Messenger. These apps aren't just social networks; they also provide other services or unique features that people find useful.

In [37]:
categories_android = displayfreq(android_free, 1)

for category in categories_android:
    total = 0
    len_category = 0
    for app in android_free:
        category_app = app[1]
        if category_app == category:            
            n_installs = app[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    print(category, 'with average number of installs of', avg_n_installs)

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

Similar to what we saw with the iOS apps, SOCIAL, COMMUNICATION, and TRAVEL_AND_LOCAL are among the most popular categories by average number of installs.
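
The install counts are stored as strings such as '1,000,000+', so they have to be converted before any numeric work. The sketch below wraps the two 'replace()' calls in a helper and uses it to filter by a numeric threshold instead of matching exact strings. The name `parse_installs` and the 'Small App' row are assumptions. Keep in mind that a value like '1,000+' is a lower bound, so the parsed numbers are approximate.

```python
def parse_installs(text):
    """Convert an install string such as '1,000,000+' to an integer."""
    return int(text.replace(',', '').replace('+', ''))

# Toy rows: (app name, installs); 'Small App' is made up.
sample = [
    ('WhatsApp Messenger', '1,000,000,000+'),
    ('Kik', '100,000,000+'),
    ('Small App', '1,000+'),
]
# Keep apps with at least 100 million installs.
big_apps = [name for name, installs in sample
            if parse_installs(installs) >= 100_000_000]
print(big_apps)
```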

In [39]:
for app in android_free:
    if app[1] == 'COMMUNICATION' and app[5] in ('1,000,000,000+',
                                                '500,000,000+',
                                                '100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

As with the iOS data, the category is dominated by a handful of very well-known apps, such as WhatsApp, Messenger, and Skype.

In [41]:
for app in android_free:
    if app[1] == 'SOCIAL' and app[5] in ('1,000,000,000+',
                                         '500,000,000+',
                                         '100,000,000+'):
        print(app[0], ':', app[5])

Facebook : 1,000,000,000+
Facebook Lite : 500,000,000+
Tumblr : 100,000,000+
Pinterest : 100,000,000+
Google+ : 1,000,000,000+
Badoo - Free Chat & Dating App : 100,000,000+
Tango - Live Video Broadcast : 100,000,000+
Instagram : 1,000,000,000+
Snapchat : 500,000,000+
LinkedIn : 100,000,000+
Tik Tok - including musical.ly : 100,000,000+
BIGO LIVE - Live Stream : 100,000,000+
VK : 100,000,000+


## Conclusion

In this guided project, we started out with datasets of app information from the App Store and Google Play. We cleaned the data so that we were left with a dataset we could use to answer the question: based on popularity, what type of free app would we recommend developing?

Given that leisure and utility apps are some of the most popular apps, I might recommend identifying gaps people may have in terms of utility and creating a social networking app around them. It could also be helpful to add elements of fun and/or leisure to the app.