# Comparison of free apps

The goal of this project is to find out which kinds of free apps attract the most users on Google Play and the App Store, since our revenue comes from in-app ads.

## Opening and Exploring the Data

We start by opening the two data sets and separating each header row from the data:

In [1]:
from csv import reader

# App Store data set
opened_file = open('AppleStore.csv', encoding='utf8')
read_file = reader(opened_file)
ios = list(read_file)
opened_file.close()
ios_header = ios[0]
ios = ios[1:]

# Google Play data set
opened_file = open('googleplaystore.csv', encoding='utf8')
read_file = reader(opened_file)
android = list(read_file)
opened_file.close()
android_header = android[0]
android = android[1:]
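
Since both files are read the same way, the loading steps could also be wrapped in a small helper. The sketch below is only one possible approach: the name `load_dataset` is hypothetical, and it assumes both files are UTF-8 encoded and contain at least a header row.

```python
import csv

def load_dataset(path, encoding="utf8"):
    """Return (header, rows) for a CSV file; every value is kept as a string."""
    # The with-block closes the file even if reading fails.
    with open(path, encoding=encoding, newline="") as f:
        rows = list(csv.reader(f))
    return rows[0], rows[1:]

# Possible usage with the files from this project:
# ios_header, ios = load_dataset('AppleStore.csv')
# android_header, android = load_dataset('googleplaystore.csv')
```

An empty file would raise an IndexError here, which may be acceptable for a quick analysis like this one.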

Next, we define a helper function that prints a slice of a data set and, optionally, its number of rows and columns:

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

First, we should explore the headers and the first rows of each database, as well as check the total number of rows and columns:

In [3]:
print(android_header)
print('\n')
explore_data(android,0,5,True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'E

In [4]:
print(ios_header)
print('\n')
explore_data(ios,0,5,True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows: 7197
Number of columns: 16


## Deleting Wrong Data

In the Google Play data set, row 10472 is reported as having a problem, so we will check what kind of issue it is:

In [5]:
print(android_header)
print('\n')
explore_data(android,10471,10474)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Xposed Wi-Fi-Pwd', 'PERSONALIZATION', '3.5', '1042', '404k', '100,000+', 'Free', '0', 'Everyone', 'Personalization', 'August 5, 2014', '3.0.0', '4.0.3 and up']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']




We have confirmed that row 10472 of the Google Play data set is missing its Category value, so every later value is shifted one column to the left. Therefore, we will delete this row to avoid problems later on:

In [6]:
del android[10472]
explore_data(android,10471,10474)

['Xposed Wi-Fi-Pwd', 'PERSONALIZATION', '3.5', '1042', '404k', '100,000+', 'Free', '0', 'Everyone', 'Personalization', 'August 5, 2014', '3.0.0', '4.0.3 and up']


['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']


['Sat-Fi Voice', 'COMMUNICATION', '3.4', '37', '14M', '1,000+', 'Free', '0', 'Everyone', 'Communication', 'November 21, 2014', '2.2.1.5', '2.2 and up']
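
Rows like the one we deleted could also be found without knowing their position in advance, by comparing the length of each row with the length of the header. The following is a minimal sketch under that assumption; `find_malformed_rows` is a hypothetical helper, and the sample rows are shortened stand-ins for the real data.

```python
def find_malformed_rows(header, rows):
    """Return (index, row) pairs whose column count differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(rows) if len(row) != expected]

header = ["App", "Category", "Rating"]
rows = [
    ["App A", "TOOLS", "4.2"],
    ["App B", "1.9"],             # Category missing, so the row is one column short
    ["App C", "SOCIAL", "3.4"],
]
print(find_malformed_rows(header, rows))  # [(1, ['App B', '1.9'])]
```

This check only catches rows with a missing or extra value. A row with the right number of columns but wrong contents would still pass.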




## Removing Duplicate Entries 

With this problem solved, we will now check for duplicated apps in our data sets:

In [7]:
duplicated_apps = []
unique_apps = []

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicated_apps.append(name)
    else:
        unique_apps.append(name)

print('Number of duplicate android apps:', len(duplicated_apps))
print('\n')
print('Examples of duplicate apps:', duplicated_apps[:10])

print('\n')

duplicated_apps = []
unique_apps = []

for app in ios:
    app_id = app[0]  # in the App Store data set, column 0 is the unique id
    if app_id in unique_apps:
        duplicated_apps.append(app_id)
    else:
        unique_apps.append(app_id)

print('Number of duplicate ios apps:', len(duplicated_apps))
print('\n')
print('Examples of duplicate apps:', duplicated_apps[:10])

Number of duplicate android apps: 1181


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']


Number of duplicate ios apps: 0


Examples of duplicate apps: []


We confirmed that there are 1181 duplicated apps in the Android data set. We will check now why this happened by checking the actual rows:

In [8]:
for app in android:
    name = app[0]
    if name == 'Quick PDF Scanner + OCR FREE':
        print(app)

['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']
['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']
['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80804', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']


Looking at the rows, we can see that they differ in the review count, which indicates that the data for each app was scraped at different times.

In order to obtain an accurate result, we will keep only the most recent entry for each duplicated app, which will be the row with the highest number of reviews.

First, we will create a dictionary called reviews_max which will contain the unique names of the apps as key and their maximum number of reviews as value:

In [9]:
reviews_max = {}

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
        
print('Final number of unique apps is:', len(reviews_max))

Final number of unique apps is: 9659


Then, we will loop through the original data set that contains the duplicated apps. Every time the number of reviews in a row matches the value stored for that app name in the reviews_max dictionary, we will add the row to the android_clean list, unless the app has already been added:

In [10]:
android_clean = []
already_added = []

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(app)
        already_added.append(name)

print('Final number of rows in the cleaned Android data set is:', len(android_clean))

Final number of rows in the cleaned Android data set is: 9659
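
The two steps above can also be combined into a single pass that stores the best row seen so far for each app name. This is only a sketch of an alternative: `keep_most_reviewed` is a hypothetical helper, and it assumes the name is in column 0 and the review count in column 3, as in the Google Play data set.

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}  # name -> row; dicts preserve insertion order in Python 3.7+
    for row in rows:
        name = row[name_index]
        reviews = float(row[reviews_index])
        # Strict ">" keeps the earlier row when two rows tie on reviews.
        if name not in best or reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

sample = [
    ["A", "x", "4", "10"],
    ["B", "x", "4", "5"],
    ["A", "x", "4", "12"],
    ["A", "x", "4", "12"],
]
print(keep_most_reviewed(sample))
# [['A', 'x', '4', '12'], ['B', 'x', '4', '5']]
```

Note that the result is ordered by the first appearance of each name, so the row order may differ from android_clean even though the same rows are kept. The dictionary lookup also avoids the repeated list searches of `name not in already_added`.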


## Removing Non-English Apps

Since we are only interested in apps directed towards an English-speaking audience, we will also remove the apps whose titles contain characters not commonly used in English text, that is, characters outside the ASCII range (0 to 127).

In order to identify apps with non-English titles, we will create a function verify_english that iterates through each character of the app title and checks its code point using the built-in function ord(). If any character has a code point above 127, the function returns False; otherwise it returns True.

In [11]:
def verify_english(title):
    for character in title:
        if ord(character) > 127:
            return False
        return True

In [12]:
print(verify_english('Instagram'))
print(verify_english('Áà±Â•áËâ∫PPS -„ÄäÊ¨¢‰πêÈ¢Ç2„ÄãÁîµËßÜÂâßÁÉ≠Êí≠'))
print(verify_english('Docs To Go‚Ñ¢ Free Office Suite'))
print(verify_english('Instachat üòú'))
print(verify_english('üòú'))

True
False
True
True
False


If we use the function verify_english, we can select the apps in which the first character in the title is inside the 1-127 ASCII range. However, we also want to include the apps in which emojis and other special symbols are present, which order number is over 127.

In [13]:
print(ord('😜'))
print(ord('™'))

128540
8482


Therefore, we will change verify_english so that it returns False only when more than three characters in the title have a code point above 127:

In [14]:
def verify_english(title):
    nonenglish_count = 0
    for character in title:
        if ord(character) > 127:
            nonenglish_count += 1
    if nonenglish_count > 3:
        return False
    return True

In [15]:
print(verify_english('Docs To Go‚Ñ¢ Free Office Suite'))
print(verify_english('Instachat üòú'))
print(verify_english('Áà±Â•áËâ∫PPS -„ÄäÊ¨¢‰πêÈ¢Ç2„ÄãÁîµËßÜÂâßÁÉ≠Êí≠'))

True
True
False


Now, we will use the new function to filter out non-English apps from both data sets by looping through them and adding each row whose title meets the condition to a separate list:

In [16]:
android_english = []
ios_english = []

for app in android_clean:
    title = app[0]
    if verify_english(title):
        android_english.append(app)
        
for app in ios:
    title = app[1]
    if verify_english(title):
        ios_english.append(app)
        
explore_data(android_english, 0, 3, True)
print('\n')
explore_data(ios_english, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+'

## Isolating the free apps

As mentioned before, we are only interested in the free apps, since our main source of revenue consists of in-app ads. Therefore, we will loop through the android_english and ios_english lists to select only the free apps in each data set:

In [17]:
print(android_header)
print('\n')
print(ios_header)

android_free = []
ios_free = []

for app in android_english:
    if app[7] == '0':
        android_free.append(app)
        
for app in ios_english:
    if app[4] == '0.0':
        ios_free.append(app)

print('\n')
print(len(android_free))
print(len(ios_free))

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


8864
3222


We're left with 8864 Android apps and 3222 iOS apps for our analysis.
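
The filter above compares price strings exactly ('0' for Google Play and '0.0' for the App Store). A more tolerant check could convert the price to a number first. The helper below is a hypothetical sketch, and the handling of a leading `$` is an assumption about how paid apps might be formatted rather than something shown in the output above.

```python
def is_free(price):
    """Return True if a price string such as '0', '0.0' or '$0.00' equals zero."""
    # Assumption: paid prices may carry a leading '$', which is stripped first.
    return float(price.strip().lstrip('$')) == 0.0

print(is_free('0'))      # True
print(is_free('0.0'))    # True
print(is_free('$4.99'))  # False
```

The same function could then be applied to column 7 of the Google Play rows and column 4 of the App Store rows.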

## Most Common Apps by Genre

The aim of this analysis is to determine what kinds of apps are likely to attract more users, since our revenue model will be based on the number of people that use our apps.

Our validation strategy for an app idea consists of the following three steps:

1. Build a basic version of an app and publish it in Google Play.
2. If the app has a good response from users, we will develop it further.
3. If the app is profitable after six months, we will develop and publish an iOS version.

Because our final goal is to publish in both the Apple and Android markets, we need to find the kind of app that is successful in both.

Let's begin the analysis by finding the most common genres in each market. To do so, we will create frequency tables for the prime_genre column of the App Store data set, and for the Genres and Category columns of the Google Play data set.

We'll build two functions we can use to analyze the frequency tables:

* One function to generate frequency tables that show percentages.
* Another function that we can use to display the percentages in a descending order.

In [18]:
def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
            
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total * 100)
        table_percentages[key] = percentage
        
    return table_percentages

In [19]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
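
For comparison, the standard library's `collections.Counter` can produce the same kind of table with less code. This is an alternative sketch, not the approach used in the rest of the analysis; `freq_table_counter` is a hypothetical name and the sample rows are made up for illustration.

```python
from collections import Counter

def freq_table_counter(dataset, index):
    """Return {value: percentage} for one column, most common value first."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    # most_common() yields (value, count) pairs sorted by descending count.
    return {value: count / total * 100 for value, count in counts.most_common()}

sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
for genre, pct in freq_table_counter(sample, 1).items():
    print(genre, ':', pct)
# Games : 75.0
# Music : 25.0
```

Because the dictionary is already ordered from most to least common, no separate sorting step is needed before printing.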

We start by examining the frequency table for the prime_genre column of the App Store data set:

In [20]:
display_table(ios_free, -5)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


More than half of the free English apps on the App Store are games (58.16%), while Entertainment accounts for close to 8% and Photo & Video for about 5%. These three genres are followed by Education with 3.66% and Social Networking with 3.29%.

Judging by these results, most of the free English apps on the App Store are designed for fun (games, entertainment, photography, social networking, etc.), while apps with more practical purposes (education, shopping, utilities, productivity, lifestyle, etc.) are rarer.

Let's continue by checking the Category and Genres columns of the Google Play data set:

In [21]:
display_table(android_free, 1) #Category

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

We find quite a different situation on Google Play: judging by the Category column, there are not that many apps designed for fun, and a good number of apps seem to be designed for practical purposes. However, if we look at Google Play itself as it was in 2018, we can see that the Family category (which accounts for almost 19% of the apps) consists mostly of games for kids, so games still take first place, as they did on the App Store.

In [22]:
display_table(android_free, 9) #Genre

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

However, the Genres column confirms that practical apps are better represented on Google Play than on the App Store, with the Tools genre taking first place with 8.45%.

We can also notice that the Genres column is much more granular than the Category column. To keep the bigger picture in view, we will focus only on the Category column for now.

If we compare the results of the frequency tables for Google Play and App Store, we find that the App Store is dominated by apps designed for fun, while Google Play shows a more balanced landscape of both practical and for-fun apps.

## Most Popular Apps by Genre on the App Store

In order to find what genres are the most popular we will calculate the average number of installs for each app genre.

For the Google Play data set, we can find this information in the Installs column. However, for the App Store data set we cannot find this information. Instead, we will take the total number of user ratings as a reference, which we can find in the rating_count_tot column.

Below, we calculate the average number of user ratings per app genre on the App Store:

In [23]:
prime_genre_ios = freq_table(ios_free, -5)

In [24]:
for genre in prime_genre_ios:
    total = 0
    len_genre = 0
    for app in ios_free:
        genre_app = app[-5]
        if genre_app == genre:
            number_ratings = float(app[5])
            total += number_ratings
            len_genre += 1
    average_ratings = total / len_genre
    print(genre, average_ratings)

Social Networking 71548.34905660378
Photo & Video 28441.54375
Games 22788.6696905016
Music 57326.530303030304
Reference 74942.11111111111
Health & Fitness 23298.015384615384
Weather 52279.892857142855
Utilities 18684.456790123455
Travel 28243.8
Shopping 26919.690476190477
News 21248.023255813954
Navigation 86090.33333333333
Lifestyle 16485.764705882353
Entertainment 14029.830708661417
Food & Drink 33333.92307692308
Sports 23008.898550724636
Book 39758.5
Finance 31467.944444444445
Education 7003.983050847458
Productivity 21028.410714285714
Business 7491.117647058823
Catalogs 4004.0
Medical 612.0


According to this data, Navigation, Reference, Social Networking and Music are the most popular apps, in this order.
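
The nested loop above scans the whole data set once per genre. The same averages can be computed in a single pass by accumulating a total and a count for each genre. The sketch below uses a hypothetical helper, `average_by_group`, and a few made-up rows in place of ios_free.

```python
from collections import defaultdict

def average_by_group(rows, group_index, value_index):
    """Return {group: mean of the numeric column}, using one pass over rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[group_index]] += float(row[value_index])
        counts[row[group_index]] += 1
    return {group: totals[group] / counts[group] for group in totals}

sample = [
    ['1', 'App A', 'Games', '100'],
    ['2', 'App B', 'Games', '300'],
    ['3', 'App C', 'Music', '50'],
]
averages = average_by_group(sample, 2, 3)
# Print the groups from highest to lowest average.
for genre, avg in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(genre, avg)
# Games 200.0
# Music 50.0
```

For the App Store data, the call would be `average_by_group(ios_free, -5, 5)`. Sorting the result also makes the ranking easier to read than the unsorted output.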

Since it looks strange that genres like Navigation and Reference come up as the most popular, we will check the most popular apps of each of those four genres to confirm the results:

In [25]:
print('Navigation')
for app in ios_free:
    if app[-5] == 'Navigation':
        print(app[1], ':', app[5])
        
print('\n')

print('Reference')
for app in ios_free:
    if app[-5] == 'Reference':
        print(app[1], ':', app[5])
        
print('\n')

print('Social Networking')
for app in ios_free:
    if app[-5] == 'Social Networking':
        print(app[1], ':', app[5])
        
print('\n')

print('Music')
for app in ios_free:
    if app[-5] == 'Music':
        print(app[1], ':', app[5])

Navigation
Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


Reference
Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Ed

We can observe that the average number of ratings can be heavily influenced by two or three apps in genres where the total number of apps is low, like Reference (Bible and Dictionary.com) and Navigation (Waze and Google Maps).

On the other hand, Social Networking and Music show much more balanced numbers, with 26 apps with more than 30,000 ratings in the Social Networking genre and 12 apps in the Music genre.

Looking at the most popular Social Networking apps, after giants like Facebook or WhatsApp, some more niche apps have found a place in the market by building a social network around a specific purpose, like We Heart It (fashion, wallpapers...) or various dating apps.
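
One way to see how much a few large apps distort an average is to compare the mean with the median. The example below uses the six Navigation rating counts printed earlier; the use of the median is our suggestion for a sanity check, not part of the original analysis.

```python
from statistics import mean, median

# rating_count_tot values for the six free Navigation apps listed earlier
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

print('mean:', mean(navigation_ratings))
print('median:', median(navigation_ratings))
```

The mean comes out at roughly 86,090, which matches the average computed for Navigation, while the median is 8,196.5. The large gap between the two confirms that Waze and Google Maps account for most of the genre's apparent popularity.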

## Most Popular Apps by Genre on Google Play

Now we will search for the most popular genres in the Google Play database.

In this case we can use the number of installs. However, before we can use the Installs values we first need to remove the commas and the + symbols included in the data. Although most values are open-ended (100+, 1,000+, 5,000+, etc.), we can simply take the number as it is, since we do not need a high degree of precision to compare the different categories:

In [26]:
prime_category_android = freq_table(android_free, 1)

In [27]:
for category in prime_category_android:
    total = 0
    len_category = 0
    for app in android_free:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace('+', '')
            n_installs = n_installs.replace(',', '')
            n_installs = float(n_installs)
            total += n_installs
            len_category += 1
    average_installs = total / len_category
    print(category, ':', average_installs)

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_

We can observe that communication apps have the highest average number of installs: 38,456,119. If we check this category, we can find that social networks are also included in it:

In [28]:
for app in android_free:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'
                                      or app[5] == '50,000,000+'
                                      or app[5] == '10,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
Messenger for SMS : 10,000,000+
imo beta free calls and text : 100,000,000+
Contacts : 50,000,000+
Browser 4G : 10,000,000+
MegaFon Dashboard : 10,000,000+
ZenUI Dialer & Contacts : 10,000,000+
Cricket Visual Voicemail : 10,000,000+
Xperia Link™ : 10,000,000+
TouchPal Keyboard - Fun Emoji & Android Keyboard : 10,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
AT&T Visual Voicemail : 10,000,000+
GMX Mail : 10,000,000+
Omlet Chat : 10,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
free video calls and chat : 50,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
chomp SMS : 10,000,000+
Glide - Video Chat Messenger : 10,000,000+
Text SMS : 10,000,000+
Talkray - Free Calls & Texts : 10,000,000+
LINE: Free Calls & Messages : 500,000,000+
GroupMe : 10,000,000+
Goo

We can verify that even lesser-known apps in the Communication category are able to get more than 10,000,000 installs, which matches the data we obtained for the App Store.
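
The chained string comparisons in the filter above could be replaced by a numeric threshold once the Installs values are converted to integers. The sketch below assumes the intent is "at least 10,000,000 installs". `parse_installs` and `apps_with_min_installs` are hypothetical helpers, and the sample rows are shortened so that only columns 0, 1 and 5 are filled in.

```python
def parse_installs(text):
    """Convert an Installs string such as '10,000,000+' to an integer lower bound."""
    return int(text.replace(',', '').replace('+', ''))

def apps_with_min_installs(rows, category, min_installs,
                           category_index=1, installs_index=5):
    """Return (name, installs) pairs for apps in a category at or above a threshold."""
    result = []
    for row in rows:
        installs = parse_installs(row[installs_index])
        if row[category_index] == category and installs >= min_installs:
            result.append((row[0], installs))
    return result

# Shortened rows: only the name, category and installs columns matter here.
sample = [
    ['WhatsApp Messenger', 'COMMUNICATION', '', '', '', '1,000,000,000+'],
    ['Sat-Fi Voice', 'COMMUNICATION', '', '', '', '1,000+'],
    ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '', '', '', '50,000,000+'],
]
print(apps_with_min_installs(sample, 'COMMUNICATION', 10_000_000))
# [('WhatsApp Messenger', 1000000000)]
```

A numeric threshold is easier to adjust than a list of install buckets, and the same helper works for any category.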

## Conclusion

In this project, we analyzed the data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets, considering we were looking for a free app with in-app ads revenue model.

We can conclude that a social networking app focused on a specific niche of users, such as shopping, make-up or dating, could turn into a profitable app for both the Google Play and the App Store markets.