# In-App Ad Analysis
This analysis looks at which types of free apps attract the most users.
The goal of this project is to help developers whose revenue comes from in-app advertisements decide which kinds of apps to build, since more users means more ad views.
The analysis covers apps listed in the Google Play store and the Apple App Store, each taken from its own data set.

The Google data set can be found here: [data set](https://www.kaggle.com/lava18/google-play-store-apps/home)

The Apple data set can be found here: [data set](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home#AppleStore.csv)

Here we open the data sets and explore the first few rows to see what they contain:

In [1]:
from csv import reader

### Opening Apple Dataset ###
# The with-block closes the file once it has been read.
with open('AppleStore.csv', encoding='utf8', newline='') as opened_file:
    data = list(reader(opened_file))
apple_header = data[0]
apple_apps = data[1:]

### Opening Android Dataset ###
with open('googleplaystore.csv', encoding='utf8', newline='') as opened_file:
    data = list(reader(opened_file))
android_header = data[0]
android_apps = data[1:]

Now that the datasets are open, we can create a function that explores the first few rows to see the header and to get some initial observations:

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line between rows
        
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

print(apple_header)
explore_data(apple_apps, 0, 3, True)  # prints directly and returns None, so no outer print()
print('\n')

['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1']


Number of rows: 7197
Number of columns: 17




From what we can see in the Apple dataset, there are 7197 rows and 17 columns. For the purposes of our analysis we will focus on track_name, price, rating_count_tot, rating_count_ver, and prime_genre.

In [3]:
print(android_header)
explore_data(android_apps, 0, 3, True)  # prints directly and returns None, so no outer print()

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


From what we can see in the Android dataset, there are 10841 rows and 13 columns. For the purposes of our analysis we will focus on App, Category, Installs, Reviews, Type, Price, Content Rating, and Genres.
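
The cells that follow refer to columns by hard-coded positions such as 1, 5, or 9. As a minimal sketch, the positions can be looked up from the header row instead. The helper name column_indexes and the android_header_example list are illustrative and not part of the notebook; the list is copied from the header output above.

```python
# Hypothetical copy of the Android header row, taken from the output above.
android_header_example = [
    'App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type',
    'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver',
    'Android Ver',
]


def column_indexes(header, names):
    """Map each requested column name to its position in the header row."""
    return {name: header.index(name) for name in names}


android_cols = column_indexes(
    android_header_example,
    ['App', 'Category', 'Reviews', 'Installs', 'Price', 'Genres'],
)
print(android_cols)
# {'App': 0, 'Category': 1, 'Reviews': 3, 'Installs': 5, 'Price': 7, 'Genres': 9}
```

Looking up a name that is not in the header raises a ValueError, which surfaces a misspelled column name early.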

## Cleaning the data

For the Android data set, there is a [discussion forum](https://www.kaggle.com/lava18/google-play-store-apps/discussion) where other people who are exploring the data set can ask questions or address any errors with the data. There is a [post](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015) in the discussion forum that points out that there is an incorrect line that is causing issues with column alignment. The row has been identified as row 10472.

We will take a further look at this line to see if it does need to be removed from the data set.

In [4]:
print(android_apps[10472])
print('Number of columns for 10472:', len(android_apps[10472]))

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
Number of columns for 10472: 12


As we can see here, this row has only 12 columns, one fewer than the 13 columns in the rest of the data.
We can also see that this row is missing its Category value, so every later value is shifted one column to the left.
Since this row is indeed invalid, we will go ahead and delete it from this dataset.

In [5]:
print(len(android_apps))
### Do not run this line of code more than once. ###
del android_apps[10472]

### Validating deletion of data has been done. ###

print(len(android_apps))

10841
10840
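
In this case the forum post told us which row to inspect. A more general check, sketched here on made-up sample rows, compares every row's length with the header's length. The function name find_mismatched_rows and the sample data are assumptions for illustration only.

```python
def find_mismatched_rows(header, rows):
    """Return (index, length) for each row whose length differs from the header's."""
    expected = len(header)
    return [(i, len(row)) for i, row in enumerate(rows) if len(row) != expected]


# Made-up sample: the second row is missing its Category value.
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],
    ['App C', 'GAME', '3.9'],
]

bad_rows = find_mismatched_rows(sample_header, sample_rows)
print(bad_rows)  # [(1, 2)]
```

Running a check like this before deleting anything would also reveal whether more than one row is misaligned.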


After reviewing the [discussion forum](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/discussion) for the Apple dataset, there were no data issues identified within the data set.

## Deleting duplicate data

In the Android App dataset discussion forum, there is a [post](https://www.kaggle.com/lava18/google-play-store-apps/discussion/67894#latest-416829) regarding duplicate data contained within the dataset.

We will go ahead and identify those apps that are duplicates.

In [6]:
duplicate_apps = []
unique_apps    = []

for row in android_apps:
    if row[0] in unique_apps:
        duplicate_apps.append(row[0])
    else:
        unique_apps.append(row[0])
        
print('Number of duplicate apps:', len(duplicate_apps))
print('Example duplicated apps:', duplicate_apps[:15])

Number of duplicate apps: 1181
Example duplicated apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


As we can see, there are 1181 duplicate rows in the Android dataset.
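
The loop above checks membership in a list, which gets slower as the list grows. As an alternative sketch, collections.Counter from the standard library can count names in one pass. The function name count_duplicates and the sample rows are hypothetical.

```python
from collections import Counter


def count_duplicates(rows, name_index=0):
    """Return ({name: occurrences} for repeated names, number of extra rows)."""
    name_counts = Counter(row[name_index] for row in rows)
    duplicates = {name: n for name, n in name_counts.items() if n > 1}
    # Every occurrence beyond the first counts as one extra (duplicate) row.
    extra_rows = sum(n - 1 for n in duplicates.values())
    return duplicates, extra_rows


# Made-up sample rows containing only an app name.
sample = [['Box'], ['Slack'], ['Box'], ['Box'], ['Zoom'], ['Slack']]
dups, extra = count_duplicates(sample)
print(dups)   # {'Box': 3, 'Slack': 2}
print(extra)  # 3
```

The extra-row count follows the same definition as the length of duplicate_apps in the notebook's loop: each repeat after the first occurrence is counted once.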

Let's take a closer look to see if there is a unique value among the duplicates.

In [7]:
for row in android_apps:
    if row[0] == 'Quick PDF Scanner + OCR FREE':
        print(row)

['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']
['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']
['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80804', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']


Looking at the columns from the duplicates, the only difference between these rows is the Reviews column (the number of reviews), not the rating.

For the purposes of the analysis, we will keep only the row with the highest number of reviews for each app, since it is likely the most recent entry, and delete the other rows.

To do this, we will create a dictionary to store each app name and its highest review count.

After that, we will create two new lists: one to store the dataset without duplicates, and another to track which app names have already been added. The second list is needed because, as we saw in the example above, some duplicate rows share the same review count and should not be added twice.

Once the dataset has been cleaned of duplicates, we will check that the new data set has the expected length.

In [8]:
### Creating a dictionary to store the app name and the highest review count ###

reviews_max = {}

for rows in android_apps:
    name = rows[0]
    n_reviews = float(rows[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

### Creating a new dataset with the cleaned data ###        
android_clean = []
already_added = []

for rows in android_apps:
    name = rows[0]
    n_reviews = float(rows[3])
    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(rows)
        already_added.append(rows[0])
        
print('Length of data set before duplicates are removed:', len(android_apps))
print('Expected length after duplicates are removed:', len(android_apps) - 1181)
print('Length of new android_clean dataset:',len(android_clean))

Length of data set before duplicates are removed: 10840
Expected length after duplicates are removed: 9659
Length of new android_clean dataset: 9659
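
The same result can be reached in a single pass by storing the whole row, rather than only the review count, in the dictionary. This is a sketch on made-up rows, not the notebook's method; the name dedupe_keep_max is hypothetical, and it assumes Python 3.7 or later, where dictionaries preserve insertion order.

```python
def dedupe_keep_max(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row seen with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Replace the stored row only when this one has strictly more reviews,
        # so rows that tie on review count are not added twice.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Made-up sample rows: name, category, rating, reviews.
sample = [
    ['App A', 'BUSINESS', '4.2', '10'],
    ['App B', 'TOOLS', '4.0', '5'],
    ['App A', 'BUSINESS', '4.2', '12'],
    ['App A', 'BUSINESS', '4.2', '12'],
]

cleaned = dedupe_keep_max(sample)
print(len(cleaned))   # 2
print(cleaned[0][3])  # 12
```

One difference from the two-loop version: rows come out in the order each app name first appeared, which may not match the order in android_clean.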


## Removing Non-English Apps

Both of the datasets contain apps that are geared more towards non-English speakers. For our analysis, we are only going to focus on those apps that are for English speakers.

To do that, we are going to create a function to filter out those apps that contain characters that are outside of the ASCII values of 0 to 127.

In [9]:
def english_or_not(string):
    english = True
    for character in string:
        if 0 <= ord(character) <= 127:
            english = True
        else:
            english = False
            break
    return english


print(english_or_not('Instagram'))
print(english_or_not('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(english_or_not('Docs To Go™ Free Office Suite'))
print(english_or_not('Instachat 😜'))

True
False
False
False


From what we can see with the initial version of this function, some app names are in English but also contain characters outside the ASCII range of 0 to 127, such as ™ or emoji. This causes a problem because the function would remove apps that are in English.

To prevent this from happening, the function will be updated to count the non-ASCII characters in the app name and to treat the app as non-English only if there are more than three of them.

In [10]:
def english_or_not(string):
    english = False
    non_english_count = 0
    for character in string:
        if ord(character) > 127:
            non_english_count += 1
        
    if non_english_count > 3:
        english = False
    else:
        english = True
    return english

print(english_or_not('Instagram'))
print(english_or_not('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(english_or_not('Docs To Go™ Free Office Suite'))
print(english_or_not('Instachat 😜'))

True
False
True
True
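
For comparison, the same rule can be written more compactly with the threshold exposed as a parameter. This is only a sketch; the name is_probably_english and the max_non_ascii parameter are illustrative, with the default of 3 mirroring the check above.

```python
def is_probably_english(name, max_non_ascii=3):
    """Return True when the name has at most max_non_ascii characters above code point 127."""
    non_ascii = sum(1 for character in name if ord(character) > 127)
    return non_ascii <= max_non_ascii


print(is_probably_english('Instagram'))                      # True
print(is_probably_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))  # False
print(is_probably_english('Docs To Go™ Free Office Suite'))  # True
print(is_probably_english('Instachat 😜'))                    # True
```

The heuristic remains approximate: a short non-English name with three or fewer non-ASCII characters would still pass.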


Now that we have a better function to determine which apps are made for English speakers, we can go ahead and filter out those apps which are not for English speakers from both of our datasets.

We will do this by creating a new list for each dataset, creating a function to filter through the rows for non-English apps, and then appending only those rows that are for English speakers.

In [11]:
android_apps_english = []
apple_apps_english = []

def english_clean(dataset,name_col_no, clean_dataset):
    for row in dataset:
        name = row[name_col_no]
        if english_or_not(name):
            clean_dataset.append(row)
    return clean_dataset

print('The length of the Android dataset before non-English apps are removed is:',len(android_clean))
android_apps_english = english_clean(android_clean,0,android_apps_english)        
print('The length of the Android dataset after non-English apps are removed is:', len(android_apps_english))
print('\n')
print('The length of the Apple dataset before non-English apps are removed is:',len(apple_apps))
apple_apps_english = english_clean(apple_apps,2,apple_apps_english)        
print('The length of the Apple dataset after non-English apps are removed is:', len(apple_apps_english))

The length of the Android dataset before non-English apps are removed is: 9659
The length of the Android dataset after non-English apps are removed is: 9614


The length of the Apple dataset before non-English apps are removed is: 7197
The length of the Apple dataset after non-English apps are removed is: 6183


## Focusing on only free apps

Since the focus for this data analysis is free apps, we will now filter for only those apps that are free. To do this, we will create two new lists for the Apple and Android apps whose price column is '0' or '0.0'.

In [15]:
android_free_apps = []
apple_free_apps   = []


def free_apps_only(dataset,price_col_no, free_dataset):
    for row in dataset:
        price = row[price_col_no]
        if price == '0' or price == '0.0':
            free_dataset.append(row)
    return free_dataset    

print('The length of the Android dataset before paid apps are removed is:',len(android_apps_english))
android_free_apps = free_apps_only(android_apps_english,7,android_free_apps)        
print('The length of the Android dataset after paid apps are removed is:', len(android_free_apps))
print('\n')
print('The length of the Apple dataset before paid apps are removed is:',len(apple_apps_english))
apple_free_apps = free_apps_only(apple_apps_english,5,apple_free_apps)        
print('The length of the Apple dataset after paid apps are removed is:', len(apple_free_apps))

The length of the Android dataset before paid apps are removed is: 9614
The length of the Android dataset after paid apps are removed is: 8864


The length of the Apple dataset before paid apps are removed is: 6183
The length of the Apple dataset after paid apps are removed is: 3222
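
The filter above matches the exact strings '0' and '0.0'. If a price column also holds values with a currency symbol, such as '$4.99' (an assumption about the raw data, not something shown in the output above), a numeric comparison is more robust. The helper name is_free is hypothetical.

```python
def is_free(price_text):
    """Return True when the price parses to zero; unparseable values count as not free."""
    cleaned = price_text.strip().lstrip('$')
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Covers empty strings and misplaced text such as a content rating.
        return False


print(is_free('0'))         # True
print(is_free('0.0'))       # True
print(is_free('$4.99'))     # False
print(is_free('Everyone'))  # False
```

Treating unparseable prices as paid is a conservative choice: a malformed row is excluded from the free-app lists instead of raising an error.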


## Diving into Analyzing Free Apps

Now that the data sets contain only the information we need, we will look further into which types of apps could be the most profitable.

Once we determine which type of app to build, the plan is:

1. Develop a basic app and add it to the Google Play store.
2. If the app has a good response, then the app will be developed further.
3. In six months, if the app still appears to be profitable, then the app will be developed for iOS and put onto the Apple App Store.

The app will need to appeal to both markets to maximize profit, so the first analysis will focus on which genres are the most common. We will build one function that creates a frequency table of percentages and another function that displays those percentages in descending order.

In [16]:
def freq_table(dataset,index):
    frequency_table = {}
    percentage_table = {}
    total_rows = 0
    for row in dataset:
        total_rows +=1
        column = row[index]
        if column in frequency_table:
            frequency_table[column] += 1
        else:
            frequency_table[column] = 1
    
    for key in frequency_table:
        percentage = (frequency_table[key]/total_rows) * 100
        percentage_table[key] = percentage
    
    return percentage_table

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
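
As a shorter alternative sketch, collections.Counter can do the counting and its most_common method returns the entries already sorted by count. The function name percentage_table and the sample rows are made up for illustration.

```python
from collections import Counter


def percentage_table(rows, index):
    """Return {value: percentage of rows}, ordered from most to least common."""
    counts = Counter(row[index] for row in rows)
    total = sum(counts.values())
    return {key: count / total * 100 for key, count in counts.most_common()}


# Made-up sample rows: app name, genre.
sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Weather'], ['d', 'Games']]
table = percentage_table(sample, 1)
for genre, percentage in table.items():
    print(genre, ':', percentage)
# Games : 75.0
# Weather : 25.0
```

This merges the counting and sorting steps; the two-function version above keeps them separate, which makes it easier to reuse the unsorted table.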

Now we will take a look at the prime_genre column in the Apple apps data set to see which genres are the most popular.

In [17]:
display_table(apple_free_apps,12)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


From the table above, we can see that apps built for entertainment are more common than apps built for practical purposes: Games at 58.16% and Entertainment at 7.88% are the two largest genres among free apps made for English speakers. This does not imply that entertainment apps also have the most users; it only shows that more of them have been published.

We will continue our analysis with the Android apps dataset. There are two columns that we will look at in the Android apps dataset to determine the frequency: Category and Genres.

In [18]:
display_table(android_free_apps,1)

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

From this table on the Category column, we can see that Family apps are at the top of the list with 18.91%, with Games second at 9.72%. There seem to be more practical applications within the Android apps data set than in the Apple apps data set.

In [19]:
display_table(android_free_apps, 9)

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

From running the frequency table on the Genres column, we can see similar results to the Category column: practical apps make up a higher percentage than entertainment apps, with Tools at 8.45% and Entertainment at 6.07%. The Genres column has a wider range of app types than the Category column, so for the purposes of our analysis, we will focus on the broader Category column rather than the Genres column.

## Popular Apps by Genre in App Stores

### Most popular Apple Apps

Now we will look at which genres are the most popular by number of users by calculating the average number of installs for each genre. In the Android dataset, this information is available from the Installs column, but the Apple dataset has no install counts. As a proxy, we will use the total number of user ratings (the rating_count_tot column): for each genre, we sum the rating counts and divide by the number of apps in that genre.
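
The averaging step just described can be sketched in a single pass over the rows, with the results sorted from highest to lowest. This is an illustrative variant on made-up data, using a hypothetical helper named average_by_group; the notebook's own cell follows.

```python
from collections import defaultdict


def average_by_group(rows, group_index, value_index):
    """Return [(group, average value)] sorted from highest to lowest average."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    averages = {group: totals[group] / counts[group] for group in totals}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Made-up sample rows: genre, total rating count.
sample = [['Games', '100'], ['Games', '300'], ['Weather', '50']]
ranked = average_by_group(sample, 0, 1)
print(ranked)  # [('Games', 200.0), ('Weather', 50.0)]
```

Sorting the averages makes the top genres easy to read off, and the single pass avoids rescanning the whole dataset once per genre.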

In [21]:
apple_freq_table = freq_table(apple_free_apps, 12)

for key in apple_freq_table:
    total = 0
    len_genre = 0
    for row in apple_free_apps:
        if row[12] == key:
            total += float(row[6])
            len_genre +=1
    average_user_installs = total/len_genre
    print(key,":",average_user_installs) 

Productivity : 21028.410714285714
Weather : 52279.892857142855
Shopping : 26919.690476190477
Reference : 74942.11111111111
Finance : 31467.944444444445
Music : 57326.530303030304
Utilities : 18684.456790123455
Travel : 28243.8
Social Networking : 71548.34905660378
Sports : 23008.898550724636
Health & Fitness : 23298.015384615384
Games : 22788.6696905016
Food & Drink : 33333.92307692308
News : 21248.023255813954
Book : 39758.5
Photo & Video : 28441.54375
Entertainment : 14029.830708661417
Business : 7491.117647058823
Lifestyle : 16485.764705882353
Education : 7003.983050847458
Navigation : 86090.33333333333
Medical : 612.0
Catalogs : 4004.0


From what we can see here, Navigation apps have the highest average number of user ratings, with Waze and Google Maps accounting for most of them:

In [26]:
for app in apple_free_apps:
    if app[12] == 'Navigation':
        print(app[2],":",app[6])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Geocaching® : 12811
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5
CoPilot GPS – Car Navigation & Offline Maps : 3582
Google Maps - Navigation & Transit : 154911


Even though Navigation apps have the highest average number of user ratings, this genre would not suit our purposes for recommending a free application with built-in advertisements, since people use Navigation apps while driving or doing other activities. They would be more focused on obtaining directions than on viewing advertisements. Also, these types of apps are quite complicated to build and would require resources that may not be within our reach.

The second most popular genre is Reference, with an average of about 74,942 ratings per app.

In [27]:
for app in apple_free_apps:
    if app[12] == 'Reference':
        print(app[2],":",app[6])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
Merriam-Webster Dictionary : 16849
Google Translate : 26786
Night Sky : 12122
WWDC : 762
Jishokun-Japanese English Dictionary & Translator : 0
教えて!goo : 0
VPN Express : 14
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Real Bike Traffic Rider Virtual Reality Glasses : 8


The Bible app has the highest number of user ratings at 985920, with the Dictionary.com Dictionary & Thesaurus app second at 200047. Several other apps in this genre are reference guides for video games such as Pokémon GO and Minecraft.

This might be a better genre for targeted advertising within a free application, since a person would use the application for reference and see the advertisement at the same time. Ideas for an app may include references or inside information on popular movie franchises, TV shows, video games, and book series. In addition, there do not seem to be many reference apps linked to video games, so this may be an untapped niche worth looking into further.

Other popular genres such as Social Networking, Music, and Weather would not suit our purposes because of pre-existing market dominance by Facebook, Pandora, and the Weather Channel.

Next, we will take a look at Android apps.

### Most popular Android Apps

The Android dataset already has the total amount of installs within the dataset, but these values are put into separate bins without any sort of accurate install count.

In [28]:
display_table(android_free_apps,5)

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


For our analysis though, we do not need precise numbers to determine which apps are the most popular. We can use the data provided in the Installs column to give us an approximate number.

However, we do need to remove all commas and plus signs from the data.
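
As a small worked example of that cleanup, a value such as '1,000,000+' becomes the integer 1000000 once the commas and the plus sign are removed. The helper name parse_installs is hypothetical; the notebook's own cell below achieves the same thing with re.sub.

```python
def parse_installs(text):
    """Convert an install bin such as '1,000,000+' to an integer lower bound."""
    return int(text.replace(',', '').replace('+', ''))


print(parse_installs('1,000,000+'))  # 1000000
print(parse_installs('0+'))          # 0
print(parse_installs('0'))           # 0
```

Each parsed number is the lower edge of its bin, so averages built from these values understate the true install counts.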

In [33]:
import re

android_freq_table = freq_table(android_free_apps,1)

for key in android_freq_table:
    total = 0
    len_category = 0
    for row in android_free_apps:
        if row[1] == key:
            installs = row[5]
            installs = int(re.sub(r'[^0-9]', '', installs))
            total += installs
            len_category += 1
    average_android_installs = total/len_category
    print(key,":",average_android_installs)

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_

From looking at this output, communication apps have on average the most installations at 38,456,119. This is because a few major messaging apps dominate this category, for example WhatsApp, Skype, Messenger, and Google Duo.

In [35]:
for app in android_free_apps:
    installs = app[5]
    installs = int(re.sub(r'[^0-9]', '', installs))
    if app[1] == 'COMMUNICATION' and installs >= 100000000:
        print(app[0],":",installs)

WhatsApp Messenger : 1000000000
imo beta free calls and text : 100000000
Android Messages : 100000000
Google Duo - High Quality Video Calls : 500000000
Messenger – Text and Video Chat for Free : 1000000000
imo free video calls and chat : 500000000
Skype - free IM & video calls : 1000000000
Who : 100000000
GO SMS Pro - Messenger, Free Themes, Emoji : 100000000
LINE: Free Calls & Messages : 500000000
Google Chrome: Fast & Secure : 1000000000
Firefox Browser fast & private : 100000000
UC Browser - Fast Download Private & Secure : 500000000
Gmail : 1000000000
Hangouts : 1000000000
Messenger Lite: Free Calls & Messages : 100000000
Kik : 100000000
KakaoTalk: Free Calls & Text : 100000000
Opera Mini - fast web browser : 100000000
Opera Browser: Fast and Secure : 100000000
Telegram : 100000000
Truecaller: Caller ID, SMS spam blocking & Dialer : 100000000
UC Browser Mini -Tiny Fast Private & Secure : 100000000
Viber Messenger : 500000000
WeChat : 100000000
Yahoo Mail – Stay Organized : 10000000

From reviewing this output, a messenger application would not suit our needs, since the category is already dominated by a few established apps.

Looking at the average installs again, video players would also not be a good category due to market dominance by YouTube and Google Play Movies & TV.

Referring back to our analysis of the Apple apps, where Reference looked promising, the Books and Reference and News and Magazines categories are similarly popular on Android, with Books and Reference averaging 8,767,811 installs and News and Magazines averaging 9,549,178 installs.

Let's take a deeper dive into the more popular applications in these categories.

In [37]:
for app in android_free_apps:
    if app[1] == 'NEWS_AND_MAGAZINES':
        print(app[0],":",app[5])

Fox News – Breaking News, Live Video & News Alerts : 10,000,000+
NEW - Read Newspaper, News 24h : 10,000,000+
BaBe + - Indonesian News : 1,000,000+
daily News : 1,000,000+
BBC News : 10,000,000+
Free TV Shows App:News, TV Series, Episode, Movies : 1,000,000+
News24 : 1,000,000+
Le Monde, the continuous news : 5,000,000+
Wireless news : 1,000,000+
CNN Breaking US & World News : 10,000,000+
Gnoche entertainment news · sports news is also free : 5,000,000+
BaBe - Read News : 10,000,000+
Nigeria News NAIJ.com : 1,000,000+
detikcom - Latest & Most Complete News : 10,000,000+
Dailyhunt (Newshunt) - Latest News, Viral Videos : 50,000,000+
BaBe Lite - Read Quota Saving News : 1,000,000+
ARY NEWS URDU : 500,000+
Bengali Newspaper - The first L. : 1,000,000+
Read- Latest News, Information, Gossip and Politics : 10,000,000+
Reddit: Social News, Trending Memes & Funny Videos : 10,000,000+
BBC Persian | BBC Farsi News : 100,000+
Opera News - Trending news and videos : 10,000,000+
Topbuzz: Breaking 

From this list, we can see that the majority of the applications are specific to regions, but there are some entertainment news apps as well, which may be applicable to the idea of an app related to a movie franchise, book, or video game.

Let's take a look at the Books and Reference category.

In [42]:
for app in android_free_apps:
    installs = app[5]
    installs = int(re.sub(r'[^0-9]', '', installs))
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0],":",installs)

E-Book Read - Read Book for free : 50000
Download free book with green book : 100000
Wikipedia : 10000000
Cool Reader : 10000000
Free Panda Radio Music : 100000
Book store : 1000000
FBReader: Favorite Book Reader : 10000000
English Grammar Complete Handbook : 500000
Free Books - Spirit Fanfiction and Stories : 1000000
Google Play Books : 1000000000
AlReader -any text book reader : 5000000
Offline English Dictionary : 100000
Offline: English to Tagalog Dictionary : 500000
FamilySearch Tree : 1000000
Cloud of Books : 1000000
Recipes of Prophetic Medicine for free : 500000
ReadEra – free ebook reader : 1000000
Anonymous caller detection : 10000
Ebook Reader : 5000000
Litnet - E-books : 100000
Read books online : 5000000
English to Urdu Dictionary : 500000
eBoox: book reader fb2 epub zip : 1000000
English Persian Dictionary : 500000
Flybook : 500000
All Maths Formulas : 1000000
Ancestry : 5000000
HTC Help : 10000000
English translation from Bengali : 100000
Pdf Book Download - Read Pdf Boo

From this list, we can see that there are a lot of apps that are reference books for programming languages, books like the Bible or Quran, or eReaders.

However, there are applications such as Guide (for X-MEN), Painting Lulu DC Super Friends, and My Little Pony AR Guide that appear to be guides for popular comics and cartoon shows. This suggests that an app based around a movie franchise, book, video game, or TV show might be the best type of application to develop. Such an app could include special features like images, quotes, behind-the-scenes content, or even a chat feature so people can talk about their favorite movie, video game, book, or TV show.

## Conclusions

In this project, we analyzed app data from both the Apple App Store and the Google Play store with the end goal of identifying which types of free apps would be the most profitable in both markets.

We concluded that developing an app based upon a popular movie franchise, video game, book, or TV show could be profitable in both markets. The app can be built upon existing libraries to make development and deployment to the Google Play store or Apple App Store much quicker. The app can feature things such as images, quotes, behind-the-scenes content, or a chat feature.