# Profitable App Profiles in App Store and Google Play Markets

This project analyzes apps on the App Store and Google Play markets to find apps that are engaging to users. The more users engage with the ads in our app, the more ad revenue we can generate.

The goal of this project is to identify characteristics of profitable apps, in order to help our developers implement features in our own app that increase user engagement with ads.

```python
from csv import reader

# The App Store dataset
# (encoding='utf8' avoids decode errors where the platform default is not UTF-8;
#  newline='' is what the csv module expects for files it reads)
with open('AppleStore.csv', encoding='utf8', newline='') as opened_file:
    ios = list(reader(opened_file))
ios_header = ios[0]
ios = ios[1:]

# The Google Play dataset
with open('googleplaystore.csv', encoding='utf8', newline='') as opened_file:
    android = list(reader(opened_file))
android_header = android[0]
android = android[1:]
```

We will define and use a function called `explore_data()` to get a better look at the datasets we have just imported. Exploring these datasets will show us which columns are most useful for our analysis.

```python
def explore_data(dataset, start, end, rows_and_columns=False):
    """Print rows dataset[start:end]; optionally print the row and column counts."""
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')  # prints two empty lines after each row (print adds its own newline)

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
```

```python
# First 3 rows of android dataset
print(android_header)
print('\n')
explore_data(android, 0, 3, True)
```

```
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13
```


The Google Play dataset has 10841 rows and 13 columns. The columns that seem most useful for this analysis are `App`, `Category`, `Reviews`, `Installs`, `Type`, `Price`, and `Genres`.

```python
# First 3 rows of ios dataset
print(ios_header)
print('\n')
explore_data(ios, 0, 3, True)
```

```
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16
```


The App Store dataset has 7197 rows and 16 columns. The columns that appear most useful for this analysis are `'track_name'`, `'currency'`, `'price'`, `'rating_count_tot'`, `'rating_count_ver'`, and `'prime_genre'`. Not all column names are self-explanatory; the documentation for the dataset can be found [here](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home).

# Deleting Wrong Data

From reading the dataset's discussion section, we see there is an error in row 10472 of the `android` dataset. We will first print the row and, if it does indeed contain an error, delete it.

```python
print(android[10472])
print('\n')
print(android[2])  # compare to a row that doesn't contain an error
```

```
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']
```


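The rating for this row is 19, but the maximum rating allowed on the Google Play Store is 5. Comparing it with the correct row shows why: the `Category` value is missing, so every following value has shifted one column to the left, and the row has only 12 values instead of 13. Because of this, we will delete the row.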
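A more general way to find rows like this one, not part of the original analysis, is to compare each row's length with the header's length. The sketch below uses a hypothetical helper, `find_malformed_rows()`, on toy rows rather than the real data. Applied as `find_malformed_rows(android_header, android)` before the deletion, its results should include row 10472, which has 12 values instead of 13.

```python
def find_malformed_rows(header, dataset):
    """Return (index, row) pairs for rows whose length differs from the header's."""
    expected_length = len(header)
    return [(index, row) for index, row in enumerate(dataset)
            if len(row) != expected_length]

# Hypothetical toy data: the second row is missing its Category value
toy_header = ['App', 'Category', 'Rating']
toy_rows = [['App A', 'TOOLS', '4.1'], ['App B', '1.9']]
print(find_malformed_rows(toy_header, toy_rows))  # [(1, ['App B', '1.9'])]
```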

```python
# Run this cell only once: re-running it would delete a different, valid row
del android[10472]
```

# Removing Duplicate Entries

Now we will check whether the Google Play dataset contains duplicate entries.

```python
for app in android:
    name = app[0]
    if name == 'Instagram':
        print(app)
```

```
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
```


We can see that Instagram appears four times. Comparing the entries, the main difference is the number of reviews in the fourth column (`Reviews`). We can later use this as the criterion for removing duplicates. First, we need to determine how many duplicate entries exist in the dataset.
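As a cross-check, the same count can be computed with `collections.Counter` from the standard library: every occurrence of a name beyond its first is a duplicate. `count_duplicates()` is a hypothetical helper shown on toy names; called as `count_duplicates(app[0] for app in android)`, it should agree with the loop below. Unlike checking `name in unique_apps` on a list, `Counter` avoids a linear search for every row.

```python
from collections import Counter

def count_duplicates(names):
    """Count the entries that repeat a name already seen earlier."""
    counts = Counter(names)
    return sum(count - 1 for count in counts.values())

# Hypothetical toy names: Instagram appears 3 times, Box twice, Slack once
names = ['Instagram', 'Box', 'Instagram', 'Slack', 'Box', 'Instagram']
print(count_duplicates(names))  # 3
```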

```python
duplicate_apps = []
unique_apps = []

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[:10])
```

```
Number of duplicate apps: 1181


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']
```


We can see there are quite a few duplicates in this dataset. To remove them, we will use the Instagram example above as a guide. Since the number of reviews is the main difference between duplicates, we can infer that the entries were collected at different points in time. We want the most up-to-date data, so for each app we will keep the entry with the highest number of reviews.

```python
# Map each app name to the highest number of reviews found for it
reviews_max = {}
for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if name not in reviews_max or reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews

print('Expected length:', len(android) - len(duplicate_apps))
print('Actual length:', len(reviews_max))
```

```
Expected length: 9659
Actual length: 9659
```


Above, we created a dictionary whose keys are app names and whose values are the maximum number of reviews for each app. We looped through the `android` dataset and stored each app's maximum number of reviews in `reviews_max`. The dictionary's length is expected to equal the length of the `android` dataset minus the number of duplicate entries we found earlier, and it does.

Now we will remove the duplicates using the dictionary we just created. First, we create two empty lists, `android_clean` and `already_added`.

```python
android_clean = []
already_added = []

for app in android:
    name = app[0]
    n_reviews = float(app[3])

    if (reviews_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)

explore_data(android_clean, 0, 5, True)
```

```
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


Number of rows: 9659
Number of columns: 13
```


Above, we loop through the `android` dataset, and for every iteration:
* We isolate the name of the app and the number of reviews.
* We add the current row (`app`) to the `android_clean` list, and the app name (`name`) to the `already_added` list, if:
    * The number of reviews of the current app matches the maximum number of reviews for that app, as stored in the `reviews_max` dictionary; and
    * The name of the app is not already in the `already_added` list. We need this supplementary condition for cases where the highest number of reviews of a duplicate app is shared by more than one entry (for example, the Box app has three entries with the same number of reviews). If we only checked `reviews_max[name] == n_reviews`, we would still end up with duplicate entries for some apps.
    
Exploring the newly created list shows that it has 9659 rows, as expected.
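The two passes above (building `reviews_max`, then filtering) can also be folded into a single pass that keeps the best row per name directly. This is a minimal sketch, not the method used above: `dedupe_by_max_reviews()` is a hypothetical helper, and the toy rows are invented. On ties it keeps the earliest row, like the `already_added` check does, but the order of the returned rows follows each app's first appearance, so it can differ from `android_clean`.

```python
def dedupe_by_max_reviews(dataset, name_index=0, reviews_index=3):
    """Keep one row per app name: the earliest row with the highest review count."""
    best = {}  # app name -> row with the most reviews seen so far
    for row in dataset:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' means a later row with an equal count never replaces the kept one
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

# Hypothetical toy rows: [name, category, rating, reviews]
toy_rows = [['App A', 'X', '4.0', '10'],
            ['App B', 'Y', '3.0', '5'],
            ['App A', 'X', '4.0', '12'],
            ['App B', 'Y', '3.0', '5']]
for row in dedupe_by_max_reviews(toy_rows):
    print(row)
```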

# Removing Non-English Apps

For our analysis, we are only interested in English language apps, since that is the type of app our company will produce. Below, we can see there are examples from both the `ios` and `android_clean` datasets that are not in the English language.

```python
print(ios[813][1])
print(ios[6731][1])
print('\n')
print(android_clean[4412][0])
print(android_clean[7940][0])
```

```
爱奇艺PPS -《欢乐颂2》电视剧热播
【脱出ゲーム】絶対に最後までプレイしないで 〜謎解き＆ブロックパズル〜


中国語 AQリスニング
لعبة تقدر تربح DZ
```


We will attempt to remove apps whose names are not in English. English text usually consists of letters from the English alphabet, numbers composed of the digits 0 to 9, punctuation marks, and other symbols (+, *, /).

In the ASCII standard, these characters correspond to the numbers 0 to 127. We can take advantage of that by writing a function that checks whether every character in an app name falls within the ASCII range.
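To make the idea concrete, the built-in `ord()` returns a character's Unicode code point, and the ASCII characters are exactly code points 0 to 127. `code_points()` below is a hypothetical helper for illustration. Since Python 3.7, `str.isascii()` performs the strict all-ASCII check in a single call.

```python
def code_points(text):
    """Return the Unicode code point of every character in text, via ord()."""
    return [ord(character) for character in text]

print(code_points('App+1'))   # [65, 112, 112, 43, 49]: all within 0-127
print(ord('™'), ord('😜'))    # 8482 128540: both outside the ASCII range

# Python 3.7+: strict check that every character is ASCII
print('Instagram'.isascii(), 'Docs To Go™'.isascii())  # True False
```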

```python
def is_english(string):
    """Return False if any character falls outside the ASCII range (0-127)."""
    for character in string:
        if ord(character) > 127:
            return False
    return True


print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))
```

```
True
False
False
False
```


Above, we created the function `is_english()`, which uses the built-in `ord()` function to check whether each character's code point falls within the ASCII range. However, the last two examples show that apps with emojis or other special characters (such as ™) in their names are disqualified by this method, even though their names are in English.

We will modify our function to allow up to three non-ASCII characters. This method is not perfect, but it should keep most English app names while still filtering out most non-English ones.
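To see concretely why the rule is imperfect, here is a self-contained variant with the threshold as a parameter. `is_probably_english()` and `max_non_ascii` are our own hypothetical names, and the bare name `'中国語'` is a made-up input. A short non-English name with three or fewer non-ASCII characters still passes, while the real app name `'中国語 AQリスニング'` is rejected.

```python
def is_probably_english(string, max_non_ascii=3):
    """Treat a name as English if it has at most max_non_ascii non-ASCII characters."""
    non_ascii = sum(1 for character in string if ord(character) > 127)
    return non_ascii <= max_non_ascii

print(is_probably_english('Instachat 😜'))        # True: one non-ASCII character
print(is_probably_english('中国語'))               # True: a false positive (3 characters)
print(is_probably_english('中国語 AQリスニング'))  # False: eight non-ASCII characters
```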

```python
def is_english(string):
    """Return True if the string contains at most three non-ASCII characters."""
    non_ASCII = 0
    for character in string:
        if ord(character) > 127:
            non_ASCII += 1
    if non_ASCII > 3:
        return False
    else:
        return True


print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))
```

```
True
False
True
True
```


Now that we have our function, we will loop through both datasets to filter out the non-English apps.

```python
android_english = []
ios_english = []

for app in ios:
    name = app[1]
    if is_english(name):
        ios_english.append(app)

for app in android_clean:
    name = app[0]
    if is_english(name):
        android_english.append(app)

explore_data(ios_english, 0, 3, True)
print('\n')
explore_data(android_english, 0, 3, True)
```

```
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 6183
Number of columns: 16


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Vari
```

We are left with 6183 iOS apps and 9614 Android apps.

# Isolating Free Apps

As the last step in our data cleaning process, we will isolate the free apps from the paid apps, since our company will be building a free app and relying on in-app ads as our main source of revenue. 
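The cell below compares prices as strings (`'0'` for Google Play, `'0.0'` for the App Store), which works because each dataset writes free prices in one consistent way. A slightly more defensive check parses the price as a number instead. This is a hypothetical helper, `is_free()`, and it assumes that non-zero Google Play prices may carry a leading `'$'` (for example `'$4.99'`).

```python
def is_free(price):
    """Return True if a price string such as '0', '0.0' or '$0.00' is zero."""
    # Assumption: the only non-numeric character in a price is a leading '$'
    return float(price.replace('$', '')) == 0.0

print(is_free('0'), is_free('0.0'), is_free('$4.99'))  # True True False
```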

```python
android_final = []
ios_final = []

for app in android_english:
    price = app[7]
    if price == '0':
        android_final.append(app)

for app in ios_english:
    price = app[4]
    if price == '0.0':
        ios_final.append(app)

print(len(android_final))
print(len(ios_final))
```

```
8864
3222
```


We are left with 8864 apps from the `android` dataset, and 3222 apps from the `ios` dataset. We are now ready to begin our analysis. 

# Most Common Apps by Genre

## Part One

Our strategy for creating an app is laid out in the steps below. In this way, we minimize the overhead and risk in developing an app. 

1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we develop it further.
3. If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

Since our end goal is to create an app that is successful in both app markets, we need to find app profiles that work well with both markets. 

To start, we will create frequency tables for both datasets to determine the most common genres.

We will use the `prime_genre` column from the iOS dataset, and the `Category` and `Genres` columns from the Android dataset.

## Part Two

We will build two functions to analyze frequency tables:
* One function to generate frequency tables that show percentages
* Another function to display the percentages in descending order
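For comparison, the same two steps can be written more compactly with `collections.Counter` and `sorted()` with a `key` function. This is a sketch with our own hypothetical names (`freq_table_counter`, `display_table_counter`) and toy rows, not a replacement for the functions below. One difference is that values with equal percentages may print in a different order, since the functions below sort `(percentage, key)` tuples and so break ties by key.

```python
from collections import Counter

def freq_table_counter(dataset, index):
    """Percentage frequency table for column `index`, built with collections.Counter."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return {key: count / total * 100 for key, count in counts.items()}

def display_table_counter(dataset, index):
    """Print the table from the most to the least frequent value."""
    table = freq_table_counter(dataset, index)
    for key, percentage in sorted(table.items(), key=lambda item: item[1], reverse=True):
        print(key, ':', percentage)

# Hypothetical toy rows: [name, genre]
toy_rows = [['a', 'Games'], ['b', 'Games'], ['c', 'News'], ['d', 'Games']]
display_table_counter(toy_rows, 1)  # Games : 75.0, then News : 25.0
```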

```python
def freq_table(dataset, index):
    """Return {value: percentage of rows} for the column at `index`."""
    table = {}
    total = 0

    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1

    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage
    return table_percentages

def display_table(dataset, index):
    """Print the frequency table for `index` in descending order of percentage."""
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse=True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
```

## Part Three

Below we create a frequency table for the `prime_genre` column of the App Store dataset. 

```python
display_table(ios_final, 11)  # prime_genre column
```

```
Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665
```


We see above that Games is by far the most common genre (about 58%), followed by Entertainment (about 8%). Overall, most of the apps appear to be designed for entertainment rather than practical use: Games, Entertainment, Photo & Video, and Social Networking alone make up almost 75% of the dataset.

Based on this analysis alone, one might infer that an entertainment-based app profile would be more successful on the App Store. However, the number of apps in a genre does not necessarily correlate with the number of users.

We will continue by analyzing the `Category` and `Genres` columns of the Google Play dataset.

```python
display_table(android_final, 1)  # Category column
```

```
FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 
```

The Google Play Store paints a different picture from the App Store. Its app profiles are more balanced, with `FAMILY` having the largest percentage of apps (about 19%). The gap between the top category and the second most common one (`GAME`, about 10%) is also much smaller than in the App Store dataset.

Looking through the top 10, we see quite a few practical app categories in addition to `FAMILY`, such as `TOOLS`, `BUSINESS`, `FINANCE`, and `MEDICAL`.

It appears that the most common app categories differ considerably between the App Store and the Google Play Store. Below, we will look at the `Genres` column of the Google Play dataset to see how it compares.

```python
display_table(android_final, 9)  # Genres column
```

```
Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075
```

We see trends similar to the `Category` column here, with a more balanced mix of entertainment and practical app profiles. The `Genres` column is much more granular, so for our purposes we will use the `Category` column from here on.

So far, we can conclude that most app profiles on the App Store are geared towards entertainment, while the Google Play Store maintains a better balance between entertainment and practical app profiles.

# Most Popular Apps by Genre on the App Store

One way to find out which genres are most popular is to calculate the average number of installs for each genre. The Google Play dataset has an `Installs` column, but this information is missing from the App Store dataset. As a workaround, we will use the total number of user ratings, stored in the `rating_count_tot` column, as a proxy.

Below, we will calculate the average number of user ratings per genre on the App Store. 
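The cell below loops over the whole dataset once per genre. A single pass that accumulates a running total and a count per group gives the same averages. This is a sketch with a hypothetical helper name and toy rows; `average_by_group(ios_final, -5, 5)` should reproduce the numbers below, though the genres may print in a different order. For the Google Play `Installs` column, the strings would first need their `'+'` and `','` removed.

```python
def average_by_group(dataset, group_index, value_index):
    """Average of float(row[value_index]) for each distinct row[group_index], in one pass."""
    totals = {}  # group -> running sum of values
    counts = {}  # group -> number of rows seen
    for row in dataset:
        group = row[group_index]
        totals[group] = totals.get(group, 0.0) + float(row[value_index])
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}

# Hypothetical toy rows: [name, genre, rating count]
toy_rows = [['x', 'Games', '10'], ['y', 'Games', '20'], ['z', 'News', '5']]
print(average_by_group(toy_rows, 1, 2))  # {'Games': 15.0, 'News': 5.0}
```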

```python
genres_ios = freq_table(ios_final, -5)  # index -5 is the prime_genre column

for genre in genres_ios:
    total = 0
    len_genre = 0
    for app in ios_final:
        genre_app = app[-5]
        if genre_app == genre:
            n_ratings = float(app[5])  # rating_count_tot column
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)
```

```
Travel : 28243.8
Health & Fitness : 23298.015384615384
Finance : 31467.944444444445
Book : 39758.5
Education : 7003.983050847458
Sports : 23008.898550724636
Games : 22788.6696905016
Food & Drink : 33333.92307692308
Utilities : 18684.456790123455
Entertainment : 14029.830708661417
Photo & Video : 28441.54375
News : 21248.023255813954
Catalogs : 4004.0
Music : 57326.530303030304
Medical : 612.0
Navigation : 86090.33333333333
Productivity : 21028.410714285714
Business : 7491.117647058823
Shopping : 26919.690476190477
Weather : 52279.892857142855
Lifestyle : 16485.764705882353
Reference : 74942.11111111111
Social Networking : 71548.34905660378
```


Overall, `Navigation`, `Reference`, and `Social Networking` have the highest average number of ratings (about 86,090, 74,942, and 71,548, respectively).

The frequency table showed that Navigation makes up only about 0.19% of the free English apps, so its high average may be skewed by a few very popular apps. Let's print each Navigation app with its number of ratings.

```python
for app in ios_final:
    if app[-5] == 'Navigation':
        print(app[1], ':', app[5])  # print app name and number of ratings
```

```
Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5
```


Our assumption was correct: the high average number of ratings comes from just two very popular apps, Waze and Google Maps. The same may be true for the Social Networking category.
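One way to quantify this skew, not used in the original analysis, is to compare the mean with the median, which is insensitive to a few extreme values. The sketch below uses the six Navigation rating counts printed above and the standard library's `statistics` module. The mean matches the 86090.33 average computed earlier, while the median is roughly a tenth of that.

```python
from statistics import mean, median

# Rating counts of the six free English Navigation apps printed above
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

print(mean(navigation_ratings))    # about 86090.33, pulled up by Waze and Google Maps
print(median(navigation_ratings))  # 8196.5, the midpoint of 3582 and 12811
```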

```python
for app in ios_final:
    if app[-5] == 'Social Networking':
        print(app[1], ':', app[5])
```

```
Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 2
```

Although this category has more apps, the two with the most ratings, Facebook and Pinterest, far exceed the rest and pull the average number of ratings upward.

Reference may be an interesting category to look at, since it is not obvious which kinds of apps it contains.

```python
for app in ios_final:
    if app[-5] == 'Reference':
        print(app[1], ':', app[5])
```

```
Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0
```


This category is also skewed, in this case by the Bible and Dictionary.com apps, but there may still be an opportunity here. Religious apps seem to be popular, with the Bible and Muslim Pro apps having high numbers of ratings, so we could, for example, create a religious app for the Jewish religion. And since the App Store as a whole seems focused on "fun" apps, a more practical app may help us stand out from the pack.

# Most Popular Apps by Genre on Google Play

The Google Play dataset has a column with the number of installs, so we should be able to get a clear picture of app popularity by genre. However, the install numbers are not as precise as we would like them to be.

```python
display_table(android_final, 5)  # the Installs column
```

```
1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343
```


We can't determine an app's exact number of installs from this data, only an open-ended range such as `100,000+`. For our purposes, however, this is enough: we will leave the numbers as they are, which effectively treats each range as its lower bound (for example, `100,000+` counts as 100,000).

To do the analysis, we need to convert each install number from a string to a float, which means removing the commas and plus signs. We do this in the loop below.
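The conversion inside the loop below can be isolated into a small function, which makes it easier to test on its own. `installs_to_float()` is a hypothetical name for this sketch. It performs the same two `str.replace()` calls as the loop and returns the lower bound of the range.

```python
def installs_to_float(installs):
    """Convert an install range such as '1,000,000+' to its lower bound as a float."""
    return float(installs.replace('+', '').replace(',', ''))

print(installs_to_float('1,000,000+'))  # 1000000.0
print(installs_to_float('0'))           # 0.0
```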

```python
categories_android = freq_table(android_final, 1)

for category in categories_android:
    total = 0
    len_category = 0
    for app in android_final:
        category_app = app[1]
        if category_app == category:
            # Convert e.g. '1,000,000+' to 1000000.0
            n_installs = app[5]
            n_installs = n_installs.replace('+', '')
            n_installs = n_installs.replace(',', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    print(category, ':', avg_n_installs)
```

```
MAPS_AND_NAVIGATION : 4056941.7741935486
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
TRAVEL_AND_LOCAL : 13984077.710144928
VIDEO_PLAYERS : 24727872.452830188
ART_AND_DESIGN : 1986335.0877192982
FINANCE : 1387692.475609756
GAME : 15588015.603248259
BUSINESS : 1712290.1474201474
NEWS_AND_MAGAZINES : 9549178.467741935
SHOPPING : 7036877.311557789
LIBRARIES_AND_DEMO : 638503.734939759
MEDICAL : 120550.61980830671
COMMUNICATION : 38456119.167247385
BOOKS_AND_REFERENCE : 8767811.894736841
LIFESTYLE : 1437816.2687861272
PARENTING : 542603.6206896552
SOCIAL : 23253652.127118643
COMICS : 817657.2727272727
TOOLS : 10801391.298666667
HOUSE_AND_HOME : 1331540.5616438356
FAMILY : 3695641.8198090694
PRODUCTIVITY : 16787331.344927534
BEAUTY : 513151.88679245283
PERSONALIZATION : 5201482.6122448975
WEATHER : 5074486.197183099
HEALTH_AND_FITNESS : 4188821.9853479853
PHOTOGRAPHY : 17840110.40229885
FOOD_AND_DRINK : 1924897.7363636363
DATING : 854028.830303
```

Here we see results similar to the App Store dataset. The `COMMUNICATION`, `VIDEO_PLAYERS`, and `SOCIAL` categories have the highest average number of installs, but, as on the App Store, these categories are likely dominated by a few very popular apps that would be hard to compete with. Let's take a closer look at the `BOOKS_AND_REFERENCE` category and see whether it aligns with our findings from the App Store dataset.

```python
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])
```

```
E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E
```

Let's see if the most popular apps are skewing the average number of installs. 

```python
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000,000+'
                                            or app[5] == '500,000,000+'
                                            or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])
```

```
Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad 📖 Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+
```


While a few apps have a very large number of installs, many others still have a decent number. Let's look at the middle range of popularity (from 1,000,000+ to 50,000,000+ installs) to see if we can spot any trends.
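The cells here test each install range with a chain of `or` comparisons. An equivalent and easier-to-edit approach is a membership test against a set of ranges. `apps_in_ranges()` is a hypothetical helper, and the toy rows are invented. Called as `apps_in_ranges(android_final, 'BOOKS_AND_REFERENCE', ['1,000,000+', '5,000,000+', '10,000,000+', '50,000,000+'])`, it should return the same apps as the cell below, as `(name, installs)` pairs.

```python
def apps_in_ranges(dataset, category, install_ranges, category_index=1, installs_index=5):
    """Return (name, installs) for apps in `category` whose install range is listed."""
    wanted = set(install_ranges)  # set membership replaces a chain of 'or' comparisons
    return [(row[0], row[installs_index]) for row in dataset
            if row[category_index] == category and row[installs_index] in wanted]

# Hypothetical toy rows with the same column layout as the Google Play data
toy_rows = [['Book A', 'BOOKS_AND_REFERENCE', '4.0', '10', '5M', '1,000,000+'],
            ['Book B', 'BOOKS_AND_REFERENCE', '4.0', '10', '5M', '100,000,000+'],
            ['Game C', 'GAME', '4.0', '10', '5M', '1,000,000+']]
middle = ['1,000,000+', '5,000,000+', '10,000,000+', '50,000,000+']
print(apps_in_ranges(toy_rows, 'BOOKS_AND_REFERENCE', middle))  # [('Book A', '1,000,000+')]
```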

```python
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000+'
                                            or app[5] == '5,000,000+'
                                            or app[5] == '10,000,000+'
                                            or app[5] == '50,000,000+'):
        print(app[0], ':', app[5])
```

```
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
ReadEra – free ebook reader : 1,000,000+
Ebook Reader : 5,000,000+
Read books online : 5,000,000+
eBoox: book reader fb2 epub zip : 1,000,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
Moon+ Reader : 10,000,000+
English-Myanmar Dictionary : 1,000,000+
Golden Dictionary (EN-AR) : 1,000,000+
All Language Translator Free : 1,000,000+
Aldiko Book Reader : 10,000,000+
Dictionary - WordWeb : 5,000,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Al-Quran (Free) : 10,000,000+
Al Quran Indonesia : 10,000,000+
Al'Quran Bahasa Indonesia : 10,000,000+
Al Quran Al karim : 1,000,000+
Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+
Koran Read &MP3 30 Juz Offline : 1,000,000+
H
```

This category contains many e-readers and dictionaries, so building a similar app would not differentiate ours from the market.

Again, quite a few apps are based on the Bible and the Quran. There seems to be some value in building an app based on a popular book.

# Conclusion

This project set out to find a profitable free app profile for both the Google Play Store and the App Store. After analyzing popular app categories on both platforms, we suggest that an app based on a popular book could be a profitable option on both. To differentiate it from the existing competition, the app should offer additional features, such as an audio version of the book, quizzes and quotes from the book, and possibly a discussion forum about the book.