# Identifying Profitable Apps

This project aims to identify profitable mobile app profiles for both the Android and iOS markets.

Problem to solve: our client develops free apps and would like to know which types of apps would be most profitable to build. The company's business model is to build free apps and generate revenue from in-app ads, so the more users an app reaches, the more revenue it can bring in. Thus, our target is to identify app profiles with a large user reach.



# Opening and Exploring the Data
 
We are going to use two datasets. Each one is a sample of the apps available on the App Store and on the Google Play Store, respectively.

The Google Play dataset includes approximately ten thousand Android apps.
The App Store dataset includes data on around seven thousand iOS apps.

Here are the links to both datasets: 
- Google Play Store - https://www.kaggle.com/lava18/google-play-store-apps
- Apple Store - https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps


To avoid duplication of code, let's create a function `open_dataset` that we are going to use to open the files.

In [1]:
from csv import reader

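def open_dataset(dataset):
    # read every row of the CSV file (header included) into a list of lists;
    # the 'with' block closes the file once it has been read
    with open(dataset, encoding='utf8') as opened_file:
        read_file = reader(opened_file)
        data = list(read_file)
    return data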

#open AppleStore data set
applestore_dataset = open_dataset('AppleStore.csv')

#open GooglePlay data set
googleplaystore_dataset = open_dataset('googleplaystore.csv')

    

Next, we define a function `explore_data` that prints a slice of rows (including the header if the slice starts at 0) and, optionally, the number of rows and columns of a data set.

In [2]:
def explore_data(dataset, start, end, rows_and_columns = False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [3]:
explore_data(googleplaystore_dataset, 0, 3, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


Number of rows: 10842
Number of columns: 13


Looking at the Google Play columns, a few seem worth a closer look: Category, Genres, Price, Reviews and Installs.

In [4]:
explore_data(applestore_dataset, 0, 3, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of rows: 7198
Number of columns: 16


Among the App Store columns, `track_name`, `price`, `rating_count_tot` and `prime_genre` look useful for answering our question.


# Data Cleaning

### 1. Deleting Rows with Missing Data

Discussions of the Google Play data set suggest that row 10472 has missing or erroneous data.
Let's compare that row against the header and, if it is inconsistent, delete it.

In [5]:
print(googleplaystore_dataset[0])
print('\n')
print(googleplaystore_dataset[10473])

del googleplaystore_dataset[10473]


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


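Because the header occupies index 0 of our list, the row flagged as 10472 sits at index 10473. It has only 12 values instead of 13: the 'Category' value is missing, so every later value is shifted one column to the left (for example, '19' ends up in the 'Rating' column, although ratings cannot exceed 5). This is why the cell above deletes it.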
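Hard-coding index 10473 works for this one known row, but the same check can be run over every row by comparing each row's length with the header's. A minimal sketch, assuming the data set is a list of lists with the header at index 0; `find_malformed_rows` is a hypothetical helper and the sample rows are made up:

```python
def find_malformed_rows(dataset):
    """Return (index, row) pairs whose length differs from the header's."""
    header_length = len(dataset[0])
    malformed = []
    # start at 1 to skip the header while keeping the original list indexes
    for index, row in enumerate(dataset[1:], start=1):
        if len(row) != header_length:
            malformed.append((index, row))
    return malformed


# made-up rows in the same shape as the Google Play data
sample = [
    ['App', 'Category', 'Rating'],
    ['Good App', 'TOOLS', '4.2'],
    ['Broken App', '1.9'],  # one value missing, so later values are shifted
]
print(find_malformed_rows(sample))  # [(2, ['Broken App', '1.9'])]
```

Run on `googleplaystore_dataset` before the `del` statement, it should list the faulty row together with its index.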

### 2. Identifying Duplicates



In [6]:
google_unique_apps_list = []
google_duplicates_apps_list = []

for app in googleplaystore_dataset:
    name = app[0]
    if name in google_unique_apps_list:
        google_duplicates_apps_list.append(name)
    else:
        google_unique_apps_list.append(name)
print("Number of duplicates apps in Google Play data set:", len(google_duplicates_apps_list))
print('\n')
print("Examples of duplicates apps in Google Play data set:", google_duplicates_apps_list[0:10])

Number of duplicates apps in Google Play data set: 1181


Examples of duplicates apps in Google Play data set: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']


In [7]:
apple_unique_apps_list = []
apple_duplicates_apps_list = []

for app in applestore_dataset:
    name = app[0]
    if name in apple_unique_apps_list:
        apple_duplicates_apps_list.append(name)
    else:
        apple_unique_apps_list.append(name)
print("Number of duplicates apps in Apple data set:", len(apple_duplicates_apps_list))
print('\n')
print("Examples of duplicates apps in Apple data set:", apple_duplicates_apps_list[0:10])

Number of duplicates apps in Apple data set: 0


Examples of duplicates apps in Apple data set: []


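The Google Play data set has 1181 duplicate entries, while the App Store data set has none. Note that the App Store loop reads column 0, which is `id` rather than the app name (`track_name` is column 1), so it checks for duplicate IDs.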
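Membership tests on a list get slower as the list grows, so `collections.Counter` is a convenient alternative for counting duplicates. In this sketch (the helper name `count_duplicates` and the sample rows are our own), every occurrence of a name after the first counts as one duplicate, the same rule the loops above use:

```python
from collections import Counter


def count_duplicates(rows, name_index=0):
    """Return the number of extra occurrences and the set of duplicated names."""
    counts = Counter(row[name_index] for row in rows)
    # a name seen n times contributes n - 1 duplicate entries
    n_duplicates = sum(n - 1 for n in counts.values() if n > 1)
    duplicated_names = {name for name, n in counts.items() if n > 1}
    return n_duplicates, duplicated_names


rows = [['Slack'], ['Box'], ['Slack'], ['Zoom'], ['Slack'], ['Box']]
print(count_duplicates(rows))
```

Applied to `googleplaystore_dataset[1:]`, the first element of the result should match the 1181 duplicates reported above.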


In [8]:
# print every entry for one duplicated app to see what the duplicates look like

for app in googleplaystore_dataset:
    name = app[0]
    if name == "Slack":
        print(app)

['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51510', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']


Looking closer at the duplicates, some entries differ in their number of reviews (the `Reviews` column, index 3). We will keep the entry with the highest number of reviews, assuming it is the most recent snapshot of the app's data.

### 3. Deleting Duplicates

Next, we remove the duplicates from the Google Play data set.

To do that, we will:
- create a dictionary `reviews_max` whose keys are app names and whose values are the highest number of reviews recorded for each app



In [9]:
reviews_max = {}

for app in googleplaystore_dataset[1:]:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
        
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

print('Initial number of rows:', len(googleplaystore_dataset))        
print('Number of rows after removing duplicates:',len(reviews_max))




Initial number of rows: 10841
Number of rows after removing duplicates: 9659


The next cell creates a data set without duplicates by:
- creating two empty lists, `android_clean` and `already_added`
- looping through `googleplaystore_dataset` and appending an app to `android_clean` if its number of reviews matches the value stored in `reviews_max`
- adding the name of each appended app to `already_added`, so that an app whose entries all share the same maximum number of reviews is added only once



In [10]:
# rows of the de-duplicated Google Play data set
android_clean = []
# names of the apps already added to android_clean
already_added = []

for app in googleplaystore_dataset[1:]:
    name = app[0]
    n_reviews = float(app[3])
    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(app)
        already_added.append(name)
        
print('Number of rows without duplicates:', len(android_clean))

Number of rows without duplicates: 9659
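The two passes above (first `reviews_max`, then `android_clean`) can also be folded into a single pass that keeps, for each name, the row with the most reviews. A sketch assuming rows without the header, the name at index 0 and the review count at index 3; `deduplicate_by_max_reviews` is a hypothetical helper. The Slack rows are the first four columns of the entries shown earlier, and 'Example App' is made up:

```python
def deduplicate_by_max_reviews(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the one with the highest review count."""
    best = {}  # app name -> row with the most reviews seen so far
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # strict '>' keeps the earliest row when review counts are equal
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


rows = [
    ['Slack', 'BUSINESS', '4.4', '51507'],
    ['Slack', 'BUSINESS', '4.4', '51507'],
    ['Slack', 'BUSINESS', '4.4', '51510'],
    ['Example App', 'TOOLS', '4.0', '10'],
]
print(deduplicate_by_max_reviews(rows))
```

It should keep the same rows as the cell above, although rows can come out in a different order, because a dictionary remembers where each name first appeared.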


### 4. Remove Non-English Apps

We want to develop an app aimed at an English-speaking audience. Thus, we want to remove any apps whose names include symbols not used in English text.

The following cell defines a function `is_english` that loops through a string and counts the characters whose Unicode code point (returned by `ord`) is greater than 127, i.e. outside the ASCII range. To avoid discarding English names that contain a few special characters such as emoji or '™', the function returns `False` only when a name contains more than three such characters.

In [11]:
#define is_english function to identify non-english characters

def is_english(string):
    non_ascii = 0
    
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    
    if non_ascii > 3:
        return False
    else:
        return True

#test our function
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))
print(is_english("爱奇艺PPS -《欢乐颂2》电视剧热播"))

True
True
False
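The same rule can be written more compactly with `sum()` over a generator, with the tolerance of three characters exposed as a parameter (the name `max_non_ascii` is our own). Like the original, this is a heuristic: it still accepts names with up to three non-English characters and rejects English names that use many emoji or symbols.

```python
def is_english(string, max_non_ascii=3):
    """Return True if the string has at most `max_non_ascii` non-ASCII characters."""
    # ASCII code points run from 0 to 127; anything above is counted
    non_ascii = sum(1 for character in string if ord(character) > 127)
    return non_ascii <= max_non_ascii


print(is_english('Docs To Go™ Free Office Suite'))  # True: one non-ASCII character
print(is_english('Instachat 😜'))                   # True: one emoji
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))   # False
```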


Next, we apply the function to both data sets and keep only the apps whose names pass the check.

In [12]:
english_apps_google = []
english_apps_apple = []

for app in android_clean:
    name = app[0]
    if is_english(name):
        english_apps_google.append(app)
        
for app in applestore_dataset:
    name = app[1]
    if is_english(name):
        english_apps_apple.append(app)
        
print(len(english_apps_apple))
len(english_apps_google)

explore_data(english_apps_apple, 0, 3, True)
print('\n')
explore_data(english_apps_google, 0, 3, True)


6184
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of rows: 6184
Number of columns: 16


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.

We are left with 9614 Android apps and 6183 iOS apps (the 6184 App Store rows above include the header row).

# Isolating Free Apps

Since our client builds only free apps, we keep only the free apps in both data sets so that the analysis reflects the market our client competes in.


In [13]:
# keep only free apps

free_apps_google = []
free_apps_apple = []

for app in english_apps_google:
    app_type = app[6]  # 'Type' column: 'Free' or 'Paid'
    
    if app_type == "Free":
        free_apps_google.append(app)


for app in english_apps_apple:
    price = app[4]  # 'price' column, e.g. '0.0'
    
    if price == "0.0":
        free_apps_apple.append(app)

print('iOS, number of rows after isolating free apps:', len(free_apps_apple))

print('Android, number of rows after isolating free apps:', len(free_apps_google))

iOS, number of rows after isolating free apps: 3222
Android, number of rows after isolating free apps: 8863


We are now left with 8863 Android apps and 3222 iOS apps.
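Each filter in the cell above is a loop with a single condition, which Python can also express as a list comprehension. A sketch on made-up rows, assuming the same column positions (index 6, `Type`, for Google Play and index 4, `price`, for the App Store):

```python
# made-up rows following the Google Play and App Store column layouts
google_rows = [
    ['App One', 'TOOLS', '4.1', '10', '2M', '1,000+', 'Free', '0'],
    ['App Two', 'TOOLS', '4.5', '99', '5M', '5,000+', 'Paid', '$1.99'],
]
apple_rows = [
    ['1', 'App Three', '1000', 'USD', '0.0'],
    ['2', 'App Four', '2000', 'USD', '2.99'],
]

# keep rows whose 'Type' column says the app is free
free_google = [app for app in google_rows if app[6] == 'Free']
# keep rows whose 'price' column is exactly '0.0'
free_apple = [app for app in apple_rows if app[4] == '0.0']

print(len(free_google), len(free_apple))  # 1 1
```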


# Data Analysis

### 1. Identifying Most Common Apps by Genre

As our goal is to determine what kind of apps are more likely to attract users, we want to determine what app profiles are more popular on both Android and iOS markets. 

We'll build two functions to analyze the data with frequency tables:
- `freq_table` to generate frequency tables that show percentages
- `display_table` to display those percentages in descending order

In [14]:
# define function freq_table to generate frequency tables with percentages

def freq_table(dataset, index):
    freq_apps = {}
    total = 0
    for app in dataset:
        total += 1
        column = app[index]
        if column in freq_apps:
            freq_apps[column] += 1
        else:
            freq_apps[column] = 1

    # convert the counts to percentages once, after every row has been counted
    percentages = {}
    for key in freq_apps:
        percentages[key] = (freq_apps[key] / total) * 100
    return percentages


# sort the percentages generated by freq_table in descending order and print them
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse=True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
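The same two functions can be sketched more compactly on top of `collections.Counter`. The names `freq_table_counter` and `display_table_counter` are hypothetical, and the sample rows are made up. Percentages should match `freq_table`, but values with equal percentages may print in a different order than in `display_table`, which breaks ties by comparing the keys.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Percentage of rows for each distinct value in column `index`."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return {value: count / total * 100 for value, count in counts.items()}


def display_table_counter(dataset, index):
    """Print 'value : percentage' lines, largest percentage first."""
    table = freq_table_counter(dataset, index)
    for value, percentage in sorted(table.items(), key=lambda item: item[1], reverse=True):
        print(value, ':', percentage)


sample = [['a', 'Games'], ['b', 'Games'], ['c', 'News'], ['d', 'Games']]
display_table_counter(sample, 1)
# Games : 75.0
# News : 25.0
```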



Let's start with the `prime_genre` column of the iOS data set.

In [15]:
#display percentages for prime_genre column        
display_table(free_apps_apple, 11)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


More than half of the free English apps in the App Store data set are games (about 58%), followed by Entertainment (about 8%) and Photo & Video (about 5%).

The Android data set has two columns relevant to an app's genre: `Genres` and `Category`. Let's look into both.

In [16]:
#display percentages for Genres column
display_table(free_apps_google, 9 )

Tools : 8.450863138892023
Entertainment : 6.070179397495204
Education : 5.348076272142616
Business : 4.592124562789123
Productivity : 3.8925871601038025
Lifestyle : 3.8925871601038025
Finance : 3.7007785174320205
Medical : 3.5315355974275078
Sports : 3.463838429425702
Personalization : 3.317161232088458
Communication : 3.2381812027530184
Action : 3.102786866749408
Health & Fitness : 3.0802211440821394
Photography : 2.944826808078529
News & Magazines : 2.798149610741284
Social : 2.6627552747376737
Travel & Local : 2.324269434728647
Shopping : 2.245289405393208
Books & Reference : 2.1437436533904997
Simulation : 2.042197901387792
Dating : 1.8616721200496444
Arcade : 1.8503892587160102
Video Players & Editors : 1.771409229380571
Casual : 1.7601263680469368
Maps & Navigation : 1.399074805370642
Food & Drink : 1.241114746699763
Puzzle : 1.128286133363421
Racing : 0.9928917973598104
Role Playing : 0.9364774906916393
Libraries & Demo : 0.9364774906916393
Auto & Vehicles : 0.9251946293580051
S

In [17]:
#display percentages for Category column
print('\n')
display_table(free_apps_google, 1 )



FAMILY : 18.898792733837304
GAME : 9.725826469592688
TOOLS : 8.462146000225657
BUSINESS : 4.592124562789123
LIFESTYLE : 3.9038700214374367
PRODUCTIVITY : 3.8925871601038025
FINANCE : 3.7007785174320205
MEDICAL : 3.5315355974275078
SPORTS : 3.396141261423897
PERSONALIZATION : 3.317161232088458
COMMUNICATION : 3.2381812027530184
HEALTH_AND_FITNESS : 3.0802211440821394
PHOTOGRAPHY : 2.944826808078529
NEWS_AND_MAGAZINES : 2.798149610741284
SOCIAL : 2.6627552747376737
TRAVEL_AND_LOCAL : 2.335552296062281
SHOPPING : 2.245289405393208
BOOKS_AND_REFERENCE : 2.1437436533904997
DATING : 1.8616721200496444
VIDEO_PLAYERS : 1.7939749520478394
MAPS_AND_NAVIGATION : 1.399074805370642
FOOD_AND_DRINK : 1.241114746699763
EDUCATION : 1.1621347173643235
ENTERTAINMENT : 0.9590432133589079
LIBRARIES_AND_DEMO : 0.9364774906916393
AUTO_AND_VEHICLES : 0.9251946293580051
HOUSE_AND_HOME : 0.8236488773552973
WEATHER : 0.8010831546880289
EVENTS : 0.7108202640189552
PARENTING : 0.6544059573507841
ART_AND_DESIGN :

The Android data set shows a more balanced mix of common genres than the App Store, where games dominate. Moving forward, we will use the `Category` column, since its broader groups give a better view of the bigger picture than the finer-grained `Genres` column.

### 2. Identifying Most Popular Apps by Genre

To determine the most popular apps, we will use the number of installs for the Google Play data. The App Store data has no install counts, so we use the total number of user ratings (`rating_count_tot`) as a proxy.




Next we are going to calculate the average number of user ratings per app genre on the App Store:

In [18]:
unique_genres = freq_table(free_apps_apple, -5)

for genre in unique_genres:
    total = 0
    len_genre = 0
    
    for app in free_apps_apple:
        genre_app = app[-5]
        if genre_app == genre:            
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)

Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0
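The cell above rescans the whole data set once per genre. A single pass that keeps a running total and count for each genre gives the same averages. A sketch with a hypothetical `average_by_group` helper; the three sample rows reuse names and rating counts from the App Store listings in this section, reduced to three columns:

```python
from collections import defaultdict


def average_by_group(rows, group_index, value_index):
    """Average of float(row[value_index]) for each distinct row[group_index]."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


rows = [
    ['Waze - GPS Navigation, Maps & Real-time Traffic', '345046', 'Navigation'],
    ['Railway Route Search', '5', 'Navigation'],
    ['Bible', '985920', 'Reference'],
]
print(average_by_group(rows, group_index=2, value_index=1))
# {'Navigation': 172525.5, 'Reference': 985920.0}
```

On the real data, `average_by_group(free_apps_apple, group_index=-5, value_index=5)` should reproduce the per-genre averages listed above.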


Navigation has the highest average number of user ratings in the App Store data set. Let's take a closer look at the top five genres: Navigation, Reference, Social Networking, Music and Weather.

In [19]:
for app in free_apps_apple:
    if app[-5] == 'Navigation':
        print(app[1], ':', app[5])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


Waze and Google Maps are clear outliers that inflate the average number of ratings for this genre.
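One way to see how strongly the two outliers pull the average up is to compare the mean with the median, which depends only on the middle values. This uses the six Navigation rating counts listed above:

```python
from statistics import mean, median

# rating_count_tot values for the free Navigation apps listed above
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

print(mean(navigation_ratings))    # about 86090.33, the genre average computed earlier
print(median(navigation_ratings))  # 8196.5, halfway between 3582 and 12811
```

A median about ten times smaller than the mean confirms that a typical Navigation app gets far fewer ratings than the average suggests.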

In [20]:
for app in free_apps_apple:
    if app[-5] == 'Social Networking':
        print(app[1], ':', app[5])

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 2

Social Networking shows a similar pattern: its average number of user ratings is heavily influenced by outliers such as Facebook, Pinterest, Skype and Messenger.

In [21]:
for app in free_apps_apple:
    if app[-5] == 'Music':
        print(app[1], ':', app[5])

Pandora - Music & Radio : 1126879
Spotify Music : 878563
Shazam - Discover music, artists, videos & lyrics : 402925
iHeartRadio – Free Music & Radio Stations : 293228
SoundCloud - Music & Audio : 135744
Magic Piano by Smule : 131695
Smule Sing! : 119316
TuneIn Radio - MLB NBA Audiobooks Podcasts Music : 110420
Amazon Music : 106235
SoundHound Song Search & Music Player : 82602
Sonos Controller : 48905
Bandsintown Concerts : 30845
Karaoke - Sing Karaoke, Unlimited Songs! : 28606
My Mixtapez Music : 26286
Sing Karaoke Songs Unlimited with StarMaker : 26227
Ringtones for iPhone & Ringtone Maker : 25403
Musi - Unlimited Music For YouTube : 25193
AutoRap by Smule : 18202
Spinrilla - Mixtapes For Free : 15053
Napster - Top Music & Radio : 14268
edjing Mix:DJ turntable to remix and scratch music : 13580
Free Music - MP3 Streamer & Playlist Manager Pro : 13443
Free Piano app by Yokee : 13016
Google Play Music : 10118
Certified Mixtapes - Hip Hop Albums & Mixtapes : 9975
TIDAL : 7398
YouTube Mu

In [22]:
for app in free_apps_apple:
    if app[-5] == 'Reference':
        print(app[1], ':', app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


In [23]:
for app in free_apps_apple:
    if app[-5] == 'Weather':
        print(app[1], ':', app[5])

The Weather Channel: Forecast, Radar & Alerts : 495626
The Weather Channel App for iPad – best local forecast, radar map, and storm tracking : 208648
WeatherBug - Local Weather, Radar, Maps, Alerts : 188583
MyRadar NOAA Weather Radar Forecast : 150158
AccuWeather - Weather for Life : 144214
Yahoo Weather : 112603
Weather Underground: Custom Forecast & Local Radar : 49192
NOAA Weather Radar - Weather Forecast & HD Radar : 45696
Weather Live Free - Weather Forecast & Alerts : 35702
Storm Radar : 22792
QuakeFeed Earthquake Map, Alerts, and News : 6081
Moji Weather - Free Weather Forecast : 2333
Hurricane by American Red Cross : 1158
Forecast Bar : 375
Hurricane Tracker WESH 2 Orlando, Central Florida : 203
FEMA : 128
iWeather - World weather forecast : 80
Weather - Radar - Storm with Morecast App : 78
Yurekuru Call : 53
Weather & Radar : 37
WRAL Weather Alert : 25
Météo-France : 24
JaxReady : 22
Freddy the Frogcaster's Weather Station : 14
Almanac Long-Range Weather Forecast : 12
TodayAir

The Reference and Weather genres also have outliers (the Bible app and The Weather Channel, respectively), but relatively few apps offer these services. Let's keep these two genres in mind and check whether the Google Play data set shows a similar pattern.

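Next, let's compute the average number of installs per category in the Google Play data set. The `Installs` column stores open-ended buckets such as '10,000+', so after stripping the commas and the '+' we use each bucket's lower bound. The resulting averages are therefore approximate.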

In [24]:
unique_google = freq_table(free_apps_google, 1)

for category in unique_google:
    total = 0 
    len_category = 0
    
    # remove non numerical characters from number of installs 
    
    for app in free_apps_google:
        category_app = app[1]
        if category_app == category:
            installs = app[5]
            n_install = installs.replace('+', '')
            new_install = n_install.replace(',', '')
            new_install = float(new_install)
            total += new_install
            len_category += 1 
        
    avg_install = total/len_category
    print(category, ':', avg_install)
            

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3697848.1731343283
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_

COMMUNICATION has the highest average number of installs (about 38.5 million). Let's have a closer look and see whether any outliers are driving that figure.

In [25]:
display_table(free_apps_google, 5)

1,000,000+ : 15.728308699086089
100,000+ : 11.55365000564143
10,000,000+ : 10.549475346947986
10,000+ : 10.199706645605326
1,000+ : 8.394448832223853
100+ : 6.916393997517771
5,000,000+ : 6.826131106848697
500,000+ : 5.562450637481666
50,000+ : 4.772650344127271
5,000+ : 4.513144533453684
10+ : 3.542818458761142
500+ : 3.2494640640866526
50,000,000+ : 2.3017037120613786
100,000,000+ : 2.1324607920568655
50+ : 1.9180864267178157
5+ : 0.7898002933543946
1+ : 0.5077287600135394
500,000,000+ : 0.270788672007221
1,000,000,000+ : 0.2256572266726842
0+ : 0.045131445334536835


In [26]:
# display apps with very high numbers of installs 

for app in free_apps_google:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

As expected, a few apps such as WhatsApp, Skype, Messenger and Google Chrome skew the average. Let's exclude the Communication apps with 100 million installs or more and recompute the average number of installs.

In [27]:
# average installs for COMMUNICATION apps, excluding the outliers with 100,000,000+ installs

installs_without_outliers = []

for app in free_apps_google:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'COMMUNICATION') and (float(n_installs) < 100000000):
        installs_without_outliers.append(float(n_installs))
        
sum(installs_without_outliers) / len(installs_without_outliers)


3603485.3884615386

Without these outliers, the average drops more than tenfold, from about 38.5 million to about 3.6 million installs.
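The install parsing and the outlier cut-off are repeated across several cells. They could be wrapped in two small helpers. The names `parse_installs` and `mean_installs` are hypothetical, and the three rows are made up, using the Google Play column layout (category at index 1, installs at index 5):

```python
def parse_installs(installs):
    """Convert an install bucket such as '1,000,000+' to its lower bound as a float."""
    return float(installs.replace(',', '').replace('+', ''))


def mean_installs(rows, category, max_installs=None):
    """Average installs for apps in `category`.

    When `max_installs` is given, apps with that many installs or more are excluded.
    Returns None if no app matches.
    """
    values = []
    for row in rows:
        if row[1] != category:  # category is column 1
            continue
        n_installs = parse_installs(row[5])  # install bucket is column 5
        if max_installs is None or n_installs < max_installs:
            values.append(n_installs)
    if not values:
        return None
    return sum(values) / len(values)


rows = [
    ['App A', 'COMMUNICATION', '', '', '', '1,000,000,000+'],
    ['App B', 'COMMUNICATION', '', '', '', '1,000,000+'],
    ['App C', 'COMMUNICATION', '', '', '', '5,000,000+'],
]
print(mean_installs(rows, 'COMMUNICATION'))
print(mean_installs(rows, 'COMMUNICATION', max_installs=100000000))  # 3000000.0
```

Since the helpers apply the same parsing and the same `< 100000000` cut-off, `mean_installs(free_apps_google, 'COMMUNICATION', max_installs=100000000)` should give the same result as the cell above.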

The BOOKS_AND_REFERENCE category is also popular on Android (about 8.8 million installs on average), and Reference was one of the top genres on iOS, so we will focus on this category. Let's have a closer look.

In [28]:
# print apps inside BOOKS_AND_REFERENCE

for app in free_apps_google:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E

Let's repeat the above process to check for outliers.

In [29]:
for app in free_apps_google:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad 📖 Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+


Only a handful of apps in this category have 100 million installs or more. Let's look at the apps in the middle range, from 1,000,000+ to 50,000,000+ installs.

In [30]:
for app in free_apps_google:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000+'
                                            or app[5] == '5,000,000+'
                                            or app[5] == '10,000,000+'
                                            or app[5] == '50,000,000+'):
        print(app[0], ':', app[5])

Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
ReadEra – free ebook reader : 1,000,000+
Ebook Reader : 5,000,000+
Read books online : 5,000,000+
eBoox: book reader fb2 epub zip : 1,000,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
Moon+ Reader : 10,000,000+
English-Myanmar Dictionary : 1,000,000+
Golden Dictionary (EN-AR) : 1,000,000+
All Language Translator Free : 1,000,000+
Aldiko Book Reader : 10,000,000+
Dictionary - WordWeb : 5,000,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Al-Quran (Free) : 10,000,000+
Al Quran Indonesia : 10,000,000+
Al'Quran Bahasa Indonesia : 10,000,000+
Al Quran Al karim : 1,000,000+
Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+
Koran Read &MP3 30 Juz Offline : 1,000,000+
H

Dictionaries appear to be very popular, and there are several Al-Quran apps and a few other niches. There is only one app for favourite books, which could be a niche our client can explore: an app that gives readers a forum to discuss and review books, add their favourite books and share ideas with other people.
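The chained `or` comparisons used to select install buckets in the cells above can be replaced by a set membership test, which keeps the list of buckets in one place. A sketch with a hypothetical `apps_in_buckets` helper; the three sample rows use names and install buckets from the BOOKS_AND_REFERENCE listing above, with the unused columns left empty:

```python
# install buckets for the middle range used above, stored in a set
MID_RANGE_BUCKETS = {'1,000,000+', '5,000,000+', '10,000,000+', '50,000,000+'}


def apps_in_buckets(rows, category, buckets):
    """Return (name, installs) for apps in `category` whose install bucket is in `buckets`."""
    return [(row[0], row[5]) for row in rows
            if row[1] == category and row[5] in buckets]


rows = [
    ['Wikipedia', 'BOOKS_AND_REFERENCE', '', '', '', '10,000,000+'],
    ['Google Play Books', 'BOOKS_AND_REFERENCE', '', '', '', '1,000,000,000+'],
    ['Flybook', 'BOOKS_AND_REFERENCE', '', '', '', '500,000+'],
]
for name, installs in apps_in_buckets(rows, 'BOOKS_AND_REFERENCE', MID_RANGE_BUCKETS):
    print(name, ':', installs)
# Wikipedia : 10,000,000+
```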

# Conclusion

We analysed mobile app data for two markets: the App Store and Google Play. The purpose of the analysis was to recommend an app profile that would be profitable to develop for both markets.

The analysis concluded that offering an app which would give readers a forum to share ideas, reviews and exchange favourite books lists could be profitable for both markets.