# An Analysis of Android and iOS Mobile Apps

The main aim of the project is to find mobile app profiles that are profitable for the App Store and Google Play markets. The analysis should enable data-driven decisions about the kinds of apps a developer should build.

The apps we consider are free to download and install, and their main source of revenue is in-app ads.
This means that the revenue for any given app is mostly influenced by the number of people who use it.
The goal of the project is to analyse data to help developers understand what kinds of apps are
likely to attract more users.

## Opening and Exploring the Data

As of September 2018, there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play.

Collecting data for over four million apps requires a significant amount of time and money, so we'll try to analyze a sample of data instead. To avoid spending resources on collecting new data ourselves, we should first try to see whether we can find any relevant existing data at no cost. Luckily, there are two data sets that seem suitable for our purpose:

- [A data set](https://www.kaggle.com/lava18/google-play-store-apps) containing data about approximately ten thousand Android apps from Google Play. You can download the data set directly from [this link](https://dqcontent.s3.amazonaws.com/350/googleplaystore.csv).

- [A data set](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps) containing data about approximately seven thousand iOS apps from the App Store. You can download the data set directly from [this link](https://dqcontent.s3.amazonaws.com/350/AppleStore.csv).

Let's start by opening the two data sets and then continue with exploring the data.


In [12]:
from csv import reader

opened_file = open('AppleStore.csv')
read_file = reader(opened_file)
ios = list(read_file)
ios_header = ios[0]
ios = ios[1:]

opened_file = open('googleplaystore.csv')
read_file = reader(opened_file)
android = list(read_file)
android_header = android[0]
android = android[1:]
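The two loading blocks above repeat the same steps. As a minimal sketch, they could be folded into one helper. The name `load_csv` and the `encoding='utf8'` default are assumptions for illustration and are not part of the original notebook; the `with` block simply makes sure the file is closed once it has been read.

```python
from csv import reader

def load_csv(path, encoding='utf8'):
    """Return (header, rows) for a CSV file whose first row is a header.

    `load_csv` is a hypothetical helper; the encoding default is an assumption.
    """
    with open(path, encoding=encoding, newline='') as f:
        rows = list(reader(f))
    return rows[0], rows[1:]
```

With the files used here, a call might look like `ios_header, ios = load_csv('AppleStore.csv')`.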


In [16]:
### explore data function used to repeatedly print rows in a readable way

def extract_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')
        
    if rows_and_columns:
        print('Number of rows:' , len(dataset))
        print('Number of columns:', len(dataset[0]))
        
print(android_header)
print('\n')
extract_data(android, 0, 3, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


In [18]:
### getting ios data

print(ios_header)
print('\n')
extract_data(ios, 0, 3, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


In [20]:
print(android[10472])  # incorrect row
print('\n')
print(android_header)  # header
print('\n')
print(android[0])      # correct row

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


In [23]:
### deleting row

print(len(android))
del android[10472]
print(len(android))

10840
10839
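The faulty row printed above has 12 fields while the header has 13, because its Category value is missing. Rather than looking up the index by hand, rows like this can be found by comparing each row's length with the header's. The sketch below uses a tiny made-up sample, and `find_malformed_rows` is a hypothetical helper name.

```python
def find_malformed_rows(header, rows):
    """Return the indices of rows whose field count differs from the header's."""
    expected = len(header)
    return [i for i, row in enumerate(rows) if len(row) != expected]

# Tiny illustrative sample (not the real Google Play data).
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['Good app', 'TOOLS', '4.1'],
    ['Broken app', '1.9'],          # Category is missing, so the row is short
    ['Other app', 'GAME', '3.9'],
]
print(find_malformed_rows(sample_header, sample_rows))  # [1]
```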


In [29]:
### finding duplicate apps based on the name of the applications

duplicate_apps = [] ## storing duplicate applications
unique_apps = [] ## storing unique applications

for apps in android:
    name = apps[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[:15])

Number of duplicate apps: 1180


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


In [37]:
### building a dictionary that maps each app name to its highest review count

reviews_max = {}

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
        
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
        
print('Actual Length:', len(reviews_max))        

Actual Length: 9659


In [44]:
### initialising empty lists

android_clean = []

already_added = []

for app in android: ### keep one row per app name: the one with the most reviews
    name = app[0]
    n_reviews = float(app[3])
    
    if (reviews_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name) 
        
### explore data_set        
extract_data(android_clean, 0, 3, True)        

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13
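The two cells above first record the highest review count per app and then make a second pass to pick the matching row. The same idea can be sketched in a single pass with a dictionary keyed by app name. `dedupe_by_max_reviews` is a hypothetical name, and the sample rows are made up with the review count placed at index 3 to mirror the Google Play layout. One difference to be aware of: the result is ordered by the first appearance of each name, which may not match the row order produced by the cells above.

```python
def dedupe_by_max_reviews(rows, name_idx=0, reviews_idx=3):
    """Keep one row per app name: the one with the most reviews.

    On ties the first row seen is kept.
    """
    best = {}  # app name -> row with the highest review count so far
    for row in rows:
        name = row[name_idx]
        if name not in best or float(row[reviews_idx]) > float(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())

# Made-up sample rows: [App, Category, Rating, Reviews]
sample = [
    ['Box', 'BUSINESS', '4.2', '100'],
    ['Slack', 'BUSINESS', '4.4', '50'],
    ['Box', 'BUSINESS', '4.2', '120'],
]
for row in dedupe_by_max_reviews(sample):
    print(row)
```

A dictionary lookup also avoids the repeated `name in list` checks, which scan the whole list each time.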


## Removing Non-English Apps

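Next, we remove apps whose names are not in English. English text normally uses only ASCII characters, whose code points range from 0 to 127, so we use the built-in `ord()` function to reject any name that contains a character with a code point above 127. The check is strict: a single symbol such as ™ or an emoji is enough to reject an otherwise English name, as the last two examples below show.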

In [49]:

def non_english(words):
    # Despite the name, returns True when every character is ASCII (code point <= 127)
    for i in words:
        if ord(i) > 127:
            return False
    return True

print(non_english('Instagram'))
print(non_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(non_english('Docs To Go™ Free Office Suite'))
print(non_english('Instachat 😜'))

True
False
False
False
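The last two results show that the strict check also rejects English names containing a trademark sign or an emoji. One possible relaxation, sketched below, tolerates a small number of non-ASCII characters. The name `is_mostly_ascii` and the threshold of 3 are assumptions chosen for illustration. The row counts reported in the rest of this document come from the strict filter, so swapping in this variant would change them.

```python
def is_mostly_ascii(name, max_non_ascii=3):
    """Return True unless the name has more than `max_non_ascii` non-ASCII characters."""
    non_ascii = sum(1 for ch in name if ord(ch) > 127)
    return non_ascii <= max_non_ascii

print(is_mostly_ascii('Instagram'))                      # True
print(is_mostly_ascii('爱奇艺PPS -《欢乐颂2》电视剧热播'))   # False
print(is_mostly_ascii('Docs To Go™ Free Office Suite'))  # True
print(is_mostly_ascii('Instachat 😜'))                   # True
```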


In [52]:
android_english = []
ios_english = []

for app in android_clean:
    name = app[0]
    if non_english(name):
        android_english.append(app)

for app in ios:
    name = app[1]
    if non_english(name):
        ios_english.append(app)

extract_data(android_english, 0, 3, True)
print('\n')
extract_data(ios_english, 0, 3, True)


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


Number of rows: 9117
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '3

## Isolating the Free Apps

In [58]:
android_final = []
ios_final = []

for app in android_english:
    price = app[7]
    if price == '0':
        android_final.append(app)
    
for app in ios_english:
    price = app[4]
    if price == '0.0':
        ios_final.append(app)
        
print(len(android_final))
print(len(ios_final))

8408
2922
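The cell above compares price strings directly (`'0'` for Google Play, `'0.0'` for the App Store), which works only as long as free apps are always written exactly that way. A more tolerant check, sketched here, parses the price as a number. `is_free` is a hypothetical helper, and the handling of a leading `$` assumes that paid prices may be formatted like `'$4.99'`.

```python
def is_free(price):
    """Return True when a price string such as '0', '0.0' or '$0.00' equals zero."""
    cleaned = price.strip().lstrip('$')
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Unparseable prices are treated as not free rather than raising.
        return False

print(is_free('0'))      # True
print(is_free('0.0'))    # True
print(is_free('$4.99'))  # False
```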


## Most Common Apps By Genre

To see which kinds of apps are most common, we look at the genre distribution in both markets.

We build frequency tables for the `prime_genre` column of the App Store data set and for the `Genres` and `Category` columns of the Google Play data set.

Tasks:
- Generate frequency tables that show percentages
- Create a function that displays the percentages in descending order

In [71]:
def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
      
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage
        
    return table_percentages

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
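For comparison, the same percentage table can be sketched with `collections.Counter` from the standard library, whose `most_common()` method already returns entries from most to least frequent. The function name `freq_table_counter` and the sample rows are made up for illustration.

```python
from collections import Counter

def freq_table_counter(dataset, index):
    """Return {value: percentage} for one column, ordered from most to least common."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return {value: count / total * 100 for value, count in counts.most_common()}

# Made-up sample rows: [name, genre]
sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
for genre, pct in freq_table_counter(sample, 1).items():
    print(genre, ':', pct)
# Games : 75.0
# Music : 25.0
```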

Examining the frequency table for the `prime_genre` column of the App Store data set:

In [72]:
display_table(ios_final, -5)

Games : 59.171800136892536
Entertainment : 7.529089664613278
Photo & Video : 5.133470225872689
Education : 3.8329911019849416
Social Networking : 3.1143052703627654
Shopping : 2.4982888432580426
Utilities : 2.2587268993839835
Music : 2.1560574948665296
Sports : 2.0533880903490758
Health & Fitness : 1.9849418206707734
Productivity : 1.7111567419575633
Lifestyle : 1.4715947980835045
News : 1.3347022587268993
Travel : 1.1293634496919918
Finance : 1.0951403148528405
Weather : 0.8898015058179329
Food & Drink : 0.8898015058179329
Reference : 0.5133470225872689
Business : 0.5133470225872689
Book : 0.2737850787132101
Medical : 0.20533880903490762
Navigation : 0.13689253935660506
Catalogs : 0.10266940451745381


About 59% of the free English apps are games, 7.5% are entertainment apps, and 5% are photo and video apps, followed by education (3.8%) and social networking (3.1%).

The free English apps on the App Store are therefore concentrated in a few genres: games dominate, followed at a distance by entertainment, photo and video, education, social networking, and shopping.

However, this reflects only the supply of apps in each genre on the App Store, not the demand from users.

In [78]:
display_table(android_final, 1)

FAMILY : 18.803520456707897
GAME : 9.60989533777355
TOOLS : 8.575166508087536
BUSINESS : 4.709800190294957
PRODUCTIVITY : 3.9724072312083734
LIFESTYLE : 3.8891531874405327
FINANCE : 3.73453853472883
MEDICAL : 3.6393910561370126
PERSONALIZATION : 3.306374881065652
SPORTS : 3.258801141769743
COMMUNICATION : 3.2231208372978117
HEALTH_AND_FITNESS : 3.1279733587059946
PHOTOGRAPHY : 3.0090390104662226
NEWS_AND_MAGAZINES : 2.7949571836346334
SOCIAL : 2.664129400570885
TRAVEL_AND_LOCAL : 2.3073263558515698
SHOPPING : 2.247859181731684
BOOKS_AND_REFERENCE : 2.1883920076117986
DATING : 1.8315889628924835
VIDEO_PLAYERS : 1.7602283539486203
MAPS_AND_NAVIGATION : 1.3558515699333968
FOOD_AND_DRINK : 1.2012369172216937
EDUCATION : 1.165556612749762
ENTERTAINMENT : 0.939581351094196
AUTO_AND_VEHICLES : 0.939581351094196
LIBRARIES_AND_DEMO : 0.9039010466222646
HOUSE_AND_HOME : 0.8087535680304472
WEATHER : 0.7968601332064701
EVENTS : 0.7136060894386299
ART_AND_DESIGN : 0.6660323501427212
PARENTING : 0.6

The `Category` column of the Google Play data set shows that the most common categories are family, games, tools, and business.

In [77]:
display_table(android_final, -4)

Tools : 8.563273073263558
Entertainment : 6.089438629876309
Education : 5.387725975261656
Business : 4.709800190294957
Productivity : 3.9724072312083734
Lifestyle : 3.8772597526165553
Finance : 3.73453853472883
Medical : 3.6393910561370126
Sports : 3.3301617507136063
Personalization : 3.306374881065652
Communication : 3.2231208372978117
Health & Fitness : 3.1279733587059946
Action : 3.116079923882017
Photography : 3.0090390104662226
News & Magazines : 2.7949571836346334
Social : 2.664129400570885
Travel & Local : 2.3073263558515698
Shopping : 2.247859181731684
Books & Reference : 2.1883920076117986
Simulation : 2.0813510941960036
Dating : 1.8315889628924835
Arcade : 1.8315889628924835
Casual : 1.7721217887725977
Video Players & Editors : 1.736441484300666
Maps & Navigation : 1.3558515699333968
Food & Drink : 1.2012369172216937
Puzzle : 1.1298763082778307
Racing : 1.0228353948620361
Role Playing : 0.939581351094196
Auto & Vehicles : 0.939581351094196
Strategy : 0.9039010466222646
Librar

The `Genres` column is more granular than `Category`, and in it tools is the most common genre.

## Most Popular Apps by Genre on the App Store

One way to find the most popular genres is to calculate the average number of installs for each genre. In the Google Play data set, this information is in the `Installs` column. The App Store data set has no install counts, so as a proxy we use the total number of user ratings, which is in the `rating_count_tot` column.

In [80]:
genre_ios = freq_table(ios_final, -5)

for genre in genre_ios:
    total = 0      # sum of rating counts for this genre
    len_genre = 0  # number of apps in this genre
    for app in ios_final:
        genre_app = app[-5]  # prime_genre of the app
        if genre_app == genre:
            n_ratings = float(app[5])  # rating_count_tot: number of user ratings
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    print(genre, ":", avg_n_ratings)
    

Social Networking : 78567.30769230769
Photo & Video : 29249.766666666666
Games : 21560.75072296125
Music : 55396.01587301587
Reference : 89562.6
Health & Fitness : 19418.620689655174
Weather : 48275.57692307692
Travel : 34115.57575757576
Shopping : 28877.575342465752
News : 23382.17948717949
Navigation : 125037.25
Lifestyle : 17260.53488372093
Entertainment : 15006.227272727272
Food & Drink : 33333.92307692308
Sports : 25791.666666666668
Finance : 26038.6875
Education : 6103.464285714285
Productivity : 22842.22
Utilities : 11571.69696969697
Book : 16671.0
Business : 6839.6
Catalogs : 5195.0
Medical : 612.0
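The nested loop above scans the whole data set once per genre. A single pass that accumulates a running total and a count per genre gives the same averages; the sketch below shows one way to do it. `average_by_group` is a hypothetical helper, and the sample rows reuse a few rating counts that appear elsewhere in this section.

```python
from collections import defaultdict

def average_by_group(rows, group_idx, value_idx):
    """Return {group: mean of the value column} using one pass over the rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_idx]
        totals[group] += float(row[value_idx])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}

# Sample rows: [name, rating count, genre]
sample = [
    ['Waze', '345046', 'Navigation'],
    ['Google Maps', '154911', 'Navigation'],
    ['ImmobilienScout24', '187', 'Navigation'],
    ['Railway Route Search', '5', 'Navigation'],
    ['Bible', '985920', 'Reference'],
]
print(average_by_group(sample, 2, 1))
# {'Navigation': 125037.25, 'Reference': 985920.0}
```

With the notebook's data, the equivalent call would be `average_by_group(ios_final, -5, 5)`.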


Navigation apps have the highest average number of user ratings.

In [81]:
for app in ios_final:
    if app[-5] == 'Navigation':
        # print app and number of ratings
        print(app[1], ':', app[5]) 

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


The Navigation average is based on just four apps, and nearly all of the ratings belong to Waze (about 345,000) and Google Maps (about 155,000).
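To make the skew concrete, the mean can be compared with the median, and the share of ratings held by the top app can be computed. The sketch below uses the four Navigation rating counts printed above and the standard `statistics` module.

```python
from statistics import mean, median

# Rating counts for the four Navigation apps printed above.
navigation_counts = [345046, 154911, 187, 5]

print(mean(navigation_counts))    # 125037.25
print(median(navigation_counts))  # 77549.0

# Share of all Navigation ratings held by the single most-rated app.
top_share = max(navigation_counts) / sum(navigation_counts)
print(round(top_share, 2))        # 0.69
```

A mean far above the median is a quick sign that a few large values are pulling the average up.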

In [87]:
for app in ios_final:
    if app[-5] == 'Social Networking':
        # print social networking apps and number of ratings
        print(app[1], ':', app[5]) 
        
        
for app in ios_final:
    if app[-5] == 'Music':
        print('\n')
        # print music apps and number of ratings
        print(app[1], ':', app[5])         

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
TextNow - Unlimited Text + Calls : 164963
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 23965
SimSimi : 23530
Grindr - Gay and same sex guys chat, meet and date : 23201
Wishbone - Compare Anything : 20649
imo 

In genres such as music, social networking, and navigation, a small number of very popular apps account for the vast majority of the ratings, so the genre averages overstate how popular a typical app is.

In [89]:
for app in ios_final:
    if app[-5] == 'Reference':
        # print Reference apps and their number of ratings
        print(app[1], ":", app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
Jishokun-Japanese English Dictionary & Translator : 0


Reference apps average about 89,563 user ratings, but the Bible and Dictionary.com skew that figure upward. Even so, this niche seems to show some potential. One thing we could do is take another popular book and turn it into an app where we could add different features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes about the book, etc. On top of that, we could also embed a dictionary within the app, so users don't need to exit our app to look up words in an external app.

This idea seems to fit well with the fact that the App Store is dominated by for-fun apps. This suggests the market might be a bit saturated with for-fun apps, which means a practical app might have more of a chance to stand out among the huge number of apps on the App Store.

Other genres that seem popular include weather, book, food and drink, and finance. The book genre seems to overlap a bit with the app idea we described above, but the other genres don't seem too interesting to us:

- Weather apps: people generally don't spend much time in-app, so the chances of making a profit from in-app ads are low. Also, getting reliable live weather data may require us to connect our apps to non-free APIs.

- Food and drink: examples here include Starbucks, Dunkin' Donuts, McDonald's, etc. So making a popular food and drink app requires actual cooking and a delivery service, which is outside the scope of our company.

- Finance apps: these apps involve banking, paying bills, money transfer, etc. Building a finance app requires domain knowledge, and we don't want to hire a finance expert just to build an app.

## Most Popular Apps by Genre on Google Play

Next, we look at which app genres are most popular on Google Play, starting with the distribution of the `Installs` column.

In [92]:
display_table(android_final, 5) # number of installs

1,000,000+ : 15.592293054234062
100,000+ : 11.596098953377735
10,000+ : 10.442435775451951
10,000,000+ : 10.323501427212179
1,000+ : 8.480019029495718
100+ : 7.088487155090391
5,000,000+ : 6.660323501427213
500,000+ : 5.5542340627973354
50,000+ : 4.7216936251189345
5,000+ : 4.5313986679353
10+ : 3.5442435775451955
500+ : 3.246907706945766
50,000,000+ : 2.2121788772597526
100,000,000+ : 2.1289248334919124
50+ : 1.9743101807802093
5+ : 0.8206470028544244
1+ : 0.5114176974310181
500,000,000+ : 0.285442435775452
1,000,000,000+ : 0.22597526165556614
0+ : 0.04757373929590866
0 : 0.011893434823977166


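The install counts are open-ended buckets (100+, 1,000+, 5,000+, and so on), so we don't know the exact number of installs for any app. For our purposes, the lower bound of each bucket is precise enough.

To compute with these values, we remove the commas and plus signs and convert each string to a float.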
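The conversion can be sketched as a small helper so the same cleanup is not repeated in every cell. `parse_installs` is a hypothetical name; the result is the lower bound of the bucket, not an exact install count.

```python
def parse_installs(text):
    """Convert an Installs string such as '1,000,000+' to a float lower bound."""
    return float(text.replace(',', '').replace('+', ''))

print(parse_installs('1,000,000+'))  # 1000000.0
print(parse_installs('0'))           # 0.0
```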

In [94]:
categories_android = freq_table(android_final, 1)

for category in categories_android:
    total = 0
    len_category = 0
    for app in android_final:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total/ len_category
    print(category, ':', avg_n_installs)

ART_AND_DESIGN : 1932519.642857143
AUTO_AND_VEHICLES : 645317.2278481013
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8504745.97826087
BUSINESS : 1602958.308080808
COMICS : 880440.625
COMMUNICATION : 36106662.328413285
DATING : 764959.4610389611
EDUCATION : 1844897.9591836734
ENTERTAINMENT : 12346329.11392405
EVENTS : 232885.83333333334
FINANCE : 1348224.9426751593
FOOD_AND_DRINK : 1974937.1386138613
HEALTH_AND_FITNESS : 4263642.1749049425
HOUSE_AND_HOME : 1391211.1911764706
LIBRARIES_AND_DEMO : 674917.2368421053
LIFESTYLE : 1375297.3058103975
GAME : 15434835.816831684
FAMILY : 3633707.342820999
MEDICAL : 119216.81045751635
SOCIAL : 24441088.17857143
SHOPPING : 7307823.2010582015
PHOTOGRAPHY : 18099283.85375494
SPORTS : 3647640.208029197
TRAVEL_AND_LOCAL : 14487541.68041237
TOOLS : 11084333.292649098
PERSONALIZATION : 5027006.791366907
PRODUCTIVITY : 16972497.946107786
PARENTING : 544745.6363636364
WEATHER : 5219216.7164179105
VIDEO_PLAYERS : 25234606.216216218
NEWS_AND_MAGAZINES 

On average, communication apps have the most installs: 36,106,662. This number is heavily skewed by a few apps with over one billion installs (WhatsApp, Skype, Google Chrome, Gmail, and Hangouts) and several others with over 100 million or 500 million installs.

In [107]:
for app in android_final:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+' or app[5] == '500,000,000+' or app[5] == '100,000,000+'):
        print(app[0], ":", app[5])    

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Messenger : 500,000,000+
WeChat : 100,000,000+
BBM - Free Call

If we leave out the communication apps with one billion or more installs, the average drops by about half:

In [109]:
under_1_b = []  # installs of communication apps below one billion

for app in android_final:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'COMMUNICATION') and (float(n_installs) < 1000000000):
        under_1_b.append(float(n_installs))
sum(under_1_b) / len(under_1_b)

17988366.507518798
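The result depends heavily on where the cutoff is placed; the cell above compares against 1,000,000,000. A sketch that takes the cutoff as a parameter makes it easy to try other values, such as 100,000,000. `average_installs_below` is a hypothetical helper, and the sample rows are made up.

```python
def average_installs_below(rows, category, limit, category_idx=1, installs_idx=5):
    """Average installs for one category, ignoring apps at or above `limit`."""
    kept = []
    for row in rows:
        if row[category_idx] != category:
            continue
        installs = float(row[installs_idx].replace(',', '').replace('+', ''))
        if installs < limit:
            kept.append(installs)
    # Return None instead of dividing by zero when nothing passes the filter.
    return sum(kept) / len(kept) if kept else None

# Made-up sample rows: [App, Category, Installs]
sample = [
    ['A', 'COMMUNICATION', '1,000,000,000+'],
    ['B', 'COMMUNICATION', '100,000,000+'],
    ['C', 'COMMUNICATION', '1,000+'],
    ['D', 'TOOLS', '500+'],
]
print(average_installs_below(sample, 'COMMUNICATION', 1_000_000_000, installs_idx=2))  # 50000500.0
print(average_installs_below(sample, 'COMMUNICATION', 100_000_000, installs_idx=2))    # 1000.0
```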

In [110]:
# exploring data on apps in the books and reference category

for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
English translation from Bengali : 100,000

In [112]:

for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000,000+' or app[5] == '500,000,000+'
    or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Audiobooks from Audible : 100,000,000+


The book and reference genre includes a variety of apps: software for processing and reading ebooks, various collections of libraries, dictionaries, tutorials on programming or languages, etc. It seems there's still a small number of extremely popular apps that skew the average.

In [113]:
for app in android_final:    
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000+'
    or app[5] == '5,000,000+' or app[5] == '10,000,000+' or app[5] == '50,000,000+'):
        print(app[0], ':', app[5])

Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Ebook Reader : 5,000,000+
Read books online : 5,000,000+
eBoox: book reader fb2 epub zip : 1,000,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
Moon+ Reader : 10,000,000+
English-Myanmar Dictionary : 1,000,000+
Golden Dictionary (EN-AR) : 1,000,000+
All Language Translator Free : 1,000,000+
Aldiko Book Reader : 10,000,000+
Dictionary - WordWeb : 5,000,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Al-Quran (Free) : 10,000,000+
Al Quran Indonesia : 10,000,000+
Al'Quran Bahasa Indonesia : 10,000,000+
Al Quran Al karim : 1,000,000+
Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+
Koran Read &MP3 30 Juz Offline : 1,000,000+
Hafizi Quran 15 lines per page : 1,000,000

Apps in this category with between 1 million and 100 million installs are largely ebook readers, library collections, and dictionaries. A few apps are built around the Quran, which suggests that building an app around a popular book can be profitable. Features beyond the original text of the book would add value and make the app more useful to potential users. Quizzes, daily quotes, and audio snippets might prove to be valuable additions.

## Conclusion

The project was centered around analysing data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.

The conclusion was that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets. The markets are already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.