# Data Analysis for a Mobile App Company

This analysis is for a company that builds Android and iOS apps and publishes them on the Google Play Store and the App Store, but only for English-speaking audiences. The apps themselves are generally free, so the company makes its revenue through in-app ads. As such, the more downloads and ad engagement an app gets, the more revenue it brings in.

We intend to analyze available datasets to help developers build the right kind of mobile apps to maximise downloads and ad engagement.

As of September 2018, there were approximately 2 million apps available on the App Store and 2.1 million apps on the Google Play Store.

I use the datasets at https://dq-content.s3.amazonaws.com/350/googleplaystore.csv and https://dq-content.s3.amazonaws.com/350/AppleStore.csv, which contain the information needed to form the basis of this analysis.

In [347]:
from csv import reader

# Read each CSV file into a list of rows; the first row is the header.
# The with-blocks close the files once they have been read.
with open('AppleStore.csv', encoding='utf8') as ios_file:
    ios = list(reader(ios_file))

with open('googleplaystore.csv', encoding='utf8') as android_file:
    android = list(reader(android_file))

The function below helps quickly explore the data

In [348]:
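def explore_data(dataset, start, end, rows_and_columns=False):
    # Print the rows in dataset[start:end], each followed by blank lines.
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')

    # Optionally report the size of the whole dataset passed in.
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns', len(dataset[0]))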

The first few rows, along with the number of rows and columns, for each dataset:

In [349]:
explore_data(android, 0, 5, rows_and_columns=True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 10842
Number of columns 13


In [350]:
explore_data(ios, 0, 5, rows_and_columns=True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


Number of rows: 7198
Number of columns 16


Given that we only build free apps for English-speaking audiences, there are a few checks to run. But first I check whether any rows are missing columns, which would cause a column shift. The header row of each dataset is the reference: any row with more or fewer columns than its dataset's header row needs to be fixed.

In [351]:
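def check_num_of_cols(data_set):
    header_length = len(data_set[0])

    # enumerate gives each row's position directly; list.index would return
    # the first equal row, which is misleading if a malformed row repeats.
    for index, row in enumerate(data_set):
        if len(row) != header_length:
            print(row)
            print(index)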
            
check_num_of_cols(ios)


In [352]:
check_num_of_cols(android)

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
10473


The check above shows that the iOS data has no rows with a missing column, but the Android data has one row with 12 columns instead of 13. We can either delete this data point or insert a null value. Since it's a single data point out of 10842 rows, it is unlikely to be significant, and we may have to remove null values eventually anyway, so I delete it.

In [353]:
del android[10473]
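One caveat with deleting by position is that re-running the cell removes whatever row now sits at that index. A minimal sketch of an alternative, assuming we simply want to keep every row whose length matches the header, is shown below. The function name `drop_malformed_rows` and the sample rows are hypothetical and only illustrate the idea.

```python
def drop_malformed_rows(rows):
    """Return only the rows whose length matches the header row (rows[0])."""
    expected = len(rows[0])
    return [row for row in rows if len(row) == expected]


# Hypothetical miniature dataset: the last row is missing one field.
sample = [
    ['App', 'Category', 'Rating'],
    ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5'],
    ['Broken Row', '1.9'],
]

cleaned = drop_malformed_rows(sample)
print(len(cleaned))

# Running the filter a second time changes nothing, unlike del by index.
print(drop_malformed_rows(cleaned) == cleaned)
```

Because the filter depends on the row contents rather than on a fixed position, it is safe to run more than once.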

To clean the data further, we check for apps that have duplicate entries, since we don't want to count a single app more than once when we analyze the data.

In [354]:
unique_android = []
duplicate_android = []

for app in android:
    app_name = app[0]
    if app_name in unique_android:
        duplicate_android.append(app_name)
    else:
        unique_android.append(app_name)
print(len(duplicate_android))
print(len(unique_android))
print(duplicate_android[:4])#examples of duplicate apps

1181
9660
['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings']


In [355]:
unique_ios = []
duplicate_ios = []

for app in ios:
    app_name = app[0]
    if app_name in unique_ios:
        duplicate_ios.append(app_name)
    else:
        unique_ios.append(app_name)
print(len(duplicate_ios))

0
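The two duplicate checks above test membership in a list, which gets slower as the list grows. A possible variant, sketched here under the assumption that only the name (or id) column matters, tracks the names already seen in a set. The helper name `split_duplicates` and the sample rows are hypothetical.

```python
def split_duplicates(rows, name_index):
    """Return (unique_names, duplicate_names) for the given column."""
    seen = set()            # set membership tests are fast on average
    unique_names = []
    duplicate_names = []
    for row in rows:
        name = row[name_index]
        if name in seen:
            duplicate_names.append(name)
        else:
            seen.add(name)
            unique_names.append(name)
    return unique_names, duplicate_names


# Hypothetical rows with a single column holding the app name.
sample = [['Box'], ['Box'], ['Gmail'], ['Box']]
unique_names, duplicate_names = split_duplicates(sample, 0)
print(unique_names)
print(duplicate_names)
```

The counts it produces should match the list-based version, since only the lookup structure differs.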


The Android data has 1181 duplicate entries. We examine a few of them to see whether there are any interesting patterns.

In [356]:
for app in android:
    name = app[0]
    if name == 'Quick PDF Scanner + OCR FREE':
        print(app)

['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']
['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']
['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80804', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']


In [357]:
for app in android:
    name = app[0]
    if name == 'Box':
        print(app)

['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 31, 2018', 'Varies with device', 'Varies with device']


In [358]:
for app in android:
    name = app[0]
    if name == 'Google My Business':
        print(app)

['Google My Business', 'BUSINESS', '4.4', '70991', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 24, 2018', '2.19.0.204537701', '4.4 and up']
['Google My Business', 'BUSINESS', '4.4', '70991', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 24, 2018', '2.19.0.204537701', '4.4 and up']
['Google My Business', 'BUSINESS', '4.4', '70991', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 24, 2018', '2.19.0.204537701', '4.4 and up']


In [359]:

for app in android:
    name = app[0]
    if name == 'ZOOM Cloud Meetings':
        print(app)

['ZOOM Cloud Meetings', 'BUSINESS', '4.4', '31614', '37M', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 20, 2018', '4.1.28165.0716', '4.0 and up']
['ZOOM Cloud Meetings', 'BUSINESS', '4.4', '31614', '37M', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 20, 2018', '4.1.28165.0716', '4.0 and up']


Instead of deleting duplicates at random, we keep the row with the highest number of reviews for each app and discard the rest, since the highest review count likely belongs to the most recent entry.


In [360]:
len(android)

10841

In [361]:
max_reviews = {}
for app in android:
    app_name = app[0]
    app_review = app[3]
    if app_name in max_reviews and max_reviews[app_name] < app_review:
        max_reviews[app_name] = app_review
    elif app_name not in max_reviews:
        max_reviews[app_name] = app_review
        
print(len(android) - 1181)
print(len(max_reviews))
      

9660
9660
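One thing worth noting about the cell above: the review counts are still strings when they are compared, and Python compares strings character by character rather than by numeric value. The sketch below shows the difference and a numeric variant of the same pass. The function name `highest_review_counts` and the sample rows are hypothetical, and the header row is assumed to have been excluded, since `float('Reviews')` would raise an error.

```python
# String comparison looks at characters, so '99' sorts after '100'.
print('99' < '100')
print(float('99') < float('100'))


def highest_review_counts(rows, name_index=0, reviews_index=3):
    """Map each app name to its highest review count, compared numerically."""
    best = {}
    for row in rows:
        name = row[name_index]
        reviews = float(row[reviews_index])
        if name not in best or best[name] < reviews:
            best[name] = reviews
    return best


# Hypothetical rows (no header); only columns 0 and 3 are used.
sample = [
    ['App A', '', '', '99'],
    ['App A', '', '', '100'],
    ['App B', '', '', '5'],
]
best = highest_review_counts(sample)
print(best)
```

The number of unique names is the same either way; only the row picked for an app can differ when its duplicate rows have review counts with different numbers of digits.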


Below I iterate through the Android dataset. For each row, I add the app to the android_clean list and its name to the already_added list if and only if the row's review count matches the value stored for that app in the max_reviews dictionary and the name has not already been added.

In [362]:
android_clean = []
already_added = []
for app in android:
    app_name = app[0]
    app_review = app[3]
    if app_review == max_reviews[app_name] and app_name not in already_added:
        android_clean.append(app)
        already_added.append(app_name)
        
len(android_clean)      

9660

Now that the duplicate entries are gone, I remove all apps whose names suggest they aren't targeted at English-speaking audiences. I do this by checking each character's code point with ord(). The ASCII standard covers code points 0 to 127, which include the characters commonly used in English text, so a character with a code point greater than 127 is treated as non-English.

In [363]:
def check_if_english(string):
    for character in string:  
        if ord(character) > 127:
            return False
    return True
        
    
print(check_if_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(check_if_english('Instachat 😜'))
print(check_if_english('Docs To Go™ Free Office Suite'))
print(ord('😜'))
print(ord('™'))

False
False
False
128540
8482


Characters such as emoji and ™ cause English app names to be wrongly rejected, so the check needs refining. I decide to remove an app only if its name has more than 3 characters outside the ASCII range (code points above 127).

In [364]:
def check_if_english(string):
    char_more = []
    for char in string:
        if ord(char) > 127:
            char_more.append(char)
    if len(char_more) > 3:
        return False
    else:
        return True
print(check_if_english('Instachat😜'))
print(check_if_english('Docs To Go™ Free Office Suite'))
print(check_if_english('Instachat 😜😜😜😜'))
print(check_if_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))

True
True
False
False
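The same rule can be written more compactly by counting the non-ASCII characters directly. This is only a sketch of an equivalent check, with the hypothetical names `count_non_ascii` and `looks_english`. The threshold of 3 is exposed as a parameter so it can be tuned.

```python
def count_non_ascii(text):
    """Count the characters whose code point is above the ASCII range."""
    return sum(1 for char in text if ord(char) > 127)


def looks_english(text, max_non_ascii=3):
    """True when the text has at most max_non_ascii non-ASCII characters."""
    return count_non_ascii(text) <= max_non_ascii


print(count_non_ascii('Docs To Go™ Free Office Suite'))
print(looks_english('Instachat 😜'))
print(looks_english('Instachat 😜😜😜😜'))
print(looks_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
```

This is still a heuristic: an English name with four or more emoji is rejected, and a short non-English name with three or fewer non-ASCII characters slips through.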


In [365]:
android_english = []
for app in android_clean:
    app_name = app[0]
    if check_if_english(app_name):
        android_english.append(app)
        
        
ios_english = []
for app in ios:
    app_name = app[1]
    if check_if_english(app_name):
        ios_english.append(app)
        
        
explore_data(android_english, 0, 3, True)
print('\n')
explore_data(ios_english, 0, 3, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 9615
Number of columns 13


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'US

A number of apps with non-English names have now been removed from both the Android and iOS datasets. Next I isolate the free apps, since these are the only ones we deal with.

In [366]:
android = android_english
ios = ios_english


android_free = []
for app in android[1:]:
    price = app[7]
    if price == '0':
        android_free.append(app)
        
ios_free = []
for app in ios[1:]:
    price = app[4]
    if price == '0.0':
        ios_free.append(app)
        
print(len(android_free))
print(len(ios_free))

8862
3222


In [367]:
android = android_free
ios = ios_free
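The free-app filter above relies on exact string matches ('0' for Android, '0.0' for iOS). A slightly more forgiving check, sketched below, converts the price to a number first. The helper name `is_free` is hypothetical, and the leading `$` handling is an assumption about how paid prices might be written, since only free rows are shown here.

```python
def is_free(price):
    """True when a price string represents zero.

    Assumes prices look like '0', '0.0' or '$4.99'; the '$' prefix is an
    assumption about the format of paid prices.
    """
    return float(price.strip().lstrip('$')) == 0.0


print(is_free('0'))
print(is_free('0.0'))
print(is_free('$4.99'))
```

With a check like this, the same function could be used for both datasets instead of two different string literals.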

Coming back to the objective: the aim is to determine what kind of apps to build, since revenue depends on the number of users of the apps we build. The company's strategy is to build an app and publish it on the Play Store, then, if it shows good results, improve it and build an iOS version for the App Store. For this reason it's prudent to investigate app profiles that are successful in both markets. I inspect the datasets further to find the columns best suited for building frequency tables.

In [368]:
print(android_english[0])
explore_data(android, 0, 3)
print('\n')
print(ios_english[0])
explore_data(ios, 0, 3)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']




['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
['284882215', 'Facebook', '389879808', 'USD', '0.0', '29746

Below I build frequency tables using the 'prime_genre' column for the iOS dataset and the 'Genres' and 'Category' columns for the Android dataset.

In [369]:
def freq_table(data_set, index):
    table = {}
    total = 0
    for row in data_set:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1    
        else:
            table[value] = 1
    table_percentages = {}        
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage
        
    return table_percentages
        
            
def display_table(data_set, index):
    table = freq_table(data_set, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])    
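For comparison, the standard library's `collections.Counter` can do the counting and the sorting in one go. The sketch below is an alternative to `freq_table`, not a replacement for it. The name `freq_table_counter` and the sample rows are hypothetical.

```python
from collections import Counter


def freq_table_counter(rows, index):
    """Return {value: percentage}, ordered from most to least common."""
    counts = Counter(row[index] for row in rows)
    total = sum(counts.values())
    # most_common() yields (value, count) pairs sorted by descending count.
    return {value: count / total * 100 for value, count in counts.most_common()}


# Hypothetical rows: column 1 holds the genre.
sample = [
    ['a', 'Games'],
    ['b', 'Games'],
    ['c', 'Music'],
    ['d', 'Games'],
]
table = freq_table_counter(sample, 1)
for genre, percentage in table.items():
    print(genre, ':', percentage)
```

Since dictionaries keep insertion order, the returned table is already sorted and no separate display step is needed.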
        
    

In [370]:
display_table(ios, 11)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


In [371]:
display_table(android, 9)

Tools : 8.440532611148726
Entertainment : 6.070864364703228
Education : 5.348679756262695
Business : 4.5926427443015125
Productivity : 3.8930264048747465
Lifestyle : 3.8930264048747465
Finance : 3.7011961182577298
Medical : 3.5206499661475967
Sports : 3.4642292936131795
Personalization : 3.3175355450236967
Communication : 3.238546603475513
Action : 3.1031369893929135
Health & Fitness : 3.080568720379147
Photography : 2.945159106296547
News & Magazines : 2.798465357707064
Social : 2.663055743624464
Travel & Local : 2.324531708417964
Shopping : 2.2455427668697814
Books & Reference : 2.143985556307831
Simulation : 2.0424283457458814
Dating : 1.8618821936357481
Arcade : 1.8505980591288649
Video Players & Editors : 1.7716091175806816
Casual : 1.7490408485669149
Maps & Navigation : 1.399232678853532
Food & Drink : 1.2412547957571656
Puzzle : 1.128413450688332
Racing : 0.9930038366057323
Role Playing : 0.9365831640713158
Libraries & Demo : 0.9365831640713158
Auto & Vehicles : 0.92529902956443

In [372]:
display_table(android, 1)

FAMILY : 18.934777702550214
GAME : 9.693071541412774
TOOLS : 8.451816745655607
BUSINESS : 4.5926427443015125
LIFESTYLE : 3.9043105393816293
PRODUCTIVITY : 3.8930264048747465
FINANCE : 3.7011961182577298
MEDICAL : 3.5206499661475967
SPORTS : 3.39652448657188
PERSONALIZATION : 3.3175355450236967
COMMUNICATION : 3.238546603475513
HEALTH_AND_FITNESS : 3.080568720379147
PHOTOGRAPHY : 2.945159106296547
NEWS_AND_MAGAZINES : 2.798465357707064
SOCIAL : 2.663055743624464
TRAVEL_AND_LOCAL : 2.335815842924848
SHOPPING : 2.2455427668697814
BOOKS_AND_REFERENCE : 2.143985556307831
DATING : 1.8618821936357481
VIDEO_PLAYERS : 1.7941773865944481
MAPS_AND_NAVIGATION : 1.399232678853532
FOOD_AND_DRINK : 1.2412547957571656
EDUCATION : 1.1735499887158656
ENTERTAINMENT : 0.9591514330850823
LIBRARIES_AND_DEMO : 0.9365831640713158
AUTO_AND_VEHICLES : 0.9252990295644324
HOUSE_AND_HOME : 0.8237418190024826
WEATHER : 0.8011735499887158
EVENTS : 0.7109004739336493
PARENTING : 0.6544798013992327
ART_AND_DESIGN : 0.

## iOS/App Store

As the frequency table for the 'prime_genre' column shows, the most common genre on iOS is Games, followed by Entertainment. Most of the apps on the App Store seem to be designed for fun (games, entertainment, social networking, sports, music) rather than for practical purposes (education, productivity, shopping, etc.).

While apps in the entertainment-oriented genres are the most numerous, this does not necessarily mean they have the most users. It could be that developers simply enjoy building these kinds of apps, or that they are generally easier to build. A good extension of this study would be to investigate installs and app usage for these genres to test this hypothesis.

## Android/Google Play Store

Looking at the frequency tables for the Category and Genres columns of the Android dataset, the most common apps are in family, games and entertainment, with practical apps (business, education, tools, medical) following closely. This differs from the App Store results, where practical apps are less common.
This is a good observation that is worth further investigation.

## Apps With the Most Users

To find out the apps with the most users per app genre we can use the installs column for Google play store and rating_count_tot column for App store.

1. average number of user ratings per app genre on the App Store

In [373]:
ios_genres = freq_table(ios, 11)

for genre in ios_genres:
    total = 0
    len_genre = 0
    for app in ios:
        app_genre = app[11]
        if app_genre == genre:
            rating = float(app[5])
            total += rating
            len_genre += 1
    avg_rating = total / len_genre
    print(genre, ':', avg_rating)
    



Finance : 31467.944444444445
Sports : 23008.898550724636
Education : 7003.983050847458
Social Networking : 71548.34905660378
Shopping : 26919.690476190477
Reference : 74942.11111111111
Weather : 52279.892857142855
Photo & Video : 28441.54375
Medical : 612.0
Music : 57326.530303030304
Entertainment : 14029.830708661417
Book : 39758.5
Catalogs : 4004.0
Games : 22788.6696905016
Travel : 28243.8
Lifestyle : 16485.764705882353
Food & Drink : 33333.92307692308
Productivity : 21028.410714285714
Utilities : 18684.456790123455
Business : 7491.117647058823
News : 21248.023255813954
Navigation : 86090.33333333333
Health & Fitness : 23298.015384615384


On average, navigation apps have the highest number of user reviews, but this figure is heavily influenced by Waze and Google Maps, which have close to half a million user reviews together.

This may not tell the whole story. A genre's average can be driven by a very few apps that have hundreds of thousands of user ratings. I investigate Navigation.



Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


As seen above, Waze - GPS Navigation, Maps & Real-time Traffic (345046 ratings) and Google Maps - Navigation & Transit (154911 ratings) skew the average.
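To put a number on the skew, the six navigation rating counts listed above can be summarised with both the mean and the median. This is a small worked example using only those six values, with `statistics` from the standard library. The variable names are my own.

```python
from statistics import mean, median

# Rating counts of the six navigation apps listed above.
navigation_counts = [345046, 154911, 12811, 3582, 187, 5]

average = mean(navigation_counts)
middle = median(navigation_counts)
# Share of all navigation ratings held by the two largest apps.
top_two_share = (345046 + 154911) / sum(navigation_counts)

print(round(average, 2))
print(middle)
print(round(top_two_share * 100, 1))
```

The mean is about ten times the median, and the two largest apps account for most of the ratings in the genre. This suggests the median may be a more representative figure for a typical app.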

Some kinds of apps are out of scope for the company, since they require domain expertise that the company doesn't have, e.g. Finance, Food & Drink, etc.
However, looking at Music, which has a relatively high average number of ratings, and Sports (data for these is readily available, as there are open APIs that the dev team can use), the company could, for example, build an app that combines aspects of both.

In [375]:
for app in android:
    installs = app[5]
    installs = installs.replace('+', '')
    installs = installs.replace(',', '')
    app[5] = installs
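The clean-up above can also be wrapped in a small function that returns a number directly, which avoids converting again later. A minimal sketch follows. The name `parse_installs` is hypothetical.

```python
def parse_installs(text):
    """Convert an installs string such as '10,000+' to an integer."""
    return int(text.replace(',', '').replace('+', ''))


print(parse_installs('10,000+'))
print(parse_installs('5,000,000+'))
# Already-cleaned values pass through unchanged.
print(parse_installs('1000000'))
```

Keep in mind that values such as '10,000+' are lower bounds on the true number of installs, so any average built from them is approximate.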

In [376]:
android_genres = freq_table(android, 1)
for category in android_genres:
    total = 0
    category_len = 0
    for app in android:
        app_cat = app[1]
        if app_cat == category:
            installs = float(app[5])
            total += installs
            category_len += 1
    avg_installs = total / category_len
    print(category, ':', avg_installs)

ENTERTAINMENT : 11640705.88235294
FAMILY : 3694276.334922527
PERSONALIZATION : 5201482.6122448975
FINANCE : 1387692.475609756
SOCIAL : 23253652.127118643
AUTO_AND_VEHICLES : 647317.8170731707
BOOKS_AND_REFERENCE : 8767811.894736841
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
PHOTOGRAPHY : 17805627.643678162
SPORTS : 3638640.1428571427
BUSINESS : 1712290.1474201474
EDUCATION : 1820673.076923077
MAPS_AND_NAVIGATION : 4056941.7741935486
BEAUTY : 513151.88679245283
LIFESTYLE : 1437816.2687861272
VIDEO_PLAYERS : 24727872.452830188
LIBRARIES_AND_DEMO : 638503.734939759
EVENTS : 253542.22222222222
TOOLS : 10682301.033377837
COMICS : 817657.2727272727
GAME : 15560965.599534342
HOUSE_AND_HOME : 1331540.5616438356
PRODUCTIVITY : 16787331.344927534
ART_AND_DESIGN : 1986335.0877192982
NEWS_AND_MAGAZINES : 9549178.467741935
DATING : 854028.8303030303
COMMUNICATION : 38456119.167247385
MEDICAL : 120616.48717948717
SHOPPING : 7036877.311557789
TRAVEL_AND_LOCAL : 139840

On average, communication apps have the most installs: 38,456,119. This number is heavily skewed upward by a few apps that have over one billion installs (WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail, and Hangouts), and a few others with over 100 and 500 million installs.

In [377]:
for app in android:
    if app[1] == 'COMMUNICATION' and (app[5] == '1000000000'
                                      or app[5] == '500000000'
                                      or app[5] == '100000000'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1000000000
imo beta free calls and text : 100000000
Android Messages : 100000000
Google Duo - High Quality Video Calls : 500000000
Messenger – Text and Video Chat for Free : 1000000000
imo free video calls and chat : 500000000
Skype - free IM & video calls : 1000000000
Who : 100000000
GO SMS Pro - Messenger, Free Themes, Emoji : 100000000
LINE: Free Calls & Messages : 500000000
Google Chrome: Fast & Secure : 1000000000
Firefox Browser fast & private : 100000000
UC Browser - Fast Download Private & Secure : 500000000
Gmail : 1000000000
Hangouts : 1000000000
Messenger Lite: Free Calls & Messages : 100000000
Kik : 100000000
KakaoTalk: Free Calls & Text : 100000000
Opera Mini - fast web browser : 100000000
Opera Browser: Fast and Secure : 100000000
Telegram : 100000000
Truecaller: Caller ID, SMS spam blocking & Dialer : 100000000
UC Browser Mini -Tiny Fast Private & Secure : 100000000
Viber Messenger : 500000000
WeChat : 100000000
Yahoo Mail – Stay Organized : 10000000

We see the same pattern for the video players category, which is the runner-up with 24,727,872 installs. The market is dominated by apps like Youtube, Google Play Movies & TV, or MX Player. The pattern is repeated for social apps (where we have giants like Facebook, Instagram, Google+, etc.), photography apps (Google Photos and other popular photo editors), or productivity apps (Microsoft Word, Dropbox, Google Calendar, Evernote, etc.).

Again, the main concern is that these app genres might seem more popular than they really are. Moreover, these niches seem to be dominated by a few giants who are hard to compete against.
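One way to gauge how much the giants inflate a category is to recompute the average while leaving out apps above some install ceiling. The sketch below assumes a ceiling of 100 million, which is an arbitrary choice. The name `average_below` and the sample rows are hypothetical.

```python
def average_below(rows, value_index, ceiling):
    """Mean of the numeric column over rows whose value is below ceiling.

    Returns None when no row qualifies.
    """
    values = [float(row[value_index]) for row in rows]
    kept = [value for value in values if value < ceiling]
    return sum(kept) / len(kept) if kept else None


# Hypothetical category: one giant and two smaller apps.
sample = [
    ['Giant app', '1000000000'],
    ['Mid-size app', '5000000'],
    ['Small app', '1000000'],
]

overall = average_below(sample, 1, float('inf'))
without_giants = average_below(sample, 1, 100_000_000)
print(overall)
print(without_giants)
```

In this made-up example the average drops from over 335 million to 3 million once the single giant is excluded, which is the kind of gap worth checking for in the real categories.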

The game genre seems pretty popular, but previously we found out this part of the market seems a bit saturated, so we'd like to come up with a different app recommendation if possible.

The books and reference genre looks fairly popular as well, with an average number of installs of 8,767,811. It's interesting to explore this in more depth, since we found this genre has some potential to work well on the App Store, and our aim is to recommend an app genre that shows potential for being profitable on both the App Store and Google Play.

Let's take a look at some of the apps from this genre and their number of installs

In [378]:
for app in android:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

E-Book Read - Read Book for free : 50000
Download free book with green book : 100000
Wikipedia : 10000000
Cool Reader : 10000000
Free Panda Radio Music : 100000
Book store : 1000000
FBReader: Favorite Book Reader : 10000000
English Grammar Complete Handbook : 500000
Free Books - Spirit Fanfiction and Stories : 1000000
Google Play Books : 1000000000
AlReader -any text book reader : 5000000
Offline English Dictionary : 100000
Offline: English to Tagalog Dictionary : 500000
FamilySearch Tree : 1000000
Cloud of Books : 1000000
Recipes of Prophetic Medicine for free : 500000
ReadEra – free ebook reader : 1000000
Anonymous caller detection : 10000
Ebook Reader : 5000000
Litnet - E-books : 100000
Read books online : 5000000
English to Urdu Dictionary : 500000
eBoox: book reader fb2 epub zip : 1000000
English Persian Dictionary : 500000
Flybook : 500000
All Maths Formulas : 1000000
Ancestry : 5000000
HTC Help : 10000000
English translation from Bengali : 100000
Pdf Book Download - Read Pdf Boo

In [379]:
for app in android:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1000000'
                                            or app[5] == '5000000'
                                            or app[5] == '10000000'
                                            or app[5] == '50000000'):
        print(app[0], ':', app[5])

Wikipedia : 10000000
Cool Reader : 10000000
Book store : 1000000
FBReader: Favorite Book Reader : 10000000
Free Books - Spirit Fanfiction and Stories : 1000000
AlReader -any text book reader : 5000000
FamilySearch Tree : 1000000
Cloud of Books : 1000000
ReadEra – free ebook reader : 1000000
Ebook Reader : 5000000
Read books online : 5000000
eBoox: book reader fb2 epub zip : 1000000
All Maths Formulas : 1000000
Ancestry : 5000000
HTC Help : 10000000
Moon+ Reader : 10000000
English-Myanmar Dictionary : 1000000
Golden Dictionary (EN-AR) : 1000000
All Language Translator Free : 1000000
Aldiko Book Reader : 10000000
Dictionary - WordWeb : 5000000
50000 Free eBooks & Free AudioBooks : 5000000
Al-Quran (Free) : 10000000
Al Quran Indonesia : 10000000
Al'Quran Bahasa Indonesia : 10000000
Al Quran Al karim : 1000000
Al Quran : EAlim - Translations & MP3 Offline : 5000000
Koran Read &MP3 30 Juz Offline : 1000000
Hafizi Quran 15 lines per page : 1000000
Quran for Android : 10000000
Satellite AR : 

Some apps seem to skew the average number of installs. However, it looks like there are only a few very popular apps, so this market still shows potential. Let's try to get some app ideas based on the kind of apps that are somewhere in the middle in terms of popularity (between 1,000,000 and 100,000,000 downloads).
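The earlier filter on BOOKS_AND_REFERENCE listed each install value as a string. Since the installs are digits only after cleaning, the same middle band can be expressed as a numeric range. The sketch below assumes cleaned install strings in column 5. The name `in_install_range` is hypothetical, and the empty fields in the sample rows are placeholders.

```python
def in_install_range(rows, low, high, name_index=0, installs_index=5):
    """Return (name, installs) pairs with low <= installs <= high."""
    selected = []
    for row in rows:
        installs = int(row[installs_index])
        if low <= installs <= high:
            selected.append((row[name_index], installs))
    return selected


# Three apps from the listing above; unused columns are left empty.
sample = [
    ['Wikipedia', 'BOOKS_AND_REFERENCE', '', '', '', '10000000'],
    ['Google Play Books', 'BOOKS_AND_REFERENCE', '', '', '', '1000000000'],
    ['Litnet - E-books', 'BOOKS_AND_REFERENCE', '', '', '', '100000'],
]

middle_band = in_install_range(sample, 1_000_000, 100_000_000)
print(middle_band)
```

A range check also picks up any install value inside the band without having to list each one explicitly.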


This niche seems to be dominated by software for processing and reading ebooks, as well as various collections of libraries and dictionaries, so it's probably not a good idea to build similar apps since there'll be some significant competition.

We also notice there are quite a few apps built around the book Quran, which suggests that building an app around a popular book can be profitable. It seems that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets.

However, it looks like the market is already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.