# An analysis of profitable apps on the Google Play Store and iOS App Store
---
Written by Neil Mackenzie

This is the first project I completed as part of the Dataquest.io 'Data Scientist in Python' [path](https://www.dataquest.io/path/data-scientist/). Much of this project was done when I had completed very little of the Dataquest course. 

I returned to this project to finish it off with additional skills learnt in later courses, such as string operations and pandas Series, to make the code more efficient and the outputs more readable.

# Introduction

The aim of this project is to analyse which types of applications attract the most users on Android and iOS devices. This information is intended to provide insight into which types of applications will generate more revenue, based on the premise that more frequently used apps offer more exposure to in-app advertisements.

This program uses existing data that was available at no cost. The data sets can be found by clicking the hyperlinks below. 

- [Android Data Set](https://www.kaggle.com/lava18/google-play-store-apps/home) - Data about approximately 10 000 Android apps from Google Play, collected in August 2018.
- [iOS Data Set](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home) - Data about approximately 7 000 iOS apps from the App Store, collected in July 2017.

In [1]:
from csv import reader

# Open and read data on Android apps (the with-block closes the file afterwards)
with open('PlayStore.csv', encoding="UTF-8") as android_csv:
    android = list(reader(android_csv))
android_apps = android[1:]  # List of lists of Android app data, without the header row
android_header = android[0]  # Header row for the Android app data set
android_title = 'Google Play Store'

# Open and read data on iOS apps
with open('AppleStore.csv', encoding="UTF-8") as ios_csv:
    ios = list(reader(ios_csv))
ios_apps = ios[1:]  # List of lists of iOS app data, without the header row
ios_header = ios[0]  # Header row for the iOS app data set
ios_title = 'iOS App Store'

# Function to explore the data contained in a data set
def explore_data(dataset, start, end, title, header_row, rows_and_columns=False):
    # Slice the data set to preview only a few rows
    sliced_data = dataset[start:end]
    print('The following columns are stored in the', title, 'data set: \n\n', header_row, '\n\n')
    print('The following lists show an example of the data available for the first', end - start, 'apps of the', title, 'data set:\n')
    for row in sliced_data:
        print(row, '\n')  # Add an empty line after each row so the output is readable

    if rows_and_columns:
        # len(dataset) already excludes the header, which is stored in a separate variable
        print('Number of rows in data set:', len(dataset))
        print('Number of columns in data set:', len(dataset[0]))


### Explore the Android and iOS app data sets ###

explore_data(android_apps, 0, 3, android_title, android_header, True)
print('\n===================================================================================================================\n')
explore_data(ios_apps, 0, 3, ios_title, ios_header, True)


The following columns are stored in the Google Play Store data set: 

 ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'] 


The following lists show an example of the data available for the first 3 apps of the Google Play Store data set:

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'] 

['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'] 

['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'] 

Number of rows in data set: 10841
Number of columns in data set: 13


The following columns are

# Improvements using pandas
After returning to this project having completed courses on pandas, I realised that much of the code could be removed and the presentation improved by using pandas DataFrames. The code below uses pandas to replicate what was achieved above with far less code and in a much more readable form:

In [2]:
import pandas as pd

#Open and read data on Android Apps
Android = pd.read_csv('PlayStore.csv', encoding = "UTF-8")
Android.head(5)

| | App | Category | Rating | Reviews | Size | Installs | Type | Price | Content Rating | Genres | Last Updated | Current Ver | Android Ver |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Photo Editor & Candy Camera & Grid & ScrapBook | ART_AND_DESIGN | 4.1 | 159 | 19M | 10,000+ | Free | 0 | Everyone | Art & Design | January 7, 2018 | 1.0.0 | 4.0.3 and up |
| 1 | Coloring book moana | ART_AND_DESIGN | 3.9 | 967 | 14M | 500,000+ | Free | 0 | Everyone | Art & Design;Pretend Play | January 15, 2018 | 2.0.0 | 4.0.3 and up |
| 2 | U Launcher Lite – FREE Live Cool Themes, Hide ... | ART_AND_DESIGN | 4.7 | 87510 | 8.7M | 5,000,000+ | Free | 0 | Everyone | Art & Design | August 1, 2018 | 1.2.4 | 4.0.3 and up |
| 3 | Sketch - Draw & Paint | ART_AND_DESIGN | 4.5 | 215644 | 25M | 50,000,000+ | Free | 0 | Teen | Art & Design | June 8, 2018 | Varies with device | 4.2 and up |
| 4 | Pixel Draw - Number Art Coloring Book | ART_AND_DESIGN | 4.3 | 967 | 2.8M | 100,000+ | Free | 0 | Everyone | Art & Design;Creativity | June 20, 2018 | 1.1 | 4.4 and up |


In [3]:
iOS = pd.read_csv('AppleStore.csv', encoding = "UTF-8")
iOS.head(5)

| | Unnamed: 0 | id | track_name | size_bytes | currency | price | rating_count_tot | rating_count_ver | user_rating | user_rating_ver | ver | cont_rating | prime_genre | sup_devices.num | ipadSc_urls.num | lang.num | vpp_lic |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 281656475 | PAC-MAN Premium | 100788224 | USD | 3.99 | 21292 | 26 | 4.0 | 4.5 | 6.3.5 | 4+ | Games | 38 | 5 | 10 | 1 |
| 1 | 2 | 281796108 | Evernote - stay organized | 158578688 | USD | 0.0 | 161065 | 26 | 4.0 | 3.5 | 8.2.2 | 4+ | Productivity | 37 | 5 | 23 | 1 |
| 2 | 3 | 281940292 | WeatherBug - Local Weather, Radar, Maps, Alerts | 100524032 | USD | 0.0 | 188583 | 2822 | 3.5 | 4.5 | 5.0.0 | 4+ | Weather | 37 | 5 | 3 | 1 |
| 3 | 4 | 282614216 | eBay: Best App to Buy, Sell, Save! Online Shop... | 128512000 | USD | 0.0 | 262241 | 649 | 4.0 | 4.5 | 5.10.0 | 12+ | Shopping | 37 | 5 | 9 | 1 |
| 4 | 5 | 282935706 | Bible | 92774400 | USD | 0.0 | 985920 | 5320 | 4.5 | 5.0 | 7.5.1 | 4+ | Reference | 37 | 5 | 45 | 1 |
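
The `Unnamed: 0` column in the iOS output suggests that `AppleStore.csv` contains an index column with an empty header, which the `''` entry at the start of the iOS header row printed further down also indicates. A minimal sketch, assuming that is the cause, uses a hypothetical two-row in-memory CSV (`sample_csv`) to show how `index_col=0` turns that column into the row index, and how `DataFrame.shape` reports the row and column counts that `explore_data` computed by hand:

```python
import io

import pandas as pd

# Hypothetical two-row sample: the first header field is empty, like an index
# column that was written to the CSV file along with the data
sample_csv = io.StringIO(
    ",id,track_name,price\n"
    "1,281656475,PAC-MAN Premium,3.99\n"
    "2,281796108,Evernote - stay organized,0.0\n"
)

# Default read: pandas invents the column name 'Unnamed: 0' for the empty header
default_df = pd.read_csv(sample_csv)
print(default_df.columns.tolist())  # ['Unnamed: 0', 'id', 'track_name', 'price']

# Rewind the buffer and read again, using the first column as the row index instead
sample_csv.seek(0)
indexed_df = pd.read_csv(sample_csv, index_col=0)
print(indexed_df.columns.tolist())  # ['id', 'track_name', 'price']
print(indexed_df.shape)             # (2, 3) -> (rows, columns)
```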


The Google Play Store dataset contains data on 10841 apps. The columns that may be of interest to this analysis are: 'App', 'Category', 'Rating', 'Reviews', 'Installs', 'Type', 'Price' and 'Genres'.

Details of each of these columns can be found in the data set [documentation](https://www.kaggle.com/lava18/google-play-store-apps/home).

# Clean the data

The Google Play data set is known to contain errors. As seen in [one of the discussions](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015) in the [discussion section](https://www.kaggle.com/lava18/google-play-store-apps/discussion) for the Play Store data set, some errors are caused by missing data in an entry. To check for missing data, the length of each entry is compared with the length of the header row.

If any row has fewer entries than the header row, data is clearly missing and the entry should be removed from the data set. The function below finds one incomplete entry, at index 10472, and deletes it.

# Remove entries with incomplete data

In [4]:
### Remove entries with incomplete rows ###
def remove_wrong_data(dataset, header):
    # Identify entries with fewer columns than the header row
    wrong_indices = []
    for index, row in enumerate(dataset):
        if len(row) < len(header):
            print('Entry with index number', index, 'only has', len(row), 'entries instead of', len(header), '!')
            wrong_indices.append(index)
    # Delete the identified entries, last first, so earlier indices are not shifted.
    # Deleting only what was found also makes the cell safe to run more than once.
    for index in reversed(wrong_indices):
        del dataset[index]


remove_wrong_data(android_apps, android_header)

Entry with index number 10472 only has 12 entries instead of 13 !


# Identify duplicates

In [5]:
### Identify duplicate apps ###

def find_duplicates(dataset, name_index):
    # A set makes the "seen before?" check fast, unlike searching a list
    seen_apps = set()
    duplicate_apps = []

    for row in dataset:
        app_name = row[name_index]
        if app_name in seen_apps:
            duplicate_apps.append(app_name)
        else:
            seen_apps.add(app_name)
    # Return both results so the duplicates can be inspected as well
    return seen_apps, duplicate_apps

android_unique, android_duplicates = find_duplicates(android_apps, 0)
ios_unique, ios_duplicates = find_duplicates(ios_apps, 2)
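
The cell above collects duplicate names but does not report how many there are. `collections.Counter` gives a compact way to check this before removing anything; a minimal sketch on hypothetical rows (`rows` and its values are made up for illustration):

```python
from collections import Counter

# Hypothetical rows in the same layout as android_apps: the app name is at index 0
rows = [
    ['App A', 'TOOLS', '100'],
    ['App B', 'GAME', '250'],
    ['App A', 'TOOLS', '120'],
    ['App A', 'TOOLS', '90'],
]

# Count how often each name occurs
name_counts = Counter(row[0] for row in rows)
duplicated_names = [name for name, count in name_counts.items() if count > 1]
# Number of rows that would be dropped by keeping one entry per app
n_extra_rows = sum(count - 1 for count in name_counts.values())

print(duplicated_names)  # ['App A']
print(n_extra_rows)      # 2
```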

# Remove duplicates, keeping only the entry with the highest number of reviews

In [6]:
### Delete duplicate apps and keep the entries with the highest number of reviews, since these are the most meaningful values ###

def highest_reviews_only(dataset, OS, name_index, ratings_index):
    # First, create a dictionary containing the maximum number of reviews for each app.
    # Each app appears only once in this dictionary.
    reviews_max = {}

    for row in dataset:
        name = row[name_index]
        n_reviews = float(row[ratings_index])

        if name not in reviews_max or n_reviews > reviews_max[name]:
            reviews_max[name] = n_reviews

    # Use the reviews_max dictionary to build a list of unique app data containing
    # the maximum number of reviews for each app.
    cleaned_data = []
    already_added = set()  # Guards against ties where two entries share the maximum

    for row in dataset:
        name = row[name_index]
        n_reviews = float(row[ratings_index])
        if n_reviews == reviews_max[name] and name not in already_added:
            cleaned_data.append(row)
            already_added.add(name)

    if len(reviews_max) == len(cleaned_data):
        print('Duplicate entries for the', OS, 'data set have been removed and only the entry',
              'with the highest number of reviews has been retained. This data set has been',
              'reduced from', len(dataset), 'entries to', len(cleaned_data), 'entries.')
    else:
        print('An error has occurred: the cleaned data set does not have the same length as the dictionary of unique apps.')
    return cleaned_data

android_clean = highest_reviews_only(android_apps, 'Google Play Store', 0, 3)
print('\n')
ios_clean = highest_reviews_only(ios_apps, 'iOS App Store', 2, 6)


Duplicate entries for the Google Play Store data set have been removed and only the entry with the highest number of reviews has been retained. This data set has been reduced from 10840 entries to 9659 entries.


Duplicate entries for the iOS App Store data set have been removed and only the entry with the highest number of reviews has been retained. This data set has been reduced from 7197 entries to 7195 entries.
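
With pandas, the same "keep the entry with the most reviews" rule can be written by sorting and then dropping duplicate names. A minimal sketch, assuming the review counts are already numeric (the `apps` DataFrame below is hypothetical):

```python
import pandas as pd

# Hypothetical data: 'App A' appears twice with different review counts
apps = pd.DataFrame({
    'App': ['App A', 'App B', 'App A'],
    'Reviews': [150, 40, 200],
})

# Sort so the row with the most reviews comes first, then keep the first row per app
deduplicated = (
    apps.sort_values('Reviews', ascending=False)
        .drop_duplicates(subset='App', keep='first')
)
print(deduplicated)
```

`keep='first'` keeps the top row for each name after the descending sort; the loop above reaches the same result through an explicit dictionary of maximum review counts.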


# Remove non-English apps

In [7]:
### Check whether apps in the data set are English and remove those that aren't ###
# An app is identified as non-English if its name contains more than 3 non-ASCII characters.
# This tolerance allows a few special characters, e.g. emoji or symbols such as ™.
def english_check(string):
    not_english = 0
    for char in string:
        if ord(char) > 127:  # ASCII characters (which cover English text) have code points 0 to 127
            not_english += 1
    # Up to 3 non-ASCII characters are tolerated and the name still counts as English
    return not_english <= 3

# Check the english_check function using the following app names:
#print(english_check('Instagram'))
#print(english_check('爱奇艺PPS -《欢乐颂2》电视剧热播'))
#print(english_check('Docs To Go™ Free Office Suite'))
#print(english_check('Instachat 😜'))

# Loop through the cleaned data list and create a new list containing only English apps
def english_apps_list(dataset, OS, name_index):
    english_apps = []
    for row in dataset:
        app_name = row[name_index]
        if english_check(app_name):
            english_apps.append(row)
    print('The', OS, 'dataset has been reduced to', len(english_apps), 'entries of only English apps')
    return english_apps


android_english = english_apps_list(android_clean, 'Google Play Store',0)
ios_english = english_apps_list(ios_clean,'iOS App Store',2)

The Google Play Store dataset has been reduced to 9614 entries of only English apps
The iOS App Store dataset has been reduced to 6181 entries of only English apps
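
The threshold of 3 can be checked on the test names from the commented-out lines in the cell above. A minimal sketch with a hypothetical helper, `count_non_ascii`, that applies the same rule as `english_check`:

```python
def count_non_ascii(name):
    """Count characters outside the ASCII range (code points above 127)."""
    return sum(1 for char in name if ord(char) > 127)


# App names taken from the commented-out checks of english_check
names = [
    'Instagram',
    '爱奇艺PPS -《欢乐颂2》电视剧热播',
    'Docs To Go™ Free Office Suite',
    'Instachat 😜',
]

for name in names:
    n = count_non_ascii(name)
    # Same rule as english_check: more than 3 non-ASCII characters means "not English"
    print(f'{name!r}: {n} non-ASCII characters -> English: {n <= 3}')
```

The three English names stay well under the limit, while the Chinese name has 13 non-ASCII characters. The trade-off is that an English name containing four or more emoji or symbols would be discarded; Python 3.7+ also offers `str.isascii()`, which corresponds to a threshold of 0.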


# Remove paid applications

In [8]:
### Isolate only free applications
def is_free(dataset,OS,price_index):
    free_apps = []
    for entry in dataset:
        price = entry[price_index] 
        
        if price == '0' or price == '0.0':
            free_apps.append(entry)
    print('The',OS,'dataset has been reduced to',len(free_apps),'free apps')
    return free_apps


android_free = is_free(android_english,'Play Store',7)
print('\n')
ios_free = is_free(ios_english,'iOS App Store',5)

  

The Play Store dataset has been reduced to 8864 free apps


The iOS App Store dataset has been reduced to 3220 free apps
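
Comparing price strings with `'0'` and `'0.0'` works for the two formats in these data sets but would miss other spellings of zero, such as `'0.00'`. A minimal sketch of a numeric check, assuming (this is not shown above) that paid Play Store prices carry a leading `$`, as in `'$4.99'`; `is_free_price` is a hypothetical helper:

```python
def is_free_price(price):
    """Return True if a price string represents zero cost.

    Assumes Play Store prices may start with '$' and App Store prices are plain
    numbers; both are reduced to a float before comparing.
    """
    return float(price.strip().lstrip('$')) == 0.0


for price in ['0', '0.0', '0.00', '$4.99', '3.99']:
    print(price, '->', is_free_price(price))
```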


# Part one

Up to this point, we have reduced the size of the data sets by removing some noise and isolating the free apps.

We will now concentrate on the aim of this project, which is to determine the types of apps that are likely to attract more users (and therefore more revenue).

The intention is to release the app on both the iOS App Store and the Google Play Store, so we need to investigate which type of app has potential for success in both markets.

We will begin by investigating the most popular genres of apps in each market. 

# Identify most common genre for each market

In [9]:
#Print header row of each dataset to inspect which data may be useful for finding the most popular genre on each app store.
print(android_header)
print(ios_header)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


# The following columns may be useful for each app store:
- Android: 'Category' (index = 1) and 'Genres' (index = 9)
- iOS: 'prime_genre' (index = 12)

# Create frequency table to identify the instances of each genre

In [10]:
def freq_table(dataset,index):
    
    genres_table = {}
    for row in dataset:
        genre = row[index]
        if genre in genres_table:
            genres_table[genre] += 1  
        else:
            genres_table[genre] = 1
            
    genre_percentage = {}
    for genre in genres_table:
        genre_percentage[genre] = genres_table[genre]/len(dataset) * 100
        
    return genre_percentage


def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
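
The same frequency table can be built with `collections.Counter`, which replaces the counting loop with one call, and sorting the `(value, percentage)` pairs with a `key` function avoids building reversed tuples. A minimal sketch; `freq_table_counter`, `display_table_counter` and the sample `rows` are hypothetical:

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Percentage of rows for each value in column `index`, like freq_table."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return {value: count / total * 100 for value, count in counts.items()}


def display_table_counter(dataset, index):
    """Print values from most to least frequent, like display_table."""
    table = freq_table_counter(dataset, index)
    for value, percentage in sorted(table.items(), key=lambda item: item[1], reverse=True):
        print(value, ':', percentage)


# Hypothetical rows with a genre in column 1
rows = [['a', 'Games'], ['b', 'Games'], ['c', 'Weather'], ['d', 'Games']]
display_table_counter(rows, 1)
# Games : 75.0
# Weather : 25.0
```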
        

# Frequency table analysis
The iOS 'prime_genre', Android 'Category' and Android 'Genres' frequency tables are examined below:

In [11]:
print('Freq table for iOS "prime_genre" column:\n')
display_table(ios_free, 12)
print('================================================================')

Freq table for iOS "prime_genre" column:

Games : 58.13664596273293
Entertainment : 7.888198757763975
Photo & Video : 4.968944099378882
Education : 3.6645962732919255
Social Networking : 3.291925465838509
Shopping : 2.608695652173913
Utilities : 2.515527950310559
Sports : 2.142857142857143
Music : 2.049689440993789
Health & Fitness : 2.018633540372671
Productivity : 1.7391304347826086
Lifestyle : 1.5838509316770186
News : 1.3354037267080745
Travel : 1.2422360248447204
Finance : 1.1180124223602486
Weather : 0.8695652173913043
Food & Drink : 0.8074534161490683
Reference : 0.5590062111801243
Business : 0.5279503105590062
Book : 0.43478260869565216
Navigation : 0.18633540372670807
Medical : 0.18633540372670807
Catalogs : 0.12422360248447205


The most common genre on the App Store for free, English apps is Games (58%).

The most common apps on the App Store are those designed for fun or recreation: the Games, Entertainment, Photo & Video and Social Networking genres together account for over 74% of the apps on the App Store.

Apps designed for practical or reference purposes are far less common. Reference, Business, Book, Navigation, Medical and Catalogs are the 6 least frequent genres and together account for just over 2% of the apps on the App Store.
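
The two percentages quoted above can be checked by adding up the corresponding rows of the frequency table (values copied from the output above):

```python
# Percentages copied from the iOS 'prime_genre' frequency table
fun_genres = {
    'Games': 58.13664596273293,
    'Entertainment': 7.888198757763975,
    'Photo & Video': 4.968944099378882,
    'Social Networking': 3.291925465838509,
}
least_frequent = {
    'Reference': 0.5590062111801243,
    'Business': 0.5279503105590062,
    'Book': 0.43478260869565216,
    'Navigation': 0.18633540372670807,
    'Medical': 0.18633540372670807,
    'Catalogs': 0.12422360248447205,
}

print(round(sum(fun_genres.values()), 2))      # 74.29
print(round(sum(least_frequent.values()), 2))  # 2.02
```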

In [12]:
print('Freq table for Android "Category" column:\n')
display_table(android_free, 1)  # display_table prints the table itself and returns None
print('================================================================')

Freq table for Android "Category" column:

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PAREN

'Fun' apps appear to be far less frequent on the Play Store, with just under 10% of apps in the 'GAME' category. Practical apps, such as those in the TOOLS, BUSINESS and PRODUCTIVITY categories, appear to be far more frequent on the Play Store than on the App Store.

In [13]:
print('Freq table for Android "Genres" column:\n')
display_table(android_free, 9)
print('================================================================')

Freq table for Android "Genres" column:

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411

The 'Genres' column of the Play Store data set contains more specific labels for each app's genre, so it has far more distinct values than the 'Category' column. To keep the comparison manageable, only the 'Category' frequency table will be used in the rest of the analysis.

# Most popular iOS app genres

Just because an app genre may be the most frequent on either the App Store or Play Store does not mean it is the most popular. An analysis of the number of downloads in each genre is required to determine the most popular genre for either app store.

While the most downloaded apps on the Play Store can be determined from the "Installs" column of that data set, there is no such indicator for the iOS App Store. As an alternative, the number of user ratings (found in the rating_count_tot column) will be used as a proxy for popularity when comparing iOS genres.

In [14]:
ios_genres = freq_table(ios_free, 12)

print('The average number of user ratings per app for each genre is shown below:\n')
for genre in ios_genres:
    total = 0
    len_genre = 0
    for row in ios_free:
        genre_app = row[12]
        if genre_app == genre:
            user_ratings = float(row[6])  # rating_count_tot column
            total += user_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre

    print(genre, ':', avg_n_ratings)

The average number of user ratings per app for each genre is shown below:

Productivity : 21028.410714285714
Weather : 52279.892857142855
Shopping : 26919.690476190477
Reference : 74942.11111111111
Finance : 31467.944444444445
Music : 57326.530303030304
Utilities : 18684.456790123455
Travel : 28243.8
Social Networking : 71548.34905660378
Sports : 23008.898550724636
Health & Fitness : 23298.015384615384
Games : 22812.92467948718
Food & Drink : 33333.92307692308
News : 21248.023255813954
Book : 39758.5
Photo & Video : 28441.54375
Entertainment : 14029.830708661417
Business : 7491.117647058823
Lifestyle : 16485.764705882353
Education : 7003.983050847458
Navigation : 86090.33333333333
Medical : 612.0
Catalogs : 4004.0
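
The loop above prints the genres in dictionary order, which makes the ranking hard to read. Assuming the free, English iOS apps were held in a pandas DataFrame (the `ios_sample` below is hypothetical), `groupby` followed by `mean` computes the same per-genre averages and sorts them in one expression:

```python
import pandas as pd

# Hypothetical free, English iOS apps with the two columns the loop above uses
ios_sample = pd.DataFrame({
    'prime_genre': ['Games', 'Games', 'Navigation', 'Weather', 'Navigation'],
    'rating_count_tot': [100, 300, 1000, 50, 3000],
})

# Mean number of ratings per app in each genre, highest first
avg_ratings = (
    ios_sample.groupby('prime_genre')['rating_count_tot']
              .mean()
              .sort_values(ascending=False)
)
print(avg_ratings)
```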


# Most downloaded Android categories

In [15]:
display_table(android_free, 5)  # display_table prints the table and returns None

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


In [16]:
categories_android = freq_table(android_free,1)
for category in categories_android:
    total = 0
    len_category = 0
    for row in android_free:
        app_category = row[1]
        if app_category == category:
            n_installs = row[5]
            n_installs = n_installs.replace(',' , '')
            n_installs = n_installs.replace('+' , '')
            total += float (n_installs)
            len_category += 1
            
    avg_n_installs = total / len_category
    print(category, ':', avg_n_installs)


ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_

Communication apps have the highest average number of installs in the list above (over 38 million). Since the installs column only gives lower bounds such as '1,000,000+', these averages are approximate. The average is also heavily skewed by a small number of apps with huge numbers of installs, such as WhatsApp, Skype and Gmail. Let's see which apps in the communication category have very large numbers of installs:

In [17]:
for app in android_free:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

If all communication apps with over 100 million installs were removed, it would have a major effect on the average number of installs for this category.
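
The effect of a few very large apps on an average is easy to see on a small example. A minimal sketch with hypothetical install counts (not taken from the data set) compares the mean, the median and the mean after dropping apps with 100 million or more installs:

```python
from statistics import mean, median

# Hypothetical install counts for one category, already converted to integers
installs = [1_000_000_000, 500_000_000, 100_000, 50_000, 10_000]

print(mean(installs))    # pulled up by the two largest apps
print(median(installs))  # 100000

# Drop apps with 100 million or more installs and recompute the mean
smaller_apps = [n for n in installs if n < 100_000_000]
print(mean(smaller_apps))
```

The median is far less sensitive to a handful of huge apps, so it may be a useful companion to the category averages above.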

A similar pattern occurs in other categories that are dominated by just a few major apps, including social (Facebook, Instagram, etc.), photography (Google+, Picasa and other popular photo editors) and productivity (Microsoft Office apps, Evernote, Google Calendar, etc.).

Many of the seemingly popular categories are therefore dominated by a few big players. Creating a new app to compete in one of those categories would be very difficult.

The books and reference category seems fairly popular on the Google Play Store, and the Reference genre was found to show potential on the App Store as well (around 75,000 user ratings per app on average). This category is worth a closer look, since our goal is to recommend an app genre/category that has the potential to be popular on both the iOS App Store and the Google Play Store.

Let's have a look at some of the apps from this genre in the Play Store:

In [18]:
for app in android_free:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E

In [19]:
# Remove the '+' and ',' characters from the number-of-installs column and convert
# it to int, so installs can be compared numerically
for app in android_free:
    app[5] = app[5].replace('+', '')
    app[5] = app[5].replace(',', '')
    app[5] = int(app[5])
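
If the data were loaded with pandas, as in the pandas section near the start, the same cleaning could be applied to the whole column at once through the `.str` accessor. A minimal sketch, assuming values formatted like the Play Store 'Installs' column (the sample Series is hypothetical):

```python
import pandas as pd

# Hypothetical values in the same format as the Play Store 'Installs' column
installs = pd.Series(['10,000+', '500,000+', '1,000,000,000+', '0'])

# Remove every '+' and ',' with one regular expression, then convert to integers
installs_numeric = installs.str.replace(r'[+,]', '', regex=True).astype(int)
print(installs_numeric.tolist())  # [10000, 500000, 1000000000, 0]
```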

In [20]:
import pandas as pd
above_1m = {}
for app in android_free:
    if app[1] == "BOOKS_AND_REFERENCE" and app[5] >= 1000000:
        n_dl = app[5]
        name = app[0]
        above_1m[name] = n_dl
        
series = pd.Series(above_1m)

series.sort_values(ascending = False)

Google Play Books                                     1000000000
Audiobooks from Audible                                100000000
Bible                                                  100000000
Amazon Kindle                                          100000000
Wattpad 📖 Free Books                                   100000000
Al-Quran (Free)                                         10000000
Cool Reader                                             10000000
FBReader: Favorite Book Reader                          10000000
HTC Help                                                10000000
Moon+ Reader                                            10000000
Aldiko Book Reader                                      10000000
English Hindi Dictionary                                10000000
Al Quran Indonesia                                      10000000
Al'Quran Bahasa Indonesia                               10000000
Quran for Android                                       10000000
Wikipedia                
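
Building a dictionary and then a Series works, but with the data in a DataFrame the same selection is a single boolean filter. A minimal sketch on hypothetical rows with installs already converted to integers:

```python
import pandas as pd

# Hypothetical rows with installs already converted to integers
apps = pd.DataFrame({
    'App': ['Reader A', 'Dictionary B', 'Reader C', 'Game D'],
    'Category': ['BOOKS_AND_REFERENCE', 'BOOKS_AND_REFERENCE',
                 'BOOKS_AND_REFERENCE', 'GAME'],
    'Installs': [5_000_000, 500_000, 100_000_000, 50_000_000],
})

# Boolean masks combined with & select books apps with at least 1 million installs
popular_books = apps[
    (apps['Category'] == 'BOOKS_AND_REFERENCE') & (apps['Installs'] >= 1_000_000)
].sort_values('Installs', ascending=False)

print(popular_books[['App', 'Installs']])
```

A dictionary keyed by app name would also silently merge two different apps that share a name, whereas the filter keeps every matching row.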

This category is dominated by e-book readers and various types of dictionary apps. Building another app to compete directly in this area would be difficult.

There are a number of apps related to the Quran with over 1M downloads. This suggests that an app built around a popular book may have good potential for profitability.

An app that adds features related to the book, such as audio summaries of each chapter's highlights or a forum for people to discuss the book, would probably be even more appealing. A forum would introduce a social aspect to the app, and the social category has already been shown to be among the most popular.

# Conclusion

We have analysed data about apps in the iOS App Store and Google Play Store in order to recommend an app profile that could be profitable on both platforms.

The conclusion reached in this analysis is that creating an app related to a popular book and adding some additional, interactive features could be profitable in both markets. Features beyond the raw text of the book would make the app more appealing and competitive than similar apps that only contain the raw text.