### Profitable App Profiles for the App Store and Google Play Markets

The goal of this project is to collect and analyze data about mobile apps available on Google Play and the App Store to understand what type of apps are likely to attract more users.


In [2]:
#opening and reading the files

opened_file1 = open('AppleStore.csv')
opened_file2 = open('googleplaystore.csv')
from csv import reader
file1 = reader(opened_file1)
file2 = reader(opened_file2)
ios_data = list(file1)
android_data = list(file2)

## I - Data Exploration

In [3]:
# exploring the files data

def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

ios_columns = ios_data[0]
android_columns = android_data[0]


print('ios data preview')
print('\n')
print(ios_columns)
print('\n')
explore_data(ios_data,1,5,True)
print('\n')
print('android data preview')
print('\n')
print(android_columns)
print('\n')
explore_data(android_data,1,5,True)





ios data preview


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


Number of rows: 7198
Number of columns: 16


android data preview


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated'

#### Main columns (factors) to consider in the analysis

|Column name in PlayStore dataset|Column name in AppStore dataset        |
|--------------------------------|---------------------------------------|
|App                             |track_name                             |
|Rating                          |user_rating                            |
|Genres                          |prime_genre                            |
|Content Rating                  |cont_rating                            |
|Price                           |price                                  |
|Size                            |size_bytes                             |


## II - Data Cleaning

### 1 - Removing incorrect data


In [4]:
# Step 1 - detecting incorrect data. It has been mentioned in Kaggle that this entry index 10472, in the GooglePlayStore dataset is missing the rating

print(android_data[10473])

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


In [5]:
# The row 10473 is definitely missing the rating entry, we will then remove it
del android_data[10473]

### 2 - Removing duplicates

In [6]:
# Step 2 - Removing duplicates

for app in android_data:
    if app[0]=='Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


In [7]:
#The code aboves clearly shows that the dataset contains duplicates, let's count how many duplicates the android_data has

unique_apps=[]
dup_apps=[]

for app in android_data[1:]:
    name=app[0]
    if name in unique_apps:
        dup_apps.append(name)
    else:
        unique_apps.append(name)
        
print('Number of android unique apps : ', len(unique_apps))
print('Number of android dup apps : ', len(dup_apps))
print('Examples of android dup apps : ', dup_apps[:5])




Number of android unique apps :  9659
Number of android dup apps :  1181
Examples of android dup apps :  ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings']


#### Criteria to remove duplicates
After examining the rows we printed for the Instagram app, we noticed that the main difference happens on the fourth position of each row, which corresponds to the number of reviews. We will use this information to build a criterion for removing the duplicates. (i.e. rather than removing duplicates randomly, we'll only keep the row with the highest number of reviews and remove the other entries for any given app).

To do so, we will first create an empty dictionary `reviews_max` where we will the store the unique names of apps and their highest number of ratings, then we will create an empty list `android_clean` that is going to be our cleaned dataset free from duplicates.


In [8]:
reviews_max = {}
for app in android_data[1:]:
    name=app[0]
    n_reviews=float(app[3])
    if name in reviews_max and reviews_max[name]<n_reviews:
        reviews_max[name]=n_reviews
    elif name not in reviews_max:
        reviews_max[name]=n_reviews
        
print(len(reviews_max)) # should be equal to 9659
    

9659


In [9]:
android_clean=[]
already_added=[]

for app in android_data[1:]:
    name=app[0]
    n_reviews=float(app[3])
    
    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(app)
        already_added.append(name)

#let's see how many occurence of Instagram we'll get now
for app in android_clean:
    if app[0]=='Instagram':
        print(app)

#let's also print the length of our cleaned dataset
print('\n')
print('Number of records after removing duplicates : '+str(len(android_clean)))
print(explore_data(android_clean,0,5,True))

     



['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


Number of records after removing duplicates : 9659
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite ‚Äì FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructi

In [10]:
#checking if there's any duplicates in the AppStore dataset

unique_ios_apps=[]
dup_ios_apps=[]

for app in ios_data[1:]:
    app_id=app[0]
    if app_id in unique_ios_apps:
        dup_ios_apps.append(app_id)
    else:
        unique_ios_apps.append(app_id)

print(len(dup_ios_apps))

    

0


### 4 - Dropping Non English Apps
To make sure that we only keep the english titled apps, we're going to loop through all the characters of each app title; if we find that an app title has more than 3 non-english characters we're going to drop the app from the list.
In python, we can check if a character is non-english by using the built in function `ord`. If `ord(char)>127`, we can assume that the character is non-english.

In [11]:
def IsEnglish(string_):
    ne_chars=0
    for i in string_:
        if ord(i)>127:
            ne_chars+=1
        if ne_chars>3:
            return False
    return True

android_apps_c1=[]
ios_apps_c1=[]

for app in ios_data[1:]:
    name=app[1]
    if IsEnglish(name):
        ios_apps_c1.append(app)

for app in android_clean:
    name=app[1]
    if IsEnglish(name):
        android_apps_c1.append(app)


print(explore_data(ios_apps_c1,0,5,True))
print(explore_data(android_apps_c1,0,5,True))



['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows: 6183
Number of columns: 16
None
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite ‚Äì FREE Live Cool Themes, Hide Ap

### 4 - Dropping Non-Free Apps


In [12]:
#Isolating non free apps

ios_data_clean=[]
android_data_clean=[]

for app in ios_apps_c1:
    if app[4]=='0.0':
        ios_data_clean.append(app)
        
for app in android_apps_c1:
    if app[6]=='Free':
        android_data_clean.append(app)
        
print('lenth of cleaned ios dataset is: '+str(len(ios_data_clean)))
print('lenth of cleaned android dataset is: '+str(len(android_data_clean)))

lenth of cleaned ios dataset is: 3222
lenth of cleaned android dataset is: 8904


## III - Data Analysis

### 1 - Most Common Apps by Genre


As we mentioned in the introduction, our goal is to determine the kinds of apps that are likely to attract more users because the number of people using our apps affect our revenue.

To minimize risks and overhead, our validation strategy for an app idea has three steps:

Build a minimal Android version of the app, and add it to Google Play.
If the app has a good response from users, we develop it further.
If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.
Because our end goal is to add the app on both Google Play and the App Store, we need to find app profiles that are successful in both markets. For instance, a profile that works well for both markets might be a productivity app that makes use of gamification.

Let's begin the analysis by determining the most common genres for each market. For this, we'll need to build `frequency tables` for a few columns in our datasets.

In [13]:
# datasets inspections

print(explore_data(ios_data_clean,0,5,True))
print(explore_data(android_data_clean,0,5,True))

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows: 3222
Number of columns: 16
None
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite ‚Äì FREE Live Cool Themes, Hide Ap

In [14]:
def freq_table(dataset,index):
    freq_table={} 
    total=0
    
    for row in dataset:
        total+=1
        genre=row[index] 
        if genre in freq_table:
            freq_table[genre]+=1
        else:
            freq_table[genre]=1

    #percentages
    freq_table_perc={}
    for genre in freq_table:
        freq_table_perc[genre]=freq_table[genre]/total
        
    return freq_table_perc


def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
            

print('Frequency table for AppleStore app genres')
print('\n')
print(display_table(ios_data_clean, 11))
print('\n')
print('\n')
print('Frequency table for PlayStore app genres')
print('\n')
print(display_table(android_data_clean, 1))
print('\n')
print(display_table(android_data_clean, 9))
print('\n')
                
            

Frequency table for AppleStore app genres


Games : 0.5816263190564867
Entertainment : 0.07883302296710118
Photo & Video : 0.04965859714463067
Education : 0.03662321539416512
Social Networking : 0.032898820608317815
Shopping : 0.0260707635009311
Utilities : 0.025139664804469275
Sports : 0.021415270018621976
Music : 0.020484171322160148
Health & Fitness : 0.020173805090006207
Productivity : 0.01738050900062073
Lifestyle : 0.015828677839851025
News : 0.01334574798261949
Travel : 0.012414649286157667
Finance : 0.0111731843575419
Weather : 0.008690254500310366
Food & Drink : 0.008069522036002483
Reference : 0.00558659217877095
Business : 0.005276225946617008
Book : 0.004345127250155183
Navigation : 0.00186219739292365
Medical : 0.00186219739292365
Catalogs : 0.0012414649286157666
None




Frequency table for PlayStore app genres


FAMILY : 0.1896900269541779
GAME : 0.09703504043126684
TOOLS : 0.08434411500449236
BUSINESS : 0.04582210242587601
LIFESTYLE : 0.03930817610062893
PRODUCTIVITY : 

#### iOs data freq table analysis
Based on the results above, we can see that almost `60%` of the apps in the App Store are `Games`, which is the most common genre. We can also see that in general, most of the apps are designed for entertainment (games, photo and video, social networking, sports, music).

#### Android data freq table analysis
Based on the results above, we can see that almost the most common genre in the Play Store is `Family`. We can also notice that in contrast to the ios data freq table, Google Play shows a more balanced landscape of both practical and fun apps. 

#### App recommendation based on the results above
So far, we have anlyzed the most frequent apps in each store. However, can't recommend an app profile for the App Store and Play Store markets based on these frequency tables alone. We should also analyze the number of downloads in each genre, which will give us an idea of the audience interest.


### 2 - Aps with the most users by Genre

#### AppStore


The frequency tables we analyzed on the previous screen showed us that apps designed for fun dominate the App Store, while Google Play shows a more balanced landscape of both practical and fun apps. Now, we'd like to determine the kind of apps with the most users.

One way to find out what genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play data set, we can find this information in the `Installs` column, but this information is missing for the App Store data set. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the `rating_count_tot app`.


In [24]:
# average number of user ratings per app genre on the App Store
genres_ios=freq_table(ios_data_clean,-5)

for genre in genres_ios:
    total=0
    len_genre=0
    for app in ios_data_clean:
        genre_app=app[-5]
        if genre_app==genre:
            num_ratings=float(app[5])
            total+=num_ratings
            len_genre+=1
    print(genre,' : ',total/len_genre)
        
    
        
        
    
        
    

Social Networking  :  71548.34905660378
Photo & Video  :  28441.54375
Games  :  22788.6696905016
Music  :  57326.530303030304
Reference  :  74942.11111111111
Health & Fitness  :  23298.015384615384
Weather  :  52279.892857142855
Utilities  :  18684.456790123455
Travel  :  28243.8
Shopping  :  26919.690476190477
News  :  21248.023255813954
Navigation  :  86090.33333333333
Lifestyle  :  16485.764705882353
Entertainment  :  14029.830708661417
Food & Drink  :  33333.92307692308
Sports  :  23008.898550724636
Book  :  39758.5
Finance  :  31467.944444444445
Education  :  7003.983050847458
Productivity  :  21028.410714285714
Business  :  7491.117647058823
Catalogs  :  4004.0
Medical  :  612.0


On average, navigation apps have the highest number of user reviews, but this figure is heavily influenced by Waze and Google Maps, which have close to half a million user reviews together as shown below:

In [25]:
for app in ios_data_clean:
    if app[-5]=='Navigation':
        print(app[1],' : ',app[5])

Waze - GPS Navigation, Maps & Real-time Traffic  :  345046
Google Maps - Navigation & Transit  :  154911
Geocaching¬Æ  :  12811
CoPilot GPS ‚Äì Car Navigation & Offline Maps  :  3582
ImmobilienScout24: Real Estate Search in Germany  :  187
Railway Route Search  :  5


The same pattern applies to social networking apps, where the average number is heavily influenced by a few giants like Facebook, Pinterest, Skype, etc. Same applies to music apps, where a few big players like Pandora, Spotify, and Shazam heavily influence the average number.

Our aim is to find popular genres, but navigation, social networking or music apps might seem more popular than they really are. The average number of ratings seem to be skewed by very few apps which have hundreds of thousands of user ratings, while the other apps may struggle to get past the 10,000 threshold. We could get a better picture by removing these extremely popular apps for each genre and then rework the averages, but we'll leave this level of detail for later.
Reference apps have 74,942 user ratings on average, but it's actually the Bible and Dictionary.com which skew up the average rating:


In [27]:
for app in ios_data_clean:
    if app[-5]=='Reference':
        print(app[1],' : ',app[5])

Bible  :  985920
Dictionary.com Dictionary & Thesaurus  :  200047
Dictionary.com Dictionary & Thesaurus for iPad  :  54175
Google Translate  :  26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran  :  18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition  :  17588
Merriam-Webster Dictionary  :  16849
Night Sky  :  12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE)  :  8535
LUCKY BLOCK MOD ‚Ñ¢ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools  :  4693
GUNS MODS for Minecraft PC Edition - Mods Tools  :  1497
Guides for Pok√©mon GO - Pokemon GO News and Cheats  :  826
WWDC  :  762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free  :  718
VPN Express  :  14
Real Bike Traffic Rider Virtual Reality Glasses  :  8
Êïô„Åà„Å¶!goo  :  0
Jishokun-Japanese English Dictionary & Translator  :  0


However, this niche seems to show some potential. One thing we could do is take another popular book and turn it into an app where we could add different features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes about the book, etc. On top of that, we could also embed a dictionary within the app, so users don't need to exit our app to look up words in an external app.

This idea seems to fit well with the fact that the App Store is dominated by for-fun apps. This suggests the market might be a bit saturated with for-fun apps, which means a practical app might have more of a chance to stand out among the huge number of apps on the App Store.

Other genres that seem popular include weather, book, food and drink, or finance. The book genre seem to overlap a bit with the app idea we described above, but the other genres don't seem too interesting to us:

- Weather apps ‚Äî people generally don't spend too much time in-app, and the chances of making profit from in-app adds are low. Also, getting reliable live weather data may require us to connect our apps to non-free APIs.

- Food and drink ‚Äî examples here include Starbucks, Dunkin' Donuts, McDonald's, etc. So making a popular food and drink app requires actual cooking and a delivery service, which is outside the scope of our company.

- Finance apps ‚Äî these apps involve banking, paying bills, money transfer, etc. Building a finance app requires domain knowledge, and we don't want to hire a finance expert just to build an app.

Now let's analyze the Google Play market a bit.

#### PlayStore

In [28]:
display_table(android_data_clean,5)

1,000,000+ : 0.15689577717879605
100,000+ : 0.11579065588499551
10,000,000+ : 0.10500898472596586
10,000+ : 0.1025381850853549
1,000+ : 0.08423180592991913
100+ : 0.06918238993710692
5,000,000+ : 0.06817160826594788
500,000+ : 0.05536837376460018
50,000+ : 0.04818059299191375
5,000+ : 0.04526055705300988
10+ : 0.03537735849056604
500+ : 0.03234501347708895
50,000,000+ : 0.022911051212938006
100,000,000+ : 0.02122641509433962
50+ : 0.019092542677448338
5+ : 0.007861635220125786
1+ : 0.005166217430368374
500,000,000+ : 0.0026954177897574125
1,000,000,000+ : 0.0022461814914645105
0+ : 0.0004492362982929021


In [37]:
android_cats=freq_table(android_data_clean,1)
for category in android_cats:
    total=0
    len_category=0
    for app in android_data_clean:
        category_app=app[1]
        if category==category_app:
            n_installs=app[5]
            nis=n_installs.replace('+','')
            nis2=nis.replace(',','')
            total+=float(nis2)
            len_category+=1
    print(category, ' ; ',total/len_category)


ART_AND_DESIGN  ;  1952105.1724137932
AUTO_AND_VEHICLES  ;  647317.8170731707
BEAUTY  ;  513151.88679245283
BOOKS_AND_REFERENCE  ;  8587351.855670104
BUSINESS  ;  1708215.906862745
COMICS  ;  803234.8214285715
COMMUNICATION  ;  38322625.697916664
DATING  ;  854028.8303030303
EDUCATION  ;  1825480.7692307692
ENTERTAINMENT  ;  11640705.88235294
EVENTS  ;  253542.22222222222
FINANCE  ;  1387692.475609756
FOOD_AND_DRINK  ;  1924897.7363636363
HEALTH_AND_FITNESS  ;  4188821.9853479853
HOUSE_AND_HOME  ;  1331540.5616438356
LIBRARIES_AND_DEMO  ;  638503.734939759
LIFESTYLE  ;  1436126.94
GAME  ;  15551995.891203703
FAMILY  ;  3671043.037892244
MEDICAL  ;  120550.61980830671
SOCIAL  ;  23253652.127118643
SHOPPING  ;  7001693.425
PHOTOGRAPHY  ;  17772018.759541985
SPORTS  ;  3638640.1428571427
TRAVEL_AND_LOCAL  ;  13984077.710144928
TOOLS  ;  10787009.952063914
PERSONALIZATION  ;  5183850.806779661
PRODUCTIVITY  ;  16738957.554913295
PARENTING  ;  542603.6206896552
WEATHER  ;  5074486.197183099

On average, communication apps have the most installs: 38,456,119. This number is heavily skewed up by a few apps that have over one billion installs (WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail, and Hangouts), and a few others with over 100 and 500 million installs:

In [39]:
for app in android_data_clean:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger ‚Äì Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Me

If we removed all the communication apps that have over 100 million installs, the average would be reduced roughly ten times:

In [41]:
under_100_m = []

for app in android_data_clean:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'COMMUNICATION') and (float(n_installs) < 100000000):
        under_100_m.append(float(n_installs))
        
sum(under_100_m) / len(under_100_m)

3589717.245210728

We see the same pattern for the video players category, which is the runner-up with 24,727,872 installs. The market is dominated by apps like Youtube, Google Play Movies & TV, or MX Player. The pattern is repeated for social apps (where we have giants like Facebook, Instagram, Google+, etc.), photography apps (Google Photos and other popular photo editors), or productivity apps (Microsoft Word, Dropbox, Google Calendar, Evernote, etc.).

Again, the main concern is that these app genres might seem more popular than they really are. Moreover, these niches seem to be dominated by a few giants who are hard to compete against.

The game genre seems pretty popular, but previously we found out this part of the market seems a bit saturated, so we'd like to come up with a different app recommendation if possible.

The books and reference genre looks fairly popular as well, with an average number of installs of 8,767,811. It's interesting to explore this in more depth, since we found this genre has some potential to work well on the App Store, and our aim is to recommend an app genre that shows potential for being profitable on both the App Store and Google Play.

Let's take a look at some of the apps from this genre and their number of installs:

In [43]:
for app in android_data_clean:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra ‚Äì free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+

The book and reference genre includes a variety of apps: software for processing and reading ebooks, various collections of libraries, dictionaries, tutorials on programming or languages, etc. It seems there's still a small number of extremely popular apps that skew the average:

In [45]:
for app in android_data_clean:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000+'
                                            or app[5] == '5,000,000+'
                                            or app[5] == '10,000,000+'
                                            or app[5] == '50,000,000+'):
        print(app[0], ':', app[5])

Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
ReadEra ‚Äì free ebook reader : 1,000,000+
Ebook Reader : 5,000,000+
Read books online : 5,000,000+
eBoox: book reader fb2 epub zip : 1,000,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
Moon+ Reader : 10,000,000+
English-Myanmar Dictionary : 1,000,000+
Golden Dictionary (EN-AR) : 1,000,000+
All Language Translator Free : 1,000,000+
Aldiko Book Reader : 10,000,000+
Dictionary - WordWeb : 5,000,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Al-Quran (Free) : 10,000,000+
Al Quran Indonesia : 10,000,000+
Al'Quran Bahasa Indonesia : 10,000,000+
Al Quran Al karim : 1,000,000+
Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+
Koran Read &MP3 30 Juz Offline : 1,000,000+

This niche seems to be dominated by software for processing and reading ebooks, as well as various collections of libraries and dictionaries, so it's probably not a good idea to build similar apps since there'll be some significant competition.

We also notice there are quite a few apps built around the book Quran, which suggests that building an app around a popular book can be profitable. It seems that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets.

However, it looks like the market is already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.

### Conclusions

In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.

We concluded that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets. The markets are already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.