# An Ideal App Profile for a Profitable App in iOS and Google Play Stores  

In this project we take a closer look at the types of apps that users tend to use more than others.  
At our company, all the apps we build are free. Whether they are iOS apps or Android apps, the only source of revenue is in-app ads.  
This analysis will help our developers understand which types of apps are likely to attract more users and so increase the company's overall profit.  
The goal of this project is to suggest an app profile that can be built for both stores.

## Opening our data sets
We can begin by opening both of our data sets

In [1]:
from csv import reader

# Apple App Store data set
with open('AppleStore.csv', encoding='utf8') as open_file:
    ios_set = list(reader(open_file))
ios_header = ios_set[0]
ios_set = ios_set[1:]

# Google Play data set
with open('googleplaystore.csv', encoding='utf8') as open_file:
    google_set = list(reader(open_file))
google_header = google_set[0]
google_set = google_set[1:]

def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
   



print(google_header)
print('\n')
explore_data(google_set, 0, 2, True)


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


We have 10841 rows and 13 columns in the Google Play data set. For the purpose of our analysis, we are interested in the following columns: 'App', 'Category', 'Reviews', 'Installs', 'Type', 'Price', and 'Genres'.

In [2]:
print (ios_header)
print('\n')
explore_data(ios_set,0,2, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of rows: 7197
Number of columns: 16


We have 7197 rows and 16 columns in the iOS App Store data set. For the purpose of our analysis, we are interested in the following columns: 'track_name', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', and 'prime_genre'. The [documentation](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps) describes what each column means.
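
The later cells address columns by position (for example `apps[4]` for the price, or `-5` for `prime_genre`). A minimal sketch of looking the position up from the header row instead, using the header lists printed above; the helper name `column_index` is hypothetical and not part of the original notebook.

```python
# Header rows copied from the outputs above.
ios_header = ['id', 'track_name', 'size_bytes', 'currency', 'price',
              'rating_count_tot', 'rating_count_ver', 'user_rating',
              'user_rating_ver', 'ver', 'cont_rating', 'prime_genre',
              'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
google_header = ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs',
                 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated',
                 'Current Ver', 'Android Ver']

def column_index(header, name):
    """Return the position of column `name` in `header` (hypothetical helper)."""
    return header.index(name)

print(column_index(ios_header, 'prime_genre'))  # 11, the same column as index -5
print(column_index(google_header, 'Installs'))  # 5
```

Looking columns up by name keeps the analysis readable and avoids silently reading the wrong field if the column order ever changes.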

## Removing inaccurate data
After reading the [discussion](https://www.kaggle.com/lava18/google-play-store-apps/discussion) about the Google Play data set, we realize there is an error in row 10472. The row is missing a value, so it has fewer fields than the header and its data are shifted. Therefore, we have to remove it.

In [3]:
for row in google_set:
    if len(row) !=len(google_header):
        print(row)
        print(google_set.index(row))


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
10472


In [4]:
print(google_set[10472])

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


In [5]:
a = []
for row in google_set:
    if len(row) !=len(google_header):
        a.append(row)
print(a)

[['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']]


In [6]:
del google_set[10472]

In [7]:
print(google_set[10472])

['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']


In [8]:
len(google_set)

10840
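
Deleting by a hard-coded index is not repeatable: running `del google_set[10472]` a second time would remove a valid row. A sketch of an alternative that filters on row length, shown on a small made-up data set (the rows and the name `toy_rows` are illustrative only):

```python
header = ['App', 'Category', 'Rating']

# Toy rows for illustration; the second one is missing its Category value.
toy_rows = [
    ['App A', 'TOOLS', '4.2'],
    ['App B', '1.9'],
    ['App C', 'GAME', '4.5'],
]

# Keep only rows with the same number of fields as the header.
well_formed = [row for row in toy_rows if len(row) == len(header)]

print(len(well_formed))                 # 2
print([row[0] for row in well_formed])  # ['App A', 'App C']
```

Because the filter builds a new list, the cell can be re-run any number of times with the same result.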

In [9]:
for row in ios_set:
    if len(row) !=len(ios_header):
        print(row)
        print(ios_set.index(row))

In [10]:
repeated_apps= []
unique_apps= []

for row in google_set:
    name = row[0]
    if name in unique_apps:
        repeated_apps.append(name)
    else:
        unique_apps.append(name)
print(len(repeated_apps))


1181


## Data Cleaning
After exploring our data from the Google Play store, we noticed that some apps appear more than once. We do not want to count the same app twice or more in our analysis, so we need to remove the duplicates from our set. Here is a sample of the duplicated apps:

In [11]:
print (repeated_apps[:10])

['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']


Now we need a criterion for removing these duplicates. The removal will not be random. Looking at the data, we notice that duplicated entries differ in the fourth column, which is the number of reviews. A higher number of reviews should reflect more recent data, so for each app we keep only the row with the highest number of reviews and remove the others.

In [12]:
print ('Expected length', len(google_set)-1181)

Expected length 9659


In [13]:
reviews_max= {}
for app in google_set:
    name=app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
len(reviews_max)

9659

In [14]:
googleclean = []
alreadyadded = []
for app in google_set:
    name = app[0]
    n_reviews = float(app[3])
    if (n_reviews == reviews_max[name]) and name not in alreadyadded:
        googleclean.append(app)
        alreadyadded.append(name)
len(googleclean)

9659

In [15]:
explore_data(googleclean, 0, 2, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 9659
Number of columns: 13


In the above steps we did two things:  
**First**: We build a dictionary called **reviews_max** that maps each app name to the highest number of reviews found for that app across all of its rows.  
We then use this dictionary to create a new data set that keeps only the row with the highest number of reviews for each app.

**Second**: We create two empty lists. One, named **googleclean**, holds the clean data set without duplicates. The other, named **alreadyadded**, keeps track of the app names that are already in the clean data set.  
We loop over the apps in google_set, and for each iteration:  
we extract the name and the number of reviews;  
we add the row to the googleclean list and the app name to the alreadyadded list if both of the following hold:  
the number of reviews equals the number stored for that app in the reviews_max dictionary, and  
the name of the app is not already in the alreadyadded list.  
We added this second condition because some duplicated rows have the same number of reviews, so without it the same app could show up in our clean data set more than once.
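
The two steps can also be combined into one pass by storing the whole row, not only the review count, for each name. A minimal sketch on made-up rows (the function name `keep_most_reviewed` and the sample data are hypothetical). Unlike the cells above, the result is ordered by the first appearance of each name.

```python
def keep_most_reviewed(rows, name_idx=0, reviews_idx=3):
    """Return one row per app name, keeping the row with the most reviews."""
    best = {}  # app name -> row with the highest review count seen so far
    for row in rows:
        name = row[name_idx]
        n_reviews = float(row[reviews_idx])
        # Strict '>' means a later row with an equal count does not replace
        # the first one, so each name ends up in the result exactly once.
        if name not in best or n_reviews > float(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())

# Made-up rows in the Google Play column order (App, Category, Rating, Reviews).
sample = [
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Slack', 'BUSINESS', '4.4', '51507'],
    ['Box', 'BUSINESS', '4.2', '159904'],
    ['Slack', 'BUSINESS', '4.4', '51507'],
]

for row in keep_most_reviewed(sample):
    print(row[0], row[3])
```

Using a dictionary keyed by name also avoids the repeated `name not in alreadyadded` list scans, which get slower as the list grows.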

In [16]:
def englonly(string):
    outofrange = 0
    
    for character in string:
        if ord(character) > 127:
            outofrange += 1
    
    if outofrange > 3:
        return False
    else:
        return True
print (englonly('Docs To Go™ Free Office Suite'))


True


In [17]:
ios_english= []
google_english= []
for apps in ios_set:
    name = apps[1]
    if englonly(name):
        ios_english.append(apps)
for apps in googleclean:
    name = apps[0]
    if englonly(name):
        google_english.append(apps)
print(len(ios_english))
print(len(google_english))

6183
9614


After removing the duplicated apps, we notice some apps are directed at non-English speakers. Our analysis focuses on English speakers, so we need to filter out apps whose names use characters outside the English alphabet. Every character has a corresponding number, and the characters commonly used in English text (letters, digits, punctuation) fall in the ASCII range, 0 to 127. This means that if a name contains a non-English character, the built-in function *ord()* returns a number greater than 127 for it.  
In the steps above we wrote a function **englonly** that takes in a string (the name of the app) and returns *False* if the name contains too many characters outside this range.  
The function uses *ord()* to check whether each character's number is greater than 127 and, if so, increments a counter called **outofrange**.  
We should keep in mind that some English apps have symbols and emojis in their names that fall outside the range. To avoid dropping them, the function accepts a name with 3 or fewer out-of-range characters; with more than that, the app is likely not aimed at an English-speaking audience.  
Now we can use our new function to build new lists with the non-English apps filtered out of both data sets.

In [18]:
explore_data(ios_english, 0, 2, True)
print('\n')
explore_data(google_english, 0, 2, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of rows: 6183
Number of columns: 16


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 9614
Number of columns: 13


In [19]:
ios_free=[]

for apps in ios_english:
    price= apps[4]
    if price == '0.0':
        ios_free.append(apps)
print(len(ios_free))


3222


In [20]:
google_free=[]
for apps in google_english:
    price = apps[7]
    if price == '0':
        google_free.append(apps)
print(len(google_free))

8864


Since we are only interested in free apps, in the above steps we isolated the free apps from each set. We looped over each set and, in each iteration, added the app to a new list if its price is zero: **ios_free** for the iOS apps and **google_free** for the Google Play apps.  
These are the final sets after the cleaning process. Our data is now ready for analysis.
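
Comparing price strings only works while a free price is always written exactly as `'0.0'` or `'0'`. A slightly more defensive sketch converts the price to a number first. It assumes paid Google Play prices may carry a leading `$`, which is an assumption about the data and is not shown in the outputs above; `is_free` is a hypothetical helper.

```python
def is_free(price_text):
    """Return True if a price string such as '0', '0.0' or '$0.00' is zero."""
    # Assumption: a price is a plain number, optionally prefixed with '$'.
    cleaned = price_text.strip().lstrip('$')
    return float(cleaned) == 0.0

print(is_free('0.0'))    # True
print(is_free('0'))      # True
print(is_free('$4.99'))  # False
```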

## Data Analysis
We would like to determine which apps are likely to attract more users, because our company's profit depends solely on how many users use an app.  

We will build an Android app and add it to the Google Play store.  
If the app gets a good response from users, we will improve it further.  
After six months, if the app is profitable, we will add it to the App Store.  
Since our goal is to have the app on both the App Store and Google Play, we need to find app profiles that are successful in both stores.  
We can start our analysis by looking at the most common genres in each store. To do that, we create a frequency table for each set to examine the genre columns.  


In [21]:
def freq_table(dataset, index):
    ftable = {}
    total = 0
    for row in dataset:
        total +=1
        value = row[index]
        if value in ftable:
            ftable[value]+=1
        else:
            ftable[value]=1
    percentage_table={}
    for key in ftable:
        percentage = (ftable[key] / total ) * 100
        percentage_table[key] = percentage
    return percentage_table

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
display_table(ios_free, -5)
        

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


We can see that the iOS store is dominated by Games, accounting for 58% of the free English apps. Entertainment comes second with almost 8% of the apps in our set.  
We can conclude that our iOS data (free English apps only) is by far dominated by games and other apps aimed at an audience looking for entertainment.  
On the other hand, apps in categories we can call practical, such as Book, Navigation, Productivity, and Utilities, are not as common.  
These results can be misleading, however. A large number of apps in a genre does not imply that the genre has a large number of users, because the demand might not match the supply.
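
The percentage table can also be built with `collections.Counter` from the standard library. A minimal sketch on a made-up genre column (the function name `percentage_table` and the sample values are illustrative):

```python
from collections import Counter

def percentage_table(values):
    """Map each distinct value to its share of `values`, in percent."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total * 100 for value, count in counts.items()}

# Made-up genre column for illustration.
genres = ['Games', 'Games', 'Games', 'Education', 'Music']

table = percentage_table(genres)
# Sort by percentage, largest first, as display_table does.
for genre, pct in sorted(table.items(), key=lambda item: item[1], reverse=True):
    print(genre, ':', pct)
```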

In [22]:
display_table(google_free, 1)

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

In [23]:
display_table(google_free, -4)

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

Moving on to the Google Play data set, we have two columns to work with: Category and Genres. The distinction between the two is not clear, but Genres is more granular, so we will go with the broader one, the Category column.  
Looking at the percentages for the Category column, the picture differs from the iOS store. Here the apps are spread across categories without one category dominating. Family tops the list at 19%, followed by Game and then Tools. The pattern suggests that Google Play apps lean more toward the practical side than toward entertainment, unlike the iOS store.  


In conclusion, comparing both stores, we see that the App Store is dominated by Games and Entertainment apps, whereas the Play store is spread between practical apps and Games and Entertainment apps.

One way to find out which genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play data set, we can find this information in the Installs column, but for the App Store data set this information is missing. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the rating_count_tot column.

Below, we calculate the average number of user ratings per app genre on the App Store:

In [24]:
ios_genre = freq_table(ios_free, -5)
for genre in ios_genre:
    total = 0
    len_genre = 0
    for apps in ios_free:
        genre_app = apps[-5]
        if genre_app == genre:
            numRating = float(apps[5])
            total += numRating
            len_genre +=1
    avg_rating = total/len_genre
    print(genre, ':', avg_rating)

Photo & Video : 28441.54375
Utilities : 18684.456790123455
Shopping : 26919.690476190477
Music : 57326.530303030304
Catalogs : 4004.0
Productivity : 21028.410714285714
Reference : 74942.11111111111
Book : 39758.5
Entertainment : 14029.830708661417
Sports : 23008.898550724636
News : 21248.023255813954
Lifestyle : 16485.764705882353
Education : 7003.983050847458
Finance : 31467.944444444445
Food & Drink : 33333.92307692308
Navigation : 86090.33333333333
Travel : 28243.8
Social Networking : 71548.34905660378
Weather : 52279.892857142855
Business : 7491.117647058823
Health & Fitness : 23298.015384615384
Medical : 612.0
Games : 22788.6696905016
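
The loop above rescans the whole data set once per genre. A single-pass sketch that accumulates a running total and count per genre, shown on made-up rows (`average_by_group` is a hypothetical helper, not part of the original notebook):

```python
from collections import defaultdict

def average_by_group(rows, group_idx, value_idx):
    """Average the numeric column `value_idx` for each value of `group_idx`."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[group_idx]] += float(row[value_idx])
        counts[row[group_idx]] += 1
    return {group: totals[group] / counts[group] for group in totals}

# Made-up (name, genre, rating_count) rows for illustration.
rows = [
    ['App A', 'Games', '100'],
    ['App B', 'Games', '300'],
    ['App C', 'Music', '50'],
]

print(average_by_group(rows, group_idx=1, value_idx=2))
# {'Games': 200.0, 'Music': 50.0}
```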


In [25]:
for apps in ios_free:
    if apps[-5] == 'Navigation':
        print(apps[1], ':', apps[5])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


Looking at this data, the Navigation genre has the highest average number of user ratings, but this is because of two major apps, Waze and Google Maps. The same can be said about the Social Networking genre, where the average is high because of just a few apps like Facebook and Instagram. Therefore, to get a clearer picture of the popular genres, we would need to remove these giant apps from each genre, because they significantly skew the averages.  
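
One way to see the skew without removing apps by hand, which is not part of the original analysis, is to compare the mean with the median. The sketch below uses the six Navigation rating counts printed above:

```python
from statistics import mean, median

# Rating counts of the free Navigation apps printed above.
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

print(mean(navigation_ratings))    # about 86090.33, pulled up by Waze and Google Maps
print(median(navigation_ratings))  # 8196.5
```

The median is roughly a tenth of the mean, which confirms that a typical Navigation app has far fewer ratings than the average suggests.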

However, the Reference genre (about 74,942 user ratings per app on average) seems to show some potential. One thing we could do is take a popular book and turn it into an app where we could add different features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes about the book, etc. On top of that, we could also embed a dictionary within the app, so users don't need to exit our app to look up words in an external app.

This idea seems to fit well with the fact that the App Store is dominated by for-fun apps. This suggests the market might be a bit saturated with for-fun apps, which means a practical app might have more of a chance to stand out among the huge number of apps on the App Store.

### Most Popular App by Genre on Google Play Store
Here we have the Installs column, which we can use to estimate the number of users for each app.

In [26]:
display_table(google_free, 5) #the Install column

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


However, the install numbers are not precise: most values are open-ended (100+, 1,000+, 5,000+, and so on). For instance, we don't know whether an app with 100,000+ installs has 100,000 installs, 200,000, or 350,000. We don't need very precise data for our purposes, though. We only want to find out which app genres attract the most users, and we don't need perfect precision with respect to the number of users.

We're going to leave the numbers as they are, which means that we'll consider that an app with 100,000+ installs has 100,000 installs, and an app with 1,000,000+ installs has 1,000,000 installs, and so on. To perform computations, however, we'll need to convert each install number from string to float. This means we need to remove the commas and the plus characters, otherwise the conversion will fail and raise an error.
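
The conversion described above can be sketched as a small helper (the name `parse_installs` is hypothetical):

```python
def parse_installs(text):
    """Convert an install value such as '1,000,000+' to its lower bound as a float."""
    return float(text.replace(',', '').replace('+', ''))

print(parse_installs('1,000,000+'))  # 1000000.0
print(parse_installs('100,000+'))    # 100000.0
print(parse_installs('0'))           # 0.0
```

Keeping the cleanup in one function means the same rule is applied everywhere the Installs column is read.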

In [27]:
google_categ = freq_table(google_free, 1)
for category in google_categ:
    total = 0 #This variable will store the sum of installs to each genre
    len_category = 0 #This variable will store the number of apps to each genre
    for app in google_free:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
    avg_installs = total / len_category
    print(category, ':', avg_installs)


BUSINESS : 1712290.1474201474
SOCIAL : 23253652.127118643
TOOLS : 10801391.298666667
COMICS : 817657.2727272727
VIDEO_PLAYERS : 24727872.452830188
HOUSE_AND_HOME : 1331540.5616438356
AUTO_AND_VEHICLES : 647317.8170731707
FINANCE : 1387692.475609756
COMMUNICATION : 38456119.167247385
TRAVEL_AND_LOCAL : 13984077.710144928
LIFESTYLE : 1437816.2687861272
PERSONALIZATION : 5201482.6122448975
MEDICAL : 120550.61980830671
SPORTS : 3638640.1428571427
ART_AND_DESIGN : 1986335.0877192982
SHOPPING : 7036877.311557789
EDUCATION : 1833495.145631068
GAME : 15588015.603248259
LIBRARIES_AND_DEMO : 638503.734939759
FAMILY : 3695641.8198090694
DATING : 854028.8303030303
NEWS_AND_MAGAZINES : 9549178.467741935
PHOTOGRAPHY : 17840110.40229885
EVENTS : 253542.22222222222
FOOD_AND_DRINK : 1924897.7363636363
MAPS_AND_NAVIGATION : 4056941.7741935486
ENTERTAINMENT : 11640705.88235294
BOOKS_AND_REFERENCE : 8767811.894736841
WEATHER : 5074486.197183099
PRODUCTIVITY : 16787331.344927534
BEAUTY : 513151.88679245283

On average, communication apps have the most installs: 38,456,119. This number is heavily skewed by a few apps that have over one billion installs (WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail, and Hangouts), and a few others with over 100 and 500 million installs.

In [28]:
for app in google_free:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

Let us remove the Communication apps with 100 million installs or more:

In [29]:
under_100_m = []

for app in google_free:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'COMMUNICATION') and (float(n_installs) < 100000000):
        under_100_m.append(float(n_installs))
        
sum(under_100_m) / len(under_100_m)

3603485.3884615386
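
The same filter can be wrapped in a function so that the category and the cap become parameters. A minimal sketch on made-up rows (the function name `average_installs_below` and the sample data are hypothetical):

```python
def average_installs_below(rows, category, cap, category_idx=1, installs_idx=5):
    """Average installs for `category`, ignoring apps at or above `cap`."""
    kept = []
    for row in rows:
        installs = float(row[installs_idx].replace(',', '').replace('+', ''))
        if row[category_idx] == category and installs < cap:
            kept.append(installs)
    # Return None instead of dividing by zero when nothing passes the filter.
    return sum(kept) / len(kept) if kept else None

# Made-up rows; only the Category (index 1) and Installs (index 5) fields matter.
rows = [
    ['Giant chat', 'COMMUNICATION', '', '', '', '1,000,000,000+'],
    ['Small chat', 'COMMUNICATION', '', '', '', '1,000,000+'],
    ['Tiny chat', 'COMMUNICATION', '', '', '', '500,000+'],
    ['Some game', 'GAME', '', '', '', '10,000,000+'],
]

print(average_installs_below(rows, 'COMMUNICATION', 100_000_000))  # 750000.0
```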

We see the same pattern for the video players category, which is the runner-up with 24,727,872 installs. The market is dominated by apps like YouTube, Google Play Movies & TV, or MX Player. The pattern is repeated for social apps (where we have giants like Facebook, Instagram, Google+, etc.), photography apps (Google Photos and other popular photo editors), or productivity apps (Microsoft Word, Dropbox, Google Calendar, Evernote, etc.).

Again, the main concern is that these app genres might seem more popular than they really are. Moreover, these niches seem to be dominated by a few giants who are hard to compete against.

The game genre seems pretty popular, but previously we found out this part of the market seems a bit saturated, so we'd like to come up with a different app recommendation if possible.

The books and reference genre looks fairly popular as well, with an average number of installs of 8,767,811. It's interesting to explore this in more depth, since we found this genre has some potential to work well on the App Store, and our aim is to recommend an app genre that shows potential for being profitable on both the App Store and Google Play.

Let's take a look at some of the apps from this genre and their number of installs:

In [30]:
for app in google_free:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E

Looking at this data, we see that a few large apps like "Google Play Books" and "Bible" dominate to a small degree. But when looking at the apps in the middle, we notice that this genre covers many topics.  
This gives us room to develop an app in the 'Books and Reference' genre, which also showed potential in the App Store analysis. Therefore, I suggest our app profile should be built under the 'Books and Reference' genre.  

## Conclusion
It looks like the market is already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.