## Profitable App Profiles for the App Store and Google Play Markets

Our aim in this project is to find mobile app profiles that are profitable for the App Store and Google Play markets. We're working as data analysts for a company that builds Android and iOS mobile apps, and our job is to enable our team of developers to make data-driven decisions with respect to the kind of apps they build.

At our company, we only build apps that are free to download and install, and our main source of revenue consists of in-app ads. This means that our revenue for any given app is mostly influenced by the number of users that use our app. Our goal for this project is to analyze data to help our developers understand what kinds of apps are likely to attract more users.



* How can we visualize the price distribution of paid apps?
* How does category affect the price distribution?
* How do paid apps compare with free apps?
* Are paid apps good enough?
* Do apps get pricier as their size increases?
* How are apps distributed across categories, and can we split that by paid and free?




### Opening and Exploring the Data

As of September 2018, there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play.

Collecting data for over four million apps requires a significant amount of time and money, so we'll try to analyze a sample of data instead. To avoid spending resources on collecting new data ourselves, we should first see whether we can find any relevant existing data at no cost. Luckily, there are two data sets that seem suitable for our purpose:

 * A data set containing data about approximately ten thousand Android apps from Google Play. You can download the data set directly from this [link](https://dq-content.s3.amazonaws.com/350/googleplaystore.csv).
 * A data set containing data about approximately seven thousand iOS apps from the App Store. You can download the data set directly from this [link](https://dq-content.s3.amazonaws.com/350/AppleStore.csv).


In [1]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [2]:
from csv import reader

def open_dataset(file_name, header=True):
    # Read the whole CSV into a list of rows; the with block closes the file
    with open(file_name, encoding='utf-8') as opened_file:
        data = list(reader(opened_file))

    if header:
        return data[0], data[1:]  # (header row, data rows)
    return data
    

In [3]:
apple_header , apple = open_dataset("AppleStore.csv")
google_header , google = open_dataset("googleplaystore.csv")

In [4]:
explore_data(apple,0,4,True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


Number of rows: 7197
Number of columns: 16


In [5]:
explore_data(google,0,4,True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 10841
Number of columns: 13


In [6]:
apple_header

['id',
 'track_name',
 'size_bytes',
 'currency',
 'price',
 'rating_count_tot',
 'rating_count_ver',
 'user_rating',
 'user_rating_ver',
 'ver',
 'cont_rating',
 'prime_genre',
 'sup_devices.num',
 'ipadSc_urls.num',
 'lang.num',
 'vpp_lic']

In [7]:
google_header

['App',
 'Category',
 'Rating',
 'Reviews',
 'Size',
 'Installs',
 'Type',
 'Price',
 'Content Rating',
 'Genres',
 'Last Updated',
 'Current Ver',
 'Android Ver']

In [8]:
len(google[10472])

12

#### Check for missing values in the datasets 

 

In [9]:
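def missing_col(dataset):
    expected_len = len(dataset[0])
    for row_index, each_row in enumerate(dataset):
        # enumerate gives the true position even when identical rows repeat
        if len(each_row) != expected_len:
            print(row_index)
            print(each_row)
            print('\n')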
        

In [10]:
missing_col(google)

10472
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']




##### Delete the row with the missing column

In [11]:
del google[10472]

#### Check duplicate entries 

In [12]:
def duplicate_entries(dataset,index):
    duplicate_apps = []
    unique_apps = []
    for app in dataset:
        name = app[index]
        if name in unique_apps:
            duplicate_apps.append(name)
        else:
            unique_apps.append(name)
          
    return duplicate_apps , unique_apps
        
duplicate_google , unique_google = duplicate_entries(google,0)      
print('Number of duplicate apps:', len(duplicate_google))
print('\n')
print('Examples of duplicate apps:', duplicate_google[:15])

Number of duplicate apps: 1181


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


In [13]:
duplicate_apple , unique_apple = duplicate_entries(apple,0)
print('Number of duplicate apps:', len(duplicate_apple))

Number of duplicate apps: 0


In [14]:
reviews_max = {}
for each_row in google:
    name = each_row[0]
    n_reviews = float(each_row[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
        
        
print('Expected length:', len(google) - 1181)
print('Actual length:', len(reviews_max))    

Expected length: 9659
Actual length: 9659


#### Creating a new dataset without duplicate entries (apps with highest reviews kept)

In [15]:

android_clean = []
already_added = []

for app in google:
    name = app[0]
    n_reviews = float(app[3])
    
    if (reviews_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)

In [16]:
explore_data(android_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13
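
The two-pass approach above (first build `reviews_max`, then filter) can also be sketched as a single pass that stores the best row per app name. This is only an illustrative alternative, not the method used in this notebook: the function name `dedupe_keep_most_reviewed` and the sample rows are made up, and the resulting row order can differ from `android_clean`.

```python
def dedupe_keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the row with the highest review count."""
    best = {}  # app name -> row with the most reviews seen so far
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' keeps the first row when review counts tie
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Tiny made-up sample (not taken from the data set)
sample = [
    ['App A', 'TOOLS', '4.0', '10'],
    ['App B', 'GAME', '4.5', '7'],
    ['App A', 'TOOLS', '4.1', '25'],
]
deduped = dedupe_keep_most_reviewed(sample)
print(deduped)
```

With the notebook's data, the call would be `dedupe_keep_most_reviewed(google)`, and the number of rows returned should match the expected length printed earlier.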


###### Function to check whether an app name is English

In [17]:
def is_english(string):
    non_ascii = 0
    
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    
    if non_ascii > 3:
        return False
    else:
        return True

print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

True
True


In [18]:
android_english = []
ios_english = []

for app in android_clean:
    name = app[0]
    if is_english(name):
        android_english.append(app)
        
for app in apple:
    name = app[1]
    if is_english(name):
        ios_english.append(app)
        
explore_data(android_english, 0, 3, True)
print('\n')
explore_data(ios_english, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 

##### Isolate free apps in the datasets 

In [19]:
android_free_eng = []
for each_row in android_english:
    price = each_row[7]
    if price == '0':
        android_free_eng.append(each_row)
        
ios_free_eng = []
for each_row in ios_english:
    price = float(each_row[4])
    if price == 0.0:
        ios_free_eng.append(each_row)

explore_data(android_free_eng, 0, 3, True)
print('\n')
explore_data(ios_free_eng, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 8864
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 


To minimize risks and overhead, our validation strategy for an app idea consists of three steps:

1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we develop it further.
3. If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

Because the end goal is to add the app to both Google Play and the App Store, we need app profiles that are successful on both markets. We start by finding the most common genres in each market.

In [20]:


def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage 
    
    return table_percentages


def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
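
As a cross-check for `freq_table`, the same percentages can be sketched with `collections.Counter` from the standard library. The function name `freq_table_counter` and the sample rows are made up for illustration.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Return {value: percentage} for one column, like freq_table."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return {value: count / total * 100 for value, count in counts.items()}


# Made-up rows for illustration only
rows = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
table = freq_table_counter(rows, 1)
for genre, pct in sorted(table.items(), key=lambda item: item[1], reverse=True):
    print(genre, ':', pct)
```

Sorting on the percentage with a `key` function avoids having to build `(percentage, name)` tuples just to sort them.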



In [21]:
display_table(ios_free_eng, -5)



Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


We can see that among the free English apps, more than half (58.16%) are games. Entertainment apps are close to 8%, followed by photo and video apps, which are close to 5%. Only 3.66% of the apps are designed for education, followed by social networking apps, which account for 3.29% of the apps in our data set.

The general impression is that the App Store (at least the part containing free English apps) is dominated by apps that are designed for fun (games, entertainment, photo and video, social networking, sports, music, etc.), while apps with practical purposes (education, shopping, utilities, productivity, lifestyle, etc.) are rarer. However, the fact that fun apps are the most numerous doesn't imply that they also have the greatest number of users: the demand might not be the same as the offer.



In [22]:

# Category
display_table(android_free_eng, 1) 

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

In [23]:
#genres 
display_table(android_free_eng, -4)

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

The Category column conveys the app type more clearly than the more granular Genres column, so we use it to assess app profiles on Android.

We found that the App Store is dominated by apps designed for fun, while Google Play shows a more balanced landscape of practical and for-fun apps. The largest categories are FAMILY (18.9%), GAME (9.7%), TOOLS (8.5%), and BUSINESS (4.6%), followed by LIFESTYLE and PRODUCTIVITY at approximately 3.9% each.

### Determining the Kinds of Apps with the Most Users
#### Average number of installs for each app genre
 * Android: Installs column
 * iOS: rating_count_tot column (the App Store data set has no installs column, so the total number of user ratings is used as a proxy)


In [24]:
# prime_genre column 11/-5 of ios dataset 
genres_ios = freq_table(ios_free_eng, -5)
for genre in genres_ios:
    total = 0 # sum of(number of)user ratings specific to genre
    len_genre = 0  #no of apps specific to genre
    for app in ios_free_eng:
        genre_app = app[-5]
        if genre_app == genre:
            n_ratings = float(app[5]) #rating_count_tot col
            total += n_ratings
            len_genre += 1
    
    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)
    
    


Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0
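
The nested loop above rescans the whole data set once per genre. As a hedged alternative, the same averages can be sketched in a single pass with two running dictionaries. The helper name `average_by_group` and the sample rows are made up for illustration.

```python
from collections import defaultdict


def average_by_group(dataset, group_index, value_index):
    """Return {group: mean of the numeric column} in a single pass."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Made-up rows: [name, rating count, genre]
rows = [
    ['x', '100', 'Games'],
    ['y', '300', 'Games'],
    ['z', '50', 'Music'],
]
averages = average_by_group(rows, group_index=2, value_index=1)
print(averages)
```

With the notebook's data, the call would be `average_by_group(ios_free_eng, -5, 5)`.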


#### Most Popular Apps by Genre on the App Store
Navigation apps have the highest average number of user ratings, followed by Reference and Social Networking. These averages are influenced by a few giants:
 * Facebook, Pinterest, and Skype (in Social Networking)
 * Waze and Google Maps (in Navigation)
 * Bible and Dictionary.com (in Reference)
 

In [25]:
popular = ['Navigation','Reference','Social Networking']
for genre in popular:
    for app in ios_free_eng: 
        if app[-5] == genre:
            print(app[-5],app[1],":",app[5])
            

Navigation Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Navigation Google Maps - Navigation & Transit : 154911
Navigation Geocaching® : 12811
Navigation CoPilot GPS – Car Navigation & Offline Maps : 3582
Navigation ImmobilienScout24: Real Estate Search in Germany : 187
Navigation Railway Route Search : 5
Reference Bible : 985920
Reference Dictionary.com Dictionary & Thesaurus : 200047
Reference Dictionary.com Dictionary & Thesaurus for iPad : 54175
Reference Google Translate : 26786
Reference Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
Reference New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Reference Merriam-Webster Dictionary : 16849
Reference Night Sky : 12122
Reference City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
Reference LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
Reference GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Re

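#### Most Popular Apps by Genre on Google Play

We have data about the number of installs for the Google Play market, but the install numbers are not precise: the values are open-ended (100+, 1,000+, 5,000+, etc.), so we can't tell how far above the stated number an app actually is.
Since we only need to find out which genres attract the most users, this level of precision is enough. We treat each value as its lower bound (100,000+ becomes 100,000).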
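
The open-ended install values can be turned into numbers by stripping the commas and the plus sign, which is what the averaging cell further down does inline. A minimal sketch, with `parse_installs` as a made-up helper name:

```python
def parse_installs(raw):
    """Turn a Google Play install value such as '1,000,000+' into an int.

    The result is the lower bound of the open-ended value, not an exact count.
    """
    return int(raw.replace(',', '').replace('+', ''))


for raw in ['100+', '1,000,000+', '0']:
    print(raw, '->', parse_installs(raw))
```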

In [26]:

display_table(android_free_eng, 5) # the Installs column

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


In [27]:
#category column 1 of google dataset
categories_android = freq_table(android_free_eng, 1)
genre_dict = {}
for category in categories_android:
    total = 0
    len_category = 0
    for app in android_free_eng:
        category_app = app[1]
        if category_app == category:            
            n_installs = app[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    

    print(category, ':', avg_n_installs)
    genre_dict[category] = avg_n_installs



ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_

In [28]:
genre_list = []
for key in genre_dict:
    key_val_as_tuple = (genre_dict[key], key)
    genre_list.append(key_val_as_tuple)

# Sort once, after the list is complete, from most to fewest average installs
genre_sorted = sorted(genre_list, reverse = True)

for entry in genre_sorted:
    print(entry[1], ':', entry[0])


COMMUNICATION : 38456119.167247385
VIDEO_PLAYERS : 24727872.452830188
SOCIAL : 23253652.127118643
PHOTOGRAPHY : 17840110.40229885
PRODUCTIVITY : 16787331.344927534
GAME : 15588015.603248259
TRAVEL_AND_LOCAL : 13984077.710144928
ENTERTAINMENT : 11640705.88235294
TOOLS : 10801391.298666667
NEWS_AND_MAGAZINES : 9549178.467741935
BOOKS_AND_REFERENCE : 8767811.894736841
SHOPPING : 7036877.311557789
PERSONALIZATION : 5201482.6122448975
WEATHER : 5074486.197183099
HEALTH_AND_FITNESS : 4188821.9853479853
MAPS_AND_NAVIGATION : 4056941.7741935486
FAMILY : 3695641.8198090694
SPORTS : 3638640.1428571427
ART_AND_DESIGN : 1986335.0877192982
FOOD_AND_DRINK : 1924897.7363636363
EDUCATION : 1833495.145631068
BUSINESS : 1712290.1474201474
LIFESTYLE : 1437816.2687861272
FINANCE : 1387692.475609756
HOUSE_AND_HOME : 1331540.5616438356
DATING : 854028.8303030303
COMICS : 817657.2727272727
AUTO_AND_VEHICLES : 647317.8170731707
LIBRARIES_AND_DEMO : 638503.734939759
PARENTING : 542603.6206896552
BEAUTY : 51315

Communication apps have the highest average number of installs: 38,456,119. This number is heavily skewed upward by a few apps that have over one billion installs (WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail, and Hangouts).
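
To see how much a single giant can pull an average up, here is a small sketch using five install values copied from the COMMUNICATION rows printed in the next cell. The 100 million cutoff is an assumption chosen for illustration, and five rows are far too few to say anything about the category as a whole.

```python
from statistics import mean, median

# Install values copied from a few COMMUNICATION rows in this notebook's output
installs = {
    'WhatsApp Messenger': '1,000,000,000+',
    'Messenger for SMS': '10,000,000+',
    'My Tele2': '5,000,000+',
    'imo beta free calls and text': '100,000,000+',
    'Contacts': '50,000,000+',
}

values = [int(v.replace(',', '').replace('+', '')) for v in installs.values()]
# Hypothetical cutoff: drop apps at or above 100 million installs
under_cutoff = [v for v in values if v < 100_000_000]

print('mean, all apps:', mean(values))
print('median, all apps:', median(values))
print('mean, under cutoff:', mean(under_cutoff))
```

The median and the mean of the apps under the cutoff are both far below the overall mean, which suggests that the average alone overstates how many installs a typical app in the category gets.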

In [29]:
# Print every app in the COMMUNICATION category with its install value
for app in android_free_eng:
    if app[1] == 'COMMUNICATION':
        print(app[1], app[0], ":", app[5])

COMMUNICATION WhatsApp Messenger : 1,000,000,000+
COMMUNICATION Messenger for SMS : 10,000,000+
COMMUNICATION My Tele2 : 5,000,000+
COMMUNICATION imo beta free calls and text : 100,000,000+
COMMUNICATION Contacts : 50,000,000+
COMMUNICATION Call Free – Free Call : 5,000,000+
COMMUNICATION Web Browser & Explorer : 5,000,000+
COMMUNICATION Browser 4G : 10,000,000+
COMMUNICATION MegaFon Dashboard : 10,000,000+
COMMUNICATION ZenUI Dialer & Contacts : 10,000,000+
COMMUNICATION Cricket Visual Voicemail : 10,000,000+
COMMUNICATION TracFone My Account : 1,000,000+
COMMUNICATION Xperia Link™ : 10,000,000+
COMMUNICATION TouchPal Keyboard - Fun Emoji & Android Keyboard : 10,000,000+
COMMUNICATION Skype Lite - Free Video Call & Chat : 5,000,000+
COMMUNICATION My magenta : 1,000,000+
COMMUNICATION Android Messages : 100,000,000+
COMMUNICATION Google Duo - High Quality Video Calls : 500,000,000+
COMMUNICATION Seznam.cz : 1,000,000+
COMMUNICATION Antillean Gold Telegram (original version) : 100,000+


COMMUNICATION chat dz : 100+
COMMUNICATION love sms good morning : 5,000+
COMMUNICATION Goodbox - Mega App : 100,000+
COMMUNICATION Call Blocker - Blacklist, SMS Blocker : 1,000,000+
COMMUNICATION [EF]ShoutBox : 100+
COMMUNICATION Eg Call : 10,000+
COMMUNICATION ei : 10+
COMMUNICATION EJ messenger : 10+
COMMUNICATION Ek IRA : 10+
COMMUNICATION Orfox: Tor Browser for Android : 10,000,000+
COMMUNICATION EO Mumbai : 10+
COMMUNICATION EP RSS Reader : 100+
COMMUNICATION Voxer Walkie Talkie Messenger : 10,000,000+
COMMUNICATION ES-1 : 500+
COMMUNICATION Hangouts Dialer - Call Phones : 10,000,000+
COMMUNICATION EU Council : 1,000+
COMMUNICATION Council Voting Calculator : 5,000+
COMMUNICATION Have your say on Europe : 500+
COMMUNICATION Programi podrške EU : 100+
COMMUNICATION Inbox.eu : 10,000+
COMMUNICATION Web Browser for Android : 1,000,000+
COMMUNICATION Everbridge : 100,000+
COMMUNICATION Best Auto Call Recorder Free : 500+
COMMUNICATION EZ Wifi Notification : 10,000+
COMMUNICATION Test

The same pattern shows up for video players, the runner-up category with 24,727,872 average installs: the market is dominated by apps like YouTube, Google Play Movies & TV, and MX Player. It repeats for social apps (where we have giants like Facebook, Instagram, Google+, etc.), photography apps (Google Photos and other popular photo editors), and productivity apps (Microsoft Word, Dropbox, Google Calendar, Evernote, etc.).

The books and reference genre looks fairly popular as well, with an average of 8,767,811 installs. Our aim is to recommend an app genre that shows potential for being profitable on both the App Store and Google Play, and Reference also ranks near the top on the App Store, so we look at this genre in more detail.

In [37]:
final_dict = {}  # app name -> install value (still a string such as '50,000+')
for app in android_free_eng:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[1], app[0], ":", app[5])
        final_dict[app[0]] = app[5]

BOOKS_AND_REFERENCE E-Book Read - Read Book for free : 50,000+
BOOKS_AND_REFERENCE Download free book with green book : 100,000+
BOOKS_AND_REFERENCE Wikipedia : 10,000,000+
BOOKS_AND_REFERENCE Cool Reader : 10,000,000+
BOOKS_AND_REFERENCE Free Panda Radio Music : 100,000+
BOOKS_AND_REFERENCE Book store : 1,000,000+
BOOKS_AND_REFERENCE FBReader: Favorite Book Reader : 10,000,000+
BOOKS_AND_REFERENCE English Grammar Complete Handbook : 500,000+
BOOKS_AND_REFERENCE Free Books - Spirit Fanfiction and Stories : 1,000,000+
BOOKS_AND_REFERENCE Google Play Books : 1,000,000,000+
BOOKS_AND_REFERENCE AlReader -any text book reader : 5,000,000+
BOOKS_AND_REFERENCE Offline English Dictionary : 100,000+
BOOKS_AND_REFERENCE Offline: English to Tagalog Dictionary : 500,000+
BOOKS_AND_REFERENCE FamilySearch Tree : 1,000,000+
BOOKS_AND_REFERENCE Cloud of Books : 1,000,000+
BOOKS_AND_REFERENCE Recipes of Prophetic Medicine for free : 500,000+
BOOKS_AND_REFERENCE ReadEra – free ebook reader : 1,000,000+


In [38]:
tmp = []
for k,v in final_dict.items():
    tmp.append((v,k))
    
    
print("Descending order of installs :",sorted(tmp[:50],reverse = True))

Descending order of installs : [('500,000+', 'Recipes of Prophetic Medicine for free'), ('500,000+', 'Only 30 days in English, the guideline is guaranteed'), ('500,000+', 'Offline: English to Tagalog Dictionary'), ('500,000+', 'Google I/O 2018'), ('500,000+', 'Flybook'), ('500,000+', 'English to Urdu Dictionary'), ('500,000+', 'English Persian Dictionary'), ('500,000+', 'English Grammar Complete Handbook'), ('500,000+', 'Azpen eReader'), ('50,000+', 'eBoox new: Reader for fb2 epub zip books'), ('50,000+', "SH-02J Owner's Manual (Android 8.0)"), ('50,000+', 'E-Book Read - Read Book for free'), ('50,000+', 'C Programs and Reference'), ('50,000+', 'C Programs Handbook'), ('5,000,000+', 'Read books online'), ('5,000,000+', 'Ebook Reader'), ('5,000,000+', 'Ancestry'), ('5,000,000+', 'AlReader -any text book reader'), ('5,000+', 'Learn R Programming Full'), ('100,000,000+', 'Bible'), ('100,000,000+', 'Amazon Kindle'), ('100,000+', 'URBANO V 02 instruction manual'), ('100,000+', 'Pdf Book Dow
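
Note that the list above is sorted on the install values as strings, so the order is lexicographic ('500,000+' comes before '5,000,000+' and '100,000,000+'), and the `[:50]` slice is taken before sorting. A hedged sketch of a numeric sort follows. `installs_as_int` is a made-up helper, and the dictionary holds only a few pairs copied from the BOOKS_AND_REFERENCE output above.

```python
def installs_as_int(raw):
    """Lower bound of an install value, e.g. '5,000,000+' -> 5000000."""
    return int(raw.replace(',', '').replace('+', ''))


# A few (app, installs) pairs copied from the BOOKS_AND_REFERENCE output
final_dict = {
    'E-Book Read - Read Book for free': '50,000+',
    'Wikipedia': '10,000,000+',
    'Google Play Books': '1,000,000,000+',
    'English Grammar Complete Handbook': '500,000+',
    'AlReader -any text book reader': '5,000,000+',
}

# Sort on the numeric value first, then keep the top entries
ranked = sorted(final_dict.items(), key=lambda item: installs_as_int(item[1]), reverse=True)
for name, installs in ranked[:50]:
    print(installs, ':', name)
```

Sorting before slicing ensures the top entries are chosen from the whole genre rather than from the first 50 apps encountered.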

### Conclusion
The books and reference niche seems to be dominated by software for processing and reading ebooks, as well as various collections of libraries and dictionaries, so it's probably not a good idea to build similar apps, since they would face significant competition.

Building an app around a popular book, perhaps a more recent one, could be profitable for both the Google Play and App Store markets.
Features beyond the raw text of the book, such as quotes, character profiles and graphics, some gamification, or a space for peer discussion, might help the app stand out.