# Profitable Apps in App Markets

This project is going to analyze the kinds of free apps that will generate the most in-app ad revenue. In essence, what kinds of free apps are popular among users?

If we are part of a team that designs free apps where our revenue comes from the in-app ads we show to users, we want to know what kinds of apps will give us the biggest return on investment. In-app ads depend on many users seeing the ads, so we need to know what kinds of free apps are the most popular among users of Google Play and the App Store.

## Taking a Preliminary Look at the Data

In the following code blocks we'll be taking a look at our data sets for both Google Play and the App Store. Both markets carry over 2 million apps, so we will be looking at a subset of both markets.

- [A data set](https://www.kaggle.com/lava18/google-play-store-apps) containing data about approximately ten thousand Android apps from Google Play. You can download the data set directly from [this link](https://dq-content.s3.amazonaws.com/350/googleplaystore.csv).
- [A data set](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps) containing data about approximately seven thousand iOS apps from the App Store. You can download the data set directly from [this link](https://dq-content.s3.amazonaws.com/350/AppleStore.csv).

```python
from csv import reader

### Google Play data set ###
# newline='' is what the csv module expects; utf8 handles non-ASCII app names
with open('googleplaystore.csv', encoding='utf8', newline='') as opened_file:
    android = list(reader(opened_file))
android_header = android[0]
android = android[1:]

### App Store data set ###
with open('AppleStore.csv', encoding='utf8', newline='') as opened_file:
    ios = list(reader(opened_file))
ios_header = ios[0]
ios = ios[1:]
```

The following function makes the data sets easier to explore: it prints the rows in a given slice and, optionally, the total number of rows and columns.

```python
def explore_data(dataset, start, end, rows_and_columns=False):
    """Print rows dataset[start:end]; optionally print the row and column counts."""
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')  # prints two blank lines after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
```

```python
print(android_header)
print('\n')
explore_data(android, 0, 3, True)
```

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


```python
print(ios_header)
print('\n')
explore_data(ios, 0, 3, True)
```

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


## Data Cleaning

In the following code blocks we will be cleaning our data as follows:
- Correct or remove inaccurate data
- Remove duplicate data
- Remove non-English apps
- Remove apps that aren't free

### We will start by removing inaccurate data

```python
### The Google Play data set has an error in row 10472 ###
print(android_header)
print('\n')
print(android[10472])
```

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


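In the incorrect row, the `Category` value appears to be missing: the row has 12 values instead of 13, so every later value is shifted one column to the left. As a result, the `Rating` column holds '19', which doesn't make sense because Google Play ratings range from 1 to 5. We therefore delete the row.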

```python
# Run this cell only once: every run deletes whichever row currently sits at index 10472
del android[10472]
```
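Spotting a shifted row by eye works for one row but does not scale. A minimal sketch, using a hypothetical helper `find_malformed_rows` and made-up sample rows (not the real data), flags every row whose length differs from the header's:

```python
def find_malformed_rows(header, rows):
    """Return (index, length) pairs for rows whose length differs from the header."""
    expected = len(header)
    return [(i, len(row)) for i, row in enumerate(rows) if len(row) != expected]

# Hypothetical sample: the real Google Play rows have 13 columns
header = ['App', 'Category', 'Rating', 'Reviews']
rows = [
    ['App A', 'TOOLS', '4.1', '159'],
    ['App B', '1.9', '19'],           # one value missing, so later values shift left
    ['App C', 'GAME', '4.5', '2000'],
]

print(find_malformed_rows(header, rows))  # [(1, 3)]

# Keeping only well-formed rows avoids index-based deletion altogether
rows = [row for row in rows if len(row) == len(header)]
print(find_malformed_rows(header, rows))  # []
```

Run on `android_header` and `android` before the deletion above, it should report index 10472 with length 12. A filtering comprehension is also safe to re-run, unlike `del android[10472]`.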

### Now we will deal with duplicate data

We only need to do this for the Google Play data set: each App Store row has a unique `id`, so we treat every row there as a distinct app.
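That assumption is easy to check. A minimal sketch with hypothetical rows in the App Store layout (`id` in column 0, `track_name` in column 1) confirms that the ids are unique and lists any names that appear more than once. A repeated name with different ids would be two distinct apps sharing a name, not a duplicate record.

```python
from collections import Counter

# Hypothetical rows in the App Store layout: [id, track_name, ...]
ios_sample = [
    ['1001', 'App A'],
    ['1002', 'App B'],
    ['1003', 'App C'],
    ['1004', 'App C'],  # same name, different id
]

ids = [row[0] for row in ios_sample]
names = [row[1] for row in ios_sample]

# True when no id occurs more than once
ids_unique = len(set(ids)) == len(ids)

# Names that occur more than once and may deserve a closer look
repeated_names = [name for name, count in Counter(names).items() if count > 1]

print(ids_unique)      # True
print(repeated_names)  # ['App C']
```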

```python
### Instagram shows up as a data point four times in Google Play! ###
for app in android:
    name = app[0]
    if name == 'Instagram':
        print(app)
```

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


```python
### Let's find the total number of duplicates in the Google Play data set ###
duplicate_apps = []
unique_apps = set()  # a set makes the membership test fast

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.add(name)

print('Number of duplicate apps: ', len(duplicate_apps))
```

Number of duplicate apps:  1181


To remove the duplicate entries while keeping the most reliable data point for each app, we keep the row with the highest number of reviews. The number of reviews is the fourth value in each row (index 3), and a rating based on more reviews is more trustworthy.

```python
### Here we're making a dictionary that keeps the highest number of reviews for each app ###
reviews_max = {}

for app in android:
    name = app[0]
    n_reviews = float(app[3])

    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews

    elif name not in reviews_max:
        reviews_max[name] = n_reviews

print(len(reviews_max))
```

9659


```python
### Now we will use the dictionary we just made to make a new dataset that contains no duplicates ###
android_clean = []
already_added = set()

for app in android:
    name = app[0]
    n_reviews = float(app[3])

    if (reviews_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(app)
        already_added.add(name)

explore_data(android_clean, 0, 3, True)
```

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13
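The two passes above (finding each maximum, then collecting the matching rows) can also be folded into a single pass that stores the best row seen so far for each name. This is a sketch with a hypothetical helper `dedupe_keep_most_reviews`; the sample uses the first four columns of rows printed earlier in this notebook.

```python
def dedupe_keep_most_reviews(rows, name_idx=0, reviews_idx=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}  # app name -> row with the most reviews seen so far
    for row in rows:
        name = row[name_idx]
        n_reviews = float(row[reviews_idx])
        # Replace only on a strictly higher count, so ties keep the earlier row
        if name not in best or n_reviews > float(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())

# First four columns of rows taken from the outputs above
sample = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
    ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644'],
]

for row in dedupe_keep_most_reviews(sample):
    print(row)
# ['Instagram', 'SOCIAL', '4.5', '66577446']
# ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644']
```

It should keep the same rows as `android_clean`, but the rows are ordered by each name's first appearance rather than by the position of the kept row, so the listing order can differ.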


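### Here we will remove non-English apps

Common English characters fall within the [ASCII system](https://en.wikipedia.org/wiki/ASCII), whose codes run from 0 to 127, and Python's `ord()` returns a character's code. App names containing several characters outside that range are likely not English, so we remove them. We tolerate up to three such characters so that English names containing an emoji or a symbol such as ™ are not discarded.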

```python
### Returns False if a string has more than 3 characters outside the ASCII range (codes above 127), True otherwise ###
def english_check(string):
    non_ascii = 0
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    # A few non-ASCII characters (emoji, symbols such as ™) are tolerated
    if non_ascii > 3:
        return False
    else:
        return True
```
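To see what the threshold does in practice, here is a small sketch with hypothetical helpers `count_non_ascii` and `is_probably_english` (the same rule as `english_check`, with a configurable limit) applied to a few example strings. It also shows the method's limits: a short non-English name with three or fewer non-ASCII characters still passes.

```python
def count_non_ascii(text):
    """Count characters whose code point is above 127, i.e. outside ASCII."""
    return sum(1 for ch in text if ord(ch) > 127)

def is_probably_english(text, max_non_ascii=3):
    """Same rule as english_check: English unless more than max_non_ascii non-ASCII chars."""
    return count_non_ascii(text) <= max_non_ascii

examples = [
    'Instagram',                      # plain ASCII: passes
    'Docs To Go™ Free Office Suite',  # one symbol (™): passes
    'Instachat 😜',                   # one emoji: passes
    '爱奇艺PPS',                       # three CJK characters: still passes
    '中文输入法',                      # five CJK characters: filtered out
]

for name in examples:
    print(name, count_non_ascii(name), is_probably_english(name))
```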

```python
### Now we will filter our data sets using the function we just created ###
android_english = []
ios_english = []

for app in android_clean:
    name = app[0]
    if english_check(name):
        android_english.append(app)

for app in ios:
    name = app[1]
    if english_check(name):
        ios_english.append(app)

print('Google Play')
print('\n')
explore_data(android_english, 0, 3, True)
print('\n')
print('Apple Store')
print('\n')
explore_data(ios_english, 0, 3, True)
```

Google Play


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


Apple Store


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.

### Finally, we will isolate free apps

```python
android_final = []
ios_final = []

for app in android_english:
    app_type = app[6]  # 'Type' column, e.g. 'Free'
    if app_type == 'Free':
        android_final.append(app)

for app in ios_english:
    price = app[4]  # 'price' column, stored as a string
    if price == '0.0':
        ios_final.append(app)

print(len(android_final))
print(len(ios_final))
```

8863
3222


## Genre Filtering

Our end goal is to make an app for both Google Play and the App Store that is profitable in both markets. Since our revenue comes from in-app ads, an app's profitability depends on how many users it attracts. Our validation strategy is as follows:

1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we develop it further.
3. If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

To start, we want to get an idea of what the most common genres are for each market.

```python
### Here we are writing two functions. One will make a frequency table that shows percentages and the other will show percentages in a descending order ###
def freq_table(dataset, index):
    table = {}
    total = 0
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage
    return table_percentages

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
```
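For comparison, the standard library's `collections.Counter` can do the counting. This sketch (hypothetical names `freq_table_counter` and `display_table_counter`) should produce the same percentages as `freq_table`. One small difference: ties are printed in first-seen order, whereas `display_table`'s tuple sort breaks ties by the value in reverse alphabetical order.

```python
from collections import Counter

def freq_table_counter(dataset, index):
    """Percentage of rows taking each value in column `index`."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return {value: count / total * 100 for value, count in counts.items()}

def display_table_counter(dataset, index):
    """Print each value and its percentage, most frequent first."""
    table = freq_table_counter(dataset, index)
    for value, pct in sorted(table.items(), key=lambda item: item[1], reverse=True):
        print(value, ':', pct)

# Hypothetical one-column sample
sample = [['Games'], ['Games'], ['Music'], ['Games']]
display_table_counter(sample, 0)
# Games : 75.0
# Music : 25.0
```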

First we will start by looking at the table for the `prime_genre` column in the App Store data set.

```python
display_table(ios_final, -5)  # -5 is the prime_genre column
```

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


Over half (about 58%) of the free English apps in the App Store are games. However, having the most apps does not mean a genre has the most users: a genre with fewer apps can still have higher demand.

Next we'll take a look at the `Category` column of the Google Play data set.

```python
display_table(android_final, 1)  # 1 is the Category column
```

FAMILY : 18.898792733837304
GAME : 9.725826469592688
TOOLS : 8.462146000225657
BUSINESS : 4.592124562789123
LIFESTYLE : 3.9038700214374367
PRODUCTIVITY : 3.8925871601038025
FINANCE : 3.7007785174320205
MEDICAL : 3.5315355974275078
SPORTS : 3.396141261423897
PERSONALIZATION : 3.317161232088458
COMMUNICATION : 3.2381812027530184
HEALTH_AND_FITNESS : 3.0802211440821394
PHOTOGRAPHY : 2.944826808078529
NEWS_AND_MAGAZINES : 2.798149610741284
SOCIAL : 2.6627552747376737
TRAVEL_AND_LOCAL : 2.335552296062281
SHOPPING : 2.245289405393208
BOOKS_AND_REFERENCE : 2.1437436533904997
DATING : 1.8616721200496444
VIDEO_PLAYERS : 1.7939749520478394
MAPS_AND_NAVIGATION : 1.399074805370642
FOOD_AND_DRINK : 1.241114746699763
EDUCATION : 1.1621347173643235
ENTERTAINMENT : 0.9590432133589079
LIBRARIES_AND_DEMO : 0.9364774906916393
AUTO_AND_VEHICLES : 0.9251946293580051
HOUSE_AND_HOME : 0.8236488773552973
WEATHER : 0.8010831546880289
EVENTS : 0.7108202640189552
PARENTING : 0.6544059573507841
ART_AND_DESIGN : 0

Google Play doesn't show the same kind of genre domination that we saw in the App Store, although the `FAMILY` category is mostly made up of games for kids. Practical categories such as `TOOLS`, `BUSINESS`, `LIFESTYLE`, and `PRODUCTIVITY` also make up a larger share of apps than their counterparts do in the App Store.

## Most Popular Apps by Genre

So far we have counted how many apps each genre contains, but what we really want to know is which kinds of apps have the most users. To find out, we are going to calculate the average number of installs for each app genre.

- The Google Play data set will be analyzed using the `Installs` column
- The App Store data set is missing the number of installs, so we will use the total number of user ratings as a proxy: the `rating_count_tot` column.

We'll start with the App Store.

```python
ios_genres = freq_table(ios_final, -5)

for genre in ios_genres:
    total = 0      # sum of rating counts for this genre
    len_genre = 0  # number of apps in this genre
    for app in ios_final:
        genre_app = app[-5]
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    print(genre, ":", avg_n_ratings)
```
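The nested loop above rescans every app once per genre. A single pass that keeps a running total and count per genre gives the same averages; this is a sketch with a hypothetical helper `average_by_group` and made-up rows.

```python
from collections import defaultdict

def average_by_group(dataset, group_idx, value_idx):
    """Average float(row[value_idx]) for each value of row[group_idx], in one pass."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        group = row[group_idx]
        totals[group] += float(row[value_idx])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}

# Hypothetical rows: [track_name, rating_count_tot, prime_genre]
sample = [
    ['App A', '100', 'Games'],
    ['App B', '300', 'Games'],
    ['App C', '50', 'Music'],
]
print(average_by_group(sample, group_idx=2, value_idx=1))
# {'Games': 200.0, 'Music': 50.0}
```

On the real data, the equivalent call would be `average_by_group(ios_final, -5, 5)`.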

Games : 22788.6696905016
Food & Drink : 33333.92307692308
Weather : 52279.892857142855
Education : 7003.983050847458
Navigation : 86090.33333333333
Business : 7491.117647058823
Catalogs : 4004.0
News : 21248.023255813954
Music : 57326.530303030304
Utilities : 18684.456790123455
Book : 39758.5
Entertainment : 14029.830708661417
Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Finance : 31467.944444444445
Health & Fitness : 23298.015384615384
Travel : 28243.8
Reference : 74942.11111111111
Sports : 23008.898550724636
Shopping : 26919.690476190477
Lifestyle : 16485.764705882353
Productivity : 21028.410714285714
Medical : 612.0


In the App Store, navigation apps have the highest average number of user ratings, but this average is heavily skewed by two big-name apps, Waze and Google Maps, which together account for nearly 97% of the genre's ratings:

```python
for app in ios_final:
    if app[-5] == 'Navigation':
        print(app[1], ":", app[5])  # the name and number of ratings
```

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5
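One way to quantify that skew, using the six rating counts printed above, is to compare the mean with the median and to compute the share held by the two largest apps. This sketch uses only Python's `statistics` module:

```python
from statistics import mean, median

# Rating counts of the free English navigation apps, copied from the output above
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

avg = mean(navigation_ratings)
med = median(navigation_ratings)
top_two = sum(sorted(navigation_ratings, reverse=True)[:2])
top_two_share = top_two / sum(navigation_ratings) * 100

print(round(avg, 2))            # 86090.33
print(med)                      # 8196.5
print(round(top_two_share, 1))  # 96.8
```

The median is less than a tenth of the mean, which is why the genre average overstates what a typical navigation app achieves.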


We see the same problem in social networking and music. Excluding those three genres, the one with the highest average number of ratings is reference, which includes apps such as dictionaries and a guide to Pokémon GO.

```python
for app in ios_final:
    if app[-5] == 'Reference':
        print(app[1], ":", app[5])
```

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


A reference app looks like the most promising option. This genre's average is also pulled upward by a few very popular apps (the Bible and Dictionary.com), but the Bible's numbers suggest that an app built around a popular book can attract a large audience. The genre is also something of a niche, and making an app about a popular book would not require finding a domain specialist.

Next we will take a look at Google Play.

Because the `Installs` values are imprecise, we will assume that an app with 500+ installs has 500 installs, an app with 50,000,000+ installs has 50,000,000 installs, and so on. This method still gives us a general idea of each app's popularity.

The first problem we come across is that the `Installs` values are open-ended strings whose formatting (commas and a trailing `+`) prevents converting them directly to floats, so we will start by cleaning them up.
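The cleanup amounts to stripping the commas and the trailing `+` before calling `float()`. A minimal sketch, with a hypothetical helper name `parse_installs`:

```python
def parse_installs(value):
    """Convert an Installs string such as '1,000,000+' into its lower bound as a float."""
    return float(value.replace(',', '').replace('+', ''))

print(parse_installs('1,000,000+'))  # 1000000.0
print(parse_installs('500+'))        # 500.0
print(parse_installs('0+'))          # 0.0
```

A non-numeric value, such as the 'Free' that sat in the `Installs` position of the malformed row removed earlier, would raise a `ValueError`.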

```python
display_table(android_final, 5)  # 5 is the Installs column
```

1,000,000+ : 15.728308699086089
100,000+ : 11.55365000564143
10,000,000+ : 10.549475346947986
10,000+ : 10.199706645605326
1,000+ : 8.394448832223853
100+ : 6.916393997517771
5,000,000+ : 6.826131106848697
500,000+ : 5.562450637481666
50,000+ : 4.772650344127271
5,000+ : 4.513144533453684
10+ : 3.542818458761142
500+ : 3.2494640640866526
50,000,000+ : 2.3017037120613786
100,000,000+ : 2.1324607920568655
50+ : 1.9180864267178157
5+ : 0.7898002933543946
1+ : 0.5077287600135394
500,000,000+ : 0.270788672007221
1,000,000,000+ : 0.2256572266726842
0+ : 0.045131445334536835


```python
android_categories = freq_table(android_final, 1)

for category in android_categories:
    total = 0
    len_category = 0
    for app in android_final:
        category_app = app[1]
        # Only count apps that belong to the current category
        if category_app == category:
            n_installs = app[5]
            # Strip ',' and '+' so the string can be converted to a float
            n_installs = n_installs.replace(",", "")
            n_installs = n_installs.replace("+", "")
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    print(category, ':', avg_n_installs)
```

This prints one `CATEGORY : average installs` line per category.

Similar to the dilemma we found in the App Store, a few categories in Google Play are heavily skewed by a handful of big apps. Here are the communication apps with at least 100,000,000 installs as an example:

```python
for app in android_final:
    if app[1] == 'COMMUNICATION' and app[5] in ('1,000,000,000+',
                                                 '500,000,000+',
                                                 '100,000,000+'):
        print(app[0], ':', app[5])
```

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

Other skewed categories include video players, social apps, and productivity apps. If we exclude these skewed categories, the next most popular category would be games, but we may want to avoid it because the market is already quite saturated with games.

Let's check whether the genre we recommended for the App Store also shows potential in Google Play, where the closest match is the `BOOKS_AND_REFERENCE` category:

```python
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])
```

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E

This category also seems to have the skewed-popularity problem, driven by a few very popular apps:

```python
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE' and app[5] in ('1,000,000,000+',
                                                       '500,000,000+',
                                                       '100,000,000+'):
        print(app[0], ':', app[5])
```

Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad 📖 Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+
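To gauge how much a handful of giants pulls a category average up, one option is to compare the average with and without apps above some install threshold. This sketch uses a hypothetical helper `average_installs`, an arbitrary 100,000,000 cutoff, and made-up `Installs` values rather than the real category data:

```python
def average_installs(installs, exclude_at_least=None):
    """Mean of parsed Installs values, optionally dropping apps at or above a threshold."""
    values = [float(v.replace(',', '').replace('+', '')) for v in installs]
    if exclude_at_least is not None:
        values = [v for v in values if v < exclude_at_least]
    if not values:
        raise ValueError('no apps left to average')
    return sum(values) / len(values)

# Hypothetical Installs values for one category
sample = ['1,000,000,000+', '100,000,000+', '1,000,000+', '500,000+', '10,000+']

print(average_installs(sample))                                       # 220302000.0
print(round(average_installs(sample, exclude_at_least=100_000_000)))  # 503333
```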


Still, the category shows some potential, since quite a few apps fall in the middle range of downloads (1,000,000+ to 50,000,000+ installs):

```python
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE' and app[5] in ('1,000,000+',
                                                       '5,000,000+',
                                                       '10,000,000+',
                                                       '50,000,000+'):
        print(app[0], ':', app[5])
```

Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
ReadEra – free ebook reader : 1,000,000+
Ebook Reader : 5,000,000+
Read books online : 5,000,000+
eBoox: book reader fb2 epub zip : 1,000,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
Moon+ Reader : 10,000,000+
English-Myanmar Dictionary : 1,000,000+
Golden Dictionary (EN-AR) : 1,000,000+
All Language Translator Free : 1,000,000+
Aldiko Book Reader : 10,000,000+
Dictionary - WordWeb : 5,000,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Al-Quran (Free) : 10,000,000+
Al Quran Indonesia : 10,000,000+
Al'Quran Bahasa Indonesia : 10,000,000+
Al Quran Al karim : 1,000,000+
Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+
Koran Read &MP3 30 Juz Offline : 1,000,000+
H
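Matching exact `Installs` strings, as in the cell above, gets unwieldy when a range spans many values. A sketch of an alternative that parses each value and compares numbers instead; the helper name `installs_in_range` and the sample rows (with placeholder values in the unused columns) are hypothetical:

```python
def installs_in_range(rows, category, low, high, cat_idx=1, installs_idx=5):
    """Yield (name, Installs) for apps in `category` whose parsed installs are in [low, high]."""
    for row in rows:
        if row[cat_idx] != category:
            continue
        n_installs = float(row[installs_idx].replace(',', '').replace('+', ''))
        if low <= n_installs <= high:
            yield row[0], row[installs_idx]

# Hypothetical rows in the Google Play layout (columns 2-4 left as placeholders)
sample = [
    ['Book App A', 'BOOKS_AND_REFERENCE', '', '', '', '10,000,000+'],
    ['Book App B', 'BOOKS_AND_REFERENCE', '', '', '', '100,000,000+'],
    ['Book App C', 'BOOKS_AND_REFERENCE', '', '', '', '500,000+'],
    ['Book App D', 'BOOKS_AND_REFERENCE', '', '', '', '1,000,000+'],
    ['Chat App E', 'COMMUNICATION', '', '', '', '5,000,000+'],
]

for name, installs in installs_in_range(sample, 'BOOKS_AND_REFERENCE', 1_000_000, 50_000_000):
    print(name, ':', installs)
# Book App A : 10,000,000+
# Book App D : 1,000,000+
```

With `android_final` and the same bounds, this should select the same apps as the cell above, since 1,000,000+, 5,000,000+, 10,000,000+, and 50,000,000+ are the only `Installs` values in the data set that fall between those bounds.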

This middle range is dominated by software for processing and reading ebooks, along with various collections of libraries and dictionaries, so it is probably wise to avoid building an app with the same functionality.

Several apps in the list are built around a single book (for example, the Quran apps), which suggests that turning a recent and very popular book into an app could be profitable in both markets.

Since the market already has many libraries, our app will likely need features beyond the raw text of the book, such as daily quotes, an audio version, quizzes, or a forum.

## Conclusions

In this project, we looked at both Google Play and the App Store to recommend a type of app that is profitable in both markets.

We recommend turning a popular and more recent book into an app, as books are more of a niche category in both markets. We also recommend adding features such as daily quotes, an audio version, quizzes, or a forum, so that the app can compete with the library apps already on the market.