# Profitable App Profiles for the App Store and Google Play Markets

We work as data analysts for a software house that wants to build a profitable, free mobile application for the App Store and Google Play markets. The main source of revenue is in-app ads, which means that the revenue for any given app is mostly determined by the number of people who use it.

Our goal for this project is to analyze data to help developers understand what kinds of apps are likely to attract more users.

The research is based on two data sets from kaggle.com:

- [Data set](https://www.kaggle.com/lava18/google-play-store-apps) containing data about approximately ten thousand Android apps from Google Play.
- [Data set](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps) containing data about approximately seven thousand iOS apps from the App Store.



In [183]:
from csv import reader

def load_csv(path: str):
    # Read a CSV file and return (header, data rows).
    # The `with` block closes the file; utf8 is assumed for both data sets.
    with open(path, encoding='utf8') as open_file:
        rows = list(reader(open_file))
    return rows[0], rows[1:]

apple_store_header, apple_store_data = load_csv('./data/AppleStore.csv')
google_store_header, google_store_data = load_csv('./data/GooglePlayStore.csv')

def explore_data(dataset: list, start: int, end: int, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    
    for row in dataset_slice:
        print(row, '\n')

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))


In [184]:
print('Apple Store Data: \n')
print(apple_store_header, '\n')

explore_data(dataset=apple_store_data, start=0, end=5, rows_and_columns=True)

Apple Store Data: 

['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'] 

['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1'] 

['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1'] 

['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1'] 

['4', '282614216', 'eBay: Best App to Buy, Sell, Save! Online Shopping', '128512000', 'USD', '0', '262241', '649', '4', '4.5', '5.10.0', '12+', 'Shopping', '37', '5', '9', '1'] 

['5', '282935706', 'Bible', '92774400', 'USD', '0', '985920', '5320', '4.5', '5', '7.5.1', '4+

In [185]:
print('Google Play Store Data: \n')
print(google_store_header, '\n')

explore_data(dataset=google_store_data, start=0, end=5, rows_and_columns=True)

Google Play Store Data: 

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'] 

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'] 

['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'] 

['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'] 

['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up'] 

['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '10

## Data Cleaning

As part of the data cleaning process, we want to:
- remove or fix incorrect app data (rows with an incorrect number of columns)
- remove duplicates
- remove non-English apps
- remove apps that are not free

### Track and remove rows with incorrect number of columns

In [186]:
def validate_cols(dataset: list, header: list):
    for i, row in enumerate(dataset):
        if len(row) != len(header):
            print('data: ', row)
            print('length: ', len(row))
            print('index: ', i)

print('Apple Store:')
validate_cols(apple_store_data, apple_store_header)

print('Google Play Store:')
validate_cols(google_store_data, google_store_header)

Apple Store:
Google Play Store:
data:  ['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
length:  12
index:  10472


We found only one invalid row. We will simply delete it.

In [187]:
del(google_store_data[10472])
validate_cols(google_store_data, google_store_header)
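
Deleting by a hard-coded index works once, but re-running the cell would remove a different, valid row. A minimal sketch of a non-destructive alternative, assuming the same list-of-lists layout, keeps only the rows whose length matches the header. The name `drop_malformed_rows` and the sample rows are illustrative and not part of the original notebook.

```python
def drop_malformed_rows(dataset, header):
    """Return a new list with only the rows that have as many cells as the header."""
    return [row for row in dataset if len(row) == len(header)]

# Toy data: the second row is missing one cell.
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],
    ['App C', 'GAME', '4.5'],
]

clean_rows = drop_malformed_rows(sample_rows, sample_header)
print(len(clean_rows))  # 2
```

Because the function returns a new list, it can be run repeatedly without changing the result.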

### Find and remove duplicates



In [188]:
def find_duplicates(dataset: list, name_index: int):
    unique_apps = []
    duplicate_apps = []

    for row in dataset:
        if row[name_index] in unique_apps:
            duplicate_apps.append(row[name_index])
        else:
            unique_apps.append(row[name_index])

    return duplicate_apps

google_store_duplicates = find_duplicates(google_store_data, 0)

print('Google:')
print('Number of duplicates:', len(google_store_duplicates))
print('Sample:', google_store_duplicates[:5])

apple_store_duplicates = find_duplicates(apple_store_data, 1)

print('Apple:')
print('Number of duplicates:', len(apple_store_duplicates))

Google:
Number of duplicates: 1181
Sample: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings']
Apple:
Number of duplicates: 0


There are a lot of duplicates in the Google Play data set. Let's list the entries for one of them:

In [189]:
for app in google_store_data:
    if app[0] == 'Quick PDF Scanner + OCR FREE':
        print(app)

['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']
['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']
['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80804', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']


The only difference is the value in the fourth cell, which is the number of reviews. This suggests that the data was scraped at different times. We want to keep only the most recent entry, which we take to be the one with the highest number of reviews.

In [190]:
reviews_max = {}

for app in google_store_data:
    name = app[0]
    n_reviews = float(app[3])

    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

len(reviews_max)        

9659

In [191]:
android_clean = [] # Stores data without duplicates.
already_added = [] # Stores names of already added apps.

for app in google_store_data:
    name = app[0]
    n_reviews = float(app[3])

    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(app)
        already_added.append(name)

len(android_clean)

9659
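
The two passes above can also be folded into one. The following is a sketch under the same assumptions (app name at index 0, review count at index 3); `keep_most_reviewed` and the sample rows are hypothetical names and data used only for illustration. A dictionary lookup also avoids rescanning the `already_added` list for every row.

```python
def keep_most_reviewed(dataset, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}  # app name -> row with the most reviews seen so far
    for row in dataset:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

# Toy rows with the same column order as the Google Play data (truncated to four cells).
sample = [
    ['Quick PDF Scanner', 'BUSINESS', '4.2', '80805'],
    ['Quick PDF Scanner', 'BUSINESS', '4.2', '80804'],
    ['Box', 'BUSINESS', '4.2', '159872'],
]

deduplicated = keep_most_reviewed(sample)
print(len(deduplicated))  # 2
```

The strict `>` comparison means that when several rows tie on the review count, the first one encountered is kept, which matches the behaviour of the two-step version.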

### Removing non-English apps

With the duplicates removed, we continue with the next step of the data cleaning process: removing non-English apps. For this we simply use the ASCII range (0 to 127). I decided to allow a maximum of 3 non-ASCII characters so that apps with emojis in the title are not filtered out.

In [192]:
def is_english(name):
    n_invalid_chars = 0
    
    for letter in name:
        if ord(letter) > 127: # ASCII range
            n_invalid_chars += 1

    if n_invalid_chars > 3: # 3 non-ASCII chars threshold
        return False

    return True

is_english('Instachat')

True
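
To see how the threshold behaves, here is a small self-contained check. It restates the same rule in compact form (the `max_non_ascii` parameter is an addition for illustration) and applies it to a few sample names. Note that a short non-English title with three or fewer non-ASCII characters still passes, so the filter is only a heuristic.

```python
def is_english(name, max_non_ascii=3):
    """Same rule as above: allow at most `max_non_ascii` characters outside ASCII."""
    n_non_ascii = sum(1 for char in name if ord(char) > 127)
    return n_non_ascii <= max_non_ascii

samples = [
    'Instachat 😜',                      # one emoji
    'Docs To Go™ Free Office Suite',    # one trademark sign
    '教えて!goo',                        # three non-ASCII characters: still passes
    '爱奇艺PPS -《欢乐颂2》电视剧热播',     # many non-ASCII characters
]

for name in samples:
    print(name, '->', is_english(name))
```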

In [193]:
google_english = []
apple_english = [] 

for app in android_clean:
    if is_english(app[0]): # 0 is app name index
        google_english.append(app)

for app in apple_store_data:
    if is_english(app[2]): # 2 is app name index
        apple_english.append(app)

print(len(google_english))
print(len(apple_english))


9614
6183


### Filter out non-free apps

In [194]:
google_type_index = 6 # 'Type' column, which holds 'Free' for free apps ('Price' is at index 7)
apple_price_index = 5

google_free_apps = [app for app in google_english if app[google_type_index] == 'Free']
apple_free_apps = [app for app in apple_english if float(app[apple_price_index]) == 0]

print(len(google_free_apps))
print(len(apple_free_apps))


8863
3222


So far, we have spent a good amount of time cleaning the data. We have:

- removed inaccurate data
- removed duplicate app entries
- removed non-English apps
- isolated the free apps

As mentioned in the introduction, our aim is to determine the kinds of apps that are likely to attract more users, because our revenue is highly influenced by the number of people using our apps.

To minimize risks and overhead, our validation strategy for an app idea consists of three steps:

- Build a minimal Android version of the app, and add it to Google Play.
- If the app has a good response from users, develop it further.
- If the app is profitable after six months, build an iOS version of the app and add it to the App Store.

Because our end goal is to add the app to both Google Play and the App Store, we need to find app profiles that are successful in both markets. For instance, a profile that works well for both markets might be a productivity app that makes use of gamification.

## Analysis

Let's begin the analysis by getting a sense of the most common genres in each market.

In [195]:
google_genre_index = [1, 9]
apple_genre_index = 12

def freq_table(dataset: list, index: int):
    table = {}
    table_percentage = {}
    for app in dataset:
        if app[index] in table:
            table[app[index]] += 1
        else:
            table[app[index]] = 1

    for genre in table:
        table_percentage[genre] = round(table[genre] / len(dataset) * 100, 2)
    
    return table_percentage

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

print('Google by', google_store_header[google_genre_index[0]], '\n')
display_table(google_free_apps, google_genre_index[0])

Google by Category 

FAMILY : 18.9
GAME : 9.73
TOOLS : 8.46
BUSINESS : 4.59
LIFESTYLE : 3.9
PRODUCTIVITY : 3.89
FINANCE : 3.7
MEDICAL : 3.53
SPORTS : 3.4
PERSONALIZATION : 3.32
COMMUNICATION : 3.24
HEALTH_AND_FITNESS : 3.08
PHOTOGRAPHY : 2.94
NEWS_AND_MAGAZINES : 2.8
SOCIAL : 2.66
TRAVEL_AND_LOCAL : 2.34
SHOPPING : 2.25
BOOKS_AND_REFERENCE : 2.14
DATING : 1.86
VIDEO_PLAYERS : 1.79
MAPS_AND_NAVIGATION : 1.4
FOOD_AND_DRINK : 1.24
EDUCATION : 1.16
ENTERTAINMENT : 0.96
LIBRARIES_AND_DEMO : 0.94
AUTO_AND_VEHICLES : 0.93
HOUSE_AND_HOME : 0.82
WEATHER : 0.8
EVENTS : 0.71
PARENTING : 0.65
ART_AND_DESIGN : 0.64
COMICS : 0.62
BEAUTY : 0.6
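
For comparison, the same percentage table can be built with `collections.Counter` from the standard library. This is only a sketch of an alternative, not the function used in this notebook; `freq_table_counter` and the sample rows are hypothetical.

```python
from collections import Counter

def freq_table_counter(dataset, index):
    """Return {value: percentage of rows}, rounded to two decimals."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return {value: round(count / total * 100, 2) for value, count in counts.items()}

# Toy rows: (name, category)
sample = [['a', 'GAME'], ['b', 'GAME'], ['c', 'TOOLS'], ['d', 'FAMILY']]
percentages = freq_table_counter(sample, 1)

# Sort by share, highest first, like display_table does.
for genre, share in sorted(percentages.items(), key=lambda item: item[1], reverse=True):
    print(genre, ':', share)
```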


In [196]:
print('Google by', google_store_header[google_genre_index[1]], '\n')
display_table(google_free_apps, google_genre_index[1])

Google by Genres 

Tools : 8.45
Entertainment : 6.07
Education : 5.35
Business : 4.59
Productivity : 3.89
Lifestyle : 3.89
Finance : 3.7
Medical : 3.53
Sports : 3.46
Personalization : 3.32
Communication : 3.24
Action : 3.1
Health & Fitness : 3.08
Photography : 2.94
News & Magazines : 2.8
Social : 2.66
Travel & Local : 2.32
Shopping : 2.25
Books & Reference : 2.14
Simulation : 2.04
Dating : 1.86
Arcade : 1.85
Video Players & Editors : 1.77
Casual : 1.76
Maps & Navigation : 1.4
Food & Drink : 1.24
Puzzle : 1.13
Racing : 0.99
Role Playing : 0.94
Libraries & Demo : 0.94
Auto & Vehicles : 0.93
Strategy : 0.9
House & Home : 0.82
Weather : 0.8
Events : 0.71
Adventure : 0.68
Comics : 0.61
Beauty : 0.6
Art & Design : 0.6
Parenting : 0.5
Card : 0.45
Casino : 0.43
Trivia : 0.42
Educational;Education : 0.39
Board : 0.38
Educational : 0.37
Education;Education : 0.34
Word : 0.26
Casual;Pretend Play : 0.24
Music : 0.2
Racing;Action & Adventure : 0.17
Puzzle;Brain Games : 0.17
Entertainment;Music & Vi

In [197]:
print('Apple by', apple_store_header[apple_genre_index], '\n')
display_table(apple_free_apps, apple_genre_index)

Apple by prime_genre 

Games : 58.16
Entertainment : 7.88
Photo & Video : 4.97
Education : 3.66
Social Networking : 3.29
Shopping : 2.61
Utilities : 2.51
Sports : 2.14
Music : 2.05
Health & Fitness : 2.02
Productivity : 1.74
Lifestyle : 1.58
News : 1.33
Travel : 1.24
Finance : 1.12
Weather : 0.87
Food & Drink : 0.81
Reference : 0.56
Business : 0.53
Book : 0.43
Navigation : 0.19
Medical : 0.19
Catalogs : 0.12


The frequency tables we analyzed showed that the App Store is dominated by apps designed for fun, while Google Play shows a more balanced landscape of both practical and fun apps. Now, I'd like to get an idea about the kind of apps with the most users.

One way to find out what genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play data set, I can find this information in the Installs column, but this information is missing for the App Store data set. As a workaround, I'll take the total number of user ratings as a proxy, which I can find in the rating_count_tot column.

In [198]:
prime_genres = freq_table(apple_free_apps, apple_genre_index)

for genre in prime_genres:
    total = 0
    len_genre = 0

    for app in apple_free_apps:
        genre_app = app[apple_genre_index]

        if genre_app == genre:
            rating = float(app[6]) #index of rating_count_tot
            total += rating
            len_genre += 1

    user_rating_avg = round(total / len_genre, 2)
    user_rating_avg = "{:,}".format(user_rating_avg)
    print(genre, user_rating_avg)

Productivity 21,028.41
Weather 52,279.89
Shopping 26,919.69
Reference 74,942.11
Finance 31,467.94
Music 57,326.53
Utilities 18,684.46
Travel 28,243.8
Social Networking 71,548.35
Sports 23,008.9
Health & Fitness 23,298.02
Games 22,788.67
Food & Drink 33,333.92
News 21,248.02
Book 39,758.5
Photo & Video 28,441.54
Entertainment 14,029.83
Business 7,491.12
Lifestyle 16,485.76
Education 7,003.98
Navigation 86,090.33
Medical 612.0
Catalogs 4,004.0


On average, navigation apps have the highest number of user ratings, but this figure is heavily influenced by Waze and Google Maps, which together have close to half a million user ratings:

In [199]:
for app in apple_free_apps:
    if app[apple_genre_index] == 'Navigation':
        print(app[2], ':', "{:,}".format(int(app[6]))) # print name and number of ratings

Waze - GPS Navigation, Maps & Real-time Traffic : 345,046
Geocaching® : 12,811
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5
CoPilot GPS – Car Navigation & Offline Maps : 3,582
Google Maps - Navigation & Transit : 154,911


The same pattern applies to social networking apps, where the average number is heavily influenced by a few giants like Facebook, Pinterest, Skype, etc. Same applies to music apps, where a few big players like Pandora, Spotify, and Shazam heavily influence the average number.

Our aim is to find popular genres, but navigation, social networking or music apps might seem more popular than they really are. The average number of ratings seems to be skewed by a very small number of apps with hundreds of thousands of user ratings, while the other apps may struggle to get past the 10,000 threshold.
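
One way to quantify this skew is to compare the mean with the median, which is far less sensitive to a few very large values. The median is not used elsewhere in this analysis; the sketch below simply applies `statistics.mean` and `statistics.median` to the six Navigation rating counts printed above.

```python
from statistics import mean, median

# Rating counts of the six free Navigation apps listed above.
navigation_ratings = [345046, 12811, 187, 5, 3582, 154911]

print(round(mean(navigation_ratings), 2))  # 86090.33
print(median(navigation_ratings))          # 8196.5
```

The mean is roughly ten times the median, which is consistent with two apps accounting for almost all of the ratings.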

Reference apps have 74,942 user ratings on average, but it's actually the Bible and Dictionary.com that skew the average number of ratings upward:

In [200]:
for app in apple_free_apps:
    if app[apple_genre_index] == 'Reference':
        print(app[2], ':', app[6]) # print name and number of ratings

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
Merriam-Webster Dictionary : 16849
Google Translate : 26786
Night Sky : 12122
WWDC : 762
Jishokun-Japanese English Dictionary & Translator : 0
教えて!goo : 0
VPN Express : 14
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Real Bike Traffic Rider Virtual Reality Glasses : 8


However, this niche seems to show some potential. One thing we could do is take another popular book and turn it into an app where we could add different features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes about the book, etc. On top of that, we could also embed a dictionary within the app, so users don't need to exit our app to look up words in an external app.

This idea seems to fit well with the fact that the App Store is dominated by for-fun apps. This suggests the market might be a bit saturated with for-fun apps, which means a practical app might have more of a chance to stand out among the huge number of apps on the App Store.

Other genres that seem popular include weather, book, food and drink, and finance. The book genre seems to overlap a bit with the app idea described above, but the other genres don't seem too interesting to us:

- Weather apps: people generally don't spend much time in-app, so the chances of making a profit from in-app ads are low. Also, getting reliable live weather data may require us to connect our apps to non-free APIs.
- Food and drink: examples here include Starbucks, Dunkin' Donuts, McDonald's, etc. Making a popular food and drink app requires actual cooking and a delivery service, which is outside the scope of our company.
- Finance apps: these apps involve banking, paying bills, money transfer, etc. Building a finance app requires domain knowledge, and we don't want to hire a finance expert just to build an app.

Now let's analyze the Google Play market a bit.

### Most Popular Apps by Genre on Google Play

For the Google Play market, we actually have data about the number of installs, so we should be able to get a clearer picture of genre popularity. However, the install numbers are not precise: most values are open-ended (100+, 1,000+, 5,000+, etc.).

In [201]:
display_table(google_free_apps, 5) # 5 is Installs index

1,000,000+ : 15.73
100,000+ : 11.55
10,000,000+ : 10.55
10,000+ : 10.2
1,000+ : 8.39
100+ : 6.92
5,000,000+ : 6.83
500,000+ : 5.56
50,000+ : 4.77
5,000+ : 4.51
10+ : 3.54
500+ : 3.25
50,000,000+ : 2.3
100,000,000+ : 2.13
50+ : 1.92
5+ : 0.79
1+ : 0.51
500,000,000+ : 0.27
1,000,000,000+ : 0.23
0+ : 0.05


One problem with this data is that it is not precise. For instance, we don't know whether an app with 100,000+ installs has 100,000 installs, 200,000, or 350,000. However, we don't need very precise data for our purposes: we only want to get an idea of which app genres attract the most users.

We're going to leave the numbers as they are, which means that we'll consider that an app with 100,000+ installs has 100,000 installs, and an app with 1,000,000+ installs has 1,000,000 installs, and so on.

To perform computations, however, we'll need to convert each install number to float — this means that we need to remove the commas and the plus characters, otherwise the conversion will fail and raise an error. We'll do this directly in the loop below, where we also compute the average number of installs for each genre (category).
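
The conversion itself can be checked in isolation before it is used inside a loop. The helper name `parse_installs` is hypothetical; it applies the same two `str.replace` calls as the notebook code.

```python
def parse_installs(raw):
    """Convert a string such as '1,000,000+' to the float 1000000.0."""
    return float(raw.replace('+', '').replace(',', ''))

for raw in ['0+', '5,000+', '1,000,000,000+']:
    print(raw, '->', parse_installs(raw))
```

Calling `float('5,000+')` directly raises a `ValueError`, which is why both characters have to be stripped first.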

In [202]:
category_freq = freq_table(google_free_apps, google_genre_index[0])

for category in category_freq: # Iterate over unique categories
    total = 0
    len_category = 0

    for app in google_free_apps:
        category_app = app[google_genre_index[0]]
        if category_app == category:
            installs = float(app[5].replace('+', '').replace(',', ''))
            total += installs
            len_category += 1

    installs_avg = round(total / len_category, 2)
    formated_installs = "{:,}".format(installs_avg)
    print(category, formated_installs)


ART_AND_DESIGN 1,986,335.09
AUTO_AND_VEHICLES 647,317.82
BEAUTY 513,151.89
BOOKS_AND_REFERENCE 8,767,811.89
BUSINESS 1,712,290.15
COMICS 817,657.27
COMMUNICATION 38,456,119.17
DATING 854,028.83
EDUCATION 1,833,495.15
ENTERTAINMENT 11,640,705.88
EVENTS 253,542.22
FINANCE 1,387,692.48
FOOD_AND_DRINK 1,924,897.74
HEALTH_AND_FITNESS 4,188,821.99
HOUSE_AND_HOME 1,331,540.56
LIBRARIES_AND_DEMO 638,503.73
LIFESTYLE 1,437,816.27
GAME 15,588,015.6
FAMILY 3,697,848.17
MEDICAL 120,550.62
SOCIAL 23,253,652.13
SHOPPING 7,036,877.31
PHOTOGRAPHY 17,840,110.4
SPORTS 3,638,640.14
TRAVEL_AND_LOCAL 13,984,077.71
TOOLS 10,801,391.3
PERSONALIZATION 5,201,482.61
PRODUCTIVITY 16,787,331.34
PARENTING 542,603.62
WEATHER 5,074,486.2
VIDEO_PLAYERS 24,727,872.45
NEWS_AND_MAGAZINES 9,549,178.47
MAPS_AND_NAVIGATION 4,056,941.77


Categories with the highest average number of installs:

- COMMUNICATION 38,456,119.17
- VIDEO_PLAYERS 24,727,872.45
- SOCIAL 23,253,652.13
- PHOTOGRAPHY 17,840,110.4
- PRODUCTIVITY 16,787,331.34

In [203]:
for app in google_free_apps:
    if app[google_genre_index[0]] == 'PRODUCTIVITY' and app[5] in ('1,000,000,000+', '500,000,000+', '100,000,000+'):
        print(app[0], ':', app[5]) # print name and number of installs

Microsoft Word : 500,000,000+
Microsoft Outlook : 100,000,000+
Microsoft OneDrive : 100,000,000+
Microsoft OneNote : 100,000,000+
Google Keep : 100,000,000+
ES File Explorer File Manager : 100,000,000+
Dropbox : 500,000,000+
Google Docs : 100,000,000+
Microsoft PowerPoint : 100,000,000+
Samsung Notes : 100,000,000+
SwiftKey Keyboard : 100,000,000+
Google Drive : 1,000,000,000+
Adobe Acrobat Reader : 100,000,000+
Google Sheets : 100,000,000+
Microsoft Excel : 100,000,000+
WPS Office - Word, Docs, PDF, Note, Slide & Sheet : 100,000,000+
Google Slides : 100,000,000+
ColorNote Notepad Notes : 100,000,000+
Evernote – Organizer, Planner for Notes & Memos : 100,000,000+
Google Calendar : 500,000,000+
Cloud Print : 500,000,000+
CamScanner - Phone PDF Creator : 100,000,000+


In [204]:
under_100_m = []

# Average installs of communication apps with fewer than 100,000,000 installs.
for app in google_free_apps:
    installs = float(app[5].replace('+', '').replace(',', ''))
    if app[google_genre_index[0]] == 'COMMUNICATION' and installs < 100000000:
        under_100_m.append(installs)

installs_avg = sum(under_100_m) / len(under_100_m)
formated_installs_avg = "{:,}".format(installs_avg)

print(formated_installs_avg)

3,603,485.3884615386


The communication average is inflated by apps with 100,000,000 or more installs: once those are excluded, the average drops from about 38.5 million to about 3.6 million. We see the same pattern for the video players category, which is the runner-up with an average of 24,727,872 installs. That market is dominated by apps like YouTube, Google Play Movies & TV, or MX Player. The pattern is repeated for social apps (where we have giants like Facebook, Instagram, Google+, etc.), photography apps (Google Photos and other popular photo editors), and productivity apps (Microsoft Word, Dropbox, Google Calendar, Evernote, etc.).

Again, the main concern is that these app genres might seem more popular than they really are. Moreover, these niches seem to be dominated by a few giants who are hard to compete against.
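
To see how much the giants move an average, the full and the capped figures can be computed side by side. This is a sketch on made-up data: `average_installs`, its `cap` parameter, and the sample tuples are all hypothetical and only mirror the filtering done for communication apps above.

```python
def average_installs(apps, category, cap=None):
    """Average installs for one category; if `cap` is given, skip apps at or above it."""
    values = []
    for name, app_category, installs in apps:
        if app_category == category and (cap is None or installs < cap):
            values.append(installs)
    return sum(values) / len(values) if values else 0.0

# Hypothetical (name, category, installs) tuples for illustration only.
sample_apps = [
    ('Giant Messenger', 'COMMUNICATION', 1_000_000_000),
    ('Small Chat', 'COMMUNICATION', 100_000),
    ('Tiny Dialer', 'COMMUNICATION', 50_000),
]

print(average_installs(sample_apps, 'COMMUNICATION'))                   # dominated by the giant
print(average_installs(sample_apps, 'COMMUNICATION', cap=100_000_000))  # 75000.0
```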

The game genre seems pretty popular, but previously we found out this part of the market seems a bit saturated, so we'd like to come up with a different app recommendation if possible.

The books and reference genre looks fairly popular as well, with an average number of installs of 8,767,811. It's interesting to explore this in more depth, since we found this genre has some potential to work well on the App Store, and our aim is to recommend an app genre that shows potential for being profitable on both the App Store and Google Play.

Let's take a look at some of the apps from this genre and their number of installs:

In [205]:
for app in google_free_apps:
    if app[google_genre_index[0]] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E

The book and reference genre includes a variety of apps: software for processing and reading ebooks, various collections of libraries, dictionaries, tutorials on programming or languages, etc. It seems there's still a small number of extremely popular apps that skew the average:



In [206]:
for app in google_free_apps:
    if app[google_genre_index[0]] == 'BOOKS_AND_REFERENCE' and app[5] in ('1,000,000,000+', '500,000,000+', '100,000,000+'):
        print(app[0], ':', app[5]) # print name and number of installs

Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad 📖 Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+


However, it looks like there are only a few very popular apps, so this market still shows potential. Let's try to get some app ideas based on the kind of apps that are somewhere in the middle in terms of popularity (between 1,000,000 and 100,000,000 downloads):

In [207]:
for app in google_free_apps:
    if app[google_genre_index[0]] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000+'
                                            or app[5] == '5,000,000+'
                                            or app[5] == '10,000,000+'
                                            or app[5] == '50,000,000+'):
        print(app[0], ':', app[5])

Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
ReadEra – free ebook reader : 1,000,000+
Ebook Reader : 5,000,000+
Read books online : 5,000,000+
eBoox: book reader fb2 epub zip : 1,000,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
Moon+ Reader : 10,000,000+
English-Myanmar Dictionary : 1,000,000+
Golden Dictionary (EN-AR) : 1,000,000+
All Language Translator Free : 1,000,000+
Aldiko Book Reader : 10,000,000+
Dictionary - WordWeb : 5,000,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Al-Quran (Free) : 10,000,000+
Al Quran Indonesia : 10,000,000+
Al'Quran Bahasa Indonesia : 10,000,000+
Al Quran Al karim : 1,000,000+
Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+
Koran Read &MP3 30 Juz Offline : 1,000,000+
H
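
The filter above matches exact install strings, so a bucket that is left out of the list is silently skipped. A numeric range check is one possible alternative. The sketch below assumes the same `'1,000,000+'` string format; `in_install_range` and the sample pairs are hypothetical.

```python
def in_install_range(raw_installs, low, high):
    """True if the parsed install count satisfies low <= installs < high."""
    installs = float(raw_installs.replace('+', '').replace(',', ''))
    return low <= installs < high

# Hypothetical (name, installs) pairs for illustration only.
sample = [
    ('Reader A', '500,000+'),
    ('Reader B', '5,000,000+'),
    ('Reader C', '100,000,000+'),
]

mid_range = [name for name, raw in sample if in_install_range(raw, 1_000_000, 100_000_000)]
print(mid_range)  # ['Reader B']
```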

This niche seems to be dominated by software for processing and reading ebooks, as well as various collections of libraries and dictionaries, so it's probably not a good idea to build similar apps since there'll be some significant competition.

We also notice there are quite a few apps built around the Quran, which suggests that building an app around a popular book can be profitable. It seems that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets.

However, it looks like the market is already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.

## Conclusions
In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.

We concluded that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets. The markets are already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.