# Analyzing Mobile App Data

We only build apps that are free to download and install, and our main source of revenue consists of in-app ads. The number of users of our apps determines our revenue for any given app. Our goal for this project is to analyze data to help our developers understand what types of apps are likely to attract more users.

## Opening and Exploring the Data

In [78]:
from csv import reader
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') 

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

### The App Store Dataset ###
opened_file = open('AppleStore.csv', encoding='utf8')  # decode non-ASCII app names correctly
read_file = reader(opened_file)
apple = list(read_file)
apple_header = apple[0]
apple = apple[1:]

print('App Store Dataset')
print(apple_header)
explore_data(apple, 0, 3, True)

### The Google Play Dataset ###
opened_file = open('googleplaystore.csv', encoding='utf8')  # decode non-ASCII app names correctly
read_file = reader(opened_file)
google = list(read_file)
google_header = google[0]
google = google[1:]

print('\n')
print('Google Play Dataset')
print(google_header)
explore_data(google, 0, 3, True)

print(google[10472])  # inspect the faulty row before deleting it
del google[10472]     # run only once: each re-run deletes a different row

App Store Dataset
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


Google Play Dataset
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']

## Removing Duplicate Entries and Deleting Wrong Data
The Google Play dataset has duplicate entries. We count the duplicate app names and print a few of them to confirm.

In [79]:
duplicate_apps_google = []
unique_apps_google = []

for app in google:
    name = app[0]
    if name in unique_apps_google:
        duplicate_apps_google.append(name)
    else:
        unique_apps_google.append(name)
        
print('Number of duplicate Google apps:', len(duplicate_apps_google))
print('\n')
print('Examples of duplicate Google apps:', duplicate_apps_google[:15])

Number of duplicate Google apps: 1181


Examples of duplicate Google apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']
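The loop above checks membership in the list unique_apps_google on every iteration, which gets slower as the list grows. A minimal alternative sketch counts names with collections.Counter instead; the helper name count_duplicates and the toy rows are hypothetical, and rows are assumed to hold the app name at a known column index (0 for Google Play, 1 for the App Store).

```python
from collections import Counter

def count_duplicates(rows, name_index):
    """Return the number of duplicate rows and the sorted names that repeat."""
    counts = Counter(row[name_index] for row in rows)
    # Every occurrence of a name after its first one is a duplicate row
    n_duplicates = sum(count - 1 for count in counts.values())
    duplicated_names = sorted(name for name, count in counts.items() if count > 1)
    return n_duplicates, duplicated_names

# Hypothetical toy rows: [app name, number of reviews]
sample = [
    ['App A', '159'],
    ['App B', '51507'],
    ['App A', '159'],
    ['App A', '200'],
    ['App C', '120'],
]
print(count_duplicates(sample, 0))
```

On the real data, count_duplicates(google, 0)[0] should match len(duplicate_apps_google), since both count every repeat after a name's first occurrence.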

Next, we check the App Store dataset for duplicates.

In [80]:
duplicate_apps_apple = []
unique_apps_apple = []

for app in apple:
    name = app[1]
    if name in unique_apps_apple:
        duplicate_apps_apple.append(name)
    else:
        unique_apps_apple.append(name)
        
print('Number of duplicate apps:', len(duplicate_apps_apple))
print('\n')
print('Examples of duplicate apps:', duplicate_apps_apple[:])

Number of duplicate apps: 2


Examples of duplicate apps: ['Mannequin Challenge', 'VR Roller Coaster']


Rather than removing duplicates at random, for both Google Play and the App Store we keep only the row with the highest number of reviews for each app, assuming that a higher count means the row was collected more recently, and remove the other duplicates.
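The same criterion can also be applied in a single pass with one dictionary that stores the best row seen so far for each name. This is a sketch under the assumption that the name and review count sit at fixed column indexes; dedupe_keep_max is a hypothetical helper, not part of the notebook.

```python
def dedupe_keep_max(rows, name_index, reviews_index):
    """Keep one row per app name: the earliest row with the highest review count."""
    best = {}  # app name -> best row seen so far
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' means a later row with an equal count does not replace the earlier one
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    # Dicts keep insertion order (Python 3.7+), so rows follow each name's first appearance
    return list(best.values())

# Hypothetical toy rows: [app name, number of reviews]
sample = [
    ['App A', '159'],
    ['App B', '51507'],
    ['App A', '200'],
    ['App A', '200'],
]
for row in dedupe_keep_max(sample, 0, 1):
    print(row)
```

Calling dedupe_keep_max(google, 0, 3) or dedupe_keep_max(apple, 1, 5) should keep the same set of rows as a two-pass approach that first records the maximum and then keeps the first matching row, although the order of the resulting rows may differ.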

In [81]:
reviews_max_google = {}

for app in google:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max_google and reviews_max_google[name] < n_reviews:
        reviews_max_google[name] = n_reviews
    elif name not in reviews_max_google:
        reviews_max_google[name] = n_reviews 

print('Expected length:', len(google) - len(duplicate_apps_google))
print('Actual length of reviews_max:', len(reviews_max_google))

Expected length: 9659
Actual length of reviews_max: 9659


In [82]:
reviews_max_apple = {}

for app in apple:
    name = app[1]
    n_reviews = float(app[5])
    
    if name in reviews_max_apple and reviews_max_apple[name] < n_reviews:
        reviews_max_apple[name] = n_reviews
    elif name not in reviews_max_apple:
        reviews_max_apple[name] = n_reviews 

print('Expected length:', len(apple) - len(duplicate_apps_apple))
print('Actual length of reviews_max:', len(reviews_max_apple))

Expected length: 7195
Actual length of reviews_max: 7195


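Next, we use the reviews_max dictionaries created above to remove the duplicate rows: for each app, we keep only the row whose number of reviews equals the maximum. The already_added lists make sure each app is added only once, even when several of its duplicate rows share the same maximum.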

In [83]:
google_clean = []
already_added = []

for app in google:
    name = app[0]
    n_reviews = float(app[3])
    
    if (reviews_max_google[name] == n_reviews) and (name not in already_added):
        google_clean.append(app)
        already_added.append(name)
            
explore_data(google_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


In [84]:
apple_clean = []
already_added_apple = []

for app in apple:
    name = app[1]
    n_reviews = float(app[5])
    
    if (reviews_max_apple[name] == n_reviews) and (name not in already_added_apple):
        apple_clean.append(app)
        already_added_apple.append(name)
            
explore_data(apple_clean, 0, 3, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7195
Number of columns: 16


## Removing Non-English Apps
To remove apps that are not aimed at an English-speaking audience, we write a function that takes in a string and counts the characters that fall outside the common English character set (ASCII, i.e. characters whose ord() value is greater than 127). Because English app names can still contain emoji or symbols such as ™, the function returns False only when a name has more than 3 such characters, and True otherwise. This filter isn't perfect, but it's fairly effective.

In [85]:
def is_english(string):
    sum = 0
    for character in string:
        if ord(character) > 127:
           sum += 1
    if sum > 3:
        return False
    else:
        return True        
        
print(is_english('Instagram'))
print(is_english('Áà±Â•áËâ∫PPS -„ÄäÊ¨¢‰πêÈ¢Ç2„ÄãÁîµËßÜÂâßÁÉ≠Êí≠'))
print(is_english('Docs To Go‚Ñ¢ Free Office Suite'))
print(is_english('Instachat üòú'))

True
False
True
True


Next, we apply is_english() to the app names in both cleaned datasets, keep only the apps identified as English, and check how many rows remain in each dataset.

In [86]:
google_english = []
apple_english = []
     
for app in google_clean:
    name = app[0]
    if is_english(name):
        google_english.append(app)

for app in apple_clean:
    name = app[1]
    if is_english(name):
        apple_english.append(app)
        
explore_data(google_english, 0, 3, True)
print('\n')
explore_data(apple_english, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']

## Isolating the Free Apps
We only build apps that are free to download and install, and our main source of revenue consists of in-app ads. Since our datasets contain both free and non-free apps, we'll isolate the free apps before continuing the analysis.

In [87]:
apple_free = []
google_free = []

# App Store prices are stored as strings like '0.0'
for app in apple_english:
    price = app[4]
    if price == '0.0':
        apple_free.append(app)  # keep the whole row, not just the name
        
print('The number of free apps in the App Store is:', len(apple_free))

# Google Play prices are stored as strings like '0' or '$4.99'
for app in google_english:
    price = app[7]
    if price == '0':
        google_free.append(app)
        
print('The number of free apps in Google Play is:', len(google_free))
print('\nWe have', len(apple_free), 'App Store and', len(google_free), 'Google Play apps, which should be enough to continue our analysis.')

The number of free apps in the App Store is: 3220
The number of free apps in Google Play is: 8864

We have 3220 App Store and 8864 Google Play apps, which should be enough to continue our analysis.
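Comparing prices against the exact strings '0.0' and '0' works here because each store formats its prices consistently. A more defensive sketch converts the price to a number first; is_free and filter_free are hypothetical helper names, and the optional '$' prefix is assumed from the Google Play price format.

```python
def is_free(price):
    """True if a price string such as '0', '0.0' or '$4.99' represents zero."""
    # Strip an optional leading '$' before converting to a number
    return float(price.lstrip('$')) == 0.0

def filter_free(rows, price_index):
    return [row for row in rows if is_free(row[price_index])]

# Hypothetical toy rows: [app name, price]
sample = [['App A', '0'], ['App B', '$4.99'], ['App C', '0.0'], ['App D', '0.99']]
print([row[0] for row in filter_free(sample, 1)])
```

A malformed, non-numeric price makes float() raise ValueError, which surfaces bad rows instead of silently dropping them.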


## Most Common Apps by Genre
Our goal is to determine the kinds of apps that are likely to attract more users, because the number of people using our apps affects our revenue.

To minimize risks and overhead, our validation strategy for an app has the following three steps:
1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we develop it further.
3. If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

Because our end goal is to publish the app on both Google Play and the App Store, we need to find app profiles that are successful in both markets.

To begin our analysis, we determine the most common genres in each market by building a frequency table for the genre columns of each dataset and sorting it in descending order.

In [88]:
apple_free_genres = []
google_free_genres = []

for app in apple_english:
    name = app[1]
    genre = app[11]
    if (genre not in apple_free_genres):
        apple_free_genres.append(genre)
        
print('The genres of free apps in the App Store are:', apple_free_genres)

for app in google_english:
    name = app[0]
    genre = app[9]
    if (genre not in google_free_genres):
        google_free_genres.append(genre)
        
print('\nThe genres of free apps in Google Play are:', google_free_genres)
print('\n')

def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage 
    
    return table_percentages

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

print('A sorted frequency table for the columns prime_genre for the App Store.') 
display_table(apple_english, 11)
print('\nA sorted frequency table for the categories and genres of the apps in Google Play.')
display_table(google_english, 1)
print('\n')
display_table(google_english, 9)

The genres of free apps in the App Store are: ['Social Networking', 'Photo & Video', 'Games', 'Music', 'Reference', 'Health & Fitness', 'Weather', 'Utilities', 'Travel', 'Shopping', 'News', 'Navigation', 'Lifestyle', 'Entertainment', 'Food & Drink', 'Sports', 'Book', 'Finance', 'Education', 'Productivity', 'Business', 'Catalogs', 'Medical']

The genres of free apps in Google Play are: ['Art & Design', 'Art & Design;Creativity', 'Auto & Vehicles', 'Beauty', 'Books & Reference', 'Business', 'Comics', 'Comics;Creativity', 'Communication', 'Dating', 'Education', 'Education;Creativity', 'Education;Education', 'Education;Pretend Play', 'Education;Brain Games', 'Entertainment', 'Entertainment;Brain Games', 'Entertainment;Creativity', 'Entertainment;Music & Video', 'Events', 'Finance', 'Food & Drink', 'Health & Fitness', 'House & Home', 'Libraries & Demo', 'Lifestyle', 'Lifestyle;Pretend Play', 'Card', 'Arcade', 'Puzzle', 'Racing', 'Sports', 'Casual', 'Simulation', 'Adventure', 'Trivia', 'Acti
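The frequency table built by freq_table() can also be sketched with collections.Counter, which does the counting in one expression. freq_table_counter and display_table_counter are hypothetical names for this illustration, and the toy rows below stand in for the real datasets.

```python
from collections import Counter

def freq_table_counter(dataset, index):
    """Percentage of rows for each distinct value in column `index`."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return {value: count / total * 100 for value, count in counts.items()}

def display_table_counter(dataset, index):
    table = freq_table_counter(dataset, index)
    # Sort by percentage, highest first
    for value, percentage in sorted(table.items(), key=lambda item: item[1], reverse=True):
        print(value, ':', percentage)

# Hypothetical toy rows with a single genre column
sample = [['Games'], ['Games'], ['Music'], ['Games'], ['Weather'], ['Games']]
display_table_counter(sample, 0)
```

Calling display_table_counter(apple_english, 11) should print the same percentages as display_table(apple_english, 11); only values with tied percentages may appear in a different order, since display_table() sorts (percentage, value) tuples.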

## Analyzing the frequency table for the most common genres in the App Store and Google Play datasets

Upon analyzing the sorted frequency table, the most common genres in the App Store dataset were games, entertainment, education, photo & video, and utilities. These 5 genres made up around 75% of all free English apps. Among the top 10 genres, 6 were for entertainment (games, entertainment, photo & video, music, social networking, and sports), together comprising 73.56% of the apps, while the other 4 were for practical purposes (education, utilities, productivity, and health & fitness), making up approximately 15.47%. There were significantly more apps for entertainment than for practical purposes, with games alone making up over half of all free apps (54.86%). However, the fact that fun apps are the most numerous does not necessarily mean they have the most users: supply does not necessarily match demand.

For the Google Play dataset, the top 5 values in the Category column were family, games, tools, business, and medical, while the top 5 values in the Genres column were tools, entertainment, education, business, and medical. This time the top 5 categories made up only around 46% of all free English apps, and the top 5 genres only 28%. Each of the two columns had 11 values with a share of at least 3%. Among the top 10, fewer apps are designed for entertainment, and more of them seem to be designed for practical purposes (tools, business, lifestyle, productivity, etc.). On closer inspection, however, the family category, which accounts for almost 19% of the apps, consists mainly of games for kids.

Even so, practical apps seem to be better represented on Google Play than on the App Store. The difference between the Genres and Category columns is not entirely clear, but the Genres column is more granular (it has more categories). Since we're interested in the bigger picture at the moment, we'll work with the Category column for Google Play from here on.

At this point, we found that the App Store is mostly apps designed for fun, while Google Play shows a more balanced landscape of both practical and for-fun apps.

## Most popular apps on the App Store by average number of user ratings per app genre

One way to find out which genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play dataset, we can find this information in the Installs column, but it is missing from the App Store dataset. As a workaround, we'll use the total number of user ratings instead, which we can find in the rating_count_tot column.

In [89]:
genres_apple = freq_table(apple_english, 11)

for genre in genres_apple:
    total = 0
    len_genre = 0
    for app in apple_english:
        genre_app = app[11]
        if genre_app == genre:            
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)

Social Networking : 60253.84920634921
Photo & Video : 14688.715542521993
Games : 15595.90442477876
Music : 29047.109489051094
Reference : 27037.188679245282
Health & Fitness : 10802.157575757576
Weather : 23145.246376811596
Utilities : 7927.525821596244
Travel : 19030.183333333334
Shopping : 26635.011764705883
News : 16980.315789473683
Navigation : 19370.821428571428
Lifestyle : 8930.373737373737
Entertainment : 8862.409799554565
Food & Drink : 19934.386363636364
Sports : 15350.913461538461
Book : 10359.2
Finance : 23353.530612244896
Education : 2472.278048780488
Productivity : 8508.089285714286
Business : 5149.320754716981
Catalogs : 3465.0
Medical : 648.952380952381
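The cell above re-scans every row once per genre. The same averages can be computed in a single pass by accumulating a running total and count per genre; average_by_group is a hypothetical helper, and the toy rows below only mimic the relevant App Store columns.

```python
from collections import defaultdict

def average_by_group(dataset, group_index, value_index):
    """Average of the numeric column `value_index` for each value of `group_index`."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}

# Hypothetical toy rows: [app name, genre, rating_count_tot]
sample = [
    ['App A', 'Navigation', '300'],
    ['App B', 'Navigation', '100'],
    ['App C', 'Games', '50'],
]
print(average_by_group(sample, 1, 2))  # {'Navigation': 200.0, 'Games': 50.0}
```

On the real data, average_by_group(apple_english, 11, 5) should return the same per-genre averages as the cell above.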


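On average, social networking apps have the highest number of user ratings (about 60,254), followed by music and reference apps. Averages like these can be misleading: for navigation apps (about 19,371 ratings on average), the figure is heavily influenced by Waze and Google Maps, which together have close to half a million ratings.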

In [90]:
for app in apple_english:
    if app[11] == 'Navigation':
        print(app[1], ':', app[5]) # print name and number of ratings


Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
MotionX GPS : 14970
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
Gaia GPS Classic : 2429
Plane Finder - Flight Tracker : 1438
iMaps+ for Google Maps ™ and Street View ™ : Transit and Offline Contacts : 1225
NAVIGON Europe : 927
Localscope - Find places and people around you : 868
Ski Tracks : 829
TRANSPORT MODS for MINECRAFT Pc EDITION : 754
Pocket Earth PRO Offline Maps & Travel Guides : 748
Ship Finder : 624
Boating USA : 342
Maps 3D PRO - GPS for Bike, Hike, Ski & Outdoor : 280
Cachly - Simple and powerful Geocaching for iPhone : 263
ImmobilienScout24: Real Estate Search in Germany : 187
The JMU Bus App : 35
Avertinoo : 32
iStellar : 30
mySTATE - State College : 26
Road watcher: dash camera, car video recorder. : 10
Streets – Street View Browser : 10
Railway Route Search : 5
parkOmator – for Apple Watch meter expiration timer, notifications & GPS 

The same pattern applies to social networking apps, where the average is heavily influenced by a few giants like Facebook, Pinterest, and Skype, and to music apps, where a few big players like Pandora, Spotify, and Shazam heavily influence the average.

Our aim is to find popular genres, but navigation, social networking, or music apps might seem more popular than they really are. The average number of ratings seems to be skewed by very few apps with hundreds of thousands of user ratings, while the other apps may struggle to get past the 10,000 threshold. We could get a better picture by removing these extremely popular apps from each genre and then recalculating the averages.
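A minimal sketch of that idea, using the first ten Navigation rating counts printed by cell In [90]. The cut-off of 100,000 ratings is an arbitrary assumption for illustration, and average_without_outliers is a hypothetical helper; the median is shown alongside as a measure that is less sensitive to a few giants.

```python
from statistics import mean, median

def average_without_outliers(values, threshold):
    """Mean of the values strictly below `threshold` (None if nothing is left)."""
    kept = [value for value in values if value < threshold]
    return mean(kept) if kept else None

# The first ten Navigation rating counts printed by cell In [90]
ratings = [345046, 154911, 14970, 12811, 3582, 2429, 1438, 1225, 927, 868]

print('Mean:', mean(ratings))
print('Median:', median(ratings))
print('Mean below 100,000 ratings:', average_without_outliers(ratings, 100000))
```

For this subset, dropping Waze and Google Maps lowers the mean from 53,820.7 to 4,781.25, and the median (3,005.5) is far below the raw mean, which illustrates how strongly two apps skew the average.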

## Most popular apps on Google Play by average number of installs per app genre

In [91]:
categories_google = freq_table(google_english, 1)

for category in categories_google:
    total = 0
    len_category = 0
    for app in google_english:
        category_app = app[1]
        if category_app == category: 
            n_installs = app[5]
            n_installs = n_installs.replace(',','')
            n_installs = n_installs.replace('+','')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    print(category, ':', avg_n_installs)  

ART_AND_DESIGN : 1887285.0
AUTO_AND_VEHICLES : 632501.3214285715
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 7641777.871559633
BUSINESS : 1663758.627684964
COMICS : 817657.2727272727
COMMUNICATION : 35153714.17515924
DATING : 828971.2176470588
EDUCATION : 1782566.0377358492
ENTERTAINMENT : 11375402.298850575
EVENTS : 249580.640625
FINANCE : 1319851.4028985507
FOOD_AND_DRINK : 1891060.2767857143
HEALTH_AND_FITNESS : 3972300.388888889
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 630903.6904761905
LIFESTYLE : 1369954.7774725275
GAME : 14256217.600635594
FAMILY : 3345018.516684607
MEDICAL : 96944.49873417722
SOCIAL : 22961790.384937238
SHOPPING : 6966908.880597015
PHOTOGRAPHY : 16636241.267857144
SPORTS : 3373767.6861538463
TRAVEL_AND_LOCAL : 13218662.767123288
TOOLS : 9785955.211352658
PERSONALIZATION : 4086652.4853333333
PRODUCTIVITY : 15530942.008042896
PARENTING : 525351.8333333334
WEATHER : 4570892.658227848
VIDEO_PLAYERS : 24121489.079754602
NEWS_AND_MAGAZINES : 947

On average, communication apps have the most installs: 35,153,714. This number is heavily skewed upward by a few apps with over one billion installs (WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail, and Hangouts) and a few others with over 100 million and 500 million installs.

In [92]:
for app in google_english:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Me

If we removed all the communication apps with 100 million or more installs, the average would drop roughly tenfold, from 35,153,714 to 3,269,220:

In [93]:
under100_m = []

for app in google_english:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'COMMUNICATION') and (float(n_installs) < 100000000):
        under100_m.append(float(n_installs))
        
sum(under100_m) / len(under100_m)

3269220.386759582
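Cells In [91] and In [93] both strip ',' and '+' from the Installs strings before converting them. That step, plus the optional cut-off, can be folded into two small helpers. parse_installs and average_installs are hypothetical names; the sketch assumes rows shaped like google_english, with the category at index 1 and installs at index 5, and pads the toy rows with empty strings to match.

```python
def parse_installs(installs):
    """Convert an Installs string such as '1,000,000+' to a float."""
    return float(installs.replace(',', '').replace('+', ''))

def average_installs(dataset, category, max_installs=None):
    """Average installs in one category, optionally ignoring apps at or above max_installs."""
    values = []
    for row in dataset:
        if row[1] != category:
            continue
        n_installs = parse_installs(row[5])
        if max_installs is None or n_installs < max_installs:
            values.append(n_installs)
    return sum(values) / len(values) if values else None

# Hypothetical toy rows: [name, category, unused, unused, unused, installs]
sample = [
    ['App A', 'COMMUNICATION', '', '', '', '1,000,000,000+'],
    ['App B', 'COMMUNICATION', '', '', '', '10,000,000+'],
    ['App C', 'COMMUNICATION', '', '', '', '1,000,000+'],
    ['App D', 'TOOLS', '', '', '', '500+'],
]
print(average_installs(sample, 'COMMUNICATION'))                          # 337000000.0
print(average_installs(sample, 'COMMUNICATION', max_installs=100000000))  # 5500000.0
```

With the real data, average_installs(google_english, 'COMMUNICATION', max_installs=100000000) should reproduce the 3,269,220 computed above.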

We see the same pattern for the video players category, which is the runner-up with an average of 24,121,489 installs. This market is dominated by apps like YouTube, Google Play Movies & TV, and MX Player. The pattern repeats for social apps (where we have giants like Facebook, Instagram, Google+, etc.), photography apps (Google Photos and other popular photo editors), and productivity apps (Microsoft Word, Dropbox, Google Calendar, Evernote, etc.).

Again, the main concern is that these app genres might seem more popular than they really are. Moreover, these niches seem to be dominated by a few giants who are hard to compete against.

The game genre seems pretty popular, but we found earlier that this part of the market seems a bit saturated, so we'd like to come up with a different app recommendation if possible.

The books and reference genre looks fairly popular as well, with an average of 7,641,778 installs. It's worth exploring in more depth, since we found this genre has some potential to work well on the App Store, and our aim is to recommend an app genre that shows potential for being profitable on both the App Store and Google Play.

Let's take a look at some of the apps from this genre and their number of installs:

In [94]:
for app in google_english:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+

The book and reference genre includes a variety of apps: software for processing and reading ebooks, various collections of libraries, dictionaries, tutorials on programming or languages, etc. It seems there's still a small number of extremely popular apps that skew the average:

In [95]:
for app in google_english:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000,000+'
                                            or app[5] == '500,000,000+'
                                            or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad üìñ Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+


However, it looks like there are only a few very popular apps, so this market still shows potential. Let's try to get some app ideas based on the kind of apps that are somewhere in the middle in terms of popularity (between 1,000,000 and 100,000,000 downloads):

In [96]:
for app in google_english:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000+'
                                            or app[5] == '5,000,000+'
                                            or app[5] == '10,000,000+'
                                            or app[5] == '50,000,000+'):
        print(app[0], ':', app[5])

Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
ReadEra – free ebook reader : 1,000,000+
Ebook Reader : 5,000,000+
Read books online : 5,000,000+
eBoox: book reader fb2 epub zip : 1,000,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
Moon+ Reader : 10,000,000+
English-Myanmar Dictionary : 1,000,000+
Golden Dictionary (EN-AR) : 1,000,000+
All Language Translator Free : 1,000,000+
Aldiko Book Reader : 10,000,000+
Dictionary - WordWeb : 5,000,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Al-Quran (Free) : 10,000,000+
Al Quran Indonesia : 10,000,000+
Al'Quran Bahasa Indonesia : 10,000,000+
Al Quran Al karim : 1,000,000+
Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+
Koran Read &MP3 30 Juz Offline : 1,000,000+
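Instead of listing every install bucket ('1,000,000+', '5,000,000+', and so on) as in cell In [96], the filter can convert the install strings to numbers and compare against a range. This is a sketch: apps_in_install_range is a hypothetical helper, and because each Installs value is a lower bound ('1,000,000+'), the range really selects install buckets rather than exact counts.

```python
def apps_in_install_range(dataset, category, low, high):
    """(name, installs) for apps in `category` whose install bucket is in [low, high)."""
    selected = []
    for row in dataset:
        if row[1] != category:
            continue
        n_installs = float(row[5].replace(',', '').replace('+', ''))
        if low <= n_installs < high:
            selected.append((row[0], row[5]))
    return selected

# Toy rows padded to the google_english layout: category at index 1, installs at index 5
sample = [
    ['Wikipedia', 'BOOKS_AND_REFERENCE', '', '', '', '10,000,000+'],
    ['Google Play Books', 'BOOKS_AND_REFERENCE', '', '', '', '1,000,000,000+'],
    ['Book store', 'BOOKS_AND_REFERENCE', '', '', '', '1,000,000+'],
    ['Offline English Dictionary', 'BOOKS_AND_REFERENCE', '', '', '', '100,000+'],
]
for name, installs in apps_in_install_range(sample, 'BOOKS_AND_REFERENCE', 1000000, 100000000):
    print(name, ':', installs)
```

Calling apps_in_install_range(google_english, 'BOOKS_AND_REFERENCE', 1000000, 100000000) should select the same buckets as cell In [96] (1 million up to, but not including, 100 million).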

This niche seems to be dominated by software for processing and reading ebooks, as well as various collections of libraries and dictionaries, so it's probably not a good idea to build similar apps since there'll be some significant competition.

We also notice there are quite a few apps built around the Quran, which suggests that an app built around a popular book can attract a large audience. Taking a popular book (perhaps a more recent one) and turning it into an app could therefore be profitable for both the Google Play and the App Store markets.

However, it looks like the market is already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.

## Conclusions
In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.

We concluded that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets. The markets are already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.