# Data Analysis of Andorid and iOS Mobile Apps

This project is focused on doing data analysis for a hypothetical company that makes mobile apps. These apps provide revenue for the company via the adds that users load. In summary the more users the company has the more revenue it gains from them through the adds in the apps. Our goal for this project is to do some data analysis on these apps to find a way the company could bring more users into using their apps.

In [143]:
from csv import reader

### The Google Play data set ###
opened_file = open('googleplaystore.csv')
read_file = reader(opened_file)
android = list(read_file)
android_header = android[0]
android = android[1:]

### The App Store data set ###
opened_file = open('AppleStore.csv')
read_file = reader(opened_file)
ios = list(read_file)
ios_header = ios[0]
ios = ios[1:]

Writing the explore_data set function to explore only a few csv entries at a time and explore how big this dataset is.

In [144]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')
        
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

Since each dataset has thousands of entries we only want to take a peak at the data to try and get a sense of what we are dealing with. The explore_data function will be used to print out the first 3 rows of each dataset below.

In [145]:
explore_data(android, 0, 3, rows_and_columns=True)
explore_data(ios, 0, 3, rows_and_columns=True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+',

### Finding useful columns for data analysis:

Below we have printed what each of the columns mean for both the android and ios datasets. Some of the column names like "ipadSc_urls.num" are a bit ambiguous to the user. To help with this a link to the datasets on Kaggle have been included below.

Android: https://www.kaggle.com/lava18/google-play-store-apps
<br>
iOS: https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps

In [146]:
print(android_header)
print('\n')
print(ios_header)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


## Data Cleaning:

When performing data analysis on any dataset you must first do your due diligence to ensure the data is clean. Clean means the dataset does not contain duplicates, errors, incorrectly inputed data, missing data and data that satifies all the requirements for what you are doing the data analysis on. For this project it only doing analysis on free English apps since this company only makes free apps within the United States. So below are some of the steps to be done:

- Check for and remove apps with missing data
- Check for and remove duplicates
- Remove non-English apps
- Remove apps that are not free

### Missing Data

Removal of missing data. Android Only

In [147]:
def find_missing(csv_file, csv_header):
    for row in csv_file:
        if len(row) != len(csv_header):
            print(row)
            print(csv_file.index(row))

find_missing(android, android_header)
find_missing(ios, ios_header)

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
10472


In [148]:
explore_data(android, 10471, 10473)
del(android[10472])
explore_data(android, 10471, 10473)

['Xposed Wi-Fi-Pwd', 'PERSONALIZATION', '3.5', '1042', '404k', '100,000+', 'Free', '0', 'Everyone', 'Personalization', 'August 5, 2014', '3.0.0', '4.0.3 and up']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['Xposed Wi-Fi-Pwd', 'PERSONALIZATION', '3.5', '1042', '404k', '100,000+', 'Free', '0', 'Everyone', 'Personalization', 'August 5, 2014', '3.0.0', '4.0.3 and up']


['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']




### Duplicate Entries

Here the data set is investigated to determine if it has duplicate entries. A function is written to show that the data does in fact have duplicate entries. These entries will be dealt with on the basis of comparing the number of reviews for each app. The app entries were collected at various times, so assuming they are the same, the one with the more reviews will be assumed to be the latest version of that app.
<br>

In [149]:
def find_duplicate(csv_file, csv_header, name_col):
    duplicate_apps = []
    unique_apps = []
    for row in csv_file:
        name = row[name_col]
        if name in unique_apps:
            duplicate_apps.append(name)
        else:
            unique_apps.append(name)
    return unique_apps, duplicate_apps
            
unique_android_apps, duplicate_android_apps = find_duplicate(android, android_header, 0)
print('Number of Android duplicate apps:', len(duplicate_android_apps))
print('Examples of Android duplicate apps:', duplicate_android_apps[:5])
print('\n')

unique_ios_apps, duplicate_ios_apps = find_duplicate(ios, ios_header, 1)
print('Number of iOS duplicate apps:', len(duplicate_ios_apps))
print('Examples of iOS duplicate apps:', duplicate_ios_apps[:5])

Number of Android duplicate apps: 1181
Examples of Android duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings']


Number of iOS duplicate apps: 2
Examples of iOS duplicate apps: ['Mannequin Challenge', 'VR Roller Coaster']


In [150]:
def remove_duplicates(csv_file, name_col, n_reviews_col):
    reviews_max = {}
    clean_list = []
    already_added = []
    
    for row in csv_file:
        name = row[name_col]
        n_reviews = row[n_reviews_col]
        
        if name in reviews_max and reviews_max[name] < n_reviews:
            reviews_max[name] = n_reviews
        elif name not in reviews_max:
            reviews_max[name] = n_reviews
            
    for row in csv_file:
        name = row[name_col]
        n_reviews = row[n_reviews_col]
        
        if (reviews_max[name] == n_reviews) and (name not in already_added):
            clean_list.append(row)
            already_added.append(name)
    return clean_list


android_clean = remove_duplicates(android, 0, 3)
ios_clean = remove_duplicates(ios, 1, 5)

print('Expected length of cleaned android:', len(android) - len(duplicate_android_apps))
print('Actual length of cleaned android:', len(android_clean))
print('\n')
print('Expected length of cleaned ios:', len(ios) - len(duplicate_ios_apps))
print('Actual length of cleaned ios:', len(ios_clean))
        

Expected length of cleaned android: 9659
Actual length of cleaned android: 9659


Expected length of cleaned ios: 7195
Actual length of cleaned ios: 7195


### Removing Non-English Apps

Above we can see that the last app listed in the cleaned iOS apps has one listed that is clearly not in English. Both 'cleaned' lists now need to have all non-English speaking apps removed from each dataset.

In [151]:
def check_english(mystring):
    mystring = list(mystring)
    non_english_count = 0
    for char in mystring:
        if ord(char) > 127:
            non_english_count +=1
        if non_english_count > 3:
            return False
    return True

In [152]:
def remove_non_english(csv_file, name_col):
    english_list = []
    for row in csv_file:
        name = row[name_col]
        if check_english(name):
            english_list.append(row)
    return english_list

android_english = remove_non_english(android_clean, 0)
ios_english = remove_non_english(ios, 1)

print('Previous Android length:', len(android_clean))
print('Number of Non-English Apps:', len(android_clean) - len(android_english))
print('New English Android length:', len(android_english))
print('\n')
print('Previous iOS length:', len(ios_clean))
print('Number of Non-English Apps:', len(ios_clean) - len(ios_english))
print('New English iOS length:', len(ios_english))

Previous Android length: 9659
Number of Non-English Apps: 45
New English Android length: 9614


Previous iOS length: 7195
Number of Non-English Apps: 1012
New English iOS length: 6183


### Remove Apps That Are Not Free

Finally in the data cleaning process, apps that are not free must be removed. The hypothetical company only creates apps that are free. So any analysis done on paid apps will not be of use for us during this project.

In [153]:
def remove_paid_apps(csv_file, price_col):
    freeapps = []
    for row in csv_file:
        price = row[price_col]
        if price == '0' or price == '0.0':
            freeapps.append(row)
    return freeapps

android_free = remove_paid_apps(android_english, 7)
ios_free = remove_paid_apps(ios_english, 4)
print('Number of free android apps:', len(android_free))
print('Number of free iOS apps:', len(ios_free))

Number of free android apps: 8862
Number of free iOS apps: 3222


## Data Analysis
### Most Common Apps By Genre

Now that the data is cleaned we can move on to the analysis to determine what app could be suggested to the company. The aim for us is to provide the company with an app that minimizes risk to the company while still pursuing the most likely app to become profitable. The strategy for this will be comprised of three steps:

- Can be built as a minimal app on the Google Play Store
- If it does well, can be developed further
- If the app is profitable, can be developed into having an iOS version and deployed to the iOS App Store

Because our end goal is to add the app on both Google Play and the App Store, we need to find app profiles that are successful on both markets. For instance, a profile that works well for both markets might be a productivity app that makes use of gamification.

Let's begin the analysis by getting a sense of what are the most common genres for each market. For this, we'll need to build frequency tables for a few columns in our data sets.

### Building Functions to Analyize Frequency Tables

We'll build two functions we can use to analyze the frequency tables:

- One function to generate frequency tables that show percentages
- Another function we can use to display the percentages in a descending order



In [154]:
def freq_table(dataset, index):
    mydict = {}
    n_of_apps = len(dataset)
    
    for row in dataset:
        if row[index] in mydict:
            mydict[row[index]] += 1
        else:
            mydict[row[index]] = 1

    for key, value in mydict.items():
        mydict[key] = 100 * (value / n_of_apps)
    
    return mydict

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

Now that we have a way to visualize the percentages of each category or genre we will start by looking at the categories in the iOS App Store.

In [155]:
display_table(ios_free, 11)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


The clear winner in the App Store is the Games category. Approximately 58% of apps in the iOS store are Games. Here we can determine that the most popular category is Games with other app categories spread out fairly evenly among the remaining percentages of apps. 

From this alone an app category cannot be recommended. We must first take a look at the Google Play Store apps to see what kind of apps are popular on there since our app has to do well their first before being implemented on the App Store.

In [156]:
display_table(android_free, 1) # Category

FAMILY : 18.934777702550214
GAME : 9.693071541412774
TOOLS : 8.451816745655607
BUSINESS : 4.5926427443015125
LIFESTYLE : 3.9043105393816293
PRODUCTIVITY : 3.8930264048747465
FINANCE : 3.7011961182577298
MEDICAL : 3.5206499661475967
SPORTS : 3.39652448657188
PERSONALIZATION : 3.3175355450236967
COMMUNICATION : 3.238546603475513
HEALTH_AND_FITNESS : 3.080568720379147
PHOTOGRAPHY : 2.945159106296547
NEWS_AND_MAGAZINES : 2.798465357707064
SOCIAL : 2.663055743624464
TRAVEL_AND_LOCAL : 2.335815842924848
SHOPPING : 2.2455427668697814
BOOKS_AND_REFERENCE : 2.143985556307831
DATING : 1.8618821936357481
VIDEO_PLAYERS : 1.7941773865944481
MAPS_AND_NAVIGATION : 1.399232678853532
FOOD_AND_DRINK : 1.2412547957571656
EDUCATION : 1.1735499887158656
ENTERTAINMENT : 0.9591514330850823
LIBRARIES_AND_DEMO : 0.9365831640713158
AUTO_AND_VEHICLES : 0.9252990295644324
HOUSE_AND_HOME : 0.8237418190024826
WEATHER : 0.8011735499887158
EVENTS : 0.7109004739336493
PARENTING : 0.6544798013992327
ART_AND_DESIGN : 0.

The leading Category in the Google Play Store is much less of a clear cut winner than in the iOS App Store. There is a much larger overlap between apps that are designed for fun (games, social networking, etc) and apps that are designed for practical reasons (family, tools, business, lifestyle, finance, etc). But the family category likely means games for children rather than for other practical family reasons.

In [157]:
display_table(android_free, 9) # Genre

Tools : 8.440532611148726
Entertainment : 6.070864364703228
Education : 5.348679756262695
Business : 4.5926427443015125
Productivity : 3.8930264048747465
Lifestyle : 3.8930264048747465
Finance : 3.7011961182577298
Medical : 3.5206499661475967
Sports : 3.4642292936131795
Personalization : 3.3175355450236967
Communication : 3.238546603475513
Action : 3.1031369893929135
Health & Fitness : 3.080568720379147
Photography : 2.945159106296547
News & Magazines : 2.798465357707064
Social : 2.663055743624464
Travel & Local : 2.324531708417964
Shopping : 2.2455427668697814
Books & Reference : 2.143985556307831
Simulation : 2.0424283457458814
Dating : 1.8618821936357481
Arcade : 1.8505980591288649
Video Players & Editors : 1.7716091175806816
Casual : 1.7490408485669149
Maps & Navigation : 1.399232678853532
Food & Drink : 1.2412547957571656
Puzzle : 1.128413450688332
Racing : 0.9930038366057323
Role Playing : 0.9365831640713158
Libraries & Demo : 0.9365831640713158
Auto & Vehicles : 0.92529902956443

Again after looking at the genres of Google Play Store apps we can confirm that there is a much larger overlap between productive and entertaining apps. The one difference between 'category' and 'genre' is that the genre apps are a some what lower level description of what the app is, i.e., they have much more groups within the genre tag when compared to the category tag. This means it will probably be easy to work with the Category tag moving forward to get a better overall picture for apps.

### Analyizing The Number of Users

We can also analyze the data by looking at the number of users instead of the number of reviews. To do that we can start by calculating the average number of user ratings per app genre on the App Store. To calculate that we can:

- Isolate the apps of each genre
- Sum up the user ratings for apps of that genre
- Divide the sum by the number of apps belonging to that genre (not by the total number of apps).

In [158]:
ios_prime_genre = freq_table(ios_free, 11)

def n_user_ratings(dataset, genre_dataset, genre_col, user_rate_col):
    mydict = {}
    for genre in genre_dataset: # prime genre key in dictionary
        total = 0
        len_genre = 0
        for app in dataset:
            genre_app = app[genre_col] # prime_genre
            if genre == genre_app:
                n_ratings = float(app[user_rate_col])
                total += n_ratings
                len_genre += 1
        avg_n_ratings = total / len_genre
        mydict[genre] = avg_n_ratings
    return mydict

ios_n_user_ratings = n_user_ratings(ios_free, ios_prime_genre, 11, 5)

# Convert dictionary into a tuple of strings to be visualized
# most to least number of users
def display_general(data_dict):
    data_display = []
    for key, value in data_dict.items():
        key_val_as_tuple = (value, key)
        data_display.append(key_val_as_tuple)
    return sorted(data_display, reverse = True)

ios_display = display_general(ios_n_user_ratings)

# Visualizing the sorted tuple of strings
for entry in ios_display:
    print(entry[1], ':', entry[0])

Navigation : 86090.33333333333
Reference : 74942.11111111111
Social Networking : 71548.34905660378
Music : 57326.530303030304
Weather : 52279.892857142855
Book : 39758.5
Food & Drink : 33333.92307692308
Finance : 31467.944444444445
Photo & Video : 28441.54375
Travel : 28243.8
Shopping : 26919.690476190477
Health & Fitness : 23298.015384615384
Sports : 23008.898550724636
Games : 22788.6696905016
News : 21248.023255813954
Productivity : 21028.410714285714
Utilities : 18684.456790123455
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Business : 7491.117647058823
Education : 7003.983050847458
Catalogs : 4004.0
Medical : 612.0


We can see that navigation apps have the largest number of users, but let us investigate this category a bit further.

In [159]:
for row in ios_free:
    if row[11] == 'Navigation':
        print(row[1], ':', row[5])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


After taking a closer look we can see that this category is dominated by a few select apps. The average number of ratings was skewed by a select few apps that are popular among users. This means that releasing an app in this category would likely be difficult to gain popularity compared to the two apps that dominate this category (Waze and Google Maps).

Going forward, we can do the same analysis on the 'Reference' genre of the iOS apps dataset to see if there is a simliar trend with this category of apps.

In [160]:
for row in ios_free:
    if row[11] == 'Reference':
        print(row[1], ':', row[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


A similar pattern occurs within this category of apps as well. The Bible and dictionary apps skew the category into appearing to have a large number of users per app.

This same trend likely extends to social networking and music apps, where a select few giant apps (Facebook, Pinterest, Pandora, Spotify, etc) dominate the category.

Our aim is to avoid these categories and find on that is not dominated by a select few giant apps so that it is more likely that our app will gain popularity within a popular category. Going forward it would be ideal to remove the giant apps and then do the analysis we just did, but that tasked will be saved for a later project.

Regardless, we can conclude that an app in the category of weather or books could choices for our app recommendation as these categories are still popular, but not as dominated by a select few apps.

### Popular Google Play Apps by Genre

The same analysis can be done on the Google Play store data, but without all of the hard work we just did. This data is already stored within the dataset with the column name 'Installs'.

In [161]:
display_table(android_free, 5)

1,000,000+ : 15.741367637102236
100,000+ : 11.554953735048521
10,000,000+ : 10.516813360415256
10,000+ : 10.200857594222523
1,000+ : 8.395396073121193
100+ : 6.917174452719477
5,000,000+ : 6.838185511171294
500,000+ : 5.574362446400361
50,000+ : 4.773188896411646
5,000+ : 4.513653802753328
10+ : 3.5432182351613632
500+ : 3.2498307379823967
50,000,000+ : 2.2906793048973144
100,000,000+ : 2.1214172872940646
50+ : 1.9183028661701647
5+ : 0.7898894154818324
1+ : 0.5077860528097494
500,000,000+ : 0.2708192281651997
1,000,000,000+ : 0.22568269013766643
0+ : 0.045136538027533285
0 : 0.011284134506883321


The first thing that comes up after seeing that is that each app doesn't have an exact install count, but instead was put into categories of number of installs (1+, 10+, 50+, 100+, etc). This will do for our purposes though as we are just trying to get a rough idea of the popularity of the Google Play store apps, exact numbers are not critical in this situation.

In [169]:
android_categories = freq_table(android_free, 1)

def n_android_ratings(dataset, category_dataset, cat_col, installs_col):
    mydict = {}
    for category in category_dataset:
        total = 0
        len_category = 0
        for app in dataset:
            cat_app = app[cat_col]
            if category == cat_app:
                n_installs = app[installs_col]
                n_installs = n_installs.replace(',', '')
                n_installs = n_installs.replace('+', '')
                total += float(n_installs)
                len_category += 1
        avg_n_installs = total / len_category
        mydict[category] = round(avg_n_installs, 2)
    return mydict

android_installs = n_android_ratings(android_free, android_categories, 1, 5)

# Displays them from highest to lowest number of installs
android_display = display_general(android_installs)
for entry in android_display:
    print(entry[1], ':', entry[0])

COMMUNICATION : 38456119.17
VIDEO_PLAYERS : 24727872.45
SOCIAL : 23253652.13
PHOTOGRAPHY : 17805627.64
PRODUCTIVITY : 16787331.34
GAME : 15560965.6
TRAVEL_AND_LOCAL : 13984077.71
ENTERTAINMENT : 11640705.88
TOOLS : 10682301.03
NEWS_AND_MAGAZINES : 9549178.47
BOOKS_AND_REFERENCE : 8767811.89
SHOPPING : 7036877.31
PERSONALIZATION : 5201482.61
WEATHER : 5074486.2
HEALTH_AND_FITNESS : 4188821.99
MAPS_AND_NAVIGATION : 4056941.77
FAMILY : 3694276.33
SPORTS : 3638640.14
ART_AND_DESIGN : 1986335.09
FOOD_AND_DRINK : 1924897.74
EDUCATION : 1820673.08
BUSINESS : 1712290.15
LIFESTYLE : 1437816.27
FINANCE : 1387692.48
HOUSE_AND_HOME : 1331540.56
DATING : 854028.83
COMICS : 817657.27
AUTO_AND_VEHICLES : 647317.82
LIBRARIES_AND_DEMO : 638503.73
PARENTING : 542603.62
BEAUTY : 513151.89
EVENTS : 253542.22
MEDICAL : 120616.49


Doing a similar analysis as done for the iOS App Analysis, we see again a select few giants dominating the category of Communication. Even though the communications category has the most installs at over 38 billion, this number was skewed by select apps with a large number of installs (WhatsApp, Facebook Messenger, Skype, etc).

In [181]:
for app in android_free:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Viber Messenger : 500,000,000+


When these 'giants' are removed from the data we can see the number of installs goes down by a factor of 10.

In [206]:
under_100_m = []

for app in android_free:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'COMMUNICATION') and (float(n_installs) < 100000000):
        under_100_m.append(float(n_installs))
        
print(android_installs['COMMUNICATION'], ': All installed apps')
print(round(sum(under_100_m) / len(under_100_m), 2), ': < 1 million installed apps')

38456119.17 : All installed apps
3603485.39 : < 1 million installed apps


We see the same trend in several other categories such as video players and social. Next up is the photography category. Let's take a closer look at that.

In [207]:
for app in android_free:
    if app[1] == 'PHOTOGRAPHY' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

B612 - Beauty & Filter Camera : 100,000,000+
YouCam Makeup - Magic Selfie Makeovers : 100,000,000+
Sweet Selfie - selfie camera, beauty cam, photo edit : 100,000,000+
Google Photos : 1,000,000,000+
Retrica : 100,000,000+
Photo Editor Pro : 100,000,000+
BeautyPlus - Easy Photo Editor & Selfie Camera : 100,000,000+
PicsArt Photo Studio: Collage Maker & Pic Editor : 100,000,000+
Photo Collage Editor : 100,000,000+
Z Camera - Photo Editor, Beauty Selfie, Collage : 100,000,000+
PhotoGrid: Video & Pic Collage Maker, Photo Editor : 100,000,000+
Candy Camera - selfie, beauty camera, photo editor : 100,000,000+
YouCam Perfect - Selfie Photo Editor : 100,000,000+
Camera360: Selfie Photo Editor with Funny Sticker : 100,000,000+
S Photo Editor - Collage Maker , Photo Collage : 100,000,000+
AR effect : 100,000,000+
Cymera Camera- Photo Editor, Filter,Collage,Layout : 100,000,000+
LINE Camera - Photo editor : 100,000,000+
Photo Editor Collage Maker Pro : 100,000,000+


In [208]:
under_100_m = []

for app in android_free:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'PHOTOGRAPHY') and (float(n_installs) < 100000000):
        under_100_m.append(float(n_installs))
        
print(android_installs['PHOTOGRAPHY'], ': All installed apps')
print(round(sum(under_100_m) / len(under_100_m), 2), ': < 1 million installed apps')

17805627.64 : All installed apps
7633342.21 : < 1 million installed apps


For the photography category it does not appear to have quite the similar problem the social, video player and communications categories had. This category has a variety of apps centered around either taking photos or editing photos using some editing software.

In [216]:
for app in android_free:
    if app[1] == 'PHOTOGRAPHY':
        print(app[0], ':', app[5])

TouchNote: Cards & Gifts : 1,000,000+
FreePrints – Free Photos Delivered : 1,000,000+
Groovebook Photo Books & Gifts : 500,000+
Moony Lab - Print Photos, Books & Magnets ™ : 50,000+
LALALAB prints your photos, photobooks and magnets : 1,000,000+
Snapfish : 1,000,000+
Motorola Camera : 50,000,000+
HD Camera - Best Cam with filters & panorama : 5,000,000+
LightX Photo Editor & Photo Effects : 10,000,000+
Sweet Snap - live filter, Selfie photo edit : 10,000,000+
HD Camera - Quick Snap Photo & Video : 1,000,000+
B612 - Beauty & Filter Camera : 100,000,000+
Waterfall Photo Frames : 1,000,000+
Photo frame : 100,000+
Huji Cam : 5,000,000+
Unicorn Photo : 1,000,000+
HD Camera : 5,000,000+
Makeup Editor -Beauty Photo Editor & Selfie Camera : 1,000,000+
Makeup Photo Editor: Makeup Camera & Makeup Editor : 1,000,000+
Moto Photo Editor : 5,000,000+
InstaBeauty -Makeup Selfie Cam : 50,000,000+
Garden Photo Frames - Garden Photo Editor : 500,000+
Photo Frame : 10,000,000+
Selfie Camera - Photo Edito

Editing photos is a fairly common technology and would not be too difficult to put into an app. But before we recommend this let us double check to see the popularity of these apps in the iOS App Store.

In [217]:
for row in ios_free:
    if row[11] == 'Photo & Video':
        print(row[1], ':', row[5])

Instagram : 2161558
Snapchat : 323905
YouTube - Watch Videos, Music, and Live Streams : 278166
Pic Collage - Picture Editor & Photo Collage Maker : 123433
Funimate video editor: add cool effects to videos : 123268
musical.ly - your video social network : 105429
Photo Collage Maker & Photo Editor - Live Collage : 93781
Vine Camera : 90355
Google Photos - unlimited photo and video storage : 88742
Flipagram : 79905
Mixgram - Picture Collage Maker - Pic Photo Editor : 54282
Shutterfly: Prints, Photo Books, Cards Made Easy : 51427
Pic Jointer – Photo Collage, Camera Effects Editor : 51330
Color Pop Effects - Photo Editor & Picture Editing : 45320
Photo Grid - photo collage maker & photo editor : 40531
iSwap Faces LITE : 39722
MOLDIV - Photo Editor, Collage & Beauty Camera : 39501
Photo Editor by Aviary : 39501
Photo Lab: Picture Editor, effects & fun face app : 34585
Rookie Cam - Photo Editor & Filter Camera : 33921
FotoRus -Camera & Photo Editor & Pic Collage Maker : 32558
PicsArt Photo St

Good thing the iOS Store was checked first because it clearly shows this category is dominated by the Instagram and Snapchat apps. 

Next choice of potential recommendations would be the Books and Refrences app

In [218]:
for app in android_free:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000,000+'
                                            or app[5] == '500,000,000+'
                                            or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad 📖 Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+


While still dominated by a select few apps that are incredibly popular apps it doesn't have the same drastic contrast as the Communications, Video Players and Social Networking categories.

In [219]:
under_100_m = []

for app in android_free:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'BOOKS_AND_REFERENCE') and (float(n_installs) < 100000000):
        under_100_m.append(float(n_installs))
        
print(android_installs['BOOKS_AND_REFERENCE'], ': All installed apps')
print(round(sum(under_100_m) / len(under_100_m), 2), ': < 1 million installed apps')

8767811.89 : All installed apps
1437212.22 : < 1 million installed apps


However, it looks like there are only a few very popular apps, so this market still shows potential. Let's try to get some app ideas based on the kind of apps that are somewhere in the middle in terms of popularity (between 1,000,000 and 100,000,000 downloads):

In [221]:
for app in android_free:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000+'
                                            or app[5] == '5,000,000+'
                                            or app[5] == '10,000,000+'
                                            or app[5] == '50,000,000+'):
        print(app[0], ':', app[5])

Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
ReadEra – free ebook reader : 1,000,000+
Ebook Reader : 5,000,000+
Read books online : 5,000,000+
eBoox: book reader fb2 epub zip : 1,000,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
Moon+ Reader : 10,000,000+
English-Myanmar Dictionary : 1,000,000+
Golden Dictionary (EN-AR) : 1,000,000+
All Language Translator Free : 1,000,000+
Aldiko Book Reader : 10,000,000+
Dictionary - WordWeb : 5,000,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Al-Quran (Free) : 10,000,000+
Al Quran Indonesia : 10,000,000+
Al'Quran Bahasa Indonesia : 10,000,000+
Al Quran Al karim : 1,000,000+
Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+
Koran Read &MP3 30 Juz Offline : 1,000,000+
H

This niche seems to be dominated by software for processing and reading ebooks, as well as various collections of libraries and dictionaries, so it's probably not a good idea to build similar apps since there'll be some significant competition.

We also notice there are quite a few apps built around the book Quran, which suggests that building an app around a popular book can be profitable. It seems that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets.

However, it looks like the market is already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.

## Conclusions

In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.

We concluded that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets. The markets are already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.