# Analyzing Mobile App Data

This project is about applying different skills, such as basic Python syntax, lists and for loops, conditional statements, dictionaries and frequency tables, functions, and the Jupyter Notebook. Using these skills cohesively, we will analyze app store and google play data to determine which apps will most effectively deliver ad revenue for the publisher.

## Opening and Exploring the Data

In [1]:
from csv import reader
### The Google Play data set ###
opened_file = open('googleplaystore.csv')
read_file = reader(opened_file)
android = list(read_file)
android_header = android[0]
android = android[1:]

### The App Store data set ###
opened_file = open('AppleStore.csv')
read_file = reader(opened_file)
ios = list(read_file)
ios_header = ios[0]
ios = ios[1:]

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
        
print(android_header)
print('/n')
explore_data(android, 0, 3, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
/n
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite ‚Äì FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


Here we see that the Google Play data set contains 10,841 rows, with 13 columns. 

The columns that will be most useful for our purposes here would be `'App'`, `'Category'`, `'Reviews'`, `'Installs'`, `'Type'`, `'Price'`, and `'Genre'`.

Now let's look at the iOS App Store data set.

In [3]:
print(ios_header)
print('/n')
explore_data(ios, 0, 3, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
/n
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


Here we see that the iOS App Store data set contains 7,197 rows and 16 columns. 

The columns that will be most useful for our purposes here would be `'track_name'`, `'price'`, `'rating_count_tot'`, `'rating_count_ver'`, and `'prime_genre'`. 

Unlike the Google Play set columns, these ones are less immediately understandable, and documentation for this set can be found __[here](https://www.kaggle.com/datasets/ramamet4/app-store-apple-data-set-10k-apps)__.

## Deleting Wrong Data

The Google Play dataset has a dedicated __[discussion section](https://www.kaggle.com/lava18/google-play-store-apps/discussion)__, and we can see that __[one of the discussions](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015)__ describes an error for a certain row.

Below, we will print said row and compare it to the header row, and a correct row. If there is in fact an error, we will delete that row from the data set for our purposes here.

In [4]:
print(android[10472]) # incorrect row
print('/n')
print(android_header) # header row
print('/n')
print(android[0])     # correct row

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
/n
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
/n
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


As we follow along the columns in the faulty row, we notice that `'1.9'` is listed in the spot where the category should be, indicating that there is no entry for category in this row, leading to all of the subsequent columns being in the wrong position. With this in mind, we will delete row 10,472, so as to keep our data set functional.

In [5]:
print(len(android))
del android[10472]
print(len(android))

10841
10840


## Removing Duplicate Entries

### Part One

If we explore the Google Play data set long enough, we'll find that some apps hage more than one entry. For instance, Instagram has four entries in the data set, as shown here:

In [6]:
for app in android:
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


In total, there are 1,181 cases where an app has multiple entries:

In [7]:
duplicate_apps = []
unique_apps = []

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[:15])

Number of duplicate apps: 1181


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


We don't want to count certain apps more than once when we analyze data, so we need to remove the duplicate entries and keep only one entry per app. One thing we could do is remove the duplicate rows randomly, but we could probably find a better way.

If you examine the rows we printed for Instagram above, the differnece between each row happens in the fourth column, which corresponds to the number of reviews. The different numbers show that the data was collected at different times. With this in mind, the most logical iteration of the Instagram app to keep in the set would be the one with the highest reviews (thus the most recent entry).

To do this, we will:
1. Create a dictionary where each key corresponds to a unique app name, and the value is the highest number of reviews among the duplicates
1. Use the dictionary to create a new data set with only one entry per app, keeping only the iterations with the highest number of reviews

### Part Two

Let's start by creating the dictionary.

In [8]:
reviews_max = {}

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
        
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

Since we know that there are 1,181 cases of duplicate apps, the length of our newly created dictionary of unique apps should be 9,659 (10,840 - 1,181).

In [9]:
print('Expected length:', len(android) - 1181)
print('Actual length', len(reviews_max))

Expected length: 9659
Actual length 9659


Now, let's use the `reviews_max` dictionary to remove the duplicate apps from our set.

In the cell below, we:
* Create two empty lists, `android_clean` and `already_added`.
* Loop through the `android` data set, and for every iteration:
    * Isolate the name of the app and number of reviews
    * Add the current row (`app`) to the `android_clean` list, and the app name (`name`) to the `already_added` list if:
        * The number of reviews of the current app matches the number of reviews of that app as described in the `reviews_max` dictionary; and
        * The name of the app is not already in the `already_added` list. We need to add this supplementary condition to account for cases where the highest number of reviews of a duplicate app is the same for more than one entry. If we only check for `reviews_max[name] == n_reviews`, we'll still wind up with some duplicates in the dictionary.

In [10]:
android_clean = []
already_added = []

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if (reviews_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)

Below, we'll explore the new data set `android_clean` to confirm it has 9,659 rows, as expected.

In [11]:
explore_data(android_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite ‚Äì FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


The above test shows that we now have 9,659 remaining rows, as expected.

## Removing Non-English Apps
### Part One

Since the company we are working with develops only English apps, any non-English apps in our data set will not be relevant, below are a few examples from both the iOS and Google Play data sets:

In [12]:
print(ios[813][1])
print(ios[6731][1])
print('\n')
print(android_clean[4412][0])
print(android_clean[7940][0])

Áà±Â•áËâ∫PPS -„ÄäÊ¨¢‰πêÈ¢Ç2„ÄãÁîµËßÜÂâßÁÉ≠Êí≠
„ÄêËÑ±Âá∫„Ç≤„Éº„É†„ÄëÁµ∂ÂØæ„Å´ÊúÄÂæå„Åæ„Åß„Éó„É¨„Ç§„Åó„Å™„ÅÑ„Åß „ÄúË¨éËß£„ÅçÔºÜ„Éñ„É≠„ÉÉ„ÇØ„Éë„Ç∫„É´„Äú


‰∏≠ÂõΩË™û AQ„É™„Çπ„Éã„É≥„Ç∞
ŸÑÿπÿ®ÿ© ÿ™ŸÇÿØÿ± ÿ™ÿ±ÿ®ÿ≠ DZ


We're not interested in keeping these apps, so we'll remove them. One way to do this is to remove each app with a name containing a symbol that isn't commonly used in English text ‚Äî English text usually includes letters from the English alphabet, numbers composed of digits from 0 to 9, punctuation marks (., !, ?, ;), and other symbols (+, *, /).

All of the standard English letters, numbers, punctuation, and symbols fall within 0 and 127 in the ASCII system, therefore creating a function to determine if an app is English or not is fairly straightforward. The function, using the built in `ord()` function, is defined and tested below:

In [13]:
def is_english(string):
    for character in string:
        if ord(character) > 127:
            return False
        
    return True

print(is_english('Instagram'))
print(is_english('Áà±Â•áËâ∫PPS -„ÄäÊ¨¢‰πêÈ¢Ç2„ÄãÁîµËßÜÂâßÁÉ≠Êí≠'))
print(is_english('Docs To Go‚Ñ¢ Free Office Suite'))
print(is_english('Instachat üòú'))

True
False
False
False


While the above function worked correctly for the first two names, it fails to account for certain commonly used ASCII codes for items such as emojis or symbols such as the trademark symbol which, as shown below, fall outside of the 0-127 ASCII range we used:

In [14]:
print(ord('‚Ñ¢'))
print(ord('üòú'))

8482
128540


### Part Two

To minimize the impact of data loss, we'll only remove an app if its name has more than three characters with corresponding numbers falling outside the ASCII range. This means all English apps with up to three emoji or other special characters will still be labeled as English. Our filter function is still not perfect, but it should be fairly effective.

In [15]:
def is_english(string):
    non_ascii = 0
    
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
            
    if non_ascii >= 3:
        return False
    else:
        return True

print(is_english('Docs To Go‚Ñ¢ Free Office Suite'))
print(is_english('Instachat üòú'))
print(is_english('Áà±Â•áËâ∫PPS -„ÄäÊ¨¢‰πêÈ¢Ç2„ÄãÁîµËßÜÂâßÁÉ≠Êí≠'))

True
True
False


Below, we will use our updated `is_english()` function to filter out non-English apps in both sets, and explore the sets to see if our results are reasonable:

In [16]:
android_english = []
ios_english = []

for app in android_clean:
    name = app[0]
    if is_english(name):
        android_english.append(app)
        
for app in ios:
    name = app[1]
    if is_english(name):
        ios_english.append(app)
        
explore_data(android_english, 0, 3, True)
print('\n')
explore_data(ios_english, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite ‚Äì FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9597
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+'

The remaining number of apps in the Google Play set is 9,597, and 6,155 in the iOS set, which is reasonable to expect.

## Isolating the Free Apps

As we mentioned in the introduction, we only build apps that are free to download and install, and our main source of revenue consists of in-app ads. Our datasets contain both free and non-free apps; we'll need to isolate only the free apps for our analysis.

Below we isolate the free apps from both the Android and Apple sets:

In [17]:
android_final = []
ios_final = []

for app in android_english:
    price = app[7]
    if price == '0':
        android_final.append(app)
        
for app in ios_english:
    price = app[4]
    if price == '0.0':
        ios_final.append(app)
        
print(len(android_final))
print(len(ios_final))

8848
3203


After removal of the paid apps, we are left with 8,848 apps in the Android set, and 3,203 in the Apple set.

## Most Common Apps by Genre
### Part One
As we mentioned in the introduction, our goal is to determine the kinds of apps that are likely to attract more users because the number of users correlates to ad views, and therefore to revenue.

To minimize risks and overhead, we use a three-step validation strategy for app ideas:
1. Build a minimal Android version of the app, and add it to Google Play.
1. If the app has a good response from users, we develop it further.
1. If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

Because the ultimate goal is to have apps on both the Google Play and Apple stores, we need to find app profiles that have proven successful in both markets. One example of such an app could be a productivity app that makes use of gamification.

Let's begin the analysis by determining the most common genres for each market by building frequency tables for a few key columns.

The most useful columns in the Google Play data set would be the `Genres` and `Category` columns, and in the Apple store it would be the `prime_genre` column.

### Part Two
We'll build two functions we can use to analyze the frequency tables:
* One function to generate frequency tables that show percentages
* Another function we can use to display the percentages in descending order

In [18]:
def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
            
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage
            
    return table_percentages

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

### Part Three
We start by examining the frequency table for the `prime_genre` column of the Apple data set:

In [19]:
display_table(ios_final, -5)

Games : 58.25788323446769
Entertainment : 7.836403371838902
Photo & Video : 4.995316890415236
Education : 3.6840462066812365
Social Networking : 3.3093974399000934
Shopping : 2.5913206369029034
Utilities : 2.466437714642523
Sports : 2.1542304089915705
Music : 2.0605682172962845
Health & Fitness : 2.0293474867311896
Productivity : 1.7483609116453322
Lifestyle : 1.5610365282547611
News : 1.3424914142990947
Travel : 1.248829222603809
Finance : 1.0927255697783327
Weather : 0.8741804558226661
Food & Drink : 0.8117389946924758
Reference : 0.5307524196066188
Business : 0.5307524196066188
Book : 0.3746487667811427
Navigation : 0.18732438339057134
Medical : 0.18732438339057134
Catalogs : 0.1248829222603809


As displayed above, we can see that the most common genre by far is games, taking up over half (58.26%) of the set. The next highest category, entertainment, doesn't even comprise 10% of the set, photo and video apps and education apps follow entertainment with ~5% and ~4% respectively.

The most obvious pattern from this table is that apps designed for fun take the lion's share of the number of apps, while practical apps are fewer and farther between. While this is significant, it doesn't yet factor in the size of the user base for different genres of apps. With this in mind, game and entertainment apps are the frontrunners for our use, but whether they will continue to be so remains to be seen.

Next we will look into the frequency tables for the `Category` and `Genre` columns in the Google Play data set to see if the results are similar.

In [20]:
display_table(android_final, 1) # Category

FAMILY : 18.942133815551536
GAME : 9.697106690777577
TOOLS : 8.453887884267631
BUSINESS : 4.599909584086799
PRODUCTIVITY : 3.899186256781193
LIFESTYLE : 3.887884267631103
FINANCE : 3.7070524412296564
MEDICAL : 3.5375226039783
SPORTS : 3.390596745027125
PERSONALIZATION : 3.322784810126582
COMMUNICATION : 3.2323688969258586
HEALTH_AND_FITNESS : 3.0854430379746836
PHOTOGRAPHY : 2.949819168173599
NEWS_AND_MAGAZINES : 2.802893309222423
SOCIAL : 2.667269439421338
TRAVEL_AND_LOCAL : 2.3395117540687163
SHOPPING : 2.2490958408679926
BOOKS_AND_REFERENCE : 2.1360759493670884
DATING : 1.8648282097649187
VIDEO_PLAYERS : 1.7970162748643763
MAPS_AND_NAVIGATION : 1.3901446654611211
FOOD_AND_DRINK : 1.2432188065099457
EDUCATION : 1.164104882459313
ENTERTAINMENT : 0.9606690777576853
LIBRARIES_AND_DEMO : 0.9380650994575045
AUTO_AND_VEHICLES : 0.9267631103074141
HOUSE_AND_HOME : 0.8024412296564195
WEATHER : 0.7911392405063291
EVENTS : 0.7120253164556962
PARENTING : 0.6555153707052441
ART_AND_DESIGN : 0.64

At first glance, the Google Play store is significantly less occupied by games than the App Store, with 'Family' taking the top spot at 18.92%, followed by games at 9.7% and tools at 8.45%. Games and tools seem to be roughly similar percentages of the app store, with utility apps as a whole occupying the majority share. The disparity between the percentage of games may be smaller than apparent, as other users of this set have pointed out that many apps that are categorized as 'Family' are in fact games, specifically games aimed at children.

Below we will see if the frequency table of the `Genres` column follows suit:

In [21]:
display_table(android_final, -4)

Tools : 8.44258589511754
Entertainment : 6.080470162748644
Education : 5.357142857142857
Business : 4.599909584086799
Productivity : 3.899186256781193
Lifestyle : 3.8765822784810124
Finance : 3.7070524412296564
Medical : 3.5375226039783
Sports : 3.4584086799276674
Personalization : 3.322784810126582
Communication : 3.2323688969258586
Action : 3.096745027124774
Health & Fitness : 3.0854430379746836
Photography : 2.949819168173599
News & Magazines : 2.802893309222423
Social : 2.667269439421338
Travel & Local : 2.328209764918626
Shopping : 2.2490958408679926
Books & Reference : 2.1360759493670884
Simulation : 2.0456600361663653
Dating : 1.8648282097649187
Arcade : 1.842224231464738
Video Players & Editors : 1.7744122965641953
Casual : 1.763110307414105
Maps & Navigation : 1.3901446654611211
Food & Drink : 1.2432188065099457
Puzzle : 1.1301989150090417
Racing : 0.9945750452079566
Role Playing : 0.9380650994575045
Libraries & Demo : 0.9380650994575045
Auto & Vehicles : 0.9267631103074141
St

While, unlike the App Store, it seems in Google Play's data set the different kinds of games are separated into individual genres, it does in fact seem as though apps designed for fun and entertainment represent a smaller portion than apps designed for utility in the Google Play Store. Since the `Genres` column is so siloed and specific, we'll use the `Category` column for our recommendations going forward, as we are interested in the biggest possible picture.

Since the App Store seems to favor fun-based apps and the Google Play Store favors utility apps, we will need to look into user base sizes to better make a recommendation.

## Most Popular Apps by Genre on the App Store
One way to find out what genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play data set, we can find this information in the `Installs` column, but this information is missing for the App Store data set. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the `rating_count_tot` column.

To do this, we'll start by calculating the average number of user ratings per app genre in the App Store:

In [22]:
genres_ios = freq_table(ios_final, -5)

for genre in genres_ios:
    total = 0
    len_genre = 0
    for app in ios_final:
        genre_app = app[-5]
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)

Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22886.36709539121
Music : 57326.530303030304
Reference : 79350.4705882353
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 19156.493670886077
Travel : 28243.8
Shopping : 27230.734939759037
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16815.48
Entertainment : 14195.358565737051
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 46384.916666666664
Finance : 32367.02857142857
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0


Upon first glance, it would seem that navigation apps have the largest user base, and would be the best candidate for app development, however, common knowledge would indicate that most people use one of two or three navigation apps, Waze, Google Maps, or Apple Maps, with the latter not counting in our data set. This suspicion is easily confirmed with the statistics below:

In [23]:
for app in ios_final:
    if app[-5] == 'Navigation':
        print(app[1], ':', app[5]) #name and number of ratings per app

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching¬Æ : 12811
CoPilot GPS ‚Äì Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


The relatively small number of navigation apps suggests that most people are loyal to one of a few apps, therefore our efforts would be best spent elsewhere.

Reference apps take the number two slot, which, when further interrogated, seems to also be affected by a few enormously popular apps, as shown below:

In [24]:
for app in ios_final:
    if app[-5] == 'Reference':
        print(app[1], ':', app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ‚Ñ¢ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pok√©mon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
Jishokun-Japanese English Dictionary & Translator : 0


Social Networking also has a high number of average reviews, but it can be assumed that a similar phenomenon to navigation apps is occurring, with apps like Faceboo, Instagram, and Snapchat taking the lion's share of users and skewing the review averages.

Interestingly enough, weather apps have nearly double the average reviews of games, however they are usually quickly checked, and are better suited for in-app purchase profit rather than ad-based profit.

Food and drink apps are popular because of specific restaurant chains, finance apps are usually connected to institutions or specific markets, and book apps have little space in them for ads. Despite being relatively low on our lists, games are seemingly the best candidate for long exposure to ads, so we will check for the review distribution below:

In [27]:
for app in ios_final:
    if app[-5] == 'Games':
        print(app[1], ':', app[5])

Clash of Clans : 2130805
Temple Run : 1724546
Candy Crush Saga : 961794
Angry Birds : 824451
Subway Surfers : 706110
Solitaire : 679055
CSR Racing : 677247
Crossy Road - Endless Arcade Hopper : 669079
Injustice: Gods Among Us : 612532
Hay Day : 567344
PAC-MAN : 508808
DragonVale : 503230
Head Soccer : 481564
Despicable Me: Minion Rush : 464312
The Sims‚Ñ¢ FreePlay : 446880
Sonic Dash : 418033
8 Ball Pool‚Ñ¢ : 416736
Tiny Tower - Free City Building : 414803
Jetpack Joyride : 405647
Bike Race - Top Motorcycle Racing Games : 405007
Kim Kardashian: Hollywood : 397730
Trivia Crack : 393469
WordBrain : 391401
Sniper 3D Assassin: Shoot to Kill Gun Game : 386521
Flow Free : 373857
Geometry Dash Lite : 370370
‚ñªSudoku : 359832
Fruit Ninja¬Æ : 327025
Pixel Gun 3D : 301182
Temple Run 2 : 295211
My Horse : 293857
Word Cookies! : 287095
Dragon City Mobile : 277268
The Simpsons‚Ñ¢: Tapped Out : 274501
Plants vs. Zombies‚Ñ¢ 2 : 267394
Clash Royale : 266921
Pok√©mon GO : 257627
CSR Racing 2 : 257100


Dear Diary - Interactive Story : 8922
Archery King : 8905
Mimitos Virtual cat with minigames : 8899
Despicable Bear - Top Beat Action Game : 8872
Smashy City : 8742
Sniper Fury: Fun Mobile Shooter Game : 8596
Crazy Taxi : 8573
Fit The Fat 2 : 8557
Stop - Fun Categories Word Game : 8516
My Emma :) : 8412
Spin It Rich! Casino Slots: Free Slot Machines : 8402
Futurama: Game of Drones : 8394
One More Line : 8392
Diggy's Adventure : 8271
Pet Buddies HD : 8227
Mekorama : 8157
Tomb of the Mask : 8054
Kids Doodle - Movie Kids Color & Draw : 8025
Swing : 8013
Vikings: War of Clans : 7969
Fantasy Forest Story: Land Before Dragons : 7935
Run!!! : 7891
Pok√©mon Duel : 7868
Clash of Lords 2: New Age : 7795
Soccer Star 2016 World Legend : 7765
Steve - The Jumping Dinosaur Widget Arcade Game : 7751
My Singing Monsters: Dawn of Fire : 7749
Burrito Bison: Launcha Libre : 7740
DragonSoul RPG : 7681
Tactile Wars : 7674
Quick Hit Slots ‚Äì Casino Slot Machines Games : 7646
Galaxy on Fire 3 - Manticore : 7

Snowboarding The Fourth Phase : 815
Pretty Ballerina - Ballet Dreams : 814
Hockey Clicker : 813
One More Jump : 806
Sky Hoppers : 805
ReRunners: Race for the World : 804
Plantera - Idle Farm Clicker : 802
Twisty Board : 800
Marry Me - Perfect Wedding Day! : 795
Mad City Crime : 793
Droppy Balls! : 791
Knights of Pen & Paper 2 : 790
Battleplans - #1 Battle Strategy & Defense Game : 785
One Tap Tennis : 779
Crazy Clown Chase : 778
Brutal Street : 777
Just Rolling : 776
Focus! - Can You Tell Them Apart? : 766
Versus Run : 766
Buddyman Run Ôºç keep running! : 762
Dustoff Heli Rescue 2 : 754
Slots Discovery : 754
Play Basketball Hoops 2017 - Real slam dunks game : 751
Don't Grind : 751
Solitaire ‚àò : 741
Medal Masters : Call of destiny : 738
RAID HQ : 737
Hop Shot : 737
Kick Hero : 733
Wonder Tactics : 730
Micro Machines : 729
Glob Trotters - Endless Arcade Blobber : 724
Lonely One : 723
Break Liner : 721
The Counter Of Death : 719
My First Crush - High School Love : 710
Baby Twins - Terri

Rolly Worms : 135
Head Soccer France 2016 : 135
Tina's Diary: Spring Outing : 134
Trapdoors : 133
Osmo Newton : 132
Cooking Mama Let's Cook Puzzle : 132
Skippy Box : 130
LOST MAZE : 130
One More Bounce : 129
Quiz Tales : 129
Roller Coaster VR : 128
Bridge : 128
Bouncing Ball 2 : 128
Midnight Calling: Jeronimo : 127
Jet Plane Fighter Pilot Flying Simulator Real War Combat Fighting Games : 127
Play-Doh TOUCH - Shape, Scan, Explore : 126
Boyfriend Plus - Brett Jeff and Kai : 126
Soccer Stadium Sports Car & Bus Parking Simulator 3D Driving Sim : 126
Drop Out! : 125
Super Fashion Show - Girls Makeup, Dressup Games : 125
Swiperoo : 125
Fatal : 124
Glory of Generals 2 : 122
LINE BROWN FARM : 122
Paddle Panic : 122
Spa Birthday Party - Nails, Hair, Dress Up & Cake : 122
Offroad 4x4 Truck Trials Parking Simulator 2 a Real Stunt Car Driving Racing Sim : 121
Osmo Words : 121
Frantic Architect : 117
OUTDODGE : 117
Conduct THIS! : 116
Monster Raid‚Ñ¢ : 116
NHL SuperCard 2K17 : 116
Jrump : 116
InMin

One More Spin! - Free Vegas Casino Slots : 1
Escape Game Darkness LittleRedHood : 1
Frantic Ball : 1
LINE FIGHTERS : 1
Trail Jump : 1
Versus - Multiplayer Game (2 players) : 1
There's no tablecloth I can't pull!! : 1
METAL SAGA -Ark in the wild land- : 0
Myth Kingdom:Wrath of Gauss : 0
Impossible Goal 3D : 0
TLMJ, Tout Le Monde Joue : 0
Toy Blocks Game for Brain Traning -TSUMIKI- : 0
Billiards8 (8 Ball & Mission) : 0
Der IQ Test - Intelligenztest (Deutsch) : 0
Ear Age Diagnosis : 0
Tetro Blocks - Beat IQ with Tetrix-style-puzzle. : 0
REVERSI ZERO - free classic game : 0
SUDOKU -The puzzle game that makes your brain younger!- : 0
Fusion Pictures - A Multiplayer Game with 4 Hidden Pics Object & 1 Word Puzzle : 0
Limite Limite : 0
Refreshing ! Brain training fit : 0
S, - Puzzle Game With Create Level Feature : 0
Stress Release Grenade : 0
striker ‚Äì dropping balls : 0
Otaku quizzes from manga, and anime--Slice HEROES! : 0
DropEraser : 0
Puzzle Game for Brain Training!! PITATTO!! : 0
Socc

While the reviews per app is lower on average for games, the reviews seem far more evenly spread, so there is more room for many games with large user bases, combined with a prime form factor for advertising, game apps seem like the best option when utilizing App Store data.

Let's see if the Google Play data tells us something different.

## Most Popular Apps by Genre on Google Play
Since the Google Play set, unlike the App Store set, gives us a category for the number of installs, we'll take a look at a frequency table constructed from that column below:

In [29]:
display_table(android_final, 5)

1,000,000+ : 15.75497287522604
100,000+ : 11.539330922242314
10,000,000+ : 10.567359855334539
10,000+ : 10.194394213381555
1,000+ : 8.39737793851718
100+ : 6.928119349005425
5,000,000+ : 6.826401446654612
500,000+ : 5.560578661844485
50,000+ : 4.769439421338156
5,000+ : 4.486889692585895
10+ : 3.5375226039783
500+ : 3.2436708860759493
50,000,000+ : 2.2830018083182644
100,000,000+ : 2.1360759493670884
50+ : 1.9213381555153706
5+ : 0.7911392405063291
1+ : 0.5085895117540687
500,000,000+ : 0.27124773960216997
1,000,000,000+ : 0.22603978300180833
0+ : 0.045207956600361664
0 : 0.011301989150090416


We can see that the install numbers are broken up into segments, and are fairly imprecise. Luckily we are only interested in the big picture here, so we will count 1,000+ installs as 1,000 installs, 5,000+ installs as 5,000 installs, and so on.

As they are currently stored as strings, we will need to convert the install columns to floats, which will involve removing the commas and plus signs. We will do this at the same time as we compute the average number of installs per app below:

In [31]:
categories_android = freq_table(android_final, 1)

for category in categories_android:
    total = 0
    len_category = 0
    for app in android_final:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    print(category, ':', avg_n_installs)

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8814199.78835979
BUSINESS : 1712290.1474201474
COMICS : 832613.8888888889
COMMUNICATION : 38590581.08741259
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1360598.042253521
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1446158.2238372094
GAME : 15544014.51048951
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3650602.276666667
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10830251.970588235
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5145550.285714285
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_MAGAZ

On first look, one might be tempted to jump straight to communications apps, as they have the highest number of average installs, so we will check to see how skewed this category is, looking at the highest few tiers of installs.

In [32]:
for app in android_final:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger ‚Äì Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Me

This data shows there are six different communications apps with over 1,000,000,000 installs, which for sure is skewing the data,

To see how influential the few highest apps are, we will remove apps with over 100,000,000 installs and see what happens to our average:

In [33]:
under_100_m = []

for app in android_final:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'COMMUNICATION') and (float(n_installs) < 100000000):
        under_100_m.append(float(n_installs))
        
sum(under_100_m) / len(under_100_m)

3617398.420849421

This new average, 3,617,398, is significantly less than the old average of 38,456,119, over ten times less in fact.

We can assume this would also be the case with social apps (Facebook, Instagram, etc.), video players (YouTube, Google Play Movies, etc), photography apps (Facetune, Google Play Photos, etc.), productivity apps (Microsoft Word, Dropbox, Evernote, etc.), and other categories.

If we combine the `FAMILY` and `GAMES` columns we get 19,239,655 average installs, which is nothing to discount. Below we check for the distribution:

In [34]:
for app in android_final:
    if app[1] == 'GAMES' or app[1] == 'FAMILY':
        print(app[0], ':', app[5])

Jewels Crush- Match 3 Puzzle : 1,000,000+
Coloring & Learn : 5,000,000+
Mahjong : 5,000,000+
Super ABC! Learning games for kids! Preschool apps : 1,000,000+
Toy Pop Cubes : 1,000,000+
Educational Games 4 Kids : 5,000,000+
Candy Pop Story : 1,000,000+
Princess Coloring Book : 5,000,000+
Hello Kitty Nail Salon : 50,000,000+
Candy Smash : 1,000,000+
Happy Fruits Bomb - Cube Blast : 500,000+
Princess Adventures Puzzles : 500,000+
Kids Educational Game 3 Free : 5,000,000+
Puzzle Kids - Animals Shapes and Jigsaw Puzzles : 1,000,000+
Coloring book moana : 500,000+
Baby Panda Care : 10,000,000+
Kids Educational :All in One : 500,000+
Number Counting games for toddler preschool kids : 1,000,000+
Learn To Draw Glow Flower : 1,000,000+
No. Color - Color by Number, Number Coloring : 10,000,000+
Draw.ly - Color by Number Pixel Art Coloring : 1,000,000+
Baby puzzles : 5,000,000+
Garden Fruit Legend : 500,000+
Barbie‚Ñ¢ Fashion Closet : 10,000,000+
Candy Day : 1,000,000+
Learn To Draw Glow Princess :

Battle Gems (AdventureQuest) : 500,000+
Adventure Quest World Mobile Quiz : 500+
Aqw&3d Design Notes Manager : 500+
AQ Ria Retail : 50+
Accounting Quiz (AQ) Malaysia : 1,000+
Wowkwis aq Ka'qaquj : 10+
Dungeon Boss ‚Äì Strategy RPG : 5,000,000+
Questland: Turn Based RPG : 1,000,000+
Order & Chaos 2: 3D MMO RPG : 10,000,000+
Dungeon Quest : 5,000,000+
Dungeon Hunter 5 ‚Äì Action RPG : 10,000,000+
Grow Stone Online : 2d pixel RPG, MMORPG game : 1,000,000+
Megatramp - a Story of Success! : 5,000,000+
Knightfall‚Ñ¢ AR : 10,000+
Sketchfab : 50,000+
My Tamagotchi Forever : 1,000,000+
Egg, Inc. : 5,000,000+
Monster Park AR - Jurassic Dinosaurs in Real World : 10,000+
AR Remote Car : 100,000+
Solar System AR ( ARCore ) : 10,000+
Super AR : 10,000+
Spacecraft AR : 10,000+
Assemblr - Create 3D Models (Sandbox AR) : 1,000,000+
MCRAFT AR - EDITOR : 5,000+
Jenga¬Æ AR : 10,000+
Figment AR : 5,000+
AR Dragon Pet : 50,000+
Paint Space AR : 1,000+
Just a Line - Draw Anywhere, with AR : 100,000+
AR Toybo

Park Limousine: Realistic Limo Parking Simulator : 1,000+
Filmi Gaane : 100,000+
CA Lottery Official App : 1,000,000+
CA Technologies : 100+
CB Outdoors : 1,000+
WRLP CB Repeater : 1,000+
CB STARS : 10+
CB TV : 100+
CB News : 50+
CB Register : 10+
CD CHOICE TUBE : 500+
CD - Teach me ABC English L1 : 500+
CD Carpe Diem App : 100+
Yazdani Cd Center EllahAbad Official App : 500+
CE Broker : 10,000+
CEHQ - CE Credits for Nurses : 1,000+
Professor Online SEDUC-CE : 10,000+
ABAI CE Scanner : 1,000+
Shiver: Moonlit Grove CE : 50,000+
Beast of Lycan Isle CE : 50,000+
QUI EST CE ? : 1,000+
MCQ CE IT : 1,000+
Placement Tips for BE (CE/IT) : 500+
CE-STRONG : 100+
Magana Jari ce Littafi Na Uku : Part 3 of 3 : 5,000+
Qu'est ce qui est jaune et qui attend ? : 500+
CF Life : 100+
CF Church : 50+
All Info about Cg : 1,000+
cg guruji : 10,000+
CG Yojna & Jansampark : 5,000+
CG Job Alerts : 10,000+
CG Samanya Gyan : 10,000+
CG Wallpapers : 500+
Best CG Backgrounds : 500+
Amazing CG Backgrounds : 1,000+


EF Smart English for Phone : 10,000+
My EF Center : 10,000+
EF Mentor: Sounds : 5,000+
EF English Live Business : 100,000+
EF Classroom : 50,000+
EF English Live for phone : 100,000+
EF Smart English : 5,000+
EF English Live Business Tablet : 10,000+
EF English Live for tablets : 50,000+
EF Par–µnts : 50,000+
EF Teacher : 1,000+
English Bite : 10,000+
EF/VR ‚Äì VR Tours of EF Schools : 1,000+
EF Spelling Bee : 500+
Travelmoji : 500+
EFAmbassador : 100+
Agenda EF : 5,000+
e-Docente EI/anos iniciais EF : 1,000+
EF Universe: Endless Battle : 50+
EF First : 1+
Di√°rio Escola Mestres EF : 1,000+
English Conversation Courses : 1,000,000+
Equestria Girls : 10,000,000+
EG Player : 100+
Fashion Pony Girls Dress Up Makeup Game : 10,000+
My Little Pony Celebration : 1,000,000+
MY LITTLE PONY: Magic Princess : 50,000,000+
Equestria Girl Snapchat Addict : 1,000+
Sirens Fashion Style Game : 500,000+
My Little Pony: Story Creator : 1,000,000+
Pocket Little Pony : 1,000,000+
Dance Magic Fashion Style 

While at first glance the game/family categories may seem crowded, there are a very large number of games with large user bases, and it seems like the genre with the most room for multiple apps in a user's life. There is definitely a market for games marketed towards children, so that is something that also needs to be taken into account.

## Conclusions

In this project, we analyzed data about the App Store and Google Play store with the intent of recommending an app profile that would be successful in generating ad revenue in both Apple and Android markets.

Our conclusion is that, while seemingly crowded, gaming apps have not yet hit their limit, as users who play games are highly likely to play many different games. While also providing the best environment for ad revenue, our recommendation is to focus on game apps, with a pointed effort to make sure some of those apps are specifically targeted towards children, and the majority are still playable for older children as well as 