# Profitable App Profiles for the App Store and Google Play Markets

In this project, I'll be diving into datasets from the Apple App Store and Google Play marketplace. The datasets contain per-app information such as ratings and review counts.

The goal is to determine characteristics that contribute to a successful app, and to ultimately make a recommendation for the ideal app profile based upon this analysis. 

To begin our analysis, we'll first open and read the applicable datasets, and explore them using the function below:

In [93]:
from csv import reader

#Open Apple Data
opened_file = open('AppleStore.csv')
read_file = reader(opened_file)
ios = list(read_file)
ios_header = ios[0]
ios = ios[1:]

#Open GooglePlay Data
opened_file = open('googleplaystore.csv')
read_file = reader(opened_file)
android = list(read_file)
android_header = android[0]
android = android[1:]

#Defining our exploration function
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

#Exploring Apple App Store data
explore_data(ios,0,5,True)

#Apple Column names
print(ios_header)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows: 7197
Number of columns: 16
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


For more info on these column names, visit [this link](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps). 

In [94]:
#Exploring Android app data
explore_data(android,1,5,True)
#Android Column names
print(android_header)

['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


Number of rows: 10841
Number of columns: 13
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


For more info on these column names, [visit this link](https://www.kaggle.com/lava18/google-play-store-apps).

## Cleaning our data
Starting with Google Play data, we've detected a row with 1 fewer column than the rest of the dataset (12 columns in this row vs. 13). 

In [95]:
#Printing number of columns in row 10,472
print(len(android[10472]))
#Row in question
print(android[10472])

12
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


The row appears to be missing the `Category` column, which will be important for future analysis. As a result, we'll need to delete the row. 

In [96]:
del android[10472]

We can verify that the row has been deleted by printing that same row again:

In [97]:
print(android[10472])

['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']


This row contains data for the **osmino Wi-Fi** app, which is different from the deleted row. We're good to move forward and turn our attention to the iOS data.
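
The malformed row above was located by its index. As a more general check, here is a minimal sketch that scans a dataset for any row whose column count differs from the header's. The function name `find_malformed_rows` and the toy rows are made up for illustration and are not part of the original analysis.

```python
def find_malformed_rows(dataset, header):
    """Return (index, length) pairs for rows whose column count differs from the header's."""
    expected = len(header)
    return [(i, len(row)) for i, row in enumerate(dataset) if len(row) != expected]

# Toy data standing in for the real CSV rows (hypothetical values)
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['App A', 'TOOLS', '4.2'],
    ['App B', '1.9'],             # missing the Category value
    ['App C', 'SOCIAL', '4.5'],
]

print(find_malformed_rows(sample_rows, sample_header))
```

On the real data this could be called as `find_malformed_rows(android, android_header)`. If several rows were flagged, deleting them from the highest index down would keep the remaining indices valid.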

### Duplicate apps - App Store

In [98]:
app_names = {}
#Loop through iOS data, update our dictionary based upon frequency of app name
for row in ios:
    name = row[1]
    if name in app_names:
        app_names[name] += 1
    else:
        app_names[name] = 1

for key in app_names:
    if app_names[key] > 1:
        print(key)
        

Mannequin Challenge
VR Roller Coaster


We've identified app names that appear more than once in the App Store dataset. Let's look at all associated rows next.

In [99]:
#Print header for reference
print(ios_header)
print('\n')
#Print row if it matches the duplicate names we've identified
for row in ios:
    if row[1] == 'Mannequin Challenge' or row[1] == 'VR Roller Coaster':
        print(row)
        print('\n')

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['1173990889', 'Mannequin Challenge', '109705216', 'USD', '0.0', '668', '87', '3.0', '3.0', '1.4', '9+', 'Games', '37', '4', '1', '1']


['952877179', 'VR Roller Coaster', '169523200', 'USD', '0.0', '107', '102', '3.5', '3.5', '2.0.0', '4+', 'Games', '37', '5', '1', '1']


['1178454060', 'Mannequin Challenge', '59572224', 'USD', '0.0', '105', '58', '4.0', '4.5', '1.0.1', '4+', 'Games', '38', '5', '1', '1']


['1089824278', 'VR Roller Coaster', '240964608', 'USD', '0.0', '67', '44', '3.5', '4.0', '0.81', '4+', 'Games', '38', '0', '1', '1']




In the Apple App Store dataset, 2 duplicate app names were detected: `Mannequin Challenge` and `VR Roller Coaster`. Upon further review, we decided not to remove these rows from the dataset, given that each entry corresponded to a different app version. One `Mannequin Challenge` row pertained to version `1.4`, while the other applied to version `1.0.1`. Similarly, one of the rows for `VR Roller Coaster` applied to version `2.0.0` while another corresponded to version `0.81`. Additional column values also distinguished the entries that share a name (e.g. `id`, `size_bytes` and `rating_count_tot`).

### Duplicate Apps - Google Play
We do have duplicates in the Google Play store dataset, as seen below:

In [100]:
for row in android:
    name = row[0]
    if name == 'Instagram':
        print(row)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


You'll notice that Instagram has 4 entries. They are nearly identical: only the review count (index 3) differs, and two of the entries even share the same count (66577313).

Instagram is not the only app with duplicate entries. The following code will help highlight multiple entries for apps with the same name:

In [101]:
unique_apps = []
duplicate_apps = []

#Loop through android rows
for app in android:
    name = app[0]
#If the app has been appended to our unique list, send it to the duplicates list
    if name in unique_apps:
        duplicate_apps.append(name)
#If the app is not in the unique list, append it    
    else:
        unique_apps.append(name)

print(duplicate_apps[:5])
print('Number of duplicate apps:', len(duplicate_apps))

['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings']
Number of duplicate apps: 1181


After running the above code, we've detected 1,181 duplicate rows, or more specifically, 1,181 rows whose app name had already appeared earlier in the dataset.
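
Note that the count above tallies every repeat occurrence of a name, which is not the same as the number of distinct names that repeat. The sketch below shows the difference on a handful of made-up rows, using `collections.Counter` as a stand-in for the list-based loop; the sample names and their frequencies are hypothetical.

```python
from collections import Counter

# Hypothetical app names; 'Instagram' appears 3 times and 'Box' twice
sample_names = ['Instagram', 'Box', 'Instagram', 'Slack', 'Instagram', 'Box']

name_counts = Counter(sample_names)

# Distinct names that occur more than once
repeated_names = [name for name, count in name_counts.items() if count > 1]

# Redundant rows = total rows minus one kept row per distinct name
redundant_rows = len(sample_names) - len(name_counts)

print(repeated_names)
print(redundant_rows)
```

Here two names repeat, but three rows are redundant. The redundant-row count is the one that matters when working out how many rows should remain after de-duplication.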

These rows should be removed, but we will not do so randomly, for reasons described below: 

In [102]:
#Print number of reviews for each Instagram instance
for row in android:
    name = row[0]
    reviews = row[3]
    if name == 'Instagram':
        print(name,reviews)

Instagram 66577313
Instagram 66577446
Instagram 66577313
Instagram 66509917


The above code pulls duplicate `Instagram` entries, along with their review counts, which vary between entries. An entry with a higher review count was most likely collected more recently. With this in mind, we will keep only the entry with the most reviews. This goes for every group of entries that share an identical app name.

#### Removing duplicate rows in the Google Play dataset
To start things off, we will be creating a dictionary that will contain 1) the app name as a key, and 2) the highest review count detected among all entries, which will be stored as the dictionary value. 

In [103]:
#Dictionary for unique app data

reviews_max = {}
for row in android:
    name = row[0]
    n_reviews = float(row[3])
    if name in reviews_max and n_reviews > reviews_max[name]:
        reviews_max[name] = n_reviews
    if name not in reviews_max:
        reviews_max[name] = n_reviews
reviews_max        

{'Photo Editor & Candy Camera & Grid & ScrapBook': 159.0,
 'Coloring book moana': 974.0,
 'U Launcher Lite – FREE Live Cool Themes, Hide Apps': 87510.0,
 'Sketch - Draw & Paint': 215644.0,
 'Pixel Draw - Number Art Coloring Book': 967.0,
 'Paper flowers instructions': 167.0,
 'Smoke Effect Photo Maker - Smoke Editor': 178.0,
 'Infinite Painter': 36815.0,
 'Garden Coloring Book': 13791.0,
 'Kids Paint Free - Drawing Fun': 121.0,
 'Text on Photo - Fonteee': 13880.0,
 'Name Art Photo Editor - Focus n Filters': 8788.0,
 'Tattoo Name On My Photo Editor': 44829.0,
 'Mandala Coloring Book': 4326.0,
 '3D Color Pixel by Number - Sandbox Art Coloring': 1518.0,
 'Learn To Draw Kawaii Characters': 55.0,
 'Photo Designer - Write your name with shapes': 3632.0,
 '350 Diy Room Decor Ideas': 27.0,
 'FlipaClip - Cartoon animation': 194216.0,
 'ibis Paint X': 224399.0,
 'Logo Maker - Small Business': 450.0,
 "Boys Photo Editor - Six Pack & Men's Suit": 654.0,
 'Superheroes Wallpapers | 4K Backgrounds': 

In the above code snippet, we've done the following:

* Looped through the `android` dataset, storing app name and # of reviews.
* Checked whether the app name is already in the dictionary and, if so, whether the review count of the row currently being examined is greater than the value stored there.
    * If the app name is already in the dictionary and the current row has more reviews, the stored value is replaced with the current row's review count.
    * If the app name is not already in the dictionary, a new key/value pairing will be created using the number of reviews as the value. 
    * In the event that the app name of the currently examined row already exists in the dictionary, BUT the number of reviews is not greater than what already exists in the dictionary, the row will be ignored. 
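
The steps above can be seen on a small scale in the sketch below. It uses `max()` with `dict.get()` as a compact equivalent of the two `if` statements; the `Instagram` counts come from the output shown earlier, while the `Box` row and the name `sample_reviews_max` are made up for illustration.

```python
# Each row: [name, category, rating, review count]
sample_rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Box', 'BUSINESS', '4.2', '159872'],      # hypothetical row
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
]

sample_reviews_max = {}
for row in sample_rows:
    name = row[0]
    n_reviews = float(row[3])
    # dict.get returns the stored maximum, or n_reviews itself for a new name,
    # so max() either keeps the old value or replaces it with a larger one
    sample_reviews_max[name] = max(n_reviews, sample_reviews_max.get(name, n_reviews))

print(sample_reviews_max)
```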
    
We are looking for 9,659 rows (10,840 total rows - 1,181 duplicate rows = 9,659 unique rows), and after checking the length of the dictionary (see below), we can confirm that the dictionary has the desired data.

In [104]:
len(reviews_max)

9659

#### Utilizing dictionary to create new, de-duped dataset
Next, we'll use the `reviews_max` dictionary (and the desired review values that it contains) to rework our dataset. 

In the code below, we're 1) comparing the review count of the examined row to the maximum stored for that app in our dictionary, and 2) checking whether or not the name of the app being examined is in our `already_added` list. If the review counts match, and the app name is not yet in the `already_added` list, we'll add the entire row to the new `android_clean` list and record the name in `already_added`. The second check matters because several rows for the same app can share the same maximum review count.

In [105]:
android_clean = []
already_added = []
for row in android:
    name = row[0]
    n_reviews = float(row[3])
    if (n_reviews == reviews_max[name]) and (name not in already_added):
        android_clean.append(row)
        already_added.append(name)
android_clean

[['Photo Editor & Candy Camera & Grid & ScrapBook',
  'ART_AND_DESIGN',
  '4.1',
  '159',
  '19M',
  '10,000+',
  'Free',
  '0',
  'Everyone',
  'Art & Design',
  'January 7, 2018',
  '1.0.0',
  '4.0.3 and up'],
 ['U Launcher Lite – FREE Live Cool Themes, Hide Apps',
  'ART_AND_DESIGN',
  '4.7',
  '87510',
  '8.7M',
  '5,000,000+',
  'Free',
  '0',
  'Everyone',
  'Art & Design',
  'August 1, 2018',
  '1.2.4',
  '4.0.3 and up'],
 ['Sketch - Draw & Paint',
  'ART_AND_DESIGN',
  '4.5',
  '215644',
  '25M',
  '50,000,000+',
  'Free',
  '0',
  'Teen',
  'Art & Design',
  'June 8, 2018',
  'Varies with device',
  '4.2 and up'],
 ['Pixel Draw - Number Art Coloring Book',
  'ART_AND_DESIGN',
  '4.3',
  '967',
  '2.8M',
  '100,000+',
  'Free',
  '0',
  'Everyone',
  'Art & Design;Creativity',
  'June 20, 2018',
  '1.1',
  '4.4 and up'],
 ['Paper flowers instructions',
  'ART_AND_DESIGN',
  '4.4',
  '167',
  '5.6M',
  '50,000+',
  'Free',
  '0',
  'Everyone',
  'Art & Design',
  'March 26, 2017

A quick check confirms that we have the expected number of rows:

In [106]:
len(android_clean)

9659
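
As an aside, the same de-duplication can be done in a single pass by storing the best row per name rather than only the best review count. This is an alternative sketch, not the approach used above; `dedupe_by_max_reviews` and the sample rows are hypothetical.

```python
def dedupe_by_max_reviews(rows, name_idx=0, reviews_idx=3):
    """Keep one row per app name: the first row seen with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_idx]
        n_reviews = float(row[reviews_idx])
        # Strict '>' means a later row with a tied count never replaces the kept one
        if name not in best or n_reviews > float(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())

sample_rows = [
    ['App A', 'TOOLS', '4.0', '100'],
    ['App B', 'SOCIAL', '4.5', '250'],
    ['App A', 'TOOLS', '4.1', '120'],
    ['App A', 'TOOLS', '4.1', '120'],   # tie with the previous row
]

sample_clean = dedupe_by_max_reviews(sample_rows)
print(sample_clean)
print(len(sample_clean))
```

One difference to be aware of: the result is ordered by the first appearance of each app name, whereas `android_clean` is ordered by the position of each kept row, so the two lists can hold the same rows in a different order.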

### Removing apps for non-English speaking audiences

We also need to filter our list for an English-speaking audience, meaning that only apps with English titles should be considered. We can use character code points to weed out titles with non-English characters. The characters commonly used in English text fall within the ASCII range (0 - 127), and the `ord()` function returns the code point of a given character. The one caveat we'll need to consider is the fact that some characters (e.g. ™ and emojis) also fall outside of the 0 - 127 ASCII range, though they do not necessarily indicate a non-English title. 

As a result, we will allow a title to contain up to 3 characters with a code point over 127, and only reject titles with more than that. This isn't a perfect solution, but should put us in a decent spot. 

In [107]:
def character_detector(string):
    false_counter = 0
    #Loop through characters in a string
    for character in string:
        #If ASCII number is greater than 127, add it to our false-counter variable
        if ord(character) > 127:
            false_counter += 1
    if false_counter > 3:
        return False
    return True

#Testing character_detector
print(character_detector('Instagram'))
print(character_detector('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(character_detector('Docs To Go™ Free Office Suite'))
print(character_detector('Instachat 😜'))


True
False
True
True
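
To see why these titles pass or fail, it can help to look at the code points directly. The helper below is a small illustrative sketch (the name `count_non_ascii` is not part of the original notebook) that counts how many characters in a title sit above 127, which is the quantity `character_detector` compares against its threshold of 3.

```python
def count_non_ascii(title):
    """Count the characters in title whose code point is above 127."""
    return sum(1 for character in title if ord(character) > 127)

# Code points for one ASCII letter and three non-ASCII characters
for character in ['a', '™', '爱', '😜']:
    print(repr(character), ord(character))

# Number of non-ASCII characters in some of the test titles
for title in ['Instagram', 'Docs To Go™ Free Office Suite', 'Instachat 😜']:
    print(title, '->', count_non_ascii(title))
```

A title with a single ™ or emoji stays under the threshold, while a title written mostly in Chinese characters exceeds it quickly.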


Let's use this function to filter App Store data.

In [108]:
#Narrowing iOS app list to those for English-speakers
ios_english = []
for row in ios:
    #If the character_detector returns True, add entire row to ios_english list 
    if character_detector(row[1]):
        ios_english.append(row)

explore_data(ios_english,0,3,True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 6183
Number of columns: 16


We've cut the total row count (excluding header) to 6,183, down from 7,197. Next we'll do the same for the Android data.

In [109]:
#Narrowing list of Google Play apps to those for English-speakers
android_english = []
for row in android_clean:
    #If the character_detector returns True, add entire row to android_english list 
    if character_detector(row[0]):
        android_english.append(row)
        
explore_data(android_english,0,3,True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


### Isolating free apps
We only want to consider data associated with free apps, because that is the kind of app we'll be developing. The following code narrows our list to free apps in both datasets. 

In [110]:
#Generating free app list for iOS
ios_free = []
#Loop through iOS dataset
for row in ios_english:
    price = row[4]
    #If app is free, append the row to a new list called ios_free
    if price == '0.0':
        ios_free.append(row)
        
print('# of free IOS apps:', len(ios_free))

# of free IOS apps: 3222


In [111]:
#Generating free app list for Android/Google Play
android_free = []
for row in android_english:
    price = row[7]
    if price == '0':
        android_free.append(row)
        
print('# of free Android apps:', len(android_free))

# of free Android apps: 8864
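
The two filters above compare the price as a string (`'0.0'` for iOS, `'0'` for Google Play), which depends on how each file happens to write a zero price. A slightly more defensive variant, sketched below, parses the price as a number first. The `is_free` helper is hypothetical, and the `'$4.99'` style for paid Google Play prices is an assumption about the file's formatting rather than something shown in the output above.

```python
def is_free(price_text):
    """Return True when a price string such as '0', '0.0' or '$0.00' represents zero."""
    cleaned = price_text.strip().lstrip('$')
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Unparseable values (for example a stray non-numeric cell) count as not free
        return False

for price in ['0', '0.0', '$4.99', 'Everyone']:
    print(repr(price), is_free(price))
```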


## Identifying characteristics of profitable apps
Ultimately, we need to find an app profile that fits both the App Store and Google Play, the primary reason being that both marketplaces combined make for the largest opportunity. 

In the same vein, the validation strategy touches both of these marketplaces. It comprises 3 steps:

1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we develop it further.
3. If the app is profitable after six months, we build an iOS version of the app and add it to the App Store. 

### Columns to be used for genre tallying
iOS: index 11 (`prime_genre`)

Android: index 1 (`Category`) and index -4 (`Genres`)

### Generating frequency tables to tally app types

In [112]:
#Defining frequency table function, sorting function and running for iOS data
def freq_table(dataset,index):
    table = {}
    total = 0
    
    #Loop through dataset, row by row
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
            
    table_percentages = {}
    #Loop through keys in dictionary
    for key in table:
        #Calculate percentages for key, assign to variable
        percentage = (table[key] / total)*100
        #Reassign key value in dictionary to percentage
        table_percentages[key] = percentage
    return table_percentages

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
        
display_table(ios_free,11)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


By far, the most common genre among free English apps in the App Store is `Games`, which accounts for ~58% of them. The runner-up is the `Entertainment` genre, at ~8%, and you could probably lump those 2 genres together. It's safe to say that apps geared towards entertainment are the most numerous in the App Store.

It's hard to make an app profile recommendation based upon this information alone. A genre with many apps (like games) is likely saturated, and thus harder to break into. A better scenario would be one where there is a sizeable user base with fewer apps to compete with. 
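
For reference, the percentage table produced by `freq_table` can also be built with `collections.Counter` from the standard library. The sketch below is an alternative formulation under that assumption, not a replacement for the function above; `freq_table_counter` and the toy rows are made up for illustration.

```python
from collections import Counter

def freq_table_counter(dataset, index):
    """Return {value: percentage of rows} for one column, most common value first."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return {value: count / total * 100 for value, count in counts.most_common()}

# Toy rows with the genre in column 0 (hypothetical data)
sample_rows = [['Games'], ['Games'], ['Music'], ['Games']]
sample_table = freq_table_counter(sample_rows, 0)
print(sample_table)
```

Because `most_common()` already returns values in descending order of frequency, a separate sorting step like the one in `display_table` is not needed here.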

In [113]:
#Percentage frequency table for Android data -- Category column
display_table(android_free,1)

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

In [114]:
#Percentage frequency table for Android data -- Genre column
display_table(android_free,-4)

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

### Google Play genre breakdown
Regarding the `Category` column, the top 3 most common include Family, Game, and Tools (with Business as a distant 4th option). Looking at the `Genres` column, the top 3 genres include Tools, Entertainment and Education (Business being a close 4th). Rankings for these 2 columns mirror each other relatively well, though I would guess that the `Category` column contains broader categories than those contained in the `Genres` column. There's much more granularity with the latter. 
In terms of other patterns, entertainment-leaning categories such as Family and Game take the cake in terms of most apps available. After that, more practical apps in the education/business/tool category are second-most common.

### Comparing Genre data between markets
Gaming and Entertainment apps are common across both marketplaces, though these kinds of apps appear to have less of a lead in the Google Play market. 
The `Photo & Video` genre holds a larger share of the App Store than the `Photography` category does of the Google Play Store (making up ~5% of apps vs. ~3% in Google Play). 
The `Family` category in the Google Play store is top tier (making up 18.9% of apps) and at first glance does not appear to have a comparable category partner in the App Store.
The Google Play store also appears to have more `Business` apps (4.59% of total apps) vs the App Store (0.53% of apps are business-related). 
Again, it would be daft to provide a strong app profile recommendation given this analysis, which looks strictly at market saturation. The only recommendation I feel confident in giving is to steer clear of any apps in the Entertainment/Games space, given their heightened saturation in both markets. We still do not have a great sense of how many users are in each app genre, which would help determine opportunity. That's where we're headed next.

In [115]:
#Average number of user ratings per genre (App Store)

unique_genres = freq_table(ios_free, 11)

#Loop through genres ID'd in previous frequency table
for genre in unique_genres:
    total = 0
    len_genre = 0
    #With each genre in mind, loop through iOS data
    for row in ios_free:
        genre_app = row[11]
        #If the genre in the iOS data matches the genre in mind, add the ratings to our total variable
        if genre_app == genre:
            ratings_count = float(row[5])
            total += ratings_count
            len_genre += 1
    avg_number_ratings = total / len_genre
    print(genre,": ",avg_number_ratings)

Social Networking :  71548.34905660378
Photo & Video :  28441.54375
Games :  22788.6696905016
Music :  57326.530303030304
Reference :  74942.11111111111
Health & Fitness :  23298.015384615384
Weather :  52279.892857142855
Utilities :  18684.456790123455
Travel :  28243.8
Shopping :  26919.690476190477
News :  21248.023255813954
Navigation :  86090.33333333333
Lifestyle :  16485.764705882353
Entertainment :  14029.830708661417
Food & Drink :  33333.92307692308
Sports :  23008.898550724636
Book :  39758.5
Finance :  31467.944444444445
Education :  7003.983050847458
Productivity :  21028.410714285714
Business :  7491.117647058823
Catalogs :  4004.0
Medical :  612.0


Based upon this App Store data, I would recommend app development in the Navigation genre. It has the highest average number of user ratings per app (~86,000), signaling heavy usership, and yet the genre makes up only ~0.19% of the free English apps in our App Store data!
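
The averages above were computed with a nested loop that re-scans the whole dataset once per genre. A single-pass version is sketched below using `collections.defaultdict`; `average_by_group` and the sample rows are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

def average_by_group(dataset, group_idx, value_idx):
    """Return {group: mean of the numeric column} using one pass over the rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        group = row[group_idx]
        totals[group] += float(row[value_idx])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}

# Toy rows: [genre, rating count] (made-up numbers)
sample_rows = [
    ['Navigation', '100'],
    ['Navigation', '300'],
    ['Games', '50'],
]
sample_averages = average_by_group(sample_rows, 0, 1)
print(sample_averages)
```

On the notebook's data the equivalent call would be `average_by_group(ios_free, 11, 5)`.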

In [116]:
#Average number of installs per category (Google Play Store)

unique_genres_android = freq_table(android_free, 1)

for category in unique_genres_android:
    total = 0
    len_genre = 0
    for row in android_free:
        category_app = row[1]
        if category_app == category:
            n_installs = row[5]
            n_installs = n_installs.replace('+','')
            n_installs = float(n_installs.replace(',',''))
            total += n_installs
            len_genre += 1
    avg_n_installs = total / len_genre
    print(category,": ",avg_n_installs)

ART_AND_DESIGN :  1986335.0877192982
AUTO_AND_VEHICLES :  647317.8170731707
BEAUTY :  513151.88679245283
BOOKS_AND_REFERENCE :  8767811.894736841
BUSINESS :  1712290.1474201474
COMICS :  817657.2727272727
COMMUNICATION :  38456119.167247385
DATING :  854028.8303030303
EDUCATION :  1833495.145631068
ENTERTAINMENT :  11640705.88235294
EVENTS :  253542.22222222222
FINANCE :  1387692.475609756
FOOD_AND_DRINK :  1924897.7363636363
HEALTH_AND_FITNESS :  4188821.9853479853
HOUSE_AND_HOME :  1331540.5616438356
LIBRARIES_AND_DEMO :  638503.734939759
LIFESTYLE :  1437816.2687861272
GAME :  15588015.603248259
FAMILY :  3695641.8198090694
MEDICAL :  120550.61980830671
SOCIAL :  23253652.127118643
SHOPPING :  7036877.311557789
PHOTOGRAPHY :  17840110.40229885
SPORTS :  3638640.1428571427
TRAVEL_AND_LOCAL :  13984077.710144928
TOOLS :  10801391.298666667
PERSONALIZATION :  5201482.6122448975
PRODUCTIVITY :  16787331.344927534
PARENTING :  542603.6206896552
WEATHER :  5074486.197183099
VIDEO_PLAYERS 

The Google Play Store install data reveals a few things. The **Communication** category has the highest average number of installs per app, at around 38 million. **Video Players** and **Social** come next, both in the neighborhood of 23-24 million. Following up in install count are the **Photography** and **Productivity** categories, with around 17-18 million installs per app on average. **Tools** and the **Books_and_Reference** categories also show some promise, with roughly 9-11 million installs apiece. 
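
Two details are worth keeping in mind when reading these figures: the `Installs` values are open-ended buckets such as `'1,000+'`, so each parsed number is a lower bound, and a category average can be dominated by a single very large app. The sketch below illustrates both points on made-up data; `parse_installs` and the sample values are hypothetical.

```python
def parse_installs(text):
    """Convert an install bucket such as '1,000,000+' to its lower bound as an int."""
    return int(text.replace(',', '').replace('+', ''))

# Made-up install buckets for two categories
sample_installs = {
    'COMMUNICATION': ['1,000,000,000+', '10,000+'],
    'EVENTS': ['1,000+', '5,000+'],
}

sample_averages = {
    category: sum(parse_installs(value) for value in values) / len(values)
    for category, values in sample_installs.items()
}

# Rank categories from highest to lowest average installs
ranked = sorted(sample_averages.items(), key=lambda item: item[1], reverse=True)
for category, average in ranked:
    print(category, ':', average)
```

In the toy `COMMUNICATION` group, one app with a billion installs pulls the average to roughly 500 million even though the other app has only ten thousand.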

## Conclusion

**Photography** looks to be a decent profile, given that it ranks 3rd or 4th in terms of average installs per app, yet only makes up about 2.9% of the free English apps in our Google Play data. The comparable `Photo & Video` genre has some favorable aspects in the App Store as well: it is middle of the road in terms of popularity (about 28k ratings per app on average) and accounts for about 5% of the free English apps in our App Store data.