# Profitable App Profiles for the App Store and Google Play Markets

In this guided project by Dataquest, I took the role of a data analyst for a company that builds Android and iOS mobile apps. The company only builds apps that are free to download and install, so its revenue depends mostly on the number of users who see and engage with in-app ads. My goal was to analyze the available data and help the company's developers understand which types of apps are likely to attract more users.

## Opening and Exploring the Data

As of the third quarter of 2019, there are 2.47 million apps in Google Play and 1.8 million apps in the Apple App Store ([Statista](https://www.statista.com/statistics/276623/number-of-apps-available-in-leading-app-stores/)). To avoid spending resources on collecting new data, I used these two data sets for this project:

* A [data set](https://www.kaggle.com/lava18/google-play-store-apps) about 9,000+ Android apps from Google Play; the apps were last updated in August 2018
* A [data set](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps) about 7,000+ iOS apps from the App Store; the data was collected in July 2017

I opened the two data sets to continue exploring the data.


In [1]:
from csv import reader

# Google Play data set
# The with-block closes the file once the rows have been read into a list.
with open('googleplaystore.csv', encoding='utf8', newline='') as opened_file:
    android = list(reader(opened_file))

# App Store data set
with open('AppleStore.csv', encoding='utf8', newline='') as opened_file:
    ios = list(reader(opened_file))

Dataquest provided a function named `explore_data()` to easily view the selected rows and report the number of rows and columns of the input data set.

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

To use `explore_data()`, the data set must not have a header row, so I first separated the header from the rest of the rows.

In [3]:
android_head = android[0]
android = android[1:]

print(android_head)
print('\n')
explore_data(android, 0, 3, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


The data set from Google Play has 10,841 rows and 13 columns. I used the following columns for data analysis:

* `'App'` - Application name
* `'Category'` - Category the app belongs to
* `'Installs'` - Number of user downloads/installs for the app
* `'Price'` - Price of the app
* `'Genres'` - An app can belong to multiple genres (apart from its main category). For example, a musical family game will belong to the Music, Game, and Family genres.

I found that there is a mismatch between the number of rows (10,841) and the number of unique values in the `'App'` column (9,660 as stated from the [source](https://www.kaggle.com/lava18/google-play-store-apps)). This suggests the presence of duplicate entries in the data set which must be removed before data analysis.
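
A quick way to check for this kind of mismatch is to compare the row count with the number of distinct app names. The sketch below is only an illustration: `sample_rows` is a made-up sample rather than the real data, `count_unique_names()` is a hypothetical helper, and it assumes the app name sits at index 0 as in the Google Play header.

```python
# Hypothetical sample rows; only the first column (app name) matters here.
sample_rows = [
    ['Slack', 'BUSINESS'],
    ['Example App', 'TOOLS'],
    ['Slack', 'BUSINESS'],
]

def count_unique_names(rows, name_index=0):
    """Return the number of distinct values found in the name column."""
    return len({row[name_index] for row in rows})

n_rows = len(sample_rows)
n_unique = count_unique_names(sample_rows)
print('Rows:', n_rows)
print('Unique names:', n_unique)
print('Rows beyond the first occurrence of a name:', n_rows - n_unique)
```

A set keeps each name only once, so the difference between the two counts is the number of repeated rows.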

In [4]:
ios_head = ios[0]
ios = ios[1:]

print(ios_head)
print('\n')
explore_data(ios, 0, 3, True)

['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1']


Number of rows: 7197
Number of columns: 17


The data set from the App Store has 7,197 rows and 17 columns. I used the following columns for data analysis:

* `'track_name'` - App name
* `'price'` - Price amount
* `'rating_count_tot'` - User rating counts (for all versions)
* `'prime_genre'` - Primary genre

[The source of the data set](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps) indicates 7,195 unique values as opposed to the computed 7,197 rows. This suggests duplicate entries in the data set which I needed to remove before data analysis.

## Deleting Wrong Data

[One of the discussions](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015) of the Google Play data set describes an error at row 10472. I first confirmed the error by printing and comparing it to another row from the data set. 

In [5]:
print(android_head)
print('\n')
print(android[10472])
print('\n')
print(android[10])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['Text on Photo - Fonteee', 'ART_AND_DESIGN', '4.4', '13880', '28M', '1,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'October 27, 2017', '1.0.4', '4.1 and up']


The app `'Life Made WI-Fi Touchscreen Photo Frame'` has a missing `'Category'` value which shifts the values from the other columns. I removed the said row from the data set. 
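
The printed row has 12 values against 13 header columns, so a length check would also flag it without relying on the discussion thread. The following is a minimal sketch under that assumption; `find_malformed_rows()` and the three-column sample are hypothetical and not part of the original notebook.

```python
def find_malformed_rows(header, rows):
    """Return the indices of rows whose length differs from the header's."""
    return [i for i, row in enumerate(rows) if len(row) != len(header)]

# Hypothetical three-column sample; the second row is missing a value.
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],
    ['App C', 'GAME', '3.9'],
]

print(find_malformed_rows(sample_header, sample_rows))
```

Note that a length check only catches rows with a missing or extra field; a row with the right number of values in the wrong columns would still pass.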

In [6]:
del android[10472]
len(android)

10840

## Removing Duplicate Entries - Part 1

For the Google Play data set, I obtained the list and number of unique and duplicate apps. 

In [7]:
unique_apps = []
duplicate_apps = []

for app in android:
    app_name = app[0]
    
    if app_name in unique_apps:
        duplicate_apps.append(app_name)
    else:
        unique_apps.append(app_name)

print('Here are some examples of duplicate apps in the Google Play data set:')
print(duplicate_apps[:20])
print('\n')
print('Number of unique apps =', len(unique_apps))
print('Number of duplicate apps =', len(duplicate_apps))

Here are some examples of duplicate apps in the Google Play data set:
['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software', 'MailChimp - Email, Marketing Automation', 'Crew - Free Messaging and Scheduling', 'Asana: organize team projects', 'Google Analytics', 'AdWords Express']


Number of unique apps = 9659
Number of duplicate apps = 1181


The Google Play data set has 1,181 duplicate apps. I inspected some of the duplicate apps, e.g. `'Slack'` and `'Google Ads'`, and observed a difference in the fourth column `'Reviews'`.

In [8]:
for app in android:
    if app[0] == 'Slack':
        print(app)

['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51510', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']


In [9]:
for app in android:
    if app[0] == 'Google Ads':
        print(app)

['Google Ads', 'BUSINESS', '4.3', '29313', '20M', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 30, 2018', '1.12.0', '4.0.3 and up']
['Google Ads', 'BUSINESS', '4.3', '29313', '20M', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 30, 2018', '1.12.0', '4.0.3 and up']
['Google Ads', 'BUSINESS', '4.3', '29331', '20M', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 30, 2018', '1.12.0', '4.0.3 and up']


Before any data analysis, I needed to keep one row per unique app in the data set. Rather than randomly removing the duplicates in the data set, I decided to keep the entry with the highest value in the `'Reviews'` column. I assumed that the higher the number of reviews of the app, the more recent its data is. 

## Removing Duplicate Entries - Part 2

To remove the duplicate entries in the Google Play data set, I performed the following:

* Created the empty dictionary `reviews_max`
* Looped through the `android` data set and:
    * Extracted the app name (`name`) and number of reviews (`n_reviews`)
    * If `name` is a key in `reviews_max` and `reviews_max[name] < n_reviews`, update `reviews_max[name]` with `n_reviews`
    * If `name` is not in `reviews_max`, add a new entry to `reviews_max` with `name` as the key and `n_reviews` as the value
* Inspected the length of `reviews_max`. This should be equal to 9,659 which is the number of unique apps in the Google Play data set.

In [10]:
reviews_max = {}
for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
        
len(reviews_max)
    

9659

I then used `reviews_max` to remove the duplicate rows as follows:

* Created two empty lists: `android_clean` (to store the cleaned data set) and `already_added` (to track the apps that are already added in `android_clean`)
* Looped through the `android` data set and:
    * Extracted the app name (`name`) and number of reviews (`n_reviews`)
    * If `n_reviews` is the same as `reviews_max[name]` and `name` is not in `already_added`, append the entire row to `android_clean` and `name` to `already_added`

In [11]:
android_clean = []
already_added = []

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(app)
        already_added.append(name)
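
The two passes above can also be folded into a single pass that stores the best row per name in a dictionary. This is an alternative sketch, not the approach used in this notebook: `keep_most_reviewed()` is a hypothetical helper, the sample rows are shortened to four columns, and `'Example App'` is made up. It keeps the same row per app (the first one that reaches the highest review count), but the output order follows the first appearance of each name, which can differ from the order of `android_clean`.

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Replace the stored row only when this one has strictly more reviews.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

# Shortened, partly hypothetical sample rows (name, category, rating, reviews).
sample_rows = [
    ['Slack', 'BUSINESS', '4.4', '51507'],
    ['Example App', 'TOOLS', '4.0', '120'],
    ['Slack', 'BUSINESS', '4.4', '51510'],
]

for row in keep_most_reviewed(sample_rows):
    print(row)
```

Dictionary lookups avoid the repeated `name not in already_added` scans over a growing list, which matters more as the data set grows.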

I used `explore_data()` and ensured that the number of rows in `android_clean` is 9,659.  

In [12]:
explore_data(android_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


## Removing Non-English Apps - Part 1

Since the company uses English to develop its apps, I only analyzed the apps that are directed toward an English-speaking audience. However, both the `android_clean` and `ios` data sets have non-English app names.

In [13]:
print(android_clean[4412][0])
print(android_clean[7940][0])
print('\n')
print(ios[814][2])
print(ios[6734][2])

中国語 AQリスニング
لعبة تقدر تربح DZ


搜狐新闻—新闻热点资讯掌上阅读软件
エレメンタル ファンタジー - 高精細３ＤアクションＲＰＧ


I wrote the `is_english()` function that takes in a string and returns `False` if there's any character in the string that doesn't belong to the set of common English characters (ranges from 0 to 127 according to the ASCII system), otherwise it returns `True`. 

In [14]:
def is_english(string):
    for character in string:
        if ord(character) > 127:
            return False
    
    return True

However, `is_english()` couldn't correctly identify some English app names, e.g. `'Docs To Go™ Free Office Suite'` and `'Instachat 😜'`, due to the emojis and special characters that fall outside the ASCII range (> 127). 

In [15]:
print(is_english('Google Ads'))
print(is_english('中国語 AQリスニング'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

True
False
False
False
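
To see which characters trigger the `False` results, it can help to list the code points above 127 in a name. This is a small illustrative sketch; `non_ascii_characters()` is a hypothetical helper, and the trademark sign and emoji are written as Unicode escapes only to keep the snippet encoding-safe.

```python
def non_ascii_characters(string):
    """Return (character, code point) pairs for characters above ASCII 127."""
    return [(character, ord(character))
            for character in string
            if ord(character) > 127]

print(non_ascii_characters('Google Ads'))
print(non_ascii_characters('Docs To Go\u2122 Free Office Suite'))  # trademark sign
print(non_ascii_characters('Instachat \U0001F61C'))                # emoji
```

Each of the two English names contains exactly one character outside the ASCII range, which is enough for the first version of `is_english()` to reject it.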


## Removing Non-English Apps - Part 2

I edited `is_english()` so that it only flags an app as non-English if its name has more than three characters that fall outside the ASCII range.

In [16]:
def is_english(string):
    count = 0
    for character in string:
        if ord(character) > 127:
            count += 1
            if count > 3:
                return False
            
    return True

print(is_english('Google Ads'))
print(is_english('中国語 AQリスニング'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

True
False
True
True
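
The three-character threshold is a heuristic, and it has a known blind spot: a short name written entirely in non-ASCII characters still passes. The sketch below repeats the same rule with the threshold exposed as a parameter (`max_non_ascii` is my own addition for illustration, not part of the notebook's function).

```python
def is_english(string, max_non_ascii=3):
    """Same rule as above, with the threshold exposed as a parameter."""
    count = 0
    for character in string:
        if ord(character) > 127:
            count += 1
            if count > max_non_ascii:
                return False
    return True

# Three non-ASCII characters do not exceed the threshold, so this passes.
print(is_english('中国語'))
# Eight non-ASCII characters exceed the threshold, so this is rejected.
print(is_english('中国語 AQリスニング'))
```

The filter is therefore not perfect, but it removes most non-English names while keeping English names that contain a few symbols or emojis.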


I then applied `is_english()` to remove the non-English apps in `android_clean` and `ios` data sets. 

In [17]:
android_english = []
ios_english = []

for app in android_clean:
    name = app[0]
    
    if is_english(name):
        android_english.append(app)

for app in ios:
    name = app[2]
    
    if is_english(name):
        ios_english.append(app)

explore_data(android_english, 0, 3, True)
print('\n')
explore_data(ios_english, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188

I am left with 9,614 apps from Google Play and 6,183 apps from the App Store. 

## Isolating the Free Apps

The company only builds apps that are free to download and install, so I isolated the free apps from `android_english` and `ios_english` data sets. 

In [18]:
android_free = []
ios_free = []

for app in android_english:
    price = app[7]
    
    if price == '0':
        android_free.append(app)
    
for app in ios_english:
    price = float(app[5])
    
    if price == 0.0:
        ios_free.append(app)
        
explore_data(android_free, 0, 3, True)
print('\n')
explore_data(ios_free, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 8864
Number of columns: 13


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1']


['4', '282614216', 'eBay: Best App to Buy, Sell, Save! Online Sh

I am left with 8,864 apps from Google Play and 3,222 apps from the App Store for data analysis. 
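
The two loops above test the price differently: a string comparison with `'0'` for Google Play and a numeric comparison for the App Store. A single helper could handle both, as in the hedged sketch below. `parse_price()` and `is_free()` are hypothetical names, and the leading dollar sign on paid prices is an assumption about the raw Google Play values that should be checked against the data.

```python
def parse_price(price_text):
    """Convert a price string such as '0', '0.0', or '$4.99' to a float."""
    # Assumption: paid prices may carry a leading '$' that must be removed.
    return float(price_text.strip().lstrip('$'))

def is_free(price_text):
    """Return True when the parsed price equals zero."""
    return parse_price(price_text) == 0.0

for text in ['0', '0.0', '$4.99']:
    print(text, '->', is_free(text))
```

Comparing numbers rather than strings means a value such as `'0.0'` is still treated as free.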

## Most Common Apps by Genre

I wanted to find an app profile that would fit Google Play and the App Store. The company has the following validation strategy for an app idea:

* Build a minimal Android version of the app and add it to Google Play
* If the app has a good response from the users, the company will develop it further
* If the app is profitable after 6 months, an iOS version of the app will be built and added to the App Store

The company aims to add the app in both Google Play and the App Store. Therefore, the app profiles must be successful in both markets. Before recommending app profiles, I needed to know first which app genres are most common in Google Play and the App Store. 

I created the `freq_table()` function to produce a frequency table for the selected columns of `android_free` (`Category` and `Genres`) and `ios_free` (`prime_genre`) datasets. I expressed the frequency values in percentages. 

In [19]:
def freq_table(dataset, index):
    frequency_table = {}
    for row in dataset:
        a_data_point = row[index]
        
        if a_data_point in frequency_table:
            frequency_table[a_data_point] += 1
        else:
            frequency_table[a_data_point] = 1
            
    for key in frequency_table:
        frequency_table[key] = round(100*frequency_table[key]/len(dataset), 2) 
    
    return frequency_table

Dataquest provided the function `display_table()` which displays the percentages generated by `freq_table()` in descending order.  

In [20]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
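
For comparison, the standard library's `collections.Counter` can produce the same kind of sorted percentage table in a few lines. This is an alternative sketch rather than the Dataquest-provided code; `freq_table_sorted()` and the sample rows are hypothetical.

```python
from collections import Counter

def freq_table_sorted(dataset, index):
    """Return (value, percentage) pairs sorted from most to least common."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return [(value, round(100 * count / total, 2))
            for value, count in counts.most_common()]

# Hypothetical sample: column 1 plays the role of a category column.
sample_rows = [['a', 'GAME'], ['b', 'TOOLS'], ['c', 'GAME'], ['d', 'GAME']]

for value, percentage in freq_table_sorted(sample_rows, 1):
    print(value, ':', percentage)
```

One difference to keep in mind: values with equal counts keep their first-seen order here, whereas `display_table()` breaks ties by sorting the keys in reverse alphabetical order.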

Analysis of the `Category` column of `android_free` dataset shows that the most common app category is `FAMILY` (18.91%), followed by `GAME` (9.72%), `TOOLS` (8.46%), `BUSINESS` (4.59%), and `LIFESTYLE` (3.9%). 

In [21]:
display_table(android_free, 1)

FAMILY : 18.91
GAME : 9.72
TOOLS : 8.46
BUSINESS : 4.59
LIFESTYLE : 3.9
PRODUCTIVITY : 3.89
FINANCE : 3.7
MEDICAL : 3.53
SPORTS : 3.4
PERSONALIZATION : 3.32
COMMUNICATION : 3.24
HEALTH_AND_FITNESS : 3.08
PHOTOGRAPHY : 2.94
NEWS_AND_MAGAZINES : 2.8
SOCIAL : 2.66
TRAVEL_AND_LOCAL : 2.34
SHOPPING : 2.25
BOOKS_AND_REFERENCE : 2.14
DATING : 1.86
VIDEO_PLAYERS : 1.79
MAPS_AND_NAVIGATION : 1.4
FOOD_AND_DRINK : 1.24
EDUCATION : 1.16
ENTERTAINMENT : 0.96
LIBRARIES_AND_DEMO : 0.94
AUTO_AND_VEHICLES : 0.93
HOUSE_AND_HOME : 0.82
WEATHER : 0.8
EVENTS : 0.71
PARENTING : 0.65
ART_AND_DESIGN : 0.64
COMICS : 0.62
BEAUTY : 0.6


Meanwhile, analysis of the `Genres` column of `android_free` shows that the most common app genres are `Tools` (8.45%), `Entertainment` (6.07%), `Education` (5.35%), `Business` (4.59%), and `Productivity` (3.89%). 

In [22]:
display_table(android_free, 9)

Tools : 8.45
Entertainment : 6.07
Education : 5.35
Business : 4.59
Productivity : 3.89
Lifestyle : 3.89
Finance : 3.7
Medical : 3.53
Sports : 3.46
Personalization : 3.32
Communication : 3.24
Action : 3.1
Health & Fitness : 3.08
Photography : 2.94
News & Magazines : 2.8
Social : 2.66
Travel & Local : 2.32
Shopping : 2.25
Books & Reference : 2.14
Simulation : 2.04
Dating : 1.86
Arcade : 1.85
Video Players & Editors : 1.77
Casual : 1.76
Maps & Navigation : 1.4
Food & Drink : 1.24
Puzzle : 1.13
Racing : 0.99
Role Playing : 0.94
Libraries & Demo : 0.94
Auto & Vehicles : 0.93
Strategy : 0.91
House & Home : 0.82
Weather : 0.8
Events : 0.71
Adventure : 0.68
Comics : 0.61
Beauty : 0.6
Art & Design : 0.6
Parenting : 0.5
Card : 0.45
Casino : 0.43
Trivia : 0.42
Educational;Education : 0.39
Board : 0.38
Educational : 0.37
Education;Education : 0.34
Word : 0.26
Casual;Pretend Play : 0.24
Music : 0.2
Racing;Action & Adventure : 0.17
Puzzle;Brain Games : 0.17
Entertainment;Music & Video : 0.17
Casual;

Results indicate that there is a good balance of English free apps in Google Play for fun and practical applications. 
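
Some `Genres` values combine two labels with a semicolon (for example `Casual;Pretend Play`), and the table above counts each combination as its own genre. As an optional variation that is not used in this analysis, the sketch below counts every label separately. `count_split_genres()` is a hypothetical helper, and the one-column sample rows are only for illustration.

```python
from collections import Counter

def count_split_genres(rows, genre_index=9):
    """Count each semicolon-separated genre label on its own."""
    counts = Counter()
    for row in rows:
        for label in row[genre_index].split(';'):
            if label:  # skip empty labels left by a stray semicolon
                counts[label] += 1
    return counts

# Hypothetical one-column sample, so the genre sits at index 0.
sample_rows = [
    ['Casual;Pretend Play'],
    ['Casual'],
    ['Art & Design;Pretend Play'],
]

for label, count in count_split_genres(sample_rows, genre_index=0).most_common():
    print(label, ':', count)
```

With this counting, an app with two labels contributes to two genres, so the counts no longer sum to the number of apps.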

As for the `prime_genre` column of the `ios_free` dataset, more than half of the apps belong to the `Games` primary genre (58.16%). Apps related to `Entertainment` (7.88%), `Photo & Video` (4.97%), `Education` (3.66%), and `Social Networking` (3.29%) are also common. The results show that, unlike in Google Play, most of the English free apps in the App Store are for fun/games.

In [23]:
display_table(ios_free, 12)

Games : 58.16
Entertainment : 7.88
Photo & Video : 4.97
Education : 3.66
Social Networking : 3.29
Shopping : 2.61
Utilities : 2.51
Sports : 2.14
Music : 2.05
Health & Fitness : 2.02
Productivity : 1.74
Lifestyle : 1.58
News : 1.33
Travel : 1.24
Finance : 1.12
Weather : 0.87
Food & Drink : 0.81
Reference : 0.56
Business : 0.53
Book : 0.43
Navigation : 0.19
Medical : 0.19
Catalogs : 0.12


While I was able to determine the most frequent genres/categories of English free apps in Google Play and the App Store, this does not reflect the number of users or the actual demand for the apps. 

## Most Popular Apps by Category/Genre in Google Play

To measure the number of users per category and genre of English free apps in Google Play, I computed the average number of installs per category or genre using the `Installs` column of the `android_free` data set. The analysis shows that the following categories have the highest average number of user installs: `COMMUNICATION`, `VIDEO_PLAYERS`, `SOCIAL`, `PHOTOGRAPHY`, `PRODUCTIVITY`, `GAME`, `TRAVEL_AND_LOCAL`, `ENTERTAINMENT`, `TOOLS`, and `NEWS_AND_MAGAZINES`.

In [24]:
android_category_ft = freq_table(android_free, 1)
category_installs = []

for category in android_category_ft:
    total = 0
    len_category = 0
    
    for row in android_free:
        category_app = row[1]
        
        if category_app == category:
            installs = row[5].replace('+','').replace(',','')
            installs = float(installs)
            total += installs
            len_category += 1
            
    avg_installs = total/len_category
    
    category_vs_installs = (avg_installs, category)
    category_installs.append(category_vs_installs)
    
category_installs_sorted = sorted(category_installs, reverse=True)
for category in category_installs_sorted:
    print(category[1], ":", f"{category[0]:,.1f}")

COMMUNICATION : 38,456,119.2
VIDEO_PLAYERS : 24,727,872.5
SOCIAL : 23,253,652.1
PHOTOGRAPHY : 17,840,110.4
PRODUCTIVITY : 16,787,331.3
GAME : 15,588,015.6
TRAVEL_AND_LOCAL : 13,984,077.7
ENTERTAINMENT : 11,640,705.9
TOOLS : 10,801,391.3
NEWS_AND_MAGAZINES : 9,549,178.5
BOOKS_AND_REFERENCE : 8,767,811.9
SHOPPING : 7,036,877.3
PERSONALIZATION : 5,201,482.6
WEATHER : 5,074,486.2
HEALTH_AND_FITNESS : 4,188,822.0
MAPS_AND_NAVIGATION : 4,056,941.8
FAMILY : 3,695,641.8
SPORTS : 3,638,640.1
ART_AND_DESIGN : 1,986,335.1
FOOD_AND_DRINK : 1,924,897.7
EDUCATION : 1,833,495.1
BUSINESS : 1,712,290.1
LIFESTYLE : 1,437,816.3
FINANCE : 1,387,692.5
HOUSE_AND_HOME : 1,331,540.6
DATING : 854,028.8
COMICS : 817,657.3
AUTO_AND_VEHICLES : 647,317.8
LIBRARIES_AND_DEMO : 638,503.7
PARENTING : 542,603.6
BEAUTY : 513,151.9
EVENTS : 253,542.2
MEDICAL : 120,550.6
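
The nested loop above rescans the whole data set once per category. The same averages can be accumulated in a single pass with running totals and counts, as in the sketch below. This is an alternative under stated assumptions: `parse_installs()` and `average_by_group()` are hypothetical helpers, and the three-column sample rows are made up.

```python
from collections import defaultdict

def parse_installs(text):
    """Turn a string like '1,000,000+' into the float 1000000.0."""
    return float(text.replace('+', '').replace(',', ''))

def average_by_group(rows, group_index, value_index, parse=float):
    """Return {group: average of the parsed values} computed in one pass."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_index]
        totals[group] += parse(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}

# Hypothetical sample rows: name, category, installs.
sample_rows = [
    ['App A', 'TOOLS', '1,000+'],
    ['App B', 'TOOLS', '3,000+'],
    ['App C', 'GAME', '10,000+'],
]

averages = average_by_group(sample_rows, 1, 2, parse=parse_installs)
for group, avg in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(group, ':', f'{avg:,.1f}')
```

Because the grouping and value columns are parameters, the same helper could be pointed at the `Genres` column, or at a ratings column with the default `parse=float`.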


As for the genres of English free apps in Google Play, the following have the highest average number of user installs: `Communication`, `Adventure;Action & Adventure`, `Video Players & Editors`, `Social`, `Arcade`, `Casual`, `Puzzle;Action & Adventure`, `Photography`, `Educational;Action & Adventure`, and `Productivity`.

In [25]:
android_genre_ft = freq_table(android_free, 9)
genre_installs = []

for genre in android_genre_ft:
    total = 0
    len_genre = 0
    
    for row in android_free:
        genre_app = row[9]
        
        if genre_app == genre:
            installs = row[5].replace('+','').replace(',','')
            installs = float(installs)
            total += installs
            len_genre += 1
            
    avg_installs = total/len_genre
    
    genre_vs_installs = (avg_installs, genre)
    genre_installs.append(genre_vs_installs)
    
genre_installs_sorted = sorted(genre_installs, reverse=True)
for genre in genre_installs_sorted:
    print(genre[1], ":", f"{genre[0]:,.1f}")

Communication : 38,456,119.2
Adventure;Action & Adventure : 35,333,333.3
Video Players & Editors : 24,947,335.8
Social : 23,253,652.1
Arcade : 22,888,365.5
Casual : 19,569,221.6
Puzzle;Action & Adventure : 18,366,666.7
Photography : 17,840,110.4
Educational;Action & Adventure : 17,016,666.7
Productivity : 16,787,331.3
Racing : 15,910,645.7
Travel & Local : 14,051,476.1
Casual;Action & Adventure : 12,916,666.7
Action : 12,603,588.9
Strategy : 11,199,902.5
Tools : 10,802,461.2
Tools;Education : 10,000,000.0
Role Playing;Brain Games : 10,000,000.0
Lifestyle;Pretend Play : 10,000,000.0
Casual;Music & Video : 10,000,000.0
Card;Action & Adventure : 10,000,000.0
Adventure;Education : 10,000,000.0
News & Magazines : 9,549,178.5
Music : 9,445,583.3
Educational;Pretend Play : 9,375,000.0
Puzzle;Brain Games : 9,280,666.7
Word : 9,094,458.7
Racing;Action & Adventure : 8,816,666.7
Books & Reference : 8,767,811.9
Puzzle : 8,302,861.9
Video Players & Editors;Music & Video : 7,500,000.0
Shopping : 7,0

Using the code below, I inspected some of the genres to get an idea of their corresponding app categories. I found that the `Adventure;Action & Adventure`, `Puzzle;Action & Adventure`, and `Educational;Action & Adventure` genres are all under the `FAMILY` category. The genre `Casual` has apps only under the `FAMILY` and `GAME` categories.

In [26]:
for app in android_free:
    if app[9] == 'Adventure;Action & Adventure':     # Set the app genre here
        print(app[0], ":", app[1])

Leo and Tig : FAMILY
Transformers Rescue Bots: Hero Adventures : FAMILY
ROBLOX : FAMILY


Looking at the average number of installs for both app categories and genres, English free apps designed for communication, photo and video, social networking, productivity, and games have the most users in Google Play.

## Most Popular Apps by Genre on the App Store

The App Store data set has no installs column, so to get an idea of the number of users per genre of English free apps in the App Store, I used the number of user ratings as a proxy. I computed the average total number of user ratings per genre using the `rating_count_tot` column of the `ios_free` data set. The results show that the `Navigation`, `Reference`, `Social Networking`, `Music`, `Weather`, `Book`, `Food & Drink`, `Finance`, `Photo & Video`, and `Travel` genres have the highest average total number of user ratings.

In [27]:
ios_genre_ft = freq_table(ios_free, 12)
genre_rating = []

for genre in ios_genre_ft:
    total = 0
    len_genre = 0
    
    for row in ios_free:
        genre_app = row[12]
        
        if genre_app == genre:
            ratings = float(row[6])
            total += ratings
            len_genre += 1
    
    avg_ratings = total/len_genre
    
    genre_vs_rating = (avg_ratings, genre)
    genre_rating.append(genre_vs_rating)
    
genre_rating_sorted = sorted(genre_rating, reverse=True)
for genre in genre_rating_sorted:
    print(genre[1], ":", f"{genre[0]:,.1f}")

Navigation : 86,090.3
Reference : 74,942.1
Social Networking : 71,548.3
Music : 57,326.5
Weather : 52,279.9
Book : 39,758.5
Food & Drink : 33,333.9
Finance : 31,467.9
Photo & Video : 28,441.5
Travel : 28,243.8
Shopping : 26,919.7
Health & Fitness : 23,298.0
Sports : 23,008.9
Games : 22,788.7
News : 21,248.0
Productivity : 21,028.4
Utilities : 18,684.5
Lifestyle : 16,485.8
Entertainment : 14,029.8
Business : 7,491.1
Education : 7,004.0
Catalogs : 4,004.0
Medical : 612.0


Since the company was looking for app profiles that have a high number of users in both Google Play and the App Store, I then compared the results of my data analysis for `android_free` and `ios_free`, as shown in the table below.


| Rank | Most installed app categories (Google Play) | Most installed app genres (Google Play) | Most rated app genres (App Store)  |
|:----:|:-------------------------------------------:|:---------------------------------------:|:----------------------------------:|
|   1  |                COMMUNICATION                |              Communication              |             Navigation             |
|   2  |                VIDEO_PLAYERS                |       Adventure;Action & Adventure      |              Reference             |
|   3  |                    SOCIAL                   |         Video Players & Editors         |          Social Networking         |
|   4  |                 PHOTOGRAPHY                 |                  Social                 |                Music               |
|   5  |                 PRODUCTIVITY                |                  Arcade                 |               Weather              |
|   6  |                     GAME                    |                  Casual                 |                Book                |
|   7  |               TRAVEL_AND_LOCAL              |        Puzzle;Action & Adventure        |            Food & Drink            |
|   8  |                ENTERTAINMENT                |               Photography               |               Finance              |
|   9  |                    TOOLS                    |      Educational;Action & Adventure     |            Photo & Video           |
|  10  |              NEWS_AND_MAGAZINES             |               Productivity              |               Travel               |

The comparison shows that English free apps for social networking, photography, and video have the most users in both Google Play and the App Store.

## Conclusions

From the results, I concluded that developing English free apps for social networking, photography, or video could be profitable for both Google Play and App Store markets.   