# App Profile Recommendation
This project analyzes free-to-download apps on both Google Play and the App Store to see which types of apps attract the most English-speaking users. More users generally means more revenue, since the main sources of revenue for free-to-download apps are usually in-app ads and in-app purchases.

### Opening and Exploring the Data
As of September 2018, there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play. We will be using 2 files that contain samples of roughly 10,000 Android apps (googleplaystore.csv) and 7,000 iOS apps (AppleStore.csv). We will use the explore_data function below to help us explore these datasets.

In [1]:
# All of our imports here
from csv import reader
import math

In [2]:
# This function prints the rows dataset[start:end] in a readable way.
# Note: if dataset still includes its header row, the header is counted in 'Number of rows'.
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

explore_data requires the dataset argument to be a list of lists. 
Below, we open the two files and create a list of lists for each one to represent the data in that file.

In [3]:
with open('AppleStore.csv', encoding='utf8') as apple_file:
    apple_reader = reader(apple_file)
    apple_data = list(apple_reader)
    
with open('googleplaystore.csv', encoding='utf8') as google_file:
    google_reader = reader(google_file)
    google_data = list(google_reader)
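
To make the list-of-lists shape concrete without needing the CSV files, here is a minimal sketch that parses a small in-memory sample with the same csv.reader. The sample text and the names sample_csv, sample_data, header, and rows are made up for illustration; only the first three column names mirror the Google Play file.

```python
from csv import reader
from io import StringIO

# Hypothetical two-line sample (header + one app); not taken from the real files.
sample_csv = StringIO(
    'App,Category,Rating\n'
    '"Example App, Free Edition",TOOLS,4.2\n'
)

# list(reader(...)) yields one inner list per CSV line.
sample_data = list(reader(sample_csv))
header, rows = sample_data[0], sample_data[1:]

print(header)
print(rows)
```

Two details are worth noticing: the quoted comma inside the app name does not split the field, and every value, including the rating, comes back as a string, which is why the later cells convert with float() before doing arithmetic.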

Let's check to make sure we correctly imported the data by looking at the first few rows, and checking the size.

In [4]:
explore_data(apple_data, 0, 10, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


['429047995', 'Pinterest', '74778624', 'USD', '0.0', '1061

In [5]:
explore_data(google_data, 0, 10, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Eve

Let's look at the columns and see if any in particular could help us identify what types of apps attract more users, starting with the App Store data (https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps). I have included links to the original datasets for more detailed descriptions of the columns.

In [6]:
print(apple_data[0])

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


It looks like there are two types of metrics that would be useful. One type describes user engagement. This includes columns like rating_count_tot and user_rating. The other type describes the content of the app. This includes columns like cont_rating and prime_genre. Other columns like price will also be important, as we will want to filter out the apps that are not free, since they are outside the scope of this analysis.

We do the same analysis on the Google Data (https://www.kaggle.com/lava18/google-play-store-apps):

In [7]:
print(google_data[0])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


Here, we can split the useful columns into the same 2 categories. For the user engagement columns, we have Rating, Reviews, and Installs. For the app content columns, we have Category, Content Rating, and Genres. Again, we will use the Type column to filter out the apps that are not free.

Also note that these are just guesses. The analysis we do in this project may show that other factors, such as how recently and how frequently an app is updated, also have a significant effect on user engagement.

In [8]:
# Let's also remove the headers from both data sets:
apple_data_no_header = apple_data[1:]
google_data_no_header = google_data[1:]

### Deleting Wrong Data

Before we analyze the data, we have to clean it first (remove/correct wrong data, remove duplicate data, and/or modify data to fit the purpose of our analysis). Remember that we only want to look at apps that are free to download and directed toward an English-speaking audience. This means we need to:

1. Remove non-English apps.
2. Remove apps that aren't free.

The Kaggle discussion board for this dataset also indicates that entry 10472 has a wrong rating: https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015. The 'Category' value is missing, so the rest of the columns are shifted one position to the left and the 'Rating' column ends up holding an invalid value. We will check this row.

In [9]:
print(google_data_no_header[10472])

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
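
The row printed above has 12 values while the header has 13 columns, which is how the shift shows up. As a minimal sketch (not part of the original analysis), rows like this can be found by comparing each row's length with the header's. The function name find_malformed_rows and the toy data are hypothetical.

```python
def find_malformed_rows(header, rows):
    """Return (index, row) pairs whose length differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(rows) if len(row) != expected]

# Hypothetical toy data: the second row is missing its Category value.
toy_header = ['App', 'Category', 'Rating']
toy_rows = [
    ['Good App', 'TOOLS', '4.2'],
    ['Shifted App', '1.9'],
    ['Other App', 'SOCIAL', '3.4'],
]

malformed = find_malformed_rows(toy_header, toy_rows)
print(malformed)
```

A length check only catches rows where a value is entirely absent; a row with an empty string in place of a value would still pass it.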


We will delete this (run this only once, or it will delete the next row as well):

In [10]:
del google_data_no_header[10472]

Let's check that the row was deleted:

In [11]:
print(google_data_no_header[10472])
print(google_data_no_header[10473])

['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']
['Sat-Fi Voice', 'COMMUNICATION', '3.4', '37', '14M', '1,000+', 'Free', '0', 'Everyone', 'Communication', 'November 21, 2014', '2.2.1.5', '2.2 and up']


After reading through the Discussion Forum for the Apple App Store, it looks like there are no weird edge cases of wrong data in the Apple data set: https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/discussion. Now let's work on removing duplicate data.

In the Google Data set, we can see that we have some duplicates. For example, here is some duplicate data for the Instagram app.

In [12]:
for app in google_data_no_header:
    app_name = app[0]
    if app_name == "Instagram":
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


Removing duplicate data is a 2-step process. We must:

1. Identify duplicate data.
2. Determine how to remove duplicate data.

For the first step, we can use the below code to help us identify duplicate data. We will first handle duplicates in the Google Play dataset.

In [13]:
duplicate_apps = []
unique_apps = []

for app in google_data_no_header:
    app_name = app[0]
    if app_name in unique_apps:
        duplicate_apps.append(app_name)
    else:
        unique_apps.append(app_name)
        
print("Number of duplicate apps in Google data: " + str(len(duplicate_apps)))
print("\n")
print("Examples of duplicate apps in Google data: " + str(duplicate_apps[:15]))

Number of duplicate apps in Google data: 1181


Examples of duplicate apps in Google data: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


Now that we have a way of identifying the duplicates, we need to determine how to delete the duplicate data. Deleting at random is not the best method here. If we look at the Instagram example above, we can see that the 4th column differs between rows. This column (Reviews) is the number of reviews. The difference in the number of reviews suggests that the data was collected at different times. We can handle this by keeping only the row with the highest number of reviews, which should be the most up-to-date row.

We will do this by creating a dictionary of {app_name: max_rating_count} for each app, then creating a new dataset from this dictionary. After we do this, we should expect to have this many rows:

In [14]:
print("Expected Length:", len(unique_apps))

Expected Length: 9659


In [15]:
reviews_max = {}

for app in google_data_no_header:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max:
        if reviews_max[name] < n_reviews:
            reviews_max[name] = n_reviews
    else:
        reviews_max[name] = n_reviews
        
print(list(reviews_max.items())[:10])

[('Photo Editor & Candy Camera & Grid & ScrapBook', 159.0), ('Coloring book moana', 974.0), ('U Launcher Lite – FREE Live Cool Themes, Hide Apps', 87510.0), ('Sketch - Draw & Paint', 215644.0), ('Pixel Draw - Number Art Coloring Book', 967.0), ('Paper flowers instructions', 167.0), ('Smoke Effect Photo Maker - Smoke Editor', 178.0), ('Infinite Painter', 36815.0), ('Garden Coloring Book', 13791.0), ('Kids Paint Free - Drawing Fun', 121.0)]


Let's check to see that the length of the dictionary is 9659:

In [16]:
print(len(reviews_max))

9659


Now let's build our new, cleaned data set with duplicate rows removed:

In [17]:
android_clean = []
already_added = []

for app in google_data_no_header:
    name = app[0]
    n_reviews = float(app[3])
    if name not in already_added and n_reviews == reviews_max[name]:
        android_clean.append(app)
        already_added.append(name)

In the above code block, we create two lists: android_clean (a list of lists that will be our cleaned data set with duplicate rows removed) and already_added (a list of app names we have already added). We iterate through the Google data set, and if the review count is the max review count for that app, we add that row to the android_clean list and add the app name to already_added. We need already_added because some duplicate rows are completely identical (even the review counts may be the same), and without it each of those rows would be added.
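
As an alternative sketch, the two passes can be folded into one by storing the whole row in the dictionary instead of only the maximum count. The helper name dedupe_by_max is hypothetical, the toy rows are shortened to four columns, and the 'Box' values are invented for illustration. Unlike the cell above, the result is ordered by each app's first appearance (this relies on dictionaries preserving insertion order, Python 3.7+).

```python
def dedupe_by_max(rows, name_idx, count_idx):
    """Keep one row per name: the first row seen with the highest count."""
    best = {}
    for row in rows:
        name = row[name_idx]
        count = float(row[count_idx])
        # Strict '>' keeps the earliest row when counts tie,
        # so fully identical duplicates are added only once.
        if name not in best or count > float(best[name][count_idx]):
            best[name] = row
    return list(best.values())

toy_rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Box', 'BUSINESS', '4.2', '159872'],   # invented values
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
]

deduped = dedupe_by_max(toy_rows, name_idx=0, count_idx=3)
print(deduped)
```

Because membership tests on a dictionary are constant time on average, this variant also avoids the repeated list scans that `name not in already_added` performs.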

Let's check that the length of our android_clean list is 9659.

In [18]:
print(len(android_clean))

9659


Now let's work on removing apps that are for non-English speakers. For this exercise, we will assume that any app that has a non-English character in its name is not for English speakers. For example:

In [19]:
print(apple_data_no_header[813][1])
print(apple_data_no_header[6731][1])
print(android_clean[4412][0])
print(android_clean[7940][0])

爱奇艺PPS -《欢乐颂2》电视剧热播
【脱出ゲーム】絶対に最後までプレイしないで 〜謎解き＆ブロックパズル〜
中国語 AQリスニング
لعبة تقدر تربح DZ


How do we identify characters that are not English? Every character has an associated numeric code (its Unicode code point), and the characters commonly used in English text belong to the ASCII set, whose codes fall within the range of 0 - 127. We can obtain this number for a character by using the ord() built-in function. Let's write a function that accepts a string, iterates through its characters, then checks these values. It should return True if the string only has English characters (values within 0 - 127) and False otherwise.

In [20]:
def is_English_old(string):
    for character in string:
        if ord(character) > 127:
            return False
    return True

# Testing our function:
print(is_English_old('Instagram'))
print(is_English_old('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_English_old('Docs To Go™ Free Office Suite'))
print(is_English_old('Instachat 😜'))

True
False
False
False


We can see a clear weakness with our function above. It considers apps whose names contain special characters like ™ or emojis like 😜 to be non-English, because the values ord() returns for these characters fall outside of the 0 - 127 ASCII range.

In [21]:
print(ord('™'))
print(ord('😜'))

8482
128540


We will resolve this by allowing the app name to have up to 3 characters outside of the 0 - 127 ASCII range. This is not perfect, but should let us keep good data in our dataset that we would otherwise throw out.

In [22]:
def is_English(string):
    num_non_english = 0
    for character in string:
        if ord(character) > 127:
            num_non_english += 1
            if num_non_english > 3:
                return False
    return True

# Testing our new function:
print(is_English('Instagram'))
print(is_English('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_English('Docs To Go™ Free Office Suite'))
print(is_English('Instachat 😜')) 

True
False
True
True


Let's now clean the android_clean dataset so it only contains apps for English speakers:

In [23]:
android_clean_eng = []
for app in android_clean:
    app_name = app[0]
    if is_English(app_name):
        android_clean_eng.append(app)
        
print("We now have this many rows:", len(android_clean_eng))
print(list(reviews_max.items())[:10])

We now have this many rows: 9614
[('Photo Editor & Candy Camera & Grid & ScrapBook', 159.0), ('Coloring book moana', 974.0), ('U Launcher Lite – FREE Live Cool Themes, Hide Apps', 87510.0), ('Sketch - Draw & Paint', 215644.0), ('Pixel Draw - Number Art Coloring Book', 967.0), ('Paper flowers instructions', 167.0), ('Smoke Effect Photo Maker - Smoke Editor', 178.0), ('Infinite Painter', 36815.0), ('Garden Coloring Book', 13791.0), ('Kids Paint Free - Drawing Fun', 121.0)]


To summarize everything that we've done so far for the Google data set, we:

1. Removed the 1 bad row that was missing data in its Category column.
2. Removed duplicates, keeping for each app the single record that has the most user reviews.
3. Removed apps for non-English-speaking users.

Let's now apply the last two steps to the Apple data set (we found no bad rows there).

Let's remind ourselves of the columns in the Apple Data Set:

In [24]:
print(apple_data[0])

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


We will need the 'track_name' (index 1) and 'rating_count_tot' (index 5).

In [25]:
# Building max_ratings_apple dictionary.
max_ratings_apple = {}
for app in apple_data_no_header:
    app_name = app[1]
    rating_count = float(app[5])
    if app_name in max_ratings_apple:
        if max_ratings_apple[app_name] < rating_count:
            max_ratings_apple[app_name] = rating_count
    else:
        max_ratings_apple[app_name] = rating_count 
        
# Building new Apple dataset with no duplicates.
apple_clean = []
apple_apps_added = []
for app in apple_data_no_header:
    app_name = app[1]
    rating_count = float(app[5])
    max_rating = max_ratings_apple[app_name]
    if app_name not in apple_apps_added and rating_count == max_rating:
        apple_clean.append(app)
        apple_apps_added.append(app_name)

# Removing non-English apps in Apple dataset.
apple_clean_eng = []
for app in apple_clean:
    app_name = app[1]
    if is_English(app_name):
        apple_clean_eng.append(app)
        
print("Inspect apple data:", apple_clean_eng[:10])

Inspect apple data: [['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'], ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1'], ['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1'], ['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1'], ['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1'], ['429047995', 'Pinterest', '74778624', 'USD', '0.0', '1061624', '1814', '4.5', '4.0', '6.26', '12+', 'Social Networking', '37', '5', '27', '1'], ['282935706', 'Bible', '92774400', 'USD', '0.0', '985920', '5320', '4.5', '5.0', '7.5.1', '4+', 'Reference', '37', '5', '45', '

We are almost done with cleaning all our data. The last step is to remove the non-free apps. Let's check which columns in each dataset give us this information.

In [26]:
print(google_data[0])
print(apple_data[0])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


In the Google data set, we will check if the Type column is 'Free' (index 6), and in the Apple data set, we will check if the price column is 0.0 (index 4).

In [27]:
android_clean_eng_free = []
for google_app in android_clean_eng:
    app_type = google_app[6]
    if app_type == 'Free':
        android_clean_eng_free.append(google_app)
        
apple_clean_eng_free = []
for apple_app in apple_clean_eng:
    app_price = float(apple_app[4])
    if app_price == 0:
        apple_clean_eng_free.append(apple_app)

print("We have this many free, English apps in Google data set:", len(android_clean_eng_free))
print("We have this many free, English apps in Apple data set:", len(apple_clean_eng_free))

We have this many free, English apps in Google data set: 8863
We have this many free, English apps in Apple data set: 3220


### Analyzing the Data
Let's suppose that our plan to deploy the app involves 3 steps:

1. Build a minimal Android version of the app and deploy on Google Play.
2. If the app has a good response from users, develop the app further.
3. If the app is profitable after 6 months, we deploy an iOS version of the app to the App Store.

We need an app that is successful in both markets.

Let's start off by first getting a sense of what genres are the most common for each market. We'll do this by building some frequency tables.

Let's inspect the columns of each dataset to see which we could use to develop frequency tables for genres.

In [28]:
print(apple_data[0])
print(google_data[0])

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


For Apple, we can use 'prime_genre' (index 11) to determine genre.

For Google, we can use 'Category' (index 1) and 'Genres' (index 9) to determine genre.

We will first create a function that accepts a dataset and a column index and returns a frequency table for that column as a dictionary, with the frequencies expressed as percentages.

In [29]:
'''
These 2 functions will generate a dictionary where the key is the column value of interest, 
and the value is the frequency of that column value as a percentage.
''' 
def freq_table(dataset, index):
    total_num_samples = len(dataset)
    freq_dict = {}
    
    for row in dataset:
        index_val = row[index]
        if index_val in freq_dict:
            freq_dict[index_val] += 1
        else:
            freq_dict[index_val] = 1
            
    return generate_percentage_table(freq_dict, total_num_samples)

def generate_percentage_table(freq_dict, total_num_samples):
    freq_pct_dict = {}
    for key in freq_dict:
        freq_pct_dict[key] = 100 * freq_dict[key] / total_num_samples
    
    return freq_pct_dict
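
For comparison, here is a compact sketch of the same idea using collections.Counter from the standard library. The name freq_table_pct and the toy rows are hypothetical; it is meant as a quick way to sanity-check the percentages, not as a replacement for the functions above.

```python
from collections import Counter

def freq_table_pct(dataset, index):
    """Map each distinct value in column `index` to its share of rows, in percent."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return {value: 100 * n / total for value, n in counts.items()}

# Hypothetical toy rows: [app name, genre].
toy = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
table = freq_table_pct(toy, 1)
print(table)  # {'Games': 75.0, 'Music': 25.0}
```

The percentages of a full table should add up to 100 (up to floating-point rounding), which is an easy check to run on the real datasets as well.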

We will also implement a helper function that will provide a sorted version of the frequency table.

In [30]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

Let's call our display_table function on the Apple data set's prime_genre and the Google data set's Category and Genres columns.

In [31]:
print("Frequencies (%) for Apple apps prime genre:")
display_table(apple_clean_eng_free, 11)

Frequencies (%) for Apple apps prime genre:
Games : 58.13664596273292
Entertainment : 7.888198757763975
Photo & Video : 4.968944099378882
Education : 3.6645962732919255
Social Networking : 3.2919254658385095
Shopping : 2.608695652173913
Utilities : 2.515527950310559
Sports : 2.142857142857143
Music : 2.049689440993789
Health & Fitness : 2.018633540372671
Productivity : 1.7391304347826086
Lifestyle : 1.5838509316770186
News : 1.3354037267080745
Travel : 1.2422360248447204
Finance : 1.1180124223602483
Weather : 0.8695652173913043
Food & Drink : 0.8074534161490683
Reference : 0.5590062111801242
Business : 0.5279503105590062
Book : 0.43478260869565216
Navigation : 0.18633540372670807
Medical : 0.18633540372670807
Catalogs : 0.12422360248447205


In [32]:
print("Frequencies (%) for Google apps Category:")
display_table(android_clean_eng_free, 1)

Frequencies (%) for Google apps Category:
FAMILY : 18.8987927338373
GAME : 9.725826469592688
TOOLS : 8.462146000225657
BUSINESS : 4.592124562789123
LIFESTYLE : 3.9038700214374367
PRODUCTIVITY : 3.8925871601038025
FINANCE : 3.700778517432021
MEDICAL : 3.5315355974275078
SPORTS : 3.3961412614238973
PERSONALIZATION : 3.317161232088458
COMMUNICATION : 3.2381812027530184
HEALTH_AND_FITNESS : 3.0802211440821394
PHOTOGRAPHY : 2.944826808078529
NEWS_AND_MAGAZINES : 2.798149610741284
SOCIAL : 2.6627552747376733
TRAVEL_AND_LOCAL : 2.335552296062281
SHOPPING : 2.2452894053932075
BOOKS_AND_REFERENCE : 2.1437436533904997
DATING : 1.8616721200496447
VIDEO_PLAYERS : 1.7939749520478394
MAPS_AND_NAVIGATION : 1.399074805370642
FOOD_AND_DRINK : 1.241114746699763
EDUCATION : 1.1621347173643235
ENTERTAINMENT : 0.9590432133589079
LIBRARIES_AND_DEMO : 0.9364774906916394
AUTO_AND_VEHICLES : 0.9251946293580052
HOUSE_AND_HOME : 0.8236488773552973
WEATHER : 0.8010831546880289
EVENTS : 0.7108202640189553
PARENTIN

In [33]:
print("Frequencies (%) for Google apps Genres:")
display_table(android_clean_eng_free, 9)

Frequencies (%) for Google apps Genres:
Tools : 8.450863138892023
Entertainment : 6.070179397495205
Education : 5.348076272142615
Business : 4.592124562789123
Productivity : 3.8925871601038025
Lifestyle : 3.8925871601038025
Finance : 3.700778517432021
Medical : 3.5315355974275078
Sports : 3.4638384294257025
Personalization : 3.317161232088458
Communication : 3.2381812027530184
Action : 3.102786866749408
Health & Fitness : 3.0802211440821394
Photography : 2.944826808078529
News & Magazines : 2.798149610741284
Social : 2.6627552747376733
Travel & Local : 2.324269434728647
Shopping : 2.2452894053932075
Books & Reference : 2.1437436533904997
Simulation : 2.042197901387792
Dating : 1.8616721200496447
Arcade : 1.8503892587160105
Video Players & Editors : 1.771409229380571
Casual : 1.7601263680469368
Maps & Navigation : 1.399074805370642
Food & Drink : 1.241114746699763
Puzzle : 1.1282861333634209
Racing : 0.9928917973598105
Role Playing : 0.9364774906916394
Libraries & Demo : 0.9364774906916

Keep in mind that the analysis we are doing on the Google and Apple app stores covers only free apps for English speakers in these samples. The conclusions we make here may not reflect the trends of the actual app stores.

Let's first look at the App Store's prime_genre frequencies. The most common genre is "Games", and the runner-up is "Entertainment". In fact, "Games" apps make up the majority (about 58%) of the free English apps in our App Store data. The least common types of apps are "Navigation", "Medical", and "Catalogs", all clustered at the bottom. Most of the apps tend to be geared towards entertainment ("Games", "Entertainment", "Photo & Video", "Social Networking", "Sports", "Music"). Even if we took out the "Games" percentage, the entertainment apps are more common than the practical apps like education, shopping, utilities, productivity, and lifestyle.

This frequency table alone cannot be used to recommend the type of app that is most popular. Just because there are a lot of "Games" apps in the App Store does not mean that a "Games" app will be the most popular.

We see some slightly different patterns in the Google app store. The most common "Category" is Family, with Game being the runner-up and Tools in third place. The most common genre is "Tools", with "Entertainment" being the runner-up. In addition, there is no one genre or category that completely dominates the Google store like game apps do in the App Store. Everything is a lot more evenly distributed. One other interesting note is that there is a "Tools" value for both Category and Genres that is in the top 3 of both frequency tables.

Again, it is important to note that these frequency tables do not show which genre of app will generate the most users. They only show which types of free, English-language apps are most commonly found on the App Store and Google Play.

Now, we will try to figure out what the most popular types of apps are. We can do this by calculating the average number of users for each genre. For Google Play, we can use Installs (Index 5). The App Store data has no install counts, so we will use rating_count_tot (Index 5) as a proxy.

Starting off with Apple, we will:
1. Get the unique prime_genre values of the Apple dataset.
2. Calculate the total number of rating_count_tot for each genre.
3. Calculate the total number of apps belonging to each genre.
4. Divide the total number of user ratings by the total number of apps for that genre to get an estimate of the number of users per app in that genre.
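
Before running this on the full dataset, the steps above can be checked on a toy table where the averages are easy to compute by hand. This is a single-pass sketch using collections.defaultdict; the name average_by_group and the toy rows are hypothetical.

```python
from collections import defaultdict

def average_by_group(rows, group_idx, value_idx):
    """Average the numeric column `value_idx` within each value of `group_idx`."""
    totals = defaultdict(float)  # sum of ratings per genre
    counts = defaultdict(int)    # number of apps per genre
    for row in rows:
        totals[row[group_idx]] += float(row[value_idx])
        counts[row[group_idx]] += 1
    return {group: totals[group] / counts[group] for group in totals}

# Hypothetical toy rows: [genre, rating count].
toy = [['Games', '100'], ['Games', '300'], ['Music', '50']]
averages = average_by_group(toy, 0, 1)

# Rank genres from highest to lowest average.
ranked = sorted(averages.items(), key=lambda item: item[1], reverse=True)
print(ranked)  # [('Games', 200.0), ('Music', 50.0)]
```

Here "Games" averages (100 + 300) / 2 = 200 ratings per app, and "Music" has a single app with 50.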

In [34]:
prime_genre_freq_table = freq_table(apple_clean_eng_free, 11)
apple_genre_popularity = {}

for genre in prime_genre_freq_table:
    num_apps = 0
    num_ratings = 0
    
    for app in apple_clean_eng_free:
        app_genre = app[11]
        if app_genre == genre:
            num_apps += 1
            num_ratings += float(app[5])
    
    genre_average_num_ratings = num_ratings / num_apps
    apple_genre_popularity[genre] = genre_average_num_ratings

sorted_apple_genre_popularity = {
    k: v for k, v in sorted(apple_genre_popularity.items(), key=lambda item: item[1], reverse=True)
}
print(sorted_apple_genre_popularity)

{'Navigation': 86090.33333333333, 'Reference': 74942.11111111111, 'Social Networking': 71548.34905660378, 'Music': 57326.530303030304, 'Weather': 52279.892857142855, 'Book': 39758.5, 'Food & Drink': 33333.92307692308, 'Finance': 31467.944444444445, 'Photo & Video': 28441.54375, 'Travel': 28243.8, 'Shopping': 26919.690476190477, 'Health & Fitness': 23298.015384615384, 'Sports': 23008.898550724636, 'Games': 22812.92467948718, 'News': 21248.023255813954, 'Productivity': 21028.410714285714, 'Utilities': 18684.456790123455, 'Lifestyle': 16485.764705882353, 'Entertainment': 14029.830708661417, 'Business': 7491.117647058823, 'Education': 7003.983050847458, 'Catalogs': 4004.0, 'Medical': 612.0}


Based on the above data, a Navigation app would get us the most users on average. However, Navigation is one of the least common genres on the App Store (about 0.19% of the free English apps in our data), so its high average is computed from very few apps. Navigation apps may also have a difficult entry to market, either because they are hard to build or because it is hard to survive against the competitors. As a result, based on the Apple data I would recommend a Social Networking app: the genre is more common, has a high average number of user ratings, and fits a plan in which our only source of revenue is ad revenue, which depends directly on the number of users.

Now we will do the same for Google Play data. If we look at the Installs column though, we see the following values:

In [35]:
display_table(android_clean_eng_free, 5)

1,000,000+ : 15.728308699086089
100,000+ : 11.553650005641432
10,000,000+ : 10.549475346947986
10,000+ : 10.199706645605325
1,000+ : 8.394448832223851
100+ : 6.916393997517771
5,000,000+ : 6.826131106848697
500,000+ : 5.562450637481666
50,000+ : 4.77265034412727
5,000+ : 4.5131445334536835
10+ : 3.542818458761142
500+ : 3.2494640640866526
50,000,000+ : 2.3017037120613786
100,000,000+ : 2.1324607920568655
50+ : 1.9180864267178157
5+ : 0.7898002933543947
1+ : 0.5077287600135394
500,000,000+ : 0.270788672007221
1,000,000,000+ : 0.2256572266726842
0+ : 0.04513144533453684


It looks like instead of specific values, we have "binned" values. In order to handle this, we will treat values like "1,000,000+" and "50+" as "1,000,000" and "50". In order to do this, we need to remove the "+" and "," characters so we can convert the values into floats.
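
The conversion described above can be sketched as a small helper. The name parse_installs is hypothetical; the later cells do the same thing inline with two str.replace calls.

```python
def parse_installs(binned):
    """Convert a binned install count such as '1,000,000+' to a float."""
    return float(binned.replace('+', '').replace(',', ''))

print(parse_installs('1,000,000+'))  # 1000000.0
print(parse_installs('50+'))         # 50.0
print(parse_installs('0+'))          # 0.0
```

Note that each converted value is the lower bound of its bin, so averages built from these numbers will tend to understate the true install counts.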

Before that, we will first generate a frequency table of the Category column to get the unique app genres.

In [36]:
category_freq_table = freq_table(android_clean_eng_free, 1)
category_freq_table

{'ART_AND_DESIGN': 0.6431230960171499,
 'AUTO_AND_VEHICLES': 0.9251946293580052,
 'BEAUTY': 0.5979916506826131,
 'BOOKS_AND_REFERENCE': 2.1437436533904997,
 'BUSINESS': 4.592124562789123,
 'COMICS': 0.6205573733498815,
 'COMMUNICATION': 3.2381812027530184,
 'DATING': 1.8616721200496447,
 'EDUCATION': 1.1621347173643235,
 'ENTERTAINMENT': 0.9590432133589079,
 'EVENTS': 0.7108202640189553,
 'FINANCE': 3.700778517432021,
 'FOOD_AND_DRINK': 1.241114746699763,
 'HEALTH_AND_FITNESS': 3.0802211440821394,
 'HOUSE_AND_HOME': 0.8236488773552973,
 'LIBRARIES_AND_DEMO': 0.9364774906916394,
 'LIFESTYLE': 3.9038700214374367,
 'GAME': 9.725826469592688,
 'FAMILY': 18.8987927338373,
 'MEDICAL': 3.5315355974275078,
 'SOCIAL': 2.6627552747376733,
 'SHOPPING': 2.2452894053932075,
 'PHOTOGRAPHY': 2.944826808078529,
 'SPORTS': 3.3961412614238973,
 'TRAVEL_AND_LOCAL': 2.335552296062281,
 'TOOLS': 8.462146000225657,
 'PERSONALIZATION': 3.317161232088458,
 'PRODUCTIVITY': 3.8925871601038025,
 'PARENTING': 0.6

In [37]:
google_category_popularity = {}

for category in category_freq_table:
    num_apps = 0
    total_installs = 0
    for app in android_clean_eng_free:
        app_category = app[1]  # Category column
        if app_category == category:
            num_apps += 1
            # Installs are binned strings such as '1,000,000+'; strip '+' and ','
            installs = float(app[5].replace('+', '').replace(',', ''))
            total_installs += installs
    # Average installs across all apps in this category
    avg_installs_per_app = total_installs / num_apps
    google_category_popularity[category] = avg_installs_per_app

sorted_google_category_popularity = {
    k: v for k, v in sorted(google_category_popularity.items(), key=lambda item: item[1], reverse=True)
}
print(sorted_google_category_popularity)

{'COMMUNICATION': 38456119.167247385, 'VIDEO_PLAYERS': 24727872.452830188, 'SOCIAL': 23253652.127118643, 'PHOTOGRAPHY': 17840110.40229885, 'PRODUCTIVITY': 16787331.344927534, 'GAME': 15588015.603248259, 'TRAVEL_AND_LOCAL': 13984077.710144928, 'ENTERTAINMENT': 11640705.88235294, 'TOOLS': 10801391.298666667, 'NEWS_AND_MAGAZINES': 9549178.467741935, 'BOOKS_AND_REFERENCE': 8767811.894736841, 'SHOPPING': 7036877.311557789, 'PERSONALIZATION': 5201482.6122448975, 'WEATHER': 5074486.197183099, 'HEALTH_AND_FITNESS': 4188821.9853479853, 'MAPS_AND_NAVIGATION': 4056941.7741935486, 'FAMILY': 3697848.1731343283, 'SPORTS': 3638640.1428571427, 'ART_AND_DESIGN': 1986335.0877192982, 'FOOD_AND_DRINK': 1924897.7363636363, 'EDUCATION': 1833495.145631068, 'BUSINESS': 1712290.1474201474, 'LIFESTYLE': 1437816.2687861272, 'FINANCE': 1387692.475609756, 'HOUSE_AND_HOME': 1331540.5616438356, 'DATING': 854028.8303030303, 'COMICS': 817657.2727272727, 'AUTO_AND_VEHICLES': 647317.8170731707, 'LIBRARIES_AND_DEMO': 638
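The loop above scans the whole dataset once per category. As a sketch of an alternative, the same averages can be accumulated in a single pass. The helper name `average_installs_by_column` and the rows in `sample_apps` are made up for illustration; only the column positions (index 1 for Category, index 5 for Installs) follow the notebook.

```python
from collections import defaultdict

# Hypothetical rows shaped like the cleaned Android data:
# index 1 holds the Category and index 5 holds the Installs string.
sample_apps = [
    ['App A', 'SOCIAL', '4.1', '100', '5M', '1,000,000+'],
    ['App B', 'SOCIAL', '4.5', '200', '9M', '500,000+'],
    ['App C', 'TOOLS', '4.0', '50', '2M', '10,000+'],
]

def average_installs_by_column(dataset, group_index, installs_index=5):
    """Return {group: average installs}, reading each row only once."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for app in dataset:
        installs = float(app[installs_index].replace('+', '').replace(',', ''))
        totals[app[group_index]] += installs
        counts[app[group_index]] += 1
    return {group: totals[group] / counts[group] for group in totals}

averages = average_installs_by_column(sample_apps, 1)
for category, avg in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(category, ':', avg)
```

Passing `9` instead of `1` as `group_index` would group by the Genres column in the same way.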

Remember that our goal is to find an app genre that would be successful in both the Google and Apple app stores. It looks like Social/Social Networking apps would be a good choice since that genre is the only one in the top 3 for both Google and Apple app stores by popularity.

Let's look at the "Genres" column instead of the "Category" column for the Google data and see what trends we see.

In [38]:
genres_freq_table = freq_table(android_clean_eng_free, 9)
genres_freq_table

{'Art & Design': 0.5979916506826131,
 'Art & Design;Creativity': 0.06769716800180525,
 'Auto & Vehicles': 0.9251946293580052,
 'Beauty': 0.5979916506826131,
 'Books & Reference': 2.1437436533904997,
 'Business': 4.592124562789123,
 'Comics': 0.6092745120162473,
 'Comics;Creativity': 0.01128286133363421,
 'Communication': 3.2381812027530184,
 'Dating': 1.8616721200496447,
 'Education': 5.348076272142615,
 'Education;Creativity': 0.04513144533453684,
 'Education;Education': 0.33848584000902626,
 'Education;Pretend Play': 0.05641430666817105,
 'Education;Brain Games': 0.033848584000902626,
 'Entertainment': 6.070179397495205,
 'Entertainment;Brain Games': 0.07898002933543946,
 'Entertainment;Creativity': 0.033848584000902626,
 'Entertainment;Music & Video': 0.16924292000451313,
 'Events': 0.7108202640189553,
 'Finance': 3.700778517432021,
 'Food & Drink': 1.241114746699763,
 'Health & Fitness': 3.0802211440821394,
 'House & Home': 0.8236488773552973,
 'Libraries & Demo': 0.936477490691639

In [39]:
google_genre_popularity = {}

for genre in genres_freq_table:
    num_apps = 0
    total_installs = 0
    for app in android_clean_eng_free:
        app_genre = app[9]
        if app_genre == genre:
            num_apps += 1
            installs = float(app[5].replace('+','').replace(',',''))
            total_installs += installs
    avg_installs_per_app = total_installs / num_apps
    google_genre_popularity[genre] = avg_installs_per_app

sorted_google_genre_popularity = {
    k: v for k, v in sorted(google_genre_popularity.items(), key=lambda item: item[1], reverse=True)
}
print(sorted_google_genre_popularity)

{'Communication': 38456119.167247385, 'Adventure;Action & Adventure': 35333333.333333336, 'Video Players & Editors': 24947335.796178345, 'Social': 23253652.127118643, 'Arcade': 22888365.48780488, 'Casual': 19569221.602564104, 'Puzzle;Action & Adventure': 18366666.666666668, 'Photography': 17840110.40229885, 'Educational;Action & Adventure': 17016666.666666668, 'Productivity': 16787331.344927534, 'Racing': 15910645.681818182, 'Travel & Local': 14051476.145631067, 'Casual;Action & Adventure': 12916666.666666666, 'Action': 12603588.872727273, 'Strategy': 11339901.3125, 'Tools': 10802461.246995995, 'Lifestyle;Pretend Play': 10000000.0, 'Casual;Music & Video': 10000000.0, 'Tools;Education': 10000000.0, 'Card;Action & Adventure': 10000000.0, 'Adventure;Education': 10000000.0, 'Role Playing;Brain Games': 10000000.0, 'News & Magazines': 9549178.467741935, 'Music': 9445583.333333334, 'Educational;Pretend Play': 9375000.0, 'Puzzle;Brain Games': 9280666.666666666, 'Word': 9094458.695652174, 'Raci

We can see that Communication apps are the most popular in both the Category and Genre analysis. Video Player apps also rank near the top of both tables, and Social apps remain high in both. This suggests that if we want an app that will attract a large user base, it might be best to create a Social Networking app where users can watch and make videos and communicate with each other via talk/text/video (basically just Instagram/TikTok...).

### Changing the Scope

Now, let's suppose that instead of limiting ourselves to ad revenue as our only revenue stream, we wanted to make revenue via in-app purchases and subscriptions. In this case, instead of looking at the number of users and installs, we could try to find out which genres users seem to like the most. We are making the reasonable assumption that a better-liked app is more likely to generate in-app purchases and subscriptions. For this, we can examine app ratings for each genre.

We will calculate the average app rating for each genre, starting with the App Store.

In [40]:
prime_genre_freq_table = freq_table(apple_clean_eng_free, 11)
apple_genre_likeability = {}

for genre in prime_genre_freq_table:
    num_apps = 0
    total_score = 0
    
    for app in apple_clean_eng_free:
        app_genre = app[11]
        if app_genre == genre:
            num_apps += 1
            total_score += float(app[7])
    
    genre_average_user_rating = total_score / num_apps
    apple_genre_likeability[genre] = genre_average_user_rating

sorted_apple_genre_likeability = {
    k: v for k, v in sorted(apple_genre_likeability.items(), key=lambda item: item[1], reverse=True)
}
print(sorted_apple_genre_likeability)

{'Catalogs': 4.125, 'Games': 4.037393162393163, 'Productivity': 4.0, 'Business': 3.9705882352941178, 'Shopping': 3.9702380952380953, 'Music': 3.946969696969697, 'Photo & Video': 3.903125, 'Navigation': 3.8333333333333335, 'Health & Fitness': 3.769230769230769, 'Reference': 3.6666666666666665, 'Education': 3.635593220338983, 'Food & Drink': 3.6346153846153846, 'Social Networking': 3.5943396226415096, 'Entertainment': 3.5393700787401574, 'Utilities': 3.5308641975308643, 'Travel': 3.4875, 'Weather': 3.482142857142857, 'Lifestyle': 3.411764705882353, 'Finance': 3.375, 'News': 3.244186046511628, 'Book': 3.0714285714285716, 'Sports': 3.0652173913043477, 'Medical': 3.0}


At first glance, it looks like Catalogs apps are the most liked apps. However, we should remember that there are very few Catalogs apps on the store, so this average rests on a small sample and may not be reliable.

In [41]:
print("Frequencies (%) for Apple apps prime genre:")
display_table(apple_clean_eng_free, 11)

Frequencies (%) for Apple apps prime genre:
Games : 58.13664596273292
Entertainment : 7.888198757763975
Photo & Video : 4.968944099378882
Education : 3.6645962732919255
Social Networking : 3.2919254658385095
Shopping : 2.608695652173913
Utilities : 2.515527950310559
Sports : 2.142857142857143
Music : 2.049689440993789
Health & Fitness : 2.018633540372671
Productivity : 1.7391304347826086
Lifestyle : 1.5838509316770186
News : 1.3354037267080745
Travel : 1.2422360248447204
Finance : 1.1180124223602483
Weather : 0.8695652173913043
Food & Drink : 0.8074534161490683
Reference : 0.5590062111801242
Business : 0.5279503105590062
Book : 0.43478260869565216
Navigation : 0.18633540372670807
Medical : 0.18633540372670807
Catalogs : 0.12422360248447205


By contrast, Games have the 2nd highest average user rating and are by far the most common type of app in our App Store data (about 58% of the sample). Assuming that higher user ratings mean a greater chance of in-app purchases and subscriptions, a Game looks like a strong candidate, since games combine very high ratings with a large sample size.
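One way to make this kind of comparison explicit is to report each genre's average rating together with its app count, and to set aside genres with too few apps. The sketch below uses made-up `(genre, rating)` pairs, a hypothetical helper `rating_summary`, and an arbitrary `min_apps` threshold; none of these come from the notebook.

```python
# Hypothetical (genre, user_rating) pairs; not taken from the dataset.
sample_ratings = [
    ('Catalogs', 4.5), ('Catalogs', 4.0),
    ('Games', 4.0), ('Games', 4.5), ('Games', 3.5), ('Games', 4.0),
]

def rating_summary(pairs, min_apps=3):
    """Return {genre: (average rating, app count)} for genres with enough apps."""
    totals, counts = {}, {}
    for genre, rating in pairs:
        totals[genre] = totals.get(genre, 0.0) + rating
        counts[genre] = counts.get(genre, 0) + 1
    return {
        genre: (totals[genre] / counts[genre], counts[genre])
        for genre in totals
        if counts[genre] >= min_apps  # skip genres with too few apps to trust
    }

summary = rating_summary(sample_ratings)
print(summary)  # {'Games': (4.0, 4)}
```

The right threshold depends on the dataset; the value of 3 here is only for the toy example.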

We will now do the same for the Genres and Category columns of the Google data. Looking at the Rating column, we can see the following:

In [42]:
for app in android_clean_eng_free:
    if math.isnan(float(app[2])):
        print(app[2])

NaN
NaN
NaN
... (250 NaN lines in total)


We see quite a few NaN values, which will throw off our calculations. Adding and dividing with NaN values results in NaN. We will need to ignore these values in our calculations.
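A short illustration of why NaN values have to be filtered out before averaging. The `ratings` list is a made-up stand-in for the Rating column.

```python
import math

nan = float('NaN')  # missing ratings appear as the string 'NaN'

print(4.5 + nan)        # nan
print((4.5 + nan) / 2)  # nan
print(nan == nan)       # False, so test with math.isnan() rather than ==

ratings = ['4.5', 'NaN', '3.5']  # hypothetical Rating values
valid = [float(r) for r in ratings if not math.isnan(float(r))]
average = sum(valid) / len(valid)
print(average)          # 4.0
```

Note that the NaN entries are dropped from both the total and the count, so they do not pull the average down.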

In [43]:
# Checking average user score for each Category.
google_category_likeability = {}

for category in category_freq_table:
    num_apps = 0
    total_user_score = 0
    for app in android_clean_eng_free:
        app_category = app[1]
        if app_category == category:
            user_score = float(app[2])
            if not math.isnan(user_score):
                num_apps += 1
                total_user_score += user_score
    avg_user_score_per_app = total_user_score / num_apps
    google_category_likeability[category] = avg_user_score_per_app

sorted_google_category_likeability = {
    k: v for k, v in sorted(google_category_likeability.items(), key=lambda item: item[1], reverse=True)
}
print(sorted_google_category_likeability)

{'EVENTS': 4.435555555555557, 'BOOKS_AND_REFERENCE': 4.347798742138364, 'EDUCATION': 4.3401960784313705, 'PARENTING': 4.3395833333333345, 'ART_AND_DESIGN': 4.338181818181818, 'PERSONALIZATION': 4.300000000000001, 'BEAUTY': 4.278571428571428, 'SOCIAL': 4.252736318407958, 'HEALTH_AND_FITNESS': 4.236051502145922, 'GAME': 4.2320341047503085, 'WEATHER': 4.229230769230768, 'SHOPPING': 4.227528089887643, 'SPORTS': 4.212605042016807, 'AUTO_AND_VEHICLES': 4.184722222222223, 'PRODUCTIVITY': 4.1819148936170265, 'LIBRARIES_AND_DEMO': 4.178125, 'COMICS': 4.177358490566039, 'FAMILY': 4.171361185983833, 'FOOD_AND_DRINK': 4.1673913043478255, 'PHOTOGRAPHY': 4.164516129032258, 'MEDICAL': 4.147807017543858, 'HOUSE_AND_HOME': 4.140983606557378, 'FINANCE': 4.128373702422146, 'COMMUNICATION': 4.126923076923076, 'ENTERTAINMENT': 4.118823529411763, 'NEWS_AND_MAGAZINES': 4.1045454545454545, 'BUSINESS': 4.10395256916996, 'LIFESTYLE': 4.082078853046592, 'TRAVEL_AND_LOCAL': 4.068156424581004, 'VIDEO_PLAYERS': 4.0

In [44]:
# Checking average user score for each Genre.
google_genre_likeability = {}

for genre in genres_freq_table:
    num_apps = 0
    total_user_score = 0
    for app in android_clean_eng_free:
        app_genre = app[9]
        if app_genre == genre:
            user_score = float(app[2])
            if not math.isnan(user_score):
                num_apps += 1
                total_user_score += user_score
    if num_apps == 0:
        avg_user_score_per_app = 0
    else:
        avg_user_score_per_app = total_user_score / num_apps
    google_genre_likeability[genre] = avg_user_score_per_app

sorted_google_genre_likeability = {
    k: v for k, v in sorted(google_genre_likeability.items(), key=lambda item: item[1], reverse=True)
}
print(sorted_google_genre_likeability)

{'Comics;Creativity': 4.8, 'Health & Fitness;Education': 4.7, 'Strategy;Action & Adventure': 4.6, 'Puzzle;Education': 4.6, 'Simulation;Pretend Play': 4.55, 'Entertainment;Creativity': 4.533333333333333, 'Arcade;Pretend Play': 4.5, 'Sports;Action & Adventure': 4.5, 'Tools;Education': 4.5, 'Strategy;Education': 4.5, 'Racing;Pretend Play': 4.5, 'Casual;Brain Games': 4.475, 'Music;Music & Video': 4.449999999999999, 'Events': 4.435555555555557, 'Education;Brain Games': 4.433333333333334, 'Entertainment;Education': 4.4, 'Puzzle;Creativity': 4.4, 'Arcade;Action & Adventure': 4.4, 'Simulation;Education': 4.4, 'Strategy;Creativity': 4.4, 'Adventure;Action & Adventure': 4.3999999999999995, 'Parenting': 4.3921052631578945, 'Education;Creativity': 4.375, 'Puzzle': 4.355421686746987, 'Art & Design;Creativity': 4.35, 'Casual;Creativity': 4.35, 'Books & Reference': 4.347798742138364, 'Simulation;Action & Adventure': 4.3428571428571425, 'Art & Design': 4.3352941176470585, 'Role Playing;Action & Advent
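In the genre loop above, a genre whose apps all lack ratings is assigned an average of 0, which places it at the bottom of the ranking as if it were poorly rated. A possible alternative, sketched here with made-up data and a hypothetical helper `average_ratings`, is to record `None` for such genres and leave them out of the ranking.

```python
import math

# Hypothetical (genre, rating string) pairs; not taken from the dataset.
sample = [
    ('Puzzle', '4.5'), ('Puzzle', '4.0'),
    ('Trivia', 'NaN'),
]

def average_ratings(pairs):
    """Return {genre: average rating, or None if the genre has no valid ratings}."""
    totals, counts = {}, {}
    for genre, raw in pairs:
        totals.setdefault(genre, 0.0)
        counts.setdefault(genre, 0)
        rating = float(raw)
        if not math.isnan(rating):
            totals[genre] += rating
            counts[genre] += 1
    return {g: (totals[g] / counts[g] if counts[g] else None) for g in totals}

averages = average_ratings(sample)

# Rank only the genres that have at least one valid rating.
ranked = sorted(
    ((g, avg) for g, avg in averages.items() if avg is not None),
    key=lambda item: item[1],
    reverse=True,
)
unrated = [g for g, avg in averages.items() if avg is None]

print(ranked)   # [('Puzzle', 4.25)]
print(unrated)  # ['Trivia']
```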

Based on the Google data, the Comics;Creativity genre (for example, an app that allows users to read comics/manga) has the highest average user rating, which would likely lead to more in-app purchases/subscriptions. However, we are looking for an app genre that is highly rated in both Google Play and the App Store. We saw that in the App Store, Games were very high in user rating, while Book apps were quite low. In the Google data, Games have an average rating of about 4.23/5, which is quite high, and the specific sub-genre of Strategy;Action & Adventure is tied for the 3rd highest average rating of all sub-genres (4.6). This suggests that we should build a Strategy;Action & Adventure game, as this type of app is likely to earn high user ratings in both app stores and therefore lead to the most in-app purchases/subscriptions.

### Conclusion


After collecting, cleaning, and analyzing app data from about 9,000 free, English-language Android apps and 7,000 iOS apps, we can draw the following conclusions about the types of apps we should develop:

1. **Social networking app with video-playing and communication features:** If our only source of revenue is ad revenue, we will want an app that attracts the largest number of users, as more users means more ad revenue. The Google and Apple data both suggest that social networking apps, along with video and communication apps, have among the highest average numbers of users, so an app combining these features would likely generate the most revenue.
2. **Action & Adventure Strategy game:** If our sources of revenue include in-app purchases and subscriptions, we want to look at the types of apps that earn the highest user ratings. We are assuming that higher user scores reflect higher user satisfaction, which in turn makes a user more likely to spend money on in-app purchases and subscriptions. We saw that games average above 4/5 in both app stores, and that Strategy;Action & Adventure games are rated especially highly on Google Play.