# Which Types of Free Apps Attract More Users

## Goal of this Project

The goal of this project is to collect and analyze data from both the App Store and the Google Play Store to determine which types of free apps are most likely to attract user engagement, and thus generate higher ad revenue for our company.

Our company already produces free apps for both the Apple App Store and the Google Play Store. Knowing which types of apps generate the most user engagement will help us decide which apps to build and which platform each one suits best. That, in turn, should help us earn a bigger share of the [100+ billion dollars produced by apps in 2021](https://www.businessofapps.com/data/app-revenues/).

## Opening and Exploring the Data

With millions of apps available on both platforms, collecting and analyzing data for every app would be difficult. Fortunately, Kaggle hosts two sample datasets built from real apps on the [App Store](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps) and the [Google Play Store](https://www.kaggle.com/lava18/google-play-store-apps).

-  The Google Play Store dataset has ~10,000 apps
-  The App Store dataset has ~7,000 apps

We first need to download both datasets and open them in Jupyter to start exploring the data.

In [1]:
# Open and read both the Apple and Google store app data
from csv import reader

with open('AppleStore.csv', encoding='utf8') as opened_ios_file:
    apps_ios_data = list(reader(opened_ios_file))

with open('googleplaystore.csv', encoding='utf8') as opened_google_file:
    apps_google_data = list(reader(opened_google_file))

After opening both files, we define a function called `explore_data` that prints the rows of a dataset within a given range, with blank lines in between to make them easier to read. An optional `rows_and_columns` parameter, when `True`, also prints the number of rows in the dataset and the number of columns in its header row.

In [2]:
# Prints rows start..end-1 of a dataset with spacing to make them easier to read
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')  # adds blank lines between rows

    # Optionally print the number of rows in the dataset and
    # the number of columns in its first (header) row
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

Let's print a few rows of each dataset to make sure the `explore_data` function works properly and gives us the correct spacing.

In [3]:
explore_data(apps_ios_data, 0, 3)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']




In [4]:
explore_data(apps_google_data, 0, 3)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']




Let's isolate the header row of each dataset and check how many rows and columns each one has.

In [5]:
explore_data(apps_ios_data, 0, 1, True)
explore_data(apps_google_data, 0, 1, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


Number of rows: 7198
Number of columns: 16
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


Number of rows: 10842
Number of columns: 13


Below are tables of each dataset's column headers and what they mean, according to the [Apple dataset](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps) and [Google dataset](https://www.kaggle.com/lava18/google-play-store-apps) descriptions.

## Column descriptions given for the iOS apps in the App Store:

| Column Name | Description |
|-------|-------|
| "id" | App Id |
| "track_name" | App Name|
| "size_bytes" | Size (in Bytes)|
| "currency" | Currency Type |
|"price" | Price amount|
|"rating_count_tot"| User Rating counts (for all versions)|
|"rating_count_ver"| User Rating counts (for current version)|
|"user_rating" | Average User Rating value (for all versions)|
|"user_rating_ver"| Average User Rating value (for current version)|
|"ver" | Latest version code|
|"cont_rating"| Content Rating|
|"prime_genre"| Primary Genre|
|"sup_devices.num"| Number of supporting devices|
|"ipadSc_urls.num"| Number of screenshots shown for display|
|"lang.num"| Number of supported languages|
|"vpp_lic"| VPP Device Based Licensing Enabled|


## Column descriptions given for the Google Play Store Apps:

| Column Name | Description |
|-------|-------|
| App | Application Name |
| Category | Category the app belongs to|
| Rating | Overall user rating of the app |
| Reviews | Number of user reviews for the app |
|Size | Size of the app|
|Installs| Number of user downloads/installs for the app|
|Type | Paid or Free|
|Price |Price of app|
|Content Rating| Age group the app is targeted at - Children/Mature 21+/ Adult|
|Genres | Genre(s) the app belongs to; an app can belong to multiple genres|
|Last Updated| Date when the app was last updated on Play Store|
|Current Ver| Current version of the app available on Play Store|
|Android Ver| Min. required Android version|



We will decide which of these columns are most useful for our analysis later. First, we have to make sure the datasets are free of errors that could affect our analysis, and keep only the data that is useful to us.


### Incorrect number of columns
According to the discussions on the [Google dataset on Kaggle](https://www.kaggle.com/lava18/google-play-store-apps), one app has an error that could affect our results: the row at index `10472` when the header is not counted, which is index `10473` in our list. Let's inspect it, remove it, and then check the datasets for any other rows with a missing column.

In [6]:
print(apps_google_data[10473])
print('\n')
print(len(apps_google_data[10473]))

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


12


The app seems to be missing the `Category` column, which shifts every following value one column to the left. Let's delete the whole row with the `del` statement and print the rows around the same index to confirm it is gone.

Afterwards, we can check whether any other rows in the Google or iOS dataset have the wrong number of columns. If nothing is printed, no other rows are missing columns.

In [7]:
del apps_google_data[10473]
print(apps_google_data[10472:10474])

[['Xposed Wi-Fi-Pwd', 'PERSONALIZATION', '3.5', '1042', '404k', '100,000+', 'Free', '0', 'Everyone', 'Personalization', 'August 5, 2014', '3.0.0', '4.0.3 and up'], ['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']]


In [8]:
for row in apps_google_data:
    if len(row) != 13:
        print(row)

In [9]:
# Check every row of the iOS dataset for the expected 16 columns
for row in apps_ios_data:
    if len(row) != 16:
        print(row)
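The two checks above hard-code the expected column counts (13 and 16). A minimal sketch of a reusable version, assuming the first row of each dataset is the header, compares every row against the header's length and returns the offending indices. `find_malformed_rows` is a hypothetical helper name, not part of the original notebook, and the sample rows are toy data.

```python
def find_malformed_rows(dataset):
    """Return the indices of rows whose length differs from the header row.

    Assumes dataset[0] is the header, as in the CSV files loaded above.
    """
    expected = len(dataset[0])
    return [i for i, row in enumerate(dataset) if len(row) != expected]


# Toy data standing in for a CSV loaded with csv.reader
sample = [
    ['App', 'Category', 'Rating'],
    ['Instagram', 'SOCIAL', '4.5'],
    ['Broken row', '1.9'],  # one column missing
]
print(find_malformed_rows(sample))  # [2]
```

Run on `apps_google_data` before the deletion, this should flag only index `10473`, given that the later check prints nothing.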

### Removing Duplicate Apps

The discussions on the [Google dataset on Kaggle](https://www.kaggle.com/lava18/google-play-store-apps) also mention duplicate app entries, which could compromise our findings. Let's look for duplicates in both datasets.

In [10]:
# Makes two separate containers to sort duplicate and unique app names into
# (a set makes the "already seen?" check fast)
duplicate_google_apps = []
unique_google_apps = set()

for app in apps_google_data:
    name = app[0]
    if name in unique_google_apps:
        duplicate_google_apps.append(name)
    else:
        unique_google_apps.add(name)
        
print('Number of duplicate apps:', len(duplicate_google_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_google_apps[:15])

Number of duplicate apps: 1181


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


In [11]:
# app[0] is the App Store id column, which should be unique to each app
duplicate_ios_apps = []
unique_ios_apps = set()

for app in apps_ios_data:
    name = app[0]
    if name in unique_ios_apps:
        duplicate_ios_apps.append(name)
    else:
        unique_ios_apps.add(name)
        
print('Number of duplicate apps:', len(duplicate_ios_apps))
print('\n')
print('example of duplicate apps:', duplicate_ios_apps[:15])

Number of duplicate apps: 0


example of duplicate apps: []


From the results above, only the Google dataset has duplicates, so we can skip the Apple data when removing duplicates. Note that the iOS check compares the `id` column rather than app names, so it detects repeated entries for the same app ID.
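As a cross-check on the duplicate counts, a sketch using `collections.Counter` from the standard library tallies how many times each name appears. The number of extra entries is the sum of `count - 1` over names seen more than once, which is what the loops above append to their duplicate lists. `count_duplicate_entries` is a hypothetical name, and the sample names are illustrative only.

```python
from collections import Counter


def count_duplicate_entries(names):
    """Count entries beyond the first occurrence of each name."""
    counts = Counter(names)
    return sum(n - 1 for n in counts.values() if n > 1)


names = ['Instagram', 'Box', 'Instagram', 'Slack', 'Instagram', 'Box']
print(count_duplicate_entries(names))  # 3
print(Counter(names).most_common(1))   # [('Instagram', 3)]
```

Passing `[row[0] for row in apps_google_data]` should reproduce the 1,181 figure, and `most_common()` additionally shows which apps are repeated most often.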

The Kaggle discussion for the Google dataset mentions that Instagram is one of the 1,181 duplicate entries, so let's print every row for Instagram and compare them.

In [12]:
for app in apps_google_data:
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


The only difference between these rows is the fourth column, the number of reviews. This suggests the duplicates were collected at different times, and the row with the highest number of reviews is likely the most recent. We can use this to keep only the entry with the highest number of reviews for each app and drop the rest.

To do this, we create a dictionary where each key is a unique app name and each value is the highest number of reviews for that app. We start with an empty dictionary called `reviews_max` and loop through the Google app data, reading the name and number of reviews from each row. If the app name is already in the dictionary and this row has more reviews than the stored value, we replace the stored value. If the app name is not in the dictionary yet, we add it with this row's number of reviews.

Afterwards, we double-check that the length of `reviews_max` equals the length of `apps_google_data` minus the 1,181 duplicates and the header row.

In [13]:
reviews_max = {}

for app in apps_google_data[1:]:
    name = app[0]
    n_reviews = float(app[3])
    
    # keep the highest number of reviews seen so far for each app name
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
        
# subtract the duplicates and one more for the header row
# (the deleted row is already gone from apps_google_data)
print(len(apps_google_data) - 1181 - 1)
print(len(reviews_max))

9659
9659


Next, we use `reviews_max` to filter the duplicates out of `apps_google_data` and store the result in a new list, `google_clean`. We loop through the original dataset again, reading each app's name and number of reviews as in the previous step. If the number of reviews matches the value stored in `reviews_max` for that name, and the name is not yet in a second list called `already_added`, we append the row to `google_clean` and the name to `already_added`. The `already_added` check is needed because some apps have multiple entries with the same number of reviews; without it, those entries would all be kept and remain duplicated in the cleaned set.

Then we print the length of `already_added` to check that it matches the number of apps in `reviews_max`.

In [14]:
google_clean = []
already_added = []

for app in apps_google_data[1:]:
    name = app[0]
    n_reviews = float(app[3])
    
    # need the already_added as a double check as some apps will have
    # multiple entries and the same number of reviews
    if reviews_max[name] == n_reviews and name not in already_added:
        google_clean.append(app)
        already_added.append(name)
        
print(len(already_added))

9659
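The two passes above (first `reviews_max`, then `google_clean`) can also be collapsed into a single pass. A minimal sketch, assuming the same layout (name in column 0, review count in column 3, header already excluded), keeps a dictionary from each app name to the row with the most reviews seen so far. Because it only replaces a stored row on a strictly higher count, ties keep the first entry, mirroring the `already_added` check. `dedupe_by_max_reviews` is a hypothetical name, and `App B` is a made-up row.

```python
def dedupe_by_max_reviews(rows, name_idx=0, reviews_idx=3):
    """Keep one row per app name: the one with the most reviews.

    Ties are resolved in favour of the first row seen.
    """
    best = {}  # app name -> row with the highest review count so far
    for row in rows:
        name = row[name_idx]
        n_reviews = float(row[reviews_idx])
        if name not in best or n_reviews > float(best[name][reviews_idx]):
            best[name] = row
    # dicts keep insertion order (Python 3.7+), so apps appear in the
    # order their names were first seen
    return list(best.values())


rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['App B', 'TOOLS', '4.0', '120'],
]
for row in dedupe_by_max_reviews(rows):
    print(row[0], row[3])
```

Applied to `apps_google_data[1:]`, it should keep the same rows as `google_clean`, although their order can differ slightly because rows are listed by the first appearance of each name.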


In [15]:
explore_data(google_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


### Filtering out non-English apps

Now that the duplicates are taken care of, we also want to remove non-English apps. Our company only makes apps in English, so non-English apps aren't relevant to our analysis. We can detect them using character codes: Python's built-in `ord()` returns the numeric code of a character, and the characters commonly used in English text fall within the ASCII range of 0 to 127.

In [16]:
# Returns False if a string has more than three characters outside
# the ASCII range (0 - 127) used by common English characters
def english_app(app_name):
    n_non_ascii = 0
    
    for letter in app_name:
        if ord(letter) > 127:
            n_non_ascii += 1
    
    # We allow up to three exceptions, as many English apps have a symbol
    # that falls outside the range (e.g. an emoji or a trademark sign)
    return n_non_ascii <= 3
        
print(english_app('Instagram'))
print(english_app('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(english_app('Docs To Go™ Free Office Suite'))
print(english_app('Instachat 😜'))
print(english_app('Docs To Go™™™™ Free Office Suite'))

True
False
True
True
False


While this isn't a foolproof way of removing every non-English app, it removes most of them, which should lead to better results for our analysis.
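To make the limitation concrete, here is a self-contained sketch that restates the same rule (more than three characters with `ord()` above 127 means non-English) under a hypothetical name, `looks_english`, and runs it on made-up names where it misfires: a short Chinese name slips through, while an English name with four emoji is dropped.

```python
def looks_english(name, max_non_ascii=3):
    """Same rule as english_app: allow at most max_non_ascii non-ASCII characters."""
    non_ascii = sum(1 for ch in name if ord(ch) > 127)
    return non_ascii <= max_non_ascii


print(looks_english('中国'))                 # True  (only 2 non-ASCII characters)
print(looks_english('Emoji Party 😀😁😂😃'))  # False (4 emoji)
print(ord('™'), ord('😜'))                   # 8482 128540
```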

Now we can use the function to clean the dataset and then use the `explore_data` function to look at our data.

In [17]:
google_clean_english = []

for app in google_clean:
    name = app[0]
    
    if english_app(name):
        google_clean_english.append(app)
        
explore_data(google_clean_english, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


In [18]:
ios_clean_english = []

for app in apps_ios_data:
    name = app[1]
    
    if english_app(name):
        ios_clean_english.append(app)
        
explore_data(ios_clean_english, 0, 3, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of rows: 6184
Number of columns: 16


### Filtering by Price

Our last cleaning step is to make a new list that contains only free apps. Since our company only makes free apps for both the App Store and the Google Play Store, we only want to analyze free-to-download apps to gain insight into that market.
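The filter below compares prices as strings (`'0'` for Google, `'0.0'` for iOS), which works only as long as each store writes zero in exactly one way. A slightly more defensive sketch converts the price to a number first. It assumes non-zero Google prices carry a leading dollar sign (e.g. `'$4.99'`), and `is_free` is a hypothetical helper rather than part of the original notebook.

```python
def is_free(price):
    """Return True if a price string represents zero, e.g. '0', '0.0' or '$0.00'."""
    cleaned = price.strip().lstrip('$')  # assumed currency prefix on Google prices
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Non-numeric values (such as a header cell like 'price') are not free apps
        return False


for value in ['0', '0.0', '$4.99', 'price']:
    print(value, is_free(value))
```

A side benefit is that a header row left in a list, as in `ios_clean_english`, is rejected explicitly rather than by coincidence.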

In [19]:
free_google_apps = []
free_ios_apps = []

for app in google_clean_english:
    price = app[7]
    
    if price == '0':
        free_google_apps.append(app)
        

for app in ios_clean_english:
    price = app[4]
    
    if price == '0.0':
        free_ios_apps.append(app)
        
explore_data(free_google_apps, 0, 3, True)
print('\n')
explore_data(free_ios_apps, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 8864
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 

### Analysis of Data

At this point we have removed inaccurate data, duplicates, non-English apps, and paid apps. While some outliers may remain, we have done a reasonable amount of cleaning and should be left with reliable datasets. We can now move on to the goal of this project: finding out which free apps attract more user engagement and thus bring in more ad revenue.

To answer that, we first build frequency tables to find the most common genres in both the Google Play Store and the App Store. The best columns for this are `prime_genre` for the iOS apps and `Genres` and `Category` for the Google apps.

In [20]:
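# Returns a dictionary mapping each value in column `index`
# to its percentage of the rows in the dataset
def freq_table(dataset, index):
    table = {}
    total = 0
    
    # count how many times each value appears
    for row in dataset:
        total += 1
        value = row[index]
        
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    
    # convert the counts to percentages
    table_percentages = {}
    
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage
    
    return table_percentages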

In [21]:
# Prints a frequency table sorted from the most to the least common value
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        # store (percentage, value) so that sorting orders by percentage
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse=True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
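`freq_table` and `display_table` together build a percentage frequency table and print it in descending order. For comparison, the same result can be sketched with `collections.Counter` from the standard library, whose `most_common()` method returns (value, count) pairs already sorted by count. `freq_table_counter` is a hypothetical name and the rows are toy data.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Return (value, percentage) pairs for column `index`, most common first."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return [(value, count / total * 100) for value, count in counts.most_common()]


rows = [['a', 'GAME'], ['b', 'TOOLS'], ['c', 'GAME'], ['d', 'FAMILY']]
for value, pct in freq_table_counter(rows, 1):
    print(value, ':', pct)
```

One small difference: for values with equal percentages, `display_table` sorts the tuples in reverse, so ties appear in reverse alphabetical order, whereas `most_common()` keeps ties in the order they were first encountered.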

In [22]:
display_table(free_google_apps, 1)

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

In [23]:
display_table(free_google_apps, 9)

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

#### Google Play Store Data 

The Google Play Store data shows a wide variety of apps, with no single category completely overshadowing the others. The most frequent is the Family category (about 18.9%), followed by Game (about 9.7%) and Tools (about 8.5%). Browsing the Family category on the Google Play Store itself shows that it includes many games for kids, which we should keep in mind in our analysis. Even so, many different categories each hold a sizeable share of the store, in contrast to the App Store data we will see below.

While the `Genres` column was useful, we are looking for big-picture findings about which apps to make. From this point forward we will only use the `Category` column when comparing types of apps on the Google Play Store, since it assigns each app a single category rather than several genres that clutter the big picture.
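The `Genres` column can hold several values joined by a semicolon (for example `'Art & Design;Pretend Play'` in the sample rows earlier), which is part of why it is harder to read than `Category`. If the individual genres were ever needed, a sketch like the following could split them before counting. `count_split_genres` is a hypothetical helper and the rows are trimmed-down toy data.

```python
from collections import Counter


def count_split_genres(dataset, index=9, sep=';'):
    """Count each individual genre, splitting multi-genre cells on `sep`."""
    counts = Counter()
    for row in dataset:
        for genre in row[index].split(sep):
            counts[genre.strip()] += 1
    return counts


rows = [
    ['Coloring book moana', 'Art & Design;Pretend Play'],
    ['Sketch - Draw & Paint', 'Art & Design'],
]
print(count_split_genres(rows, index=1))
```

On the full data the default `index=9` matches the `Genres` column position.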

##### Suggestion for App creation

My suggestion, based on the above information, is to create apps in the Health and Fitness category.

The Health and Fitness category includes pedometer, sleep tracking, nutrition tracking, and exercise apps covering a wide variety of fitness programs. While it makes up a smaller share of the store (~3% of the free English apps on the Google Play Store), this could be a space where a quality fitness app could gain traction.

Most of the top fitness apps put content beyond the basic introductory workouts behind a monthly paywall. We could instead create an app that offers a wide variety of fitness class videos and instructions for free and generates ad revenue by playing ads when the app loads, before a workout, and (as smaller ads) during rest periods. The lower revenue per ad could be offset by a bigger user base, and daily goals and connections with peers could keep users coming back.


In [24]:
display_table(free_ios_apps, 11)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


#### App Store Data 

In [25]:
# Average number of user ratings (rating_count_tot, column 5) per genre
table_ios = freq_table(free_ios_apps, 11)

for genre in table_ios:
    total_ur = 0    # sum of user rating counts for the genre
    len_genre = 0   # number of apps in the genre
    
    for app in free_ios_apps: 
        genre_app = app[11]
        
        if genre_app == genre:
            n_rating = float(app[5])
            total_ur += n_rating
            len_genre += 1
            
    avg_n_rating = total_ur/len_genre
    
    print(genre)
    print(avg_n_rating)
    print('\n')

Social Networking
71548.34905660378


Photo & Video
28441.54375


Games
22788.6696905016


Music
57326.530303030304


Reference
74942.11111111111


Health & Fitness
23298.015384615384


Weather
52279.892857142855


Utilities
18684.456790123455


Travel
28243.8


Shopping
26919.690476190477


News
21248.023255813954


Navigation
86090.33333333333


Lifestyle
16485.764705882353


Entertainment
14029.830708661417


Food & Drink
33333.92307692308


Sports
23008.898550724636


Book
39758.5


Finance
31467.944444444445


Education
7003.983050847458


Productivity
21028.410714285714


Business
7491.117647058823


Catalogs
4004.0


Medical
612.0
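The cell above computes, for each genre, the mean of `rating_count_tot` (column 5), which serves here as a proxy for user engagement; it rescans the whole dataset once per genre. A sketch of an equivalent single-pass version accumulates a running sum and count per genre in two dictionaries. `avg_by_group` is a hypothetical name, and the toy rows only mimic the idea of a group column and a value column.

```python
def avg_by_group(dataset, group_idx, value_idx):
    """Average of column value_idx for each distinct value of column group_idx."""
    totals = {}
    counts = {}
    for row in dataset:
        group = row[group_idx]
        totals[group] = totals.get(group, 0.0) + float(row[value_idx])
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}


rows = [['Games', '100'], ['Games', '300'], ['Medical', '612']]
print(avg_by_group(rows, 0, 1))  # {'Games': 200.0, 'Medical': 612.0}
```

Called as `avg_by_group(free_ios_apps, 11, 5)`, it should produce the same averages as the loop above.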




In [26]:
display_table(free_google_apps, 5)

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343
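The `Installs` values are open-ended buckets such as `'1,000,000+'`, so they have to be turned into numbers before any averaging; the next cell does this by stripping `+` and `,`. A small sketch of that conversion as a reusable function follows, with `parse_installs` as a hypothetical name. Because every bucket means "at least this many", averages built from these values are best read as rough lower-bound comparisons between categories rather than exact install counts.

```python
def parse_installs(installs):
    """Convert an install bucket like '1,000,000+' to a float lower bound."""
    return float(installs.replace('+', '').replace(',', ''))


print(parse_installs('1,000,000+'))  # 1000000.0
print(parse_installs('0'))           # 0.0
```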


In [27]:
table_google = freq_table(free_google_apps, 1)

for category in table_google:
    total_in = 0       # sum of installs for the category
    len_category = 0   # number of apps in the category
    
    for app in free_google_apps:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace('+' , '')
            n_installs = float(n_installs.replace(',' , ''))
            total_in += n_installs
            len_category += 1
            
    avg_num_in = total_in / len_category
    print(category, ':', avg_num_in)

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_

Looking at the App Store data, games make up about 58% of the free English apps, by a huge margin; the next most common genre, Entertainment, accounts for about 7.9%. Overall, entertainment-oriented apps (games, photo and video, social networking, etc.) make up a much larger share of the store than practical apps (education, shopping, utilities, productivity, etc.).

With the present data, games would seem to be a possible category for our company to create apps for on the App Store. However, a large number of game apps on the platform does not automatically mean high engagement or a large number of users for those apps.
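Frequency alone does not show engagement. One way to compare is to rank genres by the average number of user ratings printed in `In [25]`. The sketch below copies a handful of those averages (rounded to two decimals) into a dictionary and sorts them; in the notebook itself the dictionary would come from that computation rather than being typed in by hand.

```python
# A few of the average rating counts per genre printed in In [25], rounded
avg_ratings = {
    'Navigation': 86090.33,
    'Reference': 74942.11,
    'Social Networking': 71548.35,
    'Music': 57326.53,
    'Games': 22788.67,
}

# Sort genres from the highest to the lowest average number of ratings
ranked = sorted(avg_ratings.items(), key=lambda item: item[1], reverse=True)
for genre, avg in ranked:
    print(f'{genre}: {avg:,.0f}')
```

Games rank below several much smaller genres on this measure. Small genres need care, though: Navigation makes up only about 0.19% of the free iOS apps in the frequency table, so its high average rests on very few apps and may be driven by one or two very popular titles.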