# Profitable App Suggestions for the App Store and Google Play Markets

## Introduction

In this project, we will analyze data about app profiles from the App Store and Google Play Store. Our goal is to suggest a profitable app profile that our company should target for both markets. Additionally, the company we are working with only creates free apps, and wants to target English-speaking customers.

To suggest a profitable app profile, we are going to analyze the popularity of app profiles across the iOS and Android markets. The company will gain most of its revenue through in-app ads and purchases, so its revenue depends largely on how many people use its apps. At the end, we will suggest an app profile that the company should target in order to attract the most customers and profits.

## Loading the Data

First, let's open the datasets:

In [1]:
import csv

# Open App Store dataset (explicit encoding so non-ASCII app names decode consistently)
with open('AppleStore.csv', 'r', encoding='utf8') as open_ios:
    ios_data = list(csv.reader(open_ios))
    ios_header = ios_data[0]
    ios = ios_data[1:]

# Open Google Play Store dataset
with open('googleplaystore.csv', 'r', encoding='utf8') as open_android:
    android_data = list(csv.reader(open_android))
    android_header = android_data[0]
    android = android_data[1:]

Now that we've opened the datasets, we want to create a function called `explore_data()` in order to explore the datasets more efficiently.

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    """Print rows dataset[start:end]; optionally print the dataset's dimensions."""
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')  # Blank lines between rows for readability

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [3]:
# Explore datasets
print("App Store Data:")
print(ios_header)
print('\n')
explore_data(ios, 0, 3, rows_and_columns=True)
print('-'*50)
print("Google Play Store Data:")
print(android_header)
print('\n')
explore_data(android, 0, 3, rows_and_columns=True)

App Store Data:
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16
--------------------------------------------------
Google Play Store Data:
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'A

Next, we will look at just the column names to see whether the two datasets share comparable columns that we can use in our analysis.

In [4]:
# Column Names
print('App Store Columns:')
print(ios_header)
print('\n')
print('Google Play Store Columns:')
print(android_header)

App Store Columns:
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


Google Play Store Columns:
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


As we can see, the App Store dataset has 7,197 apps and 16 columns while the Google Play Store dataset has 10,841 apps and 13 columns. Columns from the App Store dataset that might be helpful for our analysis are `track_name`, `price`, `rating_count_tot`, `rating_count_ver`, and `prime_genre`. Columns from the Google Play Store dataset that might be helpful are `App`, `Category`, `Rating`, `Reviews`, `Installs`, `Price`, and `Genres`.

## Data Cleaning: Deleting Wrong Data

Now that we have an idea of what is in our datasets, we need to remove data that is inaccurate or irrelevant to the problem we are trying to solve. In this case, we are only looking for apps that are free and target an English-speaking audience.

First, we will look at data that may be inaccurate. According to a [discussion section](https://www.kaggle.com/lava18/google-play-store-apps/discussion) about the Google Play Store dataset, one of the rows has missing data. We need to print this row to investigate further.

In [5]:
# Missing data for row 10472
print(android_header)
print('\n')
print(android[10472])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


According to the discussion, there is a missing column value, causing the rest of the columns to shift over in the same row. To see if this is actually the case, we need to compare the length of the row with the header:

In [6]:
print(len(android_header))
print(len(android[10472]))

13
12


As we can see, row 10472 has one fewer value than the header. Comparing the row with the header shows that the `Category` value is missing, so every later value has shifted one position to the left. To fix this, we will delete the row entirely; removing a single row out of more than 10,000 will not meaningfully affect the analysis that follows.

In [7]:
# Run this cell only once; a second run would delete a different row
del android[10472]
print(len(android))  # Check if row was deleted

10840
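
The length comparison used for row 10472 can also be applied to every row at once. The sketch below is one way to do that. `find_malformed_rows` is a hypothetical helper name, and the sample rows are made up for illustration rather than taken from the dataset.

```python
def find_malformed_rows(header, rows):
    """Return (index, length) pairs for rows whose length differs from the header's."""
    expected = len(header)
    return [(i, len(row)) for i, row in enumerate(rows) if len(row) != expected]


# Made-up sample: the second row is missing one value.
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '3.9'],
    ['App C', 'GAME', '4.5'],
]

malformed = find_malformed_rows(sample_header, sample_rows)
print(malformed)
```

Run against the `android` list as it was before the deletion, a check like this would include the pair `(10472, 12)`, since that row has 12 values against a 13-column header.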


## Removing Duplicate Entries

### Part One

If we explore both datasets more, we notice that there are multiple cases where duplicate apps occur (apps appear more than once). For example, the Instagram app appears four times in the `android` dataset:

In [8]:
for app in android:
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


Duplicate entries can distort our analysis and lead to misleading conclusions. To count how many duplicate entries the dataset contains, we will loop through the `android` dataset and build two lists: one for duplicate apps and another for unique apps.

In [9]:
duplicate_apps = []
unique_apps = []

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

In [10]:
# Check lists
print('Number of duplicate apps:', len(duplicate_apps))
print('Number of unique apps:', len(unique_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[10:20])

Number of duplicate apps: 1181
Number of unique apps: 9659


Examples of duplicate apps: ['FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software', 'MailChimp - Email, Marketing Automation', 'Crew - Free Messaging and Scheduling', 'Asana: organize team projects', 'Google Analytics', 'AdWords Express']


Our `android` dataset had 10,840 rows after the previous step. Of those, 1,181 are duplicate entries, which leaves 9,659 unique apps to analyze. Before removing the duplicates, we need to decide which entry to keep for each app.

As we noticed above, Instagram appears four times in our dataset, but the entries are not all identical: the `Reviews` column differs between them. A higher review count suggests that the row was collected more recently, so we will keep the row with the highest number of reviews. That row should also give us the most reliable rating for the app.

### Part Two

To remove the duplicate apps, we will first create a dictionary that maps each unique app name to its highest review count. We will then use this dictionary to build a new dataset that contains one row per app: the row with the most reviews.

In [11]:
reviews_max = {}

for app in android:
    name = app[0]
    n_reviews = float(app[3])

    # Record the count for a new app, or update it when a higher count appears
    if name not in reviews_max or reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews

print(len(reviews_max))

9659


As we can see, our dictionary has the correct number of unique apps - 9,659. Now we will use the `reviews_max` dictionary to remove duplicate apps, and only keep the unique apps with the most reviews.

In [12]:
android_clean = []
already_added = set()  # Names already copied; guards against ties in review count

for app in android:
    name = app[0]
    n_reviews = float(app[3])

    if (n_reviews == reviews_max[name]) and (name not in already_added):
        android_clean.append(app)
        already_added.add(name)

Now we want to inspect our new clean dataset to make sure that everything looks correct:

In [13]:
explore_data(android_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


Our code worked correctly, since the dataset returns the 9,659 unique apps we expected.
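
The two-step approach above (a dictionary of maximum review counts, then a second loop over the data) can also be written as a single pass. The following is a minimal sketch of that alternative, not the method used in this notebook. `dedupe_by_max_reviews` is a hypothetical name, and the sample rows are shortened to four columns, reusing the Instagram review counts shown earlier plus one made-up app.

```python
def dedupe_by_max_reviews(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row that has the highest review count."""
    best = {}  # app name -> row with the most reviews seen so far
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' means that on a tie the earlier row is kept.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Shortened sample rows: name, category, rating, reviews.
sample = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Example App', 'TOOLS', '4.0', '10'],  # made-up row
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
]

deduped = dedupe_by_max_reviews(sample)
for row in deduped:
    print(row)
```

One difference to keep in mind: this version orders the result by each app's first appearance, whereas the loop above orders it by the position of the kept row. The set of rows kept should be the same.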

## Removing Non-English Apps

### Part One

For the company we are working for, we are only interested in apps targeted towards English-speaking customers. In the ASCII (American Standard Code for Information Interchange) standard, the characters commonly used in English text are encoded as numbers from 0 to 127. We will therefore treat any app whose name contains characters with a code above 127 as a candidate non-English app.

In order to check each app for non-English characters or symbols, we need to create a function that takes in the app name as a string and returns `False` if there are any characters not included in the English language. Otherwise, the function will return a `True` value.

In [14]:
def english_app(name):
    for character in name:
        if ord(character) > 127:
            return False
    return True

In [15]:
# Example app names
print(english_app('Instagram'))
print(english_app('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(english_app('Docs To Go™ Free Office Suite'))
print(english_app('Instachat 😜'))

True
False
False
False


Our function tells us whether at least one character falls outside the ASCII range. However, as the last two examples show, this check is too strict: each of these names contains just one character outside the range ('™' and '😜'), while the rest of the name is plain English. These apps are actually in English and should not be filtered out.
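
To make the 0 to 127 range concrete, the short example below prints the code points that Python's built-in `ord()` returns for a few characters, including the two that tripped up the function. The character list is chosen here purely for illustration.

```python
# Code points for a few characters, including ones from the app names above.
characters = ['a', 'Z', '~', '™', '😜', '爱']
code_points = {character: ord(character) for character in characters}

for character, code_point in code_points.items():
    in_ascii_range = code_point <= 127
    print(repr(character), code_point, in_ascii_range)
```

The first three characters fall within the ASCII range, while '™', '😜', and '爱' have code points far above 127, which is why a single such character is enough to make `english_app()` return `False`.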

### Part Two

To filter out non-English apps more reliably, we will modify our `english_app()` function so that it tolerates a few non-ASCII characters. An app will be classified as non-English only if its name contains more than three characters outside the ASCII range.

In [16]:
def english_app(name):
    """Return False if name has more than three non-ASCII characters."""
    non_english_counter = 0

    for character in name:
        if ord(character) > 127:
            non_english_counter += 1

    return non_english_counter <= 3

Now we will use our new function on the apps we previously tested:

In [17]:
print(english_app('Instagram'))
print(english_app('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(english_app('Docs To Go™ Free Office Suite'))
print(english_app('Instachat 😜'))

True
False
True
True


Note that while this version filters non-English apps more accurately, it is still a heuristic: some non-English apps may pass the filter, and some English apps may be removed. That level of accuracy is sufficient for this analysis, so we will use `english_app()` as it is to filter the iOS and Android datasets.

In [18]:
ios_english = []
android_english = []

for app in ios:
    name = app[1]

    if english_app(name):
        ios_english.append(app)

for app in android_clean:
    name = app[0]

    if english_app(name):
        android_english.append(app)

In [19]:
# Explore datasets and count new rows
print('iOS Data Set:')
explore_data(ios_english, 0, 4, True)
print('-'*50)
print('Android Data Set:')
explore_data(android_english, 0, 4, True)

iOS Data Set:
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


Number of rows: 6183
Number of columns: 16
--------------------------------------------------
Android Data Set:
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0',

After filtering out non-English apps for each data set, we are left with 6,183 iOS apps and 9,614 Android apps.
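
The fixed cutoff of three characters in `english_app()` treats short and long names alike. One possible refinement, which is not used in this analysis, is to look at the share of non-ASCII characters instead. The sketch below assumes a hypothetical function name, `english_app_by_ratio`, and an arbitrary threshold of 0.3 that would need tuning against real data.

```python
def english_app_by_ratio(name, max_non_ascii_ratio=0.3):
    """Return True if the share of non-ASCII characters is at most the threshold.

    The 0.3 default is an arbitrary illustrative value, not a tuned one.
    """
    if not name:
        return True  # Arbitrary choice: an empty name has nothing to reject
    non_ascii = sum(1 for character in name if ord(character) > 127)
    return non_ascii / len(name) <= max_non_ascii_ratio


for example in ['Instagram',
                '爱奇艺PPS -《欢乐颂2》电视剧热播',
                'Docs To Go™ Free Office Suite',
                'Instachat 😜']:
    print(example, ':', english_app_by_ratio(example))
```

On the four example names used earlier, this variant gives the same answers as the count-based function; the two approaches would only differ on names that are very short or very long.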

## Removing Non-Free Apps

For this project, we are only interested in free apps targeted towards English-speaking customers. As the final data cleaning step, we will filter out paid apps from both datasets and keep only the free English apps.

In [20]:
ios_final = []
android_final = []

for app in ios_english:
    price = app[4]

    if price == '0.0':
        ios_final.append(app)

for app in android_english:
    price = app[7]

    if price == '0':
        android_final.append(app)

# Explore and check new rows
print('Final iOS Data Set:')
explore_data(ios_final, 0, 3, True)
print('-'*50)
print('Final Android Data Set:')
explore_data(android_final, 0, 3, True)

Final iOS Data Set:
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 3222
Number of columns: 16
--------------------------------------------------
Final Android Data Set:
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5
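
The filter above compares price strings exactly (`'0.0'` for iOS and `'0'` for Android), so it depends on how each dataset happens to format a zero price. A more tolerant check could parse the price as a number first. The sketch below is one way to do that. `is_free` is a hypothetical helper, and the leading `$` on paid prices is an assumption about the data format rather than something shown in the output above.

```python
def is_free(price_text):
    """Return True when a price string represents zero, e.g. '0', '0.0' or '$0.00'."""
    cleaned = price_text.strip().lstrip('$')  # Assumes paid prices may look like '$4.99'
    try:
        return float(cleaned) == 0.0
    except ValueError:
        return False  # Unparseable prices are treated as not free


for price in ['0', '0.0', '$4.99', 'Everyone']:
    print(repr(price), ':', is_free(price))
```

With a helper like this, the same condition (`if is_free(price):`) could be used in both loops.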

## Data Analysis

## Most Common Apps by Genre

### Part One

Now we can begin analyzing our datasets. The goal of the analysis is to determine what kinds of apps attract the most users, so we will first look at the most common app genres in both the iOS and the Android markets. Since we ultimately want to create an app for both platforms, we need a profile that can be successful in both markets.

We will start by creating a frequency table for the `prime_genre` column in the App Store data. Then we will look at frequency tables for the `Genres` and `Category` columns in the Google Play Store data. This will give us an idea of which genres dominate each market.

### Part Two

To make the frequency tables easy to read, we need to convert them into lists of tuples and sort them. First, we will create a function called `freq_table()` that returns a frequency table for any column as a dictionary of percentages. Then, we will create a function called `display_table()` that converts the table into a list of tuples and prints the entries from highest to lowest percentage.

In [21]:
def freq_table(dataset, index):
    """Return a dict mapping each value in column `index` to its percentage of rows."""
    table = {}
    total_rows = 0

    for row in dataset:
        total_rows += 1
        value = row[index]

        if value in table:
            table[value] += 1
        else:
            table[value] = 1

    table_per = {}
    for key in table:
        percentage = (table[key] / total_rows * 100)
        table_per[key] = percentage

    return table_per


def display_table(dataset, index):
    """Print the frequency table for column `index`, sorted from highest to lowest."""
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse=True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
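
For comparison, the same kind of percentage table can be built with `collections.Counter` from the standard library, which handles the counting and the sorting by frequency. This is a sketch of an alternative, not a replacement for the functions above. `freq_table_counter` is a hypothetical name and the sample rows are made up.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Return {value: percentage}, ordered from most to least common."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    # most_common() yields (value, count) pairs sorted by descending count.
    return {value: count / total * 100 for value, count in counts.most_common()}


# Made-up sample rows: app name, genre.
sample_rows = [
    ['App A', 'Games'],
    ['App B', 'Games'],
    ['App C', 'Music'],
    ['App D', 'Games'],
]

sample_table = freq_table_counter(sample_rows, 1)
for value, percentage in sample_table.items():
    print(value, ':', percentage)
```

Because dictionaries preserve insertion order in Python 3.7 and later, iterating over the result already goes from the most common value to the least common one.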

### Part Three

Now we will look at our frequency tables for each column, starting with `prime_genre` for the App Store.

In [22]:
# Display frequency table for prime_genre (iOS)
display_table(ios_final, 11)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


From the iOS dataset, we can see that a majority of the free English apps on the App Store are 'Games' (58.16%). The next most common genre is 'Entertainment' (7.88%), followed by 'Photo & Video' (4.97%), and so on.

Because 'Games' dominates, the free English segment of the App Store appears to be geared towards entertainment, with far fewer practical apps available. However, the number of apps in a genre says nothing about how many users those apps have, so we cannot conclude much more about the overall market from this table alone. We would need to look at other factors, such as the number of users or the number of ratings. For now, we will continue by comparing genres in the Google Play Store to see whether the two datasets show any similarities.

In [23]:
# Display frequency table for Genres (Android)
display_table(android_final, 9)

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

At first glance, this frequency table may seem confusing. There are many genres, and none of them clearly dominates. In the iOS dataset, by contrast, we could clearly see that a majority of free English apps in the App Store were 'Games' (58.16%).

According to the `Genres` column for free English apps in the Google Play Store, 'Tools' is the most common genre at 8.45%, followed by 'Entertainment' at 6.07%. As we move down the list, the table becomes increasingly difficult to interpret because the genres become very specific. For this reason, the `Genres` column doesn't give us the best summary of which app profiles are most common in the Google Play Store. Next, we will analyze the `Category` column to see if it gives us a clearer picture.

In [24]:
# Display frequency table for Category (Android)
display_table(android_final, 1)

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

As we can see, the `Category` column gives us a better idea of which app profiles are more common in the Google Play Store, but it still doesn't tell us much. For example, the most common category for free English Android apps is 'Family' at 18.91%, but this category is very broad. We would have to look beyond the dataset to learn that many 'Family' Android apps are actually games designed for children.

From these tables, we can conclude at least one thing: practical apps are more common in the Google Play Store than in the App Store, while free English apps for games and entertainment make up a much larger share of the App Store. This tells us a little about both markets, but it doesn't yet answer our question, because how common a genre is does not tell us how many users it attracts. The next step is to analyze the popularity of free English apps in both markets.

## Most Popular Apps by Genre on the App Store

To find out which genres are more popular in each market, we need an estimate of how many users each genre has. This is straightforward for the Google Play Store dataset, because the `Installs` column tells us roughly how many times each app has been installed. The App Store data has no such column, so we will use the total number of user ratings (`rating_count_tot`) as a proxy. In both cases, we will compare the average number of installs or ratings per app in each genre rather than the totals, so that genres with many apps are not favored automatically.

First we will look at the average number of user ratings for each genre in the App Store data:

In [25]:
ios_genre_table = freq_table(ios_final, 11)

for genre in ios_genre_table:
    total = 0  # Stores sum of user ratings
    len_genre = 0  # Stores number of apps for each genre

    for app in ios_final:
        genre_app = app[11]

        if genre_app == genre:
            user_ratings = float(app[5])
            total += user_ratings
            len_genre += 1

    avg_user_ratings_count = total / len_genre

    print(genre, ':', avg_user_ratings_count)

Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0


After observing the average number of user ratings per genre, we can see that the 'Navigation' genre for the iOS dataset has the highest number of user ratings on average (about 86,090 ratings). The next two highest averages belong to the 'Reference' and 'Social Networking' genres, respectively. Comparing these genres with the frequency tables we made earlier, we still cannot draw any conclusions about which genres we should target for our app.

To see why 'Navigation' has the highest average number of user ratings, we can investigate further:

In [26]:
for app in ios_final:
    genre = app[11]

    if genre == 'Navigation':
        print(app[1], ':', app[5])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


On closer inspection, the average number of user ratings for the 'Navigation' genre is heavily skewed by the high number of ratings for the 'Waze' and 'Google Maps' apps. Additionally, there are only six free English apps in the 'Navigation' genre in our iOS dataset, so this average is not an accurate representation of the iOS market as a whole.
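
One way to see the size of this skew is to compare the mean with the median, which is much less sensitive to a few very large values. The short example below uses the six 'Navigation' rating counts printed above together with the standard library's `statistics` module. The variable names are chosen here for illustration.

```python
from statistics import mean, median

# Rating counts for the six 'Navigation' apps listed above.
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

navigation_mean = mean(navigation_ratings)
navigation_median = median(navigation_ratings)

print('Mean:', navigation_mean)
print('Median:', navigation_median)
```

The mean is about 86,090, matching the genre average computed earlier, while the median is 8,196.5. The gap between the two reflects how strongly the two largest apps pull the average upward.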

Let's look at the ratings for the 'Reference' and 'Social Networking' genres:

In [27]:
# Ratings for 'Reference' genre
for app in ios_final:
    if app[11] == 'Reference':
        print(app[1], ':', app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


In [28]:
# Ratings for 'Social Networking' genre
for app in ios_final:
    if app[11] == 'Social Networking':
        print(app[1], ':', app[5])

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 2

For the 'Reference' genre, we see a similar problem: the average number of user ratings is skewed by just a few apps. Recall from earlier that the iOS market seemed to be dominated by games and entertainment apps. Therefore, making a 'Reference' app might not be worth our time.

On the other hand, there are plenty of social networking apps with a high number of ratings, so social networking apps appear to be popular in the App Store. This demand also seems to align with the idea that many apps on the App Store are made for entertainment purposes.

With this in mind, we can consider making a social networking app for both the App Store and the Google Play Store. The Google Play Store has more practical applications, but we could make a social networking app that is both practical and entertaining for its users. Before moving further with this idea, we need to analyze the popularity of apps in the Google Play Store to see if the data supports our suggestion.

## Most Popular Apps by Genre on the Google Play Store

For the Google Play Store, we can analyze popularity more accurately, since we actually have data on the number of installations per app.

In [29]:
display_table(android_final, 5)

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


The number of installs covers a wide range of values. These values don't specify the exact number of installs per app (for example, '1,000,000+' only tells us that an app has at least one million installs), but they still give us a good idea of which app profiles are most popular.

We continue by calculating the average number of installs by category on the Google Play Store:

In [30]:
category_table = freq_table(android_final, 1)

for category in category_table:
    total = 0
    len_category = 0

    for app in android_final:
        category_app = app[1]

        if category_app == category:
            num_installs = app[5]
            num_installs = num_installs.replace('+', '')
            num_installs = num_installs.replace(',', '')

            total += float(num_installs)
            len_category += 1

    avg_num_installs = total / len_category
    print(category, ':', avg_num_installs)

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_

We can see that the 'Communication' category has the highest average number of installs with roughly 38.5 million installs. The next highest average number of installs belongs to the 'Video_Players' category with about 24.7 million installs, and then 'Social' with about 23.3 million installs. As we did with the iOS dataset, we need to investigate further to make sure that this data is not skewed.
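One quick way to check for skew is to compare each category's mean with its median: a mean far above the median suggests that a few very large apps dominate the category. The sketch below assumes rows shaped like `android_final` (category at index 1, installs at index 5). `install_stats` is a hypothetical helper, and `sample_rows` is made-up illustrative data, not taken from the dataset.

```python
from statistics import mean, median


def install_stats(dataset, category_index=1, installs_index=5):
    """Return {category: (mean, median)} of install lower bounds.

    The dataset is traversed once to group installs by category.
    """
    installs_by_category = {}
    for app in dataset:
        installs = app[installs_index].replace('+', '').replace(',', '')
        installs_by_category.setdefault(app[category_index], []).append(int(installs))
    return {
        category: (mean(values), median(values))
        for category, values in installs_by_category.items()
    }


# Made-up rows for illustration only; unused columns are left empty
sample_rows = [
    ['App A', 'COMMUNICATION', '', '', '', '1,000,000,000+'],
    ['App B', 'COMMUNICATION', '', '', '', '10,000+'],
    ['App C', 'COMMUNICATION', '', '', '', '1,000+'],
]

for category, (avg, mid) in install_stats(sample_rows).items():
    print(category, ': mean =', avg, '| median =', mid)
```

In this made-up sample, a single app with a billion installs pulls the mean several orders of magnitude above the median, which is the pattern to look for in the real data.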

First, we take a closer look at the 'Communication' category. Remember that the values for the number of installs include a wide range of open-ended values, so we have to specify which ranges we want to look at. We will look at the highest ranges to see if any apps are skewing the data.

In [31]:
for app in android_final:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'
                                      or app[5] == '50,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Contacts : 50,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
free video calls and chat : 50,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
Dolphin Browser - Fast, Private & Adblock🐬 : 50,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Mail.Ru - Email App : 50,000,000+
Hangouts : 1,000,000,000+
Azar : 50,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure 

It looks like a few apps have more than 1,000,000,000 installs, but most have far fewer. The apps with 1,000,000,000+ installs could therefore be skewing the average. It might not be a good idea to build an app for this genre, especially since it already contains well-established apps with billions of installs.
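To see how much the largest apps influence the average, one option is to recompute it after leaving out apps at or above some install threshold. This is a minimal sketch under the same column assumptions as the cells above (category at index 1, installs at index 5). The helper name `average_installs_below`, the 100-million cutoff, and the sample rows are all illustrative assumptions.

```python
def average_installs_below(dataset, category, cutoff):
    """Average install lower bound for apps in `category` with fewer than `cutoff` installs."""
    kept = []
    for app in dataset:
        if app[1] != category:
            continue
        installs = int(app[5].replace('+', '').replace(',', ''))
        if installs < cutoff:
            kept.append(installs)
    # Avoid dividing by zero when no app passes the filter
    return sum(kept) / len(kept) if kept else 0.0


# Made-up rows for illustration only; unused columns are left empty
sample_rows = [
    ['Giant app', 'COMMUNICATION', '', '', '', '1,000,000,000+'],
    ['Mid app', 'COMMUNICATION', '', '', '', '5,000,000+'],
    ['Small app', 'COMMUNICATION', '', '', '', '1,000,000+'],
    ['Other app', 'SOCIAL', '', '', '', '10,000+'],
]

print(average_installs_below(sample_rows, 'COMMUNICATION', 100_000_000))
```

If the filtered average drops sharply relative to the full average, a handful of apps is carrying the category.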

Now let's look at the 'Video_Players' category:

In [32]:
for app in android_final:
    if app[1] == 'VIDEO_PLAYERS' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'
                                      or app[5] == '50,000,000+'):
        print(app[0], ':', app[5])

YouTube : 1,000,000,000+
Motorola Gallery : 100,000,000+
VLC for Android : 100,000,000+
Vote for : 50,000,000+
Vigo Video : 50,000,000+
Google Play Movies & TV : 1,000,000,000+
MiniMovie - Free Video and Slideshow Editor : 50,000,000+
Samsung Video Library : 50,000,000+
LIKE – Magic Video Maker & Community : 50,000,000+
MX Player : 500,000,000+
Dubsmash : 100,000,000+
DU Recorder – Screen Recorder, Video Editor, Live : 50,000,000+
KineMaster – Pro Video Editor : 50,000,000+
VMate : 50,000,000+
HD Video Downloader : 2018 Best video mate : 50,000,000+
VivaVideo - Video Editor & Photo Movie : 100,000,000+
VideoShow-Video Editor, Video Maker, Beauty Camera : 100,000,000+
Ringdroid : 50,000,000+
Motorola FM Radio : 100,000,000+


Here we also have a couple of apps with more than a billion installs each: 'YouTube' and 'Google Play Movies & TV'. On one hand, far fewer apps in this genre have at least 50 million installs. On the other hand, it is a very competitive market. We might not want to enter this market because it is difficult to create a video player that is different from the ones already available.

Finally, let's look at the 'Social' category to see if our app would be better suited for this genre:

In [33]:
for app in android_final:
    if app[1] == 'SOCIAL' and (app[5] == '1,000,000,000+'
                               or app[5] == '100,000,000+'
                               or app[5] == '50,000,000+'
                               or app[5] == '10,000,000+'
                               or app[5] == '1,000,000+'):
        print(app[0], ':', app[5])

Facebook : 1,000,000,000+
Tumblr : 100,000,000+
Pinterest : 100,000,000+
TextNow - free text + calls : 10,000,000+
Google+ : 1,000,000,000+
The Messenger App : 1,000,000+
Messenger Pro : 1,000,000+
Free Messages, Video, Chat,Text for Messenger Plus : 1,000,000+
Jodel - The Hyperlocal App : 1,000,000+
Love Sticker : 1,000,000+
LiveMe - Video chat, new friends, and make money : 10,000,000+
Love Images : 1,000,000+
Facebook Local : 1,000,000+
MobilePatrol Public Safety App : 1,000,000+
💘 WhatsLov: Smileys of love, stickers and GIF : 1,000,000+
HTC Social Plugin - Facebook : 10,000,000+
Quora : 10,000,000+
Kate Mobile for VK : 10,000,000+
Family GPS tracker KidControl + GPS by SMS Locator : 1,000,000+
Moment : 1,000,000+
Text Me: Text Free, Call Free, Second Phone Number : 10,000,000+
Badoo - Free Chat & Dating App : 100,000,000+
Text free - Free Text + Call : 10,000,000+
Tango - Live Video Broadcast : 100,000,000+
ooVoo Video Calls, Messaging & Stories : 50,000,000+
Instagram : 1,000,000,

Here we can see again that a few 'Social' apps have over a billion installs each. However, if we look further, we can see that many 'Social' apps have at least a million installs each. This tells us that social apps are very popular and that the data is not skewed as much as in the first two genres we examined.
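One way to make this comparison concrete is to count how many apps in a category fall into each install bucket. The sketch below assumes the same row layout as `android_final` (category at index 1, installs at index 5). `bucket_counts` is a hypothetical helper and `sample_rows` is made-up data for illustration.

```python
from collections import Counter


def bucket_counts(dataset, category):
    """Count apps per install bucket within one category, largest bucket first."""
    counts = Counter(app[5] for app in dataset if app[1] == category)

    def lower_bound(bucket):
        # '1,000,000+' -> 1000000, used only for sorting the buckets
        return int(bucket.replace('+', '').replace(',', ''))

    return sorted(counts.items(), key=lambda item: lower_bound(item[0]), reverse=True)


# Made-up rows for illustration only; unused columns are left empty
sample_rows = [
    ['App A', 'SOCIAL', '', '', '', '1,000,000,000+'],
    ['App B', 'SOCIAL', '', '', '', '1,000,000+'],
    ['App C', 'SOCIAL', '', '', '', '1,000,000+'],
    ['App D', 'TOOLS', '', '', '', '10,000+'],
]

for bucket, count in bucket_counts(sample_rows, 'SOCIAL'):
    print(bucket, ':', count)
```

A category whose counts are spread across many mid-sized buckets is less dependent on a few giants than one where almost all installs sit in the top bucket.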

For us, this is a good sign, because it means that there is a lot of potential and profit to be made in this market. However, it is also quite competitive. For example, Facebook alone has multiple apps built for its platform. It may not be worth competing directly against apps like Facebook, so we will have to go in a different direction.

After careful analysis of the iOS and Android markets, we recommend creating a 'Social' app and making it available on both Google Play and the App Store. Ideally, this would be a free dating app targeted towards customers in the U.S. The app should be built with an emphasis on practicality and entertainment. This would make the app more popular and profitable, because the iOS market is dominated by entertainment apps while the Android market offers more representation for practical apps.

With social media continuing to grow every day, our suggestion is supported by the data we found in the last section. For both the iOS and Android markets, the social apps genre ranked third in popularity compared to other genres. With this in mind, we would also have to create an app that can compete with other social apps. This is why we recommend a dating app: people who don't find a dating app that works for them are likely to try other dating apps. However, we need to do further research on this topic to confirm our assumptions. Finally, since our app is free, our main revenue will come from ads and purchases within the app. Perhaps we can offer multiple subscriptions to those who want a better experience (this can include the removal of ads).

## Conclusion

We were able to analyze data from the App Store and the Google Play Store in order to suggest an app that could be most profitable in both markets.

Our suggestion:

- Create a dating app that is free and targeted towards U.S. customers
- The app should offer both practicality and entertainment
- Include a tier-based subscription model that offers multiple packages in order to create a better user experience

It's important to note that while a dating app could become very popular and profitable, there are already multiple dating apps on the market. However, we can still create a dating app that is safe, reliable, and entertaining. Adding features and subscriptions that other dating apps do not offer could be the key to making this suggestion profitable.