# **Profitable App Profiles for the Apple App Store and Google Play Markets**
This project was developed as part of the 'Python for Data Science: Fundamentals' training from [Dataquest](https://www.dataquest.io/).

The goal of this project is to determine the types of mobile apps that are the most profitable on the Google Play and Apple App Stores. For this project, I am acting as a data analyst for a company that develops mobile apps for both app stores, and I will be working to enable our team of developers to make data-driven decisions in terms of the kind of apps they build.

This company only builds apps that are free to download and install. The main stream of revenue is in-app ads, meaning that for any given app, revenue is mostly influenced by the number of users of that app. In this project, we will analyze data to help our developers determine the types of apps that attract the greatest number of users.

## Opening and Exploring the Data Sets

As of September 2018, there were 2.1 million apps on the Google Play Store and 2 million apps on the Apple App Store. To save significant time and money, we will analyze a sample of this data. The data sets used in this project can be found at the following links:

[Google Play Store](https://www.kaggle.com/lava18/google-play-store-apps)
- Contains data for about 10,000 apps.

[Apple iOS App Store](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps)
- Contains data for about 7,200 apps.

Below, we will open the data sets and begin our exploration.

In [1]:
from csv import reader

# Google Play Store data set
opened_file = open('googleplaystore.csv', encoding = 'utf8')
read_file = reader(opened_file)
android = list(read_file)
android_header = android[0]
android = android[1:]

# Apple App Store data set
opened_file = open('AppleStore.csv', encoding = 'utf8')
read_file = reader(opened_file)
ios = list(read_file)
ios_header = ios[0]
ios = ios[1:]
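
The cell above never closes the two file handles. A minimal sketch of the same loading step with a context manager follows; `load_csv` and `split_header` are hypothetical helper names (not part of the original notebook), and the demonstration uses a small in-memory sample so that it runs without the Kaggle files:

```python
from csv import reader
from io import StringIO

def split_header(rows):
    """Split parsed CSV rows into (header, data rows)."""
    return rows[0], rows[1:]

def load_csv(path, encoding='utf8'):
    """Hypothetical helper: read a CSV file and close it automatically."""
    with open(path, encoding=encoding, newline='') as f:
        return split_header(list(reader(f)))

# Demonstration on an in-memory sample instead of the real files.
sample = StringIO('App,Category\n"Box","BUSINESS"\n"Slack","BUSINESS"\n')
header, rows = split_header(list(reader(sample)))
print(header)     # ['App', 'Category']
print(len(rows))  # 2
```

With the real files, a call such as `android_header, android = load_csv('googleplaystore.csv')` should give the same two objects as the cell above.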

The following function will allow us to examine specific rows of data that we define. This will also display the number of rows and columns for the specified data set.

First, we'll take a look at the Google Play Store data set.

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
        
print(android_header)
print('\n')
explore_data(android, 0, 3, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


Here we can see that the Google Play Store data set contains 10,841 apps and 13 columns. The columns that may be useful in our analysis are 'App', 'Category', 'Reviews', 'Installs', 'Type', 'Price', and 'Genres'.

Up next is the Apple App Store data set.

In [3]:
print(ios_header)
print('\n')
explore_data(ios, 0, 3, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


The Apple App Store data set contains 7,197 apps and 16 columns. The columns here that may be of use are 'track_name', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', and 'prime_genre'. These column names are less self-explanatory, but we can find more details on them in the data set's [documentation](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps).
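
Because the cells in this notebook address columns by position (for example 1, 5, or -5), it can help to look positions up by name. A small sketch, assuming only the header printed above; the name `ios_col` is introduced here for illustration:

```python
ios_header = ['id', 'track_name', 'size_bytes', 'currency', 'price',
              'rating_count_tot', 'rating_count_ver', 'user_rating',
              'user_rating_ver', 'ver', 'cont_rating', 'prime_genre',
              'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']

# Map each column name to its position in a row.
ios_col = {name: index for index, name in enumerate(ios_header)}

print(ios_col['track_name'])        # 1
print(ios_col['rating_count_tot'])  # 5
print(ios_col['prime_genre'])       # 11, the same column as index -5
```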

## Deleting Wrong Data

The Google Play Store data set actually has a dedicated [discussion section](https://www.kaggle.com/lava18/google-play-store-apps/discussion) where we can see a [discussion](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015) for an error that was found regarding row 10,472. Below we will print the problematic row and compare it with the header and a row that is correct.

In [4]:
print(android[10472]) # the problematic row
print('\n')
print(android_header) # the data set header
print('\n')
print(android[0]) # a correct row

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


Row 10,472 refers to the app 'Life Made WI-Fi Touchscreen Photo Frame', and we can see that the rating is shown as 19. This is clearly not correct, as the maximum rating for a Google Play app is 5. As mentioned in the [discussion section](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015), the problem is due to a missing value in the 'Category' column, which shifts every later value one position to the left. As a result, we will delete this row.

In [5]:
print(len(android))
del android[10472] # run this ONLY ONCE
print(len(android))

10841
10840


Upon reviewing the [discussion section](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/discussion) for the Apple App Store data set, there do not appear to be any similar errors in that set.
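
A row with a missing value is shorter than the header, so this kind of error can also be found programmatically rather than through the discussion board. A minimal sketch on made-up three-column rows; `find_malformed_rows` is a hypothetical helper, not part of the original notebook:

```python
def find_malformed_rows(dataset, header):
    """Return the indexes of rows whose length differs from the header's."""
    return [i for i, row in enumerate(dataset) if len(row) != len(header)]

# Illustrative rows only; the second one is missing its 'Category' value.
header = ['App', 'Category', 'Rating']
rows = [
    ['Photo Editor', 'ART_AND_DESIGN', '4.1'],
    ['Life Made WI-Fi Touchscreen Photo Frame', '1.9'],
    ['Coloring book moana', 'ART_AND_DESIGN', '3.9'],
]
print(find_malformed_rows(rows, header))  # [1]
```

This check only catches rows with the wrong number of fields; a wrong value in a row of the correct length would still need a separate test.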

## Removing Duplicate Entries

### Part One

As we continue to investigate the Google Play data, we can see that there are some duplicate entries. For example, we have determined that Instagram has four separate entries.

In [6]:
for app in android:
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


Next, we will determine how many total duplicate entries exist for all apps in the Google Play Store data set.

In [7]:
duplicate_apps = []
unique_apps = []

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[:10])

Number of duplicate apps: 1181


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']
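
The membership test `name in unique_apps` scans a list, which gets slower as the list grows. A sketch of the same counting logic with a set, assuming nothing beyond a list of names; `count_duplicates` is a hypothetical helper name:

```python
def count_duplicates(names):
    """Return the names that repeat, one entry per repeat occurrence."""
    seen = set()
    duplicates = []
    for name in names:
        if name in seen:
            duplicates.append(name)
        else:
            seen.add(name)
    return duplicates

# Illustrative names only, not the real data set.
sample_names = ['Box', 'Slack', 'Box', 'Instagram', 'Box', 'Slack']
print(count_duplicates(sample_names))  # ['Box', 'Box', 'Slack']
```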


In comparison, the Apple iOS App Store data set does not have any duplicate entries when we check the id column (index 0), which identifies each app.

In [8]:
duplicate_ios_apps = []
unique_ios_apps = []

for app in ios:
    name = app[0] # index 0 is the 'id' column in the iOS data set, not 'track_name'
    if name in unique_ios_apps:
        duplicate_ios_apps.append(name)
    else:
        unique_ios_apps.append(name)

print('Number of duplicate apps:', len(duplicate_ios_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_ios_apps[:10])

Number of duplicate apps: 0


Examples of duplicate apps: []


When we are analyzing the data, we do not want to count the same app multiple times. We will need to remove the duplicate entries in order to keep one entry per app.

As we take a closer look at the duplicates for Instagram, we can see that the main difference between the entries is the fourth position of each row. This position holds the number of reviews, and the differing review counts show that the data was collected at different times.

This information can be used to build a criterion for removing the duplicate entries. A higher number of reviews means that the entry is more recent than the others. Instead of removing duplicates at random, we will keep the row with the highest number of reviews and remove the other entries for any given app.

To remove the duplicates, we will:
- Create a dictionary where each dictionary key is a unique app name, and the corresponding dictionary value is the highest number of reviews for that app.

- Use that information stored in the dictionary to create a new data set, which will contain only one entry per app (where for each app, we will only select the entry with the highest number of reviews).

### Part Two

We will begin by creating the dictionary.

In [9]:
reviews_max = {}

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

In a previous block of code, we determined that the number of duplicate entries in the Google Play data set is 1,181. This means that there are 1,181 instances where an app appears more than once. So, the length of the dictionary of unique apps should be equal to the difference between the length of our data set and 1,181.

In [10]:
print('Expected length:', len(android) - 1181)
print('Actual length:', len(reviews_max))

Expected length: 9659
Actual length: 9659


Now we will begin to remove the duplicate entries using the reviews_max dictionary. Again, for the duplicate entries, we will only keep the entries with the highest number of reviews. In the code below:
- We start by initializing two new empty lists, android_clean and already_added.
- We loop through the Google Play (android) data set and for every iteration:
    - We isolate the name and number of reviews for the app.
    - We then add the current row (app) to the android_clean list and the app name to the already_added list if:
        - The number of reviews of the current app matches the number of reviews of that app as described in the reviews_max dictionary, and:
        - The name of the app is not already in the already_added list. We need to add this extra condition in the case where the highest number of reviews of a duplicate app is the same for more than one entry. If we only check for reviews_max[name] == n_reviews, we will still end up with duplicate entries for some apps.

In [11]:
android_clean = []
already_added = []

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if (reviews_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name) # this needs to be inside of the if statement block

Let's take a look at the new data set we just created to confirm that the number of rows is 9,659.

In [12]:
explore_data(android_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


Just as expected, we have 9,659 rows.
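
The two passes above can also be folded into a single pass by storing the best row itself rather than only its review count. This is an alternative sketch, not the method used in this notebook; `keep_most_reviewed` is a hypothetical helper, and the rows below are shortened, partly made-up examples:

```python
def keep_most_reviewed(dataset, name_index=0, reviews_index=3):
    """Keep, for each app name, the first row with the highest review count."""
    best = {}
    for row in dataset:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' keeps the earliest row when review counts tie.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

# Shortened illustrative rows: name, category, rating, reviews.
sample = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
]
for row in keep_most_reviewed(sample):
    print(row)
```

One difference to keep in mind: the result is ordered by each app's first appearance, so row order can differ from `android_clean` even though the same rows are kept.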

## Removing Non-English Apps

### Part One

After exploring the data sets further, we can see that the names of some of the apps suggest they are not directed toward an English-speaking audience. Here are a couple of examples from each data set:

In [13]:
print(ios[813][1])
print(ios[6731][1])

print(android_clean[4412][0])
print(android_clean[7940][0])

爱奇艺PPS -《欢乐颂2》电视剧热播
【脱出ゲーム】絶対に最後までプレイしないで 〜謎解き＆ブロックパズル〜
中国語 AQリスニング
لعبة تقدر تربح DZ


Our company develops its apps in English, and we would like to analyze only the apps directed toward English-speaking audiences. Thus, we will need to remove apps like the ones above from the data sets.

The characters commonly used in English text (letters, digits, and basic punctuation) all belong to the ASCII set, whose code points range from 0 to 127. One option for removing the unwanted names is to find and remove any apps whose names include characters that are not in that range. We will test this with the function below:

In [14]:
def is_english(string):
    
    for character in string:
        if ord(character) > 127:
            return False
        
    return True

print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

True
False
False
False


While this approach does indeed remove apps with names containing characters outside of the 0 to 127 ASCII character range, some of the English app names contain other symbols or emojis that also fall outside of that range. We can see examples of this above with ™ and 😜, as well as — (em dash) and – (en dash) to name a few others.

In [15]:
print(ord('™'))
print(ord('😜'))

8482
128540


Due to this, we would end up removing potentially useful apps from our data sets.
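
As a quick check of the symbols mentioned above, the following sketch looks up the code point of each one. The characters are written as escape sequences so the snippet does not depend on the terminal's encoding; the labels are only descriptive names chosen for this example:

```python
symbols = {
    'trade mark sign': '\u2122',
    'emoji U+1F61C': '\U0001F61C',
    'em dash': '\u2014',
    'en dash': '\u2013',
}

# ord() returns the Unicode code point of a single character.
code_points = {label: ord(char) for label, char in symbols.items()}

for label, value in code_points.items():
    print(label, ':', value, '(outside ASCII)' if value > 127 else '(ASCII)')
```

All four values are far above 127, so a strict ASCII-only rule rejects any name that contains even one of them.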

### Part Two

In order for us to minimize data loss, we will remove names that contain more than three characters that are outside of the ASCII range described above.

In [16]:
def is_english(string):
    non_ascii = 0
    
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    
    if non_ascii > 3:
        return False
    else:
        return True

print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))

True
True
False


This function is still not perfect; however, for our analysis it will work just fine. Now we will use this function on both the Google Play and iOS App Store data sets to filter out the non-English apps.

In [17]:
android_english = []
ios_english = []

for app in android_clean:
    name = app[0]
    if is_english(name):
        android_english.append(app)
    
for app in ios:
    name = app[1]
    if is_english(name):
        ios_english.append(app)

explore_data(android_english, 0, 3, True)
print('\n')
explore_data(ios_english, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 

We can now see that we are left with 9,614 rows in the Google Play data set and 6,183 rows in the iOS App Store data set.

## Isolating the Free Apps

As mentioned in the introduction, our company only develops apps that are free to download and install, with our main source of revenue consisting of in-app ads. These data sets contain both free and non-free apps, so we will need to isolate only the free apps for our analysis.

In [18]:
android_final = []
ios_final = []

for app in android_english:
    price = app[7]
    if price == '0':
        android_final.append(app)
        
for app in ios_english:
    price = app[4]
    if price == '0.0':
        ios_final.append(app)
        
print(len(android_final))
print(len(ios_final))

8864
3222


This leaves us with 8,864 rows in the Google Play data set and 3,222 rows in the iOS App Store data set. This will be enough for us to conduct our analysis.
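
The string comparisons above depend on the exact text of the price ('0' for Google Play, '0.0' for iOS). A slightly more tolerant sketch converts the price to a number first. It assumes that a paid price may carry a leading '$' sign, which is an assumption about the raw data and is not shown in the rows printed in this notebook; `is_free` is a hypothetical helper:

```python
def is_free(price):
    """Return True when a price string such as '0', '0.0' or '$0.00' is zero."""
    # Assumption: the only non-numeric character that can appear is '$'.
    return float(price.replace('$', '')) == 0.0

for price in ['0', '0.0', '$4.99', '2.99']:
    print(price, '->', is_free(price))
```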

## Most Common Apps by Genre

### Part One

As we mentioned in the introduction, our goal is to determine what kind of apps will attract more users due to our revenue being highly influenced by the number of people using our apps.

To minimize our risk and overhead, our validation strategy for app ideas is comprised of three steps:
1. Build a minimal Android version of the app and add it to the Google Play Store.
2. If the app receives a good response from users, the app will be developed further.
3. If the app is profitable after six months, we build an iOS version of the app to be added to the iOS App Store.

With our end goal being an app that is on both the Google Play and iOS App Stores, we need to find app profiles that are successful on both of these platforms. For example, a profile that works well on both platforms may be a productivity app that makes use of gamification.

In [19]:
print(android_header)
print('\n')
print(ios_header)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


We'll begin our analysis by getting a sense of what the most common genres are for each market. To do this, we will build a frequency table for the Genres and Category columns of the Google Play data set and the prime_genre column of the iOS App Store data set.

### Part Two

We will build two functions that we can use to analyze the frequency tables:
- One will generate frequency tables that show percentages (freq_table)
- The other function we can use to display the percentages in a descending order (display_table)

In [20]:
def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
        
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage
    
    return table_percentages


def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
    
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
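
For comparison, the same percentage table can be built with `collections.Counter` from the standard library. This is an alternative sketch rather than the approach used in this notebook, and `freq_table_counter` is a hypothetical name:

```python
from collections import Counter

def freq_table_counter(dataset, index):
    """Return {value: percentage} for one column, using collections.Counter."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return {value: count / total * 100 for value, count in counts.items()}

# Illustrative rows only: an identifier and a genre.
sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
table = freq_table_counter(sample, 1)

# Sort by percentage, largest first, as display_table does.
for genre, pct in sorted(table.items(), key=lambda item: item[1], reverse=True):
    print(genre, ':', pct)
```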

### Part Three

We'll start our analysis by examining the frequency table for the prime_genre column from the iOS App Store data set.

In [21]:
display_table(ios_final, -5)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


Here we can see that among the free English apps, well over half (58.16%) are games. Entertainment is the runner-up with close to 8%, followed by Photo and Video at almost 5%. Education makes up only 3.66% of the apps and Social Networking makes up only 3.29% in the data set.

The general impression from these results is that the majority of apps on the iOS App Store (at least in the free English segment) are related to some form of entertainment or fun: games, entertainment, photo and video, social networking, sports, music, and so on. Apps with more practical purposes, such as education, productivity, weather, and navigation, seem to be rarer. This, however, does not mean that the entertainment-focused apps have the greatest number of users (which is the goal for our own app).

Next up we will examine the Category and Genres columns of the Google Play data set.

In [22]:
display_table(android_final, 1) # we are looking at the Category column here

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

We can already see that the Google Play market is quite different from that of the iOS App Store. Few apps are designed purely for entertainment, and practical apps appear to make up the majority of this data set. However, a closer look at the Family category (18.9%) on the Google Play Store shows that it consists mostly of games for kids, in addition to the separate Game category (9.7%).

That being said, practical apps still seem to be better represented on Google Play than on the iOS App Store. This is also confirmed by examining the Genres column for Google Play:

In [23]:
display_table(android_final, -4) # looking at the Genres column

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

The difference between the Category and Genres columns in the Google Play data set is not very clear. However, we can clearly see that the Genres column is much more granular in its categorization of the apps. As we are only looking for the bigger picture in this analysis, we will work with only the Category column for Google Play moving forward.
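
Part of that granularity comes from values that combine two genres with a semicolon, such as 'Art & Design;Pretend Play' in the rows printed earlier. A small sketch of how such values could be split before counting; this step is not performed in this notebook, and `split_genres` is a hypothetical helper:

```python
from collections import Counter

def split_genres(genres_value):
    """Split a Genres value such as 'Art & Design;Pretend Play' into parts."""
    return genres_value.split(';')

# Genres values taken from the three Google Play rows shown earlier.
sample_genres = ['Art & Design', 'Art & Design;Pretend Play', 'Art & Design']
counts = Counter(part for value in sample_genres for part in split_genres(value))

print(counts['Art & Design'])  # 3
print(counts['Pretend Play'])  # 1
```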

We have found that the majority of apps on the iOS App Store are designed for entertainment purposes, while with Google Play, we can see a more balanced landscape between entertainment/fun and practical apps (with practical apps still taking the majority).

Next we will want to determine the kinds of apps that have the most users.

## Most Popular Apps by Genre on the iOS App Store

One option to determine which genres are the most popular (have the most users) is to calculate the average number of installs for each genre. The Google Play Store data set contains that information in the Installs column; however, the iOS App Store data set does not. Instead, we will use the total number of user ratings as a proxy, which can be found in the rating_count_tot column.

Below, we will calculate the average number of user ratings per genre on the iOS App Store.

In [24]:
genres_ios = freq_table(ios_final, -5)

for genre in genres_ios:
    total = 0
    len_genre = 0
    for app in ios_final:
        genre_app = app[-5]
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1 # '+=' increments the count; '=+ 1' would reset it to 1 every time
    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)

Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0


Based on the above information, navigation apps appear to have the highest average number of user ratings. However, this figure is strongly influenced by Waze and Google Maps, as these two apps have a combined total of almost half a million user ratings.

In [25]:
for app in ios_final:
    if app[-5] == 'Navigation':
        print(app[1], ':', app[5]) # print name and number of ratings for the app

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


A similar pattern applies to social networking apps, where the average number of ratings is heavily influenced by a few apps, namely Facebook, Pinterest, Skype, etc. We can also see the same pattern with music apps, where a few big apps, such as Spotify and Pandora, influence the average.

Our goal is to find which genres are the most popular, and while navigation and social networking appear to be the most popular, the fact that these numbers are skewed by a few apps means that these genres may not be as popular as they seem.
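
The effect of a few very large apps can be made concrete by comparing the mean with the median, which is much less sensitive to outliers. The sketch below uses the six Navigation rating counts printed above; using the median here is a suggestion and not part of the original analysis:

```python
from statistics import mean, median

# Rating counts of the six free English Navigation apps listed above.
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

print(mean(navigation_ratings))    # about 86090.33
print(median(navigation_ratings))  # 8196.5

# Mean of the remaining apps once Waze and Google Maps are left out.
without_top_two = navigation_ratings[2:]
print(mean(without_top_two))       # 4146.25
```

The median is roughly a tenth of the mean, which supports the reading that the genre average says more about two apps than about the genre as a whole.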

If we take a look at the Reference apps, we'll see that they have an average of 74,942 user ratings, but this is also skewed by the Bible and Dictionary.com apps:

In [26]:
for app in ios_final:
    if app[-5] == 'Reference':
        print(app[1], ':', app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


This genre does seem to have some potential. As we've noted, the iOS App Store appears to be slightly over-saturated with entertainment/fun apps, so a practical app may have a better chance of reaching more users.

Other genres that seem popular include weather, book, food and drink, and finance.

In [27]:
print('Weather:')
for app in ios_final:
    if app[-5] == 'Weather':
        print(app[1], ':', app[5])
        
print('\n')
print('Book:')
for app in ios_final:
    if app[-5] == 'Book':
        print(app[1], ':', app[5])
        
print('\n')
print('Food & Drink:')
for app in ios_final:
    if app[-5] == 'Food & Drink':
        print(app[1], ':', app[5])
        
print('\n')
print('Finance:')
for app in ios_final:
    if app[-5] == 'Finance':
        print(app[1], ':', app[5])

Weather:
The Weather Channel: Forecast, Radar & Alerts : 495626
The Weather Channel App for iPad – best local forecast, radar map, and storm tracking : 208648
WeatherBug - Local Weather, Radar, Maps, Alerts : 188583
MyRadar NOAA Weather Radar Forecast : 150158
AccuWeather - Weather for Life : 144214
Yahoo Weather : 112603
Weather Underground: Custom Forecast & Local Radar : 49192
NOAA Weather Radar - Weather Forecast & HD Radar : 45696
Weather Live Free - Weather Forecast & Alerts : 35702
Storm Radar : 22792
QuakeFeed Earthquake Map, Alerts, and News : 6081
Moji Weather - Free Weather Forecast : 2333
Hurricane by American Red Cross : 1158
Forecast Bar : 375
Hurricane Tracker WESH 2 Orlando, Central Florida : 203
FEMA : 128
iWeather - World weather forecast : 80
Weather - Radar - Storm with Morecast App : 78
Yurekuru Call : 53
Weather & Radar : 37
WRAL Weather Alert : 25
Météo-France : 24
JaxReady : 22
Freddy the Frogcaster's Weather Station : 14
Almanac Long-Range Weather Forecast : 12

Let's now take a look at the Google Play market.

## Most Popular Apps by Genre on Google Play

For the Google Play data set, we actually have the number of installs, provided by the Installs column. This should give us better clarity on the popularity of each genre. As we'll see below, the values in the Installs column are open-ended:

In [28]:
display_table(android_final, 5) # the Installs column

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


For the purposes of this project, we will leave these numbers as they are, treating an app with 100,000+ installs as having 100,000 installs, an app with 1,000,000+ installs as having 1,000,000 installs, and so on. We are leaving these as general measures because we want to determine which app genres generally attract the most users, so perfect precision in the number of users is not needed.

To calculate what we need, we'll have to convert each install number to a float, which means removing the commas and plus signs to avoid errors. We will do this in the loop below, where we will also calculate the average number of installs for each genre/category.
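To see the conversion in isolation before the full loop, here is a minimal sketch. The helper name `parse_installs` and the sample labels are assumptions for illustration only; the notebook itself performs the same two `replace` calls inline.

```python
def parse_installs(label):
    """Turn an Installs label such as '1,000,000+' into a float.

    Hypothetical helper, not part of the original notebook.
    """
    # Remove thousands separators and the trailing plus sign.
    cleaned = label.replace(',', '').replace('+', '')
    return float(cleaned)


# A few labels in the same format as the Installs column.
for label in ['0', '5+', '100,000+', '1,000,000,000+']:
    print(label, '->', parse_installs(label))
```

Without the two `replace` calls, `float('100,000+')` would raise a `ValueError`, which is the error the cleaning step avoids.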

In [29]:
categories_android = freq_table(android_final, 1)

for category in categories_android:
    total = 0
    len_category = 0
    for app in android_final:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    print(category, ':', avg_n_installs)

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_
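The averages above are printed in no particular order, which makes the ranking hard to read. As a rough sketch, the same calculation could be done in a single pass over the data and then sorted from highest to lowest. The function name `average_installs_by_category` and the rows in `sample_rows` are made up for illustration; only the column positions (category at index 1, installs at index 5) follow the notebook.

```python
# Hypothetical rows laid out like android_final:
# index 1 holds the category and index 5 holds the Installs label.
sample_rows = [
    ['App A', 'COMMUNICATION', '', '', '', '1,000,000,000+'],
    ['App B', 'COMMUNICATION', '', '', '', '10,000+'],
    ['App C', 'BOOKS_AND_REFERENCE', '', '', '', '1,000,000+'],
    ['App D', 'BOOKS_AND_REFERENCE', '', '', '', '500,000+'],
    ['App E', 'WEATHER', '', '', '', '5,000,000+'],
]


def average_installs_by_category(rows):
    """Return a dictionary mapping each category to its average installs."""
    totals = {}
    counts = {}
    for app in rows:
        category = app[1]
        installs = float(app[5].replace(',', '').replace('+', ''))
        totals[category] = totals.get(category, 0) + installs
        counts[category] = counts.get(category, 0) + 1
    return {category: totals[category] / counts[category] for category in totals}


averages = average_installs_by_category(sample_rows)

# Sort by average installs, largest first, so the ranking is easy to read.
ranking = sorted(averages.items(), key=lambda item: item[1], reverse=True)
for category, avg in ranking:
    print(category, ':', avg)
```

This visits each app once instead of once per category, and it produces the same averages as the nested loop for the same input.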

With this information, we can see that communication apps have the most installs on average: 38,456,119. This number, however, is heavily skewed by a few apps with over one billion installs (WhatsApp, Facebook Messenger, Hangouts, etc.) and a few others with over 100 and 500 million installs:

In [30]:
for app in android_final:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+' or app[5] == '500,000,000+' or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

If we were to remove all of the communication apps with over 100 million installs, the average would be roughly ten times smaller:

In [31]:
under_100_mil = []

for app in android_final:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'COMMUNICATION') and (float(n_installs) < 100000000):
        under_100_mil.append(float(n_installs))
    
sum(under_100_mil) / len(under_100_mil)

3603485.3884615386

The same pattern can be seen for the video players category, which is the runner-up with an average of 24,727,872 installs. This category is dominated by apps such as YouTube and Google Play Movies & TV. We can see the pattern again with social networking apps (dominated by Facebook, Instagram, etc.), photography apps (Google Photos, etc.), and productivity apps (Microsoft Word, Dropbox, Google Calendar, Evernote, etc.).

Again, our main concern is that these genres may seem more popular than they really are, because a few very large apps skew the averages.
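One way to picture this skew is to compare the mean with the median, which is far less sensitive to a handful of very large values. The median is not used anywhere in this analysis, and the install counts below are invented for illustration, so treat this as a sketch of the effect rather than a result from the data set.

```python
from statistics import mean, median

# Hypothetical install counts for one category: several mid-sized apps
# plus a single app with one billion installs.
installs = [10000.0, 50000.0, 100000.0, 500000.0, 1000000.0, 1000000000.0]

# The single giant app pulls the mean far above what a typical app achieves,
# while the median stays close to the middle of the smaller apps.
print('mean  :', mean(installs))
print('median:', median(installs))
```

When the two values differ by orders of magnitude, as they do here, the average says more about the few largest apps than about the genre as a whole.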

The game genre also appears to be quite popular, although we've previously seen that this genre is fairly saturated, so we'd like to come up with a different app recommendation if possible.

The books and reference genre also looks fairly popular, with an average of 8,767,811 installs. This genre may be worth exploring further, since we found that it also has good potential on the iOS App Store.

Given that our aim is to recommend an app genre that shows potential for being profitable on both the iOS App Store and the Google Play Store, we'll take a look at some of the apps from this genre and their number of installs:

In [32]:
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E

The books and reference genre contains a wide variety of apps: software for processing and reading ebooks, various collections of libraries, dictionaries, language or programming tutorials, etc. Even here, there still seems to be a small number of very popular apps that skew the average:

In [33]:
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000,000+' or app[5] == '500,000,000+' or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad 📖 Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+


It looks like there are only a few very popular apps here, so this market still shows some potential. Let's look at some apps that are somewhere in the middle in terms of popularity (between 1,000,000 and 100,000,000 installs) to get some ideas:

In [34]:
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000+' or app[5] == '5,000,000+' or app[5] == '10,000,000+' or app[5] == '50,000,000+'):
        print(app[0], ':', app[5])

Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
ReadEra – free ebook reader : 1,000,000+
Ebook Reader : 5,000,000+
Read books online : 5,000,000+
eBoox: book reader fb2 epub zip : 1,000,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
Moon+ Reader : 10,000,000+
English-Myanmar Dictionary : 1,000,000+
Golden Dictionary (EN-AR) : 1,000,000+
All Language Translator Free : 1,000,000+
Aldiko Book Reader : 10,000,000+
Dictionary - WordWeb : 5,000,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Al-Quran (Free) : 10,000,000+
Al Quran Indonesia : 10,000,000+
Al'Quran Bahasa Indonesia : 10,000,000+
Al Quran Al karim : 1,000,000+
Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+
Koran Read &MP3 30 Juz Offline : 1,000,000+
H
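The string comparisons in the cell above work because the Installs column only takes a fixed set of labels. As an alternative sketch, the same middle band could be selected with a numeric range, which avoids listing every label by hand. The helper `installs_in_range` and the rows in `sample_rows` are hypothetical and only mirror the column layout of `android_final`.

```python
def installs_in_range(label, low, high):
    """Return True if the Installs label, read as a number, lies in [low, high)."""
    n_installs = float(label.replace(',', '').replace('+', ''))
    return low <= n_installs < high


# Hypothetical rows: index 1 is the category, index 5 is the Installs label.
sample_rows = [
    ['Reader X', 'BOOKS_AND_REFERENCE', '', '', '', '10,000,000+'],
    ['Reader Y', 'BOOKS_AND_REFERENCE', '', '', '', '100,000+'],
    ['Reader Z', 'BOOKS_AND_REFERENCE', '', '', '', '1,000,000,000+'],
    ['Chat Q', 'COMMUNICATION', '', '', '', '5,000,000+'],
]

# Keep books and reference apps with at least 1,000,000 and
# fewer than 100,000,000 installs.
mid_popularity = [
    app for app in sample_rows
    if app[1] == 'BOOKS_AND_REFERENCE'
    and installs_in_range(app[5], 1000000, 100000000)
]

for app in mid_popularity:
    print(app[0], ':', app[5])
```

The upper bound is exclusive so that apps labeled 100,000,000+ are left out, matching the four labels checked in the original cell.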

This genre seems to be dominated by software for processing and reading ebooks, as well as various collections of dictionaries and libraries, so it is probably not a good idea to build a similar app since there'll be significant competition.

We can also see that there are quite a few apps built around the Quran, which suggests that building an app around a popular book can be profitable. It seems that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and iOS App Store markets.

Since these markets are already full of libraries, we would need to add some special features to the app besides the raw version of the book. This could include daily quotes from the book, an audio version of the book, quizzes on the book, a forum for users to discuss the book, etc.

## Conclusions

In this project, we analyzed data about the Google Play Store and iOS App Store with the goal of determining and recommending an app profile that can be profitable for both markets.

We concluded that taking a popular book (perhaps a more recent one) and turning it into an app could be profitable for both the Google Play and iOS App Store markets. Since these markets are already full of libraries, we would need to add some special features to the app besides the raw version of the book. This could include daily quotes from the book, an audio version of the book, quizzes on the book, a forum for users to discuss the book, etc.