# Profitable App Profiles for the App Store and Google Play Markets 

### Introduction

For this project, we will assume the role of a data analyst at a company that builds Android and iOS apps for an English-speaking audience. The apps built are available on both Google Play and the App Store. The company only builds apps that are free to download and install, and its main source of revenue consists of in-app advertisements. Thus, its revenue is greatly influenced by the number of users who use its apps.

The goal of this project is to analyze the following datasets, [googleplaystore.csv](https://www.kaggle.com/lava18/google-play-store-apps) and [AppleStore.csv](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps), to help the company's developers understand what types of apps are likely to attract more users.

### Step 1: Opening and exploring the data 

Before we can begin the data analysis process, we must import the desired datasets and explore them a little.

We first open the two datasets and save each as a list of lists. Next, we define a function that lets us explore our datasets. The function takes four parameters and prints the requested slice of the dataset. It can also optionally tell us the number of rows and columns the dataset has.

In [1]:
from csv import reader

# The App Store data 
opened_file1 = open('AppleStore.csv', encoding='utf8')
read_file1 = reader(opened_file1)
ios = list(read_file1)
ios_header = ios[0]
ios_data = ios[1:]

# The Google Play data
opened_file2 = open('googleplaystore.csv', encoding='utf8')
read_file2 = reader(opened_file2)
android = list(read_file2)
android_header = android[0]
android_data = android[1:]

# Define a function that lets you explore the datasets
def explore_data(dataset, start, end, rows_columns_counter=False):
    dataset_slice = dataset[start:end]
    for position, row in enumerate(dataset_slice):
        print(row)
        print('\n')
        # Compare positions, not contents: a duplicate of the last row
        # earlier in the slice would otherwise trigger the message early.
        if position == len(dataset_slice) - 1:
            print('That was the last row in the slice')
            print('\n')

    if rows_columns_counter:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [2]:
print(android_header)
print('\n')
explore_data(android_data, 0, 1, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', '7-Jan-18', '1.0.0', '4.0.3 and up']


That was the last row in the slice


Number of rows: 10841
Number of columns: 13


Column descriptors for the Android mobile app dataset:

|Heading        |Definition                      
|:--            |:--                      
|App            |Application Name         
|Category       |Category                 
|Rating         |User Rating              
|Reviews        |User Rating Count        
|Size           |Size(in Megabytes)                 
|Installs       |Number of downloads                
|Type           |Paid or Free                      
|Price          |Price                              
|Content Rating |Target Age Group                   
|Genres         |Genres                             
|Last Updated   |Last Update(when scraped)          
|Current Ver    |Current Version                    
|Android Ver    |Minimum Required Version of Android

We see that the Google Play data set has a total of 10841 apps (10842 rows) and 13 columns. The columns that might be useful for the purpose of our analysis are `App`, `Category`, `Reviews`, `Installs`, `Price`, and `Genres`.

In [3]:
print(ios_header)
print('\n')
explore_data(ios_data, 0, 1, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0', '2974676', '212', '3.5', '3.5', '95', '4+', 'Social Networking', '37', '1', '29', '1']


That was the last row in the slice


Number of rows: 7197
Number of columns: 16


Column descriptors for the iOS mobile app dataset:

|Heading         |Definition                         |
|:--             |:--                                |
|id              |App ID                             |
|track_name      |App Name                           |
|size_bytes      |Size(in Bytes)                     |
|currency        |Currency Type                      |
|price           |Price                              |
|rating_count_tot|User Rating Count (all versions)   |
|rating_count_ver|User Rating Count (current version)|
|user_rating     |Avg User Rating (all versions)     |
|user_rating_ver |Avg User Rating (current version)  |
|ver             |Latest Version Code                |
|cont_rating     |Content Rating                     |
|prime_genre     |Primary Genre                      |
|sup_devices.num |Number of Supported Devices        |
|ipadSc_urls.num |Number of Screenshots Shown for Display|
|lang.num        |Number of Supported Languages      |
|vpp_lic         |Vpp Device Based Licensing Available|

We see that the App Store dataset has a total of 7197 apps (7198 rows) and 16 columns. The columns that might be useful for the purpose of our analysis are `track_name`, `price`, `rating_count_tot`, and `prime_genre`.  
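The loading code in this step never closes the files it opens. As a minimal sketch of an alternative, the hypothetical helpers `split_header` and `load_csv` below (names introduced here for illustration only) read a CSV inside a `with` block so the file is closed automatically. The demonstration uses a small in-memory sample rather than the real files.

```python
from csv import reader
from io import StringIO


def split_header(file_obj):
    """Read a CSV file object and return (header, data_rows)."""
    rows = list(reader(file_obj))
    return rows[0], rows[1:]


def load_csv(path):
    """Open `path`, read it fully, and close the file automatically."""
    with open(path, encoding='utf8', newline='') as f:
        return split_header(f)


# In-memory sample (made up for illustration) so the sketch runs without the CSV files.
sample = StringIO('App,Category\nFacebook,SOCIAL\n"Maps, Navigation",TRAVEL\n')
header, data = split_header(sample)
print(header)
print(data)
```

With the real files in the working directory, the same idea would read `ios_header, ios_data = load_csv('AppleStore.csv')`.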

### Step 2: Removing inaccurate data

Data integrity is vital to drawing sound conclusions in any data analysis. Thus, it is important to check the data for inconsistencies or errors and correct them as needed.

First, we try to detect inaccurate data, and then we remove those entries.

According to the discussion board on Kaggle, entry 10472 in the Google Play dataset has a rating of 19 stars (out of 5 stars). This is because entry 10472 is missing data in the `Category` column, so every following value was shifted one column to the left.

In [4]:
print(android_data[10472])
print('\n')
print(android_header)

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', '11-Feb-18', '1.0.19', '4.0 and up', '']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


In [5]:
print(len(android_data))
del(android_data[10472])
print(len(android_data))

10841
10840


We have successfully removed entry 10472 in the Google Play dataset. 
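The faulty entry was found through the Kaggle discussion board, but a shifted row like this one can also be caught programmatically, because its rating falls outside the 0 to 5 scale. The sketch below assumes a hypothetical helper `out_of_range_ratings` and runs on three abbreviated rows (the third is made up) instead of the full dataset.

```python
def out_of_range_ratings(rows, rating_index, low=0.0, high=5.0):
    """Return (position, row) pairs whose rating is numeric but outside [low, high]."""
    flagged = []
    for position, row in enumerate(rows):
        try:
            rating = float(row[rating_index])
        except (ValueError, IndexError):
            continue  # non-numeric or missing rating: not handled by this check
        # Comparisons with NaN are always False, so 'NaN' ratings are not flagged.
        if rating < low or rating > high:
            flagged.append((position, row))
    return flagged


# Abbreviated rows: name, category, rating (the last row is hypothetical).
sample_rows = [
    ['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1'],
    ['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19'],
    ['Hypothetical App', 'TOOLS', 'NaN'],
]
flagged = out_of_range_ratings(sample_rows, 2)
for position, row in flagged:
    print(position, row)
```

Only the shifted row is reported, since its third value is `'19'`.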

In [6]:
def inaccurate_data(dataset_header, dataset):
    inaccurate_rows = []
    for row in dataset:
        if len(row) != len(dataset_header):
            inaccurate_rows.append(row)
    
    print('There are', len(inaccurate_rows), 'rows that have missing or additional columns.')

print('Google Play dataset')
inaccurate_data(android_header, android_data)
print('\n')
print('App Store dataset')
inaccurate_data(ios_header, ios_data)

Google Play dataset
There are 0 rows that have missing or additional columns.


App Store dataset
There are 0 rows that have missing or additional columns.


Above, we verified that there are no other rows in either dataset with missing or additional columns.

### Step 3: Removing duplicates

Duplicate entries will often skew or inflate the results of an analysis. We continue the data cleaning process by checking and removing existing duplicates in the Google Play dataset.

In [7]:
unique_apps = []
duplicate_apps = []

for app in android_data:
    app_name = app[0] 
    if app_name in unique_apps:
        duplicate_apps.append(app_name)
    else:
        unique_apps.append(app_name)
        
print('Number of apps in the Google Play dataset:', len(android_data))
print('Number of unique apps:', len(unique_apps))
print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[:10])

Number of apps in the Google Play dataset: 10840
Number of unique apps: 9659
Number of duplicate apps: 1181


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']
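Checking `app_name in unique_apps` against a list rescans the list on every iteration. As a sketch of an alternative, `collections.Counter` from the standard library counts names with hash lookups. The helper name `duplicate_summary` and the sample rows are illustrative assumptions, not part of the original analysis.

```python
from collections import Counter


def duplicate_summary(rows, name_index=0):
    """Return (number of unique names, number of extra entries, names seen more than once)."""
    counts = Counter(row[name_index] for row in rows)
    n_unique = len(counts)
    n_duplicates = sum(counts.values()) - n_unique
    repeated = {name: n for name, n in counts.items() if n > 1}
    return n_unique, n_duplicates, repeated


# Tiny made-up sample: one name per row.
sample_rows = [['Box'], ['Slack'], ['Box'], ['Box'], ['Zenefits'], ['Slack']]
n_unique, n_duplicates, repeated = duplicate_summary(sample_rows)
print('Unique:', n_unique)
print('Duplicates:', n_duplicates)
print('Repeated names:', repeated)
```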


Before we remove the duplicate entries, let us further investigate an app that has multiple entries in our dataset: Instagram.

In [8]:
for app in android_data:
    app_name = app[0] 
    if app_name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', '31-Jul-18', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', '31-Jul-18', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', '31-Jul-18', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', '31-Jul-18', 'Varies with device', 'Varies with device']


We see that the duplicate entries for Instagram differ only in the number of reviews received. To obtain an accurate analysis, we should keep only the Instagram entry with the highest number of reviews, since it is most likely the most recent one.

In [9]:
reviews_max = {}

for app in android_data:
    app_name = app[0]
    n_reviews = float(app[3])
    
    if app_name in reviews_max and reviews_max[app_name] < n_reviews:
        reviews_max[app_name]  = n_reviews
    elif app_name not in reviews_max: 
        reviews_max[app_name] = n_reviews

print(len(reviews_max))

9659


We create a dictionary where each key is a unique app name and the corresponding value is the highest number of reviews for that app. We previously identified a total of 9659 unique apps, and we verified that the length of our dictionary matches that number.

In [10]:
android_clean = []
already_added = []

for app in android_data:
    app_name = app[0]
    n_reviews = float(app[3])
    
    if n_reviews == reviews_max[app_name] and app_name not in already_added:
        android_clean.append(app)
        already_added.append(app_name)
        
print(len(android_clean))

9659


Now we begin the process of removing duplicates. 

We start by initializing two empty lists: `android_clean`, to store unique apps, and `already_added`, to keep track of apps that we have already accounted for.

We loop through the Android dataset, and for every iteration:
* We add the current row (`app`) to the `android_clean` list, and the app name (`app_name`) to the `already_added` list if:
 + The number of reviews of the current app matches the maximum number of reviews of that app as recorded in the `reviews_max` dictionary; and
 + The name of the app is not already in the `already_added` list. We need this supplementary condition to account for cases where the highest number of reviews of a duplicate app is the same for more than one entry (for example, the Box app has three entries, and the number of reviews is the same). If we only checked `reviews_max[app_name] == n_reviews`, we would still end up with duplicate entries for some apps.
 
Once again, we verified that the length of our list of clean data is equal to 9659, the number of unique apps in our dataset.
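The same "keep the entry with the most reviews" rule can also be sketched in a single pass with one dictionary that maps each app name to its best row so far. The helper `keep_most_reviewed` is a hypothetical name, and the sample rows are abbreviated to four columns. The Instagram review counts come from the output shown earlier, while the Box values are made up.

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row that has the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' means ties keep the row seen first, so no duplicates survive.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


sample_rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Box', 'BUSINESS', '4.2', '159872'],      # hypothetical values
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Box', 'BUSINESS', '4.2', '159872'],      # hypothetical values
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
]
clean = keep_most_reviewed(sample_rows)
for row in clean:
    print(row)
```

One difference from the two-loop version: the result is ordered by the first appearance of each app name, not by the position of the row that is kept.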

In [11]:
unique_apps = []
duplicate_apps = []

for app in ios_data:
    app_name = app[0] 
    if app_name in unique_apps:
        duplicate_apps.append(app_name)
    else:
        unique_apps.append(app_name)
        
print('Number of apps in the App Store dataset:', len(ios_data))
print('\n')
print('Number of unique apps:', len(unique_apps))

Number of apps in the App Store dataset: 7197


Number of unique apps: 7197


According to the results in the above code cell, the App Store dataset contains only unique entries. Thus, there is no need to remove duplicates.

### Step 3: Removing Non-English Apps

Because the company only produces free apps intended for an English-speaking audience, we should remove non-English apps from our data. These are typically apps whose names contain characters outside standard English text, such as 'Instachat 😜' or '爱奇艺PPS -《欢乐颂2》电视剧热播'.

Characters commonly used in English text all have values in the range 0 to 127 in the ASCII system. Thus, we should remove apps whose names contain characters with values outside that range. We can find the numeric value of a character using the built-in function `ord()`, which returns its Unicode code point; for ASCII characters, this equals the ASCII value.

In [12]:
non_english = []

for app in android_clean:
    for character in app[0]:
        if ord(character) > 127:
            non_english.append(app)
            
print('Examples of non-English apps:')
print('\n')
print(non_english[2])
print('\n')
print(non_english[3])
print('\n')
print(non_english[6])

Examples of non-English apps:


['AutoScout24 Switzerland – Find your new car', 'AUTO_AND_VEHICLES', '4.6', '13372', 'Varies with device', '1,000,000+', 'Free', '0', 'Everyone', 'Auto & Vehicles', '3-Aug-18', 'Varies with device', 'Varies with device']


['Zona Azul Digital Fácil SP CET - OFFICIAL São Paulo', 'AUTO_AND_VEHICLES', '4.6', '7880', 'Varies with device', '100,000+', 'Free', '0', 'Everyone', 'Auto & Vehicles', '10-May-18', '4.6.5', 'Varies with device']


['Docs To Go™ Free Office Suite', 'BUSINESS', '4.1', '217730', 'Varies with device', '50,000,000+', 'Free', '0', 'Everyone', 'Business', '2-Apr-18', 'Varies with device', 'Varies with device']


According to the code cell above, it seems that removing apps whose names contain character values outside the range of 0 to 127 may also remove apps that are actually intended for an English speaking audience. This is due to special characters (™, — (em dash), – (en dash), etc.) or emojis in some English app names that fall outside of the English text ASCII range. 

To minimize the impact of data loss, we will increase the threshold for special characters and will only remove an app if its name has more than three characters whose ASCII values fall outside the English text range.

In [13]:
def english_app(a_string):
    non_ascii = 0    
    
    for character in a_string:
        value = ord(character)
        if value > 127:
            non_ascii += 1 
        
    if non_ascii > 3:
        return False         
    else:
        return True 
            
print(english_app('Instagram'))
print(english_app('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(english_app('Instachat 😜'))

True
False
True


Though the function written above to filter out non-English apps is not perfect, it will suffice for the time being.
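For comparison, the same counting rule can be written more compactly with `sum()` over a generator, with the threshold exposed as a parameter. This is only a sketch: the name `is_probably_english` and the `max_non_ascii` parameter are introduced here for illustration.

```python
def is_probably_english(name, max_non_ascii=3):
    """Return True if `name` has at most `max_non_ascii` characters above code point 127."""
    non_ascii = sum(1 for character in name if ord(character) > 127)
    return non_ascii <= max_non_ascii


print(is_probably_english('Instagram'))
print(is_probably_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_probably_english('Instachat 😜'))
print(is_probably_english('Docs To Go™ Free Office Suite'))
```

Making the threshold a parameter lets us try stricter or looser cut-offs without editing the function body.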

In [14]:
ios_english_clean = []
android_english_clean = []

for app in ios_data:
    if english_app(app[1]) == True:
        ios_english_clean.append(app)

for app in android_clean:
    if english_app(app[0]) == True:
        android_english_clean.append(app)
        
explore_data(ios_english_clean, 0, 1, True)
print('\n')
explore_data(android_english_clean, 0, 1, True)

['284882215', 'Facebook', '389879808', 'USD', '0', '2974676', '212', '3.5', '3.5', '95', '4+', 'Social Networking', '37', '1', '29', '1']


That was the last row in the slice


Number of rows: 6183
Number of columns: 16


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', '7-Jan-18', '1.0.0', '4.0.3 and up']


That was the last row in the slice


Number of rows: 9614
Number of columns: 13


We used the `english_app` function to filter out the non-English apps from both our datasets without duplicates. We are left with 6183 iOS apps and 9614 Android apps.  

### Step 4: Isolate Free Apps

Thus far in the data cleaning process, we have removed inaccurate data, duplicate app entries, and apps not intended for English speakers. Since we are interested in analyzing what types of free apps consumers would be most interested in, the last step in our data cleaning process is to isolate the free apps in our datasets.

In [15]:
android_final = []
ios_final = []


for app in android_english_clean:
    if app[7] == '0':
        android_final.append(app)

for app in ios_english_clean:
    if app[4] == '0':
        ios_final.append(app)

print(len(android_final))
print(len(ios_final))

8864
3222


There are 8864 Android apps and 3222 iOS apps that are free and intended for English speakers in our datasets.  
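The filter in this step compares the price to the exact string `'0'`. As a hedged sketch, the hypothetical helper `is_free` below compares prices numerically instead, under the assumption (not verified against the files here) that some exports may write a free price as `'0.0'` or prefix paid prices with a `$` sign.

```python
def is_free(price_text):
    """Return True if the price string represents zero; unparsable values count as not free."""
    cleaned = price_text.strip().lstrip('$')  # assumed formatting: optional leading '$'
    try:
        return float(cleaned) == 0.0
    except ValueError:
        return False


for price in ['0', '0.0', '$4.99', 'Everyone']:
    print(repr(price), is_free(price))
```

Treating unparsable values as not free is a design choice: a value such as `'Everyone'` in the price column would point to a shifted row, which should not slip into the final dataset.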

### Step 5: Most Common App by Genre

In order to minimize risks and overhead, the company has a three-step validation strategy for releasing an app. The company initially builds a minimal version of an application for the Android environment. If the app is well received based on user response, the company develops it further and releases a refined version. After six months of profitability on the Android market, an iOS version of the app is built and released on the App Store.

Naturally, the next step in our data analysis process is to find app profiles that are successful on both the Android and iOS market. We begin by investigating what are the most common genres for each market via frequency tables. 

In [16]:
def freq_table(dataset, index):
    dictionary = {}
    total = 0 
    
    for app in dataset:
        total += 1
        value = app[index]
        
        if value in dictionary:
            dictionary[value] += 1
        else: 
            dictionary[value] = 1
    
    dictionary_percent = {}
    for key in dictionary:
        dictionary_percent[key] = round((dictionary[key]/total)*100,2)
    
    return dictionary_percent
        
    
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

We defined a function called `freq_table` that takes a dataset and a column index as inputs and returns a frequency table for that column, expressed as percentages, in the form of a dictionary. A second function, `display_table`, prints the resulting table in descending order of percentage.
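A percentage frequency table can also be sketched with `collections.Counter`, which handles the counting. The function names `freq_table_percent` and `sorted_table` and the four-row sample are assumptions made for this illustration.

```python
from collections import Counter


def freq_table_percent(rows, index):
    """Map each distinct value in column `index` to its share of rows, in percent."""
    counts = Counter(row[index] for row in rows)
    total = sum(counts.values())
    return {value: round(count / total * 100, 2) for value, count in counts.items()}


def sorted_table(rows, index):
    """Return (value, percent) pairs sorted by percent, largest first."""
    table = freq_table_percent(rows, index)
    return sorted(table.items(), key=lambda item: item[1], reverse=True)


# Made-up sample: app name, genre.
sample_rows = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
for genre, percent in sorted_table(sample_rows, 1):
    print(genre, ':', percent)
```

Unlike sorting `(percent, key)` tuples, this sorts on the percentage alone, so genres with equal percentages keep the order in which they first appear.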

In [17]:
display_table(ios_final, -5) #iOS Prime Genre

Games : 58.16
Entertainment : 7.88
Photo & Video : 4.97
Education : 3.66
Social Networking : 3.29
Shopping : 2.61
Utilities : 2.51
Sports : 2.14
Music : 2.05
Health & Fitness : 2.02
Productivity : 1.74
Lifestyle : 1.58
News : 1.33
Travel : 1.24
Finance : 1.12
Weather : 0.87
Food & Drink : 0.81
Reference : 0.56
Business : 0.53
Book : 0.43
Navigation : 0.19
Medical : 0.19
Catalogs : 0.12


Upon examining the frequency table for the ```prime_genre``` column of the App Store dataset, in terms of free English apps, the App Store has a high concentration of apps designed for fun and entertainment (Games, Entertainment, Photo & Video, Social Networking, Sports, Music, etc.), while apps intended for practical purposes (Education, Shopping, Utilities, Health & Fitness, etc.) have a lower concentration. 

However, the picture painted by the frequency table may be a bit misleading. A large supply of a certain type of app does not necessarily mean a comparable level of demand from consumers.

In [18]:
display_table(android_final,1) #Android Category

FAMILY : 18.91
GAME : 9.72
TOOLS : 8.46
BUSINESS : 4.59
LIFESTYLE : 3.9
PRODUCTIVITY : 3.89
FINANCE : 3.7
MEDICAL : 3.53
SPORTS : 3.4
PERSONALIZATION : 3.32
COMMUNICATION : 3.24
HEALTH_AND_FITNESS : 3.08
PHOTOGRAPHY : 2.94
NEWS_AND_MAGAZINES : 2.8
SOCIAL : 2.66
TRAVEL_AND_LOCAL : 2.34
SHOPPING : 2.25
BOOKS_AND_REFERENCE : 2.14
DATING : 1.86
VIDEO_PLAYERS : 1.79
MAPS_AND_NAVIGATION : 1.4
FOOD_AND_DRINK : 1.24
EDUCATION : 1.16
ENTERTAINMENT : 0.96
LIBRARIES_AND_DEMO : 0.94
AUTO_AND_VEHICLES : 0.93
HOUSE_AND_HOME : 0.82
WEATHER : 0.8
EVENTS : 0.71
PARENTING : 0.65
ART_AND_DESIGN : 0.64
COMICS : 0.62
BEAUTY : 0.6


At first glance, it seems that the Google Play market has a good mixture of apps designed for fun and apps designed for practical purposes (Family, Tools, Business, Lifestyle, Productivity, etc.). However, upon investigating the Family category, it appears to consist mostly of games meant for kids.

Even so, practical apps seem to have a better representation on Google Play compared to the App Store. 

In [19]:
display_table(android_final,9) #Android Genre

Tools : 8.45
Entertainment : 6.07
Education : 5.35
Business : 4.59
Productivity : 3.89
Lifestyle : 3.89
Finance : 3.7
Medical : 3.53
Sports : 3.46
Personalization : 3.32
Communication : 3.24
Action : 3.1
Health & Fitness : 3.08
Photography : 2.94
News & Magazines : 2.8
Social : 2.66
Travel & Local : 2.32
Shopping : 2.25
Books & Reference : 2.14
Simulation : 2.04
Dating : 1.86
Arcade : 1.85
Video Players & Editors : 1.77
Casual : 1.76
Maps & Navigation : 1.4
Food & Drink : 1.24
Puzzle : 1.13
Racing : 0.99
Role Playing : 0.94
Libraries & Demo : 0.94
Auto & Vehicles : 0.93
Strategy : 0.91
House & Home : 0.82
Weather : 0.8
Events : 0.71
Adventure : 0.68
Comics : 0.61
Beauty : 0.6
Art & Design : 0.6
Parenting : 0.5
Card : 0.45
Casino : 0.43
Trivia : 0.42
Educational;Education : 0.39
Board : 0.38
Educational : 0.37
Education;Education : 0.34
Word : 0.26
Casual;Pretend Play : 0.24
Music : 0.2
Racing;Action & Adventure : 0.17
Puzzle;Brain Games : 0.17
Entertainment;Music & Video : 0.17
Casual;

The idea that Google Play has a good representation of both fun apps, and apps designed for practical purposes, is further supported by the frequency table above for the `Genres` column. 

The ```Genres``` column in the Google Play dataset seems to be more granular in detail as opposed to the `Category` column. Since we are more interested in the bigger picture, we will continue our analysis using the `Category` column going forward. 

Thus far in our analysis, we found that the App Store is dominated by apps designed for fun, while Google Play shows a more balanced landscape of both practical and for-fun apps. However, as we stated before, we cannot assume that the genres with the most apps are also the genres with the most demand. To resolve this uncertainty, let us get an idea of the kinds of apps that have the most users.

### Step 6: Most popular apps (user ratings) by Genre on the App Store

One way to find out what genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play dataset, this information can be found under the `Installs` column. But this information is absent in the App Store dataset. As a suitable substitution, we will approximate the total number of installs for an app with the total number of user ratings found in the `rating_count_tot` column.


In [20]:
ios_genres = freq_table(ios_final, -5)

for genre in ios_genres:
    total_ratings = 0
    genre_count = 0
    
    for app in ios_final:
        if app[-5] == genre:
            no_of_ratings = float(app[5])
            total_ratings = no_of_ratings + total_ratings
            genre_count += 1
    avg_no_of_ratings = round(total_ratings/genre_count,2)
    print(genre,':', avg_no_of_ratings)

Social Networking : 71548.35
Photo & Video : 28441.54
Games : 22788.67
Music : 57326.53
Reference : 74942.11
Health & Fitness : 23298.02
Weather : 52279.89
Utilities : 18684.46
Travel : 28243.8
Shopping : 26919.69
News : 21248.02
Navigation : 86090.33
Lifestyle : 16485.76
Entertainment : 14029.83
Food & Drink : 33333.92
Sports : 23008.9
Book : 39758.5
Finance : 31467.94
Education : 7003.98
Productivity : 21028.41
Business : 7491.12
Catalogs : 4004.0
Medical : 612.0


It seems our earlier suspicions were right. Though for-fun apps still dominate the App Store by count, there is significant demand for apps intended for practical use when we look at user ratings rather than at the number of apps offered.

In the above code cell, we calculated the average number of user ratings per app genre on the App Store. Navigation, Reference, Social Networking, and Music apps have among the highest averages. Let us further investigate these genres.
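The nested loop used for these averages rescans the whole dataset once per genre. A single-pass version can be sketched with two `defaultdict` accumulators. The helper `average_by_group` is a hypothetical name, and the three sample rows (name, rating count, genre) are a small excerpt for illustration, so the averages printed here are not the full-dataset figures.

```python
from collections import defaultdict


def average_by_group(rows, group_index, value_index):
    """Return {group: mean of the numeric column}, computed in one pass over `rows`."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: round(totals[group] / counts[group], 2) for group in totals}


# Three-row excerpt: name, total rating count, genre.
sample_rows = [
    ['Waze', '345046', 'Navigation'],
    ['Google Maps', '154911', 'Navigation'],
    ['Facebook', '2974676', 'Social Networking'],
]
averages = average_by_group(sample_rows, group_index=2, value_index=1)
for genre, avg in averages.items():
    print(genre, ':', avg)
```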

In [21]:
from notebook.services.config import ConfigManager
cm = ConfigManager().update('notebook', {'limit_output': 5})

print('Navigation genre')
for app in ios_final:
    if app[-5] == 'Navigation':        
        print(app[1], ":", app[5]) # Print name of app and number of ratings        
        
print('\n')        
        
print('Social Networking genre')    
for app in ios_final:
    if app[-5] == 'Social Networking':
        print(app[1], ":", app[5]) # Print name of app and number of ratings
        
print('\n')        
        
print('Music genre')    
for app in ios_final:
    if app[-5] == 'Music':
        print(app[1], ":", app[5]) # Print name of app and number of ratings
        
print('\n')        
        
print('Reference genre')    
for app in ios_final:
    if app[-5] == 'Reference':
        print(app[1], ":", app[5]) # Print name of app and number of ratings
        

Navigation genre
Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


Social Networking genre
Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 L

Navigation apps have the highest average number of user ratings, but this figure is heavily influenced by two apps, Waze and Google Maps. Likewise, for Social Networking apps, the average number of ratings is heavily influenced by a few apps like Facebook, Pinterest, and Skype. The same applies to Music apps, where major players like Pandora, Spotify, and Shazam account for most of the ratings. Building an app in any of these three genres may not be a good idea, since we would most likely have to compete against giants like Facebook, Google Maps, and Pandora.
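One way to see how strongly a few apps pull an average upward is to compare the mean with the median. The sketch below uses the six Navigation rating counts printed above and the standard `statistics` module. The variable names are illustrative.

```python
from statistics import mean, median

# Rating counts for the six Navigation apps listed in the output above.
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

nav_mean = round(mean(navigation_ratings), 2)
nav_median = median(navigation_ratings)

# Share of all Navigation ratings held by the two largest apps, in percent.
top_two_share = sum(sorted(navigation_ratings)[-2:]) / sum(navigation_ratings) * 100

print('Mean:', nav_mean)      # 86090.33
print('Median:', nav_median)  # 8196.5
print('Top-two share (%):', round(top_two_share, 2))
```

The mean is roughly ten times the median, which supports the reading that the genre average reflects Waze and Google Maps rather than a typical Navigation app.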

Though a similar pattern appears in Reference apps, with big players like the Bible and Dictionary.com, there seems to be potential for a new app in the Reference genre. One thing the company could do is offer different versions of the Bible, such as the King James Version or the New International Version. Another possibility is to turn the Quran into an app. In addition to the electronic text of the book, the app could include features such as daily quotes, audio versions, historical facts, or original illustrations. Since Dictionary.com is a popular choice in the Reference genre, developers could also embed a dictionary in the language of the book's edition, which would allow users to look up the definitions of words while reading.

Since the App Store is already saturated with apps intended for fun and entertainment, a practical app may have room to establish itself on the market.

Another genre that seems popular is Weather. However, people generally do not spend much time checking the weather, which would mean little revenue from in-app advertisements. In addition, getting reliable third-party data on live weather conditions may require our apps to connect to paid APIs. We could develop our own APIs, but weather prediction modeling is outside the scope of the company.

For now, we will stick with our recommendation of a Reference app.

### Step 7: Most popular apps (user installations) by Genre on Google Play

Now, we move on to analyzing the average number of installations by genre, which is identified by the `Category` column in our dataset. Before we start, though, we need to address the open-ended values in the `Installs` column, such as 100+, 1,000+, 5,000+, etc. Since we are only interested in the big picture of which app genre attracts the most users, we will assume 100+ to mean 100 installs, 5,000+ to mean 5000 installs, and so on.
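The conversion described above can be sketched as a small helper. The name `parse_installs` is hypothetical. It strips the commas and the trailing plus sign and returns the lower bound as an integer.

```python
def parse_installs(installs_text):
    """Convert an install bucket such as '1,000+' to its lower bound as an int."""
    digits = installs_text.replace(',', '').replace('+', '')
    return int(digits)


for bucket in ['100+', '5,000+', '1,000,000,000+']:
    print(bucket, '->', parse_installs(bucket))
```

Because every bucket is read as its lower bound, the resulting averages understate real install counts, but the relative ordering of categories should still be informative.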


In [22]:
android_category = freq_table(android_final, 1)

for category in android_category:
    total_installs = 0
    category_count = 0
    
    for app in android_final:
        if app[1] == category:
            no_of_installs = app[5].replace('+','')
            no_of_installs = no_of_installs.replace(',','')
            total_installs = float(no_of_installs) + total_installs
            category_count += 1
    
    avg_no_of_installs = round(total_installs/category_count,2)
    print(category,':', avg_no_of_installs)

ART_AND_DESIGN : 1986335.09
AUTO_AND_VEHICLES : 647317.82
BEAUTY : 513151.89
BOOKS_AND_REFERENCE : 8767811.89
BUSINESS : 1712290.15
COMICS : 817657.27
COMMUNICATION : 38456119.17
DATING : 854028.83
EDUCATION : 1833495.15
ENTERTAINMENT : 11640705.88
EVENTS : 253542.22
FINANCE : 1387692.48
FOOD_AND_DRINK : 1924897.74
HEALTH_AND_FITNESS : 4188821.99
HOUSE_AND_HOME : 1331540.56
LIBRARIES_AND_DEMO : 638503.73
LIFESTYLE : 1437816.27
GAME : 15588015.6
FAMILY : 3695641.82
MEDICAL : 120550.62
SOCIAL : 23253652.13
SHOPPING : 7036877.31
PHOTOGRAPHY : 17840110.4
SPORTS : 3638640.14
TRAVEL_AND_LOCAL : 13984077.71
TOOLS : 10801391.3
PERSONALIZATION : 5201482.61
PRODUCTIVITY : 16787331.34
PARENTING : 542603.62
WEATHER : 5074486.2
VIDEO_PLAYERS : 24727872.45
NEWS_AND_MAGAZINES : 9549178.47
MAPS_AND_NAVIGATION : 4056941.77


Communication, Video Players, Photography, and Productivity apps have among the highest average number of user installations. Let us further investigate these app genres.

In [23]:
print('Communications genre')
for app in android_final:
    no_of_installs = app[5].replace('+','')
    no_of_installs = no_of_installs.replace(',','')    
    if (app[1] == 'COMMUNICATION' and float(no_of_installs)>500000000):
        print(app[0], ":", app[5]) # Print name of app and number of installs

print('\n')

print('Video Players genre')
for app in android_final:
    no_of_installs = app[5].replace('+','')
    no_of_installs = no_of_installs.replace(',','')    
    if (app[1] == 'VIDEO_PLAYERS' and float(no_of_installs)>100000000):
        print(app[0], ":", app[5]) # Print name of app and number of installs

print('\n')

print('Photography genre')
for app in android_final:
    no_of_installs = app[5].replace('+','')
    no_of_installs = no_of_installs.replace(',','')    
    if (app[1] == 'PHOTOGRAPHY' and float(no_of_installs)>50000000):
        print(app[0], ":", app[5]) # Print name of app and number of installs

print('\n')

print('Productivity genre')
for app in android_final:
    no_of_installs = app[5].replace('+','')
    no_of_installs = no_of_installs.replace(',','')    
    if (app[1] == 'PRODUCTIVITY' and float(no_of_installs)>100000000):
        print(app[0], ":", app[5]) # Print name of app and number of installs
        

Communications genre
WhatsApp Messenger : 1,000,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
Skype - free IM & video calls : 1,000,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+


Video Players genre
YouTube : 1,000,000,000+
Google Play Movies & TV : 1,000,000,000+
MX Player : 500,000,000+


Photography genre
B612 - Beauty & Filter Camera : 100,000,000+
YouCam Makeup - Magic Selfie Makeovers : 100,000,000+
Sweet Selfie - selfie camera, beauty cam, photo edit : 100,000,000+
Google Photos : 1,000,000,000+
Retrica : 100,000,000+
Photo Editor Pro : 100,000,000+
BeautyPlus - Easy Photo Editor & Selfie Camera : 100,000,000+
PicsArt Photo Studio: Collage Maker & Pic Editor : 100,000,000+
Photo Collage Editor : 100,000,000+
Z Camera - Photo Editor, Beauty Selfie, Collage : 100,000,000+
PhotoGrid: Video & Pic Collage Maker, Photo Editor : 100,000,000+
Candy Camera - selfie, beauty camera, photo editor : 100,000,000+


Once again, it appears that these app genres are mostly dominated by big-name companies such as Facebook (Messenger), Google (Gmail, YouTube, Google Play Movies & TV, Google Photos, Google Play Books), and Microsoft. Like before, a handful of giant apps inflate the averages, so these genres seem more popular than they really are. We believe that competing against large companies like Google and Facebook may not be the best business strategy.
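To make the skew concrete, here is a minimal sketch of how a few giant apps can pull up a genre's average. The helper names (`parse_installs`, `average_installs`), the 100,000,000 cutoff, and the three "Hypothetical" rows are assumptions made for illustration; only the WhatsApp and Gmail install counts come from the output above.

```python
def parse_installs(installs):
    """Convert an install string such as '1,000,000+' to a float."""
    return float(installs.replace('+', '').replace(',', ''))


def average_installs(rows, max_installs=None):
    """Average installs over (name, installs) rows.

    If max_installs is given, apps at or above that count are skipped.
    """
    counts = [parse_installs(installs) for _, installs in rows]
    if max_installs is not None:
        counts = [count for count in counts if count < max_installs]
    return sum(counts) / len(counts) if counts else 0.0


# Illustrative sample: two real giants plus three made-up smaller apps
sample_communication = [
    ('WhatsApp Messenger', '1,000,000,000+'),
    ('Gmail', '1,000,000,000+'),
    ('Hypothetical chat app A', '1,000,000+'),
    ('Hypothetical chat app B', '500,000+'),
    ('Hypothetical chat app C', '100,000+'),
]

print('All apps:', average_installs(sample_communication))
print('Under 100M installs:',
      round(average_installs(sample_communication, max_installs=100000000), 2))
```

In this toy sample the average drops by several orders of magnitude once the two billion-install apps are excluded, which is the same effect we suspect in the real genre averages.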

The Photography genre seems viable, but apps like these probably generate much of their revenue from in-app purchases, while the company relies on in-app advertisements as its source of revenue.

The Game genre also seems popular on Google Play, but as we have already established, the market for for-fun apps is already too saturated for us to establish ourselves against all the competition. 

Since we previously recommended developing an app for the Books and Reference genre, let us look further into this category. Our aim is to develop an app that will be profitable on both the App Store and Google Play after all. 
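Since the same filter is applied to every genre, it could be wrapped in a small helper. The sketch below is one possible way to do that, not part of the original analysis: `popular_apps` is a hypothetical name, and `sample_rows` is a made-up stand-in for `android_final` that only assumes the same column positions (name at index 0, category at index 1, installs at index 5), with the other columns left blank.

```python
def popular_apps(dataset, category, min_installs):
    """Return (name, installs) pairs for apps in `category` with more than
    `min_installs` installs. Assumes name, category, and installs sit at
    indexes 0, 1, and 5, as in the loops used in this notebook."""
    matches = []
    for app in dataset:
        # Installs are strings such as '10,000,000+'; strip '+' and ','
        installs = float(app[5].replace('+', '').replace(',', ''))
        if app[1] == category and installs > min_installs:
            matches.append((app[0], app[5]))
    return matches


# Made-up stand-in for android_final; columns 2-4 are blank placeholders
sample_rows = [
    ['Wikipedia', 'BOOKS_AND_REFERENCE', '', '', '', '10,000,000+'],
    ['Hypothetical small reader', 'BOOKS_AND_REFERENCE', '', '', '', '1,000+'],
    ['Bible', 'BOOKS_AND_REFERENCE', '', '', '', '100,000,000+'],
    ['YouTube', 'VIDEO_PLAYERS', '', '', '', '1,000,000,000+'],
]

for name, installs in popular_apps(sample_rows, 'BOOKS_AND_REFERENCE', 500000):
    print(name, ":", installs)
```

Returning a list instead of printing directly would also make it easy to count or sort the matches later.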

In [24]:
print('Books and Reference genre')
for app in android_final:
    no_of_installs = app[5].replace('+','')
    no_of_installs = no_of_installs.replace(',','')    
    if (app[1] == 'BOOKS_AND_REFERENCE' and float(no_of_installs)>500000):
        print(app[0], ":", app[5]) # Print name of app and number of installs


Books and Reference genre
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
ReadEra – free ebook reader : 1,000,000+
Ebook Reader : 5,000,000+
Read books online : 5,000,000+
eBoox: book reader fb2 epub zip : 1,000,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
Moon+ Reader : 10,000,000+
English-Myanmar Dictionary : 1,000,000+
Golden Dictionary (EN-AR) : 1,000,000+
All Language Translator Free : 1,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Aldiko Book Reader : 10,000,000+
Wattpad 📖 Free Books : 100,000,000+
Dictionary - WordWeb : 5,000,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Al-Quran (Free) : 10,000,000+
Al Quran Indonesia : 10,000,000+
Al'Quran Bahasa Indonesia : 

Electronic book libraries and dictionary apps seem to make up much of the Books and Reference genre. However, we once again notice apps centered around popular books like the Bible and the Quran. This further supports our previous recommendation: building an app around a popular book or its translation, with added in-app features like a built-in dictionary or an online discussion forum, could be a profitable business strategy. These extra features will help us stand out among the competition.

### Conclusions

We analyzed data about mobile apps on the App Store and Google Play. The business goal was to recommend an app profile the company could use to develop a free app for English speakers that would prove profitable on both the Android and iOS markets.

We concluded that taking a popular book, such as the Bible (King James Version or New International Version) or the Quran, and turning it into an app could be a viable business strategy. To help us stand out among the competition, we could include added features in the app, such as:
* audio versions of the book.
* a built-in dictionary.
* daily quotes from the book.
* an online discussion forum for the book.
* original illustrations or artwork for passages from the book, created by independent artists.
* historical or geographical facts based on the book.

Not only could these extra features help us differentiate ourselves from the competition, they could also generate additional income if offered to users as in-app purchases. 
