## Mobile Apps: Analyzing the Google and iOS App Store for Profitable New App Ideas
* Analysis and Name of Company Created by **Eduardo Torres**
* TyrAppTor, Inc. 
* Last Updated: 04/04/2020

In this project, I analyze publicly available sample datasets from the Google Play Store and the iOS App Store to help the application development team make data-driven decisions about which apps have the highest revenue potential for the company. I take the role of Data Analyst for a company that builds free Android and iOS mobile apps for English speakers. For the purposes of this analysis, the company's name is TyrAppTor, Inc. (TAT).

TAT builds Android and iOS apps that are free to download and install, and its main source of revenue is in-app ads. Given how the company generates revenue, I analyze the available data to find the best genres/categories to invest in. To support that decision, I will determine:

* Most common apps by genre in the iOS and Google Play App Store
* Most popular apps determined by number of installs in the iOS and Google Play App Store

**<font color=Blue>Datasource Documentation:</font>**
**1.** [Google Play App Store](https://www.kaggle.com/lava18/google-play-store-apps) 
**2.**  [Apple iOS App Store](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps)

## Summary of Results

As a result of the analysis, my findings indicate that a profitable app idea is one that takes a popular book and passes it through a gamification process that can create more of an active approach to reading. This idea can be in the form of a 3-D book where the user gets to follow the main character throughout the book and engages the user to be part of that story. This has the potential to explore a new approach to learning which can have a great marketing impact on the game and reputation of TyrAppTor, Inc.

## Exploring and Cleaning Publicly Available Data to Avoid Additional Costs

In [1]:
# The datasource function automates importing a CSV dataset as a list of rows
from csv import reader

def datasource(source):
    # encoding='utf8' is set explicitly because app names contain non-ASCII characters
    with open(source, encoding='utf8') as opened_file:
        dataset = list(reader(opened_file))
    return dataset
    
# Importing iOS App Store Dataset
ios_apps = datasource('AppleStore.csv')
ios_header = ios_apps[0]
ios_data = ios_apps[1:]

# Importing Google Play Store Dataset
android_apps = datasource('googleplaystore.csv')
android_header = android_apps[0]
android_data = android_apps[1:]

In [2]:
# The explore_data function helps make the datasource exploration readable (see below example)
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')
        
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

**<font color=Grey>Explores the Google Play Store Raw Data:</font>**

In [3]:
print('Google Play Store Data')
print(android_header)
print('\n')
explore_data(android_data, 0, 3, True)

Google Play Store Data
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


**<font color=Grey>Explores the iOS App Store Raw Data:</font>**

In [4]:
print('iOS Apple Store Data')
print(ios_header)
print('\n')
explore_data(ios_data, 0, 3, True)
print('\n')

iOS Apple Store Data
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16




**<font color=Black>Data Exploration:</font>**

During the data exploration, I found that the Google Play data set contains *13* columns and *10,841* rows (apps), while the iOS App Store data set contains *16* columns and *7,197* rows. The columns that could be useful for the analysis are listed in the tables below:

**<font color=Green>Google Play Store:</font>**

|Column              | Definition                        |
|:-------------------| :---------------------------------|
|App                 | App Name                          |
|Category            | Category the app belongs to       |
|Rating              | Overall User Rating               |
|Reviews             | Number of User Reviews            | 
|Installs            | Number of User Downloads/installs |
|Type                | Paid or Free                      |
|Price               | Price of the App.                 |
|Content Rating      | Age Group the App. Targets.       |
|Genres              | Available Genre Types             |

**<font color=Green>Apple iOS App Store:</font>**

|Column              | Definition                      | 
|:-------------------| :-------------------------------|
|track_name          | App Name                        |
|currency            | Currency Type                   |
|price               | Price Amount                    |
|rating_count_tot    | User Rating Counts (All v.)     |
|rating_count_ver    | User Rating Counts (Current v.) |
|user_rating         | Average User Rating             |
|cont_rating         | Content Rating                  |
|prime_genre         | Primary Genre                   |

**<font color=Black>Data Cleaning and Preparation:</font>**

To make sure that the analysis is appropriate, the data source will need to go through cleaning and preparation before analysis can be performed. In this section, I will be detecting, correcting, and removing inaccurate and duplicate data.

To use my time efficiently, I relied on the discussions in the data source documentation to identify known errors. They indicate that the row at index 10472 of the Google Play data is missing its category, which shifts the remaining values one column to the left, so the row must be removed.

In [5]:
#For validation, I print the rows before and after index 10472 to confirm
#that the category is missing and the remaining values are shifted
print(android_header)
print('\n')
print(android_data[10471])
print('\n')
print(android_data[10472])
print('\n')
print(android_data[10473])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Xposed Wi-Fi-Pwd', 'PERSONALIZATION', '3.5', '1042', '404k', '100,000+', 'Free', '0', 'Everyone', 'Personalization', 'August 5, 2014', '3.0.0', '4.0.3 and up']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']


In [6]:
#The below deletes the 10472 index from the analysis
del android_data[10472]
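
The row above was found through the data source discussions. As a complementary check, here is a minimal sketch that flags any row whose number of columns differs from the header's. The function name `find_malformed_rows` and the shortened sample rows are my own illustrative choices; the sample keeps only the first few values of the rows printed above.

```python
def find_malformed_rows(header, rows):
    """Return (index, row) pairs whose column count differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(rows) if len(row) != expected]

# Shortened illustrative sample: only the first three columns are kept
check_header = ['App', 'Category', 'Rating']
check_rows = [
    ['Xposed Wi-Fi-Pwd', 'PERSONALIZATION', '3.5'],
    ['Life Made WI-Fi Touchscreen Photo Frame', '1.9'],  # category missing
    ['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2'],
]

for index, row in find_malformed_rows(check_header, check_rows):
    print(index, row)
```

A length check like this only catches rows that lose or gain a column; it would not detect a value that is wrong but sits in the right position.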

**<font color=Black>Data Cleaning and Preparation - Continued:</font>**

The documentation also indicates that the data set contains duplicate entries. The code below counts them: 1,181. These duplicates need to be removed, but not at random; they are removed according to a rule.

Looking through the example duplicates in the code below, I found that 'Google Ads', for example, appears three times. Apart from the number of reviews, the records are identical. The difference in review counts hints at how recent each entry is, so I treat the record with the highest number of reviews as the most recent and drop the rest. After removing the duplicates I expect 9,659 records, which is the row count of the data source (after deleting row 10472) minus the 1,181 duplicates.

In [7]:
#The below identifies duplicate and unique apps in list form for the Android App Store
duplicate_apps = []
unique_apps = []

for app in android_data:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[:10])
print('\n')

Number of duplicate apps: 1181


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']
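
The same duplicate count can be obtained with a set, which avoids scanning a growing list on every membership test. This is a sketch rather than a replacement for the cell above; `count_duplicates` and the short list of names are illustrative.

```python
def count_duplicates(names):
    """Return every name that has already been seen earlier in the list."""
    seen = set()
    duplicates = []
    for name in names:
        if name in seen:
            duplicates.append(name)
        else:
            seen.add(name)
    return duplicates

# Illustrative names only; in the notebook this would be [app[0] for app in android_data]
example_names = ['Box', 'Google Ads', 'Box', 'Slack', 'Google Ads', 'Box']
example_dups = count_duplicates(example_names)
print('Number of duplicate apps:', len(example_dups))
print('Duplicates:', example_dups)
```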




In [8]:
#In this section, I look through the example duplicate apps to identify a strategy to remove them
print(android_header)
print('\n')
for app in android_data:
    if app[0] == 'Google Ads':
        print(app)
        print('\n')

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Google Ads', 'BUSINESS', '4.3', '29313', '20M', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 30, 2018', '1.12.0', '4.0.3 and up']


['Google Ads', 'BUSINESS', '4.3', '29313', '20M', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 30, 2018', '1.12.0', '4.0.3 and up']


['Google Ads', 'BUSINESS', '4.3', '29331', '20M', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 30, 2018', '1.12.0', '4.0.3 and up']




In [9]:
#The below finds the highest review count per app and the number of records expected once duplicates are removed
reviews_max = {}

for app in android_data:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and (reviews_max[name] < n_reviews):
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
        
print('Expected length:', len(android_data) - 1181) #Knowing that there are 1181 dups, I expect to see 9659 records
print('Actual length:', len(reviews_max))

Expected length: 9659
Actual length: 9659


In [10]:
#The below builds the clean list, keeping one row per app (the row with the most reviews, per reviews_max)
android_clean = []
already_added = []

for app in android_data:
    name = app[0]
    n_reviews = float(app[3])
    
    if (reviews_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name) # make sure this is inside the if block
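
The two-step approach above (first `reviews_max`, then the filtering loop) can also be sketched as a single pass that keeps, for each app name, the row with the highest review count. The helper name `dedupe_by_max_reviews` is my own, and the 'Example App' row is hypothetical; the 'Google Ads' rows are shortened from the output shown earlier.

```python
def dedupe_by_max_reviews(rows, name_idx=0, reviews_idx=3):
    """Keep one row per app name: the row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_idx]
        reviews = float(row[reviews_idx])
        # On ties the first row seen is kept, as in the loop above
        if name not in best or reviews > float(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())

dedupe_sample = [
    ['Google Ads', 'BUSINESS', '4.3', '29313'],
    ['Google Ads', 'BUSINESS', '4.3', '29313'],
    ['Google Ads', 'BUSINESS', '4.3', '29331'],
    ['Example App', 'TOOLS', '4.0', '10'],  # hypothetical row
]
deduped = dedupe_by_max_reviews(dedupe_sample)
print(len(deduped))
print(deduped[0])
```

One difference to keep in mind: this version returns rows in the order each name first appears, whereas the loop above keeps each retained row at its own position in the data.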

In [11]:
explore_data(android_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


**<font color=Black>Data Cleaning and Preparation - Continued:</font>**

After removing the duplicates, I used the explore_data function to confirm that the row count matches the expected 9,659, which it does.

The next part of the preparation focuses on the audience. Both data sets contain apps aimed at non-English speakers, which are outside my focus, so the next step is to isolate English apps.

To identify non-English apps, I wrote a function (english) that relies on ASCII (American Standard Code for Information Interchange), in which the characters commonly used in English text have code points from 0 to 127. The problem with this approach is that some English apps use emojis or trademark symbols, which fall outside the 0-127 range (see the test example below). To minimize data loss, I updated the function (new_english) to reject only apps with more than three characters outside that range, so names with up to three emojis and/or trademark symbols stay in the analysis.

In [12]:
#Provided Examples of Non-English Apps
print('Provided Examples of Non-English Apps:')
print(ios_data[813][1])
print(ios_data[6731][1])
print(android_clean[4412][0])
print(android_clean[7940][0])

Provided Examples of Non-English Apps:
爱奇艺PPS -《欢乐颂2》电视剧热播
【脱出ゲーム】絶対に最後までプレイしないで 〜謎解き＆ブロックパズル〜
中国語 AQリスニング
لعبة تقدر تربح DZ


In [13]:
#Function determines whether the string input is English (ASCII characters only).
def english(string):
    for char in string:
        if ord(char) > 127:
            return False
    return True

In [14]:
#Tests example for the English function
print('Is '+'Instagram '+'an English App?:', english('Instagram'))
print('Is '+'爱奇艺PPS -《欢乐颂2》电视剧热播 '+'an English App?:', english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print('Is '+'Docs To Go™ Free Office Suite '+'an English App?:', english('Docs To Go™ Free Office Suite'))
print('Is '+'Instachat 😜 '+'an English App?:', english('Instachat 😜'))

Is Instagram an English App?: True
Is 爱奇艺PPS -《欢乐颂2》电视剧热播 an English App?: False
Is Docs To Go™ Free Office Suite an English App?: False
Is Instachat 😜 an English App?: False
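
The two `False` results for the English-named apps come from a single character each. A quick check of the code points involved shows why; the dictionary name `code_points` is just for illustration.

```python
# Map each character to its Unicode code point
code_points = {char: ord(char) for char in 'a™😜'}

for char, point in code_points.items():
    # Anything above 127 falls outside the ASCII range used by english()
    print(repr(char), point, point > 127)
```

Both the trademark symbol and the emoji sit far above 127, so one such character is enough for the strict check to reject an otherwise English name.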


In [15]:
#Updated English function that tolerates up to 3 characters outside the ASCII range
def new_english(string):
    not_english = 0
    
    for char in string:
        if ord(char) > 127:
            not_english += 1
            
    if not_english > 3:
        return False
    else:
        return True

In [16]:
#Tests example for the New English function
print('Is '+'Instagram '+'an English App?:', new_english('Instagram'))
print('Is '+'爱奇艺PPS -《欢乐颂2》电视剧热播 '+'an English App?:', new_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print('Is '+'Docs To Go™ Free Office Suite '+'an English App?:', new_english('Docs To Go™ Free Office Suite'))
print('Is '+'Instachat 😜 '+'an English App?:', new_english('Instachat 😜'))

Is Instagram an English App?: True
Is 爱奇艺PPS -《欢乐颂2》电视剧热播 an English App?: False
Is Docs To Go™ Free Office Suite an English App?: True
Is Instachat 😜 an English App?: True


In [17]:
#The below will isolate English apps into its own list
android_english = []
ios_english = []

for app in android_clean:
    name = app[0]
    if new_english(name):
        android_english.append(app)

for app in ios_data:
    name = app[1]
    if new_english(name):
        ios_english.append(app)

**<font color=Grey>Explores Google English App Play Store Data:</font>**

In [18]:
explore_data(android_english, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


**<font color=Grey>Explores iOS English App Store Data:</font>**

In [19]:
explore_data(ios_english, 0, 3, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 6183
Number of columns: 16


**<font color=Black>Data Cleaning and Preparation - Continued:</font>**

The last part of data cleaning concerns price. As mentioned in the introduction, TAT generates revenue through in-app ads in free apps. Because the data sets contain both paid and free apps, the free apps need to be isolated.

The process below isolates free apps, the last step in the cleaning process. The final app counts are *8,864* for Android and *3,222* for iOS.

In [20]:
#The below collects apps with a price of 0 into two new lists, android and ios
android = []
ios = []

for app in android_english:
    price = app[7]
    if price == '0':
        android.append(app)
        
for app in ios_english:
    price = app[4]
    if price == '0.0':
        ios.append(app)
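
The comparison above depends on the exact strings '0' and '0.0'. As a hedged alternative, the sketch below parses the price as a number, so both spellings of zero are handled by the same test. The helper `is_free` is hypothetical, and the assumption that paid prices may carry a leading '$' (for example '$4.99') is mine rather than something shown in the output above.

```python
def is_free(price_text):
    """Return True when a price string represents zero.

    Assumes prices look like '0', '0.0' or '$4.99'; anything that cannot
    be parsed as a number is treated as not free.
    """
    cleaned = price_text.strip().lstrip('$')
    try:
        return float(cleaned) == 0.0
    except ValueError:
        return False

for price in ['0', '0.0', '$4.99', 'Everyone']:
    print(repr(price), is_free(price))
```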

**<font color=Grey>Explores Google Free English App Play Store Data:</font>**

In [21]:
print(android_header)
print('\n')
explore_data(android, 0, 3, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 8864
Number of columns: 13


**<font color=Grey>Explores iOS Free English App Store Data:</font>**

In [22]:
print(ios_header)
print('\n')
explore_data(ios, 0, 3, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 3222
Number of columns: 16


## Identifying Apps with the Highest Revenue Potential

TAT's revenue depends directly on the number of users interacting with its applications, which makes data-driven decision making extremely important for the company. The main goal of this analysis is to identify the kinds of applications already on the market that attract the most users.

Additionally, TAT is a cost-conscious company, which means that potential application ideas have to go through a validation process made up of three parts:
 * Minimal software development and market testing
 * Assuming good responses during the initial development, further development is made
 * Once profitable after six months, iOS and Android apps are built

In [23]:
print('Google Play Store')
print(android_header)
print('\n')
print('iOS App Store')
print(ios_header)

Google Play Store
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


iOS App Store
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


**<font color=Black>Most Common Genres/Categories in each App Store:</font>**

In this section, I investigate the most common genres and categories in both the Google Play Store and the iOS App Store. I use the Category and Genres columns for the Google Play Store and the prime_genre column for the iOS App Store. From these columns, I build frequency tables that show each value as a percentage of the total, in descending order. These tables show which genres/categories are most common in each marketplace.

In [24]:
#freq_table builds a frequency table for a column as percentages of the total; display_table prints it in descending order
def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        column = row[index]
        
        if column in table:
            table[column] += 1
        else:
            table[column] = 1
            
    percentage_table = {}
    for key in table:
        percentage = (table[key] / total) * 100
        percentage_table[key] = percentage
            
    return percentage_table

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
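
For comparison, the same percentage table can be sketched with `collections.Counter` from the standard library, which handles both the counting and the descending sort. The function name `freq_table_counter` and the four sample rows are illustrative and not part of the analysis.

```python
from collections import Counter

def freq_table_counter(dataset, index):
    """Return {value: percentage of rows}, ordered from most to least common."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return {key: count / total * 100 for key, count in counts.most_common()}

# Illustrative rows: [app name, category]
freq_sample = [['a', 'GAME'], ['b', 'GAME'], ['c', 'TOOLS'], ['d', 'FAMILY']]
freq_result = freq_table_counter(freq_sample, 1)
for key, percentage in freq_result.items():
    print(key, ':', percentage)
```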

**<font color=Grey>Google App Play Store Analysis:</font>**

Looking through the Google Play Store Genres frequency table, I see no dominant genre. Tools ranks as the most common, followed by Entertainment, Education, and Business (8.4%, 6.1%, 5.3%, and 4.6%, respectively). This suggests a fairly balanced landscape of fun and practical applications. The Category frequency table tells a different story: Family is the most common category, followed by Game and Tools (18.9%, 9.7%, and 8.5%, respectively).

Interestingly, the Family category contains a significant portion of children's games, which indicates that game-related apps are the most common in the Google Play Store. Since I am looking for the most common apps, the Genres column is not very helpful for my analysis. The Category column provides a better breakdown of what is most common, so from here on I use only the Category column for the Google Play Store.

In [25]:
# Most common 'Genre' in the Google Play Store
display_table(android, -4)

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

In [26]:
# Most common 'Category' in the Google Play Store
display_table(android, 1)

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

**<font color=Grey>iOS App Store Analysis:</font>**

As for the free English iOS App Store analysis, the Games genre stands out at 58.16%, significantly higher than any other genre. The Entertainment, Photo & Video, Education, and Social Networking genres are next in line and are more evenly balanced (7.9%, 5.0%, 3.7%, and 3.3%, respectively).

While it is clear that the free English iOS apps data shows a market heavily dominated by Games, followed at a distance by Entertainment, Photo & Video, Education, and Social Networking, I will continue to investigate whether commonality also indicates popularity in terms of user counts.

In [27]:
# Most common 'Prime Genre' in the iOS App Store
display_table(ios, -5)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


**<font color=Black>Identifying App Popularity by the Number of Users:</font>**

In this section of the analysis, I will focus on identifying app popularity driven by the number of users. As such, I will be using the Installs column from the Google Play Store to identify popularity. Considering that the iOS App Store does not contain an installs column, I will default to the rating_count_tot to indicate popularity for apps.

**<font color=Grey>Google App Play Store Analysis:</font>**

Using the display_table function I created earlier, I can quickly see how the Installs values are distributed in frequency table form. While the table provides insight, it is difficult to read popularity from it because the values are open-ended buckets (for example, 1,000,000+). The values therefore need cleaning and standardization: I remove the commas and plus signs and convert the strings to numbers so that I can calculate the average number of installs per category.

In [28]:
display_table(android, 5) # the Installs columns

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


In [29]:
import string

#Removes every punctuation character (such as ',' and '+') from a string
def clean_string(text):
    clean = text
    for char in text:
        if char in string.punctuation:
            clean = clean.replace(char, "")
    return clean
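
Since the Installs values shown above contain only digits, commas, and a trailing plus sign, a narrower sketch can remove just those two characters and convert the result in one step. The name `parse_installs` is illustrative.

```python
def parse_installs(text):
    """Convert an installs label such as '1,000,000+' to an int."""
    return int(text.replace(',', '').replace('+', ''))

for label in ['1,000,000+', '500+', '0+', '0']:
    print(label, '->', parse_installs(label))
```

Unlike stripping all punctuation, this raises a `ValueError` on an unexpected value such as a stray word, which makes shifted or malformed rows easier to notice.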

In [30]:
#Prints the average number of installs for each category found in column n_one
def genre_installs(dataset, n_one, n_two):
    categories_dic = freq_table(dataset, n_one)
    for category in categories_dic:
        total = 0 #sum of installs specific to each genre
        len_category = 0 #number of apps specific to each genre
        for row in dataset: #loop over the dataset passed in, not the global android list
            category_app = row[n_one]
            installs = row[n_two]
            if category_app == category: #exact match rather than substring match
                installs = clean_string(installs)
                installs = int(installs)
                total += installs
                len_category += 1

        average_num = total / len_category
        print(category, ':', average_num)
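
The function above rescans the whole data set once per category. A single-pass variant can be sketched with two dictionaries that accumulate totals and counts as it goes. The name `average_installs_by_category` is my own, and the sample rows keep only the app name, category, and installs values from rows printed earlier in this notebook.

```python
from collections import defaultdict

def average_installs_by_category(rows, category_idx=1, installs_idx=5):
    """Return {category: average installs} using one pass over the rows."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for row in rows:
        installs = int(row[installs_idx].replace(',', '').replace('+', ''))
        totals[row[category_idx]] += installs
        counts[row[category_idx]] += 1
    return {category: totals[category] / counts[category] for category in totals}

# Shortened rows: [app name, category, installs]
avg_sample = [
    ['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '10,000+'],
    ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '5,000,000+'],
    ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '50,000,000+'],
    ['osmino Wi-Fi: free WiFi', 'TOOLS', '10,000,000+'],
]
averages = average_installs_by_category(avg_sample, category_idx=1, installs_idx=2)
for category, average in averages.items():
    print(category, ':', average)
```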

In [31]:
genre_installs(android,1,5)

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_

In [32]:
#Lists the COMMUNICATION apps in the three largest install buckets shown
for row in android:
    category = row[1]
    name = row[0]
    installs = row[5]
    if category == 'COMMUNICATION' and installs in ('1,000,000,000+', '500,000,000+', '100,000,000+'):
        print(name, ":", installs)

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

**<font color=Grey>Google Play Store Analysis Continued:</font>**

Once I cleaned and looped over the Google Play Store data, I determined that the communication genre had the most installs, with an average of approximately 38,456,119 installs per app. The problem with this number is that, looking deeper into the data, a handful of outliers skew the result. If these outliers were removed, the average would be much smaller than the figure above suggests.
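
One way to see how concentrated the installs are is to tally apps by install bucket. The following is a minimal sketch, not part of the original analysis: `communication_sample` holds only the first ten rows of the COMMUNICATION listing printed above, and the helper `installs_floor` is a hypothetical name introduced here. Because each bucket is an open-ended lower bound, the total is a floor rather than an exact count.

```python
from collections import Counter

# First ten rows of the COMMUNICATION listing above (name, install bucket).
communication_sample = [
    ('WhatsApp Messenger', '1,000,000,000+'),
    ('imo beta free calls and text', '100,000,000+'),
    ('Android Messages', '100,000,000+'),
    ('Google Duo - High Quality Video Calls', '500,000,000+'),
    ('Messenger – Text and Video Chat for Free', '1,000,000,000+'),
    ('imo free video calls and chat', '500,000,000+'),
    ('Skype - free IM & video calls', '1,000,000,000+'),
    ('Who', '100,000,000+'),
    ('GO SMS Pro - Messenger, Free Themes, Emoji', '100,000,000+'),
    ('LINE: Free Calls & Messages', '500,000,000+'),
]


def installs_floor(bucket):
    """Convert a bucket such as '500,000,000+' to its lower bound as an int."""
    return int(bucket.replace(',', '').replace('+', ''))


# How many of the sampled apps fall into each install bucket
tier_counts = Counter(bucket for _, bucket in communication_sample)

# Lower bound on the combined installs of the sampled apps
total_floor = sum(installs_floor(bucket) for _, bucket in communication_sample)

for bucket, count in tier_counts.most_common():
    print(bucket, ':', count)
print('Minimum combined installs:', total_floor)
```

Running the same tally over every row where `row[1] == 'COMMUNICATION'` would show how few apps sit in the top buckets relative to the rest of the genre.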

While the communication genre is interesting, it is controlled by a few large players with a lot of power, such as WhatsApp. The same can be said about games and video apps such as YouTube. These markets are oversaturated, so the chances of generating revenue in these genres are lower, and I would not recommend building apps in these spaces.

One genre that is interesting and has the potential to generate revenue is BOOKS_AND_REFERENCE, which has an estimated average of 8,767,812 installs. While this genre also has a few outliers, the apps that make up that outlier group are too few to justify ignoring the genre. From the information in the frequency tables, I see that a great idea would be to take a book that is popular and make it into an app that is more engaging (a game version of a book or simply a 3-D audio book) than a plain raw book.
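
To illustrate how much a single outlier can move the genre average, here is a rough sketch that compares the mean with and without the largest install buckets. It is an assumption-laden example rather than the notebook's own method: `books_sample` is only a small subset of the BOOKS_AND_REFERENCE rows printed in this notebook, and `parse_installs`, `average_installs`, and the 100,000,000 cutoff are hypothetical choices made for illustration.

```python
def parse_installs(text):
    """Turn an install bucket such as '1,000,000+' into a float."""
    return float(text.replace(',', '').replace('+', ''))


def average_installs(rows, cap=None):
    """Mean installs over (name, bucket) rows.

    When `cap` is given, rows at or above the cap are treated as
    outliers and left out of the average.
    """
    values = [parse_installs(bucket) for _, bucket in rows]
    if cap is not None:
        values = [v for v in values if v < cap]
    return sum(values) / len(values) if values else 0.0


# Subset of the BOOKS_AND_REFERENCE listing (name, install bucket).
books_sample = [
    ('E-Book Read - Read Book for free', '50,000+'),
    ('Wikipedia', '10,000,000+'),
    ('Cool Reader', '10,000,000+'),
    ('Book store', '1,000,000+'),
    ('Google Play Books', '1,000,000,000+'),
    ('AlReader -any text book reader', '5,000,000+'),
    ('Offline English Dictionary', '100,000+'),
    ('Anonymous caller detection', '10,000+'),
]

mean_all = average_installs(books_sample)
mean_trimmed = average_installs(books_sample, cap=100_000_000)

print('Mean, all sampled apps:', mean_all)
print('Mean, outliers removed:', mean_trimmed)
```

In this small sample, Google Play Books alone accounts for nearly all of the installs, so the trimmed mean is a fraction of the untrimmed one. The full genre would need the same check before drawing conclusions.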

In [33]:
for app in android:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E

In [34]:
# This demonstrates outliers within the genre
for app in android:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000,000+'
                                            or app[5] == '500,000,000+'
                                            or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad 📖 Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+


In [35]:
# This covers a segment of the genre that is still worth exploring
for app in android:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000+'
                                            or app[5] == '5,000,000+'
                                            or app[5] == '10,000,000+'
                                            or app[5] == '50,000,000+'):
        print(app[0], ':', app[5])

Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
ReadEra – free ebook reader : 1,000,000+
Ebook Reader : 5,000,000+
Read books online : 5,000,000+
eBoox: book reader fb2 epub zip : 1,000,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
Moon+ Reader : 10,000,000+
English-Myanmar Dictionary : 1,000,000+
Golden Dictionary (EN-AR) : 1,000,000+
All Language Translator Free : 1,000,000+
Aldiko Book Reader : 10,000,000+
Dictionary - WordWeb : 5,000,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Al-Quran (Free) : 10,000,000+
Al Quran Indonesia : 10,000,000+
Al'Quran Bahasa Indonesia : 10,000,000+
Al Quran Al karim : 1,000,000+
Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+
Koran Read &MP3 30 Juz Offline : 1,000,000+
H

**<font color=Grey>iOS App Store Analysis:</font>**

As for the iOS App Store, the navigation genre leads the charts; however, outliers (i.e. Waze and Google Maps) make up a significant share of that total. Unfortunately, most of the popular genres are heavily influenced by a few popular apps, which makes each genre appear more popular than it really is.

The purpose of this analysis was to find an app idea that can be applied to both the Google Play Store and the iOS App Store. Since the Google Play Store shows a strong opportunity within the Books and Reference genre, and the comparable genre also has a good average in the iOS App Store, it is worth exploring there as well.

Looking through the frequency table, I see that this genre is dominated by the Bible and dictionary apps; however, looking closer at the data, I see that there might also be an opportunity to enter this genre with an app that takes a popular book and creates a version that is more engaging to the user.

In [36]:
genres_ios = freq_table(ios, -5)

for genre in genres_ios:
    total = 0
    len_genre = 0
    for app in ios:
        genre_app = app[-5]
        if genre_app == genre:            
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)

Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0


In [37]:
for app in ios:
    if app[-5] == 'Reference':
        print(app[1], ':', app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0
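
The Reference listing above can be used to check how strongly the top app pulls the genre average. The sketch below is an illustrative addition, not part of the original notebook: it copies the eighteen rating counts printed above into a list (the variable names are hypothetical) and compares the mean, the median, and the mean with the single largest app left out.

```python
from statistics import median

# Rating counts for the iOS Reference genre, in the order printed above.
reference_ratings = [
    985920, 200047, 54175, 26786, 18418, 17588, 16849, 12122, 8535,
    4693, 1497, 826, 762, 718, 14, 8, 0, 0,
]

# Plain average, matching the 'Reference' line of the genre table
mean_all = sum(reference_ratings) / len(reference_ratings)

# The median is far less sensitive to one very large value
median_all = median(reference_ratings)

# Average again after dropping the single most-rated app
without_top = sorted(reference_ratings)[:-1]
mean_without_top = sum(without_top) / len(without_top)

print('Mean:', mean_all)
print('Median:', median_all)
print('Mean without top app:', mean_without_top)
```

The gap between the mean and the median gives a quick sense of how much of the genre's apparent popularity comes from one or two apps.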


## Conclusion

In this project, I took the role of the analyst to analyze the Google and iOS app store data for potentially profitable new app ideas.

In my analysis, I concluded that a profitable app idea is one that takes a popular book and passes it through a gamification process that creates a more active approach to reading. This idea could take the form of a 3-D book in which the user follows the main character throughout the book and becomes part of the story. This has the potential to explore a new approach to learning, which can have a great marketing impact on the game and the reputation of TyrAppTor, Inc.