# Introduction

This project is about analyzing app data from the Google Play and Apple App stores. We'll pretend we're data analysts working for a company that builds Android and iOS apps in English. Our company only builds apps that are free, and revenue comes from in-app ads.

We'd like to profile the most profitable apps on the Google Play and Apple App stores. Going through the data will help our developers understand what types of apps users gravitate towards.

The data for [Google Play][1] and [Apple App Store][2] can be downloaded at Kaggle.

[1]:https://www.kaggle.com/lava18/google-play-store-apps
[2]:https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps

## Opening and Exploring the Data

First we open the files and give them corresponding variable names. The header row is separated from the data for quick access. Here we will use the `explore_data()` function for exploration. It prints the rows in the given slice so they're readable, and prints the number of rows and columns if `rows_and_columns` is `True`. It assumes the input data set doesn't have a header row.

A few rows from the start of each data set are printed along with the number of rows and columns. We also try to identify some columns that could help with our analysis.

In [2]:
from csv import reader

# Open the .csv files and read each one into a list of lists.
# The with statement closes each file once it has been read.
with open("AppleStore.csv", encoding='utf8') as file1:
    apple_apps_data = list(reader(file1))

with open("googleplaystore.csv", encoding='utf8') as file2:
    google_apps_data = list(reader(file2))

In [3]:
#Separating the header from the data set 
apple_head = apple_apps_data[0]
apple_data = apple_apps_data[1:]

google_head = google_apps_data[0]
google_data = google_apps_data[1:]

def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    
    for row in dataset_slice:
        print(row)
        print('\n') # new empty line for separation
        
    if rows_and_columns:
        print("Number of rows:", len(dataset))
        print("Number of columns:", len(dataset[0]))

#First few rows
print("Apple Rows")
explore_data(apple_data, 1, 3, True)
print('\n')
print("Google Rows")
explore_data(google_data, 1, 3, True)

Apple Rows
['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


Google Rows
['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


In [4]:
#Exploring Columns
print(apple_head, '\n')
print(google_head, '\n')

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'] 

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'] 



The columns most useful to us relate to price (we develop free apps only) and to user ratings. They're detailed in the two tables below:

| Google Column Name | Description |
|:-----------:|:------------:|
| 'Rating' | User rating of the app |
| 'Installs' | Number of downloads |
| 'Price' | Price of the App |
| 'Type' | Whether an app is paid or free |

| Apple Column Name | Description |
|:---------:|:---------:|
| 'user_rating' | Average user rating (for all versions) |
| 'user_rating_ver' | Average user rating (for current version) |
| 'price' | Price of the app |

## Wrong Data

The discussion section for the Google Play Store data set describes an error in row 10472 (counting from the data set without the header). Printing the header, row 10472, and the row after it shows that row 10472 has only 12 fields instead of 13: the `Category` value is missing, so every later value is shifted one column to the left. As a result its `Rating` reads 19, which is impossible since the maximum rating is 5. We'll delete this row.

In [5]:
print(google_head, '\n') #header
print(google_data[10472], '\n') # incorrect
print(google_data[10473]) #correct

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'] 

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up'] 

['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']


In [6]:

del google_data[10472] # Running this more than once will delete more data.
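
A row like this can also be caught programmatically rather than by relying on the discussion section. The sketch below is only an illustration: `find_malformed_rows` is a hypothetical helper and the sample rows are made up. It flags any row whose number of fields differs from the header's.

```python
def find_malformed_rows(header, rows):
    """Return the indexes of rows whose length differs from the header's."""
    expected = len(header)
    return [i for i, row in enumerate(rows) if len(row) != expected]

# Toy data for illustration only (not taken from the real files)
sample_head = ['App', 'Category', 'Rating']
sample_rows = [
    ['App A', 'TOOLS', '4.2'],
    ['App B', '1.9'],            # Category missing, so the row is too short
    ['App C', 'SOCIAL', '4.5'],
]

bad = find_malformed_rows(sample_head, sample_rows)
print(bad)
```

If several rows are flagged, deleting them from the highest index to the lowest keeps the remaining indexes valid.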

## Duplicate Apps
### Part One

From the discussion section for the Google Play Store data, duplicate entries for the same applications have been found. An example is Instagram:

In [7]:
for app in google_data:
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


Using a `for` loop we see that there are 1,181 duplicate entries. Removing them will make our analysis more accurate, but we won't remove them at random. Take the Instagram duplicates: the rows are identical except for the fourth entry, which is the number of user reviews. The differing review counts suggest the data was collected at different times, and the higher the count, the more recent the data is likely to be. We will therefore keep the row with the highest number of user reviews.

In [8]:
duplicate = []
unique = []

for app in google_data:
    app_name = app[0]
    if app_name in unique:
        duplicate.append(app_name)
    else:
        unique.append(app_name)

print('Number of duplicate apps:', len(duplicate))
print('\n')
print('Examples of duplicate apps:', duplicate[:3])

Number of duplicate apps: 1181


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business']


### Part Two

After removing the duplicate Google Play Store apps we should be left with 9659 unique apps.

In [9]:
print('Length after duplicates:', len(google_data) - 1181)

Length after duplicates: 9659


To remove the duplicates, we'll create a dictionary where each key is a unique app name and the corresponding value is the highest number of reviews for that app.

In [10]:
reviews_max = {} #empty dictionary

for app in google_data:
    name = app[0] #app name
    n_reviews = float(app[3]) #number of reviews for the app
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews

    elif name not in reviews_max:
        reviews_max[name] = n_reviews

len(reviews_max) # is 9659 as expected

9659

Here we use the `reviews_max` dictionary to remove the duplicate rows. The list `android_clean` will hold our cleaned data as a list of lists, while the `already_added` list keeps track of the apps we have already added.

In [11]:
android_clean = [] #list for new cleaned data set
already_added = [] #app names

for app in google_data:
    name = app[0]
    n_reviews = float(app[3])
    
    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(app)
        already_added.append(name)

explore_data(android_clean,0,3,True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13
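
The loops above test membership in a list (`name not in already_added`), which gets slower as the list grows. As an alternative sketch, not the method used in this notebook, a single dictionary can hold the best row seen so far for each name. The function name `dedupe_keep_max_reviews` and the shortened toy rows (including the made-up "Box" values) are assumptions for illustration.

```python
def dedupe_keep_max_reviews(rows, name_idx=0, reviews_idx=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_idx]
        n_reviews = float(row[reviews_idx])
        # Strict '>' keeps the earlier row when review counts tie
        if name not in best or n_reviews > float(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())

# Shortened toy rows: name, category, rating, reviews
sample = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
]

result = dedupe_keep_max_reviews(sample)
for row in result:
    print(row)
```

Rows come out in the order each name first appears, which can differ from the order produced by the two-loop approach.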


## Removing Non-English Apps
### Part One

Exploring the data sets long enough, we find that some apps aren't in English:

In [12]:
#Non-English Apps
print(apple_data[813][1])
print(apple_data[6731][1])
print('\n')
print(android_clean[4412][0])
print(android_clean[7940][0])

爱奇艺PPS -《欢乐颂2》电视剧热播
【脱出ゲーム】絶対に最後までプレイしないで 〜謎解き＆ブロックパズル〜


中国語 AQリスニング
لعبة تقدر تربح DZ


Since our company only develops English apps, we'll remove the non-English ones. One way to do this is to look at each app name and find characters, letters, and symbols that aren't commonly used in English text.

The `ord()` function returns the Unicode code point of a character as a decimal number; for ASCII characters this is the same as the ASCII code. For example, the number for `b` is `98`. The characters commonly used in English text fall in the ASCII range 0 to 127, so a number greater than 127 indicates a character outside that set.
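
As a quick illustration of these numbers (the sample characters are chosen only for demonstration, and `str.isascii()` assumes Python 3.7 or later):

```python
# ord() takes a single character and returns its code point
samples = ['b', 'Z', '™', '爱']

for ch in samples:
    code = ord(ch)
    label = 'ASCII' if code <= 127 else 'non-ASCII'
    print(repr(ch), code, label)

# For whole strings, str.isascii() checks every character at once
print('Instagram'.isascii())
print('Docs To Go™'.isascii())
```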

Using this information, we'll build a function that iterates through each individual character in an app name and checks whether its number is in the range 0 to 127.

In [13]:
#function that iterates over input string

def English(string):
    
    for character in string:
        if ord(character) > 127:
            return False
    
    return True

print(English('Instagram'))
print(English('爱奇艺PPS -《欢乐颂2》电视剧热播'))

True
False


Certain English app names are identified as `False` because they use characters such as emojis and other symbols (™) that aren't in the ASCII range.

In [14]:
print(English('Docs To Go™ Free Office Suite'))
print(English('Instachat 😜'))
print('\n')
print(ord('™'))
print(ord('😜'))

False
False


8482
128540


Using the current `English` function, we'll lose important data since many English apps will be incorrectly labeled. To reduce this impact, we'll remove an app only if the name has more than 3 non-ASCII characters.

In [15]:
def English(string):
    non_ascii = 0
    
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    
    if non_ascii > 3:
        return False
    else:
        return True

In [16]:
print(English('Docs To Go™ Free Office Suite'))
print(English('Instachat 😜'))
print(English('爱奇艺PPS -《欢乐颂2》电视剧热播'))

True
True
False
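
The same rule can be written more compactly. This is only a sketch of an equivalent form; the name `is_english` and the `max_non_ascii` parameter are assumptions, not part of the notebook.

```python
def is_english(name, max_non_ascii=3):
    """Return True if the name has at most max_non_ascii non-ASCII characters."""
    non_ascii = sum(1 for character in name if ord(character) > 127)
    return non_ascii <= max_non_ascii

print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
```

Making the threshold a parameter makes it easy to test how sensitive the filter is to the cutoff.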


In [17]:
#Filtering the data sets with English function

google_eng = []
apple_eng = []

for app in android_clean:
    name = app[0]
    
    if English(name) == True:
        google_eng.append(app)

for app in apple_data:
    name = app[1]
    
    if English(name) == True:
        apple_eng.append(app)

#Exploring to see how many remaining rows are left

explore_data(google_eng,0,3,True)
print('\n')
explore_data(apple_eng, 0,3,True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 

Now we have 9,614 Google Play apps and 6,183 Apple apps, all in English.

## Isolating the Free Apps

We'll need to isolate the free apps from the non-free ones in order to start our analysis. We'll loop through each data set to isolate the free ones.

In [18]:
google_free = []
apple_free = []

for app in google_eng:
    price = app[7]
    
    if price == '0':
        google_free.append(app)

for app in apple_eng:
    price = app[4]
    
    if price == '0.0':
        apple_free.append(app)

explore_data(google_free,0,3,True)
print('\n')
explore_data(apple_free,0,3,True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 8864
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 

Our final data sets have 8864 Google Play apps and 3222 Apple apps.
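
The comparisons above rely on free apps being written exactly as `'0'` or `'0.0'`. A slightly more tolerant check parses the price as a number instead. The sketch below assumes that paid prices may carry a leading `$` (for example `'$4.99'`); `is_free` is a hypothetical helper, not part of the notebook.

```python
def is_free(price_text):
    """Return True when a price string represents zero."""
    cleaned = price_text.strip().lstrip('$')  # assumption: '$' is the only prefix
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Values that can't be parsed as a number are treated as not free
        return False

for price in ['0', '0.0', '$4.99', 'Everyone']:
    print(repr(price), is_free(price))
```

One function then covers both stores, since `'0'` and `'0.0'` parse to the same number.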

## Most Common Apps by Genre
### Part One

As said in the introduction, we'd like to figure out the types of apps that attract the most users since our revenue is influenced by how many people are using our apps.

We'll use this validation strategy for app ideas:

1. Build an Android version of the app and publish it on Google Play.
2. If the app gets a large response, we'll develop it further.
3. After six months, if the app is profitable, we'll build an iOS version of the app and publish it on the App Store.

Since we want the app on both Google Play and the App Store, we want to find app profiles that are successful in both markets. To do this we'll generate frequency tables to find the most common genres in each market, using the `prime_genre` column in the Apple data set and the `Genres` and `Category` columns in the Google data set.

### Part Two

In [19]:
#Function to generate frequency tables that show percentage
def freq_table(dataset, index):
    freq_dict = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]

        if value in freq_dict:
            freq_dict[value] += 1
        else:
            freq_dict[value] = 1
    
    freq_percentage = {}
    
    for key in freq_dict:
        percent = (freq_dict[key] / total) * 100
        freq_percentage[key] = percent
        
    return freq_percentage
     
#Function that displays those percentages in descending order
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    
    for entry in table_sorted:
        print(entry[1] + ':', entry[0])
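
For comparison, the standard library's `collections.Counter` can do the counting and sorting in a few lines. This is an alternative sketch, not the notebook's method; `freq_table_counter` and the toy rows are made up for illustration.

```python
from collections import Counter

def freq_table_counter(dataset, index):
    """Return {value: percentage} for one column, most common value first."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    # most_common() yields (value, count) pairs sorted by count, descending
    return {value: count / total * 100 for value, count in counts.most_common()}

sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
table = freq_table_counter(sample, 1)

for genre, percent in table.items():
    print(f'{genre}: {percent:.2f}')
```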

### Part Three

In [20]:
display_table(apple_free, -5) #prime_genre

Games: 58.16263190564867
Entertainment: 7.883302296710118
Photo & Video: 4.9658597144630665
Education: 3.662321539416512
Social Networking: 3.2898820608317814
Shopping: 2.60707635009311
Utilities: 2.5139664804469275
Sports: 2.1415270018621975
Music: 2.0484171322160147
Health & Fitness: 2.0173805090006205
Productivity: 1.7380509000620732
Lifestyle: 1.5828677839851024
News: 1.3345747982619491
Travel: 1.2414649286157666
Finance: 1.1173184357541899
Weather: 0.8690254500310366
Food & Drink: 0.8069522036002483
Reference: 0.5586592178770949
Business: 0.5276225946617008
Book: 0.4345127250155183
Navigation: 0.186219739292365
Medical: 0.186219739292365
Catalogs: 0.12414649286157665


In the `prime_genre` column of the App Store data set, the most common genre is "Games" at 58.16%, followed by "Entertainment" at 7.88%. Most of the apps fall into entertainment-oriented genres ("Photo & Video", "Social Networking", etc.), while lifestyle apps ("Productivity", "Health & Fitness") fall in the middle. The general trend is that the free English App Store apps are mostly designed for entertainment. A successful app profile would likely be in the entertainment space; however, a genre being common doesn't imply that it has a large number of users.

In [21]:
display_table(google_free, -4) #Genres

Tools: 8.449909747292418
Entertainment: 6.069494584837545
Education: 5.347472924187725
Business: 4.591606498194946
Productivity: 3.892148014440433
Lifestyle: 3.892148014440433
Finance: 3.7003610108303246
Medical: 3.531137184115524
Sports: 3.463447653429603
Personalization: 3.3167870036101084
Communication: 3.2378158844765346
Action: 3.1024368231046933
Health & Fitness: 3.0798736462093865
Photography: 2.944494584837545
News & Magazines: 2.7978339350180503
Social: 2.6624548736462095
Travel & Local: 2.3240072202166067
Shopping: 2.2450361010830324
Books & Reference: 2.1435018050541514
Simulation: 2.0419675090252705
Dating: 1.861462093862816
Arcade: 1.8501805054151623
Video Players & Editors: 1.7712093862815883
Casual: 1.7599277978339352
Maps & Navigation: 1.3989169675090252
Food & Drink: 1.2409747292418771
Puzzle: 1.128158844765343
Racing: 0.9927797833935018
Role Playing: 0.9363718411552346
Libraries & Demo: 0.9363718411552346
Auto & Vehicles: 0.9250902527075812
Strategy: 0.913808664259927

For the `Genres` column, the most common genres are "Tools" at 8.45%, "Entertainment" at 6.07%, and "Education" at 5.35%. Compared to the App Store, where most of the apps are designed for fun, the Google Play market is more balanced, with both productivity and fun apps. Near the bottom of the list, most of the multi-genre apps are labeled as "Education". The abundance of this label may be explained by the Category table below.
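
Entries in `Genres` can combine several labels separated by a semicolon, as in `'Art & Design;Pretend Play'` from the rows printed earlier. One way to see how often each individual label occurs is to split on the semicolon before counting. A minimal sketch with toy rows; `count_genre_labels` is a hypothetical helper.

```python
from collections import Counter

def count_genre_labels(rows, genres_idx):
    """Count each individual label in a semicolon-separated genres column."""
    counts = Counter()
    for row in rows:
        for label in row[genres_idx].split(';'):
            counts[label.strip()] += 1
    return counts

# Toy rows for illustration: app name, genres
sample = [
    ['App A', 'Art & Design;Pretend Play'],
    ['App B', 'Art & Design'],
    ['App C', 'Education;Pretend Play'],
]

label_counts = count_genre_labels(sample, 1)
print(label_counts.most_common())
```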

In [22]:
display_table(google_free, 1) #Category

FAMILY: 18.907942238267147
GAME: 9.724729241877256
TOOLS: 8.461191335740072
BUSINESS: 4.591606498194946
LIFESTYLE: 3.9034296028880866
PRODUCTIVITY: 3.892148014440433
FINANCE: 3.7003610108303246
MEDICAL: 3.531137184115524
SPORTS: 3.395758122743682
PERSONALIZATION: 3.3167870036101084
COMMUNICATION: 3.2378158844765346
HEALTH_AND_FITNESS: 3.0798736462093865
PHOTOGRAPHY: 2.944494584837545
NEWS_AND_MAGAZINES: 2.7978339350180503
SOCIAL: 2.6624548736462095
TRAVEL_AND_LOCAL: 2.33528880866426
SHOPPING: 2.2450361010830324
BOOKS_AND_REFERENCE: 2.1435018050541514
DATING: 1.861462093862816
VIDEO_PLAYERS: 1.7937725631768955
MAPS_AND_NAVIGATION: 1.3989169675090252
FOOD_AND_DRINK: 1.2409747292418771
EDUCATION: 1.1620036101083033
ENTERTAINMENT: 0.9589350180505415
LIBRARIES_AND_DEMO: 0.9363718411552346
AUTO_AND_VEHICLES: 0.9250902527075812
HOUSE_AND_HOME: 0.8235559566787004
WEATHER: 0.8009927797833934
EVENTS: 0.7107400722021661
PARENTING: 0.6543321299638989
ART_AND_DESIGN: 0.6430505415162455
COMICS: 0.62

The `Category` column gives a better breakdown of how the apps are grouped. We see the "Family" category is much larger than the others at 18.91%, followed by "Game" at 9.72% and "Tools" at 8.46%. Below we see that most of the apps in the Family section are for children and focused on education. Apps in the Family category with a focus on education seem to be the most successful profile.

![Family Category](http://localhost:8888/files/OneDrive/Projects/app-store-project/Google_Play_Family.png)

## Most Popular Apps by Genre 
### The App Store

We've seen the App Store is full of fun apps for entertainment, while the Google Play store is more balanced between fun and productivity apps. We'd like to find out which genres have the most users. In the Google Play data set, this information is in the `Installs` column, but there is no similar data in the App Store data set. Instead, we'll use the total number of user ratings, found in `rating_count_tot`, as a proxy. To start, we'll calculate the average number of user ratings per app genre.

In [23]:
prime_genre_freq = freq_table(apple_free, -5) #prime_genre

for genre in prime_genre_freq:
    total = 0 #sum of the number of user ratings
    len_genre = 0 #number of apps specific to each genre
    
    for app in apple_free:
        genre_app = app[-5]
        if genre_app == genre:
            user_ratings_tot = float(app[5])
            total += user_ratings_tot
            len_genre += 1
    
    user_ratings_avg = total / len_genre
    print(genre + ':', user_ratings_avg)

Social Networking: 71548.34905660378
Photo & Video: 28441.54375
Games: 22788.6696905016
Music: 57326.530303030304
Reference: 74942.11111111111
Health & Fitness: 23298.015384615384
Weather: 52279.892857142855
Utilities: 18684.456790123455
Travel: 28243.8
Shopping: 26919.690476190477
News: 21248.023255813954
Navigation: 86090.33333333333
Lifestyle: 16485.764705882353
Entertainment: 14029.830708661417
Food & Drink: 33333.92307692308
Sports: 23008.898550724636
Book: 39758.5
Finance: 31467.944444444445
Education: 7003.983050847458
Productivity: 21028.410714285714
Business: 7491.117647058823
Catalogs: 4004.0
Medical: 612.0


Looking at the results, we see that the `Navigation` genre has the highest average number of user ratings. However, Waze and Google Maps skew the average, since their rating counts are far higher than those of the other apps, so the figure doesn't represent a typical app in the genre.

In [24]:
for app in apple_free:
    name = app[1]
    total_ratings = float(app[5])
    if app[-5] == 'Navigation':
        print(name + ':', total_ratings) 

Waze - GPS Navigation, Maps & Real-time Traffic: 345046.0
Google Maps - Navigation & Transit: 154911.0
Geocaching®: 12811.0
CoPilot GPS – Car Navigation & Offline Maps: 3582.0
ImmobilienScout24: Real Estate Search in Germany: 187.0
Railway Route Search: 5.0
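
One way to quantify this skew is to compare the mean with the median, which is much less sensitive to a couple of very large values. The sketch below uses the six rating counts listed above; the median is offered as an illustration, not as the measure used in the rest of this analysis.

```python
from statistics import mean, median

# Total ratings of the six free Navigation apps listed above
navigation_ratings = [345046.0, 154911.0, 12811.0, 3582.0, 187.0, 5.0]

nav_mean = mean(navigation_ratings)
nav_median = median(navigation_ratings)

print('Mean:', nav_mean)
print('Median:', nav_median)
```

The median (the midpoint between Geocaching and CoPilot GPS) is roughly a tenth of the mean, which shows how strongly two apps drive the genre average.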


Given we found the most common genres in the App Store were about fun and entertainment, let's look through the "Social Networking" genre:

In [25]:
for app in apple_free:
    name = app[1]
    total_ratings = float(app[5])
    if app[-5] == 'Social Networking' and total_ratings > 0:
        print(name + ':', total_ratings) 

Facebook: 2974676.0
Pinterest: 1061624.0
Skype for iPhone: 373519.0
Messenger: 351466.0
Tumblr: 334293.0
WhatsApp Messenger: 287589.0
Kik: 260965.0
ooVoo – Free Video Call, Text and Voice: 177501.0
TextNow - Unlimited Text + Calls: 164963.0
Viber Messenger – Text & Call: 164249.0
Followers - Social Analytics For Instagram: 112778.0
MeetMe - Chat and Meet New People: 97072.0
We Heart It - Fashion, wallpapers, quotes, tattoos: 90414.0
InsTrack for Instagram - Analytics Plus More: 85535.0
Tango - Free Video Call, Voice and Chat: 75412.0
LinkedIn: 71856.0
Match™ - #1 Dating App.: 60659.0
Skype for iPad: 60163.0
POF - Best Dating App for Conversations: 52642.0
Timehop: 49510.0
Find My Family, Friends & iPhone - Life360 Locator: 43877.0
Whisper - Share, Express, Meet: 39819.0
Hangouts: 36404.0
LINE PLAY - Your Avatar World: 34677.0
WeChat: 34584.0
Badoo - Meet New People, Chat, Socialize.: 34428.0
Followers + for Instagram - Follower Analytics: 28633.0
GroupMe: 28260.0
Marco Polo Video Walki

Again, well-known giants like Facebook and Tumblr dominate the results, with rating counts above 100,000. Smaller social apps struggle to compete in this market, each averaging around 300 to 500 ratings. Making an app in the `Social Networking` genre profitable would be a challenge. `Utilities` averages 18,684 total ratings, but the apps in this genre seem to be more balanced.

In [26]:
for app in apple_free:
    name = app[1]
    total_ratings = float(app[5])
    if app[-5] == 'Utilities':
        print(name + ':', total_ratings) 

Google – Search made just for mobile: 479440.0
Flashlight Ⓞ: 130450.0
My Verizon: 126948.0
myAT&T: 108507.0
The Calculator - Free and Easy Calculating!: 99244.0
Speedtest by Ookla: 65016.0
Google Chrome – The Fast and Secure Web Browser: 55750.0
Alarm Clock HD Free - Digital Alarm Clock Display: 54496.0
Calculator Pro for iPad Free - Smart Calculator: 47803.0
Bitmoji - Your Personal Emoji: 37972.0
Calculator‰: 36879.0
Flashlight ！: 35769.0
Truecaller - Spam Identification & Block: 27791.0
Browser and File Manager for Documents: 27750.0
Flashlight for iPhone , iPod and iPad: 26697.0
RainbowKey - Color keyboard themes, fonts & GIF: 23063.0
Calculator HD Free: 14986.0
QR Reader for iPhone: 12683.0
Puffin Web Browser: 11677.0
Evolution Calculator - CP & XP - for Pokemon GO!: 9269.0
Gboard — a new keyboard from Google: 8873.0
Free QR Code Reader & Barcode Scanner for iPhone: 7343.0
GIF Keyboard: 5979.0
Alarm Clock for Me - Best Wake Up Music & Clock: 4959.0
Quick Scan - QR Code Reader: 4783

The `Utilities` genre looks a little more promising. The Google app holds the top spot, and many of the other apps (calculators, flashlights, keyboard themes) duplicate tools that are already built into most phones, so they likely offer extra features that make them worth downloading.

One app idea could be a basic phone storage analyzer and cleaner. The analyzer would show how much space each app and any other files take up on the phone, and the user could review each app's nonessential files and delete them as they wish. When activated, our app's built-in cleaner would go through the files on the phone, find unused memory, cache, and temporary files, and show them to the user, who could then choose whether to clear them.

Since the App Store has mostly fun and social apps, a tool application could be successful. It also fits with the Google Play Store frequency tables, where the popular genres and categories are mostly productivity applications.

Let's look at the most popular apps by genre for the Google Play Store.

### The Google Play Store

The data set for Google Play has install numbers for each app, but the given values aren't very clear cut: 

In [27]:
display_table(google_free, 5) #Installs column

1,000,000+: 15.726534296028879
100,000+: 11.552346570397113
10,000,000+: 10.548285198555957
10,000+: 10.198555956678701
1,000+: 8.393501805054152
100+: 6.915613718411552
5,000,000+: 6.825361010830325
500,000+: 5.561823104693141
50,000+: 4.7721119133574
5,000+: 4.512635379061372
10+: 3.5424187725631766
500+: 3.2490974729241873
50,000,000+: 2.3014440433213
100,000,000+: 2.1322202166064983
50+: 1.917870036101083
5+: 0.78971119133574
1+: 0.5076714801444043
500,000,000+: 0.2707581227436823
1,000,000,000+: 0.22563176895306858
0+: 0.04512635379061372
0: 0.01128158844765343


An app with `5,000+` installs might have 5,000 or 7,000 installs, but since we only want to find out which app genres attract the most users, we don't need exact figures and will take each value at its lower bound. So an app with `1,000,000+` installs will count as 1,000,000 installs, an app with `100,000+` installs as 100,000 installs, and so on. To do the arithmetic, we strip the `+` and `,` characters and convert each string to a number, then use a nested loop to calculate the average number of installs per category.

In [28]:
categories_freq = freq_table(google_free, 1)

for category in categories_freq:
    total = 0 # sum of installs specific to each genre
    len_category = 0 # number of apps specific to each genre
    
    for app in google_free:
        category_app = app[1] 
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace('+', '')
            n_installs = n_installs.replace(',', '')
            total += float(n_installs)
            len_category += 1
    
    installs_avg = total / len_category
    print(category + ':', installs_avg)

ART_AND_DESIGN: 1986335.0877192982
AUTO_AND_VEHICLES: 647317.8170731707
BEAUTY: 513151.88679245283
BOOKS_AND_REFERENCE: 8767811.894736841
BUSINESS: 1712290.1474201474
COMICS: 817657.2727272727
COMMUNICATION: 38456119.167247385
DATING: 854028.8303030303
EDUCATION: 1833495.145631068
ENTERTAINMENT: 11640705.88235294
EVENTS: 253542.22222222222
FINANCE: 1387692.475609756
FOOD_AND_DRINK: 1924897.7363636363
HEALTH_AND_FITNESS: 4188821.9853479853
HOUSE_AND_HOME: 1331540.5616438356
LIBRARIES_AND_DEMO: 638503.734939759
LIFESTYLE: 1437816.2687861272
GAME: 15588015.603248259
FAMILY: 3695641.8198090694
MEDICAL: 120550.61980830671
SOCIAL: 23253652.127118643
SHOPPING: 7036877.311557789
PHOTOGRAPHY: 17840110.40229885
SPORTS: 3638640.1428571427
TRAVEL_AND_LOCAL: 13984077.710144928
TOOLS: 10801391.298666667
PERSONALIZATION: 5201482.6122448975
PRODUCTIVITY: 16787331.344927534
PARENTING: 542603.6206896552
WEATHER: 5074486.197183099
VIDEO_PLAYERS: 24727872.452830188
NEWS_AND_MAGAZINES: 9549178.467741935
MA

From the list we see the Communication category has the highest average number of installs: 38,456,119. It is followed by the Video Players category. These high numbers are skewed by apps such as Facebook Messenger, WhatsApp, and Google Duo, many of which have more than 100 million or even 1 billion installs. The Video Players category is skewed in the same way, with a few apps (such as YouTube) inflating the average.

In [29]:
for app in google_free:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+' or app[5] == '500,000,000+' or app[5] == '100,000,000+'):
        print(app[0] + ':', app[5])

WhatsApp Messenger: 1,000,000,000+
imo beta free calls and text: 100,000,000+
Android Messages: 100,000,000+
Google Duo - High Quality Video Calls: 500,000,000+
Messenger – Text and Video Chat for Free: 1,000,000,000+
imo free video calls and chat: 500,000,000+
Skype - free IM & video calls: 1,000,000,000+
Who: 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji: 100,000,000+
LINE: Free Calls & Messages: 500,000,000+
Google Chrome: Fast & Secure: 1,000,000,000+
Firefox Browser fast & private: 100,000,000+
UC Browser - Fast Download Private & Secure: 500,000,000+
Gmail: 1,000,000,000+
Hangouts: 1,000,000,000+
Messenger Lite: Free Calls & Messages: 100,000,000+
Kik: 100,000,000+
KakaoTalk: Free Calls & Text: 100,000,000+
Opera Mini - fast web browser: 100,000,000+
Opera Browser: Fast and Secure: 100,000,000+
Telegram: 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer: 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure: 100,000,000+
Viber Messenger: 500,000,000+
WeC

The inflation trend continues in the Social (Facebook, Tumblr, Google+) and Productivity (Microsoft, Google, Evernote) categories. The high installs for popular app giants like Facebook make it so these categories seem more popular than they really are, and it would be hard to compete with those types of apps.
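
To get a feel for how much the giants inflate an average, one option is to recompute it while leaving out apps at or above some install threshold. The sketch below is an illustration under stated assumptions: `parse_installs`, `average_installs`, the `cap` parameter, and the two made-up rows are not part of the notebook, and the 100 million cutoff is arbitrary.

```python
def parse_installs(text):
    """Convert a string such as '1,000,000+' to the integer 1000000."""
    return int(text.replace('+', '').replace(',', ''))

def average_installs(rows, installs_idx, cap=None):
    """Average installs, optionally ignoring apps at or above cap."""
    values = [parse_installs(row[installs_idx]) for row in rows]
    if cap is not None:
        values = [v for v in values if v < cap]
    return sum(values) / len(values) if values else 0.0

sample = [
    ['WhatsApp Messenger', '1,000,000,000+'],
    ['Telegram', '100,000,000+'],
    ['App X', '1,000,000+'],   # made-up entry
    ['App Y', '500,000+'],     # made-up entry
]

avg_all = average_installs(sample, 1)
avg_capped = average_installs(sample, 1, cap=100_000_000)
print('All apps:', avg_all)
print('Under 100 million:', avg_capped)
```

In this toy sample the average drops from hundreds of millions to under one million once the two largest apps are excluded.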

Recall that Google Play's most common genre is Tools, and that its categories are balanced between productivity and gaming apps. Since we want an app that's profitable on both the App Store and Google Play, the Tools category is worth exploring. Below is a list of the apps in the Tools category:

In [30]:
for app in google_free:
    if app[1] == 'TOOLS':
        print(app[0] + ':', app[5])

Google: 1,000,000,000+
Google Translate: 500,000,000+
Moto Display: 10,000,000+
Motorola Alert: 50,000,000+
Motorola Assist: 50,000,000+
Moto Suggestions ™: 1,000,000+
Moto Voice: 10,000,000+
Calculator: 100,000,000+
Device Help: 100,000,000+
Account Manager: 100,000,000+
myMetro: 10,000,000+
File Manager: 50,000,000+
My Telcel: 50,000,000+
Calculator - free calculator, multi calculator app: 10,000,000+
ASUS Sound Recorder: 10,000,000+
iWnn IME for Nexus: 5,000,000+
Samsung Max - Data Savings & Privacy Protection: 10,000,000+
Android TV Remote Service: 1,000,000+
ZenUI Help: 10,000,000+
Calculator - free calculator ,multi calculator app: 100,000+
SHAREit - Transfer & Share: 500,000,000+
ZenUI Keyboard – Emoji, Theme: 10,000,000+
Files Go by Google: Free up space on your phone: 10,000,000+
SD card backup: 1,000,000+
Nokia mobile support: 5,000,000+
File Manager -- Take Command of Your Files Easily: 10,000,000+
Samsung Calculator: 100,000,000+
Clear: 10,000,000+
Phone: 10,000,000+
HTC Lo

AZ PLAYER HD: 10,000+
AZ REMOTE CONTROL: 50,000+
Loreal - BA Makeup: 1,000+
Elif Ba Oyunu: 5,000+
BC Wildfire: 5,000+
BD tools: 10,000+
Remote for Samsung TV & BluRay Players (Read Desc): 500,000+
Vehicle Case Checker BD: 10,000+
Remote for Sony TV & Sony Blu-Ray Players MyAV: 1,000,000+
Remote for Panasonic TV+BD+AVR: 100,000+
Unseen - No Last Seen: 5,000,000+
BF-Calc: 100,000+
BG Advisor™: 1,000+
HTC Sense Input-BG: 100,000+
HTC Sense Input - BG: 10,000+
ReactNative BG Geolocation: 1,000+
Simple Photo BG Changer: 10+
BG Remover & Eraser Pro: 500+
BG-FLEET: 10+
Bg+ Call Blocker: 10,000+
ScorePal: 10,000+
GEM™: 500+
VIVA BH: 100,000+
bi-Cube Mobile Token: 1,000+
All Language Translator: 5,000,000+
Virtual DJ Sound Mixer: 500,000+
Metal Detector Pro 2015: 100,000+
BK Formula Calculator 2: 100+
Service Disabler: 100,000+
BK Formula Calculator: 100+
K-Rain BL: 1,000+
BM Online OEC Verification: 100,000+
TuenMun BM: 1,000+
BM speed test: 10+
Sam.BN: 1,000+
BP Assist: 100+
BQ Services: 1,00

EH kontrollrakendus: 5,000+
Ei Electronics AudioLINK: 10,000+
Code on the egg: 10,000+
EI Calculator: 100+
Verdant EI: 50+
ei Calc: 10+
Painel EJ SSH - INTERNET GRÁTIS: 50+
Alex Fuel Calculator for EK: 500+
Sanu Ek Pal Chain - Raid: 500+
SIM Card Info: 1,000,000+
The translator: 1,000,000+
Instabridge - Free WiFi Passwords and Hotspots: 10,000,000+
Keyboard ManMan: 10,000,000+
HTC Sense Input - EN: 5,000,000+
Bluetooth Pair: 1,000,000+
ep-liggare - Elektronisk personalliggare: 100+
Tester EP: 100+
EP Kalkulator el. energije: 100+
ES App Locker: 1,000,000+
ES File Explorer: 100,000+
ES Dark Theme for free: 1,000,000+
ES Classic Theme: 1,000,000+
ES Holo Theme: 500,000+
ES File Explorer & Manager, Locker Xplorer 2018: 1,000+
ES Summer Chill Theme for Free: 100,000+
OpenGL ES Extensions - The OpenGL Utility: 50,000+
File Ex - ES File Explorer: 1,000+
OpenGL ES CapsViewer: 10,000+
OpenGL-ES Info: 10,000+
Dr. ES PV Calculator: 500+
ES File Explorer & File Manager 2018: 5,000+
HTC Sense Inpu

Google dwarfs the other apps with more than 1 billion installs, but the most common apps are flashlights, calculators, and keyboards, similar to the collection in the App Store `Utilities` genre. Let's look at apps in the middle-to-upper range of popularity (5,000,000+ to 50,000,000+):

In [35]:
#Print Tools apps in the 5,000,000+ to 50,000,000+ install range
for app in google_free:
    if app[1] == 'TOOLS' and app[5] in ('5,000,000+', '10,000,000+', '50,000,000+'):
        print(app[0] + ':', app[5])

Moto Display: 10,000,000+
Motorola Alert: 50,000,000+
Motorola Assist: 50,000,000+
Moto Voice: 10,000,000+
myMetro: 10,000,000+
File Manager: 50,000,000+
My Telcel: 50,000,000+
Calculator - free calculator, multi calculator app: 10,000,000+
ASUS Sound Recorder: 10,000,000+
iWnn IME for Nexus: 5,000,000+
Samsung Max - Data Savings & Privacy Protection: 10,000,000+
ZenUI Help: 10,000,000+
ZenUI Keyboard – Emoji, Theme: 10,000,000+
Files Go by Google: Free up space on your phone: 10,000,000+
Nokia mobile support: 5,000,000+
File Manager -- Take Command of Your Files Easily: 10,000,000+
Clear: 10,000,000+
Phone: 10,000,000+
HTC Lock Screen: 10,000,000+
AT&T Smart Wi-Fi: 10,000,000+
Google app for Android TV: 10,000,000+
Sound Recorder: Recorder & Voice Changer Free: 10,000,000+
Remote Link (PC Remote): 10,000,000+
HTC Sense Input: 10,000,000+
App vault: 10,000,000+
My love: 10,000,000+
DuraSpeed: 10,000,000+
Digital Alarm Clock: 10,000,000+
Alarm Clock Free: 10,000,000+
Alarm Clock: 5,000,

Although we narrowed our range, there are still a lot of flashlight and calculator apps. This lack of diversity would help our storage analyzer app stand out. There doesn't seem to be an abundance of storage cleaner apps, but the ones that appear, such as `Clean Master Lite`, are in the 5,000,000+ to 50,000,000+ install range. Basic storage cleaning and analysis features won't be enough to compete. We would need more features, such as sorting apps by size, by when they were last used, or by when they were downloaded. We also need to keep different Android versions in mind so our app runs smoothly on each of them.
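
The impression that calculators and flashlights crowd the category while storage cleaners are scarce could be checked with a simple keyword count over app names. The following is a minimal sketch under stated assumptions: `count_by_keyword`, the keyword list, and the sample rows are hypothetical, and matching on substrings of the name is only a rough proxy for what an app actually does.

```python
def count_by_keyword(apps, category, keywords):
    """Count apps in `category` whose name contains each keyword (case-insensitive)."""
    counts = {keyword: 0 for keyword in keywords}
    for app in apps:
        if app[1] != category:
            continue
        name = app[0].lower()
        for keyword in keywords:
            if keyword in name:
                counts[keyword] += 1
    return counts


# Hypothetical rows for illustration only; the real input would be google_free.
sample = [
    ['Example Calculator', 'TOOLS', '', '', '', '10,000,000+'],
    ['Another Calculator Free', 'TOOLS', '', '', '', '5,000,000+'],
    ['Example Flashlight', 'TOOLS', '', '', '', '10,000,000+'],
    ['Example Storage Cleaner', 'TOOLS', '', '', '', '5,000,000+'],
    ['Example Chat', 'COMMUNICATION', '', '', '', '50,000,000+'],
]

# Keywords are lowercase because names are lowercased before matching
print(count_by_keyword(sample, 'TOOLS', ['calculator', 'flashlight', 'clean']))
# {'calculator': 2, 'flashlight': 1, 'clean': 1}
```

Applied to `google_free`, a count like this would put rough numbers on how saturated each niche is before committing to an app idea.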

# Conclusion

For this project, we analyzed app data from the Google Play Store and Apple App Store data sets, searching for an app profile that would be profitable in both markets. We concluded that a storage analyzer app would be the most profitable. The App Store market leans toward entertainment, while the Play Store market is more balanced between entertainment and productivity. A productivity app would therefore have a good chance of thriving in both environments over time.