# Profitable App Profiles for the App Store and Google Play Markets
---
In this project, I am acting as a data analyst for a company that builds Android and iOS mobile apps. Our apps are available on Google Play and the App Store.

We only build apps that are free to download, and our main source of revenue consists of in-app ads.  This means our revenue for any given app is mostly influenced by the number of users who use our app - the more users that see and engage with the ads, the better. We will also be directing our apps at an English-speaking audience.

The goal of this project is to analyze data to help our developers understand what type of apps are likely to attract more users.

## Opening and Exploring the Data
---
As of September 2018, there were roughly 2 million iOS apps accessible on the App Store, and 2.1 million Android apps on Google Play.

Gathering data for more than 4 million apps requires a lot of time and money, so we'll analyze a sample of the data instead. To avoid spending resources on gathering new data ourselves, we should first check whether any relevant existing data is available at no cost. Fortunately, there are two data sets that appear suitable for our purposes:

* [A data set](https://www.kaggle.com/lava18/google-play-store-apps/home) containing data on roughly 10,000 Android apps from Google Play; the data was gathered in August 2018.
* [A data set](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home) containing information on roughly 7,000 iOS apps from the App Store; the data was gathered in July 2017.

We'll begin by opening these two data sets.

In [1]:
from csv import reader

apple_opened_file = open('AppleStore.csv', encoding='utf8')
apple_read_file = reader(apple_opened_file)
apple_dataset = list(apple_read_file)
apple_header = apple_dataset[0]
apple_data = apple_dataset[1:]


google_opened_file = open('googleplaystore.csv', encoding='utf8')
google_read_file = reader(google_opened_file)
google_dataset = list(google_read_file)
google_header = google_dataset[0]
google_data = google_dataset[1:]

To make them simpler to investigate, we will write a function named `explore_data()` that we can use to print rows in a readable way.

In [2]:
# dataset is expected to be a list of lists
# start and end are expected to be ints
# rows_and_columns is a defaulted bool for getting the table size
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')
    
    if rows_and_columns:
        print('Number of rows: ', len(dataset))
        print('Number of columns: ', len(dataset[0]))

Now let's investigate the App Store data set.

In [3]:
print(apple_header)
print('\n')
explore_data(apple_data, 1, 4, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


Number of rows:  7197
Number of columns:  16


We see now that the iOS data set has 7197 apps and 16 columns. The columns that seem useful right now are `'track_name'`, `'currency'`, `'price'`, `'rating_count_tot'`, `'rating_count_ver'`, and `'prime_genre'`. You can find the documentation [here](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home) to better understand the column names.

Now let's look at the Google Play data set.

In [4]:
print(google_header)
print('\n')
explore_data(google_data, 1, 4, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows:  10841
Number of columns:  13


The Google Play data set contains more apps (10841) but fewer columns (13). The columns that seem interesting are `'App'`, `'Category'`, `'Reviews'`, `'Installs'`, `'Type'`, `'Price'`, and `'Genres'`.

## Deleting Wrong Data
---
Prior to starting our analysis, we have to ensure the data we analyze is accurate and relevant to our company, otherwise our results will be misleading. This means we have to:

* Recognize incorrect information, and correct or delete it.
* Recognize duplicates, and delete them.

First we will investigate the error reported in [this](https://www.kaggle.com/lava18/google-play-store-apps/discussion) discussion post on the Google Play data set.

In [5]:
print(google_header)
print("\n")
explore_data(google_data, 10472, 10473)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']




Here we see that this row is missing its `'Category'` value, so every later value is shifted one column to the left. As a result, the `'Rating'` column holds `'19'`, even though the highest possible rating is `5`.
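
A row that has lost a field ends up shorter than the header, so one way to catch this kind of problem without relying on a discussion post is to compare lengths. The sketch below is only an illustration: `find_malformed_rows` is a hypothetical helper, and the two sample rows are copied from the outputs shown earlier rather than read from the CSV file.

```python
# Hypothetical helper: return the index of every row whose length
# differs from the number of columns in the header.
def find_malformed_rows(header, rows):
    expected = len(header)
    return [i for i, row in enumerate(rows) if len(row) != expected]

# Small stand-in for google_header / google_data, copied from the printed rows.
sample_header = ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs',
                 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated',
                 'Current Ver', 'Android Ver']
sample_rows = [
    ['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+',
     'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018',
     '2.0.0', '4.0.3 and up'],
    ['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+',
     'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up'],
]

malformed = find_malformed_rows(sample_header, sample_rows)
print(malformed)  # indexes of rows with a missing or extra field
```

Running the same helper on `google_header` and `google_data` should flag the row discussed here, assuming it is the only one with a missing field.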

We will go ahead and remove this row and confirm that it has been removed.

In [6]:
explore_data(google_data, 10472, 10473)
del google_data[10472]
explore_data(google_data, 10472, 10473)

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']




[This](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/discussion) discussion section on the App Store data set doesn't seem to show any errors in the data that need to be removed.

## Removing Duplicate Entries
---
[This](https://www.kaggle.com/lava18/google-play-store-apps/discussion) discussion section on the Google Play data set states that a known duplicated app is Instagram. Let's check to see if this is true.

In [7]:
for app in google_data:
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


We can see that the Instagram app has 4 entries. Now that we have confirmed this duplication, let's check how many duplicate rows appear in the data set.

In [8]:
duplicate = []
unique = []

for app in google_data:
    name = app[0]
    
    if name in unique:
        duplicate.append(name)
    else:
        unique.append(name)

print('Number of duplicate rows: ' + str(len(duplicate)))
print('\n')
print('Examples of duplicated apps: ' + str(duplicate[:15]))

Number of duplicate rows: 1181


Examples of duplicated apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


Now we see that there are 1181 duplicate entries in our data set. We need to remove the duplicates, but we don't want to do it at random. Looking at the Instagram example, we can see that the only value that differs between the rows is the number of reviews. The row with the greatest number of reviews is presumably the most recent retrieval of the app's data, so that is the one we will keep.

Let's see how many apps we should expect to have after we are done removing the duplicates.

In [9]:
expected_length = str(len(google_data) - len(duplicate))
print('Expected length after duplication removal: ', expected_length)

Expected length after duplication removal:  9659


So we see that we should have 9659 entries left after removing the duplicates.

First we will create a dictionary to hold the name of each app paired with its highest review count.

In [10]:
reviews_max = {}

for app in google_data:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

print('Size of review_max :', len(reviews_max))

Size of review_max : 9659


We see that the dictionary has the same number of entries as our expected amount so we should be ready to remove the duplicates.

Next we will create a new data set that contains only the occurrence of each app with the highest number of reviews. We do this by checking each row against our dictionary to see if it matches the highest review count for that app. We also check our `already_added` list to make sure that if multiple duplicates have the same number of reviews, the app won't be added twice.

In [11]:
google_clean = []
already_added = []

for app in google_data:
    name = app[0]
    n_reviews = float(app[3])
    
    if n_reviews == reviews_max[name] and name not in already_added:
        google_clean.append(app)
        already_added.append(name)
    
print('Size of google_clean: ', len(google_clean))
print('\n')

for app in google_clean:
    name = app[0]
    if name == 'Instagram':
        print(app)

Size of google_clean:  9659


['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


We again matched our expected number of entries. We have also rerun our initial check and see that there is only one entry for Instagram, and it is the one with the highest number of reviews.
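
As an aside, the two passes above can be folded into one by storing the best row seen so far for each app name. This is a sketch of an alternative, not the approach used in this notebook; `dedupe_by_max_reviews` is a hypothetical name, the Instagram review counts are taken from the output above, and the last sample row is made up for illustration.

```python
# Hypothetical helper: keep, for each app name, the row with the most reviews.
def dedupe_by_max_reviews(rows, name_index=0, reviews_index=3):
    best = {}  # app name -> best row seen so far
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' keeps the first row when several rows tie on review count.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

sample = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
    ['Example App', 'TOOLS', '4.0', '10'],  # made-up row, not from the data set
]

deduped = dedupe_by_max_reviews(sample)
for row in deduped:
    print(row)
```

Because a dictionary lookup does not scan a list, this version avoids the repeated `name not in already_added` searches. The rows come out in order of each app's first appearance, which may differ from the order in `google_clean`.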

## Removing Non-English Apps
---

Since our company is aiming their apps at English speakers, we need to remove any apps in each data set that are not aimed at English speakers.

First, we will create a function called `isEnglish()` to check whether a string contains non-English characters, which we will take to be any character with a Unicode code point above 127 (that is, outside the ASCII range). We will use the built-in `ord()` function to get each character's code point. Since characters like the trademark symbol and emojis also fall outside of ASCII, we will allow up to 3 such characters so as not to lose viable data.
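
To make the 127 cutoff concrete, here is a quick look at the code points of a few characters that show up in app names in this project. The variable names are illustrative only.

```python
# Code points for a few characters that appear in app names in this project.
samples = ['a', 'Z', '™', '爱', '😜']
code_points = {ch: ord(ch) for ch in samples}

for ch, cp in code_points.items():
    label = 'ASCII' if cp <= 127 else 'non-ASCII'
    print(repr(ch), cp, label)
```

Plain English letters stay at or below 127, while the trademark symbol, the Chinese character, and the emoji all land well above it, which is why a small allowance for non-ASCII characters is useful.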

In [12]:
def isEnglish(string):
    count = 0
    for letter in string:
        if ord(letter) > 127:
            count += 1
    if count > 3:
        return False
    return True

# Testing
print(isEnglish('Instagram'))
print(isEnglish('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(isEnglish('Docs To Go™ Free Office Suite'))
print(isEnglish('Instachat 😜'))

True
False
True
True


Now we will use this function to remove the apps that are non-English from both data sets.

In [13]:
google_data_english = []
apple_data_english = []

for app in google_clean:
    name = app[0]
    if isEnglish(name):
        google_data_english.append(app)

for app in apple_data:
    name = app[1]
    if isEnglish(name):
        apple_data_english.append(app)

print('Total Google Play Apps: ', len(google_clean))
print('Total English Google Play Apps: ', len(google_data_english))
print('Total Non-English Google Apps Removed: ', len(google_clean) - len(google_data_english))
print('\n')
print('Total App Store Apps: ', len(apple_data))
print('Total English App Store Apps: ', len(apple_data_english))
print('Total Non-English App Store Apps Removed: ', len(apple_data) - len(apple_data_english))

Total Google Play Apps:  9659
Total English Google Play Apps:  9614
Total Non-English Google Apps Removed:  45


Total App Store Apps:  7197
Total English App Store Apps:  6183
Total Non-English App Store Apps Removed:  1014


So we removed 45 Non-English apps from the Google Play data set and 1014 from the App Store data set.

## Isolating the Free Apps
---

We also mentioned that our company specializes in free apps, so we need to isolate the free apps in each of these data sets.

We will do that by taking our English-only data sets and running them through one more loop that checks whether each app is free.
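
The two stores write a price of zero differently (`'0'` on Google Play and `'0.0'` on the App Store), so an exact string comparison needs a different literal for each store. A slightly more defensive check, sketched below, parses the price as a number instead. `is_free` is a hypothetical helper, and the leading `'$'` it strips is an assumption about how paid prices might be written.

```python
# Hypothetical helper: treat a price string as free when it parses to zero.
def is_free(price):
    # Strip whitespace and an optional leading '$' (an assumed format for paid prices).
    cleaned = price.strip().lstrip('$')
    return float(cleaned) == 0.0

for raw in ['0', '0.0', '$4.99', '2.99']:
    print(raw, '->', is_free(raw))
```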

In [14]:
google_data_final = [] 
apple_data_final = []

for app in google_data_english:
    price = app[7]
    if price == '0':
        google_data_final.append(app)

for app in apple_data_english:
    price = app[4]
    if price == '0.0':
        apple_data_final.append(app)

print('Total English Google Play Apps: ', len(google_data_english))
print('Total Free English Google Play Apps: ', len(google_data_final))
print('Total Paid Google Apps Removed: ', len(google_data_english) - len(google_data_final))
print('\n')
print('Total English App Store Apps: ', len(apple_data_english))
print('Total Free English App Store Apps: ', len(apple_data_final))
print('Total Paid App Store Apps Removed: ', len(apple_data_english) - len(apple_data_final))

Total English Google Play Apps:  9614
Total Free English Google Play Apps:  8864
Total Paid Google Apps Removed:  750


Total English App Store Apps:  6183
Total Free English App Store Apps:  3222
Total Paid App Store Apps Removed:  2961


We can see that we are down to 8864 Google Play apps and 3222 App Store apps that are free and directed at an English-speaking audience. Now that our data sets are finally cleaned, let's move on to the analysis.

## Most Common Apps by Genre
---

As we mentioned in the beginning, we want to produce an app for both Google Play and the App Store. To minimize the risks and costs, we have a validation strategy in place to guide us:

1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we develop it further.
3. If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

Since we want to make an app for both markets, we need to find a type of app that works on both Google Play and the App Store. For this we will build frequency tables for the `prime_genre` column of the App Store data set and for the `Genres` and `Category` columns of the Google Play data set.

We'll build two functions we can use to analyze the frequency tables:

* One function to generate frequency tables that show percentages
* Another function we can use to display the percentages in a descending order.

In [15]:
def freq_table(dataset, index):
    table = {}
    
    for row in dataset:
        key = row[index]
        if key in table:
            table[key] += 1
        else:
            table[key] = 1
    
    for key in table:
        value = table[key]
        total = len(dataset)
        percent = (value / total) * 100
        table[key] = percent
    
    return table

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
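
For comparison, roughly the same table can be sketched with `collections.Counter` from the standard library. The function names below are hypothetical and the sample rows are made up; this is not a replacement for the functions above, just a shorter way to express the same idea.

```python
from collections import Counter

# Hypothetical alternative to freq_table(): count values, then convert to percentages.
def freq_table_counter(dataset, index):
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return {key: count / total * 100 for key, count in counts.items()}

# Hypothetical alternative to display_table(): return (key, percent) pairs, largest first.
def sorted_percentages(dataset, index):
    table = freq_table_counter(dataset, index)
    return sorted(table.items(), key=lambda item: item[1], reverse=True)

# Made-up rows: the genre sits in column 1.
sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Games'], ['d', 'Education']]
for genre, pct in sorted_percentages(sample, 1):
    print(genre, ':', pct)
```

One small difference: `display_table()` sorts `(percent, key)` tuples, so ties are ordered by key, whereas this sketch leaves tied entries in their original order.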

### App Store `prime_genre` Analysis

In [16]:
display_table(apple_data_final, 11)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


We can see that the most common genre among the free English App Store apps is `Games`, followed by `Entertainment`. `Games` dominates this market with about 58% of the apps. Most of the apps are built for entertainment (`Games`, `Entertainment`, and `Photo & Video`), whereas practical apps (`Education`, `Shopping`, and `Utilities`) are less represented. While it may seem at first glance that a game is what our app profile should be, we can't assume that a genre has the most users just because it has the most apps.

### Google Play `Category` Analysis

In [17]:
display_table(google_data_final, 1)

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

We can see here that the most common category for free English Google Play apps is `FAMILY`, followed by `GAME`. Together these two account for roughly 29% of the apps. In comparison with the App Store, Google Play's market is more spread out among several categories, and more of the top categories serve practical purposes than in the App Store.

### Google Play `Genres` Analysis

In [18]:
display_table(google_data_final, 9)

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

The immediate difference in the Google Play `Genres` frequency table is that apps are allowed to have multiple genres, which shows up toward the bottom half of the list. This may muddy our results as we move forward, so we will use the `Category` table from now on. The number of practical genres represented here is consistent with what we saw in the `Category` table.
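
If the `Genres` column were needed later, one option would be to split multi-genre entries such as `'Art & Design;Pretend Play'` on the `';'` separator and count each genre on its own. The sketch below is not used in the rest of this analysis; `genre_counts` is a hypothetical helper and the two shortened rows are based on the sample rows printed earlier.

```python
from collections import Counter

# Hypothetical helper: count each genre separately when a row lists several.
def genre_counts(rows, genres_index=9):
    counts = Counter()
    for row in rows:
        for genre in row[genres_index].split(';'):
            counts[genre.strip()] += 1
    return counts

# Shortened rows for illustration: only the app name and its Genres value.
sample = [
    ['Coloring book moana', 'Art & Design;Pretend Play'],
    ['Sketch - Draw & Paint', 'Art & Design'],
]
counts = genre_counts(sample, genres_index=1)
print(counts)
```

Note that an app with two genres is counted twice here, so percentages built from these counts would add up to more than 100.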

While we are unable to present a recommended profile for the two markets yet, we can say that the App Store is dominated by more fun apps while the Google Play Store has a more diverse offering.

## Most Popular Apps by Genre on the App Store
---

The previous frequency tables showed us which genres dominate the two markets. Now we want to find out which genres are the most popular with users. For the Google Play data we can use the `Installs` column, but the App Store data has no equivalent. We will use `rating_count_tot` as a proxy.

Now we will calculate the average number of user ratings for each genre in the App Store.

In [19]:
prime_genre_freq_table = freq_table(apple_data_final, 11)

for genre in prime_genre_freq_table:
    total = 0
    len_genre = 0
    
    for row in apple_data_final:
        genre_app = row[11]
        if genre_app == genre:
            tot_user_ratings = float(row[5])
            total += tot_user_ratings
            len_genre += 1
    
    avg_tot_user_ratings = total / len_genre
    
    print(genre, ': ', avg_tot_user_ratings)          

News :  21248.023255813954
Health & Fitness :  23298.015384615384
Education :  7003.983050847458
Medical :  612.0
Book :  39758.5
Photo & Video :  28441.54375
Productivity :  21028.410714285714
Navigation :  86090.33333333333
Business :  7491.117647058823
Utilities :  18684.456790123455
Food & Drink :  33333.92307692308
Entertainment :  14029.830708661417
Music :  57326.530303030304
Travel :  28243.8
Reference :  74942.11111111111
Catalogs :  4004.0
Weather :  52279.892857142855
Shopping :  26919.690476190477
Lifestyle :  16485.764705882353
Social Networking :  71548.34905660378
Games :  22788.6696905016
Finance :  31467.944444444445
Sports :  23008.898550724636


We can see above that Navigation has the highest average number of ratings, but Google Maps is a very popular navigation app, so let's see how heavily this average is weighted by a few popular apps.

In [20]:
for app in apple_data_final:
    if app[11] == 'Navigation':
        print(app[1], ': ', app[5])

Waze - GPS Navigation, Maps & Real-time Traffic :  345046
Google Maps - Navigation & Transit :  154911
Geocaching® :  12811
CoPilot GPS – Car Navigation & Offline Maps :  3582
ImmobilienScout24: Real Estate Search in Germany :  187
Railway Route Search :  5
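
One way to quantify how much the largest apps pull the Navigation average upward is to compare the mean with the median. The sketch below uses only the six rating counts printed above and the standard-library `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

# Rating counts for the six Navigation apps listed above.
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

nav_mean = mean(navigation_ratings)
nav_median = median(navigation_ratings)

# Share of all Navigation ratings held by the two largest apps.
top_two_share = sum(sorted(navigation_ratings, reverse=True)[:2]) / sum(navigation_ratings)

print('Mean:', nav_mean)
print('Median:', nav_median)
print('Share of ratings held by the top two apps:', top_two_share)
```

The mean matches the Navigation average computed earlier, while the median is far lower, which suggests the average says more about the two largest apps than about a typical navigation app.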


We can see that the two dominant apps on this list are Waze and Google Maps. With so few apps holding nearly all of the ratings, the navigation market would not provide much opportunity for advertising and would be difficult to break into. Let's investigate our next option: Reference.

In [21]:
for app in apple_data_final:
    if app[11] == 'Reference':
        print(app[1], ': ', app[5])

Bible :  985920
Dictionary.com Dictionary & Thesaurus :  200047
Dictionary.com Dictionary & Thesaurus for iPad :  54175
Google Translate :  26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran :  18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition :  17588
Merriam-Webster Dictionary :  16849
Night Sky :  12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) :  8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools :  4693
GUNS MODS for Minecraft PC Edition - Mods Tools :  1497
Guides for Pokémon GO - Pokemon GO News and Cheats :  826
WWDC :  762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free :  718
VPN Express :  14
Real Bike Traffic Rider Virtual Reality Glasses :  8
教えて!goo :  0
Jishokun-Japanese English Dictionary & Translator :  0


We can see that the Bible app and the Dictionary.com app dominate this section of the market. If we created a similar app and analyzed further to see what features these competitors lack, there might be some room to work here. Even better, we could take a popular book or subject and create our own unique reference for it so as to corner that market. Lastly, we will check the Social Networking option.

In [22]:
for app in apple_data_final:
    if app[11] == 'Social Networking':
        print(app[1], ': ', app[5])

Facebook :  2974676
Pinterest :  1061624
Skype for iPhone :  373519
Messenger :  351466
Tumblr :  334293
WhatsApp Messenger :  287589
Kik :  260965
ooVoo – Free Video Call, Text and Voice :  177501
TextNow - Unlimited Text + Calls :  164963
Viber Messenger – Text & Call :  164249
Followers - Social Analytics For Instagram :  112778
MeetMe - Chat and Meet New People :  97072
We Heart It - Fashion, wallpapers, quotes, tattoos :  90414
InsTrack for Instagram - Analytics Plus More :  85535
Tango - Free Video Call, Voice and Chat :  75412
LinkedIn :  71856
Match™ - #1 Dating App. :  60659
Skype for iPad :  60163
POF - Best Dating App for Conversations :  52642
Timehop :  49510
Find My Family, Friends & iPhone - Life360 Locator :  43877
Whisper - Share, Express, Meet :  39819
Hangouts :  36404
LINE PLAY - Your Avatar World :  34677
WeChat :  34584
Badoo - Meet New People, Chat, Socialize. :  34428
Followers + for Instagram - Follower Analytics :  28633
GroupMe :  28260
Marco Polo Video Walki

Finally, we see here that Social Networking shows a wealth of ratings. It boasts some of the highest numbers of the 3 genres that we have looked at in the App Store. If we were able to create an app that worked in the Social Networking genre, we might have a chance at higher profits.

Out of what we have looked at, I would recommend that the company look into making a Social Networking app. This market gets a lot of attention from users, and we don't have to worry so much about creating content, since our users would create most of it for us. As other apps have already done, we can use strategic advertising based on what our users are interested in. There would be less guesswork because users reveal their interests through the content they create.

Next, we will analyze the Google Play market.

## Most Popular Apps by Genre on Google Play
---

Let's take a quick look at the frequency of each `Installs` value for the Google Play apps.

In [23]:
display_table(google_data_final, 5)

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


The first notable thing is that the install numbers lack precision. Luckily for us, we don't need very precise numbers for our analysis, since we are looking at the averages across the categories. We will also need to clean up those values by removing the `'+'` and the commas before converting them to numbers.
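
As a quick illustration of that cleanup on single values, the sketch below wraps the two `replace()` calls and the `float()` conversion in a small helper. `parse_installs` is a hypothetical name, and the inputs are values taken from the table above.

```python
# Hypothetical helper: turn an Installs string such as '1,000,000+' into a float.
def parse_installs(installs):
    cleaned = installs.replace('+', '').replace(',', '')
    return float(cleaned)

for raw in ['1,000,000+', '500+', '0']:
    print(raw, '->', parse_installs(raw))
```

Each result is a lower bound: an app listed as `'1,000,000+'` is treated as having exactly one million installs.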

In [24]:
category_freq_table = freq_table(google_data_final, 1)

for category in category_freq_table:
    total = 0
    len_category = 0
    
    for row in google_data_final:
        category_app = row[1]
        if category_app == category:
            installs = row[5]
            installs = installs.replace('+', '')
            installs = installs.replace(',', '')
            installs = float(installs)
            total += installs
            len_category += 1
    
    avg_installs = total / len_category
    print(category, ': ', avg_installs)

BOOKS_AND_REFERENCE :  8767811.894736841
HEALTH_AND_FITNESS :  4188821.9853479853
COMMUNICATION :  38456119.167247385
WEATHER :  5074486.197183099
COMICS :  817657.2727272727
PHOTOGRAPHY :  17840110.40229885
FOOD_AND_DRINK :  1924897.7363636363
DATING :  854028.8303030303
MAPS_AND_NAVIGATION :  4056941.7741935486
FAMILY :  3695641.8198090694
MEDICAL :  120550.61980830671
SOCIAL :  23253652.127118643
SPORTS :  3638640.1428571427
EDUCATION :  1833495.145631068
TRAVEL_AND_LOCAL :  13984077.710144928
ART_AND_DESIGN :  1986335.0877192982
FINANCE :  1387692.475609756
EVENTS :  253542.22222222222
PRODUCTIVITY :  16787331.344927534
SHOPPING :  7036877.311557789
AUTO_AND_VEHICLES :  647317.8170731707
BUSINESS :  1712290.1474201474
TOOLS :  10801391.298666667
BEAUTY :  513151.88679245283
HOUSE_AND_HOME :  1331540.5616438356
ENTERTAINMENT :  11640705.88235294
NEWS_AND_MAGAZINES :  9549178.467741935
PERSONALIZATION :  5201482.6122448975
PARENTING :  542603.6206896552
LIBRARIES_AND_DEMO :  638503.7

The most popular category is Communication, but we have seen that a social app will work well in the App Store. Social ranks 3rd on the list, so let's dive into it further to see if there is potential.

In [25]:
for row in google_data_final:
    if row[1] == 'SOCIAL':
        print(row[0], ': ', row[5])

Facebook :  1,000,000,000+
Facebook Lite :  500,000,000+
Tumblr :  100,000,000+
Social network all in one 2018 :  100,000+
Pinterest :  100,000,000+
TextNow - free text + calls :  10,000,000+
Google+ :  1,000,000,000+
The Messenger App :  1,000,000+
Messenger Pro :  1,000,000+
Free Messages, Video, Chat,Text for Messenger Plus :  1,000,000+
Telegram X :  5,000,000+
The Video Messenger App :  100,000+
Jodel - The Hyperlocal App :  1,000,000+
Hide Something - Photo, Video :  5,000,000+
Love Sticker :  1,000,000+
Web Browser & Fast Explorer :  5,000,000+
LiveMe - Video chat, new friends, and make money :  10,000,000+
VidStatus app - Status Videos & Status Downloader :  5,000,000+
Love Images :  1,000,000+
Web Browser ( Fast & Secure Web Explorer) :  500,000+
SPARK - Live random video chat & meet new people :  5,000,000+
Golden telegram :  50,000+
Facebook Local :  1,000,000+
Meet – Talk to Strangers Using Random Video Chat :  5,000,000+
MobilePatrol Public Safety App :  1,000,000+
💘 Whats

We can see that there is a strong market here, with the top apps bringing in over 1 billion downloads. While we shouldn't expect this kind of success, with a little innovation the social market could be the best entrance into both the App Store and Google Play markets.

## Conclusion
---

In this project, I set out to recommend an app profile for a company that makes free apps for English speakers and plans to profit from in-app advertising. We collected data from the two largest app stores, the App Store and Google Play. We cleaned the data to remove errors and irrelevant entries before analyzing it. We found that a popular section of both markets is the social market. With this information, we came up with the recommendation to create a social app with some innovation to separate it from the crowd.