# Profitable App Profiles for the App Store and Google Play Markets

The goal of this project is to help developers get an idea of which types of apps are likely to attract more users on both Google Play and the App Store.

To do this we are going to collect and analyze data about mobile apps available on Google Play and the App Store.

## Opening and Exploring the Data

As of September 2018, there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play.

Rather than analyzing all of these apps, we'll analyze a sample of this data. Luckily, there are two data sets that seem suitable for our purpose:

* [googleplaystore.csv](https://www.kaggle.com/lava18/google-play-store-apps/home) data set containing data about approximately ten thousand Android apps from Google Play

* [AppleStore.csv](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home) data set containing data about approximately seven thousand iOS apps from the App Store

Let's start by opening the two data sets and then continue with exploring the data.

In [1]:
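from csv import reader

# Read each CSV file into a list of rows; each row is a list of strings
with open('googleplaystore.csv', encoding='utf8', newline='') as f:
    googlePlay_dataset = list(reader(f))
with open('AppleStore.csv', encoding='utf8', newline='') as f:
    appStore_dataset = list(reader(f))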

To make it easier to explore the two data sets, we'll first write a function named `explore_data()` that we can use repeatedly to explore rows in a more readable way. We'll also add an option for our function to show the number of rows and columns for any data set.

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

### Exploring the Google Play data set

In [3]:
explore_data(googlePlay_dataset, 0, 5, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 10842
Number of columns: 13


We see that the Google Play data set has 10841 apps and 13 columns. At a quick glance, the columns that might be useful to our analysis are 'App', 'Category', 'Reviews', 'Installs', 'Type', 'Price', and 'Genres'.

### Exploring the App Store data set

In [4]:
explore_data(appStore_dataset, 0, 5, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


Number of rows: 7198
Number of columns: 16


We have 7,197 iOS apps in this data set, and the columns that seem interesting are 'track_name', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', and 'prime_genre'. Not all column names are self-explanatory, but details about each column can be found in the [data set documentation](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home).
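Later cells access columns by position (for example, `app[-5]` for `prime_genre` and `app[5]` for `rating_count_tot`). A minimal sketch, assuming the first row of each data set is the header printed above, builds a name-to-index lookup so columns can be referenced by name instead. The helper `column_index` is hypothetical, not part of the notebook.

```python
def column_index(header):
    """Map each column name in a header row to its position."""
    return {name: i for i, name in enumerate(header)}

# The App Store header and first data row, copied from the output above
ios_header = ['id', 'track_name', 'size_bytes', 'currency', 'price',
              'rating_count_tot', 'rating_count_ver', 'user_rating',
              'user_rating_ver', 'ver', 'cont_rating', 'prime_genre',
              'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
facebook = ['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676',
            '212', '3.5', '3.5', '95.0', '4+', 'Social Networking',
            '37', '1', '29', '1']

ios_cols = column_index(ios_header)
print(ios_cols['prime_genre'])            # 11, the same column as index -5
print(facebook[ios_cols['prime_genre']])  # Social Networking
```

With 16 columns, index 11 and index -5 point at the same field, so the name-based lookup is interchangeable with the positional one while being easier to read.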

## Deleting Wrong Data

The Google Play data set has a dedicated [discussion section](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015), and [one of the discussions](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015) outlines an error for row 10472. Let's print this row and compare it against the header and some other rows that are correct.

In [5]:
print(googlePlay_dataset[0])
print('\n')
explore_data(googlePlay_dataset, 10472, 10475)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Xposed Wi-Fi-Pwd', 'PERSONALIZATION', '3.5', '1042', '404k', '100,000+', 'Free', '0', 'Everyone', 'Personalization', 'August 5, 2014', '3.0.0', '4.0.3 and up']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']




Row 10473 (row 10472 if we exclude the header) is missing its 'Category' value, so every subsequent value is shifted one column to the left (for example, the rating 1.9 sits in the 'Category' column).
We'll delete this row.

In [6]:
print(len(googlePlay_dataset))
# Run this cell only once: re-running it would delete a valid row
del googlePlay_dataset[10473]
print(len(googlePlay_dataset))

10842
10841
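Deleting by a hard-coded index works, but only if the cell runs exactly once. A sketch of a more defensive check, assuming that any row whose length differs from the header's is malformed (true for row 10473, which has 12 values instead of 13), finds such rows by length instead. `find_malformed_rows` and the three-column sample are hypothetical.

```python
def find_malformed_rows(dataset):
    """Return (index, row) pairs whose length differs from the header row's."""
    expected = len(dataset[0])
    return [(i, row) for i, row in enumerate(dataset) if len(row) != expected]

# Hypothetical three-column sample; the last row is missing its 'Category' value
sample = [
    ['App', 'Category', 'Rating'],
    ['Slack', 'BUSINESS', '4.4'],
    ['Life Made WI-Fi Touchscreen Photo Frame', '1.9'],
]
bad = find_malformed_rows(sample)
print(bad)  # [(2, ['Life Made WI-Fi Touchscreen Photo Frame', '1.9'])]

# Delete from the highest index down so earlier deletions don't shift later indices
for i, _ in reversed(bad):
    del sample[i]
print(len(sample))  # 2
```

Running this a second time finds nothing to delete, which is what makes it safer to re-run than `del googlePlay_dataset[10473]`.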


## Removing Duplicate Entries

Now we want to check whether there are any duplicate entries in these data sets. Let's loop through the Google Play data set and collect any non-unique app names.

In [7]:
duplicate_apps_android = []
unique_apps_android = []

for app in googlePlay_dataset:
    name = app[0]
    if name in unique_apps_android:
        duplicate_apps_android.append(name)
    else:
        unique_apps_android.append(name)

In total, there are 1,181 cases where an app occurs more than once:

In [8]:
print('Number of duplicate apps:', len(duplicate_apps_android))

Number of duplicate apps: 1181
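The loop above checks membership in a list, which gets slower as the list grows. An equivalent count can be sketched with `collections.Counter`. The helper `count_duplicates` and the sample names are illustrative; on the real data the input would be `[app[0] for app in googlePlay_dataset]`.

```python
from collections import Counter

def count_duplicates(names):
    """Return (number of extra occurrences, names that occur more than once)."""
    counts = Counter(names)
    # Every occurrence after the first counts as a duplicate, as in the loop above
    extra = sum(c - 1 for c in counts.values() if c > 1)
    repeated = [name for name, c in counts.items() if c > 1]
    return extra, repeated

names = ['Slack', 'Box', 'Slack', 'Box', 'Box', 'Zenefits']
extra, repeated = count_duplicates(names)
print(extra)     # 3
print(repeated)  # ['Slack', 'Box']
```

`extra` measures the same thing as `len(duplicate_apps_android)`, while `repeated` lists each duplicated name only once.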


Some of their names:

In [9]:
print('Examples of duplicate apps:\n', duplicate_apps_android[:15])

Examples of duplicate apps:
 ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


We don't want to count certain apps more than once when we analyze the data, so we need to remove the duplicate entries and keep only one entry per app.
We don't want to do this randomly, though; we need a criterion for choosing which row to keep. Let's see how rows sharing the same name differ. For example, the printed examples show that one of the duplicated apps is *Slack*, so let's check it.

In [10]:
print(googlePlay_dataset[0], '\n')
for app in googlePlay_dataset:
    name = app[0]
    if name == 'Slack':
        print(app)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'] 

['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51510', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']


We can see that the *Slack* entries differ only in the 'Reviews' column. A higher number of reviews most likely means more recent data (and a more reliable rating). So rather than removing duplicates randomly, we'll keep only the row with the highest number of reviews for each app and remove the other entries.

To do that, we will:

* Create a dictionary where each key is a unique app name, and the value is the highest number of reviews of that app
* Use the dictionary to create a new data set with only one entry per app (for each app, the entry with the highest number of reviews)

In [11]:
reviews_max = {}

for app in googlePlay_dataset[1:]:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    if name not in reviews_max:
        reviews_max[name] = n_reviews

Let's inspect the length of the dictionary to make sure it was created as expected. As we found before, there are 1,181 duplicate entries, so the dictionary should have 1,181 fewer keys than there are rows (excluding the header).

In [12]:
print('Expected length: ', len(googlePlay_dataset[1:]) - 1181)
print('Actual length: ', len(reviews_max))

Expected length:  9659
Actual length:  9659


Now let's use the `reviews_max` dictionary to remove the duplicates. For each duplicated app, we'll keep only the entry with the highest number of reviews. In the code cell below:

* We start by initializing two empty lists, `android_clean` and `already_added`.
* We loop through `googlePlay_dataset` (skipping the header), and for every iteration:
    - We isolate the name of the app and the number of reviews.
    - We add the current row (`app`) to the `android_clean` list, and the app name (`name`) to the `already_added` list, if:
        - The number of reviews of the current app matches the number of reviews of that app as recorded in the `reviews_max` dictionary; and
        - The name of the app is not already in the `already_added` list. We need this supplementary condition for cases where the highest number of reviews of a duplicate app is shared by more than one entry (for example, the Box app has three entries with the same number of reviews). If we only checked `reviews_max[name] == n_reviews`, we would still end up with duplicate entries for some apps.

In [13]:
android_clean = [] #To store new cleaned dataset
already_added = [] #To store app names of already added apps

for app in googlePlay_dataset[1:]:
    name = app[0]
    n_reviews = float(app[3])
    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(app)
        already_added.append(name)

To ensure everything went as expected:

In [14]:
explore_data(android_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


Our cleaned data set has 9,659 rows, as expected.
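The two passes (building `reviews_max`, then filtering with `already_added`) can also be folded into a single pass over a dictionary. This is a sketch, not the notebook's code: `dedupe_by_max_reviews` is hypothetical, the Slack review counts come from the output above, and the 'Example App' rows are made up.

```python
def dedupe_by_max_reviews(rows, name_idx=0, reviews_idx=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}  # app name -> row with the most reviews seen so far
    for row in rows:
        name = row[name_idx]
        n_reviews = float(row[reviews_idx])
        # Strict '>' keeps the first row when several rows share the maximum
        if name not in best or n_reviews > float(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())

rows = [
    ['Slack', 'BUSINESS', '4.4', '51507'],
    ['Example App', 'TOOLS', '4.0', '100'],
    ['Slack', 'BUSINESS', '4.4', '51510'],
    ['Example App', 'TOOLS', '4.0', '100'],
]
clean = dedupe_by_max_reviews(rows)
print([(r[0], r[3]) for r in clean])  # [('Slack', '51510'), ('Example App', '100')]
```

The kept rows are the same as those the notebook keeps, but they are ordered by each app's first appearance rather than by the position of the kept row. If the original two-pass approach is preferred, turning `already_added` into a set would make the `name not in already_added` check constant-time.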

## Removing Non-English Apps

While exploring the data sets, we noticed that the names of some apps suggest they are not directed toward an English-speaking audience. Below are a couple of examples from both data sets:

In [15]:
print(appStore_dataset[814][1])
print(appStore_dataset[6732][1])

print(android_clean[4412][0])
print(android_clean[7940][0])

爱奇艺PPS -《欢乐颂2》电视剧热播
【脱出ゲーム】絶対に最後までプレイしないで 〜謎解き＆ブロックパズル〜
中国語 AQリスニング
لعبة تقدر تربح DZ


We're not interested in keeping these apps, so we'll remove them. One way to do this is to remove each app whose name contains a symbol that is not commonly used in English text. English text usually includes letters from the English alphabet, numbers composed of the digits 0 to 9, punctuation marks (., !, ?, ;, etc.), and other symbols (+, \*, /, etc.), all of which have ASCII codes in the range 0 to 127.

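Below, we build a function that checks each character of an app name with the built-in `ord()` function, which returns the character's code point (equal to its ASCII code for ASCII characters). If any character's code is above 127, the name is treated as non-English.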

In [16]:
def is_english(string):
    # Any character with a code point above 127 is outside the ASCII range
    for c in string:
        if ord(c) > 127:
            return False
    return True

Let's check how the function works on a few examples.

In [17]:
print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

True
False
False
False


The function seems to work fine, but some English app names use emojis or other symbols (™, etc.) that fall outside of the ASCII range. Because of this, we'll remove useful apps if we use the function in its current form.
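A quick check makes the problem concrete: `ord()` returns a character's Unicode code point, and only the first 128 code points coincide with ASCII, so symbols like ™ and emojis land far above 127.

```python
# Code points of a few characters taken from the app names above
for ch in ['I', '™', '😜']:
    print(ch, ord(ch))
# I 73
# ™ 8482
# 😜 128540
```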

To minimize the impact of data loss, we'll only remove an app if its name has more than three non-ASCII characters:

In [18]:
def is_english(string):
    counter = 0  # number of non-ASCII characters seen so far
    for c in string:
        if ord(c) > 127:
            counter += 1
        if counter > 3:
            return False
    return True

Let's try the new function to check whether these given app names are detected as English or non-English:

In [19]:
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))

True
True
False


The function is still not perfect: a few non-English apps might get past the filter, and English names with more than three non-ASCII characters will be removed. It seems good enough at this point in our analysis, though.
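The threshold of three characters is a judgment call. A sketch that exposes it as a parameter makes it easy to see how the filter changes for other values; `is_mostly_ascii` and its `max_non_ascii` parameter are hypothetical names, and with the default of 3 it behaves like the `is_english()` function above.

```python
def is_mostly_ascii(name, max_non_ascii=3):
    """Treat a name as English unless it has more than max_non_ascii
    characters outside the ASCII range (code points above 127)."""
    non_ascii = sum(1 for c in name if ord(c) > 127)
    return non_ascii <= max_non_ascii

print(is_mostly_ascii('Docs To Go™ Free Office Suite'))   # True
print(is_mostly_ascii('Instachat 😜'))                     # True
print(is_mostly_ascii('爱奇艺PPS -《欢乐颂2》电视剧热播'))  # False
print(is_mostly_ascii('Instachat 😜', max_non_ascii=0))   # False
```

Unlike the loop above, this version counts every character instead of returning early, which costs a little speed but keeps the logic in one expression.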

Below, we use the `is_english()` function to filter out the non-English apps for both data sets:

In [20]:
android_english = []
ios_english = []

for app in android_clean:
    if is_english(app[0]):
        android_english.append(app)
        
for app in appStore_dataset[1:]:
    if is_english(app[1]):
        ios_english.append(app)

In [21]:
explore_data(android_english, 0, 3, True)
explore_data(ios_english, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 6183
Number of columns: 16

We're left with 9,614 Android apps and 6,183 iOS apps.

## Isolating the free apps

We only want to build apps that are free to download, and our main source of revenue will consist of in-app ads. Our data sets contain both free and non-free apps, so we need to isolate only the free apps for our analysis. Below, we do this for both data sets.

In [22]:
android_final = []
ios_final = []

for app in android_english:
    if app[7] == '0':
        android_final.append(app)
        
for app in ios_english:
    if app[4] == '0.0':
        ios_final.append(app)
        
print(len(android_final))
print(len(ios_final))

8864
3222


8,864 Android apps and 3,222 iOS apps are left for analysis.
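The filter above compares raw strings, so it relies on each store writing a free price exactly as '0' (Google Play) or '0.0' (App Store). A sketch that parses the price instead, assuming paid Google Play prices are written with a leading dollar sign such as '$4.99', treats both formats the same way. `parse_price` and `is_free` are hypothetical helpers.

```python
def parse_price(price):
    """Convert a price string such as '0', '0.0' or '$4.99' to a float.
    Assumes the only non-numeric character is an optional leading '$'."""
    return float(price.replace('$', ''))

def is_free(price):
    return parse_price(price) == 0.0

print(is_free('0'))      # True  (Google Play format)
print(is_free('0.0'))    # True  (App Store format)
print(is_free('$4.99'))  # False
```

With these helpers, the two loops could use `is_free(app[7])` and `is_free(app[4])` respectively; `float()` raises a `ValueError` on any unexpected format, which surfaces bad data instead of silently skipping it.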

## Most Common Apps by Genre

Our goal is to determine the kinds of apps that are likely to attract more users. Because we plan to add the app on both Google Play and the App Store, we need to find app profiles that are successful on both markets.
Let's begin the analysis by getting a sense of the most common genres in each market. For this, we'll build frequency tables for a few columns in our data sets.

We'll build two functions we can use to analyze the frequency tables:

* One function to generate frequency tables that show percentages
* Another function that we can use to display the percentages in a descending order

In [23]:
def freq_table(dataset, index):
    f_table = {}
    total = 0
    for row in dataset:
        total += 1
        if row[index] in f_table:
            f_table[row[index]] += 1
        else:
            f_table[row[index]] = 1
    percentage_table = {}
    for key in f_table:
        percentage_table[key] = f_table[key] / total * 100
    return percentage_table

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

The columns of interest are `prime_genre` in the App Store data set, and `Genres` and `Category` in the Google Play data set.
Next, let's display the frequency tables for these columns with the `display_table()` function.

* Frequency table for the prime_genre column of the App Store data set:

In [24]:
display_table(ios_final, -5)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


* Frequency table for the Category column of the Google Play data set:

In [25]:
display_table(android_final, 1)

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 0.6430505415162455
COMICS : 0.6204873646209386
BEAUTY : 0.5979241877256317

* Frequency table for the Genres column of the Google Play data set:

In [26]:
display_table(android_final, 9)

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

## Most Popular Apps by Genre on the App Store

- Games is the most common genre among free English apps on the App Store, with a 58.16% share.
- The runner-up genre is Entertainment, with only 7.88%. This genre is close to Games, and both generally serve the same purpose: entertainment.

Let's compare the frequencies of two major groups: apps designed for practical purposes (education, shopping, utilities, productivity, lifestyle) and apps designed mostly for entertainment (games, photo and video, social networking, sports, music).

In [27]:
ios_genres = freq_table(ios_final, -5)

ios_entertainment_frequency = ios_genres['Games'] + \
    ios_genres['Photo & Video'] + ios_genres['Social Networking'] + \
    ios_genres['Sports'] + ios_genres['Music']
    
ios_practical_frequency = ios_genres['Education'] + \
    ios_genres['Shopping'] + ios_genres['Utilities'] + \
    ios_genres['Productivity'] + ios_genres['Lifestyle']  
    
print('Entertainment :', ios_entertainment_frequency)
print('Practical :', ios_practical_frequency)

Entertainment : 70.60831781502173
Practical : 12.104283054003725


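Based on genre frequency, entertainment apps dominate the App Store (about 70.6% versus 12.1% for practical apps), so for this market we could recommend building an app with an entertainment profile rather than a practical one.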
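The group totals above are sums of individual genre shares. A small sketch generalizes the cell into a helper that takes any list of genres and uses `dict.get()`, so a genre missing from a filtered data set counts as 0 instead of raising a `KeyError`. `group_share` is hypothetical, and the shares below are the App Store percentages from the frequency table above, rounded to two decimals.

```python
def group_share(table, genres):
    """Sum the percentage shares of the given genres; missing genres count as 0."""
    return sum(table.get(genre, 0) for genre in genres)

entertainment = ['Games', 'Photo & Video', 'Social Networking', 'Sports', 'Music']
practical = ['Education', 'Shopping', 'Utilities', 'Productivity', 'Lifestyle']

# Rounded shares from the App Store frequency table above
shares = {'Games': 58.16, 'Photo & Video': 4.97, 'Social Networking': 3.29,
          'Sports': 2.14, 'Music': 2.05, 'Education': 3.66, 'Shopping': 2.61,
          'Utilities': 2.51, 'Productivity': 1.74, 'Lifestyle': 1.58}
print(round(group_share(shares, entertainment), 2))  # 70.61
print(round(group_share(shares, practical), 2))      # 12.1
```

In the notebook, the same call would be `group_share(ios_genres, entertainment)`.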

## Most Popular Apps by Category and Genre on the Google Play

On Google Play, unlike the App Store, the most common genre is Tools (8.45%), followed by Entertainment (6.07%), Education (5.35%), Business (4.59%), and Productivity (3.89%).
This suggests that on Google Play the practical profile prevails, or is at least on par with the entertainment profile.
The frequency tables reveal the most common app genres, but to discover which genres have the most users (and make a more precise recommendation), we need further analysis. We'll start with the App Store: since that data set has no install counts, we'll use the total number of user ratings (`rating_count_tot`, column index 5) as a proxy for the number of users and calculate its average for each genre.

In [28]:
ios_primeGenre_freqTable = freq_table(ios_final, -5)

for genre in ios_primeGenre_freqTable:
    total = 0
    len_genre = 0
    for app in ios_final:
        genre_app = app[-5]
        if genre_app == genre:
            user_ratings_number = float(app[5])
            total += user_ratings_number
            len_genre += 1
    average_user_ratings_number = int(total / len_genre)
    print(genre, ' : ', average_user_ratings_number)

Weather  :  52279
Entertainment  :  14029
Music  :  57326
Navigation  :  86090
Sports  :  23008
Finance  :  31467
Games  :  22788
Social Networking  :  71548
Reference  :  74942
Productivity  :  21028
Utilities  :  18684
Catalogs  :  4004
News  :  21248
Book  :  39758
Business  :  7491
Education  :  7003
Health & Fitness  :  23298
Lifestyle  :  16485
Medical  :  612
Photo & Video  :  28441
Food & Drink  :  33333
Travel  :  28243
Shopping  :  26919


The results above show that, although the majority of apps are games, games are not as popular (22,788 user ratings for the average game) as some other types of apps. The genres with the highest average number of user ratings are Navigation (86,090), Reference (74,942), Social Networking (71,548), Music (57,326), and Weather (52,279). Apps in these genres can generally be expected to have the largest numbers of users. Keep in mind, though, that an average can be inflated by a handful of very popular apps: Facebook alone has almost 3 million ratings in Social Networking.
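The cell above rescans all of `ios_final` once per genre. A single pass that accumulates totals and counts gives the same averages (before the `int()` truncation). `average_by_group` is a hypothetical helper; for the notebook's data the call would be `average_by_group(ios_final, -5, 5)`. In the toy rows below, the Facebook and Clash of Clans rating counts come from the output earlier in this notebook, while the 'Small App' row is made up.

```python
from collections import defaultdict

def average_by_group(dataset, group_idx, value_idx):
    """Average of float(row[value_idx]) for each distinct row[group_idx],
    computed in a single pass instead of one pass per group."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        key = row[group_idx]
        totals[key] += float(row[value_idx])
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}

rows = [['Facebook', 'Social Networking', '2974676'],
        ['Small App', 'Social Networking', '24'],
        ['Clash of Clans', 'Games', '2130805']]
print(average_by_group(rows, 1, 2))
# {'Social Networking': 1487350.0, 'Games': 2130805.0}
```

The toy output also shows the caveat mentioned above: one very popular app dominates its genre's average. Computing `statistics.median` per group would be less sensitive to such outliers.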

## Calculating the average number of installs per app genre for the Google Play data set

We'll start by generating a frequency table for the Category column of the Google Play data set to get the unique app categories:

In [29]:
android_Category_freqTable = freq_table(android_final, 1)
print(android_Category_freqTable)

{'PERSONALIZATION': 3.3167870036101084, 'SOCIAL': 2.6624548736462095, 'HOUSE_AND_HOME': 0.8235559566787004, 'PHOTOGRAPHY': 2.944494584837545, 'PRODUCTIVITY': 3.892148014440433, 'MEDICAL': 3.531137184115524, 'COMMUNICATION': 3.2378158844765346, 'LIBRARIES_AND_DEMO': 0.9363718411552346, 'BOOKS_AND_REFERENCE': 2.1435018050541514, 'AUTO_AND_VEHICLES': 0.9250902527075812, 'BUSINESS': 4.591606498194946, 'DATING': 1.861462093862816, 'EDUCATION': 1.1620036101083033, 'ART_AND_DESIGN': 0.6430505415162455, 'ENTERTAINMENT': 0.9589350180505415, 'BEAUTY': 0.5979241877256317, 'LIFESTYLE': 3.9034296028880866, 'COMICS': 0.6204873646209386, 'GAME': 9.724729241877256, 'EVENTS': 0.7107400722021661, 'FOOD_AND_DRINK': 1.2409747292418771, 'NEWS_AND_MAGAZINES': 2.7978339350180503, 'TRAVEL_AND_LOCAL': 2.33528880866426, 'HEALTH_AND_FITNESS': 3.0798736462093865, 'VIDEO_PLAYERS': 1.7937725631768955, 'PARENTING': 0.6543321299638989, 'TOOLS': 8.461191335740072, 'FAMILY': 18.907942238267147, 'FINANCE': 3.7003610108303246, 'SPORTS': 3.395758122743682, 'SHOPPING': 2.2450361010830324, 'WEATHER': 0.8009927797833934, 'MAPS_AND_NAVIGATION': 1.3989169675090252}

Above, we came up with an app profile recommendation for the App Store based on the number of user ratings. For Google Play we have data on the number of installs, so we should be able to get a clearer picture of genre popularity. The install numbers aren't very precise, though: most values are open-ended (100+, 1,000+, 5,000+, etc.). Still, we don't need perfect precision for our purposes; we only want to find out which app genres attract the most users.

We're going to leave the numbers as they are, which means that we'll consider that an app with 100,000+ installs has 100,000 installs, and an app with 1,000,000+ installs has 1,000,000 installs, and so on.
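The cleaning step that the next cell performs inline can be isolated into a small helper, which makes the "lower bound" interpretation explicit and easy to test. `parse_installs` is a hypothetical name; it does the same two `replace()` calls and `float()` conversion as the loop below.

```python
def parse_installs(installs):
    """Convert an open-ended install count such as '1,000,000+'
    to its lower bound as a number, e.g. 1000000.0."""
    return float(installs.replace('+', '').replace(',', ''))

print(parse_installs('100,000+'))     # 100000.0
print(parse_installs('10,000,000+'))  # 10000000.0
```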

In [30]:
for category in android_Category_freqTable:
    total = 0
    len_category = 0
    for app in android_final:
        category_app = app[1]
        if category_app == category:
            installs_number = app[5]
            installs_number = installs_number.replace('+', '')
            installs_number = float(installs_number.replace(',', ''))
            total += installs_number
            len_category += 1
    average_installs_number = int(total / len_category)
    print(category, ' : ', average_installs_number)

PERSONALIZATION  :  5201482
SOCIAL  :  23253652
HOUSE_AND_HOME  :  1331540
PHOTOGRAPHY  :  17840110
PRODUCTIVITY  :  16787331
MEDICAL  :  120550
COMMUNICATION  :  38456119
LIBRARIES_AND_DEMO  :  638503
BOOKS_AND_REFERENCE  :  8767811
AUTO_AND_VEHICLES  :  647317
BUSINESS  :  1712290
DATING  :  854028
EDUCATION  :  1833495
ART_AND_DESIGN  :  1986335
ENTERTAINMENT  :  11640705
BEAUTY  :  513151
LIFESTYLE  :  1437816
COMICS  :  817657
GAME  :  15588015
EVENTS  :  253542
FOOD_AND_DRINK  :  1924897
NEWS_AND_MAGAZINES  :  9549178
TRAVEL_AND_LOCAL  :  13984077
HEALTH_AND_FITNESS  :  4188821
VIDEO_PLAYERS  :  24727872
PARENTING  :  542603
TOOLS  :  10801391
FAMILY  :  3695641
FINANCE  :  1387692
SPORTS  :  3638640
SHOPPING  :  7036877
WEATHER  :  5074486
MAPS_AND_NAVIGATION  :  4056941


The categories with the highest average number of installs are:
- Communication: 38,456,119
- Video Players: 24,727,872
- Social: 23,253,652
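The top three above were read off the printed output. To get them programmatically, the per-category averages could be collected into a dictionary (category mapped to average installs) and sorted. `top_n` is a hypothetical helper, and the dictionary below holds only a few of the averages printed above.

```python
def top_n(averages, n=3):
    """Return the n (key, value) pairs with the largest values, highest first."""
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:n]

# A few of the Google Play averages printed above
averages = {'COMMUNICATION': 38456119, 'VIDEO_PLAYERS': 24727872,
            'SOCIAL': 23253652, 'PHOTOGRAPHY': 17840110, 'MEDICAL': 120550}
print(top_n(averages))
# [('COMMUNICATION', 38456119), ('VIDEO_PLAYERS', 24727872), ('SOCIAL', 23253652)]
```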

## Conclusions

Our goal was to find out what type of app can attract the most users on both the App Store and Google Play.

- The most popular genres on the App Store (average number of user ratings): Navigation (86,090), Reference (74,942), Social Networking (71,548).
- The most popular categories on Google Play (average number of installs): Communication (38,456,119), Video Players (24,727,872), Social (23,253,652).

Communication and social networking apps rank among the most popular on both markets (third on the App Store; first and third on Google Play).
However, this niche has strong, established competitors (Facebook, for example), so further analysis is needed, depending on our app's budget.