# Profitable App Profiles for the App Store and Google Play Markets

Our aim in this project is to find mobile app profiles that are profitable for the App Store and Google Play markets. We're working as data analysts for a company that builds Android and iOS mobile apps, and our job is to enable our team of developers to make data-driven decisions with respect to the kind of apps they build.

At our company, we only build apps that are free to download and install, and our main source of revenue consists of in-app ads. This means that our revenue for any given app is mostly influenced by the number of users that use our app. Our goal for this project is to analyze data to help our developers understand what kinds of apps are likely to attract more users.

## Opening and Exploring the Data:

**Our data sets:**

- For our Google Play Store data set, we will be using [this data set](https://www.kaggle.com/lava18/google-play-store-apps) prepared by machine learning engineer [Lavanya Gupta](https://www.kaggle.com/lava18).

- For our Apple Store data set, we will be using [this data set](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps) prepared by research engineer [Ramanathan](https://www.kaggle.com/ramamet4).

The function below will be used to display a limited number of rows from a data set and, optionally, its size.

In [1]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n')

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

The function below opens a data set and returns it as a list of rows, ready to be cleaned and analyzed.

In [2]:
def open_file(file_name):
    from csv import reader
    # Open with an explicit encoding and close the file once it has been read.
    with open(file_name, encoding='utf8') as opened_file:
        read_file = reader(opened_file)
        data_set = list(read_file)
    return data_set

**Overview of Google Play Store Data Set:**

By opening our Google Play Store data set, we can see that it has a total of 10,841 entries (not including the header) and 13 columns.

We will primarily focus on these columns:
1. 'App' - Contains the name of the application.
2. 'Category' - Contains the category of the application.
3. 'Reviews' - Tells us the number of times the application has been reviewed.
4. 'Installs' - Tells us the number of times the application has been installed.
5. 'Price' - Tells us the price of the application.
6. 'Genres' - Tells us the genre of the application.

In [3]:
google_data_set = open_file('googleplaystore.csv')
explore_data (google_data_set, 0, 4, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10842
Number of columns: 13


**Overview of Apple Store Data Set:**

By opening our Apple Store data set, we can see that it has a total of 7,197 entries (not including the header) and 16 columns.

We will primarily focus on these columns:
1. 'track_name' - Contains the name of the application.
2. 'price' - Tells us the price of the application.
3. 'rating_count_tot' - Tells us how many times the application has been rated.
4. 'prime_genre' - Tells us the genre of the application.

In [4]:
apple_data_set = open_file('AppleStore.csv')
explore_data (apple_data_set, 0, 4, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7198
Number of columns: 16


## Data Cleaning:

**Invalid Entry:**

By exploring our Google Play Store data set, we can see that at index 10,473 (counting the header row as index 0) there is an application named 'Life Made WI-Fi Touchscreen Photo Frame'. If we count the values in this row, we can see that it has only 12 columns as opposed to the regular 13; the 'Category' value appears to be missing.

This is probably a data entry mistake, so we'll remove the row to make sure it doesn't affect the result of our analysis.

In [5]:
print (google_data_set[10473])

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


The code below removes the invalid row.

In [6]:
del google_data_set[10473]
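
Rather than spotting the bad row by eye, rows with the wrong number of columns could also be located programmatically. The following is a minimal sketch under that assumption; `find_malformed_rows` and the tiny `sample` data set are hypothetical and not part of the original notebook.

```python
def find_malformed_rows(dataset):
    """Return (index, row) pairs whose length differs from the header's.

    Assumes dataset[0] is the header row, as in the data sets above.
    """
    expected = len(dataset[0])
    return [(i, row) for i, row in enumerate(dataset[1:], start=1)
            if len(row) != expected]


# Made-up sample: a header plus two rows, the second one missing a column.
sample = [
    ['App', 'Category', 'Rating'],
    ['Good App', 'TOOLS', '4.2'],
    ['Broken App', '1.9'],
]

malformed = find_malformed_rows(sample)
print(malformed)
```

Because the returned indices count the header as index 0, they can be used directly with `del`, as was done for index 10473 above.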

**Duplicate Entries:**

Upon observation of the Google Play Store data set, we found numerous instances of duplicate entries for specific applications. One example is the application `Instagram`, which appears 4 times in our data set. *(The output of the code below shows each instance of 'Instagram' in our data set.)*

In [7]:
for app in google_data_set[1:]:
    name = app[0]
    if name == 'Instagram':
        print(app)
        print ('\n')

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']




Using the code below, we were able to discern that there is a total of **1181** duplicate entries in the Google Play data set, and a total of **9659** unique entries.

In [8]:
duplicate_apps = []
unique_apps = []

for app in google_data_set[1:]:
    app_name = app[0]
    
    if app_name in unique_apps:
        duplicate_apps.append(app_name)
    else:
        unique_apps.append(app_name)
        
print('Number of Duplicate Apps:', len(duplicate_apps))
print('Number of Unique Apps:', len(unique_apps))

Number of Duplicate Apps: 1181
Number of Unique Apps: 9659
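
Checking membership in a list rescans the list for every app, which gets slow as the data grows. As an alternative sketch (not the approach used in this notebook), the same two counts can be obtained with `collections.Counter` from the standard library. The helper name `count_duplicates` and the sample rows are made up for illustration.

```python
from collections import Counter


def count_duplicates(rows, name_index=0):
    """Return (number of duplicate entries, number of unique names)."""
    counts = Counter(row[name_index] for row in rows)
    n_unique = len(counts)
    # Every occurrence beyond the first one of a name counts as a duplicate.
    n_duplicates = sum(counts.values()) - n_unique
    return n_duplicates, n_unique


# Made-up rows: 'Instagram' appears three times, 'Waze' once.
sample_rows = [['Instagram'], ['Instagram'], ['Waze'], ['Instagram']]
n_dup, n_uniq = count_duplicates(sample_rows)
print('Number of Duplicate Apps:', n_dup)
print('Number of Unique Apps:', n_uniq)
```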


We are not going to remove these duplicates randomly. Instead, we will use the 4th column of our Google Play Store data set, 'Reviews', which holds the total number of reviews for each application. For each application we will keep only the entry with the highest number of reviews and ignore the other duplicates, the reasoning being that the higher the number of reviews, the more recent the data should be.

The code below checks for the entry that has the most reviews and stores it in a dictionary called `reviews_max`.

In [9]:
reviews_max = {}
for app in google_data_set[1:]:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
        
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

We then use `reviews_max` to identify which entry will be added to our newly cleaned Google Play Store data set `android_clean`.

In [10]:
android_clean = []
already_added = []

for app in google_data_set[1:]:
    name = app[0]
    n_reviews = float(app[3])
    
    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(app)
        already_added.append(name)
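
The two passes above (build `reviews_max`, then filter) can also be folded into a single pass that stores the best row per app name. This is only a sketch of that idea; `dedupe_by_max_reviews` and the sample rows are hypothetical. Note one difference: the result is ordered by the first appearance of each name, which may differ from the row order produced by the two-pass version.

```python
def dedupe_by_max_reviews(rows, name_index=0, reviews_index=3):
    """Keep, for each app name, the row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # A strict comparison keeps the first row when review counts tie.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Made-up rows in the (name, category, rating, reviews) layout.
sample_rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Waze', 'MAPS_AND_NAVIGATION', '4.6', '100'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
]
deduped = dedupe_by_max_reviews(sample_rows)
for row in deduped:
    print(row)
```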

The code below outputs the first 3 rows of our cleaned Google Play Store data set that does not contain any duplicate entries. It has a total of **9659** rows and **13** columns. 

In [11]:
explore_data(android_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


**Removing Non-English Applications:**

In the code below we created a function that detects whether the name of an application contains characters that are not commonly used in English text (characters outside the ASCII range, 0 to 127). To minimize data loss, the function only marks an application as non-English if it detects 3 or more such characters. This prevents English applications such as `Instachat 😜` and `Docs To Go™ Free Office Suite`, which contain the `😜` and `™` characters respectively, from being marked as non-English.

In [12]:
def non_english_checker (app_name):
    non_english_character = 0
    for character in app_name:
        ord_value = ord(character)
        
        if ord_value > 127:
            non_english_character += 1
            
            if non_english_character == 3:
                return True
        
    return False

By calling our `non_english_checker` function below, we can see that `Instachat 😜` and `Instagram` are identified as English applications, while `爱奇艺PPS -《欢乐颂2》电视剧热播` is flagged as non-English and falls outside our scope.

In [13]:
print('Instagram - ' , non_english_checker('Instagram'))
print('Instachat 😜 -' , non_english_checker('Instachat 😜'))
print('爱奇艺PPS -《欢乐颂2》电视剧热播 -' , non_english_checker('爱奇艺PPS -《欢乐颂2》电视剧热播'))

Instagram -  False
Instachat 😜 - False
爱奇艺PPS -《欢乐颂2》电视剧热播 - True
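
The cutoff of 3 characters is a judgment call, so it can be useful to see how the result changes with a different tolerance. The sketch below is a hypothetical variant (`is_probably_non_english` and its `max_non_ascii` parameter are not part of the notebook) that exposes the threshold as an argument; with the default of 2 it should agree with `non_english_checker`.

```python
def is_probably_non_english(app_name, max_non_ascii=2):
    """Flag a name once it has more than max_non_ascii non-ASCII characters."""
    non_ascii = sum(1 for character in app_name if ord(character) > 127)
    return non_ascii > max_non_ascii


names = [
    'Instagram',
    'Instachat 😜',
    'Docs To Go™ Free Office Suite',
    '爱奇艺PPS -《欢乐颂2》电视剧热播',
]
for name in names:
    print(name, '-', is_probably_non_english(name))

# With zero tolerance, a single emoji is enough to flag the name.
print(is_probably_non_english('Instachat 😜', max_non_ascii=0))
```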


The code below uses our `non_english_checker` function in both our Google Play Store data set and Apple Store data set to get only the English applications in their respective store.

In [14]:
android_english = []
ios_english = []

for app in android_clean:
    app_name = app[0]
    
    if non_english_checker(app_name) == False:
        android_english.append(app)
        
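# Note: apple_data_set still includes its header row here; the header
# passes the check, so it is kept as the first row of ios_english.
for app in apple_data_set:
    app_name = app[1]
    
    if non_english_checker(app_name) == False:
        ios_english.append(app)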

The code below displays the first 3 rows of the Google Play Store data set that contains only English applications. It has a total of **9597** applications.

In [15]:
explore_data(android_english, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9597
Number of columns: 13


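The code below displays the first 3 application rows of the Apple Store data set that contains only English applications. Because the header row was not skipped during filtering, `ios_english` has a total of **6156** rows including the header, which is why the slice starts at index 1.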

In [16]:
explore_data(ios_english, 1, 4, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 6156
Number of columns: 16


**Isolating Free Applications:**

Using the code below, we isolated the free apps from our data sets and put them in our variables `google_free_apps` and `ios_free_apps`. We do this by checking whether the 8th column of our Google Play Store data set, named 'Price', is equal to '0'. We do the same with the 5th column of our Apple Store data set, named 'price', checking for the value '0.0'.

After running the code we found out that Google Play Store has **8848** free applications, and Apple Store has **3203**.

In [17]:
google_free_apps = []
ios_free_apps = []

for app in android_english:
    app_price = app[7]
    
    if app_price == '0':
        google_free_apps.append(app)
        
for app in ios_english:
    app_price = app[4]
    
    if app_price == '0.0':
        ios_free_apps.append(app)
        
explore_data(google_free_apps, 0, 3, True)
print('\n=============================================\n')
explore_data(ios_free_apps, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 8848
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 
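
Comparing price strings works here because the two stores happen to write a free price as exactly '0' and '0.0'. A slightly more defensive sketch is to convert the text to a number first. The helper `is_free` below is hypothetical, and the '$' prefix handling assumes that paid prices may be written like '$4.99'.

```python
def is_free(price_text):
    """Return True when a price string represents zero.

    Assumes prices look like '0', '0.0' or '$4.99'; anything that cannot
    be parsed as a number is treated as not free.
    """
    cleaned = price_text.strip().lstrip('$')
    try:
        return float(cleaned) == 0.0
    except ValueError:
        return False


for text in ['0', '0.0', '$4.99', 'Everyone']:
    print(repr(text), '-', is_free(text))
```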

## Most Common Apps by Genre:

As we mentioned in the introduction, our aim is to determine the kinds of apps that are likely to attract more users because our revenue is highly influenced by the number of people using our apps.

To minimize risks and overhead, our validation strategy for an app idea is comprised of three steps:

1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we then develop it further.
3. If the app is profitable after six months, we also build an iOS version of the app and add it to the App Store.

Because our end goal is to add the app on both the App Store and Google Play, we need to find app profiles that are successful on both markets. For instance, a profile that might work well for both markets might be a productivity app that makes use of gamification.

Let's begin the analysis by getting a sense of the most common genres for each market. For this, we'll build a frequency table for the 'prime_genre' column of the App Store data set, and the 'Genres' and 'Category' columns of the Google Play data set.

We created two functions in the code below. The first, `freq_table`, builds a frequency table that tells us what percentage of the apps in our data set fall under each genre. The second, `display_table`, prints that frequency table sorted in descending order.

In [18]:
def freq_table (dataset, index):
    tally_dictionary = {}
    
    for app in dataset:
        if app[index] in tally_dictionary:
            tally_dictionary[app[index]] += 1
        else:
            tally_dictionary[app[index]] = 1
    
    percentage_dictionary = {}
    
    for genre in tally_dictionary:
        genre_percentage = (tally_dictionary[genre] / len(dataset)) * 100
        percentage_dictionary[genre] = genre_percentage
        
    return percentage_dictionary


def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
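
For comparison, the same percentage table can be sketched with `collections.Counter`, whose `most_common()` method already returns entries from most to least frequent. The function name `freq_table_counter` and the sample rows are assumptions made for this illustration; the notebook itself keeps using `freq_table` and `display_table`.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Return {value: percentage}, ordered from most to least frequent."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return {key: count / total * 100 for key, count in counts.most_common()}


# Made-up rows in a (name, genre) layout.
sample_apps = [
    ['a', 'Games'],
    ['b', 'Games'],
    ['c', 'Music'],
    ['d', 'Games'],
]
genre_table = freq_table_counter(sample_apps, 1)
for genre, percentage in genre_table.items():
    print(genre, ':', percentage)
```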

We then use these functions on our App Store data set.

In [19]:
display_table (ios_free_apps, 11)

Games : 58.25788323446769
Entertainment : 7.836403371838902
Photo & Video : 4.995316890415236
Education : 3.6840462066812365
Social Networking : 3.3093974399000934
Shopping : 2.5913206369029034
Utilities : 2.466437714642523
Sports : 2.1542304089915705
Music : 2.0605682172962845
Health & Fitness : 2.0293474867311896
Productivity : 1.7483609116453322
Lifestyle : 1.5610365282547611
News : 1.3424914142990947
Travel : 1.248829222603809
Finance : 1.0927255697783327
Weather : 0.8741804558226661
Food & Drink : 0.8117389946924758
Reference : 0.5307524196066188
Business : 0.5307524196066188
Book : 0.3746487667811427
Navigation : 0.18732438339057134
Medical : 0.18732438339057134
Catalogs : 0.1248829222603809


We can see in our Apple Store frequency table that the majority of the free English apps are 'Games', which make up about 58% of the total. They are followed by 'Entertainment' at about 8%, 'Photo & Video' at about 5%, and 'Education' at about 4%. Looking at the top genres, we can discern that most of the applications in the Apple Store data set are focused on entertainment as opposed to practicality.

In [20]:
display_table (google_free_apps, 1) #Category

FAMILY : 18.942133815551536
GAME : 9.697106690777577
TOOLS : 8.453887884267631
BUSINESS : 4.599909584086799
PRODUCTIVITY : 3.899186256781193
LIFESTYLE : 3.887884267631103
FINANCE : 3.7070524412296564
MEDICAL : 3.5375226039783
SPORTS : 3.390596745027125
PERSONALIZATION : 3.322784810126582
COMMUNICATION : 3.2323688969258586
HEALTH_AND_FITNESS : 3.0854430379746836
PHOTOGRAPHY : 2.949819168173599
NEWS_AND_MAGAZINES : 2.802893309222423
SOCIAL : 2.667269439421338
TRAVEL_AND_LOCAL : 2.3395117540687163
SHOPPING : 2.2490958408679926
BOOKS_AND_REFERENCE : 2.1360759493670884
DATING : 1.8648282097649187
VIDEO_PLAYERS : 1.7970162748643763
MAPS_AND_NAVIGATION : 1.3901446654611211
FOOD_AND_DRINK : 1.2432188065099457
EDUCATION : 1.164104882459313
ENTERTAINMENT : 0.9606690777576853
LIBRARIES_AND_DEMO : 0.9380650994575045
AUTO_AND_VEHICLES : 0.9267631103074141
HOUSE_AND_HOME : 0.8024412296564195
WEATHER : 0.7911392405063291
EVENTS : 0.7120253164556962
PARENTING : 0.6555153707052441
ART_AND_DESIGN : 0.64

We then ran our functions on our Google Play Store data set, creating a frequency table based on `Category`. At first glance we can already see the difference between this table and the Apple Store frequency table. The 'FAMILY' category makes up about 19% of the free English applications, followed by 'GAME' at about 10%, 'TOOLS' at about 8%, and 'BUSINESS' at about 5%.

There is a much stronger emphasis on practical applications in the Google Play Store than in the Apple Store.

In [21]:
display_table (google_free_apps, -4) #Genres

Tools : 8.44258589511754
Entertainment : 6.080470162748644
Education : 5.357142857142857
Business : 4.599909584086799
Productivity : 3.899186256781193
Lifestyle : 3.8765822784810124
Finance : 3.7070524412296564
Medical : 3.5375226039783
Sports : 3.4584086799276674
Personalization : 3.322784810126582
Communication : 3.2323688969258586
Action : 3.096745027124774
Health & Fitness : 3.0854430379746836
Photography : 2.949819168173599
News & Magazines : 2.802893309222423
Social : 2.667269439421338
Travel & Local : 2.328209764918626
Shopping : 2.2490958408679926
Books & Reference : 2.1360759493670884
Simulation : 2.0456600361663653
Dating : 1.8648282097649187
Arcade : 1.842224231464738
Video Players & Editors : 1.7744122965641953
Casual : 1.763110307414105
Maps & Navigation : 1.3901446654611211
Food & Drink : 1.2432188065099457
Puzzle : 1.1301989150090417
Racing : 0.9945750452079566
Role Playing : 0.9380650994575045
Libraries & Demo : 0.9380650994575045
Auto & Vehicles : 0.9267631103074141
St

The difference between the Genres and the Category columns is not crystal clear, but one thing we can notice is that the Genres column is much more granular (it has more categories). We're only looking for the bigger picture at the moment, so we'll only work with the Category column moving forward.

Up to this point, we found that the App Store is dominated by apps designed for fun, while Google Play shows a more balanced landscape of both practical and for-fun apps. Now we'd like to get an idea about the kinds of apps that have the most users.

## Most Popular Apps by Genre on the App Store

To get an overview of which applications are popular, we will use the 'Installs' column of our Google Play Store data set, which tells us the number of times an application has been installed. Unfortunately, this column does not exist in our Apple Store data set, so we will use 'rating_count_tot' as a proxy instead.

In [22]:
ios_freq_table = freq_table(ios_free_apps, 11)

for genre in ios_freq_table:
    total = 0
    len_genre = 0
    
    for row in ios_free_apps:
        genre_app = row[11]
        
        if genre_app == genre:
            no_of_rating = float(row[5])
            total += no_of_rating
            len_genre += 1
    
    average_no_rating = total / len_genre
    print (genre , '-', average_no_rating)

Social Networking - 71548.34905660378
Photo & Video - 28441.54375
Games - 22886.36709539121
Music - 57326.530303030304
Reference - 79350.4705882353
Health & Fitness - 23298.015384615384
Weather - 52279.892857142855
Utilities - 19156.493670886077
Travel - 28243.8
Shopping - 27230.734939759037
News - 21248.023255813954
Navigation - 86090.33333333333
Lifestyle - 16815.48
Entertainment - 14195.358565737051
Food & Drink - 33333.92307692308
Sports - 23008.898550724636
Book - 46384.916666666664
Finance - 32367.02857142857
Education - 7003.983050847458
Productivity - 21028.410714285714
Business - 7491.117647058823
Catalogs - 4004.0
Medical - 612.0
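
The nested loop above rescans every app once per genre. As a sketch of an alternative, the per-genre averages can be accumulated in a single pass with `collections.defaultdict`. The helper `average_by_group` and the sample rows are hypothetical and only illustrate the idea.

```python
from collections import defaultdict


def average_by_group(rows, group_index, value_index):
    """Return {group: average of the numeric value column} in one pass."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Made-up rows in a (genre, rating count) layout.
sample_rows = [
    ['Navigation', '10'],
    ['Navigation', '30'],
    ['Games', '5'],
]
averages = average_by_group(sample_rows, 0, 1)
for genre, average in averages.items():
    print(genre, '-', average)
```

On the real data this would be called with the genre index 11 and the rating count index 5, for example `average_by_group(ios_free_apps, 11, 5)`.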


Based on the output of our code, the 'Navigation' genre has the highest average number of user ratings, about 86,090 per application. Looking back at the frequency table in the 'Most Common Apps by Genre' section, 'Navigation' apps make up only about 0.19% of the free English apps in the App Store data set. This looks promising: using the rating count as a proxy for users, 'Navigation' apps appear to have the most users on average while facing relatively few competitors.

But first let us find out who these competitors are.

In [23]:
for app in ios_free_apps:
    # Index 11 is 'prime_genre' (the same column as index -5).
    if app[11] == 'Navigation':
        print (app[1] , '-', app[5])

Waze - GPS Navigation, Maps & Real-time Traffic - 345046
Google Maps - Navigation & Transit - 154911
Geocaching® - 12811
CoPilot GPS – Car Navigation & Offline Maps - 3582
ImmobilienScout24: Real Estate Search in Germany - 187
Railway Route Search - 5


Based on the output, although the number of competitors in the 'Navigation' genre is small, the disparity in the number of ratings between applications is large. 'Waze' and 'Google Maps' together account for roughly 97% of all ratings in the genre, so the high average is driven almost entirely by these two apps.

Despite facing two giant competitors, an application like 'Geocaching®' was still able to build a substantial following. This suggests that if we create a 'Navigation' application that serves a particular niche, it could thrive not as a direct competitor to 'Waze' or 'Google Maps', but as a complement to them.
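
To make the disparity concrete, the arithmetic can be worked through from the six rating counts printed above. This is just a quick check; the app names are shortened labels, and the variable names are chosen for this illustration.

```python
# Rating counts copied from the 'Navigation' output above (names shortened).
navigation_ratings = {
    'Waze': 345046,
    'Google Maps': 154911,
    'Geocaching': 12811,
    'CoPilot GPS': 3582,
    'ImmobilienScout24': 187,
    'Railway Route Search': 5,
}

total_ratings = sum(navigation_ratings.values())
top_two = navigation_ratings['Waze'] + navigation_ratings['Google Maps']
top_two_share = top_two / total_ratings * 100
genre_average = total_ratings / len(navigation_ratings)

print('Total ratings:', total_ratings)
print('Share held by Waze and Google Maps (%):', round(top_two_share, 1))
print('Average ratings per app:', round(genre_average, 2))
```

The average matches the 86,090 figure reported for the genre, while the median app in this list has far fewer ratings, which is why the average alone can be misleading.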

## Most Popular Apps by Genre on Google Play

Looking at the 'Installs' column of our Google Play Store data set, we can see that it doesn't give the precise number of installs for an application, only an estimate in the form of open-ended figures like '10,000+' and '1,000+'.

We're going to leave the numbers as they are, which means that we'll consider that an app with 100,000+ installs has 100,000 installs, an app with 1,000,000+ installs has 1,000,000 installs, and so on.
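
Treating the figures this way means stripping the '+' and ',' characters before converting the text to a number. A minimal sketch of that conversion is shown here; the helper name `parse_installs` is made up for illustration.

```python
def parse_installs(installs_text):
    """Convert an installs string such as '10,000+' to a whole number."""
    return int(installs_text.replace(',', '').replace('+', ''))


for text in ['10,000+', '1,000,000,000+', '0']:
    print(text, '->', parse_installs(text))
```

Since every value is a lower bound, averages computed from these numbers are best read as rough, conservative estimates.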

In [24]:
display_table(google_free_apps, 5)

1,000,000+ : 15.75497287522604
100,000+ : 11.539330922242314
10,000,000+ : 10.567359855334539
10,000+ : 10.194394213381555
1,000+ : 8.39737793851718
100+ : 6.928119349005425
5,000,000+ : 6.826401446654612
500,000+ : 5.560578661844485
50,000+ : 4.769439421338156
5,000+ : 4.486889692585895
10+ : 3.5375226039783
500+ : 3.2436708860759493
50,000,000+ : 2.2830018083182644
100,000,000+ : 2.1360759493670884
50+ : 1.9213381555153706
5+ : 0.7911392405063291
1+ : 0.5085895117540687
500,000,000+ : 0.27124773960216997
1,000,000,000+ : 0.22603978300180833
0+ : 0.045207956600361664
0 : 0.011301989150090416


In [25]:
android_freq_table = freq_table(google_free_apps, 1)

for category in android_freq_table:
    total = 0
    len_category = 0
    
    for app in google_free_apps:
        category_app = app[1]
        
        if category_app == category:
            no_of_installs = app[5]
            no_of_installs = no_of_installs.replace('+' , '')
            no_of_installs = no_of_installs.replace(',' , '')
            total += float(no_of_installs)
            len_category += 1
            
    average_no_installs = total / len_category
    print (category , '-' , average_no_installs)

ART_AND_DESIGN - 1986335.0877192982
AUTO_AND_VEHICLES - 647317.8170731707
BEAUTY - 513151.88679245283
BOOKS_AND_REFERENCE - 8814199.78835979
BUSINESS - 1712290.1474201474
COMICS - 832613.8888888889
COMMUNICATION - 38590581.08741259
DATING - 854028.8303030303
EDUCATION - 1833495.145631068
ENTERTAINMENT - 11640705.88235294
EVENTS - 253542.22222222222
FINANCE - 1387692.475609756
FOOD_AND_DRINK - 1924897.7363636363
HEALTH_AND_FITNESS - 4188821.9853479853
HOUSE_AND_HOME - 1360598.042253521
LIBRARIES_AND_DEMO - 638503.734939759
LIFESTYLE - 1446158.2238372094
GAME - 15544014.51048951
FAMILY - 3695641.8198090694
MEDICAL - 120550.61980830671
SOCIAL - 23253652.127118643
SHOPPING - 7036877.311557789
PHOTOGRAPHY - 17840110.40229885
SPORTS - 3650602.276666667
TRAVEL_AND_LOCAL - 13984077.710144928
TOOLS - 10830251.970588235
PERSONALIZATION - 5201482.6122448975
PRODUCTIVITY - 16787331.344927534
PARENTING - 542603.6206896552
WEATHER - 5145550.285714285
VIDEO_PLAYERS - 24727872.452830188
NEWS_AND_MAGAZ

Based on the output of our code, the 'MAPS_AND_NAVIGATION' category has a modest average of about 4,049,274 installs, compared to categories like 'GAME', which averages about 15,544,014 installs. This might give the impression that applications in 'MAPS_AND_NAVIGATION' have a smaller user base than those in other categories, but in reality this might not be the case.

In many instances it is normal to see a mobile device with many applications in the 'GAME' category but only one or two in the 'MAPS_AND_NAVIGATION' category. A plausible reason is that users tend to get bored with entertainment-focused applications and move on to new ones, which inflates the install counts for those categories.

This is rarely the case with applications in the 'MAPS_AND_NAVIGATION' category, because what users seek in this type of application is practicality. Let us now look at the number of installs of the applications in this category.

In [26]:
for app in google_free_apps:
    if app[1] == 'MAPS_AND_NAVIGATION':
        print (app[0] , '-' , app[5])

Waze - GPS, Maps, Traffic Alerts & Live Navigation - 100,000,000+
T map (te map, T map, navigation) - 5,000,000+
MapQuest: Directions, Maps, GPS & Navigation - 10,000,000+
Yahoo! transit guide free timetable, operation information, transfer search - 10,000,000+
Transit: Real-Time Transit App - 5,000,000+
Mapy.cz - Cycling & Hiking offline maps - 1,000,000+
Uber - 100,000,000+
GPS Navigation & Offline Maps Sygic - 50,000,000+
Map and Router Badge - 500,000+
Yandex.Transport - 10,000,000+
Air Traffic - 1,000,000+
Speed Cameras Radar - 1,000,000+
Atlan3D Navigation: Korea navigator - 1,000,000+
Compass - 10,000,000+
Mappy - Plan, route comparison, GPS - 1,000,000+
Gps Route Finder - 100,000+
My Location: GPS Maps, Share & Save Places - 5,000,000+
Yanosik: "antyradar", traffic jams, navigation, camera - 5,000,000+
NAVITIME - Map & Transfer Navi - 5,000,000+
Sygic Car Navigation - 5,000,000+
Czech Public Transport IDOS - 1,000,000+
Karta GPS - Offline Navigation - 1,000,000+
Circle ratio - 

As we can see, just as in our Apple Store data set, 'Waze' is once again among the most-installed applications in the navigation category (tied with 'Uber' at 100,000,000+ in the portion of the output shown). This time, however, several other applications in the same category, such as 'T map' and 'MapQuest', have been able to carve out their own user base.

## Conclusion:

In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.

Based on our research, we can conclude that an application in a less saturated genre like 'Navigation' is likely to be profitable. We need to develop an application that serves a particular niche and complements established applications like 'Google Maps' and 'Waze' rather than competing with them directly.

'Navigation' applications have a higher chance of staying installed on users' devices, because users consider them a necessity and would rather keep them installed for when the need arises. And since our primary source of revenue is advertisements, it helps that users tend to keep 'Navigation' applications open for longer periods than other types of applications, because they use them as a guide on long-distance drives or in areas they are unfamiliar with.