# Profitable App Profiles for the App Store and Google Play Markets 

## 1. Introduction

We are working as data analysts for a company that builds *Android* and *iOS mobile apps*. The company only builds *free apps* (free to download and install), and its main source of revenue is *in-app ads*. Revenue therefore depends on the number of users: the more users see and engage with the ads, the higher the revenue. Our aim is to help our developers understand what types of apps attract more users.

The links below point to the two datasets used in this project, one for the *App Store* and one for *Google Play*.

* Dataset containing ~ 10,000 Android apps from Google Play Store __[link](https://dq-content.s3.amazonaws.com/350/googleplaystore.csv)__

* Dataset containing ~ 7,000 iOS apps from App Store __[link](https://dq-content.s3.amazonaws.com/350/AppleStore.csv)__ 

## 2. Exploration of the Data

In this section we explore both datasets. First we open the `csv` files and convert each one into a list of lists for further analysis.

In [1]:
import csv

open_Apple = open('AppleStore.csv')
read_Apple = csv.reader(open_Apple)
data_Apple = list(read_Apple)
#header_Apple = list(read_Apple)[0]

open_google = open('googleplaystore.csv')
read_google = csv.reader(open_google)
data_google = list(read_google)
#header_google = list(read_google)[0]
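The cell above leaves both file handles open. As a minimal sketch of an alternative, the hypothetical helper `load_csv()` below closes the file automatically and returns the header separately from the data rows. The UTF-8 encoding is an assumption; the source does not state how the files are encoded.

```python
import csv


def load_csv(path, encoding="utf8"):
    """Read a CSV file and return (header, rows) as lists.

    `load_csv` is a hypothetical helper, not part of the original notebook.
    The UTF-8 default is an assumption about how the files are encoded.
    """
    # newline="" is what the csv module documentation recommends for files
    # that are handed to csv.reader.
    with open(path, encoding=encoding, newline="") as f:
        rows = list(csv.reader(f))
    # The first row is the header; everything after it is data.
    return rows[0], rows[1:]


# Example usage (assumes the CSV files sit next to the notebook):
# header_apple, apps_apple = load_csv("AppleStore.csv")
# header_google, apps_google = load_csv("googleplaystore.csv")
```

Keeping the header apart from the data avoids the repeated `[1:]` slices used in the rest of the notebook, at the cost of changing the row indices reported later (for example the index of the malformed row).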

Next, we define a function `explore_data()` that takes a dataset, a start index and an end index, and prints the rows in that range. If `rows_and_columns` is `True`, it also prints the number of rows and columns in the dataset.

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    
    for row in dataset_slice:
        print(row)
        print('\n')
        
    if rows_and_columns:
        print('Number of rows', len(dataset))
        print('Number of columns', len(dataset[0]))

Let us explore the first *3* data rows of the `Apple` and `Google` datasets.

In [3]:
explore_data(data_Apple, 1, 4) 

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']




In [4]:
explore_data(data_google, 1, 4)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']




Let us look at the number of rows and columns for the Apple dataset.

In [5]:
print('Number of rows', len(data_Apple[1:]))
print('Number of column', len(data_Apple[0]))

Number of rows 7197
Number of column 16


Let us look at the number of rows and columns for the Google dataset.

In [6]:
print('Number of rows', len(data_google[1:]))
print('Number of columns', len(data_google[0]))

Number of rows 10841
Number of columns 13


Let us look at the column names of both the datasets.

In [7]:
print('AppleStore column names:', data_Apple[0]) #Printing the header row for the App Store data
print('\nRequired Columns: rating_count_tot, user_rating, cont_rating') #Columns we will use in the analysis

AppleStore column names: ['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']

Required Columns: rating_count_tot, user_rating, cont_rating


For a description of each column, see the dataset documentation: [link](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps)

In [8]:
print('googleplaystore column names:', data_google[0]) #Printing the header row for the Google Play data
print('\nRequired Columns: Rating, Reviews, Installs, Content Rating') #Columns we will use in the analysis

googleplaystore column names: ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']

Required Columns: Rating, Reviews, Installs, Content Rating


For a description of each column, see the dataset documentation: [link](https://www.kaggle.com/lava18/google-play-store-apps)

**Above we explored both datasets by**
1. Printing the *number of rows and columns*.
2. Looking at the *header row*.
3. Looking at the *body* of the data.

## 3. Cleaning the Data for Easier Analysis

In this and the following sections we are going to perform *data cleaning*. This includes 

 * Removing/correcting wrong data 
 * Removing duplicate data
 * Modifying the data to fit the purpose of our analysis

As a first step, we check whether any row in the *Google Play* data has missing fields. We do this by comparing the *length of each row* with the *length of the header row*. A row whose length differs is malformed, and we delete it.

In [9]:
#select the header row for Android Store data
header_google = data_google[0] 

#loop over the data
for row in data_google[1:]:
    header_len = len(header_google)
    row_len = len(row)        
    
    if row_len != header_len:
        print(row)            
        print(data_google.index(row)) 

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
10473


Row `10473` has 12 fields instead of the 13 in the `header` row (the `Category` value is missing), so we delete it.

In [10]:
del data_google[10473] 

In [11]:
print(data_google[10473]) #check that the malformed row is gone (a different app now sits at this index)

['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']


We run the same check on the *App Store* data.

In [12]:
header_Apple = data_Apple[0]

no_row_len = 0
for row in data_Apple[1:]:
    header_len = len(header_Apple)
    row_len = len(row)
    
    if row_len != header_len:
        no_row_len += 1
        
if no_row_len == 0:
    print("There are no missing rows")
else:
    print(no_row_len)

There are no missing rows


We found one row with missing data in the *Google Play* data set and deleted it. We found no such rows in the *App Store* data set.
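The length check can also be written with `enumerate()`, which yields the position of each row directly. `list.index()` returns the position of the first row equal to its argument, so it could report the wrong index if two identical malformed rows existed. The sketch below uses a hypothetical helper `find_malformed_rows()` and a small made-up dataset; it is an illustration, not the notebook's method.

```python
def find_malformed_rows(dataset):
    """Return (index, row) pairs whose length differs from the header row.

    `find_malformed_rows` is a hypothetical helper. `dataset` is a list of
    lists whose first element is the header row.
    """
    expected = len(dataset[0])
    return [(i, row) for i, row in enumerate(dataset) if len(row) != expected]


# Tiny made-up dataset: the last row is missing one field.
sample = [
    ["App", "Category", "Rating"],
    ["A", "TOOLS", "4.1"],
    ["B", "GAME", "3.9"],
    ["C", "1.9"],
]

bad = find_malformed_rows(sample)
print(bad)  # [(3, ['C', '1.9'])]

# Delete from the highest index down so earlier deletions do not shift
# the positions of rows that still need to be removed.
for i, _ in sorted(bad, reverse=True):
    del sample[i]
```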

## 4. Removing Duplicate Entries for Android Store

### Part 1 

Here we find out whether there are any duplicate apps in the *Google Play* data.

In [13]:
#we have taken 'Facebook' for checking duplicate entries
for app in data_google:
    name = app[0]
    if name == 'Facebook':  
        print(app)

['Facebook', 'SOCIAL', '4.1', '78158306', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'August 3, 2018', 'Varies with device', 'Varies with device']
['Facebook', 'SOCIAL', '4.1', '78128208', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'August 3, 2018', 'Varies with device', 'Varies with device']


Below we will calculate the number of duplicate apps for the *Android Play Store* 

In [14]:
duplicate_apps = [] 
unique_apps = []    


for app in data_google:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

print('Number of duplicate apps:', len(duplicate_apps))

Number of duplicate apps: 1181
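The membership test `name in unique_apps` scans a list, which becomes slow as the list grows. As a minimal sketch, the same count can be obtained with a set, where membership tests take roughly constant time. `count_duplicates()` and the sample names are hypothetical and only illustrate the idea.

```python
def count_duplicates(names):
    """Count entries whose name already appeared earlier in `names`."""
    seen = set()
    duplicates = 0
    for name in names:
        if name in seen:
            # Second or later occurrence of this name.
            duplicates += 1
        else:
            seen.add(name)
    return duplicates


# Made-up example: 'Facebook' appears three times, so two entries are duplicates.
sample_names = ["Facebook", "Instagram", "Facebook", "Slack", "Facebook"]
print(count_duplicates(sample_names))  # 2
```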


Removing the duplicate rows by hand would be laborious, so we need a programmatic way to do it. We also need a criterion for deciding which of the duplicate rows to keep.

**Here are a few criteria we could use:**

`Option 1:` Keep the row with the highest number of reviews (column 4), since more reviews suggests the data was collected more recently, and remove the other duplicates.

`Option 2:` Keep the row with the highest number of installs (column 6), for the same reason, and remove the other duplicates.

`Option 3:` Keep the row with the latest update date (column 11) and remove the other duplicates.

`Option 4:` Keep the row with the latest version (column 12) and remove the other duplicates.

**Here we use the first option**

* We create a dictionary called *reviews_max*, where each *key* is an *app name* and each *value* is the maximum number of reviews recorded for that app.

* We check the length of the dictionary against the expected number of unique apps:  
`10840(total apps) - 1181(duplicate apps) = 9659`
       
* We create a list called *android_clean*, to which we add the complete row of each app that has the maximum number of reviews.

* We create a list called *already_added*, which holds the names of the apps already included in *android_clean*. This extra check handles the case where two or more duplicate rows share the same maximum number of reviews, so that only one of them is kept.

### Part 2

Let us perform the necessary procedure.

In [15]:
reviews_max = {}    

for row in data_google[1:]:
    name = row[0]
    n_reviews = float(row[3]) 
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

In [16]:
print('Expected length:', len(reviews_max))

Expected length: 9659


The length of the *reviews_max* dictionary matches the expected length exactly. Below we use the dictionary to remove the duplicate rows.

In [17]:
android_clean = []  
already_added = []  


for row in data_google[1:]:
    name = row[0]
    n_reviews = float(row[3])
    
    
    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(row)
        already_added.append(name)
    

In [18]:
print((android_clean[:3])) 
print('\nNumber of expected rows:', len(android_clean))

[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'], ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']]

Number of expected rows: 9659


As expected, we are left with *9659* rows.
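The two passes above can also be combined. The sketch below, under the assumption that the review count is always numeric, keeps the best row per app name in a dictionary and needs a single pass. `keep_most_reviewed()` is a hypothetical helper. The two Facebook rows use the review counts shown earlier, while the Instagram row is a made-up example.

```python
def keep_most_reviewed(rows, name_idx=0, reviews_idx=3):
    """Keep, for each app name, the row with the highest review count.

    Hypothetical helper. When several rows tie on the maximum, the first one
    encountered is kept, which matches the `already_added` check above.
    """
    best = {}
    for row in rows:
        name = row[name_idx]
        if name not in best or float(row[reviews_idx]) > float(best[name][reviews_idx]):
            best[name] = row
    # Dictionaries preserve insertion order (Python 3.7+), so apps come out
    # in the order in which each name first appeared.
    return list(best.values())


sample_rows = [
    ["Facebook", "SOCIAL", "4.1", "78158306"],
    ["Instagram", "SOCIAL", "4.5", "1000"],  # made-up review count
    ["Facebook", "SOCIAL", "4.1", "78128208"],
]

for row in keep_most_reviewed(sample_rows):
    print(row)
```

The resulting rows are ordered by the first appearance of each app name, which can differ from the order produced by the two-pass version.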

## 5. Removing Non-English Apps

### Part 1

The company builds apps for an English-speaking audience only, so we want to analyze English apps and remove all the others.

The characters commonly used in English text (letters, digits, punctuation) all fall in the ASCII range, which corresponds to the numbers 0 to 127. The built-in *ord()* function returns the number associated with a character, so we can use it to detect names that contain characters outside this range.

Below we define a function *english_app()*, with *string* as a parameter. It checks whether the given app name looks English or not.
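To make the threshold of 127 concrete, the short sketch below prints the code point that `ord()` returns for a few characters, including characters taken from the app names tested in this section. The dictionary name `code_points` is only an illustration.

```python
# Code points of a few sample characters; ASCII covers 0 to 127.
code_points = {ch: ord(ch) for ch in ["a", "Z", "~", "™", "爱", "😜"]}

for ch, number in code_points.items():
    print(repr(ch), number, "ASCII" if number <= 127 else "non-ASCII")

# 'a' 97 ASCII
# 'Z' 90 ASCII
# '~' 126 ASCII
# '™' 8482 non-ASCII
# '爱' 29233 non-ASCII
# '😜' 128540 non-ASCII
```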

In [19]:
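def english_app(string):
    # Return False as soon as any character falls outside the ASCII range (0-127)
    for character in string:
        if ord(character) > 127:
            return False
    # Every character was checked and all of them are ASCII
    return True

In [20]:
print(english_app('Instagram'))

True


In [21]:
print(english_app('爱奇艺PPS -《欢乐颂2》电视剧热播'))

False


In [22]:
print(english_app('Docs To Go™ Free Office Suite'))

False


In [23]:
print(english_app('Instachat 😜'))

False


To check the function, we ran it on *English* and *non-English* app names. It correctly rejects the Chinese name, but it also rejects *Docs To Go™ Free Office Suite* and *Instachat 😜*, which are English apps whose names contain a single non-ASCII character.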

### Part 2

If we use the above function, we may lose some English apps along with the non-English ones. So we define one more function, very similar to the earlier one. This time we only remove an app if its name has more than three characters whose numbers fall outside the ASCII range. This means an app with up to three emojis or other special characters in its name is still labelled as an English app.

In [24]:
# define a function E_A() with string as a parameter
def E_A(string):
    ord_list = 0  
    for character in string:
        if ord(character) > 127:
            ord_list += 1
            
    if ord_list > 3:
        return False
    else:
        return True

In [25]:
print(E_A('Docs To Go™ Free Office Suite'))

True


In [26]:
print(E_A('Instachat 😜'))

True


In [27]:
print(E_A('爱奇艺PPS -《欢乐颂2》电视剧热播'))

False


We checked the function on a few app names and it works as intended.

Now we apply the function to both the *Google Play* and the *App Store* data.

In [28]:
android_English_App = [] 

for row in android_clean:
    app = row[0]
    
    if E_A(app) is True:
        android_English_App.append(row)

Apply the function *explore_data()* on the list *android_English_App* and observe the results

In [29]:
explore_data(android_English_App, 0, 4, rows_and_columns=True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


Number of rows 9614
Number of columns 13


We repeat the same process on the *App Store* data.

In [30]:
apple_English_App = []

for row in data_Apple[1:]:
    app = row[1]
    
    if E_A(app) is True:
        apple_English_App.append(row)

Apply the *explore_data()* function on the list *apple_English_App* and observe the data

In [31]:
explore_data(apple_English_App, 0, 4, rows_and_columns=True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


Number of rows 6183
Number of columns 16


We have now removed the non-English apps from both the *Google Play* and the *App Store* data.

## 6. Isolating the Free Apps from the Paid Apps

As mentioned earlier, the company only builds free apps, and its main revenue comes from in-app ads.

Below we separate the free apps from the paid apps for both *Google Play* and the *App Store*.

In [32]:
android_free_app = []
ios_free_app = []

for row in android_English_App:
    price = row[7]
    if price == '0':
        android_free_app.append(row)
    
for row in apple_English_App:
    price = row[4]
    if price == '0.0':    
        ios_free_app.append(row)
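The two stores encode a free app differently (`'0'` on Google Play, `'0.0'` on the App Store), so the string comparisons above need one literal per store. As a hedged sketch, a numeric comparison handles both. `is_free()` is a hypothetical helper, and the assumption that paid prices may carry a leading `$` (for example `'$4.99'`) is not confirmed by the rows shown in this notebook.

```python
def is_free(price_text):
    """Return True when a price string such as '0', '0.0' or '$4.99' is zero.

    Hypothetical helper. Stripping '$' assumes paid prices may be written
    with a leading dollar sign; a non-numeric value raises ValueError.
    """
    cleaned = price_text.strip().replace("$", "")
    return float(cleaned) == 0.0


print(is_free("0"))      # True
print(is_free("0.0"))    # True
print(is_free("$4.99"))  # False
```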


Let us apply *explore_data()* function on both the lists and look at the number of rows and columns.

In [33]:
explore_data(android_free_app, 0, 3, rows_and_columns=True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows 8864
Number of columns 13


In [34]:
explore_data(ios_free_app, 0, 3, rows_and_columns=True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows 3222
Number of columns 16


## 7. Most Common Apps by Genre

Our goal is to find an app profile that attracts users on both the *App Store* and *Google Play*. Once we identify such a profile, we would like to validate our recommendation by first building an app that fits the profile on one platform (e.g. Android), observing its usage and, if it is successful, porting the app to the other platform.

**Our validation strategy for an app idea has 3 steps:**

1) Build a minimal Android version of the app and add it to *Google Play*.

2) If the app has a good response from users, develop it further.

3) If the app is profitable after six months on *Google Play*, build an iOS version of the app and add it to the *App Store*.



**To generate frequency tables of the most common genres we can use:**

For Google Play: `Column 2 (Category)` and `Column 10 (Genres)`

For the App Store: `Column 12 (prime_genre)`



### 7.1 Generating a Frequency Table

First we define a function that creates a frequency table. The table shows the percentage of apps that fall into each genre.

In [35]:
def freq_table(dataset, index): 
    dict_freq = {}
    total = 0
    
    for row in dataset:
        total += 1
        genre = row[index]
        
        if genre in dict_freq:
            dict_freq[genre] += 1
        else:
            dict_freq[genre] = 1         
                   
    
    dict_freq_percentage = {}
    for value in dict_freq:
        dict_freq[value] /= total
        percentage = dict_freq[value] * 100
        dict_freq_percentage[value] = percentage
        
    return dict_freq_percentage


We create one more function called *display_table()*, which prints the genre percentages in descending order.

In [36]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []    
    
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
    
    
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
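The standard library's `collections.Counter` can do the counting and the sorting in one place. The sketch below is an alternative to the two functions above, not a replacement for them. `freq_table_sorted()` and the sample rows are hypothetical.

```python
from collections import Counter


def freq_table_sorted(dataset, index):
    """Return (value, percentage) pairs sorted from most to least common.

    Hypothetical helper. Ties are returned in the order in which the values
    were first seen, which can differ from the ordering of display_table().
    """
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return [(value, count / total * 100) for value, count in counts.most_common()]


# Made-up rows: three games and one tool.
sample = [["a", "GAME"], ["b", "GAME"], ["c", "TOOLS"], ["d", "GAME"]]

for value, percentage in freq_table_sorted(sample, 1):
    print(value, ":", percentage)

# GAME : 75.0
# TOOLS : 25.0
```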

Below we apply the *display_table()* function to *'android Category'*, *'android Genre'* and *'ios prime_genre'* and observe the output.

In [37]:
# display_table() prints the table itself and returns None,
# so it is called directly rather than wrapped in print()
print('android Category:')
display_table(android_free_app, 1)
print('\n')
print('android Genre:')
display_table(android_free_app, 9)
print('\n')
print('ios prime_genre:')
display_table(ios_free_app, 11)


android Category:
FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989

### 7.2 Examining the Frequency Table

**Analysis of the frequency table generated for the prime_genre column of the App Store data set:**

* The most common genre is *Games* (more than 50% of the apps), followed by *Entertainment*.

* The majority of the apps (more than 70%) are designed for entertainment rather than for practical purposes.

* It is premature to recommend an app profile based on this frequency table alone. The table counts how many apps exist in each genre and says nothing about how many users those apps have.


**Analysis of the frequency tables generated for the Category and Genres columns of the Google Play data set:**

* In the *Category* column, *Family* is at the top of the table with *~19%*. In the *Genres* column, *Tools* is at the top with *~8%*, and *Entertainment* is also among the most common genres.

* Here the split between apps with a *practical purpose* and apps with an *entertainment purpose* is almost even, unlike in the *App Store*.

* As with the *App Store*, we cannot recommend an app profile for *Google Play* from these tables alone, because they are based on *genre/category* counts and not on any user information.

## 8. Most Popular Apps by Genre on the App Store

In this section we find the *most popular genres*, that is, the genres whose apps have the most users on average. For *Google Play*, this can be estimated from the *Installs* column. The *App Store* data has no install counts, so we use the total number of user ratings (column *rating_count_tot*) as a proxy.

Below we list each *genre* with its *average number of user ratings* per app for the *App Store*.

In [38]:
unique_genre = freq_table(ios_free_app, 11) 

for genre in unique_genre:
    total = 0 
    len_genre = 0 
    
    for row in ios_free_app:
        genre_app = row[11]
        if genre_app == genre:
            rating_counts = float(row[5])
            total += rating_counts
            len_genre += 1
    
    
    avg_rating_counts = total / len_genre
    print(genre, ':', avg_rating_counts)

Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0
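The nested loop above scans the whole dataset once per genre. As a sketch of an alternative, the averages can be accumulated in a single pass with two dictionaries. `average_by_group()` is a hypothetical helper, and the sample rows are abbreviated to three fields (name, genre, rating count) using values that appear in this notebook.

```python
from collections import defaultdict


def average_by_group(dataset, group_idx, value_idx):
    """Return {group: average of the numeric column} in one pass.

    Hypothetical helper. Values are converted with float(), so a
    non-numeric entry raises ValueError.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        group = row[group_idx]
        totals[group] += float(row[value_idx])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Abbreviated rows: [name, genre, rating count]
sample = [
    ["Waze", "Navigation", "345046"],
    ["Google Maps", "Navigation", "154911"],
    ["Facebook", "Social Networking", "2974676"],
]

averages = average_by_group(sample, 1, 2)
print(averages)  # {'Navigation': 249978.5, 'Social Networking': 2974676.0}
```

With the full data the call would look like `average_by_group(ios_free_app, 11, 5)`.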


Here we recommend an app profile for the *App Store* based on the number of user ratings.

The table of averages above shows *five* genres with an average of more than *50000* user ratings per app. We list them in descending order:

* Navigation
* Reference 
* Social Networking
* Music
* Weather

A few other genres with an average of more than *30000* ratings are listed below in descending order:

* Book
* Food & drink
* Finance

Some genres that fall just below *30000* are listed below:

* Travel
* Photo & Video
* Shopping

Below we list the *Navigation* apps with their rating counts for the *App Store*, as this genre has the highest average rating count (*~86000*).

In [39]:
for row in ios_free_app:
    if row[11] == 'Navigation':
        print(row[1], ':', row[5])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5
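The list shows that two apps account for almost all of the Navigation ratings. As an illustrative check (an addition to the notebook's method, which relies on the mean only), the sketch below compares the mean with the median for the six rating counts printed above.

```python
from statistics import mean, median

# Rating counts of the six Navigation apps listed above.
navigation_rating_counts = [345046, 154911, 12811, 3582, 187, 5]

nav_mean = mean(navigation_rating_counts)
nav_median = median(navigation_rating_counts)

print(round(nav_mean, 2))  # 86090.33
print(nav_median)          # 8196.5
```

The mean matches the genre average reported earlier, while the median is roughly a tenth of it, which suggests that a genre average can be dominated by a few very large apps.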


## 9. Most Popular Apps by Genre: Google Play

In this section we find the *most popular genres on Google Play*. Here the data has an *Installs* column, so the exercise is more direct: we list each *category* with its *average number of installs* per app.

In [40]:
unique_app_genre_android = freq_table(android_free_app, 1)

for category in unique_app_genre_android:
    total = 0
    len_category = 0
    
    for row in android_free_app:
        category_app = row[1]
        
        if category_app == category:
            n_installs = row[5]
            n_installs = n_installs.replace('+', '')
            n_installs = n_installs.replace(',', '')
            n_installs = float(n_installs)
            total += n_installs 
            len_category += 1   
            
    avg_no_n_installs = total / len_category
    print(category, ':', avg_no_n_installs)

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_

Below we recommend an app profile for *Google Play* based on the number of installs.

The table of averages above shows *nine* categories with an average of more than *10000000* installs per app. We list them in descending order.

* Communication
* Video-Players
* Social
* Photography
* Productivity
* Game
* Travel & Local
* Entertainment
* Tools

Below we list a few *Communication* apps from *Google Play* (this category has an average of *~38456119* installs per app).

In [41]:
for row in android_free_app[:290]:
    if row[1]== 'COMMUNICATION':
        print(row[0], ':', row[5])

WhatsApp Messenger : 1,000,000,000+
Messenger for SMS : 10,000,000+
My Tele2 : 5,000,000+
imo beta free calls and text : 100,000,000+
Contacts : 50,000,000+
Call Free – Free Call : 5,000,000+
Web Browser & Explorer : 5,000,000+
Browser 4G : 10,000,000+
MegaFon Dashboard : 10,000,000+
ZenUI Dialer & Contacts : 10,000,000+
Cricket Visual Voicemail : 10,000,000+
TracFone My Account : 1,000,000+
Xperia Link™ : 10,000,000+
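The install counts printed above are strings such as `'1,000,000,000+'`. The conversion used earlier in this section (removing `+` and `,` before converting to a number) can be isolated in a small function, which makes it easier to test. `parse_installs()` is a hypothetical helper name.

```python
def parse_installs(text):
    """Convert an install string such as '1,000,000+' to an integer.

    Hypothetical helper. The result is a lower bound: '1,000,000+' means
    at least one million installs, not exactly one million.
    """
    return int(text.replace(",", "").replace("+", ""))


print(parse_installs("1,000,000,000+"))  # 1000000000
print(parse_installs("5,000,000+"))      # 5000000
```

Because each value is only a lower bound, the category averages should be read as rough estimates rather than precise figures.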


Based on our analysis, we have come up with a list of genres that are popular on both the *App Store* and *Google Play*.

* Social/Social Networking 
* Photography/Photo & Video
* Travel & Local
* Entertainment/Music
* Book & Reference
* Finance
* Communication
* Games
* Food & Drink

## Conclusion

In this project we analyzed app data from the *App Store* and *Google Play*. The objective was to recommend profiles of free apps that are likely to be profitable on both stores. The genres that look promising on both stores are listed below:

* Social/Social Networking 
* Photography/Photo & Video
* Travel & Local
* Entertainment/Music
* Book & Reference
* Finance
* Communication
* Games
* Food & Drink
