# Profitable Apps on Google Play and the Apple App Store
---

## Background 

For the sake of this project, I'll pretend to be a data analyst for a company that develops Android and iOS mobile applications. Our apps may be found on Google Play as well as the Apple App Store.

As a company, we only develop apps that are free to download and install, and in-app advertising is our main source of revenue. In other words, the revenue from each app depends on the number of users who see and engage with its ads: the more users, the better. The goal of this project is to analyze data to help our developers identify which kinds of apps are most likely to attract more users.

## Data Understanding

As of September 2018, there were roughly 2 million iOS applications accessible on the Apple App Store and 2.1 million Android apps available on Google Play.
Collecting data for over four million applications would take a considerable amount of time and money, so I'll examine a sample of the data instead. To avoid spending money on collecting new data, I first checked whether any relevant existing data could be acquired for free. Fortunately, two datasets seem suitable for my purposes:

* A dataset providing information on roughly 10,000 Android apps obtained from Google Play in August 2018. You may directly download the data set from this [URL.](https://raw.githubusercontent.com/Clifton-Lee/Clifton-Lee/master/googleplaystore.csv)

* A dataset providing information on roughly 7,000 iOS apps obtained from the Apple App Store in July 2017. You may directly download the data set from this [URL.](https://github.com/Clifton-Lee/Clifton-Lee/blob/master/AppleStore.csv)

### Open and Explore the data

I'll begin by opening the two datasets listed above and exploring each of them.

In [1]:
# Open the Apple App Store dataset as a list of lists
from csv import reader

# The filename matches the download link above; adjust the path if your local copy is named differently.
with open('AppleStore.csv', encoding='utf8') as opened_apple_data:
    apple_data = list(reader(opened_apple_data))

# Open the Google Play dataset as a list of lists
with open('googleplaystore.csv', encoding='utf8') as opened_google_data:
    google_data = list(reader(opened_google_data))


To make the datasets easier to examine, I wrote a function called `explore_data()` that I can use repeatedly to display rows in a readable format.

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # print() adds its own newline, so this leaves two blank lines after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

First few rows of Apple Apps dataset:

In [3]:
# Header row
print(apple_data[0])

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


In [4]:
explore_data(apple_data[1:],0,3,True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


This dataset contains 7197 iOS applications. The fields of interest are `track_name`, `currency`, `price`, `rating_count_tot`, `rating_count_ver`, and `prime_genre`. Not all column names are self-explanatory, but each column is described in the dataset [documentation.](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home)

First few rows of Google Play dataset:

In [5]:
# Header row
print(google_data[0])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


In [6]:
explore_data(google_data[1:],0,3,True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


The Google Play data collection contains 10841 applications and 13 columns. At a glance, the columns that may be helpful for my study are `App`, `Category`, `Reviews`, `Installs`, `Type`, `Price`, and `Genres`. Information about each column can be found in the dataset [documentation.](https://www.kaggle.com/lava18/google-play-store-apps)

## Data Preparation

Before I begin my analysis, I must make sure that the data is accurate; otherwise the findings of my analysis will be wrong. This means I'll have to do the following:

- Identify incorrect data and fix or delete it.
- Identify and delete duplicate data.

Remember that my company only creates applications that are free to download and install, and only for an English-speaking audience. This means I'll also have to do the following:

- Remove any applications that are not in English.
- Remove any applications that aren't free.

### Remove Incorrect Data Entry 

The Google Play dataset has a dedicated [discussion area](https://www.kaggle.com/lava18/google-play-store-apps/discussion), and one of the discussions describes an error in a specific row (row 10472).

I'll now check whether that report is accurate and make any required adjustments to the dataset. The user who reported the error may or may not have counted the header row, so the index in my data might be off by one.

In [7]:
print(google_data[0],end = '\n\n') # print the column names
explore_data(google_data,10471,10474) # print rows 10471 to 10473 in order to identify the error 

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']

['Jazz Wi-Fi', 'COMMUNICATION', '3.4', '49', '4.0M', '10,000+', 'Free', '0', 'Everyone', 'Communication', 'February 10, 2017', '0.1', '2.3 and up']


['Xposed Wi-Fi-Pwd', 'PERSONALIZATION', '3.5', '1042', '404k', '100,000+', 'Free', '0', 'Everyone', 'Personalization', 'August 5, 2014', '3.0.0', '4.0.3 and up']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']




After investigating, I found the error and the row it occurs on: the row is missing its `Category` value, so every later value is shifted one column to the left. Because the user who reported it did not include the header row in the count, the error is actually on row 10473 in my data. Since the row is malformed, I'm going to delete it.

In [8]:
del google_data[10473] 

There were no reports of incorrect data in the Apple App Store [discussion](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/discussion) area, so I'll continue with that dataset as it is for now.
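A more general check, sketched here as an assumption rather than a step I actually ran, is to compare the length of every row with the length of the header. The faulty Google Play row shown above has 12 values where the header has 13, so a check like this, run before the deletion, would flag it without relying on a reported index. The helper name `find_malformed_rows` and the three-column sample are hypothetical.

```python
def find_malformed_rows(dataset):
    """Return (index, row_length) pairs for rows whose length differs from the header's."""
    expected = len(dataset[0])  # the header row defines the expected column count
    return [(i, len(row)) for i, row in enumerate(dataset) if len(row) != expected]


# Hypothetical three-column sample: the last row is missing its 'Category' value.
sample = [
    ['App', 'Category', 'Rating'],
    ['Jazz Wi-Fi', 'COMMUNICATION', '3.4'],
    ['Life Made WI-Fi Touchscreen Photo Frame', '1.9'],
]
print(find_malformed_rows(sample))
```

On the real data the call would be `find_malformed_rows(google_data)` or `find_malformed_rows(apple_data)`; an empty list means every row has the same number of columns as the header.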

### Remove Duplicates


#### Google Play Store

In the preceding step, I began the data cleaning process by deleting a row from the Google Play dataset that included erroneous data. I went through the Google Play data set and the comments area, and I noticed that several applications had duplicate listings. Instagram, for example, has four entries:

In [9]:
for app in google_data[1:]:
    name = app[0]
    if name == 'Instagram':
        print(app, end = '\n\n')

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']

['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']

['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']



When I examine the data, I don't want to count specific applications more than once, therefore I need to eliminate the redundant entries and retain just one record per app. In order to do so I will first need to identify all the duplicate applications:

In [10]:
duplicate_apps = []
unique_apps = []

for app in google_data[1:]:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print('Number of duplicate apps: ', len(duplicate_apps), end= '\n\n')
print('Example of duplicate apps: ', duplicate_apps[:5])

Number of duplicate apps:  1181

Example of duplicate apps:  ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings']


Rather than removing duplicates at random, I'll keep only the row with the highest number of reviews for each app and delete its other entries. The higher the review count, the more recent the data is likely to be.

In the previous step, I looped through the Google Play dataset and found that there are 1,181 duplicates. After I remove the duplicates, I should be left with 9,659 rows:

In [11]:
print('Expected length: ', len(google_data[1:]) - 1181)

Expected length:  9659


To remove the duplicates, I will do the following:

* Create a dictionary, where each dictionary key is a unique app name and the corresponding dictionary value is the highest number of reviews of that app.
* Use the information stored in the dictionary and create a new dataset, which will have only one entry per app (and for each app, I'll only select the entry with the highest number of reviews).

In [12]:
# Recall the column names and the position 
print(google_data[0])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


In [13]:
reviews_max = {}
for app in google_data[1:]:
    name = app[0]
    n_reviews = float(app[3])
    if (name in reviews_max) and (reviews_max[name] < n_reviews):
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
        
#check if it stored the correct instagram app, the reviews count should be '66577446'        
print('Highest Instagram reviews are: ', reviews_max['Instagram'], end = '\n\n') 
print('The Length is: ',len(reviews_max))

Highest Instagram reviews are:  66577446.0

The Length is:  9659


Everything seems to be in order. I will now use the dictionary created above to remove the duplicate rows.

In [14]:
android_clean = []
already_added = []

for app in google_data[1:]:
    name = app[0]
    n_reviews = float(app[3])
    if (n_reviews == reviews_max[name]) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)


explore_data(android_clean,0,3,True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13
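
The two-step approach above checks membership in a list for every row, which gets slower as the list grows. A possible single-pass alternative, sketched under the assumption that the app name is in column 0 and the review count in column 3, stores the best row per app directly in a dictionary. The function name `keep_most_reviewed` and the trimmed sample rows are hypothetical.

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the row with the highest review count."""
    best = {}  # app name -> row with the most reviews seen so far
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' keeps the first row when two rows tie on review count.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())  # dicts preserve insertion order (Python 3.7+)


# Hypothetical sample rows trimmed to four columns (App, Category, Rating, Reviews).
sample = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
]
deduplicated = keep_most_reviewed(sample)
print(len(deduplicated))
print(deduplicated[0])
```

The result is ordered by the first appearance of each app name, so the row order may differ from `android_clean` even though the same rows are kept.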


#### Apple App Store

In the previous step, I managed to remove the duplicate app entries in the Google Play dataset. I will check if I have to do the same for the Apple App dataset.

In [15]:
duplicate_apps = []
unique_apps = []

for app in apple_data[1:]:
    app_id = app[0]  # named app_id to avoid shadowing the built-in id()
    if app_id in unique_apps:
        duplicate_apps.append(app_id)
    else:
        unique_apps.append(app_id)
        
print('Number of duplicate apps: ', len(duplicate_apps), end= '\n\n')
print('Example of duplicate apps: ', duplicate_apps[:5])

Number of duplicate apps:  0

Example of duplicate apps:  []


I don't need to do the same for the Apple App Store data because there are no duplicates. I checked using the `id` column (not the `track_name` column) because `id` is the primary key of the dataset.

### Remove Non-English Apps

Remember that my company creates applications in English, and I'd like to analyze only apps that are intended for an English-speaking audience. However, when I looked more closely at the data, I found that both datasets include applications whose names suggest they're not intended for an English-speaking audience.

In [16]:
print(apple_data[814][1])
print(apple_data[6732][1])

爱奇艺PPS -《欢乐颂2》电视剧热播
【脱出ゲーム】絶対に最後までプレイしないで 〜謎解き＆ブロックパズル〜


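I'm not interested in keeping these apps, so I'll remove them. One way to do this is to look for characters that aren't commonly used in English text. English text usually includes letters from the English alphabet, digits from 0 to 9, punctuation marks (., !, ?, ;), and other symbols (+, *, /), all of which fall in the ASCII range (code points 0 to 127). Because some English app names also contain an emoji or a symbol such as ™, I'll only remove an app if its name contains more than three characters outside that range.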

In [17]:
def is_english(string):
    '''Return True if the string contains at most three non-ASCII characters (code point above 127).'''
    characters = []
    for letter in string:
        if ord(letter) > 127:
            characters.append(letter)
    
    return len(characters) <= 3

Examples of the function results:

In [18]:
print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

True
False
True
True
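
To see why the function tolerates up to three non-ASCII characters, it helps to count them for the same four names. This is a small illustrative sketch; `count_non_ascii` is a hypothetical helper, not part of the analysis above.

```python
def count_non_ascii(string):
    """Count the characters whose code point falls outside the ASCII range (0-127)."""
    return sum(1 for character in string if ord(character) > 127)


names = ['Instagram', 'Docs To Go™ Free Office Suite', 'Instachat 😜',
         '爱奇艺PPS -《欢乐颂2》电视剧热播']
for name in names:
    print(count_non_ascii(name), name)
```

The names with a single ™ or emoji stay well under the limit, while the Chinese title exceeds it, so a cutoff of three keeps the English names and drops the non-English one. Names that mix English text with four or more such symbols would still be dropped, so the filter is a heuristic rather than a precise language check.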


Now that I know my function works quite well, I'll apply it to both datasets and filter out non-English applications.

In [19]:
clean_apple_data =[]
for app in apple_data[1:]:
    if is_english(app[1]):
        clean_apple_data.append(app)
        
clean_google_data = []
for app in android_clean:
    if is_english(app[0]):
        clean_google_data.append(app)
        
print('The number of English Apple apps is: ',len(clean_apple_data))
print('The number of English Google apps is: ',len(clean_google_data))

The number of English Apple apps is:  6183
The number of English Google apps is:  9614


So far in the data cleaning process, I've done the following:

- Removed inaccurate data
- Removed duplicate app entries
- Removed non-English apps

As I mentioned in the introduction, we only build apps that are free to download and install, and our main source of revenue consists of in-app ads. My datasets contain both free and non-free apps; I'll need to isolate only the free apps for my analysis.

Isolating the free apps will be my last step in the data cleaning process. 

### Isolate the Free Apps

In [20]:
free_apple_apps = []
for app in clean_apple_data:
    price = float(app[4])
    if price == 0:
        free_apple_apps.append(app)
        
free_google_apps = []
for app in clean_google_data:
    if '$' in app[7]:
        price = float(app[7].replace('$',''))
    else:
        price = float(app[7])
    if price == 0:
        free_google_apps.append(app)
        
print('The number of free English Apple apps is: ',len(free_apple_apps))
print('The number of free English Google apps is: ',len(free_google_apps))

The number of free English Apple apps is:  3222
The number of free English Google apps is:  8864


## Data Analysis

To minimize risks and overhead, our validation strategy for an app idea has three steps:

1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we develop it further.
3. If the app is profitable after six months, we build an iOS version of the app and add it to the Apple App Store.

Because our end goal is to add the app on both Google Play and the Apple App Store, I need to find app profiles that are successful in both markets. For instance, a profile that works well for both markets might be a productivity app that makes use of gamification.

I'll begin the analysis by determining the most common genres for each market. For this, I'll need to build frequency tables for a few columns in my datasets.

In [21]:
print('Remember the column headers are:', end = '\n\n')
print('Apple Apps Data Columns:')
print(apple_data[0],end = '\n\n')
print('Google Apps Data Columns:')
print(google_data[0])

Remember the column headers are:

Apple Apps Data Columns:
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']

Google Apps Data Columns:
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


Based on the columns in both datasets, I'll need to create a frequency table for the Apple App Store dataset's `prime_genre` column and the Google Play dataset's `Genres` and `Category` columns.

I'll create two functions to help me examine the frequency tables:

- One function for creating frequency tables with percentages.
- Another function for displaying the percentages in descending order.


In [22]:
def freq_table(dataset,index):
    '''Create a percentage frequency table for one column of a list of lists (without the header row).'''
    table = {}
    # creates the frequency of each item in the column.
    for row in dataset:
        key = row[index]
        if key in table:
            table[key] += 1
        else:
            table[key] = 1 
    # converts the numbers of each item into percentages of total. 
    for item in table:
        table[item] = round((table[item] / len(dataset)) * 100,2)
        
    return table
    

In [23]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
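
For comparison, the same percentage table can be built with `collections.Counter` from the standard library. This is only an equivalent sketch, not the function used in the rest of this notebook; the name `freq_table_counter` and the four-row sample are hypothetical.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Percentage frequency table for one column, built with collections.Counter."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return {key: round(count / total * 100, 2) for key, count in counts.items()}


# Hypothetical four-row sample with the genre in column 0.
sample = [['Games'], ['Games'], ['Games'], ['Music']]
table = freq_table_counter(sample, 0)
for genre, percentage in sorted(table.items(), key=lambda item: item[1], reverse=True):
    print(genre, ':', percentage)
```

Sorting by percentage alone leaves tied entries in their original order, whereas `display_table` sorts `(percentage, key)` tuples and therefore breaks ties by name in reverse alphabetical order.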

Now I'll use the `display_table` function to show the frequency table of the columns `prime_genre`, `Genres`, and `Category`.

In [24]:
display_table(free_apple_apps,11) #prime_genre column in the Apple App dataset

Games : 58.16
Entertainment : 7.88
Photo & Video : 4.97
Education : 3.66
Social Networking : 3.29
Shopping : 2.61
Utilities : 2.51
Sports : 2.14
Music : 2.05
Health & Fitness : 2.02
Productivity : 1.74
Lifestyle : 1.58
News : 1.33
Travel : 1.24
Finance : 1.12
Weather : 0.87
Food & Drink : 0.81
Reference : 0.56
Business : 0.53
Book : 0.43
Navigation : 0.19
Medical : 0.19
Catalogs : 0.12


In [25]:
display_table(free_google_apps,9) # Genres column in the Google Play dataset

Tools : 8.45
Entertainment : 6.07
Education : 5.35
Business : 4.59
Productivity : 3.89
Lifestyle : 3.89
Finance : 3.7
Medical : 3.53
Sports : 3.46
Personalization : 3.32
Communication : 3.24
Action : 3.1
Health & Fitness : 3.08
Photography : 2.94
News & Magazines : 2.8
Social : 2.66
Travel & Local : 2.32
Shopping : 2.25
Books & Reference : 2.14
Simulation : 2.04
Dating : 1.86
Arcade : 1.85
Video Players & Editors : 1.77
Casual : 1.76
Maps & Navigation : 1.4
Food & Drink : 1.24
Puzzle : 1.13
Racing : 0.99
Role Playing : 0.94
Libraries & Demo : 0.94
Auto & Vehicles : 0.93
Strategy : 0.91
House & Home : 0.82
Weather : 0.8
Events : 0.71
Adventure : 0.68
Comics : 0.61
Beauty : 0.6
Art & Design : 0.6
Parenting : 0.5
Card : 0.45
Casino : 0.43
Trivia : 0.42
Educational;Education : 0.39
Board : 0.38
Educational : 0.37
Education;Education : 0.34
Word : 0.26
Casual;Pretend Play : 0.24
Music : 0.2
Racing;Action & Adventure : 0.17
Puzzle;Brain Games : 0.17
Entertainment;Music & Video : 0.17
Casual;

In [26]:
display_table(free_google_apps,1) # Category column in the Google Play dataset

FAMILY : 18.91
GAME : 9.72
TOOLS : 8.46
BUSINESS : 4.59
LIFESTYLE : 3.9
PRODUCTIVITY : 3.89
FINANCE : 3.7
MEDICAL : 3.53
SPORTS : 3.4
PERSONALIZATION : 3.32
COMMUNICATION : 3.24
HEALTH_AND_FITNESS : 3.08
PHOTOGRAPHY : 2.94
NEWS_AND_MAGAZINES : 2.8
SOCIAL : 2.66
TRAVEL_AND_LOCAL : 2.34
SHOPPING : 2.25
BOOKS_AND_REFERENCE : 2.14
DATING : 1.86
VIDEO_PLAYERS : 1.79
MAPS_AND_NAVIGATION : 1.4
FOOD_AND_DRINK : 1.24
EDUCATION : 1.16
ENTERTAINMENT : 0.96
LIBRARIES_AND_DEMO : 0.94
AUTO_AND_VEHICLES : 0.93
HOUSE_AND_HOME : 0.82
WEATHER : 0.8
EVENTS : 0.71
PARENTING : 0.65
ART_AND_DESIGN : 0.64
COMICS : 0.62
BEAUTY : 0.6


### Observation

Now I'll concentrate on studying these frequency tables:

`prime_genre`:

`Games` is the most common genre among free English apps in the Apple App Store, accounting for **58.16%** of the free English app listings. The second most common genre is `Entertainment`, which accounts for **7.88%**.

Remember, I'm only interested in free English applications, so these observations don't extend beyond that subset.

Judging by these patterns, the bulk of the applications are leisure apps (`Games`, `Entertainment`, `Photo & Video`, etc.), which suggests that most free English apps on the App Store are intended for entertainment.

Although the majority of the free English applications in the Apple App Store are games, I can't recommend an app profile for the Apple App Store market based on that frequency alone. A large number of applications in a genre does not necessarily mean that apps in that genre have a large number of users, so I'll need to look at user numbers as well before coming to a decision.


`Category` and `Genres`:

The most common genres among free English Google Play apps are `Tools`, `Entertainment`, and `Education`, with shares of **8.45%**, **6.07%**, and **5.35%** respectively. In the broader `Category` column, the largest category is `FAMILY` with **18.91%**, followed by `GAME` with **9.72%** and `TOOLS` with **8.46%**.

When comparing the most common free English applications in the Google Play store to those in the Apple App Store, gaming apps are in the top three in both datasets (`Games` in `prime_genre` and `GAME` in `Category`). With this knowledge, I'm tempted to investigate the user adoption of gaming applications to better understand why these apps are so prevalent in both datasets. However, based on what I've found so far, I still can't recommend a specific app profile.

#### Further Analysis 

##### Apple App Store 

Now, I would like to determine the kind of apps with the most users. 

One way to find out what genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play dataset, we can find this information in the `Installs` column, but this information is missing for the Apple App Store dataset. As a workaround, I'll take the total number of user ratings as a proxy, which I can find in the `rating_count_tot` column.

I'll start with calculating the average number of user ratings per app genre on the Apple App Store. To do that, I'll need to do the following:

- Isolate the apps of each genre
- Add up the user ratings for the apps of that genre
- Divide the sum by the number of apps belonging to that genre (not by the total number of apps)

In [27]:
# Average number of user ratings per genre in the Apple App Store
unique_genre = freq_table(free_apple_apps,11)
for genre in unique_genre:
    total = 0
    len_genre = 0
    for app in free_apple_apps:
        genre_app = app[11]
        if genre_app == genre:
            user_rating_count = float(app[5])
            total += user_rating_count
            len_genre += 1
    avg_user_rating_count = round(total / len_genre,2)
    print(genre, " : ", avg_user_rating_count)

Social Networking  :  71548.35
Photo & Video  :  28441.54
Games  :  22788.67
Music  :  57326.53
Reference  :  74942.11
Health & Fitness  :  23298.02
Weather  :  52279.89
Utilities  :  18684.46
Travel  :  28243.8
Shopping  :  26919.69
News  :  21248.02
Navigation  :  86090.33
Lifestyle  :  16485.76
Entertainment  :  14029.83
Food & Drink  :  33333.92
Sports  :  23008.9
Book  :  39758.5
Finance  :  31467.94
Education  :  7003.98
Productivity  :  21028.41
Business  :  7491.12
Catalogs  :  4004.0
Medical  :  612.0
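
The loop above scans the whole dataset once per genre and prints the averages in no particular order. A possible single-pass variant that also sorts the result is sketched below; `average_by_group` and the sample rows are hypothetical, and the column positions are passed in as parameters.

```python
def average_by_group(dataset, group_index, value_index):
    """Average a numeric column per group in a single pass over the rows."""
    totals = {}  # group -> running sum of the value column
    counts = {}  # group -> number of rows seen for that group
    for row in dataset:
        group = row[group_index]
        totals[group] = totals.get(group, 0.0) + float(row[value_index])
        counts[group] = counts.get(group, 0) + 1
    # Divide by the number of rows in each group, not by the total number of rows.
    return {group: round(totals[group] / counts[group], 2) for group in totals}


# Hypothetical sample: genre in column 0, rating count in column 1.
sample = [['Games', '100'], ['Games', '300'], ['Weather', '50']]
averages = average_by_group(sample, 0, 1)
for genre, average in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(genre, ':', average)
```

On the real data the call would be `average_by_group(free_apple_apps, 11, 5)`, which should give the same averages as the loop above, listed from highest to lowest.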


Based on the average number of user ratings above, `Navigation`, `Reference`, `Social Networking`, `Music`, and `Weather` are the genres with the most ratings per app. I will now look at the Google Play Store dataset and then assess which app profile to recommend for both stores.

##### Google Play Store

In the previous step, I observed the possible app profiles for the App Store based on the number of user ratings. I have data about the number of installs for the Google Play market, so I should be able to get a clearer picture about genre popularity. However, the install numbers don't seem precise enough — I can see that most values are open-ended (100+, 1,000+, 5,000+, etc.):

In [28]:
display_table(free_google_apps, 5) # Installs column

1,000,000+ : 15.73
100,000+ : 11.55
10,000,000+ : 10.55
10,000+ : 10.2
1,000+ : 8.39
100+ : 6.92
5,000,000+ : 6.83
500,000+ : 5.56
50,000+ : 4.77
5,000+ : 4.51
10+ : 3.54
500+ : 3.25
50,000,000+ : 2.3
100,000,000+ : 2.13
50+ : 1.92
5+ : 0.79
1+ : 0.51
500,000,000+ : 0.27
1,000,000,000+ : 0.23
0+ : 0.05
0 : 0.01


For instance, I don't know whether an app with 100,000+ installs has 100,000 installs, 200,000, or 350,000. However, I don't need very precise data for our purposes — I only want to find out which app genres attract the most users.

I am going to leave the numbers as they are, which means that I'll consider that an app with 100,000+ installs has 100,000 installs, and an app with 1,000,000+ installs has 1,000,000 installs, and so on. To perform computations, however, I'll need to convert each install number from a string to a float. This means I need to remove the commas and the plus characters, or the conversion will fail and cause an error.
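
The string cleanup described above can be checked in isolation before it is used in a larger loop. This is a minimal sketch; `parse_installs` is a hypothetical helper name, and the buckets are taken from the frequency table above.

```python
def parse_installs(installs):
    """Convert an install bucket such as '1,000,000+' to its lower bound as a float."""
    return float(installs.replace(',', '').replace('+', ''))


for bucket in ['0', '100+', '100,000+', '1,000,000,000+']:
    print(bucket, '->', parse_installs(bucket))
```

Calling `float('100,000+')` directly raises a `ValueError`, which is why both characters have to be removed first.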

In [29]:
# Average number of installs per category in the Google Play Store
unique_google_genre = freq_table(free_google_apps,1)
for category in unique_google_genre:
    total = 0
    len_category = 0
    for app in free_google_apps:
        category_app = app[1]
        if category_app == category:
            installs = app[5].replace('+','')
            installs = installs.replace(',','')
            total += float(installs)
            len_category += 1
    avg_install_count = round(total / len_category,2)
    print(category, " : ", avg_install_count)

ART_AND_DESIGN  :  1986335.09
AUTO_AND_VEHICLES  :  647317.82
BEAUTY  :  513151.89
BOOKS_AND_REFERENCE  :  8767811.89
BUSINESS  :  1712290.15
COMICS  :  817657.27
COMMUNICATION  :  38456119.17
DATING  :  854028.83
EDUCATION  :  1833495.15
ENTERTAINMENT  :  11640705.88
EVENTS  :  253542.22
FINANCE  :  1387692.48
FOOD_AND_DRINK  :  1924897.74
HEALTH_AND_FITNESS  :  4188821.99
HOUSE_AND_HOME  :  1331540.56
LIBRARIES_AND_DEMO  :  638503.73
LIFESTYLE  :  1437816.27
GAME  :  15588015.6
FAMILY  :  3695641.82
MEDICAL  :  120550.62
SOCIAL  :  23253652.13
SHOPPING  :  7036877.31
PHOTOGRAPHY  :  17840110.4
SPORTS  :  3638640.14
TRAVEL_AND_LOCAL  :  13984077.71
TOOLS  :  10801391.3
PERSONALIZATION  :  5201482.61
PRODUCTIVITY  :  16787331.34
PARENTING  :  542603.62
WEATHER  :  5074486.2
VIDEO_PLAYERS  :  24727872.45
NEWS_AND_MAGAZINES  :  9549178.47
MAPS_AND_NAVIGATION  :  4056941.77


Based on the average number of installs above, `COMMUNICATION`, `VIDEO_PLAYERS`, `SOCIAL`, `PHOTOGRAPHY`, and `PRODUCTIVITY` are the categories with the most installs per app. I will now compare both datasets and draw my conclusion.

## Conclusion

To conclude, bear in mind that our aim is to recommend an app genre that has the potential to be profitable on both the Apple App Store and the Google Play Store. With that objective in mind, I decided that a **"social"** application would be the best fit for our developers to work on.

**Social** applications rank in the top three in both stores: third by average number of user ratings on the Apple App Store (`Social Networking`, about 71,548 ratings per app) and third by average installs on Google Play (`SOCIAL`, about 23.3 million installs per app). At the same time, they make up only **3.29%** and **2.66%** of the free English apps in the two stores, which suggests that the market for social apps is not oversaturated and that there is room for new entrants. In conventional markets this can be harder because incumbents try to build monopolies, but in the technology world a better app combined with effective marketing can push incumbents out of monopolistic positions.

It is likely that, beyond a basic version of a social media application, we will need to incorporate some additional features. These may include the *__capacity to host several video conferencing sessions, large media file transfers, and a larger group chat space, among other things.__*
