# Profitable App Profiles for the App Store and Google Play Markets

Our aim in this project is to find mobile app profiles that are profitable for the App Store and Google Play markets. We're working as data analysts for a company that builds Android and iOS mobile apps, and our job is to enable our team of developers to make data-driven decisions with respect to the kind of apps they build. 

At our company, we only build apps that are free to download and install, and our main source of revenue consists of in-app ads. This means that the revenue for any given app is mostly influenced by the number of people who use it. Our goal for this project is to analyze data to help our developers understand what kinds of apps are likely to attract more users.


This project will touch on the following topics:
* Pandas
* String methods
* Emojis
* Regular expressions
* Jupyter Notebook

## TASK 1: Opening and Exploring the Data

As of September 2018, there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play.

Collecting data for over four million apps requires a significant amount of time and money, so we'll try to analyze a sample of data instead. To avoid spending resources on collecting new data ourselves, we should first check whether relevant existing data is available at no cost. Luckily, here are two data sets that seem suitable for our purpose:

- [A data set](https://www.kaggle.com/lava18/google-play-store-apps) containing data about approximately ten thousand Android apps from Google Play. You can download the data set directly from [this link](https://dq-content.s3.amazonaws.com/350/googleplaystore.csv).
- [A data set](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps) containing data about approximately seven thousand iOS apps from the App Store. You can download the data set directly from [this link](https://dq-content.s3.amazonaws.com/350/AppleStore.csv).

Let's start by opening the two data sets and then continue with exploring the data.

In [2]:
!pip install emoji



In [3]:
import pandas as pd
import re
import string
import emoji

In [4]:
# TO DO:
# Read the two CSV files into DataFrames. pd.read_csv uses the first row as the header (column names) by default.
app_store = pd.read_csv('datasetcopy/AppleStore.csv')
google_store = pd.read_csv('datasetcopy/googleplaystore.csv')
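As a minimal illustration of the header behaviour mentioned in the comment above, the sketch below reads a small in-memory CSV with the default `header='infer'`, which takes the first line as column names. The two rows are a hypothetical excerpt built from values shown later in this notebook, not the real file.

```python
import io

import pandas as pd

# Hypothetical two-row excerpt standing in for AppleStore.csv.
csv_text = (
    "track_name,price\n"
    "PAC-MAN Premium,3.99\n"
    "Evernote - stay organized,0.0\n"
)

# header='infer' (the default) turns the first line into column names.
df = pd.read_csv(io.StringIO(csv_text))

print(df.columns.tolist())  # ['track_name', 'price']
print(len(df))              # 2: the header line is not counted as a data row
```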

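Before cleaning anything, we explore both data sets to understand their structure. Note that several cells in this section (execution counts 42 to 71) were run after the cleaning steps in Tasks 2 to 4, so their outputs describe the cleaned data: 9,659 Google Play rows instead of 10,841, and 6,920 App Store rows instead of 7,197.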

In [42]:
# google_store information
google_store.info()

<class 'pandas.core.frame.DataFrame'>
Index: 9659 entries, 0 to 10840
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   App             9659 non-null   object 
 1   Category        9659 non-null   object 
 2   Rating          8196 non-null   float64
 3   Reviews         9659 non-null   object 
 4   Size            9659 non-null   object 
 5   Installs        9659 non-null   object 
 6   Type            9658 non-null   object 
 7   Price           9659 non-null   object 
 8   Content Rating  9659 non-null   object 
 9   Genres          9659 non-null   object 
 10  Last Updated    9659 non-null   object 
 11  Current Ver     9651 non-null   object 
 12  Android Ver     9657 non-null   object 
dtypes: float64(1), object(12)
memory usage: 1.0+ MB


In [43]:
#app_store information
app_store.info()

<class 'pandas.core.frame.DataFrame'>
Index: 6920 entries, 0 to 7196
Data columns (total 16 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   id                6920 non-null   int64  
 1   track_name        6920 non-null   object 
 2   size_bytes        6920 non-null   int64  
 3   currency          6920 non-null   object 
 4   price             6920 non-null   float64
 5   rating_count_tot  6920 non-null   int64  
 6   rating_count_ver  6920 non-null   int64  
 7   user_rating       6920 non-null   float64
 8   user_rating_ver   6920 non-null   float64
 9   ver               6920 non-null   object 
 10  cont_rating       6920 non-null   object 
 11  prime_genre       6920 non-null   object 
 12  sup_devices.num   6920 non-null   int64  
 13  ipadSc_urls.num   6920 non-null   int64  
 14  lang.num          6920 non-null   int64  
 15  vpp_lic           6920 non-null   int64  
dtypes: float64(3), int64(8), object(5)
memory usage

In [44]:
# google_store description
google_store.describe()

Unnamed: 0,Rating
count,8196.0
mean,4.173243
std,0.536625
min,1.0
25%,4.0
50%,4.3
75%,4.5
max,5.0


In [45]:
# app_store description
app_store.describe()

Unnamed: 0,id,size_bytes,price,rating_count_tot,rating_count_ver,user_rating,user_rating_ver,sup_devices.num,ipadSc_urls.num,lang.num,vpp_lic
count,6920.0,6920.0,6920.0,6920.0,6920.0,6920.0,6920.0,6920.0,6920.0,6920.0,6920.0
mean,858977200.0,202313100.0,1.765039,13406.67,478.747977,3.621604,3.355058,37.316474,3.747688,5.587428,0.993353
std,272156300.0,361874900.0,5.926296,77196.19,3997.063968,1.420939,1.740858,3.789117,1.960864,8.027712,0.081266
min,281656500.0,589824.0,0.0,0.0,0.0,0.0,0.0,9.0,0.0,0.0,0.0
25%,595656600.0,47938820.0,0.0,38.0,2.0,3.5,3.0,37.0,3.0,1.0,1.0
50%,972422900.0,98302460.0,0.0,346.0,26.0,4.0,4.0,37.0,5.0,1.0,1.0
75%,1080950000.0,184330200.0,2.99,3069.75,150.0,4.5,4.5,38.0,5.0,8.0,1.0
max,1188376000.0,4025970000.0,299.99,2974676.0,177050.0,5.0,5.0,47.0,5.0,75.0,1.0


In [47]:
#checking for duplication
google_store.duplicated(keep=False).unique()

array([False])

In [46]:
#checking for duplication
app_store.duplicated(keep=False).unique()

array([False])
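The two checks above compare entire rows, so they only report rows that are identical in every column. Task 3 shows apps that appear several times with slightly different values (the two wetter.com rows differ only in `Reviews`), which a whole-row check cannot catch. A small sketch on toy rows modelled on those entries shows the difference between the two kinds of check:

```python
import pandas as pd

# Toy rows modelled on the wetter.com entries in Task 3: same app, different review counts.
df = pd.DataFrame({
    "App": [
        "wetter.com - Weather and Radar",
        "wetter.com - Weather and Radar",
        "Coloring book moana",
    ],
    "Reviews": ["189310", "189313", "967"],
})

# duplicated() compares whole rows, so near-duplicates are not flagged.
full_row_dupes = df.duplicated(keep=False)

# duplicated(subset="App") compares only the name column.
name_dupes = df.duplicated(subset="App", keep=False)

print(full_row_dupes.tolist())  # [False, False, False]
print(name_dupes.tolist())      # [True, True, False]
```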

In [67]:
def missing_values(data):
    """Return the number and percentage of null values in each column of `data`, most-missing first."""
    total = data.isnull().sum().sort_values(ascending=False)
    percentage = round((total / data.shape[0]) * 100, 2)

    missing_output = pd.concat([total, percentage], axis=1, keys=['Total', 'Percentage'])

    return missing_output

In [68]:
miss_values = missing_values(google_store)
miss_values

Unnamed: 0,Total,Percentage
Rating,1463,15.15
Current Ver,8,0.08
Android Ver,2,0.02
Type,1,0.01
App,0,0.0
Category,0,0.0
Reviews,0,0.0
Size,0,0.0
Installs,0,0.0
Price,0,0.0


In [70]:
def missing_values(data):
    """Function that checks for null values and computes the percentage of null values
    Args:
        data: dataframe - data whose missing values are to be determined
    Return:
        missing_output: dataframe - dataframe of total null values with corresponding percentages
    """
    total = data.isnull().sum().sort_values(ascending=False)
    percentage = round((total / data.shape[0]) * 100, 2)

    missing_output = pd.concat([total, percentage], axis=1, keys=['Total', 'Percentage'])

    return missing_output

In [71]:
miss_values = missing_values(app_store)
miss_values

Unnamed: 0,Total,Percentage
id,0,0.0
track_name,0,0.0
size_bytes,0,0.0
currency,0,0.0
price,0,0.0
rating_count_tot,0,0.0
rating_count_ver,0,0.0
user_rating,0,0.0
user_rating_ver,0,0.0
ver,0,0.0
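Because `missing_values` depends only on its argument, a single definition serves both data sets. A more compact variant is sketched below: `isna().mean()` gives the fraction of nulls per column directly, so the percentage does not need a separate division by the row count. The name `missing_summary` and the toy frame are hypothetical.

```python
import numpy as np
import pandas as pd


def missing_summary(data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: null count and percentage per column, most-missing first."""
    total = data.isna().sum()
    # The mean of a boolean mask is the fraction of True values, i.e. of nulls.
    percentage = (data.isna().mean() * 100).round(2)
    summary = pd.DataFrame({"Total": total, "Percentage": percentage})
    return summary.sort_values("Total", ascending=False)


toy = pd.DataFrame({
    "Rating": [4.1, np.nan, 3.9, np.nan],
    "Type": ["Free", "Free", None, "Free"],
    "App": list("abcd"),
})
print(missing_summary(toy))
```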


In [5]:
# Display the first two rows of google_store
google_store.head(2)

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up


In [6]:
# Display the first two rows of app_store
app_store.head(2)

Unnamed: 0,id,track_name,size_bytes,currency,price,rating_count_tot,rating_count_ver,user_rating,user_rating_ver,ver,cont_rating,prime_genre,sup_devices.num,ipadSc_urls.num,lang.num,vpp_lic
0,281656475,PAC-MAN Premium,100788224,USD,3.99,21292,26,4.0,4.5,6.3.5,4+,Games,38,5,10,1
1,281796108,Evernote - stay organized,158578688,USD,0.0,161065,26,4.0,3.5,8.2.2,4+,Productivity,37,5,23,1


In [7]:
# number of rows and columns in google_store
rows, cols = google_store.shape
print(f"Number_of_rows: {rows}")
print(f"Number_of_Columns: {cols}")
google_store.shape

Number_of_rows: 10841
Number_of_Columns: 13


(10841, 13)

In [8]:
# number of rows and columns in app_store
rows, cols = app_store.shape
print(f"Number_of_rows: {rows}")
print(f"Number_of_Columns: {cols}")

Number_of_rows: 7197
Number_of_Columns: 16


## TASK 2: Deleting Wrong Data
Row 10472 of the Google Play data set contains an error. Let's print this row and compare it against the header and two rows that are correct (10471 and 0).

In [9]:
# TO DO:
# Explore why this data is incorrect 
Row_inspection = google_store.iloc[[10472,10471,0]]
Row_inspection

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
10472,Life Made WI-Fi Touchscreen Photo Frame,1.9,19.0,3.0M,"1,000+",Free,0,Everyone,,"February 11, 2018",1.0.19,4.0 and up,
10471,Xposed Wi-Fi-Pwd,PERSONALIZATION,3.5,1042,404k,"100,000+",Free,0,Everyone,Personalization,"August 5, 2014",3.0.0,4.0.3 and up
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
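Row 10472 is missing its `Category` value, so every later value slid one column to the left and `Rating` ended up as 19.0. Since Google Play star ratings range from 1 to 5, a generic range check is one way to surface rows like this without knowing their index in advance. A minimal sketch on toy rows that copy the values printed above:

```python
import pandas as pd

# Toy frame: the first row mimics row 10472, where the missing Category value
# shifted everything one column to the left (Rating became 19.0).
df = pd.DataFrame(
    {
        "App": ["Life Made WI-Fi Touchscreen Photo Frame", "Xposed Wi-Fi-Pwd"],
        "Category": ["1.9", "PERSONALIZATION"],
        "Rating": [19.0, 3.5],
    },
    index=[10472, 10471],
)

# Ratings above 5 cannot be valid star ratings. NaN > 5 is False,
# so rows with a missing rating are kept.
bad_rows = df.index[df["Rating"] > 5]
print(bad_rows.tolist())  # [10472]

clean = df.drop(bad_rows)
```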


In [10]:
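# Select row 10472. Category is a text column, so its bad value is the string '1.9',
# not the float 1.9; the Genres value is a date.
defined_index = google_store[(google_store['Category'] == '1.9') |
                             (google_store['Genres'] == 'February 11, 2018')].index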

In [11]:
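# Delete the erroneous row. inplace=True modifies google_store directly and returns None.
google_store.drop(defined_index, inplace=True)

In [12]:
# Confirm that row 10472 is gone. Its values were shifted one column to the left, so they
# were invalid for their columns (for example, Genres = 'February 11, 2018' and Rating = 19.0).
10472 in google_store.index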

## TASK 3: Removing Duplicate Entries

If we explore the Google Play data set long enough, we'll find that some apps have more than one entry. For instance, the application Instagram has four entries. The cell below lists every app whose name appears more than once:

In [13]:
# Check all instances of apps that have more than 1 entry
duplicate_apps = google_store[google_store['App'].duplicated(keep=False)].sort_values(by='App', ascending=False)
duplicate_apps

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
8291,wetter.com - Weather and Radar,WEATHER,4.2,189310,38M,"10,000,000+",Free,0,Everyone,Weather,"August 6, 2018",Varies with device,Varies with device
3652,wetter.com - Weather and Radar,WEATHER,4.2,189313,38M,"10,000,000+",Free,0,Everyone,Weather,"August 6, 2018",Varies with device,Varies with device
3118,trivago: Hotels & Travel,TRAVEL_AND_LOCAL,4.2,219848,Varies with device,"50,000,000+",Free,0,Everyone,Travel & Local,"August 2, 2018",Varies with device,Varies with device
3103,trivago: Hotels & Travel,TRAVEL_AND_LOCAL,4.2,219848,Varies with device,"50,000,000+",Free,0,Everyone,Travel & Local,"August 2, 2018",Varies with device,Varies with device
3202,trivago: Hotels & Travel,TRAVEL_AND_LOCAL,4.2,219848,Varies with device,"50,000,000+",Free,0,Everyone,Travel & Local,"August 2, 2018",Varies with device,Varies with device
...,...,...,...,...,...,...,...,...,...,...,...,...,...
2385,2017 EMRA Antibiotic Guide,MEDICAL,4.4,12,3.8M,"1,000+",Paid,$16.99,Everyone,Medical,"January 27, 2017",1.0.5,4.0.3 and up
2543,1800 Contacts - Lens Store,MEDICAL,4.7,23160,26M,"1,000,000+",Free,0,Everyone,Medical,"July 27, 2018",7.4.1,5.0 and up
2322,1800 Contacts - Lens Store,MEDICAL,4.7,23160,26M,"1,000,000+",Free,0,Everyone,Medical,"July 27, 2018",7.4.1,5.0 and up
1407,10 Best Foods for You,HEALTH_AND_FITNESS,4.0,2490,3.8M,"500,000+",Free,0,Everyone 10+,Health & Fitness,"February 17, 2017",1.9,2.3.3 and up
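The next cell keeps whichever entry of each app comes first. One alternative, which assumes that the entry with the most reviews is the most recent snapshot of the app, is to sort by review count before dropping duplicates. A sketch on toy rows modelled on the wetter.com entries above; note that `Reviews` is stored as text, so it has to be converted before comparing:

```python
import pandas as pd

df = pd.DataFrame({
    "App": [
        "wetter.com - Weather and Radar",
        "wetter.com - Weather and Radar",
        "Coloring book moana",
    ],
    "Reviews": ["189310", "189313", "967"],
})

# Reviews is an object (text) column; convert it so the sort is numeric.
df["Reviews"] = pd.to_numeric(df["Reviews"], errors="coerce")

# Put the highest review count first, keep the first row per App,
# then restore the original row order.
deduped = (
    df.sort_values("Reviews", ascending=False)
      .drop_duplicates(subset="App", keep="first")
      .sort_index()
)
print(deduped)
```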


In [14]:
# Keep the first entry of each app name and drop the other entries (modifies google_store in place)
google_store.drop_duplicates(subset='App', inplace=True)

## TASK 4: Removing Non-English Apps

The names of some of the apps suggest they are not directed toward an English-speaking audience. 

In [15]:
# Remove emojis from app names (replace them with an empty string), then strip surrounding whitespace (app_store).
app_store['track_name'] = app_store['track_name'].apply(lambda s: emoji.replace_emoji(s, '')).str.strip()

In [16]:
# Inspect the cleaned app names (app_store)
list(app_store['track_name'])

['PAC-MAN Premium',
 'Evernote - stay organized',
 'WeatherBug - Local Weather, Radar, Maps, Alerts',
 'eBay: Best App to Buy, Sell, Save! Online Shopping',
 'Bible',
 'Shanghai Mahjong',
 'PayPal - Send and request money safely',
 'Pandora - Music & Radio',
 'PCalc - The Best Calculator',
 'Ms. PAC-MAN',
 'Solitaire by MobilityWare',
 'SCRABBLE Premium',
 'Google – Search made just for mobile',
 'Bank of America - Mobile Banking',
 'FreeCell',
 'TripAdvisor Hotels Flights Restaurants',
 'Facebook',
 'Yelp - Nearby Restaurants, Shopping & Services',
 'Shazam - Discover music, artists, videos & lyrics',
 'Crash Bandicoot Nitro Kart 3D',
 'iQuran',
 ':) Sudoku +',
 'Yahoo Sports - Teams, Scores, News & Highlights',
 'Mileage Log | Fahrtenbuch',
 'Cleartune - Chromatic Tuner',
 'Lifesum – Inspiring healthy lifestyle app',
 'Hangman.',
 'iTranslate - Language Translator & Dictionary',
 'TouchOSC',
 'RadarScope',
 'LinkedIn',
 'Period Tracker Deluxe',
 'Election 2016 Map',
 'Blackjack by Mo

In [17]:
# Remove emojis from app names (replace them with an empty string), then strip surrounding whitespace (google_store).
google_store['App'] = google_store['App'].apply(lambda s: emoji.replace_emoji(s, '')).str.strip()

In [18]:
# Inspect the cleaned app names (google_store)
list(google_store['App'])

['Photo Editor & Candy Camera & Grid & ScrapBook',
 'Coloring book moana',
 'U Launcher Lite – FREE Live Cool Themes, Hide Apps',
 'Sketch - Draw & Paint',
 'Pixel Draw - Number Art Coloring Book',
 'Paper flowers instructions',
 'Smoke Effect Photo Maker - Smoke Editor',
 'Infinite Painter',
 'Garden Coloring Book',
 'Kids Paint Free - Drawing Fun',
 'Text on Photo - Fonteee',
 'Name Art Photo Editor - Focus n Filters',
 'Tattoo Name On My Photo Editor',
 'Mandala Coloring Book',
 '3D Color Pixel by Number - Sandbox Art Coloring',
 'Learn To Draw Kawaii Characters',
 'Photo Designer - Write your name with shapes',
 '350 Diy Room Decor Ideas',
 'FlipaClip - Cartoon animation',
 'ibis Paint X',
 'Logo Maker - Small Business',
 "Boys Photo Editor - Six Pack & Men's Suit",
 'Superheroes Wallpapers | 4K Backgrounds',
 'Mcqueen Coloring pages',
 'HD Mickey Minnie Wallpapers',
 'Harley Quinn wallpapers HD',
 'Colorfit - Drawing & Coloring',
 'Animated Photo Editor',
 'Pencil Sketch Drawing',

In [19]:
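# Search for all english apps of app_store.
# str.contains returns True if the pattern matches anywhere in the name, so a name is kept as
# soon as it contains one ASCII letter, digit, or listed symbol; only names written entirely
# in other scripts are excluded.
english_pattern = r'[a-zA-Z0-9\s\.,;:!\?\(\)\[\]{}\-\'\"/_\\&\*\+\=%\$\#@<>~`\^|]+'
df_english = app_store[app_store['track_name'].str.contains(english_pattern, regex=True, na=False)]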

In [20]:
#show all english apps of app_store
df_english

Unnamed: 0,id,track_name,size_bytes,currency,price,rating_count_tot,rating_count_ver,user_rating,user_rating_ver,ver,cont_rating,prime_genre,sup_devices.num,ipadSc_urls.num,lang.num,vpp_lic
0,281656475,PAC-MAN Premium,100788224,USD,3.99,21292,26,4.0,4.5,6.3.5,4+,Games,38,5,10,1
1,281796108,Evernote - stay organized,158578688,USD,0.00,161065,26,4.0,3.5,8.2.2,4+,Productivity,37,5,23,1
2,281940292,"WeatherBug - Local Weather, Radar, Maps, Alerts",100524032,USD,0.00,188583,2822,3.5,4.5,5.0.0,4+,Weather,37,5,3,1
3,282614216,"eBay: Best App to Buy, Sell, Save! Online Shop...",128512000,USD,0.00,262241,649,4.0,4.5,5.10.0,12+,Shopping,37,5,9,1
4,282935706,Bible,92774400,USD,0.00,985920,5320,4.5,5.0,7.5.1,4+,Reference,37,5,45,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7192,1187617475,Kubik,126644224,USD,0.00,142,75,4.5,4.5,1.3,4+,Games,38,5,1,1
7193,1187682390,VR Roller-Coaster,120760320,USD,0.00,30,30,4.5,4.5,0.9,4+,Games,38,0,1,1
7194,1187779532,Bret Michaels Emojis + Lyric Keyboard,111322112,USD,1.99,15,0,4.5,0.0,1.0.2,9+,Utilities,37,1,1,1
7195,1187838770,VR Roller Coaster World - Virtual Reality,97235968,USD,0.00,85,32,4.5,4.5,1.0.15,12+,Games,38,0,2,1


In [21]:
# Search for all non english apps of app_store
df_non_english = app_store[~app_store['track_name'].str.contains(english_pattern, regex=True, na=False)].index

In [22]:
#show and delete non-english apps of app_store
#app_store.loc[df_non_english, 'track_name'].tolist()
app_store.drop(df_non_english, inplace=True)

In [23]:
# Search for all english apps of google_store
english_pattern_g = r'[a-zA-Z0-9\s\.,;:!\?\(\)\[\]{}\-\'\"/_\\&\*\+\=%\$\#@<>~`\^|]+'
df_english_g = google_store[google_store['App'].str.contains(english_pattern_g, regex=True, na=False)]

In [24]:
#show english apps of google_store
df_english_g.shape

(9659, 13)

In [25]:
# Search for all non english apps of google_store
df_non_english_g = google_store[~google_store['App'].str.contains(english_pattern_g, regex=True, na=False)]
df_non_english_g

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver


In [26]:
#show non-english apps of google_store
#list(df_non_english_g['App'])

## TASK 5: Isolating the Free Apps

As we mentioned in the introduction, we only build apps that are free to download and install, and our main source of revenue consists of in-app ads. Our data sets contain both free and non-free apps, and we'll need to isolate only the free apps for our analysis. Below, we isolate the free apps for both our data sets.
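The Google Play filter below relies on the `Type` column, while `Price` carries the same information as text (`'0'`, `'$16.99'`). A minimal cross-check, sketched on two toy rows taken from the outputs above, converts `Price` to a number and flags rows where the two definitions of "free" disagree. It also uses an exact comparison (`== 'Free'`) rather than `str.contains`, which would match any value that merely contains the word.

```python
import pandas as pd

# Toy rows: in googleplaystore.csv, Price is text such as '0' or '$16.99'.
df = pd.DataFrame({
    "App": ["Coloring book moana", "2017 EMRA Antibiotic Guide"],
    "Type": ["Free", "Paid"],
    "Price": ["0", "$16.99"],
})

# Strip the dollar sign and convert to a number; unparseable values become NaN.
price = pd.to_numeric(df["Price"].str.replace("$", "", regex=False), errors="coerce")

free_by_type = df["Type"] == "Free"
free_by_price = price == 0

# Rows where the two definitions disagree deserve a closer look.
print(df[free_by_type != free_by_price])

free_apps = df[free_by_type & free_by_price]
```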

In [27]:
# TO DO:
# keep only the free apps (google_store).
free_apps_1 = google_store[google_store['Type'].str.contains('Free',na=False)]
free_apps_1

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up
3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25M,"50,000,000+",Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up
4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8M,"100,000+",Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up
...,...,...,...,...,...,...,...,...,...,...,...,...,...
10836,Sya9a Maroc - FR,FAMILY,4.5,38,53M,"5,000+",Free,0,Everyone,Education,"July 25, 2017",1.48,4.1 and up
10837,Fr. Mike Schmitz Audio Teachings,FAMILY,5.0,4,3.6M,100+,Free,0,Everyone,Education,"July 6, 2018",1.0,4.1 and up
10838,Parkinson Exercices FR,MEDICAL,,3,9.5M,"1,000+",Free,0,Everyone,Medical,"January 20, 2017",1.0,2.2 and up
10839,The SCP Foundation DB fr nn5n,BOOKS_AND_REFERENCE,4.5,114,Varies with device,"1,000+",Free,0,Mature 17+,Books & Reference,"January 19, 2015",Varies with device,Varies with device


In [28]:
# keep only the free apps (app_store).
free_apps_2 =  app_store[app_store['price']== 0.00]
free_apps_2

Unnamed: 0,id,track_name,size_bytes,currency,price,rating_count_tot,rating_count_ver,user_rating,user_rating_ver,ver,cont_rating,prime_genre,sup_devices.num,ipadSc_urls.num,lang.num,vpp_lic
1,281796108,Evernote - stay organized,158578688,USD,0.0,161065,26,4.0,3.5,8.2.2,4+,Productivity,37,5,23,1
2,281940292,"WeatherBug - Local Weather, Radar, Maps, Alerts",100524032,USD,0.0,188583,2822,3.5,4.5,5.0.0,4+,Weather,37,5,3,1
3,282614216,"eBay: Best App to Buy, Sell, Save! Online Shop...",128512000,USD,0.0,262241,649,4.0,4.5,5.10.0,12+,Shopping,37,5,9,1
4,282935706,Bible,92774400,USD,0.0,985920,5320,4.5,5.0,7.5.1,4+,Reference,37,5,45,1
6,283646709,PayPal - Send and request money safely,227795968,USD,0.0,119487,879,4.0,4.5,6.12.0,4+,Finance,37,0,19,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7188,1186384912,Demolition Derby Virtual Reality (VR) Racing,168774656,USD,0.0,18,18,4.0,4.0,1.0.0,12+,Games,38,4,1,1
7192,1187617475,Kubik,126644224,USD,0.0,142,75,4.5,4.5,1.3,4+,Games,38,5,1,1
7193,1187682390,VR Roller-Coaster,120760320,USD,0.0,30,30,4.5,4.5,0.9,4+,Games,38,0,1,1
7195,1187838770,VR Roller Coaster World - Virtual Reality,97235968,USD,0.0,85,32,4.5,4.5,1.0.15,12+,Games,38,0,2,1


## TASK 6: Most Common Apps by Genre

### Part One

As we mentioned in the introduction, our aim is to determine the kinds of apps that are likely to attract more users because our revenue is highly influenced by the number of people using our apps.

To minimize risks and overhead, our validation strategy for an app idea consists of three steps:

1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we then develop it further.
3. If the app is profitable after six months, we also build an iOS version of the app and add it to the App Store.

Because our end goal is to add the app on both the App Store and Google Play, we need to find app profiles that are successful on both markets. For instance, a profile that might work well for both markets might be a productivity app that makes use of gamification.

Let's begin the analysis by getting a sense of the most common genres for each market. For this, we'll build a frequency table for the `prime_genre` column of the App Store data set, and the `Genres` and `Category` columns of the Google Play data set.


### Part Two

We'll build two functions we can use to analyze the frequency tables:

- One function to generate frequency tables that show percentages
- Another function that we can use to display the percentages in a descending order
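A minimal sketch of the two functions described above, written against a pandas DataFrame. The names `freq_table` and `display_table` are hypothetical. Sorting on the numeric percentage (rather than on a formatted string like `'8.3%'`) keeps the order correct; the cells below reach the same result with pandas' `value_counts(normalize=True)`.

```python
import pandas as pd


def freq_table(data: pd.DataFrame, column: str) -> dict:
    """Return {value: percentage of rows} for one column."""
    counts = {}
    for value in data[column]:
        counts[value] = counts.get(value, 0) + 1
    total = len(data)
    return {value: count / total * 100 for value, count in counts.items()}


def display_table(data: pd.DataFrame, column: str) -> list:
    """Print and return (value, percentage) pairs, highest percentage first."""
    table = freq_table(data, column)
    rows = sorted(table.items(), key=lambda item: item[1], reverse=True)
    for value, pct in rows:
        print(f"{value}: {pct:.1f}%")
    return rows


toy = pd.DataFrame({"prime_genre": ["Games", "Games", "Games", "Education"]})
rows = display_table(toy, "prime_genre")
# Games: 75.0%
# Education: 25.0%
```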

In [29]:
# TO DO:
#show dataset (app_store).
free_apps_2.head(2)

Unnamed: 0,id,track_name,size_bytes,currency,price,rating_count_tot,rating_count_ver,user_rating,user_rating_ver,ver,cont_rating,prime_genre,sup_devices.num,ipadSc_urls.num,lang.num,vpp_lic
1,281796108,Evernote - stay organized,158578688,USD,0.0,161065,26,4.0,3.5,8.2.2,4+,Productivity,37,5,23,1
2,281940292,"WeatherBug - Local Weather, Radar, Maps, Alerts",100524032,USD,0.0,188583,2822,3.5,4.5,5.0.0,4+,Weather,37,5,3,1


In [30]:
# 10 most common genres among free iOS apps (percentage of apps)
free_apps_2_counts = free_apps_2['prime_genre'].value_counts(normalize=True).mul(100).round(1).astype(str) + '%'
free_apps_2_counts.head(10)

prime_genre
Games                55.4%
Entertainment         8.3%
Photo & Video         4.3%
Social Networking     3.6%
Education             3.4%
Shopping              3.0%
Utilities             2.8%
Lifestyle             2.2%
Sports                2.0%
Health & Fitness      2.0%
Name: proportion, dtype: object

In [31]:
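# 10 least common genres among free iOS apps. Sort the numeric proportions, not the
# formatted strings: as text, '10.0%' would sort before '2.0%'.
bottom_ios = (free_apps_2['prime_genre']
              .value_counts(normalize=True, ascending=True)
              .mul(100).round(1).astype(str) + '%')
bottom_ios.head(10)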

prime_genre
Medical         0.2%
Catalogs        0.2%
Navigation      0.5%
Business        0.5%
Reference       0.5%
Weather         0.8%
Food & Drink    1.0%
Book            1.2%
Travel          1.4%
News            1.5%
Name: proportion, dtype: object

In [32]:
#show dataset(google_app)
free_apps_1.head(2)

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up


In [33]:
# 10 most common Genres among free Android apps (percentage of apps)
free_apps_1_counts = free_apps_1['Genres'].value_counts(normalize=True).mul(100).round(1).astype(str) + '%'
free_apps_1_counts.head(10)

Genres
Tools              8.4%
Entertainment      6.1%
Education          5.4%
Business           4.6%
Lifestyle          3.9%
Productivity       3.9%
Finance            3.7%
Medical            3.5%
Sports             3.4%
Personalization    3.3%
Name: proportion, dtype: object

In [34]:
# 10 most common Categories among free Android apps (percentage of apps)
free_apps_1_counts =free_apps_1['Category'].value_counts(normalize=True).mul(100).round(1).astype(str) + '%'
free_apps_1_counts.head(10)

Category
FAMILY             18.5%
GAME                9.9%
TOOLS               8.4%
BUSINESS            4.6%
LIFESTYLE           3.9%
PRODUCTIVITY        3.9%
FINANCE             3.7%
MEDICAL             3.5%
SPORTS              3.4%
PERSONALIZATION     3.3%
Name: proportion, dtype: object

In [35]:
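# 10 least common Categories among free Android apps. Sort the numeric proportions,
# not the formatted strings.
bottom_android = (free_apps_1['Category']
                  .value_counts(normalize=True, ascending=True)
                  .mul(100).round(1).astype(str) + '%')
bottom_android.head(10)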

Category
BEAUTY                0.6%
COMICS                0.6%
PARENTING             0.7%
ART_AND_DESIGN        0.7%
EVENTS                0.7%
WEATHER               0.8%
HOUSE_AND_HOME        0.8%
AUTO_AND_VEHICLES     0.9%
LIBRARIES_AND_DEMO    0.9%
ENTERTAINMENT         1.1%
Name: proportion, dtype: object

## TASK 7: Most Popular Apps by Genre on the App Store

One way to find out which genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play data set, we can find this information in the `Installs` column, but for the App Store data set this information is missing. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the `rating_count_tot` column.

Below, we calculate the average number of user ratings per app genre on the App Store:

In [37]:
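# TO DO:
# Average star rating (user_rating) per genre for free iOS apps, rounded to whole stars.
# This measures how well apps are rated, not how many users they have; the popularity
# proxy described above would be rating_count_tot.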
grouped = free_apps_2.groupby('prime_genre')
average_user_rating = grouped['user_rating'].mean().round().sort_values(ascending=False)
average_user_rating

prime_genre
Music                4.0
Photo & Video        4.0
Education            4.0
Games                4.0
Health & Fitness     4.0
Shopping             4.0
Business             4.0
Productivity         4.0
Utilities            3.0
Travel               3.0
Sports               3.0
Social Networking    3.0
Reference            3.0
Weather              3.0
News                 3.0
Medical              3.0
Lifestyle            3.0
Food & Drink         3.0
Entertainment        3.0
Navigation           2.0
Finance              2.0
Catalogs             2.0
Book                 2.0
Name: user_rating, dtype: float64
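The cell above averages `user_rating`, the star rating, so it ranks genres by satisfaction rather than by audience size. A sketch of the computation the prose describes, averaging `rating_count_tot` per `prime_genre`, is shown below on toy rows; three of the counts come from app rows printed earlier and one (1000) is made up.

```python
import pandas as pd

# Toy App Store rows; the real data has one row per app.
df = pd.DataFrame({
    "prime_genre": ["Productivity", "Productivity", "Weather", "Games"],
    "rating_count_tot": [161065, 1000, 188583, 21292],
})

# Average number of user ratings per genre, used as a proxy for users.
avg_ratings = (
    df.groupby("prime_genre")["rating_count_tot"]
      .mean()
      .sort_values(ascending=False)
)
print(avg_ratings)
```

On the real data, the same expression would be applied to `free_apps_2`.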

## TASK 8: Most Popular Apps by Genre on Google Play

For the Google Play market, we actually have data about the number of installs, so we should be able to get a clearer picture about genre popularity. However, the install numbers don't seem precise enough — we can see that most values are open-ended (100+, 1,000+, 5,000+, etc.):
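Each `Installs` value is a lower bound (`'10,000+'` means at least 10,000 installs), so treating it as the exact count, as the cells below do, is an approximation. A minimal sketch of the same conversion as a reusable function (`parse_installs` is a hypothetical name), run on a few toy values:

```python
import pandas as pd


def parse_installs(values: pd.Series) -> pd.Series:
    """Turn open-ended strings like '10,000+' into their lower bound (10000)."""
    cleaned = (
        values.str.replace(",", "", regex=False)  # drop thousands separators
              .str.replace("+", "", regex=False)  # drop the trailing '+' literally
    )
    # errors="coerce" turns anything that is not a number into NaN instead of raising.
    return pd.to_numeric(cleaned, errors="coerce")


installs = pd.Series(["10,000+", "500,000+", "100+", "0"])
print(parse_installs(installs).tolist())  # [10000, 500000, 100, 0]
```

`errors="coerce"` would, for example, map the `'Free'` that ended up in the `Installs` column of the shifted row 10472 to NaN.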

In [38]:
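# Keep rows with a known install count; .copy() makes free_apps_1 independent of google_store,
# so the column assignments below do not raise SettingWithCopyWarning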
free_apps_1 = free_apps_1[free_apps_1['Installs'].notnull()].copy()

In [39]:
# Remove thousands separators: '10,000+' -> '10000+'
free_apps_1['Installs'] = free_apps_1['Installs'].str.replace(',', '', regex=False)

In [40]:
# Strip the trailing '+' (regex=False treats it literally) and convert to float
free_apps_1['Installs'] = free_apps_1['Installs'].str.replace('+', '', regex=False).astype(float)
list(free_apps_1['Installs'])

[10000.0,
 500000.0,
 5000000.0,
 50000000.0,
 100000.0,
 50000.0,
 50000.0,
 1000000.0,
 1000000.0,
 10000.0,
 1000000.0,
 1000000.0,
 10000000.0,
 100000.0,
 100000.0,
 5000.0,
 500000.0,
 10000.0,
 5000000.0,
 10000000.0,
 100000.0,
 100000.0,
 500000.0,
 100000.0,
 50000.0,
 10000.0,
 500000.0,
 100000.0,
 10000.0,
 100000.0,
 100000.0,
 50000.0,
 100000.0,
 100000.0,
 10000.0,
 100000.0,
 500000.0,
 5000000.0,
 10000.0,
 500000.0,
 10000.0,
 100000.0,
 10000000.0,
 100000.0,
 10000.0,
 10000000.0,
 100000.0,
 100000.0,
 100000.0,
 100000.0,
 1000000.0,
 100000.0,
 1000000.0,
 100000.0,
 100000.0,
 100000.0,
 50000.0,
 100000.0,
 100000.0,
 100000.0,
 10000.0,
 100000.0,
 1000000.0,
 100000.0,
 100000.0,
 10000.0,
 50000.0,
 5000000.0,
 100000.0,
 5000000.0,
 5000000.0,
 500000.0,
 10000000.0,
 100000.0,
 500000.0,
 50000.0,
 100000.0,
 1000000.0,
 100000.0,
 1000000.0,
 50000.0,
 1000000.0,
 500000.0,
 100000.0,
 1000000.0,
 1000000.0,
 100000.0,
 100000.0,
 1000000.0,
 100000.0,


In [41]:
# 10 genres with the highest average number of installs (free Google Play apps)
grouped = free_apps_1.groupby('Genres')
average_installation = grouped['Installs'].mean().round().sort_values(ascending=False).head(10)
average_installation

Genres
Communication                     38322626.0
Adventure;Action & Adventure      35333333.0
Video Players & Editors           24790074.0
Social                            23253652.0
Arcade                            22888365.0
Casual                            19569222.0
Puzzle;Action & Adventure         18366667.0
Photography                       17737668.0
Educational;Action & Adventure    17016667.0
Productivity                      16738958.0
Name: Installs, dtype: float64
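A genre average can be driven by a few apps with very large install counts, and combined genres such as `Adventure;Action & Adventure` may contain only a handful of apps. A sketch that reports the app count and median next to the mean, on toy numbers rather than the real data, makes such cases easier to spot:

```python
import pandas as pd

# Toy data: one very large app inflates the Communication mean.
df = pd.DataFrame({
    "Genres": ["Communication", "Communication", "Communication", "Arcade"],
    "Installs": [1_000_000_000.0, 1_000_000.0, 10_000.0, 5_000_000.0],
})

summary = (
    df.groupby("Genres")["Installs"]
      .agg(["count", "mean", "median"])
      .sort_values("mean", ascending=False)
)
print(summary)
```

Here Communication ranks first by mean, while its median (1,000,000) is below Arcade's single value.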