## Profitable App Profiles for the App Store and Google Play Markets

Our client builds Android and iOS mobile apps and makes them available on Google Play and in the App Store.

They only build apps that are **free to download and install**, and their main source of revenue consists of **in-app ads**. This means that the number of users of their apps determines their revenue for any given app — the more users who see and engage with the ads, the better. 

**Our goal for this project is to analyze data to help our client understand what type of apps are likely to attract more users**.

### Data
As of **September 2018**, there were approximately **2 million iOS apps** available on the App Store, and **2.1 million Android apps** on Google Play. In this project we'll be using samples of these data:

* A Kaggle [dataset](https://www.kaggle.com/lava18/google-play-store-apps) containing data about approximately **`10,000` Android apps from Google Play** which was collected in August 2018. You can also download the dataset directly from [Amazon S3](https://dq-content.s3.amazonaws.com/350/googleplaystore.csv).

* A Kaggle [dataset](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps) containing data about approximately **`7,000` iOS apps from the App Store** which was collected in July 2017. You can also download the dataset directly from [Amazon S3](https://dq-content.s3.amazonaws.com/350/AppleStore.csv).

### Opening and Exploring the Data
#### Helper Function to load dataset 

In [1]:
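def open_dataset(file_name):
    '''
    Parameters
    file_name: path of the CSV file to open
    Returns the file contents as a list of lists (one inner list per row)
    '''
    from csv import reader
    
    # 'with' closes the file once it has been read; newline='' is what the
    # csv module documentation recommends for files passed to reader()
    with open(file_name, encoding='UTF-8', newline='') as opened_file:
        data = list(reader(opened_file))
    return data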

#### Helper Function to explore dataset

In [2]:
def explore_data(dataset, start, end, has_header = False):
    '''
    Parameters:
    dataset: which will be a list of lists
    start: integer, the start index to slice
    end: integer, the end index to slice
    has_header: Boolean, indicates if dataset has a header row
    '''
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(f'{row}\n')

    if has_header:
        print(f'Number of rows (data): {len(dataset[1:])}')
        print(f'Number of columns: {len(dataset[0])}')

#### Load the datasets

In [3]:
app_store = open_dataset('./data/AppleStore.csv')

In [4]:
explore_data(app_store, 0, 5, has_header = True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']

['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']

['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']

['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']

Number of rows (data): 7197
Number of columns: 16


In [5]:
google_play = open_dataset('./data/googleplaystore.csv')

In [6]:
explore_data(google_play, 0, 5, has_header = True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']

['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']

['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']

['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']

Number of rows (data): 10841
Number of columns: 13


### Deleting Wrong Data
We need to do the following:

* Detect **inaccurate data**, and correct or remove it.
* Detect **duplicate data**, and remove the duplicates.

The Google Play dataset has a dedicated [discussion section](https://www.kaggle.com/lava18/google-play-store-apps/discussion), and we can see that [one of the discussions](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015) describes an error for a certain row.

In [7]:
wrong_entry_idx = 10472
print(google_play[wrong_entry_idx - 1])
print('\n')
print(google_play[wrong_entry_idx])
print('\n')
print(google_play[wrong_entry_idx + 1])

['Jazz Wi-Fi', 'COMMUNICATION', '3.4', '49', '4.0M', '10,000+', 'Free', '0', 'Everyone', 'Communication', 'February 10, 2017', '0.1', '2.3 and up']


['Xposed Wi-Fi-Pwd', 'PERSONALIZATION', '3.5', '1042', '404k', '100,000+', 'Free', '0', 'Everyone', 'Personalization', 'August 5, 2014', '3.0.0', '4.0.3 and up']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


In [8]:
wrong_entry_idx = 10472
print(len(google_play[wrong_entry_idx - 1]))
print(len(google_play[wrong_entry_idx]))
print(len(google_play[wrong_entry_idx + 1]))

13
13
12


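**The row at index 10473 has only 12 entries instead of 13.** Its `Category` value is missing, so every later value is shifted one column to the left (for example, the rating `1.9` sits where the category should be). We'll delete `google_play[10473]`.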

### Finding and Deleting Rows with Missing Entries
#### Google Play

In [9]:
for idx, row in enumerate(google_play):
    if len(row) != len(google_play[0]):
        print(idx)

10473


In [10]:
del google_play[10473]   # run this cell only once: re-running it deletes a valid row

for idx, row in enumerate(google_play):
    if len(row) != len(google_play[0]):
        print(idx)

#### App Store

In [11]:
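# Deleting items from a list while iterating over it skips rows, so build
# a new list that keeps only the rows as long as the header
header_len = len(app_store[0])
app_store = [row for row in app_store if len(row) == header_len]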

In [12]:
print(f'Number of data rows (google_play): {len(google_play[1:])}')
print(f'Number of data rows (app_store): {len(app_store[1:])}')

Number of data rows (google_play): 10840
Number of data rows (app_store): 7197
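The deletion steps above can also be written as one non-mutating helper, sketched below under the assumption that the first row is the header. `drop_malformed_rows` is a hypothetical name; it never deletes from the list it is iterating over and does not rely on a hard-coded index, so running it twice is harmless.

```python
def drop_malformed_rows(dataset):
    """Split a list-of-lists dataset into well-formed and malformed rows.

    Hypothetical helper: assumes dataset[0] is the header. Returns
    (clean, dropped): clean holds the header plus every row with the
    header's length; dropped holds (index, row) pairs for the rest.
    The input list is left unmodified.
    """
    header = dataset[0]
    clean = [header]
    dropped = []
    for idx, row in enumerate(dataset[1:], start=1):
        if len(row) == len(header):
            clean.append(row)
        else:
            dropped.append((idx, row))
    return clean, dropped


# Toy sample modelled on the rows printed above (truncated to 3 columns)
sample = [
    ['App', 'Category', 'Rating'],
    ['Jazz Wi-Fi', 'COMMUNICATION', '3.4'],
    ['Life Made WI-Fi Touchscreen Photo Frame', '1.9'],  # Category missing
    ['Xposed Wi-Fi-Pwd', 'PERSONALIZATION', '3.5'],
]
clean, dropped = drop_malformed_rows(sample)
print(len(clean) - 1, dropped)
# 2 [(2, ['Life Made WI-Fi Touchscreen Photo Frame', '1.9'])]
```

On the real data this would be called as `google_play, removed = drop_malformed_rows(google_play)`.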


### Checking for Duplicate Entries
#### Google Play

In [13]:
duplicate_names = []
unique_names_google = []

for app in google_play[1:]:   # skip the header row
    name = app[0]
    if name in unique_names_google:
        duplicate_names.append(name)
    else:
        unique_names_google.append(name)

print(f'Number of unique apps: {len(unique_names_google)}')
print(f'Number of duplicate apps: {len(duplicate_names)}\n')
print(f'Example of duplicate apps:\n{duplicate_names[:15]}')

Number of unique apps: 9659
Number of duplicate apps: 1181

Example of duplicate apps:
['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']
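The same counts can be obtained with `collections.Counter` from the standard library, which tallies every name in one pass. The list below is a toy sample; on the real data the counter would be built from `row[0] for row in google_play[1:]`.

```python
from collections import Counter

# Toy sample of app names (illustrative only, not the real dataset)
names = ['Instagram', 'Box', 'Instagram', 'Slack', 'Box', 'Instagram']

name_counts = Counter(names)
n_unique = len(name_counts)               # distinct names
n_duplicates = len(names) - n_unique      # extra copies beyond each first occurrence
most_duplicated = name_counts.most_common(2)

print(n_unique, n_duplicates)  # 3 3
print(most_duplicated)         # [('Instagram', 3), ('Box', 2)]
```

`n_duplicates` follows the same definition as the loop above: every occurrence after the first counts as one duplicate.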


In [14]:
for app in google_play:
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


#### The Approach...
Looking at the Instagram rows, the main difference is in the fourth position (index 3), which holds the number of reviews. The different counts suggest the data was collected at different times, and the highest count is most likely the most recent. For each app, we'll therefore **keep only the row with the highest number of reviews** and remove the other entries.

#### App Store

In [15]:
duplicate_names = []
unique_names_apple = []

# Index 0 is the numeric 'id' column, so this loop checks for duplicate ids;
# the app name ('track_name') is at index 1
for app in app_store[1:]:   # skip the header row
    name = app[0]
    if name in unique_names_apple:
        duplicate_names.append(name)
    else:
        unique_names_apple.append(name)
        
print(f'Number of unique apps: {len(unique_names_apple)}')
print(f'Number of duplicate apps: {len(duplicate_names)}\n')
print(f'Example of duplicate apps:\n{duplicate_names[:15]}')

Number of unique apps: 7197
Number of duplicate apps: 0

Example of duplicate apps:
[]


### Removing Duplicate Entries
To remove the duplicates, we will do the following:

* Create a dictionary, where each dictionary key is a unique app name and the corresponding dictionary value is the highest number of reviews of that app.

* Use the information stored in the dictionary and create a new dataset, which will have only one entry per app (and for each app, we'll only select the entry with the highest number of reviews).

In [16]:
reviews_max = {}
for app in google_play[1:]:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
        
print(len(reviews_max))        

9659


### Using the dictionary created above to remove the duplicate rows

In [17]:
google_clean = []      # list of lists without duplicates
already_added = set()  # names already copied to google_clean (a set makes lookups fast)

for app in google_play[1:]:
    name = app[0]
    n_reviews = float(app[3])
    if n_reviews == reviews_max[name] and name not in already_added:
        google_clean.append(app)
        already_added.add(name)

print(len(google_clean))

9659
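The two-pass approach above can be folded into a single pass, sketched below with a hypothetical helper `dedupe_keep_max`. It stores the best row seen so far for each name; with a strict `>` comparison, ties keep the first row encountered, as the two-pass version does, although the output order may differ. The Instagram review counts come from the output above; the `Box` row uses toy values.

```python
def dedupe_keep_max(rows, key_idx=0, value_idx=3):
    """Keep one row per key: the row with the largest numeric value.

    Hypothetical helper. The defaults assume the Google Play layout
    (app name at index 0, review count at index 3). A strict '>' means
    that on ties the first row seen is kept.
    """
    best = {}
    for row in rows:
        key = row[key_idx]
        value = float(row[value_idx])
        if key not in best or value > float(best[key][value_idx]):
            best[key] = row
    return list(best.values())


rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Box', 'BUSINESS', '4.2', '100'],          # toy values
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
]
for row in dedupe_keep_max(rows):
    print(row[0], row[3])
# Instagram 66577446
# Box 100
```

Applied to `google_play[1:]`, it keeps exactly one row per app name, so its length equals `len(reviews_max)`.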


### Checking for Non-English Apps
Our client develops its apps in English, so we'd like to analyze only the apps designed for an English-speaking audience. In the ASCII (American Standard Code for Information Interchange) system, the characters commonly used in English text have codes in the range **0 to 127**. If an app name contains a character whose code is **greater than 127**, the app probably has a non-English name.
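To see the rule in action: Python's built-in `ord()` returns a character's code point, and anything above 127 is outside ASCII. `is_ascii_name` below is a hypothetical helper equivalent to the check described above; on Python 3.7+ the built-in `str.isascii()` performs the same test.

```python
def is_ascii_name(name):
    """True if every character's code point is in the ASCII range 0-127."""
    return all(ord(char) <= 127 for char in name)


print(ord('a'), ord('™'), ord('😜'))    # 97 8482 128540
print(is_ascii_name('Instagram'))        # True
print(is_ascii_name('Instachat 😜'))     # False
# Python 3.7+ offers the same test as a built-in string method
print('Instagram'.isascii(), 'Instachat 😜'.isascii())  # True False
```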

#### Helper function to check whether a name uses only ASCII (English) characters

In [18]:
def check_english_ascii(name):
    # True if every character's code point is in the ASCII range 0-127
    return all(ord(char) < 128 for char in name)

In [19]:
print(check_english_ascii('Instagram'))
print(check_english_ascii('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(check_english_ascii('Docs To Go™ Free Office Suite'))
print(check_english_ascii('Instachat 😜'))

True
False
False
False


### Removing Non-English Apps
Using this function as is would discard useful data, since many English apps (like `'Docs To Go™ Free Office Suite'` or `'Instachat 😜'`) would be incorrectly labeled as non-English. To limit the data loss, we'll remove an app only **if its name contains more than three characters** whose codes fall outside the ASCII range.
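A sketch of the relaxed rule with the threshold made a parameter, using hypothetical helpers `count_non_ascii` and `is_probably_english`. The counts show why the threshold works for the examples above: the trademark sign and the emoji each add one non-ASCII character, while the Chinese name has thirteen.

```python
def count_non_ascii(name):
    """Number of characters whose code point is above 127."""
    return sum(1 for char in name if ord(char) > 127)


def is_probably_english(name, max_non_ascii=3):
    """Hypothetical helper: treat a name as English unless it contains
    more than `max_non_ascii` non-ASCII characters."""
    return count_non_ascii(name) <= max_non_ascii


for name in ['Docs To Go™ Free Office Suite', 'Instachat 😜', '爱奇艺PPS -《欢乐颂2》电视剧热播']:
    print(count_non_ascii(name), is_probably_english(name))
# 1 True
# 1 True
# 13 False
```

The trade-off is that an English name with four or more emoji or symbols would still be dropped, and a non-English name with three or fewer non-ASCII characters would be kept.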

In [20]:
def check_three_non_ascii(name):
    # Count characters whose code point falls outside the ASCII range (0-127)
    non_ascii_count = sum(ord(char) > 127 for char in name)
    # Flag the name as non-English only if more than three such characters appear
    return non_ascii_count > 3

In [21]:
print(check_three_non_ascii('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(check_three_non_ascii('Docs To Go™ Free Office Suite'))
print(check_three_non_ascii('Instachat 😜'))

True
False
False


### Filter out non-English apps from both datasets
#### Google Play

In [22]:
print(f'Number of rows (unfiltered): {len(google_clean)}')

google_filtered = []
google_non_ascii = []
for app in google_clean:
    name = app[0]
    if check_three_non_ascii(name):
        google_non_ascii.append(app)
    else:
        google_filtered.append(app)
        
        
print(f'Number of rows (filtered): {len(google_filtered)}')
print(f'Number of non-ASCII rows: {len(google_non_ascii)}')

Number of rows (unfiltered): 9659
Number of rows (filtered): 9614
Number of non-ASCII rows: 45


#### App Store

In [23]:
print(f'Number of rows (unfiltered): {len(app_store[1:])}')

apple_filtered = []
apple_non_ascii = []
for app in app_store[1:]:
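    name = app[0]   # NOTE: index 0 is the numeric 'id'; the app name ('track_name') is index 1, so nothing is flagged here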
    if check_three_non_ascii(name):
        apple_non_ascii.append(app)
    else:
        apple_filtered.append(app)
        
print(f'Number of rows (filtered): {len(apple_filtered)}')
print(f'Number of non-ASCII rows: {len(apple_non_ascii)}')

Number of rows (unfiltered): 7197
Number of rows (filtered): 7197
Number of non-ASCII rows: 0


#### Examples of apps with non-English names

In [24]:
print(google_non_ascii[:3])

[['Flame - درب عقلك يوميا', 'EDUCATION', '4.6', '56065', '37M', '1,000,000+', 'Free', '0', 'Everyone', 'Education', 'July 26, 2018', '3.3', '4.1 and up'], ['သိင်္ Astrology - Min Thein Kha BayDin', 'LIFESTYLE', '4.7', '2225', '15M', '100,000+', 'Free', '0', 'Everyone', 'Lifestyle', 'July 26, 2018', '4.2.1', '4.0.3 and up'], ['РИА Новости', 'NEWS_AND_MAGAZINES', '4.5', '44274', '8.0M', '1,000,000+', 'Free', '0', 'Everyone', 'News & Magazines', 'August 6, 2018', '4.0.6', '4.4 and up']]


### Isolating the Free Apps
Our client only builds apps that are free to download and install. Our datasets contain both free and non-free apps; we'll need to isolate only the free apps for our analysis.
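The two filtering cells below differ only in which column holds the price (index 7 for Google Play, index 4 for the App Store). A hypothetical helper `split_free_paid` sketches how both could share one function; like `clean_price`, it removes a leading `$` (here with `str.lstrip`) before converting to `float`.

```python
def split_free_paid(rows, price_idx):
    """Hypothetical helper: split rows into (free, paid) by the price column.

    Prices may carry a leading '$' (Google Play, e.g. '$0.99') or be plain
    numbers (App Store, e.g. '0.0'); both are parsed to float.
    """
    free, paid = [], []
    for row in rows:
        price = float(row[price_idx].lstrip('$'))
        (paid if price > 0 else free).append(row)
    return free, paid


sample = [['A', '0'], ['B', '$4.99'], ['C', '0.0'], ['D', '0.99']]
free, paid = split_free_paid(sample, price_idx=1)
print([r[0] for r in free], [r[0] for r in paid])  # ['A', 'C'] ['B', 'D']
```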

#### Helper function to clean price

In [25]:
def clean_price(price):
    if '$' in price:
        price = float(price[1:])
    else:
        price = float(price)
    return price

In [26]:
print(clean_price('$0.99'))

0.99


#### Google Play

In [27]:
print(f'Number of rows (price unclean): {len(google_filtered)}')

google_free = []
google_not_free = []
for app in google_filtered:
    price = clean_price(app[7])
    if price > 0.0:
        google_not_free.append(app)
    else:
        google_free.append(app)
        
        
print(f'Number of rows (free apps): {len(google_free)}')
print(f'Number of rows (not free apps): {len(google_not_free)}')

Number of rows (price unclean): 9614
Number of rows (free apps): 8864
Number of rows (not free apps): 750


#### App Store

In [28]:
print(f'Number of rows (price unclean): {len(apple_filtered)}')

apple_free = []
apple_not_free = []
for app in apple_filtered:
    price = clean_price(app[4])
    if price > 0.0:
        apple_not_free.append(app)
    else:
        apple_free.append(app)
        
        
print(f'Number of rows (free apps): {len(apple_free)}')
print(f'Number of rows (not free apps): {len(apple_not_free)}')

Number of rows (price unclean): 7197
Number of rows (free apps): 4056
Number of rows (not free apps): 3141


### Most Common Apps by Genre: Part One

In [29]:
print(google_play[0])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


In [30]:
print(app_store[0])

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


### Most Common Apps by Genre: Part Two
We'll need to build a frequency table for the **prime_genre** column of the **App Store** data set, and for the **Genres** and **Category** columns of the **Google Play** data set.
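A compact alternative to `freq_table` plus `display_table`, sketched with `collections.Counter` from the standard library. `freq_table_pct` is a hypothetical name; it returns the same percentages as `freq_table`, but as `(value, percentage)` pairs already sorted from most to least common. One difference: `Counter.most_common()` orders tied counts by first appearance, whereas `display_table` breaks ties by sorting the names in reverse.

```python
from collections import Counter


def freq_table_pct(rows, idx):
    """Percentage frequency table for column `idx`, most common first."""
    counts = Counter(row[idx] for row in rows)
    total = len(rows)
    return [(value, count * 100 / total) for value, count in counts.most_common()]


sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
for genre, pct in freq_table_pct(sample, 1):
    print(f'{genre} : {pct:.2f}')
# Games : 75.00
# Music : 25.00
```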

#### Helper function to generate frequency table

In [31]:
def freq_table(data, idx):
    table = {}
    for row in data:
        cat = row[idx]
        if cat in table:
            table[cat] += 1
        else:
            table[cat] = 1
    table = {key: (value*100/len(data)) for key, value in table.items()}
    return table

In [32]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

### Frequency Table `Category` column (Google Play)

In [33]:
table_google_cat = freq_table(google_free, 1)
#print(table_google_cat)

In [34]:
display_table(google_free, 1)

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.700361010830325
MEDICAL : 3.5311371841155235
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.237815884476534
HEALTH_AND_FITNESS : 3.079873646209386
PHOTOGRAPHY : 2.9444945848375452
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768953
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418774
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075813
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 0

### Frequency Table `Genres` column (Google Play)

In [35]:
table_google_genre = freq_table(google_free, 9)
#print(table_google_genre)

In [36]:
display_table(google_free, 9)

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.700361010830325
Medical : 3.5311371841155235
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.237815884476534
Action : 3.1024368231046933
Health & Fitness : 3.079873646209386
Photography : 2.9444945848375452
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.041967509025271
Dating : 1.861462093862816
Arcade : 1.8501805054151625
Video Players & Editors : 1.7712093862815885
Casual : 1.759927797833935
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418774
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075813

### Frequency Table `prime_genre` column (App Store)

In [37]:
table_apple_prime_genre = freq_table(apple_free, 11)
#print(table_apple_prime_genre)

In [38]:
display_table(apple_free, 11)

Games : 55.645956607495066
Entertainment : 8.234714003944774
Photo & Video : 4.117357001972387
Social Networking : 3.5256410256410255
Education : 3.2544378698224854
Shopping : 2.983234714003945
Utilities : 2.687376725838264
Lifestyle : 2.3175542406311638
Finance : 2.0710059171597632
Sports : 1.9477317554240632
Health & Fitness : 1.873767258382643
Music : 1.6518737672583828
Book : 1.6272189349112427
Productivity : 1.5285996055226825
News : 1.4299802761341223
Travel : 1.3806706114398422
Food & Drink : 1.0601577909270217
Weather : 0.7642998027613412
Reference : 0.4930966469428008
Navigation : 0.4930966469428008
Business : 0.4930966469428008
Catalogs : 0.22189349112426035
Medical : 0.19723865877712032


### Display Frequency Table (sorted)
#### Helper function

In [39]:
def display_table_sorted(data, idx):
    table = freq_table(data, idx)
    table = dict(sorted(table.items(), key=lambda item: item[1], reverse = True))
    for key, value in table.items():
        print(f'{key} : {value}')

In [40]:
display_table_sorted(apple_free, 11)

Games : 55.645956607495066
Entertainment : 8.234714003944774
Photo & Video : 4.117357001972387
Social Networking : 3.5256410256410255
Education : 3.2544378698224854
Shopping : 2.983234714003945
Utilities : 2.687376725838264
Lifestyle : 2.3175542406311638
Finance : 2.0710059171597632
Sports : 1.9477317554240632
Health & Fitness : 1.873767258382643
Music : 1.6518737672583828
Book : 1.6272189349112427
Productivity : 1.5285996055226825
News : 1.4299802761341223
Travel : 1.3806706114398422
Food & Drink : 1.0601577909270217
Weather : 0.7642998027613412
Reference : 0.4930966469428008
Navigation : 0.4930966469428008
Business : 0.4930966469428008
Catalogs : 0.22189349112426035
Medical : 0.19723865877712032


### Most Common Apps by Genre: Part Three
#### Analyze the frequency table you generated for the `prime_genre` column of the App Store dataset
* What is the **most common genre**?: **Games with `55.65%`** 
* What is the **next most common?**: **Entertainment with `8.23%`**
* What other patterns do you see?: After Games and Entertainment come **Photo & Video (`4.12%`)**, **Social Networking (`3.53%`)**, and **Education (`3.25%`)**.
* What is the general impression?: Most of the free apps are designed for entertainment (games, entertainment, photo and video, social networking), while practical genres such as Utilities, Finance, and Productivity each account for less than `3%`.
* Can you recommend an app profile for the App Store market based on this frequency table alone?: Based on this frequency table alone, I would recommend a **gaming app** for the App Store.
* If there's a large number of apps for a particular genre, does that also imply that apps of that genre generally have a large number of users?: Not necessarily. Further analysis on the **number of users** and **number of downloads** is required.

#### Analyze the frequency table you generated for the Category and Genres column of the Google Play dataset.
#### Category
* What is the **most common category**?: **FAMILY (`18.91%`)**
* What other patterns do you see?: **GAME (`9.72%`), TOOLS (`8.46%`)**
* Compare the patterns you see for the Google Play market with those you saw for the App Store market: **The patterns are similar, except that no single category dominates Google Play the way Games (`55.65%`) dominates the App Store; even FAMILY, the largest category, accounts for only `18.91%` of apps.**
* Can you recommend an app profile based on what you found so far?: **No convincing recommendation can be made yet, since no category stands out the way Games does in the App Store.**
* Do the frequency tables you generated reveal the most frequent app genres or what genres have the most users?: **They reveal only which categories are most frequent, not which have the most users; a usage measure such as the number of installs is needed for that.**

#### Genres
* What is the **most common genre**?: **Tools (`8.45%`)**
* What other patterns do you see?: **Entertainment (`6.07%`), Education (`5.35%`), Business (`4.59%`)**
* Compare the patterns you see for the Google Play market with those you saw for the App Store market: **The patterns are similar, except that no single genre dominates Google Play the way Games (`55.65%`) dominates the App Store.**
* Can you recommend an app profile based on what you found so far?: **No recommendation yet, since no genre stands out the way Games does in the App Store.**
* Do the frequency tables you generated reveal the most frequent app genres or what genres have the most users?: **They reveal only which genres are most frequent, not which have the most users.**

### Most Popular Apps by Genre on the App Store
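Now, we'd like to determine the kinds of apps with the most users. For the **Google Play** dataset, we can find this information in the **`Installs`** column, but *this information is missing from the App Store* dataset. As a workaround, we'll take the **total number of user ratings** as a proxy, which we can find in the **`rating_count_tot`** column (index 5). Note that the cell below averages `user_rating` (index 7), the average star rating, rather than `rating_count_tot`, so its results measure how well apps in each genre are rated, not how many users they have.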
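A minimal sketch of the `rating_count_tot` proxy, using a hypothetical helper `average_by_group` that averages one column for each value of another. With `apple_free`, `group_idx=11` (`prime_genre`) and `value_idx=5` (`rating_count_tot`) would give the average number of ratings per genre; it is demonstrated here on toy rows only, so no App Store results are implied.

```python
def average_by_group(rows, group_idx, value_idx):
    """Hypothetical helper: mean of column `value_idx` for each value of
    column `group_idx`, returned as (group, mean) pairs, highest first."""
    totals, counts = {}, {}
    for row in rows:
        group = row[group_idx]
        totals[group] = totals.get(group, 0.0) + float(row[value_idx])
        counts[group] = counts.get(group, 0) + 1
    averages = {group: totals[group] / counts[group] for group in totals}
    # Sort from highest to lowest average
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


toy = [['Games', '100'], ['Music', '30'], ['Games', '300'], ['Music', '10']]
print(average_by_group(toy, group_idx=0, value_idx=1))
# [('Games', 200.0), ('Music', 20.0)]
```

On the real data the call would be `average_by_group(apple_free, 11, 5)`; it has not been run here.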

In [47]:
table_apple_prime_genre = freq_table(apple_free, 11)
table = {}
for genre in table_apple_prime_genre:
    total = 0
    len_genre = 0
    for app in apple_free:
        genre_app = app[11]
        if genre_app == genre:
            rating = float(app[7])   # index 7 is 'user_rating' (average stars); 'rating_count_tot' is index 5
            total += rating
            len_genre += 1
    avg_user_rating = total / len_genre
    table[genre] = avg_user_rating
    
print('---------------------------------------')
print('App Genre     :    Average User Rating ')
print('---------------------------------------')
table = dict(sorted(table.items(), key=lambda item: item[1], reverse = True))
for key, value in table.items():
    print(f'{key} : {value}')

---------------------------------------
App Genre     :    Average User Rating 
---------------------------------------
Productivity : 3.9596774193548385
Music : 3.9402985074626864
Photo & Video : 3.7934131736526946
Health & Fitness : 3.5789473684210527
Shopping : 3.5330578512396693
Games : 3.5285777580859548
Business : 3.5
Education : 3.484848484848485
Utilities : 3.4541284403669725
Travel : 3.375
Reference : 3.3
Weather : 3.2580645161290325
Entertainment : 3.1482035928143715
Food & Drink : 3.0348837209302326
Social Networking : 2.9965034965034967
Sports : 2.9177215189873418
News : 2.8793103448275863
Medical : 2.875
Lifestyle : 2.5904255319148937
Finance : 2.2202380952380953
Navigation : 2.2
Catalogs : 1.8333333333333333
Book : 1.5984848484848484


#### Analyze the results and try to come up with at least one app profile recommendation for the App Store.

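Top 3 genres in the App Store by **average user rating** (`user_rating`, in stars):

    -----------------------------------------------
    | GENRES               |  AVERAGE USER RATING |
    -----------------------------------------------
    | Productivity         |         3.96         |
    -----------------------------------------------
    | Music                |         3.94         |
    -----------------------------------------------
    | Photo & Video        |         3.79         |
    -----------------------------------------------
    
These are our **top 3 genre** recommendations for the **App Store**. Because they are ranked by how well apps are rated rather than by how many users they have, they should be checked against the `rating_count_tot` proxy before settling on a profile.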

### Most Popular Apps by Genre on Google Play
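We'll treat an app with **100,000+** installs as having **100,000** installs, an app with **1,000,000+** installs as having **1,000,000** installs, and so on. Because each value is the lower bound of its bracket, the averages below underestimate the true install counts.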
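A minimal sketch of the conversion described above, as a hypothetical helper `parse_installs`: it strips the thousands separators and the trailing `+`, leaving the lower bound of the bracket. The code cell below does the same inline with two `str.replace` calls, but parses to `float`.

```python
def parse_installs(installs):
    """Convert a Google Play install bracket such as '1,000,000+' to its lower bound."""
    return int(installs.replace(',', '').replace('+', ''))


for bracket in ['500+', '100,000+', '1,000,000+']:
    print(bracket, '->', parse_installs(bracket))
# 500+ -> 500
# 100,000+ -> 100000
# 1,000,000+ -> 1000000
```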

In [52]:
table_google_cat = freq_table(google_free, 1)

table = {}
for category in table_google_cat:
    total = 0
    len_category = 0
    for app in google_free:
        category_app = app[1]
        if category_app == category:
            num_install = float(app[5].replace('+', '').replace(',', ''))
            total += num_install
            len_category += 1
    avg_installs = total / len_category
    table[category] = avg_installs
    
print('---------------------------------------------')
print('App Genre     :    Average Number of Installs')
print('---------------------------------------------')
table = dict(sorted(table.items(), key=lambda item: item[1], reverse = True))
for key, value in table.items():
    print(f'{key} : {value}')

---------------------------------------------
App Genre     :    Average Number of Installs
---------------------------------------------
COMMUNICATION : 38456119.167247385
VIDEO_PLAYERS : 24727872.452830188
SOCIAL : 23253652.127118643
PHOTOGRAPHY : 17840110.40229885
PRODUCTIVITY : 16787331.344927534
GAME : 15588015.603248259
TRAVEL_AND_LOCAL : 13984077.710144928
ENTERTAINMENT : 11640705.88235294
TOOLS : 10801391.298666667
NEWS_AND_MAGAZINES : 9549178.467741935
BOOKS_AND_REFERENCE : 8767811.894736841
SHOPPING : 7036877.311557789
PERSONALIZATION : 5201482.6122448975
WEATHER : 5074486.197183099
HEALTH_AND_FITNESS : 4188821.9853479853
MAPS_AND_NAVIGATION : 4056941.7741935486
FAMILY : 3695641.8198090694
SPORTS : 3638640.1428571427
ART_AND_DESIGN : 1986335.0877192982
FOOD_AND_DRINK : 1924897.7363636363
EDUCATION : 1833495.145631068
BUSINESS : 1712290.1474201474
LIFESTYLE : 1437816.2687861272
FINANCE : 1387692.475609756
HOUSE_AND_HOME : 1331540.5616438356
DATING : 854028.8303030303
COMICS : 

#### Analyze the results, and try to come up with at least one app profile recommendation for Google Play.

Top 3 categories in Google Play by **average number of installs**:

    -----------------------------------------------
    | CATEGORY       | AVERAGE NUMBER OF INSTALLS |
    -----------------------------------------------
    | COMMUNICATION  |          38,456,119        |
    -----------------------------------------------
    | VIDEO_PLAYERS  |          24,727,872        |
    -----------------------------------------------
    | SOCIAL         |          23,253,652        |
    -----------------------------------------------
    
These are our **top 3 category** recommendations for **Google Play**.
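Averages can be pulled up sharply by a few apps in the highest install brackets, so a category's mean may not reflect a typical app in it. A hypothetical helper `median_by_group`, built on the standard library's `statistics.median`, sketches a robustness check; the toy data below is invented to show the effect and says nothing about the real categories.

```python
from statistics import median


def median_by_group(rows, group_idx, value_fn):
    """Hypothetical helper: median of value_fn(row) per group, highest first.

    The median is less sensitive than the mean to a handful of very large
    values, such as a few apps with 1,000,000,000+ installs.
    """
    groups = {}
    for row in rows:
        groups.setdefault(row[group_idx], []).append(value_fn(row))
    medians = {group: median(values) for group, values in groups.items()}
    return sorted(medians.items(), key=lambda item: item[1], reverse=True)


# Toy data: one huge app inflates COMMUNICATION's mean but not its median
toy = [
    ['COMMUNICATION', 1_000_000_000], ['COMMUNICATION', 10_000], ['COMMUNICATION', 50_000],
    ['TOOLS', 100_000], ['TOOLS', 500_000], ['TOOLS', 1_000_000],
]
print(median_by_group(toy, 0, lambda row: row[1]))
# [('TOOLS', 500000), ('COMMUNICATION', 50000)]
```

On the real data, `median_by_group(google_free, 1, lambda row: float(row[5].replace(',', '').replace('+', '')))` would rank the categories by median installs; it has not been run here.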
