# About this project

This is my first project in Jupyter. I am pretending that I work as a data analyst for a company that builds Android and iOS mobile apps. We make our apps available on Google Play and the App Store.

We only build apps that are free to download and install, and our main source of revenue consists of **in-app ads.** This means our revenue for any given app is mostly influenced by the **number of users who use our app** — the more users that see and engage with the ads, the better. 

**Our goal for this project is to analyze data to help our developers understand what type of apps are likely to attract more users.**

* Google Play Store Data Set [Documentation](https://www.kaggle.com/lava18/google-play-store-apps)
* Apple App Store Data Set [Documentation](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps)
* [solution set](https://github.com/dataquestio/solutions/blob/master/Mission350Solutions.ipynb)


In [1]:
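# Read the Apple App Store data
from csv import reader

# The `with` block closes the file once the rows have been read
with open('AppleStore.csv', encoding='utf8') as opened_file:
    apple_apps_data = list(reader(opened_file))
apple_apps_header = apple_apps_data[0]  # first row holds the column names
apple_apps_data = apple_apps_data[1:]   # remaining rows are the apps
print(apple_apps_header)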

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
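
Both data sets are loaded the same way, so the steps could be wrapped in a small helper. The sketch below is only an illustration: `load_csv` is a hypothetical name that does not appear elsewhere in this notebook, and the tiny temporary file stands in for the real CSV files so that the example runs on its own.

```python
import csv
import os
import tempfile


def load_csv(path, encoding="utf8"):
    """Return (header, rows) for a CSV file and close the file afterwards."""
    with open(path, encoding=encoding, newline="") as f:
        rows = list(csv.reader(f))
    return rows[0], rows[1:]


# Tiny stand-in file so the sketch runs without the real data sets.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.csv")
    with open(sample_path, "w", encoding="utf8", newline="") as f:
        csv.writer(f).writerows([["id", "track_name"], ["1", "Example App"]])
    sample_header, sample_rows = load_csv(sample_path)

print(sample_header)
print(sample_rows)
```

With the real files, the call would look like `apple_apps_header, apple_apps_data = load_csv('AppleStore.csv')`.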


In [2]:
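# Read the Google Play Store data
from csv import reader

# The `with` block closes the file once the rows have been read
with open('googleplaystore.csv', encoding='utf8') as opened_file:
    google_apps_data = list(reader(opened_file))
google_apps_header = google_apps_data[0]  # first row holds the column names
google_apps_data = google_apps_data[1:]   # remaining rows are the apps
print(google_apps_header)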


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


# Explore data function

* Takes in four parameters
 * `dataset`, which is expected to be a list of lists
 * `start` and `end`, which are both expected to be integers and represent the starting and the ending indices of a slice from the data set
 * `rows_and_columns`, which is expected to be a boolean and has `False` as its default argument
* Slices the data set using `dataset[start:end]`
* Prints each row in the slice, followed by an empty line using `print('\n')`
* Prints the number of rows and columns if `rows_and_columns` is `True`

In [3]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [4]:
explore_data(apple_apps_data, 1, 4, True)

['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


Number of rows: 7197
Number of columns: 16


In [5]:
explore_data(google_apps_data, 1, 4, True)

['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 10841
Number of columns: 13


# Data Cleaning
* Detect inaccurate data and correct or remove it
* Detect duplicate data and remove the duplicates
* Remove non-English apps
* Remove paid apps

In [6]:
print(google_apps_data[10472])

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


In [7]:
del google_apps_data[10472]
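
The row printed above has 12 values instead of 13 because its category is missing, which shifts every later value one column to the left. A length check is one way such rows could be located. This is a minimal sketch: `find_malformed_rows` is a hypothetical helper, and the sample rows are made up for illustration.

```python
def find_malformed_rows(dataset, header):
    """Return the indices of rows whose length differs from the header's."""
    expected = len(header)
    return [i for i, row in enumerate(dataset) if len(row) != expected]


sample_header = ["App", "Category", "Rating"]
sample_rows = [
    ["App A", "TOOLS", "4.1"],
    ["App B", "1.9"],            # category missing, so values shift left
    ["App C", "GAME", "4.5"],
]

malformed = find_malformed_rows(sample_rows, sample_header)
print(malformed)
```

If more than one index comes back, deleting from the highest index down keeps the remaining indices valid.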

In [8]:
# Check the iOS App Store data for duplicates
# Index 0 holds the app `id`, which should be unique per app
ios_unique_ids = []
ios_duplicate_apps = []

for app in apple_apps_data:
    app_id = app[0]

    if app_id not in ios_unique_ids:
        ios_unique_ids.append(app_id)
    else:
        ios_duplicate_apps.append(app)

print(ios_duplicate_apps)

[]


In [9]:
# Find duplicate entries in the Google Play Store data (app name is at index 0)
google_unique_apps = [] 
google_duplicate_apps = [] 

for app in google_apps_data: 
    app_name = app[0] 

    if app_name not in google_unique_apps:
        google_unique_apps.append(app_name)
    else:
        google_duplicate_apps.append(app)
        
print('Number of duplicate apps:', len(google_duplicate_apps))
print('\n')
print('Examples of duplicate apps:', google_duplicate_apps[:15])

Number of duplicate apps: 1181


Examples of duplicate apps: [['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up'], ['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 31, 2018', 'Varies with device', 'Varies with device'], ['Google My Business', 'BUSINESS', '4.4', '70991', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 24, 2018', '2.19.0.204537701', '4.4 and up'], ['ZOOM Cloud Meetings', 'BUSINESS', '4.4', '31614', '37M', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 20, 2018', '4.1.28165.0716', '4.0 and up'], ['join.me - Simple Meetings', 'BUSINESS', '4.0', '6989', 'Varies with device', '1,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 16, 2018', '4.3.0.508', '4.4 and up'], ['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', 

In [10]:
# Removing duplicates from the Google Play Store data
# For each app name, record the highest number of reviews: the entry with
# the most reviews is assumed to be the most recent one
# The Reviews column is at index 3

reviews_max = {}

for app in google_apps_data:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

print(len(reviews_max))

9659


In [11]:
# Keep one row per app from the Google Play data: the row with the most reviews
android_clean = []
already_added = []

for app in google_apps_data:
    name = app[0]
    n_reviews = float(app[3])
    # `name not in already_added` guards against duplicate entries
    # that share the same (maximum) number of reviews
    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(app)
        already_added.append(name)

print(len(android_clean))

9659
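
The two steps above (find the highest review count per app, then keep one row with that count) could also be combined into a single function. The sketch below is one possible arrangement, not the only one: `keep_most_reviewed` is a hypothetical name, the sample rows are invented, and a set replaces the `already_added` list so that membership checks stay fast on large data sets.

```python
def keep_most_reviewed(rows, name_idx=0, reviews_idx=3):
    """Keep one row per app name: the first row with the highest review count."""
    reviews_max = {}
    for row in rows:
        name, n_reviews = row[name_idx], float(row[reviews_idx])
        # dict.get falls back to the current value the first time a name is seen
        reviews_max[name] = max(reviews_max.get(name, n_reviews), n_reviews)

    clean, already_added = [], set()
    for row in rows:
        name, n_reviews = row[name_idx], float(row[reviews_idx])
        if n_reviews == reviews_max[name] and name not in already_added:
            clean.append(row)
            already_added.add(name)
    return clean


sample = [
    ["Box", "BUSINESS", "4.2", "159872"],
    ["Box", "BUSINESS", "4.2", "159880"],
    ["Box", "BUSINESS", "4.2", "159880"],  # same maximum, must not be kept twice
    ["Zoom", "BUSINESS", "4.4", "31614"],
]

deduped = keep_most_reviewed(sample)
print(len(deduped))
```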


# Removing non-English apps
* Build a function that flags app names that are probably not English
* Loop through each data set and keep only the apps that pass the check

In [12]:
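def check_for_english(string):
    # Count characters outside the ASCII range (code points above 127)
    non_eng_count = 0
    for character in string:
        if ord(character) > 127:
            non_eng_count += 1
    # Allow up to three such characters so that names containing
    # an emoji or a symbol like ™ still count as English
    if non_eng_count > 3:
        return False
    return True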

In [13]:
print(check_for_english('instagram'))
print(check_for_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(check_for_english('Docs To Go™ Free Office Suite'))
print(check_for_english('Instachat 😜'))

True
False
True
True
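
To see why the function tolerates up to three non-ASCII characters, it can help to count them directly for the names tested above. This is a small sketch: `count_non_ascii` is a hypothetical helper that mirrors the counting loop inside `check_for_english`.

```python
def count_non_ascii(string):
    """Count the characters whose code point lies outside the ASCII range."""
    return sum(1 for character in string if ord(character) > 127)


names = [
    'instagram',
    'Docs To Go™ Free Office Suite',
    'Instachat 😜',
    '爱奇艺PPS -《欢乐颂2》电视剧热播',
]

for name in names:
    print(name, '->', count_non_ascii(name))
```

The trademark sign and the emoji each add a single non-ASCII character, so those names stay under the threshold, while the Chinese title goes well over it.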


* Create two new lists
    1. iOS English only, built from `apple_apps_data`
    2. Google Play English only, built from `android_clean`

* Loop through each data set, take the app name (index 1 for iOS, index 0 for Google Play), run it through the `check_for_english` function, and if it returns `True`, append the row to the English-only list.

In [22]:
apple_apps_eng_only_data = []
google_apps_eng_only_data = []

for row in apple_apps_data:
    if check_for_english(row[1]):  # track_name is at index 1
        apple_apps_eng_only_data.append(row)

for row in android_clean:
    if check_for_english(row[0]):  # App name is at index 0
        google_apps_eng_only_data.append(row)
        
print('Apple Unique English Only Apps')
print('\n')
explore_data(apple_apps_eng_only_data, 1, 4, True)
print('\n')
print('Google Unique English Only Apps')
print('\n')
explore_data(google_apps_eng_only_data, 1, 4, True)

Apple Unique English Only Apps


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


Number of rows: 6183
Number of columns: 16


Google Unique English Only Apps


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Fre

# Removing Paid Apps

* iOS stores the price as a numeric string in index 4 (`'0.0'` for free apps)
* Google Play marks free apps as `'Free'` in the `Type` column at index 6


In [38]:
google_apps_paid = []
apple_apps_paid = []
google_apps_free = []
apple_apps_free = []

for row in google_apps_eng_only_data:
    if row[6] == 'Free':
        google_apps_free.append(row)
    else:
        google_apps_paid.append(row)

for row in apple_apps_eng_only_data:
    if float(row[4]) == 0:
        apple_apps_free.append(row)
    else:
        apple_apps_paid.append(row)

print('Google Free')
print('\n')
explore_data(google_apps_free, 1, 4, True)
print('Apple Free')
print('\n')
explore_data(apple_apps_free, 1, 4, True)
print('Google Paid')
print('\n')
explore_data(google_apps_paid, 1, 4, True)
print('Apple Paid')
print('\n')
explore_data(apple_apps_paid, 1, 4, True)


Google Free


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


Number of rows: 8863
Number of columns: 13
Apple Free


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '

# Beginning Analysis

Because our end goal is to add the app on both Google Play and the App Store, we need to find app profiles that are successful on both markets. For instance, a profile that works well for both markets might be a productivity app that makes use of gamification.

Let's begin the analysis by getting a sense of which genres are most common in each market. For this, we'll need to build frequency tables for a few columns in our data sets. The relevant columns and their indices are listed below.

**Google**
'App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'

* Category: 1
* Rating: 2
* Installs: 5
* Content Rating: 8
* Genres: 9

**Apple**

'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'

* Rating count: 5
* User rating: 7
* Prime genre: 11
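
A frequency table for any of the columns listed above can be sketched with `collections.Counter` from the standard library. This is an alternative to writing the counting loop by hand. `freq_table_percent` is a hypothetical name, and the sample rows are made up for illustration.

```python
from collections import Counter


def freq_table_percent(dataset, index):
    """Return {value: percentage of rows} for the column at `index`."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return {key: round(count / total * 100, 2) for key, count in counts.items()}


sample = [["a", "GAME"], ["b", "GAME"], ["c", "TOOLS"], ["d", "FAMILY"]]

table = freq_table_percent(sample, 1)
print(table)
```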



In [58]:
google_categories = {}
total_apps = 0

# Count how many free apps fall into each category (index 1)
for row in google_apps_free:
    total_apps += 1
    category = row[1]
    if category not in google_categories:
        google_categories[category] = 1
    else:
        google_categories[category] += 1

# Convert the counts to percentages of all free apps
google_cat_percentages = {}
for key in google_categories:
    percentage = (google_categories[key] / total_apps) * 100
    google_cat_percentages[key] = round(percentage, 2)

print(google_cat_percentages)
    

{'ART_AND_DESIGN': 0.64, 'AUTO_AND_VEHICLES': 0.93, 'BEAUTY': 0.6, 'BOOKS_AND_REFERENCE': 2.14, 'BUSINESS': 4.59, 'COMICS': 0.62, 'COMMUNICATION': 3.24, 'DATING': 1.86, 'EDUCATION': 1.16, 'ENTERTAINMENT': 0.96, 'EVENTS': 0.71, 'FINANCE': 3.7, 'FOOD_AND_DRINK': 1.24, 'HEALTH_AND_FITNESS': 3.08, 'HOUSE_AND_HOME': 0.82, 'LIBRARIES_AND_DEMO': 0.94, 'LIFESTYLE': 3.9, 'GAME': 9.73, 'FAMILY': 18.9, 'MEDICAL': 3.53, 'SOCIAL': 2.66, 'SHOPPING': 2.25, 'PHOTOGRAPHY': 2.94, 'SPORTS': 3.4, 'TRAVEL_AND_LOCAL': 2.34, 'TOOLS': 8.46, 'PERSONALIZATION': 3.32, 'PRODUCTIVITY': 3.89, 'PARENTING': 0.65, 'WEATHER': 0.8, 'VIDEO_PLAYERS': 1.79, 'NEWS_AND_MAGAZINES': 2.8, 'MAPS_AND_NAVIGATION': 1.4}


In [75]:
def explore_column(dataset, index_col):
    # Returns a frequency table for one column, expressed as percentages
    temp_dict = {}
    temp_dict_perc = {}
    total = 0

    # Count how often each value appears in the column
    for row in dataset:
        total += 1
        column = row[index_col]
        if column not in temp_dict:
            temp_dict[column] = 1
        else:
            temp_dict[column] += 1

    # Convert the counts to percentages of all rows
    for key in temp_dict:
        percentage = (temp_dict[key] / total) * 100
        temp_dict_perc[key] = round(percentage, 2)

    return temp_dict_perc
    

In [76]:
explore_column(google_apps_free, 9) #Genres

{'Art & Design': 0.6,
 'Art & Design;Creativity': 0.07,
 'Auto & Vehicles': 0.93,
 'Beauty': 0.6,
 'Books & Reference': 2.14,
 'Business': 4.59,
 'Comics': 0.61,
 'Comics;Creativity': 0.01,
 'Communication': 3.24,
 'Dating': 1.86,
 'Education': 5.35,
 'Education;Creativity': 0.05,
 'Education;Education': 0.34,
 'Education;Pretend Play': 0.06,
 'Education;Brain Games': 0.03,
 'Entertainment': 6.07,
 'Entertainment;Brain Games': 0.08,
 'Entertainment;Creativity': 0.03,
 'Entertainment;Music & Video': 0.17,
 'Events': 0.71,
 'Finance': 3.7,
 'Food & Drink': 1.24,
 'Health & Fitness': 3.08,
 'House & Home': 0.82,
 'Libraries & Demo': 0.94,
 'Lifestyle': 3.89,
 'Lifestyle;Pretend Play': 0.01,
 'Card': 0.45,
 'Arcade': 1.85,
 'Puzzle': 1.13,
 'Racing': 0.99,
 'Sports': 3.46,
 'Casual': 1.76,
 'Simulation': 2.04,
 'Adventure': 0.68,
 'Trivia': 0.42,
 'Action': 3.1,
 'Word': 0.26,
 'Role Playing': 0.94,
 'Strategy': 0.9,
 'Board': 0.38,
 'Music': 0.2,
 'Action;Action & Adventure': 0.1,
 'Casua

In [63]:
explore_column(apple_apps_paid, 11)

{'Games': 51.27, 'Entertainment': 6.59, 'Music': 2.4, 'Photo & Video': 6.11, 'Health & Fitness': 3.38, 'Business': 1.22, 'Weather': 1.38, 'Utilities': 4.46, 'News': 0.47, 'Education': 9.86, 'Reference': 1.18, 'Productivity': 3.78, 'Navigation': 0.74, 'Lifestyle': 1.62, 'Book': 1.38, 'Finance': 0.44, 'Sports': 1.18, 'Medical': 0.51, 'Travel': 0.68, 'Shopping': 0.03, 'Food & Drink': 0.61, 'Social Networking': 0.68, 'Catalogs': 0.03}


In [77]:
def display_table(dataset, index):
    table = explore_column(dataset, index)
    # Build (percentage, key) tuples so that sorting orders by percentage
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    # Sort from highest to lowest percentage and print as "key : percentage"
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

In [78]:
#apple Prime Genres
display_table(apple_apps_paid, 11)

Games : 51.27
Education : 9.86
Entertainment : 6.59
Photo & Video : 6.11
Utilities : 4.46
Productivity : 3.78
Health & Fitness : 3.38
Music : 2.4
Lifestyle : 1.62
Weather : 1.38
Book : 1.38
Business : 1.22
Sports : 1.18
Reference : 1.18
Navigation : 0.74
Travel : 0.68
Social Networking : 0.68
Food & Drink : 0.61
Medical : 0.51
News : 0.47
Finance : 0.44
Shopping : 0.03
Catalogs : 0.03
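
The same descending order can be produced without building tuples first, by passing a `key` function to `sorted`. The sketch below is an alternative way to write the sorting step. `print_sorted` is a hypothetical name, and the sample table reuses three values from the output above.

```python
def print_sorted(table):
    """Print a {key: value} table from the highest value to the lowest."""
    ordered = sorted(table.items(), key=lambda item: item[1], reverse=True)
    for key, value in ordered:
        print(key, ':', value)
    return ordered


sample_table = {'Music': 2.4, 'Games': 51.27, 'Education': 9.86}

ordered = print_sorted(sample_table)
```

One difference to be aware of: with `(value, key)` tuples, ties are broken by the key in reverse alphabetical order, whereas the `key` function version leaves tied entries in their original dictionary order.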


# Apple Analysis
* Games make up over 50% of the paid apps in the App Store
* Professional (Education, Utilities, Productivity): ~18%
* Entertainment: ~7%
* Hobby (Photo & Video, Music, Lifestyle): ~10%
* Health: ~4%

## Additional Analysis Needed

* What avg price are people willing to pay based on the app type?
* How many downloads do these apps get?

In [79]:
#Google Categories
display_table(google_apps_paid, 1)

FAMILY : 24.37
MEDICAL : 10.92
GAME : 10.92
PERSONALIZATION : 10.79
TOOLS : 10.39
PRODUCTIVITY : 3.73
BOOKS_AND_REFERENCE : 3.73
COMMUNICATION : 3.6
SPORTS : 3.2
PHOTOGRAPHY : 2.53
LIFESTYLE : 2.4
FINANCE : 2.26
HEALTH_AND_FITNESS : 2.0
TRAVEL_AND_LOCAL : 1.6
BUSINESS : 1.6
WEATHER : 1.07
MAPS_AND_NAVIGATION : 0.67
DATING : 0.67
VIDEO_PLAYERS : 0.53
SOCIAL : 0.4
EDUCATION : 0.4
ART_AND_DESIGN : 0.4
SHOPPING : 0.27
PARENTING : 0.27
NEWS_AND_MAGAZINES : 0.27
FOOD_AND_DRINK : 0.27
ENTERTAINMENT : 0.27
AUTO_AND_VEHICLES : 0.27
LIBRARIES_AND_DEMO : 0.13
EVENTS : 0.13


# Google Category Analysis
* Family: ~24%
* Professional (Tools, Productivity, Books & Reference): ~18%
* Health + Medical: ~13%
* Personalization is its own category: ~11%
* Gaming: ~11%
* Hobby (Photography, Lifestyle): ~5%


## Additional Analysis Needed

* What avg price are people willing to pay based on the app type?
* How many downloads do these apps get?

In [80]:
#Google Genres
display_table(google_apps_paid, 9)

Medical : 10.92
Personalization : 10.79
Tools : 10.39
Education : 3.86
Productivity : 3.73
Books & Reference : 3.73
Communication : 3.6
Sports : 3.2
Action : 3.2
Role Playing : 2.8
Arcade : 2.66
Puzzle : 2.53
Photography : 2.53
Entertainment : 2.53
Lifestyle : 2.4
Finance : 2.26
Health & Fitness : 2.0
Strategy : 1.86
Travel & Local : 1.6
Education;Pretend Play : 1.6
Business : 1.6
Adventure : 1.6
Simulation : 1.2
Casual : 1.2
Weather : 1.07
Board : 1.07
Card : 0.93
Board;Brain Games : 0.93
Maps & Navigation : 0.67
Educational;Pretend Play : 0.67
Education;Education : 0.67
Dating : 0.67
Video Players & Editors : 0.53
Educational : 0.53
Casual;Pretend Play : 0.53
Social : 0.4
Racing : 0.4
Educational;Education : 0.4
Art & Design : 0.4
Arcade;Action & Adventure : 0.4
Action;Action & Adventure : 0.4
Sports;Action & Adventure : 0.27
Simulation;Education : 0.27
Shopping : 0.27
Puzzle;Brain Games : 0.27
Parenting : 0.27
News & Magazines : 0.27
Food & Drink : 0.27
Educational;Creativity : 0.27

# Google Genres Analysis
* Long tail: the genres need to be grouped further before they can be compared
* Health + Medical: ~13% 
* Personalization is its own category: 11%
* Entertainment: Gaming at 7%
* Hobby comprises: Photo, Lifestyle: 5%


## Additional Analysis Needed

* What avg price are people willing to pay based on the app type?
* How many downloads do these apps get?

# Approximate number of installs per genre
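* The App Store data has no installs column, so the total number of user ratings (`rating_count_tot`, index 5) serves as a proxy.
* Isolate the apps of each genre.
* Sum up the number of user ratings for the apps of that genre.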
* Divide the sum by the number of apps belonging to that genre (not by the total number of apps).


In [106]:
# My way of doing it without a nested loop
apple_app_ratings = {}  # sum of rating counts (index 5) per genre
genre_count = {}        # number of apps per genre
for app in apple_apps_paid:
    genre = app[11]
    if genre not in apple_app_ratings:
        apple_app_ratings[genre] = float(app[5])
        genre_count[genre] = 1
    else:
        apple_app_ratings[genre] += float(app[5])
        genre_count[genre] += 1

# Average number of ratings per app in each genre
avg_apple_ratings = {}

for genre in apple_app_ratings:
    avg_apple_ratings[genre] = round(apple_app_ratings[genre] / genre_count[genre], 2)

print(avg_apple_ratings)
    
    

{'Games': 6695.86, 'Entertainment': 2131.51, 'Music': 2759.2, 'Photo & Video': 2531.52, 'Health & Fitness': 2679.85, 'Business': 4043.47, 'Weather': 3248.41, 'Utilities': 1326.68, 'News': 3872.36, 'Education': 640.97, 'Reference': 2400.37, 'Productivity': 2247.93, 'Navigation': 1174.59, 'Lifestyle': 902.77, 'Book': 320.41, 'Finance': 882.85, 'Sports': 253.74, 'Medical': 663.73, 'Travel': 602.95, 'Shopping': 2722.0, 'Food & Drink': 579.5, 'Social Networking': 393.0, 'Catalogs': 1309.0}


In [110]:
#With a nested loop

unique_genres = explore_column(apple_apps_paid, 11)
avg_apple_ratings_2 = {}

for genre in unique_genres:
    total = 0 # will store the sum of rating counts for each genre
    len_genre = 0 # will store the number of apps for each genre
    for app in apple_apps_paid:
        genre_app = app[11]
        if genre_app == genre:
            total += float(app[5])
            len_genre += 1
    avg_apple_ratings_2[genre] = round(total / len_genre, 2)
    
print(avg_apple_ratings_2)         


{'Games': 6695.86, 'Entertainment': 2131.51, 'Music': 2759.2, 'Photo & Video': 2531.52, 'Health & Fitness': 2679.85, 'Business': 4043.47, 'Weather': 3248.41, 'Utilities': 1326.68, 'News': 3872.36, 'Education': 640.97, 'Reference': 2400.37, 'Productivity': 2247.93, 'Navigation': 1174.59, 'Lifestyle': 902.77, 'Book': 320.41, 'Finance': 882.85, 'Sports': 253.74, 'Medical': 663.73, 'Travel': 602.95, 'Shopping': 2722.0, 'Food & Drink': 579.5, 'Social Networking': 393.0, 'Catalogs': 1309.0}
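
Both versions give the same result, and the single-pass idea can be generalized so it works for any grouping column and any numeric column. The following is only a sketch: `average_by_group` is a hypothetical helper, the sample rows are invented, and `defaultdict` is used so that new groups do not need a separate `if` branch.

```python
from collections import defaultdict


def average_by_group(rows, group_idx, value_idx):
    """Return {group: average of the value column}, rounded to 2 decimals."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_idx]
        totals[group] += float(row[value_idx])
        counts[group] += 1
    return {group: round(totals[group] / counts[group], 2) for group in totals}


sample = [["Games", "100"], ["Games", "300"], ["Music", "50"]]

averages = average_by_group(sample, 0, 1)
print(averages)
```

For the App Store data the call would be `average_by_group(apple_apps_paid, 11, 5)`.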


In [111]:
def dictionary_table(dictionary):
    table_display = []
    for key in dictionary:
        key_val_as_tuple = (dictionary[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

In [112]:
dictionary_table(avg_apple_ratings)

Games : 6695.86
Business : 4043.47
News : 3872.36
Weather : 3248.41
Music : 2759.2
Shopping : 2722.0
Health & Fitness : 2679.85
Photo & Video : 2531.52
Reference : 2400.37
Productivity : 2247.93
Entertainment : 2131.51
Utilities : 1326.68
Catalogs : 1309.0
Navigation : 1174.59
Lifestyle : 902.77
Finance : 882.85
Medical : 663.73
Education : 640.97
Travel : 602.95
Food & Drink : 579.5
Social Networking : 393.0
Book : 320.41
Sports : 253.74


# Analysis
* Business apps make up a small share of paid apps in the App Store (about 1%), but they have the second-highest average number of user ratings, which we use here as a proxy for installs.
* Given that business users generally have a higher willingness to pay, this seems like an interesting opportunity to explore further.

In [117]:
unique_genres_google_play = explore_column(google_apps_paid, 1)
avg_google_play_installs = {}

for category in unique_genres_google_play:
    total = 0
    len_category = 0
    
    for app in google_apps_paid:
        category_app = app[1]
        if category_app == category:
            installs = app[5]
            installs = installs.replace(',','')
            installs = installs.replace('+', '')
            installs = float(installs)
            total += installs
            len_category += 1
            
    avg_google_play_installs[category] = round(total / len_category, 0)
    
dictionary_table(avg_google_play_installs)

GAME : 256097.0
FAMILY : 115567.0
WEATHER : 101500.0
ENTERTAINMENT : 100000.0
PHOTOGRAPHY : 98881.0
LIFESTYLE : 65506.0
SPORTS : 51826.0
PRODUCTIVITY : 50431.0
COMMUNICATION : 50372.0
PERSONALIZATION : 40232.0
EDUCATION : 34000.0
HEALTH_AND_FITNESS : 31607.0
FOOD_AND_DRINK : 30000.0
PARENTING : 25050.0
AUTO_AND_VEHICLES : 25025.0
MAPS_AND_NAVIGATION : 24220.0
TOOLS : 22147.0
VIDEO_PLAYERS : 17750.0
BUSINESS : 17731.0
TRAVEL_AND_LOCAL : 15255.0
FINANCE : 10918.0
MEDICAL : 6838.0
ART_AND_DESIGN : 5333.0
SHOPPING : 5050.0
NEWS_AND_MAGAZINES : 2750.0
DATING : 2070.0
SOCIAL : 2000.0
BOOKS_AND_REFERENCE : 833.0
LIBRARIES_AND_DEMO : 100.0
EVENTS : 1.0
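
The `Installs` values are strings such as `'1,000+'`, so they have to be cleaned before they can be averaged. The cleaning step from the cell above could be isolated in a small function like this sketch, where `parse_installs` is a hypothetical name.

```python
def parse_installs(installs):
    """Convert an installs string such as '1,000,000+' to a float."""
    return float(installs.replace(',', '').replace('+', ''))


for value in ['0', '1,000+', '50,000,000+']:
    print(value, '->', parse_installs(value))
```

Because a value like `'1,000+'` means "at least 1,000", the parsed number is a lower bound, and the averages in the table above are approximate.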
