#### Introduction
We only build apps that are free to download and install, and our main source of revenue is in-app ads. This means our revenue for any given app is mostly influenced by the number of users who use it: the more users who see and engage with the ads, the better.

#### Anticipated Outcome
Our goal is to analyze data to help our developers understand which types of apps are likely to attract more users.

In [1]:
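# Open csv files and save each as a list of lists
from csv import reader

def open_dataset(file_name):
    # The with-block closes the file once it has been read.
    # The encoding is set explicitly (assumed to be UTF-8) because app names contain non-ASCII characters.
    with open(file_name, encoding='utf8') as opened_file:
        read_file = reader(opened_file)
        data = list(read_file)
    return data

apple_data = open_dataset('resources/AppleStore.csv')
google_data = open_dataset('resources/googleplaystore.csv')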

For additional documentation, use the following links:
1. Google dataset: [link](https://www.kaggle.com/lava18/google-play-store-apps)
2. Apple dataset: [link](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps)

In [2]:
# function slices dataset at designated indices to allow for exploration

def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [3]:
# Print header and first few rows for Apple dataset, including number of rows and columns
apple_header = apple_data[0]
# explore_data prints its output and returns None, so its result is neither stored nor printed
explore_data(apple_data, 1, 6, True)

print("Header")
print(apple_header)

['284882215', 'Facebook', '389879808', 'USD', '0', '2974676', '212', '3.5', '3.5', '95', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0', '2161558', '1289', '4.5', '4', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0', '1724546', '3842', '4.5', '4', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0', '1126879', '3594', '4', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows: 7198
Number of columns: 16
Header
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


In [4]:
# Print header and first few rows for Google dataset, including number of rows and columns

google_header = google_data[0]
# explore_data prints its output and returns None, so its result is neither stored nor printed
explore_data(google_data, 1, 3, True)

print("Header")
print(google_header)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', '7-Jan-18', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', '15-Jan-18', '2.0.0', '4.0.3 and up']


Number of rows: 10842
Number of columns: 13
Header
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


## Deleting Wrong Data

The Google Play data set has a dedicated discussion section, and one of the discussions reports an error in row 10472. That number does not count the header, so the row sits at index 10473 in google_data. Let's print this row and compare it against the header and two rows that are correct.
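
Rows like this can also be spotted programmatically instead of relying on the discussion section. The sketch below is one possible approach on a small hand-made sample: it flags rows whose length differs from the header or whose rating does not parse as a number between 0 and 5. The helper name find_suspect_rows and the 0 to 5 bound are assumptions made for illustration, not part of the data set's documentation.

```python
def find_suspect_rows(dataset, rating_index, max_rating=5.0):
    """Return (index, row) pairs whose length or rating looks wrong.

    Hypothetical helper; assumes dataset[0] is the header row.
    """
    n_columns = len(dataset[0])
    suspects = []
    for index, row in enumerate(dataset[1:], start=1):
        # A row with a different number of fields than the header is suspect.
        if len(row) != n_columns:
            suspects.append((index, row))
            continue
        # A rating that is not numeric is suspect as well.
        try:
            rating = float(row[rating_index])
        except ValueError:
            suspects.append((index, row))
            continue
        # Ratings outside the assumed range (this also catches NaN) are flagged.
        if not 0 <= rating <= max_rating:
            suspects.append((index, row))
    return suspects


# Made-up sample rows, not taken from the real data set.
sample = [
    ['App', 'Category', 'Rating'],
    ['Good App', 'TOOLS', '4.1'],
    ['Shifted App', '1.9', '19'],   # category missing, values shifted left
    ['Short Row', '3.0'],           # too few fields
]

for index, row in find_suspect_rows(sample, rating_index=2):
    print(index, row)
```

Flagged rows still need a manual look before deletion, since a missing rating is not necessarily a broken row.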

In [5]:
# Inspect the row flagged as incorrect: the Category value appears to be missing,
# so the remaining values are shifted one column to the left
print(google_data[10473])
print('\n')
print(google_header)
print('\n')
print(google_data[1:3])

# Delete the incorrect row (run this cell only once, or a different row will be deleted on each run)
del google_data[10473]

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', '11-Feb-18', '1.0.19', '4.0 and up', '']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', '7-Jan-18', '1.0.0', '4.0.3 and up'], ['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', '15-Jan-18', '2.0.0', '4.0.3 and up']]


In [6]:
# Check that row was deleted
len(google_data)

10841

## Removing Duplicate Entries
### Part One
#### Duplicate entries should not be included in analysis. The following steps were taken to identify duplicate entries:
1. Loop through each dataset.
2. For each row, check whether the app name has already been added to the unique apps list.
3. If not, append the app name to the unique apps list. If it has been added, append the app name to a duplicate apps list instead.
4. Check the length of each list, and display several examples of duplicate app names.
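
As a cross-check on these steps, the same counts can be obtained with collections.Counter, which avoids repeated membership tests on a growing list. This is a minimal sketch on made-up rows in the Google layout (app name at index 0), not the method used in the cells below.

```python
from collections import Counter

# Toy rows in the Google layout (app name at index 0); values are made up.
rows = [
    ['Box', '159'],
    ['Slack', '51507'],
    ['Box', '159'],
    ['Box', '160'],
    ['Zenefits', '296'],
]

# Count how many times each app name occurs.
name_counts = Counter(row[0] for row in rows)

unique_names = list(name_counts)
# Each name contributes (occurrences - 1) duplicate entries.
n_duplicates = sum(count - 1 for count in name_counts.values())
duplicated_names = [name for name, count in name_counts.items() if count > 1]

print('Unique apps:', len(unique_names))
print('Duplicate entries:', n_duplicates)
print('Names with duplicates:', duplicated_names)
```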

In [7]:
# Check for duplicate apps in Apple data (the header row is skipped so it is not counted as an app)
ios_unique_apps = []
ios_duplicate_apps = []

for app in apple_data[1:]:
    app_name = app[1]

    if app_name not in ios_unique_apps:
        ios_unique_apps.append(app_name)
    else:
        ios_duplicate_apps.append(app_name)

print('unique apps:', len(ios_unique_apps))
print('duplicate apps: ', len(ios_duplicate_apps))
print('Names of duplicates:', ios_duplicate_apps)

unique apps: 7195
duplicate apps:  2
Names of duplicates: ['Mannequin Challenge', 'VR Roller Coaster']


In [8]:
# Check for duplicate apps in Google data
google_unique_apps = [] 
google_duplicate_apps = [] 

for app in google_data: 
    app_name = app[0] 

    if app_name not in google_unique_apps:
        google_unique_apps.append(app_name)
    else:
        google_duplicate_apps.append(app_name)
        
print('Unique apps:', len(google_unique_apps) - 1)  # subtract 1 because the loop also counts the header row
print('Duplicate apps: ', len(google_duplicate_apps))
print('Examples of duplicates:', google_duplicate_apps[:10])

Unique apps: 9659
Duplicate apps:  1181
Examples of duplicates: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']


### Part Two
To choose which duplicates to remove, we will keep the entry with the highest number of reviews for each app. A higher review count suggests that the entry was collected more recently, so it should hold the most up-to-date data. To complete this process, we will:
1. Create a dictionary, where each dictionary key is a unique app name and the corresponding dictionary value is the highest number of reviews of that app.
2. Use the information stored in the dictionary to create a new data set, which will have only one entry per app.
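
The first step can be illustrated on a few made-up (name, reviews) pairs. This sketch uses dict.get with max, which is a more compact variant of the two if-statements in the cell below; the data and the variable names are for illustration only.

```python
# Toy (name, reviews) pairs; values are illustrative, not from the data set.
entries = [
    ('Box', 159),
    ('Slack', 51507),
    ('Box', 160),
    ('Slack', 51500),
]

reviews_max = {}
for name, n_reviews in entries:
    # dict.get returns the stored maximum, or the current value if the name is unseen.
    reviews_max[name] = max(reviews_max.get(name, n_reviews), n_reviews)

print(reviews_max)  # {'Box': 160, 'Slack': 51507}
```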

In [9]:
# Create dictionaries that map each app name to its highest review count (Google and Apple data)
google_reviews_max = {}
ios_reviews_max = {}

for row in google_data[1:]:
    name= row[0]
    n_reviews = float(row[3])
    if name in google_reviews_max and google_reviews_max[name] < n_reviews:
        google_reviews_max.update({name:n_reviews})
    if name not in google_reviews_max:
        google_reviews_max[name] = n_reviews
        
for row in apple_data[1:]:
    name= row[1]
    n_reviews = float(row[5])
    if name in ios_reviews_max and ios_reviews_max[name] < n_reviews:
        ios_reviews_max.update({name:n_reviews})
    if name not in ios_reviews_max:
        ios_reviews_max[name] = n_reviews        
        
print('Android length: ', len(google_reviews_max))
print('iOS length: ', len(ios_reviews_max))
# print(google_reviews_max)


Android length:  9659
iOS length:  7195


Now, let's use the google_reviews_max dictionary to remove the duplicates. For the duplicate cases, we'll only keep the entries with the highest number of reviews. In the code cell below:

- We start by initializing two empty lists, android_clean and already_added.
- We loop through the Android data set, and for every iteration:
  - We isolate the name of the app and the number of reviews.
  - We add the current row (row) to the android_clean list, and the app name (name) to the already_added list if:
    - The number of reviews of the current app matches the number of reviews of that app as described in the google_reviews_max dictionary; and
    - The name of the app is not already in the already_added list. We need this supplementary condition to account for cases where the highest number of reviews of a duplicate app is the same for more than one entry (for example, the Box app has three entries, and the number of reviews is the same). If we just check for google_reviews_max[name] == n_reviews, we'll still end up with duplicate entries for some apps.
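
The effect of the second condition is easiest to see on a tie. The sketch below uses made-up rows in which one app appears three times with the same review count, and compares filtering on the review count alone with filtering on both conditions. It tracks names in a set rather than a list, which is an assumption made here for faster membership tests and differs from the cell below.

```python
# Toy rows: [name, reviews]. 'Box' appears three times with the same count.
rows = [
    ['Box', '159'],
    ['Box', '159'],
    ['Slack', '51507'],
    ['Box', '159'],
    ['Slack', '51500'],
]

reviews_max = {'Box': 159.0, 'Slack': 51507.0}

# Filtering on the review count alone keeps every tied entry.
count_only = [row for row in rows if float(row[1]) == reviews_max[row[0]]]

# Adding the "not already added" check keeps one row per app.
clean = []
already_added = set()
for row in rows:
    name, n_reviews = row[0], float(row[1])
    if n_reviews == reviews_max[name] and name not in already_added:
        clean.append(row)
        already_added.add(name)

print(len(count_only))  # 4
print(len(clean))       # 2
```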

In [10]:
#create new dataset
android_clean=[]

# store app names
already_added=[]

for row in google_data[1:]:
    name = row[0]
    n_reviews=float(row[3])    
    if (n_reviews == google_reviews_max[name]) and (name not in already_added):
        # Append the entire row to the android_clean list 
        android_clean.append(row)
        # Append the app name to the already_added list 
        already_added.append(name)
        
#Check to make sure that the # of rows in the list is correct
explore_data(android_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', '7-Jan-18', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', '1-Aug-18', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', '8-Jun-18', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


In [11]:
#create new dataset
ios_clean=[]

# store app names
ios_already_added=[]

for row in apple_data[1:]:
    name = row[1]
    n_reviews=float(row[5])    
    if (n_reviews == ios_reviews_max[name]) and (name not in ios_already_added):
        # Append the entire row to the ios_clean list 
        ios_clean.append(row)
        # Append the app name to the ios_already_added list 
        ios_already_added.append(name)
        
#Check to make sure that the # of rows in the list is correct
explore_data(ios_clean, 0, 3, True)

['284882215', 'Facebook', '389879808', 'USD', '0', '2974676', '212', '3.5', '3.5', '95', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0', '2161558', '1289', '4.5', '4', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7195
Number of columns: 16


## Removing Non-English Apps
### Part One

In [12]:
# Create a function that iterates over an input string.
# For each character, check whether its code point is greater than 127 (the end of the ASCII range).
# If every character is <= 127, return True; otherwise, return False
def apps(name):
    for character in name:
        if ord(character) > 127:
            return False 
    return True

# Check results with a few examples
print("1: ", apps('Instachat'))
print("2: ", apps('Docs To Go™ Free Office Suite'))
print("3: ", apps('Instachat 😜'))
print("4: ", apps('爱奇艺PPS -《欢乐颂2》电视剧热播'))

1:  True
2:  False
3:  False
4:  False


In [13]:
# Redefine the function to tolerate up to 3 non-ASCII characters
# Note: the name ascii shadows Python's built-in ascii() function
def ascii(name):
    non_ascii= 0
    for character in name:
        if ord(character) > 127:
            non_ascii +=1
    if non_ascii>3:
        return False 
    else:
        return True
print("1: ", ascii('Instachat'))
print("2: ", ascii('Docs To Go™ Free Office Suite'))
print("3: ", ascii('Instachat 😜'))
print("4: ", ascii('爱奇艺PPS -《欢乐颂2》电视剧热播'))

1:  True
2:  True
3:  True
4:  False
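
To see why a small tolerance is enough for names such as these, it can help to count the non-ASCII characters directly. The helper count_non_ascii below is hypothetical and only illustrates the check; the threshold of 3 itself comes from the cell above.

```python
def count_non_ascii(name):
    """Count characters whose code point falls outside the ASCII range (0-127)."""
    return sum(1 for character in name if ord(character) > 127)


# The trademark sign and the emoji are each a single non-ASCII character,
# so both names stay well under a limit of 3.
for name in ['Instachat', 'Docs To Go™ Free Office Suite', 'Instachat 😜']:
    print(name, '->', count_non_ascii(name))
```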


In [14]:
# Collect English apps in new list for each dataset

android_english=[]

for app in android_clean:
    name= app[0]
    if ascii(name):
        android_english.append(app)
        
ios_english=[]

for app in ios_clean:
    name= app[1]
    if ascii(name):
        ios_english.append(app)        

# explore_data prints its own output and returns None, so it is not wrapped in print()
explore_data(android_english, 0, 3, True)
explore_data(ios_english, 0, 3, True)


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', '7-Jan-18', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', '1-Aug-18', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', '8-Jun-18', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13
['284882215', 'Facebook', '389879808', 'USD', '0', '2974676', '212', '3.5', '3.5', '95', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0', '2161558', '1289', '4.5', '4', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18'

## Separating Free/Paid Apps

In [15]:
# Remove "$" from the price column only (index 7 for Android, index 4 for iOS),
# so that other columns such as app names are left untouched
android_english = [row[:7] + [row[7].replace('$', '')] + row[8:] for row in android_english]
ios_english = [row[:4] + [row[4].replace('$', '')] + row[5:] for row in ios_english]

# Check that "$" was removed for android
# for app in android_english:
#     app_type = app[6]
#     if app_type == 'Paid':
#         print(app)
 
# Check that "$" was removed for ios by printing the paid apps
for app in ios_english:
    price = float(app[4])
    if price > 0:
        print(app)

['362949845', 'Fruit Ninja Classic', '104590336', 'USD', '1.99', '698516', '132', '4.5', '4', '2.3.9', '4+', 'Games', '38', '5', '13', '1']
['500116670', 'Clear Vision (17+)', '37879808', 'USD', '0.99', '541693', '69225', '4.5', '4.5', '1.1.3', '17+', 'Games', '43', '5', '1', '1']
['479516143', 'Minecraft: Pocket Edition', '147787776', 'USD', '6.99', '522012', '1148', '4.5', '4.5', '1.1', '9+', 'Games', '37', '1', '11', '1']
['350642635', 'Plants vs. Zombies', '105379840', 'USD', '0.99', '426463', '680', '5', '4', '1.9.13', '9+', 'Games', '38', '0', '5', '1']
['307727765', 'Doodle Jump', '48741376', 'USD', '0.99', '395261', '88', '4.5', '4.5', '3.17.6', '4+', 'Games', '37', '0', '1', '1']
['488627858', 'Draw Something', '84379648', 'USD', '2.99', '360974', '3', '4.5', '4.5', '3.0.17', '4+', 'Games', '38', '5', '1', '1']
['387428400', 'Infinity Blade', '624107810', 'USD', '0.99', '326482', '177050', '5', '5', '1.4.1', '12+', 'Games', '43', '5', '13', '1']
['625334537', 'Geometry Dash', 

['1035690802', 'Titan Quest', '1323163648', 'USD', '7.99', '1350', '388', '4', '4.5', '1.0.4', '9+', 'Games', '25', '5', '7', '1']
['515890480', 'Little Writer - The Tracing App for Kids', '37417984', 'USD', '0.99', '1348', '2', '4.5', '3', '2', '4+', 'Education', '37', '5', '1', '1']
['496177674', 'Voice Dream Reader', '89168896', 'USD', '14.99', '1343', '1', '4.5', '5', '4.2.0', '17+', 'Education', '37', '4', '26', '1']
['333263435', 'Daily Teachings', '20298752', 'USD', '4.99', '1340', '288', '4.5', '5', '1.7.3', '4+', 'Lifestyle', '38', '0', '1', '1']
['517271093', 'Virtua Tennis Challenge', '401727488', 'USD', '4.99', '1339', '459', '4', '4', '1.2', '4+', 'Games', '39', '4', '9', '1']
['634467171', 'Glitché', '30717952', 'USD', '0.99', '1338', '202', '4', '4.5', '2.12', '4+', 'Photo & Video', '37', '5', '1', '1']
['482365195', 'Kinectimals', '185228498', 'USD', '2.99', '1336', '615', '4', '4', '1.1', '4+', 'Games', '43', '5', '3', '1']
['545623778', 'Camera360 Concept - HelloCamer

['678861146', "Dr. Panda's Airport", '95670272', 'USD', '2.99', '352', '223', '4.5', '4.5', '1.8', '4+', 'Education', '40', '5', '21', '1']
['531968389', 'Deck Pass Plus', '10240000', 'USD', '1.99', '350', '170', '3.5', '3.5', '3.0.1', '4+', 'Sports', '43', '5', '1', '1']
['631446426', 'Writing Wizard - Kids Learn to Write Letters, Alphabet & Words', '42598400', 'USD', '3.99', '349', '94', '4.5', '4.5', '3.0.2', '4+', 'Education', '38', '5', '6', '1']
['656723874', 'Home - A Unique Horror Adventure', '49221632', 'USD', '2.99', '348', '35', '3.5', '3.5', '1.5.4', '12+', 'Games', '43', '5', '1', '1']
['352000376', 'Shift Worker', '7344128', 'USD', '1.99', '348', '33', '4', '4', '1.5.4', '4+', 'Productivity', '40', '0', '1', '0']
['1111617387', 'Mutant Creatures Mod for Minecraft PC Edition - Pocket Mods Guide', '27025408', 'USD', '1.99', '346', '11', '4', '2.5', '1.1', '4+', 'Reference', '38', '4', '1', '1']
['1072395467', '60 Seconds! Atomic Adventure', '348491776', 'USD', '3.99', '346'

['1096204046', 'Caterzillar', '73056256', 'USD', '2.99', '256', '256', '4.5', '4.5', '1', '4+', 'Games', '38', '5', '8', '1']
['492571492', 'Jurassic Park: The Game 4 HD', '468103168', 'USD', '2.99', '256', '41', '4', '4', '1.1.1', '12+', 'Games', '40', '5', '1', '1']
['1069508452', 'League of Light: Silent Mountain - A Hidden Object Mystery (Full)', '1188780032', 'USD', '6.99', '256', '256', '5', '5', '1.0.0', '9+', 'Games', '38', '5', '4', '1']
['937236311', 'YeahKeys - Customize your keyboard', '20318208', 'USD', '2.99', '255', '12', '2', '1.5', '3.7', '12+', 'Utilities', '37', '1', '1', '1']
['1114170869', 'Day of the Tentacle Remastered', '2779450368', 'USD', '4.99', '255', '42', '5', '4.5', '1.2', '12+', 'Games', '37', '5', '5', '1']
['1000668798', 'King Tongue', '422011904', 'USD', '1.99', '255', '63', '4.5', '4.5', '1.0.2', '9+', 'Games', '40', '5', '1', '1']
['1035217840', '[the Sequence]', '132829184', 'USD', '0.99', '254', '164', '5', '5', '1.4', '4+', 'Games', '43', '5', '1

['1065271333', 'Sago Mini Robot Party', '84750336', 'USD', '2.99', '53', '53', '4.5', '4.5', '1', '4+', 'Education', '39', '5', '1', '1']
['1141983956', 'Moana: Rhythm Run', '238605312', 'USD', '6.99', '53', '48', '3', '2.5', '1.1', '4+', 'Entertainment', '40', '5', '1', '1']
['605040858', 'Photon Safe Web Browser with Flash Player for Kids', '34903040', 'USD', '4.99', '53', '2', '2.5', '3', '6.2', '4+', 'Education', '24', '3', '10', '1']
['1099735932', 'Treasure Buster', '41987072', 'USD', '0.99', '53', '53', '4.5', '4.5', '1', '12+', 'Games', '38', '5', '1', '1']
['999702563', 'KORG iDS-10', '88055808', 'USD', '19.99', '52', '19', '4', '5', '2.1.1', '4+', 'Music', '37', '5', '2', '1']
['1068556466', 'Modern Atlas Travel Guides & City Maps — nearby places in Wikipedia & Wikivoyage', '65324032', 'USD', '0.99', '52', '32', '4.5', '4', '1.0.3', '12+', 'Travel', '37', '5', '1', '1']
['944244620', 'Santa Spy Cam! I Caught Santa!', '21561344', 'USD', '0.99', '52', '52', '1', '1', '2.2', '4+

['421988577', 'Shell Shock 1.5', '46864078', 'USD', '0.99', '26', '26', '3', '3', '1.5', '4+', 'Games', '26', '5', '1', '1']
['662611545', 'Lotto Out! - Mexican Loteria', '72475648', 'USD', '0.99', '26', '6', '3.5', '3.5', '2.3', '12+', 'Games', '40', '5', '2', '1']
['799631716', 'mySTATE - State College', '7122944', 'USD', '0.99', '26', '8', '4.5', '3.5', '1.6', '17+', 'Navigation', '38', '0', '1', '1']
['1085660579', 'Camping With Grandpa', '110712832', 'USD', '2.99', '25', '3', '3.5', '3.5', '1.3', '4+', 'Education', '37', '5', '1', '1']
['807416704', 'X-Tactics', '94363648', 'USD', '3.99', '25', '2', '5', '5', '2.4.0', '12+', 'Games', '38', '5', '6', '1']
['985077390', 'Roll Turtle', '92815360', 'USD', '0.99', '25', '3', '5', '4.5', '1.1', '4+', 'Games', '40', '5', '1', '1']
['1161267000', 'Beams.', '100624384', 'USD', '0.99', '25', '25', '4.5', '4.5', '1', '4+', 'Games', '38', '5', '11', '1']
['923873989', 'CreamCam+ auto selfie enhancer', '47947776', 'USD', '1.99', '25', '0', '3.

['1122211704', "Please, Don't Touch Anything 3D", '468603904', 'USD', '5.99', '5', '2', '4', '4.5', '1.3', '9+', 'Games', '25', '5', '1', '1']
['1093438308', 'Moomijis Moomin Stickers', '19173376', 'USD', '0.99', '5', '5', '3', '3', '1.5', '4+', 'Utilities', '37', '2', '1', '1']
['1079094909', 'Custom Cam', '17545216', 'USD', '1.99', '5', '0', '1.5', '0', '2.0.3', '4+', 'Photo & Video', '37', '0', '1', '1']
['1040523600', 'Zasa : An AI Story', '31244288', 'USD', '0.99', '5', '1', '4.5', '4', '1.2.13', '4+', 'Games', '40', '4', '16', '1']
['1140033685', 'Gymnastics Girl Jump American Athlete sports PRO', '38723584', 'USD', '2.99', '5', '3', '3', '3.5', '1.4', '4+', 'Games', '40', '4', '1', '1']
['1141588161', 'HauntedPic', '89677824', 'USD', '0.99', '5', '0', '5', '0', '1.2', '9+', 'Entertainment', '37', '0', '1', '1']
['1087742191', 'Smart Baby Sorter HD - Early Learning Shapes and Colors / Matching and Educational Games for Preschool Kids', '16807936', 'USD', '2.99', '5', '5', '4.5', 

In [16]:
# Create new lists of free apps
android_free = []
ios_free = []

# Each element is a full app row; the price is at index 7 (Android) or index 4 (iOS)
for app in android_english:
    price = float(app[7])
    if price <= 0:
        android_free.append(app)
        
for app in ios_english:
    price = float(app[4])
    if price <= 0:
        ios_free.append(app)
        
print(len(android_free))
print(len(ios_free))

8864
3220


## Compare Android and iOS Apps
#### Inspect to see which genres have the greatest number of apps

In [17]:
# Create a frequency table of genres for iOS, and Android 

ios_genre_counting={}
android_genre_counting={}
android_category_counting={}

for row in ios_free:
    genre=row[11]
    if genre in ios_genre_counting:
        ios_genre_counting[genre] += 1
    else:
        ios_genre_counting[genre]=1
        
for row in android_free:
    genre=row[9]
    if genre in android_genre_counting:
        android_genre_counting[genre] += 1
    else:
        android_genre_counting[genre]=1

for row in android_free:
    category=row[1]
    if category in android_category_counting:
        android_category_counting[category] += 1
    else:
        android_category_counting[category]=1
        
# print('iOS Genre Counts')
# print(ios_genre_counting)
# print("\n")
# print('Android Genre Counts')
# print(android_genre_counting)
# print("\n")
print('Android Category Counts')
print(android_category_counting)

Android Category Counts
{'ART_AND_DESIGN': 57, 'AUTO_AND_VEHICLES': 82, 'BEAUTY': 53, 'BOOKS_AND_REFERENCE': 190, 'BUSINESS': 407, 'COMICS': 55, 'COMMUNICATION': 287, 'DATING': 165, 'EDUCATION': 103, 'ENTERTAINMENT': 85, 'EVENTS': 63, 'FINANCE': 328, 'FOOD_AND_DRINK': 110, 'HEALTH_AND_FITNESS': 273, 'HOUSE_AND_HOME': 73, 'LIBRARIES_AND_DEMO': 83, 'LIFESTYLE': 346, 'GAME': 862, 'FAMILY': 1676, 'MEDICAL': 313, 'SOCIAL': 236, 'SHOPPING': 199, 'PHOTOGRAPHY': 261, 'SPORTS': 301, 'TRAVEL_AND_LOCAL': 207, 'TOOLS': 750, 'PERSONALIZATION': 294, 'PRODUCTIVITY': 345, 'PARENTING': 58, 'WEATHER': 71, 'VIDEO_PLAYERS': 159, 'NEWS_AND_MAGAZINES': 248, 'MAPS_AND_NAVIGATION': 124}


### Find the percentage of Apps in each Genre

In [18]:
# Create function which takes a dataset and index from the dataset and returns a frequency table 
# in which the values are displayed as percentages
def freq_table(dataset, index):
    frequency_table ={}
    
    total_number_of_apps = (len(dataset)) 
    for i in dataset:
        column=i[index]
        if column in frequency_table:
            frequency_table[column]+=1
        else:
            frequency_table[column]=1
    
    percentages={}
    for val in frequency_table:
        proportion=frequency_table[val]/(total_number_of_apps)
        percentage= round(proportion*100,2)
        percentages[val]=percentage
    return percentages
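
For comparison, the same percentage table could be sketched with collections.Counter. The function name freq_table_counter and the sample rows are made up for illustration; the cells that follow keep using freq_table as defined above.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Return {value: percentage} for the column at `index`, rounded to 2 places."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return {value: round(count / total * 100, 2) for value, count in counts.items()}


# Toy rows (genre at index 0); values are made up for illustration.
sample = [['Games'], ['Games'], ['Games'], ['Music']]
print(freq_table_counter(sample, 0))  # {'Games': 75.0, 'Music': 25.0}
```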


In [19]:
# Create a function which calls freq_table and prints the resulting percentages
# in descending order

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])        

In [20]:
# Display table for free, English iOS apps

display_table(ios_free, 11)

Games : 58.14
Entertainment : 7.89
Photo & Video : 4.97
Education : 3.66
Social Networking : 3.29
Shopping : 2.61
Utilities : 2.52
Sports : 2.14
Music : 2.05
Health & Fitness : 2.02
Productivity : 1.74
Lifestyle : 1.58
News : 1.34
Travel : 1.24
Finance : 1.12
Weather : 0.87
Food & Drink : 0.81
Reference : 0.56
Business : 0.53
Book : 0.43
Navigation : 0.19
Medical : 0.19
Catalogs : 0.12


### Genre Analysis of App Store Free, English Apps
- From the list, we can see that the most common genre is Games, followed by Entertainment.
- Games make up about 58% of the free, English apps available for download.
- Almost 80% of the free, English apps are for entertainment purposes, if we count Games, Entertainment, Photo & Video, Social Networking, Sports, and Music together.
- From this data, the Games segment of the App Store looks crowded. However, this data set alone cannot tell us whether the number of apps in a genre correlates with the number of users who download apps from that genre.
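
One way to arrive at a figure close to 80% is to add up the shares of the genres that are mostly about fun. Which genres belong in that group is an assumption made for this sketch; the percentages are copied from the table above.

```python
# Percentages copied from the iOS frequency table above.
ios_genre_share = {
    'Games': 58.14,
    'Entertainment': 7.89,
    'Photo & Video': 4.97,
    'Social Networking': 3.29,
    'Sports': 2.14,
    'Music': 2.05,
}

# Assumption: these six genres are treated as "entertainment-oriented".
entertainment_share = round(sum(ios_genre_share.values()), 2)
print(entertainment_share)  # 78.48
```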

In [21]:
# Display table for free, English Android app categories
display_table(android_free, 1)

FAMILY : 18.91
GAME : 9.72
TOOLS : 8.46
BUSINESS : 4.59
LIFESTYLE : 3.9
PRODUCTIVITY : 3.89
FINANCE : 3.7
MEDICAL : 3.53
SPORTS : 3.4
PERSONALIZATION : 3.32
COMMUNICATION : 3.24
HEALTH_AND_FITNESS : 3.08
PHOTOGRAPHY : 2.94
NEWS_AND_MAGAZINES : 2.8
SOCIAL : 2.66
TRAVEL_AND_LOCAL : 2.34
SHOPPING : 2.25
BOOKS_AND_REFERENCE : 2.14
DATING : 1.86
VIDEO_PLAYERS : 1.79
MAPS_AND_NAVIGATION : 1.4
FOOD_AND_DRINK : 1.24
EDUCATION : 1.16
ENTERTAINMENT : 0.96
LIBRARIES_AND_DEMO : 0.94
AUTO_AND_VEHICLES : 0.93
HOUSE_AND_HOME : 0.82
WEATHER : 0.8
EVENTS : 0.71
PARENTING : 0.65
ART_AND_DESIGN : 0.64
COMICS : 0.62
BEAUTY : 0.6


### Genre Analysis of Google Play Free, English Apps
- From the data, we can see that the free apps available in the Google Play Store serve more practical uses and are less geared toward entertainment, in stark contrast to the App Store data. The frequency table for the Genres column below confirms this.
 

In [22]:
# Display table for free, English Android apps
display_table(android_free, 9)

Tools : 8.45
Entertainment : 6.07
Education : 5.35
Business : 4.59
Productivity : 3.89
Lifestyle : 3.89
Finance : 3.7
Medical : 3.53
Sports : 3.46
Personalization : 3.32
Communication : 3.24
Action : 3.1
Health & Fitness : 3.08
Photography : 2.94
News & Magazines : 2.8
Social : 2.66
Travel & Local : 2.32
Shopping : 2.25
Books & Reference : 2.14
Simulation : 2.04
Dating : 1.86
Arcade : 1.85
Video Players & Editors : 1.77
Casual : 1.76
Maps & Navigation : 1.4
Food & Drink : 1.24
Puzzle : 1.13
Racing : 0.99
Role Playing : 0.94
Libraries & Demo : 0.94
Auto & Vehicles : 0.93
Strategy : 0.91
House & Home : 0.82
Weather : 0.8
Events : 0.71
Adventure : 0.68
Comics : 0.61
Beauty : 0.6
Art & Design : 0.6
Parenting : 0.5
Card : 0.45
Casino : 0.43
Trivia : 0.42
Educational;Education : 0.39
Board : 0.38
Educational : 0.37
Education;Education : 0.34
Word : 0.26
Casual;Pretend Play : 0.24
Music : 0.2
Racing;Action & Adventure : 0.17
Puzzle;Brain Games : 0.17
Entertainment;Music & Video : 0.17
Casual;

### Inspect the average number of users per Genre

In [23]:
prime_genre_table=freq_table(ios_free,11)
print(prime_genre_table)

{'Social Networking': 3.29, 'Photo & Video': 4.97, 'Games': 58.14, 'Music': 2.05, 'Reference': 0.56, 'Health & Fitness': 2.02, 'Weather': 0.87, 'Utilities': 2.52, 'Travel': 1.24, 'Shopping': 2.61, 'News': 1.34, 'Navigation': 0.19, 'Lifestyle': 1.58, 'Entertainment': 7.89, 'Food & Drink': 0.81, 'Sports': 2.14, 'Book': 0.43, 'Finance': 1.12, 'Education': 3.66, 'Productivity': 1.74, 'Business': 0.53, 'Catalogs': 0.12, 'Medical': 0.19}


In [24]:
# For each genre, average the total rating count (index 5), used here as a proxy for the number of users
for genre in prime_genre_table:
    total = 0
    len_genre = 0
    for app in ios_free:
        genre_app = app[11]
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = round(total / len_genre, 2)
    print(genre, ": ", avg_n_ratings)

Social Networking :  71548.35
Photo & Video :  28441.54
Games :  22812.92
Music :  57326.53
Reference :  74942.11
Health & Fitness :  23298.02
Weather :  52279.89
Utilities :  18684.46
Travel :  28243.8
Shopping :  26919.69
News :  21248.02
Navigation :  86090.33
Lifestyle :  16485.76
Entertainment :  14029.83
Food & Drink :  33333.92
Sports :  23008.9
Book :  39758.5
Finance :  31467.94
Education :  7003.98
Productivity :  21028.41
Business :  7491.12
Catalogs :  4004.0
Medical :  612.0


In [25]:
# app name and rating count total for each app in a particular genre
for app in ios_free:
    if app[11] == 'Reference':
        print(app[1], ':', app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0
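
The listing suggests that a handful of very popular apps dominate the Reference average. A quick way to check this is to compare the mean with the median of the rating counts printed above. This is an additional check sketched here, not something the earlier cells compute.

```python
from statistics import mean, median

# Rating counts copied from the Reference listing above.
reference_rating_counts = [
    985920, 200047, 54175, 26786, 18418, 17588, 16849, 12122, 8535,
    4693, 1497, 826, 762, 718, 14, 8, 0, 0,
]

# The mean matches the Reference average reported earlier.
print(round(mean(reference_rating_counts), 2))  # 74942.11
# The median is far lower, which points to a few large apps skewing the mean.
print(median(reference_rating_counts))          # 6614.0
```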


In [27]:
display_table(android_free, 5)

1,000,000+ : 15.73
100,000+ : 11.55
10,000,000+ : 10.55
10,000+ : 10.2
1,000+ : 8.39
100+ : 6.92
5,000,000+ : 6.83
500,000+ : 5.56
50,000+ : 4.77
5,000+ : 4.51
10+ : 3.54
500+ : 3.25
50,000,000+ : 2.3
100,000,000+ : 2.13
50+ : 1.92
5+ : 0.79
1+ : 0.51
500,000,000+ : 0.27
1,000,000,000+ : 0.23
0+ : 0.05
0 : 0.01


In [29]:
category_table=freq_table(android_free, 1)
print(category_table)

{'ART_AND_DESIGN': 0.64, 'AUTO_AND_VEHICLES': 0.93, 'BEAUTY': 0.6, 'BOOKS_AND_REFERENCE': 2.14, 'BUSINESS': 4.59, 'COMICS': 0.62, 'COMMUNICATION': 3.24, 'DATING': 1.86, 'EDUCATION': 1.16, 'ENTERTAINMENT': 0.96, 'EVENTS': 0.71, 'FINANCE': 3.7, 'FOOD_AND_DRINK': 1.24, 'HEALTH_AND_FITNESS': 3.08, 'HOUSE_AND_HOME': 0.82, 'LIBRARIES_AND_DEMO': 0.94, 'LIFESTYLE': 3.9, 'GAME': 9.72, 'FAMILY': 18.91, 'MEDICAL': 3.53, 'SOCIAL': 2.66, 'SHOPPING': 2.25, 'PHOTOGRAPHY': 2.94, 'SPORTS': 3.4, 'TRAVEL_AND_LOCAL': 2.34, 'TOOLS': 8.46, 'PERSONALIZATION': 3.32, 'PRODUCTIVITY': 3.89, 'PARENTING': 0.65, 'WEATHER': 0.8, 'VIDEO_PLAYERS': 1.79, 'NEWS_AND_MAGAZINES': 2.8, 'MAPS_AND_NAVIGATION': 1.4}


In [34]:
# For each category, average the install counts (index 5) after stripping '+' and ','
for category in category_table:
    total = 0
    len_category = 0
    for app in android_free:
        category_app = app[1]
        if category_app == category:
            installs = app[5]
            installs = installs.replace('+', '')
            installs = installs.replace(',', '')
            installs = float(installs)
            total += installs
            len_category += 1
    avg_installs = round(total / len_category, 2)
    print(category, ": ", avg_installs)

ART_AND_DESIGN :  1986335.09
AUTO_AND_VEHICLES :  647317.82
BEAUTY :  513151.89
BOOKS_AND_REFERENCE :  8767811.89
BUSINESS :  1712290.15
COMICS :  817657.27
COMMUNICATION :  38456119.17
DATING :  854028.83
EDUCATION :  1833495.15
ENTERTAINMENT :  11640705.88
EVENTS :  253542.22
FINANCE :  1387692.48
FOOD_AND_DRINK :  1924897.74
HEALTH_AND_FITNESS :  4188821.99
HOUSE_AND_HOME :  1331540.56
LIBRARIES_AND_DEMO :  638503.73
LIFESTYLE :  1437816.27
GAME :  15588015.6
FAMILY :  3695641.82
MEDICAL :  120550.62
SOCIAL :  23253652.13
SHOPPING :  7036877.31
PHOTOGRAPHY :  17840110.4
SPORTS :  3638640.14
TRAVEL_AND_LOCAL :  13984077.71
TOOLS :  10801391.3
PERSONALIZATION :  5201482.61
PRODUCTIVITY :  16787331.34
PARENTING :  542603.62
WEATHER :  5074486.2
VIDEO_PLAYERS :  24727872.45
NEWS_AND_MAGAZINES :  9549178.47
MAPS_AND_NAVIGATION :  4056941.77
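
The string clean-up used in the loop above can be isolated in a small helper. The name parse_installs is hypothetical and the function is only a sketch. Note that values such as 1,000,000+ are open-ended buckets, so the parsed number is a lower bound and the averages are approximations.

```python
def parse_installs(value):
    """Convert an install bucket such as '1,000,000+' to a float lower bound."""
    return float(value.replace(',', '').replace('+', ''))


for bucket in ['1,000,000+', '500+', '0']:
    print(bucket, '->', parse_installs(bucket))
```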


In [36]:
for app in android_free:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad 📖 Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+
