# Identifying profitable app profiles from Google and IOS app store data

## Introduction

Greetings, my name is Eric. The main aim of this Dataquest project is to identify what kinds of free, English-language apps attract the most users on the Google Play and IOS app stores. For this project, I am a data analyst for a company that makes apps for Google Play and the IOS App Store. The company builds apps in English that are free to download and install, so its only source of income is ad revenue, which depends on the number of users who see and interact with the advertisements.

The approach for this project is to open the tabular Google Play (2018) and IOS (2017) [datasets](https://github.com/dataquestio/solutions/blob/master/Mission350Solutions.ipynb) as lists of lists. The next step is to clean the data so that only complete, free, English app records remain. The last step is to identify popular app profiles based on either the number of apps per genre or the number of user downloads.

The main result across both datasets was that social media apps, or apps with very strong social elements, were the most popular profile. For the Google Play store, communication, video player and social apps were the most popular according to average installs. For the IOS app store, navigation, reference and social networking apps show the highest engagement by average number of ratings, with music and weather as the runners-up. I achieved my goal for this project, which was to look for profitable insights in the data, because I was very curious about how it behaved.

## Exploring the data

In [59]:
#Step 1: Time to write a function that can
#open both csv files and convert them into a list of lists.

from csv import reader

def csv_to_list(dataset):
    #The with block closes the file once it has been read.
    #Assumption: both csv files are UTF-8 encoded.
    with open(dataset, encoding='utf8') as opened_file:
        read_file = reader(opened_file)
        list_mode = list(read_file)
    return list_mode

In [60]:
#Google Play app list

GP_app_list = csv_to_list('googleplaystore.csv')
GP_app_header = GP_app_list[0]
GP_app_data = GP_app_list[1:]

#IOS appstore app list

IOS_app_list = csv_to_list('AppleStore.csv')
IOS_app_header = IOS_app_list[0]
IOS_app_data = IOS_app_list[1:]

In [61]:
#Let's explore the data by seeing how it looks.

#It's best to build a function that shows a few rows.

def explore_dataset(dataset_list, start_index, end_index, rows_and_columns=False):
    dataset_slice = dataset_list[start_index:end_index]    
    for row in dataset_slice:
        print(row)
        print('\n') # To add an empty line after each row printed

    if rows_and_columns:
        print('Number of rows:', len(dataset_list))
        print('Number of columns:', len(dataset_list[0]))

In [62]:
#Exploring the Google Play dataset

print(GP_app_header)
print('\n')
explore_dataset(GP_app_data,0,5,True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Eve

#### Wow, it has 10,841 rows with 13 columns. So many!

#### Everything except "Last Updated", "Current Ver", and "Android Ver" looks relevant to what may be needed. I believe all the other columns can provide insight into what type of apps might be ideal for the market.

In [63]:
#Exploring the IOS app store dataset

print(IOS_app_header)
print('\n')
explore_dataset(IOS_app_data,0,5,True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows: 7197
Number of columns: 16


#### Far fewer than the Google Play store, yet it still has many apps: 7,197 rows with 16 columns.

#### I believe that 'user_rating_ver', 'ver',  'sup_devices.num', 'ipadSc_urls.num', 'lang.num', and 'vpp_lic' may not be necessary for our study. Everything else looks ideal for finding out what apps are popular.

## Removing Wrong Data


### Deleting Incomplete Records

Some rows may have missing fields, and there may also be unwanted duplicates. As an Excel user who has built reports with and without pivot tables, I know how problematic both can become. It is time to build two functions for these problems. I'll deal with the incomplete rows first.

In [64]:
#Time to build a function to find incomplete rows

def Incomplete_row_finder (dataset_list, list_has_header=True, Or_tell_me_column_count=0):
    error_indexes = [] #The indexes of the incomplete rows
    target_records = [] #A list that will hold all bad records
    
    #Expected column count: taken from the header row, or given by the caller
    if list_has_header:
        column_count = len(dataset_list[0])
    else:
        column_count = Or_tell_me_column_count
        
    for index, row in enumerate(dataset_list):
        if len(row) != column_count: #amount of elements per row
            error_indexes.append(index)
            target_records.append(row)
                
    #The return will be the incomplete records and their indexes
    
    return target_records,error_indexes
            

In [65]:
GP_incomplete_row = Incomplete_row_finder(GP_app_data,False,13)
print(GP_incomplete_row)

([['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']], [10472])


In [66]:
#To verify if it worked
print(GP_app_data[10472])

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


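#### Alright, the index returned by the function worked, and so did the function itself. I'm very pleased that this can be applied to almost any dataset in list-of-lists format. The bad row sits at index 10472 of GP_app_data (index 10473 of GP_app_list, which still includes the header). It has only 12 fields: the category is missing, so every later value is shifted one column to the left, and the genre field is empty. Because category and genre are central to this analysis, the row can be safely deleted.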

In [67]:
del GP_app_data[10472]

In [68]:
print(GP_app_list[10472])

['Xposed Wi-Fi-Pwd', 'PERSONALIZATION', '3.5', '1042', '404k', '100,000+', 'Free', '0', 'Everyone', 'Personalization', 'August 5, 2014', '3.0.0', '4.0.3 and up']


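#### The incomplete row is gone from GP_app_data. Note that the cell above prints from GP_app_list, which still includes the header row and is a separate copy of the data, so the row it shows is the one just before the deleted row. A more direct check is len(GP_app_data), which should now be 10,840.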
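
One detail worth keeping in mind when checking a deletion: a slice such as `GP_app_list[1:]` builds a new list, so deleting a row from `GP_app_data` leaves `GP_app_list` unchanged. The sketch below illustrates this with a small made-up table (`toy_list` and its rows are hypothetical, not taken from the dataset).

```python
# Hypothetical toy table: a header row followed by three data rows.
toy_list = [['name', 'genre'], ['A', 'Games'], ['B'], ['C', 'Tools']]
toy_data = toy_list[1:]   # slicing creates a new, independent list

del toy_data[1]           # remove the incomplete row ['B'] from the copy

print(len(toy_list))      # the original list still has all 4 rows
print(len(toy_data))      # the copy is down to 2 data rows
print(toy_data[1])        # the row that used to follow the deleted one
```

Checking the length of the list that was actually modified, or printing from that same list, avoids mixing up the two.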

#### Time to verify the IOS app store data. We might find interesting things there too.

In [69]:
IOS_app_incomplete_row = Incomplete_row_finder(IOS_app_data,False,16)
print(IOS_app_incomplete_row)

([], [])


#### Really? Good news, no incomplete records in the ios app store data list.

### Finding and Deleting Duplicates

#### Why not build a duplicate finder function? The first one below is my own. It assumes you will erase the duplicates and keep one original, so it can return the duplicates' indexes, their total count, or how often each duplicated item occurs. The second one returns the names of the duplicates without their indexes, which suits the goal of not erasing duplicates indiscriminately.

In [70]:
def duplicate_finder (dataset_list,target_column_index,option):
    
    freq_items = {} # Counting the times an item is found
    Dup_freq_items ={} #Tallying the duplicates by item
    index_counter = 0 #A rudimentary way of finding the index number
    Duplicate_list = [] #The duplicates' index numbers
    Dup_counter = 0 #To know how many duplicates occurred in total, one raw number
    
    for row in dataset_list:
        
        target = row[target_column_index]
     
    #If the target exist already in the frequency of items
    
        if target in freq_items:
            Duplicate_list.append(index_counter)
            Dup_counter+=1
            index_counter+=1 #Always at the end to have correct number
            freq_items[target]+=1
            
            if target in Dup_freq_items:
                Dup_freq_items[target]+=1
                
            else:
                Dup_freq_items[target]=2 #To include the times the record occurs, including the originals
            
        else:
            index_counter+=1
            freq_items[target]=1
    
    # You can choose to get the count, frequency or index numbers of duplicates
    
    if option == 'count':
        return Dup_counter
    elif option == 'frequency':
        return Dup_freq_items
    elif option == 'index':
        return Duplicate_list

In [71]:
Duplicate_GP_count = duplicate_finder(GP_app_data,0,'count')

In [72]:
print(Duplicate_GP_count)

1181
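
As a cross-check on a count like this, the same number can be obtained with `collections.Counter` from the standard library. This is only a sketch: `count_duplicates` and `toy_rows` are names I made up for illustration, and the toy rows are not from the dataset.

```python
from collections import Counter

def count_duplicates(dataset_list, target_column_index):
    """Return how many rows repeat a value already seen in the column."""
    counts = Counter(row[target_column_index] for row in dataset_list)
    # Every occurrence beyond the first counts as one duplicate.
    return sum(n - 1 for n in counts.values())

# Hypothetical toy rows: 'Box' appears three times, 'Slack' twice.
toy_rows = [['Box'], ['Slack'], ['Box'], ['Zoom'], ['Box'], ['Slack']]
print(count_duplicates(toy_rows, 0))
```

If both approaches agree on the real data, that is a reasonable sign the hand-written counter is doing what it should.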


In [73]:
def duplicate_identifier(dataset_list,target_column_index):
    unique_items = []
    duplicate_items = []
    for row in dataset_list:
        
        target = row[target_column_index]
        
        if target in unique_items:
            duplicate_items.append(target)
        else:
            unique_items.append(target)
            
    return duplicate_items

#### Let's look at the google app data.

In [74]:
GP_app_dups = duplicate_identifier(GP_app_data,0)
print(len(GP_app_dups))
print('\n')
print(GP_app_dups[:15])

1181


['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


#### I wonder if there is anything different about the duplicates. Let's check with two examples.

In [75]:
for app in GP_app_data:
    name = app[0]
    if name == 'Google Ads':
        print(app)

['Google Ads', 'BUSINESS', '4.3', '29313', '20M', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 30, 2018', '1.12.0', '4.0.3 and up']
['Google Ads', 'BUSINESS', '4.3', '29313', '20M', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 30, 2018', '1.12.0', '4.0.3 and up']
['Google Ads', 'BUSINESS', '4.3', '29331', '20M', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 30, 2018', '1.12.0', '4.0.3 and up']


In [76]:
for app in GP_app_data:
    name = app[0]
    if name == 'Quick PDF Scanner + OCR FREE':
        print(app)

['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']
['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']
['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80804', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']


In [77]:
print(GP_app_header) #For pinpointing the change

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


#### It's the fourth column, 'Reviews', that is changing. Interesting... It records the number of reviews the application has received. When I remove the duplicates, I'm going to keep the row with the highest number of reviews for each app, since it is presumably the most recent one.

#### Let's verify the ios_app_store.

In [78]:
IOS_app_store_dups = duplicate_identifier(IOS_app_data,1)
print(len(IOS_app_store_dups))
print('\n')
print(IOS_app_store_dups[:15])

2


['Mannequin Challenge', 'VR Roller Coaster']


#### There are 2 duplicates. Now, let's have a look to see if they are different. 

In [79]:
for app in IOS_app_data:
    name = app[1]
    if name == 'Mannequin Challenge':
        print(app)

['1173990889', 'Mannequin Challenge', '109705216', 'USD', '0.0', '668', '87', '3.0', '3.0', '1.4', '9+', 'Games', '37', '4', '1', '1']
['1178454060', 'Mannequin Challenge', '59572224', 'USD', '0.0', '105', '58', '4.0', '4.5', '1.0.1', '4+', 'Games', '38', '5', '1', '1']


In [80]:
for app in IOS_app_data:
    name = app[1]
    if name == 'VR Roller Coaster':
        print(app)

['952877179', 'VR Roller Coaster', '169523200', 'USD', '0.0', '107', '102', '3.5', '3.5', '2.0.0', '4+', 'Games', '37', '5', '1', '1']
['1089824278', 'VR Roller Coaster', '240964608', 'USD', '0.0', '67', '44', '3.5', '4.0', '0.81', '4+', 'Games', '38', '0', '1', '1']


In [81]:
print(IOS_app_header)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


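#### 'id', 'size_bytes', 'rating_count_tot', 'rating_count_ver', 'user_rating_ver', 'ver', 'sup_devices.num', and 'ipadSc_urls.num' appear to be the columns changing. For 'Mannequin Challenge' in particular, 'user_rating' and 'cont_rating' differ too (9+ in one record, 4+ in the other). The different ids suggest these may be separate listings that share a name, but for IOS I think keeping the records with the most recent versions and erasing the older ones will be better.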

### Removing Duplicates: Google Play data and IOS App Store data

How many originals will we have after cleaning the Google Play data?

In [82]:
len(GP_app_data)-len(GP_app_dups)

9659

9659 is our target, excellent. Time to selectively delete these duplicates by getting rid of the ones with fewer reviews.

In [83]:
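def duplicate_deleter_x(data_set_list_mode,name_index,comparison_index):
    
    #Function for numeric-based duplicate deletions.
    #"del row" would only unbind the loop variable and never change the
    #list, so this version builds and returns a new list instead.
    
    dictionary_of_criteria = {} #Highest comparison value seen per name
    
    for row in data_set_list_mode:
        
        name = row[name_index] #Name or ID of item
        comparison_item = float(row[comparison_index]) #Value compared, as a number
        
        if name not in dictionary_of_criteria or comparison_item > dictionary_of_criteria[name]:
            dictionary_of_criteria[name]=comparison_item
    
    cleaned_list = [] #One row per name
    names_kept = set() #Names already added to cleaned_list
    
    for row in data_set_list_mode:
        
        name = row[name_index]
        
        #Keep the first row that holds the highest value for its name
        if float(row[comparison_index]) == dictionary_of_criteria[name] and name not in names_kept:
            cleaned_list.append(row)
            names_kept.add(name)
    
    return cleaned_list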
    

In [84]:
Test_list = ['A',1,2,3,4,5,6,]

#### First, the highest number of reviews can be isolated in a single dictionary for the Google apps data

In [85]:
reviews_max = {}
for row in GP_app_data:
    name = row[0]
    n_reviews = float(row[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name]=n_reviews
        
    elif name not in reviews_max:
        reviews_max[name]=n_reviews

len(reviews_max)

9659

#### The next step is to add each record that matches its app's highest number of reviews to the list android_clean, but only once. The name of each added app is also appended to the already_added list. If a later record matches the highest number of reviews but its app is already in already_added, it is not appended to android_clean.

In [86]:
android_clean =[]
already_added = []

for row in GP_app_data:
    name = row[0]
    n_reviews = float(row[3])
    
    if reviews_max[name] == n_reviews and name not in already_added:
        android_clean.append(row)
        already_added.append(name)
    

print(len(android_clean))
print(android_clean[0])

9659
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']
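
For comparison, the same "keep the row with the most reviews" rule can be sketched in a single pass by storing the best row per app name in a dictionary. This is an illustrative alternative, not the method used above: `keep_most_reviewed` and `toy_rows` are hypothetical names, and 'Example App' is a made-up record.

```python
def keep_most_reviewed(rows, name_index, reviews_index):
    """Keep one row per name: the first row seen with the highest review count."""
    best_rows = {}  # name -> row with the highest review count so far
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        if name not in best_rows or n_reviews > float(best_rows[name][reviews_index]):
            best_rows[name] = row
    # Dictionaries preserve insertion order (Python 3.7+), so apps come out
    # in the order in which their names first appeared.
    return list(best_rows.values())

toy_rows = [
    ['Google Ads', '29313'],
    ['Example App', '10'],      # made-up record
    ['Google Ads', '29313'],
    ['Google Ads', '29331'],
]
print(keep_most_reviewed(toy_rows, 0, 1))
```

One difference to be aware of: this version orders apps by where each name first appears, while the two-loop version orders them by where the kept row sits, so the lists can contain the same rows in a different order.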


#### For the IOS App Store data, the dataset had two true duplicates. The plan to proceed will be to keep the most recent versions of those apps. Finding the indexes of the target duplicates would be ideal.

In [87]:
counter_for_ios_app_data = 0 #To count index number
duplicate_indexes_ios = [] #To record indexes and duplicates
duplicate_indexes_ios_counter_only = [] #To record only indexes of duplicates

for app in IOS_app_data:
    
    name = app[1]
    if name == 'Mannequin Challenge' or name == 'VR Roller Coaster':
        duplicate_indexes_ios_counter_only.append(counter_for_ios_app_data)
        duplicate_indexes_ios.append([counter_for_ios_app_data,name])
        counter_for_ios_app_data+=1
    
    else:
        counter_for_ios_app_data+=1
        
print(duplicate_indexes_ios) #To show the duplicate items and indexes

print('\n') # Empty row to help readability

for index in duplicate_indexes_ios_counter_only:
    print(IOS_app_data[index],"corresponds to ", index)
    print('\n') #The end result shows all duplicates for examination

[[2948, 'Mannequin Challenge'], [4442, 'VR Roller Coaster'], [4463, 'Mannequin Challenge'], [4831, 'VR Roller Coaster']]


['1173990889', 'Mannequin Challenge', '109705216', 'USD', '0.0', '668', '87', '3.0', '3.0', '1.4', '9+', 'Games', '37', '4', '1', '1'] corresponds to  2948


['952877179', 'VR Roller Coaster', '169523200', 'USD', '0.0', '107', '102', '3.5', '3.5', '2.0.0', '4+', 'Games', '37', '5', '1', '1'] corresponds to  4442


['1178454060', 'Mannequin Challenge', '59572224', 'USD', '0.0', '105', '58', '4.0', '4.5', '1.0.1', '4+', 'Games', '38', '5', '1', '1'] corresponds to  4463


['1089824278', 'VR Roller Coaster', '240964608', 'USD', '0.0', '67', '44', '3.5', '4.0', '0.81', '4+', 'Games', '38', '0', '1', '1'] corresponds to  4831




#### Perfect. Now the indexes of the apps identified as duplicates in an earlier block are known. The most recent versions of "Mannequin Challenge" and "VR Roller Coaster" are 1.4 and 2.0.0 respectively. Therefore, only the entries at indexes 4463 and 4831 should be erased.

In [88]:
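#Delete the higher index first: deleting index 4463 first would shift
#every later row down by one, so index 4831 would point at a different app.
del IOS_app_data[4831]
del IOS_app_data[4463]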

In [89]:
#To see if it changed successfully
print(IOS_app_data[4463],"\n",IOS_app_data[4831])

['1041406978', 'DOFUS Touch', '3366912', 'USD', '0.0', '104', '3', '4.0', '4.0', '1.9.28', '12+', 'Games', '37', '5', '6', '1'] 
 ['1062002361', 'LumaFX - infinite video effects', '13921280', 'USD', '2.99', '67', '11', '4.0', '4.5', '2.0.3', '4+', 'Photo & Video', '37', '5', '8', '1']
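
When several rows are deleted by index, each deletion shifts every later row down by one position, so the order of the deletions matters. A minimal sketch of a safe pattern, using a made-up list (`delete_rows_by_index`, `toy_rows` and `naive_rows` are hypothetical names):

```python
def delete_rows_by_index(rows, indexes):
    """Delete the rows at the given indexes, in place."""
    # Work from the highest index down so earlier deletions
    # cannot shift the position of rows still to be deleted.
    for index in sorted(indexes, reverse=True):
        del rows[index]

toy_rows = ['a', 'b', 'c', 'd', 'e']
delete_rows_by_index(toy_rows, [1, 3])   # meant to remove 'b' and 'd'
print(toy_rows)

# Deleting in ascending order removes the wrong second item:
naive_rows = ['a', 'b', 'c', 'd', 'e']
del naive_rows[1]   # removes 'b'; 'd' now sits at index 2
del naive_rows[3]   # removes 'e', not 'd'
print(naive_rows)
```

Searching the cleaned list for the duplicate names again afterwards is a more reliable check than printing whatever now sits at the old indexes.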


### Deleting apps in non-English languages

#### It is important to remember that the company builds apps in English only, so the analysis should only use apps aimed at an English-speaking audience. Writing an English-checking function is a good solution. The built-in ord() function returns a character's Unicode code point, and the characters commonly used in English text (the ASCII set) fall in the range 0 to 127. Emojis and symbols such as ™ have code points above 127, so a rule that rejects any such character would wrongly flag many English apps. To avoid that data loss, a good alternative is to eliminate only apps whose names have more than three characters above 127. To be safe, the function will be tested with some app names and other characters.

In [90]:
def English_checker(string):

    Non_English_language_counter = 0
    
    for character in string:
        
        if ord(character)>127:
            Non_English_language_counter+=1
            
    if Non_English_language_counter > 3:
        return False #More than three non-ASCII characters: treated as non-English
    
    else:
        return True #Three or fewer non-ASCII characters: treated as English

In [91]:
English_checker('Instagram')

True

In [92]:
English_checker('Docs To Go™ Free Office Suite')

True

In [93]:
English_checker('™')

True

In [94]:
English_checker('Instachat 😜')

True

In [95]:
English_checker('爱奇艺PPS -《欢乐颂2》电视剧热播')

False
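
To see why these test names pass or fail, it can help to print the number of characters above 127 in each one. The helper below is a small sketch for that purpose (`count_non_ascii` is a name I made up; it is not part of the notebook's functions).

```python
def count_non_ascii(text):
    """Count the characters whose code point is above 127."""
    return sum(1 for character in text if ord(character) > 127)

samples = ['Instagram', 'Docs To Go™ Free Office Suite',
           'Instachat 😜', '爱奇艺PPS -《欢乐颂2》电视剧热播']

for sample in samples:
    print(sample, '->', count_non_ascii(sample))
```

The three-character threshold is a heuristic. Names in other languages that use mostly ASCII letters (Spanish or French, for example) will still pass, and an English name with four or more emojis will be rejected.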

#### As it seems ready for use, time to use it on the Google Play and IOS app store datasets to see how many apps will be removed.

In [96]:
#Google Play

print('Before the removal of non-English apps:',len(android_clean))

android_clean_English = []

for row in android_clean:
    name = row[0]
    if English_checker(name):
        android_clean_English.append(row)
        
print('\n')
print('After the removal of non-English apps:',len(android_clean_English))

Before the removal of non-English apps: 9659


After the removal of non-English apps: 9614


In [97]:
#IOS app store

print('Before the removal of non-English apps:',len(IOS_app_data))

IOS_app_data_English = []

for row in IOS_app_data:
    name = row[1]
    if English_checker(name):
        IOS_app_data_English.append(row)
        
print('\n')
print('After the removal of non-English apps:',len(IOS_app_data_English))

Before the removal of non-English apps: 7195


After the removal of non-English apps: 6181


#### The two stores differ a lot here. The Google Play dataset had 45 non-English apps removed for a new total of 9,614 apps. The IOS app store dataset had 1,014 apps removed for a new total of 6,181 apps.

### Deleting apps that are not free

#### The company prefers to make free apps, where the main source of revenue is in-app ads. To carry out this step, the free apps can be isolated into new lists, just as the English apps were.

In [98]:
# Google Play apps in English cleaned earlier

print('Before the removal of paid apps: ',len(android_clean_English))

android_clean_English_free = []

for row in android_clean_English:
    app_type = row[6] #The 'Type' column, which is 'Free' for free apps
    if app_type == 'Free':
        android_clean_English_free.append(row)

print('After the removal of paid apps: ',len(android_clean_English_free))
      

Before the removal of paid apps:  9614
After the removal of paid apps:  8863


In [99]:
# Now for the IOS apps in English


print('Before the removal of paid apps: ',len(IOS_app_data_English))

IOS_app_data_English_free = []

for row in IOS_app_data_English:
    price = float(row[4])
    if price == 0:
        IOS_app_data_English_free.append(row)

print('After the removal of paid apps: ',len(IOS_app_data_English_free))
   

Before the removal of paid apps:  6181
After the removal of paid apps:  3221


#### The Google Play app list had 751 paid apps removed, for a new total of 8,863 apps. The IOS app store list had many more paid apps removed by comparison, 2,960 of them, for a new total of 3,221 apps. This suggests that, in these datasets, paid apps are far more common on the App Store, while the Google Play store is dominated by free apps.
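
The comparison is easier to read as a share of free apps per store. The short calculation below uses only the counts printed by the two cells above; the dictionary layout and variable names are my own.

```python
# Counts taken from the "Before"/"After" outputs printed above.
counts = {
    'Google Play': {'english': 9614, 'english_free': 8863},
    'App Store': {'english': 6181, 'english_free': 3221},
}

shares = {}
for store, n in counts.items():
    shares[store] = n['english_free'] / n['english'] * 100
    print(f"{store}: {shares[store]:.1f}% of English apps are free")
```

So roughly nine in ten English Google Play apps in this data are free, against about half of the English App Store apps.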

### Identifying a profitable app profile

#### The company has three steps for its apps:

   #### 1. It builds a minimal Android version of the app and launches it on Google Play.

   #### 2. If it gets a good response, it gets developed further.

   #### 3. If the app is profitable after six months, a version is made for ios and launches on the App Store.

#### To identify what potential types of apps to make, taking a look at the frequency of the app categories or genres might be an efficient starting point. A frequency table function that counts the apps per category or genre, and a function that sorts those counts from greatest to smallest, will be the next steps.

In [100]:
def freq_table (dataset, target_index):
    end_freq_table = {} #Maps each value in the column to its number of rows
    
    for row in dataset:
        item = row[target_index]
        if item in end_freq_table:
            end_freq_table[item]+=1
        else:
            end_freq_table[item]=1
            
    return end_freq_table

#### The next step is a function that sorts the frequency table and prints it.

In [101]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
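
The tables printed in this notebook show raw counts. If percentages are easier to compare across the two stores, a variant along these lines could be used. It is only a sketch: `freq_table_percent` and `toy_rows` are hypothetical and are not used elsewhere in the notebook.

```python
def freq_table_percent(dataset, target_index):
    """Return {value: percentage of rows with that value}."""
    counts = {}
    for row in dataset:
        item = row[target_index]
        counts[item] = counts.get(item, 0) + 1
    total = len(dataset)
    return {item: count / total * 100 for item, count in counts.items()}

# Made-up rows: three games and one tool.
toy_rows = [['A', 'Games'], ['B', 'Games'], ['C', 'Tools'], ['D', 'Games']]
percentages = freq_table_percent(toy_rows, 1)

for genre, share in sorted(percentages.items(), key=lambda pair: pair[1], reverse=True):
    print(genre, ':', round(share, 2))
```

Percentages make it possible to compare, say, the share of games on each store even though the two datasets have very different sizes.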

#### The Google Play store has the "Genres" and "Category" columns.

In [102]:
#Google Play
GP_Category = display_table(android_clean_English_free,1)

FAMILY : 1675
GAME : 862
TOOLS : 750
BUSINESS : 407
LIFESTYLE : 346
PRODUCTIVITY : 345
FINANCE : 328
MEDICAL : 313
SPORTS : 301
PERSONALIZATION : 294
COMMUNICATION : 287
HEALTH_AND_FITNESS : 273
PHOTOGRAPHY : 261
NEWS_AND_MAGAZINES : 248
SOCIAL : 236
TRAVEL_AND_LOCAL : 207
SHOPPING : 199
BOOKS_AND_REFERENCE : 190
DATING : 165
VIDEO_PLAYERS : 159
MAPS_AND_NAVIGATION : 124
FOOD_AND_DRINK : 110
EDUCATION : 103
ENTERTAINMENT : 85
LIBRARIES_AND_DEMO : 83
AUTO_AND_VEHICLES : 82
HOUSE_AND_HOME : 73
WEATHER : 71
EVENTS : 63
PARENTING : 58
ART_AND_DESIGN : 57
COMICS : 55
BEAUTY : 53


In [103]:
#Google Play
GP_Genres = display_table(android_clean_English_free,-4)

Tools : 749
Entertainment : 538
Education : 474
Business : 407
Productivity : 345
Lifestyle : 345
Finance : 328
Medical : 313
Sports : 307
Personalization : 294
Communication : 287
Action : 275
Health & Fitness : 273
Photography : 261
News & Magazines : 248
Social : 236
Travel & Local : 206
Shopping : 199
Books & Reference : 190
Simulation : 181
Dating : 165
Arcade : 164
Video Players & Editors : 157
Casual : 156
Maps & Navigation : 124
Food & Drink : 110
Puzzle : 100
Racing : 88
Role Playing : 83
Libraries & Demo : 83
Auto & Vehicles : 82
Strategy : 80
House & Home : 73
Weather : 71
Events : 63
Adventure : 60
Comics : 54
Beauty : 53
Art & Design : 53
Parenting : 44
Card : 40
Casino : 38
Trivia : 37
Educational;Education : 35
Board : 34
Educational : 33
Education;Education : 30
Word : 23
Casual;Pretend Play : 21
Music : 18
Racing;Action & Adventure : 15
Puzzle;Brain Games : 15
Entertainment;Music & Video : 15
Casual;Brain Games : 12
Casual;Action & Adventure : 12
Arcade;Action & Advent

### Google Play - free and english app insights:

#### 1) What are the most common genres?

By category, the most common are family, game and tools. By genre, they are tools, entertainment, and education.

#### 2) What other patterns can be seen?

While family, game and tool apps are among the strongest contenders, the remaining apps, taken together, tend to live in the realm of specific services, social media, and lifestyle-oriented applications.


#### 3) What is the general impression — are most of the apps designed for practical purposes (education, shopping, utilities, productivity, lifestyle) or more for entertainment (games, photo and video, social networking, sports, music)? Compare patterns with IOS insights.

The apps here, unlike the IOS counterpart, are less game centric and appear to be more "family", "tool", and "entertainment". There is a much more even spread of apps in comparison to the IOS ones. They appear to be more on the practical side than the entertainment one. 

#### 4) Can you recommend an app profile based on what you found so far? Do the frequency tables you generated reveal the most frequent app genres or what genres have the most users?

I would argue against recommending a profile yet. These tables only show how many apps exist in each genre, not how many users those apps have, and I do not know what "need" or "want" each genre is targeting. However, the tables are great for initial insights. For example, I would like to inspect the "family" category of the Google Play apps to see what it covers. My initial conjecture is that the most frequent genres tend to be the ones with a higher flux of users, but that still has to be checked against install numbers.

In [104]:
#IOS App data

IOS_genres = display_table(IOS_app_data_English_free,-5)

Games : 1873
Entertainment : 254
Photo & Video : 160
Education : 118
Social Networking : 106
Shopping : 84
Utilities : 81
Sports : 69
Music : 66
Health & Fitness : 65
Productivity : 56
Lifestyle : 51
News : 43
Travel : 40
Finance : 36
Weather : 28
Food & Drink : 26
Reference : 18
Business : 17
Book : 14
Navigation : 6
Medical : 6
Catalogs : 4


### IOS app store -  free and english app insights:

#### 1) What are the most common genres?

The most common genres appear to be games (overwhelmingly), entertainment, and photo and video apps.

#### 2) What other patterns can be seen?

Aside from the games and social media apps, the remainder of the apps appear to be tool based. They can be tools for finding places to dine, travel, finance, shopping, and more.


#### 3) What is the general impression - more for entertainment or practical purposes?

I would argue both in some measure. Everything that is not game or photo-related appears to be tool based. The games, photos, and social media apps are definitely entertainment centric.

#### 4) Can you recommend an app profile for the App Store market based on this frequency table alone? If there's a large number of apps for a particular genre, does that also imply that apps of that genre generally have a large number of users?

Actually, I would prefer not to do so yet. I would rather take a look at the apps with the highest number of reviews or downloads because the true engagement lies there. Another reason is simply the ambiguity. Games are the most common genre, but "What types of games are the most popular?" is a better question for reaching a more accurate profile. Many apps in a particular genre suggests that developers see demand there, but it does not by itself guarantee that those apps have a large number of users.



### Most popular genres by average user base

#### A more accurate way to gauge the popularity of a genre, beyond how many apps populate it, is the average number of installs or downloads per app. For Google Play, the 'Installs' column is available. The IOS app store data has no install column, so rating_count_tot (the total number of user ratings) can be used as a proxy.

In [105]:
#IOS app store

len_genre_ios_freq_table = freq_table(IOS_app_data_English_free,-5)
IOS_genre_rating_avg = {}

for row in IOS_app_data_English_free:
    genre = row[-5]
    ratings = float(row [5])
    
    if genre in IOS_genre_rating_avg:
        IOS_genre_rating_avg[genre]+=ratings
    
    else:
        IOS_genre_rating_avg[genre]=ratings

for key in IOS_genre_rating_avg:
    IOS_genre_rating_avg[key]/=len_genre_ios_freq_table[key]

In [106]:
#Printing for top results

table_display_ios = []
for key in IOS_genre_rating_avg:
    key_val_as_tuple = (IOS_genre_rating_avg[key], key)
    table_display_ios.append(key_val_as_tuple)

#Sort once, after the list is complete
table_sorted = sorted(table_display_ios, reverse = True)
for entry in table_sorted:
    print(entry[1], ':', entry[0])

Navigation : 86090.33333333333
Reference : 74942.11111111111
Social Networking : 71548.34905660378
Music : 57326.530303030304
Weather : 52279.892857142855
Book : 39758.5
Food & Drink : 33333.92307692308
Finance : 31467.944444444445
Photo & Video : 28441.54375
Travel : 28243.8
Shopping : 26919.690476190477
Health & Fitness : 23298.015384615384
Sports : 23008.898550724636
Games : 22800.780565937
News : 21248.023255813954
Productivity : 21028.410714285714
Utilities : 18684.456790123455
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Business : 7491.117647058823
Education : 7003.983050847458
Catalogs : 4004.0
Medical : 612.0


### App profile recommendation for IOS app store

#### Although there are many games, using the number of ratings as a proxy for the number of installs shows something different. Navigation, reference and social networking apps show the highest engagement, with music and weather as the runners-up. Making an app for any of these genres could be a good idea. This ranking isn't defined by the number of apps, but rather by the amount of engagement apps have in each genre. Now, let us look at the Google Play store.

In [107]:
#Google Play

Genres_avg_GP = {}
Category_app_count_GP = freq_table(android_clean_English_free,1)

for row in android_clean_English_free:
    category = row[1]
    installs = row[5]
    installs = installs.replace(",","")
    installs = installs.replace("+","")
    installs = float(installs)
    
    if category in Genres_avg_GP:
        Genres_avg_GP[category]+= installs
        
    else:
        Genres_avg_GP[category]= installs
        
for key in Genres_avg_GP:
     Genres_avg_GP[key]/=Category_app_count_GP[key]
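
The string clean-up used in this cell can be pulled out into a small helper, which makes it easy to test on its own. This is a sketch only; `parse_installs` is a name I introduced for illustration.

```python
def parse_installs(installs_text):
    """Convert a Google Play 'Installs' string such as '10,000+' to a float."""
    cleaned = installs_text.replace(',', '').replace('+', '')
    return float(cleaned)

print(parse_installs('10,000+'))
print(parse_installs('5,000,000+'))
```

Keep in mind that the 'Installs' values are open-ended buckets: '10,000+' means at least 10,000 installs, so the averages computed from them are based on lower bounds rather than exact counts.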

In [108]:
#Printing the top results

table_display_GP = []

for key in Genres_avg_GP:
    key_val_as_tuple = (Genres_avg_GP[key], key)
    table_display_GP.append(key_val_as_tuple)

#Sort once, after the list is complete
table_sorted = sorted(table_display_GP, reverse = True)
for entry in table_sorted:
    print(entry[1], ':', entry[0])

COMMUNICATION : 38456119.167247385
VIDEO_PLAYERS : 24727872.452830188
SOCIAL : 23253652.127118643
PHOTOGRAPHY : 17840110.40229885
PRODUCTIVITY : 16787331.344927534
GAME : 15588015.603248259
TRAVEL_AND_LOCAL : 13984077.710144928
ENTERTAINMENT : 11640705.88235294
TOOLS : 10801391.298666667
NEWS_AND_MAGAZINES : 9549178.467741935
BOOKS_AND_REFERENCE : 8767811.894736841
SHOPPING : 7036877.311557789
PERSONALIZATION : 5201482.6122448975
WEATHER : 5074486.197183099
HEALTH_AND_FITNESS : 4188821.9853479853
MAPS_AND_NAVIGATION : 4056941.7741935486
FAMILY : 3697848.1731343283
SPORTS : 3638640.1428571427
ART_AND_DESIGN : 1986335.0877192982
FOOD_AND_DRINK : 1924897.7363636363
EDUCATION : 1833495.145631068
BUSINESS : 1712290.1474201474
LIFESTYLE : 1437816.2687861272
FINANCE : 1387692.475609756
HOUSE_AND_HOME : 1331540.5616438356
DATING : 854028.8303030303
COMICS : 817657.2727272727
AUTO_AND_VEHICLES : 647317.8170731707
LIBRARIES_AND_DEMO : 638503.734939759
PARENTING : 542603.6206896552
BEAUTY : 51315

### App profile recommendation for Google Play store

#### Communication, video player and social apps show the highest average number of installs. Making an app for any of these categories could be a good idea.
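
One caveat applies to both rankings: an average can be pulled up by a handful of very large apps, so a category with a high mean is not necessarily one where a typical app does well. The toy numbers below are made up purely to illustrate the effect; comparing the mean with the median per category would be a natural follow-up check.

```python
from statistics import mean, median

# Hypothetical install counts for one category: four small apps and one giant.
toy_installs = [10_000, 50_000, 100_000, 500_000, 1_000_000_000]

toy_mean = mean(toy_installs)
toy_median = median(toy_installs)

print('mean  :', toy_mean)
print('median:', toy_median)
```

Here a single giant app is enough to put the mean far above what any of the other four apps achieves, while the median stays close to the typical app.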

### Conclusion : General app profile recommendation

#### Social media apps, or apps with many "social media-like" elements, are the common thread across the Google Play and IOS app pools. Apps in these areas have the highest average number of installs (or ratings, for IOS) per app.