# Step 1: Open the data set
## In this project, we collect and analyze data about mobile apps available on Google Play and the App Store to help developers understand what types of apps are likely to attract more users.<br>We start by defining a function that prints a slice of a data set row by row, and then open and read the two data sets.

In [1]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [2]:
from csv import reader

# open the App Store data set and split it into a header row and data rows
open_file1 = open('AppleStore.csv')
read_file1 = reader(open_file1)
ios = list(read_file1)
ios_head = ios[0]
ios_data = ios[1:]

# open the Google Play Store data set and split it into a header row and data rows
open_file2 = open('googleplaystore.csv')
read_file2 = reader(open_file2)
android = list(read_file2)
android_head = android[0]
android_data = android[1:]
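
The cell above leaves both files open and relies on the platform's default text encoding. As a minimal sketch of an alternative, the hypothetical helper `load_csv` below closes the file with a `with` block and returns the header and the data rows separately. It assumes the files are UTF-8 encoded, which the data source does not state. A small temporary file stands in for the real data sets so that the sketch runs on its own.

```python
import csv
import os
import tempfile


def load_csv(path, encoding="utf8"):
    """Return (header, rows) for a CSV file and close the file afterwards.

    The UTF-8 default is an assumption, not something the data sets document.
    """
    with open(path, encoding=encoding, newline="") as f:
        rows = list(csv.reader(f))
    return rows[0], rows[1:]


# Tiny stand-in file so the sketch runs without the real data sets.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.csv")
    with open(sample_path, "w", encoding="utf8", newline="") as f:
        csv.writer(f).writerows([["App", "Price"], ["Demo, the app", "0"]])
    sample_head, sample_data = load_csv(sample_path)

print(sample_head)
print(sample_data)
```

With the real files, the call would look like `ios_head, ios_data = load_csv('AppleStore.csv')`.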

### We first print the iOS header row and the first few data rows

In [3]:
# use the defined function to show the first several rows of the data sets
print(ios_head, '\n')
explore_data(ios_data, 0, 3, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'] 

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


### We then read android data with the head

In [4]:
print(android_head, '\n')
explore_data(android_data, 0, 3, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'] 

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


### The column names of both data sets are listed above, and detailed explanations of these columns can be found here for [AppleStore](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home) and [GooglePlayStore](https://www.kaggle.com/lava18/google-play-store-apps/home).<br>The useful columns for iOS include 'track_name', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', and 'prime_genre'. For Android, the useful columns include 'App', 'Category', 'Rating', 'Reviews', 'Installs', 'Type', 'Price', and 'Genres'.
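
The later cells refer to these columns by hard-coded positions such as `row[11]`. As a small sketch, the positions can instead be looked up from the header row, which makes the indices easier to check. The helper name `column_indices` is hypothetical, and the header is copied from the App Store output above.

```python
# Header row copied from the App Store output above.
ios_head = ['id', 'track_name', 'size_bytes', 'currency', 'price',
            'rating_count_tot', 'rating_count_ver', 'user_rating',
            'user_rating_ver', 'ver', 'cont_rating', 'prime_genre',
            'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


def column_indices(header, names):
    """Map each requested column name to its position in the header row."""
    return {name: header.index(name) for name in names}


ios_cols = column_indices(
    ios_head, ['track_name', 'price', 'rating_count_tot', 'prime_genre'])
print(ios_cols)
```

This reproduces the indices used later in the notebook: 1 for the name, 4 for the price, 5 for the total rating count, and 11 for the genre.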

# Step 2: Data cleaning
## Before analyzing the data, we need to remove or correct wrong data. We also need to remove all apps that are either not free or not in English.

In [5]:
print(android_data[10472], '\n')  # the incorrect row according to the report
print(android[0], '\n')      # compared to the head row

### if the row is not found at that index, search the whole data set to check whether it still exists and still has the error ###
#for row in android_data:
#    if row[0] == 'Life Made WI-Fi Touchscreen Photo Frame':
#        print(row, '\n')
#        print(android_data.index(row))

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up'] 

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'] 



In [6]:
# delete this entry, whose category is missing (run this cell only once: a second run would delete a different row)
del android_data[10472]
print(len(android_data))

10840
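
The faulty row above has 12 values while the header has 13, so it can also be found by comparing lengths instead of relying on a reported index. The following is a minimal sketch on made-up rows, and `find_malformed_rows` is a hypothetical helper name.

```python
def find_malformed_rows(header, rows):
    """Return the indices of rows whose length differs from the header's."""
    expected = len(header)
    return [i for i, row in enumerate(rows) if len(row) != expected]


# Made-up rows for illustration only.
header = ['App', 'Category', 'Rating']
rows = [
    ['Good app', 'TOOLS', '4.1'],
    ['Shifted app', '1.9'],          # category missing, values shifted left
    ['Another app', 'GAME', '4.5'],
]

bad = find_malformed_rows(header, rows)
print(bad)

# Delete from the end so that the remaining indices stay valid.
for i in reversed(bad):
    del rows[i]
print(len(rows))
```

Unlike a hard-coded `del android_data[10472]`, this approach is safe to run more than once, because a second run finds no malformed rows.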


In [7]:
# found a possible duplicate entry in the App Store data; locate it so that one copy can be deleted
#for row in ios_data:
#    if row[1] == 'Mannequin Challenge':
#        print(row, '\n')
#        print(ios_data.index(row))

In [8]:
#del ios_data[4463]
#print(len(ios_data))

# Step 3: Removing duplicate entries
## Some apps have duplicate entries, so we need to find all the duplicates and remove them

### The Android data set contains duplicated entries for the same app, for example Instagram

In [9]:
# search for the duplicates of Instagram
for row in android_data:
    if row[0] == 'Instagram':
        print(row)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


### Now we count the number of duplicates in the data set

In [10]:
# create two lists to store entries into two categories and count the numbers 
list_no_duplicate = []
list_duplicate = []

for row in android_data:
    name = row[0]
    if name in list_no_duplicate:
        list_duplicate.append(name)
    else:
        list_no_duplicate.append(name)
        
print('The number of non-duplicated entries is ', len(list_no_duplicate))  
print('The number of duplicated entries is ', len(list_duplicate))

The number of non-duplicated entries is  9659
The number of duplicated entries is  1181
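
Checking `name in list_no_duplicate` scans a list on every iteration, which gets slow as the data grows. As a sketch of an alternative, `collections.Counter` yields the same two counts in a single pass. The helper name `count_duplicates` and the sample rows are made up for illustration.

```python
from collections import Counter


def count_duplicates(rows, name_index=0):
    """Return (number of unique names, number of extra duplicate entries)."""
    counts = Counter(row[name_index] for row in rows)
    unique = len(counts)
    duplicates = sum(counts.values()) - unique
    return unique, duplicates


# Made-up rows: three names, six entries in total.
sample = [['Instagram'], ['Instagram'], ['Slack'],
          ['Instagram'], ['Slack'], ['VK']]
unique, duplicates = count_duplicates(sample)
print('unique:', unique, 'duplicates:', duplicates)
```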


### We do not want to remove duplicates arbitrarily (for example, by keeping whichever entry we meet first). A better criterion is to keep, for each app, the entry with the highest number of reviews and remove the others, since a higher review count suggests that the entry was collected more recently.<br>To do this, we first create a dictionary with app names as keys and the corresponding maximum numbers of reviews as values, and then use this dictionary to remove the duplicate entries.

In [11]:
# create the dictionary to store app names and corresponding max number of reviews
reviews_max = {}

for row in android_data:
    name = row[0]
    n_reviews = float(row[3])
    if name in reviews_max and n_reviews > reviews_max[name]: # update if with more reviews
        reviews_max[name] = n_reviews
    elif name not in reviews_max: # store the new entries
        reviews_max[name] = n_reviews
# check whether the number of entries in the dictionary is correct
print('The length of the dictionary is correct: ', len(reviews_max) == len(list_no_duplicate), ', and should be ', len(list_no_duplicate))

The length of the dictionary is correct:  True , and should be  9659


### We remove the duplicates by adding the rows whose number of reviews matches the value stored in the dictionary reviews_max to a new list named android_clean. We loop through the whole data set, and in each iteration we read the name of the app and its number of reviews. If the number of reviews equals the value stored for that app in reviews_max and the name is not already in the already_added list, we append the row to android_clean and the name to already_added.<br>The already_added list is necessary to exclude duplicate entries that share the same maximum number of reviews.

In [12]:
android_clean = [] # empty list to store new cleaned data set
already_added = [] # empty list to store app names that are already added

for row in android_data:
    name = row[0]
    n_reviews = float(row[3])
    if name not in already_added and n_reviews == reviews_max[name]:
        android_clean.append(row)
        already_added.append(name)
        
explore_data(android_clean, 0, 1, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


Number of rows: 9659
Number of columns: 13
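
The two steps above (finding the maximum and then filtering) can be wrapped into one function. This is only a sketch of the same idea, using a set instead of a list for the names already added. The helper name `keep_most_reviewed` is hypothetical. The Instagram review counts are taken from the output above, while the 'Example App' rows are made up.

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}
    for row in rows:
        name, n_reviews = row[name_index], float(row[reviews_index])
        if name not in best or n_reviews > best[name]:
            best[name] = n_reviews

    cleaned, seen = [], set()
    for row in rows:
        name = row[name_index]
        if name not in seen and float(row[reviews_index]) == best[name]:
            cleaned.append(row)
            seen.add(name)  # guards against ties on the maximum
    return cleaned


sample = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Example App', 'TOOLS', '4.0', '100'],   # made-up tie on the maximum
    ['Example App', 'TOOLS', '4.0', '100'],
]
cleaned = keep_most_reviewed(sample)
for row in cleaned:
    print(row)
```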


# Step 4: Removing apps not in English
## First, we need to write a function to check whether a name is in English

In [13]:
def check_english(a_string):
    for character in a_string:
        if ord(character) > 127:
            return False # return False as soon as one character falls outside the ASCII range
        
    return True

# some tests
print(check_english('Instagram'))
print(check_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(check_english('Docs To Go™ Free Office Suite'))
print(ord('™'))
print(check_english('Instachat 😜'))
print(ord('😜'))

True
False
False
8482
False
128540


### As shown above, some names contain characters such as emojis and special symbols whose code points are larger than 127 but that still belong to apps we want to keep. Removing these entries would lose useful data.<br>To solve this, we only remove the entries with more than three characters falling outside the ASCII range.

In [14]:
def check_english_updated(a_string):
    non_ASCII = 0 # count the number of characters falling outside the range
    for character in a_string:
        if ord(character) > 127:
            non_ASCII += 1
    if non_ASCII > 3: # only reject with more than 3 characters falling outside the range
        return False
    else:
        return True
    
# tests on the previous examples with non-English text, special characters, and emojis
print(check_english_updated('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(check_english_updated('Docs To Go™ Free Office Suite'))
print(check_english_updated('Instachat 😜'))
    

False
True
True
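
The same rule can be written more compactly with the threshold exposed as a parameter. This is a sketch rather than a replacement for the function above. The name `is_probably_english` and the parameter `max_non_ascii` are hypothetical.

```python
def is_probably_english(name, max_non_ascii=3):
    """Accept a name unless it has more than max_non_ascii non-ASCII characters."""
    non_ascii = sum(1 for ch in name if ord(ch) > 127)
    return non_ascii <= max_non_ascii


print(is_probably_english('Docs To Go™ Free Office Suite'))
print(is_probably_english('Instachat 😜'))
print(is_probably_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
# A threshold of 0 reproduces the stricter first version of the check.
print(is_probably_english('Docs To Go™ Free Office Suite', max_non_ascii=0))
```

The heuristic remains imperfect: a short non-English name with three or fewer non-ASCII characters still passes the filter.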


### Now we use this filter to remove all the entries with more than three characters falling outside the range

In [15]:
android_clean_english = [] # empty list to store the filtered entries
ios_data_english = []

for row in android_clean:
    name = row[0]
    if check_english_updated(name):
        android_clean_english.append(row)

for row in ios_data:
    name = row[1]
    if check_english_updated(name):
        ios_data_english.append(row)
        
explore_data(android_clean_english, 0, 3, True)
print('\n')
explore_data(ios_data_english, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 

# Step 5: Removing non-free apps

In [16]:
android_clean_english_free = [] # empty lists to store filtered entries for free apps
ios_data_english_free = []

for row in android_clean_english:
    if row[7] == '0':
        android_clean_english_free.append(row)

for row in ios_data_english:
    if row[4] == '0.0':
        ios_data_english_free.append(row)
        
explore_data(android_clean_english_free, 0, 3, True)
print('\n')
explore_data(ios_data_english_free, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 8864
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 
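
The filter above compares price strings literally ('0' for Android, '0.0' for iOS), so it depends on how each file formats a zero price. A more tolerant sketch parses the price as a number instead. It assumes that paid prices may carry a leading '$', which is an assumption about the data and is not shown in the output above. The helper name `is_free` is hypothetical.

```python
def is_free(price_text):
    """Treat a price as free when it parses to zero ('0', '0.0', '$0.00')."""
    cleaned = price_text.strip().lstrip('$')
    try:
        return float(cleaned) == 0.0
    except ValueError:
        return False  # unparseable price: do not assume the app is free


for text in ['0', '0.0', '$4.99', 'Everyone']:
    print(repr(text), is_free(text))
```

With this helper, one test such as `is_free(row[7])` or `is_free(row[4])` could serve both data sets.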

# Step 6: Find the most common genres for each market
## We need to build a frequency table for each data set

### We want to determine which kinds of apps (free and in English) are likely to attract more users in both markets, as more users usually bring higher revenue.<br>To minimize risks and overhead, our validation strategy for an app idea comprises three steps:
* Build a minimal Android version of the app, and add it to Google Play.
* If the app has a good response from users, we then develop it further.
* If the app is profitable after six months, we also build an iOS version of the app and add it to the App Store.

### Because our ultimate goal is to add the app to both the App Store and Google Play, we need to find app types that are successful in both markets. We therefore start the analysis by getting a sense of the most common genres in each market, building a frequency table for the prime_genre column of the App Store data set and for the Genres and Category columns of the Google Play data set.

In [17]:
# first, define a function to generate the frequency table
def freq_table(dataset, index):
    table = {}
    count = 0
    for row in dataset:
        key = row[index]
        count += 1 # count the total number of entries
        if key in table:
            table[key] += 1
        else:
            table[key] = 1
    
    for key in table:
        table[key] = table[key] / count * 100 # transform into percentage

    return table

# define a function that transforms the frequency table into a list of tuples, sorts the list in descending order, and prints it
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
        
display_table(ios_data_english_free, 11)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665
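
For comparison, a percentage frequency table can also be built with `collections.Counter` and sorted by value directly, without swapping keys and values into tuples. This is a sketch on made-up rows, and the function names `freq_table_pct` and `sorted_table` are hypothetical.

```python
from collections import Counter


def freq_table_pct(rows, index):
    """Return {value: percentage of rows having that value in column index}."""
    counts = Counter(row[index] for row in rows)
    total = sum(counts.values())
    return {key: count / total * 100 for key, count in counts.items()}


def sorted_table(rows, index):
    """Return (value, percentage) pairs sorted by percentage, highest first."""
    table = freq_table_pct(rows, index)
    return sorted(table.items(), key=lambda item: item[1], reverse=True)


# Made-up rows: the genre is in column 1.
sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
result = sorted_table(sample, 1)
for genre, pct in result:
    print(f'{genre} : {pct:.2f}')
```

One difference from `display_table`: ties here keep their order of first appearance, while sorting `(percentage, key)` tuples breaks ties by key.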


We can clearly see that, among all the free English apps, the most common genre is Games, which accounts for more than half of the apps (around 58.16%). The runner-up is Entertainment, which accounts for 7.88%. All other genres each account for less than 5%: Photo & Video comes next with 4.97%, followed by Education and Social Networking in 4th and 5th place, respectively. Only 10 genres have a share over 2%.

The general impression is that, among free English apps in the App Store, apps designed for fun (games, entertainment, photo and video, social networking, sports, music, etc.) dominate, while apps with practical purposes (education, shopping, utilities, productivity, lifestyle, etc.) are much rarer.

However, we still cannot recommend an app profile based on the frequency table alone, as a large number of available apps in one genre does not necessarily imply that apps of that genre generally have a large number of users.

In [18]:
display_table(android_clean_english_free, 1)

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

The pattern in the Google Play Store is very different: the percentage of apps designed for fun (games, entertainment, photo and video, social networking, sports, music, etc.) is much lower, while apps designed for practical purposes (family, tools, business, lifestyle, productivity, etc.) have a much higher share. On the other hand, this depends on how the stores categorize the apps. For example, the FAMILY category, which has the highest share at around 18.9%, actually consists mostly of games for kids, which can be seen as apps designed for fun rather than for practical purposes. Nevertheless, practical apps still have a higher share in the Google Play Store than in the App Store.

In [19]:
display_table(android_clean_english_free, 9)

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

Because of the large number of genres in the Google Play Store, it is hard to tell the difference between the genres and the categories. The genres are split too finely, and as a result the share of each genre is small, which makes it difficult to see which kinds of apps are more attractive. We therefore focus on the categories in the Google Play Store.

Comparing the two markets, the App Store seems to have a larger percentage of apps designed for fun, while the Google Play Store has comparable percentages of apps designed for fun and for practical purposes.

So far, we still cannot recommend any app profile based on what we found, because the frequency tables only reveal the most frequent app genres, not the genres with the most users.

# Step 7: Find the genres with the most users
## One practical method is to calculate the average number of installs for each app genre. For the Google Play data set, we can find this information in the Installs column. For the App Store data set, this information is not collected, so we use the rating_count_tot column (the total number of user ratings) as a proxy.<br>Nested loops are used to compute the average number of users for each genre.

In [20]:
# use a nested loop over the keys of the frequency table to calculate the average number of user ratings for each genre
genres_ios = freq_table(ios_data_english_free, 11)

for genre in genres_ios: # for each unique genre
    total = 0
    len_genre = 0
    for row in ios_data_english_free:
        genre_app = row[11]
        if genre_app == genre: # the app belongs to this genre: add its ratings and count it
            total += float(row[5])
            len_genre += 1
    avg_number_rating = total / len_genre
    print('The average number of reviews of ', genre, 'is ', avg_number_rating)

The average number of reviews of  Health & Fitness is  23298.015384615384
The average number of reviews of  Education is  7003.983050847458
The average number of reviews of  Book is  39758.5
The average number of reviews of  Weather is  52279.892857142855
The average number of reviews of  Social Networking is  71548.34905660378
The average number of reviews of  Catalogs is  4004.0
The average number of reviews of  Navigation is  86090.33333333333
The average number of reviews of  Entertainment is  14029.830708661417
The average number of reviews of  Utilities is  18684.456790123455
The average number of reviews of  Productivity is  21028.410714285714
The average number of reviews of  Lifestyle is  16485.764705882353
The average number of reviews of  Finance is  31467.944444444445
The average number of reviews of  News is  21248.023255813954
The average number of reviews of  Business is  7491.117647058823
The average number of reviews of  Photo & Video is  28441.54375
The average number
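
The nested loop scans the whole data set once per genre. As a sketch of an alternative, the same averages can be accumulated in a single pass with two dictionaries. The helper name `average_by_key` and the sample rows are made up for illustration.

```python
from collections import defaultdict


def average_by_key(rows, key_index, value_index):
    """Return {key: mean of the numeric column value_index for that key}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += float(row[value_index])
        counts[row[key_index]] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Made-up rows: genre in column 0, rating count in column 1.
sample = [
    ['Games', '100'],
    ['Games', '300'],
    ['Music', '50'],
]
averages = average_by_key(sample, 0, 1)
print(averages)
```

On the iOS data, the call would be `average_by_key(ios_data_english_free, 11, 5)`.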

We can see that navigation, reference, social networking, and music apps are the four genres with the most users. However, if we dive deeper into these apps, we can see that these large numbers are heavily influenced by a few giant apps with very many users, such as Google Maps, Facebook, Pandora, and Bible. A high average therefore does not mean that these genres are suitable for us: they are less popular than they look, and it is difficult for other apps to compete with the giants and gain a larger share of the market.
On the other hand, for the Games genre, the average number of users may seem less attractive, but the distribution is relatively even, so all apps have a chance to profit from the market. There are also many different kinds of games available from different companies, with a large total number of users, which suggests that this genre is easier for newcomers to enter. Based on the total and average numbers of users for each genre, as well as on which apps make up each genre, we recommend developing apps in the Games genre.
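
One way to make the effect of the giant apps concrete is to compare the mean with the median. The sketch below uses the six Navigation rating counts that appear in the notebook's per-app output. The mean reproduces the Navigation average reported above, while the median is roughly ten times smaller.

```python
from statistics import mean, median

# Rating counts of the six free English Navigation apps in the iOS data,
# copied from the notebook's per-app output.
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

mean_ratings = mean(navigation_ratings)
median_ratings = median(navigation_ratings)
# Share of all ratings that belongs to the two largest apps.
share_of_top_two = sum(sorted(navigation_ratings)[-2:]) / sum(navigation_ratings)

print(f'mean: {mean_ratings:.2f}')
print(f'median: {median_ratings}')
print(f'share of the top two apps: {share_of_top_two:.1%}')
```

A large gap between mean and median signals that a few apps dominate the genre, which is the pattern described above.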

In [21]:
# show the details of users for each app under specific genres
for row in ios_data_english_free:
    if row[11] == 'Navigation':
        print('Navigation app', row[1], ': ', row[5])
for row in ios_data_english_free:
    if row[11] == 'Reference':
        print('Reference app', row[1], ': ', row[5])
#for row in ios_data_english_free:
#    if row[11] == 'Social Networking':
#        print('Social Networking app', row[1], ': ', row[5])
#for row in ios_data_english_free:
#    if row[11] == 'Music':
#        print('Music app', row[1], ': ', row[5])
#for row in ios_data_english_free:
#    if row[11] == 'Games':
#        print('Games app', row[1], ': ', row[5])

Navigation app Waze - GPS Navigation, Maps & Real-time Traffic :  345046
Navigation app Google Maps - Navigation & Transit :  154911
Navigation app Geocaching® :  12811
Navigation app CoPilot GPS – Car Navigation & Offline Maps :  3582
Navigation app ImmobilienScout24: Real Estate Search in Germany :  187
Navigation app Railway Route Search :  5
Reference app Bible :  985920
Reference app Dictionary.com Dictionary & Thesaurus :  200047
Reference app Dictionary.com Dictionary & Thesaurus for iPad :  54175
Reference app Google Translate :  26786
Reference app Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran :  18418
Reference app New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition :  17588
Reference app Merriam-Webster Dictionary :  16849
Reference app Night Sky :  12122
Reference app City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) :  8535
Reference app LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer T

In [22]:
category_android = freq_table(android_clean_english_free, 1)

for category in category_android: # for each unique category
    total = 0
    len_category = 0
    for row in android_clean_english_free:
        category_app = row[1]
        if category_app == category: # the app belongs to this category
            number_install = row[5]
            number_install = number_install.replace(',', '') # remove ',' and '+' so that a string such as '10,000+' can be converted to a float
            number_install = number_install.replace('+', '')
            total += float(number_install)
            len_category += 1
    avg_number_install = total / len_category
    print('The average number of installs of ', category, 'is ', avg_number_install)

The average number of installs of  FAMILY is  3695641.8198090694
The average number of installs of  TOOLS is  10801391.298666667
The average number of installs of  HEALTH_AND_FITNESS is  4188821.9853479853
The average number of installs of  SPORTS is  3638640.1428571427
The average number of installs of  EDUCATION is  1833495.145631068
The average number of installs of  HOUSE_AND_HOME is  1331540.5616438356
The average number of installs of  DATING is  854028.8303030303
The average number of installs of  PARENTING is  542603.6206896552
The average number of installs of  PRODUCTIVITY is  16787331.344927534
The average number of installs of  VIDEO_PLAYERS is  24727872.452830188
The average number of installs of  PERSONALIZATION is  5201482.6122448975
The average number of installs of  BEAUTY is  513151.88679245283
The average number of installs of  PHOTOGRAPHY is  17840110.40229885
The average number of installs of  LIBRARIES_AND_DEMO is  638503.734939759
The average number of installs o
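
The string cleaning inside the loop can be isolated in a small helper, which makes it easy to test on its own. This is only a sketch, and the name `parse_installs` is hypothetical. Because values such as '10,000+' are open-ended brackets, the result is a lower bound on the real number of installs, so the averages above are approximate.

```python
def parse_installs(text):
    """Convert an Installs value such as '10,000+' to the integer 10000."""
    return int(text.replace(',', '').replace('+', ''))


for text in ['10,000+', '1,000,000,000+', '0']:
    print(text, '->', parse_installs(text))
```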

For the Google Play Store, the three categories with the most installs are communication, video players, and social. However, the same problem appears here: a few apps in these categories significantly raise the average number of installs, which makes the categories look more attractive than they really are.
For example, if we look at these three categories, we can see that fewer than 20 apps in each of the VIDEO_PLAYERS and SOCIAL categories have at least one hundred million installs, which means that the large averages are mostly contributed by a limited number of extremely popular apps and the potential in these categories is not so good. The COMMUNICATION category is also unattractive, because it already contains so many apps with over 1,000,000,000 installs that little room is left for newcomers.

In [28]:
for app in android_clean_english_free:
    if app[1] == 'VIDEO_PLAYERS' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])
print('\n')
for app in android_clean_english_free:
    if app[1] == 'SOCIAL' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])
print('\n')
for app in android_clean_english_free:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

YouTube : 1,000,000,000+
Motorola Gallery : 100,000,000+
VLC for Android : 100,000,000+
Google Play Movies & TV : 1,000,000,000+
MX Player : 500,000,000+
Dubsmash : 100,000,000+
VivaVideo - Video Editor & Photo Movie : 100,000,000+
VideoShow-Video Editor, Video Maker, Beauty Camera : 100,000,000+
Motorola FM Radio : 100,000,000+


Facebook : 1,000,000,000+
Facebook Lite : 500,000,000+
Tumblr : 100,000,000+
Pinterest : 100,000,000+
Google+ : 1,000,000,000+
Badoo - Free Chat & Dating App : 100,000,000+
Tango - Live Video Broadcast : 100,000,000+
Instagram : 1,000,000,000+
Snapchat : 500,000,000+
LinkedIn : 100,000,000+
Tik Tok - including musical.ly : 100,000,000+
BIGO LIVE - Live Stream : 100,000,000+
VK : 100,000,000+


WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Sk
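
The three loops above differ only in the category name, and each lists the install brackets explicitly. As a sketch, one function with a numeric threshold could cover all of them. The helper name `apps_with_min_installs` is hypothetical. The YouTube and Facebook install values come from the output above, and 'Example Player' is a made-up row.

```python
def apps_with_min_installs(rows, category, min_installs,
                           name_index=0, category_index=1, installs_index=5):
    """Return (name, installs text) for apps in category with enough installs."""
    def to_int(text):
        return int(text.replace(',', '').replace('+', ''))

    return [(row[name_index], row[installs_index])
            for row in rows
            if row[category_index] == category
            and to_int(row[installs_index]) >= min_installs]


# Shortened rows: only the name, category, and installs columns are filled in.
sample = [
    ['YouTube', 'VIDEO_PLAYERS', '', '', '', '1,000,000,000+'],
    ['Example Player', 'VIDEO_PLAYERS', '', '', '', '50,000+'],  # made-up row
    ['Facebook', 'SOCIAL', '', '', '', '1,000,000,000+'],
]
popular = apps_with_min_installs(sample, 'VIDEO_PLAYERS', 100_000_000)
for name, installs in popular:
    print(name, ':', installs)
```

Looping over a list of category names and calling this function would replace the repeated blocks.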

The next five runners-up are the categories 'PRODUCTIVITY', 'PHOTOGRAPHY', 'TRAVEL_AND_LOCAL', 'GAME', and 'FINANCE'. We run similar checks on these categories.
Similarly, we see that for the PRODUCTIVITY, PHOTOGRAPHY, and TRAVEL_AND_LOCAL categories, a few apps with very large numbers of installs contribute most of the installs in these categories.
For games, it is good to see that this category has the largest number of apps whose installs fall in the 100,000,000+ interval, which means that the market for games is very good and there is still potential to develop successful apps in this area, considering the numerous attractive apps with none of them showing a significantly higher share (no apps with 1,000,000,000+). So games can still be a good choice.

In [31]:
for app in android_clean_english_free:
    if app[1] == 'PRODUCTIVITY' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])
print('\n')       
for app in android_clean_english_free:
    if app[1] == 'PHOTOGRAPHY' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])
print('\n')
for app in android_clean_english_free:
    if app[1] == 'TRAVEL_AND_LOCAL' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])
print('\n')
for app in android_clean_english_free:
    if app[1] == 'GAME' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

Microsoft Word : 500,000,000+
Microsoft Outlook : 100,000,000+
Microsoft OneDrive : 100,000,000+
Microsoft OneNote : 100,000,000+
Google Keep : 100,000,000+
ES File Explorer File Manager : 100,000,000+
Dropbox : 500,000,000+
Google Docs : 100,000,000+
Microsoft PowerPoint : 100,000,000+
Samsung Notes : 100,000,000+
SwiftKey Keyboard : 100,000,000+
Google Drive : 1,000,000,000+
Adobe Acrobat Reader : 100,000,000+
Google Sheets : 100,000,000+
Microsoft Excel : 100,000,000+
WPS Office - Word, Docs, PDF, Note, Slide & Sheet : 100,000,000+
Google Slides : 100,000,000+
ColorNote Notepad Notes : 100,000,000+
Evernote – Organizer, Planner for Notes & Memos : 100,000,000+
Google Calendar : 500,000,000+
Cloud Print : 500,000,000+
CamScanner - Phone PDF Creator : 100,000,000+


B612 - Beauty & Filter Camera : 100,000,000+
YouCam Makeup - Magic Selfie Makeovers : 100,000,000+
Sweet Selfie - selfie camera, beauty cam, photo edit : 100,000,000+
Google Photos : 1,000,000,000+
Retrica : 100,000,000+
P

To conclude, in this project we analyzed data about both the App Store and the Google Play Store in order to recommend the category/genre of a new app that can generate revenue in both markets. We started by opening the data, then filtered it using the criteria we needed, and finally looked into the average/total installs/reviews in the different categories/genres.
We concluded that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets. The markets are already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.