# Guided Project: Profitable App Profiles for the App Store and Google Play Markets

The purpose of this project is to find the type of app genre/category that has the most active users. The goal is to create an app that can be profitable on both the App Store and the Google Play Store. To do that, we analyze sample data obtained from each store and use statistical measures to reach a data-driven decision.
<br><br>
We are going to use the following data:
<br>[Apple: App Store sample dataset. This contains 7,200 apps](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps)
<br>Download [here](https://dq-content.s3.amazonaws.com/350/AppleStore.csv)
<br>
<br>
[Android: Google Play Store sample dataset. This contains 10,000 apps](https://www.kaggle.com/lava18/google-play-store-apps)
<br>
Download [here](https://dq-content.s3.amazonaws.com/350/googleplaystore.csv)

## Opening the Data
Below we do the following:
<br>1. Open the data
<br>2. Read the data
<br>3. Transform each dataset into a list
<br>4. Separate the dataset header
<br>*Note: the dataset lists that we use in this project do not include the header row.

In [1]:
from csv import reader

#The App Store Dataset
opened_file = open('AppleStore.csv', encoding='utf8')
read_file = reader(opened_file)
appstore = list(read_file)
opened_file.close() #the rows are in memory, so the file can be closed
appstore_header = appstore[0]
appstore = appstore[1:]

#The Google Play Store Dataset
opened_file = open('googleplaystore.csv', encoding='utf8')
read_file = reader(opened_file)
playstore = list(read_file)
opened_file.close() #the rows are in memory, so the file can be closed
playstore_header = playstore[0]
playstore = playstore[1:]
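
The four steps above are the same for both files, so they can be wrapped in one helper. The following is a minimal sketch, not part of the original project: the name `load_dataset` is hypothetical, and it assumes both CSV files are UTF-8 encoded.

```python
from csv import reader

def load_dataset(path, encoding='utf8'):
    """Return (header, rows) for the CSV file at `path`."""
    # The with statement closes the file even if reading fails.
    # newline='' is what the csv module recommends for files passed to reader().
    with open(path, encoding=encoding, newline='') as opened_file:
        rows = list(reader(opened_file))
    return rows[0], rows[1:]
```

With this helper, loading a dataset would look like `appstore_header, appstore = load_dataset('AppleStore.csv')`.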

## Exploring the Data
**explore( )**
<br>To explore the datasets we create a function, explore()
<br>This function is useful since we can use it repeatedly to explore multiple datasets in an organized manner.

In [2]:
def explore(dataset, start, end, rows_and_columns= False):
    dataset_slice = dataset[start:end]#slicing data
    for row in dataset_slice:
        print(row)
        print('\n')
        
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

**Exploring the App Store Data**

In [3]:
print(appstore_header)
print('\n')
explore(appstore, 0, 3, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


**Exploring the Google Play Store Data**

In [4]:
print(playstore_header)
print('\n')
explore(playstore, 0, 3, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


## Cleaning the Data

**<br>We looked through the [discussion section](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015) of the Google Play Store data, and apparently there is something wrong with row 10472.**
<br>After reading through the discussion, we found that row 10472 doesn't have a category value, which caused a column shift for the remaining columns in that row.
<br>We are going to check whether the category value is missing by printing row 10472 of the dataset and comparing it with the dataset header and with row 10473, which is an arbitrary correct row.


In [5]:
print(playstore_header)
print('\n')
print(playstore[10472]) #where error lies
print(playstore[10473]) #for comparison


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']


We can see that row 10472, which corresponds to Life Made WI-Fi Touchscreen Photo Frame, has two things that are off:
<br>
<br>1. The Category is 1.9. This is obviously off, because the category should be something other than a number.
<br>2. The rating is 19. This is off, because the maximum rating is 5.
<br><br>Therefore, we delete row 10472 in the following section:

## Deleting the Wrong Data

In [6]:
#check rows and column before deletion
print('Number of rows:', len(playstore))
print('Number of columns:', len(playstore[10472]))

del(playstore[10472]) #deleting row with error

#verify rows and columns after deletion
print('Number of rows:', len(playstore))
print('Number of columns:', len(playstore[10472]))

Number of rows: 10841
Number of columns: 12
Number of rows: 10840
Number of columns: 13
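
The row above was located through the Kaggle discussion, and its column count (12 instead of 13) is what gave it away. The same check can be applied to every row. This is a minimal sketch under that assumption; `find_malformed_rows` is a hypothetical helper and the sample rows are made up for illustration.

```python
def find_malformed_rows(dataset, header):
    """Return the indexes of rows whose length differs from the header's."""
    expected = len(header)
    return [i for i, row in enumerate(dataset) if len(row) != expected]

# Tiny made-up sample: the second row is missing one column.
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['App A', 'TOOLS', '4.2'],
    ['App B', '1.9'],
    ['App C', 'GAME', '4.5'],
]
print(find_malformed_rows(sample_rows, sample_header))  # [1]
```

If several rows were reported, deleting them from the highest index to the lowest would avoid shifting the positions of the rows that are still to be deleted.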


## Removing Duplicates
After exploring the datasets, we found that some apps have duplicate entries. For example, Instagram has four entries in the Google Play Store dataset.

In [7]:
print('Instagram has four entries in the Google Play Store dataset:')
print('\n')
for app in playstore:
    name = app[0] #saving app name to a variable named name
    if name == 'Instagram':
        print(app)

Instagram has four entries in the Google Play Store dataset:


['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


**count_duplicate( )**
<br>We have confirmed that there are duplicate entries in the Google Play Store dataset.
<br>Now, we create a function that counts the number of duplicate apps per dataset, count_duplicate( ).
<br>This function does the following:
<br>1. Separates the duplicate and unique app names into two lists.
<br>2. Prints the length of each list (duplicate and unique) and their combined total, along with a few examples from the duplicate list.

In [8]:
def count_duplicate(dataset, index):
    duplicate = []
    unique =[]
    
    for app in dataset:
        name = app[index]
        if name in unique:
            duplicate.append(name)
        else:
            unique.append(name)
    
    print('# Unique apps =' + str(len(unique)))
    print('# Duplicate Apps =' + str(len(duplicate)))
    print('# Total Apps =' + str(len(duplicate + unique)))
    print('Examples of duplicate apps:', '\n', duplicate[:10])

print('App Store Dataset')
count_duplicate(appstore, 1)
print('\n')
print('Play Store Dataset')
count_duplicate(playstore, 0)

App Store Dataset
# Unique apps =7195
# Duplicate Apps =2
# Total Apps =7197
Examples of duplicate apps: 
 ['Mannequin Challenge', 'VR Roller Coaster']


Play Store Dataset
# Unique apps =9659
# Duplicate Apps =1181
# Total Apps =10840
Examples of duplicate apps: 
 ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']
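
count_duplicate( ) checks `name in unique` against a list, which scans the list on every row. As a hedged alternative, the counting can be sketched with `collections.Counter`, which also shows how many times each name is repeated. `duplicate_counts` and the sample rows are hypothetical and only for illustration.

```python
from collections import Counter

def duplicate_counts(dataset, index):
    """Map each repeated name to the number of times it appears."""
    counts = Counter(row[index] for row in dataset)
    return {name: n for name, n in counts.items() if n > 1}

# Made-up sample with one name appearing three times.
sample = [['Box'], ['Slack'], ['Box'], ['Zoom'], ['Box']]
repeated = duplicate_counts(sample, 0)
print(repeated)                               # {'Box': 3}
print(sum(n - 1 for n in repeated.values()))  # 2 rows beyond the first 'Box'
```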


**remove_duplicate( )**
<br>Now, we want to remove the duplicate entries that exist in both datasets.
<br>To do that, we create a function, remove_duplicate( ), which does the following:
1. Create a dictionary
    <br>-each key is a unique app name
    <br>-each value is the highest number of reviews of the app
2. Use the keys and values in the dictionary to create a new dataset
    <br>The new dataset will have one entry per app, and each entry will be the one with the highest number of reviews

In [9]:
def remove_duplicate(dataset, index_name, index_nreviews):
    
    reviews_max = {}
    for app in dataset:
        name = app[index_name]
        n_reviews = float(app[index_nreviews]) 
        
        if name in reviews_max and reviews_max[name] < n_reviews:
            reviews_max[name] = n_reviews
            
        elif name not in reviews_max:
            reviews_max[name] = n_reviews
            
    cleaned = []
    added_already = []
    
    for app in dataset:
        name = app[index_name]
        n_reviews = float(app[index_nreviews])
        
        if (n_reviews == reviews_max[name]) and (name not in added_already):
            cleaned.append(app)
            added_already.append(name)

    return cleaned


print('App Store Dataset')
print('Before remove_duplicate( )')
count_duplicate(appstore, 1)
appstore_clean = remove_duplicate(appstore, 1, 5)
print('\n')
print('After remove_duplicate( )')
count_duplicate(appstore_clean, 1)
print('\n')
print('Play Store Dataset')
print('Before remove_duplicate( )')
count_duplicate(playstore, 0)
print('\n')
print('After remove_duplicate( )')
playstore_clean = remove_duplicate(playstore, 0, 3)
count_duplicate(playstore_clean, 0)

App Store Dataset
Before remove_duplicate( )
# Unique apps =7195
# Duplicate Apps =2
# Total Apps =7197
Examples of duplicate apps: 
 ['Mannequin Challenge', 'VR Roller Coaster']


After remove_duplicate( )
# Unique apps =7195
# Duplicate Apps =0
# Total Apps =7195
Examples of duplicate apps: 
 []


Play Store Dataset
Before remove_duplicate( )
# Unique apps =9659
# Duplicate Apps =1181
# Total Apps =10840
Examples of duplicate apps: 
 ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']


After remove_duplicate( )
# Unique apps =9659
# Duplicate Apps =0
# Total Apps =9659
Examples of duplicate apps: 
 []
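
remove_duplicate( ) loops over the dataset twice and keeps a separate `added_already` list. The same idea can be sketched in a single pass by storing the best row itself in the dictionary. `keep_most_reviewed` is a hypothetical name and the sample rows are shortened for illustration. One difference to be aware of: the rows come back in the order each app name first appears, which may differ from the row order produced by remove_duplicate( ).

```python
def keep_most_reviewed(dataset, index_name, index_nreviews):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}  # app name -> row with the most reviews seen so far
    for app in dataset:
        name = app[index_name]
        n_reviews = float(app[index_nreviews])
        # The strict '>' keeps the earlier row when review counts tie.
        if name not in best or n_reviews > float(best[name][index_nreviews]):
            best[name] = app
    return list(best.values())

# Shortened sample rows: [name, number of reviews].
sample = [
    ['Instagram', '66577313'],
    ['Box', '10'],
    ['Instagram', '66577446'],
    ['Instagram', '66577313'],
]
print(keep_most_reviewed(sample, 0, 1))
# [['Instagram', '66577446'], ['Box', '10']]
```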


Now that we have removed the duplicate apps from our datasets, the next thing that we want to do is to remove the non-English apps. To do that, we are going to create a function, is_english( ).

## Isolating English Apps
is_english( ) is a function that takes a string as a parameter. Inside the function, a loop iterates over the string. On every iteration, it checks whether the character's code point (obtained with ord( )) is greater than 127 (ASCII covers 0 - 127). The function returns False if it detects a non-ASCII character in the string, and True otherwise.

In [10]:
def is_english(string):
    
    for character in string:
        if ord(character) > 127: #check if not English
            return False
    
    return True

print('\n')
print('1.', is_english('Instagram'))
print('2.', is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print('3.', is_english('a爱奇艺PPS -《欢乐颂2》电视剧热播')) #string[0] = 'a'
print('4.', is_english('爱奇艺PPS -《欢乐颂2》电视剧热播a')) #string[-1] = 'a'
print('5.', is_english('Docs To Go™ Free Office Suite'))
print('6.', is_english('Instachat 😜'))
    



1. True
2. False
3. False
4. False
5. False
6. False


We can see that the is_english( ) function works as intended for examples 1, 2, 3, and 4. However, it doesn't work as intended for examples 5 and 6. This happens because emojis and characters like ™ are non-ASCII (they correspond to a number greater than 127). Hence, the function returns False for examples 5 and 6.
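
To see exactly which characters trip the check, the short sketch below lists every non-ASCII character in a string together with its code point. `non_ascii_characters` is a hypothetical helper added only for illustration.

```python
def non_ascii_characters(string):
    """Return (character, code point) pairs for characters outside ASCII."""
    return [(ch, ord(ch)) for ch in string if ord(ch) > 127]

print(non_ascii_characters('Instagram'))                      # []
print(non_ascii_characters('Docs To Go™ Free Office Suite'))  # [('™', 8482)]
print(non_ascii_characters('Instachat 😜'))                    # [('😜', 128540)]
```

Examples 5 and 6 each contain a single non-ASCII character, which is why a small tolerance is enough to keep them.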

**Updating is_english( )**
<br>If we used the is_english( ) function that we currently have, we would lose a lot of relevant data. Therefore, we are
going to update the function. This time we loop over the input string and count the non-ASCII characters.
If there are more than three, the function returns False; otherwise it returns True.



In [11]:
def is_english(string):
    
    non_ascii = 0
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    
    if non_ascii > 3:
        return False
    else:
        return True

print('\n')
print('1.', is_english('Instagram'))
print('2.', is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print('3.', is_english('a爱奇艺PPS -《欢乐颂2》电视剧热播'))
print('4.', is_english('爱奇艺PPS -《欢乐颂2》电视剧热播a'))
print('5.', is_english('Docs To Go™ Free Office Suite'))
print('6.', is_english('Instachat 😜'))




1. True
2. False
3. False
4. False
5. True
6. True


The new is_english( ) function works as expected. Now, we are going to keep only the English apps in our datasets.

In [12]:
print('----------------------------Testing: appstore_clean----------------------------')
print('\n')
explore(appstore_clean, 0, 3, True)

print('\n')
print('Filter english only app: apple')
print('\n')
appstore_english = []

for app in appstore_clean:
    name = app[1]
    
    if is_english(name):
        appstore_english.append(app)

explore(appstore_english, 0, 3, True)

print('\n')
print('----------------------------Testing: playstore_clean----------------------------')
print('\n')
explore(playstore_clean, 0, 3, True)


print('\n')
print('Filter english only app: android')
print('\n')
playstore_english = []

for app in playstore_clean:
    name = app[0]
    
    if is_english(name):
        playstore_english.append(app)

explore(playstore_english, 0, 3, True)

----------------------------Testing: appstore_clean----------------------------


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7195
Number of columns: 16


Filter english only app: apple


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', 

**Since we removed the non-English apps, the number of apps in both datasets has been reduced:**
<br>App Store: 7195 ---> 6181
<br>Play Store: 9659 ---> 9614
<br><br>The next thing that we want to do is to isolate the free apps in both datasets. To do that,
we loop through each dataset and copy the apps whose price is 0 into two new lists, the_appstore and the_playstore.

## Isolating Free Apps

In [13]:
the_appstore = []
the_playstore = []

for app in appstore_english:
    price = app[4] #price column in the App Store dataset
    
    if price == '0.0':
        the_appstore.append(app)

for app in playstore_english:
    price = app[7] #Price column in the Play Store dataset
    
    if price == '0':
        the_playstore.append(app)

print('App Store: ', len(the_appstore), 'apps')
print('\n')
print('Play Store:', len(the_playstore), 'apps')

App Store:  3220 apps


Play Store: 8864 apps
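
The filter above compares price strings, so it relies on free apps being written exactly as '0.0' in one dataset and '0' in the other. A slightly more defensive sketch converts the price to a number first. `is_free` is a hypothetical helper, and it assumes paid prices look like '1.99' or '$4.99'.

```python
def is_free(price):
    """Return True when a price string such as '0', '0.0' or '$4.99' is zero."""
    # Strip a dollar sign (an assumed format for paid apps) before converting.
    return float(price.replace('$', '')) == 0.0

print(is_free('0'))      # True
print(is_free('0.0'))    # True
print(is_free('$4.99'))  # False
```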


**As we can see, the number of apps in both datasets has decreased even more after we isolated the free apps.**

**By now, we have done the following:**
<br>1. Deleted the app with inaccurate data
<br>2. Removed duplicates
<br>3. Removed non-English apps
<br>4. Isolated the free apps

<br>**Next, we want to analyze the data**

# Analyzing the Data
## Finding the Most Common Genres / Categories
We are going to find the most common genres/categories in the App Store and the Play Store. To do that, we will analyze the prime_genre column for the App Store dataset, and the Genres and Category columns for the Play Store dataset.

**We will create two functions, freq_table( ), and display_table( )**

freq_table(dataset, index) returns a frequency table dictionary for any column in a dataset, with frequency expressed as a percentage.

display_table(dataset, index) calls the freq_table( ) function to generate a frequency table dictionary, and transforms the dictionary into a list of (percentage, key) tuples. We do this because the built-in function sorted( ) only sorts the keys when it is given a dictionary, whereas a list of tuples is sorted by the first element of each tuple, which here is the percentage. This lets us sort the table in descending order.
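
The small sketch below shows how sorted( ) treats a dictionary versus a list of tuples. The percentages are rounded values used only for illustration, and the last variant (sorting with a `key` function) is an alternative not used in this project.

```python
percentages = {'Games': 58.1, 'Entertainment': 7.9, 'Photo & Video': 5.0}

# Sorting the dictionary itself only looks at its keys.
print(sorted(percentages))
# ['Entertainment', 'Games', 'Photo & Video']

# Sorting (value, key) tuples orders by value first.
pairs = [(value, key) for key, value in percentages.items()]
print(sorted(pairs, reverse=True))
# [(58.1, 'Games'), (7.9, 'Entertainment'), (5.0, 'Photo & Video')]

# Alternative: keep (key, value) pairs and sort with a key function.
by_value = sorted(percentages.items(), key=lambda item: item[1], reverse=True)
print(by_value)
# [('Games', 58.1), ('Entertainment', 7.9), ('Photo & Video', 5.0)]
```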

In [14]:
def freq_table(dataset, index):
    table = {} #new dictionary
    total = 0 #useful later to get percentage
    
    for row in dataset:
        total += 1 #this will be used to calculate the percentage of each value
        value = row[index]
        
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    
    #converting this to percentages
    table_percentages = {} #new dictionary
    for column in table:
        percentage = (table[column] / total) * 100
        table_percentages[column] = percentage
        
    return table_percentages
    

def display_table(dataset, index):
    
    table = freq_table(dataset, index) #generates frequency table using freq_table()
    table_display = [] #new list
    for key in table:
        key_value_as_tuple = (table[key], key) #(percentage, key) tuple
        table_display.append(key_value_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True) #reverse = True sorts in descending order
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

print('App Store: prime_genre column')
print('\n')
display_table(the_appstore, 11) #prime_genre for apple
print('\n')
print('\n')
print('Play Store: category column')
print('\n')
display_table(the_playstore, 1) #category for android
print('\n')
print('\n')
print('Play Store: genre column')
print('\n')
display_table(the_playstore, 9) #genre for android


        

App Store: prime_genre column


Games : 58.13664596273293
Entertainment : 7.888198757763975
Photo & Video : 4.968944099378882
Education : 3.6645962732919255
Social Networking : 3.291925465838509
Shopping : 2.608695652173913
Utilities : 2.515527950310559
Sports : 2.142857142857143
Music : 2.049689440993789
Health & Fitness : 2.018633540372671
Productivity : 1.7391304347826086
Lifestyle : 1.5838509316770186
News : 1.3354037267080745
Travel : 1.2422360248447204
Finance : 1.1180124223602486
Weather : 0.8695652173913043
Food & Drink : 0.8074534161490683
Reference : 0.5590062111801243
Business : 0.5279503105590062
Book : 0.43478260869565216
Navigation : 0.18633540372670807
Medical : 0.18633540372670807
Catalogs : 0.12422360248447205




Play Store: category column


FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524


## Common Genre / Category Analysis:

Based on our data, the free English apps market on the App Store is dominated by games and entertainment, which together make up 66% of the market. Games is the most common genre with 58.1%, followed by entertainment and photo & video with 7.9% and 5% respectively.

The free English apps market on The Google Play Store has more diversity. Based on the category column, family is the most common category with 18.9%, followed by games and tools with 9.7% and 8.5% respectively. Based on the genre column, tools is the most common with 8.5%, followed by entertainment and education with 6.1% and 5.3% respectively.

Our findings suggest that The App Store is populated by apps that are fun & entertaining, while The Play Store has a more balanced distribution between practical & fun apps. Next we want to know the genres / categories that are popular with users. To do that, we are going to find apps that have the most active users. 

It is important to note that we have limited information in both datasets, so we will work with the relevant data available in each one. For **The App Store** dataset, we are going to find the **average number of ratings** per genre by using the **rating_count_tot** column. For **The Google Play Store** dataset, we are going to find the **average number of installs** per genre and category by using the **Installs** column.

In the next section, we are going to make a function, **average_xpergroup(dataset, x_index, group_index)**. This function will do the following:
<br>1. Create a frequency table dictionary using the freq_table(dataset, group_index) function that we made previously, to obtain the unique groups.
<br>2. Remove the ',' and '+' characters from the number of installs/ratings
<br>3. Calculate the average number of installs/ratings per group and print it
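
The steps above can also be sketched as a single pass over the dataset that keeps a running total and count per group, instead of looping over the whole dataset once per group. `average_per_group` is a hypothetical variant that returns a dictionary rather than printing, and the sample rows are made up.

```python
def average_per_group(dataset, x_index, group_index):
    """Return {group: average of column x_index}, ignoring ',' and '+'."""
    totals = {}
    counts = {}
    for app in dataset:
        group = app[group_index]
        x = float(app[x_index].replace(',', '').replace('+', ''))
        totals[group] = totals.get(group, 0) + x
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}

# Made-up rows: [name, category, installs].
sample = [
    ['App A', 'TOOLS', '1,000+'],
    ['App B', 'TOOLS', '3,000+'],
    ['App C', 'GAME', '500+'],
]
print(average_per_group(sample, 2, 1))  # {'TOOLS': 2000.0, 'GAME': 500.0}
```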

In [15]:
def average_xpergroup(dataset, x_index, group_index):
    grouped_dataset = freq_table(dataset, group_index) #keys are the unique groups (genres / categories)

    for group in grouped_dataset:
        total = 0 
        len_group = 0 
        
        for app in dataset:
            group_app = app[group_index]
            if group_app == group:
                x = app[x_index]
                x = x.replace(',','')
                x = x.replace('+','')
                x = float(x)
                total += x
                len_group += 1
        ave_x = (total / len_group)
        print(group, ':', ave_x)

print('App Store: Average rating per prime_genre')           
average_xpergroup(the_appstore, 5, -5)
print('\n')
print('Play Store: Average install per category')           
average_xpergroup(the_playstore, 5, 1)
print('\n')
print('Play Store: Average install per genre')  
average_xpergroup(the_playstore, 5, -4)

App Store: Average rating per prime_genre
Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22812.92467948718
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0


Play Store: Average install per category
ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCA

**Note:** We only want to work with one column in the Play Store dataset. Comparing the Play Store genre and category columns, they are quite similar, except that genre has more detailed classifications while category covers broader areas. In the current analysis we want the overall view, so going forward we will only work with the category column for the Play Store dataset.

## TOP 5 Genres & Categories

<br>**App Store: Genre**
<br>Navigation : 	86090.33
<br>Reference : 	74942.11
<br>Social Networking : 71548.35
<br>Music : 	57326.53
<br>Weather : 	52279.89
<br><br>
<br>**Play Store: Category**
<br>Communication : 	38456119.17
<br>Video Players :	24727872.45
<br>Social : 	23253652.13
<br>Photography : 	17840110.40
<br>Productivity : 	16787331.34

## The App Store: Top 5 Genre Analysis

Navigation has the highest average number of active users, with 86,090 reviews per app on average. However, if we analyze the genre further, it turns out that the majority of the reviews belong to Waze and Google Maps, with 345,046 and 154,911 reviews respectively. The rest of the apps have low review numbers, the highest being Geocaching with 12,811 reviews, which is way below the genre average.
<br>

In [16]:
print('The App Store') 
print('---Navigation---')   
for app in the_appstore:
    if app[-5] == 'Navigation':
        print(app[1], ':', app[5]) # print name and number of ratings
print('\n')


The App Store
---Navigation---
Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5




The navigation genre doesn't show a positive outlook. If we were to remove Waze and Google Maps from the genre, we would get a really low average number of reviews.

**new_average( )**
<br>We are going to make a new function, new_average( ), which calculates the new average number of reviews/installs of a genre/category if we remove some apps from our analysis. In the following, we remove Waze and Google Maps from consideration, so we only include apps with fewer than 150,000 reviews.

In [17]:
def new_average(dataset, index_reviews, index_genre, genre_string, n):
    under_n = []
    
    for app in dataset:
        reviews = app[index_reviews]
        reviews = reviews.replace(',', '')
        reviews = reviews.replace('+', '')
        if (app[index_genre] == genre_string) and (float(reviews) < n):
            under_n.append(float(reviews))
    
    print('New average:', sum(under_n) / len(under_n))


new_average(the_appstore, 5, -5, 'Navigation', 150000)



New average: 4146.25


As can be seen from the output above, the new average is extremely low: about 4,146 reviews. To be successful with a navigation app, we would have to make one that can compete with Waze and Google Maps. Realistically, that would be difficult. Hence, it is not recommended to create a navigation app.
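
new_average( ) needs a review threshold chosen by eye so that it falls just below the apps we want to exclude. A hedged alternative is to drop the k largest apps directly. `average_without_top` is a hypothetical helper; the sample rows reuse the review counts from the Navigation listing with shortened app names.

```python
def average_without_top(dataset, index_reviews, index_genre, genre_string, k):
    """Average the review counts of a genre after dropping its k largest."""
    counts = sorted(
        float(app[index_reviews].replace(',', '').replace('+', ''))
        for app in dataset
        if app[index_genre] == genre_string
    )
    kept = counts[:-k] if k > 0 else counts
    if not kept:
        return None  # nothing left to average
    return sum(kept) / len(kept)

# Rows: [shortened name, number of ratings, genre].
navigation = [
    ['Waze', '345046', 'Navigation'],
    ['Google Maps', '154911', 'Navigation'],
    ['Geocaching', '12811', 'Navigation'],
    ['CoPilot GPS', '3582', 'Navigation'],
    ['ImmobilienScout24', '187', 'Navigation'],
    ['Railway Route Search', '5', 'Navigation'],
]
print(average_without_top(navigation, 1, 2, 'Navigation', 2))  # 4146.25
```

Dropping the top two apps reproduces the 4146.25 obtained with the 150,000 threshold.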

Reference has the second highest average number of active users, with 74,942 reviews per app on average. Similar to the navigation genre, a few apps skew the average upward, in this case Bible and Dictionary.com.


In [18]:
print('---Reference---') 
for app in the_appstore:
    if app[-5] == 'Reference':
        print(app[1], ':', app[5]) # print name and number of ratings
print('\n')


---Reference---
Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0




The reference genre is interesting to look at, because if we were to remove Bible and Dictionary.com from consideration, we would still get a relatively high average value. In addition, there are still a few apps that we can take inspiration from. In the following, we calculate the new average for apps with fewer than 200,000 reviews, which removes Bible and Dictionary.com.

In [19]:
#new_average( ) is already defined in In [17], so we only call it here
new_average(the_appstore, 5, -5, 'Reference', 200000)

New average: 10186.9375


One of the apps with a high number of reviews is **Muslim Pro**, which seems to be an app designed for Muslims. Based on the app subtitle, Ramadan 2017 Prayer Times, Azan, Quran, the app appears to provide prayer times and other relevant information. A religious app like Muslim Pro can feasibly be produced, and since there seems to be a market for it, producing a similar app would be recommended.

Social Networking has the third highest average number of active users, with 71,548 reviews per app on average. Similar to the navigation and reference genres, the high average is influenced by social media giants like Facebook, Pinterest, Skype, etc.

In [20]:
print('---Social Networking---')  
for app in the_appstore:
    if app[-5] == 'Social Networking':
        print(app[1], ':', app[5]) # print name and number of ratings

---Social Networking---
Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Ta

The social networking genre doesn't show a lot of promise. A lot of the apps with high review counts are produced by tech giants, and it would be difficult to make an app that can compete against them. We are going to calculate the new average number of reviews after removing the apps made by tech giants (the top 8 apps), so the apps that we consider will have fewer than 170,000 reviews.

In [21]:
#new_average( ) is already defined in In [17], so we only call it here
new_average(the_appstore, 5, -5, 'Social Networking', 170000)

New average: 17984.61224489796


It is interesting to find that the new average number of reviews is still relatively high, at about 17,985. However, if we take a look at the rest of the apps, there are notable names like Viber, LinkedIn, etc., which would be strong competition if we were to make a social networking app. Hence, we would advise against making one in this genre.

Music has the fourth highest average number of active users, with 57,327 reviews per app on average. Similar to all the aforementioned genres, most of the reviews belong to a few highly popular apps at the top of the list.
<br>


In [22]:
print('---Music---')  
for app in the_appstore:
    if app[-5] == 'Music':
        print(app[1], ':', app[5]) # print name and number of ratings

---Music---
Pandora - Music & Radio : 1126879
Spotify Music : 878563
Shazam - Discover music, artists, videos & lyrics : 402925
iHeartRadio – Free Music & Radio Stations : 293228
SoundCloud - Music & Audio : 135744
Magic Piano by Smule : 131695
Smule Sing! : 119316
TuneIn Radio - MLB NBA Audiobooks Podcasts Music : 110420
Amazon Music : 106235
SoundHound Song Search & Music Player : 82602
Sonos Controller : 48905
Bandsintown Concerts : 30845
Karaoke - Sing Karaoke, Unlimited Songs! : 28606
My Mixtapez Music : 26286
Sing Karaoke Songs Unlimited with StarMaker : 26227
Ringtones for iPhone & Ringtone Maker : 25403
Musi - Unlimited Music For YouTube : 25193
AutoRap by Smule : 18202
Spinrilla - Mixtapes For Free : 15053
Napster - Top Music & Radio : 14268
edjing Mix:DJ turntable to remix and scratch music : 13580
Free Music - MP3 Streamer & Playlist Manager Pro : 13443
Free Piano app by Yokee : 13016
Google Play Music : 10118
Certified Mixtapes - Hip Hop Albums & Mixtapes : 9975
TIDAL : 739
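To put a number on how concentrated the reviews are, the sketch below computes the share of all reviews held by the five most-reviewed apps. It uses only the rows visible in the listing above, so it is an approximation of the full genre. The helper name `top_share` is hypothetical.

```python
def top_share(review_counts, k):
    """Return the fraction of all reviews that belongs to the k most-reviewed apps."""
    ordered = sorted(review_counts, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


# Review counts copied from the Music rows printed above (visible rows only)
visible_music = [
    1126879, 878563, 402925, 293228, 135744, 131695, 119316, 110420,
    106235, 82602, 48905, 30845, 28606, 26286, 26227, 25403, 25193,
    18202, 15053, 14268, 13580, 13443, 13016, 10118, 9975, 739,
]
print(f'Top 5 share of reviews: {top_share(visible_music, 5):.0%}')
```

On these visible rows the top five apps account for a bit over three quarters of the reviews.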

There might be an opportunity in the music genre. If we remove the top five apps from consideration, the remaining apps appear to be spread fairly evenly across different types of music apps. Below, we calculate the new average for apps with fewer than 135,000 reviews.

In [23]:
def new_average(dataset, index_reviews, index_genre, genre_string, n):
    under_n = []
    
    for app in dataset:
        reviews = app[index_reviews]
        reviews = reviews.replace(',', '')
        reviews = reviews.replace('+', '')
        if (app[index_genre] == genre_string) and (float(reviews) < n):
            under_n.append(float(reviews))
    
    print('New average:', sum(under_n) / len(under_n))


new_average(the_appstore, 5, -5, 'Music', 135000)

New average: 15511.672131147541


The new average is about 15,512, which can be considered to be in the medium range. However, it would be difficult to compete with the established apps in the genre. Hence, we do not recommend making a music app.

## Overview below
Weather has the fifth-highest number of active users, with 52279 average reviews. There are 6 apps with more than 100,000 reviews, 7 apps with between 1,000 and 50,000 reviews, and the other 15 apps have fewer than 1,000 reviews.
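The bucket counts above can be reproduced with a small tally. The sketch below is one way to do it; the function name `count_buckets` and the bucket labels are hypothetical. It is run on the 24 complete rows visible in the Weather listing in this notebook, so the lowest bucket comes out smaller than the 15 apps counted in the full data.

```python
def count_buckets(review_counts):
    """Tally review counts into the three ranges used in the text."""
    buckets = {'over_100k': 0, '1k_to_50k': 0, 'under_1k': 0, 'other': 0}
    for count in review_counts:
        if count > 100000:
            buckets['over_100k'] += 1
        elif 1000 <= count <= 50000:
            buckets['1k_to_50k'] += 1
        elif count < 1000:
            buckets['under_1k'] += 1
        else:
            # Counts between 50,000 and 100,000 fall outside the three ranges
            buckets['other'] += 1
    return buckets


# Review counts copied from the complete rows of the Weather listing
visible_weather = [
    495626, 208648, 188583, 150158, 144214, 112603, 49192, 45696,
    35702, 22792, 6081, 2333, 1158, 375, 203, 128, 80, 78, 53, 37,
    25, 24, 22, 14,
]
print(count_buckets(visible_weather))
```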

In [24]:
print('---Weather---')  
for app in the_appstore:
    if app[-5] == 'Weather':
        print(app[1], ':', app[5]) # print name and number of ratings

---Weather---
The Weather Channel: Forecast, Radar & Alerts : 495626
The Weather Channel App for iPad – best local forecast, radar map, and storm tracking : 208648
WeatherBug - Local Weather, Radar, Maps, Alerts : 188583
MyRadar NOAA Weather Radar Forecast : 150158
AccuWeather - Weather for Life : 144214
Yahoo Weather : 112603
Weather Underground: Custom Forecast & Local Radar : 49192
NOAA Weather Radar - Weather Forecast & HD Radar : 45696
Weather Live Free - Weather Forecast & Alerts : 35702
Storm Radar : 22792
QuakeFeed Earthquake Map, Alerts, and News : 6081
Moji Weather - Free Weather Forecast : 2333
Hurricane by American Red Cross : 1158
Forecast Bar : 375
Hurricane Tracker WESH 2 Orlando, Central Florida : 203
FEMA : 128
iWeather - World weather forecast : 80
Weather - Radar - Storm with Morecast App : 78
Yurekuru Call : 53
Weather & Radar : 37
WRAL Weather Alert : 25
Météo-France : 24
JaxReady : 22
Freddy the Frogcaster's Weather Station : 14
Almanac Long-Range Weather Forecast

If we remove the two apps that dominate the weather genre, The Weather Channel and The Weather Channel App for iPad, the average may still be fairly high. Below, we calculate the new average for apps with fewer than 200,000 reviews.

In [25]:
def new_average(dataset, index_reviews, index_genre, genre_string, n):
    under_n = []
    
    for app in dataset:
        reviews = app[index_reviews]
        reviews = reviews.replace(',', '')
        reviews = reviews.replace('+', '')
        if (app[index_genre] == genre_string) and (float(reviews) < n):
            under_n.append(float(reviews))
    
    print('New average:', sum(under_n) / len(under_n))


new_average(the_appstore, 5, -5, 'Weather', 200000)

New average: 29213.96153846154


There might be an opportunity in this genre. The genre still has a relatively high average number of reviews even after removing the top-reviewed apps, which suggests that there is demand for apps in this genre.

## Summary of findings in The App Store genre

The App Store genres with the most reviews are navigation, reference, social networking, music, and weather. We analyzed the apps in each genre by removing the top apps from consideration and calculating a new average. Here are the new averages for The App Store data:

<br>**App Store: Genre New Average**
<br>Weather : 29214
<br>Social Networking : 17985
<br>Music : 15512
<br>Reference : 10187
<br>Navigation : 4146

Looking at the information above, we can see that Weather has the highest new average. There is a good chance that we can succeed with a weather application, since there seems to be strong demand in the market.

Social networking and music have relatively high new averages, but we advise against creating apps in those genres, because they already contain apps made by tech giants, such as Facebook and Pinterest in social networking, and Pandora and Spotify in music.

The reference genre is an interesting niche. One app that catches our attention is Muslim Pro, which is one of the most popular apps in the genre. We can take inspiration from Muslim Pro and try to make an app that caters to religious people.

Navigation has a very low new average after we remove Waze and Google Maps from our analysis. Based on the new average, we don't recommend creating a navigation app.

**Current recommendation: Weather and Reference**

## The Play Store: Top 5 Category Analysis

It is important to note that the installs column in The Play Store dataset doesn't contain precise numbers. Most of the values are open-ended (e.g., 100+, 500+), as can be seen below:

In [26]:
display_table(the_playstore, 5)

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


Not having precise numbers is fine, because we don't need precision for our analysis. We only want to get an idea of what types of apps are popular. In fact, we can use the open-ended values to our advantage: in the following, we identify the apps with '1,000,000,000+', '500,000,000+' or '100,000,000+' installs in each category and see how the average changes without them.
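Since the install values are strings, they have to be converted before they can be compared or averaged. The sketch below shows one way to do that; `parse_installs` and `DOMINANT_THRESHOLD` are hypothetical names, and the parsed number should be read as a lower bound because of the trailing '+'.

```python
def parse_installs(label):
    """Convert an install label such as '1,000,000+' to an integer lower bound."""
    return int(label.replace(',', '').replace('+', ''))


# Assumption: apps at or above 100 million installs are treated as dominant
DOMINANT_THRESHOLD = 100_000_000

for label in ['1,000,000,000+', '500,000+', '100,000,000+', '0']:
    installs = parse_installs(label)
    print(label, '->', installs, '| dominant:', installs >= DOMINANT_THRESHOLD)
```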

Communication has the highest number of installs of any category, with 38456119 average installs.

In [27]:
print('The Play Store')
print('---Communication---')
for app in the_playstore:
    if app[1] == 'COMMUNICATION':
        print(app[0], ':', app[5])


The Play Store
---Communication---
WhatsApp Messenger : 1,000,000,000+
Messenger for SMS : 10,000,000+
My Tele2 : 5,000,000+
imo beta free calls and text : 100,000,000+
Contacts : 50,000,000+
Call Free – Free Call : 5,000,000+
Web Browser & Explorer : 5,000,000+
Browser 4G : 10,000,000+
MegaFon Dashboard : 10,000,000+
ZenUI Dialer & Contacts : 10,000,000+
Cricket Visual Voicemail : 10,000,000+
TracFone My Account : 1,000,000+
Xperia Link™ : 10,000,000+
TouchPal Keyboard - Fun Emoji & Android Keyboard : 10,000,000+
Skype Lite - Free Video Call & Chat : 5,000,000+
My magenta : 1,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Seznam.cz : 1,000,000+
Antillean Gold Telegram (original version) : 100,000+
AT&T Visual Voicemail : 10,000,000+
GMX Mail : 10,000,000+
Omlet Chat : 10,000,000+
My Vodacom SA : 5,000,000+
Microsoft Edge : 5,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Cal

The communication category is heavily populated by highly popular apps with more than 100m installs. Realistically, it would be hard to compete against them. Below, we remove apps with 100m+ installs and calculate the new average number of installs.

In [28]:

def new_average(dataset, index_installs, index_genre, genre_string, n):
    under_n = []
    
    for app in dataset:
        installs = app[index_installs]
        installs = installs.replace(',', '')
        installs = installs.replace('+', '')
        if (app[index_genre] == genre_string) and (float(installs) < n):
            under_n.append(float(installs))
    
    print(sum(under_n) / len(under_n))

new_average(the_playstore, 5, 1, 'COMMUNICATION', 100000000)


3603485.3884615386


As we can see, the new average is more than 10 times lower than the original value. This shows that the high average number of installs is skewed by a few highly popular apps, which a new app would find hard to compete against.

The video players category has the second-highest number of installs, with 24727872 average installs.

In [29]:
print('---Video Players---')
for app in the_playstore:
    if app[1] == 'VIDEO_PLAYERS':
        print(app[0], ':', app[5])
print('\n')


---Video Players---
YouTube : 1,000,000,000+
All Video Downloader 2018 : 1,000,000+
Video Downloader : 10,000,000+
HD Video Player : 1,000,000+
Iqiyi (for tablet) : 1,000,000+
Video Player All Format : 10,000,000+
Motorola Gallery : 100,000,000+
Free TV series : 100,000+
Video Player All Format for Android : 500,000+
VLC for Android : 100,000,000+
Code : 10,000,000+
Vote for : 50,000,000+
XX HD Video downloader-Free Video Downloader : 1,000,000+
OBJECTIVE : 1,000,000+
Music - Mp3 Player : 10,000,000+
HD Movie Video Player : 1,000,000+
YouCut - Video Editor & Video Maker, No Watermark : 5,000,000+
Video Editor,Crop Video,Movie Video,Music,Effects : 1,000,000+
YouTube Studio : 10,000,000+
video player for android : 10,000,000+
Vigo Video : 50,000,000+
Google Play Movies & TV : 1,000,000,000+
HTC Service － DLNA : 10,000,000+
VPlayer : 1,000,000+
MiniMovie - Free Video and Slideshow Editor : 50,000,000+
Samsung Video Library : 50,000,000+
OnePlus Gallery : 1,000,000+
LIKE – Magic Video Mak

Similar to the communication category, there are a lot of apps with more than 100m installs. We will try to remove them from our analysis and calculate the new average value in the following:

In [30]:
def new_average(dataset, index_installs, index_genre, genre_string, n):
    under_n = []
    
    for app in dataset:
        installs = app[index_installs]
        installs = installs.replace(',', '')
        installs = installs.replace('+', '')
        if (app[index_genre] == genre_string) and (float(installs) < n):
            under_n.append(float(installs))
    
    print(sum(under_n) / len(under_n))

new_average(the_playstore, 5, 1, 'VIDEO_PLAYERS', 100000000)

5544878.133333334


The new average is roughly 4.5 times lower than the original value. The chance of creating a successful app in this category is higher than in the communication category. However, at this stage we will look at the averages of the other categories before making any decision.

Social has the third-highest number of installs, with 23253652 average installs.

In [31]:
print('---Social---')
for app in the_playstore:
    if app[1] == 'SOCIAL':
        print(app[0], ':', app[5])

---Social---
Facebook : 1,000,000,000+
Facebook Lite : 500,000,000+
Tumblr : 100,000,000+
Social network all in one 2018 : 100,000+
Pinterest : 100,000,000+
TextNow - free text + calls : 10,000,000+
Google+ : 1,000,000,000+
The Messenger App : 1,000,000+
Messenger Pro : 1,000,000+
Free Messages, Video, Chat,Text for Messenger Plus : 1,000,000+
Telegram X : 5,000,000+
The Video Messenger App : 100,000+
Jodel - The Hyperlocal App : 1,000,000+
Hide Something - Photo, Video : 5,000,000+
Love Sticker : 1,000,000+
Web Browser & Fast Explorer : 5,000,000+
LiveMe - Video chat, new friends, and make money : 10,000,000+
VidStatus app - Status Videos & Status Downloader : 5,000,000+
Love Images : 1,000,000+
Web Browser ( Fast & Secure Web Explorer) : 500,000+
SPARK - Live random video chat & meet new people : 5,000,000+
Golden telegram : 50,000+
Facebook Local : 1,000,000+
Meet – Talk to Strangers Using Random Video Chat : 5,000,000+
MobilePatrol Public Safety App : 1,000,000+
💘 WhatsLov: Smileys

Similar to the communication and video players categories, there are a lot of apps with more than 100m installs. We remove them from our analysis and calculate the new average below:

In [32]:
def new_average(dataset, index_installs, index_genre, genre_string, n):
    under_n = []
    
    for app in dataset:
        installs = app[index_installs]
        installs = installs.replace(',', '')
        installs = installs.replace('+', '')
        if (app[index_genre] == genre_string) and (float(installs) < n):
            under_n.append(float(installs))
    
    print(sum(under_n) / len(under_n))

new_average(the_playstore, 5, 1, 'SOCIAL', 100000000)

3084582.5201793723


The new average is roughly 7.5 times lower than the original value. This means the social category follows the same pattern as the communication category: there are a lot of highly popular apps in this category, and it would be hard to compete against them. Hence, we recommend against making an app in this category.

The photography category has the fourth highest average installs number, with 17840110 installs.

In [33]:
print('---Photography---')
for app in the_playstore:
    if app[1] == 'PHOTOGRAPHY':
        print(app[0], ':', app[5])

---Photography---
TouchNote: Cards & Gifts : 1,000,000+
FreePrints – Free Photos Delivered : 1,000,000+
Groovebook Photo Books & Gifts : 500,000+
Moony Lab - Print Photos, Books & Magnets ™ : 50,000+
LALALAB prints your photos, photobooks and magnets : 1,000,000+
Snapfish : 1,000,000+
Motorola Camera : 50,000,000+
HD Camera - Best Cam with filters & panorama : 5,000,000+
LightX Photo Editor & Photo Effects : 10,000,000+
Sweet Snap - live filter, Selfie photo edit : 10,000,000+
HD Camera - Quick Snap Photo & Video : 1,000,000+
B612 - Beauty & Filter Camera : 100,000,000+
Waterfall Photo Frames : 1,000,000+
Photo frame : 100,000+
Huji Cam : 5,000,000+
Unicorn Photo : 1,000,000+
HD Camera : 5,000,000+
Makeup Editor -Beauty Photo Editor & Selfie Camera : 1,000,000+
Makeup Photo Editor: Makeup Camera & Makeup Editor : 1,000,000+
Moto Photo Editor : 5,000,000+
InstaBeauty -Makeup Selfie Cam : 50,000,000+
Garden Photo Frames - Garden Photo Editor : 500,000+
Photo Frame : 10,000,000+
Selfie Ca

PIP-Camera FN Photo Effect : 1,000+
Photo Editor Collage Maker Pro : 100,000,000+
Free Slideshow Maker & Video Editor : 10,000,000+
Thumbnail Maker : 1,000,000+
PhotoFunia : 10,000,000+


To follow the same procedure as in the previous Play Store categories, we remove the apps with more than 100m installs from consideration.

In [34]:
def new_average(dataset, index_installs, index_genre, genre_string, n):
    under_n = []
    
    for app in dataset:
        installs = app[index_installs]
        installs = installs.replace(',', '')
        installs = installs.replace('+', '')
        if (app[index_genre] == genre_string) and (float(installs) < n):
            under_n.append(float(installs))
    
    print(sum(under_n) / len(under_n))

new_average(the_playstore, 5, 1, 'PHOTOGRAPHY', 100000000)

7670532.29338843


The new average of 7670532 is only around 2.3 times lower than the original value. This number is still relatively high, which suggests there is demand in the market. Thus, our chances of creating a successful app in this category are higher than in the other categories.

In [35]:
print('---Productivity---')
for app in the_playstore:
    if app[1] == 'PRODUCTIVITY':
        print(app[0], ':', app[5])

---Productivity---
Microsoft Word : 500,000,000+
All-In-One Toolbox: Cleaner, Booster, App Manager : 10,000,000+
AVG Cleaner – Speed, Battery & Memory Booster : 10,000,000+
QR Scanner & Barcode Scanner 2018 : 10,000,000+
Chrome Beta : 10,000,000+
Microsoft Outlook : 100,000,000+
Google PDF Viewer : 10,000,000+
My Claro Peru : 5,000,000+
Power Booster - Junk Cleaner & CPU Cooler & Boost : 1,000,000+
Google Assistant : 10,000,000+
Microsoft OneDrive : 100,000,000+
Calculator - unit converter : 50,000,000+
Microsoft OneNote : 100,000,000+
Metro name iD : 10,000,000+
Google Keep : 100,000,000+
Archos File Manager : 5,000,000+
ES File Explorer File Manager : 100,000,000+
ASUS SuperNote : 10,000,000+
HTC File Manager : 10,000,000+
MyMTN : 1,000,000+
Dropbox : 500,000,000+
ASUS Quick Memo : 10,000,000+
HTC Calendar : 10,000,000+
Google Docs : 100,000,000+
ASUS Calling Screen : 10,000,000+
lifebox : 5,000,000+
Yandex.Disk : 5,000,000+
Content Transfer : 5,000,000+
HTC Mail : 10,000,000+
Advanc

EY Events Switzerland : 500+
EZ Inspections : 10,000+
EZ Display : 50,000+
EZ File Manager(Root Explorer) : 100,000+
EZ Notes Lite : 1,000+
EZ Switch Widget : 100,000+
MultiMessage for FB Messenger : 50,000+
IDM for Facebook ★ Downloader : 100,000+
Chelsea FC Official Keyboard : 1,000,000+
Avast Wi-Fi Finder : 1,000,000+
Safe Wi-Fi : 1,000+
Wi-Fi Networks : 10,000+
Fast Notes FJ : 500+
FK Events : 5+
Charlotte County, FL : 1,000+
FL House : 1,000+
Florida Cooling Supply HVAC : 500+
Fort Myers FL : 100+
MY GULFPORT FL : 50+
Future Cloud : 100,000+
FQ METER : 1,000+
PLMGSS FR : 10+


Below, we remove the apps with more than 100m installs from consideration to see how the new average compares to those of the other categories.

In [36]:
def new_average(dataset, index_installs, index_genre, genre_string, n):
    under_n = []
    
    for app in dataset:
        installs = app[index_installs]
        installs = installs.replace(',', '')
        installs = installs.replace('+', '')
        if (app[index_genre] == genre_string) and (float(installs) < n):
            under_n.append(float(installs))
    
    print(sum(under_n) / len(under_n))

new_average(the_playstore, 5, 1, 'PRODUCTIVITY', 100000000)

3379657.318885449


The productivity category has a relatively low new average compared to the other categories. This category doesn't appear to be in high demand.

## Summary of findings in The Play Store  Category

The Play Store categories with the most installs are communication, video players, social, photography, and productivity. We analyzed the apps in each category by removing the top apps from consideration and calculating a new average. Here are the new averages for The Play Store data:

<br>**Play Store: Category New Average**
<br>Photography : 7670532
<br>Video Players : 5544878
<br>Communication : 3603485
<br>Productivity : 3379657
<br>Social : 3084582

As we can see, photography has the highest new average value. If we remove apps with 100m+ installs from our analysis, it appears that photography has the strongest demand compared to the other categories. There is a big chance that we can create a successful app in this category. However, none of the other top 5 categories in The Play Store give a positive outlook.
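The "times lower" comparisons made for each category can be checked with a short calculation. The sketch below divides the original average by the new average, using the figures reported in this notebook. Productivity is left out because its original average is not stated in this section. The dictionary names are hypothetical.

```python
# Original average installs per category, as reported in the text
original = {
    'COMMUNICATION': 38456119,
    'VIDEO_PLAYERS': 24727872,
    'SOCIAL': 23253652,
    'PHOTOGRAPHY': 17840110,
}

# New averages after removing apps with 100m+ installs (cell outputs, rounded)
trimmed = {
    'COMMUNICATION': 3603485.39,
    'VIDEO_PLAYERS': 5544878.13,
    'SOCIAL': 3084582.52,
    'PHOTOGRAPHY': 7670532.29,
}

# How many times smaller the new average is than the original
ratios = {category: original[category] / trimmed[category] for category in original}

for category, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f'{category}: {ratio:.1f}x lower')
```

A smaller ratio means the category average depends less on its most-installed apps, which is why photography stands out.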

Since photography is our recommended category from The Play Store data, we want to analyze the Photo & Video genre in The App Store dataset to look at its number of reviews. In addition, our recommended genres from The App Store data were weather and reference. We want to know how they perform in The Play Store dataset, so we are also going to analyze them using the same procedure as before.

**In the following, we will analyze Photo & Video from The App Store dataset, followed by the reference and weather categories from The Play Store dataset.**

Photo & Video has 28442 average reviews in The App Store dataset

In [37]:
print('---Photo & Video---')  
for app in the_appstore:
    if app[-5] == 'Photo & Video':
        print(app[1], ':', app[5]) # print name and number of ratings

---Photo & Video---
Instagram : 2161558
Snapchat : 323905
YouTube - Watch Videos, Music, and Live Streams : 278166
Pic Collage - Picture Editor & Photo Collage Maker : 123433
Funimate video editor: add cool effects to videos : 123268
musical.ly - your video social network : 105429
Photo Collage Maker & Photo Editor - Live Collage : 93781
Vine Camera : 90355
Google Photos - unlimited photo and video storage : 88742
Flipagram : 79905
Mixgram - Picture Collage Maker - Pic Photo Editor : 54282
Shutterfly: Prints, Photo Books, Cards Made Easy : 51427
Pic Jointer – Photo Collage, Camera Effects Editor : 51330
Color Pop Effects - Photo Editor & Picture Editing : 45320
Photo Grid - photo collage maker & photo editor : 40531
iSwap Faces LITE : 39722
MOLDIV - Photo Editor, Collage & Beauty Camera : 39501
Photo Editor by Aviary : 39501
Photo Lab: Picture Editor, effects & fun face app : 34585
Rookie Cam - Photo Editor & Filter Camera : 33921
FotoRus -Camera & Photo Editor & Pic Collage Maker : 32

The top three apps are made by tech giants, so we remove them from the analysis and calculate the new average.

In [38]:
def new_average(dataset, index_installs, index_genre, genre_string, n):
    under_n = []
    
    for app in dataset:
        installs = app[index_installs]
        installs = installs.replace(',', '')
        installs = installs.replace('+', '')
        if (app[index_genre] == genre_string) and (float(installs) < n):
            under_n.append(float(installs))
    
    print(sum(under_n) / len(under_n))

new_average(the_appstore, 5, -5, 'Photo & Video', 150000)

11382.28025477707


The new average is roughly 2.5 times lower than the original value. There is decent potential in this genre.

Books and Reference has 8767811.89 average installs in The Play Store dataset.

In [39]:
print('---Books And Reference---')
for app in the_playstore:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

---Books And Reference---
E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000

In [40]:
def new_average(dataset, index_installs, index_genre, genre_string, n):
    under_n = []
    
    for app in dataset:
        installs = app[index_installs]
        installs = installs.replace(',', '')
        installs = installs.replace('+', '')
        if (app[index_genre] == genre_string) and (float(installs) < n):
            under_n.append(float(installs))
    
    print(sum(under_n) / len(under_n))

new_average(the_playstore, 5, 1, 'BOOKS_AND_REFERENCE', 100000000)

1437212.2162162163


The new average is not in the medium-to-high range, but it is still a decent value considering that reference is an interesting niche in The App Store.

Weather has 5074486 average installs in The Play Store dataset.

In [41]:
print('---Weather---')
for app in the_playstore:
    if app[1] == 'WEATHER':
        print(app[0], ':', app[5])

---Weather---
The Weather Channel: Rain Forecast & Storm Alerts : 50,000,000+
Weather forecast : 1,000,000+
AccuWeather: Daily Forecast & Live Weather Reports : 50,000,000+
Live Weather Pro : 10,000+
Weather by WeatherBug: Forecast, Radar & Alerts : 10,000,000+
weather - weather forecast : 1,000,000+
MyRadar NOAA Weather Radar : 10,000,000+
SMHI Weather : 1,000,000+
Free live weather on screen : 1,000,000+
Weather Radar Widget : 1,000,000+
Weather –Simple weather forecast : 10,000,000+
Weather Crave : 5,000,000+
Klara weather : 500,000+
Yahoo Weather : 10,000,000+
Real time Weather Forecast : 1,000,000+
METEO FRANCE : 5,000,000+
APE Weather ( Live Forecast) : 5,000,000+
Live Weather & Daily Local Weather Forecast : 1,000,000+
Weather : 10,000,000+
Rainfall radar - weather : 5,000,000+
Yahoo! Weather for SH Forecast for understanding the approach of rain clouds Free : 1,000,000+
The Weather Network : 5,000,000+
Klart.se - Sweden's best weather : 1,000,000+
GO Weather - Widget, Theme, Wa

Interestingly, the weather category in The Play Store doesn't have any app with 100m+ installs, so no apps dominate this category.

In [42]:
print('---Weather---')
for app in the_playstore:
    if app[1] == 'WEATHER' and (app[5] == '1,000,000,000+'
                                            or app[5] == '500,000,000+'
                                            or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

---Weather---
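The same check can be written with a set of labels instead of chained `or` comparisons. The sketch below is a minimal version of that idea; `DOMINANT_LABELS`, `dominant_apps` and the sample rows are hypothetical, and the column positions (name at index 0, category at index 1, installs at index 5) are assumed to match the ones used in the cells above.

```python
# Install labels that mark an app as dominant (100m+ installs)
DOMINANT_LABELS = {'1,000,000,000+', '500,000,000+', '100,000,000+'}


def dominant_apps(rows, category, index_category=1, index_installs=5):
    """Return the names of apps in category whose install label is 100m+."""
    return [
        row[0]
        for row in rows
        if row[index_category] == category and row[index_installs] in DOMINANT_LABELS
    ]


# Hypothetical rows laid out like the Play Store data; unused columns left empty
sample_rows = [
    ['App A', 'WEATHER', '', '', '', '50,000,000+'],
    ['App B', 'SOCIAL', '', '', '', '1,000,000,000+'],
]
print(dominant_apps(sample_rows, 'WEATHER'))
print(dominant_apps(sample_rows, 'SOCIAL'))
```

An empty list for a category corresponds to the empty output seen above for WEATHER.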


**New Average:**

**-------------App Store-------------**
<br>Weather: 29214
<br>Photo & Video: 11382
<br>Reference: 10187 


**-------------Play Store-------------**
<br>Photography: 7670532
<br>Weather: 5074486
<br>Books & References: 1437212
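To read the placements off the two lists above without doing it by eye, the averages can be sorted and numbered. The sketch below does that with the values listed above; the `rank` helper and the dictionary names are hypothetical.

```python
# New averages copied from the two lists above
app_store = {'Weather': 29214, 'Photo & Video': 11382, 'Reference': 10187}
play_store = {'Photography': 7670532, 'Weather': 5074486, 'Books & References': 1437212}


def rank(averages):
    """Map each name to its position (1 = highest average)."""
    ordered = sorted(averages, key=averages.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


print('App Store :', rank(app_store))
print('Play Store:', rank(play_store))
```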




## Final Summary
When we remove the top apps that skew the average in each genre/category, weather has the highest new average in The App Store dataset and the second-highest in The Play Store dataset. The photo & video genre placed second in The App Store dataset, and photography placed first in The Play Store dataset. Finally, the reference genre placed third in both The App Store and The Play Store datasets.
<br><br>
We recommend creating an app in the weather, reference, and photography categories/genres. One idea is to build a single app that spans all three. The reference genre in The App Store is a particularly interesting niche, even though reference (Books & References in The Play Store) placed third in both datasets. One app in that genre, Muslim Pro, can serve as inspiration: we may be able to create a similar app that also incorporates some weather and photography features.
