## Goal
Our aim is to help our developers understand what type of apps are likely to attract more users on Google Play and the App Store.<br>
**Role** : Data Analyst<br>
**Company**: Mobile Company<br>
**Description**: Builds Android and iOS mobile apps and makes them available on Google Play and the App Store.
The apps are free to download and install. The company's main source of revenue consists of in-app ads.<br>
The more users who see and engage with the ads, the better.
**Goal** : Analyze data to help developers understand what type of apps are likely to attract more users on Google Play and the App Store.

## Opening and Exploring the data

In [180]:
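from csv import reader

# The csv module documentation recommends opening files with newline=''
with open('Data/googleplaystore.csv', encoding='utf8', newline='') as f:
    android = list(reader(f))
android_header = android[0]  # header row
android = android[1:]        # dataset without the header row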

In [181]:
with open('Data/AppleStore.csv', encoding='utf8', newline='') as f:
    apple = list(reader(f))
apple_header = apple[0]  # header row
apple = apple[1:]        # dataset without the header row

#### Preview data

In [182]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    # take the slice [start:end] of the dataset

    # Print each row of the slice, followed by blank lines for readability
    for row in dataset_slice:
        print(row)
        print('\n')

    # Print the number of rows and columns if rows_and_columns is True
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

The function takes four parameters: _dataset_, _start_, _end_ and _rows_and_columns_.<br>
**dataset**: a list of lists, one inner list per row<br>
**start** & **end**: integers giving the starting and ending indices of the slice to print (as with any Python slice, the row at index _end_ is not included)<br>
**rows_and_columns**: a Boolean, `False` by default; when `True`, the function also prints the number of rows and columns of the whole dataset

In [183]:
#Previewing the android data header row
print(android_header)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


In [184]:
#Previewing the android data
explore_data(android, 1, 4, rows_and_columns=True)

['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 10841
Number of columns: 13


In [185]:
#Previewing the ios data header row
print(apple_header)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


In [186]:
#Previewing the ios data
explore_data(apple, 1, 4, rows_and_columns=True)

['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


Number of rows: 7197
Number of columns: 16


In [187]:
print(apple[1])

['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


## Data Cleaning

In the previous step, we opened the two data sets and explored the data.<br>
Before beginning our analysis, we need to make sure the data we analyze is accurate, or the results of our analysis will be wrong.<br>
This means that we need to do the following:<br><br>

- Detect inaccurate data, and correct or remove it.
- Detect duplicate data, and remove the duplicates.<br><br>

Recall that at our company, we only build apps that are free to download and install, and we design them for an English-speaking audience.<br>
This means that we'll need to do the following:<br><br>

- Remove non-English apps like 爱奇艺PPS -《欢乐颂2》电视剧热播.<br>
- Remove apps that aren't free.<br>
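To act on "detect inaccurate data", one simple check is to flag rows whose number of fields differs from the header, since a single missing field shifts every later column. A value such as `1.9` turning up in the Category column later on could be a symptom of that kind of shift. This is a minimal sketch: `find_malformed_rows` is a hypothetical helper, and the sample rows are made up for illustration.

```python
def find_malformed_rows(header, rows):
    """Return the indices of rows whose length differs from the header's."""
    expected = len(header)
    return [i for i, row in enumerate(rows) if len(row) != expected]


# Made-up example: the second row is missing its 'Category' field
header = ['App', 'Category', 'Rating']
rows = [
    ['Some App', 'TOOLS', '4.1'],
    ['Broken App', '4.5'],
    ['Other App', 'GAME', '3.9'],
]
print(find_malformed_rows(header, rows))  # [1]
```

Called as `find_malformed_rows(android_header, android)`, it would list candidate rows to inspect, and then correct or delete, before removing duplicates.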

#### Removing duplicates

In [188]:
#Function to check for duplicates
def duplicate_apps(datastore, index): #datastore: android/apple store
    duplicates = []   #empty list to store duplicated apps
    unique_apps = []  #empty list to store the unique (first-seen) app names
    for app in datastore:
        name = app[index]
        if name in unique_apps:
            duplicates.append(name)
        else:
            unique_apps.append(name)
    return unique_apps, duplicates
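`name in unique_apps` scans a list, so the loop does roughly quadratic work as the list grows. With about 10,800 Android rows it still finishes, but a set makes each lookup constant time on average. A minimal sketch (the name `duplicate_apps_fast` is hypothetical) that returns the same two lists:

```python
def duplicate_apps_fast(datastore, index):
    """Same result as duplicate_apps, but uses a set for membership tests."""
    seen = set()        # names already encountered (O(1) average lookups)
    unique_apps = []    # first occurrence of each name, in original order
    duplicates = []     # every later occurrence of an already-seen name
    for app in datastore:
        name = app[index]
        if name in seen:
            duplicates.append(name)
        else:
            seen.add(name)
            unique_apps.append(name)
    return unique_apps, duplicates


rows = [['Box'], ['Zenefits'], ['Box'], ['Box']]
print(duplicate_apps_fast(rows, 0))  # (['Box', 'Zenefits'], ['Box', 'Box'])
```

The set is only used for lookups; the two lists keep the output order identical to `duplicate_apps`.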

In [189]:
# Function call
android_apps, android_duplicates = duplicate_apps(android, 0)
apple_apps, apple_duplicates = duplicate_apps(apple,1)

In [190]:
#print first 10 unique android apps
android_apps[:10]

['Photo Editor & Candy Camera & Grid & ScrapBook',
 'Coloring book moana',
 'U Launcher Lite – FREE Live Cool Themes, Hide Apps',
 'Sketch - Draw & Paint',
 'Pixel Draw - Number Art Coloring Book',
 'Paper flowers instructions',
 'Smoke Effect Photo Maker - Smoke Editor',
 'Infinite Painter',
 'Garden Coloring Book',
 'Kids Paint Free - Drawing Fun']

In [191]:
#print first 7 duplicate android apps
android_duplicates[:7]

['Quick PDF Scanner + OCR FREE',
 'Box',
 'Google My Business',
 'ZOOM Cloud Meetings',
 'join.me - Simple Meetings',
 'Box',
 'Zenefits']

In [192]:
#print first 5 unique ios apps
apple_apps[:5]

['Facebook',
 'Instagram',
 'Clash of Clans',
 'Temple Run',
 'Pandora - Music & Radio']

In [193]:
#duplicate ios apps
apple_duplicates

['Mannequin Challenge', 'VR Roller Coaster']

##### Summary

In [194]:
print('Number of unique android apps:',len(android_apps))
print('Number of duplicate android apps:',len(android_duplicates))
print('\n')
print('Number of unique ios apps:',len(apple_apps))
print('Number of duplicate ios apps:',len(apple_duplicates))

Number of unique android apps: 9660
Number of duplicate android apps: 1181


Number of unique ios apps: 7195
Number of duplicate ios apps: 2


In [195]:
#Percentage of unique apps
unique_android_percentage =(len(android_apps)/len(android))*100
print('Percentage of unique android apps: ',unique_android_percentage)

Percentage of unique android apps:  89.10617101743381


In [196]:
unique_apple_percentage =(len(apple_apps)/len(apple))*100
print('Percentage of unique ios apps: ',unique_apple_percentage)

Percentage of unique ios apps:  99.9722106433236


### Investigating the duplicate rows

In [197]:
#Considering an app with duplicates
for app in android:
    if app[0] == 'ZOOM Cloud Meetings':
        print (app)

['ZOOM Cloud Meetings', 'BUSINESS', '4.4', '31614', '37M', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 20, 2018', '4.1.28165.0716', '4.0 and up']
['ZOOM Cloud Meetings', 'BUSINESS', '4.4', '31614', '37M', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 20, 2018', '4.1.28165.0716', '4.0 and up']


In [198]:
for app in android:
    if app[0] == 'Instagram':
        print (app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


##### Observation
In the case of Instagram, for example, the duplicate rows differ only in the number of reviews (column index 3).<br>
We can use this to build a criterion for removing the duplicates: the higher the number of reviews, the more recent the data should be.<br>
Rather than removing duplicates randomly, we'll keep only the row with the highest number of reviews for each app and remove the other entries.<br><br>

To do that, we will:<br>

- Create a dictionary where each key is a unique app name and the value is the highest number of reviews for that app.
- Use the dictionary to create a new data set with only one entry per app (the entry with the highest number of reviews).
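The two steps above can also be folded into a single pass that remembers the best row seen so far for each name. This is a minimal sketch with a hypothetical function name; because dictionaries keep insertion order (Python 3.7+), rows come out in the order each name first appears, which can differ slightly from the order produced by the two-step approach.

```python
def keep_most_reviewed(datastore, name_index, review_index):
    """Return one row per app name: the first row with the highest review count."""
    best = {}  # name -> (review_count, row)
    for row in datastore:
        name = row[name_index]
        n_reviews = float(row[review_index])
        # Strict '>' keeps the earliest row when several rows tie for the maximum
        if name not in best or n_reviews > best[name][0]:
            best[name] = (n_reviews, row)
    return [row for _, row in best.values()]


rows = [
    ['Instagram', '66577313'],
    ['Instagram', '66577446'],
    ['Instagram', '66509917'],
    ['Box', '12'],
]
print(keep_most_reviewed(rows, 0, 1))  # [['Instagram', '66577446'], ['Box', '12']]
```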

In [199]:
#Investigating ios app with duplicates
for app in apple:
    if app[1] == 'VR Roller Coaster':
        print(app)

['952877179', 'VR Roller Coaster', '169523200', 'USD', '0.0', '107', '102', '3.5', '3.5', '2.0.0', '4+', 'Games', '37', '5', '1', '1']
['1089824278', 'VR Roller Coaster', '240964608', 'USD', '0.0', '67', '44', '3.5', '4.0', '0.81', '4+', 'Games', '38', '0', '1', '1']


In [200]:
for app in apple:
    if app[1] == 'Mannequin Challenge':
        print(app)

['1173990889', 'Mannequin Challenge', '109705216', 'USD', '0.0', '668', '87', '3.0', '3.0', '1.4', '9+', 'Games', '37', '4', '1', '1']
['1178454060', 'Mannequin Challenge', '59572224', 'USD', '0.0', '105', '58', '4.0', '4.5', '1.0.1', '4+', 'Games', '38', '5', '1', '1']



##### Observation
For the iOS data, only the app name is duplicated: the ids, sizes, rating counts and versions all differ, so these rows look like distinct apps that share a name rather than repeated records.

In [202]:
# Build a dictionary that maps each app name to its highest number of reviews

def max_review(datastore, app_index, review_index):
    review_max = {}  # app name -> highest review value seen so far
    for app in datastore:
        name = app[app_index]
        n_reviews = float(app[review_index])

        # Add new names, and update existing ones when a higher value appears
        if name not in review_max or review_max[name] < n_reviews:
            review_max[name] = n_reviews
    return review_max

In [203]:
# Function call
android_review_max = max_review(android, 0, 3)
apple_review_max = max_review(apple, 1, 7)  # index 7 is user_rating; the total number of ratings is rating_count_tot (index 5)

In [204]:
# Check length to make sure all duplicates are removed
print('Expected length:', len(android) - len(android_duplicates))
print('Actual length:', len(android_review_max))

Expected length: 9660
Actual length: 9660


In [205]:
# Check length to make sure all duplicates are removed
print('Expected length:', len(apple) - len(apple_duplicates))
print('Actual length:', len(apple_review_max))


Expected length: 7195
Actual length: 7195


In [206]:
# Use review_max (app name -> highest number of reviews) to keep one row per app: the first row whose review count equals that maximum
def duplicatefree(review_max, store, name_index, review_index):
    clean = []
    added = []
    for app in store:
        name = app[name_index]
        n_reviews = float(app[review_index])
        if (name not in added) and (n_reviews == review_max[name]):
            clean.append(app)
            added.append(name)
    return clean

clean_android = duplicatefree(android_review_max, android, 0, 3)
clean_apple = duplicatefree(apple_review_max, apple, 1, 7)

#### Removing non-english apps

In [207]:
def eng_check(string):
    # Count the characters outside the ASCII range (code point above 127)
    count = 0
    for char in string:
        if ord(char) > 127:
            count += 1
            if count >= 3:
                return False  # three or more non-ASCII characters: treat as non-English
    return True

The function checks each character with `ord()`: characters with a code point of 127 or less belong to the ASCII range, which covers the common English characters. Because English app names often contain emoji or symbols such as ™, a single non-ASCII character is not enough to mark a name as non-English. To minimize data loss, we'll only remove an app if its name contains three or more characters outside the ASCII range.
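To see why a threshold of three separates the test names used in the next cell, it helps to count the non-ASCII characters in each one. `non_ascii_count` is an illustrative helper, not part of the original notebook.

```python
def non_ascii_count(string):
    """Count characters whose code point is above 127 (outside ASCII)."""
    return sum(1 for char in string if ord(char) > 127)


names = ['Instagram', 'Docs To Go™ Free Office Suite', 'Instachat 😜',
         '爱奇艺PPS -《欢乐颂2》电视剧热播']
for name in names:
    print(name, '->', non_ascii_count(name))
# Instagram -> 0
# Docs To Go™ Free Office Suite -> 1
# Instachat 😜 -> 1
# 爱奇艺PPS -《欢乐颂2》电视剧热播 -> 13
```

The three English names stay below the threshold, while the Chinese title is far above it.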

In [208]:
#Function call
print(eng_check('Instagram'))
print(eng_check('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(eng_check('Docs To Go™ Free Office Suite'))
print(eng_check('Instachat 😜'))

True
False
True
True


In [209]:
#Function to iterate over our dataset and keep only the English apps
def clean_lang(datastore, name_index):
    clean = []            #list of english apps
    for app in datastore:
        if eng_check(app[name_index]):
            clean.append(app)
    return clean

android_eng = clean_lang(clean_android, 0)
apple_eng = clean_lang(clean_apple, 1)

In [210]:
# Check new length
print(len(android_eng))
print(len(apple_eng))

9598
6153


#### Removing non-free apps

In [211]:
#Isolating the free apps in both App stores
android_free = []
apple_free = []

for app in android_eng:
    price = app[7]
    if price == '0':
        android_free.append(app)

for app in apple_eng:
    price = app[4]
    if price == '0.0':
        apple_free.append(app)
        
print(len(android_free))
print(len(apple_free))

8849
3201
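The two stores encode a zero price differently (`'0'` on Google Play, `'0.0'` on the App Store), which is why the loops above compare against different strings. A sketch that compares numerically instead, assuming paid Android prices carry a leading `$` (an assumption worth checking against the data); `is_free` is a hypothetical helper.

```python
def is_free(price):
    """Return True if a price string represents zero, e.g. '0', '0.0' or '$0.00'."""
    # Assumption: prices are plain numbers, optionally prefixed with '$'
    return float(price.strip().lstrip('$')) == 0.0


print(is_free('0'), is_free('0.0'), is_free('$4.99'))  # True True False
```

A malformed value would raise `ValueError`, which surfaces a bad row instead of silently dropping it.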


## Data Analysis

As we mentioned in the introduction, our aim is to determine the kinds of apps that are likely to attract more users because our revenue is highly influenced by the number of people using our apps.<br>
To minimize risks and overhead, our validation strategy for an app idea has three steps:<br><br>

1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we develop it further.
3. If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.<br><br>

Because our end goal is to add the app on both Google Play and the App Store, we need to find app profiles that are successful in both markets.<br>
For instance, a profile that works well for both markets might be a productivity app that makes use of gamification.

### Most common apps by Genre

Let's begin the analysis by determining the most common genres for each market. For this, we'll need to build frequency tables for a few columns in our datasets.

In [212]:
def genre_freq(datastore, genre_index):
    frequency = {}
    for app in datastore:
        genre = app[genre_index]
        
        if genre in frequency:
            frequency[genre] += 1
        else:
            frequency[genre] = 1
    for value in frequency:
        frequency[value] = round((frequency[value]/len(datastore))*100, 2) ## Changes value from count to percentage
    return frequency
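The same percentage table can be built with `collections.Counter`, whose `most_common()` method already returns values sorted by count. A minimal sketch (`freq_table_sorted` is a hypothetical name). Ties are listed in first-seen order here, whereas sorting `(percentage, key)` tuples in reverse breaks ties by reverse alphabetical order.

```python
from collections import Counter


def freq_table_sorted(datastore, index):
    """Return (value, percentage) pairs sorted from most to least common."""
    counts = Counter(row[index] for row in datastore)
    total = len(datastore)
    return [(value, round(count / total * 100, 2))
            for value, count in counts.most_common()]


rows = [['a', 'GAME'], ['b', 'TOOLS'], ['c', 'GAME'], ['d', 'GAME']]
print(freq_table_sorted(rows, 1))  # [('GAME', 75.0), ('TOOLS', 25.0)]
```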


In [214]:
# Turn the frequency table into (percentage, key) tuples, sort them in descending order and print them
def display_table(dataset, index):
    table = genre_freq(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

In [213]:
## Google Play 'Genres'

display_table(android_free, 9)

Tools : 8.44
Entertainment : 6.08
Education : 5.36
Business : 4.6
Productivity : 3.9
Lifestyle : 3.88
Finance : 3.71
Medical : 3.54
Sports : 3.46
Personalization : 3.32
Communication : 3.23
Action : 3.1
Health & Fitness : 3.09
Photography : 2.95
News & Magazines : 2.8
Social : 2.67
Travel & Local : 2.33
Shopping : 2.25
Books & Reference : 2.14
Simulation : 2.05
Dating : 1.86
Arcade : 1.84
Video Players & Editors : 1.77
Casual : 1.76
Maps & Navigation : 1.39
Food & Drink : 1.24
Puzzle : 1.13
Racing : 0.99
Role Playing : 0.94
Libraries & Demo : 0.94
Auto & Vehicles : 0.93
Strategy : 0.92
House & Home : 0.8
Weather : 0.79
Events : 0.71
Adventure : 0.67
Comics : 0.6
Beauty : 0.6
Art & Design : 0.6
Parenting : 0.5
Card : 0.45
Trivia : 0.42
Casino : 0.42
Educational;Education : 0.4
Board : 0.38
Educational : 0.37
Education;Education : 0.34
Word : 0.26
Casual;Pretend Play : 0.24
Music : 0.2
Racing;Action & Adventure : 0.17
Puzzle;Brain Games : 0.17
Entertainment;Music & Video : 0.17
Casual;Br

In [215]:
## Google Play 'Category'

display_table(android_free, 1)

FAMILY : 18.94
GAME : 9.7
TOOLS : 8.45
BUSINESS : 4.6
PRODUCTIVITY : 3.9
LIFESTYLE : 3.89
FINANCE : 3.71
MEDICAL : 3.54
SPORTS : 3.39
PERSONALIZATION : 3.32
COMMUNICATION : 3.23
HEALTH_AND_FITNESS : 3.09
PHOTOGRAPHY : 2.95
NEWS_AND_MAGAZINES : 2.8
SOCIAL : 2.67
TRAVEL_AND_LOCAL : 2.34
SHOPPING : 2.25
BOOKS_AND_REFERENCE : 2.14
DATING : 1.86
VIDEO_PLAYERS : 1.8
MAPS_AND_NAVIGATION : 1.39
FOOD_AND_DRINK : 1.24
EDUCATION : 1.16
ENTERTAINMENT : 0.96
LIBRARIES_AND_DEMO : 0.94
AUTO_AND_VEHICLES : 0.93
HOUSE_AND_HOME : 0.8
WEATHER : 0.79
EVENTS : 0.71
PARENTING : 0.66
ART_AND_DESIGN : 0.64
COMICS : 0.61
BEAUTY : 0.6
1.9 : 0.01


It seems that a good number of apps are designed for practical purposes (family, tools, business, lifestyle, productivity, etc.). However, a closer look shows that the FAMILY category (which accounts for almost 19% of the apps) consists mostly of games for kids.

In [216]:
## Apple App Store 'prime_genre'

display_table(apple_free, 11)

Games : 58.23
Entertainment : 7.84
Photo & Video : 5.0
Education : 3.69
Social Networking : 3.31
Shopping : 2.59
Utilities : 2.47
Sports : 2.16
Music : 2.06
Health & Fitness : 2.03
Productivity : 1.75
Lifestyle : 1.56
News : 1.34
Travel : 1.25
Finance : 1.09
Weather : 0.87
Food & Drink : 0.81
Reference : 0.53
Business : 0.53
Book : 0.37
Navigation : 0.19
Medical : 0.19
Catalogs : 0.12


#### Observation
The landscape is significantly different on the App Store: among the free English apps, more than half (58.23%) are games. Entertainment apps account for close to 8%, followed by photo and video apps at close to 5%. Only 3.69% of the apps are designed for education, followed by social networking apps, which account for 3.31% of the apps in our data set.

The general impression is that the App Store (at least the part containing free English apps) is dominated by apps designed for fun (games, entertainment, photo and video, social networking, sports, music, etc.), while apps with practical purposes (education, shopping, utilities, productivity, lifestyle, etc.) are rarer. However, the fact that fun apps are the most numerous doesn't imply that they also have the greatest number of users: demand might not match supply.<br>

#### Conclusion
The difference between the Genres and the Category columns is not crystal clear, but one thing we can notice is that the Genres column is much more granular (it has more categories). We're only looking for the bigger picture at the moment, so we'll only work with the Category column moving forward.

Up to this point, we found that the App Store is dominated by apps designed for fun, while Google Play shows a more balanced landscape of both practical and for-fun apps. Now we'd like to get an idea about the kind of apps that have most users.

### Popularity by genre
The next thing I'd like to look at is which genres are the most popular. To determine which apps have the most users, I'll use the 'Installs' column from the Android dataset. This information is missing from the iOS dataset, so as a workaround we'll take the total number of user ratings as a proxy, which we can find in the 'rating_count_tot' column. I'll also use 'Rating' from the Android set and 'user_rating' from the iOS set, which both represent the average user rating.

I'll generate frequency tables similar to the ones above, but this time I'll make one table for the average number of ratings and another for the average rating score. Then we'll multiply those values to get a score that represents overall popularity.
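The plan above combines two per-genre averages into one score. A minimal sketch of that multiplication, assuming both averages are already available as dictionaries keyed by genre; the function name and the numbers are hypothetical.

```python
def popularity_scores(avg_counts, avg_ratings):
    """Multiply a genre's average count (installs or number of ratings)
    by its average rating; genres missing from either table are skipped."""
    scores = {}
    for genre, count in avg_counts.items():
        if genre in avg_ratings:
            scores[genre] = count * avg_ratings[genre]
    # Highest score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical numbers, for illustration only
avg_counts = {'Games': 20000.0, 'Music': 50000.0, 'Weather': 40000.0}
avg_ratings = {'Games': 4.0, 'Music': 3.5, 'Weather': 4.5}
print(popularity_scores(avg_counts, avg_ratings))
# [('Weather', 180000.0), ('Music', 175000.0), ('Games', 80000.0)]
```

Note how the rating can reorder genres: Music has the higher average count, but Weather's higher rating puts it first.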

In [217]:
def rating_count(datastore, genre_index, rating_index):
    genres = genre_freq(datastore, genre_index)
    for genre in genres:
        total = 0
        len_genre = 0
        for app in datastore:
            genre_app = app[genre_index]
            if genre_app == genre:
                total += float(app[rating_index])
                len_genre += 1
        avg_ratings = round(total / len_genre, 2)
        print(genre, ':', avg_ratings)
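`rating_count` loops over the whole dataset once for every genre and prints its results instead of returning them. A sketch of a single-pass alternative that returns a dictionary, so the averages can be sorted or combined later (the function name is hypothetical):

```python
def average_by_group(datastore, group_index, value_index):
    """Return {group: average value} in a single pass over the data."""
    totals = {}
    counts = {}
    for row in datastore:
        group = row[group_index]
        totals[group] = totals.get(group, 0.0) + float(row[value_index])
        counts[group] = counts.get(group, 0) + 1
    # Every group in counts has at least one row, so there is no division by zero
    return {group: round(totals[group] / counts[group], 2) for group in totals}


rows = [['A', 'Games', '10'], ['B', 'Music', '4'], ['C', 'Games', '20']]
print(average_by_group(rows, 1, 2))  # {'Games': 15.0, 'Music': 4.0}
```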


#### Most popular genres on Google Play

In [218]:
#Function call
rating_count(android_free, 9, 3) #(Genres, Reviews)

Art & Design : 26119.11
Art & Design;Creativity : 5677.83
Auto & Vehicles : 14140.28
Beauty : 7476.23
Books & Reference : 88460.63
Business : 24239.73
Comics : 44180.17
Comics;Creativity : 516.0
Communication : 999089.61
Dating : 21953.27
Education : 16211.38
Education;Creativity : 6384.25
Education;Education : 234564.87
Education;Pretend Play : 25151.4
Education;Brain Games : 192591.0
Entertainment : 103389.25
Entertainment;Brain Games : 79104.29
Entertainment;Creativity : 143559.33
Entertainment;Music & Video : 79679.53
Events : 2555.84
Finance : 38535.9
Food & Drink : 57478.79
Health & Fitness : 78094.97
House & Home : 27113.31
Libraries & Demo : 10925.81
Lifestyle : 33807.31
Lifestyle;Pretend Play : 140995.0
Card : 166334.98
Arcade : 713174.83
Puzzle : 215662.56
Racing : 597997.16
Sports : 214124.96
Casual : 837706.01
Simulation : 142850.22
Adventure : 285217.73
Trivia : 193939.57
Action : 544150.71
Word : 228272.04
Role Playing : 249256.83
Strategy : 1251841.81
Board : 121552.94
M

#### Most popular genres on the App Store

In [219]:
print('\n')
rating_count(apple_free, 11, 5) #(prime_genre, rating_count_tot)



Social Networking : 71548.35
Photo & Video : 28441.54
Games : 22910.53
Music : 57326.53
Reference : 79350.47
Health & Fitness : 23298.02
Weather : 52279.89
Utilities : 19156.49
Travel : 28243.8
Shopping : 27230.73
News : 21248.02
Navigation : 86090.33
Lifestyle : 16815.48
Entertainment : 14195.36
Food & Drink : 33333.92
Sports : 23008.9
Book : 46384.92
Finance : 32367.03
Education : 7003.98
Productivity : 21028.41
Business : 7491.12
Catalogs : 4004.0
Medical : 612.0


On average, navigation apps have the highest number of user ratings, but this figure is heavily influenced by Waze and Google Maps, which together have close to half a million user ratings:

In [220]:
for app in apple_free:
    if app[-5] == 'Navigation':
        print(app[1], ':', app[5]) # print name and number of ratings

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


Here are the top 5 iOS app categories by average number of user ratings:

1. Navigation
2. Reference
3. Social Networking
4. Music
5. Weather

In [221]:
for app in apple_free:
    if app[-5] == 'Social Networking':
        print(app[1], ':', app[5]) # print name and number of ratings

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 2

The same pattern applies to social networking apps, where the average number is heavily influenced by a few giants like Facebook, Pinterest, Skype, etc. 

In [222]:
for app in apple_free:
    if app[-5] == 'Music':
        print(app[1], ':', app[5]) # print name and number of ratings

Pandora - Music & Radio : 1126879
Spotify Music : 878563
Shazam - Discover music, artists, videos & lyrics : 402925
iHeartRadio – Free Music & Radio Stations : 293228
SoundCloud - Music & Audio : 135744
Magic Piano by Smule : 131695
Smule Sing! : 119316
TuneIn Radio - MLB NBA Audiobooks Podcasts Music : 110420
Amazon Music : 106235
SoundHound Song Search & Music Player : 82602
Sonos Controller : 48905
Bandsintown Concerts : 30845
Karaoke - Sing Karaoke, Unlimited Songs! : 28606
My Mixtapez Music : 26286
Sing Karaoke Songs Unlimited with StarMaker : 26227
Ringtones for iPhone & Ringtone Maker : 25403
Musi - Unlimited Music For YouTube : 25193
AutoRap by Smule : 18202
Spinrilla - Mixtapes For Free : 15053
Napster - Top Music & Radio : 14268
edjing Mix:DJ turntable to remix and scratch music : 13580
Free Music - MP3 Streamer & Playlist Manager Pro : 13443
Free Piano app by Yokee : 13016
Google Play Music : 10118
Certified Mixtapes - Hip Hop Albums & Mixtapes : 9975
TIDAL : 7398
YouTube Mu

The same applies to music apps, where a few big players like Pandora, Spotify, and Shazam heavily influence the average number.

In [223]:
for app in apple_free:
    if app[-5] == 'Reference':
        print(app[1], ':', app[5]) # print name and number of ratings

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
Jishokun-Japanese English Dictionary & Translator : 0


Reference apps have 79,350 user ratings on average, but it's actually the Bible and Dictionary.com that skew the average up.<br>

Our aim is to find popular genres, but navigation, social networking or music apps might seem more popular than they really are. The average number of ratings seems to be skewed by very few apps with hundreds of thousands of user ratings, while the other apps may struggle to get past the 10,000 threshold.
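One way to make this skew visible is to compare the mean with the median, which is much less sensitive to a few very large values. Using the navigation numbers printed above (the mean reproduces the 86,090.33 average from the earlier table):

```python
from statistics import mean, median

# Number of ratings (rating_count_tot) of the free English navigation apps listed above
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

print(round(mean(navigation_ratings), 2))  # 86090.33
print(median(navigation_ratings))          # 8196.5
```

The median is roughly a tenth of the mean, which confirms that Waze and Google Maps pull the average up.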

### Popularity by number of installations

In [224]:
def install_count(datastore, genre_index, install_index):
    genres = genre_freq(datastore, genre_index)
    for genre in genres:
        total = 0
        len_genre = 0
        for app in datastore:
            genre_app = app[genre_index]
            if genre_app == genre:
                installs = app[install_index]
                installs = installs.replace(',', '')
                installs = installs.replace('+', '')
                total += float(installs)
                len_genre += 1
        avg_installs = round(total / len_genre, 2)
        print(genre, ':', avg_installs)

In [225]:
#Function call
install_count(android_free, 1, 5)

ART_AND_DESIGN : 1986335.09
AUTO_AND_VEHICLES : 647317.82
BEAUTY : 513151.89
BOOKS_AND_REFERENCE : 8814199.79
BUSINESS : 1712290.15
COMICS : 832613.89
COMMUNICATION : 38590581.09
DATING : 854028.83
EDUCATION : 1833495.15
ENTERTAINMENT : 11640705.88
EVENTS : 253542.22
FINANCE : 1387692.48
FOOD_AND_DRINK : 1924897.74
HEALTH_AND_FITNESS : 4188821.99
HOUSE_AND_HOME : 1360598.04
LIBRARIES_AND_DEMO : 638503.73
LIFESTYLE : 1446158.22
GAME : 15544014.51
FAMILY : 3695641.82
MEDICAL : 120550.62
SOCIAL : 23253652.13
SHOPPING : 7036877.31
PHOTOGRAPHY : 17840110.4
SPORTS : 3650602.28
TRAVEL_AND_LOCAL : 13984077.71
TOOLS : 10830251.97
PERSONALIZATION : 5201482.61
PRODUCTIVITY : 16787331.34
PARENTING : 542603.62
WEATHER : 5145550.29
VIDEO_PLAYERS : 24727872.45
NEWS_AND_MAGAZINES : 9549178.47
MAPS_AND_NAVIGATION : 4049274.63
1.9 : 1000000.0


Here are the top 5 Android app categories by average number of installs:

1. COMMUNICATION
2. VIDEO_PLAYERS
3. SOCIAL
4. PHOTOGRAPHY
5. PRODUCTIVITY

On average, communication apps have the most installs: 38,590,581.
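Reading a top five off a long printed list by eye is error-prone; sorting the averages makes the ranking reproducible. The sketch below hard-codes a subset of the averages printed above. In the notebook, the dictionary would instead come from a variant of `install_count` that returns its results rather than printing them.

```python
# A subset of the average installs printed above (category -> average installs)
avg_installs = {
    'COMMUNICATION': 38590581.09,
    'VIDEO_PLAYERS': 24727872.45,
    'SOCIAL': 23253652.13,
    'PHOTOGRAPHY': 17840110.4,
    'PRODUCTIVITY': 16787331.34,
    'GAME': 15544014.51,
    'ENTERTAINMENT': 11640705.88,
    'MAPS_AND_NAVIGATION': 4049274.63,
}

# Sort categories by their average, highest first, and keep the first five
top_five = sorted(avg_installs, key=avg_installs.get, reverse=True)[:5]
print(top_five)
# ['COMMUNICATION', 'VIDEO_PLAYERS', 'SOCIAL', 'PHOTOGRAPHY', 'PRODUCTIVITY']
```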

In [233]:
# Display communication apps with at least 100 million installs
for app in android_free:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

In [235]:
for app in android_free:
    if app[1] == 'SOCIAL' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

Facebook : 1,000,000,000+
Facebook Lite : 500,000,000+
Tumblr : 100,000,000+
Pinterest : 100,000,000+
Google+ : 1,000,000,000+
Badoo - Free Chat & Dating App : 100,000,000+
Tango - Live Video Broadcast : 100,000,000+
Instagram : 1,000,000,000+
Snapchat : 500,000,000+
LinkedIn : 100,000,000+
Tik Tok - including musical.ly : 100,000,000+
BIGO LIVE - Live Stream : 100,000,000+
VK : 100,000,000+


In [236]:
for app in android_free:
    if app[1] == 'GAME' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

Sonic Dash : 100,000,000+
PAC-MAN : 100,000,000+
Roll the Ball® - slide puzzle : 100,000,000+
Piano Tiles 2™ : 100,000,000+
Pokémon GO : 100,000,000+
Extreme Car Driving Simulator : 100,000,000+
Trivia Crack : 100,000,000+
Angry Birds 2 : 100,000,000+
Candy Crush Saga : 500,000,000+
8 Ball Pool : 100,000,000+
Subway Surfers : 1,000,000,000+
Candy Crush Soda Saga : 100,000,000+
Clash Royale : 100,000,000+
Clash of Clans : 100,000,000+
Plants vs. Zombies FREE : 100,000,000+
Pou : 500,000,000+
Flow Free : 100,000,000+
My Talking Angela : 100,000,000+
slither.io : 100,000,000+
Cooking Fever : 100,000,000+
Yes day : 100,000,000+
Score! Hero : 100,000,000+
Dream League Soccer 2018 : 100,000,000+
My Talking Tom : 500,000,000+
Sniper 3D Gun Shooter: Free Shooting Games - FPS : 100,000,000+
Zombie Tsunami : 100,000,000+
Helix Jump : 100,000,000+
Crossy Road : 100,000,000+
Temple Run 2 : 500,000,000+
Talking Tom Gold Run : 100,000,000+
Agar.io : 100,000,000+
Bus Rush: Subway Edition : 100,000,00

In [234]:
for app in android_free:
    if app[1] == 'ENTERTAINMENT' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

Hotstar : 100,000,000+
Talking Angela : 100,000,000+
IMDb Movies & TV : 100,000,000+
Talking Ben the Dog : 100,000,000+
Netflix : 100,000,000+


In [237]:
for app in android_free:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000,000+'
                                            or app[5] == '500,000,000+'
                                            or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad 📖 Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+


This niche seems to be dominated by software for processing and reading ebooks, as well as various collections of libraries and dictionaries, so it's probably not a good idea to build similar apps since there'll be some significant competition.
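The five filtering cells above repeat the same three-way comparison on the Installs string. A parameterized sketch follows; the helper names are hypothetical, the column indices default to the Google Play layout used above, and the sample rows are made up. It parses each install bucket into a number so that any threshold can be used.

```python
def parse_installs(installs):
    """Convert an Installs string such as '1,000,000,000+' to a number.
    The '+' means 'at least', so the result is a lower bound."""
    return int(installs.replace(',', '').replace('+', ''))


def top_installed(datastore, category, min_installs=100_000_000,
                  category_index=1, installs_index=5, name_index=0):
    """Return (name, installs string) for apps in `category` whose
    install bucket is at least `min_installs`."""
    return [(app[name_index], app[installs_index])
            for app in datastore
            if app[category_index] == category
            and parse_installs(app[installs_index]) >= min_installs]


# Made-up rows with the Google Play column layout (only indices 0, 1 and 5 matter here)
rows = [
    ['App A', 'BOOKS_AND_REFERENCE', '4.5', '100', '5M', '1,000,000,000+'],
    ['App B', 'BOOKS_AND_REFERENCE', '4.1', '50', '3M', '50,000,000+'],
    ['App C', 'SOCIAL', '4.2', '80', '7M', '500,000,000+'],
]
print(top_installed(rows, 'BOOKS_AND_REFERENCE'))  # [('App A', '1,000,000,000+')]
```

Since 100,000,000+, 500,000,000+ and 1,000,000,000+ are the only buckets at or above the default threshold, `top_installed(android_free, 'SOCIAL')` should list the same apps as the SOCIAL cell above.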

### Analysis of android apps by user rating

In [238]:
def android_ratings(datastore, genre_index, rating_index):
    genres = genre_freq(datastore, genre_index)
    for genre in genres:
        total = 0
        len_genre = 0
        for app in datastore:
            genre_app = app[genre_index]
            if genre_app == genre:
                rating = float(app[rating_index])
                if (rating >= 0) and (rating <= 5): ## Keep only ratings in the valid 0-5 range (NaN values fail this check too)
                    total += rating
                    len_genre += 1
        avg_ratings = round(total / len_genre, 2)
        print(genre, ':', avg_ratings)

android_ratings(android, 1, 2)

ART_AND_DESIGN : 4.36
AUTO_AND_VEHICLES : 4.19
BEAUTY : 4.28
BOOKS_AND_REFERENCE : 4.35
BUSINESS : 4.12
COMICS : 4.16
COMMUNICATION : 4.16
DATING : 3.97
EDUCATION : 4.39
ENTERTAINMENT : 4.13
EVENTS : 4.44
FINANCE : 4.13
FOOD_AND_DRINK : 4.17
HEALTH_AND_FITNESS : 4.28
HOUSE_AND_HOME : 4.2
LIBRARIES_AND_DEMO : 4.18
LIFESTYLE : 4.09
GAME : 4.29
FAMILY : 4.19
MEDICAL : 4.19
SOCIAL : 4.26
SHOPPING : 4.26
PHOTOGRAPHY : 4.19
SPORTS : 4.22
TRAVEL_AND_LOCAL : 4.11
TOOLS : 4.05
PERSONALIZATION : 4.34
PRODUCTIVITY : 4.21
PARENTING : 4.3
WEATHER : 4.24
VIDEO_PLAYERS : 4.06
NEWS_AND_MAGAZINES : 4.13
MAPS_AND_NAVIGATION : 4.05


ZeroDivisionError: division by zero
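The cell above stops with a `ZeroDivisionError`: when every rating in a category is missing (`NaN`) or outside the 0-5 range, `len_genre` stays 0. Judging by where the output stops, the culprit is most likely the malformed `1.9` category seen earlier. (Note also that this call uses the full `android` list rather than `android_free`.) The following guarded sketch returns a dictionary and marks such categories with `None`; the function name is hypothetical and the sample rows are made up.

```python
def average_ratings(datastore, genre_index, rating_index):
    """Return {category: average rating}, or None for a category with no
    rating in the valid 0-5 range (instead of dividing by zero)."""
    totals = {}
    counts = {}
    for app in datastore:
        genre = app[genre_index]
        totals.setdefault(genre, 0.0)
        counts.setdefault(genre, 0)
        rating = float(app[rating_index])   # 'NaN' parses to float('nan')
        if 0 <= rating <= 5:                # NaN fails both comparisons
            totals[genre] += rating
            counts[genre] += 1
    return {genre: round(totals[genre] / counts[genre], 2) if counts[genre] else None
            for genre in totals}


rows = [['A', 'TOOLS', '4.0'], ['B', 'TOOLS', 'NaN'], ['C', 'TOOLS', '5.0'],
        ['D', '1.9', '19']]
print(average_ratings(rows, 1, 2))  # {'TOOLS': 4.5, '1.9': None}
```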

Top 5 Android categories with the highest popularity scores (average installs multiplied by average rating):

1. COMMUNICATION
2. VIDEO_PLAYERS
3. SOCIAL
4. PHOTOGRAPHY
5. PRODUCTIVITY

## Conclusion
**iOS Popularity Scores**: Navigation, reference, and social networking are the top three categories by average number of user ratings on iOS. This suggests that users are looking for apps that provide easy navigation, helpful reference tools, and social connectivity, although, as shown above, these averages are heavily skewed by a few giants such as Waze, Google Maps, the Bible, and Facebook.<br><br>
**Android Popularity Scores**: Communication, video players, and social are the top three categories with the highest popularity scores on Android. This suggests that Android users are also interested in apps that provide social connectivity, as well as video content and communication tools.<br><br>

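**Differences between iOS and Android**: Navigation ranks first on iOS, while communication ranks first on Android, where MAPS_AND_NAVIGATION sits only mid-table by average installs. The two rankings rely on different proxies (number of user ratings on iOS, number of installs on Android), so the comparison is indicative rather than exact. Companies that build mobile apps should consider these differences when developing and promoting their apps.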