# Profitable App Profiles for the App Store and Google Play Markets

In this project, we analyze data sets from the Android and iOS app stores. Our goal is to understand which types of free apps are likely to attract the most users. Let's pretend we work for a company that builds Android and iOS mobile apps that are free to download and install. Our main source of revenue is in-app advertising, so our revenue depends directly on how many people use each app.

My goal is to help our developers understand what type of apps are likely to attract the most users. In order to do this, we have outlined a 3-step strategy:

1. Build a minimal Android version of the app, and add it to Google Play.
2. If there is enough positive response from users, we develop it further.
3. If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

As of 2018, there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play.

In [1]:
from csv import reader

In [2]:
# Open the App Store CSV file (explicit encoding avoids platform-dependent decoding errors).
opened_file = open('AppleStore.csv', encoding='utf8')
read_file = reader(opened_file)
ios = list(read_file)
opened_file.close()
ios_header = ios[0]   # first row holds the column names
ios_data = ios[1:]

# Open the Google Play CSV file.
opened_file = open('googleplaystore.csv', encoding='utf8')
read_file = reader(opened_file)
android = list(read_file)
opened_file.close()
android_header = android[0]
android_data = android[1:]


# Explore the Data

The explore_data function prints a slice of a data set with a blank line after each row for readability. It can also report the number of rows and columns in the data set.

In [3]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [4]:
print(android_header)
print('\n')
explore_data(android_data, 0, 4, True)  # prints directly and returns None, so no outer print()

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 10841
Number of columns: 13


In [5]:
print(ios_header)
print('\n')
explore_data(ios_data, 0, 4, True)  # prints directly and returns None, so no outer print()

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


Number of rows: 7197
Number of columns: 16


# Data Cleaning

First, we will clean the data before analyzing it. We need to identify any inaccurate or faulty rows of data. Then we will remove any duplicate rows, non-English apps, and apps that are not 'free'.


In [6]:


print(android_header)
print(len(android_header))
print('\n')
print(android_data[10472])
print(len(android_data[10472]))



['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
13


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
12


Row 10472 has a faulty entry.

The row has only 12 values while the header has 13. The Category value is missing, so every later value is shifted one column to the left. As a result, the Rating column holds 19, which is impossible because the highest rating an app can receive is 5.

I will remove this row from the data set.


In [7]:
print(len(android_data))
del android_data[10472]
print(len(android_data))

10841
10840


In [8]:
#Find more apps with missing values
missing_data = []

for row in android_data:
    if len(row) != len(android_header):
        missing_data.append(row)
        print(row)
        print(android_data.index(row))
    
print(missing_data)

[]


# Identifying Duplicates

In this section, we will try to identify duplicate apps by categorizing apps into two different buckets: Unique apps and Duplicate apps.

In [9]:
duplicate_apps = []
unique_apps = []

for app in android_data:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print('Number of duplicate apps: ', len(duplicate_apps))
print('\n')
print('Example of duplicate apps: ', duplicate_apps[:15])

Number of duplicate apps:  1181


Example of duplicate apps:  ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


As we can see, the list of duplicate apps is quite extensive. In the loop above, we check whether an app's name is already in the list of unique apps. If it is not, we add the name to that list; if it is, we add the name to the duplicate list.
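
The same bucketing can be sketched with a set for the membership check, which avoids rescanning a growing list for every app. This is only an illustrative variant: the helper name `split_duplicates` and the toy input are assumptions, not part of the original notebook.

```python
def split_duplicates(names):
    """Return (unique_names, duplicate_names), preserving first-seen order."""
    seen = set()              # set lookups do not slow down as the set grows
    unique, duplicates = [], []
    for name in names:
        if name in seen:
            duplicates.append(name)   # every repeat is recorded
        else:
            seen.add(name)
            unique.append(name)
    return unique, duplicates


# Toy input for illustration only
sample_names = ['Box', 'Slack', 'Box', 'Zoom', 'Slack', 'Box']
unique, duplicates = split_duplicates(sample_names)
print('Unique:', unique)
print('Duplicates:', duplicates)
```

With the real data, this would be called as `split_duplicates([app[0] for app in android_data])`.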

Next, we want to keep exactly one row for each app. Once the 1,181 duplicate entries are removed, the data set should have this many rows:

In [10]:
print('Expected length: ', len(android_data) - 1181)

Expected length:  9659


In [11]:
print(android_header)
print('\n')

for app in android_data:
    name = app[0]
    if name == 'Instagram':
        print(app)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


In the cell above, we found several rows for the 'Instagram' app in our Android data set. The main difference between them is the number of reviews (the 4th column), which suggests the data was collected at different times.

While we could remove duplicate rows at random, a better approach is to keep only the row with the highest number of reviews, which should be the most up to date. In the cell below, we build a dictionary that maps each app name to its maximum number of reviews.

In [12]:
reviews_max = {}

for app in android_data:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
        
#The two branches look similar, but the first one updates the stored
#review count when a later row for the same app has a higher count.

print(len(reviews_max))

9659


In this next step, we use the reviews_max dictionary to keep only the row with the highest number of reviews for each app. The reviews_max dictionary simply maps each app name to its highest (most up-to-date) number of reviews.

 - We loop through the main android_data list.
 - If a row's number of reviews matches the value stored for its app name in reviews_max, we add the row to the android_clean list.
 - At the same time, we record the name in the already_added list and skip names that are already there, so an app whose highest review count appears on more than one row is added only once.

In [13]:
android_clean = []
already_added = []

for app in android_data:
    name = app[0]
    n_reviews = float(app[3])
    
    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(app)
        already_added.append(name)

print("Number of apps in the cleaned list: ", len(android_clean))

Number of apps in the cleaned list:  9659
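
As a rough illustration of the same idea, the two passes can be folded into one by storing the whole row in a dictionary keyed by app name. The function name `keep_most_reviewed` and the shortened toy rows are assumptions made for this sketch. Note that it returns rows in first-seen name order, which can differ from the row order produced by the cells above.

```python
def keep_most_reviewed(rows, name_idx=0, reviews_idx=3):
    """Keep one row per app name: the row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_idx]
        n_reviews = float(row[reviews_idx])
        # Replace the stored row only when this one has strictly more reviews,
        # so ties keep the first row encountered.
        if name not in best or n_reviews > float(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())


# Shortened toy rows: [App, Category, Rating, Reviews]
toy_rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644'],
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
]
for row in keep_most_reviewed(toy_rows):
    print(row)
```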


In [14]:
explore_data(android_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


# Removing Non-English Apps

Exploring through the data, we come across a few apps that are not in English. We want to remove these apps because our focus is on an English-speaking audience. Here are a few examples from both data sets:

In [15]:
print(ios_data[813][1])
print(ios_data[6731][1])
print('\n')
print(android_clean[4412][0])
print(android_clean[7940][0])

爱奇艺PPS -《欢乐颂2》电视剧热播
【脱出ゲーム】絶対に最後までプレイしないで 〜謎解き＆ブロックパズル〜


中国語 AQリスニング
لعبة تقدر تربح DZ


We will use the ord() function to determine whether an app name is in English. Every character has a corresponding number (its code point), and the characters commonly used in English text fall in the range 0 to 127, according to ASCII (American Standard Code for Information Interchange). Based on this range, we can build a function that detects whether a character belongs to the set of common English characters.

If the number is less than or equal to 127, the character belongs to the set of common English characters. If a name contains characters with numbers greater than 127, it may not be English. There are a few exceptions to take into account: some English app names contain characters such as '™' or emojis, and we still want to keep them. To handle this, we count the characters that fall outside the ASCII range, and the is_english function returns False only when that count is greater than 3.
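
To make the rule concrete, the short sketch below prints the code point of a few sample characters and counts the non-ASCII characters in two names. The helper `count_non_ascii` is a hypothetical name introduced only for this illustration.

```python
def count_non_ascii(text):
    """Count the characters whose code point lies outside the ASCII range (0-127)."""
    return sum(1 for ch in text if ord(ch) > 127)


# Code points for a few sample characters
for ch in ['a', 'Z', '~', '™', '爱', '😜']:
    print(repr(ch), ord(ch), 'ASCII' if ord(ch) <= 127 else 'non-ASCII')

# An English name with one symbol stays under the threshold of 3 ...
print(count_non_ascii('Docs To Go™ Free Office Suite'))
# ... while a name written in Chinese characters goes well over it.
print(count_non_ascii('爱奇艺PPS -《欢乐颂2》电视剧热播'))
```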

In [16]:
def is_english(word):
    non_ascii_count = 0   # characters outside the ASCII range (0-127)
    for letter in word:
        if ord(letter) > 127:
            non_ascii_count += 1

    # allow up to 3 non-ASCII characters (e.g. emojis or '™')
    return non_ascii_count <= 3
        
print(is_english('Instachat 😜')) #emojis fall outside the ASCII range
print(is_english('Docs To Go™ Free Office Suite'))

True
True


Using the Android data set that has been cleaned of duplicates, we remove the non-English apps by passing each name through the is_english function we created earlier. The cells below split each data set into English and non-English apps and print a sample of both:

In [17]:
android_english = []
android_nonenglish = []

for app in android_clean:
    name = app[0]
    if is_english(name):
        android_english.append(app)
    else:
        android_nonenglish.append(app)
        
print(android_header)
print('\n')
explore_data(android_english, 0, 3, True)
print('\n')
print(android_header)
print('\n')
explore_data(android_nonenglish, 0, 3, True)
    

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Flame - درب عقلك يوميا', 'EDUCATION', '4.6', '56065', '37M', '1,000,000+', 'Free', '0'

In [18]:
ios_english = []
ios_nonenglish = []

for app in ios_data:
    name = app[1]
    if is_english(name):
        ios_english.append(app)
    else:
        ios_nonenglish.append(app)
        
print(ios_header)
print('\n')
explore_data(ios_english, 0, 3, True)
print('\n')
print(ios_header)
print('\n')
explore_data(ios_nonenglish, 0, 3, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 6183
Number of columns: 16


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['445375097', '爱奇艺PPS -《欢乐颂2》电视剧热播', '224617472', 'USD', '0.0', '1

To recap the data cleaning process so far, we have:

1. Removed inaccurate data
2. Removed duplicate app entries
3. Removed non-English apps

As mentioned earlier, our focus is on apps that are free to download and install. The main source of revenue for these apps is in-app advertising. For the next step, we will isolate only the free apps for our analysis:

In [19]:
android_free = []
ios_free = []

for app in android_english:
    price = app[7]
    if price == '0':                #free apps are '0', non-free have '$' sign
        android_free.append(app)
        
for app in ios_english:
    price = app[4]
    if price == '0.0':
        ios_free.append(app)
        
print(len(android_free))
print(len(ios_free))

8864
3222


We are left with 8,864 Android apps and 3,222 iOS apps that are free.
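
The two stores encode a free price differently ('0' on Google Play, '0.0' on the App Store), and paid Google Play prices carry a '$' sign. A minimal sketch of a single check that covers both formats is shown below. The helper name `is_free` is an assumption, and it presumes every price string is either a plain number or a number prefixed with '$'.

```python
def is_free(price):
    """Return True when a price string such as '0', '0.0' or '$4.99' equals zero."""
    return float(price.lstrip('$')) == 0.0


for price in ['0', '0.0', '$4.99', '2.99']:
    print(price, '->', is_free(price))
```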

# Most Common Apps by Genre

So far we have removed inaccurate data, duplicates, and non-English apps, and isolated the free apps. As mentioned in the introduction, we want to minimize risk and overhead using the following 3-step strategy:

1. Build a minimal Android version of the app, and add it to Google Play.
2. If there is enough positive response from users, we develop it further.
3. If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

We need to find app profiles that are successful on both markets. To begin the analysis, we will find the most common genres for each market. To show this, we will build frequency tables for a few columns in our data sets.

In [20]:
def freq_table(dataset, index):
    table = {}   
    total = 0 
    
    for row in dataset:
        total +=1
        column = row[index]
        if column in table:
            table[column] += 1
        else:
            table[column] = 1
            
    table_percentages = {}
    
    for key in table:
        percentage = (table[key]/ total)*100
        table_percentages[key] = percentage
        
    return table_percentages
        

    
#function below will take the table_percentages returned from freq_table
#and convert the key and key values into a tuple, where the value is noted 
#in the first element. This allows the list of percentages to be sorted
#from highest to lowest. The sorted function is by default ascending order
#but if reverse = True, it will sort descending in values from high to low.

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])    
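
For comparison, here is a sketch of the same frequency table built with `collections.Counter` from the standard library. The function name `freq_table_percent` and the toy rows are assumptions for illustration. One difference worth noting: `most_common()` keeps tied values in first-seen order, whereas the tuple sort above orders ties by key.

```python
from collections import Counter


def freq_table_percent(dataset, index):
    """Return (value, percentage) pairs for one column, most frequent first."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return [(value, count / total * 100) for value, count in counts.most_common()]


# Toy rows for illustration only: [name, genre]
toy_rows = [['a', 'Games'], ['b', 'Games'], ['c', 'Education'], ['d', 'Games']]
for genre, pct in freq_table_percent(toy_rows, 1):
    print(genre, ':', pct)
```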

In [21]:
display_table(ios_free, -5) #prime_genre from apps store

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


From the App Store, we notice immediately that the top genre is 'Games', which makes up 58.2% of the free English apps. Entertainment is roughly 8%, followed by Photo & Video at 5% and Education at 3.7%.

The general impression is that entertainment and gaming apps dominate the App Store. However, this doesn't imply that building a gaming app will guarantee popularity. 'Games' is likely to be the most competitive genre because it contains so many apps. In addition, these percentages only tell us how many apps exist in each genre. They cannot tell us which genre has the largest user base.

In [22]:
display_table(android_free, 9) #Genre from Google Play

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

In [23]:
display_table(android_free, 1) #Category from Google Play

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

In the Google Play data set, the top 3 most common genres are Tools (8.4%), Entertainment (6.1%), and Education (5.3%). We notice that Google Play has a much more diverse range of Genres, and some are more granular (e.g. the Education genre has several sub-branches, such as Action, Adventure, and Music). The most common Categories are Family, Games, and Tools. If we dig deeper, the Family category is likely to consist of games for kids.

Compared to the App Store, Google Play has better representation of utility apps built for practical purposes. However, based on these Category/Genre percentages, we still cannot recommend what type of app our mobile app company should develop. Also, we cannot deduce whether a specific Category or Genre has the largest user base.

We can make an important observation: the App Store is dominated by apps for fun/entertainment purposes, while Google Play has a more balanced and diverse population of apps.

# Most Popular Apps by Genre on the App Store

To find out which genre is the most popular, we need the number of users for each app. Normally we would use the number of installs, but since the iOS data set does not provide it, we will use the total number of user ratings from the rating_count_tot column as a proxy.

In [55]:
# renaming the data sets for clearer distinction.
android_final = android_free
ios_final = ios_free


unique_ios_genres = freq_table(ios_final, -5)

for genre in unique_ios_genres:
    total = 0     #stores sum of the number of user ratings (rating_count_tot)
    len_genre = 0 #stores number of apps specific to each genre
    
    for app in ios_final:
        genre_app = app[-5]
        if genre_app == genre:
            n_rating = float(app[5])
            total += n_rating
            len_genre += 1
            
    avg_n_rating = total/len_genre
    avg_n_rating = round(avg_n_rating)
    
    print (genre, ': ', "{:,}".format(avg_n_rating))  
            
    

Social Networking :  71,548
Photo & Video :  28,442
Games :  22,789
Music :  57,327
Reference :  74,942
Health & Fitness :  23,298
Weather :  52,280
Utilities :  18,684
Travel :  28,244
Shopping :  26,920
News :  21,248
Navigation :  86,090
Lifestyle :  16,486
Entertainment :  14,030
Food & Drink :  33,334
Sports :  23,009
Book :  39,758
Finance :  31,468
Education :  7,004
Productivity :  21,028
Business :  7,491
Catalogs :  4,004
Medical :  612


Based on the average number of ratings per app, navigation apps have the highest figure (86,090), ahead of reference (74,942) and social networking (71,548). However, if we refer back to our frequency table, navigation apps account for only 0.19% of the free apps on the App Store. This could imply that a few navigation apps, such as Google Maps, dominate the genre. If so, their popularity has created a very "sticky" user base that will, more often than not, default to this small group of existing apps.

Reference and social networking apps follow closely behind navigation, so we will need to dig deeper into these genres as well.
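
The per-genre averages printed above come out in dictionary order, which makes the ranking hard to read. As a sketch, the averages can also be computed in a single pass over the rows and sorted from highest to lowest. The function name `average_by_group` and the shortened toy rows are assumptions for this example. For the real data, the call would be `average_by_group(ios_final, -5, 5)`.

```python
from collections import defaultdict


def average_by_group(rows, group_idx, value_idx):
    """Average a numeric column per group, sorted from highest to lowest."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_idx]
        totals[group] += float(row[value_idx])
        counts[group] += 1
    averages = {group: totals[group] / counts[group] for group in totals}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Shortened toy rows: [track_name, prime_genre, rating_count_tot]
toy_rows = [
    ['Waze', 'Navigation', '345046'],
    ['Google Maps', 'Navigation', '154911'],
    ['Bible', 'Reference', '985920'],
]
for genre, avg in average_by_group(toy_rows, 1, 2):
    print(genre, ':', '{:,}'.format(round(avg)))
```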

Let's pull the names of the navigation apps from the data set.

In [37]:
for app in ios_final:
    name = app[1]     #track_name
    genre = app[-5]   #prime_genre
    rating_count = app[5]
    if genre == 'Navigation':
        print(name,': ', rating_count)

Waze - GPS Navigation, Maps & Real-time Traffic :  345046
Google Maps - Navigation & Transit :  154911
Geocaching® :  12811
CoPilot GPS – Car Navigation & Offline Maps :  3582
ImmobilienScout24: Real Estate Search in Germany :  187
Railway Route Search :  5


As we can see above, the ratings in the navigation genre come almost entirely from Waze and Google Maps. Waze has more than double the rating count of Google Maps, which may indicate that iPhone users prefer it, although rating counts are only a rough proxy for actual usage.
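
A quick worked calculation with the six rating counts printed above shows how concentrated the genre is. The app names are shortened here for readability.

```python
# Rating counts copied from the navigation output above (names shortened)
navigation_ratings = {
    'Waze': 345046,
    'Google Maps': 154911,
    'Geocaching': 12811,
    'CoPilot GPS': 3582,
    'ImmobilienScout24': 187,
    'Railway Route Search': 5,
}

total = sum(navigation_ratings.values())
average = total / len(navigation_ratings)
top_two = navigation_ratings['Waze'] + navigation_ratings['Google Maps']
share = top_two / total

print('Total ratings:', '{:,}'.format(total))
print('Average per app:', '{:,}'.format(round(average)))
print('Share held by Waze and Google Maps: {:.1%}'.format(share))
```

The average matches the 86,090 reported for the Navigation genre, and the top two apps account for roughly 97% of all ratings in it, so the genre average says little about what a typical navigation app can expect.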

In [38]:
for app in ios_final:
    name = app[1]     #track_name
    genre = app[-5]   #prime_genre
    rating_count = app[5]
    if genre == 'Social Networking':
        print(name,': ', rating_count)

Facebook :  2974676
Pinterest :  1061624
Skype for iPhone :  373519
Messenger :  351466
Tumblr :  334293
WhatsApp Messenger :  287589
Kik :  260965
ooVoo – Free Video Call, Text and Voice :  177501
TextNow - Unlimited Text + Calls :  164963
Viber Messenger – Text & Call :  164249
Followers - Social Analytics For Instagram :  112778
MeetMe - Chat and Meet New People :  97072
We Heart It - Fashion, wallpapers, quotes, tattoos :  90414
InsTrack for Instagram - Analytics Plus More :  85535
Tango - Free Video Call, Voice and Chat :  75412
LinkedIn :  71856
Match™ - #1 Dating App. :  60659
Skype for iPad :  60163
POF - Best Dating App for Conversations :  52642
Timehop :  49510
Find My Family, Friends & iPhone - Life360 Locator :  43877
Whisper - Share, Express, Meet :  39819
Hangouts :  36404
LINE PLAY - Your Avatar World :  34677
WeChat :  34584
Badoo - Meet New People, Chat, Socialize. :  34428
Followers + for Instagram - Follower Analytics :  28633
GroupMe :  28260
Marco Polo Video Walki

Among social networking apps, Facebook and Pinterest have by far the most ratings. The 3rd, 4th, and 5th place apps each have roughly 330k to 375k ratings. The great majority of the other social networking apps have only a fraction of the top five's rating counts.

In [39]:
for app in ios_final:
    name = app[1]     #track_name
    genre = app[-5]   #prime_genre
    rating_count = app[5]
    if genre == 'Reference':
        print(name,': ', rating_count)

Bible :  985920
Dictionary.com Dictionary & Thesaurus :  200047
Dictionary.com Dictionary & Thesaurus for iPad :  54175
Google Translate :  26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran :  18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition :  17588
Merriam-Webster Dictionary :  16849
Night Sky :  12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) :  8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools :  4693
GUNS MODS for Minecraft PC Edition - Mods Tools :  1497
Guides for Pokémon GO - Pokemon GO News and Cheats :  826
WWDC :  762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free :  718
VPN Express :  14
Real Bike Traffic Rider Virtual Reality Glasses :  8
教えて!goo :  0
Jishokun-Japanese English Dictionary & Translator :  0


Surprisingly, the Bible app has close to 1 million ratings. Because we use total ratings as a proxy for the user base, it is also important to note that many users never bother to rate an app, so the real user base is likely larger. This suggests some potential: we could take a popular book such as the Bible and build an app around it with extra features that make it more user friendly.

One idea is to provide daily quotes from the book rather than only its raw text. Other ideas include an audio version for users with visual impairments, or quizzes about the book.

Now let's continue with the Google Play market.



# Most Popular Apps by Genre on Google Play

In the Google Play data set, the Installs column gives the number of installs for each app. Unlike the App Store data, we don't need a proxy to estimate the number of users; we can analyze the install numbers directly.

At first glance, it is apparent that the install numbers are not exact. Instead, they are given as open-ended ranges, which is not very precise.

In [31]:
display_table(android_final, 5)

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


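The installation counts in the frequency table are open-ended (100+, 1,000+, 5,000+, and so on), so an app listed with 100,000+ installs could have anywhere between 100,000 and 500,000. Ignoring this detail could make the results misleading, especially at the higher end, where the gaps between intervals are much wider. For our purposes, we will treat each value as a lower bound by removing the commas and plus signs and converting the result to a number.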
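
As a small illustration, the conversion from an install label to its lower bound can be isolated in a helper. The name `parse_installs` is an assumption introduced for this sketch.

```python
def parse_installs(label):
    """Convert an install label such as '1,000,000+' to its lower bound as an int."""
    return int(label.replace(',', '').replace('+', ''))


for label in ['1,000,000+', '500+', '0+', '0']:
    print(label, '->', parse_installs(label))
```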

In [53]:
unique_android_genres = freq_table(android_final, 1)

for category in unique_android_genres:
    total = 0
    len_category = 0
    
    for app in android_final:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace('+','')
            n_installs = n_installs.replace(',','')
            total += float(n_installs)
            len_category += 1
            
    avg_n_installs = total/len_category
    avg_n_installs = round(avg_n_installs)
    
    print (category, ': ', "{:,}".format(avg_n_installs))  
            

ART_AND_DESIGN :  1,986,335
AUTO_AND_VEHICLES :  647,318
BEAUTY :  513,152
BOOKS_AND_REFERENCE :  8,767,812
BUSINESS :  1,712,290
COMICS :  817,657
COMMUNICATION :  38,456,119
DATING :  854,029
EDUCATION :  1,833,495
ENTERTAINMENT :  11,640,706
EVENTS :  253,542
FINANCE :  1,387,692
FOOD_AND_DRINK :  1,924,898
HEALTH_AND_FITNESS :  4,188,822
HOUSE_AND_HOME :  1,331,541
LIBRARIES_AND_DEMO :  638,504
LIFESTYLE :  1,437,816
GAME :  15,588,016
FAMILY :  3,695,642
MEDICAL :  120,551
SOCIAL :  23,253,652
SHOPPING :  7,036,877
PHOTOGRAPHY :  17,840,110
SPORTS :  3,638,640
TRAVEL_AND_LOCAL :  13,984,078
TOOLS :  10,801,391
PERSONALIZATION :  5,201,483
PRODUCTIVITY :  16,787,331
PARENTING :  542,604
WEATHER :  5,074,486
VIDEO_PLAYERS :  24,727,872
NEWS_AND_MAGAZINES :  9,549,178
MAPS_AND_NAVIGATION :  4,056,942


From the average number of installs per category on Google Play, the top 3 categories are communication (38.5m), video players (24.7m), and social (23.3m).

In [58]:
for app in android_final:
    if app[1] == 'COMMUNICATION' and app[5] in ('1,000,000,000+', '500,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Viber Messenger : 500,000,000+


WhatsApp Messenger, Messenger, Skype, Google Chrome, Gmail, and Hangouts each have at least 1 billion installs. Most of the apps on this list offer some form of instant messaging, and the companies behind them are tech giants in a very competitive space. Competing against them would be a tall order, considering the capital and brand recognition they can put behind advertising.

I would advise our mobile app company to NOT compete in this space. Instead, we should find a different niche where we can create or peel away a good portion of the market share. 
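The filter used for this category matches exact install strings, so every bucket of interest has to be spelled out by hand. A possible alternative, sketched here under assumptions, is to convert the install string to a number and compare it against a threshold. `parse_installs`, `apps_with_at_least`, and `sample_rows` are hypothetical names introduced for this example; the sample rows reuse names and install counts from the output above, except for one made-up row that is labeled as such.

```python
def parse_installs(installs):
    """Convert an install string such as '1,000,000+' to an integer."""
    return int(installs.replace('+', '').replace(',', ''))


def apps_with_at_least(dataset, category, min_installs):
    """Return (name, installs) pairs in `category` with at least `min_installs`."""
    return [
        (app[0], app[5])
        for app in dataset
        if app[1] == category and parse_installs(app[5]) >= min_installs
    ]


# Rows follow the layout used in this notebook: name at index 0, category at
# index 1, installs at index 5. Indexes 2-4 are empty placeholders here.
sample_rows = [
    ['WhatsApp Messenger', 'COMMUNICATION', '', '', '', '1,000,000,000+'],
    ['Google Duo - High Quality Video Calls', 'COMMUNICATION', '', '', '', '500,000,000+'],
    ['Gmail', 'COMMUNICATION', '', '', '', '1,000,000,000+'],
    ['Khan Academy', 'EDUCATION', '', '', '', '5,000,000+'],
    # Hypothetical small app, included only to show a row being filtered out.
    ['Example Chat App', 'COMMUNICATION', '', '', '', '10,000+'],
]

for name, installs in apps_with_at_least(sample_rows, 'COMMUNICATION', 500_000_000):
    print(name, ':', installs)
```

With the real data, the call would take `android_final` in place of `sample_rows`. The threshold form also makes it easy to look at other ranges without listing each install bucket.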

In [60]:
for app in android_final:
    if app[1] == 'ENTERTAINMENT' and app[5] in ('1,000,000+', '500,000+'):
        print(app[0], ':', app[5])

Complete Spanish Movies : 1,000,000+
Pluto TV - It’s Free TV : 1,000,000+
The green alien dance : 1,000,000+
StarTimes - Live International Champions Cup : 1,000,000+
Cinematic Cinematic : 1,000,000+
Radio Javan : 1,000,000+
🔥 Football Wallpapers 4K | Full HD Backgrounds 😍 : 1,000,000+
Low Poly – Puzzle art game : 1,000,000+
Adult Glitter Color by Number Book - Sandbox Pages : 1,000,000+
Ziggo GO : 1,000,000+
Cinemark Theatres : 1,000,000+
Regal Cinemas : 1,000,000+
AMC Theatres : 1,000,000+
VRV: Anime, game videos & more : 1,000,000+
DramaFever: Stream Asian Drama Shows & Movies : 1,000,000+
Investigation Discovery GO : 1,000,000+
HISTORY: Watch TV Show Full Episodes & Specials : 1,000,000+
VH1 : 1,000,000+
BET NOW - Watch Shows : 1,000,000+
Univision NOW - Live TV and On Demand : 1,000,000+
SHOWTIME : 1,000,000+
SeriesGuide – Show & Movie Manager : 1,000,000+
Comedy Central : 1,000,000+
Universal Kids : 500,000+
MTV : 1,000,000+
Lifetime - Watch Full Episodes & Original Movies : 1,00

Under the Entertainment category, the competition is spread across many apps of similar size. This makes it hard to narrow down what kind of content would draw in users. Another big problem is likely to be securing contracts with the owners of that content. Judging by the app names, many of these businesses were well established long before they entered the mobile app space; they simply adapted to the digital environment by offering their content on smartphones and over the internet.

Let's continue.

In [64]:
for app in android_final:
    if app[1] == 'EDUCATION' and app[5] in ('5,000,000+', '10,000,000+'):
        print(app[0], ':', app[5])

Khan Academy : 5,000,000+
Mermaids : 5,000,000+
Math Tricks : 10,000,000+
Learn English Words Free : 5,000,000+
ABC Preschool Free : 5,000,000+
Lumosity: #1 Brain Games & Cognitive Training App : 10,000,000+
English with Lingualeo : 5,000,000+
Quizlet: Learn Languages & Vocab with Flashcards : 10,000,000+
Coursera: Online courses : 5,000,000+
Elevate - Brain Training Games : 5,000,000+
NeuroNation - Focus and Brain Training : 5,000,000+
Free english course : 5,000,000+
Learn 50 languages : 5,000,000+
Rosetta Stone: Learn to Speak & Read New Languages : 5,000,000+
Learn English with Wlingua : 10,000,000+
Remind: School Communication : 10,000,000+
Google Classroom : 10,000,000+
ClassDojo : 10,000,000+
Learn languages, grammar & vocabulary with Memrise : 10,000,000+
HelloTalk — Chat, Speak & Learn Foreign Languages : 5,000,000+


From the Education category, we see that there are several language-learning apps, and a few of them have 10 million or more installations. This category appears to have a lot of potential, since building an app that teaches new languages looks feasible for us.
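One rough way to check how many of the listed Education apps are language-related is a keyword match on the app names. This is only a sketch: the keyword list is an assumption chosen for this example, and a title match will miss apps whose names do not mention languages. `education_apps`, `LANGUAGE_KEYWORDS`, and `is_language_app` are hypothetical names; the titles and install counts are copied from the output above.

```python
# (name, installs) pairs copied from the Education output above.
education_apps = [
    ('Khan Academy', '5,000,000+'),
    ('Mermaids', '5,000,000+'),
    ('Math Tricks', '10,000,000+'),
    ('Learn English Words Free', '5,000,000+'),
    ('ABC Preschool Free', '5,000,000+'),
    ('Lumosity: #1 Brain Games & Cognitive Training App', '10,000,000+'),
    ('English with Lingualeo', '5,000,000+'),
    ('Quizlet: Learn Languages & Vocab with Flashcards', '10,000,000+'),
    ('Coursera: Online courses', '5,000,000+'),
    ('Elevate - Brain Training Games', '5,000,000+'),
    ('NeuroNation - Focus and Brain Training', '5,000,000+'),
    ('Free english course', '5,000,000+'),
    ('Learn 50 languages', '5,000,000+'),
    ('Rosetta Stone: Learn to Speak & Read New Languages', '5,000,000+'),
    ('Learn English with Wlingua', '10,000,000+'),
    ('Remind: School Communication', '10,000,000+'),
    ('Google Classroom', '10,000,000+'),
    ('ClassDojo', '10,000,000+'),
    ('Learn languages, grammar & vocabulary with Memrise', '10,000,000+'),
    ('HelloTalk — Chat, Speak & Learn Foreign Languages', '5,000,000+'),
]

# Assumed keywords; adjust them to widen or narrow the match.
LANGUAGE_KEYWORDS = ('language', 'english')


def is_language_app(name, keywords=LANGUAGE_KEYWORDS):
    """Return True if the lowercased app name contains any keyword."""
    lowered = name.lower()
    return any(keyword in lowered for keyword in keywords)


language_apps = [
    (name, installs)
    for name, installs in education_apps
    if is_language_app(name)
]

print(len(language_apps), 'of', len(education_apps), 'listed apps look language-related')
for name, installs in language_apps:
    print(name, ':', installs)
```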

This market is likely already crowded with apps offering many different learning resources. However, our developers could differentiate our app with special features such as audiobooks, daily quizzes, or an online forum for asking questions.

# Conclusion

After analyzing data from both the App Store and Google Play, I would suggest focusing on a niche with a sizable user base: a category whose basic app is easy to replicate, but to which we can add a special twist in order to compete against the apps already on the market. Depending on the amount of resources and capital available for this project, we should focus on something that doesn't require a substantial investment in content (for example, paying royalties or dealing with copyrights).
