# What Is an App Profile to Hit the Jackpot with?

## Abstract

- This project aims to identify which mobile app profiles, or genres, in the App Store and Google Play are **most promising** and which are not. We do this by assessing the competitive landscape within each genre, i.e. the types of major players, the degree of market concentration, etc.
- The datasets used do not cover all apps; they are samples that we assume contain enough observations to be representative of the whole population. The [initial Google Play dataset](https://www.kaggle.com/lava18/google-play-store-apps) contains 10841 apps, while the [initial App Store dataset](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps) contains 7197 apps.
- The final goal is to give app developers and other interested parties insight into which niches to concentrate on and which segments to stay away from. At the end of the project we propose designing a specialised free **app for reading** with special features, containing either the greatest books of all time or religious texts.

# **Initial Data Exploration**

Let's open the datasets we downloaded from [Kaggle](https://www.kaggle.com/) concerning App Store and Google Play apps and explore them by printing the header and the first several rows.

In [6]:
# Read the downloaded data on App Store apps and turn it into a list of lists
from csv import reader
with open('/Users/mac/downloads/AppleStore.csv') as open_file:  # the file is closed automatically
    read_file = reader(open_file)
    Appleapps_data = list(read_file)

# save data to Jupyter online folder using pandas
import pandas as pd
df1 = pd.read_csv('/Users/mac/downloads/AppleStore.csv')
df1.to_csv('Projects_data/AppleStore.csv')

In [7]:
# Read the downloaded data on Google Play apps and turn it into a list of lists
from csv import reader
with open('/Users/mac/downloads/googleplaystore.csv') as open_file:  # the file is closed automatically
    read_file = reader(open_file)
    Googleapps_data = list(read_file)

# save data to Jupyter online folder using pandas
import pandas as pd
df2 = pd.read_csv('/Users/mac/downloads/googleplaystore.csv')
df2.to_csv('Projects_data/googleplaystore.csv')

In [40]:
# Write a function which takes as arguments dataset name, rows range to print and 
# boolean for printing the number of rows and columns
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    # create a list of lists with the range of 
                                          # rows we want to consider
    for row in dataset_slice:  # loop through the newly created slice of rows to print
                               # each one with blank space in-between
        print(row)
        print('\n') 

    if rows_and_columns:  # in the case boolean for this variable is set 'True', the number 
                          # of rows and columns of the considered dataset is printed 
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [41]:
# Create a function, which can print bold text
from IPython.display import Markdown, display
def printmd(string):
    display(Markdown(string))

In [42]:
# Explore our datasets with the previously created function
printmd('**App Store apps dataset**')
print('\n')
explore_data(Appleapps_data,0,3,rows_and_columns=True)
printmd('**Google Play apps dataset**')
print('\n')
explore_data(Googleapps_data,0,3,rows_and_columns=True)

**App Store apps dataset**



['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of rows: 7198
Number of columns: 16


**Google Play apps dataset**



['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


Number of rows: 10842
Number of columns: 13


The output above shows that the two datasets are only partly similar in format. The columns of most interest to us are those containing the apps' names, genres, prices, and some popularity metrics. The description of each variable in our datasets is [openly published](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home) on Kaggle.

# **Cleaning Data to Analyse Free Apps Profiles for English-Speaking Audience**

Let's proceed with the data cleaning procedure. <br><br>**First**, there is an error in our Google Play dataset: one row has a missing value.
- We loop through the dataset to find rows whose length differs from that of the header, i.e. rows with missing values.
- Such rows are then deleted from our dataset.
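
The check described above can be sketched on a toy table. The `rows` list below is made-up sample data, not the Kaggle file, and `bad_indexes` is a name introduced only for this illustration. Using `enumerate` gives each row's position directly instead of looking it up afterwards.

```python
# Toy data standing in for the real dataset (hypothetical values).
rows = [
    ['App', 'Category', 'Rating'],
    ['Alpha', 'TOOLS', '4.1'],
    ['Beta', '1.9'],              # one field is missing in this row
    ['Gamma', 'GAME', '3.8'],
]

expected_len = len(rows[0])  # the header defines the expected number of columns

# Collect the index of every row whose length differs from the header's.
bad_indexes = [i for i, row in enumerate(rows) if len(row) != expected_len]
print('Malformed rows at indexes:', bad_indexes)

# Delete from the end so that the remaining indexes stay valid.
for i in reversed(bad_indexes):
    del rows[i]
print('Rows left:', len(rows))
```

Deleting in reverse order matters only when several rows are malformed; with a single bad row, as in this dataset, a plain `del` is enough.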

In [43]:
# Loop through each row of the Google Play data and check for differences in length - each
# such row is printed
print('Rows with missing values:')
print('\n')
for row in Googleapps_data:
    if len(row) != len(Googleapps_data[0]):
        print(row)
        print(Googleapps_data.index(row)) # the row number is printed to spot it and delete 
                                          # afterwards

Rows with missing values:


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
10473


In [44]:
# Print the number of rows before and after deletion to ensure the procedure was 
# performed correctly
print('Rows before deletion', ':', len(Googleapps_data))
del Googleapps_data[10473]
print('Rows after deletion', ':', len(Googleapps_data))

Rows before deletion : 10842
Rows after deletion : 10841


**Second**, we should find out whether there are duplicates in our datasets and, if so, remove them:
- We loop through our dataset and check whether the app name in each row has already appeared in an earlier row.
- We count the number of duplicates and give some examples of them, as well as the number of unique values.
- We keep only the most up-to-date entry for each app. For this purpose we loop through the dataset, build a dictionary holding the highest number of reviews recorded for each app, and keep the row with that number, on the assumption that the highest review count marks the most recent row.
- We print the number of rows before and after the data cleaning procedure to check whether everything went correctly.
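
As a compact illustration of the "keep the row with the most reviews" rule, here is a single-pass sketch. The sample rows and the helper name `dedupe_by_max_reviews` are assumptions made for this example; the notebook itself uses two separate loops.

```python
# Hypothetical rows: [name, category, rating, reviews]
rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
]

def dedupe_by_max_reviews(rows, name_idx=0, reviews_idx=3):
    """Keep one row per app name: the row with the highest review count."""
    best = {}  # app name -> best row seen so far
    for row in rows:
        name = row[name_idx]
        # Strict '>' means that on a tie the first row encountered is kept.
        if name not in best or float(row[reviews_idx]) > float(best[name][reviews_idx]):
            best[name] = row
    # Dictionaries preserve insertion order (Python 3.7+),
    # so apps come out in the order they were first seen.
    return list(best.values())

clean = dedupe_by_max_reviews(rows)
for row in clean:
    print(row)
```

Storing the whole row in the dictionary avoids the second pass and the linear `name not in already_added` lookup on a list.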

In [45]:
print('An example of duplication:')
print('\n')
for i in Googleapps_data: # loop through Google Play data to print all the rows 
                          # related to Instagram
    if i[0] == 'Instagram':
        print(i)
        
duplicates = [] # create a list of duplicates
unique = [] # create a list of unique values
for row in Googleapps_data: # loop through Google Play data to target only the first column, where 
                            # app names are stored
    i = row[0]
    if i in unique:           # if the app name has already been encountered in previous rows,
                              # it then goes to the duplicates list
        duplicates.append(i)
    else:                    
        unique.append(i)
print('\n')
print('Number of duplicates:', len(duplicates))
print('\n')
print('Names of some duplicates:', '\n','\n', duplicates[:15])

An example of duplication:


['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


Number of duplicates: 1181


Names of some duplicates: 
 
 ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly

In [46]:
reviews_max = {}    # dictionary: app name -> maximum number of reviews seen so far
for row in Googleapps_data[1:]: 
    name = row[0]              # app name
    n_reviews = float(row[3])  # number of reviews
    if name in reviews_max and reviews_max[name] < n_reviews: # if the app name has already been
                    # encountered, but the max value in the dictionary is less than in the row,
        reviews_max[name] = n_reviews # then the value in the dictionary is updated
    elif name not in reviews_max:     # if the app name is new,
        reviews_max[name] = n_reviews # then the number of reviews in that row is 
                                      # stored under the corresponding app name 
print('Number of rows in Google Play apps data without duplicates', ':', len(reviews_max))


Number of rows in Google Play apps data without duplicates : 9659


In [47]:
android_clean = [] # list of lists for the cleaned data without duplicates
already_added = [] # names of apps that have already been added to the cleaned data
print('Number of rows before cleaning', ':', len(Googleapps_data))

for row in Googleapps_data[1:]:
    name = row[0]
    n_reviews = float(row[3])
    if n_reviews == reviews_max[name] and name not in already_added: 
        android_clean.append(row) # add the row to the cleaned data if its number of reviews is
                                  # the greatest and the app has not been added yet
        already_added.append(name) # mark the app as added

Googleapps_data[1:] = android_clean
print('\n')
print('Number of rows after cleaning', ':', len(android_clean))

Number of rows before cleaning : 10841


Number of rows after cleaning : 9659


**Third**, we consider only apps for an English-speaking audience and delete those in other languages:
<br>
- For this purpose we write a function that checks whether a string contains characters atypical of English text, i.e. characters whose code point is above 255. Strictly speaking, ASCII covers only 0-127; the threshold of 255 also admits the Latin-1 range, so accented Latin letters are treated as English. A name is rejected only if it contains more than one such character. 
- We then test our function to confirm that it works.
- We loop through the App Store and Google Play datasets to find app names with characters of foreign languages and then delete such rows.
- We compare the number of rows before and after the procedure and print the first several rows of the cleaned dataset to ensure that everything is done correctly.
<br>
Our function is not perfect: it may delete apps for English-speaking users whose names contain several special characters, but the number of such apps should be negligible and should not noticeably affect our inferences.
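
To see how the choice of threshold matters, the sketch below counts the characters above 127 (the ASCII limit) and above 255 (the cut-off used in this notebook). `count_above` and `looks_english` are hypothetical helper names introduced for this illustration, and the sample names are made up apart from those already tested in the notebook.

```python
def count_above(name, limit):
    """Count characters whose Unicode code point exceeds `limit`."""
    return sum(1 for ch in name if ord(ch) > limit)

def looks_english(name, limit=255, tolerance=1):
    """Hypothetical variant of the check: allow up to `tolerance`
    characters above `limit` before rejecting the name."""
    return count_above(name, limit) <= tolerance

samples = [
    'Instagram',
    'Docs To Go\u2122 Free Office Suite',   # contains the trademark sign
    'Caf\u00e9 Finder',                     # contains an accented Latin-1 letter
    '爱奇艺PPS',
]

for s in samples:
    print(s, '| above 127:', count_above(s, 127),
          '| above 255:', count_above(s, 255),
          '| kept:', looks_english(s))
```

The accented letter counts as atypical only under the stricter ASCII limit, which is why the looser threshold of 255 is more forgiving towards names borrowed from other Latin-script languages.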

In [48]:
# Write a function which loops through its argument, checks the code point of 
# each character and returns a boolean
def onlyenglish(string):
    nonenglish = 0   # the number of non-typical characters
    for i in string:
        if ord(i) > 255: # ord() returns the code point of a character - for standard 
                         # letters it should be within the 0 - 255 range
            nonenglish += 1 # add to the total if a non-typical character is encountered
    if nonenglish > 1: # if there is more than 1 non-typical character, then 'False' is returned
        return False
    else:
        return True

# Check the function
print(onlyenglish('Instagram'))
print(onlyenglish('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(onlyenglish('Docs To Go™ Free Office Suite'))
print(onlyenglish('Instachat 😜'))

True
False
True
True


In [49]:
# Get rid of non-English apps by looping through the Google Play data and applying our function
# for sorting characters
print('Number of rows Google Play dataset before cleaning', ':', len(Googleapps_data))
Googleenglish = []
for i in Googleapps_data:
    if onlyenglish(i[0]):
        Googleenglish.append(i)
        
# Check the size of the dataset before and after the procedure    
GoogleappsinEnglish = Googleenglish
print('\n')
print('After cleaning', ':') 
print('\n')
explore_data(GoogleappsinEnglish, 0, 3, True)

Number of rows Google Play dataset before cleaning : 9660


After cleaning :


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 9556
Number of columns: 13


In [50]:
# Repeat the same procedure for the App Store dataset
print('Number of rows App Store dataset before cleaning', ':', len(Appleapps_data))
Appleenglish = []
for i in Appleapps_data:
    if onlyenglish(i[1]):
        Appleenglish.append(i)
    
AppleappsinEnglish = Appleenglish
print('\n')
print('After cleaning', ':') 
print('\n')
explore_data(AppleappsinEnglish, 0, 3, True)

Number of rows App Store dataset before cleaning : 7198


After cleaning :


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of rows: 6130
Number of columns: 16


**Fourth**, we would like to consider only free apps, which are used the most due to their universal accessibility:
- We loop through both datasets and create new ones, where only free apps are stored.
- We then compare the total number of apps in both stores with that of free apps.

In [51]:
# Check the 'Type' column and save only free apps
Free_android = [] # list for saving free apps
for i in GoogleappsinEnglish[1:]:
    if i[6] == 'Free':
        Free_android.append(i)

# Check the column of apps' price and save only those with the '0' value
Free_ios = []
for i in AppleappsinEnglish[1:]:
    if float(i[4]) == 0:
        Free_ios.append(i)

# Compare the result before and after procedure
print('The total number of Android apps', ':', len(GoogleappsinEnglish))
print('The number of free Android apps', ':', len(Free_android))
print('\n')
print('The total number of iOS apps', ':', len(AppleappsinEnglish))
print('The number of free iOS apps', ':', len(Free_ios))

The total number of Android apps : 9556
The number of free Android apps : 8809


The total number of iOS apps : 6130
The number of free iOS apps : 3183


# **Analysing Apps Profiles' Popularity and Targeting the Jackpot to Hit**

After the data cleaning step is done, let's proceed with the actual analysis of our data. Our ultimate goal is to find out which genres are most popular among smartphone users. First, it seems reasonable to explore how many apps there are within each genre:
- We create a new function, which counts the number of rows attributed to each category, i.e. its frequency.
- We build another function, which uses the one above to compute each frequency as a percentage and sort the results in descending order. This function may help us in the comparative analysis that follows.
- We check that both functions work well.
- In the Google Play dataset two columns can be attributed to genre definition (named "Category" and "Genres"), hence both are printed.
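
For comparison, the same frequency-and-percentage idea can be sketched with `collections.Counter` from the standard library. The toy `rows` and the function name `percentage_table` are assumptions for this example and are not used elsewhere in the notebook.

```python
from collections import Counter

# Hypothetical rows; the genre sits in column 1.
rows = [
    ['A', 'Games'],
    ['B', 'Games'],
    ['C', 'Education'],
    ['D', 'Games'],
]

def percentage_table(rows, index):
    """Return (value, percent) pairs sorted by descending frequency."""
    counts = Counter(row[index] for row in rows)   # value -> number of rows
    total = sum(counts.values())
    return [(value, round(n / total * 100, 2))
            for value, n in counts.most_common()]

table = percentage_table(rows, 1)
for value, percent in table:
    print(value, ':', percent, '(%)')
```

One difference worth noting: `most_common()` orders equal counts by first appearance, whereas sorting `(count, key)` tuples in reverse breaks ties by key in descending order.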

In [52]:
# Create a function containing a dictionary, which has keys (i.e. index) and the number
# of their occurrence in the considered dataset
def freq_table(dataset, index):
    frequency = {}
    
    for row in dataset:
        if row[index] in frequency:
            frequency[row[index]] += 1
            
        else:
            frequency[row[index]] = 1
    
    return frequency

In [53]:
# Print frequency distribution of various genres
printmd('**Frequency of various genres in App Store**:')
print('\n')
print(freq_table(Free_ios, -5))
print('\n')
printmd('**Frequency of various genres in Google Play**:')
print('\n')
print(freq_table(Free_android, 1))

**Frequency of various genres in App Store**:



{'Social Networking': 104, 'Photo & Video': 160, 'Games': 1860, 'Music': 65, 'Reference': 17, 'Health & Fitness': 65, 'Weather': 28, 'Utilities': 77, 'Travel': 37, 'Shopping': 80, 'News': 43, 'Navigation': 6, 'Lifestyle': 49, 'Entertainment': 251, 'Food & Drink': 26, 'Sports': 69, 'Book': 12, 'Finance': 35, 'Education': 118, 'Productivity': 54, 'Business': 17, 'Catalogs': 4, 'Medical': 6}




**Frequency of various genres in Google Play**:



{'ART_AND_DESIGN': 57, 'AUTO_AND_VEHICLES': 82, 'BEAUTY': 53, 'BOOKS_AND_REFERENCE': 190, 'BUSINESS': 407, 'COMICS': 51, 'COMMUNICATION': 287, 'DATING': 164, 'EDUCATION': 103, 'ENTERTAINMENT': 84, 'EVENTS': 63, 'FINANCE': 327, 'FOOD_AND_DRINK': 109, 'HEALTH_AND_FITNESS': 272, 'HOUSE_AND_HOME': 71, 'LIBRARIES_AND_DEMO': 82, 'LIFESTYLE': 345, 'GAME': 850, 'FAMILY': 1665, 'MEDICAL': 312, 'SOCIAL': 236, 'SHOPPING': 199, 'PHOTOGRAPHY': 261, 'SPORTS': 295, 'TRAVEL_AND_LOCAL': 207, 'TOOLS': 745, 'PERSONALIZATION': 290, 'PRODUCTIVITY': 345, 'PARENTING': 57, 'WEATHER': 71, 'VIDEO_PLAYERS': 158, 'NEWS_AND_MAGAZINES': 248, 'MAPS_AND_NAVIGATION': 123}


In [54]:
# Write a function, which sorts values in the dataset in the descending order and computes 
# frequency percentage of index values
def display_table(dataset, index):
    table = freq_table(dataset, index) # create a dictionary to refer to
    table_display = [] # create a list of tuples, which can be then sorted
    total = 0  # total for computing the percentage share of each value
    for key in table:
        key_val_as_tuple = (table[key], key)   # create tuples with numeric values and keys,
                                               # which can then be sorted
        table_display.append(key_val_as_tuple)
        total += table[key]

    table_sorted = sorted(table_display, reverse = True) # sort list of tuples in descending order
    for entry in table_sorted:                           # print each key with corresponding percentages
        print(entry[1], ':', round(entry[0]/total*100,2), '(%)')

In [55]:
printmd('**Frequency percentage of various genres in App Store**:')
print('\n')
display_table(Free_ios, 11)

**Frequency percentage of various genres in App Store**:



Games : 58.44 (%)
Entertainment : 7.89 (%)
Photo & Video : 5.03 (%)
Education : 3.71 (%)
Social Networking : 3.27 (%)
Shopping : 2.51 (%)
Utilities : 2.42 (%)
Sports : 2.17 (%)
Music : 2.04 (%)
Health & Fitness : 2.04 (%)
Productivity : 1.7 (%)
Lifestyle : 1.54 (%)
News : 1.35 (%)
Travel : 1.16 (%)
Finance : 1.1 (%)
Weather : 0.88 (%)
Food & Drink : 0.82 (%)
Reference : 0.53 (%)
Business : 0.53 (%)
Book : 0.38 (%)
Navigation : 0.19 (%)
Medical : 0.19 (%)
Catalogs : 0.13 (%)


In [56]:
printmd('**Frequency percentage of various categories in Google Play**:')
print('\n')
display_table(Free_android, 1)

**Frequency percentage of various categories in Google Play**:



FAMILY : 18.9 (%)
GAME : 9.65 (%)
TOOLS : 8.46 (%)
BUSINESS : 4.62 (%)
PRODUCTIVITY : 3.92 (%)
LIFESTYLE : 3.92 (%)
FINANCE : 3.71 (%)
MEDICAL : 3.54 (%)
SPORTS : 3.35 (%)
PERSONALIZATION : 3.29 (%)
COMMUNICATION : 3.26 (%)
HEALTH_AND_FITNESS : 3.09 (%)
PHOTOGRAPHY : 2.96 (%)
NEWS_AND_MAGAZINES : 2.82 (%)
SOCIAL : 2.68 (%)
TRAVEL_AND_LOCAL : 2.35 (%)
SHOPPING : 2.26 (%)
BOOKS_AND_REFERENCE : 2.16 (%)
DATING : 1.86 (%)
VIDEO_PLAYERS : 1.79 (%)
MAPS_AND_NAVIGATION : 1.4 (%)
FOOD_AND_DRINK : 1.24 (%)
EDUCATION : 1.17 (%)
ENTERTAINMENT : 0.95 (%)
LIBRARIES_AND_DEMO : 0.93 (%)
AUTO_AND_VEHICLES : 0.93 (%)
WEATHER : 0.81 (%)
HOUSE_AND_HOME : 0.81 (%)
EVENTS : 0.72 (%)
PARENTING : 0.65 (%)
ART_AND_DESIGN : 0.65 (%)
BEAUTY : 0.6 (%)
COMICS : 0.58 (%)


In [57]:
printmd('**Frequency percentage of various genres in Google Play**:')
print('\n')
display_table(Free_android, 9)

**Frequency percentage of various genres in Google Play**:



Tools : 8.45 (%)
Entertainment : 6.06 (%)
Education : 5.37 (%)
Business : 4.62 (%)
Productivity : 3.92 (%)
Lifestyle : 3.91 (%)
Finance : 3.71 (%)
Medical : 3.54 (%)
Sports : 3.42 (%)
Personalization : 3.29 (%)
Communication : 3.26 (%)
Health & Fitness : 3.09 (%)
Action : 3.09 (%)
Photography : 2.96 (%)
News & Magazines : 2.82 (%)
Social : 2.68 (%)
Travel & Local : 2.34 (%)
Shopping : 2.26 (%)
Books & Reference : 2.16 (%)
Simulation : 2.04 (%)
Dating : 1.86 (%)
Arcade : 1.83 (%)
Video Players & Editors : 1.77 (%)
Casual : 1.74 (%)
Maps & Navigation : 1.4 (%)
Food & Drink : 1.24 (%)
Puzzle : 1.14 (%)
Racing : 1.0 (%)
Role Playing : 0.94 (%)
Libraries & Demo : 0.93 (%)
Auto & Vehicles : 0.93 (%)
Strategy : 0.91 (%)
Weather : 0.81 (%)
House & Home : 0.81 (%)
Events : 0.72 (%)
Adventure : 0.66 (%)
Beauty : 0.6 (%)
Art & Design : 0.6 (%)
Comics : 0.57 (%)
Parenting : 0.5 (%)
Card : 0.44 (%)
Trivia : 0.41 (%)
Casino : 0.41 (%)
Educational;Education : 0.4 (%)
Board : 0.39 (%)
Educational : 

From the analysis of the App Store dataset we may conclude the following:
<br>
- Among free iOS apps, Games is by far the most common genre by number of apps. The runner-up is Entertainment, but the gap between the two is huge: 251 vs. 1860 free apps. 
- The frequency distribution is heavily skewed to the right: four out of the top five genres are designed for entertainment purposes and only one for education. The rest of the genres constitute a minority in the distribution. 
- The number of apps in a genre doesn't have much explanatory power for its popularity, but might be a reasonable proxy for market saturation. Some apps might be used only for a short period of time and then deleted, while others are used on a daily basis, especially those in education, sports, finance and other genres with fewer apps. Therefore, we need **alternative metrics** to measure genre popularity, i.e. to identify the most promising types of apps to develop.

From the analysis of the Google Play dataset we may conclude the following:
<br>
- Both entertainment apps and apps for practical purposes are quite frequent among free Google Play apps. Of the top five genres, four are designed for practical purposes and only one for entertainment.
- The frequencies are spread fairly evenly across genres, without one particular type dominating. This is quite different from what we have seen in the App Store free apps dataset.
- Free apps designed for practical purposes are evidently more numerous on Google Play, but we still cannot make inferences about the popularity of particular genres. We can only conclude that iOS developers appear more inclined to build entertainment products than their Android counterparts.

The analysis above suggests that we need an alternative measure to explore which app genres are most popular, so that potential developers can use this information to target market segments better.
- For the App Store, the number of **rating actions** for each app is available in the dataset; averaged within each genre, it could serve as a proxy of genre popularity.
- For Google Play, the number of **app installations** for each app is available; averaged within each genre, it could serve as an alternative proxy of genre popularity.
<br><br>

**First**, we start by looking at the percentage frequency of each genre within our datasets in descending order and look for intersections, as those genres are more likely to be promising in both stores, i.e. to be popular among both groups of users.

In [58]:
printmd('**Percentage share in the total number of rating actions on iOS devices**:')
iOS_genres = freq_table(Free_ios, -5) # create a dictionary with genre names as keys and 
                                      # their frequencies

iOS_table = [] # create list of tuples for subsequent sorting

for genre in iOS_genres: # run nested loop to calculate the average amount of rating actions
                         # in each genre
    total = 0
    len_genre = 0
    
    for i in Free_ios:
        genre_app = i[-5]
        if genre_app == genre:
            number_of_ratings = float(i[5])
            total += number_of_ratings
            len_genre += 1
    average = total/len_genre
    table_tuple = (average, genre)
    iOS_table.append(table_tuple)

total_ratings = 0 # Sum of the per-genre averages, used as the denominator for the shares

for i in iOS_table:
    total_ratings += float(i[0])

table_sorted = sorted(iOS_table, reverse = True) # sort tuples in descending order and 
                                                 # print each row in the list
for entry in table_sorted:
    print(entry[1], ':', round(entry[0]/total_ratings*100, 2),'(%)')

**Percentage share in the total number of rating actions on iOS devices**:

Navigation : 11.8 (%)
Reference : 10.87 (%)
Social Networking : 9.99 (%)
Music : 7.98 (%)
Weather : 7.16 (%)
Book : 6.36 (%)
Food & Drink : 4.57 (%)
Finance : 4.44 (%)
Travel : 4.18 (%)
Photo & Video : 3.9 (%)
Shopping : 3.81 (%)
Health & Fitness : 3.19 (%)
Sports : 3.15 (%)
Games : 3.15 (%)
Productivity : 2.99 (%)
News : 2.91 (%)
Utilities : 2.69 (%)
Lifestyle : 2.29 (%)
Entertainment : 1.95 (%)
Business : 1.03 (%)
Education : 0.96 (%)
Catalogs : 0.55 (%)
Medical : 0.08 (%)


   The proposed measure of the most popular genres among iOS users shows that apps for **practical purposes** dominate the landscape, so the popularity of such apps is probably determined to a greater extent by the practical solutions they provide to their users. Navigation, Reference and Social Networking have the highest average number of ratings per app in the App Store, though the difference from other genres is not that large.
   <br>
   <br>
   It is noteworthy that Games account for only 3.15% under this measure (each genre's average number of ratings per app, taken as a share of the sum of these averages), which supports to some extent our previous hypothesis that the number of apps within each category doesn't say much about genre popularity. However, additional measures are needed for complete validation, e.g. the amount of time spent by users in apps of different genres.
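
Because the percentages above are shares of the summed per-genre averages rather than shares of all rating actions, the two readings can differ noticeably. The sketch below illustrates this on made-up numbers; the `apps` list and all variable names are assumptions for the example. It also computes the averages in a single pass instead of a nested loop.

```python
from collections import defaultdict

# Hypothetical (genre, rating_count) pairs.
apps = [('Navigation', 900), ('Games', 100), ('Games', 200), ('Games', 300)]

totals = defaultdict(float)   # genre -> total number of ratings
counts = defaultdict(int)     # genre -> number of apps
for genre, n_ratings in apps:             # one pass over the data
    totals[genre] += n_ratings
    counts[genre] += 1

averages = {g: totals[g] / counts[g] for g in totals}

sum_of_averages = sum(averages.values())
grand_total = sum(totals.values())

# Share of each genre's average in the sum of averages (the notebook's measure).
share_of_averages = {g: round(a / sum_of_averages * 100, 2)
                     for g, a in averages.items()}
# Share of each genre's total in all rating actions.
share_of_total = {g: round(t / grand_total * 100, 2)
                  for g, t in totals.items()}

print('Share of summed averages:', share_of_averages)
print('Share of all ratings:', share_of_total)
```

In this toy case a genre with many moderately rated apps looks small by average but holds a large part of all rating actions, so the two measures answer different questions.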

In [59]:
printmd('**Percentage share in the total number of installations on Android devices**:')
Andriod_genres = freq_table(Free_android, 1) # create a dictionary with category names as keys 
                                             # and their frequencies

Android_table = [] # create list of tuples for subsequent sorting

for category in Andriod_genres: # run nested loop to calculate the 
                                # average number of installations in each category
    total = 0
    len_category = 0
    
    for i in Free_android:
        category_app = i[1]
        if category_app == category:
            number_of_installs = i[5]
            number_of_installs = number_of_installs.replace('+', '') # we need to get rid of 
                                                    # non-numeric signs to calculate averages
            number_of_installs = number_of_installs.replace(',', '')
            total += float(number_of_installs)
            len_category += 1
    average = total/len_category
    
    table_tuple = (average, category)
    Android_table.append(table_tuple)

    
for i in Free_android:      # get rid of non-numeric signs in the number of installations 
                            # in the dataset itself
    i[5] = i[5].replace('+', '')
    i[5] = i[5].replace(',', '')
    
    
total_installations = 0  # Sum of the per-category averages, used as the denominator

for i in Android_table:
    total_installations += float(i[0])

table_sorted = sorted(Android_table, reverse = True) # sort tuples in descending 
                                                     # order and print each row in the list
for entry in table_sorted:
    print(entry[1], ':', round(entry[0]/total_installations*100, 2),'(%)')

**Percentage share in the total number of installations on Android devices**:

COMMUNICATION : 15.97 (%)
VIDEO_PLAYERS : 10.33 (%)
SOCIAL : 9.65 (%)
PHOTOGRAPHY : 7.41 (%)
PRODUCTIVITY : 6.97 (%)
GAME : 6.46 (%)
TRAVEL_AND_LOCAL : 5.81 (%)
ENTERTAINMENT : 4.89 (%)
TOOLS : 4.51 (%)
NEWS_AND_MAGAZINES : 3.96 (%)
BOOKS_AND_REFERENCE : 3.64 (%)
SHOPPING : 2.92 (%)
PERSONALIZATION : 2.19 (%)
WEATHER : 2.11 (%)
HEALTH_AND_FITNESS : 1.75 (%)
MAPS_AND_NAVIGATION : 1.68 (%)
FAMILY : 1.54 (%)
SPORTS : 1.54 (%)
ART_AND_DESIGN : 0.82 (%)
FOOD_AND_DRINK : 0.8 (%)
EDUCATION : 0.76 (%)
BUSINESS : 0.71 (%)
LIFESTYLE : 0.6 (%)
HOUSE_AND_HOME : 0.57 (%)
FINANCE : 0.57 (%)
COMICS : 0.36 (%)
DATING : 0.36 (%)
AUTO_AND_VEHICLES : 0.27 (%)
LIBRARIES_AND_DEMO : 0.27 (%)
PARENTING : 0.23 (%)
BEAUTY : 0.21 (%)
EVENTS : 0.11 (%)
MEDICAL : 0.05 (%)


   The proposed measure of the most popular genres among Android users shows that both apps for practical purposes and entertainment apps are installed quite frequently. Our metric indicates that two of the top three categories (COMMUNICATION and SOCIAL) are related to social communication, together accounting for nearly 26% under this measure. 
   <br>
   <br>
   While calculating the percentage shares we found that installation numbers are not given as precise figures, but rather as levels, e.g. 1,000+, 500+ etc. This makes our calculated percentages less precise as well, turning them into approximations.
   <br>
   <br>
   These results are also quite different from what we saw when we counted the number of apps within each genre. Tools, Entertainment and Education are no longer in the top three: ENTERTAINMENT and TOOLS rank eighth and ninth, while EDUCATION drops to 21st.
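
Since installs are recorded as levels, each value can only be read as a lower bound. A minimal sketch of that conversion follows; `parse_installs` is a hypothetical helper name and the sample strings are illustrative.

```python
def parse_installs(text):
    """Convert a level such as '1,000+' into its integer lower bound."""
    return int(text.replace(',', '').replace('+', ''))

levels = ['1,000+', '500+', '1,000,000,000+', '0']
lower_bounds = [parse_installs(t) for t in levels]
print(lower_bounds)
```

An app listed as '1,000+' may have anywhere between 1,000 and the next level, so averages built on these values understate the true figures by an unknown amount.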

   In sum, measuring the popularity of app genres by the number of apps within each genre, by the number of ratings given and by the number of installations yields different results, though the latter two measures should presumably be closer to the real state of affairs. From the available data we may conclude that apps developed in genres such as **social networking** and **communication** are likely to enjoy more popularity among users, while those related to **books** and **reference** might be the second-best choice on average in both the App Store and Google Play. **Navigation** might also be of interest if concentrating on the iOS app market. 


**Second**, it is necessary to explore the competitive landscape within each genre of interest pointed out above. We begin with social networking and communication, which according to our results are among the more popular genres for both iOS and Android smartphone users. By considering the share of each app in the total number of rating actions for iOS apps and in the total number of installations for Android apps, we may figure out whether the market structure is concentrated, i.e. closer to an oligopoly. If so, it would be harder for new entrants to gain market share. At the same time we may also look at the ratings of different apps within each category: they can be considered an additional facet of popularity and an indicator of the likelihood of in-app purchases.
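
One way to put a number on "concentrated" is to summarise the per-app shares with a concentration ratio (the combined share of the top k apps) or with the Herfindahl-Hirschman index (the sum of squared shares). Neither measure is used in the original analysis; the sketch below is only an illustration on made-up rating counts, and the function names are assumptions.

```python
def shares(values):
    """Turn raw counts into fractions of their total."""
    total = sum(values)
    return [v / total for v in values]

def concentration_ratio(values, k):
    """Combined share of the k largest players (often written CR_k)."""
    ordered = sorted(shares(values), reverse=True)
    return sum(ordered[:k])

def hhi(values):
    """Herfindahl-Hirschman index on a 0-1 scale: the sum of squared shares."""
    return sum(s ** 2 for s in shares(values))

# Hypothetical rating counts for five apps in one genre.
ratings = [400, 300, 150, 100, 50]

cr3 = round(concentration_ratio(ratings, 3), 4)
index = round(hhi(ratings), 4)
print('Top-3 share:', cr3)
print('HHI:', index)
```

Values of either measure closer to 1 point to a market dominated by a few apps, while values near 0 suggest many players of similar size.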

In [60]:
# Calculate each app's share of rating actions within its genre and sort
# the results in descending order for presentation
Social_Networking = []
for i in Free_ios:
    if i[-5] == 'Social Networking':
        Value = float(i[5]) # number of rating actions the app received
        App_name = i[1]     # app's name
        Rating = i[7]       # app's user rating
        Row = (Value, App_name, Rating)  # value comes first so that tuples sort by it
        Social_Networking.append(Row)
     
total = 0
for i in Social_Networking:
    total += float(i[0])
    
table_sorted = sorted(Social_Networking, reverse = True)
printmd('**iOS apps share in App Store Social Networking genre**')
print('\n')
print('Total number of apps:', len(table_sorted))
print('\n')
for entry in table_sorted[:20]:
        print(entry[1], ':', round(float(entry[0])/total*100, 2),'(%)','|','User rating:',entry[2])

**iOS apps share in App Store Social Networking genre**



Total number of apps: 104


Facebook : 39.23 (%) | User rating: 3.5
Pinterest : 14.0 (%) | User rating: 4.5
Skype for iPhone : 4.93 (%) | User rating: 3.5
Messenger : 4.63 (%) | User rating: 3.0
Tumblr : 4.41 (%) | User rating: 4.0
WhatsApp Messenger : 3.79 (%) | User rating: 4.5
Kik : 3.44 (%) | User rating: 4.0
ooVoo – Free Video Call, Text and Voice : 2.34 (%) | User rating: 4.5
TextNow - Unlimited Text + Calls : 2.18 (%) | User rating: 3.5
Viber Messenger – Text & Call : 2.17 (%) | User rating: 4.5
Followers - Social Analytics For Instagram : 1.49 (%) | User rating: 4.5
MeetMe - Chat and Meet New People : 1.28 (%) | User rating: 4.0
We Heart It - Fashion, wallpapers, quotes, tattoos : 1.19 (%) | User rating: 5.0
InsTrack for Instagram - Analytics Plus More : 1.13 (%) | User rating: 4.5
Tango - Free Video Call, Voice and Chat : 0.99 (%) | User rating: 4.5
LinkedIn : 0.95 (%) | User rating: 3.5
Match™ - #1 Dating App. : 0.8 (%) | User rating: 3.0
Skype for iPad : 0.79 (%) | User ra

In [61]:
# Calculate each app's share of installs within its genre and sort the results
# in descending order for presentation
COMMUNICATION = []
for i in Free_android:
    if i[1] == 'COMMUNICATION':
        Value = float(i[5]) # number of installs of the app
        App_name = i[0]     # app's name
        Rating = i[2]       # app's user rating
        Row = (Value, App_name, Rating)    # value comes first so that tuples sort by it
        COMMUNICATION.append(Row)
        
total = 0
for i in COMMUNICATION:
    total += float(i[0])
    
table_sorted = sorted(COMMUNICATION, reverse = True)
printmd('**Android apps share in Google Play Communication genre**')
print('\n')
print('Total number of apps:', len(table_sorted))
print('\n')
for entry in table_sorted[:20]:
        print(entry[1], ':', round(float(entry[0])/total*100, 2),'(%)','|','User rating:',entry[2])

**Android apps share in Google Play Communication genre**



Total number of apps: 287


WhatsApp Messenger : 9.06 (%) | User rating: 4.4
Skype - free IM & video calls : 9.06 (%) | User rating: 4.1
Messenger – Text and Video Chat for Free : 9.06 (%) | User rating: 4.0
Hangouts : 9.06 (%) | User rating: 4.0
Google Chrome: Fast & Secure : 9.06 (%) | User rating: 4.3
Gmail : 9.06 (%) | User rating: 4.3
imo free video calls and chat : 4.53 (%) | User rating: 4.3
Viber Messenger : 4.53 (%) | User rating: 4.3
UC Browser - Fast Download Private & Secure : 4.53 (%) | User rating: 4.5
LINE: Free Calls & Messages : 4.53 (%) | User rating: 4.2
Google Duo - High Quality Video Calls : 4.53 (%) | User rating: 4.6
imo beta free calls and text : 0.91 (%) | User rating: 4.3
Yahoo Mail – Stay Organized : 0.91 (%) | User rating: 4.3
Who : 0.91 (%) | User rating: 4.3
WeChat : 0.91 (%) | User rating: 4.2
UC Browser Mini -Tiny Fast Private & Secure : 0.91 (%) | User rating: 4.4
Truecaller: Caller ID, SMS spam blocking & Dialer : 0.91 (%) | User rating: 4.5
Telegram

In [62]:
# Same calculation for the Google Play Social genre
SOCIAL = []
for i in Free_android:
    if i[1] == 'SOCIAL':
        Value = float(i[5]) # number of installs of the app
        App_name = i[0]     # app's name
        Rating = i[2]       # app's user rating
        Row = (Value, App_name, Rating)
        SOCIAL.append(Row)
        
total = 0
for i in SOCIAL:
    total += float(i[0])
    
table_sorted = sorted(SOCIAL, reverse = True)
printmd('**Android apps share in Google Play Social genre**')
print('\n')
print('Total number of apps:', len(table_sorted))
print('\n')
for entry in table_sorted[:20]:
        print(entry[1], ':', round(float(entry[0])/total*100, 2),'(%)','|','User rating:',entry[2])

**Android apps share in Google Play Social genre**



Total number of apps: 236


Instagram : 18.22 (%) | User rating: 4.5
Google+ : 18.22 (%) | User rating: 4.2
Facebook : 18.22 (%) | User rating: 4.1
Snapchat : 9.11 (%) | User rating: 4.0
Facebook Lite : 9.11 (%) | User rating: 4.3
VK : 1.82 (%) | User rating: 3.8
Tumblr : 1.82 (%) | User rating: 4.4
Tik Tok - including musical.ly : 1.82 (%) | User rating: 4.4
Tango - Live Video Broadcast : 1.82 (%) | User rating: 4.3
Pinterest : 1.82 (%) | User rating: 4.6
LinkedIn : 1.82 (%) | User rating: 4.2
Badoo - Free Chat & Dating App : 1.82 (%) | User rating: 4.3
BIGO LIVE - Live Stream : 1.82 (%) | User rating: 4.4
ooVoo Video Calls, Messaging & Stories : 0.91 (%) | User rating: 4.3
Zello PTT Walkie Talkie : 0.91 (%) | User rating: 4.4
SKOUT - Meet, Chat, Go Live : 0.91 (%) | User rating: 4.3
POF Free Dating App : 0.91 (%) | User rating: 4.2
MeetMe: Chat & Meet New People : 0.91 (%) | User rating: 4.2
textPlus: Free Text & Calls : 0.18 (%) | User rating: 4.1
magicApp Calling & Messaging : 0.

Despite its average user rating, Facebook has almost 40% of all ratings given to iOS apps in the Social Networking genre; its closest rival, Pinterest, has only 14%, and the shares of all the others are far smaller. A more fragmented competitive landscape can be observed in the Google Play Store - in both genres that roughly correspond to the Social Networking genre in the App Store there are several leaders besides Facebook, many of which are Google products. Nevertheless, even here the top five apps account for about 45% of installs in Communication and about 73% in Social, which implies high market concentration and the market power of the apps affiliated with Google and Apple. Moreover, the market for the social communication genre seems to be already saturated, as there are over 100 apps in the App Store and more than 500 apps in the Google Play Store, satisfying various demands in this market. All in all, it is **not recommended** to develop a new app in this category - the risk of failing to gain users and a meaningful market share is very high.
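
The degree of concentration discussed above can be summarised by adding up the shares of the largest apps, often called a concentration ratio. The sketch below uses the top shares copied from the three tables printed above; the function name `concentration_ratio` is an assumption introduced for illustration.

```python
def concentration_ratio(shares, n):
    """Sum of the n largest shares (in %), rounded to two decimals."""
    return round(sum(sorted(shares, reverse=True)[:n]), 2)

# Top shares (%) copied from the tables printed above
ios_social_networking = [39.23, 14.0, 4.93, 4.63, 4.41, 3.79, 3.44]
android_communication = [9.06, 9.06, 9.06, 9.06, 9.06, 9.06, 4.53]
android_social = [18.22, 18.22, 18.22, 9.11, 9.11, 1.82, 1.82]

print('iOS Social Networking, top 5:', concentration_ratio(ios_social_networking, 5))
print('Android Communication, top 5:', concentration_ratio(android_communication, 5))
print('Android Social, top 5:', concentration_ratio(android_social, 5))
```

Because the Android shares rest on install levels, apps within the same level get identical shares, so which apps count as the "top five" there is partly arbitrary.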
<br>
<br>
Let's proceed further with apps related to reading books and references.

In [63]:
# Share of installs for each app in the Google Play Books & Reference genre
BOOKS_AND_REFERENCE = []
for i in Free_android:
    if i[1] == 'BOOKS_AND_REFERENCE':
        Value = float(i[5]) # number of installs of the app
        App_name = i[0]     # app's name
        Rating = i[2]       # app's user rating
        Row = (Value, App_name, Rating)
        BOOKS_AND_REFERENCE.append(Row)
        
total = 0
for i in BOOKS_AND_REFERENCE:
    total += float(i[0])
    
table_sorted = sorted(BOOKS_AND_REFERENCE, reverse = True)
printmd('**Android apps share in Google Play Book & Reference genre**')
print('\n')
print('Total number of apps:', len(table_sorted))
print('\n')
for entry in table_sorted[:20]:
        print(entry[1], ':', round(float(entry[0])/total*100, 2),'(%)','|','User rating:',entry[2])

**Android apps share in Google Play Book & Reference genre**



Total number of apps: 190


Google Play Books : 60.03 (%) | User rating: 3.9
Wattpad 📖 Free Books : 6.0 (%) | User rating: 4.6
Bible : 6.0 (%) | User rating: 4.7
Audiobooks from Audible : 6.0 (%) | User rating: 4.5
Amazon Kindle : 6.0 (%) | User rating: 4.2
Wikipedia : 0.6 (%) | User rating: 4.4
Spanish English Translator : 0.6 (%) | User rating: 4.2
Quran for Android : 0.6 (%) | User rating: 4.7
Oxford Dictionary of English : Free : 0.6 (%) | User rating: 4.1
NOOK: Read eBooks & Magazines : 0.6 (%) | User rating: 4.5
Moon+ Reader : 0.6 (%) | User rating: 4.4
JW Library : 0.6 (%) | User rating: 4.9
HTC Help : 0.6 (%) | User rating: 4.2
FBReader: Favorite Book Reader : 0.6 (%) | User rating: 4.5
English Hindi Dictionary : 0.6 (%) | User rating: 4.4
English Dictionary - Offline : 0.6 (%) | User rating: 4.4
Dictionary.com: Find Definitions for English Words : 0.6 (%) | User rating: 4.6
Dictionary - Merriam-Webster : 0.6 (%) | User rating: 4.5
Dictionary : 0.6 (%) | User rating: 4.5
Cool

In [64]:
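# Share of rating actions for each app in the App Store Book genre
Book = []
for i in Free_ios:
    if i[-5] == 'Book':
        Value = float(i[5]) # number of rating actions the app received
        App_name = i[1]     # app's name
        Rating = i[7]       # app's user rating
        Row = (Value, App_name, Rating)
        Book.append(Row)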
     
total = 0
for i in Book:
    total += float(i[0])
    
table_sorted = sorted(Book, reverse = True)
printmd('**iOS apps share in Apple Store Book genre**')
print('\n')
print('Total number of apps:', len(table_sorted))
print('\n')
for entry in table_sorted:
        print(entry[1], ':', round(float(entry[0])/total*100, 2),'(%)','|','User rating:',entry[2])

**iOS apps share in Apple Store Book genre**



Total number of apps: 12


Kindle – Read eBooks, Magazines & Textbooks : 45.29 (%) | User rating: 3.5
Audible – audio books, original series & podcasts : 18.91 (%) | User rating: 4.5
Color Therapy Adult Coloring Book for Adults : 15.1 (%) | User rating: 5.0
OverDrive – Library eBooks and Audiobooks : 11.76 (%) | User rating: 4.0
HOOKED - Chat Stories : 8.59 (%) | User rating: 4.5
BookShout: Read eBooks & Track Your Reading Goals : 0.16 (%) | User rating: 4.0
Dr. Seuss Treasury — 50 best kids books : 0.08 (%) | User rating: 4.5
Green Riding Hood : 0.07 (%) | User rating: 4.0
Weirdwood Manor : 0.04 (%) | User rating: 4.5
MangaZERO - comic reader : 0.0 (%) | User rating: 4.5
ikouhoushi : 0.0 (%) | User rating: 0.0
MangaTiara - love comic reader : 0.0 (%) | User rating: 0.0


In [65]:
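# Share of rating actions for each app in the App Store Reference genre
Reference = []
for i in Free_ios:
    if i[-5] == 'Reference':
        Value = float(i[5]) # number of rating actions the app received
        App_name = i[1]     # app's name
        Rating = i[7]       # app's user rating
        Row = (Value, App_name, Rating)
        Reference.append(Row)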
     
total = 0
for i in Reference:
    total += float(i[0])
    
table_sorted = sorted(Reference, reverse = True)
printmd('**iOS apps share in Apple Store Reference genre**')
print('\n')
print('Total number of apps:', len(table_sorted))
print('\n')
for entry in table_sorted:
        print(entry[1], ':', round(float(entry[0])/total*100, 2),'(%)','|','User rating:',entry[2])

**iOS apps share in Apple Store Reference genre**



Total number of apps: 17


Bible : 73.09 (%) | User rating: 4.5
Dictionary.com Dictionary & Thesaurus : 14.83 (%) | User rating: 4.0
Dictionary.com Dictionary & Thesaurus for iPad : 4.02 (%) | User rating: 4.5
Google Translate : 1.99 (%) | User rating: 3.5
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 1.37 (%) | User rating: 4.5
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 1.3 (%) | User rating: 4.5
Merriam-Webster Dictionary : 1.25 (%) | User rating: 4.5
Night Sky : 0.9 (%) | User rating: 4.5
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 0.63 (%) | User rating: 4.0
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 0.35 (%) | User rating: 4.0
GUNS MODS for Minecraft PC Edition - Mods Tools : 0.11 (%) | User rating: 4.0
Guides for Pokémon GO - Pokemon GO News and Cheats : 0.06 (%) | User rating: 4.5
WWDC : 0.06 (%) | User rating: 3.5
Horror Maps for Minecraft PE - Download Th

A similar picture is observed in the Books & Reference genre for Android apps, where Google Play Books has 60% of installs, and in the Book genre for iOS apps, where Kindle, with its average user rating, clearly dominates with 45% of the rating actions within the category. At the same time, the smaller number of apps in the App Store - 29 vs. 190 in the Google Play Store - can probably be explained by the presence of the Apple Books app available to iOS users. All in all, universal products of Apple and Google dominate the competitive landscape here.
<br>
<br>
Nonetheless, in the Reference genre of iOS apps we may see that there is one dominating app - **Bible** (also the third most popular app installed on Android smartphones - 6% of total installs within its category). It accounts for 73% of the rating actions in this genre, while the total number of apps is only 17 - this niche is likely **not saturated** yet. In addition, there are other apps in this category that are designed for specific books or games - the Quran and Minecraft. The fact that the app for the **Bible stands out** so much is revealing - it could be a good idea to develop a similar app geared to just one book or game. We would recommend considering options such as an app for reading the [world's greatest books](https://thegreatestbooks.org/) with some features, like cards with critical reviews, brief summaries, etc., or an app for reading religious texts and prayers aimed at a specific religion, such as Buddhism or Judaism.

Let's proceed further with the genre of navigation.

In [66]:
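# Share of rating actions for each app in the App Store Navigation genre
Navigation = []
for i in Free_ios:
    if i[-5] == 'Navigation':
        Value = float(i[5]) # number of rating actions the app received
        App_name = i[1]     # app's name
        Rating = i[7]       # app's user rating
        Row = (Value, App_name, Rating)
        Navigation.append(Row)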
     
total = 0
for i in Navigation:
    total += float(i[0])
    
table_sorted = sorted(Navigation, reverse = True)
printmd('**iOS apps share in Apple Store Navigation genre**')
print('\n')
print('Total number of apps:', len(table_sorted))
print('\n')
for entry in table_sorted:
        print(entry[1], ':', round(float(entry[0])/total*100, 2),'(%)','|','User rating:',entry[2])

**iOS apps share in Apple Store Navigation genre**



Total number of apps: 6


Waze - GPS Navigation, Maps & Real-time Traffic : 66.8 (%) | User rating: 4.5
Google Maps - Navigation & Transit : 29.99 (%) | User rating: 4.5
Geocaching® : 2.48 (%) | User rating: 3.5
CoPilot GPS – Car Navigation & Offline Maps : 0.69 (%) | User rating: 4.0
ImmobilienScout24: Real Estate Search in Germany : 0.04 (%) | User rating: 3.5
Railway Route Search : 0.0 (%) | User rating: 3.0


In [67]:
# Share of installs for each app in the Google Play Maps & Navigation genre
MAPS_AND_NAVIGATION = []
for i in Free_android:
    if i[1] == 'MAPS_AND_NAVIGATION':
        Value = float(i[5]) # number of installs of the app
        App_name = i[0]     # app's name
        Rating = i[2]       # app's user rating
        Row = (Value, App_name, Rating)
        MAPS_AND_NAVIGATION.append(Row)
        
total = 0
for i in MAPS_AND_NAVIGATION:
    total += float(i[0])
    
table_sorted = sorted(MAPS_AND_NAVIGATION, reverse = True)
printmd('**Android apps share in Google Play Maps & Navigation genre**')
print('\n')
print('Total number of apps:', len(table_sorted))
print('\n')
for entry in table_sorted[:20]:
        print(entry[1], ':', round(float(entry[0])/total*100, 2),'(%)','|','User rating:',entry[2])

**Android apps share in Google Play Maps & Navigation genre**



Total number of apps: 123


Waze - GPS, Maps, Traffic Alerts & Live Navigation : 20.07 (%) | User rating: 4.6
Uber : 20.07 (%) | User rating: 4.2
GPS Navigation & Offline Maps Sygic : 10.04 (%) | User rating: 4.4
Free GPS Navigation : 10.04 (%) | User rating: 4.2
Yandex.Transport : 2.01 (%) | User rating: 4.0
Yahoo! transit guide free timetable, operation information, transfer search : 2.01 (%) | User rating: 4.4
TomTom GPS Navigation Traffic : 2.01 (%) | User rating: 4.1
Subway Terminator: Smarter Subway : 2.01 (%) | User rating: 4.3
Moovit: Bus Time & Train Time Live Info : 2.01 (%) | User rating: 4.4
Maps, GPS Navigation & Directions, Street View : 2.01 (%) | User rating: 4.2
MapQuest: Directions, Maps, GPS & Navigation : 2.01 (%) | User rating: 4.1
DB Navigator : 2.01 (%) | User rating: 4.0
Compass : 2.01 (%) | User rating: 4.3
AT&T DriveMode : 2.01 (%) | User rating: 3.3
Yanosik: "antyradar", traffic jams, navigation, camera : 1.0 (%) | User rating: 4.4
Yandex.Trains : 1.0 (%) |

The Navigation genre seems to satisfy demand to a large extent with a few major apps, backed primarily by Google (e.g. Waze). The number of related apps in the App Store is small, which can probably be explained by the fact that iOS users use Apple Maps. In sum, it would be almost impossible for new apps to gain a significant number of users in this segment without being backed by major tech giants.
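
One way to express such concentration in a single number is the Herfindahl-Hirschman index, the sum of squared percentage shares, which reaches 10,000 when one app holds the whole market. This index is not used in the original analysis; the sketch below applies it, as an illustration only, to the six rating-action shares from the iOS Navigation table above, and the function name `herfindahl_index` is an assumption.

```python
def herfindahl_index(shares_percent):
    """Sum of squared shares given in percent; 10,000 means a single app holds everything."""
    return sum(share ** 2 for share in shares_percent)

# Shares (%) of all six apps from the iOS Navigation table above
ios_navigation = [66.8, 29.99, 2.48, 0.69, 0.04, 0.0]

hhi = herfindahl_index(ios_navigation)
print('Index for iOS Navigation:', round(hhi))
```

The shares here are based on rating actions rather than revenue or usage, so the result is only a rough indication of how strongly the two leading apps dominate the genre.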

The Android operating system is used much more widely than iOS worldwide - 72.6% vs. 26.7% in terms of the corresponding [market shares](https://gs.statcounter.com/os-market-share/mobile/worldwide). Therefore, the potential benefits of having a successful app in the Google Play Store are considerably larger. For this reason it seems appropriate to consider other top genres there and try to figure out whether there is a niche to conquer.

Let's start exploring Google Play apps in greater detail with the genre of video players.

In [68]:
# Share of installs for each app in the Google Play Video Players genre
VIDEO_PLAYERS = []
for i in Free_android:
    if i[1] == 'VIDEO_PLAYERS':
        Value = float(i[5]) # number of installs of the app
        App_name = i[0]     # app's name
        Rating = i[2]       # app's user rating
        Row = (Value, App_name, Rating)
        VIDEO_PLAYERS.append(Row)
        
total = 0
for i in VIDEO_PLAYERS:
    total += float(i[0])
    
table_sorted = sorted(VIDEO_PLAYERS, reverse = True)
printmd('**Android apps share in Google Play Video Players genre**')
print('\n')
print('Total number of apps:', len(table_sorted))
print('\n')
for entry in table_sorted[:20]:
        print(entry[1], ':', round(float(entry[0])/total*100, 2),'(%)','|','User rating:',entry[2])

**Android apps share in Google Play Video Players genre**



Total number of apps: 158


YouTube : 25.44 (%) | User rating: 4.3
Google Play Movies & TV : 25.44 (%) | User rating: 3.7
MX Player : 12.72 (%) | User rating: 4.5
VivaVideo - Video Editor & Photo Movie : 2.54 (%) | User rating: 4.6
VideoShow-Video Editor, Video Maker, Beauty Camera : 2.54 (%) | User rating: 4.6
VLC for Android : 2.54 (%) | User rating: 4.4
Motorola Gallery : 2.54 (%) | User rating: 3.9
Motorola FM Radio : 2.54 (%) | User rating: 3.9
Dubsmash : 2.54 (%) | User rating: 4.2
Vote for : 1.27 (%) | User rating: 4.2
Vigo Video : 1.27 (%) | User rating: 4.3
VMate : 1.27 (%) | User rating: 4.2
Samsung Video Library : 1.27 (%) | User rating: 4.4
Ringdroid : 1.27 (%) | User rating: 4.4
MiniMovie - Free Video and Slideshow Editor : 1.27 (%) | User rating: 4.5
LIKE – Magic Video Maker & Community : 1.27 (%) | User rating: 4.6
KineMaster – Pro Video Editor : 1.27 (%) | User rating: 4.5
HD Video Downloader : 2018 Best video mate : 1.27 (%) | User rating: 3.7
DU Recorder – Screen Re

Among video players in Google Play the top three apps account for over 60% of the total number of installs. Two of them are actually Google products. Considering also the fact that there are already 158 apps in this genre, we may conclude that the prospects of developing an app here are rather poor.

Let's proceed further with photography.

In [69]:
# Share of installs for each app in the Google Play Photography genre
PHOTOGRAPHY = []
for i in Free_android:
    if i[1] == 'PHOTOGRAPHY':
        Value = float(i[5]) # number of installs of the app
        App_name = i[0]     # app's name
        Rating = i[2]       # app's user rating
        Row = (Value, App_name, Rating)
        PHOTOGRAPHY.append(Row)
        
total = 0
for i in PHOTOGRAPHY:
    total += float(i[0])
    
table_sorted = sorted(PHOTOGRAPHY, reverse = True)
printmd('**Android apps share in Google Play Photography genre**')
print('\n')
print('Total number of apps:', len(table_sorted))
print('\n')
for entry in table_sorted[:20]:
        print(entry[1], ':', round(float(entry[0])/total*100, 2),'(%)','|','User rating:',entry[2])

**Android apps share in Google Play Photography genre**



Total number of apps: 261


Google Photos : 21.48 (%) | User rating: 4.5
Z Camera - Photo Editor, Beauty Selfie, Collage : 2.15 (%) | User rating: 4.4
YouCam Perfect - Selfie Photo Editor : 2.15 (%) | User rating: 4.5
YouCam Makeup - Magic Selfie Makeovers : 2.15 (%) | User rating: 4.6
Sweet Selfie - selfie camera, beauty cam, photo edit : 2.15 (%) | User rating: 4.6
S Photo Editor - Collage Maker , Photo Collage : 2.15 (%) | User rating: 4.4
Retrica : 2.15 (%) | User rating: 4.3
PicsArt Photo Studio: Collage Maker & Pic Editor : 2.15 (%) | User rating: 4.5
PhotoGrid: Video & Pic Collage Maker, Photo Editor : 2.15 (%) | User rating: 4.6
Photo Editor Pro : 2.15 (%) | User rating: 4.3
Photo Editor Collage Maker Pro : 2.15 (%) | User rating: 4.5
Photo Collage Editor : 2.15 (%) | User rating: 4.2
LINE Camera - Photo editor : 2.15 (%) | User rating: 4.3
Cymera Camera- Photo Editor, Filter,Collage,Layout : 2.15 (%) | User rating: 4.4
Candy Camera - selfie, beauty camera, photo editor : 2.1

A similar competitive landscape can be observed in the Photography genre, though the market concentration is lower, with only one leader - Google Photos (21.5%). It seems that this niche is overly saturated with 261 apps - hence only very specialized products with a narrow client base may survive here.
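
Many apps in the Google Play tables show identical shares (2.15% in the Photography table, for instance) because installs are reported as levels. When `(value, name, rating)` tuples are sorted with `reverse = True`, such ties fall back to the app name in reverse alphabetical order, which is why Z Camera is listed directly after Google Photos. The sketch below shows an explicit sort key that orders ties alphabetically instead; the install values are illustrative assumptions.

```python
# (installs lower bound, app name, rating); install values are illustrative
rows = [
    (100000000.0, 'Retrica', '4.3'),
    (1000000000.0, 'Google Photos', '4.5'),
    (100000000.0, 'Z Camera - Photo Editor, Beauty Selfie, Collage', '4.4'),
    (100000000.0, 'LINE Camera - Photo editor', '4.3'),
]

# Plain reverse sort: ties on installs fall back to reverse alphabetical names
default_order = [name for _, name, _ in sorted(rows, reverse=True)]

# Explicit key: installs in descending order, then names in ascending order
alphabetical_ties = [name for _, name, _ in sorted(rows, key=lambda r: (-r[0], r[1]))]

print(default_order)
print(alphabetical_ties)
```

Either way, the order of apps within one install level carries no information about their relative popularity.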

Let's further examine the games genre, where there are so many different apps.

In [74]:
# Share of installs for each app in the Google Play Game genre
GAME = []
for i in Free_android:
    if i[1] == 'GAME':
        Value = float(i[5]) # number of installs of the app
        App_name = i[0]     # app's name
        Rating = i[2]       # app's user rating
        Row = (Value, App_name, Rating)
        GAME.append(Row)
        
total = 0
for i in GAME:
    total += float(i[0])
    
table_sorted = sorted(GAME, reverse = True)
printmd('**Android apps share in Google Play Game genre**')
print('\n')
print('Total number of apps:', len(table_sorted))
print('\n')
for entry in table_sorted[:20]:
        print(entry[1], ':', round(float(entry[0])/total*100, 2),'(%)','|','User rating:',entry[2])

**Android apps share in Google Play Game genre**



Total number of apps: 850


Subway Surfers : 7.56 (%) | User rating: 4.5
Temple Run 2 : 3.78 (%) | User rating: 4.3
Pou : 3.78 (%) | User rating: 4.3
My Talking Tom : 3.78 (%) | User rating: 4.5
Candy Crush Saga : 3.78 (%) | User rating: 4.4
slither.io : 0.76 (%) | User rating: 4.4
Zombie Tsunami : 0.76 (%) | User rating: 4.4
Yes day : 0.76 (%) | User rating: 4.5
Vector : 0.76 (%) | User rating: 4.4
Trivia Crack : 0.76 (%) | User rating: 4.5
Traffic Racer : 0.76 (%) | User rating: 4.5
Temple Run : 0.76 (%) | User rating: 4.3
Talking Tom Gold Run : 0.76 (%) | User rating: 4.6
Super Mario Run : 0.76 (%) | User rating: 3.7
Sonic Dash : 0.76 (%) | User rating: 4.5
Sniper 3D Gun Shooter: Free Shooting Games - FPS : 0.76 (%) | User rating: 4.6
Smash Hit : 0.76 (%) | User rating: 4.5
Skater Boy : 0.76 (%) | User rating: 4.3
Shadow Fight 2 : 0.76 (%) | User rating: 4.6
Score! Hero : 0.76 (%) | User rating: 4.6


In [71]:
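# Share of rating actions for each app in the App Store Games genre
Games = []
for i in Free_ios:
    if i[-5] == 'Games':
        Value = float(i[5]) # number of rating actions the app received
        App_name = i[1]     # app's name
        Rating = i[7]       # app's user rating
        Row = (Value, App_name, Rating)
        Games.append(Row)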
     
total = 0
for i in Games:
    total += float(i[0])
    
table_sorted = sorted(Games, reverse = True)
printmd('**iOS apps share in Apple Store Games genre**')
print('\n')
print('Total number of apps:', len(table_sorted))
print('\n')
for entry in table_sorted[:20]:
        print(entry[1], ':', round(float(entry[0])/total*100, 2),'(%)','|','User rating:',entry[2])

**iOS apps share in Apple Store Games genre**



Total number of apps: 1860


Clash of Clans : 4.99 (%) | User rating: 4.5
Temple Run : 4.04 (%) | User rating: 4.5
Candy Crush Saga : 2.25 (%) | User rating: 4.5
Angry Birds : 1.93 (%) | User rating: 4.5
Subway Surfers : 1.65 (%) | User rating: 4.5
Solitaire : 1.59 (%) | User rating: 4.5
CSR Racing : 1.59 (%) | User rating: 4.5
Crossy Road - Endless Arcade Hopper : 1.57 (%) | User rating: 4.5
Injustice: Gods Among Us : 1.43 (%) | User rating: 4.5
Hay Day : 1.33 (%) | User rating: 4.5
PAC-MAN : 1.19 (%) | User rating: 3.0
DragonVale : 1.18 (%) | User rating: 4.5
Head Soccer : 1.13 (%) | User rating: 5.0
Despicable Me: Minion Rush : 1.09 (%) | User rating: 4.5
The Sims™ FreePlay : 1.05 (%) | User rating: 4.5
Sonic Dash : 0.98 (%) | User rating: 4.5
8 Ball Pool™ : 0.98 (%) | User rating: 4.5
Tiny Tower - Free City Building : 0.97 (%) | User rating: 4.5
Jetpack Joyride : 0.95 (%) | User rating: 4.5
Bike Race - Top Motorcycle Racing Games : 0.95 (%) | User rating: 4.5


Finally, the Games genre contains the largest number of apps in the App Store (1860) and the second largest in Google Play (850). As might be expected, the competitive landscape is very fragmented, without any clear leader. The competition is therefore very harsh, while the very nature of the entertainment genre implies that after a certain period of hype or, at least, moderate popularity an app is going to fade from users' sight. In other words, developing an app in this genre will most likely bring only short-term, moderate benefits.

# **Conclusion**


The ultimate goal of this data analysis is to figure out the most promising app profile. Using our proxies for genre popularity - the number of installs for Google Play apps and the number of rating actions for App Store apps - we may conclude that the **books and reference genre might be quite promising**. Namely, we recommend that app developers looking for the most promising niches consider designing a free app aimed at **specific user needs**, rather than a universal reading tool, of which there is an abundance.
- The first option could be an app with the **greatest books** of all time or one **extremely popular book** like "Harry Potter", with some special features.
- The second option is a continuation of the first one - to build an app around a **famous author or personality**, where the books would reflect, for example, certain stages of their intellectual life.
- The third option could be a feature-rich app for reading **religious texts** and prayers.


In addition, the proposed genre can be considered practical and entertaining at the same time, depending on the needs of a particular user. All other major genres are either dominated by the products of tech giants like Google and Apple (e.g. navigation), or very fragmented and highly competitive (e.g. games). It is noteworthy that for the former group of apps a high user rating is not required to be at the top - easy accessibility and network externalities seemingly play the decisive role in their prevalence, as shown by our metrics.

Considering the **strategy to enter** the chosen niche, we would propose the following:
- build the first version of the new app and start with Google Play, as there are far more users there;
- get the initial feedback from users and proceed with product improvement;
- after approximately 6 months, examine the dynamics of the number of users and other major metrics in order to figure out whether the niche and the product are promising in the long term, based on the accumulated evidence;
- if the first stage proves to be a success, proceed with entering the App Store and continue improving the app based on users' iterated feedback and the segment's general dynamics. In-app purchases may be introduced at this stage for additional features that might be popular among users.

Nonetheless, we should also keep in mind that our measures of app popularity are not perfect. Additional metrics are needed in order to provide better insights to companies that develop and distribute apps, as well as to other stakeholders. A more effective metric could be the **average time** spent by users in apps of different genres, but such data are probably not openly available.