###          Analysis of Apps on Google Play Store And Apple Store that are Likely to Attract More Users

This is Portfolio Project, analyzes Apps from Google Play Store and Apple store, in order to provide insights 
on the type of Apps that an imaginary company's developers should create, that will help boost the profitability of the 
imaginary company.

The goal of this project is to provide informed decisions to our imaginary company's developers, on what free Apps are likely to attract more users. The number of Apps on Google Play Store and Apple Store as of september 2018 is approximately 4 million with Apps on Google Play Store estimated at 2.1 million and Apps on Apple Store estimated to be 2 million. (Source: [Statista](https://www.statista.com/statistics/276623/number-of-apps-available-in-leading-app-stores/)). Since the number of Apps on Google Play Store and Apple store is fairly large, it will require a great deal of time and money, hence we'll try analyzing a sample of the Apps from both stores. 

Our inability to source for this dataset ourselves led us to search for other sources from where we can get reliable data that suits the purpose of our analysis. Fortunately, we found some sources that matches our goals:
* A [Kaggle DataSet](https://www.kaggle.com/lava18/google-play-store-apps) gotten from this [link](https://dq-content.s3.amazonaws.com/350/googleplaystore.csv), that contains approximately ten thousand Android Apps from Google Play Store.
* A [Kaggle DataSet](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps) obtained via the [link](https://dq-content.s3.amazonaws.com/350/AppleStore.csv), that contains approximately seven thousand iOS Apps from Apple Store.

In [1]:
# Apple iOS Apps DataSet
opened_file = open('../Dataset/Applestore.csv', encoding = 'utf8')
from csv import reader
read_file = reader(opened_file)
ios_apps_with_header = list(read_file)
ios_header = ios_apps_with_header[0]
ios_apps = ios_apps_with_header[1:]

# Google Play Apps DataSet
opened_file = open('../Dataset/googleplaystore.csv', encoding = 'utf8')
from csv import reader
read_file = reader(opened_file)
android_apps_with_header = list(read_file)
android_header = android_apps_with_header[0]
android_apps = android_apps_with_header[1:]

Next we'll write a function named `explore_data()`, that we can use repeatedly to print any rows in both datasets.

The function will also be able to show the length of any column in our dataset

In [2]:
def explore_data(dataset, start, end, row_and_column = True):
    dataset_slice = dataset[start:end] # Added a slice so we can print a limited number of rows
    for row in dataset_slice:
        print(row)
        print('\n')
    if row_and_column == True:
        print('Number of rows is', len(dataset))
        print('Number of columns is', len(dataset[3]))

print(android_header, '\n')
explore_data(android_apps, 0, 4, True)


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'] 

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows is 10841
Number of columns is 13


The first printed row shows the header column of Android Apps. The number of Android Apps in our dataset is 10,841 and they are grouped into 13 column descriptions. From the objectives of our analysis, we find that the relevant columns we need for our analysis are;
* App
* Category
* Reviews
* Installs
* Type
* Price
* Genres

In [3]:
print(ios_header, '\n')
explore_data(ios_apps, 0, 3, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'] 

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows is 7197
Number of columns is 16


There are 7,197 iOS Apps in our dataset, and they are categorized with into 16 column descriptions. The relevant columns for our analysis includes:
* Track_name
* Price
* Rating_count_tot
* Rating_count_ver
* Prime_genre
Some of these column descriptions are not self-explanatory, hence we included the [dataset documentation](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/)

# Detecting incorrect Data in our dataset

We observed that some entries in our data set doesn't match with what we expect, and we assume that we don't know which particular row entry is incorrect. Thus we wrote a function that identifies any incorrect entry and prints the index of such entry. Specifically the `detecting_incorrect_entry` function checks for entries whose length or number of columns does not match with the number of columns in the header. In other words, missing columns.

We can use this function for our iOS dataset as well as for our Google Play dataset.

In [4]:
def detecting_incorrect_entry(dataset, dataset_header):
    for row in dataset:
        header_len = len(dataset_header)
        row_len = len(row)
        if header_len != row_len:
            print(row)
            print(dataset.index(row))

print(detecting_incorrect_entry(android_apps, android_header))
detecting_incorrect_entry(ios_apps, ios_header)

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
10472
None


The incorrect entry is detected to be in `row 10,472` for Google Play apps dataset, and there is no incorrect entry of such nature in the iOS app dataset. This incorrect entry is deleted using the `del` statement.

In [5]:
print(len(android_apps))
del android_apps[10472]
print(len(android_apps))

10841
10840


## Detecting and Removing duplicate Apps

Some Apps have duplicates, that is, they appear more than once in our dataset. For accuracy of our analysis, we need to remove these duplicate Apps.

We wrote the function `remove_duplicate_apps` that takes a list of lists as its argument, and checks if an App existed before in any of the rows and appends it to the list of `duplicate_apps`. And if the App is unique it is appended as a list of `unique_apps`. The function goes ahead to print the length of both duplicate and unique Apps respectively, and a few example of the duplicate Apps

In [6]:
def apps_duplicate(app_name_row):
    unique_apps = []
    duplicate_apps = []
    for app in app_name_row:
        app_name = app[0]
        if app_name in unique_apps: # if the app already exists in unique apps, move it over to the duplicate apps list
            duplicate_apps.append(app_name)
        else:
            unique_apps.append(app_name) # if the app has not existed before, it is appended to the unique apps list 
    print('Number of duplicate Apps:', len(duplicate_apps),)
    print('Number of unique Apps:', len(unique_apps))
    print('Example of duplicate Apps:', duplicate_apps[0:5], '\n')

duplicated_ios_apps = apps_duplicate(ios_apps)
duplicated_android_apps = apps_duplicate(android_apps)
print(duplicated_ios_apps)
print(duplicated_android_apps)

Number of duplicate Apps: 0
Number of unique Apps: 7197
Example of duplicate Apps: [] 

Number of duplicate Apps: 1181
Number of unique Apps: 9659
Example of duplicate Apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings'] 

None
None


Using the logic in the for loop above, for the first iteration in the loop, the `unique_apps` list is still empty and therefore the `else` statement is executed, the second, third etc iterations follow the same process, because for these ones, they haven't existed before in the `unique_apps` list, so its the `else` statement that gets executed. And remember that the else statement appends the `app_name` to `unique_apps` list. So `unique_apps` list continues updating Until it meets an app that has been added before... Only then will the `if` statement get executed, which then appends that `app_name` to the `duplicate_apps` list.

The Above output shows that none of the iOS Apps have any duplicates, but 1,181 Android Apps have duplicates

## Method of removing duplicates

Since some Apps have duplicates that could lead us to incorrect analysis, we need to remove these duplicate Apps. This could be done randomly, but we prefer to use a more systematic method for removing these duplicate Apps. 

A glance at the code output below, where we printed and example of a duplicate App **(instagram)**, shows that, a difference occurs in the various duplicates and this difference is seen in column four that shows the App reviews. The different App reviews is an indication that the data was collected at different times.

In [7]:
for app in android_apps:
    name = app[0]
    if name == 'Instagram':
        print(app[3]) # app[3] is the reviews column of all duplicate apps whith the name 'instagram'

66577313
66577446
66577313
66509917


This could lead us to remove the duplicate Apps using the criterion of **App with highest review**. Thus we will only keep the row with highest reviews and remove the other duplicate rows.

### Creating a dictionary

We can accomplish the above task by first creating a dictionary that takes an App name as key and the highest number of reviews of the App as value.

We started the steps by first creating an empty dictionary called `reviews_max`, then we looped through the `android_apps` list and for each iteration, we:
* Assigned the name of Apps to a variable called name
* And the reviews column to a vairiable called n_reviews

Next we created a condition that, if the name has not been found in the `reviews_max` dictionary while looping, (for which it isn't found, because the dictionary is empty for the first iteration) update the value of that current app to the number of reviews. That is, create a new key, value pair. This code keeps running and updating the dictionary key:value because the `app name` has not been found in the `reviews_max` dictionary before. So it kind of creates a dictionary of all unique apps.


Else, if the app_name has been found in the reviews_max dictionary before, and we get another same name, we need to check if the current review value of this current app is less than the value we already set for that app in the `reviews_max` dictionary. 

* That is, the app is already in our `reviews_max` dictionary and its current review value is less than the one we have already added in the if part, we just update the same app with the higher review value.

In [8]:
reviews_max = {}

for row in android_apps:
    name = row[0]
    n_reviews = float(row[3])
    
    if name not in reviews_max:
        reviews_max[name] = n_reviews
        
    elif name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
        
print(len(reviews_max))

9659


**PS: The dictionary above is not a part of our data set, it just helps us to store app names as keys and takes the highest number of app reviews as its values, hence it holds just two indicators, "App name and App reviews" and will be used to clean our main android data set of all its duplicate entries.**

In the code block below;
* We created two empty lists which we called, `android_clean` and `already_added`
* Looped through the `android_apps` data set (which excludes the header row), and for each iteration, we append the name of an app to the `name` variable and append the app review to the variable named `n_reviews`.

In the IF condition;

* we are appending an app (row) together with all its components(columns) to the `android_clean` list and
* Appending just the name of the app to the `already_added` list, only IF;

**The number of reviews of the current app is the same as the app's value in our `reviews_max` dictionary and
**The name of the app name has not already been found in the `already_added` list.**

This second condition needs to be added because, the `already_added` list is used to track apps that have been added to the `android_clean` list. So the appendance occurs once an app has met the first condition `( n_reviews == reviews_max )` and the name of the app is not in the `already_added` list (of course all apps are not in the empty `already_added` list yet). This process continues until the looping is done over apps whose names already appear in the `already_added` list, then the appendace won't work over such apps.
The second condition is required because, some apps have the same number of reviews and everything else equal, in other words, they are perfect duplicates and the only way to remove such duplicates is to prevent them from appearing twice, thus the introduction of the second condition

In [9]:
android_clean = [] # this list will store our newly cleaned dataset
already_added = [] # this list will store just app names

for row in android_apps:
    name = row[0]
    n_reviews = float(row[3])
    
    if (n_reviews == reviews_max[name]) and (name not in already_added):
        android_clean.append(row)
        already_added.append(name)

Next we explore our android data set using the `explore_data` function we created before to ensure that it has been rid of all duplicate entries, print a few rows and measure its length

In [10]:
explore_data(android_clean, 0, 3)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows is 9659
Number of columns is 13


# Filtering Non-English Apps

The code block below, defines a function called `english_or_not` and it is intended to filter out non-english Apps, since our organization is only indulged in creating English Apps.

It is expected to take in a string as its argument and return `True` if an individual character (be it a letter or number or symbol or punctuation marks) unicode is less than 127, and `False` if it is greater than 127

The margin of 127 is used because, according to the 'American Standard Code for Information Interchange' (ASCII), the numbers corresponding to the characters we commonly use in an English text are all in the range 0 to 127.

In [11]:
def english_or_not(app_name):

    for character in app_name:
        
        if ord(character) > 127  :
            return False
        
    return True

print(english_or_not('instagram'))
print(english_or_not('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(english_or_not('Docs To Go™ Free Office Suite'))
print(english_or_not('Instachat 😜'))

True
False
False
False


The above code block doesn't work as expected because, we see from the output that some English Apps returned `false`. This is due to the characters such as 'TM' and the 'smiley' in 'Docs To Go (TM) Free Office Suite' App and 'Instachat(smiley' App respectively.

To correct this anomaly, we update the function to return `False` only for Apps whose unicode character  `ascii_range` is greater than 3 and `True` for Apps that have less than 3 non-ascii characters

### Updated English Filtering Function

In [12]:
def english_or_not(app_name):
    
    ascii_range = 0
    
    for character in app_name:
        if ord(character) > 127:
            ascii_range += 1   # increment the ascii_range by 1 for each app name iteration whose character is greater than 127
            
        if ascii_range > 3:
            return False
        
    return True
print(english_or_not('instagram'))
print(english_or_not('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(english_or_not('Docs To Go™ Free Office Suite'))
print(english_or_not('Instachat 😜'))

True
False
True
True


The above function works fine although it is not flawless in perfoming its task, but we'll leave it at that for now.

### using the English filter function to filter out Non english Apps in Google Play and iOS Apps

The code block below filters out non-English apps from both data sets.
Firstly, we creates two empty lists `android_english` and `ios_english`, then we Loops through each data set, call the `english_or_not` function and sets the argument of the `english_or_not` function to be the name column of each data set. 

If `True`, we Append such Apps to the empty list created, thereby creating a new list of english Apps only, which meets the criteria of the `english_or_not` function we created above

We print a few rows using the `explore_data` function created initially

In [13]:
android_english = []
ios_english = []

for apps in android_clean:
    if english_or_not(apps[0]) == True:
        android_english.append(apps)
        
for apps in ios_apps:
    if english_or_not(apps[1]) == True:
        ios_english.append(apps)

print(explore_data(android_english, 0, 3), '\n')
print(explore_data(ios_english, 0, 3))

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows is 9614
Number of columns is 13
None 

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12

Now we have just 9,614 android Apps and 6,183 iOS Apps

## Final Data Cleaning Process

#### Isolating The free Apps

At the inception of this analysis, we assumed that we are working for a company that creates only free Apps.
But our two data sets contains both free and paid apps, hence in this final process, we are going to select only Apps that are free in both stores.

We start by:
* checking the column that decribes the price status of an App. Our discovery shows that for the iOS data set, the column for App prices is column 4 while the column for prices of googleplay store Apps is column 7

In [14]:
free_android = []
free_ios = []

for app in android_english:
    price = app[7]
    if price == '0':
        free_android.append(app)
        
print(len(free_android))
        
for app in ios_english:
    price = app[4]
    if price == '0.0':
        free_ios.append(app)

print(len(free_ios))

8864
3222


The final Data Set cleaning process left us with a total of 8,864 android Apps and 3222 iOS Apps. This is what will be used for our analysis.

## Actual Data Analysis Section

#### Context for selection of App profiles that fit both the Google Play Store and the Apple Store

We mentioned in the introduction, that our aim is to determine the kinds of apps that are likely to attract more users because our revenue is highly influenced by the number of people using our apps.

In order to minimize risks and costs, our validation strategy for an app idea is comprised of three steps:
1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we develop it further.
3. If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

Because our end goal is to add the app on both the App Store and Google Play, we need to find app profiles that are successful on both markets. For instance, a profile that might work well for both markets might be a productivity app that makes use of gamification.

## Most common App By Genre
We begin our analysis by getting an idea of the most popular genre of apps in both stores. The relevant columns for this analysis are the `genre` and `Category` columns for the googleplay store data set, which is located in column 9 and 1 respectively, and the `prime_genre` column for the Applestore data set, located in column 11.

To do this we could create two functions, `freq_table` and `display_table`
* The `freq_table` function is expected to create a frequency table for any Apps column and show the percentage of items in the column.
* The `display_table` function is expected to show the percentages in descending order

The `freq_table` function below is expected to take in a data set (list of lists) as its first argument and an integer as its second argument.
* In the function's body, we created an empty list and named it `table`.
* Initialized the variable `total` to `0`, which is expected to count the total number of rows in the data set
* Next we looped through the function's first parameter (which is expected to be a list of lists)
* And for every iteration, we increment the value of `total` by one, (that is, for each row encountered, the value of total will be increased by one) and assigned the column that is expected to represent an App's `genre` (or any other column we that may want to perform the same analysis on) to a variable named `value`
* If any `value` (column) is found in the table (the empty dictionary we created) as key, we count that `value` or column and increment its value by 1. (of course for the first loop, value would not be found in the empty dictionary as key because the dictionary is still empty)
* The `else` statement runs and creates a new pair in the dictionary, That has the column represented here by `value` as its key and sets the value of the key to 1
* If the loop encounters a particular value in the column as key again in the table, the `if` statement runs
* `table` dictionary now holds as key, any column called by the `index` parameter in the `freq_table` function, and the counts of each item in the column as value.

##### part two of the same function

The second part of the `freq_table` function, converts the value of keys in the `table` dictionary to percentages and returns the new dictionary `table_percentage`, which now holds the key:value pair of any column (with values being in percentages). 
This is executed in the folloeing steps:

* A new empty dictionary is initialilized and assigned to a varible called `table_percemtage`
* The old dictionary `table` is looped over (with its key being any column)
* Gets the value of keys in the `table` dictionary (the counts of items in any row) and divides it by `total` (which now hold the total numeber rows in the dataset).
* Still on the same line, Multiplies the result by 100 (in order to get the percentage)
* The `round` function is called on this expression and all the resulting values are rounded to 2 decimal places.
* The whole function (`round`) is assigned to a variable named `percentage_of_items`
* A new key:value pair is created for the `table_percentage` dictionary, where the value of items is set to be the `percentage_of_items`
* The `table_percentage` dictionary is returned.

In [15]:
def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        
        if value in table:
            table[value] += 1
    
        else:
            table[value] = 1
    
    
    table_percentage = {}
    for key in table:
        percentage_of_items = round(table[key] / total * 100, 2)
        table_percentage[key] = percentage_of_items
        
    return table_percentage

def display_table(dataset, index):
    table_freq = freq_table(dataset, index)
    table_display = []
    
    for key in table_freq:
        key_value_as_tuple = (table_freq[key], key) # this set the key's value before the key itself converts to a tuple
        table_display.append(key_value_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    
    for entry in table_sorted:
        print(entry[1], ':', entry[0])


## Analysis of Frequency of iOS Apple Store Apps by Genre

In [16]:
display_table(free_ios, 11)

Games : 58.16
Entertainment : 7.88
Photo & Video : 4.97
Education : 3.66
Social Networking : 3.29
Shopping : 2.61
Utilities : 2.51
Sports : 2.14
Music : 2.05
Health & Fitness : 2.02
Productivity : 1.74
Lifestyle : 1.58
News : 1.33
Travel : 1.24
Finance : 1.12
Weather : 0.87
Food & Drink : 0.81
Reference : 0.56
Business : 0.53
Book : 0.43
Navigation : 0.19
Medical : 0.19
Catalogs : 0.12


From the above outputs, it is seen that the most common genre of free english Apps in the Apple store is the Games genre of Apps, representing about 58% of total free english Apps on the store.
The App genre that follows next, though with large margin to the Games genre is the entertainmet genre, that holds about 8% of all free english Apps. Closely followed is the Photo & Video genre holding  about 5% of free english Apps. Genres such as Education, Social networking, Shopping, utilities, sports, music and Health & fitness follows closely, as they each hold at least 2% of all free english Apps. 

Genres with the least Apps are weather, food & drinks, reference, business, book, navigation, medical and catalogs, as each of them holds less than 1% of all free english Apps in Apple iOS store.

The general impression is that most free english Apps in the Apple store are designed for entertainment and fun purposes. This is because the majority of such free english Apps hold genres such as Games, Entertainmet, Photo and Video, sports, social networking, music and just these genres account for about 79% of total free english Apps in the App store. Apps designed for practical purposes with genres like Education, shopping, utilities, productivity and lifestyle are uncommon in the Apple store and they represent just about 12% of total free english Apps. However, the fact that most Apps are of the entertainment genre doesn't imply that such Apps have the Highest number of user. The supply of such Apps might acutally be greater than the demand for them. 

Although this frequency table does not represent the genres that is most downloaded by user, it rather just shows the genres that are most popular in the iOS App Store.

## Analysis of Google PlayStore Android Apps by Genre

In [17]:
display_table(free_android, 9)

Tools : 8.45
Entertainment : 6.07
Education : 5.35
Business : 4.59
Productivity : 3.89
Lifestyle : 3.89
Finance : 3.7
Medical : 3.53
Sports : 3.46
Personalization : 3.32
Communication : 3.24
Action : 3.1
Health & Fitness : 3.08
Photography : 2.94
News & Magazines : 2.8
Social : 2.66
Travel & Local : 2.32
Shopping : 2.25
Books & Reference : 2.14
Simulation : 2.04
Dating : 1.86
Arcade : 1.85
Video Players & Editors : 1.77
Casual : 1.76
Maps & Navigation : 1.4
Food & Drink : 1.24
Puzzle : 1.13
Racing : 0.99
Role Playing : 0.94
Libraries & Demo : 0.94
Auto & Vehicles : 0.93
Strategy : 0.91
House & Home : 0.82
Weather : 0.8
Events : 0.71
Adventure : 0.68
Comics : 0.61
Beauty : 0.6
Art & Design : 0.6
Parenting : 0.5
Card : 0.45
Casino : 0.43
Trivia : 0.42
Educational;Education : 0.39
Board : 0.38
Educational : 0.37
Education;Education : 0.34
Word : 0.26
Casual;Pretend Play : 0.24
Music : 0.2
Racing;Action & Adventure : 0.17
Puzzle;Brain Games : 0.17
Entertainment;Music & Video : 0.17
Casual;

The representation of of Apps in the Google PlayStore is quite dissimilar from what we saw in the iOS store. The genres in google playstore are more balanced when compared with the Apps store genres. Altough both stores show that Apps designed for entertainment purposes have a significant representation, ranking amongst the top two Genres in both stores. Education Apps also have a good representation in terms of ranking in the both stores, falling within the top four of both stores.

It is also evident in the google playstore that practical purpose Apps are common. This is because genres such as tools, education, business, productivity, finance and medical fall within the top 10 Apps and represent about 30% of all free english Apps in the google playstore.

Although this frequency table does not represent the genres that is most downloaded by user, it rather just shows the genres that are most popular in the google playstore.

## Analysis of Google PlayStore Android Apps by Category

In [18]:
display_table(free_android, 1)

FAMILY : 18.91
GAME : 9.72
TOOLS : 8.46
BUSINESS : 4.59
LIFESTYLE : 3.9
PRODUCTIVITY : 3.89
FINANCE : 3.7
MEDICAL : 3.53
SPORTS : 3.4
PERSONALIZATION : 3.32
COMMUNICATION : 3.24
HEALTH_AND_FITNESS : 3.08
PHOTOGRAPHY : 2.94
NEWS_AND_MAGAZINES : 2.8
SOCIAL : 2.66
TRAVEL_AND_LOCAL : 2.34
SHOPPING : 2.25
BOOKS_AND_REFERENCE : 2.14
DATING : 1.86
VIDEO_PLAYERS : 1.79
MAPS_AND_NAVIGATION : 1.4
FOOD_AND_DRINK : 1.24
EDUCATION : 1.16
ENTERTAINMENT : 0.96
LIBRARIES_AND_DEMO : 0.94
AUTO_AND_VEHICLES : 0.93
HOUSE_AND_HOME : 0.82
WEATHER : 0.8
EVENTS : 0.71
PARENTING : 0.65
ART_AND_DESIGN : 0.64
COMICS : 0.62
BEAUTY : 0.6


Following the category description of google playstore Apps, Family category, has the highest representation with about 19% of total Apps, followed by the games category with 10% representation of total free english Apps in the google playstore. There are not that many apps designed for fun, and it seems that a good number of apps are designed for practical purposes (family, tools, business, lifestyle, productivity, etc.).

The difference between the Genres and the Category columns is not crystal clear, but one thing we can notice is that the Genres column is much more granular (it has more categories). We're only looking for the bigger picture at the moment, so we'll only work with the Category column moving forward.

Up to this point, we found that the App Store is dominated by apps designed for fun, while Google Play shows a more balanced landscape of both practical and for-fun apps. Now we'd like to get an idea about the kind of apps that have most users.

## Analysis of iOS Apps with Most Users
One way to find out what genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play data set, we can find this information in the Installs column, but for the App Store data set this information is missing. As a workaround, we'll take the total number of user ratings as a proxy for number of installs. We can find this in the rating_count_tot column. This column measures how many users rated each App in the AppStore (since a user can only rate an App he/she has used, This hopes to serve as a good proxy variable).

Let's start with calculating the average number of user ratings per app genre on the App Store. To do that, we'll need to:

* Isolate the apps of each genre.
* Sum up the user ratings for the apps of that genre.
* Divide the sum by the number of apps belonging to that genre (not by the total number of apps).
To calculate the average number of user ratings for each genre, we'll use a for loop inside of another for loop.
*for a nested loop, for each iteration of the parent loop, the entire iterations of the child loop runs.

**Steps**:
* We created a variable called `prime_genre` which holds our `freq_table` function (the `freq_table` function is expected to create a frequency table of `prime_genre` column for AppStore dataset and show the percentage of items in the column)
* `prime_genre` is expected to hold the unique App genres
* We looped through the `prime_genre` and for each iteration, we initialized a variable `total` to zero (this variable is expected to hold the sum of all ratings in a particular genre).
* Initialized a new variable `len_genre` to zero (this variable is expected hold the number of apps in each genre)
* Created a nested loop, which iterates through each App in the free iOS Apps and for each iteration, appends the reviews column to a variable called `genre_app`
* If OR each tim `genre_app` is same as `genre`, which is true because `genre` (the iterable variable of the `prime_genre` which now holds the) is the unique Apps genre and `genre_app` also holds the unique apps of the `free_ios` Apps.
* Assign the number of user ratings for each App to a variable called `user_rating`
* Increment the value of `total` to the `user_rating`, that is, the `user_rating` sums up each time the if condition holds true
* Increment the length of each genre by 1.
* Created a variable named `avg_user_rating` outside the nested for loop which holds the average user rating for each genre
* Print `genre_app` with its corresponding average user rating

In [19]:
prime_genre = freq_table(free_ios, 11)

for genre in prime_genre:
    total = 0
    len_genre = 0
    
    for app in free_ios:
        genre_app = app[11]
        
        if genre_app == genre:
            user_rating = float(app[5])
            total += user_rating # this gives the total number of users who rated an app in each genre
            len_genre += 1  # this give the length of an individual genre
    avg_user_rating = total / len_genre
    print(genre, ':', avg_user_rating)

Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0


Going by the output above, showing App genres and the average number of all ratings given to individual Apps in each genre. We can see that the App genre with the highest average rating is the `Navigation` genre with an average of 86,090 Apps. This is followed by the `Reference` genre with 74,942 Apps and `Social Netwroking` genre with 71,548 Apps. 

This means that on the average, 71,548 users gave ratings to Apps that falls under the `Social Networking` genre, 86,090 users gave ratings feedback to Apps that have the `Navigation` genre while 74,942 users rated Apps that falls under `Reference` genre.

Very few users gave ratings feedback to Apps that fall under `Medical` genre, this is based on the average ratings of 612 for all Apps under the genre.

#### Detailed Analysis of Large Genres in the AppStore

##### The Navigation Genre

In [20]:
for app in free_ios:
    if app[11] == 'Navigation':
        print(app[1], ':', app[5])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


Analysis of Apps that fall under the `Navigation` genre, shows that just two Apps dominates the genre and are thus responsible for the responsible for the large figure we got. The genre is heavily influenced by Waze and Google Maps, which have close to half a million user reviews together:

The same pattern applies to social networking apps, where the average number is heavily influenced by a few giants like Facebook, Pinterest, Skype, etc. Same applies to music apps, where a few big players like Pandora, Spotify, and Shazam heavily influence the average number.

Our aim is to find popular genres, but navigation, social networking or music apps might seem more popular than they really are. The average number of ratings seem to be skewed by very few apps which have hundreds of thousands of user ratings, while the other apps may struggle to get past the 10,000 threshold.

##### Reference App Genre
Reference apps have 74,942 user ratings on average, but it's actually the Bible and Dictionary.com which skew up the average rating:

In [21]:
for app in free_ios:
    if app[11] == 'Reference':
        print(app[1], ':', app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


However, this niche seems to show some potential. One thing we could do is take another popular book and turn it into an app where we could add different features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes about the book, etc. On top of that, we could also embed a dictionary within the app, so users don't need to exit our app to look up words in an external app.

This idea seems to fit well with the fact that the App Store is dominated by for-fun apps. This suggests the market might be a bit saturated with for-fun apps, which means a practical app might have more of a chance to stand out among the huge number of apps on the App Store.

Other genres that seem popular include weather, book, food and drink, or finance. The book genre seem to overlap a bit with the app idea we described above, but the other genres don't seem too interesting to us:

Weather apps — people generally don't spend too much time in-app, and the chances of making profit from in-app adds are low. Also, getting reliable live weather data may require us to connect our apps to non-free APIs.

Food and drink — examples here include Starbucks, Dunkin' Donuts, McDonald's, etc. So making a popular food and drink app requires actual cooking and a delivery service, which is outside the scope of our company.

Finance apps — these apps involve banking, paying bills, money transfer, etc. Building a finance app requires domain knowledge, and we don't want to hire a finance expert just to build an app.

##### Social Networking App Genre

In [22]:
for app in free_ios:
    if app[11] == 'Social Networking':
        print(app[1], ':', app[5])

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 2

Social Networking genre shows that Apps like facebook, pinterest, skype, messenger and some other social media giants are responsible for the large average we obtained. Building a social media App might require our company to start up a new company (new social media) which might not be the company's immediate objectives. But going with the above analysis, we could build an App that embeds in it all othe popular social media Apps, thus making the App a one-stop-App for all popular social networks.

## Analysis of GooglePlayStore Apps with Most users (Installs)
This time around we are going to anayze the google play store Apps with most users. One way we can do this is by using the `Installs` column in our google play dataset.

From our previous Analysis, we discovered that there was no clear cut difference between the `Category` column and the `genre` column of our google play dataset. Thus we will be using the `Category` column because this column seems to group its items with less categories than the `genre` column.

This follows the same steps as the iOS data set analysis except that the method `str.replace('old', 'new')` is used here to convert the installs to a float, which can only be done by removing commas(,) and pluses(+) in the data set, else it raises an error.

In [23]:
android_category = freq_table(free_android, 1)

for category in android_category:
    total = 0
    len_category = 0
    
    for app in free_android:
        app_category = app[1]
        
        if app_category == category:
            installs = app[5].replace('+', '')
            installs = float(installs.replace(',', ''))
            total += installs
            len_category += 1
            
    avr_num_installs = total / len_category
    print(category, ':', avr_num_installs)

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_

From the above analysis, we can see that Apps that falls under the communication genre, on the average have the highest number of installs (38,456,119), This is followed by video player Apps, that on the average, have about 24,727,872 installs and Apps with social genre have about 23,253,652 installs on the average. Photography App genre has about 17,840,110 installs on average and games genre on the average have 15,588,015 installs.

A closer look at these genres will reveal to us that our results are skewed to very high extent because of some few App giants that command over a billion installs, five hundred million installs and a hundred million installs. If we run this check in our communications genre and remove such Apps we may have a more reduced number, thus;

In [24]:
for app in free_android:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

Above are the free Apps on google play store that have more than hundred million installs. They are not so many but they largely led to the skewed mean we obtained (38,456,119). Other communication Apps on store that have less than 100,000,000 installs includes:

In [25]:
less_than_100m = []

for app in free_android:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    
    if (app[1] == 'COMMUNICATION') and (float(n_installs) < 100000000):
        less_than_100m.append(float(n_installs))
        
    print(app[0], ':', app[5])

0+
F.A Sumon songs : 1,000+
NCLEX-RN Q&A FLASH CARDS - FA Davis : 1,000+
LEADS FA : 1,000+
Farm Heroes Saga : 100,000,000+
All AJK Board Matric Fa Fsc Results : 10+
ESPN Fantasy Sports : 5,000,000+
Heart of Vegas™ Slots – Free Slot Casino Games : 10,000,000+
Fallout Shelter : 10,000,000+
Authy 2-Factor Authentication : 1,000,000+
Facejjang : 10,000,000+
Block Puzzle - Wood Legend : 10,000,000+
Soccer 2018 : 10,000,000+
Flash Light on Call & SMS : 1,000,000+
Fart sound pranks : 50,000,000+
Facebook Creator : 1,000,000+
Swift for Facebook Lite : 500,000+
Friendly for Facebook : 1,000,000+
Facebook Pages Manager : 50,000,000+
Mini for fb lite : 100,000+
Faster for Facebook Lite : 1,000,000+
Facebook Ads Manager : 1,000,000+
Who Viewed My FB Profile : 10,000+
Puffin for Facebook : 500,000+
Lite for Facebook Messenger : 1,000,000+
Profile Tracker - Who Viewed My Facebook Profile : 500,000+
Who viewed my fb profile pro★★ : 100,000+
Video Downloader For FB: Save FB Videos 2018 : 50,000+
Pink 

We can see from the above output that more Apps have less than 100 million installs

##### Calculating the mean of Communication Apps with over 100 million installs and those with less than 100 million installs

In [26]:
over_100m_installs = []

for app in free_android:
    n_installs = app[5]
    n_installs = n_installs.replace('+', '')
    n_installs = n_installs.replace(',', '')
    
    if app[1] == 'COMMUNICATION' and (float(n_installs) >= 100000000):
        over_100m_installs.append(float(n_installs))
        
print(sum(over_100m_installs) / len(over_100m_installs))
print(sum(over_100m_installs) / len(less_than_100m))

374074074.0740741
38846153.84615385


We can see that communication Apps that have over 100 million installs have greater mean value than communication Apps with less than 100 million installs, despite having a smaller amount of Apps in the its category (just 27), while Apps with less than 100 million installs have over 260 Apps in its category. The removal of communication Apps with over 100 million installs reduced the average by roughly 10 times.

We see the same pattern for the video players category, which is the runner-up with 24,727,872 average installs. The market is dominated by apps like Youtube, Google Play Movies & TV, or MX Player. The pattern is repeated for social apps (where we have giants like Facebook, Instagram, Google+, etc.), photography apps (Google Photos and other popular photo editors), or productivity apps (Microsoft Word, Dropbox, Google Calendar, Evernote, etc.).

Again, the main concern is that these app genres might seem more popular than they really are. Moreover, these niches seem to be dominated by a few giants who are hard to compete against.

The game genre seems pretty popular, but previously we found out this part of the market seems a bit saturated, so we'd like to come up with a different app recommendation if possible.

The books and reference genre looks fairly popular as well, with an average number of installs of 8,767,811. It's interesting to explore this in more depth, since we found this genre has some potential to work well on the App Store, and our aim is to recommend an app genre that shows potential for being profitable on both the App Store and Google Play.

Let's take a look at some of the apps from this genre and their number of installs:

In [27]:
for apps in free_android:
    if apps[1] == 'BOOKS_AND_REFERENCE':
        print(apps[0], ':', apps[5])
    

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E

The book and reference genre includes a variety of apps: software for processing and reading ebooks, various collections of libraries, dictionaries, tutorials on programming or languages, etc. It seems there's still a small number of extremely popular book apps that skew the average:

In [28]:
books_with_100m_above_installs = []

for app in free_android:
    n_installs = app[5].replace(',', '')
    n_installs = n_installs.replace('+', '')
    
    if (app[1] == 'BOOKS_AND_REFERENCE') and (float(n_installs) >= 100000000):
        books_with_100m_above_installs.append(float(n_installs))
        
        print(app[0], ':', app[5])
        
print('\n', 'mean of books with over 100m installs:', sum(books_with_100m_above_installs) / len(books_with_100m_above_installs))

Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad 📖 Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+

 mean of books with over 100m installs: 280000000.0


The above output shows that books with over 100 million installs are quite few but significantly contributed to the high mean value we obtained previously (280,000,000). But we can still rely on the potential of this genre or market because there only a few popular Apps and the market is not saturated with popular Apps like the communication or social networking genres.

Below we revealed free books and reference Apps that have less than a 100 million installs and showed their mean.

In [29]:
books_with_less_than_100m_installs = []

for app in free_android:
    n_installs = app[5].replace(',', '')
    n_installs = n_installs.replace('+', '')
    
    if (app[1] == 'BOOKS_AND_REFERENCE') and (float(n_installs) < 100000000):
        books_with_less_than_100m_installs.append(float(n_installs))
        
        print(app[0], ':', app[5])
        
print('\n')
print('mean of books with less than 100m installs:',sum(books_with_less_than_100m_installs) / len(books_with_less_than_100m_installs))

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
English translation from Bengali : 1

This niche seems to be dominated by software for processing and reading ebooks, as well as various collections of libraries and dictionaries, so it's probably not a good idea to build similar apps since there'll be some significant competition.

The mean of books with less than 100 million installs(1,437,212) is over 190 times smaller than books with 100 million and above installs. This shows that although the market is full of book Apps, the very popular free book Apps amongst users are very few and books Apps that saturates the market are 'not so popular'.

We also notice there are quite a few apps built around the book Quran, which suggests that building an app around a popular book can be profitable. It seems that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets.

However, it looks like the market is already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.

# conclusion

In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.

We concluded that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets. The markets are already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.