# Insights from iOS and Google apps to increase revenue.
***

![](https://i0.wp.com/365webresources.com/wp-content/uploads/2016/01/Free-icons-AppStore-and-Google-Play.png?resize=758%2C367&ssl=1)

***
## Front matter
> Our main goal is to give developers insight into which apps attract the most users and are the most profitable, so that they can make data-driven decisions about which apps to build.
> We are going to use data science tools to analyze and digest the information, presenting the results in a friendly and intuitive way.
> The project is divided into stages for easier reading.

***
***


>## Stage #1

### Opening and Exploring the Data

As of September 2018, there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play.

These are two data sets that seem suitable for our purpose:

   * A data set containing data about approximately ten thousand Android apps from Google Play. You can download the data set directly from this [link](https://www.kaggle.com/lava18/google-play-store-apps)
   
   * A data set containing data about approximately seven thousand iOS apps from the App Store. You can download the data set directly from this [link](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/data)

#### Let's start by opening the two data sets and then continue with exploring the data:

In [88]:
opener1 = open("./CSV files/AppleStore.csv", encoding="utf8")
opener2 = open("./CSV files/googleplaystore.csv", encoding="utf8")

In [237]:
from csv import reader
read1 = reader(opener1)  # `reader` is imported directly, so it takes no `csv.` prefix
read2 = reader(opener2)

In [238]:
appstore = list(read1) # The AppleStore dataset
google = list(read2)   # The GooglePlay dataset
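
The three cells above leave both files open. As a minimal sketch (not the notebook's own code), the same loading step can be wrapped in a helper that closes the file when it is done. The name `load_csv` and the `utf8` encoding are assumptions made for illustration.

```python
import csv

def load_csv(path):
    """Read a CSV file and return (header, rows) as lists of strings.

    Hypothetical helper: the name and the utf8 encoding are assumptions.
    """
    # newline="" is the setting the csv module documentation recommends for files.
    with open(path, newline="", encoding="utf8") as handle:
        rows = list(csv.reader(handle))
    return rows[0], rows[1:]

# Possible usage, with the same paths as the cells above:
# appstore_header, appstore = load_csv("./CSV files/AppleStore.csv")
# google_header, google = load_csv("./CSV files/googleplaystore.csv")
```

Because this helper returns the header separately, row indices shift by one relative to the combined lists used in this notebook, so any hard-coded index would need adjusting.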

#### Exploring the data... 

- We define our first function, `explore_data`, to inspect a slice of a dataset quickly. It can optionally print the number of rows and columns; by default it does not.

In [91]:
def explore_data(dataset, start, end, rows_and_columns=False): # Takes any list of lists, and the rows to show as a slice.
    
    dataset_slice = dataset[start:end]
    
    for row in dataset_slice:
        print(row)
        print("\n") # adds a new (empty) line after each row
    
    
    if rows_and_columns:
        print("Number of rows: ", len(dataset))
        print("Number of columns: ", len(dataset[0]))
    
    

#### First we explore the AppleStore dataset we downloaded...

In [92]:
explore_data(appstore, 1, 4 , rows_and_columns=True) # First 3 App Store rows, plus the row and column counts.

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows:  7198
Number of columns:  16


In [93]:
print(appstore[0]) # Showing the header columns of Apple store dataset.

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


>Interesting columns for our analysis could be "track_name", "currency", "price", "rating_count_tot", "rating_count_ver" and "prime_genre"

#### GooglePlay dataset explored....

In [94]:
explore_data(google, 1, 4, rows_and_columns=True) # First 3 rows of Google play apps.w/count.

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows:  10842
Number of columns:  13


In [95]:
print(google[0]) # Showing the header columns of Google play dataset.

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


>Interesting columns here could be "Reviews", "Installs", "Price", "Content Rating", "Genres" and "Type"

---
#### * For a detailed description of the datasets, please visit [iOS apps](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/data), [Google apps](https://www.kaggle.com/lava18/google-play-store-apps)
---

* Reading the documentation of the datasets, we found that one entry in the Google Play dataset contains an error, so our first cleaning step is to remove it.

In [96]:
del google[10473] # Delete the row with a missing value; the index comes from the dataset documentation.
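
Rather than relying only on a hard-coded index, the faulty row can also be located by comparing each row's length with the header's. The sketch below is illustrative; `find_malformed_rows` is a hypothetical helper, and it assumes the header is still the first element of the list, as it is at this point in the notebook.

```python
def find_malformed_rows(dataset):
    """Return (index, row) pairs whose length differs from the header row's length.

    Assumes dataset[0] is the header row.
    """
    expected = len(dataset[0])
    return [(i, row) for i, row in enumerate(dataset) if len(row) != expected]

# Tiny made-up sample: the row at index 2 is one value short.
sample = [["App", "Category", "Rating"],
          ["Foo", "GAME", "4.1"],
          ["Bar", "1.9"],
          ["Baz", "TOOLS", "3.9"]]
print(find_malformed_rows(sample))
```

Running it before the `del` statement would confirm which index to delete; running it afterwards should return an empty list if no other row is malformed.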

### Extracting the Headers

In [98]:
google_header = google[0]
google = google[1:]

In [99]:
appstore_header = appstore[0]
appstore = appstore[1:]

>## Stage # 2 


### Functions to search for duplicates and row length errors.

In [100]:
def check_rows(dataset, n_columns):
    # Collect every row whose length differs from the expected number of columns.
    wrong_rows = []
    
    for row in dataset:
        if len(row) != n_columns:
            wrong_rows.append(row)
    
    if len(wrong_rows) == 0:
        print("All rows have the same length")
        
    return wrong_rows

def check_dup(dataset):
    # Count the rows whose first column (app name or id) has already been seen.
    duplicates = []
    unique = []
    
    for row in dataset:
        name = row[0]
        
        if name in unique:
            duplicates.append(name)
        else:
            unique.append(name)
    
    return "N of duplicates: ", len(duplicates)

In [101]:
check_rows(google, 13) # Checking for row-length errors in the Google Play rows.

All rows have the same length


[]

In [102]:
check_rows(appstore, 16) # Checking for row-length errors in the App Store rows.

All rows have the same length


[]

In [103]:
check_dup(google) # Checking duplicates in google dataset.

('N of duplicates: ', 1181)

* We found 1181 duplicates in the Google Play dataset. We won't remove them at random: for each app we keep the entry with the most reviews, on the assumption that it is the most recent one.

In [104]:
check_dup(appstore) # Checking duplicates in apple store.

('N of duplicates: ', 0)

* Now let's remove the duplicates from the Google Play dataset so that it keeps only one row per app: the one with the most reviews.

* First we define a function that selects the most-reviewed duplicate.

In [105]:
google_header

['App',
 'Category',
 'Rating',
 'Reviews',
 'Size',
 'Installs',
 'Type',
 'Price',
 'Content Rating',
 'Genres',
 'Last Updated',
 'Current Ver',
 'Android Ver']

In [106]:
def clean_data(dataset):
    
    reviews_max = {}   # app name -> highest review count seen for that name
    clean_list = []    # one row per app
    added = []         # names already copied into clean_list
    
    # First pass: find the highest review count for each app name.
    for app in dataset:
        name = app[0]
        rev = float(app[3])
        
        if name not in reviews_max or reviews_max[name] < rev:
            reviews_max[name] = rev
            
    # Second pass: keep the first row that matches that maximum.
    for app in dataset:
        name = app[0]
        rev = float(app[3])
        
        if name not in added and rev == reviews_max[name]:
            clean_list.append(app)
            added.append(name)
            
    return clean_list
            

In [107]:
print("Expected length after removing duplicates: ", len(google) - 1181) # Expected number of rows once the duplicates are removed

Expected length after removing duplicates:  9659


In [108]:
len(clean_data(google)) # Actual length of the data set after running the cleaning function.

9659
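
The deduplication step can also be written as a single pass over the data. This is a sketch of an alternative, not the notebook's method: `keep_most_reviewed` is a hypothetical name, and the default column indexes assume the Google Play layout (name at 0, reviews at 3). It should keep the same rows as `clean_data`, but in order of each name's first appearance, so the row order can differ.

```python
def keep_most_reviewed(dataset, name_index=0, reviews_index=3):
    """Keep one row per app name: the row with the highest review count.

    On ties, the first row seen is kept.
    """
    best = {}  # name -> best row so far (dicts preserve insertion order)
    for row in dataset:
        name = row[name_index]
        if (name not in best
                or float(row[reviews_index]) > float(best[name][reviews_index])):
            best[name] = row
    return list(best.values())

# Made-up sample rows: name, category, rating, reviews.
sample = [["A", "GAME", "4.0", "10"],
          ["B", "TOOLS", "3.5", "5"],
          ["A", "GAME", "4.1", "30"],
          ["B", "TOOLS", "3.5", "5"]]
print(keep_most_reviewed(sample))
```

The dictionary lookup avoids the repeated list membership tests of the two-pass version, which matters little at this data size but scales better.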

In [109]:
google_clean = clean_data(google)
explore_data(google_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows:  9659
Number of columns:  13


>#### **The expected length equals the actual length.** ;)

> ### We now have an up-to-date list with no duplicates. Now let's eliminate the non-English apps.

>We are going to build a function that uses the ASCII range of English characters to filter the apps, so that only English apps go into our new, cleaner list. If an app name has more than 3 characters with a code point above 127 (outside the ASCII range), the app is removed.
>
>Let's begin..

In [127]:
def english_check(string):  # Assesses whether a string is (probably) English.
    
    non_ascii = 0
    
    
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    
    
    if non_ascii > 3:
        
        return False
    
    else:
        
        return True    

In [240]:
english_check('爱奇艺PPS -《欢乐颂2》电视剧热播') # This app is not of our interest.

False
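
To see why the threshold is "more than 3" rather than "any", it helps to try names that mix English text with a symbol or an emoji. The sketch below restates the check in compact form under a different, hypothetical name (`is_probably_english`) so that it runs on its own; the `max_non_ascii` parameter is an addition for illustration, and the sample names are made up for the demonstration.

```python
def is_probably_english(string, max_non_ascii=3):
    """Return True when at most max_non_ascii characters have a code point above 127."""
    non_ascii = sum(1 for character in string if ord(character) > 127)
    return non_ascii <= max_non_ascii

print(is_probably_english("Instagram"))                      # True
print(is_probably_english("Docs To Go™ Free Office Suite"))  # True: one non-ASCII character
print(is_probably_english("Instachat 😜"))                   # True: one non-ASCII character
print(is_probably_english("爱奇艺PPS -《欢乐颂2》电视剧热播"))   # False
```

This remains a heuristic: a non-English name written mostly with ASCII letters would still pass, and an English name with many emoji would be rejected.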

#### Now the main function: `clean_language`

In [138]:
def clean_language(dataset, name_index): # Takes one dataset, and the index (because each dataset has different indexes for the name column)
    
    clean_2 = []
    
    non_eng_apps = []
    
    for row in dataset:
        name = row[name_index]
        
        if english_check(name):
            clean_2.append(row)
        
        else:
            non_eng_apps.append(row)
    
    return clean_2, "N of non english apps: ", len(non_eng_apps)        

In [139]:
len(appstore)

7197

In [140]:
apple_clean = clean_language(appstore, 1)[0]

In [141]:
len(apple_clean)

6183

> We are left with 6183 App Store apps after removing the non-English ones.

In [142]:
google_english_only = clean_language(google_clean, 0)[0]

In [143]:
len(google_english_only)

9614

In [145]:
clean_language(google_clean, 0)[1:]

('N of non english apps: ', 45)

>45 apps were non-English and have been removed from the `google_clean` list; the updated list is `google_english_only`, and the new App Store dataset is `apple_clean`. Both now contain only English apps, and the Google Play list keeps the most-reviewed entry for each app.

>### Lastly, we are going to keep only the free apps, because our company's revenue comes exclusively from in-app ads.

>For that purpose we need functions that filter out all the non-free apps: in the App Store dataset an app is free when its **price** is `0.0`, and in the Google Play dataset when its **Type** column is `Free`.
Let's do this last cleaning step....
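
An alternative to comparing strings is to parse the price as a number, which lets one function serve both datasets. This is only a sketch: `is_free` and `keep_free` are hypothetical names, and it assumes that paid Google Play prices are strings prefixed with `$`, which is not visible in the rows shown so far.

```python
def is_free(price_text):
    """Return True when a price string such as '0', '0.0' or '$0.00' represents zero."""
    cleaned = price_text.replace("$", "").strip()  # assumption: '$' is the only symbol
    return float(cleaned) == 0.0

def keep_free(dataset, price_index):
    """Return only the rows whose price column parses to zero."""
    return [row for row in dataset if is_free(row[price_index])]

# Made-up sample rows: name, price.
sample = [["Free app", "0"], ["Paid app", "$1.99"], ["Another free app", "0.0"]]
print(keep_free(sample, 1))
```

With the column layouts printed in this notebook, the price index would be 4 for the App Store rows and 7 for the Google Play rows. A price value that is not numeric would raise a `ValueError`, so this only works after malformed rows have been removed.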

In [158]:
google_header, appstore_header

(['App',
  'Category',
  'Rating',
  'Reviews',
  'Size',
  'Installs',
  'Type',
  'Price',
  'Content Rating',
  'Genres',
  'Last Updated',
  'Current Ver',
  'Android Ver'],
 ['id',
  'track_name',
  'size_bytes',
  'currency',
  'price',
  'rating_count_tot',
  'rating_count_ver',
  'user_rating',
  'user_rating_ver',
  'ver',
  'cont_rating',
  'prime_genre',
  'sup_devices.num',
  'ipadSc_urls.num',
  'lang.num',
  'vpp_lic'])

In [241]:
def free_google(dataset):
    
    final_google = []
    
    for row in dataset:
        
        if row[6] == "Free":
            final_google.append(row)
    
    
    return final_google

In [242]:
final_google = free_google(google_english_only) #final clean of GooglePlay dataset.

In [243]:
len(final_google)

8863

>We are left with 8863 Google Play apps: deduplicated, English only and free.

In [244]:
def free_apple(dataset):
    
    final_apple = []
    
    for row in dataset:
        if row[4] == "0.0":
            final_apple.append(row)
        
    
    return final_apple

In [245]:
final_apple = free_apple(apple_clean) 

In [246]:
len(final_apple)

3222

***
***
>We are left with 3222 App Store apps that are free, English only and without duplicates.
***
***

> ## Stage # 3

As we mentioned in the introduction, our aim is to determine the kinds of apps that are likely to attract more users because our revenue is highly influenced by the number of people using our apps.

---

* To minimize risks and overhead, our validation strategy for an app idea is comprised of three steps:

   - Build a minimal Android version of the app, and add it to Google Play.
   - If the app has a good response from users, we then develop it further.
   - If the app is profitable after six months, we also build an iOS version of the app and add it to the App Store.

---

Because our end goal is to add the app on both the App Store and Google Play, we need to find **app profiles** that are successful on both markets. For instance, a profile that might work well for both markets might be a productivity app that makes use of gamification.

### Let's begin the analysis by getting a sense of the most common genres for each market. 

>* For this, we build a function `display_table()` that uses our `freq_table()` function, and apply it to the **prime_genre** column of the App Store data set and to the **Genres** and **Category** columns of the Google Play data set.

In [247]:
def freq_table(dataset, index):
    
    freq = {}
    
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        
        if value in freq:
            freq[value] += 1
        else:
            freq[value] = 1
        
    table_percent = {}
    
    for key in freq:
        percent = (freq[key] / total) * 100
        table_percent[key] = percent
        
    
    return table_percent

In [254]:
def display_table(dataset, index):
    
    table = freq_table(dataset, index)
    
    table_display = []
    
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    
    table_sorted = sorted(table_display, reverse = True)
    
    for entry in table_sorted:
    
        print(entry[1], ":", round(entry[0],2),"%")
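
The two functions above can be condensed with `collections.Counter` from the standard library. The following is a sketch of that alternative, not a replacement for the notebook's code; `freq_table_percent` is a hypothetical name. One difference to be aware of: `most_common()` keeps tied values in first-seen order, whereas `display_table` orders ties by name in reverse.

```python
from collections import Counter

def freq_table_percent(dataset, index):
    """Return (value, percentage) pairs for one column, most frequent first."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return [(value, count / total * 100) for value, count in counts.most_common()]

# Made-up sample rows: name, genre.
sample = [["a", "Games"], ["b", "Games"], ["c", "News"], ["d", "Games"]]
for genre, percent in freq_table_percent(sample, 1):
    print(genre, ":", round(percent, 2), "%")
```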

In [258]:
appstore_header

['id',
 'track_name',
 'size_bytes',
 'currency',
 'price',
 'rating_count_tot',
 'rating_count_ver',
 'user_rating',
 'user_rating_ver',
 'ver',
 'cont_rating',
 'prime_genre',
 'sup_devices.num',
 'ipadSc_urls.num',
 'lang.num',
 'vpp_lic']

In [259]:
display_table(final_apple, -5) # Showing the percentages from the prime genre column in the applestore apps.

Games : 58.16 %
Entertainment : 7.88 %
Photo & Video : 4.97 %
Education : 3.66 %
Social Networking : 3.29 %
Shopping : 2.61 %
Utilities : 2.51 %
Sports : 2.14 %
Music : 2.05 %
Health & Fitness : 2.02 %
Productivity : 1.74 %
Lifestyle : 1.58 %
News : 1.33 %
Travel : 1.24 %
Finance : 1.12 %
Weather : 0.87 %
Food & Drink : 0.81 %
Reference : 0.56 %
Business : 0.53 %
Book : 0.43 %
Navigation : 0.19 %
Medical : 0.19 %
Catalogs : 0.12 %


### Analyzing the AppleStore "prime genre"

>- The most common genre here is Games, with 58.16% of the free English apps. The runner-up is Entertainment, with 7.88%.
>- The majority of the free English apps in the App Store are not meant for practical use; instead, they are oriented toward entertainment and leisure.
>- Only 3.66% of the apps are designed for education, followed by social networking apps, which account for 3.29% of the apps in our data set.

>**The general impression is that the App Store (at least the part containing free English apps) is dominated by apps that are designed for fun (games, entertainment, photo and video, social networking, sports, music, etc.), while apps with practical purposes (education, shopping, utilities, productivity, lifestyle, etc.) are rarer. However, the fact that fun apps are the most numerous doesn't imply that they also have the greatest number of users: the demand might not match the supply.**

In [255]:
google_header # Always checking the headers to get the indexes right.

['App',
 'Category',
 'Rating',
 'Reviews',
 'Size',
 'Installs',
 'Type',
 'Price',
 'Content Rating',
 'Genres',
 'Last Updated',
 'Current Ver',
 'Android Ver']

In [256]:
display_table(final_google, 1) # Frequencies in percentage from the category column of the google apps.

FAMILY : 18.9 %
GAME : 9.73 %
TOOLS : 8.46 %
BUSINESS : 4.59 %
LIFESTYLE : 3.9 %
PRODUCTIVITY : 3.89 %
FINANCE : 3.7 %
MEDICAL : 3.53 %
SPORTS : 3.4 %
PERSONALIZATION : 3.32 %
COMMUNICATION : 3.24 %
HEALTH_AND_FITNESS : 3.08 %
PHOTOGRAPHY : 2.94 %
NEWS_AND_MAGAZINES : 2.8 %
SOCIAL : 2.66 %
TRAVEL_AND_LOCAL : 2.34 %
SHOPPING : 2.25 %
BOOKS_AND_REFERENCE : 2.14 %
DATING : 1.86 %
VIDEO_PLAYERS : 1.79 %
MAPS_AND_NAVIGATION : 1.4 %
FOOD_AND_DRINK : 1.24 %
EDUCATION : 1.16 %
ENTERTAINMENT : 0.96 %
LIBRARIES_AND_DEMO : 0.94 %
AUTO_AND_VEHICLES : 0.93 %
HOUSE_AND_HOME : 0.82 %
WEATHER : 0.8 %
EVENTS : 0.71 %
PARENTING : 0.65 %
ART_AND_DESIGN : 0.64 %
COMICS : 0.62 %
BEAUTY : 0.6 %


### Analyzing the GooglePlay "Category" column

> The landscape seems significantly different on Google Play: there are not that many apps designed for fun, and it seems that a good number of apps are designed for **practical purposes** (family, tools, business, lifestyle, productivity, etc.). However, if we investigate this further, we can see that the **family category** (which accounts for almost 19% of the apps) means mostly **games** for kids.

!["GooglePlayStore pic"](https://camo.githubusercontent.com/9bf24b9efc3d88a3d55f5c09e314987941f0bab5/68747470733a2f2f73332e616d617a6f6e6177732e636f6d2f64712d636f6e74656e742f3335302f7079316d385f66616d696c792e706e67)

>#### Even so, practical apps seem to have a better representation on Google Play compared to App Store. This picture is also confirmed by the frequency table we see for the Genres column:

In [257]:
display_table(final_google, -4) # Percentages from the genres column in the google apps.

Tools : 8.45 %
Entertainment : 6.07 %
Education : 5.35 %
Business : 4.59 %
Productivity : 3.89 %
Lifestyle : 3.89 %
Finance : 3.7 %
Medical : 3.53 %
Sports : 3.46 %
Personalization : 3.32 %
Communication : 3.24 %
Action : 3.1 %
Health & Fitness : 3.08 %
Photography : 2.94 %
News & Magazines : 2.8 %
Social : 2.66 %
Travel & Local : 2.32 %
Shopping : 2.25 %
Books & Reference : 2.14 %
Simulation : 2.04 %
Dating : 1.86 %
Arcade : 1.85 %
Video Players & Editors : 1.77 %
Casual : 1.76 %
Maps & Navigation : 1.4 %
Food & Drink : 1.24 %
Puzzle : 1.13 %
Racing : 0.99 %
Role Playing : 0.94 %
Libraries & Demo : 0.94 %
Auto & Vehicles : 0.93 %
Strategy : 0.9 %
House & Home : 0.82 %
Weather : 0.8 %
Events : 0.71 %
Adventure : 0.68 %
Comics : 0.61 %
Beauty : 0.6 %
Art & Design : 0.6 %
Parenting : 0.5 %
Card : 0.45 %
Casino : 0.43 %
Trivia : 0.42 %
Educational;Education : 0.39 %
Board : 0.38 %
Educational : 0.37 %
Education;Education : 0.34 %
Word : 0.26 %
Casual;Pretend Play : 0.24 %
Music : 0.2 %
Ra

>The difference between the **Genres** and the **Category** columns is not crystal clear, but one thing we can notice is that the **Genres** column is much more granular (it has more categories). We're only looking for the bigger picture at the moment, so we'll only work with the **Category** column moving forward.

---

#### Milestone... 

>>Up to this point, we found that the **AppStore** is dominated by apps designed for fun, while **GooglePlay** shows a more balanced landscape of both practical and for-fun apps. 

>Now we are going to study frequencies and relationships between **users** data and **apps** data....let's get started....
>> First we are going to look at the number of user ratings each genre has, so we can estimate which apps or genres are downloaded the most.

In [260]:
# First we generate a dictionary whose keys are the unique genre names. We start with the cleaned App Store dataset.

unique_genre_apple = freq_table(final_apple, -5)

In [261]:
unique_genre_apple

{'Social Networking': 3.2898820608317814,
 'Photo & Video': 4.9658597144630665,
 'Games': 58.16263190564867,
 'Music': 2.0484171322160147,
 'Reference': 0.5586592178770949,
 'Health & Fitness': 2.0173805090006205,
 'Weather': 0.8690254500310366,
 'Utilities': 2.5139664804469275,
 'Travel': 1.2414649286157666,
 'Shopping': 2.60707635009311,
 'News': 1.3345747982619491,
 'Navigation': 0.186219739292365,
 'Lifestyle': 1.5828677839851024,
 'Entertainment': 7.883302296710118,
 'Food & Drink': 0.8069522036002483,
 'Sports': 2.1415270018621975,
 'Book': 0.4345127250155183,
 'Finance': 1.1173184357541899,
 'Education': 3.662321539416512,
 'Productivity': 1.7380509000620732,
 'Business': 0.5276225946617008,
 'Catalogs': 0.12414649286157665,
 'Medical': 0.186219739292365}

In [206]:
avgs = [] #List of each genre with their corresponding average of ratings per app.

In [212]:
for genre in unique_genre_apple:
    
    total = 0
    
    len_genre = 0
    
    for row in final_apple:
        genre_app = row[-5]
        
        if genre_app == genre:
            user_rat = float(row[5])
            total += user_rat
            len_genre += 1
    
    avg = round(total / len_genre, 2)
    avgs.append(avg)  # add to the list created in the previous cell instead of overwriting it
    
    print(genre, ":", avg) 
    
        

Social Networking : 71548.35
Photo & Video : 28441.54
Games : 22788.67
Music : 57326.53
Reference : 74942.11
Health & Fitness : 23298.02
Weather : 52279.89
Utilities : 18684.46
Travel : 28243.8
Shopping : 26919.69
News : 21248.02
Navigation : 86090.33
Lifestyle : 16485.76
Entertainment : 14029.83
Food & Drink : 33333.92
Sports : 23008.9
Book : 39758.5
Finance : 31467.94
Education : 7003.98
Productivity : 21028.41
Business : 7491.12
Catalogs : 4004.0
Medical : 612.0


- A quick look at the list shows that **Navigation** apps have the highest average number of ratings (our proxy for downloads).

In [269]:
for app in final_apple:
    
    if app[-5] == "Navigation":
        
        print(app[1], ":", app[5]) # print name and number of ratings

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


>Waze and Google Maps account for almost all of the Navigation ratings, which biases the genre average.

>>The same pattern applies to social networking apps, where the average number is heavily influenced by a few giants like Facebook, Pinterest, Skype, etc. Same applies to music apps, where a few big players like Pandora, Spotify, and Shazam heavily influence the average number.

>>Our aim is to find popular genres, but navigation, social networking or music apps might seem more popular than they really are. The average number of ratings seems to be skewed by very few apps which have hundreds of thousands of user ratings, while the other apps may struggle to get past the 10,000 threshold. We could get a better picture by removing these extremely popular apps for each genre and then reworking the averages, but we'll leave this level of detail for later.
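
One inexpensive way to see how strongly the giants pull the average is to compare the mean with the median, which is far less sensitive to a few extreme values. The sketch below uses the six Navigation rating counts printed above and the standard library's `statistics` module; it is an illustration, not part of the original analysis.

```python
from statistics import mean, median

# Rating counts of the six free English Navigation apps listed above.
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

print(round(mean(navigation_ratings), 2))  # 86090.33, the genre average shown earlier
print(median(navigation_ratings))          # 8196.5
```

The median is roughly ten times smaller than the mean, which supports the reading that two apps account for most of the genre's ratings.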

In [210]:
# Sorting the list of average ratings to picture the spread of the data:
# a few giants at the top do not represent the genres as a whole.

sorted(avgs, reverse = True) 

[86090.33,
 74942.11,
 71548.35,
 57326.53,
 52279.89,
 39758.5,
 33333.92,
 31467.94,
 28441.54,
 28243.8,
 26919.69,
 23298.02,
 23008.9,
 22788.67,
 21248.02,
 21028.41,
 18684.46,
 16485.76,
 14029.83,
 7491.12,
 7003.98,
 4004.0,
 612.0]

In [271]:
for app in final_apple:
    
    if app[-5] == "Reference":
     
        print(app[1], ":", app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


---
---
>**Reference** apps have 74942.11 user ratings on average, but it is actually the Bible and Dictionary.com that skew the average upward.
---
---

***
***
>However, this niche seems to show some potential. One thing we could do is take another popular book and turn it into an app where we could add different features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes about the book, etc. On top of that, we could also embed a dictionary within the app, so users don't need to exit our app to look up words in an external app.
>
>This idea seems to fit well with the fact that the App Store is dominated by for-fun apps. This suggests the market might be a bit saturated with for-fun apps, which means a practical app might have more of a chance to stand out among the huge number of apps on the App Store.
>
>Other genres that seem popular include weather, book, food and drink, or finance. The book genre seems to overlap a bit with the app idea we described above, but the other genres don't seem too interesting to us:

>>   - **Weather apps**: people generally don't spend too much time in-app, and the chances of making a profit from in-app ads are low. Also, getting reliable live weather data may require us to connect our apps to non-free APIs.

>>   - **Food and drink**: examples here include Starbucks, Dunkin' Donuts, McDonald's, etc. So making a popular food and drink app requires actual cooking and a delivery service, which is outside the scope of our company.

>>   - **Finance apps**: these apps involve banking, paying bills, money transfer, etc. Building a finance app requires domain knowledge, and we don't want to hire a finance expert just to build an app.


### Most Popular Apps by Genre on Google Play

In [272]:
display_table(final_google, 5) # Frequencies of the Installs column.

1,000,000+ : 15.73 %
100,000+ : 11.55 %
10,000,000+ : 10.55 %
10,000+ : 10.2 %
1,000+ : 8.39 %
100+ : 6.92 %
5,000,000+ : 6.83 %
500,000+ : 5.56 %
50,000+ : 4.77 %
5,000+ : 4.51 %
10+ : 3.54 %
500+ : 3.25 %
50,000,000+ : 2.3 %
100,000,000+ : 2.13 %
50+ : 1.92 %
5+ : 0.79 %
1+ : 0.51 %
500,000,000+ : 0.27 %
1,000,000,000+ : 0.23 %
0+ : 0.05 %


>- One problem with this data is that it is not precise. For instance, we don't know whether an app with 100,000+ installs has 100,000 installs, 200,000, or 350,000. However, we don't need very precise data for our purposes: we only want to get an idea of which app genres attract the most users.
>>- We're going to leave the numbers as they are, which means that we'll consider that an app with 100,000+ installs has 100,000 installs, an app with 1,000,000+ installs has 1,000,000 installs, and so on.
>>>- To perform computations, however, we'll need to convert each install number to a float. This means removing the commas and the plus characters, otherwise the conversion will fail and raise an error. We'll do this directly in the loop below, where we also compute the average number of installs for each genre (category).
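
The conversion described above can be isolated in a small helper, which makes it easy to check on a few values before running the full loop. This is a minimal sketch; `parse_installs` is a hypothetical name that does not appear elsewhere in the notebook.

```python
def parse_installs(text):
    """Convert an Installs string such as '1,000,000+' to a float."""
    return float(text.replace(",", "").replace("+", ""))

print(parse_installs("1,000,000+"))
print(parse_installs("500+"))
print(parse_installs("0+"))
```

Calling `float("1,000,000+")` directly raises a `ValueError`, which is why both characters have to be stripped first.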


In [217]:
google_header

['App',
 'Category',
 'Rating',
 'Reviews',
 'Size',
 'Installs',
 'Type',
 'Price',
 'Content Rating',
 'Genres',
 'Last Updated',
 'Current Ver',
 'Android Ver']

In [274]:
unique_google = freq_table(final_google, 1)

In [275]:
unique_google

{'ART_AND_DESIGN': 0.6431230960171499,
 'AUTO_AND_VEHICLES': 0.9251946293580051,
 'BEAUTY': 0.5979916506826132,
 'BOOKS_AND_REFERENCE': 2.1437436533904997,
 'BUSINESS': 4.592124562789123,
 'COMICS': 0.6205573733498815,
 'COMMUNICATION': 3.2381812027530184,
 'DATING': 1.8616721200496444,
 'EDUCATION': 1.1621347173643235,
 'ENTERTAINMENT': 0.9590432133589079,
 'EVENTS': 0.7108202640189552,
 'FINANCE': 3.7007785174320205,
 'FOOD_AND_DRINK': 1.241114746699763,
 'HEALTH_AND_FITNESS': 3.0802211440821394,
 'HOUSE_AND_HOME': 0.8236488773552973,
 'LIBRARIES_AND_DEMO': 0.9364774906916393,
 'LIFESTYLE': 3.9038700214374367,
 'GAME': 9.725826469592688,
 'FAMILY': 18.898792733837304,
 'MEDICAL': 3.5315355974275078,
 'SOCIAL': 2.6627552747376737,
 'SHOPPING': 2.245289405393208,
 'PHOTOGRAPHY': 2.944826808078529,
 'SPORTS': 3.396141261423897,
 'TRAVEL_AND_LOCAL': 2.335552296062281,
 'TOOLS': 8.462146000225657,
 'PERSONALIZATION': 3.317161232088458,
 'PRODUCTIVITY': 3.8925871601038025,
 'PARENTING': 0.

In [287]:
avg_google = []

In [288]:
# Looping, replacing the string and calculating the average.

for category in unique_google:
    
    total = 0
    
    len_cat = 0
    
    for row in final_google:
        
        if row[1] == category:
            string = row[5].replace("+", "")
            string = string.replace(",", "")
            total += float(string)
            len_cat += 1
    
    avg = total / len_cat
    avg_google.append(round(avg, 2))
    
    print(category, ":", round(avg, 2))

ART_AND_DESIGN : 1986335.09
AUTO_AND_VEHICLES : 647317.82
BEAUTY : 513151.89
BOOKS_AND_REFERENCE : 8767811.89
BUSINESS : 1712290.15
COMICS : 817657.27
COMMUNICATION : 38456119.17
DATING : 854028.83
EDUCATION : 1833495.15
ENTERTAINMENT : 11640705.88
EVENTS : 253542.22
FINANCE : 1387692.48
FOOD_AND_DRINK : 1924897.74
HEALTH_AND_FITNESS : 4188821.99
HOUSE_AND_HOME : 1331540.56
LIBRARIES_AND_DEMO : 638503.73
LIFESTYLE : 1437816.27
GAME : 15588015.6
FAMILY : 3697848.17
MEDICAL : 120550.62
SOCIAL : 23253652.13
SHOPPING : 7036877.31
PHOTOGRAPHY : 17840110.4
SPORTS : 3638640.14
TRAVEL_AND_LOCAL : 13984077.71
TOOLS : 10801391.3
PERSONALIZATION : 5201482.61
PRODUCTIVITY : 16787331.34
PARENTING : 542603.62
WEATHER : 5074486.2
VIDEO_PLAYERS : 24727872.45
NEWS_AND_MAGAZINES : 9549178.47
MAPS_AND_NAVIGATION : 4056941.77


In [290]:
sorted(avg_google, reverse = True)

[38456119.17,
 24727872.45,
 23253652.13,
 17840110.4,
 16787331.34,
 15588015.6,
 13984077.71,
 11640705.88,
 10801391.3,
 9549178.47,
 8767811.89,
 7036877.31,
 5201482.61,
 5074486.2,
 4188821.99,
 4056941.77,
 3697848.17,
 3638640.14,
 1986335.09,
 1924897.74,
 1833495.15,
 1712290.15,
 1437816.27,
 1387692.48,
 1331540.56,
 854028.83,
 817657.27,
 647317.82,
 638503.73,
 542603.62,
 513151.89,
 253542.22,
 120550.62]
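
The sorted list above no longer shows which category each average belongs to. As a sketch, keeping the averages in a dictionary and sorting its items preserves the names. The dictionary `category_averages` is an assumption made for illustration, filled with four of the values printed above.

```python
# Four of the category averages printed above, keyed by category name.
category_averages = {
    "MEDICAL": 120550.62,
    "SOCIAL": 23253652.13,
    "COMMUNICATION": 38456119.17,
    "VIDEO_PLAYERS": 24727872.45,
}

# Sort the (category, average) pairs by the average, largest first.
ranked = sorted(category_averages.items(), key=lambda item: item[1], reverse=True)

for category, average in ranked:
    print(f"{category}: {average:,.2f}")
```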

- On average, communication apps have the most installs: 38,456,119. This number is heavily skewed up by a few apps that have over one billion installs (WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail, and Hangouts), and a few others with over 100 and 500 million installs:

In [229]:
for app in final_google:
    
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        
        print(app[0], ':', app[5])


WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

---
>If we removed all the communication apps with 100 million or more installs, the average would drop roughly tenfold:
---

In [231]:
under_100_m = []

for app in final_google:
    # Strip the thousands separators and the trailing '+' so the value can be parsed.
    n_installs = app[5].replace(',', '').replace('+', '')
    if app[1] == 'COMMUNICATION' and float(n_installs) < 100000000:
        under_100_m.append(float(n_installs))

# Average installs for communication apps below 100 million installs.
round(sum(under_100_m) / len(under_100_m), 2)


3603485.39
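
With the notebook's figures, 38,456,119.17 / 3,603,485.39 is about 10.7, which is where "roughly ten times" comes from. The same comparison can be repeated for any category. A minimal sketch, assuming rows shaped like `final_google` (category at index 1, installs at index 5); the helpers `parse_installs` and `mean_installs` and the sample rows are made up for illustration and are not part of the notebook:

```python
def parse_installs(text):
    """Turn a Google Play install bucket such as '1,000,000+' into a float."""
    return float(text.replace(",", "").replace("+", ""))


def mean_installs(rows, category, cap=None):
    """Average installs for one category, optionally ignoring apps at or above `cap`."""
    values = [parse_installs(row[5]) for row in rows if row[1] == category]
    if cap is not None:
        values = [v for v in values if v < cap]
    # Guard against an empty category so the division never fails.
    return round(sum(values) / len(values), 2) if values else 0.0


# Made-up rows shaped like final_google: name at index 0, category at 1, installs at 5.
sample_rows = [
    ["Giant Chat", "COMMUNICATION", "", "", "", "1,000,000,000+"],
    ["Mid Chat", "COMMUNICATION", "", "", "", "10,000,000+"],
    ["Small Chat", "COMMUNICATION", "", "", "", "500,000+"],
    ["Some Game", "GAME", "", "", "", "100,000,000+"],
]

with_giants = mean_installs(sample_rows, "COMMUNICATION")
without_giants = mean_installs(sample_rows, "COMMUNICATION", cap=100_000_000)

print(with_giants, without_giants)
```

Comparing the two averages side by side shows how much a handful of very large apps pulls the mean up.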

>We see the same pattern for the video players category, the runner-up with an average of 24,727,872 installs. The market is dominated by apps like YouTube, Google Play Movies & TV, and MX Player. The pattern repeats for social apps (where we have giants like Facebook, Instagram, Google+, etc.), photography apps (Google Photos and other popular photo editors), and productivity apps (Microsoft Word, Dropbox, Google Calendar, Evernote, etc.).

>>Again, the main concern is that these app genres might seem more popular than they really are. Moreover, these niches seem to be dominated by a few giants that are hard to compete against.
>>The game genre seems pretty popular, but we found earlier that this part of the market seems a bit saturated, so we'd like to come up with a different app recommendation if possible.

>>>The books and reference genre looks fairly popular as well, with an average of 8,767,811 installs. It's worth exploring this genre in more depth, since we found it has some potential to work well on the App Store, and our aim is to recommend an app genre that shows potential for being profitable on both the App Store and Google Play.

---
## Stage #4
---
### Let's take a closer look at this genre:
---

In [291]:
for app in final_google:
    
    if app[1] == "BOOKS_AND_REFERENCE":
        
        print(app[0], ":", app[5])
        

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E

In [234]:
for app in final_google:
    
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000,000+'
                                            or app[5] == '500,000,000+'
                                            or app[5] == '100,000,000+'):
        
        print(app[0], ':', app[5])

Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad 📖 Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+
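
To judge how many apps sit in each popularity tier, it may help to count apps per install bucket rather than print them one by one. A minimal sketch, assuming rows shaped like `final_google` (category at index 1, installs at index 5); the sample rows and the helpers `bucket_counts` and `bucket_size` are made up for illustration:

```python
from collections import Counter

# Made-up rows shaped like final_google: name at index 0, category at 1, installs at 5.
sample_rows = [
    ["Reader A", "BOOKS_AND_REFERENCE", "", "", "", "1,000,000+"],
    ["Reader B", "BOOKS_AND_REFERENCE", "", "", "", "1,000,000+"],
    ["Dictionary C", "BOOKS_AND_REFERENCE", "", "", "", "500,000+"],
    ["Store D", "BOOKS_AND_REFERENCE", "", "", "", "100,000,000+"],
    ["Chat E", "COMMUNICATION", "", "", "", "1,000,000,000+"],
]


def bucket_counts(rows, category):
    """Count how many apps of `category` fall into each install bucket."""
    return Counter(row[5] for row in rows if row[1] == category)


def bucket_size(bucket):
    """Numeric lower bound of a bucket label such as '1,000,000+'."""
    return int(bucket.replace(",", "").replace("+", ""))


counts = bucket_counts(sample_rows, "BOOKS_AND_REFERENCE")

# Print buckets from largest to smallest so the top tiers are easy to spot.
for bucket in sorted(counts, key=bucket_size, reverse=True):
    print(bucket, ":", counts[bucket])
```

A count per bucket shows at a glance whether the top tiers hold only a handful of apps.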


> However, only a few apps have 100 million or more installs, so this market still shows potential. Let's look for app ideas among the apps in the middle of the popularity range (between 1,000,000 and 100,000,000 downloads):

In [236]:
for app in final_google:
    
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000+'
                                            or app[5] == '5,000,000+'
                                            or app[5] == '10,000,000+'
                                            or app[5] == '50,000,000+'):
        
        print(app[0], ':', app[5])

Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
ReadEra – free ebook reader : 1,000,000+
Ebook Reader : 5,000,000+
Read books online : 5,000,000+
eBoox: book reader fb2 epub zip : 1,000,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
Moon+ Reader : 10,000,000+
English-Myanmar Dictionary : 1,000,000+
Golden Dictionary (EN-AR) : 1,000,000+
All Language Translator Free : 1,000,000+
Aldiko Book Reader : 10,000,000+
Dictionary - WordWeb : 5,000,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Al-Quran (Free) : 10,000,000+
Al Quran Indonesia : 10,000,000+
Al'Quran Bahasa Indonesia : 10,000,000+
Al Quran Al karim : 1,000,000+
Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+
Koran Read &MP3 30 Juz Offline : 1,000,000+
H
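
The filters above list each install bucket by hand. A sketch of an alternative that parses the bucket into a number and tests a range instead; `parse_installs`, `apps_in_range`, and the sample rows are hypothetical, and the column positions are assumed to match `final_google`:

```python
def parse_installs(text):
    """Turn an install bucket such as '5,000,000+' into an integer."""
    return int(text.replace(",", "").replace("+", ""))


def apps_in_range(rows, category, low, high):
    """Return (name, installs) pairs for apps in `category` with low <= installs < high."""
    return [
        (row[0], row[5])
        for row in rows
        if row[1] == category and low <= parse_installs(row[5]) < high
    ]


# Made-up rows shaped like final_google: name at index 0, category at 1, installs at 5.
sample_rows = [
    ["Reader A", "BOOKS_AND_REFERENCE", "", "", "", "10,000,000+"],
    ["Dictionary B", "BOOKS_AND_REFERENCE", "", "", "", "500,000+"],
    ["Store C", "BOOKS_AND_REFERENCE", "", "", "", "100,000,000+"],
    ["Reader D", "BOOKS_AND_REFERENCE", "", "", "", "1,000,000+"],
    ["Chat E", "COMMUNICATION", "", "", "", "5,000,000+"],
]

# Mid-popularity apps: at least 1 million but fewer than 100 million installs.
mid_tier = apps_in_range(sample_rows, "BOOKS_AND_REFERENCE", 1_000_000, 100_000_000)

for name, installs in mid_tier:
    print(name, ":", installs)
```

A numeric range keeps working if a new bucket label appears in the data, whereas a chain of string comparisons would silently skip it.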

>This niche seems to be dominated by software for processing and reading ebooks, as well as various collections of libraries and dictionaries, so it's probably not a good idea to build similar apps, since they would face significant competition.
>>We also notice there are quite a few apps built around the Quran, which suggests that building an app around a popular book can be profitable. It seems that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets.
>>>However, the market already looks full of libraries, so we need to add some special features besides the raw version of the book. These might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.

# Conclusions...

>In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.

>We concluded that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets. The markets are already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.