#  **Apps Data Analysis**
### **Introduction**:
This project helps a company that builds free-to-download Android and iOS mobile apps, where the main source of revenue is in-app ads.

### **Goal**: 
Our goal for this project is to analyze data on Android and iOS apps to help our developers understand what types of apps are likely to attract more users.


In [1]:
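## Open the file ##

from csv import reader

# encoding='utf8' assumes the CSV files are UTF-8 encoded
open_file_iOS = open('AppleStore.csv', encoding='utf8')
open_file_Goo = open('googleplaystore.csv', encoding='utf8')
read_file_iOS = reader(open_file_iOS)
read_file_Goo = reader(open_file_Goo)

iOS_data = list(read_file_iOS)
Goo_data = list(read_file_Goo)

# Close the files once their contents are in memory
open_file_iOS.close()
open_file_Goo.close()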

## Define the Header ##

iOS_header = iOS_data[0]
Goo_header = Goo_data[0]

## Remove the Header Row ##

iOS_data = iOS_data[1:]
Goo_data = Goo_data[1:]

def explore_data(dataset, start, end, Header_row = False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if Header_row:
        print('Number of rows:', len(dataset[1:]))
        print('Number of columns:', len(dataset[0]))
    else:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
        
print('iOS apps info:')       
explore_data(iOS_data,0,2)
print('\n')
print('GooglePlay apps info:')
explore_data(Goo_data,0,2)

print('\n')

print(iOS_header)
print(Goo_header)


iOS apps info:
['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


Number of rows: 7197
Number of columns: 17


GooglePlay apps info:
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'l

- Here we identify several columns in iOS apps which could be useful for our analysis: 'id', 'price', 'user_rating', 'prime_genre'
- And for GooglePlay: 'App', 'Category', 'Reviews', 'Installs', 'Price', 'Genres'
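
The loading step in the first cell can also be wrapped in a small helper. The sketch below is one possible version, not the notebook's original code: `read_rows` and `load_csv` are hypothetical names, and `encoding='utf8'` is an assumption about how the two CSV files are encoded. The demo uses an in-memory file with made-up rows so it runs without the datasets.

```python
from csv import reader
from io import StringIO


def read_rows(file_obj):
    """Split an open CSV file into (header, data_rows)."""
    rows = list(reader(file_obj))
    return rows[0], rows[1:]


def load_csv(path, encoding='utf8'):
    """Open `path`, read it, and close it again automatically."""
    # newline='' is what the csv module documentation recommends for files.
    with open(path, encoding=encoding, newline='') as f:
        return read_rows(f)


# Demo on an in-memory file with made-up rows (not the real dataset).
sample = StringIO('App,Category\nDemo App,TOOLS\nOther App,GAME\n')
demo_header, demo_rows = read_rows(sample)
print(demo_header)
print(len(demo_rows))
```

With the real files, this would be called as `iOS_header, iOS_data = load_csv('AppleStore.csv')`.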

## 1. Data Cleaning:
- Identify errors (repeated rows or rows with a missing column).
- Eliminate non-English apps (since our target is English apps).

First, we identify the rows with a missing column:

In [2]:
print(Goo_header)

## Print the row that contains the error and the row after it.
print(Goo_data[10472:10474])

## Delete the row with the `del` statement.
## Run this cell only once: running it again would delete a valid row.
del Goo_data[10472]
print('\n')

## Print row 10472 to check that the bad row was deleted.
print(Goo_data[10472])



['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
[['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up'], ['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']]


['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']


To find rows with a missing column, I wrote a `for` loop combined with an `if` statement.

In [3]:
for row in Goo_data:
    if len(row) != len(Goo_header):
        print(row)
        
for row in iOS_data:
    if len(row) != len(iOS_header):
        print(row)

Since this loop prints nothing, no rows with a missing column remain.
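
Hard-coding the row number (10472) works once the bad row is known. A minimal sketch of how its position could be found instead, using a hypothetical helper `malformed_indexes` and made-up rows:

```python
def malformed_indexes(data, header):
    """Return the positions of rows whose length differs from the header's."""
    return [i for i, row in enumerate(data) if len(row) != len(header)]


# Made-up example: the second row is missing its 'Category' value.
toy_header = ['App', 'Category', 'Rating']
toy_data = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '4.5'],
    ['App C', 'GAME', '3.9'],
]
bad = malformed_indexes(toy_data, toy_header)
print(bad)

# Delete from the end so that earlier positions stay valid.
for i in reversed(bad):
    del toy_data[i]
print(len(toy_data))
```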

Next, we identify repeated rows in GooglePlay's data:


In [4]:
repeated_row = []
unique_row = []

for row in Goo_data:
    if row in unique_row:
        repeated_row.append(row)
    else:
        unique_row.append(row)
        

print(len(repeated_row))
print(repeated_row)


483
[['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up'], ['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 31, 2018', 'Varies with device', 'Varies with device'], ['Google My Business', 'BUSINESS', '4.4', '70991', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 24, 2018', '2.19.0.204537701', '4.4 and up'], ['ZOOM Cloud Meetings', 'BUSINESS', '4.4', '31614', '37M', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 20, 2018', '4.1.28165.0716', '4.0 and up'], ['join.me - Simple Meetings', 'BUSINESS', '4.0', '6989', 'Varies with device', '1,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 16, 2018', '4.3.0.508', '4.4 and up'], ['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July

Here I tried to find fully identical rows, but then I realized that some rows might share an app name while differing in rating or last-updated date, since the data was collected at different points in time.

So I modified the code to compare app names only, which finds all the repeated entries.

In [5]:
repeated_apps = []
unique_apps = []

for row in Goo_data:
    name = row[0]
    if name in unique_apps:
        repeated_apps.append(name)
    else:
        unique_apps.append(name)

print(len(repeated_apps))
print(repeated_apps)

1181
['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software', 'MailChimp - Email, Marketing Automation', 'Crew - Free Messaging and Scheduling', 'Asana: organize team projects', 'Google Analytics', 'AdWords Express', 'Accounting App - Zoho Books', 'Invoice & Time Tracking - Zoho', 'join.me - Simple Meetings', 'Invoice 2go — Professional Invoices and Estimates', 'SignEasy | Sign and Fill PDF and other Documents', 'Quick PDF Scanner + OCR FREE', 'Genius Scan - PDF Scanner', 'Tiny Scanner - PDF Scanner App', 'Fast Scanner : Free PDF Scan', 'Mobile Doc Scanner (MDScan) Lite', 'TurboScan: scan documents and receipts in PDF', 'Tiny Scanner Pro: PDF Doc Scan', 'Docs To Go™ Free Office Suite', 'OfficeSuite : Free Office + PDF Edi

Since the number of repeated entries seems unreasonably high, I decided to check one of them.


In [6]:
for row in Goo_data:
    if row[0] == 'Quick PDF Scanner + OCR FREE':
        print(row)

['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']
['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']
['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80804', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']


So it is true that there are a lot of repeated rows.
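
Checking `name in unique_apps` scans a list on every row, which gets slow for large datasets. As an alternative sketch (not the approach used in this notebook), `collections.Counter` can count every name in one pass. The rows below are made up for illustration.

```python
from collections import Counter

# Made-up rows: [name, category, rating, reviews]
toy_rows = [
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Slack', 'BUSINESS', '4.4', '51507'],
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Box', 'BUSINESS', '4.2', '159870'],
]

name_counts = Counter(row[0] for row in toy_rows)

# Names that appear more than once, with how many extra copies each has.
extra_copies = {name: n - 1 for name, n in name_counts.items() if n > 1}
print(extra_copies)
print(sum(extra_copies.values()))
```

The sum of the extra copies corresponds to `len(repeated_apps)` in the cell above, because that list receives one entry for every repeat.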

Now let's remove them from our data.

Instead of removing them randomly, I chose to keep the most recent entry for each app, which should be the one with the highest number of reviews. Here's how:

In [7]:
reviews_max = {}
for row in Goo_data:
    name = row[0]
    n_reviews = float(row[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
        
print(len(reviews_max))

9659


First, we create a dictionary that holds the highest review count for each app.
Then we find the matching rows in our data and append them to a new list of lists.

In [8]:
android_clean = []
already_added = []
for row in Goo_data:
    n_review = float(row[3])
    name = row[0]
    if n_review == reviews_max[name] and name not in already_added:
        android_clean.append(row)
        already_added.append(name)

print(already_added[0:3])
print(android_clean[0:3])
len(android_clean)


['Photo Editor & Candy Camera & Grid & ScrapBook', 'U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'Sketch - Draw & Paint']
[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'], ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']]


9659

Here the list **android_clean** is the cleaned data we are going to analyze.
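
The two loops above can also be folded into one. The following is a sketch under the same rule (keep the row with the most reviews, and the first one on ties). `keep_most_reviewed` is a hypothetical name and the rows are made up.

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the one with the highest review count."""
    best = {}  # app name -> row with the most reviews seen so far
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' keeps the first row when review counts tie.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


toy_rows = [
    ['Scanner', 'BUSINESS', '4.2', '80805'],
    ['Notes', 'TOOLS', '4.0', '120'],
    ['Scanner', 'BUSINESS', '4.2', '80804'],
    ['Scanner', 'BUSINESS', '4.2', '80805'],
]
toy_clean = keep_most_reviewed(toy_rows)
print(len(toy_clean))
print(toy_clean[0])
```

One difference to keep in mind: the result is ordered by each name's first appearance, which can differ from the order produced by the two-loop version.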
Now we're going to do the same analysis on iOS apps.

In [9]:
repeated_row = []
unique_row = []

for row in iOS_data:
    if row in unique_row:
        repeated_row.append(row)
    else:
        unique_row.append(row)
        

print(len(repeated_row))
print(repeated_row)

0
[]


The result shows that there are no repeated rows in the iOS data.

Let's keep going.

Our next step is to remove non-English apps, since our analysis targets English apps.


In [10]:
def Eng(string):
    
    for text in string:
        if ord(text) > 127:
            return False
    return True


Here I create a function to identify non-English characters. The characters commonly used in English text all fall in the ASCII range, with code points from 0 to 127. A character whose code point (as returned by `ord()`) is greater than 127 is not commonly used in English.

Let's test whether this function works on several example strings.

In [11]:
print(Eng('Instagram'))
print(Eng('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(Eng('Docs To Go™ Free Office Suite'))
print(Eng('Instachat 😜'))

True
False
False
False


The function wrongly rejects the last two names: they belong to English apps, but they contain an emoji or the `™` symbol.

In [12]:
print(ord('😜'))
print(ord('™'))

128540
8482


The code points of these characters are far above 127, yet the apps are still English apps.
We should modify the function so that such apps stay in our data.

In [13]:
def Eng(string):
    
    count = 0
    for text in string:
        if ord(text) > 127:
            count += 1
    if count > 3:
        return False
    return True

This new function tolerates up to three characters outside the ASCII range. Some English app names may contain more than three such characters, but this function should be good enough. Now, let's test it with the examples we had.

In [14]:
print(Eng('Instagram'))
print(Eng('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(Eng('Docs To Go™ Free Office Suite'))
print(Eng('Instachat 😜'))

True
False
True
True
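
The threshold check can be written more compactly. A sketch, assuming the same rule (at most three characters with a code point above 127). `is_english` and `max_non_ascii` are hypothetical names, not part of the notebook.

```python
def is_english(name, max_non_ascii=3):
    """Return True if `name` has at most `max_non_ascii` non-ASCII characters."""
    non_ascii = sum(1 for ch in name if ord(ch) > 127)
    return non_ascii <= max_non_ascii


examples = [
    'Instagram',
    'Docs To Go™ Free Office Suite',
    'Instachat 😜',
    '爱奇艺PPS -《欢乐颂2》电视剧热播',
]
for name in examples:
    print(name, is_english(name))
```

Making the limit a parameter keeps the cutoff visible and easy to adjust if three turns out to be too strict or too lenient.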


The result is better than the original function. Now, let's use it in our data to filter out non-English apps.

In [15]:
android_English = []

for row in android_clean:
    name = row[0]
    if Eng(name):
        android_English.append(row)

print(len(android_English))

9614


So we have 9614 English apps in the list.
Let's do the same thing for iOS apps.

In [16]:
iOS_English = []

for row in iOS_data:
    name = row[2]
    if Eng(name):
        iOS_English.append(row)

print(len(iOS_English))

6183


So there are 6183 English apps in the iOS data.

Lastly, since we plan to analyze free English apps, we need to isolate the free apps in our data.

In [17]:
android = []

for row in android_English:
    app_type = row[6]  # avoid shadowing the built-in `type`
    if app_type == 'Free':
        android.append(row)
print(len(android))

iOS = []

for row in iOS_English:
    price = float(row[5])
    if price == 0.0:
        iOS.append(row)
print(len(iOS))

8863
3222


## 2. Data Analysis

### Common Genres

Finally, after cleaning and filtering our data, we have the data we want and can start our analysis.

Before we start, let's talk about the app's validation strategy and the goal of our analysis.

To minimize risks and overhead, the app development process consists of three steps:
 1. Build a minimal Android version of the app, and add it to Google Play.
 2. If the app has a good response from users, we develop it further.
 3. If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.
 
Since our goal is to develop apps for both the App Store and Google Play, we need to find app profiles that are successful on both platforms.

For this purpose, we'll need to build frequency tables.



In [18]:
def freq_table(data, index):
    frequency_table = {}
    total = 0  # number of rows seen (avoids shadowing the built-in `all`)
    for row in data:
        total += 1
        target = row[index]
        if target in frequency_table:
            frequency_table[target] += 1
        else:
            frequency_table[target] = 1
    table_percentage = {}
    
    for key in frequency_table:
        percentage = (frequency_table[key] / total) * 100
        table_percentage[key] = percentage
    return table_percentage


Let's test it on the iOS data with the `prime_genre` column, whose index is 12.

In [19]:
iOS_freq = freq_table(iOS,12)
print(iOS_freq)

{'Productivity': 1.7380509000620732, 'Weather': 0.8690254500310366, 'Shopping': 2.60707635009311, 'Reference': 0.5586592178770949, 'Finance': 1.1173184357541899, 'Music': 2.0484171322160147, 'Utilities': 2.5139664804469275, 'Travel': 1.2414649286157666, 'Social Networking': 3.2898820608317814, 'Sports': 2.1415270018621975, 'Health & Fitness': 2.0173805090006205, 'Games': 58.16263190564867, 'Food & Drink': 0.8069522036002483, 'News': 1.3345747982619491, 'Book': 0.4345127250155183, 'Photo & Video': 4.9658597144630665, 'Entertainment': 7.883302296710118, 'Business': 0.5276225946617008, 'Lifestyle': 1.5828677839851024, 'Education': 3.662321539416512, 'Navigation': 0.186219739292365, 'Medical': 0.186219739292365, 'Catalogs': 0.12414649286157665}


Now, we wrap this function in another function, `display_table()`, which prints the frequency table in descending order.

In [20]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key) ##The dictionary value goes first
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

print('iOS Prime Genre frequency table')
print('\n')
iOS_prime_genre = display_table(iOS, 12)

iOS Prime Genre frequency table


Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665
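
For comparison, the same percentage table can be built with the standard library's `collections.Counter`. This is a sketch, not a replacement for the functions above. `freq_percentages` is a hypothetical name and the rows are made up.

```python
from collections import Counter


def freq_percentages(rows, index):
    """Return (value, percentage) pairs, most frequent first."""
    counts = Counter(row[index] for row in rows)
    total = sum(counts.values())
    # most_common() sorts by count, highest first.
    return [(value, n / total * 100) for value, n in counts.most_common()]


toy_rows = [['A', 'Games'], ['B', 'Games'], ['C', 'Music'], ['D', 'Games']]
toy_table = freq_percentages(toy_rows, 1)
for genre, pct in toy_table:
    print(genre, ':', round(pct, 2))
```

Returning a sorted list of pairs keeps the counting and the printing separate, so the same result can be reused for further analysis.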


Now we see that the most common genre on the App Store is `Games`, followed by `Entertainment` and then `Photo & Video`. `Games` and `Entertainment` alone make up around 66% of the free English apps. In other words, most of the apps are developed for entertainment (such as games, photo and video, social networking, sports, music) rather than for practical purposes (such as education, shopping, utilities, productivity, lifestyle).

Despite the pattern we see in this frequency table, we can't conclude from this data alone that these apps have a large number of users. The table only shows how common each genre is among free English apps, not how many installs it has.

In [21]:
print('android Category frequency table')
print('\n')
android_Category = display_table(android, 1)

android Category frequency table


FAMILY : 18.898792733837304
GAME : 9.725826469592688
TOOLS : 8.462146000225657
BUSINESS : 4.592124562789123
LIFESTYLE : 3.9038700214374367
PRODUCTIVITY : 3.8925871601038025
FINANCE : 3.7007785174320205
MEDICAL : 3.5315355974275078
SPORTS : 3.396141261423897
PERSONALIZATION : 3.317161232088458
COMMUNICATION : 3.2381812027530184
HEALTH_AND_FITNESS : 3.0802211440821394
PHOTOGRAPHY : 2.944826808078529
NEWS_AND_MAGAZINES : 2.798149610741284
SOCIAL : 2.6627552747376737
TRAVEL_AND_LOCAL : 2.335552296062281
SHOPPING : 2.245289405393208
BOOKS_AND_REFERENCE : 2.1437436533904997
DATING : 1.8616721200496444
VIDEO_PLAYERS : 1.7939749520478394
MAPS_AND_NAVIGATION : 1.399074805370642
FOOD_AND_DRINK : 1.241114746699763
EDUCATION : 1.1621347173643235
ENTERTAINMENT : 0.9590432133589079
LIBRARIES_AND_DEMO : 0.9364774906916393
AUTO_AND_VEHICLES : 0.9251946293580051
HOUSE_AND_HOME : 0.8236488773552973
WEATHER : 0.8010831546880289
EVENTS : 0.7108202640189552
PARENTING : 0.

In [22]:
print('android Genres frequency table')
print('\n')
android_Genres = display_table(android, 9)

android Genres frequency table


Tools : 8.450863138892023
Entertainment : 6.070179397495204
Education : 5.348076272142616
Business : 4.592124562789123
Productivity : 3.8925871601038025
Lifestyle : 3.8925871601038025
Finance : 3.7007785174320205
Medical : 3.5315355974275078
Sports : 3.463838429425702
Personalization : 3.317161232088458
Communication : 3.2381812027530184
Action : 3.102786866749408
Health & Fitness : 3.0802211440821394
Photography : 2.944826808078529
News & Magazines : 2.798149610741284
Social : 2.6627552747376737
Travel & Local : 2.324269434728647
Shopping : 2.245289405393208
Books & Reference : 2.1437436533904997
Simulation : 2.042197901387792
Dating : 1.8616721200496444
Arcade : 1.8503892587160102
Video Players & Editors : 1.771409229380571
Casual : 1.7601263680469368
Maps & Navigation : 1.399074805370642
Food & Drink : 1.241114746699763
Puzzle : 1.128286133363421
Racing : 0.9928917973598104
Role Playing : 0.9364774906916393
Libraries & Demo : 0.9364774906916393
Auto 

In the genre frequency table for Google Play, the most common genre is `Tools` (8.5%), followed by `Entertainment` (6.1%) and `Education` (5.3%).

Compared to the pattern we see on the App Store, a bigger share of the apps serve practical purposes. `Entertainment` still makes the top three, but the rest of the top five (`Tools`, `Education`, `Business`, and `Productivity`) are all practical genres.

As with the App Store data, we still need other data, such as review counts or the number of installs, to determine whether these apps are successful.

### Total Rating Count
As mentioned above, we'll look at the rating counts (`rating_count_tot`) and average them per genre to find the most popular genres.

Let's dig in!


In [23]:
prime_freq = freq_table(iOS,-5)

for genre in prime_freq:
    total_rating = 0
    len_genre = 0
    for row in iOS:
        genre_app = row[-5]
        if genre_app == genre:
            ratings = float(row[6])
            total_rating = total_rating + ratings 
            len_genre += 1
    average_count = total_rating / len_genre
    print(genre)
    print(average_count)


Productivity
21028.410714285714
Weather
52279.892857142855
Shopping
26919.690476190477
Reference
74942.11111111111
Finance
31467.944444444445
Music
57326.530303030304
Utilities
18684.456790123455
Travel
28243.8
Social Networking
71548.34905660378
Sports
23008.898550724636
Health & Fitness
23298.015384615384
Games
22788.6696905016
Food & Drink
33333.92307692308
News
21248.023255813954
Book
39758.5
Photo & Video
28441.54375
Entertainment
14029.830708661417
Business
7491.117647058823
Lifestyle
16485.764705882353
Education
7003.983050847458
Navigation
86090.33333333333
Medical
612.0
Catalogs
4004.0
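
The output is easier to read when sorted. A sketch that computes the same per-genre average in one pass and lists the highest first. `average_by_group` is a hypothetical helper and the rows are made up.

```python
from collections import defaultdict


def average_by_group(rows, group_index, value_index):
    """Average a numeric column per group, sorted from highest to lowest."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    averages = {g: totals[g] / counts[g] for g in totals}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Made-up rows: [name, rating_count, genre]
toy_rows = [
    ['Maps X', '500', 'Navigation'],
    ['Dict Y', '200', 'Reference'],
    ['Maps Z', '100', 'Navigation'],
    ['Game W', '50', 'Games'],
]
toy_averages = average_by_group(toy_rows, 2, 1)
for genre, avg in toy_averages:
    print(genre, avg)
```

For the iOS data in this notebook, the equivalent call would be `average_by_group(iOS, 12, 6)`, since `prime_genre` is at index 12 and `rating_count_tot` at index 6.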


The most popular app genre on the App Store, by average rating count, is `Navigation`. Let's see what apps are in this genre.

In [24]:
for row in iOS:
    app = row[2]
    genre = row[-5]
    if genre == 'Navigation':
        print(app)

Waze - GPS Navigation, Maps & Real-time Traffic
Geocaching®
ImmobilienScout24: Real Estate Search in Germany
Railway Route Search
CoPilot GPS – Car Navigation & Offline Maps
Google Maps - Navigation & Transit


The apps in this genre are mostly map and GPS services in the mold of Google Maps. They depend on massive amounts of data, so building one would be laborious and could take years of development. So, let's look at the runner-up: `Reference`.

In [25]:
for row in iOS:
    app = row[2]
    genre = row[-5]
    if genre == 'Reference':
        print(app)

Bible
Dictionary.com Dictionary & Thesaurus
Dictionary.com Dictionary & Thesaurus for iPad
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran
Merriam-Webster Dictionary
Google Translate
Night Sky
WWDC
Jishokun-Japanese English Dictionary & Translator
教えて!goo
VPN Express
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools
Guides for Pokémon GO - Pokemon GO News and Cheats
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE)
GUNS MODS for Minecraft PC Edition - Mods Tools
Real Bike Traffic Rider Virtual Reality Glasses


These apps are relatively easy to develop and maintain compared to Navigation apps. If we narrow down and analyze the needs of our target customers, building a free English app for them should be feasible. So, I recommend that our company develop a `Reference` app for the App Store.

Now, let's look at the apps in GooglePlay.

Here we use the `Installs` column. However, the numbers it provides are not precise (100+, 1,000+, etc.). We approximate the install count by taking each number as it is (so 1,000+ counts as 1,000), since we don't need very precise numbers.
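
The conversion described above amounts to stripping the `+` and the commas and treating what is left as a number. A minimal sketch, with `parse_installs` as a hypothetical helper name:

```python
def parse_installs(text):
    """Turn an install bucket such as '10,000+' into the integer 10000."""
    return int(text.replace('+', '').replace(',', ''))


for bucket in ['100+', '10,000+', '1,000,000,000+']:
    print(bucket, '->', parse_installs(bucket))
```

Because each bucket is read as its lower limit, the resulting figures are lower bounds on the real install counts.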


In [26]:
Category_freq = freq_table(android,1)
print(Category_freq)

{'ART_AND_DESIGN': 0.6431230960171499, 'AUTO_AND_VEHICLES': 0.9251946293580051, 'BEAUTY': 0.5979916506826132, 'BOOKS_AND_REFERENCE': 2.1437436533904997, 'BUSINESS': 4.592124562789123, 'COMICS': 0.6205573733498815, 'COMMUNICATION': 3.2381812027530184, 'DATING': 1.8616721200496444, 'EDUCATION': 1.1621347173643235, 'ENTERTAINMENT': 0.9590432133589079, 'EVENTS': 0.7108202640189552, 'FINANCE': 3.7007785174320205, 'FOOD_AND_DRINK': 1.241114746699763, 'HEALTH_AND_FITNESS': 3.0802211440821394, 'HOUSE_AND_HOME': 0.8236488773552973, 'LIBRARIES_AND_DEMO': 0.9364774906916393, 'LIFESTYLE': 3.9038700214374367, 'GAME': 9.725826469592688, 'FAMILY': 18.898792733837304, 'MEDICAL': 3.5315355974275078, 'SOCIAL': 2.6627552747376737, 'SHOPPING': 2.245289405393208, 'PHOTOGRAPHY': 2.944826808078529, 'SPORTS': 3.396141261423897, 'TRAVEL_AND_LOCAL': 2.335552296062281, 'TOOLS': 8.462146000225657, 'PERSONALIZATION': 3.317161232088458, 'PRODUCTIVITY': 3.8925871601038025, 'PARENTING': 0.6544059573507841, 'WEATHER':

In [27]:
for category in Category_freq:
    total = 0
    len_category = 0
    for row in android:
        category_app = row[1]
        if category_app == category:
            installs = row[5]
            installs = installs.replace('+','')
            installs = installs.replace(',','')
            installs = float(installs)
            total = installs + total
            len_category += 1
    avg_installs = total / len_category  # average installs per app (not printed)
    print(category)
    print(total)  # total installs across the category

ART_AND_DESIGN
113221100.0
AUTO_AND_VEHICLES
53080061.0
BEAUTY
27197050.0
BOOKS_AND_REFERENCE
1665884260.0
BUSINESS
696902090.0
COMICS
44971150.0
COMMUNICATION
11036906201.0
DATING
140914757.0
EDUCATION
188850000.0
ENTERTAINMENT
989460000.0
EVENTS
15973160.0
FINANCE
455163132.0
FOOD_AND_DRINK
211738751.0
HEALTH_AND_FITNESS
1143548402.0
HOUSE_AND_HOME
97202461.0
LIBRARIES_AND_DEMO
52995810.0
LIFESTYLE
497484429.0
GAME
13436869450.0
FAMILY
6193895690.0
MEDICAL
37732344.0
SOCIAL
5487861902.0
SHOPPING
1400338585.0
PHOTOGRAPHY
4656268815.0
SPORTS
1095230683.0
TRAVEL_AND_LOCAL
2894704086.0
TOOLS
8101043474.0
PERSONALIZATION
1529235888.0
PRODUCTIVITY
5791629314.0
PARENTING
31471010.0
WEATHER
360288520.0
VIDEO_PLAYERS
3931731720.0
NEWS_AND_MAGAZINES
2368196260.0
MAPS_AND_NAVIGATION
503060780.0


`COMMUNICATION` has the second-highest install total (about 11 billion, behind only `GAME`), but a large share of it comes from a handful of apps.

In [28]:
for app in android:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+' or app[5] == '500,000,000+'):
        print(app[0])

WhatsApp Messenger
Google Duo - High Quality Video Calls
Messenger – Text and Video Chat for Free
imo free video calls and chat
Skype - free IM & video calls
LINE: Free Calls & Messages
Google Chrome: Fast & Secure
UC Browser - Fast Download Private & Secure
Gmail
Hangouts
Viber Messenger


Those apps are well established and have dominated the communication market for years. Since our goal is to develop an app that earns revenue from in-app ads, our approach also differs from their business model. Let's look instead at `BOOKS_AND_REFERENCE`, the category closest to the `Reference` genre we shortlisted for the App Store:

In [29]:
for app in android:
    if app[1] == 'BOOKS_AND_REFERENCE' :
        print(app[0])

E-Book Read - Read Book for free
Download free book with green book
Wikipedia
Cool Reader
Free Panda Radio Music
Book store
FBReader: Favorite Book Reader
English Grammar Complete Handbook
Free Books - Spirit Fanfiction and Stories
Google Play Books
AlReader -any text book reader
Offline English Dictionary
Offline: English to Tagalog Dictionary
FamilySearch Tree
Cloud of Books
Recipes of Prophetic Medicine for free
ReadEra – free ebook reader
Anonymous caller detection
Ebook Reader
Litnet - E-books
Read books online
English to Urdu Dictionary
eBoox: book reader fb2 epub zip
English Persian Dictionary
Flybook
All Maths Formulas
Ancestry
HTC Help
English translation from Bengali
Pdf Book Download - Read Pdf Book
Free Book Reader
eBoox new: Reader for fb2 epub zip books
Only 30 days in English, the guideline is guaranteed
Moon+ Reader
SH-02J Owner's Manual (Android 8.0)
English-Myanmar Dictionary
Golden Dictionary (EN-AR)
All Language Translator Free
Azpen eReader
URBANO V 02 instruction 

Let's see how many installs each app in this category has:

In [30]:
for app in android:
    if app[1] == 'BOOKS_AND_REFERENCE' :
        print(app[0],':',app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E

The apps in this category include a variety of types, such as dictionaries, libraries, guides for programming languages, books, magazines, and other kinds of guides.

Although this category is already crowded with libraries, dictionary apps are easier to develop than most other apps. The data also shows that several of them have hundreds of thousands of installs.

## Conclusion

In this analysis, we analyzed the data about apps in AppStore and GooglePlay with the goal of recommending a genre to the app development team.

In the analysis above, we identified `Reference` as the genre with the second-highest average rating count on the App Store, and its Google Play counterpart, `BOOKS_AND_REFERENCE`, has about 1.67 billion total installs. This makes it a viable genre for our business plan: earn revenue through in-app ads and build an iOS version if the app is profitable.

Based on the analysis, I propose that we develop an app that adapts the manual for a popular game. However, this category is quite competitive, so we would need other features besides the manual. We could include a forum where players can discuss, exchange information, or find friends to team up with (if possible).

