#    Profitable App Profiles for the App Store and Google Play Markets

----------------------------------------------------------------------------------------------------------------------------

Our client company builds Android and iOS mobile apps and makes them available on Google Play and the App Store. They only build apps that are free to download and install, and their main source of revenue is in-app ads. This means their revenue for any given app is mostly influenced by the number of users who use it: the more users who see and engage with the ads, the better.

My goal for this project is to analyze data to help the app developers understand what type of apps are likely to attract more users and to recommend an app genre that shows potential for being profitable on both the App Store and Google Play.

In [133]:
### The Google Play data set ###
from csv import reader
opened_file = open("googleplaystore.csv", encoding="utf8")  # explicit encoding: app names contain non-ASCII characters
read_file = reader(opened_file)
android = list(read_file)  # store the "googleplaystore.csv" data set as a list of lists
android_header = android[0]  # column names of the Google Play data set
android = android[1:]  # all rows except the header row

In [134]:
### The App Store data set ###
from csv import reader
opened_file = open('AppleStore.csv', encoding='utf8')  # explicit encoding: app names contain non-ASCII characters
read_file = reader(opened_file)
ios = list(read_file)  # store the "AppleStore.csv" data set as a list of lists
ios_header = ios[0]  # column names of the App Store data set
ios = ios[1:]  # all rows except the header row

The **explore_data()** function prints rows of a data set in a readable way. It takes four parameters: the data set (a list of lists), the index of the first row to print, the index at which to stop (that row is not printed), and an optional boolean that, when `True`, also prints the number of rows and columns.

In [135]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
              
print(ios_header) # prints the column names of the "AppleStore" dataset
print('\n')
explore_data(ios, 0, 3, True) # when this function is called, it prints first 3 rows from the dataset "AppleStore"        
      
print(android_header) # prints the column names of the "googleplaystore" dataset
print('\n')
explore_data(android, 0, 3, True)# when this function is called, it prints first 3 rows from the dataset "googleplaystore" 

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'J

The **"GooglePlayStore"** data set has **10841** apps (rows) and **13** columns. The columns that might be useful for our analysis are:
 * App
 * Category
 * Reviews
 * Installs
 * Type
 * Price
 * Content Rating
 * Genres

Details about each column can be found in the **"GooglePlayStore"** data set [documentation](https://www.kaggle.com/lava18/google-play-store-apps).
 
The **"AppleStore"** data set has **7197** apps (rows) and **16** columns. The columns that might be useful for our analysis are:
 * track_name
 * price
 * size_bytes
 * rating_count_tot
 * rating_count_ver
 * cont_rating
 * prime_genre
 
More details about the **"AppleStore"** data set can be found in [documentation](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps#AppleStore.csv)

## Data cleaning  

### A. Deleting Wrong data

1. The **"GooglePlayStore"** data set has a dedicated *[discussion section](https://www.kaggle.com/lava18/google-play-store-apps/discussion)*, and one of the discussions describes an error in row 10472. That row is missing its value for the **Category** column, so the remaining values are shifted one column to the left. We therefore remove **row 10472** (the index counts data rows only, without the header).
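
A row like this can also be found without reading the discussion forum. The following is a minimal sketch, assuming that a malformed row shows up as a row whose length differs from the header's. `find_malformed_rows` and the toy data are illustrative names and values, not part of the original notebook.

```python
def find_malformed_rows(dataset, header):
    """Return the indexes of rows whose length differs from the header's."""
    expected = len(header)
    return [i for i, row in enumerate(dataset) if len(row) != expected]

# Hypothetical toy data: the second row is missing its 'Category' value.
toy_header = ['App', 'Category', 'Rating']
toy_rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],
    ['App C', 'GAME', '4.5'],
]

malformed = find_malformed_rows(toy_rows, toy_header)
print(malformed)
```

If run on the real data before the row is deleted, `find_malformed_rows(android, android_header)` should include index 10472, since that row has 12 values instead of 13.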


In [136]:
print(android[10472]) # printing row 10472 to check whether it has
                      # missing values or wrong data

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


In [137]:
print(android[2]) # printing row 2 to compare with row no.10472.


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


In [138]:
# deleting row 10472 from the "GooglePlayStore" data set since it has missing or misplaced values

print(len(android))

del android[10472] # deleting the values in the row 10472 from "GoogleplayStore"

print(len(android)) # to check how many rows are there after del statement

10841
10840


### B. Removing Duplicate Apps
* a. Checking for duplicate entries (apps)
* b. Removing duplicate entries

Going through the **"GooglePlayStore"** data set, we found that some apps have duplicate entries.
Our analysis should count each app only once, so we need to remove the duplicate entries for each app.


In [139]:
# a. checking for the duplicate entries in "GooglePlayStore"
# and printing the number of duplicate apps.

duplicate_apps = []
unique_apps = []

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
    
print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[:10])

Number of duplicate apps: 1181


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']


* The code above shows that the **"GooglePlayStore"** data set contains **1181** duplicate app entries in total.
* Now that the existence of duplicate entries is confirmed, we have to remove them.
* To decide which entry to keep, we looked at one app with duplicate entries, "Slack", and printed all of its entries.
 * The main difference is in the fourth position (Reviews), which shows that the data was collected at different times.
 * The entry with the most **Reviews** is considered the latest.
 * **A rating backed by more reviews is also more reliable**, so the entry with more reviews is the better one to keep.
 * So, for each app, we will keep the entry that has the most reviews and delete the rest of its duplicate entries.
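
The rule above (keep, for each app, the entry with the most reviews) can be sketched in a single pass with a dictionary. This is a hedged illustration on made-up rows: `keep_most_reviewed` is a hypothetical helper, and it assumes the app name is at index 0 and the review count at index 3, as in the Google Play data set.

```python
def keep_most_reviewed(dataset, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}  # app name -> row with the most reviews seen so far
    for row in dataset:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

# Made-up rows in the column order App, Category, Rating, Reviews.
toy_rows = [
    ['Slack', 'BUSINESS', '4.4', '51507'],
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Slack', 'BUSINESS', '4.4', '51510'],
    ['Slack', 'BUSINESS', '4.4', '51507'],
]

deduplicated = keep_most_reviewed(toy_rows)
for row in deduplicated:
    print(row)
```

Because ties are resolved in favour of the first row seen, exact duplicates (same name and same review count) also collapse to a single row.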


##### Didn't see any duplicate entries (apps) in AppStore dataset.

In [140]:
for app in android:
    name = app[0]
    if name == 'Slack':
        print(app)

['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51510', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']


In [141]:
print(len(android)) # checking the num.of rows 

10840


In [142]:
# b. deleting the duplicate entries(Duplicate Apps)

#step:1

# dictionary that maps each unique app name (key)
# to the highest number of reviews found for that app (value)
reviews_max = {}
for app in android:
    name = app[0]
    n_reviews = float(app[3]) # app[3] is the Reviews column
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

        
print(len(android))  # total number of rows (10840)
print(len(reviews_max))
# number of apps (rows) without duplicate entries (9659)

print("Expected length:", len(android) - 1181) # 1181 is the number of duplicate entries

10840
9659
Expected length: 9659


In [143]:
 # step 2

android_clean = [] # list to store our cleaned data set
already_added = [] # list to store app names.
for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if (reviews_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)
        
        
    

In [144]:
#step 3

explore_data(android_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


## Removing non-English apps

If you explore both the data sets, you'll notice some of the apps are not directed toward an English-speaking audience. Below are examples from both datasets.

In [145]:
# examples of non-English apps from both data sets
print(ios[813][1])
print(ios[6731][1])
print('\n')
print(android_clean[4412][0])
print(android_clean[7940][0])

爱奇艺PPS -《欢乐颂2》电视剧热播
【脱出ゲーム】絶対に最後までプレイしないで 〜謎解き＆ブロックパズル〜


中国語 AQリスニング
لعبة تقدر تربح DZ


We're not interested in keeping these apps, so we'll remove them. One way to go about this is to remove each app with a name containing a symbol that is not commonly used in English text — English text usually includes letters from the English alphabet, numbers composed of digits from 0 to 9, punctuation marks (., !, ?, ;), and other symbols (+, *, /).

Behind the scenes, each character we use in a string has a corresponding number associated with it. For instance, the corresponding number for character 'a' is 97, character 'A' is 65, and character '爱' is 29,233. We can get the corresponding number of each character using the ord() built-in function.
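
The mapping described above can be checked directly with `ord()`. This is a small sketch; the list of sample characters and the name `code_points` are chosen here for illustration.

```python
# A few sample characters: letters, a digit, a symbol, and a Chinese character.
examples = ['a', 'A', '5', '+', '爱']

# Map each character to its corresponding number (Unicode code point).
code_points = {character: ord(character) for character in examples}

for character, number in code_points.items():
    label = 'ASCII' if number <= 127 else 'non-ASCII'
    print(repr(character), '->', number, '(' + label + ')')
```

Characters commonly used in English text fall in the range 0 to 127, which is what the filter below relies on.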

In [146]:
def english_app(string):
    for character in string:
        if ord(character) > 127:
            return False
    return True 

# using our function to check whether the below app names are detected as English or non-English

print(english_app('Instagram')) #english app
print(english_app('爱奇艺PPS -《欢乐颂2》电视剧热播')) #non-english app
print(english_app('Docs To Go™ Free Office Suite')) #english app
print(english_app('Instachat 😜')) #english app

# checking the Unicode code points (both are outside the ASCII range 0-127)
print(ord('™'))
print(ord('😜'))

True
False
False
False
8482
128540


* Inside the function, we iterate over the input string. For each character, we check whether its corresponding number is greater than 127. As soon as one is, the function immediately returns False: the app name is probably non-English, since it contains a character that doesn't belong to the set of common English characters.
* If the loop finishes without the return statement being executed, no character had a corresponding number over 127. The app name is probably English, so the function returns True.
* We used the built-in ord() function to find the corresponding number (Unicode code point) of each character.

* When we used **english_app** on a few names, two English apps, **'Docs To Go™ Free Office Suite'** and **'Instachat 😜'**, were detected as non-English. Their names contain a special character (™) and an emoji (😜) whose code points (8482 and 128540) fall outside the ASCII range (0-127) that covers the usual English letters, digits, and symbols.

* So we change the function to solve this problem.
    * To minimize the impact of data loss, we'll only remove an app if its name has more than three characters with corresponding numbers falling outside the ASCII range. This means all English apps with up to three emoji or other special characters will still be labeled as English. Our filter function is still not perfect, but it should be fairly effective.
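
To see why a threshold of three separates these examples, it helps to count the non-ASCII characters in each name. The sketch below is illustrative only: `count_non_ascii` is a hypothetical helper, and the special characters are written as escape sequences so that the counts do not depend on how the file is saved.

```python
def count_non_ascii(string):
    """Count the characters whose code point is above 127 (outside ASCII)."""
    return sum(1 for character in string if ord(character) > 127)

names = [
    'Instagram',
    'Docs To Go\u2122 Free Office Suite',  # \u2122 is the trademark sign
    'Instachat \U0001F61C',                # \U0001F61C is the emoji used above
    '爱奇艺PPS -《欢乐颂2》电视剧热播',
]

counts = {name: count_non_ascii(name) for name in names}
for name, n in counts.items():
    print(n, '->', 'keep' if n <= 3 else 'remove')
```

The two English names contain a single non-ASCII character each, while the Chinese title contains well over three, so it is the only one removed.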

In [147]:
def english_app(string):
    non_ascii = 0
    
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    
    if non_ascii > 3:
        return False
    else:
        return True

print(english_app('Docs To Go™ Free Office Suite'))
print(english_app('Instachat 😜'))
print(english_app('爱奇艺PPS -《欢乐颂2》电视剧热播'))

True
True
False


In [148]:
# using our changed function(english_app) to filter out non-English apps from 
# both data sets. Loop through each data set. 
#If an app name is identified as English, append whole row to a separate list.

android_english = []
ios_english = []

for app in android_clean:
    name = app[0]
    if english_app(name):
        android_english.append(app)
        
for app in ios:
    name = app[1]
    if english_app(name):
        ios_english.append(app)
        
#Exploring the data sets and seeing how many rows are remaining for each data set.        
explore_data(android_english, 0, 3, True) 
print('\n')
explore_data(ios_english, 0, 3, True)

#We can see from the output that we're left with 9614 Android apps and 6183 iOS apps.

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 

In [149]:
#Isolating the free apps

android_final = []
ios_final = []

for app in android_english:
    price = app[7]
    if price == '0':
        android_final.append(app)
        
for app in ios_english:
    price = app[4]
    if price == '0.0':
        ios_final.append(app)
        
print(len(android_final))
print(len(ios_final))


8864
3222


So, after the data cleaning process, we are left with **8864** Android apps in the **GooglePlayStore** data set and **3222** iOS apps in the **AppleStore** data set.
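
The free-app filter above compares price strings exactly (`'0'` for Google Play, `'0.0'` for the App Store). As a hedged alternative, the price can be parsed as a number first, so that a value such as `'0.00'` is also treated as free. `is_free` is a hypothetical helper, and the leading `'$'` on paid prices is an assumption about the data that this notebook's output does not show.

```python
def is_free(price):
    """Return True when a price string represents zero."""
    cleaned = price.strip().lstrip('$')  # assumption: paid prices may start with '$'
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Not a number (for example a misplaced value): treat as not free.
        return False

for sample in ['0', '0.0', '0.00', '$4.99', '1.99', 'Everyone']:
    print(repr(sample), '->', is_free(sample))
```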


## Determining the Apps That Attract Users

* Our aim is to determine the kinds of apps that are likely to attract more users, because revenue is highly influenced by the number of people using the apps.

* To minimize risks and overhead, our validation strategy for an app idea consists of three steps:

  * Build a minimal Android version of the app, and add it to Google Play.
  * If the app has a good response from users, we develop it further.
  * If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.
* Because our end goal is to add the app on both Google Play and the App Store, we need to find app profiles that are successful on both markets.

In [150]:
# creating a function "freq_table()" for generating frequency tables
# that shows percentages

def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage 
    
    return table_percentages

# Function "display_table()" is to display the percentages by descending order.
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
    

We start by displaying the frequency table for the **prime_genre** column of the **"AppStore"** data set and then analyze the result.
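
As a side note, the percentage table that `freq_table()` builds can also be sketched with `collections.Counter` from the standard library. This is an alternative illustration rather than the notebook's method; `freq_table_counter` and the toy rows are hypothetical.

```python
from collections import Counter

def freq_table_counter(dataset, index):
    """Return {value: percentage} for one column, most frequent value first."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return {key: count / total * 100 for key, count in counts.most_common()}

# Made-up rows: app name at index 0, genre at index 1.
toy_rows = [['a', 'Games'], ['b', 'Games'], ['c', 'Education'], ['d', 'Games']]

toy_table = freq_table_counter(toy_rows, 1)
for genre, percentage in toy_table.items():
    print(genre, ':', percentage)
```

`Counter.most_common()` already returns the values sorted by count, so no separate sorting step is needed for display.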


In [151]:
display_table(ios_final, -5)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


* Analyzing the frequency table of the **prime_genre** column of **"AppStore"**:
  * We can see that among the free English apps, more than half (58.16%) are games. Entertainment
   apps are close to 8%, followed by photo & video apps, which are close to 5%.
  * Only 3.66% of the apps are designed for education, followed by social networking apps, which account
   for 3.29% of the apps in our data set.
* So the general impression is that the App Store, at least its free English apps, is dominated by apps that are
  designed for fun (games, entertainment, photo and video, social networking, sports, music, etc.),
  while apps with general purposes (education, shopping, utilities, productivity, lifestyle, etc.) are rarer.


The **"Genres"** and **"Category"** columns of the "GooglePlayStore" data set seem to be related, so we analyze both of their frequency tables before drawing a conclusion about the most common apps by genre.

In [152]:
display_table(android_final, -4) # Analyzing the frequency table of "Genres" column of the "GooglePlayStore" data set

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

After analyzing the frequency table of the **Genres** column of **GooglePlayStore**, we see that general-purpose apps (Tools, Education, Business, Lifestyle) and entertainment apps have similar shares.
Compared to the App Store, the distribution of fun and general-purpose apps is more balanced.



In [153]:
display_table(android_final, 1) # Analyzing the frequency table of "Category" column of the "GooglePlayStore" data set.

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

We can see that the family category accounts for almost 19% of the apps.
It seems that a good number of apps are designed for general purposes (family, tools, business, lifestyle, productivity, etc.).

Fun and entertainment apps are represented as well.

No single category dominates: the table shows a more balanced distribution of general and fun apps, unlike the App Store.

General apps seem to have a better representation on GooglePlayStore compared to AppStore. This picture is also confirmed by the frequency table we see for the "Genres" column.

We found that the **App Store** is dominated by apps designed for **fun**, while **GooglePlayStore** shows a better balance of **general** and **fun** apps.

## Most popular Apps by genre

To get an idea of the kinds of apps that have the most users, we need to find out which genres have the most users.

For that purpose, we calculate the average number of installs for each app genre.
   * For the **GooglePlayStore** data set, this information can be obtained from
     the **Installs** column.
   * For the **AppStore** data set, this information is missing. As a workaround,
     we'll take the total number of user ratings as a proxy, which we can find
     in the **rating_count_tot** column.



## Most popular Apps(by genre) on AppStore

In [154]:
#calculate the average number of user ratings per app genre on the App Store

genres_ios = freq_table(ios_final, -5)
for genre in genres_ios:
    total = 0
    len_genre = 0
    for app in ios_final:
        genre_app = app[-5]
        #Isolating the apps of each genre
        if genre_app == genre:            
            n_ratings = float(app[5])
            #Summing up the user ratings for the apps of that particular genre
            total += n_ratings
            len_genre += 1
    # Dividing the sum by the number of apps belonging to that genre        
    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)

Travel : 28243.8
Weather : 52279.892857142855
Catalogs : 4004.0
Photo & Video : 28441.54375
Social Networking : 71548.34905660378
Productivity : 21028.410714285714
News : 21248.023255813954
Reference : 74942.11111111111
Utilities : 18684.456790123455
Games : 22788.6696905016
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Music : 57326.530303030304
Navigation : 86090.33333333333
Entertainment : 14029.830708661417
Medical : 612.0
Business : 7491.117647058823
Lifestyle : 16485.764705882353
Education : 7003.983050847458
Health & Fitness : 23298.015384615384
Finance : 31467.944444444445
Book : 39758.5
Shopping : 26919.690476190477


**"Navigation"** apps have the highest average number of user ratings (86090), followed by **"Reference"** (74942) and **"Social Networking"** (71548).
 
* The high average for the "Navigation" genre is driven mostly by **"Waze"** and **"Google Maps"**, which together have a huge volume of user ratings compared to the other apps under "Navigation". We can see that from the output of the code below.


In [155]:
for app in ios_final:
    if app[-5] == 'Navigation': # if "prime_genre"(app[-5]) is equal to "Navigation"
        print(app[1], ':', app[5]) # print name and number of ratings

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


In [156]:
for app in ios_final:
    if app[-5] == 'Social Networking':
        print(app[1], ':', app[5])

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 2

The average number of ratings for the "Social Networking" genre is heavily influenced by a few apps like Facebook, Pinterest, Skype, etc.

In [157]:
#checking /analyzing the user reviews of genre "Reference"
for app in ios_final:
    if app[-5] == 'Reference':
        print(app[1], ':', app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


* Reference apps have 74,942 user ratings on average, but this figure is heavily influenced by the Bible and Dictionary.com apps, which both have a huge volume of ratings compared to the others.

* Navigation, Social Networking, and Reference apps might seem more popular than they really are. Their average number of ratings is inflated by a very few apps that have hundreds of thousands of user ratings, while the other apps may struggle to get past the 10,000 threshold.
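
One way to make this skew visible is to compare the mean with the median, which is much less sensitive to a few very large values. The sketch below uses the six Navigation rating counts printed earlier; the median is a technique added here for illustration and is not used elsewhere in this analysis.

```python
from statistics import median

# Rating counts of the six free English Navigation apps printed above.
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

mean_ratings = sum(navigation_ratings) / len(navigation_ratings)
median_ratings = median(navigation_ratings)

print('Mean:', round(mean_ratings, 1))
print('Median:', median_ratings)
```

The mean is about 86,090, matching the genre average above, while the median is 8,196.5: a typical Navigation app has roughly a tenth of the ratings that the average suggests.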



## Most Popular Apps by Genre on Google Play

* We have data about the number of **installs** for the GooglePlayStore, so we should be able to get a clearer picture about genre popularity. However, the install numbers don't seem precise enough — we can see that most values are open-ended (100+, 1,000+, 5,000+, etc.)

* For instance, we don't know whether an app with 100,000+ installs has 100,000 installs, 200,000, or 350,000. However, we don't need very precise data for our purposes — we only want to find out which app genres attract the most users.

* So, we are considering that an app with 100,000+ installs has 100,000 installs, and an app with 1,000,000+ installs has 1,000,000 installs, and so on.

* We convert each install number from string to float. This means we need to remove the commas and the plus characters first.

* To remove characters from strings, we use the **str.replace(old, new)** method.
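
The conversion described above can be sketched as a small helper. `parse_installs` is a hypothetical name used only for this illustration; it treats an open-ended value such as `'100,000+'` as its lower bound.

```python
def parse_installs(installs):
    """Convert a string such as '1,000,000+' to a float (the lower bound)."""
    cleaned = installs.replace(',', '').replace('+', '')
    return float(cleaned)

for raw in ['100+', '10,000+', '5,000,000+']:
    print(raw, '->', parse_installs(raw))
```

Chaining the two `str.replace()` calls works because each call returns a new string; the original value is left unchanged.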

In [158]:
#computing the average number of installs for each genre (category).

categories_android = freq_table(android_final, 1)#generating a frequency table for the Category column of the Google Play data set

for category in categories_android:
    total = 0
    len_category = 0
    for app in android_final:
        category_app = app[1]
        if category_app == category:            
            n_installs = app[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    print(category, ':', avg_n_installs)

GAME : 15588015.603248259
COMMUNICATION : 38456119.167247385
TRAVEL_AND_LOCAL : 13984077.710144928
PARENTING : 542603.6206896552
AUTO_AND_VEHICLES : 647317.8170731707
EVENTS : 253542.22222222222
HEALTH_AND_FITNESS : 4188821.9853479853
VIDEO_PLAYERS : 24727872.452830188
ART_AND_DESIGN : 1986335.0877192982
FOOD_AND_DRINK : 1924897.7363636363
NEWS_AND_MAGAZINES : 9549178.467741935
BEAUTY : 513151.88679245283
COMICS : 817657.2727272727
BOOKS_AND_REFERENCE : 8767811.894736841
PERSONALIZATION : 5201482.6122448975
PHOTOGRAPHY : 17840110.40229885
FAMILY : 3695641.8198090694
MAPS_AND_NAVIGATION : 4056941.7741935486
PRODUCTIVITY : 16787331.344927534
SHOPPING : 7036877.311557789
ENTERTAINMENT : 11640705.88235294
DATING : 854028.8303030303
FINANCE : 1387692.475609756
EDUCATION : 1833495.145631068
SOCIAL : 23253652.127118643
BUSINESS : 1712290.1474201474
LIBRARIES_AND_DEMO : 638503.734939759
MEDICAL : 120550.61980830671
HOUSE_AND_HOME : 1331540.5616438356
WEATHER : 5074486.197183099
TOOLS : 1080139

* On average, **communication** apps have the most installs: 38,456,119. This number is heavily skewed upward by a few apps that have over one billion installs (WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail, and Hangouts), and a few others with over 100 and 500 million installs.
  * If we removed all the communication apps that have over 100 million installs, the average would be reduced roughly ten times.

* We see the same pattern for the **video players** category, which is the runner-up with 24,727,872 installs. The market is dominated by apps like YouTube, Google Play Movies & TV, or MX Player. The pattern is repeated for social apps (where we have giants like Facebook, Instagram, Google+, etc.) and photography apps (Google Photos and other popular photo editors).

* The main concern is that these app genres might seem more popular than they really are, and that they are dominated by a few giants who are hard to compete against.

* The **"Game"** genre seems pretty popular (judging by the number of installs), so we can recommend it for now.
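
The outlier check mentioned above for communication apps (recomputing the average without the giants) can be sketched as follows. The rows are made up for illustration and are not taken from the data set; `average_installs` is a hypothetical helper that assumes Category at index 1 and Installs at index 5.

```python
def average_installs(rows, category, max_installs=None):
    """Average installs for one category, optionally ignoring apps at or above max_installs."""
    values = []
    for row in rows:
        if row[1] != category:
            continue
        n_installs = float(row[5].replace(',', '').replace('+', ''))
        if max_installs is None or n_installs < max_installs:
            values.append(n_installs)
    return sum(values) / len(values) if values else None

# Made-up rows; only index 1 (Category) and index 5 (Installs) are used.
toy_rows = [
    ['Giant messenger', 'COMMUNICATION', '', '', '', '1,000,000,000+'],
    ['Small chat app', 'COMMUNICATION', '', '', '', '1,000,000+'],
    ['Tiny dialer', 'COMMUNICATION', '', '', '', '100,000+'],
    ['Some game', 'GAME', '', '', '', '10,000,000+'],
]

print(average_installs(toy_rows, 'COMMUNICATION'))
print(average_installs(toy_rows, 'COMMUNICATION', max_installs=100_000_000))
```

On the toy rows, a single giant app is enough to move the category average by several orders of magnitude, which is the effect described for the real categories.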


In the **AppStore** and the **GooglePlayStore**, the **"Games"** genre seems to be popular, with a large number of user ratings and a large number of installs respectively.

After setting aside the genres whose averages are inflated by a few giants, we came to the conclusion that the **Game** genre would be profitable for our company.


## Conclusions

In this project, we analyzed data about the AppStore and GooglePlayStore mobile apps with the goal of recommending an app profile that can be profitable for both markets.

We conclude that **gaming apps** would be profitable on both the "AppStore" and the "GooglePlayStore". Though there are a lot of gaming apps in both markets, there is always scope for better ones, and both kids and adults like gaming apps.
Developing different gaming apps targeted at every age group and gender might be more profitable. We can also focus on games that combine fun and learning, which may appeal especially to kids.