## Profitable App Profiles for the App Store and Google Play Markets

For this project, we'll pretend we're working as data analysts for a company that builds Android and iOS mobile apps. We make our apps available on Google Play and the App Store.

We only build apps that are free to download and install, and our main source of revenue consists of in-app ads. This means our revenue for any given app is mostly influenced by the number of users who use our app — the more users that see and engage with the ads, the better.

### Aim
Our goal for this project is to analyze the [Google Play](https://www.kaggle.com/lava18/google-play-store-apps) and [App Store](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps) datasets, both in `csv` format, to understand what types of apps are likely to attract more users.

### Open the datasets

Open the Google Play and App Store datasets with `reader` from Python's built-in `csv` module, then print the header row and the first five data rows of each.

In [31]:
from csv import reader

playstore_file = open("googleplaystore.csv")
applestore_file = open("AppleStore.csv")
playstore = reader(playstore_file)
applestore = reader(applestore_file)
play_data = list(playstore)
apps_data = list(applestore)
print("Headers from playstore data: \n ",play_data[0])
print("\n first 5 rows from playstore:\n",play_data[1:6])
print("\n \n")
print("Headers from AppStore data: \n ",apps_data[0])
print("\n first 5 rows from appstore:\n",apps_data[1:6])



Headers from playstore data: 
  ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']

 first 5 rows from playstore:
 [['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'], ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up'], ['Pixel Draw - Number Art Coloring Book', 'ART_AND_D
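
The loading step can also be wrapped in a small helper that separates the header from the data rows. This is only a sketch: `read_rows` is a hypothetical name, the sample text stands in for a real file, and the UTF-8 encoding mentioned in the comment is an assumption about the CSV files, not something the datasets state.

```python
import csv
import io


def read_rows(file_obj):
    """Return (header, rows) from an open CSV file object."""
    rows = list(csv.reader(file_obj))
    return rows[0], rows[1:]


# Stand-in for a real file. With the actual data you might write
# (assuming the file is UTF-8 encoded):
#   with open("googleplaystore.csv", encoding="utf8", newline="") as f:
#       header, rows = read_rows(f)
sample = io.StringIO(
    'App,Category,Rating\n"Sketch - Draw & Paint",ART_AND_DESIGN,4.5\n'
)
header, rows = read_rows(sample)
print(header)
print(rows)
```

Using a `with` block closes the file automatically once the rows have been read.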

### Explore the dataset

Now, let's explore the datasets with a helper function, `explore_data()`.
`explore_data()` takes four parameters:

* `dataset` - expected to be a list of lists.

* `start` and `end` - both expected to be integers; they are the starting and ending indices of a slice of the data set.

* `rows_and_columns` - expected to be a Boolean, with `False` as the default argument. When `True`, the function also prints the number of rows and columns.



In [32]:

def explore_data(dataset, start, end, rows_and_columns=False):  
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
        


### Calling the function `explore_data()`
Call `explore_data()` to see a few rows of the Google Play data, then of the App Store data. Slicing with `[1:]` leaves out the header row, and passing `True` as the last argument also prints the number of rows and columns.

In [33]:
explore_data(play_data[1:],1,6)
explore_data(play_data[1:],1,4,True)
explore_data(apps_data[1:],1,4)
explore_data(apps_data[1:],1,4,True)


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Ar

### Find the row with missing data
Before proceeding any further, we need to clean our dataset to:
* detect and remove inaccurate data
* detect and remove duplicate data

According to this [discussion](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015), one row is reported as faulty: the row at index 10472 of the data, i.e. index 10473 when the header row is counted. Print the header and row 10473 to inspect it.

In [34]:
print("Google play data headers: \n",play_data[0:1])
print("Data at index 10473 \n",play_data[10473])

Google play data headers: 
 [['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']]
Data at index 10473 
 ['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


The row has only 12 values against 13 headers: the `Category` value is missing, so every later value is shifted one column to the left (for example, `'1.9'` sits under `Category`).
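
A row like this can also be found without knowing its index in advance, by comparing each row's length with the header's. The sketch below uses a hypothetical helper, `find_malformed_rows`, and a tiny made-up data set in place of `play_data`.

```python
def find_malformed_rows(dataset):
    """Return (index, row) pairs whose length differs from the header's."""
    expected = len(dataset[0])
    return [(i, row) for i, row in enumerate(dataset) if len(row) != expected]


# Made-up data: the row at index 2 is missing its second column.
sample = [
    ['App', 'Category', 'Rating'],
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],
]
for index, row in find_malformed_rows(sample):
    print(index, row)
```

This check only catches rows with a wrong number of columns; a row with the right length but wrong values would still need a separate check.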

### Remove the row with missing data
Because the row at index 10473 is missing its `Category` value, we delete it with the `del` statement.

In [35]:
# print the length of dataset before deletion
print("length of google store dataset before deleting row 10473:",len(play_data)) 

# Delete the row with inaccurate data 
del play_data[10473]

# print the length of dataset after deletion
print("\nlength of google store dataset after deleting row 10473: ",len(play_data)) 

length of google store dataset before deleting row 10473: 10842

length of google store dataset after deleting row 10473:  10841


### Checking for duplicates and removing them based on a criterion

If we explore the [discussions](https://www.kaggle.com/lava18/google-play-store-apps/discussion), we notice that some apps have duplicate entries. For example, the code below shows that Instagram has 4 entries.


In [36]:
for app in play_data[1:]:
    app_name = app[0]
    if app_name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


Now, let's find the total number of unique and duplicate entries in the Google Play dataset.

* Create two lists: one for the duplicate names and one for the unique names.
* Loop through the `play_data` dataset and append each app name to the appropriate list.

In [37]:
duplicates = []
distinct = []

# This loop runs over all of play_data, so the header row
# is counted as one of the unique names.
for app in play_data:
    app_name = app[0]
    if app_name in distinct:
        duplicates.append(app_name)
    else:
        distinct.append(app_name)
print("total unique apps: ",len(distinct))
print("\ntotal duplicate apps: ",len(duplicates))

total unique apps:  9660

total duplicate apps:  1181
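
The same two counts can be obtained with `collections.Counter`, which avoids the repeated list lookups. This is a sketch on made-up rows; `count_duplicates` is a hypothetical helper, not part of the notebook.

```python
from collections import Counter


def count_duplicates(rows, name_index=0):
    """Return (n_unique_names, n_extra_entries) for the given rows."""
    counts = Counter(row[name_index] for row in rows)
    n_unique = len(counts)
    # Every occurrence beyond the first one of a name is an extra entry.
    n_extra = sum(counts.values()) - n_unique
    return n_unique, n_extra


# Made-up rows: 'Instagram' appears three times, 'Quora' once.
sample = [['Instagram'], ['Quora'], ['Instagram'], ['Instagram']]
print(count_duplicates(sample))
```

Passing only the data rows (without the header) keeps the header out of the unique count.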


### Removing duplicate entries based on a criterion
The `duplicates` list now holds the names of the repeated entries, and `distinct` holds each name once. Because the loop above also ran over the header row, the count of 9,660 unique names includes the header; there are 9,659 distinct apps.

We should not remove duplicates at random. If you examine the rows printed for the Instagram app, the main difference is in the fourth position of each row, which corresponds to the number of reviews.

We can use this to build a criterion for removing the duplicates: the higher the number of reviews, the more recent the data should be. For any given app, we keep only the row with the highest number of reviews and remove the other entries.

#### To remove duplicates:
* Create a dictionary, where each dictionary key is a unique app name and the corresponding dictionary value is the highest number of reviews of that app.
* Use the information stored in the dictionary and create a new data set, which will have only one entry per app 

In [38]:
reviews_max = {}

for app in play_data[1:]:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
        
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

We found 1,181 cases where an app occurs more than once, so the length of the `reviews_max` dictionary (one key per unique app) should equal the number of data rows in `play_data` minus 1,181. The next two cells compute both values.

In [39]:
print("length of dataset with  highest number of reviews for each app:",len(reviews_max))

length of dataset with  highest number of reviews for each app: 9659


In [40]:
print("length of dataset after removing the duplicates: ",len(play_data[1:]) - 1181)

length of dataset after removing the duplicates:  9659


Both values are 9,659: the number of keys in `reviews_max` (one per app, duplicates removed) matches the number of data rows minus the number of duplicates, so the dictionary was built as expected.

### Cleaning the dataset further

For the duplicate cases, we'll keep only the entry with the highest number of reviews. In the code cell below:

* We start by initializing two empty lists, `android_clean` and `already_added`.
* We loop through the Google Play dataset and, for each app, append the whole row to `android_clean` and the app name to `already_added` if the row's review count equals the maximum stored in `reviews_max` AND the name is not already in `already_added`.
* The `already_added` check matters because some duplicates share the same maximum review count; without it, all of those rows would be kept.

In [41]:
android_clean = []
already_added = []
for app in play_data[1:]:
    name = app[0]
    n_reviews = float(app[3])
    if (reviews_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)
print("length of the clean dataset:",len(android_clean))

length of the clean dataset: 9659


Let's also confirm that the number of rows is 9,659 using `explore_data()`.


In [42]:
explore_data(android_clean, 0, 3, True)


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


The dataset has 9,659 rows, which is the expected result and confirms that the duplicates were removed correctly.
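
For comparison, the dictionary step and the filtering step can be folded into a single pass that stores the best row per app name. This is a sketch on made-up rows; `keep_most_reviewed` is a hypothetical helper, and unlike the cell above it returns rows in order of each name's first appearance (relying on dictionaries preserving insertion order, Python 3.7+).

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' keeps the first row when several share the maximum.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


sample = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Quora', 'SOCIAL', '4.6', '100'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
]
for row in keep_most_reviewed(sample):
    print(row)
```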

### Removing non-English apps

We use English for the apps we develop at our company, and we'd like to analyze only the apps that are directed toward an English-speaking audience. We're not interested in keeping apps whose names suggest they are not directed toward an English-speaking audience.

One way to go about this is to remove each app whose name contains a symbol that is not commonly used in English text. English text usually includes letters from the English alphabet, numbers composed of digits from 0 to 9, punctuation marks (., !, ?, ;, etc.), and other symbols (+, *, /, etc.).

All of these characters are encoded in the ASCII standard. Each ASCII character has a corresponding number between 0 and 127, and we can take advantage of that to build a function that checks an app name and tells us whether it contains non-ASCII characters.

We build this function below. First, we use the built-in `ord()` function to find the corresponding encoding number of a few characters.

In [43]:
print('ASCII for \'a\':',ord('a'))
print('ASCII for \'%\':',ord('%'))
print('ASCII for 4:',ord('4'))

ASCII for 'a': 97
ASCII for '%': 37
ASCII for 4: 52


### Find the non-English characters
According to the [ASCII chart](http://www.asciitable.com/), a character whose code is less than or equal to 127 belongs to the set commonly used in English text. So if an app name contains a character with a code greater than 127, the app probably has a non-English name.

We define a function `isEnglish_app()` that takes an app name and returns `False` if the name contains any non-ASCII character, and `True` otherwise.

In [44]:
def isEnglish_app(string):
    for ch in string:
        if ord(ch) > 127:
            return False
    # Only reached when every character is within the ASCII range
    return True

isEnglish_app('电视剧热播')



False

In [45]:
isEnglish_app('Instachat 😜')

False

In [46]:
isEnglish_app('Docs To Go™ Free Office Suite')

False

### Minimize the impact of data loss

Although `isEnglish_app()` catches non-English app names, it is too strict: it also rejects English names such as `'Docs To Go™ Free Office Suite'` or names containing emojis, because emojis and special characters like `™` fall outside the ASCII range.


So to minimize the impact of data loss, we'll only remove an app if its name has more than three characters with corresponding numbers falling outside the ASCII range. This means all English apps with up to three emoji or other special characters will still be labeled as English. 

In [47]:
def isEnglish_app(string):
    char_count =0
    for ch in string:
        if ord(ch) > 127 :
            char_count += 1
    if char_count > 3:
        return False
    else:
        return True

print(isEnglish_app('电电电电电'))   
print(isEnglish_app('Instachat 😜'))   

False
True
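
The threshold of three is a judgment call, so it can help to make it a parameter. The sketch below is a hypothetical variant (`is_mostly_ascii` and `max_non_ascii` are illustrative names, not part of the notebook) that counts non-ASCII characters with `sum()`.

```python
def is_mostly_ascii(name, max_non_ascii=3):
    """True if `name` has at most `max_non_ascii` characters outside ASCII."""
    non_ascii = sum(1 for ch in name if ord(ch) > 127)
    return non_ascii <= max_non_ascii


print(is_mostly_ascii('Docs To Go™ Free Office Suite'))
print(is_mostly_ascii('电电电电电'))
# With a threshold of 0, a single emoji is enough to reject the name.
print(is_mostly_ascii('Instachat 😜', max_non_ascii=0))
```

With the default threshold, this should behave like the `isEnglish_app()` function in the cell above.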


### Filter out non-English apps from both data sets

Using the new function, let's filter out non-English apps from both Google Play dataset and AppStore dataset.

In [48]:
ios_english =[]
gplay_english =[]
for i in android_clean:
    name = i[0]
    if isEnglish_app(name):
        gplay_english.append(i)
        
for i in apps_data[1:]:
    name = i[1]
    if isEnglish_app(name):
        ios_english.append(i)

print("English apps in play store",len(gplay_english))
print("\nEnglish apps in apple store",len(ios_english))
print("\n")
explore_data(gplay_english,1,3, True)
explore_data(ios_english,1,3,True)

English apps in play store 9614

English apps in apple store 6183


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13
['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 6183
Number of columns: 16


The number of English apps in Google PlayStore is 9614 and in AppStore is 6183.

### Isolate free apps from the datasets


As stated at the start, we only build apps that are free to download and install, so we need to isolate the free apps for our analysis.
For the Google Play data, we check whether the `Type` column is `Free`. For the App Store data, we check whether the `price` column is 0.

In [49]:
gplay_free =[]
ios_free =[]

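# gplay_english and ios_english contain no header row, so the [1:]
# slices below leave out the first app of each list; the counts
# printed below were produced with these slices in place.
for app in gplay_english[1:]:
    app_type = app[6]  # named app_type to avoid shadowing the built-in type()
    if app_type == "Free":
        gplay_free.append(app)

for app in ios_english[1:]:
    price = float(app[4])
    if price == 0.0:
        ios_free.append(app)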

print("Free play store apps: ", len(gplay_free))
 

print("Free Apple store apps: ", len(ios_free))

        

Free play store apps:  8862
Free Apple store apps:  3221


### Most Common Apps: Part 1

Our aim is to determine the kinds of apps that are likely to attract more users because our revenue is highly influenced by the number of people using our apps.

We generate frequency tables for the `prime_genre` column of the App Store data set, and for the `Genres` and `Category` columns of the Google Play data set to find the most common genres in each market.

We'll build two functions we can use to analyze the frequency tables:

* One function to generate frequency tables that show percentages

* Another function we can use to display the percentages in a descending order


In [50]:
def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage 
    
    return table_percentages
# to display the frequency table of the columns in dataset 
# (E.G: prime_genre in AppStore,
# Genres and Category  in Google PlayStore)
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
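
As an alternative sketch, `collections.Counter` can build and sort the table in one helper. `freq_table_sorted` is a hypothetical name, and the rows are made up. One difference from `display_table()`: values with equal counts come out in first-seen order rather than in reverse alphabetical order.

```python
from collections import Counter


def freq_table_sorted(rows, index):
    """Return [(value, percentage), ...] sorted by descending percentage."""
    counts = Counter(row[index] for row in rows)
    total = sum(counts.values())
    return [(value, count / total * 100) for value, count in counts.most_common()]


sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
for value, pct in freq_table_sorted(sample, 1):
    print(value, ':', pct)
```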





### Most common apps by genre: Part 2

#### Common Apps in PlayStore 

In [51]:
print("\nGOOGLE PLAY APPS BY GENRE\n")
display_table(gplay_free, -4)  # -4 is the negative index for Genres

print("\nGOOGLE PLAY APPS BY CATEGORY \n")       
display_table(gplay_free,1) # by Category


GOOGLE PLAY APPS BY GENRE

Tools : 8.451816745655607
Entertainment : 6.070864364703228
Education : 5.348679756262695
Business : 4.5926427443015125
Productivity : 3.8930264048747465
Lifestyle : 3.8930264048747465
Finance : 3.7011961182577298
Medical : 3.5319341006544795
Sports : 3.4642292936131795
Personalization : 3.3175355450236967
Communication : 3.238546603475513
Action : 3.1031369893929135
Health & Fitness : 3.080568720379147
Photography : 2.945159106296547
News & Magazines : 2.798465357707064
Social : 2.663055743624464
Travel & Local : 2.324531708417964
Shopping : 2.2455427668697814
Books & Reference : 2.143985556307831
Simulation : 2.0424283457458814
Dating : 1.8618821936357481
Arcade : 1.8505980591288649
Video Players & Editors : 1.7716091175806816
Casual : 1.7603249830737984
Maps & Navigation : 1.399232678853532
Food & Drink : 1.2412547957571656
Puzzle : 1.128413450688332
Racing : 0.9930038366057323
Role Playing : 0.9365831640713158
Libraries & Demo : 0.9365831640713158
Auto &

From the results above, **Family** is the most common category among the free English Android apps, followed by **Game** and **Tools**. By genre, **Tools** is the most common on the Play Store, followed by **Entertainment** and **Education**.

#### Common Apps in AppStore

In [52]:
print("AppStore apps - BY PRIME GENRE \n")  
display_table(ios_free, -5)  # -5 is the negative index for prime_genre


AppStore apps - BY PRIME GENRE 

Games : 58.180689226948154
Entertainment : 7.885749767153058
Photo & Video : 4.967401428127911
Education : 3.6634585532443342
Social Networking : 3.2598571872089415
Shopping : 2.607885749767153
Utilities : 2.5147469729897547
Sports : 2.1421918658801617
Music : 2.049053089102763
Health & Fitness : 2.018006830176964
Productivity : 1.7385904998447685
Lifestyle : 1.5833592052157717
News : 1.334989133809376
Travel : 1.2418503570319777
Finance : 1.11766532132878
Weather : 0.8692952499223843
Food & Drink : 0.8072027320707855
Reference : 0.55883266066439
Business : 0.5277864017385905
Book : 0.43464762496119214
Navigation : 0.18627755355479667
Medical : 0.18627755355479667
Catalogs : 0.12418503570319776


From the above, among the free English apps, the most common genre on the App Store is **Games**, followed by **Entertainment** and **Photo & Video**. These apps are designed for fun or entertainment, so we can conclude that, on the App Store, apps meant for fun are more common than apps meant for practical purposes.

### Comparison between AppStore and PlayStore apps by genre

If we compare the two datasets by genre using the frequency tables alone, the App Store is dominated by apps in the entertainment and fun genres, while Google Play is more balanced between apps meant for practical purposes and apps meant for entertainment. Note that these tables show how many apps exist in each genre, not how many users they have.

### Most Popular Apps by Genre on Google Play

#### Finding the most popular genre on Google Play
To find the most popular genres, we can calculate the average number of installs for each app genre. In the Play Store data, the `Installs` column gives us the total number of installs. The App Store dataset has no such column, so there we use the `rating_count_tot` column (the total number of user ratings) as a proxy and calculate the average number of user ratings per app genre.

We have data about the number of installs for the Google Play market, so we should be able to get a clearer picture about genre popularity. However, the install numbers don't seem precise enough — we can see that most values are open-ended (100+, 1,000+, 5,000+, etc.)

In [53]:
display_table(gplay_free, 5) # 5 is the index for Installs column

1,000,000+ : 15.730083502595352
100,000+ : 11.554953735048521
10,000,000+ : 10.550665763935905
10,000+ : 10.189573459715639
1,000+ : 8.395396073121193
100+ : 6.917174452719477
5,000,000+ : 6.82690137666441
500,000+ : 5.563078311893477
50,000+ : 4.773188896411646
5,000+ : 4.513653802753328
10+ : 3.5432182351613632
500+ : 3.2498307379823967
50,000,000+ : 2.3019634394041977
100,000,000+ : 2.132701421800948
50+ : 1.9183028661701647
5+ : 0.7898894154818324
1+ : 0.5077860528097494
500,000,000+ : 0.2708192281651997
1,000,000,000+ : 0.22568269013766643
0+ : 0.045136538027533285


To calculate the average number of installs per category for the Google Play data set, we first need to convert each install number from a string to a float. This means we need to remove the commas and the plus characters; otherwise the conversion will fail and raise an error. An open-ended value such as 100,000+ is therefore treated as exactly 100,000 installs.
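
The conversion on its own can be sketched as a small helper. `parse_installs` is a hypothetical name used here for illustration only.

```python
def parse_installs(text):
    """Convert an install string such as '1,000,000+' to a float."""
    return float(text.replace(',', '').replace('+', ''))


print(parse_installs('1,000,000+'))
print(parse_installs('0+'))
```

Calling `float('1,000,000+')` directly raises a `ValueError`, which is why both characters are stripped first.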

In [54]:
category_playstore = freq_table(gplay_free,1) # 1 is the index for Category
category_sorted = sorted(category_playstore, reverse = False)
free_apps_sorted = sorted(gplay_free, reverse = False)
for category in category_sorted:
    total = 0
    len_category = 0
    for app in free_apps_sorted:
        category_app = app[1]
        if category_app == category:            
            n_installs = app[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    print(category, ':', (avg_n_installs))


ART_AND_DESIGN : 2021626.7857142857
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FAMILY : 3697848.1731343283
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
GAME : 15588015.603248259
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
MAPS_AND_NAVIGATION : 4056941.7741935486
MEDICAL : 120550.61980830671
NEWS_AND_MAGAZINES : 9549178.467741935
PARENTING : 542603.6206896552
PERSONALIZATION : 5201482.6122448975
PHOTOGRAPHY : 17840110.40229885
PRODUCTIVITY : 16787331.344927534
SHOPPING : 7036877.311557789
SOCIAL : 23253652.127118643
SPORTS : 3638640.1428571427
TOOLS : 10801391.298666667
TRAVEL_AND_LOCAL : 13984077.71

On average, ***Communication*** apps (WhatsApp, Viber, Skype, etc.) have the most installs (38456119.167247385), which suggests that this category could be the most profitable on the Google Play Store. The two categories with the next-highest averages are ***Video Players*** (24727872.4528302) and ***Social*** (23253652.1271186).
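
The per-category averages can also be computed in a single pass over the rows instead of one pass per category. The sketch below uses a hypothetical helper, `average_by_group`, and made-up rows shaped like (name, category, installs).

```python
from collections import defaultdict


def average_by_group(rows, group_index, value_of):
    """Average `value_of(row)` per group in one pass over `rows`."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_index]
        totals[group] += value_of(row)
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Made-up rows; the install strings follow the format shown above.
sample = [
    ['App A', 'SOCIAL', '1,000+'],
    ['App B', 'SOCIAL', '3,000+'],
    ['App C', 'TOOLS', '500+'],
]
averages = average_by_group(
    sample, 1, lambda row: float(row[2].replace(',', '').replace('+', ''))
)
for group, avg in sorted(averages.items()):
    print(group, ':', avg)
```

Because `value_of` is passed in, the same helper could average the ratings-count column of the App Store rows.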

### Most Popular Apps by Genre on the App Store

#### Finding the average number of user ratings per genre on the App Store

In [55]:
appstore_genre = freq_table(ios_free, -5) 
for genre in  appstore_genre:
    len_genre = 0
    total = 0
    for row in ios_free:
        genre_app = row[-5] 
        if genre_app == genre:
            ratings_count = float(row[5])
            total+= ratings_count
            len_genre +=1
            
    average_ratings = total/len_genre
    print(genre,':', average_ratings)
            
        
    

News : 21248.023255813954
Education : 7003.983050847458
Social Networking : 43899.514285714286
Utilities : 18684.456790123455
Book : 39758.5
Business : 7491.117647058823
Navigation : 86090.33333333333
Health & Fitness : 23298.015384615384
Productivity : 21028.410714285714
Sports : 23008.898550724636
Photo & Video : 28441.54375
Travel : 28243.8
Reference : 74942.11111111111
Games : 22788.6696905016
Entertainment : 14029.830708661417
Finance : 31467.944444444445
Lifestyle : 16485.764705882353
Weather : 52279.892857142855
Food & Drink : 33333.92307692308
Music : 57326.530303030304
Shopping : 26919.690476190477
Catalogs : 4004.0
Medical : 612.0


The three genres with the highest average number of user ratings on the App Store are **Navigation**, **Reference** and **Music**. Based on this measure, **Navigation** apps would be the recommendation for the App Store.

If we look closer at the results, Social Networking apps on the App Store are not far behind, with an average of 43899.5142857143 ratings per app.

In [56]:
for i in ios_free:
    if i[-5] == 'Social Networking':
        print(i[1])

Pinterest
Skype for iPhone
Messenger
Tumblr
WhatsApp Messenger
Kik
ooVoo – Free Video Call, Text and Voice
TextNow - Unlimited Text + Calls
Viber Messenger – Text & Call
Followers - Social Analytics For Instagram
MeetMe - Chat and Meet New People
We Heart It - Fashion, wallpapers, quotes, tattoos
InsTrack for Instagram - Analytics Plus More
Tango - Free Video Call, Voice and Chat
LinkedIn
Match™ - #1 Dating App.
Skype for iPad
POF - Best Dating App for Conversations
Timehop
Find My Family, Friends & iPhone - Life360 Locator
Whisper - Share, Express, Meet
Hangouts
LINE PLAY - Your Avatar World
WeChat
Badoo - Meet New People, Chat, Socialize.
Followers + for Instagram - Follower Analytics
GroupMe
Marco Polo Video Walkie Talkie
Miitomo
SimSimi
Grindr - Gay and same sex guys chat, meet and date
Wishbone - Compare Anything
imo video calls and chat
After School - Funny Anonymous School News
Quick Reposter - Repost, Regram and Reshare Photos
Weibo HD
Repost for Instagram
Live.me – Live Video 

In [57]:
for i in gplay_free:
    if i[1] == "SOCIAL":
        print(i[0])

Facebook
Facebook Lite
Tumblr
Social network all in one 2018
Pinterest
TextNow - free text + calls
Google+
The Messenger App
Messenger Pro
Free Messages, Video, Chat,Text for Messenger Plus
Telegram X
The Video Messenger App
Jodel - The Hyperlocal App
Hide Something - Photo, Video
Love Sticker
Web Browser & Fast Explorer
LiveMe - Video chat, new friends, and make money
VidStatus app - Status Videos & Status Downloader
Love Images
Web Browser ( Fast & Secure Web Explorer)
SPARK - Live random video chat & meet new people
Golden telegram
Facebook Local
Meet – Talk to Strangers Using Random Video Chat
MobilePatrol Public Safety App
💘 WhatsLov: Smileys of love, stickers and GIF
HTC Social Plugin - Facebook
Quora
Kate Mobile for VK
Family GPS tracker KidControl + GPS by SMS Locator
Moment
Text Me: Text Free, Call Free, Second Phone Number
Text Free: WiFi Calling App
Badoo - Free Chat & Dating App
Text free - Free Text + Call
Tango - Live Video Broadcast
ooVoo Video Calls, Messaging & Stories

Although many apps are already available under the Social genre in both stores, we recommend an app in the `Social` (Google Play) or `Social Networking` (App Store) genre as one that would be profitable on both markets.

### Conclusion

In this project, we analyzed the apps on both the Google Play Store and Apple's App Store to recommend an app genre that could be profitable on both markets.

After analyzing the popularity of the genres on both the App Store and the Google Play Store, we recommend apps in the `Social` or `Social Networking` genre as likely to be profitable.