# Profitable App Profiles for the App Store and Google Play Markets - Author Bharti Gothwal


**Project Synopsis:** Data analysis for a company that builds Android and iOS mobile apps. All the apps built by this company are free, and the main source of income is in-app ads. That means the revenue for any given app is mostly influenced by the number of users who use the app: the more users who see and engage with the ads, the better.

**Project Goal:** The main goal of this project is to analyze data to help the developers understand which types of apps are likely to attract more users.



**Step-1 Opening and Exploring the dataset:**
The source of these data sets is kaggle.com.
The data sets contain the following information:
* googleplaystore.csv - contains information about approximately 10,000 Android apps
* AppleStore.csv - contains information about approximately 7,000 iOS apps




**Explore data set function goes here.**
* Function definition - explore_data(dataset, start, end, rows_and_columns=False)
* dataset - the dataset (list of lists) to be explored
* start - starting point of the slice
* end - end point of the slice (exclusive)
* rows_and_columns - bool, False by default; pass True to also print the number of rows and columns


In [4]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')
    if rows_and_columns:
        print('Number of rows: ', len(dataset))
        print('Number of columns: ', len(dataset[0]))
        print('\n')

# Importing the datasets

In [5]:
opened_file1 = open('googleplaystore.csv')
opened_file2 = open('AppleStore.csv')
from csv import reader
read_file1 = reader(opened_file1)
googleApps = list(read_file1)
read_file2 = reader(opened_file2)
iosApps = list(read_file2)
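
The import cell above leaves both files open. A minimal sketch of an alternative is shown here, assuming the files are UTF-8 encoded (the notebook does not state the encoding); `load_csv` is a hypothetical helper name, not part of the original notebook.

```python
from csv import reader

def load_csv(path, encoding="utf8"):
    """Read a CSV file into a list of rows (header row included)."""
    # Assumption: the file is UTF-8 encoded; pass another encoding if not.
    # The with-block closes the file even if reading fails.
    with open(path, encoding=encoding, newline="") as f:
        return list(reader(f))
```

With this helper, `googleApps = load_csv('googleplaystore.csv')` would produce the same kind of list of lists as the cell above.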

**Calling the explore_data function**

In [6]:
explore_data(googleApps,10473,10474,True)

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


Number of rows:  10842
Number of columns:  13




# Data Cleansing Phase


**Removing entries with missing values** 



After reading a discussion, I found that in the Google app data row **10473** has a missing entry, which shifts the values in the following columns one position to the left. The row is printed in cell In [6] above: it has 12 fields instead of 13.<br>
To delete a row from a data set, use the del statement. For example, to delete the row at index 800, use **del data[800]**.

In [7]:
explore_data(googleApps,0,1,True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


Number of rows:  10842
Number of columns:  13
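
Instead of relying on a discussion thread, rows with a missing entry can be located by comparing each row's length with the header's length. The following is a small sketch on made-up sample data; `find_malformed_rows` is a hypothetical helper name.

```python
def find_malformed_rows(dataset):
    """Return (index, row) pairs whose length differs from the header's."""
    expected = len(dataset[0])  # the header row defines the column count
    return [(i, row) for i, row in enumerate(dataset) if len(row) != expected]

# Illustrative sample, not taken from the real data set.
sample = [
    ["App", "Category", "Rating"],
    ["Demo A", "TOOLS", "4.1"],
    ["Demo B", "4.5"],  # Category is missing, so the values shift left
]
print(find_malformed_rows(sample))
```

Applied to googleApps, a check like this would be expected to flag the row discussed above, since it has fewer fields than the header.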




**Removing duplicate entries**

While exploring the data we can easily see that some apps appear more than once.


In [9]:
def check_duplicates(dataset):
    # Collect every app name that has already been seen at least once.
    duplicate_apps = []
    unique_apps = []

    for app in dataset:
        name = app[0]
        if name in unique_apps:
            duplicate_apps.append(name)
        else:
            unique_apps.append(name)

    return duplicate_apps

**Removing the duplicates**
Before removing the duplicates, we first count them and decide on a criterion for removal. The number of reviews is a reasonable metric: we keep the entry with the highest number of reviews.<br>
This process is done in two steps:<br>
**Step-1** Create a dictionary where each key is a unique app name and the value is the highest number of reviews for that app.<br>
**Step-2** Use this dictionary to create a new data set, which has only one entry per app name: the one with the highest number of reviews.
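
The two steps can also be folded into a single pass that stores the best row per app name. This is a sketch under stated assumptions, not the notebook's method: `dedupe_by_max_reviews` is a hypothetical name, the default indexes match the Google Play columns (App at 0, Reviews at 3), and the result is ordered by the first appearance of each name, which can differ from the order produced by the two-step version.

```python
def dedupe_by_max_reviews(rows, name_idx=0, reviews_idx=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_idx]
        n_reviews = float(row[reviews_idx])
        # Strict comparison keeps the earliest row when review counts tie.
        if name not in best or n_reviews > float(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())

# Illustrative rows (header excluded), not taken from the real data set.
sample = [
    ["App A", "TOOLS", "4.0", "10"],
    ["App B", "TOOLS", "4.2", "5"],
    ["App A", "TOOLS", "4.1", "30"],
    ["App A", "TOOLS", "4.1", "30"],
]
print(dedupe_by_max_reviews(sample))
```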

In [10]:
print(len(check_duplicates(googleApps)))


1181


In [11]:
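# Delete the malformed row found earlier. Run this cell only once,
# otherwise a valid row is removed as well.
del googleApps[10473]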


**Step-1 Creating the dictionary of Unique Apps**<br>

In [12]:
review_max = {}
for app in googleApps[1:]:
    name=app[0]
    n_reviews = float(app[3])
    if name in review_max and review_max[name] < n_reviews:
        review_max[name] = n_reviews
    elif name not in review_max:
        review_max[name] = n_reviews
 

In [14]:
print('Expected length:', len(googleApps)-1181)
print('Actual Length:',len(review_max))


Expected length: 9660
Actual Length: 9659

The expected length is one higher than the actual length because len(googleApps) still counts the header row, while review_max is built from googleApps[1:].

**Step-2 Creating the new data set from the dictionary**<br>


In [15]:
google_clean = []
already_added = []
for app in googleApps[1:]:
    name = app[0]
    n_reviews = float(app[3])
    if((review_max[name] == n_reviews) and (name not in already_added)):
        google_clean.append(app)
        already_added.append(name)
print(len(google_clean))
explore_data(google_clean, 0,5,True)

9659
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


Number of rows:  9659
Number of columns:  13




**Removing Non-English Apps**

In [16]:
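def is_english(string):
    # Count characters outside the ASCII range (code points above 127).
    non_ascii = 0
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
        if non_ascii > 3:
            # More than three non-ASCII characters: treat the name as non-English.
            return False
    return True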


**Testing the is_english(string) function**

In [17]:
is_english('爱奇艺PPS -《欢乐颂2》电视剧热播')

False

In [18]:
is_english('Instagram')

True

In [19]:
is_english('Docs To Go™ Free Office Suite')

True

In [20]:
is_english('Instachat 😜')

True
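
The four manual checks above can be expressed as assertions, and the tolerance of three non-ASCII characters can be made a parameter. The sketch below uses a hypothetical function name, `is_mostly_ascii`, and a hypothetical parameter, `max_non_ascii`; with the default value it is intended to give the same answers as is_english.

```python
def is_mostly_ascii(text, max_non_ascii=3):
    """True if text has at most max_non_ascii characters above code point 127."""
    non_ascii = sum(1 for ch in text if ord(ch) > 127)
    return non_ascii <= max_non_ascii

# The same names that were tested by hand above.
assert is_mostly_ascii('Instagram')
assert is_mostly_ascii('Docs To Go™ Free Office Suite')
assert is_mostly_ascii('Instachat 😜')
assert not is_mostly_ascii('爱奇艺PPS -《欢乐颂2》电视剧热播')
```

A stricter setting such as `max_non_ascii=0` would reject names containing a trademark sign or an emoji, which is the problem the tolerance is meant to avoid.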

In [25]:
google_clean_new = []
ios_clean_new = []
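# Caution: google_clean has no header row (it was built from googleApps[1:]),
# so the [1:] slice below skips the first app rather than a header.
# Iterating over google_clean directly would keep that app; the counts
# recorded below were produced with the slice as written.
for app in google_clean[1:]: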
    name  = app[0]
    if is_english(name):
        google_clean_new.append(app)

for app in iosApps[1:]:
    name = app[1]
    if is_english(name):
        ios_clean_new.append(app)
        
print('Google Clean:',len(google_clean))
print('Google Clean New',len(google_clean_new))
print('ios_old', len(iosApps))
print('ios_new',len(ios_clean_new))

Google Clean: 9659
Google Clean New 9613
ios_old 7198
ios_new 6183


**Removing non-free apps**

In [26]:
explore_data(google_clean_new, 0, 4, True)
explore_data(ios_clean_new, 0, 4, True)

['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


Number of rows:  9613
Number of columns:  13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '

In [27]:
free_google_apps = []
free_apple_apps = []
for app in google_clean_new:
    if app[7] == '0':
        free_google_apps.append(app)
for app in ios_clean_new:
    if app[4] == '0.0':
        free_apple_apps.append(app)
print('Previous Google Apps', len(google_clean_new))
print('Free Google Apps',len(free_google_apps))
print('Previous ios Apps',len(ios_clean_new))
print('Free Apple Apps', len(free_apple_apps))

Previous Google Apps 9613
Free Google Apps 8863
Previous ios Apps 6183
Free Apple Apps 3222


**At this point the data sets are clean. So far I have:**<br>
1.) Removed inaccurate data.<br>
2.) Removed duplicate app entries.<br>
3.) Removed non-English apps.<br>
4.) Isolated the free apps.<br>
**free_google_apps and free_apple_apps are the final data sets for processing.**
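
The free-app filter above compares price strings exactly ('0' for Google Play, '0.0' for the App Store). A slightly more tolerant sketch parses the price as a number instead. `is_free` is a hypothetical helper, and the assumption that paid prices may carry a leading dollar sign is not shown in the rows printed in this notebook.

```python
def is_free(price_text):
    """True if the price string parses to zero, e.g. '0' or '0.0'."""
    # Assumption: paid prices may look like '$4.99'; strip the currency sign.
    cleaned = price_text.strip().lstrip("$")
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Non-numeric text (for example a shifted column) is treated as not free.
        return False

print(is_free('0'), is_free('0.0'), is_free('$4.99'), is_free('Everyone'))
```

The same function could then be used for both stores, with only the column index differing.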

# Data Analysis Phase
**The aim of the project is to determine the kinds of apps that are likely to attract more users, because our revenue is highly influenced by the number of people using our apps.**<br>
To minimize risks and overhead, our validation strategy for an app idea comprises three steps:<br>
1.) Build a minimal Android version of the app, and add it to Google Play.<br>
2.) If the app has a good response from users, we develop it further.<br>
3.) If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.<br>


In [28]:
explore_data(free_google_apps,0,5,True)
explore_data(free_apple_apps,0,5,True)

['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


['Smoke Effect Photo Maker - Smoke Editor', 'ART_AND_DESIGN', '3.8', '178', '19M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'April 26, 2018', '1.1', '4.0.3 and up']


Number of rows:  8863
Number of columns:  13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '

**Generating Frequency tables**

In [29]:
def freq_table(dataset, index):
    table = {}
    total = 0
    for row in dataset:
        total+=1
        value = row[index]
        if value in table:
            table[value]+=1
        else:
            table[value]=1
            
    total_percentage = {}
    for key in table:
        percent = (table[key]/total)*100
        total_percentage[key]=percent
        
    return total_percentage

def print_freq_table(dataset, index):
    table = freq_table(dataset,index)
    display_table = []
    for key in table:
        key_val_pair = (table[key],key)
        display_table.append(key_val_pair)
    table_sorted = sorted(display_table,reverse=True)
    for e in table_sorted:
        print(e[1],':',e[0])
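
For comparison, the frequency table can also be built with collections.Counter from the standard library. This is an alternative sketch, not the function used for the outputs in this notebook; `freq_table_pct` is a hypothetical name.

```python
from collections import Counter

def freq_table_pct(dataset, index):
    """Map each value in column `index` to its share of rows, in percent."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    # most_common() yields values from most to least frequent.
    return {value: count / total * 100 for value, count in counts.most_common()}

# Illustrative rows, not taken from the real data set.
sample = [["a", "Games"], ["b", "Games"], ["c", "Music"], ["d", "Games"]]
for genre, pct in freq_table_pct(sample, 1).items():
    print(genre, ':', pct)
```

Because the dictionary is built in most_common() order, no separate sorting step is needed before printing.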
        

**Printing the genres for the Google apps**

In [37]:
print(googleApps[0][9])
print_freq_table(free_google_apps,9)

Genres
Tools : 8.450863138892023
Entertainment : 6.070179397495204
Education : 5.348076272142616
Business : 4.592124562789123
Productivity : 3.8925871601038025
Lifestyle : 3.8925871601038025
Finance : 3.7007785174320205
Medical : 3.5315355974275078
Sports : 3.463838429425702
Personalization : 3.317161232088458
Communication : 3.2381812027530184
Action : 3.102786866749408
Health & Fitness : 3.0802211440821394
Photography : 2.944826808078529
News & Magazines : 2.798149610741284
Social : 2.6627552747376737
Travel & Local : 2.324269434728647
Shopping : 2.245289405393208
Books & Reference : 2.1437436533904997
Simulation : 2.042197901387792
Dating : 1.8616721200496444
Arcade : 1.8503892587160102
Video Players & Editors : 1.771409229380571
Casual : 1.7601263680469368
Maps & Navigation : 1.399074805370642
Food & Drink : 1.241114746699763
Puzzle : 1.128286133363421
Racing : 0.9928917973598104
Role Playing : 0.9364774906916393
Libraries & Demo : 0.9364774906916393
Auto & Vehicles : 0.92519462935

In [96]:
print(iosApps[0][11])
print_freq_table(free_apple_apps,11)

prime_genre
Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


In [30]:
print(googleApps[0][9])

Genres


In [98]:
print(googleApps[0])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


In [36]:
print_freq_table(free_google_apps,1)

FAMILY : 18.910075595170937
GAME : 9.725826469592688
TOOLS : 8.462146000225657
BUSINESS : 4.592124562789123
LIFESTYLE : 3.9038700214374367
PRODUCTIVITY : 3.8925871601038025
FINANCE : 3.7007785174320205
MEDICAL : 3.5315355974275078
SPORTS : 3.396141261423897
PERSONALIZATION : 3.317161232088458
COMMUNICATION : 3.2381812027530184
HEALTH_AND_FITNESS : 3.0802211440821394
PHOTOGRAPHY : 2.944826808078529
NEWS_AND_MAGAZINES : 2.798149610741284
SOCIAL : 2.6627552747376737
TRAVEL_AND_LOCAL : 2.335552296062281
SHOPPING : 2.245289405393208
BOOKS_AND_REFERENCE : 2.1437436533904997
DATING : 1.8616721200496444
VIDEO_PLAYERS : 1.7939749520478394
MAPS_AND_NAVIGATION : 1.399074805370642
FOOD_AND_DRINK : 1.241114746699763
EDUCATION : 1.1621347173643235
ENTERTAINMENT : 0.9590432133589079
LIBRARIES_AND_DEMO : 0.9364774906916393
AUTO_AND_VEHICLES : 0.9251946293580051
HOUSE_AND_HOME : 0.8236488773552973
WEATHER : 0.8010831546880289
EVENTS : 0.7108202640189552
PARENTING : 0.6544059573507841
ART_AND_DESIGN : 0


In [47]:
print_freq_table(free_google_apps,2)

NaN : 14.645154011057205
4.3 : 9.522734965587274
4.4 : 9.285794877580955
4.5 : 8.811914701568318
4.2 : 8.473428861559292
4.6 : 6.8374139681823305
4.1 : 6.668171048177818
4.0 : 5.517319192147128
4.7 : 4.321335890781902
3.9 : 3.870021437436534
3.8 : 3.0125239760803337
5.0 : 2.7417353040731127
3.7 : 2.403249464064087
4.8 : 2.06476362405506
3.6 : 1.771409229380571
3.5 : 1.6472977547105947
3.4 : 1.3088119147015682
3.3 : 1.0718718266952498
4.9 : 0.8913460453571026
3.0 : 0.8236488773552973
3.1 : 0.7333859866862237
3.2 : 0.6882545413516867
2.9 : 0.4400315920117342
2.8 : 0.4174658693444658
2.6 : 0.2482229493399526
2.7 : 0.2369400880063184
2.5 : 0.21437436533904997
2.3 : 0.20309150400541578
2.4 : 0.19180864267178158
2.2 : 0.15796005867087895
1.0 : 0.15796005867087895
2.0 : 0.1241114746699763
1.9 : 0.1241114746699763
2.1 : 0.09026289066907367
1.8 : 0.07898002933543948
1.7 : 0.07898002933543948
1.6 : 0.045131445334536835
1.5 : 0.033848584000902626
1.4 : 0.033848584000902626
1.2 : 0.011282861333634

In [41]:
print(googleApps[0])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


In [64]:
# Calculating the average number of user ratings per genre.
# Index -5 is the genre column and index 5 holds the total number of user ratings.
genre_table = freq_table(free_apple_apps,-5)
rating_table = {}
for genre in genre_table:
    total = 0
    len_genre = 0
    for app in free_apple_apps:
        app_genre = app[-5]

        if genre == app_genre:
            n_ratings = float(app[5])
            total+=n_ratings
            len_genre+=1
    avg_n_ratings = total/len_genre
    rating_table[genre]=avg_n_ratings
sorted_table = sorted(rating_table.items(), key = lambda x:x[1], reverse = True)

for e in sorted_table:
    print (e[0], ':',e[1])

Navigation : 86090.33333333333
Reference : 74942.11111111111
Social Networking : 71548.34905660378
Music : 57326.530303030304
Weather : 52279.892857142855
Book : 39758.5
Food & Drink : 33333.92307692308
Finance : 31467.944444444445
Photo & Video : 28441.54375
Travel : 28243.8
Shopping : 26919.690476190477
Health & Fitness : 23298.015384615384
Sports : 23008.898550724636
Games : 22788.6696905016
News : 21248.023255813954
Productivity : 21028.410714285714
Utilities : 18684.456790123455
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Business : 7491.117647058823
Education : 7003.983050847458
Catalogs : 4004.0
Medical : 612.0
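
The nested loops above scan the whole data set once per genre. A single-pass version is sketched below as an illustration; `average_by_group` is a hypothetical helper, and the sample rows are made up.

```python
from collections import defaultdict

def average_by_group(rows, group_idx, value_idx):
    """Return (group, average) pairs sorted from highest to lowest average."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[group_idx]] += float(row[value_idx])
        counts[row[group_idx]] += 1
    averages = {group: totals[group] / counts[group] for group in totals}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)

# Illustrative rows: [genre, number of ratings].
sample = [["Navigation", "100"], ["Navigation", "300"], ["Book", "50"]]
print(average_by_group(sample, 0, 1))
```

For the Apple data this would correspond to calling it with the genre index (-5) and the rating-count index (5).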


# Analysing Data for Apple Apps:

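**Question-1: What is the most common genre? What is the runner up?**<br>
Answer: By frequency, the most common genre among the free English iOS apps is Games (about 58%), and the runner-up is Entertainment (about 8%). By average number of user ratings, however, Navigation ranks highest (about 86,090), which is heavily influenced by Waze and Google Maps. The runners-up on that measure are Reference, influenced by Bible and Dictionary.com, and Social Networking, influenced by Facebook and Pinterest.<br>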


In [66]:
for app in free_apple_apps:
    if app[-5] == 'Navigation':
        print (app[1], ':', app[5])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


In [67]:
for app in free_apple_apps:
    if app[-5] == 'Reference':
        print(app[1], ':', app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


In [70]:
for app in free_apple_apps:
    if app[-5] == 'Book':
        print(app[1], ':', app[5])

Kindle – Read eBooks, Magazines & Textbooks : 252076
Audible – audio books, original series & podcasts : 105274
Color Therapy Adult Coloring Book for Adults : 84062
OverDrive – Library eBooks and Audiobooks : 65450
HOOKED - Chat Stories : 47829
BookShout: Read eBooks & Track Your Reading Goals : 879
Dr. Seuss Treasury — 50 best kids books : 451
Green Riding Hood : 392
Weirdwood Manor : 197
MangaZERO - comic reader : 9
ikouhoushi : 0
MangaTiara - love comic reader : 0
謎解き : 0
謎解き2016 : 0


In [68]:
for app in free_apple_apps:
    if app[-5] == 'Social Networking':
        print(app[1], ':', app[5])

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 2

**Question-2: What other patterns do you see?**<br>
Answer: Apps such as Google Maps, Facebook, Pinterest and WhatsApp dominate their genres. Because they are already established, a new social networking or navigation app would probably struggle to draw users away from them. For a clearer picture we can repeat the analysis after removing these very popular apps.<br>
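
One way to carry out that follow-up is to recompute the per-genre average while ignoring apps above some rating-count cap. The sketch below is illustrative only: `average_below_cap` is a hypothetical helper, and the cap of 100,000 is an arbitrary assumption rather than a value chosen in this project.

```python
def average_below_cap(rows, group_idx, value_idx, cap):
    """Average value per group, ignoring rows whose value is >= cap."""
    totals, counts = {}, {}
    for row in rows:
        value = float(row[value_idx])
        if value >= cap:
            continue  # skip the dominant apps
        group = row[group_idx]
        totals[group] = totals.get(group, 0.0) + value
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}

# Illustrative rows: [genre, number of ratings].
sample = [["Navigation", "345046"], ["Navigation", "187"], ["Navigation", "5"]]
print(average_below_cap(sample, 0, 1, cap=100000))
```

A genre whose apps all exceed the cap simply drops out of the result, which is itself a signal that the genre is dominated by a few large apps.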


# Analysing Data for Google Apps:

In [74]:
print(googleApps[0])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


In [73]:
print_freq_table(free_google_apps,5) #Installs Columns

1,000,000+ : 15.728308699086089
100,000+ : 11.55365000564143
10,000,000+ : 10.549475346947986
10,000+ : 10.188423784271691
1,000+ : 8.394448832223853
100+ : 6.916393997517771
5,000,000+ : 6.826131106848697
500,000+ : 5.562450637481666
50,000+ : 4.772650344127271
5,000+ : 4.513144533453684
10+ : 3.542818458761142
500+ : 3.2494640640866526
50,000,000+ : 2.3017037120613786
100,000,000+ : 2.1324607920568655
50+ : 1.9180864267178157
5+ : 0.7898002933543946
1+ : 0.5077287600135394
500,000,000+ : 0.270788672007221
1,000,000,000+ : 0.2256572266726842
0+ : 0.045131445334536835
0 : 0.011282861333634209


In [76]:
print_freq_table(free_google_apps,1) # Category column

FAMILY : 18.910075595170937
GAME : 9.725826469592688
TOOLS : 8.462146000225657
BUSINESS : 4.592124562789123
LIFESTYLE : 3.9038700214374367
PRODUCTIVITY : 3.8925871601038025
FINANCE : 3.7007785174320205
MEDICAL : 3.5315355974275078
SPORTS : 3.396141261423897
PERSONALIZATION : 3.317161232088458
COMMUNICATION : 3.2381812027530184
HEALTH_AND_FITNESS : 3.0802211440821394
PHOTOGRAPHY : 2.944826808078529
NEWS_AND_MAGAZINES : 2.798149610741284
SOCIAL : 2.6627552747376737
TRAVEL_AND_LOCAL : 2.335552296062281
SHOPPING : 2.245289405393208
BOOKS_AND_REFERENCE : 2.1437436533904997
DATING : 1.8616721200496444
VIDEO_PLAYERS : 1.7939749520478394
MAPS_AND_NAVIGATION : 1.399074805370642
FOOD_AND_DRINK : 1.241114746699763
EDUCATION : 1.1621347173643235
ENTERTAINMENT : 0.9590432133589079
LIBRARIES_AND_DEMO : 0.9364774906916393
AUTO_AND_VEHICLES : 0.9251946293580051
HOUSE_AND_HOME : 0.8236488773552973
WEATHER : 0.8010831546880289
EVENTS : 0.7108202640189552
PARENTING : 0.6544059573507841
ART_AND_DESIGN : 0

In [83]:
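# Computing the average number of installs per category.
# The Installs values are open-ended (for example '1,000,000+'),
# so each average is a lower-bound estimate.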
genre_android = freq_table(free_google_apps,1)
install_table = {}
for genre in genre_android:
    total = 0
    len_genre=0
    for app in free_google_apps:
        if genre == app[1]:
            n_installs = app[5]
            n_installs = n_installs.replace(',','')
            n_installs = n_installs.replace('+','')
            total+=float(n_installs)
            len_genre+=1
    avg_install = total/len_genre
    print(genre, ':',avg_install )
            
            

COMMUNICATION : 38456119.167247385
SHOPPING : 7036877.311557789
TOOLS : 10801391.298666667
PARENTING : 542603.6206896552
BEAUTY : 513151.88679245283
WEATHER : 5074486.197183099
MEDICAL : 120550.61980830671
ENTERTAINMENT : 11640705.88235294
PHOTOGRAPHY : 17840110.40229885
AUTO_AND_VEHICLES : 647317.8170731707
EVENTS : 253542.22222222222
BUSINESS : 1712290.1474201474
PRODUCTIVITY : 16787331.344927534
COMICS : 817657.2727272727
DATING : 854028.8303030303
FAMILY : 3695641.8198090694
TRAVEL_AND_LOCAL : 13984077.710144928
GAME : 15588015.603248259
FINANCE : 1387692.475609756
MAPS_AND_NAVIGATION : 4056941.7741935486
SPORTS : 3638640.1428571427
FOOD_AND_DRINK : 1924897.7363636363
BOOKS_AND_REFERENCE : 8767811.894736841
HOUSE_AND_HOME : 1331540.5616438356
SOCIAL : 23253652.127118643
ART_AND_DESIGN : 2021626.7857142857
HEALTH_AND_FITNESS : 4188821.9853479853
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
NEWS_AND_MAGAZINES : 9549178.467741935
VIDEO_PLAYERS : 24727872.452830

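**The COMMUNICATION category has the highest average number of installs (about 38.5 million). The output above is not sorted: among the categories shown, the next highest averages are VIDEO_PLAYERS (about 24.7 million) and SOCIAL (about 23.3 million), well ahead of TOOLS (about 10.8 million) and SHOPPING (about 7.0 million). The COMMUNICATION average is dominated by a few very large apps such as WhatsApp Messenger and Messenger – Text and Video Chat for Free, each listed at 1,000,000,000+ installs.**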
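
Because the averages above are printed in dictionary order, the ranking is hard to read. The sketch below parses the install strings and returns the categories sorted by average; the function names and the sample rows are hypothetical, and the default indexes match the Google Play columns (Category at 1, Installs at 5).

```python
def parse_installs(text):
    """Turn a string such as '1,000,000+' into the integer 1000000."""
    return int(text.replace(",", "").replace("+", ""))

def average_installs_by_category(rows, category_idx=1, installs_idx=5):
    """Return (category, average installs) pairs, highest average first."""
    totals, counts = {}, {}
    for row in rows:
        category = row[category_idx]
        totals[category] = totals.get(category, 0) + parse_installs(row[installs_idx])
        counts[category] = counts.get(category, 0) + 1
    averages = {c: totals[c] / counts[c] for c in totals}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)

# Illustrative rows with the same column layout as the Google Play data.
sample = [
    ["App A", "COMMUNICATION", "", "", "", "1,000+"],
    ["App B", "COMMUNICATION", "", "", "", "3,000+"],
    ["App C", "TOOLS", "", "", "", "500+"],
]
for category, avg in average_installs_by_category(sample):
    print(category, ':', avg)
```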

In [85]:
for app in free_google_apps:
    if app[1] == 'COMMUNICATION':
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
Messenger for SMS : 10,000,000+
My Tele2 : 5,000,000+
imo beta free calls and text : 100,000,000+
Contacts : 50,000,000+
Call Free – Free Call : 5,000,000+
Web Browser & Explorer : 5,000,000+
Browser 4G : 10,000,000+
MegaFon Dashboard : 10,000,000+
ZenUI Dialer & Contacts : 10,000,000+
Cricket Visual Voicemail : 10,000,000+
TracFone My Account : 1,000,000+
Xperia Link™ : 10,000,000+
TouchPal Keyboard - Fun Emoji & Android Keyboard : 10,000,000+
Skype Lite - Free Video Call & Chat : 5,000,000+
My magenta : 1,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Seznam.cz : 1,000,000+
Antillean Gold Telegram (original version) : 100,000+
AT&T Visual Voicemail : 10,000,000+
GMX Mail : 10,000,000+
Omlet Chat : 10,000,000+
My Vodacom SA : 5,000,000+
Microsoft Edge : 5,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Calls & Text by Mo+ : 5,000,000+
free 

In [86]:
for app in free_google_apps:
    if app[1] == 'SHOPPING':
        print(app[0], ':', app[5])

Amazon for Tablets : 10,000,000+
OfferUp - Buy. Sell. Offer Up : 10,000,000+
Shopee - No. 1 Online Shopping : 10,000,000+
Shopee: No.1 Online Shopping : 10,000,000+
Kroger : 5,000,000+
Walmart : 10,000,000+
eBay: Buy & Sell this Summer - Discover Deals Now! : 100,000,000+
letgo: Buy & Sell Used Stuff, Cars & Real Estate : 50,000,000+
Amazon Shopping : 100,000,000+
Lazada - Online Shopping & Deals : 50,000,000+
OLX - Buy and Sell : 50,000,000+
The wall : 1,000,000+
Flipp - Weekly Shopping : 10,000,000+
Shrimp skin shopping: spend less, buy better : 5,000,000+
Lotte Home Shopping LOTTE Homeshopping : 5,000,000+
Horn, free country requirements : 1,000,000+
Jiji.ng : 1,000,000+
GS SHOP : 10,000,000+
The birth : 50,000,000+
Home & Shopping - Only in apps. 10% off + 10% off : 10,000,000+
EHS Dongsen Shopping : 1,000,000+
bigbasket - online grocery : 5,000,000+
Bukalapak - Buy and Sell Online : 10,000,000+
Life market : 1,000,000+
Jabong Online Shopping App : 10,000,000+
Family Dollar : 1,000

In [87]:
for app in free_google_apps:
    if app[1] == 'GAME':
        print(app[0], ':', app[5])

Solitaire : 10,000,000+
Sonic Dash : 100,000,000+
PAC-MAN : 100,000,000+
Bubble Witch 3 Saga : 50,000,000+
Race the Traffic Moto : 10,000,000+
Marble - Temple Quest : 10,000,000+
Shooting King : 10,000,000+
Geometry Dash World : 10,000,000+
Jungle Marble Blast : 5,000,000+
Roll the Ball® - slide puzzle : 100,000,000+
Block Craft 3D: Building Simulator Games For Free : 50,000,000+
Farm Fruit Pop: Party Time : 1,000,000+
Love Balls : 50,000,000+
Piano Tiles 2™ : 100,000,000+
Pokémon GO : 100,000,000+
Paint Hit : 10,000,000+
Snake VS Block : 50,000,000+
Rolly Vortex : 10,000,000+
Woody Puzzle : 1,000,000+
Stack Jump : 10,000,000+
The Cube : 5,000,000+
Extreme Car Driving Simulator : 100,000,000+
Bricks n Balls : 1,000,000+
The Fish Master! : 1,000,000+
Color Road : 10,000,000+
Draw In : 10,000,000+
PLANK! : 500,000+
Looper! : 1,000,000+
Trivia Crack : 100,000,000+
Will it Crush? : 5,000,000+
Tomb of the Mask : 5,000,000+
Baseball Boy! : 10,000,000+
Hello Stars : 10,000,000+
Tank Stars : 1

# Conclusion:

**In this project we started by exploring the data sets. The exploration showed that the data was not ready for analysis: it had missing values, duplicate entries, non-English apps and non-free apps. We first removed the row with a missing value in one of its columns. After that we removed the duplicate apps, giving preference to the number of reviews: for each app name we kept the entry with the maximum number of reviews and deleted the other rows with the same name. To do this we created a dictionary of unique app names and built a new data set from it.**<br>
**Next we removed the non-English apps with the help of the ord() function. Our first approach was to treat any character for which ord() returns a value greater than 127 as non-English. But that removed many rows whose app names merely contained a special character. To fix this we changed the criterion: an app is considered non-English only if its name has more than 3 such characters. We then refined the data sets further on the basis of cost, keeping only the free apps.**<br>
**With the cleaning done, we moved on to the data analysis phase, in which we made frequency tables based on the genre of the app. Later we made tables listing each genre with its average number of user ratings (iOS) and its average number of installs (Android).**<br>
**In conclusion, we suggest that our company build an app in the Book genre. It does not require external resources such as a subscription from another source. Many good books are available in the market, so the developers only need to pick a good one and can add features such as a dictionary or daily quotes from the book. This could be a great app for both the Android and iOS platforms.**<br>

Thank you