# Profitable App Profiles for the App Store and Google Play Markets

For this guided project, we are working as data analysts for a company that builds iOS and Android mobile apps. The aim of the project is to analyze the data and help the developers understand what types of apps are likely to attract more users.

There are two data sets to analyze: the iOS (App Store) data set and the Android (Google Play) data set. The company only builds apps that are free to download and install, and its revenue comes from in-app ads. The more users download and install the apps, the more the company earns. Our goal as data analysts is therefore to find out which types of apps attract the most users.

You can find the Google Play Store data set [here](https://www.kaggle.com/lava18/google-play-store-apps).

You can find the iOS App Store data set [here](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps).


In [1]:
# App Store (iOS) data set
from csv import reader
opened_file = open('AppleStore.csv', encoding='utf8')
read_file = reader(opened_file)
apps_data = list(read_file)
apps_data_header = apps_data[0]  # the first row holds the column names
apps_data = apps_data[1:]        # the remaining rows are the apps

In [2]:
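# Google Play (Android) data set
opened_file = open('googleplaystore.csv', encoding='utf8')
read_file = reader(opened_file)
google_data = list(read_file)
google_data_header = google_data[0]  # the first row holds the column names
google_data = google_data[1:]        # the remaining rows are the apps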
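Both cells above repeat the same loading steps. A minimal sketch of a shared helper follows; the name load_csv is an assumption made for illustration and is not part of the original notebook. It uses a with block so the file is closed once it has been read.

```python
from csv import reader

def load_csv(path, encoding='utf8'):
    """Return (header, rows) for the CSV file at `path`.

    Hypothetical helper, not part of the original notebook.
    """
    # The with-block closes the file even if reading raises an error.
    with open(path, encoding=encoding, newline='') as opened_file:
        data = list(reader(opened_file))
    # The first row holds the column names, the rest are data rows.
    return data[0], data[1:]
```

With this helper, each data set could be loaded in one line, for example: apps_data_header, apps_data = load_csv('AppleStore.csv').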

In [3]:
# Function to explore the data between the given start and end rows
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    # Optionally print the number of rows and columns
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [4]:
print(apps_data_header)
explore_data(apps_data, 0, 1, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


Number of rows: 7197
Number of columns: 16


In [5]:
print(google_data_header)
explore_data(google_data,0,1,True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


In [6]:
explore_data(google_data,9149, 9151, True)

['Star Wars™: Galaxy of Heroes', 'FAMILY', '4.5', '1461698', '67M', '10,000,000+', 'Free', '0', 'Everyone 10+', 'Role Playing', 'May 21, 2018', '0.12.334385', '4.1 and up']


['Dungeon Keeper', 'FAMILY', '4.0', '69574', '45M', '500,000+', 'Free', '0', 'Everyone 10+', 'Strategy', 'May 24, 2018', '1.8.91', '4.1 and up']


Number of rows: 10841
Number of columns: 13


### Deleting wrong data

According to the discussion section of the Google Play data set on Kaggle, one row is incorrect: row 10472. Let's print that row and compare it with the header and with a correct row.

In [7]:
print(google_data[10472]) #print the incorrect row
print('\n')
print(google_data_header) #print the header
print('\n')
print(google_data[10473]) #print a correct row

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']


The output shows that row 10472 has a rating (3rd column) of 19, although ratings only go up to 5. The row is missing its category value, so every later value is shifted one column to the left, which produces the wrong rating. We will delete this row from the data set.

In [8]:
print(len(google_data))
del google_data[10472]
print(len(google_data))

10841
10840
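The faulty row was found through the Kaggle discussion, but a row with a missing value can also be spotted by its length, since it has fewer fields than the header. The following is a minimal sketch of that check; the function name find_malformed_rows and the toy rows are assumptions made for illustration.

```python
def find_malformed_rows(dataset, header):
    """Return the indices of rows whose length differs from the header's."""
    expected = len(header)
    return [index for index, row in enumerate(dataset) if len(row) != expected]

# Hypothetical toy data: the second row is missing its Category value.
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['App A', 'TOOLS', '4.2'],
    ['App B', '1.9'],
    ['App C', 'FAMILY', '4.0'],
]
malformed = find_malformed_rows(sample_rows, sample_header)
print(malformed)  # [1]
```

If several rows were reported, deleting them from the highest index to the lowest would keep the remaining indices valid.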


## Data Cleaning

The data cleaning process involves the following steps:
* Remove duplicate data.
* Detect inaccurate data and remove it.
* Remove non-English apps.
* Remove apps that are not free.

First, we will check whether the Google data set contains duplicate entries.

In [9]:
for app in google_data:
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


In [10]:
#Find the duplicate apps
apps_duplicated = []
apps_unique = []
for row in google_data:
    name = row[0]
    if name in apps_unique:
        apps_duplicated.append(name)
    else:
        apps_unique.append(name)
print('Number of duplicate apps:', len(apps_duplicated))
print('\n')


Number of duplicate apps: 1181




* The output above shows that there are 1181 duplicate entries.
* We won't remove duplicate rows at random. For each app we will keep the row with the highest number of reviews and delete the other duplicates.

We will create a dictionary named 'reviews_max' that maps each app name to the highest number of reviews found for that app while looping over the data set.

Then we build a list named google_clean that stores only one row per app: the row whose review count matches the value in 'reviews_max'. The already_added list tracks the names that have been stored, so an app whose duplicate rows share the same highest review count is added only once.

In [11]:
# Map each app name to its highest review count
reviews_max = {}
for app in google_data:
    name = app[0]
    n_reviews = app[3]  # review count as read from the CSV (a string)
    if (name in reviews_max) and (reviews_max[name] < n_reviews):
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

print(len(reviews_max))

# Keep one row per app: the row whose review count equals the stored maximum
google_clean = []
already_added = []
for app in google_data:
    name = app[0]
    n_reviews = app[3]
    if (reviews_max[name] == n_reviews) and (name not in already_added):
        google_clean.append(app)
        already_added.append(name)

9659
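One caveat about the cell above: csv.reader returns every field as a string, so the review counts are compared as text, and '9' sorts after '10'. The number of unique apps is not affected, but which duplicate row is kept can be. Below is a minimal sketch of the same idea with a numeric comparison; the function name keep_most_reviewed and the toy rows are assumptions made for illustration, and it was not run against the real data set.

```python
def keep_most_reviewed(dataset, name_index=0, reviews_index=3):
    """Keep one row per app name: the row with the most reviews."""
    best = {}
    for row in dataset:
        name = row[name_index]
        n_reviews = float(row[reviews_index])  # numeric, not text, comparison
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    # Rows come back in the order in which each name first appeared.
    return list(best.values())

# Hypothetical toy rows: 'Demo' appears twice with 9 and 10 reviews.
sample = [
    ['Demo', 'SOCIAL', '4.5', '9'],
    ['Demo', 'SOCIAL', '4.5', '10'],
    ['Other', 'TOOLS', '4.0', '3'],
]
print('9' > '10')  # True: strings are compared character by character
deduplicated = keep_most_reviewed(sample)
print(deduplicated)
```

Storing the whole row in the dictionary also removes the need for a second loop and for the already_added list.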


In [12]:
# Check the first 2 rows and the number of rows and columns of the cleaned Google data
explore_data(google_clean,0,2,True) 

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 9659
Number of columns: 13


In [13]:
#checking duplicate apps for iOS data
apps_list = []
duplicate_apps = []
for app in apps_data:
    id_num = app[0]
    if id_num in apps_list:
        duplicate_apps.append(id_num)
    else:
        apps_list.append(id_num)  

print(len(duplicate_apps))   

0


The output shows that there are no duplicate app ids in the iOS data.

### Removing the non-English apps

Exploring the data sets shows that some apps are aimed at a non-English-speaking audience. A few examples are printed below.

In [14]:
print(apps_data[813])
print('\n')
print(apps_data[6731])
print('\n')
print(google_clean[7941])

['445375097', '爱奇艺PPS -《欢乐颂2》电视剧热播', '224617472', 'USD', '0.0', '14844', '0', '4.0', '0.0', '6.3.3', '17+', 'Entertainment', '38', '5', '3', '1']


['1120021683', '【脱出ゲーム】絶対に最後までプレイしないで 〜謎解き＆ブロックパズル〜', '77551616', 'USD', '0.0', '0', '0', '0.0', '0.0', '1.3', '12+', 'Games', '38', '0', '1', '1']


['لعبة تقدر تربح DZ', 'FAMILY', '4.2', '238', '6.8M', '10,000+', 'Free', '0', 'Everyone', 'Education', 'November 18, 2016', '6.0.0.0', '4.1 and up']


The characters commonly used in English text are encoded using the ASCII standard, in which each character corresponds to a number between 0 and 127.

We create a function named is_english that uses a for loop and the ord() function to check whether each character of an app name falls in that range.

Some English app names contain emojis or symbols such as ™, so we only treat an app as non-English if its name has more than 3 non-ASCII characters.
For the Google data, the English apps are appended to the google_english list. For the iOS data, they are appended to the apple_english list.

In [15]:
def is_english(string):
    non_ascii = 0
    for character in string:
        if ord(character) > 127:
            non_ascii +=1
            
    if non_ascii > 3:
        return False
    else:
        return True

google_english = []  # English apps from the Android data
apple_english = []   # English apps from the iOS data
for i in google_clean:
    name = i[0]
    if is_english(name):
        google_english.append(i)
        
for apps in apps_data:
    name = apps[1]
    if is_english(name):
        apple_english.append(apps)

In [16]:
explore_data(google_english,0,2,True)
print('\n')
explore_data(apple_english,0,2,True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 9614
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of rows: 6183
Number of columns: 16


In [17]:
# Check whether these names are considered English
print(is_english("Instachat 😜"))
print(is_english('Docs To Go™ Free Office Suite'))

True
True
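To see why the threshold is needed, it can help to look at the numbers that ord() returns for a few characters. The short sketch below is illustrative only; the sample characters are chosen here and are written as escape sequences so the code does not depend on the editor's encoding.

```python
# 'a', the trademark sign, the winking-face emoji and a Chinese character
samples = ['a', '\u2122', '\U0001F61C', '\u7231']
code_points = {character: ord(character) for character in samples}

for character, code_point in code_points.items():
    # ascii() prints an escaped form, which is safe on any console
    print(ascii(character), code_point, code_point > 127)
```

Only 'a' falls within the ASCII range of 0 to 127. A single ™ or emoji in an otherwise English name stays below the limit of 3 non-ASCII characters, while a name written in Chinese or Japanese exceeds it quickly.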


### Isolating the free apps

In [18]:
free_google_apps = []  # list that contains the free Android apps
free_apple_apps = []   # list that contains the free iOS apps

for i in google_english:
    price = i[7]
    if price == '0':
        free_google_apps.append(i)

for apps in apple_english:
    price = apps[4]
    if price == '0.0':
        free_apple_apps.append(apps)

In [19]:
explore_data(free_google_apps, 0, 1, True)
explore_data(free_apple_apps, 0, 1, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


Number of rows: 8862
Number of columns: 13
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


Number of rows: 3222
Number of columns: 16


After cleaning the data and isolating the free apps, 8862 Google Play apps and 3222 App Store apps remain.

As mentioned at the beginning of the project, the goal is to find the kinds of apps that attract the most users, since the revenue comes from in-app ads.

The validation strategy for an app idea is as follows:
1. Build a minimal Android version of the app and add it to Google Play.
2. If the app gets a good response from users, develop it further.
3. If the app is profitable after six months, build an iOS version of the app and add it to the App Store.

To start the analysis, we build frequency tables that show the most common genres in each market. We will use the "Category" and "Genres" columns in the Android data and the "prime_genre" column in the iOS data.
The freq_table() function returns a dictionary that maps each value in a column to its percentage of the apps. The display_table() function calls freq_table() and prints the result.

Sorting a dictionary directly only sorts its keys, so display_table() converts each entry to a (value, key) tuple and sorts the list of tuples in descending order.

In [20]:
def freq_table(data_set, index):
    table = {}
    total = 0
    for app in data_set:
        value = app[index]
        total +=1
        if value in table:
            table[value] +=1
        else:
            table[value] = 1
        
    table_percentage = {}
    
    for key in table:
        percentage = (table[key] / total)*100
        table_percentage[key]= percentage
    return table_percentage

def display_table(data_set,index):
    table = freq_table(data_set, index)  # build the percentage table with freq_table
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key) #save data as tuple
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':' , entry[0])
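As an alternative to building (value, key) tuples by hand, sorted() accepts a key function, so the dictionary items can be sorted by value directly. This is a sketch of that variant rather than the notebook's method; the name display_sorted and the rounded sample percentages are assumptions made for illustration.

```python
def display_sorted(table):
    """Print a frequency table from the highest value to the lowest."""
    # table.items() yields (key, value) pairs; sort them by the value.
    for key, value in sorted(table.items(), key=lambda pair: pair[1], reverse=True):
        print(key, ':', value)

# Hypothetical, rounded percentages
sample_table = {'Games': 58.2, 'Education': 3.7, 'Entertainment': 7.9}
sorted_pairs = sorted(sample_table.items(), key=lambda pair: pair[1], reverse=True)
display_sorted(sample_table)
```

One difference from display_table(): entries with equal values keep their original dictionary order here, whereas the tuple approach breaks ties by the key in reverse alphabetical order.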

In [21]:
# Display the iOS apps by the 'prime_genre' column
display_table(free_apple_apps, -5)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


In [22]:
# Display the Android apps by the 'Category' column
display_table(free_google_apps, 1)

FAMILY : 18.934777702550214
GAME : 9.693071541412774
TOOLS : 8.451816745655607
BUSINESS : 4.5926427443015125
LIFESTYLE : 3.9043105393816293
PRODUCTIVITY : 3.8930264048747465
FINANCE : 3.7011961182577298
MEDICAL : 3.5206499661475967
SPORTS : 3.39652448657188
PERSONALIZATION : 3.3175355450236967
COMMUNICATION : 3.238546603475513
HEALTH_AND_FITNESS : 3.080568720379147
PHOTOGRAPHY : 2.945159106296547
NEWS_AND_MAGAZINES : 2.798465357707064
SOCIAL : 2.663055743624464
TRAVEL_AND_LOCAL : 2.335815842924848
SHOPPING : 2.2455427668697814
BOOKS_AND_REFERENCE : 2.143985556307831
DATING : 1.8618821936357481
VIDEO_PLAYERS : 1.7941773865944481
MAPS_AND_NAVIGATION : 1.399232678853532
FOOD_AND_DRINK : 1.2412547957571656
EDUCATION : 1.1735499887158656
ENTERTAINMENT : 0.9591514330850823
LIBRARIES_AND_DEMO : 0.9365831640713158
AUTO_AND_VEHICLES : 0.9252990295644324
HOUSE_AND_HOME : 0.8237418190024826
WEATHER : 0.8011735499887158
EVENTS : 0.7109004739336493
PARENTING : 0.6544798013992327
ART_AND_DESIGN : 0.

From the above analysis we can see that in the App Store data 'Games' has by far the highest share, 58.16%, which is more than half of the free English apps. 'Entertainment' and 'Photo & Video' come second and third. The results suggest that the App Store is dominated by apps made for fun.

In the Android data the largest category is 'FAMILY' with 18.93%, followed by 'GAME' with 9.69%, and then 'TOOLS' and 'BUSINESS'. The results also show that practical categories such as 'LIFESTYLE' and 'PRODUCTIVITY' rank among the larger ones.

Even though 'FAMILY' is the largest Android category, a closer look shows that it consists mostly of games and fun apps for children.

In [23]:
# Display the Android apps by the 'Genres' column
display_table(free_google_apps, -4)

Tools : 8.440532611148726
Entertainment : 6.070864364703228
Education : 5.348679756262695
Business : 4.5926427443015125
Productivity : 3.8930264048747465
Lifestyle : 3.8930264048747465
Finance : 3.7011961182577298
Medical : 3.5206499661475967
Sports : 3.4642292936131795
Personalization : 3.3175355450236967
Communication : 3.238546603475513
Action : 3.1031369893929135
Health & Fitness : 3.080568720379147
Photography : 2.945159106296547
News & Magazines : 2.798465357707064
Social : 2.663055743624464
Travel & Local : 2.324531708417964
Shopping : 2.2455427668697814
Books & Reference : 2.143985556307831
Simulation : 2.0424283457458814
Dating : 1.8618821936357481
Arcade : 1.8505980591288649
Video Players & Editors : 1.7716091175806816
Casual : 1.7490408485669149
Maps & Navigation : 1.399232678853532
Food & Drink : 1.2412547957571656
Puzzle : 1.128413450688332
Racing : 0.9930038366057323
Role Playing : 0.9365831640713158
Libraries & Demo : 0.9365831640713158
Auto & Vehicles : 0.92529902956443

### Finding the most popular apps by genre

In the next part of the project we will find the genres with the most users. For the Google data, the "Installs" column gives the number of installs per app. The iOS data has no such column, but "rating_count_tot" contains the total number of user ratings, so we will use the average number of user ratings per genre as a proxy for popularity.

Using the freq_table() function created earlier, we get a dictionary whose keys are the genres.

To calculate the average number of installs (or user ratings) for each genre, we will:
* Loop over the apps for each genre.
* Sum up the user ratings or installs for the apps of that genre.
* Divide the sum by the number of apps belonging to that genre.

### Most popular apps in the iOS data

In [24]:
# Use the freq_table function created earlier to get the iOS genres
genres_ios = freq_table(free_apple_apps, -5)

# Iterate over the genres (the dictionary keys)
for genre in genres_ios:
    len_genre = 0
    total = 0
    for i in free_apple_apps:
        genre_app = i[-5]
        if genre_app == genre:
            n_rating = float(i[5])
            total += n_rating
            len_genre +=1
     
    average_nrating = total / len_genre 
    print(genre , ':' , average_nrating)

Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0
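The cell above loops over all apps once per genre. The same averages can be computed in a single pass by keeping a running total and a count for each genre. The following is a minimal sketch under that assumption; the function name average_by_group and the toy rows are hypothetical.

```python
def average_by_group(dataset, group_index, value_index):
    """Return {group: mean of the numeric column} in a single pass."""
    totals = {}
    counts = {}
    for row in dataset:
        group = row[group_index]
        totals[group] = totals.get(group, 0.0) + float(row[value_index])
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}

# Hypothetical toy rows: [name, genre, number of user ratings]
sample = [
    ['App A', 'Games', '100'],
    ['App B', 'Games', '300'],
    ['App C', 'Weather', '50'],
]
averages = average_by_group(sample, group_index=1, value_index=2)
print(averages)  # {'Games': 200.0, 'Weather': 50.0}
```

For the iOS data the call would be average_by_group(free_apple_apps, -5, 5), which should give the same averages as the nested loop.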


Navigation has the highest average, about 86,090 user ratings per app. This is likely driven by a few very popular apps such as Google Maps and Waze.

In [25]:
for app in free_apple_apps:
    if app[-5] == "Navigation":
        print(app[1],":",app[5])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


In [26]:
for app in free_apple_apps:
    if app[-5] == "Reference":
        print(app[1],":",app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


* In the iOS data, the 'Navigation' genre has the highest average number of user ratings. The average is driven mainly by two apps, Waze and Google Maps.

* The second highest is 'Reference'. As shown above, much of that comes from the Bible, the dictionary and thesaurus apps, and the Quran.

* The third highest genre is 'Social Networking'. As with the genres above, popular apps such as Facebook, Instagram and Viber probably account for most of the ratings.

The averages are skewed by a few very popular apps such as Facebook, Instagram and Google Maps. We could set these apps aside and see how the remaining apps affect the averages. Even so, the 'Reference' genre looks promising for iOS: we could build apps around books or quotations, with a dictionary and thesaurus included.
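Setting the most popular apps aside before averaging can be sketched as follows. The function name average_excluding, the cutoff value and the toy rows are all assumptions made for illustration; the right cutoff for the real data would need to be chosen by inspecting each genre.

```python
def average_excluding(dataset, value_index, max_value):
    """Mean of a numeric column over rows at or below `max_value`."""
    values = [float(row[value_index]) for row in dataset
              if float(row[value_index]) <= max_value]
    # Return None when every row was excluded, to avoid dividing by zero.
    return sum(values) / len(values) if values else None

# Hypothetical toy rows: [name, number of user ratings]
sample = [
    ['Giant app', '1000000'],
    ['Small app A', '200'],
    ['Small app B', '400'],
]
all_apps = average_excluding(sample, value_index=1, max_value=float('inf'))
without_giant = average_excluding(sample, value_index=1, max_value=100000)
print(all_apps, without_giant)
```

In this toy example one large app lifts the average from 300 to more than 300,000, which is the kind of distortion described above.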

### Most popular apps in the Android data

In [27]:
# Use the freq_table function created earlier to get the Android categories
google_category = freq_table(free_google_apps, 1)

# Iterate over the categories (the dictionary keys)
for category in google_category:
    len_category = 0
    total = 0
    for app in free_google_apps:
        category_app = app[1]
        if category_app == category:
            installs = app[5]
            installs = installs.replace(',', '')
            installs = installs.replace('+', '')
            total += float(installs)
            len_category +=1
            
    avg_installs_google = total/len_category 
    print(category,":",avg_installs_google)

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1820673.076923077
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15560965.599534342
FAMILY : 3694276.334922527
MEDICAL : 120616.48717948717
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17805627.643678162
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10682301.033377837
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_

In the Google data, 'COMMUNICATION' has the highest average number of installs among the categories shown, followed by 'VIDEO_PLAYERS' and 'SOCIAL'. As in the iOS data, the communication figure is heavily skewed by a few very popular apps such as Messenger, WhatsApp, Viber and Line. Some apps in this category have over one billion installs each.

We will explore the apps that have at least one billion installs.

In [28]:
for app in free_google_apps:
    # Installs is a string such as '1,000,000,000+'; convert it to an int
    n_installs = int(app[5].replace(',', '').replace('+', ''))
    if app[1] == 'COMMUNICATION' and n_installs >= 1000000000:
        print(app[0],":",app[5])

WhatsApp Messenger : 1,000,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
Skype - free IM & video calls : 1,000,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+


In [29]:
for app in free_google_apps:
    # Installs is a string such as '1,000,000,000+'; convert it to an int
    n_installs = int(app[5].replace(',', '').replace('+', ''))
    if app[1] == 'SOCIAL' and n_installs >= 1000000000:
        print(app[0],":",app[5])

Facebook : 1,000,000,000+
Google+ : 1,000,000,000+
Instagram : 1,000,000,000+
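The same filter is repeated for several categories in this section, so it could be wrapped in a small helper. This is a sketch only; the names parse_installs and apps_with_min_installs are assumptions, and the toy rows merely mimic the column positions of the Google data (category at index 1, installs at index 5).

```python
def parse_installs(text):
    """Convert an Installs string such as '1,000,000+' to an int."""
    return int(text.replace(',', '').replace('+', ''))

def apps_with_min_installs(dataset, category, min_installs,
                           name_index=0, category_index=1, installs_index=5):
    """Return (name, installs) pairs in `category` with at least `min_installs`."""
    return [(row[name_index], row[installs_index]) for row in dataset
            if row[category_index] == category
            and parse_installs(row[installs_index]) >= min_installs]

# Hypothetical toy rows; unused columns are left empty.
sample = [
    ['Chat app', 'COMMUNICATION', '', '', '', '1,000,000,000+'],
    ['Small chat', 'COMMUNICATION', '', '', '', '10,000+'],
    ['Photo app', 'SOCIAL', '', '', '', '1,000,000,000+'],
]
big = apps_with_min_installs(sample, 'COMMUNICATION', 1_000_000_000)
print(big)  # [('Chat app', '1,000,000,000+')]
```

Comparing parsed numbers avoids having to spell out each install bucket as an exactly formatted string.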


We will also take a look at the 'BOOKS_AND_REFERENCE', 'NEWS_AND_MAGAZINES' and 'TRAVEL_AND_LOCAL' categories.

In [30]:
#explore the data for genre 'Books and Reference'
for app in free_google_apps:
    if app[1] == 'BOOKS_AND_REFERENCE':
       
        print(app[0],":",app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E

In [31]:
# Explore the apps with at least one billion installs in the 'BOOKS_AND_REFERENCE' category
for app in free_google_apps:
    # Installs is a string such as '1,000,000,000+'; convert it to an int
    n_installs = int(app[5].replace(',', '').replace('+', ''))
    if app[1] == 'BOOKS_AND_REFERENCE' and n_installs >= 1000000000:
        print(app[0],":",app[5])

Google Play Books : 1,000,000,000+


In [32]:
#explore the data for genre 'News and magazines'
for app in free_google_apps:
    if app[1] == 'NEWS_AND_MAGAZINES':
       
        print(app[0],":",app[5])

Fox News – Breaking News, Live Video & News Alerts : 10,000,000+
NEW - Read Newspaper, News 24h : 10,000,000+
BaBe + - Indonesian News : 1,000,000+
daily News : 1,000,000+
BBC News : 10,000,000+
Free TV Shows App:News, TV Series, Episode, Movies : 1,000,000+
News24 : 1,000,000+
Le Monde, the continuous news : 5,000,000+
Wireless news : 1,000,000+
CNN Breaking US & World News : 10,000,000+
Gnoche entertainment news · sports news is also free : 5,000,000+
BaBe - Read News : 10,000,000+
Nigeria News NAIJ.com : 1,000,000+
detikcom - Latest & Most Complete News : 10,000,000+
Dailyhunt (Newshunt) - Latest News, Viral Videos : 50,000,000+
BaBe Lite - Read Quota Saving News : 1,000,000+
ARY NEWS URDU : 500,000+
Bengali Newspaper - The first L. : 1,000,000+
Read- Latest News, Information, Gossip and Politics : 10,000,000+
Reddit: Social News, Trending Memes & Funny Videos : 10,000,000+
BBC Persian | BBC Farsi News : 100,000+
Opera News - Trending news and videos : 10,000,000+
Topbuzz: Breaking 

In [33]:
# Explore the apps with at least one billion installs in the 'NEWS_AND_MAGAZINES' category
for app in free_google_apps:
    # Installs is a string such as '1,000,000,000+'; convert it to an int
    n_installs = int(app[5].replace(',', '').replace('+', ''))
    if app[1] == 'NEWS_AND_MAGAZINES' and n_installs >= 1000000000:
        print(app[0],":",app[5])

Google News : 1,000,000,000+


In [34]:
#explore the data for genre 'Travel and local'
for app in free_google_apps:
    if app[1] == 'TRAVEL_AND_LOCAL':
        print(app[0],":",app[5])

trivago: Hotels & Travel : 50,000,000+
Hopper - Watch & Book Flights : 5,000,000+
TripIt: Travel Organizer : 1,000,000+
Trip by Skyscanner - City & Travel Guide : 500,000+
CityMaps2Go Plan Trips Travel Guide Offline Maps : 1,000,000+
KAYAK Flights, Hotels & Cars : 10,000,000+
World Travel Guide by Triposo : 500,000+
Booking.com Travel Deals : 100,000,000+
Hostelworld: Hostels & Cheap Hotels Travel App : 1,000,000+
Google Trips - Travel Planner : 5,000,000+
GPS Map Free : 5,000,000+
GasBuddy: Find Cheap Gas : 10,000,000+
Southwest Airlines : 5,000,000+
AT&T Navigator: Maps, Traffic : 10,000,000+
VZ Navigator : 50,000,000+
KakaoMap - Map / Navigation : 10,000,000+
AirAsia : 10,000,000+
Expedia Hotels, Flights & Car Rental Travel Deals : 10,000,000+
Goibibo - Flight Hotel Bus Car IRCTC Booking App : 10,000,000+
Allegiant : 1,000,000+
Amtrak : 1,000,000+
JAL (Domestic and international flights) : 1,000,000+
Flight & Hotel Booking App - ixigo : 5,000,000+
VZ Navigator for Tablets : 500,000+

In [35]:
# Explore the apps with at least one billion installs in the 'TRAVEL_AND_LOCAL' category
for app in free_google_apps:
    # Installs is a string such as '1,000,000,000+'; convert it to an int
    n_installs = int(app[5].replace(',', '').replace('+', ''))
    if app[1] == 'TRAVEL_AND_LOCAL' and n_installs >= 1000000000:
        print(app[0],":",app[5])

Maps - Navigate & Explore : 1,000,000,000+
Google Street View : 1,000,000,000+


* In the Android data, the highest average installs are in categories such as Communication, Video Players, Social, Photography and Game. These averages are driven by a few very popular apps, such as WhatsApp, Facebook and Instagram, with over a billion installs each, so the number of installs per category is highly skewed.
* Other categories with a high number of installs include Productivity, News and Magazines, Books and Reference, and Travel and Local. In the three of these explored above, only one or two apps reach a billion installs, so the averages are less skewed. There is therefore still potential to create apps that will attract users.
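One way to judge how skewed a category is would be to compare its mean with its median, since the median is barely affected by a single very large value. The sketch below uses the standard statistics module on made-up install counts; it is an illustration and was not part of the original analysis.

```python
from statistics import mean, median

# Hypothetical install counts for one category: one giant app and four small ones
installs = [1_000_000_000, 500_000, 100_000, 50_000, 10_000]

mean_installs = mean(installs)      # pulled up by the giant app
median_installs = median(installs)  # the middle value of the sorted list

print('mean:', mean_installs)
print('median:', median_installs)
```

A mean that is far larger than the median, as in this toy example, indicates that a few apps dominate the category.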

### Conclusion

Based on the analysis of the two data sets, I believe there is potential to create popular apps in genres such as 'Books and Reference' and 'Travel and Local'. 'Books and Reference' has a high number of users in both data sets: 'Reference' ranks second by average user ratings on iOS, and 'BOOKS_AND_REFERENCE' averages about 8.8 million installs on Google Play. 'Travel and Local' is also comparatively popular in both data sets. Since many people are waiting to travel again after the pandemic, apps in the travel area are likely to become more popular in the future.

Both of these areas are less skewed by a few very popular apps. Therefore there is potential to create apps that will be popular in the market and increase the company's revenue.