# Project: Profitable App Profiles for the App Store and Google Play Markets

**About**:
For this project, we'll pretend we're working as data analysts for a company that builds Android and iOS mobile apps. We make our apps available on Google Play and the App Store.

We only build apps that are free to download and install, and our main source of revenue consists of in-app ads. This means our revenue for any given app is mostly influenced by the number of users who use our app — the more users that see and engage with the ads, the better.

**Objectives**:
Analyze data to help app developers (for iOS and Android) understand which types of apps are likely to attract more users.

**Source of data set**:
- [Android](https://dq-content.s3.amazonaws.com/350/googleplaystore.csv)
- [App Store](https://dq-content.s3.amazonaws.com/350/AppleStore.csv)

In [5]:
from csv import reader

In [7]:
opn_aps = open('AppleStore.csv',encoding='utf8')
rd_aps = reader(opn_aps)
aps_ds = list(rd_aps)
aps = aps_ds[1:]
aps_head = aps_ds[0]

opn_ard = open('googleplaystore.csv',encoding='utf8')
rd_ard = reader(opn_ard)
ard_ds = list(rd_ard)
ard = ard_ds[1:]
ard_head = ard_ds[0]

In [8]:
def expl_data(dataset,start,end,row_and_columns=False):
    for row in dataset[start:end]:
        print(row)
        print('\n')
        
    if row_and_columns:
        print('Number of rows: ',len(dataset))
        print('Number of columns: ',len(dataset[0]))

In [9]:
print(ard_head, '\n')
expl_data(ard,0,3,True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'] 

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows:  10841
Number of columns:  13


In [10]:
print(aps_head, '\n')
expl_data(aps,0,3,True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'] 

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows:  7197
Number of columns:  16


Before beginning our analysis, we need to make sure the data we analyze is accurate, otherwise the results of our analysis will be wrong. This means that we need to:

- Detect inaccurate data, and correct or remove it.
- Detect duplicate data, and remove the duplicates.

Recall that at our company, we only build apps that are free to download and install, and that are directed toward an English-speaking audience. This means that we'll need to:

- Remove non-English apps like 爱奇艺PPS -《欢乐颂2》电视剧热播.
- Remove apps that aren't free.

In [11]:
#We start by detecting inaccurate or missing data
def lencheck(dataset,head):
    lst = []
    ln_head = len(head)
    #enumerate advances the index on every row, including the faulty ones
    for index, row in enumerate(dataset):
        if len(row) != ln_head:
            lst.append(index)
    return lst #indexes of rows whose length differs from the header (missing data)

In [12]:
lencheck(ard,ard_head) #the row at index 10472 has missing data

[10472]

In [13]:
lencheck(aps,aps_head) #no rows with missing data in this data set

[]

In [14]:
#remove the row with missing data
print(len(ard))
del ard[10472] #run this only once, or a valid row will be deleted!
print(len(ard))

10841
10840


In [15]:
#Next, detect duplicate data and remove the duplicates
#Build a dict showing which apps are duplicated and how many times each one appears
def dupl(dataset,ap_name_col):
    dupl_app = []
    uniq_app = []
    for app in dataset:
        name = app[ap_name_col]
        if name in uniq_app:
            dupl_app.append(name)
        else:
            uniq_app.append(name)
            
    dct = {}
    for row in dupl_app:
        if row not in dct:
            dct[row] = 2
        else:
            dct[row] += 1
    return dct
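
The same duplicate count can be sketched more compactly with `collections.Counter` from the standard library. This is only an illustration: `count_duplicates` and the sample rows are hypothetical and not part of the original notebook.

```python
from collections import Counter

def count_duplicates(dataset, name_col):
    """Return {name: occurrences} for names that appear more than once."""
    counts = Counter(row[name_col] for row in dataset)
    return {name: n for name, n in counts.items() if n > 1}

# Hypothetical sample rows, not taken from the real data set
sample = [['Box', 'BUSINESS'], ['Slack', 'BUSINESS'],
          ['Box', 'BUSINESS'], ['Box', 'BUSINESS']]
print(count_duplicates(sample, 0))
```

Counting with a dictionary avoids the repeated `name in uniq_app` list scans, which matters little here but grows costly on larger data sets.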

In [16]:
dupl(ard,0) #frequency table of duplicated Android apps

{'Quick PDF Scanner + OCR FREE': 3,
 'Box': 3,
 'Google My Business': 3,
 'ZOOM Cloud Meetings': 2,
 'join.me - Simple Meetings': 3,
 'Zenefits': 2,
 'Google Ads': 3,
 'Slack': 3,
 'FreshBooks Classic': 2,
 'Insightly CRM': 2,
 'QuickBooks Accounting: Invoicing & Expenses': 3,
 'HipChat - Chat Built for Teams': 2,
 'Xero Accounting Software': 2,
 'MailChimp - Email, Marketing Automation': 2,
 'Crew - Free Messaging and Scheduling': 2,
 'Asana: organize team projects': 2,
 'Google Analytics': 2,
 'AdWords Express': 2,
 'Accounting App - Zoho Books': 2,
 'Invoice & Time Tracking - Zoho': 2,
 'Invoice 2go — Professional Invoices and Estimates': 2,
 'SignEasy | Sign and Fill PDF and other Documents': 2,
 'Genius Scan - PDF Scanner': 2,
 'Tiny Scanner - PDF Scanner App': 2,
 'Fast Scanner : Free PDF Scan': 2,
 'Mobile Doc Scanner (MDScan) Lite': 2,
 'TurboScan: scan documents and receipts in PDF': 2,
 'Tiny Scanner Pro: PDF Doc Scan': 2,
 'Docs To Go™ Free Office Suite': 2,
 'OfficeSuite : 

In [17]:
for row in ard:
    if row[0] == '10 Best Foods for You':
        print(row) #for validating our 'dupl' function

['10 Best Foods for You', 'HEALTH_AND_FITNESS', '4.0', '2490', '3.8M', '500,000+', 'Free', '0', 'Everyone 10+', 'Health & Fitness', 'February 17, 2017', '1.9', '2.3.3 and up']
['10 Best Foods for You', 'HEALTH_AND_FITNESS', '4.0', '2490', '3.8M', '500,000+', 'Free', '0', 'Everyone 10+', 'Health & Fitness', 'February 17, 2017', '1.9', '2.3.3 and up']


In [18]:
dupl(aps,1) #frequency table of duplicated iOS apps

{'Mannequin Challenge': 2, 'VR Roller Coaster': 2}

In [19]:
#helps us find the index of each duplicate
def ind_dupl(dataset,name,name_col): 
    index = 0
    for row in dataset:
        if row[name_col] == name:
            print(row, 'at index: ',index)
        index += 1
    return index

In [20]:
ind_dupl(aps,'Mannequin Challenge',1)

['1173990889', 'Mannequin Challenge', '109705216', 'USD', '0.0', '668', '87', '3.0', '3.0', '1.4', '9+', 'Games', '37', '4', '1', '1'] at index:  2948
['1178454060', 'Mannequin Challenge', '59572224', 'USD', '0.0', '105', '58', '4.0', '4.5', '1.0.1', '4+', 'Games', '38', '5', '1', '1'] at index:  4463


7197

In [21]:
aps[2948] #for validating our 'ind_dupl' function

['1173990889',
 'Mannequin Challenge',
 '109705216',
 'USD',
 '0.0',
 '668',
 '87',
 '3.0',
 '3.0',
 '1.4',
 '9+',
 'Games',
 '37',
 '4',
 '1',
 '1']

**Removing Duplicates**

We will remove duplicate entries from the 'ard' and 'aps' data sets using these criteria:
- For each app with duplicate entries, keep only the entry with the highest number of reviews (in 'ard', column index 3) or the highest total rating count (in 'aps', column index 5). The highest count most likely marks the most recent entry.
- Remove all non-free apps, because this analysis only concerns free apps.

In [22]:
dupl(aps,1)

{'Mannequin Challenge': 2, 'VR Roller Coaster': 2}

In [23]:
ind_dupl(aps,'Mannequin Challenge',1)

['1173990889', 'Mannequin Challenge', '109705216', 'USD', '0.0', '668', '87', '3.0', '3.0', '1.4', '9+', 'Games', '37', '4', '1', '1'] at index:  2948
['1178454060', 'Mannequin Challenge', '59572224', 'USD', '0.0', '105', '58', '4.0', '4.5', '1.0.1', '4+', 'Games', '38', '5', '1', '1'] at index:  4463


7197

In [24]:
del aps[4463] #rating count total = 105
len(aps) #now we have 7196 rows

7196

In [25]:
ind_dupl(aps,'VR Roller Coaster',1)

['952877179', 'VR Roller Coaster', '169523200', 'USD', '0.0', '107', '102', '3.5', '3.5', '2.0.0', '4+', 'Games', '37', '5', '1', '1'] at index:  4442
['1089824278', 'VR Roller Coaster', '240964608', 'USD', '0.0', '67', '44', '3.5', '4.0', '0.81', '4+', 'Games', '38', '0', '1', '1'] at index:  4830


7196

In [26]:
del aps[4830] #rating count total = 67
len(aps)

7195

Unfortunately, deleting duplicates by hand, as we did for the 'aps' data set, would be tedious for the 'ard' data set because it contains so many duplicated apps. So we use another approach, shown below.

In [27]:
reviews_max = {}

for row in ard:
    name = row[0]
    n_reviews = float(row[3]) #number of review
    
    if (name in reviews_max) and (reviews_max[name] < n_reviews):
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

In [28]:
ard_ = []
already_added = []

for row in ard:
    name = row[0]
    n_reviews = float(row[3])
    
    if (reviews_max[name] == n_reviews) and (name not in already_added):
        ard_.append(row)
        already_added.append(name) # make sure this is inside the if block

In [29]:
len(ard_) #we did it!

9659
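
The two steps above (build `reviews_max`, then filter) could also be folded into one reusable helper that works for either data set. The sketch below is an assumption about how that might look; `keep_max_row` and the sample rows are hypothetical names and data.

```python
def keep_max_row(dataset, name_col, count_col):
    """Keep one row per app name: the first row with the largest count."""
    best = {}  # name -> row with the highest count seen so far
    for row in dataset:
        name = row[name_col]
        count = float(row[count_col])
        # Strict '>' keeps the earliest row when counts are tied
        if name not in best or count > float(best[name][count_col]):
            best[name] = row
    return list(best.values())

# Hypothetical sample: [name, number of reviews]
sample = [['A', '10'], ['B', '5'], ['A', '30'], ['A', '30']]
print(keep_max_row(sample, 0, 1))
```

Note that the result is ordered by the first appearance of each name, which can differ from the row order produced by the two-loop version.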

**Removing non-English Apps**

The numbers corresponding to the characters we commonly use in English text are all in the range 0 to 127, according to the [ASCII](https://en.wikipedia.org/wiki/ASCII) (American Standard Code for Information Interchange) system. You can use the `ord()` [built-in function](https://docs.python.org/3/library/functions.html#ord) to get the number corresponding to a character.

In [30]:
#example of `ord()` function
print(ord('A')) #in range of 0-127
print(ord('爱')) #more than 127

65
29233


In [31]:
#function that takes a string and checks whether every character is in the ASCII (English) range
def chr_check(string):
    for char in string:
        if ord(char) > 127:
            return False
    return True

In [32]:
chr_check('Instagram')

True

In [33]:
chr_check('爱奇艺PPS -《欢乐颂2》电视剧热播')

False

In [34]:
chr_check('Docs To Go™ Free Office Suite') #this looks like an English app

False

In [35]:
chr_check('Instachat 😜') #this looks like an English app

False

The function above is too strict: it rejects English app names that contain an emoji or a symbol such as ™. We change it so that if the input string has more than three characters that fall outside the ASCII range (0 - 127), the function returns `False` (identifying the string as non-English); otherwise it returns `True`.

In [36]:
def chr_check(string):
    count = 0
    for char in string:
        if ord(char) > 127:
            count += 1
    if count > 3:
        return False
    else:
        return True

In [37]:
chr_check('Docs To Go™ Free Office Suite')

True

In [38]:
chr_check('Instachat 😜')

True

In [39]:
chr_check('爱奇艺PPS -《欢乐颂2》电视剧热播')

False
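
As a minimal sketch, the same rule can be written with the threshold exposed as a parameter, which makes it easy to try stricter or looser cutoffs. The name `is_english` and the `max_non_ascii` parameter are assumptions introduced for illustration only.

```python
def is_english(name, max_non_ascii=3):
    """True when the name has at most max_non_ascii characters outside ASCII."""
    non_ascii = sum(1 for char in name if ord(char) > 127)
    return non_ascii <= max_non_ascii

print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Instachat 😜', max_non_ascii=0))
```

This remains a heuristic: an English name with four or more emoji would still be rejected, and a non-English name written with ASCII letters would still pass.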

Use the new function to filter out non-English apps from both data sets. Loop through each data set. If an app name is identified as English, append the whole row to a separate list.

In [40]:
print(len(aps))
aps_en = []
for row in aps:
    name = row[1]
    if chr_check(name):
        aps_en.append(row)

print(len(aps_en))

7195
6181


In [41]:
print(len(ard_))
ard_en = []
for row in ard_:
    name = row[0]
    if chr_check(name):
        ard_en.append(row)

print(len(ard_en))

9659
9614


So far, we have deleted duplicates and removed non-English apps. Next, we will remove all non-free apps.

**Removing non-Free Apps**

- Loop through each data set to isolate the free apps in separate lists.
- Make sure you identify the columns describing the app price correctly.

In [42]:
#recall the header
print(ard_head)
print(ard_en[1], '\n')
print(aps_head)
print(aps_en[1])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'] 

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


In [43]:
#Isolating free apps from 'ard_en'
#col_index = 7
print(len(ard_en))
ard_f = []
for row in ard_en:
    price = row[7]
    if price == '0':
        ard_f.append(row)
        
print(len(ard_f))

9614
8864


In [44]:
#Isolating free apps from 'aps_en'
#col_index = 4
print(len(aps_en))
aps_f = []
for row in aps_en:
    price = float(row[4])
    if price == 0.0:
        aps_f.append(row)
        
print(len(aps_f))

6181
3220
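
The two cells above compare the price in different ways (`'0'` as a string for Google Play, `0.0` as a float for the App Store). A single helper could cover both. The sketch below assumes that a paid price may carry a leading `$`, which the rows shown in this notebook do not confirm; `is_free` is a hypothetical name.

```python
def is_free(price):
    """True when a price string such as '0', '0.0' or '$0.00' equals zero."""
    cleaned = price.strip().lstrip('$')  # assumption: '$' is the only currency symbol
    try:
        return float(cleaned) == 0.0
    except ValueError:
        return False  # unparseable prices are treated as not free

print(is_free('0'), is_free('0.0'), is_free('$4.99'))
```

With such a helper, both filters reduce to `[row for row in dataset if is_free(row[price_col])]`.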


**Analyzing Most Common Genre**

Inspect both data sets and identify the columns you could use to generate frequency tables to find out what are the most common genres in each market.

In [45]:
#make frequency table of genres
def freq_tab(dataset,col_gen_index):
    dct = {}
    total = 0
    
    for row in dataset:
        total += 1
        genre = row[col_gen_index]
        if genre not in dct:
            dct[genre] = 1
        else:
            dct[genre] += 1
            
    tab_percent = {}
    for key in dct:
        percent = (dct[key] / total) * 100
        tab_percent[key] = percent
    
    return tab_percent

In [46]:
def disp_tab(dataset, col_gen_index):
    table = freq_tab(dataset, col_gen_index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
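
For comparison, here is a compact sketch of a sorted percentage table built on `collections.Counter`. It is not the notebook's method; `freq_table_sorted` and the sample rows are hypothetical.

```python
from collections import Counter

def freq_table_sorted(dataset, col_index):
    """Return [(value, percentage)] sorted from most to least frequent."""
    counts = Counter(row[col_index] for row in dataset)
    total = sum(counts.values())
    return [(value, count / total * 100)
            for value, count in counts.most_common()]

# Hypothetical sample rows: [app name, genre]
sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
for genre, percent in freq_table_sorted(sample, 1):
    print(genre, ':', percent)
```

`most_common()` orders ties by first appearance, whereas `disp_tab` breaks ties by sorting the genre names in reverse, so tied rows may print in a different order.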

In [47]:
disp_tab(ard_f,-4) #Genres

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

In [48]:
disp_tab(ard_f,1) #Category

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

Analyze the frequency table you generated for the `Category` and `Genres` column of the Google Play data set.

- What are the most common genres? **The most common genre is Tools, followed by Entertainment, Education, Business, Productivity, and Lifestyle.**
- What other patterns do you see? **The most common category is Family, followed by Game, Tools, Business, and Lifestyle.**
- Compare the patterns you see for the Google Play market with those you saw for the App Store market. **The most common genres and categories on Google Play lean toward practical purposes rather than fun, whereas the most common apps on the App Store are heavily oriented toward fun.**
- Can you recommend an app profile based on what you found so far? Do the frequency tables you generated reveal the most frequent app genres or what genres have the most users?

In [49]:
disp_tab(aps_f,-5) #prime_genre

Games : 58.13664596273293
Entertainment : 7.888198757763975
Photo & Video : 4.968944099378882
Education : 3.6645962732919255
Social Networking : 3.291925465838509
Shopping : 2.608695652173913
Utilities : 2.515527950310559
Sports : 2.142857142857143
Music : 2.049689440993789
Health & Fitness : 2.018633540372671
Productivity : 1.7391304347826086
Lifestyle : 1.5838509316770186
News : 1.3354037267080745
Travel : 1.2422360248447204
Finance : 1.1180124223602486
Weather : 0.8695652173913043
Food & Drink : 0.8074534161490683
Reference : 0.5590062111801243
Business : 0.5279503105590062
Book : 0.43478260869565216
Navigation : 0.18633540372670807
Medical : 0.18633540372670807
Catalogs : 0.12422360248447205


Analyze the frequency table you generated for the `prime_genre` column of the App Store data set.

- What is the most common genre? What is the runner-up? **The most common genre is Games, followed by Entertainment, Photo & Video, Education, and Social Networking.**
- What other patterns do you see? **In rows 6 to 10, the genres are Shopping, Utilities, Sports, Music, and Health & Fitness.**
- What is the general impression: are most of the apps designed for practical purposes (education, shopping, utilities, productivity, lifestyle) or more for entertainment (games, photo and video, social networking, sports, music)? **The general impression is that most apps are designed for entertainment purposes.**
- Can you recommend an app profile for the App Store market based on this frequency table alone? If there's a large number of apps for a particular genre, does that also imply that apps of that genre generally have a large number of users? 

**Analyzing Popular Apps by Genre**

- One way to find out what genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. 
- For the 'ard_f' (Google Play) data set, we can find this information in the `Installs` column, but for the 'aps_f' (App Store) data set this information is missing. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the `rating_count_tot` column.

In [51]:
#calculate the average number of user ratings per app genre on the App Store
gen_aps = freq_tab(aps_f, -5)

for gen in gen_aps:
    total = 0
    len_gen = 0
    for row in aps_f:
        name_gen = row[-5]
        if gen == name_gen:            
            n_ratings = float(row[5])
            total += n_ratings
            len_gen += 1
    avg_n_ratings = total / len_gen
    print(gen, ':', avg_n_ratings)

Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22812.92467948718
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0


On average, Navigation apps have the highest number of user ratings, followed by Reference, Social Networking, Music, and Weather. But these averages are heavily influenced by a few giant apps that skew each genre. So we need to look at which genres remain popular once the giant apps are set aside.

In [52]:
for row in aps_f:
    if row[-5] == 'Navigation':
        print(row[1], ':', row[5]) # print name and number of ratings

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5
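
One way to see how strongly the two largest apps pull the average is to compare the mean with the median, which is far less sensitive to outliers. The sketch below uses the six Navigation rating counts printed above; the use of the median is an illustration added here, not part of the original analysis.

```python
from statistics import mean, median

# Rating counts of the six free English Navigation apps listed above
nav_ratings = [345046, 154911, 12811, 3582, 187, 5]

print('mean  :', mean(nav_ratings))    # matches the Navigation average computed earlier
print('median:', median(nav_ratings))  # middle of the sorted values
```

The mean is roughly ten times the median, which suggests that Waze and Google Maps account for most of the genre's apparent popularity.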


In [54]:
for row in aps_f:
    if row[-5] == 'Reference':
        print(row[1], ':', row[5]) # print name and number of ratings

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


In [53]:
for row in aps_f:
    if row[-5] == 'Social Networking':
        print(row[1], ':', row[5]) # print name and number of ratings

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 2

In [55]:
for row in aps_f:
    if row[-5] == 'Music':
        print(row[1], ':', row[5]) # print name and number of ratings

Pandora - Music & Radio : 1126879
Spotify Music : 878563
Shazam - Discover music, artists, videos & lyrics : 402925
iHeartRadio – Free Music & Radio Stations : 293228
SoundCloud - Music & Audio : 135744
Magic Piano by Smule : 131695
Smule Sing! : 119316
TuneIn Radio - MLB NBA Audiobooks Podcasts Music : 110420
Amazon Music : 106235
SoundHound Song Search & Music Player : 82602
Sonos Controller : 48905
Bandsintown Concerts : 30845
Karaoke - Sing Karaoke, Unlimited Songs! : 28606
My Mixtapez Music : 26286
Sing Karaoke Songs Unlimited with StarMaker : 26227
Ringtones for iPhone & Ringtone Maker : 25403
Musi - Unlimited Music For YouTube : 25193
AutoRap by Smule : 18202
Spinrilla - Mixtapes For Free : 15053
Napster - Top Music & Radio : 14268
edjing Mix:DJ turntable to remix and scratch music : 13580
Free Music - MP3 Streamer & Playlist Manager Pro : 13443
Free Piano app by Yokee : 13016
Google Play Music : 10118
Certified Mixtapes - Hip Hop Albums & Mixtapes : 9975
TIDAL : 7398
YouTube Mu

In [56]:
for row in aps_f:
    if row[-5] == 'Weather':
        print(row[1], ':', row[5]) # print name and number of ratings

The Weather Channel: Forecast, Radar & Alerts : 495626
The Weather Channel App for iPad – best local forecast, radar map, and storm tracking : 208648
WeatherBug - Local Weather, Radar, Maps, Alerts : 188583
MyRadar NOAA Weather Radar Forecast : 150158
AccuWeather - Weather for Life : 144214
Yahoo Weather : 112603
Weather Underground: Custom Forecast & Local Radar : 49192
NOAA Weather Radar - Weather Forecast & HD Radar : 45696
Weather Live Free - Weather Forecast & Alerts : 35702
Storm Radar : 22792
QuakeFeed Earthquake Map, Alerts, and News : 6081
Moji Weather - Free Weather Forecast : 2333
Hurricane by American Red Cross : 1158
Forecast Bar : 375
Hurricane Tracker WESH 2 Orlando, Central Florida : 203
FEMA : 128
iWeather - World weather forecast : 80
Weather - Radar - Storm with Morecast App : 78
Yurekuru Call : 53
Weather & Radar : 37
WRAL Weather Alert : 25
Météo-France : 24
JaxReady : 22
Freddy the Frogcaster's Weather Station : 14
Almanac Long-Range Weather Forecast : 12
TodayAir

Next, we will try to figure out the most popular app categories on Google Play.

In [64]:
categ_ard = freq_tab(ard_f, 1)
category_ard = []
for categ in categ_ard:
    total = 0
    len_categ = 0
    for row in ard_f:
        categ_app = row[1]
        if categ_app == categ:            
            n_installs = row[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_categ += 1
    avg_n_installs = total / len_categ
    category_ard.append([avg_n_installs,categ])
#     print(categ, ':', avg_n_installs)
    
sort_categ_ard = sorted(category_ard,reverse=True)    
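
The cell above rescans the whole data set once per category. As a rough sketch, the same averages can be collected in a single pass by accumulating totals and counts per category. `parse_installs`, `avg_installs_by_category`, and the sample rows are hypothetical and only mirror the column layout used in this notebook (category at index 1, installs at index 5).

```python
from collections import defaultdict

def parse_installs(text):
    """Convert a string such as '1,000,000+' to the float 1000000.0."""
    return float(text.replace(',', '').replace('+', ''))

def avg_installs_by_category(dataset, categ_col=1, installs_col=5):
    """Return [(average installs, category)] sorted from highest to lowest."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        totals[row[categ_col]] += parse_installs(row[installs_col])
        counts[row[categ_col]] += 1
    averages = [(totals[c] / counts[c], c) for c in totals]
    return sorted(averages, reverse=True)

# Hypothetical sample rows; unused columns are left empty
sample = [
    ['App A', 'GAME', '', '', '', '10,000+'],
    ['App B', 'TOOLS', '', '', '', '500+'],
    ['App C', 'GAME', '', '', '', '30,000+'],
]
print(avg_installs_by_category(sample))
```

Keep in mind that values such as `1,000,000+` are lower bounds for install buckets, so these averages are approximations in either version.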

In [65]:
sort_categ_ard

[[38456119.167247385, 'COMMUNICATION'],
 [24727872.452830188, 'VIDEO_PLAYERS'],
 [23253652.127118643, 'SOCIAL'],
 [17840110.40229885, 'PHOTOGRAPHY'],
 [16787331.344927534, 'PRODUCTIVITY'],
 [15588015.603248259, 'GAME'],
 [13984077.710144928, 'TRAVEL_AND_LOCAL'],
 [11640705.88235294, 'ENTERTAINMENT'],
 [10801391.298666667, 'TOOLS'],
 [9549178.467741935, 'NEWS_AND_MAGAZINES'],
 [8767811.894736841, 'BOOKS_AND_REFERENCE'],
 [7036877.311557789, 'SHOPPING'],
 [5201482.6122448975, 'PERSONALIZATION'],
 [5074486.197183099, 'WEATHER'],
 [4188821.9853479853, 'HEALTH_AND_FITNESS'],
 [4056941.7741935486, 'MAPS_AND_NAVIGATION'],
 [3695641.8198090694, 'FAMILY'],
 [3638640.1428571427, 'SPORTS'],
 [1986335.0877192982, 'ART_AND_DESIGN'],
 [1924897.7363636363, 'FOOD_AND_DRINK'],
 [1833495.145631068, 'EDUCATION'],
 [1712290.1474201474, 'BUSINESS'],
 [1437816.2687861272, 'LIFESTYLE'],
 [1387692.475609756, 'FINANCE'],
 [1331540.5616438356, 'HOUSE_AND_HOME'],
 [854028.8303030303, 'DATING'],
 [817657.27272727

In [66]:
for row in ard_f:
    if row[1] == 'COMMUNICATION':
        print(row[0], ':', row[5])

WhatsApp Messenger : 1,000,000,000+
Messenger for SMS : 10,000,000+
My Tele2 : 5,000,000+
imo beta free calls and text : 100,000,000+
Contacts : 50,000,000+
Call Free – Free Call : 5,000,000+
Web Browser & Explorer : 5,000,000+
Browser 4G : 10,000,000+
MegaFon Dashboard : 10,000,000+
ZenUI Dialer & Contacts : 10,000,000+
Cricket Visual Voicemail : 10,000,000+
TracFone My Account : 1,000,000+
Xperia Link™ : 10,000,000+
TouchPal Keyboard - Fun Emoji & Android Keyboard : 10,000,000+
Skype Lite - Free Video Call & Chat : 5,000,000+
My magenta : 1,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Seznam.cz : 1,000,000+
Antillean Gold Telegram (original version) : 100,000+
AT&T Visual Voicemail : 10,000,000+
GMX Mail : 10,000,000+
Omlet Chat : 10,000,000+
My Vodacom SA : 5,000,000+
Microsoft Edge : 5,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Calls & Text by Mo+ : 5,000,000+
free 

In [67]:
for row in ard_f:
    if row[1] == 'VIDEO_PLAYERS':
        print(row[0], ':', row[5])

YouTube : 1,000,000,000+
All Video Downloader 2018 : 1,000,000+
Video Downloader : 10,000,000+
HD Video Player : 1,000,000+
Iqiyi (for tablet) : 1,000,000+
Video Player All Format : 10,000,000+
Motorola Gallery : 100,000,000+
Free TV series : 100,000+
Video Player All Format for Android : 500,000+
VLC for Android : 100,000,000+
Code : 10,000,000+
Vote for : 50,000,000+
XX HD Video downloader-Free Video Downloader : 1,000,000+
OBJECTIVE : 1,000,000+
Music - Mp3 Player : 10,000,000+
HD Movie Video Player : 1,000,000+
YouCut - Video Editor & Video Maker, No Watermark : 5,000,000+
Video Editor,Crop Video,Movie Video,Music,Effects : 1,000,000+
YouTube Studio : 10,000,000+
video player for android : 10,000,000+
Vigo Video : 50,000,000+
Google Play Movies & TV : 1,000,000,000+
HTC Service － DLNA : 10,000,000+
VPlayer : 1,000,000+
MiniMovie - Free Video and Slideshow Editor : 50,000,000+
Samsung Video Library : 50,000,000+
OnePlus Gallery : 1,000,000+
LIKE – Magic Video Maker & Community : 50,

In [68]:
for row in ard_f:
    if row[1] == 'SOCIAL':
        print(row[0], ':', row[5])

Facebook : 1,000,000,000+
Facebook Lite : 500,000,000+
Tumblr : 100,000,000+
Social network all in one 2018 : 100,000+
Pinterest : 100,000,000+
TextNow - free text + calls : 10,000,000+
Google+ : 1,000,000,000+
The Messenger App : 1,000,000+
Messenger Pro : 1,000,000+
Free Messages, Video, Chat,Text for Messenger Plus : 1,000,000+
Telegram X : 5,000,000+
The Video Messenger App : 100,000+
Jodel - The Hyperlocal App : 1,000,000+
Hide Something - Photo, Video : 5,000,000+
Love Sticker : 1,000,000+
Web Browser & Fast Explorer : 5,000,000+
LiveMe - Video chat, new friends, and make money : 10,000,000+
VidStatus app - Status Videos & Status Downloader : 5,000,000+
Love Images : 1,000,000+
Web Browser ( Fast & Secure Web Explorer) : 500,000+
SPARK - Live random video chat & meet new people : 5,000,000+
Golden telegram : 50,000+
Facebook Local : 1,000,000+
Meet – Talk to Strangers Using Random Video Chat : 5,000,000+
MobilePatrol Public Safety App : 1,000,000+
💘 WhatsLov: Smileys of love, sti

In [69]:
for row in ard_f:
    if row[1] == 'PHOTOGRAPHY':
        print(row[0], ':', row[5])

TouchNote: Cards & Gifts : 1,000,000+
FreePrints – Free Photos Delivered : 1,000,000+
Groovebook Photo Books & Gifts : 500,000+
Moony Lab - Print Photos, Books & Magnets ™ : 50,000+
LALALAB prints your photos, photobooks and magnets : 1,000,000+
Snapfish : 1,000,000+
Motorola Camera : 50,000,000+
HD Camera - Best Cam with filters & panorama : 5,000,000+
LightX Photo Editor & Photo Effects : 10,000,000+
Sweet Snap - live filter, Selfie photo edit : 10,000,000+
HD Camera - Quick Snap Photo & Video : 1,000,000+
B612 - Beauty & Filter Camera : 100,000,000+
Waterfall Photo Frames : 1,000,000+
Photo frame : 100,000+
Huji Cam : 5,000,000+
Unicorn Photo : 1,000,000+
HD Camera : 5,000,000+
Makeup Editor -Beauty Photo Editor & Selfie Camera : 1,000,000+
Makeup Photo Editor: Makeup Camera & Makeup Editor : 1,000,000+
Moto Photo Editor : 5,000,000+
InstaBeauty -Makeup Selfie Cam : 50,000,000+
Garden Photo Frames - Garden Photo Editor : 500,000+
Photo Frame : 10,000,000+
Selfie Camera - Photo Edito

In [70]:
for row in ard_f:
    if row[1] == 'PRODUCTIVITY':
        print(row[0], ':', row[5])

Microsoft Word : 500,000,000+
All-In-One Toolbox: Cleaner, Booster, App Manager : 10,000,000+
AVG Cleaner – Speed, Battery & Memory Booster : 10,000,000+
QR Scanner & Barcode Scanner 2018 : 10,000,000+
Chrome Beta : 10,000,000+
Microsoft Outlook : 100,000,000+
Google PDF Viewer : 10,000,000+
My Claro Peru : 5,000,000+
Power Booster - Junk Cleaner & CPU Cooler & Boost : 1,000,000+
Google Assistant : 10,000,000+
Microsoft OneDrive : 100,000,000+
Calculator - unit converter : 50,000,000+
Microsoft OneNote : 100,000,000+
Metro name iD : 10,000,000+
Google Keep : 100,000,000+
Archos File Manager : 5,000,000+
ES File Explorer File Manager : 100,000,000+
ASUS SuperNote : 10,000,000+
HTC File Manager : 10,000,000+
MyMTN : 1,000,000+
Dropbox : 500,000,000+
ASUS Quick Memo : 10,000,000+
HTC Calendar : 10,000,000+
Google Docs : 100,000,000+
ASUS Calling Screen : 10,000,000+
lifebox : 5,000,000+
Yandex.Disk : 5,000,000+
Content Transfer : 5,000,000+
HTC Mail : 10,000,000+
Advanced Task Killer : 50

In [71]:
for row in ard_f:
    if row[1] == 'GAME':
        print(row[0], ':', row[5])

Solitaire : 10,000,000+
Sonic Dash : 100,000,000+
PAC-MAN : 100,000,000+
Bubble Witch 3 Saga : 50,000,000+
Race the Traffic Moto : 10,000,000+
Marble - Temple Quest : 10,000,000+
Shooting King : 10,000,000+
Geometry Dash World : 10,000,000+
Jungle Marble Blast : 5,000,000+
Roll the Ball® - slide puzzle : 100,000,000+
Block Craft 3D: Building Simulator Games For Free : 50,000,000+
Farm Fruit Pop: Party Time : 1,000,000+
Love Balls : 50,000,000+
Piano Tiles 2™ : 100,000,000+
Pokémon GO : 100,000,000+
Paint Hit : 10,000,000+
Snake VS Block : 50,000,000+
Rolly Vortex : 10,000,000+
Woody Puzzle : 1,000,000+
Stack Jump : 10,000,000+
The Cube : 5,000,000+
Extreme Car Driving Simulator : 100,000,000+
Bricks n Balls : 1,000,000+
The Fish Master! : 1,000,000+
Color Road : 10,000,000+
Draw In : 10,000,000+
PLANK! : 500,000+
Looper! : 1,000,000+
Trivia Crack : 100,000,000+
Will it Crush? : 5,000,000+
Tomb of the Mask : 5,000,000+
Baseball Boy! : 10,000,000+
Hello Stars : 10,000,000+
Tank Stars : 1

San Andreas City : Auto Theft Car gangster : 100,000+
Miami Crime Vice Town : 10,000,000+
CJ Poker Odds Calculator : 50,000+
Gangster City Auto Theft : 50,000+
China Town Auto Theft 2 : 50,000+
Grand Gangsters 3D : 10,000,000+
Gang Wars of San Andreas : 1,000,000+
Sin City Hero : Crime Simulator of Vegas : 100,000+
Zombie Death Shooter : 1,000,000+
Project Grand Auto Town Sandbox Beta : 500,000+
CONTRACT KILLER: ZOMBIES (NR) : 5,000,000+
CKZ ORIGINS : 1,000,000+
211:CK Lite : 10+
CONTRACT KILLER: ZOMBIES : 5,000,000+
Can Knockdown 3 : 10,000,000+
Sky Streaker - Gumball : 5,000,000+
Just A Regular Arcade : 1,000,000+
Dreamland Arcade - Steven Universe : 500,000+
Adventure Time Run : 1,000,000+
Glitch Fixers: Powerpuff Girls : 5,000,000+
StirFry Stunts - We Bare Bears : 10,000,000+
Angelo Rules - Crazy day : 1,000,000+
Multicraft Miner Exploration : 1,000,000+
Alex & Co Quiz : 1,000+
Them Bombs: co-op board game play with 2-4 friends : 100,000+
Chinese Chess ( Xiangqi Free ) : 50,000+
Ch

At this point, I would recommend that developers (for both Google Play and the App Store) build new apps with a profile like this:

- Fun
- Practical
- Related to a particular book