# Analyzing Google Play and iOS App Store data sets to understand user behaviour

The aim of this project is to find mobile app profiles that are profitable for the App Store and Google Play markets. We're working as data analysts for a company that builds Android and iOS mobile apps, and our job is to enable our team of developers to make data-driven decisions with respect to the kind of apps they build.

At our company, we only build apps that are free to download and install, and our main source of revenue consists of in-app ads. This means that our revenue for any given app is mostly influenced by the number of users that use our app. Our goal for this project is to analyze data to help our developers understand what kinds of apps are likely to attract more users.

In [6]:
#import the CSV reader from the csv module
from csv import reader

#read the Android (Google Play) data set; the with block closes the file
with open('googleplaystore.csv', encoding='utf8') as opened_file:
    android_data = list(reader(opened_file))

#split the header row from the data rows
android_apps_header = android_data[0]
android_apps_data = android_data[1:]

#read the iOS (App Store) data set
with open('AppleStore.csv', encoding='utf8') as opened_file:
    ios_data = list(reader(opened_file))

#split the header row from the data rows
ios_apps_header = ios_data[0]
ios_apps_data = ios_data[1:]

#print number of rows and columns of the Android data set
print('total number of rows(exclude header row)', len(android_apps_data))
print('total number of columns', len(android_apps_header))

def explore_data(data_set, start, end, row_and_columns = False):
    #print the rows in data_set[start:end], each followed by blank lines
    for data in data_set[start:end]:
        print(data)
        print('\n')

    if row_and_columns:
        print('Number of rows:', len(data_set))
        print('Number of columns:', len(data_set[0]))

#print header of Android data set
print(android_apps_header)

#explore the Android data set, print the first 4 rows
explore_data(android_apps_data, 0, 4)

#print header of iOS data set
print(ios_apps_header)

#explore the iOS data set, print the first 4 rows
explore_data(ios_apps_data, 0, 4)

total number of rows(exclude header row) 10841
total number of columns 13
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['id', 'track_name', 'size_bytes', 'cur

# Find rows with missing columns

To clean the data, we first need to find and delete any rows in which column data is missing.

We take the length of the header row and compare it with the length of each data row. A row whose length differs from the header's length has a missing (or misplaced) column.

In [7]:
#find row(s) in which any column is/are missing

#data set header length
header_len = len(android_apps_header)

#keep only the rows whose length matches the header;
#building a new list avoids deleting from the list while iterating over it,
#which can skip the row that follows a deleted one
rows_before = len(android_apps_data)
android_apps_data = [row for row in android_apps_data if len(row) == header_len]

#number of rows removed because of missing column(s)
count = rows_before - len(android_apps_data)

print('missing column(s) total row(s)', count)

missing column(s) total row(s) 1


# Find the duplicate rows

To clean the data, we also need to find the duplicate rows and delete them.

To find duplicates, we iterate through the data set and keep track of the app names seen so far. If an app name has already been seen, the row is a duplicate; otherwise it is unique.

To decide which duplicate to keep, we compare the number of reviews (the Reviews column). The row with the highest review count is most likely the most recent one, so we keep that row and delete the others.
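The rule described above can also be sketched in a single pass by storing the best row per app name directly. This is only an illustrative variant, not the notebook's own cell: the function name `keep_most_reviewed` and the toy rows are assumptions made up for the example.

```python
def keep_most_reviewed(rows, name_idx=0, reviews_idx=3):
    """Return one row per app name, keeping the row with the most reviews."""
    best = {}  # app name -> row with the highest review count seen so far
    for row in rows:
        name = row[name_idx]
        n_reviews = float(row[reviews_idx])
        # Strict '>' keeps the first row seen when review counts tie.
        if name not in best or n_reviews > float(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())


# Toy rows, made up for illustration: [App, Category, Rating, Reviews]
sample_rows = [
    ['App A', 'TOOLS', '4.0', '100'],
    ['App B', 'GAME', '4.5', '50'],
    ['App A', 'TOOLS', '4.1', '250'],
]
deduplicated = keep_most_reviewed(sample_rows)
for row in deduplicated:
    print(row)
```

One difference to keep in mind: the result is ordered by the first appearance of each app name, so the row order can differ from a two-pass approach that keeps each winning row at its original position.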

In [8]:
#dictionary mapping each unique app name to its highest review count
reviews_max = {}

#record the highest review count seen for each app name
for row in android_apps_data:
    name = row[0]
    n_reviews = float(row[3])

    #a name already in the dictionary is a duplicate; keep the larger review count
    if name in reviews_max:
        if reviews_max[name] < n_reviews:
            reviews_max[name] = n_reviews
    else:
        reviews_max[name] = n_reviews


#data set after cleaning the data (removing the duplicate rows)
android_clean = []

#app names already added to the cleaned data set
already_added = set()
for row in android_apps_data:
    name = row[0]
    n_reviews = float(row[3])
    if (reviews_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(row)
        already_added.add(name)


#remaining rows in cleaned android data set
explore_data(android_clean, 0, 3, True)
print('\n')

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13




# Validate the App name

Our company builds apps for an English-speaking audience, so we keep only the apps with English names. English letters, digits, and common symbols all have ASCII values in the range 0 to 127, so a character with a value above 127 is treated as a non-English character.

The is_valid_name function validates the app name. It returns False if the name contains more than 3 characters outside the ASCII range, and True otherwise. Allowing up to 3 such characters keeps English apps whose names include a symbol such as ™ or ®.
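To see how the threshold behaves, here is a small sketch that counts the characters above code point 127 in a few names. The helper `count_non_ascii` is a hypothetical name introduced for this example. The first three names appear in the outputs of this notebook, and the last one is made up.

```python
def count_non_ascii(name):
    """Count characters whose code point is above 127 (outside ASCII)."""
    return sum(1 for char in name if ord(char) > 127)


examples = [
    'Sketch - Draw & Paint',  # plain ASCII
    'Geocaching®',            # one symbol outside ASCII
    '教えて!goo',              # three characters outside ASCII
    '中文应用商店',            # made-up name, six characters outside ASCII
]
for example in examples:
    n = count_non_ascii(example)
    verdict = 'keep' if n <= 3 else 'drop'
    print(example, '->', n, 'non-ASCII character(s) ->', verdict)
```

A name with exactly 3 characters outside the range is still kept, which is why a few non-English names can pass the filter.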

In [10]:
def is_valid_name(name):
    #count characters outside the ASCII range (0-127)
    count = 0
    for char in name:
        if ord(char) > 127:
            count += 1
            if count > 3:
                return False
    return True

#Android apps with English names
android_clean_eng = []
for row in android_clean:
    if is_valid_name(row[0]):
        android_clean_eng.append(row)

#iOS apps with English names
ios_clean_eng = []
for row in ios_apps_data:
    if is_valid_name(row[1]):
        ios_clean_eng.append(row)
        


#remaining rows in android data set
explore_data(android_clean_eng, 0, 3, True)
print('\n')


#remaining rows in ios data set
explore_data(ios_clean_eng, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 

# Filter free apps

We only build free apps, so we keep only the free apps from each data set.

In [11]:
def make_price(price):
    #remove surrounding whitespace and any leading or trailing '$' sign
    return price.strip().strip('$')



#free clean android app list
android_final = []

for row in android_clean_eng:
    price = float(make_price(row[7]))
    if price == float(0):
        android_final.append(row)
        

#free clean ios app list
ios_final = []

for row in ios_clean_eng:
    price = float(make_price(row[4]))
    if price == float(0):
        ios_final.append(row)


print('Android data set size ', len(android_final))
print('iOS data set size ',len(ios_final))

Android data set size  8864
iOS data set size  3222


# Find the trending genres

To find the trending genres, we'll build frequency tables for the prime_genre column of the App Store data set and for the Genres and Category columns of the Google Play data set.

We'll build two functions we can use to analyze the frequency tables:

* One function to generate frequency tables that show percentages
* Another function we can use to display the percentages in a descending order
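As a cross-check on the two functions described above, the same idea can be sketched with `collections.Counter` from the standard library. The names `freq_table_counter` and `sorted_percentages` and the toy rows are assumptions made for this example only.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Return {value: percentage of rows} for the column at `index`."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return {value: count / total * 100 for value, count in counts.items()}


def sorted_percentages(dataset, index):
    """Return (value, percentage) pairs sorted by descending percentage."""
    table = freq_table_counter(dataset, index)
    return sorted(table.items(), key=lambda item: item[1], reverse=True)


# Toy rows, made up for illustration: [app name, genre]
sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
for genre, percentage in sorted_percentages(sample, 1):
    print(genre, ':', percentage)
```

Sorting with a `key` on the percentage avoids building a list of `(percentage, name)` tuples just to sort it.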

In [12]:
def freq_table(dataset, index):
    #frequency table dictionary
    frequency_table = {}
    
    #count all rows
    count = 0
    for row in dataset:
        count += 1
        name = row[index]
        if name in frequency_table:
            frequency_table[name] += 1
        else:
            frequency_table[name] = 1
    
    for freq_index in frequency_table:
        frequency = frequency_table[freq_index]
        frequency_table[freq_index] = (frequency / count) * 100
    
    return frequency_table



def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

# Display the frequency table of prime_genre column of iOS data set

In [15]:
#display frequency table of prime_genre column of ios data set
display_table(ios_final, 11)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


# Now we analyze the frequency table of the prime_genre column of the App Store data set

* The most popular genre: the iOS data set has no install counts, only the number of user ratings, so we use the average number of user ratings per genre as a proxy for popularity.
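The per-genre average can also be computed in one pass over the rows instead of rescanning the data set once per genre. The sketch below is an illustrative alternative. The helper `average_by_group` and the toy rows are hypothetical and not part of the notebook.

```python
from collections import defaultdict


def average_by_group(rows, group_idx, value_idx):
    """Return {group: mean of the numeric column} using a single pass."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_idx]
        totals[group] += float(row[value_idx])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Toy rows, made up for illustration: [genre, rating count]
sample_rows = [['Games', '100'], ['Games', '300'], ['Music', '50']]
averages = average_by_group(sample_rows, 0, 1)
for genre, average in averages.items():
    print(genre, ':', average)
```

Assuming the column positions used elsewhere in this notebook (genre at index 11, rating count at index 5), a call such as `average_by_group(ios_final, 11, 5)` should correspond to the nested loop in the next cell.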

In [16]:
#ios prime_genre column frequency table
ios_prime_genre = freq_table(ios_final, 11)


for genre in ios_prime_genre:
    genre_total = 0
    genre_count = 0
    for row in ios_final:
        app_genre = row[11]
        app_total_count = float(row[5])
        if app_genre == genre:
            genre_total += app_total_count
            genre_count += 1
    genre_avg = genre_total / genre_count
    print(genre, ":", genre_avg)


Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0


* The 'Navigation' genre has the highest average number of user ratings, so we take a closer look at the apps in this genre.

In [17]:
# print Navigation genre app with user rating total
app_count = 0
for app in ios_final:
    if app[11] == 'Navigation':
        app_count += 1
        print(app[1], ':', app[5])

print('\nTotal number of Navigation app', app_count)

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5

Total number of Navigation app 6


* The 'Navigation' genre contains only 6 apps, and 'Waze - GPS Navigation' and 'Google Maps' account for most of the user ratings. Without these two apps the average drops drastically, so 'Navigation' is not the best choice for a new app.
* The 'Social Networking' genre is not far behind in average user ratings, so we check the apps in that genre next.
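The effect of the two dominant 'Navigation' apps can be checked with a little arithmetic on the six rating counts printed above. The sketch below hard-codes those counts. The variable names are chosen for this example only.

```python
# Rating counts copied from the 'Navigation' listing above.
navigation_ratings = {
    'Waze - GPS Navigation, Maps & Real-time Traffic': 345046,
    'Google Maps - Navigation & Transit': 154911,
    'Geocaching®': 12811,
    'CoPilot GPS – Car Navigation & Offline Maps': 3582,
    'ImmobilienScout24: Real Estate Search in Germany': 187,
    'Railway Route Search': 5,
}

counts = sorted(navigation_ratings.values(), reverse=True)

# Average over all six apps (matches the genre average printed earlier).
average_all = sum(counts) / len(counts)

# Average after dropping the two most-rated apps.
remaining = counts[2:]
average_without_top_two = sum(remaining) / len(remaining)

print('Average with all apps:', round(average_all, 2))
print('Average without the top two:', average_without_top_two)
```

With all six apps the average is roughly 86,090 ratings, and without Waze and Google Maps it falls to 4,146.25, which is what makes the genre average misleading.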

In [19]:
# print Social Networking genre app with user rating total
app_count = 0
for app in ios_final:
    if app[11] == 'Social Networking':
        app_count += 1
        print(app[1], ':', app[5])

print('\nTotal number of Social Networking app', app_count)

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 2

* The 'Social Networking' genre contains 106 apps. Facebook, Pinterest, Skype for iPhone, Messenger, and WhatsApp Messenger are among the most rated, and without them the average number of ratings drops significantly.
* The 'Social Networking' genre has two problems: 1) it has far more apps than the 'Navigation' genre, and 2) the dominant apps come from the world's largest tech companies. It is therefore not the best choice for a new app.
* The 'Reference' genre also has a high average number of user ratings, so we check the apps in that genre next.

In [20]:
# print Reference genre app with user rating total
app_count = 0
for app in ios_final:
    if app[11] == 'Reference':
        app_count += 1
        print(app[1], ':', app[5])
    
print('\nTotal number of Reference app', app_count)

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0

Total number of Reference app 18


* The 'Reference' genre has only 2 dominant apps (Bible and Dictionary.com), and without them the average number of ratings drops significantly. Even so, the genre looks promising: it contains 18 apps, of which only 2 dominate, whereas the 'Navigation' genre has just 6 apps in total, which suggests there may be little market there.

* A 'Reference' app could also offer features such as a quote of the day, a list of the most popular books, or book suggestions based on the words a user searches for. These features would let us cover the 'Book' genre as well.

* So we can develop a new app in the 'Reference' genre.

* 'Weather' and 'Music' also have a high average number of ratings, so we analyze these genres as well.


In [26]:
# print Weather genre app with user rating total
app_count = 0
for app in ios_final:
    if app[11] == 'Weather':
        app_count += 1
        print(app[1], ':', app[5])
    
print('\nTotal number of Weather app', app_count)

The Weather Channel: Forecast, Radar & Alerts : 495626
The Weather Channel App for iPad – best local forecast, radar map, and storm tracking : 208648
WeatherBug - Local Weather, Radar, Maps, Alerts : 188583
MyRadar NOAA Weather Radar Forecast : 150158
AccuWeather - Weather for Life : 144214
Yahoo Weather : 112603
Weather Underground: Custom Forecast & Local Radar : 49192
NOAA Weather Radar - Weather Forecast & HD Radar : 45696
Weather Live Free - Weather Forecast & Alerts : 35702
Storm Radar : 22792
QuakeFeed Earthquake Map, Alerts, and News : 6081
Moji Weather - Free Weather Forecast : 2333
Hurricane by American Red Cross : 1158
Forecast Bar : 375
Hurricane Tracker WESH 2 Orlando, Central Florida : 203
FEMA : 128
iWeather - World weather forecast : 80
Weather - Radar - Storm with Morecast App : 78
Yurekuru Call : 53
Weather & Radar : 37
WRAL Weather Alert : 25
Météo-France : 24
JaxReady : 22
Freddy the Frogcaster's Weather Station : 14
Almanac Long-Range Weather Forecast : 12
TodayAir

* The 'Weather' genre has 2 dominant apps (The Weather Channel: Forecast, Radar & Alerts and The Weather Channel App for iPad – best local forecast, radar map, and storm tracking) and several less dominant apps (WeatherBug - Local Weather, Radar, Maps, Alerts; MyRadar NOAA Weather Radar Forecast; AccuWeather - Weather for Life; and Yahoo Weather).

* Even without the 2 dominant apps, 4 strong competitors remain, so 'Weather' is not as good a genre as 'Reference'.

* Next we analyze the 'Music' genre.

In [27]:
# print Music genre app with user rating total
app_count = 0
for app in ios_final:
    if app[11] == 'Music':
        app_count += 1
        print(app[1], ':', app[5])
    
print('\nTotal number of Music app', app_count)

Pandora - Music & Radio : 1126879
Spotify Music : 878563
Shazam - Discover music, artists, videos & lyrics : 402925
iHeartRadio – Free Music & Radio Stations : 293228
SoundCloud - Music & Audio : 135744
Magic Piano by Smule : 131695
Smule Sing! : 119316
TuneIn Radio - MLB NBA Audiobooks Podcasts Music : 110420
Amazon Music : 106235
SoundHound Song Search & Music Player : 82602
Sonos Controller : 48905
Bandsintown Concerts : 30845
Karaoke - Sing Karaoke, Unlimited Songs! : 28606
My Mixtapez Music : 26286
Sing Karaoke Songs Unlimited with StarMaker : 26227
Ringtones for iPhone & Ringtone Maker : 25403
Musi - Unlimited Music For YouTube : 25193
AutoRap by Smule : 18202
Spinrilla - Mixtapes For Free : 15053
Napster - Top Music & Radio : 14268
edjing Mix:DJ turntable to remix and scratch music : 13580
Free Music - MP3 Streamer & Playlist Manager Pro : 13443
Free Piano app by Yokee : 13016
Google Play Music : 10118
Certified Mixtapes - Hip Hop Albums & Mixtapes : 9975
TIDAL : 7398
YouTube Mu

* The 'Music' genre has 2 dominant apps (Pandora - Music & Radio and Spotify Music) and several less dominant apps (Shazam - Discover music, artists, videos & lyrics; iHeartRadio – Free Music & Radio Stations; SoundCloud - Music & Audio; Magic Piano by Smule; Smule Sing!; TuneIn Radio - MLB NBA Audiobooks Podcasts Music; and Amazon Music).

* Even without the 2 dominant apps, the 7 less dominant apps listed above remain.

* Pandora is a dominant app, Amazon Music is backed by one of the world's largest companies, and the 'Music' genre has far more apps than 'Reference', so 'Reference' is the best genre in which to develop an app.

# Display the frequency table of Genres column of Android data set

In [29]:
#display frequency table of Genres column of android data set
display_table(android_final, 9)

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

# Display the frequency table of Category column of Android data set

In [30]:
#display frequency table of Category column of android data set
display_table(android_final, 1)

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

# Now we analyze the frequency table of the Genres column of the Google Play data set

* The most popular genre: we can measure popularity by the number of installs. However, the values in the Installs column are open-ended (for example 100+, 1,000+, 10,000+).

In [21]:
#print the Installs column frequency table
display_table(android_final, 5)

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


* For instance, we don't know whether an app with 100,000+ installs has 100,000 installs, 200,000, or 350,000. However, we don't need very precise data for our purposes.
* We're going to leave the numbers as they are, which means that we'll consider that an app with 100,000+ installs has 100,000 installs, and an app with 1,000,000+ installs has 1,000,000 installs, and so on.
* The install counts contain commas and a + sign, which we need to remove before computing the average number of installs per category.
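The cleaning step described in the last bullet can be sketched as a small helper. The name `parse_installs` is hypothetical, and the sample strings are values that appear in the Installs frequency table above. The notebook's own cell converts to `float`, while this sketch returns an `int`, which is an arbitrary choice for the example.

```python
def parse_installs(installs_text):
    """Convert a string such as '1,000,000+' to the integer 1000000."""
    cleaned = installs_text.replace(',', '').replace('+', '')
    return int(cleaned)


for text in ['1,000,000+', '100,000+', '5+', '0']:
    print(text, '->', parse_installs(text))
```

Because the + is simply dropped, every app is counted at the lower bound of its install bracket, as stated in the bullets above.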

In [22]:
#build the Category frequency table
category_freq = freq_table(android_final, 1)

for category in category_freq:
    install_total = 0
    install_count = 0
    for app in android_final:
        app_category = app[1]
        if app_category == category:
            install = app[5]
            install = install.replace(',', '')
            install = install.replace('+', '')
            install_total += float(install)
            install_count += 1
    avg_install = install_total / install_count
    print(category, ':', avg_install)

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_

* The COMMUNICATION category has the highest average number of installs, so we analyze the apps in this category.

In [23]:
#print COMMUNICATION apps with number of installs
app_count = 0
for app in android_final:
    if app[1] == 'COMMUNICATION':
        app_count += 1
        print(app[0], ':', app[5])
        
print('\nTotal number of COMMUNICATION app', app_count)

WhatsApp Messenger : 1,000,000,000+
Messenger for SMS : 10,000,000+
My Tele2 : 5,000,000+
imo beta free calls and text : 100,000,000+
Contacts : 50,000,000+
Call Free – Free Call : 5,000,000+
Web Browser & Explorer : 5,000,000+
Browser 4G : 10,000,000+
MegaFon Dashboard : 10,000,000+
ZenUI Dialer & Contacts : 10,000,000+
Cricket Visual Voicemail : 10,000,000+
TracFone My Account : 1,000,000+
Xperia Link™ : 10,000,000+
TouchPal Keyboard - Fun Emoji & Android Keyboard : 10,000,000+
Skype Lite - Free Video Call & Chat : 5,000,000+
My magenta : 1,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Seznam.cz : 1,000,000+
Antillean Gold Telegram (original version) : 100,000+
AT&T Visual Voicemail : 10,000,000+
GMX Mail : 10,000,000+
Omlet Chat : 10,000,000+
My Vodacom SA : 5,000,000+
Microsoft Edge : 5,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Calls & Text by Mo+ : 5,000,000+
free 

* According to the data set, the COMMUNICATION category has the highest average number of installs because of a few apps such as WhatsApp Messenger, imo beta free calls and text, Skype, GO SMS Pro, and Messenger for SMS.
* WhatsApp, Skype, and Messenger belong to the world's largest tech companies, so we cannot compete with them.
* The COMMUNICATION category also contains too many apps.
* So we need to check the category with the 2nd highest average number of installs.
* That category is VIDEO_PLAYERS, so we analyze its apps next.

In [24]:
# print VIDEO_PLAYERS apps with number of installs
app_count = 0
for app in android_final:
    if app[1] == 'VIDEO_PLAYERS':
        app_count += 1
        print(app[0], ':', app[5])
        
print('\nTotal number of VIDEO_PLAYERS app', app_count)

YouTube : 1,000,000,000+
All Video Downloader 2018 : 1,000,000+
Video Downloader : 10,000,000+
HD Video Player : 1,000,000+
Iqiyi (for tablet) : 1,000,000+
Video Player All Format : 10,000,000+
Motorola Gallery : 100,000,000+
Free TV series : 100,000+
Video Player All Format for Android : 500,000+
VLC for Android : 100,000,000+
Code : 10,000,000+
Vote for : 50,000,000+
XX HD Video downloader-Free Video Downloader : 1,000,000+
OBJECTIVE : 1,000,000+
Music - Mp3 Player : 10,000,000+
HD Movie Video Player : 1,000,000+
YouCut - Video Editor & Video Maker, No Watermark : 5,000,000+
Video Editor,Crop Video,Movie Video,Music,Effects : 1,000,000+
YouTube Studio : 10,000,000+
video player for android : 10,000,000+
Vigo Video : 50,000,000+
Google Play Movies & TV : 1,000,000,000+
HTC Service － DLNA : 10,000,000+
VPlayer : 1,000,000+
MiniMovie - Free Video and Slideshow Editor : 50,000,000+
Samsung Video Library : 50,000,000+
OnePlus Gallery : 1,000,000+
LIKE – Magic Video Maker & Community : 50,

* The same holds for the VIDEO_PLAYERS category: it has the 2nd highest average number of installs because of a few apps such as YouTube, Google Play Movies & TV, Dubsmash, and VivaVideo.
* YouTube and Google Play Movies & TV belong to one of the world's largest tech companies, so we cannot compete with them.
* The VIDEO_PLAYERS category also contains too many apps.
* We checked the other categories and found the same trend: a few apps account for a category's high install numbers.
* The WEATHER category has fewer apps (71) than COMMUNICATION and VIDEO_PLAYERS and very few dominating apps, so we can develop a new app in the WEATHER category.

In [25]:
#print WEATHER apps with number of installs
app_count = 0
for app in android_final:
    if app[1] == 'WEATHER':
        app_count += 1
        print(app[0], ':', app[5])
        
print('\nTotal number of WEATHER app', app_count)

The Weather Channel: Rain Forecast & Storm Alerts : 50,000,000+
Weather forecast : 1,000,000+
AccuWeather: Daily Forecast & Live Weather Reports : 50,000,000+
Live Weather Pro : 10,000+
Weather by WeatherBug: Forecast, Radar & Alerts : 10,000,000+
weather - weather forecast : 1,000,000+
MyRadar NOAA Weather Radar : 10,000,000+
SMHI Weather : 1,000,000+
Free live weather on screen : 1,000,000+
Weather Radar Widget : 1,000,000+
Weather –Simple weather forecast : 10,000,000+
Weather Crave : 5,000,000+
Klara weather : 500,000+
Yahoo Weather : 10,000,000+
Real time Weather Forecast : 1,000,000+
METEO FRANCE : 5,000,000+
APE Weather ( Live Forecast) : 5,000,000+
Live Weather & Daily Local Weather Forecast : 1,000,000+
Weather : 10,000,000+
Rainfall radar - weather : 5,000,000+
Yahoo! Weather for SH Forecast for understanding the approach of rain clouds Free : 1,000,000+
The Weather Network : 5,000,000+
Klart.se - Sweden's best weather : 1,000,000+
GO Weather - Widget, Theme, Wallpaper, Effic