# Project Breakdown
- Working as a data analyst for a company that builds mobile apps
- The company only builds free apps, and its revenue comes from in-app ads
- Thus revenue depends on the number of users

## Goal: analyse the data to see what kinds of apps attract users

In [1]:
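# open datasets in python
from csv import reader

def open_dataset(dataset, header=True):
    # 'with' closes the file once it has been read; utf8 handles non-ASCII app names
    with open(dataset, encoding='utf8') as open_file:
        rows = list(reader(open_file))
    if header:
        # first row is the header, the rest is data
        return rows[1:], rows[0]
    # no header row: return None instead of the boolean flag
    return rows, None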

In [2]:
AppleStore_sub = open_dataset('AppleStore.csv')
AppleStore_header = AppleStore_sub[1]

AppleStore = AppleStore_sub[0]

GooglePlayStore_sub = open_dataset('googleplaystore.csv')
GooglePlayStore_header = GooglePlayStore_sub[1]

GooglePlayStore = GooglePlayStore_sub[0]

In [3]:
#given by dataquest.io
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [4]:
print(GooglePlayStore_header)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


In [5]:
explore_data(GooglePlayStore, 0, 5, rows_and_columns = True)

['Farm Heroes Saga', 'GAME', '4.4', '7614415', '70M', '100,000,000+', 'Free', '0', 'Everyone', 'Casual', 'July 26, 2018', '5.1.8', '2.3 and up']


['Plants vs. Zombies FREE', 'GAME', '4.4', '4067651', '69M', '100,000,000+', 'Free', '0', 'Everyone 10+', 'Strategy', 'July 6, 2018', '2.2.00', '4.1 and up']


['Word Search', 'GAME', '4.7', '295576', '3.9M', '10,000,000+', 'Free', '0', 'Everyone', 'Word', 'June 6, 2018', '1.1.3', '3.0 and up']


['Block Puzzle', 'GAME', '4.6', '59907', '7.8M', '5,000,000+', 'Free', '0', 'Everyone', 'Puzzle', 'March 6, 2018', '2.9', '2.3 and up']


['Super Jim Jump - pixel 3d', 'GAME', '4.5', '10460', '18M', '1,000,000+', 'Free', '0', 'Everyone', 'Arcade', 'July 11, 2018', '2.2.3181', '4.0 and up']


['Pou', 'GAME', '4.3', '10486018', '24M', '500,000,000+', 'Free', '0', 'Everyone', 'Casual', 'May 25, 2018', '1.4.77', '4.0 and up']


['Temple Run 2', 'GAME', '4.3', '8119151', '62M', '500,000,000+', 'Free', '0', 'Everyone', 'Action', 'July 5, 2018', '1.49.1', 

In [6]:
print(AppleStore_header)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


In [7]:
explore_data(AppleStore, 0, 5, rows_and_columns = True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows: 7197
Number of columns: 16


In [8]:
def missing_data(dataset_1, dataset_2):
    '''Check whether there is missing data. dataset_1 is the main dataset,
    whilst dataset_2 is the header row.
    '''
    for count, row in enumerate(dataset_1):
        # a row whose length differs from the header has a missing or extra value
        if len(row) != len(dataset_2):
            print("Index: " + str(count), row)

In [9]:
missing_data(GooglePlayStore, GooglePlayStore_header)

Index: 10472 ['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


In [10]:
missing_data(AppleStore, AppleStore_header)

So there is one entry in the Google Play Store data with a missing data point (12 values against 13 columns).
Upon inspection, the missing value is the Category of the app Life Made WI-Fi Touchscreen Photo Frame at index 10472 of the list.
There are no such entries in the Apple Store data.
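
A missing field shifts every later value one column to the left, which is why comparing each row's length with the header's length catches it. A minimal sketch on made-up rows (the header, rows and the helper name `short_rows` are illustrative assumptions, not taken from the datasets):

```python
# Toy header and rows; the second row is missing its 'Category' value.
header = ['App', 'Category', 'Rating']
rows = [
    ['App A', 'GAME', '4.4'],
    ['App B', '4.1'],            # 'Category' missing, so 'Rating' sits at index 1
    ['App C', 'TOOLS', '3.9'],
]

def short_rows(rows, header):
    """Return (index, row) pairs whose length differs from the header's."""
    return [(i, row) for i, row in enumerate(rows) if len(row) != len(header)]

bad = short_rows(rows, header)
print(bad)
```

Returning the offending rows instead of printing them makes it easier to inspect or delete them afterwards.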

In [11]:
del GooglePlayStore[10472]

In [12]:
#check if data entry is deleted
missing_data(GooglePlayStore, GooglePlayStore_header)

According to the dataset discussion page, Apple Store dataset has duplicate entries.
I will try to find the duplicate entries and attempt to remove them.

In [13]:
def duplicates(dataset, index): 
    unique_entries = []
    duplicate_entries = []
    count = -1
    for i in dataset:
        count += 1
        app_name = i[index]
        if app_name in unique_entries:
            duplicate_entries.append([app_name, count])
        else:
            unique_entries.append(app_name)
    return duplicate_entries

In [14]:
print("There are " + str(len(duplicates(AppleStore, 1))) + " duplicates")
print(duplicates(AppleStore, 1))

There are 2 duplicates
[['Mannequin Challenge', 4463], ['VR Roller Coaster', 4831]]


In [15]:
for app in AppleStore:
    name = app[1]
    if (name == "Mannequin Challenge") or (name =="VR Roller Coaster"):
        print(app)

['1173990889', 'Mannequin Challenge', '109705216', 'USD', '0.0', '668', '87', '3.0', '3.0', '1.4', '9+', 'Games', '37', '4', '1', '1']
['952877179', 'VR Roller Coaster', '169523200', 'USD', '0.0', '107', '102', '3.5', '3.5', '2.0.0', '4+', 'Games', '37', '5', '1', '1']
['1178454060', 'Mannequin Challenge', '59572224', 'USD', '0.0', '105', '58', '4.0', '4.5', '1.0.1', '4+', 'Games', '38', '5', '1', '1']
['1089824278', 'VR Roller Coaster', '240964608', 'USD', '0.0', '67', '44', '3.5', '4.0', '0.81', '4+', 'Games', '38', '0', '1', '1']


Upon inspection, the same-name entries in the AppleStore dataset differ hugely from each other (different ids, sizes, versions and rating counts), so it is more likely that these are different apps with the same name rather than duplicates of the same app.
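
One way to make this inspection less manual is to list the columns in which two same-name rows differ. The sketch below uses the header and the two Mannequin Challenge rows printed above; the helper name `differing_columns` is an assumption introduced for illustration:

```python
header = ['id', 'track_name', 'size_bytes', 'currency', 'price',
          'rating_count_tot', 'rating_count_ver', 'user_rating',
          'user_rating_ver', 'ver', 'cont_rating', 'prime_genre',
          'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
row_a = ['1173990889', 'Mannequin Challenge', '109705216', 'USD', '0.0',
         '668', '87', '3.0', '3.0', '1.4', '9+', 'Games', '37', '4', '1', '1']
row_b = ['1178454060', 'Mannequin Challenge', '59572224', 'USD', '0.0',
         '105', '58', '4.0', '4.5', '1.0.1', '4+', 'Games', '38', '5', '1', '1']

def differing_columns(header, a, b):
    """Map each column name to the pair of values where the two rows disagree."""
    return {name: (x, y) for name, x, y in zip(header, a, b) if x != y}

diff = differing_columns(header, row_a, row_b)
for name, (x, y) in diff.items():
    print(name, ':', x, 'vs', y)
```

A different `id` together with a different size and version history is a reasonable hint that the two rows describe separate apps.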

In [16]:
print("There are " + str(len(duplicates(GooglePlayStore, 0))) + " duplicates")
print(duplicates(GooglePlayStore, 0))

There are 1181 duplicates
[['Quick PDF Scanner + OCR FREE', 229], ['Box', 236], ['Google My Business', 239], ['ZOOM Cloud Meetings', 256], ['join.me - Simple Meetings', 261], ['Box', 265], ['Zenefits', 266], ['Google Ads', 267], ['Google My Business', 268], ['Slack', 269], ['FreshBooks Classic', 270], ['Insightly CRM', 271], ['QuickBooks Accounting: Invoicing & Expenses', 272], ['HipChat - Chat Built for Teams', 273], ['Xero Accounting Software', 274], ['MailChimp - Email, Marketing Automation', 275], ['Crew - Free Messaging and Scheduling', 276], ['Asana: organize team projects', 277], ['Google Analytics', 278], ['AdWords Express', 279], ['Accounting App - Zoho Books', 280], ['Invoice & Time Tracking - Zoho', 281], ['join.me - Simple Meetings', 282], ['Invoice 2go — Professional Invoices and Estimates', 283], ['SignEasy | Sign and Fill PDF and other Documents', 284], ['Quick PDF Scanner + OCR FREE', 285], ['Genius Scan - PDF Scanner', 286], ['Tiny Scanner - PDF Scanner App', 287], ['F

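According to the results above, the Google Play Store dataset contains many duplicate entries (1181 by name), whereas the two same-name pairs in the Apple Store dataset appear to be different apps.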

In [17]:
for app in GooglePlayStore:
    name = app[0]
    if name == "Instagram":
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


In [18]:
for app in GooglePlayStore:
    name = app[0]
    if name == "FOX Sports: Live Streaming, Scores & News":
        print(app)

['FOX Sports: Live Streaming, Scores & News', 'SPORTS', '4.0', '28895', '82M', '5,000,000+', 'Free', '0', 'Teen', 'Sports', 'August 2, 2018', '4.4.15', '5.0 and up']
['FOX Sports: Live Streaming, Scores & News', 'SPORTS', '4.0', '28895', '82M', '5,000,000+', 'Free', '0', 'Teen', 'Sports', 'August 2, 2018', '4.4.15', '5.0 and up']


Now we need to remove the duplicates, but this must be done in a methodical manner.

The most logical approach is to keep the entry with the highest number of reviews, as this is most likely to be the most recent snapshot of the app.

It might be useful to look at the file size as well: if the sizes are drastically different, the entries are likely to be two separate apps rather than snapshots of the same app.
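
The "keep the most-reviewed entry" rule can also be sketched in a single pass by storing the best row per name in a dictionary. This is an illustrative alternative, not the approach used in the cells that follow; the function name `keep_most_reviewed` and the two-column toy rows (name, review count, with values copied from the outputs above) are assumptions:

```python
def keep_most_reviewed(rows, name_index, n_reviews_index):
    """Keep, for each app name, the row with the highest review count."""
    best = {}  # app name -> best row seen so far
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[n_reviews_index])
        # Strict '>' means the first row wins when review counts tie.
        if name not in best or n_reviews > float(best[name][n_reviews_index]):
            best[name] = row
    return list(best.values())

rows = [
    ['Instagram', '66577313'],
    ['Instagram', '66577446'],
    ['Instagram', '66577313'],
    ['Instagram', '66509917'],
    ['FOX Sports: Live Streaming, Scores & News', '28895'],
    ['FOX Sports: Live Streaming, Scores & News', '28895'],
]
deduped = keep_most_reviewed(rows, 0, 1)
print(deduped)
```

Note that the result is ordered by the first appearance of each name, which may differ from the row order produced by a two-step approach.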

In [19]:
def remove_dup(dataset, name_index, n_reviews_index):
    reviews_max = {}
    for i in dataset:
        name = i[name_index]
        n_reviews = float(i[n_reviews_index])
        if (name in reviews_max) and (n_reviews > reviews_max[name]):
            reviews_max[name] = n_reviews
        if name not in reviews_max:
            reviews_max[name] = n_reviews
    return reviews_max

In [20]:
GooglePlayStore_dup = remove_dup(GooglePlayStore, 0, 3)
print(len(GooglePlayStore_dup))

9659


In [21]:
def clean_data(dataset, dup_dataset, name_index, n_reviews_index):
    clean_dataset = []
    # Some names reach their maximum number of reviews in several rows, so
    # we keep a separate record of the names already added to avoid adding
    # the same app more than once.
    already_added = set()
    for i in dataset:
        name = i[name_index]
        n_reviews = float(i[n_reviews_index])
        if (dup_dataset[name] == n_reviews) and (name not in already_added):
            clean_dataset.append(i)
            already_added.add(name)
    return clean_dataset

In [22]:
GooglePlayStore_clean = clean_data(GooglePlayStore, GooglePlayStore_dup, 0, 3)
print(len(GooglePlayStore_clean))

9659


In [23]:
def english_only(string):
    for i in string:
        if ord(i) > 127:
            return False
    return True
        
print(english_only('Instagram'))
print(english_only('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(english_only('Docs To Go™ Free Office Suite'))
print(english_only('Instachat 😜'))

True
False
False
False


In [24]:
def english_only2(string):
    number_of_greater = 0
    for i in string:
        if ord(i) > 127:
            number_of_greater += 1
    if number_of_greater > 3:
        return False
    else:
        return True

print(english_only2('Instagram'))
print(english_only2('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(english_only2('Docs To Go™ Free Office Suite'))
print(english_only2('Instachat 😜'))

True
False
True
True


In [25]:
def remove_non_english(dataset, name_index):
    english_data = []
    for i in dataset:
        name = i[name_index]
        number_of_greater = 0
        for character in name:
            if ord(character) > 127:
                number_of_greater +=1
        if number_of_greater <=3:
            english_data.append(i)
    return english_data

In [26]:
GooglePlayStore_english = remove_non_english(GooglePlayStore_clean, 0)
print(len(GooglePlayStore_english))

AppleStore_english = remove_non_english(AppleStore, 1)
print(len(AppleStore_english))

9614
6183


Now we need to isolate the free apps, as the company only builds free apps.

In [27]:
def free_only(dataset, price_index):
    free_data = []
    for i in dataset:
        price = i[price_index]
        if price == '0' or price == '0.0':
            free_data.append(i)
    return free_data

In [28]:
GooglePlayStore_free = free_only(GooglePlayStore_english, 7)
print(len(GooglePlayStore_free))

AppleStore_free = free_only(AppleStore_english, 4)
print(len(AppleStore_free))

8864
3222


So `GooglePlayStore_free` and `AppleStore_free` above are the two final clean datasets.

Now we can use these to determine the kinds of apps that attract the most users, as revenue is correlated with the number of users.

Firstly, we need to see what the common genres are in each market.

The validation strategy is to first develop an Android app and to continue developing it only if it becomes popular. If the app is profitable for several months, an iOS version can then be made.

This strategy reduces risk because the developer can commit few resources to building an app and testing it out. Since Apple has more restrictive requirements, only an Android version is produced at first, as the Play Store is more lenient on app quality.

The developer's end goal is for an app to be successful in both app stores so that it reaches the largest possible market.

In [29]:
def frequency(dataset, genre_index):
    frequency_dict = {}
    for i in dataset:
        genre = i[genre_index]
        if genre in frequency_dict:
            frequency_dict[genre] += 1
        else:
            frequency_dict[genre] = 1
    for i in frequency_dict:
        frequency_dict[i] = (frequency_dict[i]/len(dataset))*100
    return frequency_dict

In [30]:
def display_table(dataset, index):
    table = frequency(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

In [31]:
display_table(GooglePlayStore_free, 1)  # prints the table; returns None, so nothing to assign
googleplay_freq = frequency(GooglePlayStore_free, 1)

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

In [32]:
display_table(GooglePlayStore_free, 9)  # prints the table; returns None, so nothing to assign
googleplay_genre_freq = frequency(GooglePlayStore_free, 9)

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

In [33]:
display_table(AppleStore_free, 11)  # prints the table; returns None, so nothing to assign
applestore_freq = frequency(AppleStore_free, 11)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


Looking at the frequency tables, the most common apps on the Play Store are listed under Family. The top 5 are Family, Game, Tools, Business and Lifestyle. On the Play Store, the majority of the top categories seem to be oriented towards practical, everyday use (referring to Family, Tools, Business and Lifestyle). However, it is important to note that Game is the second largest category on the Play Store.

The frequency table by genre is extremely granular, so there seems to be no meaningful takeaway.

On the App Store, the most common genre is Games. The top 5 are Games, Entertainment, Photo & Video, Education and Social Networking. The majority of the top genres are for entertainment purposes, with the exception of Education. So apps which provide short-term satisfaction seem to be popular amongst developers. This does not necessarily mean that these apps are popular amongst consumers, although a larger number of developers may imply that demand for these kinds of apps is larger.

In both stores, Games is very popular amongst developers.
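
For reference, a percentage frequency table like the ones above can also be built with `collections.Counter` from the standard library. This is a minimal sketch on made-up rows; the function name `percentage_table` and the toy data are assumptions:

```python
from collections import Counter

def percentage_table(rows, index):
    """Return (value, percentage) pairs for one column, most common first."""
    counts = Counter(row[index] for row in rows)
    total = len(rows)
    # most_common() orders the entries by count, highest first
    return [(key, count / total * 100) for key, count in counts.most_common()]

rows = [['a', 'GAME'], ['b', 'GAME'], ['c', 'TOOLS'], ['d', 'FAMILY']]
table = percentage_table(rows, 1)
for key, pct in table:
    print(key, ':', pct)
```

These percentages describe how many apps exist in each category, which reflects developer supply rather than user demand.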

In [52]:
def freq_number(genre_dict, dataset, genre_index, n_review_index):
    n_review_freq = {}
    for i in genre_dict:
        total = 0
        len_genre = 0
        for x in dataset:
            genre = x[genre_index]
            n_review = x[n_review_index].replace("+","")
            n_review = float(n_review.replace(",",""))
            if i == genre:
                total += n_review
                len_genre += 1
        n_review_freq[i] = (total/len_genre)
    return n_review_freq
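
The function above rescans the whole dataset once per genre. A possible single-pass variant is sketched below; it accumulates totals and counts in dictionaries and computes the same kind of per-genre average. The name `average_by_genre` and the toy rows are assumptions for illustration:

```python
def average_by_genre(rows, genre_index, value_index):
    """Average a numeric column per genre in one pass over the rows."""
    totals = {}
    counts = {}
    for row in rows:
        genre = row[genre_index]
        # Strip '+' and ',' so values such as '1,000+' can be parsed
        value = float(row[value_index].replace('+', '').replace(',', ''))
        totals[genre] = totals.get(genre, 0) + value
        counts[genre] = counts.get(genre, 0) + 1
    return {genre: totals[genre] / counts[genre] for genre in totals}

rows = [
    ['App A', 'GAME', '1,000+'],
    ['App B', 'GAME', '3,000+'],
    ['App C', 'TOOLS', '500+'],
]
averages = average_by_genre(rows, 1, 2)
print(averages)
```

Because the genres are collected from the rows themselves, this variant does not need a precomputed frequency dictionary as its first argument.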

In [53]:
googleplaystore_n_review_freq = freq_number(googleplay_freq, GooglePlayStore_free, 1, 3)
print(googleplaystore_n_review_freq)

{'ART_AND_DESIGN': 24699.42105263158, 'AUTO_AND_VEHICLES': 14140.280487804877, 'BEAUTY': 7476.226415094339, 'BOOKS_AND_REFERENCE': 87995.06842105264, 'BUSINESS': 24239.727272727272, 'COMICS': 42585.61818181818, 'COMMUNICATION': 995608.4634146341, 'DATING': 21953.272727272728, 'EDUCATION': 56293.09708737864, 'ENTERTAINMENT': 301752.24705882353, 'EVENTS': 2555.84126984127, 'FINANCE': 38535.8993902439, 'FOOD_AND_DRINK': 57478.79090909091, 'HEALTH_AND_FITNESS': 78094.9706959707, 'HOUSE_AND_HOME': 26435.465753424658, 'LIBRARIES_AND_DEMO': 10925.807228915663, 'LIFESTYLE': 33921.82369942196, 'GAME': 683523.8445475638, 'FAMILY': 113142.99821002387, 'MEDICAL': 3730.1533546325877, 'SOCIAL': 965830.9872881356, 'SHOPPING': 223887.34673366835, 'PHOTOGRAPHY': 404081.3754789272, 'SPORTS': 116938.6146179402, 'TRAVEL_AND_LOCAL': 129484.42512077295, 'TOOLS': 305732.8973333333, 'PERSONALIZATION': 181122.31632653062, 'PRODUCTIVITY': 160634.5420289855, 'PARENTING': 16378.706896551725, 'WEATHER': 171250.774

In [54]:
googleplaystore_installs_freq = freq_number(googleplay_freq, GooglePlayStore_free, 1, 5)
print(googleplaystore_installs_freq)

{'ART_AND_DESIGN': 1986335.0877192982, 'AUTO_AND_VEHICLES': 647317.8170731707, 'BEAUTY': 513151.88679245283, 'BOOKS_AND_REFERENCE': 8767811.894736841, 'BUSINESS': 1712290.1474201474, 'COMICS': 817657.2727272727, 'COMMUNICATION': 38456119.167247385, 'DATING': 854028.8303030303, 'EDUCATION': 1833495.145631068, 'ENTERTAINMENT': 11640705.88235294, 'EVENTS': 253542.22222222222, 'FINANCE': 1387692.475609756, 'FOOD_AND_DRINK': 1924897.7363636363, 'HEALTH_AND_FITNESS': 4188821.9853479853, 'HOUSE_AND_HOME': 1331540.5616438356, 'LIBRARIES_AND_DEMO': 638503.734939759, 'LIFESTYLE': 1437816.2687861272, 'GAME': 15588015.603248259, 'FAMILY': 3695641.8198090694, 'MEDICAL': 120550.61980830671, 'SOCIAL': 23253652.127118643, 'SHOPPING': 7036877.311557789, 'PHOTOGRAPHY': 17840110.40229885, 'SPORTS': 3638640.1428571427, 'TRAVEL_AND_LOCAL': 13984077.710144928, 'TOOLS': 10801391.298666667, 'PERSONALIZATION': 5201482.6122448975, 'PRODUCTIVITY': 16787331.344927534, 'PARENTING': 542603.6206896552, 'WEATHER': 50

In [55]:
applestore_n_review_freq = freq_number(applestore_freq, AppleStore_free, 11, 5)
print(applestore_n_review_freq)

{'Social Networking': 71548.34905660378, 'Photo & Video': 28441.54375, 'Games': 22788.6696905016, 'Music': 57326.530303030304, 'Reference': 74942.11111111111, 'Health & Fitness': 23298.015384615384, 'Weather': 52279.892857142855, 'Utilities': 18684.456790123455, 'Travel': 28243.8, 'Shopping': 26919.690476190477, 'News': 21248.023255813954, 'Navigation': 86090.33333333333, 'Lifestyle': 16485.764705882353, 'Entertainment': 14029.830708661417, 'Food & Drink': 33333.92307692308, 'Sports': 23008.898550724636, 'Book': 39758.5, 'Finance': 31467.944444444445, 'Education': 7003.983050847458, 'Productivity': 21028.410714285714, 'Business': 7491.117647058823, 'Catalogs': 4004.0, 'Medical': 612.0}


In [56]:
def display_table2(freq_table):
    table = freq_table
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

In [57]:
display_table2(googleplaystore_n_review_freq)  # the function prints the table itself

COMMUNICATION : 995608.4634146341
SOCIAL : 965830.9872881356
GAME : 683523.8445475638
VIDEO_PLAYERS : 425350.08176100627
PHOTOGRAPHY : 404081.3754789272
TOOLS : 305732.8973333333
ENTERTAINMENT : 301752.24705882353
SHOPPING : 223887.34673366835
PERSONALIZATION : 181122.31632653062
WEATHER : 171250.77464788733
PRODUCTIVITY : 160634.5420289855
MAPS_AND_NAVIGATION : 142860.0483870968
TRAVEL_AND_LOCAL : 129484.42512077295
SPORTS : 116938.6146179402
FAMILY : 113142.99821002387
NEWS_AND_MAGAZINES : 93088.03225806452
BOOKS_AND_REFERENCE : 87995.06842105264
HEALTH_AND_FITNESS : 78094.9706959707
FOOD_AND_DRINK : 57478.79090909091
EDUCATION : 56293.09708737864
COMICS : 42585.61818181818
FINANCE : 38535.8993902439
LIFESTYLE : 33921.82369942196
HOUSE_AND_HOME : 26435.465753424658
ART_AND_DESIGN : 24699.42105263158
BUSINESS : 24239.727272727272
DATING : 21953.272727272728
PARENTING : 16378.706896551725
AUTO_AND_VEHICLES : 14140.280487804877
LIBRARIES_AND_DEMO : 10925.807228915663
BEAUTY : 7476.22641

In [58]:
display_table2(googleplaystore_installs_freq)  # the function prints the table itself

COMMUNICATION : 38456119.167247385
VIDEO_PLAYERS : 24727872.452830188
SOCIAL : 23253652.127118643
PHOTOGRAPHY : 17840110.40229885
PRODUCTIVITY : 16787331.344927534
GAME : 15588015.603248259
TRAVEL_AND_LOCAL : 13984077.710144928
ENTERTAINMENT : 11640705.88235294
TOOLS : 10801391.298666667
NEWS_AND_MAGAZINES : 9549178.467741935
BOOKS_AND_REFERENCE : 8767811.894736841
SHOPPING : 7036877.311557789
PERSONALIZATION : 5201482.6122448975
WEATHER : 5074486.197183099
HEALTH_AND_FITNESS : 4188821.9853479853
MAPS_AND_NAVIGATION : 4056941.7741935486
FAMILY : 3695641.8198090694
SPORTS : 3638640.1428571427
ART_AND_DESIGN : 1986335.0877192982
FOOD_AND_DRINK : 1924897.7363636363
EDUCATION : 1833495.145631068
BUSINESS : 1712290.1474201474
LIFESTYLE : 1437816.2687861272
FINANCE : 1387692.475609756
HOUSE_AND_HOME : 1331540.5616438356
DATING : 854028.8303030303
COMICS : 817657.2727272727
AUTO_AND_VEHICLES : 647317.8170731707
LIBRARIES_AND_DEMO : 638503.734939759
PARENTING : 542603.6206896552
BEAUTY : 51315

In [59]:
display_table2(applestore_n_review_freq)

Navigation : 86090.33333333333
Reference : 74942.11111111111
Social Networking : 71548.34905660378
Music : 57326.530303030304
Weather : 52279.892857142855
Book : 39758.5
Food & Drink : 33333.92307692308
Finance : 31467.944444444445
Photo & Video : 28441.54375
Travel : 28243.8
Shopping : 26919.690476190477
Health & Fitness : 23298.015384615384
Sports : 23008.898550724636
Games : 22788.6696905016
News : 21248.023255813954
Productivity : 21028.410714285714
Utilities : 18684.456790123455
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Business : 7491.117647058823
Education : 7003.983050847458
Catalogs : 4004.0
Medical : 612.0


In [64]:
def search_data(database, category, cat_index, name_index, freq_index):
    for i in database:
        if i[cat_index] == category:
            print(i[name_index], ':', i[freq_index])

In [65]:
search_data(AppleStore_free, 'Navigation', 11, 1, 5)
'''
n is very small and the reviews are dominated by two large navigation apps
(Waze and Google Maps). This skews the average to be very large. The few
remaining apps do not seem to be successful.
'''

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5
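
To see how strongly the two large apps skew the average, the mean can be compared with the median. The sketch below uses the six review counts printed above and the standard `statistics` module; the variable names are assumptions:

```python
from statistics import mean, median

# Review counts of the six free Navigation apps listed above
navigation_reviews = [345046, 154911, 12811, 3582, 187, 5]

nav_mean = mean(navigation_reviews)      # pulled up by Waze and Google Maps
nav_median = median(navigation_reviews)  # midpoint of the two middle values

print('mean  :', nav_mean)
print('median:', nav_median)
```

The mean matches the Navigation average in the table above (about 86090), while the median is only 8196.5, so the typical navigation app has far fewer reviews than the average suggests.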


In [66]:
search_data(AppleStore_free, 'Reference', 11, 1, 5)
'''
Runs into similar problems as Navigation. However, an interesting point is
that the successful reference apps tend to be based around popular books,
for example the Bible app or the Quran app. Minecraft guide apps are also
popular because the game itself is very popular.
'''

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


In [67]:
search_data(AppleStore_free, 'Social Networking', 11, 1, 5)
'''
This category seems to be very promising. Although there are some very
large apps in the category which heavily skew the average, there is also
an abundance of successful smaller social networking apps. An interesting
thought is that apps which provide an extension to existing social media
apps tend to do very well in the App Store.
Apps which provide extra functionality or information for users can be
successful, especially when the social media site in question is popular
itself, for example an Instagram analytics tracker.
Since social media is something that is checked often, these apps may also
be used fairly frequently, giving an opportunity to earn from ad revenue
or premium features.
'''

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 2

In [75]:
search_data(AppleStore_free, 'Photo & Video', 11, 1, 5)

Instagram : 2161558
Snapchat : 323905
YouTube - Watch Videos, Music, and Live Streams : 278166
Pic Collage - Picture Editor & Photo Collage Maker : 123433
Funimate video editor: add cool effects to videos : 123268
musical.ly - your video social network : 105429
Photo Collage Maker & Photo Editor - Live Collage : 93781
Vine Camera : 90355
Google Photos - unlimited photo and video storage : 88742
Flipagram : 79905
Mixgram - Picture Collage Maker - Pic Photo Editor : 54282
Shutterfly: Prints, Photo Books, Cards Made Easy : 51427
Pic Jointer – Photo Collage, Camera Effects Editor : 51330
Color Pop Effects - Photo Editor & Picture Editing : 45320
Photo Grid - photo collage maker & photo editor : 40531
iSwap Faces LITE : 39722
MOLDIV - Photo Editor, Collage & Beauty Camera : 39501
Photo Editor by Aviary : 39501
Photo Lab: Picture Editor, effects & fun face app : 34585
Rookie Cam - Photo Editor & Filter Camera : 33921
FotoRus -Camera & Photo Editor & Pic Collage Maker : 32558
PicsArt Photo St

In [68]:
search_data(AppleStore_free, 'Music', 11, 1, 5)

Pandora - Music & Radio : 1126879
Spotify Music : 878563
Shazam - Discover music, artists, videos & lyrics : 402925
iHeartRadio – Free Music & Radio Stations : 293228
SoundCloud - Music & Audio : 135744
Magic Piano by Smule : 131695
Smule Sing! : 119316
TuneIn Radio - MLB NBA Audiobooks Podcasts Music : 110420
Amazon Music : 106235
SoundHound Song Search & Music Player : 82602
Sonos Controller : 48905
Bandsintown Concerts : 30845
Karaoke - Sing Karaoke, Unlimited Songs! : 28606
My Mixtapez Music : 26286
Sing Karaoke Songs Unlimited with StarMaker : 26227
Ringtones for iPhone & Ringtone Maker : 25403
Musi - Unlimited Music For YouTube : 25193
AutoRap by Smule : 18202
Spinrilla - Mixtapes For Free : 15053
Napster - Top Music & Radio : 14268
edjing Mix:DJ turntable to remix and scratch music : 13580
Free Music - MP3 Streamer & Playlist Manager Pro : 13443
Free Piano app by Yokee : 13016
Google Play Music : 10118
Certified Mixtapes - Hip Hop Albums & Mixtapes : 9975
TIDAL : 7398
YouTube Mu

In [81]:
search_data(AppleStore_free, 'Productivity', 11, 1, 5)

Evernote - stay organized : 161065
Gmail - email by Google: secure, fast & organized : 135962
iTranslate - Language Translator & Dictionary : 123215
Yahoo Mail - Keeps You Organized! : 113709
Google Docs : 64259
Google Drive - free online storage : 59255
Dropbox : 49578
Microsoft Word : 47999
Microsoft OneNote : 39638
Microsoft Outlook - email and calendar : 32807
Hotspot Shield Free VPN Proxy & Wi-Fi Privacy : 32499
Documents 6 - File manager, PDF reader and browser : 29110
Google Sheets : 24602
Microsoft Excel : 24430
Inbox by Gmail : 21561
T-Mobile : 19977
Paper by FiftyThree - Sketch, Diagram, Take Notes : 18219
MyScript Calculator - Handwriting calculator : 16555
VPN Proxy Master - Unlimited WiFi security VPN : 13674
Microsoft OneDrive – File & photo cloud storage : 12797
Ever - Capture Your Memories : 12755
Speak & Translate － Voice and Text Translator : 12062
Tayasui Sketches : 11505
Drawing Desk - Draw, Paint, Doodle & Sketch board : 11040
Microsoft PowerPoint : 10939
Email - F

In [71]:
search_data(GooglePlayStore_free, 'COMMUNICATION', 1, 0, 5)

WhatsApp Messenger : 1,000,000,000+
Messenger for SMS : 10,000,000+
My Tele2 : 5,000,000+
imo beta free calls and text : 100,000,000+
Contacts : 50,000,000+
Call Free – Free Call : 5,000,000+
Web Browser & Explorer : 5,000,000+
Browser 4G : 10,000,000+
MegaFon Dashboard : 10,000,000+
ZenUI Dialer & Contacts : 10,000,000+
Cricket Visual Voicemail : 10,000,000+
TracFone My Account : 1,000,000+
Xperia Link™ : 10,000,000+
TouchPal Keyboard - Fun Emoji & Android Keyboard : 10,000,000+
Skype Lite - Free Video Call & Chat : 5,000,000+
My magenta : 1,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Seznam.cz : 1,000,000+
Antillean Gold Telegram (original version) : 100,000+
AT&T Visual Voicemail : 10,000,000+
GMX Mail : 10,000,000+
Omlet Chat : 10,000,000+
My Vodacom SA : 5,000,000+
Microsoft Edge : 5,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Calls & Text by Mo+ : 5,000,000+
free 

In [72]:
search_data(GooglePlayStore_free, 'VIDEO_PLAYERS', 1, 0, 5)

YouTube : 1,000,000,000+
All Video Downloader 2018 : 1,000,000+
Video Downloader : 10,000,000+
HD Video Player : 1,000,000+
Iqiyi (for tablet) : 1,000,000+
Video Player All Format : 10,000,000+
Motorola Gallery : 100,000,000+
Free TV series : 100,000+
Video Player All Format for Android : 500,000+
VLC for Android : 100,000,000+
Code : 10,000,000+
Vote for : 50,000,000+
XX HD Video downloader-Free Video Downloader : 1,000,000+
OBJECTIVE : 1,000,000+
Music - Mp3 Player : 10,000,000+
HD Movie Video Player : 1,000,000+
YouCut - Video Editor & Video Maker, No Watermark : 5,000,000+
Video Editor,Crop Video,Movie Video,Music,Effects : 1,000,000+
YouTube Studio : 10,000,000+
video player for android : 10,000,000+
Vigo Video : 50,000,000+
Google Play Movies & TV : 1,000,000,000+
HTC Service － DLNA : 10,000,000+
VPlayer : 1,000,000+
MiniMovie - Free Video and Slideshow Editor : 50,000,000+
Samsung Video Library : 50,000,000+
OnePlus Gallery : 1,000,000+
LIKE – Magic Video Maker & Community : 50,

In [76]:
search_data(GooglePlayStore_free, 'SOCIAL', 1, 0, 5)

Facebook : 1,000,000,000+
Facebook Lite : 500,000,000+
Tumblr : 100,000,000+
Social network all in one 2018 : 100,000+
Pinterest : 100,000,000+
TextNow - free text + calls : 10,000,000+
Google+ : 1,000,000,000+
The Messenger App : 1,000,000+
Messenger Pro : 1,000,000+
Free Messages, Video, Chat,Text for Messenger Plus : 1,000,000+
Telegram X : 5,000,000+
The Video Messenger App : 100,000+
Jodel - The Hyperlocal App : 1,000,000+
Hide Something - Photo, Video : 5,000,000+
Love Sticker : 1,000,000+
Web Browser & Fast Explorer : 5,000,000+
LiveMe - Video chat, new friends, and make money : 10,000,000+
VidStatus app - Status Videos & Status Downloader : 5,000,000+
Love Images : 1,000,000+
Web Browser ( Fast & Secure Web Explorer) : 500,000+
SPARK - Live random video chat & meet new people : 5,000,000+
Golden telegram : 50,000+
Facebook Local : 1,000,000+
Meet – Talk to Strangers Using Random Video Chat : 5,000,000+
MobilePatrol Public Safety App : 1,000,000+
💘 WhatsLov: Smileys of love, sti

In [78]:
search_data(GooglePlayStore_free, 'BOOKS_AND_REFERENCE', 1, 0, 5)

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E

In [79]:
search_data(GooglePlayStore_free, 'PRODUCTIVITY', 1, 0, 5)

Microsoft Word : 500,000,000+
All-In-One Toolbox: Cleaner, Booster, App Manager : 10,000,000+
AVG Cleaner – Speed, Battery & Memory Booster : 10,000,000+
QR Scanner & Barcode Scanner 2018 : 10,000,000+
Chrome Beta : 10,000,000+
Microsoft Outlook : 100,000,000+
Google PDF Viewer : 10,000,000+
My Claro Peru : 5,000,000+
Power Booster - Junk Cleaner & CPU Cooler & Boost : 1,000,000+
Google Assistant : 10,000,000+
Microsoft OneDrive : 100,000,000+
Calculator - unit converter : 50,000,000+
Microsoft OneNote : 100,000,000+
Metro name iD : 10,000,000+
Google Keep : 100,000,000+
Archos File Manager : 5,000,000+
ES File Explorer File Manager : 100,000,000+
ASUS SuperNote : 10,000,000+
HTC File Manager : 10,000,000+
MyMTN : 1,000,000+
Dropbox : 500,000,000+
ASUS Quick Memo : 10,000,000+
HTC Calendar : 10,000,000+
Google Docs : 100,000,000+
ASUS Calling Screen : 10,000,000+
lifebox : 5,000,000+
Yandex.Disk : 5,000,000+
Content Transfer : 5,000,000+
HTC Mail : 10,000,000+
Advanced Task Killer : 50

There seem to be many outliers, which push the averages well above what a newly launched app could realistically expect. However, apps designed to add functionality to larger apps tend to be very successful in both markets, especially those built around social media platforms.
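To illustrate how a single outlier inflates an average, here is a minimal sketch that compares the mean and the median for five apps copied from the COMMUNICATION output above. The helper name `parse_installs` is an assumption introduced for this example; it is not part of the notebook.

```python
from statistics import mean, median

def parse_installs(installs):
    """Convert a Google Play install string such as '1,000,000+' to an int."""
    return int(installs.replace(',', '').replace('+', ''))

# Five (name, installs) pairs copied from the COMMUNICATION output above.
sample = [
    ('WhatsApp Messenger', '1,000,000,000+'),
    ('Messenger for SMS', '10,000,000+'),
    ('My Tele2', '5,000,000+'),
    ('imo beta free calls and text', '100,000,000+'),
    ('Contacts', '50,000,000+'),
]

installs = [parse_installs(value) for _, value in sample]

# The mean is pulled up by WhatsApp; the median stays near a typical app.
print('Mean installs:  ', mean(installs))
print('Median installs:', median(installs))
```

In this small sample the mean is 233,000,000 while the median is 50,000,000, so the median is probably the better guide to what a typical app in the category achieves.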

One point of focus could be social media platforms that do not yet have such analytics apps. There are many Instagram (IG) analytics apps, but there seem to be hardly any for Twitter or Facebook in this particular dataset. We could also look into ways to provide innovative, useful data for consumers, for example by letting users share their analytics with friends.

Companion apps like these form a specific category that does well in both app stores.
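One rough way to check how well each platform is already covered is to count the apps whose names mention it. The sketch below uses a hypothetical helper, `count_apps_by_keyword`, and a handful of names copied from the SOCIAL output above. It assumes the app name is stored at `name_index`, which appears to be 0 for the Google Play data and 1 for the Apple data, judging by the `search_data` calls.

```python
def count_apps_by_keyword(dataset, name_index, keywords):
    """Count rows whose app name contains each keyword (case-insensitive)."""
    counts = {keyword: 0 for keyword in keywords}
    for row in dataset:
        name = row[name_index].lower()
        for keyword in keywords:
            if keyword.lower() in name:
                counts[keyword] += 1
    return counts

# Names copied from the SOCIAL output above; only the name column is filled in.
sample_rows = [
    ['Facebook'], ['Facebook Lite'], ['Tumblr'], ['Pinterest'],
    ['Telegram X'], ['Golden telegram'], ['Facebook Local'],
]

keyword_counts = count_apps_by_keyword(
    sample_rows, 0, ['facebook', 'telegram', 'twitter']
)
print(keyword_counts)
```

A name match only shows that an app mentions the platform, not that it is an analytics tool, so the matching rows would still need to be read by hand.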

Another approach could be the Books and Reference category. In both stores, only a few apps heavily skewed the averages. There also seems to be a pattern: apps built around popular books tend to do very well, notably apps for the Quran. A potential route could be to build apps around popular books and add innovative functions, for example a religious book app with text-to-speech. This category is easy to monetize because reading a book takes a long time, which creates more opportunities to run ads.
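To see how much those few apps skew the category, the average can be recomputed with the largest apps left out. This is a minimal sketch using five apps copied from the BOOKS_AND_REFERENCE output above. The function `average_installs` and the cap of 100,000,000 installs are assumptions chosen for illustration, not values used elsewhere in this notebook.

```python
def parse_installs(installs):
    """Convert a Google Play install string such as '1,000,000+' to an int."""
    return int(installs.replace(',', '').replace('+', ''))

def average_installs(rows, max_installs=None):
    """Average installs over (name, installs) pairs, optionally ignoring
    apps above max_installs. Returns 0.0 if no app is left."""
    values = [parse_installs(value) for _, value in rows]
    if max_installs is not None:
        values = [v for v in values if v <= max_installs]
    if not values:
        return 0.0
    return sum(values) / len(values)

# Five (name, installs) pairs copied from the BOOKS_AND_REFERENCE output above.
books_sample = [
    ('Wikipedia', '10,000,000+'),
    ('Cool Reader', '10,000,000+'),
    ('Book store', '1,000,000+'),
    ('FBReader: Favorite Book Reader', '10,000,000+'),
    ('Google Play Books', '1,000,000,000+'),
]

print('Average, all apps:     ', average_installs(books_sample))
# Hypothetical cap: ignore apps with more than 100 million installs.
print('Average, capped at 100M:', average_installs(books_sample, 100_000_000))
```

In this sample, dropping Google Play Books lowers the average from 206,200,000 to 7,750,000 installs, which is likely closer to what a new book app could aim for.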

Finally, productivity apps do very well in both stores. Upon inspection, neither dataset shows a large skew in its averages, so most apps do well. The majority of the successful apps are ones that can be used daily to provide quality-of-life improvements, for example calendars, to-do lists, and alarm clocks. These apps tend to do less well on the Apple Store and are scarcer there. This might be due to the Apple Store's tight restrictions, but it might also be because small-time developers are unwilling to port their apps to the Apple Store at a high enough standard. This could be an opportunity to become one of the first developers to produce iOS productivity apps of this kind. Innovative steps can also be taken to ensure succes