# Analysing Mobile App Data

This is a project to analyse Apple App Store and Google Play Store data, to identify key features of popular free apps.

The ultimate aim is to use these features to develop new apps that will attract a large number of downloads, with the goal of maximising income from in-app ads.

Firstly, check whether the raw data files are present in the project folder...

In [1]:
import os

In [2]:
print(os.getcwd())

/home/dq/notebook


In [3]:
!pwd

/home/dq/notebook


In [4]:
%ls

[0m[01;32mAppleStore.csv[0m*  [01;32mBasics.ipynb[0m*  [01;32mgoogleplaystore.csv[0m*


Create a function explore_data() that prints rows from a dataset. The dataset, start row and end row are supplied as arguments; as with a Python slice, the end row is excluded. If the optional argument rows_and_columns is True, the function also prints the total number of rows and columns.

In [5]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [6]:
from csv import reader

Import the Apple App Store data and save as a list of lists...

In [7]:
apple_store = open("AppleStore.csv")
apple_store_csv = reader(apple_store)
apple_store_list = list(apple_store_csv)
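The cell above leaves the file object open after reading it. Below is a minimal sketch of a hypothetical helper, load_csv(), that reads all rows and then closes the file. It assumes the CSV files are UTF-8 encoded, which is not checked here. Passing `newline=""` is what the csv module documentation recommends when opening files for csv.reader.

```python
from csv import reader

def load_csv(path, encoding="utf8"):
    """Return the rows of a CSV file as a list of lists of strings.

    The 'with' block closes the file as soon as the rows have been read.
    """
    with open(path, encoding=encoding, newline="") as csv_file:
        return list(reader(csv_file))

# Usage with the same file name as above (hypothetical helper):
# apple_store_list = load_csv("AppleStore.csv")
```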

The Apple App Store dataset is available at the following link: [app_store](https://dq-content.s3.amazonaws.com/350/AppleStore.csv).

Print the header row from the App Store data...

In [8]:
print(apple_store_list[0])

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


In [9]:
play_store = open("googleplaystore.csv")
play_store_csv = reader(play_store)
play_store_list = list(play_store_csv)

The Google Play Store dataset is available at the following link: [google_play](https://dq-content.s3.amazonaws.com/350/googleplaystore.csv).

Print the header row from the Google Play Store data...

In [10]:
print(play_store_list[0])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


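Use the explore_data() function to explore the App Store and Google Play datasets. Each list is passed with the header sliced off ([1:]), so the row counts below exclude the header. Note that start = 1 then skips the first data row (e.g. Facebook in the App Store data)...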

In [11]:
explore_data(apple_store_list[1:],1,5,True)

['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows: 7197
Number of columns: 16


In [12]:
explore_data(play_store_list[1:],1,5,True)

['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


Number of rows: 10841
Number of columns: 13


As a first step in cleaning the data, use the length of the header row as a guide to find and delete any rows with fewer elements than the header...

Starting with the Google Play data, identify any truncated rows...

In [13]:
play_columns = len(play_store_list[0])
print(play_columns)

13


In [14]:
this_row = 1
short_rows = []
for row in play_store_list[1:]:
    if len(row) < play_columns:
        short_rows.append(this_row)
    this_row += 1
print(short_rows)

[10473]


Only a single row, at index 10473, has fewer fields than expected.
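The loop above only looks for rows that are too short. Below is a minimal sketch of a hypothetical, more general check that also reports rows with too many fields. It assumes the header is `dataset[0]`. The returned indices refer to positions in `dataset` itself, as `short_rows` does.

```python
def find_malformed_rows(dataset):
    """Return indices of rows whose length differs from the header's.

    Assumes dataset[0] is the header; indices can be used directly
    with dataset[i].
    """
    expected = len(dataset[0])
    return [i for i, row in enumerate(dataset[1:], start=1)
            if len(row) != expected]

# e.g. find_malformed_rows(play_store_list) reports short and long rows alike
```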

Print the header row to see what fields should be present in the data...

In [15]:
print(play_store_list[0])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


Now, print the truncated row...

In [16]:
print(play_store_list[short_rows[0]])

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


The Category column is missing from the truncated row, so each subsequent value sits one position to the left of where it belongs.

Delete the incomplete row...

In [17]:
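# Remove the truncated row. Run this cell only once: re-running it would
# delete whichever row has since moved into position short_rows[0].
play_store_list.pop(short_rows[0])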

['Life Made WI-Fi Touchscreen Photo Frame',
 '1.9',
 '19',
 '3.0M',
 '1,000+',
 'Free',
 '0',
 'Everyone',
 '',
 'February 11, 2018',
 '1.0.19',
 '4.0 and up']

Excluding the header, check how many rows now remain (should be one fewer than found above)...

In [18]:
play_rows = len(play_store_list[1:])
print(play_rows)

10840


Now, identify and remove any duplicate rows.

Using the App column (column 0), check that each app has just a single entry (row).

In [19]:
unique_apps = []
duplicate_apps = []
for app in play_store_list[1:]:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

#print(play_store_list[:10])
print("There are ", len(duplicate_apps), "duplicate rows.\n")
print("Here are the first 10 duplicates ... \n", duplicate_apps[:10],"\n")
#print(unique_apps[:10])
print("There are ", len(unique_apps), "unique apps.\n")
print("Here are the first 10... \n", unique_apps[:10])

There are  1181 duplicate rows.

Here are the first 10 duplicates ... 
 ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack'] 

There are  9659 unique apps.

Here are the first 10... 
 ['Photo Editor & Candy Camera & Grid & ScrapBook', 'Coloring book moana', 'U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'Sketch - Draw & Paint', 'Pixel Draw - Number Art Coloring Book', 'Paper flowers instructions', 'Smoke Effect Photo Maker - Smoke Editor', 'Infinite Painter', 'Garden Coloring Book', 'Kids Paint Free - Drawing Fun']
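The check `name in unique_apps` scans the whole list each time, so the loop gets slow as the list grows. Below is a minimal sketch of the same check using the standard library's collections.Counter. The helper name is hypothetical. The number of surplus rows, `sum(n - 1 ...)`, counts the same thing as `len(duplicate_apps)` above.

```python
from collections import Counter

def count_duplicates(dataset, key_index=0):
    """Count occurrences of each key (e.g. app name), skipping the header.

    Returns {key: occurrences} for keys that occur more than once.
    """
    counts = Counter(row[key_index] for row in dataset[1:])
    return {key: n for key, n in counts.items() if n > 1}

# dups = count_duplicates(play_store_list)
# surplus_rows = sum(n - 1 for n in dups.values())  # same count as len(duplicate_apps)
```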


To demonstrate, take the first element of the duplicate_apps list and find the indices of all the matching rows in play_store_list...

In [20]:
duplicate = duplicate_apps[0]
duplicate_rows = []
row = 1
for app in play_store_list[1:]:
    name = app[0]
    if name == duplicate:
        duplicate_rows.append(row)
        #print(name, "\n")
    row += 1
print(duplicate)
print(duplicate_rows)

Quick PDF Scanner + OCR FREE
[223, 230, 286]


To illustrate, print the duplicate rows...

In [21]:
print(play_store_list[0], "\n")
for row in duplicate_rows:
    print(play_store_list[row], "\n")

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'] 

['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up'] 

['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up'] 

['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80804', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up'] 



The 'Reviews' column provides a suitable criterion for removing the unwanted entries; i.e. retain only the row with the highest number of reviews, since this will likely be the most up-to-date information.

Create a dictionary with the unique app names as keys, and the corresponding highest number of reviews, as values.

The original list (play_store_list) has 10840 rows (excluding the header) and 1181 duplicate entries, so there should be 10840 - 1181 = 9659 unique entries.

In [40]:
max_reviews = {}
for app in play_store_list[1:]:
    name = app[0]
    total_reviews = float(app[3])
    if name not in max_reviews:
        max_reviews[name] = total_reviews
    elif total_reviews > max_reviews[name]:
        max_reviews[name] = total_reviews
print(len(max_reviews))
# print the first 10 key:value pairs
index = 1
for key in max_reviews:
    print(key, " : ",max_reviews[key])
    if index >= 10:
        break
    index += 1

9659
Photo Editor & Candy Camera & Grid & ScrapBook  :  159.0
Coloring book moana  :  974.0
U Launcher Lite – FREE Live Cool Themes, Hide Apps  :  87510.0
Sketch - Draw & Paint  :  215644.0
Pixel Draw - Number Art Coloring Book  :  967.0
Paper flowers instructions  :  167.0
Smoke Effect Photo Maker - Smoke Editor  :  178.0
Infinite Painter  :  36815.0
Garden Coloring Book  :  13791.0
Kids Paint Free - Drawing Fun  :  121.0


Using the 'max_reviews' dictionary, create a de-duplicated list 'android_clean'. Also create a list 'already_added' to keep track of what's been added to android_clean.

Iterate over the rows of play_store_list (excluding the header) and test whether each row's number of reviews equals the maximum number of reviews for the same app, as recorded in the max_reviews dictionary. If it does, and the app has not already been added (i.e. its name is not in already_added), append the row to android_clean.

The already_added list prevents duplicates in cases where more than one row of play_store_list has a review count equal to the maximum, as with the first two 'Quick PDF Scanner + OCR FREE' rows above.

In [41]:
android_clean = []
already_added = []
for app in play_store_list[1:]:
    name = app[0]
    n_reviews = float(app[3])
    if n_reviews == max_reviews[name] and name not in already_added:
        android_clean.append(app)
        already_added.append(name)
print(len(android_clean))
print(android_clean[:10])

9659
[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'], ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up'], ['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up'], ['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up'], ['Smoke Effect Photo Maker - Smoke Editor', 'ART_AND_DESIGN', '3.8', '178', '19M', '50,000
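The two-step approach above (max_reviews, then android_clean with already_added) can also be written as a single pass over the data. Below is a minimal sketch with a hypothetical helper name. It converts reviews with float(), as the cells above do. Like the original, it keeps the first row among ties. Its result is ordered by each app's first appearance, so the row order may differ from android_clean even though the same rows are kept.

```python
def dedupe_by_max_reviews(dataset, name_index=0, reviews_index=3):
    """Keep one row per app: the first row with the highest review count.

    Assumes dataset[0] is the header; the header is not included in the result.
    """
    best = {}  # app name -> (review count, row)
    for row in dataset[1:]:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # strict '>' keeps the earliest row among ties, like already_added above
        if name not in best or n_reviews > best[name][0]:
            best[name] = (n_reviews, row)
    return [row for _, row in best.values()]
```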

Now test the Apple App Store data for duplicates. Here the ID field (index 0) of each row can be used instead of the app name...

In [42]:
unique_apps = []
duplicate_apps = []
for app in apple_store_list[1:]:
    ID = app[0]
    if ID in unique_apps:
        duplicate_apps.append(ID)
    else:
        unique_apps.append(ID)

print("There are", len(duplicate_apps), "duplicate apps...")
print(duplicate_apps[:10],"\n")
print("There are", len(unique_apps), "unique apps...")
print(unique_apps[:10])
print("\n")
print("Excluding the header, the Apple App Store data has", len(apple_store_list[1:]), "rows.")

There are 0 duplicate apps...
[] 

There are 7197 unique apps...
['284882215', '389801252', '529479190', '420009108', '284035177', '429047995', '282935706', '553834731', '324684580', '343200656']


Excluding the header, the Apple App Store data has 7197 rows.


Now, check for and remove any non-English-language apps...

The characters used in ordinary English text (letters, digits and punctuation) have ASCII codes 0 to 127. The built-in Python function ord() returns a character's Unicode code point. For these characters, the code point is the same as the ASCII code.
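A quick illustration of ord() on a few characters from this notebook. It also includes a small helper (hypothetical name) that counts characters above 127, which is the quantity check_if_english() compares against its tolerance. Python 3.7+ also provides str.isascii() for an all-or-nothing test.

```python
def count_non_ascii(text):
    """Return the number of characters whose code point is above 127."""
    return sum(1 for char in text if ord(char) > 127)

print(ord("A"), ord("&"))        # 65 38  (ASCII)
print(ord("™"), ord("😜"))       # 8482 128540  (outside ASCII)
print(count_non_ascii("Docs To Go™ Free Office Suite"))  # 1
print("Instachat".isascii())    # True (str.isascii needs Python 3.7+)
```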

The following function takes a string argument and returns False if more than 3 characters lie outside this range. Up to 3 non-ASCII characters (e.g. emojis or the ™ symbol) are tolerated, so a name such as 'Instachat 😜' still counts as English...

In [43]:
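def check_if_english(text):
    # Count characters outside the ASCII range (code points above 127)
    total_non_english_chars = 0
    for char in text:
        if ord(char) > 127:
            total_non_english_chars += 1
        # Stop as soon as the tolerance of 3 non-ASCII characters is exceeded
        if total_non_english_chars > 3:
            return False
    return True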

Test the function...

In [44]:
print(check_if_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(check_if_english('Instachat 😜'))
print(check_if_english('Docs To Go™ Free Office Suite'))

False
True
True


Check the Google Play and App Store data, and make lists of all the English-language apps...

In [45]:
english_app_store_apps = []
# first, copy the header row...
english_app_store_apps.append(apple_store_list[0])
for app in apple_store_list[1:]:
    name = app[1]
    if check_if_english(name):
        english_app_store_apps.append(app)
    
print("The English-only App Store dataset has", len(english_app_store_apps)," rows")

The English-only App Store dataset has 6184  rows


In [46]:
english_play_store_apps = []
# first, copy the header row...
english_play_store_apps.append(play_store_list[0])
for app in android_clean[1:]:
    name = app[0]
    if check_if_english(name):
        english_play_store_apps.append(app)
    
print("The English-only Play Store dataset has", len(english_play_store_apps)," rows")

The English-only Play Store dataset has 9614  rows


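Note that android_clean, unlike play_store_list, has no header row. The slice android_clean[1:] in the cell above therefore skips its first app, 'Photo Editor & Candy Camera & Grid & ScrapBook', which is English. Iterating over android_clean itself would keep that app and give 9615 rows. The counts that follow are based on the 9614-row list.

The final step in data cleaning is to remove any apps that are not free of charge.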

First, the App Store...

In [47]:
print(english_app_store_apps[:5])

[['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'], ['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'], ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1'], ['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1'], ['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']]


The price is the element with index 4 of each row. It is stored as a string, so convert it to float before comparing it with zero...

In [48]:
free_app_store_apps = []
free_app_store_apps.append(english_app_store_apps[0])
for app in english_app_store_apps[1:]:
    price = float(app[4])
    if price == 0:
        free_app_store_apps.append(app)
    
# subtract 1 so the header row is not counted as an app
print("The App Store dataset has", len(free_app_store_apps) - 1, "free English apps")

The App Store dataset has 3222 free English apps


Now, take a look at the first few rows of the English-language apps from the Google Play Store...

In [49]:
print(english_play_store_apps[:5])

[['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'], ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up'], ['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up'], ['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']]


The price is the 8th element of each row (index 7), but the 7th element ('Type', index 6) is 'Free' whenever the price is zero, so filter on Type instead...
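Before relying on Type alone, one could check that Type and Price never disagree. Below is a minimal sketch with a hypothetical helper name. It assumes, as in the rows printed above, that a free app has Type 'Free' and Price '0'. An empty result would support filtering on Type.

```python
def type_price_mismatches(dataset, type_index=6, price_index=7):
    """Return rows where the Type column disagrees with the Price column.

    Assumes dataset[0] is the header, and that free apps have
    Type 'Free' (any case) and Price exactly '0'.
    """
    mismatches = []
    for row in dataset[1:]:
        is_free_type = row[type_index].lower() == "free"
        is_zero_price = row[price_index] == "0"
        if is_free_type != is_zero_price:
            mismatches.append(row)
    return mismatches

# e.g. len(type_price_mismatches(english_play_store_apps))
```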

In [50]:
free_play_store_apps = []
free_play_store_apps.append(english_play_store_apps[0])
for app in english_play_store_apps[1:]:
    app_type = app[6]  # avoid shadowing the built-in type()
    if app_type in ('Free', 'free'):
        free_play_store_apps.append(app)

# subtract 1 so the header row is not counted as an app
print("The Play Store dataset has", len(free_play_store_apps) - 1, "free English apps")

The Play Store dataset has 8862 free English apps


We now have the desired lists of free English-language apps from both the App Store and Play Store.

Our goal is to determine the kinds of apps that are likely to attract more users, because the number of people using our apps affects our in-app ad revenue.

Let's begin the analysis by determining the most common genres for each market. For this, we'll need to build frequency tables for a few columns in our datasets.

From the Play Store data, 'genres' corresponds to the 10th element (index = 9) of each row, and the 2nd element (index = 1) of each row refers to 'category'. For the App Store data, the 12th element (index = 11) from each row corresponds to 'prime_genre'. 

Examples...

In [54]:
print("Play Store genre: ", free_play_store_apps[1][9])

Play Store genre:  Art & Design


In [59]:
print("Play Store category: ", free_play_store_apps[1][1])

Play Store category:  ART_AND_DESIGN


In [58]:
print("App Store prime genre: ",free_app_store_apps[1][11])

App Store prime genre:  Social Networking


Create genre frequency tables for both App Store and Play Store datasets...

In [60]:
app_store_genre_freqs = {}
for app in free_app_store_apps[1:]:
    genre = app[11]
    if genre in app_store_genre_freqs:
        app_store_genre_freqs[genre] += 1
    else:
        app_store_genre_freqs[genre] = 1
        
print(app_store_genre_freqs)

{'Social Networking': 106, 'Photo & Video': 160, 'Games': 1874, 'Music': 66, 'Reference': 18, 'Health & Fitness': 65, 'Weather': 28, 'Utilities': 81, 'Travel': 40, 'Shopping': 84, 'News': 43, 'Navigation': 6, 'Lifestyle': 51, 'Entertainment': 254, 'Food & Drink': 26, 'Sports': 69, 'Book': 14, 'Finance': 36, 'Education': 118, 'Productivity': 56, 'Business': 17, 'Catalogs': 4, 'Medical': 6}


In [61]:
play_store_genre_freqs = {}
for app in free_play_store_apps[1:]:
    genre = app[9]
    if genre in play_store_genre_freqs:
        play_store_genre_freqs[genre] += 1
    else:
        play_store_genre_freqs[genre] = 1
        
print(play_store_genre_freqs)

{'Art & Design': 52, 'Art & Design;Creativity': 6, 'Auto & Vehicles': 82, 'Beauty': 53, 'Books & Reference': 190, 'Business': 407, 'Comics': 54, 'Comics;Creativity': 1, 'Communication': 287, 'Dating': 165, 'Education': 474, 'Education;Creativity': 4, 'Education;Education': 30, 'Education;Pretend Play': 5, 'Education;Brain Games': 3, 'Entertainment': 538, 'Entertainment;Brain Games': 7, 'Entertainment;Creativity': 3, 'Entertainment;Music & Video': 15, 'Events': 63, 'Finance': 328, 'Food & Drink': 110, 'Health & Fitness': 273, 'House & Home': 73, 'Libraries & Demo': 83, 'Lifestyle': 345, 'Lifestyle;Pretend Play': 1, 'Card': 40, 'Arcade': 164, 'Puzzle': 100, 'Racing': 88, 'Sports': 307, 'Casual': 156, 'Simulation': 181, 'Adventure': 60, 'Trivia': 37, 'Action': 275, 'Word': 23, 'Role Playing': 83, 'Strategy': 80, 'Board': 34, 'Music': 18, 'Action;Action & Adventure': 9, 'Casual;Brain Games': 12, 'Educational;Creativity': 3, 'Puzzle;Brain Games': 15, 'Educational;Education': 35, 'Casual;Pre

The two cells above repeat the same counting logic, so wrap it in a reusable function, freq_table(). Like the cells above, it skips the header row (dataset[0])...

In [62]:
def freq_table(dataset, index):
    freqs = {}
    for app in dataset[1:]:
        field = app[index]
        if field in freqs:
            freqs[field] += 1
        else:
            freqs[field] = 1
    return freqs
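The standard library's collections.Counter does the same counting. Below is a minimal sketch of an equivalent function (hypothetical name). Its most_common() method also returns the entries already sorted by count.

```python
from collections import Counter

def freq_table_counter(dataset, index):
    """Count values in column `index`, skipping the header row dataset[0]."""
    return Counter(row[index] for row in dataset[1:])

# e.g. freq_table_counter(free_play_store_apps, 1).most_common(5)
```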

Test freq_table() on 'Category' (index 1) of free_play_store_apps...

In [63]:
# play_categories = {}
play_categories = freq_table(free_play_store_apps, 1)
print(play_categories)

{'ART_AND_DESIGN': 56, 'AUTO_AND_VEHICLES': 82, 'BEAUTY': 53, 'BOOKS_AND_REFERENCE': 190, 'BUSINESS': 407, 'COMICS': 55, 'COMMUNICATION': 287, 'DATING': 165, 'EDUCATION': 103, 'ENTERTAINMENT': 85, 'EVENTS': 63, 'FINANCE': 328, 'FOOD_AND_DRINK': 110, 'HEALTH_AND_FITNESS': 273, 'HOUSE_AND_HOME': 73, 'LIBRARIES_AND_DEMO': 83, 'LIFESTYLE': 346, 'GAME': 862, 'FAMILY': 1675, 'MEDICAL': 313, 'SOCIAL': 236, 'SHOPPING': 199, 'PHOTOGRAPHY': 261, 'SPORTS': 301, 'TRAVEL_AND_LOCAL': 207, 'TOOLS': 750, 'PERSONALIZATION': 294, 'PRODUCTIVITY': 345, 'PARENTING': 58, 'WEATHER': 71, 'VIDEO_PLAYERS': 159, 'NEWS_AND_MAGAZINES': 248, 'MAPS_AND_NAVIGATION': 124}


The following function, display_table(), calls freq_table() to build a frequency table. It converts the table into a list of (count, key) tuples, sorts the list in descending order of count and prints each entry...

In [64]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
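An alternative to building (count, key) tuples is to sort the dictionary's items with a key function. Below is a minimal sketch with a hypothetical helper name. One difference is how ties are broken. sorted() is stable, so equal counts keep the dictionary's insertion order. display_table() sorts whole tuples, so it breaks ties by key in reverse alphabetical order. That is why Productivity appears before Lifestyle (both 345) in the Play Store genre table.

```python
def sort_by_value(table):
    """Return (key, value) pairs from a dict, largest value first.

    Ties keep the dict's insertion order because sorted() is stable,
    including when reverse=True.
    """
    return sorted(table.items(), key=lambda item: item[1], reverse=True)

for key, value in sort_by_value({"Lifestyle": 345, "Tools": 749, "Productivity": 345}):
    print(key, ":", value)
# Tools : 749
# Lifestyle : 345
# Productivity : 345
```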

Now, using the above functions, display freq tables in descending order for...

i) category of free Google Play Store apps...

In [65]:
display_table(free_play_store_apps, 1)

FAMILY : 1675
GAME : 862
TOOLS : 750
BUSINESS : 407
LIFESTYLE : 346
PRODUCTIVITY : 345
FINANCE : 328
MEDICAL : 313
SPORTS : 301
PERSONALIZATION : 294
COMMUNICATION : 287
HEALTH_AND_FITNESS : 273
PHOTOGRAPHY : 261
NEWS_AND_MAGAZINES : 248
SOCIAL : 236
TRAVEL_AND_LOCAL : 207
SHOPPING : 199
BOOKS_AND_REFERENCE : 190
DATING : 165
VIDEO_PLAYERS : 159
MAPS_AND_NAVIGATION : 124
FOOD_AND_DRINK : 110
EDUCATION : 103
ENTERTAINMENT : 85
LIBRARIES_AND_DEMO : 83
AUTO_AND_VEHICLES : 82
HOUSE_AND_HOME : 73
WEATHER : 71
EVENTS : 63
PARENTING : 58
ART_AND_DESIGN : 56
COMICS : 55
BEAUTY : 53


ii) genre of free Google Play apps...

In [66]:
display_table(free_play_store_apps, 9)

Tools : 749
Entertainment : 538
Education : 474
Business : 407
Productivity : 345
Lifestyle : 345
Finance : 328
Medical : 313
Sports : 307
Personalization : 294
Communication : 287
Action : 275
Health & Fitness : 273
Photography : 261
News & Magazines : 248
Social : 236
Travel & Local : 206
Shopping : 199
Books & Reference : 190
Simulation : 181
Dating : 165
Arcade : 164
Video Players & Editors : 157
Casual : 156
Maps & Navigation : 124
Food & Drink : 110
Puzzle : 100
Racing : 88
Role Playing : 83
Libraries & Demo : 83
Auto & Vehicles : 82
Strategy : 80
House & Home : 73
Weather : 71
Events : 63
Adventure : 60
Comics : 54
Beauty : 53
Art & Design : 52
Parenting : 44
Card : 40
Casino : 38
Trivia : 37
Educational;Education : 35
Board : 34
Educational : 33
Education;Education : 30
Word : 23
Casual;Pretend Play : 21
Music : 18
Racing;Action & Adventure : 15
Puzzle;Brain Games : 15
Entertainment;Music & Video : 15
Casual;Brain Games : 12
Casual;Action & Adventure : 12
Arcade;Action & Advent

iii) 'prime genre' of Apple App Store apps...

In [67]:
display_table(free_app_store_apps, 11)

Games : 1874
Entertainment : 254
Photo & Video : 160
Education : 118
Social Networking : 106
Shopping : 84
Utilities : 81
Sports : 69
Music : 66
Health & Fitness : 65
Productivity : 56
Lifestyle : 51
News : 43
Travel : 40
Finance : 36
Weather : 28
Food & Drink : 26
Reference : 18
Business : 17
Book : 14
Navigation : 6
Medical : 6
Catalogs : 4


For the Apple App Store, the most common genre is therefore Games, with 1874 apps (about 58% of the 3222 free English apps), followed by Entertainment with 254. In general, most apps are designed for entertainment (games, entertainment, photo & video, social networking, etc.).
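The two stores contain different numbers of apps, so their genre counts are easier to compare as percentages of each store's total. Below is a minimal sketch with a hypothetical helper name. Applied to `freq_table(free_app_store_apps, 11)`, it should give Games about 58.2% (1874 of 3222).

```python
def as_percentages(table):
    """Convert a {key: count} table into {key: percentage of the total}."""
    total = sum(table.values())
    return {key: 100 * count / total for key, count in table.items()}

# e.g. as_percentages(freq_table(free_app_store_apps, 11))["Games"]
```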

The total number of reviews (ratings) may serve as a proxy for the total number of users. Therefore, generate a frequency table of 'rating_count_tot' (index 5)...

In [68]:
display_table(free_app_store_apps, 5)

0 : 151
1 : 23
7 : 14
5 : 14
2 : 14
10 : 13
6 : 12
14 : 12
9 : 10
53 : 10
29 : 10
22 : 10
17 : 10
8 : 9
41 : 9
38 : 9
105 : 9
78 : 8
21 : 8
115 : 8
58 : 7
49 : 7
43 : 7
37 : 7
30 : 7
3 : 7
19 : 7
110 : 7
109 : 7
99 : 6
94 : 6
70 : 6
56 : 6
50 : 6
42 : 6
39 : 6
35 : 6
343 : 6
18 : 6
15 : 6
12 : 6
89 : 5
88 : 5
84 : 5
83 : 5
81 : 5
74 : 5
69 : 5
668 : 5
64 : 5
62 : 5
55 : 5
46 : 5
40 : 5
397 : 5
32 : 5
27 : 5
25 : 5
228 : 5
20 : 5
197 : 5
154 : 5
144 : 5
135 : 5
125 : 5
104 : 5
101 : 5
91 : 4
86 : 4
72 : 4
691 : 4
67 : 4
60 : 4
54 : 4
44 : 4
422 : 4
416 : 4
360 : 4
34 : 4
339 : 4
329 : 4
31 : 4
285 : 4
26 : 4
220 : 4
213 : 4
200 : 4
173 : 4
172 : 4
169 : 4
168 : 4
164 : 4
16 : 4
158 : 4
147 : 4
145 : 4
142 : 4
134 : 4
130 : 4
128 : 4
122 : 4
1193 : 4
116 : 4
111 : 4
11 : 4
98 : 3
93 : 3
90 : 3
85 : 3
77 : 3
76 : 3
689 : 3
68 : 3
66 : 3
63 : 3
612 : 3
599 : 3
59 : 3
57 : 3
539 : 3
51 : 3
48 : 3
47 : 3
441 : 3
439 : 3
4050 : 3
392 : 3
375 : 3
373 : 3
36 : 3
33 : 3
314 : 3
299 : 3
298 : 3
2

The Apple App Store data does not include the number of downloads, but it does include the total number of user ratings (rating_count_tot). The Google Play data includes both the total reviews ('Reviews') and the number of installs ('Installs'). Since some users download an app but never use it, the total number of reviews is arguably a better guide to the number of active users.
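The Google Play Installs column (index 5) could still serve as a rough cross-check. Its values are strings such as '500,000+' and '5,000,000+' (see the rows printed earlier), so they need converting before any arithmetic. Below is a minimal sketch with a hypothetical helper name. It assumes every value follows that digits-commas-plus pattern. The '+' means "at least", so the result is a lower bound.

```python
def parse_installs(installs):
    """Convert an Installs string such as '5,000,000+' to an int (a lower bound)."""
    return int(installs.replace(",", "").replace("+", ""))

# e.g. parse_installs(free_play_store_apps[1][5])
```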

Therefore, create a table that sums the number of reviews for each genre...

In [73]:
def reviews_per_genre(dataset, genre, reviews):
    freqs = {}
    for app in dataset[1:]:
        app_genre = app[genre]
        app_reviews = int(app[reviews])
        if app_genre in freqs:
            freqs[app_genre] += app_reviews
        else:
            freqs[app_genre] = app_reviews
    return freqs
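The if/else in reviews_per_genre() can be avoided with collections.defaultdict, which supplies a starting value of 0 for unseen keys. Below is a minimal sketch with a hypothetical name. The arguments are the same column indices as above.

```python
from collections import defaultdict

def reviews_per_genre_dd(dataset, genre_index, reviews_index):
    """Sum review counts per genre, skipping the header row dataset[0]."""
    totals = defaultdict(int)  # a genre seen for the first time starts at 0
    for row in dataset[1:]:
        totals[row[genre_index]] += int(row[reviews_index])
    return dict(totals)

# e.g. reviews_per_genre_dd(free_app_store_apps, 11, 5)
```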

For Play Store, the genre has index = 9, and reviews has index = 3...

In [74]:
play_reviews_per_genre = reviews_per_genre(free_play_store_apps, 9, 3)
print(play_reviews_per_genre)

{'Art & Design': 1384154, 'Art & Design;Creativity': 34067, 'Auto & Vehicles': 1159503, 'Beauty': 396240, 'Books & Reference': 16719063, 'Business': 9865569, 'Comics': 2341693, 'Comics;Creativity': 516, 'Communication': 285739629, 'Dating': 3622290, 'Education': 7684196, 'Education;Creativity': 25537, 'Education;Education': 7036946, 'Education;Pretend Play': 125757, 'Education;Brain Games': 577773, 'Entertainment': 55623415, 'Entertainment;Brain Games': 553730, 'Entertainment;Creativity': 430678, 'Entertainment;Music & Video': 1195193, 'Events': 161018, 'Finance': 12639775, 'Food & Drink': 6322667, 'Health & Fitness': 21319927, 'House & Home': 1929789, 'Libraries & Demo': 906842, 'Lifestyle': 11595956, 'Lifestyle;Pretend Play': 140995, 'Card': 6653399, 'Arcade': 116248093, 'Puzzle': 21566256, 'Racing': 52623750, 'Sports': 65525584, 'Casual': 130682137, 'Simulation': 25855890, 'Adventure': 18132896, 'Trivia': 7175764, 'Action': 149868296, 'Word': 5250257, 'Role Playing': 20688317, 'Stra

Create a new version of display_table() that accepts a dictionary directly, such as the one returned by reviews_per_genre()...

In [75]:
def display_freq_table(table):
    # Parameter renamed so it no longer shadows the freq_table() function.
    # Build (value, key) tuples so that sorting orders entries by value.
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

Now display the sorted table showing play reviews per genre...

In [76]:
display_freq_table(play_reviews_per_genre)

Communication : 285739629
Tools : 228957337
Social : 227936113
Action : 149868296
Casual : 130682137
Arcade : 116248093
Photography : 105465239
Strategy : 101399187
Video Players & Editors : 67351839
Sports : 65525584
Entertainment : 55623415
Productivity : 55418917
Personalization : 53249961
Racing : 52623750
Shopping : 44553582
Travel & Local : 26802386
Simulation : 25855890
News & Magazines : 23085832
Puzzle : 21566256
Health & Fitness : 21319927
Role Playing : 20688317
Adventure : 18132896
Maps & Navigation : 17714646
Books & Reference : 16719063
Finance : 12639775
Weather : 12158805
Lifestyle : 11595956
Casual;Action & Adventure : 11312717
Business : 9865569
Education : 7684196
Trivia : 7175764
Education;Education : 7036946
Card : 6653399
Food & Drink : 6322667
Word : 5250257
Casino : 5103954
Adventure;Action & Adventure : 4539807
Board : 4132800
Music : 3896216
Dating : 3622290
Racing;Action & Adventure : 3047923
Puzzle;Brain Games : 2361690
Comics : 2341693
Casual;Pretend Play :

Now, look at Google Play categories...

In [77]:
play_reviews_per_category = reviews_per_genre(free_play_store_apps, 1, 3)

In [78]:
display_freq_table(play_reviews_per_category)

GAME : 589197554
COMMUNICATION : 285739629
TOOLS : 229299673
SOCIAL : 227936113
FAMILY : 189627665
PHOTOGRAPHY : 105465239
VIDEO_PLAYERS : 67630663
PRODUCTIVITY : 55418917
PERSONALIZATION : 53249961
SHOPPING : 44553582
SPORTS : 35198523
TRAVEL_AND_LOCAL : 26803276
ENTERTAINMENT : 25648941
NEWS_AND_MAGAZINES : 23085832
HEALTH_AND_FITNESS : 21319927
MAPS_AND_NAVIGATION : 17714646
BOOKS_AND_REFERENCE : 16719063
FINANCE : 12639775
WEATHER : 12158805
LIFESTYLE : 11736951
BUSINESS : 9865569
FOOD_AND_DRINK : 6322667
EDUCATION : 5798189
DATING : 3622290
COMICS : 2342209
HOUSE_AND_HOME : 1929789
ART_AND_DESIGN : 1407708
MEDICAL : 1167538
AUTO_AND_VEHICLES : 1159503
PARENTING : 949965
LIBRARIES_AND_DEMO : 906842
BEAUTY : 396240
EVENTS : 161018


For the Apple App Store, the relevant fields are prime_genre (index 11) and rating_count_tot (index 5)...

In [79]:
apple_reviews_per_genre = reviews_per_genre(free_app_store_apps, 11, 5)
print(apple_reviews_per_genre)

{'Social Networking': 7584125, 'Photo & Video': 4550647, 'Games': 42705967, 'Music': 3783551, 'Reference': 1348958, 'Health & Fitness': 1514371, 'Weather': 1463837, 'Utilities': 1513441, 'Travel': 1129752, 'Shopping': 2261254, 'News': 913665, 'Navigation': 516542, 'Lifestyle': 840774, 'Entertainment': 3563577, 'Food & Drink': 866682, 'Sports': 1587614, 'Book': 556619, 'Finance': 1132846, 'Education': 826470, 'Productivity': 1177591, 'Business': 127349, 'Catalogs': 16016, 'Medical': 3672}


Print as ordered table...

In [80]:
display_freq_table(apple_reviews_per_genre)

Games : 42705967
Social Networking : 7584125
Photo & Video : 4550647
Music : 3783551
Entertainment : 3563577
Shopping : 2261254
Sports : 1587614
Health & Fitness : 1514371
Utilities : 1513441
Weather : 1463837
Reference : 1348958
Productivity : 1177591
Finance : 1132846
Travel : 1129752
News : 913665
Food & Drink : 866682
Lifestyle : 840774
Education : 826470
Book : 556619
Navigation : 516542
Business : 127349
Catalogs : 16016
Medical : 3672


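For both the App Store and Play Store datasets, entertainment-related apps attract a large share of reviews.

In particular, games attracted the highest number of reviews on the Apple App Store (Games: 42705967) and, at the category level, on Google Play (GAME: 589197554). At the Google Play genre level, Communication comes first. This is because games are split across several genres, such as Action, Casual, Arcade and Strategy.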