In [1]:
%%html
<style>
table {align:left;display:block}
</style>

# to align html tables to left

# Profitable App Profiles for the App Store and Google Play Markets

## Introduction

For this project, we'll pretend we're working as data analysts for a company that builds Android and iOS mobile apps. We make our apps available on Google Play and in the App Store.

We only build apps that are free to download and install, and our main source of revenue consists of in-app ads. This means that the number of users of our apps determines our revenue for any given app — the more users who see and engage with the ads, the better. Our goal for this project is to analyze data to help our developers understand what type of apps are likely to attract more users.

### googleplaystore.csv

#### Description
A dataset containing data about approximately 10,000 Android apps from Google Play; the data was collected in August 2018.

#### Metadata
Dataset source: [Link](https://www.kaggle.com/lava18/google-play-store-apps?select=googleplaystore.csv)

| Column | Description |
| --- | --- |
| App | Application name |
| Category | Category the app belongs to |
| Rating | Overall user rating of the app (as when scraped) |
| Reviews | Number of user reviews for the app (as when scraped) |
| Size | Size of the app (as when scraped) |
| Installs | Number of user downloads/installs for the app (as when scraped) |
| Type | Paid or Free |
| Price | Price of the app (as when scraped) |
| Content Rating | Age group the app is targeted at - Children / Mature 21+ / Adult |
| Genres | An app can belong to multiple genres (apart from its main category). For example, a musical family game will belong to |
| Last Updated | Date when the app was last updated on Play Store (as when scraped) |
| Current Ver | Current version of the app available on Play Store (as when scraped) |
| Android Ver | Min required Android version (as when scraped) |

### AppleStore.csv

#### Description
A dataset containing data about approximately 7,000 iOS apps from the App Store; the data was collected in July 2017.

#### Metadata
Dataset source: [Link](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps?select=AppleStore.csv)

Dimensions of the dataset: 7,197 rows and 16 columns.

| Column | Description |
| :--- | --- |
| id | App ID |
| track_name | App Name |
| size_bytes | Size (in Bytes) |
| currency | Currency Type |
| price | Price amount |
| rating_count_tot | User rating count (all versions) |
| rating_count_ver | User rating count (current version) |
| user_rating | Average user rating (all versions) |
| user_rating_ver | Average user rating (current version) |
| ver | Latest version code |
| cont_rating | Content Rating |
| prime_genre | Primary Genre |
| sup_devices.num | Number of supporting devices |
| ipadSc_urls.num | Number of screenshots showed for display |
| lang.num | Number of supported languages |
| vpp_lic | Vpp Device Based Licensing Enabled |

## Prepare: Load, open, explore datasets

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

        
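def open_file(file_name, header=True):
    from csv import reader
    # the context manager closes the file once it has been read;
    # utf8 is stated explicitly so the result does not depend on the OS default
    with open(file_name, encoding='utf8') as opened_file:
        data = list(reader(opened_file))
    if header:
        return data[1:]  # the file has a header row: skip it
    return data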

google_df = open_file(file_name='googleplaystore.csv', header=True)

explore_data(dataset=google_df, start=0, end=2, rows_and_columns=True)


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


In [3]:
apple_df = open_file(file_name='AppleStore.csv', header=True)

explore_data(dataset=apple_df, start=0, end=2, rows_and_columns=True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of rows: 7197
Number of columns: 16


## Process: Identify and remove inaccurate data

In [4]:
# Per the Kaggle discussion forum, one row is missing a value:
# the row at index 10472 (header excluded) has no 'Category' entry,
# so its remaining values are shifted one column to the left
explore_data(dataset=google_df, start=10472, end=10473, rows_and_columns=True)

# redefine google_df to contain header for ease of review of header
google_df = open_file(file_name='googleplaystore.csv', header=False)
explore_data(dataset=google_df, start=0, end=1, rows_and_columns=False)

# redefine google_df to not contain header for ease of data manipulation
google_df = open_file(file_name='googleplaystore.csv', header=True)

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


Number of rows: 10841
Number of columns: 13
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']




In [5]:
# use the del statement to delete the malformed row
# (run this cell only once; running it again would delete a valid row)
del google_df[10472]
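Rather than relying on a known index, a malformed row like this one can also be found by comparing each row's length with the header's length. The following is a minimal sketch on toy data; the function name `find_malformed_rows` and the sample rows are illustrative assumptions, not part of the original notebook.

```python
# Toy data standing in for the real file (hypothetical values):
# a header plus three rows, one of which is missing a field.
header = ['App', 'Category', 'Rating']
rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],            # 'Category' is missing, so the row is short
    ['App C', 'GAME', '3.9'],
]

def find_malformed_rows(dataset, expected_len):
    """Return the indices of rows whose length differs from expected_len."""
    return [i for i, row in enumerate(dataset) if len(row) != expected_len]

bad_indices = find_malformed_rows(rows, len(header))
print(bad_indices)

# Delete from the end so that earlier indices stay valid.
for i in reversed(bad_indices):
    del rows[i]
print(len(rows))
```

Because the check looks at row length rather than a fixed index, re-running it after the deletion simply finds nothing to remove.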

## Process: Identify and remove duplicate data rows

In [6]:
# some apps appear more than once in the Google Play data
# build two lists: duplicate app names and unique app names

duplicate_apps = []
unique_apps = []

for row in google_df:
    name = row[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print('Number of duplicate apps', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[:2])

Number of duplicate apps 1181


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box']


In [7]:
# One possible approach to choosing which duplicate rows to remove
# (instead of removing them at random) is to keep the row with the highest
# number of reviews, since it is likely to be the most recent data.

# Step 1: create a dictionary where each key is a unique app name and the
# corresponding value is the highest number of reviews of that app.

# Step 2 (next cell): use the dictionary to create a new dataset with only
# one entry per app (for each app, the entry with the highest number of reviews).
reviews_max = {}  # create empty dictionary
for row in google_df:  #iterate each row in dataset
    name = row[0]  # assign value to variable
    n_reviews = float(row[3])  # assign value to variable
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews  # update value to latest higher value
    elif name not in reviews_max:
        reviews_max[name] = n_reviews  # assign variable (value) to key

reviews_max  # review reviews_max, should be dictionary with key-value pairs
len(reviews_max)  # check number of key-value pairs in dictionary

9659

In [8]:
# Use the dictionary created above to remove the duplicate rows

# create 2 empty lists
android_clean = []  
already_added = []

for row in google_df:  # iterate google dataset (list of list); should exclude header row already
    name = row[0]  # assign value to variable from data row
    n_reviews = float(row[3])  # assign value to variable from data row
    if name in reviews_max:
        max_n_review = reviews_max[name]  # locate max review number of this app
    if name not in already_added and n_reviews == max_n_review:
        android_clean.append(row)  # to append correct row to new cleaned dataset
        already_added.append(name)  # to keep track of app name already appended

android_clean[0:1]  # review android_clean; should be a cleaned list

[['Photo Editor & Candy Camera & Grid & ScrapBook',
  'ART_AND_DESIGN',
  '4.1',
  '159',
  '19M',
  '10,000+',
  'Free',
  '0',
  'Everyone',
  'Art & Design',
  'January 7, 2018',
  '1.0.0',
  '4.0.3 and up']]

In [9]:
# review cleaned dataset, 
# should have 9659 unique rows, consistent with reviews_max
len(android_clean)

9659
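As a quick sanity check, the row counts reported so far can be reconciled with simple arithmetic. The sketch below uses only numbers printed earlier in this notebook; the variable names are illustrative.

```python
rows_loaded = 10841      # rows reported by explore_data (header excluded)
malformed_removed = 1    # the single row deleted with del
n_duplicates = 1181      # length of duplicate_apps

# Every remaining row is either the first occurrence of a name or a duplicate,
# so the number of unique apps is the remainder.
expected_unique = rows_loaded - malformed_removed - n_duplicates
print(expected_unique)  # 9659
```

The result matches both `len(reviews_max)` and `len(android_clean)`.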

In [10]:
# Check any duplicate app entries for Apple store data
# checking method: see if any duplicate id
# Based on below results, there are no duplicate apps for apple dataset

duplicate_apps_apple = []
unique_apps_apple = []

for row in apple_df:
    app_id = row[0]
    if app_id in unique_apps_apple:
        duplicate_apps_apple.append(app_id)
    else:
        unique_apps_apple.append(app_id)
        
print('Number of duplicate apps for Apple dataset', len(duplicate_apps_apple))
print('\n')
print('Examples of duplicate apps for Apple dataset:', duplicate_apps_apple[:2])

Number of duplicate apps for Apple dataset 0


Examples of duplicate apps for Apple dataset: []


## Process: Identify and remove apps with non-English names

The numbers corresponding to the characters we commonly use in an English text are all in the range 0 to 127, according to the [ASCII](https://en.wikipedia.org/wiki/ASCII) (American Standard Code for Information Interchange) system. 

Based on this number range, we can build a function that detects whether a character belongs to the set of common English characters or not. 

If the number is equal to or less than 127, then the character belongs to the set of common English characters. 

If an app name contains a character whose number is greater than 127, then the app probably has a non-English name.
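To make the 0 to 127 rule concrete, the short sketch below prints the number that Python's built-in `ord()` returns for a few sample characters and whether each falls inside the ASCII range. The helper name `is_ascii_char` and the sample list are illustrative assumptions.

```python
def is_ascii_char(char):
    """Return True if the character's code point is in the ASCII range 0-127."""
    return ord(char) <= 127

# Sample characters: letters, a digit, a symbol, a trademark sign,
# a Chinese character and an emoji.
samples = ['a', 'Z', '5', '+', '™', '爱', '😜']
for char in samples:
    print(repr(char), ord(char), is_ascii_char(char))
```

Symbols such as `™` and emojis fall outside the range even though they often appear in English app names, which is why a strict per-character rule would discard too many rows.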

In [11]:
# ord() returns the Unicode code point of a character;
# code points 0 to 127 are the ASCII range (common English characters).

# To keep names that contain a few emojis or symbols such as '™'
# (code points above 127) and limit the loss of useful data, the function
# returns False only if more than 3 characters fall outside the ASCII range.
def check_eng(test_string):
    count = 0
    for char in test_string:
        if ord(char) > 127:
            count += 1
    return count <= 3  # True unless more than 3 non-ASCII characters

In [12]:
check_eng('Instagram')  # test function: check_eng

True

In [13]:
check_eng('爱奇艺PPS -《欢乐颂2》电视剧热播')  # test function: check_eng

False

In [14]:
check_eng('Docs To Go™ Free Office Suite')  # test function: check_eng

True

In [15]:
check_eng('Instachat 😜')  # test function: check_eng

True

In [16]:
# now apply function to cleaned dataset to remove non-english apps

android_clean_v2 = []  # create separate new empty list for cleaned data
for row in android_clean:
    name = row[0]
    if check_eng(name):  # keep the row if the name is treated as English
        android_clean_v2.append(row)

android_clean_v2[0:1]  # review cleaned dataset

[['Photo Editor & Candy Camera & Grid & ScrapBook',
  'ART_AND_DESIGN',
  '4.1',
  '159',
  '19M',
  '10,000+',
  'Free',
  '0',
  'Everyone',
  'Art & Design',
  'January 7, 2018',
  '1.0.0',
  '4.0.3 and up']]

In [17]:
# check number of rows for google cleaned dataset;
# should be lower if removed rows
# previous number of rows is 9659
len(android_clean_v2)

9614

In [18]:
# now apply the same sequence to apple dataset
apple_clean = [] # create separate new empty list for cleaned data
for row in apple_df:
    name = row[1]
    if check_eng(name):  # keep the row if the name is treated as English
        apple_clean.append(row)

apple_clean[0:1]  # review cleaned dataset

[['284882215',
  'Facebook',
  '389879808',
  'USD',
  '0.0',
  '2974676',
  '212',
  '3.5',
  '3.5',
  '95.0',
  '4+',
  'Social Networking',
  '37',
  '1',
  '29',
  '1']]

In [19]:
# check number of rows for apple cleaned dataset;
# should be lower if removed rows
# previous number of rows is 7,197
len(apple_clean)

6183

## Process: Isolate free apps for analysis

As mentioned in the introduction, we only build apps that are free to download and install, and our main source of revenue consists of in-app ads. Our datasets contain both free and non-free apps; we'll need to isolate only the free apps for our analysis.

In [20]:
android_clean_v3 = []
for row in android_clean_v2:
    types = row[6]
    if types == 'Free':
        android_clean_v3.append(row)

android_clean_v3[0:1]  # review the first row to check the result

[['Photo Editor & Candy Camera & Grid & ScrapBook',
  'ART_AND_DESIGN',
  '4.1',
  '159',
  '19M',
  '10,000+',
  'Free',
  '0',
  'Everyone',
  'Art & Design',
  'January 7, 2018',
  '1.0.0',
  '4.0.3 and up']]

In [21]:
# review if any rows are removed
# previous number of rows before clean is 9,614
len(android_clean_v3)  

8863

In [22]:
# perform similar sequence for apple cleaned data
# to only let free apps remain
apple_clean_v2 = []
for row in apple_clean:
    price = float(row[4])
    if price == 0.0:
        apple_clean_v2.append(row)

apple_clean_v2[0:1]  # review the first row to check the result

[['284882215',
  'Facebook',
  '389879808',
  'USD',
  '0.0',
  '2974676',
  '212',
  '3.5',
  '3.5',
  '95.0',
  '4+',
  'Social Networking',
  '37',
  '1',
  '29',
  '1']]

In [23]:
# review if any rows are removed
# previous number of rows before clean is 6,183
len(apple_clean_v2)

3222

## Analyse: Build a frequency table (transform the data)

As we mentioned in the introduction, our goal is to determine the kinds of apps that are likely to attract more users because the number of people using our apps affects our revenue.

Because our end goal is to add the app on both Google Play and the App Store, we need to find app profiles that are successful in both markets. For instance, a profile that works well for both markets might be a productivity app that makes use of gamification.

Let's begin the analysis by determining the most common genres for each market. For this, we'll need to build frequency tables for a few columns in our datasets.
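As a point of reference, a frequency table expressed in percentages can be sketched with `collections.Counter` from the standard library. The example below runs on toy rows; the function name `freq_percent` and the data are assumptions made for illustration, and the cells that follow build the same kind of table with plain dictionaries.

```python
from collections import Counter

# Toy rows standing in for the cleaned datasets: [app name, genre]
toy_rows = [
    ['App A', 'Games'],
    ['App B', 'Games'],
    ['App C', 'Education'],
    ['App D', 'Games'],
]

def freq_percent(dataset, index):
    """Return {value: percentage of rows}, most frequent value first."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return {key: round(count / total * 100, 2)
            for key, count in counts.most_common()}

table = freq_percent(toy_rows, 1)
print(table)
```

`Counter.most_common()` already returns the values sorted by count, so no separate sorting step is needed.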



In [24]:
# Generate frequency table for genre from Google dataset
google_genre = {}
for row in android_clean_v3:
    genre = row[9]  # get genre from cleaned dataset
    if genre not in google_genre:
        google_genre[genre] = 1
    elif genre in google_genre:
        google_genre[genre] += 1

google_genre  # review if ok

{'Art & Design': 53,
 'Art & Design;Creativity': 6,
 'Auto & Vehicles': 82,
 'Beauty': 53,
 'Books & Reference': 190,
 'Business': 407,
 'Comics': 54,
 'Comics;Creativity': 1,
 'Communication': 287,
 'Dating': 165,
 'Education': 474,
 'Education;Creativity': 4,
 'Education;Education': 30,
 'Education;Pretend Play': 5,
 'Education;Brain Games': 3,
 'Entertainment': 538,
 'Entertainment;Brain Games': 7,
 'Entertainment;Creativity': 3,
 'Entertainment;Music & Video': 15,
 'Events': 63,
 'Finance': 328,
 'Food & Drink': 110,
 'Health & Fitness': 273,
 'House & Home': 73,
 'Libraries & Demo': 83,
 'Lifestyle': 345,
 'Lifestyle;Pretend Play': 1,
 'Card': 40,
 'Arcade': 164,
 'Puzzle': 100,
 'Racing': 88,
 'Sports': 307,
 'Casual': 156,
 'Simulation': 181,
 'Adventure': 60,
 'Trivia': 37,
 'Action': 275,
 'Word': 23,
 'Role Playing': 83,
 'Strategy': 80,
 'Board': 34,
 'Music': 18,
 'Action;Action & Adventure': 9,
 'Casual;Brain Games': 12,
 'Educational;Creativity': 3,
 'Puzzle;Brain Games':

In [25]:
# Generate frequency table for genre from Apple dataset
apple_genre = {}
for row in apple_clean_v2:
    genre = row[11]  # get genre from cleaned dataset
    if genre not in apple_genre:
        apple_genre[genre] = 1
    else:
        apple_genre[genre] += 1

apple_genre  # review if ok

{'Social Networking': 106,
 'Photo & Video': 160,
 'Games': 1874,
 'Music': 66,
 'Reference': 18,
 'Health & Fitness': 65,
 'Weather': 28,
 'Utilities': 81,
 'Travel': 40,
 'Shopping': 84,
 'News': 43,
 'Navigation': 6,
 'Lifestyle': 51,
 'Entertainment': 254,
 'Food & Drink': 26,
 'Sports': 69,
 'Book': 14,
 'Finance': 36,
 'Education': 118,
 'Productivity': 56,
 'Business': 17,
 'Catalogs': 4,
 'Medical': 6}

In [26]:
# Generate google genre frequency table that shows percentages

total_google = len(android_clean_v3)  # from number of rows in cleaned dataset

google_genre_percent = {}
for key in google_genre:
    proportion = google_genre[key] / total_google
    percentage = round(proportion * 100, 2)  # rounded to 2 decimal places
    google_genre_percent[key] = percentage

google_genre_percent  # review results

{'Art & Design': 0.6,
 'Art & Design;Creativity': 0.07,
 'Auto & Vehicles': 0.93,
 'Beauty': 0.6,
 'Books & Reference': 2.14,
 'Business': 4.59,
 'Comics': 0.61,
 'Comics;Creativity': 0.01,
 'Communication': 3.24,
 'Dating': 1.86,
 'Education': 5.35,
 'Education;Creativity': 0.05,
 'Education;Education': 0.34,
 'Education;Pretend Play': 0.06,
 'Education;Brain Games': 0.03,
 'Entertainment': 6.07,
 'Entertainment;Brain Games': 0.08,
 'Entertainment;Creativity': 0.03,
 'Entertainment;Music & Video': 0.17,
 'Events': 0.71,
 'Finance': 3.7,
 'Food & Drink': 1.24,
 'Health & Fitness': 3.08,
 'House & Home': 0.82,
 'Libraries & Demo': 0.94,
 'Lifestyle': 3.89,
 'Lifestyle;Pretend Play': 0.01,
 'Card': 0.45,
 'Arcade': 1.85,
 'Puzzle': 1.13,
 'Racing': 0.99,
 'Sports': 3.46,
 'Casual': 1.76,
 'Simulation': 2.04,
 'Adventure': 0.68,
 'Trivia': 0.42,
 'Action': 3.1,
 'Word': 0.26,
 'Role Playing': 0.94,
 'Strategy': 0.9,
 'Board': 0.38,
 'Music': 0.2,
 'Action;Action & Adventure': 0.1,
 'Casua

In [27]:
# Generate apple genre frequency table that shows percentages
total_apple = len(apple_clean_v2)  # from number of rows in cleaned dataset

apple_genre_percent = {}
for key in apple_genre:
    proportion = apple_genre[key] / total_apple
    percentage = round(proportion * 100, 2)  # rounded to 2 decimal places
    apple_genre_percent[key] = percentage

apple_genre_percent  # review results

{'Social Networking': 3.29,
 'Photo & Video': 4.97,
 'Games': 58.16,
 'Music': 2.05,
 'Reference': 0.56,
 'Health & Fitness': 2.02,
 'Weather': 0.87,
 'Utilities': 2.51,
 'Travel': 1.24,
 'Shopping': 2.61,
 'News': 1.33,
 'Navigation': 0.19,
 'Lifestyle': 1.58,
 'Entertainment': 7.88,
 'Food & Drink': 0.81,
 'Sports': 2.14,
 'Book': 0.43,
 'Finance': 1.12,
 'Education': 3.66,
 'Productivity': 1.74,
 'Business': 0.53,
 'Catalogs': 0.12,
 'Medical': 0.19}

In [28]:
# alternatively, we can use helper functions to generate frequency table
# write a helper function to generate frequency table
# this function will be used by the helper function below 'display_table'
def freq_table(dataset, index):
    data_freq_table = {}
    length_table = len(dataset)
    for row in dataset:  # generate frequency table by counts
        key = row[index]  # get index value from cleaned dataset
        if key not in data_freq_table:
            data_freq_table[key] = 1
        elif key in data_freq_table:
            data_freq_table[key] += 1
    for key in data_freq_table:
        proportion = data_freq_table[key] / length_table
        percentage = round(proportion * 100, 2)  # rounded to 2 decimal places
        data_freq_table[key] = percentage  # replace values with percentages
    return data_freq_table

In [29]:
# write a helper function to sort a dictionary by values
# in descending order
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

In [30]:
# test the helper function 'display_table'
# use it on google cleaned dataset
display_table(android_clean_v3, 9)  # index 9 is genre for google dataset

Tools : 8.45
Entertainment : 6.07
Education : 5.35
Business : 4.59
Productivity : 3.89
Lifestyle : 3.89
Finance : 3.7
Medical : 3.53
Sports : 3.46
Personalization : 3.32
Communication : 3.24
Action : 3.1
Health & Fitness : 3.08
Photography : 2.94
News & Magazines : 2.8
Social : 2.66
Travel & Local : 2.32
Shopping : 2.25
Books & Reference : 2.14
Simulation : 2.04
Dating : 1.86
Arcade : 1.85
Video Players & Editors : 1.77
Casual : 1.76
Maps & Navigation : 1.4
Food & Drink : 1.24
Puzzle : 1.13
Racing : 0.99
Role Playing : 0.94
Libraries & Demo : 0.94
Auto & Vehicles : 0.93
Strategy : 0.9
House & Home : 0.82
Weather : 0.8
Events : 0.71
Adventure : 0.68
Comics : 0.61
Beauty : 0.6
Art & Design : 0.6
Parenting : 0.5
Card : 0.45
Casino : 0.43
Trivia : 0.42
Educational;Education : 0.39
Board : 0.38
Educational : 0.37
Education;Education : 0.34
Word : 0.26
Casual;Pretend Play : 0.24
Music : 0.2
Racing;Action & Adventure : 0.17
Puzzle;Brain Games : 0.17
Entertainment;Music & Video : 0.17
Casual;B

In [31]:
# test the helper function 'display_table'
# use it on google cleaned dataset
display_table(android_clean_v3, 1)  # index 1 is category for google dataset

FAMILY : 18.9
GAME : 9.73
TOOLS : 8.46
BUSINESS : 4.59
LIFESTYLE : 3.9
PRODUCTIVITY : 3.89
FINANCE : 3.7
MEDICAL : 3.53
SPORTS : 3.4
PERSONALIZATION : 3.32
COMMUNICATION : 3.24
HEALTH_AND_FITNESS : 3.08
PHOTOGRAPHY : 2.94
NEWS_AND_MAGAZINES : 2.8
SOCIAL : 2.66
TRAVEL_AND_LOCAL : 2.34
SHOPPING : 2.25
BOOKS_AND_REFERENCE : 2.14
DATING : 1.86
VIDEO_PLAYERS : 1.79
MAPS_AND_NAVIGATION : 1.4
FOOD_AND_DRINK : 1.24
EDUCATION : 1.16
ENTERTAINMENT : 0.96
LIBRARIES_AND_DEMO : 0.94
AUTO_AND_VEHICLES : 0.93
HOUSE_AND_HOME : 0.82
WEATHER : 0.8
EVENTS : 0.71
PARENTING : 0.65
ART_AND_DESIGN : 0.64
COMICS : 0.62
BEAUTY : 0.6


In [32]:
# test the helper function 'display_table'
# use it on apple cleaned dataset
display_table(apple_clean_v2, 11)  # index 11 is genre for apple dataset

Games : 58.16
Entertainment : 7.88
Photo & Video : 4.97
Education : 3.66
Social Networking : 3.29
Shopping : 2.61
Utilities : 2.51
Sports : 2.14
Music : 2.05
Health & Fitness : 2.02
Productivity : 1.74
Lifestyle : 1.58
News : 1.33
Travel : 1.24
Finance : 1.12
Weather : 0.87
Food & Drink : 0.81
Reference : 0.56
Business : 0.53
Book : 0.43
Navigation : 0.19
Medical : 0.19
Catalogs : 0.12


## Analyse / Share: Most common genre/category

Google dataset:
- Most common genre: Tools at 8.45% (a modest share)
- 2nd most common genre: Entertainment at 6.07%
- 3rd most common genre: Education at 5.35%
- Most common category: Family at 18.9% (noticeably higher, but still under a fifth)
- 2nd most common category: Game at 9.73%
- 3rd most common category: Tools at 8.46%

Apple dataset:
- Most common genre: Games at 58.16% (more than half!)
- 2nd most common genre: Entertainment at 7.88%
- 3rd most common genre: Photo & Video at 4.97%


It would appear that:
- Free English apps on Google Play are spread fairly evenly across practical and entertainment genres
- Games dominate the free English apps on the App Store
    - They make up more than half of the apps in the sample.
    
This shows that games are by far the most commonly offered apps on the App Store. Note that these shares count how many apps exist in each genre, not how many users those apps attract.

## Analyse / Share:  App Store - Alternative metric for popularity (rating count)

However, we can also consider alternative metrics for popularity. The frequency tables above count how many apps exist in each category or genre, so they describe the supply side of the marketplace.

Measuring popularity from the user (demand) side may lead to different insights. Arguably, demand is as important as supply, if not more so, in determining commercial viability.

In [33]:
# use helper function - freq_table written earlier,
# to assist in generating frequency table
apple_genre = freq_table(dataset=apple_clean_v2, index=11)  # index11: genre

# initiate nested loops
for genre in apple_genre:
    total = 0  # Store sum of user ratings specific to each genre
    len_genre = 0 # Store number of apps specific to each genre
    genre_app = genre
    for row in apple_clean_v2:
        if row[11] == genre_app:  # index11: genre
            rating_count = float(row[5])  # index5: user rating counts(all versions)
            total += rating_count  # accumulate user rating sums for specific genre
            len_genre += 1  # accumulate number of apps specific to genre
    avg_user_numbers = round(total / len_genre, 0)  # compute avg user numbers each specific genre
    print("App genre: ", genre, "\n", "Average users based on rating counts: ", avg_user_numbers, "\n")
    


App genre:  Social Networking 
 Average users based on rating counts:  71548.0 

App genre:  Photo & Video 
 Average users based on rating counts:  28442.0 

App genre:  Games 
 Average users based on rating counts:  22789.0 

App genre:  Music 
 Average users based on rating counts:  57327.0 

App genre:  Reference 
 Average users based on rating counts:  74942.0 

App genre:  Health & Fitness 
 Average users based on rating counts:  23298.0 

App genre:  Weather 
 Average users based on rating counts:  52280.0 

App genre:  Utilities 
 Average users based on rating counts:  18684.0 

App genre:  Travel 
 Average users based on rating counts:  28244.0 

App genre:  Shopping 
 Average users based on rating counts:  26920.0 

App genre:  News 
 Average users based on rating counts:  21248.0 

App genre:  Navigation 
 Average users based on rating counts:  86090.0 

App genre:  Lifestyle 
 Average users based on rating counts:  16486.0 

App genre:  Entertainment 
 Average users based on

### Findings (Apple Apps) - Alternative metric on popularity:

Top 3 app genres by average number of user ratings per app (used as a proxy for the number of users):
- Navigation: 86,090
- Reference: 74,942
- Social Networking: 71,548

Based on this alternative metric, these 3 genres in the App Store are also worth exploring for commercial opportunities.
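The per-genre averages above come from a nested loop that rescans the dataset once per genre. As an alternative, the same averages can be accumulated in a single pass. This is a minimal sketch on toy rows; the function name `average_by_group` and the data are illustrative assumptions, not the notebook's own code.

```python
from collections import defaultdict

# Toy rows standing in for the cleaned dataset: [genre, rating count]
toy_rows = [
    ['Games', '100'],
    ['Games', '300'],
    ['Navigation', '500'],
]

def average_by_group(dataset, group_index, value_index):
    """Return {group: mean of the numeric column} using one pass over the rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}

averages = average_by_group(toy_rows, 0, 1)

# Print the groups from highest to lowest average.
for group, avg in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(group, round(avg))
```

Sorting the result makes the top genres easy to read off without scanning the full printed list.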

## Analyse / Share:  Google Play - Alternative metric for popularity (installs count)

In [34]:
# view install count tables of Google play Store
# we need to remove special characters to perform computations later
# str.replace(old, new)
display_table(android_clean_v3, 5)  #index5: Number of installs

1,000,000+ : 15.73
100,000+ : 11.55
10,000,000+ : 10.55
10,000+ : 10.2
1,000+ : 8.39
100+ : 6.92
5,000,000+ : 6.83
500,000+ : 5.56
50,000+ : 4.77
5,000+ : 4.51
10+ : 3.54
500+ : 3.25
50,000,000+ : 2.3
100,000,000+ : 2.13
50+ : 1.92
5+ : 0.79
1+ : 0.51
500,000,000+ : 0.27
1,000,000,000+ : 0.23
0+ : 0.05
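The install counts are stored as strings such as `1,000,000+`, so they have to be cleaned before any arithmetic. A minimal sketch of that conversion is shown below; the helper name `parse_installs` is an assumption made for illustration.

```python
def parse_installs(text):
    """Convert an install bucket such as '1,000,000+' into an integer."""
    cleaned = text.replace(',', '').replace('+', '')  # drop ',' and '+'
    return int(cleaned)

print(parse_installs('1,000,000+'))
print(parse_installs('0+'))
```

Note that each value is a lower bound (an app listed as `100,000+` may have far more installs), so averages built from these numbers are approximate.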


In [40]:
# generate frequency table for 'Category' column from Google cleaned dataset
# use helper function - freq_table written earlier,
# to assist in generating frequency table
google_category = freq_table(dataset=android_clean_v3, index=1)  # index1: category

# initiate nested loops
for category in google_category:
    total = 0  # Store sum of installs specific to each category
    len_category = 0 # Store number of apps specific to each category
    category_app = category
    for row in android_clean_v3:
        if row[1] == category_app:  # index1: category
            install_count = row[5]  # index5: no. of installs for specific app
            # str.replace(old, new)
            install_count = install_count.replace('+', '')  # remove '+'
            install_count = install_count.replace(',', '')  # remove ','
            install_count = float(install_count)  # typecast str into float
            total += install_count  # accumulate install counts sums for specific category
            len_category += 1  # accumulate number of apps specific to category
    avg_user_numbers = round(total / len_category, 0)  # compute avg user numbers each specific category
    print("App category: ", category, "\n", "Average users based on install counts: ", avg_user_numbers, "\n")
    

App category:  ART_AND_DESIGN 
 Average users based on install counts:  1986335.0 

App category:  AUTO_AND_VEHICLES 
 Average users based on install counts:  647318.0 

App category:  BEAUTY 
 Average users based on install counts:  513152.0 

App category:  BOOKS_AND_REFERENCE 
 Average users based on install counts:  8767812.0 

App category:  BUSINESS 
 Average users based on install counts:  1712290.0 

App category:  COMICS 
 Average users based on install counts:  817657.0 

App category:  COMMUNICATION 
 Average users based on install counts:  38456119.0 

App category:  DATING 
 Average users based on install counts:  854029.0 

App category:  EDUCATION 
 Average users based on install counts:  1833495.0 

App category:  ENTERTAINMENT 
 Average users based on install counts:  11640706.0 

App category:  EVENTS 
 Average users based on install counts:  253542.0 

App category:  FINANCE 
 Average users based on install counts:  1387692.0 

App category:  FOOD_AND_DRINK 
 Average

### Findings (Google Apps) - Alternative metric on popularity:

Top app categories by average number of installs per app:
- Communication: 38,456,119
- Video players: 24,727,872
- Social: 23,253,652
- Photography: 17,840,110
- Productivity: 16,787,331
- Game: 15,588,016

Based on this alternative metric, these categories on Google Play are also worth exploring for commercial opportunities.

## Act (Further internal discussion for opportunities)

Further discussions can be held with the product and operations departments to decide which specific app features to develop and push in each app store, based on:
- supply & demand of the marketplace (data-driven insights from data analytics project)
- our internal development resources