# What type of apps are likely to attract more users

### Which has the most profitable apps: Android or IOS?

### What are the most popular genres?

Requirements of the project:
- Working as a Data Analyst for a company that builds Android and IOS mobile apps.
- Objective: to define what type of apps are likely to attract more users.
- Analysed data: from Statist (German Statistics office). 
[August2018](https://www.kaggle.com/datasets/lava18/google-play-store-apps)
[July2018](https://www.kaggle.com/datasets/ramamet4/app-store-apple-data-set-10k-apps)

## The datasets

### Mobile App Statistics (Apple iOS app store)
This data set contains more than 7000 Apple iOS mobile application details. The data was extracted from the iTunes Search API at the Apple Inc website. R and linux web scraping tools were used for this study. The variables are poorly described in the data source.

Data collection date (from API);
July 2017

[Aplestore](https://www.kaggle.com/datasets/ramamet4/app-store-apple-data-set-10k-apps)

### Google Play Store Apps
This information is scraped from the Google Play Store. The date of scraping was not described by the author, and this is a limitation of this dataset. This dataset has most of the qualitative variables.

[Googleplaystore](https://www.kaggle.com/datasets/lava18/google-play-store-apps)

## Open datasets

In [1]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [2]:
# open dataset AppleStore.csv and transform in a list of lists
opened_file = open('AppleStore.csv')
from csv import reader
read_file = reader(opened_file)
ios = list(read_file)
ios_header = ios[0]
ios_data = ios[1:]

In [3]:
print(explore_data(ios, 0, 3, True))

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of rows: 7198
Number of columns: 16
None


In [4]:
# count rows
r=0
for row in ios:
    r += 1
print(r)    

7198


In [5]:
# open dataset googleplaystore.csv and transform in a list of lists
opened_file = open('googleplaystore.csv')
from csv import reader
read_file = reader(opened_file)
android = list(read_file)
android_header = android[0]
android_data = android[1:]

In [6]:
# count rows
r=0
for row in android:
    r += 1
print(r) 

10842


In [7]:
print(explore_data(android, 0, 3, True))

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


Number of rows: 10842
Number of columns: 13
None


## Exploring datasets and Cleaning data

The datasets are cleaned to improve accuracy, check user engagement, and allow us to evaluate the kinds of apps that are likely to attract more users and, consequently, increase our revenue.

1-  Detect duplicate data, and remove the duplicates: to reduce errors and risks

2- Detect inaccurate data, and correct or remove it:
- Remove non-English apps
    > Most of the users use apps in English and keeping only with this language, we reach most users and reduce machine costs in evaluating. <br><br>
    > To classify the apps as English or not, ASCII is used.
    > English texts are encoded using the ASCII standard. Each ASCII character has a corresponding number between 0 and 127 associated with it, so we can build a function that checks an app name and tells us whether it contains non-ASCII characters.

- Remove apps that aren't free
    > I keep with the free apps. Most of the apps make money inside the app, having free installations. This business type makes sense for new apps like ours new apps.
    
3- Examining genres of apps.
- Most reviewed genres
    > Reviews are used to find engagement index of user with the app. Reviews are a benchmark for measuring user acceptance of an app.<br>
    
- Most installed genres
    > By the number of apps installed in certain genres, we can identify demand by genre, considering different platforms.
    

### Explore applestore_data

#### Check the number of columns in each row

In [8]:
# create a function to check and delete the rows with missing data/columns in the datasets
def check_missing_values(data):
    is_all_valid = True
    for row in data:
        n_columns = len(data[0])  # n_of_headers:
        if len(row) != n_columns:
            print(data.index(row))
            print('The row index ', data.index(row), ' is missing ', n_columns - len(row), ' column(s). This row is being deleted.')
            index_to_delete = data.index(row)
            del data[index_to_delete]
            is_all_valid = False

    if is_all_valid:
        print('There is no missing columns in the dataset.')

In [9]:
check_missing_values(ios)

There is no missing columns in the dataset.


In [10]:
# count rows
r=0
for row in ios:
    r += 1
print(r) 

7198


In [11]:
check_missing_values(android)

10473
The row index  10473  is missing  1  column(s). This row is being deleted.


In [12]:
# count rows
r=0
for row in android:
    r += 1
print(r) 

10841


### Check duplicate entries

In [13]:
# ios
for app in ios:
    name = app[0]
    if name == 'Instagram':
        print(app)

In [14]:
print(android[0])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


In [15]:
# googleplaystore_data
for app in android:
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


In [16]:
# Finding the number of duplicate app:  # IOS
unique_apps = []
duplicate_apps = []
for app in ios:
    app_name =  app[0]
    if app_name in unique_apps:
        duplicate_apps.append(app_name)
    else:     
        unique_apps.append(app_name)
        
print('Number of duplicate IOS apps: ', len(duplicate_apps))
if duplicate_apps:
    print('Examples of duplicate IOS apps: ', duplicate_apps[:3])
else:
    print('There is no duplicate IOS apps.')

Number of duplicate IOS apps:  0
There is no duplicate IOS apps.


In [17]:
# Finding the number of repeated app:  # Android
unique_apps = []
duplicate_apps = []
for app in android:
    app_name =  app[0]
    if app_name in unique_apps:
        duplicate_apps.append(app_name)
    else:     
        unique_apps.append(app_name)
        
print('Number of duplicate Android apps: ', len(duplicate_apps))
if duplicate_apps:
    print('Examples of duplicate Android apps: ', duplicate_apps[:15])
else:
    print('There is no duplicate Android apps.')

Number of duplicate Android apps:  1181
Examples of duplicate Android apps:  ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


The **IOS** dataset, from googleplaystore, has **1181** multiple entries to the app name. **'Quick PDF Scanner + OCR FREE'**, **'Box'**, **'Google My Business'** , **'Slack'**, **'ZOOM'** are some examples of repeated apps.

### Remove duplicates in apps, keeping the ones with more reviews.

In [18]:
print(explore_data(android, 0, 3, True))

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13
None


In [19]:
# Android

reviews_max = {}

for app in android[1:]:   # excluding the header
    name = app[0]   
    n_reviews = len(app[3])  
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
        
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
        

In [20]:
print('Expected length:', len(android[1:]) - 1181)
print('Actual length:', len(reviews_max))   # considering unique values

Expected length: 9659
Actual length: 9659


In [21]:
android_clean = []
already_added = []

for app in android[1:]:
    name = app[0]
    n_reviews = len(app[3])
    
    if (reviews_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(app)   # rows  
        already_added.append(name)   # values in apps

In [22]:
print(explore_data(android_clean, 0, 3, True))

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 9659
Number of columns: 13
None


### Remove Non-English Apps

In [23]:
def is_english(string):
    for character in string:
        if ord(character) > 127:
            return False
    return True

print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))
print(ord('™'))
print(ord('😜'))

True
False
False
False
8482
128540


In [24]:
def is_english_0(string):
    non_ascii = 0
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
            
        if non_ascii > 3:
            return False
        else:
            return True
# print(is_english('Instagram'))
# print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
# print(is_english('Docs To Go™ Free Office Suite'))
# print(is_english(is_english(txt)))
print(is_english_0('欢乐颂2'))

True


Trying different because the code above didn't work well.

### Using Python String isascii() Method

In [25]:
def is_english(txt):
    n_non_englisg_letter = 0

    for letter in txt:
        if letter.isascii() == False:
            n_non_englisg_letter += 1

    if n_non_englisg_letter > 3:
        return False

    else:    
        return True

# txt = "Company123欢乐欢乐欢乐欢乐"
txt = '最長１週間の献立が簡単に作れるme:new（ミーニュー）'
# txt = 'Instachat 😜'
# txt = 'Instagram'
# txt = '爱奇艺PPS -《欢乐颂2》电视剧热播'
# txt = 'Docs To Go™ Free Office Suite'
is_english(txt)

False

This second option worked better than using the 1st method.
I considered words with more than 3 characters, not ascii as non-English.

In [26]:
ios_english = []
android_english = []

for app in ios[1:]:
    name = app[0]
    if is_english(name) == True:
        ios_english.append(app)
        
explore_data(ios_english, 0, 3, True)
print('\n')

for app in android_clean:
    name = app[0]
    if is_english(name) == True:
        android_english.append(app)
        
explore_data(android_english, 0, 3, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1,

We're left with 7197 IOS apps and 9614 Android apps.

### Keeping only with free apps

In [27]:
ios_final = [ios_english[0]]
android_final = [android_english[0]]

for app in ios_english:
    type = app[4]
    if type == '0.0':
        ios_final.append(app)
    
explore_data(ios_final, 0, 3, True)
print('\n')

for app in android_english:
    type = app[7]
    if type == '0':
        android_final.append(app)
    
explore_data(android_final, 0, 3, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of rows: 4057
Number of columns: 16


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018'

We found 4057 free IOS apps and 8865 free android apps, with titles in English. 

### Transforming the numerical variables into integer and float
In order to check how spread the observations are.

In [28]:
# transform objects (strings) in float and integer  # IOS apps

# ios_cols_category = ios_free[['track_name','currency', 'user_rating', 'ver','cont_rating', 'prime_genre']]
# ios_cols_numeric = ios_free['size_bytes', 'price', 'rating_count_tot', 'rating_count_ver', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
import pandas as pd
cols = ios[0]
df_ios = pd.DataFrame(ios_final[1:], columns = cols)
df_ios

to_transform_to_int = ['size_bytes','rating_count_tot', 'rating_count_ver', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
for var in to_transform_to_int:
    df_ios[var] = df_ios[var].astype('int')

df_ios['price'] = df_ios['price'].astype('float')

# df_ios.info()
df_ios.describe()


Unnamed: 0,size_bytes,price,rating_count_tot,rating_count_ver,sup_devices.num,ipadSc_urls.num,lang.num,vpp_lic
count,4056.0,4056.0,4056.0,4056.0,4056.0,4056.0,4056.0,4056.0
mean,147935700.0,0.0,19749.8,569.400888,37.428254,3.585552,5.732495,0.994822
std,208901400.0,0.0,97744.28,4134.301293,2.954281,2.041633,8.505148,0.071777
min,767126.0,0.0,0.0,0.0,9.0,0.0,0.0,0.0
25%,54041340.0,0.0,22.0,1.0,37.0,2.0,1.0,1.0
50%,99600380.0,0.0,466.0,22.0,37.0,5.0,1.0,1.0
75%,161198600.0,0.0,5450.75,162.25,38.0,5.0,9.0,1.0
max,3148421000.0,0.0,2974676.0,117470.0,47.0,5.0,75.0,1.0


Among the numeric variables for IOS apps, `size_bytes`, `sup_devices.num`, `ipadSc_urls.num`,	and	`vpp_lic` are the most reliable for having lower std. However, we don't have description to  `sup_devices.num`, `ipadSc_urls.num`, and	`vpp_lic`. So I will not analyze these in detail.

In [29]:
# transform objects (strings) in float and integer  # Android apps

import pandas as pd
cols = android[0]
df_android = pd.DataFrame(android_final[1:], columns = cols)
df_android

to_transform_to_int_android = ['Reviews', 'Price']
for var in to_transform_to_int_android:
    df_android[var] = df_android[var].astype('int')

# df_ios.info()
df_android.describe()

Unnamed: 0,Reviews,Price
count,8864.0,8864.0
mean,235433.7,0.0
std,1910437.0,0.0
min,0.0,0.0
25%,30.0,0.0
50%,1403.0,0.0
75%,35528.5,0.0
max,78158310.0,0.0


Reviews have an OK to bad score for standard deviation which makes it a reliable variable for deep analysis.

### Defining percentages to evaluate genres
- One function to generate frequency tables that show percentages
- Another function that we can use to display the percentages in a descending order

In [30]:
def freq_table(dataset, index):
    table = {}   # create an empty dictionary
    total = 0    # counter
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1  # since it will happen many times, this came first. 
                    # Thus, we avoid to make the machine iterate by code for nothing.
        else:
            table[value] = 1   # this will hapend only in the first iteration.
            
    table_percentages = {}
    
    for key in table:
        percentage = (table[key]/total) * 100
        table_percentages[key] = percentage
        
    return table_percentages

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []    # create an empty list
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse=True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

### Examining the apps genre

In [31]:
# for ios app, removing the header
display_table(ios_final[1:], -5)   # prime_genre

Games : 55.64595660749507
Entertainment : 8.234714003944774
Photo & Video : 4.117357001972387
Social Networking : 3.5256410256410255
Education : 3.2544378698224854
Shopping : 2.983234714003945
Utilities : 2.687376725838264
Lifestyle : 2.3175542406311638
Finance : 2.0710059171597637
Sports : 1.947731755424063
Health & Fitness : 1.8737672583826428
Music : 1.6518737672583828
Book : 1.6272189349112427
Productivity : 1.5285996055226825
News : 1.4299802761341223
Travel : 1.3806706114398422
Food & Drink : 1.0601577909270217
Weather : 0.7642998027613412
Reference : 0.4930966469428008
Navigation : 0.4930966469428008
Business : 0.4930966469428008
Catalogs : 0.22189349112426035
Medical : 0.19723865877712032


The percentages show that the most popular IOS apps are focused on entertainment. [Publications](https://www.macrumors.com/2021/12/02/apple-most-downloaded-apps-2021/) ratify the information.

In [32]:
# for android app, removing the header
display_table(android_final[1:],1)   # category

FAMILY : 18.44539711191336
GAME : 9.882671480144404
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.782490974729242
MAPS_AND_NAVIGATION : 1.3989169675090252
EDUCATION : 1.2748194945848377
FOOD_AND_DRINK : 1.2409747292418771
ENTERTAINMENT : 1.128158844765343
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
ART_AND_DESIGN : 0.6768953068592057
PARENTING : 0.6

In [33]:
# for android app, removing the header
display_table(android_final[1:],-4)   # genres

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

According to the [description of the variables of the dataset](https://www.kaggle.com/datasets/lava18/google-play-store-apps) and observation from the results for frequencies, `Genres` are more granulated than `Categories`.

At first glance, I can say that `Games` apps is the most popular genre  in IOS with 55,4%, presenting only 10,59% in Android. `Education` has 3.25% of the apps in IOS and 1,51% in Android. As well as, `Entertainment` is more significant in IOS (8,23%) than in Android (1,47%). 

Android stands out for `Business`, rated in Android with 4,5% and in IOS with 0,49%, for `Productivity` with 3,95% in Android while IOS presents 1,52%, for `Medical` apps (3,54%)  what does not happen with IOS (0,19%), and for `Travel` 2,46% while IOS presents 1,38%. 

`Food and drink` was not a significant genre in apps from IOS (1,06%) or Android(1,25%). 

Apps of Games and entertainment are expected to have more frequency and engagement than functional apps such as food, finance and travel.

Note: Since the categories are different between IOS and Android apps, a manual normalization or a data visualization could give better insights. 

### Examining reviews in Games

In [34]:
for app in android_final:
    if app[1] == 'GAME':
        print(app[0], ':', app[3], ':', app[2])  # print name and number of ratings

ROBLOX : 4447388 : 4.5
Subway Surfers : 27722264 : 4.5
Candy Crush Saga : 22426677 : 4.4
Solitaire : 254258 : 4.7
Bubble Shooter : 148897 : 4.5
Hello Kitty Nail Salon : 369203 : 4.2
slither.io : 5234162 : 4.4
Clash Royale : 23133508 : 4.6
Temple Run 2 : 8118609 : 4.3
Pou : 10485308 : 4.3
Helix Jump : 1497361 : 4.2
Block Puzzle : 59800 : 4.6
Angry Birds Rio : 2610526 : 4.4
Plants vs. Zombies FREE : 4066989 : 4.4
Sonic Dash : 3778921 : 4.5
Candy Crush Soda Saga : 6198563 : 4.4
Zombie Hunter King : 10306 : 4.3
Clash of Clans : 44891723 : 4.6
Kick the Buddy : 1000417 : 4.3
Block Puzzle Classic Legend ! : 17039 : 4.2
PAC-MAN : 685981 : 4.2
Super Jim Jump - pixel 3d : 10393 : 4.5
8 Ball Pool : 14198297 : 4.5
Magic Tiles 3 : 592068 : 4.5
Bubble Witch 3 Saga : 1732263 : 4.7
Word Search : 295241 : 4.7
Granny : 1135631 : 4.5
Angry Birds Classic : 5566669 : 4.4
Flow Free : 1295557 : 4.3
Race the Traffic Moto : 270687 : 3.8
Fishdom : 2157930 : 4.6
Galaxy Attack: Alien Shooter : 506275 : 4.6
Zombie

Bike Race Free - Top Motorcycle Racing Games : 2586261 : 4.5
Smash Hit : 4147718 : 4.5
Temple Run : 4000433 : 4.3
Super Car F. Mod for MCPE : 214 : 3.6
Gangstar Vegas - mafia game : 4830407 : 4.5
Rolling G Sky : 2065 : 3.6
G-Switch 2 : 12683 : 4.4
G-Switch 3 : 3656 : 4.6
G-Switch : 4116 : 3.8
Radial-G : Infinity : 517 : 3.6
Geometry Dash Lite : 6181640 : 4.5
Car Driving Simulator Drift : 19816 : 4.4
RC City Police Heavy Traffic Racer : 3250 : 4.4
Motorbike Driving Simulator 3D : 178723 : 3.9
Helix : 2454 : 3.9
Solitaire! : 403911 : 4.7
iGun Pro -The Original Gun App : 249308 : 4.2
NinJump : 421000 : 4.3
Fidget Spinner : 307398 : 4.2
J Balvin Piano Tiles : 6 : NaN
Guess the song of J Balvin : 28 : NaN
J-Stars Victory VS Guide : 609 : 4.1
KIM KARDASHIAN: HOLLYWOOD : 1017408 : 4.3
Magic Tiles - TWICE Edition (K-Pop) : 2351 : 4.4
Magic Tiles - Blackpink Edition (K-Pop) : 4304 : 4.6
Korean Dungeon: K-Word 1000 : 703 : 4.7
Lep's World 🍀 : 696019 : 4.3
Ant Smasher by Best Cool & Fun Games : 1

Word Hunt : 59 : 4.9
Skip-Bo™ Free : 46801 : 3.5
Red Ball 4 : 1432447 : 4.4
Skater Boy : 1167143 : 4.3
Sic Bo Online! Free Casino : 488 : 3.8
Hambo : 125578 : 4.4
Sic Bo Rave : 164 : 3.9
Thai Sic Bo : 14283 : 4.1
Bounce Classic : 38297 : 4.5
Bomber Friends : 776730 : 4.5
Sic Bo (Tai Xiu) - Multiplayer Casino : 152 : 4.2
Carros Rebaixados BR : 20691 : 4.3
Texas HoldEm Poker Deluxe (BR) : 1418 : 4.5
Brick Breaker BR : 7 : 5.0
Black Commando | Special Ops | FPS Shooting : 1167 : 4.1
True Skateboarding Ride Skateboard Game Freestyle : 8175 : 3.8
Bullshit! (Free) : 45 : 2.4
Bullshite! : 48 : 2.8
Block Strike : 947515 : 4.5
BombSquad Remote : 13304 : 4.1
BS Tractor : 3 : NaN
BS Chopper : 0 : NaN
B@dL!bs Lite : 10 : 3.8
kick the buddy : 294 : 3.3
Bu Hangi Ünlü? : 1 : NaN
Bu Nedir ? : 0 : NaN
BU HANGİ ŞARKI ? - 2018 : 4 : NaN
Bu Hangi Oyun ? : 2 : NaN
Bu Hangi Youtuber ? : 31 : NaN
Nedir Bu ? : 0 : NaN
Bu Hangi Dizi ? : 14 : NaN
Smashy Road: Arena : 45558 : 4.3
BW-Go Free : 547 : 4.6
BW-Joseki

Extreme Super Car Driving 3D : 23104 : 3.9
Fast Racing Car Simulator : 8482 : 3.8
Beach Head Shooting Assault : 555 : 3.8
Stickman Warriors Heroes 2 : 13714 : 4.4
Dead Target Zombie Shooting US Sniper Killer Squad : 240 : 4.3
Dubai Racing : 16237 : 4.2
Robot Fighting Games™ - Real Boxing Champions 3D : 30 : 4.4
Ghost Hunting camera : 3048 : 3.5
Santa Panda Bubble Christmas : 101 : 4.1
Racing Moto : 697805 : 4.3
Train Racing Games 3D 2 Player : 192374 : 4.3
Adivina el cantante de Trap y Reggaeton : 1914 : 4.5
Fernanfloo : 526595 : 4.8
Adivina el Emoji : 2536 : NaN
Red Hands – 2-Player Games : 93608 : 4.2
Get 'Em : 7904 : 4.1
Live Hold’em Pro Poker - Free Casino Games : 1123190 : 4.6
Beach Shoot Em Up: Head Hunter : 1218 : 4.4
Stick 'Em Up 2 Starter Edition : 2102 : 3.8
Shoot`Em Down: Shooting game : 7972 : 3.9
Texas Hold’em Poker + | Social : 4416 : 3.5
PlayTexas Hold'em Poker Free : 3543 : 3.8
HAWK – Force of an Arcade Shooter. Shoot 'em up : 190274 : 4.6
Texas Hold'em Poker : 12846 : 

As examples of Android games, I cite:
`Roblox` had 4.447.388 reviews and was rated 4.5,
`Candy Crush Saga` had 22.426.677 reviews and was rated 4.4,
`Clash of Clans` had  44.891.723 reviews and was rated 4.6, and
`Temple Run 2` had 8.118.609 reviews and was rated 4.3.

In [35]:
reviews_android_games = []
for app in android_final:
    if app[1] == 'GAME':
        reviews_android_games.append(int(app[3]))
sorted_reviews_android_games = sorted(reviews_android_games)
# print(sorted_reviews_android_games[-1:])
for app in android_final:
    if app[3] == '44893888':
        print('The most reviewed Android game was ', app[0],' with 44,893,888 reviews.')
    

In [36]:
for app in ios_final:
    if app[-5] == 'Games':
        print(app[1], ':', app[5], ':', app[7])  # print name, number of ratings and user rating

Clash of Clans : 2130805 : 4.5
Temple Run : 1724546 : 4.5
Candy Crush Saga : 961794 : 4.5
Angry Birds : 824451 : 4.5
Subway Surfers : 706110 : 4.5
Solitaire : 679055 : 4.5
CSR Racing : 677247 : 4.5
Crossy Road - Endless Arcade Hopper : 669079 : 4.5
Injustice: Gods Among Us : 612532 : 4.5
Hay Day : 567344 : 4.5
PAC-MAN : 508808 : 3.0
DragonVale : 503230 : 4.5
Head Soccer : 481564 : 5.0
Despicable Me: Minion Rush : 464312 : 4.5
The Sims™ FreePlay : 446880 : 4.5
Sonic Dash : 418033 : 4.5
8 Ball Pool™ : 416736 : 4.5
Tiny Tower - Free City Building : 414803 : 4.5
Jetpack Joyride : 405647 : 4.5
Bike Race - Top Motorcycle Racing Games : 405007 : 4.5
Kim Kardashian: Hollywood : 397730 : 4.5
Trivia Crack : 393469 : 4.5
WordBrain : 391401 : 4.5
Sniper 3D Assassin: Shoot to Kill Gun Game : 386521 : 5.0
Flow Free : 373857 : 4.5
Geometry Dash Lite : 370370 : 5.0
▻Sudoku : 359832 : 4.5
Fruit Ninja® : 327025 : 4.5
Pixel Gun 3D : 301182 : 4.5
Temple Run 2 : 295211 : 4.5
My Horse : 293857 : 4.5
Word Co

Design Home : 23298 : 4.5
100 PICS Quiz - guess the picture trivia games : 23160 : 4.5
Castle Crush: Epic Strategy Game : 23148 : 5.0
PAC-MAN 256 - Endless Arcade Maze : 23015 : 4.5
Empires & Allies : 22902 : 4.5
Crazy Taxi™ City Rush : 22877 : 4.5
Angry Birds Star Wars HD : 22674 : 4.5
Dungeon Hunter 5 - Multiplayer RPG on iOS : 22513 : 4.5
Traffic Racer : 22458 : 4.5
Kill Shot Bravo : 22444 : 4.5
iGun Pro HD - The Original Gun Application : 22324 : 4.5
Where's My Water? 2 : 22302 : 3.5
Solitaire Classic Card Game™ : 22079 : 4.5
Amazing Thief : 21984 : 4.0
Vlogger Go Viral - Clicker Game & Vlog Simulator : 21977 : 5.0
Inside Out Thought Bubbles : 21881 : 4.5
Restaurant DASH with Gordon Ramsay : 21804 : 4.5
Juice Jam : 21733 : 4.5
Jelly Splash : 21601 : 4.0
Colorfly : Coloring Book for Adults - Free Games : 21472 : 4.5
Robot Unicorn Attack 2 : 21432 : 4.5
Gods of Olympus : 21414 : 4.5
RollerCoaster Tycoon® 4 Mobile™ : 21407 : 3.5
MONOPOLY Bingo! : 21025 : 4.5
The Walking Dead: No Man's

Football Heroes PRO 2016 : 3018 : 4.0
The Paranormal Society™: Hidden Adventure : 3000 : 4.0
50 50 - The Addictive Slicing Game : 2998 : 4.5
KINGDOM HEARTS Union χ[Cross] : 2984 : 4.0
Water Bottle Flip Challenge 2 : 2964 : 4.0
Slam Dunk Basketball 2 - Play & Do Good : 2963 : 4.5
Spellspire : 2939 : 4.5
PolitiCats: Awesome Clicker Game : 2938 : 4.5
Battle Copters : 2922 : 4.5
Cookie Clickers 2 : 2912 : 5.0
Javelin Masters 2 : 2909 : 4.5
Love and Hip Hop The Game : 2908 : 4.0
Sea Hero Quest : 2908 : 5.0
Alpha Omega : 2892 : 4.5
Jupiter Jump : 2890 : 4.5
Fernanfloo : 2886 : 4.5
Shopkins: Chef Club : 2881 : 4.5
Solitaire ∞ : 2865 : 5.0
Lollipop: Sweet Taste Match3 : 2861 : 4.5
Talking Tom Jetski : 2853 : 4.0
Skiing Yeti Mountain : 2830 : 5.0
1+2=3 : 2816 : 4.0
Gear.Club - True Racing : 2793 : 4.5
Flow Free: Hexes : 2791 : 5.0
Galaxy Reavers-Space Strategy game(RTS) : 2778 : 4.5
Juju on the Beat : 2772 : 3.5
World of Cubes Survival Craft : 2758 : 4.0
Big Bang Racing : 2758 : 4.5
Pocket Poli

Bubble Shoot Pet : 914 : 4.5
Splat Wars : 913 : 4.5
Fitness Girl - Dance and Play at the Gym : 913 : 4.5
Nekosan : 905 : 4.5
Simulator Run Real Hologram : 903 : 1.5
Doodle Jump Easter Special : 900 : 4.5
Robbery Bob 2: Double Trouble : 899 : 4.5
Teen Titans Go Arcade : 897 : 3.5
Leveled! : 897 : 4.5
Puzzle Monster Quest - New MultiPlayer : 894 : 4.5
Ookujira - Giant Whale Rampage : 878 : 4.5
Exit Zone : 875 : 4.0
Duel Master - Best Duelist : 871 : 4.5
Mushroom Wars 2 : 871 : 3.0
Finger Hero: Avoid Obstacles : 870 : 4.5
Water Bottle Flip Challenge : 868 : 4.0
Combo Quest 2 : 855 : 4.5
Clashy Colors : 854 : 5.0
KSI Unleashed : 854 : 5.0
Zootopia Crime Files: Hidden Object : 853 : 4.5
Cake Break : 852 : 4.5
Hammer Time! : 839 : 4.0
MARVEL Tsum Tsum : 838 : 4.5
LEGO® NEXO KNIGHTS™ : MERLOK 2.0 : 837 : 4.0
SLOTS: TRUMP vs. HILLARY CLINTON Free Slot Games : 834 : 5.0
Dawn of Gods : 833 : 4.5
Ketchapp Tennis : 830 : 4.0
All is Lost : 829 : 4.5
Eden: The Game - Build Your Village! : 823 : 4.0


Diner Dynasty : 342 : 4.0
Circuroid : 342 : 4.5
The Cooking Game- With Cute iMessage Food Stickers : 340 : 4.5
Spinny Phone : 339 : 4.0
Ketchapp Soccer : 339 : 4.0
Crazy Love Story - Wedding Dreams : 337 : 4.0
Sacred Legends : 335 : 4.0
Tower Knights! : 334 : 4.0
Delicious - Emily’s Message in a Bottle : 333 : 4.0
SkillTwins Football Game : 332 : 4.5
Cops & Robbers! : 328 : 4.0
Color Dotz : 326 : 3.5
Hoverboard Simulator : 325 : 3.5
Crane Game Toreba! : 323 : 2.5
FaZes - Run & Jump : 322 : 4.5
Baby Airlines - Airport Adventures : 321 : 3.5
Crooked Path: Infinity Run : 318 : 4.0
Tennis Bits : 316 : 4.0
Kids Movie Night - Popcorn & Soda : 314 : 4.0
Batman v Superman: Who Will Win : 314 : 3.0
Evel Knievel : 313 : 4.5
Tap Cats: Idle Warfare : 313 : 4.5
Hidden my game by mom - escape room : 311 : 5.0
Back To Square One : 310 : 4.5
Soccer Hit : 309 : 4.0
Mucho Party : 307 : 4.5
Pop The Circle! : 304 : 4.0
QB Hero : 304 : 4.5
Rockstar Girls - Crazy Concert Day : 303 : 4.0
Rock The School - Cl

Fiete Sports : 86 : 3.0
Protect The Planet : 86 : 4.0
Candy's Carnival : 86 : 4.0
All Limpy Run! : 86 : 3.0
VR Roller Coaster World - Virtual Reality : 85 : 4.5
1000m Zombie Escape! : 85 : 4.0
金庸群侠传—全自由单机武侠RPG : 85 : 5.0
Pocket Rush : 85 : 4.0
Minescape : 84 : 3.5
House of Terror VR : 84 : 2.5
Burgers & Shakes - Fast Food Maker : 84 : 4.0
Pixel Disc Golf 2 : 83 : 3.0
Offline IV Calc for Pokemon GO : 82 : 3.5
Burnout City : 82 : 4.0
Amusement Park Fair Ground Circus Trucker Parking Simulator : 81 : 4.0
Crystal Rush! Color Shoot Arcade Game : 81 : 4.0
Dead Center : 81 : 3.5
Sticklings : 79 : 4.0
Werewolf "Nightmare in Prison" : 78 : 4.5
Just Trap : 78 : 4.0
4 Images 1 Mot : 78 : 4.0
Spiral Tower : 78 : 4.0
Fits - Block Puzzle King : 78 : 4.5
Bella's Pizza Place - Italian Food Maker : 77 : 4.0
Virtual Reality Roller Coaster for Google Cardboard VR : 77 : 4.0
Crazy Nursery - Newborn Baby Doctor Care : 76 : 4.0
Terra Monsters 3 : 76 : 3.5
Driving Evolution Parking Sim Real Car Driver Test :

Escape from the grandma house in the countryside. : 5 : 3.5
ファンタジードライブ【快進撃3DRPG】 : 5 : 4.5
New baseball board app BasePinBall : 5 : 3.0
モン娘は～れむ【モンはれ】 : 5 : 4.0
Escape the Ice Cream Parlor : 5 : 4.0
Escape from the ICU room. : 5 : 4.0
もやししゃちょー : 5 : 4.5
Escape the Sweet Shop Series : 3 : 5.0
【よく当たる】手相鑑定〜選べる鑑定ジャンル(人生運、仕事運、恋愛運) : 3 : 3.5
俺と鬼嫁の100日戦記 : 3 : 4.5
ホストのアブナイ世界 : 3 : 5.0
感動アプリ！無料ゲーム Japanese Gacha : 3 : 4.0
UDONちゅるん : 3 : 5.0
Inner Circle : 3 : 4.5
Escape from the beach house of everlasting summer. : 3 : 4.5
Escape from the frigid Igloo. : 3 : 4.0
Escape from the living room during the Obon holiday. : 3 : 4.0
Escape a Crepe House : 3 : 5.0
Wire de Coins 2 : 2 : 4.5
Circle Swing : 2 : 3.5
ROAD -escape game- : 2 : 5.0
4 Bilder 1 Wort : 2 : 2.5
Escape Game : Escape from Okinawa : 2 : 5.0
大航海世界（风靡日本航海手游《壮绝大航海》正版授权） : 2 : 4.5
Bounce Balls - Strike Game : 2 : 1.0
【カノピッピ大作戦】ギャルがオタクに恋をした/脱ギャル系彼女育成ゲーム : 2 : 5.0
Dash Heroes : 2 : 4.5
天天麻将-有奖励的麻将玩法全集 : 2 : 5.0
Soccer Cup Championship 2016 :

王权之战-帝国时代皇室争霸！ : 0 : 0.0
[GP]パチスロ デビル メイ クライ クロス(パチスロゲーム) : 0 : 0.0
ほぼ無理ゲー : 0 : 0.0
CUBE 360°　～想像力×知能×反射神経～ : 0 : 0.0
勘違い探し（俺のこと・・・） : 0 : 0.0
机甲无双－燃即正义 殿堂级战斗手游 : 0 : 0.0
完全犯罪クラブ : 0 : 0.0
脱出ゲーム KA-MA-KU-RA : 0 : 0.0
777游戏厅-2017超人气掌上游戏厅 : 0 : 0.0
脱出ゲーム Toilet : 0 : 0.0
【負けるが勝ち？】少数派に入れば勝利の7Beget : 0 : 0.0
脱出ゲーム 心霊学校からの脱出 : 0 : 0.0
指神 : 0 : 0.0
彼女の涙が僕を人間にした : 0 : 0.0
壊して！無限ドリルランド : 0 : 0.0
脱出ゲーム Oden : 0 : 0.0
[GP]アナザーゴッドハーデス-奪われたZEUSver.-(パチスロゲーム) : 0 : 0.0
おやじが跳ぶ : 0 : 0.0
ナイトメアハーレム 女性向け恋愛ゲーム！乙女げーむ : 0 : 0.0
キンニクエスト 筋肉と筋肉と筋肉と呪われし筋肉 : 0 : 0.0
战车世纪·经典FC重装机兵归来 : 0 : 0.0
【ひっぱりパズル】ズキュ〜〜〜〜ン！ : 0 : 0.0
秒で100億円 : 0 : 0.0
番茄电玩城 : 0 : 0.0
大人の脳トレ！ぴたぽん #頭が良くなる #快感ハメコミパズル : 0 : 0.0
Pantheon Warfare: The Chaos of Knights : 0 : 0.0
物理パズル:flower（フラワー） : 0 : 0.0
轩辕剑叁外传之天之痕-正版经典仙侠授权 : 0 : 0.0
主播炸金花 : 0 : 0.0
謝罪会見 : 0 : 0.0
早くトイレに行きたい : 0 : 0.0
僕の魔球打てるの？ : 0 : 0.0
[GP]パチスロ ヱヴァンゲリヲン〜決意の刻〜(パチスロゲーム) : 0 : 0.0
がんばれルルロロ！かさねてブロック : 0 : 0.0
なっとう-人気の納豆育成ゲーム- : 0 : 0.0
一筆書きゲーム！無料パズルで脳トレしよう！ by だーぱん : 0 : 0.0
AKU

As examples of IOS games, I cite:
`Roblox` had 183,621 reviews,
`Clash of Clans` had 2.130.805 reviews,
`Temple Run` had 1.724.546 reviews, and `Candy Crush Saga` had 961.794 reviews. 
Both were rated 4.5 by users.

In [37]:
reviews_ios_games = []
for app in ios_final:
    if app[-5] == 'Games':
        reviews_ios_games.append(int(app[5]))
sorted_reviews_ios_games = sorted(reviews_ios_games)
print(sorted_reviews_ios_games[-1:])
for app in ios_final:
    if app[5] == '2130805':
        print('The most reviewed IOS game was ', app[1],' with 2,130,805 reviews.')

[2130805]
The most reviewed IOS game was  Clash of Clans  with 2,130,805 reviews.


In [38]:
reviews_android_games = []
for app in android_final:
    if app[1] == 'GAME':
        reviews_android_games.append(int(app[3]))
sorted_reviews_android_games = sorted(reviews_android_games)
# print(sorted_reviews_android_games[-1:])
for app in android_final:
    if app[3] == '44893888':
        print('The most reviewed Android game was ', app[0],' with 44,893,888 reviews.')

`Clash of Clans` was the most reviewed game on both platforms. However, the number of reviews on Android is 22 times higher than on IOS. 

User engagement is higher on Android. But this could only be proven with the information on the number of installed apps, or the number of users, which are information that we don't have for the two datasets.

### Number of users and popularity
Data of installs of Android apps are used to give us an overview about the genre popularity. 



In [39]:
# check the variables
android[0]

['App',
 'Category',
 'Rating',
 'Reviews',
 'Size',
 'Installs',
 'Type',
 'Price',
 'Content Rating',
 'Genres',
 'Last Updated',
 'Current Ver',
 'Android Ver']

In [40]:
unique_installs = []  # list unique values for number of installs 

for value in android_final[1:]:
    install = value[5] 
    if install not in unique_installs:
        unique_installs.append(install)   
unique_installs        


['10,000+',
 '500,000+',
 '5,000,000+',
 '50,000,000+',
 '100,000+',
 '50,000+',
 '1,000,000+',
 '10,000,000+',
 '5,000+',
 '100,000,000+',
 '1,000,000,000+',
 '1,000+',
 '500,000,000+',
 '500+',
 '100+',
 '50+',
 '10+',
 '1+',
 '5+',
 '0+',
 '0']

In [41]:
# To find the higher install number, remove characters and sort in descending order
unique_installs = []

for value in android_final[1:]:
    install = value[5] 
    install = install.replace(',', '') # remove , character
    
    substring = '+'
    if install.find(substring): 
        install = install.replace('+', '') # remove + character
        install = int(install)    # transform in integer
        
    if install not in unique_installs:
        unique_installs.append(install)
        
sorted(unique_installs, reverse = True)  

[1000000000,
 500000000,
 100000000,
 50000000,
 10000000,
 5000000,
 1000000,
 500000,
 100000,
 50000,
 10000,
 5000,
 1000,
 500,
 100,
 50,
 10,
 5,
 1,
 0]

In [42]:
print(explore_data(android_final, 0, 2, True))

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


Number of rows: 8865
Number of columns: 13
None


#### The categories that have the most installed apps
List unique categories that have apps with 1,000,000,000+ installs

In [43]:
categories_installs = []
for row in android_final:
    category = row[1]
    install = row[5] 
    if install == '1,000,000,000+' and category not in categories_installs:
        categories_installs.append(category)
print(categories_installs)
# android_free

['BOOKS_AND_REFERENCE', 'COMMUNICATION', 'ENTERTAINMENT', 'GAME', 'SOCIAL', 'PHOTOGRAPHY', 'TRAVEL_AND_LOCAL', 'TOOLS', 'PRODUCTIVITY', 'VIDEO_PLAYERS', 'NEWS_AND_MAGAZINES']


In [44]:
# remove characters and sort in descending order in the dataset
for value in android_final[1:]:
    install = value[5] 
    install = install.replace(',', '')
    
    substring = '+'
    if install.find(substring): 
        install = install.replace('+', '')
    install = int(install)
    value[5] = install

print(explore_data(android_final, 0, 2, True))       

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', 10000, 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', 10000, 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


Number of rows: 8865
Number of columns: 13
None


In [45]:
# find the average of pps installed that are in COMMUNICATION category and  has less than 100000000 installs
under_100_m = []

for app in android_final:
    installs = app[5]
    if (app[1] == 'COMMUNICATION') and installs < 100000000:
        under_100_m.append(installs)
        
print(sum(under_100_m) / len(under_100_m))


3603485.3884615386


In [46]:
# find the average of pps installed that are in BOOKS_AND_REFERENCE category and  has less than 100000000 installs
under_100_m = []

for app in android_final:
    installs = app[5]
    if (app[1] == 'BOOKS_AND_REFERENCE') and installs < 100000000:
        under_100_m.append(installs)
        
print(sum(under_100_m) / len(under_100_m))

1437212.2162162163


In [47]:
# find the average of pps installed that are in ENTERTAINMENT category and  has less than 100000000 installs
under_100_m = []

for app in android_final:
    installs = app[5]
    if (app[1] == 'ENTERTAINMENT') and installs < 100000000:
        under_100_m.append(installs)
        
sum(under_100_m) / len(under_100_m)

6526170.212765957

In [48]:
# find the average of pps installed that are in GAME category and  has less than 100000000 installs
under_100_m = []

for app in android_final:
    installs = app[5]
    if (app[1] == 'GAME') and installs < 100000000:
        under_100_m.append(installs)
        
sum(under_100_m) / len(under_100_m)

6397874.017199017

In [49]:
list_avg = [4386993.665492957,1673876.3541666667, 6389411.764705882, 7943314.246119734]
sorted(list_avg)
# book, communication, entertainment, game

[1673876.3541666667, 4386993.665492957, 6389411.764705882, 7943314.246119734]

In [50]:
counter = 0
for app in android_final:
    if app[1] == 'COMMUNICATION' and app[5] == 100000000:
        counter += 1
        print(app[0], ':', app[5])
        
print(counter)

Messenger Lite: Free Calls & Messages : 100000000
Firefox Browser fast & private : 100000000
Yahoo Mail – Stay Organized : 100000000
imo beta free calls and text : 100000000
Opera Mini - fast web browser : 100000000
Opera Browser: Fast and Secure : 100000000
Who : 100000000
WeChat : 100000000
UC Browser Mini -Tiny Fast Private & Secure : 100000000
Android Messages : 100000000
Telegram : 100000000
GO SMS Pro - Messenger, Free Themes, Emoji : 100000000
BBM - Free Calls & Messages : 100000000
KakaoTalk: Free Calls & Text : 100000000
Truecaller: Caller ID, SMS spam blocking & Dialer : 100000000
Kik : 100000000
16


In [51]:
counter = 0
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE' and app[5] == 100000000:
        counter += 1
        print(app[0], ':', app[5])

print(counter)

Wattpad 📖 Free Books : 100000000
Amazon Kindle : 100000000
Bible : 100000000
Audiobooks from Audible : 100000000
4


In [52]:
counter = 0
for app in android_final:
    if app[1] == 'ENTERTAINMENT' and app[5] == 100000000:
        counter += 1
        print(app[0], ':', app[5])
print(counter)

Netflix : 100000000
Hotstar : 100000000
Talking Angela : 100000000
IMDb Movies & TV : 100000000
Talking Ben the Dog : 100000000
5


In [53]:
counter = 0
for app in android_final:
    if app[1] == 'GAME' and app[5] == 100000000:
        counter += 1
        print(app[0], ':', app[5])
        
print(counter)

ROBLOX : 100000000
slither.io : 100000000
Clash Royale : 100000000
Helix Jump : 100000000
Angry Birds Rio : 100000000
Plants vs. Zombies FREE : 100000000
Sonic Dash : 100000000
Candy Crush Soda Saga : 100000000
Clash of Clans : 100000000
PAC-MAN : 100000000
8 Ball Pool : 100000000
Angry Birds Classic : 100000000
Flow Free : 100000000
Zombie Tsunami : 100000000
Hill Climb Racing : 100000000
Minion Rush: Despicable Me Official Game : 100000000
Farm Heroes Saga : 100000000
My Talking Angela : 100000000
Cut the Rope FULL FREE : 100000000
Sniper 3D Gun Shooter: Free Shooting Games - FPS : 100000000
Cooking Fever : 100000000
Score! Hero : 100000000
Garena Free Fire : 100000000
Roll the Ball® - slide puzzle : 100000000
Talking Tom Gold Run : 100000000
Dream League Soccer 2018 : 100000000
Traffic Racer : 100000000
Hill Climb Racing 2 : 100000000
Hungry Shark Evolution : 100000000
Piano Tiles 2™ : 100000000
Pokémon GO : 100000000
Extreme Car Driving Simulator : 100000000
Trivia Crack : 10000000

### Conclusion

**Which has the most profitable apps: Android or IOS?** We know that the percentage of mobile over desktop is only increasing. Android holds about 53.2% of the smartphone market, while iOS is 43%. Based on these datasets, I can't answer this question. But I can highlight points and make some observations.

Free apps from IOS (2017) and Android(unknown data) were analyzed. A Cleaning was made to detect and remove missing and duplicate data, and detect and fix or remove inaccurate data. Duplicate entries were removed, keeping with entries with the higher number of reviews, that can represent the more recent records. Non-English apps and non-free apps were excluded from this work. Percentages for genres in both platforms were found. And the game with more reviews was shown.


- **Games are the strong point of IOS apps.**
Games and Entertainment are strong points of IOS apps. These types of apps are quite profitable, even though they are initially free and have different business models. Unlike functional apps, users spend a lot of time using gaming and entertainment apps, which makes room for selling space for marketing and also selling features. <br><br>

- **Business and Productivity are the most important genres in Android apps.**
Only requires time and focus as input. We can say that business apps are profitable, even if they don't have many users, and that they run profitably with a small mumber of users. Because these only require time and focus as input, they do not require upfront capital, nor do they require an extended business network [(source)](https://www.appypie.com/app-business-model). <br><br>

- **Clash of Clans** is the game with more reviews on both platforms, but the engagement on Android is 22 times higher for this game.

**What are the most popular genres?**

- `book and reference`, `communication`, `entertainment`, and `game` are the most installed categories with more.<br><br>

- `Communication` category was shown to be a higher proportion of apps with 100,000,000+ more disproportionate than the average of those with less installs. This makes me believe that this category is dominated by big and famous apps, while the other three categories are more competitive.

With these general insights, I can prioritize the platform I choose to launch my company's app on, according with the company's niche.

