## Profitable Apps for Google Play and the App Store

This project focuses on identifying app profiles that are likely to attract more users on Google Play and the App Store. As data analysts for a mobile app development company, our main task is to analyze data from both platforms to understand user preferences and trends. Since our company builds free-to-download apps, our revenue depends primarily on in-app ads, making user engagement crucial for profitability.

The goal of this project is to provide actionable insights for our development team on the types of apps that have the highest potential to engage a large user base. By analyzing market data, we aim to uncover patterns that will help guide app development decisions, ultimately driving more downloads and maximizing ad revenue through increased user engagement.

As of September 2018, there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play. Source: [Statista](https://www.statista.com/statistics/276623/number-of-apps-available-in-leading-app-stores/)

Collecting data on more than four million apps would take significant time and resources, so we analyze samples from two existing datasets instead:
- [A dataset](https://www.kaggle.com/datasets/lava18/google-play-store-apps) containing data about approximately 10,000 Android apps from Google Play; the data was collected in August 2018. You can download the data set directly from [this link](https://dq-content.s3.amazonaws.com/350/googleplaystore.csv).
- [A dataset](https://www.kaggle.com/datasets/ramamet4/app-store-apple-data-set-10k-apps) containing data about approximately 7,000 iOS apps from the App Store; the data was collected in July 2017. You can download the data set directly from [this link](https://dq-content.s3.amazonaws.com/350/AppleStore.csv).

We'll start by opening and exploring these two data sets. To make them easier to explore, we create a function named `explore_data()` that you can repeatedly use to print rows in a readable way.

**Opening the datasets and exploring them**

In [5]:
import csv

# Google play dataset
with open('googleplaystore.csv', 'r', encoding='utf8') as file:
    read_file = csv.reader(file)
    android = list(read_file)
    android_header = android[0]
    android = android[1:]

# App store dataset
with open('AppleStore.csv', 'r', encoding='utf8') as file:
    read_file = csv.reader(file)
    ios = list(read_file)
    ios_header = ios[0]
    ios = ios[1:]
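
The two `with` blocks above follow the same steps, so they could be folded into one helper. The sketch below assumes a hypothetical function named `load_csv()` (it is not part of the original notebook) that returns the header and the data rows separately.

```python
import csv


def load_csv(path, encoding="utf8"):
    """Read a CSV file and return (header, rows). Hypothetical helper."""
    # newline="" is what the csv module documentation recommends for file objects.
    with open(path, "r", encoding=encoding, newline="") as file:
        rows = list(csv.reader(file))
    if not rows:
        raise ValueError(f"{path} is empty")
    # The first row is the header; everything after it is data.
    return rows[0], rows[1:]
```

With this helper, loading would look like `android_header, android = load_csv('googleplaystore.csv')` and `ios_header, ios = load_csv('AppleStore.csv')`.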

Let's write the `explore_data()` function

In [6]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line between rows
        
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

Explore both datasets using the `explore_data()` function

In [7]:
# Google play dataset
print(android_header)
print('\n')
explore_data(android, 0, 2, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


This dataset has `10841 rows` and `13 columns`.

The columns that are helpful for our analysis are `App`, `Category`, `Reviews`, `Installs`, `Type`, `Price` and `Genres` 

In [8]:
# App store dataset
print(ios_header)
print('\n')
explore_data(ios, 0, 4, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


Number of rows: 7197
Number of columns: 16


This dataset has `7197 rows` and `16 columns`

The columns we will need for analysis are `track_name`, `currency`, `price`, `rating_count_tot`, `rating_count_ver` and `prime_genre`

## Data Cleaning

**1. Detecting and deleting wrong data**

In [9]:
for index, row in enumerate(android): 
    if len(row) != len(android_header): 
        print(row) 
        print(index)

print('\n')
print(android_header)  # header
print('\n')
print(android[0])      # correct row

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
10472


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


Row `10472` has only 12 values instead of 13: the `Category` value is missing, so every later value is shifted one column to the left, and the `Genres` value is empty. We therefore drop the row.

In [10]:
del android[10472]  # don't run this more than once
print(len(android))

10840
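
Deleting by index is not safe to repeat, because a second run would remove a valid row. One alternative, sketched here under the assumption that a row is malformed whenever its length differs from the header's, is to filter by length. The function name `drop_malformed_rows()` is hypothetical.

```python
def drop_malformed_rows(rows, header):
    """Return only the rows that have exactly one value per header column."""
    expected = len(header)
    # Building a new list leaves the input untouched, so re-running is harmless.
    return [row for row in rows if len(row) == expected]
```

Calling `android = drop_malformed_rows(android, android_header)` any number of times should leave the same rows in place.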


**2. Removing Duplicates**

**Google Play dataset**

Upon examining the Google Play dataset closely we discover that it has duplicates. For instance `Slack` has three entries

In [11]:
for app in android:
    name = app[0]
    if name == 'Slack':
        print(app)

['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51510', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']


Let's check the number of cases where an app is appearing more than once

In [12]:
duplicate_apps = []
unique_apps = []

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[:10])

Number of duplicate apps: 1181


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']


We do not want to count an app more than once during analysis so we need to remove the duplicates.

However, we do not want to remove duplicates at random. In the `Slack` example, the entries differ only in the `Reviews` column. Since review counts generally grow over time, a higher count suggests a more recent entry, so we keep the row with the highest number of reviews.

To remove the duplicates, we will do the following:

- Create a dictionary, where each dictionary key is a unique app name and the corresponding dictionary value is the highest number of reviews of that app.
- Use the information stored in the dictionary and create a new dataset, which will have only one entry per app (and for each app, we'll only select the entry with the highest number of reviews).

In [13]:
reviews_max = {}

for app in android:
    name = app[0]
    n_reviews = float(app[3])

    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews

    elif name not in reviews_max:
        reviews_max[name] = n_reviews

The dataset had 10,840 entries, of which 1,181 are duplicates. After removing them, we expect to have 1,181 fewer entries.

In [14]:
print('Expected Length:', len(android) - 1181)
print('Actual Length:', len(reviews_max))

Expected Length: 9659
Actual Length: 9659


Let us use the dictionary to remove the duplicates and store the data in a new list of lists

We start by initializing two empty lists. Then we loop through the Google Play dataset and do the following:
- Assign the `app` name to a variable named `name`.
- Convert the number of `reviews` to float, and assign it to a variable named `n_reviews`.
  
- If `n_reviews` is the same as the number of maximum reviews of the app `name` (the number can be found in the reviews_max dictionary) and `name` is not already in the list `already_added`:
    - Append the entire row to the `android_clean` list (which will eventually be a list of lists and store our cleaned dataset).
    - Append the name of the app `name` to the `already_added` list  (this helps us to keep track of apps that we already added).

In [15]:
android_clean = []
already_added = []

for app in android:
    name = app[0]
    n_reviews = float(app[3])

    if (reviews_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)

Check whether the dataset has 9,659 rows

In [16]:
explore_data(android_clean, 0, 4, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


Number of rows: 9659
Number of columns: 13
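
The two-step approach above checks membership in a list for every row, which gets slow as the dataset grows. As a sketch of an alternative, the hypothetical function `keep_most_reviewed()` below does the same job in one pass by storing the best row per app name in a dictionary. The default column indexes (0 for the name, 3 for the reviews) are taken from the Google Play header.

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the one with the highest review count."""
    best = {}  # app name -> row with the most reviews seen so far
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' keeps the first row when review counts are tied.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())
```

One difference to keep in mind: the rows come back in the order each app name first appears, which may not match the row order produced by the loop above.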


**App store data**

In [17]:
duplicate_apps_ios = []
unique_apps_ios = []

for app in ios:
    app_id = app[0]  # index 0 is the `id` column, not the app name
    if app_id in unique_apps_ios:
        duplicate_apps_ios.append(app_id)
    else:
        unique_apps_ios.append(app_id)

print('Number of duplicate apps:', len(duplicate_apps_ios))
print('\n')
print('Examples of duplicate apps:', duplicate_apps_ios[:2])

Number of duplicate apps: 0


Examples of duplicate apps: []
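
The check above compares the values in column 0, which is the `id` column. To look for repeated app names as well, the same idea can be applied to the `track_name` column (index 1). The sketch below uses `collections.Counter`; the function name `duplicated_values()` is hypothetical.

```python
from collections import Counter


def duplicated_values(rows, index):
    """Return {value: count} for values appearing more than once in a column."""
    counts = Counter(row[index] for row in rows)
    return {value: n for value, n in counts.items() if n > 1}
```

For example, `duplicated_values(ios, 1)` would list any `track_name` that occurs in more than one row, together with how many times it occurs.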


The App Store data has no duplicate `id` values, so no rows need to be removed.

**3. Removing Non-English Apps**

We'd like to analyze only the apps that are designed for an English-speaking audience. However, if we dig deeper, we'll find that both datasets have apps with names that suggest they are not designed for an English-speaking audience. For instance:

In [18]:
print(android_clean[4412][0])
print('\n')
print(ios[813][1])

中国語 AQリスニング


爱奇艺PPS -《欢乐颂2》电视剧热播


We need to remove these apps, and one way of doing so is to check whether the app name contains characters not commonly used in English. The numbers corresponding to the characters we commonly use in English text are all in the range 0 to 127, according to the ASCII (American Standard Code for Information Interchange) system. This means that if an app name contains a character whose number is greater than 127, the name probably includes a non-English character.

Let's write a function to check this

In [19]:
def is_english(string):
    # Iterate through each character in the string
    for char in string:
        # Check if the character's Unicode code point is greater than 127
        if ord(char) > 127:
            return False  # Non-English character found
    # If no characters were non-English, return True
    return True

Use the function to check whether these app names are detected as English or non-English:

- `Instagram`
- `爱奇艺PPS -《欢乐颂2》电视剧热播`
- `Docs To Go™ Free Office Suite`
- `Instachat 😜`

In [20]:
print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

True
False
False
False


The function works for plain names, but it also rejects English app names that contain emojis or symbols such as ™. This could make us remove useful apps.

Let's modify the function to only remove an app if it has more than three non-ASCII characters

In [21]:
def is_english(string):
    non_ascii = 0
    
    for char in string:
        if ord(char) > 127:
            non_ascii += 1

    if non_ascii > 3:
        return False 
    else:
        return True

In [22]:
# Test the function
print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

True
False
True
True
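
The threshold of three is a judgment call, so it can help to make it a parameter. The sketch below is a compact variant with the hypothetical name `is_english_name()`; with the default `max_non_ascii=3` it is meant to behave like the function above.

```python
def is_english_name(string, max_non_ascii=3):
    """Return True if the name has at most `max_non_ascii` non-ASCII characters."""
    # ord() gives the code point; anything above 127 is outside ASCII.
    non_ascii = sum(1 for char in string if ord(char) > 127)
    return non_ascii <= max_non_ascii
```

Passing `max_non_ascii=0` reproduces the stricter first version. A short name made up entirely of non-ASCII characters, such as `'中国語'`, still passes with the default threshold because it has only three of them.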


The function is not perfect, since some non-English apps with three or fewer non-ASCII characters may still pass through the filter, but it is good enough for now.

Use the new function to filter out non-English apps from both datasets. 

In [23]:
android_english = []
ios_english = []

for app in android_clean:
    name = app[0]
    if is_english(name):
        android_english.append(app)
        
for app in ios:
    name = app[1]
    if is_english(name):
        ios_english.append(app)

Explore the datasets and see how many rows you have remaining for each dataset.

In [24]:
explore_data(android_english, 0, 2, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 9614
Number of columns: 13


In [25]:
explore_data(ios_english, 0, 2, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of rows: 6183
Number of columns: 16


Now we are left with 9614 Google Play apps and 6183 App Store apps

**4. Isolating the Free Apps**

As mentioned earlier we only build apps that are free to download. Both our datasets contain free and paid apps. We will need to isolate the free ones for analysis

In [26]:
android_cleaned = []
ios_cleaned = []

for app in android_english:
    price = app[7]
    if price == '0':
        android_cleaned.append(app)

for app in ios_english:
    price = app[4]
    if price == '0.0':
        ios_cleaned.append(app)

Check the length of each dataset to see how many apps you have remaining

In [27]:
print('Android apps:', len(android_cleaned))
print('iOS apps:', len(ios_cleaned))

Android apps: 8864
iOS apps: 3222
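
The comparison above relies on the exact strings `'0'` and `'0.0'`. A sketch of a more tolerant check converts the price to a number first. The function name `is_free()` is hypothetical, and the handling of a leading `$` is an assumption about how paid prices are written in the Google Play file.

```python
def is_free(price_text):
    """Return True when a price string such as '0', '0.0' or '$0.00' means free."""
    cleaned = price_text.strip().lstrip("$")
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Unparseable prices are treated as not free rather than guessed at.
        return False
```

The same function can then be used for both datasets, for example `[app for app in android_english if is_free(app[7])]` and `[app for app in ios_english if is_free(app[4])]`.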


## Analysis

**Most Common Apps by Genre**

As we mentioned in the introduction, our goal is to determine the kinds of apps that are likely to attract more users, because the number of people using our apps affects our revenue.

Because our end goal is to add the app on both Google Play and the App Store, we need to find app profiles that are successful in both markets. For instance, a profile that works well for both markets might be a productivity app that makes use of gamification.

Let's begin the analysis by determining the most common genres for each market. For this, we'll need to build frequency tables for a few columns in our datasets.

We can use the `Category` and `Genres` columns from the Google Play dataset and the `prime_genre` column from the App Store dataset.

We'll build two functions we can use to analyze the frequency tables:

- One function to generate frequency tables that show percentages
- Another function we can use to display the percentages in a descending order

In [28]:
def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage 
    
    return table_percentages

In [29]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
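
For comparison, the two functions above can be approximated in a few lines with `collections.Counter` from the standard library. This is only a sketch, and the name `freq_table_sorted()` is hypothetical.

```python
from collections import Counter


def freq_table_sorted(dataset, index):
    """Return (value, percentage) pairs, most frequent first."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    # most_common() yields the values sorted by count, highest first.
    return [(value, count / total * 100) for value, count in counts.most_common()]
```

Values with equal counts may come out in a different order than with `display_table()`, which breaks ties by sorting on the key.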

Use the `display_table()` function to display the frequency table of the columns `prime_genre`, `Genres`, and `Category`

In [30]:
# freq table for prime_genre in App store dataset
display_table(ios_cleaned, 11)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


Of the free English apps on the App Store, over 58% are games. Entertainment apps account for about 8%, followed by Photo & Video (about 5%). Only 3.7% are dedicated to Education and 3.3% to Social Networking. This shows that most apps on the App Store are designed for fun. However, this does not mean that these genres have the highest number of users.

In [31]:
# freq table for Category in Google play dataset
display_table(android_cleaned, 1)

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

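On Google Play, the largest categories are Family (about 19%), Game (about 10%) and Tools (about 8%), and many of the remaining categories, such as Business, Lifestyle, Productivity and Finance, serve practical purposes.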

In [32]:
# freq table for Genres in Google play dataset (index -4 is the same column as index 9)
display_table(android_cleaned, -4)

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

The `Genres` column is more granular than the `Category` column, but it paints a similar picture: a balance between apps built for fun and apps built for practical purposes.

From the analysis above, we can see that the App Store is dominated by apps built for fun, while Google Play balances fun and practicality.

**Genre with most users on App Store**

One way to find out which genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play dataset, we can find this information in the `Installs` column, but this information is missing for the App Store dataset. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the `rating_count_tot` column.

We'll start with calculating the average number of user ratings per app genre on the App Store

In [35]:
# Generate a frequency table for the prime_genre column
ios_genres = freq_table(ios_cleaned, 11)

# Dictionary to store average ratings per genre
genre_avg_ratings = {}

# Isolate apps for each genre and calculate average ratings
for genre in ios_genres:
    total = 0
    len_genre = 0

    for app in ios_cleaned:
        genre_app = app[11]
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings  # Add up user ratings for each app
            len_genre += 1

    avg_n_ratings = total / len_genre  # Calculate average number of ratings
    genre_avg_ratings[genre] = avg_n_ratings  # Store in dictionary

# Sort genres by average ratings in descending order and print
sorted_genre_avg_ratings = sorted(genre_avg_ratings.items(), key=lambda x: x[1], reverse=True)

for genre, avg_rating in sorted_genre_avg_ratings:
    print(genre, ":", avg_rating)


Navigation : 86090.33333333333
Reference : 74942.11111111111
Social Networking : 71548.34905660378
Music : 57326.530303030304
Weather : 52279.892857142855
Book : 39758.5
Food & Drink : 33333.92307692308
Finance : 31467.944444444445
Photo & Video : 28441.54375
Travel : 28243.8
Shopping : 26919.690476190477
Health & Fitness : 23298.015384615384
Sports : 23008.898550724636
Games : 22788.6696905016
News : 21248.023255813954
Productivity : 21028.410714285714
Utilities : 18684.456790123455
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Business : 7491.117647058823
Education : 7003.983050847458
Catalogs : 4004.0
Medical : 612.0
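
The nested loops above scan the whole dataset once per genre. The same averages can be computed in a single pass by keeping a running total and a count per group. The sketch below assumes a hypothetical helper named `average_by_group()`.

```python
from collections import defaultdict


def average_by_group(rows, group_index, value_index):
    """Return {group: mean of the numeric column} in one pass over the rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}
```

For the App Store data the call would be `average_by_group(ios_cleaned, 11, 5)`, and the result can be sorted with `sorted(..., key=lambda x: x[1], reverse=True)` as in the cell above.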


The top five genres are Navigation, Reference, Social Networking, Music and Weather. Let's look at the apps that contribute most to these high averages.

In [37]:
# Navigation apps
for app in ios_cleaned:
    if app[11] == 'Navigation':
        print(app[1], ':', app[5])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


Waze and Google Maps drive this average: together they have close to half a million ratings, while the other four apps have fewer than 17,000 combined.
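
Since the same loop is repeated for several genres, it could be wrapped in a small helper that also sorts the apps and limits the output. The sketch below uses a hypothetical function `top_apps()`. The default indexes (1, 11 and 5) correspond to `track_name`, `prime_genre` and `rating_count_tot` in the App Store header.

```python
def top_apps(rows, genre, n=5, name_index=1, genre_index=11, count_index=5):
    """Return the n (name, rating_count) pairs with the most ratings in a genre."""
    matches = [
        (row[name_index], int(row[count_index]))
        for row in rows
        if row[genre_index] == genre
    ]
    # Sort by rating count, largest first, then keep the first n entries.
    return sorted(matches, key=lambda pair: pair[1], reverse=True)[:n]
```

A call such as `top_apps(ios_cleaned, 'Navigation', n=3)` would return the three most-rated navigation apps.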

In [40]:
# Social networking apps
for app in ios_cleaned:
    if app[11] == 'Social Networking':
        print(app[1], ':', app[5])

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 2

Social networking giants dominate this genre: Facebook and Pinterest each have over one million ratings, far more than most other apps in the list.

In [39]:
# Reference apps
for app in ios_cleaned:
    if app[11] == 'Reference':
        print(app[1], ':', app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


The Bible and Dictionary.com apps are the biggest contributors.

In [42]:
# Music apps
for app in ios_cleaned:
    if app[11] == 'Music':
        print(app[1], ':', app[5])

Pandora - Music & Radio : 1126879
Spotify Music : 878563
Shazam - Discover music, artists, videos & lyrics : 402925
iHeartRadio – Free Music & Radio Stations : 293228
SoundCloud - Music & Audio : 135744
Magic Piano by Smule : 131695
Smule Sing! : 119316
TuneIn Radio - MLB NBA Audiobooks Podcasts Music : 110420
Amazon Music : 106235
SoundHound Song Search & Music Player : 82602
Sonos Controller : 48905
Bandsintown Concerts : 30845
Karaoke - Sing Karaoke, Unlimited Songs! : 28606
My Mixtapez Music : 26286
Sing Karaoke Songs Unlimited with StarMaker : 26227
Ringtones for iPhone & Ringtone Maker : 25403
Musi - Unlimited Music For YouTube : 25193
AutoRap by Smule : 18202
Spinrilla - Mixtapes For Free : 15053
Napster - Top Music & Radio : 14268
edjing Mix:DJ turntable to remix and scratch music : 13580
Free Music - MP3 Streamer & Playlist Manager Pro : 13443
Free Piano app by Yokee : 13016
Google Play Music : 10118
Certified Mixtapes - Hip Hop Albums & Mixtapes : 9975
TIDAL : 7398
YouTube Mu

From the listings above, each of these genre averages is driven by a handful of apps with very large rating counts, while many other apps in the same genre have less than a quarter as many.

The Music genre looks somewhat more balanced, with several apps between 100,000 and 1,000,000 ratings. Since the App Store is dominated by apps built for fun, a music app is a reasonable candidate. It could offer free music in audio and video formats, radio, and personalized playlists.

**Genre with most users on Google Play Store**

Since we have data about installs from this dataset it is easier to determine the most popular genres. However the install numbers don't seem precise enough — we can see that most values are open-ended (100+, 1,000+, 5,000+, etc.):

In [43]:
display_table(android_cleaned, 5)

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


For instance, we don't know whether an app with 100,000+ installs has 100,000 installs, 200,000, or 350,000. However, we don't need very precise data for our purposes — we only want to find out which app genres attract the most users.

We're going to leave the numbers as they are, which means that we'll consider that an app with 100,000+ installs has 100,000 installs, and an app with 1,000,000+ installs has 1,000,000 installs, and so on. To perform computations, however, we'll need to convert each install number from a string to a float. This means we need to remove the commas and the plus characters, or the conversion will fail and cause an error.
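
As a minimal sketch of that conversion, the hypothetical helper `parse_installs()` below strips the commas and the plus sign before calling `float()`. The result is the lower bound of the install bucket, not the true install count.

```python
def parse_installs(text):
    """Convert an install bucket such as '1,000,000+' to a float lower bound."""
    # float('1,000,000+') would raise ValueError, so remove ',' and '+' first.
    return float(text.replace(",", "").replace("+", ""))
```

For example, `parse_installs('1,000,000+')` returns `1000000.0`, and both `'0+'` and `'0'` map to `0.0`.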

Let's remove those characters and calculate average number of installs

In [46]:
# Generate a frequency table for the category column
android_categories = freq_table(android_cleaned, 1)

# Dictionary to store average installs per category
category_avg_installs = {}

# Isolate apps for each category and calculate average installs
for category in android_categories:
    total = 0
    len_category = 0

    for app in android_cleaned:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)  # Add up number of installs for each app
            len_category += 1

    avg_n_installs = total / len_category  # Calculate average number of installs
    category_avg_installs[category] = avg_n_installs  # Store in dictionary

# Sort categories by average installs in descending order and print
sorted_category_avg_installs = sorted(category_avg_installs.items(), key=lambda x: x[1], reverse=True)

for category, avg_installs in sorted_category_avg_installs:
    print(category, ":", avg_installs)


COMMUNICATION : 38456119.167247385
VIDEO_PLAYERS : 24727872.452830188
SOCIAL : 23253652.127118643
PHOTOGRAPHY : 17840110.40229885
PRODUCTIVITY : 16787331.344927534
GAME : 15588015.603248259
TRAVEL_AND_LOCAL : 13984077.710144928
ENTERTAINMENT : 11640705.88235294
TOOLS : 10801391.298666667
NEWS_AND_MAGAZINES : 9549178.467741935
BOOKS_AND_REFERENCE : 8767811.894736841
SHOPPING : 7036877.311557789
PERSONALIZATION : 5201482.6122448975
WEATHER : 5074486.197183099
HEALTH_AND_FITNESS : 4188821.9853479853
MAPS_AND_NAVIGATION : 4056941.7741935486
FAMILY : 3695641.8198090694
SPORTS : 3638640.1428571427
ART_AND_DESIGN : 1986335.0877192982
FOOD_AND_DRINK : 1924897.7363636363
EDUCATION : 1833495.145631068
BUSINESS : 1712290.1474201474
LIFESTYLE : 1437816.2687861272
FINANCE : 1387692.475609756
HOUSE_AND_HOME : 1331540.5616438356
DATING : 854028.8303030303
COMICS : 817657.2727272727
AUTO_AND_VEHICLES : 647317.8170731707
LIBRARIES_AND_DEMO : 638503.734939759
PARENTING : 542603.6206896552
BEAUTY : 51315

Communication apps have the highest average number of installs, followed by video players and social apps.

Let's take a closer look

In [48]:
for app in android_cleaned:
    if app[1] == 'COMMUNICATION':
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
Messenger for SMS : 10,000,000+
My Tele2 : 5,000,000+
imo beta free calls and text : 100,000,000+
Contacts : 50,000,000+
Call Free – Free Call : 5,000,000+
Web Browser & Explorer : 5,000,000+
Browser 4G : 10,000,000+
MegaFon Dashboard : 10,000,000+
ZenUI Dialer & Contacts : 10,000,000+
Cricket Visual Voicemail : 10,000,000+
TracFone My Account : 1,000,000+
Xperia Link™ : 10,000,000+
TouchPal Keyboard - Fun Emoji & Android Keyboard : 10,000,000+
Skype Lite - Free Video Call & Chat : 5,000,000+
My magenta : 1,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Seznam.cz : 1,000,000+
Antillean Gold Telegram (original version) : 100,000+
AT&T Visual Voicemail : 10,000,000+
GMX Mail : 10,000,000+
Omlet Chat : 10,000,000+
My Vodacom SA : 5,000,000+
Microsoft Edge : 5,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Calls & Text by Mo+ : 5,000,000+
free 

The average for this category is skewed by a few apps with over one billion installs (WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail, and Hangouts) and a few others with over 100 or 500 million installs. If these were removed from the analysis, the average would drop significantly.
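
To see how much the largest apps pull the average up, one option is to recompute it using only the apps below a chosen install limit. The sketch below assumes a hypothetical helper `average_installs_below()`. The limit is a free parameter, not a value taken from the analysis above.

```python
def average_installs_below(install_texts, limit):
    """Average the install lower bounds that are strictly below `limit`."""
    values = [float(text.replace(",", "").replace("+", "")) for text in install_texts]
    kept = [value for value in values if value < limit]
    if not kept:
        return None  # nothing left to average
    return sum(kept) / len(kept)
```

Using the first three values printed above, `average_installs_below(['1,000,000,000+', '10,000,000+', '5,000,000+'], 100_000_000)` returns `7500000.0`, because the one-billion entry is excluded.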

In [49]:
for app in android_cleaned:
    if app[1] == 'VIDEO_PLAYERS':
        print(app[0], ':', app[5])

YouTube : 1,000,000,000+
All Video Downloader 2018 : 1,000,000+
Video Downloader : 10,000,000+
HD Video Player : 1,000,000+
Iqiyi (for tablet) : 1,000,000+
Video Player All Format : 10,000,000+
Motorola Gallery : 100,000,000+
Free TV series : 100,000+
Video Player All Format for Android : 500,000+
VLC for Android : 100,000,000+
Code : 10,000,000+
Vote for : 50,000,000+
XX HD Video downloader-Free Video Downloader : 1,000,000+
OBJECTIVE : 1,000,000+
Music - Mp3 Player : 10,000,000+
HD Movie Video Player : 1,000,000+
YouCut - Video Editor & Video Maker, No Watermark : 5,000,000+
Video Editor,Crop Video,Movie Video,Music,Effects : 1,000,000+
YouTube Studio : 10,000,000+
video player for android : 10,000,000+
Vigo Video : 50,000,000+
Google Play Movies & TV : 1,000,000,000+
HTC Service － DLNA : 10,000,000+
VPlayer : 1,000,000+
MiniMovie - Free Video and Slideshow Editor : 50,000,000+
Samsung Video Library : 50,000,000+
OnePlus Gallery : 1,000,000+
LIKE – Magic Video Maker & Community : 50,

Although the category is dominated by the giant YouTube, many other apps also have a significant number of installs.

We can take advantage of this and build an app in this category, since we saw similar potential on the App Store. We can also incorporate audio and radio functionality, since very few apps here seem to offer everything in one place.

## Conclusion

In this project, we analyzed data on Google Play Store and App Store with an aim of recommending to the development team a profitable niche for our free app.

We conclude by recommending an app in the music genre which incorporates audio, video and radio functionality.