# Profitable App Profiles for the App Store and Google Play Markets

The goal of this project is to identify profitable mobile app profiles for the App Store and Google Play markets. Our job is to enable our team of developers to make data-driven decisions about the kinds of apps they build.

We will use two data sets that seem suitable for our purpose:

- A [data set](https://dq-content.s3.amazonaws.com/350/googleplaystore.csv) containing data about approximately ten thousand Android apps from Google Play.

- A [data set](https://dq-content.s3.amazonaws.com/350/AppleStore.csv) containing data about approximately seven thousand iOS apps from the App Store.

Let's start by opening the two data sets and then continue with exploring the data.

In [1]:
from csv import reader

# Google Play data set
opened_file = open('googleplaystore.csv', encoding='utf8')
read_list = reader(opened_file)
android = list(read_list)
opened_file.close()
android_header = android[0]
android_data = android[1:]


# App Store data set
opened_file = open('AppleStore.csv', encoding='utf8')
read_list = reader(opened_file)
ios = list(read_list)
opened_file.close()
ios_header = ios[0]
ios_data = ios[1:]
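
The two loading steps above follow the same pattern, so they could be wrapped in a small helper. The sketch below is one possible way to do it: `load_csv` is a hypothetical name that does not appear in the original code, and the `utf8` default is an assumption about how the files were saved.

```python
from csv import reader

def load_csv(path, encoding='utf8'):
    """Return (header, rows) for the CSV file at `path`."""
    # The `with` block closes the file even if reading fails.
    with open(path, encoding=encoding, newline='') as opened_file:
        rows = list(reader(opened_file))
    return rows[0], rows[1:]
```

With this helper, loading a file becomes a single line, for example `android_header, android_data = load_csv('googleplaystore.csv')`.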



To make it easier to explore the two data sets, we'll first write a function named `explore_data()` that we can use repeatedly to explore rows in a more readable way. 

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

explore_data(android_data, 0, 5, False)
explore_data(ios_data, 0, 5, False)


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '

***
## Columns Helpful in Our Analysis
At a quick glance at the Google Play data set, the columns that might be useful for our analysis are `Category`, `Rating`, `Reviews`, `Size`, `Installs`, `Type`, `Price`, and `Genres`.

In the iOS data set, the columns that seem interesting are `track_name`, `currency`, `price`, `rating_count_tot`, `rating_count_ver`, and `prime_genre`.

***
## Deleting Wrong Data
The Google Play data set has a dedicated [discussion section](https://www.kaggle.com/lava18/google-play-store-apps/discussion), where users have reported that some rows contain incorrect or incomplete data.


In [3]:
print(android_data[10472])
print('\n')
print(android_header)
print('\n')
#Correct Row
print(android_data[0])

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


As we can see, **row 10472** does not have an entry for the **Category** column, which shifts all of the following values one column to the left and makes the row incorrect.

**For example, in the `Rating` column** this row holds the value 19, but Google Play ratings only go up to 5.

So we will delete that row.

In [4]:
print(len(android_data))
print('\n')
print(android_data[10472])
print('\n')
del android_data[10472]
print('\n')
print(android_data[10472])
print('\n')
print(len(android_data))

10841


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']




['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']


10840
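
The row above was located through a known index. As a complementary check, a row that has lost a value can also be spotted because it has fewer fields than the header. The sketch below illustrates the idea on made-up rows; `find_malformed_rows` is a hypothetical helper, not part of the original analysis.

```python
def find_malformed_rows(header, rows):
    """Return the indexes of rows whose length differs from the header's."""
    expected = len(header)
    return [i for i, row in enumerate(rows) if len(row) != expected]

# Toy data (made up for illustration): the second row is missing a value.
header = ['App', 'Category', 'Rating']
rows = [
    ['Sketch', 'ART_AND_DESIGN', '4.5'],
    ['Broken App', '1.9'],
    ['Box', 'BUSINESS', '4.2'],
]

bad_indexes = find_malformed_rows(header, rows)
print(bad_indexes)

# Delete from the end so that the earlier indexes stay valid.
for i in reversed(bad_indexes):
    del rows[i]
```

A length check only catches rows with a missing or extra field; a row with the right number of fields but wrong values would still need a separate check.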


***
## Remove Duplicate Entries: Part One

If we explore the Google Play data set long enough or look at the [discussions](https://www.kaggle.com/lava18/google-play-store-apps/discussion) section, we notice that some apps have duplicate entries. For instance, Instagram has four entries:

In [5]:
for app in android_data:
    app_name = app[0]
    if app_name == 'Instagram':
        print('\n')
        print(app)



['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


We don't want to count certain apps more than once when we analyze data, so we need to remove the duplicate entries and keep only one entry per app. 

**So let's first count the number of duplicate apps in the data set.**


In [6]:
duplicate_apps = []
unique_apps = []

for app in android_data:
    app_name = app[0]
    if app_name in unique_apps:
        duplicate_apps.append(app_name)
    else:
        unique_apps.append(app_name)

print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[:15])

Number of duplicate apps: 1181


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']
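
The same count can be obtained without the repeated list lookups, which get slow as the list of unique names grows. The following is a minimal sketch using `collections.Counter` from the standard library, run on made-up rows rather than the real data set.

```python
from collections import Counter

# Toy rows (made up for illustration); only the app name in column 0 matters here.
rows = [['Instagram'], ['Box'], ['Instagram'], ['Slack'], ['Instagram'], ['Box']]

name_counts = Counter(row[0] for row in rows)

# Every occurrence after the first one counts as a duplicate entry.
n_duplicates = sum(count - 1 for count in name_counts.values())
duplicated_names = [name for name, count in name_counts.items() if count > 1]

print('Number of duplicate entries:', n_duplicates)
print('Apps with duplicates:', duplicated_names)
```

This counts duplicates the same way as the loop above: an app that appears three times contributes two duplicate entries.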


---
## Removing Duplicate Entries: Part Two

**We could remove the duplicates randomly.**

But if we look at the duplicate Instagram entries above, we can see that the main difference between them is the number of reviews.

We can use this information to build a criterion for removing the duplicates. The higher the number of reviews, the more recent the data should be.

So, rather than removing duplicates randomly, we'll keep only the row with the highest number of reviews and remove the other entries for any given app.

In [7]:
reviews_max = {}

for app in android_data:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    if name not in reviews_max:
        reviews_max[name] = n_reviews
        
print(len(reviews_max))

9659


In [8]:
#New cleaned dataset
android_clean = []
#Store app names
already_added = []

for app in android_data:
    name = app[0]
    n_reviews = float(app[3])
    if name not in already_added and n_reviews == reviews_max[name]:
        android_clean.append(app)
        already_added.append(name)

print(len(android_clean))

9659


---
## The Code Above, Explained

To remove the duplicate entries, we start by creating two empty lists: `android_clean`, which stores the cleaned data, and `already_added`, which stores the names of the apps already added so that we can keep track of duplicates.

    for app in android_data:
        name = app[0]
        n_reviews = float(app[3])
        if name not in already_added and n_reviews == reviews_max[name]:
            android_clean.append(app)
            already_added.append(name)

    print(len(android_clean))

The code loops through the data set and converts the number of reviews to a float, which is assigned to `n_reviews`. It then checks that the app's name is not yet in the `already_added` list and that the number of reviews equals the maximum for that app (`n_reviews == reviews_max[name]`). The `already_added` check is needed because several entries of the same app can share the maximum number of reviews.

If both conditions are satisfied, we append the row to the `android_clean` list and the app's name to the `already_added` list.

At the end, we print the length of `android_clean` with `print(len(android_clean))`.
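
The two-pass approach above can also be expressed as a single pass that remembers the best row seen so far for each app. This is only a sketch of an alternative, not the method used in this notebook; `keep_most_reviewed` is a hypothetical name, and the `Box` row is made up.

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep, for each app name, the first row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' keeps the earlier row when two rows tie on reviews.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

# Shortened rows: the Instagram review counts come from the output above,
# the Box row is made up for illustration.
rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Box', 'BUSINESS', '4.2', '159'],
]
cleaned = keep_most_reviewed(rows)
print(len(cleaned))
```

Because a dictionary lookup does not scan a list, this version avoids the repeated `name not in already_added` search. One difference to keep in mind: the result is ordered by the first appearance of each app name rather than by the position of the kept row.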

---
## Removing Non-English Apps: Part One

We'd like to analyze only the apps that are designed for an English-speaking audience.

If we explore the data long enough, we'll find that both data sets have apps with names that suggest they are not designed for an English-speaking audience.

**We're not interested in keeping these apps, so we'll remove them.** 

In [9]:
def is_eng(string):
    for character in string:
        if ord(character) > 127:
            return False
    # Return True only after every character has been checked
    return True

print(is_eng('Instagram'))
print(is_eng('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_eng('Docs To Go™ Free Office Suite'))
print(is_eng('Instachat 😜'))


True
False
False
False


## Removing Non-English Apps: Part Two

A check that rejects every name containing a character outside the ASCII range is too strict: it would flag English app names like `'Docs To Go™ Free Office Suite'` and `'Instachat 😜'` as non-English, because emojis and characters like ™ have corresponding numbers over 127. To limit this loss of data, the function below only treats a name as non-English when it contains more than three such characters.

In [10]:
def is_english(string):
    non_ascii = 0
    
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    
    if non_ascii > 3:
        return False
    else:
        return True

print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

True
True
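
To see how the threshold of three behaves on the example names, it can help to count the non-ASCII characters directly. The snippet below is a small illustration; `count_non_ascii` is a hypothetical helper that is not used elsewhere in this notebook.

```python
def count_non_ascii(string):
    """Count the characters whose code point falls outside the ASCII range (0-127)."""
    return sum(1 for character in string if ord(character) > 127)

names = [
    'Docs To Go™ Free Office Suite',
    'Instachat 😜',
    '爱奇艺PPS -《欢乐颂2》电视剧热播',
]
for name in names:
    print(name, ':', count_non_ascii(name))
```

The two English names contain a single non-ASCII character each, while the third name contains far more than three, so only the third is filtered out.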


**Now we will use the new function to filter out non-English apps from both datasets**

In [11]:
android_english = []
ios_english = []

for app in android_clean:
    name = app[0]
    if is_english(name):
        android_english.append(app)
        
for app in ios_data:
    name = app[1]
    if is_english(name):
        ios_english.append(app)

explore_data(android_english, 0, 3, True)
print('\n')
explore_data(ios_english, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagr

---
## Isolating the Free Apps
Our data sets contain both free and non-free apps, and for our analysis we isolate only the free apps. 
Below, we isolate the free apps for both our data sets.

In [12]:
final_android = []
final_ios = []
for app in android_english:
    price = app[7]
    if price == '0':
        final_android.append(app)

for app in ios_english:
    price = app[4]
    if price == '0.0':
        final_ios.append(app)

print(final_android[:2])
print(len(final_android))
print('\n')
print(final_ios[:2])
print(len(final_ios))

[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']]
8864


[['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'], ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']]
3222


Finally, we are left with `8864` **Android apps** and `3222` **iOS apps** that are free and have English names.
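
The filters above compare price strings (`'0'` and `'0.0'`), which only works when the price is written in exactly that form. A more tolerant check converts the price to a number first. The sketch below assumes that paid prices may carry a leading dollar sign (for example `'$4.99'`); `is_free` is a hypothetical helper name.

```python
def is_free(price):
    """Return True when a price string such as '0', '0.0' or '$0.00' equals zero."""
    # Assumption: paid prices may carry a leading dollar sign, e.g. '$4.99'.
    return float(price.strip().lstrip('$')) == 0.0

for price in ['0', '0.0', '$4.99', '2.99']:
    print(price, '->', is_free(price))
```

The same function could then be used for both data sets, for example `is_free(app[7])` for Google Play rows and `is_free(app[4])` for App Store rows.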

---
## Most Common Apps by Genre: Part One

As we mentioned in the introduction, our aim is to determine the kinds of apps that are likely to attract more users because our revenue is highly influenced by the number of people using our apps.

To minimize risks and overhead, our validation strategy for an app idea comprises three steps:

1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we develop it further.
3. If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

Let's begin the analysis by getting a sense of the most common genres in each market. For this, we'll need to build frequency tables for a few columns in our data sets.



In [13]:
print(android_header)
print('\n')
print(ios_header)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


## Most Common Apps by Genre: Part Two
For this, we'll build a frequency table for the `prime_genre` column of the App Store data set, and the `Genres` and `Category` columns of the Google Play data set.

We'll build two functions we can use to analyze the frequency tables:

- One function to generate frequency tables that show percentages

- Another function we can use to display the percentages in a descending order


In [14]:
def freq_table(dataset, index):
    table = {}
    count = 0
    
    for row in dataset:
        count += 1
        key = row[index]
        if key in table:
            table[key] += 1
        else:
            table[key] = 1
    
    table_percentages = {}
    for key in table:
        percentage = (table[key] / count) * 100
        table_percentages[key] = percentage 
    
    return table_percentages


def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
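
For comparison, the two functions above can be condensed with `collections.Counter` from the standard library, whose `most_common()` method already returns the values from most to least frequent. This is a sketch of an alternative, with a hypothetical function name and made-up rows.

```python
from collections import Counter

def freq_table_sorted(dataset, index):
    """Return (value, percentage) pairs, most frequent value first."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return [(value, count / total * 100) for value, count in counts.most_common()]

# Toy rows (made up for illustration) with the genre in column 0.
toy = [['Games'], ['Games'], ['Music'], ['Games']]
for value, percentage in freq_table_sorted(toy, 0):
    print(value, ':', percentage)
```

Note that the ordering of tied values can differ from `display_table()`, which sorts `(percentage, key)` tuples and therefore breaks ties by key in reverse alphabetical order.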

## Most Common Apps by Genre: Part Three
We start by analyzing the frequency table generated for the `prime_genre` column of the App Store dataset

In [15]:
display_table(final_ios, 11)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


**Analyzing the table, we can observe that:**
- The most common genre is `Games`, at 58.16%.
- The next most common is `Entertainment`, at 7.88%.

*The frequency table gives the general impression that most apps are designed for entertainment, for example `Games`, `Entertainment`, `Photo & Video`, `Social Networking`, and `Sports`.*

However, the fact that fun apps are the most numerous doesn't imply that they also have the greatest number of users; the demand might not be the same as the offer.

Let's continue by examining the Genres and Category columns of the Google Play data set (two columns which seem to be related).

In [16]:
display_table(final_android, 1)

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

From the table for the `Category` column, we can see that the most common category for apps on Google Play is `FAMILY`, at 18.9%, which mostly consists of apps and games for kids.

Next, let's look at the table for the `Genres` column:

In [17]:
display_table(final_android, 9)

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

Most apps are made for productive use rather than for fun: the `Tools` genre has the largest share of apps at 8.45%, and although `Entertainment` comes second at 6.07%, `Education`, `Business`, `Productivity`, and `Lifestyle` also have high shares at 5.35%, 4.59%, 3.89%, and 3.89% respectively.

**After analyzing the tables, we can say that the two markets look significantly different.**

On Google Play, there are not that many apps designed for fun, and a good number of apps are designed for practical purposes (family, tools, business, lifestyle, productivity, etc.).

In contrast, the App Store is dominated by apps designed for fun.

---
## Most Popular Apps by Genre on the App Store

The frequency tables we analyzed above showed us that apps designed for fun dominate the App Store, while Google Play shows a more balanced landscape of both practical and fun apps. Now, we'd like to determine the kind of apps with the most users.

One way to find out which genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play data set, we can find this information in the `Installs` column, but this information is missing for the App Store data set. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the `rating_count_tot` column.



In [18]:
genre_ios = freq_table(final_ios, 11)

In [19]:
for genre in genre_ios:
    total = 0
    len_genre = 0
    for app in final_ios:
        genre_app = app[11]
        if genre_app == genre:
            user_ratings = float(app[5])
            total += user_ratings
            len_genre += 1
        
    avg_rating = (total/len_genre)
    print(genre, ':', avg_rating)
        
    

Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0
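
The loop above scans the whole data set once per genre. The same averages can be computed in a single pass by keeping a running total and a running count for each genre, and sorting the result makes the ranking easier to read. The sketch below uses made-up rows, and `average_by_group` is a hypothetical helper name.

```python
def average_by_group(dataset, group_index, value_index):
    """Return {group: mean of the numeric column} using one pass over the rows."""
    totals = {}
    counts = {}
    for row in dataset:
        group = row[group_index]
        totals[group] = totals.get(group, 0.0) + float(row[value_index])
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}

# Toy rows (made up for illustration): [genre, rating_count].
toy = [['Games', '100'], ['Games', '300'], ['Music', '50']]
averages = average_by_group(toy, 0, 1)

# Sort by the average, largest first.
for genre, avg in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(genre, ':', avg)
```

For the App Store rows used here, the equivalent call would be `average_by_group(final_ios, 11, 5)`.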


On average, the `Navigation` genre has the highest number of user ratings per app, but let's analyze the `Games` genre, since the current popularity of mobile games gives it strong potential to be profitable.

In [20]:
for app in final_ios:
    if app[11] == 'Games':
        print(app[1], ":", app[5])

Clash of Clans : 2130805
Temple Run : 1724546
Candy Crush Saga : 961794
Angry Birds : 824451
Subway Surfers : 706110
Solitaire : 679055
CSR Racing : 677247
Crossy Road - Endless Arcade Hopper : 669079
Injustice: Gods Among Us : 612532
Hay Day : 567344
PAC-MAN : 508808
DragonVale : 503230
Head Soccer : 481564
Despicable Me: Minion Rush : 464312
The Sims™ FreePlay : 446880
Sonic Dash : 418033
8 Ball Pool™ : 416736
Tiny Tower - Free City Building : 414803
Jetpack Joyride : 405647
Bike Race - Top Motorcycle Racing Games : 405007
Kim Kardashian: Hollywood : 397730
Trivia Crack : 393469
WordBrain : 391401
Sniper 3D Assassin: Shoot to Kill Gun Game : 386521
Flow Free : 373857
Geometry Dash Lite : 370370
▻Sudoku : 359832
Fruit Ninja® : 327025
Pixel Gun 3D : 301182
Temple Run 2 : 295211
My Horse : 293857
Word Cookies! : 287095
Dragon City Mobile : 277268
The Simpsons™: Tapped Out : 274501
Plants vs. Zombies™ 2 : 267394
Clash Royale : 266921
Pokémon GO : 257627
CSR Racing 2 : 257100
Star Wars™: 

From the output we can see that the games with the most user ratings are `Clash of Clans` (2,130,805) and `Temple Run` (1,724,546).

Let's also analyze the `Music` genre, which typically earns revenue through subscription models. In the `Music` genre, `Pandora - Music & Radio` has the most user ratings (1,126,879), ahead of `Spotify Music` (878,563), which I had expected to be at the top.

*This unexpected finding demonstrates the importance of relying on data analysis rather than assumptions or expectations, and highlights the critical role of data analysis in understanding trends and making informed choices.*

In [21]:
for app in final_ios:
    if app[11] == 'Music':
        print(app[1], ":", app[5])

Pandora - Music & Radio : 1126879
Spotify Music : 878563
Shazam - Discover music, artists, videos & lyrics : 402925
iHeartRadio – Free Music & Radio Stations : 293228
SoundCloud - Music & Audio : 135744
Magic Piano by Smule : 131695
Smule Sing! : 119316
TuneIn Radio - MLB NBA Audiobooks Podcasts Music : 110420
Amazon Music : 106235
SoundHound Song Search & Music Player : 82602
Sonos Controller : 48905
Bandsintown Concerts : 30845
Karaoke - Sing Karaoke, Unlimited Songs! : 28606
My Mixtapez Music : 26286
Sing Karaoke Songs Unlimited with StarMaker : 26227
Ringtones for iPhone & Ringtone Maker : 25403
Musi - Unlimited Music For YouTube : 25193
AutoRap by Smule : 18202
Spinrilla - Mixtapes For Free : 15053
Napster - Top Music & Radio : 14268
edjing Mix:DJ turntable to remix and scratch music : 13580
Free Music - MP3 Streamer & Playlist Manager Pro : 13443
Free Piano app by Yokee : 13016
Google Play Music : 10118
Certified Mixtapes - Hip Hop Albums & Mixtapes : 9975
TIDAL : 7398
YouTube Mu

## Most Popular Apps by Genre on Google Play
We have data about the number of installs for the Google Play market, so we should be able to get a clearer picture about genre popularity.

In [22]:
display_table(final_android, 5)

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


However, the install numbers don't seem precise enough: we can see that most values are open-ended (100+, 1,000+, 5,000+, etc.).

For instance, we don't know whether an app with 100,000+ installs has 100,000 installs, 200,000, or 350,000. However, we don't need very precise data for our purposes — we only want to find out which app genres attract the most users.

We're going to leave the numbers as they are, which means that we'll consider that an app with 100,000+ installs has 100,000 installs, and an app with 1,000,000+ installs has 1,000,000 installs, and so on.

**To perform computations, however, we'll need to convert each install number from a string to a float.**
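
The conversion described above amounts to stripping the `+` and `,` characters before calling `float()`. A minimal sketch of that step as a standalone helper (`parse_installs` is a hypothetical name) looks like this:

```python
def parse_installs(installs):
    """Convert an install string such as '1,000,000+' to a float lower bound."""
    return float(installs.replace('+', '').replace(',', ''))

for raw in ['100,000+', '1,000,000+', '0']:
    print(raw, '->', parse_installs(raw))
```

The resulting number is only the lower bound of the install bucket, which is consistent with the decision to treat `100,000+` as 100,000 installs.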

In [23]:
print(android_header)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


In [24]:
category_android = freq_table(final_android, 1)

In [25]:
for category in category_android:
    total = 0
    len_category = 0
    for app in final_android:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace('+', '')
            n_installs = n_installs.replace(',', '')
            total += float(n_installs)
            len_category += 1
    avg_installs = total/len_category
    print(category, ":", avg_installs)
    

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_

We can see that the `COMMUNICATION` category has the highest average number of installs. Within that category, apps such as WhatsApp Messenger reach the top bucket of 1,000,000,000+ installs.

In [26]:
for app in final_android:
    if app[1] == 'COMMUNICATION':
        print(app[0], ":", app[5])

WhatsApp Messenger : 1,000,000,000+
Messenger for SMS : 10,000,000+
My Tele2 : 5,000,000+
imo beta free calls and text : 100,000,000+
Contacts : 50,000,000+
Call Free – Free Call : 5,000,000+
Web Browser & Explorer : 5,000,000+
Browser 4G : 10,000,000+
MegaFon Dashboard : 10,000,000+
ZenUI Dialer & Contacts : 10,000,000+
Cricket Visual Voicemail : 10,000,000+
TracFone My Account : 1,000,000+
Xperia Link™ : 10,000,000+
TouchPal Keyboard - Fun Emoji & Android Keyboard : 10,000,000+
Skype Lite - Free Video Call & Chat : 5,000,000+
My magenta : 1,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Seznam.cz : 1,000,000+
Antillean Gold Telegram (original version) : 100,000+
AT&T Visual Voicemail : 10,000,000+
GMX Mail : 10,000,000+
Omlet Chat : 10,000,000+
My Vodacom SA : 5,000,000+
Microsoft Edge : 5,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Calls & Text by Mo+ : 5,000,000+
free 

Another category with a high number of installs on Google Play is `GAME`, with an average of about 15.6 million installs per app, which matches the trend we observed in the App Store.

In [27]:
for app in final_android:
    if app[1] == 'GAME':
        print(app[0], ":", app[5])

Solitaire : 10,000,000+
Sonic Dash : 100,000,000+
PAC-MAN : 100,000,000+
Bubble Witch 3 Saga : 50,000,000+
Race the Traffic Moto : 10,000,000+
Marble - Temple Quest : 10,000,000+
Shooting King : 10,000,000+
Geometry Dash World : 10,000,000+
Jungle Marble Blast : 5,000,000+
Roll the Ball® - slide puzzle : 100,000,000+
Block Craft 3D: Building Simulator Games For Free : 50,000,000+
Farm Fruit Pop: Party Time : 1,000,000+
Love Balls : 50,000,000+
Piano Tiles 2™ : 100,000,000+
Pokémon GO : 100,000,000+
Paint Hit : 10,000,000+
Snake VS Block : 50,000,000+
Rolly Vortex : 10,000,000+
Woody Puzzle : 1,000,000+
Stack Jump : 10,000,000+
The Cube : 5,000,000+
Extreme Car Driving Simulator : 100,000,000+
Bricks n Balls : 1,000,000+
The Fish Master! : 1,000,000+
Color Road : 10,000,000+
Draw In : 10,000,000+
PLANK! : 500,000+
Looper! : 1,000,000+
Trivia Crack : 100,000,000+
Will it Crush? : 5,000,000+
Tomb of the Mask : 5,000,000+
Baseball Boy! : 10,000,000+
Hello Stars : 10,000,000+
Tank Stars : 1

In [28]:
for app in final_android:
    if app[1] == 'GAME' and (app[5] == '500,000,000+'):
        print(app[0], ":", app[5])

Candy Crush Saga : 500,000,000+
Pou : 500,000,000+
My Talking Tom : 500,000,000+
Temple Run 2 : 500,000,000+
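
The filter above matches one exact install bucket, so it would skip any game listed in a higher bucket. A threshold-based filter is one way around that. The sketch below uses made-up rows, and `installs_at_least` is a hypothetical helper name.

```python
def installs_at_least(rows, minimum, installs_index=1):
    """Return the rows whose install bucket is at or above `minimum`."""
    selected = []
    for row in rows:
        n_installs = float(row[installs_index].replace('+', '').replace(',', ''))
        if n_installs >= minimum:
            selected.append(row)
    return selected

# Toy rows (made up for illustration): [name, installs].
toy = [
    ['Game A', '500,000,000+'],
    ['Game B', '1,000,000,000+'],
    ['Game C', '10,000,000+'],
]
for name, installs in installs_at_least(toy, 500_000_000):
    print(name, ':', installs)
```

On the real rows, the install count sits in column 5, so the call would pass `installs_index=5` after first selecting the `GAME` category.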


`Candy Crush Saga` and `Temple Run 2`, each with 500,000,000+ installs, are also among the most-rated games in the App Store.

We can observe that, in both the App Store and Google Play, the most-used apps tend to come from the `Social Media or Communication`, `Books and Reference`, and `Game` categories.

We also notice that the most popular games are `strategy`, `puzzle`, `RPG`, or `endless`-style games, which typically earn most of their revenue through in-app purchases, special events, and rewards for their users.

However, this category is already dominated by a few apps. To build a profitable app, we would need to offer similar features (in-app purchases, special events, and rewards for users) along with some distinguishing features such as `social features`, `personalization and customization`, `cross-platform accessibility`, and `early access features` for paying users.

---
## Conclusion
In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.

We concluded that a game app is likely to be the most profitable choice for both Google Play and the App Store. Games offer diverse monetization opportunities, such as in-app purchases, advertisements, and premium content, that let us generate revenue directly from players, whereas many other app genres rely solely on ads or subscriptions.