# Project: Profitable App Profiles for the iOS App Store and Google Play Markets

This project uses two datasets: approximately 10,000 Android apps from Google Play and approximately 7,000 iOS apps from the App Store.

It aims to analyze the iOS and Android apps to determine what types of apps are more likely to attract users.

# Opening the dataset

We will first open the two datasets,
1) the Google Play data set and
2) the App Store dataset.

In [29]:
from csv import reader

### The Google Play dataset ###
# encoding='utf8' keeps non-ASCII app names intact; `with` closes the file for us
with open('googleplaystore.csv', encoding='utf8') as opened_file:
    android = list(reader(opened_file))
android_header = android[0]
android = android[1:]

### The App Store dataset ###
with open('AppleStore.csv', encoding='utf8') as opened_file:
    ios = list(reader(opened_file))
ios_header = ios[0]
ios = ios[1:]

Definitions of the App Store dataset's columns are available in the [documentation](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home).

To inspect the data, we define the `explore_data()` function, which does the following:
- Prints a few rows of sample data
- Prints the number of rows and columns in the data set


In [30]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line between rows
        
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

print(android_header)
print('\n')
explore_data(android, 0, 3, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


From the above code, we know that there are 10,841 rows and 13 columns within the Google Play dataset. The columns include `App`, `Installs` and more.

A more detailed breakdown of the column definition is available [here](https://www.kaggle.com/lava18/google-play-store-apps).

# Data cleaning
Before we begin our analysis, we need to do the following:
- Detect inaccurate data, then correct or remove it
- Detect duplicate data, then remove them
- Remove non-English apps
- Remove paid Apps

# Detect inaccurate data, then correct or remove it

From the discussion section [here](https://www.kaggle.com/lava18/google-play-store-apps), we see that row 10472 contains incorrect data.

In [31]:
print(android_header)  # header
print('\n')
print(android[10472])  # incorrect row

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


We can see that the `Category` value is missing from this row, so every later value is shifted one column to the left (for example, `Rating` reads `19`). We will delete this row.

In [32]:
print(len(android)) #check entries before deletion
del android[10472]
len(android) #check entries after deletion

10841


10840
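
Rows like this one can also be found without relying on the discussion forum. The following is a minimal sketch, not part of the original analysis: it flags any row whose length differs from the header's. The function name `find_malformed_rows` and the tiny sample data are hypothetical.

```python
def find_malformed_rows(header, rows):
    """Return (index, row) pairs whose length differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(rows) if len(row) != expected]


# Hypothetical sample: the second row is missing its Category value.
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['Photo Editor', 'ART_AND_DESIGN', '4.1'],
    ['Life Made WI-Fi Touchscreen Photo Frame', '1.9'],
]

for index, row in find_malformed_rows(sample_header, sample_rows):
    print(index, row)
```

On the real data, the equivalent call would be `find_malformed_rows(android_header, android)`, run before any row is deleted.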

# Detect duplicate data, then remove them

Besides the row we deleted, the Google Play dataset contains some duplicate entries that will be removed.

Here, we can see an example: Instagram appears 4 times.

In [33]:
for app in android:
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


We can find the other duplicates using the code below. We create two lists, `duplicate_apps` and `unique_apps`. If the `name` of an app is already in `unique_apps`, we add it to `duplicate_apps`; otherwise we add it to `unique_apps`.

From this, we find a total of 1,181 duplicate entries.

In [34]:
duplicate_apps = []
unique_apps = []

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Example of duplicate apps:',duplicate_apps[:5])

Number of duplicate apps: 1181


Example of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings']
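
Checking `name in unique_apps` scans a list on every iteration, which gets slow as the list grows. As a hedged alternative, the sketch below counts the same thing (every repeat of an earlier name) with `collections.Counter`. The helper name `count_duplicates` and the sample names are illustrative only.

```python
from collections import Counter


def count_duplicates(names):
    """Return how many entries are repeats of an earlier name."""
    counts = Counter(names)
    # A name seen n times contributes n - 1 duplicate entries.
    return sum(n - 1 for n in counts.values())


sample_names = ['Instagram', 'Box', 'Instagram', 'Slack', 'Instagram', 'Box']
print(count_duplicates(sample_names))  # 3: Instagram repeats twice, Box once
```

For the notebook's data, the input would be `[app[0] for app in android]`.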


Looking at the Instagram entries, we can see that they differ only in the `Reviews` column. Since the most recent snapshot of an app should have the highest review count, we will keep the entry with the highest `Reviews` value.

We will create a dictionary that:
- contains the unique app name as the key, highest number of reviews as the value
- is used to create a new list of apps with the highest number of reviews

In [35]:
reviews_max={} #creating dictionary with key = app name, value = number of reviews

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
        
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

We will check the number of entries in our dictionary:

In [36]:
print(len(reviews_max))

9659


We will create a new list of apps with the highest number of reviews.

In [37]:
android_clean = [] #new list of apps with highest number of reviews
already_added = []

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if (reviews_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)

We will check that the `android_clean` dataset contains the same number of rows as `reviews_max` has entries.

In [38]:
explore_data(android_clean,0,3,True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13
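
The two-pass approach above (build `reviews_max`, then filter) can also be written as a single pass. This is a sketch under the same assumption that the row with the most reviews is the one to keep. The function name `keep_most_reviewed` is hypothetical, and the shortened sample rows (including the `Box` values) are made up for illustration.

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Store the row itself, replacing it only when a larger count appears.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


sample_rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Box', 'BUSINESS', '4.2', '159872'],       # illustrative values
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
]
for row in keep_most_reviewed(sample_rows):
    print(row)
```

Because the dictionary stores whole rows, ties are resolved in favor of the first row seen and no separate `already_added` list is needed.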


# Remove non-English apps

Here, we define a function `is_english_string` that we can use to detect app names that are not in English.

The `ord()` function returns the Unicode code point of a character. Characters commonly used in English text fall in the ASCII range (0 to 127), so a code point above 127 suggests non-English text. We can use this to determine which apps have non-English names.
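
As a quick illustration of the idea, the sketch below prints the code point of a few characters and whether each falls in the ASCII range. The helper `is_ascii` is a name introduced here for illustration only.

```python
def is_ascii(character):
    """Return True if the character's code point is in the ASCII range (0-127)."""
    return ord(character) <= 127


# Letters and common punctuation are ASCII; the trademark sign and CJK characters are not.
for character in ['a', 'Z', '~', '™', '爱']:
    print(repr(character), ord(character), is_ascii(character))
```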

In [39]:
def is_english_string(string):
    for character in string:
        if ord(character)>127:
            return False
    return True

print(is_english_string('Instagram'))
print(is_english_string('Áà±Â•áËâ∫PPS -„ÄäÊ¨¢‰πêÈ¢Ç2„ÄãÁîµËßÜÂâßÁÉ≠Êí≠'))
print(is_english_string('Docs To Go‚Ñ¢ Free Office Suite'))
print(is_english_string('Instachat üòú'))

True
False
False
False


We can see that this function would wrongly flag some English apps, as certain symbols (™) and emojis fall outside the ASCII range.

As a workaround, we create a function `is_english` that rejects an app only if its name has more than three characters outside the ASCII range.

In [40]:
def is_english(string):
    non_ascii = 0
    
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    
    if non_ascii > 3:
        return False
    else:
        return True

We can test this for some of the apps.

In [41]:
print(is_english('Docs To Go‚Ñ¢ Free Office Suite'))
print(is_english('Instachat üòú'))
print(is_english('Áà±Â•áËâ∫PPS -„ÄäÊ¨¢‰πêÈ¢Ç2„ÄãÁîµËßÜÂâßÁÉ≠Êí≠'))

True
True
False


Next, we will create lists of English-only apps for both datasets.

In [42]:
android_english = []
ios_english = []

for app in android_clean:
    name = app[0]
    if is_english(name):
        android_english.append(app)

for app in ios:
    name = app[0]
    if is_english(name):
        ios_english.append(app)

explore_data(android_english,0,3,True)
print('\n')
explore_data(ios,0,3,True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+'

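Here, we can see that 9,614 Android apps remain after filtering. The iOS count is still 7,197, the full size of the `ios` list: the loop above tests `app[0]`, which is the numeric app ID rather than the app name in `app[1]`, so every iOS app passes the check. This is why a few non-English names still appear in the iOS listings later on.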
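
Hard-coded indices such as `app[0]` and `app[1]` are easy to mix up, especially since the two datasets order their columns differently. The following is a minimal sketch of looking up a column by name from the header instead. The column names `id`, `track_name` and `price` are assumed from the App Store dataset documentation (the header is not printed in this notebook), and `column_index` is a hypothetical helper.

```python
def column_index(header, column_name):
    """Return the position of column_name in header (ValueError if absent)."""
    return header.index(column_name)


# Assumed column names; the real header is stored in ios_header.
sample_header = ['id', 'track_name', 'price']
sample_rows = [
    ['284882215', 'Facebook', '0.0'],
    ['389801252', 'Instagram', '0.0'],
]

name_index = column_index(sample_header, 'track_name')
names = [row[name_index] for row in sample_rows]
print(names)
```

A misspelled column name then fails loudly with a `ValueError` instead of silently reading the wrong column.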

# Remove paid Apps

Next, we will create a list with only free apps. In the Google Play dataset, free apps have a price of `'0'`, while in the App Store dataset they have a price of `'0.0'`.

In [43]:
android_final = []
ios_final = []

for app in android_english:
    price = app[7]
    if price == '0': #free apps in Google playstore dataset have price of 0
        android_final.append(app)

for app in ios_english:
    price = app[4]
    if price == '0.0': #free apps in App Store dataset have price of 0.0
        ios_final.append(app)

print(len(android_final))
print(len(ios_final))

8864
4056


We are left with 8,864 Android apps and 4,056 iOS apps that are free.
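
Comparing against the exact strings `'0'` and `'0.0'` works for these two files but depends on how each one formats its prices. A more tolerant sketch, assuming that paid prices may carry a leading `$` (as in `'$4.99'`), converts the string to a number first. The helper name `is_free` is hypothetical.

```python
def is_free(price_string):
    """Return True if the price string represents zero, e.g. '0' or '0.0'."""
    cleaned = price_string.replace('$', '').strip()
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Anything that is not a number (such as a shifted column) is not free.
        return False


for price in ['0', '0.0', '$4.99', 'Everyone']:
    print(repr(price), is_free(price))
```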

# Most Common Apps by Genre
As mentioned, our goal is to determine the kinds of apps that are likely to attract more users, because the number of people using our apps affects our revenue. We can then create an app that fits such a profile.

To minimize risks and overhead, our validation strategy for an app idea has three steps:

- Build a minimal Android version of the app, and add it to Google Play.
- If the app has a good response from users, we develop it further.
- If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

Thus, we need to find app profiles that are successful in both markets. We can use the `Genres` and `Category` columns of the Google Play dataset and the `prime_genre` column of the App Store dataset to determine this.

We will create the `freq_table()` function that does the following:
- Takes in two parameters: `dataset` and `index`. `dataset` will be a list of lists, and `index` will be an integer
- Returns a frequency table, expressed as percentages, for the specified column

We will then create the `display_table()` function that does the following:
- Transforms the frequency table we created into a list of tuples, then sorts the list in a descending order
- Prints the entries of the frequency table in descending order

In [48]:
def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage 
    
    return table_percentages


def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

In [108]:
def sort_index(list_of_lists):
    list_of_lists.sort(key= lambda row: (row[1], row[0])) 
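
To make the percentage calculation concrete, here is a small worked example on made-up data. It is a sketch of the same idea using `collections.Counter`, not a replacement for `freq_table()`; the name `freq_table_percent` and the sample rows are hypothetical.

```python
from collections import Counter


def freq_table_percent(dataset, index):
    """Return {value: percentage of rows} for the column at `index`."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return {value: count / total * 100 for value, count in counts.items()}


sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
table = freq_table_percent(sample, 1)

# Sort by percentage, highest first, as display_table() does.
for value, pct in sorted(table.items(), key=lambda item: item[1], reverse=True):
    print(value, ':', pct)
```

Three of the four sample rows are `Games`, so it accounts for 75% and `Music` for the remaining 25%.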

## App Store - Genres

Based on the frequency table for the `prime_genre` column of the App Store dataset, the top 2 genres are `Games` and `Entertainment`. Several of the genres that follow, such as `Photo & Video` and `Social Networking`, are also entertainment-related.

In [50]:
display_table(ios_final,-5)

Games : 55.64595660749507
Entertainment : 8.234714003944774
Photo & Video : 4.117357001972387
Social Networking : 3.5256410256410255
Education : 3.2544378698224854
Shopping : 2.983234714003945
Utilities : 2.687376725838264
Lifestyle : 2.3175542406311638
Finance : 2.0710059171597637
Sports : 1.947731755424063
Health & Fitness : 1.8737672583826428
Music : 1.6518737672583828
Book : 1.6272189349112427
Productivity : 1.5285996055226825
News : 1.4299802761341223
Travel : 1.3806706114398422
Food & Drink : 1.0601577909270217
Weather : 0.7642998027613412
Reference : 0.4930966469428008
Navigation : 0.4930966469428008
Business : 0.4930966469428008
Catalogs : 0.22189349112426035
Medical : 0.19723865877712032


Based on this, we can tell that apps designed for entertainment dominate the App Store.

With the majority of the apps falling under the `Games` genre, the games market may be oversaturated. This may make it difficult to distinguish ourselves if we create a game app, but we will need more information on the demand for these apps.

## Google Play - Category and Genres
Based on the frequency tables for the `Category` and `Genres` columns of the Google Play dataset:
- Top 2 categories: `FAMILY` and `GAME`
- Top 2 genres: `Tools` and `Entertainment`

Entertainment apps are prominent here as in the App Store results, but the percentages on Google Play are less skewed.

In [55]:
display_table(android_final,1) #Category
print('\n')
display_table(android_final,-4) #Genres

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

Looking at the `Category` and `Genres` columns, the difference is that the latter is more specific, i.e., it contains subgroups of categories such as `GAME`. We will use only the `Category` column from the Google Play dataset to compare with the `prime_genre` column from the App Store.

So far, we see that most apps fall into entertainment-related categories, especially on the App Store. However, the number of apps in a genre does not tell us how many users those apps have, so we next look at the data from the user end.

# Most Common Apps by Installs

## Determining the iOS App Profile for App Store

In [69]:
genres_ios = freq_table(ios_final,-5)

In [93]:
for genre in genres_ios:
    total = 0
    len_genre = 0
    for app in ios_final:
        genre_app = app[-5]
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
            
    avg_n_ratings = total / len_genre 
    print(genre, ':', avg_n_ratings)

Social Networking : 53078.195804195806
Photo & Video : 27249.892215568863
Games : 18924.68896765618
Music : 56482.02985074627
Reference : 67447.9
Health & Fitness : 19952.315789473683
Weather : 47220.93548387097
Utilities : 14010.100917431193
Travel : 20216.01785714286
Shopping : 18746.677685950413
News : 15892.724137931034
Navigation : 25972.05
Lifestyle : 8978.308510638299
Entertainment : 10822.961077844311
Food & Drink : 20179.093023255813
Sports : 20128.974683544304
Book : 8498.333333333334
Finance : 13522.261904761905
Education : 6266.333333333333
Productivity : 19053.887096774193
Business : 6367.8
Catalogs : 1779.5555555555557
Medical : 459.75


We can see that `Reference` and `Music` apps have the highest average number of ratings per app on the App Store. `Games`, which makes up most of the apps in the App Store, has comparatively few ratings per app.

Zooming into `Reference`, we see that the average is skewed by the Bible and Dictionary.com apps. However, there are some religious texts and gaming references that are doing well too.
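
The nested loop above rescans every app once per genre. As a hedged alternative, the averages can be accumulated in a single pass with two dictionaries. The function name `average_by_group` is hypothetical, and the three-column sample rows are a trimmed-down illustration using rating counts that appear in the listings in this section.

```python
def average_by_group(rows, group_index, value_index):
    """Return {group: mean of the numeric column} computed in one pass."""
    totals = {}
    counts = {}
    for row in rows:
        group = row[group_index]
        totals[group] = totals.get(group, 0.0) + float(row[value_index])
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}


sample = [
    ['Bible', '985920', 'Reference'],
    ['Google Translate', '26786', 'Reference'],
    ['TIDAL', '7398', 'Music'],
]
averages = average_by_group(sample, group_index=2, value_index=1)
for genre, average in averages.items():
    print(genre, ':', average)
```

With the notebook's data, the call would be `average_by_group(ios_final, -5, 5)`.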

In [94]:
for app in ios_final:
    if app[-5] == 'Reference':
        print(app[1], ':', app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
彩库宝典-【官方版】 : 0
Jishokun-Japanese English Dictionary & Translator : 0
無料で音楽や写真・カメラの裏技アプリ for iPhone7 : 0


This is also the case for the `Music` genre, with Pandora and Spotify Music taking most of the user ratings. However, there are a lot more players in this space compared to the `Reference` genre.

In [95]:
for app in ios_final:
    if app[-5] == 'Music':
        print(app[1], ':', app[5])

Pandora - Music & Radio : 1126879
Spotify Music : 878563
Shazam - Discover music, artists, videos & lyrics : 402925
iHeartRadio – Free Music & Radio Stations : 293228
SoundCloud - Music & Audio : 135744
Magic Piano by Smule : 131695
Smule Sing! : 119316
TuneIn Radio - MLB NBA Audiobooks Podcasts Music : 110420
Amazon Music : 106235
SoundHound Song Search & Music Player : 82602
Sonos Controller : 48905
Bandsintown Concerts : 30845
Karaoke - Sing Karaoke, Unlimited Songs! : 28606
My Mixtapez Music : 26286
Sing Karaoke Songs Unlimited with StarMaker : 26227
Ringtones for iPhone & Ringtone Maker : 25403
Musi - Unlimited Music For YouTube : 25193
AutoRap by Smule : 18202
Spinrilla - Mixtapes For Free : 15053
Napster - Top Music & Radio : 14268
edjing Mix:DJ turntable to remix and scratch music : 13580
Free Music - MP3 Streamer & Playlist Manager Pro : 13443
Free Piano app by Yokee : 13016
Google Play Music : 10118
Certified Mixtapes - Hip Hop Albums & Mixtapes : 9975
TIDAL : 7398
YouTube 

As `Social Networking` has an average number of user ratings close to that of `Music`, we can look at its apps too. Its average is also driven mainly by the top players, such as Facebook.

In [97]:
for app in ios_final:
    if app[-5] == 'Social Networking':
        print(app[1], ':', app[5])

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miito

Nevertheless, there are some smaller apps doing well. Looking back at the `Reference` genre, it may be possible for us to create an app that serves as a guide to a popular game (mobile or desktop).

We can also reuse the results we found earlier from the same dataset for this app profile. We previously found that `Games` is the most common genre in the App Store dataset. We can look into the specific apps to determine which mobile game to create a guide for. From here, it seems that Clash of Clans has the highest number of user ratings, which we use as a proxy for downloads because the App Store dataset has no install counts.

In [98]:
for app in ios_final:
    if app[-5] == 'Games':
        print(app[1], ':', app[5])

Clash of Clans : 2130805
Temple Run : 1724546
Candy Crush Saga : 961794
Angry Birds : 824451
Subway Surfers : 706110
Solitaire : 679055
CSR Racing : 677247
Crossy Road - Endless Arcade Hopper : 669079
Injustice: Gods Among Us : 612532
Hay Day : 567344
PAC-MAN : 508808
DragonVale : 503230
Head Soccer : 481564
Despicable Me: Minion Rush : 464312
The Sims™ FreePlay : 446880
Sonic Dash : 418033
8 Ball Pool™ : 416736
Tiny Tower - Free City Building : 414803
Jetpack Joyride : 405647
Bike Race - Top Motorcycle Racing Games : 405007
Kim Kardashian: Hollywood : 397730
Trivia Crack : 393469
WordBrain : 391401
Sniper 3D Assassin: Shoot to Kill Gun Game : 386521
Flow Free : 373857
Geometry Dash Lite : 370370
▻Sudoku : 359832
Fruit Ninja® : 327025
Pixel Gun 3D : 301182
Temple Run 2 : 295211
My Horse : 293857
Word Cookies! : 287095
Dragon City Mobile : 277268
The Simpsons™: Tapped Out : 274501
Plants vs. Zombies™ 2 : 267394
Clash Royale : 266921
Pokémon GO : 257627
CSR Racing 2 : 257100


## Determining the Android App Profile for Google Play

In [103]:
category_android = freq_table(android_final,1)

In [111]:
for category in category_android:
    total = 0
    len_category = 0
    for app in android_final:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace('+','')
            n_installs = n_installs.replace(',','')
            total += float(n_installs)
            len_category += 1
            
    avg_n_installs = total / len_category
    print(category, ':', avg_n_installs)

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_
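
The install counts used in the averages above are stored as strings such as `'1,000,000+'`. The cleaning step can be pulled out into a small helper, sketched below; `parse_installs` is a name introduced here for illustration. Note that each value is an open-ended bucket, so the parsed number is best read as a lower bound rather than an exact install count.

```python
def parse_installs(installs_string):
    """Convert an install bucket such as '1,000,000+' to its lower bound as an int."""
    return int(installs_string.replace('+', '').replace(',', ''))


for bucket in ['5+', '10,000+', '1,000,000,000+']:
    print(bucket, '->', parse_installs(bucket))
```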

We can see that `COMMUNICATION` apps have the highest average number of installs on Google Play, with apps like WhatsApp Messenger and Gmail accounting for much of it.

In [119]:
for app in android_final:
    if app[1] == 'COMMUNICATION' and app[5] == '1,000,000,000+':
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
Skype - free IM & video calls : 1,000,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+


Next, we analyze the `BOOKS_AND_REFERENCE` category, which is the closest equivalent of the `Reference` genre in the App Store.

This category is saturated with free eBook readers and various dictionaries. Thus, we can focus on the niche markets within this category.

In [122]:
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+

To focus on the small to mid-sized players, we can start by filtering out apps whose names contain 'dictionary', 'book', 'ebook' or 'reader'.

One similarity with the App Store is that there are also some gaming and religious references here. For gaming, however, we do not see as many game guides as on the App Store.

In [151]:
for app in android_final:
    app_name = app[0]
    app_name = app_name.lower()
    if app[1] == 'BOOKS_AND_REFERENCE' and ("dictionary" not in app_name
                                            and "ebook" not in app_name
                                            and "book" not in app_name
                                            and "reader" not in app_name):
        print(app[0], ':', app[5])

Wikipedia : 10,000,000+
Free Panda Radio Music : 100,000+
FamilySearch Tree : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
Anonymous caller detection : 10,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
English translation from Bengali : 100,000+
Only 30 days in English, the guideline is guaranteed : 500,000+
SH-02J Owner's Manual (Android 8.0) : 50,000+
All Language Translator Free : 1,000,000+
URBANO V 02 instruction manual : 100,000+
Bible : 100,000,000+
C Programs and Reference : 50,000+
C Offline Tutorial : 1,000+
Amazon Kindle : 100,000,000+
Aab e Hayat Full Novel : 100,000+
Google I/O 2018 : 500,000+
R Language Reference Guide : 10,000+
Learn R Programming Full : 5,000+
R Programing Offline Tutorial : 1,000+
Guide for R Programming : 5+
Learn R Programming : 10+
R Quick Reference Big Data : 1,000+
V Made : 100,000+
Guide (for X-MEN) : 100,000+
AC Air condition Troubleshoot,Repair,Maintenance : 5,000+
AE Bulletins : 1,000+
Ae Alla
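
The chain of `not in` conditions in the filter above grows with every keyword added. A more compact sketch uses `any()` over a tuple of keywords. The helper `has_excluded_keyword` is hypothetical, and the sample names are taken from the listings above.

```python
def has_excluded_keyword(app_name, keywords):
    """Return True if any keyword occurs in the app name, ignoring case."""
    lowered = app_name.lower()
    return any(keyword in lowered for keyword in keywords)


# 'ebook' does not need its own entry because it contains 'book'.
EXCLUDED = ('dictionary', 'book', 'reader')

sample_names = ['Wikipedia', 'Google Play Books', 'Cool Reader', 'Bible', 'Ebook Reader']
kept = [name for name in sample_names if not has_excluded_keyword(name, EXCLUDED)]
print(kept)  # ['Wikipedia', 'Bible']
```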

Similar to the App Store, another possible app profile for Google Play would be gaming guides. However, we would need more information on the type of games (offline or online; mobile or desktop) to create the guides for, as well as on the sources.

# Conclusion

In this project, we analyzed app profiles on Google Play and the App Store based on genre, user ratings and installs.

As most genres are dominated by big players in terms of user ratings and installs, it may be best to focus on the small to mid-sized market within each genre.

We have looked at several genres, such as Social Networking and Games. Among these, the reference genre has relatively low barriers to entry (due to the availability of resources offline and online) and a high user count, so we recommend it as one of the app profiles.

Moving forward, we could conduct more detailed analysis to determine which reference text to use for the app, and what additional features could set the app apart from existing competitors.