# What characterizes attractive free mobile applications?

This project is done for a company that develops apps and relies on in-app ads as its predominant source of income. In such a business, the number of users is critical to success. The aim of this project is to analyse data to help developers understand what types of apps are likely to attract more users.

We start by opening the data sets and storing them in lists. The headers will be excluded from the lists and instead be stored separately.

In [1]:
from csv import reader
opened_file = open('AppleStore.csv', encoding="utf8")
read_file = reader(opened_file)
ios_data = list(read_file)
ios_header = ios_data[0]
ios_data = ios_data[1:]


opened_file = open('googleplaystore.csv', encoding="utf8")
read_file = reader(opened_file)
android_data = list(read_file)
android_header = android_data[0]
android_data = android_data[1:]
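
The cell above leaves both files open after reading them. As a minimal sketch, the same steps could be wrapped in a helper that uses a `with` block, so each file is closed as soon as it has been read. The name `load_csv` is hypothetical and not part of the original notebook.

```python
from csv import reader

def load_csv(path, encoding="utf8"):
    """Return (header, rows) for a CSV file and close the file afterwards."""
    # newline="" is what the csv module documentation recommends for open()
    with open(path, encoding=encoding, newline="") as f:
        rows = list(reader(f))
    return rows[0], rows[1:]

# Intended usage, assuming the two files sit next to the notebook:
# ios_header, ios_data = load_csv('AppleStore.csv')
# android_header, android_data = load_csv('googleplaystore.csv')
```

The helper returns the header and the data rows separately, mirroring the `ios_header`/`ios_data` split used above.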

The following function lists the entries in the entered range of the entered data set. It can also display the number of rows and columns of the data set. It is useful for quickly examining a data set.

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [3]:
explore_data(ios_data, 0, 5, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows: 7197
Number of columns: 16


In [4]:
explore_data(android_data, 0, 5, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


Number of rows: 10841
Number of columns: 13


Documentation for the datasets can be found under the following links:
https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps
https://www.kaggle.com/lava18/google-play-store-apps 

The columns are as follows:

In [5]:
print(ios_header)
print('\n')
print(android_header)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


The discussion page of the Google Play data set suggests there is an issue with entry 10472. We print it and, if the row is indeed faulty, delete it to avoid problems later on.

In [6]:
print(android_data[10472])

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


In [7]:
del android_data[10472]  # run this cell only once, or another row is deleted

In [8]:
print(android_data[10472])

['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']
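
The faulty row printed earlier has 12 fields, while the header has 13 columns. As a sketch, a general check for rows of the wrong length could look as follows. The function name and the toy data are assumptions made for illustration only.

```python
def find_malformed_rows(dataset, header):
    """Return the indices of rows whose field count differs from the header's."""
    return [i for i, row in enumerate(dataset) if len(row) != len(header)]

# Toy data (hypothetical): the second row is missing one field
toy_header = ['App', 'Category', 'Rating']
toy_rows = [
    ['App A', 'TOOLS', '4.2'],
    ['App B', '1.9'],
    ['App C', 'GAME', '4.5'],
]
print(find_malformed_rows(toy_rows, toy_header))
```

Run against `android_data` and `android_header` before the deletion, a check like this would be expected to flag index 10472, although that has not been verified here.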


# Removing duplicates
The Google Play data set has duplicate entries. The following code finds duplicates and displays some examples.

In [9]:
unique_entries = []
duplicate_entries = []

for app in android_data:
    if app[0] in unique_entries:
        duplicate_entries.append(app[0])
    else:
        unique_entries.append(app[0])
        
print("Duplicate entries:", len(duplicate_entries))
print("\n")
print("Examples:", duplicate_entries[:15])

Duplicate entries: 1181


Examples: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']
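
Testing membership with `in` on a list gets slow as the list grows. A possible alternative, sketched here with `collections.Counter`, counts each name once and treats every occurrence beyond the first as a duplicate. The function name `count_duplicates` and the toy rows are hypothetical.

```python
from collections import Counter

def count_duplicates(dataset, name_index=0):
    """Count rows whose name has already appeared earlier in the data set."""
    counts = Counter(row[name_index] for row in dataset)
    # Every occurrence beyond the first counts as one duplicate
    return sum(n - 1 for n in counts.values())

toy_rows = [['Box'], ['Slack'], ['Box'], ['Box']]
print(count_duplicates(toy_rows))
```

On the toy rows this counts two duplicates, the same figure the loop above would report for that input.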


Duplicates won't be deleted at random. Instead, the entry with the highest number of reviews is kept, as it is likely the most recent one. The following code finds, for each app name, the highest review count among its entries.
Keep in mind that the number of rows is 10840 and the number of duplicates is 1181. Hence, the expected number of unique entries is 10840 - 1181 = 9659.

In [10]:
reviews_max = {}

for app in android_data:
    name = app[0]
    n_reviews = float(app[3])
    if name not in reviews_max or n_reviews > reviews_max[name]:
        reviews_max[name] = n_reviews
print(len(reviews_max))

9659


As expected, the number of unique entries is 9659.
We have created a dict with each unique app name as key and the maximum number of reviews for that app as value. We can now use it to build a list containing only the rows that match this dict. This procedure essentially deletes all duplicates, keeping only the copy with the largest number of reviews.

In [11]:
android_clean = []
already_added = []

for app in android_data:
    name = app[0]
    n_reviews = float(app[3])
    if (n_reviews == reviews_max[name]) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)

The duplicates are removed. To confirm this, we check that the length of `android_clean` matches the expected 9659 and that the entries appear to have been added correctly, with all columns intact.

In [12]:
explore_data(android_clean, 0, 5, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


Number of rows: 9659
Number of columns: 13
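
The two passes above (first `reviews_max`, then `android_clean`) could also be merged into a single pass. The following is a minimal sketch under that assumption, not the method used in this notebook. The function name and toy rows are hypothetical.

```python
def dedupe_keep_max(dataset, name_index=0, reviews_index=3):
    """Keep, for each app name, the row with the highest review count."""
    best = {}
    for row in dataset:
        name = row[name_index]
        # Strict '>' keeps the first row seen when review counts tie
        if name not in best or float(row[reviews_index]) > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

toy_rows = [
    ['Box', 'BUSINESS', '4.2', '10'],
    ['Slack', 'BUSINESS', '4.4', '5'],
    ['Box', 'BUSINESS', '4.2', '30'],
]
print(dedupe_keep_max(toy_rows))
```

One difference to be aware of: the result is ordered by the first appearance of each name, whereas `android_clean` keeps each app at the position of its retained row.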


The same duplicate check is run on the iOS app data, this time using the `id` column:

In [13]:
unique_entries = []
duplicate_entries = []

for app in ios_data:
    if app[0] in unique_entries:
        duplicate_entries.append(app[0])
    else:
        unique_entries.append(app[0])
        
print("Duplicate entries:", len(duplicate_entries))

Duplicate entries: 0


As one can see, no duplicates exist in the App Store data. Thus, no further action is needed.

# Removing non-English apps

As the company targets an English-speaking audience, non-English apps should be removed to make the data more representative. Both data sets contain such entries. For example:

In [14]:
print(ios_data[813][1])
print(android_clean[4412][0])

爱奇艺PPS -《欢乐颂2》电视剧热播
中国語 AQリスニング


The built-in `ord()` function returns the Unicode code point of a character. https://docs.python.org/3/library/functions.html#ord
If a character in the app name has a code point outside the range 0 to 127, the app can be discarded, as its name includes characters not commonly used in English text.

The `is_english` function checks whether a given string uses only common English characters.

In [15]:
def is_english(string):  # parameter renamed to avoid shadowing the built-in str
    for c in string:
        if ord(c) > 127:
            return False
    return True

Let's try the function out on some examples:

In [16]:
print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

True
False
False
False
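
To see which characters cause the last two names to be rejected, the offending characters can be listed together with their code points. This is a small illustrative sketch. The helper name `non_ascii_chars` is hypothetical.

```python
def non_ascii_chars(text):
    """Return (character, code point) pairs for characters above code point 127."""
    return [(c, ord(c)) for c in text if ord(c) > 127]

print(non_ascii_chars('Instagram'))
print(non_ascii_chars('Docs To Go™ Free Office Suite'))
print(non_ascii_chars('Instachat 😜'))
```

Each of the two English names contains exactly one such character (the trademark sign and an emoji), which is enough for the strict check to reject them.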


The examples above show that the function correctly singles out strings that use non-common English characters. However, it also rejects characters such as ™ and emojis. Therefore, as in the last two examples, certain apps aimed at an English-speaking audience are also flagged as non-English by our function. This means data loss.

To minimize this effect, the function is altered below to return `False` only if more than three characters are not common English characters. The overwhelming majority of apps targeting an English-speaking audience likely use fewer than four such characters in their names, and few apps not targeting an English-speaking audience likely have names shorter than four characters. Some English apps may still be lost and some non-English apps may get past the filter, but both effects should be tiny relative to the entire data set and should not significantly affect the result.

In [17]:
def is_english(string):
    n = 0  # number of characters outside the range 0 to 127
    for c in string:
        if ord(c) > 127:
            n += 1
    return n <= 3

print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

True
True


Using the latter `is_english` function, non-English apps are removed.

In [18]:
ios_eng = []
for app in ios_data:
    if is_english(app[1]):
        ios_eng.append(app)

android_eng = []
for app in android_clean:
    if is_english(app[0]):
        android_eng.append(app)

explore_data(ios_eng, 0, 3, True)
print('\n')
explore_data(android_eng, 0, 3, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 6183
Number of columns: 16


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Vari

# Isolate free apps

In this project we are interested in free apps that run ads as a source of revenue. Therefore, we isolate the free apps so that the data set matches the segment we want to analyse.

In [19]:
ios_final = []
android_final = []

for app in ios_eng:
    if float(app[4]) == 0:
        ios_final.append(app)

for app in android_eng:
    if app[6] == "Free":
        android_final.append(app)

explore_data(ios_final, 0, 3, True)
print('\n')
explore_data(android_final, 0, 3, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 3222
Number of columns: 16


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Vari

Remaining are 3222 iOS apps and 8863 Android apps. This should suffice for the analysis.

# Apps by genre

## Context
As mentioned earlier, this project aims to determine what kind of apps are likely to attract more users. This is desired as an ad-dependent business model heavily relies on the number of users.

To minimize risk and overhead, our app launches follow a validation strategy made up of three steps:
1. Build an Android version of the app, and publish it on Google Play
2. If user reception is solid, further development is done
3. Following six months of profitability, an iOS version of the app is developed and published on the App Store

Since the end goal is to publish apps on both Google Play and in the App Store, we desire profiles that are successful in both of these markets.

We will start by examining genres and how common they are. This is done below, using frequency tables for the `prime_genre` column in the App Store data set and the `Genres` as well as the `Category` columns in the Google Play data set.

## Functions
Two functions are created to aid this cause:
1. `freq_table`. Creates a frequency table of a given column and dataset. The frequency is expressed in percentages.
2. `display_table`. Displays the frequency table in descending order.

In [20]:
def freq_table(dataset, index):
    ftable = {}
    for i in dataset:
        col = i[index]
        if col in ftable:
            ftable[col] += 1
        else:
            ftable[col] = 1
    
    for i in ftable:
        ftable[i] = ftable[i]*100/len(dataset)
        
    return ftable

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
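
For comparison, the same kind of table could be built with `collections.Counter` from the standard library. This is a sketch of an alternative, not the approach used in the rest of this notebook. The name `freq_table_counter` and the toy rows are hypothetical.

```python
from collections import Counter

def freq_table_counter(dataset, index):
    """Return {value: percentage} for one column, most frequent value first."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    # most_common() yields (value, count) pairs in descending order of count
    return {value: n * 100 / total for value, n in counts.most_common()}

toy_rows = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
for genre, share in freq_table_counter(toy_rows, 1).items():
    print(genre, ':', share)
```

Because dicts keep insertion order, the returned table is already sorted by frequency, so a separate display step like `display_table` would only need to print it.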

## Number of apps per genre in the respective stores
We begin by examining a frequency table for the `prime_genre` column in the App Store data set.

In [21]:
display_table(ios_final, -5)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.6623215394165114
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.017380509000621
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


We can see that out of the free apps in the App Store targeting an English-speaking market, 58% are Games. There is a significant gap between Games and the second biggest genre, Entertainment, which comes in at only 8%. This is followed by Photo & Video, Education and Social Networking, at 5%, 4% and 3% respectively.

It seems that the App Store's free apps targeting an English-speaking market are predominantly made for fun, whereas apps with practical purposes have a relatively slim presence in this segment. Note that this does not imply that "fun" apps have more users, only that apps in such genres are more numerous in the segment.

We now continue by looking at the frequency tables for android data, for the columns of `Category` and `Genres` respectively.

In [22]:
display_table(android_final, 1)

FAMILY : 18.8987927338373
GAME : 9.725826469592688
TOOLS : 8.462146000225657
BUSINESS : 4.592124562789123
LIFESTYLE : 3.9038700214374367
PRODUCTIVITY : 3.8925871601038025
FINANCE : 3.700778517432021
MEDICAL : 3.5315355974275078
SPORTS : 3.3961412614238973
PERSONALIZATION : 3.317161232088458
COMMUNICATION : 3.2381812027530184
HEALTH_AND_FITNESS : 3.0802211440821394
PHOTOGRAPHY : 2.944826808078529
NEWS_AND_MAGAZINES : 2.798149610741284
SOCIAL : 2.6627552747376733
TRAVEL_AND_LOCAL : 2.335552296062281
SHOPPING : 2.2452894053932075
BOOKS_AND_REFERENCE : 2.1437436533904997
DATING : 1.8616721200496447
VIDEO_PLAYERS : 1.7939749520478394
MAPS_AND_NAVIGATION : 1.399074805370642
FOOD_AND_DRINK : 1.241114746699763
EDUCATION : 1.1621347173643235
ENTERTAINMENT : 0.9590432133589079
LIBRARIES_AND_DEMO : 0.9364774906916394
AUTO_AND_VEHICLES : 0.9251946293580052
HOUSE_AND_HOME : 0.8236488773552973
WEATHER : 0.8010831546880289
EVENTS : 0.7108202640189553
PARENTING : 0.6544059573507841
ART_AND_DESIGN : 0.

The largest category is FAMILY at 19%, followed by GAME (10%), TOOLS (8%), BUSINESS (5%), LIFESTYLE (4%), PRODUCTIVITY (4%) and FINANCE (4%). In comparison with the App Store, there seems to be much more emphasis on apps with practical purposes in the Google Play Store. It is worth noting that the FAMILY category mostly contains games for kids, so apps made for fun are represented in the top two categories. However, they are significantly less dominant in the Google Play Store.

In [23]:
display_table(android_final, -4)

Tools : 8.450863138892023
Entertainment : 6.070179397495205
Education : 5.348076272142615
Business : 4.592124562789123
Productivity : 3.8925871601038025
Lifestyle : 3.8925871601038025
Finance : 3.700778517432021
Medical : 3.5315355974275078
Sports : 3.4638384294257025
Personalization : 3.317161232088458
Communication : 3.2381812027530184
Action : 3.102786866749408
Health & Fitness : 3.0802211440821394
Photography : 2.944826808078529
News & Magazines : 2.798149610741284
Social : 2.6627552747376733
Travel & Local : 2.324269434728647
Shopping : 2.2452894053932075
Books & Reference : 2.1437436533904997
Simulation : 2.042197901387792
Dating : 1.8616721200496447
Arcade : 1.8503892587160105
Video Players & Editors : 1.771409229380571
Casual : 1.7601263680469368
Maps & Navigation : 1.399074805370642
Food & Drink : 1.241114746699763
Puzzle : 1.1282861333634209
Racing : 0.9928917973598105
Role Playing : 0.9364774906916394
Libraries & Demo : 0.9364774906916394
Auto & Vehicles : 0.9251946293580052

It is not quite clear how the `Genres` and `Category` columns relate to each other. A clear difference is that `Genres` may contain multiple values, whereas `Category` always holds a single value. For now, we focus on the bigger picture and concentrate on the `Category` column.

Compared to the App Store, free apps targeting an English-speaking audience in the Google Play Store lean more towards practical purposes than fun. Yet we can't really draw any conclusions about what kind of apps attract more users, as the number of apps in a category does not necessarily correlate with the number of users.

## Most popular apps by genre
We now want to look at the average number of users per genre to get a sense of which genres are more popular. In the Google Play data set, the `Installs` column holds the number of installs of each app. Even though installing an app doesn't necessarily result in usage, it should be a decent indication.
A column similar to `Installs` is absent from the App Store data set. However, we know the number of ratings of each app, stored in the `rating_count_tot` column. This should be a decent proxy for the number of users, so we use this number in our analysis.

We start by calculating the average number of user ratings for each genre using the App Store data set:

In [24]:
ios_genres = freq_table(ios_final, -5)
for genre in ios_genres:
    total = 0
    len_genre = 0
    for app in ios_final:
        if app[-5] == genre:
            total += float(app[5])
            len_genre += 1
    print(genre, ":", total/len_genre)

Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0


Looking at these averages alone may be a bit misleading. A few apps can completely dominate a genre and skew the data. Therefore, we will look at the number of ratings of each app in the interesting genres, to get an idea of the spread. Entering a genre where the ratings are concentrated around very few apps is probably not a great idea, as this could indicate that the genre itself isn't as popular as it seems on the surface.

The largest average number of user ratings is found in the `Navigation` genre. Let's print the apps belonging to this genre:

In [25]:
for app in ios_final:
    if app[-5] == 'Navigation':
        print(app[1], ':', app[5])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


As is apparent, the ratings are heavily concentrated around Waze and Google Maps, which together account for 96.8% of the total ratings in the genre.
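
The concentration can be made explicit by comparing the mean with the median and by computing the share held by the largest apps. The sketch below uses the six rating counts printed above. The helper name `top_share` is hypothetical.

```python
from statistics import mean, median

# Rating counts of the six Navigation apps printed above
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

def top_share(values, k):
    """Percentage of the total held by the k largest values."""
    ordered = sorted(values, reverse=True)
    return sum(ordered[:k]) * 100 / sum(values)

print(round(mean(navigation_ratings), 2))           # 86090.33
print(median(navigation_ratings))                   # 8196.5
print(round(top_share(navigation_ratings, 2), 1))   # 96.8
```

The median is roughly a tenth of the mean, which is another way of seeing that the genre average is driven by two apps.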

Another genre with a large average number of user ratings is `Social Networking`. Let's print the apps in this genre:

In [26]:
for app in ios_final:
    if app[-5] == 'Social Networking':
        print(app[1], ':', app[5])

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 2

Similar to `Navigation` the ratings in this genre seem to be concentrated around a few giants, such as Facebook, Pinterest and Skype.

The same applies to `Music`, where the ratings are heavily concentrated around players like Spotify, Pandora and Shazam.

This high degree of concentration indicates that Navigation, Social Networking and Music might not be as popular as they seem on the surface. We may get a better idea of the potential in these genres if we exclude the few apps that heavily skew the data.

The second largest average number of ratings is found in `Reference`. Let's look at these apps:

In [27]:
for app in ios_final:
    if app[-5] == 'Reference':
        print(app[1], ':', app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


It is apparent that a couple of apps make up a large portion of the ratings, most notably `Bible` and `Dictionary.com`, causing a skew. However, the concentration is not nearly as great as in some of the other genres mentioned. An app in this genre could be created, for example, by taking a popular book and adding different features. This could probably be done at a relatively low cost.

As previously determined, the App Store is dominated by "fun" apps. The Reference genre is of a more practical character, and the general relative absence of such apps could indicate opportunities in this market. This further supports the idea that this genre is a good option.

Other genres that seem popular include weather, book, food and drink, and finance. The book genre seems to overlap a bit with the app idea described above, but the other genres don't seem too interesting to us:

- Weather apps: people generally don't spend much time in-app, so the chances of making a profit from in-app ads are low. Also, getting reliable live weather data may require us to connect our apps to non-free APIs.

- Food and drink: examples here include Starbucks, Dunkin' Donuts, McDonald's, etc. Making a popular food and drink app thus requires actual cooking and a delivery service, which is outside the scope of our company.

- Finance apps: these apps involve banking, paying bills, money transfer, etc. Building a finance app requires domain knowledge, and we don't want to hire a finance expert just to build an app.


Let's make a similar analysis of the Google Play Store. The number of installs is stored as strings like "100,000+". Thus, the data is not very precise: an app listed as "100,000+" might have 150,000 installs, 200,000, or anything greater than 100,000. We will have to make do with what we've got, and for our purpose very precise data is probably not critical.

To make computations we need to convert the strings to a numeric data type, such as float. To achieve this, we have to remove the non-numeric characters ',' and '+'. This is done using the built-in string method `replace`. https://docs.python.org/3/library/stdtypes.html?#str.replace

In [33]:
android_genres = freq_table(android_final, 1)

for category in android_genres:
    total = 0
    len_category = 0
    for app in android_final:
        category_app = app[1]
        if category_app == category:
            installs = app[5]
            installs = installs.replace('+', '')
            installs = installs.replace(',', '')
            total += float(installs)
            len_category += 1
    avg_installs = total/len_category
    print(category, ":", avg_installs)

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3697848.1731343283
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_
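
The string conversion used inside the loop above can be isolated in a small helper, which makes it easy to try on single values. This is a sketch. The name `parse_installs` is hypothetical, and it returns an `int` rather than a `float` since the cleaned strings are whole numbers.

```python
def parse_installs(text):
    """Convert an installs string such as '100,000+' to a number.

    The result is only a lower bound, because the '+' is dropped.
    """
    return int(text.replace(',', '').replace('+', ''))

print(parse_installs('100,000+'))
print(parse_installs('1,000,000,000+'))
```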

Communication apps have the most installs on average: 38,456,119. This figure is heavily skewed by a handful of apps with over one billion installs (WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail, and Hangouts), and a few others with over 100 or 500 million installs:

In [34]:
for app in android_final:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

With these apps excluded, the average drops roughly tenfold (from about 38.5 million to about 3.6 million), which confirms the heavy skew:

In [35]:
under_100_m = []

for app in android_final:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'COMMUNICATION') and (float(n_installs) < 100000000):
        under_100_m.append(float(n_installs))
        
sum(under_100_m) / len(under_100_m)

3603485.3884615386
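
Both averages, with and without the giants, could come from one function that takes an optional cap on installs. The sketch below runs on made-up toy rows laid out like the Google Play data (category at index 1, installs at index 5). The function name `average_installs` and the rows are hypothetical.

```python
def average_installs(dataset, category, cap=None):
    """Average installs in a category, optionally ignoring apps at or above cap."""
    values = []
    for app in dataset:
        if app[1] != category:
            continue
        installs = int(app[5].replace(',', '').replace('+', ''))
        if cap is None or installs < cap:
            values.append(installs)
    # Return 0.0 for an empty selection instead of dividing by zero
    return sum(values) / len(values) if values else 0.0

toy_rows = [
    ['Giant Chat', 'COMMUNICATION', '', '', '', '1,000,000,000+'],
    ['Small Chat', 'COMMUNICATION', '', '', '', '1,000,000+'],
    ['Tiny Chat', 'COMMUNICATION', '', '', '', '500,000+'],
    ['Some Tool', 'TOOLS', '', '', '', '10,000+'],
]
print(average_installs(toy_rows, 'COMMUNICATION'))
print(average_installs(toy_rows, 'COMMUNICATION', cap=100_000_000))
```

On the toy rows, a single app with a billion installs lifts the category average by several orders of magnitude, which is the same effect seen in the real data.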

The Video Player category shows a similar skew, coming in second with 24,727,872 installs on average. The market is dominated by apps like YouTube, Google Play Movies & TV, or MX Player. The same pattern is also present among social apps, photography apps, or productivity apps.

Again, the main concern is this heavy skew causing these genres to look more popular than they actually might be. Additionally, these niches seem to be dominated by a few giants who are hard to compete against.

The game genre scores quite high on average. However, as previously discussed, this part of the market seems somewhat saturated in the App Store. Since we are looking for a recommendation with potential in both stores, we should look at a different genre.

The books and reference category appears quite popular in the Google Play Store, with an average of 8,767,812 installs. The fact that this is also the case in the App Store makes this genre interesting to examine further, as we're looking for a recommendation that can succeed in both stores.

Let's take a look at some of the apps from this genre and their number of installs:

In [36]:
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E

The books and reference category includes quite a wide range of apps, such as e-book processing software, collections of libraries, dictionaries, etc. This category is no exception to the skew seen in other large categories, where a few apps make up a large portion of the installs:

In [37]:
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000,000+'
                                            or app[5] == '500,000,000+'
                                            or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad 📖 Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+
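Rather than listing apps bucket by bucket, it can help to tally how many apps fall into each install bucket for a category. The following is a minimal sketch under stated assumptions: `count_install_buckets` and `parse_installs` are hypothetical helpers, and `sample_rows` holds only a handful of apps taken from the outputs above, padded with empty strings so that the category sits at index 1 and the installs at index 5, as in `android_final`.

```python
from collections import Counter


def parse_installs(raw):
    """Convert an install string such as '1,000,000+' to an integer."""
    return int(raw.replace(',', '').replace('+', ''))


def count_install_buckets(dataset, category, category_index=1, installs_index=5):
    """Count how many apps of one category fall into each install bucket."""
    buckets = Counter()
    for app in dataset:
        if app[category_index] == category:
            buckets[app[installs_index]] += 1
    return buckets


# Small illustrative sample; the real analysis would pass android_final
sample_rows = [
    ['Google Play Books', 'BOOKS_AND_REFERENCE', '', '', '', '1,000,000,000+'],
    ['Bible', 'BOOKS_AND_REFERENCE', '', '', '', '100,000,000+'],
    ['Amazon Kindle', 'BOOKS_AND_REFERENCE', '', '', '', '100,000,000+'],
    ['Wikipedia', 'BOOKS_AND_REFERENCE', '', '', '', '10,000,000+'],
    ['Cool Reader', 'BOOKS_AND_REFERENCE', '', '', '', '10,000,000+'],
    ['Book store', 'BOOKS_AND_REFERENCE', '', '', '', '1,000,000+'],
    ['WhatsApp Messenger', 'COMMUNICATION', '', '', '', '1,000,000,000+'],
]

buckets = count_install_buckets(sample_rows, 'BOOKS_AND_REFERENCE')

# Print the buckets from largest to smallest reach
for bucket in sorted(buckets, key=parse_installs, reverse=True):
    print(bucket, ':', buckets[bucket])
```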


However, there seems to be a rather small number of apps with such great reach compared with other categories, so the skew appears less pronounced here. Let's look at the mid segment of the market, with apps between 1,000,000 and 100,000,000 installs:

In [38]:
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000+'
                                            or app[5] == '5,000,000+'
                                            or app[5] == '10,000,000+'
                                            or app[5] == '50,000,000+'):
        print(app[0], ':', app[5])

Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
ReadEra – free ebook reader : 1,000,000+
Ebook Reader : 5,000,000+
Read books online : 5,000,000+
eBoox: book reader fb2 epub zip : 1,000,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
Moon+ Reader : 10,000,000+
English-Myanmar Dictionary : 1,000,000+
Golden Dictionary (EN-AR) : 1,000,000+
All Language Translator Free : 1,000,000+
Aldiko Book Reader : 10,000,000+
Dictionary - WordWeb : 5,000,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Al-Quran (Free) : 10,000,000+
Al Quran Indonesia : 10,000,000+
Al'Quran Bahasa Indonesia : 10,000,000+
Al Quran Al karim : 1,000,000+
Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+
Koran Read &MP3 30 Juz Offline : 1,000,000+
H

This category appears dominated by e-book processing software, collections of libraries and dictionaries. It is probably a good idea to avoid this type of app, as the competition is bound to be rather high.

We also notice several apps based on the Quran, which suggests that building an app around a popular book can be profitable. Making an app based on a book therefore seems a promising way to attract many users. Since the Google Play Store already contains many libraries, it is probably a good idea to add value in some additional way, with features such as daily quotes, a forum or similar.
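Observations like these can be backed by a rough count of how often a theme appears in the app names. The sketch below assumes a hypothetical helper, `count_keyword_matches`, and uses a short list of names copied from the output above rather than the full data set. It counts an app once per keyword, using a case-insensitive substring match, so spelling variants such as "Koran" are missed.

```python
def count_keyword_matches(names, keywords):
    """Return, for each keyword, the number of names that contain it."""
    counts = {}
    for keyword in keywords:
        counts[keyword] = sum(
            1 for name in names if keyword.lower() in name.lower()
        )
    return counts


# Names copied from the mid-segment output above (illustrative subset)
sample_names = [
    'Wikipedia',
    'Cool Reader',
    'FBReader: Favorite Book Reader',
    'Moon+ Reader',
    'Aldiko Book Reader',
    'Dictionary - WordWeb',
    'English-Myanmar Dictionary',
    'Golden Dictionary (EN-AR)',
    'Al-Quran (Free)',
    'Al Quran Indonesia',
    "Al'Quran Bahasa Indonesia",
    'Al Quran Al karim',
    'Koran Read &MP3 30 Juz Offline',
]

keyword_counts = count_keyword_matches(
    sample_names, ['quran', 'dictionary', 'reader']
)

for keyword, count in keyword_counts.items():
    print(keyword, ':', count)
```

On the full category, the names could be collected with `app[0]` for every row of `android_final` whose `app[1]` is `'BOOKS_AND_REFERENCE'`.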

## Conclusions
In this project we analysed data from both the App Store and the Google Play Store to identify what kinds of free apps targeting an English-speaking audience are likely to attract many users. These conclusions matter to the company in question, which has an ad-based business model for its apps, making the number of users a key variable in business success.

We concluded that it could be a good idea to turn a popular book into an app and add supporting features that provide additional value. This kind of app seems to have the potential to attract many users in both the App Store and the Google Play Store, thus supporting the company's goals.