# The best genre to develop a free app

In 2019, the App Store generated 54.2 billion dollars in revenue, even though, as of March 2020, only 9.6% of the apps on the App Store were paid (Clement, 2020). In fact, 98% of app revenue worldwide comes from free apps. How do they do it? Free apps have two main ways of generating a profit. One is the so-called "freemium" model, in which an app is offered for free but includes optional in-app purchases for additional content, subscriptions or digital goods.

The other is advertising: many free apps include ads, and under this model, the more people use the app, the better. This type of app will be the main focus of this project. 

It is important for an app developer to know which kinds of apps the market is looking for and which of them face less competition. This project tries to identify which genre of free apps is most likely to attract users, which could make an app profitable through ads.

## Exploring the data

As of January 2020, there were approximately 1.8 million apps available on the App Store and 2.5 million apps on Google Play.
Analyzing this amount of data is not a simple task, so we will work with a sample from each store.

* For the Google Play Store I used the following [data set](https://www.kaggle.com/lava18/google-play-store-apps), which contains information on about 10,000 apps. This data set was collected in August 2018.

* For the App Store I used this [data set](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps), which contains information on around 7,000 apps. This data set was collected in July 2017.
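
Both files are plain CSV, so each one can be read into a list of rows with the standard library. The following is only a minimal sketch and not one of the notebook's cells: `load_csv` is a hypothetical helper, and the `utf8` encoding is an assumption about how the Kaggle files are stored.

```python
from csv import reader


def load_csv(path, encoding="utf8"):
    """Return (header, rows) for a CSV file. Hypothetical helper for illustration."""
    # The with-block closes the file even if reading fails.
    # newline="" is the setting the csv module recommends for files given to reader().
    with open(path, encoding=encoding, newline="") as csv_file:
        rows = list(reader(csv_file))
    return rows[0], rows[1:]
```

With the files in the working directory, a call such as `load_csv("googleplaystore.csv")` should return the header row and the data rows as two separate values.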

In [1]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [2]:
from csv import reader

### Google Play data set ###
open_file=open("googleplaystore.csv")
read_file=reader(open_file)
gpa=list(read_file)
gpa_header=gpa[0]
gpa_data=gpa[1:]

### App Store data set ###
open_file=open("AppleStore.csv")
read_file=reader(open_file)
apa=list(read_file)
apa_header=apa[0]
apa_data=apa[1:]

In [3]:
### Exploring Google Store data set ###
print(gpa_header)
print("\n")
print(explore_data(gpa_data,1,4,rows_and_columns=True))

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 10841
Number of columns: 13
None


As we can see, this data set contains information on 10,841 apps. Some columns that might be useful for our analysis are 'App', 'Category', 'Reviews', 'Installs', 'Price' and 'Type'.

You can find the description for each column [here](https://www.kaggle.com/lava18/google-play-store-apps).

Now, let's take a look at the App Store Data set:

In [4]:
### Exploring App Store data set ###
print(apa_header)
print("\n")
print(explore_data(apa_data,1,4,rows_and_columns=True))

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


Number of rows: 7197
Number of columns: 16
None


This data set contains information on 7,197 apps. The columns that might help us with our analysis are 'price', 'user_rating', 'cont_rating' and 'prime_genre'.

You can find the description for each column [here](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps).

## Cleaning the Data

### Duplicates and missing values

#### Google Play data set

Looking through the [discussion section](https://www.kaggle.com/lava18/google-play-store-apps/discussion) for the Google Play data set, [one of the discussions](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015) mentions that there is an error on row 10472, so let's check out that row.

In [5]:
## Printing row 10472
print(gpa_header,"\n",len(gpa_header))
print("\n")
print(gpa_data[10472],"\n",len(gpa_data[10472]))

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'] 
 13


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up'] 
 12


We can see that the header and that row have different lengths. Inspecting the row reveals that this is because the value for the 'Category' column is missing.
We are going to delete that row.
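
The same length check can be applied to every row rather than to one known index. This is a small sketch on toy rows; `find_malformed_rows` is a hypothetical helper, not part of the original notebook.

```python
def find_malformed_rows(header, rows):
    """Return (index, length) for every row whose length differs from the header's."""
    expected = len(header)
    return [(i, len(row)) for i, row in enumerate(rows) if len(row) != expected]


# Toy data: the second row is missing one field, like the row inspected above.
header = ["App", "Category", "Rating"]
rows = [
    ["Sketch - Draw & Paint", "ART_AND_DESIGN", "4.5"],
    ["Life Made WI-Fi Touchscreen Photo Frame", "1.9"],
]
print(find_malformed_rows(header, rows))
```

Running something like this on the full data set would show whether any other rows have a shifted or missing column.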

In [6]:
print(len(gpa_data))
del gpa_data[10472]
print(len(gpa_data))

10841
10840


Other discussions point out that some apps have duplicate entries. For instance, if we look for "Instagram" in our data set, we get several entries with different numbers of reviews:

In [7]:
for app in gpa_data:
    name=app[0]
    
    if name=="Instagram":
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


Let's find out how many and which apps are duplicated:

In [8]:
dup_apps=[]
unique_apps=[]

for app in gpa_data:
    name=app[0]
    
    if name in unique_apps:
        dup_apps.append(name)
        
    else:
        unique_apps.append(name)

print("Number of duplicated apps: ",len(dup_apps),"\n")
print("Number of unique apps : ",len(unique_apps),"\n")
print("Apps that are duplicated:","\n",dup_apps[:5])

Number of duplicated apps:  1181 

Number of unique apps :  9659 

Apps that are duplicated: 
 ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings']


As we've seen, the only difference between these duplicate entries is the number of reviews. Assuming review counts grow over time, a higher count suggests that the entry was collected more recently. For each duplicated app, we're going to keep the entry with the most reviews and remove the rest.

In order to do that, we will:

* Create a dictionary, where each key is a unique app name and the corresponding value is the highest number of reviews for that app.
* Use this dictionary to create a new data set with no duplicates.

First, we create a dictionary that stores each unique app name with its highest number of reviews.

In [9]:
max_review={}

for app in gpa_data:
    name=app[0]
    nrating=float(app[3])
    
    if name in max_review and nrating>max_review[name]:
        max_review[name]=nrating
        
    if name not in max_review:
        max_review[name]=nrating

To ensure everything went correctly, let's calculate the number of rows that we should have.

In [10]:
print("Number of rows we should have: ",len(gpa_data)-len(dup_apps))
print("Number of rows we have: ",len(max_review))

Number of rows we should have:  9659
Number of rows we have:  9659


Perfect, everything went as expected.

Now, let's use the 'max_review' dictionary to create a new data set 'gpa_clean' which has no duplicated rows.

In [11]:
gpa_clean=[]
apps_added=[] # Some apps are repeated AND have the same number of reviews

for app in gpa_data:
    name=app[0]
    nrating=float(app[3])
    
    if nrating==max_review[name] and (name not in apps_added):
        gpa_clean.append(app)
        apps_added.append(name)
        
explore_data(gpa_clean,1,5,True)

['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


Number of rows: 9659
Number of columns: 13


The new data set contains the expected number of rows, and the rows look correct.
Note that I added the 'apps_added' variable because some apps had repeated entries with the same number of reviews, which would otherwise leave the data set with more rows than it should have.
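
The two passes above can also be condensed into a single pass that keeps, for each name, the row with the most reviews. The sketch below is an alternative formulation, not the code used in this project. `keep_most_reviewed` is a hypothetical helper, and the toy rows only carry the columns needed for the example (the review count for "Box" is made up).

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row seen with the highest review count."""
    best = {}  # app name -> row; dicts keep insertion order in Python 3.7+
    for row in rows:
        name = row[name_index]
        reviews = float(row[reviews_index])
        # Strict '>' means that ties keep the earlier row.
        if name not in best or reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


toy_rows = [
    ["Instagram", "SOCIAL", "4.5", "66577313"],
    ["Box", "BUSINESS", "4.2", "1000"],
    ["Instagram", "SOCIAL", "4.5", "66577446"],
    ["Instagram", "SOCIAL", "4.5", "66577313"],
    ["Box", "BUSINESS", "4.2", "1000"],
]
for row in keep_most_reviewed(toy_rows):
    print(row)
```

One difference to keep in mind: this version orders the result by the first appearance of each name, so the row order can differ from 'gpa_clean' even though the same entries are kept.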

#### App Store data set

Let's analyze if this data set contains any duplicated values:

In [12]:
duplicates=[]
app_checked=[]

for app in apa_data:
    name=app[1]
    
    if name in app_checked:
        duplicates.append(name)
    else:
        app_checked.append(name)
        
print("This data set contains ",len(duplicates)," duplicates.")

This data set contains  2  duplicates.


It seems to have two duplicates. However, the [App Store data set discussion section](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/discussion) includes a discussion about these apps: even though they have the exact same names, they are different apps, so we will keep them. You can read more about that [here](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/discussion/90409).

Now that we've dealt with duplicates and missing values in both data sets, we're going to extract the relevant data for our analysis.

### Relevant Data

In this analysis I decided to focus on free apps, which could get their revenue through advertisements. Also, while exploring both data sets, I noticed that some of the app names are not in English. Because I want to analyze which apps could attract a higher number of users, I'm going to remove these types of apps.

#### Removing non-English apps

Here's an example of two apps that I want to remove:

In [13]:
print(apa_data[813][1],"\n")
print(gpa_clean[7940][0])

爱奇艺PPS -《欢乐颂2》电视剧热播 

لعبة تقدر تربح DZ


In order to remove both of these apps, I'm going to iterate through each character of every app name and determine whether it is a character commonly used in English.

Each character commonly used in English text [has an associated number between 0 and 127](https://elcodigoascii.com.ar/) in the ASCII standard. The 'ord()' function returns the number associated with a character, and if that number is greater than 127 for any character in a name, we'll classify the app as non-English.
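
To make the 0 to 127 range concrete, here is a short illustration of what 'ord()' returns for a few characters that appear in app names in this project. It is only a demonstration; `code_points` is a name introduced for this example.

```python
# Map a few sample characters to their code points
code_points = {char: ord(char) for char in ["a", "Z", "™", "爱", "😜"]}

for char, number in code_points.items():
    # The last column shows whether the character falls in the 0-127 range
    print(repr(char), number, number <= 127)
```

The letters give 97 and 90, while '™', '爱' and '😜' give 8482, 29233 and 128540, all well above 127. On Python 3.7 or later, `str.isascii()` performs the same all-characters check for a whole string.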

In [14]:
def english_word(word):
    for char in word:
        if ord(char)>127:
            return False
    return True
    
print(english_word("Instagram"))
print(english_word("爱奇艺PPS -《欢乐颂2》电视剧热播"))

True
False


This function works most of the time, but there are cases where a name is in English and also contains special characters. For example:

In [15]:
print(english_word("Docs To Go™ Free Office Suite"))
print(english_word("Instachat 😜"))

False
False


Both of these names are clearly in English, but because of '™' and '😜' our function assumes they are not, which would cause a loss of data. To fix that, we will add a condition: a name is classified as non-English only if more than three of its characters are detected as non-English.

In [16]:
def english_word(word):
    nchar=0
    
    for char in word:
        if ord(char)>127:
            nchar+=1
            
    if nchar>3:
        return False
    else:
        return True

print(english_word("Docs To Go™ Free Office Suite"))
print(english_word("Instachat 😜"))
print(english_word("爱奇艺PPS -《欢乐颂2》电视剧热播"))

True
True
False


This solution is far from perfect: non-English app names that contain three or fewer non-English characters will go through, and an English app name with four or more special characters will get deleted. Still, it should be effective enough for our analysis.

Let's apply the function to both data sets:

In [17]:
### Google Play data set ###
english_GP=[]
for app in gpa_clean:
    name=app[0]
    if english_word(name):
        english_GP.append(app)

# print("Google Play data set:")
explore_data(english_GP,0,1,True)

### App Store data set ###
english_AS=[]
for app in apa_data:
    name=app[1]
    if english_word(name):
        english_AS.append(app)
        
print("\n")
print("App Store data set")
explore_data(english_AS,0,1,True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


Number of rows: 9614
Number of columns: 13


App Store data set
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


Number of rows: 6183
Number of columns: 16


The Google Play data set now has 9,614 apps and the App Store data set now has 6,183.

#### Removing non-free apps

As I mentioned before, we're only interested in free apps, whose main source of revenue is likely to be ads. This means we have to extract the free apps using the price column in both data sets.

In [18]:
### Free Apps ###
#### Google Play data set ####
GooglePlay=[]

for app in english_GP:
    price=app[7]
    if price=="0":
        GooglePlay.append(app)

print("Google Play data set: ",len(GooglePlay))


#### App Store data set ####
AppStore=[]
# Free apps are stored with the price string "0.0" in this data set

for app in english_AS:
    price=app[4]
    if price=="0.0":
        AppStore.append(app)

print("\n")
print("App Store data set: ",len(AppStore))

Google Play data set:  8864


App Store data set:  3222


We are left with 8,864 free English apps on the Google Play Store and 3,222 on the App Store. This is the data we're going to analyze.
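
The two stores write a zero price differently ('0' on Google Play, '0.0' on the App Store), so the filters above compare against different strings. A numeric comparison avoids depending on the exact text. This is a hedged sketch: `is_free` is a hypothetical helper, and the leading "$" on paid prices is an assumption about the data rather than something shown above.

```python
def is_free(price_text):
    """Return True when a price string represents zero. Hypothetical helper."""
    # Assumption: paid prices may carry a leading "$", for example "$4.99".
    cleaned = price_text.strip().lstrip("$")
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Values that are not numbers are treated as not free
        return False


for sample in ["0", "0.0", "$4.99", "Everyone"]:
    print(repr(sample), is_free(sample))
```

Treating unparseable values as not free is a design choice for this sketch; flagging them for manual inspection would be equally reasonable.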

## Analyzing the data

Because of the wider reach an app can have by being on both the App Store and the Google Play Store, I want to find characteristics that can be successful in both markets. 

A good initial approach could be analyzing the apps by genre. Let's create frequency tables with the genres of both stores to get a sense of which are the most common. For this, we will use the 'prime_genre' column in the App Store data set and the 'Category' and 'Genres' columns in the Google Play Store data set.

Here, we create a function that returns a frequency table in percentages and another function that sorts it in descending order:

In [19]:
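def freq_table(dataset,index):
    table={}  # value -> count (names like dict and bin would shadow built-ins)
    
    for row in dataset:
        value=row[index]
        if value in table:
            table[value]+=1
        else:
            table[value]=1
            
    # Convert the raw counts into percentages of the whole data set
    for key in table:
        table[key]=round(table[key]*100/len(dataset),2)
    return table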

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

### Most common genres on the App Store 

Using these functions, the most common app genres on the App Store are:

In [20]:
display_table(AppStore,11)

Games : 58.16
Entertainment : 7.88
Photo & Video : 4.97
Education : 3.66
Social Networking : 3.29
Shopping : 2.61
Utilities : 2.51
Sports : 2.14
Music : 2.05
Health & Fitness : 2.02
Productivity : 1.74
Lifestyle : 1.58
News : 1.33
Travel : 1.24
Finance : 1.12
Weather : 0.87
Food & Drink : 0.81
Reference : 0.56
Business : 0.53
Book : 0.43
Navigation : 0.19
Medical : 0.19
Catalogs : 0.12


The most common genre among free English-language apps is 'Games' by a big margin with 58.16%, followed by 'Entertainment' with 7.88% and 'Photo & Video' with 4.97%.
'Education' apps represent 3.66%, while 'Social Networking' makes up only 3.29% of the total.

The top three genres are designed for entertainment and for people to use in their free time. These are followed by 'Education', 'Social Networking' and 'Shopping', which are designed for more practical purposes. 

Now, the fact that there are more games on the App Store does not necessarily mean that this genre has a large number of downloads. We will have to do further analysis to determine whether availability is correlated with the number of downloads.

### Most common genres on the Google Play Store

The most common genres on the Google Play Store by the 'category' column are:

In [21]:
display_table(GooglePlay,1)

FAMILY : 18.91
GAME : 9.72
TOOLS : 8.46
BUSINESS : 4.59
LIFESTYLE : 3.9
PRODUCTIVITY : 3.89
FINANCE : 3.7
MEDICAL : 3.53
SPORTS : 3.4
PERSONALIZATION : 3.32
COMMUNICATION : 3.24
HEALTH_AND_FITNESS : 3.08
PHOTOGRAPHY : 2.94
NEWS_AND_MAGAZINES : 2.8
SOCIAL : 2.66
TRAVEL_AND_LOCAL : 2.34
SHOPPING : 2.25
BOOKS_AND_REFERENCE : 2.14
DATING : 1.86
VIDEO_PLAYERS : 1.79
MAPS_AND_NAVIGATION : 1.4
FOOD_AND_DRINK : 1.24
EDUCATION : 1.16
ENTERTAINMENT : 0.96
LIBRARIES_AND_DEMO : 0.94
AUTO_AND_VEHICLES : 0.93
HOUSE_AND_HOME : 0.82
WEATHER : 0.8
EVENTS : 0.71
PARENTING : 0.65
ART_AND_DESIGN : 0.64
COMICS : 0.62
BEAUTY : 0.6


The most common genre on the Google Play Store according to the 'Category' column is 'FAMILY' with 18.91%, followed by 'GAME' with 9.72%. Interestingly, the top category is not as dominant as it was in the App Store data set.

The 'GAME' category also differs greatly from what we have on the App Store. Here it is not even the top category, and its share is 48.44 percentage points below that of the equivalent genre on the App Store (9.72% versus 58.16%).

On the Google Play Store, most of the apps in the 'FAMILY' category are games made for children. Even if every 'FAMILY' app were counted as a game, the two categories together would make up 28.63%, still far below the App Store's 58.16%.

#### Most common genres by the 'Genres' column

The most common genres in the Google Play data set by the 'Genres' column are:

In [22]:
display_table(GooglePlay,9) # index 9 is the 'Genres' column (same as -4)

Tools : 8.45
Entertainment : 6.07
Education : 5.35
Business : 4.59
Productivity : 3.89
Lifestyle : 3.89
Finance : 3.7
Medical : 3.53
Sports : 3.46
Personalization : 3.32
Communication : 3.24
Action : 3.1
Health & Fitness : 3.08
Photography : 2.94
News & Magazines : 2.8
Social : 2.66
Travel & Local : 2.32
Shopping : 2.25
Books & Reference : 2.14
Simulation : 2.04
Dating : 1.86
Arcade : 1.85
Video Players & Editors : 1.77
Casual : 1.76
Maps & Navigation : 1.4
Food & Drink : 1.24
Puzzle : 1.13
Racing : 0.99
Role Playing : 0.94
Libraries & Demo : 0.94
Auto & Vehicles : 0.93
Strategy : 0.91
House & Home : 0.82
Weather : 0.8
Events : 0.71
Adventure : 0.68
Comics : 0.61
Beauty : 0.6
Art & Design : 0.6
Parenting : 0.5
Card : 0.45
Casino : 0.43
Trivia : 0.42
Educational;Education : 0.39
Board : 0.38
Educational : 0.37
Education;Education : 0.34
Word : 0.26
Casual;Pretend Play : 0.24
Music : 0.2
Racing;Action & Adventure : 0.17
Puzzle;Brain Games : 0.17
Entertainment;Music & Video : 0.17
Casual;

This column creates a more specific partition of the data. For example, you can find subcategories of games like "Casual;Brain Games", "Racing" and "Adventure".

Because we want a general idea of how the store works, we'll stick with the 'category' column.

While the App Store has more apps designed for fun, the Google Play Store has a better balance between entertainment and practical apps.

As I mentioned before, these frequency tables cannot be used to conclude that these are the most downloaded apps, but they give a general idea of the availability of apps.
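
The frequency tables in this section can also be built with `collections.Counter` from the standard library, which counts and sorts in one step. The sketch below is an alternative, not the function used above. `freq_table_pct` is a hypothetical name and the rows are toy data.

```python
from collections import Counter


def freq_table_pct(dataset, index):
    """Return [(value, percentage)] sorted from most to least common."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return [(value, round(count * 100 / total, 2))
            for value, count in counts.most_common()]


# Toy rows with the genre in column 1 (not real data)
toy = [["a", "Games"], ["b", "Games"], ["c", "Games"], ["d", "Education"]]
for genre, pct in freq_table_pct(toy, 1):
    print(genre, ":", pct)
```

Values with equal counts may come out in a different order than with 'display_table', since `most_common()` keeps ties in first-seen order instead of sorting them by name.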

## Most popular apps on the App Store

The number of downloads is a good indicator of what people prefer, so we will calculate the average number of installs for each genre. The Google Play data contains the 'Installs' column, which serves this purpose. Unfortunately, the App Store data has no column with the number of downloads or installs. I will use the number of user ratings as a proxy, which can be found in the 'rating_count_tot' column.

Let's calculate the average number of ratings by genre on the App Store.
First, let's retrieve all of the unique app genres using the previously written function:

In [23]:
genres=freq_table(AppStore,11)
display_table(AppStore,11)

Games : 58.16
Entertainment : 7.88
Photo & Video : 4.97
Education : 3.66
Social Networking : 3.29
Shopping : 2.61
Utilities : 2.51
Sports : 2.14
Music : 2.05
Health & Fitness : 2.02
Productivity : 1.74
Lifestyle : 1.58
News : 1.33
Travel : 1.24
Finance : 1.12
Weather : 0.87
Food & Drink : 0.81
Reference : 0.56
Business : 0.53
Book : 0.43
Navigation : 0.19
Medical : 0.19
Catalogs : 0.12


Next, I create a nested loop to calculate the average number of ratings:

In [24]:
print("Average Number of Ratings")
for genre in genres:
    total=0
    len_genre=0
    
    for app in AppStore:
        genre_app=app[11]  
        
        if genre_app==genre:
            ratings=float(app[5])
            total+=ratings
            len_genre+=1
        
    avg_rating=round(total/len_genre,1)
    
    print("\n")
    print(genre,": ",avg_rating)

Average Number of Ratings


Utilities :  18684.5


Medical :  612.0


Music :  57326.5


Health & Fitness :  23298.0


Productivity :  21028.4


Lifestyle :  16485.8


Reference :  74942.1


Social Networking :  71548.3


Sports :  23008.9


Finance :  31467.9


Travel :  28243.8


News :  21248.0


Weather :  52279.9


Business :  7491.1


Navigation :  86090.3


Photo & Video :  28441.5


Games :  22788.7


Entertainment :  14029.8


Food & Drink :  33333.9


Shopping :  26919.7


Catalogs :  4004.0


Education :  7004.0


Book :  39758.5


On the App Store, the genre with the highest average number of ratings, and thus probably the highest average number of downloads, is Navigation. It likely ranks first because these apps are used on a daily basis. However, this might be a difficult market to get into because a few very popular apps dominate it. Even before checking, we can expect Google Maps and Waze to be at the top of this genre.

In [25]:
for app in AppStore:
    if app[11]=="Navigation":
        print(app[1],":",app[5])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


So this might not be the best genre if you want the most engagement.
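
One way to see how much the two giants pull the average up is to compare the mean with the median for the six Navigation apps listed above. This is an illustrative aside using the standard `statistics` module, not part of the original analysis.

```python
from statistics import mean, median

# Rating counts of the six Navigation apps printed above
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

print("mean:  ", round(mean(navigation_ratings), 1))
print("median:", median(navigation_ratings))
```

The mean reproduces the 86090.3 reported for the genre, while the median is 8196.5, roughly a tenth of it. The typical Navigation app therefore receives far fewer ratings than the average suggests.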

The genre with the second-highest average number of ratings is Reference. This is a good genre to develop an app in, because it reaches a wide audience and can be used on a daily basis, at schools and universities for example. Adding non-intrusive ads could lead to a great business model.

In [26]:
for app in AppStore:
    if app[11]=="Reference":
        print(app[1],":",app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


Unfortunately, the average number of ratings here is also skewed by the Bible, dictionaries and Google Translate.

Social Networking probably suffers from this too: apps like Facebook, Messenger and WhatsApp will clearly have far more ratings than the other apps in this genre.

In [27]:
for app in AppStore:
    if app[11]=="Social Networking":
        print(app[1],":",app[5])

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 2

Still, over the years several messaging apps and apps that connect people in general have appeared. This could be a good opportunity to come up with a fresh app, which expands on the features of existing apps and acquires a fair number of users that could be shown ads. But we want the highest number of users possible, so let's try a different category.

Taking a look at the "Finance" and "Book" genres, both can reach a wide audience, especially "Book". They are also apps that can be used daily: an audiobook that gets listened to over a month, or a finance app that you open every day to track your expenses.
Both genres are likely to see steady engagement and are not among the most common on the App Store, which means less competition for the app and a higher probability of being seen.

## Most popular apps on the Google Play Store

I'm going to use the 'Installs' column in order to see which apps are the most downloaded.

In [29]:
display_table(GooglePlay,5)

1,000,000+ : 15.73
100,000+ : 11.55
10,000,000+ : 10.55
10,000+ : 10.2
1,000+ : 8.39
100+ : 6.92
5,000,000+ : 6.83
500,000+ : 5.56
50,000+ : 4.77
5,000+ : 4.51
10+ : 3.54
500+ : 3.25
50,000,000+ : 2.3
100,000,000+ : 2.13
50+ : 1.92
5+ : 0.79
1+ : 0.51
500,000,000+ : 0.27
1,000,000,000+ : 0.23
0+ : 0.05
0 : 0.01


As we can see above, the number of installs is not precise, but we don't need the exact number; we only want to find out which genres attract the most users.
To make the analysis easier, apps listed as "100,000+" will be treated as if they had 100,000 installs, those with "1,000,000+" as if they had 1,000,000 installs, and so on.
To work with these numbers, I'm going to remove the commas and the plus characters and convert the result to a number. We then calculate the average number of installs per category. 
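
The string cleanup described above can be isolated in a small helper, which makes it easy to check on a few values before using it in a loop. `parse_installs` is a hypothetical name introduced for this sketch.

```python
def parse_installs(text):
    """Convert an 'Installs' bucket such as '1,000,000+' to a float lower bound."""
    return float(text.replace("+", "").replace(",", ""))


for bucket in ["1,000,000+", "500+", "0+", "0"]:
    print(bucket, "->", parse_installs(bucket))
```

Because every bucket is replaced by its lower bound, the averages computed from these values understate the true number of installs. That is acceptable here since the goal is only to compare categories.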

In [34]:
categories_GP=freq_table(GooglePlay,1)

for category in categories_GP:
    total=0
    length=0
    
    for app in GooglePlay:
        if category==app[1]:
            nInstall=app[5]
            nInstall=nInstall.replace("+",'')
            nInstall=nInstall.replace(",",'')
            total+=float(nInstall)
            length+=1
    print(category,": ",round(total/length))

BEAUTY :  513152
DATING :  854029
SHOPPING :  7036877
TRAVEL_AND_LOCAL :  13984078
AUTO_AND_VEHICLES :  647318
HOUSE_AND_HOME :  1331541
EVENTS :  253542
PERSONALIZATION :  5201483
PARENTING :  542604
GAME :  15588016
FAMILY :  3695642
WEATHER :  5074486
ENTERTAINMENT :  11640706
TOOLS :  10801391
PHOTOGRAPHY :  17840110
COMICS :  817657
MEDICAL :  120551
FINANCE :  1387692
BOOKS_AND_REFERENCE :  8767812
ART_AND_DESIGN :  1986335
COMMUNICATION :  38456119
LIBRARIES_AND_DEMO :  638504
HEALTH_AND_FITNESS :  4188822
SPORTS :  3638640
BUSINESS :  1712290
NEWS_AND_MAGAZINES :  9549178
MAPS_AND_NAVIGATION :  4056942
PRODUCTIVITY :  16787331
VIDEO_PLAYERS :  24727872
FOOD_AND_DRINK :  1924898
SOCIAL :  23253652
LIFESTYLE :  1437816
EDUCATION :  1833495


The categories with the most installs are 'COMMUNICATION' and 'VIDEO_PLAYERS'. Both are heavily influenced by the most popular apps: 'COMMUNICATION' by WhatsApp, Android messaging apps and Skype, and 'VIDEO_PLAYERS' by Google Play and YouTube. 
The same happens with the social, entertainment, photography and productivity categories.

The 'GAME' category is popular, but as we saw earlier, it is a market with a lot of competition.

A category that is popular and not as competitive is 'BOOKS_AND_REFERENCE', which is less dominated by a few giant apps. It also matches our findings on the App Store, which means a bigger audience that can download the app.

Only a small number of very popular apps skew the average for this category, so it still has potential.

In [41]:
for app in GooglePlay:
    if app[1]=="BOOKS_AND_REFERENCE" and (app[5]=='1,000,000,000+'
                                            or app[5] == '500,000,000+'
                                            or app[5] == '100,000,000+'):
        print(app[0],": ",app[5])

Google Play Books :  1,000,000,000+
Bible :  100,000,000+
Amazon Kindle :  100,000,000+
Wattpad 📖 Free Books :  100,000,000+
Audiobooks from Audible :  100,000,000+


Let's take a look at the apps in this category that are in the middle range of popularity:

In [44]:
for app in GooglePlay:
    if app[1]=="BOOKS_AND_REFERENCE" and (app[5] == '1,000,000+'
                                            or app[5] == '5,000,000+'
                                            or app[5] == '10,000,000+'
                                            or app[5] == '50,000,000+'):
        print(app[0],": ",app[5])

Wikipedia :  10,000,000+
Cool Reader :  10,000,000+
Book store :  1,000,000+
FBReader: Favorite Book Reader :  10,000,000+
Free Books - Spirit Fanfiction and Stories :  1,000,000+
AlReader -any text book reader :  5,000,000+
FamilySearch Tree :  1,000,000+
Cloud of Books :  1,000,000+
ReadEra – free ebook reader :  1,000,000+
Ebook Reader :  5,000,000+
Read books online :  5,000,000+
eBoox: book reader fb2 epub zip :  1,000,000+
All Maths Formulas :  1,000,000+
Ancestry :  5,000,000+
HTC Help :  10,000,000+
Moon+ Reader :  10,000,000+
English-Myanmar Dictionary :  1,000,000+
Golden Dictionary (EN-AR) :  1,000,000+
All Language Translator Free :  1,000,000+
Aldiko Book Reader :  10,000,000+
Dictionary - WordWeb :  5,000,000+
50000 Free eBooks & Free AudioBooks :  5,000,000+
Al-Quran (Free) :  10,000,000+
Al Quran Indonesia :  10,000,000+
Al'Quran Bahasa Indonesia :  10,000,000+
Al Quran Al karim :  1,000,000+
Al Quran : EAlim - Translations & MP3 Offline :  5,000,000+
Koran Read &MP3 30

We can see there are a lot of eBook readers, along with some language apps and dictionaries. Creating a new app about a popular subject seems to be the best way to get to the top. For example, here we have several Quran apps, as well as apps about Clash Royale and My Little Pony, so building on a popular title could be a great way to get more engagement. Creating an audio version, an interactive version or an app with questions about the book might be the best way to go.
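
The two filters used in this section compare the 'Installs' value against several strings with chained `or` conditions. The same selection can be written as a set membership test, which is easier to extend. This is a sketch on toy rows: `apps_in_buckets` and `MID_RANGE` are hypothetical names, and the empty strings only pad the rows so that 'Installs' sits at index 5.

```python
MID_RANGE = {"1,000,000+", "5,000,000+", "10,000,000+", "50,000,000+"}


def apps_in_buckets(dataset, category, buckets):
    """Return (name, installs) for apps of a category whose installs fall in buckets."""
    return [
        (row[0], row[5])
        for row in dataset
        if row[1] == category and row[5] in buckets
    ]


toy_rows = [
    ["Google Play Books", "BOOKS_AND_REFERENCE", "", "", "", "1,000,000,000+"],
    ["Wikipedia", "BOOKS_AND_REFERENCE", "", "", "", "10,000,000+"],
    ["Ebook Reader", "BOOKS_AND_REFERENCE", "", "", "", "5,000,000+"],
    ["Instagram", "SOCIAL", "", "", "", "1,000,000,000+"],
]
for name, installs in apps_in_buckets(toy_rows, "BOOKS_AND_REFERENCE", MID_RANGE):
    print(name, ": ", installs)
```

Passing a different set, for example the three largest buckets, would reproduce the first filter without touching the function.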

## Conclusions

The analyses of the App Store and the Google Play Store agree that the best genre for a free app that earns its profits through advertisements is the book genre. With a little innovation to put a spin on an already popular book, an app developer can reach both the Google Play and App Store audiences, enter a market that is not saturated and potentially reach millions of downloads.

### References:

* Clement, J. (2020). Android & iOS free and paid apps share 2020. Statista. Retrieved 25 March 2020, from https://www.statista.com/statistics/263797/number-of-applications-for-mobile-phones/.

* Clement, J. (2020). Apple App Store revenue 2019. Statista. Retrieved 20 March 2020, from https://www.statista.com/statistics/296226/annual-apple-app-store-revenue/.
