# Profiles of the Most Popular Mobile Apps

Our goal with this project is to analyze which mobile apps are the most popular in the Apple App Store and Google Play markets. We work for a company that builds free apps (both Android and iOS) that generate revenue through in-app advertisements. Therefore, it is crucial that we equip our developer team with data-driven information on which kinds of apps are most appealing to users, as the number of users directly influences our profits.

## Step One: Opening and Exploring the Data

As of November 2022, over 3.5 million apps were available on the Google Play store and over 1.6 million on the Apple App Store. ([Statista](https://www.statista.com/statistics/276623/number-of-apps-available-in-leading-app-stores/))

Since collecting data from over 5 million apps is not feasible, we will analyze a sample of the data instead. There are two existing data sets that serve our purpose:

1. A data set containing information from ~10,000 apps from the Google Play store. It can be downloaded [here](https://www.kaggle.com/datasets/lava18/google-play-store-apps).
2. A data set containing information from ~7,000 apps from the App Store. It can be downloaded [here](https://www.kaggle.com/datasets/ramamet4/app-store-apple-data-set-10k-apps).

Our first step is to open the data sets and explore the data they contain.

In [1]:
from csv import reader

### The Google Play Data Set ###

# Adjust the paths below to wherever the CSV files were downloaded.
# The `with` blocks close each file once it has been read.
with open('/Users/amyaragon/Downloads/googleplaystore.csv', encoding='utf8') as opened_file:
    googleplay = list(reader(opened_file))

### The App Store Data Set ###

with open('/Users/amyaragon/Downloads/AppleStore.csv', encoding='utf8') as opened_file2:
    appstore = list(reader(opened_file2))


We will now write a function called `explore_data()` so that we can explore the rows from the data set in a more readable way, as well as view the number of rows and columns. Once we create this function, it can be used repeatedly. 

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
        
explore_data(googleplay, 0, 3, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


Number of rows: 10842
Number of columns: 13


Using our `explore_data()` function, we can see that the Google Play data set has 10842 rows (including the header row) and 13 columns. Several columns may prove useful for our analysis, such as `App`, `Category`, `Reviews`, `Installs`, `Type`, `Price`, and `Genres`.

Now we will explore the App Store data set.

In [3]:
explore_data(appstore, 0, 3, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of rows: 7198
Number of columns: 16


We see that the App Store data set has 7198 rows, that is, 7197 apps plus the header row. Columns such as `track_name`, `currency`, `price`, `rating_count_tot`, `rating_count_ver`, and `prime_genre` may be useful for us. The column names are not as straightforward in this data set, but the [documentation](https://www.kaggle.com/datasets/ramamet4/app-store-apple-data-set-10k-apps) details what information is included in each column.
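Because the rest of the analysis refers to columns by position (for example, index 11 for `prime_genre`), it can help to derive those positions from the header row instead of counting by hand. The following is a minimal sketch; `column_indices` is a hypothetical helper that is not part of the original notebook, and the header is copied from the output above.

```python
# Hypothetical helper: map each column name to its position in the header row.
def column_indices(header):
    return {name: position for position, name in enumerate(header)}

# Header row of the App Store data set, copied from the output above.
appstore_header = ['id', 'track_name', 'size_bytes', 'currency', 'price',
                   'rating_count_tot', 'rating_count_ver', 'user_rating',
                   'user_rating_ver', 'ver', 'cont_rating', 'prime_genre',
                   'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']

appstore_columns = column_indices(appstore_header)
print(appstore_columns['price'])             # 4
print(appstore_columns['rating_count_tot'])  # 5
print(appstore_columns['prime_genre'])       # 11
```

With the real data, the same lookup could be built from `appstore[0]`.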

## Step Two: Cleaning the Data
**Removing Incorrect Entries**

The Google Play data set has a [discussion](https://www.kaggle.com/lava18/google-play-store-apps/discussion) section, and [one of the discussions](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015) identifies an error in row 10472. That user had removed the header row, which we have not done, so the entry corresponds to row 10473 in our data set. Let's view this row and see if we can find the error. We will also print the header row so that we can see which column each value belongs to.

In [4]:
print(googleplay[0])
print(googleplay[10473])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


We can see that the app *Life Made WI-Fi Touchscreen Photo Frame* has a value of 19 in the `Rating` column. This must be incorrect, since the maximum app rating in the Google Play store is 5. As mentioned in the discussion thread, the error was caused by a missing value in the `Category` column, which shifted every subsequent value one column to the left. Because of this error, we will delete the row from our data set. The code below deletes the row and then prints the row that now sits at index 10473 to confirm that the deletion worked.

In [5]:
del googleplay[10473]

In [6]:
print(googleplay[10473])

['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']
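
A shifted row like the one we just removed has fewer fields than the header, so it could also be found programmatically rather than through the discussion forum. The sketch below assumes that a malformed row is one whose length differs from the header's; `find_malformed_rows` is a hypothetical helper, and the sample uses only rows shown above.

```python
# Hypothetical helper: report the index and length of every row whose number
# of fields differs from the header's.
def find_malformed_rows(dataset):
    expected = len(dataset[0])
    return [(index, len(row)) for index, row in enumerate(dataset)
            if len(row) != expected]

# Tiny stand-in for the real data: the header, one well-formed row, and the
# shifted row discussed above (12 fields instead of 13).
sample = [
    ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type',
     'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver',
     'Android Ver'],
    ['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M',
     '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018',
     '6.06.14', '4.4 and up'],
    ['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+',
     'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up'],
]

malformed = find_malformed_rows(sample)
print(malformed)  # [(2, 12)]
```

This check only catches rows with a missing or extra field; a row with the right number of fields but wrong values would still need a separate check.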


**Identifying Duplicates**

In the Google Play data set, some apps have duplicate entries. For example, see the entries for Instagram below:

In [7]:
for app in googleplay:
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


To identify and remove duplicate entries, we must determine which apps are duplicated and how many duplicate entries there are. We will do this by creating two new lists: one for duplicates and one for unique apps. The code below loops through the data set and adds each new name to the list `unique_apps`; if a name already appears in that list, it is added to `duplicate_apps` instead. Then, it prints the total number of entries in `duplicate_apps` using the `len()` function, as well as several examples of the duplicates that were detected.

In [8]:
duplicate_apps = []
unique_apps = []

for app in googleplay:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', (duplicate_apps[:15]))

Number of duplicate apps: 1181


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']
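
Checking `name in unique_apps` scans a list on every iteration, which gets slow as the list grows. As an alternative sketch (not the approach used in this notebook), `collections.Counter` from the standard library can tally every name in a single pass. The names below are a small illustrative sample, not the real data.

```python
from collections import Counter

# Small illustrative sample of app names (not the real data set).
names = ['Instagram', 'Box', 'Slack', 'Instagram', 'Box', 'Instagram', 'Zenefits']

name_counts = Counter(names)

# Every occurrence beyond the first counts as a duplicate entry.
n_duplicate_entries = sum(count - 1 for count in name_counts.values())
apps_with_duplicates = [name for name, count in name_counts.items() if count > 1]

print(n_duplicate_entries)   # 3
print(apps_with_duplicates)  # ['Instagram', 'Box']
```

Counting this way also tells us how many times each app is repeated, which the list of duplicate names alone does not show directly.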


Upon examining the data set, we see that the duplicates differ in their number of user reviews, which suggests that the data was collected at different times. It stands to reason that the entry with the most user reviews is the most recent one and therefore contains the most reliable rating. Instead of removing duplicate entries at random, we will keep the entry with the most user reviews and delete the others.

To do this, we will create a dictionary called `reviews_max` where each key is a unique app name and the value is the highest number of reviews of that app. We can then use this dictionary to make a new data set that has only one entry per app. 

In [9]:
reviews_max = {}

for row in googleplay[1:]:
    name = row[0]
    n_reviews = float(row[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
        
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

In [10]:
print(len(reviews_max))

9659


The above code has identified all unique app names and recorded the highest number of reviews for each one. This leaves us with 9659 unique apps, which is consistent with our earlier count: of the original 10842 rows, one is the header, one was the incorrect entry we deleted, and 1181 were duplicates (10842 - 1 - 1 - 1181 = 9659).

**Removing Duplicates and Updating Our Data Set**

Now, we remove these duplicates and create our new data set. To do this, we will create two new lists, `googleplay_clean` and `already_added`. We will loop through the `googleplay` data set and isolate the names of the apps and number of reviews. Then we add the current row (`app`) to the `googleplay_clean` list, and the app name (`name`) to the `already_added` list if:

1. The number of reviews of the current app matches the number of reviews in the `reviews_max` dictionary
2. The name of the app is not in the `already_added` list. This additional condition will prevent any duplicate entries that have the same number of user reviews from appearing in our `googleplay_clean` list.

In [11]:
googleplay_clean = []
already_added = []

for app in googleplay[1:]:
    name = app[0]
    n_reviews = float(app[3])
    
    if (reviews_max[name] == n_reviews) and (name not in already_added):
    
        googleplay_clean.append(app)
        already_added.append(name)
        
explore_data(googleplay_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


Using the `explore_data` function, we have confirmed that the duplicate removal process has worked as intended, leaving us with 9659 rows as expected.
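
The two-step approach above (build `reviews_max`, then filter) can also be written as a single pass that stores the best row seen so far for each name. This is only a sketch of an alternative: `keep_most_reviewed` is a hypothetical helper, the rows are abbreviated to four fields, and the last sample row is made up for illustration.

```python
# Hypothetical helper: keep, for each app name, the first row that has the
# highest review count. Assumes the name is at index 0 and reviews at index 3.
def keep_most_reviewed(rows):
    best = {}
    for row in rows:
        name = row[0]
        n_reviews = float(row[3])
        if name not in best or n_reviews > float(best[name][3]):
            best[name] = row
    return list(best.values())

# Abbreviated rows (name, category, rating, reviews). The Instagram values come
# from the duplicates shown earlier; 'Example App' is made up.
sample_rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
    ['Example App', 'TOOLS', '4.0', '10'],
]

deduped = keep_most_reviewed(sample_rows)
print(deduped)
```

A dictionary lookup replaces the `name not in already_added` list scan, so this version scales better, although the order of the resulting rows may differ from that of `googleplay_clean`.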

## Step Three: Removing Non-English Apps

Some of the apps in both of our data sets are intended for non-English speaking users. Here are several examples:

In [12]:
print(appstore[814][1])
print(appstore[6732][1])

print(googleplay_clean[4412][0])
print(googleplay_clean[7940][0])

爱奇艺PPS -《欢乐颂2》电视剧热播
【脱出ゲーム】絶対に最後までプレイしないで 〜謎解き＆ブロックパズル〜
中国語 AQリスニング
لعبة تقدر تربح DZ


Since English speakers are our target audience, we will remove these apps from our analysis. We can do this by removing apps whose names contain symbols not typically used in English text. The characters typically used in English (the English alphabet, digits 0 to 9, punctuation marks, etc.) are all part of the ASCII standard, in which each character has a corresponding number between 0 and 127. We will build a function called `is_english` that checks an app name and determines whether it contains non-ASCII characters. Inside the function, we use the built-in `ord()` function to check whether any character in the app name (`string`) falls outside the ASCII range.

In [13]:
def is_english(string):
    
    for character in string:
        if ord(character) > 127:
            return False
        
    return True

In [14]:
is_english('Instagram')

True

In [15]:
is_english('爱奇艺PPS -《欢乐颂2》电视剧热播')

False

In [16]:
is_english('Docs To Go™ Free Office Suite')

False

In [17]:
is_english('Instachat 😜')

False

When testing the `is_english` function above, we found that app names containing emojis or other symbols (such as ™) will be classified as non-English because they contain symbols outside of the ASCII range. This may cause us to lose useful data if there are other apps with emojis or symbols in their names. We will change the function to only remove apps if they contain more than 3 non-ASCII characters.

In [18]:
def is_english(string):
    non_ascii = 0
    
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    
    # Allow up to 3 non-ASCII characters (emojis, ™, etc.)
    return non_ascii <= 3

print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

True
True

We now see that the English apps that were previously filtered out due to emojis or symbols return `True`, meaning that they are considered English. A few non-English apps may still get past this filter, but it is good enough for the purposes of our analysis.
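
To make the ASCII threshold concrete, the sketch below prints the code point that `ord()` returns for a few characters taken from the app names above, then counts the non-ASCII characters in each name. `count_non_ascii` is a hypothetical helper introduced only for this illustration.

```python
# Print the code point of a few characters and whether it is outside ASCII.
for character in ['a', 'Z', '9', '™', '😜', '爱']:
    print(repr(character), ord(character), ord(character) > 127)

# Hypothetical helper: count the characters outside the ASCII range.
def count_non_ascii(string):
    return sum(1 for character in string if ord(character) > 127)

print(count_non_ascii('Docs To Go™ Free Office Suite'))  # 1
print(count_non_ascii('Instachat 😜'))                    # 1
print(count_non_ascii('爱奇艺PPS -《欢乐颂2》电视剧热播') > 3)   # True
```

Names with a single symbol or emoji stay under the limit of 3, while a name written mostly in Chinese characters exceeds it.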

Now, we will use our `is_english` function to create new lists (`googleplay_english` and `appstore_english`) that include only the English apps as identified by our function.

In [19]:
googleplay_english = []
appstore_english = []

for app in googleplay_clean[1:]: 
    name = app[0]
    if is_english(name):
        googleplay_english.append(app)
        
for app in appstore[1:]:
    name = app[1]
    if is_english(name):
        appstore_english.append(app)

explore_data(googleplay_english, 0, 3, True)
print('\n')
explore_data(appstore_english, 0, 3, True)

['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


Number of rows: 9613
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'G

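We are left with 9613 English apps from the Google Play Store and 6183 from the App Store. One caveat: `googleplay_clean` no longer contains a header row, so slicing it with `[1:]` in the loop above skips its first app (*Photo Editor & Candy Camera & Grid & ScrapBook*), which is why the Google Play output begins with *U Launcher Lite*. Iterating over `googleplay_clean` directly would retain that app and bring the count to 9614; the figures in the rest of this analysis are based on the 9613 apps shown here.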

## Step Four: Identifying Free Apps

As mentioned in the introduction, our company builds free apps that rely on in-app advertisements. Therefore, we want to isolate only free apps in our analysis. The current data sets include both free and paid apps, so we need to identify and separate out the free ones. The code below creates two new lists, `googleplayfinal` and `appstorefinal`, that contain only the English-language apps with a price of 0.

In [20]:
googleplayfinal = []
appstorefinal = []

for app in googleplay_english:
    price = app[7]
    if price == '0':
        googleplayfinal.append(app)
        
for app in appstore_english:
    price = app[4]
    if price == '0.0':
        appstorefinal.append(app)
    

In [21]:
print(len(googleplayfinal))
print(len(appstorefinal))

8863
3222


We are now left with 8863 Google Play apps and 3222 App Store apps, which is enough for our analysis.
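
The filter above relies on exact string matches (`'0'` for Google Play and `'0.0'` for the App Store). A slightly more forgiving sketch parses the price as a number first. `is_free` is a hypothetical helper, and the leading `$` on paid prices (as in `'$4.99'`) is an assumption about how prices may be formatted.

```python
# Hypothetical helper: treat a price string as free when it parses to zero.
# Assumption: a price may carry a leading '$', as in '$4.99'.
def is_free(price):
    return float(price.strip().lstrip('$')) == 0.0

print(is_free('0'))      # True
print(is_free('0.0'))    # True
print(is_free('$4.99'))  # False
```

With such a helper, the same condition could be used for both data sets instead of one string comparison per store.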

## Step Five: Most Common App Types

The aim of this analysis is to determine what types of free apps are likely to attract the most users. To minimize risk and overhead, this is our validation strategy for app ideas:

1. Build a minimal Android version of the app and add to Google Play.
2. If the app generates a positive response from users, develop it further and add features.
3. If the app is profitable after 6 months, build an iOS version and add it to the App Store.

Since the end goal is for the app to appear in both Google Play and the App Store, we want to investigate what app profiles are successful on both markets. 

First, we can determine what the most common genres are on each market. We will build a frequency table for the `prime_genre` column in the App Store data set, and the `Genres` and `Category` columns in the Google Play data set.

In [22]:
def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1 
        value = row[index]
        if value in table:
            table[value] += 1
        else: 
            table[value] = 1
            
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage
        
    return table_percentages

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

In [23]:
display_table(appstorefinal, 11)


Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


As we can see in the frequency table above, Games is by far the most common genre in our data set, comprising 58.16% of free English apps. The second most common genre is Entertainment at nearly 8%, followed by Photo & Video at about 5%. Education and Social Networking make up 3.66% and 3.29%, respectively.

Overall, the App Store's free English apps are primarily for gaming and entertainment, while apps intended for productivity, shopping, education, etc. are less well represented. However, this does not necessarily mean that gaming and entertainment apps have more users; it simply shows that there are fewer offerings among practical apps.
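
Because `freq_table()` returns percentages, it is easy to lose sight of how many apps stand behind each figure. As a quick sketch, the counts can be recovered by multiplying each percentage by the total number of free English App Store apps (3222, from the output above). The three percentages are copied from the table.

```python
# Total number of free English App Store apps, from the output above.
total_apps = 3222

# A few percentages copied from the frequency table above.
percentages = {
    'Games': 58.16263190564867,
    'Entertainment': 7.883302296710118,
    'Social Networking': 3.2898820608317814,
}

# Convert each percentage back to a whole number of apps.
counts = {genre: round(percentage / 100 * total_apps)
          for genre, percentage in percentages.items()}

for genre, count in counts.items():
    print(genre, ':', count)
```

Seen this way, Games accounts for 1874 of the 3222 apps, compared with 254 for Entertainment and 106 for Social Networking.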

In [24]:
display_table(googleplayfinal, 1) ### Category Column

FAMILY : 18.910075595170937
GAME : 9.725826469592688
TOOLS : 8.462146000225657
BUSINESS : 4.592124562789123
LIFESTYLE : 3.9038700214374367
PRODUCTIVITY : 3.8925871601038025
FINANCE : 3.7007785174320205
MEDICAL : 3.5315355974275078
SPORTS : 3.396141261423897
PERSONALIZATION : 3.317161232088458
COMMUNICATION : 3.2381812027530184
HEALTH_AND_FITNESS : 3.0802211440821394
PHOTOGRAPHY : 2.944826808078529
NEWS_AND_MAGAZINES : 2.798149610741284
SOCIAL : 2.6627552747376737
TRAVEL_AND_LOCAL : 2.335552296062281
SHOPPING : 2.245289405393208
BOOKS_AND_REFERENCE : 2.1437436533904997
DATING : 1.8616721200496444
VIDEO_PLAYERS : 1.7939749520478394
MAPS_AND_NAVIGATION : 1.399074805370642
FOOD_AND_DRINK : 1.241114746699763
EDUCATION : 1.1621347173643235
ENTERTAINMENT : 0.9590432133589079
LIBRARIES_AND_DEMO : 0.9364774906916393
AUTO_AND_VEHICLES : 0.9251946293580051
HOUSE_AND_HOME : 0.8236488773552973
WEATHER : 0.8010831546880289
EVENTS : 0.7108202640189552
PARENTING : 0.6544059573507841
ART_AND_DESIGN : 0

In the frequency table above, we have examined the `Category` column from our Google Play data set. We see here that the Family category is the most common type of app at nearly 19%, followed by Games at 9.73%, and Tools at 8.46%. 

In [25]:
display_table(googleplayfinal, 9) ### Genres Column

Tools : 8.450863138892023
Entertainment : 6.070179397495204
Education : 5.348076272142616
Business : 4.592124562789123
Productivity : 3.8925871601038025
Lifestyle : 3.8925871601038025
Finance : 3.7007785174320205
Medical : 3.5315355974275078
Sports : 3.463838429425702
Personalization : 3.317161232088458
Communication : 3.2381812027530184
Action : 3.102786866749408
Health & Fitness : 3.0802211440821394
Photography : 2.944826808078529
News & Magazines : 2.798149610741284
Social : 2.6627552747376737
Travel & Local : 2.324269434728647
Shopping : 2.245289405393208
Books & Reference : 2.1437436533904997
Simulation : 2.042197901387792
Dating : 1.8616721200496444
Arcade : 1.8503892587160102
Video Players & Editors : 1.771409229380571
Casual : 1.7601263680469368
Maps & Navigation : 1.399074805370642
Food & Drink : 1.241114746699763
Puzzle : 1.128286133363421
Racing : 0.9928917973598104
Role Playing : 0.9364774906916393
Libraries & Demo : 0.9364774906916393
Auto & Vehicles : 0.9251946293580051
S

This final frequency table examines the `Genres` column from our Google Play data set and reveals that Tools is the most common genre at 8.45%, followed by Entertainment at ~6% and Education at 5.35%. Apps intended for practical purposes (tools, business, education, etc.) appear to be better represented in the Google Play market than in the App Store data.

However, this information still does not tell us which genres bring in the most users. We have simply determined the most common types of apps in both markets.

Going forward, we will use the `Category` column when analyzing Google Play data, as it contains fewer distinct values and we are primarily interested in the big picture.
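
One reason the `Genres` column is more granular is that a single value can combine several genres separated by a semicolon. The sketch below uses three `Genres` values from rows shown earlier, all of which share the same `Category`, to illustrate the difference.

```python
# Values from the `Genres` column, copied from rows shown earlier.
genres_values = ['Art & Design', 'Art & Design;Pretend Play', 'Art & Design;Creativity']

# Split each value into its individual genres.
split_genres = [value.split(';') for value in genres_values]
for parts in split_genres:
    print(parts)

# The `Category` values for the same three rows.
category_values = ['ART_AND_DESIGN', 'ART_AND_DESIGN', 'ART_AND_DESIGN']

print(len(set(genres_values)))    # 3 distinct Genres values
print(len(set(category_values)))  # 1 distinct Category value
```

Three rows that fall under one category produce three different `Genres` entries, which is why the `Genres` frequency table is longer and harder to read at a glance.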

## Step Six: Most Popular App Types
**App Store**

Now that we have identified the most commonly represented genres, we want to know how many users these app types attract. The App Store data set does not include the number of installs, so we will use the number of user ratings (`rating_count_tot`) as a proxy for each app's popularity. In the code below, we loop through the data set and compute the average number of user ratings per app in each genre.

In [26]:
appstoregenres = freq_table(appstorefinal, 11)

for genre in appstoregenres:
    total = 0
    len_genre = 0
    for app in appstorefinal:
        genre_app = app[11]
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
            
    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)

Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0


Navigation apps have the highest average number of user ratings, followed by Reference, Social Networking, and Music apps. Based on this information, we can see that while games are the most common type of app in the App Store, more practical apps receive the most ratings on average, which suggests that they have the most users. However, these averages are potentially skewed by a handful of apps with very large numbers of users, such as Facebook and Pinterest in the Social Networking genre. For instance, we can see that Waze and Google Maps have far more user ratings than the other navigation apps.

In [27]:
for app in appstorefinal:
    if app[11] == 'Navigation':  # index 11 is the prime_genre column
        print(app[1], ':', app[5]) # print name and number of ratings

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5
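
One way to quantify this skew is to compare the mean with the median, which is much less sensitive to a few very large values. The sketch below uses the six Navigation rating counts from the output above and the standard library's `statistics` module.

```python
from statistics import mean, median

# Rating counts for the Navigation apps, copied from the output above.
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

navigation_mean = mean(navigation_ratings)
navigation_median = median(navigation_ratings)

print(round(navigation_mean, 2))  # 86090.33
print(navigation_median)          # 8196.5
```

The mean matches the genre average reported earlier, but the median is roughly ten times smaller, which confirms that Waze and Google Maps dominate the Navigation figure.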


We see a similar trend in the Reference apps, with the Bible app at nearly 1 million user ratings, skewing the overall average. 

In [28]:
for app in appstorefinal:
    if app[11] == 'Reference':  # index 11 is the prime_genre column
        print(app[1], ':', app[5]) # print name and number of ratings

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


Given the abundance of free gaming apps that we found previously, it is possible that a practical app (in a genre such as Reference, Book, or Productivity) could attract more users. The App Store is oversaturated with gaming and entertainment apps compared to the Google Play market, but the number of user ratings in the practical genres is promising. For instance, the Book genre averages 39758 user ratings per app and Productivity averages 21028. Since we rely on in-app advertisements to generate revenue, we want users to spend time in the app. An app in the Weather, Finance, or Food & Drink genres would likely not hold users for long, since users visit these apps briefly to fulfill a specific purpose (check the forecast, order a drink, transfer money, etc.).

Perhaps an app that overlaps into several practical genres would prove successful, such as a scheduling app that combines elements from Productivity, Weather, Lifestyle, Health & Fitness, and Entertainment. For instance, a scheduling app could appear in the Productivity genre but allow users to track their habits, receive book and movie recommendations based on their preferences, and provide recommendations based on today's weather. Maybe they have a break in their schedule, it's sunny out, and the app can recommend taking a walk outdoors. It can remind users to drink water throughout the workday, and then recommend a book to read or movie to watch in the evening based on the likes/dislikes they've set in their profiles.

Overall, apps that target users' practical needs may stand out on the App Store. 

**Google Play**

In [29]:
freq_table(googleplayfinal, 5)

{'5,000,000+': 6.826131106848697,
 '50,000,000+': 2.3017037120613786,
 '100,000+': 11.55365000564143,
 '50,000+': 4.772650344127271,
 '1,000,000+': 15.728308699086089,
 '10,000+': 10.188423784271691,
 '10,000,000+': 10.549475346947986,
 '5,000+': 4.513144533453684,
 '500,000+': 5.562450637481666,
 '1,000,000,000+': 0.2256572266726842,
 '100,000,000+': 2.1324607920568655,
 '1,000+': 8.394448832223853,
 '500,000,000+': 0.270788672007221,
 '500+': 3.2494640640866526,
 '100+': 6.916393997517771,
 '50+': 1.9180864267178157,
 '10+': 3.542818458761142,
 '1+': 0.5077287600135394,
 '5+': 0.7898002933543946,
 '0+': 0.045131445334536835,
 '0': 0.011282861333634209}

In [30]:
categoriesgoogleplay = freq_table(googleplayfinal, 1)

for category in categoriesgoogleplay:
    total = 0
    len_category = 0
    for app in googleplayfinal:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    print(category, ':', avg_n_installs)

ART_AND_DESIGN : 2021626.7857142857
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_

In the output above, Communication apps have the most installs on average: 38,456,119. The Video Players category is in second place with over 24 million installs on average. As in the App Store, these averages are skewed by a few giants such as Facebook and YouTube. It would be difficult to compete with these well-established and extremely popular social media and video player apps.
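
The nested loop used to produce these averages rescans the whole data set once per category. As a sketch of an alternative, the same averages can be accumulated in a single pass with `collections.defaultdict`. Both helpers are hypothetical, and the sample rows are made up; only the category (index 1) and installs (index 5) fields matter here.

```python
from collections import defaultdict

# Hypothetical helper: convert a string such as '1,000,000+' to a float.
def parse_installs(installs):
    return float(installs.replace(',', '').replace('+', ''))

# Hypothetical helper: average installs per category, computed in one pass.
# Assumes the category is at index 1 and installs at index 5.
def average_installs_by_category(rows):
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[1]] += parse_installs(row[5])
        counts[row[1]] += 1
    return {category: totals[category] / counts[category] for category in totals}

# Made-up rows for illustration only.
sample_rows = [
    ['App A', 'TOOLS', '', '', '', '10,000+'],
    ['App B', 'TOOLS', '', '', '', '1,000,000+'],
    ['App C', 'SOCIAL', '', '', '', '500+'],
]

averages = average_installs_by_category(sample_rows)
print(averages)  # {'TOOLS': 505000.0, 'SOCIAL': 500.0}
```

Note that values such as `'1,000,000+'` are lower bounds, so any average computed from them is an approximation of the true install counts.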

If we explore the Productivity category, we see that Productivity apps have over 16 million installs on average. In the code below, we list the Productivity apps with over 1 billion, over 500 million, and over 100 million installs to see whether any apps are skewing the average. The output makes it clear that the Microsoft Office and G Suite apps contribute greatly to this number.

In [31]:
for app in googleplayfinal:
    if app[1] == 'PRODUCTIVITY' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

Microsoft Word : 500,000,000+
Microsoft Outlook : 100,000,000+
Microsoft OneDrive : 100,000,000+
Microsoft OneNote : 100,000,000+
Google Keep : 100,000,000+
ES File Explorer File Manager : 100,000,000+
Dropbox : 500,000,000+
Google Docs : 100,000,000+
Microsoft PowerPoint : 100,000,000+
Samsung Notes : 100,000,000+
SwiftKey Keyboard : 100,000,000+
Google Drive : 1,000,000,000+
Adobe Acrobat Reader : 100,000,000+
Google Sheets : 100,000,000+
Microsoft Excel : 100,000,000+
WPS Office - Word, Docs, PDF, Note, Slide & Sheet : 100,000,000+
Google Slides : 100,000,000+
ColorNote Notepad Notes : 100,000,000+
Evernote – Organizer, Planner for Notes & Memos : 100,000,000+
Google Calendar : 500,000,000+
Cloud Print : 500,000,000+
CamScanner - Phone PDF Creator : 100,000,000+


In [32]:
under_100_m = []

for app in googleplayfinal:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'PRODUCTIVITY') and (float(n_installs) < 100000000):
        under_100_m.append(float(n_installs))
        
print('Average installs for Productivity apps under 100 million:', sum(under_100_m) / len(under_100_m))

Average installs for Productivity apps under 100 million: 3379657.318885449


When we remove apps with over 100 million installs, we can see that the average number of installs for Productivity apps is ~3.4 million. This number is still quite competitive with other app genres; for instance, Health and Fitness apps show around 4 million installations on average. We believe a productivity app that offers practical and unique features, such as the scheduling app idea we discussed based on the App Store data, would attract users in the Google Play store as well.

## Conclusions

Based on our analysis of the most popular apps from the App Store and Google Play store, we concluded that creating a productivity app that includes unique and practical features could be profitable in both markets. The App Store showed an oversaturation of fun apps, such as games and entertainment, but practical apps brought in the most user ratings on average. In the Google Play store, productivity apps average ~3.4 million installs even with the apps above 100 million installs (mostly Microsoft and G Suite apps) removed from the analysis. A well-designed scheduling app with unique features could potentially stand out in both markets, and would generate ad revenue because users would frequently open it and use it for several minutes at a time.