# Profitable App Profiles for the App Store and Google Play Markets

*Our goal for this project is to analyze data to help developers understand what type of apps are likely to attract more users.*

For this project, we will only look at apps that are free to download and install, and our main source of revenue consists of in-app ads. This means our revenue for any given app is mostly influenced by the number of users who use our app — the more users that see and engage with the ads, the better.

### Dataset Sources
1. [Apple iOS app store](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps)
2. [Google Play Store apps](https://www.kaggle.com/lava18/google-play-store-apps/discussion/120506)

## Opening and Exploring the Data

In [1]:
def explore_data(dataset, start, end, rows_and_columns=False):
    '''
    Slices the data set using dataset[start:end].
    Loops through the slice, and for each iteration, prints a row and a newline
    Prints the number of rows and columns if rows_and_columns is True.
    dataset shouldn't have a header row, otherwise the function will print the wrong number of rows 
    (one more row compared to the actual length).

            Parameters:
                    dataset (list): A list of lists
                    start (int): An integer representing the starting index of a slice of data from the dataset
                    end (int): An integer representing the ending index of a slice of data from the dataset
                    rows_and_columns (bool): A boolean, False by default, that indicates whether to print the dataset's 
                                             rows and columns

    '''
    
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [2]:
# Open both datasets and convert them to lists of lists
from csv import reader
file_1 = open('AppleStore.csv', encoding='utf8')
file_2 = open('googleplaystore.csv', encoding='utf8')

reader_1 = reader(file_1)
reader_2 = reader(file_2)

app_store = list(reader_1)
play_store = list(reader_2)
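
The loading step above leaves both files open. A minimal sketch of the same step wrapped in a helper that closes the file when it is done; `load_csv` is a hypothetical name that does not appear in the original notebook, and the file names in the usage comments are the ones used above.

```python
from csv import reader

def load_csv(path, encoding='utf8'):
    '''Read a CSV file into a list of lists (hypothetical helper).'''
    # The with-block closes the file even if reading raises an error
    with open(path, encoding=encoding, newline='') as f:
        return list(reader(f))

# Usage, assuming both CSV files sit next to the notebook:
# app_store = load_csv('AppleStore.csv')
# play_store = load_csv('googleplaystore.csv')
```

The returned lists still include the header row at index 0, just like `app_store` and `play_store` above.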

In [3]:
explore_data(app_store, 0, 5, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


Number of rows: 7198
Number of columns: 16


In [4]:
explore_data(play_store, 0, 5, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 10842
Number of columns: 13


## Data Cleaning

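The row at index 10473 is faulty: it has only 12 values instead of 13 because its `Category` value is missing, so every later value is shifted one column to the left. As a result, its `Rating` column reads 19, which is above the maximum rating of 5. So, we'll delete it.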

In [5]:
incorrect_index = 10473
print(play_store[incorrect_index])

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


In [6]:
del play_store[incorrect_index]
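
A row like this can also be found programmatically by comparing each row's length with the header's. The sketch below is one possible way to do it; `find_malformed_rows` is a hypothetical helper, and the sample rows are made up for illustration.

```python
def find_malformed_rows(dataset):
    '''Return indexes of rows whose length differs from the header row.'''
    n_columns = len(dataset[0])  # dataset[0] is assumed to be the header
    return [i for i, row in enumerate(dataset) if len(row) != n_columns]

# Tiny made-up example: the last row is missing one value
sample = [['App', 'Category', 'Rating'],
          ['Some App', 'TOOLS', '4.1'],
          ['Broken App', '1.9']]
print(find_malformed_rows(sample))  # prints [2]
```

Deleting several rows by index should be done from the highest index down, so that earlier deletions don't shift the later ones.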

### Deleting Duplicate Rows
There are some apps with duplicate entries in the Google Play dataset. For example, Instagram has four entries:

In [7]:
for app in play_store:
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


Let's count the total number of duplicate entries:

In [15]:
duplicates = []
unique = []

for app in play_store[1:]:
    name = app[0]
    if name in unique:
        duplicates.append(name)
    else:
        unique.append(name)
        
print(f'Number of duplicates {len(duplicates)}\n')
print(f'Examples of duplicate apps:\n{duplicates[:15]}')

Number of duplicates 1181

Examples of duplicate apps:
['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


Let's keep only the entry with the largest value in the **Reviews** column, since the highest review count most likely belongs to the most recent entry.

In [22]:
# a dictionary where each key is a unique app name and the 
# corresponding dictionary value is the highest number of 
# reviews of that app
reviews_max = {}

# Excluding the header row
for app in play_store[1:]:
    name = app[0]
    n_reviews = int(app[3])
    if name not in reviews_max or reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews

# Ensure that the length of the dictionary is the same as actual length
# of our dataset without duplicates
assert len(reviews_max) == len(play_store[1:]) - len(duplicates)

In [24]:
# Remove duplicates
play_store_clean = []
already_added = []

# Include apps whose reviews are maximum
for app in play_store[1:]:
    name = app[0]
    n_reviews = int(app[3])
    # The second condition is for apps that have the same number of
    # max reviews 
    if n_reviews == reviews_max[name] and name not in already_added:
        play_store_clean.append(app)
        already_added.append(name)
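
The two passes above can also be collapsed into one. This is a minimal sketch, not the notebook's method: `keep_max_reviews` is a hypothetical helper, and the default column indexes are assumed to match the Play Store layout (name at 0, reviews at 3).

```python
def keep_max_reviews(rows, name_idx=0, reviews_idx=3):
    '''Keep, for each app name, the row with the most reviews.'''
    best = {}
    for row in rows:
        name = row[name_idx]
        n_reviews = int(row[reviews_idx])
        # Strict '>' keeps the first row seen when review counts tie
        if name not in best or n_reviews > int(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())

# Made-up rows: 'A' appears three times, 'B' once
sample = [['A', 'x', '4.5', '10'],
          ['B', 'x', '4.0', '5'],
          ['A', 'x', '4.5', '12'],
          ['A', 'x', '4.5', '12']]
print(len(keep_max_reviews(sample)))  # prints 2
```

One difference from the loop above: the result is ordered by each app's first appearance rather than by the position of its highest-review row.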

We don't need to repeat this procedure for `app_store`: each app has a unique identifier in the `id` column, so there are no duplicates.

### Removing Non-English apps
Some apps in both datasets have names that suggest they are not directed toward an English-speaking audience.

In [25]:
print(app_store[813][1])
print(app_store[6731][1])
print('\n')
print(play_store_clean[4412][0])
print(play_store_clean[7940][0])

BATTLE BEARS -1
Beast Poker


中国語 AQリスニング
لعبة تقدر تربح DZ


We're not interested in keeping these apps, so we'll remove them. We'll remove all the apps with names containing characters whose code point (obtained using `ord()`) falls outside the [ASCII](https://en.wikipedia.org/wiki/ASCII) range (0-127), which covers the characters commonly used in English text.

In [26]:
def isEnglish(name):
    '''
    Takes in a string and checks if all of its characters fall within
    the ASCII range (0-127)
    '''
    for char in name:
        if ord(char) > 127:
            return False
    return True

Let's test the function to see if it works:

In [28]:
name_english = 'Instagram'
name_chinese = '爱奇艺PPS -《欢乐颂2》电视剧热播'
name_with_symbol = 'Docs To Go™ Free Office Suite'
name_with_emoji = 'Instachat 😜'

print(isEnglish(name_english))
print(isEnglish(name_chinese))
print(isEnglish(name_with_symbol))
print(isEnglish(name_with_emoji))

True
False
False
False


Unfortunately, the function also rejects English app names that contain certain symbols or emojis, since their code points are also above 127.

To minimize the impact of data loss, we'll only remove an app if its name has more than three characters with corresponding numbers falling outside the ASCII range.

In [29]:
def isEnglish(name):
    '''
    Takes in a string and returns False if more than three of its
    characters fall outside the ASCII range (0-127), True otherwise
    '''
    count = 0
    for char in name:
        if ord(char) > 127:
            count += 1
        if count > 3:
            return False
    return True

print(isEnglish(name_english))
print(isEnglish(name_chinese))
print(isEnglish(name_with_symbol))
print(isEnglish(name_with_emoji))

True
False
True
True
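
The same rule can be written more compactly by counting the non-ASCII characters first. This is a sketch under the same assumption (at most three non-ASCII characters are tolerated); `is_mostly_ascii` and its `max_non_ascii` parameter are hypothetical names, not part of the notebook.

```python
def is_mostly_ascii(name, max_non_ascii=3):
    '''Return True if name has at most max_non_ascii non-ASCII characters.'''
    non_ascii = sum(1 for char in name if ord(char) > 127)
    return non_ascii <= max_non_ascii

for name in ['Instagram',
             '爱奇艺PPS -《欢乐颂2》电视剧热播',
             'Docs To Go™ Free Office Suite',
             'Instachat 😜']:
    print(is_mostly_ascii(name))
```

Making the threshold a parameter keeps the cutoff of three visible and easy to adjust; the filter is still a heuristic and will let some non-English names through.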


In [30]:
# filter out non-English apps from both data sets
app_store_english = []
play_store_english = []

for app in app_store[1:]:
    name = app[1]
    if isEnglish(name):
        app_store_english.append(app)

for app in play_store_clean:
    name = app[0]
    if isEnglish(name):
        play_store_english.append(app)
        
print(f'Length of App Store: {len(app_store[1:])}. Length of App Store \
with English apps: {len(app_store_english)}\n')
print(f'Length of Play Store: {len(play_store_clean)}. Length of Play Store \
with English apps: {len(play_store_english)}\n')

Length of App Store: 7197. Length of App Store with English apps: 6183

Length of Play Store: 9659. Length of Play Store with English apps: 9614



### Isolating Free Apps
Our data sets contain both free and non-free apps; we'll need to isolate only the free apps for our analysis.

In [37]:
app_store_free, play_store_free = [], []

for app in app_store_english:
    price = float(app[4])
    if price == 0.0:
        app_store_free.append(app)
        
for app in play_store_english:
    # Free apps have a price value of '0' so we'll just check for that
    price = app[7]
    if price == '0':
        play_store_free.append(app)

len(app_store_free), len(play_store_free)

(3222, 8864)

So, in the data cleaning process, we:
- Removed inaccurate data
- Removed duplicate app entries
- Removed non-English apps
- Isolated the free apps

Now, we're left with 3222 iOS apps and 8864 Android apps.

# Validation strategy for an app

1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we develop it further.
3. If the app is profitable after six months, we build an iOS version of the app and add it to the App Store

Because our end goal is to add the app on both Google Play and the App Store, we need to find app profiles that are successful on both markets. For instance, a profile that works well for both markets might be a productivity app that makes use of [gamification](https://en.wikipedia.org/wiki/Gamification).

Let's begin the analysis by getting a sense of the most common genres for each market.

## Most Common Apps by Genre

In [39]:
# Let's find the interesting columns of both datasets
print(app_store[0])
print('\n')
print(play_store[0])

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


So, the columns that are of interest to us include the `prime_genre` column in the app_store and the `Category` and `Genres` columns in the play_store.

Now, we'll create frequency tables that show the number of apps per value in descending order.

In [41]:
def freq_table(dataset, index):
    '''Build a frequency table for a column in a dataset'''
    table = {}
    for app in dataset:
        val = app[index]
        table[val] = table.get(val, 0) + 1
    return table

In [42]:
def display_table(dataset, index):
    '''
    Generates a frequency table using the freq_table() function.
    Transforms the frequency table into a list of tuples, 
    then sorts the list in a descending order.
    Prints the entries of the frequency table in descending order.

            Parameters:
                    dataset (list): A list of lists
                    index (int): integer index of the column to tabulate

    '''
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
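
The tables built by `freq_table()` hold raw counts. If percentages are easier to compare across the two stores, the counts can be divided by the number of rows. A minimal sketch, where `freq_table_pct` is a hypothetical variant and the sample rows are made up:

```python
def freq_table_pct(dataset, index):
    '''Build a frequency table of percentages for one column.'''
    counts = {}
    for row in dataset:
        value = row[index]
        counts[value] = counts.get(value, 0) + 1
    total = len(dataset)
    # Convert each count into a share of all rows, in percent
    return {value: count / total * 100 for value, count in counts.items()}

sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
print(freq_table_pct(sample, 1))  # prints {'Games': 75.0, 'Music': 25.0}
```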

In [43]:
display_table(app_store_free, 11) # prime_genre

Games : 1874
Entertainment : 254
Photo & Video : 160
Education : 118
Social Networking : 106
Shopping : 84
Utilities : 81
Sports : 69
Music : 66
Health & Fitness : 65
Productivity : 56
Lifestyle : 51
News : 43
Travel : 40
Finance : 36
Weather : 28
Food & Drink : 26
Reference : 18
Business : 17
Book : 14
Navigation : 6
Medical : 6
Catalogs : 4


In [44]:
display_table(play_store_free, 1) # Category

FAMILY : 1676
GAME : 862
TOOLS : 750
BUSINESS : 407
LIFESTYLE : 346
PRODUCTIVITY : 345
FINANCE : 328
MEDICAL : 313
SPORTS : 301
PERSONALIZATION : 294
COMMUNICATION : 287
HEALTH_AND_FITNESS : 273
PHOTOGRAPHY : 261
NEWS_AND_MAGAZINES : 248
SOCIAL : 236
TRAVEL_AND_LOCAL : 207
SHOPPING : 199
BOOKS_AND_REFERENCE : 190
DATING : 165
VIDEO_PLAYERS : 159
MAPS_AND_NAVIGATION : 124
FOOD_AND_DRINK : 110
EDUCATION : 103
ENTERTAINMENT : 85
LIBRARIES_AND_DEMO : 83
AUTO_AND_VEHICLES : 82
HOUSE_AND_HOME : 73
WEATHER : 71
EVENTS : 63
PARENTING : 58
ART_AND_DESIGN : 57
COMICS : 55
BEAUTY : 53


In [45]:
display_table(play_store_free, 9) # Genres

Tools : 749
Entertainment : 538
Education : 474
Business : 407
Productivity : 345
Lifestyle : 345
Finance : 328
Medical : 313
Sports : 307
Personalization : 294
Communication : 287
Action : 275
Health & Fitness : 273
Photography : 261
News & Magazines : 248
Social : 236
Travel & Local : 206
Shopping : 199
Books & Reference : 190
Simulation : 181
Dating : 165
Arcade : 164
Video Players & Editors : 157
Casual : 156
Maps & Navigation : 124
Food & Drink : 110
Puzzle : 100
Racing : 88
Role Playing : 83
Libraries & Demo : 83
Auto & Vehicles : 82
Strategy : 81
House & Home : 73
Weather : 71
Events : 63
Adventure : 60
Comics : 54
Beauty : 53
Art & Design : 53
Parenting : 44
Card : 40
Casino : 38
Trivia : 37
Educational;Education : 35
Board : 34
Educational : 33
Education;Education : 30
Word : 23
Casual;Pretend Play : 21
Music : 18
Racing;Action & Adventure : 15
Puzzle;Brain Games : 15
Entertainment;Music & Video : 15
Casual;Brain Games : 12
Casual;Action & Adventure : 12
Arcade;Action & Advent

## Analysis of App Store Frequency Table

- What is the most common genre? What is the runner-up?

Games is the most common. Entertainment is the runner-up.

- What other patterns do you see?

Apps made for leisure are the most common.

- What is the general impression — are most of the apps designed for practical purposes (education, shopping, utilities, productivity, lifestyle) or more for entertainment (games, photo and video, social networking, sports, music)?

Most of the apps are designed for entertainment rather than for practical purposes.

- Can you recommend an app profile for the App Store market based on this frequency table alone? If there's a large number of apps for a particular genre, does that also imply that apps of that genre generally have a large number of users?

Based on this table alone, an App Store profile would lean toward entertainment, and 'Games' stands out because it has the most apps. Developers may publish so many free games because they can attract a large audience, which in turn leads to more ad revenue. However, a large number of apps in a genre does not imply a large number of users: supply may simply exceed demand for entertainment apps.

## Analysis of Play Store Frequency Table

- What are the most common genres?

FAMILY and GAME are the most common categories. Tools and Entertainment are the most common genres.

- What other patterns do you see?

For categories, apps made for practical purposes are quite common, and the same can be said for the genres. However, FAMILY (which mostly consists of games for kids) and GAME still account for a significant share of the Play Store.

- Compare the patterns you see for the Google Play market with those you saw for the App Store market.

The Play Store supplies more apps made for practical purposes, whereas the App Store supplies more apps made for leisure.

- Can you recommend an app profile based on what you found so far? Do the frequency tables you generated reveal the most frequent app genres or what genres have the most users?

I would recommend a mix of productivity and entertainment apps, weighted toward productivity.
However, these frequency tables only show which genres have the most apps, not which genres have the most users.

## Most Popular Apps by Genre
Now, we'd like to get an idea about the kind of apps with the most users.

One way to find out what genres are the most popular (have the most users) is to calculate the average number of installs for each app genre.

For the Google Play data set, we can find this information in the `Installs` column, but this information is missing for the App Store data set. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the `rating_count_tot` column.

###  App Store
Let's start with calculating the average number of user ratings per app genre on the App Store.

In [63]:
prime_genre_frequency = freq_table(app_store_free, 11)
genre_rating = {}

for genre in prime_genre_frequency:
    total = 0
    len_genre = 0
    for app in app_store_free:
        genre_app = app[11]
        if genre_app == genre:
            n_ratings = int(app[5])
            total += n_ratings
            len_genre += 1
    genre_rating[genre] = round(total / len_genre, 1)

# Print the genres sorted by their rating in descending order
for genre, rating in sorted(genre_rating.items(),\
                            key = lambda x : x[1], reverse = True):
    print(f'Genre: {genre}. Rating: {rating}')

Genre: Navigation. Rating: 86090.3
Genre: Reference. Rating: 74942.1
Genre: Social Networking. Rating: 71548.3
Genre: Music. Rating: 57326.5
Genre: Weather. Rating: 52279.9
Genre: Book. Rating: 39758.5
Genre: Food & Drink. Rating: 33333.9
Genre: Finance. Rating: 31467.9
Genre: Photo & Video. Rating: 28441.5
Genre: Travel. Rating: 28243.8
Genre: Shopping. Rating: 26919.7
Genre: Health & Fitness. Rating: 23298.0
Genre: Sports. Rating: 23008.9
Genre: Games. Rating: 22788.7
Genre: News. Rating: 21248.0
Genre: Productivity. Rating: 21028.4
Genre: Utilities. Rating: 18684.5
Genre: Lifestyle. Rating: 16485.8
Genre: Entertainment. Rating: 14029.8
Genre: Business. Rating: 7491.1
Genre: Education. Rating: 7004.0
Genre: Catalogs. Rating: 4004.0
Genre: Medical. Rating: 612.0
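
The nested loop above scans the whole data set once per genre. The same averages can be computed in a single pass by keeping a running total and count per genre. This is a sketch, not the notebook's code: `average_by_group` is a hypothetical helper and the sample rows are made up.

```python
def average_by_group(rows, group_idx, value_idx):
    '''Average the numeric column value_idx for each value of group_idx.'''
    totals, counts = {}, {}
    for row in rows:
        group = row[group_idx]
        totals[group] = totals.get(group, 0) + float(row[value_idx])
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}

sample = [['Games', '10'], ['Games', '30'], ['Music', '5']]
print(average_by_group(sample, 0, 1))  # prints {'Games': 20.0, 'Music': 5.0}
```

For the App Store data this would be called as `average_by_group(app_store_free, 11, 5)`, assuming the column layout shown earlier.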


Apps used for navigation, reference, and social networking have the highest average number of user ratings. Let's see the apps and their number of ratings for each of these genres.

In [51]:
for genre in ['Navigation', 'Reference', 'Social Networking']:
    for app in app_store_free:
        if app[11] == genre:
            print(app[1], ':', app[5])
    print('---------------------------')

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5
---------------------------
Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edi

For each genre:

- Navigation is dominated by Waze and Google Maps, which together account for roughly 97% of ratings.
- The Bible is the most popular Reference app, accounting for about 73% of ratings.
- Social Networking is dominated by Facebook and Pinterest since they account for 88% of ratings.

Since Navigation and Social Networking are dominated by just two apps each, the bulk of their average number of ratings comes from those apps, and a newcomer is unlikely to get much of an audience. The Reference average is also pulled up by the Bible, but the rest of the genre is spread across many smaller apps such as dictionaries and guides, so a Reference app is the best option to include in an app portfolio for the App Store.
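
The share held by the leading apps can be checked with a few lines of arithmetic. The sketch below uses the six Navigation rating counts printed above (app names shortened); `top_share` is a hypothetical helper name.

```python
# Rating counts for the six free Navigation apps listed in the output above
navigation_ratings = {
    'Waze': 345046,
    'Google Maps': 154911,
    'Geocaching': 12811,
    'CoPilot GPS': 3582,
    'ImmobilienScout24': 187,
    'Railway Route Search': 5,
}

def top_share(counts, n=2):
    '''Fraction of all ratings held by the n most-rated apps.'''
    values = sorted(counts.values(), reverse=True)
    return sum(values[:n]) / sum(values)

print(f'{top_share(navigation_ratings):.1%}')  # prints 96.8%
```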

The best reference app should include a popular book, a dictionary, an audio version of both, and some quizzes to make it entertaining.

Music and Weather apps are also a good niche to target.

### Play Store
We have data about the number of installs for the Google Play market, so we should be able to get a clearer picture about genre popularity.

In [64]:
display_table(play_store_free, 5) # Installs column

1,000,000+ : 1394
100,000+ : 1024
10,000,000+ : 935
10,000+ : 904
1,000+ : 744
100+ : 613
5,000,000+ : 605
500,000+ : 493
50,000+ : 423
5,000+ : 400
10+ : 314
500+ : 288
50,000,000+ : 204
100,000,000+ : 189
50+ : 170
5+ : 70
1+ : 45
500,000,000+ : 24
1,000,000,000+ : 20
0+ : 4
0 : 1


These are intervals, so we don't know whether an app with 100,000+ installs has 100,000, 200,000, or 350,000 installs. However, we don't need very precise data for our purposes — we only want to find out which app genres attract the most users, and we don't need perfect precision with respect to the number of users.

We're going to leave the numbers as they are, which means that we'll consider that an app with 100,000+ installs has 100,000 installs, and an app with 1,000,000+ installs has 1,000,000 installs, and so on.

To perform computations, however, we'll need to convert each install number from a string to a number. This means we need to remove the commas and the plus characters, otherwise the conversion will fail and raise an error.
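
The conversion described above can be isolated in a small helper so that it is written only once. A minimal sketch; `parse_installs` is a hypothetical name, and the sample strings follow the format of the `Installs` column shown earlier.

```python
def parse_installs(text):
    '''Convert an Installs value such as '1,000,000+' to an int.'''
    return int(text.replace(',', '').replace('+', ''))

print(parse_installs('1,000,000+'))  # prints 1000000
print(parse_installs('0'))           # prints 0
```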

#### Checking Categories

In [70]:
category_freq = freq_table(play_store_free, 1)
avg_category_installs = {}
for cat in category_freq:
    total = 0
    len_cat = 0
    for app in play_store_free:
        category_app = app[1]
        if category_app == cat:
            installs = app[5]
            installs = installs.replace(',', '')
            installs = installs.replace('+', '')
            installs = int(installs)
            total += installs
            len_cat += 1
    avg_category_installs[cat] = total / len_cat
    
for category, rating in sorted(avg_category_installs.items(),
                               key = lambda x : x[1], reverse = True):
    print(F'Category: {category}. Avg Rating: {rating:.1f}')

Category: COMMUNICATION. Avg Rating: 38456119.2
Category: VIDEO_PLAYERS. Avg Rating: 24727872.5
Category: SOCIAL. Avg Rating: 23253652.1
Category: PHOTOGRAPHY. Avg Rating: 17840110.4
Category: PRODUCTIVITY. Avg Rating: 16787331.3
Category: GAME. Avg Rating: 15588015.6
Category: TRAVEL_AND_LOCAL. Avg Rating: 13984077.7
Category: ENTERTAINMENT. Avg Rating: 11640705.9
Category: TOOLS. Avg Rating: 10801391.3
Category: NEWS_AND_MAGAZINES. Avg Rating: 9549178.5
Category: BOOKS_AND_REFERENCE. Avg Rating: 8767811.9
Category: SHOPPING. Avg Rating: 7036877.3
Category: PERSONALIZATION. Avg Rating: 5201482.6
Category: WEATHER. Avg Rating: 5074486.2
Category: HEALTH_AND_FITNESS. Avg Rating: 4188822.0
Category: MAPS_AND_NAVIGATION. Avg Rating: 4056941.8
Category: FAMILY. Avg Rating: 3695641.8
Category: SPORTS. Avg Rating: 3638640.1
Category: ART_AND_DESIGN. Avg Rating: 1986335.1
Category: FOOD_AND_DRINK. Avg Rating: 1924897.7
Category: EDUCATION. Avg Rating: 1833495.1
Category: BUSINESS. Avg Rating:

On average, communication apps have the most installs: 38,456,119. This number is heavily skewed upward by a few apps that have over one billion installs (WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail, and Hangouts), and a few others with over 100 and 500 million installs:

In [71]:
for app in play_store_free:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])


WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

If we removed all the communication apps that have over 100 million installs, the average would drop roughly tenfold:

In [73]:
under_100_m = []

for app in play_store_free:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'COMMUNICATION') and (float(n_installs) < 100000000):
        under_100_m.append(float(n_installs))
        
sum(under_100_m) / len(under_100_m)

3603485.3884615386
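
Another way to limit the influence of a few giants, instead of removing them by hand, is to look at the median alongside the mean. The sketch below is only an illustration with made-up install counts; it uses the standard library's `statistics` module and is not part of the original analysis.

```python
from statistics import mean, median

# Made-up install counts: three small apps and one giant
installs = [1_000, 5_000, 10_000, 1_000_000_000]

print(mean(installs))    # pulled far upward by the single giant
print(median(installs))  # stays close to a typical app
```

A large gap between the two values is a quick sign that a category's average is driven by a handful of apps.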

We see the same pattern for the video players category, which is the runner-up with 24,727,872 installs. The market is dominated by apps like YouTube, Google Play Movies & TV, or MX Player. The pattern is repeated for social apps (where we have giants like Facebook, Instagram, Google+, etc.), photography apps (Google Photos and other popular photo editors), or productivity apps (Microsoft Word, Dropbox, Google Calendar, Evernote, etc.).

Again, the main concern is that these app genres might seem more popular than they really are. Moreover, these niches seem to be dominated by a few giants who are hard to compete against.

The game genre seems pretty popular, but previously we found out this part of the market seems a bit saturated, so we'd like to come up with a different app recommendation if possible.

The books and reference genre looks fairly popular as well, with an average number of installs of 8,767,811. It's interesting to explore this in more depth, since we found this genre has some potential to work well on the App Store, and our aim is to recommend an app genre that shows potential for being profitable on both the App Store and Google Play.

Let's take a look at some of the apps from this genre and their number of installs:

In [75]:
for app in play_store_free:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E

The book and reference genre includes a variety of apps: software for processing and reading ebooks, various collections of libraries, dictionaries, tutorials on programming or languages, etc. It seems there's still a small number of extremely popular apps that skew the average:



In [77]:
for app in play_store_free:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000,000+'
                                            or app[5] == '500,000,000+'
                                            or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])


Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad 📖 Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+


However, it looks like there are only a few very popular apps, so this market still shows potential. Let's try to get some app ideas based on the kind of apps that are somewhere in the middle in terms of popularity (between 1,000,000 and 100,000,000 downloads):



In [79]:
for app in play_store_free:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000+'
                                            or app[5] == '5,000,000+'
                                            or app[5] == '10,000,000+'
                                            or app[5] == '50,000,000+'):
        print(app[0], ':', app[5])


Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
ReadEra – free ebook reader : 1,000,000+
Ebook Reader : 5,000,000+
Read books online : 5,000,000+
eBoox: book reader fb2 epub zip : 1,000,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
Moon+ Reader : 10,000,000+
English-Myanmar Dictionary : 1,000,000+
Golden Dictionary (EN-AR) : 1,000,000+
All Language Translator Free : 1,000,000+
Aldiko Book Reader : 10,000,000+
Dictionary - WordWeb : 5,000,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Al-Quran (Free) : 10,000,000+
Al Quran Indonesia : 10,000,000+
Al'Quran Bahasa Indonesia : 10,000,000+
Al Quran Al karim : 1,000,000+
Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+
Koran Read &MP3 30 Juz Offline : 1,000,000+
H
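
Listing every install bucket by hand works, but the same filter can be expressed as a numeric range. A minimal sketch, assuming the `Installs` format shown earlier; `in_install_range` is a hypothetical helper name.

```python
def in_install_range(installs_text, low, high):
    '''Return True if low <= installs < high for a value like '5,000,000+'.'''
    n_installs = int(installs_text.replace(',', '').replace('+', ''))
    return low <= n_installs < high

# Mid-popularity apps: at least 1,000,000 but fewer than 100,000,000 installs
print(in_install_range('5,000,000+', 1_000_000, 100_000_000))    # prints True
print(in_install_range('100,000,000+', 1_000_000, 100_000_000))  # prints False
```

In the loop above, the condition would become `app[1] == 'BOOKS_AND_REFERENCE' and in_install_range(app[5], 1_000_000, 100_000_000)`.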

This niche seems to be dominated by software for processing and reading ebooks, as well as various collections of libraries and dictionaries, so it's probably not a good idea to build similar apps since there'll be some significant competition.

We also notice there are quite a few apps built around the Quran, which suggests that building an app around a popular book can be profitable. It seems that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets.

However, it looks like the market is already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.

#### Checking Genres

In [84]:
genre_freq = freq_table(play_store_free, 9)
avg_genre_installs = {}
for genre in genre_freq:
    total = 0
    len_genre = 0
    for app in play_store_free:
        genre_app = app[9]
        if genre_app == genre:
            installs = app[5]
            installs = installs.replace(',', '')
            installs = installs.replace('+', '')
            installs = int(installs)
            total += installs
            len_genre += 1
    avg_genre_installs[genre] = total / len_genre
    
for genre, rating in sorted(avg_genre_installs.items(),
                               key = lambda x : x[1], reverse = True):
    print(F'Genre: {genre}. Avg Rating: {rating:.1f}')

Genre: Communication. Avg Rating: 38456119.2
Genre: Adventure;Action & Adventure. Avg Rating: 35333333.3
Genre: Video Players & Editors. Avg Rating: 24947335.8
Genre: Social. Avg Rating: 23253652.1
Genre: Arcade. Avg Rating: 22888365.5
Genre: Casual. Avg Rating: 19569221.6
Genre: Puzzle;Action & Adventure. Avg Rating: 18366666.7
Genre: Photography. Avg Rating: 17840110.4
Genre: Educational;Action & Adventure. Avg Rating: 17016666.7
Genre: Productivity. Avg Rating: 16787331.3
Genre: Racing. Avg Rating: 15910645.7
Genre: Travel & Local. Avg Rating: 14051476.1
Genre: Casual;Action & Adventure. Avg Rating: 12916666.7
Genre: Action. Avg Rating: 12603588.9
Genre: Strategy. Avg Rating: 11199902.5
Genre: Tools. Avg Rating: 10802461.2
Genre: Lifestyle;Pretend Play. Avg Rating: 10000000.0
Genre: Casual;Music & Video. Avg Rating: 10000000.0
Genre: Tools;Education. Avg Rating: 10000000.0
Genre: Card;Action & Adventure. Avg Rating: 10000000.0
Genre: Adventure;Education. Avg Rating: 10000000.0
Genre

Just as with the categories, Communication has the highest average number of installs. However, we have already concluded that it is dominated by a few giant apps. Most of the other top genres belong to games. So, if we were to include a game in our portfolio, it should belong to Adventure;Action & Adventure, but it should also involve some puzzles, educational content, and a racing feature.

This app would probably be more profitable on the Play Store than on the App Store, since the Games `prime_genre` is on the lower end of average user ratings there.
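
Several `Genres` values, such as `Adventure;Action & Adventure`, combine more than one tag with a semicolon, which spreads similar apps over many small groups. A minimal sketch of how the tags could be separated before counting; `split_genres` and `count_tags` are hypothetical helpers, and the sample rows are made up.

```python
def split_genres(genre_text):
    '''Split a Genres value such as 'Puzzle;Brain Games' into its tags.'''
    return [tag.strip() for tag in genre_text.split(';')]

def count_tags(rows, genre_idx=9):
    '''Count how many rows carry each individual genre tag.'''
    counts = {}
    for row in rows:
        for tag in split_genres(row[genre_idx]):
            counts[tag] = counts.get(tag, 0) + 1
    return counts

sample = [['Adventure;Action & Adventure'], ['Adventure'],
          ['Puzzle;Action & Adventure']]
print(count_tags(sample, genre_idx=0))
```

Note that an app with two tags is counted once under each tag, so the tag counts add up to more than the number of apps.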

# Conclusions
In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.

We concluded that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets. The markets are already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.