## Profitable App Profiles for App Store and Google Play Markets

In this project we are working as data analysts for a company that builds Android and iOS mobile apps that are free to download and install. The main source of revenue consists of in-app ads. This means that the number of users of our apps determines our revenue for any given app. Our aim in this project is to find mobile app profiles in the App Store and Google Play markets that are profitable in order to help our team of developers understand what type of apps are likely to attract more users.

To avoid spending resources with collecting new data ourselves, we will be using the following two data sets that seem suitable for our purpose:

A [data set]('https://www.kaggle.com/lava18/google-play-store-apps') containing data about approximately ten thousand Android apps from Google Play. You can download the data set directly from this [link]('https://dq-content.s3.amazonaws.com/350/googleplaystore.csv').
A [data set]('https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps') containing data about approximately seven thousand iOS apps from the App Store. You can download the data set directly from this [link]('https://dq-content.s3.amazonaws.com/350/AppleStore.csv').

Let's start by opening the two data sets and then continue with exploring the data.

In [1]:
import os
current_dir = os.getcwd()
android_file_path = current_dir + '\\datasets\\googleplaystore.csv'
ios_file_path = current_dir + '\\datasets\\AppleStore.csv'

def list_data(file_path, header=True, encoding_type='utf8'):
    from csv import reader
    opened_file = open(file_path, encoding=encoding_type)
    read_file = reader(opened_file)
    apps_data = list(read_file)
    
    if header == True:
        return apps_data[0], apps_data[1:]
    else:
        return apps_data[1:]

android_header, android_data = list_data(android_file_path)
ios_header, ios_data = list_data(ios_file_path)

#sort dataset by name
from operator import itemgetter
android_clean = sorted(android_data, key=itemgetter(0))
ios_clean = sorted(ios_data, key=itemgetter(1))

#row number headers
android_header.insert(0,"row_num")
ios_header.insert(0,"row_num")

#row numbers for datasets
i = 1
for app in android_clean:
    app.insert(0, i)
    i += 1

i = 1
for app in ios_clean:
    app.insert(0, i)
    i += 1

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

#print(android_header)
#print('\n')
#explore_data(android_data, 0, 3, True)
#print('\n')

#print(ios_header)
#print('\n')
#explore_data(ios_data, 0, 3, True)
#print('\n')

## Deleting Wrong Data

Before we analyze the data, let's clean it up a bit. First we want to remove any rows that have missing/incomplete data.

Below we will loop through both datasets, find any rows with missing columns, and delete those rows from the datasets:

In [3]:
android_error_rows = []
for row in android_data:
    if len(row) != len(android_header):
        #print(str(row) + "\n" + "Index postion is:", android_data.index(row))
        android_error_rows.append(android_data.index(row))

for row_index in android_error_rows:
    #print("Row to delete: " + str(row_index))
    del android_data[row_index]

ios_error_rows = []
for row in ios_data:
    if len(row) != len(ios_header):
        #print(str(row) + "\n" + "Index postion is:", ios_data.index(row))
        ios_error_rows.append(ios_data.index(row))
        
for row_index in ios_error_rows:
    #print("Row to delete: " + str(row_index))
    del ios_data[row_index]

## Removing Duplicate Entries

Next in our effort to clean up the data, we will want to make sure there is only one row per app. There are no duplicate entries in the Apple Store dataset, so we will only be working with the Google Play dataset in the following duplicate removal code.

Below we will loop through the Google Play dataset and determine which apps have duplicate entries. First we will remove any rows that repeat exactly, with the same data in every column:

In [4]:
def delete_data(dataset, rows_to_delete, index_pos=0):
    indexes_to_delete = []
    for row in dataset:
        if row[index_pos] in rows_to_delete:
            indexes_to_delete.append(dataset.index(row))
    
    for i in reversed(indexes_to_delete):
        del dataset[i]
    
    return dataset

#sort names into unique or duplicate lists
def check_for_dup_app_names(dataset, name_index):
    duplicate_apps = []
    unique_apps = []
    for app in dataset:
        name = app[name_index]
        if name in unique_apps:
            if name not in duplicate_apps:
                duplicate_apps.append(name)
        else:
            unique_apps.append(name)
    
    return duplicate_apps, unique_apps

#build dataset of duplicate rows
def duplicate_rows_dataset(dataset, dup_list, name_index):
    duplicate_rows = []
    for app in dataset:
        name = app[name_index]
        if name in dup_list:
            duplicate_rows.append(app)
    
    return duplicate_rows

#sort names into unique or duplicate lists
android_duplicate_apps, android_unique_apps = check_for_dup_app_names(android_clean, 1)

#build dataset of duplicate rows
android_duplicate_rows = duplicate_rows_dataset(android_clean, android_duplicate_apps, 1)

#get list of row numbers to delete for fully duplicate rows
rows_to_delete = []
current_app = []
for app in android_duplicate_rows:
    if not current_app:
        current_app = app
    else:
        if current_app[0] != app[0] and current_app[1] == app[1]:
            if current_app[2:] == app[2:]:
                rows_to_delete.append(app[0])
        else:
            current_app = app

#delete duplicate rows from main dataset
android_clean = delete_data(android_clean, rows_to_delete)

Next we will find any records that repeat except for the number of ratings. We will keep the row with the highest number of ratings and delete the rest:

In [5]:
#see if any other types of duplicates exist
android_duplicate_apps.clear()
android_unique_apps.clear()
android_duplicate_apps, android_unique_apps = check_for_dup_app_names(android_clean, 1)

#build new list of duplicate rows
android_duplicate_rows.clear()
android_duplicate_rows = duplicate_rows_dataset(android_clean, android_duplicate_apps, 1)

#create dictionary of max review count per app
reviews_max = {}
for app in android_duplicate_rows:
    row_num = app[0]
    name = app[1]
    n_reviews = float(app[4])
    if name in reviews_max:
        if reviews_max[name][1] < n_reviews:
            reviews_max.update({name: [row_num, n_reviews]})
    else:
        reviews_max[name] = [row_num, n_reviews]

#get row numbers of duplicate apps where review count is less than max
rows_to_delete = []
for app in android_duplicate_rows:
    row_num = app[0]
    name = app[1]
    n_reviews = float(app[4])
    if name in reviews_max:
        if row_num != reviews_max[name][0]:
            rows_to_delete.append(row_num)

#delete duplicate rows from main dataset
android_clean = delete_data(android_clean, rows_to_delete)

Finally, we will double check to make sure there are no other duplicate entries:

In [6]:
#see if any other types of duplicates exist
android_duplicate_apps.clear()
android_unique_apps.clear()
android_duplicate_apps, android_unique_apps = check_for_dup_app_names(android_clean, 1)

if len(android_duplicate_apps) > 0:
    print(len(android_duplicate_apps))
    for name in android_duplicate_apps:
        print(name)

## Removing Non-English Apps

Our company only deals with English apps, so next in our cleaning efforts we will want to remove any apps with non-English names from our datasets.

Below we will loop through Google Play and Apple Store datasets, parse the app name, determine if it is in English, and remove the row from the dataset if it is not:

In [7]:
def is_english(a_string):
    ie_bool = True
    i = 0
    for a_char in a_string:
        if ord(a_char) > 127:
            i += 1
        if i == 3:
            ie_bool = False
            break
    
    return ie_bool

###Google Play

#check if any app names are not in English
rows_to_delete.clear()
for app in android_clean:
    row_num = app[0]
    name = app[1]
    if not is_english(name):
        rows_to_delete.append(row_num)

#delete duplicate rows from main dataset
android_clean = delete_data(android_clean, rows_to_delete)

###Apple Store

#check if any app names are not in English
rows_to_delete.clear()
for app in ios_clean:
    row_num = app[0]
    name = app[2]
    if not is_english(name):
        rows_to_delete.append(row_num)

#delete duplicate rows from main dataset
ios_clean = delete_data(ios_clean, rows_to_delete)

## Removing Paid Apps

Our company also only deals with free apps, so we wil want to remove any paid apps from the datasets.

Below we will loop through Google Play and Apple Store datasets, determine whether or not the app is free, and remove it from the dataset if it is not:

In [8]:
def find_paid_apps_by_price(dataset, row_index, price_index):
    rows_to_del = []
    for app in dataset:
        row_num = app[row_index]
        price = float(app[price_index])
        if price > 0:
            rows_to_del.append(row_num)
    
    return delete_data(dataset, rows_to_del)

def find_paid_apps_by_type(dataset, row_index, type_index):
    rows_to_del = []
    for app in dataset:
        row_num = app[row_index]
        p_or_f = app[type_index]
        if p_or_f == "Paid":
            rows_to_del.append(row_num)
    
    return delete_data(dataset, rows_to_del)

android_clean = find_paid_apps_by_type(android_clean, 0, 7)
ios_clean = find_paid_apps_by_price(ios_clean, 0, 5)

#original
#10841 - android
#7197 - ios

#print(len(android_clean))
#print(len(ios_clean))

#cleaned
#8849 - android
#3203 - ios

## Most Common Apps by Genre

As we mentioned in the introduction, our aim is to determine the kinds of apps that are likely to attract more users because our revenue is highly influenced by the number of people using our apps.

To minimize risks and overhead, our validation strategy for an app idea is comprised of three steps:

Build a minimal Android version of the app, and add it to Google Play.
If the app has a good response from users, we then develop it further.
If the app is profitable after six months, we also build an iOS version of the app and add it to the App Store.
Because our end goal is to add the app on both the App Store and Google Play, we need to find app profiles that are successful on both markets. For instance, a profile that might work well for both markets might be a productivity app that makes use of gamification.

Let's begin the analysis by getting a sense of the most common genres for each market. For this, we'll build a frequency table for the `prime_genre` column of the App Store data set, and the `Genres` and `Category` columns of the Google Play data set.

We'll build two functions we can use to analyze the frequency tables:

- One function to generate frequency tables that show percentages
- Another function that we can use to display the percentages in a descending order

In [9]:
def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage 
    
    return table_percentages

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

We start by examining the frequency table for the `prime_genre` column of the App Store data set:

In [10]:
display_table(ios_clean, -5)

Games : 58.25788323446769
Entertainment : 7.836403371838902
Photo & Video : 4.995316890415236
Education : 3.6840462066812365
Social Networking : 3.3093974399000934
Shopping : 2.5913206369029034
Utilities : 2.466437714642523
Sports : 2.1542304089915705
Music : 2.0605682172962845
Health & Fitness : 2.0293474867311896
Productivity : 1.7483609116453322
Lifestyle : 1.5610365282547611
News : 1.3424914142990947
Travel : 1.248829222603809
Finance : 1.0927255697783327
Weather : 0.8741804558226661
Food & Drink : 0.8117389946924758
Reference : 0.5307524196066188
Business : 0.5307524196066188
Book : 0.3746487667811427
Navigation : 0.18732438339057134
Medical : 0.18732438339057134
Catalogs : 0.1248829222603809


Among the free English apps on App Store, more than half are games. The next most common apps are entertainment, photo & video, education, social networking, and shopping. The overall impression based on this data is that App Store is dominated by apps that are meant to be fun rather than useful.

Let's continue by examining the `Genres` and `Category` columns of the Google Play data set (two columns which seem to be related):

In [11]:
display_table(android_clean, 2) #Category

FAMILY : 18.939993219572834
GAME : 9.696010848683468
TOOLS : 8.452932534749689
BUSINESS : 4.599389761554978
PRODUCTIVITY : 3.8987456209741214
LIFESTYLE : 3.8874449090292686
FINANCE : 3.7066335179116283
MEDICAL : 3.5371228387388403
SPORTS : 3.3902135834557576
PERSONALIZATION : 3.3224093117866427
COMMUNICATION : 3.2320036162278223
HEALTH_AND_FITNESS : 3.085094360944739
PHOTOGRAPHY : 2.9494858176065093
NEWS_AND_MAGAZINES : 2.8025765623234262
SOCIAL : 2.6669680189851963
TRAVEL_AND_LOCAL : 2.339247372584473
SHOPPING : 2.2488416770256525
BOOKS_AND_REFERENCE : 2.1358345575771276
DATING : 1.8646174709006667
VIDEO_PLAYERS : 1.7968131992315517
MAPS_AND_NAVIGATION : 1.3899875692168606
FOOD_AND_DRINK : 1.2430783139337778
EDUCATION : 1.16397333031981
ENTERTAINMENT : 0.9605605153124647
LIBRARIES_AND_DEMO : 0.9379590914227597
AUTO_AND_VEHICLES : 0.9266583794779071
HOUSE_AND_HOME : 0.8023505480845292
WEATHER : 0.7910498361396768
EVENTS : 0.711944852525709
PARENTING : 0.6554412928014465
ART_AND_DESIGN 

In [12]:
display_table(android_clean, -4) #Genres

Tools : 8.441631822804837
Entertainment : 6.0797830263306585
Education : 5.356537461860097
Business : 4.599389761554978
Productivity : 3.8987456209741214
Lifestyle : 3.8761441970844164
Finance : 3.7066335179116283
Medical : 3.5371228387388403
Sports : 3.458017855124873
Personalization : 3.3224093117866427
Communication : 3.2320036162278223
Action : 3.096395072889592
Health & Fitness : 3.085094360944739
Photography : 2.9494858176065093
News & Magazines : 2.8025765623234262
Social : 2.6669680189851963
Travel & Local : 2.32794666063962
Shopping : 2.2488416770256525
Books & Reference : 2.1358345575771276
Simulation : 2.045428862018307
Dating : 1.8646174709006667
Arcade : 1.8420160470109617
Video Players & Editors : 1.7742117753418465
Casual : 1.762911063396994
Maps & Navigation : 1.3899875692168606
Food & Drink : 1.2430783139337778
Puzzle : 1.1300711944852526
Racing : 0.9944626511470221
Role Playing : 0.9379590914227597
Libraries & Demo : 0.9379590914227597
Auto & Vehicles : 0.926658379477

The Google Play landscape seems to be a bit different than the App Store. The perentages aren't quite so disparate between fun and practical apps, though family and games are still more prevalent. When breaking it down by genre rather than category, however, tools seems to edge out entertainment by over 2%, with education, business, and productivity following close behind.

## Most Popular Apps by Genre on the App Store

One way to find out what genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play data set, we can find this information in the `Installs` column, but for the App Store data set this information is missing. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the `rating_count_tot` app.

Below, we calculate the average number of user ratings per app genre on the App Store:

In [13]:
def get_ratings(genre_dict, dataset, genre_index, ratings_index):
    genre_ratings = {}
    for genre in genre_dict:
        total = 0
        len_genre = 0
        for app in dataset:
            genre_app = app[genre_index]
            if genre_app == genre:
                n_ratings = float(app[ratings_index])
                total += n_ratings
                len_genre += 1
        avg_n_ratings = total / len_genre
        genre_ratings[genre] = avg_n_ratings
    
    return genre_ratings

def display_ratings(genre_ratings):
    table_display = []
    for key in genre_ratings:
        key_val_as_tuple = (genre_ratings[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
        
genres_ios = freq_table(ios_clean, -5)
display_ratings(get_ratings(genres_ios, ios_clean, -5, 6))

Navigation : 86090.33333333333
Reference : 79350.4705882353
Social Networking : 71548.34905660378
Music : 57326.530303030304
Weather : 52279.892857142855
Book : 46384.916666666664
Food & Drink : 33333.92307692308
Finance : 32367.02857142857
Photo & Video : 28441.54375
Travel : 28243.8
Shopping : 27230.734939759037
Health & Fitness : 23298.015384615384
Sports : 23008.898550724636
Games : 22886.36709539121
News : 21248.023255813954
Productivity : 21028.410714285714
Utilities : 19156.493670886077
Lifestyle : 16815.48
Entertainment : 14195.358565737051
Business : 7491.117647058823
Education : 7003.983050847458
Catalogs : 4004.0
Medical : 612.0


In [14]:
def sort_and_print_genre_details(dataset, genre, genre_index, name_index, ratings_index, ratings_range=[]):
    genre_ratings = {}
    for app in dataset:
        name = app[name_index]
        ratings = app[ratings_index]
        ratings_range_str = ""
        if len(ratings_range) > 0:
            ratings_range_str = "("
            for rating in ratings_range:
                if ratings_range_str == "(":
                    ratings_range_str = ratings_range_str + "app[" + str(ratings_index) + "] == '" + rating + "'"
                else:
                    ratings_range_str = ratings_range_str + " or app[" + str(ratings_index) + "] == '" + rating + "'"
            ratings_range_str = ratings_range_str + ")"
            ratings_range_str = ratings_range_str.replace("ratings_str",ratings)
            ratings_range_str = "app[genre_index] == genre and " + ratings_range_str
            #print(ratings_range_str)
            #print(eval(ratings_range_str))
            if eval(ratings_range_str):
                ratings = ratings.replace('+','')
                ratings = ratings.replace(',','')
                if ratings.isnumeric():
                    genre_ratings[name] = float(ratings)
                else:
                    ratings = 0.0
        else:
            if app[genre_index] == genre:
                ratings = ratings.replace('+','')
                ratings = ratings.replace(',','')
                if ratings.isnumeric():
                    genre_ratings[name] = float(ratings)
                else:
                    ratings = 0.0
    display_ratings(genre_ratings)

sort_and_print_genre_details(ios_clean, "Navigation", -5, 2, 6)
print("-------------------------------------------------------")
sort_and_print_genre_details(ios_clean, "Social Networking", -5, 2, 6)
print("-------------------------------------------------------")
sort_and_print_genre_details(ios_clean, "Music", -5, 2, 6)

Waze - GPS Navigation, Maps & Real-time Traffic : 345046.0
Google Maps - Navigation & Transit : 154911.0
Geocaching® : 12811.0
CoPilot GPS – Car Navigation & Offline Maps : 3582.0
ImmobilienScout24: Real Estate Search in Germany : 187.0
Railway Route Search : 5.0
-------------------------------------------------------
Facebook : 2974676.0
Pinterest : 1061624.0
Skype for iPhone : 373519.0
Messenger : 351466.0
Tumblr : 334293.0
WhatsApp Messenger : 287589.0
Kik : 260965.0
ooVoo – Free Video Call, Text and Voice : 177501.0
TextNow - Unlimited Text + Calls : 164963.0
Viber Messenger – Text & Call : 164249.0
Followers - Social Analytics For Instagram : 112778.0
MeetMe - Chat and Meet New People : 97072.0
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414.0
InsTrack for Instagram - Analytics Plus More : 85535.0
Tango - Free Video Call, Voice and Chat : 75412.0
LinkedIn : 71856.0
Match™ - #1 Dating App. : 60659.0
Skype for iPad : 60163.0
POF - Best Dating App for Conversations : 52642

Navigation is the most popular genre in App Store, but this category is already dominated by big-names that nearly everyone uses. The same can be said of social networking and music apps. A new app in these genres would need to be exceptional to compete.

Also popular are food & drink (which would require an actual restaurant and/or delivery service), finance (which would require niche expertise outside the scope of our company).

The other two most popular genres are reference and book. Since the App Store market is fairly saturated with for-fun apps, something practical in the book and/or reference genres would be a good option to consider for new app development.

## Most Popular Apps by Genre on Google Play

For the Google Play market, we actually have data about the number of installs, so we should be able to get a clearer picture about genre popularity. However, the install numbers aren't particularly precise. For instance, we don't know whether an app with 100,000+ installs has 100,000 installs, 200,000, or 350,000. However, we don't need very precise data for our purposes — we only want to get an idea which app genres attract the most users.

To perform computations, we'll need to convert each install number to float. This means that we need to remove the commas and the plus characters, otherwise the conversion will fail and raise an error. We'll do this directly in the loop below, where we also compute the average number of installs for each category and genre.

In [15]:
def get_installs(genre_dict, dataset, genre_index, installs_index):
    genre_installs = {}
    for genre in genre_dict:
        total = 0
        len_genre = 0
        for app in dataset:
            genre_app = app[genre_index]
            if genre_app == genre:
                n_installs = app[installs_index]
                n_installs = n_installs.replace('+','')
                n_installs = n_installs.replace(',','')
                if n_installs.isnumeric():
                    n_installs = float(n_installs)
                else:
                    n_installs = 0.0
                total += n_installs
                len_genre += 1
        if len_genre > 0:
            avg_n_installs = total / len_genre
        else:
            avg_n_installs = 0.0
        genre_installs[genre] = avg_n_installs

    return genre_installs

def display_installs(genre_installs):
    table_display = []
    for key in genre_installs:
        key_val_as_tuple = (genre_installs[key], key)
        table_display.append(key_val_as_tuple)
    
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

cats_android = freq_table(android_clean, 2)     #Category 2
display_installs(get_installs(cats_android, android_clean, 2, 6))

COMMUNICATION : 38590581.08741259
VIDEO_PLAYERS : 24727872.452830188
SOCIAL : 23253652.127118643
PHOTOGRAPHY : 17840110.40229885
PRODUCTIVITY : 16787331.344927534
GAME : 15544014.51048951
TRAVEL_AND_LOCAL : 13984077.710144928
ENTERTAINMENT : 11640705.88235294
TOOLS : 10830251.970588235
NEWS_AND_MAGAZINES : 9549178.467741935
BOOKS_AND_REFERENCE : 8814199.78835979
SHOPPING : 7036877.311557789
PERSONALIZATION : 5201482.6122448975
WEATHER : 5145550.285714285
HEALTH_AND_FITNESS : 4188821.9853479853
MAPS_AND_NAVIGATION : 4049274.6341463416
FAMILY : 3695641.8198090694
SPORTS : 3650602.276666667
ART_AND_DESIGN : 1986335.0877192982
FOOD_AND_DRINK : 1924897.7363636363
EDUCATION : 1833495.145631068
BUSINESS : 1712290.1474201474
LIFESTYLE : 1446158.2238372094
FINANCE : 1387692.475609756
HOUSE_AND_HOME : 1360598.042253521
DATING : 854028.8303030303
COMICS : 832613.8888888889
AUTO_AND_VEHICLES : 647317.8170731707
LIBRARIES_AND_DEMO : 638503.734939759
PARENTING : 542603.6206896552
BEAUTY : 513151.886

In [16]:
sort_and_print_genre_details(android_clean, "COMMUNICATION", 2, 1, 6)
print("-------------------------------------------------------")
sort_and_print_genre_details(android_clean, "VIDEO_PLAYERS", 2, 1, 6)
print("-------------------------------------------------------")
sort_and_print_genre_details(android_clean, "SOCIAL", 2, 1, 6)
print("-------------------------------------------------------")
sort_and_print_genre_details(android_clean, "PHOTOGRAPHY", 2, 1, 6)
print("-------------------------------------------------------")
sort_and_print_genre_details(android_clean, "PRODUCTIVITY", 2, 1, 6)

WhatsApp Messenger : 1000000000.0
Skype - free IM & video calls : 1000000000.0
Messenger – Text and Video Chat for Free : 1000000000.0
Hangouts : 1000000000.0
Google Chrome: Fast & Secure : 1000000000.0
Gmail : 1000000000.0
imo free video calls and chat : 500000000.0
Viber Messenger : 500000000.0
UC Browser - Fast Download Private & Secure : 500000000.0
LINE: Free Calls & Messages : 500000000.0
Google Duo - High Quality Video Calls : 500000000.0
imo beta free calls and text : 100000000.0
Yahoo Mail – Stay Organized : 100000000.0
Who : 100000000.0
WeChat : 100000000.0
UC Browser Mini -Tiny Fast Private & Secure : 100000000.0
Truecaller: Caller ID, SMS spam blocking & Dialer : 100000000.0
Telegram : 100000000.0
Opera Mini - fast web browser : 100000000.0
Opera Browser: Fast and Secure : 100000000.0
Messenger Lite: Free Calls & Messages : 100000000.0
Kik : 100000000.0
KakaoTalk: Free Calls & Text : 100000000.0
GO SMS Pro - Messenger, Free Themes, Emoji : 100000000.0
Firefox Browser fast &

Communication is the most popular genre in Google Play Store, but this category is already dominated by big-names that nearly everyone uses. The same can be said of video player and social apps. A new app in these genres would need to be exceptional to compete.

The game genre seems pretty popular, but previously we found out this part of the market seems a bit saturated, so we'd like to come up with a different app recommendation if possible.

The books and reference genre looks fairly popular as we11. It's interesting to explore this in more depth, since we found this genre has some potential to work well on the App Store, and our aim is to recommend an app genre that shows potential for being profitable on both the App Store and Google Play.

Let's take a look at some of the apps from this genre and their number of installs:

In [17]:
sort_and_print_genre_details(android_clean, "BOOKS_AND_REFERENCE", 2, 1, 6)

Google Play Books : 1000000000.0
Wattpad 📖 Free Books : 100000000.0
Bible : 100000000.0
Audiobooks from Audible : 100000000.0
Amazon Kindle : 100000000.0
Wikipedia : 10000000.0
Spanish English Translator : 10000000.0
Quran for Android : 10000000.0
Oxford Dictionary of English : Free : 10000000.0
NOOK: Read eBooks & Magazines : 10000000.0
Moon+ Reader : 10000000.0
JW Library : 10000000.0
HTC Help : 10000000.0
FBReader: Favorite Book Reader : 10000000.0
English Hindi Dictionary : 10000000.0
English Dictionary - Offline : 10000000.0
Dictionary.com: Find Definitions for English Words : 10000000.0
Dictionary - Merriam-Webster : 10000000.0
Dictionary : 10000000.0
Cool Reader : 10000000.0
Aldiko Book Reader : 10000000.0
Al-Quran (Free) : 10000000.0
Al'Quran Bahasa Indonesia : 10000000.0
Al Quran Indonesia : 10000000.0
Read books online : 5000000.0
English to Hindi Dictionary : 5000000.0
Ebook Reader : 5000000.0
Dictionary - WordWeb : 5000000.0
Bible KJV : 5000000.0
Ancestry : 5000000.0
AlRead

The book and reference genre includes a variety of apps: software for processing and reading ebooks, various collections of libraries, dictionaries, tutorials on programming or languages, etc. It seems there's still a small number of extremely popular apps that skew the average.

However, it looks like there are only a few very popular apps, so this market still shows potential. Let's try to get some app ideas based on the kind of apps that are somewhere in the middle in terms of popularity (between 1,000,000 and 100,000,000 downloads):

In [18]:
sort_and_print_genre_details(android_clean, "BOOKS_AND_REFERENCE", 2, 1, 6, ["1,000,000+","5,000,000+","10,000,000+","50,000,000+"])

Wikipedia : 10000000.0
Spanish English Translator : 10000000.0
Quran for Android : 10000000.0
Oxford Dictionary of English : Free : 10000000.0
NOOK: Read eBooks & Magazines : 10000000.0
Moon+ Reader : 10000000.0
JW Library : 10000000.0
HTC Help : 10000000.0
FBReader: Favorite Book Reader : 10000000.0
English Hindi Dictionary : 10000000.0
English Dictionary - Offline : 10000000.0
Dictionary.com: Find Definitions for English Words : 10000000.0
Dictionary - Merriam-Webster : 10000000.0
Dictionary : 10000000.0
Cool Reader : 10000000.0
Aldiko Book Reader : 10000000.0
Al-Quran (Free) : 10000000.0
Al'Quran Bahasa Indonesia : 10000000.0
Al Quran Indonesia : 10000000.0
Read books online : 5000000.0
English to Hindi Dictionary : 5000000.0
Ebook Reader : 5000000.0
Dictionary - WordWeb : 5000000.0
Bible KJV : 5000000.0
Ancestry : 5000000.0
AlReader -any text book reader : 5000000.0
Al Quran : EAlim - Translations & MP3 Offline : 5000000.0
50000 Free eBooks & Free AudioBooks : 5000000.0
wikiHow: ho

This genre seems to be largely dictionaries and apps for reading ebooks. There are also several apps regarding the Quran. Though there are not many listed in this popularity range, there are likely quite a few other apps revolving around religious texts. It seems that taking a popular book and turning it into an app could be profitable for both the Google Play and the App Store markets.

In [19]:
#genres_android = freq_table(android_clean, -4)  #Genres -4
#display_installs(get_installs(genres_android, android_clean, -4, 6))