# Analyzing App Market Trends to Maximize User Engagement and Revenue

We're working as data analysts for a company that builds Android and iOS mobile apps, and our job is to enable our team of developers to make data-driven decisions with respect to the kind of apps they build.

At our company, we only build apps that are free to download and install, and our main source of revenue consists of in-app ads. This means that our revenue for any given app is mostly influenced by the number of users that use our app. Our goal for this project is to analyze data to help our developers understand what kinds of apps are likely to attract more users.

## Opening and Exploring the Data

Collecting data for over 4 million apps requires significant time and money, so we'll analyze a sample instead. To avoid spending resources on collecting new data, we'll use existing datasets. Fortunately, we have two suitable datasets:

A dataset of approximately 10,000 Android apps from Google Play, collected in August 2018. 
Link: https://dq-content.s3.amazonaws.com/350/googleplaystore.csv

A dataset of approximately 7,000 iOS apps from the App Store, collected in July 2017. 
Link: https://dq-content.s3.amazonaws.com/350/AppleStore.csv

We'll start by opening and exploring these two data sets.

In [1]:
from csv import reader

# Opening the App Store dataset
opened_file = open('AppleStore.csv') 
read_file = reader(opened_file) 
ios = list(read_file)
ios_header = ios[0]
ios = ios[1:]


# Opening the Google Play dataset
opened_file = open('googleplaystore.csv') 
read_file = reader(opened_file) 
android = list(read_file)
android_header = android[0]
android = android[1:]

To make these two data sets easier to explore, we created a function named explore_data() that we can repeatedly use to print rows in a readable way.

In [2]:
# Create a function named explore_data()
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

# Explore the first 5 rows of the android dataset 
print(android_header) # First row (header)
print('\n')
explore_data(android, 0, 5, True) # From index 0 to 5, print the number of rows and columns in the android dataset. 

# Explore the first 5 rows of the ios dataset 
print(ios_header) # First row (header)
print('\n')
explore_data(ios, 0, 5, True) # From index 0 to 5, print the number of rows and columns in the ios dataset. 


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Eve

Our Google Play dataset has 10841 apps and 13 columns. The columns that could help us with our analysis are 'App', 'Category', 'Rating', 'Reviews', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres'.

Our App Store dataset has 7197 apps and 16 columns. The columns that could help us with our analysis are 'track_name', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'prime_genre'.

## Deleting Wrong Data 
The Google Play dataset has a dedicated discussion section, and we can see that one of the discussions describes an error for wrong rating for entry 10472. This entry has missing 'Rating' and a column shift happened for next columns.

Link for discussion section: https://www.kaggle.com/datasets/lava18/google-play-store-apps/discussion/66015

Let's determine the index of the row and print the row at that index to check if it's incorrect. We can compare this row with the header and another row that is correct.

In [3]:
# Check if the index of the row (entry 10472) is incorrect by comparing this row with the header and another row that is correct.
print(android_header) 
print('\n')
print(android[10472])
print('\n')
print(android[5])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


The entry (10472) is not missing 'Rating', instead, it's missing 'Category', and 'Genres' (9th column) is empty. We will have to delete this row.

In [4]:
# Remove entry 10472 from Google Play dataset.
print(len(android))
del android[10472] # Don't run the del statement more than once, otherwise more than one row will be deleted.
print(len(android))

10841
10840


## Removing Duplicate Entries: Part 1 
If we explore the Google Play data set long enough or look at the discussions section, we'll notice some apps have duplicate entries. For instance, the application Instagram has 4 entries:

In [5]:
for app in android: 
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


Let's calculate how many duplicate apps are there in total.

In [6]:
duplicate_apps = []
unique_apps = []

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print('Number of duplicate apps:', len(duplicate_apps))

Number of duplicate apps: 1181


In total, there are 1,181 cases where an app appears more than once.

However, we will not remove the duplicates randomly. If we examine the rows we printed for the Instagram app, the main difference happens on the 4th position of each row, which corresponds to the number of reviews. The different numbers show the data was collected at different times.

We can use this information to build a criterion for removing the duplicates. The higher the number of reviews, the more recent the data should be. Rather than removing duplicates randomly, we'll only keep the row with the highest number of reviews and remove the other entries for any given app.

## Removing Duplicate Entries: Part 2 


Add conditions to check if the review count contains 'M' or 'K' and converts these strings into their appropriate numerical values by removing the letters and multiplying by the appropriate factor (1,000,000 for 'M' and 1,000 for 'K').

In [7]:
def convert_to_float(n_reviews_str):
    if 'M' in n_reviews_str:
        return float(n_reviews_str.replace('M', '')) * 1e6
    elif 'K' in n_reviews_str:
        return float(n_reviews_str.replace('K', '')) * 1e3
    else:
        return float(n_reviews_str)

In our previous step, we converted strings like 'M' or 'K' into their appropriate numerical values, now we can create a dictionary where each key is a unique app name and the corresponding dictionary value is the highest number of reviews of that app.

In [8]:
# Create a dictionary where each key is a unique app name and the corresponding dictionary value is the highest number of reviews of that app.
reviews_max = {}
for app in android:
    name = app[0]
    n_reviews = app[3]
    
    if name in reviews_max and reviews_max[name] < n_reviews: 
        reviews_max[name] = n_reviews
        
    elif name not in reviews_max: # Don't use an else clause here, or the number of reviews will be incorrectly updated whenever reviews_max[name] < n_reviews evaluates to False.
        reviews_max[name] = n_reviews

In total, there are 1,181 cases where an app appears more than once. Therefore, the length of our dictionary (of unique apps) should be equal to the difference between the length of our data set and 1,181.

In [9]:
print('Expected length:', len(android) - 1181)
print('Actual length:', len(reviews_max))

Expected length: 9659
Actual length: 9659


In [10]:
# Use the dictionary we created above to remove the duplicate rows.
android_clean = [] # Store cleaned data set
already_added = [] # Store app names

for app in android:
    name = app[0]
    n_reviews = app[3]
    if (reviews_max[name]== n_reviews) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)

# Separate the header and the rest of the data in the android_clean dataset.
android_clean_header = android_clean[0]
android_clean = android_clean[1:]

Let's explore the android_clean dataset using the explore_data() function to ensure everything went as expected. The dataset should have 9,659 rows. 

In [11]:
explore_data(android_clean, 0, 3, True)

['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


Number of rows: 9658
Number of columns: 13


In the previous steps, we managed to remove the duplicate app entries in the Google Play dataset. We don't need to do the same for the App Store data because there are no duplicates: 

In [12]:
duplicate_apps = []
unique_apps = []

for app in ios:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print('Number of duplicate apps:', len(duplicate_apps))

Number of duplicate apps: 0


## Removing Non-English Apps: Part One
We use English for the apps we develop at our company, and we'd like to analyze only the apps that are designed for an English-speaking audience. 

However, if we explore the data long enough, we'll find that both datasets have apps with names that suggest they are not designed for an English-speaking audience. Below, we see a couple of examples from both data sets:

In [13]:
print(ios[813][1])
print(ios[6731][1])
print('\n')
print(android_clean[4412][0])
print(android_clean[7940][0])

爱奇艺PPS -《欢乐颂2》电视剧热播
【脱出ゲーム】絶対に最後までプレイしないで 〜謎解き＆ブロックパズル〜


AQ Coach
لعبة تقدر تربح DZ


The numbers corresponding to the characters we commonly use in an English text are all in the range 0 to 127, according to the ASCII (American Standard Code for Information Interchange) system. Based on this number range, we can build a function that detects whether a character belongs to the set of common English characters or not. 

In [14]:
def english_or_not(string):
    
    for character in string:
        if ord(character) > 127:
            return False
    
    return True

# Test
print(english_or_not('Instagram'))
print(english_or_not('Docs To Go™ Free Office Suite'))
print(english_or_not('Instachat 😜'))
print(english_or_not('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(english_or_not('脱出ゲーム】絶対に最後までプレイしないで 〜謎解き＆ブロックパズル〜'))
print('\n')
print(ord('™'))
print(ord('😜'))

True
False
False
False
False


8482
128540


## Removing Non-English Apps: Part 2
In the previous step, we wrote a function that detects non-English app names, but we saw that the function couldn't correctly identify certain English app names like 'Docs To Go™ Free Office Suite' and 'Instachat 😜'. This is because emojis and characters like ™ fall outside the ASCII range and have corresponding numbers over 127.

If we're going to use the function we've created, we'll lose useful data since many English apps will be incorrectly labeled as non-English.

To minimize the impact of data loss, we'll only remove an app if its name has more than three characters with corresponding numbers falling outside the ASCII range:

In [15]:
def english_or_not(string):
    non_ascii_count = 0
    
    for character in string:
        if ord(character) > 127:
            non_ascii_count += 1
    
    if non_ascii_count > 3:
        return False
    else:
        return True

# Testing
print(english_or_not('Docs To Go™ Free Office Suite'))
print(english_or_not('Instachat 😜'))
print(english_or_not('爱奇艺PPS -《欢乐颂2》电视剧热播'))

True
True
False


The function is still not perfect, and very few non-English apps might get past our filter, but this seems good enough at this point in our analysis — we shouldn't spend too much time on optimization at this point.

Now we'll use the new function to filter out non-English apps from both datasets:

In [16]:
# Create 2 empty lists for English apps for both data sets.
ios_english =[]
android_english =[]

# Loop through Apple Store dataset.
for app in ios:
    name = app[1]
    if english_or_not(name):
        ios_english.append(app)

# Loop through Google Play dataset.
for app in android_clean:
    name = app[0]
    if english_or_not(name):
        android_english.append(app)
        
# Explore the datasets and see how many rows we have remaining for each dataset.        
print(explore_data(ios_english,0,3,True))
print('\n')
print(explore_data(android_english,0,3,True))

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 6183
Number of columns: 16
None


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'Jun

We can see that we're left with 9614 Android apps and 6183 iOS apps.

## Isolating the Free Apps
As we mentioned in the introduction, we only build apps that are free to download and install, and our main source of revenue consists of in-app ads. Our datasets contain both free and non-free apps; we'll need to isolate only the free apps for our analysis.

In [17]:
# Create 2 empty lists for free apps
free_android_apps = []
free_ios_apps = []

# Loop through Google Play dataset to isolate the free apps in free_android_apps list.
for app in android_english:
    price = app[7]
    if price == '0':
        free_android_apps.append(app)
        
# Loop through Apple Store dataset to isolate the free apps in free_ios_apps list.
for app in ios_english:
    price = app[4]
    if price == '0.0':
        free_ios_apps.append(app)
        
# Check the length of each dataset to see how many apps we have remaining.
print(len(free_android_apps))
print(len(free_ios_apps))

8861
3222


We're left with 8864 Android apps and 3222 iOS apps, which should be enough for our analysis.

## Most Common Apps by Genre: Part one
As we mentioned in the introduction, our goal is to determine the kinds of apps that are likely to attract more users because the number of people using our apps affect our revenue.

To minimize risks and overhead, our validation strategy for an app idea has three steps:

1. Build a minimal Android version of the app, and add it to Google Play.

2. If the app has a good response from users, we develop it further.

3. If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

Because our end goal is to add the app on both Google Play and the App Store, we need to find app profiles that are successful in both markets. For instance, a profile that works well for both markets might be a productivity app that makes use of gamification.

Let's begin the analysis by determining the most common genres for each market. For this, we'll build a frequency table for the 'prime_genre' column of the App Store data set, and the 'Genres' and 'Category' columns of the Google Play data set.

## Most Common Apps by Genre: Part two 
We'll build two functions we can use to analyze the frequency tables:

1. One function to generate frequency tables that show percentages

2. Another function we can use to display the percentages in a descending order

In [18]:
# Create a function named freq_table() that takes in two inputs: dataset (which will be a list of lists) and index (which will be an integer).
def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage
        
    return table_percentages

# Create a function named display_table() to display the frequency table of the columns prime_genre, Genres, and Category
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

## Most Common Apps by Genre: Part Three
We start by examining the frequency table for the prime_genre column of the App Store data set.

In [19]:
display_table(free_ios_apps, -5)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


The most common genre is Games at 58.16%, suggesting a significant focus on entertainment.

Other entertainment-related categories include Entertainment (7.88%), Photo & Video (4.97%), Social Networking (3.29%), Music (2.05%), and Sports (2.14%).
Combined, these entertainment-focused categories account for roughly 78.49% of the apps.

Practical categories include Education (3.66%), Shopping (2.61%), Utilities (2.51%), Health & Fitness (2.02%), Productivity (1.74%), Lifestyle (1.58%), News (1.33%), Travel (1.24%), Finance (1.12%), Weather (0.87%), Food & Drink (0.81%), Reference (0.56%), Business (0.53%), Navigation (0.19%), Medical (0.19%), and Catalogs (0.12%).
These categories combined account for 21.51% of the apps.

Let's continue by examining the Genres and Category columns of the Google Play data set (two columns which seem to be related).

In [20]:
# Category 
display_table(free_android_apps, 1)

FAMILY : 18.936914569461685
GAME : 9.694165444080802
TOOLS : 8.452770567656021
BUSINESS : 4.593161042771697
LIFESTYLE : 3.9047511567543167
PRODUCTIVITY : 3.8934657487868187
FINANCE : 3.7016138133393524
MEDICAL : 3.521047285859384
SPORTS : 3.3969077982169056
PERSONALIZATION : 3.317909942444419
COMMUNICATION : 3.2389120866719328
HEALTH_AND_FITNESS : 3.080916375126961
PHOTOGRAPHY : 2.9454914795169844
NEWS_AND_MAGAZINES : 2.7987811759395105
SOCIAL : 2.663356280329534
TRAVEL_AND_LOCAL : 2.3360794492720913
SHOPPING : 2.245796185532107
BOOKS_AND_REFERENCE : 2.144227513824625
DATING : 1.8620923146371742
VIDEO_PLAYERS : 1.794379866832186
MAPS_AND_NAVIGATION : 1.3993905879697552
FOOD_AND_DRINK : 1.2413948764247829
EDUCATION : 1.1736824286197944
ENTERTAINMENT : 0.9592596772373322
LIBRARIES_AND_DEMO : 0.9366888613023362
AUTO_AND_VEHICLES : 0.9254034533348381
HOUSE_AND_HOME : 0.8238347816273558
WEATHER : 0.8012639656923597
EVENTS : 0.7109807019523756
PARENTING : 0.6545536621148855
ART_AND_DESIGN : 

The most common category on Google Play is Family, accounting for 18.94% of the apps. Upon closer examination, it becomes evident that this category predominantly comprises games for kids, indicating a strong focus on child-friendly entertainment. The Game category follows at 9.69%, further underscoring the platform's significant interest in entertainment. Other notable entertainment-related categories include Sports (3.40%), Photography (2.95%), Social (2.66%), and Entertainment (0.96%). Combined, these entertainment-focused categories represent approximately 38.64% of the apps.

In [21]:
# Genres
display_table(free_android_apps, -4)

Tools : 8.441485159688522
Entertainment : 6.071549486513937
Education : 5.349283376594064
Business : 4.593161042771697
Productivity : 3.8934657487868187
Lifestyle : 3.8934657487868187
Finance : 3.7016138133393524
Medical : 3.521047285859384
Sports : 3.464620246021894
Personalization : 3.317909942444419
Communication : 3.2389120866719328
Action : 3.103487191061957
Health & Fitness : 3.080916375126961
Photography : 2.9454914795169844
News & Magazines : 2.7987811759395105
Social : 2.663356280329534
Travel & Local : 2.3247940413045933
Shopping : 2.245796185532107
Books & Reference : 2.144227513824625
Simulation : 2.0426588421171425
Dating : 1.8620923146371742
Arcade : 1.8508069066696762
Video Players & Editors : 1.7718090508971898
Casual : 1.749238234962194
Maps & Navigation : 1.3993905879697552
Food & Drink : 1.2413948764247829
Puzzle : 1.1285407967498025
Racing : 0.9931159011398263
Role Playing : 0.9366888613023362
Libraries & Demo : 0.9366888613023362
Auto & Vehicles : 0.925403453334838

While it's difficult to tell exactly how the Genres and Category columns differ from one another, we can tell that the Genres column is significantly more detailed due to its larger number of categories. For now, we're only concerned with the broad picture, thus going forward, we'll only be working with the Category column.

## Most Popular Apps by Genre on the App Store
The frequency tables we analyzed on the previous steps showed us that apps designed for fun dominate the App Store, while Google Play shows a more balanced landscape of both practical and fun apps. Now, we'd like to determine the kind of apps with the most users.

One way to find out what genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play data set, we can find this information in the Installs column, but this information is missing for the App Store data set. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the rating_count_tot app.

Let's start with calculating the average number of user ratings per app genre on the App Store.

In [22]:
# Use the freq_table() function we wrote to generate a frequency table for the prime_genre column to get the unique app genres 
prime_genre_ios = freq_table(ios, -5)
print(prime_genre_ios)

# Loop over the unique genres of the App Store dataset.
for genre in prime_genre_ios:
    total = 0
    len_genre = 0
    
    for app in free_ios_apps:
        genre_app = app[-5]
        if genre_app == genre:            
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)

{'Social Networking': 2.3204112824788106, 'Photo & Video': 4.849242740030569, 'Games': 53.66124774211477, 'Music': 1.9174656106711132, 'Reference': 0.8892594136445742, 'Health & Fitness': 2.501042100875365, 'Weather': 1.0004168403501459, 'Utilities': 3.4458802278727245, 'Travel': 1.1254689453939142, 'Shopping': 1.6951507572599693, 'News': 1.0421008753647354, 'Navigation': 0.6391552035570377, 'Lifestyle': 2.0008336807002918, 'Entertainment': 7.433652910935113, 'Food & Drink': 0.8753647353063776, 'Sports': 1.5839933305543976, 'Book': 1.5562039738780047, 'Finance': 1.445046547172433, 'Education': 6.294289287203002, 'Productivity': 2.473252744198972, 'Business': 0.7919966652771988, 'Catalogs': 0.1389467833819647, 'Medical': 0.31957760177851885}
Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Tra

In [23]:
# Import the 'operator' module, which provides functions for common operations like sorting.
import operator

# Create a dictionary of categories and average number of user ratings
ratings = {
    "Social Networking": 71548.34905660378,
    "Photo & Video": 28441.54375,
    "Games": 22788.6696905016,
    "Music": 57326.530303030304,
    "Reference": 74942.11111111111,
    "Health & Fitness": 23298.015384615384,
    "Weather": 52279.892857142855,
    "Utilities": 18684.456790123455,
    "Travel": 28243.8,
    "Shopping": 26919.690476190477,
    "News": 21248.023255813954,
    "Navigation": 86090.33333333333,
    "Lifestyle": 16485.764705882353,
    "Entertainment": 14029.830708661417,
    "Food & Drink": 33333.92307692308,
    "Sports": 23008.898550724636,
    "Book": 39758.5,
    "Finance": 31467.944444444445,
    "Education": 7003.983050847458,
    "Productivity": 21028.410714285714,
    "Business": 7491.117647058823,
    "Catalogs": 4004.0,
    "Medical": 612.0
}

# Sort the dictionary 'ratings' in descending order, and convert the sorted list of key-value pairs back into a dictionary named 'sorted_ratings'
sorted_ratings = dict(sorted(ratings.items(), key=operator.itemgetter(1), reverse=True))
print(sorted_ratings)

{'Navigation': 86090.33333333333, 'Reference': 74942.11111111111, 'Social Networking': 71548.34905660378, 'Music': 57326.530303030304, 'Weather': 52279.892857142855, 'Book': 39758.5, 'Food & Drink': 33333.92307692308, 'Finance': 31467.944444444445, 'Photo & Video': 28441.54375, 'Travel': 28243.8, 'Shopping': 26919.690476190477, 'Health & Fitness': 23298.015384615384, 'Sports': 23008.898550724636, 'Games': 22788.6696905016, 'News': 21248.023255813954, 'Productivity': 21028.410714285714, 'Utilities': 18684.456790123455, 'Lifestyle': 16485.764705882353, 'Entertainment': 14029.830708661417, 'Business': 7491.117647058823, 'Education': 7003.983050847458, 'Catalogs': 4004.0, 'Medical': 612.0}


We can see that the top 5 most popular apps are 'Navigation','Reference', 'Social Networking', 'Music', and 'Weather'. Among all, navigation apps have the highest number of user reviews of around 86090. However, this figure is heavily influenced by Waze and Google Maps, which have close to half a million user reviews together:

In [24]:
for app in free_ios_apps:
    if app[-5] == 'Navigation':
        print(app[1], ':', app[5]) # print name and number of ratings

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


Similar trends apply to social networking apps, where a small number of dominant players—Facebook, Pinterest, Skype, etc.—have a significant impact on the average quantity. The same applies for music apps, where a small number of major companies greatly influence the average number, such as Shazam, Pandora, and Spotify.

Though social networking, music, and navigation apps may appear more popular than they actually are, our goal is to identify popular categories. A small number of apps, with hundreds of thousands of user ratings, appear to be skewing the average amount of ratings, and other apps may find it difficult to surpass the 10,000 mark. We'll save this level of information for later. By deleting these incredibly popular apps for each genre, we may obtain a better picture and subsequently adjust the averages.

Reference apps have 74,942 user ratings on average, but it's actually the Bible and Dictionary.com which skew up the average rating:

In [25]:
for app in free_ios_apps:
    if app[-5] == 'Reference':
        print(app[1], ':', app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


One potential strategy is to convert books into mobile apps, targeting classic novels, literary works, and other ﬁctional and non-ﬁctional content. Another ﬁeld for potential growth is the development of guide apps, with a focus on popular games. 

This concept aligns with the current landscape of the App Store, which is dominated by for-fun apps. This indicates that the market may be somewhat saturated with for-fun apps, which suggests that a practical app may have a greater opportunity to distinguish itself among the vast number of apps on the App Store.

In [26]:
for app in free_ios_apps:
    if app[-5] == 'Social Networking':
        print(app[1], ':', app[5])

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 2

In [27]:
for app in free_ios_apps:
    if app[-5] == 'Music':
        print(app[1], ':', app[5])

Pandora - Music & Radio : 1126879
Spotify Music : 878563
Shazam - Discover music, artists, videos & lyrics : 402925
iHeartRadio – Free Music & Radio Stations : 293228
SoundCloud - Music & Audio : 135744
Magic Piano by Smule : 131695
Smule Sing! : 119316
TuneIn Radio - MLB NBA Audiobooks Podcasts Music : 110420
Amazon Music : 106235
SoundHound Song Search & Music Player : 82602
Sonos Controller : 48905
Bandsintown Concerts : 30845
Karaoke - Sing Karaoke, Unlimited Songs! : 28606
My Mixtapez Music : 26286
Sing Karaoke Songs Unlimited with StarMaker : 26227
Ringtones for iPhone & Ringtone Maker : 25403
Musi - Unlimited Music For YouTube : 25193
AutoRap by Smule : 18202
Spinrilla - Mixtapes For Free : 15053
Napster - Top Music & Radio : 14268
edjing Mix:DJ turntable to remix and scratch music : 13580
Free Music - MP3 Streamer & Playlist Manager Pro : 13443
Free Piano app by Yokee : 13016
Google Play Music : 10118
Certified Mixtapes - Hip Hop Albums & Mixtapes : 9975
TIDAL : 7398
YouTube Mu

In [28]:
for app in free_ios_apps:
    if app[-5] == 'Weather':
        print(app[1], ':', app[5])

The Weather Channel: Forecast, Radar & Alerts : 495626
The Weather Channel App for iPad – best local forecast, radar map, and storm tracking : 208648
WeatherBug - Local Weather, Radar, Maps, Alerts : 188583
MyRadar NOAA Weather Radar Forecast : 150158
AccuWeather - Weather for Life : 144214
Yahoo Weather : 112603
Weather Underground: Custom Forecast & Local Radar : 49192
NOAA Weather Radar - Weather Forecast & HD Radar : 45696
Weather Live Free - Weather Forecast & Alerts : 35702
Storm Radar : 22792
QuakeFeed Earthquake Map, Alerts, and News : 6081
Moji Weather - Free Weather Forecast : 2333
Hurricane by American Red Cross : 1158
Forecast Bar : 375
Hurricane Tracker WESH 2 Orlando, Central Florida : 203
FEMA : 128
iWeather - World weather forecast : 80
Weather - Radar - Storm with Morecast App : 78
Yurekuru Call : 53
Weather & Radar : 37
WRAL Weather Alert : 25
Météo-France : 24
JaxReady : 22
Freddy the Frogcaster's Weather Station : 14
Almanac Long-Range Weather Forecast : 12
TodayAir

Other genres that are popular include 'Social Networking', 'Music', and 'Weather'. We have decided not to develop any apps in these categories. These genres are highly saturated, with numerous well-established players dominating the market. Without a unique selling point, it would be extremely challenging to compete effectively and attract a substantial user base.

## Most Popular Apps by Genre on Google Play

In [29]:
# Display Installs columns
display_table(free_android_apps, 5)

1,000,000+ : 15.743144114659746
100,000+ : 11.556257758717978
10,000,000+ : 10.518000225708159
10,000+ : 10.190723394650716
1,000+ : 8.396343527818532
100+ : 6.91795508407629
5,000,000+ : 6.838957228303803
500,000+ : 5.574991535944025
50,000+ : 4.773727570251665
5,000+ : 4.51416318699921
10+ : 3.5436181017943795
500+ : 3.250197494639431
50,000,000+ : 2.290937817402099
100,000,000+ : 2.1216566978896285
50+ : 1.9185193544746644
5+ : 0.7899785577248618
1+ : 0.5078433585374111
500,000,000+ : 0.27084979121995256
1,000,000,000+ : 0.22570815934996052
0+ : 0.0451416318699921
0 : 0.011285407967498025


We can see that most values are open-ended (100+, 1,000+, 5,000+, etc.). For instance, we don't know whether an app with 100,000+ installs has 100,000 installs, 200,000, or 350,000. However, we don't need very precise data for our purposes — we only want to find out which app genres attract the most users. 

We're going to leave the numbers as they are, which means that we'll consider that an app with 100,000+ installs has 100,000 installs, and an app with 1,000,000+ installs has 1,000,000 installs, and so on. 

To perform computations, however, we'll need to convert each install number from a string to a float. This means we need to remove the commas and the plus characters, or the conversion will fail and cause an error.

Let's start by generating a frequency table for the Category column of the Google Play dataset to get the unique app genres.

In [30]:
# Use the freq_table() function we wrote to generate a frequency table for the Category column to get the unique app genres 
category_android = freq_table(free_android_apps, 1)
print(category_android)

# Loop over the category of the Google Play dataset.
for category in category_android:
    total = 0
    len_category = 0
    
    for app in free_android_apps:
        category_app = app[1]
        if category_app == category:    
            n_installs = app[5]
            n_installs = n_installs.replace('+','')
            n_installs = n_installs.replace(',','')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    print(category, ':', avg_n_installs)

{'ART_AND_DESIGN': 0.6319828461798894, 'AUTO_AND_VEHICLES': 0.9254034533348381, 'BEAUTY': 0.5981266222773953, 'BOOKS_AND_REFERENCE': 2.144227513824625, 'BUSINESS': 4.593161042771697, 'COMICS': 0.6206974382123914, 'COMMUNICATION': 3.2389120866719328, 'DATING': 1.8620923146371742, 'EDUCATION': 1.1736824286197944, 'ENTERTAINMENT': 0.9592596772373322, 'EVENTS': 0.7109807019523756, 'FINANCE': 3.7016138133393524, 'FOOD_AND_DRINK': 1.2413948764247829, 'HEALTH_AND_FITNESS': 3.080916375126961, 'HOUSE_AND_HOME': 0.8238347816273558, 'LIBRARIES_AND_DEMO': 0.9366888613023362, 'LIFESTYLE': 3.9047511567543167, 'GAME': 9.694165444080802, 'FAMILY': 18.936914569461685, 'MEDICAL': 3.521047285859384, 'SOCIAL': 2.663356280329534, 'SHOPPING': 2.245796185532107, 'PHOTOGRAPHY': 2.9454914795169844, 'SPORTS': 3.3969077982169056, 'TRAVEL_AND_LOCAL': 2.3360794492720913, 'TOOLS': 8.452770567656021, 'PERSONALIZATION': 3.317909942444419, 'PRODUCTIVITY': 3.8934657487868187, 'PARENTING': 0.6545536621148855, 'WEATHER':

In [31]:
# Import the 'operator' module, which provides functions for common operations like sorting.
import operator

# Create a dictionary of categories and average number of installs
installs = {
    'ART_AND_DESIGN' : 2021626.7857142857,
    'AUTO_AND_VEHICLES' : 647317.8170731707,
    'BEAUTY' : 513151.88679245283,
    'BOOKS_AND_REFERENCE' : 8767811.894736841,
    'BUSINESS' : 1712290.1474201474,
    'COMICS' : 817657.2727272727,
    'COMMUNICATION' : 38456119.167247385,
    'DATING' : 854028.8303030303,
    'EDUCATION' : 1820673.076923077,
    'ENTERTAINMENT' : 11640705.88235294,
    'EVENTS' : 253542.22222222222,
    'FINANCE' : 1387692.475609756,
    'FOOD_AND_DRINK' : 1924897.7363636363,
    'HEALTH_AND_FITNESS' : 4188821.9853479853,
    'HOUSE_AND_HOME' : 1331540.5616438356,
    'LIBRARIES_AND_DEMO' : 638503.734939759,
    'LIFESTYLE' : 1437816.2687861272,
    'GAME' : 15560965.599534342,
    'FAMILY' : 3694276.334922527,
    'MEDICAL' : 120616.48717948717,
    'SOCIAL' : 23253652.127118643,
    'SHOPPING' : 7036877.311557789,
    'PHOTOGRAPHY' : 17805627.643678162,
    'SPORTS' : 3638640.1428571427,
    'TRAVEL_AND_LOCAL' : 13984077.710144928,
    'TOOLS' : 10682301.033377837,
    'PERSONALIZATION' : 5201482.6122448975,
    'PRODUCTIVITY' : 16787331.344927534,
    'PARENTING' : 542603.6206896552,
    'WEATHER' : 5074486.197183099,
    'VIDEO_PLAYERS' : 24727872.452830188,
    'NEWS_AND_MAGAZINES' : 9549178.467741935,
    'MAPS_AND_NAVIGATION' : 4056941.7741935486
}

# Sort the dictionary 'installs' in descending order, and convert the sorted list of key-value pairs back into a dictionary named 'sorted_installs'
sorted_installs = dict(sorted(installs.items(), key=operator.itemgetter(1), reverse=True))
print(sorted_installs)

{'COMMUNICATION': 38456119.167247385, 'VIDEO_PLAYERS': 24727872.452830188, 'SOCIAL': 23253652.127118643, 'PHOTOGRAPHY': 17805627.643678162, 'PRODUCTIVITY': 16787331.344927534, 'GAME': 15560965.599534342, 'TRAVEL_AND_LOCAL': 13984077.710144928, 'ENTERTAINMENT': 11640705.88235294, 'TOOLS': 10682301.033377837, 'NEWS_AND_MAGAZINES': 9549178.467741935, 'BOOKS_AND_REFERENCE': 8767811.894736841, 'SHOPPING': 7036877.311557789, 'PERSONALIZATION': 5201482.6122448975, 'WEATHER': 5074486.197183099, 'HEALTH_AND_FITNESS': 4188821.9853479853, 'MAPS_AND_NAVIGATION': 4056941.7741935486, 'FAMILY': 3694276.334922527, 'SPORTS': 3638640.1428571427, 'ART_AND_DESIGN': 2021626.7857142857, 'FOOD_AND_DRINK': 1924897.7363636363, 'EDUCATION': 1820673.076923077, 'BUSINESS': 1712290.1474201474, 'LIFESTYLE': 1437816.2687861272, 'FINANCE': 1387692.475609756, 'HOUSE_AND_HOME': 1331540.5616438356, 'DATING': 854028.8303030303, 'COMICS': 817657.2727272727, 'AUTO_AND_VEHICLES': 647317.8170731707, 'LIBRARIES_AND_DEMO': 638

We can see that the top 5 most popular apps are 'COMMUNICATION', 'VIDEO_PLAYERS', 'SOCIAL', 'PHOTOGRAPHY', and 'PRODUCTIVITY'. Among all, communication apps have the most installs: 38,456,119. Let's take a closer look at 'COMMUNICATION':

In [32]:
for app in free_android_apps:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

We can see the number of installs (38,456,119) is heavily skewed up by a few apps that have over one billion installs (WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail, and Hangouts), and a few others with over 100 and 500 million installs.

In [33]:
for app in free_android_apps:
    if app[1] == 'VIDEO_PLAYERS' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

YouTube : 1,000,000,000+
Motorola Gallery : 100,000,000+
VLC for Android : 100,000,000+
Google Play Movies & TV : 1,000,000,000+
MX Player : 500,000,000+
Dubsmash : 100,000,000+
VivaVideo - Video Editor & Photo Movie : 100,000,000+
VideoShow-Video Editor, Video Maker, Beauty Camera : 100,000,000+
Motorola FM Radio : 100,000,000+


We see the same pattern for the 'VIDEO_PLAYERS' category, the market is dominated by apps like Youtube, Google Play Movies & TV. 

The pattern is repeated for social apps (where we have giants like Facebook, Instagram, Google+, etc.), photography apps (Google Photos, PicsArt Photo Studio, LINE Camera, etc.), or productivity apps (Microsoft Word, Dropbox, Google Calendar, Evernote, etc.):

In [34]:
for app in free_android_apps:
    if app[1] == 'SOCIAL' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

Facebook : 1,000,000,000+
Facebook Lite : 500,000,000+
Tumblr : 100,000,000+
Pinterest : 100,000,000+
Google+ : 1,000,000,000+
Badoo - Free Chat & Dating App : 100,000,000+
Tango - Live Video Broadcast : 100,000,000+
Instagram : 1,000,000,000+
Snapchat : 500,000,000+
LinkedIn : 100,000,000+
Tik Tok - including musical.ly : 100,000,000+
BIGO LIVE - Live Stream : 100,000,000+
VK : 100,000,000+


In [35]:
for app in free_android_apps:
    if app[1] == 'PHOTOGRAPHY' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

B612 - Beauty & Filter Camera : 100,000,000+
YouCam Makeup - Magic Selfie Makeovers : 100,000,000+
Sweet Selfie - selfie camera, beauty cam, photo edit : 100,000,000+
Google Photos : 1,000,000,000+
Retrica : 100,000,000+
Photo Editor Pro : 100,000,000+
BeautyPlus - Easy Photo Editor & Selfie Camera : 100,000,000+
PicsArt Photo Studio: Collage Maker & Pic Editor : 100,000,000+
Photo Collage Editor : 100,000,000+
Z Camera - Photo Editor, Beauty Selfie, Collage : 100,000,000+
PhotoGrid: Video & Pic Collage Maker, Photo Editor : 100,000,000+
Candy Camera - selfie, beauty camera, photo editor : 100,000,000+
YouCam Perfect - Selfie Photo Editor : 100,000,000+
Camera360: Selfie Photo Editor with Funny Sticker : 100,000,000+
S Photo Editor - Collage Maker , Photo Collage : 100,000,000+
AR effect : 100,000,000+
Cymera Camera- Photo Editor, Filter,Collage,Layout : 100,000,000+
LINE Camera - Photo editor : 100,000,000+
Photo Editor Collage Maker Pro : 100,000,000+


In [36]:
for app in free_android_apps:
    if app[1] == 'PRODUCTIVITY' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

Microsoft Word : 500,000,000+
Microsoft Outlook : 100,000,000+
Microsoft OneDrive : 100,000,000+
Microsoft OneNote : 100,000,000+
Google Keep : 100,000,000+
ES File Explorer File Manager : 100,000,000+
Dropbox : 500,000,000+
Google Docs : 100,000,000+
Microsoft PowerPoint : 100,000,000+
Samsung Notes : 100,000,000+
SwiftKey Keyboard : 100,000,000+
Google Drive : 1,000,000,000+
Adobe Acrobat Reader : 100,000,000+
Google Sheets : 100,000,000+
Microsoft Excel : 100,000,000+
WPS Office - Word, Docs, PDF, Note, Slide & Sheet : 100,000,000+
Google Slides : 100,000,000+
ColorNote Notepad Notes : 100,000,000+
Evernote – Organizer, Planner for Notes & Memos : 100,000,000+
Google Calendar : 500,000,000+
Cloud Print : 500,000,000+
CamScanner - Phone PDF Creator : 100,000,000+


The biggest concern is that these app genres may appear to be more popular than they actually are. Moreover, it appears that a small number of powerful rivals dominate these areas.

The books and reference genre also appears to be a popular choice, with an average number of installations of 8,767,811. We believe there is potential for this genre to be successful on the App Store, and our objective is to recommend an app genre that has the potential to be profitable on both the App Store and Google Play. 

We can take a look at some of the apps from this genre and their number of installs:

In [37]:
for app in free_android_apps:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E

In [38]:
for app in free_android_apps:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000,000+'
                                            or app[5] == '500,000,000+'
                                            or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad 📖 Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+


We can see that there's still a small number of extremely popular apps like Google Play Books, Amazon Kindle, that skew the average. However, it looks like there are only a few very popular apps, so this market still shows potential. 

Let's try to get some app ideas based on the kind of apps that are somewhere in the middle in terms of popularity (between 5,000,000 and 100,000,000 downloads).

In [39]:
for app in free_android_apps:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '5,000,000+'
                                            or app[5] == '10,000,000+'
                                            or app[5] == '50,000,000+'):
        print(app[0], ':', app[5])

Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
FBReader: Favorite Book Reader : 10,000,000+
AlReader -any text book reader : 5,000,000+
Ebook Reader : 5,000,000+
Read books online : 5,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
Moon+ Reader : 10,000,000+
Aldiko Book Reader : 10,000,000+
Dictionary - WordWeb : 5,000,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Al-Quran (Free) : 10,000,000+
Al Quran Indonesia : 10,000,000+
Al'Quran Bahasa Indonesia : 10,000,000+
Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+
Quran for Android : 10,000,000+
Dictionary.com: Find Definitions for English Words : 10,000,000+
English Dictionary - Offline : 10,000,000+
Bible KJV : 5,000,000+
NOOK: Read eBooks & Magazines : 10,000,000+
Dictionary : 10,000,000+
Spanish English Translator : 10,000,000+
Dictionary - Merriam-Webster : 10,000,000+
JW Library : 10,000,000+
Oxford Dictionary of English : Free : 10,000,000+
English Hindi Dictionary : 10,000,000+
English to Hindi Diction

The market for this niche is dominated by softwares for processing and reading e-books, as well as various collections of libraries and dictionaries. Given the intense competition in this space, it is unlikely that developing similar apps would be a viable business strategy.

We also notice there are quite a few apps built around the book Quran, which suggests that building an app around a popular book can be profitable. It seems that taking a popular book and turning it into an app could be profitable for both the Google Play and the App Store markets.

However, it looks like the market is already full of libraries, so we need to add some special features besides the raw version of the book.

# Conclusions
In this project, we aimed to find mobile app profiles that are profitable for the App Store and Google Play markets.

We came to the conclusion that creating an app based on a well-known book—classic novels, literary works, and other fictional and non-fictional content—might be lucrative for both the Google Play and App Store marketplaces. We need to provide some additional features in addition to the book's raw version because the markets are already crowded with libraries. Daily quotes from the book, an audio version of the book, a discussion forum, and so on might all fall under this category.