# App popularity project

what the project is about

The aim is to help our developers understand what type of apps are likely to attract more users on Google Play and the App Store

In [1]:
from csv import reader

In [2]:
# quick way to explore the dataset
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns: #if the header is removed, this is True
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
        

In [3]:
apple = (list(reader(open('AppleStore.csv'))))
android = (list(reader(open('googleplaystore.csv'))))

In [4]:
explore_data(apple,0,3, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of rows: 7198
Number of columns: 16


In [5]:
explore_data(android,0,3, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


Number of rows: 10842
Number of columns: 13


In [10]:
#remove row with faulty data. Don't want to run this command again
del android[10472]
del android[10473]

In [11]:
# check for multiple entries

# create two lists, when you encounter an app for the first time
# add it to unique apps list, if encountered again, add it to duplicate list
duplicate_apps = []
unique_apps = []
for row in android:
    name = row[0]
    if name in unique_apps:
        duplicate_apps.append(row)
    else:
        unique_apps.append(name)
        
print('Number of duplicate apps:', len(duplicate_apps))
print('\na')
print('Examples of duplicate apps:',duplicate_apps[:4])

Number of duplicate apps: 1180

a
Examples of duplicate apps: [['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up'], ['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 31, 2018', 'Varies with device', 'Varies with device'], ['Google My Business', 'BUSINESS', '4.4', '70991', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 24, 2018', '2.19.0.204537701', '4.4 and up'], ['ZOOM Cloud Meetings', 'BUSINESS', '4.4', '31614', '37M', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 20, 2018', '4.1.28165.0716', '4.0 and up']]


We can see that there are duplicate entries in this dataset,
this is because the data recordings for these apps were taken at different times. So we are going to get unique apps by only keeping the apps with the highest review, so the latest entry for each app

In [12]:

# # we are assuming that the highest number of reviews means that it
# is the latest entry and we want to keep that. We start with a dictionary
# that for each app (name), records its reviews

# So if an app name is in the dictionary as a key, check if this row's
# rating is higher than that entry's value, if it is, record the new rating 
# as the value for that key in the dictionary.

# We know we do not want anything done when there is that app (key) in 
# dictionary, but the rating there is the higher, then do not do anything
# for this row and move on. So do not write an if or elif for this since
# nothing needs to be done except move to the next loop.

reviews_max = {}

for row in android[1:]:
    name = row[0]
    n_reviews = float(row[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
        
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
        
print('Expected length:', len(android) - 1181) #1181 is the number of duplicate elements we found above
print('Actual length:', len(reviews_max))
    
    

Expected length: 9657
Actual length: 9657


In [13]:
# # use dictionary to remove dupliucates
# now that we have a dictionary with the app(name ) and its highest
# value, we are going to loop over each value in df, get the name and
# n_reviews column, then check if the name (the key in the dictionary)
# in the row matches the value in the dictionary. If it does, append
# that row to the clean list of List. 

android_clean = []
already_added = []
for row in android[1:]:
    name = row[0]
    n_reviews = float(row[3])
    if (n_reviews == reviews_max[name]) and (name not in already_added):
        android_clean.append(row)
        already_added.append(name)

explore_data(android_clean, 0,3, True )
        
        


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9657
Number of columns: 13


Now we are going to focus on only apps with english names

In [26]:

# # a function that takes a string and outputs False if there is any non-english character, otherwise it outputs True
# Go through each row name and see if there is any character in the string
# with the ord to be greater than 127, if it is, probably a foregign character
# . SO for each string, go through its character and keep a count 
# of how many non-eng character it has. Then if there are more than 3
# for a string (name). 

# First create the function that operates over one input string and 
# outputs false if there are more than 3 non_eng characters 

def eng_char(string):
    non_eng = []
    for character in string:
        num = ord(character)
        if num > 127:
            non_eng.append(num)
    if len(non_eng)>3:
        return False
    else:
        return True 
 

#testing 
print(eng_char('Docs To Go™ Free Office Suite'))
print(eng_char('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(eng_char('Instachat 😜'))



True
False
True


To minimize the impact of data loss, we'll only remove an app if its name has more than three characters with corresponding numbers falling outside the ASCII range. This means all English apps with up to three emoji or other special characters will still be labeled as English. Our filter function is still not perfect, but it should be fairly effective.

In [27]:
android_clean = []
for row in android[1:]:
    name = row[0]
    if eng_char(name) == True:
        android_clean.append(row)
        
apple_clean = []
for row in apple[1:]:
    name = row[1]
    if eng_char(name) == True:
        apple_clean.append(row)

print(len(android_clean))
print(len(apple_clean))

10796
6183


As we mentioned in the introduction, we only build apps that are free to download and install, and our main source of revenue consists of in-app ads. Our data sets contain both free and non-free apps; we'll need to isolate only the free apps for our analysis.

Isolating the free apps will be our last step in the data cleaning process.

In [28]:
android_final=[]
for row in android_clean:
    price = row[7]
    if price == '0':
        android_final.append(row)
        
ios_final=[]
for row in apple_clean:
    price = row[4]
    if price == '0.0':
        ios_final.append(row)
        
        
print(len(android_final))
print(len(ios_final))

9999
3222


So far, we have:

Removed inaccurate data
Removed duplicate app entries
Removed non-English apps
Isolated the free apps

Our aim is to determine the kinds of apps that are likely to attract more users because our revenue is highly influenced by the number of people using our apps.

To minimize risks and overhead, our validation strategy for an app idea is comprised of three steps:

Build a minimal Android version of the app, and add it to Google Play.
If the app has a good response from users, we develop it further.
If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

Because our end goal is to add the app on both Google Play and the App Store, we need to find app profiles that are successful on both markets. For instance, a profile that works well for both markets might be a productivity app that makes use of gamification.

**Let's begin the analysis by getting a sense of what are the most common genres for each market. For this, we'll need to build frequency tables for a few columns in our data sets.**

In [29]:
explore_data(android_final,0,3)


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']




In [30]:
explore_data(ios_final,0,3)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']




We'll build two functions we can use to analyze the frequency tables:

One function to generate frequency tables that show percentages
Another function we can use to display the percentages in a descending order


we'll need to make use of the built-in sorted() function. This function takes in an iterable data type (like a list, dictionary, tuple, etc.), and returns a list of the elements of that iterable sorted in ascending or descending order

The sorted() function doesn't work too well with dictionaries because it only considers and returns the dictionary keys.



**the sorted() function works well if we transform the dictionary into a list of tuples, where each tuple contains a dictionary key along with its corresponding dictionary value.**

The display_table() function:

Takes in two parameters: dataset and index. dataset is expected to be a list of lists, and index is expected to be an integer.

Generates a frequency table using the freq_table() function

Transforms the frequency table into a list of tuples, then sorts the list in a descending order.

Prints the entries of the frequency table in descending order.

freq_table() that takes in two inputs: dataset (which is expected to be a list of lists) and index (which is expected to be an integer).

The function should return the frequency table (as a dictionary) for any column we want. The frequencies should also be expressed as percentages.
We already learned how to build frequency tables in the mission on dictionaries.

In [32]:

#this loops around the dataset and creates a dictionary of each genre and how 
#many times it appears

def freq_table(dataset,index):
    freq_dict = {}
    for row in dataset:
        col = row[index]
        if col in freq_dict:
            freq_dict[col] += 1
        else:
            freq_dict[col] = 1
            
    for key in freq_dict:
        freq_dict[key] = (freq_dict[key]/len(dataset))*100
        

    return freq_dict


# takes a freq dictionary, for each key (frequency), create a tuple
# with freq_val (value) and then the freq (key) and append this tuple 
# to a list.
# sort this new list and then go through this tuple, print second
# tuple element first and first element after

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

We start by examining the frequency table for the prime_genre column of the App Store data set.



In [33]:
display_table(ios_final, -5)


Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


We can see that among the free English apps, more than a half (58.16%) are games. Entertainment apps are close to 8%, followed by photo and video apps, which are close to 5%. Only 3.66% of the apps are designed for education, followed by social networking apps which amount for 3.29% of the apps in our data set.

The general impression is that App Store (at least the part containing free English apps) is dominated by apps that are designed for fun (games, entertainment, photo and video, social networking, sports, music, etc.), while apps with practical purposes (education, shopping, utilities, productivity, lifestyle, etc.) are more rare. However, the fact that fun apps are the most numerous doesn't also imply that they also have the greatest number of users — the demand might not be the same as the offer.

Let's continue by examining the Genres and Category columns of the Google Play data set (two columns which seem to be related).

In [34]:
display_table(android_final, 1) # Category


FAMILY : 17.67176717671767
GAME : 10.591059105910592
TOOLS : 7.640764076407641
BUSINESS : 4.45044504450445
PRODUCTIVITY : 3.95039503950395
SPORTS : 3.6003600360036003
LIFESTYLE : 3.5903590359035906
COMMUNICATION : 3.5903590359035906
MEDICAL : 3.5403540354035403
FINANCE : 3.49034903490349
HEALTH_AND_FITNESS : 3.2503250325032504
PHOTOGRAPHY : 3.1203120312031203
PERSONALIZATION : 3.08030803080308
SOCIAL : 2.9202920292029204
NEWS_AND_MAGAZINES : 2.7702770277027704
SHOPPING : 2.5702570257025705
TRAVEL_AND_LOCAL : 2.4602460246024602
DATING : 2.2702270227022705
BOOKS_AND_REFERENCE : 1.9901990199019903
VIDEO_PLAYERS : 1.7001700170017002
EDUCATION : 1.5101510151015103
ENTERTAINMENT : 1.4701470147014701
MAPS_AND_NAVIGATION : 1.3001300130013
FOOD_AND_DRINK : 1.25012501250125
HOUSE_AND_HOME : 0.88008800880088
LIBRARIES_AND_DEMO : 0.8400840084008401
AUTO_AND_VEHICLES : 0.8200820082008201
WEATHER : 0.7400740074007401
EVENTS : 0.6300630063006301
ART_AND_DESIGN : 0.6100610061006101
COMICS : 0.59005900

The landscape seems significantly different on Google Play: there are not that many apps designed for fun, and it seems that a good number of apps are designed for practical purposes (family, tools, business, lifestyle, productivity, etc.). However, if we investigate this further, we can see that the family category (which accounts for almost 19% of the apps) means mostly games for kids.

Even so, practical apps seem to have a better representation on Google Play compared to App Store. This picture is also confirmed by the frequency table we see for the Genres column:

In [35]:
display_table(android_final, -4)


Tools : 7.630763076307631
Entertainment : 6.0006000600060005
Education : 5.1305130513051305
Business : 4.45044504450445
Productivity : 3.95039503950395
Sports : 3.74037403740374
Communication : 3.5903590359035906
Lifestyle : 3.5803580358035805
Medical : 3.5403540354035403
Finance : 3.49034903490349
Action : 3.4103410341034106
Health & Fitness : 3.2503250325032504
Photography : 3.1203120312031203
Personalization : 3.08030803080308
Social : 2.9202920292029204
News & Magazines : 2.7702770277027704
Shopping : 2.5702570257025705
Travel & Local : 2.45024502450245
Dating : 2.2702270227022705
Arcade : 2.0002000200020005
Books & Reference : 1.9901990199019903
Simulation : 1.8801880188018802
Casual : 1.84018401840184
Video Players & Editors : 1.6801680168016802
Maps & Navigation : 1.3001300130013
Food & Drink : 1.25012501250125
Puzzle : 1.21012101210121
Racing : 0.9500950095009502
Strategy : 0.9300930093009301
House & Home : 0.88008800880088
Role Playing : 0.87008700870087
Libraries & Demo : 0.8

we generated frequency tables for the columns prime_genre, Genres, and Category. We'll now focus on analyzing these frequency tables.

The difference between the Genres and the Category columns is not crystal clear, but one thing we can notice is that the Genres column is much more granular (it has more categories). We're only looking for the bigger picture at the moment, so we'll only work with the Category column moving forward.

Up to this point, we found that the App Store is dominated by apps designed for fun, while Google Play shows a more balanced landscape of both practical and for-fun apps. Now we'd like to get an idea about the kind of apps that have most users.

One way to find out what genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play data set, we can find this information in the Installs column, but this information is missing for the App Store data set. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the rating_count_tot app.

Start by generating a frequency table for the prime_genre column to get the unique app genres (below, we'll need to loop over the unique genres). You can use the freq_table() function you wrote in a previous screen.

In [38]:
genres_ios = freq_table(ios_final, -5)

#genre_ios is a dictionary of a genre and their frequence, 
#now we want the average number of ratings for an of that genre

for genre in genres_ios:
    genre_rating_total= 0
    len_genre = 0
    for row in ios_final:
        genre_app=row[-5]
        n_ratings = float(row[5])
        if genre_app == genre:
            genre_rating_total += n_ratings
            len_genre += 1
    avg_n_ratings = genre_rating_total / len_genre
    print(genre, ':', avg_n_ratings)
    
            
        
#THIS IS THE AVG NUMBER OD RATINGS FOR AN APP FROM THIS GENRE
    

Social Networking : 71548.34905660378
Music : 57326.530303030304
Finance : 31467.944444444445
Reference : 74942.11111111111
Sports : 23008.898550724636
Photo & Video : 28441.54375
Travel : 28243.8
Entertainment : 14029.830708661417
Shopping : 26919.690476190477
Productivity : 21028.410714285714
Navigation : 86090.33333333333
Games : 22788.6696905016
Weather : 52279.892857142855
Health & Fitness : 23298.015384615384
Medical : 612.0
Catalogs : 4004.0
News : 21248.023255813954
Business : 7491.117647058823
Utilities : 18684.456790123455
Education : 7003.983050847458
Food & Drink : 33333.92307692308
Lifestyle : 16485.764705882353
Book : 39758.5


# Most Popular Apps by Genre on Google Play
For the Google Play market, we actually have data about the number of installs, so we should be able to get a clearer picture about genre popularity. However, the install numbers don't seem precise enough — we can see that most values are open-ended (100+, 1,000+, 5,000+, etc.):

Start by generating a frequency table for the Category column of the Google Play data set to get the unique app genres


In [39]:

#getting the avg number of installs of app per category/genre
categories_android = freq_table(android_final, 1)

#use the genre/category frequency table dictionary to get unique 
#genres and then for each loop through the df and get the avg number 
#of installs for an app of that genre

for category in categories_android:
    total = 0
    len_category = 0
    for row in android_final:
        category_app = row[1]
        n_installs = row[5]
        #now we are going to replace the commas and plus in the number 
        n_installs = n_installs.replace(',', '')
        n_installs = n_installs.replace('+', '')
        if category_app == category:
            total += float(n_installs)
            len_category +=1
    avg_n_installs = total / len_category
    print(category, ':', avg_n_installs)
    
    

EDUCATION : 5760596.026490066
AUTO_AND_VEHICLES : 647317.8170731707
MAPS_AND_NAVIGATION : 5569698.307692308
HOUSE_AND_HOME : 1917187.0568181819
MEDICAL : 147563.28813559323
FAMILY : 5784094.900962083
SOCIAL : 48184458.56849315
LIFESTYLE : 1479956.6267409471
BOOKS_AND_REFERENCE : 9655197.28643216
WEATHER : 5747142.162162162
PARENTING : 542603.6206896552
COMICS : 950443.220338983
FINANCE : 2511355.6790830945
GAME : 33111302.596789423
ART_AND_DESIGN : 2038050.8196721312
PRODUCTIVITY : 35885137.50379747
VIDEO_PLAYERS : 36599010.11764706
FOOD_AND_DRINK : 2190710.008
SHOPPING : 12637504.221789883
BEAUTY : 513151.88679245283
BUSINESS : 2250454.1348314607
ENTERTAINMENT : 19516734.69387755
PHOTOGRAPHY : 32321374.407051284
SPORTS : 4860918.563888889
EVENTS : 253542.22222222222
COMMUNICATION : 90935671.86908078
NEWS_AND_MAGAZINES : 27058831.263537906
PERSONALIZATION : 7533233.402597402
DATING : 1164270.7356828193
TRAVEL_AND_LOCAL : 27921561.32520325
TOOLS : 14988276.79842932
HEALTH_AND_FITNESS : 

On average, communication apps have the most installs: 38,456,119. This number is heavily skewed up by a few apps that have over one billion installs (WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail, and Hangouts), and a few others with over 100 and 500 million installs:

In [40]:
for row in android_final:
    if row[1] == 'COMMUNICATION' and (row[5] == '1,000,000,000+'
                                      or row[5] == '500,000,000+'
                                      or row[5] == '100,000,000+'):
        print(row[0], ':', row[5])

Messenger – Text and Video Chat for Free : 1,000,000,000+
WhatsApp Messenger : 1,000,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Viber Messenger : 500,000,000+
Firefox Browser fast & private : 100,000,000+
Yahoo Mail – Stay Organized : 100,000,000+
imo beta free calls and text : 100,000,000+
imo free video calls and chat : 500,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Who : 100,000,000+
WeChat : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Android Messages : 100,000,000+
Telegram : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
WhatsApp Messenger : 1,000,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Viber Messenger : 500,000,000+
Hangouts : 1,000,000

In [41]:
#If we removed all the communication apps that have over 100 million installs, the average would be reduced roughly ten times:


under_100_m = []

for app in android_final:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'COMMUNICATION') and (float(n_installs) < 100000000):
        under_100_m.append(float(n_installs))
        
sum(under_100_m) / len(under_100_m)

4386993.665492957

The game genre seems pretty popular, but previously we found out this part of the market seems a bit saturated, so we'd like to come up with a different app recommendation if possible.

The books and reference genre looks fairly popular as well, with an average number of installs of 8,767,811. It's interesting to explore this in more depth, since we found this genre has some potential to work well on the App Store, and our aim is to recommend an app genre that shows potential for being profitable on both the App Store and Google Play.

Let's take a look at some of the apps from this genre and their number of installs:

In [42]:

for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000,000+'
                                            or app[5] == '500,000,000+'
                                            or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

Wattpad 📖 Free Books : 100,000,000+
Amazon Kindle : 100,000,000+
Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad 📖 Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+


# Conclusions
In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.

We concluded that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets. The markets are already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.