# Profitable App Profiles for the App Store and Google Play Markets
The aim of this project is to find mobile app profiles that are profitable on the App Store and Google Play markets.

We focus on apps that are free to download and install, with in-app ads as the main source of revenue. This means that the revenue for any given app depends mostly on the number of people who use it. Our goal for this project is to analyze data to help our developers understand what kinds of apps are likely to attract more users.

# Exploring the datasets


The Google Play data set contains data on roughly ten thousand Android apps. You can download it directly from this link:
https://www.kaggle.com/lava18/google-play-store-apps
The App Store data set contains data on roughly seven thousand iOS apps. You can download it directly from this link:
https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps

Let's start by cleaning the data sets and then move on to the analysis.

In [1]:
import pandas as pd

Load the data sets and display the first five rows.

In [2]:
android =pd.read_csv("googleplaystore.csv")
android.head()

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up
3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25M,"50,000,000+",Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up
4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8M,"100,000+",Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up


In [3]:
ios= pd.read_csv("AppleStore.csv")
ios.head()

Unnamed: 0,id,track_name,size_bytes,currency,price,rating_count_tot,rating_count_ver,user_rating,user_rating_ver,ver,cont_rating,prime_genre,sup_devices.num,ipadSc_urls.num,lang.num,vpp_lic
0,284882215,Facebook,389879808,USD,0.0,2974676,212,3.5,3.5,95.0,4+,Social Networking,37,1,29,1
1,389801252,Instagram,113954816,USD,0.0,2161558,1289,4.5,4.0,10.23,12+,Photo & Video,37,0,29,1
2,529479190,Clash of Clans,116476928,USD,0.0,2130805,579,4.5,4.5,9.24.12,9+,Games,38,5,18,1
3,420009108,Temple Run,65921024,USD,0.0,1724546,3842,4.5,4.0,1.6.2,9+,Games,40,5,1,1
4,284035177,Pandora - Music & Radio,130242560,USD,0.0,1126879,3594,4.0,4.5,8.4.1,12+,Music,37,4,1,1


Check the shape of the data sets, i.e. the number of rows and columns.

In [4]:
android.shape

(10841, 13)

In [5]:
ios.shape

(7197, 16)

Check the columns in both data sets to find the important variables.

In [6]:
android.columns
#https://www.kaggle.com/lava18/google-play-store-apps

Index(['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type',
       'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver',
       'Android Ver'],
      dtype='object')

In the Android data set, the columns 'App', 'Category', 'Reviews', 'Installs', 'Type', and 'Price' seem to be important.

In [7]:
ios.columns
#https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps

Index(['id', 'track_name', 'size_bytes', 'currency', 'price',
       'rating_count_tot', 'rating_count_ver', 'user_rating',
       'user_rating_ver', 'ver', 'cont_rating', 'prime_genre',
       'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'],
      dtype='object')

In the iOS data set, 'track_name', 'price', 'rating_count_tot', 'rating_count_ver', and 'prime_genre' might be useful.

# Deleting wrong data
The discussion section of the data source reports an error in one entry of the Android data set.
The row at index 10472 records a rating of 19, although the maximum rating is 5. The output below shows why: the Category value is missing, so every later value is shifted one column to the left. We therefore delete the row.

In [8]:
android.iloc[10472]

App               Life Made WI-Fi Touchscreen Photo Frame
Category                                              1.9
Rating                                                 19
Reviews                                              3.0M
Size                                               1,000+
Installs                                             Free
Type                                                    0
Price                                            Everyone
Content Rating                                        NaN
Genres                                  February 11, 2018
Last Updated                                       1.0.19
Current Ver                                    4.0 and up
Android Ver                                           NaN
Name: 10472, dtype: object

In [9]:
android.drop(10472,inplace= True)
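
Rather than relying on a known row number, out-of-range ratings could also be located directly. The sketch below uses a small hypothetical table (the values are illustrative, not the real data set) and assumes ratings are stored as numbers.

```python
import pandas as pd

# Hypothetical miniature of the Google Play table; values are illustrative.
toy = pd.DataFrame({
    "App": ["Sketch - Draw & Paint",
            "Life Made WI-Fi Touchscreen Photo Frame",
            "Coloring book moana"],
    "Rating": [4.5, 19.0, 3.9],
})

# Ratings go up to 5, so anything above 5 is suspect.
out_of_range = toy[toy["Rating"] > 5]
print(out_of_range)

# Drop the suspect rows by their index labels.
cleaned = toy.drop(out_of_range.index)
```

The same comparison applied to `android['Rating']` would be one way to confirm that row 10472 is the only affected entry.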

# Removing duplicate records
Let us check whether there are any duplicate records in either data set.

In [10]:
android_dup =android[android.duplicated(subset='App')]

In [11]:
android_dup.shape

(1181, 13)

There are 1181 rows in the Android data set whose app name already appears in an earlier row.

In [12]:
len(ios.id.unique())

7197

Every id is unique, so no duplicate records were found in the iOS data set.

Let us look at an example of duplicate records in the Android data set. The Instagram app is recorded four times.

In [13]:
android_sort= android.sort_values(['App','Reviews'])
android_sort[android_sort['App']=='Instagram']

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
3909,Instagram,SOCIAL,4.5,66509917,Varies with device,"1,000,000,000+",Free,0,Teen,Social,"July 31, 2018",Varies with device,Varies with device
2545,Instagram,SOCIAL,4.5,66577313,Varies with device,"1,000,000,000+",Free,0,Teen,Social,"July 31, 2018",Varies with device,Varies with device
2611,Instagram,SOCIAL,4.5,66577313,Varies with device,"1,000,000,000+",Free,0,Teen,Social,"July 31, 2018",Varies with device,Varies with device
2604,Instagram,SOCIAL,4.5,66577446,Varies with device,"1,000,000,000+",Free,0,Teen,Social,"July 31, 2018",Varies with device,Varies with device


The duplicate records differ in the Reviews column. We assume that the row with the most reviews is the most recent one, so we keep it and delete the others.

In [14]:
android_sort.drop_duplicates('App',keep='last',inplace=True)

Shape of the dataset after the duplicate records are removed

In [15]:
android_sort.shape

(9659, 13)
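
One caveat, stated as an assumption: if the Reviews column was read as text (which pandas does when any entry, such as the '3.0M' in the malformed row, is non-numeric), then sorting by it is alphabetical rather than numeric. The sketch below uses hypothetical rows to show the difference and one possible remedy with `pd.to_numeric`.

```python
import pandas as pd

# Hypothetical rows: the review counts are stored as strings.
toy = pd.DataFrame({
    "App": ["Demo", "Demo", "Other"],
    "Reviews": ["9", "10", "5"],
})

# As strings, "9" sorts after "10", so keep="last" keeps the smaller count.
by_text = toy.sort_values(["App", "Reviews"]).drop_duplicates("App", keep="last")

# Converting to numbers first makes the sort order match the review counts.
toy["Reviews"] = pd.to_numeric(toy["Reviews"])
by_number = toy.sort_values(["App", "Reviews"]).drop_duplicates("App", keep="last")

print(by_text)
print(by_number)
```

For the Instagram rows shown above the two orderings agree, because all four counts have the same number of digits.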

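# Deleting non-English apps
A few apps have names in languages other than English. We're not interested in keeping these apps, so we'll remove them.

A character whose Unicode code point is greater than 127 lies outside the ASCII range, so we treat it as a sign of a non-English name. Note that the function below returns as soon as it has examined the first character, so in practice it filters on the first character of each name only.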

In [16]:
def fun(string):
    for i in string:
        if ord(i)>127:
            return False
        else:
            return True

In [17]:
android_eng =android_sort[android_sort['App'].apply(fun)]

In [18]:
ios_eng =ios[ios['track_name'].apply(fun)]
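
Because `fun` returns inside the first loop iteration, the row counts reported in the rest of this notebook reflect a first-character test. A stricter check would move the `return True` after the loop. The sketch below shows that variant, plus a more tolerant one; the function names and the threshold of 3 are assumptions made for illustration, and applying either would change the counts below.

```python
def is_ascii_name(name):
    """Return True only if every character is in the ASCII range (0-127)."""
    for ch in name:
        if ord(ch) > 127:
            return False
    return True  # reached only after all characters were checked


def is_probably_english(name, max_non_ascii=3):
    """Tolerate a few non-ASCII characters such as emoji or the ™ sign.

    The default threshold of 3 is an assumption, not taken from the data.
    """
    non_ascii = sum(1 for ch in name if ord(ch) > 127)
    return non_ascii <= max_non_ascii


print(is_ascii_name("Instagram"))
print(is_ascii_name("Instachat 😜"))
print(is_probably_english("Instachat 😜"))
```

The tolerant variant avoids discarding English apps whose names contain a single emoji or trademark symbol, at the cost of letting through short non-English names.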

# Removing the paid apps

As mentioned, we only build apps that are free to download and install, and our main source of revenue consists of in-app ads. Our data sets contain both free and non-free apps, and we'll need to isolate only the free apps for our analysis. Below, we isolate the free apps for both our data sets.

In [19]:
android_final =android_eng[android_eng['Price']=='0']

In [20]:
ios_final=ios_eng[ios_eng['price']==0]

In [21]:
android_final.shape

(8871, 13)

In [22]:
ios_final.shape

(3300, 16)

# Most popular apps by genre

Our aim is to determine the kinds of apps that are likely to attract more users because our revenue is highly influenced by the number of people using our apps.

To minimize risks and overhead, our validation strategy for an app idea is comprised of three steps:

1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we then develop it further.
3. If the app is profitable after six months, we also build an iOS version of the app and add it to the App Store.

Because our end goal is to add the app on both the App Store and Google Play, we need to find app profiles that are successful on both markets. For instance, a profile that might work well for both markets might be a productivity app that makes use of gamification.

Let's begin the analysis by getting a sense of the most common genres for each market. For this, we'll build a frequency table for the prime_genre column of the App Store data set, and the Genres and Category columns of the Google Play data set.

In [23]:
ios_final['prime_genre'].value_counts(normalize=True)*100

Games                57.484848
Entertainment         7.969697
Photo & Video         4.878788
Education             3.575758
Social Networking     3.393939
Utilities             2.696970
Shopping              2.575758
Sports                2.121212
Health & Fitness      2.060606
Music                 2.030303
Productivity          1.757576
Lifestyle             1.696970
News                  1.363636
Finance               1.181818
Travel                1.181818
Weather               0.878788
Food & Drink          0.878788
Book                  0.575758
Business              0.515152
Reference             0.515152
Navigation            0.272727
Medical               0.212121
Catalogs              0.181818
Name: prime_genre, dtype: float64



We can see that among the free English apps, more than half (57.48%) are games. Entertainment apps are close to 8%, followed by photo and video apps, which are close to 5%. Only 3.58% of the apps are designed for education, followed by social networking apps, which account for 3.39% of the apps in our data set.

The general impression is that the App Store (at least the part containing free English apps) is dominated by apps that are designed for fun (games, entertainment, photo and video, social networking, sports, music, etc.), while apps with practical purposes (education, shopping, utilities, productivity, lifestyle, etc.) are rarer. However, the fact that fun apps are the most numerous doesn't imply that they also have the greatest number of users: demand might not match supply.

Let's continue by examining the Genres and Category columns of the Google Play data set (two columns which seem to be related).


In [24]:
android_final['Category'].value_counts(normalize=True)*100

FAMILY                 19.017022
GAME                    9.660692
TOOLS                   8.431969
BUSINESS                4.599256
LIFESTYLE               3.934168
PRODUCTIVITY            3.900349
FINANCE                 3.686168
MEDICAL                 3.517078
SPORTS                  3.393079
PERSONALIZATION         3.325442
COMMUNICATION           3.235261
HEALTH_AND_FITNESS      3.077443
PHOTOGRAPHY             2.953444
NEWS_AND_MAGAZINES      2.818172
SOCIAL                  2.649081
TRAVEL_AND_LOCAL        2.333446
SHOPPING                2.231992
BOOKS_AND_REFERENCE     2.175628
DATING                  1.859993
VIDEO_PLAYERS           1.792357
MAPS_AND_NAVIGATION     1.397813
FOOD_AND_DRINK          1.228723
EDUCATION               1.172359
LIBRARIES_AND_DEMO      0.935633
ENTERTAINMENT           0.935633
AUTO_AND_VEHICLES       0.924360
HOUSE_AND_HOME          0.811633
WEATHER                 0.800361
EVENTS                  0.710179
PARENTING               0.653816
ART_AND_DE



The trend seems significantly different on Google Play: there are not that many apps designed for fun, and it seems that a good number of apps are designed for practical purposes (family, tools, business, lifestyle, productivity, etc.). However, if we investigate this further, we can see that the family category (which accounts for almost 19% of the apps) means mostly games for kids.
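
One way to check the claim about the family category would be to look at the Genres labels that occur inside it. The sketch below runs on hypothetical rows (the values are invented for illustration and are not taken from the data set).

```python
import pandas as pd

# Hypothetical rows standing in for android_final; values are illustrative.
toy = pd.DataFrame({
    "Category": ["FAMILY", "FAMILY", "FAMILY", "TOOLS"],
    "Genres": ["Casual;Pretend Play", "Education", "Casual;Pretend Play", "Tools"],
})

# Share of each genre label within the FAMILY category, in percent.
family_genres = (
    toy.loc[toy["Category"] == "FAMILY", "Genres"]
    .value_counts(normalize=True) * 100
)
print(family_genres)
```

Replacing `toy` with `android_final` would give the corresponding breakdown for the real data.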

Even so, practical apps seem to have a better representation on Google Play compared to App Store. This picture is also confirmed by the frequency table we see for the Genres column:


In [25]:
android_final['Genres'].value_counts(normalize=True)*100

Tools                                  8.420697
Entertainment                          6.064705
Education                              5.377071
Business                               4.599256
Lifestyle                              3.922895
Productivity                           3.900349
Finance                                3.686168
Medical                                3.517078
Sports                                 3.460715
Personalization                        3.325442
Communication                          3.235261
Action                                 3.088716
Health & Fitness                       3.077443
Photography                            2.953444
News & Magazines                       2.818172
Social                                 2.649081
Travel & Local                         2.322173
Shopping                               2.231992
Books & Reference                      2.175628
Simulation                             2.062902
Dating                                 1

The difference between the Genres and the Category columns is not crystal clear, but one thing we can notice is that the Genres column is much more granular (it has more categories). We're only looking for the bigger picture at the moment, so we'll only work with the Category column moving forward.

Up to this point, we found that the App Store is dominated by apps designed for fun, while Google Play shows a more balanced landscape of both practical and for-fun apps. Now we'd like to get an idea of the kinds of apps that have the most users.

# Most Popular Apps by Genre on the iOS App Store

One way to find out what genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play data set, we can find this information in the Installs column, but for the App Store data set this information is missing. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the rating_count_tot column.

In [26]:
# Select the numeric column before averaging so text columns are not passed to mean()
ios_final.groupby('prime_genre')['rating_count_tot'].mean().sort_values(ascending=False)

prime_genre
Reference            79350.470588
Social Networking    67731.214286
Navigation           57393.555556
Music                56482.029851
Weather              50477.137931
Food & Drink         29886.931034
Book                 29310.736842
Finance              29048.615385
Travel               28959.564103
Photo & Video        28264.888199
Shopping             26586.788235
Sports               22680.200000
Health & Fitness     22278.352941
Games                22199.308382
News                 20303.666667
Productivity         20303.310345
Utilities            17058.719101
Lifestyle            15023.089286
Entertainment        13549.794677
Business              7491.117647
Education             7003.983051
Catalogs              2669.333333
Medical                525.428571
Name: rating_count_tot, dtype: float64

Reference apps have the highest average number of user ratings, a figure driven largely by the Bible and Dictionary.com apps:

In [27]:
ios_final[ios_final['prime_genre']=='Reference']

Unnamed: 0,id,track_name,size_bytes,currency,price,rating_count_tot,rating_count_ver,user_rating,user_rating_ver,ver,cont_rating,prime_genre,sup_devices.num,ipadSc_urls.num,lang.num,vpp_lic
6,282935706,Bible,92774400,USD,0.0,985920,5320,4.5,5.0,7.5.1,4+,Reference,37,5,45,1
90,308750436,Dictionary.com Dictionary & Thesaurus,111275008,USD,0.0,200047,177,4.0,4.0,7.1.3,4+,Reference,37,0,1,1
335,364740856,Dictionary.com Dictionary & Thesaurus for iPad,165748736,USD,0.0,54175,10176,4.5,4.5,4.0,4+,Reference,24,5,9,1
551,414706506,Google Translate,65281024,USD,0.0,26786,27,3.5,4.5,5.10.0,4+,Reference,37,5,59,1
715,388389451,"Muslim Pro: Ramadan 2017 Prayer Times, Azan, Q...",100551680,USD,0.0,18418,706,4.5,5.0,9.2.1,4+,Reference,37,5,16,1
738,1130829481,New Furniture Mods - Pocket Wiki & Game Tools ...,52959232,USD,0.0,17588,17588,4.5,4.5,1.0,4+,Reference,38,3,2,1
757,399452287,Merriam-Webster Dictionary,155593728,USD,0.0,16849,1125,4.5,4.5,4.1,4+,Reference,38,1,12,1
913,475772902,Night Sky,596499456,USD,0.0,12122,60,4.5,4.5,4.4.1,4+,Reference,37,5,29,1
1106,1135575003,City Maps for Minecraft PE - The Best Maps for...,90124288,USD,0.0,8535,8535,4.0,4.0,1.0,4+,Reference,37,4,1,1
1451,1132715891,LUCKY BLOCK MOD ™ for Minecraft PC Edition - T...,86874112,USD,0.0,4693,4693,4.0,4.0,1.0,12+,Reference,37,4,1,1
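
To see how strongly the two largest apps pull the average, the mean can be compared with the median. The sketch below uses only the ten rating counts listed above, which is an assumption of convenience: the genre may contain more apps than the rows shown.

```python
from statistics import mean, median

# rating_count_tot for the ten Reference apps listed above (not the whole genre).
ratings = [985920, 200047, 54175, 26786, 18418, 17588, 16849, 12122, 8535, 4693]

# Fraction of all listed ratings that belongs to the two largest apps.
top_two_share = sum(sorted(ratings, reverse=True)[:2]) / sum(ratings)

print(f"mean:   {mean(ratings):,.0f}")
print(f"median: {median(ratings):,.0f}")
print(f"share of ratings held by the top two apps: {top_two_share:.0%}")
```

A mean far above the median, together with a large top-two share, indicates that a typical app in the genre attracts far fewer ratings than the genre average suggests.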


One thing we could do is take another popular book and turn it into an app where we could add different features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes about the book, etc. On top of that, we could also embed a dictionary within the app, so users don't need to exit our app to look up words in an external app.

This idea seems to fit well with the fact that the App Store is dominated by for-fun apps. This suggests the market might be a bit saturated with for-fun apps, which means a practical app might have more of a chance to stand out among the huge number of apps on the App Store.

Other genres that seem popular include navigation (heavily influenced by Waze and Google Maps), social networking (influenced by Facebook, Pinterest, Skype, etc.), weather, book, food and drink, and finance. The book genre seems to overlap a bit with the app idea we described above, but the other genres don't seem too interesting to us:

- Weather apps: people generally don't spend much time in-app, and the chances of making a profit from in-app ads are low. Also, getting reliable live weather data may require us to connect our apps to non-free APIs.

- Food and drink: examples here include Starbucks, Dunkin' Donuts, McDonald's, etc. So making a popular food and drink app requires actual cooking and a delivery service, which is outside our scope.

- Finance apps: these apps involve banking, paying bills, money transfer, etc. Building a finance app requires domain knowledge, and we don't want to hire a finance expert just to build an app.

Now let's analyze the Google Play market a bit.

# Most Popular Apps by Genre on Google Play
For the Google Play market, we actually have data about the number of installs, so we should be able to get a clearer picture about genre popularity. However, the install numbers don't seem precise enough — we can see that most values are open-ended (100+, 1,000+, 5,000+, etc.):

In [28]:
android_final['Installs'].value_counts()

1,000,000+        1391
100,000+          1025
10,000,000+        931
10,000+            904
1,000+             749
100+               615
5,000,000+         606
500,000+           494
50,000+            428
5,000+             400
10+                315
500+               288
50,000,000+        202
100,000,000+       188
50+                170
5+                  70
1+                  46
500,000,000+        24
1,000,000,000+      20
0+                   4
0                    1
Name: Installs, dtype: int64

One problem with this data is that it is not precise. For instance, we don't know whether an app with 100,000+ installs has 100,000 installs, 200,000, or 350,000. However, we don't need very precise data for our purposes. We only want to get an idea of which app genres attract the most users.

We're going to leave the numbers as they are, which means that we'll consider that an app with 100,000+ installs has 100,000 installs, and an app with 1,000,000+ installs has 1,000,000 installs, and so on.

To perform computations, however, we'll need to convert each install number to a numeric type (an integer here). This means that we need to remove the commas and the plus characters, otherwise the conversion will fail and raise an error. We'll do this with pandas string methods in the next cell, and then compute the average number of installs for each genre (category).
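
The cleaning rule described above can be sketched on a few labels in plain Python. The helper name `parse_installs` is hypothetical and used only for illustration.

```python
def parse_installs(text):
    """Turn a Google Play install label such as '1,000,000+' into an int."""
    return int(text.replace(",", "").replace("+", ""))


for label in ["0", "0+", "500+", "1,000,000+"]:
    print(label, "->", parse_installs(label))
```

Each label is taken at its lower bound, in line with the decision to treat 100,000+ as 100,000 installs.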





In [29]:

android_final = android_final.copy()  # explicit copy avoids SettingWithCopyWarning
android_final['Installs'] = android_final['Installs'].str.replace('+', '', regex=False)
android_final['Installs'] = android_final['Installs'].str.replace(',', '', regex=False).astype(int)


In [30]:
pd.set_option('display.float_format', lambda x: '%.8f' % x)
# Select the numeric column before averaging so text columns are not passed to mean()
android_final.groupby('Category')['Installs'].mean().sort_values(ascending=False)

Category
COMMUNICATION         38456119.16724738
VIDEO_PLAYERS         24727872.45283019
SOCIAL                23348348.51914893
PHOTOGRAPHY           17737667.61450382
PRODUCTIVITY          16738957.55491330
GAME                  15527204.72578763
TRAVEL_AND_LOCAL      13984077.71014493
ENTERTAINMENT         11848915.66265060
TOOLS                 10696563.46791444
NEWS_AND_MAGAZINES     9472829.04000000
BOOKS_AND_REFERENCE    8631794.09326425
SHOPPING               7072366.59090909
PERSONALIZATION        5183850.80677966
WEATHER                5074486.19718310
HEALTH_AND_FITNESS     4188821.98534799
MAPS_AND_NAVIGATION    4009361.20967742
FAMILY                 3683619.85240071
SPORTS                 3638640.14285714
ART_AND_DESIGN         1986335.08771930
FOOD_AND_DRINK         1942465.60550459
EDUCATION              1820673.07692308
BUSINESS               1708215.90686275
LIFESTYLE              1439955.38395415
FINANCE                1361355.14373089
HOUSE_AND_HOME         1348645.

On average, communication apps have the most installs: 38,456,119. This number is heavily skewed upward by a few apps that have over one billion installs (WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail, and Hangouts), and a few others with over 100 million or over 500 million installs:

In [31]:
android_final[(android_final['Category']=="COMMUNICATION") & (android_final['Installs']>100000000)]

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
451,Gmail,COMMUNICATION,4.3,4604483,Varies with device,1000000000,Free,0,Everyone,Communication,"August 2, 2018",Varies with device,Varies with device
411,Google Chrome: Fast & Secure,COMMUNICATION,4.3,9643041,Varies with device,1000000000,Free,0,Everyone,Communication,"August 1, 2018",Varies with device,Varies with device
4039,Google Duo - High Quality Video Calls,COMMUNICATION,4.6,2083237,Varies with device,500000000,Free,0,Everyone,Communication,"July 31, 2018",37.1.206017801.DR37_RC14,4.4 and up
464,Hangouts,COMMUNICATION,4.0,3419513,Varies with device,1000000000,Free,0,Everyone,Communication,"July 21, 2018",Varies with device,Varies with device
474,LINE: Free Calls & Messages,COMMUNICATION,4.2,10790289,Varies with device,500000000,Free,0,Everyone,Communication,"July 26, 2018",Varies with device,Varies with device
382,Messenger – Text and Video Chat for Free,COMMUNICATION,4.0,56646578,Varies with device,1000000000,Free,0,Everyone,Communication,"August 1, 2018",Varies with device,Varies with device
4234,Skype - free IM & video calls,COMMUNICATION,4.1,10484169,Varies with device,1000000000,Free,0,Everyone,Communication,"August 3, 2018",Varies with device,Varies with device
420,UC Browser - Fast Download Private & Secure,COMMUNICATION,4.5,17714850,40M,500000000,Free,0,Teen,Communication,"August 2, 2018",12.8.5.1121,4.0 and up
4676,Viber Messenger,COMMUNICATION,4.3,11335481,Varies with device,500000000,Free,0,Everyone,Communication,"July 18, 2018",Varies with device,Varies with device
381,WhatsApp Messenger,COMMUNICATION,4.4,69119316,Varies with device,1000000000,Free,0,Everyone,Communication,"August 3, 2018",Varies with device,Varies with device




We see the same pattern for the video players category, which is the runner-up with 24,727,872 installs. The market is dominated by apps like YouTube, Google Play Movies & TV, or MX Player. The pattern is repeated for social apps (where we have giants like Facebook, Instagram, Google+, etc.), photography apps (Google Photos and other popular photo editors), and productivity apps (Microsoft Word, Dropbox, Google Calendar, Evernote, etc.).

Again, the main concern is that these app genres might seem more popular than they really are. Moreover, these niches seem to be dominated by a few giants who are hard to compete against.
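
One way to gauge how much a few giants inflate a category is to put the mean next to the median. The sketch below uses hypothetical install counts (the category names "A" and "B" and all numbers are invented for illustration).

```python
import pandas as pd

# Hypothetical install counts; one giant dominates category "A".
toy = pd.DataFrame({
    "Category": ["A", "A", "A", "B", "B", "B"],
    "Installs": [1_000_000_000, 10_000, 5_000, 1_000_000, 500_000, 1_000_000],
})

# Mean and median side by side: a large gap signals a few dominant apps.
summary = toy.groupby("Category")["Installs"].agg(["mean", "median"])
print(summary.sort_values("median", ascending=False))
```

In this toy example category "A" wins on the mean but loses on the median, which is the kind of reversal the concern above describes.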

The game genre seems pretty popular, but previously we found out this part of the market seems a bit saturated, so we'd like to come up with a different app recommendation if possible.

The books and reference genre looks fairly popular as well, with an average number of installs of 8,631,794. It's interesting to explore this in more depth, since we found this genre has some potential to work well on the App Store, and our aim is to recommend an app genre that shows potential for being profitable on both the App Store and Google Play.

Let's take a look at some of the apps from this genre and their number of installs:


In [32]:
android_final[android_final['Category']=='BOOKS_AND_REFERENCE']

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
5076,50000 Free eBooks & Free AudioBooks,BOOKS_AND_REFERENCE,4.10000000,52312,11M,5000000,Free,0,Teen,Books & Reference,"May 19, 2018",5.3.4,4.4 and up
5259,A-J Media Vault,BOOKS_AND_REFERENCE,,1,24M,50,Free,0,Everyone,Books & Reference,"January 4, 2017",5.62.1,4.0.3 and up
4915,"AC Air condition Troubleshoot,Repair,Maintenance",BOOKS_AND_REFERENCE,4.20000000,27,3.1M,5000,Free,0,Everyone,Books & Reference,"February 7, 2018",1.1,4.0 and up
5016,AE Bulletins,BOOKS_AND_REFERENCE,4.50000000,14,30M,1000,Free,0,Everyone,Books & Reference,"July 18, 2018",2.2.5,4.0.3 and up
9244,AP Stamps and Registration,BOOKS_AND_REFERENCE,3.40000000,82,2.7M,10000,Free,0,Everyone,Books & Reference,"March 27, 2018",2.0,3.0 and up
5738,AW Tozer Devotionals - Daily,BOOKS_AND_REFERENCE,4.20000000,8,3.9M,5000,Free,0,Everyone,Books & Reference,"October 7, 2016",1.0,2.3 and up
5829,AY Sing,BOOKS_AND_REFERENCE,4.80000000,111,10M,5000,Free,0,Everyone,Books & Reference,"May 21, 2018",1.6.2,4.1 and up
4093,Aab e Hayat Full Novel,BOOKS_AND_REFERENCE,4.30000000,1476,41M,100000,Free,0,Everyone,Books & Reference,"February 21, 2017",1.0,3.0 and up
5022,Ae Allah na Dai (Rasa),BOOKS_AND_REFERENCE,4.70000000,263,9.1M,10000,Free,0,Everyone,Books & Reference,"January 1, 2015",2.0,2.1 and up
5091,Ag PhD Deficiencies,BOOKS_AND_REFERENCE,3.90000000,20,9.7M,10000,Free,0,Everyone,Books & Reference,"June 21, 2016",1.0,4.0.3 and up


The book and reference genre includes a variety of apps: software for processing and reading ebooks, various collections of libraries, dictionaries, tutorials on programming or languages, etc. It seems there's still a small number of extremely popular apps that skew the average:

In [35]:
android_final[(android_final['Category']=="BOOKS_AND_REFERENCE") & (android_final['Installs']>=100000000)]

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
4083,Amazon Kindle,BOOKS_AND_REFERENCE,4.2,814151,Varies with device,100000000,Free,0,Teen,Books & Reference,"July 27, 2018",Varies with device,Varies with device
5651,Audiobooks from Audible,BOOKS_AND_REFERENCE,4.5,568922,Varies with device,100000000,Free,0,Teen,Books & Reference,"August 1, 2018",Varies with device,Varies with device
3941,Bible,BOOKS_AND_REFERENCE,4.7,2440695,Varies with device,100000000,Free,0,Teen,Books & Reference,"August 2, 2018",Varies with device,Varies with device
152,Google Play Books,BOOKS_AND_REFERENCE,3.9,1433233,Varies with device,1000000000,Free,0,Teen,Books & Reference,"August 3, 2018",Varies with device,Varies with device
4715,Wattpad 📖 Free Books,BOOKS_AND_REFERENCE,4.6,2915189,Varies with device,100000000,Free,0,Teen,Books & Reference,"August 1, 2018",Varies with device,Varies with device


However, it looks like there are only a few very popular apps, so this market still shows potential. Let's try to get some app ideas from the apps below that top tier. The filter in the next cell keeps every app with fewer than 100,000,000 installs, so it includes low-install apps as well:

In [36]:
android_final[(android_final['Category']=="BOOKS_AND_REFERENCE") & (android_final['Installs']<100000000)]

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
5076,50000 Free eBooks & Free AudioBooks,BOOKS_AND_REFERENCE,4.10000000,52312,11M,5000000,Free,0,Teen,Books & Reference,"May 19, 2018",5.3.4,4.4 and up
5259,A-J Media Vault,BOOKS_AND_REFERENCE,,1,24M,50,Free,0,Everyone,Books & Reference,"January 4, 2017",5.62.1,4.0.3 and up
4915,"AC Air condition Troubleshoot,Repair,Maintenance",BOOKS_AND_REFERENCE,4.20000000,27,3.1M,5000,Free,0,Everyone,Books & Reference,"February 7, 2018",1.1,4.0 and up
5016,AE Bulletins,BOOKS_AND_REFERENCE,4.50000000,14,30M,1000,Free,0,Everyone,Books & Reference,"July 18, 2018",2.2.5,4.0.3 and up
9244,AP Stamps and Registration,BOOKS_AND_REFERENCE,3.40000000,82,2.7M,10000,Free,0,Everyone,Books & Reference,"March 27, 2018",2.0,3.0 and up
5738,AW Tozer Devotionals - Daily,BOOKS_AND_REFERENCE,4.20000000,8,3.9M,5000,Free,0,Everyone,Books & Reference,"October 7, 2016",1.0,2.3 and up
5829,AY Sing,BOOKS_AND_REFERENCE,4.80000000,111,10M,5000,Free,0,Everyone,Books & Reference,"May 21, 2018",1.6.2,4.1 and up
4093,Aab e Hayat Full Novel,BOOKS_AND_REFERENCE,4.30000000,1476,41M,100000,Free,0,Everyone,Books & Reference,"February 21, 2017",1.0,3.0 and up
5022,Ae Allah na Dai (Rasa),BOOKS_AND_REFERENCE,4.70000000,263,9.1M,10000,Free,0,Everyone,Books & Reference,"January 1, 2015",2.0,2.1 and up
5091,Ag PhD Deficiencies,BOOKS_AND_REFERENCE,3.90000000,20,9.7M,10000,Free,0,Everyone,Books & Reference,"June 21, 2016",1.0,4.0.3 and up
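
The filter in cell 36 applies only an upper bound, so its output also contains apps with as few as 50 installs. If a middle band of 1,000,000 to 100,000,000 installs is wanted, a two-sided filter might look like the sketch below. It uses a handful of rows copied from the tables above; which bounds should be inclusive is a judgment call.

```python
import pandas as pd

# A few BOOKS_AND_REFERENCE rows copied from the tables above.
books = pd.DataFrame({
    "App": ["50000 Free eBooks & Free AudioBooks", "A-J Media Vault",
            "Aab e Hayat Full Novel", "Amazon Kindle", "Google Play Books"],
    "Installs": [5_000_000, 50, 100_000, 100_000_000, 1_000_000_000],
})

# Keep apps with at least 1,000,000 and fewer than 100,000,000 installs.
in_band = (books["Installs"] >= 1_000_000) & (books["Installs"] < 100_000_000)
mid_tier = books[in_band]
print(mid_tier)
```

The same mask, combined with the `Category` condition, could be applied to `android_final` once its Installs column has been converted to integers.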




This niche seems to be dominated by software for processing and reading ebooks, as well as various collections of libraries and dictionaries, so it's probably not a good idea to build similar apps since there'll be some significant competition.

We also notice there are quite a few apps built around the Quran, which suggests that building an app around a popular book can be profitable. It seems that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets.

However, it looks like the market is already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.

# Conclusion:

In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.

We concluded that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets. The markets are already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.
