# Which free applications are the most profitable to develop?
My goal in this project is to identify the kinds of applications that are likely to be the most profitable in the App Store and Play Market. A company wants to develop an application, and my job is to help its development team make a data-driven decision about what kind of application to build.
<br>
<br>
The company wants to build a free application for mobile devices whose main source of revenue is in-app ads. Thus, the more users interact with an app, the more revenue it generates. My goal for this task is to show the developers which kinds of free mobile applications attract the most users.

*The function below helps us to print rows in a readable way*

In [1]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
        

## Selecting data sets
Since the App Store and Play Market together host roughly four million applications, collecting data on all of them is impractical, so I will analyze a sample instead. There are two suitable data sets for our task:
- [Data set](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps) of Apple Store applications
<br>
- [Data set](https://www.kaggle.com/lava18/google-play-store-apps) of Play Market applications


In [8]:
from csv import reader
apple = list(reader(open('AppleStore.csv', encoding='utf8')))
header_apple = apple[0]
apple = apple[1:]
google = list(reader(open('googleplaystore.csv', encoding='utf8')))
header_google = google[0]
google = google[1:]
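
The loading step can also be wrapped in a small helper that closes the file once it has been read. The sketch below is only an illustration, not the code used in this notebook: `load_csv` is a hypothetical helper name, and the sample file is created on the fly so the snippet runs on its own.

```python
import csv
import os
import tempfile


def load_csv(path):
    """Return (header, rows) for a CSV file, closing the file afterwards."""
    with open(path, encoding='utf8', newline='') as f:
        rows = list(csv.reader(f))
    return rows[0], rows[1:]


# Tiny stand-in file so the example is self-contained (not the real data set)
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False,
                                 encoding='utf8', newline='') as tmp:
    tmp.write('App,Category\nSample App,TOOLS\n')
    sample_path = tmp.name

sample_header, sample_rows = load_csv(sample_path)
os.remove(sample_path)
print(sample_header)
print(sample_rows)
```

With the real files, the same helper would be called as `load_csv('AppleStore.csv')` and `load_csv('googleplaystore.csv')`.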


In [18]:
print(header_google, '\n')

explore_data(google, 0, 5, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'] 

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Eve

We see that the Play Market data set has 10841 apps and 13 columns. At first sight, these columns will be helpful for the purpose of this analysis:
- **App** *Application name*
- **Category** *Category the app belongs to*
- **Reviews** *Number of user reviews for the app (as when scraped)*
- **Installs** *Number of user downloads/installs for the app (as when scraped)*
- **Type** *Paid or Free*
- **Price** *Price of the app (as when scraped)*
- **Genres** *An app can belong to multiple genres (apart from its main category).*



In [19]:
print(header_apple, '\n')
explore_data(apple, 0, 5, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'] 

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows: 7197
Number of columns: 16


The App Store data set, on the other hand, has 7197 apps and 16 columns. At first sight, these columns seem relevant to our research:
<br>
- **track_name** *Application name*
- **currency** *Currency type*
- **price** *Price of the app (as when scraped)*
- **rating_count_tot** *User rating count (for all versions)*
- **rating_count_ver** *User rating count (for the current version)*
- **prime_genre** *Primary genre*

## Cleansing Data
### Removing wrong data

The Google Play data set has a dedicated discussion section, and [one of the discussions](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015) describes an error in row 10472.

In [24]:
print(header_google, '\n')
print(google[10472])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'] 

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


This row is missing its `Category` value, so every following value is shifted one column to the left. Let's delete the row and verify that it is gone:

In [25]:
del google[10472]
print(google[10472])

['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']
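
A row with a missing value like this one can also be found without relying on the discussion page, by comparing the length of each row with the length of the header. Below is a minimal sketch on toy rows; the sample data and the `find_malformed_rows` name are made up for illustration.

```python
def find_malformed_rows(header, rows):
    """Return the indexes of rows whose column count differs from the header's."""
    return [i for i, row in enumerate(rows) if len(row) != len(header)]


sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5'],
    ['Life Made WI-Fi Touchscreen Photo Frame', '1.9'],  # Category missing
    ['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2'],
]

bad_indexes = find_malformed_rows(sample_header, sample_rows)
print(bad_indexes)

# Delete from the end so that earlier indexes stay valid
for i in reversed(bad_indexes):
    del sample_rows[i]
print(len(sample_rows))
```

Unlike a bare `del google[10472]`, this kind of check is safe to re-run, because a second pass finds nothing left to delete.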


### Removing duplicate entries

In [35]:
for app in google:
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


As the example above shows, this data set contains duplicate entries.
Let's find all of them.

In [41]:
duplicates = []
unique = []

for app in google:
    name = app[0]
    if name in unique:
        duplicates.append(name)
    else:
        unique.append(name)

print('Number of duplicates: {}'.format(len(duplicates)), '\n')
print('Example of duplicates:{}'.format(duplicates[:15]))

Number of duplicates: 1181 

Example of duplicates:['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


In the `Instagram` example, the rows differ only in the number of reviews. ![Image](https://sun9-3.userapi.com/LRk9oGJbvX0y3OzRZJwwc0Kyyvwhs7uKZbh6XA/bQUkZFcfYv4.jpg)
Most likely, the data set was scraped several times. The higher the number of reviews, the more recent the record should be. We can use this to keep the row with the highest number of reviews for each app instead of a random one.
<br>
<br>
In the cell below, we create a dictionary where the keys are application names and the values are the highest number of reviews recorded for each name:

In [50]:
app_dict = {}
for app in google:
    name = app[0]
    reviews = float(app[3])
    if name in app_dict and app_dict[name] < reviews:
        app_dict[name] = reviews
    if name not in app_dict:
        app_dict[name] = reviews



In the next cell, we create a new Play Market list without duplicates, keeping for each application only the row with the highest number of reviews:

In [52]:
google_clean = []
already_added = []
for app in google:
    name = app[0]
    reviews = float(app[3])
    if reviews == app_dict[name] and name not in already_added:
        google_clean.append(app)
        already_added.append(name)
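
The two cells above can also be combined into a single pass that keeps, for each name, the row with the most reviews. The following is a sketch on made-up rows rather than the notebook's data, and `dedupe_by_max_reviews` is a hypothetical helper name.

```python
def dedupe_by_max_reviews(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the one with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        reviews = float(row[reviews_index])
        # Strict '>' keeps the first row seen when review counts tie
        if name not in best or reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Made-up sample rows: (name, category, rating, reviews)
sample = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
]

clean = dedupe_by_max_reviews(sample)
for row in clean:
    print(row[0], row[3])
```

One difference to keep in mind: the result is ordered by the first appearance of each name, so the row order may differ from the list built in the cell above.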

Let's check for duplicates in AppStore data set:

In [60]:
duplicates = []
unique = []
for app in apple:
    name = app[1]
    if name in unique:
        duplicates.append(name)
    else:
        unique.append(name)

print(header_apple, '\n')

for name in duplicates:
    for app in apple:
        if name == app[1]:
            print(app)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'] 

['1173990889', 'Mannequin Challenge', '109705216', 'USD', '0.0', '668', '87', '3.0', '3.0', '1.4', '9+', 'Games', '37', '4', '1', '1']
['1178454060', 'Mannequin Challenge', '59572224', 'USD', '0.0', '105', '58', '4.0', '4.5', '1.0.1', '4+', 'Games', '38', '5', '1', '1']
['952877179', 'VR Roller Coaster', '169523200', 'USD', '0.0', '107', '102', '3.5', '3.5', '2.0.0', '4+', 'Games', '37', '5', '1', '1']
['1089824278', 'VR Roller Coaster', '240964608', 'USD', '0.0', '67', '44', '3.5', '4.0', '0.81', '4+', 'Games', '38', '0', '1', '1']


As shown above, a couple of names appear twice in the App Store data set, but the rows have different ids, sizes and versions. Presumably, these are distinct applications that share a name (for example, a fundamentally new game released while the old one is kept available), so let's keep them intact.

### Removing non-English applications


In [62]:
print(apple[813][1])
print(apple[6731][1])
print(google_clean[4412][0])
print(google_clean[7940][0])

爱奇艺PPS -《欢乐颂2》电视剧热播
【脱出ゲーム】絶対に最後までプレイしないで 〜謎解き＆ブロックパズル〜
中国語 AQリスニング
لعبة تقدر تربح DZ


Since our target audience is English-speaking users, we only want to analyze apps aimed at them, so we will remove non-English apps. As a heuristic, if the name of an application contains more than 3 characters whose code point is higher than 127 (that is, outside the ASCII range), the application is removed from our analysis. Allowing up to 3 such characters keeps names such as `Instachat 😜` or `Docs To Go™ Free Office Suite`, which a stricter rule would drop.

In [82]:
def is_english(a_string):
    non_english_char = 0
    for character in a_string:
        if ord(character) > 127:
            non_english_char += 1
    if non_english_char > 3:
        return False
    return True
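
As a quick check, the rule can be tried on names mentioned in this section. The function is restated here in a compact form so that the snippet runs on its own; it is meant to behave the same as the version above.

```python
def is_english(a_string):
    """Treat a name as English if it has at most 3 non-ASCII characters."""
    non_ascii = sum(1 for character in a_string if ord(character) > 127)
    return non_ascii <= 3


print(is_english('Instachat 😜'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
```

The first two names contain a single non-ASCII character each and are kept, while the third contains many more than three and is filtered out. The heuristic is not perfect: a short non-English name with three or fewer such characters still passes.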

Removing non-English applications from both data sets:

In [98]:
apple_english = []
google_english = []
for app in apple:
    name = app[1]
    if is_english(name):
        apple_english.append(app)
for app in google_clean:
    name = app[0]
    if is_english(name):
        google_english.append(app)
        
print('There are {} English-speaking applications in AppStore'.format(len(apple_english)))
print('There are {} English-speaking applications in Play Market'.format(len(google_english)))

There are 6183 English-speaking applications in AppStore
There are 9614 English-speaking applications in Play Market


### Removing non-free applications

In [118]:
apple_final = []
google_final = []

for app in apple_english:
    price = app[4]
    if price == '0.0':
        apple_final.append(app)
for app in google_english:
    price = app[7]
    if price == '0':
        google_final.append(app)

explore_data(apple_final, 0, 3, True)
print()
explore_data(google_final, 0, 3, True)



['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 3222
Number of columns: 16

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 

## Analysis
### Most common genres

To minimize risks and overhead for the company, our validation strategy for an app idea comprises these steps:
1. Build an MVP and add it to the Play Market.
2. If the app gets a good response from users, develop it further.
3. If the app is profitable after a set period of time, build an iOS version of the app and add it to the App Store.

Because the end goal is to publish the app on both markets, we need to find app profiles that are successful on both.
<br>
The functions below build a frequency table (in percentages) for any given column and display it sorted in descending order:

In [155]:
def freq_table(dataset, index):
    freq_dict = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in freq_dict:
            freq_dict[value] += 1
        else:
            freq_dict[value] = 1
    table_percentage = {}        
    for key in freq_dict:
        percentage = (freq_dict[key] / total) * 100
        table_percentage[key] = round(percentage, 2)
        
    return table_percentage

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
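
For comparison, the same kind of table can be built with `collections.Counter` from the standard library. This is an alternative sketch, not the function used in the rest of the notebook; `freq_table_counter` and the sample rows are made up for illustration.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Percentage frequency table for one column, most common value first."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return [(value, round(count / total * 100, 2))
            for value, count in counts.most_common()]


# Made-up rows: (name, genre)
sample = [
    ['A', 'Games'], ['B', 'Games'], ['C', 'Music'], ['D', 'Games'],
]
for value, percentage in freq_table_counter(sample, 1):
    print(value, ':', percentage)
```

Values with equal counts may come out in a different order than in `display_table`, which sorts tied percentages by key.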
        

### Most common genres for iOS

In [159]:
# prime_genre is at index -5 (fifth column from the end)
display_table(apple_final, -5)

Games : 58.16
Entertainment : 7.88
Photo & Video : 4.97
Education : 3.66
Social Networking : 3.29
Shopping : 2.61
Utilities : 2.51
Sports : 2.14
Music : 2.05
Health & Fitness : 2.02
Productivity : 1.74
Lifestyle : 1.58
News : 1.33
Travel : 1.24
Finance : 1.12
Weather : 0.87
Food & Drink : 0.81
Reference : 0.56
Business : 0.53
Book : 0.43
Navigation : 0.19
Medical : 0.19
Catalogs : 0.12


As shown in the table above, more than half (58.16%) of all free English applications are games. Almost 8% are Entertainment apps, followed by Photo & Video with close to 5%. Only 3.66% of applications relate to Education, followed by Social Networking with a 3.29% share.

The general impression is that the App Store is dominated by entertainment applications, while apps for practical purposes are rarer. However, applications in the second group may still have more users than those in the first. Let's check the Play Market.

### Most common genres for Play Market

In [161]:
# Play Market
# Category is at index 1
display_table(google_final, 1)

FAMILY : 18.91
GAME : 9.72
TOOLS : 8.46
BUSINESS : 4.59
LIFESTYLE : 3.9
PRODUCTIVITY : 3.89
FINANCE : 3.7
MEDICAL : 3.53
SPORTS : 3.4
PERSONALIZATION : 3.32
COMMUNICATION : 3.24
HEALTH_AND_FITNESS : 3.08
PHOTOGRAPHY : 2.94
NEWS_AND_MAGAZINES : 2.8
SOCIAL : 2.66
TRAVEL_AND_LOCAL : 2.34
SHOPPING : 2.25
BOOKS_AND_REFERENCE : 2.14
DATING : 1.86
VIDEO_PLAYERS : 1.79
MAPS_AND_NAVIGATION : 1.4
FOOD_AND_DRINK : 1.24
EDUCATION : 1.16
ENTERTAINMENT : 0.96
LIBRARIES_AND_DEMO : 0.94
AUTO_AND_VEHICLES : 0.93
HOUSE_AND_HOME : 0.82
WEATHER : 0.8
EVENTS : 0.71
PARENTING : 0.65
ART_AND_DESIGN : 0.64
COMICS : 0.62
BEAUTY : 0.6


The landscape seems significantly different on Google Play: there are not that many apps designed for fun, and it seems that a good number of apps are designed for practical purposes (family, tools, business, lifestyle, productivity, etc.). However, if we investigate this further, we can see that the family category (which accounts for almost 19% of the apps) means mostly games for kids.

In [162]:
# Genres - 9th index
display_table(google_final, 9)

Tools : 8.45
Entertainment : 6.07
Education : 5.35
Business : 4.59
Productivity : 3.89
Lifestyle : 3.89
Finance : 3.7
Medical : 3.53
Sports : 3.46
Personalization : 3.32
Communication : 3.24
Action : 3.1
Health & Fitness : 3.08
Photography : 2.94
News & Magazines : 2.8
Social : 2.66
Travel & Local : 2.32
Shopping : 2.25
Books & Reference : 2.14
Simulation : 2.04
Dating : 1.86
Arcade : 1.85
Video Players & Editors : 1.77
Casual : 1.76
Maps & Navigation : 1.4
Food & Drink : 1.24
Puzzle : 1.13
Racing : 0.99
Role Playing : 0.94
Libraries & Demo : 0.94
Auto & Vehicles : 0.93
Strategy : 0.91
House & Home : 0.82
Weather : 0.8
Events : 0.71
Adventure : 0.68
Comics : 0.61
Beauty : 0.6
Art & Design : 0.6
Parenting : 0.5
Card : 0.45
Casino : 0.43
Trivia : 0.42
Educational;Education : 0.39
Board : 0.38
Educational : 0.37
Education;Education : 0.34
Word : 0.26
Casual;Pretend Play : 0.24
Music : 0.2
Racing;Action & Adventure : 0.17
Puzzle;Brain Games : 0.17
Entertainment;Music & Video : 0.17
Casual;

The difference between the Genres and the Category columns is not crystal clear, but one thing we can notice is that the Genres column has more categories. We're only looking for the bigger picture at the moment, so we'll only work with the Category column moving forward.

Up to this point, we found that the App Store is dominated by apps designed for fun, while Google Play shows a more balanced landscape of both practical and for-fun apps. Now we'd like to get an idea about the kind of apps that have most users.

## Most Popular Apps by Genre on the App Store

In [170]:
prime_genre = freq_table(apple_final, -5)
for genre in prime_genre:
    total = 0  # sum of user rating counts for the genre
    len_genre = 0  # number of applications in the genre
    for app in apple_final:
        genre_app = app[-5]
        if genre_app == genre:
            total += float(app[5])  # rating_count_tot of the application
            len_genre += 1
    n_avg_reviews = total / len_genre
    print(genre, n_avg_reviews)

Music 57326.530303030304
Business 7491.117647058823
Weather 52279.892857142855
Sports 23008.898550724636
Utilities 18684.456790123455
Travel 28243.8
Shopping 26919.690476190477
Photo & Video 28441.54375
Reference 74942.11111111111
Social Networking 71548.34905660378
Education 7003.983050847458
Book 39758.5
Entertainment 14029.830708661417
Navigation 86090.33333333333
Finance 31467.944444444445
News 21248.023255813954
Medical 612.0
Catalogs 4004.0
Games 22788.6696905016
Productivity 21028.410714285714
Food & Drink 33333.92307692308
Health & Fitness 23298.015384615384
Lifestyle 16485.764705882353
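
The nested loops above scan the whole data set once per genre. The same averages can be collected in a single pass and sorted, which makes the top genres easier to read. The sketch below uses made-up rows, and `average_by_group` is a hypothetical helper name.

```python
from collections import defaultdict


def average_by_group(rows, group_index, value_index):
    """Return (group, average) pairs sorted by average, highest first."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[group_index]] += float(row[value_index])
        counts[row[group_index]] += 1
    averages = {group: totals[group] / counts[group] for group in totals}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Made-up rows in the layout (name, genre, rating count)
sample = [
    ['App A', 'Navigation', '300'],
    ['App B', 'Navigation', '100'],
    ['App C', 'Book', '50'],
]
for genre, average in average_by_group(sample, 1, 2):
    print(genre, round(average))
```

On the notebook's data the equivalent call would be `average_by_group(apple_final, -5, 5)`.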


The genres with the highest average number of ratings, and therefore the most interesting for us, are:
- Navigation 
- Social Networking 
- Music 
- Reference
- Weather 
- Food & Drink 
- Finance 
- Book 


In [171]:
# Navigation: about 86090 ratings per app on average

for app in apple_final:
    if app[-5] == 'Navigation':
        print(app[1], ':', app[5])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


As shown above, almost all of the Navigation genre's ratings (roughly 97%) belong to Waze and Google Maps. These giants heavily skew the average, so the genre is not as popular as it seems. The same is true of Music (Pandora, Spotify, etc.), Social Networking (Facebook, Skype, Messenger, etc.) and Reference (Bible, Dictionary.com).
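
To quantify the skew, the rating counts printed above for Navigation can be split into the two largest apps and the rest. The figures are copied from that output; the variable names are only illustrative.

```python
# Rating counts copied from the Navigation output above
navigation = {
    'Waze - GPS Navigation, Maps & Real-time Traffic': 345046,
    'Google Maps - Navigation & Transit': 154911,
    'Geocaching®': 12811,
    'CoPilot GPS – Car Navigation & Offline Maps': 3582,
    'ImmobilienScout24: Real Estate Search in Germany': 187,
    'Railway Route Search': 5,
}

counts = sorted(navigation.values(), reverse=True)
top_two_share = sum(counts[:2]) / sum(counts) * 100  # share of the two giants, in %
average_all = sum(counts) / len(counts)              # average over all six apps
average_rest = sum(counts[2:]) / len(counts[2:])     # average without the two giants

print(round(top_two_share, 1))
print(round(average_all, 2))
print(round(average_rest, 2))
```

Comparing the average over all apps with the average of the remaining four shows how much of the genre's apparent popularity comes from just two apps.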

In [173]:
for app in apple_final:
    if app[-5] == 'Social Networking':
        print(app[1], ':', app[5])

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 2

In [175]:
for app in apple_final:
    if app[-5] == 'Music':
        print(app[1], ':', app[5])

Pandora - Music & Radio : 1126879
Spotify Music : 878563
Shazam - Discover music, artists, videos & lyrics : 402925
iHeartRadio – Free Music & Radio Stations : 293228
SoundCloud - Music & Audio : 135744
Magic Piano by Smule : 131695
Smule Sing! : 119316
TuneIn Radio - MLB NBA Audiobooks Podcasts Music : 110420
Amazon Music : 106235
SoundHound Song Search & Music Player : 82602
Sonos Controller : 48905
Bandsintown Concerts : 30845
Karaoke - Sing Karaoke, Unlimited Songs! : 28606
My Mixtapez Music : 26286
Sing Karaoke Songs Unlimited with StarMaker : 26227
Ringtones for iPhone & Ringtone Maker : 25403
Musi - Unlimited Music For YouTube : 25193
AutoRap by Smule : 18202
Spinrilla - Mixtapes For Free : 15053
Napster - Top Music & Radio : 14268
edjing Mix:DJ turntable to remix and scratch music : 13580
Free Music - MP3 Streamer & Playlist Manager Pro : 13443
Free Piano app by Yokee : 13016
Google Play Music : 10118
Certified Mixtapes - Hip Hop Albums & Mixtapes : 9975
TIDAL : 7398
YouTube Mu

- Weather apps: people generally don't spend much time in-app, so the chances of making a profit from in-app ads are low. Also, getting reliable live weather data may require us to connect our apps to non-free APIs.

- Food and drink: examples here include Starbucks, Dunkin' Donuts, McDonald's, etc. Making a popular food and drink app requires actual cooking and a delivery service, which is outside the scope of our company.

- Finance apps: these apps involve banking, paying bills, money transfer, etc. This requires finance experts or online-banking services, which is outside the scope of our company.

However, the Book genre looks promising:
- No heavy skew
- Fairly popular among users
- Does not involve large development costs
- Plenty of room for possible improvements

In [181]:
for app in apple_final:
    if app[-5] == 'Book':
        print(app[1], ':', app[5])

Kindle – Read eBooks, Magazines & Textbooks : 252076
Audible – audio books, original series & podcasts : 105274
Color Therapy Adult Coloring Book for Adults : 84062
OverDrive – Library eBooks and Audiobooks : 65450
HOOKED - Chat Stories : 47829
BookShout: Read eBooks & Track Your Reading Goals : 879
Dr. Seuss Treasury — 50 best kids books : 451
Green Riding Hood : 392
Weirdwood Manor : 197
MangaZERO - comic reader : 9
ikouhoushi : 0
MangaTiara - love comic reader : 0
謎解き : 0
謎解き2016 : 0


Let's move on to the Play Market data set.

## Most Popular Apps by Genre on the Play Market
The `Installs` column stores values such as `1,000,000+` as strings, so first we need to remove the commas and plus signs and convert the values to numbers:

In [192]:
categories = freq_table(google_final, 1)
for category in categories:
    total = 0
    len_cat = 0
    for app in google_final:
        if app[1] == category:
            installs = app[5]
            installs = installs.replace(',', '')
            installs = installs.replace('+', '')
            total += float(installs)
            len_cat += 1
    n_avg_installs = total / len_cat
    print(category, ':', round(n_avg_installs))

SPORTS : 3638640
NEWS_AND_MAGAZINES : 9549178
AUTO_AND_VEHICLES : 647318
WEATHER : 5074486
MEDICAL : 120551
ART_AND_DESIGN : 1986335
PHOTOGRAPHY : 17840110
VIDEO_PLAYERS : 24727872
FINANCE : 1387692
DATING : 854029
COMICS : 817657
LIFESTYLE : 1437816
PRODUCTIVITY : 16787331
SHOPPING : 7036877
BEAUTY : 513152
GAME : 15588016
FAMILY : 3695642
FOOD_AND_DRINK : 1924898
PARENTING : 542604
EVENTS : 253542
HOUSE_AND_HOME : 1331541
TRAVEL_AND_LOCAL : 13984078
TOOLS : 10801391
HEALTH_AND_FITNESS : 4188822
COMMUNICATION : 38456119
ENTERTAINMENT : 11640706
LIBRARIES_AND_DEMO : 638504
SOCIAL : 23253652
EDUCATION : 1833495
BOOKS_AND_REFERENCE : 8767812
MAPS_AND_NAVIGATION : 4056942
PERSONALIZATION : 5201483
BUSINESS : 1712290
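
The string clean-up used in the loop above can be isolated in a small function, which makes it easy to test on its own. This is a sketch: `parse_installs` is a hypothetical helper name, the sample strings follow the format seen in the `Installs` column, and it returns an `int` where the cell above uses `float`.

```python
def parse_installs(installs):
    """Convert a string such as '1,000,000+' to the integer 1000000."""
    return int(installs.replace(',', '').replace('+', ''))


for raw in ['10,000+', '500,000+', '1,000,000,000+']:
    print(raw, '->', parse_installs(raw))
```

Values in this column are open-ended buckets (`10,000+` means at least 10,000 installs), so the averages computed from them are lower-bound approximations rather than exact figures.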


On average, communication apps have the most installs: 38,456,119. This number is heavily skewed up by a few apps that have over one billion installs (WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail, and Hangouts).
<br>
<br>
The books and reference genre looks fairly popular as well, with an average number of installs of 8,767,812. It's interesting to explore this in more depth, since we found this genre has some potential to work well on the App Store, and our aim is to recommend an app genre that shows potential for being profitable on both the App Store and Google Play.
Let's take a look at some of the apps from this genre and their number of installs:



In [195]:
for app in google_final:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E


This niche seems to be dominated by software for processing and reading ebooks, as well as various collections of libraries and dictionaries, so it's probably not a good idea to build similar apps since there'll be some significant competition.

We also notice there are quite a few apps built around the book Quran, which suggests that building an app around a popular book can be profitable. It seems that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets.

However, it looks like the market is already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.

### Conclusions
In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.

We concluded that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets. The markets are already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.