# Apps for the App Store and Google Play

This project analyzes apps available on Google Play and the App Store. Only free apps are of interest. By analyzing the data, we try to understand which kinds of apps attract the most users, so that developers can use the results to decide which apps to build.

The data sets for [Google Play](https://dq-content.s3.amazonaws.com/350/googleplaystore.csv) and the [App Store](https://dq-content.s3.amazonaws.com/350/AppleStore.csv) can be downloaded from these links.


## Reading and Forming the Data
Pandas is not used in this project. Instead, each data set is stored as a list of lists.

In [2]:
from csv import reader

# Read each CSV into a list of rows; the with-block closes the file afterwards.
with open("data/AppleStore.csv", encoding='utf-8') as opened_file:
    apl = list(reader(opened_file))
apl_header = apl[0]  # first row holds the column names
apl = apl[1:]

with open("data/googleplaystore.csv", encoding='utf-8') as opened_file:
    ggl = list(reader(opened_file))
ggl_header = ggl[0]
ggl = ggl[1:]

## Exploring the Data
The function below prints a slice of a data set and, optionally, its number of rows and columns.

In [3]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

We use the function to display a few rows from the top of each data set, along with the number of rows and columns.

In [4]:
print(apl_header)
print("\n")
explore_data(apl, 1, 4, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


Number of rows: 7197
Number of columns: 16


In [5]:
print(ggl_header)
print("\n")
explore_data(ggl, 1, 4, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 10841
Number of columns: 13


## Dealing with Missing and Duplicate Data
One app in the Google Play data has a missing value: its *Category* field is absent, so the remaining values are shifted one column to the left. The problem is described in [this discussion](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015).

In [6]:
print(ggl_header) 
print(ggl[10472]) # This row is missing its Category value.

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


Since only one app is affected, deleting it does not cause serious data loss.

In [7]:
del ggl[10472]
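
Rather than hard-coding the index, a row with a missing field can also be located by comparing each row's length with the header's. The following is a minimal sketch, assuming a missing value shows up as a short row (as in the output above). The helper `find_short_rows` and the toy data are illustrative and not part of the original notebook.

```python
def find_short_rows(dataset, header):
    """Return the indices of rows whose length differs from the header's."""
    expected = len(header)
    return [i for i, row in enumerate(dataset) if len(row) != expected]


# Toy data for illustration only (not taken from the real file).
toy_header = ['App', 'Category', 'Rating']
toy_rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],            # 'Category' is missing, so the row is short
    ['App C', 'GAME', '4.5'],
]

bad_indices = find_short_rows(toy_rows, toy_header)
print(bad_indices)

# Delete from the end so that earlier indices stay valid.
for i in reversed(bad_indices):
    del toy_rows[i]
print(len(toy_rows))
```

Deleting in reverse order matters when more than one row is affected, because each `del` shifts the indices of the rows after it.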

Besides the malformed row, the data also contains repeated apps, which should be removed as well. As shown below, there are 1181 duplicate entries. The repeated rows shown here are in the Google Play data.

*Instagram* is one of the repeated apps. The code below shows that there are four *Instagram* entries in the Google Play data.

In [8]:
for app in ggl: 
    name = app[0]
    if name == "Instagram":
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


Next, we count how many repeated entries there are.

In [9]:
duplicate_apps = []
unique_apps = []

for app in ggl:
    name = app[0]
    if name not in unique_apps:
        unique_apps.append(name)
    else:
        duplicate_apps.append(name)

In [10]:
print(len(unique_apps))
print(len(duplicate_apps))

9659
1181
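
As a sanity check, 9659 + 1181 = 10840, which is the original 10841 rows minus the one deleted above. The same two counts can also be obtained with `collections.Counter`, which avoids the repeated list membership tests. This is a sketch on toy names (hypothetical data), not the notebook's own method.

```python
from collections import Counter

# Toy app names for illustration only.
toy_names = ['Instagram', 'Slack', 'Instagram', 'Instagram', 'Slack', 'Waze']

name_counts = Counter(toy_names)

# Each distinct name counts once as unique; every extra occurrence is a duplicate.
n_unique = len(name_counts)
n_duplicates = sum(count - 1 for count in name_counts.values())

print(n_unique)
print(n_duplicates)
```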


We need a strategy for choosing which of the repeated rows to keep. Looking at the Instagram rows above, the duplicates differ in their number of reviews. We keep the row with the highest review count, because it is probably the most recent.

First, we loop over the data to find the highest review count for every app, then check the result for *Instagram*.

In [11]:
reviews_max = {}

for app in ggl:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name]<n_reviews:
        reviews_max[name] = n_reviews
    if name not in reviews_max:
        reviews_max[name] = n_reviews
        
reviews_max["Instagram"]

66577446.0

Then we build a new list that holds, for each app, only the row with the maximum review count.

In [12]:
ggl_clean = []
already_added = []

for app in ggl:
    name = app[0]
    n_reviews = float(app[3])
    if n_reviews == reviews_max[name] and name not in already_added:
        ggl_clean.append(app)
        already_added.append(name)
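
The two steps above (find the maximum, then filter) can also be folded into a single pass that remembers the best row per app name. This is a sketch under the assumption that the name is in column 0 and the review count in column 3. `keep_most_reviewed` is an illustrative name, and the `Waze` row is made up.

```python
def keep_most_reviewed(dataset, name_index=0, reviews_index=3):
    """Keep, for each app name, the first row with the highest review count."""
    best = {}
    for row in dataset:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' keeps the earliest row when review counts are tied.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


toy_rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
    ['Waze', 'MAPS_AND_NAVIGATION', '4.6', '100'],  # hypothetical row
]

deduplicated = keep_most_reviewed(toy_rows)
for row in deduplicated:
    print(row)
```

The result is ordered by the first appearance of each name, which can differ from the order produced by the two-step version.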

Let's check the number of rows in the new, deduplicated data set.

In [13]:
explore_data(ggl_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


## Removing non-English Apps
We are only interested in apps aimed at an English-speaking audience, so apps in other languages are removed.

For this purpose, we write a function that returns `False` for any app name containing a non-ASCII character.

In [14]:
def eng_apps(string):
    for char in string:
        if ord(char) > 127:
            return False
    return True

Let's try it on some examples. 

In [15]:
print(eng_apps('Instagram'))
print(eng_apps('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(eng_apps('Docs To Go™ Free Office Suite'))
print(eng_apps('Instachat 😜'))

True
False
False
False


This is too strict: the function rejects some apps that we want to keep, such as names containing ™ or an emoji. We therefore modify the function above. The new version rejects only names with more than three non-ASCII characters.

In [16]:
def eng_apps(string):
    n_non_eng_char = 0
    for char in string:
        if ord(char) > 127:
            n_non_eng_char += 1
    if n_non_eng_char > 3:
        return False
    return True

With the new version of the function, the results are more sensible.

In [17]:
print(eng_apps('Instagram'))
print(eng_apps('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(eng_apps('Docs To Go™ Free Office Suite'))
print(eng_apps('Instachat 😜'))

True
False
True
True
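
To see why the threshold of three separates these examples, it helps to count the non-ASCII characters in each name directly. `count_non_ascii` below is an illustrative helper, not part of the original notebook.

```python
def count_non_ascii(string):
    """Count characters whose code point lies outside the ASCII range (0-127)."""
    return sum(1 for char in string if ord(char) > 127)


names = [
    'Instagram',
    '爱奇艺PPS -《欢乐颂2》电视剧热播',
    'Docs To Go™ Free Office Suite',
    'Instachat 😜',
]

for name in names:
    print(name, ':', count_non_ascii(name))
```

The trademark sign and the emoji each count as a single character, so those names stay under the limit, while the Chinese title exceeds it by a wide margin.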


Now, we will apply the function to all apps. 

In [18]:
apl_eng = []
ggl_eng = []

for app in ggl_clean:
    name = app[0]
    if eng_apps(name):
        ggl_eng.append(app)
        
for app in apl:
    name = app[1]
    if eng_apps(name):
        apl_eng.append(app)

Let's check the number of rows in both data sets.

In [19]:
explore_data(apl_eng, 0, 0, True)
print("\n")
explore_data(ggl_eng, 0, 0, True)

Number of rows: 6183
Number of columns: 16


Number of rows: 9614
Number of columns: 13


## Keeping Free Apps Only

At the beginning of the project, we stated that we would only examine free applications. We have to remove paid apps. 

In [20]:
apl_final = []
ggl_final = []

for app in apl_eng:
    price = app[4]
    if price == "0.0":
        apl_final.append(app)

for app in ggl_eng:
    price = app[7]
    if price == "0":
        ggl_final.append(app)
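
The exact string comparisons above work because free apps are recorded as `"0.0"` and `"0"` in the two files. A slightly more defensive variant converts the price to a number first. This is only a sketch: `is_free` is a hypothetical helper, and the assumption that paid prices may carry a `$` prefix is not confirmed by the rows shown in this notebook.

```python
def is_free(price):
    """Return True when a price string such as '0', '0.0' or '$0.00' is zero."""
    cleaned = price.replace('$', '').strip()  # assumed currency prefix
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Non-numeric text (for example a shifted column) is treated as not free.
        return False


for sample in ['0', '0.0', '$4.99', 'Everyone']:
    print(sample, ':', is_free(sample))
```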

Let's check the number of rows in the final versions of the data sets.

In [21]:
explore_data(apl_final, 0, 0, True)
print("\n")
explore_data(ggl_final, 0, 0, True)

Number of rows: 3222
Number of columns: 16


Number of rows: 8864
Number of columns: 13


## Data Analysis
So far, we've removed malformed data, duplicate apps, non-English apps and, finally, paid apps. The data sets are now ready for analysis.

### Most Common Apps by Genres
First, we create a function that builds a frequency table, in percentages, for any column of a data set.

In [22]:
def freq_table(dataset, index):
    frq_table = {}

    for row in dataset:
        if row[index] in frq_table:
            frq_table[row[index]] += 1/len(dataset)*100
        else: 
            frq_table[row[index]] = 1/len(dataset)*100
    
    return frq_table

Next, we create a second function that takes the frequency table and prints it sorted from the highest frequency to the lowest.

In [23]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
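
An alternative sketch of the same idea uses `collections.Counter` to count first and divide once at the end. Adding `1/len(dataset)*100` once per row accumulates small floating-point rounding errors, whereas a single division per key does not. `freq_table_pct` and the toy rows are illustrative names and data.

```python
from collections import Counter


def freq_table_pct(dataset, index):
    """Return {value: percentage of rows} for the column at the given index."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return {key: count / total * 100 for key, count in counts.items()}


# Toy rows for illustration only.
toy = [['a', 'GAME'], ['b', 'GAME'], ['c', 'TOOLS'], ['d', 'GAME']]

table = freq_table_pct(toy, 1)

# Sort by percentage, highest first, as display_table does.
for key, pct in sorted(table.items(), key=lambda item: item[1], reverse=True):
    print(key, ':', pct)
```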

We can now generate frequency tables for the *Category* and *Genres* columns of the Google Play apps.

In [24]:
display_table(ggl_final, 1)

FAMILY : 18.907942238266926
GAME : 9.724729241877363
TOOLS : 8.46119133574016
BUSINESS : 4.591606498194979
LIFESTYLE : 3.90342960288811
PRODUCTIVITY : 3.8921480144404565
FINANCE : 3.7003610108303455
MEDICAL : 3.5311371841155417
SPORTS : 3.3957581227436986
PERSONALIZATION : 3.3167870036101235
COMMUNICATION : 3.2378158844765483
HEALTH_AND_FITNESS : 3.079873646209398
PHOTOGRAPHY : 2.944494584837555
NEWS_AND_MAGAZINES : 2.7978339350180583
SOCIAL : 2.6624548736462152
TRAVEL_AND_LOCAL : 2.335288808664261
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.14350180505415
DATING : 1.861462093862813
VIDEO_PLAYERS : 1.7937725631768928
MAPS_AND_NAVIGATION : 1.398916967509025
FOOD_AND_DRINK : 1.2409747292418778
EDUCATION : 1.1620036101083042
ENTERTAINMENT : 0.9589350180505433
LIBRARIES_AND_DEMO : 0.9363718411552363
AUTO_AND_VEHICLES : 0.9250902527075828
HOUSE_AND_HOME : 0.8235559566787015
WEATHER : 0.8009927797833946
EVENTS : 0.7107400722021667
PARENTING : 0.6543321299638993
ART_AND_DESIGN : 0.6

In [25]:
display_table(ggl_final, -4) # genres
# tools and entertainment

Tools : 8.449909747292507
Entertainment : 6.069494584837599
Education : 5.34747292418777
Business : 4.591606498194979
Productivity : 3.8921480144404565
Lifestyle : 3.8921480144404565
Finance : 3.7003610108303455
Medical : 3.5311371841155417
Sports : 3.46344765342962
Personalization : 3.3167870036101235
Communication : 3.2378158844765483
Action : 3.1024368231047053
Health & Fitness : 3.079873646209398
Photography : 2.944494584837555
News & Magazines : 2.7978339350180583
Social : 2.6624548736462152
Travel & Local : 2.3240072202166075
Shopping : 2.2450361010830324
Books & Reference : 2.14350180505415
Simulation : 2.041967509025268
Dating : 1.861462093862813
Arcade : 1.8501805054151597
Video Players & Editors : 1.771209386281586
Casual : 1.7599277978339327
Maps & Navigation : 1.398916967509025
Food & Drink : 1.2409747292418778
Puzzle : 1.1281588447653441
Racing : 0.9927797833935037
Role Playing : 0.9363718411552363
Libraries & Demo : 0.9363718411552363
Auto & Vehicles : 0.9250902527075828


We can also generate a frequency table for the *prime_genre* column of the App Store apps.

In [26]:
display_table(apl_final, -5) # prime genre

# 1. Games, 2. Entertainment
# apps for entertainment far outnumber apps for practical use

Games : 58.1626319056464
Entertainment : 7.883302296710134
Photo & Video : 4.965859714463075
Education : 3.6623215394165176
Social Networking : 3.2898820608317867
Shopping : 2.6070763500931133
Utilities : 2.5139664804469306
Sports : 2.1415270018621997
Music : 2.048417132216017
Health & Fitness : 2.0173805090006227
Productivity : 1.7380509000620747
Lifestyle : 1.5828677839851035
News : 1.3345747982619496
Travel : 1.2414649286157668
Finance : 1.1173184357541899
Weather : 0.8690254500310364
Food & Drink : 0.8069522036002481
Reference : 0.558659217877095
Business : 0.5276225946617009
Book : 0.4345127250155184
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


### Most Popular Apps by Genres

The App Store data has no download counts, so we use the total number of user ratings (`rating_count_tot`) as a proxy for popularity and compute its average for each genre.

In [27]:
dwnl_count = {}

for genre in freq_table(apl_final, -5):
    n = 1
    for row in apl_final:
        if row[-5]==genre and genre not in dwnl_count:
            dwnl_count[genre] = int(row[5])
        elif row[-5]==genre and genre in dwnl_count:
            n += 1
            dwnl_count[genre] += int(row[5])
    dwnl_count[genre] /= n

dwnl_count

{'Social Networking': 71548.34905660378,
 'Photo & Video': 28441.54375,
 'Games': 22788.6696905016,
 'Music': 57326.530303030304,
 'Reference': 74942.11111111111,
 'Health & Fitness': 23298.015384615384,
 'Weather': 52279.892857142855,
 'Utilities': 18684.456790123455,
 'Travel': 28243.8,
 'Shopping': 26919.690476190477,
 'News': 21248.023255813954,
 'Navigation': 86090.33333333333,
 'Lifestyle': 16485.764705882353,
 'Entertainment': 14029.830708661417,
 'Food & Drink': 33333.92307692308,
 'Sports': 23008.898550724636,
 'Book': 39758.5,
 'Finance': 31467.944444444445,
 'Education': 7003.983050847458,
 'Productivity': 21028.410714285714,
 'Business': 7491.117647058823,
 'Catalogs': 4004.0,
 'Medical': 612.0}
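
The nested loop above scans the whole data set once per genre. A single pass that accumulates a total and a count per group gives the same averages. The sketch below uses toy rows, and `average_by_group` is an illustrative name, not the notebook's function.

```python
from collections import defaultdict


def average_by_group(dataset, group_index, value_index):
    """Average the numeric column at value_index for each value of group_index."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Toy rows for illustration only: [name, genre, rating count].
toy_rows = [
    ['App A', 'Navigation', '10'],
    ['App B', 'Navigation', '20'],
    ['App C', 'Reference', '30'],
]

averages = average_by_group(toy_rows, 1, 2)
print(averages)
```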

The most popular apps in the Navigation genre of the App Store are listed below.

Waze and Google Maps have far more ratings than the other navigation apps.
Since there are only 6 navigation apps, these two alone raise the genre's average considerably.

In [28]:
for app in apl_final:
    if app[-5] == 'Navigation':
        print(app[1], ':', app[5])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5
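
One way to make the effect of the two large apps visible is to compare the mean with the median, which is much less sensitive to outliers. The sketch below uses the six rating counts printed above. Using the median here is a suggestion, not something the original analysis does.

```python
from statistics import mean, median

# Rating counts of the six navigation apps listed above.
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

nav_mean = mean(navigation_ratings)
nav_median = median(navigation_ratings)

print('mean  :', nav_mean)
print('median:', nav_median)
```

The mean matches the Navigation average computed earlier, while the median is roughly ten times smaller, which reflects how much Waze and Google Maps pull the average up.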


The most popular apps in the Reference genre of the App Store are listed below.

The situation is similar for reference apps. Bible, the most popular app, has more ratings than all the other apps in the genre combined. Building an app in this genre just because of its high average may therefore not be a good idea: apart from Bible, the numbers are not very high.

In [29]:
for app in apl_final:
    if app[-5] == 'Reference':
        print(app[1], ':', app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


After analyzing the apps in the App Store, we turn to Google Play.

Google Play presents download information differently: install numbers are given as ranges.

Let's examine frequencies of applications according to the download range.

In [30]:
display_table(ggl_final, 5)

1,000,000+ : 15.726534296029072
100,000+ : 11.552346570397244
10,000,000+ : 10.548285198556075
10,000+ : 10.198555956678813
1,000+ : 8.393501805054239
100+ : 6.915613718411619
5,000,000+ : 6.82536101083039
500,000+ : 5.561823104693188
50,000+ : 4.772111913357437
5,000+ : 4.512635379061404
10+ : 3.5424187725631953
500+ : 3.249097472924202
50,000,000+ : 2.3014440433213004
100,000,000+ : 2.1322202166064965
50+ : 1.9178700361010799
5+ : 0.7897111913357411
1+ : 0.5076714801444041
500,000,000+ : 0.2707581227436822
1,000,000,000+ : 0.22563176895306852
0+ : 0.04512635379061372
0 : 0.01128158844765343


We convert the values in this column from ranges to numbers, because we need the total number of downloads in each category.

If an app has been downloaded 35,000 times, the value in the *Installs* column is `10,000+`. Since we do not know the exact number (35,000), we count 10,000 downloads for this app. This method is not precise, but it is good enough for an overall picture.
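
The conversion itself is just a matter of stripping the commas and the plus sign. Here is a minimal sketch, with `parse_installs` as an illustrative helper name:

```python
def parse_installs(installs):
    """Convert an install range such as '10,000+' to its lower bound as an int."""
    return int(installs.replace(',', '').replace('+', ''))


for sample in ['10,000+', '1,000,000,000+', '0']:
    print(sample, '->', parse_installs(sample))
```

Because every value is a lower bound, averages computed from these numbers underestimate the true install counts.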

In [32]:
for genre in freq_table(ggl_final, 1):
    total = 0
    len_category = 0
    for row in ggl_final:
        cat_app = row[1]
        if genre==cat_app:
            n_installs = row[5]
            n_installs = n_installs.replace(",", "")
            n_installs = n_installs.replace("+", "")
            total += float(n_installs)
            len_category += 1
    av_num_inst = total/len_category
    print(genre, av_num_inst)

ART_AND_DESIGN 1986335.0877192982
AUTO_AND_VEHICLES 647317.8170731707
BEAUTY 513151.88679245283
BOOKS_AND_REFERENCE 8767811.894736841
BUSINESS 1712290.1474201474
COMICS 817657.2727272727
COMMUNICATION 38456119.167247385
DATING 854028.8303030303
EDUCATION 1833495.145631068
ENTERTAINMENT 11640705.88235294
EVENTS 253542.22222222222
FINANCE 1387692.475609756
FOOD_AND_DRINK 1924897.7363636363
HEALTH_AND_FITNESS 4188821.9853479853
HOUSE_AND_HOME 1331540.5616438356
LIBRARIES_AND_DEMO 638503.734939759
LIFESTYLE 1437816.2687861272
GAME 15588015.603248259
FAMILY 3695641.8198090694
MEDICAL 120550.61980830671
SOCIAL 23253652.127118643
SHOPPING 7036877.311557789
PHOTOGRAPHY 17840110.40229885
SPORTS 3638640.1428571427
TRAVEL_AND_LOCAL 13984077.710144928
TOOLS 10801391.298666667
PERSONALIZATION 5201482.6122448975
PRODUCTIVITY 16787331.344927534
PARENTING 542603.6206896552
WEATHER 5074486.197183099
VIDEO_PLAYERS 24727872.452830188
NEWS_AND_MAGAZINES 9549178.467741935
MAPS_AND_NAVIGATION 4056941.774193

Communication apps have, on average, the most installs on Google Play. However, as we saw before, a few giants such as WhatsApp, Facebook and Skype probably inflate this average.

In [33]:
for row in ggl_final:
    if row[1] == 'COMMUNICATION' and (row[5] == '1,000,000,000+'
                                      or row[5] == '500,000,000+'
                                      or row[5] == '100,000,000+'):
        print(row[0], ':', row[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

Let's check how the average changes if we remove the apps with 100 million or more installs.

In [34]:
under_100_m = []

for row in ggl_final:
    n_installs = row[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (row[1] == 'COMMUNICATION') and (float(n_installs) < 100000000):
        under_100_m.append(float(n_installs))
        
sum(under_100_m) / len(under_100_m)

3603485.3884615386

The average drops from 38,456,119 to 3,603,485, which is roughly one tenth (about 9.4%).

The second highest category is "video players". This category also contains big companies such as YouTube and Netflix. The same can be said of the *Social* category, which includes apps such as Facebook and Instagram.

Let's look at the apps in the "Books and Reference" category on Google Play. The comparable Reference genre had one of the highest averages in the App Store.

In [35]:
for row in ggl_final:
    if row[1] == 'BOOKS_AND_REFERENCE':
        print(row[0], ':', row[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E

First, we'll look at apps that have at least 100,000,000 installs.

In [36]:
for row in ggl_final:
    if row[1] == 'BOOKS_AND_REFERENCE' and (row[5] == '1,000,000,000+'
                                            or row[5] == '500,000,000+'
                                            or row[5] == '100,000,000+'):
        print(row[0], ':', row[5])

Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad 📖 Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+


Now let's look at apps that are quite popular, although not as popular as the five above.

In [37]:
for row in ggl_final:
    if row[1] == 'BOOKS_AND_REFERENCE' and (row[5] == '1,000,000+'
                                            or row[5] == '5,000,000+'
                                            or row[5] == '10,000,000+'
                                            or row[5] == '50,000,000+'):
        print(row[0], ':', row[5])

Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
ReadEra – free ebook reader : 1,000,000+
Ebook Reader : 5,000,000+
Read books online : 5,000,000+
eBoox: book reader fb2 epub zip : 1,000,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
Moon+ Reader : 10,000,000+
English-Myanmar Dictionary : 1,000,000+
Golden Dictionary (EN-AR) : 1,000,000+
All Language Translator Free : 1,000,000+
Aldiko Book Reader : 10,000,000+
Dictionary - WordWeb : 5,000,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Al-Quran (Free) : 10,000,000+
Al Quran Indonesia : 10,000,000+
Al'Quran Bahasa Indonesia : 10,000,000+
Al Quran Al karim : 1,000,000+
Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+
Koran Read &MP3 30 Juz Offline : 1,000,000+
H

There seems to be a fairly large number of popular apps in this range. This category can be recommended to app developers who want to earn money.

## Conclusion

In this project, we took the role of data scientists at a company that develops apps, and tried to decide which kind of app could be most profitable for the company.

We had two data sets describing the apps available on Google Play and the App Store. We analyzed them to find a category that is popular but still has room for a new app.

Some of the popular categories are dominated by giant firms. We eliminated these categories because it is not realistic to compete with those companies, and the other apps in them have few downloads.

In other popular categories, competition is tough because there are too many apps, which makes it difficult to stand out. The *Game* category is a good example.

Under these conditions, "Books and Reference" was identified as the most suitable category. It contains many moderately popular apps, and we believe we can compete with them. If our company can develop a good app in this category, it could be very profitable.