# Profitable App Profiles on the App Store and Google Play Markets
In this project, we try to understand what types of apps are most attractive to mobile users.


In [2]:
from csv import reader

### The Google Play data set ###
# `with` closes each file once it has been read; newline='' is what the csv docs recommend
with open('googleplaystore.csv', encoding='utf8', newline='') as opened_file:
    android = list(reader(opened_file))
android_header = android[0]
android = android[1:]

### The App Store data set ###
with open('AppleStore.csv', encoding='utf8', newline='') as opened_file:
    ios = list(reader(opened_file))
ios_header = ios[0]
ios = ios[1:]

In [3]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [4]:
explore_data(android, 0, 5, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


Number of rows: 10841
Number of columns: 13


In [5]:
explore_data(ios, 0, 5, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows: 7197
Number of columns: 16


In [6]:
print(android_header, "\n", ios_header)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'] 
 ['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


In the iOS dataset, I think the most influential variables are `user_rating`, `price`, and `prime_genre`. Variables whose impact will be harder to interpret include the current-version ratings (`rating_count_ver`, `user_rating_ver`), device support (`sup_devices.num`), and language support (`lang.num`).

The Android dataset has fewer columns, but I think the same main variables will offer the most insight: `Category`, `Reviews`, and `Price`.
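Several cells below select columns by position (for example `-5` for the iOS genre and `-4` for the Android genres). As a small sketch, the header lists printed above can be used to look those positions up by name, which makes the index numbers easier to check. The helper `column_index` and the list names `android_columns` / `ios_columns` are hypothetical.

```python
# Header rows as printed by the cell above (copied verbatim)
android_columns = ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type',
                   'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver',
                   'Android Ver']
ios_columns = ['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot',
               'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating',
               'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


def column_index(header, name):
    """Hypothetical helper: return the position of a column name in a header row."""
    return header.index(name)


print(column_index(ios_columns, 'prime_genre'))   # 11, the same column as index -5
print(column_index(android_columns, 'Installs'))  # 5
print(column_index(android_columns, 'Genres'))    # 9, the same column as index -4
```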
## Now let's look into the data and see if there are any missing or incorrect values
Our goal is to find the app profiles that appeal to English-speaking users of free apps. As such, we can remove apps aimed at non-English-speaking markets, since their audience is different.

In [7]:
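# let's write a function to check for inconsistencies in our data
def badEntry(data, headerLen):
    # map each row's position to the row itself when its length differs from the header's;
    # enumerate gives the true position, whereas data.index(row) rescans the list and
    # returns the first row equal to this one
    badEntries = {}
    for index, row in enumerate(data):
        if len(row) != headerLen:
            badEntries[index] = row
    return badEntries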

In [8]:
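# check for rows whose length differs from the header
missingAndroid = badEntry(android, len(android_header))
print(missingAndroid)
# row 10472 is missing its Category value; we could look the app up on the Play Store
# and fill it in, but for this analysis we'll just delete it
# (deleting from the highest index down keeps the remaining indices valid)
for index in sorted(missingAndroid, reverse=True):
    del android[index]
print(len(android))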

{10472: ['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']}
10840
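The row has to go before any numeric processing. It has 12 fields instead of 13 because the Category value is missing, so every later value sits one column to the left. The sketch below (the helper `align` is hypothetical) pairs the header with the row by position and shows that index 3, which the review-count code later converts with `float()`, holds the size `'3.0M'` rather than a review count.

```python
# Header and the short row reported by badEntry above (both copied from the output)
android_columns = ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type',
                   'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver',
                   'Android Ver']
short_row = ['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+',
             'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


def align(header, row):
    """Pair each column name with the value in the same position.

    zip() stops at the shorter list, so a row with a missing field is
    silently paired with the wrong column names from that point on.
    """
    return dict(zip(header, row))


aligned = align(android_columns, short_row)
print(len(android_columns), len(short_row))  # 13 12
print(aligned['Category'])                    # 1.9  (actually the rating)
print(aligned['Reviews'])                     # 3.0M (actually the size)

try:
    float(short_row[3])  # the conversion the review-count code relies on
except ValueError as err:
    print('cannot convert:', err)
```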


In [9]:
# now let's check the ios dataset
missingIos = badEntry(ios, len(ios_header))
print(missingIos)

{}


Now we need to look for duplicates. I expect at least a few, since the data may contain several entries for the same app (for example, different versions), and each entry would be treated as a separate app.

In [10]:
# let's write a function to check for our duplicates
def duplicateDetector(data):
    duplicates = {}
    duplicateList = []
    for row in data:
        name = row[0]
        if name in duplicates:
            duplicates[name] += 1
            duplicateList.append(name)
        else:
            duplicates[name] = 1
    return duplicateList
            

In [11]:
androidDuplicates = duplicateDetector(android)
iosDuplicates = duplicateDetector(ios)
print("Google Play duplicates:", len(androidDuplicates),androidDuplicates[:3], "\n","IOS Duplicates", len(iosDuplicates), iosDuplicates[:3])


Google Play duplicates: 1181 ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business'] 
 IOS Duplicates 0 []
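`duplicateDetector` appends a name every time it reappears, so the 1181 reported for Google Play is the number of redundant rows, not the number of distinct duplicated names. That is consistent with the later cleaning step: 10840 - 1181 = 9659, the row count printed after de-duplication. A sketch using `collections.Counter` reports both figures; the function name and toy rows are made up for illustration, and `name_index` selects which column holds the name.

```python
from collections import Counter


def duplicate_summary(rows, name_index=0):
    """Return (redundant_rows, names_with_duplicates) for the given name column."""
    counts = Counter(row[name_index] for row in rows)
    # every occurrence after the first is a redundant row
    redundant_rows = sum(n - 1 for n in counts.values() if n > 1)
    names_with_duplicates = sorted(name for name, n in counts.items() if n > 1)
    return redundant_rows, names_with_duplicates


# Made-up rows: 'Box' appears three times and 'Slack' twice
toy = [['Box'], ['Slack'], ['Box'], ['Instagram'], ['Slack'], ['Box']]
print(duplicate_summary(toy))  # (3, ['Box', 'Slack'])
```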


As we can see, there are many duplicate entries in the Android dataset but none in the iOS dataset. (For the iOS data, column 0 is the numeric `id` rather than the app name, so this check finds repeated IDs, not repeated names.) Now we need a way to choose which entry to keep. There are a few options:

1. Use the most recent version of the app, since that is hopefully the best version yet.
2. Use the entry with the most reviews, since that might indicate it is the best version of the app.
3. Do something more mathematical, such as aggregating all duplicate entries and averaging their review counts.
  
The third method has real benefits, especially if we planned to do any predictive modeling on this dataset. However, I think the simplest and most effective method is option 2: keep the entry with the most reviews.

In [12]:
# we're going to do this by making a few functions to help us first
# the first function finds the highest review count recorded for each app name
def maxReviews(data):
    reviews = {}
    for app in data:
        name = app[0]
        nReviews = float(app[3])  # the csv reader returns every field as a string
        # record the count if the name is new or this entry has more reviews
        if name not in reviews or reviews[name] < nReviews:
            reviews[name] = nReviews
    return reviews

In [13]:
# now we create the cleaning function using our chosen duplicate removal method
def duplicateRemover(data, maxReviewCounts):
    clean = []
    added = set()
    # `added` tracks names we have already kept: some apps have several entries
    # with the same maximum review count, and we only want one row per app
    for app in data:
        name = app[0]
        reviews = float(app[3])
        if maxReviewCounts[name] == reviews and name not in added:
            clean.append(app)
            added.add(name)
    return clean

In [14]:
reviewList = maxReviews(android)
androidClean = duplicateRemover(android, reviewList)
explore_data(androidClean, 0, 3, True)


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13
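`maxReviews` and `duplicateRemover` make two passes over the data. As an alternative sketch (the function name is hypothetical), the same rule, keeping the first entry with the highest review count, can be applied in a single pass by storing the best row seen so far for each name:

```python
def dedupe_keep_most_reviews(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}  # name -> row with the most reviews seen so far
    for row in rows:
        name = row[name_index]
        reviews = float(row[reviews_index])
        # strict '>' keeps the earlier row when review counts tie
        if name not in best or reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Made-up rows in the Google Play layout (name at 0, review count at 3)
toy = [
    ['Box', 'BUSINESS', '4.2', '100'],
    ['Slack', 'BUSINESS', '4.4', '50'],
    ['Box', 'BUSINESS', '4.2', '130'],
    ['Box', 'BUSINESS', '4.3', '130'],
]
for row in dedupe_keep_most_reviews(toy):
    print(row)
```

The set of kept rows should match `duplicateRemover`, but the order differs: rows appear in the order each name was first seen, not in the position of the kept row.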


## Apps that don't fit our profile
Our fictional company operates primarily in English-speaking markets, so we want data and insights from apps built for those audiences. To gear our dataset toward them, we'll remove any apps whose names contain non-English characters.

In [16]:
# this function counts the non-ASCII characters (code points above 127) in a string
# and treats it as English if there are three or fewer; allowing a few avoids dropping
# apps whose names include emojis or symbols such as ™
def isEnglish(string):
    non_ascii = 0

    for character in string:
        if ord(character) > 127:
            non_ascii += 1

    return non_ascii <= 3
# this is not a perfect filter (a name in another language with three or fewer
# non-ASCII characters still passes), but for this use case it is enough
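A regular expression can express the same counting rule more concisely. The sketch below (the function name is hypothetical) is equivalent to `isEnglish`, not more accurate; a genuinely better filter would need a different rule, such as a language-detection step. The sample names are for illustration only.

```python
import re

# matches any single character outside the 7-bit ASCII range (code points above 127)
NON_ASCII = re.compile(r'[^\x00-\x7F]')


def is_probably_english(name, max_non_ascii=3):
    """Same rule as isEnglish: allow at most `max_non_ascii` non-ASCII characters."""
    return len(NON_ASCII.findall(name)) <= max_non_ascii


print(is_probably_english('Instagram'))                      # True
print(is_probably_english('Docs To Go™ Free Office Suite'))  # True (one non-ASCII character)
print(is_probably_english('Instachat 😜'))                   # True (one emoji)
print(is_probably_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))    # False
```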


In [17]:
androidEnglish = []
iosEnglish = []

for app in androidClean:
    name = app[0]
    if isEnglish(name):
        androidEnglish.append(app)
        
for app in ios:
    name = app[1]
    if isEnglish(name):
        iosEnglish.append(app)
        
explore_data(iosEnglish, 0, 3, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 6183
Number of columns: 16


## Remove any paid apps
Our final data cleaning step is to remove any paid apps from our dataset. This is because our company only makes free apps and we want to create the most relevant profile to understand what makes a great free application.
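The filtering cell compares price strings exactly (`'0'` for Google Play, `'0.0'` for the App Store). A slightly more defensive sketch converts the price to a number first, so both formats pass through one check. The function name is hypothetical, and the `$` prefix on paid Google Play prices is an assumption about that column's format.

```python
def is_free(price):
    """Return True when a price string represents zero.

    Assumes free prices look like '0' (Google Play) or '0.0' (App Store) and that
    paid Google Play prices may carry a leading '$' (for example '$4.99').
    """
    return float(price.strip().lstrip('$')) == 0.0


for price in ['0', '0.0', '$4.99', '2.99']:
    print(price, is_free(price))
```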

In [18]:
androidFinal = []
iosFinal = []

for app in androidEnglish:
    price = app[7]
    if price == '0':
        androidFinal.append(app)
        
for app in iosEnglish:
    price = app[4]
    if price == '0.0':
        iosFinal.append(app)
        
explore_data(androidFinal, 0, 3, True)
print("\n")
explore_data(iosFinal, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 8864
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 3222
Number of columns: 16

This is the end of our data cleaning process. We ended up with 8864 rows in the Android dataset and 3222 rows in the iOS dataset. For our purposes, this is enough for some exploratory data analysis (EDA), and it is also large enough for hypothesis testing. We might only run into issues if we chose to do predictive modeling with this data, as it may be on the small side for building a statistically sound model.

## Analysis
### Apps by Genre
Since we plan to release our app on both Google Play and the App Store, we need to make sure it appeals to both markets. First, we will investigate which genres are most common in each market. We'll start by writing a few functions.

In [19]:
def freqTable(data, index):
    table = {}
    total = len(data)
    for row in data:
        value = row[index]
        if value in table:
            table[value] +=1
        else:
            table[value] = 1
    percentTable = {}
    for key in table:
        percent = (table[key] / total) * 100
        percentTable[key] = percent
    return percentTable

def printTable(data, index):
    table = freqTable(data, index)
    displayTable = []
    for key in table:
        keyVal = (table[key], key)
        displayTable.append(keyVal)
    sortedTable = sorted(displayTable, reverse = True)
    for row in sortedTable:
        print(row[1], ":", row[0])
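For comparison, `collections.Counter` can build the same percentage table in fewer lines. This is a sketch with a hypothetical function name and made-up rows. One small difference: `printTable` breaks ties by sorting the genre names in reverse (which is why Navigation prints before Medical), while `Counter.most_common()` keeps tied values in the order they were first seen.

```python
from collections import Counter


def freq_table_counter(rows, index):
    """Percentage of rows for each value in column `index`, largest share first."""
    counts = Counter(row[index] for row in rows)
    total = len(rows)
    return [(value, count / total * 100) for value, count in counts.most_common()]


toy = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
for genre, percent in freq_table_counter(toy, -1):
    print(genre, ':', percent)
# Games : 75.0
# Music : 25.0
```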

In [20]:
# now let's go through the ios genres
printTable(iosFinal, -5)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


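As we can see, by far the most common genre among free English apps in the App Store is Games (about 58%). The next two genres, Entertainment (about 7.9%) and Photo & Video (about 5%), are also entertainment-oriented, which suggests the App Store is dominated by free apps designed for fun. This does not mean they are the most used apps, though: these percentages measure how many apps exist in each genre (supply), not how many people use them (demand).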
  
Google Play categorizes apps in more detail. Each app has a single `Category` label, but it can have multiple genres in the `Genres` column (for example, `Art & Design;Pretend Play`).
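In the Genres column, a value such as `Art & Design;Pretend Play` holds two genres separated by a semicolon, and `printTable` counts that combined string as its own entry. If per-genre counts were wanted instead, a sketch like this one (hypothetical function name) splits the values first. The sample values come from the first five Google Play rows shown earlier.

```python
from collections import Counter


def count_genres(genre_fields):
    """Count each genre separately, splitting multi-genre values on ';'."""
    counts = Counter()
    for field in genre_fields:
        for genre in field.split(';'):
            counts[genre] += 1
    return counts


# Genres values from the first five rows of the Google Play data
fields = ['Art & Design', 'Art & Design;Pretend Play', 'Art & Design',
          'Art & Design', 'Art & Design;Creativity']
print(count_genres(fields))
```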

In [21]:
printTable(androidFinal, 1)

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

In [22]:
# and now by genre
printTable(androidFinal, -4)

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

The Genres column might be too granular for a preliminary analysis, so we'll stick to Category, which still gives us a good sense of what is happening in both markets.
  
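To better gauge which types of apps users like most, we can compute the average number of installs per genre. This shows whether the App Store is truly dominated by games in terms of usage or simply has a lot of games. This is straightforward for the Google Play data, which has an `Installs` column; the App Store data does not, so we need a proxy.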
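One way to compute a per-genre average is a single pass that keeps a running total and count for each genre, rather than looping over every app once per genre. A minimal sketch, with a hypothetical function name and made-up rows:

```python
from collections import defaultdict


def average_by_group(rows, group_index, value_index):
    """Mean of the numeric column `value_index` for each distinct value in `group_index`."""
    totals = defaultdict(float)  # running sum per group
    counts = defaultdict(int)    # number of rows per group
    for row in rows:             # single pass over the data
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Made-up rows laid out as [name, genre, rating count]
toy = [['A', 'Games', '10'], ['B', 'Games', '30'], ['C', 'Music', '5']]
print(average_by_group(toy, 1, 2))  # {'Games': 20.0, 'Music': 5.0}
```

On the real data, `average_by_group(iosFinal, -5, 5)` would average `rating_count_tot` by `prime_genre`, and the work grows with the number of apps rather than apps times genres.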

In [23]:
"""
Below we will create the average-ratings-per-genre table. The App Store dataset does not have a variable that states
total installs. Instead, we will use the rating_count_tot variable, which gives the total number of ratings of an app
across all versions. This is not a perfect proxy, but it is the best we have given our data.
"""
iosGenres = freqTable(iosFinal, -5)

for genre in iosGenres:
    total = 0   # sum of rating_count_tot for apps in this genre
    nGenre = 0  # number of apps in this genre
    for app in iosFinal:
        appGenre = app[-5]
        if appGenre == genre:  # only count apps that belong to the current genre
            total += float(app[5])
            nGenre += 1
    avgGenreRatings = total / nGenre
    print(genre, ":", avgGenreRatings)

Without the `if appGenre == genre:` check, every app is counted for every genre, so each genre prints the same value, 24824.74, which is simply the average rating count across all free English iOS apps. With the check, each genre gets its own average; for example, the six Navigation apps listed below average about 86,090 ratings and the 18 Reference apps about 74,942.


There are many insights we could gather from this table, but a few very popular apps may be skewing some of the averages. Let's investigate one of the genres with the most ratings, Navigation.

In [24]:
for app in iosFinal:
    if app[-5] == "Navigation":
        print(app[1], ":", app[5])  # app[5] is rating_count_tot, the total number of ratings

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


As we can see, Waze and Google Maps together account for almost half a million ratings (499,957 of the genre's 516,542). That makes it very unlikely that a new navigation app could disrupt this market, since Google owns both Waze and Google Maps and has a significant advantage. This pattern likely holds in the Social Networking and Music genres as well, where a few apps control a large share of the market and skew our analysis. The Reference genre may be similar, but let's look at it anyway.
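One way to quantify that skew is to compare the mean with the median for the six Navigation apps listed above. The mean is pulled up by the two Google-owned apps, while the median reflects a more typical app in the genre. The figures below are computed directly from that listing.

```python
from statistics import mean, median

# rating_count_tot for the six free English Navigation apps listed above
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

print(round(mean(navigation_ratings), 2))  # 86090.33
print(median(navigation_ratings))          # 8196.5
top_two_share = (345046 + 154911) / sum(navigation_ratings)
print(round(top_two_share * 100, 1))       # 96.8 (percent held by Waze and Google Maps)
```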

In [25]:
for app in iosFinal:
    if app[-5] == "Reference":
        print(app[1], ":", app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


One point to note about this category is that apps based on the Bible and other popular religious texts are relatively easy to make, since many of those texts (or older translations of them) are in the public domain. One option would be to create a Bible app that offers more features than the existing apps, which might be enough to turn a profit.

Now we can look at the Google Play data. Note that its install counts are not exact numbers but open-ended buckets such as 100+, 1,000+, and 100,000+, so an app listed as 100,000+ could have anywhere between 100,000 and 500,000 installs. The data is less precise, but it is still usable for our exploratory analysis; it would not be suitable if we were trying to build a predictive model.
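The next cell strips the commas and the plus sign, which treats each bucket as its lower bound: an app listed as `10,000+` counts as exactly 10,000 installs even if it has more. Here is a sketch of that conversion as a standalone function (hypothetical name):

```python
def installs_lower_bound(value):
    """Convert a Google Play install bucket such as '1,000,000+' to its lower bound."""
    return int(value.replace(',', '').replace('+', ''))


for bucket in ['100+', '10,000+', '1,000,000,000+']:
    print(bucket, '->', installs_lower_bound(bucket))
```

Because every app is rounded down to its bucket's floor, category averages computed this way are underestimates.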

In [26]:
androidGenres = freqTable(androidFinal, 1)

for genre in androidGenres:
    total = 0   # sum of installs for apps in this category
    nGenre = 0  # number of apps in this category
    for app in androidFinal:
        appGenre = app[1]
        if appGenre == genre:
            # '1,000,000+' -> 1000000: treat each install bucket as its lower bound
            installs = app[5].replace(",", "").replace("+", "")
            total += float(installs)
            nGenre += 1
    avgInstalls = total / nGenre
    print(genre, ":", avgInstalls)
        

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_

Communication apps have the highest average number of installs in the output above (about 38.5 million), though we'll need to check whether a few apps are skewing that category and some of the others. We already suspect that social networking and navigation suffer from the same issue as in the App Store.

In [27]:
for app in androidFinal:
    if app[1] == 'COMMUNICATION' and app[5] in ('1,000,000,000+',
                                               '500,000,000+',
                                               '100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

We see the same trend in other categories such as VIDEO_PLAYERS, where YouTube and Google Play Movies are by far the most installed apps.
  
Let's look at the BOOKS_AND_REFERENCE category, since our App Store analysis suggests that books are the most viable genre for us to enter.

In [28]:
for app in androidFinal:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E

This category is similar to what we saw in the App Store, but it also includes apps that convert text files to e-books, e-book reader apps, tutorials, and individual books.

In [29]:
for app in androidFinal:
    if app[1] == 'BOOKS_AND_REFERENCE' and app[5] in ('1,000,000,000+',
                                                     '500,000,000+',
                                                     '100,000,000+'):
        print(app[0], ':', app[5])

Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad 📖 Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+


In [30]:
for app in androidFinal:
    if app[1] == 'BOOKS_AND_REFERENCE' and app[5] in ('1,000,000+',
                                                     '5,000,000+',
                                                     '10,000,000+',
                                                     '50,000,000+'):
        print(app[0], ':', app[5])

Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
ReadEra – free ebook reader : 1,000,000+
Ebook Reader : 5,000,000+
Read books online : 5,000,000+
eBoox: book reader fb2 epub zip : 1,000,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
Moon+ Reader : 10,000,000+
English-Myanmar Dictionary : 1,000,000+
Golden Dictionary (EN-AR) : 1,000,000+
All Language Translator Free : 1,000,000+
Aldiko Book Reader : 10,000,000+
Dictionary - WordWeb : 5,000,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Al-Quran (Free) : 10,000,000+
Al Quran Indonesia : 10,000,000+
Al'Quran Bahasa Indonesia : 10,000,000+
Al Quran Al karim : 1,000,000+
Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+
Koran Read &MP3 30 Juz Offline : 1,000,000+
H

Again, a few apps dominate the market, but there are also many apps built around individual, freely available texts (such as the Quran apps), much like the Bible apps we saw in the App Store. This is a good sign, as it suggests there is room in the market for a new entry.

## Conclusions
In this project, we analyzed the iOS App Store and Google Play datasets to see which kinds of apps are most popular, to guide our decision about what kind of app to make. I think the best option would be to build an app based on a popular book that adds extra features, such as a built-in dictionary, daily quotes, text-to-speech audio, and a forum where people can discuss the book.