
# iOS and Android App Research

* I am putting together datasets to better understand statistics for app development. We'll be looking at data collected from the Google Play and the App Store. 

For this project, we'll pretend we're working as data analysts for a company that builds Android and iOS mobile apps.  
We make our apps available on Google Play and in the App Store.  


We only build apps that are free to download and install, and our main source of revenue consists of in-app ads.  
This means that the number of users of our apps determines our revenue for any given app — the more users who see and engage with the ads, the better.  
Our goal for this project is to analyze data to help our developers understand what type of apps are likely to attract more users.

In [1]:
from csv import reader

# App Store data set
opened_file_ios = open('AppleStore.csv')
read_ios = reader(opened_file_ios)
ios_all_data = list(read_ios)
ios_header = ios_all_data[0]
ios = ios_all_data[1:]

# Google Play data set
opened_file_android = open('googleplaystore.csv')
read_droid = reader(opened_file_android)
droid_all_data = list(read_droid)
droid_header = droid_all_data[0]
droid = droid_all_data[1:]

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    
    if rows_and_columns:
        print(f'Number of rows: {len(dataset)}')
        print(f'Number of columns:{len(dataset[0])}\n')
        
    for row in dataset_slice:
        print(row)
        print('\n')  # adds a new blank line after each row.


#### In order to find free, user driven apps, funded by ad revenue I believe relevant columns will be:

* name
* price
* user ratings 
* prime genre
* category
* reviews
* genre

In [3]:
print('iOS Header...')
print(ios_header, '\n')
print('iOS Data...')
explore_data(ios, 1, 3, True)

print('Android Header...')
print(droid_header, '\n')
print('Android Data...')
explore_data(droid, 1, 3, True)

iOS Header...
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'] 

iOS Data...
Number of rows: 7197
Number of columns:16

['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Android Header...
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'] 

Android Data...
Number of rows: 10841
Number of columns:13

['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Laun


### Make sure the length of all `droid` rows are even with the length of the `header` column.


In [4]:
for row in droid:
    if len(row) != len(droid_header):
        print(row)
        print('\n')
        print(f"Index position is {droid.index(row)}")

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


Index position is 10472



#### Print the `row` in an easy to read fashion.


In [5]:
problem_row = droid[10472]
for item_head, item_row in zip(droid_header, problem_row):
    print(f"{item_head}: {item_row}")

App: Life Made WI-Fi Touchscreen Photo Frame
Category: 1.9
Rating: 19
Reviews: 3.0M
Size: 1,000+
Installs: Free
Type: 0
Price: Everyone
Content Rating: 
Genres: February 11, 2018
Last Updated: 1.0.19
Current Ver: 4.0 and up


**Looks like this app has a rating of 19 which is not possible.  
After reading the discussion board it seems to be missing an entry for the `Category` column.  
We'll Just delete the whole entry (`row`) for now.**

In [6]:
del droid[10472]

In [7]:
# Print the `row` in an easy to read fashion.
problem_row = droid[10472]
for item_head, item_row in zip(droid_header, problem_row):
    print(f"{item_head}: {item_row}")

App: osmino Wi-Fi: free WiFi
Category: TOOLS
Rating: 4.2
Reviews: 134203
Size: 4.1M
Installs: 10,000,000+
Type: Free
Price: 0
Content Rating: Everyone
Genres: Tools
Last Updated: August 7, 2018
Current Ver: 6.06.14
Android Ver: 4.4 and up



### Make sure the length of all `ios` rows are even with the length of the `header` column.

In [8]:
for row in ios:
    if len(row) != len(ios_header):
        print(row)
        print('\n')
        print(f"Index position is {ios.index(row)}")

**Looks like we're good.**


#### Reminder:

**Recall that at our company, we only build apps that are free to download and install, and we design them for an English-speaking audience.  
This means that we'll need to do the following:**

* Remove non-English apps like 爱奇艺PPS -《欢乐颂2》电视剧热播.
* Remove apps that aren't free.

## Investigate Duplicate Apps.

### Google Play Duplicate Data Check.

In [9]:
duplicate_apps = []
unique_apps = []

for app in droid:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

print(f'Number of duplicate apps: {len(duplicate_apps)}\n')
print(f'Number of unique apps: {len(unique_apps)}\n')
print(f'Examples of duplicate apps: {duplicate_apps[:15]}')

Number of duplicate apps: 1181

Number of unique apps: 9659

Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']



##### That's 1181 duplicate apps. Let's see if we can find some discrepencies between the entries.


In [10]:
for app in droid:
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']



#### Use Rating Totals to ID duplicates.
`Instagram` has multiple entries with varying `Rating` totals.  
It's safe to assume the higher the `Rating` total the more recent the data.  
Instead of removing duplicates randomly we'll use the `Rating` total column to remove duplicates.

#### Create an empty dictionary, add app name and number of reviews as the value.

In [11]:
reviews_max = {}

for app in droid:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    if name not in reviews_max:
        reviews_max[name] = n_reviews
        
print(f'Non-duplicate length {len(reviews_max)}')
print(f'Expected length: {len(droid) - 1181}')

Non-duplicate length 9659
Expected length: 9659


* Create two lists. One for cleaned data and one for the names of apps that were already added to our cleaned list.  
* The lengths of both should be equal.

In [12]:
droid_clean = []
already_added = []

for app in droid:
    name = app[0]
    n_reviews = float(app[3])
    if n_reviews == reviews_max[name] and name not in already_added:
        droid_clean.append(app)
        already_added.append(name)

print(f'Length of clean Google Play: {len(droid_clean)}')
print(f'Length of already added: {len(already_added)}')


Length of clean Google Play: 9659
Length of already added: 9659



### iOS Duplicate App Check


In [13]:
duplicate_apps = []
unique_apps = []

for app in ios:
    app_id = app[0]
    if app_id in unique_apps:
        duplicate_apps.append(app_id)
    else:
        unique_apps.append(app_id)

print(f'Number of duplicate apps: {len(duplicate_apps)}\n')
print(f'Number of unique apps: {len(unique_apps)}\n')
print(f'Examples of duplicate apps: {duplicate_apps[:15]}')

Number of duplicate apps: 0

Number of unique apps: 7197

Examples of duplicate apps: []


#### Well that's good news, the App Store doesn't have any duplicates. That unique ID column that really helps keep data clean :)


## Check For, and Remove, non-English apps.

Below are a few examples.


In [14]:
print(ios[813][1])
print(ios[6731][1])
print('\n')
print(droid_clean[4412][0])
print(droid_clean[7940][0])

爱奇艺PPS -《欢乐颂2》电视剧热播
【脱出ゲーム】絶対に最後までプレイしないで 〜謎解き＆ブロックパズル〜


中国語 AQリスニング
لعبة تقدر تربح DZ


We can iterate through the string and identify which characters are not part of the english text characters.  
Let's define a function to check characters in a string for non-English characters.

In [15]:
def english_checker(a_string):
    for char in a_string:
        if ord(char) > 127:
            return False
        
    return True
        
print(english_checker(ios[813][1]))
print(english_checker(ios[6731][1]))
print('\n')
print(english_checker(droid_clean[4412][0]))
print(english_checker(droid_clean[7940][0]))
print('\n')
print(english_checker('Docs To Go™ Free Office Suite'))
print(english_checker('Instachat 😜'))

False
False


False
False


False
False


Wait a second, the trademark symbol and emoji should count. Those apps are cleary geared toward English speakers.  
We need to modify that function a bit to account for a few non-ASCII characters like emojis and what not.

In [16]:
def english_checker(a_string):
    non_ascii = 0
    for char in a_string:
        if ord(char) > 127:
            non_ascii += 1
            
    if non_ascii > 3:
        return False
    else:    
        return True
        
print(english_checker(ios[813][1]))
print(english_checker(ios[6731][1]))
print('\n')
print(english_checker(droid_clean[4412][0]))
print(english_checker(droid_clean[7940][0]))
print('\n')
print(english_checker('Docs To Go™ Free Office Suite'))
print(english_checker('Instachat 😜'))

False
False


False
False


True
True


This seems to work pretty well. It's not perfect but it should clean out most of the non-English apps.  
Let's go ahead and store these cleaned up, English apps in a new list.  
Don't forget to use `droid_clean` as we have already spent time cleaning up a bit of the Google Play data.

In [17]:
ios_english = []
ios_non_english = []

for app in ios:
    name = app[1]
    if english_checker(name):
        ios_english.append(app)
    else:
        ios_non_english.append(app)

print(f'Length of English: {len(ios_english)}')
print(f'Length of non-English: {len(ios_non_english)}')
print(f'Total length: {len(ios_english) + len(ios_non_english)}\n') 


droid_english = []
droid_non_english = []

for app in droid_clean:
    name = app[0]
    if english_checker(name):
        droid_english.append(app)
    else:
        droid_non_english.append(app)
         
            
print(f'Length of English: {len(droid_english)}')
print(f'Length of non-English: {len(droid_non_english)}')
print(f'Total length: {len(droid_english) + len(droid_non_english)}') 

# explore_data(droid_english, 0, 3, True)
# for name in droid_english[:25]:
#     print(name[0])

Length of English: 6183
Length of non-English: 1014
Total length: 7197

Length of English: 9614
Length of non-English: 45
Total length: 9659



## Clean Out non-Free Apps

Loop through datasets, identify and remove any non-free apps.  
Prices come up as `strings` so be sure they're not in the conditional statements.

In [18]:
ios_free = []
ios_paid = []

for app in ios_english:
    price = float(app[4])
    if price == 0.0:
        ios_free.append(app)
    else:
        ios_paid.append(app)
        
print(f'Free iOS apps: {len(ios_free)}')
print(f'Paid iOS apps: {len(ios_paid)}')
print()


droid_free = []
droid_paid = []

for app in droid_english:
    price = app[7]
    if price == '0':
        droid_free.append(app)
    else:
        droid_paid.append(app)

print(f'Free Android apps: {len(droid_free)}')
print(f'Paid Android apps: {len(droid_paid)}')
print('\n')

# explore_data(ios_free, 0, 3, True)
# print('\n')
# explore_data(droid_free, 0, 3, True)

Free iOS apps: 3222
Paid iOS apps: 2961

Free Android apps: 8864
Paid Android apps: 750





## App Profile Strategy

In order to minimize our risk and costs to market we'll use three steps:
1. Build a minimal version of an Android app.
2. If the app has a good response we can develop it further.
3. If it is profitable afte 6 months develop an iOS version for the App Store.

For our app dev purposes we want to find something that is popular on both the App Store and Google Play. We'll look at genre occurence frequency to identify which genres are the most popular.  
Let's make a function extracting genre frequency as a table(`dict` to start.

In [19]:
# :%s/foo/bar/gc

def freq_table(dataset: list, index: int) -> dict:
    """
    Takes a list, returns the table of a column at `index` position 
    as percentage of dataset total.
    """
    table = {}
    
    for row in dataset:
        column = row[index]
        if column in table:
            table[column] += 1
        else:
            table[column] = 1
    
    for k, v in table.items():
        table[k] = (v / len(dataset)) * 100
    
    return table

In [20]:
genre_freq_ios = freq_table(ios_free, 11)
# print(genre_freq_ios)

ios_genre_sorted = sorted(genre_freq_ios.items(), key=lambda x: x[1], reverse=True)

for i in ios_genre_sorted:
    print(f'{i[0]}: {i[1]}')

Games: 58.16263190564867
Entertainment: 7.883302296710118
Photo & Video: 4.9658597144630665
Education: 3.662321539416512
Social Networking: 3.2898820608317814
Shopping: 2.60707635009311
Utilities: 2.5139664804469275
Sports: 2.1415270018621975
Music: 2.0484171322160147
Health & Fitness: 2.0173805090006205
Productivity: 1.7380509000620732
Lifestyle: 1.5828677839851024
News: 1.3345747982619491
Travel: 1.2414649286157666
Finance: 1.1173184357541899
Weather: 0.8690254500310366
Food & Drink: 0.8069522036002483
Reference: 0.5586592178770949
Business: 0.5276225946617008
Book: 0.4345127250155183
Navigation: 0.186219739292365
Medical: 0.186219739292365
Catalogs: 0.12414649286157665


### App Store Insights
* It looks like Games dominates with 58%. Not much of a surprise there. Entertainment could also be an evenue worth exploring.  
* If Games are removed everything seems a little more balanced.  
* No other genre has more than 7.8%, with the Photo & Video and Education genres combining for 8.5%.  
* The majority of apps appear to be geared towards entertainment.
* I would recommend games as a ripe genre but it's more than likely that it's saturated. I would like to look at user reviews as well.

Google Play has a column for `category` as well as `genre`.  
Since the lists are long we'll just look at the top 20.

In [21]:
category_freq_droid = freq_table(droid_free, 1)
genre_freq_droid = freq_table(droid_free, 9)

# for k, v in category_freq_droid.items():
#     print(f'{k}: {v}')
# print('\n')
# for k, v in genre_freq_droid.items():
#     print(f'{k}: {v}')        
# print('\n')

sort_cat_freq_droid = sorted(category_freq_droid.items(), key=lambda x: x[1], reverse=True)
sort_genre_freq_droid = sorted(genre_freq_droid.items(), key=lambda x: x[1], reverse=True)

for i in sort_cat_freq_droid:
    print(f'{i[0]}: {i[1]}')
print('\n')
for i in sort_genre_freq_droid:
    print(f'{i[0]}: {i[1]}')   

FAMILY: 18.907942238267147
GAME: 9.724729241877256
TOOLS: 8.461191335740072
BUSINESS: 4.591606498194946
LIFESTYLE: 3.9034296028880866
PRODUCTIVITY: 3.892148014440433
FINANCE: 3.7003610108303246
MEDICAL: 3.531137184115524
SPORTS: 3.395758122743682
PERSONALIZATION: 3.3167870036101084
COMMUNICATION: 3.2378158844765346
HEALTH_AND_FITNESS: 3.0798736462093865
PHOTOGRAPHY: 2.944494584837545
NEWS_AND_MAGAZINES: 2.7978339350180503
SOCIAL: 2.6624548736462095
TRAVEL_AND_LOCAL: 2.33528880866426
SHOPPING: 2.2450361010830324
BOOKS_AND_REFERENCE: 2.1435018050541514
DATING: 1.861462093862816
VIDEO_PLAYERS: 1.7937725631768955
MAPS_AND_NAVIGATION: 1.3989169675090252
FOOD_AND_DRINK: 1.2409747292418771
EDUCATION: 1.1620036101083033
ENTERTAINMENT: 0.9589350180505415
LIBRARIES_AND_DEMO: 0.9363718411552346
AUTO_AND_VEHICLES: 0.9250902527075812
HOUSE_AND_HOME: 0.8235559566787004
WEATHER: 0.8009927797833934
EVENTS: 0.7107400722021661
PARENTING: 0.6543321299638989
ART_AND_DESIGN: 0.6430505415162455
COMICS: 0.62


### Google Play Insights

Either games are underrepresent due to mislabeling or there just aren't as many.  
* In general, I believe the `Category` column will be more accurate. It seems to look at the bigger picture.  
* The `Genre` column has a lot of user labeling as opposed to standard parameters set out by Google Play.  
* It appears that Google Play has a lot more practical apps as opposed to games. Though `Family` dominates it appears it was mostly kids games back when this data was taken.
* We still don't know a lot about active user base.


### Most Popular Apps by Genre on the App Store

In order to learn more about user base we'll look at the `Installs` column for Google Play. The App Store doesn't have an installs count so we'll use `rating_count_tot` instead.

In [22]:
# print(genre_freq_ios)


for genre in genre_freq_ios:
    total = 0
    len_genre = 0
    
    for app in ios_free:
        genre_app = app[11]
        if genre_app == genre:
            user_rating_tot = float(app[5])
            total += user_rating_tot
            len_genre += 1
    print(f'{genre} = {len_genre} apps. Total user ratings = {round(total)}')

    avg_user_ratings_count = total / len_genre
    print(f'Interpreted average user base per app = {round(avg_user_ratings_count)}')
    supply_demand = round((len_genre / avg_user_ratings_count) * 100, 3)
    print(f'Apps (supply) to user base (demand) ratio = {supply_demand}% (lower is better)')
    demand_supply = round(avg_user_ratings_count / len_genre)
    print(f'Another way to think of it: {demand_supply} users for every app (higher is better)\n')

Social Networking = 106 apps. Total user ratings = 7584125
Interpreted average user base per app = 71548
Apps (supply) to user base (demand) ratio = 0.148% (lower is better)
Another way to think of it: 675 users for every app (higher is better)

Photo & Video = 160 apps. Total user ratings = 4550647
Interpreted average user base per app = 28442
Apps (supply) to user base (demand) ratio = 0.563% (lower is better)
Another way to think of it: 178 users for every app (higher is better)

Games = 1874 apps. Total user ratings = 42705967
Interpreted average user base per app = 22789
Apps (supply) to user base (demand) ratio = 8.223% (lower is better)
Another way to think of it: 12 users for every app (higher is better)

Music = 66 apps. Total user ratings = 3783551
Interpreted average user base per app = 57327
Apps (supply) to user base (demand) ratio = 0.115% (lower is better)
Another way to think of it: 869 users for every app (higher is better)

Reference = 18 apps. Total user ratings = 13

In looking for an open market we should be looking for apps that have a higher user base to lower total apps available. This tells us that there's a demand for an app that is in short supply.  
A few suggestions:
* Navigation seems pretty clearly on top according to this metric. However the startup costs might be pretty hefty. We also know Google, Apple, and Waze really dominate hear.
* Book could be a good option. You're using the products of others, you would just need to have an interesting platform. Again though you're in competition with Amazon's Kindle.
* Weather could be an interesting path. There's been some dominant players for more than a decade now. It would have to integrate seemlessly with iOS widgets and maybe even Apple Watch.
* Food and Drink would need to be novel and engaging. Perhaps using gamification and social networking elements to keep user engagement up.
* Reference fares well on this metric. Though I'm not really sure what they mean by it. We can examine those below.

In [23]:
for app in ios_free:
    if app[11] == 'Reference':
#         print(app[1], ios_free.index(app))
        print(f'{app[1]}: {app[5]}')
    
print()   
print(english_checker(ios_free[3085][1]))
print((ios_free[3085][1]))

Bible: 985920
Dictionary.com Dictionary & Thesaurus: 200047
Dictionary.com Dictionary & Thesaurus for iPad: 54175
Google Translate: 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran: 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition: 17588
Merriam-Webster Dictionary: 16849
Night Sky: 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE): 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools: 4693
GUNS MODS for Minecraft PC Edition - Mods Tools: 1497
Guides for Pokémon GO - Pokemon GO News and Cheats: 826
WWDC: 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free: 718
VPN Express: 14
Real Bike Traffic Rider Virtual Reality Glasses: 8
教えて!goo: 0
Jishokun-Japanese English Dictionary & Translator: 0

True
教えて!goo


It looks like we missed a non-English app. Ah well. We'll leave it for later.  
We did find out what `Reference` is referring to though, things like a dictionary, bible, maps for games, pocket guides.  
This genre is largely dominated by the bible and "Dictionary.com".  
There could be a direction for development but my gut tells me most everyone (including myself) just uses various Wiki pages. Maybe a Wiki organizer?  

Taking a look at the `Navigation` apps our hunch is confirmed with Waze and Google Maps accounting for 97% of all reviews. Add to the fact Apple Maps isn't even on here and this seems like a poor choice.

In [24]:
for app in ios_free:
    if app[-5] == 'Navigation':
        print(f'{app[1]}: {app[5]}')

Waze - GPS Navigation, Maps & Real-time Traffic: 345046
Google Maps - Navigation & Transit: 154911
Geocaching®: 12811
CoPilot GPS – Car Navigation & Offline Maps: 3582
ImmobilienScout24: Real Estate Search in Germany: 187
Railway Route Search: 5


Exploring `Book` again our hunch is confirmed with Amazon, who own both Kindle and Audible, being a top heavy market.  
Perhaps gamification on a kids app for reading would make sense.

In [25]:
for app in ios_free:
    if app[-5] == 'Book':
        print(f'{app[1]}: {app[5]}')

Kindle – Read eBooks, Magazines & Textbooks: 252076
Audible – audio books, original series & podcasts: 105274
Color Therapy Adult Coloring Book for Adults: 84062
OverDrive – Library eBooks and Audiobooks: 65450
HOOKED - Chat Stories: 47829
BookShout: Read eBooks & Track Your Reading Goals: 879
Dr. Seuss Treasury — 50 best kids books: 451
Green Riding Hood: 392
Weirdwood Manor: 197
MangaZERO - comic reader: 9
ikouhoushi: 0
MangaTiara - love comic reader: 0
謎解き: 0
謎解き2016: 0


Looking at `Food & Drink` it seems like fast food ordering and food delivery are the dominators. Probably not an easy spot to get into if we don't have a restaurant. Also not driven by the ads or user base.

In [26]:
for app in ios_free:
    if app[-5] == 'Food & Drink':
        print(f'{app[1]}: {app[5]}')

Starbucks: 303856
Domino's Pizza USA: 258624
OpenTable - Restaurant Reservations: 113936
Allrecipes Dinner Spinner: 109349
DoorDash - Food Delivery: 25947
UberEATS: Uber for Food Delivery: 17865
Postmates - Food Delivery, Faster: 9519
Dunkin' Donuts - Get Offers, Coupons & Rewards: 9068
Chick-fil-A: 5665
McDonald's: 4050
Deliveroo: Restaurant Delivery - Order Food Nearby: 1702
SONIC Drive-In: 1645
Nowait Guest: 1625
7-Eleven, Inc.: 1356
Outback: 805
Bon Appetit: 750
Starbucks Keyboard: 457
Whataburger: 197
Delish Eatmoji Keyboard: 154
Lieferheld - Delicious food delivery service: 29
Lieferando.de: 29
McDo France: 22
Chefkoch - Rezepte, Kochen, Backen & Kochbuch: 20
Youmiam: 9
Marmiton Twist: 2
Open Food Facts: 1



### Most Popular Apps by Genre on Google Play

We'll be looking at the `Category` column as previously discussed.

In [27]:
install_table = freq_table(droid_free, 5)

for k, v in sorted(install_table.items(), reverse=True):
    print(f'{k}: {v}')

500,000,000+: 0.2707581227436823
500,000+: 5.561823104693141
500+: 3.2490974729241873
50,000,000+: 2.3014440433213
50,000+: 4.7721119133574
50+: 1.917870036101083
5,000,000+: 6.825361010830325
5,000+: 4.512635379061372
5+: 0.78971119133574
100,000,000+: 2.1322202166064983
100,000+: 11.552346570397113
100+: 6.915613718411552
10,000,000+: 10.548285198555957
10,000+: 10.198555956678701
10+: 3.5424187725631766
1,000,000,000+: 0.22563176895306858
1,000,000+: 15.726534296028879
1,000+: 8.393501805054152
1+: 0.5076714801444043
0+: 0.04512635379061372
0: 0.01128158844765343


These are super specific with the ranges being really wide. We're gonna strip out the `,`'s and `+`'s. It will be a little less accurate overall but it should be good enough for our purposes.

In [28]:
droid_category_freq = freq_table(droid_free, 1)

# for k, v in droid_category_freq.items():
#     print(f'{k}: {v}')
droid_sup_demand = {}

for category in droid_category_freq:
    total = 0
    len_category = 0
    for app in droid_free:
        cat_app = app[1]
        if cat_app == category:
            installs = float(app[5].replace(',', '').strip('+'))
            total += installs
            len_category += 1
    
    avg_install = total / len_category
    print(f'{category} \n{len_category} apps \n{total}: installs')
    print(f'{round(avg_install)}: Interpreted average user base per app')
    supply_demand = round((len_category / avg_install) * 100, 4)
    print(f'Apps (supply) to user base (demand) ratio = {supply_demand}% (lower is better)')
    demand_supply = round(avg_install / len_genre)
    print(f'Another way to think of it: {demand_supply} users for every app (higher is better)\n')
    
    droid_sup_demand[category] = len_category, supply_demand, demand_supply

ART_AND_DESIGN 
57 apps 
113221100.0: installs
1986335: Interpreted average user base per app
Apps (supply) to user base (demand) ratio = 0.0029% (lower is better)
Another way to think of it: 331056 users for every app (higher is better)

AUTO_AND_VEHICLES 
82 apps 
53080061.0: installs
647318: Interpreted average user base per app
Apps (supply) to user base (demand) ratio = 0.0127% (lower is better)
Another way to think of it: 107886 users for every app (higher is better)

BEAUTY 
53 apps 
27197050.0: installs
513152: Interpreted average user base per app
Apps (supply) to user base (demand) ratio = 0.0103% (lower is better)
Another way to think of it: 85525 users for every app (higher is better)

BOOKS_AND_REFERENCE 
190 apps 
1665884260.0: installs
8767812: Interpreted average user base per app
Apps (supply) to user base (demand) ratio = 0.0022% (lower is better)
Another way to think of it: 1461302 users for every app (higher is better)

BUSINESS 
407 apps 
696902090.0: installs
1712

We can take a similar approach to the App Store data and look for a supply and demand ratio.

In [29]:
# print(droid_sup_demand)
for k, v in sorted(droid_sup_demand.items(), key=lambda x: x[1][1]):
    print(f'{k}\n{v[0]} apps \n{v[1]}% S/D ratio (lower = better) \n{v[2]} Projected users per app \n')

VIDEO_PLAYERS
159 apps 
0.0006% S/D ratio (lower = better) 
4121312 Projected users per app 

COMMUNICATION
287 apps 
0.0007% S/D ratio (lower = better) 
6409353 Projected users per app 

ENTERTAINMENT
85 apps 
0.0007% S/D ratio (lower = better) 
1940118 Projected users per app 

SOCIAL
236 apps 
0.001% S/D ratio (lower = better) 
3875609 Projected users per app 

WEATHER
71 apps 
0.0014% S/D ratio (lower = better) 
845748 Projected users per app 

PHOTOGRAPHY
261 apps 
0.0015% S/D ratio (lower = better) 
2973352 Projected users per app 

TRAVEL_AND_LOCAL
207 apps 
0.0015% S/D ratio (lower = better) 
2330680 Projected users per app 

PRODUCTIVITY
345 apps 
0.0021% S/D ratio (lower = better) 
2797889 Projected users per app 

BOOKS_AND_REFERENCE
190 apps 
0.0022% S/D ratio (lower = better) 
1461302 Projected users per app 

NEWS_AND_MAGAZINES
248 apps 
0.0026% S/D ratio (lower = better) 
1591530 Projected users per app 

SHOPPING
199 apps 
0.0028% S/D ratio (lower = better) 
1172813 Pro

I think that if `Books` from the App Store is a relevant genre then we can definitely look into the same genre on Google Play. Couple it with `Socical` and `Entertainment` and we probably have something worth looking into. Maybe a social app that's a book club?  
`Games` and `Education` have a decent supply/demand ratio. If we can work towards the gamification of an educational app it could be very successful. Especially if we begin to offer paid user perks.

#### That's it for now. I've spent one too many hours on this most likely and could easily do another 10-20 so I'll leave it here for now.