### App Development Analysis
 - __This project analyzes how various apps perform on the Google Play Store and the Apple App Store. This will help our developers know which kinds of apps reach the largest target audience for our ads.__

 - __We only build apps that are free to download and install, and our main source of revenue consists of in-app ads. This means that the number of users of our apps determines our revenue for any given app — the more users who see and engage with the ads, the better. Our goal for this project is to analyze data to help our developers understand what type of apps are likely to attract more users.__

In [1]:
from csv import reader

# Load the Google Play data set; the first row is the header.
with open('googleplaystore.csv', encoding='utf8') as open_android:
    android_list = list(reader(open_android))
android_header = android_list[0]
android_rows = android_list[1:]

# Load the App Store data set the same way.
with open('AppleStore.csv', encoding='utf8') as open_ios:
    ios_list = list(reader(open_ios))
ios_header = ios_list[0]
ios_rows = ios_list[1:]





In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [3]:
explore_data(android_rows,0,3,True)


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


In [4]:
explore_data(ios_rows,0,3,True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


In [5]:
print(android_list[0])
print(ios_list[0])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


In [6]:
print(android_rows[10472])

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


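### Deleting incorrect data and duplicate apps

- Row 10472 is missing its Category value, so it has 12 values instead of 13 and every later column is shifted one position to the left. It has to be deleted. However, the cell below must be run only once; otherwise it will delete whichever row has moved into that index.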

In [7]:
del android_rows[10472]
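
Deleting by index is fragile because re-running the cell removes a different row. A minimal sketch of an alternative, assuming that malformed rows can be recognized by having a different number of columns than the header: filter on row length, which is safe to run repeatedly. The function name `drop_malformed_rows` and the sample data are hypothetical and not part of the original notebook.

```python
def drop_malformed_rows(rows, expected_len):
    """Return only the rows that have exactly `expected_len` columns."""
    return [row for row in rows if len(row) == expected_len]


# Tiny hypothetical sample: the second row is missing one column.
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],            # malformed: the category is missing
    ['App C', 'GAME', '3.9'],
]

cleaned = drop_malformed_rows(sample_rows, len(sample_header))
# Running the filter a second time changes nothing, unlike `del rows[i]`.
cleaned_again = drop_malformed_rows(cleaned, len(sample_header))
print(len(cleaned), len(cleaned_again))
```

In this notebook the call would look like `android_rows = drop_malformed_rows(android_rows, len(android_header))`. Whether row 10472 is the only row with a wrong length has not been checked here.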

In [8]:
for app in android_rows:
    name = app[0]
    if name == 'Instagram':
        print(name)
        print(app)

Instagram
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
Instagram
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
Instagram
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
Instagram
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


In [9]:
# Find apps that occur more than once
duplicates = []
unique_app = []
for app in android_rows:
    name = app[0]
    if name in unique_app:
        duplicates.append(name)
    else:
        unique_app.append(name)
print('Number of duplicate apps :', len(duplicates))      
print('\n')
print('Number of unique apps :', len(unique_app))

Number of duplicate apps : 1181


Number of unique apps : 9659
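
The loop above checks membership in a list, which gets slower as the list grows. As a hedged alternative, the same two counts can be obtained with `collections.Counter` from the standard library. The helper name `count_duplicates` and the sample rows are hypothetical.

```python
from collections import Counter


def count_duplicates(rows, name_index=0):
    """Return (number of duplicate rows, number of unique app names)."""
    name_counts = Counter(row[name_index] for row in rows)
    n_unique = len(name_counts)
    # Every occurrence beyond the first one of a name counts as a duplicate.
    n_duplicates = sum(name_counts.values()) - n_unique
    return n_duplicates, n_unique


# Hypothetical sample: 'Instagram' appears three times, so two rows are duplicates.
sample_rows = [['Instagram'], ['Instagram'], ['Facebook'], ['Instagram']]
n_duplicates, n_unique = count_duplicates(sample_rows)
print('Number of duplicate apps :', n_duplicates)
print('Number of unique apps :', n_unique)
```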


### Removing duplicates
- The output above shows that some apps appear more than once.
- We need to delete the duplicates so that each app is recorded only once.
- Deleting duplicates at random would be easy, but the Reviews column shows that the duplicate rows do not all have the same number of reviews, which suggests the data was collected at different times.
- It is logical to keep the row with the highest number of reviews, since it is likely the most recent of the duplicates.

- In this step, we create a dictionary that maps each app name to its highest review count. If an app is duplicated, we want to keep the row with the most reviews.
- To do this, I loop through the apps and assign each row's review count to the n_review variable. If the app name is already in the dictionary with a lower value, I replace it with the higher value. If the name is not yet in the dictionary, I add it with its review count.

In [10]:
review_max = {}

for app in android_rows:
    name = app[0]
    n_review = float(app[3])
    # Keep the largest review count seen so far for each app name.
    if name not in review_max or review_max[name] < n_review:
        review_max[name] = n_review

- Now that we have the maximum review count for each app, we add only the row with that count to the android_clean list. To avoid adding a duplicate, we keep an already_added list and check it so that each app appears only once.

In [11]:
android_clean = []
already_added =[]

for app in android_rows:
    name = app[0]
    n_reviews = float(app[3])
    if ( review_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)
print(len(android_clean))    

9659
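
The two steps above (find the maximum review count, then keep one matching row) can also be sketched as a single pass that stores the best row per app name in a dictionary. This is only an illustration: the function name `keep_most_reviewed` and the sample rows are hypothetical, and the order of the result may differ from the order of android_clean.

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}  # app name -> row with the most reviews seen so far
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' means the earliest row wins when review counts are tied.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Hypothetical sample using the same column layout (name at 0, reviews at 3).
sample_rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Bible', 'BOOKS_AND_REFERENCE', '4.7', '100'],
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
]
deduplicated = keep_most_reviewed(sample_rows)
print(len(deduplicated))
```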


### Taking non-English apps out of the list
- This approach is not perfect, but it removes most non-English apps.
- We use Python's built-in _ord_ function. Characters commonly used in English text have an _ord_ value of at most 127, so rejecting names with more than three characters above 127 eliminates most non-English apps.


In [12]:
def app_language(string):
    not_ascii = 0
    for char in string:
        if ord(char) > 127:
            not_ascii += 1
    if not_ascii > 3:
        return False
    else:
        return True

In [28]:

app_language('爱奇艺PPS -《欢乐颂2》电视剧热播')

False

We're not interested in keeping these kinds of apps, so we'll remove them. One way to go about this is to remove each app whose name contains symbols that are not commonly used in English text. English text usually includes letters from the English alphabet, numbers composed of digits from 0 to 9, punctuation marks (., !, ?, ;, etc.), and other symbols (+, *, /, etc.).

All these characters that are specific to English texts are encoded using the ASCII standard. Each ASCII character has a corresponding number between 0 and 127 associated with it, and we can take advantage of that to build a function that checks an app name and tells us whether it contains non-ASCII characters.

We built this function above (app_language), using the built-in ord() function to find the corresponding encoding number of each character. To avoid discarding English apps whose names include a few emoji or symbols such as ™, the function only rejects names with more than three non-ASCII characters.
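
To make the role of the threshold concrete, here is a small sketch that counts non-ASCII characters in a few names. The helper names `count_non_ascii` and `is_probably_english` are hypothetical. The default limit of three mirrors the `not_ascii > 3` check in app_language.

```python
def count_non_ascii(name):
    """Count characters whose code point is outside the ASCII range (0-127)."""
    return sum(1 for char in name if ord(char) > 127)


def is_probably_english(name, max_non_ascii=3):
    """Accept names with at most `max_non_ascii` non-ASCII characters."""
    return count_non_ascii(name) <= max_non_ascii


# Plain ASCII characters stay at or below 127.
print(ord('a'), ord('+'))

for name in ['Instagram',
             'Docs To Go™ Free Office Suite',
             '爱奇艺PPS -《欢乐颂2》电视剧热播']:
    print(name, ':', count_non_ascii(name), is_probably_english(name))
```

A name with a single trademark symbol passes, while a name made up mostly of non-ASCII characters is rejected. Names in other languages that are written with ASCII letters will still pass, which is one reason the filter is only approximate.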

In [14]:
android_all_english = []
ios_all_english = []

# The app name is in column 0 of the Android data.
for row in android_clean:
    name = row[0]
    if app_language(name):
        android_all_english.append(row)

print(len(android_all_english))

# The app name is in column 1 (track_name) of the iOS data.
for row in ios_rows:
    name = row[1]
    if app_language(name):
        ios_all_english.append(row)
print(len(ios_all_english))

9614
6183


### Isolating Free Apps

In [15]:
android_free_apps = []
for app in android_all_english:
    price = app[7]
    if price == '0':
        android_free_apps.append(app)
print(len(android_free_apps))


8864


In [16]:
ios_free_apps = []
for app in ios_all_english:
    price = app[4]
    if price == '0.0':
        ios_free_apps.append(app)
print(len(ios_free_apps))        

3222


### Our aim is to build an app that does well on both Android and iOS. To accomplish this, we:
- Build a minimal Android version of the app, and add it to Google Play.
- If the app has a good response from users, we develop it further.
- If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.
- For the iOS data, we will rely on the prime_genre column.
- For the Android data, we will rely on the Category and Genres columns.

We'll build two functions to analyze the frequency tables:

- One function that generates frequency tables showing percentages
- Another function that displays the percentages in descending order

In [19]:
def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
            
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage 
    
    return table_percentages 


def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
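
For comparison, the same percentages can be computed with `collections.Counter`. This is a sketch only, not the notebook's method: the names `freq_table_counter` and `sorted_percentages` and the sample rows are hypothetical.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Map each value in column `index` to its share of rows, in percent."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return {value: count / total * 100 for value, count in counts.items()}


def sorted_percentages(dataset, index):
    """Return (value, percentage) pairs, largest percentage first."""
    table = freq_table_counter(dataset, index)
    return sorted(table.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample: three of the four rows are games.
sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
for genre, percentage in sorted_percentages(sample, 1):
    print(genre, ':', percentage)
```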

### Displaying the most common app types for both Android and iOS

In [20]:
print(ios_header)
print(android_header)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


In [21]:
# Share of iOS apps per genre (prime_genre column, index 11)
display_table(ios_all_english, 11)

Games : 54.860100274947435
Entertainment : 7.261846999838266
Education : 6.6310852337053205
Photo & Video : 5.515122109008572
Utilities : 3.4449296458030085
Productivity : 2.7171276079573023
Health & Fitness : 2.6686074721009216
Music : 2.215752870774705
Social Networking : 2.037845705967977
Sports : 1.6820313763545207
Lifestyle : 1.6011644832605532
Shopping : 1.3747371825974446
Weather : 1.1159631246967492
Travel : 0.9704027171276078
News : 0.9218825812712276
Book : 0.8895358240336406
Reference : 0.8571890667960537
Business : 0.8571890667960537
Finance : 0.7924955523208799
Food & Drink : 0.7116286592269124
Navigation : 0.452854601326217
Medical : 0.3396409509946628
Catalogs : 0.08086689309396733


Based on the above, the five most common iOS genres by prime_genre are __Games, Entertainment, Education, Photo & Video and Utilities__. Note that this table is computed from ios_all_english, so it includes paid apps as well as free ones.

In [22]:
# Share of Android apps per category (Category column, index 1)
display_table(android_all_english,1)

FAMILY : 19.325982941543582
GAME : 9.819013938007073
TOOLS : 8.61244019138756
BUSINESS : 4.358227584772207
MEDICAL : 4.108591637195756
PERSONALIZATION : 3.900561680882047
PRODUCTIVITY : 3.879758685250676
LIFESTYLE : 3.786145204909507
FINANCE : 3.588516746411483
SPORTS : 3.3804867900977738
COMMUNICATION : 3.2660703141252343
HEALTH_AND_FITNESS : 2.995631370917412
PHOTOGRAPHY : 2.9124193883919283
NEWS_AND_MAGAZINES : 2.600374453921365
SOCIAL : 2.485957977948825
TRAVEL_AND_LOCAL : 2.2779280216351157
BOOKS_AND_REFERENCE : 2.26752652381943
SHOPPING : 2.090701060952777
DATING : 1.768254628666528
VIDEO_PLAYERS : 1.6954441439567296
MAPS_AND_NAVIGATION : 1.3417932182234242
FOOD_AND_DRINK : 1.1649677553567712
EDUCATION : 1.1025587684626585
ENTERTAINMENT : 0.9049303099646349
LIBRARIES_AND_DEMO : 0.8737258165175785
AUTO_AND_VEHICLES : 0.8737258165175785
WEATHER : 0.8217183274391513
HOUSE_AND_HOME : 0.7593093405450385
EVENTS : 0.6656958602038693
PARENTING : 0.6240898689411275
ART_AND_DESIGN : 0.6240

- It can be seen that the five most common Android categories (Category column) are __FAMILY, GAME, TOOLS, BUSINESS and MEDICAL__.

In [23]:
# Share of Android apps per genre (Genres column, index 9)
display_table(android_all_english,9)

Tools : 8.602038693571874
Entertainment : 5.793634283336801
Education : 5.231953401289786
Business : 4.358227584772207
Medical : 4.108591637195756
Personalization : 3.900561680882047
Productivity : 3.879758685250676
Lifestyle : 3.775743707093822
Finance : 3.588516746411483
Sports : 3.442895776991887
Communication : 3.2660703141252343
Action : 3.110047846889952
Health & Fitness : 2.995631370917412
Photography : 2.9124193883919283
News & Magazines : 2.600374453921365
Social : 2.485957977948825
Travel & Local : 2.26752652381943
Books & Reference : 2.26752652381943
Shopping : 2.090701060952777
Simulation : 1.9762845849802373
Arcade : 1.9138755980861244
Dating : 1.768254628666528
Casual : 1.7162471395881007
Video Players & Editors : 1.674641148325359
Maps & Navigation : 1.3417932182234242
Puzzle : 1.2377782400665696
Food & Drink : 1.1649677553567712
Role Playing : 1.0817557728312877
Strategy : 0.9777407946744331
Racing : 0.9465363012273768
Libraries & Demo : 0.8737258165175785
Auto & Vehicl

- It can be seen that the five most common Android genres (Genres column) are __Tools, Entertainment, Education, Business and Medical__.

### Most Popular Apps by Genre on the App Store

One way to find out which genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play data set, we can find this information in the Installs column, but for the App Store data set this information is missing. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the rating_count_tot column.

In [24]:
freq_table(ios_all_english, 11)


{'Social Networking': 2.037845705967977,
 'Photo & Video': 5.515122109008572,
 'Games': 54.860100274947435,
 'Music': 2.215752870774705,
 'Reference': 0.8571890667960537,
 'Health & Fitness': 2.6686074721009216,
 'Weather': 1.1159631246967492,
 'Utilities': 3.4449296458030085,
 'Travel': 0.9704027171276078,
 'Shopping': 1.3747371825974446,
 'News': 0.9218825812712276,
 'Navigation': 0.452854601326217,
 'Lifestyle': 1.6011644832605532,
 'Entertainment': 7.261846999838266,
 'Food & Drink': 0.7116286592269124,
 'Sports': 1.6820313763545207,
 'Book': 0.8895358240336406,
 'Finance': 0.7924955523208799,
 'Education': 6.6310852337053205,
 'Productivity': 2.7171276079573023,
 'Business': 0.8571890667960537,
 'Catalogs': 0.08086689309396733,
 'Medical': 0.3396409509946628}

In [25]:
ios_genres = freq_table(ios_all_english, 11)
for genre in ios_genres:
    total = 0
    len_genre = 0
    
    for app in ios_all_english:
        genre_app  = app[11]
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    print(genre, ':',avg_n_ratings)
        

Social Networking : 60253.84920634921
Photo & Video : 14688.715542521993
Games : 15586.759433962265
Music : 29047.109489051094
Reference : 27037.188679245282
Health & Fitness : 10802.157575757576
Weather : 23145.246376811596
Utilities : 7927.525821596244
Travel : 19030.183333333334
Shopping : 26635.011764705883
News : 16980.315789473683
Navigation : 19370.821428571428
Lifestyle : 8930.373737373737
Entertainment : 8862.409799554565
Food & Drink : 19934.386363636364
Sports : 15350.913461538461
Book : 10359.2
Finance : 23353.530612244896
Education : 2472.278048780488
Productivity : 8508.089285714286
Business : 5149.320754716981
Catalogs : 3465.0
Medical : 648.952380952381
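
The nested loop above scans the whole data set once per genre. A single-pass version can be sketched by accumulating a running total and a count for each genre. The function name `average_by_group` and the sample rows are hypothetical.

```python
from collections import defaultdict


def average_by_group(rows, group_index, value_index):
    """Return the mean of column `value_index` for each value of column `group_index`."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Hypothetical sample: genre in column 0, rating count in column 1.
sample = [['Games', '100'], ['Games', '300'], ['Music', '50']]
averages = average_by_group(sample, 0, 1)
for genre, avg_n_ratings in averages.items():
    print(genre, ':', avg_n_ratings)
```

For the iOS data the call would be `average_by_group(ios_all_english, 11, 5)`, with prime_genre at index 11 and rating_count_tot at index 5.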


By the numbers, Reference, Shopping and Weather look like promising genres to develop for: they average roughly 27,000, 26,600 and 23,100 user ratings per app on iOS, and the corresponding categories are also among the well-downloaded ones on Android. Reference is thus the recommended genre of app to develop.
Social Networking has the highest average, but it is skewed by giant apps like Facebook with very large user bases, which paints a false narrative of the entire genre.

Finance has good numbers as well, but these apps are mostly run by corporations that actually provide financial services. That is not what we are as a company, so we are better off leaving that lane. This can be seen with a quick scan of the finance apps below.

In [26]:
for app in ios_all_english:
    if app[11] == 'Finance':
        print(app[1], ':', app[5])

Chase Mobile℠ : 233270
Mint: Personal Finance, Budget, Bills & Money : 232940
Bank of America - Mobile Banking : 119773
PayPal - Send and request money safely : 119487
Credit Karma: Free Credit Scores, Reports & Alerts : 101679
Capital One Mobile : 56110
Citi Mobile® : 48822
Wells Fargo Mobile : 43064
Chase Mobile : 34322
Square Cash - Send Money for Free : 23775
Capital One for iPad : 21858
Venmo : 21090
USAA Mobile : 19946
TaxCaster – Free tax refund calculator : 17516
Amex Mobile : 11421
TurboTax Tax Return App - File 2016 income taxes : 9635
Bank of America - Mobile Banking for iPad : 7569
BillMinder - Bill Reminder and Organizer : 4262
HomeBudget with Sync : 3042
Wells Fargo for iPad : 2207
Stash Invest: Investing & Financial Education : 1655
Digit: Save Money Without Thinking About It : 1506
IRS2Go : 1329
Stock Market HD: Real Time Stocks Tracker + Forex : 1275
Stock Market Pro: Stock Trading, Charts & Alerts : 1034
Capital One CreditWise - Credit score and report : 1019
U by BB&

In [27]:
for app in ios_all_english:
    if app[11] == 'Reference':
        print(app[1], ':', app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Sky Guide: View Stars Night or Day : 22100
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
Dictionary.com Dictionary & Thesaurus Premium : 11530
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
WolframAlpha : 7410
e-Sword HD: Bible Study Made Easy : 7309
iHandy Translator Pro : 5163
Dictionary.com Premium Dictionary & Thesaurus for iPad : 4922
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
Speak & Translate － Live Voice and Text Translator : 4344
National Geographic World Atlas : 4255
Knots 3D : 3196
iQuran : 2929
Merriam-Webster Dictionary & Thesaurus : 2843
e-Sword LT: Bible Study on the Go : 2152
GUNS MODS for Minecraf

This list shows that the Reference genre is skewed by the Bible and the dictionary apps. What we could do is build one of these two kinds of app and add extra features, such as a _quote_ of the day for a Bible app or a _word_ of the day for a dictionary app. We could also add an audio feature that lets readers listen to the content, or reminders that prompt them to look something up in the book every day.
It still seems a good idea to stick with Reference, since weather apps tend to be used only briefly and shopping apps may require us to use non-free APIs.

### Most Popular Apps by Genre on the Google PlayStore

In [30]:
display_table(android_all_english, 5)

1,000,000+ : 14.70771791137924
100,000+ : 11.504056584148117
10,000+ : 10.619929269814854
10,000,000+ : 9.746203453297275
1,000+ : 9.153318077803203
100+ : 7.322654462242563
5,000,000+ : 6.292906178489703
500,000+ : 5.242354899105471
5,000+ : 4.836696484293738
50,000+ : 4.815893488662367
10+ : 3.994175161223216
500+ : 3.4116912835448305
50,000,000+ : 2.1219055543998335
50+ : 2.1219055543998335
100,000,000+ : 1.9658830871645518
5+ : 0.8529228208862076
1+ : 0.6864988558352403
500,000,000+ : 0.249635947576451
1,000,000,000+ : 0.20802995631370919
0+ : 0.13521947160391096
0 : 0.010401497815685458


In [32]:
android_categories = freq_table(android_all_english, 1)
android_categories


{'ART_AND_DESIGN': 0.6240898689411275,
 'AUTO_AND_VEHICLES': 0.8737258165175785,
 'BEAUTY': 0.5512793842313293,
 'BOOKS_AND_REFERENCE': 2.26752652381943,
 'BUSINESS': 4.358227584772207,
 'COMICS': 0.5720823798627002,
 'COMMUNICATION': 3.2660703141252343,
 'DATING': 1.768254628666528,
 'EDUCATION': 1.1025587684626585,
 'ENTERTAINMENT': 0.9049303099646349,
 'EVENTS': 0.6656958602038693,
 'FINANCE': 3.588516746411483,
 'FOOD_AND_DRINK': 1.1649677553567712,
 'HEALTH_AND_FITNESS': 2.995631370917412,
 'HOUSE_AND_HOME': 0.7593093405450385,
 'LIBRARIES_AND_DEMO': 0.8737258165175785,
 'LIFESTYLE': 3.786145204909507,
 'GAME': 9.819013938007073,
 'FAMILY': 19.325982941543582,
 'MEDICAL': 4.108591637195756,
 'SOCIAL': 2.485957977948825,
 'SHOPPING': 2.090701060952777,
 'PHOTOGRAPHY': 2.9124193883919283,
 'SPORTS': 3.3804867900977738,
 'TRAVEL_AND_LOCAL': 2.2779280216351157,
 'TOOLS': 8.61244019138756,
 'PERSONALIZATION': 3.900561680882047,
 'PRODUCTIVITY': 3.879758685250676,
 'PARENTING': 0.624089

To perform computations, however, we'll need to convert each install number to float — this means that we need to remove the commas and the plus characters, otherwise the conversion will fail and raise an error. We'll do this directly in the loop below, where we also compute the average number of installs for each genre (category).

In [42]:
for category in android_categories:
    total = 0
    len_category = 0
    for app in android_all_english:
        category_app = app[1]
        if category_app == category:            
            n_installs = app[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    
    print(category, ':', avg_n_installs)


ART_AND_DESIGN : 1887285.0
AUTO_AND_VEHICLES : 632501.3214285715
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 7641777.871559633
BUSINESS : 1663758.627684964
COMICS : 817657.2727272727
COMMUNICATION : 35153714.17515924
DATING : 828971.2176470588
EDUCATION : 1782566.0377358492
ENTERTAINMENT : 11375402.298850575
EVENTS : 249580.640625
FINANCE : 1319851.4028985507
FOOD_AND_DRINK : 1891060.2767857143
HEALTH_AND_FITNESS : 3972300.388888889
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 630903.6904761905
LIFESTYLE : 1369954.7774725275
GAME : 14256217.600635594
FAMILY : 3345018.516684607
MEDICAL : 96944.49873417722
SOCIAL : 22961790.384937238
SHOPPING : 6966908.880597015
PHOTOGRAPHY : 16636241.267857144
SPORTS : 3373767.6861538463
TRAVEL_AND_LOCAL : 13218662.767123288
TOOLS : 9785955.211352658
PERSONALIZATION : 4086652.4853333333
PRODUCTIVITY : 15530942.008042896
PARENTING : 525351.8333333334
WEATHER : 4570892.658227848
VIDEO_PLAYERS : 24121489.079754602
NEWS_AND_MAGAZINES : 947
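
The cleaning step used in the loop above can be factored into a small helper so that the same conversion is reused wherever install counts are needed. A minimal sketch follows; the name `parse_installs` is hypothetical.

```python
def parse_installs(text):
    """Convert an Installs string such as '1,000,000+' to a float."""
    return float(text.replace(',', '').replace('+', ''))


# The values are the labels that appear in the Installs column.
for label in ['1,000,000+', '10,000+', '0+', '0']:
    print(label, '->', parse_installs(label))
```

Because a label like '10,000+' only gives a lower bound on the number of installs, averages built on these values are approximations.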

In [44]:
for app in android_all_english:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

Even though COMMUNICATION tops this list, its average is heavily skewed by a few giant apps like WhatsApp, imo and Messenger, and apps in this category also have to comply with quite a number of regulations. This is not something we need for our kind of business.

In [46]:
below_100M = []

for app in android_all_english:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'COMMUNICATION') and (float(n_installs) < 100000000):
        below_100M.append(float(n_installs))
        
sum(below_100M) / len(below_100M)

3269220.386759582

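Without the communication apps that have 100 million or more installs, the average drops to about __3,269,220__, roughly a tenth of the original figure of 35,153,714. This looks like a more realistic average for the category.
The same pattern appears in other categories: video players are dominated by apps like __YouTube, Google Play Movies & TV, or MX Player__, and social apps by giants like Facebook and Instagram. In productivity, Microsoft apps and Dropbox account for most of the installs, which makes these categories inappropriate for us to venture into.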


Let's look into books and reference apps. This genre did well on the App Store, and another good performance here would make it a perfect niche to venture into.

In [48]:
for app in android_all_english:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E

In [63]:
for app in android_all_english:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000,000+'
                                            or app[5] == '500,000,000+'
                                            or app[5] == '100,000,000+'
                                           ):
        print(app[0], ':', app[5])

Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad 📖 Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+


In [56]:
for app in android_all_english:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000+'
                                            or app[5] == '5,000,000+'
                                            or app[5] == '10,000,000+'
                                            or app[5] == '50,000,000+'):
        print(app[0], ':', app[5])

Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
ReadEra – free ebook reader : 1,000,000+
Ebook Reader : 5,000,000+
Read books online : 5,000,000+
eBoox: book reader fb2 epub zip : 1,000,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
Moon+ Reader : 10,000,000+
English-Myanmar Dictionary : 1,000,000+
Golden Dictionary (EN-AR) : 1,000,000+
All Language Translator Free : 1,000,000+
Aldiko Book Reader : 10,000,000+
Dictionary - WordWeb : 5,000,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Al-Quran (Free) : 10,000,000+
Al Quran Indonesia : 10,000,000+
Al'Quran Bahasa Indonesia : 10,000,000+
Al Quran Al karim : 1,000,000+
Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+
Koran Read &MP3 30 Juz Offline : 1,000,000+
H

It looks like a few giant apps dominate here as well, so picking a popular book like a dictionary, the Bible or the Quran and adding extra features is a good place to start.

### Conclusion

It is recommended that we develop an app around a popular book, such as the Bible, the Quran or a dictionary, and add extra features like reminders, audio options and a word or quote of the day to stand out. Such apps do well on both the Play Store and the App Store and have high potential to help us reach a larger audience.