# Analyzing Mobile App Data

I'm pretending to be a data analyst at a company that builds Android and iOS mobile apps.

The main goal is to analyze data to help our developers understand what types of apps are likely to attract more users.

In [1]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [2]:
from csv import reader

# Apple dataset
# (newline='' follows the csv module docs; encoding='utf8' assumes the files are UTF-8)
with open('AppleStore.csv', encoding='utf8', newline='') as opened_file:
    ios = list(reader(opened_file))
ios_header = ios[0]
ios = ios[1:]

# Android dataset
with open('googleplaystore.csv', encoding='utf8', newline='') as opened_file:
    android = list(reader(opened_file))
android_header = android[0]
android = android[1:]

In [3]:
print(ios_header)
print('\n')
explore_data(ios, 0, 3, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


As we can see, the iOS dataset contains 7197 rows and 16 columns.

The columns most relevant to our analysis are `'track_name'`, `'currency'`, `'price'`, `'rating_count_tot'`, `'rating_count_ver'`, and `'prime_genre'`.

These column names aren't self-explanatory; for a description of each one, see the [dataset documentation](https://www.kaggle.com/datasets/ramamet4/app-store-apple-data-set-10k-apps).
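Hard-coded positions such as `app[11]` work, but a lookup table built from the header row makes the code easier to read and check. A minimal sketch; the helper name `column_index` is our own and is not used elsewhere in this notebook:

```python
def column_index(header):
    """Map each column name in a header row to its position."""
    return {name: i for i, name in enumerate(header)}


# Header copied from the App Store output above
ios_header = ['id', 'track_name', 'size_bytes', 'currency', 'price',
              'rating_count_tot', 'rating_count_ver', 'user_rating',
              'user_rating_ver', 'ver', 'cont_rating', 'prime_genre',
              'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']

ios_cols = column_index(ios_header)
print(ios_cols['prime_genre'])       # 11
print(ios_cols['rating_count_tot'])  # 5
```

With this table, `app[ios_cols['prime_genre']]` reads the same value as `app[11]`.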

In [4]:
print(android_header)
print('\n')
explore_data(android, 0, 3, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


As we can see, the Android dataset contains 10841 rows and 13 columns.

The columns most relevant to our analysis are `'App'`, `'Category'`, `'Rating'`, `'Reviews'`, `'Installs'`, `'Type'`, `'Price'`, and `'Genres'`.

## Data Cleaning

### Removing incorrect data

In [5]:
print(android_header, '\n')

# enumerate gives the index directly instead of searching with list.index()
for index, row in enumerate(android):
    if len(row) != len(android_header):
        print(row)
        print('\n')
        print("Index position is:", index)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'] 

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


Index position is: 10472


In [6]:
del android[10472]
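Deleting by a hard-coded index removes a different, valid row if the cell is accidentally run twice. A sketch of a safer alternative keeps only the rows whose length matches the header; `keep_well_formed` is a hypothetical helper name and the rows below are toy data:

```python
def keep_well_formed(rows, header):
    """Return only the rows with exactly one value per header column."""
    return [row for row in rows if len(row) == len(header)]


header = ['App', 'Category', 'Rating']
rows = [
    ['App A', 'TOOLS', '4.1'],
    ['Broken row', '1.9'],  # a column is missing, like the row found above
    ['App B', 'GAME', '4.5'],
]
clean = keep_well_formed(rows, header)
print(len(clean))  # 2
```

Because the filter is based on row length rather than position, running it a second time leaves the data unchanged.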

### Removing Duplicates

According to the [discussion section](https://www.kaggle.com/datasets/lava18/google-play-store-apps/discussion/66015) of the Google Play dataset, some apps appear more than once. Let's count how many:

In [7]:
duplicate_apps = []
unique_apps = []

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print('Number of duplicate apps: ', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps: ', duplicate_apps[:10])

Number of duplicate apps:  1181


Examples of duplicate apps:  ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']
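Checking `name in unique_apps` scans the whole list each time, which gets slow as the list grows. A sketch of the same duplicate count using `collections.Counter` (toy names, not the real dataset):

```python
from collections import Counter

names = ['Box', 'Slack', 'Box', 'Google Ads', 'Google Ads', 'Google Ads']
counts = Counter(names)

# Every occurrence after the first one counts as a duplicate,
# matching how duplicate_apps is built above
n_duplicates = sum(c - 1 for c in counts.values())
duplicated_names = [name for name, c in counts.items() if c > 1]

print(n_duplicates)      # 3
print(duplicated_names)  # ['Box', 'Google Ads']
```

Using a `set` for `unique_apps` in the original loop would give a similar speedup while keeping its structure.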


Next, I'll print every entry for a single duplicated app to see where the entries differ:

In [8]:
for app in android:
    name = app[0]
    if name == 'Google Ads':
        print(app)

['Google Ads', 'BUSINESS', '4.3', '29313', '20M', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 30, 2018', '1.12.0', '4.0.3 and up']
['Google Ads', 'BUSINESS', '4.3', '29313', '20M', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 30, 2018', '1.12.0', '4.0.3 and up']
['Google Ads', 'BUSINESS', '4.3', '29331', '20M', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 30, 2018', '1.12.0', '4.0.3 and up']


The entries are identical except for the number of reviews (the fourth column). Instead of removing duplicates at random, we'll keep the entry with the highest number of reviews, since a higher review count suggests that the data was collected more recently.

In [9]:
reviews_max = {}

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
        

print('The expected length of the android dataset with unique entries is ', len(android) - len(duplicate_apps))
print('\n')
print('The actual length of the dictionary containing all unique apps is ', len(reviews_max))

The expected length of the android dataset with unique entries is  9659


The actual length of the dictionary containing all unique apps is  9659


Now, we will loop through the android dataset and compare the number of reviews of each app to the one stored in the `reviews_max` dictionary. 

We add the current row (`app`) to the `android_clean` list, and the app name (`name`) to the `already_added` list, if:

1. The current app's review count matches the one stored in the `reviews_max` dictionary; and
2. The app's name is not already present in the `already_added` list. This additional condition is needed for cases where several entries of a duplicate app share the same highest review count.

In [10]:
android_clean = []
already_added = []

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(app)
        already_added.append(name)

print('The clean version of the android dataset has a length of ', len(android_clean))

The clean version of the android dataset has a length of  9659
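The two passes above (first find the maximum, then pick the matching row) can also be done in one pass by storing the best row seen so far for each name. A minimal sketch on toy rows; `dedupe_by_max_reviews` is a hypothetical name, and the column positions default to the Google Play layout (name at 0, reviews at 3):

```python
def dedupe_by_max_reviews(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the one with the most reviews.

    The strict '>' means that on ties the first row seen is kept,
    which is also the row the two-pass approach above selects.
    """
    best = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


rows = [
    ['Google Ads', 'BUSINESS', '4.3', '29313'],
    ['Google Ads', 'BUSINESS', '4.3', '29313'],
    ['Google Ads', 'BUSINESS', '4.3', '29331'],
    ['Other App', 'TOOLS', '4.0', '10'],  # made-up row
]
clean = dedupe_by_max_reviews(rows)
print(len(clean))   # 2
print(clean[0][3])  # 29331
```

The rows come back in order of each name's first appearance, which may differ from the order of `android_clean`, although the same rows are kept.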


### Removing Non-English Apps

In both datasets, there are apps designed for non-English speakers. As data analysts for an English-speaking firm, we aim to exclude these non-English instances.

To achieve this, we'll write a function that checks whether each character in an app name falls within the ASCII range (character codes 0 to 127), which covers the characters commonly used in English text. However, some English app names contain special characters or emojis outside this range, such as `™` or `😜`. To avoid discarding those apps, our function classifies an app as non-English only if its name has more than three characters outside the ASCII range.

In [11]:
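def is_english(string):
    # 'string' avoids shadowing the built-in str type
    non_english_char_count = 0
    for char in string:
        # Characters with a code above 127 fall outside the ASCII range
        if ord(char) > 127:
            non_english_char_count += 1
    if non_english_char_count > 3:
        return False
    return True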

print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

True
False
True
True
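The threshold of three is a judgment call. A compact variant makes it a parameter so the trade-off is easy to experiment with; `is_probably_english` and `max_non_ascii` are our own names, not part of the original notebook:

```python
def is_probably_english(string, max_non_ascii=3):
    """Heuristic: treat a name as English if at most `max_non_ascii`
    of its characters fall outside the ASCII range (codes 0-127)."""
    non_ascii = sum(1 for char in string if ord(char) > 127)
    return non_ascii <= max_non_ascii


print(is_probably_english('Docs To Go™ Free Office Suite'))  # True
print(is_probably_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))   # False
# An English name with four emojis is rejected: a known limitation
print(is_probably_english('Emoji party 😜😜😜😜'))            # False
```

Raising the threshold keeps more emoji-heavy English names but also lets through more non-English names that contain only a few non-ASCII characters.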


Now that the function works as expected, we'll loop through each dataset and, in each iteration:

* If the app name is identified as English, append the whole row to a separate list.

In [12]:
android_english = []
ios_english = []

for app in android_clean:
    name = app[0]
    if is_english(name):
        android_english.append(app)
    
for app in ios:
    name = app[1]
    if is_english(name):
        ios_english.append(app)
        
explore_data(android_english, 0, 3, True)
print('\n')
explore_data(ios_english, 0 ,3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 

### Isolating Free Apps

Our company only builds apps that are free to download and install, and our main source of revenue is in-app ads.

Our datasets contain both free and paid apps, so we'll need to isolate the free apps for our analysis.

To do so, we'll iterate through both datasets and append only the free apps to new lists: for Google Play, the apps whose `Type` is `'Free'`; for the App Store, the apps whose `price` is 0.

In [13]:
android_free = []
ios_free = []

for app in android_english:
    app_type = app[6]  # 'app_type' avoids shadowing the built-in type()
    if app_type == 'Free':
        android_free.append(app)

for app in ios_english:
    price = float(app[4])
    if price == 0:
        ios_free.append(app)
        
print('The length of the cleaned android dataset is ', len(android_free))
print('\n')
print('The length of the cleaned ios dataset is ', len(ios_free))

The length of the cleaned android dataset is  8863


The length of the cleaned ios dataset is  3222
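For Google Play we filtered on `Type`, but the dataset also has a `Price` column (index 7). A quick consistency check could flag rows where the two disagree. This sketch assumes paid prices are stored with an optional leading dollar sign (e.g. `'$4.99'`); that format is an assumption about the raw data, so inspect a few paid rows before relying on it. The helper names are hypothetical and the rows are toy data:

```python
def parse_price(price):
    """Convert a price string such as '0' or '$4.99' to a float.
    Assumes at most a leading '$' and no other symbols."""
    return float(price.lstrip('$'))


def type_price_mismatches(rows, type_index, price_index):
    """Return rows whose type and price disagree: 'Free' with a
    nonzero price, or any other type with a price of zero."""
    mismatches = []
    for row in rows:
        free_by_type = row[type_index] == 'Free'
        free_by_price = parse_price(row[price_index]) == 0
        if free_by_type != free_by_price:
            mismatches.append(row)
    return mismatches


# Toy rows: [name, type, price]
rows = [
    ['App A', 'Free', '0'],
    ['App B', 'Paid', '$4.99'],
    ['App C', 'Free', '$1.99'],  # inconsistent on purpose
]
print(type_price_mismatches(rows, type_index=1, price_index=2))
# [['App C', 'Free', '$1.99']]
```

On the real data the call would be `type_price_mismatches(android_english, 6, 7)`; an empty result would suggest the two filters agree.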


## Data Analysis

To minimize risk and overhead, our validation strategy for an app idea consists of three steps:
1. Build a minimal Android version of the app and add it to Google Play.
2. If the app gets a good response from users, develop it further.
3. If the app is profitable after six months, build an iOS version and add it to the App Store.

Because our end goal is to add the app to both Google Play and the App Store, we need to find app profiles that are successful in both markets.

### Most common apps by genre

In [14]:
print(ios_header, '\n', '\n', android_header)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'] 
 
 ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


The headers above show that the genre of each app is at index 11 (`prime_genre`) in the iOS dataset and at index 9 (`Genres`) in the Android dataset.

We'll develop two functions to analyze the frequency tables:

1. A function to generate frequency tables displaying percentages.
2. Another function to sort and display the percentages in descending order.

In [15]:
def freq_table(dataset, index):
    table = {}
    total = 0
    
    for app in dataset:
        total += 1
        value = app[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    
    table_percentage = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentage[key] = percentage
    return table_percentage
        
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
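The two functions above can also be written with `collections.Counter` and `sorted` with a key function. A sketch on toy rows; `freq_table_pct` and `display_sorted` are hypothetical names, not replacements the notebook relies on:

```python
from collections import Counter


def freq_table_pct(dataset, index):
    """Percentage of rows taking each value in column `index`."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return {value: count / total * 100 for value, count in counts.items()}


def display_sorted(table):
    """Print 'value : percentage' lines, largest percentage first."""
    for value, pct in sorted(table.items(), key=lambda item: item[1], reverse=True):
        print(value, ':', pct)


rows = [['a', 'Games'], ['b', 'Games'], ['c', 'News'], ['d', 'Games']]
table = freq_table_pct(rows, 1)
display_sorted(table)
# Games : 75.0
# News : 25.0
```

One small difference: here ties keep their insertion order, whereas sorting `(percentage, key)` tuples as `display_table` does breaks ties by name in reverse alphabetical order.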

We begin by examining the frequency table for the `prime_genre` column of the App Store dataset.

In [16]:
display_table(ios_free, 11)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


Among free English apps, over half (58.16%) are games. Entertainment apps follow at nearly 8%, with photo and video apps at around 5%. Education apps make up only 3.66%, and social networking apps account for 3.29%.

The App Store's free English apps are predominantly for entertainment, with practical apps being less common. However, high numbers don't necessarily mean high user demand; the supply might not match the demand.

Let's proceed by analyzing the `Genres` and `Category` columns in the Google Play dataset, as they appear to be related.

In [17]:
display_table(android_free, 9)

Tools : 8.450863138892023
Entertainment : 6.070179397495204
Education : 5.348076272142616
Business : 4.592124562789123
Productivity : 3.8925871601038025
Lifestyle : 3.8925871601038025
Finance : 3.7007785174320205
Medical : 3.5315355974275078
Sports : 3.463838429425702
Personalization : 3.317161232088458
Communication : 3.2381812027530184
Action : 3.102786866749408
Health & Fitness : 3.0802211440821394
Photography : 2.944826808078529
News & Magazines : 2.798149610741284
Social : 2.6627552747376737
Travel & Local : 2.324269434728647
Shopping : 2.245289405393208
Books & Reference : 2.1437436533904997
Simulation : 2.042197901387792
Dating : 1.8616721200496444
Arcade : 1.8503892587160102
Video Players & Editors : 1.771409229380571
Casual : 1.7601263680469368
Maps & Navigation : 1.399074805370642
Food & Drink : 1.241114746699763
Puzzle : 1.128286133363421
Racing : 0.9928917973598104
Role Playing : 0.9364774906916393
Libraries & Demo : 0.9364774906916393
Auto & Vehicles : 0.9251946293580051
S

In [18]:
display_table(android_free, 1)

FAMILY : 18.898792733837304
GAME : 9.725826469592688
TOOLS : 8.462146000225657
BUSINESS : 4.592124562789123
LIFESTYLE : 3.9038700214374367
PRODUCTIVITY : 3.8925871601038025
FINANCE : 3.7007785174320205
MEDICAL : 3.5315355974275078
SPORTS : 3.396141261423897
PERSONALIZATION : 3.317161232088458
COMMUNICATION : 3.2381812027530184
HEALTH_AND_FITNESS : 3.0802211440821394
PHOTOGRAPHY : 2.944826808078529
NEWS_AND_MAGAZINES : 2.798149610741284
SOCIAL : 2.6627552747376737
TRAVEL_AND_LOCAL : 2.335552296062281
SHOPPING : 2.245289405393208
BOOKS_AND_REFERENCE : 2.1437436533904997
DATING : 1.8616721200496444
VIDEO_PLAYERS : 1.7939749520478394
MAPS_AND_NAVIGATION : 1.399074805370642
FOOD_AND_DRINK : 1.241114746699763
EDUCATION : 1.1621347173643235
ENTERTAINMENT : 0.9590432133589079
LIBRARIES_AND_DEMO : 0.9364774906916393
AUTO_AND_VEHICLES : 0.9251946293580051
HOUSE_AND_HOME : 0.8236488773552973
WEATHER : 0.8010831546880289
EVENTS : 0.7108202640189552
PARENTING : 0.6544059573507841
ART_AND_DESIGN : 0

The `Genres` column offers more granularity than the `Category` column. For our overview, we'll stick with the broader `Category` column.

We've seen that the App Store leans towards entertainment apps, while Google Play has a more balanced mix of practical and entertainment apps. Now, let's explore which genres attract the most users.

### Most Popular Apps by Genre on the App Store

In [19]:
genres_ios = freq_table(ios_free, 11)

for genre in genres_ios:
    total = 0
    len_genre = 0
    for app in ios_free:
        genre_app = app[11]
        if genre_app == genre:
            user_ratings = float(app[5])
            total += user_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre  # average number of ratings, not average star rating
    print(genre, ':', avg_n_ratings)

Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0
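The cell above rescans the whole dataset once per genre. A sketch of a single-pass alternative that accumulates sums and counts in dictionaries; `average_by_group` is a hypothetical name and the rows are toy data:

```python
from collections import defaultdict


def average_by_group(dataset, group_index, value_index):
    """Average of column `value_index` for each value of column `group_index`."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


rows = [
    ['App A', '300', 'Navigation'],
    ['App B', '100', 'Navigation'],
    ['App C', '50', 'Book'],
]
averages = average_by_group(rows, group_index=2, value_index=1)
print(averages)  # {'Navigation': 200.0, 'Book': 50.0}
```

On the real data, `average_by_group(ios_free, 11, 5)` should reproduce the per-genre averages printed above.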


The App Store data has no install counts, so we use the number of user ratings (`rating_count_tot`) as a proxy for popularity. By that measure, `Navigation` has the highest average (about 86,090 ratings per app), so let's look at this genre more closely.

In [20]:
def show_ratings_ios(genre):
    for app in ios_free:
        if app[11] == genre:
            print(app[1], ':', app[5])

show_ratings_ios('Navigation')

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


These per-app rating counts show that Waze and Google Maps together account for almost 97% of all ratings in the genre (499,957 of 516,542). The high average reflects two dominant incumbents rather than broad demand, so a new navigation app would likely struggle to compete.

Next, let's analyze `Reference`, the genre with the second-highest average:

In [21]:
show_ratings_ios('Reference')

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


We see a similar pattern here: Bible and Dictionary.com Dictionary & Thesaurus together account for roughly 88% of all ratings in the genre, leaving limited room for a newly developed app to compete.
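The concentration argument can be made concrete by computing the share of all ratings held by the top few apps. A minimal sketch; the helper name `top_k_share` is ours, and the counts are the `Navigation` and `Reference` values printed above:

```python
def top_k_share(counts, k):
    """Fraction of the total held by the k largest values."""
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    return sum(ordered[:k]) / total if total else 0.0


navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]
reference_ratings = [985920, 200047, 54175, 26786, 18418, 17588, 16849,
                     12122, 8535, 4693, 1497, 826, 762, 718, 14, 8, 0, 0]

print(round(top_k_share(navigation_ratings, 2), 2))  # 0.97
print(round(top_k_share(reference_ratings, 2), 3))   # 0.879
```

A share close to 1 means a few apps hold nearly all the attention; the same function could be applied to the other genres before deciding which ones to explore further.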

Let's proceed with other genres. We'll skip `Social Networking` and `Music`, since these genres are generally dominated by a few major players.

Next, we'll analyze the `Weather` and `Finance` genres.

In [22]:
show_ratings_ios('Weather')

The Weather Channel: Forecast, Radar & Alerts : 495626
The Weather Channel App for iPad – best local forecast, radar map, and storm tracking : 208648
WeatherBug - Local Weather, Radar, Maps, Alerts : 188583
MyRadar NOAA Weather Radar Forecast : 150158
AccuWeather - Weather for Life : 144214
Yahoo Weather : 112603
Weather Underground: Custom Forecast & Local Radar : 49192
NOAA Weather Radar - Weather Forecast & HD Radar : 45696
Weather Live Free - Weather Forecast & Alerts : 35702
Storm Radar : 22792
QuakeFeed Earthquake Map, Alerts, and News : 6081
Moji Weather - Free Weather Forecast : 2333
Hurricane by American Red Cross : 1158
Forecast Bar : 375
Hurricane Tracker WESH 2 Orlando, Central Florida : 203
FEMA : 128
iWeather - World weather forecast : 80
Weather - Radar - Storm with Morecast App : 78
Yurekuru Call : 53
Weather & Radar : 37
WRAL Weather Alert : 25
Météo-France : 24
JaxReady : 22
Freddy the Frogcaster's Weather Station : 14
Almanac Long-Range Weather Forecast : 12
TodayAir

In the `Weather` genre, The Weather Channel's two apps have the highest rating counts, but WeatherBug, MyRadar, AccuWeather, and Yahoo Weather each have more than 100,000 ratings as well. The ratings are therefore spread across more apps than in `Navigation` or `Reference`.

In [23]:
show_ratings_ios('Finance')    

Chase Mobile℠ : 233270
Mint: Personal Finance, Budget, Bills & Money : 232940
Bank of America - Mobile Banking : 119773
PayPal - Send and request money safely : 119487
Credit Karma: Free Credit Scores, Reports & Alerts : 101679
Capital One Mobile : 56110
Citi Mobile® : 48822
Wells Fargo Mobile : 43064
Chase Mobile : 34322
Square Cash - Send Money for Free : 23775
Capital One for iPad : 21858
Venmo : 21090
USAA Mobile : 19946
TaxCaster – Free tax refund calculator : 17516
Amex Mobile : 11421
TurboTax Tax Return App - File 2016 income taxes : 9635
Bank of America - Mobile Banking for iPad : 7569
Wells Fargo for iPad : 2207
Stash Invest: Investing & Financial Education : 1655
Digit: Save Money Without Thinking About It : 1506
IRS2Go : 1329
Capital One CreditWise - Credit score and report : 1019
U by BB&T : 790
Paribus - Rebates When Prices Drop : 768
KeyBank Mobile : 623
VyStar Mobile Banking for iPhone : 434
Sparkasse - Your mobile branch : 77
VyStar Mobile Banking for iPad : 57
Zaim : 4

In the `Finance` genre, Chase Mobile℠ and Mint lead with over 230,000 ratings each, but Bank of America, PayPal, and Credit Karma each exceed 100,000 as well. The concentration is much less pronounced than in the `Reference` or `Navigation` genres, so this genre is worth a closer look.

Finally, we'll analyze the `Book` genre.

In [24]:
show_ratings_ios('Book')

Kindle – Read eBooks, Magazines & Textbooks : 252076
Audible – audio books, original series & podcasts : 105274
Color Therapy Adult Coloring Book for Adults : 84062
OverDrive – Library eBooks and Audiobooks : 65450
HOOKED - Chat Stories : 47829
BookShout: Read eBooks & Track Your Reading Goals : 879
Dr. Seuss Treasury — 50 best kids books : 451
Green Riding Hood : 392
Weirdwood Manor : 197
MangaZERO - comic reader : 9
ikouhoushi : 0
MangaTiara - love comic reader : 0
謎解き : 0
謎解き2016 : 0



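The `Book` genre also looks promising: Kindle leads, but it holds only about 45% of the genre's ratings, and Audible, Color Therapy Adult Coloring Book for Adults, OverDrive, and HOOKED each have more than 40,000 ratings.

In conclusion, we've identified three promising genres for app development on the App Store:

* Finance
* Weather
* Book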

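### Most Popular Apps by Genre on Google Play

Now let's perform the same analysis for Google Play. This dataset records install counts, so we'll use the `Installs` column instead of rating counts, grouping apps by the broader `Category` column. Install counts are open-ended bins such as `'100,000+'`, so the averages below are approximate: each bin is treated as its lower bound (for example, `'100,000+'` becomes 100,000).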
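The string cleanup used in the next cell can be wrapped in a small helper. A sketch; the name `parse_installs` is hypothetical:

```python
def parse_installs(installs):
    """Convert an install bin such as '1,000,000+' to a float.

    The '+' marks an open-ended bin, so the result is the bin's lower bound.
    """
    return float(installs.replace(',', '').replace('+', ''))


print(parse_installs('1,000,000+'))  # 1000000.0
print(parse_installs('5+'))          # 5.0
```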

In [25]:
categories_android = freq_table(android_free, 1)

for category in categories_android:
    total = 0
    len_category = 0
    for app in android_free:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace('+', '')
            n_installs = n_installs.replace(',', '')
            n_installs = float(n_installs)
            total += n_installs
            len_category += 1
    avg_installs = total / len_category
    print(category, ':', avg_installs)

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3697848.1731343283
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_

In [26]:
def show_installs_android(category):
    for app in android_free:
        if app[1] == category:
            print(app[0], ':', app[5])
        
show_installs_android('COMMUNICATION')

WhatsApp Messenger : 1,000,000,000+
Messenger for SMS : 10,000,000+
My Tele2 : 5,000,000+
imo beta free calls and text : 100,000,000+
Contacts : 50,000,000+
Call Free – Free Call : 5,000,000+
Web Browser & Explorer : 5,000,000+
Browser 4G : 10,000,000+
MegaFon Dashboard : 10,000,000+
ZenUI Dialer & Contacts : 10,000,000+
Cricket Visual Voicemail : 10,000,000+
TracFone My Account : 1,000,000+
Xperia Link™ : 10,000,000+
TouchPal Keyboard - Fun Emoji & Android Keyboard : 10,000,000+
Skype Lite - Free Video Call & Chat : 5,000,000+
My magenta : 1,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Seznam.cz : 1,000,000+
Antillean Gold Telegram (original version) : 100,000+
AT&T Visual Voicemail : 10,000,000+
GMX Mail : 10,000,000+
Omlet Chat : 10,000,000+
My Vodacom SA : 5,000,000+
Microsoft Edge : 5,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Calls & Text by Mo+ : 5,000,000+
free 

Everbridge : 100,000+
Best Auto Call Recorder Free : 500+
EZ Wifi Notification : 10,000+
Test Server SMS FA : 5+
Lite for Facebook Messenger : 1,000,000+
FC Browser - Focus Privacy Browser : 1,000+
EHiN-FH conferenceapp : 100+
Carpooling FH Hagenberg : 100+
Wi-Fi Auto-connect : 1,000,000+
Talkie - Wi-Fi Calling, Chats, File Sharing : 500,000+
WeFi - Free Fast WiFi Connect & Find Wi-Fi Map : 1,000,000+
Sat-Fi : 5,000+
Portable Wi-Fi hotspot Free : 100,000+
TownWiFi | Wi-Fi Everywhere : 500,000+
Jazz Wi-Fi : 10,000+
Sat-Fi Voice : 1,000+
Free Wi-fi HotspoT : 50,000+
FN Web Radio : 10+
FNH Payment Info : 10+
MARKET FO : 100+
FO OP St-Nazaire : 100+
FO SODEXO : 100+
FO RCBT : 100+
FO Interim : 100+
FO PSA Sept-Fons : 100+
FO AIRBUS TLSE : 1,000+
FO STELIA Méaulte : 100+
FO AIRBUS Nantes : 100+
Firefox Focus: The privacy browser : 1,000,000+
FP Connect : 100+
FreedomPop Messaging Phone/SIM : 500,000+
FP Live : 10+
HipChat - beta version : 50,000+


The `COMMUNICATION` average is inflated by a few giants: WhatsApp Messenger and Messenger each have over 1,000,000,000 installs, and Google Duo and imo free video calls and chat have over 500,000,000 each.

In [27]:
show_installs_android('VIDEO_PLAYERS')

YouTube : 1,000,000,000+
All Video Downloader 2018 : 1,000,000+
Video Downloader : 10,000,000+
HD Video Player : 1,000,000+
Iqiyi (for tablet) : 1,000,000+
Video Player All Format : 10,000,000+
Motorola Gallery : 100,000,000+
Free TV series : 100,000+
Video Player All Format for Android : 500,000+
VLC for Android : 100,000,000+
Code : 10,000,000+
Vote for : 50,000,000+
XX HD Video downloader-Free Video Downloader : 1,000,000+
OBJECTIVE : 1,000,000+
Music - Mp3 Player : 10,000,000+
HD Movie Video Player : 1,000,000+
YouCut - Video Editor & Video Maker, No Watermark : 5,000,000+
Video Editor,Crop Video,Movie Video,Music,Effects : 1,000,000+
YouTube Studio : 10,000,000+
video player for android : 10,000,000+
Vigo Video : 50,000,000+
Google Play Movies & TV : 1,000,000,000+
HTC Service － DLNA : 10,000,000+
VPlayer : 1,000,000+
MiniMovie - Free Video and Slideshow Editor : 50,000,000+
Samsung Video Library : 50,000,000+
OnePlus Gallery : 1,000,000+
LIKE – Magic Video Maker & Community : 50,

The `VIDEO_PLAYERS` category is similarly dominated by a few large incumbents, such as YouTube and Google Play Movies & TV, with over 1,000,000,000 installs each.
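Because a handful of apps with 1,000,000,000+ installs can dominate a category's mean, the median is a useful complement: it barely moves when one giant is added. A sketch with the standard-library `statistics` module, on made-up install counts:

```python
from statistics import mean, median

# Made-up install counts: four ordinary apps plus one giant
installs = [1_000_000, 5_000_000, 10_000_000, 10_000_000, 1_000_000_000]

print(f"mean:   {mean(installs):,.0f}")    # mean:   205,200,000
print(f"median: {median(installs):,.0f}")  # median: 10,000,000
```

A large gap between the mean and the median of a category, as here, is a quick signal that a few incumbents account for most of the installs.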

Now, let's analyze the `FINANCE` category:

In [28]:
show_installs_android('FINANCE')

K PLUS : 10,000,000+
ING Banking : 1,000,000+
Citibanamex Movil : 5,000,000+
The postal bank : 5,000,000+
KTB Netbank : 5,000,000+
Mobile Bancomer : 10,000,000+
Nedbank Money : 500,000+
SCB EASY : 5,000,000+
CASHIER : 10,000,000+
Rabo Banking : 1,000,000+
Capitec Remote Banking : 1,000,000+
Itau bank : 10,000,000+
Nubank : 5,000,000+
The Societe Generale App : 1,000,000+
IKO : 1,000,000+
Cash App : 10,000,000+
Standard Bank / Stanbic Bank : 1,000,000+
Bualuang mBanking : 5,000,000+
Intesa Sanpaolo Mobile : 1,000,000+
UBA Mobile Banking : 1,000,000+
BBVA Spain : 5,000,000+
MyMo by GSB : 1,000,000+
VTB-Online : 5,000,000+
Ecobank Mobile Banking : 1,000,000+
Banorte Movil : 1,000,000+
Zenith Bank Mobile App : 1,000,000+
GCash - Buy Load, Pay Bills, Send Money : 1,000,000+
Post Bank : 1,000,000+
İşCep : 10,000,000+
People's Bank : 1,000,000+
Transfer : 5,000,000+
T-Mobile in : 1,000,000+
TrueMoney Wallet : 5,000,000+
Alfa-Bank (Alfa-Bank) : 1,000,000+
Bank of Brazil : 10,000,000+
WiseBanya

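Among the `FINANCE` apps shown, installs are spread across many apps, mostly banking apps, and none exceeds the 10,000,000+ bin. This more even distribution suggests there may be room for a new app to compete in this category.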

Next, I'll analyze the `BOOKS_AND_REFERENCE` category to determine if it's also a promising genre in the Google Play Store, given its potential in the iOS market.

In [29]:
show_installs_android('BOOKS_AND_REFERENCE')

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E