This analysis uses data about mobile apps on the App Store and Google Play, and focuses on apps that are free to download and install. The main source of revenue for free apps is in-app ads, so revenue depends mostly on the number of people who use the app. Knowing how many people use an app therefore lets us reason about its revenue.

**The main goal** of this analysis is to find out which kinds of apps are the most attractive to users and to pass this information to developers for future development.

[Google Play Store Apps](https://www.kaggle.com/lava18/google-play-store-apps/home) - approximately 10000 Android apps from Google Play  
[Mobile App Store](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home) - approximately 7000 iOS apps from the App Store

In [2]:
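from csv import reader
def open_file(file, header=True):
    # Read a CSV file into a list of rows; the encoding is set explicitly
    # because app names contain non-ASCII characters.
    # The with-block closes the file once it has been read.
    with open(file, encoding='utf8', newline='') as opened_file:
        data=list(reader(opened_file))
    if header:
        return data[0],data[1:]
    else:
        return data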

In [3]:
android_header, android=open_file('googleplaystore.csv')
ios_header, ios=open_file('AppleStore.csv')

In [4]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice=dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [5]:
explore_data(ios,0,3, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


In [6]:
explore_data(android,0,3,True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


In [7]:
print(android_header)
print('\n')
print(ios_header)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


### Find mistakes

In [8]:
# Android: one row has the wrong number of columns
for index,row in enumerate(android):
    if len(row)!=len(android_header):
        print(index)

10472


In [9]:
# iOS: no rows with the wrong number of columns
for index,row in enumerate(ios):
    if len(row)!=len(ios_header):
        print(index)

In [10]:
del android[10472]
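
Deleting by a hard-coded index is fragile: if the cell is run a second time, a valid row is removed. A minimal sketch of an alternative that keeps only the rows whose length matches the header. The `header` and `rows` lists are made-up stand-ins for `android_header` and `android`, not real data.

```python
# Toy stand-ins for android_header and android (hypothetical data).
header = ['App', 'Category', 'Rating']
rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],            # malformed: one column is missing
    ['App C', 'GAME', '4.5'],
]

# Keep only the rows that have exactly one value per header column.
valid_rows = [row for row in rows if len(row) == len(header)]

print(len(rows), '->', len(valid_rows))
```

Running this filter repeatedly gives the same result, which is the main reason to prefer it over `del` with a fixed index.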

Check data for duplicates in Android

In [11]:
duplicates=[]
unique=[]
for row in android:
    name=row[0]
    if name in unique:
        duplicates.append(name)
    else:
        unique.append(name)
duplicates[:15]

['Quick PDF Scanner + OCR FREE',
 'Box',
 'Google My Business',
 'ZOOM Cloud Meetings',
 'join.me - Simple Meetings',
 'Box',
 'Zenefits',
 'Google Ads',
 'Google My Business',
 'Slack',
 'FreshBooks Classic',
 'Insightly CRM',
 'QuickBooks Accounting: Invoicing & Expenses',
 'HipChat - Chat Built for Teams',
 'Xero Accounting Software']

Check data for duplicates in the App Store (by app id)

In [12]:
duplicates=[]
unique=[]
for row in ios:
    app_id=row[0]   # iOS rows start with the app id; track_name is row[1]
    if app_id in unique:
        duplicates.append(app_id)
    else:
        unique.append(app_id)
duplicates[:15]

[]

The Google Play data has some duplicates. We won't remove duplicates at random; instead we should define a criterion for which row to keep. Let's have a look at one duplicated app:

In [13]:
for app in android:
    name=app[0]
    if name=='Slack':
        print(app)

['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51510', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']


The only difference is in the fourth column, the number of reviews. Let's keep the row with the highest number of reviews.
First, we'll build a dictionary where:

   * key - name of the app
   * value - maximum number of reviews

In [14]:
reviews_max={}
for row in android:
    name=row[0]
    n_reviews=float(row[3])
    if name in reviews_max:
        reviews_max[name]=max(reviews_max[name],n_reviews)
    else:
        reviews_max[name]=n_reviews
len(reviews_max)

9659

Check for one app if everything is correct

In [15]:
reviews_max['Slack']

51510.0

Now we'll build the clean dataset. We have to consider that several rows can share the same maximum number of reviews, and in that case we need only one of them. To check for such apps we keep a list, `already_added`.

In [16]:
android_clean=[]
already_added=[]
for row in android:
    name=row[0]
    n_reviews=float(row[3])
    if reviews_max[name]==n_reviews and (name not in already_added):
        android_clean.append(row)
        already_added.append(name)
len(android_clean)

9659
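
The two steps above can be combined into one helper. This is a sketch, not the notebook's own code: the function name `remove_duplicates` and the sample rows are assumptions for illustration (the Slack rows are shortened from the output above, the Box review count is made up). It uses a `set` for `already_added`, because membership tests on a list get slow as the list grows.

```python
def remove_duplicates(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row with the highest review count."""
    # First pass: highest review count seen for each name.
    reviews_max = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        if n_reviews > reviews_max.get(name, -1.0):
            reviews_max[name] = n_reviews

    # Second pass: keep the first row that reaches that maximum.
    clean = []
    already_added = set()
    for row in rows:
        name = row[name_index]
        if float(row[reviews_index]) == reviews_max[name] and name not in already_added:
            clean.append(row)
            already_added.add(name)
    return clean

# Shortened rows: only the first four Android columns are kept.
sample = [
    ['Slack', 'BUSINESS', '4.4', '51507'],
    ['Slack', 'BUSINESS', '4.4', '51507'],
    ['Slack', 'BUSINESS', '4.4', '51510'],
    ['Box', 'BUSINESS', '4.2', '1000'],   # hypothetical review count
    ['Box', 'BUSINESS', '4.2', '1000'],
]
deduplicated = remove_duplicates(sample)
print(deduplicated)
```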

Let's check the cleaned data: the number of rows should now equal the number of unique app names (9659).

In [17]:
explore_data(android_clean,0,5,True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


Number of rows: 9659
Number of columns: 13


Our company wants to analyze only apps that are directed toward an English-speaking audience, so we should remove the apps that are not.

We can approach this by keeping only the rows whose app name consists of letters of the English alphabet, the digits 0 to 9, punctuation marks (., !, ?, ;), and other symbols (+, *, /).

The `ord()` function returns the corresponding number for each character.

All the characters that are specific to English texts are encoded using the ASCII standard. Each ASCII character has a corresponding number between 0 and 127, and we can take advantage of that to build a function that checks an app name and tells us whether it contains non-ASCII characters.
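
As a quick illustration of the idea, the snippet below prints the number `ord()` returns for a few characters and whether it falls inside the ASCII range. The character list is chosen only for demonstration.

```python
# ord() returns the Unicode code point of a single character.
characters = ['a', 'Z', '7', '+', '™', '爱', '😜']
codes = {character: ord(character) for character in characters}

for character, code in codes.items():
    label = 'ASCII' if code <= 127 else 'non-ASCII'
    print(repr(character), code, label)
```

Letters, digits and common punctuation stay at or below 127, while `™`, Chinese characters and emoji have much larger code points.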

Let's write a function that returns False if any character in the string is a non-ASCII symbol, and True otherwise.

In [18]:
def is_english(string):
    for character in string:
        if ord(character)>127:
            return False
    return True
print (is_english('Instagram'))
print (is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print (is_english('Docs To Go™ Free Office Suite'))
print (is_english('Instachat 😜'))

True
False
False
False


Symbols like 😜 and ™ are outside the ASCII range. To minimize data loss, we will only remove an app if its name has more than three non-ASCII characters:

In [19]:
def is_english(string):
    n=0
    for character in string:
        if ord(character)>127:
            n+=1
    if n>3:
        return False
    else:
        return True
print (is_english('Instagram'))
print (is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print (is_english('Docs To Go™ Free Office Suite'))
print (is_english('Instachat 😜'))

True
False
True
True
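
The same check can be written more compactly. This is an equivalent sketch, with the threshold exposed as a parameter; the parameter name `max_non_ascii` is an assumption, not part of the original function.

```python
def is_english(string, max_non_ascii=3):
    """Return True if at most max_non_ascii characters are outside ASCII."""
    non_ascii = sum(1 for character in string if ord(character) > 127)
    return non_ascii <= max_non_ascii

names = ['Instagram', '爱奇艺PPS -《欢乐颂2》电视剧热播',
         'Docs To Go™ Free Office Suite', 'Instachat 😜']
results = [is_english(name) for name in names]
print(results)
```

The filter is only a heuristic: a name written in another language with plain Latin letters still passes, and an English name decorated with four or more emoji is rejected.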


Let's apply the function to our dataset

In [20]:
english_android=[]
english_ios=[]
for row in android_clean:
    name=row[0]
    if is_english(name):
        english_android.append(row)
for row in ios:
    name=row[1]
    if is_english(name):
        english_ios.append(row)
print ('Share of English apps among the deduplicated Android apps:',
       round(len(english_android)/len(android_clean),3),
       '\nNumber of applications:',len(english_android))
print ('Share of English apps among all iOS apps:',
       round(len(english_ios)/len(ios),3),
       '\nNumber of applications:',len(english_ios))

Share of English apps among the deduplicated Android apps: 0.995 
Number of applications: 9614
Share of English apps among all iOS apps: 0.859 
Number of applications: 6183


About 14% of the apps in the iOS dataset are not English (judging by their titles), compared with fewer than 1% of the Android apps.

From now on we will work with two new data sets that contain only English apps.

In the next step we will isolate only the free apps for our analysis.

In [21]:
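import re
android_final=[]
ios_final=[]
# Strip everything except digits and the decimal point (such as a currency symbol)
# before converting the price; raw strings avoid the invalid-escape warning for \d.
for row in english_android:
    price=float(re.sub(r"[^\d.]","",row[7]))
    if price==0:
        android_final.append(row)
for row in english_ios:
    price=float(re.sub(r"[^\d.]","",row[4]))
    if price==0:
        ios_final.append(row)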
print ('Free Android applications among the English applications:',
       round(len(android_final)/len(english_android),3),
       '\nNumber of free applications:',len(android_final))
print ('Free iOS applications among the English applications:',
       round(len(ios_final)/len(english_ios),3),
       '\nNumber of free applications:',len(ios_final))

Free Android applications among the English applications: 0.922 
Number of free applications: 8864
Free iOS applications among the English applications: 0.521 
Number of free applications: 3222


Now that the data is clean, we are ready to start the analysis.

## Data analysis

Our aim as future developers is to determine the kinds of apps that are likely to attract more users

To minimize risks and overhead, our validation strategy for an app idea is comprised of three steps:

  * Build a minimal Android version of the app, and add it to Google Play.
  * If the app has a good response from users, we develop it further.
  * If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

Because our end goal is to add the app on both Google Play and the App Store, we need to find app profiles that are successful on both markets. For instance, a profile that works well for both markets might be a productivity app that makes use of gamification.

Let's inspect both data sets to identify what columns we will use to generate frequency tables

In [22]:
import pandas as pd
pd.DataFrame(android_final,columns=android_header).head(1).transpose()

Unnamed: 0,0
App,Photo Editor & Candy Camera & Grid & ScrapBook
Category,ART_AND_DESIGN
Rating,4.1
Reviews,159
Size,19M
Installs,"10,000+"
Type,Free
Price,0
Content Rating,Everyone
Genres,Art & Design


Possible analysis:
  1.  Look at the apps with the highest rating, grouped by Genres and Size
  2.  Look at the number of installs and reviews for the same groups

But first we should find out which genres are the most common.

In [23]:
pd.DataFrame(ios_final,columns=ios_header).head(1).transpose()

Unnamed: 0,0
id,284882215
track_name,Facebook
size_bytes,389879808
currency,USD
price,0.0
rating_count_tot,2974676
rating_count_ver,212
user_rating,3.5
user_rating_ver,3.5
ver,95.0


To see the most common app genres we need a function that returns the frequency of each value in a given column, expressed as a percentage.

In [24]:
def freq_table(dataset,index):
    freq_dict={}
    for row in dataset:
        key=row[index]
        if key in freq_dict:
            freq_dict[key]+=1
        else:
            freq_dict[key]=1
    for i in freq_dict:
        freq_dict[i]=round(freq_dict[i]*100/len(dataset),3)
    return freq_dict
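
For comparison, the counting step can also be done with `collections.Counter` from the standard library. This is a sketch of an equivalent function under the same name; the four-row `apps` list is made-up data.

```python
from collections import Counter

def freq_table(dataset, index):
    """Return {value: percentage of rows} for the column at `index`."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return {key: round(count * 100 / total, 3) for key, count in counts.items()}

# Hypothetical mini-dataset: [name, genre]
apps = [['A', 'Games'], ['B', 'Games'], ['C', 'Music'], ['D', 'Games']]
table = freq_table(apps, 1)
print(table)
```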

Check sum of frequencies in table for genres in Android apps:

In [25]:
# 'total' instead of 'sum', so the built-in sum() is not shadowed
total=0
and_genr=freq_table(android_final,9)
for i in and_genr:
    total+=and_genr[i]
total

99.99300000000001
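
The total is 99.993 rather than 100 because every percentage is rounded to three decimals before the values are added, and the small rounding errors accumulate. A minimal illustration with three equally common categories (made-up numbers):

```python
# Each of three equal categories has a share of 100/3 = 33.333...%
shares = [round(100 / 3, 3)] * 3
total = sum(shares)

print(shares[0])   # one rounded share
print(total)       # slightly below 100 because of the rounding
```

The extra digits at the very end of such sums come from floating-point arithmetic and can be ignored.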

The sorted() function does not sort a dictionary by its values directly, so we transform the dictionary into a list of tuples and then apply the function. The sorting works correctly when the dictionary value comes first in each tuple and the dictionary key comes second.

In [26]:
def display_table(dataset, index):
    table=freq_table(dataset,index)
    table_display=[]
    for key in table:
        #when we write expression in (), we make a tuple
        #table[key] will return value
        key_value_as_tuple=(table[key],key)
        #then make a list of tuples
        table_display.append(key_value_as_tuple)
        
    table_sorted=sorted(table_display,reverse=True)
    for entry in table_sorted:
        print(entry[1],':',entry[0])
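
An alternative to swapping key and value is to pass a `key` function to `sorted()`. A short sketch, using three of the iOS genre percentages as sample input:

```python
table = {'Photo & Video': 4.966, 'Games': 58.163, 'Entertainment': 7.883}

# Sort the (genre, share) pairs by share, largest first.
table_sorted = sorted(table.items(), key=lambda item: item[1], reverse=True)

for genre, share in table_sorted:
    print(genre, ':', share)
```

The two approaches differ only for ties: the tuple version orders equal percentages by name in reverse alphabetical order, while the `key` version keeps them in their original dictionary order.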

### Most common genres in iOS apps:

In [27]:
display_table(ios_final,-5)

Games : 58.163
Entertainment : 7.883
Photo & Video : 4.966
Education : 3.662
Social Networking : 3.29
Shopping : 2.607
Utilities : 2.514
Sports : 2.142
Music : 2.048
Health & Fitness : 2.017
Productivity : 1.738
Lifestyle : 1.583
News : 1.335
Travel : 1.241
Finance : 1.117
Weather : 0.869
Food & Drink : 0.807
Reference : 0.559
Business : 0.528
Book : 0.435
Navigation : 0.186
Medical : 0.186
Catalogs : 0.124


It's time to draw some conclusions. The most common genre among **free** English apps on the App Store is Games (more than 50%), the second is Entertainment (about 8%), the third is Photo & Video (about 5%), and only then come Education and Social Networking.

We can conclude that most free English apps are associated with entertainment: the three most common genres (Games, Entertainment, Photo & Video) together make up about 70%, and Social Networking, Shopping, Sports and Music can also be tied to entertainment. Only a small share of free English apps serve a practical purpose: Education, Utilities, Productivity, Lifestyle.

But we should understand that these statistics tell us which genres are the most common, not which are the most popular: demand might not match supply.

### Most common genres in Android apps:

In [28]:
display_table(android_final,9)

Tools : 8.45
Entertainment : 6.069
Education : 5.347
Business : 4.592
Productivity : 3.892
Lifestyle : 3.892
Finance : 3.7
Medical : 3.531
Sports : 3.463
Personalization : 3.317
Communication : 3.238
Action : 3.102
Health & Fitness : 3.08
Photography : 2.944
News & Magazines : 2.798
Social : 2.662
Travel & Local : 2.324
Shopping : 2.245
Books & Reference : 2.144
Simulation : 2.042
Dating : 1.861
Arcade : 1.85
Video Players & Editors : 1.771
Casual : 1.76
Maps & Navigation : 1.399
Food & Drink : 1.241
Puzzle : 1.128
Racing : 0.993
Role Playing : 0.936
Libraries & Demo : 0.936
Auto & Vehicles : 0.925
Strategy : 0.914
House & Home : 0.824
Weather : 0.801
Events : 0.711
Adventure : 0.677
Comics : 0.609
Beauty : 0.598
Art & Design : 0.598
Parenting : 0.496
Card : 0.451
Casino : 0.429
Trivia : 0.417
Educational;Education : 0.395
Board : 0.384
Educational : 0.372
Education;Education : 0.338
Word : 0.259
Casual;Pretend Play : 0.237
Music : 0.203
Racing;Action & Adventure : 0.169
Puzzle;Brain G

The picture looks significantly different on Google Play. First of all, there are more genres. In the Android dataset `Genres` is a more granular variable than `Category`, so in our further analysis we will use `Category`, but for now let's look at `Genres`:

There is no single genre that outnumbers all the others. We can even claim that there is a balance between practical apps and apps for fun, because the five most common genres are Tools, Entertainment, Education, Business and Productivity.

### Most common Category in Android apps:

In [29]:
display_table(android_final,1)

FAMILY : 18.908
GAME : 9.725
TOOLS : 8.461
BUSINESS : 4.592
LIFESTYLE : 3.903
PRODUCTIVITY : 3.892
FINANCE : 3.7
MEDICAL : 3.531
SPORTS : 3.396
PERSONALIZATION : 3.317
COMMUNICATION : 3.238
HEALTH_AND_FITNESS : 3.08
PHOTOGRAPHY : 2.944
NEWS_AND_MAGAZINES : 2.798
SOCIAL : 2.662
TRAVEL_AND_LOCAL : 2.335
SHOPPING : 2.245
BOOKS_AND_REFERENCE : 2.144
DATING : 1.861
VIDEO_PLAYERS : 1.794
MAPS_AND_NAVIGATION : 1.399
FOOD_AND_DRINK : 1.241
EDUCATION : 1.162
ENTERTAINMENT : 0.959
LIBRARIES_AND_DEMO : 0.936
AUTO_AND_VEHICLES : 0.925
HOUSE_AND_HOME : 0.824
WEATHER : 0.801
EVENTS : 0.711
PARENTING : 0.654
ART_AND_DESIGN : 0.643
COMICS : 0.62
BEAUTY : 0.598


Analyzing the Category column on Google Play, we can conclude that it is interlinked with Genres. The most common category is Family and the second is Game; most of the remaining categories are tied to practical apps.

**As a result, we can conclude that among the free English apps, practical apps are better represented on Google Play than on the App Store.**

Now let's analyze which kinds of apps users use the most.

  * In the Android dataset we can find this information in `Installs`.
  * The iOS dataset has no install count, so we use `rating_count_tot` (the total number of user ratings) as a proxy.

### Most popular genres in AppStore

In [30]:
ios_genre=freq_table(ios_final,-5)
for genre in ios_genre:
    total=0
    length=0
    for row in ios_final:
        installs=float(row[5])
        genre_ios=row[-5]
        if genre_ios==genre:
            total+=installs
            length+=1
    print('The average number for genre',genre,'in App Store is', round(total/length,2))

The average number for genre Shopping in App Store is 26919.69
The average number for genre Catalogs in App Store is 4004.0
The average number for genre Lifestyle in App Store is 16485.76
The average number for genre Education in App Store is 7003.98
The average number for genre Navigation in App Store is 86090.33
The average number for genre Reference in App Store is 74942.11
The average number for genre Travel in App Store is 28243.8
The average number for genre Photo & Video in App Store is 28441.54
The average number for genre Productivity in App Store is 21028.41
The average number for genre Sports in App Store is 23008.9
The average number for genre Games in App Store is 22788.67
The average number for genre Social Networking in App Store is 71548.35
The average number for genre Food & Drink in App Store is 33333.92
The average number for genre Music in App Store is 57326.53
The average number for genre Weather in App Store is 52279.89
The average number for genre Entertainment i
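
The loop above rescans the whole dataset once per genre. A sketch of a single-pass version is shown below; the helper name `average_by_group` and the shortened three-column rows are assumptions for illustration, with rating counts taken from the listings in this notebook.

```python
from collections import defaultdict

def average_by_group(dataset, group_index, value_index):
    """Average the numeric column at value_index for each value of group_index."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: round(totals[group] / counts[group], 2) for group in totals}

# Shortened rows: [name, rating_count_tot, prime_genre]
sample = [
    ['Waze', '345046', 'Navigation'],
    ['Google Maps', '154911', 'Navigation'],
    ['Bible', '985920', 'Reference'],
]
averages = average_by_group(sample, group_index=2, value_index=1)
for genre, average in averages.items():
    print(genre, ':', average)
```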

At first glance, the `Navigation` genre has the highest average number of user ratings (our proxy for installs), probably because almost everyone uses maps. `Reference` apps follow closely, then `Social Networking`, `Music` and `Weather`. So we can hypothesize that almost everyone has a map app, a music app, an app for connecting with friends and some reference app on their phone.

Let's take a closer look at some of these genres:

In [31]:
for row in ios_final:
    if row[-5]=='Navigation':
        print(row[1],':',row[5])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


In [32]:
for row in ios_final:
    if row[-5]=='Music':
        print(row[1],':',row[5])

Pandora - Music & Radio : 1126879
Spotify Music : 878563
Shazam - Discover music, artists, videos & lyrics : 402925
iHeartRadio – Free Music & Radio Stations : 293228
SoundCloud - Music & Audio : 135744
Magic Piano by Smule : 131695
Smule Sing! : 119316
TuneIn Radio - MLB NBA Audiobooks Podcasts Music : 110420
Amazon Music : 106235
SoundHound Song Search & Music Player : 82602
Sonos Controller : 48905
Bandsintown Concerts : 30845
Karaoke - Sing Karaoke, Unlimited Songs! : 28606
My Mixtapez Music : 26286
Sing Karaoke Songs Unlimited with StarMaker : 26227
Ringtones for iPhone & Ringtone Maker : 25403
Musi - Unlimited Music For YouTube : 25193
AutoRap by Smule : 18202
Spinrilla - Mixtapes For Free : 15053
Napster - Top Music & Radio : 14268
edjing Mix:DJ turntable to remix and scratch music : 13580
Free Music - MP3 Streamer & Playlist Manager Pro : 13443
Free Piano app by Yokee : 13016
Google Play Music : 10118
Certified Mixtapes - Hip Hop Albums & Mixtapes : 9975
TIDAL : 7398
YouTube Mu

In [33]:
for row in ios_final:
    if row[-5]=='Weather':
        print(row[1],':',row[5])

The Weather Channel: Forecast, Radar & Alerts : 495626
The Weather Channel App for iPad – best local forecast, radar map, and storm tracking : 208648
WeatherBug - Local Weather, Radar, Maps, Alerts : 188583
MyRadar NOAA Weather Radar Forecast : 150158
AccuWeather - Weather for Life : 144214
Yahoo Weather : 112603
Weather Underground: Custom Forecast & Local Radar : 49192
NOAA Weather Radar - Weather Forecast & HD Radar : 45696
Weather Live Free - Weather Forecast & Alerts : 35702
Storm Radar : 22792
QuakeFeed Earthquake Map, Alerts, and News : 6081
Moji Weather - Free Weather Forecast : 2333
Hurricane by American Red Cross : 1158
Forecast Bar : 375
Hurricane Tracker WESH 2 Orlando, Central Florida : 203
FEMA : 128
iWeather - World weather forecast : 80
Weather - Radar - Storm with Morecast App : 78
Yurekuru Call : 53
Weather & Radar : 37
WRAL Weather Alert : 25
Météo-France : 24
JaxReady : 22
Freddy the Frogcaster's Weather Station : 14
Almanac Long-Range Weather Forecast : 12
TodayAir

So the apps with the most user ratings are tied to weather (The Weather Channel), music (Pandora - Music & Radio, Spotify) and navigation (Waze, Google Maps). But we can't be sure that these genres are the most popular, because the average can be skewed by a few giants with hundreds of thousands of user ratings, while the other apps may struggle to get past the 10,000 threshold.
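
One way to see the skew is to compare the mean with the median. The sketch below uses the six `Navigation` rating counts printed earlier; using the median here is a suggestion, not part of the original analysis.

```python
from statistics import mean, median

# rating_count_tot of the six free English Navigation apps listed above
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

navigation_mean = round(mean(navigation_ratings), 2)
navigation_median = median(navigation_ratings)

print('mean:  ', navigation_mean)
print('median:', navigation_median)
```

The mean (86090.33) is more than ten times the median (8196.5), which confirms that Waze and Google Maps dominate the genre average.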

The next genre by average number of ratings is `Reference`. Let's look at these apps to see whether it is also skewed by one or two apps, like the previous genres.

In [34]:
for row in ios_final:
    if row[-5]=='Reference':
        print(row[1],':',row[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


As we can see, most of the ratings come from **Bible** and **Dictionary.com**, so they skew the statistics in this genre too. However, we can treat this niche as a prospect for our future app. For example, we could develop an app where people can read a book and look up the meaning of a word at the same time. Or we could add a feature for listening to the book.

After analyzing the App Store we conclude that the majority of apps are made for fun, but the genres with the most user ratings per app are different ones. Maybe we can try to combine different areas or build a brand new feature, for example *searching for words by voice recording* in a dictionary.

Let's analyze the apps on Google Play:

### Most popular genres in Google Play

To analyze the popularity of apps on Google Play we will use the number of installs. The column does not contain precise numbers, only thresholds such as `1,000,000+`. For a brief analysis this is enough. First, we need to remove the plus signs and commas.

In [44]:
category_android=freq_table(android_final,1)
category=[]
for cat in category_android:
    total=0
    length=0
    for row in android_final:
        if cat==row[1]:
            inst=row[5].replace(',','')
            inst=float(inst.replace('+',''))
            total+=inst
            length+=1
    category.append((round(total/length,2),cat))
table_sorted=sorted(category,reverse=True)
for entry in table_sorted:
    print(entry[1],':',entry[0])
    #print('The average number of',cat,'in Google Play is', round(total/length,2))

COMMUNICATION : 38456119.17
VIDEO_PLAYERS : 24727872.45
SOCIAL : 23253652.13
PHOTOGRAPHY : 17840110.4
PRODUCTIVITY : 16787331.34
GAME : 15588015.6
TRAVEL_AND_LOCAL : 13984077.71
ENTERTAINMENT : 11640705.88
TOOLS : 10801391.3
NEWS_AND_MAGAZINES : 9549178.47
BOOKS_AND_REFERENCE : 8767811.89
SHOPPING : 7036877.31
PERSONALIZATION : 5201482.61
WEATHER : 5074486.2
HEALTH_AND_FITNESS : 4188821.99
MAPS_AND_NAVIGATION : 4056941.77
FAMILY : 3695641.82
SPORTS : 3638640.14
ART_AND_DESIGN : 1986335.09
FOOD_AND_DRINK : 1924897.74
EDUCATION : 1833495.15
BUSINESS : 1712290.15
LIFESTYLE : 1437816.27
FINANCE : 1387692.48
HOUSE_AND_HOME : 1331540.56
DATING : 854028.83
COMICS : 817657.27
AUTO_AND_VEHICLES : 647317.82
LIBRARIES_AND_DEMO : 638503.73
PARENTING : 542603.62
BEAUTY : 513151.89
EVENTS : 253542.22
MEDICAL : 120550.62
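
The cleaning of the `Installs` strings can be isolated in a small helper, which makes it easy to check on its own. The function name `parse_installs` is an assumption; the sample strings are values that occur in the `Installs` column.

```python
def parse_installs(value):
    """Convert an Installs string such as '1,000,000+' to an integer lower bound."""
    return int(value.replace(',', '').replace('+', ''))

parsed = {raw: parse_installs(raw) for raw in ['1,000,000+', '500+', '0+', '0']}
for raw, number in parsed.items():
    print(raw, '->', number)
```

Because each value is only a lower bound (an app listed as `1,000,000+` may have far more installs), the category averages underestimate the real numbers.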


The categories with the most installs on average are `Communication`, `Video Players`, `Social`, `Photography` and `Productivity`. We'll take a closer look at them, starting with the distribution of the `Installs` values:

In [71]:
display_table(android_final,5)

1,000,000+ : 15.727
100,000+ : 11.552
10,000,000+ : 10.548
10,000+ : 10.199
1,000+ : 8.394
100+ : 6.916
5,000,000+ : 6.825
500,000+ : 5.562
50,000+ : 4.772
5,000+ : 4.513
10+ : 3.542
500+ : 3.249
50,000,000+ : 2.301
100,000,000+ : 2.132
50+ : 1.918
5+ : 0.79
1+ : 0.508
500,000,000+ : 0.271
1,000,000,000+ : 0.226
0+ : 0.045
0 : 0.011


In [54]:
for row in android_final:
    if row[1]=='COMMUNICATION' and (row[5]=='1,000,000,000+'):
        print(row[0], ':', row[5])

WhatsApp Messenger : 1,000,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
Skype - free IM & video calls : 1,000,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+


Statistics for the `COMMUNICATION` category are skewed by giants such as WhatsApp, Skype and other messengers, so there is little point in entering this niche.

In [58]:
for row in android_final:
    if row[1]=='VIDEO_PLAYERS' and (row[5]=='1,000,000,000+'):
        print(row[0], ':', row[5])

YouTube : 1,000,000,000+
Google Play Movies & TV : 1,000,000,000+


In [59]:
for row in android_final:
    if row[1]=='SOCIAL' and (row[5]=='1,000,000,000+'):
        print(row[0], ':', row[5])

Facebook : 1,000,000,000+
Google+ : 1,000,000,000+
Instagram : 1,000,000,000+


The landscape is the same for Video Players and Social apps. It's unlikely that we could entice users away from these applications.

Let's look more closely at the `Productivity` apps.

In [66]:
for row in android_final:
    if row[1]=='PRODUCTIVITY' and (row[5]=='1,000,000,000+'
                          or row[5]=='500,000,000+'):
        print(row[0], ':', row[5])

Microsoft Word : 500,000,000+
Dropbox : 500,000,000+
Google Drive : 1,000,000,000+
Google Calendar : 500,000,000+
Cloud Print : 500,000,000+


Installs are skewed by international leaders here too: Microsoft and Google have occupied this niche.

But as we remember, Google Play has quite a lot of apps for fun in the Game and Entertainment categories. Let's explore them:

In [64]:
for row in android_final:
    if row[1]=='GAME' and (row[5]=='1,000,000,000+'
                          or row[5]=='500,000,000+'):
        print(row[0], ':', row[5])

Candy Crush Saga : 500,000,000+
Subway Surfers : 1,000,000,000+
Pou : 500,000,000+
My Talking Tom : 500,000,000+
Temple Run 2 : 500,000,000+


In [69]:
for row in android_final:
    if row[1]=='ENTERTAINMENT' and (row[5]=='100,000,000+'
                          or row[5]=='50,000,000+'):
        print(row[0], ':', row[5])

Hotstar : 100,000,000+
Talking Angela : 100,000,000+
Talking Ginger 2 : 50,000,000+
Amazon Prime Video : 50,000,000+
IMDb Movies & TV : 100,000,000+
Twitch: Livestream Multiplayer Games & Esports : 50,000,000+
PlayStation App : 50,000,000+
Talking Ben the Dog : 100,000,000+
Netflix : 100,000,000+


We can see how a single game can be successful and "shoot" ahead of the others: `Subway Surfers` has at least 1 billion installs. Other popular games are `Candy Crush Saga`, a match-three puzzle game, and `My Talking Tom`, which people probably download for their children.

In the Entertainment category most apps are for watching films, or are voice-based apps like `Talking ...`.

For the App Store we concluded that it would be good to develop something connected with books, so let's look at this category on Google Play:

In [70]:
for row in android_final:
    if row[1]=='BOOKS_AND_REFERENCE' and (row[5]=='1,000,000,000+'
                          or row[5]=='500,000,000+'
                          or row[5]=='100,000,000+'
                          or row[5]=='50,000,000+'):
        print(row[0], ':', row[5])

Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad 📖 Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+
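
The last few cells repeat the same filter with a different category and different install levels. A sketch of a reusable helper follows; the name `apps_in_category` and the shortened sample rows are assumptions for illustration.

```python
def apps_in_category(dataset, category, install_levels,
                     name_index=0, category_index=1, installs_index=5):
    """Return (name, installs) pairs for one category and a set of install levels."""
    return [
        (row[name_index], row[installs_index])
        for row in dataset
        if row[category_index] == category and row[installs_index] in install_levels
    ]

# Rows in the Android column order; columns 2-4 are left empty
# because they are not needed for this filter.
sample = [
    ['Google Play Books', 'BOOKS_AND_REFERENCE', '', '', '', '1,000,000,000+'],
    ['Bible', 'BOOKS_AND_REFERENCE', '', '', '', '100,000,000+'],
    ['Subway Surfers', 'GAME', '', '', '', '1,000,000,000+'],
]
top_levels = {'1,000,000,000+', '500,000,000+'}
result = apps_in_category(sample, 'BOOKS_AND_REFERENCE', top_levels)
for name, installs in result:
    print(name, ':', installs)
```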


Google Play Books is the most popular app here, but considering that apps that process the human voice are very popular on Google Play, we should definitely think about this direction.

## Conclusion

Finally, we can advise some directions for developers of free English apps:

* The Social Networking, Navigation, Communication and Weather niches are occupied by international leaders. Even though we are unlikely to lure users away from them, we can think about integrating with these apps.
* Book apps that let people act with their voice could be a breakthrough, for example turning pages, finding sentences or highlighting parts of the text by voice.