# Google Play and Apple Store markets apps analysis

The aim of this project is to put my newly acquired Python skills to work and, at the same time, to build a useful data analysis that gives app developers insight into which kinds of free English-language apps are most likely to attract users.

There is a vast number of applications on Google Play and the Apple App Store. As of September 2018, there were approximately 2 million iOS apps available on the App Store and 2.1 million Android apps on Google Play. Obviously, they cannot all be equally popular and profitable. The goal of this project is to highlight trends among the most successful applications.

## Obtaining data

To reach this goal, free open data was obtained from two sources, one for each of the two major app markets: Google Play (2018) and the Apple App Store (2017).

- The Google Play sample, with information on around 10,000 apps, was obtained [here](https://www.kaggle.com/lava18/google-play-store-apps).
- The Apple App Store sample, with data on over 7,000 apps, was downloaded [here](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps).

## Opening and Exploring the Data

To start working with the data, we first have to import a CSV reader and load each dataset into a variable.

In [1]:
# importing reader to open the datasets
from csv import reader

In [2]:
# reading datasets
# Read the Google Play data, convert it to a list of lists and assign it to the variable 'android'.
# The dataset has a header row, so we separate it from the rest of the data and assign it to 'android_header'.
# UTF-8 is stated explicitly so that non-ASCII app names load the same way on every platform.
opened_file = open('googleplaystore.csv', encoding='utf8')
read_file = reader(opened_file)
android = list(read_file)
android_header = android[0]
android = android[1:]
opened_file.close()

# Repeat the same steps with the Apple Store data, assigning it to 'ios' and its header to 'ios_header'.
opened_file = open('AppleStore.csv', encoding='utf8')
read_file = reader(opened_file)
ios = list(read_file)
ios_header = ios[0]
ios = ios[1:]
opened_file.close()
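
The two loading steps above follow the same pattern, so they could be wrapped in one function. The following is only a sketch: `load_csv` is a hypothetical helper name, and the UTF-8 encoding is an assumption about how the Kaggle files are stored.

```python
from csv import reader


def load_csv(path):
    """Return (header, rows) for a CSV file. `load_csv` is a hypothetical helper."""
    # encoding='utf8' is an assumption about the files; newline='' is what the
    # csv module documentation recommends when passing a file object to reader().
    with open(path, encoding='utf8', newline='') as opened_file:
        rows = list(reader(opened_file))
    # The first row is the header, everything after it is data.
    return rows[0], rows[1:]
```

With the files from this project in the working directory, `android_header, android = load_csv('googleplaystore.csv')` would replace the first block above. The `with` statement also closes the file automatically.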

We create a function, 'explore_data', to get a quick overview of a dataset: it prints a few rows and, optionally, the number of rows and columns.

In [3]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line between rows
        
    if rows_and_columns:
        print('Total number of rows:', len(dataset))
        print('Total number of columns:', len(dataset[0]))

Now let's explore the data and compare the two datasets.

In [4]:
print('Android apps dataset Header')
print(android_header)
print('\n')
print('Android apps dataset first data row')
explore_data(android, 0, 1, True)
print('\n')
print('\n')
print('IOS apps dataset Header')
print(ios_header)
print('\n')
print('IOS apps dataset first data row')
explore_data(ios, 0, 1, True)

Android apps dataset Header
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


Android apps dataset first data row
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


Total number of rows: 10841
Total number of columns: 13




IOS apps dataset Header
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


IOS apps dataset first data row
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


Total number of rows: 7197
Total number of columns: 16


As the output above shows, although the two datasets hold similar kinds of information, they are not organized the same way. The iOS dataset has three more columns than the Android one (16 versus 13), and the column names and positions do not match. We therefore have to be careful when running comparisons: even if we write functions and loops that work for both datasets, we must pass the right indexes. For example, the app name is at index 0 in the Android dataset but at index 1 in the iOS dataset. To make the data more comparable, and to make sure everything in the datasets is relevant (no duplicates, no wrong data), we will clean the datasets in the next chapter.

# Cleaning and modifying datasets

## Deleting false rows
On the website where the data was obtained, we found a [comment pointing out](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015) that the row at index 10472 of the Android data is faulty. To check this, we print the Android header, the first row for comparison, and row 10472.

In [5]:
print(android_header)
print('\n')
print(android[0])
print('\n')
print(android[10472])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


We see that row android[10472] indeed does not match the header: 'Category' should hold a string such as 'ART_AND_DESIGN', as in the first row, but instead it holds '1.9', which looks like a rating. The row has only 12 values instead of 13, so every value after the app name is shifted one column to the left. As we have plenty of data, we are not going to work out what this row should look like; we simply delete it with the del statement (see below). To make sure the row was deleted, we print the length of the Android list of lists before and after the deletion.

In [6]:
print(len(android))
del android[10472] # run this cell only once: a second run would delete whichever row has moved to index 10472
print(len(android))

10841
10840


There were no comments about faulty or missing data in the iOS dataset, so we do not repeat these steps for the iOS data. We move on to removing duplicates.
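
Rather than relying only on comments from other users, rows like the one above can also be found by comparing each row's length with the header's length. The sketch below uses made-up miniature data, and `find_malformed_rows` is a hypothetical name; it only catches rows with a missing or extra value, not rows with wrong values in the right number of columns.

```python
def find_malformed_rows(header, rows):
    """Return the indexes of rows whose column count differs from the header's."""
    expected = len(header)
    return [index for index, row in enumerate(rows) if len(row) != expected]


# Made-up sample: the second row is missing its 'Category' value.
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['Photo Editor', 'ART_AND_DESIGN', '4.1'],
    ['Photo Frame', '1.9'],
    ['Sketch', 'ART_AND_DESIGN', '4.5'],
]
print(find_malformed_rows(sample_header, sample_rows))  # [1]
```

The same call works for either dataset, for example `find_malformed_rows(ios_header, ios)`.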

## Looking for duplicates
A common problem in large datasets is duplicates, that is, identical or nearly identical rows scattered throughout the dataset. To check for them, we write a loop that collects the duplicated app names, so that we can later remove the extra rows and keep a dataset made up only of unique apps.

In [7]:
duplicate_data_android = [] # names of apps that occur more than once in the Android dataset
unique_data_android = [] # every distinct app name found in the Android dataset

for row in android:
    app_name = row[0] # app names are stored in the first column of the Android list of lists
    if app_name in unique_data_android:
        duplicate_data_android.append(app_name) # the name has been seen before, so we record it as a duplicate
    else:
        unique_data_android.append(app_name) # first time we meet this name, so we add it to the unique names

Now let's see whether there are any duplicates by checking the length of duplicate_data_android.

In [8]:
print(len(duplicate_data_android))

1181


Wow, what a surprise! We found 1181 duplicates in the dataset. That's quite a few. Before getting rid of them, we should investigate and try to understand why they are there. We don't yet know whether the duplicated rows are fully identical or whether only the app name repeats while the other values differ. First, let's take one duplicated app and compare all the rows that share its name.

In [9]:
print(duplicate_data_android[0]) #printing the first line from the newly created list of duplicates

Quick PDF Scanner + OCR FREE


Now we loop through the whole Android dataset and print every row with the app name 'Quick PDF Scanner + OCR FREE' so we can compare them.

In [10]:
print(android_header) # we will also print the header so we can understand what each value stands for
print('\n')
for row in android:
    app_name = row[0]
    if app_name == 'Quick PDF Scanner + OCR FREE':
        print(row)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']
['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']
['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80804', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']


As the comparison shows, the only difference is in the fourth column, 'Reviews', which holds the total number of reviews submitted for the app. It is reasonable to assume that all three rows are correct but were collected at different times, so the row with the highest number of reviews should be the most recent and therefore the most reliable. Printing every duplicate and picking the latest row by hand is not practical with over 1,000 duplicates to review. We need a smarter and faster way to clean the data, but before that let's check whether the iOS data contains duplicates too.

In [11]:
duplicate_data_ios = []
unique_data_ios = []

for row in ios:
    app_name = row[1]
    if app_name not in unique_data_ios: # the Android loop tested 'in'; testing 'not in' is equivalent, with the two branches swapped
        unique_data_ios.append(app_name)
    else:
        duplicate_data_ios.append(app_name)

print('sum of two lists above:', len(duplicate_data_ios) + len(unique_data_ios))
print('actual ios dataset sum of rows:', len(ios))

sum of two lists above: 7197
actual ios dataset sum of rows: 7197


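Note that these two numbers are equal by construction: every row lands in exactly one of the two lists, so the check only confirms that no row was lost. The figure that actually answers the question is len(duplicate_data_ios), which is not printed here. As in the original analysis, the following sections on removing duplicates work only with the Android dataset.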
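
A more direct way to look for repeated names is to count how often each name occurs. The following is a sketch on made-up rows, not output from the real iOS file; `duplicated_names` is a hypothetical helper built on `collections.Counter` from the standard library.

```python
from collections import Counter


def duplicated_names(rows, name_index):
    """Map each app name that occurs more than once to its number of occurrences."""
    counts = Counter(row[name_index] for row in rows)
    return {name: n for name, n in counts.items() if n > 1}


# Made-up sample with the app name in column 1, as in the iOS dataset.
sample = [['1', 'Alpha'], ['2', 'Beta'], ['3', 'Alpha'], ['4', 'Alpha']]
repeats = duplicated_names(sample, 1)
print(repeats)                               # {'Alpha': 3}
print(sum(n - 1 for n in repeats.values()))  # 2 rows beyond the first occurrence
```

Calling `duplicated_names(ios, 1)` on the real data would return an empty dictionary only if every iOS app name is unique.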

## Cleaning duplicates

First we build a dictionary that maps each unique app name to its highest number of reviews (which matters when the app has duplicates). This lets us check our assumption, and we will also use the dictionary later to remove the duplicates.

In [12]:
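reviews_max = {}
for row in android:
    app_name = row[0]
    total_reviews = row[3]
    # the review counts are stored as strings, so we compare them as numbers;
    # compared as text, '999' would count as larger than '1000'
    if app_name in reviews_max and float(reviews_max[app_name]) < float(total_reviews):
        reviews_max[app_name] = total_reviews
    elif app_name not in reviews_max:
        reviews_max[app_name] = total_reviews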

If we are right, the length of this dictionary should equal the length of the Android dataset minus the number of duplicates. Let's see:

In [13]:
print('Amount of unique data:', len(android) - len(duplicate_data_android))
print('Length of apps with highest reviews numbers:', len(reviews_max))

Amount of unique data: 9659
Length of apps with highest reviews numbers: 9659


Great, this confirms we are on the right track, but we are not done yet. We create two new lists: android_clean, which will hold the rows without duplicates, and already_added, which stores only app names so that we can recognise duplicates as we go.

In [14]:
android_clean = [] # our actual cleaned dataset for analysis cleared of duplicates
already_added = [] # just let us keep track of unique data we have already added
for row in android:
    app_name = row[0]
    total_reviews = row[3]
    if total_reviews == reviews_max[app_name] and app_name not in already_added:
        android_clean.append(row)
        already_added.append(app_name)

To confirm that the code worked and that android_clean holds only unique apps, we check its length once again.

In [15]:
explore_data(android_clean, 0, 1, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


Total number of rows: 9659
Total number of columns: 13


The numbers match, so we are done removing duplicates. From now on we have to use the list of lists called 'android_clean', not 'android'. An alternative would be to reassign 'android_clean' to 'android' so that we cannot pick the wrong dataset by mistake, but I will keep both names, which makes it easier to follow what has been done to the data.

Our data is not yet ready for analysis; there is one more step to take.
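
The duplicate removal described in this section can also be done in a single pass with one dictionary. This is a sketch under assumptions, not the method used above: `keep_most_reviewed` is a hypothetical name, the rows are made up, and the result keeps apps in order of first appearance, which may differ from the order in android_clean.

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the one with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        # float() makes the comparison numeric; as strings, '999' > '1000'.
        if name not in best or float(row[reviews_index]) > float(best[name][reviews_index]):
            best[name] = row
    # Dictionaries preserve insertion order (Python 3.7+).
    return list(best.values())


# Made-up rows laid out like the Android data: name first, reviews fourth.
sample = [
    ['A', 'TOOLS', '4.0', '999'],
    ['B', 'GAME', '4.5', '10'],
    ['A', 'TOOLS', '4.0', '1000'],
    ['A', 'TOOLS', '4.0', '1000'],
]
print(keep_most_reviewed(sample))
# [['A', 'TOOLS', '4.0', '1000'], ['B', 'GAME', '4.5', '10']]
```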


## Removing unnecessary data

Not all correct data in a dataset is useful for our purpose. Our datasets combine thousands of apps from all around the world in several languages (see the examples below), but we only want to analyse apps aimed at English speakers. We will therefore try to remove apps whose names are not in English and focus on those available to the broader English-speaking audience.

In [16]:
print(ios[813][1])
print(android_clean[4412][0])

爱奇艺PPS -《欢乐颂2》电视剧热播
Wowkwis aq Ka'qaquj


Unfortunately, there is no built-in command that recognises the language of an app, and our datasets have no column with that information. No worries, we can find a way around it. Under the [ASCII](https://en.wikipedia.org/wiki/ASCII) standard, every character has a corresponding number, and the characters commonly used in English text (upper- and lowercase letters, which count as separate characters, plus digits and punctuation) all fall in the range 0 to 127. It is important to get familiar with your dataset first, because acting on assumptions alone leads to incorrect code or lost data. In this case, I reviewed the data and noticed English apps whose titles contain a few symbols (emoji, for example) that fall outside this range. To avoid losing that data, the check tolerates up to three characters outside the ASCII range and rejects a name only when it contains more than three. We put this check in a function and then use loops and if statements to build the filtered datasets:

In [17]:
def english_app(string): # returns False when more than three characters of the string fall outside the ASCII range (0-127), True otherwise
    non_english_alphabet_character = 0 
    
    for character in string:
        if ord(character) > 127:
            non_english_alphabet_character += 1
    
    if non_english_alphabet_character > 3:
        return False
    else:
        return True
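
To see how this rule behaves, here is a compact restatement of the same check with a few sample names. The `max_non_ascii` parameter is a hypothetical addition for illustration; the function above hard-codes the limit of three.

```python
def english_app(string, max_non_ascii=3):
    """True when at most `max_non_ascii` characters fall outside the ASCII range."""
    non_ascii = sum(1 for character in string if ord(character) > 127)
    return non_ascii <= max_non_ascii


print(english_app('Instagram'))                      # True
print(english_app('爱奇艺PPS -《欢乐颂2》电视剧热播'))   # False
print(english_app('Docs To Go™ Free Office Suite'))  # True
print(english_app('Instachat 😜'))                   # True
print(english_app("Wowkwis aq Ka'qaquj"))            # True
```

The last call shows the limit of the approach: a name written entirely in ASCII characters passes the filter whatever its language, so this is a rough heuristic rather than real language detection.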

In [18]:
android_english = [] # creating new list for newest android dataset

for row in android_clean:
    app_name = row[0]
    if english_app(app_name):
        android_english.append(row)

ios_english = [] # creating new list for newest Apple store dataset

for row in ios:
    app_name = row[1] # remember the app name is placed in a different column on this data set
    if english_app(app_name):
        ios_english.append(row)

Now let's look at a few rows of each dataset to check that the code worked.

In [19]:
explore_data(android_english, 0, 3, True)
print('\n')
explore_data(ios_english, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Total number of rows: 9614
Total number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24

The explore_data output looks fine: no red flags so far, and no mixed-up or faulty rows, duplicates, or non-English apps among the rows shown. The total number of apps in each dataset has also shrunk (the Android one, for example, from 9659 to 9614 rows), which means the code has successfully removed some suspicious entries.

## Focusing on free apps

Once again we create new lists of lists that meet our needs better. This time we keep only the free apps for further analysis. Both datasets include the price of each app, so this part is relatively easy. At the end we print the length of each new dataset to see how the number of rows has changed.

In [20]:
android_english_free = []

for row in android_english:
    price = row[7]
    if price == '0':
        android_english_free.append(row)
        
print('Total of free apps in English on Google Play store:')
print(len(android_english_free))
print('\n')

ios_english_free = []

for row in ios_english:
    price = row[4]
    if price == '0.0':
        ios_english_free.append(row)
        
print('Total of free apps in English on Apple App Store:')
print(len(ios_english_free))

Total of free apps in English on Google Play store:
8862


Total of free apps in English on Apple App Store:
3222
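
The two filters above compare the price with an exact string ('0' for Android, '0.0' for iOS). A single numeric check could cover both formats. This is only a sketch: `is_free` is a hypothetical helper, and the leading dollar sign on paid prices (as in '$4.99') is an assumption about the Android data that is not shown in this notebook.

```python
def is_free(price_text):
    """True when a price string such as '0', '0.0' or '$0.00' represents zero."""
    # Remove surrounding spaces and an optional leading dollar sign (assumed format).
    cleaned = price_text.strip().lstrip('$')
    return float(cleaned) == 0.0


print(is_free('0'))      # True
print(is_free('0.0'))    # True
print(is_free('$4.99'))  # False
```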


## Apps popularity

Now that our app data is reasonably clean and ready for analysis, we can start writing functions to draw useful conclusions. My very first question is: which genres of free apps are most common in the iOS and Android stores? To find out, we write a function that builds a dictionary of genres and their frequencies, and a second function that makes that dictionary more readable by displaying it in descending order.

In [21]:
def freq_table(dataset, variable_index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1 # we will need this total to count the percentage, that is basically the total number of rows in the dataset
        genre = row[variable_index]
        
        if genre in table:
            table[genre] += 1
        else:
            table[genre] = 1
           
    freq_table_percent = {}
    for key in table:
        freq_table_percent[key] = round((table[key]/total)*100)

    # first we built a frequency dictionary for the different genres, then we converted
    # each count into the percentage of all apps that the genre represents
    return freq_table_percent


We are not done yet. An unsorted table is hard to read when there are many genres, so we should display the data in descending order. Unfortunately, sorting a dictionary directly does not give us what we want: sorted() returns only the keys, ordered by key. To see the percentages next to the genres, ordered by percentage, we write another function that converts the dictionary into a list of (value, key) tuples and sorts that list. See below:

In [22]:
def display_table(dataset, variable_index):
    table_display = []
    table = freq_table(dataset, variable_index)
    for key in table:
        key_value_as_tuple = (table[key], key)
        table_display.append(key_value_as_tuple)
        
    table_sorted_descending = sorted(table_display, reverse = True)
    # each tuple holds the value first and the key second, so that sorting orders the entries by value;
    # when printing, we swap them back into the usual key : value order:
        
    for entry in table_sorted_descending:
        print(entry[1], ':', entry[0], '%')
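
An alternative to building (value, key) tuples is to count with `collections.Counter` and sort the pairs with a `key` function. The sketch below uses made-up rows, and `percent_table` is a hypothetical name; it does not handle an empty dataset, where the division would fail.

```python
from collections import Counter


def percent_table(rows, index):
    """Return (value, percentage) pairs, largest share first."""
    counts = Counter(row[index] for row in rows)
    total = sum(counts.values())
    pairs = [(value, round(count / total * 100)) for value, count in counts.items()]
    # Sort by the percentage (second item of each pair), highest first.
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Made-up rows with the category in column 1, as in the Android dataset.
sample = [['a', 'GAME'], ['b', 'GAME'], ['c', 'TOOLS'], ['d', 'GAME']]
for value, percent in percent_table(sample, 1):
    print(value, ':', percent, '%')
```

One difference in behaviour: entries with equal percentages keep their order of first appearance here, whereas sorting (value, key) tuples in reverse orders ties by key in reverse alphabetical order.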

Let's see how it went:

In [23]:
print('Android apps frequency table for genre:')
display_table(android_english_free, 1)

Android apps frequency table for genre:
FAMILY : 19 %
GAME : 10 %
TOOLS : 8 %
BUSINESS : 5 %
PRODUCTIVITY : 4 %
MEDICAL : 4 %
LIFESTYLE : 4 %
FINANCE : 4 %
SPORTS : 3 %
SOCIAL : 3 %
PHOTOGRAPHY : 3 %
PERSONALIZATION : 3 %
NEWS_AND_MAGAZINES : 3 %
HEALTH_AND_FITNESS : 3 %
COMMUNICATION : 3 %
VIDEO_PLAYERS : 2 %
TRAVEL_AND_LOCAL : 2 %
SHOPPING : 2 %
DATING : 2 %
BOOKS_AND_REFERENCE : 2 %
WEATHER : 1 %
PARENTING : 1 %
MAPS_AND_NAVIGATION : 1 %
LIBRARIES_AND_DEMO : 1 %
HOUSE_AND_HOME : 1 %
FOOD_AND_DRINK : 1 %
EVENTS : 1 %
ENTERTAINMENT : 1 %
EDUCATION : 1 %
COMICS : 1 %
BEAUTY : 1 %
AUTO_AND_VEHICLES : 1 %
ART_AND_DESIGN : 1 %


Since I am too lazy to count the number of different categories I created a simple function n_categories to do that for me. See it below. 

In [24]:
def n_categories(dataset, index):
    unique_categories = []
    total = 0
    
    for row in dataset:
        category = row[index]
        
        if category not in unique_categories:
            unique_categories.append(category)
            total += 1
            
    print('Total number of unique categories:', total)

In [25]:
n_categories(android_english_free, 1)

Total number of unique categories: 33
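
The same count can be obtained with a set, which stores each value only once. A minimal sketch on made-up rows; `count_unique` is a hypothetical name, and unlike n_categories it returns the number instead of printing it.

```python
def count_unique(rows, index):
    """Return how many distinct values appear in the given column."""
    return len({row[index] for row in rows})


sample = [['a', 'GAME'], ['b', 'TOOLS'], ['c', 'GAME']]
print(count_unique(sample, 1))  # 2
```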


In [26]:
print('IOS apps frequency data for genre:')
display_table(ios_english_free, 11)

IOS apps frequency data for genre:
Games : 58 %
Entertainment : 8 %
Photo & Video : 5 %
Education : 4 %
Utilities : 3 %
Social Networking : 3 %
Shopping : 3 %
Sports : 2 %
Productivity : 2 %
Music : 2 %
Lifestyle : 2 %
Health & Fitness : 2 %
Weather : 1 %
Travel : 1 %
Reference : 1 %
News : 1 %
Food & Drink : 1 %
Finance : 1 %
Business : 1 %
Navigation : 0 %
Medical : 0 %
Catalogs : 0 %
Book : 0 %


In [27]:
n_categories(ios_english_free,11)

Total number of unique categories: 23


Well, these are pretty interesting results, as the numbers differ quite a bit between the two app stores. Of course, we have to keep in mind that the categories are not the same either: the Android dataset has 10 more categories than the iOS dataset (33 versus 23). Still, we can see that most of the iOS apps are games (58%) or aimed at other kinds of entertainment (a further 8%), while a significant share of Android apps targets families' needs (19%), followed by games (10%) and practical apps classed as tools (8%).

For someone looking for a type of free English app to create, an entertaining game suitable for families looks like a good way to succeed in both app stores. However, let's not forget that the frequencies we have just obtained do not tell us that these are the most popular apps; they only show that there are many apps of these types in the stores, which could also mean that competition is strongest there. To get a better view, we need statistics on other variables.

## Counting average installs for different app genres

It makes sense to compute the average number of installs per genre for both app stores, to see whether our hypothesis above is on track. Because this computation does arithmetic on values stored in the lists, we have to double-check that those values are in the right format. From the explore_data output used a couple of times above, we can see that the install values contain characters such as commas and plus signs (for example '10,000+'), which cannot be converted to numbers directly, so we have to remove them. That's alright, there is an easy way to do so: we add a couple of lines to our code, which I highlight with comments.

In [28]:
categories_android = freq_table(android_english_free,1)
installs_total = 0 # note: both counters are created once, before the loop, and are never reset,
len_category = 0   # so every figure printed after the first is a running average over all categories so far
for category in categories_android:
    for app in android_english_free:
        category_app = app[1]
        if category == category_app:
            n_installs = app[5]
            n_installs = n_installs.replace('+', '') # removing the + character so the value can be used in calculations
            n_installs = n_installs.replace(',', '') # removing commas for the same reason
            installs_total += float(n_installs) # converting the cleaned string to a number
            len_category += 1
    avg_installs = round(installs_total/len_category, 0)
    print(category, 'average installs per app of this category: ', avg_installs)

ART_AND_DESIGN average installs per app of this category:  1986335.0
AUTO_AND_VEHICLES average installs per app of this category:  1196411.0
BEAUTY average installs per app of this category:  1007803.0
BOOKS_AND_REFERENCE average installs per app of this category:  4867493.0
BUSINESS average installs per app of this category:  3239904.0
COMICS average installs per app of this category:  3082057.0
COMMUNICATION average installs per app of this category:  12058499.0
DATING average installs per app of this category:  10632004.0
EDUCATION average installs per app of this category:  9977448.0
ENTERTAINMENT average installs per app of this category:  10072651.0
EVENTS average installs per app of this category:  9673036.0
FINANCE average installs per app of this category:  8224426.0
FOOD_AND_DRINK average installs per app of this category:  7875509.0
HEALTH_AND_FITNESS average installs per app of this category:  7429973.0
HOUSE_AND_HOME average installs per app of this category:  7239071.0
LI

Unfortunately, this data is not very trustworthy because the values are not precise; see the possible values below. All the install numbers are rounded down into buckets, and 16% of the apps are listed simply as '1,000,000+', which can mean anything from 1,000,000 up to 4,999,999 installs. But it is the best data we have, so let's use it as well as we can.
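
The loop in In [28] creates installs_total and len_category once, before iterating over the categories, so the figures it prints accumulate from one category to the next. A true per-category average needs a separate total and count for each category. The following is a minimal sketch on made-up rows, not a re-run of the real data; `parse_installs` and `average_installs_by_category` are hypothetical names.

```python
def parse_installs(text):
    """Turn a string such as '10,000+' into 10000.0 (the lower bound of the bucket)."""
    return float(text.replace('+', '').replace(',', ''))


def average_installs_by_category(rows, category_index=1, installs_index=5):
    """Return {category: mean installs}, keeping a separate total and count per category."""
    totals = {}
    counts = {}
    for row in rows:
        category = row[category_index]
        totals[category] = totals.get(category, 0.0) + parse_installs(row[installs_index])
        counts[category] = counts.get(category, 0) + 1
    return {category: totals[category] / counts[category] for category in totals}


# Made-up rows laid out like the Android data: category at index 1, installs at index 5.
sample = [
    ['A', 'GAME', '', '', '', '1,000+'],
    ['B', 'GAME', '', '', '', '3,000+'],
    ['C', 'TOOLS', '', '', '', '500+'],
]
print(average_installs_by_category(sample))  # {'GAME': 2000.0, 'TOOLS': 500.0}
```

This version also reads the dataset once instead of once per category.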

In [29]:
display_table(android_english_free, 5)

1,000,000+ : 16 %
100,000+ : 12 %
10,000,000+ : 11 %
10,000+ : 10 %
1,000+ : 8 %
5,000,000+ : 7 %
100+ : 7 %
500,000+ : 6 %
50,000+ : 5 %
5,000+ : 5 %
10+ : 4 %
500+ : 3 %
50,000,000+ : 2 %
50+ : 2 %
100,000,000+ : 2 %
5+ : 1 %
1+ : 1 %
500,000,000+ : 0 %
1,000,000,000+ : 0 %
0+ : 0 %
0 : 0 %


According to the averages above, our top categories, Family and Games, have high numbers of installs (both over 5 million). However, the table also shows that roughly one in ten Android apps sits in the '10,000,000+' bucket alone. An average is only an average, so I want to know which genre is most common among apps that have been installed more than 10 million times. I'll do this before analysing the iOS data, to see which approach I like better.

## Most popular genre among most downloaded apps

Now we create a new list of the apps that have 10 million installs or more and see which genre is most common among them, using the display_table function we created earlier.

In [30]:
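android_most_instals = []

for row in android_english_free:
    n_instals = row[5]
    # note: the fourth test reads n_installs (the variable left over from In [28]), not n_instals,
    # so it never matches and apps in the '500,000,000+' bucket are not collected
    if n_instals == '10,000,000+' or n_instals == '50,000,000+' or n_instals == '100,000,000+' or n_installs == '500,000,000+' or n_instals == '1,000,000,000+':
        android_most_instals.append(row)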

print(len(android_most_instals))

1343
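
Listing the install buckets by hand is easy to get wrong, because a single misspelled label or variable silently drops a bucket. An alternative is to convert each value to a number and compare it with a threshold. This is a sketch on made-up rows, and `installs_at_least` is a hypothetical name.

```python
def installs_at_least(rows, threshold, installs_index=5):
    """Return the rows whose install bucket starts at `threshold` or higher."""
    selected = []
    for row in rows:
        # '10,000,000+' becomes 10000000, the lower bound of the bucket.
        lower_bound = int(row[installs_index].replace('+', '').replace(',', ''))
        if lower_bound >= threshold:
            selected.append(row)
    return selected


# Made-up rows laid out like the Android data: installs at index 5.
sample = [
    ['A', 'GAME', '', '', '', '5,000,000+'],
    ['B', 'GAME', '', '', '', '10,000,000+'],
    ['C', 'TOOLS', '', '', '', '500,000,000+'],
]
print(len(installs_at_least(sample, 10_000_000)))  # 2
```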


In [31]:
display_table(android_most_instals, 1)

GAME : 21 %
FAMILY : 14 %
TOOLS : 10 %
PHOTOGRAPHY : 7 %
PRODUCTIVITY : 5 %
COMMUNICATION : 5 %
SHOPPING : 4 %
VIDEO_PLAYERS : 3 %
SPORTS : 3 %
SOCIAL : 3 %
PERSONALIZATION : 3 %
TRAVEL_AND_LOCAL : 2 %
NEWS_AND_MAGAZINES : 2 %
LIFESTYLE : 2 %
HEALTH_AND_FITNESS : 2 %
ENTERTAINMENT : 2 %
BOOKS_AND_REFERENCE : 2 %
WEATHER : 1 %
MAPS_AND_NAVIGATION : 1 %
FOOD_AND_DRINK : 1 %
FINANCE : 1 %
EDUCATION : 1 %
DATING : 1 %
BUSINESS : 1 %
PARENTING : 0 %
LIBRARIES_AND_DEMO : 0 %
HOUSE_AND_HOME : 0 %
COMICS : 0 %
BEAUTY : 0 %
AUTO_AND_VEHICLES : 0 %
ART_AND_DESIGN : 0 %


The table suggests that the most installed apps on Android are mainly games, tools, or family-oriented apps (which are often games too). I am not going to repeat these steps with the iOS data, because this step did not reveal a new hypothesis; it only supported the one made earlier. Moreover, the iOS data had an even more skewed frequency distribution, which is likely to hold among the most installed apps as well.

# Conclusions

Despite significant differences between Google Play and the Apple App Store in the range and the shares of the most common app genres, there is one tendency that could improve the chances of a new English-language app being downloaded: it has to be entertaining and fun, or at least practical. If it is a game, one that can involve the whole family, or simply several players each on their own device, is a good way to go. Just think of it: mobile phones are so often used to make time pass faster while commuting, travelling, waiting for something or someone, or simply having nothing else to do. So the main goal is to be entertained. If an app offers entertainment that can be shared with people you care about, it can be even more entertaining and, as a bonus, it encourages people to 'recruit' new users on their own.

If that sounds too complicated, or the market seems too competitive because of the number of apps in it, there is an option B: practicality. People love it when their phones solve everyday problems. We don't like carrying everything with us, and where possible we want our phones to save us from struggles and unexpected situations. Examples are apps that measure a room using only the camera, a digital compass, weather forecast apps, or a period calendar.