### Profitable App Profiles for the App Store and Google Play Markets

#### In this project we'll analyze data about App Store and Google Play apps to find app profiles that are profitable in both markets. Our main source of revenue is in-app ads, so we need to know more about these apps.

##### Our goal for this project is to understand what kinds of apps are likely to attract more users.

### Exploring the data

Collecting data is neither easy nor cheap, so for this project we'll analyze a sample of data instead.

We'll use the following data sets for our project (both are available on Kaggle):

- [A data set](https://www.kaggle.com/lava18/google-play-store-apps/home) containing data about approximately ten thousand Android apps from Google Play — the data was collected in August 2018

- [A data set](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home) containing data about approximately seven thousand iOS apps from the App Store — the data was collected in July 2017

### Let's open and explore the data sets.

In [1]:
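from csv import reader

# App Store data set
# encoding='utf8' avoids decoding errors on the non-English app names
opened_file_apple = open('AppleStore.csv', encoding='utf8')
read_file_apple = reader(opened_file_apple)
list_apple = list(read_file_apple)
opened_file_apple.close()

# Google Play data set
opened_file_google = open('googleplaystore.csv', encoding='utf8')
read_file_google = reader(opened_file_google)
list_google = list(read_file_google)
opened_file_google.close()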
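
A possible way to wrap the loading step is sketched below. The helper name `load_csv` is hypothetical (it is not part of this notebook), and the tiny temporary file only stands in for the real CSV files. Returning the header separately from the rows means we no longer have to remember which lists still start with a header row.

```python
import csv
import os
import tempfile

def load_csv(path):
    """Read a CSV file and return (header, rows). Hypothetical helper."""
    with open(path, encoding='utf8', newline='') as f:
        data = list(csv.reader(f))
    return data[0], data[1:]

# Demo on a tiny temporary file that stands in for the real data sets
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False,
                                 encoding='utf8', newline='') as tmp:
    tmp.write('App,Category\nInstagram,SOCIAL\nBox,BUSINESS\n')
    tmp_path = tmp.name

header, rows = load_csv(tmp_path)
os.remove(tmp_path)

print(header)     # column names only
print(len(rows))  # number of data rows, header excluded
```

The `with` statement also closes the file automatically, even if reading fails.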

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

### Let's print the first few rows of each data set

In [3]:
# App Store data set: list_apple[1:] drops the header, then data rows 1 to 29 are printed
explore_data(list_apple[1:], 1, 30,True)

['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


['429047995', 'Pinterest', '74778624', 'USD', '0.0', '1061624', '1814', '4.5', '4.0', '6.26', '12+', 'Social Networking', '37', '5', '27', '1']


['282935706', 'Bible', '92774400', 'USD', '0.0', '985920', '5320', '4.5', '5.0', '7.5.1', '4+', 'Reference', '37', '5', '45', '1']


['553834731', 'Candy Crush Saga', '222846976', 'USD', '0.0', '961794', '2453', '4.5', '4.5', '1.101.0', '4+', 'Games', '43', '5', '24', '1']


['32468458

In [4]:
# Google Play data set: list_google[1:] drops the header, then data rows 1 to 29 are printed
explore_data(list_google[1:], 1, 30,True)

['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


['Smoke Effect Photo Maker - Smoke Editor', 'ART_AND_DESIGN', '3.8', '178', '19M', '50,000+', 'Free', '

In [5]:
# Let's see the column names for AppleStore
print(list_apple[0])

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


In [6]:
# Let's see the column names for GooglePlay
print(list_google[0])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


Now we need to look at the columns and ask ourselves: are there any columns we can use to analyze the apps? Yes, there are.

For the AppleStore data set we can pick track_name, price, rating_count_tot and prime_genre.

For the GooglePlay data set we can pick App, Category, Rating, Price and Genres.

### Deleting wrong data

Before beginning our analysis, we need to check that the data is correct; otherwise the results will be wrong.

We need to do the following things:

-  Detect inaccurate data and correct (or remove) it
-  Detect duplicate data and remove the duplicates

We need to remove data with the following characteristics:

-  Non-English apps
-  Non-free apps


This phase is called data cleaning. It is necessary because an analysis run on clean data gives more reliable results.

The Google Play data set has an error on a certain row, described in [the discussion section](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015). The row is at index 10473 of list_google (the header row is index 0). As the output below shows, it has only 12 values instead of 13: the Category value is missing, so the remaining values are shifted one column to the left. We need to remove it.

In [7]:
#Showing the row
print(list_google[10473]) #Python index starts at 0

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


In [8]:
#Removing the row (run this cell only once, otherwise a valid row gets deleted too)
del list_google[10473]

The AppleStore has no errors.
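
One way to look for rows like the one we just removed is to compare each row's length with the header's length. The sketch below assumes a hypothetical helper named `find_malformed_rows` and runs on toy data, not on the real data sets. It only catches rows with missing or extra values, not wrong values.

```python
def find_malformed_rows(dataset):
    """Return (index, row) pairs whose length differs from the header's.

    Assumes dataset[0] is the header row. Hypothetical helper.
    """
    header_length = len(dataset[0])
    malformed = []
    for index, row in enumerate(dataset[1:], start=1):
        if len(row) != header_length:
            malformed.append((index, row))
    return malformed

# Toy data: the row for app 'B' is missing its Category value
toy = [
    ['App', 'Category', 'Rating'],
    ['A', 'TOOLS', '4.1'],
    ['B', '1.9'],
    ['C', 'GAME', '4.5'],
]

print(find_malformed_rows(toy))
```

The indexes returned count the header as index 0, so they can be passed straight to `del`.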

### Removing Duplicate Entries

Exploring the Google Play data set, we can see some duplicate entries. For instance, Instagram has four entries.

In [9]:
for app in list_google:
    name = app[0]
    if name == 'Instagram':
        print(app)    

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


In total, there are 1,181 cases where an app occurs more than once.

In [10]:
duplicate_apps = []
unique_apps = []

for app in list_google:
    name = app[0]
    if name in unique_apps:
        # name was seen before, so this row is a duplicate
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

print(len(duplicate_apps))
print(duplicate_apps[:15])

1181
['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']
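
The same count can be obtained with `collections.Counter` from the standard library. This is only a sketch on toy rows, and the helper name `count_duplicates` is hypothetical: every occurrence of a name after the first one counts as a duplicate.

```python
from collections import Counter

def count_duplicates(rows, name_index=0):
    """Count the rows whose name already appeared in an earlier row."""
    counts = Counter(row[name_index] for row in rows)
    return sum(count - 1 for count in counts.values())

# Toy rows: 'Instagram' appears three times and 'Box' twice
toy_rows = [
    ['Instagram', 'SOCIAL'],
    ['Box', 'BUSINESS'],
    ['Instagram', 'SOCIAL'],
    ['Instagram', 'SOCIAL'],
    ['Box', 'BUSINESS'],
]

print(count_duplicates(toy_rows))  # 2 extra Instagram rows + 1 extra Box row = 3
```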


We need to remove the duplicate entries and keep only one entry per app. One thing we could do is remove the duplicate rows randomly. But we can do something even better.

Looking at the rows we printed for the Instagram app (the one with four entries), the main difference is in the fourth position of each row, which corresponds to the number of reviews. The differing numbers suggest the rows were collected at different times, so we can think this way: the higher the number of reviews, the more recent the data should be. We'll keep only the row with the highest number of reviews and remove the other entries for any given app.

So, let's do the following:
-  Create a dictionary, where each key is a unique app name and each value is the highest number of reviews of that app.

-  Use the information stored in the dictionary to create a new data set with only one entry per app.

In [11]:
#Map each app name to its highest number of reviews
reviews_max = {}

for row in list_google[1:]:
    name = row[0]
    n_reviews = float(row[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews             

We'll use the reviews_max dictionary to remove all the duplicate entries from list_google (the Google Play data set).

We start with two empty lists. The first list will store the new, cleaned data set. The second list will store only the app names; it guards against adding an app twice when several of its rows share the same highest number of reviews.

In [12]:
android_clean = []
already_added = []

for row in list_google[1:]:
    name = row[0]
    n_reviews = float(row[3])
    
    if (reviews_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(row)
        already_added.append(name)
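
For comparison, here is a single-pass variant of the same idea. It is a sketch, not what this notebook runs: the helper name `keep_most_reviewed` is hypothetical, the toy rows are trimmed to four columns, and the review count for 'Box' is made up. It stores the best row seen so far for each name, so no second loop is needed.

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep, for each app name, the row with the highest number of reviews.

    On a tie, the row that appears first is kept.
    """
    best = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

toy_rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Box', 'BUSINESS', '4.2', '159872'],      # made-up review count
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
]

for row in keep_most_reviewed(toy_rows):
    print(row)
```

One difference to keep in mind: the result is ordered by the first appearance of each app name, which may not match the order produced by the two-loop approach.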

In [13]:
#Let's explore the android_clean list
explore_data(android_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


### Removing Non-English Apps

Our apps are in English, and we'd like to analyze only the apps that are directed toward an English-speaking audience.

Looking at the data sets, we can see some apps whose names suggest they are not directed toward an English-speaking audience. We need to exterminate these entries (haha, sorry!).

In [14]:
#Some examples
print(list_apple[814][1])
print(list_apple[6732][1])
print('\n')
print(android_clean[4412][0])
print(android_clean[7940][0])

爱奇艺PPS -《欢乐颂2》电视剧热播
【脱出ゲーム】絶対に最後までプレイしないで 〜謎解き＆ブロックパズル〜


中国語 AQリスニング
لعبة تقدر تربح DZ


To remove them, we can drop each app whose name contains a symbol that is not commonly used in English text. English text usually includes letters from the English alphabet, digits, punctuation marks and a few other symbols such as +.

Each character has a number associated with it (its Unicode code point). For example, the character 'a' is 97, 'A' is 65, and '爱' is 29,233. We can get the corresponding number using the ord() function.

In [15]:
#Character numbers
print(ord('a'))
print(ord('A'))
print(ord('爱'))
print(ord('5'))
print(ord('+'))

97
65
29233
53
43


Another important fact: the numbers corresponding to the characters we commonly use in English text all fall in the range 0 to 127 (the ASCII range). If the number is less than or equal to 127, the character belongs to the set of common English characters; otherwise it doesn't. Looks like we need to build a function for this :)

In [16]:
#The exterminator function
#Takes a string and returns True if all its characters are ASCII, False otherwise
def detect_weird_character(character):
    for x in character:
        ord_test = ord(x)
        if (ord_test > 127):
            return False
    return True
        
#Running some tests   
print(detect_weird_character('Instagram'))
print(detect_weird_character('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(detect_weird_character('Docs To Go™ Free Office Suite'))
print(detect_weird_character('Instachat 😜'))

True
False
False
False


We have a problem (Houston?). Notice that app names like "Docs To Go™ Free Office Suite" and "Instachat 😜" were identified as non-English (the False output), because '™' and '😜' fall outside the ASCII range. We'd lose useful data because of emojis and other special characters, and we don't want that. To minimize this effect, we'll only remove an app if its name has more than three characters with corresponding numbers falling outside the ASCII range (0 to 127). This filter isn't perfect, but it should be good enough for our purposes. Let's roll.

In [17]:
detect_weird_character('Docs To Go™ Free Office Suite')

False

In [18]:
#The exterminator function - improved
#Returns False only when more than three characters fall outside the ASCII range
def detect_weird_character(character):
    num_weird_ch = 0
    for x in character:
        ord_test = ord(x)
        if (ord_test > 127):
            num_weird_ch += 1
            if (num_weird_ch > 3):
                return False
    return True

#Running some tests   
print(detect_weird_character('Instagram'))
print(detect_weird_character('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(detect_weird_character('Docs To Go™ Free Office Suite'))
print(detect_weird_character('Instachat 😜'))

True
False
True
True
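
To see why a threshold of three works for these four test names, we can count the non-ASCII characters in each one. The helper name `count_non_ascii` below is hypothetical; it is a sketch that only reports the counts and does not replace the function above.

```python
def count_non_ascii(text):
    """Count the characters whose code point is above 127."""
    return sum(1 for ch in text if ord(ch) > 127)

names = [
    'Instagram',
    'Docs To Go™ Free Office Suite',
    'Instachat 😜',
    '爱奇艺PPS -《欢乐颂2》电视剧热播',
]

# One count per name, in the same order as the list above
print([count_non_ascii(name) for name in names])
```

Only the last name goes over the limit of three, so it is the only one the improved function rejects.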


Now we need to apply the function to both data sets, Google Play and Apple Store. If an app name is identified as English, we append the whole row to a separate list. After that we can explore the two new data sets.

In [19]:
#Running the function through the Google Play and Apple Store data sets

google_play_english = []
apple_store_english = []

#Note: android_clean has no header row, so the [1:] slice also skips its first app
for row in android_clean[1:]:
    name = row[0]
    
    if(detect_weird_character(name)):
        google_play_english.append(row)

#list_apple still has its header row, so [1:] is needed here
for row in list_apple[1:]:
    name = row[1] #Be careful with the column index
    
    if(detect_weird_character(name)):
        apple_store_english.append(row)    

In [20]:
#Let's explore
explore_data(google_play_english, 0,3,True)
print('\n')
explore_data(apple_store_english, 0,3,True)

['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


Number of rows: 9613
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'G

Now we have 9613 rows for Google Play and 6183 for Apple Store.

### Isolating the Free Apps

The road so far:
- Removed inaccurate data.
- Removed duplicate app entries.
- Removed non-English apps.

There is one more thing to do. We only build apps that are free to download and install, and our main source of revenue consists of in-app ads. So we need to isolate the free apps for our analysis. This is the last step in the data cleaning process.

In [21]:
#Isolating the free apps
google_play_free_apps = []
apple_store_free_apps = []

#Note: neither English list has a header row, so each [1:] slice below
#also skips the first app of that list
for row in google_play_english[1:]:
    app_type = row[6] #Type column
    
    if app_type == 'Free':
        google_play_free_apps.append(row)

for row in apple_store_english[1:]:
    price = float(row[4]) #price column
    
    if price == 0:
        apple_store_free_apps.append(row) 
    


In [22]:
#Let's explore
explore_data(google_play_free_apps, 0,3,True)
print('\n')
explore_data(apple_store_free_apps, 0,3,True)

['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


Number of rows: 8861
Number of columns: 13


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


Number of ro

Now we have 8861 rows for Google Play and 3221 for Apple Store.

## Let's do some analysis

### Most Common Apps by Genre

The road so far (updated):
- Removed inaccurate data.
- Removed duplicate app entries.
- Removed non-English apps.
- Isolated the free apps.

Our aim here is to determine the kinds of apps that are likely to attract more users. Our revenue is influenced by the number of people using our apps.

To minimize risks, our validation strategy for an app idea consists of three steps:

- Build a minimal Android version of the app, and add it to Google Play.
- If the app has a good response from users, we then develop it further.
- If the app is profitable after six months, we also build an iOS version of the app and add it to the App Store.

Let's begin the analysis by getting a sense of the most common genres in each market. Shall we? Let's create some frequency tables and look at them.

In [23]:
#Function to create our frequency tables
def freq_table(dataset, index):
    
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage 
    
    return table_percentages

In [24]:
#Function to display our frequency tables
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
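
The same percentages can also be computed with `collections.Counter`. The sketch below uses a hypothetical name, `freq_table_counter`, and toy rows; it is meant as an alternative to compare against, not as a replacement for the functions above.

```python
from collections import Counter

def freq_table_counter(dataset, index):
    """Return {value: percentage} for the column at the given index."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return {value: count / total * 100 for value, count in counts.items()}

# Toy rows: three 'Games' apps and one 'Music' app
toy_rows = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
table = freq_table_counter(toy_rows, 1)

# Sort by percentage, highest first, like display_table does
for genre, percentage in sorted(table.items(), key=lambda item: item[1], reverse=True):
    print(genre, ':', percentage)
```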

In [25]:
#Let's see the frequency table of the column prime_genre
#from AppleStore
display_table(apple_store_free_apps, -5)

Games : 58.180689226948154
Entertainment : 7.885749767153058
Photo & Video : 4.967401428127911
Education : 3.6634585532443342
Social Networking : 3.2598571872089415
Shopping : 2.607885749767153
Utilities : 2.5147469729897547
Sports : 2.1421918658801617
Music : 2.049053089102763
Health & Fitness : 2.018006830176964
Productivity : 1.7385904998447685
Lifestyle : 1.5833592052157717
News : 1.334989133809376
Travel : 1.2418503570319777
Finance : 1.11766532132878
Weather : 0.8692952499223843
Food & Drink : 0.8072027320707855
Reference : 0.55883266066439
Business : 0.5277864017385905
Book : 0.43464762496119214
Navigation : 0.18627755355479667
Medical : 0.18627755355479667
Catalogs : 0.12418503570319776


We can see that Games are by far the most common genre: they account for more than half of the apps (58.18%). Entertainment comes next with 7.89%, far behind the first one. We can use this information to decide what kind of app we want to create, keeping in mind that we are only talking about free English apps.

In [26]:
#Genres from Google Play
display_table(google_play_free_apps, -4)

Tools : 8.452770567656021
Entertainment : 6.071549486513937
Education : 5.349283376594064
Business : 4.593161042771697
Productivity : 3.8934657487868187
Lifestyle : 3.8934657487868187
Finance : 3.7016138133393524
Medical : 3.532332693826882
Sports : 3.464620246021894
Personalization : 3.317909942444419
Communication : 3.2389120866719328
Action : 3.103487191061957
Health & Fitness : 3.080916375126961
Photography : 2.9454914795169844
News & Magazines : 2.7987811759395105
Social : 2.663356280329534
Travel & Local : 2.3247940413045933
Shopping : 2.245796185532107
Books & Reference : 2.144227513824625
Simulation : 2.0426588421171425
Dating : 1.8620923146371742
Arcade : 1.8508069066696762
Video Players & Editors : 1.7718090508971898
Casual : 1.760523642929692
Maps & Navigation : 1.3993905879697552
Food & Drink : 1.2413948764247829
Puzzle : 1.1285407967498025
Racing : 0.9931159011398263
Role Playing : 0.9366888613023362
Libraries & Demo : 0.9366888613023362
Auto & Vehicles : 0.925403453334838

This column is kind of confusing: it mixes categories and sub-categories. Let's look at the Category column instead.

In [27]:
#Category from Google Play
display_table(google_play_free_apps, 1)

FAMILY : 18.90305834555919
GAME : 9.728021667983297
TOOLS : 8.46405597562352
BUSINESS : 4.593161042771697
LIFESTYLE : 3.9047511567543167
PRODUCTIVITY : 3.8934657487868187
FINANCE : 3.7016138133393524
MEDICAL : 3.532332693826882
SPORTS : 3.3969077982169056
PERSONALIZATION : 3.317909942444419
COMMUNICATION : 3.2389120866719328
HEALTH_AND_FITNESS : 3.080916375126961
PHOTOGRAPHY : 2.9454914795169844
NEWS_AND_MAGAZINES : 2.7987811759395105
SOCIAL : 2.663356280329534
TRAVEL_AND_LOCAL : 2.3360794492720913
SHOPPING : 2.245796185532107
BOOKS_AND_REFERENCE : 2.144227513824625
DATING : 1.8620923146371742
VIDEO_PLAYERS : 1.794379866832186
MAPS_AND_NAVIGATION : 1.3993905879697552
FOOD_AND_DRINK : 1.2413948764247829
EDUCATION : 1.1623970206522967
ENTERTAINMENT : 0.9592596772373322
LIBRARIES_AND_DEMO : 0.9366888613023362
AUTO_AND_VEHICLES : 0.9254034533348381
HOUSE_AND_HOME : 0.8238347816273558
WEATHER : 0.8012639656923597
EVENTS : 0.7109807019523756
PARENTING : 0.6545536621148855
COMICS : 0.62069743

For Google Play, Tools is the most common value in the Genres column (8.45%), while in the Category column Family comes first (18.90%), followed by Game (9.73%) and Tools (8.46%). So far, games look like a promising option for our app.

Now let's take a look at what kinds of apps have the most users. We can do that by calculating the average number of installs for each app genre. For the Google Play data set, we can find this information in the Installs column. The Apple Store data set has no installs column, so we'll use the total number of user ratings (the rating_count_tot column) as a proxy.

In [28]:
#Average number of user ratings per genre (Apple Store)
apple_store_genre = freq_table(apple_store_free_apps, -5)

for genre in apple_store_genre:
    total = 0
    len_genre = 0
    for row in apple_store_free_apps:
        genre_app = row[-5]
        if genre_app == genre:            
            n_ratings = float(row[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)

News : 21248.023255813954
Entertainment : 14029.830708661417
Sports : 23008.898550724636
Health & Fitness : 23298.015384615384
Food & Drink : 33333.92307692308
Travel : 28243.8
Catalogs : 4004.0
Games : 22788.6696905016
Productivity : 21028.410714285714
Music : 57326.530303030304
Social Networking : 43899.514285714286
Business : 7491.117647058823
Education : 7003.983050847458
Book : 39758.5
Utilities : 18684.456790123455
Finance : 31467.944444444445
Weather : 52279.892857142855
Medical : 612.0
Photo & Video : 28441.54375
Lifestyle : 16485.764705882353
Shopping : 26919.690476190477
Reference : 74942.11111111111
Navigation : 86090.33333333333


We can see that Navigation apps have the highest average number of user ratings (about 86,090 per app).
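
The nested loops above scan the whole data set once per genre. A single pass is enough if we keep a running total and a running count for each genre. The sketch below assumes a hypothetical helper, `average_by_key`, and made-up toy numbers.

```python
def average_by_key(rows, key_index, value_index):
    """Return {key: average of the numeric column} in one pass over rows."""
    totals = {}
    counts = {}
    for row in rows:
        key = row[key_index]
        totals[key] = totals.get(key, 0.0) + float(row[value_index])
        counts[key] = counts.get(key, 0) + 1
    return {key: totals[key] / counts[key] for key in totals}

# Toy rows with made-up rating counts: [name, genre, n_ratings]
toy_rows = [
    ['App A', 'Navigation', '100'],
    ['App B', 'Navigation', '300'],
    ['App C', 'Reference', '50'],
]

print(average_by_key(toy_rows, 1, 2))
```

For the Apple Store data this would be called with the prime_genre index and the rating_count_tot index, for example `average_by_key(apple_store_free_apps, -5, 5)`.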

In [29]:
#Average number of installs per category (Google Play)
google_play_categories = freq_table(google_play_free_apps, 1)

for category in google_play_categories:
    total = 0
    len_category = 0
    for row in google_play_free_apps:
        category_app = row[1]
        if category_app == category:            
            n_installs = row[5]
            #Strip ',' and '+' so that e.g. '1,000,000+' becomes '1000000'
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    print(category, ':', avg_n_installs)

NEWS_AND_MAGAZINES : 9549178.467741935
PARENTING : 542603.6206896552
COMMUNICATION : 38456119.167247385
LIFESTYLE : 1437816.2687861272
BOOKS_AND_REFERENCE : 8767811.894736841
HOUSE_AND_HOME : 1331540.5616438356
BEAUTY : 513151.88679245283
PERSONALIZATION : 5201482.6122448975
FINANCE : 1387692.475609756
EVENTS : 253542.22222222222
SOCIAL : 23253652.127118643
TRAVEL_AND_LOCAL : 13984077.710144928
COMICS : 817657.2727272727
MAPS_AND_NAVIGATION : 4056941.7741935486
TOOLS : 10801391.298666667
LIBRARIES_AND_DEMO : 638503.734939759
EDUCATION : 1833495.145631068
ART_AND_DESIGN : 1967474.5454545454
AUTO_AND_VEHICLES : 647317.8170731707
GAME : 15588015.603248259
VIDEO_PLAYERS : 24727872.452830188
BUSINESS : 1712290.1474201474
SPORTS : 3638640.1428571427
FAMILY : 3697848.1731343283
MEDICAL : 120550.61980830671
DATING : 854028.8303030303
WEATHER : 5074486.197183099
HEALTH_AND_FITNESS : 4188821.9853479853
FOOD_AND_DRINK : 1924897.7363636363
SHOPPING : 7036877.311557789
PRODUCTIVITY : 16787331.34492

We can see that Communication apps have the highest average number of installs. This is likely explained by a few very popular apps, like WhatsApp and Telegram, which pull the average up.
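
The clean-up of the Installs values can be isolated in a small helper, sketched below under the assumption that every value looks like the ones printed earlier ('500,000+', '1,000,000,000+'). The name `parse_installs` is hypothetical.

```python
def parse_installs(installs):
    """Convert a string such as '1,000,000+' to the integer 1000000."""
    return int(installs.replace(',', '').replace('+', ''))

for raw in ['500,000+', '50,000+', '1,000,000,000+']:
    print(raw, '->', parse_installs(raw))
```

Since the '+' means "at least this many", the parsed number is a lower bound on the real installs, and the averages above are rough estimates rather than exact figures.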

### Conclusions

In this quick analysis we noticed that games are the most common kind of app on iOS: more than half of the free English apps are games.

On Android, some categories stand out by average number of installs, like communication and book apps. A deeper analysis would help us identify more precisely which kinds of apps we can use as a base to build our own.