# Study on in-app ads 

For this project, I'll pretend I'm working as a data analyst for a company that builds Android and iOS mobile apps. The company makes its apps available on Google Play and the App Store.

The company only builds apps that are free to download and install, and its main source of revenue is in-app ads. This means that our revenue for any given app depends mostly on the number of people who use it: the more users who see and engage with the ads, the better. Our goal for this project is to analyze data to help our developers understand what types of apps are likely to attract more users.

To do this, we'll need to collect and analyze data about mobile apps available on Google Play and the App Store.

As of September 2018, there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play, based on Statista's data.

I will be using two datasets from Kaggle:

* A data set containing data about approximately 10,000 Android apps from Google Play; the data was collected in August 2018 → https://www.kaggle.com/lava18/google-play-store-apps

* A data set containing data about approximately 7,000 iOS apps from the App Store; the data was collected in July 2017 → https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps

A few columns in the header row of the second dataset need clarification. Unlike Google Play's header row, the App Store's header row is not self-explanatory, as you can see here:

['id', 'track_name', 'size_bytes', 'currency', 'price', **'rating_count_tot', 'rating_count_ver'**, 'user_rating', **'user_rating_ver', 'ver', 'cont_rating'**, 'prime_genre', **'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'**]

That list contains every column in this dataset, so I will explain the ones that I think are harder to grasp:
* rating_count_tot: User rating count (for all versions)
* rating_count_ver: User rating count (for the current version)
* user_rating_ver: Average user rating (for the current version)
* ver: Latest version code
* cont_rating: Content rating
* sup_devices.num: Number of supported devices
* ipadSc_urls.num: Number of screenshots shown for display
* lang.num: Number of supported languages
* vpp_lic: VPP device-based licensing enabled
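
Since these header names are cryptic and the later cells select columns by numeric index, one option is to read each row as a dictionary keyed by header name. This is a minimal sketch using the standard `csv.DictReader`, not the approach this notebook uses; `read_rows_as_dicts` is a hypothetical helper, and the sample text is a hand-made subset of the AppleStore.csv columns.

```python
import csv
import io


def read_rows_as_dicts(file_obj):
    """Read CSV rows as dictionaries keyed by the header names."""
    return list(csv.DictReader(file_obj))


# Hypothetical in-memory sample holding a few AppleStore.csv columns,
# filled with the values of the Facebook row printed later in this notebook.
sample = io.StringIO(
    "id,track_name,price,rating_count_tot,prime_genre\n"
    "284882215,Facebook,0.0,2974676,Social Networking\n"
)
rows = read_rows_as_dicts(sample)
print(rows[0]["track_name"], rows[0]["rating_count_tot"])  # Facebook 2974676
```

With the real file, you would pass the object returned by `open('AppleStore.csv', encoding='utf8', newline='')`; `newline=''` is what the `csv` module documentation recommends for files read by a reader. Every value is still a string, so numeric columns need converting.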

In [1]:
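from csv import reader

def open_file(base_csv):
    # Read a CSV file and return all its rows (header included) as a list of lists.
    # The `with` block closes the file once the rows have been read;
    # newline="" is the mode the csv module documentation recommends.
    with open(base_csv, encoding="utf8", newline="") as opened_file:
        read_file = reader(opened_file)
        dataset = list(read_file)

    return dataset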

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [3]:
google_play_dataset = open_file('googleplaystore.csv')
applestore_dataset = open_file('AppleStore.csv')

explore_data(google_play_dataset, 0, 4, rows_and_columns = True) #Just to make sure that it's working properly
print('\n')
print('----------------')
print('\n')
explore_data(applestore_dataset, 0, 4, rows_and_columns = True) #Just to make sure that it's working properly

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10842
Number of columns: 13


----------------


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vp

Before analysing these datasets, I need to clean them. The company I work for only builds apps that are free to download and install and that are aimed at an English-speaking audience, which means I'll need to:

* Detect inaccurate data, and correct or remove it.
* Detect duplicate data, and remove the duplicates.
* Remove non-English apps like 爱奇艺PPS -《欢乐颂2》电视剧热播.
* Remove apps that aren't free.

***

## Data cleaning

* **Inaccurate data**:

The discussions for the Google Play dataset point out that one row has fewer columns than the others (row 10473 excluding the header, or 10474 including it), so the first step is to find that row and delete it:


In [4]:
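header = google_play_dataset[0]
print(len(google_play_dataset[1:]))

# Keep only the rows that have as many columns as the header (13).
# Building a new list avoids deleting from a list while iterating over it,
# which silently skips the element that follows each deleted row.
temporary_list = []
for each_row in google_play_dataset[1:]:
    if len(each_row) == len(header):
        temporary_list.append(each_row)

print(len(temporary_list))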

10841
10840
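
To confirm which row was dropped, and not just how many, a small helper can report the position and length of every row whose column count differs from the header's. This is a minimal sketch; `find_malformed_rows` is a hypothetical name and the toy rows are made up. On the real data it would be called as `find_malformed_rows(google_play_dataset[1:], 13)`.

```python
def find_malformed_rows(dataset, expected_columns):
    """Return (index, column_count) for each row whose length differs from expected_columns.

    Indices are positions in `dataset` as passed in, so if the header is
    excluded (dataset[1:]), index 0 is the first data row.
    """
    return [(index, len(row)) for index, row in enumerate(dataset)
            if len(row) != expected_columns]


# Hypothetical toy data: the last row is missing a column.
sample = [
    ['App', 'Category', 'Rating'],
    ['Photo Editor', 'ART_AND_DESIGN', '4.1'],
    ['Broken row', '1.9'],
]
print(find_malformed_rows(sample, len(sample[0])))  # [(2, 2)]
```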


* **Duplicated data**

Checking whether each app name is unique reveals 1181 cases where an app appears more than once:

In [5]:
unique_apps = []
duplicated_apps = []

for each_app in temporary_list:
    if each_app[0] in unique_apps:
        duplicated_apps.append(each_app[0])
    else:
        unique_apps.append(each_app[0])

print(len(duplicated_apps))      
    

1181
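
The loop above checks membership in a list, which scans the whole list on every lookup. Below is a sketch of the same count with `collections.Counter`, which counts every name in a single pass; `count_duplicates` is a hypothetical helper and the names are placeholders. Its first return value counts every occurrence beyond the first, which is what `len(duplicated_apps)` measures, so on `[row[0] for row in temporary_list]` it should also give 1181.

```python
from collections import Counter


def count_duplicates(names):
    """Return (number of extra copies, {name: count} for names seen more than once)."""
    counts = Counter(names)
    extra_copies = sum(count - 1 for count in counts.values())
    repeated = {name: count for name, count in counts.items() if count > 1}
    return extra_copies, repeated


names = ['App A', 'App B', 'App A', 'App C', 'App B', 'App A']
extra, repeated = count_duplicates(names)
print(extra)     # 3
print(repeated)  # {'App A': 3, 'App B': 2}
```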


The next step is to delete those rows. While doing so, I noticed that the duplicates differ in their number of reviews. That led me to think that I should not delete duplicates at random but follow a criterion: keep the entry with the highest number of reviews, which is presumably the most recent one.

To do that, I will:

* Create a dictionary where each key is a unique app name and the value is the highest number of reviews for that app.
* Use the dictionary to create a new dataset with only one entry per app: the entry with the highest number of reviews.

In a previous code cell, I found that there are 1181 cases where an app occurs more than once, so the length of the dictionary (of unique apps) should be equal to the difference between the length of our data set (10840 after removing inaccurate data) and 1181.
    

In [6]:
duplicated_app_ratings = {}

for each_app in temporary_list:
    name = each_app[0]
    n_ratings = int(each_app[3])
    if name not in duplicated_app_ratings:
        duplicated_app_ratings[name] = n_ratings
    elif duplicated_app_ratings[name] < n_ratings:  # name already seen: keep the larger review count
        duplicated_app_ratings[name] = n_ratings


print('Expected length:', len(temporary_list) - 1181)
print('Actual length:', len(duplicated_app_ratings))

Expected length: 9659
Actual length: 9659


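With the dictionary ready, I can build a clean dataset that keeps only one entry per app: the row whose number of reviews equals the maximum stored in the dictionary. The `already_added` list prevents adding the same app twice when several of its rows tie for that maximum.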

In [7]:
clean_dataset = []
already_added = []

for each_app in temporary_list:
    name = each_app[0]
    n_rating = int(each_app[3])
    if (duplicated_app_ratings[name] == n_rating) and (name not in already_added) :
        clean_dataset.append(each_app)
        already_added.append(name)    

print(len(clean_dataset))

9659
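
The two cells above (building the dictionary of maximum review counts, then filtering) can also be folded into a single pass that stores the best row per app name. This is a sketch under the assumption that ties should keep the first row seen, as `already_added` does; `deduplicate_by_max_reviews` is a hypothetical helper and the rows are invented. It should keep the same rows as In [7], though possibly in a different order, because rows come back in order of each name's first appearance.

```python
def deduplicate_by_max_reviews(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the row with the highest review count.

    When several rows tie for the highest count, the first one seen is kept.
    """
    best = {}
    for row in rows:
        name = row[name_index]
        reviews = int(row[reviews_index])
        if name not in best or reviews > int(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Hypothetical rows following the Google Play column order (App, Category, Rating, Reviews).
rows = [
    ['App A', 'BUSINESS', '4.4', '100'],
    ['App A', 'BUSINESS', '4.4', '120'],
    ['App B', 'TOOLS', '4.1', '30'],
    ['App A', 'BUSINESS', '4.4', '100'],
]
deduplicated = deduplicate_by_max_reviews(rows)
print(len(deduplicated))  # 2
print(deduplicated[0])    # ['App A', 'BUSINESS', '4.4', '120']
```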


* **Removing non-English apps:**

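Characters commonly used in English text fall in the ASCII range (codes 0 to 127), so a name containing characters with a code greater than 127 is probably non-English. However, emoji and symbols such as ™ are also outside that range, so to avoid discarding English apps, the function below only flags a name as non-English when it contains three or more such characters.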

In [16]:
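def is_english_or_not(name):
    # Count characters outside the ASCII range (code > 127). Up to two are
    # tolerated so that names with an emoji or a symbol such as ™ are kept.
    non_ascii_count = 0
    for each_char in name:
        if ord(each_char) > 127:
            non_ascii_count += 1
            if non_ascii_count >= 3:
                return False
    return True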

english_apps_cleaned = []
non_english_apps = []

ios_english_apps = []
ios_nonen_apps = []

for each_app in clean_dataset:
    name = each_app[0]
    is_english = is_english_or_not(name)
    if is_english == False:
        non_english_apps.append(each_app)
    else: 
        english_apps_cleaned.append(each_app)        

for each_app in applestore_dataset[1:]:
    name = each_app[1]
    is_english = is_english_or_not(name)
    if is_english == False:
        ios_nonen_apps.append(each_app)
    else: 
        ios_english_apps.append(each_app)
        
print(len(english_apps_cleaned))
print(len(ios_english_apps))

9597
6155
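
The heuristic can be checked on a few illustrative names. The sketch below restates it with the tolerance as a parameter; `is_probably_english` and `max_non_ascii` are hypothetical names, and with `max_non_ascii=2` it matches `is_english_or_not`. Setting `max_non_ascii=0` is equivalent to Python's built-in `str.isascii()` (Python 3.7+).

```python
def is_probably_english(name, max_non_ascii=2):
    """Return True if `name` has at most `max_non_ascii` characters outside ASCII (code > 127)."""
    non_ascii = sum(1 for char in name if ord(char) > 127)
    return non_ascii <= max_non_ascii


# Illustrative names: the first three should be kept, the last one dropped.
examples = [
    'Instagram',
    'Docs To Go™ Free Office Suite',   # one non-ASCII symbol (™)
    'Instachat 😜',                    # one emoji
    '爱奇艺PPS -《欢乐颂2》电视剧热播',   # many non-ASCII characters
]
for name in examples:
    print(name, is_probably_english(name))
```

The heuristic remains imperfect: a non-English name written mostly in Latin letters, with only one or two accented characters, still passes as English.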


* **Creating a list with only free apps:**

The company only builds apps that are free to download and install, and its main source of revenue is in-app ads. Both datasets currently contain free and paid apps, so I will isolate the free ones.

In [97]:
free_apps_cleaned = []
non_free_apps = []

free_ios_apps =[]
nonfree_ios_apps = []

for each_app in english_apps_cleaned:
    app_type = each_app[6]  # 'Type' column: 'Free' for free apps
    if app_type == 'Free':
        free_apps_cleaned.append(each_app)
    else:
        non_free_apps.append(each_app)

for each_app in ios_english_apps:
    price = float(each_app[4])
    if price == 0.0:
        free_ios_apps.append(each_app)
    else:
        nonfree_ios_apps.append(each_app)

print(len(free_apps_cleaned))
print(len(free_ios_apps))

8847
3203


***
## Most Common Apps by Genre

As mentioned in the introduction, the aim is to determine what kinds of apps are likely to attract more users, because our company's revenue is highly influenced by the number of people using our apps. To minimize risks and overhead, our validation strategy for an app idea consists of three steps:

* Build a minimal Android version of the app, and add it to Google Play.
* If the app has a good response from users, we develop it further.
* If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

Because our end goal is to add the app to both Google Play and the App Store, we need to find app profiles that are successful in both markets. The analysis begins by getting a sense of the most common genres in each market, so I will build frequency tables for a few columns in these datasets.

In [98]:
print('Google Play')
print('----------------')
print('\n')
explore_data(google_play_dataset, 0, 2, rows_and_columns = False)
print('********')
print('\n')
print('Apple Store')
print('----------------')
print('\n')
explore_data(applestore_dataset, 0, 2, rows_and_columns = False)

Google Play
----------------


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


********


Apple Store
----------------


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']




The rows printed above (and those printed at the beginning with the `explore_data` function) show the position of each column. The columns I'm going to focus on are:

* Google Play: Category (1), Rating (2), Reviews (3) and Genres (9)
* Apple Store: rating_count_tot (5), user_rating (7) and prime_genre (11)

There are two main reasons why I chose these variables:

1. By definition, they describe the same three things in both datasets: genre, average user rating, and number of user ratings (Google Play splits genre across the Category and Genres columns).
2. They let us examine how a genre relates to how well its apps are rated, and build frequency tables to see which genres are the most common. Together, these two things support an educated decision.

I'll build two functions to analyze the frequency tables:

* One function to generate frequency tables that show percentages
* Another function to display the percentages in descending order

***

### Frequency tables:

Because the two datasets order their columns differently, the function takes the index of the column to count as a parameter.

In [101]:
def freq_tables(a_list, index):
    freq_table = {}
    total = 0
    for each_app in a_list:
        genre = each_app[index]
        total += 1
        if genre not in freq_table:
            freq_table[genre] = 1
        else:
            freq_table[genre] += 1
    
    for each_app in freq_table:
        percentage = round((freq_table[each_app] / total)*100,2)
        freq_table[each_app] = percentage
    
    return freq_table
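
The same percentage table can be built with `collections.Counter`, which handles the counting step in one line. This is a minimal sketch; `freq_table_percent` is a hypothetical name, the rows are toy data, and the rounding mirrors `freq_tables`.

```python
from collections import Counter


def freq_table_percent(rows, index):
    """Percentage of rows for each value in column `index`, rounded to 2 decimals."""
    counts = Counter(row[index] for row in rows)
    total = len(rows)
    return {value: round(count / total * 100, 2) for value, count in counts.items()}


rows = [['a', 'GAME'], ['b', 'GAME'], ['c', 'TOOLS'], ['d', 'FAMILY']]
print(freq_table_percent(rows, 1))  # {'GAME': 50.0, 'TOOLS': 25.0, 'FAMILY': 25.0}
```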


In [113]:
freq_table_andr_cat = freq_tables(free_apps_cleaned, 1)
freq_table_andr_gen = freq_tables(free_apps_cleaned, 9)
freq_table_ios = freq_tables(free_ios_apps, 11)


print('Android:')
print('\n')
print(freq_table_andr_cat)
print('\n')
print('-----------')
print('\n')
print(freq_table_andr_gen)
print('\n')
print('-----------')
print('\n')
print('IOS:')
print('\n')
print(freq_table_ios)

Android:


{'ART_AND_DESIGN': 0.64, 'AUTO_AND_VEHICLES': 0.93, 'BEAUTY': 0.6, 'BOOKS_AND_REFERENCE': 2.14, 'BUSINESS': 4.6, 'COMICS': 0.61, 'COMMUNICATION': 3.23, 'DATING': 1.87, 'EDUCATION': 1.16, 'ENTERTAINMENT': 0.96, 'EVENTS': 0.71, 'FINANCE': 3.71, 'FOOD_AND_DRINK': 1.24, 'HEALTH_AND_FITNESS': 3.09, 'HOUSE_AND_HOME': 0.8, 'LIBRARIES_AND_DEMO': 0.94, 'LIFESTYLE': 3.89, 'GAME': 9.7, 'FAMILY': 18.93, 'MEDICAL': 3.54, 'SOCIAL': 2.67, 'SHOPPING': 2.25, 'PHOTOGRAPHY': 2.95, 'SPORTS': 3.39, 'TRAVEL_AND_LOCAL': 2.34, 'TOOLS': 8.45, 'PERSONALIZATION': 3.32, 'PRODUCTIVITY': 3.9, 'PARENTING': 0.66, 'WEATHER': 0.79, 'VIDEO_PLAYERS': 1.8, 'NEWS_AND_MAGAZINES': 2.8, 'MAPS_AND_NAVIGATION': 1.39}


-----------


{'Art & Design': 0.6, 'Art & Design;Creativity': 0.07, 'Auto & Vehicles': 0.93, 'Beauty': 0.6, 'Books & Reference': 2.14, 'Business': 4.6, 'Comics': 0.6, 'Comics;Creativity': 0.01, 'Communication': 3.23, 'Dating': 1.87, 'Education': 5.36, 'Education;Creativity': 0.05, 'Education;Education

### Ordering the frequency tables:

To list the genres from most to least common, I need to compare each genre's percentage with the others. A dictionary can't be sorted directly, but I can turn it into a list of (percentage, genre) tuples and sort that list.
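
An alternative that skips building the (percentage, genre) tuples is to sort the dictionary's items with a `key` function. This is a sketch with a hypothetical `sort_by_value_desc` helper and a three-genre sample taken from the iOS table. One behavioral difference: `desc_ord_list` breaks ties by genre name (in descending order), whereas this version keeps tied genres in their original order, because Python's sort is stable.

```python
def sort_by_value_desc(table):
    """Return the (key, value) pairs of `table`, largest value first."""
    return sorted(table.items(), key=lambda item: item[1], reverse=True)


table = {'Games': 58.26, 'Book': 0.37, 'Entertainment': 7.84}
for genre, percentage in sort_by_value_desc(table):
    print(f'{genre}: {percentage}')
```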

In [123]:
def desc_ord_list(a_dictionary):
    a_list = []
    for each_value in a_dictionary:
        a_list.append((a_dictionary[each_value], each_value))
    
    return sorted(a_list, reverse = True)
        
def display_list(a_list):
    for each_value in a_list:
        percentage = str(each_value[0])
        name = each_value[1]
        print(name + ": " + percentage)

desc_ord_andr_cat = desc_ord_list(freq_table_andr_cat)
desc_ord_andr_gen = desc_ord_list(freq_table_andr_gen)
desc_ord_ios = desc_ord_list(freq_table_ios)

print('\n')
print('IOS:')
print('\n')
display_list(desc_ord_ios)




IOS:


Games: 58.26
Entertainment: 7.84
Photo & Video: 5.0
Education: 3.68
Social Networking: 3.31
Shopping: 2.59
Utilities: 2.47
Sports: 2.15
Music: 2.06
Health & Fitness: 2.03
Productivity: 1.75
Lifestyle: 1.56
News: 1.34
Travel: 1.25
Finance: 1.09
Weather: 0.87
Food & Drink: 0.81
Reference: 0.53
Business: 0.53
Book: 0.37
Navigation: 0.19
Medical: 0.19
Catalogs: 0.12


Games are by far the most common genre (58.26%), followed by Entertainment (7.84%), about 50 percentage points behind.

Photo & Video and Social Networking come third and fifth, and the two are closely linked: photo and video apps are often used to create content that is then uploaded to social networks.

Also, Entertainment as a separate category may be somewhat forced, since most games can also be considered a form of entertainment, and apart from games there aren't many other entertainment options on mobile phones. The same applies, to a lesser extent, to Education, which includes many educational games.

So entertainment apps seem to outnumber practical apps. Although we don't know whether there are so many of them because many users play them, for the English-speaking market the company may want to try developing games. However, this is a trade-off: games have a larger pool of potential users but also far more competition, whereas most of the other categories face less competition but have fewer potential users.

In [124]:
print('Android:')
print('\n')
display_list(desc_ord_andr_cat)
print('\n')
print('-----------')
print('\n')
display_list(desc_ord_andr_gen)

Android:


FAMILY: 18.93
GAME: 9.7
TOOLS: 8.45
BUSINESS: 4.6
PRODUCTIVITY: 3.9
LIFESTYLE: 3.89
FINANCE: 3.71
MEDICAL: 3.54
SPORTS: 3.39
PERSONALIZATION: 3.32
COMMUNICATION: 3.23
HEALTH_AND_FITNESS: 3.09
PHOTOGRAPHY: 2.95
NEWS_AND_MAGAZINES: 2.8
SOCIAL: 2.67
TRAVEL_AND_LOCAL: 2.34
SHOPPING: 2.25
BOOKS_AND_REFERENCE: 2.14
DATING: 1.87
VIDEO_PLAYERS: 1.8
MAPS_AND_NAVIGATION: 1.39
FOOD_AND_DRINK: 1.24
EDUCATION: 1.16
ENTERTAINMENT: 0.96
LIBRARIES_AND_DEMO: 0.94
AUTO_AND_VEHICLES: 0.93
HOUSE_AND_HOME: 0.8
WEATHER: 0.79
EVENTS: 0.71
PARENTING: 0.66
ART_AND_DESIGN: 0.64
COMICS: 0.61
BEAUTY: 0.6


-----------


Tools: 8.44
Entertainment: 6.08
Education: 5.36
Business: 4.6
Productivity: 3.9
Lifestyle: 3.88
Finance: 3.71
Medical: 3.54
Sports: 3.46
Personalization: 3.32
Communication: 3.23
Action: 3.1
Health & Fitness: 3.09
Photography: 2.95
News & Magazines: 2.8
Social: 2.67
Travel & Local: 2.33
Shopping: 2.25
Books & Reference: 2.14
Simulation: 2.05
Dating: 1.87
Arcade: 1.84
Video Players & Edi

On Google Play, the most common genre is Tools, but by category, Family is the most common one.

Games come second by category, but that ranking should be taken with caution: the Family category also contains many games made for kids, so the situation is much like iOS. Developers choose the category themselves, so when an app fits several categories, they pick whichever they think suits it best.

On Google Play, Tools, Business and Productivity are all in the top five (together almost 17% of the apps), which could mean that Android users are more interested in apps that help them in their daily lives.

On the App Store, by contrast, Utilities, Productivity and Business rank much lower (together almost 5% of the apps), which suggests that Android users are more interested in these topics, whereas iOS users favor games over every other category.

If the company is going to invest money in creating a great app, it should go for a game. Although we don't have data on the average time spent in each category, it seems reasonable to assume that people spend more time playing games than using a productivity or utility tool, which also leaves more room for ads to be displayed.

One way to find out which genres are the most popular (have the most users) is to calculate the average number of installs for each genre. The Google Play dataset has this information in the Installs column, but the App Store dataset does not. To compare both stores on equal terms, I will use the number of user ratings as a proxy for the number of installs: the Reviews column for Android and rating_count_tot for iOS.
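
For Google Play alone, the Installs column could also be used directly, but its values are strings such as '10,000+'. Below is a sketch of a hypothetical `installs_to_int` helper that strips the commas and the plus sign. Keep in mind that these values are lower bounds of broad buckets (10,000+ means at least 10,000), so averages computed from them are only approximate.

```python
def installs_to_int(installs):
    """Convert a Google Play 'Installs' string such as '1,000,000+' to its integer lower bound."""
    return int(installs.replace(',', '').replace('+', ''))


print(installs_to_int('1,000,000+'))  # 1000000
print(installs_to_int('0'))           # 0
```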

***

## Calculating the average number of user ratings per app genre

To do that, we'll need to:

* Isolate the apps of each genre.
* Sum up the user ratings for the apps of that genre. 
* Divide the sum by the number of apps belonging to that genre (not by the total number of apps).
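
The three steps above can be sketched as one generic function that accumulates a sum and a count per genre and then divides. This is a minimal sketch assuming rows are lists of strings, as returned by `open_file`; `average_per_group` is a hypothetical name and the rows are invented. Unlike `genre_isolation`, it does not round the averages and returns (genre, average) pairs.

```python
from collections import defaultdict


def average_per_group(rows, group_index, value_index):
    """Average of column `value_index` for each value of `group_index`, highest first."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_index]
        sums[group] += float(row[value_index])
        counts[group] += 1
    # Divide by the number of apps in each group, not by the total number of apps.
    averages = {group: sums[group] / counts[group] for group in sums}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


rows = [
    ['App A', 'Navigation', '300'],
    ['App B', 'Navigation', '100'],
    ['App C', 'Book', '50'],
]
print(average_per_group(rows, 1, 2))  # [('Navigation', 200.0), ('Book', 50.0)]
```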

In [150]:
def genre_isolation(a_list, genre_ind, user_rat_ind):
    a_dictionary = {}
    total_apps_genre = {}
    
    for each_app in a_list:
        genre = each_app[genre_ind]
        user_rat = float(each_app[user_rat_ind])
        if genre in a_dictionary:
            a_dictionary[genre] += user_rat
            total_apps_genre[genre] +=1
        else:
            a_dictionary[genre] = user_rat
            total_apps_genre[genre] =1
    
    another_list = []
    for each_item in a_dictionary:
        genre = each_item
        value = round(a_dictionary[genre]/total_apps_genre[genre]) #Number of reviews / number of apps per genre
        another_list.append((value, genre))
        
    
    return sorted(another_list, reverse = True)

android_reviews_per_genre = genre_isolation(free_apps_cleaned, 1, 3)
ios_reviews_per_genre = genre_isolation(free_ios_apps, 11, 5)

print('Android (number reviews / app category): ')
print('------------')
print('\n')
display_list(android_reviews_per_genre)
print('\n')
print('IOS (number reviews / app category):')
print('------------')
print('\n')
display_list(ios_reviews_per_genre)

Android (number reviews / app category): 
------------


COMMUNICATION: 999090
SOCIAL: 965831
GAME: 684290
VIDEO_PLAYERS: 425350
PHOTOGRAPHY: 404081
TOOLS: 306550
ENTERTAINMENT: 301752
SHOPPING: 223887
PERSONALIZATION: 181122
WEATHER: 173680
PRODUCTIVITY: 160635
MAPS_AND_NAVIGATION: 143611
TRAVEL_AND_LOCAL: 129484
SPORTS: 117317
FAMILY: 113211
NEWS_AND_MAGAZINES: 93088
BOOKS_AND_REFERENCE: 88461
HEALTH_AND_FITNESS: 78095
FOOD_AND_DRINK: 57479
EDUCATION: 56293
COMICS: 43372
FINANCE: 38536
LIFESTYLE: 34119
HOUSE_AND_HOME: 27113
ART_AND_DESIGN: 24699
BUSINESS: 24240
DATING: 21953
PARENTING: 16379
AUTO_AND_VEHICLES: 14140
LIBRARIES_AND_DEMO: 10926
BEAUTY: 7476
MEDICAL: 3730
EVENTS: 2556


IOS (number reviews / app category):
------------


Navigation: 86090
Reference: 79350
Social Networking: 71548
Music: 57327
Weather: 52280
Book: 46385
Food & Drink: 33334
Finance: 32367
Photo & Video: 28442
Travel: 28244
Shopping: 27231
Health & Fitness: 23298
Sports: 23009
Games: 22886
News: 21248
Produ

***

### iOS:

Based on this new information, the logical choice for iOS might seem to be the Navigation genre, but that isn't entirely accurate. These figures are averages: a genre averaging about 90,000 reviews could consist of just two apps, one with about 180,000 reviews and the other with almost none. This is easy to demonstrate with the top four genres (Navigation, Reference, Social Networking and Music):

In [152]:
for each_app in free_ios_apps:
    if each_app[11] == 'Navigation':
        print(each_app[1], ':', each_app[5])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5
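
The gap between the mean and the median makes this concrete. The review counts below are copied from the Navigation listing above, and the sketch uses Python's standard `statistics` module.

```python
from statistics import mean, median

# rating_count_tot of the free English Navigation apps listed above.
navigation_reviews = [345046, 154911, 12811, 3582, 187, 5]

print(round(mean(navigation_reviews)))  # 86090
print(median(navigation_reviews))       # 8196.5
```

The mean matches the 86,090 reported for Navigation, while the median, about 8,200, is closer to what a typical app in this genre receives.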


In [154]:
for each_app in free_ios_apps:
    if each_app[11] == 'Reference':
        print(each_app[1], ':', each_app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
Jishokun-Japanese English Dictionary & Translator : 0


In [168]:
for each_app in free_ios_apps:
    if each_app[11] == 'Social Networking' and int(each_app[5]) > 50000:
        print(each_app[1], ':', each_app[5])

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642


In [170]:
for each_app in free_ios_apps:
    if each_app[11] == 'Music' and int(each_app[5]) > 50000:
        print(each_app[1], ':', each_app[5])

Pandora - Music & Radio : 1126879
Spotify Music : 878563
Shazam - Discover music, artists, videos & lyrics : 402925
iHeartRadio – Free Music & Radio Stations : 293228
SoundCloud - Music & Audio : 135744
Magic Piano by Smule : 131695
Smule Sing! : 119316
TuneIn Radio - MLB NBA Audiobooks Podcasts Music : 110420
Amazon Music : 106235
SoundHound Song Search & Music Player : 82602


These listings show that the top four genres are dominated by big companies that are hard to compete with, and in some cases the single most-reviewed app accounts for a large share of the genre's total reviews. Beyond those few leading apps, the review counts drop sharply.
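
One way to quantify this concentration is the share of a genre's reviews that goes to its largest apps. Below is a sketch with a hypothetical `top_n_share` helper, applied to the Navigation review counts listed earlier: Waze and Google Maps alone account for roughly 97% of the genre's reviews.

```python
def top_n_share(counts, n):
    """Fraction of the total held by the n largest values in `counts`."""
    total = sum(counts)
    top = sum(sorted(counts, reverse=True)[:n])
    return top / total


navigation_reviews = [345046, 154911, 12811, 3582, 187, 5]
print(f'{top_n_share(navigation_reviews, 2):.1%}')  # 96.8%
```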

In [175]:
for each_app in free_ios_apps:
    if each_app[11] == 'Weather':
        print(each_app[1], ':', each_app[5])

The Weather Channel: Forecast, Radar & Alerts : 495626
The Weather Channel App for iPad – best local forecast, radar map, and storm tracking : 208648
WeatherBug - Local Weather, Radar, Maps, Alerts : 188583
MyRadar NOAA Weather Radar Forecast : 150158
AccuWeather - Weather for Life : 144214
Yahoo Weather : 112603
Weather Underground: Custom Forecast & Local Radar : 49192
NOAA Weather Radar - Weather Forecast & HD Radar : 45696
Weather Live Free - Weather Forecast & Alerts : 35702
Storm Radar : 22792
QuakeFeed Earthquake Map, Alerts, and News : 6081
Moji Weather - Free Weather Forecast : 2333
Hurricane by American Red Cross : 1158
Forecast Bar : 375
Hurricane Tracker WESH 2 Orlando, Central Florida : 203
FEMA : 128
iWeather - World weather forecast : 80
Weather - Radar - Storm with Morecast App : 78
Yurekuru Call : 53
Weather & Radar : 37
WRAL Weather Alert : 25
Météo-France : 24
JaxReady : 22
Freddy the Frogcaster's Weather Station : 14
Almanac Long-Range Weather Forecast : 12
TodayAir

In [174]:
for each_app in free_ios_apps:
    if each_app[11] == 'Book':
        print(each_app[1], ':', each_app[5])

Kindle – Read eBooks, Magazines & Textbooks : 252076
Audible – audio books, original series & podcasts : 105274
Color Therapy Adult Coloring Book for Adults : 84062
OverDrive – Library eBooks and Audiobooks : 65450
HOOKED - Chat Stories : 47829
BookShout: Read eBooks & Track Your Reading Goals : 879
Dr. Seuss Treasury — 50 best kids books : 451
Green Riding Hood : 392
Weirdwood Manor : 197
MangaZERO - comic reader : 9
ikouhoushi : 0
MangaTiara - love comic reader : 0


Based on this data, it may not be clear which option our company should go for, so I will recall the key points of our scope once again:
1. The company only builds apps that are free to download and install, and its main source of revenue is in-app ads.
2. It is open to any kind of app, but the app has to maximize the time users spend in it, so that more ads can be shown.

So there are three candidates here: Games, Food & Drink and Books. Why? Because of the amount of time users spend in each of them. In a game, there are many ways to make users want to watch an ad, for instance by rewarding them with extra lives, in-game money or other features; we are working with our users' willingness, so it is a trade-off.

The second and third cases are quite similar: people who use these apps also spend a lot of time in them. While someone is following a recipe or reading a book, an ad can pop up occasionally. If we want to avoid intrusive ads, we can apply the same idea as in games: a trade-off where users unlock a feature for a certain amount of time if they watch an ad.

***

### Android:

Let's apply the same principles for Android users.
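
The cells below print the first 11 apps of each category in dataset order, and these are not necessarily the most installed ones. Below is a sketch of a hypothetical `top_apps_by_installs` helper that sorts a category by its Installs bucket first. It assumes the Google Play column order (App at 0, Category at 1, Installs at 5), and the sample rows are invented.

```python
def installs_to_int(installs):
    """'1,000,000+' -> 1000000 (the '+' marks a lower bound)."""
    return int(installs.replace(',', '').replace('+', ''))


def top_apps_by_installs(rows, category, limit=10, name_index=0,
                         category_index=1, installs_index=5):
    """Return (name, installs) for the `limit` apps in `category` with the most installs."""
    in_category = [row for row in rows if row[category_index] == category]
    in_category.sort(key=lambda row: installs_to_int(row[installs_index]), reverse=True)
    return [(row[name_index], row[installs_index]) for row in in_category[:limit]]


rows = [
    ['App A', 'GAME', '4.1', '159', '19M', '10,000+'],
    ['App B', 'GAME', '4.5', '900', '30M', '1,000,000+'],
    ['App C', 'TOOLS', '4.0', '12', '2M', '500+'],
    ['App D', 'GAME', '3.9', '40', '8M', '100,000+'],
]
print(top_apps_by_installs(rows, 'GAME', limit=2))
```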

In [180]:
amount_shown = 0
for each_app in free_apps_cleaned:
    if each_app[1] == 'COMMUNICATION' and amount_shown < 11:
        print(each_app[0], ':', each_app[5])
        amount_shown +=1

WhatsApp Messenger : 1,000,000,000+
Messenger for SMS : 10,000,000+
My Tele2 : 5,000,000+
imo beta free calls and text : 100,000,000+
Contacts : 50,000,000+
Call Free – Free Call : 5,000,000+
Web Browser & Explorer : 5,000,000+
Browser 4G : 10,000,000+
MegaFon Dashboard : 10,000,000+
ZenUI Dialer & Contacts : 10,000,000+
Cricket Visual Voicemail : 10,000,000+


In [183]:
amount_shown = 0
for each_app in free_apps_cleaned:
    if each_app[1] == 'SOCIAL' and amount_shown < 11:
        print(each_app[0], ':', each_app[5])
        amount_shown +=1

Facebook : 1,000,000,000+
Facebook Lite : 500,000,000+
Tumblr : 100,000,000+
Social network all in one 2018 : 100,000+
Pinterest : 100,000,000+
TextNow - free text + calls : 10,000,000+
Google+ : 1,000,000,000+
The Messenger App : 1,000,000+
Messenger Pro : 1,000,000+
Free Messages, Video, Chat,Text for Messenger Plus : 1,000,000+
Telegram X : 5,000,000+


In [186]:
amount_shown = 0
for each_app in free_apps_cleaned:
    if each_app[1] == 'GAME' and amount_shown < 11:
        print(each_app[0], ':', each_app[5])
        amount_shown +=1

Solitaire : 10,000,000+
Sonic Dash : 100,000,000+
PAC-MAN : 100,000,000+
Bubble Witch 3 Saga : 50,000,000+
Race the Traffic Moto : 10,000,000+
Marble - Temple Quest : 10,000,000+
Shooting King : 10,000,000+
Geometry Dash World : 10,000,000+
Jungle Marble Blast : 5,000,000+
Roll the Ball® - slide puzzle : 100,000,000+
Block Craft 3D: Building Simulator Games For Free : 50,000,000+


In [188]:
amount_shown = 0
for each_app in free_apps_cleaned:
    if each_app[1] == 'VIDEO_PLAYERS' and amount_shown < 11:
        print(each_app[0], ':', each_app[5])
        amount_shown +=1

YouTube : 1,000,000,000+
All Video Downloader 2018 : 1,000,000+
Video Downloader : 10,000,000+
HD Video Player : 1,000,000+
Iqiyi (for tablet) : 1,000,000+
Video Player All Format : 10,000,000+
Motorola Gallery : 100,000,000+
Free TV series : 100,000+
Video Player All Format for Android : 500,000+
VLC for Android : 100,000,000+
Code : 10,000,000+


As with iOS, the top four Android categories are led by companies with well-established niches and very large user bases. Now let's see whether what I recommended for iOS would also apply here:

In [190]:
amount_shown = 0
for each_app in free_apps_cleaned:
    if each_app[1] == 'ENTERTAINMENT' and amount_shown < 11:
        print(each_app[0], ':', each_app[5])
        amount_shown +=1

Complete Spanish Movies : 1,000,000+
Pluto TV - It’s Free TV : 1,000,000+
Mobile TV : 10,000,000+
TV+ : 5,000,000+
Digital TV : 5,000,000+
Motorola Spotlight Player™ : 10,000,000+
Vigo Lite : 5,000,000+
Hotstar : 100,000,000+
Peers.TV: broadcast TV channels First, Match TV, TNT ... : 5,000,000+
The green alien dance : 1,000,000+
Spectrum TV : 5,000,000+


In [193]:
amount_shown = 0
for each_app in free_apps_cleaned:
    if each_app[1] == 'FAMILY' and amount_shown < 11:
        print(each_app[0], ':', each_app[5])
        amount_shown +=1

Jewels Crush- Match 3 Puzzle : 1,000,000+
Coloring & Learn : 5,000,000+
Mahjong : 5,000,000+
Super ABC! Learning games for kids! Preschool apps : 1,000,000+
Toy Pop Cubes : 1,000,000+
Educational Games 4 Kids : 5,000,000+
Candy Pop Story : 1,000,000+
Princess Coloring Book : 5,000,000+
Hello Kitty Nail Salon : 50,000,000+
Candy Smash : 1,000,000+
Happy Fruits Bomb - Cube Blast : 500,000+


In [199]:
amount_shown = 0
for each_app in free_apps_cleaned:
    if each_app[1] == 'NEWS_AND_MAGAZINES' and amount_shown < 11:
        print(each_app[0], ':', each_app[5])
        amount_shown +=1

Fox News – Breaking News, Live Video & News Alerts : 10,000,000+
NEW - Read Newspaper, News 24h : 10,000,000+
BaBe + - Indonesian News : 1,000,000+
daily News : 1,000,000+
BBC News : 10,000,000+
Free TV Shows App:News, TV Series, Episode, Movies : 1,000,000+
News24 : 1,000,000+
Le Monde, the continuous news : 5,000,000+
Wireless news : 1,000,000+
CNN Breaking US & World News : 10,000,000+
Gnoche entertainment news · sports news is also free : 5,000,000+


In [202]:
amount_shown = 0
for each_app in free_apps_cleaned:
    if each_app[1] == 'BOOKS_AND_REFERENCE' and amount_shown < 11:
        print(each_app[0], ':', each_app[5])
        amount_shown +=1

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+


## Conclusions:

Are Android and iOS users downloading and using the same kinds of apps? Clearly not. The Android games market is crowded, but it also offers many possibilities for what our company wants, which is in-app ads, and this is certainly related to the Android users' philosophy: they are more likely to get a free-to-use app than to pay for one. However, considering that our company needs to grow or see some revenue before launching on iOS, where fun-related apps are even more prevalent, this road might be too complicated.

On the other hand, books and news seem to have a fair amount of competition but also a solid base of users in both stores. Considering the number of features and improvements that could be built, all the options for in-app ads, and the fact that Android users tend to have a more balanced mix of apps, these genres may ease the eventual transition from Android to iOS, so our company may prefer to develop this kind of app first.