# Profitable App Profiles for the App Store and Google Play Markets

In this project we will look at which mobile applications on Android and iOS attract the most users.

The company currently builds mobile applications that are free to download and directed toward an English-speaking audience. In-app ads are its main source of income, so revenue is directly influenced by the number of people using each application.

Our goal for this project is to analyze data to help our developers understand what type of apps are likely to attract more users.

## Opening and Exploring the Data

As of September 2018, there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play.

Collecting data for over 4 million apps requires a significant amount of time and money, so we'll try to analyze a sample of the data instead. To avoid spending resources on collecting new data ourselves, we should first try to see if we can find any relevant existing data at no cost. Luckily, there are two data sets that seem suitable for our goals:

- [Play Store](https://www.kaggle.com/lava18/google-play-store-apps), a data set containing data about approximately 10,000 Android apps from Google Play; the data was collected in August 2018.
- [App Store](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps), a data set containing data about approximately 7,000 iOS apps from the App Store; the data was collected in July 2017.

We'll start by opening and exploring these two data sets.

In [1]:
from csv import reader

# Google Play data set: read every row, then split off the header
with open('googleplaystore.csv', encoding='utf8') as opened_file:
    android = list(reader(opened_file))
android_header = android[0]
android = android[1:]

# App Store data set: same steps
with open('AppleStore.csv', encoding='utf8') as opened_file:
    ios = list(reader(opened_file))
ios_header = ios[0]
ios = ios[1:]

To make the data sets easier to explore, we create a function named `explore_data()` that can be used repeatedly to print rows in a readable way. We also add an option for the function to show the number of rows and columns of any data set.

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')
    
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [3]:
print(android_header)
print('\n')
explore_data(android, 0, 2, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


We see that the Google Play data set has 10841 apps and 13 columns. At a quick glance, the columns that might be useful for the purpose of our analysis are 'App', 'Category', 'Reviews', 'Installs', 'Type', 'Price', and 'Genres'.

In [4]:
print(ios_header)
print('\n')
explore_data(ios, 0, 2, True)

['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


Number of rows: 7197
Number of columns: 17


We have 7197 iOS apps in this data set, and the columns that seem interesting are: 'track_name', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', and 'prime_genre'.

Not all column names are self-explanatory, but each column is described in the [data set documentation](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps).

## Deleting Wrong Data

The Google Play data set has a dedicated [discussion section](https://www.kaggle.com/lava18/google-play-store-apps/discussion), and we can see that one of the discussions describes an error at row 10472.

In [5]:
print(android[10471]) # correct row

['Xposed Wi-Fi-Pwd', 'PERSONALIZATION', '3.5', '1042', '404k', '100,000+', 'Free', '0', 'Everyone', 'Personalization', 'August 5, 2014', '3.0.0', '4.0.3 and up']


In [6]:
print(android[10472]) # incorrect row

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


In [7]:
print(android_header) # header

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


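The 'Category' value seems to be missing from this row: '1.9' makes no sense as a 'Category' but fits the next column, 'Rating'. As a result, the row has only 12 values and every value after the app name is shifted one column to the left.

We will delete this row and confirm that the number of rows has been reduced by one.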

In [8]:
print('Number of rows before:', len(android))
del android[10472]
print('Number of rows after:', len(android))

Number of rows before: 10841
Number of rows after: 10840


After checking the App Store data set [discussion section](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/discussion), we did not find any wrong data.
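Instead of relying on a known row index, a shifted row like the one above can also be found by comparing each row's length with the header's length. The following is a minimal sketch of that idea on toy data; `find_malformed_rows`, `sample_header`, and `sample_rows` are hypothetical names that are not part of the original notebook.

```python
def find_malformed_rows(header, rows):
    """Return (index, row) pairs whose column count differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(rows) if len(row) != expected]


# Toy data standing in for the real files (an assumption for illustration).
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],            # 'Category' is missing, so the row is short
    ['App C', 'GAME', '3.9'],
]

malformed = find_malformed_rows(sample_header, sample_rows)
for index, row in malformed:
    print(index, row)
```

On the real data, the same call would be `find_malformed_rows(android_header, android)`. A length check only catches rows with missing or extra values, not rows with wrong values in the right number of columns.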

## Removing Duplicate Entries

If you explore the Google Play data set long enough or look at the [discussions section](https://www.kaggle.com/lava18/google-play-store-apps/discussion), you'll notice some apps have duplicate entries. 

In [9]:
for app in android:
    name = app[0]
    if name == 'Facebook':
        print(app)

['Facebook', 'SOCIAL', '4.1', '78158306', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'August 3, 2018', 'Varies with device', 'Varies with device']
['Facebook', 'SOCIAL', '4.1', '78128208', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'August 3, 2018', 'Varies with device', 'Varies with device']


We searched for the app Facebook in the data set and found two entries.

We will now count the number of duplicates in this data set.

In [10]:
duplicate_apps = []
unique_apps = []

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print('Number of duplicate apps:', len(duplicate_apps))
print('Expected length:', len(android) - len(duplicate_apps))

Number of duplicate apps: 1181
Expected length: 9659


There are 1181 duplicate applications. The Google Play data set should have 9659 applications once the duplicates have been removed.

In [11]:
print('Examples of duplicate apps:', duplicate_apps[:15])

Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


In [12]:
for app in android:
    name = app[0]
    if name == 'Box':
        print(app)


['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 31, 2018', 'Varies with device', 'Varies with device']


In [13]:
for app in android:
    name = app[0]
    if name == 'Instagram':
        print(app)
    

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


Looking at the Instagram duplicates, we can see that the number of reviews differs between entries. We can assume that the entry with the highest number of reviews is the most up-to-date one.

The Box entries, by contrast, are identical. In cases like this we will keep the first entry we come across and drop the rest.

To remove the duplicates, we will:

- Create a dictionary, where each dictionary key is a unique app name and the corresponding dictionary value is the highest number of reviews of that app.
- Use the information stored in the dictionary and create a new data set, which will have only one entry per app (and for each app, we'll only select the entry with the highest number of reviews).

In [14]:
reviews_max = {}
for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if (name in reviews_max) and (reviews_max[name] < n_reviews):
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
print('Length of reviews_max:', len(reviews_max))

Length of reviews_max: 9659


The length of `reviews_max` matches the expected length, so the dictionary holds exactly one entry per unique app. We will now use the `reviews_max` dictionary to create a clean Android data set.

In the code cell below:

- We start by initializing two empty lists, `android_clean` and `already_added`.
- We loop through the `android` data set, and for every iteration:
    - We isolate the name of the app and the number of reviews.
    - We add the current row (`app`) to the `android_clean` list, and the app name (`name`) to the `already_added` list if:
        - The number of reviews of the current app matches the number of reviews of that app as described in the `reviews_max` dictionary; and
        - The name of the app is not already in the `already_added` list. We need this supplementary condition to account for cases where the highest number of reviews of a duplicate app is the same for more than one entry (for example, the Box app has three entries, and the number of reviews is the same). If we just check for `reviews_max[name] == n_reviews`, we'll still end up with duplicate entries for some apps.

In [15]:
android_clean = []
already_added = []

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if (n_reviews == reviews_max[name]) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)
explore_data(android_clean, 0, 2, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 9659
Number of columns: 13


The data set android_clean has the right number of entries. We can assume there are no duplicates within android_clean.
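For comparison, the two-step approach above can be condensed into a single pass that stores the best row per app directly. This is only a sketch on toy data, not the notebook's method; `dedupe_by_max_reviews` and `sample_rows` are hypothetical names.

```python
def dedupe_by_max_reviews(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}  # app name -> row with the most reviews seen so far
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' keeps the earliest row when review counts tie.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Toy rows (name, category, rating, reviews), an assumption for illustration.
sample_rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Box', 'BUSINESS', '4.2', '159872'],
]

deduped = dedupe_by_max_reviews(sample_rows)
print(len(deduped))
```

One difference worth noting: this version orders the result by each app's first appearance, whereas `android_clean` is ordered by the position of the row that was kept. It also avoids the repeated `name not in already_added` list lookups, which get slower as the list grows.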

## Removing Non-English Apps

The company only develops apps that are directed to an English-speaking audience. Both data sets contain non-English apps, so we will remove all the apps that are not directed to an English-speaking audience.

One way to go about this is to remove each app with a name containing a symbol that is not commonly used in English text — English text usually includes letters from the English alphabet, numbers composed of digits from 0 to 9, punctuation marks (., !, ?, ;), and other symbols (+, *, /).

Behind the scenes, each character we use in a string has a corresponding number associated with it. For instance, the corresponding number for character 'a' is 97, character 'A' is 65, and character '爱' is 29,233. We can get the corresponding number of each character using the ord() built-in function.

The numbers corresponding to the characters we commonly use in an English text are all in the range 0 to 127, according to the ASCII (American Standard Code for Information Interchange) system. Based on this number range, we can build a function that detects whether a character belongs to the set of common English characters or not. If the number is equal to or less than 127, then the character belongs to the set of common English characters.
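As a quick illustration of the range check described above, the sketch below prints the number returned by `ord()` for the three characters mentioned earlier and tests each against the ASCII limit. `is_common_english_char` is a hypothetical helper name used only for this example.

```python
def is_common_english_char(character):
    """True when the character's code point is within the ASCII range 0-127."""
    return ord(character) <= 127


for character in ['a', 'A', '爱']:
    print(character, ord(character), is_common_english_char(character))
```

The first two characters fall inside the range and the third does not.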

If an app name contains a character whose corresponding number is greater than 127, then it probably means that the app has a non-English name. To minimize the impact of data loss, we'll only remove an app if its name has more than three characters with corresponding numbers falling outside the ASCII range. This means all English apps with up to three emoji or other special characters will still be labeled as English.


In [16]:
def app_english(string):
    non_ascii = 0 # count of characters above 127
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    if non_ascii > 3:
        return False
    return True
print(app_english('asd?#奇'))

True


In [17]:
print(app_english('Docs To Go™ Free Office Suite'))
print(app_english('Instachat 😜'))
print(app_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))

True
True
False


The function seems to work fine: English app names that use emoji or other symbols outside the ASCII range are still included.

The function is still not perfect, and a few non-English apps might get past our filter, but this seems good enough at this point in our analysis. We shouldn't spend too much time on optimization at this stage.
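To make the limitation concrete: a name made up of only two or three non-ASCII characters stays under the threshold and is kept. The sketch below shows this and, purely as an assumption, one possible variant that looks at the share of non-ASCII characters instead of their count. `count_non_ascii`, `app_english_by_share`, and the `max_share` value of 0.5 are hypothetical and are not used elsewhere in this analysis.

```python
def count_non_ascii(name):
    """Count characters whose code point is above 127."""
    return sum(1 for character in name if ord(character) > 127)


def app_english_by_share(name, max_share=0.5):
    """Hypothetical variant: treat a name as English when at most
    `max_share` of its characters fall outside the ASCII range."""
    if not name:
        return True
    return count_non_ascii(name) / len(name) <= max_share


short_name = '微博'  # two non-ASCII characters
print(count_non_ascii(short_name) > 3)         # the count-based rule would keep this name
print(app_english_by_share(short_name))        # the share-based variant rejects it
print(app_english_by_share('Instachat 😜'))    # a single emoji is still accepted
```

The rest of the analysis continues to use the count-based `app_english()` function.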

Below, we use the `app_english()` function to filter out the non-English apps for both data sets:

In [18]:
english_android = []
english_ios = []

for app in android_clean:
    name = app[0]
    if app_english(name):
        english_android.append(app)
for app in ios:
    name = app[2]
    if app_english(name):
        english_ios.append(app)

In [19]:
explore_data(english_android, 0, 3, True)
print('\n')
explore_data(english_ios, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188

We are left with 9614 android apps and 6183 iOS apps.

## Isolating the Free Apps

We only build apps that are free to download and install, and our main source of revenue consists of in-app ads. Our data sets contain both free and non-free apps, so we'll need to isolate only the free apps for our analysis.

In [20]:
print(android_header)
print(ios_header)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


The price column is at index 7 in the Android data set and at index 5 in the iOS data set.
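Rather than counting positions by hand, the column index can be looked up by name with `list.index()`. The short sketch below uses the two headers printed above and gives the indices used in the next code cell; `android_price_index` and `ios_price_index` are names introduced only for this example.

```python
android_header = ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs',
                  'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated',
                  'Current Ver', 'Android Ver']
ios_header = ['', 'id', 'track_name', 'size_bytes', 'currency', 'price',
              'rating_count_tot', 'rating_count_ver', 'user_rating',
              'user_rating_ver', 'ver', 'cont_rating', 'prime_genre',
              'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']

# list.index() returns the position of the first matching column name.
android_price_index = android_header.index('Price')
ios_price_index = ios_header.index('price')
print(android_price_index, ios_price_index)
```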

In [21]:
free_android = []
free_ios = []

for app in english_android:
    price = app[7]
    if price == '0':
        free_android.append(app)
for app in english_ios:
    price = app[5]
    if price == '0':
        free_ios.append(app)
print(len(free_android))
print(len(free_ios))

8864
3222


We now have 8864 free Android apps and 3222 free iOS apps.

## The Analysis

Our aim is to determine the kinds of apps that are likely to attract more users because our revenue is highly influenced by the number of people using our apps.

To minimize risks and overhead, our validation strategy for an app idea is comprised of three steps:

1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we develop it further.
3. If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

Because our end goal is to add the app on both Google Play and the App Store, we need to find app profiles that are successful on both markets. For instance, a profile that works well for both markets might be a productivity app that makes use of gamification.

We will begin by getting a sense of the most common genres in each market. We will do this by building frequency tables for a few columns in our data sets.

## Most Common Genres

In [22]:
print(android_header)
print(ios_header)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


We can use the following columns to generate frequency tables and find out what the most common genres are:

- Android
    - `Category` (index 1)
    - `Genres` (index 9)
- iOS
    - `prime_genre` (index -5)

We'll build two functions we can use to analyze the frequency tables:

- One function to generate frequency tables that show percentages
- Another function that we can use to display the percentages in a descending order
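The two helpers described above can also be sketched with `collections.Counter` from the standard library, shown here on toy rows before the notebook's own implementation. `freq_table_counter`, `display_sorted`, and `sample_rows` are hypothetical names and are not used in the cells that follow.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Map each value in column `index` to its percentage of all rows."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return {key: count / total * 100 for key, count in counts.items()}


def display_sorted(table):
    """Print the table entries from highest to lowest percentage."""
    for key, percent in sorted(table.items(), key=lambda item: item[1], reverse=True):
        print(key, ':', percent)


# Toy rows (name, genre), an assumption for illustration.
sample_rows = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
sample_table = freq_table_counter(sample_rows, 1)
display_sorted(sample_table)
```

Sorting with a `key` function avoids building a list of `(value, key)` tuples first. Ties are handled differently, though: this sketch keeps tied entries in their original order, while sorting tuples breaks ties by the genre name.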

In [23]:
def freq_table(dataset, index):
    table = {}
    total = 0
    for app in dataset:
        key = app[index]
        total += 1
        if key in table:
            table[key] += 1
        else:
            table[key] = 1
    table_percent = {}
    for row in table:
        value = table[row]
        name = row
        table_percent[name] = (value / total) * 100
    return table_percent

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

### iOS prime genre
We will start by examining the frequency table for the prime_genre column of the App Store data set.

In [24]:
print('prime genre'.title())        
display_table(free_ios, -5)

Prime Genre
Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


Among free English apps, the most common genre is Games at 58%, followed by Entertainment at 8%, Photo & Video at 5%, Education at 4%, and Social Networking at 3%.

The general impression is that most free English apps on the App Store are designed for entertainment purposes. Practical apps such as education, shopping, and productivity apps are rarer.

Even though there are a lot of entertainment apps on the App Store, this does not mean that they have the highest number of users.

## Android Category and Genres

In [25]:
print('category'.title())
display_table(free_android, 1)
print('\n')
print('genres'.title())
display_table(free_android, 9)

Category
FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_

The most common values in the Category column are FAMILY at 19%, GAME at 10%, TOOLS at 8%, BUSINESS at 5%, and LIFESTYLE at 4%. The Android market is more balanced between entertainment and practical apps.

The Genres column for the Android market has a lot of different genres. We can also see a similar spread between entertainment and practical apps as in the Category column.

The difference between the Genres and the Category columns is not crystal clear, but one thing we can notice is that the Genres column is much more granular (it has more categories). We're only looking for the bigger picture at the moment, so we'll only work with the Category column moving forward.

We cannot recommend an app profile from these frequency tables alone: they tell us how many apps of each genre are on the market, not how many users those apps have.

### iOS genre vs ratings

One way to find out what genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play data set, we can find this information in the `Installs` column, but this information is missing for the App Store data set. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the `rating_count_tot` column.
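The per-genre average described above can be sketched as a single pass that accumulates a running total and a count for each genre. This is a minimal illustration on toy rows; `average_by_group` and `sample_rows` are hypothetical names, and the numbers are made up for the example.

```python
def average_by_group(rows, group_index, value_index):
    """Return {group: mean of the values in `value_index`} in one pass."""
    totals = {}
    counts = {}
    for row in rows:
        group = row[group_index]
        totals[group] = totals.get(group, 0.0) + float(row[value_index])
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}


# Toy rows (name, genre, rating count), an assumption for illustration.
sample_rows = [
    ['App A', 'Navigation', '300'],
    ['App B', 'Navigation', '100'],
    ['App C', 'Reference', '50'],
]
sample_averages = average_by_group(sample_rows, 1, 2)
print(sample_averages)
```

On the real data the call would be `average_by_group(free_ios, -5, 6)`. It reads the data set once instead of once per genre.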

We will now calculate the average number of user ratings per app genre on the App Store:

In [26]:
unique_genres = freq_table(free_ios, -5)
avg_n_rating = [] 
for genre in unique_genres:
    total = 0
    len_genre = 0
    for app in free_ios:
        genre_app = app[-5]
        if genre_app == genre:
            total += float(app[6])
            len_genre += 1
    avg_rating = total / len_genre
    #print(genre, ':', avg_rating)
    tuple_v = (avg_rating, genre)
    avg_n_rating.append(tuple_v)
s_avg = sorted(avg_n_rating, reverse=True)
order = 0
for genre in s_avg:
    order += 1
    print(str(order) + ')', genre[1], ':', genre[0])


1) Navigation : 86090.33333333333
2) Reference : 74942.11111111111
3) Social Networking : 71548.34905660378
4) Music : 57326.530303030304
5) Weather : 52279.892857142855
6) Book : 39758.5
7) Food & Drink : 33333.92307692308
8) Finance : 31467.944444444445
9) Photo & Video : 28441.54375
10) Travel : 28243.8
11) Shopping : 26919.690476190477
12) Health & Fitness : 23298.015384615384
13) Sports : 23008.898550724636
14) Games : 22788.6696905016
15) News : 21248.023255813954
16) Productivity : 21028.410714285714
17) Utilities : 18684.456790123455
18) Lifestyle : 16485.764705882353
19) Entertainment : 14029.830708661417
20) Business : 7491.117647058823
21) Education : 7003.983050847458
22) Catalogs : 4004.0
23) Medical : 612.0


The ranking above shows the most popular genres on the App Store by average number of user ratings. We will analyse the most popular genre, 'Navigation', to find out whether it is a suitable recommendation for the App Store.

In [27]:
for app in free_ios:
    cat = app[-5]
    if cat == 'Navigation':
        print(app[2],':',app[6])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Geocaching® : 12811
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5
CoPilot GPS – Car Navigation & Offline Maps : 3582
Google Maps - Navigation & Transit : 154911


There are only 6 apps in this category, and 2 of them dominate by having far more ratings than the rest. These two apps do not have adverts within them and so would not suit our company profile, as we generate our revenue from ads. Adding ads to an app of this kind would be possible, but users would likely opt for the ad-free alternatives.
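The effect of the two dominant apps on the genre average can be checked by comparing the mean with the median of the six rating counts listed above. This is a small sketch using the standard `statistics` module; the median is brought in here only as an illustration and is not used elsewhere in this analysis.

```python
from statistics import mean, median

# Rating counts of the six free Navigation apps printed above.
navigation_ratings = [345046, 12811, 187, 5, 3582, 154911]

navigation_mean = mean(navigation_ratings)      # pulled up by the two largest apps
navigation_median = median(navigation_ratings)  # midpoint of the sorted values
print('Mean:', navigation_mean)
print('Median:', navigation_median)
```

The mean is roughly 86,090 while the median is 8196.5, so the genre average mostly reflects the two largest apps rather than a typical Navigation app.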

As Navigation is not a good profile for our app, we will look at the next most popular genre, 'Reference'.

In [28]:
def sort_cat(dataset, name):
    tuples = []
    for app in dataset:
        cat = app[-5]
        if cat == name:
            tuple_v = (float(app[6]), app[2])
            tuples.append(tuple_v)
    display = sorted(tuples, reverse=True)
    order = 0
    for row in display:
        order += 1
        print(str(order) + ')', row[1], ':', row[0])

sort_cat(free_ios, 'Reference')

1) Bible : 985920.0
2) Dictionary.com Dictionary & Thesaurus : 200047.0
3) Dictionary.com Dictionary & Thesaurus for iPad : 54175.0
4) Google Translate : 26786.0
5) Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418.0
6) New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588.0
7) Merriam-Webster Dictionary : 16849.0
8) Night Sky : 12122.0
9) City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535.0
10) LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693.0
11) GUNS MODS for Minecraft PC Edition - Mods Tools : 1497.0
12) Guides for Pokémon GO - Pokemon GO News and Cheats : 826.0
13) WWDC : 762.0
14) Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718.0
15) VPN Express : 14.0
16) Real Bike Traffic Rider Virtual Reality Glasses : 8.0
17) 教えて!goo : 0.0
18) Jishokun-Japanese English Dictionary & Translator : 0.0


Reference is also dominated by the first two apps.

We will now take a look at Social Networking:

In [29]:
sort_cat(free_ios, 'Social Networking')

1) Facebook : 2974676.0
2) Pinterest : 1061624.0
3) Skype for iPhone : 373519.0
4) Messenger : 351466.0
5) Tumblr : 334293.0
6) WhatsApp Messenger : 287589.0
7) Kik : 260965.0
8) ooVoo – Free Video Call, Text and Voice : 177501.0
9) TextNow - Unlimited Text + Calls : 164963.0
10) Viber Messenger – Text & Call : 164249.0
11) Followers - Social Analytics For Instagram : 112778.0
12) MeetMe - Chat and Meet New People : 97072.0
13) We Heart It - Fashion, wallpapers, quotes, tattoos : 90414.0
14) InsTrack for Instagram - Analytics Plus More : 85535.0
15) Tango - Free Video Call, Voice and Chat : 75412.0
16) LinkedIn : 71856.0
17) Match™ - #1 Dating App. : 60659.0
18) Skype for iPad : 60163.0
19) POF - Best Dating App for Conversations : 52642.0
20) Timehop : 49510.0
21) Find My Family, Friends & iPhone - Life360 Locator : 43877.0
22) Whisper - Share, Express, Meet : 39819.0
23) Hangouts : 36404.0
24) LINE PLAY - Your Avatar World : 34677.0
25) WeChat : 34584.0
26) Badoo - Meet New People, C

Social Networking is also heavily influenced by the first two apps, and there are a lot of apps in this category. It does include apps such as Facebook that generate income from ads and are extremely successful.

In [30]:
sort_cat(free_ios, 'Music')

1) Pandora - Music & Radio : 1126879.0
2) Spotify Music : 878563.0
3) Shazam - Discover music, artists, videos & lyrics : 402925.0
4) iHeartRadio – Free Music & Radio Stations : 293228.0
5) SoundCloud - Music & Audio : 135744.0
6) Magic Piano by Smule : 131695.0
7) Smule Sing! : 119316.0
8) TuneIn Radio - MLB NBA Audiobooks Podcasts Music : 110420.0
9) Amazon Music : 106235.0
10) SoundHound Song Search & Music Player : 82602.0
11) Sonos Controller : 48905.0
12) Bandsintown Concerts : 30845.0
13) Karaoke - Sing Karaoke, Unlimited Songs! : 28606.0
14) My Mixtapez Music : 26286.0
15) Sing Karaoke Songs Unlimited with StarMaker : 26227.0
16) Ringtones for iPhone & Ringtone Maker : 25403.0
17) Musi - Unlimited Music For YouTube : 25193.0
18) AutoRap by Smule : 18202.0
19) Spinrilla - Mixtapes For Free : 15053.0
20) Napster - Top Music & Radio : 14268.0
21) edjing Mix:DJ turntable to remix and scratch music : 13580.0
22) Free Music - MP3 Streamer & Playlist Manager Pro : 13443.0
23) Free Pia

The Music category is heavily influenced by the top 3 apps. There is a lot of competition in this genre because there are many apps. Even so, this category would be a good suggestion, as many of these apps earn revenue from ads.

In [31]:
sort_cat(free_ios, 'Weather')

1) The Weather Channel: Forecast, Radar & Alerts : 495626.0
2) The Weather Channel App for iPad – best local forecast, radar map, and storm tracking : 208648.0
3) WeatherBug - Local Weather, Radar, Maps, Alerts : 188583.0
4) MyRadar NOAA Weather Radar Forecast : 150158.0
5) AccuWeather - Weather for Life : 144214.0
6) Yahoo Weather : 112603.0
7) Weather Underground: Custom Forecast & Local Radar : 49192.0
8) NOAA Weather Radar - Weather Forecast & HD Radar : 45696.0
9) Weather Live Free - Weather Forecast & Alerts : 35702.0
10) Storm Radar : 22792.0
11) QuakeFeed Earthquake Map, Alerts, and News : 6081.0
12) Moji Weather - Free Weather Forecast : 2333.0
13) Hurricane by American Red Cross : 1158.0
14) Forecast Bar : 375.0
15) Hurricane Tracker WESH 2 Orlando, Central Florida : 203.0
16) FEMA : 128.0
17) iWeather - World weather forecast : 80.0
18) Weather - Radar - Storm with Morecast App : 78.0
19) Yurekuru Call : 53.0
20) Weather & Radar : 37.0
21) WRAL Weather Alert : 25.0
22) Météo

This genre is dominated by the first 10 apps. These apps run ad-free, so I would not suggest this category.

Looking at the top 5 genres, I would say Social Networking and Music are good suggestions because successful apps in these genres already generate revenue from ads. However, drawing users away from the established apps would be extremely difficult.

### Play Store

We have data about the number of installs for the Google Play market, so we should be able to get a clearer picture about genre popularity. However, the install numbers don't seem precise enough — we can see that most values are open-ended (100+, 1,000+, 5,000+, etc.):

In [32]:
display_table(free_android, 5)

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


We don't know whether an app with 100,000+ installs has 100,000 installs, 200,000, or 350,000. However, we don't need very precise data for our purposes — we only want to find out which app genres attract the most users, and we don't need perfect precision with respect to the number of users.

We're going to leave the numbers as they are, which means that we'll consider that an app with 100,000+ installs has 100,000 installs, and an app with 1,000,000+ installs has 1,000,000 installs, and so on. 

To perform computations, however, we'll need to convert each install number from string to float. This means we need to remove the commas and the plus characters, otherwise the conversion will fail and raise an error.
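
As a minimal sketch of the cleaning step described above, the conversion can be isolated in a small helper. The name `parse_installs` is hypothetical and not part of this notebook; it assumes install strings look like `'1,000+'` or `'0'`, as in the table above.

```python
def parse_installs(raw):
    """Convert an install string such as '1,000,000+' to a float.

    The open-ended bucket is treated as its lower bound, so
    '100,000+' becomes 100000.0.
    """
    cleaned = raw.replace('+', '').replace(',', '')
    return float(cleaned)


# Illustrative values only, taken from the bucket labels shown above.
for raw in ['100,000+', '1,000,000+', '0']:
    print(raw, '->', parse_installs(raw))
```

Calling `float()` on the raw string `'100,000+'` would raise a `ValueError`, which is why both characters are removed first.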

We'll do this directly in the loop below, where we also compute the average number of installs for each genre (category).

In [33]:
unique_genres_android = freq_table(free_android, 1)
avg_n = []
for category in unique_genres_android:
    total = 0         # sum of installs in this category
    len_category = 0  # number of apps in this category
    for app in free_android:
        category_app = app[1]
        if category_app == category:
            # Strip '+' and ',' so the install count converts to float
            n_installs = app[5]
            n_installs = n_installs.replace('+','')
            n_installs = n_installs.replace(',','')
            n_installs = float(n_installs)
            total += n_installs
            len_category += 1
    avg_n_installs = total / len_category
    # Store the average first so sorting orders the tuples by it
    tuple_v = (avg_n_installs, category)
    avg_n.append(tuple_v)
# Print the categories from highest to lowest average installs
s_avg_n = sorted(avg_n, reverse=True)
order = 0
for genre in s_avg_n:
    order += 1
    print(str(order) + ')', genre[1], ':', genre[0])

1) COMMUNICATION : 38456119.167247385
2) VIDEO_PLAYERS : 24727872.452830188
3) SOCIAL : 23253652.127118643
4) PHOTOGRAPHY : 17840110.40229885
5) PRODUCTIVITY : 16787331.344927534
6) GAME : 15588015.603248259
7) TRAVEL_AND_LOCAL : 13984077.710144928
8) ENTERTAINMENT : 11640705.88235294
9) TOOLS : 10801391.298666667
10) NEWS_AND_MAGAZINES : 9549178.467741935
11) BOOKS_AND_REFERENCE : 8767811.894736841
12) SHOPPING : 7036877.311557789
13) PERSONALIZATION : 5201482.6122448975
14) WEATHER : 5074486.197183099
15) HEALTH_AND_FITNESS : 4188821.9853479853
16) MAPS_AND_NAVIGATION : 4056941.7741935486
17) FAMILY : 3695641.8198090694
18) SPORTS : 3638640.1428571427
19) ART_AND_DESIGN : 1986335.0877192982
20) FOOD_AND_DRINK : 1924897.7363636363
21) EDUCATION : 1833495.145631068
22) BUSINESS : 1712290.1474201474
23) LIFESTYLE : 1437816.2687861272
24) FINANCE : 1387692.475609756
25) HOUSE_AND_HOME : 1331540.5616438356
26) DATING : 854028.8303030303
27) COMICS : 817657.2727272727
28) AUTO_AND_VEHICLES

We will examine the top 5 categories:

In [34]:
def sort_cat_android(dataset, name):
    """Print the apps of category `name`, ranked by number of installs."""
    tuples = []
    for app in dataset:
        cat = app[1]
        if cat == name:
            # Strip '+' and ',' so the install count converts to float
            n_installs = app[5]
            n_installs = n_installs.replace('+','')
            n_installs = n_installs.replace(',','')
            tuple_v = (float(n_installs), app[0])
            tuples.append(tuple_v)
    # Highest install counts first
    display = sorted(tuples, reverse=True)
    order = 0
    for row in display:
        order += 1
        print(str(order) + ')', row[1], ':', row[0])
sort_cat_android(free_android, 'COMMUNICATION')

1) WhatsApp Messenger : 1000000000.0
2) Skype - free IM & video calls : 1000000000.0
3) Messenger – Text and Video Chat for Free : 1000000000.0
4) Hangouts : 1000000000.0
5) Google Chrome: Fast & Secure : 1000000000.0
6) Gmail : 1000000000.0
7) imo free video calls and chat : 500000000.0
8) Viber Messenger : 500000000.0
9) UC Browser - Fast Download Private & Secure : 500000000.0
10) LINE: Free Calls & Messages : 500000000.0
11) Google Duo - High Quality Video Calls : 500000000.0
12) imo beta free calls and text : 100000000.0
13) Yahoo Mail – Stay Organized : 100000000.0
14) Who : 100000000.0
15) WeChat : 100000000.0
16) UC Browser Mini -Tiny Fast Private & Secure : 100000000.0
17) Truecaller: Caller ID, SMS spam blocking & Dialer : 100000000.0
18) Telegram : 100000000.0
19) Opera Mini - fast web browser : 100000000.0
20) Opera Browser: Fast and Secure : 100000000.0
21) Messenger Lite: Free Calls & Messages : 100000000.0
22) Kik : 100000000.0
23) KakaoTalk: Free Calls & Text : 10000000

On average, communication apps have the most installs: 38,456,119. This number is heavily skewed upward by a few apps in the 1,000,000,000+ bucket (WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail, and Hangouts), and a few others in the 100,000,000+ and 500,000,000+ buckets.
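
One way to gauge how much these giants inflate the average is to recompute it after leaving out apps at or above some install threshold. The sketch below is only an illustration under stated assumptions: the helper name `average_installs_below`, the 100 million cut-off, and the tiny `sample_apps` list are all hypothetical, and the rows only mimic the layout of `free_android` (name at index 0, category at index 1, installs at index 5).

```python
def average_installs_below(dataset, category, threshold=100_000_000):
    """Average installs for one category, ignoring apps at or above threshold.

    Returns None when no app in the category falls below the threshold.
    """
    installs = []
    for app in dataset:
        if app[1] != category:
            continue
        n_installs = float(app[5].replace('+', '').replace(',', ''))
        if n_installs < threshold:
            installs.append(n_installs)
    if not installs:
        return None
    return sum(installs) / len(installs)


# Made-up rows that follow the column layout assumed above.
sample_apps = [
    ['Giant Messenger', 'COMMUNICATION', '', '', '', '1,000,000,000+'],
    ['Small Chat', 'COMMUNICATION', '', '', '', '1,000,000+'],
    ['Tiny Dialer', 'COMMUNICATION', '', '', '', '500,000+'],
]
print(average_installs_below(sample_apps, 'COMMUNICATION'))
```

With these made-up rows the giant is excluded and the result is 750000.0. Running the same helper on `free_android` would show how far the communication average drops once the largest apps are set aside.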

In [35]:
sort_cat_android(free_android, 'VIDEO_PLAYERS')

1) YouTube : 1000000000.0
2) Google Play Movies & TV : 1000000000.0
3) MX Player : 500000000.0
4) VivaVideo - Video Editor & Photo Movie : 100000000.0
5) VideoShow-Video Editor, Video Maker, Beauty Camera : 100000000.0
6) VLC for Android : 100000000.0
7) Motorola Gallery : 100000000.0
8) Motorola FM Radio : 100000000.0
9) Dubsmash : 100000000.0
10) Vote for : 50000000.0
11) Vigo Video : 50000000.0
12) VMate : 50000000.0
13) Samsung Video Library : 50000000.0
14) Ringdroid : 50000000.0
15) MiniMovie - Free Video and Slideshow Editor : 50000000.0
16) LIKE – Magic Video Maker & Community : 50000000.0
17) KineMaster – Pro Video Editor : 50000000.0
18) HD Video Downloader : 2018 Best video mate : 50000000.0
19) DU Recorder – Screen Recorder, Video Editor, Live : 50000000.0
20) video player for android : 10000000.0
21) iMediaShare – Photos & Music : 10000000.0
22) YouTube Studio : 10000000.0
23) Video Player All Format : 10000000.0
24) Video Downloader - for Instagram Repost App : 10000000.0

In [36]:
sort_cat_android(free_android, 'SOCIAL')

1) Instagram : 1000000000.0
2) Google+ : 1000000000.0
3) Facebook : 1000000000.0
4) Snapchat : 500000000.0
5) Facebook Lite : 500000000.0
6) VK : 100000000.0
7) Tumblr : 100000000.0
8) Tik Tok - including musical.ly : 100000000.0
9) Tango - Live Video Broadcast : 100000000.0
10) Pinterest : 100000000.0
11) LinkedIn : 100000000.0
12) Badoo - Free Chat & Dating App : 100000000.0
13) BIGO LIVE - Live Stream : 100000000.0
14) ooVoo Video Calls, Messaging & Stories : 50000000.0
15) Zello PTT Walkie Talkie : 50000000.0
16) SKOUT - Meet, Chat, Go Live : 50000000.0
17) POF Free Dating App : 50000000.0
18) MeetMe: Chat & Meet New People : 50000000.0
19) textPlus: Free Text & Calls : 10000000.0
20) magicApp Calling & Messaging : 10000000.0
21) YouNow: Live Stream Video Chat : 10000000.0
22) We Heart It : 10000000.0
23) Waplog - Free Chat, Dating App, Meet Singles : 10000000.0
24) TextNow - free text + calls : 10000000.0
25) Text free - Free Text + Call : 10000000.0
26) Text Me: Text Free, Call F

In [37]:
sort_cat_android(free_android, 'PHOTOGRAPHY')

1) Google Photos : 1000000000.0
2) Z Camera - Photo Editor, Beauty Selfie, Collage : 100000000.0
3) YouCam Perfect - Selfie Photo Editor : 100000000.0
4) YouCam Makeup - Magic Selfie Makeovers : 100000000.0
5) Sweet Selfie - selfie camera, beauty cam, photo edit : 100000000.0
6) S Photo Editor - Collage Maker , Photo Collage : 100000000.0
7) Retrica : 100000000.0
8) PicsArt Photo Studio: Collage Maker & Pic Editor : 100000000.0
9) PhotoGrid: Video & Pic Collage Maker, Photo Editor : 100000000.0
10) Photo Editor Pro : 100000000.0
11) Photo Editor Collage Maker Pro : 100000000.0
12) Photo Collage Editor : 100000000.0
13) LINE Camera - Photo editor : 100000000.0
14) Cymera Camera- Photo Editor, Filter,Collage,Layout : 100000000.0
15) Candy Camera - selfie, beauty camera, photo editor : 100000000.0
16) Camera360: Selfie Photo Editor with Funny Sticker : 100000000.0
17) BeautyPlus - Easy Photo Editor & Selfie Camera : 100000000.0
18) B612 - Beauty & Filter Camera : 100000000.0
19) AR effect

192) DU Collage Maker - Photo Collage & Grid & Layout : 100000.0
193) DSLR Photography Training apps : 100000.0
194) DP Maker : 100000.0
195) DP Creator for WhatsApp : 100000.0
196) Cover Camera for FB : 100000.0
197) Camera V7 24 Megapixel : 100000.0
198) Black & White Camera - Lovely BW : 100000.0
199) B Tiff Viewer : 100000.0
200) AZ Camera - Manual Pro Cam : 100000.0
201) 4K Ultra Camera : 100000.0
202) 24 megapixel hd camera : 100000.0
203) Wifi Action Camera : 50000.0
204) Moony Lab - Print Photos, Books & Magnets ™ : 50000.0
205) DV Lottery Photo Tool : 50000.0
206) DP Status 2017 : 50000.0
207) BlitzWolf Shutter - BW Shutter : 50000.0
208) B&W Photo Filter Editor : 50000.0
209) Z Camera : 10000.0
210) Square DP For Whatsapp : 10000.0
211) Profile w/o crop for Telegram : 10000.0
212) Night Camera Blur Effect : 10000.0
213) Leica Q : 10000.0
214) FN Cam : 10000.0
215) FH WiFiCam : 10000.0
216) DSLR camera - Auto Focus and Blur Professional : 10000.0
217) DP Photo Editor : 10000.0

In [38]:
sort_cat_android(free_android, 'PRODUCTIVITY')

1) Google Drive : 1000000000.0
2) Microsoft Word : 500000000.0
3) Google Calendar : 500000000.0
4) Dropbox : 500000000.0
5) Cloud Print : 500000000.0
6) WPS Office - Word, Docs, PDF, Note, Slide & Sheet : 100000000.0
7) SwiftKey Keyboard : 100000000.0
8) Samsung Notes : 100000000.0
9) Microsoft PowerPoint : 100000000.0
10) Microsoft Outlook : 100000000.0
11) Microsoft OneNote : 100000000.0
12) Microsoft OneDrive : 100000000.0
13) Microsoft Excel : 100000000.0
14) Google Slides : 100000000.0
15) Google Sheets : 100000000.0
16) Google Keep : 100000000.0
17) Google Docs : 100000000.0
18) Evernote – Organizer, Planner for Notes & Memos : 100000000.0
19) ES File Explorer File Manager : 100000000.0
20) ColorNote Notepad Notes : 100000000.0
21) CamScanner - Phone PDF Creator : 100000000.0
22) Adobe Acrobat Reader : 100000000.0
23) myAT&T : 50000000.0
24) Verizon Cloud : 50000000.0
25) QR Droid : 50000000.0
26) My Airtel-Online Recharge, Pay Bill, Wallet, UPI : 50000000.0
27) Mobizen Screen Re

305) Builder (by Engineer.ai) : 100.0
306) BW-IVMS : 100.0
307) BV Mobile Apps : 100.0
308) Ag Trucking Mobile App : 100.0
309) Ag Guardian : 100.0
310) My Ag Report : 50.0
311) MY GULFPORT FL : 50.0
312) EY Team Connect : 50.0
313) EB Scanner : 50.0
314) CX Summit : 50.0
315) CT CONNECT : 50.0
316) CONNECT: The Mobile CX Summit : 50.0
317) CE SODEXO PASS FRANCE : 50.0
318) BN DB1 App : 50.0
319) BAMMS for BM 1Park : 50.0
320) SKIN BK : 10.0
321) PLMGSS FR : 10.0
322) My CG : 10.0
323) MinT BI : 10.0
324) EU Brazil Green Business Forum : 10.0
325) ER Assist : 10.0
326) EP Cloud : 10.0
327) EI! : 10.0
328) EC QR : 10.0
329) DV Web Design Tips : 10.0
330) CQ Ukraine : 10.0
331) BW t&t : 10.0
332) BW Ultra : 10.0
333) BN Inscript Improved Keyboard : 10.0
334) BN Inscript Improved Input keyboard : 10.0
335) BM Interventi : 10.0
336) BAMMS for BM SQ : 10.0
337) AQ Guards : 10.0
338) APagri CD : 10.0
339) ACCEPT CE MARKING : 10.0
340) Shaggy's DM Assistant : 5.0
341) FK Events : 5.0
342) EC 

We see the same pattern for the video players category, which is the runner-up with an average of 24,727,872 installs. The market is dominated by apps like YouTube, Google Play Movies & TV, or MX Player. The pattern is repeated for social apps (where we have giants like Facebook, Instagram, Google+, etc.), photography apps (Google Photos and other popular photo editors), or productivity apps (Microsoft Word, Dropbox, Google Calendar, Evernote, etc.).
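
Because a handful of giants dominate each of these categories, comparing the mean with the median is a quick check of how skewed a category is. This is a sketch rather than part of the original analysis: `mean_and_median_installs` and the `sample_apps` rows are hypothetical, and the rows assume the same column positions as `free_android` (category at index 1, installs at index 5).

```python
from statistics import mean, median


def mean_and_median_installs(dataset, category):
    """Return (mean, median) installs for the apps in one category."""
    installs = [
        float(app[5].replace('+', '').replace(',', ''))
        for app in dataset
        if app[1] == category
    ]
    return mean(installs), median(installs)


# Made-up rows: one giant and four smaller apps in the same category.
sample_apps = [
    ['Giant Player', 'VIDEO_PLAYERS', '', '', '', '1,000,000,000+'],
    ['Player A', 'VIDEO_PLAYERS', '', '', '', '1,000,000+'],
    ['Player B', 'VIDEO_PLAYERS', '', '', '', '1,000,000+'],
    ['Player C', 'VIDEO_PLAYERS', '', '', '', '500,000+'],
    ['Player D', 'VIDEO_PLAYERS', '', '', '', '100,000+'],
]
avg, mid = mean_and_median_installs(sample_apps, 'VIDEO_PLAYERS')
print('mean:', avg)
print('median:', mid)
```

In this toy sample the single giant pulls the mean far above the median. A large gap of that kind on the real data would suggest that the category average says little about a typical app.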

## Conclusion

In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.

The video players category would be a good profile for the app, since ads could be embedded in the videos. It is also similar to the music genre on the App Store.

An app that focuses on music videos and allows for some form of social networking could be quite successful, and we would be able to play ads through the video player.