# Categorizing Profitable Free Mobile Apps

Our discussion concerns mobile apps available for Android and Apple operating systems. In particular, we wish to uncover ideas for new mobile apps which are
both **free** and **profitable**. 

Profitability of free apps is based on in-app advertisements. Thus, the more users a free app has, the greater its actual or potential profitability. 

This simple principle leads us to our main strategies of analyzing existing apps:
    1. What kinds of free apps have the largest number of users?  
       These are the apps we might wish to emulate as developers. 
    
    2. What kinds of free apps have the fewest users?  
       These are the categories of apps we might wish to avoid  
       as developers. 
    
    3. What kinds of paid apps are not well-represented among free apps?  
       The idea is that free apps of this type could take future market  
       share away from paid apps, and thus fulfill our profitability goal    
       of having a large number of users. 
    
We are going to dig into two different data sets for mobile apps. One set contains data for Android apps ([link here](https://dq-content.s3.amazonaws.com/350/googleplaystore.csv)), and the other for iOS apps ([link here](https://dq-content.s3.amazonaws.com/350/AppleStore.csv)). We use only the basics of the Python programming language for data analysis, explain each step, and try to justify our conclusions. 

In [49]:
from csv import reader

#obtain the data set for iOS apps
apple_file = open('AppleStore.csv', encoding = "utf8")
opened_file = reader(apple_file)
ios = list(opened_file) #list of lists, with header row
#obtain the data set for Android apps
google_file = open('googleplaystore.csv', encoding = "utf8")
opened_file = reader(google_file)
android = list(opened_file) #list of lists, with header row
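
The cell above never closes the two files it opens. A minimal sketch of one alternative, assuming a hypothetical helper named `load_csv` (it is not part of the original notebook), reads each file inside a `with` block so the handle is released as soon as the rows are in memory:

```python
from csv import reader


def load_csv(path, encoding="utf8"):
    """Return the CSV file at `path` as a list of lists, header row first.

    `load_csv` is a hypothetical helper, not part of the original notebook.
    """
    # newline="" is the setting the csv module documentation recommends for files
    with open(path, encoding=encoding, newline="") as csv_file:
        return list(reader(csv_file))
```

With this helper, the two data sets could be loaded as `ios = load_csv('AppleStore.csv')` and `android = load_csv('googleplaystore.csv')`.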

Here is a function that allows us to easily print and view rows from our data sets `ios` and `android` in a readable way.

In [50]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')
        
    if rows_and_columns:
        print('Number of rows:' , len(dataset))
        print('Number of columns:', len(dataset[0]))

Here are the first few rows of the iOS and android data sets, respectively:

In [51]:
explore_data(ios, 1, 4, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7198
Number of columns: 16


In [52]:
explore_data(android, 1, 4, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10842
Number of columns: 13


Notice above that the ios dataset has 7,198 rows and the android dataset has 10,842 rows, in each case including the header row. We next display the headers for the ios and android datasets, respectively.

In [53]:
print(ios[0])
print('\n')
print(android[0])

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


Here is the meaning of the columns for the ios dataset:

|Column name      |Description
|:---             |:---
|id                |Apple Id
|track_name        |App name
|size_bytes        |Size in bytes
|currency          |Currency type
|price             |Price amount
|rating_count_tot  |User rating counts, all versions
|rating_count_ver  |User rating counts, current version
|user_rating       |Average user rating, all versions
|user_rating_ver   |Average user rating value, current version
|ver               |Latest version code
|cont_rating       |Content rating
|prime_genre       |Primary genre
|sup_devices.num   |Number of supporting devices
|ipadSc_urls.num   |Number of screenshots shown for display
|lang.num          |Number of supported languages
|vpp_lic           |Vpp Device Based Licensing Enabled

Because we wish to study free profitable apps, the columns `price` and `rating_count_tot` are especially useful to our analysis (free or not, and how many people used the app). Naturally, we need to keep track of `prime_genre` and `cont_rating` as well, to see what kind of apps we are dealing with.

Similarly, for the android columns, we will be looking at the columns `Price`, `Installs`, `Genres` and `Content Rating`.
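
The rest of the notebook refers to these columns by position. As a small sketch, the positions can be looked up from the header rows printed above instead of counted by hand. The helper name `column_index` is an assumption made for illustration:

```python
# Header rows copied from the output printed above.
ios_header = ['id', 'track_name', 'size_bytes', 'currency', 'price',
              'rating_count_tot', 'rating_count_ver', 'user_rating',
              'user_rating_ver', 'ver', 'cont_rating', 'prime_genre',
              'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
android_header = ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs',
                  'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated',
                  'Current Ver', 'Android Ver']


def column_index(header, name):
    """Return the zero-based position of column `name` (hypothetical helper)."""
    return header.index(name)


print(column_index(ios_header, 'prime_genre'))   # 11
print(column_index(android_header, 'Installs'))  # 5
```

Looking up an index by name raises a `ValueError` on a misspelled column, which is easier to notice than a silently wrong integer.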


### Data Cleaning

We have been made aware that there is an error in the android dataset at or near row 10,472 (one of the column entries is reportedly missing in this row). Let's print the lengths of a few rows in this range. (Recall from above that the correct length is 13.)

In [54]:
for row in android[10471:10474]:
    print(len(row))

13
13
12


We see it is row 10,473 which contains the error! We delete this row from the data.

In [55]:
del android[10473]
print(len(android))

10841


This is indeed the number of rows that should remain after we delete one row. We also suspect that there are duplicate apps in some rows of this dataset. For example, here are all the rows for the app 'Instagram':

In [56]:
for row in android:
    if row[0] == 'Instagram':
        print(row)
        print('\n')

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']




How many rows in all contain duplicate apps?

In [57]:
duplicate = []
unique = []
for row in android:
    app = row[0]
    if app in unique:
        duplicate.append(app)
    else:
        unique.append(app)

print(len(duplicate), 'duplicate rows in all!')

1181 duplicate rows in all!


How to deal with this? The only significant entry that varies between rows for duplicate apps is 'Reviews' -- since we wish to focus on apps with the largest number of reviews (profitable), we will keep only the row with the highest number, and delete the rest. (The data were entered into the table at different times, and the row with the largest number for 'Reviews' should be the most recent entry.)
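
The duplicate count above tests membership in a list, which gets slower as the list grows. A sketch of the same count using a set, assuming a hypothetical function name `count_duplicate_rows`, might look like this:

```python
def count_duplicate_rows(rows, name_index=0):
    """Count rows whose app name already appeared in an earlier row."""
    seen = set()      # names met so far; set lookups take constant time on average
    duplicates = 0
    for row in rows:
        name = row[name_index]
        if name in seen:
            duplicates += 1
        else:
            seen.add(name)
    return duplicates
```

The counting rule is unchanged: the first row for each name is treated as the original, and every later row with that name counts as a duplicate.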

### More Data Cleaning: Removing Duplicates
After we remove duplicate app rows according to our plan, we expect the android dataset (without header) to contain 10840 - 1181 = 9,659 rows. To see which rows we will **not** remove, we create a dictionary with the apps as keys, and the maximum number of reviews we find as the values. For example, looking back at the 'Instagram' rows printed above, the dictionary will contain 'Instagram': `66577446` as its one and only element with 'Instagram' as key.

In [58]:
reviews_max = {}
#exclude header row below
for row in android[1:]:
    name = row[0] #app name
    n_reviews = int(row[3])
    if name in reviews_max and n_reviews > reviews_max[name]:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
        
print('Number of rows =', (len(reviews_max)))   

Number of rows = 9659


This dictionary is used to finally create a "clean" android list of lists with each app named exactly once; we expect 9,659 rows. The following code cell creates the new dataset `android_clean` by selecting an instance of the row from the android dataset which contains the max number of reviews, and ignoring *all other instances* of the row for that app in the android dataset.

In [59]:
android_clean = []
already_added = []

for row in android[1:]:
    name = row[0]
    n_reviews = int(row[3])
    if name not in already_added and n_reviews == reviews_max[name]:
        android_clean.append(row)
        already_added.append(name)

print('Number of rows in clean android dataset =', len(android_clean))

Number of rows in clean android dataset = 9659
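
The two passes above (build `reviews_max`, then filter) can also be folded into one. The following is only a sketch under the same rule, keeping one row per app with the highest review count. The function name `keep_most_reviewed` is an assumption:

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Return one row per app name: the first row holding that app's highest review count."""
    best = {}  # app name -> row with the most reviews seen so far
    for row in rows:
        name = row[name_index]
        n_reviews = int(row[reviews_index])
        # the strict '>' keeps the earlier row when two rows tie on reviews
        if name not in best or n_reviews > int(best[name][reviews_index]):
            best[name] = row
    return list(best.values())
```

One difference to be aware of: the result is ordered by each app's first appearance, while the two-pass version orders rows by where the winning row sits in the data.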


Let's also check if the ios data set (Apple store) contains any duplicates (two or more rows for the same app name).

In [60]:
names = []
duplicates = []
for row in ios[1:]:
    app_name= row[1]
    if app_name in names:
        duplicates.append(app_name)
    else:
        names.append(app_name)
        
print(duplicates)

['Mannequin Challenge', 'VR Roller Coaster']


There are only two duplicate apps here. We can print out the rows to see what is going on:

In [61]:
for row in ios[1:]:
    if row[1] == 'Mannequin Challenge' or row[1] == 'VR Roller Coaster':
        print(row)
        print()

['1173990889', 'Mannequin Challenge', '109705216', 'USD', '0.0', '668', '87', '3.0', '3.0', '1.4', '9+', 'Games', '37', '4', '1', '1']

['952877179', 'VR Roller Coaster', '169523200', 'USD', '0.0', '107', '102', '3.5', '3.5', '2.0.0', '4+', 'Games', '37', '5', '1', '1']

['1178454060', 'Mannequin Challenge', '59572224', 'USD', '0.0', '105', '58', '4.0', '4.5', '1.0.1', '4+', 'Games', '38', '5', '1', '1']

['1089824278', 'VR Roller Coaster', '240964608', 'USD', '0.0', '67', '44', '3.5', '4.0', '0.81', '4+', 'Games', '38', '0', '1', '1']



The two rows for 'Mannequin Challenge' contain different values for content ratings (4+ and 9+). Because it is unclear which row is correct, we will remove them from our data. After analyzing the rows for 'VR Roller Coaster', we decide to keep only the row with the greater number of user rating counts (6th entry in the row).

In [62]:
ios_clean=[]
for row in ios[1:]:
    if row[1] != 'Mannequin Challenge' and row[1] != 'VR Roller Coaster':
        ios_clean.append(row)
    elif row[1] == 'VR Roller Coaster' and row[5] == '107':
        ios_clean.append(row)

#check to see if we really removed 3 rows
print(len(ios[1:]), 'rows in original ios data set')
print(len(ios_clean), 'rows in clean ios data set')

7197 rows in original ios data set
7194 rows in clean ios data set


### Additional Data Cleaning: Keep Only English-Language Apps
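The letters of the English alphabet, the digits, and the usual punctuation marks all have character codes in the ASCII range 0 to 127; characters from other writing systems have codes greater than 127. We define a function that checks each character's code with `ord`.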


In [63]:
def english(any_string):
    for character in any_string:
        if ord(character) > 127:
            return False #one non-ASCII character is enough to reject the name
    return True

#Check to see if it works

print ('Instagram:' , english('Instagram'))
print()
print('爱奇艺PPS -《欢乐颂2》电视剧热播:', english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print()
print('Docs To Go™ Free Office Suite:', english('Docs To Go™ Free Office Suite'))
print()
print('Instachat 😜:', english('Instachat 😜'))

Instagram: True

爱奇艺PPS -《欢乐颂2》电视剧热播: False

Docs To Go™ Free Office Suite: False

Instachat 😜: False


We observe that the presence of certain symbols (such as ™ or an emoji) in an English-language app name makes the function claim the app is non-English. To get around this problem, we will allow the name of an app to contain up to three such unusual symbols; only a name with more than three is declared non-English.

In [64]:
def english(any_string):
    unusual=0
    for character in any_string:
        if ord(character) > 127:
            unusual += 1
    if unusual <= 3:
        return True
    else:
        return False
    
#check again to see if it works
print ('Instagram:' , english('Instagram'))
print()
print('爱奇艺PPS -《欢乐颂2》电视剧热播:', english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print()
print('Docs To Go™ Free Office Suite:', english('Docs To Go™ Free Office Suite'))
print()
print('Instachat 😜:', english('Instachat 😜'))

Instagram: True

爱奇艺PPS -《欢乐颂2》电视剧热播: False

Docs To Go™ Free Office Suite: True

Instachat 😜: True


Now we use this function to remove non-English apps from both lists. We print out the number of rows remaining in each list.

In [65]:
android_english = []
for row in android_clean:
    app_name=row[0] #the name of the android app is the first entry in the row
    if english(app_name):
        android_english.append(row)
print('English-language android apps: ', len(android_english))

ios_english = []
for row in ios_clean:
    app_name = row[1] #The name of the ios app is the second entry in the row
    if english(app_name):
        ios_english.append(row)
print('English-language iOS apps:', len(ios_english))

English-language android apps:  9614
English-language iOS apps: 6180
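
The threshold rule used by `english` can also be written with the limit as a parameter, which makes it easy to try a stricter or looser cutoff. This is a sketch, and the name `is_probably_english` and its `max_non_ascii` parameter are assumptions introduced for illustration:

```python
def is_probably_english(name, max_non_ascii=3):
    """Return True when `name` has at most `max_non_ascii` characters with code point > 127."""
    non_ascii = sum(1 for character in name if ord(character) > 127)
    return non_ascii <= max_non_ascii


print(is_probably_english('Docs To Go™ Free Office Suite'))     # True
print(is_probably_english('Instachat 😜', max_non_ascii=0))      # False
```

The heuristic remains imperfect in both directions: a non-English name written in Latin letters passes, and an English name with four or more emoji fails.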


Our last data-cleaning step is to remove all non-free apps from the data sets. We will be using these final data sets to answer our original questions (1) and (2). Question (3) will be set aside for a future project.

In [66]:
android_final = []
#the Type column ('Free' or 'Paid') of the android apps is in the 7th position of the row (index 6)
for row in android_english:
    if row[6] == 'Free':
        android_final.append(row)

ios_final = []
#the price of the ios apps appears in the 5th position of the row
for row in ios_english:
    if row[4] == '0.0':
        ios_final.append(row)

print('Free android apps we will analyze:', len(android_final))
print()
print('Free iOS apps we will analyze:', len(ios_final))  

Free android apps we will analyze: 8863

Free iOS apps we will analyze: 3219
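
The iOS filter above compares the price to the exact string `'0.0'`, so a value written as `'0'` or `'0.00'` would not match. A more tolerant sketch parses the price as a number first. The name `is_free` is hypothetical, and the handling of a leading `$` is an assumption about how paid prices may be written:

```python
def is_free(price_text):
    """Return True when `price_text` parses to a price of zero."""
    # Assumption: a price may carry a leading '$' (for example '$4.99').
    cleaned = price_text.strip().lstrip('$')
    return float(cleaned) == 0.0
```

For example, `is_free('0.0')` and `is_free('0')` both return `True`, while `is_free('$4.99')` returns `False`.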


## Analysis of the data: building lists by popularity

With these two suitable data sets in hand, we can pursue our main theme: finding the kinds of apps that attract the most users.

Our **mission** as developers is:
    
    - First build a basic Android version of an app.
    - Wait to see if it attracts a sizable number of users over 6 months.
    - In the event the app attracts enough users, we develop an iOS version
      of it and add it to the App Store.

Therefore, it is important that we find the kinds of apps (app profiles) that are successful or popular on *both* platforms. With that in mind, we focus on finding out what are the most common genres for each platform. In the ios data set, we will focus on "primary genre" (column 12, corresponding to index 11). In the android data set, we have two columns dealing with genres: "Genres" (column index 9) and "Category" (column index 1).  

We build frequency tables that reveal this information about app genres.


In [67]:
def freq_table(dataset, index): #dataset is a list of lists, index is an integer
    dict1={}

    for row in dataset:
        x = row[index]
        if x in dict1:
            dict1[x] += 1
        else:
            dict1[x]= 1
    
    return dict1

def freq_table_perc(dataset, index): #return freq table with percentage values
    dict1 = freq_table(dataset, index)
    dict2 = {}
    total = 0
    
    for y in dict1:
        total += dict1[y]
    
    for z in dict1:
        dict2[z] = 100*dict1[z]/total
    
    return dict2

In [68]:
# this function displays our frequency tables in descending order
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
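
As a sketch of an alternative, the standard library's `collections.Counter` can do the counting and the descending sort in one step. The function name `sorted_percentages` is an assumption, not part of the notebook:

```python
from collections import Counter


def sorted_percentages(rows, index):
    """Return (value, percentage) pairs for column `index`, most frequent first."""
    counts = Counter(row[index] for row in rows)
    total = sum(counts.values())
    # most_common() lists entries from the highest count to the lowest
    return [(value, 100 * count / total) for value, count in counts.most_common()]
```

Entries with equal counts come out in the order they were first seen, whereas `display_table` orders ties by name in reverse alphabetical order.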

We can now display the sorted frequency tables of the columns `prime_genre`, `Genres` and `Category`.

In [69]:
print('App store: prime_genre (primary genre): number of apps:')
print()
display_table(ios_final, 11)

App store: prime_genre (primary genre): number of apps:

Games : 1871
Entertainment : 254
Photo & Video : 160
Education : 118
Social Networking : 106
Shopping : 84
Utilities : 81
Sports : 69
Music : 66
Health & Fitness : 65
Productivity : 56
Lifestyle : 51
News : 43
Travel : 40
Finance : 36
Weather : 28
Food & Drink : 26
Reference : 18
Business : 17
Book : 14
Navigation : 6
Medical : 6
Catalogs : 4


In [70]:
print('Google Play: Genres: number of apps:')
print()
display_table(android_final, 9)

Google Play: Genres: number of apps:

Tools : 749
Entertainment : 538
Education : 474
Business : 406
Productivity : 345
Lifestyle : 345
Finance : 328
Medical : 314
Sports : 307
Personalization : 294
Communication : 288
Action : 275
Health & Fitness : 272
Photography : 261
News & Magazines : 248
Social : 236
Travel & Local : 206
Shopping : 199
Books & Reference : 190
Simulation : 181
Dating : 165
Arcade : 165
Video Players & Editors : 158
Casual : 155
Maps & Navigation : 124
Food & Drink : 110
Puzzle : 100
Racing : 88
Role Playing : 83
Libraries & Demo : 83
Auto & Vehicles : 82
Strategy : 80
House & Home : 73
Weather : 71
Events : 63
Adventure : 60
Comics : 54
Beauty : 53
Art & Design : 53
Parenting : 44
Card : 39
Casino : 38
Trivia : 37
Educational;Education : 35
Board : 34
Educational : 33
Education;Education : 30
Word : 23
Casual;Pretend Play : 21
Music : 18
Racing;Action & Adventure : 15
Puzzle;Brain Games : 15
Entertainment;Music & Video : 15
Casual;Brain Games : 12
Casual;Action &

In [71]:
print('Google Play: Category: number of apps:')
print()
display_table(android_final, 1)

Google Play: Category: number of apps:

FAMILY : 1703
GAME : 843
TOOLS : 750
BUSINESS : 406
LIFESTYLE : 346
PRODUCTIVITY : 345
FINANCE : 328
MEDICAL : 314
SPORTS : 303
PERSONALIZATION : 294
COMMUNICATION : 288
HEALTH_AND_FITNESS : 272
PHOTOGRAPHY : 261
NEWS_AND_MAGAZINES : 248
SOCIAL : 236
TRAVEL_AND_LOCAL : 207
SHOPPING : 199
BOOKS_AND_REFERENCE : 190
DATING : 165
VIDEO_PLAYERS : 158
MAPS_AND_NAVIGATION : 124
FOOD_AND_DRINK : 110
EDUCATION : 100
LIBRARIES_AND_DEMO : 83
AUTO_AND_VEHICLES : 82
ENTERTAINMENT : 78
HOUSE_AND_HOME : 73
WEATHER : 71
EVENTS : 63
PARENTING : 58
ART_AND_DESIGN : 57
COMICS : 55
BEAUTY : 53


Looking up at the `prime_genre` table, we observe:  

    - Games are by far the most popular genre, followed by Entertainment
    - The vast majority of the free apps fall into the first few of  
      the 23 genres
    - Most of the free apps in the table are for entertainment or social  
      purposes, while practical apps like finance, news, and education
      comprise a much smaller share

This is suggestive data, but we cannot immediately conclude that gaming and entertainment apps fit our criteria for having a large number of users. For example, we could conceive of a situation where most of the individual gaming and entertainment apps have a small number of users, but many of the education apps have very large numbers of users. In other words, a large number of apps for one genre doesn't imply that apps of that genre have a large number of users. 

Next, looking at the `Genres` and `Category` frequency tables above, we see:

    - The genres and categories are more evenly divided between practical
      and entertainment/fun-styled apps. 
    - There are numerous genres with a large number of apps, with no clear
      preference given to the type of genre. "Tools" has the greatest number
      of apps
    - Under categories, the winners are "Family" and "Game", but there are 
      many categories like "Tools" and "Business" which balance things out
      between practical and fun

This is useful information, but is not enough for us to recommend a specific app profile that will generate a large number of users. Our frequency tables do not clearly reveal which genres have the largest number of users, or even what is actually the most popular genre in the world of free apps. 

To determine which apps really have the greatest numbers of users, we should calculate the average number of installations for each app genre. In the Google Play data set, the number of installations for each app is under `Installs`, column index 5. The Apple Store data set does not record installs, so we take the total number of user ratings as a substitute figure; this is found in column index 5, `rating_count_tot`. Note that the cells below iterate over the de-duplicated lists `android_clean` and `ios_clean`, which still contain non-English and paid apps, so the averages describe all de-duplicated apps rather than only the free English-language apps in `android_final` and `ios_final`.

Starting with the Apple Store data set:

In [72]:
#calculate average number of user ratings per app genre on Apple Store
#prime_genre is index 11
genre_counts = freq_table(ios_clean, 11) #avoid naming a variable dict, which hides the built-in type
ourtable = []

for genre in genre_counts:
    
    total = 0
    len_genre = 0
    
    for row in ios_clean:
        genre_app = row[11]
        
        if genre_app == genre:
            total += float(row[5])
            len_genre += 1

    ourtable.append((int(total/len_genre),genre))
                     
#sort and print it out
print('Average number of ratings (proxy for average number of users) per genre')  
print('in Apple Store:')
print()
table_sorted = sorted(ourtable, reverse = True)
for entry in table_sorted:
    print(entry[1], ':', entry[0])

Average number of ratings (proxy for average number of users) per genre
in Apple Store:

Social Networking : 45498
Music : 28842
Reference : 22410
Weather : 22181
Shopping : 18615
Photo & Video : 14352
Travel : 14129
Sports : 14026
Food & Drink : 13938
Games : 13702
News : 13015
Navigation : 11853
Finance : 11047
Health & Fitness : 9913
Productivity : 8051
Entertainment : 7533
Utilities : 6863
Lifestyle : 6161
Book : 5125
Business : 4788
Education : 2239
Catalogs : 1732
Medical : 592


We saw above that "Games" dominated the number of apps in the App Store, but our latest table shows "Games" comes much further down the list when we look at average number of users per app across genres. 

Looking at the latest table, if we had to choose a free app to develop for the App Store that would attract the largest number of users, we would choose a social networking app. As a second choice, we would develop a free app in any of the genres of music, reference or weather. 

Finally, as developers we want to avoid the last ten or so genres at the bottom of the list. We would not immediately choose to develop free apps dealing with medical topics, catalogs, education, etc. 

But recall we are going to look at *both* the App Store and Google Play data sets to fulfill our mission. Thus, we need to generate similar tables for the `android_clean` (Google Play) data set. This involves converting the `Installs` values, which are strings such as '100,000+', into numbers such as 100000 by removing the commas and the plus sign. Otherwise, the program works just like the one used for the Apple Store. 

In [73]:
#calculate average number of installs per app Category on Google Play
#Category is index 1
category_counts = freq_table(android_clean, 1) #avoid naming a variable dict, which hides the built-in type
ourtable = []

for category in category_counts:
    
    total = 0
    len_category = 0
    
    for row in android_clean:
        category_app = row[1]
        
        if category_app == category:
            installs = row[5]
            #remove commas and plus signs from number of installs
            installs = installs.replace(',', '')
            installs = installs.replace('+', '')
            total += float(installs)
            len_category += 1

    ourtable.append((int(total/len_category),category))
                     
#sort and print it out
print('Average number of installs per Category on Google Play:')
print()
table_sorted = sorted(ourtable, reverse = True)
for entry in table_sorted:
    print(entry[1], ':', entry[0])

Average number of installs per Category on Google Play:

COMMUNICATION : 34934418
VIDEO_PLAYERS : 24030691
SOCIAL : 22961790
PHOTOGRAPHY : 16577037
PRODUCTIVITY : 15476180
TRAVEL_AND_LOCAL : 13218662
GAME : 11766908
TOOLS : 9774151
NEWS_AND_MAGAZINES : 9327628
ENTERTAINMENT : 8920750
BOOKS_AND_REFERENCE : 7504367
SHOPPING : 6932419
FAMILY : 4655437
WEATHER : 4570892
PERSONALIZATION : 4075783
SPORTS : 3964753
HEALTH_AND_FITNESS : 3951297
MAPS_AND_NAVIGATION : 3841846
FOOD_AND_DRINK : 1891060
ART_AND_DESIGN : 1856362
EDUCATION : 1711076
BUSINESS : 1651944
LIFESTYLE : 1365375
HOUSE_AND_HOME : 1331540
FINANCE : 1319851
DATING : 828971
COMICS : 803234
LIBRARIES_AND_DEMO : 630903
AUTO_AND_VEHICLES : 625061
PARENTING : 525351
BEAUTY : 513151
EVENTS : 249580
MEDICAL : 98997


Changing the index in the last program also gives us the table for the average number of installs per `Genres` value on Google Play.

In [74]:
#calculate average number of installs per app Genre on Google Play
#Genres column has index 9
genre_counts = freq_table(android_clean, 9) #avoid naming a variable dict, which hides the built-in type
ourtable = []

for genre in genre_counts:
    
    total = 0
    len_genre = 0
    
    for row in android_clean:
        genre_app = row[9]
        
        if genre_app == genre:
            installs = row[5]
            installs = installs.replace(',', '')
            installs = installs.replace('+', '')
            total += float(installs)
            len_genre += 1

    ourtable.append((int(total/len_genre),genre))
                     
print('Average number of installs per Genre on Google Play:')
print()
table_sorted = sorted(ourtable, reverse = True)
for entry in table_sorted[0:40]:
    print(entry[1], ':', entry[0])
print('etc.')

Average number of installs per Genre on Google Play:

Communication : 34934418
Video Players & Editors : 24030691
Social : 22961790
Adventure;Action & Adventure : 21230000
Arcade : 20325022
Casual : 18553845
Puzzle;Action & Adventure : 18366666
Educational;Action & Adventure : 17016666
Photography : 16577037
Productivity : 15476180
Racing : 15387333
Travel & Local : 13278840
Casual;Action & Adventure : 11923153
Action : 11648960
Tools;Education : 10000000
Role Playing;Brain Games : 10000000
Lifestyle;Pretend Play : 10000000
Casual;Music & Video : 10000000
Adventure;Education : 10000000
Tools : 9773879
Strategy : 9623027
News & Magazines : 9327628
Word : 9094458
Music : 8953710
Racing;Action & Adventure : 8266250
Puzzle;Brain Games : 8253529
Books & Reference : 7504367
Video Players & Editors;Music & Video : 7500000
Puzzle : 6995603
Shopping : 6932419
Entertainment;Music & Video : 6413333
Casual;Pretend Play : 5850440
Educational;Pretend Play : 5781930
Card;Action & Adventure : 5500000


## Conclusions

Looking at the last two tables, we see that apps in the categories of communication, video players, and social are at the top of both lists for android. But if we develop such an app for android, what are the chances it will be successful if we add it to the Apple App Store? Social apps are a good candidate for both platforms, because this genre appears at or near the top of all three lists. Video apps are also fairly successful as free apps on both platforms: notice that "Photo & Video" appears in 6th place (out of 23) in the Apple Store list above. 

To continue this reasoning in a systematic way, we visually inspect the three tables to find free apps that we might want to develop. First, we identify app categories that land in the top 10 list in **both** the Google Play Categories and Apple Store Genres tables. Results:  

    - Social 
    - Video Players
    - Photography
    - Travel
    - Game
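
The visual comparison behind this list can be sketched in code. The two top-10 lists are copied by hand from the tables above; the `category_to_genre` mapping is an assumption, since deciding which Google Play category corresponds to which App Store genre is a judgment call:

```python
# Top 10 rows of the two "average" tables above, copied by hand.
google_top10 = ['COMMUNICATION', 'VIDEO_PLAYERS', 'SOCIAL', 'PHOTOGRAPHY',
                'PRODUCTIVITY', 'TRAVEL_AND_LOCAL', 'GAME', 'TOOLS',
                'NEWS_AND_MAGAZINES', 'ENTERTAINMENT']
apple_top10 = ['Social Networking', 'Music', 'Reference', 'Weather', 'Shopping',
               'Photo & Video', 'Travel', 'Sports', 'Food & Drink', 'Games']

# Assumption: hand-written correspondence between the two stores' labels.
# Categories with no obvious App Store counterpart are left out.
category_to_genre = {
    'SOCIAL': 'Social Networking',
    'VIDEO_PLAYERS': 'Photo & Video',
    'PHOTOGRAPHY': 'Photo & Video',
    'TRAVEL_AND_LOCAL': 'Travel',
    'GAME': 'Games',
    'NEWS_AND_MAGAZINES': 'News',
    'ENTERTAINMENT': 'Entertainment',
    'PRODUCTIVITY': 'Productivity',
}

apple_set = set(apple_top10)
shared = [category for category in google_top10
          if category_to_genre.get(category) in apple_set]
print(shared)
# ['VIDEO_PLAYERS', 'SOCIAL', 'PHOTOGRAPHY', 'TRAVEL_AND_LOCAL', 'GAME']
```

Under this mapping the code reproduces the five categories listed above; a different mapping could give a different list.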

Because we analyzed the Google Play data in two ways, let's also see which kinds of apps land in the top 10 list in both the Google Play Genres list and Apple Store Genre tables. A little thought is involved to find overlapping genres, for example "Action And Adventure" would overlap with "Games." Results:  

    - Video Players & Editors
    - Social
    - Games
        - Adventure; Action & Adventure
        - Arcade
        - Puzzle; Action & Adventure

As developers of free profitable apps, we could develop a free app in any of the categories above with a good potential for success. There are 5 general categories in all, and all of them are "fun" apps rather than "reference" or other "dry" categories. One way to proceed would be to choose one of these 5 categories that we are knowledgeable about or interested in, and develop an associated app for the android platform. But precisely because these are popular kinds of apps, we will face competition from similar apps in the app stores. Therefore, we might be better off selecting the category from the list that has the fewest competitors.

We have already produced tables that list the number of apps available in each category, that is, how much competition an app might face. (Let's refer to these as competition tables, with the competition greatest at the top.) "Games" and "Social" appear very near the top of at least one competition table, so we will avoid these genres. That leaves "Video", "Photography" and "Travel" to think about. But "Photo & Video" appears in third place in the competition table for the Apple Store, so that may not be our most promising choice either. That leaves only "Travel", which appears about halfway down the competition tables for Google Category and Apple Genre. Moreover, "Travel" also appears fairly far down in the competition table for Google Genre. This is exactly what we want for less competition! Bingo! Our data analysis has homed in on "Travel."
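
The positions quoted for the Apple Store competition table can be checked with a short sketch. The counts are copied from the `prime_genre` frequency table above; the helper name `competition_rank` is an assumption:

```python
# Number of free English-language apps per App Store genre,
# copied from the prime_genre frequency table above.
apple_counts = {
    'Games': 1871, 'Entertainment': 254, 'Photo & Video': 160, 'Education': 118,
    'Social Networking': 106, 'Shopping': 84, 'Utilities': 81, 'Sports': 69,
    'Music': 66, 'Health & Fitness': 65, 'Productivity': 56, 'Lifestyle': 51,
    'News': 43, 'Travel': 40, 'Finance': 36, 'Weather': 28, 'Food & Drink': 26,
    'Reference': 18, 'Business': 17, 'Book': 14, 'Navigation': 6, 'Medical': 6,
    'Catalogs': 4,
}


def competition_rank(counts, name):
    """Return the 1-based position of `name` when genres are sorted by app count, largest first."""
    ordered = sorted(counts, key=counts.get, reverse=True)
    return ordered.index(name) + 1


for genre in ['Games', 'Photo & Video', 'Social Networking', 'Travel']:
    print(genre, ':', competition_rank(apple_counts, genre), 'of', len(apple_counts))
```

On these counts "Travel" sits at position 14 of 23, compared with 1 for "Games", 3 for "Photo & Video" and 5 for "Social Networking".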

In conclusion, a **free travel app** is a solid choice for both profitability and success on Google Play and the Apple App Store. 