# Investigating the Most Profitable Apps in the App Store and Google Play

Our aim in this project is to find mobile app profiles that are profitable for the App Store and Google Play markets. We will build our analysis on two Kaggle datasets.

For Android apps, we will use data from:
https://www.kaggle.com/lava18/google-play-store-apps

For iOS apps, we will use data from:
https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps


With so many free apps in the app stores, we want to find the most appealing ones: those that have captured the largest share of users. These apps will help us understand which kinds of apps have the greatest potential for in-app monetization, and they will also let us dig deeper into what makes them so popular.

Let's begin by opening and examining this data.

In [1]:
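#Create a CSV opener
from csv import reader

def open_csv(path):
    # Read the whole file into a list of rows (each row is a list of strings).
    # The with-block closes the file once the rows have been read.
    with open(path, encoding='utf8') as f:
        return list(reader(f))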

In [2]:
#Create an explorer function
def explore_data(dataset, start, end, rows_and_columns = False):
    ds = dataset[start:end]
    for row in ds:
        print(row,'\n')
    if rows_and_columns:
        print('Number of Rows :',len(dataset))
        print('Number of Columns :', len(dataset[0]),'\n')

In [3]:
#Open CSVs
ios = open_csv('AppleStore.csv')
andr = open_csv('googleplaystore.csv')

#Explore The And. and IOS App CSVs
explore_data(andr,0,2, True)
explore_data(ios,0,2, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'] 

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'] 

Number of Rows : 10842
Number of Columns : 13 

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'] 

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'] 

Number of Rows : 7198
Number of Columns : 16 



Column descriptions for the iOS dataset: https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps

In [4]:
#Identify Missing Cols
def missing_col(data):
    for index,row in enumerate(data):
        if len(row) < len(data[0]):
            print("At row : {} \n{}\nThe row has {} columns but the number of columns at the header is {}"
                  .format(index,row,len(row),len(data[0])))

In [5]:
#Find rows with missing columns
# (missing_col prints its findings and returns nothing, so we call it directly)
missing_col(ios)
missing_col(andr)

At row : 10473 
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
The row has 12 columns but the number of columns at the header is 13


In [6]:
#In android apps explore the row with a missing column value
andr[10473]

['Life Made WI-Fi Touchscreen Photo Frame',
 '1.9',
 '19',
 '3.0M',
 '1,000+',
 'Free',
 '0',
 'Everyone',
 '',
 'February 11, 2018',
 '1.0.19',
 '4.0 and up']

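This row is missing its Category value, which shifts every later value one column to the left. Judging by the app's name, it looks like a lifestyle app, so let's insert a category value manually. Note that the dataset writes its categories in upper case (for example 'LIFESTYLE'), so the mixed-case 'Lifestyle' inserted below shows up as its own one-app category in the later tables.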

In [7]:
#Insert the missing value in the column
andr[10473].insert(1,'Lifestyle')

In [8]:
#Check if it looks okay now
andr[10473]

['Life Made WI-Fi Touchscreen Photo Frame',
 'Lifestyle',
 '1.9',
 '19',
 '3.0M',
 '1,000+',
 'Free',
 '0',
 'Everyone',
 '',
 'February 11, 2018',
 '1.0.19',
 '4.0 and up']

Next, we want to check whether there are any duplicate apps in our datasets.

In [9]:
#Create duplicate finder function
def duplicate_data(data,ios=False):
    # The app name is in column 1 for iOS and in column 0 for Android
    name_index = 1 if ios else 0
    duplicate_apps=[]
    unique_apps=[]
    # Note: the header row is not skipped, so it is counted as one unique entry
    for app in data:
        name = app[name_index]
        if name in unique_apps:
            duplicate_apps.append(name)
        else:
            unique_apps.append(name)
    
    print('How many unique apps :',len(unique_apps)
          ,'\nHow many duplicate apps :',len(duplicate_apps)
          ,'\nSample of apps:\n',duplicate_apps[:5])

In [10]:
#Check for duplicate IOS apps
duplicate_data(ios,True)

How many unique apps : 7196 
How many duplicate apps : 2 
Sample of apps:
 ['Mannequin Challenge', 'VR Roller Coaster']


We can see that the iOS dataset contains 2 duplicate apps.

In [11]:
#Check for Duplicate Android Apps
duplicate_data(andr)

How many unique apps : 9661 
How many duplicate apps : 1181 
Sample of apps:
 ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings']


The Android dataset is much noisier: it appears to contain 1181 duplicate entries.
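As a cross-check on counts like these, the same tally can be sketched with `collections.Counter` from the standard library. The rows below are made-up toy data (not taken from either dataset), and `count_duplicates` is a hypothetical helper name, not part of the notebook.

```python
from collections import Counter

def count_duplicates(rows, name_index):
    """Return (number of unique names, number of extra duplicate rows)."""
    counts = Counter(row[name_index] for row in rows)
    n_unique = len(counts)
    # Every occurrence beyond the first counts as one duplicate row
    n_duplicates = sum(c - 1 for c in counts.values())
    return n_unique, n_duplicates

# Toy rows (hypothetical): name in column 0, review count in column 1
toy_rows = [
    ['App A', '10'],
    ['App B', '5'],
    ['App A', '12'],
    ['App A', '12'],
]

n_unique, n_duplicates = count_duplicates(toy_rows, 0)
print(n_unique, n_duplicates)
```

Because `Counter` uses hashing, this avoids the repeated list lookups of the loop-based version, which may matter on larger files.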

### Removing Duplicate Records

To remove these duplicate records we'll operate based on the assumption that the duplicate record with the highest number of reviews is the most recent entry and the most stable version released for the app.

In [12]:
#Create a function that creates a dictionary of all app names in a csv
# and stores the highest number of reviews that app has as its value.
def find_dups_dict(data,ios=False):
    # iOS: name in column 1, rating count in column 5
    # Android: name in column 0, review count in column 3
    name_index, reviews_index = (1, 5) if ios else (0, 3)
    reviews_max={}
    for app in data[1:]:  # skip the header row
        name=app[name_index]
        n_reviews=float(app[reviews_index])
        # Keep the largest review count seen so far for this name
        if name not in reviews_max or reviews_max[name] < n_reviews:
            reviews_max[name]=n_reviews
    
    return reviews_max

In [13]:
# Create a duplicate remover. The function runs through all app rows,
# finds the row where each app has its highest number of reviews
# (using the function created above) and appends that row to a new list.
# The new list has no duplicates,
# i.e. it keeps only the row with the highest number of reviews per app
def remove_duplicates(data,ios=False):
    clean=[]
    already_added=[]
    if ios:
        reviews_max = find_dups_dict(data,ios=True)
        for app in data[1:]:
            name=app[1]
            reviews=float(app[5])
            if (reviews == reviews_max[name]) and (name not in already_added):
                clean.append(app)
                already_added.append(name)           
    else:
        reviews_max = find_dups_dict(data)
        for app in data[1:]:
            name=app[0]
            reviews=float(app[3])
            if (reviews == reviews_max[name]) and (name not in already_added):
                clean.append(app)
                already_added.append(name)        
    clean.insert(0,data[0])
    return clean  

In [14]:
android_clean = remove_duplicates(andr)
ios_clean=remove_duplicates(ios,ios=True)

In [15]:
print(len(ios_clean))
print(len(android_clean))

7196
9661
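
The keep-the-highest-review-count rule can be checked on a few toy rows. This is a minimal sketch with made-up data; `keep_most_reviewed` is a hypothetical helper, not part of the notebook, and it assumes the name and review count sit in columns 0 and 1.

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=1):
    """Keep one row per name: the first row that has the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        reviews = float(row[reviews_index])
        # Strictly greater, so ties keep the earliest row
        if name not in best or reviews > float(best[name][reviews_index]):
            best[name] = row
    # Dicts preserve insertion order, so names come out in first-seen order
    return list(best.values())

# Toy rows (hypothetical): 'App A' appears three times
toy_rows = [
    ['App A', '10'],
    ['App B', '5'],
    ['App A', '12'],
    ['App A', '12'],
]

deduped = keep_most_reviewed(toy_rows)
print(deduped)
```

The row order here follows the first appearance of each name, which may differ from the order produced by `remove_duplicates`; the set of kept rows is what matters for the check.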


### Removing Non-English Apps

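We now want to remove all apps that are not in English. Let's start by constructing an English text checker. It scans a string and returns True unless the string contains more than three non-ASCII characters (characters whose code point is above 127). Allowing up to three such characters means that English app names containing an emoji or a symbol such as ™ are not thrown out.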

In [16]:
def english_text(string):
    counter=0
    for letter in string:
        if ord(letter) > 127:
            counter+=1    
    if counter>3:
        return False
    else:
        return True
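
To see how the threshold behaves, here is a small self-contained sketch that restates the check in compact form and runs it on a few names. The `max_non_ascii` parameter is an addition for illustration, and the last two sample strings are made up rather than taken from the datasets.

```python
def english_text(string, max_non_ascii=3):
    """Return False when the string has more than max_non_ascii non-ASCII characters."""
    non_ascii = sum(1 for letter in string if ord(letter) > 127)
    return non_ascii <= max_non_ascii

samples = [
    'Facebook',               # no non-ASCII characters
    'Wattpad 📖 Free Books',  # one emoji
    'Café ☕ Finder™',         # three non-ASCII characters (made-up name)
    'こんにちは世界',           # seven non-ASCII characters (made-up name)
]

results = {name: english_text(name) for name in samples}
for name, is_english in results.items():
    print(name, '->', is_english)
```

The check is only a heuristic: a short non-English name with three or fewer non-ASCII characters still passes, and a non-English name written entirely in ASCII is never caught.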

In [17]:
android_english = []
ios_english = []

for app in android_clean[1:]:
    name = app[0]
    if english_text(name):
        android_english.append(app)
android_english.insert(0,android_clean[0])
        
for app in ios_clean[1:]:
    name = app[1]
    if english_text(name):
        ios_english.append(app)
ios_english.insert(0,ios_clean[0])        

explore_data(android_english, 0, 2, True)
print('\n')
explore_data(ios_english, 0, 2, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'] 

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'] 

Number of Rows : 9616
Number of Columns : 13 



['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'] 

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'] 

Number of Rows : 6182
Number of Columns : 16 



After cleaning our data and removing non-English apps, we have 9615 Android and 6181 iOS apps left (the row counts printed above include the header row).

## Isolating Free Apps

Apps that use advertising as a revenue model are often free to use, in order to remove barriers to entry for potential customers. Our datasets contain both free and non-free apps. We'll clean our datasets further by isolating the apps that use a free-to-use model, as we're particularly interested in seeing what differentiates these apps in the in-app ad market.
 

In [18]:
android_final = []
ios_final = []

# Skip the header row while filtering, then add it back at the top
for app in android_english[1:]:
    price = app[7]
    if price == '0':
        android_final.append(app)
android_final.insert(0,android_english[0])

for app in ios_english[1:]:
    price = app[4]
    if price == '0.0':
        ios_final.append(app)
ios_final.insert(0,ios_english[0])

print(len(android_final))
print(len(ios_final))

8866
3221


We now have 8865 free Android apps and 3220 free iOS apps left (the lengths printed above include the header row).

## Most Common Apps by Genre

So far we've cleaned our datasets to isolate apps that have the following features:

1. Unique English apps in both the iOS App Store and the Google Play Store
2. Apps in the above stores that are free

We've done this because we want to identify which free English apps gain the most traction in both the Google Play and App Store markets. This information is useful for app developers: they can build an app, gauge the response to it on Google Play, and, if it proves profitable, roll it out to the App Store as well.

To build on our goal, we now want to get a sense of what are the most common genres of apps in both markets. 

We can start doing this by building a Frequency table in both datasets to count apps by their genre

### Build Frequency Table

In [19]:
explore_data(ios_final,0,1) #index 11 is prime_genre
explore_data(android_final,0,2) #index 9 is Genres, index 1 is Category

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'] 

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'] 

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'] 



In [20]:
def freq_table(dataset,index):
    table_entries={}
    table_counts = 0
    
    for row in dataset[1:]:
        table_counts += 1
        entry=row[index]
        if entry in table_entries:
            table_entries[entry]+=1
        else:
            table_entries[entry]=1
    
    table_freq={}
    for key in table_entries:
        table_freq[key]=round((table_entries[key]/table_counts)*100,2)
    
    return table_freq
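
To make the output of a frequency table concrete, here is a minimal sketch of the same percentage calculation using `collections.Counter`, run on made-up rows. `freq_table_counter` is a hypothetical name for an equivalent helper; like the function above, it assumes the first row is a header.

```python
from collections import Counter

def freq_table_counter(dataset, index):
    """Return {value: percentage of rows}, skipping the header row."""
    values = [row[index] for row in dataset[1:]]
    counts = Counter(values)
    total = len(values)
    return {key: round(count / total * 100, 2) for key, count in counts.items()}

# Toy data (hypothetical): header row plus four apps
toy_dataset = [
    ['name', 'genre'],
    ['App A', 'Games'],
    ['App B', 'Games'],
    ['App C', 'Music'],
    ['App D', 'Games'],
]

toy_freq = freq_table_counter(toy_dataset, 1)
print(toy_freq)
```

Three of the four toy apps are games, so 'Games' gets 75.0 and 'Music' gets 25.0.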

In [21]:
#Let's create a function that sorts these dictionary values in descending order
def display_table(dataset,index):
    table=freq_table(dataset,index)
    table_display=[]
    for key in table:
        key_val_as_tuple=(table[key],key)
        table_display.append(key_val_as_tuple)

    table_sorted=sorted(table_display,reverse=True)

    for entry in table_sorted:
        print(entry[1],':',entry[0],'%')


Let's begin by examining the prime_genre column of the App Store dataset.

In [22]:
display_table(ios_final,11) # IOS Genres

Games : 58.14 %
Entertainment : 7.89 %
Photo & Video : 4.97 %
Education : 3.66 %
Social Networking : 3.29 %
Shopping : 2.61 %
Utilities : 2.52 %
Sports : 2.14 %
Music : 2.05 %
Health & Fitness : 2.02 %
Productivity : 1.74 %
Lifestyle : 1.58 %
News : 1.34 %
Travel : 1.24 %
Finance : 1.12 %
Weather : 0.87 %
Food & Drink : 0.81 %
Reference : 0.56 %
Business : 0.53 %
Book : 0.43 %
Navigation : 0.19 %
Medical : 0.19 %
Catalogs : 0.12 %


We can see that among the free English apps, more than half (58.14%) are games. Entertainment apps are close to 8%, followed by photo and video apps, which are close to 5%. Only 3.66% of the apps are designed for education, followed by social networking apps, which account for 3.29% of the apps in our dataset.

The general impression is that the App Store (at least the part containing free English apps) is dominated by apps that are designed for fun (games, entertainment, photo and video, social networking, sports, music, etc.), while apps with practical purposes (education, shopping, utilities, productivity, lifestyle, etc.) are rarer. However, the fact that fun apps are the most numerous doesn't imply that they also have the greatest number of users; the demand might not be the same as the offer.

Let's continue by examining the Genres and Category columns of the Google Play data set (two columns which seem to be related).

In [23]:
display_table(android_final, 1) # Category

FAMILY : 18.91 %
GAME : 9.72 %
TOOLS : 8.46 %
BUSINESS : 4.59 %
LIFESTYLE : 3.9 %
PRODUCTIVITY : 3.89 %
FINANCE : 3.7 %
MEDICAL : 3.53 %
SPORTS : 3.4 %
PERSONALIZATION : 3.32 %
COMMUNICATION : 3.24 %
HEALTH_AND_FITNESS : 3.08 %
PHOTOGRAPHY : 2.94 %
NEWS_AND_MAGAZINES : 2.8 %
SOCIAL : 2.66 %
TRAVEL_AND_LOCAL : 2.34 %
SHOPPING : 2.24 %
BOOKS_AND_REFERENCE : 2.14 %
DATING : 1.86 %
VIDEO_PLAYERS : 1.79 %
MAPS_AND_NAVIGATION : 1.4 %
FOOD_AND_DRINK : 1.24 %
EDUCATION : 1.16 %
ENTERTAINMENT : 0.96 %
LIBRARIES_AND_DEMO : 0.94 %
AUTO_AND_VEHICLES : 0.92 %
HOUSE_AND_HOME : 0.82 %
WEATHER : 0.8 %
EVENTS : 0.71 %
PARENTING : 0.65 %
ART_AND_DESIGN : 0.64 %
COMICS : 0.62 %
BEAUTY : 0.6 %
Lifestyle : 0.01 %



The landscape seems significantly different on Google Play: there are not that many apps designed for fun, and it seems that a good number of apps are designed for practical purposes (family, tools, business, lifestyle, productivity, etc.). However, if we investigate this further, we can see that the family category (which accounts for almost 19% of the apps) means mostly games for kids.

Even so, practical apps seem to have a better representation on Google Play compared to App Store. This picture is also confirmed by the frequency table we see for the Genres column:

In [24]:
display_table(android_final,9) # Android Genre

Tools : 8.45 %
Entertainment : 6.07 %
Education : 5.35 %
Business : 4.59 %
Productivity : 3.89 %
Lifestyle : 3.89 %
Finance : 3.7 %
Medical : 3.53 %
Sports : 3.46 %
Personalization : 3.32 %
Communication : 3.24 %
Action : 3.1 %
Health & Fitness : 3.08 %
Photography : 2.94 %
News & Magazines : 2.8 %
Social : 2.66 %
Travel & Local : 2.32 %
Shopping : 2.24 %
Books & Reference : 2.14 %
Simulation : 2.04 %
Dating : 1.86 %
Arcade : 1.85 %
Video Players & Editors : 1.77 %
Casual : 1.76 %
Maps & Navigation : 1.4 %
Food & Drink : 1.24 %
Puzzle : 1.13 %
Racing : 0.99 %
Role Playing : 0.94 %
Libraries & Demo : 0.94 %
Auto & Vehicles : 0.92 %
Strategy : 0.91 %
House & Home : 0.82 %
Weather : 0.8 %
Events : 0.71 %
Adventure : 0.68 %
Comics : 0.61 %
Beauty : 0.6 %
Art & Design : 0.6 %
Parenting : 0.5 %
Card : 0.45 %
Casino : 0.43 %
Trivia : 0.42 %
Educational;Education : 0.39 %
Board : 0.38 %
Educational : 0.37 %
Education;Education : 0.34 %
Word : 0.26 %
Casual;Pretend Play : 0.24 %
Music : 0.2 %
...

The difference between the Genres and the Category columns is not crystal clear, but one thing we can notice is that the Genres column is much more granular (it has more categories). We're only looking for the bigger picture at the moment, so we'll only work with the Category column moving forward.

Up to this point, we have found that the free English-language apps in the App Store are dominated by apps designed for fun, while Google Play shows a more balanced landscape of both practical and for-fun apps. Now we'd like to get an idea of the kinds of apps that have the most users.

### Most Popular Apps by Genre

#### Apple's App Store
One way to find out which genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play dataset, we can find this information in the Installs column, but for the App Store dataset this information is missing. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the rating_count_tot column.

Below, we calculate the average number of user ratings per app genre on the App Store:

In [25]:
genre_lists=[]
genre_ios_dict=freq_table(ios_final,11)
for genre in genre_ios_dict:
    total=0
    len_genre=0
    for app in ios_final:
        genre_app=app[11]
        if genre_app==genre:
            number_of_ratings=float(app[5])
            total += number_of_ratings
            len_genre += 1
    avg=float(round(total/len_genre,1))
    tup=(avg,genre)
    genre_lists.append(tup)
genre_lists=sorted(genre_lists,reverse=True)

In [26]:
genre_lists

[(86090.3, 'Navigation'),
 (74942.1, 'Reference'),
 (71548.3, 'Social Networking'),
 (57326.5, 'Music'),
 (52279.9, 'Weather'),
 (39758.5, 'Book'),
 (33333.9, 'Food & Drink'),
 (31467.9, 'Finance'),
 (28441.5, 'Photo & Video'),
 (28243.8, 'Travel'),
 (26919.7, 'Shopping'),
 (23298.0, 'Health & Fitness'),
 (23008.9, 'Sports'),
 (22812.9, 'Games'),
 (21248.0, 'News'),
 (21028.4, 'Productivity'),
 (18684.5, 'Utilities'),
 (16485.8, 'Lifestyle'),
 (14029.8, 'Entertainment'),
 (7491.1, 'Business'),
 (7004.0, 'Education'),
 (4004.0, 'Catalogs'),
 (612.0, 'Medical')]
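
The nested loop above scans the whole dataset once per genre. As an alternative, the same averages can be accumulated in a single pass. This is a minimal sketch on made-up rows; `average_by_group` is a hypothetical helper and the toy values are not from the dataset.

```python
def average_by_group(rows, group_index, value_index):
    """Return [(average, group), ...] sorted by average, highest first."""
    totals = {}
    counts = {}
    for row in rows:
        group = row[group_index]
        # Accumulate the running sum and row count for each group
        totals[group] = totals.get(group, 0.0) + float(row[value_index])
        counts[group] = counts.get(group, 0) + 1
    averages = [(round(totals[g] / counts[g], 1), g) for g in totals]
    return sorted(averages, reverse=True)

# Toy rows (hypothetical): genre in column 0, rating count in column 1
toy_rows = [
    ['Games', '100'],
    ['Games', '300'],
    ['Music', '50'],
]

averages = average_by_group(toy_rows, 0, 1)
print(averages)
```

On the real data this would be called on the rows without the header, for example `average_by_group(ios_final[1:], 11, 5)`.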

On average, navigation apps have the highest number of user ratings. Let's investigate this genre a little more.

In [27]:
for app in ios_final:
    if app[11]=='Navigation':
        print(app[1],'-',app[5],' Ratings')

Waze - GPS Navigation, Maps & Real-time Traffic - 345046  Ratings
Google Maps - Navigation & Transit - 154911  Ratings
Geocaching® - 12811  Ratings
CoPilot GPS – Car Navigation & Offline Maps - 3582  Ratings
ImmobilienScout24: Real Estate Search in Germany - 187  Ratings
Railway Route Search - 5  Ratings


It appears that Navigation on the App Store is dominated by Waze and Google Maps. These two apps account for the bulk of the ratings in this genre and skew the average, so the figure doesn't give a very helpful picture of the kind of app we could develop ourselves. Let's turn instead to the next most popular genre, 'Reference'.
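
One way to quantify this skew is to compare the mean with the median, which is far less sensitive to a few very large values. The sketch below uses the six Navigation rating counts printed above; the variable names are illustrative.

```python
from statistics import mean, median

# Rating counts of the six Navigation apps listed in the output above
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

avg = round(mean(navigation_ratings), 1)
mid = median(navigation_ratings)
# Share of all Navigation ratings held by the two largest apps
top_two_share = round(sum(navigation_ratings[:2]) / sum(navigation_ratings) * 100, 1)

print('Mean   :', avg)
print('Median :', mid)
print('Top two share (%) :', top_two_share)
```

The mean (86090.3) is roughly ten times the median (8196.5), and the two largest apps hold about 96.8% of all ratings in the genre, which is why the genre average says little about a typical navigation app.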

In [28]:
for app in ios_final:
    if app[11]=='Reference':
        print(app[1],'-',app[5],' Ratings')

Bible - 985920  Ratings
Dictionary.com Dictionary & Thesaurus - 200047  Ratings
Dictionary.com Dictionary & Thesaurus for iPad - 54175  Ratings
Google Translate - 26786  Ratings
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran - 18418  Ratings
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition - 17588  Ratings
Merriam-Webster Dictionary - 16849  Ratings
Night Sky - 12122  Ratings
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) - 8535  Ratings
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools - 4693  Ratings
GUNS MODS for Minecraft PC Edition - Mods Tools - 1497  Ratings
Guides for Pokémon GO - Pokemon GO News and Cheats - 826  Ratings
WWDC - 762  Ratings
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free - 718  Ratings
VPN Express - 14  Ratings
Real Bike Traffic Rider Virtual Reality Glasses - 8  Ratings
教えて!goo - 0  Ratings
Jishokun-Japanese Engli

We're seeing more variation in this genre, which gives us a better sense of the kinds of reference apps that attract users. One idea is to take a popular book in the public domain and turn it into an app, adding features beyond the raw text of the book. These might include daily quotes from the book, an audio version of the book, quizzes about the book, etc. On top of that, we could embed a dictionary within the app, so users don't need to leave our app to look up words elsewhere.

Let's now turn our attention to the Google Play store to understand which genres of apps are most popular there.

#### Google's Play Store

For the Google Play market, we actually have data about the number of installs, so we should be able to get a clearer picture of genre popularity. Let's take a look at the breakdown of this column now.

In [29]:
display_table(android_final,5)

1,000,000+ : 15.72 %
100,000+ : 11.55 %
10,000,000+ : 10.55 %
10,000+ : 10.2 %
1,000+ : 8.4 %
100+ : 6.91 %
5,000,000+ : 6.82 %
500,000+ : 5.56 %
50,000+ : 4.77 %
5,000+ : 4.51 %
10+ : 3.54 %
500+ : 3.25 %
50,000,000+ : 2.3 %
100,000,000+ : 2.13 %
50+ : 1.92 %
5+ : 0.79 %
1+ : 0.51 %
500,000,000+ : 0.27 %
1,000,000,000+ : 0.23 %
0+ : 0.05 %
0 : 0.01 %


We can see that the data here is binned into ever-larger buckets, rather than given as raw values like the rating counts we used for the App Store analysis. We can still use this data to find popular genres. To get around the binning, we'll make the following assumption: an app with 100,000+ installs has 100,000 installs, an app with 1,000,000+ installs has 1,000,000 installs, and so on. To perform computations, however, we'll need to convert each install number from string to float. This means removing the commas and the plus characters; otherwise the conversion will fail and raise an error.
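
The conversion described above can be sketched as a small helper. `parse_installs` is a hypothetical name, and the sample strings are taken from the bins shown in the table above.

```python
def parse_installs(installs):
    """Convert a binned install string such as '1,000,000+' to its lower bound as a float."""
    # float('1,000,000+') would raise ValueError, so strip ',' and '+' first
    cleaned = installs.replace(',', '').replace('+', '')
    return float(cleaned)

samples = ['0', '0+', '100+', '100,000+', '1,000,000,000+']
parsed = [parse_installs(s) for s in samples]
print(parsed)
```

Since each value is the lower bound of its bin, averages computed from these numbers underestimate the true install counts, but they remain usable for comparing categories with one another.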

In [49]:
categories_android=freq_table(android_final,1)
categories_android

{'ART_AND_DESIGN': 0.64,
 'AUTO_AND_VEHICLES': 0.92,
 'BEAUTY': 0.6,
 'BOOKS_AND_REFERENCE': 2.14,
 'BUSINESS': 4.59,
 'COMICS': 0.62,
 'COMMUNICATION': 3.24,
 'DATING': 1.86,
 'EDUCATION': 1.16,
 'ENTERTAINMENT': 0.96,
 'EVENTS': 0.71,
 'FAMILY': 18.91,
 'FINANCE': 3.7,
 'FOOD_AND_DRINK': 1.24,
 'GAME': 9.72,
 'HEALTH_AND_FITNESS': 3.08,
 'HOUSE_AND_HOME': 0.82,
 'LIBRARIES_AND_DEMO': 0.94,
 'LIFESTYLE': 3.9,
 'Lifestyle': 0.01,
 'MAPS_AND_NAVIGATION': 1.4,
 'MEDICAL': 3.53,
 'NEWS_AND_MAGAZINES': 2.8,
 'PARENTING': 0.65,
 'PERSONALIZATION': 3.32,
 'PHOTOGRAPHY': 2.94,
 'PRODUCTIVITY': 3.89,
 'SHOPPING': 2.24,
 'SOCIAL': 2.66,
 'SPORTS': 3.4,
 'TOOLS': 8.46,
 'TRAVEL_AND_LOCAL': 2.34,
 'VIDEO_PLAYERS': 1.79,
 'WEATHER': 0.8}

In [55]:
genre_lists_andr=[]
for category in categories_android:
    total=0
    len_category=0
    for app in android_final:
        category_app=app[1]
        if category_app==category:
            n_installs=app[5]
            n_installs=n_installs.replace(',','')
            n_installs=n_installs.replace('+','')
            total+=float(n_installs)
            len_category+=1
    avg=round(total/len_category,1)
    tup=(avg,category)
    genre_lists_andr.append(tup)
genre_lists_andr=sorted(genre_lists_andr,reverse=True)
genre_lists_andr

[(38456119.2, 'COMMUNICATION'),
 (24727872.5, 'VIDEO_PLAYERS'),
 (23253652.1, 'SOCIAL'),
 (17840110.4, 'PHOTOGRAPHY'),
 (16787331.3, 'PRODUCTIVITY'),
 (15588015.6, 'GAME'),
 (13984077.7, 'TRAVEL_AND_LOCAL'),
 (11640705.9, 'ENTERTAINMENT'),
 (10801391.3, 'TOOLS'),
 (9549178.5, 'NEWS_AND_MAGAZINES'),
 (8767811.9, 'BOOKS_AND_REFERENCE'),
 (7036877.3, 'SHOPPING'),
 (5201482.6, 'PERSONALIZATION'),
 (5074486.2, 'WEATHER'),
 (4188822.0, 'HEALTH_AND_FITNESS'),
 (4056941.8, 'MAPS_AND_NAVIGATION'),
 (3695641.8, 'FAMILY'),
 (3638640.1, 'SPORTS'),
 (1986335.1, 'ART_AND_DESIGN'),
 (1924897.7, 'FOOD_AND_DRINK'),
 (1833495.1, 'EDUCATION'),
 (1712290.1, 'BUSINESS'),
 (1437816.3, 'LIFESTYLE'),
 (1387692.5, 'FINANCE'),
 (1331540.6, 'HOUSE_AND_HOME'),
 (854028.8, 'DATING'),
 (817657.3, 'COMICS'),
 (647317.8, 'AUTO_AND_VEHICLES'),
 (638503.7, 'LIBRARIES_AND_DEMO'),
 (542603.6, 'PARENTING'),
 (513151.9, 'BEAUTY'),
 (253542.2, 'EVENTS'),
 (120550.6, 'MEDICAL'),
 (1000.0, 'Lifestyle')]

On average, communication apps have the most installs: 38,456,119. Let's examine the apps in this category to check whether we are again looking at a highly skewed category, as we saw in the App Store. This time, let's isolate the apps that have at least 100,000,000 installs.

In [69]:

# List communication apps with at least 100,000,000 installs
for app in android_final:
    if app[1] == 'COMMUNICATION' and (float((app[5].replace('+','').replace(',',''))) >= 100000000):
        print(app[0],'-',app[5],' Installs')

WhatsApp Messenger - 1,000,000,000+  Installs
imo beta free calls and text - 100,000,000+  Installs
Android Messages - 100,000,000+  Installs
Google Duo - High Quality Video Calls - 500,000,000+  Installs
Messenger – Text and Video Chat for Free - 1,000,000,000+  Installs
imo free video calls and chat - 500,000,000+  Installs
Skype - free IM & video calls - 1,000,000,000+  Installs
Who - 100,000,000+  Installs
GO SMS Pro - Messenger, Free Themes, Emoji - 100,000,000+  Installs
LINE: Free Calls & Messages - 500,000,000+  Installs
Google Chrome: Fast & Secure - 1,000,000,000+  Installs
Firefox Browser fast & private - 100,000,000+  Installs
UC Browser - Fast Download Private & Secure - 500,000,000+  Installs
Gmail - 1,000,000,000+  Installs
Hangouts - 1,000,000,000+  Installs
Messenger Lite: Free Calls & Messages - 100,000,000+  Installs
Kik - 100,000,000+  Installs
KakaoTalk: Free Calls & Text - 100,000,000+  Installs
Opera Mini - fast web browser - 100,000,000+  Installs
Opera Browser: Fast and Secure - 

We can see that even in this category we're seeing highly skewed data. This is true for the next couple of categories. Let's turn to the Books and Reference Category in the Play Store to see if we can find a similar breakdown to what we saw in the App Store.

In [70]:
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
...

The book and reference genre includes a variety of apps: software for processing and reading ebooks, various collections of libraries, dictionaries, tutorials on programming or languages, etc. It seems there's still a small number of extremely popular apps that skew the average:

In [74]:
# List books and reference apps with at least 100,000,000 installs
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE' and (float((app[5].replace('+','').replace(',',''))) >= 100000000):
        print(app[0],'-',app[5],' Installs')

Google Play Books - 1,000,000,000+  Installs
Bible - 100,000,000+  Installs
Amazon Kindle - 100,000,000+  Installs
Wattpad 📖 Free Books - 100,000,000+  Installs
Audiobooks from Audible - 100,000,000+  Installs


However, it looks like there are only a few very popular apps, so this market still shows potential. Let's try to get some app ideas based on the kind of apps that are somewhere in the middle in terms of popularity (between 1,000,000 and 100,000,000 downloads):

In [75]:
# List books and reference apps with 1,000,000 to 100,000,000 installs (inclusive)
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE' and (1000000 <= float((app[5].replace('+','').replace(',',''))) <= 100000000):
        print(app[0],'-',app[5],' Installs')

Wikipedia - 10,000,000+  Installs
Cool Reader - 10,000,000+  Installs
Book store - 1,000,000+  Installs
FBReader: Favorite Book Reader - 10,000,000+  Installs
Free Books - Spirit Fanfiction and Stories - 1,000,000+  Installs
AlReader -any text book reader - 5,000,000+  Installs
FamilySearch Tree - 1,000,000+  Installs
Cloud of Books - 1,000,000+  Installs
ReadEra – free ebook reader - 1,000,000+  Installs
Ebook Reader - 5,000,000+  Installs
Read books online - 5,000,000+  Installs
eBoox: book reader fb2 epub zip - 1,000,000+  Installs
All Maths Formulas - 1,000,000+  Installs
Ancestry - 5,000,000+  Installs
HTC Help - 10,000,000+  Installs
Moon+ Reader - 10,000,000+  Installs
English-Myanmar Dictionary - 1,000,000+  Installs
Golden Dictionary (EN-AR) - 1,000,000+  Installs
All Language Translator Free - 1,000,000+  Installs
Bible - 100,000,000+  Installs
Amazon Kindle - 100,000,000+  Installs
Aldiko Book Reader - 10,000,000+  Installs
Wattpad 📖 Free Books - 100,000,000+  Installs
Dictionary - WordWeb - 5,000


This niche seems to be dominated by software for processing and reading ebooks, as well as various collections of libraries and dictionaries, so it's probably not a good idea to build similar apps since there'll be some significant competition.

We also notice there are quite a few apps built around the Quran, as well as many dictionaries and reference libraries. This suggests that building an app around a popular book can be a profitable venture for both the Google Play and the App Store markets. An opportunity exists for an app developer to pick a popular book (perhaps a classic novel in the public domain that's free of any copyright restrictions) and turn it into an app. This app might include features like:

- Daily quotes from the book
- An audio version of the book
- Quizzes on the book
- A forum where people can discuss the book

### Conclusions
In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.

We concluded that taking a popular book (perhaps a classic novel in the public domain) and turning it into an app could be profitable for both the Google Play and the App Store markets. This app might include features like daily quotes or audio snippets from the book, in addition to quizzes and a forum where people can discuss the book.