# Profitable App Profiles for the App Store and Google Play Markets

What is the project about?
* The goal is to find mobile app profiles that are profitable for the App Store and Google Play markets. We're working as data analysts for a company that builds Android and iOS mobile apps, and our job is to enable our team of developers to make data-driven decisions with respect to the kind of apps they build.

At our company, we only build apps that are free to download and install, and our main source of revenue consists of in-app ads. This means that our revenue for any given app is mostly influenced by the number of users that use our app. Our goal for this project is to analyze data to help our developers understand what kinds of apps are likely to attract more users.

# Opening and Exploring the Data

As of September 2018, there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play.

Collecting data for over four million apps requires a significant amount of time and money, so we'll try to analyze a sample of data instead. To avoid spending resources on collecting new data ourselves, we should first see whether we can find any relevant existing data at no cost. Luckily, there are two data sets that seem suitable for our purpose:
* [A data set](https://www.kaggle.com/lava18/google-play-store-apps/home) containing data about approximately ten thousand Android apps from Google Play
* [A data set](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home) containing data about approximately seven thousand iOS apps from the App Store.

Let's start by opening the two data sets and then continue with exploring the data.

In [1]:
from csv import reader

### The Google Play data set ###
# An explicit encoding keeps non-ASCII app names readable on any platform.
open_file=open('googleplaystore.csv', encoding='utf8')
read_file=reader(open_file)
android=list(read_file)
android_header=android[0]
android=android[1:]

### The App Store data set ###
open_file=open('AppleStore.csv', encoding='utf8')
read_file=reader(open_file)
ios=list(read_file)
ios_header=ios[0]
ios=ios[1:]
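
The two loading steps above follow the same pattern, so they could be wrapped in a helper. The following is a minimal sketch, not part of the original notebook: `split_header` and `load_dataset` are hypothetical names, and the `utf8` encoding is an assumption about how the CSV files were saved.

```python
from csv import reader


def split_header(lines):
    """Parse CSV text lines and return (header, rows)."""
    rows = list(reader(lines))
    return rows[0], rows[1:]


def load_dataset(path, encoding='utf8'):
    """Open a CSV file, parse it, and close the file when done."""
    # newline='' is what the csv module documentation recommends for files.
    with open(path, encoding=encoding, newline='') as csv_file:
        return split_header(csv_file)


# Small in-memory example, so the sketch runs without the data files.
sample_header, sample_rows = split_header(['App,Price', 'Instagram,0'])
print(sample_header)
print(sample_rows)
```

With the data files in place, a call such as `android_header, android = load_dataset('googleplaystore.csv')` would produce the same two variables as the cell above.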


To make it easier to analyze the data sets, we create a function named **explore_data** as follows:

In [2]:
def explore_data(dataset,start,end,rows_and_columns=False):
    data_slice=dataset[start:end]
    for row in data_slice:
        print(row)
        print('\n')
        
    if rows_and_columns:
        print('Numbers of rows:',len(dataset))
        print('Numbers of columns',len(dataset[0]))


Then we print the first few rows of each data set.

In [3]:
print(android_header)
print('\n')
explore_data(android,0,4,True)



['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Numbers of rows: 10841
Numbers of columns 13


We see that the Google Play data set has 10841 apps and 13 columns. At a quick glance, the columns that might be useful for the purpose of our analysis are 'App', 'Category', 'Reviews', 'Installs', 'Type', 'Price', and 'Genres'.

Now let's take a look at the App Store data set.

In [4]:
print(ios_header)
print('\n')
explore_data(ios,0,4,True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


Numbers of rows: 7197
Numbers of columns 16


We have 7197 iOS apps in this data set, and the columns that seem interesting are: 'track_name', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', and 'prime_genre'.

# Deleting Wrong Data

The Google Play data set has a dedicated [discussion section](https://www.kaggle.com/lava18/google-play-store-apps/discussion), and [one of the discussions](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015) describes an error in a certain row.

In [5]:
print(android[10472])
print('\n')
print(android_header)

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


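Row 10472 corresponds to the app "Life Made WI-Fi Touchscreen Photo Frame". Its rating appears as 19, which is clearly wrong because the maximum rating for a Google Play app is 5. The row is missing its Category value, so every later value is shifted one column to the left. We therefore delete this row.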

In [6]:
print(len(android))
del android[10472]
print(len(android))

10841
10840
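
The faulty row printed above has 12 values while the header has 13, so rows like it can also be found by comparing lengths rather than relying on a forum report. The following is a minimal sketch of that check; `find_malformed_rows` is a hypothetical helper and the toy rows are made up.

```python
def find_malformed_rows(header, rows):
    """Return (index, row) pairs whose column count differs from the header."""
    expected = len(header)
    return [(index, row) for index, row in enumerate(rows) if len(row) != expected]


# Toy data: the second row is missing one value.
toy_header = ['App', 'Category', 'Rating']
toy_rows = [
    ['App A', 'ART_AND_DESIGN', '4.5'],
    ['App B', '1.9'],
]
for index, row in find_malformed_rows(toy_header, toy_rows):
    print(index, row)
```

Running `find_malformed_rows(android_header, android)` before the deletion would be one way to confirm whether any other rows have the same problem.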


# Removing Duplicate Entries

## Part 1

There are some duplicate entries in the Google Play data set. For instance, the following code shows that the Instagram app has four entries.

In [7]:
for app in android:
    name=app[0]
    if name=='Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


Then we will use the following code to count the number of duplicates:

In [8]:
duplicate_apps=[]
unique_apps=[]

for apps in android:
    name=apps[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
    
print('Number of duplicate apps:',len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:',duplicate_apps[:15])

Number of duplicate apps: 1181


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


How should we deal with duplicate entries? One option is to remove duplicate rows randomly, but we can probably find a better way.

If you examine the rows we printed for the Instagram app, the main difference is in the fourth position of each row, which corresponds to the number of reviews. The different numbers show that the data was collected at different times. We can use this information to build a criterion for removing the duplicates. The higher the number of reviews, the more recent the data should be. Rather than removing duplicates randomly, we'll only keep the row with the highest number of reviews and remove the other entries for any given app.

To do that, we will:
1. Create a dictionary where each key is a unique app name and the value is the highest number of reviews for that app.
2. Use the dictionary to create a new data set, which will have only one entry per app.



## Part 2

In [9]:
reviews_max={}
for app in android:
    name=app[0]
    n_reviews=float(app[3])
    if name in reviews_max and reviews_max[name]<n_reviews:
        reviews_max[name]=n_reviews
    elif name not in reviews_max:
        reviews_max[name]=n_reviews


In a previous code cell, we found that there are 1,181 cases where an app occurs more than once, so the length of our dictionary should be equal to the difference between the length of our data set and 1,181.

In [10]:
print('Expected length:',len(android)-1181)
print('Actual length:',len(reviews_max))

Expected length: 9659
Actual length: 9659


Now we use the dictionary to remove the duplicate rows:
* Start by creating two empty lists: *android_clean* (which will store our new cleaned data set) and *already_added* (which will just store app names).

* We loop through the android data set, and for every iteration:
  * We isolate the name of the app and the number of reviews.
  * We add the current row (app) to the android_clean list, and the app name (name) to the already_added list if:
    * The number of reviews of the current app matches the number of reviews of that app as described in the reviews_max dictionary; and
    * The name of the app is not already in the already_added list. We need to add this supplementary condition to account for those cases where the highest number of reviews of a duplicate app is the same for more than one entry (for example, the Box app has three entries, and the number of reviews is the same). If we just check for reviews_max[name] == n_reviews, we'll still end up with duplicate entries for some apps.

In [11]:
android_clean=[]
already_added=[]

for app in android:
    name=app[0]
    n_reviews=float(app[3])
    if (n_reviews ==reviews_max[name]) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)
        

Now let's quickly explore the new data set and confirm that the number of rows is 9659.

In [12]:
explore_data(android_clean,0,3,True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Numbers of rows: 9659
Numbers of columns 13


We have 9659 rows, as we expected.
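
The same criterion can also be applied in a single pass by storing the best row for each app name in a dictionary. This is a sketch of an alternative, not the method used above; `keep_most_reviewed` is a hypothetical name and the toy rows are made up.

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row with the highest review count."""
    best_rows = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Replace the stored row only when this one has strictly more reviews,
        # so ties keep the entry that appears first.
        if name not in best_rows or n_reviews > float(best_rows[name][reviews_index]):
            best_rows[name] = row
    return list(best_rows.values())


# Toy rows shaped as [name, category, rating, reviews].
toy_rows = [
    ['App A', 'SOCIAL', '4.5', '10'],
    ['App B', 'BUSINESS', '4.2', '5'],
    ['App A', 'SOCIAL', '4.5', '20'],
    ['App B', 'BUSINESS', '4.2', '5'],
]
deduplicated = keep_most_reviewed(toy_rows)
print(len(deduplicated))
```

One difference to keep in mind: this version returns rows in the order in which each app name first appears, so the row order may differ from android_clean even though the same rows are kept.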

# Removing Non-English Apps

## Part 1

If you explore the data sets long enough, you will find that some apps are not directed toward an English-speaking audience. Below are a couple of examples:

In [13]:
print(android_clean[4412][0])
print(android_clean[7940][0])

中国語 AQリスニング
لعبة تقدر تربح DZ


We're not interested in keeping these kinds of apps, so we'll remove them.

The characters commonly used in English text are all encoded using the ASCII standard. Each ASCII character has a corresponding number between 0 and 127, and we can take advantage of that to build a function that checks an app name and tells us whether it contains non-ASCII characters.

We built the following function with the built-in ord() function to identify English app names.

In [14]:
def is_English(string):
    for char in string:
        if ord(char) > 127:
            return False
    return True

print(is_English('Facebook'))
print(is_English('爱奇艺PPS'))

True
False


The function seems to work fine, but some English app names use emojis or other symbols (™, em dash, en dash, etc.) that fall outside of the ASCII range. Because of this, we'll remove useful apps if we use the function in its current form.

In [15]:
print(is_English('Docs To Go™ Free Office Suite'))
print(is_English('Instachat 😜'))

print(ord('™'))
print(ord('😜'))

False
False
8482
128540


## Part 2

To minimize the impact of data loss, we'll only remove an app if its name has three or more characters that fall outside the ASCII range. Our filter function is still not perfect, but it should be fairly effective.

In [16]:
def is_English(string):
    non_asc =0
    for character in string:
        if ord(character) > 127:
            non_asc += 1
            
    if non_asc>=3:
        return False
    else:
        return True

print(is_English('Docs To Go™ Free Office Suite'))
print(is_English('Instachat 😜'))

True
True
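
The counting logic in the cell above can be written more compactly, with the cutoff exposed as a parameter. This is a minimal sketch: `is_english` and `max_non_ascii` are hypothetical names, and the default of 2 is chosen so that a name is rejected once it contains three non-ASCII characters, matching the cell above.

```python
def is_english(name, max_non_ascii=2):
    """Return True when at most `max_non_ascii` characters fall outside ASCII."""
    non_ascii_count = sum(1 for character in name if ord(character) > 127)
    return non_ascii_count <= max_non_ascii


print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))
print(is_english('爱奇艺PPS'))
```

Making the threshold a parameter makes it easy to see how many apps each choice would keep before settling on one.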


Below, we use the function above to filter out the non-English apps.

In [17]:
android_english=[]
ios_english=[]

for app in android_clean:
    name=app[0]
    if is_English(name):
        android_english.append(app)
        
for app in ios:
    name=app[1]
    if is_English(name):
        ios_english.append(app)
        
explore_data(android_english,0,3,True)
print('\n')
explore_data(ios_english,0,3,True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Numbers of rows: 9597
Numbers of columns 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+',

We can see that we're left with 9597 Android apps and 6155 iOS apps.

# Isolating the Free Apps

As we mentioned in the introduction, we only build apps that are free to download and install. Our data sets contain both free and non-free apps; we'll need to isolate only the free apps for our analysis.

In [18]:
android_final=[]
ios_final=[]

for app in android_english:
    price=app[7]
    if price == '0':
        android_final.append(app)
        
for app in ios_english:
    price = app[4]
    if price == '0.0':
        ios_final.append(app)
        
print(len(android_final))
print(len(ios_final))

8848
3203


We're left with 8848 Android apps and 3203 iOS apps for the rest of our analysis.
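
The cell above compares price strings directly ('0' for Google Play, '0.0' for the App Store), which works because each data set writes free prices in one fixed way. A numeric comparison depends less on that formatting. The sketch below uses a hypothetical `is_free` helper and assumes that a paid price may carry a leading `$`.

```python
def is_free(price_text):
    """Return True when a price string represents zero."""
    # Assumption: a paid price may carry a leading '$', which float() rejects.
    cleaned = price_text.strip().lstrip('$')
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Unparseable prices are treated as not free rather than guessed at.
        return False


print(is_free('0'), is_free('0.0'), is_free('$4.99'), is_free('Everyone'))
```

Treating unparseable values as not free is a deliberate choice here: it keeps a malformed row from slipping into the free-app subset.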

# Most Common Apps by Genre

## Part 1

So far, we have spent a good amount of time cleaning the data. We:
* Removed inaccurate data
* Removed duplicate app entries
* Removed non-English apps
* Isolated the free apps

As we mentioned in the introduction, our aim is to determine the kinds of apps that are likely to attract more users because our revenue is highly influenced by the number of people using our apps.

To minimize risks and overhead, our validation strategy for an app idea is comprised of three steps:
1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we develop it further.
3. If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

Because our end goal is to add the app on both Google Play and the App Store, we need to find app profiles that are successful on both markets.

Let's begin the analysis by getting a sense of what are the most common genres for each market. For this, we'll need to build frequency tables for a few columns in our data sets.


## Part 2

We'll build two functions we can use to analyze the frequency tables:
* One function to generate frequency tables that show percentages
* Another function we can use to display the percentages in a descending order

To do that, we'll need to make use of the built-in sorted() function. The sorted() function doesn't work too well with dictionaries because it only considers and returns the dictionary keys.

However, the sorted() function works well if we transform the dictionary into a list of tuples, where each tuple contains a dictionary key along with its corresponding dictionary value. To ensure the sorting works right, the dictionary value comes first, and the dictionary key comes second.
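
The behavior described above can be seen on a small example. This sketch uses a toy frequency table with made-up percentages.

```python
# Toy frequency table with made-up percentages.
toy_table = {'Games': 58.0, 'Education': 4.0, 'Entertainment': 8.0}

# Sorting the dictionary itself only sorts and returns its keys.
keys_only = sorted(toy_table)
print(keys_only)

# Building (value, key) tuples lets sorted() order the entries by value.
as_tuples = [(toy_table[key], key) for key in toy_table]
by_value = sorted(as_tuples, reverse=True)
print(by_value)
```

An equivalent option is `sorted(toy_table.items(), key=lambda item: item[1], reverse=True)`, which keeps each pair in (key, value) order.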

Using the workaround above, we wrote a helper function named display_table(). The display_table() function you see below:
* Takes in two parameters: dataset and index. dataset is expected to be a list of lists, and index is expected to be an integer.
* Generates a frequency table using the freq_table() function (defined in the same cell).
* Transforms the frequency table into a list of tuples, then sorts the list in a descending order.
* Prints the entries of the frequency table in descending order.

In [19]:
def freq_table(dataset,index):
    table={}
    total=0
    
    for row in dataset:
        value=row[index]
        total+=1
        if value in table:
            table[value]+=1
        else:
            table[value]=1
            
    table_percentages={}
    for key in table:
        percentage=(table[key]/total)*100
        table_percentages[key]=percentage
        
    return table_percentages

def display_table(dataset,index):
    table= freq_table(dataset,index)
    table_display=[]
    for key in table:
        key_val_as_tuple=(table[key],key)
        table_display.append(key_val_as_tuple)
        
    table_sorted=sorted(table_display,reverse=True)
    for entry in table_sorted:
        print(entry[1],':',entry[0])
        
        
        


## Part 3

Now we analyze the frequency table we generated for the prime_genre column of the App Store data set.

In [20]:
display_table(ios_final,11)

Games : 58.25788323446769
Entertainment : 7.836403371838902
Photo & Video : 4.995316890415236
Education : 3.6840462066812365
Social Networking : 3.3093974399000934
Shopping : 2.5913206369029034
Utilities : 2.466437714642523
Sports : 2.1542304089915705
Music : 2.0605682172962845
Health & Fitness : 2.0293474867311896
Productivity : 1.7483609116453322
Lifestyle : 1.5610365282547611
News : 1.3424914142990947
Travel : 1.248829222603809
Finance : 1.0927255697783327
Weather : 0.8741804558226661
Food & Drink : 0.8117389946924758
Reference : 0.5307524196066188
Business : 0.5307524196066188
Book : 0.3746487667811427
Navigation : 0.18732438339057134
Medical : 0.18732438339057134
Catalogs : 0.1248829222603809


From the results we can see that Games is the most common genre: more than half of the free English apps are in that genre. Entertainment follows with only about 7.8%, and Photo & Video comes next with nearly 5%.

The general impression is that most of the apps are designed for entertainment (games, photo and video, social networking, sports, music), while apps designed for practical purposes (education, shopping, utilities, productivity, lifestyle) are less common.

Let's continue to examine the apps in Google Play dataset.

In [21]:
display_table(android_final,1)

FAMILY : 18.942133815551536
GAME : 9.697106690777577
TOOLS : 8.453887884267631
BUSINESS : 4.599909584086799
PRODUCTIVITY : 3.899186256781193
LIFESTYLE : 3.887884267631103
FINANCE : 3.7070524412296564
MEDICAL : 3.5375226039783
SPORTS : 3.390596745027125
PERSONALIZATION : 3.322784810126582
COMMUNICATION : 3.2323688969258586
HEALTH_AND_FITNESS : 3.0854430379746836
PHOTOGRAPHY : 2.949819168173599
NEWS_AND_MAGAZINES : 2.802893309222423
SOCIAL : 2.667269439421338
TRAVEL_AND_LOCAL : 2.3395117540687163
SHOPPING : 2.2490958408679926
BOOKS_AND_REFERENCE : 2.1360759493670884
DATING : 1.8648282097649187
VIDEO_PLAYERS : 1.7970162748643763
MAPS_AND_NAVIGATION : 1.3901446654611211
FOOD_AND_DRINK : 1.2432188065099457
EDUCATION : 1.164104882459313
ENTERTAINMENT : 0.9606690777576853
LIBRARIES_AND_DEMO : 0.9380650994575045
AUTO_AND_VEHICLES : 0.9267631103074141
HOUSE_AND_HOME : 0.8024412296564195
WEATHER : 0.7911392405063291
EVENTS : 0.7120253164556962
PARENTING : 0.6555153707052441
ART_AND_DESIGN : 0.64

As we can see, the most common category is FAMILY, which accounts for almost 19% of the apps.

The landscape seems significantly different on Google Play: there are not that many apps designed for fun, and it seems that a good number of apps are designed for practical purposes (family, tools, business, lifestyle, productivity, etc.).

In addition, practical apps seem to have a better representation on Google Play than on the App Store. This picture is also confirmed by the frequency table we see for the Genres column:

In [22]:
display_table(android_final, -4)

Tools : 8.44258589511754
Entertainment : 6.080470162748644
Education : 5.357142857142857
Business : 4.599909584086799
Productivity : 3.899186256781193
Lifestyle : 3.8765822784810124
Finance : 3.7070524412296564
Medical : 3.5375226039783
Sports : 3.4584086799276674
Personalization : 3.322784810126582
Communication : 3.2323688969258586
Action : 3.096745027124774
Health & Fitness : 3.0854430379746836
Photography : 2.949819168173599
News & Magazines : 2.802893309222423
Social : 2.667269439421338
Travel & Local : 2.328209764918626
Shopping : 2.2490958408679926
Books & Reference : 2.1360759493670884
Simulation : 2.0456600361663653
Dating : 1.8648282097649187
Arcade : 1.842224231464738
Video Players & Editors : 1.7744122965641953
Casual : 1.763110307414105
Maps & Navigation : 1.3901446654611211
Food & Drink : 1.2432188065099457
Puzzle : 1.1301989150090417
Racing : 0.9945750452079566
Role Playing : 0.9380650994575045
Libraries & Demo : 0.9380650994575045
Auto & Vehicles : 0.9267631103074141
St

The difference between the Genres and the Category columns is not crystal clear, but one thing we can notice is that the Genres column is much more granular (it has more categories). We're only looking for the bigger picture at the moment, so we'll only work with the Category column moving forward.

Up to this point, we found that the App Store is dominated by apps designed for fun, while Google Play shows a more balanced landscape of both practical and for-fun apps. Now we'd like to get an idea about the kinds of apps that have the most users.

# Most Popular Apps by Genre on the App Store

One way to find out what genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play data set, we can find this information in the Installs column, but this information is missing for the App Store data set. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the rating_count_tot column.

In [23]:
genre_ios=freq_table(ios_final,-5)

for genre in genre_ios:
    total=0
    genre_length=0
    for app in ios_final:
        if app[-5]==genre:
            n_ratings=float(app[5])
            total+=n_ratings
            genre_length += 1
    genre_ave=total/genre_length
    print(genre,':',genre_ave)

Weather : 52279.892857142855
Music : 57326.530303030304
Photo & Video : 28441.54375
Navigation : 86090.33333333333
Productivity : 21028.410714285714
Medical : 612.0
Catalogs : 4004.0
Shopping : 27230.734939759037
Entertainment : 14195.358565737051
Education : 7003.983050847458
Health & Fitness : 23298.015384615384
Social Networking : 71548.34905660378
Business : 7491.117647058823
Food & Drink : 33333.92307692308
Reference : 79350.4705882353
Book : 46384.916666666664
Utilities : 19156.493670886077
Finance : 32367.02857142857
Games : 22886.36709539121
Sports : 23008.898550724636
Travel : 28243.8
Lifestyle : 16815.48
News : 21248.023255813954


As we can see, Navigation apps have the highest number of user ratings on average.
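
The cell above scans the whole data set once per genre. The same averages can be computed in a single pass by accumulating a running total and a count for each genre. This is a minimal sketch with a hypothetical `average_by_group` helper and made-up toy rows.

```python
def average_by_group(rows, group_index, value_index):
    """Return {group: mean of the numeric column} computed in one pass."""
    totals = {}
    counts = {}
    for row in rows:
        group = row[group_index]
        totals[group] = totals.get(group, 0.0) + float(row[value_index])
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}


# Toy rows shaped as [name, genre, rating_count]; the numbers are made up.
toy_rows = [
    ['App A', 'Games', '100'],
    ['App B', 'Games', '300'],
    ['App C', 'Navigation', '50'],
]
toy_averages = average_by_group(toy_rows, 1, 2)
print(toy_averages)
```

For the App Store data, the corresponding call would be `average_by_group(ios_final, -5, 5)`, using the same column indexes as the cell above.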

# Most Popular Apps by Genre on Google Play

We have data about the number of installs for the Google Play market, so we should be able to get a clearer picture of genre popularity. However, the install numbers don't seem precise enough: most values are open-ended (100+, 1,000+, 5,000+, etc.).

Fortunately, we don't need very precise data for our purposes. We only want to find out which app genres attract the most users, and we don't need perfect precision with respect to the number of users.

We're going to leave the numbers as they are, which means that we'll consider that an app with 100,000+ installs has 100,000 installs, and an app with 1,000,000+ installs has 1,000,000 installs, and so on. To perform computations, however, we'll need to convert each install number from string to float.
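
The conversion described above can be tried on its own before it is used in a loop. `parse_installs` is a hypothetical helper name used only for this sketch.

```python
def parse_installs(installs_text):
    """Convert an install label such as '1,000,000+' to a float."""
    cleaned = installs_text.replace('+', '').replace(',', '')
    return float(cleaned)


print(parse_installs('100,000+'))
print(parse_installs('1,000,000,000+'))
```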

In [24]:
categories_android=freq_table(android_final,1)

for category in categories_android:
    total=0
    len_category=0
    for app in android_final:
        category_app= app[1]
        if category_app==category:
            n_installs= app[5]
            n_installs= n_installs.replace('+','')
            n_installs= n_installs.replace(',','')
            total+= float(n_installs)
            len_category+=1
    ave_n_installs=total/len_category
    print(category,':',ave_n_installs)
    
    

BOOKS_AND_REFERENCE : 8814199.78835979
NEWS_AND_MAGAZINES : 9549178.467741935
COMICS : 832613.8888888889
HOUSE_AND_HOME : 1360598.042253521
PERSONALIZATION : 5201482.6122448975
LIFESTYLE : 1446158.2238372094
WEATHER : 5145550.285714285
EVENTS : 253542.22222222222
SHOPPING : 7036877.311557789
MEDICAL : 120550.61980830671
ENTERTAINMENT : 11640705.88235294
AUTO_AND_VEHICLES : 647317.8170731707
FOOD_AND_DRINK : 1924897.7363636363
FINANCE : 1387692.475609756
TOOLS : 10830251.970588235
COMMUNICATION : 38590581.08741259
PARENTING : 542603.6206896552
BEAUTY : 513151.88679245283
BUSINESS : 1712290.1474201474
FAMILY : 3695641.8198090694
TRAVEL_AND_LOCAL : 13984077.710144928
EDUCATION : 1833495.145631068
ART_AND_DESIGN : 1986335.0877192982
GAME : 15544014.51048951
HEALTH_AND_FITNESS : 4188821.9853479853
DATING : 854028.8303030303
SOCIAL : 23253652.127118643
PRODUCTIVITY : 16787331.344927534
SPORTS : 3650602.276666667
MAPS_AND_NAVIGATION : 4049274.6341463416
LIBRARIES_AND_DEMO : 638503.734939759
V

As we can see, the category with the most installs per app on Google Play is COMMUNICATION (about 38,590,581).

In [25]:
for app in android_final:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

We can see that a few apps have over one billion installs (WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail, and Hangouts).

Besides the communication category, VIDEO_PLAYERS (24,727,872) and SOCIAL (23,253,652) also have high average install counts.

But the main concern is that these niches seem to be dominated by a few giants who are hard to compete against.
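
One way to see how much a few very large apps pull up a category average is to recompute the average without them. The sketch below is an illustration under assumptions: `average_installs` and its `below` parameter are hypothetical, the 100 million cutoff is an arbitrary choice, and the toy rows are made up.

```python
def average_installs(rows, category, below=None):
    """Average installs for a category, optionally ignoring very large apps."""
    values = []
    for row in rows:
        if row[1] != category:
            continue
        n_installs = float(row[5].replace('+', '').replace(',', ''))
        # When `below` is set, skip apps at or above that install count.
        if below is None or n_installs < below:
            values.append(n_installs)
    return sum(values) / len(values) if values else 0.0


# Toy rows with the Google Play layout (category at index 1, installs at 5).
toy_rows = [
    ['App A', 'COMMUNICATION', '', '', '', '1,000,000,000+'],
    ['App B', 'COMMUNICATION', '', '', '', '1,000,000+'],
    ['App C', 'COMMUNICATION', '', '', '', '500,000+'],
]
average_all = average_installs(toy_rows, 'COMMUNICATION')
average_capped = average_installs(toy_rows, 'COMMUNICATION', below=100_000_000)
print(average_all)
print(average_capped)
```

Calling `average_installs(android_final, 'COMMUNICATION', below=100_000_000)` would show how far the real category average falls once the largest apps are set aside.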

The game genre seems pretty popular, but previously we found out this part of the market seems a bit saturated, so we'd like to come up with a different app recommendation if possible.

The PHOTOGRAPHY genre looks fairly popular as well, with an average of about 17.8 million installs. It is worth exploring in more depth, since we found that this genre has some potential to work well on the App Store, and our aim is to recommend an app genre that shows potential for being profitable on both the App Store and Google Play.

In [26]:
for app in android_final:
    if app[1]=='PHOTOGRAPHY':
        print (app[0],':',app[5])

TouchNote: Cards & Gifts : 1,000,000+
FreePrints – Free Photos Delivered : 1,000,000+
Groovebook Photo Books & Gifts : 500,000+
Moony Lab - Print Photos, Books & Magnets ™ : 50,000+
LALALAB prints your photos, photobooks and magnets : 1,000,000+
Snapfish : 1,000,000+
Motorola Camera : 50,000,000+
HD Camera - Best Cam with filters & panorama : 5,000,000+
LightX Photo Editor & Photo Effects : 10,000,000+
Sweet Snap - live filter, Selfie photo edit : 10,000,000+
HD Camera - Quick Snap Photo & Video : 1,000,000+
B612 - Beauty & Filter Camera : 100,000,000+
Waterfall Photo Frames : 1,000,000+
Photo frame : 100,000+
Huji Cam : 5,000,000+
Unicorn Photo : 1,000,000+
HD Camera : 5,000,000+
Makeup Editor -Beauty Photo Editor & Selfie Camera : 1,000,000+
Makeup Photo Editor: Makeup Camera & Makeup Editor : 1,000,000+
Moto Photo Editor : 5,000,000+
InstaBeauty -Makeup Selfie Cam : 50,000,000+
Garden Photo Frames - Garden Photo Editor : 500,000+
Photo Frame : 10,000,000+
Selfie Camera - Photo Edito

The PHOTOGRAPHY category consists mainly of five kinds of apps: camera, photo editing, makeup, photo collage, and photo gallery.

Among these, camera, photo editing, and makeup apps account for the vast majority.
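
A rough way to check this impression is to count how many app names contain a keyword for each kind of app. The sketch below is only a heuristic: `count_by_keyword` is a hypothetical helper, the keyword list is an assumption, and the sample names are taken from the output shown in this notebook.

```python
def count_by_keyword(names, keywords):
    """Count how many app names contain each keyword (case-insensitive)."""
    counts = {keyword: 0 for keyword in keywords}
    for name in names:
        lowered = name.lower()
        for keyword in keywords:
            if keyword in lowered:
                counts[keyword] += 1
    return counts


# The keyword list is an assumption chosen to mirror the five kinds of apps
# named above; a name can match more than one keyword.
keywords = ['camera', 'editor', 'makeup', 'collage', 'gallery']
sample_names = [
    'Motorola Camera',
    'LightX Photo Editor & Photo Effects',
    'Makeup Editor -Beauty Photo Editor & Selfie Camera',
    'Photo Collage Editor',
]
keyword_counts = count_by_keyword(sample_names, keywords)
print(keyword_counts)
```

To apply it to the full category, the names could be collected with `[app[0] for app in android_final if app[1] == 'PHOTOGRAPHY']`.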

In [27]:
for app in android_final:
    if app[1] == 'PHOTOGRAPHY' and (app[5] == '1,000,000,000+'
                                            or app[5] == '500,000,000+'
                                            or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

B612 - Beauty & Filter Camera : 100,000,000+
YouCam Makeup - Magic Selfie Makeovers : 100,000,000+
Sweet Selfie - selfie camera, beauty cam, photo edit : 100,000,000+
Google Photos : 1,000,000,000+
Retrica : 100,000,000+
Photo Editor Pro : 100,000,000+
BeautyPlus - Easy Photo Editor & Selfie Camera : 100,000,000+
PicsArt Photo Studio: Collage Maker & Pic Editor : 100,000,000+
Photo Collage Editor : 100,000,000+
Z Camera - Photo Editor, Beauty Selfie, Collage : 100,000,000+
PhotoGrid: Video & Pic Collage Maker, Photo Editor : 100,000,000+
Candy Camera - selfie, beauty camera, photo editor : 100,000,000+
YouCam Perfect - Selfie Photo Editor : 100,000,000+
Camera360: Selfie Photo Editor with Funny Sticker : 100,000,000+
S Photo Editor - Collage Maker , Photo Collage : 100,000,000+
AR effect : 100,000,000+
Cymera Camera- Photo Editor, Filter,Collage,Layout : 100,000,000+
LINE Camera - Photo editor : 100,000,000+
Photo Editor Collage Maker Pro : 100,000,000+


It looks like there are many very popular apps, so this market shows more competition.

However, as we saw earlier, within the PHOTOGRAPHY category, photo collage and photo gallery apps still have potential because there are few of them. We may therefore conclude that apps that teach people how to take better photos, and photo platforms where people can share and find excellent photos, could be profitable on both Google Play and the App Store.

Moreover, an app that integrates multiple functions, such as a camera with photo collage or other features, may significantly increase

# Conclusion

In this project, we analyzed data about Google Play and App Store apps to find which genres could be profitable in both markets.

In the end, we concluded that a PHOTOGRAPHY app with one or more special features, such as photo collage, could be profitable for both the Google Play and the App Store markets. Although the photography market is full of camera and photo editing apps, we could enrich our app with learning or social features.