# Profitable App Profiles for the App Store and Google Play Markets


   We're analysing for a company that builds Android and iOS mobile apps. We make our apps available on Google Play and the App Store.

   We only build apps that are free to download and install, and our main source of revenue consists of in-app ads. This means our revenue for any given app is mostly influenced by the number of users who use our app. Our goal for this project is to analyze data to help our developers understand what kinds of apps are likely to attract more users.

# Opening and Exploring the Data

In [1]:
import pandas as pd

apple_store = pd.read_csv("AppleStore.csv")
googlep_store = pd.read_csv("googleplaystore.csv")

In [2]:
apple_store.head()

Unnamed: 0.1,Unnamed: 0,id,track_name,size_bytes,currency,price,rating_count_tot,rating_count_ver,user_rating,user_rating_ver,ver,cont_rating,prime_genre,sup_devices.num,ipadSc_urls.num,lang.num,vpp_lic
0,1,281656475,PAC-MAN Premium,100788224,USD,3.99,21292,26,4.0,4.5,6.3.5,4+,Games,38,5,10,1
1,2,281796108,Evernote - stay organized,158578688,USD,0.0,161065,26,4.0,3.5,8.2.2,4+,Productivity,37,5,23,1
2,3,281940292,"WeatherBug - Local Weather, Radar, Maps, Alerts",100524032,USD,0.0,188583,2822,3.5,4.5,5.0.0,4+,Weather,37,5,3,1
3,4,282614216,"eBay: Best App to Buy, Sell, Save! Online Shop...",128512000,USD,0.0,262241,649,4.0,4.5,5.10.0,12+,Shopping,37,5,9,1
4,5,282935706,Bible,92774400,USD,0.0,985920,5320,4.5,5.0,7.5.1,4+,Reference,37,5,45,1


In [3]:
apple_store.shape

(7197, 17)

In [4]:
googlep_store.head(60)

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up
3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25M,"50,000,000+",Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up
4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8M,"100,000+",Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up
5,Paper flowers instructions,ART_AND_DESIGN,4.4,167,5.6M,"50,000+",Free,0,Everyone,Art & Design,"March 26, 2017",1.0,2.3 and up
6,Smoke Effect Photo Maker - Smoke Editor,ART_AND_DESIGN,3.8,178,19M,"50,000+",Free,0,Everyone,Art & Design,"April 26, 2018",1.1,4.0.3 and up
7,Infinite Painter,ART_AND_DESIGN,4.1,36815,29M,"1,000,000+",Free,0,Everyone,Art & Design,"June 14, 2018",6.1.61.1,4.2 and up
8,Garden Coloring Book,ART_AND_DESIGN,4.4,13791,33M,"1,000,000+",Free,0,Everyone,Art & Design,"September 20, 2017",2.9.2,3.0 and up
9,Kids Paint Free - Drawing Fun,ART_AND_DESIGN,4.7,121,3.1M,"10,000+",Free,0,Everyone,Art & Design;Creativity,"July 3, 2018",2.8,4.0.3 and up


In [5]:
googlep_store.shape

(10841, 13)

# Removing redundant Columns

   
   
   We have 7197 iOS apps in this data set, and the columns that seem interesting are:
    'track_name', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', and 'prime_genre'
    
    
    

In [6]:
apple_store1 = apple_store[['track_name', 'currency', 'price', 'rating_count_tot', 'rating_count_ver','prime_genre']]
del apple_store
apple_store1.head()

Unnamed: 0,track_name,currency,price,rating_count_tot,rating_count_ver,prime_genre
0,PAC-MAN Premium,USD,3.99,21292,26,Games
1,Evernote - stay organized,USD,0.0,161065,26,Productivity
2,"WeatherBug - Local Weather, Radar, Maps, Alerts",USD,0.0,188583,2822,Weather
3,"eBay: Best App to Buy, Sell, Save! Online Shop...",USD,0.0,262241,649,Shopping
4,Bible,USD,0.0,985920,5320,Reference




We see that the Google Play data set has 10841 apps and 13 columns. At a quick glance, the columns that might be useful for the purpose of our analysis are
'App', 'Category', 'Reviews', 'Installs', 'Type', 'Price', and 'Genres'.



In [7]:
googlep_store1 = googlep_store[['App','Reviews', 'Installs', 'Type', 'Price','Genres']]
del googlep_store
googlep_store1.head()

Unnamed: 0,App,Reviews,Installs,Type,Price,Genres
0,Photo Editor & Candy Camera & Grid & ScrapBook,159,"10,000+",Free,0,Art & Design
1,Coloring book moana,967,"500,000+",Free,0,Art & Design;Pretend Play
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",87510,"5,000,000+",Free,0,Art & Design
3,Sketch - Draw & Paint,215644,"50,000,000+",Free,0,Art & Design
4,Pixel Draw - Number Art Coloring Book,967,"100,000+",Free,0,Art & Design;Creativity


# Cleaning Data




 The company  builds apps that are free to download and install, and that are directed toward an English-speaking audience. This means that we'll need to:

* Remove non-English apps like 爱奇艺PPS -《欢乐颂2》电视剧热播
* Remove non-free apps

In [8]:

for row in apple_store1["track_name"]:  
    for char in row:
        if len(str(ord(char))) == 5: 
            apple_store1 = apple_store1[apple_store1["track_name"]!= row]
            break    

In [9]:
apple_store1["track_name"].value_counts().head()


Mannequin Challenge                         2
VR Roller Coaster                           2
The Lost Heir: The Fall of Daria            1
Wild Kratts Creature Power                  1
WatchNotes - Display notes on watch face    1
Name: track_name, dtype: int64

# Deleting the duplicates

# Part One

 The apple store data has 2 duplicates .We are deleting the duplicates by taking the values which has 
    the highest count of reviews since it will be the recent review

In [10]:
apple_store1["track_name"].shape

(6133,)

In [11]:
apple_store1["track_name"].unique().shape

(6131,)

In [12]:
apple_store1["track_name"].value_counts()

Mannequin Challenge                                                              2
VR Roller Coaster                                                                2
The Lost Heir: The Fall of Daria                                                 1
Wild Kratts Creature Power                                                       1
WatchNotes - Display notes on watch face                                         1
DIRECTV App for iPad                                                             1
Bad Piggies HD                                                                   1
Dream Machine : The Game                                                         1
Math 42                                                                          1
Microsoft OneDrive – File & photo cloud storage                                  1
Runtastic Road Bike GPS Cycling Route Tracker PRO                                1
Knights Fight: Medieval Arena                                                    1
Twit

In [13]:
apple_store1[apple_store1["track_name"] == "VR Roller Coaster"]


Unnamed: 0,track_name,currency,price,rating_count_tot,rating_count_ver,prime_genre
3319,VR Roller Coaster,USD,0.0,107,102,Games
5603,VR Roller Coaster,USD,0.0,67,44,Games


In [14]:
apple_store1=apple_store1[~(apple_store1["track_name"] == "VR Roller Coaster")& ~(apple_store1["rating_count_tot"] == 67)]


In [15]:
apple_store1[apple_store1["track_name"] == "Mannequin Challenge"]

Unnamed: 0,track_name,currency,price,rating_count_tot,rating_count_ver,prime_genre
7092,Mannequin Challenge,USD,0.0,668,87,Games
7128,Mannequin Challenge,USD,0.0,105,58,Games


In [16]:
apple_store1=apple_store1[~(apple_store1["track_name"] == "Mannequin Challenge")& ~(apple_store1["rating_count_tot"] == 105)]


In [17]:
apple_store1["track_name"].shape

(6104,)

In [18]:
apple_store1["track_name"].unique().shape

(6104,)

# Part two

Deleting duplicates from google play data

In [20]:
googlep_store1["App"].unique().shape

(9660,)

In [21]:
googlep_store1["App"].shape

(10841,)

In [22]:
googlep_store1.drop_duplicates(keep='first')
googlep_store2 = pd.DataFrame()
for row in googlep_store1["App"].unique():
    max_val = googlep_store1.loc[googlep_store1["App"] == row , "Reviews"].max()
    matched_data = googlep_store1[(googlep_store1["App"] == row)&(googlep_store1["Reviews"] == max_val)].drop_duplicates()
    googlep_store2 = googlep_store2.append(matched_data)
del googlep_store1

In [23]:
googlep_store2["App"].unique().shape

(9660,)

In [24]:
googlep_store2["App"].shape

(9660,)

# Price column

We want to analyse the application which is free . We can clean the data further by removing 
the price rows which has values other than 0 

In [25]:
apple_store1 = apple_store1[apple_store1["price"]== 0.00]

In [26]:
googlep_store2 = googlep_store2[googlep_store2["Type"]== "Free"]
    

In [27]:
googlep_store2.shape

(8902, 6)

In [28]:
apple_store1.shape

(3166, 6)

In [29]:
googlep_store2.head()

Unnamed: 0,App,Reviews,Installs,Type,Price,Genres
0,Photo Editor & Candy Camera & Grid & ScrapBook,159,"10,000+",Free,0,Art & Design
2033,Coloring book moana,974,"500,000+",Free,0,Art & Design;Pretend Play
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",87510,"5,000,000+",Free,0,Art & Design
3,Sketch - Draw & Paint,215644,"50,000,000+",Free,0,Art & Design
4,Pixel Draw - Number Art Coloring Book,967,"100,000+",Free,0,Art & Design;Creativity


In [30]:
apple_store1.head()

Unnamed: 0,track_name,currency,price,rating_count_tot,rating_count_ver,prime_genre
1,Evernote - stay organized,USD,0.0,161065,26,Productivity
2,"WeatherBug - Local Weather, Radar, Maps, Alerts",USD,0.0,188583,2822,Weather
3,"eBay: Best App to Buy, Sell, Save! Online Shop...",USD,0.0,262241,649,Shopping
4,Bible,USD,0.0,985920,5320,Reference
6,PayPal - Send and request money safely,USD,0.0,119487,879,Finance


#             # Analysis

our aim is to determine the kinds of apps that are likely to attract more users because our revenue is highly influenced by the number of people using our apps.
Let's begin the analysis by getting a sense of what are the most common genres for each market.

# Determining the most common genre

In [31]:
((googlep_store2["Genres"].value_counts(normalize = True))*100).head(20)

Tools                8.413840
Entertainment        6.088519
Education            5.392047
Business             4.583240
Lifestyle            3.920467
Productivity         3.886767
Finance              3.684565
Medical              3.504830
Sports               3.448663
Personalization      3.313862
Communication        3.235228
Action               3.089193
Health & Fitness     3.066727
Photography          2.943159
News & Magazines     2.830825
Social               2.651090
Travel & Local       2.314087
Shopping             2.246686
Books & Reference    2.179286
Simulation           2.066951
Name: Genres, dtype: float64

In [32]:
((apple_store1["prime_genre"].value_counts(normalize = True))*100).head(20)

Games                58.370183
Entertainment         7.927985
Photo & Video         5.022110
Education             3.695515
Social Networking     3.284902
Shopping              2.526848
Utilities             2.400505
Sports                2.179406
Music                 2.053064
Health & Fitness      2.021478
Productivity          1.674037
Lifestyle             1.547694
News                  1.358181
Travel                1.168667
Finance               1.105496
Weather               0.884397
Food & Drink          0.821226
Reference             0.536955
Business              0.536955
Book                  0.379027
Name: prime_genre, dtype: float64

Tools are the most common genre of the applications in Google Play . Whereas Games is the most common genre in Apple store

# Most Installed Application

# Part One

One problem with this data is that is not precise. For instance, we don't know whether an app with 100,000+ installs has 100,000 installs, 200,000, or 350,000. However, we don't need very precise data for our purposes — we only want to get an idea which app genres attract the most users, and we don't need perfect precision with respect to the number of users.

We're going to leave the numbers as they are, which means that we'll consider that an app with 100,000+ installs has 100,000 installs, and an app with 1,000,000+ installs has 1,000,000 installs, and so on.

To perform computations, however, we'll need to convert each install number to float — this means that we need to remove the commas and the plus characters, otherwise the conversion will fail and raise an error. We'll do this directly in the loop below, where we also compute the average number of installs for each genre (category).

In [33]:
def data_type(c):
    c= c.replace("+","")
    c= c.replace(",","")
    return c

googlep_store2["Installs"] = googlep_store2["Installs"].apply(data_type)
googlep_store2["Installs"]= pd.to_numeric(googlep_store2["Installs"])


In [34]:
googlep_store2.groupby('Genres', as_index=False).agg({"Installs": 'sum'}).sort_values(by=['Installs'],ascending=False).head(20)


Unnamed: 0,Genres,Installs
33,Communication,11036916201
104,Tools,7991044474
79,Productivity,5791679314
97,Social,5487861902
78,Photography,4647268915
110,Video Players & Editors,3916831720
5,Arcade,3753691940
0,Action,3465986940
24,Casual,3042798570
49,Entertainment,3014472513


In [35]:

googlep_store2["Reviews"]= pd.to_numeric(googlep_store2["Reviews"])
googlep_store2.groupby('Genres', as_index=False).agg({"Reviews": 'sum'}).sort_values(by=['Reviews'],ascending=False).head(25)


Unnamed: 0,Genres,Reviews
33,Communication,285739868
104,Tools,228916587
97,Social,227936113
0,Action,149868296
24,Casual,130533147
5,Arcade,116248093
78,Photography,105237134
100,Strategy,101507317
110,Video Players & Editors,67352253
98,Sports,65525584


In [36]:
googlep_store2.loc[googlep_store2["Genres"] == "Communication","App"].head(10)

382     Messenger – Text and Video Chat for Free
336                           WhatsApp Messenger
337                            Messenger for SMS
411                 Google Chrome: Fast & Secure
4106       Messenger Lite: Free Calls & Messages
451                                        Gmail
464                                     Hangouts
4676                             Viber Messenger
343                                     My Tele2
412               Firefox Browser fast & private
Name: App, dtype: object

In [37]:
googlep_store2.loc[googlep_store2["Genres"] == "Social","App"].head(10)

2544                                        Facebook
2604                                       Instagram
2546                                   Facebook Lite
10170    Messages, Text and Video Chat for Messenger
2548                                          Tumblr
10171                            All Social Networks
2610                                        Snapchat
2551                  Social network all in one 2018
2552                                       Pinterest
2553                     TextNow - free text + calls
Name: App, dtype: object

Messaging Applications has the most installations in Google store.B This number is heavily skewed up by a few apps that have over one billion installs (WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail, and Hangouts), and a few others with over 100 and 500 million installs:

# Part two

In [38]:
apple_store1.groupby("prime_genre")["rating_count_tot"].sum().sort_values(ascending=False).head(25)

prime_genre
Games                42699932
Social Networking     7583321
Photo & Video         4550487
Music                 3783327
Entertainment         3563031
Shopping              2259035
Sports                1587614
Health & Fitness      1514266
Utilities             1477258
Weather               1463837
Reference             1348958
Productivity          1165092
Finance               1132846
Travel                1129316
News                   913665
Food & Drink           866682
Lifestyle              840580
Education              703280
Book                   556619
Navigation             516542
Business               127349
Catalogs                16016
Medical                  3672
Name: rating_count_tot, dtype: int64

In [39]:
((apple_store1["prime_genre"].value_counts(normalize = True))*100).head(10)

Games                58.370183
Entertainment         7.927985
Photo & Video         5.022110
Education             3.695515
Social Networking     3.284902
Shopping              2.526848
Utilities             2.400505
Sports                2.179406
Music                 2.053064
Health & Fitness      2.021478
Name: prime_genre, dtype: float64

58% of the applications in Apple store are Games .The general impression is that App Store (at least the part containing free English apps) is dominated by apps that are designed for fun (games, entertainment, photo and video, social networking, sports, music, etc.), while apps with practical purposes (education, shopping, utilities, productivity, lifestyle, etc.) are more rare. 
However, the fact that fun apps are the most numerous doesn't also imply that they also have the greatest number of users — the demand might not be the same as the offer.
For example,Weather Application has the higher number of reviews.But Apple has less weather applications

In [40]:
apple_store1[apple_store1["prime_genre"] == "Social Networking"].head(10)

Unnamed: 0,track_name,currency,price,rating_count_tot,rating_count_ver,prime_genre
16,Facebook,USD,0.0,2974676,212,Social Networking
30,LinkedIn,USD,0.0,71856,62,Social Networking
91,Skype for iPhone,USD,0.0,373519,127,Social Networking
92,Tumblr,USD,0.0,334293,919,Social Networking
94,Match™ - #1 Dating App.,USD,0.0,60659,57,Social Networking
125,WhatsApp Messenger,USD,0.0,287589,73088,Social Networking
142,TextNow - Unlimited Text + Calls,USD,0.0,164963,69,Social Networking
160,"Grindr - Gay and same sex guys chat, meet and ...",USD,0.0,23201,14,Social Networking
239,imo video calls and chat,USD,0.0,18841,0,Social Networking
300,Ameba,USD,0.0,269,0,Social Networking


# Result:

The applications(Games,Social Networking,Photo & Video,Music,Entertainment)
which has the highest number of reviews are dominated by big players like Facebook ,linkedin etc . 
And the other app with higher reviews require indepth domain knowledge . 
The app 'Books' seems to be interesting since it has higher reviews

In [41]:
googlep_store2.loc[googlep_store2["Genres"] == "Books & Reference","App"].head(10)

4715                  Wattpad 📖 Free Books
140       E-Book Read - Read Book for free
141     Download free book with green book
142                              Wikipedia
4083                         Amazon Kindle
144                            Cool Reader
9621          Dictionary - Merriam-Webster
6497         NOOK: Read eBooks & Magazines
147                 Free Panda Radio Music
148                             Book store
Name: App, dtype: object

There are less number of applications with the BOOKS genre in google and apple store.
Book has good user base(556619) in Apple store and it has higher reviews in google store too.It is free and it doesn't require in-depth domain knowledge