## 1. Introduction
<p><img src="https://assets.datacamp.com/production/project_1197/img/google_play_store.png" alt="Google Play logo"></p>
<p>Mobile apps are everywhere. They are easy to create and can be very lucrative from a business standpoint. Specifically, Android is expanding as an operating system and has captured more than 74% of the total market<sup><a href="https://www.statista.com/statistics/272698/global-market-share-held-by-mobile-operating-systems-since-2009">[1]</a></sup>.</p>
<p>The Google Play Store apps data has enormous potential to facilitate data-driven decisions and insights for businesses. In this notebook, we will analyze the Android app market by comparing ~10k apps in Google Play across different categories. We will also use the user reviews to draw a qualitative comparison between the apps.</p>
<p>The dataset you will use here was scraped from Google Play Store in September 2018 and was published on <a href="https://www.kaggle.com/lava18/google-play-store-apps">Kaggle</a>. Here are the details: <br>
<br></p>
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
    <div style="font-size:20px"><b>datasets/apps.csv</b></div>
This file contains all the details of the apps on Google Play. There are 9 features that describe a given app.
<ul>
    <li><b>App:</b> Name of the app</li>
    <li><b>Category:</b> Category of the app. Some examples are: ART_AND_DESIGN, FINANCE, COMICS, BEAUTY etc.</li>
    <li><b>Rating:</b> The current average rating (out of 5) of the app on Google Play</li>
    <li><b>Reviews:</b> Number of user reviews given on the app</li>
    <li><b>Size:</b> Size of the app in MB (megabytes)</li>
    <li><b>Installs:</b> Number of times the app was downloaded from Google Play</li>
    <li><b>Type:</b> Whether the app is paid or free</li>
    <li><b>Price:</b> Price of the app in US$</li>
    <li><b>Last Updated:</b> Date on which the app was last updated on Google Play </li>

</ul>
</div>
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
    <div style="font-size:20px"><b>datasets/user_reviews.csv</b></div>
This file contains a random sample of 100 <i><a href="https://www.androidpolice.com/2019/01/21/google-play-stores-redesigned-ratings-and-reviews-section-lets-you-easily-filter-by-star-rating/">most helpful first</a></i> user reviews for each app. The text in each review has been pre-processed and passed through a sentiment analyzer.
<ul>
    <li><b>App:</b> Name of the app on which the user review was provided. Matches the <code>App</code> column of the <code>apps.csv</code> file</li>
    <li><b>Review:</b> The pre-processed user review text</li>
    <li><b>Sentiment Category:</b> Sentiment category of the user review - Positive, Negative or Neutral</li>
    <li><b>Sentiment Score:</b> Sentiment score of the user review. It lies in the range [-1, 1]. A higher score denotes a more positive sentiment.</li>

</ul>
</div>
<p>From here on, it will be your task to explore and manipulate the data until you are able to answer the three questions described in the instructions panel.<br></p>

In [29]:
import pandas as pd

apps = pd.read_csv('datasets/apps.csv')

# Exploring the data
apps.info()
apps.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9659 entries, 0 to 9658
Data columns (total 9 columns):
App             9659 non-null object
Category        9659 non-null object
Rating          8196 non-null float64
Reviews         9659 non-null int64
Size            8432 non-null float64
Installs        9659 non-null object
Type            9659 non-null object
Price           9659 non-null float64
Last Updated    9659 non-null object
dtypes: float64(3), int64(1), object(5)
memory usage: 679.2+ KB


Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Last Updated
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19.0,"10,000+",Free,0.0,"January 7, 2018"
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14.0,"500,000+",Free,0.0,"January 15, 2018"
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7,"5,000,000+",Free,0.0,"August 1, 2018"
3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25.0,"50,000,000+",Free,0.0,"June 8, 2018"
4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8,"100,000+",Free,0.0,"June 20, 2018"


In [30]:
# Characters to strip from the Installs column
chars_to_remove = [',', '+']

# Remove each character with a literal (non-regex) string replacement
for char in chars_to_remove:
    apps['Installs'] = apps['Installs'].str.replace(char, '', regex=False)

# Convert the cleaned column to an integer data type
apps['Installs'] = apps['Installs'].astype(int)

apps.info()
apps.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9659 entries, 0 to 9658
Data columns (total 9 columns):
App             9659 non-null object
Category        9659 non-null object
Rating          8196 non-null float64
Reviews         9659 non-null int64
Size            8432 non-null float64
Installs        9659 non-null int64
Type            9659 non-null object
Price           9659 non-null float64
Last Updated    9659 non-null object
dtypes: float64(3), int64(2), object(4)
memory usage: 679.2+ KB


Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Last Updated
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19.0,10000,Free,0.0,"January 7, 2018"
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14.0,500000,Free,0.0,"January 15, 2018"
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7,5000000,Free,0.0,"August 1, 2018"
3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25.0,50000000,Free,0.0,"June 8, 2018"
4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8,100000,Free,0.0,"June 20, 2018"
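
The character-stripping loop above can also be written as a single pass with a regular expression. The following is a minimal sketch on a small made-up Series (`installs_raw` and its values are illustrative, not read from `apps.csv`), assuming every entry is a string of digits, commas, and a trailing plus sign:

```python
import pandas as pd

# Hypothetical sample mimicking the raw Installs column
installs_raw = pd.Series(['10,000+', '500,000+', '5,000,000+'])

# The character class [,+] matches a literal comma or plus sign;
# both are removed in one pass before converting to integers
installs_clean = installs_raw.str.replace(r'[,+]', '', regex=True).astype(int)

print(installs_clean.tolist())
```

If the column contained any value that is not purely numeric after stripping, `astype(int)` would raise a `ValueError`, which is a useful early signal that the data needs more cleaning.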


In [31]:
# Group the data by Category and compute the app count, mean price, and mean rating
counts_categories = apps.groupby('Category').agg({'App': 'count', 'Price': 'mean', 'Rating': 'mean'})

# Rename the columns to describe the aggregated values
app_category_info = counts_categories.rename(columns={'App': 'Number of apps', 'Price': 'Average price', 'Rating': 'Average rating'})

print(app_category_info)

                     Number of apps  Average price  Average rating
Category                                                          
ART_AND_DESIGN                   64       0.093281        4.357377
AUTO_AND_VEHICLES                85       0.158471        4.190411
BEAUTY                           53       0.000000        4.278571
BOOKS_AND_REFERENCE             222       0.539505        4.344970
BUSINESS                        420       0.417357        4.098479
COMICS                           56       0.000000        4.181481
COMMUNICATION                   315       0.263937        4.121484
DATING                          171       0.160468        3.970149
EDUCATION                       119       0.150924        4.364407
ENTERTAINMENT                   102       0.078235        4.135294
EVENTS                           64       1.718594        4.435556
FAMILY                         1832       1.309967        4.179664
FINANCE                         345       8.408203        4.11
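
The group-then-rename step above can possibly be collapsed into one call with pandas named aggregation. This is a sketch on hypothetical toy data (`toy_apps` is made up for illustration), and it also shows that `mean` skips missing ratings, which matters here because the `Rating` column has null values:

```python
import pandas as pd

# Hypothetical toy data standing in for apps.csv
toy_apps = pd.DataFrame({
    'App': ['a', 'b', 'c'],
    'Category': ['FINANCE', 'FINANCE', 'BEAUTY'],
    'Price': [1.0, 3.0, 0.0],
    'Rating': [4.0, None, 4.5],
})

# Named aggregation: output column name -> (input column, function).
# The dict is unpacked because the output names contain spaces.
summary = toy_apps.groupby('Category').agg(
    **{
        'Number of apps': ('App', 'count'),
        'Average price': ('Price', 'mean'),
        'Average rating': ('Rating', 'mean'),
    }
)

print(summary)
```

For the FINANCE group, the average rating is computed from the single non-missing rating rather than from both apps.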

In [32]:
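# Load the user_reviews.csv file
user_reviews = pd.read_csv('datasets/user_reviews.csv')

# Merge the apps DataFrame with user_reviews on the shared App column
# (merge defaults to an inner join, so apps without reviews are dropped)
merged = apps.merge(user_reviews, on='App')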

merged.info()
merged.head()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 61556 entries, 0 to 61555
Data columns (total 12 columns):
App                   61556 non-null object
Category              61556 non-null object
Rating                61556 non-null float64
Reviews               61556 non-null int64
Size                  41150 non-null float64
Installs              61556 non-null int64
Type                  61556 non-null object
Price                 61556 non-null float64
Last Updated          61556 non-null object
Review                35929 non-null object
Sentiment Category    35934 non-null object
Sentiment Score       35934 non-null float64
dtypes: float64(4), int64(2), object(6)
memory usage: 6.1+ MB


Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Last Updated,Review,Sentiment Category,Sentiment Score
0,Coloring book moana,ART_AND_DESIGN,3.9,967,14.0,500000,Free,0.0,"January 15, 2018",A kid's excessive ads. The types ads allowed a...,Negative,-0.25
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14.0,500000,Free,0.0,"January 15, 2018",It bad >:(,Negative,-0.725
2,Coloring book moana,ART_AND_DESIGN,3.9,967,14.0,500000,Free,0.0,"January 15, 2018",like,Neutral,0.0
3,Coloring book moana,ART_AND_DESIGN,3.9,967,14.0,500000,Free,0.0,"January 15, 2018",,,
4,Coloring book moana,ART_AND_DESIGN,3.9,967,14.0,500000,Free,0.0,"January 15, 2018",I love colors inspyering,Positive,0.5
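
The merged output above contains rows with no review text or sentiment score, and the default inner join silently drops apps that have no reviews at all. The sketch below uses hypothetical toy frames (`toy_apps` and `toy_reviews` are made up) to show how the `how` and `indicator` parameters of `merge` and a `dropna` call could be used to inspect both effects:

```python
import pandas as pd

# Hypothetical toy frames standing in for apps.csv and user_reviews.csv
toy_apps = pd.DataFrame({'App': ['a', 'b'], 'Category': ['FINANCE', 'BEAUTY']})
toy_reviews = pd.DataFrame({
    'App': ['a', 'a', 'c'],
    'Sentiment Score': [0.5, None, -0.2],
})

# Default inner join: keeps only apps present in both frames
inner = toy_apps.merge(toy_reviews, on='App')

# Left join with an indicator column marks apps that had no reviews
left = toy_apps.merge(toy_reviews, on='App', how='left', indicator=True)

# Keep only rows that actually carry a sentiment score
scored = inner.dropna(subset=['Sentiment Score'])

print(left[['App', '_merge']])
```

Averages computed with `mean` already ignore missing scores, so the explicit `dropna` is mainly useful when counting reviews per app.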


In [33]:
# Filter the merged data by only selecting Finance category
merged_finance = merged[merged['Category']=='FINANCE']

# Filter again by only selecting free apps
merged_free_finance = merged_finance[merged_finance['Type']=='Free']

print(merged_free_finance.head())

                     App Category  Rating  Reviews  Size  Installs  Type  \
14112  Citibanamex Movil  FINANCE     3.6    52306  42.0   5000000  Free   
14113  Citibanamex Movil  FINANCE     3.6    52306  42.0   5000000  Free   
14114  Citibanamex Movil  FINANCE     3.6    52306  42.0   5000000  Free   
14115  Citibanamex Movil  FINANCE     3.6    52306  42.0   5000000  Free   
14116  Citibanamex Movil  FINANCE     3.6    52306  42.0   5000000  Free   

       Price   Last Updated  \
14112    0.0  July 27, 2018   
14113    0.0  July 27, 2018   
14114    0.0  July 27, 2018   
14115    0.0  July 27, 2018   
14116    0.0  July 27, 2018   

                                                  Review Sentiment Category  \
14112  Forget paying app, designed make fail payments...           Negative   
14113  It's working expected, talking best bank Mexic...           Positive   
14114  It has many problems with Android 8.1. You can...           Positive   
14115  I changed my phone to a Xiaomi Re
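
The two filtering steps above can also be expressed as one combined boolean mask. A minimal sketch on hypothetical toy data (`toy_merged` is made up for illustration):

```python
import pandas as pd

# Hypothetical toy data standing in for the merged DataFrame
toy_merged = pd.DataFrame({
    'App': ['a', 'b', 'c'],
    'Category': ['FINANCE', 'FINANCE', 'BEAUTY'],
    'Type': ['Free', 'Paid', 'Free'],
})

# Each comparison is wrapped in parentheses because & binds tighter than ==
is_free_finance = (toy_merged['Category'] == 'FINANCE') & (toy_merged['Type'] == 'Free')
toy_free_finance = toy_merged[is_free_finance]

print(toy_free_finance)
```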

In [34]:
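# Group by the App column and calculate the average sentiment score per app
# (as_index=False keeps App as a regular column, matching the output below)
merged_sorted_grouped = merged_free_finance.groupby('App', as_index=False).agg({'Sentiment Score': 'mean'})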

print(merged_sorted_grouped)

                                                App  Sentiment Score
0                                         A+ Mobile         0.329592
1                                         ACE Elite         0.252171
2                      Acorns - Invest Spare Change         0.046667
3                                       Amex Mobile         0.175666
4                    Associated Credit Union Mobile         0.388093
5                              BBVA Compass Banking         0.205590
6                                        BBVA Spain         0.515086
7                                    BZWBK24 mobile         0.326883
8                    Bank of America Mobile Banking         0.180027
9                               BankMobile Vibe App         0.353455
10                                    Banorte Movil         0.116999
11                          Barclays US for Android         0.017928
12                                       Betterment         0.143252
13                           Bloom

In [35]:
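# Sort the apps by average sentiment score, highest first
merged_sorted_final = merged_sorted_grouped.sort_values(by='Sentiment Score', ascending=False)

# Keep the 10 free finance apps with the highest average sentiment score
top_10_user_feedback = merged_sorted_final.head(10)
print(top_10_user_feedback)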

                                           App  Sentiment Score
6                                   BBVA Spain         0.515086
4               Associated Credit Union Mobile         0.388093
9                          BankMobile Vibe App         0.353455
0                                    A+ Mobile         0.329592
29   Current debit card and app made for teens         0.327258
7                               BZWBK24 mobile         0.326883
34  Even - organize your money, get paid early         0.283929
26                                Credit Karma         0.270052
39                Fortune City - A Finance App         0.266966
15                                      Branch         0.264230
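
The sort-then-head pattern above has a shorthand in `DataFrame.nlargest`. A minimal sketch on hypothetical toy scores (`toy_scores` is made up and is not the real result table):

```python
import pandas as pd

# Hypothetical average sentiment scores per app
toy_scores = pd.DataFrame({
    'App': ['a', 'b', 'c', 'd'],
    'Sentiment Score': [0.1, 0.5, 0.3, 0.2],
})

# Select the 2 rows with the highest score, already ordered from high to low
top_2 = toy_scores.nlargest(2, 'Sentiment Score')

print(top_2)
```

On ties, `nlargest` keeps the first occurrences by default, so the result can differ slightly from a sort that breaks ties differently.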
