## Introduction

<p>Mobile apps are everywhere. They are easy to create and can be very lucrative from a business standpoint. Android in particular keeps expanding as an operating system and has captured more than 74% of the global mobile operating system market<sup><a href="https://www.statista.com/statistics/272698/global-market-share-held-by-mobile-operating-systems-since-2009">[1]</a></sup>.</p>
<p>The Google Play Store apps data has enormous potential to facilitate data-driven decisions and insights for businesses. In this notebook, we will analyze the Android app market by comparing ~10k apps in Google Play across different categories. We will also use the user reviews to draw a qualitative comparison between the apps.</p>
<p>The dataset used here was scraped from the Google Play Store in September 2018 and published on <a href="https://www.kaggle.com/lava18/google-play-store-apps">Kaggle</a>. Here are the details:</p>
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
    <div style="font-size:20px"><b>datasets/apps.csv</b></div>
This file contains all the details of the apps on Google Play. There are 9 features that describe a given app.
<ul>
    <li><b>App:</b> Name of the app</li>
    <li><b>Category:</b> Category of the app. Some examples are: ART_AND_DESIGN, FINANCE, COMICS, BEAUTY etc.</li>
    <li><b>Rating:</b> The current average rating (out of 5) of the app on Google Play</li>
    <li><b>Reviews:</b> Number of user reviews given on the app</li>
    <li><b>Size:</b> Size of the app in MB (megabytes)</li>
    <li><b>Installs:</b> Number of times the app was downloaded from Google Play</li>
    <li><b>Type:</b> Whether the app is paid or free</li>
    <li><b>Price:</b> Price of the app in US$</li>
    <li><b>Last Updated:</b> Date on which the app was last updated on Google Play </li>

</ul>
</div>
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
    <div style="font-size:20px"><b>datasets/user_reviews.csv</b></div>
This file contains a random sample of 100 <i><a href="https://www.androidpolice.com/2019/01/21/google-play-stores-redesigned-ratings-and-reviews-section-lets-you-easily-filter-by-star-rating/">most helpful first</a></i> user reviews for each app. The text in each review has been pre-processed and passed through a sentiment analyzer.
<ul>
    <li><b>App:</b> Name of the app on which the user review was provided. Matches the <code>App</code> column of the <code>apps.csv</code> file</li>
    <li><b>Review:</b> The pre-processed user review text</li>
    <li><b>Sentiment Category:</b> Sentiment category of the user review - Positive, Negative or Neutral</li>
    <li><b>Sentiment Score:</b> Sentiment score of the user review. It lies in the range [-1, 1]. A higher score denotes a more positive sentiment.</li>

</ul>
</div>


Tasks:
* Find the number of apps in each category, the average price, and the average rating
* Find the top 10 free FINANCE apps having the highest average sentiment score

### Getting the Data

In [91]:
# use kaggle api to get the data
import kaggle
kaggle.api.authenticate()
kaggle.api.dataset_download_files('lava18/google-play-store-apps', unzip=True)

In [92]:
# load libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

In [93]:
# read in csv files
google_playstore_df  = pd.read_csv("googleplaystore.csv")
google_playstore_user_review_df = pd.read_csv("googleplaystore_user_reviews.csv")
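
The downloaded files use different names from the dataset description above: the reviews file, for instance, has columns such as `Translated_Review` and `Sentiment_Polarity` (visible in the merged output later in this notebook). A quick check after loading can catch such mismatches early. The sketch below is only an illustration: `missing_columns` is a hypothetical helper, not part of the original notebook, and the in-memory CSV is a stand-in for the real file.

```python
import io
import pandas as pd

# Hypothetical helper (not in the original notebook): list the expected
# columns that are absent from a loaded DataFrame.
def missing_columns(df, expected):
    return [col for col in expected if col not in df.columns]

# Tiny in-memory stand-in for googleplaystore_user_reviews.csv
sample_csv = io.StringIO(
    "App,Translated_Review,Sentiment,Sentiment_Polarity,Sentiment_Subjectivity\n"
    "Demo App,Works fine,Positive,0.4,0.5\n"
)
reviews_sample = pd.read_csv(sample_csv)

expected_review_columns = [
    "App", "Translated_Review", "Sentiment",
    "Sentiment_Polarity", "Sentiment_Subjectivity",
]

# "Sentiment Score" is the label used in the description, not a real column
absent = missing_columns(reviews_sample, expected_review_columns + ["Sentiment Score"])
print(absent)
```

Running the same helper on the real DataFrames would show which labels from the description need to be mapped to actual column names.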

### Data Exploration

In [94]:
google_playstore_df.head()

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up
3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25M,"50,000,000+",Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up
4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8M,"100,000+",Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up


In [95]:
google_playstore_df.shape

(10841, 13)

In [96]:
google_playstore_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10841 entries, 0 to 10840
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   App             10841 non-null  object 
 1   Category        10841 non-null  object 
 2   Rating          9367 non-null   float64
 3   Reviews         10841 non-null  object 
 4   Size            10841 non-null  object 
 5   Installs        10841 non-null  object 
 6   Type            10840 non-null  object 
 7   Price           10841 non-null  object 
 8   Content Rating  10840 non-null  object 
 9   Genres          10841 non-null  object 
 10  Last Updated    10841 non-null  object 
 11  Current Ver     10833 non-null  object 
 12  Android Ver     10838 non-null  object 
dtypes: float64(1), object(12)
memory usage: 1.1+ MB
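
The `info()` output shows that `Rating` has fewer non-null entries than the other columns. Counting missing values per column makes this explicit, and the same pass can check for repeated app names. The sketch below uses a toy DataFrame with made-up rows. Whether the real file contains repeated app names is not shown in this notebook, so treat that part as a check worth running, not a finding.

```python
import numpy as np
import pandas as pd

# Toy data standing in for google_playstore_df (values are made up)
toy_apps = pd.DataFrame({
    "App": ["A", "B", "B", "C"],
    "Category": ["FINANCE", "GAME", "GAME", "FINANCE"],
    "Rating": [4.1, np.nan, np.nan, 3.9],
})

# Number of missing values in each column
missing_per_column = toy_apps.isna().sum()

# Number of rows whose App name already appeared in an earlier row
repeated_app_names = toy_apps.duplicated(subset="App").sum()

print(missing_per_column)
print(repeated_app_names)
```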


### Cleaning

In [97]:
# Cleaning Installs column
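# regex=False treats "+" and "," as literal characters rather than regex patterns
google_playstore_df["Installs"] = google_playstore_df["Installs"].str.replace("+", "", regex=False)
google_playstore_df["Installs"] = google_playstore_df["Installs"].str.replace(",", "", regex=False)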


In [98]:
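# drop the row whose Installs value is the string "Free" instead of a number;
# .copy() avoids a SettingWithCopyWarning on the assignment below
google_playstore_df = google_playstore_df[google_playstore_df["Installs"] != "Free"].copy()

# convert Installs column to integer
google_playstore_df["Installs"] = google_playstore_df["Installs"].astype(int)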
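
Filtering on the literal value `"Free"` works here because that is the only non-numeric entry. A more general pattern, sketched below on made-up values, is to let `pd.to_numeric` coerce anything unparseable to `NaN` and then inspect the offending rows before dropping them.

```python
import pandas as pd

# Made-up sample of raw Installs values, including one non-numeric entry
raw_installs = pd.Series(["10,000+", "5,000,000+", "Free", "0"])

# Strip the literal "+" and "," characters
cleaned = (
    raw_installs
    .str.replace("+", "", regex=False)
    .str.replace(",", "", regex=False)
)

# Unparseable values become NaN instead of raising an error
installs = pd.to_numeric(cleaned, errors="coerce")

# Original values that could not be parsed
bad_rows = raw_installs[installs.isna()]
print(bad_rows)
```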

In [99]:
google_playstore_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10840 entries, 0 to 10840
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   App             10840 non-null  object 
 1   Category        10840 non-null  object 
 2   Rating          9366 non-null   float64
 3   Reviews         10840 non-null  object 
 4   Size            10840 non-null  object 
 5   Installs        10840 non-null  int32  
 6   Type            10839 non-null  object 
 7   Price           10840 non-null  object 
 8   Content Rating  10840 non-null  object 
 9   Genres          10840 non-null  object 
 10  Last Updated    10840 non-null  object 
 11  Current Ver     10832 non-null  object 
 12  Android Ver     10838 non-null  object 
dtypes: float64(1), int32(1), object(11)
memory usage: 1.1+ MB


In [100]:
# Cleaning Price column
google_playstore_df["Price"] = google_playstore_df["Price"].str.replace("$", "", regex=False)

# convert column to float type and round to two decimals
google_playstore_df["Price"] = google_playstore_df["Price"].astype(float).round(2)


In [101]:
google_playstore_df["Price"].unique()

array([  0.  ,   4.99,   3.99,   6.99,   1.49,   2.99,   7.99,   5.99,
         3.49,   1.99,   9.99,   7.49,   0.99,   9.  ,   5.49,  10.  ,
        24.99,  11.99,  79.99,  16.99,  14.99,   1.  ,  29.99,  12.99,
         2.49,  10.99,   1.5 ,  19.99,  15.99,  33.99,  74.99,  39.99,
         3.95,   4.49,   1.7 ,   8.99,   2.  ,   3.88,  25.99, 399.99,
        17.99, 400.  ,   3.02,   1.76,   4.84,   4.77,   1.61,   2.5 ,
         1.59,   6.49,   1.29,   5.  ,  13.99, 299.99, 379.99,  37.99,
        18.99, 389.99,  19.9 ,   8.49,   1.75,  14.  ,   4.85,  46.99,
       109.99, 154.99,   3.08,   2.59,   4.8 ,   1.96,  19.4 ,   3.9 ,
         4.59,  15.46,   3.04,   4.29,   2.6 ,   3.28,   4.6 ,  28.99,
         2.95,   2.9 ,   1.97, 200.  ,  89.99,   2.56,  30.99,   3.61,
       394.99,   1.26,   1.2 ,   1.04])
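
The `Size` column is still stored as text: the preview above shows values such as `19M` and `Varies with device`, while the dataset description gives the size in MB. It is not needed for the two tasks, but if it were, a conversion could look like the sketch below. `size_to_mb` is a hypothetical helper, and the handling of a `k` suffix as kilobytes is an assumption, since no such value appears in the preview.

```python
import numpy as np
import pandas as pd

def size_to_mb(value):
    """Hypothetical helper: convert strings such as '19M' to a float in MB."""
    if not isinstance(value, str):
        return np.nan
    text = value.strip()
    if text.endswith("M"):
        return float(text[:-1])
    if text.endswith("k"):  # assumption: a 'k' suffix means kilobytes
        return float(text[:-1]) / 1024
    return np.nan  # e.g. 'Varies with device'

# Made-up sample values
sizes = pd.Series(["19M", "8.7M", "Varies with device", "512k"])
size_mb = sizes.map(size_to_mb)
print(size_mb)
```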

### Data Wrangling and Analysis

In [102]:
# calculate the number of apps, average price and average rating per category
app_category_info = google_playstore_df.groupby("Category").agg({"App" : "count", "Price": "mean", "Rating" : "mean"})

# round Price and Rating columns to two decimals
app_category_info[["Price", "Rating"]] = app_category_info[["Price", "Rating"]].round(2)

# rename the columns and sort by average rating (highest first)
app_category_info.rename(columns = {"App" : "Number of apps", "Price": "Average price", "Rating" : "Average Rating"}).sort_values(by = "Average Rating", ascending=False)

Category,Number of apps,Average price,Average Rating
EVENTS,64,1.72,4.44
EDUCATION,156,0.12,4.39
ART_AND_DESIGN,65,0.09,4.36
BOOKS_AND_REFERENCE,231,0.52,4.35
PERSONALIZATION,392,0.39,4.34
PARENTING,60,0.16,4.3
GAME,1144,0.25,4.29
BEAUTY,53,0.0,4.28
HEALTH_AND_FITNESS,341,0.2,4.28
SOCIAL,295,0.05,4.26
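
The aggregate, round, rename and sort steps above can also be written as a single chain using pandas named aggregation, which sets the output column names directly. The sketch below uses a toy DataFrame with made-up values. Note that `"mean"` skips missing ratings, so a category average is taken only over rated apps.

```python
import pandas as pd

# Toy data standing in for google_playstore_df (values are made up)
toy = pd.DataFrame({
    "Category": ["FINANCE", "FINANCE", "GAME"],
    "App": ["A", "B", "C"],
    "Price": [0.0, 2.0, 1.0],
    "Rating": [4.0, None, 4.5],
})

summary = (
    toy.groupby("Category")
    .agg(
        number_of_apps=("App", "count"),
        average_price=("Price", "mean"),
        average_rating=("Rating", "mean"),  # NaN ratings are ignored
    )
    .round(2)
    .sort_values("average_rating", ascending=False)
)
print(summary)
```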


In [103]:
# subset free finance apps
free_finance_apps = google_playstore_df[(google_playstore_df["Category"] == "FINANCE") & (google_playstore_df["Price"] == 0)]

# print
free_finance_apps

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
1048,K PLUS,FINANCE,4.4,124424,Varies with device,10000000,Free,0.0,Everyone,Finance,"June 26, 2018",4.6.0,4.2 and up
1049,ING Banking,FINANCE,4.4,39041,Varies with device,1000000,Free,0.0,Everyone,Finance,"August 3, 2018",Varies with device,Varies with device
1050,Citibanamex Movil,FINANCE,3.6,52306,42M,5000000,Free,0.0,Everyone,Finance,"July 27, 2018",20.1.0,5.0 and up
1051,The postal bank,FINANCE,3.7,36718,Varies with device,5000000,Free,0.0,Everyone,Finance,"July 16, 2018",Varies with device,Varies with device
1052,KTB Netbank,FINANCE,3.8,42644,19M,5000000,Free,0.0,Everyone,Finance,"June 28, 2018",8.18,4.2 and up
...,...,...,...,...,...,...,...,...,...,...,...,...,...
10718,BankNordik,FINANCE,3.9,28,15M,5000,Free,0.0,Everyone,Finance,"August 8, 2018",7.3.2,5.0 and up
10744,FP Markets,FINANCE,,1,2.0M,100,Free,0.0,Everyone,Finance,"January 30, 2018",1.0.0.0,4.3 and up
10745,FP Boss,FINANCE,,1,5.8M,1,Free,0.0,Everyone,Finance,"July 27, 2018",1.0.2,5.0 and up
10752,FP FCU,FINANCE,3.6,48,26M,5000,Free,0.0,Everyone,Finance,"April 5, 2018",4.6.71,4.0.3 and up


In [104]:
# perform a left merge on free finance apps and user reviews
free_finance_apps_with_user_reviews = free_finance_apps.merge(google_playstore_user_review_df, on = "App", how = "left")

# print
free_finance_apps_with_user_reviews

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver,Translated_Review,Sentiment,Sentiment_Polarity,Sentiment_Subjectivity
0,K PLUS,FINANCE,4.4,124424,Varies with device,10000000,Free,0.0,Everyone,Finance,"June 26, 2018",4.6.0,4.2 and up,,,,
1,ING Banking,FINANCE,4.4,39041,Varies with device,1000000,Free,0.0,Everyone,Finance,"August 3, 2018",Varies with device,Varies with device,,,,
2,Citibanamex Movil,FINANCE,3.6,52306,42M,5000000,Free,0.0,Everyone,Finance,"July 27, 2018",20.1.0,5.0 and up,"Forget paying app, designed make fail payments...",Negative,-0.50,0.30
3,Citibanamex Movil,FINANCE,3.6,52306,42M,5000000,Free,0.0,Everyone,Finance,"July 27, 2018",20.1.0,5.0 and up,"It's working expected, talking best bank Mexic...",Positive,0.40,0.45
4,Citibanamex Movil,FINANCE,3.6,52306,42M,5000000,Free,0.0,Everyone,Finance,"July 27, 2018",20.1.0,5.0 and up,It has many problems with Android 8.1. You can...,Positive,0.25,0.25
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3367,BankNordik,FINANCE,3.9,28,15M,5000,Free,0.0,Everyone,Finance,"August 8, 2018",7.3.2,5.0 and up,,,,
3368,FP Markets,FINANCE,,1,2.0M,100,Free,0.0,Everyone,Finance,"January 30, 2018",1.0.0.0,4.3 and up,,,,
3369,FP Boss,FINANCE,,1,5.8M,1,Free,0.0,Everyone,Finance,"July 27, 2018",1.0.2,5.0 and up,,,,
3370,FP FCU,FINANCE,3.6,48,26M,5000,Free,0.0,Everyone,Finance,"April 5, 2018",4.6.71,4.0.3 and up,,,,
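
In the merged output, apps without any matching review (for example K PLUS) keep a single row with `NaN` in the review columns, because a left merge preserves every row of the left table. The `indicator=True` option of `merge` makes these rows easy to find. A minimal sketch on toy data:

```python
import pandas as pd

# Toy stand-ins for the apps and reviews tables (values are made up)
apps = pd.DataFrame({"App": ["A", "B"], "Category": ["FINANCE", "FINANCE"]})
reviews = pd.DataFrame({"App": ["A", "A"], "Sentiment_Polarity": [0.5, -0.1]})

# indicator=True adds a "_merge" column: "both" or "left_only" for a left merge
merged = apps.merge(reviews, on="App", how="left", indicator=True)

# Apps that have no review rows at all
apps_without_reviews = merged.loc[merged["_merge"] == "left_only", "App"].tolist()
print(apps_without_reviews)
```

Apps in this list end up with a `NaN` average score after grouping, so they cannot reach the top of a descending sort. An inner merge would drop them up front.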


In [105]:
# calculate a sentiment score as polarity weighted by subjectivity
free_finance_apps_with_user_reviews["Sentiment_Score"] = free_finance_apps_with_user_reviews["Sentiment_Polarity"] * free_finance_apps_with_user_reviews["Sentiment_Subjectivity"]

# aggregate average sentiment score
top_10_user_feedback = free_finance_apps_with_user_reviews.groupby("App").agg({"Sentiment_Score": "mean"})

# keep the 10 apps with the highest average sentiment score
top_10_user_feedback = top_10_user_feedback.sort_values("Sentiment_Score", ascending=False).head(10)

In [106]:
top_10_user_feedback

App,Sentiment_Score
BBVA Spain,0.373276
Associated Credit Union Mobile,0.266245
BZWBK24 mobile,0.250838
A+ Mobile,0.227251
BankMobile Vibe App,0.215809
Current debit card and app made for teens,0.183786
"Even - organize your money, get paid early",0.17743
ACE Elite,0.153703
Credit Karma,0.153241
CNBC: Breaking Business News & Live Market Data,0.149957
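
The dataset description says the sentiment score lies in [-1, 1], which is consistent with a polarity-style value. As an alternative reading of the task, and not the method used above, the ranking could average `Sentiment_Polarity` directly and require a minimum number of reviews so that an app with a single glowing review does not top the list. The sketch below uses made-up data, and `min_reviews` is a hypothetical threshold.

```python
import pandas as pd

# Toy review data (values are made up)
reviews = pd.DataFrame({
    "App": ["A", "A", "A", "B", "C", "C"],
    "Sentiment_Polarity": [0.2, 0.4, 0.6, 1.0, -0.5, 0.1],
})

min_reviews = 2  # hypothetical threshold, not from the original notebook

# Mean polarity and number of non-missing reviews per app
stats = reviews.groupby("App")["Sentiment_Polarity"].agg(["mean", "count"])

# Keep apps with enough reviews, then rank by mean polarity
ranking = (
    stats[stats["count"] >= min_reviews]
    .sort_values("mean", ascending=False)
    .head(10)
)
print(ranking)
```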
