## 1. Introduction
<p><img src="https://assets.datacamp.com/production/project_1197/img/google_play_store.png" alt="Google Play logo"></p>
<p>Mobile apps are everywhere. They are easy to create and can be very lucrative from a business standpoint. Specifically, Android is expanding as an operating system and has captured more than 74% of the total market<sup><a href="https://www.statista.com/statistics/272698/global-market-share-held-by-mobile-operating-systems-since-2009">[1]</a></sup>.</p>
<p>The Google Play Store apps data has enormous potential to facilitate data-driven decisions and insights for businesses. In this notebook, we will analyze the Android app market by comparing ~10k apps in Google Play across different categories. We will also use the user reviews to draw a qualitative comparison between the apps.</p>
<p>The dataset you will use here was scraped from the Google Play Store in September 2018 and was published on <a href="https://www.kaggle.com/lava18/google-play-store-apps">Kaggle</a>. Here are the details: <br>
<br></p>
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
    <div style="font-size:20px"><b>datasets/apps.csv</b></div>
This file contains all the details of the apps on Google Play. There are 9 features that describe a given app.
<ul>
    <li><b>App:</b> Name of the app</li>
    <li><b>Category:</b> Category of the app, for example ART_AND_DESIGN, FINANCE, COMICS, or BEAUTY</li>
    <li><b>Rating:</b> The current average rating (out of 5) of the app on Google Play</li>
    <li><b>Reviews:</b> Number of user reviews given on the app</li>
    <li><b>Size:</b> Size of the app in MB (megabytes)</li>
    <li><b>Installs:</b> Number of times the app was downloaded from Google Play</li>
    <li><b>Type:</b> Whether the app is paid or free</li>
    <li><b>Price:</b> Price of the app in US$</li>
    <li><b>Last Updated:</b> Date on which the app was last updated on Google Play </li>

</ul>
</div>
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
    <div style="font-size:20px"><b>datasets/user_reviews.csv</b></div>
This file contains a random sample of 100 <i><a href="https://www.androidpolice.com/2019/01/21/google-play-stores-redesigned-ratings-and-reviews-section-lets-you-easily-filter-by-star-rating/">most helpful first</a></i> user reviews for each app. The text in each review has been pre-processed and passed through a sentiment analyzer.
<ul>
    <li><b>App:</b> Name of the app on which the user review was provided. Matches the <code>App</code> column of the <code>apps.csv</code> file</li>
    <li><b>Review:</b> The pre-processed user review text</li>
    <li><b>Sentiment Category:</b> Sentiment category of the user review - Positive, Negative or Neutral</li>
    <li><b>Sentiment Score:</b> Sentiment score of the user review, in the range [-1, 1]. A higher score denotes a more positive sentiment.</li>

</ul>
</div>
<p>From here on, it will be your task to explore and manipulate the data until you are able to answer the three questions described below.</p>

You work as a Data Analyst for a finance company that is closely eyeing the Android market before it launches its new app on Google Play. You have been asked to present an analysis of Google Play apps so that the team gets a comprehensive overview of different app categories, their ratings, and other metrics.

This will require you to use your data manipulation and data analysis skills.

Your three questions are as follows:

1. Read the `apps.csv` file and clean the `Installs` column to convert it to an integer data type. Save your answer as a DataFrame `apps`. Going forward, you will do all your analysis on the `apps` DataFrame.
2. Find the number of apps in each category, the average price, and the average rating. Save your answer as a DataFrame `app_category_info`. You should rename the four columns as: `Category`, `Number of apps`, `Average price`, `Average rating`.
3. Find the top 10 free FINANCE apps with the highest average sentiment score. Save your answer as a DataFrame `top_10_user_feedback`. Your answer should have exactly 10 rows and two columns named `App` and `Sentiment Score`, with the average `Sentiment Score` sorted from highest to lowest.

In [149]:
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/Jeksik/The-Android-App-Market-on-Google-Play/main/googleplaystore.csv')

In [150]:
df['Installs']

0            10,000+
1           500,000+
2         5,000,000+
3        50,000,000+
4           100,000+
            ...     
10836         5,000+
10837           100+
10838         1,000+
10839         1,000+
10840    10,000,000+
Name: Installs, Length: 10841, dtype: object

In [151]:
# Strip the "+" suffix and the thousands separators, then cast to int64.
# A malformed row has "Free" in this column; map it to 0 so the cast succeeds.
# Assigning the whole column avoids the chained-assignment warning of df[col][i] = ...
df['Installs'] = (df['Installs']
                  .str.replace('+', '', regex=False)
                  .str.replace(',', '', regex=False)
                  .replace('Free', '0')
                  .astype('int64'))
df['Installs']

0           10000
1          500000
2         5000000
3        50000000
4          100000
           ...   
10836        5000
10837         100
10838        1000
10839        1000
10840    10000000
Name: Installs, Length: 10841, dtype: int64
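
The `Installs` cleaning above can be tried on a small hypothetical sample (the values below are made up). The sketch also uses `pd.to_numeric(..., errors='coerce')`, which turns unparseable strings into `NaN` instead of raising, so that odd values such as `'Free'` can be inspected before deciding how to handle them.

```python
import pandas as pd

# Hypothetical sample of raw Installs strings; 'Free' imitates the value
# that the notebook's cleaning code special-cases.
raw = pd.Series(['10,000+', '500,000+', '0', 'Free'], name='Installs')

# Remove the '+' suffix and thousands separators (literal match, not regex).
stripped = (raw.str.replace('+', '', regex=False)
               .str.replace(',', '', regex=False))

# errors='coerce' turns non-numeric strings into NaN instead of raising.
parsed = pd.to_numeric(stripped, errors='coerce')
print(raw[parsed.isna()].tolist())  # ['Free']

# Map the unparseable entries to 0, as the notebook does, and cast.
installs = parsed.fillna(0).astype('int64')
print(installs.tolist())  # [10000, 500000, 0, 0]
```

Dropping the flagged rows instead of mapping them to 0 is an equally valid choice; it depends on whether those rows should count in later aggregates.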

In [152]:
apps = df
apps

|  | App | Category | Rating | Reviews | Size | Installs | Type | Price | Content Rating | Genres | Last Updated | Current Ver | Android Ver |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Photo Editor & Candy Camera & Grid & ScrapBook | ART_AND_DESIGN | 4.1 | 159 | 19M | 10000 | Free | 0 | Everyone | Art & Design | January 7, 2018 | 1.0.0 | 4.0.3 and up |
| 1 | Coloring book moana | ART_AND_DESIGN | 3.9 | 967 | 14M | 500000 | Free | 0 | Everyone | Art & Design;Pretend Play | January 15, 2018 | 2.0.0 | 4.0.3 and up |
| 2 | U Launcher Lite – FREE Live Cool Themes, Hide ... | ART_AND_DESIGN | 4.7 | 87510 | 8.7M | 5000000 | Free | 0 | Everyone | Art & Design | August 1, 2018 | 1.2.4 | 4.0.3 and up |
| 3 | Sketch - Draw & Paint | ART_AND_DESIGN | 4.5 | 215644 | 25M | 50000000 | Free | 0 | Teen | Art & Design | June 8, 2018 | Varies with device | 4.2 and up |
| 4 | Pixel Draw - Number Art Coloring Book | ART_AND_DESIGN | 4.3 | 967 | 2.8M | 100000 | Free | 0 | Everyone | Art & Design;Creativity | June 20, 2018 | 1.1 | 4.4 and up |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 10836 | Sya9a Maroc - FR | FAMILY | 4.5 | 38 | 53M | 5000 | Free | 0 | Everyone | Education | July 25, 2017 | 1.48 | 4.1 and up |
| 10837 | Fr. Mike Schmitz Audio Teachings | FAMILY | 5.0 | 4 | 3.6M | 100 | Free | 0 | Everyone | Education | July 6, 2018 | 1.0 | 4.1 and up |
| 10838 | Parkinson Exercices FR | MEDICAL |  | 3 | 9.5M | 1000 | Free | 0 | Everyone | Medical | January 20, 2017 | 1.0 | 2.2 and up |
| 10839 | The SCP Foundation DB fr nn5n | BOOKS_AND_REFERENCE | 4.5 | 114 | Varies with device | 1000 | Free | 0 | Mature 17+ | Books & Reference | January 19, 2015 | Varies with device | Varies with device |


In [153]:
# Strip the "$" prefix and cast to float.
# A malformed row has "Everyone" in this column; map it to 0 so the cast succeeds.
# Assigning the whole column avoids the chained-assignment warning of df[col][i] = ...
df['Price'] = (df['Price'].astype(str)
               .str.replace('$', '', regex=False)
               .replace('Everyone', '0')
               .astype(float))
df['Price']

0        0.0
1        0.0
2        0.0
3        0.0
4        0.0
        ... 
10836    0.0
10837    0.0
10838    0.0
10839    0.0
10840    0.0
Name: Price, Length: 10841, dtype: float64
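
The `'Free'` and `'Everyone'` replacements, and the later `drop(labels='1.9')`, appear to work around one malformed record whose values are shifted one column to the left. An alternative, sketched below on a hypothetical two-row sample, is to drop that row before any conversion so that no special-case values are needed. The sample values are assumptions modeled on the special cases in the notebook, not rows copied from the file.

```python
import pandas as pd

# Hypothetical two-row sample. The second row imitates a record whose values
# are shifted one column to the left (assumed values).
df = pd.DataFrame({
    'App': ['Example app', 'Shifted row'],
    'Category': ['FINANCE', '1.9'],
    'Installs': ['1,000+', 'Free'],
    'Price': ['$2.99', 'Everyone'],
})

# Drop the malformed row first; .copy() makes the filtered frame independent,
# so the column assignments below do not trigger SettingWithCopyWarning.
clean = df[df['Category'] != '1.9'].copy()

# With the bad row gone, plain string cleaning and casting is enough.
clean['Installs'] = (clean['Installs']
                     .str.replace('+', '', regex=False)
                     .str.replace(',', '', regex=False)
                     .astype('int64'))
clean['Price'] = clean['Price'].str.replace('$', '', regex=False).astype(float)
print(clean)
```

Filtering up front also means the category aggregation no longer needs to drop a `'1.9'` group afterwards.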

In [154]:
import numpy as np

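# Count the apps in each category and average their price and rating.
# Passing 'mean' as a string avoids the FutureWarning that newer pandas
# versions emit when np.mean is passed to a groupby aggregation.
apps1 = apps.groupby("Category").agg(Numberofapps=('App', 'count'),
                                     Averageprice=('Price', 'mean'),
                                     Averagerating=('Rating', 'mean'))
apps1 = apps1.rename(columns={'Numberofapps': 'Number of apps', 'Averagerating': 'Average rating', 'Averageprice': 'Average Price'})

# '1.9' is not a real category: it appears to come from a malformed row
# whose values are shifted by one column, so drop it.
apps1 = apps1.drop(labels='1.9')
apps1 = apps1.reset_index()
app_category_info = apps1
app_category_info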

|  | Category | Number of apps | Average Price | Average rating |
|---|---|---|---|---|
| 0 | ART_AND_DESIGN | 65 | 0.091846 | 4.358065 |
| 1 | AUTO_AND_VEHICLES | 85 | 0.158471 | 4.190411 |
| 2 | BEAUTY | 53 | 0.0 | 4.278571 |
| 3 | BOOKS_AND_REFERENCE | 231 | 0.518485 | 4.346067 |
| 4 | BUSINESS | 460 | 0.402761 | 4.121452 |
| 5 | COMICS | 60 | 0.0 | 4.155172 |
| 6 | COMMUNICATION | 387 | 0.214832 | 4.158537 |
| 7 | DATING | 234 | 0.134316 | 3.970769 |
| 8 | EDUCATION | 156 | 0.115128 | 4.389032 |
| 9 | ENTERTAINMENT | 149 | 0.053557 | 4.126174 |
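
The aggregate-then-rename pattern above can be collapsed into one step. pandas named aggregation accepts output column names containing spaces if they are passed through dictionary unpacking, so the columns can be created directly with the names the task asks for (`Number of apps`, `Average price`, `Average rating`). A minimal sketch on a hypothetical, already-cleaned sample:

```python
import pandas as pd

# Hypothetical cleaned sample with numeric Price and Rating columns.
apps = pd.DataFrame({
    'App': ['a', 'b', 'c', 'd'],
    'Category': ['FINANCE', 'FINANCE', 'COMICS', 'COMICS'],
    'Price': [0.0, 2.0, 0.0, 0.0],
    'Rating': [4.0, 5.0, 3.5, None],
})

# Dict unpacking lets the output columns carry names with spaces,
# so no separate rename step is needed. 'mean' skips missing ratings.
app_category_info = (
    apps.groupby('Category')
        .agg(**{'Number of apps': ('App', 'count'),
                'Average price': ('Price', 'mean'),
                'Average rating': ('Rating', 'mean')})
        .reset_index()
)
print(app_category_info)
```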


In [155]:
df2 = pd.read_csv('https://raw.githubusercontent.com/Jeksik/The-Android-App-Market-on-Google-Play/main/googleplaystore_user_reviews.csv')
df2 = df2.drop(['Translated_Review','Sentiment','Sentiment_Subjectivity'], axis=1)
df2

|  | App | Sentiment_Polarity |
|---|---|---|
| 0 | 10 Best Foods for You | 1.00 |
| 1 | 10 Best Foods for You | 0.25 |
| 2 | 10 Best Foods for You |  |
| 3 | 10 Best Foods for You | 0.40 |
| 4 | 10 Best Foods for You | 1.00 |
| ... | ... | ... |
| 64290 | Houzz Interior Design Ideas |  |
| 64291 | Houzz Interior Design Ideas |  |
| 64292 | Houzz Interior Design Ideas |  |
| 64293 | Houzz Interior Design Ideas |  |


In [156]:
df_df2 = df.merge(df2, on='App', suffixes=('_df','_df2'))
df_df2

|  | App | Category | Rating | Reviews | Size | Installs | Type | Price | Content Rating | Genres | Last Updated | Current Ver | Android Ver | Sentiment_Polarity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Coloring book moana | ART_AND_DESIGN | 3.9 | 967 | 14M | 500000 | Free | 0.0 | Everyone | Art & Design;Pretend Play | January 15, 2018 | 2.0.0 | 4.0.3 and up | -0.250 |
| 1 | Coloring book moana | ART_AND_DESIGN | 3.9 | 967 | 14M | 500000 | Free | 0.0 | Everyone | Art & Design;Pretend Play | January 15, 2018 | 2.0.0 | 4.0.3 and up | -0.725 |
| 2 | Coloring book moana | ART_AND_DESIGN | 3.9 | 967 | 14M | 500000 | Free | 0.0 | Everyone | Art & Design;Pretend Play | January 15, 2018 | 2.0.0 | 4.0.3 and up | 0.000 |
| 3 | Coloring book moana | ART_AND_DESIGN | 3.9 | 967 | 14M | 500000 | Free | 0.0 | Everyone | Art & Design;Pretend Play | January 15, 2018 | 2.0.0 | 4.0.3 and up |  |
| 4 | Coloring book moana | ART_AND_DESIGN | 3.9 | 967 | 14M | 500000 | Free | 0.0 | Everyone | Art & Design;Pretend Play | January 15, 2018 | 2.0.0 | 4.0.3 and up | 0.500 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 122657 | A+ Gallery - Photos & Videos | PHOTOGRAPHY | 4.5 | 223941 | Varies with device | 10000000 | Free | 0.0 | Everyone | Photography | August 6, 2018 | Varies with device | Varies with device |  |
| 122658 | A+ Gallery - Photos & Videos | PHOTOGRAPHY | 4.5 | 223941 | Varies with device | 10000000 | Free | 0.0 | Everyone | Photography | August 6, 2018 | Varies with device | Varies with device |  |
| 122659 | A+ Gallery - Photos & Videos | PHOTOGRAPHY | 4.5 | 223941 | Varies with device | 10000000 | Free | 0.0 | Everyone | Photography | August 6, 2018 | Varies with device | Varies with device | 0.200 |
| 122660 | A+ Gallery - Photos & Videos | PHOTOGRAPHY | 4.5 | 223941 | Varies with device | 10000000 | Free | 0.0 | Everyone | Photography | August 6, 2018 | Varies with device | Varies with device | 0.000 |
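
`df.merge(df2, on='App')` is an inner join: only apps present in both files survive, and if the apps table lists the same app more than once, each of its reviews is repeated once per listing. For a per-app average this repetition cancels out, but it inflates row counts. The sketch below uses hypothetical data to show the effect, and uses `indicator=True` on an outer join to see which keys matched on each side.

```python
import pandas as pd

# Hypothetical sample: app 'A' is listed twice in the apps table,
# 'B' has no reviews, and 'Z' has reviews but no app record.
apps = pd.DataFrame({'App': ['A', 'A', 'B', 'C'],
                     'Type': ['Free', 'Free', 'Free', 'Paid']})
reviews = pd.DataFrame({'App': ['A', 'A', 'C', 'Z'],
                        'Sentiment_Polarity': [0.5, None, -0.2, 0.1]})

# Default how='inner': each review of 'A' matches both 'A' rows, so it appears twice.
print(len(apps.merge(reviews, on='App')))  # 5

# Removing duplicate app rows first gives one row per matched review.
unique_apps = apps.drop_duplicates(subset='App')
merged = unique_apps.merge(reviews, on='App')
print(len(merged))  # 3

# An outer join with indicator=True labels each row 'both', 'left_only' or 'right_only'.
check = unique_apps.merge(reviews, on='App', how='outer', indicator=True)
print(check['_merge'].value_counts())
```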


In [157]:
top_10_user_feedback = df_df2[((df_df2['Type'] == 'Free') & (df_df2['Category'] == 'FINANCE'))][['App','Sentiment_Polarity']]
top_10_user_feedback

|  | App | Sentiment_Polarity |
|---|---|---|
| 26528 | Citibanamex Movil | -0.500000 |
| 26529 | Citibanamex Movil | 0.400000 |
| 26530 | Citibanamex Movil | 0.250000 |
| 26531 | Citibanamex Movil | 0.175000 |
| 26532 | Citibanamex Movil | -0.158333 |
| ... | ... | ... |
| 121337 | Fortune City - A Finance App |  |
| 121338 | Fortune City - A Finance App |  |
| 121339 | Fortune City - A Finance App |  |
| 121340 | Fortune City - A Finance App |  |


In [158]:
# Treat reviews without a sentiment score as neutral (0) before averaging.
# fillna returns a new DataFrame, so there is no chained-assignment warning.
top_10_user_feedback = top_10_user_feedback.fillna({'Sentiment_Polarity': 0})

top_10_user_feedback = top_10_user_feedback.groupby('App').agg({'Sentiment_Polarity': 'mean'})
top_10_user_feedback = top_10_user_feedback.reset_index()
top_10_user_feedback


|  | App | Sentiment_Polarity |
|---|---|---|
| 0 | A+ Mobile | 0.276857 |
| 1 | ACE Elite | 0.208041 |
| 2 | Acorns - Invest Spare Change | 0.002917 |
| 3 | Amex Mobile | 0.1581 |
| 4 | Associated Credit Union Mobile | 0.339581 |
| 5 | BBVA Compass Banking | 0.149053 |
| 6 | BBVA Spain | 0.424946 |
| 7 | BZWBK24 mobile | 0.187957 |
| 8 | Bank of America Mobile Banking | 0.144021 |
| 9 | BankMobile Vibe App | 0.203236 |
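
The cell above replaces missing `Sentiment_Polarity` values with 0 before averaging. That is a defensible choice, but it counts reviews without a score as neutral and so pulls an app's average toward zero, whereas pandas' `groupby().mean()` skips `NaN` by default. The sketch below compares the two on made-up data; on the real data the two choices could produce a different top 10.

```python
import pandas as pd

# Hypothetical reviews: app 'X' has two scored reviews and two reviews
# without a score (NaN); app 'Y' has one scored review.
reviews = pd.DataFrame({'App': ['X', 'X', 'X', 'X', 'Y'],
                        'Sentiment_Polarity': [0.5, 0.25, None, None, 0.5]})

# Approach used above: missing scores count as neutral (0).
filled = (reviews.fillna({'Sentiment_Polarity': 0})
                 .groupby('App')['Sentiment_Polarity'].mean())

# pandas default: groupby().mean() skips NaN and averages scored reviews only.
skipped = reviews.groupby('App')['Sentiment_Polarity'].mean()

print(filled['X'], skipped['X'])  # 0.1875 0.375
```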


In [159]:
top_10_user_feedback = top_10_user_feedback.rename(columns={'Sentiment_Polarity': 'Sentiment Score'})



In [161]:
top_10_user_feedback = top_10_user_feedback.sort_values('Sentiment Score',ascending=False)
top_10_user_feedback = top_10_user_feedback[:10].reset_index(drop=True)
top_10_user_feedback

|  | App | Sentiment Score |
|---|---|---|
| 0 | BBVA Spain | 0.424946 |
| 1 | Associated Credit Union Mobile | 0.339581 |
| 2 | Current debit card and app made for teens | 0.327258 |
| 3 | A+ Mobile | 0.276857 |
| 4 | Branch | 0.26423 |
| 5 | CNBC: Breaking Business News & Live Market Data | 0.229126 |
| 6 | CreditWise from Capital One | 0.227482 |
| 7 | ACE Elite | 0.208041 |
| 8 | BankMobile Vibe App | 0.203236 |
| 9 | BZWBK24 mobile | 0.187957 |
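
Sorting in descending order and slicing the first 10 rows, as above, can also be written with `DataFrame.nlargest`, which returns the `n` rows with the largest values in a column, already in descending order. A minimal sketch on hypothetical scores:

```python
import pandas as pd

# Hypothetical per-app average scores.
scores = pd.DataFrame({'App': ['a', 'b', 'c', 'd'],
                       'Sentiment Score': [0.1, 0.4, 0.3, 0.2]})

# Equivalent to sort_values('Sentiment Score', ascending=False).head(2).
top = scores.nlargest(2, 'Sentiment Score').reset_index(drop=True)
print(top)
```

With ties at the cutoff, `nlargest` keeps the first occurrences by default (`keep='first'`); `keep='all'` returns every tied row, which may yield more than `n` rows.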


In [162]:
apps

|  | App | Category | Rating | Reviews | Size | Installs | Type | Price | Content Rating | Genres | Last Updated | Current Ver | Android Ver |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Photo Editor & Candy Camera & Grid & ScrapBook | ART_AND_DESIGN | 4.1 | 159 | 19M | 10000 | Free | 0.0 | Everyone | Art & Design | January 7, 2018 | 1.0.0 | 4.0.3 and up |
| 1 | Coloring book moana | ART_AND_DESIGN | 3.9 | 967 | 14M | 500000 | Free | 0.0 | Everyone | Art & Design;Pretend Play | January 15, 2018 | 2.0.0 | 4.0.3 and up |
| 2 | U Launcher Lite – FREE Live Cool Themes, Hide ... | ART_AND_DESIGN | 4.7 | 87510 | 8.7M | 5000000 | Free | 0.0 | Everyone | Art & Design | August 1, 2018 | 1.2.4 | 4.0.3 and up |
| 3 | Sketch - Draw & Paint | ART_AND_DESIGN | 4.5 | 215644 | 25M | 50000000 | Free | 0.0 | Teen | Art & Design | June 8, 2018 | Varies with device | 4.2 and up |
| 4 | Pixel Draw - Number Art Coloring Book | ART_AND_DESIGN | 4.3 | 967 | 2.8M | 100000 | Free | 0.0 | Everyone | Art & Design;Creativity | June 20, 2018 | 1.1 | 4.4 and up |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 10836 | Sya9a Maroc - FR | FAMILY | 4.5 | 38 | 53M | 5000 | Free | 0.0 | Everyone | Education | July 25, 2017 | 1.48 | 4.1 and up |
| 10837 | Fr. Mike Schmitz Audio Teachings | FAMILY | 5.0 | 4 | 3.6M | 100 | Free | 0.0 | Everyone | Education | July 6, 2018 | 1.0 | 4.1 and up |
| 10838 | Parkinson Exercices FR | MEDICAL |  | 3 | 9.5M | 1000 | Free | 0.0 | Everyone | Medical | January 20, 2017 | 1.0 | 2.2 and up |
| 10839 | The SCP Foundation DB fr nn5n | BOOKS_AND_REFERENCE | 4.5 | 114 | Varies with device | 1000 | Free | 0.0 | Mature 17+ | Books & Reference | January 19, 2015 | Varies with device | Varies with device |


In [168]:
app_category_info.round({'Average Price':0,'Average rating':1})

|  | Category | Number of apps | Average Price | Average rating |
|---|---|---|---|---|
| 0 | ART_AND_DESIGN | 65 | 0.0 | 4.4 |
| 1 | AUTO_AND_VEHICLES | 85 | 0.0 | 4.2 |
| 2 | BEAUTY | 53 | 0.0 | 4.3 |
| 3 | BOOKS_AND_REFERENCE | 231 | 1.0 | 4.3 |
| 4 | BUSINESS | 460 | 0.0 | 4.1 |
| 5 | COMICS | 60 | 0.0 | 4.2 |
| 6 | COMMUNICATION | 387 | 0.0 | 4.2 |
| 7 | DATING | 234 | 0.0 | 4.0 |
| 8 | EDUCATION | 156 | 0.0 | 4.4 |
| 9 | ENTERTAINMENT | 149 | 0.0 | 4.1 |


In [165]:
top_10_user_feedback.round(2)

|  | App | Sentiment Score |
|---|---|---|
| 0 | BBVA Spain | 0.42 |
| 1 | Associated Credit Union Mobile | 0.34 |
| 2 | Current debit card and app made for teens | 0.33 |
| 3 | A+ Mobile | 0.28 |
| 4 | Branch | 0.26 |
| 5 | CNBC: Breaking Business News & Live Market Data | 0.23 |
| 6 | CreditWise from Capital One | 0.23 |
| 7 | ACE Elite | 0.21 |
| 8 | BankMobile Vibe App | 0.2 |
| 9 | BZWBK24 mobile | 0.19 |
