## 1. Introduction
<p><img src="https://assets.datacamp.com/production/project_1197/img/google_play_store.png" alt="Google Play logo"></p>
<p>Mobile apps are everywhere. They are easy to create and can be very lucrative from the business standpoint. Specifically, Android is expanding as an operating system and has captured more than 74% of the total market<sup><a href="https://www.statista.com/statistics/272698/global-market-share-held-by-mobile-operating-systems-since-2009">[1]</a></sup>. </p>
<p>The Google Play Store apps data has enormous potential to facilitate data-driven decisions and insights for businesses. In this notebook, we will analyze the Android app market by comparing ~10k apps in Google Play across different categories. We will also use the user reviews to draw a qualitative comparison between the apps.</p>
<p>The dataset you will use here was scraped from Google Play Store in September 2018 and was published on <a href="https://www.kaggle.com/lava18/google-play-store-apps">Kaggle</a>. Here are the details: <br>
<br></p>
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
    <div style="font-size:20px"><b>datasets/apps.csv</b></div>
This file contains all the details of the apps on Google Play. There are 9 features that describe a given app.
<ul>
    <li><b>App:</b> Name of the app</li>
    <li><b>Category:</b> Category of the app. Some examples are: ART_AND_DESIGN, FINANCE, COMICS, BEAUTY etc.</li>
    <li><b>Rating:</b> The current average rating (out of 5) of the app on Google Play</li>
    <li><b>Reviews:</b> Number of user reviews given on the app</li>
    <li><b>Size:</b> Size of the app in MB (megabytes)</li>
    <li><b>Installs:</b> Number of times the app was downloaded from Google Play</li>
    <li><b>Type:</b> Whether the app is paid or free</li>
    <li><b>Price:</b> Price of the app in US$</li>
    <li><b>Last Updated:</b> Date on which the app was last updated on Google Play </li>

</ul>
</div>
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
    <div style="font-size:20px"><b>datasets/user_reviews.csv</b></div>
This file contains a random sample of 100 <i><a href="https://www.androidpolice.com/2019/01/21/google-play-stores-redesigned-ratings-and-reviews-section-lets-you-easily-filter-by-star-rating/">most helpful first</a></i> user reviews for each app. The text in each review has been pre-processed and passed through a sentiment analyzer.
<ul>
    <li><b>App:</b> Name of the app on which the user review was provided. Matches the <code>App</code> column of the <code>apps.csv</code> file</li>
    <li><b>Review:</b> The pre-processed user review text</li>
    <li><b>Sentiment Category:</b> Sentiment category of the user review - Positive, Negative or Neutral</li>
    <li><b>Sentiment Score:</b> Sentiment score of the user review. It lies between [0,1]. A higher score denotes a more positive sentiment.</li>

</ul>
</div>
<p>From here on, it will be your task to explore and manipulate the data until you are able to answer the three questions described in the instructions panel.<br></p>
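
Before starting, it can help to confirm that a loaded file has the columns described above. The following is a minimal sketch, not part of the original project: the helper `missing_columns`, the constant `EXPECTED_APP_COLUMNS`, and the one-row CSV text are hypothetical stand-ins, and the column names are taken from the description of `datasets/apps.csv`.

```python
import io
import pandas as pd

# Columns listed in the description of datasets/apps.csv
EXPECTED_APP_COLUMNS = ['App', 'Category', 'Rating', 'Reviews', 'Size',
                        'Installs', 'Type', 'Price', 'Last Updated']

def missing_columns(df, expected):
    """Return the expected column names that are absent from df."""
    return [col for col in expected if col not in df.columns]

# Hypothetical one-row CSV standing in for the real file
csv_text = (
    'App,Category,Rating,Reviews,Size,Installs,Type,Price,Last Updated\n'
    'Demo App,FINANCE,4.2,10,5.0,"1,000+",Free,0.0,"June 8, 2018"\n'
)
demo = pd.read_csv(io.StringIO(csv_text))

# An empty list means every described column is present
print(missing_columns(demo, EXPECTED_APP_COLUMNS))
```

With the real data, the same check could be applied to the frame returned by `pd.read_csv('datasets/apps.csv')`.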

## Task 1:

### Import necessary libraries

In [87]:
import pandas as pd
import numpy as np

### Importing and cleaning dataset

In [88]:
# The path of the file is 'datasets/apps.csv'
apps = pd.read_csv('datasets/apps.csv')

# Clean the Installs column by removing commas and plus signs
chars_to_remove = [',', '+']
for char in chars_to_remove:
    apps['Installs'] = apps['Installs'].apply(lambda x: x.replace(char, ''))

# Change the data type for Installs
apps['Installs'] = apps['Installs'].astype(np.int64)

# Preview the cleaned data
apps.head()

,App,Category,Rating,Reviews,Size,Installs,Type,Price,Last Updated
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19.0,10000,Free,0.0,"January 7, 2018"
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14.0,500000,Free,0.0,"January 15, 2018"
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7,5000000,Free,0.0,"August 1, 2018"
3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25.0,50000000,Free,0.0,"June 8, 2018"
4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8,100000,Free,0.0,"June 20, 2018"
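
The cleaning step above can also be written with pandas' vectorized string methods instead of a Python loop. This is a minimal sketch on a small made-up sample (the `sample` frame and its values are illustrative, not rows from `apps.csv`), and it assumes every `Installs` value consists only of digits, commas, and plus signs.

```python
import pandas as pd

# Hypothetical sample mimicking the raw Installs format (not real rows from apps.csv)
sample = pd.DataFrame({'Installs': ['10,000+', '500,000+', '5,000,000+', '0']})

# Remove commas and plus signs in one pass, then convert to 64-bit integers
sample['Installs'] = (
    sample['Installs']
    .str.replace(r'[,+]', '', regex=True)
    .astype('int64')
)

print(sample['Installs'].tolist())
```

If a value contained any other non-digit character, `astype('int64')` would raise an error, which makes unexpected formats visible early.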


## Task 2:

### Get the mean price and rating for each app category

In [89]:
# Get the average rating and average price for each app category.
# Selecting the numeric columns before calling mean() avoids errors on text columns.
app_category_info = apps.groupby('Category')[['Rating', 'Price']].mean()
app_category_info.reset_index(inplace=True)

### Count the apps in each category and create the final DataFrame

In [90]:
# Count the number of apps in each category
num_of_apps = apps.groupby('Category').size().reset_index(name='Number of apps')
app_category_info = pd.merge(left=app_category_info, right=num_of_apps, on='Category', how='inner')

# Rename the columns
app_category_info.rename(columns={'Rating': 'Average rating', 'Price': 'Average price'}, inplace=True)
app_category_info.head()

,Category,Average rating,Average price,Number of apps
0,ART_AND_DESIGN,4.357377,0.093281,64
1,AUTO_AND_VEHICLES,4.190411,0.158471,85
2,BEAUTY,4.278571,0.0,53
3,BOOKS_AND_REFERENCE,4.34497,0.539505,222
4,BUSINESS,4.098479,0.417357,420
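
The per-category summary built in the two cells above can also be produced in a single `groupby` call with named aggregation. The sketch below is one possible alternative, not the notebook's method; the `sample` frame is made up for illustration, and the output column names are chosen to match the table above.

```python
import pandas as pd

# Hypothetical sample (not real rows from apps.csv)
sample = pd.DataFrame({
    'App': ['a', 'b', 'c', 'd'],
    'Category': ['FINANCE', 'FINANCE', 'BEAUTY', 'BEAUTY'],
    'Rating': [4.0, 5.0, 3.0, None],
    'Price': [0.0, 2.0, 0.0, 0.0],
})

# Named aggregation: each output column maps to (source column, function).
# The dict is unpacked because the output names contain spaces.
summary = (
    sample.groupby('Category')
    .agg(**{
        'Average rating': ('Rating', 'mean'),
        'Average price': ('Price', 'mean'),
        'Number of apps': ('App', 'size'),
    })
    .reset_index()
)

print(summary)
```

Note that `mean` skips missing ratings, while `size` counts every row in the group, so an app without a rating still counts toward `Number of apps`.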


## Task 3:

### Combining the App dataset and sentiment dataset

In [91]:
# Select the free finance apps, read the sentiment dataset, and merge the two on the app name
fin_apps = apps[(apps['Category'] == 'FINANCE') & (apps['Type'] == 'Free')]
sen_df = pd.read_csv('datasets/user_reviews.csv')
merge_df = pd.merge(left=fin_apps, right=sen_df, on='App', how='inner')

### Calculate the mean sentiment score per app and get the result

In [92]:
# Average the sentiment score of the reviews for each app
merge_df = merge_df.groupby('App')[['Sentiment Score']].mean()

# First, select the 10 apps with the highest average sentiment score
top_10 = merge_df.sort_values('Sentiment Score', ascending=False).head(10)

# Then sort those apps alphabetically by name
top_10_user_feedback = top_10.sort_values('App', ascending=True)
top_10_user_feedback

App,Sentiment Score
A+ Mobile,0.329592
Associated Credit Union Mobile,0.388093
BBVA Spain,0.515086
BZWBK24 mobile,0.326883
BankMobile Vibe App,0.353455
Branch,0.26423
Credit Karma,0.270052
Current debit card and app made for teens,0.327258
"Even - organize your money, get paid early",0.283929
Fortune City - A Finance App,0.266966
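
The Task 3 steps (filter, merge, average, take the top 10, sort by name) can be sketched as one chained pipeline. The frames `apps_s` and `reviews_s` below are hypothetical stand-ins for `apps.csv` and `user_reviews.csv`, with made-up values, so the result only illustrates the shape of the computation.

```python
import pandas as pd

# Hypothetical stand-ins for apps.csv and user_reviews.csv
apps_s = pd.DataFrame({
    'App': ['Bank A', 'Bank B', 'Game C', 'Bank D'],
    'Category': ['FINANCE', 'FINANCE', 'GAME', 'FINANCE'],
    'Type': ['Free', 'Free', 'Free', 'Paid'],
})
reviews_s = pd.DataFrame({
    'App': ['Bank A', 'Bank A', 'Bank B', 'Game C', 'Bank D'],
    'Sentiment Score': [0.25, 0.5, 0.75, 0.5, 1.0],
})

# Keep only free finance apps, then attach their reviews
free_finance = apps_s[(apps_s['Category'] == 'FINANCE') & (apps_s['Type'] == 'Free')]
merged = free_finance.merge(reviews_s, on='App', how='inner')

# Mean score per app, the 10 highest means, then alphabetical order by app name
top = (
    merged.groupby('App')['Sentiment Score']
    .mean()
    .nlargest(10)
    .sort_index()
    .to_frame()
)

print(top)
```

Here `nlargest(10)` replaces the sort-then-`head(10)` pair, and `sort_index()` orders by the `App` index. The paid app and the non-finance app are dropped by the filter before any averaging happens.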
