## 1. Introduction
<p><img src="https://assets.datacamp.com/production/project_1197/img/google_play_store.png" alt="Google Play logo"></p>
<p>Mobile apps are everywhere. They are easy to create and can be very lucrative from the business standpoint. Specifically, Android is expanding as an operating system and has captured more than 74% of the total market<sup><a href="https://www.statista.com/statistics/272698/global-market-share-held-by-mobile-operating-systems-since-2009">[1]</a></sup>. </p>
<p>The Google Play Store apps data has enormous potential to facilitate data-driven decisions and insights for businesses. In this notebook, we will analyze the Android app market by comparing ~10k apps in Google Play across different categories. We will also use the user reviews to draw a qualitative comparision between the apps.</p>
<p>The dataset you will use here was scraped from Google Play Store in September 2018 and was published on <a href="https://www.kaggle.com/lava18/google-play-store-apps">Kaggle</a>. Here are the details: <br>
<br></p>
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
    <div style="font-size:20px"><b>datasets/apps.csv</b></div>
This file contains all the details of the apps on Google Play. There are 9 features that describe a given app.
<ul>
    <li><b>App:</b> Name of the app</li>
    <li><b>Category:</b> Category of the app. Some examples are: ART_AND_DESIGN, FINANCE, COMICS, BEAUTY etc.</li>
    <li><b>Rating:</b> The current average rating (out of 5) of the app on Google Play</li>
    <li><b>Reviews:</b> Number of user reviews given on the app</li>
    <li><b>Size:</b> Size of the app in MB (megabytes)</li>
    <li><b>Installs:</b> Number of times the app was downloaded from Google Play</li>
    <li><b>Type:</b> Whether the app is paid or free</li>
    <li><b>Price:</b> Price of the app in US$</li>
    <li><b>Last Updated:</b> Date on which the app was last updated on Google Play </li>

</ul>
</div>
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
    <div style="font-size:20px"><b>datasets/user_reviews.csv</b></div>
This file contains a random sample of 100 <i>[most helpful first](https://www.androidpolice.com/2019/01/21/google-play-stores-redesigned-ratings-and-reviews-section-lets-you-easily-filter-by-star-rating/)</i> user reviews for each app. The text in each review has been pre-processed and passed through a sentiment analyzer.
<ul>
    <li><b>App:</b> Name of the app on which the user review was provided. Matches the `App` column of the `apps.csv` file</li>
    <li><b>Review:</b> The pre-processed user review text</li>
    <li><b>Sentiment Category:</b> Sentiment category of the user review - Positive, Negative or Neutral</li>
    <li><b>Sentiment Score:</b> Sentiment score of the user review. It lies between [-1,1]. A higher score denotes a more positive sentiment.</li>

</ul>
</div>
<p>From here on, it will be your task to explore and manipulate the data until you are able to answer the three questions described in the instructions panel.<br></p>

<p>Your three questions are as follows:</p>

1. **Read the ```apps.csv``` file and clean the ```Installscolumn``` to convert it into integer data type.** Save your answer as a DataFrame ```apps```. Going forward, you will do all your analysis on the ```apps``` DataFrame.

2. **Find the number of apps in each category, the average price, and the average rating.** Save your answer as a DataFrame ```app_category_info```. Your should rename the four columns as: ```Category```, ```Number of apps```, ```Average price```, ```Average rating```.

3. **Find the top 10 free ```FINANCE``` apps having the highest average sentiment score.** Save your answer as a DataFrame ```top_10_user_feedback```. Your answer should have exactly 10 rows and two columns named: ```App``` and Sentiment Score, where the average ```Sentiment Score``` is sorted from **highest to lowest.**

In [33]:
# 1st Task

import pandas as pd 

# Read the apps.csv file
apps_with_duplicates = pd.read_csv('datasets\\apps.csv')

# Drop Duplicates
apps = apps_with_duplicates.drop_duplicates()


## Clean and convert Install Column into Integer

# Cleaning the Install Column
chars_to_remove = ['+', ',', '$']
col_to_remove = ['Installs', 'Price']
# Loop in Columns
for col in col_to_remove:
    # Loop in chars
    for char in chars_to_remove:
         apps[col] = apps[col].apply(lambda x: x.replace(char, ''))

# Converting into Integer
import numpy as np 
apps['Installs'] = apps['Installs'].astype(int)
# Convert into float
apps['Price'] = apps['Price'].astype(float)
apps.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9659 entries, 0 to 9658
Data columns (total 14 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Unnamed: 0      9659 non-null   int64  
 1   App             9659 non-null   object 
 2   Category        9659 non-null   object 
 3   Rating          8196 non-null   float64
 4   Reviews         9659 non-null   int64  
 5   Size            8432 non-null   float64
 6   Installs        9659 non-null   int32  
 7   Type            9659 non-null   object 
 8   Price           9659 non-null   float64
 9   Content Rating  9659 non-null   object 
 10  Genres          9659 non-null   object 
 11  Last Updated    9659 non-null   object 
 12  Current Ver     9651 non-null   object 
 13  Android Ver     9657 non-null   object 
dtypes: float64(3), int32(1), int64(2), object(8)
memory usage: 1.1+ MB


In [34]:
# 2nd Task

# Number Ofand Apps in Each Category and avarege Price, average Rating,
app_category_info = apps.groupby('Category').agg({'App': 'count', 'Price': 'mean', 'Rating': 'mean'})
# Renaming the columns
app_category_info = app_category_info.rename(columns={'App': 'Number of apps', 'Price': 'Average price', 'Rating': 'Average rating'})
# Print
app_category_info

Unnamed: 0_level_0,Count of apps,Average price,Average rating
Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
ART_AND_DESIGN,64,0.093281,4.357377
AUTO_AND_VEHICLES,85,0.158471,4.190411
BEAUTY,53,0.0,4.278571
BOOKS_AND_REFERENCE,222,0.539505,4.34497
BUSINESS,420,0.417357,4.098479
COMICS,56,0.0,4.181481
COMMUNICATION,315,0.263937,4.121484
DATING,171,0.160468,3.970149
EDUCATION,119,0.150924,4.364407
ENTERTAINMENT,102,0.078235,4.135294


In [44]:
# 3rd Task

# Reading the user_reviews csv file
reviews = pd.read_csv('datasets\\user_reviews.csv')
# Getting the FINANCE Apps from the main DataFrame
finance_apps = apps[apps['Category'] == 'FINANCE']
# Selecting only the Free Finance Apps
free_finance_apps = finance_apps[finance_apps['Type'] == 'Free']
# Merging free_finance_apps with reviews DataFrame
merged_df = pd.merge(free_finance_apps, reviews, on='App')
# Finding the average sentiment score for each app
app_sentiment_score = merged_df.groupby('App').agg({'Sentiment_Subjectivity': 'mean'})
# Sorting the score
user_feedback = app_sentiment_score.sort_values(by='Sentiment_Subjectivity', ascending=False)
# Getting the first 10 rows
top_10_user_feedback = user_feedback[:10]
# Print Result
top_10_user_feedback


Unnamed: 0_level_0,Sentiment_Subjectivity
App,Unnamed: 1_level_1
BBVA Spain,0.594632
Banorte Movil,0.567312
BankMobile Vibe App,0.56461
Associated Credit Union Mobile,0.559535
BZWBK24 mobile,0.554781
Ecobank Mobile Banking,0.541573
CreditWise from Capital One,0.535131
GoBank,0.530302
Experian - Free Credit Report,0.527846
Discover Mobile,0.527778
