## 1. Introduction
<p><img src="https://assets.datacamp.com/production/project_1197/img/google_play_store.png" alt="Google Play logo"></p>
<p>Mobile apps are everywhere. They are easy to create and can be very lucrative from a business standpoint. Android in particular keeps expanding as an operating system and has captured more than 74% of the total market<sup><a href="https://www.statista.com/statistics/272698/global-market-share-held-by-mobile-operating-systems-since-2009">[1]</a></sup>.</p>
<p>The Google Play Store apps data has enormous potential to support data-driven decisions and insights for businesses. In this notebook, we will analyze the Android app market by comparing ~10k apps in Google Play across different categories. We will also use the user reviews to draw a qualitative comparison between the apps.</p>
<p>The dataset you will use here was scraped from the Google Play Store in September 2018 and published on <a href="https://www.kaggle.com/lava18/google-play-store-apps">Kaggle</a>. Here are the details: <br>
<br></p>
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
    <div style="font-size:20px"><b>datasets/apps.csv</b></div>
This file contains all the details of the apps on Google Play. There are 9 features that describe a given app.
<ul>
    <li><b>App:</b> Name of the app</li>
    <li><b>Category:</b> Category of the app. Some examples are: ART_AND_DESIGN, FINANCE, COMICS, BEAUTY etc.</li>
    <li><b>Rating:</b> The current average rating (out of 5) of the app on Google Play</li>
    <li><b>Reviews:</b> Number of user reviews given on the app</li>
    <li><b>Size:</b> Size of the app in MB (megabytes)</li>
    <li><b>Installs:</b> Number of times the app was downloaded from Google Play</li>
    <li><b>Type:</b> Whether the app is paid or free</li>
    <li><b>Price:</b> Price of the app in US$</li>
    <li><b>Last Updated:</b> Date on which the app was last updated on Google Play </li>

</ul>
</div>
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
    <div style="font-size:20px"><b>datasets/user_reviews.csv</b></div>
This file contains a random sample of 100 <i><a href="https://www.androidpolice.com/2019/01/21/google-play-stores-redesigned-ratings-and-reviews-section-lets-you-easily-filter-by-star-rating/">most helpful first</a></i> user reviews for each app. The text in each review has been pre-processed and passed through a sentiment analyzer.
<ul>
    <li><b>App:</b> Name of the app on which the user review was provided. Matches the <code>App</code> column of the <code>apps.csv</code> file</li>
    <li><b>Review:</b> The pre-processed user review text</li>
    <li><b>Sentiment Category:</b> Sentiment category of the user review - Positive, Negative or Neutral</li>
    <li><b>Sentiment Score:</b> Sentiment score of the user review. It lies between [-1,1]. A higher score denotes a more positive sentiment.</li>

</ul>
</div>
<p>From here on, it will be your task to explore and manipulate the data until you are able to answer the three questions described in the instructions panel.<br></p>

In [3]:
import pandas as pd
import numpy as np

apps=pd.read_csv('datasets/apps.csv')
print(apps.sample(n=1))
print('*************************************************************')
print(apps.info())

                   App Category  Rating  Reviews  Size  Installs  Type  Price  \
7713  DS Tower Defence     GAME     3.2      768   1.4  100,000+  Free    0.0   

      Last Updated  
7713  June 5, 2013  
*************************************************************
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9659 entries, 0 to 9658
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   App           9659 non-null   object 
 1   Category      9659 non-null   object 
 2   Rating        8196 non-null   float64
 3   Reviews       9659 non-null   int64  
 4   Size          8432 non-null   float64
 5   Installs      9659 non-null   object 
 6   Type          9659 non-null   object 
 7   Price         9659 non-null   float64
 8   Last Updated  9659 non-null   object 
dtypes: float64(3), int64(1), object(5)
memory usage: 679.3+ KB
None


In [4]:
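# Strip '+' and ',' so the install counts can be cast to integers; regex=True is required for the character class
apps['Installs']=apps['Installs'].str.replace('[+,]','',regex=True).astype('int')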

print(apps.sample(n=1).T)

                                                           1530
App           DEAD TARGET: FPS Zombie Apocalypse Survival Games
Category                                                   GAME
Rating                                                      4.5
Reviews                                                 1468591
Size                                                        NaN
Installs                                               50000000
Type                                                       Free
Price                                                         0
Last Updated                                      July 23, 2018
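
The cleaning step above depends on `'[+,]'` being read as a regular-expression character class. Since pandas 2.0, `Series.str.replace` treats the pattern as a literal string unless `regex=True` is passed, so it is safer to state it explicitly. The sketch below shows this on a toy Series; the values and the names `raw_installs` and `clean_installs` are illustrative assumptions, not rows taken from `apps.csv`.

```python
import pandas as pd

# Toy values in the same format as the raw 'Installs' column (illustrative only).
raw_installs = pd.Series(['100,000+', '50,000,000+', '10+'])

# regex=True makes '[+,]' a character class that matches either '+' or ','.
clean_installs = raw_installs.str.replace('[+,]', '', regex=True).astype(int)

print(clean_installs.tolist())  # [100000, 50000000, 10]
```

With a literal match instead, none of the strings would change and the cast to `int` would fail on values such as `'100,000+'`.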


In [5]:
app_category_info=apps.groupby('Category').agg({'App':'count','Price':'mean','Rating':'mean'})
app_category_info=app_category_info.rename(columns={'App':'Number of apps','Price':'Average price','Rating':'Average rating'})
app_category_info.reset_index(level=0, inplace=True)
print(app_category_info)

               Category  Number of apps  Average price  Average rating
0        ART_AND_DESIGN              64       0.093281        4.357377
1     AUTO_AND_VEHICLES              85       0.158471        4.190411
2                BEAUTY              53       0.000000        4.278571
3   BOOKS_AND_REFERENCE             222       0.539505        4.344970
4              BUSINESS             420       0.417357        4.098479
5                COMICS              56       0.000000        4.181481
6         COMMUNICATION             315       0.263937        4.121484
7                DATING             171       0.160468        3.970149
8             EDUCATION             119       0.150924        4.364407
9         ENTERTAINMENT             102       0.078235        4.135294
10               EVENTS              64       1.718594        4.435556
11               FAMILY            1832       1.309967        4.179664
12              FINANCE             345       8.408203        4.115563
13    
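
An equivalent way to build a per-category summary like this is pandas named aggregation, which sets the output column names in the same call. The sketch below uses a small made-up frame (the name `toy` and its values are assumptions for illustration). It also shows that `mean` skips missing ratings while `count` on `App` still counts every app, which matters here because `Rating` has missing values.

```python
import numpy as np
import pandas as pd

# Toy frame with the same column names as apps.csv (values are made up).
toy = pd.DataFrame({
    'App': ['a', 'b', 'c', 'd'],
    'Category': ['GAME', 'GAME', 'FINANCE', 'FINANCE'],
    'Price': [0.0, 2.0, 0.0, 4.0],
    'Rating': [4.0, np.nan, 3.0, 5.0],
})

# Named aggregation: output name -> (input column, aggregation function).
# The dict is unpacked because the output names contain spaces.
summary = (
    toy.groupby('Category')
       .agg(**{
           'Number of apps': ('App', 'count'),
           'Average price': ('Price', 'mean'),
           'Average rating': ('Rating', 'mean'),
       })
       .reset_index()
)
print(summary)
```

In this toy data, GAME has two apps but only one rating, so its average rating is based on a single value.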

In [6]:
df2=pd.read_csv('datasets/user_reviews.csv')
merged_df=apps.merge(df2,on='App')
print(merged_df.columns)
print('*************************************************************')
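# Combine both conditions in a single boolean mask instead of chaining two indexing operations
filtered_merged_df=merged_df[(merged_df['Category']=='FINANCE') & (merged_df['Type']=='Free')]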
print(filtered_merged_df)

Index(['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type',
       'Price', 'Last Updated', 'Review', 'Sentiment Category',
       'Sentiment Score'],
      dtype='object')
*************************************************************
                                App Category  Rating  Reviews  Size  Installs  \
14112             Citibanamex Movil  FINANCE     3.6    52306  42.0   5000000   
14113             Citibanamex Movil  FINANCE     3.6    52306  42.0   5000000   
14114             Citibanamex Movil  FINANCE     3.6    52306  42.0   5000000   
14115             Citibanamex Movil  FINANCE     3.6    52306  42.0   5000000   
14116             Citibanamex Movil  FINANCE     3.6    52306  42.0   5000000   
...                             ...      ...     ...      ...   ...       ...   
60231  Fortune City - A Finance App  FINANCE     4.6    49275  91.0    500000   
60232  Fortune City - A Finance App  FINANCE     4.6    49275  91.0    500000   
60233  Fortune City -

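
Filtering on two conditions is usually written as a single boolean mask joined with `&`. Indexing a filtered frame with a mask built on the unfiltered frame makes pandas reindex the mask and emit a warning. The sketch below illustrates the merge and the combined mask on made-up data; `toy_apps`, `toy_reviews`, and the optional `validate='one_to_many'` check are assumptions added for illustration and are not part of the original cell.

```python
import pandas as pd

# Made-up stand-ins for apps.csv and user_reviews.csv.
toy_apps = pd.DataFrame({
    'App': ['Bank A', 'Bank B', 'Game C'],
    'Category': ['FINANCE', 'FINANCE', 'GAME'],
    'Type': ['Free', 'Paid', 'Free'],
})
toy_reviews = pd.DataFrame({
    'App': ['Bank A', 'Bank A', 'Bank B', 'Game C'],
    'Sentiment Score': [0.5, 0.1, -0.2, 0.9],
})

# Inner join on 'App'; validate raises MergeError if an app name
# appears more than once on the left (apps) side.
merged = toy_apps.merge(toy_reviews, on='App', how='inner', validate='one_to_many')

# One mask, both conditions; the parentheses are required around each comparison.
mask = (merged['Category'] == 'FINANCE') & (merged['Type'] == 'Free')
free_finance = merged[mask]
print(free_finance)
```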


In [8]:
# Average sentiment score per free finance app, highest first, limited to the top 10
top_10_user_feedback=filtered_merged_df.groupby('App').agg({'Sentiment Score':'mean'}).sort_values(by='Sentiment Score',ascending=False).reset_index(level=0).head(10)
top_10_user_feedback

                                          App  Sentiment Score
0                                  BBVA Spain         0.515086
1              Associated Credit Union Mobile         0.388093
2                         BankMobile Vibe App         0.353455
3                                   A+ Mobile         0.329592
4   Current debit card and app made for teens         0.327258
5                              BZWBK24 mobile         0.326883
6  Even - organize your money, get paid early         0.283929
7                                Credit Karma         0.270052
8                Fortune City - A Finance App         0.266966
9                                      Branch          0.26423
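
A mean sentiment score says nothing about how many reviews it is based on, so it can help to show the review count next to it when reading a ranking like this one. The following is a minimal sketch on made-up data, not the notebook's actual result; the names `toy` and `ranking` are assumptions.

```python
import pandas as pd

# Made-up reviews: app B has a single, very positive review.
toy = pd.DataFrame({
    'App': ['A', 'A', 'A', 'B', 'C', 'C'],
    'Sentiment Score': [0.2, 0.4, 0.6, 0.9, 0.1, 0.3],
})

# Mean score and number of scored reviews per app, best mean first.
ranking = (
    toy.groupby('App')['Sentiment Score']
       .agg(['mean', 'count'])
       .sort_values('mean', ascending=False)
       .reset_index()
)
print(ranking)
```

Here app B ranks first on the strength of one review, which the `count` column makes visible.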
