<a href="https://colab.research.google.com/github/ROARMarketingConcepts/Machine-Learning-Projects/blob/master/Google_PlayStore_Apps_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Google PlayStore Apps Analysis

Performed by
Ken Wood

ROAR Marketing Concepts LLC

ken@roarmarketingconcepts.com


### Content

Each app (row) has values for category, rating, size, and more.

### Acknowledgements

This information was scraped from the Google Play Store; without the store's public listings, this app data would not be available.

### Inspiration

The Play Store apps data has enormous potential to drive app-making businesses to success. Actionable insights can be drawn for developers to work on and capture the Android market!

### Mount the Google Drive where the datasets are located...

In [1]:
from google.colab import drive
drive.mount('/gdrive')

Drive already mounted at /gdrive; to attempt to forcibly remount, call drive.mount("/gdrive", force_remount=True).


### Install some necessary packages to perform the required analysis...

In [2]:
!pip install -U scikit-learn
!pip install --user --upgrade tables

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from datetime import datetime, date
from dateutil.relativedelta import relativedelta

from sklearn.preprocessing import StandardScaler

from math import ceil

from keras.callbacks import LambdaCallback
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.layers import LSTM
from keras.optimizers import RMSprop

%matplotlib inline

# Ignore useless warnings (see SciPy issue #5998)
import warnings
warnings.filterwarnings(action="ignore", message="^internal gelsd")

Requirement already up-to-date: scikit-learn in /usr/local/lib/python3.6/dist-packages (0.20.2)
Requirement already up-to-date: tables in /usr/local/lib/python3.6/dist-packages (3.4.4)


Using TensorFlow backend.


### Load the datasets...

In [0]:
reviews = pd.read_csv('/gdrive/My Drive/Colab Notebooks/Google Play Store Apps/google-play-store-apps/googleplaystore_user_reviews.csv')
store = pd.read_csv('/gdrive/My Drive/Colab Notebooks/Google Play Store Apps/google-play-store-apps/googleplaystore.csv')

### Let's look at some properties of the datasets...

In [4]:
reviews.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 64295 entries, 0 to 64294
Data columns (total 5 columns):
App                       64295 non-null object
Translated_Review         37427 non-null object
Sentiment                 37432 non-null object
Sentiment_Polarity        37432 non-null float64
Sentiment_Subjectivity    37432 non-null float64
dtypes: float64(2), object(3)
memory usage: 2.5+ MB


In [5]:
reviews.head()

Unnamed: 0,App,Translated_Review,Sentiment,Sentiment_Polarity,Sentiment_Subjectivity
0,10 Best Foods for You,I like eat delicious food. That's I'm cooking ...,Positive,1.0,0.533333
1,10 Best Foods for You,This help eating healthy exercise regular basis,Positive,0.25,0.288462
2,10 Best Foods for You,,,,
3,10 Best Foods for You,Works great especially going grocery store,Positive,0.4,0.875
4,10 Best Foods for You,Best idea us,Positive,1.0,0.3


In [6]:
store.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10841 entries, 0 to 10840
Data columns (total 13 columns):
App               10841 non-null object
Category          10841 non-null object
Rating            9367 non-null float64
Reviews           10841 non-null object
Size              10841 non-null object
Installs          10841 non-null object
Type              10840 non-null object
Price             10841 non-null object
Content Rating    10840 non-null object
Genres            10841 non-null object
Last Updated      10841 non-null object
Current Ver       10833 non-null object
Android Ver       10838 non-null object
dtypes: float64(1), object(12)
memory usage: 1.1+ MB


In [7]:
store.head()

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up
3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25M,"50,000,000+",Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up
4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8M,"100,000+",Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up


In [0]:
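# One row is misaligned: its 'Installs' field holds 'Free', a value that belongs in 'Type'.
row = store.loc[store['Installs'] == 'Free']
# Shift that row's values one column to the right to realign them.
row = row.shift(1,axis='columns')
# After the shift, copy the app name from 'Category' and the rating from 'Reviews'.
row['App']=row['Category']
row['Rating'] = row['Reviews']
# No category is available for this row; the review count is set by hand.
row['Category'] = np.nan
row['Reviews'] = 19.0
store.loc[store['Installs'] == 'Free'] = row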

In [9]:
store.Installs.value_counts()

1,000,000+        1579
10,000,000+       1252
100,000+          1169
10,000+           1054
1,000+             908
5,000,000+         752
100+               719
500,000+           539
50,000+            479
5,000+             477
100,000,000+       409
10+                386
500+               330
50,000,000+        289
50+                205
5+                  82
500,000,000+        72
1+                  67
1,000,000,000+      58
0+                  14
0                    1
Name: Installs, dtype: int64

In [10]:
store['Installs'] = store['Installs'].map(lambda x: x.rstrip('+'))
store['Installs'] = store['Installs'].str.replace(',', '').astype(float) 
store['Installs'].head()

0       10000.0
1      500000.0
2     5000000.0
3    50000000.0
4      100000.0
Name: Installs, dtype: float64
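
The cleaning step above can also be written so that unexpected strings do not raise an error. The following is a minimal sketch on toy values (the series below is illustrative, not the real dataset), assuming that turning non-numeric entries into `NaN` is acceptable for the analysis:

```python
import pandas as pd

# Toy values in the same format as the 'Installs' column (illustrative only).
installs = pd.Series(['10,000+', '500,000+', '0', '1,000,000,000+', 'Free'])

# Remove thousands separators and the trailing '+' in one pass.
cleaned = installs.str.replace(r'[,+]', '', regex=True)

# errors='coerce' converts anything non-numeric (such as 'Free') to NaN
# instead of raising a ValueError.
installs_numeric = pd.to_numeric(cleaned, errors='coerce')

print(installs_numeric.tolist())
```

With this variant a misaligned row would show up as a missing value after conversion, which can then be inspected with `isna()`.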

### Change the 'Last Updated' column to a pandas datetime variable...

In [0]:
store['Last Updated'] = store['Last Updated'].apply(lambda x: datetime.strptime(x, '%B %d, %Y'))
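
As a possible alternative to `apply` with `datetime.strptime`, pandas can parse the whole column at once with `pd.to_datetime`. This is a sketch on toy strings (not the real column), assuming that unparseable entries should become `NaT` rather than stop the notebook:

```python
import pandas as pd

# Toy strings in the 'Last Updated' format, plus one malformed entry (illustrative only).
dates = pd.Series(['January 7, 2018', 'August 1, 2018', '1.0.19'])

# The explicit format matches 'Month day, Year'; errors='coerce' yields NaT
# for values that do not match instead of raising.
parsed = pd.to_datetime(dates, format='%B %d, %Y', errors='coerce')

print(parsed)
```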

In [12]:
merged = reviews.merge(store, how='left', on = 'App')
merged.head()

Unnamed: 0,App,Translated_Review,Sentiment,Sentiment_Polarity,Sentiment_Subjectivity,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,10 Best Foods for You,I like eat delicious food. That's I'm cooking ...,Positive,1.0,0.533333,HEALTH_AND_FITNESS,4,2490,3.8M,500000.0,Free,0,Everyone 10+,Health & Fitness,2017-02-17,1.9,2.3.3 and up
1,10 Best Foods for You,I like eat delicious food. That's I'm cooking ...,Positive,1.0,0.533333,HEALTH_AND_FITNESS,4,2490,3.8M,500000.0,Free,0,Everyone 10+,Health & Fitness,2017-02-17,1.9,2.3.3 and up
2,10 Best Foods for You,This help eating healthy exercise regular basis,Positive,0.25,0.288462,HEALTH_AND_FITNESS,4,2490,3.8M,500000.0,Free,0,Everyone 10+,Health & Fitness,2017-02-17,1.9,2.3.3 and up
3,10 Best Foods for You,This help eating healthy exercise regular basis,Positive,0.25,0.288462,HEALTH_AND_FITNESS,4,2490,3.8M,500000.0,Free,0,Everyone 10+,Health & Fitness,2017-02-17,1.9,2.3.3 and up
4,10 Best Foods for You,,,,,HEALTH_AND_FITNESS,4,2490,3.8M,500000.0,Free,0,Everyone 10+,Health & Fitness,2017-02-17,1.9,2.3.3 and up
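
Each review appears twice in the merged output above, which suggests that some apps are listed more than once in `store`. A left merge repeats every review once per matching store row. The sketch below reproduces the effect on toy frames (`reviews_toy` and `store_toy` are hypothetical names) and shows one way to avoid it, assuming that keeping the first store row per app is acceptable:

```python
import pandas as pd

# Toy data: app 'A' is listed twice in the store table (illustrative only).
reviews_toy = pd.DataFrame({'App': ['A', 'A', 'B'],
                            'Translated_Review': ['Good', 'Bad', 'Fine']})
store_toy = pd.DataFrame({'App': ['A', 'A', 'B'],
                          'Rating': [4.0, 4.0, 3.5]})

# Merging against duplicated keys multiplies the matching review rows.
merged_raw = reviews_toy.merge(store_toy, how='left', on='App')

# Keep one store row per app, then merge; validate raises if the
# right-hand keys are still not unique.
store_unique = store_toy.drop_duplicates(subset='App')
merged_dedup = reviews_toy.merge(store_unique, how='left', on='App',
                                 validate='many_to_one')

print(len(merged_raw), len(merged_dedup))
```

Later cells call `drop_duplicates` on the review text, which removes the repeats for that step, but any aggregate computed directly on `merged` would weight duplicated apps more heavily.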


In [13]:
review_sentiment = merged[['Translated_Review','Sentiment']].copy()
review_sentiment.dropna(inplace=True)
review_sentiment.drop_duplicates(subset = 'Translated_Review',inplace=True )
review_sentiment.Sentiment.value_counts()

Positive    17593
Negative     6240
Neutral      4161
Name: Sentiment, dtype: int64

### Let's encode the 'Sentiment' column using sklearn's OrdinalEncoder and OneHotEncoder classes...



In [21]:
from sklearn.preprocessing import OrdinalEncoder, OneHotEncoder

sentiment = review_sentiment.Sentiment.values.reshape(-1,1)
ordinal_encoder = OrdinalEncoder()
sentiment_coded = ordinal_encoder.fit_transform(sentiment)

encoder = OneHotEncoder(categories='auto')
sentiment_1hot_encoded = encoder.fit_transform(sentiment_coded.reshape(-1,1))
sentiment_1hot_encoded.toarray()

array([[0., 0., 1.],
       [0., 0., 1.],
       [0., 0., 1.],
       ...,
       [1., 0., 0.],
       [0., 0., 1.],
       [1., 0., 0.]])
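
In scikit-learn 0.20 and later, `OneHotEncoder` accepts string categories directly, so the ordinal step is optional. The sketch below uses toy labels (not the real column) and prints `categories_`, which shows the column order of the one-hot array: categories are sorted, so the third column corresponds to 'Positive'.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Toy sentiment labels as a single-column 2D array (illustrative only).
labels = np.array(['Positive', 'Negative', 'Neutral', 'Positive']).reshape(-1, 1)

encoder = OneHotEncoder(categories='auto')
one_hot = encoder.fit_transform(labels).toarray()

# categories_ lists the learned categories in column order.
print(encoder.categories_)
print(one_hot)
```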

In [0]:
sentiment_analysis = merged.groupby(['App','Rating','Installs'])[['Sentiment_Polarity','Sentiment_Subjectivity']].mean().reset_index()
sentiment_analysis = sentiment_analysis[(sentiment_analysis.Sentiment_Polarity > 0.5) & (sentiment_analysis.Sentiment_Subjectivity > 0.5)]
sentiment_analysis.sort_values(by = ['Rating','Installs'],inplace=True)
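
The cell above averages the sentiment scores per app and keeps only apps whose mean polarity and mean subjectivity both exceed 0.5. A small sketch of the same pattern on toy data (the frame below is illustrative) makes the filtering easier to follow:

```python
import pandas as pd

# Toy review scores for two apps (illustrative only).
toy = pd.DataFrame({
    'App': ['A', 'A', 'B', 'B'],
    'Sentiment_Polarity': [0.8, 0.6, 0.2, 0.4],
    'Sentiment_Subjectivity': [0.7, 0.9, 0.6, 0.8],
})

# Mean score per app; a list selects several columns from the groupby.
means = toy.groupby('App')[['Sentiment_Polarity', 'Sentiment_Subjectivity']].mean().reset_index()

# Keep apps whose mean polarity and mean subjectivity are both above 0.5.
high = means[(means.Sentiment_Polarity > 0.5) & (means.Sentiment_Subjectivity > 0.5)]

print(high)
```

Note that `groupby` drops rows with missing values in the grouping keys by default, so apps without a rating would not appear in a result grouped on `Rating`.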

In [16]:
app_reviews = merged.groupby(['App','Category','Installs','Translated_Review'])[['Sentiment_Polarity','Sentiment_Subjectivity']].mean().reset_index()
app_reviews = app_reviews.sort_index()
app_reviews_high_sentiment = app_reviews[(app_reviews.Sentiment_Polarity > 0.5) & (app_reviews.Sentiment_Subjectivity > 0.5)]
feedback = app_reviews_high_sentiment.groupby(['App'])['Translated_Review']

for key, item in feedback:
    print(key,'\n', feedback.get_group(key), "\n\n")

10 Best Foods for You 
 0                10 best foods 4u Excellent chose foods
3                                               Amazing
5     Awesome resources I begin new journey. I can't...
17                            Excellent It really works
19    Food list easy I predibetic, I scared. All Dr....
21                                                 Good
22                                          Good V good
23    Good health...... Good health first priority.....
24                                  Good healthy foods.
25                                              Good.!!
26                                          Great Great
27    Great Its really best unique provides detailed...
28                                           Great Love
29                                      Great Love food
30                                      Great app. Love
31                                          Great ideas
32    Great wife. My wife enjoy much. She's kinda pe...
34    Greatest ever Comp

In [0]:
store['Reviews Per Install'] = store.Reviews.astype(float)/store.Installs
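
The install counts listed earlier include zero, and dividing by zero produces `inf` or `NaN` in pandas rather than an error. One way to guard the ratio is sketched below on a toy frame (`store_toy` is a hypothetical name), assuming that apps with zero installs should simply get a missing ratio:

```python
import numpy as np
import pandas as pd

# Toy rows: review counts stored as strings, install counts as floats (illustrative only).
store_toy = pd.DataFrame({'Reviews': ['159', '967', '0'],
                          'Installs': [10000.0, 500000.0, 0.0]})

# Convert review counts to numbers; malformed strings become NaN.
reviews_numeric = pd.to_numeric(store_toy['Reviews'], errors='coerce')

# Replace zero installs with NaN so the ratio is missing rather than inf.
installs_nonzero = store_toy['Installs'].replace(0, np.nan)

store_toy['Reviews Per Install'] = reviews_numeric / installs_nonzero
print(store_toy)
```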