**Importing all libraries**

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import warnings
warnings.filterwarnings(action='ignore')
pd.set_option('display.max_columns',None)
pd.set_option('display.max_rows',None)

**Reading the data from the CSV file**

In [2]:
data = pd.read_csv('playstore_reviews.csv')
data.head()

Unnamed: 0,App,Translated_Review,Sentiment,Sentiment_Polarity,Sentiment_Subjectivity
0,10 Best Foods for You,I like eat delicious food. That's I'm cooking ...,Positive,1.0,0.533333
1,10 Best Foods for You,This help eating healthy exercise regular basis,Positive,0.25,0.288462
2,10 Best Foods for You,,,,
3,10 Best Foods for You,Works great especially going grocery store,Positive,0.4,0.875
4,10 Best Foods for You,Best idea us,Positive,1.0,0.3


**Making a copy of the original data**

In [3]:
df = data.copy()
df.head()

Unnamed: 0,App,Translated_Review,Sentiment,Sentiment_Polarity,Sentiment_Subjectivity
0,10 Best Foods for You,I like eat delicious food. That's I'm cooking ...,Positive,1.0,0.533333
1,10 Best Foods for You,This help eating healthy exercise regular basis,Positive,0.25,0.288462
2,10 Best Foods for You,,,,
3,10 Best Foods for You,Works great especially going grocery store,Positive,0.4,0.875
4,10 Best Foods for You,Best idea us,Positive,1.0,0.3


Here we can see that 'Sentiment' and 'Sentiment_Polarity' carry essentially the same information. Sentiment analysis is the automated process of analyzing text to determine the sentiment expressed (Positive, Negative or Neutral). 'Sentiment_Polarity' expresses the same result as a float in the range -1 to 1, which does not state the character of the review as directly as the label does, so we can drop the 'Sentiment_Polarity' column and keep 'Sentiment'. Since 'Sentiment' already tells us whether a review is positive, negative or neutral, we don't need the 'Translated_Review' variable either.

We also have a column called 'Sentiment_Subjectivity', which indicates how subjective (opinion-based rather than factual) a review is. The range for 'Sentiment_Subjectivity' is 0 to 1.
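To make the relationship between the label and the score concrete, here is a minimal sketch that derives a label from a polarity value. It assumes the label simply follows the sign of the polarity (above 0 is positive, below 0 is negative, exactly 0 is neutral); the notebook does not confirm this rule, and `label_from_polarity` is a hypothetical helper, not part of pandas or of the dataset.

```python
import pandas as pd

def label_from_polarity(p):
    """Map a polarity score in [-1, 1] to a label (assumed rule: sign of the score)."""
    if pd.isna(p):
        return None          # no review text, so no label
    if p > 0:
        return 'Positive'
    if p < 0:
        return 'Negative'
    return 'Neutral'

# Toy polarity values, including one missing score.
sample = pd.Series([1.0, 0.25, None, -0.4, 0.0])
labels = sample.apply(label_from_polarity)
print(labels.tolist())
```

If the rule holds for the real data, the label is a coarser version of the score, which is why keeping only one of the two columns loses little for this analysis.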

**Dropping unnecessary variables from the data**

In [4]:
# Drop the two columns by name; passing the labels directly is clearer than passing a sub-DataFrame.
df = df.drop(columns=['Sentiment_Polarity', 'Translated_Review'])
df.head()

Unnamed: 0,App,Sentiment,Sentiment_Subjectivity
0,10 Best Foods for You,Positive,0.533333
1,10 Best Foods for You,Positive,0.288462
2,10 Best Foods for You,,
3,10 Best Foods for You,Positive,0.875
4,10 Best Foods for You,Positive,0.3


**Checking the shape of the data (rows & columns)**

In [5]:
# Now we can see we have only three columns.
# Let's check the shape of the data.
df.shape

(64295, 3)

**Checking the Datatypes**

In [6]:
# let's check the datatypes.
df.dtypes

App                        object
Sentiment                  object
Sentiment_Subjectivity    float64
dtype: object

**Checking the information of the data**

In [7]:
# let's check the information of the data.
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 64295 entries, 0 to 64294
Data columns (total 3 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   App                     64295 non-null  object 
 1   Sentiment               37432 non-null  object 
 2   Sentiment_Subjectivity  37432 non-null  float64
dtypes: float64(1), object(2)
memory usage: 1.5+ MB


**Checking The Null Values**

In [8]:
df.isnull().sum()

App                           0
Sentiment                 26863
Sentiment_Subjectivity    26863
dtype: int64

In [9]:
# As we can see, 'Sentiment' and 'Sentiment_Subjectivity' each have 26863 null values.
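The counts above mean that roughly 42% of the 64295 rows (26863 / 64295) have no sentiment at all. A small sketch of how the share of nulls per column could be computed, and how to check whether both columns are missing on the same rows; the `toy` frame is made-up data for illustration only.

```python
import numpy as np
import pandas as pd

# Made-up rows that mimic the structure of the real data.
toy = pd.DataFrame({
    'App': ['A', 'A', 'B', 'B'],
    'Sentiment': ['Positive', np.nan, 'Negative', np.nan],
    'Sentiment_Subjectivity': [0.5, np.nan, 0.3, np.nan],
})

# The mean of a boolean mask is the fraction of True values, i.e. the null share.
null_share = toy.isnull().mean()

# True when the two columns are missing on exactly the same rows.
same_rows = (toy['Sentiment'].isnull() == toy['Sentiment_Subjectivity'].isnull()).all()

print(null_share)
print(same_rows)
```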

**Getting the unique values from each variable.**

In [10]:
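def unique(d, columns):
    # Map each column name to the list of its unique values.
    return {i: list(d[i].unique()) for i in columns}

def categorical_data(d):
    # Names of the columns stored with the object (string) dtype.
    return [i for i in d.columns if d.dtypes[i] == 'object']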
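Listing every unique value gets long for a column such as 'App'. As an alternative sketch, the non-numeric columns can be selected with `select_dtypes` and summarized with `nunique`, which returns only the number of distinct values per column. The `toy` frame below is illustrative data, not the real dataset.

```python
import pandas as pd

toy = pd.DataFrame({
    'App': ['A', 'A', 'B'],
    'Sentiment': ['Positive', 'Negative', 'Positive'],
    'Sentiment_Subjectivity': [0.5, 0.3, 0.9],
})

# Everything that is not numeric is treated as categorical here.
categorical_cols = toy.select_dtypes(exclude='number').columns.tolist()

# Number of distinct values per categorical column.
unique_counts = toy[categorical_cols].nunique()

print(categorical_cols)
print(unique_counts)
```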

In [11]:
unique(df,categorical_data(df))

{'App': ['10 Best Foods for You',
  '104 找工作 - 找工作 找打工 找兼職 履歷健檢 履歷診療室',
  '11st',
  '1800 Contacts - Lens Store',
  '1LINE – One Line with One Touch',
  '2018Emoji Keyboard 😂 Emoticons Lite -sticker&gif',
  '21-Day Meditation Experience',
  '2Date Dating App, Love and matching',
  '2GIS: directory & navigator',
  '2RedBeans',
  '2ndLine - Second Phone Number',
  '30 Day Fitness Challenge - Workout at Home',
  '365Scores - Live Scores',
  '3D Blue Glass Water Keyboard Theme',
  '3D Color Pixel by Number - Sandbox Art Coloring',
  '3D Live Neon Weed Launcher',
  '4 in a Row',
  '4K Wallpapers and Ultra HD Backgrounds',
  '591房屋交易-租屋、中古屋、新建案、實價登錄、別墅透天、公寓套房、捷運、買房賣房行情、房價房貸查詢',
  '591房屋交易-香港',
  '7 Cups: Anxiety & Stress Chat',
  '7 Day Food Journal Challenge',
  '7 Minute Workout',
  '7 Weeks - Habit & Goal Tracker',
  '8 Ball Pool',
  '850 Sports News Digest',
  '8fit Workouts & Meal Planner',
  '95Live -SG#1 Live Streaming App',
  'A Call From Santa Claus!',
  'A Manual of Acupuncture',
 

**Replacing the values with their numeric form in the "Sentiment" column.**

In [12]:
# Encode the labels as numbers in one step; missing values stay NaN.
df['Sentiment'] = df['Sentiment'].map({'Positive': 1, 'Negative': -1, 'Neutral': 0})

In [13]:
df['Sentiment'].unique()

array([ 1., nan,  0., -1.])

In [14]:
df.dtypes

App                        object
Sentiment                 float64
Sentiment_Subjectivity    float64
dtype: object

**Null value treatment by forward-filling the data**

In [15]:
# Now we can see that all the values in 'Sentiment' were updated and the column became a numeric data type.
# So now we can fill the null values by forward fill (each null takes the value of the previous row).
df['Sentiment'] = df['Sentiment'].ffill(axis=0)
df['Sentiment_Subjectivity'] = df['Sentiment_Subjectivity'].ffill(axis=0)

In [16]:
df.isnull().sum()

App                       0
Sentiment                 0
Sentiment_Subjectivity    0
dtype: int64

In [17]:
# Now we can see that there are no null values right now.
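Forward fill copies the previous row's value into each gap, so a missing review can inherit the sentiment of a different review, or even of a different app when the gap sits at the start of an app's block. The sketch below contrasts three options on made-up data: a plain forward fill, a forward fill restricted to each app, and simply dropping the rows with no sentiment. Which one is appropriate depends on the goal of the analysis; none of them is presented as the correct treatment.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'App': ['A', 'A', 'B', 'B'],
    'Sentiment': [1.0, np.nan, np.nan, -1.0],
})

# Plain forward fill: the first row of app B inherits the value from app A.
filled = toy.assign(Sentiment=toy['Sentiment'].ffill())

# Forward fill within each app: values never cross an app boundary.
per_app = toy.assign(Sentiment=toy.groupby('App')['Sentiment'].ffill())

# Dropping the rows keeps only reviews that actually have a sentiment.
dropped = toy.dropna(subset=['Sentiment'])

print(filled)
print(per_app)
print(dropped)
```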

**Checking the duplicate values in the Dataset**

In [18]:
duplicate = df[df.duplicated()]
duplicate.shape

(41022, 3)

In [19]:
duplicate.head(15)

Unnamed: 0,App,Sentiment,Sentiment_Subjectivity
2,10 Best Foods for You,1.0,0.288462
5,10 Best Foods for You,1.0,0.3
7,10 Best Foods for You,1.0,0.9
9,10 Best Foods for You,0.0,0.0
12,10 Best Foods for You,1.0,0.875
15,10 Best Foods for You,1.0,0.511111
18,10 Best Foods for You,1.0,0.1
19,10 Best Foods for You,1.0,1.0
22,10 Best Foods for You,0.0,0.0
24,10 Best Foods for You,1.0,0.5


**Dropping the duplicated values and checking the shape**

In [20]:
df.drop_duplicates(keep='first',inplace=True)

In [21]:
df.duplicated().sum()

0

In [22]:
df.shape

(23273, 3)
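The large number of duplicates is partly a consequence of the earlier column drop: `duplicated()` compares only the columns that remain, so two different reviews of the same app with the same sentiment and subjectivity now look identical. A small sketch on made-up data shows how the count changes once the review text is removed.

```python
import pandas as pd

toy = pd.DataFrame({
    'App': ['A', 'A', 'A', 'B'],
    'Translated_Review': ['Great app', 'Love it', 'Great app', 'Love it'],
    'Sentiment': [1, 1, 1, 1],
})

# With the review text, only the repeated 'Great app' row for app A is a duplicate.
dups_with_text = int(toy.duplicated().sum())

# Without the text, every later row of app A matches the first one.
dups_without_text = int(toy.drop(columns=['Translated_Review']).duplicated().sum())

print(dups_with_text, dups_without_text)
```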

In [23]:
# Checking the number of unique apps in the data.
df['App'].nunique()

1074

**Checking the correlation, variance and covariance of the data**

In [24]:
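# numeric_only=True skips the text column 'App', which recent pandas versions no longer drop silently.
df.corr(numeric_only=True)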

Unnamed: 0,Sentiment,Sentiment_Subjectivity
Sentiment,1.0,0.089396
Sentiment_Subjectivity,0.089396,1.0


In [25]:
df.var(numeric_only=True)

Sentiment                 0.752125
Sentiment_Subjectivity    0.046922
dtype: float64

In [26]:
df.cov(numeric_only=True)

Unnamed: 0,Sentiment,Sentiment_Subjectivity
Sentiment,0.752125,0.016794
Sentiment_Subjectivity,0.016794,0.046922
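The three outputs above are linked: the Pearson correlation is the covariance divided by the product of the two standard deviations, and each standard deviation is the square root of the corresponding variance. As a quick check, the sketch below recomputes the correlation from the variance and covariance values printed above.

```python
import math

# Values copied from the df.var() and df.cov() outputs above.
var_sentiment = 0.752125
var_subjectivity = 0.046922
cov_xy = 0.016794

# Pearson r = cov(x, y) / (std(x) * std(y))
r = cov_xy / math.sqrt(var_sentiment * var_subjectivity)
print(round(r, 4))  # about 0.0894, matching the df.corr() output
```

A value this close to 0 indicates almost no linear relationship between 'Sentiment' and 'Sentiment_Subjectivity'.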


**Finding the apps and their highest positive response.**

In [27]:
df.groupby('App')['Sentiment'].max()

App
10 Best Foods for You                                 1.0
104 找工作 - 找工作 找打工 找兼職 履歷健檢 履歷診療室                      1.0
11st                                                  1.0
1800 Contacts - Lens Store                            1.0
1LINE – One Line with One Touch                       1.0
2018Emoji Keyboard 😂 Emoticons Lite -sticker&gif      1.0
21-Day Meditation Experience                          1.0
2Date Dating App, Love and matching                   1.0
2GIS: directory & navigator                           1.0
2RedBeans                                             1.0
2ndLine - Second Phone Number                         1.0
30 Day Fitness Challenge - Workout at Home            1.0
365Scores - Live Scores                               1.0
3D Blue Glass Water Keyboard Theme                    1.0
3D Color Pixel by Number - Sandbox Art Coloring       1.0
3D Live Neon Weed Launcher                            1.0
4 in a Row                                            1.0
4K Wallpap

In [28]:
# Counting how many apps reach each maximum 'Sentiment' value.

In [29]:
# The maximum 'Sentiment' per app shows that 1000 apps have at least one positive review.
# For 41 apps the maximum is -1, so every review kept for them is negative.
# For 33 apps the maximum is 0, so they have no positive review (neutral at best).
df.groupby('App')['Sentiment'].max().value_counts()

 1.0    1000
-1.0      41
 0.0      33
Name: Sentiment, dtype: int64
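Because `max()` keeps only the best value per app, it hides how many reviews of each kind an app received. The sketch below, on made-up data, puts the per-app maximum next to a full count of negative, neutral and positive reviews built with `pd.crosstab`.

```python
import pandas as pd

toy = pd.DataFrame({
    'App': ['A', 'A', 'B', 'B', 'C'],
    'Sentiment': [1, -1, 0, -1, -1],
})

# Best sentiment each app ever received.
best = toy.groupby('App')['Sentiment'].max()

# Rows are apps, columns are sentiment values, cells are review counts.
counts = pd.crosstab(toy['App'], toy['Sentiment'])

print(best)
print(counts)
```

Here app A has a maximum of 1 even though half of its reviews are negative, which the count table makes visible.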

In [30]:
# Let's see the description of the data per app, including the count and the five-number summary (min, 25%, 50%, 75%, max).

In [31]:
df.groupby(['App'])['Sentiment'].describe()

App,count,mean,std,min,25%,50%,75%,max
10 Best Foods for You,41.0,0.780488,0.61287,-1.0,1.0,1.0,1.0,1.0
104 找工作 - 找工作 找打工 找兼職 履歷健檢 履歷診療室,23.0,0.826087,0.491026,-1.0,1.0,1.0,1.0,1.0
11st,25.0,0.32,0.9,-1.0,-1.0,1.0,1.0,1.0
1800 Contacts - Lens Store,30.0,0.733333,0.639684,-1.0,1.0,1.0,1.0,1.0
1LINE – One Line with One Touch,21.0,0.571429,0.810643,-1.0,1.0,1.0,1.0,1.0
2018Emoji Keyboard 😂 Emoticons Lite -sticker&gif,18.0,0.777778,0.548319,-1.0,1.0,1.0,1.0,1.0
21-Day Meditation Experience,39.0,0.717949,0.686284,-1.0,1.0,1.0,1.0,1.0
"2Date Dating App, Love and matching",28.0,0.535714,0.838082,-1.0,0.75,1.0,1.0,1.0
2GIS: directory & navigator,27.0,0.407407,0.843949,-1.0,0.0,1.0,1.0,1.0
2RedBeans,23.0,0.73913,0.619192,-1.0,1.0,1.0,1.0,1.0
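The per-app description can also be used to rank apps by their mean sentiment. The sketch below computes only the count and the mean with `agg`, then keeps apps with enough reviews before sorting. The data is made up, and `min_reviews` is a hypothetical threshold chosen for illustration, not a value taken from the notebook.

```python
import pandas as pd

toy = pd.DataFrame({
    'App': ['A'] * 4 + ['B'] * 2 + ['C'] * 4,
    'Sentiment': [1, 1, 1, -1, 1, 1, 0, -1, 1, -1],
})

# Count and mean sentiment per app.
summary = toy.groupby('App')['Sentiment'].agg(['count', 'mean'])

# Hypothetical threshold: ignore apps with too few reviews to compare fairly.
min_reviews = 3
ranked = summary[summary['count'] >= min_reviews].sort_values('mean', ascending=False)

print(ranked)
```

App B has the highest mean but only two reviews, so the threshold removes it from the ranking.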
