## Summary of Data

In [None]:
import pandas as pd
import numpy as np
from textblob import TextBlob

In [None]:
data = pd.read_csv("employee.csv")
data.head()

| | Age | Attrition | BusinessTravel | DailyRate | Department | DistanceFromHome | Education | EducationField | EmployeeCount | EmployeeNumber | ... | RelationshipSatisfaction | StandardHours | StockOptionLevel | TotalWorkingYears | TrainingTimesLastYear | WorkLifeBalance | YearsAtCompany | YearsInCurrentRole | YearsSinceLastPromotion | YearsWithCurrManager |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 41 | Yes | Travel_Rarely | 1102 | Sales | 1 | 2 | Life Sciences | 1 | 1 | ... | 1 | 80 | 0 | 8 | 0 | 1 | 6 | 4 | 0 | 5 |
| 1 | 49 | No | Travel_Frequently | 279 | Research & Development | 8 | 1 | Life Sciences | 1 | 2 | ... | 4 | 80 | 1 | 10 | 3 | 3 | 10 | 7 | 1 | 7 |
| 2 | 37 | Yes | Travel_Rarely | 1373 | Research & Development | 2 | 2 | Other | 1 | 4 | ... | 2 | 80 | 0 | 7 | 3 | 3 | 0 | 0 | 0 | 0 |
| 3 | 33 | No | Travel_Frequently | 1392 | Research & Development | 3 | 4 | Life Sciences | 1 | 5 | ... | 3 | 80 | 0 | 8 | 3 | 3 | 8 | 7 | 3 | 0 |
| 4 | 27 | No | Travel_Rarely | 591 | Research & Development | 2 | 1 | Medical | 1 | 7 | ... | 4 | 80 | 1 | 6 | 3 | 3 | 2 | 2 | 2 | 2 |


### To describe numerical data

In [None]:
data.describe()

| | Age | DailyRate | DistanceFromHome | Education | EmployeeCount | EmployeeNumber | EnvironmentSatisfaction | HourlyRate | JobInvolvement | JobLevel | ... | RelationshipSatisfaction | StandardHours | StockOptionLevel | TotalWorkingYears | TrainingTimesLastYear | WorkLifeBalance | YearsAtCompany | YearsInCurrentRole | YearsSinceLastPromotion | YearsWithCurrManager |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1470.0 | 1470.0 | 1470.0 | 1470.0 | 1470.0 | 1470.0 | 1470.0 | 1470.0 | 1470.0 | 1470.0 | ... | 1470.0 | 1470.0 | 1470.0 | 1470.0 | 1470.0 | 1470.0 | 1470.0 | 1470.0 | 1470.0 | 1470.0 |
| mean | 36.92381 | 802.485714 | 9.192517 | 2.912925 | 1.0 | 1024.865306 | 2.721769 | 65.891156 | 2.729932 | 2.063946 | ... | 2.712245 | 80.0 | 0.793878 | 11.279592 | 2.79932 | 2.761224 | 7.008163 | 4.229252 | 2.187755 | 4.123129 |
| std | 9.135373 | 403.5091 | 8.106864 | 1.024165 | 0.0 | 602.024335 | 1.093082 | 20.329428 | 0.711561 | 1.10694 | ... | 1.081209 | 0.0 | 0.852077 | 7.780782 | 1.289271 | 0.706476 | 6.126525 | 3.623137 | 3.22243 | 3.568136 |
| min | 18.0 | 102.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 30.0 | 1.0 | 1.0 | ... | 1.0 | 80.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 30.0 | 465.0 | 2.0 | 2.0 | 1.0 | 491.25 | 2.0 | 48.0 | 2.0 | 1.0 | ... | 2.0 | 80.0 | 0.0 | 6.0 | 2.0 | 2.0 | 3.0 | 2.0 | 0.0 | 2.0 |
| 50% | 36.0 | 802.0 | 7.0 | 3.0 | 1.0 | 1020.5 | 3.0 | 66.0 | 3.0 | 2.0 | ... | 3.0 | 80.0 | 1.0 | 10.0 | 3.0 | 3.0 | 5.0 | 3.0 | 1.0 | 3.0 |
| 75% | 43.0 | 1157.0 | 14.0 | 4.0 | 1.0 | 1555.75 | 4.0 | 83.75 | 3.0 | 3.0 | ... | 4.0 | 80.0 | 1.0 | 15.0 | 3.0 | 3.0 | 9.0 | 7.0 | 3.0 | 7.0 |
| max | 60.0 | 1499.0 | 29.0 | 5.0 | 1.0 | 2068.0 | 4.0 | 100.0 | 4.0 | 5.0 | ... | 4.0 | 80.0 | 3.0 | 40.0 | 6.0 | 4.0 | 40.0 | 18.0 | 15.0 | 17.0 |
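To make each row of the `describe()` output concrete, the sketch below recomputes the same statistics by hand on a small made-up series (the values are illustrative and not taken from `employee.csv`). It assumes the default `describe()` settings, under which pandas reports the sample standard deviation (`ddof=1`) and linearly interpolated percentiles.

```python
import pandas as pd

# Toy values standing in for one numeric column (not taken from employee.csv)
s = pd.Series([1.0, 2.0, 3.0, 4.0])

summary = s.describe()

# Recompute each statistic that describe() reports, in the same order
manual = pd.Series({
    "count": float(s.count()),
    "mean": s.mean(),
    "std": s.std(ddof=1),     # sample standard deviation, as in describe()
    "min": s.min(),
    "25%": s.quantile(0.25),  # linear interpolation between data points
    "50%": s.median(),
    "75%": s.quantile(0.75),
    "max": s.max(),
})

print(summary)
print(manual)
```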


### To describe object data

In [None]:
data.describe(include="object")

| | Attrition | BusinessTravel | Department | EducationField | Gender | JobRole | MaritalStatus | Over18 | OverTime |
|---|---|---|---|---|---|---|---|---|---|
| count | 1470 | 1470 | 1470 | 1470 | 1470 | 1470 | 1470 | 1470 | 1470 |
| unique | 2 | 3 | 3 | 6 | 2 | 9 | 3 | 1 | 2 |
| top | No | Travel_Rarely | Research & Development | Life Sciences | Male | Sales Executive | Married | Y | No |
| freq | 1233 | 1043 | 961 | 606 | 882 | 326 | 673 | 1470 | 1054 |
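Both summaries point to columns with no variation: `EmployeeCount` and `StandardHours` have a `std` of `0.0`, and `Over18` has a single unique value, so none of them can distinguish one employee from another. A minimal sketch for finding such columns with `DataFrame.nunique()`, shown on a small hypothetical frame that mimics a few of the real column names:

```python
import pandas as pd

# Hypothetical three-row frame mimicking a few employee.csv columns
df = pd.DataFrame({
    "Age": [41, 49, 37],
    "EmployeeCount": [1, 1, 1],
    "StandardHours": [80, 80, 80],
    "Over18": ["Y", "Y", "Y"],
    "Attrition": ["Yes", "No", "Yes"],
})

# A column with at most one distinct value is constant
constant_cols = df.columns[df.nunique() <= 1].tolist()
print(constant_cols)  # ['EmployeeCount', 'StandardHours', 'Over18']

# Drop them before any further analysis
df_reduced = df.drop(columns=constant_cols)
print(df_reduced.columns.tolist())  # ['Age', 'Attrition']
```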


## Sentiment Analysis

In [None]:
data = pd.read_csv("amazon_alexa.tsv", sep="\t")
data.shape

(3150, 5)

In [None]:
data.head()

| | rating | date | variation | verified_reviews | feedback |
|---|---|---|---|---|---|
| 0 | 5 | 31-Jul-18 | Charcoal Fabric | Love my Echo! | 1 |
| 1 | 5 | 31-Jul-18 | Charcoal Fabric | Loved it! | 1 |
| 2 | 4 | 31-Jul-18 | Walnut Finish | Sometimes while playing a game, you can answer... | 1 |
| 3 | 5 | 31-Jul-18 | Charcoal Fabric | I have had a lot of fun with this thing. My 4 ... | 1 |
| 4 | 5 | 31-Jul-18 | Charcoal Fabric | Music | 1 |


In [None]:
data.isnull().sum()

rating              0
date                0
variation           0
verified_reviews    1
feedback            0
dtype: int64

In [None]:
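# keep only rows that have review text; .copy() makes the result an independent
# frame, which avoids a SettingWithCopyWarning when new columns are added below
data = data.loc[data['verified_reviews'].notnull()].copy()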
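The boolean filter above keeps only rows whose review is present; `DataFrame.dropna(subset=...)` expresses the same thing. The sketch below checks that the two agree on a small made-up frame (not the real TSV) and shows why filtering matters before the character-count step that follows: applying `len` to a missing value, which pandas stores as a float `NaN`, raises a `TypeError`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing review, like the real data
df = pd.DataFrame({
    "rating": [5, 5, 3],
    "verified_reviews": ["Love my Echo!", np.nan, "Music"],
})

kept_loc = df.loc[df["verified_reviews"].notnull()]
kept_dropna = df.dropna(subset=["verified_reviews"])

# Both approaches keep exactly the same rows
print(kept_loc.equals(kept_dropna))  # True

# Without filtering, len() reaches the float NaN and fails
try:
    df["verified_reviews"].apply(len)
except TypeError as exc:
    print("TypeError:", exc)
```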

In [None]:
# calculate the length of each review (number of characters)
data["length"] = data['verified_reviews'].apply(len)  # len is Python's built-in function
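`apply(len)` counts characters, including spaces and punctuation, which is why "Love my Echo!" gets a `length` of 13 in the table further down. The vectorised `Series.str.len()` gives the same numbers, and splitting on whitespace gives a word count if that is the notion of length you want. A small sketch using three reviews from the preview:

```python
import pandas as pd

# Three reviews copied from the preview of amazon_alexa.tsv
reviews = pd.Series(["Love my Echo!", "Loved it!", "Music"])

char_len = reviews.apply(len)                # same as the notebook's "length" column
char_len_vec = reviews.str.len()             # vectorised string accessor
word_count = reviews.str.split().str.len()   # whitespace-separated tokens

print(char_len.tolist())      # [13, 9, 5]
print(char_len_vec.tolist())  # [13, 9, 5]
print(word_count.tolist())    # [3, 2, 1]
```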

### Text Polarity

Text polarity is a measure of the sentiment expressed in a piece of text. It ranges from -1 to 1, where:

- -1 indicates a very negative sentiment.
- 0 indicates a neutral sentiment.
- 1 indicates a very positive sentiment.

Polarity is often used in sentiment analysis to determine the overall sentiment of a text, such as a review, comment, or any other form of written content. This can be useful for understanding customer opinions, feedback, and general sentiment towards a product, service, or topic.
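TextBlob's default analyzer is lexicon-based: individual words carry pre-assigned polarity scores, which are combined into a single value for the whole text. The sketch below illustrates only that core averaging idea, using a tiny hypothetical lexicon with made-up scores; the real analyzer also adjusts for modifiers such as intensifiers and negation, so its numbers will differ.

```python
# Hypothetical word -> polarity lexicon (illustrative values only)
LEXICON = {"love": 0.5, "great": 0.8, "bad": -0.7, "annoying": -0.8}


def naive_polarity(text: str) -> float:
    """Average the polarity of known words; 0.0 if no word is in the lexicon."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    scores = [LEXICON[w] for w in words if w in LEXICON]
    if not scores:
        return 0.0  # no opinion words found -> neutral
    # An average of values in [-1, 1] always stays within [-1, 1]
    return sum(scores) / len(scores)


print(naive_polarity("Love my Echo!"))         # 0.5
print(naive_polarity("Great sound, bad app"))  # about 0.05
print(naive_polarity("Music"))                 # 0.0
```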

In [None]:
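def get_polarity(text):
    # Use the plain string: str(text.encode('utf-8')) would wrap the review in a
    # literal "b'...'" and turn non-ASCII characters (e.g. emoji) into escape codes
    textblob = TextBlob(str(text))
    pol = textblob.sentiment.polarity  # float in [-1.0, 1.0]
    return pol

data['polarity'] = data['verified_reviews'].apply(get_polarity)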

In [None]:
data.head()

| | rating | date | variation | verified_reviews | feedback | length | polarity |
|---|---|---|---|---|---|---|---|
| 0 | 5 | 31-Jul-18 | Charcoal Fabric | Love my Echo! | 1 | 13 | 0.625 |
| 1 | 5 | 31-Jul-18 | Charcoal Fabric | Loved it! | 1 | 9 | 0.875 |
| 2 | 4 | 31-Jul-18 | Walnut Finish | Sometimes while playing a game, you can answer... | 1 | 195 | -0.1 |
| 3 | 5 | 31-Jul-18 | Charcoal Fabric | I have had a lot of fun with this thing. My 4 ... | 1 | 172 | 0.35 |
| 4 | 5 | 31-Jul-18 | Charcoal Fabric | Music | 1 | 5 | 0.0 |
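The scores can be turned into coarse labels following the scale described above. The rule used here (negative below 0, neutral at exactly 0, positive above 0) is an assumption for illustration; a small band around 0 could be treated as neutral instead. A sketch on the five polarity values from the preview:

```python
import numpy as np
import pandas as pd

# Polarity scores copied from the five-row preview above
polarity = pd.Series([0.625, 0.875, -0.1, 0.35, 0.0])

# Assumed rule: < 0 negative, == 0 neutral, > 0 positive
labels = np.select(
    [polarity < 0, polarity == 0],
    ["negative", "neutral"],
    default="positive",
)
print(labels.tolist())
# ['positive', 'positive', 'negative', 'positive', 'neutral']
```

On the full frame, the same call would take `data['polarity']` in place of the toy series.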


### Text Subjectivity

Text subjectivity is a measure of how subjective or objective a piece of text is. It ranges from 0 to 1, where:

- 0 indicates a very objective text.
- 1 indicates a very subjective text.

Subjectivity is often used in sentiment analysis to determine the degree to which personal opinions, emotions, and biases are expressed in a text. Objective texts are typically factual and neutral, while subjective texts contain personal views, opinions, and feelings. This can be useful for understanding the nature of the content, such as distinguishing between factual reports and opinion pieces.

In [None]:
def get_subjectivity(text):
    textblob = TextBlob(str(text))
    subj = textblob.sentiment.subjectivity
    return subj

data['subjectivity'] = data['verified_reviews'].apply(get_subjectivity)

data.head()

| | rating | date | variation | verified_reviews | feedback | length | polarity | subjectivity |
|---|---|---|---|---|---|---|---|---|
| 0 | 5 | 31-Jul-18 | Charcoal Fabric | Love my Echo! | 1 | 13 | 0.625 | 0.6 |
| 1 | 5 | 31-Jul-18 | Charcoal Fabric | Loved it! | 1 | 9 | 0.875 | 0.8 |
| 2 | 4 | 31-Jul-18 | Walnut Finish | Sometimes while playing a game, you can answer... | 1 | 195 | -0.1 | 0.5125 |
| 3 | 5 | 31-Jul-18 | Charcoal Fabric | I have had a lot of fun with this thing. My 4 ... | 1 | 172 | 0.35 | 0.45 |
| 4 | 5 | 31-Jul-18 | Charcoal Fabric | Music | 1 | 5 | 0.0 | 0.0 |
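Subjectivity can be bucketed in a similar way. The sketch below uses `pd.cut` with a single cut-off at 0.5 (scores up to 0.5 counted as objective, above 0.5 as subjective) on the five values from the preview; the threshold is an illustrative assumption, not something TextBlob defines.

```python
import pandas as pd

# Subjectivity scores copied from the five-row preview above
subjectivity = pd.Series([0.6, 0.8, 0.5125, 0.45, 0.0])

# Assumed cut-off at 0.5; include_lowest keeps 0.0 in the first bin
category = pd.cut(
    subjectivity,
    bins=[0.0, 0.5, 1.0],
    labels=["objective", "subjective"],
    include_lowest=True,
)
print(category.tolist())
# ['subjective', 'subjective', 'subjective', 'objective', 'objective']
```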


In [None]:
# Summarise the newly created columns

data[['length', 'polarity', 'subjectivity']].describe()

| | length | polarity | subjectivity |
|---|---|---|---|
| count | 3149.0 | 3149.0 | 3149.0 |
| mean | 132.090187 | 0.349903 | 0.529156 |
| std | 182.114569 | 0.303346 | 0.256128 |
| min | 1.0 | -1.0 | 0.0 |
| 25% | 30.0 | 0.125 | 0.419643 |
| 50% | 74.0 | 0.35 | 0.584848 |
| 75% | 165.0 | 0.533333 | 0.694444 |
| max | 2851.0 | 1.0 | 1.0 |
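A quick sanity check is whether polarity tracks the star rating: grouping by `rating` and averaging `polarity` shows this directly. The sketch runs on the five preview rows only, so the numbers illustrate the mechanics rather than any finding about the dataset.

```python
import pandas as pd

# The five preview rows (rating, polarity) from the tables above
preview = pd.DataFrame({
    "rating": [5, 5, 4, 5, 5],
    "polarity": [0.625, 0.875, -0.1, 0.35, 0.0],
})

# Mean polarity per star rating
mean_by_rating = preview.groupby("rating")["polarity"].mean()
print(mean_by_rating)

# On the full data the same idea would read:
# data.groupby("rating")["polarity"].mean()
```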

