# Text Analysis on Women's E-commerce Clothing

Download the csv file for women's e-commerce clothing reviews from Kaggle. Then in a Jupyter notebook, use the pandas library to read in the dataset. 

Create a function that will analyze the "Review Text" column and calculate a sentiment value. Make a new column in the dataframe that will contain the sentiment value for each review.

HINTS: Don't forget to import all the libraries/functions that you need. Also, before using your created function, remove the missing values from the "Review Text" column.

Upload your Jupyter notebook to Github and submit the URL for this assignment.




In [79]:
import nltk
from nltk.tokenize import word_tokenize
from nltk.tokenize import sent_tokenize
from nltk.tokenize import TweetTokenizer
from nltk.probability import FreqDist
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.sentiment.vader import SentimentIntensityAnalyzer

#this is sample data
from nltk.corpus import names  

from string import punctuation
%matplotlib inline
import pandas as pd

In [80]:
#load the data from the Reviews.csv file
filepath = "Womens Clothing E-Commerce Reviews.csv"
women_clothing = pd.read_csv(filepath, encoding = "latin-1") #this file is encoded differently

women_clothing.head()

Unnamed: 0.1,Unnamed: 0,Clothing ID,Age,Title,ReviewText,Rating,Recommended IND,Positive Feedback Count,Division Name,Department Name,Class Name
0,0,767,33,,Absolutely wonderful - silky and sexy and comf...,4,1,0,Initmates,Intimate,Intimates
1,1,1080,34,,Love this dress! it's sooo pretty. i happene...,5,1,4,General,Dresses,Dresses
2,2,1077,60,Some major design flaws,I had such high hopes for this dress and reall...,3,0,0,General,Dresses,Dresses
3,3,1049,50,My favorite buy!,"I love, love, love this jumpsuit. it's fun, fl...",5,1,0,General Petite,Bottoms,Pants
4,4,847,47,Flattering shirt,This shirt is very flattering to all due to th...,5,1,6,General,Tops,Blouses


In [81]:
women_clothing = women_clothing[women_clothing['ReviewText'].notnull()]
women_clothing.head()

Unnamed: 0.1,Unnamed: 0,Clothing ID,Age,Title,ReviewText,Rating,Recommended IND,Positive Feedback Count,Division Name,Department Name,Class Name
0,0,767,33,,Absolutely wonderful - silky and sexy and comf...,4,1,0,Initmates,Intimate,Intimates
1,1,1080,34,,Love this dress! it's sooo pretty. i happene...,5,1,4,General,Dresses,Dresses
2,2,1077,60,Some major design flaws,I had such high hopes for this dress and reall...,3,0,0,General,Dresses,Dresses
3,3,1049,50,My favorite buy!,"I love, love, love this jumpsuit. it's fun, fl...",5,1,0,General Petite,Bottoms,Pants
4,4,847,47,Flattering shirt,This shirt is very flattering to all due to th...,5,1,6,General,Tops,Blouses


In [82]:
women_clothing.dtypes

Unnamed: 0                  int64
Clothing ID                 int64
Age                         int64
Title                      object
ReviewText                 object
Rating                      int64
Recommended IND             int64
Positive Feedback Count     int64
Division Name              object
Department Name            object
Class Name                 object
dtype: object

In [83]:
## updating the punctuation
new_punc= "'‘','’'"
punctuation += new_punc
punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~\'‘\',\'’\''

In [84]:
## declaring variable for english stopp words
eng_stopwords = stopwords.words('english')


In [85]:
#create a function to clean up each review
#then it will analyze and assign a sentiment polarity
def reviewSentiment(review):
    
    #make text lowercase
    review = review.lower()
    
    #tokenize the review, it is a list
    tknz_review = word_tokenize(review)
    
    #remove puntuation
    for token in tknz_review:
        if token in punctuation:
            tknz_review.remove(token)
    
    #empty list to hold "cleaned" tokens
    clean_tokens = []
    
    #remove filler words
    for token in tknz_review:
        if token not in eng_stopwords:
            clean_tokens.append(token)
            
    #put sentence back together with remaining clean words
    clean_review = ' '.join(clean_tokens)
    
    #get the polarity scores dictionary
    sid_rev = sid.polarity_scores(clean_review) # polarity can't take a list only string
    
    #get sentiment polarity from the "compound" key in the sid_rev dictionary
    r_comp = sid_rev['compound']
    
    #return the sentiment value
    return r_comp

In [86]:
#create a new column to hold sentiment value from function
women_clothing['Review_Sentiment'] = women_clothing['ReviewText'].apply(reviewSentiment)
women_clothing.head()

Unnamed: 0.1,Unnamed: 0,Clothing ID,Age,Title,ReviewText,Rating,Recommended IND,Positive Feedback Count,Division Name,Department Name,Class Name,Review_Sentiment
0,0,767,33,,Absolutely wonderful - silky and sexy and comf...,4,1,0,Initmates,Intimate,Intimates,0.8991
1,1,1080,34,,Love this dress! it's sooo pretty. i happene...,5,1,4,General,Dresses,Dresses,0.971
2,2,1077,60,Some major design flaws,I had such high hopes for this dress and reall...,3,0,0,General,Dresses,Dresses,0.9062
3,3,1049,50,My favorite buy!,"I love, love, love this jumpsuit. it's fun, fl...",5,1,0,General Petite,Bottoms,Pants,0.9464
4,4,847,47,Flattering shirt,This shirt is very flattering to all due to th...,5,1,6,General,Tops,Blouses,0.9117


In [87]:
#create a function to assign a polarity category to the sentiment
def sentimentCategory(sent_num):
    if sent_num >= 0.2:
        return "positive"
    if sent_num <= -0.2:
        return "negative"
    else:
        return "neutral"

In [88]:
#create a new column to hold sentiment category
women_clothing['sentiment_category'] = women_clothing['Review_Sentiment'].apply(sentimentCategory)

women_clothing.head()

Unnamed: 0.1,Unnamed: 0,Clothing ID,Age,Title,ReviewText,Rating,Recommended IND,Positive Feedback Count,Division Name,Department Name,Class Name,Review_Sentiment,sentiment_category
0,0,767,33,,Absolutely wonderful - silky and sexy and comf...,4,1,0,Initmates,Intimate,Intimates,0.8991,positive
1,1,1080,34,,Love this dress! it's sooo pretty. i happene...,5,1,4,General,Dresses,Dresses,0.971,positive
2,2,1077,60,Some major design flaws,I had such high hopes for this dress and reall...,3,0,0,General,Dresses,Dresses,0.9062,positive
3,3,1049,50,My favorite buy!,"I love, love, love this jumpsuit. it's fun, fl...",5,1,0,General Petite,Bottoms,Pants,0.9464,positive
4,4,847,47,Flattering shirt,This shirt is very flattering to all due to th...,5,1,6,General,Tops,Blouses,0.9117,positive


In [89]:
# checking some of the review against the sentiment catagories
women_clothing['ReviewText'].iloc[0]

'Absolutely wonderful - silky and sexy and comfortable'

In [90]:
women_clothing['ReviewText'].iloc[2]

'I had such high hopes for this dress and really wanted it to work for me. i initially ordered the petite small (my usual size) but i found this to be outrageously small. so small in fact that i could not zip it up! i reordered it in petite medium, which was just ok. overall, the top half was comfortable and fit nicely, but the bottom half had a very tight under layer and several somewhat cheap (net) over layers. imo, a major design flaw was the net over layer sewn directly into the zipper - it c'