## Women's E-Commerce Clothing Reviews
This dataset was obtained from https://www.kaggle.com/nicapotato/womens-ecommerce-clothing-reviews. It is a Women’s Clothing E-Commerce dataset built around reviews written by customers. Because this is real commercial data, it has been anonymized, and references to the company in the review text and body have been replaced with “retailer”.

This dataset includes 23486 rows and 10 feature variables. Each row corresponds to a customer review, and includes the variables:

- **Clothing ID:** Integer Categorical variable that refers to the specific piece being reviewed.
- **Age:** Positive Integer variable of the reviewer's age.
- **Title:** String variable for the title of the review.
- **Review Text:** String variable for the review body.
- **Rating:** Positive Ordinal Integer variable for the product score granted by the customer, from 1 (worst) to 5 (best).
- **Recommended IND:** Binary variable stating whether the customer recommends the product, where 1 is recommended and 0 is not recommended.
- **Positive Feedback Count:** Positive Integer documenting the number of other customers who found this review positive.
- **Division Name:** Categorical name of the product's high-level division.
- **Department Name:** Categorical name of the product's department.
- **Class Name:** Categorical name of the product's class.

## Data Exploration

In [35]:
import pandas as pd
pd.set_option('display.max_colwidth', 100)

In [74]:
data = pd.read_csv('Womens Clothing E-Commerce Reviews.csv')
data.head()

| | Unnamed: 0 | Clothing ID | Age | Title | Review Text | Rating | Recommended IND | Positive Feedback Count | Division Name | Department Name | Class Name |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 767 | 33 | | Absolutely wonderful - silky and sexy and comfortable | 4 | 1 | 0 | Initmates | Intimate | Intimates |
| 1 | 1 | 1080 | 34 | | Love this dress! it's sooo pretty. i happened to find it in a store, and i'm glad i did bc i n... | 5 | 1 | 4 | General | Dresses | Dresses |
| 2 | 2 | 1077 | 60 | Some major design flaws | I had such high hopes for this dress and really wanted it to work for me. i initially ordered th... | 3 | 0 | 0 | General | Dresses | Dresses |
| 3 | 3 | 1049 | 50 | My favorite buy! | I love, love, love this jumpsuit. it's fun, flirty, and fabulous! every time i wear it, i get no... | 5 | 1 | 0 | General Petite | Bottoms | Pants |
| 4 | 4 | 847 | 47 | Flattering shirt | This shirt is very flattering to all due to the adjustable front tie. it is the perfect length t... | 5 | 1 | 6 | General | Tops | Blouses |


In [75]:
# Drop the columns that are not needed, then check the distribution of the target class
data.drop(columns=['Unnamed: 0', 'Clothing ID'], inplace=True)
data['Class Name'].value_counts()

Dresses           6319
Knits             4843
Blouses           3097
Sweaters          1428
Pants             1388
Jeans             1147
Fine gauge        1100
Skirts             945
Jackets            704
Lounge             691
Swim               350
Outerwear          328
Shorts             317
Sleep              228
Legwear            165
Intimates          154
Layering           146
Trend              119
Casual bottoms       2
Chemises             1
Name: Class Name, dtype: int64

In [76]:
# Count missing values per column, then drop every row that contains at least one
print(data.isnull().sum())
data.dropna(inplace = True)
print('\nNull values dropped\n')
print(data.isnull().sum())

Age                           0
Title                      3810
Review Text                 845
Rating                        0
Recommended IND               0
Positive Feedback Count       0
Division Name                14
Department Name              14
Class Name                   14
dtype: int64

Null values dropped

Age                        0
Title                      0
Review Text                0
Rating                     0
Recommended IND            0
Positive Feedback Count    0
Division Name              0
Department Name            0
Class Name                 0
dtype: int64
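
Calling `dropna()` with no arguments discards a row if any column is missing, so reviews that only lack a title (up to 3810 rows in the counts above) are removed along with the rest. If only the review body and the class label are needed later, a narrower drop would keep more data. A minimal sketch on a made-up frame; `toy`, `needed`, and the values are illustrative assumptions, not part of the dataset:

```python
import pandas as pd

# Toy frame standing in for the review data (values are made up).
toy = pd.DataFrame({
    "Title": [None, "Great fit", None, "Too small"],
    "Review Text": ["Love it", None, "Runs large", "Returned it"],
    "Class Name": ["Dresses", "Knits", "Pants", "Blouses"],
})

# Drop a row only when one of the columns we actually need is missing.
needed = ["Review Text", "Class Name"]
subset_dropped = toy.dropna(subset=needed)

# For comparison: dropping on any missing value keeps fewer rows.
all_dropped = toy.dropna()

print(len(subset_dropped), len(all_dropped))
```

Whether the narrower drop is appropriate depends on whether `Title` is used as a feature downstream.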


## Text Cleaning

**Remove Punctuation**

In [77]:
import string

In [78]:
def remove_puncts(text):
    """Remove punctuation from the review text using a list comprehension."""
    text_nopunct = ''.join([char for char in text if char not in string.punctuation])
    return text_nopunct

In [79]:
# Apply remove_puncts to the review text
data['cleaned_text'] = data['Review Text'].apply(remove_puncts)
data['cleaned_text'].head()

2    I had such high hopes for this dress and really wanted it to work for me i initially ordered the...
3    I love love love this jumpsuit its fun flirty and fabulous every time i wear it i get nothing bu...
4    This shirt is very flattering to all due to the adjustable front tie it is the perfect length to...
5    I love tracy reese dresses but this one is not for the very petite i am just under 5 feet tall a...
6    I aded this in my basket at hte last mintue to see what it would look like in person store pick ...
Name: cleaned_text, dtype: object
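
The character-by-character filter above works, but the same result can be had in a single pass with `str.translate`. A minimal sketch, assuming only ASCII punctuation needs removing (that is all `string.punctuation` contains); `remove_puncts_fast` and `PUNCT_TABLE` are hypothetical names, not part of the notebook:

```python
import string

# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def remove_puncts_fast(text):
    """Remove ASCII punctuation in one pass using str.translate."""
    return text.translate(PUNCT_TABLE)

sample = "Love this dress! it's sooo pretty."
print(remove_puncts_fast(sample))  # Love this dress its sooo pretty
```

As in the output above, apostrophes are removed too, so `it's` becomes `its`.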

**Tokenization**

In [80]:
import re

In [81]:
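def tokenize(text):
    """Split text into tokens on runs of non-word characters."""
    # Raw string avoids the invalid-escape warning; \W+ treats consecutive separators as one
    tokens = re.split(r'\W+', text)
    return tokens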

In [82]:
data['cleaned_text'] = data['cleaned_text'].apply(tokenize)
data['cleaned_text'].head()

2    [I, had, such, high, hopes, for, this, dress, and, really, wanted, it, to, work, for, me, i, ini...
3    [I, love, love, love, this, jumpsuit, its, fun, flirty, and, fabulous, every, time, i, wear, it,...
4    [This, shirt, is, very, flattering, to, all, due, to, the, adjustable, front, tie, it, is, the, ...
5    [I, love, tracy, reese, dresses, but, this, one, is, not, for, the, very, petite, i, am, just, u...
6    [I, aded, this, in, my, basket, at, hte, last, mintue, to, see, what, it, would, look, like, in,...
Name: cleaned_text, dtype: object
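
One thing to watch with `re.split`: splitting on a single non-word character produces an empty string wherever two separators sit next to each other (for example, a double space left after punctuation removal), and a trailing separator produces one even when splitting on `\W+`. A small sketch of the difference, using a made-up `sample` string:

```python
import re

sample = "fun  flirty and fabulous "  # note the double space and trailing space

# Splitting on a single non-word character leaves empty strings behind.
single = re.split(r"\W", sample)

# Splitting on runs of non-word characters and filtering drops the empties.
tokens = [tok for tok in re.split(r"\W+", sample) if tok]

print(single)  # ['fun', '', 'flirty', 'and', 'fabulous', '']
print(tokens)  # ['fun', 'flirty', 'and', 'fabulous']
```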

**Remove Stopwords**

In [83]:
import nltk

# Requires the NLTK stopwords corpus; run nltk.download('stopwords') once if it is missing
stopwords = nltk.corpus.stopwords.words('english')

In [84]:
def remove_stopwords(text):
    """Remove stopwords from a list of tokens (the match is case-sensitive)."""
    cleaned_text = [word for word in text if word not in stopwords]
    return cleaned_text

In [85]:
data['cleaned_text'] = data['cleaned_text'].apply(remove_stopwords)
data['cleaned_text'].head()

2    [I, high, hopes, dress, really, wanted, work, initially, ordered, petite, small, usual, size, fo...
3    [I, love, love, love, jumpsuit, fun, flirty, fabulous, every, time, wear, get, nothing, great, c...
4    [This, shirt, flattering, due, adjustable, front, tie, perfect, length, wear, leggings, sleevele...
5    [I, love, tracy, reese, dresses, one, petite, 5, feet, tall, usually, wear, 0p, brand, dress, pr...
6    [I, aded, basket, hte, last, mintue, see, would, look, like, person, store, pick, went, teh, dar...
Name: cleaned_text, dtype: object
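
The output above still contains `I` and `This`. NLTK's English stopword list is lowercase and the tokens were never lowercased, so capitalized stopwords slip through. A minimal sketch of a case-insensitive variant follows. To keep it runnable without downloading the corpus, `STOPWORDS` is a tiny hand-written stand-in for the NLTK list, and `remove_stopwords_ci` is a hypothetical name:

```python
# Tiny stand-in for nltk.corpus.stopwords.words('english'); an assumption
# made so the example runs without the NLTK corpus installed.
STOPWORDS = {"i", "this", "is", "to", "the", "and", "it", "for", "had", "such"}

def remove_stopwords_ci(tokens):
    """Lowercase each token, then drop the ones found in STOPWORDS."""
    lowered = (tok.lower() for tok in tokens)
    return [tok for tok in lowered if tok not in STOPWORDS]

tokens = ["I", "had", "such", "high", "hopes", "for", "this", "dress"]
print(remove_stopwords_ci(tokens))  # ['high', 'hopes', 'dress']
```

Holding the stopwords in a `set` rather than a list also makes each membership test constant time, which adds up across thousands of reviews.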

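**Lemmatization**

In [86]:
# Instantiate the WordNet lemmatizer (needs the NLTK 'wordnet' corpus; run nltk.download('wordnet') once if it is missing)
wn = nltk.WordNetLemmatizer()

In [87]:
def stemming(text):
    """Lemmatize each token with WordNet. Despite the function name, this is lemmatization, not stemming."""
    stemmed_text = [wn.lemmatize(word) for word in text]
    return stemmed_text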

In [88]:
data['cleaned_text'] = data['cleaned_text'].apply(stemming)
data['cleaned_text'].head()

2    [I, high, hope, dress, really, wanted, work, initially, ordered, petite, small, usual, size, fou...
3    [I, love, love, love, jumpsuit, fun, flirty, fabulous, every, time, wear, get, nothing, great, c...
4    [This, shirt, flattering, due, adjustable, front, tie, perfect, length, wear, legging, sleeveles...
5    [I, love, tracy, reese, dress, one, petite, 5, foot, tall, usually, wear, 0p, brand, dress, pret...
6    [I, aded, basket, hte, last, mintue, see, would, look, like, person, store, pick, went, teh, dar...
Name: cleaned_text, dtype: object
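
The cells above use `WordNetLemmatizer`, which maps words to dictionary forms; that is why `feet` becomes `foot` in the output. Because `lemmatize` treats each word as a noun by default, verb forms such as `wanted` are left unchanged. A rule-based stemmer behaves differently: it strips suffixes without consulting a dictionary. For comparison, a minimal sketch with NLTK's `PorterStemmer`, which needs no corpus download; the tokens are taken from the outputs above:

```python
from nltk.stem import PorterStemmer

ps = PorterStemmer()

# Tokens taken from the review outputs shown earlier.
tokens = ["hopes", "dresses", "feet"]
stems = [ps.stem(tok) for tok in tokens]

print(stems)
```

Regular plurals such as `hopes` and `dresses` are reduced by suffix rules, while an irregular plural like `feet` is left as it is, since no suffix rule applies to it.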