## Women’s E-Commerce Clothing Reviews

### Introduction
In this article, we explore the **Women’s E-Commerce Clothing Reviews Dataset**, which contains valuable information about customer reviews for women’s clothing items. We'll cover aspects such as clothing details, reviewer demographics, and sentiment analysis.

### Dataset Overview
The dataset can be downloaded from [Kaggle](https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviews). It includes the following columns:

1. **Clothing ID**: An integer representing the specific clothing item being reviewed.
2. **Age**: A positive integer indicating the reviewer’s age.
3. **Title**: A string variable for the review title.
4. **Review Text**: A string variable containing the detailed review body.
5. **Rating**: A positive ordinal integer (1 to 5) representing the product score.
6. **Recommended IND**: A binary variable (1 for recommended, 0 for not recommended).
7. **Positive Feedback Count**: A non-negative integer documenting the number of other customers who found this review positive.
8. **Division Name**: A categorical name representing the product's high-level division.
9. **Department Name**: A categorical name representing the product department.
10. **Class Name**: A categorical name representing the product class.

### Exploratory Data Analysis
Before diving into specific analyses, let's explore some basic statistics and visualizations:

1. **Age Distribution**: Plot a histogram to visualize the distribution of reviewer ages.
2. **Rating Distribution**: Show the distribution of product ratings (1 to 5 stars).
3. **Recommended vs. Not Recommended**: Compare the number of reviews that recommend the product with the number that do not.
4. **Word Cloud for Review Text**: Create a word cloud to highlight frequently used words in review text.
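The four checks above can be sketched as plain counts before any plotting. This is a minimal illustration on a tiny hand-made DataFrame: the rows are invented, the column names follow the dataset overview, and the age bin edges are an assumption.

```python
from collections import Counter

import pandas as pd

# Toy rows invented for illustration; the real data comes from the Kaggle CSV.
toy = pd.DataFrame({
    "Age": [23, 34, 35, 47, 60],
    "Rating": [5, 4, 5, 3, 1],
    "Recommended IND": [1, 1, 1, 0, 0],
    "Review Text": [
        "love this dress",
        "love the fabric",
        "great dress great fit",
        "fit is off",
        "poor fabric",
    ],
})

# 1. Age distribution: bin ages into ranges (bin edges are an assumption).
age_bins = pd.cut(toy["Age"], bins=[18, 30, 40, 50, 100])
age_counts = age_bins.value_counts().sort_index()

# 2. Rating distribution, ordered from 1 to 5 stars.
rating_counts = toy["Rating"].value_counts().sort_index()

# 3. Recommended (1) vs. not recommended (0).
recommended_counts = toy["Recommended IND"].value_counts()

# 4. Word frequencies, the raw input a word cloud would be drawn from.
word_counts = Counter(" ".join(toy["Review Text"]).split())

print(age_counts)
print(rating_counts)
print(recommended_counts)
print(word_counts.most_common(3))
```

On the real data, the same counts can be charted with pandas, for example `rating_counts.plot(kind="bar")`. Drawing an actual word cloud would need an extra package such as `wordcloud`, which this sketch deliberately leaves out.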

### Sentiment Analysis
We'll perform sentiment analysis on the review text using the following steps:

1. **Tokenization**: Break down review text into individual words (tokens).
2. **Stemming or Lemmatization**: Reduce words to their base form (stem or lemma).
3. **Sentiment Score**: Calculate sentiment scores (positive, negative, or neutral) for each review.
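The three steps above can be sketched as a small lexicon-based scorer. The word lists, the `crude_stem` suffix stripper, and the `sentiment_label` helper are hypothetical stand-ins chosen to keep the example free of downloads. On the real data we would more likely combine NLTK's tokenizer and stemmer with a full sentiment lexicon such as VADER.

```python
import re

# Hypothetical mini-lexicons, keyed by stem; a real analysis needs a full lexicon.
POSITIVE = {"love", "wonder", "flatter", "comfort", "great"}
NEGATIVE = {"flaw", "poor", "disappoint", "cheap"}


def tokenize(text):
    """Step 1: lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(token):
    """Step 2: strip a few common suffixes (a stand-in for a real stemmer)."""
    for suffix in ("ing", "ful", "able", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def sentiment_label(text):
    """Step 3: count positive and negative stems and label the review."""
    stems = [crude_stem(token) for token in tokenize(text)]
    score = sum(s in POSITIVE for s in stems) - sum(s in NEGATIVE for s in stems)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment_label("Absolutely wonderful - silky and comfortable"))
print(sentiment_label("Some major design flaws"))
print(sentiment_label("It is a shirt"))
```

Counting lexicon hits ignores negation ("not comfortable") and intensity, so the labels are only a rough first pass.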

### Insights and Recommendations
Based on the analysis, we can draw insights such as:

- Which clothing divisions or departments receive the highest ratings?
- Are there specific age groups that provide more positive feedback?
- What are the most common words used in positive/negative reviews?
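Each of these questions maps to a grouped aggregation. The sketch below runs on invented toy rows. It reads "more positive feedback" as a higher mean rating per age group, which is one possible interpretation, and the age bins and rating thresholds are assumptions.

```python
from collections import Counter

import pandas as pd

# Toy rows invented for illustration; column names match the dataset overview.
toy = pd.DataFrame({
    "Age": [25, 28, 41, 45, 63],
    "Rating": [5, 4, 2, 3, 5],
    "Department Name": ["Dresses", "Tops", "Dresses", "Tops", "Dresses"],
    "Review Text": [
        "love the fit",
        "nice fabric",
        "poor fit",
        "fabric is thin",
        "love the color",
    ],
})

# Which departments receive the highest ratings? Mean rating per department.
rating_by_department = (
    toy.groupby("Department Name")["Rating"].mean().sort_values(ascending=False)
)

# Which age groups rate more positively? Bin edges and labels are assumptions.
age_group = pd.cut(
    toy["Age"], bins=[0, 35, 50, 120], labels=["up to 35", "36-50", "51+"]
)
rating_by_age = toy.groupby(age_group, observed=False)["Rating"].mean()


def top_words(reviews, n=2):
    """Return the n most frequent whitespace-separated words in the reviews."""
    return Counter(" ".join(reviews).split()).most_common(n)


# Treat ratings >= 4 as positive and ratings <= 2 as negative (an assumption).
positive_words = top_words(toy.loc[toy["Rating"] >= 4, "Review Text"])
negative_words = top_words(toy.loc[toy["Rating"] <= 2, "Review Text"])

print(rating_by_department)
print(rating_by_age)
print(positive_words, negative_words)
```

Swapping `"Department Name"` for `"Division Name"` answers the same question at the division level.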

## Import Required Libraries

In [2]:
import pandas as pd
import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer
import spacy
nltk.download('punkt')


[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\Marwa\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

## Reading Dataset

In [3]:
# Load the dataset
df = pd.read_csv(r"D:\Collage Materials\Womens Clothing E-Commerce Reviews.csv")  # raw string keeps the backslashes literal

# Display the first few rows to verify the data
df.head()

| Unnamed: 0.1 | Unnamed: 0 | Clothing ID | Age | Title | Review Text | Rating | Recommended IND | Positive Feedback Count | Division Name | Department Name | Class Name |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 767 | 33 | | Absolutely wonderful - silky and sexy and comf... | 4 | 1 | 0 | Initmates | Intimate | Intimates |
| 1 | 1 | 1080 | 34 | | Love this dress! it's sooo pretty. i happene... | 5 | 1 | 4 | General | Dresses | Dresses |
| 2 | 2 | 1077 | 60 | Some major design flaws | I had such high hopes for this dress and reall... | 3 | 0 | 0 | General | Dresses | Dresses |
| 3 | 3 | 1049 | 50 | My favorite buy! | I love, love, love this jumpsuit. it's fun, fl... | 5 | 1 | 0 | General Petite | Bottoms | Pants |
| 4 | 4 | 847 | 47 | Flattering shirt | This shirt is very flattering to all due to th... | 5 | 1 | 6 | General | Tops | Blouses |
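
The `Unnamed: 0` column in the preview is most likely the row index that was written into the CSV when it was saved. One possible way to avoid carrying it around as data is to pass `index_col=0` to `pd.read_csv`. The sketch below demonstrates this on an in-memory CSV (two rows and three columns copied from the preview) rather than on the real file.

```python
import io

import pandas as pd

# In-memory stand-in for the CSV file; the leading empty header is the saved index.
csv_text = """,Clothing ID,Age,Rating
0,767,33,4
1,1080,34,5
"""

# Default read: the unnamed first column shows up as "Unnamed: 0".
with_extra = pd.read_csv(io.StringIO(csv_text))

# Treat the first column as the index instead of as data.
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_extra.columns))
print(list(clean.columns))
```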


## Tokenization

In [6]:
# Clean data (remove NaNs)
df.dropna(subset=['Review Text'], inplace=True)

# Verify data type of 'Review Text' column
print(df['Review Text'].dtype)

# Tokenize the review text
df['tokenized_text'] = df['Review Text'].apply(word_tokenize)

# Display tokenized text for the first review (positional lookup, safe after dropping rows)
print(df['tokenized_text'].iloc[0])

object
['Absolutely', 'wonderful', '-', 'silky', 'and', 'sexy', 'and', 'comfortable']
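
The token list still contains punctuation such as `-`. If only words are wanted for later counting or scoring, one option (an addition, not part of the original notebook) is to lowercase the tokens and keep the purely alphabetic ones:

```python
# Token list copied from the output above.
tokens = ['Absolutely', 'wonderful', '-', 'silky', 'and', 'sexy', 'and', 'comfortable']

# Lowercase each token and drop anything that is not purely alphabetic.
words = [token.lower() for token in tokens if token.isalpha()]

print(words)
```

Note that `str.isalpha()` also discards numbers and contraction pieces such as `n't`, so this filter is a trade-off rather than a universally safe cleaning step. The same list comprehension can be applied to the whole column with `df['tokenized_text'].apply(...)`.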


## Stemming

In [7]:
# Initialize the stemmer
stemmer = PorterStemmer()

# Apply stemming to the tokenized text
df['stemmed_text'] = df['tokenized_text'].apply(lambda tokens: [stemmer.stem(token) for token in tokens])

# Display stemmed text for the first review (positional lookup, safe after dropping rows)
print(df['stemmed_text'].iloc[0])

['absolut', 'wonder', '-', 'silki', 'and', 'sexi', 'and', 'comfort']


In [None]:
# Team Members:
# 1 - Marwa Essam Elmorsy Alshafei
# 2 - Sara Mahmoud Hassanin
# 4th Year - CS - NLP
