Women's Clothing Reviews Prediction

Title: Women's Clothing Reviews Prediction

Objective: The objective of this project is to analyze customer reviews of women's clothing and predict their sentiment with a machine learning model. Sentiment falls into three classes derived from each review's star rating: Positive (rating 4 or 5), Neutral (rating 3), and Negative (rating 1 or 2).

Import Libraries

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

Import Dataset

In [None]:
df = pd.read_csv(r"/content/Womens Clothing E-Commerce Reviews.csv")
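
The CSV stores its row index as an unnamed first column, which pandas loads as a data column called `Unnamed: 0` (it shows up in the column list later in this notebook). A minimal sketch of how `index_col=0` would keep it out of the data columns. It assumes a tiny in-memory stand-in for the file: `sample_csv` and its two rows are illustrative, not the real data.

```python
import io
import pandas as pd

# Hypothetical two-row stand-in for the real CSV; note the empty first header.
sample_csv = ",Clothing ID,Rating\n0,767,4\n1,1080,5\n"

# Default load: the unnamed first column becomes a data column "Unnamed: 0".
default_df = pd.read_csv(io.StringIO(sample_csv))

# index_col=0 uses that column as the row index instead.
indexed_df = pd.read_csv(io.StringIO(sample_csv), index_col=0)

print(list(default_df.columns))
print(list(indexed_df.columns))
```

The rest of the notebook works either way, since it only uses the `Review Text` and `Rating` columns.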

Data Exploration

In [None]:
df.head()

,Unnamed: 0,Clothing ID,Age,Title,Review Text,Rating,Recommended IND,Positive Feedback Count,Division Name,Department Name,Class Name
0,0,767,33,,Absolutely wonderful - silky and sexy and comf...,4,1,0,Initmates,Intimate,Intimates
1,1,1080,34,,Love this dress! it's sooo pretty. i happene...,5,1,4,General,Dresses,Dresses
2,2,1077,60,Some major design flaws,I had such high hopes for this dress and reall...,3,0,0,General,Dresses,Dresses
3,3,1049,50,My favorite buy!,"I love, love, love this jumpsuit. it's fun, fl...",5,1,0,General Petite,Bottoms,Pants
4,4,847,47,Flattering shirt,This shirt is very flattering to all due to th...,5,1,6,General,Tops,Blouses


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23486 entries, 0 to 23485
Data columns (total 11 columns):
 #   Column                   Non-Null Count  Dtype 
---  ------                   --------------  ----- 
 0   Unnamed: 0               23486 non-null  int64 
 1   Clothing ID              23486 non-null  int64 
 2   Age                      23486 non-null  int64 
 3   Title                    19676 non-null  object
 4   Review Text              22641 non-null  object
 5   Rating                   23486 non-null  int64 
 6   Recommended IND          23486 non-null  int64 
 7   Positive Feedback Count  23486 non-null  int64 
 8   Division Name            23472 non-null  object
 9   Department Name          23472 non-null  object
 10  Class Name               23472 non-null  object
dtypes: int64(6), object(5)
memory usage: 2.0+ MB
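
The non-null counts above already tell how many values are missing in each column. A short sketch of that arithmetic, using only the figures printed by `df.info()`; the variable names are illustrative.

```python
# Non-null counts copied from the df.info() output above.
total_rows = 23486
non_null = {
    "Title": 19676,
    "Review Text": 22641,
    "Division Name": 23472,
}

# Missing values per column = total rows minus non-null entries.
missing = {col: total_rows - count for col, count in non_null.items()}

for col, n_missing in missing.items():
    print(f"{col}: {n_missing} missing ({n_missing / total_rows:.1%})")
```

On the live frame, `df.isna().sum()` should report the same counts. The 845 rows without `Review Text` are the ones that cannot be used for text classification.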


In [None]:
df.describe()

,Unnamed: 0,Clothing ID,Age,Rating,Recommended IND,Positive Feedback Count
count,23486.0,23486.0,23486.0,23486.0,23486.0,23486.0
mean,11742.5,918.118709,43.198544,4.195776,0.822362,2.535936
std,6779.968547,203.29898,12.279544,1.110153,0.382216,5.702202
min,0.0,0.0,18.0,1.0,0.0,0.0
25%,5871.25,861.0,34.0,4.0,1.0,0.0
50%,11742.5,936.0,41.0,5.0,1.0,1.0
75%,17613.75,1078.0,52.0,5.0,1.0,3.0
max,23485.0,1205.0,99.0,5.0,1.0,122.0


In [None]:
df.columns

Index(['Unnamed: 0', 'Clothing ID', 'Age', 'Title', 'Review Text', 'Rating',
       'Recommended IND', 'Positive Feedback Count', 'Division Name',
       'Department Name', 'Class Name'],
      dtype='object')

In [None]:
df.shape

(23486, 11)

In [None]:
df['Review Text']

,Review Text
0,Absolutely wonderful - silky and sexy and comf...
1,Love this dress! it's sooo pretty. i happene...
2,I had such high hopes for this dress and reall...
3,"I love, love, love this jumpsuit. it's fun, fl..."
4,This shirt is very flattering to all due to th...
...,...
23481,I was very happy to snag this dress at such a ...
23482,"It reminds me of maternity clothes. soft, stre..."
23483,"This fit well, but the top was very see throug..."
23484,I bought this dress for a wedding i have this ...


Handling Missing Values

In [None]:
# Drop the 845 rows with no review text (23486 - 22641, per df.info()).
df = df.dropna(subset=["Review Text"]).copy()  # .copy() avoids SettingWithCopyWarning on the column assignments below


Preprocessing Data

In [None]:
def preprocess_text(text):
    """Lowercase a review so that 'Dress' and 'dress' count as the same token."""
    return text.lower()

df['Cleaned_Review'] = df['Review Text'].apply(preprocess_text)


In [None]:
def sentiment_label(rating):
    """Map a 1-5 star rating to a sentiment class."""
    if rating >= 4:
        return "Positive"
    elif rating == 3:
        return "Neutral"
    else:
        return "Negative"

df['Sentiment'] = df['Rating'].apply(sentiment_label)

Split data into features and labels

In [None]:
X = df['Cleaned_Review']
y = df['Sentiment']
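
Before modelling, it helps to check how the three classes are distributed, because the `Rating` summary (25th percentile 4, median 5) suggests that Positive dominates. A minimal sketch of that check, assuming a short hypothetical list of ratings rather than the real column:

```python
from collections import Counter

def sentiment_label(rating):
    """Same rating-to-sentiment rule as in the notebook."""
    if rating >= 4:
        return "Positive"
    elif rating == 3:
        return "Neutral"
    return "Negative"

# Hypothetical ratings, chosen only to illustrate the calculation.
sample_ratings = [5, 5, 4, 5, 3, 4, 2, 5, 1, 4]

labels = [sentiment_label(r) for r in sample_ratings]
counts = Counter(labels)

# Share of each class among all labels.
shares = {label: n / len(labels) for label, n in counts.items()}
print(shares)
```

On the real frame, `df['Sentiment'].value_counts(normalize=True)` gives the same kind of summary.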

Train-Test Split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
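
The split above is random and does not account for class balance. With imbalanced classes, passing `stratify` keeps the class proportions the same in both parts. A sketch on hypothetical toy data (the 80/10/10 class mix is an assumption made for illustration):

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 80 Positive, 10 Neutral, 10 Negative.
texts = [f"review {i}" for i in range(100)]
labels = ["Positive"] * 80 + ["Neutral"] * 10 + ["Negative"] * 10

# stratify=labels preserves the 80/10/10 mix in both train and test parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)

print(Counter(y_tr))
print(Counter(y_te))
```

Note that the accuracy reported in this notebook comes from the plain, non-stratified split. Switching to a stratified split would change which rows land in the test set and therefore the exact figure.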


In [None]:
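# Learn the vocabulary from the training reviews only.
vectorizer = CountVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
# Reuse that vocabulary for the test set; words unseen in training are ignored.
X_test_vec = vectorizer.transform(X_test)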
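
To make the bag-of-words step concrete, here is a pure-Python sketch of roughly what `CountVectorizer` does with its default settings: lowercase the text, keep tokens of two or more word characters, and count them per document. It is an illustration, not scikit-learn's implementation, and the helper names and sample sentences are hypothetical.

```python
import re
from collections import Counter

# CountVectorizer's default token pattern: runs of two or more word characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")

def tokenize(text):
    """Lowercase, then extract tokens the way the default settings would."""
    return TOKEN_PATTERN.findall(text.lower())

def fit_vocabulary(docs):
    """Assign each distinct token a column index, in sorted order."""
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}

def transform(docs, vocabulary):
    """Turn each document into a count vector; unseen tokens are ignored."""
    rows = []
    for doc in docs:
        counts = Counter(tok for tok in tokenize(doc) if tok in vocabulary)
        rows.append([counts.get(tok, 0) for tok in vocabulary])
    return rows

train_docs = ["Love this dress", "I love, love this top"]  # hypothetical
vocab = fit_vocabulary(train_docs)
print(vocab)
print(transform(train_docs, vocab))
print(transform(["love this skirt"], vocab))
```

Because `CountVectorizer` lowercases by default, the explicit lowercasing in `preprocess_text` is redundant here, though harmless.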

Model Selection And Training

In [None]:
model = MultinomialNB()
model.fit(X_train_vec, y_train)
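
For intuition about what `MultinomialNB` learns, the sketch below implements the same idea by hand: class priors plus per-class token likelihoods with add-one (Laplace) smoothing, which corresponds to the default `alpha=1.0`. It is a simplified illustration on hypothetical reviews, with whitespace tokenization assumed, and not scikit-learn's code.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log likelihoods per class."""
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc.split())

    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        # Denominator: all tokens seen in class c, plus alpha per vocabulary entry.
        total = sum(token_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            tok: math.log((token_counts[c][tok] + alpha) / total) for tok in vocab
        }
    return log_prior, log_likelihood

def predict_nb(doc, log_prior, log_likelihood):
    """Pick the class with the highest log posterior; skip unknown tokens."""
    scores = {}
    for c in log_prior:
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][tok] for tok in doc.split() if tok in log_likelihood[c]
        )
    return max(scores, key=scores.get)

# Hypothetical, already-lowercased training reviews.
docs = ["love this dress", "great fit love it", "poor quality", "poor fit returned it"]
labels = ["Positive", "Positive", "Negative", "Negative"]

log_prior, log_likelihood = train_nb(docs, labels)
print(predict_nb("love the fit", log_prior, log_likelihood))
print(predict_nb("poor quality fabric", log_prior, log_likelihood))
```

Smoothing matters because a token never seen in a class would otherwise get probability zero and rule that class out entirely.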

Testing

In [None]:
y_pred = model.predict(X_test_vec)


Calculating Accuracy

In [None]:
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")

Accuracy: 82.60%
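
Accuracy alone can look flattering when one class dominates. The `Rating` summary shows a 25th percentile of 4, so something like three quarters or more of the reviews are likely Positive, and a model that always predicts Positive would already score in that range. A sketch of two complementary checks, a majority-class baseline and per-class recall, on hypothetical labels and predictions:

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def majority_baseline_accuracy(y_true):
    """Accuracy of always predicting the most common class."""
    return Counter(y_true).most_common(1)[0][1] / len(y_true)

def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}

# Hypothetical labels and predictions, for illustration only.
y_true = ["Positive"] * 8 + ["Neutral"] + ["Negative"]
y_pred = ["Positive"] * 8 + ["Positive"] + ["Negative"]

print(accuracy(y_true, y_pred))
print(majority_baseline_accuracy(y_true))
print(per_class_recall(y_true, y_pred))
```

In scikit-learn, `classification_report` and `confusion_matrix` from `sklearn.metrics` provide this kind of per-class breakdown for the notebook's `y_test` and `y_pred`.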


In [None]:
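def predict_sentiment(review):
    """Predict the sentiment class for a single raw review string."""
    # Apply the same lowercasing and fitted vocabulary used during training.
    review_vec = vectorizer.transform([review.lower()])
    return model.predict(review_vec)[0]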
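
The helper above relies on the global `vectorizer` and `model` staying in sync. One alternative, sketched here on hypothetical toy reviews, is to bundle both steps in a scikit-learn pipeline so that raw text goes in and a label comes out. This is a possible restructuring, not what the notebook does.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training reviews and labels.
train_reviews = [
    "love this dress",
    "great fit love it",
    "terrible quality",
    "awful fabric terrible",
]
train_labels = ["Positive", "Positive", "Negative", "Negative"]

# The pipeline vectorizes and classifies in one object.
sentiment_pipeline = make_pipeline(CountVectorizer(), MultinomialNB())
sentiment_pipeline.fit(train_reviews, train_labels)

predictions = sentiment_pipeline.predict(["Love the fit!", "Terrible fabric"])
print(list(predictions))
```

A single fitted pipeline is also easier to save and reload than a separate vectorizer and model.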


Test with custom reviews

In [None]:
test_reviews = [
    "Amazing dress! Fits perfectly.",
    "It's okay, but could be better.",
    "I'm upset because for the price of the dress"
]

for review in test_reviews:
    print(f"Review: {review}\nPredicted Sentiment: {predict_sentiment(review)}\n")


Review: Amazing dress! Fits perfectly.
Predicted Sentiment: Positive

Review: It's okay, but could be better.
Predicted Sentiment: Neutral

Review: I'm upset because for the price of the dress
Predicted Sentiment: Negative
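
A single predicted label hides how confident the model is. `MultinomialNB` also exposes `predict_proba`, which returns one probability per class in the order given by `classes_`. A sketch on a hypothetical three-review training set; `toy_model`, `toy_vectorizer`, and `predict_with_confidence` are illustrative names, not part of the notebook.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy training set standing in for the real reviews.
toy_reviews = ["love this dress", "it is okay", "terrible quality"]
toy_labels = ["Positive", "Neutral", "Negative"]

toy_vectorizer = CountVectorizer()
toy_model = MultinomialNB().fit(toy_vectorizer.fit_transform(toy_reviews), toy_labels)

def predict_with_confidence(review):
    """Return {class: probability} for one review."""
    probs = toy_model.predict_proba(toy_vectorizer.transform([review]))[0]
    return dict(zip(toy_model.classes_, probs))

result = predict_with_confidence("love the quality")
print(result)
```

Naive Bayes probabilities tend to be poorly calibrated, so they are better read as a relative ranking of the classes than as exact likelihoods.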



Explanation: This project analyzes customer reviews of women's clothing and predicts their sentiment (Positive, Neutral, or Negative) using machine learning. Sentiment labels are derived from each review's star rating. A CountVectorizer converts the lowercased review text into bag-of-words count features, and a Multinomial Naive Bayes model is trained on them. On the held-out 20% test split the model reaches 82.60% accuracy, and it can label new reviews, which can help businesses understand customer feedback and make better-informed decisions.