### Data Science Project Presentation

# Title Slide
## Project Title: Women's Clothing E-Commerce Reviews Analysis

# Introduction and Objective
## Brief Overview
This project analyzes customer reviews from a women's clothing e-commerce store to understand customer sentiment and product satisfaction.

## Objectives
- Perform sentiment analysis on customer reviews.
- Identify key factors influencing positive and negative feedback.
- Build a predictive model to classify sentiment based on review text.

# Motivation and Background
## Why is this problem important?
- Customer feedback directly impacts business strategies.
- Sentiment analysis helps brands improve customer experience.
- Automated sentiment classification can aid product improvement.

## Background
- E-commerce platforms rely heavily on customer feedback for sales.
- Sentiment analysis applies Natural Language Processing (NLP) to classify reviews.

# Data Description
## Data Sources
- Dataset obtained from [Kaggle](https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviews).

## Type of Data
- Structured (tabular) data with numerical attributes (for example `Rating` and `Age`) and free-text attributes (`Review Text`).

## Key Features Used
- `Review Text`: Main text data for sentiment analysis.
- `Rating`: Customer star rating (1 to 5), used to derive the sentiment label.
- `Age`: Reviewer age, analyzed for correlation with sentiment.
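
A sentiment label can be derived from `Rating` with a simple threshold rule. The sketch below assumes a three-class scheme (ratings 4 to 5 as positive, 3 as neutral, 1 to 2 as negative). The cut-offs and the helper name `rating_to_sentiment` are illustrative assumptions, not confirmed details of this project.

```python
def rating_to_sentiment(rating: int) -> str:
    """Map a 1-5 star rating to a sentiment label (assumed thresholds)."""
    if rating >= 4:
        return "Positive"
    if rating == 3:
        return "Neutral"
    return "Negative"


# With pandas, the same rule would typically be applied column-wise:
#   df["Sentiment"] = df["Rating"].apply(rating_to_sentiment)
labels = [rating_to_sentiment(r) for r in [5, 4, 3, 2, 1]]
print(labels)
```

Moving the thresholds changes the class balance, so the mapping is worth stating explicitly wherever results are reported.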

# Methodology
## Data Wrangling and ETL Processes
- Loaded dataset and handled missing values.
- Dropped unnecessary columns and performed data cleaning.
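
The cleaning steps above might look roughly like the following pandas sketch. The helper `clean_reviews`, the choice of columns to keep, and the hand-made sample frame are assumptions for illustration; the real pipeline would read the Kaggle CSV instead.

```python
import pandas as pd


def clean_reviews(df: pd.DataFrame) -> pd.DataFrame:
    """Keep the columns used in the analysis and drop rows without review text."""
    keep = ["Review Text", "Rating", "Age"]
    cleaned = df[keep].dropna(subset=["Review Text", "Rating"]).copy()
    cleaned["Review Text"] = cleaned["Review Text"].str.strip()
    # Remove reviews that are empty after stripping whitespace
    cleaned = cleaned[cleaned["Review Text"] != ""]
    return cleaned.reset_index(drop=True)


# Tiny hand-made sample standing in for the real dataset
sample = pd.DataFrame({
    "Review Text": ["Love this dress!", None, "  ", "Runs small."],
    "Rating": [5, 4, 3, 2],
    "Age": [34, 51, 28, 45],
    "Title": ["Great", "Ok", None, "Sizing"],  # example of a column that gets dropped
})
cleaned = clean_reviews(sample)
print(cleaned)
```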

## Feature Extraction and Selection Techniques
- Text preprocessing: Tokenization, Stopword Removal, Lemmatization.
- Feature representation: TF-IDF vectorization.

## Algorithms/Models Used
- Logistic Regression for sentiment classification.
- Evaluation of alternative models (SVM, Random Forest).
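
One way to set up the three candidate models is to wrap each in its own TF-IDF pipeline, as sketched below. The toy texts, the use of `LinearSVC` as the SVM variant, and the hyperparameters are assumptions for illustration, not the settings used in this project.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy data standing in for the cleaned reviews and their labels
texts = [
    "love the soft fabric and perfect fit",
    "beautiful dress, very comfortable",
    "great quality, would buy again",
    "cheap fabric and poor stitching",
    "runs small and looks nothing like the photo",
    "terrible quality, returned it",
]
labels = ["Positive", "Positive", "Positive", "Negative", "Negative", "Negative"]

candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Linear SVM": LinearSVC(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

fitted = {}
for name, clf in candidates.items():
    # Each model gets its own TF-IDF step so the pipelines stay independent
    pipeline = make_pipeline(TfidfVectorizer(stop_words="english"), clf)
    fitted[name] = pipeline.fit(texts, labels)

new_review = ["soft fabric and a perfect fit"]
for name, pipeline in fitted.items():
    print(name, "->", pipeline.predict(new_review)[0])
```

Keeping the vectorizer inside the pipeline means it is fit only on training data when the pipeline is later used with a train/test split or cross-validation.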

# Exploratory Data Analysis
## Key Insights from EDA
- Distribution of sentiments across different age groups.
- Common words in positive vs. negative reviews.
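
A word-frequency comparison between positive and negative reviews can be sketched with the standard library alone. The stopword list, the helper `top_words`, and the sample reviews below are illustrative assumptions.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real analysis would use a fuller one
STOPWORDS = {"the", "and", "a", "is", "it", "this", "was", "i", "but", "too", "so"}


def top_words(texts, n=3):
    """Return the n most frequent non-stopword tokens across texts."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)


positive = ["Love the fit and the fabric", "Perfect fit, love it"]
negative = ["The fabric is cheap", "Cheap and too small"]

print("Positive:", top_words(positive))
print("Negative:", top_words(negative))
```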

## Visualizations
```python
import matplotlib.pyplot as plt
import seaborn as sns

# Sentiment distribution; assumes `df` has a `Sentiment` column
sns.countplot(data=df, x="Sentiment")
plt.title("Sentiment Distribution")
plt.show()
```

# Results and Evaluation
## Model Performance Metrics
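```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Assumes `y_test` (true labels) and `y_pred` (model predictions) already exist
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
```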

## Model Comparisons
- Compared Logistic Regression, SVM, and Random Forest.
- Evaluated each model on precision, recall, and F1-score.
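
For reference, the per-class metrics used in the comparison can be computed by hand as below. The helper `per_class_metrics` and the label lists are made up to show the arithmetic; they are not results from this project.

```python
def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, F1) for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative labels only
y_true = ["Positive", "Positive", "Positive", "Negative", "Neutral", "Negative"]
y_pred = ["Positive", "Positive", "Negative", "Negative", "Positive", "Negative"]

for label in ["Positive", "Neutral", "Negative"]:
    p, r, f = per_class_metrics(y_true, y_pred, label)
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this toy example the `Neutral` class scores zero on all three metrics because it is never predicted, which is the kind of gap that overall accuracy can hide.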

# Conclusion and Future Work
## Summary of Findings
- The model classified review sentiment with high accuracy (see the metrics under Results and Evaluation).
- Identified key influencing words and factors in reviews.

## Limitations
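- Class imbalance in the dataset may bias the classifier toward the majority sentiment class.
- Sentiment labels are derived from predefined star ratings, which may not always match the tone of the review text.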
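
To gauge how severe the imbalance is, class counts can be turned into balancing weights. The sketch below uses the same heuristic that scikit-learn applies for `class_weight="balanced"`, namely `n_samples / (n_classes * class_count)`. The helper name and the label counts are invented for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Invented distribution: 6 positive, 2 neutral, 2 negative
labels = ["Positive"] * 6 + ["Neutral"] * 2 + ["Negative"] * 2
weights = balanced_class_weights(labels)
for cls, weight in weights.items():
    print(f"{cls}: {weight:.2f}")
```

Rare classes receive weights above 1 and the majority class a weight below 1, so misclassifying a minority review costs the model more during training.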

## Future Improvements
- Implement deep learning models (LSTM, BERT), which may improve accuracy.
- Explore multi-class classification beyond three sentiment classes.

# Q&A
- Open for questions and feedback from the audience.
