This project classifies customer reviews as positive or negative using Logistic Regression. The workflow covers text preprocessing, feature extraction, model training, prediction, and performance evaluation.
The dataset consists of customer reviews labeled with sentiment scores:
- Review: The text of the customer’s review
- Sentiment: The target variable (1 = Positive, 0 = Negative)
Dataset file: customer_reviews_sentiment.csv
Install the necessary dependencies before running the code:
pip install pandas numpy scikit-learn nltk

- Load the dataset
- Handle missing values
- Convert text to lowercase
- Remove special characters and punctuation
- Remove stop words (using NLTK or spaCy)
- Tokenize the text
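The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the `clean_review` helper, the `clean_review` column, and the tiny `STOP_WORDS` set are assumptions made for the example (NLTK's English stop-word list would be the fuller choice once its corpus is downloaded), and a toy DataFrame stands in for the CSV so the snippet runs on its own.

```python
import re

import pandas as pd

# Tiny stand-in stop-word set (assumption). In the project, NLTK's
# stopwords.words("english") would likely replace it.
STOP_WORDS = {"a", "an", "the", "is", "it", "this", "i", "and", "to", "of", "after"}


def clean_review(text: str) -> str:
    """Lowercase, replace non-letters with spaces, tokenize, drop stop words."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # strips punctuation, digits, symbols
    tokens = text.split()                  # simple whitespace tokenization
    return " ".join(t for t in tokens if t not in STOP_WORDS)


# In the project this frame would come from
# pd.read_csv("customer_reviews_sentiment.csv"); toy rows keep the sketch runnable.
df = pd.DataFrame({
    "Review": ["This product is amazing! I love it.", None, "It broke after one use."],
    "Sentiment": [1, 0, 0],
})

# Handle missing values by dropping rows without a review or a label.
df = df.dropna(subset=["Review", "Sentiment"]).reset_index(drop=True)
df["clean_review"] = df["Review"].apply(clean_review)
print(df[["clean_review", "Sentiment"]])
```

Dropping incomplete rows is only one way to handle missing values; it is chosen here because a review without text or label cannot be used for training.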
- Convert text into numerical features using:
  - Bag of Words (BoW), or
  - TF-IDF (Term Frequency-Inverse Document Frequency)
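As a rough illustration of the two feature-extraction options, the sketch below runs scikit-learn's `CountVectorizer` (Bag of Words) and `TfidfVectorizer` on three already-cleaned toy reviews. The `docs` list is made up for the example, and both vectorizers use their default settings, which may not match what the project ends up choosing.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Toy cleaned reviews (assumption); the project would use the preprocessed text.
docs = ["product amazing love", "broke one use", "love product"]

# Bag of Words: each column holds the raw count of one vocabulary term.
bow = CountVectorizer()
X_bow = bow.fit_transform(docs)

# TF-IDF: counts are reweighted so terms that appear in many reviews matter less,
# and each row is L2-normalized by default.
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(docs)

vocab = sorted(bow.vocabulary_)
print(vocab)
print(X_bow.toarray())
print(X_tfidf.toarray().round(2))
```

Both calls return sparse matrices with one row per review and one column per vocabulary term, so either can be passed directly to a scikit-learn classifier.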
- Split the dataset into training (80%) and testing (20%) sets
- Train a Logistic Regression classifier on extracted features
- Tune hyperparameters (experiment with regularization parameter C)
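The split, training, and tuning steps above might look something like the sketch below. It assumes TF-IDF features, a three-value grid for `C`, and a ten-review toy dataset in place of the real CSV; none of these choices come from the project itself. Wrapping the vectorizer and classifier in a `Pipeline` means the vocabulary is learned from the training folds only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Toy data (assumption) standing in for customer_reviews_sentiment.csv.
reviews = [
    "amazing product love it", "great quality works perfectly",
    "excellent value very happy", "love this would buy again",
    "fantastic highly recommend", "broke after one use",
    "terrible quality disappointed", "waste of money awful",
    "stopped working very disappointed", "poor design do not buy",
]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# 80/20 split, stratified so both classes appear in each part.
X_train, X_test, y_train, y_test = train_test_split(
    reviews, labels, test_size=0.2, random_state=42, stratify=labels
)

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Smaller C means stronger regularization; the grid values are illustrative.
search = GridSearchCV(pipeline, {"clf__C": [0.1, 1.0, 10.0]}, cv=2, scoring="accuracy")
search.fit(X_train, y_train)
print("Best C:", search.best_params_["clf__C"])

# The refitted best pipeline accepts raw review text directly.
new_reviews = [
    "This product is amazing! I love it.",
    "It broke after one use, completely disappointed.",
]
predictions = search.predict(new_reviews)
print(predictions)
```

With a dataset this small the selected `C` and the predictions carry little meaning; on the real data a larger grid and more cross-validation folds would be reasonable.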
- Predict the sentiment for the following reviews:
  - "This product is amazing! I love it."
  - "It broke after one use, completely disappointed."
- Compute accuracy on the test dataset
- Generate confusion matrix and classification report (precision, recall, F1-score)
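A small sketch of the evaluation step, using scikit-learn's metric functions. The `y_test` and `y_pred` lists here are hypothetical values chosen for illustration; in the project they would be the held-out labels and the model's predictions on the test set.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Hypothetical true labels and predictions (1 = Positive, 0 = Negative).
y_test = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

accuracy = accuracy_score(y_test, y_pred)

# Rows are true classes, columns are predicted classes, in the order given by labels.
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])

# Per-class precision, recall, and F1-score as a formatted string.
report = classification_report(y_test, y_pred, target_names=["Negative", "Positive"])

print(f"Accuracy: {accuracy:.2f}")
print(cm)
print(report)
```

For these made-up values, six of eight predictions match, so accuracy is 0.75, and the confusion matrix shows one false positive and one false negative.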
- Experiment with other classifiers (e.g., Naive Bayes, SVM)
- Compare their performance with Logistic Regression
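One possible way to run the comparison is to cross-validate each classifier inside the same TF-IDF pipeline, as sketched below. The toy reviews, the use of `MultinomialNB` and `LinearSVC` as the Naive Bayes and SVM variants, and the five-fold setting are all assumptions made for this example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy data (assumption) standing in for customer_reviews_sentiment.csv.
reviews = [
    "amazing product love it", "great quality works perfectly",
    "excellent value very happy", "love this would buy again",
    "fantastic highly recommend", "broke after one use",
    "terrible quality disappointed", "waste of money awful",
    "stopped working very disappointed", "poor design do not buy",
]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": MultinomialNB(),
    "Linear SVM": LinearSVC(),
}

scores = {}
for name, clf in candidates.items():
    # Same features for every model, so only the classifier differs.
    pipe = make_pipeline(TfidfVectorizer(), clf)
    scores[name] = cross_val_score(pipe, reviews, labels, cv=5, scoring="accuracy").mean()

for name, score in scores.items():
    print(f"{name}: mean CV accuracy = {score:.2f}")
```

Using identical folds and features for each model keeps the comparison fair; on the real dataset, precision, recall, and F1-score could be compared in the same way.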
- Preprocessed Dataset: Cleaned text data
- Feature-Engineered Dataset: Extracted numerical features
- Trained Model: Logistic Regression with optimized hyperparameters
- Model Evaluation: Accuracy, confusion matrix, classification report
- Sample Predictions: Results for provided test cases
- Model Comparison: Performance of alternative classifiers
- Ensure the dataset is available as customer_reviews_sentiment.csv
- Run the preprocessing and feature extraction scripts
- Train the Logistic Regression model
- Evaluate performance and compare models
Pedahel Emmanuel Kojo
Senior Software Engineer, Machine Learning Engineer at CSP