This project performs sentiment analysis: it classifies text as positive, negative, or neutral. It uses Natural Language Processing (NLP) techniques to clean and vectorize the text, then trains machine learning models on the result in order to summarize public opinion or user feedback.
The dataset used contains text data such as reviews or comments.
The preprocessing steps include:
- Converting all text to lowercase
- Removing punctuation, numbers, and special characters
- Tokenization and stopword removal
- Stemming or Lemmatization for word normalization
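
The steps above can be sketched as a single function. This is a minimal illustration, not the project's actual code: the name `preprocess`, the tiny `STOPWORDS` set, and the optional `stem` parameter are all assumptions made for the example. It uses only the standard library so it runs without downloading NLTK data.

```python
import re

# Tiny illustrative stopword list (an assumption for this sketch);
# nltk.corpus.stopwords provides a much fuller list.
STOPWORDS = {"a", "an", "the", "is", "it", "this", "and", "or", "to", "of", "was", "i"}


def preprocess(text, stem=None):
    """Lowercase, strip non-letters, tokenize, drop stopwords, optionally stem."""
    text = text.lower()
    # Replace punctuation, digits, and special characters with spaces.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Whitespace tokenization.
    tokens = text.split()
    tokens = [t for t in tokens if t not in STOPWORDS]
    # Optional normalization: any callable mapping a word to its stem or lemma.
    if stem is not None:
        tokens = [stem(t) for t in tokens]
    return tokens


print(preprocess("This movie was GREAT!!! 10/10, I loved it."))
```

To add stemming, a callable such as `nltk.stem.PorterStemmer().stem` can be passed as `stem`; a lemmatizer's method can be plugged in the same way.
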
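To convert text into numerical form, either CountVectorizer (raw term counts) or TF-IDF vectorization (counts reweighted so that terms appearing in many documents carry less weight) is used.
The resulting feature matrix lets the machine learning model pick up on word usage patterns and the relative importance of each word.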
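
A short sketch of both vectorizers on a three-document toy corpus, assuming scikit-learn's `CountVectorizer` and `TfidfVectorizer` with default settings. The corpus is made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Made-up toy corpus for illustration only.
docs = ["good movie", "bad movie", "good good plot"]

# Raw term counts: one row per document, one column per vocabulary term.
count_vec = CountVectorizer()
counts = count_vec.fit_transform(docs)

# TF-IDF: the same layout, but terms found in many documents are down-weighted
# and each row is L2-normalized by default.
tfidf_vec = TfidfVectorizer()
tfidf = tfidf_vec.fit_transform(docs)

print(sorted(count_vec.vocabulary_))  # vocabulary terms in column order
print(counts.toarray())
print(tfidf.toarray().round(2))
```

Both calls return sparse matrices, which the scikit-learn classifiers listed in this README accept directly.
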
Several machine learning models can be trained and compared, such as:
- Logistic Regression
- Naive Bayes
- Support Vector Machine (SVM)
- Random Forest
The best-performing model is selected by comparing the candidates on the evaluation metrics listed below.
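
One way such a comparison might look, as a sketch under stated assumptions: the toy `texts` and `labels` are invented for the example, the hyperparameters are arbitrary, and macro-averaged F1 with 3-fold cross-validation is just one reasonable selection criterion. `LinearSVC` stands in for the SVM here.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Made-up toy data for illustration only; replace with the real dataset.
texts = [
    "great product, loved it", "excellent quality and fast delivery",
    "really happy with this purchase", "works great, highly recommend",
    "terrible product, hated it", "poor quality and slow delivery",
    "really unhappy with this purchase", "broke quickly, do not recommend",
    "the package arrived on tuesday", "it is a phone case",
    "the box contains one cable", "delivery was scheduled for monday",
]
labels = ["positive"] * 4 + ["negative"] * 4 + ["neutral"] * 4

# Candidate models; hyperparameters are arbitrary choices for this sketch.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": MultinomialNB(),
    "SVM": LinearSVC(),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

scores = {}
for name, clf in models.items():
    # Vectorizing inside the pipeline keeps each fold's test data out of fitting.
    pipeline = make_pipeline(TfidfVectorizer(), clf)
    fold_scores = cross_val_score(pipeline, texts, labels, cv=3, scoring="f1_macro")
    scores[name] = fold_scores.mean()

best_name = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: mean macro F1 = {score:.3f}")
print("Selected:", best_name)
```

With a dataset this small the scores are not meaningful; the point is the structure, where every model goes through the same pipeline and the same folds.
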
Model performance is evaluated using:
- Accuracy Score
- Confusion Matrix
- Precision, Recall, and F1-Score
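
These metrics are all available in `sklearn.metrics`. The sketch below computes them for hand-written true and predicted labels, which are invented for illustration and do not come from any model in this project.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

class_names = ["negative", "neutral", "positive"]

# Hand-written labels for illustration only.
y_true = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
y_pred = ["positive", "negative", "positive", "positive", "neutral", "neutral"]

# Fraction of predictions that match the true label.
accuracy = accuracy_score(y_true, y_pred)

# Rows are true classes, columns are predicted classes, ordered as class_names.
cm = confusion_matrix(y_true, y_pred, labels=class_names)

# Per-class precision, recall, and F1-score, plus averages.
report = classification_report(y_true, y_pred, labels=class_names, zero_division=0)

print(f"Accuracy: {accuracy:.3f}")
print(cm)
print(report)
```

The confusion matrix `cm` can also be drawn as a heatmap, for example with `seaborn.heatmap(cm, annot=True)`.
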
The project uses the following libraries:
- pandas, numpy: Data manipulation
- nltk, re: Text cleaning and preprocessing
- scikit-learn: Machine learning and evaluation
- matplotlib, seaborn: Visualization