Created by: I Gusti Ngurah Ervan Juli Ardana
I made this project for the ITS Internal Satria Data (Big Data Challenge) competition. The project uses machine learning to predict the sentiment expressed in movie reviews. Let's take a closer look.
- Import Libraries
- Load the Data
- Pre-process the Data
In the pre-processing stage, various approaches were employed to enhance accuracy:
- Lowercasing
- Removing punctuation
- Removing extra whitespace
- Removing numbers
- Eliminating stop words
- Tokenizing
- Stemming
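
The steps listed above could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual code: the `STOP_WORDS` set, the `naive_stem` helper, and the `preprocess` function are hypothetical names, the stop-word list is deliberately tiny, and the suffix stripping is a crude stand-in for a real stemmer such as a Porter stemmer. The order of the steps is adapted slightly so that stop words are removed after tokenizing.

```python
import re
import string

# Tiny illustrative stop-word list (an assumption); a real project would
# typically rely on a fuller list from an NLP library.
STOP_WORDS = {"a", "an", "the", "is", "was", "it", "this", "and", "of", "to", "i"}

# Naive suffix stripping, used here only as a stand-in for a real stemmer.
SUFFIXES = ("ing", "ed", "ly", "s")


def naive_stem(token: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Apply the pre-processing steps and return a list of stemmed tokens."""
    text = text.lower()                                               # lowercasing
    text = text.translate(str.maketrans("", "", string.punctuation))  # remove punctuation
    text = re.sub(r"\d+", "", text)                                   # remove numbers
    text = re.sub(r"\s+", " ", text).strip()                          # collapse whitespace
    tokens = text.split()                                             # tokenize on spaces
    tokens = [t for t in tokens if t not in STOP_WORDS]               # drop stop words
    return [naive_stem(t) for t in tokens]                            # stem each token


print(preprocess("This movie was AMAZING!!!  I watched it 2 times."))
# ['movie', 'amaz', 'watch', 'time']
```

Stems such as `amaz` are not dictionary words; that is expected, since stemming only aims to map related word forms to the same token.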
- Feature Extraction
Two feature extraction methods were explored: TF-IDF and n-grams. The accuracy analysis indicated that TF-IDF yielded superior results.
- Model Development
Several models were tested, including logistic regression, Naive Bayes, Random Forest, and SVM. Logistic regression was chosen because it achieved the highest accuracy.
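
One way to compare feature extractors and classifiers like this is to cross-validate every combination and pick the best mean accuracy. The sketch below assumes scikit-learn and uses a handful of made-up reviews as placeholder data; the specific settings (bigram range, linear SVM kernel, number of trees, three folds) are assumptions for illustration, not the configuration used in the project.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Placeholder data (hypothetical); the project used real movie reviews.
reviews = [
    "a wonderful and moving film",
    "great acting and a great story",
    "i loved every minute of it",
    "brilliant direction and wonderful cast",
    "an enjoyable and touching movie",
    "loved the story and the acting",
    "a boring and predictable film",
    "terrible acting and a weak story",
    "i hated every minute of it",
    "awful direction and a dull cast",
    "a tedious and forgettable movie",
    "hated the story and the acting",
]
labels = [1] * 6 + [0] * 6  # 1 = positive, 0 = negative

# Factories so that every pipeline gets fresh, unfitted components.
vectorizers = {
    "tfidf": lambda: TfidfVectorizer(),
    "ngrams": lambda: CountVectorizer(ngram_range=(1, 2)),  # unigram + bigram counts
}
models = {
    "logistic_regression": lambda: LogisticRegression(max_iter=1000),
    "naive_bayes": lambda: MultinomialNB(),
    "random_forest": lambda: RandomForestClassifier(n_estimators=50, random_state=0),
    "svm": lambda: SVC(kernel="linear"),
}

# Mean 3-fold cross-validated accuracy for each (vectorizer, model) pair.
scores = {}
for vec_name, make_vec in vectorizers.items():
    for model_name, make_model in models.items():
        pipeline = make_pipeline(make_vec(), make_model())
        fold_scores = cross_val_score(pipeline, reviews, labels, cv=3, scoring="accuracy")
        scores[(vec_name, model_name)] = fold_scores.mean()

best = max(scores, key=scores.get)
print("best combination:", best, "accuracy:", round(scores[best], 3))
```

Wrapping the vectorizer and classifier in one pipeline means the vocabulary is refitted on each training fold, so no information from the validation fold leaks into the features. On data this small the ranking is not meaningful; the point is only the shape of the comparison.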
- Hyperparameter Tuning
Hyperparameter tuning was performed using GridSearchCV to optimize the logistic regression model's parameters.
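
A grid search of this kind might look like the sketch below. It assumes scikit-learn's `GridSearchCV` wrapped around a TF-IDF plus logistic regression pipeline; the parameter grid (`C` and `class_weight`), the three-fold setting, and the toy reviews are illustrative assumptions, since the source does not state which parameters were searched.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Placeholder data (hypothetical); the project used real movie reviews.
reviews = [
    "a wonderful and moving film",
    "great acting and a great story",
    "i loved every minute of it",
    "brilliant direction and wonderful cast",
    "an enjoyable and touching movie",
    "loved the story and the acting",
    "a boring and predictable film",
    "terrible acting and a weak story",
    "i hated every minute of it",
    "awful direction and a dull cast",
    "a tedious and forgettable movie",
    "hated the story and the acting",
]
labels = [1] * 6 + [0] * 6  # 1 = positive, 0 = negative

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Step-name prefixes ("clf__") route each parameter to the right pipeline step.
param_grid = {
    "clf__C": [0.01, 0.1, 1.0, 10.0],        # inverse regularization strength
    "clf__class_weight": [None, "balanced"],
}

# Every grid combination is scored by 3-fold cross-validated accuracy.
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(reviews, labels)

print("best parameters:", search.best_params_)
print("best cross-validated accuracy:", round(search.best_score_, 3))
```

After fitting, `search.best_estimator_` holds the pipeline refitted on all the training data with the best parameters, and can be used directly to predict on a held-out test set.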
- Test Prediction
After completing all these steps, the model achieved an accuracy of 0.881.
![image](https://private-user-images.githubusercontent.com/114007640/329327560-80aafc25-469b-4908-b48b-d1d0d8400e7e.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjA0MTgwODMsIm5iZiI6MTcyMDQxNzc4MywicGF0aCI6Ii8xMTQwMDc2NDAvMzI5MzI3NTYwLTgwYWFmYzI1LTQ2OWItNDkwOC1iNDhiLWQxZDBkODQwMGU3ZS5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzA4JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcwOFQwNTQ5NDNaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0zMDIzNjMyOWIxMzE5OTE1MTNmNzBjOTIyNzE2YThiZTE2Njg0ZTE5ZTQyMWE0YmNkNzZmZGE4MDk1ZjgyNjVkJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.lWoXRozyXUoYLnwd3bKgor7AakDT_NmFHcbwpfVysuQ)