- The dataset is taken from the UCI Machine Learning Repository website
- nltk (Natural Language Toolkit)
- re (regular expression module)
- pandas
- sklearn
- numpy

Using all features to build the model:
- Accuracy = 0.979372197309417 :: Using Stemming and Bag of Words Model
- Accuracy = 0.976681614349775 :: Using Lemmatization and Bag of Words Model

Using max_features = 2500 (the 2500 most frequent terms) to build the model:
- Accuracy = 0.985650224215246 :: Using Stemming and Bag of Words Model
- Accuracy = 0.982959641255605 :: Using Lemmatization and Bag of Words Model
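
The stemming plus Bag of Words setup behind the numbers above can be sketched roughly as follows. This is a minimal illustration and not the project's actual script. The toy `messages` and `labels` are hypothetical stand-ins for the UCI data, the `MultinomialNB` classifier is an assumption (the classifier is not named here), and scikit-learn's built-in English stop-word list is swapped in so the example runs without downloading NLTK corpora.

```python
import re

from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer, ENGLISH_STOP_WORDS
from sklearn.naive_bayes import MultinomialNB

stemmer = PorterStemmer()


def preprocess(text):
    """Keep letters only, lowercase, drop stop words, and stem each token."""
    tokens = re.sub(r"[^a-zA-Z]", " ", text).lower().split()
    return " ".join(stemmer.stem(t) for t in tokens if t not in ENGLISH_STOP_WORDS)


# Hypothetical toy corpus (1 = promotional, 0 = personal); not the UCI data.
messages = [
    "Win a free prize now",
    "Free entry to win cash",
    "Claim your free prize today",
    "Are we meeting for lunch today",
    "See you at the meeting tomorrow",
    "Lunch tomorrow sounds good",
]
labels = [1, 1, 1, 0, 0, 0]

corpus = [preprocess(m) for m in messages]

# max_features=2500 keeps at most the 2500 most frequent terms;
# max_features=None (the default) keeps every term.
vectorizer = CountVectorizer(max_features=2500)
X = vectorizer.fit_transform(corpus)

# Assumed classifier: multinomial Naive Bayes on the word counts.
model = MultinomialNB()
model.fit(X, labels)

new_message = vectorizer.transform([preprocess("Free prize waiting")])
prediction = model.predict(new_message)[0]
print(prediction)
```

To compare against the lemmatization runs, the `stemmer.stem` call would be replaced by a lemmatizer such as NLTK's `WordNetLemmatizer`, which requires the WordNet corpus to be downloaded first.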