This repo demonstrates basic NLP skills for multi-label / multi-class text classification using two Kaggle datasets: Natural Disaster Tweets and Toxic Comments.
- For the Natural Disaster Tweets case, I demonstrate two approaches:
  - Support Vector Machines (SVM)
  - LSTM and CNN neural-network classifiers
- Both approaches reach an F1 score of about 78% on the test set.
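The SVM baseline for the tweets can be sketched roughly as below, assuming scikit-learn; the toy texts, labels, and pipeline settings are illustrative, not the repo's actual code or data.

```python
# Minimal sketch of a TF-IDF + linear SVM text classifier,
# assuming scikit-learn. Data below is a toy stand-in for the
# Natural Disaster Tweets set (1 = real disaster, 0 = not).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = [
    "Forest fire near La Ronge Sask. Canada",
    "Residents asked to shelter in place after flood warning",
    "I love this new song, it is fire",
    "What a beautiful sunny day at the beach",
]
labels = [1, 1, 0, 0]

# TF-IDF features (unigrams + bigrams) feeding a linear SVM.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(texts, labels)

predictions = model.predict(["Evacuation ordered after wildfire spreads"])
```

A linear kernel with TF-IDF features is a common, fast baseline for short-text classification and needs no GPU.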
- For the Toxic Comments case, I demonstrate:
  - SVM
  - Fine-tuning BERT (base, uncased)
- BERT achieves an F1 score of about 98% on the test set.
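The BERT fine-tuning step can be sketched as below, assuming the Hugging Face transformers library and PyTorch; the toy texts, labels, and learning rate are illustrative assumptions, not the repo's actual training setup.

```python
# Hedged sketch of fine-tuning BERT (base uncased) as a binary
# toxic-comment classifier, assuming transformers + PyTorch.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # 0 = non-toxic, 1 = toxic

# Toy stand-in data; the real task uses the Kaggle Toxic Comments set.
texts = ["you are awful", "have a nice day"]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# One gradient step; a real run loops over the full dataset for epochs.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()
```

Passing `labels` to the model makes it return a cross-entropy loss directly, which is the standard pattern for sequence-classification fine-tuning with this library.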
- Conclusion: SVM is a good choice for a quick classification project; given sufficient compute, fine-tuned BERT delivers excellent text-classification performance.