This project addresses the need to identify and classify toxic online comments. It presents several approaches and algorithms for classifying toxic comments into six categories (toxic, severe toxic, obscene, threat, insult, and identity hate), evaluated on a large set of Wikipedia comments from the Kaggle Toxic Comment Classification Challenge. Exploratory data analysis was performed to understand the underlying patterns, trends, and relationships in the data. The algorithms applied are Logistic Regression, Decision Tree, Random Forest, SVM, KNN, Naive Bayes SVM, LSTM, and Bi-GRU. Additionally, the efficacy of data augmentation is analysed.
Amandeep Kaur, Srinidhi Ayyagari, Drishti Gupta, Aayushi Bansal