- Installation
- Repository Structure
- Project Motivation
- File Descriptions
- Models Used
- Instructions To Run
- Results
- Acknowledgement, Author and Licensing
The code should run with no issues using Python 3.x. Using Jupyter Notebook from Anaconda is recommended. You may also use other data-visualization tools, such as Tableau, for reference. The required libraries, with appropriate versions, are listed in requirements.txt.
For years now the comment section of YouTube has been plagued with random spam, and YouTube doesn't seem to be doing anything about it. Here we introduce a revamped comment section that makes good comments float to the top and spam comments sink to the bottom.
data - This data file, included in the repository, contains all the data. It holds comments classified into three categories - Non-offensive, Hate-Speech and Abusive. The data was collected by scraping the YouTube API; the categories have been assigned manually.
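A minimal sketch of reading the labelled data with the standard library. The column names `comment` and `category` are assumptions for illustration; adjust them to match the actual file:

```python
import csv

def load_comments(path):
    """Read labelled comments; assumes columns named 'comment' and 'category'."""
    texts, labels = [], []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            texts.append(row["comment"])
            # One of: Non-offensive, Hate-Speech, Abusive (assigned manually).
            labels.append(row["category"])
    return texts, labels
```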
YouTube API - Scraping Comments
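As a rough sketch of how top-level comments can be pulled from the YouTube Data API v3 `commentThreads` endpoint (the helper names below are placeholders, not the repository's actual scraper, and a real API key and video ID are required to fetch live data):

```python
import json
import urllib.parse
import urllib.request

# Public endpoint of the YouTube Data API v3 for top-level comment threads.
API_URL = "https://www.googleapis.com/youtube/v3/commentThreads"

def fetch_comment_page(video_id, api_key, page_token=None):
    """Fetch one page (up to 100) of top-level comments for a video."""
    params = {"part": "snippet", "videoId": video_id,
              "key": api_key, "maxResults": 100}
    if page_token:
        params["pageToken"] = page_token
    url = API_URL + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def extract_comments(response):
    """Pull the display text of each top-level comment out of an API response."""
    return [item["snippet"]["topLevelComment"]["snippet"]["textDisplay"]
            for item in response.get("items", [])]
```

To collect more than one page, keep calling `fetch_comment_page` with the `nextPageToken` value from the previous response until it is absent.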
- Logistic regression:
- Support Vector Machine:
- Support Vector Machine with Linear Kernel:
- Support Vector Machine using RBF Kernel:
- Support Vector Machine using Polynomial Kernel:
- Decision Tree Classifier:
- K-Nearest Neighbour Classifier:
- Extra Tree Classifier:
- Random Forest Classifier:
- Model Parameter Optimization using GridSearchCV:
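The models listed above can be sketched with scikit-learn roughly as follows. The toy comments, labels, and parameter grid are illustrative assumptions, not the project's actual data or settings:

```python
# Minimal sketch: train each listed classifier on TF-IDF features and tune
# one of them with GridSearchCV. Toy data only (0 = Non-offensive,
# 1 = Hate-Speech, 2 = Abusive).
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

comments = [
    "great video thanks", "loved this tutorial", "very helpful content",
    "hateful insult example one", "hateful insult example two", "hateful insult example three",
    "spam link click here", "buy followers cheap now", "free money visit my page",
]
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]

# TF-IDF features over the raw comment text.
X = TfidfVectorizer().fit_transform(comments)

# One instance of each classifier from the list above.
models = {
    "logistic": LogisticRegression(max_iter=1000),
    "svm_linear": SVC(kernel="linear"),
    "svm_rbf": SVC(kernel="rbf"),
    "svm_poly": SVC(kernel="poly"),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "knn": KNeighborsClassifier(n_neighbors=3),
    "extra_trees": ExtraTreesClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(X, labels)

# Hyper-parameter optimization with GridSearchCV, shown here for the RBF SVM.
grid = GridSearchCV(SVC(kernel="rbf"),
                    {"C": [0.1, 1, 10], "gamma": ["scale", 0.1]},
                    cv=2)
grid.fit(X, labels)
print(grid.best_params_)
```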
First, install the dependencies:

```shell
pip install -r requirements.txt
```

Then run the app with Streamlit:

```shell
streamlit run main.py
```
The interactive web app is hosted on Streamlit and can be found here:
Credit for this project goes to:
- Dr. Ankit Bhurane for guiding us in this project
- Dr. Andrew Ng for his insightful course on Coursera
The code can be freely used by any individual or organization for their needs.