Spam Comment Identification using Naive Bayes Classifier

Project Overview

This project trains a Naive Bayes classifier to identify spam comments. The training data is a set of YouTube comments collected from Shakira's videos.

Project Structure

main.py: Python script containing the code for data preprocessing, model training, and evaluation.
Youtube05-Shakira.csv: Dataset file with comment data.
README.md: Project documentation.

Prerequisites

Python
Libraries: pandas, numpy, scikit-learn, nltk

How to Run

Install the required libraries: pip install pandas numpy scikit-learn nltk
Open and run the main.py script in any Python IDE, or run python main.py from the project directory.

Project Details

Data Loading: The dataset is loaded and explored using pandas to understand its structure.
Text Preprocessing: Comments are tokenized, stop words are removed, and stemming is applied.
Vectorization: CountVectorizer is used to convert text data into a numerical format.
Model Training: A Multinomial Naive Bayes classifier is trained on the transformed data.
Cross-validation: The model is evaluated using 5-fold cross-validation.
Testing: The trained model is tested on a set of new comments, and accuracy is reported.
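
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in main.py: the toy comments and labels are made up so the snippet runs without the CSV file, the preprocess helper is hypothetical, and the regex tokenizer and scikit-learn stop-word list stand in for whatever NLTK tokenizer and stop-word list the script actually uses.

```python
import re

from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer, ENGLISH_STOP_WORDS
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

stemmer = PorterStemmer()


def preprocess(comment):
    """Hypothetical helper: lowercase, tokenize, drop stop words, stem."""
    tokens = re.findall(r"[a-z]+", comment.lower())
    kept = [stemmer.stem(t) for t in tokens if t not in ENGLISH_STOP_WORDS]
    return " ".join(kept)


# Made-up toy data (1 = spam, 0 = not spam). With the real dataset you
# would load the CSV with pandas instead; its column names are not
# stated in this README, so they are not assumed here.
comments = [
    "Check out my channel and subscribe for free gifts",
    "Subscribe to my channel please, free music videos",
    "Click this link to win a free phone",
    "Visit my website to earn money fast",
    "Please check out my new video and subscribe",
    "I love this song so much",
    "Shakira has an amazing voice",
    "This song reminds me of the World Cup",
    "Best dance moves ever, love her",
    "Still listening to this song every day",
]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

cleaned = [preprocess(c) for c in comments]

# Putting the vectorizer inside the pipeline means the vocabulary is
# re-fit on the training part of each fold during cross-validation.
model = make_pipeline(CountVectorizer(), MultinomialNB())

# 5-fold cross-validation (stratified by default for classifiers).
scores = cross_val_score(model, cleaned, labels, cv=5)
print("Cross-validation accuracy per fold:", scores)

# Train on all data, then test on a few new (also made-up) comments.
model.fit(cleaned, labels)
new_comments = [
    "Subscribe to my channel for free stuff",
    "I love her voice in this song",
]
new_labels = [1, 0]
predictions = model.predict([preprocess(c) for c in new_comments])
print("Predictions:", predictions)
print("Test accuracy:", accuracy_score(new_labels, predictions))
```

PorterStemmer needs no extra NLTK downloads. If you switch to NLTK's own tokenizer or stop-word list, the corresponding NLTK data packages have to be downloaded first.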

Feel free to explore and modify the code to suit your needs!

Dongli99/NLP-SpamClassify
