This project focuses on detecting fake news using a range of machine learning techniques. The data comes from the Fake News dataset on Kaggle. Several approaches, including classical machine learning models, GloVe embeddings, BERT embeddings, and CNN-based models, are used to classify each news article as real or fake.
This repository contains code for data preprocessing, model training, and evaluation on a dataset of fake and real news articles. The techniques explored include:
- Data Preprocessing: Text cleaning, tokenization, stop-word removal, and lemmatization (a minimal NLTK sketch follows this list).
- Feature Extraction: GloVe word embeddings, BERT embeddings, and classical bag-of-words models.
- Model Training: Naive Bayes, Decision Tree, Random Forest, K-Nearest Neighbors, and CNN models (a toy bag-of-words Naive Bayes baseline is also sketched after this list).
- Evaluation: Accuracy, confusion matrix, and classification reports.
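Below is a minimal sketch of the preprocessing pipeline using NLTK; the `clean_text` helper and the example sentence are illustrative, not code from the repository.

```python
import re

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# One-time downloads of the NLTK resources used below
# (newer NLTK versions may additionally need "punkt_tab").
nltk.download("punkt")
nltk.download("stopwords")
nltk.download("wordnet")

STOP_WORDS = set(stopwords.words("english"))
LEMMATIZER = WordNetLemmatizer()

def clean_text(text):
    """Lowercase, strip non-letters, tokenize, drop stop words, lemmatize."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())        # text cleaning
    tokens = word_tokenize(text)                         # tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop-word removal
    return [LEMMATIZER.lemmatize(t) for t in tokens]     # lemmatization

print(clean_text("Breaking: Scientists discover the moon is made of cheese!"))
# -> ['breaking', 'scientist', 'discover', 'moon', 'made', 'cheese']
```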
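Similarly, a toy version of the classical baseline, bag-of-words features feeding a Naive Bayes classifier; the four example headlines are made up for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy corpus standing in for the cleaned article texts (1 = fake, 0 = real).
texts = [
    "aliens secretly endorse presidential candidate",
    "miracle cure doctors do not want you to know about",
    "city council approves new transit budget",
    "central bank holds interest rates steady",
]
labels = [1, 1, 0, 0]

# Bag-of-words features plus multinomial Naive Bayes.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
clf = MultinomialNB().fit(X, labels)

print(clf.predict(vectorizer.transform(["council approves new budget"])))  # [0]
```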
The following libraries are required to run the project:
pip install nltk numpy pandas seaborn matplotlib scikit-learn gensim tensorflow torch tqdm keras
The dataset used in this project is the Fake News dataset, which is available on Kaggle. It contains labeled news articles with the following columns:
- `id`: Unique identifier for each article.
- `title`: Title of the article.
- `author`: Author of the article.
- `text`: The content of the article.
- `label`: Target variable (1 for fake news, 0 for real news).
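A minimal sketch of loading the data with pandas, assuming `train.csv` is in the working directory:

```python
import pandas as pd

# Load the labeled training data; columns: id, title, author, text, label.
train = pd.read_csv("train.csv")

print(train.shape)
print(train["label"].value_counts())  # 1 = fake, 0 = real

# Some articles have a missing title, author, or text; fill the gaps
# before preprocessing so string operations do not fail on NaN.
train = train.fillna("")
```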
The project has the following directory and file structure:
- `train.csv`, `test.csv`: Training and test datasets.
- `GloVe Embedding/`: Folder containing pre-trained GloVe embeddings and processed data (not included due to size).
- `BERT Embedding/`: Folder containing tokenized data and BERT embeddings (not included due to size).
- `Models/`: Folder containing trained models (e.g., the CNN model).
- `Training/`: Jupyter notebook containing exploratory data analysis, model training, and evaluation.
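Since the `GloVe Embedding/` folder is not checked in, the vectors have to be obtained separately. The sketch below loads them into a dictionary and averages per-document vectors; the `glove.6B.100d.txt` filename and the 100-dimensional size are assumptions, so adjust them to the files you actually use.

```python
import numpy as np

EMBED_DIM = 100  # must match the GloVe file used (assumed here)

# Parse the GloVe text file: each line is a word followed by its vector.
glove = {}
with open("GloVe Embedding/glove.6B.100d.txt", encoding="utf-8") as f:
    for line in f:
        word, *vec = line.split()
        glove[word] = np.asarray(vec, dtype="float32")

def embed_tokens(tokens):
    """Average the GloVe vectors of the in-vocabulary tokens."""
    vecs = [glove[t] for t in tokens if t in glove]
    return np.mean(vecs, axis=0) if vecs else np.zeros(EMBED_DIM, dtype="float32")
```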
The models' performance is evaluated using accuracy, precision, recall, and F1-score. The CNN-based model using GloVe embeddings achieved an accuracy of 89.28% on the test set.
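For reference, here is one way such a GloVe-plus-CNN model can be wired up in Keras. This is a sketch only: the vocabulary size, sequence length, and layer sizes are assumptions rather than the exact architecture used in the notebooks.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Assumed sizes; the embedding matrix would be filled from the GloVe
# vectors, one row per word in the tokenizer's vocabulary.
VOCAB_SIZE, MAX_LEN, EMBED_DIM = 20000, 500, 100
embedding_matrix = np.zeros((VOCAB_SIZE, EMBED_DIM), dtype="float32")

model = keras.Sequential([
    keras.Input(shape=(MAX_LEN,)),
    layers.Embedding(
        VOCAB_SIZE, EMBED_DIM,
        embeddings_initializer=keras.initializers.Constant(embedding_matrix),
        trainable=False,                       # keep the GloVe vectors frozen
    ),
    layers.Conv1D(128, 5, activation="relu"),  # 1-D convolution over token positions
    layers.GlobalMaxPooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),     # probability the article is fake
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```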
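The evaluation step itself reduces to a few scikit-learn calls; the toy labels below stand in for the held-out test labels and any model's predictions:

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Placeholder labels/predictions; substitute a trained model's output.
y_test = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 0, 1, 1]

print("Accuracy:", accuracy_score(y_test, y_pred))  # 0.875
print(confusion_matrix(y_test, y_pred))             # rows: true, cols: predicted
print(classification_report(y_test, y_pred, target_names=["real", "fake"]))
```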