<a href="https://colab.research.google.com/github/Francisroyce/Francisroyce/blob/main/fake_news_detection_tensorflow.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [5]:
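# Uncomment to mount Google Drive in Colab; the CSV paths used later assume it is mounted at /content/drive.
# from google.colab import drive
# drive.mount('/content/drive')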

**Project Title: Fake News Detection**

**Project Overview:**
The Fake News Detection project develops a machine learning model that distinguishes genuine news articles from fake ones. Because digital media makes information easy to share, the spread of misinformation and fake news has become a significant concern. The project contributes to the fight against misinformation by automating the identification of fake news with natural language processing and machine learning techniques.

**Project Steps:**

1. **Data Collection:** Gather a diverse and comprehensive dataset consisting of labeled news articles, including both genuine and fake examples. This dataset will serve as the foundation for training and evaluating the model.

2. **Data Preprocessing:** Clean and preprocess the text data by removing irrelevant information, special characters, and formatting inconsistencies. Perform tokenization, stemming, and other text normalization techniques to prepare the data for model training.

3. **Feature Extraction:** Convert the preprocessed text into numerical features that can be fed into machine learning algorithms. Common techniques include TF-IDF (Term Frequency-Inverse Document Frequency) and word embeddings like Word2Vec or GloVe.

4. **Model Selection:** Experiment with various machine learning algorithms such as Support Vector Machines (SVM), Random Forests, and neural networks like Convolutional Neural Networks (CNN) or Recurrent Neural Networks (RNN). Select the best-performing algorithm based on evaluation metrics.

5. **Model Training:** Divide the dataset into training, validation, and testing sets. Train the selected model on the training data and fine-tune its parameters using the validation set. This step involves optimizing the model's ability to differentiate between genuine and fake news.

6. **Model Evaluation:** Assess the trained model's performance using appropriate evaluation metrics such as accuracy, precision, recall, F1-score, and area under the ROC curve (ROC AUC). Adjust and iterate on the model if necessary to reach the desired performance.

7. **Deployment:** Once a satisfactory model is achieved, deploy it to a user-friendly interface. This could be a web application, browser extension, or API that allows users to input news articles and receive a prediction about their authenticity.

8. **Continuous Improvement:** Regularly update and retrain the model with new data to adapt to evolving trends in fake news. Implement feedback mechanisms to gather user input and improve the model's accuracy over time.
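
Steps 2 and 3 above can be illustrated with a minimal sketch that uses only the standard library. It is not the notebook's actual pipeline: the helper names `clean_text`, `build_vocab`, and `encode` and the two sample sentences are made up for illustration, and the notebook itself imports `contractions`, `Tokenizer`, and `pad_sequences` for this job. The sketch only shows the general idea of normalizing text and turning it into fixed-length integer sequences.

```python
import re
import string


def clean_text(text: str) -> str:
    """Lowercase, drop URLs and ASCII punctuation, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def build_vocab(texts):
    """Map each word to an integer id (0 = padding, 1 = unknown word)."""
    vocab = {}
    for text in texts:
        for word in text.split():
            # Ids start at 2 and follow order of first appearance.
            vocab.setdefault(word, len(vocab) + 2)
    return vocab


def encode(text, vocab, max_len):
    """Convert cleaned text to exactly max_len ids, truncating or zero-padding at the end."""
    ids = [vocab.get(word, 1) for word in text.split()][:max_len]
    return ids + [0] * (max_len - len(ids))


# Hypothetical sample sentences, not rows from the dataset.
docs = [
    "Why the Truth Might Get You Fired!",
    "BREAKING: read more at https://example.com now",
]
cleaned = [clean_text(doc) for doc in docs]
vocab = build_vocab(cleaned)
encoded = encode(clean_text("The truth is out"), vocab, max_len=6)
print(cleaned)
print(encoded)  # [3, 4, 1, 1, 0, 0]
```

Keras's `Tokenizer` assigns ids by word frequency and `pad_sequences` pads at the front by default, so the exact numbers produced by those utilities would differ from this sketch.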

**Project Goals:**

1. Develop a reliable and accurate fake news detection model.
2. Provide users with a tool to verify the credibility of news articles they encounter online.
3. Contribute to the efforts in reducing the spread of misinformation and fake news.
4. Raise awareness about the importance of critical media consumption and fact-checking.

**Potential Challenges:**

1. **Data Quality:** Ensuring the dataset is representative and balanced to prevent bias.
2. **Feature Engineering:** Selecting and extracting relevant features from the text data.
3. **Model Complexity:** Balancing model complexity with computational resources and deployment constraints.
4. **Adversarial Attacks:** Mitigating potential attempts to manipulate the model by generating sophisticated fake news.

By successfully implementing this project, we can create a valuable tool to combat the spread of fake news and promote more informed and responsible media consumption.

# Importing dependencies

In [6]:
%pip install contractions




In [4]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [10, 10]

import seaborn as sns
sns.set_theme(style="darkgrid")

from wordcloud import WordCloud

import contractions
import string
import re

from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, Embedding, Flatten

import warnings
warnings.filterwarnings('ignore')


In [12]:
train_data = pd.read_csv('/content/drive/MyDrive/fake_news/train.csv', header=0)
test_data = pd.read_csv('/content/drive/MyDrive/fake_news/test.csv', header=0)

In [13]:
train_data.head()

|   | id | title | author | text | label |
|---|---|---|---|---|---|
| 0 | 0 | House Dem Aide: We Didn’t Even See Comey’s Let... | Darrell Lucus | House Dem Aide: We Didn’t Even See Comey’s Let... | 1 |
| 1 | 1 | FLYNN: Hillary Clinton, Big Woman on Campus - ... | Daniel J. Flynn | Ever get the feeling your life circles the rou... | 0 |
| 2 | 2 | Why the Truth Might Get You Fired | Consortiumnews.com | Why the Truth Might Get You Fired October 29, ... | 1 |
| 3 | 3 | 15 Civilians Killed In Single US Airstrike Hav... | Jessica Purkiss | Videos 15 Civilians Killed In Single US Airstr... | 1 |
| 4 | 4 | Iranian woman jailed for fictional unpublished... | Howard Portnoy | Print \nAn Iranian woman has been sentenced to... | 1 |
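
The Data Quality challenge listed earlier calls for a balanced dataset, so a quick look at the label distribution is a reasonable next check. The sketch below is only an illustration: `class_balance` is a made-up helper, and it is applied to the five labels visible in the preview above, which say nothing reliable about the full training set. On the real DataFrame, `train_data['label'].value_counts(normalize=True)` would give the same kind of summary.

```python
from collections import Counter


def class_balance(labels):
    """Return the share of each label as a fraction of all labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Labels of the five preview rows only, not the whole dataset.
preview_labels = [1, 0, 1, 1, 1]
balance = class_balance(preview_labels)
print(balance)  # {1: 0.8, 0: 0.2}
```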


In [15]:
test_data.head()

|   | id | title | author | text |
|---|---|---|---|---|
| 0 | 20800 | Specter of Trump Loosens Tongues, if Not Purse... | David Streitfeld | PALO ALTO, Calif. — After years of scorning... |
| 1 | 20801 | Russian warships ready to strike terrorists ne... | NaN | Russian warships ready to strike terrorists ne... |
| 2 | 20802 | #NoDAPL: Native American Leaders Vow to Stay A... | Common Dreams | Videos #NoDAPL: Native American Leaders Vow to... |
| 3 | 20803 | Tim Tebow Will Attempt Another Comeback, This ... | Daniel Victor | If at first you don’t succeed, try a different... |
| 4 | 20804 | Keiser Report: Meme Wars (E995) | Truth Broadcast Network | 42 mins ago 1 Views 0 Comments 0 Likes 'For th... |


In [16]:
train_data.isnull().sum()

id           0
title      558
author    1957
text        39
label        0
dtype: int64

In [17]:
test_data.isna().sum()

id          0
title     122
author    503
text        7
dtype: int64
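
Both files have missing values in `title`, `author`, and `text`. One common way to handle this, shown here only as a sketch and not necessarily what this notebook does next, is to replace missing text with empty strings so that later string operations do not fail. The helper `fill_missing_text` and the small `sample` frame are hypothetical stand-ins for `train_data` and `test_data`.

```python
import pandas as pd


def fill_missing_text(df, columns=("title", "author", "text")):
    """Return a copy of df with missing values in the given text columns set to ''."""
    out = df.copy()
    for col in columns:
        if col in out.columns:
            out[col] = out[col].fillna("")
    return out


# Made-up rows that mimic the column layout; not taken from the dataset.
sample = pd.DataFrame({
    "id": [0, 1, 2],
    "title": ["A headline", None, "Another headline"],
    "author": [None, "Jane Doe", None],
    "text": ["Body one", "Body two", None],
})

cleaned = fill_missing_text(sample)
missing_after = int(cleaned.isna().sum().sum())
print(missing_after)  # 0
```

Dropping the affected rows is an alternative for the training set, but the test set usually needs a prediction for every `id`, which is one reason to prefer filling over dropping there.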