# Fake News Detection System using Deep Learning (LSTM)

### Problem Statement

With the rapid growth of digital media, fake news spreads quickly through online platforms and social media, misleading people and creating social, political, and economic issues. Manually verifying news is time-consuming and inefficient. Therefore, there is a need for an automated system that can identify whether a news article is real or fake using machine learning and deep learning techniques.

### Project Description / Information

This project aims to build an intelligent fake news detection system using Natural Language Processing (NLP) and a Long Short-Term Memory (LSTM) deep learning model.
The system processes news article text, converts it into numerical form using tokenization and padding, and then uses an LSTM neural network to learn sequential patterns in text. Based on the learned patterns, the model predicts whether the given news is Real or Fake.

The model is trained on the WELFake dataset (WELFake_Dataset.csv), a labeled collection of real and fake news articles available on Kaggle. The notebook uses a random sample of 15,000 of these articles.

In [None]:
#  Import Libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout
from tensorflow.keras.optimizers import Adam

In [None]:
#  Load Data
df = pd.read_csv(r"WELFake_Dataset.csv")  # change path if needed
df = df.dropna(subset=['text','label'])     # remove empty rows

In [None]:
#  Sample the Data
df = df.sample(15000, random_state=42)  # random subset of 15,000 articles

In [None]:
#  Clean Text and Prepare Labels
X = df['text'].astype(str).str.lower().str.replace('[^a-z ]', '', regex=True)  # lowercase, keep only letters and spaces
y = df['label']

max_words = 5000  # vocabulary size: keep only the most frequent words
max_len = 100     # tokens kept per article (longer texts are truncated, shorter ones padded)
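The cleaning rule above deletes every character that is not a lowercase letter or a space, so whenever an article contains line breaks, the last word of one line and the first word of the next are fused into a single token. The prediction cell further down also skips this cleaning and relies on the Tokenizer's defaults, which lowercase text and strip most punctuation but keep apostrophes and digits. A minimal sketch of a shared helper, assuming you want identical preprocessing at training and prediction time (the name `clean_text` is hypothetical, not part of the original notebook):

```python
import re


def original_rule(text):
    # The notebook's rule: lowercase, then delete every character that is not a-z or a space
    return re.sub(r"[^a-z ]", "", str(text).lower())


def clean_text(text):
    # Hypothetical helper: same rule, but turn newlines/tabs into spaces first
    text = str(text).lower()
    text = re.sub(r"\s+", " ", text)         # any whitespace run -> one space
    text = re.sub(r"[^a-z ]", "", text)      # drop digits and punctuation, as before
    return re.sub(r" +", " ", text).strip()  # collapse repeated spaces


samples = ["Storm hits\ncoast", "ISRO's 2nd launch!"]
print([original_rule(s) for s in samples])  # ['storm hitscoast', 'isros nd launch']
print([clean_text(s) for s in samples])     # ['storm hits coast', 'isros nd launch']
```

To use it, the same function would be applied in both places, for example `X = df['text'].map(clean_text)` when preparing the data and `tokenizer.texts_to_sequences([clean_text(real_news)])` when predicting. Because this changes the tokens the model sees, the model would need to be retrained.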

In [56]:
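# Build a vocabulary of the most frequent words and map each article to integer IDs
tokenizer = Tokenizer(num_words=max_words)
tokenizer.fit_on_texts(X)
sequences = tokenizer.texts_to_sequences(X)
# Pad (or truncate) every sequence to exactly max_len tokens
padded_data = pad_sequences(sequences, maxlen=max_len)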
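Two defaults in this cell affect what the model actually sees. With `num_words=5000`, `texts_to_sequences` keeps only word indices below 5000 (the 4,999 most frequent words, since index 0 is reserved) and, because no `oov_token` is set, silently drops every other word. `pad_sequences` then defaults to `padding='pre'` and `truncating='pre'`, so each article is represented by its last 100 in-vocabulary words, with zeros added on the left for shorter texts. The function below is an illustrative pure-Python re-implementation of that padding behavior (hypothetical name `pad_pre`), not the Keras source:

```python
def pad_pre(sequences, maxlen, value=0):
    """Mimic pad_sequences defaults (padding='pre', truncating='pre') for illustration."""
    if maxlen < 1:
        raise ValueError("maxlen must be positive")
    padded = []
    for seq in sequences:
        kept = list(seq)[-maxlen:]  # truncating='pre': drop tokens from the start
        padded.append([value] * (maxlen - len(kept)) + kept)  # padding='pre': fill on the left
    return padded


print(pad_pre([[5, 8], [1, 2, 3, 4, 5, 6]], maxlen=4))
# [[0, 0, 5, 8], [3, 4, 5, 6]]
```

To keep the beginning of each article instead, Keras accepts `pad_sequences(sequences, maxlen=max_len, padding='post', truncating='post')`.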

In [None]:
# Train-Test Split
X_train, X_test, y_train, y_test = train_test_split(padded_data, y, test_size=0.2, random_state=42)

In [None]:
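# Build LSTM Model
model = Sequential()
model.add(Embedding(max_words, 128, input_length=max_len))  # word IDs -> 128-dim vectors
model.add(LSTM(64, dropout=0.2, recurrent_dropout=0.2))      # 64 units; dropout on inputs and recurrent state
model.add(Dense(1, activation='sigmoid'))                    # single score between 0 and 1 for label 1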

model.compile(loss='binary_crossentropy',
              optimizer=Adam(learning_rate=0.001),
              metrics=['accuracy'])

model.summary()
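The notebook does not include the output of `model.summary()`. Its parameter counts can be worked out by hand, assuming Keras's defaults (biases enabled in the LSTM and Dense layers): the embedding holds one 128-dimensional vector per vocabulary entry, and the LSTM has four gates, each with input weights, recurrent weights, and a bias. A small sketch of that arithmetic (the function name is hypothetical):

```python
def lstm_classifier_params(vocab_size, embed_dim, lstm_units, output_units=1):
    # Embedding: one embed_dim-sized vector per vocabulary entry
    embedding = vocab_size * embed_dim
    # LSTM: 4 gates, each with input weights, recurrent weights, and a bias vector
    lstm = 4 * (lstm_units * (embed_dim + lstm_units) + lstm_units)
    # Dense: one weight per LSTM unit per output, plus one bias per output
    dense = lstm_units * output_units + output_units
    return {"embedding": embedding, "lstm": lstm, "dense": dense,
            "total": embedding + lstm + dense}


print(lstm_classifier_params(5000, 128, 64))
# {'embedding': 640000, 'lstm': 49408, 'dense': 65, 'total': 689473}
```

About 93% of the roughly 689k parameters sit in the embedding table, so `max_words` and the embedding dimension drive model size far more than the number of LSTM units.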



In [None]:
#  Train Model
history = model.fit(X_train, y_train,
                    epochs=5,
                    batch_size=128,
                    validation_split=0.2)

#  Evaluate Model
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {accuracy*100:.2f}%")

Epoch 1/5
75/75 ━━━━━━━━━━━━━━━━━━━━ 39s 469ms/step - accuracy: 0.8204 - loss: 0.4457 - val_accuracy: 0.8825 - val_loss: 0.2869
Epoch 2/5
75/75 ━━━━━━━━━━━━━━━━━━━━ 34s 459ms/step - accuracy: 0.9062 - loss: 0.2419 - val_accuracy: 0.8933 - val_loss: 0.2591
Epoch 3/5
75/75 ━━━━━━━━━━━━━━━━━━━━ 35s 470ms/step - accuracy: 0.9349 - loss: 0.1804 - val_accuracy: 0.8804 - val_loss: 0.2815
Epoch 4/5
75/75 ━━━━━━━━━━━━━━━━━━━━ 35s 465ms/step - accuracy: 0.9519 - loss: 0.1393 - val_accuracy: 0.8933 - val_loss: 0.2714
Epoch 5/5
75/75 ━━━━━━━━━━━━━━━━━━━━ 34s 456ms/step - accuracy: 0.9600 - loss: 0.1171 - val_accuracy: 0.8975 - val_loss: 0.2836
94/94 ━━━━━━━━━━━━━━━━━━━━ 3s 29ms/step - accuracy: 0.8883 - loss: 0.3083
Test Accuracy: 88.83%
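In the training log above, validation loss is lowest after epoch 2 (0.2591) and does not improve afterwards, while training loss keeps falling from 0.2419 to 0.1171, a common sign that the model is starting to overfit. Keras provides `tensorflow.keras.callbacks.EarlyStopping` for this; the sketch below is a simplified pure-Python version of its patience rule (hypothetical function name, not the Keras implementation), applied to the logged `val_loss` values:

```python
def early_stopping_epoch(val_losses, patience=1, min_delta=0.0):
    """Return (best_epoch, stop_epoch), both 1-indexed, for a patience-based stopping rule."""
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            # Improvement: remember this epoch and reset the patience counter
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            # No improvement: stop once `patience` epochs in a row have failed to improve
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses)


logged_val_loss = [0.2869, 0.2591, 0.2815, 0.2714, 0.2836]  # val_loss from the log above
print(early_stopping_epoch(logged_val_loss, patience=1))  # (2, 3)
print(early_stopping_epoch(logged_val_loss, patience=2))  # (2, 4)
```

In the notebook this would correspond to passing `callbacks=[EarlyStopping(monitor='val_loss', patience=2, restore_best_weights=True)]` to `model.fit`, which also rolls the weights back to the best epoch. Validation accuracy moves the other way here (0.8933 at epoch 2, 0.8975 at epoch 5), so the choice of monitored metric matters.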


In [63]:
real_news = """
India successfully launched a new weather observation satellite to improve
cyclone forecasting and climate monitoring, according to officials from ISRO.
The satellite will help provide accurate early warnings and reduce the impact
of natural disasters.
"""

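# Convert the new article into the padded integer format the model was trained on
sequence = tokenizer.texts_to_sequences([real_news])
padded_sequence = pad_sequences(sequence, maxlen=max_len)
prediction = model.predict(padded_sequence)  # sigmoid score between 0 and 1

print("Prediction:", prediction)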

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 81ms/step
Prediction: [[0.75095737]]


In [67]:

# Scores above 0.5 are treated as real news (label 1)
if prediction[0][0] > 0.5:
    print("REAL NEWS")
else:
    print("FAKE NEWS")


REAL NEWS
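The raw output `[[0.75095737]]` is a single sigmoid score. Under the notebook's convention (scores above 0.5 mean real news, label 1), the probability assigned to the predicted class is the score itself for a real prediction and one minus the score for a fake prediction. A minimal sketch of that mapping (hypothetical helper name); keep in mind that sigmoid scores from a neural network are not guaranteed to be well-calibrated probabilities:

```python
def label_with_confidence(prob_real, threshold=0.5):
    """Map a sigmoid score to (label, score of the predicted class).

    Assumes, like the notebook's threshold check, that the score refers to label 1 (real).
    """
    if prob_real > threshold:
        return "REAL NEWS", prob_real
    return "FAKE NEWS", 1.0 - prob_real


label, confidence = label_with_confidence(0.75095737)  # score printed above
print(f"{label} (confidence {confidence:.1%})")  # REAL NEWS (confidence 75.1%)
```

With the trained model, it would be called as `label_with_confidence(float(prediction[0][0]))`.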


### Output / Results

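- The trained LSTM model classifies news articles as Real or Fake

- Achieved an accuracy of 88.83% on the held-out test set (3,000 articles, 20% of the 15,000-article sample)

- For each input article, the model outputs a sigmoid score between 0 and 1 (0.75 for the example article above) that indicates how strongly it leans toward Real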
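Accuracy alone does not show whether the errors fall mostly on real or on fake articles. Precision, recall, and F1 can be computed from the four confusion-matrix counts; `sklearn.metrics.classification_report` does this for every class, and the dependency-free sketch below shows the arithmetic for one class. The labels are made-up toy values, not results from this notebook, and the function name is hypothetical:

```python
def binary_metrics(y_true, y_pred, positive=1):
    # Confusion-matrix counts with `positive` as the class of interest
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": (tp + tn) / len(pairs),
            "precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy labels, NOT results from this notebook
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 1, 1]
m = binary_metrics(y_true, y_pred)
print({k: round(v, 3) for k, v in m.items()})
# {'tp': 3, 'fp': 2, 'fn': 1, 'tn': 2, 'accuracy': 0.625, 'precision': 0.6, 'recall': 0.75, 'f1': 0.667}
```

For the notebook's test set, `y_pred` would come from `(model.predict(X_test) > 0.5).astype(int).ravel()` compared against `y_test`; calling the function once with `positive=1` and once with `positive=0` gives the metrics for each class.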

### Conclusion

The Fake News Detection System classifies news articles as real or fake using deep learning, reaching 88.83% accuracy on a held-out test set drawn from the WELFake dataset. The LSTM model learns useful sequential patterns from news-style text and demonstrates a practical application of NLP to a real-world problem. This project highlights the potential of deep learning for combating misinformation, and it could be improved further with domain-specific datasets and more advanced models.