# 🧠 Fake News Detection using Machine Learning

This project aims to detect whether a news article is *real* or *fake* using Natural Language Processing and Machine Learning techniques.

## 📌 Project Highlights
- Dataset: [Fake and Real News Dataset (Kaggle)](https://www.kaggle.com/datasets/clmentbisaillon/fake-and-real-news-dataset)
- Tools: Python, pandas, NLTK, scikit-learn (TF-IDF features), Matplotlib, Seaborn
- Model: Logistic Regression (96% accuracy on the 20% test split)
- Optional Deployment: Streamlit App


In [None]:
# 📥 Step 1: Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords


In [None]:
# 📂 Step 2: Load Dataset
fake = pd.read_csv("Fake.csv")
true = pd.read_csv("True.csv")

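# Label convention used throughout: 0 = fake, 1 = real
fake['label'] = 0
true['label'] = 1

# Stack both sets, keep only the needed columns, and drop rows with missing values
data = pd.concat([fake, true], axis=0)
data = data[['title', 'text', 'label']].dropna().reset_index(drop=True)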
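
Before moving on, it can help to check how many rows each class has and whether any article text is duplicated, since exact duplicates could land in both the train and test splits. The sketch below is a minimal illustration on tiny made-up frames; `fake_demo`, `true_demo`, and their rows are assumptions standing in for `Fake.csv` and `True.csv`.

```python
import pandas as pd

# Tiny in-memory stand-ins for Fake.csv and True.csv (made-up rows, illustration only).
fake_demo = pd.DataFrame({"title": ["a", "b", "c"], "text": ["x", "y", None]})
true_demo = pd.DataFrame({"title": ["d", "e"], "text": ["z", "w"]})

fake_demo["label"] = 0  # 0 = fake, same convention as the notebook
true_demo["label"] = 1  # 1 = real

demo = pd.concat([fake_demo, true_demo], axis=0)
demo = demo[["title", "text", "label"]].dropna().reset_index(drop=True)

# Rows per class after dropping missing values
class_counts = demo["label"].value_counts().sort_index()
print(class_counts)

# Number of rows whose text repeats an earlier row
n_duplicates = int(demo.duplicated(subset=["text"]).sum())
print("Duplicate texts:", n_duplicates)
```

On the real data, the same two checks would run on `data` directly.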


In [None]:
# 🧹 Step 3: Text Preprocessing
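import re
# A set makes each membership test O(1); the plain NLTK list would be scanned for every word
stop_words = set(stopwords.words('english'))

def clean_text(text):
    text = re.sub(r'\W', ' ', str(text))  # replace non-word characters with spaces
    text = text.lower()
    text = ' '.join([word for word in text.split() if word not in stop_words])
    return text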

data['text'] = data['text'].apply(clean_text)
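
To see what the cleaning step does to one string, here is a small self-contained sketch. It uses a short hand-written stop word set (`demo_stop_words`, an assumption made so the example runs without downloading the NLTK corpus), so the words it removes are not the same as with NLTK's full English list.

```python
import re

# Small illustrative stop word set; the notebook uses NLTK's full English list instead.
demo_stop_words = {"the", "is", "a", "to", "s", "of"}

def clean_text_demo(text):
    text = re.sub(r"\W", " ", str(text))  # non-word characters become spaces
    text = text.lower()
    return " ".join(w for w in text.split() if w not in demo_stop_words)

sample = "BREAKING: The Senate's vote is DELAYED to Friday!"
cleaned = clean_text_demo(sample)
print(cleaned)  # breaking senate vote delayed friday
```

Note that `\W` keeps digits and underscores, so numbers in an article survive this step.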


In [None]:
# 📊 Step 4: Vectorization
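# Split first so the TF-IDF vocabulary and IDF weights are learned from training data only
X_train_text, X_test_text, y_train, y_test = train_test_split(
    data['text'], data['label'], test_size=0.2, random_state=42)

tfidf = TfidfVectorizer(max_features=5000)
# Keep the sparse matrices: LogisticRegression accepts them, and .toarray() only costs memory
X_train = tfidf.fit_transform(X_train_text)
X_test = tfidf.transform(X_test_text)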

In [None]:
# 🤖 Step 5: Model Training
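# A higher iteration cap avoids a ConvergenceWarning if the default 100 iterations is not enough
model = LogisticRegression(max_iter=1000)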
model.fit(X_train, y_train)
y_pred = model.predict(X_test)


In [None]:
# 📈 Step 6: Evaluation
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))
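# Rows are actual labels, columns are predicted labels (0 = Fake, 1 = Real)
sns.heatmap(confusion_matrix(y_test, y_pred), annot=True, fmt='d',
            xticklabels=['Fake', 'Real'], yticklabels=['Fake', 'Real'])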
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix")
plt.show()
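
The numbers in the classification report can be recovered by hand from the confusion matrix. The sketch below does this for made-up labels (`y_true_demo` and `y_pred_demo` are assumptions, not results from this dataset), treating label 1 (real) as the positive class.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Made-up labels for illustration: 0 = fake, 1 = real
y_true_demo = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred_demo = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]

# For binary labels, ravel() yields the cells in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)  # 7 / 10
precision_real = tp / (tp + fp)             # 4 / 5
recall_real = tp / (tp + fn)                # 4 / 6

print(accuracy, precision_real, recall_real)

# The hand-computed values agree with scikit-learn's helpers
assert abs(precision_real - precision_score(y_true_demo, y_pred_demo)) < 1e-12
assert abs(recall_real - recall_score(y_true_demo, y_pred_demo)) < 1e-12
```

Here a false positive is a fake article predicted as real, which is the error the heatmap shows in the top-right cell.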

## 🚀 Optional: Deploy as a Streamlit App
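You can create a simple `app.py` file. The app runs in its own process, so it has to load the fitted `tfidf` and `model` from disk and import `clean_text` from Step 3; the file and module names below are placeholders. Because the model was trained on full article text, predictions for a bare headline may be less reliable.
```python
import joblib
import streamlit as st
from preprocess import clean_text  # hypothetical module holding the Step 3 function

# Assumed file names: objects saved from the notebook with joblib.dump
tfidf = joblib.load("tfidf.joblib")
model = joblib.load("model.joblib")

headline = st.text_area("Enter News Headline")
if st.button("Predict"):
    vector = tfidf.transform([clean_text(headline)])
    result = model.predict(vector)
    st.write("Real News" if result[0] == 1 else "Fake News")
```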
Run using: `streamlit run app.py`
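
The app relies on the fitted `tfidf` and `model` objects, which otherwise exist only inside the notebook session. One possible way to hand them over is to save them with `joblib` and load them when the app starts. The sketch below shows the round trip on toy stand-ins; the file names, the temporary directory, and the `_demo` objects are assumptions for illustration.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy stand-ins for the fitted notebook objects (made-up data, illustration only).
texts_demo = ["reuters reports senate vote", "shocking secret aliens video"]
labels_demo = [1, 0]
tfidf_demo = TfidfVectorizer().fit(texts_demo)
model_demo = LogisticRegression().fit(tfidf_demo.transform(texts_demo), labels_demo)

# Placeholder location and file names; in the notebook you would dump tfidf and model.
out_dir = tempfile.mkdtemp()
tfidf_path = os.path.join(out_dir, "tfidf.joblib")
model_path = os.path.join(out_dir, "model.joblib")
joblib.dump(tfidf_demo, tfidf_path)
joblib.dump(model_demo, model_path)

# What app.py would do at startup
loaded_tfidf = joblib.load(tfidf_path)
loaded_model = joblib.load(model_path)

query = ["reuters reports budget vote"]
prediction = loaded_model.predict(loaded_tfidf.transform(query))
print(prediction)
```

Files saved this way are pickles, so they should be loaded only from a trusted source and ideally with the same scikit-learn version that created them.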