📰 Fake News AI: Detection and Generation System
Description:

Developed an AI system that both detects and generates fake news using transformer-based Natural Language Processing (NLP) models. The project aims to help counter misinformation by pairing a BERT classifier with a GPT-2 text generator behind an interactive interface for public awareness and research.

Key Features:

Fake News Detection:
Fine-tuned a BERT (Bidirectional Encoder Representations from Transformers) model on a labeled fake news dataset to classify news articles as real or fake; the notebook below loads this classifier from the `Pulk17/Fake-News-Detection` checkpoint on the Hugging Face Hub. Implemented text preprocessing to clean and normalize news content before it reaches the model.
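The preprocessing step is not part of this notebook, so its exact rules are not known. As a minimal sketch, assuming typical cleaning for scraped news text (unescaping HTML entities, stripping tags and URLs, collapsing whitespace), a hypothetical `clean_news_text` helper could look like this. Lowercasing is deliberately left to the tokenizer, which lowercases only when the checkpoint is uncased.

```python
import html
import re
import unicodedata

# Hypothetical cleaning rules; the project's actual preprocessing is not shown in the notebook.
_URL_RE = re.compile(r"https?://\S+|www\.\S+")
_TAG_RE = re.compile(r"<[^>]+>")
_WS_RE = re.compile(r"\s+")


def clean_news_text(text: str) -> str:
    """Normalize raw article text before it is passed to the BERT tokenizer."""
    text = html.unescape(text)                  # "&amp;" -> "&", "&nbsp;" -> non-breaking space
    text = unicodedata.normalize("NFKC", text)  # fold compatibility characters (e.g. NBSP -> space)
    text = _TAG_RE.sub(" ", text)               # drop leftover HTML tags
    text = _URL_RE.sub(" ", text)               # drop links
    text = _WS_RE.sub(" ", text)                # collapse runs of whitespace
    return text.strip()


if __name__ == "__main__":
    raw = "<p>Breaking:&nbsp;markets   fall</p> see https://example.com/story"
    print(clean_news_text(raw))
```

Keeping the rules as module-level compiled patterns makes it easy to apply the same function to both the training data and the text users paste into the app.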

Fake News Generation:
Fine-tuned a GPT-2 (Generative Pre-trained Transformer 2) model on fake news datasets to generate realistic yet synthetic news articles; the demo notebook below loads the base pretrained `gpt2` checkpoint. This component is intended for educational and research purposes, to study how generative models can mimic misinformation patterns.
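The GPT-2 fine-tuning code is not included here. A common way to prepare a causal language-modeling corpus is to concatenate the tokenized articles and cut the stream into fixed-length blocks, as the `group_texts` step of the Hugging Face `run_clm.py` example does. The sketch below shows that packing step in plain Python, assuming pre-tokenized integer IDs so it runs without downloading a tokenizer; `pack_into_blocks` is a hypothetical name, and appending GPT-2's `<|endoftext|>` token (ID 50256) between articles is an extra assumption.

```python
from typing import Dict, List

EOS_ID = 50256  # GPT-2's <|endoftext|> token ID


def pack_into_blocks(tokenized_articles: List[List[int]], block_size: int) -> Dict[str, List[List[int]]]:
    """Concatenate tokenized articles and split them into equal-length training blocks.

    Each article is terminated with EOS so the model can learn where one story ends.
    The trailing remainder shorter than block_size is dropped.
    """
    stream: List[int] = []
    for ids in tokenized_articles:
        stream.extend(ids)
        stream.append(EOS_ID)

    usable = (len(stream) // block_size) * block_size
    input_ids = [stream[i:i + block_size] for i in range(0, usable, block_size)]
    # For causal LM training the labels equal the inputs; GPT2LMHeadModel shifts them internally.
    labels = [block.copy() for block in input_ids]
    return {"input_ids": input_ids, "labels": labels}


if __name__ == "__main__":
    batch = pack_into_blocks([[1, 2, 3], [4, 5], [6, 7, 8, 9]], block_size=4)
    print(batch["input_ids"])
```

Packing avoids wasting compute on padding tokens, at the cost of blocks that may start mid-article.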

Interactive Web App:
Built a user-friendly interface using Gradio, allowing users to:

Input or paste news content and receive real-time fake/real classification.

Generate new fake news headlines or articles based on custom prompts.

View model confidence scores and visual explanations (optional integration with SHAP/LIME for interpretability).

Technologies Used:

Python, PyTorch, Hugging Face Transformers

BERT & GPT-2

Pandas, Scikit-learn, NumPy

Gradio for UI

Jupyter/Spyder IDE for development

Outcome:

Demonstrated that transformer models can be used both to detect fake news and to generate it, which helps show how misinformation is produced as well as how it can be flagged.

Created an educational tool to showcase the dangers and capabilities of modern text generation.

Gained hands-on experience with NLP pipelines, fine-tuning large language models, and real-time model deployment.


In [1]:
# Install dependencies
!pip install -q transformers torch gradio

[2K   [90m‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ[0m [32m363.4/363.4 MB[0m [31m1.5 MB/s[0m eta [36m0:00:00[0m
[2K   [90m‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ[0m [32m13.8/13.8 MB[0m [31m38.5 MB/s[0m eta [36m0:00:00[0m
[2K   [90m‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ[0m [32m24.6/24.6 MB[0m [31m40.1 MB/s[0m eta [36m0:00:00[0m
[2K   [90m‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ[0m [32m883.7/883.7 kB[0m [31m41.8 MB/s[0m eta [36m0:00:00[0m
[2K   [90m‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ[0m [32m664.8/664.8 MB[0m [31m1.6 MB/s[0m eta [36m0:00:00[0m


In [2]:
# Imports
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import gradio as gr

In [3]:
# Check device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [4]:
# Load GPT-2 model and tokenizer
gpt2_tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
gpt2_model = GPT2LMHeadModel.from_pretrained("gpt2").to(device)

# Load fine-tuned BERT model and tokenizer for Fake News Detection
bert_tokenizer = AutoTokenizer.from_pretrained("Pulk17/Fake-News-Detection")
bert_model = AutoModelForSequenceClassification.from_pretrained(
    "Pulk17/Fake-News-Detection"
).to(device)

In [5]:
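# Fake news generator (base GPT-2, sampling-based decoding)
def generate_fake_news(prompt):
    # Tokenize with an attention mask so generate() does not have to infer one
    inputs = gpt2_tokenizer(prompt, return_tensors="pt").to(device)
    outputs = gpt2_model.generate(
        **inputs,
        max_length=200,              # total length, prompt tokens included
        num_return_sequences=1,
        no_repeat_ngram_size=2,      # never repeat the same bigram
        do_sample=True,              # sample instead of greedy decoding
        temperature=0.7,
        top_k=50,
        top_p=0.95,
        pad_token_id=gpt2_tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
    )
    generated_text = gpt2_tokenizer.decode(outputs[0], skip_special_tokens=True)
    return generated_text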
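The `generate` call combines temperature scaling, top-k filtering and nucleus (top-p) filtering before sampling each token. As a rough, framework-free sketch of what those three settings do to a single next-token distribution (not the Transformers implementation itself, whose processor details may differ), the hypothetical `filter_next_token_probs` below keeps only the tokens that survive both filters and renormalizes their probabilities.

```python
import math
import random
from typing import Dict, List


def filter_next_token_probs(logits: List[float], temperature: float = 0.7,
                            top_k: int = 50, top_p: float = 0.95) -> Dict[int, float]:
    """Return {token_id: probability} after temperature, top-k and top-p filtering."""
    # 1. Temperature: divide logits before softmax; values < 1 sharpen the distribution.
    scaled = [x / temperature for x in logits]
    # 2. Top-k: keep only the k highest-scoring token IDs, best first.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax over the surviving tokens (subtract the max for numerical stability).
    m = scaled[ranked[0]]
    exps = {i: math.exp(scaled[i] - m) for i in ranked}
    total = sum(exps.values())
    probs = {i: e / total for i, e in exps.items()}
    # 3. Top-p: keep the smallest prefix of ranked tokens whose mass reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample_token(logits: List[float], rng: random.Random, **kwargs) -> int:
    """Draw one token ID from the filtered distribution."""
    dist = filter_next_token_probs(logits, **kwargs)
    ids, weights = zip(*dist.items())
    return rng.choices(ids, weights=weights, k=1)[0]


if __name__ == "__main__":
    logits = [4.0, 3.0, 1.0, -2.0]
    print(filter_next_token_probs(logits, temperature=1.0, top_k=3, top_p=0.9))
    print(sample_token(logits, random.Random(0), top_k=2))
```

With `temperature=0.7`, `top_k=50` and `top_p=0.95` as in the notebook, low-probability tail tokens are cut twice, which keeps the generated articles fluent while still leaving some randomness.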

In [6]:
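# News classification (fake/real)
def detect_news(text):
    # Truncate long articles to the tokenizer's maximum input length
    inputs = bert_tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(device)
    with torch.no_grad():  # inference only, no gradients needed
        outputs = bert_model(**inputs)
    logits = outputs.logits
    predicted_class = torch.argmax(logits, dim=1).item()
    # Softmax probability of the predicted class, reported as the confidence
    confidence = torch.softmax(logits, dim=1)[0][predicted_class].item()
    # Assumes label 0 = fake and label 1 = real; check bert_model.config.id2label
    label = "🟥 Fake News" if predicted_class == 0 else "🟩 Real News"
    return f"{label} (Confidence: {confidence:.2f})"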
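`detect_news` reports the softmax probability of the predicted class as its confidence. For two classes with logits (z0, z1), softmax gives class 1 the probability 1 / (1 + exp(z0 - z1)), so the confidence depends only on the gap between the logits. The sketch below mirrors that logic in plain Python; the mapping of index 0 to fake and index 1 to real is copied from the notebook and is an assumption about the `Pulk17/Fake-News-Detection` checkpoint that is worth checking against `bert_model.config.id2label`.

```python
import math
from typing import List, Tuple

# Assumed label order, copied from detect_news; verify against the model's id2label.
LABELS = ("Fake News", "Real News")


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a single row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_from_logits(logits: List[float]) -> Tuple[str, float]:
    """Mirror detect_news: argmax picks the label, softmax gives its confidence."""
    probs = softmax(logits)
    predicted = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[predicted], probs[predicted]


if __name__ == "__main__":
    label, conf = classify_from_logits([-1.2, 2.3])
    print(f"{label} (Confidence: {conf:.2f})")
```

Note that a softmax score is not a calibrated probability of the article being true; it only reflects how strongly the model prefers one class.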

In [7]:
# Gradio Interface
with gr.Blocks() as demo:
    gr.Markdown("## üì∞ Fake News Generator & Detector (GPT-2 + BERT)")

    with gr.Tab("üõ†Ô∏è Generate Fake News"):
        with gr.Row():
            input_text = gr.Textbox(
                label="Enter a News Headline or Prompt",
                placeholder="e.g. Scientists discover a talking dolphin species near Japan...",
                lines=2
            )
        generate_btn = gr.Button("Generate")
        output_text = gr.Textbox(label="Generated News Article")
        generate_btn.click(generate_fake_news, inputs=input_text, outputs=output_text)

    with gr.Tab("üîç Detect Fake or Real"):
        with gr.Row():
            detect_input = gr.Textbox(
                label="Enter a News Article or Statement",
                placeholder="Paste a paragraph to detect if it's fake or real...",
                lines=5
            )
        detect_btn = gr.Button("Detect")
        detect_output = gr.Textbox(label="Detection Result")
        detect_btn.click(detect_news, inputs=detect_input, outputs=detect_output)
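With the wiring above, whatever is in a textbox, including an empty string, goes straight to `generate_fake_news` or `detect_news`, and an empty prompt may make generation fail or return nothing useful. A minimal sketch of a guard, assuming a hypothetical `require_text` decorator (not part of Gradio or the notebook), returns a short message to the output textbox instead of calling the model when the input is blank.

```python
import functools
from typing import Callable


def require_text(min_chars: int = 1, message: str = "Please enter some text first.") -> Callable:
    """Hypothetical decorator: skip the model call when the textbox is (nearly) empty."""
    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        @functools.wraps(fn)
        def wrapper(text: str) -> str:
            if text is None or len(text.strip()) < min_chars:
                return message
            return fn(text.strip())
        return wrapper
    return decorator


if __name__ == "__main__":
    @require_text(min_chars=3)
    def echo(text: str) -> str:  # stand-in for generate_fake_news / detect_news
        return f"model saw: {text}"

    print(echo("   "))
    print(echo("  Scientists discover a talking dolphin  "))
```

It would be applied at registration time, e.g. `generate_btn.click(require_text()(generate_fake_news), inputs=input_text, outputs=output_text)`. Raising `gr.Error(...)` inside the callback is an alternative that shows the message as an error in the Gradio UI rather than in the output box.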

In [8]:
# Launch the Gradio app
demo.launch()

It looks like you are running Gradio on a hosted a Jupyter notebook. For the Gradio app to work, sharing must be enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://fa19e933cb7f942ce2.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


