# ONNX Fake News Detection — End-to-End Example

## Project Goal

This notebook demonstrates an **end-to-end fake news detection system** using:
- LSTM (TensorFlow/Keras): the only model served by the API
- DistilBERT (Hugging Face): used to experiment with converting Hugging Face models to ONNX
- ONNX conversion for deployment
- ONNX Runtime for inference

Most of the logic lives in functions and classes defined in `ONNX_Fake_News_Detection_utils.py`; the DistilBERT fine-tuning and export steps are written inline in this notebook.


## Dataset

We use the Kaggle Fake and Real News dataset:
- `Fake.csv` → Fake news articles
- `True.csv` → Real news articles

Each sample contains a news title and body. Labels:
- `0` → Fake
- `1` → Real
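A minimal sketch of how the two files could be merged into one labeled, shuffled frame. It assumes each CSV has `title` and `text` columns and that the title and body are concatenated into a single `text` field; `build_labeled_frame` is a hypothetical helper, and the actual `load_fake_real_news` may work differently.

```python
import pandas as pd


def build_labeled_frame(fake: pd.DataFrame, real: pd.DataFrame, seed: int = 42) -> pd.DataFrame:
    """Hypothetical helper: merge fake/real articles into one shuffled, labeled frame."""
    fake = fake.assign(label=0)  # 0 -> Fake
    real = real.assign(label=1)  # 1 -> Real
    both = pd.concat([fake, real], ignore_index=True)
    # Assumption: title and body are joined into a single text field.
    both["text"] = both["title"].fillna("") + " " + both["text"].fillna("")
    # Shuffle with a fixed seed so the row order is reproducible across calls.
    both = both.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    return both[["text", "label"]]


# Tiny in-memory frames stand in for pd.read_csv("Fake.csv") / pd.read_csv("True.csv").
fake_df = pd.DataFrame({"title": ["Shocking claim"], "text": ["Body A"]})
real_df = pd.DataFrame({"title": ["Markets rise"], "text": ["Body B"]})
df = build_labeled_frame(fake_df, real_df)
print(df)
```

A fixed shuffle seed matters here because later cells slice the frame by position (for example the last N rows for evaluation).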


In [1]:
%load_ext autoreload
%autoreload 2

import pprint
import matplotlib.pyplot as plt

from ONNX_Fake_News_Detection_utils import (
    load_fake_real_news,
    compute_classification_metrics,
    train_lstm_model,
    convert_lstm_to_onnx,
    predict_lstm_onnx,
    DISTILBERT_ONNX_PATH,
)



In [2]:
df = load_fake_real_news()
df.head()

|   | text | label |
|---|------|-------|
| 0 | Ben Stein Calls Out 9th Circuit Court: Committ... | 0 |
| 1 | Trump drops Steve Bannon from National Securit... | 1 |
| 2 | Puerto Rico expects U.S. to lift Jones Act shi... | 1 |
| 3 | OOPS: Trump Just Accidentally Confirmed He Lea... | 0 |
| 4 | Donald Trump heads for Scotland to reopen a go... | 1 |


## LSTM Training

We first train a Bidirectional LSTM classifier using padded token sequences.
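"Padded token sequences" means each article is mapped to a list of integer token ids and then truncated or zero-filled to one fixed length, so that a batch forms a rectangular array. The NumPy sketch below mirrors the defaults of Keras `pad_sequences` (pre-padding and pre-truncation with 0); the function name and `maxlen` value are illustrative, and the exact settings used inside `tokenize_and_pad` are not shown in this notebook.

```python
import numpy as np


def pad_sequences_np(seqs, maxlen, padding="pre", truncating="pre", value=0):
    """Fixed-length int32 matrix from ragged token-id lists (sketch of Keras defaults)."""
    out = np.full((len(seqs), maxlen), value, dtype="int32")
    for row, seq in enumerate(seqs):
        # Drop tokens beyond maxlen, either from the start ("pre") or the end ("post").
        seq = seq[-maxlen:] if truncating == "pre" else seq[:maxlen]
        if len(seq) == 0:
            continue  # leave an all-padding row
        if padding == "pre":
            out[row, -len(seq):] = seq  # zeros on the left
        else:
            out[row, :len(seq)] = seq   # zeros on the right
    return out


X = pad_sequences_np([[5, 8], [1, 2, 3, 4, 9]], maxlen=4)
print(X)
# [[0 0 5 8]
#  [2 3 4 9]]
```

The int32 dtype matches the cast applied to `X_eval` before TensorFlow inference later in the notebook.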

In [3]:
lstm_run = train_lstm_model(num_samples=None)

299/299 ━━━━━━━━━━━━━━━━━━━━ 771s 3s/step - accuracy: 0.9752 - loss: 0.0574 - val_accuracy: 0.9991 - val_loss: 0.0041
>>> Saved Keras model to models/lstm_fake_news.keras
211/211 ━━━━━━━━━━━━━━━━━━━━ 216s 1s/step


## Convert LSTM to ONNX

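Exporting to ONNX decouples the trained model from TensorFlow, so it can be served with ONNX Runtime, a lightweight inference engine that is often faster than running the original framework.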

In [4]:
onnx_path = convert_lstm_to_onnx()
print('Saved ONNX model at:', onnx_path)

>>> Loading Keras model...
>>> Converting to ONNX...




>>> Saved ONNX model to models/lstm_fake_news.onnx
Saved ONNX model at: models/lstm_fake_news.onnx


## LSTM ONNX Inference

This cell is a quick smoke test that ONNX inference runs end to end on raw text and returns a label and score for each input.

In [5]:
samples = [
    'Government announces new economic reforms',
    'You won’t believe what this celebrity did next!'
]

pprint.pprint(predict_lstm_onnx(samples))

[{'label': 0,
  'score': 0.00039252638816833496,
  'text': 'Government announces new economic reforms'},
 {'label': 0,
  'score': 6.729364395141602e-05,
  'text': 'You won’t believe what this celebrity did next!'}]
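Both sample headlines were classified as Fake (label 0), so this cell shows that the pipeline runs, not that it is accurate. Judging from the outputs, `score` appears to be the LSTM's sigmoid output, i.e. the probability of label 1 (Real), with the label assigned by a 0.5 threshold as in the TensorFlow evaluation further down. A sketch of that post-processing under this assumption; `to_records` is a hypothetical name, not a function from `ONNX_Fake_News_Detection_utils.py`.

```python
import numpy as np


def to_records(texts, probs, threshold=0.5):
    """Hypothetical post-processing: sigmoid P(Real) -> {'label', 'score', 'text'} dicts."""
    probs = np.asarray(probs, dtype="float64").reshape(-1)  # (batch, 1) -> (batch,)
    return [
        {"label": int(p > threshold), "score": float(p), "text": t}
        for t, p in zip(texts, probs)
    ]


# Model outputs usually arrive with shape (batch, 1); reshape(-1) flattens them.
records = to_records(["headline a", "headline b"], [[0.0004], [0.93]])
print(records)
```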


## Model Evaluation: Test Accuracy and Speed (TensorFlow vs ONNX)

In this section, we evaluate the trained LSTM fake news classifier on a small
evaluation set: the last 1,000 rows of the dataset.

We report:
1. Accuracy of the original TensorFlow/Keras model
2. Accuracy of the exported ONNX model using ONNX Runtime
3. An inference speed comparison between the two

This verifies that:
- Model performance is preserved after ONNX conversion
- ONNX inference produces consistent predictions
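Matching accuracy is necessary but not sufficient: two models can reach the same accuracy while disagreeing on individual samples. A per-sample parity check tests the second claim more directly. This sketch assumes both probability vectors are available (for example `tf_probs` and the `score` values returned by `predict_lstm_onnx`); `parity_report` and its `atol` tolerance are illustrative choices, not part of the original utilities.

```python
import numpy as np


def parity_report(ref_probs, onnx_probs, threshold=0.5, atol=1e-4):
    """Compare two probability vectors sample by sample (sketch)."""
    ref = np.asarray(ref_probs, dtype="float64").reshape(-1)
    new = np.asarray(onnx_probs, dtype="float64").reshape(-1)
    if ref.shape != new.shape:
        raise ValueError(f"shape mismatch: {ref.shape} vs {new.shape}")
    abs_diff = np.abs(ref - new)
    # Threshold both vectors the same way the evaluation does.
    ref_labels = (ref > threshold).astype(int)
    new_labels = (new > threshold).astype(int)
    return {
        "max_abs_diff": float(abs_diff.max()),
        "within_atol": bool(np.all(abs_diff <= atol)),
        "label_agreement": float(np.mean(ref_labels == new_labels)),
    }


report = parity_report([0.1, 0.9, 0.6], [0.1000004, 0.8999995, 0.6])
print(report)
```

In this notebook it could be called as `parity_report(tf_probs, [r["score"] for r in onnx_results])`, provided `score` is indeed the probability of label 1.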


In [6]:
import time
import gc
import pickle
import numpy as np
import tensorflow as tf
from sklearn.metrics import accuracy_score
import pandas as pd

from ONNX_Fake_News_Detection_utils import (
    load_fake_real_news,
    tokenize_and_pad,
    predict_lstm_onnx,
    compute_classification_metrics,
    LSTM_KERAS_PATH,
    LSTM_TOKENIZER_PATH,
)

# ----------------------------------------------------------
# 1. Load unseen evaluation set
# ----------------------------------------------------------
df = load_fake_real_news()
eval_texts = df["text"].iloc[-1000:].tolist()
eval_labels = df["label"].iloc[-1000:].values

print("Evaluation samples:", len(eval_texts))

# ----------------------------------------------------------
# 2. Load the tokenizer fitted during training (a tokenizer fitted on other text would map words to different ids)
# ----------------------------------------------------------
with open(LSTM_TOKENIZER_PATH, "rb") as f:
    tokenizer = pickle.load(f)

# ----------------------------------------------------------
# 3. TensorFlow inference
# ----------------------------------------------------------
print("\nRunning TensorFlow inference...")

tf_model = tf.keras.models.load_model(LSTM_KERAS_PATH)

X_eval, _ = tokenize_and_pad(eval_texts, tokenizer=tokenizer)
X_eval = X_eval.astype("int32")

start = time.perf_counter()
tf_probs = tf_model.predict(X_eval, verbose=0).reshape(-1)
tf_time = time.perf_counter() - start

tf_preds = (tf_probs > 0.5).astype(int)
tf_acc = accuracy_score(eval_labels, tf_preds)

print(f" TF metrics: {compute_classification_metrics(eval_labels,tf_preds)}")
print(f"TF Inference Time: {tf_time:.4f} sec")

# ----------------------------------------------------------
# 4. Clear TF memory
# ----------------------------------------------------------
del tf_model
tf.keras.backend.clear_session()
gc.collect()

# ----------------------------------------------------------
# 5. ONNX inference
# ----------------------------------------------------------
print("\nRunning ONNX inference...")

start = time.perf_counter()
onnx_results = predict_lstm_onnx(eval_texts)
onnx_time = time.perf_counter() - start

onnx_preds = np.array([r["label"] for r in onnx_results])
onnx_acc = accuracy_score(eval_labels, onnx_preds)

print(f" ONNX metrics: {compute_classification_metrics(eval_labels,onnx_preds)}")
print(f"ONNX Inference Time: {onnx_time:.4f} sec")

# ----------------------------------------------------------
# 6. Summary
# ----------------------------------------------------------
summary = pd.DataFrame({
    "Model": ["TensorFlow LSTM", "ONNX Runtime LSTM"],
    "Accuracy (1000 samples)": [tf_acc, onnx_acc],
    "Inference Time (seconds)": [tf_time, onnx_time],
})

summary


Evaluation samples: 1000

Running TensorFlow inference...




 TF metrics: {'accuracy': 0.999, 'precision': 1.0, 'recall': 0.9977973568281938, 'f1': 0.9988974641675854, 'confusion_matrix': [[546, 0], [1, 453]]}
TF Inference Time: 42.3083 sec

Running ONNX inference...
 ONNX metrics: {'accuracy': 0.999, 'precision': 1.0, 'recall': 0.9977973568281938, 'f1': 0.9988974641675854, 'confusion_matrix': [[546, 0], [1, 453]]}
ONNX Inference Time: 1.1183 sec


|   | Model | Accuracy (1000 samples) | Inference Time (seconds) |
|---|-------|-------------------------|--------------------------|
| 0 | TensorFlow LSTM | 0.999 | 42.308263 |
| 1 | ONNX Runtime LSTM | 0.999 | 1.118299 |


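On this run, ONNX Runtime matched the TensorFlow metrics exactly while finishing about 38x faster (1.12 s vs 42.31 s). Treat the ratio as indicative rather than a controlled benchmark: the TensorFlow time comes from a single `predict` call that likely includes one-time setup such as building the prediction function, the ONNX call takes raw text and so also covers tokenization inside `predict_lstm_onnx`, and the two runtimes may not be running on the same device.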
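Single-shot timings can be skewed by one-time costs that land inside the timed region. A more even-handed approach times each callable after a warm-up and reports the median of several runs. `benchmark` is a hypothetical helper, and `warmup` and `repeats` are illustrative defaults.

```python
import statistics
import time


def benchmark(fn, *args, warmup=1, repeats=5):
    """Median wall-clock seconds of fn(*args) after warm-up calls (sketch)."""
    for _ in range(warmup):
        fn(*args)  # absorb one-time costs (tracing, lazy init, caches)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Stand-in workload; in the notebook, fn could be
# lambda: tf_model.predict(X_eval, verbose=0) or lambda: predict_lstm_onnx(eval_texts)
t = benchmark(sum, range(100_000))
print(f"median: {t:.6f} s")
```

Using the median rather than the mean keeps a single slow run (for example a garbage-collection pause) from dominating the result.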

## DistilBERT Fine-Tuning (Bonus)

We now fine-tune a transformer-based model for stronger language understanding.

In [2]:
from sklearn.model_selection import train_test_split
from ONNX_Fake_News_Detection_utils import load_fake_real_news, DISTILBERT_ONNX_PATH

# Load dataset
df = load_fake_real_news()

# Optional: limit size for CPU
df = df.head(3000)

# Train / test split (unseen test set)
X_train, X_test, y_train, y_test = train_test_split(
    df["text"].tolist(),
    df["label"].values,
    test_size=0.15,
    stratify=df["label"].values,
    random_state=42,
)

print(len(X_train), len(X_test))

2550 450


In [3]:
from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification
import torch

tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")
model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=2,
)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = model.to(device)

model.train()

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): DistilBertSdpaAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)


In [4]:
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.optim import AdamW
from tqdm.auto import tqdm # For progress tracking

#  Setup Device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

#  Prepare Data
enc = tokenizer(
    X_train,
    truncation=True,
    padding=True,
    max_length=256,
    return_tensors="pt",
)

train_ds = TensorDataset(
    enc["input_ids"],
    enc["attention_mask"],
    torch.tensor(y_train),
)

loader = DataLoader(train_ds, batch_size=8, shuffle=True)

#  Model to Device & Optimizer
model.to(device)
model.train()
optimizer = AdamW(model.parameters(), lr=3e-5)

#  Training Loop 
epochs = 1
for epoch in range(epochs):
    # Setup progress bar
    loop = tqdm(loader, leave=True)
    loop.set_description(f"Epoch {epoch+1}/{epochs}")
    
    epoch_loss = 0
    for batch in loop:
        # Move the batch to the selected device (GPU if available)
        input_ids = batch[0].to(device)
        attention_mask = batch[1].to(device)
        labels = batch[2].to(device)
        
        optimizer.zero_grad()
        
        # Forward pass
        outputs = model(
            input_ids=input_ids,
            attention_mask=attention_mask,
            labels=labels,
        )
        
        loss = outputs.loss
        epoch_loss += loss.item()
        
        # Backward pass
        loss.backward()
        optimizer.step()
        
        # Update progress bar suffix
        loop.set_postfix(loss=loss.item())

    print(f"Epoch {epoch+1} Average Loss: {epoch_loss / len(loader):.4f}")

Using device: cuda



Epoch 1 Average Loss: 0.0518


In [5]:
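# Pickles the entire module: reloading needs the same class definitions
# (and weights_only=False on recent PyTorch). model.save_pretrained(...)
# is the more portable Hugging Face alternative.
torch.save(model, 'models/distilbert_fake_news.pth')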

## Convert DistilBERT to ONNX

In [6]:
import torch

model.eval()
model.to("cpu")

# Create dummy inputs
dummy_input_ids = torch.randint(0, 30522, (1, 256), dtype=torch.long)
dummy_attention_mask = torch.ones((1, 256), dtype=torch.long)
dummy_inputs = (dummy_input_ids, dummy_attention_mask)

torch.onnx.export(
    model,
    args=dummy_inputs,
    f=DISTILBERT_ONNX_PATH,
    input_names=['input_ids', 'attention_mask'],
    output_names=['logits'],
    dynamic_axes={
        'input_ids': {0: 'batch_size', 1: 'sequence_length'},
        'attention_mask': {0: 'batch_size', 1: 'sequence_length'},
        'logits': {0: 'batch_size'}
    },
    opset_version=14, 
    dynamo=False,      
    do_constant_folding=True
)

print("model converted to ONNX and got exported successfully")



model converted to ONNX and got exported successfully


## DistilBERT ONNX Inference

In [7]:
import numpy as np
import onnxruntime as ort


sample_text = [
    "The government confirmed new economic policies aimed at reducing inflation."
]

enc = tokenizer(
    sample_text,
    truncation=True,
    padding=True,
    max_length=256,
    return_tensors="np",   
)

#  Cast to int64 
input_ids = enc["input_ids"].astype("int64")
attention_mask = enc["attention_mask"].astype("int64")

#  Initialize ONNX Runtime Session
sess = ort.InferenceSession(
    DISTILBERT_ONNX_PATH,  # the same file written by the export cell above
    providers=["CPUExecutionProvider"],
)

#  Get input names automatically from the model
input_names = [i.name for i in sess.get_inputs()]

#  Run the model
outputs = sess.run(
    None,
    {
        input_names[0]: input_ids,
        input_names[1]: attention_mask,
    },
)

#  Post-process results
logits = outputs[0]

# Apply Softmax to get probabilities
exp_logits = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exp_logits / exp_logits.sum(axis=1, keepdims=True)

# Get final label 
pred_idx = probs.argmax(axis=1)[0]
labels = [0, 1]

print(f"Probabilities [Fake, Real]: {probs[0]}")
print(f"Predicted label: {labels[pred_idx]} (Confidence: {probs[0][pred_idx]:.2%})")

Probabilities [Fake, Real]: [0.8680874  0.13191265]
Predicted label: 0 (Confidence: 86.81%)
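The softmax step above subtracts the row maximum before exponentiating. This avoids overflow without changing the result, because the common factor cancels; with two classes the softmax also reduces to a sigmoid of the logit difference:

$$
p_i = \frac{e^{z_i - m}}{\sum_j e^{z_j - m}} = \frac{e^{-m}\, e^{z_i}}{e^{-m} \sum_j e^{z_j}} = \frac{e^{z_i}}{\sum_j e^{z_j}}, \qquad m = \max_j z_j
$$

$$
p_{\text{Real}} = \frac{e^{z_{\text{Real}}}}{e^{z_{\text{Fake}}} + e^{z_{\text{Real}}}} = \sigma\left(z_{\text{Real}} - z_{\text{Fake}}\right)
$$

For the sample above, $p_{\text{Fake}} \approx 0.868$ corresponds to a logit gap $z_{\text{Fake}} - z_{\text{Real}} = \ln(0.8681 / 0.1319) \approx 1.88$.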


## Model Evaluation: Test Accuracy and Speed (PyTorch vs ONNX)

In this section, we evaluate the fine-tuned DistilBERT fake news classifier on a small
evaluation set: the last 500 rows of the dataset, processed in batches of 16 on CPU.

We report:
1. Accuracy of the original PyTorch model
2. Accuracy of the exported ONNX model using ONNX Runtime
3. An inference speed comparison between the two

This verifies that:
- Model performance is preserved after ONNX conversion
- ONNX inference produces consistent predictions
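Both benchmark loops below walk the evaluation texts in fixed-size slices and concatenate the per-batch logits. The same pattern as a small reusable sketch: `batched` and `run_batched` are hypothetical helpers (Python 3.12 also ships `itertools.batched`, which yields tuples instead of list slices).

```python
import numpy as np


def batched(items, batch_size):
    """Yield consecutive slices of a list; the last slice may be shorter."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def run_batched(predict_logits, texts, batch_size=16):
    """Apply a per-batch logits function and stack the results into one (N, C) array."""
    chunks = [predict_logits(batch) for batch in batched(texts, batch_size)]
    return np.concatenate(chunks, axis=0)


# Placeholder predictor returning zero logits for 2 classes; in the notebook it would
# tokenize the batch and call the PyTorch model or sess.run(...).
def dummy_predict(batch):
    return np.zeros((len(batch), 2), dtype="float32")


logits = run_batched(dummy_predict, [f"text {i}" for i in range(37)], batch_size=16)
print(logits.shape)  # (37, 2)
```

Keeping the batch size fixed and identical for both runtimes is what makes the timing comparison below like-for-like.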


In [8]:
import time
import gc
import torch
import numpy as np
import pandas as pd
import onnxruntime as ort
from sklearn.metrics import accuracy_score

# Environment fix for potential library conflicts
import os
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

# Setup Data and Batching
df = load_fake_real_news()
eval_texts = df["text"].iloc[-500:].tolist()
eval_labels = df["label"].iloc[-500:].values
batch_size = 16  # Adjust lower (e.g., 8) if kernel still dies

print(f"Evaluation samples: {len(eval_texts)} | Batch Size: {batch_size}")

# ----------------------------------------------------------
#  Batched PyTorch Inference
# ----------------------------------------------------------
print("\nRunning Batched PyTorch inference...")
model.to("cpu")
model.eval()

pt_logits_list = []
start_pt = time.perf_counter()

with torch.no_grad():
    for i in range(0, len(eval_texts), batch_size):
        batch = eval_texts[i : i + batch_size]
        inputs = tokenizer(batch, truncation=True, padding=True, max_length=256, return_tensors="pt").to("cpu")
        
        outputs = model(**inputs)
        pt_logits_list.append(outputs.logits.numpy())
        
        # Explicit memory cleanup
        del inputs, outputs
        if i % 128 == 0: gc.collect()

pt_logits = np.concatenate(pt_logits_list, axis=0)
pt_time = time.perf_counter() - start_pt

pt_preds = np.argmax(pt_logits, axis=1)
pt_acc = accuracy_score(eval_labels, pt_preds)
print(f"PyTorch Acc: {pt_acc:.4f} | Time: {pt_time:.4f}s")

# ----------------------------------------------------------
#  Batched ONNX Inference
# ----------------------------------------------------------
print("\nRunning Batched ONNX inference...")
sess = ort.InferenceSession(DISTILBERT_ONNX_PATH, providers=["CPUExecutionProvider"])
input_names = [i.name for i in sess.get_inputs()]

onnx_logits_list = []
start_onnx = time.perf_counter()

for i in range(0, len(eval_texts), batch_size):
    batch = eval_texts[i : i + batch_size]
    # Tokenize for ONNX (NumPy)
    enc = tokenizer(batch, truncation=True, padding=True, max_length=256, return_tensors="np")
    
    feed = {
        input_names[0]: enc["input_ids"].astype(np.int64),
        input_names[1]: enc["attention_mask"].astype(np.int64)
    }
    
    batch_logits = sess.run(None, feed)[0]
    onnx_logits_list.append(batch_logits)

onnx_logits = np.concatenate(onnx_logits_list, axis=0)
onnx_time = time.perf_counter() - start_onnx

onnx_preds = np.argmax(onnx_logits, axis=1)
onnx_acc = accuracy_score(eval_labels, onnx_preds)
print(f"ONNX Acc: {onnx_acc:.4f} | Time: {onnx_time:.4f}s")


summary = pd.DataFrame({
    "Framework": ["PyTorch", "ONNX Runtime"],
    "Accuracy": [pt_acc, onnx_acc],
    "Time (s)": [pt_time, onnx_time]
})

speedup = pt_time / onnx_time
print(f"\nResult: ONNX is {speedup:.2f}x faster than PyTorch on CPU.")
summary

Evaluation samples: 500 | Batch Size: 16

Running Batched PyTorch inference...
PyTorch Acc: 1.0000 | Time: 52.9812s

Running Batched ONNX inference...
ONNX Acc: 1.0000 | Time: 49.9677s

Result: ONNX is 1.06x faster than PyTorch on CPU.


|   | Framework | Accuracy | Time (s) |
|---|-----------|----------|----------|
| 0 | PyTorch | 1.0 | 52.981248 |
| 1 | ONNX Runtime | 1.0 | 49.967747 |


## FastAPI Inference Demo (ONNX LSTM)

To demonstrate how the trained ONNX model can be exposed as a service,
we use a lightweight FastAPI wrapper defined in `ONNX_Fake_News_Detection_utils.py`.

This API:
- Loads the ONNX Runtime session once at startup
- Reuses the training tokenizer
- Exposes a single `/predict` endpoint for inference

This section shows how the API can be instantiated and called locally
without running a web server.


In [12]:
from ONNX_Fake_News_Detection_utils import create_fastapi_app


app = create_fastapi_app(model_type="lstm")

app

<fastapi.applications.FastAPI at 0x7fc29c1487d0>

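Instead of deploying a server, we send an in-process HTTP request to the app through httpx's `ASGITransport`, which exercises routing, request parsing, and the endpoint logic end to end without opening a network port.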

In [17]:
import httpx
from httpx import ASGITransport
from ONNX_Fake_News_Detection_utils import FakeNewsRequest

# JSON payload for the /predict endpoint
req_data = {"text": "Government announces new economic reforms to stabilize markets."}

# Use AsyncClient with ASGITransport
# This explicitly tells httpx to communicate directly with your FastAPI 'app'
transport = ASGITransport(app=app)

async with httpx.AsyncClient(transport=transport, base_url="http://test") as client:
    response = await client.post("/predict", json=req_data)

print(response.json())

{'label': 0, 'score': 0.0006501972675323486}


## Conclusion

This notebook demonstrated a full ML lifecycle:
- Training
- Evaluation
- ONNX export
- Deployment-ready inference
- A FastAPI app exposing a `/predict` endpoint

The API is covered in more detail in `ONNX_Fake_News_Detection.API.ipynb`.
