# Deep Learning Model: BiLSTM + Embeddings (Fake News Classification)

This notebook trains a Bidirectional LSTM (BiLSTM) model using learned word embeddings for fake news classification.
We compare it to a simpler embedding baseline to see whether sequence modeling (word order) helps.

## 1) Environment & Reproducibility

- CPU mode is forced to avoid CUDA/TensorFlow kernel crashes under WSL.
- The notebook still runs on the `tf-gpu` kernel, but because `CUDA_VISIBLE_DEVICES` is set to `"-1"` before TensorFlow is imported, TensorFlow sees no GPU devices.
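
The section title mentions reproducibility, while the cells in this section only hide the GPU. A minimal sketch of how both concerns could be handled in one place follows. The helper name `configure_environment` and the seed value are assumptions for illustration, not part of the original notebook, and the TensorFlow call is skipped when TensorFlow is not installed.

```python
import os
import random


def configure_environment(seed: int = 42) -> None:
    """Hide GPUs from TensorFlow and seed the random number generators.

    Hypothetical helper, not part of the original notebook.
    """
    # Must be set before TensorFlow is imported for the first time;
    # "-1" hides all CUDA devices so TensorFlow falls back to the CPU.
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

    random.seed(seed)  # Python's built-in RNG

    try:
        import tensorflow as tf

        # Seeds the Python, NumPy and TensorFlow RNGs in one call.
        tf.keras.utils.set_random_seed(seed)
    except ImportError:
        pass  # TensorFlow not available; only the Python RNG was seeded.


configure_environment(42)
```

Seeding narrows run-to-run variation but does not by itself guarantee bit-identical training results.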

In [1]:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

In [2]:
import pandas as pd
import numpy as np

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

2026-02-05 05:43:07.398105: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2026-02-05 05:43:08.068231: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2026-02-05 05:43:09.757316: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.


In [3]:
# Device check
print("TensorFlow:", tf.__version__)
print("GPUs visible to TF:", tf.config.list_physical_devices("GPU"))

TensorFlow: 2.20.0
GPUs visible to TF: []


2026-02-05 05:43:14.531308: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected


## 2) Load Dataset

Loads the training dataset and identifies:
- the text column used as input
- the label column used as the target

In [14]:
TRAIN_PATH = "../data/training_data_lowercase.csv"
TEST_PATH  = "../data/testing_data_lowercase_nolabels.csv"

data = pd.read_csv(TRAIN_PATH, sep="\t", header=None, names=["label", "text"])
data_out = pd.read_csv(TEST_PATH, sep="\t", header=None, names=["text"])

print("train:", data.shape)
print("test :", data_out.shape)
data.head()

TEXT_COL = "text"
LABEL_COL = "label"

train: (34152, 2)
test : (9984, 1)


In [15]:
print("Shape:", data.shape)
print("Missing text:", data[TEXT_COL].isna().sum())
print("Missing labels:", data[LABEL_COL].isna().sum())
print("\nLabel distribution:\n", data[LABEL_COL].value_counts())

Shape: (34152, 2)
Missing text: 0
Missing labels: 0

Label distribution:
 label
0    17572
1    16580
Name: count, dtype: int64


## 3) Train/Validation Split + Label Encoding

- Encodes labels into integers for `sparse_categorical_crossentropy`.
- Splits once and reuses the same split for all deep models in this notebook.
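
To make the stratification concrete, the sketch below reimplements a stratified split in plain Python on toy labels. It illustrates the idea only and is not scikit-learn's algorithm; `stratified_split` and the toy labels are assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, val_idx) with roughly the same class ratio in both."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, val_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_val = round(len(indices) * test_size)  # per-class validation count
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)


# Toy labels with a 60/40 class balance (illustrative, not the real dataset).
toy_labels = [0] * 60 + [1] * 40
train_idx, val_idx = stratified_split(toy_labels)

val_labels = [toy_labels[i] for i in val_idx]
print(len(train_idx), len(val_idx))              # 80 20
print(val_labels.count(0), val_labels.count(1))  # 12 8
```

The validation set keeps the 60/40 ratio of the full label list, which is what `stratify=y_id` asks `train_test_split` to do.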

In [16]:
# Train/Validation Split + Label Encoding

X_text = data[TEXT_COL].astype(str)
y_label = data[LABEL_COL]

# label mappings
class_labels = sorted(y_label.unique())
label_to_id = {label: idx for idx, label in enumerate(class_labels)}
id_to_label = {idx: label for label, idx in label_to_id.items()}

# Convert labels to integers (for sparse_categorical_crossentropy)
y_id = y_label.map(label_to_id).astype("int32").to_numpy()

# Split once (reuse this split for all models in this notebook)
X_train, X_val, y_train, y_val = train_test_split(
    X_text, y_id,
    test_size=0.2,
    random_state=42,
    stratify=y_id
)

# dtype-safe arrays for Keras (avoid pandas object dtype issues)
X_train_np = X_train.astype(str).to_numpy()
X_val_np   = X_val.astype(str).to_numpy()

print("Train size:", len(X_train_np), "| Val size:", len(X_val_np))
print("Classes:", class_labels)


Train size: 27321 | Val size: 6831
Classes: [np.int64(0), np.int64(1)]


## 4) Text Vectorization

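Converts raw text into fixed-length sequences of integer token ids.
Important: `.adapt()` builds the vocabulary from the training text only, so no vocabulary information from the validation or test text leaks into the model.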
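
As a rough mental model of what the layer does in `output_mode="int"`, the sketch below imitates it in plain Python. It is a simplified imitation under stated assumptions (default lowercase-and-strip-punctuation standardization, whitespace splitting, id 0 for padding, id 1 for out-of-vocabulary tokens), not the Keras implementation. The function names are hypothetical.

```python
import string
from collections import Counter


def standardize(text):
    """Lowercase and strip punctuation."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))


def build_vocab(texts, max_tokens):
    """Most frequent tokens first; ids 0 (padding) and 1 (OOV) are reserved."""
    counts = Counter(tok for t in texts for tok in standardize(t).split())
    # max_tokens includes the two reserved entries.
    most_common = [tok for tok, _ in counts.most_common(max_tokens - 2)]
    return {tok: i + 2 for i, tok in enumerate(most_common)}


def vectorize_text(text, vocab, seq_len):
    """Map tokens to ids, then truncate or right-pad with 0 to seq_len."""
    ids = [vocab.get(tok, 1) for tok in standardize(text).split()][:seq_len]
    return ids + [0] * (seq_len - len(ids))


# The vocabulary is built from training text only, mirroring `.adapt()`.
train_texts = ["the senate passed the bill", "the bill failed"]
vocab = build_vocab(train_texts, max_tokens=4)  # keeps the 2 most frequent tokens

print(vocab)  # {'the': 2, 'bill': 3}
print(vectorize_text("The Senate passed the bill!", vocab, seq_len=8))
# [2, 1, 1, 2, 3, 0, 0, 0]
```

Words outside the training vocabulary ("senate", "passed") collapse to id 1, which is also what happens to unseen validation or test words.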

In [None]:
max_tokens = 20000
seq_len = 250

vectorize = layers.TextVectorization(
    max_tokens=max_tokens,
    output_mode="int",
    output_sequence_length=seq_len
)
vectorize.adapt(X_train_np)

## 5) Baseline: Simple Neural Network (Embeddings + Global Average Pooling)

This baseline ignores word order and pools embeddings across the sequence.
We compare against BiLSTM to see if sequence modeling improves performance.
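
The claim that pooling ignores word order can be checked directly: averaging is unchanged by any permutation of its inputs. The sketch below uses made-up 2-dimensional embeddings and a hypothetical `mean_pool` helper in plain Python.

```python
def mean_pool(token_ids, embedding_table):
    """Average the embedding vectors of a sequence, dimension by dimension."""
    vectors = [embedding_table[t] for t in token_ids]
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


# Toy 2-dimensional embeddings for three token ids (made-up values).
embedding_table = {2: [1.0, 0.0], 3: [0.0, 1.0], 4: [0.5, 0.5]}

a = mean_pool([2, 3, 4], embedding_table)  # e.g. "dog bites man"
b = mean_pool([4, 3, 2], embedding_table)  # e.g. "man bites dog"
print(a == b)  # True
```

Note that the `Embedding` layer in this notebook is created without `mask_zero=True`, so the pooled average also includes the padding positions of each length-250 sequence.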

In [None]:
num_classes = len(class_labels)

simple_nn = keras.Sequential([
    keras.Input(shape=(1,), dtype=tf.string),
    vectorize,
    layers.Embedding(max_tokens, 64),
    layers.GlobalAveragePooling1D(),
    layers.Dropout(0.3),
    layers.Dense(64, activation="relu"),
    layers.Dense(num_classes, activation="softmax")
])
simple_nn.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"]
)

simple_nn.summary()

In [19]:
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=2,
    restore_best_weights=True
)
history_simple = simple_nn.fit(
    X_train_np, y_train,
    validation_data=(X_val_np, y_val),
    epochs=10,
    batch_size=64,
    callbacks=[early_stop],
    verbose=1
)

Epoch 1/10
427/427 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.5326 - loss: 0.6891 - val_accuracy: 0.6187 - val_loss: 0.6597
Epoch 2/10
427/427 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7302 - loss: 0.5293 - val_accuracy: 0.5942 - val_loss: 0.6685
Epoch 3/10
427/427 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8757 - loss: 0.2948 - val_accuracy: 0.8691 - val_loss: 0.2867
Epoch 4/10
427/427 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.8980 - loss: 0.2430 - val_accuracy: 0.9319 - val_loss: 0.1861
Epoch 5/10
427/427 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.9068 - loss: 0.2245 - val_accuracy: 0.9392 - val_loss: 0.1614
Epoch 6/10
427/427 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.9194 - loss: 0.1972 - val_accuracy: 0.9385 - val_loss: 0.1520
Epoch 7/10
427/427

In [20]:
val_prob_simple = simple_nn.predict(X_val_np, verbose=0)
val_pred_id_simple = val_prob_simple.argmax(axis=1)

y_val_label = pd.Series(y_val).map(id_to_label)
val_pred_label_simple = pd.Series(val_pred_id_simple).map(id_to_label)

simple_acc = accuracy_score(y_val_label, val_pred_label_simple)
simple_f1  = f1_score(y_val_label, val_pred_label_simple, average="weighted")

print("Simple NN Accuracy:", round(simple_acc, 4))
print("Simple NN F1 (weighted):", round(simple_f1, 4))

Simple NN Accuracy: 0.9385
Simple NN F1 (weighted): 0.9385


## 6) Main Model: BiLSTM + Embeddings

BiLSTM reads the sequence in both directions (left→right and right→left),
which can capture phrasing and context that a pooled model may miss.
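
A toy illustration of the bidirectional idea follows. It uses a scalar tanh recurrence instead of a real LSTM (no gates, and the same made-up weights for both directions, whereas Keras trains separate weights per direction), so it only shows the mechanics: one pass left to right, one pass right to left, and the two final states concatenated. All names and values are assumptions for the example.

```python
import math


def run_rnn(inputs, w_in=0.5, w_rec=0.9):
    """Tiny recurrent cell: h_t = tanh(w_in * x_t + w_rec * h_{t-1}).

    Returns the final hidden state.
    """
    h = 0.0
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h)
    return h


def bidirectional(inputs):
    """Concatenate the final states of a forward pass and a backward pass."""
    return [run_rnn(inputs), run_rnn(list(reversed(inputs)))]


seq = [1.0, -2.0, 3.0]
reordered = [3.0, -2.0, 1.0]  # same values, different order

print(bidirectional(seq))
print(bidirectional(reordered))
print(bidirectional(seq) != bidirectional(reordered))  # True
```

Unlike the pooled baseline, the output changes when the same inputs arrive in a different order. With the default concatenation, `Bidirectional(LSTM(64))` likewise produces a 128-dimensional vector (64 per direction).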

In [21]:
bilstm_model = keras.Sequential([
    keras.Input(shape=(1,), dtype=tf.string),
    vectorize,
    layers.Embedding(max_tokens, 64),
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dropout(0.3),
    layers.Dense(64, activation="relu"),
    layers.Dense(num_classes, activation="softmax")
])
bilstm_model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"]
)
bilstm_model.summary()

In [22]:
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=2,
    restore_best_weights=True
)
history_bilstm = bilstm_model.fit(
    X_train_np, y_train,
    validation_data=(X_val_np, y_val),
    epochs=10,
    batch_size=64,
    callbacks=[early_stop],
    verbose=1
)

Epoch 1/10
427/427 ━━━━━━━━━━━━━━━━━━━━ 36s 81ms/step - accuracy: 0.9209 - loss: 0.1839 - val_accuracy: 0.9707 - val_loss: 0.0825
Epoch 2/10
427/427 ━━━━━━━━━━━━━━━━━━━━ 34s 80ms/step - accuracy: 0.9822 - loss: 0.0499 - val_accuracy: 0.9640 - val_loss: 0.0995
Epoch 3/10
427/427 ━━━━━━━━━━━━━━━━━━━━ 35s 81ms/step - accuracy: 0.9940 - loss: 0.0186 - val_accuracy: 0.9612 - val_loss: 0.1194


In [23]:
val_prob_bilstm = bilstm_model.predict(X_val_np, verbose=0)
val_pred_id_bilstm = val_prob_bilstm.argmax(axis=1)

val_pred_label_bilstm = pd.Series(val_pred_id_bilstm).map(id_to_label)

bilstm_acc = accuracy_score(y_val_label, val_pred_label_bilstm)
bilstm_f1  = f1_score(y_val_label, val_pred_label_bilstm, average="weighted")

print("BiLSTM Accuracy:", round(bilstm_acc, 4))
print("BiLSTM F1 (weighted):", round(bilstm_f1, 4))

BiLSTM Accuracy: 0.9707
BiLSTM F1 (weighted): 0.9707


## 7) Results Summary

Side-by-side comparison of:
- Simple Embedding Baseline (orderless)
- BiLSTM (sequence-aware)

In [27]:
summary = pd.DataFrame([
    {"model": "Simple NN (Emb + Pool)", "accuracy": simple_acc, "f1_weighted": simple_f1},
    {"model": "BiLSTM (Emb + BiLSTM)", "accuracy": bilstm_acc, "f1_weighted": bilstm_f1},
]).sort_values("f1_weighted", ascending=False)

summary

|   | model                  | accuracy | f1_weighted |
|---|------------------------|----------|-------------|
| 1 | BiLSTM (Emb + BiLSTM)  | 0.970722 | 0.970723    |
| 0 | Simple NN (Emb + Pool) | 0.938516 | 0.938521    |


## 8) Notes for Presentation

- The baseline pools embeddings, so it ignores word order.
- BiLSTM reads text in both directions, capturing context and phrasing.
- Weighted F1 is the main reported metric.
- BiLSTM improves weighted F1 from 0.9385 to 0.9707 on the validation split, which suggests that sequence/context matters for this dataset.
- Had it not improved, classic TF-IDF + linear models might still have been the best fit for this dataset.
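
Since weighted F1 is the headline metric, a plain-Python sketch of how it is computed may help when presenting it: F1 is calculated per class and then averaged with weights equal to each class's share of the true labels. The helper `weighted_f1` and the toy labels are illustrative assumptions; the notebook itself uses scikit-learn's `f1_score(..., average="weighted")`.

```python
def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's support."""
    total = 0.0
    for cls in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        support = tp + fn  # number of true examples of this class
        total += f1 * support / len(y_true)
    return total


# Toy labels (not taken from the dataset).
y_true = [0, 0, 0, 1, 1]
y_pred = [0, 0, 1, 1, 1]
print(round(weighted_f1(y_true, y_pred), 4))  # 0.8
```

With classes as balanced as they are here (17572 vs. 16580), weighted F1 stays close to accuracy, which matches the reported numbers.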

In [24]:
test_text_np = data_out[TEXT_COL].astype(str).to_numpy()

test_prob = bilstm_model.predict(test_text_np, verbose=0)
test_pred_id = test_prob.argmax(axis=1)
test_pred_label = pd.Series(test_pred_id).map(id_to_label)

out_path = "../outputs/pred_bilstm.csv"
pd.DataFrame({"prediction": test_pred_label}).to_csv(out_path, index=False)
print("Saved:", out_path)

Saved: ../outputs/pred_bilstm.csv


## Metrics dump

### A) Simple NN baseline

In [25]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

simple_precision = precision_score(y_val_label, val_pred_label_simple, average="weighted", zero_division=0)
simple_recall    = recall_score(y_val_label, val_pred_label_simple, average="weighted", zero_division=0)

# Last-epoch val_loss from history. With restore_best_weights=True this is NOT the
# val_loss of the restored (best) weights; min(history["val_loss"]) would be.
simple_val_loss = float(history_simple.history["val_loss"][-1])

print("=== Simple NN (Emb + Pool) ===")
print("Accuracy:", round(simple_acc, 4))
print("Precision (weighted):", round(simple_precision, 4))
print("Recall (weighted):", round(simple_recall, 4))
print("F1 (weighted):", round(simple_f1, 4))
print("Val loss:", round(simple_val_loss, 4))
print("Epochs run:", len(history_simple.history["loss"]))
print("Batch size:", 64)
print("Early stopping:", "yes (patience=2)")
print("Optimizer:", "Adam (default lr)")

=== Simple NN (Emb + Pool) ===
Accuracy: 0.9385
Precision (weighted): 0.9398
Recall (weighted): 0.9385
F1 (weighted): 0.9385
Val loss: 0.3373
Epochs run: 8
Batch size: 64
Early stopping: yes (patience=2)
Optimizer: Adam (default lr)


### B) BiLSTM model

In [26]:
bilstm_precision = precision_score(y_val_label, val_pred_label_bilstm, average="weighted", zero_division=0)
bilstm_recall    = recall_score(y_val_label, val_pred_label_bilstm, average="weighted", zero_division=0)

# Last-epoch val_loss; the restored (best) weights correspond to min(val_loss) instead.
bilstm_val_loss = float(history_bilstm.history["val_loss"][-1])

print("\n=== BiLSTM (Emb + BiLSTM) ===")
print("Accuracy:", round(bilstm_acc, 4))
print("Precision (weighted):", round(bilstm_precision, 4))
print("Recall (weighted):", round(bilstm_recall, 4))
print("F1 (weighted):", round(bilstm_f1, 4))
print("Val loss:", round(bilstm_val_loss, 4))
print("Epochs run:", len(history_bilstm.history["loss"]))
print("Batch size:", 64)
print("Early stopping:", "yes (patience=2)")
print("Optimizer:", "Adam (default lr)")


=== BiLSTM (Emb + BiLSTM) ===
Accuracy: 0.9707
Precision (weighted): 0.9707
Recall (weighted): 0.9707
F1 (weighted): 0.9707
Val loss: 0.1194
Epochs run: 3
Batch size: 64
Early stopping: yes (patience=2)
Optimizer: Adam (default lr)
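
The `Val loss` values printed in this dump come from the last training epoch, whereas `restore_best_weights=True` rolls each model back to the epoch with the lowest validation loss. That is why the BiLSTM accuracy (0.9707) matches epoch 1 of its training log while the printed loss (0.1194) matches epoch 3. A small sketch of the difference, using the BiLSTM validation losses copied from the log:

```python
# Validation losses per epoch, copied from the BiLSTM training log.
val_loss = [0.0825, 0.0995, 0.1194]

last_val_loss = val_loss[-1]
best_val_loss = min(val_loss)
best_epoch = val_loss.index(best_val_loss) + 1  # epochs are 1-based in the log

print("last:", last_val_loss)                          # last: 0.1194
print("best:", best_val_loss, "at epoch", best_epoch)  # best: 0.0825 at epoch 1
```

In the notebook, the same lookup could be applied to `history_bilstm.history["val_loss"]` or `history_simple.history["val_loss"]` in place of the hard-coded list.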
