# DistilBERT Evaluation on FakeNewsNet Dataset

In this notebook, I'll evaluate my fine-tuned DistilBERT model on the FakeNewsNet dataset. My goal is to understand how well the model performs on this new dataset and analyze its resource consumption, especially for CPU-based edge deployment on my laptop.

## 1. Setting Up My Environment

First, I'll import all necessary libraries and set up utility functions to monitor resource usage.

In [None]:
# Import necessary libraries
import os
import time
import numpy as np
import pandas as pd
import torch
import psutil
import gc
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification
from datasets import Dataset as HFDataset
from sklearn.metrics import accuracy_score, precision_recall_fscore_support, classification_report
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# Set device - using CPU for edge device testing
device = torch.device("cpu")
print(f"Using device: {device}")

In [None]:
# Function to get current memory usage
def get_memory_usage():
    process = psutil.Process(os.getpid())
    return process.memory_info().rss / 1024 / 1024  # Convert to MB
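
The timing pattern used throughout this notebook (record a start time, run a step, subtract) could also be wrapped in a small helper. The following is only a minimal sketch and not part of the original setup: `timed` and `timings` are hypothetical names, and it uses `time.perf_counter()`, a monotonic clock meant for measuring intervals, in place of `time.time()`.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, results):
    """Store the wall-clock duration of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, so a failed step is still timed
        results[label] = time.perf_counter() - start

# Hypothetical usage: a sleep stands in for a step such as model loading
timings = {}
with timed("sleep", timings):
    time.sleep(0.05)

print(f"sleep took {timings['sleep']:.3f} seconds")
```

Collecting the durations in one dictionary makes it easier to print or plot all step timings together at the end of a run.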

## 2. Loading and Preparing FakeNewsNet Data

Now I'll load the FakeNewsNet dataset. I'm keeping this simple by only using the essential columns: title, text, and the label. I'll combine the title and text just like I did when training on the ISOT dataset to maintain consistency.

In [None]:
# Check memory usage before loading dataset
print(f"Memory usage before loading dataset: {get_memory_usage():.2f} MB")

In [None]:
# Load the FakeNewsNet dataset (update path as needed)
fakenewsnet_df = pd.read_csv('path_to_fakenewsnet.csv')

In [None]:
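# Keep only essential columns; .copy() avoids pandas' SettingWithCopyWarning
# when the 'text' column is reassigned in the next cell
fakenewsnet_clean = fakenewsnet_df[['title', 'text', 'label']].copy()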

In [None]:
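# Combine title and text; a missing text becomes '', but a missing title makes the result NaN
fakenewsnet_clean['text'] = fakenewsnet_clean['title'] + " " + fakenewsnet_clean['text'].fillna('')

# Drop rows whose combined text is NaN (missing title), then drop the now-redundant title column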
fakenewsnet_clean = fakenewsnet_clean.dropna(subset=['text'])
fakenewsnet_clean = fakenewsnet_clean[['text', 'label']]

In [None]:
# Convert to HuggingFace dataset format
fakenewsnet_dataset = HFDataset.from_pandas(fakenewsnet_clean)

print(f"Dataset prepared with {len(fakenewsnet_dataset)} examples")
print(f"Memory usage after loading dataset: {get_memory_usage():.2f} MB")
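
To make the missing-value behaviour of the title/text combination concrete, here is a small sketch on a made-up three-row frame. The rows and the names `toy` and `kept` are purely illustrative and are not taken from FakeNewsNet. It shows that a missing body is tolerated, while a missing title turns the combined string into NaN and the row is then dropped.

```python
import pandas as pd

# Hypothetical rows for illustration only
toy = pd.DataFrame({
    "title": ["Headline A", None, "Headline C"],
    "text": ["Body A", "Body B", None],
    "label": [0, 1, 1],
})

# Same combination as above: only the body is filled, so a missing title propagates as NaN
toy["text"] = toy["title"] + " " + toy["text"].fillna("")

# Rows with a NaN combined text (here, the row without a title) are removed
kept = toy.dropna(subset=["text"])[["text", "label"]]
print(kept)
```

If rows without a title should be kept instead, one option would be to fill the title column with an empty string before concatenating. Whether that matches the preprocessing used for ISOT training is something to check against the training notebook.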

## 3. Loading My Pre-trained Model

I'll now load the DistilBERT model that I previously fine-tuned on the ISOT dataset. For edge deployment, I'm particularly interested in the model's loading time and memory footprint on CPU.

In [None]:
# Load the pre-trained DistilBERT model
print("\nLoading model...")
model_path = "../ml_models/distilbert-fake-news-detector"

In [None]:
start_time = time.time()
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
model = DistilBertForSequenceClassification.from_pretrained(model_path)
model.to(device)  # This will be CPU
load_time = time.time() - start_time

In [None]:
print(f"Model loaded in {load_time:.2f} seconds")
print(f"Memory usage after loading model: {get_memory_usage():.2f} MB")

## 4. Tokenizing the Dataset

Before I can run the model on my data, I need to tokenize it using the same tokenizer that was used during training. This step converts the text into the numerical format that the model expects.

In [None]:
# Tokenize the data
print("\nTokenizing dataset...")

tokenize_start_time = time.time()

In [None]:
def tokenize_function(examples):
    return tokenizer(
        examples['text'],
        padding='max_length',
        truncation=True,
        max_length=512,
        return_tensors=None
    )

In [None]:
# Apply tokenization
tokenized_dataset = fakenewsnet_dataset.map(tokenize_function, batched=True)
tokenized_dataset.set_format('torch', columns=['input_ids', 'attention_mask', 'label'])

In [None]:
tokenize_time = time.time() - tokenize_start_time
print(f"Dataset tokenized in {tokenize_time:.2f} seconds")
print(f"Memory usage after tokenization: {get_memory_usage():.2f} MB")
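
Because `padding='max_length'` pads every example to 512 tokens, the model processes the same number of tokens per example no matter how short the text is. The sketch below is a rough back-of-the-envelope comparison, not a measurement: `padded_token_totals` is a hypothetical helper and the token lengths are invented. It counts how many tokens would be processed with fixed padding versus padding each batch only to its longest sequence (the approach that utilities such as `DataCollatorWithPadding` in `transformers` take).

```python
def padded_token_totals(lengths, max_length=512, batch_size=16):
    """Return (fixed, dynamic) token counts for two padding schemes.

    fixed:   every sequence is padded to max_length
    dynamic: every batch is padded to its own longest sequence
    """
    # truncation=True caps each sequence at max_length
    clipped = [min(n, max_length) for n in lengths]
    fixed = len(clipped) * max_length

    dynamic = 0
    for start in range(0, len(clipped), batch_size):
        batch = clipped[start:start + batch_size]
        dynamic += len(batch) * max(batch)
    return fixed, dynamic

# Hypothetical token lengths, chosen only for illustration
example_lengths = [40, 120, 90, 75, 300, 64, 900, 210]
fixed_tokens, dynamic_tokens = padded_token_totals(example_lengths, max_length=512, batch_size=4)
print(f"Fixed padding: {fixed_tokens} tokens, dynamic padding: {dynamic_tokens} tokens")
```

How much this matters depends on the real length distribution of the dataset. If most articles already exceed 512 tokens, the two schemes converge.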

## 5. Running Model Evaluation

Now comes the main part: evaluating my model's performance on the FakeNewsNet dataset. Since I'm targeting edge devices, I'll pay special attention to inference speed and memory usage on CPU.

In [None]:
# Evaluate model performance
print("\nEvaluating model performance...")

# Use a small batch size to limit memory pressure on CPU
batch_size = 16
all_preds = []
all_labels = []
total_inference_time = 0
sample_count = 0

In [None]:
# Create DataLoader for batched processing
from torch.utils.data import DataLoader

eval_dataloader = DataLoader(
    tokenized_dataset, 
    batch_size=batch_size, 
    shuffle=False
)

In [None]:
# Track memory and time metrics
inference_times = []
memory_usages = []

In [None]:
model.eval()
with torch.no_grad():
    for batch in eval_dataloader:
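        # Pull the labels out first so they are kept for scoring and not passed to the model
        labels = batch.pop('label', None)

        # Use a separate name so the configured batch_size is not overwritten
        current_batch_size = len(batch['input_ids'])
        sample_count += current_batch_size
        
        # Move the remaining input tensors to the device (CPU)
        batch = {k: v.to(device) for k, v in batch.items()}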
        
        # Track memory before inference
        memory_usages.append(get_memory_usage())
        
        # Measure inference time
        start_time = time.time()
        outputs = model(**batch)
        batch_inference_time = time.time() - start_time
        inference_times.append(batch_inference_time)
        
        total_inference_time += batch_inference_time
        
        # Get predictions
        predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
        pred_labels = predictions.argmax(dim=1).cpu().numpy()
        
        all_preds.extend(pred_labels)
        if labels is not None:
            all_labels.extend(labels.cpu().numpy())

In [None]:
# Calculate metrics
accuracy = accuracy_score(all_labels, all_preds)
precision, recall, f1, _ = precision_recall_fscore_support(all_labels, all_preds, average='weighted')

In [None]:
# Create confusion matrix (rows: true label, columns: predicted label)
cm = np.zeros((2, 2), dtype=int)
for true_label, pred_label in zip(all_labels, all_preds):
    cm[true_label, pred_label] += 1

In [None]:
print("\nEvaluation Results:")
print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1 Score: {f1:.4f}")

print("\nConfusion Matrix:")
print(cm)
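
To show how the weighted scores relate to the confusion matrix, the sketch below recomputes them with NumPy alone. It is an illustration only: `weighted_metrics_from_cm` is a hypothetical helper and the counts in `example_cm` are invented, not results from this evaluation. One thing it makes visible is that support-weighted recall always equals accuracy, so those two numbers carry the same information.

```python
import numpy as np

def weighted_metrics_from_cm(cm):
    """Derive accuracy and support-weighted precision/recall/F1 from a confusion matrix.

    Rows are true labels and columns are predicted labels.
    """
    cm = np.asarray(cm, dtype=float)
    true_pos = np.diag(cm)
    support = cm.sum(axis=1)     # number of true examples per class
    predicted = cm.sum(axis=0)   # number of predictions per class

    # Per-class scores; classes with an empty denominator get 0
    precision = np.divide(true_pos, predicted, out=np.zeros_like(true_pos), where=predicted > 0)
    recall = np.divide(true_pos, support, out=np.zeros_like(true_pos), where=support > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(true_pos), where=denom > 0)

    # Weight each class by its share of the true labels
    weights = support / support.sum()
    return {
        "accuracy": float(true_pos.sum() / cm.sum()),
        "precision": float((weights * precision).sum()),
        "recall": float((weights * recall).sum()),
        "f1": float((weights * f1).sum()),
    }

# Invented counts for illustration only
example_cm = [[40, 10],
              [5, 45]]
example_metrics = weighted_metrics_from_cm(example_cm)
print(example_metrics)
```

For an imbalanced dataset, per-class or macro-averaged scores may be more informative than the weighted averages, since the majority class dominates the weighted numbers.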

## 6. Analyzing Resource Consumption

Since I'm targeting edge devices, I'll focus on CPU-specific metrics like memory usage and inference time to determine if this model is suitable for edge deployment.

In [None]:
# Resource consumption analysis
print("\nResource Consumption Analysis for Edge Deployment:")
print(f"Total inference time: {total_inference_time:.2f} seconds")
print(f"Average inference time per batch: {np.mean(inference_times):.4f} seconds")
print(f"Average inference time per sample: {total_inference_time/sample_count*1000:.2f} ms")
print(f"Peak sampled process memory (RSS): {max(memory_usages):.2f} MB")
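
Averages can hide slow outliers, which matter when judging whether inference is fast enough for real-time use. As a rough sketch, per-sample latency percentiles and throughput could be summarised as below. `summarize_latency` is a hypothetical helper, the timings are invented, and it assumes the size of each batch is recorded alongside its inference time (the loop above only stores the times).

```python
import numpy as np

def summarize_latency(batch_times, batch_sizes):
    """Summarise per-sample latency (ms) and overall throughput (samples/second)."""
    batch_times = np.asarray(batch_times, dtype=float)
    batch_sizes = np.asarray(batch_sizes, dtype=float)

    # Spread each batch's time evenly over its samples
    per_sample_ms = batch_times / batch_sizes * 1000
    return {
        "p50_ms": float(np.percentile(per_sample_ms, 50)),
        "p95_ms": float(np.percentile(per_sample_ms, 95)),
        "throughput": float(batch_sizes.sum() / batch_times.sum()),
    }

# Invented timings (seconds) and batch sizes for illustration only
example_times = [1.6, 1.7, 1.5, 2.4, 0.8]
example_sizes = [16, 16, 16, 16, 8]
latency_summary = summarize_latency(example_times, example_sizes)
print(latency_summary)
```

Note that these are batched figures. Latency for a single article arriving on its own (batch size 1) would need to be measured separately.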

In [None]:
# Plot resource usage in a single cell so both subplots are drawn on, and saved from, the same figure
plt.figure(figsize=(12, 8))

# Top panel: inference time per batch
plt.subplot(2, 1, 1)
plt.plot(inference_times)
plt.title('Inference Time per Batch (CPU)')
plt.xlabel('Batch')
plt.ylabel('Time (seconds)')

# Bottom panel: process memory (RSS) sampled before each batch
plt.subplot(2, 1, 2)
plt.plot(memory_usages, label='Process Memory (RSS)')
plt.title('Memory Usage During Evaluation (CPU)')
plt.xlabel('Batch')
plt.ylabel('Memory (MB)')
plt.legend()

plt.tight_layout()
plt.savefig('distilbert_resource_usage_cpu.png')
plt.show()

## 7. Detailed Classification Analysis

Finally, I'll generate a detailed classification report and visualize the confusion matrix to better understand where my model performs well and where it struggles on this new dataset.

In [None]:
# Generate classification report
print("\nDetailed Classification Report:")
print(classification_report(all_labels, all_preds, target_names=['Fake News', 'Real News']))

In [None]:
# Plot confusion matrix
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=['Fake', 'Real'], yticklabels=['Fake', 'Real'])
plt.title('Confusion Matrix')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.savefig('distilbert_confusion_matrix.png')
plt.show()

In [None]:
# Free up memory
del model
gc.collect()

## Conclusion

In this notebook, I've evaluated my DistilBERT model on the FakeNewsNet dataset, focusing specifically on CPU performance for edge deployment.

The metrics I've gathered are crucial for determining if this model could run effectively on resource-constrained edge devices like my laptop. For edge deployment, I'm particularly interested in:

1. Memory footprint - How much RAM does the model require?
2. Inference speed - Is it fast enough for real-time applications?
3. Model loading time - Is the startup time acceptable for edge applications?

Based on these results, I can determine if further optimization techniques like quantization, pruning, or knowledge distillation might be necessary to make the model more suitable for edge deployment. For truly resource-constrained environments, I might even consider alternatives like TinyBERT or mobile-optimized models.

This cross-dataset evaluation also helps me understand how well my model generalizes to new sources of fake news beyond what it was trained on, which is essential for real-world applications.