# Fake News Detection - Alternative Training Approaches

This notebook demonstrates two ways to fine-tune a lightweight transformer model (DistilBERT) for fake news detection:
1. **Option A**: Using Hugging Face's Trainer API (requires the `accelerate` package)
2. **Option B**: Using a manual PyTorch training loop (no `accelerate` dependency)

Both approaches fine-tune the same model on the same data and should give comparable results; the manual loop gives you more control and has fewer dependencies.

## 1. Install Required Packages

In [15]:
# Base dependencies for both approaches
!pip install torch transformers datasets scikit-learn pandas numpy matplotlib seaborn psutil tqdm

# Option A (Trainer API) also needs accelerate.
# Comment out the line below if you only plan to use Option B.
!pip install accelerate

Collecting accelerate
  Using cached accelerate-1.6.0-py3-none-any.whl.metadata (19 kB)
Using cached accelerate-1.6.0-py3-none-any.whl (354 kB)
Installing collected packages: accelerate
Successfully installed accelerate-1.6.0


## 2. Import Libraries

In [16]:
import os
import pandas as pd
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_recall_fscore_support, classification_report, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns
import time
import json
import re
import psutil
from tqdm.auto import tqdm

# Set seed for reproducibility
SEED = 42
np.random.seed(SEED)
torch.manual_seed(SEED)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(SEED)
    
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

PyTorch version: 2.6.0
CUDA available: False


## 3. Load and Explore the FakeNewsNet Dataset (CSV Format)

In [17]:
# Load the dataset
try:
    fake_news = pd.read_csv("./fake-news-net/Fake.csv")
    real_news = pd.read_csv("./fake-news-net/True.csv")
    
    print(f"Fake news dataset shape: {fake_news.shape}")
    print(f"Real news dataset shape: {real_news.shape}")
except Exception as e:
    print(f"Error loading dataset: {e}")
    print("Please ensure the CSV files are in the correct location.")

Fake news dataset shape: (23481, 4)
Real news dataset shape: (21417, 4)


In [18]:
# Explore the datasets
print("Fake news dataset columns:")
print(fake_news.columns.tolist())

print("\nReal news dataset columns:")
print(real_news.columns.tolist())

# Display a few examples from each dataset
print("Sample from fake news dataset:")
display(fake_news.head(2))

print("\nSample from real news dataset:")
display(real_news.head(2))

Fake news dataset columns:
['title', 'text', 'subject', 'date']

Real news dataset columns:
['title', 'text', 'subject', 'date']
Sample from fake news dataset:


| | title | text | subject | date |
|---|---|---|---|---|
| 0 | Donald Trump Sends Out Embarrassing New Year’... | Donald Trump just couldn t wish all Americans ... | News | December 31, 2017 |
| 1 | Drunk Bragging Trump Staffer Started Russian ... | House Intelligence Committee Chairman Devin Nu... | News | December 31, 2017 |



Sample from real news dataset:


| | title | text | subject | date |
|---|---|---|---|---|
| 0 | As U.S. budget fight looms, Republicans flip t... | WASHINGTON (Reuters) - The head of a conservat... | politicsNews | December 31, 2017 |
| 1 | U.S. military to accept transgender recruits o... | WASHINGTON (Reuters) - Transgender people will... | politicsNews | December 29, 2017 |


In [19]:
# Prepare combined dataset with labels
# Add a label column (1 for fake, 0 for real)
fake_news['label'] = 1
real_news['label'] = 0

# Combine the datasets
df = pd.concat([fake_news, real_news], ignore_index=True)

# Shuffle the data
df = df.sample(frac=1, random_state=SEED).reset_index(drop=True)

print(f"Combined dataset shape: {df.shape}")
print("\nLabel distribution:")
print(df['label'].value_counts())

Combined dataset shape: (44898, 5)

Label distribution:
label
1    23481
0    21417
Name: count, dtype: int64


## 4. Data Preprocessing

In [20]:
# Inspect dataset columns to find title and text columns
for col in df.columns:
    if col != 'label':
        print(f"Column: {col}")
        print(f"Example: {df[col].iloc[0][:100]}...")
        print(f"Average length: {df[col].str.len().mean():.2f} characters")
        print("-" * 50)

Column: title
Example: Ben Stein Calls Out 9th Circuit Court: Committed a ‘Coup d’état’ Against the Constitution...
Average length: 80.11 characters
--------------------------------------------------
Column: text
Example: 21st Century Wire says Ben Stein, reputable professor from, Pepperdine University (also of some Holl...
Average length: 2469.11 characters
--------------------------------------------------
Column: subject
Example: US_News...
Average length: 8.80 characters
--------------------------------------------------
Column: date
Example: February 13, 2017...
Average length: 14.83 characters
--------------------------------------------------


In [21]:
# Basic text preprocessing function
def preprocess_text(text):
    """Clean and normalize text data"""
    if not isinstance(text, str):
        return ""
    
    # Convert to lowercase
    text = text.lower()
    
    # Remove URLs
    text = re.sub(r'https?://\S+|www\.\S+', '', text)
    
    # Remove extra whitespace
    text = re.sub(r'\s+', ' ', text).strip()
    
    return text

# Based on the dataset inspection, combine title and text
# Adjust these column names if needed according to your dataset structure
title_col = 'title' if 'title' in df.columns else None
text_col = 'text' if 'text' in df.columns else None

# Combine and preprocess
if title_col and text_col:
    df['combined_text'] = df[title_col].fillna('') + ' ' + df[text_col].fillna('')
elif title_col:
    df['combined_text'] = df[title_col].fillna('')
elif text_col:
    df['combined_text'] = df[text_col].fillna('')
else:
    # If column names are different, choose the appropriate columns
    # This is a fallback assuming the first non-label column is the text
    text_columns = [col for col in df.columns if col != 'label']
    if text_columns:
        df['combined_text'] = df[text_columns[0]].fillna('')
    else:
        raise ValueError("No text columns found in the dataset")

# Apply preprocessing
df['processed_text'] = df['combined_text'].apply(preprocess_text)

# Check for empty texts after preprocessing
empty_texts = df['processed_text'].apply(lambda x: len(x.strip()) == 0).sum()
print(f"Number of empty texts after preprocessing: {empty_texts}")

# Remove empty texts if any
if empty_texts > 0:
    df = df[df['processed_text'].apply(lambda x: len(x.strip()) > 0)].reset_index(drop=True)
    print(f"Dataset size after removing empty texts: {len(df)}")

# Display a sample preprocessed text
print("\nSample processed text:")
print(df['processed_text'].iloc[0][:300], "...")

Number of empty texts after preprocessing: 9
Dataset size after removing empty texts: 44889

Sample processed text:
ben stein calls out 9th circuit court: committed a ‘coup d’état’ against the constitution 21st century wire says ben stein, reputable professor from, pepperdine university (also of some hollywood fame appearing in tv shows and films such as ferris bueller s day off) made some provocative statements  ...
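
To make the cleaning steps concrete, here is a small standalone check of the same three transformations (lowercasing, URL removal, whitespace collapsing). The sample string is made up for illustration and does not come from the dataset.

```python
import re

def preprocess_text(text):
    """Lowercase, strip URLs, and collapse whitespace (same steps as the cell above)."""
    if not isinstance(text, str):
        return ""
    text = text.lower()
    text = re.sub(r'https?://\S+|www\.\S+', '', text)
    return re.sub(r'\s+', ' ', text).strip()

# Hypothetical input, not taken from the dataset
sample = "BREAKING:  Read more at https://example.com/story   NOW\n"
cleaned = preprocess_text(sample)
print(cleaned)  # breaking: read more at now
```

Non-string values such as `NaN` come back as an empty string, which is why the cell above counts and drops empty texts afterwards.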


## 5. Split Data into Train, Validation, and Test Sets

In [22]:
# Split data into train, validation, and test sets
train_val_df, test_df = train_test_split(
    df, test_size=0.2, random_state=SEED, stratify=df['label']
)

# Then split train+val into train and validation
train_df, val_df = train_test_split(
    train_val_df, test_size=0.15, random_state=SEED, stratify=train_val_df['label']
)

print(f"Training set size: {len(train_df)}")
print(f"Validation set size: {len(val_df)}")
print(f"Test set size: {len(test_df)}")

Training set size: 30524
Validation set size: 5387
Test set size: 8978
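
The two-stage split works out to roughly 68% train, 12% validation, and 20% test. The arithmetic can be sketched in plain Python, assuming the held-out fraction is rounded up at each stage; that assumption reproduces the sizes printed above.

```python
import math

n_total = 44889                         # rows left after dropping empty texts
n_test = math.ceil(n_total * 0.20)      # first split: 20% held out for testing
n_train_val = n_total - n_test
n_val = math.ceil(n_train_val * 0.15)   # second split: 15% of the remainder
n_train = n_train_val - n_val

print(n_train, n_val, n_test)  # 30524 5387 8978
print(f"train {n_train / n_total:.1%}, val {n_val / n_total:.1%}, test {n_test / n_total:.1%}")
```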


## 6. Create PyTorch Dataset

In [23]:
# Model configuration
MODEL_NAME = "distilbert-base-uncased"
MAX_LENGTH = 512
BATCH_SIZE = 8

In [24]:
class FakeNewsDataset(Dataset):
    def __init__(self, texts, labels, tokenizer, max_length=512):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        text = self.texts[idx]
        label = self.labels[idx]
        
        # Tokenize text
        encoding = self.tokenizer(
            text,
            truncation=True,
            max_length=self.max_length,
            padding='max_length',
            return_tensors='pt'
        )
        
        # Drop the batch dimension that return_tensors='pt' adds
        input_ids = encoding['input_ids'].squeeze(0)
        attention_mask = encoding['attention_mask'].squeeze(0)
        
        return {
            'input_ids': input_ids,
            'attention_mask': attention_mask,
            'labels': torch.tensor(label, dtype=torch.long)
        }

In [25]:
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

# Create datasets
train_dataset = FakeNewsDataset(
    train_df['processed_text'].tolist(),
    train_df['label'].tolist(),
    tokenizer,
    max_length=MAX_LENGTH
)

val_dataset = FakeNewsDataset(
    val_df['processed_text'].tolist(),
    val_df['label'].tolist(),
    tokenizer,
    max_length=MAX_LENGTH
)

test_dataset = FakeNewsDataset(
    test_df['processed_text'].tolist(),
    test_df['label'].tolist(),
    tokenizer,
    max_length=MAX_LENGTH
)

# Create dataloaders
train_dataloader = DataLoader(
    train_dataset,
    batch_size=BATCH_SIZE,
    shuffle=True,
)

val_dataloader = DataLoader(
    val_dataset,
    batch_size=BATCH_SIZE,
)

test_dataloader = DataLoader(
    test_dataset,
    batch_size=BATCH_SIZE,
)

In [26]:
# Load model for binary classification
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    num_labels=2
)

# Print model summary
print(f"Model: {MODEL_NAME}")
print(f"Number of parameters: {sum(p.numel() for p in model.parameters())}")

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Model: distilbert-base-uncased
Number of parameters: 66955010


## OPTION A: Train with Hugging Face Trainer API

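**Note**: This requires the `accelerate` library. If the import still fails right after installing it (as in the output below), restart the kernel so that `transformers` can detect the new package, or use OPTION B.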

In [27]:
# Only run this cell if you have accelerate installed
try:
    from transformers import TrainingArguments, Trainer, EvalPrediction
    
    def compute_metrics(pred):
        """
        Compute metrics for evaluation.
        """
        labels = pred.label_ids
        preds = pred.predictions.argmax(-1)
        precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average='binary')
        acc = accuracy_score(labels, preds)
        return {
            'accuracy': acc,
            'f1': f1,
            'precision': precision,
            'recall': recall
        }
    
    # Define training arguments
    training_args = TrainingArguments(
        output_dir='./results_distilbert',
        num_train_epochs=3,
        per_device_train_batch_size=BATCH_SIZE,
        per_device_eval_batch_size=BATCH_SIZE,
        warmup_steps=500,
        weight_decay=0.01,
        logging_dir='./logs_distilbert',
        logging_steps=100,
        eval_strategy="epoch",  # named evaluation_strategy in older transformers releases
        save_strategy="epoch",
        load_best_model_at_end=True,
        metric_for_best_model="f1",
        greater_is_better=True,
    )
    
    # Create trainer
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=val_dataset,
        compute_metrics=compute_metrics,
    )
    
    # Train the model
    print("Starting model training with Trainer API...")
    train_start = time.time()
    trainer.train()
    train_end = time.time()
    train_time = train_end - train_start
    print(f"Training completed in {train_time:.2f} seconds ({train_time/60:.2f} minutes)")
    
    # Evaluate on test set
    print("Evaluating model on test set...")
    results = trainer.evaluate(test_dataset)
    print("Test results:")
    for key, value in results.items():
        print(f"{key}: {value:.4f}")
        
    # Save model
    MODEL_OUTPUT_DIR = "./models/distilbert_fakenewsnet"
    trainer.save_model(MODEL_OUTPUT_DIR)
    tokenizer.save_pretrained(MODEL_OUTPUT_DIR)
    print(f"Model saved to {MODEL_OUTPUT_DIR}")
    
    # Success flag for Option A
    option_a_success = True
    
except ImportError as e:
    print(f"Error: {e}")
    print("Please install the accelerate library with: pip install accelerate")
    print("Alternatively, use Option B with manual training loop.")
    option_a_success = False

Error: Using the `Trainer` with `PyTorch` requires `accelerate>=0.26.0`: Please run `pip install transformers[torch]` or `pip install 'accelerate>={ACCELERATE_MIN_VERSION}'`
Please install the accelerate library with: pip install accelerate
Alternatively, use Option B with manual training loop.




## OPTION B: Train with Manual PyTorch Training Loop

This approach doesn't require the `accelerate` library and gives you more control over the training process.

In [28]:
# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Reload the pretrained weights so Option B starts from a fresh model, then move it to the device
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    num_labels=2
)
model.to(device)

Using device: cpu


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): DistilBertSdpaAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)


In [29]:
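# Define optimizer and scheduler
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)
num_epochs = 3  # must match the number of epochs used in the training loop
# OneCycleLR raises an error if it is stepped more than total_steps times
total_steps = len(train_dataloader) * num_epochs
warmup_steps = 500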
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, 
    max_lr=5e-5, 
    total_steps=total_steps,
    pct_start=warmup_steps/total_steps
)
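
Option A's Trainer defaults to linear warmup followed by linear decay, whereas `OneCycleLR` anneals along a cosine curve by default, so the two options do not follow exactly the same learning-rate curve. For comparison, here is a plain-Python sketch of the linear shape. `linear_warmup_lr` is a hypothetical helper, and the step count assumes 3816 batches per epoch (30524 training samples at batch size 8).

```python
def linear_warmup_lr(step, base_lr=5e-5, warmup_steps=500, total_steps=3816 * 3):
    """Learning rate at a given optimizer step: linear ramp up, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)

# Start, mid-warmup, peak, halfway through the decay, and the final step
for step in (0, 250, 500, 5974, 11448):
    print(f"step {step:>5}: lr = {linear_warmup_lr(step):.2e}")
```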

In [30]:
# Training and evaluation functions
def train_epoch(model, dataloader, optimizer, scheduler, device):
    model.train()
    total_loss = 0
    progress_bar = tqdm(dataloader, desc="Training")
    
    for batch in progress_bar:
        # Move batch to device
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        labels = batch['labels'].to(device)
        
        # Zero gradients
        optimizer.zero_grad()
        
        # Forward pass
        outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
        loss = outputs.loss
        
        # Backward pass
        loss.backward()
        
        # Update parameters
        optimizer.step()
        scheduler.step()
        
        total_loss += loss.item()
        progress_bar.set_postfix({"loss": f"{loss.item():.4f}"})
    
    return total_loss / len(dataloader)

def evaluate(model, dataloader, device):
    model.eval()
    total_loss = 0
    all_preds = []
    all_labels = []
    
    with torch.no_grad():
        for batch in tqdm(dataloader, desc="Evaluating"):
            # Move batch to device
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            labels = batch['labels'].to(device)
            
            # Forward pass
            outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
            loss = outputs.loss
            logits = outputs.logits
            
            total_loss += loss.item()
            
            # Get predictions
            preds = torch.argmax(logits, dim=1).cpu().numpy()
            labs = labels.cpu().numpy()
            
            all_preds.extend(preds)
            all_labels.extend(labs)
    
    # Calculate metrics
    accuracy = accuracy_score(all_labels, all_preds)
    precision, recall, f1, _ = precision_recall_fscore_support(all_labels, all_preds, average='binary')
    
    return {
        'loss': total_loss / len(dataloader),
        'accuracy': accuracy,
        'f1': f1,
        'precision': precision,
        'recall': recall,
        'predictions': all_preds,
        'labels': all_labels
    }
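
With `average='binary'`, scikit-learn reports precision, recall, and F1 for the positive class only, which here is label 1 (fake). The sketch below recomputes those numbers in plain Python on a tiny made-up label list; `binary_metrics` is a hypothetical helper for illustration, not part of the notebook.

```python
def binary_metrics(labels, preds, positive=1):
    """Accuracy plus precision/recall/F1 for the positive class."""
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == positive and p == positive)
    fp = sum(1 for y, p in pairs if y != positive and p == positive)
    fn = sum(1 for y, p in pairs if y == positive and p != positive)
    correct = sum(1 for y, p in pairs if y == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

# Made-up example: 1 = fake, 0 = real
labels = [1, 1, 1, 0, 0, 0, 1, 0]
preds = [1, 1, 0, 0, 1, 0, 1, 0]
metrics = binary_metrics(labels, preds)
print(metrics)
```

In this example there are 3 true positives, 1 false positive, and 1 false negative, so precision, recall, and F1 all come out to 0.75.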

In [None]:
# Training loop
num_epochs = 3
best_val_f1 = 0
train_losses = []
val_metrics = []

# Skip training if Option A was successful
if 'option_a_success' in locals() and option_a_success:
    print("Skipping manual training loop since Option A (Trainer API) was successful.")
else:
    print("Starting model training with manual training loop...")
    train_start = time.time()
    
    for epoch in range(num_epochs):
        print(f"\nEpoch {epoch+1}/{num_epochs}")
        
        # Train
        train_loss = train_epoch(model, train_dataloader, optimizer, scheduler, device)
        train_losses.append(train_loss)
        print(f"Train loss: {train_loss:.4f}")
        
        # Evaluate
        val_results = evaluate(model, val_dataloader, device)
        val_metrics.append(val_results)
        print(f"Validation loss: {val_results['loss']:.4f}")
        print(f"Validation accuracy: {val_results['accuracy']:.4f}")
        print(f"Validation F1: {val_results['f1']:.4f}")
        
        # Save best model
        if val_results['f1'] > best_val_f1:
            best_val_f1 = val_results['f1']
            MODEL_OUTPUT_DIR = "./models/distilbert_fakenewsnet_manual"
            os.makedirs(MODEL_OUTPUT_DIR, exist_ok=True)
            # save_pretrained writes config.json alongside the weights, which
            # from_pretrained needs when the model is reloaded for inference
            model.save_pretrained(MODEL_OUTPUT_DIR)
            tokenizer.save_pretrained(MODEL_OUTPUT_DIR)
            print(f"Saved new best model with F1: {best_val_f1:.4f}")

    train_end = time.time()
    train_time = train_end - train_start
    print(f"Training completed in {train_time:.2f} seconds ({train_time/60:.2f} minutes)")

    # Load the best checkpoint saved above
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_OUTPUT_DIR)
    model.to(device)
    
    # Evaluate on test set
    print("\nEvaluating model on test set...")
    test_results = evaluate(model, test_dataloader, device)
    print("Test results:")
    for key, value in test_results.items():
        if key not in ['predictions', 'labels']:
            print(f"{key}: {value:.4f}")

Starting model training with manual training loop...

Epoch 1/3


Training:   0%|          | 0/3816 [00:00<?, ?it/s]

## 7. Evaluation Results and Visualization

In [None]:
# Get test results and confusion matrix
# Try to use results from Option A if available, otherwise use Option B results
if 'option_a_success' in locals() and option_a_success and 'results' in locals():
    # Option A results (from Trainer)
    test_pred_output = trainer.predict(test_dataset)
    test_preds = test_pred_output.predictions.argmax(-1)
    test_labels = test_pred_output.label_ids
    test_metrics = {}
    for key, value in results.items():
        if key.startswith('eval_'):
            test_metrics[key.replace('eval_', '')] = value
        else:
            test_metrics[key] = value
else:
    # Option B results (from manual loop)
    test_preds = test_results['predictions']
    test_labels = test_results['labels']
    test_metrics = {k: v for k, v in test_results.items() if k not in ['predictions', 'labels']}

# Create confusion matrix
cm = confusion_matrix(test_labels, test_preds)
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=['Real', 'Fake'], yticklabels=['Real', 'Fake'])
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix')
plt.show()

# Classification report
print("Classification Report:")
print(classification_report(test_labels, test_preds, target_names=['Real', 'Fake']))

## 8. Memory Usage and Performance Analysis

In [None]:
# Helper function to measure memory usage
def get_memory_usage():
    """Get current memory usage in MB"""
    process = psutil.Process(os.getpid())
    mem_info = process.memory_info()
    return mem_info.rss / (1024 * 1024)

# Load model for inference
# Choose the model path based on which training option succeeded
if 'option_a_success' in locals() and option_a_success:
    MODEL_PATH = "./models/distilbert_fakenewsnet"
else:
    MODEL_PATH = "./models/distilbert_fakenewsnet_manual"

# Create pipeline for inference
from transformers import pipeline
model = AutoModelForSequenceClassification.from_pretrained(MODEL_PATH)
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
classifier = pipeline('text-classification', model=model, tokenizer=tokenizer, device=0 if torch.cuda.is_available() else -1)

# Memory before
mem_before = get_memory_usage()

# Measure inference time on sample texts
sample_texts = test_df['processed_text'].head(50).tolist()
start_time = time.time()
for text in sample_texts:
    # Truncate by tokens; text[:512] would cut at 512 characters instead
    _ = classifier(text, truncation=True, max_length=512)
end_time = time.time()

# Memory after
mem_after = get_memory_usage()
mem_used = mem_after - mem_before

avg_inference_time = (end_time - start_time) / len(sample_texts)

print(f"Average inference time: {avg_inference_time:.4f} seconds per sample")
print(f"Memory usage during inference: {mem_used:.2f} MB")

# Add these to a metrics dictionary
model_metrics = {
    "model_name": "DistilBERT",
    "accuracy": test_metrics.get('accuracy'),
    "f1_score": test_metrics.get('f1'),
    "precision": test_metrics.get('precision'),
    "recall": test_metrics.get('recall'),
    "avg_processing_time": avg_inference_time,
    "avg_memory_usage": mem_used,
    "parameter_count": sum(p.numel() for p in model.parameters())
}

# Save metrics
METRICS_OUTPUT_PATH = "./models/distilbert_fakenewsnet_metrics.json"
with open(METRICS_OUTPUT_PATH, 'w') as f:
    json.dump(model_metrics, f, indent=4)
print(f"Model metrics saved to {METRICS_OUTPUT_PATH}")
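
The first few calls to a model are often slower than later ones, so the average above can be skewed by one-time setup cost. A minimal sketch of a timing helper that runs untimed warm-up calls first; `average_latency` and `fake_classifier` are hypothetical names, and the stand-in function only exists so the example runs without a trained model.

```python
import time

def average_latency(fn, inputs, warmup=2):
    """Average wall-clock seconds per call, excluding a few warm-up calls."""
    for item in inputs[:warmup]:
        fn(item)                      # warm-up calls are not timed
    start = time.perf_counter()
    for item in inputs:
        fn(item)
    return (time.perf_counter() - start) / len(inputs)

# Stand-in for `classifier`; any callable that takes one text works
def fake_classifier(text):
    return [{"label": "LABEL_0", "score": 0.5}]

latency = average_latency(fake_classifier, ["a", "b", "c"])
print(f"{latency:.6f} seconds per sample")
```

`time.perf_counter()` is used here because it is a monotonic, high-resolution clock intended for measuring short intervals.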

## 9. Test the Model with Examples

In [None]:
# Test the model with a few examples
examples = [
    "Scientists discover breakthrough treatment for cancer that pharmaceutical companies don't want you to know about.",
    "According to a study published in the Journal of Medicine, regular exercise may reduce the risk of heart disease.",
    "Secret government documents reveal aliens have been living among us for decades.",
    "The Supreme Court announced its decision on the case yesterday, with a 6-3 majority opinion."
]

for example in examples:
    result = classifier(example)
    label = result[0]['label']
    score = result[0]['score']
    
    # Convert label index to text
    label_text = "Fake" if "LABEL_1" in label else "Real"
    
    print(f"Text: {example}")
    print(f"Prediction: {label_text} (confidence: {score:.4f})")
    print("---")

## 10. Code for Django Integration

In [None]:
# Code to integrate with Django (for services.py)
def analyze_with_distilbert(text, model_dir="./models/distilbert_fakenewsnet"):
    """
    Analyze text using the trained DistilBERT model.
    
    Args:
        text: Text to analyze
        model_dir: Path to the saved model directory
        
    Returns:
        dict: Detection results
    """
    # Import the required libraries (inside the function to avoid loading at startup)
    import time
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
    
    # Load tokenizer and model
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    
    # Determine device (CPU or GPU)
    device = 0 if torch.cuda.is_available() else -1
    
    # Create pipeline
    classifier = pipeline('text-classification', model=model, tokenizer=tokenizer, device=device)
    
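    # Measure performance
    start_time = time.time()
    # Truncate to the model's 512-token limit (text[:512] would cut characters, not tokens)
    result = classifier(text, truncation=True, max_length=512)[0]
    processing_time = time.time() - start_time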
    
    # Map the result
    label = result['label']
    score = result['score']
    
    # In this model, LABEL_0 = real, LABEL_1 = fake
    if "LABEL_0" in label:
        credibility_score = score
        category = "credible" if score > 0.7 else "mixed"
    else:
        credibility_score = 1 - score
        category = "fake" if score > 0.7 else "mixed"
    
    return {
        "credibility_score": credibility_score,
        "category": category,
        "confidence": score,
        "model_name": "DistilBERT",
        "processing_time": processing_time
    }
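
The label-to-credibility mapping can be checked in isolation. In this sketch, `map_prediction` is a hypothetical helper that mirrors the branch in `analyze_with_distilbert`; the scores are made-up inputs, not model outputs.

```python
def map_prediction(label, score, threshold=0.7):
    """Return (credibility_score, category) for a pipeline label and confidence."""
    if "LABEL_0" in label:            # LABEL_0 = real
        return score, "credible" if score > threshold else "mixed"
    return 1 - score, "fake" if score > threshold else "mixed"   # LABEL_1 = fake

# Made-up (label, confidence) pairs
for label, score in [("LABEL_0", 0.95), ("LABEL_0", 0.60), ("LABEL_1", 0.95), ("LABEL_1", 0.60)]:
    credibility, category = map_prediction(label, score)
    print(f"{label} @ {score:.2f} -> credibility {credibility:.2f}, category {category}")
```

Any prediction with confidence at or below 0.7 lands in "mixed" regardless of the predicted label, so the threshold controls how often the function commits to "credible" or "fake".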

## 11. Integrating with Django

To integrate this model with your Django application:

1. Copy the trained model files to a directory accessible by your Django app
2. Add the `analyze_with_distilbert` function to your `services.py` file
3. Update your Django view to call this function:

```python
# In views.py
from django.shortcuts import render

from .services import analyze_with_distilbert

def analyze_text_view(request):
    """View for analyzing custom text with the model."""
    results = None

    if request.method == 'POST':
        text = request.POST.get('text', '')

        if text:
            # Path to your model directory
            model_dir = "path/to/your/models/distilbert_fakenewsnet"
            
            # Analyze the text
            results = analyze_with_distilbert(text, model_dir)

    return render(request, 'news/analyze_text.html', {'results': results})
```

4. Update your template to display the results

For production use, consider these optimizations:
- Load the model once at startup rather than for every request
- Use a worker process or Celery for asynchronous processing
- Consider a smaller model for faster inference if needed
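
The first optimization (load once, reuse on every request) can be sketched with `functools.lru_cache`. This is one possible approach under stated assumptions, not the only one: `get_classifier` is a hypothetical name, and the body returns a placeholder callable where a real implementation would load the tokenizer and model and build the `pipeline`.

```python
from functools import lru_cache

load_count = 0  # only here to show how often the loader body runs

@lru_cache(maxsize=None)
def get_classifier(model_dir):
    """Build the classifier for model_dir once; later calls reuse the cached object."""
    global load_count
    load_count += 1
    # Placeholder: a real implementation would load the tokenizer and model from
    # model_dir here and return pipeline('text-classification', ...).
    return lambda text: [{"label": "LABEL_0", "score": 0.5}]

first = get_classifier("./models/distilbert_fakenewsnet")
second = get_classifier("./models/distilbert_fakenewsnet")
print(first is second, load_count)  # True 1
```

With this pattern, `analyze_with_distilbert` would call `get_classifier(model_dir)` instead of loading the model itself, so only the first request in each process pays the loading cost.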