## Chapter 8 – AI at Scale
This notebook walks through the end-to-end workflow for fine-tuning, evaluating, and scaling a T5 model (`t5-small`) on a hybrid LIAR dataset. The dataset combines 2,500 fact-checked political claims from the original LIAR benchmark with 225 additional statements generated from the Open Source AI book. These synthetic entries mimic the tone and compressed style of real claims but focus on AI-related topics such as open-source tooling, model capabilities, and community beliefs. The notebook covers training, baseline evaluation, scaling experiments, and model publishing. Most listings span multiple cells grouped around specific tasks such as logging, benchmarking, or inference testing.

**Note:**
This notebook uses the Hugging Face Hub to download datasets and upload model checkpoints. To access certain datasets (like liar) and to publish your model to the Hub, you'll need to provide a Hugging Face access token. Colab will prompt you to enter your HF_TOKEN the first time it's needed, and securely store it for the session. You can create or manage your token at huggingface.co/settings/tokens.


### Listing 8-1: Fine-Tuning T5 on the Merged LIAR Dataset

This listing walks through the full workflow for preparing, training, and testing a T5 model using a summarization-style format. It begins with a helper cell that loads and merges the datasets, defines utility functions, and runs a quick baseline using the untrained model. From there, it moves into fine-tuning on the combined dataset and finishes with a few sample predictions to check that everything is working as expected.

*Note:* Run the code cells below in order, from top to bottom; later cells depend on variables defined in earlier ones.


In [None]:
%%capture --no-stderr
!pip install -q datasets

#### Helper Functions for Preprocessing and Tokenization

This cell defines the utilities and data used to adapt the LIAR dataset to T5.
Each statement is wrapped in a text-to-text prompt by prepending the `summarize: ` prefix (see `format_liar()`),
`tokenize()` handles batched tokenization of both inputs and target labels,
and `run_sample_prediction()` runs a single prediction for spot checks.
The same cell also loads and merges the LIAR and OSAI datasets used in the main training flow.
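
The text-to-text formatting described here can be sketched as a small standalone function. The helper name `to_text_pair` and the `PROMPT_PREFIX` and `LABELS` constants are hypothetical and only illustrate the idea; the notebook itself builds the same fields inline.

```python
# Illustrative sketch only: to_text_pair is a hypothetical helper,
# not a function defined elsewhere in this notebook.
PROMPT_PREFIX = "summarize: "
LABELS = {"pants-fire", "false", "barely-true", "half-true", "mostly-true", "true"}

def to_text_pair(statement, label):
    """Build the input/target text pair that T5 is trained on for one claim."""
    target = str(label).strip().lower()
    if target not in LABELS:
        raise ValueError(f"Unexpected label: {label!r}")
    return {
        "input_text": f"{PROMPT_PREFIX}{str(statement).strip()}",
        "target_text": target,
    }

pair = to_text_pair("The Eiffel Tower is located in Berlin.", " False ")
print(pair["input_text"])
print(pair["target_text"])
```

Keeping the label as plain text is what lets a sequence-to-sequence model be used as a classifier: the model generates the label string instead of a class index.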


In [None]:
# Utility functions for preparing, tokenizing, and testing with T5 on the LIAR + OSAI dataset

# === Constants ===

BASE_URL = "https://opensourceai-book.github.io/code/datasets/"
INFO_FILE = "open_source_ai-liar.csv"
MAX_TOKENS = 128

# Canonical labels used by the model
FACTUALITY_LABELS = [
    "pants-fire", "false", "barely-true",
    "half-true", "mostly-true", "true"
]

# For legacy use (e.g., LIAR numeric labels)
LABEL_MAP_NUMERIC = {str(i): label for i, label in enumerate(FACTUALITY_LABELS)}
LABEL_TO_INDEX = {label: i for i, label in enumerate(FACTUALITY_LABELS)}
VALID_LABELS = set(FACTUALITY_LABELS)

# Tokenize input and target fields for use with T5
def tokenize(batch, tokenizer):
    input_texts = batch["input_text"]
    target_texts = batch["target_text"]

    if len(input_texts) != len(target_texts):
        raise ValueError("Mismatched input and target sizes.")

    model_inputs = tokenizer(
        input_texts,
        padding="max_length",
        truncation=True,
        max_length=MAX_TOKENS
    )
    labels = tokenizer(
        text_target=target_texts,
        padding="max_length",
        truncation=True,
        max_length=16
    )
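    # Replace padding ids in the labels with -100 so the loss ignores them
    model_inputs["labels"] = [
        [(tok if tok != tokenizer.pad_token_id else -100) for tok in seq]
        for seq in labels["input_ids"]
    ]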
    return model_inputs

# Run a single prediction on a freeform statement.
# Optionally compare against a known true label if available.
def run_sample_prediction(model, tokenizer, statement, true_label=None):
    input_text = f"summarize: {statement}"

    inputs = tokenizer(
        input_text,
        return_tensors="pt",
        padding="max_length",
        truncation=True,
        max_length=MAX_TOKENS
    )
    inputs = {k: v.to(model.device) for k, v in inputs.items()}
    output = model.generate(**inputs, max_new_tokens=8)

    prediction = tokenizer.decode(output[0], skip_special_tokens=True).strip().lower()
    mapped_label = prediction if prediction in VALID_LABELS else f"(unknown: {prediction})"

    print("\n=== Sample Prediction ===")
    print("Statement:", statement)
    print("Prediction:", mapped_label)
    if true_label:
        print("Expected:  ", true_label)

    return {
        "statement": statement,
        "prediction": mapped_label,
        "true_label": true_label
    }

# Load datasets and dataframes
from datasets import load_dataset, Dataset, DatasetDict, concatenate_datasets
import pandas as pd
from collections import Counter

# === Load and initialize LIAR + OSAI datasets ===

osai_df = pd.read_csv(BASE_URL + INFO_FILE)
osai_df["target_text"] = osai_df["label"].astype(str).str.strip().str.lower()
osai_df = osai_df[osai_df["target_text"].isin(VALID_LABELS)]
osai_df["input_text"] = "summarize: " + osai_df["statement"].astype(str)
osai_dataset = Dataset.from_pandas(osai_df[["input_text", "target_text"]])

# Load and map LIAR dataset
liar_raw = load_dataset("liar", trust_remote_code=True)

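# Use the dataset's own ClassLabel names instead of assuming that the
# integer ids follow the order of FACTUALITY_LABELS
liar_label_feature = liar_raw["train"].features["label"]

def format_liar(example):
    label_name = liar_label_feature.int2str(example["label"])
    return {
        "input_text": f"summarize: {example['statement']}",
        "target_text": label_name if label_name in VALID_LABELS else "unknown"
    }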

liar_formatted = liar_raw.map(format_liar, remove_columns=liar_raw["train"].column_names)
liar_train_subset = liar_formatted["train"].select(range(2500))
liar_test_set = liar_formatted["test"].filter(lambda x: x["input_text"] and x["target_text"])

# Convert LIAR to DataFrame
liar_df = pd.DataFrame(liar_train_subset)

# Merge Hugging Face LIAR and local OSAI datasets
merged_train = concatenate_datasets([liar_train_subset, osai_dataset])
merged_train = merged_train.filter(lambda x: x["input_text"] and x["target_text"])

# Store merged datasets
liar_merged_dataset = DatasetDict({
    "train": merged_train,
    "test": liar_test_set
})

# === Dataset summary and preview ===

print("\n=== LIAR + OSAI Merged Dataset Summary ===")
print(f"Train set size: {len(liar_merged_dataset['train'])}")
print(f"Test set size:  {len(liar_merged_dataset['test'])}")

# Label distribution from LIAR + OSAI merged data
label_counts = Counter(liar_df["target_text"].tolist() + osai_df["target_text"].tolist())
print("\nLabel distribution across merged data:")
for label, count in sorted(label_counts.items()):
    print(f"{label:<15} {count}")

# Sample entries from LIAR
print("\nSample LIAR entries (from Hugging Face):")
for i in range(min(3, len(liar_df))):
    print(f"\nLIAR Example {i+1}")
    print("Input: ", liar_df.iloc[i]["input_text"])
    print("Target:", liar_df.iloc[i]["target_text"])

# Sample entries from OSAI
print("\nSample OSAI entries (from local CSV):")
for i in range(min(3, len(osai_df))):
    print(f"\nOSAI Example {i+1}")
    print("Input: ", osai_df.iloc[i]["input_text"])
    print("Target:", osai_df.iloc[i]["target_text"])

#### Establishing a Baseline Before Fine-Tuning

Before training T5 on the merged LIAR dataset, we’ll run a baseline inference using the pretrained `t5-small` model before any fine-tuning. This gives us a reference point for judging how much the model improves afterward. We take the first three examples from the LIAR training subset and the first three from the OSAI data and observe how the base model performs out of the box.

In [None]:
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load base (pretrained, not yet fine-tuned) model and tokenizer
model_name = "t5-small"
bl_tokenizer = T5Tokenizer.from_pretrained(model_name)
bl_model = T5ForConditionalGeneration.from_pretrained(model_name)

print("=== Baseline Inference Using the Base (Not Yet Fine-Tuned) T5 ===")

# 3 samples from LIAR (via liar_df)
print("\n--- LIAR Examples (from Hugging Face) ---")
for i in range(min(3, len(liar_df))):
    row = liar_df.iloc[i]
    run_sample_prediction(bl_model, bl_tokenizer, row["input_text"].replace("summarize: ", ""), true_label=row["target_text"])

# 3 samples from OSAI (via osai_df)
print("\n--- OSAI Examples (from local CSV) ---")
for i in range(min(3, len(osai_df))):
    row = osai_df.iloc[i]
    run_sample_prediction(bl_model, bl_tokenizer, row["input_text"].replace("summarize: ", ""), true_label=row["target_text"])


#### Fine-Tuning T5 on the Merged LIAR Dataset

With the baseline results in hand, we’re ready to fine-tune T5 on a combined dataset. This version merges 2,500 samples from the original LIAR benchmark with all entries from our custom CSV of AI-generated statements. The goal is to help the model learn to generate truthfulness labels using T5’s text-to-text format.

We tested different training durations and found that running for 3 to 5 epochs offers a good balance—enough to improve accuracy without overfitting.
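
To get a feel for what a given epoch count costs, the number of optimizer steps can be worked out ahead of time. The sketch below uses the values configured in the training cell (batch size 4, 5 epochs, `save_steps=500`, `logging_steps=250`) and assumes a single device, no gradient accumulation, and that all 225 OSAI rows pass the label filter. The function name `training_schedule` is hypothetical.

```python
import math

def training_schedule(num_samples, batch_size, epochs, save_steps, logging_steps):
    """Estimate step counts for a single-device run without gradient accumulation."""
    # The last partial batch still counts as a step, hence the ceiling
    steps_per_epoch = math.ceil(num_samples / batch_size)
    total_steps = steps_per_epoch * epochs
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "periodic_checkpoints": total_steps // save_steps,
        "periodic_log_entries": total_steps // logging_steps,
    }

# 2,500 LIAR samples plus 225 OSAI samples (assumed to all survive filtering)
schedule = training_schedule(
    2500 + 225, batch_size=4, epochs=5, save_steps=500, logging_steps=250
)
for name, value in schedule.items():
    print(f"{name}: {value}")
```

Under these assumptions the run works out to 682 steps per epoch and 3,410 steps in total, so dropping from 5 to 3 epochs would remove 1,364 steps.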


In [None]:
# Fine-tune T5 on the merged LIAR dataset (HF + OSAI)

from transformers import (
    T5Tokenizer,
    T5ForConditionalGeneration,
    Trainer,
    TrainingArguments,
    set_seed,
    EvalPrediction
)
import torch

# Set reproducibility
set_seed(42)

# Load tokenizer and model
model_name = "t5-small"
tokenizer = T5Tokenizer.from_pretrained(model_name, legacy=True)
model = T5ForConditionalGeneration.from_pretrained(model_name)

# Tokenize using updated utility function (now requires tokenizer explicitly)
tokenized = liar_merged_dataset.map(lambda batch: tokenize(batch, tokenizer), batched=True)

# Drop the raw text columns, which are not needed for training
tokenized = tokenized.remove_columns(["input_text", "target_text"])

# Prepare training set
train_data = tokenized["train"]

# Define training arguments
args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=4,
    num_train_epochs=5,
    remove_unused_columns=False,
    logging_dir="./logs",
    logging_steps=250,
    save_steps=500,
    report_to="none"
)

# Initialize trainer
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_data
)

# Train the model
trainer.train()

# Print final loss from the trainer (if stored)
if trainer.state.log_history:
    final_logs = [log for log in trainer.state.log_history if "loss" in log]
    if final_logs:
        print(f"Final training loss: {final_logs[-1]['loss']:.4f}")

# Print model size (parameter count)
param_count = sum(p.numel() for p in model.parameters())
print(f"Model parameters: {param_count:,}")

# Confirm training
print("\nTraining complete.")
print(f"Trained on {len(train_data)} samples.")
print(f"Model checkpoint saved to: {args.output_dir}")


#### Evaluating the Fine-Tuned T5 Model

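After training, we run the fine-tuned T5 model on the same examples used for the baseline:
the first three entries from the LIAR training subset and the first three from the OSAI data.
Because these examples were seen during training, treat the output as a sanity check against
the earlier baseline rather than a held-out evaluation; the LIAR test split stored in
`liar_merged_dataset["test"]` is available for that.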
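
Spot checks are easier to compare when they are reduced to a single number. The following is a minimal sketch of exact-match accuracy over dictionaries shaped like the ones `run_sample_prediction()` returns; the function name `label_accuracy` is hypothetical and the sample data is made up for illustration.

```python
def label_accuracy(results):
    """Fraction of results whose prediction exactly matches the true label."""
    # Only score entries that actually carry a reference label
    scored = [r for r in results if r.get("true_label")]
    if not scored:
        return 0.0
    correct = sum(1 for r in scored if r["prediction"] == r["true_label"])
    return correct / len(scored)

# Made-up results in the same shape as run_sample_prediction() output
sample_results = [
    {"statement": "A", "prediction": "false", "true_label": "false"},
    {"statement": "B", "prediction": "(unknown: a wall)", "true_label": "true"},
    {"statement": "C", "prediction": "half-true", "true_label": "half-true"},
    {"statement": "D", "prediction": "true", "true_label": None},
]
print(f"Exact-match accuracy: {label_accuracy(sample_results):.2f}")
```

Predictions that fall outside the six labels are wrapped as `(unknown: ...)`, so they count as misses without any special handling.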


In [None]:
print("===  Inference Using the FINETUNED version of T5 ===")

# 3 samples from LIAR (via liar_df)
print("\n--- LIAR Examples (from Hugging Face) ---")
for i in range(min(3, len(liar_df))):
    row = liar_df.iloc[i]
    run_sample_prediction(model, tokenizer, row["input_text"].replace("summarize: ", ""), true_label=row["target_text"])

# 3 samples from OSAI (via osai_df)
print("\n--- OSAI Examples (from local CSV) ---")
for i in range(min(3, len(osai_df))):
    row = osai_df.iloc[i]
    run_sample_prediction(model, tokenizer, row["input_text"].replace("summarize: ", ""), true_label=row["target_text"])


### Listing 8‑2: Measuring Inference Time Across Input Lengths

This listing benchmarks how long it takes the fine-tuned T5 model to generate outputs across a range of input lengths. The first cell defines helper functions to create synthetic prompts, run timed inferences on both GPU and CPU, and plot the results. The second cell runs the benchmark using those tools, measuring average inference time across token-length bins. Together, they give us a clear picture of how input size and hardware impact latency—an important factor when thinking about scaling beyond the notebook.
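
The timing pattern behind this benchmark (a few discarded warm-up runs followed by measured runs) can be sketched independently of the model. The helper name `time_call` is hypothetical, and the workload is a stand-in; when timing GPU work, the call would also need `torch.cuda.synchronize()` before reading the clock, as the benchmark function does.

```python
import time

def time_call(fn, repeats=5, warmup=2):
    """Call fn several times and return the durations of the measured runs."""
    # Warm-up runs absorb one-time costs such as caching and lazy initialization
    for _ in range(warmup):
        fn()
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        durations.append(time.perf_counter() - start)
    return durations

# Stand-in workload; a real benchmark would wrap a model.generate call
durations = time_call(lambda: sum(range(100_000)))
average = sum(durations) / len(durations)
print(f"mean: {average:.6f} sec over {len(durations)} runs")
```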


In [None]:
import torch
import time

# Benchmark T5 inference time across increasing input sizes.
# Returns a dictionary of average latency by token length bin.
def benchmark_inference_time(
    model,
    tokenizer,
    bins=None,
    samples_per_bin=5,
    device="cuda" if torch.cuda.is_available() else "cpu"
):
    model.eval()
    model.to(device)

    # Optional warm-up to stabilize performance
    _ = model.generate(
        **tokenizer("warm up", return_tensors="pt").to(device),
        max_new_tokens=16
    )
    if device == "cuda":
        torch.cuda.synchronize()

    # Default token length bins: 50–1049 in steps of 50
    if bins is None:
        bins = list(range(50, 1050, 50))

    timing = {}

    for b in bins:
        label = f"{b}-{b+49}"
        timing[label] = []

        # Dynamically extend max_length to avoid truncating longer bins
        max_length = b + 16

        for i in range(samples_per_bin + 2):  # +2 to absorb caching
            repeated = "The sky is blue. " * (b // 5)
            prompt = f"summarize: {repeated}"

            inputs = tokenizer(
                prompt,
                return_tensors="pt",
                padding="max_length",
                truncation=True,  # Safe truncation if prompt slightly exceeds max_length
                max_length=max_length
            )
            inputs = {k: v.to(device) for k, v in inputs.items()}

            if device == "cuda":
                torch.cuda.synchronize()

            start = time.perf_counter()  # monotonic, high-resolution timer
            _ = model.generate(**inputs, max_new_tokens=16)

            if device == "cuda":
                torch.cuda.synchronize()

            elapsed = time.perf_counter() - start

            # Skip warm-up samples
            if i >= 2:
                timing[label].append(elapsed)

    return timing

# Plot average inference time per input length bin.
import matplotlib.pyplot as plt

# Accepts multiple timing dictionaries for comparison.
def plot_inference_times(timing_dicts, labels, title):
    plt.figure(figsize=(8, 5))

    for timing, label in zip(timing_dicts, labels):
        avg_times = [sum(timing[k]) / len(timing[k]) for k in timing]
        keys = list(timing.keys())

        # Print results to console
        print(f"\n{label} Inference Times:")
        for bin_label, time_val in zip(keys, avg_times):
            print(f"  {bin_label}: {time_val:.4f} sec")

        # Plot results
        plt.plot(keys, avg_times, marker="o", label=label)

    plt.title(title)
    plt.xlabel("Token Length (bins)")
    plt.ylabel("Average Inference Time (sec)")
    plt.xticks(rotation=45)
    plt.grid(True)
    plt.legend()
    plt.tight_layout()
    plt.show()

#### Measuring T5 Inference Time by Input Length
This cell runs the benchmark on both GPU and CPU using the same model and tokenizer.
It measures average inference time across a range of input lengths and
plots the results to compare performance between the two devices.
The GPU run passes `device="cuda"` explicitly, so it needs a CUDA-capable runtime.
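
Beyond the plot, the two timing dictionaries can be reduced to a per-bin speedup ratio. This is a minimal sketch assuming both dictionaries share the bin labels produced by `benchmark_inference_time()`; the function name `speedup_by_bin` is hypothetical and the timings are invented for illustration.

```python
from statistics import mean

def speedup_by_bin(baseline_timing, faster_timing):
    """Return baseline mean time divided by faster mean time for each shared bin."""
    speedups = {}
    for label, baseline_samples in baseline_timing.items():
        faster_samples = faster_timing.get(label)
        # Skip bins that are missing or empty on either side
        if not baseline_samples or not faster_samples:
            continue
        speedups[label] = mean(baseline_samples) / mean(faster_samples)
    return speedups

# Invented timings in seconds, shaped like the benchmark output
cpu_example = {"50-99": [0.20, 0.22], "100-149": [0.30, 0.34]}
gpu_example = {"50-99": [0.05, 0.05], "100-149": [0.08, 0.08]}

for label, ratio in speedup_by_bin(cpu_example, gpu_example).items():
    print(f"{label}: {ratio:.1f}x")
```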

In [None]:
# Benchmark on GPU
gpu_timing = benchmark_inference_time(
    model,
    tokenizer,
    device="cuda"
)

# Benchmark on CPU
cpu_timing = benchmark_inference_time(
    model,
    tokenizer,
    device="cpu"
)

# Plot the results
plot_inference_times(
    [gpu_timing, cpu_timing],
    ["T5-small (GPU)", "T5-small (CPU)"],
    "T5 Inference Time vs Input Length (GPU vs CPU)"
)

### Listing 8‑3: Measuring the Benefit of Batching in T5 Inference

This experiment benchmarks how batching affects inference performance in our fine-tuned T5 model. The listing spans two cells: the first defines helper functions to generate synthetic inputs, time model responses at a given batch size, and plot the results, while the second runs the benchmark across batch sizes and prints a summary table. For each batch size, the code measures average latency per sample and throughput in samples per second; the speedup relative to batch size 1 can be derived from the throughput values.
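
Speedup relative to batch size 1 follows directly from the throughput figures. The sketch below assumes result dictionaries with the `batch_size` and `throughput` keys returned by `benchmark_inference()`; the function name `add_speedup` is hypothetical and the numbers are invented.

```python
def add_speedup(results):
    """Return copies of the results with a speedup relative to batch size 1."""
    baseline = next(r for r in results if r["batch_size"] == 1)
    return [
        {**r, "speedup": r["throughput"] / baseline["throughput"]}
        for r in results
    ]

# Invented throughput values in samples per second
example_results = [
    {"batch_size": 1, "throughput": 10.0},
    {"batch_size": 4, "throughput": 30.0},
    {"batch_size": 16, "throughput": 45.0},
]

for r in add_speedup(example_results):
    print(f"batch {r['batch_size']:<4} speedup {r['speedup']:.1f}x")
```

A speedup that grows more slowly than the batch size, as in the invented numbers here, is the usual sign that the hardware is approaching saturation.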

In [None]:
import torch
import time
import random
import matplotlib.pyplot as plt

# Benchmark T5 inference with batching across batch sizes
def benchmark_inference(
    model,
    tokenizer,
    batch_size,
    token_len=512,
    max_tokens=512,
    padding="max_length",
    repeat=5,
    device="cuda" if torch.cuda.is_available() else "cpu"
):
    """Benchmark T5 inference using random prompts at fixed length and batch size."""
    model.eval()
    model.to(device)
    random.seed(42)

    # Generate synthetic inputs of roughly token_len size
    phrases = [
        "The sky is blue", "Water is wet", "Cats chase mice",
        "Birds fly south", "Ice is cold", "Fire is hot",
        "Rain falls down", "Fish swim fast", "Clouds block sun"
    ]
    inputs = []
    for _ in range(batch_size):
        sentence = ". ".join(random.choices(phrases, k=token_len // 10))
        inputs.append(f"summarize: {sentence}")

    # Adjust tokenizer max length to avoid truncation
    max_length = max(token_len + 16, max_tokens)

    # Warm-up run to stabilize GPU/CPU
    enc = tokenizer(
        inputs[:1],
        return_tensors="pt",
        padding=padding,
        truncation=True,
        max_length=max_length
    )
    # Use the same generation length as the timed runs
    _ = model.generate(**{k: v.to(device) for k, v in enc.items()}, max_new_tokens=16)

    # Measure repeated inference times
    elapsed_times = []
    for _ in range(repeat):
        enc = tokenizer(
            inputs,
            return_tensors="pt",
            padding=padding,
            truncation=True,
            max_length=max_length
        )
        enc = {k: v.to(device) for k, v in enc.items()}

        if torch.cuda.is_available():
            torch.cuda.synchronize()
        start = time.time()
        _ = model.generate(**enc, max_new_tokens=16)
        if torch.cuda.is_available():
            torch.cuda.synchronize()

        elapsed_times.append(time.time() - start)

    avg_batch_time = sum(elapsed_times) / repeat
    avg_time_per_sample = avg_batch_time / batch_size
    throughput = batch_size / avg_batch_time

    return {
        "batch_size": batch_size,
        "token_len": token_len,
        "time_per_sample": avg_time_per_sample,
        "batch_time": avg_batch_time,
        "throughput": throughput
    }

# Plot and print results for batching benchmarks
def plot_and_print_batch_results(results, title="Batching Impact: Latency vs Throughput"):
    batch_labels = [str(r["batch_size"]) for r in results]
    latencies = [r["time_per_sample"] for r in results]
    throughputs = [r["throughput"] for r in results]

    bar_color = "#0074D9"
    line_color = "#FF4136"
    grid_color = "#AAAAAA"

    fig, ax1 = plt.subplots(figsize=(9, 5))

    ax1.bar(batch_labels, throughputs, color=bar_color, edgecolor="black", label="Throughput")
    ax1.set_xlabel("Batch Size")
    ax1.set_ylabel("Throughput (samples/sec)", color=bar_color)
    ax1.tick_params(axis="y", labelcolor=bar_color)
    ax1.set_ylim(0, max(throughputs) * 1.2)

    ax2 = ax1.twinx()
    ax2.plot(batch_labels, latencies, color=line_color, marker="o", linewidth=2, label="Latency")
    ax2.set_ylabel("Avg Inference Time per Sample (sec)", color=line_color)
    ax2.tick_params(axis="y", labelcolor=line_color)
    ax2.set_ylim(0, max(latencies) * 1.2)

    ax1.grid(True, axis="y", linestyle="--", color=grid_color)
    plt.title(title)
    fig.tight_layout()
    plt.show()

    print(f"{'Batch':<8}{'Latency (s)':<15}{'Throughput (samples/s)':<25}")
    for r in results:
        print(f"{r['batch_size']:<8}{r['time_per_sample']:<15.4f}{r['throughput']:<25.2f}")


#### Run Benchmark and Plot the Results

In [None]:
batch_sizes = [1, 2, 4, 8, 16, 32, 64, 128, 256]
results = [
    benchmark_inference(model, tokenizer, bs)
    for bs in batch_sizes
]
plot_and_print_batch_results(results)


### Listing 8‑4: Saving and Logging a Versioned Model Run

This example saves a trained T5 model and tokenizer to a versioned checkpoint
directory, then logs a structured record of an inference run to a local file.
The log includes metadata like timestamp, input length, labels, and inference time.
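
Because the log is written as JSON Lines (one JSON object per line), it can be read back with a few lines of standard-library code. This is a minimal sketch; the function name `read_log_entries` is hypothetical, and the demo writes a throwaway file rather than touching the real `model_log.jsonl`.

```python
import json
import os
import tempfile

def read_log_entries(log_path):
    """Load every non-empty line of a JSONL file as a dictionary."""
    entries = []
    with open(log_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                entries.append(json.loads(line))
    return entries

# Demo with a temporary file and made-up entries
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "model_log.jsonl")
    with open(demo_path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"model_instance": "demo", "inference_time_sec": 0.123}) + "\n")
        f.write(json.dumps({"model_instance": "demo", "inference_time_sec": 0.456}) + "\n")
    demo_entries = read_log_entries(demo_path)

print(f"Loaded {len(demo_entries)} entries")
print("Latest inference time:", demo_entries[-1]["inference_time_sec"])
```

Appending one line per run keeps earlier records intact, so the same file can accumulate a history across checkpoints.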


In [None]:
import os
import json
from datetime import datetime
import random

def save_model_and_tokenizer(model, tokenizer, checkpoint_dir):
    """Save model and tokenizer with optional tokenizer patch."""
    os.makedirs(checkpoint_dir, exist_ok=True)
    model.save_pretrained(checkpoint_dir, safe_serialization=True)
    tokenizer.save_pretrained(checkpoint_dir)

    # Patch tokenizer_config with model type if needed
    config_path = f"{checkpoint_dir}/tokenizer_config.json"
    if os.path.exists(config_path):
        with open(config_path, "r+") as f:
            config = json.load(f)
            config["model_type"] = "t5"
            f.seek(0)
            json.dump(config, f, indent=2)
            f.truncate()

def summarize_batch_metrics(results):
    """Extract key performance scaling info from batch test results."""
    best = max(results, key=lambda r: r["throughput"])
    return {
        "tested_batch_sizes": [r["batch_size"] for r in results],
        "throughput_per_batch": [round(r["throughput"], 2) for r in results],
        "latency_per_batch": [
            round(r["time_per_sample"], 4) for r in results
        ],
        "sweet_spot_batch_size": best["batch_size"],
        "sweet_spot_throughput": round(best["throughput"], 2),
        "sweet_spot_latency": round(best["time_per_sample"], 4)
    }

# Utility to run a prediction and time it
def time_prediction(model, tokenizer, text, max_input_len=256):
    import time
    inputs = tokenizer(
        text,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=max_input_len
    )
    inputs = {k: v.to(model.device) for k, v in inputs.items()}

    start = time.time()
    output = model.generate(**inputs, max_new_tokens=16)
    duration = time.time() - start

    prediction = tokenizer.decode(output[0], skip_special_tokens=True).strip().lower()
    return prediction, duration

def write_log_entry(log_entry, log_path):
    """Append structured JSONL entry to the given log file."""
    log_dir = os.path.dirname(log_path)
    if log_dir:  # dirname is empty when log_path has no directory part
        os.makedirs(log_dir, exist_ok=True)
    with open(log_path, "a") as f:
        f.write(json.dumps(log_entry) + "\n")


In [None]:
from datetime import datetime
import random

# Define model name and checkpoint location
model_name = "open-source-ai-t5-liar-lens"
checkpoint_dir = f"./models/{model_name}"
log_path = f"{checkpoint_dir}/model_log.jsonl"

# Save model and tokenizer to local checkpoint folder
save_model_and_tokenizer(model, tokenizer, checkpoint_dir)

# Generate a synthetic prompt (used for inference benchmarking)
phrases = [
    "The sky is blue", "Water is wet", "Cats chase mice",
    "Birds fly south", "Ice is cold", "Fire is hot",
    "Rain falls down", "Fish swim fast", "Clouds block sun"
]
raw_statement = ". ".join(random.choices(phrases, k=512 // 10))
sample_text = raw_statement
true_label = "unknown (benchmark prompt)"

# Run timed prediction using prompt format used in training
prediction, elapsed_time = time_prediction(
    model, tokenizer, f"summarize: {sample_text}"
)

# Build structured log entry for reproducibility
log_entry = {
    "model_instance": model_name,
    "base_model": "t5-small",
    "dataset": "LIAR",
    "checkpoint": checkpoint_dir,
    "batch_size": 4,
    "epochs": 5,
    "version_datetime_stamp": datetime.utcnow().isoformat() + "Z",
    "inference_sample_index": "synthetic",
    "inference_input_length": len(
        tokenizer.tokenize(f"summarize: {sample_text}")
    ),
    "predicted_label": prediction,
    "true_label": true_label,
    "inference_time_sec": round(elapsed_time, 3),
    "notes": (
        "Benchmark run for summarization-style classification using "
        "fine-tuned T5. Prompt synthesized from randomized short "
        "factual phrases."
    ),
    "batching_scaling_summary": summarize_batch_metrics(results)
}

# Save log to JSONL for later analysis or publishing
write_log_entry(log_entry, log_path)

print(f"Model saved to: {checkpoint_dir}")
print(f"Log saved to:   {log_path}")

### Listing 8-5: Uploading Model and Metadata to Hugging Face
This cell automates the process of publishing a trained model to the Hugging Face Hub. It uses the huggingface_hub API to create the repository (if needed) and upload all model artifacts stored in the checkpoint directory, including weights, tokenizer files, and metadata. Once uploaded, the model is publicly available for others to download, test, or fine-tune.

To run this, make sure you’ve:

- Run the `pip install` cell below

- Replaced `your_huggingface_repo_name_here` with your actual Hugging Face username

- Saved the model and tokenizer to the checkpoint folder earlier (Listing 8‑4 calls `model.save_pretrained()` and `tokenizer.save_pretrained()`)

This step ensures your work isn’t locked in a local runtime: it’s published, versioned, and ready for reuse.
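
Before uploading, it can help to confirm that the checkpoint folder actually contains the files you expect. The sketch below is one way to do that with the standard library. The function name `missing_artifacts` is hypothetical, and the file names in `EXPECTED_FILES` are an assumption about what `save_pretrained()` typically writes; adjust the list to match your own folder.

```python
import os

# Assumed file names; verify against the contents of your checkpoint folder
EXPECTED_FILES = ["config.json", "model.safetensors", "tokenizer_config.json"]

def missing_artifacts(checkpoint_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in checkpoint_dir."""
    return [
        name for name in expected
        if not os.path.isfile(os.path.join(checkpoint_dir, name))
    ]

missing = missing_artifacts("./models/open-source-ai-t5-liar-lens")
if missing:
    print("Missing before upload:", ", ".join(missing))
else:
    print("Checkpoint folder looks complete.")
```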

In [None]:
!pip install -q huggingface_hub

In [None]:
from huggingface_hub import HfApi, upload_folder

# Define repo info
repo_name = "open-source-ai-t5-liar-lens"
checkpoint_dir = "./models/open-source-ai-t5-liar-lens"
user = "your_huggingface_repo_name_here"  # Your Hugging Face username
repo_id = f"{user}/{repo_name}"

# Create the repo if it doesn't already exist
api = HfApi()
api.create_repo(repo_id=repo_id, exist_ok=True)

# Upload the checkpoint folder with a descriptive commit message
upload_folder(
    repo_id=repo_id,
    folder_path=checkpoint_dir,
    path_in_repo=".",
    commit_message=(
        "Fine-tuned T5-small model on hybrid LIAR dataset including 225 "
        "AI-generated quotes from the Open Source AI book. Includes benchmark "
        "log showing latency and throughput scaling across batch sizes. "
        "Saved in safetensors format."
    )
)

### Model Inference

In [None]:
from transformers import T5ForConditionalGeneration, T5Tokenizer
import torch

# Load the fine-tuned model and tokenizer from Hugging Face Hub
model_name = "gcuomo/open-source-ai-t5-liar-lens"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

# Consistent with training
MAX_TOKENS = 128
FACTUALITY_LABELS = {
    "pants-fire", "false", "barely-true",
    "half-true", "mostly-true", "true"
}

# Run a prediction on a statement using the fine-tuned T5 model
def run_prediction(statement):
    """Run a model prediction on a statement and print the result."""
    prompt = f"summarize: {statement}"

    inputs = tokenizer(
        prompt,
        return_tensors="pt",
        padding="max_length",
        truncation=True,
        max_length=MAX_TOKENS
    )
    inputs = {k: v.to(model.device) for k, v in inputs.items()}

    output = model.generate(**inputs, max_new_tokens=8)
    raw_pred = tokenizer.decode(output[0], skip_special_tokens=True).strip().lower()

    label = raw_pred if raw_pred in FACTUALITY_LABELS else f"(unknown: {raw_pred})"

    print("Statement:", statement)
    print("Prompt:   ", prompt)
    print("Output:   ", raw_pred)
    print("Label:    ", label)
    print()

# --- Run example predictions ---
print("\n--- Predictions on LIAR-Style Claims ---")

run_prediction("Building a wall on the U.S.-Mexico border will take literally years.")
run_prediction("The U.S. Postal Service delivers mail only one day a week.")
run_prediction("The book 'Open Source AI' explores Hugging Face and T5 models.")
run_prediction("The unemployment rate fell below 4% last quarter.")
run_prediction("Open-source tools can outperform commercial alternatives in some cases.")
run_prediction("The Eiffel Tower is located in Berlin.")
run_prediction("Sniffer believes ONNX simplifies model deployment.")
run_prediction("Python is the fastest programming language available.")


#### Code Snippet from README.md

In [None]:
### Example Usage

from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load the fine-tuned model and tokenizer
model = T5ForConditionalGeneration.from_pretrained(
    "gcuomo/open-source-ai-t5-liar-lens"
)
tokenizer = T5Tokenizer.from_pretrained(
    "gcuomo/open-source-ai-t5-liar-lens"
)

# Prepare input
# statement = "Blockchain guarantees ethical outcomes in all AI systems."
statement = "Python is the fastest programming language available."
prompt = f"summarize: {statement}"
inputs = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True, max_length=128)

# Generate prediction
output = model.generate(**inputs, max_new_tokens=8)
prediction = tokenizer.decode(output[0], skip_special_tokens=True).strip().lower()

# Print result
print("Predicted label:", prediction)
