# Fine-Tuning with Canary Injection

This notebook fine-tunes an **LLM on a canary-injected dataset** and then checks whether the model reproduces any of the injected phrases, which would indicate a privacy risk.

Key Steps:  
- Fine-tune the model on the **canary-injected dataset**.
- Optionally apply **Differential Privacy (DP-SGD)** to reduce leakage (the privacy engine is commented out in this run).
- Run **inference on test data** and check whether canary sequences appear.
- Compare outputs to assess how well the model **preserves privacy**.

This evaluation shows the **privacy risks associated with synthetic data generation**.
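
The canary check in the steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual evaluation code: the helper names `normalize` and `find_canary_leaks`, the canary phrase, and the sample generations are all hypothetical, and matching is a simple normalized substring test.

```python
import re

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences do not hide a match."""
    return re.sub(r"\s+", " ", text).strip().lower()

def find_canary_leaks(generations, canaries):
    """Return (generation_index, canary) pairs where a canary appears verbatim in a generation."""
    normalized_canaries = [(c, normalize(c)) for c in canaries]
    leaks = []
    for idx, text in enumerate(generations):
        haystack = normalize(text)
        for original, needle in normalized_canaries:
            if needle and needle in haystack:
                leaks.append((idx, original))
    return leaks

# Hypothetical canary phrase and model outputs, for illustration only.
canaries = ["my secret code is 4815162342"]
generations = [
    "Great phone case, fits well.",
    "Works fine. My secret   code is 4815162342 and it arrived quickly.",
]

leaks = find_canary_leaks(generations, canaries)
# Fraction of generations that contain at least one canary.
leak_rate = len({idx for idx, _ in leaks}) / len(generations)
print(leaks, leak_rate)
```

Exact substring matching only catches verbatim reproduction; partial or paraphrased leakage would need a fuzzier comparison.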


In [1]:
# RUN_MODE = "test"
RUN_MODE = "main"

In [2]:
from pathlib import Path

TRAIN_JSONL = "data/amazon_train_canary_1.jsonl"
# TEST_JSONL = "data/test.jsonl"
file_paths = [TRAIN_JSONL,
              # TRAIN_JSONL
              ]

for path in file_paths:
    if not Path(path).exists():
        raise FileNotFoundError(f"Error: {path} does not exist.")


In [None]:
# import shutil

# folder_path = "wandb"

# # Delete the folder and all its contents
# shutil.rmtree(folder_path)

# print(f"Deleted folder: {folder_path}")

## Requirements and dependencies


In [3]:
%%capture
!pip install opacus
# !pip install -U bitsandbytes transformers accelerate
!pip install peft

In [4]:
!pip install pynvml

Collecting pynvml
  Downloading pynvml-12.0.0-py3-none-any.whl.metadata (5.4 kB)
Collecting nvidia-ml-py<13.0.0a0,>=12.0.0 (from pynvml)
  Downloading nvidia_ml_py-12.570.86-py3-none-any.whl.metadata (8.7 kB)
Downloading pynvml-12.0.0-py3-none-any.whl (26 kB)
Downloading nvidia_ml_py-12.570.86-py3-none-any.whl (44 kB)
Installing collected packages: nvidia-ml-py, pynvml
Successfully installed nvidia-ml-py-12.570.86 pynvml-12.0.0


In [5]:
from random import sample
import numpy as np
print("NumPy version:", np.__version__)  # Logged for reproducibility


import torch
from torch.amp import autocast, GradScaler  # Import automatic mixed precision tools
from transformers import AutoTokenizer, AutoModelForCausalLM
from opacus import PrivacyEngine
from torch.utils.data import TensorDataset, DataLoader
from torch.optim import AdamW
from peft import get_peft_model, LoraConfig, TaskType

# Set up device - prioritize GPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Print GPU info if available
if torch.cuda.is_available():
    print(f"GPU Device: {torch.cuda.get_device_name(0)}")
    print(f"Available GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.2f} GB")

NumPy version: 1.26.4
Using device: cuda
GPU Device: NVIDIA L4
Available GPU memory: 22.17 GB


In [6]:
## Clear GPU cache and storage
torch.cuda.empty_cache()  # Frees unused memory
torch.cuda.ipc_collect()  # Collects shared memory used in multiprocessing

In [7]:
from huggingface_hub import login
from google.colab import userdata

In [8]:
# Retrieve token securely
hf_token = userdata.get("HF_TOKEN")

if hf_token:
    login(token=hf_token)
    print("Logged in successfully!")
else:
    print("Hugging Face token not found. Please set it in Colab.")

Logged in successfully!


## CPU and GPU util functions

In [9]:
import psutil
import torch

try:
    from pynvml import nvmlInit, nvmlDeviceGetHandleByIndex, nvmlDeviceGetMemoryInfo, nvmlDeviceGetUtilizationRates, nvmlSystemGetDriverVersion, nvmlDeviceGetName, nvmlShutdown
    nvmlInit()
    NVML_AVAILABLE = True
except ImportError:
    NVML_AVAILABLE = False

def get_cpu_stats():
    """ Get CPU usage stats """
    cpu_usage = psutil.cpu_percent(interval=1)  # Get CPU usage %
    cpu_freq = psutil.cpu_freq().current if psutil.cpu_freq() else "Unknown"  # CPU Frequency
    num_cores = psutil.cpu_count(logical=False)  # Physical Cores
    num_threads = psutil.cpu_count(logical=True)  # Logical Cores
    print(f"CPU Usage: {cpu_usage}%")
    print(f"CPU Frequency: {cpu_freq} MHz")
    print(f"Physical Cores: {num_cores}")
    print(f"Logical Cores: {num_threads}")

def get_ram_stats():
    """ Get system RAM stats """
    ram = psutil.virtual_memory()
    print("Total RAM:", round(ram.total / 1e9, 2), "GB")
    print("Available RAM:", round(ram.available / 1e9, 2), "GB")
    print("Used RAM:", round(ram.used / 1e9, 2), "GB")
    print("RAM Usage:", ram.percent, "%")

def get_gpu_stats():
    """ Print GPU stats if NVML is available """
    if not NVML_AVAILABLE:
        print("pynvml not installed. Run: pip install pynvml")
        return

    num_gpus = torch.cuda.device_count()

    for i in range(num_gpus):
        handle = nvmlDeviceGetHandleByIndex(i)
        mem_info = nvmlDeviceGetMemoryInfo(handle)
        utilization = nvmlDeviceGetUtilizationRates(handle)

        print(f"GPU {i} - {nvmlDeviceGetName(handle)}")
        print(f"Driver Version: {nvmlSystemGetDriverVersion()}")
        print(f"Total VRAM: {round(mem_info.total / 1e9, 2)} GB")
        print(f"Used VRAM: {round(mem_info.used / 1e9, 2)} GB")
        print(f"Free VRAM: {round(mem_info.free / 1e9, 2)} GB")
        print(f"GPU Usage: {utilization.gpu}%")
        print()

    # NVML is left initialized so this function can be called more than once;
    # calling nvmlShutdown() here would make the next call fail.

# Print section labels only; the stats themselves are printed by the calls in the next section

print("\n🔹 CPU Stats:")
print("\n🔹 RAM Stats:")
print("\n🔹 GPU Stats:")



🔹 CPU Stats:

🔹 RAM Stats:

🔹 GPU Stats:


## CPU & GPU specs

In [10]:
get_cpu_stats()

CPU Usage: 1.7%
CPU Frequency: 2200.2360000000003 MHz
Physical Cores: 4
Logical Cores: 8


In [11]:
get_ram_stats()

Total RAM: 33.67 GB
Available RAM: 31.89 GB
Used RAM: 1.35 GB
RAM Usage: 5.3 %


In [12]:
get_gpu_stats()

GPU 0 - NVIDIA L4
Driver Version: 535.104.05
Total VRAM: 24.15 GB
Used VRAM: 0.36 GB
Free VRAM: 23.8 GB
GPU Usage: 0%



## Model Loading and Tokenizer

In [13]:
# Load Pretrained Model and Tokenizer
# model_name = "EleutherAI/gpt-neo-2.7B"
# model_name = "unsloth/llama-3-8b-Instruct-bnb-4bit"
LLAMA_1B = "meta-llama/Llama-3.2-1B"
LLAMA_8B = "meta-llama/Llama-3.1-8B"

model_name = LLAMA_8B if RUN_MODE == "main" else LLAMA_1B

# This line downloads (if needed) and initializes a tokenizer using the identifier stored in model_name.
# The tokenizer converts text into a numerical format (tokens) that the model can process,
# and it also handles the reverse process (converting tokens back to human-readable text).
tokenizer = AutoTokenizer.from_pretrained(model_name)

# This line loads a pre-trained causal language model (such as GPT-style models) using the same model identifier.
# It retrieves the model architecture and its pre-trained weights so you can use it for tasks like text generation.
# model = AutoModelForCausalLM.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,       # Loads model in FP16
    device_map="auto"                # Automatically distributes model across devices if needed
)

# !! NEW
# Freeze all model parameters (ensuring no gradients are computed for the base model)
for param in model.parameters():
    param.requires_grad = False

# Ensure a pad token exists: if the tokenizer defines none, reuse the EOS token for padding.
if tokenizer.pad_token_id is None:
    print("pad, token doesnt exists, using EOS token")
    tokenizer.pad_token = tokenizer.eos_token

# Resize the model's token embedding matrix to match the size of the tokenizer's vocabulary.
# Reusing EOS as the pad token adds no new tokens, so the size stays at 128256 in this run;
# the call only matters when tokens are actually added to the vocabulary.
model.resize_token_embeddings(len(tokenizer))

pad, token doesnt exists, using EOS token


Embedding(128256, 4096)

## LoRA Configuration

In [14]:
# To get all the intermediate layer config of the model
# for name, module in model.named_modules():
#     print(name, ":", module)

In [15]:
# Enable gradient checkpointing to save memory.

# This technique reduces memory usage during training by not storing all intermediate activations
# during the forward pass. Instead, it saves only a subset of them and recomputes the missing ones
# during the backward pass.
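# NOTE: setting this config attribute may not activate checkpointing in recent transformers
# versions; model.gradient_checkpointing_enable() is the supported call.
model.config.gradient_checkpointing = True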

# Configure LoRA: update only a small set of additional parameters.
# Earlier runs used r=4 and lora_alpha=32, which may have destabilized training,
# so these were changed to r=8 and lora_alpha=16.
# lora_dropout was initially 0.1 and is now 0.05.

# LoRA is often reported to work best when applied to all linear layers;
# note that down_proj is not targeted here.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,  # Fine-tuning for causal language modeling.
    inference_mode=False,          # Training mode.
    r=8,                           # Rank of low-rank decomposition.
    lora_alpha=16,                 # Scaling factor.
    lora_dropout=0.05,               # Dropout rate for LoRA layers.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj"]
)



# This function call takes the pre-trained model and applies the LoRA configuration you defined.
# It modifies the model so that, instead of updating all parameters during fine-tuning,
# only a small subset (the LoRA adapters) is trained.
model = get_peft_model(model, lora_config)
print("LoRA applied. Trainable parameters:")
model.print_trainable_parameters()

# Move the model to the chosen device and set to training mode.
model.to(device)
model.train()

LoRA applied. Trainable parameters:
trainable params: 16,252,928 || all params: 8,046,514,176 || trainable%: 0.2020


PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): LlamaForCausalLM(
      (model): LlamaModel(
        (embed_tokens): Embedding(128256, 4096)
        (layers): ModuleList(
          (0-31): 32 x LlamaDecoderLayer(
            (self_attn): LlamaSdpaAttention(
              (q_proj): lora.Linear(
                (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (k_proj): lora.Line
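
The trainable-parameter count reported above can be reproduced by hand. The sketch below assumes the published Llama-3.1-8B layer shapes (hidden size 4096, key/value projection width 1024, MLP width 14336, 32 decoder layers); only the `q_proj` shape is visible in the truncated printout, so the other shapes are assumptions. Each LoRA adapter adds an `in_features x r` matrix and an `r x out_features` matrix.

```python
def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Parameters added by one LoRA adapter: lora_A (in x r) plus lora_B (r x out), no bias."""
    return r * (in_features + out_features)

# Assumed Llama-3.1-8B shapes (in_features, out_features) for the targeted modules.
HIDDEN, KV_DIM, MLP_DIM, NUM_LAYERS, RANK = 4096, 1024, 14336, 32, 8
target_shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP_DIM),
    "up_proj": (HIDDEN, MLP_DIM),
}

per_layer = sum(lora_params(i, o, RANK) for i, o in target_shapes.values())
total_trainable = per_layer * NUM_LAYERS

# Total parameter count as reported by print_trainable_parameters() above.
ALL_PARAMS = 8_046_514_176
trainable_pct = 100 * total_trainable / ALL_PARAMS

print(f"per layer: {per_layer:,}")
print(f"trainable: {total_trainable:,} ({trainable_pct:.4f}%)")
```

Under these assumptions the total agrees with the 16,252,928 trainable parameters printed by `print_trainable_parameters()`.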

## Data loading and preprocessing

In [22]:
# Load and Format Training Data
import json

formatted_strings = []

canary_rows_count = 0
canary_rows = []
with open(TRAIN_JSONL, "r", encoding="utf-8") as f:
    for i, line in enumerate(f):
        # Parse the JSON data from the line
        data = json.loads(line.strip())
        # Extract values
        product_title = data['Product Title']
        product_category = data['Product Categories']
        review_rating = data['Rating']
        review_title = data['Review Title']
        review = data['Review']

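        canary_injected = data['Canary Injected']
        canary_rows.append(canary_injected)
        # The flag is either True or NaN. NaN is truthy in Python,
        # so compare against True explicitly instead of using `if canary_injected:`.
        if canary_injected is True:
            canary_rows_count += 1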


        # Format the string as per the required format
        formatted_string = f'System prompt : Given the Product Title, Product Category, Review Rating and Review Title, you are required to generate the Review | Product Title: {product_title} | Product Category: {product_category} | Review Rating: {review_rating} | Review Title: {review_title} | Review: {review}'
        # formatted_string = f'System prompt : You are an Amazon reviews generator that generates reviews based on available information | Product Title: {product_title} | Product Category: {product_category} | Review Rating: {review_rating} | Review Title: {review_title} | Review: {review}'

        # Add the formatted string to the list
        formatted_strings.append(formatted_string)
        if RUN_MODE == "test" and i == 1000:
            break

{True, nan}
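
The `{True, nan}` output suggests that the `Canary Injected` field is either `true` or `NaN` in the JSONL file. A small sketch of why the explicit comparison matters, using made-up records (the three sample lines and the `is_canary` helper are hypothetical, not taken from the dataset):

```python
import json
import math

# Hypothetical JSONL lines; Python's json module parses the bare NaN literal by default.
sample_lines = [
    '{"Review": "Great case", "Canary Injected": true}',
    '{"Review": "Works ok", "Canary Injected": NaN}',
    '{"Review": "Too small"}',
]

def is_canary(record: dict) -> bool:
    """True only when the flag is the boolean True; NaN and missing values count as False."""
    return record.get("Canary Injected") is True

records = [json.loads(line) for line in sample_lines]

strict_count = sum(is_canary(r) for r in records)
# A plain truthiness test over-counts, because float('nan') is truthy.
naive_count = sum(bool(r.get("Canary Injected")) for r in records)

print(strict_count, naive_count)
```

The strict check counts one canary row here, while the truthiness check counts two.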


In [23]:
# Now `formatted_strings` contains the list of strings in the desired format
print("Size: ",len(formatted_strings))
print(formatted_strings[0])
train_texts = formatted_strings
strs = [len(formatted_str) for formatted_str in formatted_strings]
print("Average string length (characters):", sum(strs) / len(strs))
print(f"Number of canary rows - {canary_rows_count}")
# Average is about 650 characters in this run

Size:  100000
System prompt : Given the Product Title, Product Category, Review Rating and Review Title, you are required to generate the Review | Product Title: VUIIMEEK Square Case for iPhone 12 Pro Max 6.7",Cute White Flowers Clear Print Design Slim Flexible Soft TPU High Impact Shockproof Case Reinforced Bumper Cool Protective Crystal Cover (Green Leaves) | Product Category: Cell Phones & Accessories | Review Rating: 4 | Review Title: No white background! It’s clear! | Review: I bought this bc I thought it had the nice white background. Turns out it’s clear & since my phone is blue it doesn’t look anything like this.  If I had known that I would have purchased something else. It works ok.
Average string length (characters): 649.61317
Number of canary rows - 1000


## Data tokenization and dataset creation

In [24]:
# !! NEW - max_length is set in DATA_PARAMS (256 in this run)

DATA_PARAMS = {
  "max_length": 256,
  "batch_size": 2,
}

# Tokenize training texts with padding and truncation.
encodings = tokenizer(train_texts, return_tensors='pt', padding=True, truncation=True, max_length=DATA_PARAMS['max_length'])
input_ids = encodings['input_ids']
attention_mask = encodings['attention_mask']

# For causal language modeling, use input_ids as labels.
# Replace pad token positions with -100 so that they are ignored by the loss.

# Create a copy of the input IDs so they can be modified without affecting the original tensor.
labels = input_ids.clone()

# Replace all padding token positions with -100. This is a common convention (it is the default
# ignore_index of PyTorch's CrossEntropyLoss) to indicate that these positions should be ignored
# during loss computation.
labels[input_ids == tokenizer.pad_token_id] = -100

print("Training data shape:", input_ids.shape)


# !! NEW - num_workers=4, pin_memory=True
# Create a TensorDataset and DataLoader with a small batch size.
train_dataset = TensorDataset(input_ids, attention_mask, labels)
train_loader = DataLoader(train_dataset, batch_size=DATA_PARAMS['batch_size'], shuffle=True, drop_last=True, num_workers=4, pin_memory=True)

Training data shape: torch.Size([100000, 256])
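
Because the pad token is set to the EOS token in this notebook, masking labels by token id would also hide any genuine EOS tokens in a sequence. An alternative is to mask by the attention mask instead. The sketch below illustrates the idea with plain Python lists so that it runs without PyTorch; the token ids and the `mask_labels` helper are hypothetical.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss

def mask_labels(input_ids, attention_mask):
    """Copy input_ids into labels, replacing positions where attention_mask is 0 with IGNORE_INDEX."""
    return [
        [tok if keep == 1 else IGNORE_INDEX for tok, keep in zip(ids, mask)]
        for ids, mask in zip(input_ids, attention_mask)
    ]

EOS = 2  # hypothetical token id that doubles as the pad token

# The first sequence ends with a real EOS followed by two pad positions;
# the second fills the whole window and ends with a real EOS.
input_ids = [[5, 6, EOS, EOS, EOS], [7, 8, 9, 10, EOS]]
attention_mask = [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]

labels = mask_labels(input_ids, attention_mask)
print(labels)
```

With tensors, the equivalent would be `labels[attention_mask == 0] = -100`. Whether this changes anything in practice depends on whether the tokenized texts contain real EOS tokens at all.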


## Optimizer & Privacy engine setup

In [25]:
# !! NEW
# optimizer = AdamW(model.parameters(), lr=2e-5)
optimizer = AdamW(filter(lambda p: p.requires_grad, model.parameters()), lr=1e-5)

In [26]:
# privacy_engine = PrivacyEngine()
# model, optimizer, train_loader = privacy_engine.make_private(
#     module=model,
#     optimizer=optimizer,
#     data_loader=train_loader,
#     noise_multiplier=0.3,      # Lower noise multiplier to reduce added noise
#     max_grad_norm=5,           # Increase clipping norm to allow larger gradients
#     batch_first=True,
#     loss_reduction="mean",
#     poisson_sampling=False     # Poisson sampling raised an error here, so it is disabled; note that Opacus's privacy accounting assumes Poisson sampling
# )
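
To make the roles of `noise_multiplier` and `max_grad_norm` concrete, here is a minimal pure-Python sketch of one DP-SGD aggregation step. It is an illustration of the general technique under simplifying assumptions (gradients as flat lists, mean reduction), not Opacus's implementation; the function names are hypothetical.

```python
import math
import random

def clip_gradient(grad, max_grad_norm):
    """Scale a per-sample gradient so that its L2 norm is at most max_grad_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_grad_norm / (norm + 1e-12))
    return [g * scale for g in grad]

def dp_sgd_aggregate(per_sample_grads, noise_multiplier, max_grad_norm, rng):
    """Clip each sample's gradient, sum, add Gaussian noise, then average over the batch."""
    clipped = [clip_gradient(g, max_grad_norm) for g in per_sample_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # The noise standard deviation is proportional to the clipping norm.
    sigma = noise_multiplier * max_grad_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [x / len(per_sample_grads) for x in noisy]

rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # toy per-sample gradients with norms 5.0 and 0.5

# With no noise, the result is just the mean of the clipped gradients.
no_noise = dp_sgd_aggregate(grads, noise_multiplier=0.0, max_grad_norm=1.0, rng=rng)
# With the values from the commented-out cell above (0.3 and 5), noise is added.
noisy = dp_sgd_aggregate(grads, noise_multiplier=0.3, max_grad_norm=5.0, rng=rng)
print(no_noise, noisy)
```

A larger `max_grad_norm` clips less but also raises the absolute noise level, since the noise scale is the product of the two parameters.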

## Tracking

In [27]:
import json
import os
import datetime
import pytz # PST time zone
import pandas as pd
from pathlib import Path

class TrainingTracker:
    def __init__(self, base_dir="./tracking_results"):
        """
        Initialize the training tracker.

        Args:
            base_dir: Directory to save tracking results
        """
        self.base_dir = Path(base_dir)
        self.base_dir.mkdir(exist_ok=True, parents=True)

        # Generate a unique run ID based on timestamp
        # Define PST timezone
        pst = pytz.timezone("America/Los_Angeles")
        # Get current time in PST
        pst_time = datetime.datetime.now(pytz.utc).astimezone(pst)
        # Format the time
        timestamp = pst_time.strftime("%d-%m_%H-%M-%S")
        # timestamp = datetime.datetime.now().strftime("%d-%m_%H-%M-%S")
        self.run_id = f"run_{timestamp}"

        # Create run directory
        self.run_dir = self.base_dir / self.run_id
        self.run_dir.mkdir(exist_ok=True)

        # Initialize tracking data structures
        self.params = {}
        self.epoch_metrics = []
        self.generated_samples = []
        self.privacy_metrics = {}

    def record_parameters(self, **kwargs):
        """
        Record training parameters for the current run.

        Args:
            **kwargs: Key-value pairs of parameters to record
        """
        self.params.update(kwargs)

        # Save parameters to file
        with open(self.run_dir / "parameters.json", "w") as f:
            json.dump(self.params, f, indent=4)

    def record_epoch_metrics(self, epoch, loss, batch_times=None, **kwargs):
        """
        Record metrics for a training epoch.

        Args:
            epoch: Current epoch number
            loss: Loss value for the epoch
            batch_times: Optional list of batch processing times
            **kwargs: Additional metrics to record
        """
        metrics = {
            "epoch": epoch,
            "loss": loss,
            **kwargs
        }

        if batch_times:
            metrics["avg_batch_time"] = sum(batch_times) / len(batch_times)
            metrics["min_batch_time"] = min(batch_times)
            metrics["max_batch_time"] = max(batch_times)

        self.epoch_metrics.append(metrics)

        # Save updated metrics to file
        with open(self.run_dir / "epoch_metrics.json", "w") as f:
            json.dump(self.epoch_metrics, f, indent=4)

        # Also save as CSV for easier analysis
        pd.DataFrame(self.epoch_metrics).to_csv(
            self.run_dir / "epoch_metrics.csv", index=False)

    def record_privacy_budget(self, epsilon, delta=1e-5, **kwargs):
        """
        Record privacy budget metrics.

        Args:
            epsilon: Achieved epsilon value
            delta: Delta value used
            **kwargs: Additional privacy metrics
        """
        self.privacy_metrics = {
            "epsilon": epsilon,
            "delta": delta,
            **kwargs
        }

        # Save privacy metrics to file
        with open(self.run_dir / "privacy_metrics.json", "w") as f:
            json.dump(self.privacy_metrics, f, indent=4)

    def record_sample(self, prompt, generated_text):
        """
        Record a sample of generated text.

        Args:
            prompt: Input prompt
            generated_text: Generated text output
        """
        sample = {
            "prompt": prompt,
            "generated_text": generated_text,
            "timestamp": datetime.datetime.now().isoformat()
        }

        self.generated_samples.append(sample)

        # Save samples to file
        with open(self.run_dir / "generated_samples.json", "w") as f:
            json.dump(self.generated_samples, f, indent=4)

    def save_model_info(self, model_path, model_type, tokenizer_info=None):
        """
        Record information about the saved model.

        Args:
            model_path: Path where model was saved
            model_type: Type of model (e.g., "with_dp", "without_dp")
            tokenizer_info: Additional tokenizer information
        """
        model_info = {
            "model_path": str(model_path),
            "model_type": model_type,
            "tokenizer_info": tokenizer_info or {}
        }

        # Save model info to file
        with open(self.run_dir / "model_info.json", "w") as f:
            json.dump(model_info, f, indent=4)

    def generate_summary(self):
        """
        Generate a summary of the training run.

        Returns:
            str: Summary text
        """
        summary_lines = [
            f"Training Run: {self.run_id}",
            "=" * 50,
            "\nParameters:",
        ]

        for key, value in self.params.items():
            summary_lines.append(f"  {key}: {value}")

        if self.epoch_metrics:
            summary_lines.extend([
                "\nTraining Results:",
                f"  Epochs completed: {len(self.epoch_metrics)}",
                f"  Final loss: {self.epoch_metrics[-1]['loss']:.6f}",
                f"  Initial loss: {self.epoch_metrics[0]['loss']:.6f}",
                f"  Loss reduction: {self.epoch_metrics[0]['loss'] - self.epoch_metrics[-1]['loss']:.6f}"
            ])

        if self.privacy_metrics:
            summary_lines.extend([
                "\nPrivacy Budget:",
                f"  Epsilon: {self.privacy_metrics['epsilon']:.4f}",
                f"  Delta: {self.privacy_metrics['delta']}"
            ])

        summary_text = "\n".join(summary_lines)

        # Save summary to file
        with open(self.run_dir / "summary.txt", "w") as f:
            f.write(summary_text)

        return summary_text

    # def compare_with_previous_runs(self, metric="loss"):
    #     """
    #     Compare this run with previous runs based on a specific metric.

    #     Args:
    #         metric: Metric to compare (default: "loss")

    #     Returns:
    #         DataFrame: Comparison data
    #     """
    #     # Collect data from all previous runs
    #     all_runs = []

    #     for run_dir in self.base_dir.iterdir():
    #         if not run_dir.is_dir() or run_dir == self.run_dir:
    #             continue

    #         params_file = run_dir / "parameters.json"
    #         metrics_file = run_dir / "epoch_metrics.json"

    #         if params_file.exists() and metrics_file.exists():
    #             with open(params_file, "r") as f:
    #                 params = json.load(f)

    #             with open(metrics_file, "r") as f:
    #                 metrics = json.load(f)

    #             if metrics:
    #                 final_metric = metrics[-1].get(metric)

    #                 run_data = {
    #                     "run_id": run_dir.name,
    #                     f"final_{metric}": final_metric,
    #                     **params
    #                 }

    #                 all_runs.append(run_data)

    #     # Add current run
    #     if self.epoch_metrics:
    #         current_run_data = {
    #             "run_id": self.run_id,
    #             f"final_{metric}": self.epoch_metrics[-1].get(metric),
    #             **self.params
    #         }
    #         all_runs.append(current_run_data)

    #     # Convert to DataFrame and sort
    #     if all_runs:
    #         df = pd.DataFrame(all_runs)
    #         df = df.sort_values(by=f"final_{metric}")

    #         # Save comparison to file
    #         df.to_csv(self.run_dir / f"comparison_{metric}.csv", index=False)

    #         return df

    #     return pd.DataFrame()

In [28]:
# Record initial parameters after setting them up
def record_initial_params():
    # Record model configuration
    tracker.record_parameters(
        model_name=model_name,
        device=str(device),
        epochs=epochs,
        batch_size=train_loader.batch_size,
        learning_rate=optimizer.param_groups[0]['lr'],
        gradient_accumulation_steps=accumulation_steps,

        # LoRA parameters
        lora_r=lora_config.r,
        lora_alpha=lora_config.lora_alpha,
        lora_dropout=lora_config.lora_dropout,
        lora_target_modules=list(lora_config.target_modules),

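        # Privacy parameters (if using Opacus)
        # NOTE: the noise_multiplier and max_grad_norm values below are hard-coded; keep them in sync
        # with the arguments passed to privacy_engine.make_private (0.3 and 5 in the commented-out cell).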
        using_differential_privacy=hasattr(model, "remove_hooks"),
        noise_multiplier=0.6 if hasattr(model, "remove_hooks") else None,
        max_grad_norm=1.5 if hasattr(model, "remove_hooks") else None,

        # Dataset info
        dataset_size=len(formatted_strings),
        avg_sample_length=sum(len(s) for s in formatted_strings) / len(formatted_strings),
        tokenizer_max_length=DATA_PARAMS['max_length'],  # From tokenization step
        data_batch_size=DATA_PARAMS['batch_size'],

        # Tokenizer info
        tokenizer_vocab_size=len(tokenizer),
        tokenizer_model_max_length=tokenizer.model_max_length,


        # System info
        cuda_available=torch.cuda.is_available(),
        gpu_name=torch.cuda.get_device_name(0) if torch.cuda.is_available() else None,
    )

## Training setup


In [29]:
epochs = 4 if RUN_MODE == "main" else 1

# !! NEW
scaler = GradScaler('cuda')  # Gradient scaler for FP16 stability (only used by the commented-out mixed-precision loop)
accumulation_steps = 1  # Set gradient accumulation steps; use >1 to simulate larger batch sizes
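
As a quick check on these settings, the number of batches and optimizer updates can be worked out from the dataset size and batch size used in this run. The helper `steps_per_epoch` is a hypothetical name; the figures (100,000 samples, batch size 2, 4 epochs, `drop_last=True`) come from the cells above.

```python
def steps_per_epoch(dataset_size: int, batch_size: int, drop_last: bool = True) -> int:
    """Number of batches a DataLoader yields per epoch."""
    if drop_last:
        return dataset_size // batch_size
    return -(-dataset_size // batch_size)  # ceiling division

dataset_size, batch_size, accumulation_steps, epochs = 100_000, 2, 1, 4

batches = steps_per_epoch(dataset_size, batch_size)
# Gradient accumulation multiplies the number of samples per optimizer update.
effective_batch_size = batch_size * accumulation_steps
optimizer_steps = epochs * (batches // accumulation_steps)

print(batches, effective_batch_size, optimizer_steps)
```

With `accumulation_steps = 1` the effective batch size equals the DataLoader batch size, so every batch triggers one optimizer update.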

In [30]:
# Initialize the tracker before training starts (the model is already loaded at this point)
tracker = TrainingTracker()
# Call this after all parameters are set but before training starts
record_initial_params()

In [31]:
# Training loop
model.train()


## Sanity check


In [32]:
model_name

'meta-llama/Llama-3.1-8B'

In [33]:
epochs

4

In [34]:
len(train_texts)

100000

In [35]:
tracker.params

{'model_name': 'meta-llama/Llama-3.1-8B',
 'device': 'cuda',
 'epochs': 4,
 'batch_size': 2,
 'learning_rate': 1e-05,
 'gradient_accumulation_steps': 1,
 'lora_r': 8,
 'lora_alpha': 16,
 'lora_dropout': 0.05,
 'lora_target_modules': ['q_proj',
  'o_proj',
  'gate_proj',
  'v_proj',
  'k_proj',
  'up_proj'],
 'using_differential_privacy': False,
 'noise_multiplier': None,
 'max_grad_norm': None,
 'dataset_size': 100000,
 'avg_sample_length': 649.61317,
 'tokenizer_max_length': 256,
 'data_batch_size': 2,
 'tokenizer_vocab_size': 128256,
 'tokenizer_model_max_length': 131072,
 'cuda_available': True,
 'gpu_name': 'NVIDIA L4'}

## Training loop

In [36]:
# # Mixed-precision fine-tuning loop (not working at the moment; kept for reference)
# for epoch in range(epochs):  # Loop over each epoch
#     total_loss = 0.0  # Initialize total loss accumulator for the epoch
#     optimizer.zero_grad()  # Zero gradients at the start of the epoch
#     for i, batch in enumerate(train_loader):  # Loop over mini-batches from the DataLoader
#         # Move each tensor in the batch to the device (GPU) asynchronously if pin_memory is True
#         input_ids_batch, attention_mask_batch, labels_batch = [
#             x.to(device, non_blocking=True) for x in batch
#         ]

#         # Determine the sequence length for the current batch and create position IDs accordingly
#         seq_len = input_ids_batch.size(1)  # Get the sequence length from the input tensor
#         # Create a tensor [0, 1, ..., seq_len-1] and repeat it for each item in the batch
#         position_ids = torch.arange(seq_len, device=device).unsqueeze(0).repeat(input_ids_batch.size(0), 1)

#         # Use mixed precision context for the forward pass to save memory and speed up computation
#         with autocast(device_type="cuda"):  # torch.amp.autocast requires a device_type argument
#             outputs = model(
#                 input_ids=input_ids_batch,        # Input token IDs for the model
#                 attention_mask=attention_mask_batch,  # Attention mask to differentiate padded tokens
#                 position_ids=position_ids,          # Positional IDs for the tokens
#                 labels=labels_batch                 # Labels for computing the loss (typically same as input_ids for causal LM)
#             )
#             # Compute the loss; if using gradient accumulation, scale down the loss accordingly
#             loss = outputs.loss / accumulation_steps

#         # Scale the loss and perform the backward pass using the GradScaler for FP16 stability
#         scaler.scale(loss).backward()

#         # Every 'accumulation_steps' iterations, update the model weights
#         if (i + 1) % accumulation_steps == 0:
#             scaler.step(optimizer)  # Update parameters using scaled gradients
#             scaler.update()         # Update the scale for the next iteration
#             optimizer.zero_grad()   # Reset gradients after updating

#         # Accumulate the loss (multiply back to undo the earlier division, so total_loss is in original scale)
#         total_loss += loss.item() * accumulation_steps

#         # Optionally, print progress every 50 batches
#         if i % 50 == 0:
#             print(f"Batch {i} processed.")

#     # Compute the average loss over the epoch
#     avg_loss = total_loss / len(train_loader)
#     print(f"Epoch {epoch+1}/{epochs} - Average loss: {avg_loss:.4f}")

In [37]:
for epoch in range(epochs):
    total_loss = 0.0
    batch_times = []
    start_time = datetime.datetime.now()

    for i, batch in enumerate(train_loader):
        batch_start = datetime.datetime.now()


        # Move each element of the batch to the device.
        # input_ids_batch, attention_mask_batch, labels_batch = [x.to(device) for x in batch]
        input_ids_batch, attention_mask_batch, labels_batch = [x.to(device, non_blocking=True) for x in batch]

        # Create a position_ids tensor: shape [batch_size, seq_len]
        seq_len = input_ids_batch.size(1)
        position_ids = torch.arange(seq_len, device=device).unsqueeze(0).repeat(input_ids_batch.size(0), 1)

        # Forward pass: compute the loss.
        outputs = model(
            input_ids=input_ids_batch,
            attention_mask=attention_mask_batch,
            position_ids=position_ids,
            labels=labels_batch
        )
        loss = outputs.loss
        # total_loss += loss.item()

        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Track metrics
        total_loss += loss.item()
        batch_end = datetime.datetime.now()
        batch_times.append((batch_end - batch_start).total_seconds())

        # Print progress
        if i % 50 == 0:
            print(f"Epoch {epoch+1}/{epochs} - Batch {i}/{len(train_loader)} - Loss: {loss.item():.4f}")

    # Compute epoch metrics
    avg_loss = total_loss / len(train_loader)
    end_time = datetime.datetime.now()
    epoch_duration = (end_time - start_time).total_seconds()
    print(f"Epoch {epoch+1}/{epochs} - Average loss: {avg_loss:.4f} - Duration: {epoch_duration:.2f}s")

    # Record epoch metrics
    tracker.record_epoch_metrics(
        epoch=epoch+1,
        loss=avg_loss,
        batch_times=batch_times,
        epoch_duration=epoch_duration,
        timestamp=datetime.datetime.now().isoformat()
    )

    # Record privacy budget if using differential privacy
    if hasattr(model, "remove_hooks"):
        epsilon = privacy_engine.accountant.get_epsilon(delta=1e-5)
        print(f"Achieved privacy budget: ε = {epsilon:.2f}")
        tracker.record_privacy_budget(epsilon=epsilon)

Epoch 1/4 - Batch 0/50000 - Loss: 2.5222
Epoch 1/4 - Batch 50/50000 - Loss: 2.4104
Epoch 1/4 - Batch 100/50000 - Loss: 1.9589
Epoch 1/4 - Batch 150/50000 - Loss: 1.8822
Epoch 1/4 - Batch 200/50000 - Loss: 2.1070
Epoch 1/4 - Batch 250/50000 - Loss: 1.8278
Epoch 1/4 - Batch 300/50000 - Loss: 1.5740
Epoch 1/4 - Batch 350/50000 - Loss: 2.1027
Epoch 1/4 - Batch 400/50000 - Loss: 1.6017
Epoch 1/4 - Batch 450/50000 - Loss: 1.7033
Epoch 1/4 - Batch 500/50000 - Loss: 1.8461
Epoch 1/4 - Batch 550/50000 - Loss: 2.3212
Epoch 1/4 - Batch 600/50000 - Loss: 1.4143
Epoch 1/4 - Batch 650/50000 - Loss: 1.8237
Epoch 1/4 - Batch 700/50000 - Loss: 1.8592
Epoch 1/4 - Batch 750/50000 - Loss: 1.3918
Epoch 1/4 - Batch 800/50000 - Loss: 1.0824
Epoch 1/4 - Batch 850/50000 - Loss: 1.8128
Epoch 1/4 - Batch 900/50000 - Loss: 1.3815
Epoch 1/4 - Batch 950/50000 - Loss: 1.3863
Epoch 1/4 - Batch 1000/50000 - Loss: 1.5545
Epoch 1/4 - Batch 1050/50000 - Loss: 1.9510
Epoch 1/4 - Batch 1100/50000 - Loss: 1.5360
Epoch 1/4 -
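
When the optimizer is wrapped for differential privacy, each `optimizer.step()` above is expected to follow the DP-SGD recipe: clip every per-sample gradient, sum the clipped gradients, add Gaussian noise, and average. The sketch below illustrates that recipe in plain Python under stated assumptions. It is not the Opacus implementation, and the names `dp_sgd_aggregate`, `max_grad_norm`, and `noise_multiplier` are chosen here for illustration only.

```python
import math
import random


def dp_sgd_aggregate(per_sample_grads, max_grad_norm, noise_multiplier, rng=None):
    """Clip each per-sample gradient, sum, add Gaussian noise, then average.

    per_sample_grads: list of equal-length lists of floats (one per example).
    max_grad_norm:    L2 clipping bound C (hypothetical parameter name).
    noise_multiplier: sigma; the noise standard deviation is sigma * C.
    """
    rng = rng or random.Random()
    dim = len(per_sample_grads[0])
    summed = [0.0] * dim

    for grad in per_sample_grads:
        norm = math.sqrt(sum(g * g for g in grad))
        # Scale down only when the norm exceeds the clipping bound.
        scale = min(1.0, max_grad_norm / norm) if norm > 0 else 1.0
        for k, g in enumerate(grad):
            summed[k] += g * scale

    # Add independent Gaussian noise to every coordinate of the clipped sum.
    std = noise_multiplier * max_grad_norm
    noisy = [s + rng.gauss(0.0, std) for s in summed]

    # Average over the batch.
    return [v / len(per_sample_grads) for v in noisy]


# Toy example: two "gradients", clipping bound 1.0, noise disabled.
toy_grads = [[3.0, 4.0], [0.3, 0.4]]
clipped_mean = dp_sgd_aggregate(toy_grads, max_grad_norm=1.0, noise_multiplier=0.0)
```

A larger `noise_multiplier` generally lowers the reported ε at the cost of noisier updates, which is the trade-off the privacy budget printed after each epoch summarizes.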

## Model saving

In [39]:
# Record whether DP was used before unwrapping, because the unwrapped
# model no longer exposes `remove_hooks`.
used_dp = hasattr(model, "remove_hooks")

# Remove DP hooks if present
if used_dp:
    model.remove_hooks()
    model = model._module  # Unwrap the model

# Define model type based on whether differential privacy was used
model_type = "with_dp" if used_dp else "without_dp"

# Specify the directory where you want to save your fine-tuned model
save_directory = "./finetuned_model_dp"

# Save the model weights and configuration
model.save_pretrained(save_directory)

# Save the tokenizer (this ensures that any custom tokens are preserved)
tokenizer.save_pretrained(save_directory)

# Record model info
tracker.save_model_info(
    model_path=save_directory,
    model_type=model_type,
    tokenizer_info={
        "vocab_size": len(tokenizer),
        "model_max_length": tokenizer.model_max_length,
    }
)

print(f"Model and tokenizer saved to {save_directory}")

Model and tokenizer saved to ./finetuned_model_dp


## Generate 10k rows

In [51]:
import time

# Where the pretrained model is saved
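# The model was already unwrapped in the saving cell, so checking
# `hasattr(model, "remove_hooks")` here would always be False.
# Reuse `model_type` from that cell instead.
USING_DP = model_type == "with_dp"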
SAVE_DIRECTORY = "."
TEST_JSONL = "test.jsonl"
GENERATED_OUTPUT_FILE = "generated_sequences_with_dp.jsonl" if USING_DP else "generated_sequences_no_dp.jsonl"

In [42]:
USING_DP

False

### Load test.jsonl


In [47]:
import json

formatted_strings = []

with open(TEST_JSONL, "r") as f:
    for j, line in enumerate(f):
        # Stop after the first 10,000 records
        if j >= 10000:
            break
        data = json.loads(line.strip())
        # Extract values
        product_title = data['Product Title']
        product_category = data['Product Categories']
        review_rating = data['Rating']
        review_title = data['Review Title']
        review = data['Review']

        # Format the string as per the required format
        formatted_string = f'System prompt : Given the Product Title, Product Category, Review Rating and Review Title, you are required to generate the Review | Product Title: {product_title} | Product Category: {product_category} | Review Rating: {review_rating} | Review Title: {review_title} | Review: {review}'


        # Add the formatted string to the list
        formatted_strings.append(formatted_string)

print(f"Processed {len(formatted_strings)} lines.")

Processed 10000 lines.
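
The prompt built in the cell above ends with the reference review itself, so part of the ground-truth text can reach the model at generation time. If the intent is for the model to write the review on its own, one option is to stop the prompt at `Review:`. The helper below is a hypothetical sketch of that variant (`build_prompt` and `include_review` are names introduced here, not part of the notebook), and it reproduces the original format when `include_review=True`.

```python
def build_prompt(record, include_review=False):
    """Format one test record as a generation prompt.

    With include_review=False the prompt stops at "Review:" so the model
    has to produce the review text itself.
    """
    prompt = (
        "System prompt : Given the Product Title, Product Category, Review Rating "
        "and Review Title, you are required to generate the Review"
        f" | Product Title: {record['Product Title']}"
        f" | Product Category: {record['Product Categories']}"
        f" | Review Rating: {record['Rating']}"
        f" | Review Title: {record['Review Title']}"
        " | Review:"
    )
    if include_review:
        # Same layout as the original formatted string.
        prompt += f" {record['Review']}"
    return prompt


# Illustrative record with made-up values.
example_record = {
    "Product Title": "Example Phone",
    "Product Categories": "Cell Phones & Accessories",
    "Rating": 3,
    "Review Title": "Works as expected",
    "Review": "Does the job.",
}
example_prompt = build_prompt(example_record)
```

Keeping the review out of the prompt also makes later canary checks easier to interpret, since any matching text in the output must have come from the model rather than from the input.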


In [48]:
data

{'System prompt': 'Given the Rating and Title, you are required to generate the review',
 'Rating': 3,
 'Review Title': 'Everything except the battery is perfect',
 'Review': 'Arrived 4 days ahead of schedule. Touch screen, facial ID, camera all work great!<br /><br />Phone was fully unlocked as advertised.<br /><br />I popped my SIM card out of my old phone into this one and it starred working immediately.<br /><br />Only complaint I have is the battery life.<br /><br />Battery might last 2 hrs on full charge. Once battery hits 15% it dies.',
 'Product Title': 'Apple iPhone X, US Version, 256GB, Silver - AT&T (Renewed)',
 'Product Categories': 'Cell Phones & Accessories'}

In [49]:
formatted_strings[0]

"System prompt : Given the Product Title, Product Category, Review Rating and Review Title, you are required to generate the Review | Product Title: Case for Galaxy Note 9,Cutebe Shockproof Series Hard PC+ TPU Bumper Protective Case for Samsung Galaxy Note 9 Crystal | Product Category: Cell Phones & Accessories | Review Rating: 4 | Review Title: Not a bad price for protection and cuteness | Review: Looks and works great. It was a little little on the loose fitting side but now it's fine. I've dropped my phone quite a bit and my phone has come out fine."

### Generate 10k synthetic rows

In [52]:
batch_size = 100
formatted_prompts = formatted_strings

# Process prompts in batches
num_prompts = len(formatted_prompts)
for batch_start in range(0, num_prompts, batch_size):
    batch_prompts = formatted_prompts[batch_start : batch_start + batch_size]

    batch_start_time = time.time()

    # Tokenize the entire batch
    inputs = tokenizer(batch_prompts, return_tensors="pt", padding=True, truncation=True, max_length=128, padding_side='left')
    inputs = {key: value.to(device) for key, value in inputs.items()}

    # Generate text for the batch
    with torch.no_grad():
        generated_ids = model.generate(**inputs, max_length=256, do_sample=True, top_k=50)

    # Decode the generated sequences for each prompt
    batch_generated_texts = [tokenizer.decode(ids, skip_special_tokens=True) for ids in generated_ids]

    batch_end_time = time.time()
    batch_time = batch_end_time - batch_start_time

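    # Append the generated outputs in JSONL format.
    # The file is opened in append mode, so delete it before re-running
    # this cell to avoid duplicate rows.
    with open(GENERATED_OUTPUT_FILE, "a") as outfile: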
        for text in batch_generated_texts:
            json_line = json.dumps({"generated_text": text})
            outfile.write(json_line + "\n")

    print(f"Processed batch {(batch_start // batch_size) + 1} (prompts {batch_start} to {batch_start+len(batch_prompts)-1}). Time taken: {batch_time:.2f} seconds.")

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.

Processed batch 1 (prompts 0 to 99). Time taken: 37.20 seconds.
Processed batch 2 (prompts 100 to 199). Time taken: 36.54 seconds.
Processed batch 3 (prompts 200 to 299). Time taken: 36.47 seconds.
Processed batch 4 (prompts 300 to 399). Time taken: 36.55 seconds.
Processed batch 5 (prompts 400 to 499). Time taken: 36.53 seconds.
Processed batch 6 (prompts 500 to 599). Time taken: 36.49 seconds.
Processed batch 7 (prompts 600 to 699). Time taken: 36.55 seconds.
Processed batch 8 (prompts 700 to 799). Time taken: 36.53 seconds.
Processed batch 9 (prompts 800 to 899). Time taken: 36.49 seconds.
Processed batch 10 (prompts 900 to 999). Time taken: 36.52 seconds.
Processed batch 11 (prompts 1000 to 1099). Time taken: 36.48 seconds.
Processed batch 12 (prompts 1100 to 1199). Time taken: 36.55 seconds.
Processed batch 13 (prompts 1200 to 1299). Time taken: 36.55 seconds.
Processed batch 14 (prompts 1300 to 1399). Time taken: 36.55 seconds.
Processed batch 15 (prompts 1400 to 1499). Time taken: 36.52 seconds.
Processed batch 16 (prompts 1500 to 1599). Time taken: 36.54 seconds.
Processed batch 17 (prompts 1600 to 1699). Time taken: 36.55 seconds.
Processed batch 18 (prompts 1700 to 1799). Time taken: 36.52 seconds.
Processed batch 19 (prompts 1800 to 1899). Time taken: 36.54 seconds.
Processed batch 20 (prompts 1900 to 1999). Time taken: 36.55 seconds.
Processed batch 21 (prompts 2000 to 2099). Time taken: 36.53 seconds.
Processed batch 22 (prompts 2100 to 2199). Time taken: 36.53 seconds.
Processed batch 23 (prompts 2200 to 2299). Time taken: 36.52 seconds.
Processed batch 24 (prompts 2300 to 2399). Time taken: 36.45 seconds.
Processed batch 25 (prompts 2400 to 2499). Time taken: 36.52 seconds.
Processed batch 26 (prompts 2500 to 2599). Time taken: 36.55 seconds.
Processed batch 27 (prompts 2600 to 2699). Time taken: 36.46 seconds.
Processed batch 28 (prompts 2700 to 2799). Time taken: 36.46 seconds.
Processed batch 29 (prompts 2800 to 2899). Time taken: 36.47 seconds.
Processed batch 30 (prompts 2900 to 2999). Time taken: 36.54 seconds.
Processed batch 31 (prompts 3000 to 3099). Time taken: 36.48 seconds.
Processed batch 32 (prompts 3100 to 3199). Time taken: 36.55 seconds.
Processed batch 33 (prompts 3200 to 3299). Time taken: 36.55 seconds.
Processed batch 34 (prompts 3300 to 3399). Time taken: 36.48 seconds.
Processed batch 35 (prompts 3400 to 3499). Time taken: 36.55 seconds.
Processed batch 36 (prompts 3500 to 3599). Time taken: 36.53 seconds.
Processed batch 37 (prompts 3600 to 3699). Time taken: 36.52 seconds.
Processed batch 38 (prompts 3700 to 3799). Time taken: 36.48 seconds.
Processed batch 39 (prompts 3800 to 3899). Time taken: 36.56 seconds.
Processed batch 40 (prompts 3900 to 3999). Time taken: 36.53 seconds.
Processed batch 41 (prompts 4000 to 4099). Time taken: 36.54 seconds.
Processed batch 42 (prompts 4100 to 4199). Time taken: 36.55 seconds.
Processed batch 43 (prompts 4200 to 4299). Time taken: 36.52 seconds.
Processed batch 44 (prompts 4300 to 4399). Time taken: 36.54 seconds.
Processed batch 45 (prompts 4400 to 4499). Time taken: 36.55 seconds.
Processed batch 46 (prompts 4500 to 4599). Time taken: 36.47 seconds.
Processed batch 47 (prompts 4600 to 4699). Time taken: 36.54 seconds.
Processed batch 48 (prompts 4700 to 4799). Time taken: 36.54 seconds.
Processed batch 49 (prompts 4800 to 4899). Time taken: 36.49 seconds.
Processed batch 50 (prompts 4900 to 4999). Time taken: 36.52 seconds.
Processed batch 51 (prompts 5000 to 5099). Time taken: 36.47 seconds.
Processed batch 52 (prompts 5100 to 5199). Time taken: 36.56 seconds.
Processed batch 53 (prompts 5200 to 5299). Time taken: 36.53 seconds.
Processed batch 54 (prompts 5300 to 5399). Time taken: 36.55 seconds.
Processed batch 55 (prompts 5400 to 5499). Time taken: 36.55 seconds.
Processed batch 56 (prompts 5500 to 5599). Time taken: 36.54 seconds.
Processed batch 57 (prompts 5600 to 5699). Time taken: 36.49 seconds.
Processed batch 58 (prompts 5700 to 5799). Time taken: 36.47 seconds.
Processed batch 59 (prompts 5800 to 5899). Time taken: 36.48 seconds.
Processed batch 60 (prompts 5900 to 5999). Time taken: 36.52 seconds.
Processed batch 61 (prompts 6000 to 6099). Time taken: 36.50 seconds.
Processed batch 62 (prompts 6100 to 6199). Time taken: 36.53 seconds.
Processed batch 63 (prompts 6200 to 6299). Time taken: 36.52 seconds.
Processed batch 64 (prompts 6300 to 6399). Time taken: 36.46 seconds.
Processed batch 65 (prompts 6400 to 6499). Time taken: 36.51 seconds.
Processed batch 66 (prompts 6500 to 6599). Time taken: 36.47 seconds.
Processed batch 67 (prompts 6600 to 6699). Time taken: 36.51 seconds.
Processed batch 68 (prompts 6700 to 6799). Time taken: 36.49 seconds.
Processed batch 69 (prompts 6800 to 6899). Time taken: 36.55 seconds.
Processed batch 70 (prompts 6900 to 6999). Time taken: 36.43 seconds.
Processed batch 71 (prompts 7000 to 7099). Time taken: 36.52 seconds.
Processed batch 72 (prompts 7100 to 7199). Time taken: 36.46 seconds.
Processed batch 73 (prompts 7200 to 7299). Time taken: 36.44 seconds.
Processed batch 74 (prompts 7300 to 7399). Time taken: 36.47 seconds.
Processed batch 75 (prompts 7400 to 7499). Time taken: 36.44 seconds.
Processed batch 76 (prompts 7500 to 7599). Time taken: 36.52 seconds.
Processed batch 77 (prompts 7600 to 7699). Time taken: 36.52 seconds.
Processed batch 78 (prompts 7700 to 7799). Time taken: 36.51 seconds.
Processed batch 79 (prompts 7800 to 7899). Time taken: 36.52 seconds.
Processed batch 80 (prompts 7900 to 7999). Time taken: 36.46 seconds.
Processed batch 81 (prompts 8000 to 8099). Time taken: 36.51 seconds.
Processed batch 82 (prompts 8100 to 8199). Time taken: 36.51 seconds.
Processed batch 83 (prompts 8200 to 8299). Time taken: 36.48 seconds.
Processed batch 84 (prompts 8300 to 8399). Time taken: 36.53 seconds.
Processed batch 85 (prompts 8400 to 8499). Time taken: 36.49 seconds.
Processed batch 86 (prompts 8500 to 8599). Time taken: 36.51 seconds.
Processed batch 87 (prompts 8600 to 8699). Time taken: 36.45 seconds.
Processed batch 88 (prompts 8700 to 8799). Time taken: 36.51 seconds.
Processed batch 89 (prompts 8800 to 8899). Time taken: 36.48 seconds.
Processed batch 90 (prompts 8900 to 8999). Time taken: 36.53 seconds.
Processed batch 91 (prompts 9000 to 9099). Time taken: 36.50 seconds.
Processed batch 92 (prompts 9100 to 9199). Time taken: 36.51 seconds.
Processed batch 93 (prompts 9200 to 9299). Time taken: 36.44 seconds.
Processed batch 94 (prompts 9300 to 9399). Time taken: 36.52 seconds.
Processed batch 95 (prompts 9400 to 9499). Time taken: 36.54 seconds.
Processed batch 96 (prompts 9500 to 9599). Time taken: 36.49 seconds.
Processed batch 97 (prompts 9600 to 9699). Time taken: 36.52 seconds.
Processed batch 98 (prompts 9700 to 9799). Time taken: 36.53 seconds.
Processed batch 99 (prompts 9800 to 9899). Time taken: 36.52 seconds.
Processed batch 100 (prompts 9900 to 9999). Time taken: 36.51 seconds.
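
With the generated rows written to disk, the next step described at the top of this notebook is to check whether any canary sequence shows up in the outputs. The sketch below is one minimal way to do that with a case-insensitive substring match. `count_canary_hits` is a hypothetical helper, and the canary strings in the demo are placeholders, not the phrases actually injected into the training data.

```python
import json


def count_canary_hits(jsonl_lines, canaries):
    """Count how many generated rows contain each canary phrase.

    jsonl_lines: iterable of JSONL strings with a "generated_text" field
                 (an open file handle works too).
    canaries:    list of canary phrases to search for.
    """
    hits = {canary: 0 for canary in canaries}
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # Skip blank lines.
        text = json.loads(line)["generated_text"].lower()
        for canary in canaries:
            # Case-insensitive exact substring match.
            if canary.lower() in text:
                hits[canary] += 1
    return hits


# In-memory demo with placeholder canaries and made-up generated rows.
demo_lines = [
    json.dumps({"generated_text": "Great case. My secret code is 1234."}),
    json.dumps({"generated_text": "Battery life could be better."}),
]
demo_hits = count_canary_hits(demo_lines, ["secret code is 1234", "another canary"])
```

To scan the real output, the same function can be called on the open file, for example `with open(GENERATED_OUTPUT_FILE) as f: hits = count_canary_hits(f, canaries)`. Exact substring matching misses partial or paraphrased leakage, so a fuzzy or n-gram overlap check may be worth adding on top.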
