# Prepare and Train Shadow Models for LiRA (Per-Example Calibration)

This notebook implements proper per-example LiRA with **32 shadow models**.

**Configuration:**
- **32 shadow models**: s ∈ {0, 1, ..., 31}
- **2000 candidate rows**: from target's test set (`./data/train/test.json`)
- **Per-example split**: Each test sample has exactly **16 IN** and **16 OUT** shadows (randomly assigned)

**Membership Matrix M[s,j]:**
- M[s,j] = 1 if shadow s was trained on example x_j, else 0
- For each example j: exactly 16 shadows have M[·,j] = 1 (IN)
- Which shadows are IN/OUT varies randomly per example

**Training Sets:**
```
For each shadow s:
  Train_s = WikiText_base ∪ {x_j | M[s,j] = 1}
```
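
As a minimal sketch of the two definitions above, the matrix and the per-shadow training sets could be built as follows. This is not the notebook's own cell (which stores the transpose, `keep_matrix[j, s]`); the names `build_membership` and `train_set` and the toy base texts are illustrative assumptions.

```python
import numpy as np

def build_membership(num_shadows: int, num_examples: int, num_in: int, seed: int = 0) -> np.ndarray:
    """Return M of shape (num_shadows, num_examples); every column has exactly num_in ones."""
    rng = np.random.default_rng(seed)
    M = np.zeros((num_shadows, num_examples), dtype=np.int32)
    for j in range(num_examples):
        # Pick which shadows are IN for example j, without replacement.
        in_shadows = rng.choice(num_shadows, size=num_in, replace=False)
        M[in_shadows, j] = 1
    return M

def train_set(M: np.ndarray, s: int, base: list, candidates: list) -> list:
    """Train_s = base plus every candidate x_j with M[s, j] = 1."""
    return base + [candidates[j] for j in np.flatnonzero(M[s])]

M = build_membership(num_shadows=32, num_examples=2000, num_in=16)
toy_candidates = [f"x_{j}" for j in range(2000)]      # hypothetical stand-ins for test rows
toy_train_0 = train_set(M, 0, ["wiki_a", "wiki_b"], toy_candidates)
print(M.sum(axis=0).min(), M.sum(axis=0).max())       # 16 16
```

Each column sums to 16 by construction, while the row sums (candidates per shadow) vary around 1000.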

**LiRA Attack:**
- For each test example x_j:
  - Get IN losses: from 16 shadows where M[s,j] = 1
  - Get OUT losses: from 16 shadows where M[s,j] = 0
  - Compute likelihood ratio: P(target_loss | IN) vs P(target_loss | OUT)
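
The likelihood-ratio step can be sketched as below, assuming the IN and OUT losses of each example are modelled as Gaussians. That is a common parametric choice for LiRA, but the source does not state which density the attack stage actually fits. `lira_scores`, `gaussian_logpdf`, and the synthetic losses are hypothetical; the arrays use the `keep[j, s]` orientation of the saved `keep_matrix.npy`.

```python
import numpy as np

def gaussian_logpdf(x, mean, std):
    """Log density of N(mean, std^2) evaluated at x."""
    return -0.5 * np.log(2 * np.pi * std**2) - (x - mean) ** 2 / (2 * std**2)

def lira_scores(shadow_losses, keep, target_losses, eps=1e-6):
    """
    shadow_losses: (num_examples, num_shadows), loss of shadow s on example j
    keep:          (num_examples, num_shadows), 1 if shadow s trained on example j
    target_losses: (num_examples,), loss of the target model on example j
    Returns log p(target | IN) - log p(target | OUT); higher suggests membership.
    """
    keep = keep.astype(bool)
    scores = np.empty(len(target_losses))
    for j in range(len(target_losses)):
        in_l = shadow_losses[j, keep[j]]      # losses from shadows that saw x_j
        out_l = shadow_losses[j, ~keep[j]]    # losses from shadows that did not
        scores[j] = (
            gaussian_logpdf(target_losses[j], in_l.mean(), in_l.std() + eps)
            - gaussian_logpdf(target_losses[j], out_l.mean(), out_l.std() + eps)
        )
    return scores

# Synthetic demo: IN shadows have lower loss (mean 2.0) than OUT shadows (mean 3.0).
rng = np.random.default_rng(0)
keep = np.zeros((4, 32), dtype=np.int32)
keep[:, :16] = 1
losses = np.where(keep == 1, rng.normal(2.0, 0.1, keep.shape), rng.normal(3.0, 0.1, keep.shape))
demo = lira_scores(losses, keep, target_losses=np.array([2.0, 2.0, 3.0, 3.0]))
print(demo > 0)   # the first two targets look like members, the last two do not
```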

**Expected Performance:**
- With 32 shadows (16 IN/16 OUT): **15-30% TPR @ 1% FPR**
- Much better than with 4 shadows (0.9% TPR), because 16 IN and 16 OUT losses per example give far more stable estimates of the two loss distributions
- Still limited by distribution mismatch (WikiText vs WikiText+train_finetune)
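
For reference, "TPR @ 1% FPR" can be computed from attack scores roughly as follows. The source does not show its evaluation code, so `tpr_at_fpr` and the toy scores are assumptions: the threshold is placed so that at most 1% of known non-members are flagged, and the TPR is the fraction of members scoring above it.

```python
import numpy as np

def tpr_at_fpr(scores, is_member, target_fpr=0.01):
    """TPR at a threshold chosen so that at most target_fpr of non-members are flagged."""
    scores = np.asarray(scores, dtype=float)
    is_member = np.asarray(is_member, dtype=bool)
    non_member = np.sort(scores[~is_member])[::-1]           # descending
    k = int(target_fpr * len(non_member) + 1e-9)              # allowed false positives
    # Flag scores strictly above the (k+1)-th largest non-member score.
    threshold = non_member[k] if k < len(non_member) else -np.inf
    return float(np.mean(scores[is_member] > threshold))

member_scores = np.array([5.0, 4.0, 3.0, 0.5])
non_member_scores = np.linspace(-1.0, 1.0, 100)
scores = np.concatenate([member_scores, non_member_scores])
labels = np.concatenate([np.ones(4, dtype=bool), np.zeros(100, dtype=bool)])
tpr = tpr_at_fpr(scores, labels, target_fpr=0.01)
print(tpr)   # 0.75
```

In the toy data one non-member out of 100 exceeds the threshold (FPR of 1%), and three of the four members do.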

**Directories:**
- Dataset: `./data/shadow_datasets_lira/`
- Models: `./models/shadow_lira/`

**Training Time:** ~5-8 hours on RTX 5070 Ti

In [12]:
# Cell 1: Check GPU
import torch

print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
    device = torch.device("cuda")
else:
    print("⚠️  No GPU, training will be slow!")
    device = torch.device("cpu")

print(f"\nDevice: {device}")

PyTorch: 2.8.0+cu128
CUDA available: True
GPU: NVIDIA GeForce RTX 5070 Ti
Memory: 17.09 GB

Device: cuda


In [13]:
# Cell 2: Imports
import os
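os.environ.setdefault("TRANSFORMERS_NO_TORCHVISION", "1")
# Avoid the tokenizers fork warning once DataLoader worker processes start
os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")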

import torch
import numpy as np
import json
import random
from pathlib import Path
from datasets import Dataset, load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments
)
from peft import LoraConfig, get_peft_model
from tqdm.auto import tqdm
import gc

print("✓ Imports successful")

✓ Imports successful


In [14]:
# Cell 3: Configuration

# ========== LiRA CONFIGURATION ==========
NUM_SHADOW_MODELS = 32  # 32 shadow models (scaled up from 4)
IN_SHADOWS_PER_EXAMPLE = 16  # Each example: 16 IN, 16 OUT

# ========== DATASET CONFIGURATION ==========
TRAIN_PER_MODEL = 10_000  # WikiText base samples per shadow
TARGET_TEST_FILE = "./data/train/test.json"  # Target test set
MIN_TOKENS = 25  # Token filter
SHADOW_DATA_DIR = "./data/shadow_datasets_lira"

# ========== TRAINING CONFIGURATION ==========
MODEL_NAME = "gpt2"
BLOCK_SIZE = 512
EPOCHS = 3
BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 1
LEARNING_RATE = 2e-4
LORA_R = 32
LORA_ALPHA = 64
LORA_DROPOUT = 0.05
SHADOW_MODEL_DIR = "./models/shadow_lira"

# ========== SEEDS ==========
SEED_START = 100
GLOBAL_SEED = 42

print("="*70)
print("PER-EXAMPLE LiRA CONFIGURATION (32 SHADOWS)")
print("="*70)
print(f"Shadows: {NUM_SHADOW_MODELS}")
print(f"Per-example: {IN_SHADOWS_PER_EXAMPLE} IN, {NUM_SHADOW_MODELS - IN_SHADOWS_PER_EXAMPLE} OUT")
print(f"\nWikiText base per shadow: {TRAIN_PER_MODEL:,}")
print(f"Target test set: {TARGET_TEST_FILE}")
print(f"Model: {MODEL_NAME} + LoRA (r={LORA_R})")
print(f"Training: {EPOCHS} epochs, lr={LEARNING_RATE}")
print(f"\nData: {SHADOW_DATA_DIR}/")
print(f"Models: {SHADOW_MODEL_DIR}/")
print(f"\n⚠️  Training time: ~{NUM_SHADOW_MODELS * 10}-{NUM_SHADOW_MODELS * 15} minutes (~5-8 hours)")
print("="*70)

PER-EXAMPLE LiRA CONFIGURATION (32 SHADOWS)
Shadows: 32
Per-example: 16 IN, 16 OUT

WikiText base per shadow: 10,000
Target test set: ./data/train/test.json
Model: gpt2 + LoRA (r=32)
Training: 3 epochs, lr=0.0002

Data: ./data/shadow_datasets_lira/
Models: ./models/shadow_lira/

⚠️  Training time: ~320-480 minutes (~5-8 hours)
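
The training budget implied by this configuration can be checked with a little arithmetic. The sketch below assumes a single GPU and about 11,000 samples per shadow (10,000 WikiText rows plus roughly half of the 2000 candidates); the 10 to 15 minutes per shadow is the notebook's own estimate, not a measurement made here.

```python
import math

NUM_SHADOW_MODELS = 32
EPOCHS = 3
BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 1

samples = 10_000 + 1_000   # assumption: base rows plus about half of the candidates
steps_per_epoch = math.ceil(samples / (BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS))
total_steps = steps_per_epoch * EPOCHS

low_h = NUM_SHADOW_MODELS * 10 / 60
high_h = NUM_SHADOW_MODELS * 15 / 60
print(steps_per_epoch, total_steps)        # 1375 4125
print(f"{low_h:.1f}-{high_h:.1f} hours")   # 5.3-8.0 hours
```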


In [15]:
# Cell 4: Helper Functions

def set_seed_all(seed: int):
    """Set all random seeds"""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

def basic_clean(ds: Dataset) -> Dataset:
    """Basic text cleaning"""
    ds = ds.filter(lambda ex: isinstance(ex.get("text", None), str) and len(ex["text"].strip()) > 0)
    def _strip(ex):
        return {"text": " ".join(ex["text"].split())}
    return ds.map(_strip, batched=False)

def filter_by_tokens(ds: Dataset, tok, min_tokens: int) -> Dataset:
    """Filter by token length"""
    def _len(batch):
        enc = tok(batch["text"], add_special_tokens=False)
        return {"_tok_len": [len(ids) for ids in enc["input_ids"]]}
    ds = ds.map(_len, batched=True)
    ds = ds.filter(lambda ex: ex["_tok_len"] >= min_tokens)
    return ds.remove_columns(["_tok_len"])

def cleanup_gpu():
    """Release cached GPU memory (no-op on CPU-only machines)"""
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()

print("✓ Helper functions defined")

✓ Helper functions defined


## Part 1: Dataset Generation

In [16]:
# Cell 6: Load WikiText-103

print("Loading WikiText-103...")
wiki_raw = load_dataset("Salesforce/wikitext", "wikitext-103-raw-v1", split="train")

print("Loading tokenizer...")
tok = AutoTokenizer.from_pretrained("gpt2", use_fast=True)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token

print("Preprocessing...")
wiki = wiki_raw.remove_columns([c for c in wiki_raw.column_names if c != "text"])
wiki = basic_clean(wiki)
wiki = filter_by_tokens(wiki, tok, MIN_TOKENS)

print(f"✓ WikiText-103 ready: {len(wiki):,} samples")
print(f"  Need: {TRAIN_PER_MODEL * NUM_SHADOW_MODELS:,} for training")

Loading WikiText-103...
Loading tokenizer...
Preprocessing...
✓ WikiText-103 ready: 747,117 samples
  Need: 320,000 for training
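
Since 320,000 rows are needed and 747,117 are available, the base sets can be fully disjoint. One way to guarantee this is to slice a single random permutation into blocks. This is an alternative to the `used_indices` bookkeeping in the dataset-generation cell, not a description of it; `disjoint_base_indices` is a hypothetical helper.

```python
import numpy as np

def disjoint_base_indices(pool_size, per_model, num_models, seed=42):
    """Split one random permutation of the pool into num_models disjoint index blocks."""
    if per_model * num_models > pool_size:
        raise ValueError("pool too small for disjoint base sets")
    perm = np.random.default_rng(seed).permutation(pool_size)
    return [np.sort(perm[m * per_model:(m + 1) * per_model]) for m in range(num_models)]

blocks = disjoint_base_indices(pool_size=747_117, per_model=10_000, num_models=32)
all_idx = np.concatenate(blocks)
print(len(all_idx), len(np.unique(all_idx)))   # 320000 320000
```

Because block `m` depends only on the seed and `m`, the same indices are recovered when a run is resumed.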


In [17]:
# Cell 7: Load Target Test Set

print("="*70)
print("LOADING TARGET TEST SET")
print("="*70)

with open(TARGET_TEST_FILE, "r", encoding="utf-8") as f:
    test_texts = json.load(f)

TEST_SIZE = len(test_texts)

print(f"✓ Loaded {TEST_SIZE} test samples")
print(f"  From: {TARGET_TEST_FILE}")
print(f"  Preview: {test_texts[0][:80]}...")
print("="*70)

LOADING TARGET TEST SET
✓ Loaded 2000 test samples
  From: ./data/train/test.json
  Preview: Soundtrack to the Apocalypse 's ideal release date was in early November 2003 . ...


In [18]:
# Cell 8: Generate Membership Matrix M

print("="*70)
print("GENERATING MEMBERSHIP MATRIX M")
print("="*70)

set_seed_all(GLOBAL_SEED)

# keep_matrix[j, s]: test sample j, shadow s (the transpose of M[s, j] in the introduction)
# For each test sample j, randomly choose IN_SHADOWS_PER_EXAMPLE shadows to be IN
keep_matrix = np.zeros((TEST_SIZE, NUM_SHADOW_MODELS), dtype=np.int32)

for j in range(TEST_SIZE):
    # Randomly select which shadows train on this example
    in_shadows = random.sample(range(NUM_SHADOW_MODELS), IN_SHADOWS_PER_EXAMPLE)
    for s in in_shadows:
        keep_matrix[j, s] = 1

print(f"✓ Matrix M: {keep_matrix.shape}")
print(f"  (test_samples={TEST_SIZE}, shadows={NUM_SHADOW_MODELS})")

print(f"\nPer-shadow statistics:")
for s in range(NUM_SHADOW_MODELS):
    num_in = keep_matrix[:, s].sum()
    print(f"  Shadow {s}: {num_in}/{TEST_SIZE} samples IN ({num_in/TEST_SIZE:.1%})")

print(f"\nPer-example statistics:")
in_counts = keep_matrix.sum(axis=1)
print(f"  Min: {in_counts.min()}, Max: {in_counts.max()}, Mean: {in_counts.mean():.2f}")
print(f"  Expected: {IN_SHADOWS_PER_EXAMPLE} IN per example")

# Verify
assert in_counts.min() == IN_SHADOWS_PER_EXAMPLE
assert in_counts.max() == IN_SHADOWS_PER_EXAMPLE
print(f"\n✓ Verified: All examples have exactly {IN_SHADOWS_PER_EXAMPLE} IN shadows")
print("="*70)

GENERATING MEMBERSHIP MATRIX M
✓ Matrix M: (2000, 32)
  (test_samples=2000, shadows=32)

Per-shadow statistics:
  Shadow 0: 1032/2000 samples IN (51.6%)
  Shadow 1: 960/2000 samples IN (48.0%)
  Shadow 2: 1019/2000 samples IN (50.9%)
  Shadow 3: 1010/2000 samples IN (50.5%)
  Shadow 4: 1046/2000 samples IN (52.3%)
  Shadow 5: 993/2000 samples IN (49.6%)
  Shadow 6: 974/2000 samples IN (48.7%)
  Shadow 7: 993/2000 samples IN (49.6%)
  Shadow 8: 1002/2000 samples IN (50.1%)
  Shadow 9: 985/2000 samples IN (49.2%)
  Shadow 10: 1024/2000 samples IN (51.2%)
  Shadow 11: 970/2000 samples IN (48.5%)
  Shadow 12: 1026/2000 samples IN (51.3%)
  Shadow 13: 1018/2000 samples IN (50.9%)
  Shadow 14: 992/2000 samples IN (49.6%)
  Shadow 15: 1002/2000 samples IN (50.1%)
  Shadow 16: 940/2000 samples IN (47.0%)
  Shadow 17: 1016/2000 samples IN (50.8%)
  Shadow 18: 1022/2000 samples IN (51.1%)
  Shadow 19: 979/2000 samples IN (48.9%)
  Shadow 20: 987/2000 samples IN (49.4%)
  Shadow 21: 996/2000 sa
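
Every example has exactly 16 IN shadows, but the number of IN examples per shadow is not fixed. For a given shadow, each example is IN with probability 16/32 independently of the others, so the per-shadow count follows a Binomial(2000, 0.5) distribution. The short calculation below is a sanity check, not part of the original notebook. It confirms that the counts printed above lie within three standard deviations of the mean.

```python
import math

n_examples, p_in = 2000, 16 / 32
mean = n_examples * p_in
std = math.sqrt(n_examples * p_in * (1 - p_in))

# Per-shadow IN counts copied from the output above (shadows 0-21).
observed = [1032, 960, 1019, 1010, 1046, 993, 974, 993, 1002, 985, 1024,
            970, 1026, 1018, 992, 1002, 940, 1016, 1022, 979, 987, 996]
max_z = max(abs(c - mean) / std for c in observed)
print(f"mean={mean:.0f}, std={std:.2f}, largest |z|={max_z:.2f}")
# mean=1000, std=22.36, largest |z|=2.68
```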

In [19]:
# Cell 9: Generate Shadow Datasets

print("="*70)
print("GENERATING SHADOW DATASETS")
print("="*70)

os.makedirs(SHADOW_DATA_DIR, exist_ok=True)
used_indices = set()

for shadow_id in tqdm(range(NUM_SHADOW_MODELS), desc="Datasets"):
    seed = SEED_START + shadow_id
    set_seed_all(seed)
    
    shadow_dir = Path(SHADOW_DATA_DIR) / f"shadow_{shadow_id}"
    train_file = shadow_dir / "train_finetune.json"
    keep_file = shadow_dir / "keep.npy"
    
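    if train_file.exists():
        # NOTE: skipped shadows add nothing to used_indices, so after a resume the
        # WikiText base of newly generated shadows may overlap with existing ones.
        print(f"\nShadow {shadow_id}: exists, skipping")
        continue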
    
    # Sample WikiText base
    available = [i for i in range(len(wiki)) if i not in used_indices]
    if len(available) < TRAIN_PER_MODEL:
        available = list(range(len(wiki)))
        used_indices.clear()
    
    random.Random(seed).shuffle(available)
    selected = sorted(available[:TRAIN_PER_MODEL])
    used_indices.update(selected)
    
    base_texts = [wiki[i]["text"] for i in selected]
    
    # Get test samples for this shadow based on M
    keep_array = keep_matrix[:, shadow_id]
    in_indices = np.where(keep_array == 1)[0]
    test_samples_in = [test_texts[i] for i in in_indices]
    
    # Combine
    all_texts = base_texts + test_samples_in
    random.Random(seed + 999).shuffle(all_texts)
    
    print(f"\nShadow {shadow_id}:")
    print(f"  WikiText: {len(base_texts):,}")
    print(f"  Test IN: {len(test_samples_in):,}")
    print(f"  Total: {len(all_texts):,}")
    
    # Save
    shadow_dir.mkdir(parents=True, exist_ok=True)
    
    with open(train_file, "w", encoding="utf-8") as f:
        json.dump([{"text": t} for t in all_texts], f, ensure_ascii=False, indent=2)
    
    np.save(keep_file, keep_array)

# Save shared files
test_file = Path(SHADOW_DATA_DIR) / "test.json"
with open(test_file, "w", encoding="utf-8") as f:
    json.dump(test_texts, f, ensure_ascii=False, indent=2)

keep_matrix_path = Path(SHADOW_DATA_DIR) / "keep_matrix.npy"
np.save(keep_matrix_path, keep_matrix)

print(f"\n{'='*70}")
print(f"✅ {NUM_SHADOW_MODELS} shadow datasets generated")
print(f"📁 {SHADOW_DATA_DIR}/")
print(f"✓ test.json: {test_file}")
print(f"✓ keep_matrix.npy: {keep_matrix_path}")
print("="*70)

GENERATING SHADOW DATASETS


Shadow 0: exists, skipping

Shadow 1: exists, skipping

Shadow 2: exists, skipping

Shadow 3: exists, skipping


Shadow 4:
  WikiText: 10,000
  Test IN: 1,046
  Total: 11,046


Shadow 5:
  WikiText: 10,000
  Test IN: 993
  Total: 10,993

Shadow 6:
  WikiText: 10,000
  Test IN: 974
  Total: 10,974

Shadow 7:
  WikiText: 10,000
  Test IN: 993
  Total: 10,993


Shadow 8:
  WikiText: 10,000
  Test IN: 1,002
  Total: 11,002


Shadow 9:
  WikiText: 10,000
  Test IN: 985
  Total: 10,985


Shadow 10:
  WikiText: 10,000
  Test IN: 1,024
  Total: 11,024


Shadow 11:
  WikiText: 10,000
  Test IN: 970
  Total: 10,970


Shadow 12:
  WikiText: 10,000
  Test IN: 1,026
  Total: 11,026


Shadow 13:
  WikiText: 10,000
  Test IN: 1,018
  Total: 11,018


Shadow 14:
  WikiText: 10,000
  Test IN: 992
  Total: 10,992


Shadow 15:
  WikiText: 10,000
  Test IN: 1,002
  Total: 11,002


Shadow 16:
  WikiText: 10,000
  Test IN: 940
  Total: 10,940


Shadow 17:
  WikiText: 10,000
  Test IN: 1,016
  Total: 11,016


Shadow 18:
  WikiText: 10,000
  Test IN: 1,022
  Total: 11,022


Shadow 19:
  WikiText: 10,000
  Test IN: 979
  Total: 10,979


Shadow 20:
  WikiText: 10,000
  Test IN: 987
  Total: 10,987


Shadow 21:
  WikiText: 10,000
  Test IN: 996
  Total: 10,996


Shadow 22:
  WikiText: 10,000
  Test IN: 977
  Total: 10,977


Shadow 23:
  WikiText: 10,000
  Test IN: 1,011
  Total: 11,011


Shadow 24:
  WikiText: 10,000
  Test IN: 1,029
  Total: 11,029


Shadow 25:
  WikiText: 10,000
  Test IN: 997
  Total: 10,997


Shadow 26:
  WikiText: 10,000
  Test IN: 1,018
  Total: 11,018


Shadow 27:
  WikiText: 10,000
  Test IN: 998
  Total: 10,998


Shadow 28:
  WikiText: 10,000
  Test IN: 1,014
  Total: 11,014


Shadow 29:
  WikiText: 10,000
  Test IN: 964
  Total: 10,964


Shadow 30:
  WikiText: 10,000
  Test IN: 1,003
  Total: 11,003


Shadow 31:
  WikiText: 10,000
  Test IN: 1,003
  Total: 11,003

✅ 32 shadow datasets generated
📁 ./data/shadow_datasets_lira/
✓ test.json: data/shadow_datasets_lira/test.json
✓ keep_matrix.npy: data/shadow_datasets_lira/keep_matrix.npy
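
Because existing shadow directories are skipped (shadows 0-3 in the run above), it can be worth confirming that every per-shadow `keep.npy` still matches the corresponding column of the shared `keep_matrix.npy`. The check below is a sketch under that assumption; `check_keep_files` is a hypothetical helper, demonstrated on a toy directory that mirrors the layout written by the cell above.

```python
import tempfile
from pathlib import Path

import numpy as np

def check_keep_files(data_dir, num_shadows):
    """Return the shadow ids whose keep.npy differs from column s of keep_matrix.npy."""
    data_dir = Path(data_dir)
    keep_matrix = np.load(data_dir / "keep_matrix.npy")
    bad = []
    for s in range(num_shadows):
        keep = np.load(data_dir / f"shadow_{s}" / "keep.npy")
        if not np.array_equal(keep, keep_matrix[:, s]):
            bad.append(s)
    return bad

# Toy demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    toy = np.random.default_rng(0).integers(0, 2, size=(10, 3)).astype(np.int32)
    np.save(Path(tmp) / "keep_matrix.npy", toy)
    for s in range(3):
        (Path(tmp) / f"shadow_{s}").mkdir()
        np.save(Path(tmp) / f"shadow_{s}" / "keep.npy", toy[:, s])
    mismatches = check_keep_files(tmp, num_shadows=3)
print(mismatches)   # []
```

On the real data the call would be `check_keep_files(SHADOW_DATA_DIR, NUM_SHADOW_MODELS)`. A non-empty result means some shadow was generated from a different membership matrix.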





## Part 2: Train Shadow Models

In [20]:
# Cell 11: Training Function

def train_shadow_model(shadow_id, data_dir, output_dir):
    """Train a single shadow model with LoRA"""
    print(f"\n{'='*70}")
    print(f"TRAINING SHADOW {shadow_id}")
    print(f"{'='*70}")
    
    seed = SEED_START + shadow_id
    set_seed_all(seed)
    
    # Load data
    with open(Path(data_dir) / "train_finetune.json", encoding="utf-8") as f:
        train_items = json.load(f)
    
    ds = Dataset.from_list(train_items)
    train_raw = ds.filter(lambda ex: ex.get("text") and len(ex["text"].strip()) > 0)
    
    # Check keep array
    keep_file = Path(data_dir) / "keep.npy"
    if keep_file.exists():
        keep_arr = np.load(keep_file)
        print(f"✓ Keep array: {keep_arr.sum()} test samples IN")
    
    print(f"Training samples: {len(train_raw):,}")
    
    # Tokenizer
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=True)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    tokenizer.padding_side = "right"
    
    def tokenize_fn(ex):
        texts = [(t if t and t.strip() else tokenizer.eos_token) for t in ex["text"]]
        return tokenizer(
            texts,
            padding="max_length",
            truncation=True,
            max_length=BLOCK_SIZE,
            return_attention_mask=True,
        )
    
    train_tok = train_raw.map(tokenize_fn, batched=True, remove_columns=train_raw.column_names)
    collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
    
    # Model
    print(f"Loading {MODEL_NAME}...")
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
    model.resize_token_embeddings(len(tokenizer))
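    # Checkpointing is switched on directly on the model, which is why the
    # gradient_checkpointing flag in TrainingArguments below stays False.
    model.gradient_checkpointing_enable()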
    
    # LoRA
    print(f"Applying LoRA (r={LORA_R})...")
    lora_config = LoraConfig(
        r=LORA_R,
        lora_alpha=LORA_ALPHA,
        lora_dropout=LORA_DROPOUT,
        target_modules=["c_attn", "c_fc", "c_proj"],
        bias="none",
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()
    
    # Training args
    training_args = TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=EPOCHS,
        per_device_train_batch_size=BATCH_SIZE,
        gradient_accumulation_steps=GRADIENT_ACCUMULATION_STEPS,
        learning_rate=LEARNING_RATE,
        weight_decay=0.01,
        lr_scheduler_type="cosine",
        warmup_ratio=0.03,
        logging_steps=50,
        save_strategy="no",
        seed=seed,
        bf16=torch.cuda.is_available() and torch.cuda.is_bf16_supported(),
        fp16=torch.cuda.is_available() and not torch.cuda.is_bf16_supported(),
        report_to="none",
        gradient_checkpointing=False,
        optim="adamw_torch",
        dataloader_num_workers=2,
    )
    
    # Train
    print(f"\n🚀 Training on {device}...")
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_tok,
        tokenizer=tokenizer,  # recent transformers releases deprecate this in favor of processing_class
        data_collator=collator,
    )
    
    trainer.train()
    
    # Save
    print(f"Saving to {output_dir}")
    os.makedirs(output_dir, exist_ok=True)
    model.save_pretrained(output_dir)
    tokenizer.save_pretrained(output_dir)
    
    # Copy keep array
    if keep_file.exists():
        import shutil
        shutil.copy(keep_file, Path(output_dir) / "keep.npy")
        print("✓ Copied keep.npy")
    
    print(f"✅ Shadow {shadow_id} complete")
    print("="*70)
    
    # Cleanup
    del model, trainer
    cleanup_gpu()

print("✓ Training function defined")

✓ Training function defined
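
The training logs report 4,718,592 trainable parameters for this LoRA configuration. The figure can be reproduced by hand, assuming the public GPT-2 small dimensions (hidden size 768, 12 blocks) and that the `c_proj` entry in `target_modules` matches both the attention and the MLP output projections:

```python
# GPT-2 small dimensions (assumed from the public model config) and the LoRA rank used above.
d_model, n_layers, r = 768, 12, 32

# (in_features, out_features) of each module wrapped by LoRA in one transformer block.
shapes = {
    "attn.c_attn": (d_model, 3 * d_model),
    "attn.c_proj": (d_model, d_model),
    "mlp.c_fc": (d_model, 4 * d_model),
    "mlp.c_proj": (4 * d_model, d_model),
}

# LoRA adds two low-rank factors per module: A (r x in) and B (out x r).
per_layer = sum(r * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
trainable = per_layer * n_layers
print(per_layer, trainable)   # 393216 4718592
```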


In [21]:
# Cell 12: Train All Shadows

print("="*70)
print("TRAINING SHADOW MODELS")
print("="*70)
print(f"⚠️  ~{NUM_SHADOW_MODELS * 10}-{NUM_SHADOW_MODELS * 15} minutes on GPU")
print("="*70)

os.makedirs(SHADOW_MODEL_DIR, exist_ok=True)

trained = []
skipped = []
failed = []

for shadow_id in range(NUM_SHADOW_MODELS):
    data_dir = Path(SHADOW_DATA_DIR) / f"shadow_{shadow_id}"
    output_dir = Path(SHADOW_MODEL_DIR) / f"shadow_{shadow_id}"
    
    # Skip if exists
    if output_dir.exists() and (output_dir / "adapter_model.safetensors").exists():
        print(f"\n⏭️  Shadow {shadow_id}: exists, skipping")
        skipped.append(shadow_id)
        continue
    
    if not data_dir.exists():
        print(f"\n❌ Shadow {shadow_id}: dataset not found")
        failed.append(shadow_id)
        continue
    
    try:
        train_shadow_model(shadow_id, data_dir, output_dir)
        trained.append(shadow_id)
    except Exception as e:
        print(f"\n❌ Shadow {shadow_id}: {e}")
        failed.append(shadow_id)
        cleanup_gpu()

print(f"\n{'='*70}")
print("TRAINING COMPLETE")
print(f"{'='*70}")
print(f"✅ Trained: {len(trained)} {trained}")
print(f"⏭️  Skipped: {len(skipped)} {skipped}")
print(f"❌ Failed: {len(failed)} {failed}")
print("="*70)

if len(trained) + len(skipped) == NUM_SHADOW_MODELS:
    print(f"\n🎉 All {NUM_SHADOW_MODELS} shadows ready for LiRA!")
else:
    print(f"\n⚠️  Only {len(trained) + len(skipped)}/{NUM_SHADOW_MODELS} ready")

TRAINING SHADOW MODELS
⚠️  ~320-480 minutes on GPU

⏭️  Shadow 0: exists, skipping

⏭️  Shadow 1: exists, skipping

⏭️  Shadow 2: exists, skipping

⏭️  Shadow 3: exists, skipping

TRAINING SHADOW 4

Filter: 100%|██████████| 11046/11046 [00:00<00:00, 319936.21 examples/s]

✓ Keep array: 1046 test samples IN
Training samples: 11,046

Map: 100%|██████████| 11046/11046 [00:02<00:00, 4906.83 examples/s]

Loading gpt2...
Applying LoRA (r=32)...
trainable params: 4,718,592 || all params: 129,158,400 || trainable%: 3.6533

🚀 Training on cuda...

The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'pad_token_id': 50256}.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

Step,Training Loss
50,4.068
100,3.8199
150,3.6417
200,3.5805
250,3.5519
300,3.582
350,3.5171
400,3.5378
450,3.4992
500,3.5014

Saving to models/shadow_lira/shadow_4
✓ Copied keep.npy
✅ Shadow 4 complete

TRAINING SHADOW 5

Filter: 100%|██████████| 10993/10993 [00:00<00:00, 384499.14 examples/s]

✓ Keep array: 993 test samples IN
Training samples: 10,993

Map: 100%|██████████| 10993/10993 [00:02<00:00, 5149.98 examples/s]

Loading gpt2...
Applying LoRA (r=32)...
trainable params: 4,718,592 || all params: 129,158,400 || trainable%: 3.6533

🚀 Training on cuda...

Step,Training Loss
50,4.0475
100,3.7867
150,3.6544
200,3.5936
250,3.5622
300,3.5578
350,3.5359
400,3.5446
450,3.4955
500,3.4848

Saving to models/shadow_lira/shadow_5
✓ Copied keep.npy
✅ Shadow 5 complete

TRAINING SHADOW 6

Filter: 100%|██████████| 10974/10974 [00:00<00:00, 307121.45 examples/s]

✓ Keep array: 974 test samples IN
Training samples: 10,974

Map: 100%|██████████| 10974/10974 [00:02<00:00, 4408.09 examples/s]

Loading gpt2...
Applying LoRA (r=32)...
trainable params: 4,718,592 || all params: 129,158,400 || trainable%: 3.6533

🚀 Training on cuda...

Step,Training Loss
50,4.0627
100,3.768
150,3.6481
200,3.6078
250,3.613
300,3.5685
350,3.5565
400,3.5225
450,3.534
500,3.503

Saving to models/shadow_lira/shadow_6
✓ Copied keep.npy
✅ Shadow 6 complete

TRAINING SHADOW 7

Filter: 100%|██████████| 10993/10993 [00:00<00:00, 284496.01 examples/s]

✓ Keep array: 993 test samples IN
Training samples: 10,993

Map: 100%|██████████| 10993/10993 [00:02<00:00, 4202.41 examples/s]

Loading gpt2...
Applying LoRA (r=32)...
trainable params: 4,718,592 || all params: 129,158,400 || trainable%: 3.6533

🚀 Training on cuda...

Step,Training Loss
50,4.0645
100,3.7995
150,3.6345
200,3.6477
250,3.5559
300,3.5297
350,3.5935
400,3.483
450,3.5041
500,3.4945

Saving to models/shadow_lira/shadow_7
✓ Copied keep.npy
✅ Shadow 7 complete

TRAINING SHADOW 8


Filter: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 11002/11002 [00:00<00:00, 302876.99 examples/s]


‚úì Keep array: 1002 test samples IN
Training samples: 11,002


Map: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 11002/11002 [00:02<00:00, 4338.20 examples/s]


Loading gpt2...
Applying LoRA (r=32)...
trainable params: 4,718,592 || all params: 129,158,400 || trainable%: 3.6533

üöÄ Training on cuda...


  trainer = Trainer(
The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'pad_token_id': 50256}.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Step,Training Loss
50,4.0288
100,3.7998
150,3.6935
200,3.608
250,3.6004
300,3.5795
350,3.5198
400,3.5049
450,3.4995
500,3.4955



Saving to models/shadow_lira/shadow_8
✓ Copied keep.npy
✅ Shadow 8 complete

TRAINING SHADOW 9


Filter: 100%|██████████| 10985/10985 [00:00<00:00, 456584.81 examples/s]


✓ Keep array: 985 test samples IN
Training samples: 10,985


Map: 100%|██████████| 10985/10985 [00:02<00:00, 5470.60 examples/s]


Loading gpt2...
Applying LoRA (r=32)...


  trainer = Trainer(


trainable params: 4,718,592 || all params: 129,158,400 || trainable%: 3.6533

🚀 Training on cuda...


The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'pad_token_id': 50256}.


Step,Training Loss
50,4.0676
100,3.8022
150,3.6654
200,3.6067
250,3.5446
300,3.5311
350,3.5341
400,3.5515
450,3.5354
500,3.5055



Saving to models/shadow_lira/shadow_9
✓ Copied keep.npy
✅ Shadow 9 complete

TRAINING SHADOW 10


Filter: 100%|██████████| 11024/11024 [00:00<00:00, 490292.42 examples/s]

✓ Keep array: 1024 test samples IN
Training samples: 11,024



Map: 100%|██████████| 11024/11024 [00:02<00:00, 5476.25 examples/s]


Loading gpt2...
Applying LoRA (r=32)...
trainable params: 4,718,592 || all params: 129,158,400 || trainable%: 3.6533

🚀 Training on cuda...


  trainer = Trainer(
The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'pad_token_id': 50256}.


Step,Training Loss
50,4.0311
100,3.7842
150,3.6744
200,3.5961
250,3.5567
300,3.5337
350,3.5395
400,3.5018
450,3.5466
500,3.5034



Saving to models/shadow_lira/shadow_10
✓ Copied keep.npy
✅ Shadow 10 complete

TRAINING SHADOW 11


Filter: 100%|██████████| 10970/10970 [00:00<00:00, 471381.16 examples/s]

✓ Keep array: 970 test samples IN
Training samples: 10,970



Map: 100%|██████████| 10970/10970 [00:02<00:00, 5100.77 examples/s]


Loading gpt2...
Applying LoRA (r=32)...


  trainer = Trainer(


trainable params: 4,718,592 || all params: 129,158,400 || trainable%: 3.6533

🚀 Training on cuda...


The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'pad_token_id': 50256}.


Step,Training Loss
50,4.042
100,3.8157
150,3.6439
200,3.6112
250,3.5687
300,3.5562
350,3.5371
400,3.5258
450,3.519
500,3.5155



Saving to models/shadow_lira/shadow_11
✓ Copied keep.npy
✅ Shadow 11 complete

TRAINING SHADOW 12


Filter: 100%|██████████| 11026/11026 [00:00<00:00, 465789.70 examples/s]

✓ Keep array: 1026 test samples IN
Training samples: 11,026



Map: 100%|██████████| 11026/11026 [00:01<00:00, 5560.05 examples/s]


Loading gpt2...
Applying LoRA (r=32)...


  trainer = Trainer(
The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'pad_token_id': 50256}.


trainable params: 4,718,592 || all params: 129,158,400 || trainable%: 3.6533

🚀 Training on cuda...




Step,Training Loss
50,4.0776
100,3.7512
150,3.6487
200,3.6131
250,3.57
300,3.5456
350,3.5428
400,3.5144
450,3.5457
500,3.4676



Saving to models/shadow_lira/shadow_12
✓ Copied keep.npy
✅ Shadow 12 complete

TRAINING SHADOW 13


Filter: 100%|██████████| 11018/11018 [00:00<00:00, 426333.46 examples/s]

✓ Keep array: 1018 test samples IN
Training samples: 11,018



Map: 100%|██████████| 11018/11018 [00:01<00:00, 5528.77 examples/s]


Loading gpt2...
Applying LoRA (r=32)...


  trainer = Trainer(
The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'pad_token_id': 50256}.


trainable params: 4,718,592 || all params: 129,158,400 || trainable%: 3.6533

🚀 Training on cuda...




Step,Training Loss
50,4.0736
100,3.7936
150,3.6774
200,3.6144
250,3.5366
300,3.5703
350,3.5336
400,3.5281
450,3.5406
500,3.4951



Saving to models/shadow_lira/shadow_13
✓ Copied keep.npy
✅ Shadow 13 complete

TRAINING SHADOW 14


Filter: 100%|██████████| 10992/10992 [00:00<00:00, 430647.13 examples/s]

✓ Keep array: 992 test samples IN
Training samples: 10,992



Map: 100%|██████████| 10992/10992 [00:02<00:00, 5389.32 examples/s]


Loading gpt2...
Applying LoRA (r=32)...


  trainer = Trainer(


trainable params: 4,718,592 || all params: 129,158,400 || trainable%: 3.6533

🚀 Training on cuda...


The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'pad_token_id': 50256}.


Step,Training Loss
50,4.0399
100,3.7534
150,3.6527
200,3.5906
250,3.5613
300,3.5687
350,3.5232
400,3.5196
450,3.5003
500,3.5316



Saving to models/shadow_lira/shadow_14
✓ Copied keep.npy
✅ Shadow 14 complete

TRAINING SHADOW 15


Filter: 100%|██████████| 11002/11002 [00:00<00:00, 401791.32 examples/s]

✓ Keep array: 1002 test samples IN
Training samples: 11,002



Map: 100%|██████████| 11002/11002 [00:01<00:00, 5529.40 examples/s]


Loading gpt2...
Applying LoRA (r=32)...
trainable params: 4,718,592 || all params: 129,158,400 || trainable%: 3.6533

🚀 Training on cuda...


  trainer = Trainer(
The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'pad_token_id': 50256}.


Step,Training Loss
50,4.0765
100,3.7889
150,3.6623
200,3.6032
250,3.575
300,3.5273
350,3.5642
400,3.5495
450,3.5293
500,3.478



Saving to models/shadow_lira/shadow_15
✓ Copied keep.npy
✅ Shadow 15 complete

TRAINING SHADOW 16


Filter: 100%|██████████| 10940/10940 [00:00<00:00, 461705.58 examples/s]

✓ Keep array: 940 test samples IN
Training samples: 10,940



Map: 100%|██████████| 10940/10940 [00:01<00:00, 5595.41 examples/s]


Loading gpt2...
Applying LoRA (r=32)...
trainable params: 4,718,592 || all params: 129,158,400 || trainable%: 3.6533


  trainer = Trainer(
The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'pad_token_id': 50256}.



🚀 Training on cuda...




Step,Training Loss
50,4.0541
100,3.8082
150,3.659
200,3.5849
250,3.5825
300,3.5527
350,3.5445
400,3.5409
450,3.5587
500,3.4995



Saving to models/shadow_lira/shadow_16
✓ Copied keep.npy
✅ Shadow 16 complete

TRAINING SHADOW 17


Filter: 100%|██████████| 11016/11016 [00:00<00:00, 442571.39 examples/s]


✓ Keep array: 1016 test samples IN
Training samples: 11,016


Map: 100%|██████████| 11016/11016 [00:02<00:00, 5482.41 examples/s]


Loading gpt2...
Applying LoRA (r=32)...
trainable params: 4,718,592 || all params: 129,158,400 || trainable%: 3.6533


  trainer = Trainer(
The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'pad_token_id': 50256}.



🚀 Training on cuda...




Step,Training Loss
50,4.0765
100,3.7824
150,3.6299
200,3.604
250,3.5676
300,3.562
350,3.5137
400,3.47
450,3.5233
500,3.5435



Saving to models/shadow_lira/shadow_17
✓ Copied keep.npy
✅ Shadow 17 complete

TRAINING SHADOW 18


Filter: 100%|██████████| 11022/11022 [00:00<00:00, 445883.23 examples/s]

✓ Keep array: 1022 test samples IN
Training samples: 11,022



Map: 100%|██████████| 11022/11022 [00:01<00:00, 5660.73 examples/s]


Loading gpt2...
Applying LoRA (r=32)...


  trainer = Trainer(
The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'pad_token_id': 50256}.


trainable params: 4,718,592 || all params: 129,158,400 || trainable%: 3.6533

🚀 Training on cuda...




Step,Training Loss
50,4.0453
100,3.8002
150,3.6357
200,3.5733
250,3.5572
300,3.5279
350,3.5182
400,3.5264
450,3.53
500,3.4946



Saving to models/shadow_lira/shadow_18
✓ Copied keep.npy
✅ Shadow 18 complete

TRAINING SHADOW 19


Filter: 100%|██████████| 10979/10979 [00:00<00:00, 497603.94 examples/s]

✓ Keep array: 979 test samples IN
Training samples: 10,979



Map: 100%|██████████| 10979/10979 [00:01<00:00, 5627.75 examples/s]


Loading gpt2...
Applying LoRA (r=32)...


  trainer = Trainer(
The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'pad_token_id': 50256}.


trainable params: 4,718,592 || all params: 129,158,400 || trainable%: 3.6533

🚀 Training on cuda...




Step,Training Loss
50,4.0687
100,3.755
150,3.6578
200,3.5953
250,3.6164
300,3.5154
350,3.5129
400,3.5445
450,3.5279
500,3.5151



Saving to models/shadow_lira/shadow_19
✓ Copied keep.npy
✅ Shadow 19 complete

TRAINING SHADOW 20


Filter: 100%|██████████| 10987/10987 [00:00<00:00, 419102.35 examples/s]

✓ Keep array: 987 test samples IN
Training samples: 10,987



Map: 100%|██████████| 10987/10987 [00:00<00:00, -12676.13 examples/s]


Loading gpt2...
Applying LoRA (r=32)...


  trainer = Trainer(
The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'pad_token_id': 50256}.


trainable params: 4,718,592 || all params: 129,158,400 || trainable%: 3.6533

🚀 Training on cuda...




Step,Training Loss
50,4.0345
100,3.8175
150,3.6751
200,3.5638
250,3.5476
300,3.5545
350,3.5328
400,3.4941
450,3.5308
500,3.5058



Saving to models/shadow_lira/shadow_20
✓ Copied keep.npy
✅ Shadow 20 complete

TRAINING SHADOW 21


Filter: 100%|██████████| 10996/10996 [00:00<00:00, 470052.05 examples/s]

✓ Keep array: 996 test samples IN
Training samples: 10,996



Map: 100%|██████████| 10996/10996 [00:02<00:00, 5381.48 examples/s]


Loading gpt2...
Applying LoRA (r=32)...


  trainer = Trainer(
The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'pad_token_id': 50256}.


trainable params: 4,718,592 || all params: 129,158,400 || trainable%: 3.6533

🚀 Training on cuda...




Step,Training Loss
50,4.0431
100,3.7996
150,3.655
200,3.6069
250,3.5535
300,3.5349
350,3.5006
400,3.5031
450,3.5067
500,3.5085



Saving to models/shadow_lira/shadow_21
‚úì Copied keep.npy
‚úÖ Shadow 21 complete

TRAINING SHADOW 22


Filter: 100%|██████████| 10977/10977 [00:00<00:00, 499884.64 examples/s]

✓ Keep array: 977 test samples IN
Training samples: 10,977

Map: 100%|██████████| 10977/10977 [00:01<00:00, 5556.45 examples/s]


Loading gpt2...
Applying LoRA (r=32)...


trainable params: 4,718,592 || all params: 129,158,400 || trainable%: 3.6533

🚀 Training on cuda...


Step,Training Loss
50,4.081
100,3.7551
150,3.6859
200,3.5955
250,3.584
300,3.5146
350,3.5212
400,3.5207
450,3.5227
500,3.4706


Saving to models/shadow_lira/shadow_22
✓ Copied keep.npy
✅ Shadow 22 complete

TRAINING SHADOW 23


Filter: 100%|██████████| 11011/11011 [00:00<00:00, 516154.96 examples/s]

✓ Keep array: 1011 test samples IN
Training samples: 11,011

Map: 100%|██████████| 11011/11011 [00:02<00:00, 5488.16 examples/s]


Loading gpt2...
Applying LoRA (r=32)...


trainable params: 4,718,592 || all params: 129,158,400 || trainable%: 3.6533

🚀 Training on cuda...


Step,Training Loss
50,4.0263
100,3.8013
150,3.6403
200,3.5959
250,3.5881
300,3.5329
350,3.5283
400,3.5848
450,3.4973
500,3.5012


Saving to models/shadow_lira/shadow_23
✓ Copied keep.npy
✅ Shadow 23 complete

TRAINING SHADOW 24


Filter: 100%|██████████| 11029/11029 [00:00<00:00, 518319.50 examples/s]

✓ Keep array: 1029 test samples IN
Training samples: 11,029

Map: 100%|██████████| 11029/11029 [00:01<00:00, 5592.57 examples/s]


Loading gpt2...
Applying LoRA (r=32)...
trainable params: 4,718,592 || all params: 129,158,400 || trainable%: 3.6533


🚀 Training on cuda...


Step,Training Loss
50,4.0284
100,3.7502
150,3.6268
200,3.5943
250,3.5593
300,3.5473
350,3.5293
400,3.5236
450,3.5431
500,3.5371


Saving to models/shadow_lira/shadow_24
✓ Copied keep.npy
✅ Shadow 24 complete

TRAINING SHADOW 25


Filter: 100%|██████████| 10997/10997 [00:00<00:00, 516058.15 examples/s]

✓ Keep array: 997 test samples IN
Training samples: 10,997

Map: 100%|██████████| 10997/10997 [00:02<00:00, 5383.74 examples/s]


Loading gpt2...
Applying LoRA (r=32)...
trainable params: 4,718,592 || all params: 129,158,400 || trainable%: 3.6533


🚀 Training on cuda...


Step,Training Loss
50,4.0486
100,3.7862
150,3.7007
200,3.5803
250,3.5598
300,3.5607
350,3.5382
400,3.5234
450,3.5381
500,3.5229


Saving to models/shadow_lira/shadow_25
✓ Copied keep.npy
✅ Shadow 25 complete

TRAINING SHADOW 26


Filter: 100%|██████████| 11018/11018 [00:00<00:00, 419559.87 examples/s]

✓ Keep array: 1018 test samples IN
Training samples: 11,018

Map: 100%|██████████| 11018/11018 [00:01<00:00, 5569.88 examples/s]


Loading gpt2...
Applying LoRA (r=32)...


trainable params: 4,718,592 || all params: 129,158,400 || trainable%: 3.6533

🚀 Training on cuda...


Step,Training Loss
50,4.0353
100,3.7722
150,3.6523
200,3.6039
250,3.5773
300,3.5667
350,3.5141
400,3.5336
450,3.5059
500,3.5417


Saving to models/shadow_lira/shadow_26
✓ Copied keep.npy
✅ Shadow 26 complete

TRAINING SHADOW 27


Filter: 100%|██████████| 10998/10998 [00:00<00:00, 514046.11 examples/s]

✓ Keep array: 998 test samples IN
Training samples: 10,998

Map: 100%|██████████| 10998/10998 [00:01<00:00, 5586.98 examples/s]


Loading gpt2...
Applying LoRA (r=32)...
trainable params: 4,718,592 || all params: 129,158,400 || trainable%: 3.6533


🚀 Training on cuda...


Step,Training Loss
50,4.0671
100,3.7758
150,3.6711
200,3.6232
250,3.554
300,3.5803
350,3.574
400,3.5149
450,3.5547
500,3.4749


Saving to models/shadow_lira/shadow_27
✓ Copied keep.npy
✅ Shadow 27 complete

TRAINING SHADOW 28


Filter: 100%|██████████| 11014/11014 [00:00<00:00, 520038.55 examples/s]

✓ Keep array: 1014 test samples IN
Training samples: 11,014

Map: 100%|██████████| 11014/11014 [00:01<00:00, 5566.04 examples/s]


Loading gpt2...
Applying LoRA (r=32)...


trainable params: 4,718,592 || all params: 129,158,400 || trainable%: 3.6533

🚀 Training on cuda...


Step,Training Loss
50,4.0571
100,3.789
150,3.6816
200,3.5878
250,3.5759
300,3.5196
350,3.5203
400,3.5409
450,3.5314
500,3.514


Saving to models/shadow_lira/shadow_28
✓ Copied keep.npy
✅ Shadow 28 complete

TRAINING SHADOW 29


Filter: 100%|██████████| 10964/10964 [00:00<00:00, 511852.33 examples/s]

✓ Keep array: 964 test samples IN
Training samples: 10,964

Map: 100%|██████████| 10964/10964 [00:01<00:00, 5539.71 examples/s]


Loading gpt2...
Applying LoRA (r=32)...
trainable params: 4,718,592 || all params: 129,158,400 || trainable%: 3.6533


🚀 Training on cuda...


Step,Training Loss
50,3.9973
100,3.8233
150,3.6351
200,3.6129
250,3.5282
300,3.5337
350,3.5437
400,3.4969
450,3.5393
500,3.4994


Saving to models/shadow_lira/shadow_29
✓ Copied keep.npy
✅ Shadow 29 complete

TRAINING SHADOW 30


Filter: 100%|██████████| 11003/11003 [00:00<00:00, 483037.93 examples/s]

✓ Keep array: 1003 test samples IN
Training samples: 11,003

Map: 100%|██████████| 11003/11003 [00:02<00:00, 5414.87 examples/s]


Loading gpt2...
Applying LoRA (r=32)...
trainable params: 4,718,592 || all params: 129,158,400 || trainable%: 3.6533

🚀 Training on cuda...


Step,Training Loss
50,4.0626
100,3.7925
150,3.646
200,3.5906
250,3.5984
300,3.5145
350,3.5267
400,3.5506
450,3.4882
500,3.5284


Saving to models/shadow_lira/shadow_30
✓ Copied keep.npy
✅ Shadow 30 complete

TRAINING SHADOW 31


Filter: 100%|██████████| 11003/11003 [00:00<00:00, 334832.23 examples/s]

✓ Keep array: 1003 test samples IN
Training samples: 11,003

Map: 100%|██████████| 11003/11003 [00:02<00:00, 4371.53 examples/s]


Loading gpt2...
Applying LoRA (r=32)...
trainable params: 4,718,592 || all params: 129,158,400 || trainable%: 3.6533

🚀 Training on cuda...


Step,Training Loss
50,4.0475
100,3.8064
150,3.6041
200,3.6233
250,3.5838
300,3.5555
350,3.5438
400,3.5071
450,3.5261
500,3.5281


Saving to models/shadow_lira/shadow_31
✓ Copied keep.npy
✅ Shadow 31 complete

TRAINING COMPLETE
✅ Trained: 28 [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]
⏭️  Skipped: 4 [0, 1, 2, 3]
❌ Failed: 0 []

🎉 All 32 shadows ready for LiRA!
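
The training log can fill up with repeated `huggingface/tokenizers` fork warnings. The warning text itself suggests setting `TOKENIZERS_PARALLELISM` explicitly. A minimal sketch, assuming it is placed in the first cell so that it runs before any tokenizer is used:

```python
import os

# Set this before any tokenizer is used, so that worker processes forked
# later (e.g. by a DataLoader) do not trigger the fork warning.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```

Setting the value to `"false"` turns off the tokenizers library's internal parallelism; whether that noticeably slows tokenization for this dataset has not been measured here.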


## Part 3: Verification

In [22]:
# Cell 14: Verify Setup

print("="*70)
print("VERIFICATION")
print("="*70)

test_file = Path(SHADOW_DATA_DIR) / "test.json"
keep_matrix_file = Path(SHADOW_DATA_DIR) / "keep_matrix.npy"

print(f"\n📁 Shared files:")
if test_file.exists():
    with open(test_file) as f:
        print(f"  ✓ test.json: {len(json.load(f))} samples")
else:
    print(f"  ❌ test.json: MISSING")

if keep_matrix_file.exists():
    M = np.load(keep_matrix_file)
    print(f"  ✓ keep_matrix.npy: {M.shape}")
else:
    print(f"  ❌ keep_matrix.npy: MISSING")

print(f"\n📁 Datasets:")
for i in range(NUM_SHADOW_MODELS):
    d = Path(SHADOW_DATA_DIR) / f"shadow_{i}"
    if (d / "train_finetune.json").exists():
        with open(d / "train_finetune.json") as f:
            cnt = len(json.load(f))
        keep_arr = np.load(d / "keep.npy")
        print(f"  ✓ shadow_{i}: {cnt:,} train, {keep_arr.sum()} IN")
    else:
        print(f"  ❌ shadow_{i}: MISSING")

print(f"\n🤖 Models:")
for i in range(NUM_SHADOW_MODELS):
    m = Path(SHADOW_MODEL_DIR) / f"shadow_{i}"
    if (m / "adapter_model.safetensors").exists():
        size_mb = (m / "adapter_model.safetensors").stat().st_size / 1024 / 1024
        has_keep = (m / "keep.npy").exists()
        print(f"  ✓ shadow_{i}: {size_mb:.1f} MB, keep: {has_keep}")
    else:
        print(f"  ❌ shadow_{i}: NOT TRAINED")

print("="*70)

VERIFICATION

📁 Shared files:
  ✓ test.json: 2000 samples
  ✓ keep_matrix.npy: (2000, 32)

📁 Datasets:
  ✓ shadow_0: 10,988 train, 988 IN
  ✓ shadow_1: 10,984 train, 984 IN
  ✓ shadow_2: 11,010 train, 1010 IN
  ✓ shadow_3: 11,018 train, 1018 IN
  ✓ shadow_4: 11,046 train, 1046 IN
  ✓ shadow_5: 10,993 train, 993 IN
  ✓ shadow_6: 10,974 train, 974 IN
  ✓ shadow_7: 10,993 train, 993 IN
  ✓ shadow_8: 11,002 train, 1002 IN
  ✓ shadow_9: 10,985 train, 985 IN
  ✓ shadow_10: 11,024 train, 1024 IN
  ✓ shadow_11: 10,970 train, 970 IN
  ✓ shadow_12: 11,026 train, 1026 IN
  ✓ shadow_13: 11,018 train, 1018 IN
  ✓ shadow_14: 10,992 train, 992 IN
  ✓ shadow_15: 11,002 train, 1002 IN
  ✓ shadow_16: 10,940 train, 940 IN
  ✓ shadow_17: 11,016 train, 1016 IN
  ✓ shadow_18: 11,022 train, 1022 IN
  ✓ shadow_19: 10,979 train, 979 IN
  ✓ shadow_20: 10,987 train, 987 IN
  ✓ shadow_21: 10,996 train, 996 IN
  ✓ shadow_22: 10,977 train, 977 IN
  ✓ shadow_
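
The per-shadow IN counts in the verification output hover around 1,000 rather than being identical. That is what a per-example split produces: the balance is enforced per test example (16 IN, 16 OUT), not per shadow. A minimal sketch of that check follows. It uses a synthetic stand-in matrix built by a hypothetical `make_balanced_keep_matrix` helper, because the notebook's own generation code is not shown in this section; the construction is an assumption.

```python
import numpy as np

def make_balanced_keep_matrix(num_examples, num_shadows, seed=0):
    """Hypothetical helper: every row gets exactly num_shadows // 2 ones."""
    rng = np.random.default_rng(seed)
    keep = np.zeros((num_examples, num_shadows), dtype=int)
    for j in range(num_examples):
        in_shadows = rng.choice(num_shadows, size=num_shadows // 2, replace=False)
        keep[j, in_shadows] = 1
    return keep

M_demo = make_balanced_keep_matrix(2000, 32)

# Per-example balance: every row must contain exactly 16 IN shadows.
per_example_in = M_demo.sum(axis=1)
assert (per_example_in == 16).all()

# Per-shadow counts vary around 1000 because the assignment is random per example.
per_shadow_in = M_demo.sum(axis=0)
print(per_shadow_in.min(), per_shadow_in.max(), per_shadow_in.sum())
```

To run the same check on the real data, replace `M_demo` with `np.load(keep_matrix_file)`.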

## How to Use for LiRA Attack

**Per-Example LiRA Algorithm with 32 Shadows:**

```python
import numpy as np

# 1. Load membership matrix M
M = np.load("./data/shadow_datasets_lira/keep_matrix.npy")  # shape: (2000, 32)

# 2. Compute losses for all models on all test samples
# NOTE: compute_loss, shadow_models, target_model and test_samples are
# placeholders for the loss function, loaded models and data defined elsewhere.
NUM_SHADOW_MODELS = 32
TEST_SIZE = M.shape[0]  # 2000 candidate rows
shadow_losses = np.zeros((NUM_SHADOW_MODELS, TEST_SIZE))
for s in range(NUM_SHADOW_MODELS):
    for j in range(TEST_SIZE):
        shadow_losses[s, j] = compute_loss(shadow_models[s], test_samples[j])

target_losses = np.array([compute_loss(target_model, test_samples[j]) for j in range(TEST_SIZE)])

# 3. For each test example j, compute the LiRA score
from scipy.stats import norm

lira_scores = np.zeros(TEST_SIZE)

for j in range(TEST_SIZE):
    # Get IN and OUT shadows for this specific example
    in_mask = (M[j, :] == 1)   # 16 shadows that trained on x_j
    out_mask = (M[j, :] == 0)  # 16 shadows that didn't train on x_j

    # Extract losses
    in_losses = shadow_losses[in_mask, j]    # 16 values
    out_losses = shadow_losses[out_mask, j]  # 16 values

    # Fit a Gaussian to each group for THIS example (epsilon avoids a zero std)
    in_mean, in_std = in_losses.mean(), in_losses.std() + 1e-9
    out_mean, out_std = out_losses.mean(), out_losses.std() + 1e-9

    # LiRA score: log-likelihood ratio of the target's loss under IN vs OUT
    # (higher = more likely a member)
    lira_scores[j] = (norm.logpdf(target_losses[j], in_mean, in_std)
                      - norm.logpdf(target_losses[j], out_mean, out_std))

# 4. Evaluate
from sklearn.metrics import roc_auc_score, roc_curve
labels = load_labels()  # 1 = member, 0 = non-member

auc = roc_auc_score(labels, lira_scores)
fpr, tpr, _ = roc_curve(labels, lira_scores)
tpr_at_1_fpr = tpr[np.where(fpr <= 0.01)[0][-1]]

print(f"AUC: {auc:.4f}")
print(f"TPR @ 1% FPR: {tpr_at_1_fpr:.4f}")
```
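
The scoring and evaluation logic can be sanity-checked on synthetic losses before any trained model is involved. The sketch below is a toy under stated assumptions: the loss values, the 0.2 shift between IN and OUT losses, and the helper names `lira_scores_gaussian` and `tpr_at_fpr` are all made up for illustration. It scores each example by the Gaussian log-likelihood ratio of the target loss under the IN and OUT shadow distributions. To stay NumPy-only, it writes the log-density by hand and picks the threshold as a quantile of the non-member scores, instead of calling `scipy` or `roc_curve`.

```python
import numpy as np

def gaussian_logpdf(x, mean, std):
    """Log-density of N(mean, std^2) evaluated at x."""
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2 * np.pi)

def lira_scores_gaussian(shadow_losses, keep, target_losses, eps=1e-9):
    """shadow_losses: (S, N); keep: (N, S) with 1 = IN; target_losses: (N,)."""
    in_mask = keep.T.astype(bool)                          # (S, N)
    in_vals = np.where(in_mask, shadow_losses, np.nan)     # OUT entries masked
    out_vals = np.where(~in_mask, shadow_losses, np.nan)   # IN entries masked
    in_mean, in_std = np.nanmean(in_vals, axis=0), np.nanstd(in_vals, axis=0) + eps
    out_mean, out_std = np.nanmean(out_vals, axis=0), np.nanstd(out_vals, axis=0) + eps
    # Higher score = target loss is better explained by the IN distribution.
    return (gaussian_logpdf(target_losses, in_mean, in_std)
            - gaussian_logpdf(target_losses, out_mean, out_std))

def tpr_at_fpr(labels, scores, max_fpr=0.01):
    """TPR at a threshold set to the (1 - max_fpr) quantile of non-member scores."""
    threshold = np.quantile(scores[labels == 0], 1.0 - max_fpr)
    return float((scores[labels == 1] > threshold).mean())

# Synthetic setup (all values are illustrative assumptions).
rng = np.random.default_rng(0)
S, N, shift = 32, 2000, 0.2
keep = np.zeros((N, S), dtype=int)
for j in range(N):
    keep[j, rng.choice(S, size=S // 2, replace=False)] = 1

base = rng.normal(3.5, 0.3, size=N)   # per-example difficulty
shadow_losses = base + rng.normal(0, 0.1, size=(S, N)) - shift * keep.T
labels = rng.integers(0, 2, size=N)   # 1 = member of the target's training set
target_losses = base + rng.normal(0, 0.1, size=N) - shift * labels

scores = lira_scores_gaussian(shadow_losses, keep, target_losses)
print(f"TPR @ 1% FPR (synthetic): {tpr_at_fpr(labels, scores):.4f}")
```

Because the synthetic IN and OUT losses are well separated by construction, the number printed here says nothing about the real attack; it only confirms that the score points in the right direction.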

**Key Improvements with 32 Shadows:**
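- Each example has 16 IN losses and 16 OUT losses (vs. 2 + 2 with 4 shadows)
- The per-example mean and standard deviation estimates are correspondingly more reliable
- Expected: **15-30% TPR @ 1% FPR** (vs. 0.9% with 4 shadows)
- The attack is still limited by distribution mismatch (WikiText vs. WikiText + train_finetune), but should perform significantly better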
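
The effect of going from 2 to 16 losses per group can be illustrated with a quick simulation. It assumes, purely for illustration, that the shadow losses for one example are i.i.d. Gaussian; the values 3.5 and 0.1 are arbitrary, not measured.

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean, true_std, trials = 3.5, 0.1, 10_000

spread = {}
for n in (2, 16):
    # Each row is one simulated set of n shadow losses for a single example.
    samples = rng.normal(true_mean, true_std, size=(trials, n))
    mean_spread = samples.mean(axis=1).std()  # variability of the estimated mean
    std_spread = samples.std(axis=1).std()    # variability of the estimated std
    spread[n] = (mean_spread, std_spread)
    print(f"n={n:2d}: mean spread={mean_spread:.4f}, std spread={std_spread:.4f}")
```

Under this assumption the standard error of the mean scales as σ/√n, so moving from 2 to 16 samples shrinks it by a factor of √8 ≈ 2.8.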

**Next Step:**
Update `MIA_LiRA.ipynb` Cell 2: Change `NUM_SHADOW_MODELS = 4` to `NUM_SHADOW_MODELS = 32`