# Fine-tune unsloth/Llama-3.2-1B-Instruct on data/trn.json (instruction: DESCRIBE ABOUT THE PRODUCT.)

This notebook reads JSONL data from `data/trn.json`, builds an instruction-tuning dataset with:
- instruction: `"DESCRIBE ABOUT THE PRODUCT."`
- input: `title`
- output: `content`

and performs LoRA fine-tuning of `unsloth/Llama-3.2-1B-Instruct` using TRL + PEFT.

Notes:
- The notebook uses the provided Hugging Face token to authenticate.
- Adjust training hyperparameters (batch sizes, steps) based on your hardware.


In [1]:
# If running in an isolated environment, install dependencies.
%pip -q install --upgrade "unsloth>=2024.08.08" "transformers>=4.43.3" "datasets>=2.20.0" "accelerate>=0.33.0" "peft>=0.11.1" "trl>=0.9.4" "sentencepiece>=0.2.0" "huggingface_hub>=0.24.6" "triton>=2.3.1"


Note: you may need to restart the kernel to use updated packages.


In [2]:
# Load and prepare dataset from data/trn.json (JSONL)
import json, os, random
from datasets import Dataset, DatasetDict

data_path = 'data/trn.json'
assert os.path.exists(data_path), f'File not found: {data_path}'

instructions, inputs, outputs = [], [], []
with open(data_path, 'r', encoding='utf-8') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        obj = json.loads(line)
        instr = 'DESCRIBE ABOUT THE PRODUCT.'
        title = obj.get('title', '') or ''
        content = obj.get('content', '') or ''
        # Skip rows with no meaningful output
        if not title and not content:
            continue
        if not content:
            # If content is empty, you may skip or set a placeholder; we skip to keep target non-empty
            continue
        instructions.append(instr)
        inputs.append(title)
        outputs.append(content)

print(f'Total records loaded: {len(outputs)}')
raw_ds = Dataset.from_dict({'instruction': instructions, 'input': inputs, 'output': outputs})
# Train/validation split
raw_ds = raw_ds.shuffle(seed=42)
if len(raw_ds) > 20:
    ds = raw_ds.train_test_split(test_size=0.05, seed=42)
else:
    ds = {'train': raw_ds, 'test': raw_ds.select(range(0))}
if isinstance(ds, dict):
    ds = DatasetDict(ds)
ds


  from .autonotebook import tqdm as notebook_tqdm


Total records loaded: 1498718


DatasetDict({
    train: Dataset({
        features: ['instruction', 'input', 'output'],
        num_rows: 1423782
    })
    test: Dataset({
        features: ['instruction', 'input', 'output'],
        num_rows: 74936
    })
})

In [3]:
# Build chat-formatted texts for Model
from typing import List, Dict

def make_chat(sample: Dict) -> List[Dict[str, str]]:
    user_content = f"{sample['instruction']}\nTitle: {sample['input']}".strip()
    assistant_content = sample['output']
    return [
        {'role': 'user', 'content': user_content},
        {'role': 'assistant', 'content': assistant_content},
    ]

def format_sample(sample: Dict, tokenizer) -> str:
    messages = make_chat(sample)
    # include_assistant_response=True to include labels; add_generation_prompt=False for training
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
    return text

def to_training_texts(dataset, tokenizer):
    return [format_sample(rec, tokenizer) for rec in dataset]


In [4]:
# Load model and tokenizer
import torch
from unsloth import FastLanguageModel

# Read Hugging Face token from environment if provided (no fallback logic)
HF_TOKEN = "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"

model_id = 'unsloth/Llama-3.2-1B-Instruct'

model, tokenizer = FastLanguageModel.from_pretrained(
            model_name = model_id,
            max_seq_length = 2048,
            dtype=None,
            load_in_4bit=True,
            token=HF_TOKEN,
        )
model.config.use_cache = False  # important for training
# Attach LoRA adapters to enable fine-tuning on a 4-bit quantized model
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=[
        'q_proj','k_proj','v_proj','o_proj',
        'gate_proj','up_proj','down_proj'
    ],
)
print('Model and tokenizer loaded')


🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.10.1: Fast Llama patching. Transformers: 4.56.2.
   \\   /|    NVIDIA GeForce RTX 3060. Num GPUs = 1. Max memory: 11.614 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu128. CUDA: 8.6. CUDA Toolkit: 12.8. Triton: 3.4.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.32.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Unsloth: Dropout = 0 is supported for fast patching. You are using dropout = 0.05.
Unsloth will patch all other layers, except LoRA matrices, causing a performance hit.
Unsloth 2025.10.1 patched 16 layers with 0 QKV layers, 0 O layers and 0 MLP layers.


Model and tokenizer loaded


In [5]:
# Prepare training dataset with tokenization
max_length = 1024

def tokenize_function(examples):
    texts = []
    for instr, inp, out in zip(examples['instruction'], examples['input'], examples['output']):
        sample = {'instruction': instr, 'input': inp, 'output': out}
        text = tokenizer.apply_chat_template(
            make_chat(sample), tokenize=False, add_generation_prompt=False
        )
        texts.append(text)
    tok = tokenizer(
        texts,
        truncation=True,
        max_length=max_length,
        padding=False,
        return_tensors=None
    )
    # Labels are the same as input_ids for causal LM training
    tok['labels'] = tok['input_ids'].copy()
    return tok

tokenized = ds.map(tokenize_function, batched=True, remove_columns=ds['train'].column_names)
tokenized


Map: 100%|██████████| 1423782/1423782 [03:23<00:00, 6999.53 examples/s]
Map: 100%|██████████| 74936/74936 [00:10<00:00, 7325.54 examples/s]


DatasetDict({
    train: Dataset({
        features: ['input_ids', 'attention_mask', 'labels'],
        num_rows: 1423782
    })
    test: Dataset({
        features: ['input_ids', 'attention_mask', 'labels'],
        num_rows: 74936
    })
})

In [6]:
# Running prompts before training (baseline with base model, no adapters)
# We load a fresh base model to get a true pre-training baseline.
from unsloth import FastLanguageModel as _FLM_BASELINE

_base_pre, _tok_pre = _FLM_BASELINE.from_pretrained(
    model_name = model_id,
    max_seq_length = 2048,
    dtype=None,
    load_in_4bit=True,
    token=HF_TOKEN,
)
_base_pre.config.use_cache = True


def generate_description_baseline(title: str, max_new_tokens: int = 128):
    messages = [
        {
            "role": "system",
            "content": "You are a product description assistant powered by LLM Model Version: unsloth/Llama-3.2-1B-Instruct.\n\nContext and training: You were fine-tuned on a large corpus of product titles and descriptions paired with real user searches and user engagement signals. You do not have browsing or real-time access to external data.\n\nTask: When the user provides a product title, return a concise, accurate description of the likely product and state the source of your information. Do not invent specifications, claims, or URLs. If details are uncertain or unavailable, keep the description generic and clearly state the limitation in the Source line.\n\nLanguage: Write in the same language used by the user message. If unclear, default to the language of the provided title.\n\nOutput format (must be exact, no extra lines, no Markdown, no emojis):\nTITLE: <exactly the title provided by the user>\nDescription: <concise, factual description, maximum 2048 characters>\nSource: <brief origin of information; examples: \"training data patterns/model knowledge\", \"inferred from the title and common product descriptions; training data patterns/model knowledge\">\n\nStyle and content rules:\n- Be factual, neutral, and concise. Prefer short sentences. Avoid marketing hype.\n- Base content on common product characteristics inferable from the title and patterns learned during training.\n- Do not fabricate specific model numbers, dimensions, certifications, warranties, prices, or URLs unless explicitly present in the title or user message.\n- If the title is ambiguous (e.g., multiple product types share the name), describe the most common interpretation and reflect uncertainty in the Source line.\n- Never include additional headings, bullets, disclaimers, or formatting beyond the required three lines.\n- Ensure the Description stays within 2048 characters; truncate gracefully."
            },
        {'role': 'user', 'content': f"DESCRIBE ABOUT THE PRODUCT Title: {title}"}
    ]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(prompt, return_tensors='pt').to(_base_pre.device)
    with torch.no_grad():
        out = _base_pre.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            top_p=0.9,
            temperature=0.7,
            eos_token_id=tokenizer.eos_token_id,
        )
    text = tokenizer.decode(out[0], skip_special_tokens=True)
    return text

print(generate_description_baseline('The Book of Revelation'))
print(generate_description_baseline('Girls Ballet Tutu Neon Pink'))


==((====))==  Unsloth 2025.10.1: Fast Llama patching. Transformers: 4.56.2.
   \\   /|    NVIDIA GeForce RTX 3060. Num GPUs = 1. Max memory: 11.614 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu128. CUDA: 8.6. CUDA Toolkit: 12.8. Triton: 3.4.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.32.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
system

Cutting Knowledge Date: December 2023
Today Date: 06 Oct 2025

You are a product description assistant powered by LLM Model Version: unsloth/Llama-3.2-1B-Instruct.

Context and training: You were fine-tuned on a large corpus of product titles and descriptions paired with real user searches and user engagement signals. You do not have browsing or real-time access to external data.

Task: When the user provides a product title, return a concise, accurate description of the likely product and state the source of your i

In [7]:
# Configure LoRA and trainer
from trl import SFTTrainer
from transformers import TrainingArguments
import os

output_dir = 'outputs/llama-3.2-1b-lora'
os.makedirs(output_dir, exist_ok=True)

train_batch_size = 16
gradient_accumulation = 2
warmup_steps = 10
num_epochs = 3
learning_rate = 3e-5
logging_steps = 1
save_steps = 100
max_steps = 200

def has_test(ds):
    try:
        return len(ds['test']) > 0
    except Exception:
        return False

training_args = TrainingArguments(
    per_device_train_batch_size=train_batch_size,
    gradient_accumulation_steps=gradient_accumulation,
    num_train_epochs=num_epochs,
    learning_rate=learning_rate,
    logging_steps=logging_steps,
    max_steps=max_steps,
    warmup_steps=warmup_steps,
    save_steps=save_steps,
    fp16 = False,
    bf16 = True,
    optim='paged_adamw_8bit',
    lr_scheduler_type = 'cosine',
    output_dir=output_dir,
    seed=42,
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=tokenized['train'],
    eval_dataset=tokenized['test'] if has_test(tokenized) else None,
    args=training_args,
    packing=True,  # pack multiple samples per sequence to utilize context
    max_seq_length=max_length,
)
trainer.model.print_trainable_parameters()
print('Trainer ready')


trainable params: 11,272,192 || all params: 1,247,086,592 || trainable%: 0.9039
Trainer ready


In [8]:
# Train
train_result = trainer.train()
trainer.save_state()
trainer.save_model(output_dir)  # saves adapters if PEFT is used
print('Training complete. Artifacts saved to', output_dir)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 1,423,782 | Num Epochs = 1 | Total steps = 200
O^O/ \_/ \    Batch size per device = 16 | Gradient accumulation steps = 2
\        /    Data Parallel GPUs = 1 | Total batch size (16 x 2 x 1) = 32
 "-____-"     Trainable parameters = 11,272,192 of 1,247,086,592 (0.90% trained)


Unsloth: Will smartly offload gradients to save VRAM!


Step,Training Loss
1,4.4693
2,3.8968
3,4.4122
4,4.2629
5,4.1814
6,4.0653
7,4.0707
8,4.0976
9,3.8648
10,3.7402


Training complete. Artifacts saved to outputs/llama-3.2-1b-lora


In [25]:
# Inference: load base + adapters and generate for a sample title

from transformers import pipeline
from peft import PeftModel

base_model, _ = FastLanguageModel.from_pretrained(
    model_name = model_id,
    max_seq_length = 2048,
    dtype=None,
    load_in_4bit=True,
    token=HF_TOKEN,
)

base_model.config.use_cache = True
adapted = PeftModel.from_pretrained(base_model, output_dir)
adapted.eval()

def generate_description(title: str, max_new_tokens: int = 128):
    messages = [
        {
            "role": "system",
            "content": "You are a product description assistant powered by LLM Model Version: unsloth/Llama-3.2-1B-Instruct.\n\nContext and training: You were fine-tuned on a large corpus of product titles and descriptions paired with real user searches and user engagement signals. You do not have browsing or real-time access to external data.\n\nTask: When the user provides a product title, return a concise, accurate description of the likely product and state the source of your information. Do not invent specifications, claims, or URLs. If details are uncertain or unavailable, keep the description generic and clearly state the limitation in the Source line.\n\nLanguage: Write in the same language used by the user message. If unclear, default to the language of the provided title.\n\nOutput format (must be exact, no extra lines, no Markdown, no emojis):\nTITLE: <exactly the title provided by the user>\nDescription: <concise, factual description, maximum 2048 characters>\nSource: <brief origin of information; must answer INTERNAL_DATA when you don't know; examples: \"training data patterns/model knowledge\", \"inferred from the title and common product descriptions; training data patterns/model knowledge\">\n\nStyle and content rules:\n- Be factual, neutral, and concise. Prefer short sentences. Avoid marketing hype.\n- Base content on common product characteristics inferable from the title and patterns learned during training.\n- Do not fabricate specific model numbers, dimensions, certifications, warranties, prices, or URLs unless explicitly present in the title or user message.\n- If the title is ambiguous (e.g., multiple product types share the name), describe the most common interpretation and reflect uncertainty in the Source line.\n- Never include additional headings, bullets, disclaimers, or formatting beyond the required three lines.\n- Ensure the Description stays within 2048 characters; truncate gracefully."
            },
        {'role': 'user', 'content': f"DESCRIBE ABOUT THE PRODUCT Title: {title}"}

    ]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    token_inputs = tokenizer(prompt, return_tensors='pt').to(adapted.device)
    with torch.no_grad():
        out = adapted.generate(**token_inputs, max_new_tokens=max_new_tokens, do_sample=True, top_p=0.9, temperature=0.7, eos_token_id=tokenizer.eos_token_id)
    text = tokenizer.decode(out[0], skip_special_tokens=True)
    # Heuristic: return only the assistant's part after the generation prompt
    answer = text.split('assistant')[-1].strip() if 'assistant' in text else text
    return answer

print(generate_description('The Los Angeles Diaries: A Memoir'))




==((====))==  Unsloth 2025.10.1: Fast Llama patching. Transformers: 4.56.2.
   \\   /|    NVIDIA GeForce RTX 3060. Num GPUs = 1. Max memory: 11.614 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu128. CUDA: 8.6. CUDA Toolkit: 12.8. Triton: 3.4.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.32.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
The Los Angeles Diaries: A Memoir


## Tips
- If you encounter memory issues, lower `max_length`, increase `gradient_accumulation_steps`, or enable 4-bit quantization.
- You can push the adapter to the Hub by calling `trainer.push_to_hub()` with a repo name and using your token.
- The dataset includes many records with empty `content`; this notebook skips them to ensure non-empty targets.
