## Import Required Libraries
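The next cell imports the libraries used for training: PyTorch, Hugging Face Datasets and Transformers (model, tokenizer, 4-bit quantization config, training arguments, and data collator), PEFT (for LoRA), and TRL (for `SFTTrainer`, used for supervised fine-tuning).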

In [2]:
import torch
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    BitsAndBytesConfig,
    TrainingArguments,
    DataCollatorForLanguageModeling
)
from peft import LoraConfig, get_peft_model
from trl import SFTTrainer

## Set Configuration Parameters
The next cell sets up all configuration parameters including model name, dataset path, training hyperparameters (batch size, epochs, sequence length), and output directory.

In [3]:
MODEL_NAME = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
DATASET_PATH = "dataset/train.json"  # forward slashes also work on Windows
OUTPUT_DIR = "adapters"
MAX_SEQ_LENGTH = 256
BATCH_SIZE = 1
EPOCHS = 3

model_name = MODEL_NAME

## Check GPU Availability and Setup
The next cell checks that a CUDA GPU is available, prints its name, and selects GPU 0. If no GPU is found it raises an error, because the model is loaded onto `cuda:0` later in the notebook.

In [4]:

cuda_available = torch.cuda.is_available()
device_count = torch.cuda.device_count()

print(f"CUDA Available: {cuda_available}")
print(f"GPU Count: {device_count}")

if cuda_available:
    print(f"Current GPU: {torch.cuda.get_device_name(0)}")
    torch.cuda.set_device(0)
    print("GPU Selected: cuda:0")
else:
    raise RuntimeError("GPU not available! Please ensure CUDA is properly installed.")


CUDA Available: True
GPU Count: 1
Current GPU: NVIDIA GeForce GTX 1650
GPU Selected: cuda:0


## Load Training Dataset
The next cell loads the training dataset from the JSON file at `DATASET_PATH`. Each record must provide `instruction`, `input`, and `output` fields, which the formatting step uses.

In [5]:
print("\nLoading dataset...")
dataset = load_dataset("json", data_files=DATASET_PATH, split="train")
print(f"Dataset loaded: {len(dataset)} samples")


Loading dataset...
Dataset loaded: 5000 samples
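
JSON Lines (one JSON object per line) is one layout that `load_dataset("json", ...)` accepts. The sketch below writes a tiny file in that layout and checks that every record has the three keys `format_prompt` reads, using only the standard library. The file name `sample_train.jsonl` and both records are hypothetical placeholders, not taken from the real dataset.

```python
import json
from pathlib import Path

REQUIRED_KEYS = {"instruction", "input", "output"}

# Hypothetical records; the real dataset's content is not shown in this notebook.
records = [
    {"instruction": "Translate to French.", "input": "Good morning", "output": "Bonjour"},
    {"instruction": "Name the capital of Japan.", "input": "", "output": "Tokyo"},
]

def write_jsonl(path: Path, rows: list[dict]) -> None:
    """Write one JSON object per line (JSON Lines)."""
    with path.open("w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")

def check_jsonl(path: Path) -> int:
    """Return the number of records, raising ValueError if a record lacks a required key."""
    count = 0
    with path.open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            row = json.loads(line)
            missing = REQUIRED_KEYS - set(row)
            if missing:
                raise ValueError(f"line {line_no}: missing {sorted(missing)}")
            count += 1
    return count

sample_path = Path("sample_train.jsonl")
write_jsonl(sample_path, records)
print(check_jsonl(sample_path))  # prints 2
```

Running a check like this before training surfaces a missing key as a clear error instead of a `KeyError` inside `dataset.map`.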


## Format Training Data
The next cell adds a `text` field to each example that combines the instruction, input, and output into one prompt with `### Instruction:`, `### Input:`, and `### Response:` sections.

In [6]:
def format_prompt(example):
    """Format examples into standardized prompt structure."""
    example["text"] = (
        "### Instruction:\n"
        f"{example['instruction']}\n\n"
        "### Input:\n"
        f"{example['input']}\n\n"
        "### Response:\n"
        f"{example['output']}"
    )
    return example

dataset = dataset.map(format_prompt)
print("Dataset formatting completed")

Dataset formatting completed
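
To see exactly what the model is trained on, the sketch below re-declares `format_prompt` from the cell above and applies it to one hypothetical record whose `input` is empty. The empty field still produces a `### Input:` header followed by blank lines.

```python
def format_prompt(example):
    """Same template as the notebook cell above."""
    example["text"] = (
        "### Instruction:\n"
        f"{example['instruction']}\n\n"
        "### Input:\n"
        f"{example['input']}\n\n"
        "### Response:\n"
        f"{example['output']}"
    )
    return example

# Hypothetical record; not taken from the real training data.
sample = {"instruction": "Name the capital of Japan.", "input": "", "output": "Tokyo"}
print(format_prompt(dict(sample))["text"])
```

Note that the template ends right after the response text; this function does not append any end-of-sequence marker.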


## Initialize Tokenizer
The next cell loads the model's tokenizer and configures padding: the EOS token doubles as the pad token, and sequences are padded on the right.

In [7]:
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # reuse EOS as the padding token
tokenizer.padding_side = "right"           # pad at the end of each sequence
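
Setting `pad_token` to `eos_token` has a side effect. With `mlm=False`, `DataCollatorForLanguageModeling` (used in the trainer cell) copies `input_ids` into `labels` and replaces every position whose id equals `tokenizer.pad_token_id` with `-100`, which the loss ignores. When pad and EOS share an id, a genuine EOS token is masked out of the loss too. The pure-Python sketch below is a simplified model of that masking with right padding, not the collator's actual code; the token ids are made up except the pad id 2, which matches the `pad_token_id` reported in the training log.

```python
IGNORE_INDEX = -100  # label value ignored by PyTorch's cross-entropy loss by default

def collate_causal_lm(batch, pad_token_id):
    """Simplified sketch of causal-LM collation with right padding.

    Pads every sequence to the longest one with pad_token_id, copies
    input_ids to labels, then masks every position equal to pad_token_id.
    """
    max_len = max(len(ids) for ids in batch)
    input_ids, attention_mask, labels = [], [], []
    for ids in batch:
        n_pad = max_len - len(ids)
        padded = ids + [pad_token_id] * n_pad  # right padding
        input_ids.append(padded)
        attention_mask.append([1] * len(ids) + [0] * n_pad)
        labels.append([IGNORE_INDEX if t == pad_token_id else t for t in padded])
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}

PAD = 2  # pad_token_id from the training log; also the EOS id here
batch = [[1, 450, 901, 2], [1, 773]]  # hypothetical ids; the first sequence ends in EOS
out = collate_causal_lm(batch, PAD)
print(out["labels"])  # [[1, 450, 901, -100], [1, 773, -100, -100]]
```

The first sequence's real EOS is attended to (mask 1) but never trained on (label -100). If you want the model to learn when to stop, check whether your tokenized examples end with EOS and, if they do, consider a pad token distinct from EOS.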

## Configure 4-bit Quantization
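The next cell sets up the bitsandbytes configuration for 4-bit quantization (the QLoRA setup): weights are stored in the 4-bit NF4 format with double quantization of the scaling constants, and computation runs in float16 (`bnb_4bit_compute_dtype`). This reduces GPU memory use during training.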

In [8]:
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True
)
print("4-bit quantization configured")

4-bit quantization configured
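
A rough sense of the memory savings comes from counting bits per parameter. The sketch below assumes the block sizes described in the QLoRA paper (one absmax constant per 64 weights; with double quantization those constants are stored in 8 bits, plus one fp32 constant per 256 of them). It treats every base parameter as quantized, which overstates the savings, because some weights (for example embeddings and norms) stay unquantized. `bits_per_param` is a hypothetical helper, not a bitsandbytes API.

```python
def bits_per_param(weight_bits, block_size=64, double_quant=False, block_size_2=256):
    """Approximate storage bits per parameter for block-wise 4-bit quantization.

    Each block of `block_size` weights stores one absmax scale: fp32 without
    double quantization, 8-bit with it (plus one fp32 scale per
    `block_size_2` of those 8-bit scales).
    """
    if not double_quant:
        return weight_bits + 32 / block_size
    return weight_bits + 8 / block_size + 32 / (block_size * block_size_2)

# Base parameters: "all params" minus the LoRA parameters, both from print_trainable_parameters.
N_PARAMS = 1_101_174_784 - 1_126_400

for label, bits in [
    ("fp16", 16.0),
    ("nf4", bits_per_param(4)),
    ("nf4 + double quant", bits_per_param(4, double_quant=True)),
]:
    gib = N_PARAMS * bits / 8 / 2**30
    print(f"{label:>20}: {bits:6.3f} bits/param, ~{gib:.2f} GiB")
```

Double quantization trims the constant overhead from 0.5 to about 0.127 bits per parameter, a small saving next to the roughly fourfold reduction from storing 4-bit instead of 16-bit weights.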


## Load Base Model with Quantization
The next cell loads the base language model with the 4-bit quantization config and places it on the GPU (`cuda:0`).

In [9]:
print("Loading base model...")
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="cuda:0"
)
print(f"Base model loaded: {model_name}")

Loading base model...
Base model loaded: TinyLlama/TinyLlama-1.1B-Chat-v1.0


## Configure Low-Rank Adaptation (LoRA)
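The next cell sets up the LoRA configuration for parameter-efficient fine-tuning: rank `r=8`, scaling factor `lora_alpha=16`, dropout 0.05, no trainable biases, and adapters on the attention query and value projections (`q_proj`, `v_proj`).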

In [10]:
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)
print("LoRA configuration completed")

LoRA configuration completed
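
LoRA keeps each targeted weight matrix W0 frozen and learns a low-rank update, so an adapted layer computes h = W0·x + (lora_alpha / r)·B·A·x, where A has shape r × d_in and B has shape d_out × r. With `r=8` and `lora_alpha=16` the update is scaled by 2. The pure-Python sketch below uses tiny hypothetical matrices and omits dropout; it also reflects PEFT's default initialization, where B starts at zero so the adapted layer initially behaves exactly like the base layer.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def lora_forward(x, w0, a, b, lora_alpha, r):
    """h = W0 x + (lora_alpha / r) * B (A x); dropout omitted for clarity."""
    scaling = lora_alpha / r
    base = matvec(w0, x)
    update = matvec(b, matvec(a, x))
    return [h + scaling * u for h, u in zip(base, update)]

# Hypothetical 2x2 layer with rank-1 adapters; alpha/r = 2, as with r=8, alpha=16.
w0 = [[1.0, 0.0], [0.0, 1.0]]
a = [[0.5, 0.5]]             # shape r x d_in = 1 x 2
b_zero = [[0.0], [0.0]]      # shape d_out x r = 2 x 1; zero at initialization
b_trained = [[1.0], [-1.0]]  # hypothetical values after some training
x = [2.0, 4.0]

print(lora_forward(x, w0, a, b_zero, lora_alpha=2, r=1))     # [2.0, 4.0]
print(lora_forward(x, w0, a, b_trained, lora_alpha=2, r=1))  # [8.0, -2.0]
```

Only A and B receive gradients, which is why the trainable-parameter count stays so small.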


## Apply LoRA to Model
The next cell wraps the loaded model with the LoRA adapters, prints how many parameters are trainable, and disables the KV cache for training.

In [11]:
print("Applying LoRA to model...")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# The KV cache only helps generation; disable it during training
model.config.use_cache = False
print("Model adapted with LoRA")

Applying LoRA to model...
trainable params: 1,126,400 || all params: 1,101,174,784 || trainable%: 0.1023
Model adapted with LoRA
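
The trainable-parameter count can be checked by hand: a targeted linear layer mapping d_in to d_out gains r·d_in + d_out·r LoRA weights (matrices A and B). The sketch below assumes TinyLlama's published configuration (hidden size 2048, 22 decoder layers, 32 query heads and 4 key/value heads of dimension 64, so `v_proj` maps 2048 to 256); under those assumptions it reproduces the 1,126,400 figure printed above.

```python
# Assumed TinyLlama-1.1B architecture values (not read from the notebook).
HIDDEN_SIZE = 2048
NUM_LAYERS = 22
NUM_KV_HEADS = 4
HEAD_DIM = HIDDEN_SIZE // 32  # 32 attention heads -> 64

R = 8  # LoRA rank from LoraConfig

def lora_params(d_in, d_out, r):
    """Weights added by one LoRA pair: A (r x d_in) plus B (d_out x r)."""
    return r * d_in + d_out * r

q_proj = lora_params(HIDDEN_SIZE, HIDDEN_SIZE, R)              # 32,768
v_proj = lora_params(HIDDEN_SIZE, NUM_KV_HEADS * HEAD_DIM, R)  # 18,432
trainable = NUM_LAYERS * (q_proj + v_proj)

total = 1_101_174_784  # "all params" as printed by print_trainable_parameters
print(f"{trainable:,}")                   # 1,126,400
print(f"{100 * trainable / total:.4f}%")  # 0.1023%
```

Adding `k_proj`, `o_proj`, or the MLP projections to `target_modules` would raise this count, since each extra module contributes its own A and B matrices in every layer.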


## Tokenize Dataset and Configure Training
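The next cell tokenizes the `text` field of every example, truncating to `MAX_SEQ_LENGTH` (256) tokens and dropping the original columns, then defines the `TrainingArguments`. Because the response comes last in the prompt template, truncation cuts off the end of overly long responses.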

In [12]:
def tokenize_function(examples):
    """Tokenize text examples."""
    return tokenizer(
        examples["text"],
        truncation=True,
        max_length=MAX_SEQ_LENGTH,
        return_tensors=None
    )

print("Tokenizing dataset...")
tokenized_dataset = dataset.map(
    tokenize_function,
    batched=True,
    remove_columns=dataset.column_names
)
print(f"Dataset tokenized: {len(tokenized_dataset)} samples")

# Training configuration
training_args = TrainingArguments(
    output_dir=OUTPUT_DIR,
    per_device_train_batch_size=BATCH_SIZE,
    gradient_accumulation_steps=1,
    learning_rate=2e-4,
    num_train_epochs=EPOCHS,
    logging_steps=10,
    save_strategy="epoch",        # checkpoint at the end of every epoch
    optim="adamw_torch",
    report_to="none",             # no external experiment tracking
    fp16=False,                   # no mixed precision: train in full precision
    bf16=False,
    tf32=False,
    max_grad_norm=None,           # gradient clipping disabled
    skip_memory_metrics=True,
    gradient_checkpointing=False,
    seed=42,
    remove_unused_columns=False,  # don't drop dataset columns not in forward()'s signature
)
print("Training configuration completed")

Tokenizing dataset...
Dataset tokenized: 5000 samples
Training configuration completed
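
The number of optimizer steps follows from the dataset size and the batch settings: each step consumes per_device_train_batch_size × gradient_accumulation_steps × number of devices examples. The helper below is a hypothetical name, not a Trainer API; it ignores `max_steps` and the edge cases that arise when the dataset size is not a multiple of the effective batch.

```python
import math

def total_optimizer_steps(num_examples, per_device_batch, grad_accum, num_devices, epochs):
    """Approximate (effective batch size, total optimizer steps) for a Trainer run."""
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, steps_per_epoch * epochs

effective, steps = total_optimizer_steps(
    num_examples=5000, per_device_batch=1, grad_accum=1, num_devices=1, epochs=3
)
print(effective, steps)  # 1 15000
```

With an effective batch of 1, every example triggers its own weight update. Setting `gradient_accumulation_steps=8`, for example, would keep the per-step activation memory the same while averaging gradients over 8 examples, cutting the run to 1,875 optimizer steps.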


## Initialize SFT Trainer
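The next cell initializes the `SFTTrainer` with the model, the tokenized dataset, the training arguments, and a causal-LM data collator (`mlm=False`). Mixed precision (AMP) is governed by the `fp16` and `bf16` flags, both `False` here, so training runs in full precision.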

In [13]:
# Causal-LM collator: pads each batch and copies input_ids into labels
# (mlm=False selects next-token prediction, not masked-language modeling).
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False
)

# AMP is already off: fp16=False and bf16=False in training_args, and the
# Trainer derives its mixed-precision setting from those flags, so no
# patching of torch, environment variables, or trainer attributes is needed.
trainer = SFTTrainer(
    model=model,
    train_dataset=tokenized_dataset,
    args=training_args,
    data_collator=data_collator
)

print("Trainer initialized successfully (all AMP features disabled)")

Trainer initialized successfully (all AMP features disabled)


## Begin Training
The next cell starts the fine-tuning training process using the configured trainer.

In [14]:
trainer.train()

The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'pad_token_id': 2}.


| Step | Training Loss |
|-----:|--------------:|
| 10 | 2.7159 |
| 20 | 1.9144 |
| 30 | 1.1271 |
| 40 | 1.0482 |
| 50 | 0.9013 |
| 60 | 0.7266 |
| 70 | 0.5964 |
| 80 | 0.5438 |
| 90 | 0.4708 |
| 100 | 0.4612 |


TrainOutput(global_step=15000, training_loss=0.20957774736881257, metrics={'train_runtime': 11986.4952, 'train_samples_per_second': 1.251, 'train_steps_per_second': 1.251, 'total_flos': 3958037445132288.0, 'train_loss': 0.20957774736881257, 'epoch': 3.0})
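
The figures in `TrainOutput` are mutually consistent: with batch size 1, 3 epochs over 5,000 samples give the 15,000 steps reported, and dividing by `train_runtime` gives the reported throughput. `train_loss` is the mean loss over all 15,000 steps, which is why it sits well below the values logged during the first 100 steps. The arithmetic below uses only numbers from the output:

```python
samples = 5000 * 3       # dataset size x epochs (batch size 1)
runtime_s = 11986.4952   # train_runtime from TrainOutput

print(f"{samples / runtime_s:.3f} samples/s")  # 1.251 samples/s
hours, rem = divmod(runtime_s, 3600)
print(f"{int(hours)} h {rem / 60:.0f} min")    # 3 h 20 min
```

At roughly 1.25 samples per second on the GTX 1650, each additional epoch over this dataset costs a little over an hour.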

## Save Fine-Tuned Adapter and Tokenizer
The next cell saves the LoRA adapter weights (not the full base model) and the tokenizer to `OUTPUT_DIR`.

In [15]:
print("\nSaving fine-tuned adapter and tokenizer...")

trainer.model.save_pretrained(OUTPUT_DIR)
tokenizer.save_pretrained(OUTPUT_DIR)

print(f"Model saved to: {OUTPUT_DIR}")
print("Training pipeline completed successfully!")


Saving fine-tuned adapter and tokenizer...
Model saved to: adapters
Training pipeline completed successfully!
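
One way to use the saved adapter is to load the base model, attach the adapter with PEFT's `PeftModel.from_pretrained`, and prompt with the training template cut off after `### Response:`. The sketch below assumes that approach; `build_prompt`, `load_finetuned`, and `generate` are hypothetical helper names, and `max_new_tokens=128` is an illustrative default, not a value from this notebook. The library imports sit inside the functions, so `build_prompt` runs without Transformers or PEFT installed.

```python
def build_prompt(instruction, input_text=""):
    """Training template with the response left empty for the model to complete."""
    return (
        "### Instruction:\n"
        f"{instruction}\n\n"
        "### Input:\n"
        f"{input_text}\n\n"
        "### Response:\n"
    )


def load_finetuned(base_model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
                   adapter_dir="adapters", quantization_config=None):
    """Load the base model and attach the saved LoRA adapter.

    Requires transformers, peft and accelerate (for device_map="auto").
    Pass the notebook's bnb_config as quantization_config to load in 4-bit.
    """
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(adapter_dir)
    model = AutoModelForCausalLM.from_pretrained(
        base_model, quantization_config=quantization_config, device_map="auto"
    )
    model = PeftModel.from_pretrained(model, adapter_dir)
    model.eval()
    return model, tokenizer


def generate(model, tokenizer, instruction, input_text="", max_new_tokens=128):
    """Generate a completion and return only the newly generated text."""
    import torch

    prompt = build_prompt(instruction, input_text)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

For example, `model, tok = load_finetuned(quantization_config=bnb_config)` followed by `generate(model, tok, "Summarize the text.", some_text)`. Because the prompt matches the training template exactly, the model sees the same structure it was fine-tuned on.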
