<a href="https://colab.research.google.com/github/YashNigam65/gitfolder/blob/master/assignment_4_QLoRA_fine_tuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Task
Fine-tune a small open-weight LLM, such as `mistralai/Mistral-7B-v0.1`, using QLoRA in Google Colab, including library installation, 4-bit model and tokenizer loading, QLoRA configuration, creation of a small instruction dataset, pre-training inference, training with `SFTTrainer`, and post-training inference to demonstrate the effect of fine-tuning.

## QLoRA Explanation

### Subtask:
Provide a brief explanation (5-6 lines) of what QLoRA is and why it's efficient for fine-tuning.


QLoRA (Quantized Low-Rank Adaptation) is a memory-efficient fine-tuning approach. It loads a pre-trained language model with its weights frozen and quantized to 4-bit precision (the QLoRA work introduced the 4-bit NormalFloat, NF4, data type for this). It then trains only small Low-Rank Adapter (LoRA) matrices added to selected layers, with gradients flowing through the frozen 4-bit weights into the adapters. The base weights take roughly a quarter of their 16-bit memory, and only the adapters need gradients and optimizer state. As a result, large language models (LLMs) can be fine-tuned on a single commodity GPU, which suits resource-constrained environments like Google Colab. The QLoRA authors report task performance comparable to 16-bit fine-tuning, although results depend on the model and data.
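As a rough illustration of where the savings come from, the sketch below estimates the memory needed just to store the weights of a model with a round 7 billion parameters (an assumed figure for illustration) at 16-bit versus 4-bit precision. It deliberately ignores quantization constants, activations, the LoRA adapters, and optimizer state, so treat the results as lower bounds rather than real Colab memory usage.

```python
def weight_memory_gb(num_params: int, bits_per_param: float) -> float:
    """Approximate storage for model weights in gigabytes (10**9 bytes)."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 1e9

# Assumed round parameter count for a "7B" model (illustrative only).
NUM_PARAMS = 7_000_000_000

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)  # 16-bit floats: 2 bytes per weight
nf4_gb = weight_memory_gb(NUM_PARAMS, 4)    # 4-bit codes: 0.5 bytes per weight

print(f"16-bit weights: ~{fp16_gb:.1f} GB")
print(f"4-bit weights:  ~{nf4_gb:.1f} GB")
print(f"Reduction: {fp16_gb / nf4_gb:.0f}x")
```

In practice, the 4-bit figure is somewhat higher because each block of weights also stores a scaling constant; `bnb_4bit_use_double_quant` reduces that overhead by quantizing the constants themselves.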

## Colab Setup and Library Installation

### Subtask:
Install necessary libraries such as `transformers`, `peft`, `bitsandbytes`, `accelerate`, and `trl`.


**Reasoning**:
To efficiently install all required libraries, I will use `pip install` commands within a single code block. This will install `transformers`, `peft`, `bitsandbytes`, `accelerate`, and `trl`.



In [1]:
print("Installing necessary libraries...")
!pip install transformers
!pip install peft
!pip install bitsandbytes
!pip install accelerate
!pip install trl
print("Libraries installed successfully.")

Installing necessary libraries...
Collecting bitsandbytes
  Downloading bitsandbytes-0.49.1-py3-none-manylinux_2_24_x86_64.whl.metadata (10 kB)
Downloading bitsandbytes-0.49.1-py3-none-manylinux_2_24_x86_64.whl (59.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 59.1/59.1 MB 12.7 MB/s eta 0:00:00
Installing collected packages: bitsandbytes
Successfully installed bitsandbytes-0.49.1
Collecting trl
  Downloading trl-0.27.1-py3-none-any.whl.metadata (11 kB)
Downloading trl-0.27.1-py3-none-any.whl (532 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 532.9/532.9 kB 9.5 MB/s eta 0:00:00
Installing collected packages: trl
Successfully installed trl-0.27.1
Libraries installed successfully.


## Model Loading with 4-bit Quantization

### Subtask:
Load a small open-weight LLM (e.g., `mistralai/Mistral-7B-v0.1` or similar) in 4-bit quantized format using `bitsandbytes` configuration.


**Reasoning**:
To load the model in 4-bit quantized format, I will first import the required classes and define the model name. Then, I'll set up the `BitsAndBytesConfig` for 4-bit quantization and finally load the model using `AutoModelForCausalLM.from_pretrained`.
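To give a feel for what 4-bit quantization does to each weight, here is a simplified sketch of blockwise absmax quantization in plain Python. It is an illustration under stated assumptions, not the bitsandbytes implementation: it uses 15 evenly spaced signed integer levels (-7 to 7) and a tiny illustrative block size, whereas `bnb_4bit_quant_type="nf4"` maps weights onto 16 non-uniform levels derived from a normal distribution. The shared idea is that each small block of weights stores one scale factor plus a low-bit code per weight.

```python
def quantize_absmax(block, bits=4):
    """Map a block of floats to signed integer codes that share one scale."""
    qmax = 2 ** (bits - 1) - 1                      # 7 for 4-bit signed codes
    absmax = max(abs(w) for w in block)
    scale = absmax / qmax if absmax > 0 else 1.0    # avoid dividing by zero
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in block]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate floats from the codes and the block's scale."""
    return [c * scale for c in codes]

def quantize_blockwise(weights, block_size=4, bits=4):
    """Quantize a flat list of weights block by block (block_size is illustrative)."""
    blocks = [weights[i:i + block_size] for i in range(0, len(weights), block_size)]
    return [quantize_absmax(b, bits) for b in blocks]

weights = [0.12, -0.53, 0.27, 0.71, -0.04, 0.009, 0.02, -0.015]
restored = []
for codes, scale in quantize_blockwise(weights):
    restored.extend(dequantize(codes, scale))

for w, r in zip(weights, restored):
    print(f"{w:+.3f} -> {r:+.3f}")
```

Because the second block gets its own, much smaller scale, its tiny values survive quantization instead of all collapsing to zero, which is the main reason quantization is done per block rather than per tensor.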



In [2]:
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 2. Define the model name
model_name = "mistralai/Mistral-7B-v0.1"

# 3. Configure the 4-bit quantization settings
bitsandbytes_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=False,
)

# 4. Load the pre-trained model
print(f"Loading model {model_name} with 4-bit quantization...")
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bitsandbytes_config,
    device_map="auto"
)
print(f"Model {model_name} loaded successfully.")

Loading model mistralai/Mistral-7B-v0.1 with 4-bit quantization...


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

KeyboardInterrupt: 

# Task
The previous attempt to load `mistralai/Mistral-7B-v0.1` ended in a `KeyboardInterrupt` during the download and loading of its roughly 14.5 GB of checkpoint shards (9.94 GB + 4.54 GB), most likely because a model of this size is impractical within this Colab runtime's resource limits. To address this, I will switch to a much smaller model, `facebook/opt-125m`, and load it with the same 4-bit configuration.


## Model Loading with 4-bit Quantization (Using smaller model)

### Subtask:
Load a much smaller open-weight LLM (e.g., `facebook/opt-125m`) in 4-bit quantized format using `bitsandbytes` configuration to meet the runtime constraint.


**Reasoning**:
To load the specified smaller model in 4-bit quantized format, I will use the `AutoModelForCausalLM` and `BitsAndBytesConfig` classes from `transformers`, along with `torch`. I will define the new model name, set up the 4-bit quantization configuration, and then load the model using these settings.



In [3]:
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 1. Set the model name to a much smaller LLM
model_name = "facebook/opt-125m"

# 2. Configure the 4-bit quantization settings
bitsandbytes_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=False,
)

# 3. Load the pre-trained model
print(f"Loading model {model_name} with 4-bit quantization...")
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bitsandbytes_config,
    device_map="auto"
)
print(f"Model {model_name} loaded successfully.")

Loading model facebook/opt-125m with 4-bit quantization...



Model facebook/opt-125m loaded successfully.


## Tokenizer Loading

### Subtask:
Load the corresponding tokenizer for the selected small model (`facebook/opt-125m`).


**Reasoning**:
To load the tokenizer for the specified model, I will import `AutoTokenizer` and use its `from_pretrained` method with the `model_name` variable, storing the result in `tokenizer`.



In [4]:
from transformers import AutoTokenizer

# Load the tokenizer
print(f"Loading tokenizer for model: {model_name}...")
tokenizer = AutoTokenizer.from_pretrained(model_name)
print("Tokenizer loaded successfully.")

Loading tokenizer for model: facebook/opt-125m...



Tokenizer loaded successfully.


## QLoRA Configuration

### Subtask:
Define the LoRA configuration using `LoraConfig` from `peft`, specifying parameters such as `r`, `lora_alpha`, `lora_dropout`, `bias`, and `target_modules` (focusing on the attention projection layers).


**Reasoning**:
To define the QLoRA configuration, I need to import the necessary classes from the `peft` library, identify the `target_modules` for the chosen model, and then instantiate `LoraConfig` with the specified parameters. Finally, I will apply this configuration to the model using `get_peft_model`.
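The parameters passed to `LoraConfig` map onto the LoRA update rule: a frozen weight matrix `W0` (d_out × d_in) is augmented with a trainable product `B @ A`, where `A` is r × d_in, `B` is d_out × r, and the update is scaled by `lora_alpha / r`. The plain-Python sketch below uses toy dimensions chosen for illustration and omits dropout; it is a conceptual model of one forward pass, not peft's implementation.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def lora_forward(W0, A, B, x, lora_alpha, r):
    """h = W0 x + (lora_alpha / r) * B (A x); W0 is frozen, A and B are trainable."""
    base = matvec(W0, x)                # frozen path (dequantized 4-bit weights in QLoRA)
    low_rank = matvec(B, matvec(A, x))  # rank-r adapter path
    scaling = lora_alpha / r
    return [b + scaling * d for b, d in zip(base, low_rank)]

# Toy sizes: d_in = d_out = 3, r = 1 (the notebook uses r=16, lora_alpha=32).
W0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
A = [[0.5, -0.5, 0.25]]          # r x d_in, randomly initialized in practice
B_init = [[0.0], [0.0], [0.0]]   # d_out x r, zero-initialized
x = [2.0, 4.0, 8.0]

print(lora_forward(W0, A, B_init, x, lora_alpha=2, r=1))     # adapter adds nothing yet
B_trained = [[0.1], [0.0], [-0.1]]                           # hypothetical values after training
print(lora_forward(W0, A, B_trained, x, lora_alpha=2, r=1))
```

Because `B` starts at zero (peft's default initialization), a freshly wrapped `PeftModel` produces the same outputs as the base model until training updates `B`. With `r=16` and `lora_alpha=32`, as in this notebook, the scaling factor is 2.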



In [5]:
from peft import LoraConfig, get_peft_model

# 1. Define target modules for LoRA for facebook/opt-125m
target_modules = ["q_proj", "v_proj"]

# 2. Instantiate LoraConfig
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=target_modules
)

# 3. Apply the LoRA configuration to the model
print("Applying LoRA configuration to the model...")
model = get_peft_model(model, lora_config)
print("LoRA configuration applied successfully.")

# Print the trainable parameters
model.print_trainable_parameters()

Applying LoRA configuration to the model...
LoRA configuration applied successfully.
trainable params: 589,824 || all params: 125,829,120 || trainable%: 0.4688
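The trainable-parameter count printed above can be checked by hand. Assuming the OPT-125m architecture (12 decoder layers, hidden size 768, with `q_proj` and `v_proj` each a 768 × 768 linear layer), each adapted layer gains an `A` matrix (r × 768) and a `B` matrix (768 × r):

```python
def lora_param_count(d_in, d_out, r):
    """Parameters added by one LoRA adapter: A is r x d_in, B is d_out x r."""
    return r * d_in + d_out * r

NUM_LAYERS = 12        # OPT-125m decoder layers
HIDDEN = 768           # OPT-125m hidden size
R = 16                 # LoraConfig(r=16)
MODULES_PER_LAYER = 2  # target_modules = ["q_proj", "v_proj"]

trainable = NUM_LAYERS * MODULES_PER_LAYER * lora_param_count(HIDDEN, HIDDEN, R)
total = 125_829_120    # "all params" as reported by print_trainable_parameters()

print(f"trainable params: {trainable:,}")
print(f"trainable%: {100 * trainable / total:.4f}")
```

This reproduces the 589,824 trainable parameters and the 0.4688% figure, which confirms that only the two targeted projections in each layer received adapters.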


## Prepare Instruction Dataset

### Subtask:
Create a very small, synthetic, or hardcoded instruction dataset (max 200-300 samples) directly within the notebook for demonstration purposes.


### Reasoning:
To create the instruction dataset, I will define a Python list of dictionaries, each holding one instruction-response pair: an `instruction` key for the input text and an `output` key for the expected response. Because the task calls for a very small dataset and the environment is resource-constrained, I will hardcode a handful of synthetic examples, enough to demonstrate the structure.

**Reasoning**:
Following the explanation, I will now create a small, hardcoded instruction dataset as a Python list of dictionaries, with each dictionary having 'instruction' and 'output' keys, and store it in the `instruction_dataset` variable.



In [6]:
instruction_dataset = [
    {
        "instruction": "What is the capital of France?",
        "output": "Paris"
    },
    {
        "instruction": "How many continents are there?",
        "output": "Seven"
    },
    {
        "instruction": "What is the largest ocean on Earth?",
        "output": "Pacific Ocean"
    },
    {
        "instruction": "Who wrote 'Romeo and Juliet'?",
        "output": "William Shakespeare"
    },
    {
        "instruction": "What is the chemical symbol for water?",
        "output": "H2O"
    }
]

print(f"Instruction dataset created with {len(instruction_dataset)} samples.")
print("First sample:")
print(instruction_dataset[0])

Instruction dataset created with 5 samples.
First sample:
{'instruction': 'What is the capital of France?', 'output': 'Paris'}


## Pre-training Inference

### Subtask:
Perform and display an inference example with the base model using a specific prompt to establish a baseline.


### Reasoning:
To perform pre-training inference, I will first select an instruction from the `instruction_dataset`. This instruction will then be tokenized using the previously loaded `tokenizer`. The tokenized input will be passed to the `model`'s `generate` method to produce a response. Finally, the generated tokens will be decoded back into human-readable text and printed alongside the original prompt to demonstrate the model's behavior before fine-tuning.

**Reasoning**:
Following the reasoning, I will now implement the code to perform pre-training inference. This involves selecting a prompt from the `instruction_dataset`, tokenizing it, moving it to the appropriate device, generating a response from the model, decoding the output, and then printing both the original prompt and the model's generated response to establish a baseline before fine-tuning.



In [7]:
import torch

# 1. Choose a prompt from the instruction_dataset
prompt = instruction_dataset[0]["instruction"]

# 2. Tokenize the chosen prompt
inputs = tokenizer(prompt, return_tensors='pt')

# 3. Move the input tokens to the appropriate device
device = "cuda" if torch.cuda.is_available() else "cpu"
input_ids = inputs.input_ids.to(device)
attention_mask = inputs.attention_mask.to(device)

# 4. Generate a response from the model
print("Generating response with the base model...")
output_tokens = model.generate(
    input_ids=input_ids,
    attention_mask=attention_mask,
    max_new_tokens=50, # Limit the generated text length
    pad_token_id=tokenizer.eos_token_id # Set pad_token_id for generation
)

# 5. Decode the generated tokens
# Decode only the newly generated tokens (excluding the prompt)
response = tokenizer.decode(output_tokens[0][len(input_ids[0]):], skip_special_tokens=True)

# 6. Print the original prompt and the model's generated response
print(f"\n--- Pre-training Inference Baseline ---")
print(f"Prompt: {prompt}")
print(f"Model Response: {response.strip()}")
print("--------------------------------------")

Generating response with the base model...

--- Pre-training Inference Baseline ---
Prompt: What is the capital of France?
Model Response: France is the capital of France.
I'm not sure what you mean.
I'm not sure what you mean.
I'm not sure what you mean.
I'm not sure what you mean.
I'm not sure what
--------------------------------------


## Training Setup and Execution

### Subtask:
Configure and run a short training loop using `SFTTrainer` from `trl`, with `num_train_epochs=1`, `per_device_train_batch_size=1`, and optionally `gradient_accumulation_steps`, to fine-tune the model with the QLoRA adapters.


**Reasoning**:
To prepare for fine-tuning, I need to import the required classes: `SFTTrainer` for the training loop, `TrainingArguments` to define training parameters, and `Dataset` to convert the instruction data into a format compatible with `SFTTrainer`.



In [8]:
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import Dataset

print("Required libraries imported: SFTTrainer, TrainingArguments, Dataset.")

Required libraries imported: SFTTrainer, TrainingArguments, Dataset.


**Reasoning**:
Following the instructions, I will now convert the instruction dataset to a `Dataset` object, define a formatting function for the training data, set up the `TrainingArguments`, initialize `SFTTrainer` with the model, tokenizer, and formatted dataset, and then start the training process.



In [9]:
from datasets import Dataset

# 2. Convert instruction_dataset to a Dataset object
raw_dataset = Dataset.from_list(instruction_dataset)
print(f"Raw dataset created with {len(raw_dataset)} samples.")

# 3. Define a formatting function
def formatting_func(example):
    text = f"Instruction: {example['instruction']}\nOutput: {example['output']}"
    return {"text": text}

# Apply the formatting function
formatted_dataset = raw_dataset.map(formatting_func)
print("Dataset formatted for SFTTrainer.")
print("First formatted sample:")
print(formatted_dataset[0]["text"])

# 4. Define TrainingArguments
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,
    optim="paged_adamw_8bit",
    logging_steps=10,
    learning_rate=2e-4,
    fp16=False, # Set to True for fp16 mixed precision on a GPU that supports it (keep bf16=False)
    bf16=False,  # Set to True if your GPU supports bfloat16
    report_to="none" # Disable reporting to services like Weights & Biases
)

# 5. Initialize SFTTrainer
trainer = SFTTrainer(
    model=model,
    train_dataset=formatted_dataset,
    peft_config=lora_config, # Pass the lora_config if you want to initialize trainer with it
    dataset_text_field="text",
    tokenizer=tokenizer,
    args=training_args,
)

print("SFTTrainer initialized. Starting training...")

# 6. Start the training process
trainer.train()
print("Training completed.")

Raw dataset created with 5 samples.


Map:   0%|          | 0/5 [00:00<?, ? examples/s]

Dataset formatted for SFTTrainer.
First formatted sample:
Instruction: What is the capital of France?
Output: Paris


TypeError: SFTTrainer.__init__() got an unexpected keyword argument 'dataset_text_field'

**Reasoning**:
The call failed because `SFTTrainer.__init__` does not accept `dataset_text_field`. In recent `trl` releases (0.27.1 was installed above), this option belongs to `SFTConfig`, the trainer's configuration class, rather than to the trainer constructor. Since `formatted_dataset` already has a `text` column, which is the default field `SFTTrainer` trains on, removing the argument is the most direct fix.



In [10]:
from datasets import Dataset

# 2. Convert instruction_dataset to a Dataset object
raw_dataset = Dataset.from_list(instruction_dataset)
print(f"Raw dataset created with {len(raw_dataset)} samples.")

# 3. Define a formatting function
def formatting_func(example):
    text = f"Instruction: {example['instruction']}\nOutput: {example['output']}"
    return {"text": text}

# Apply the formatting function
formatted_dataset = raw_dataset.map(formatting_func)
print("Dataset formatted for SFTTrainer.")
print("First formatted sample:")
print(formatted_dataset[0]["text"])

# 4. Define TrainingArguments
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,
    optim="paged_adamw_8bit",
    logging_steps=10,
    learning_rate=2e-4,
    fp16=False, # Set to True for fp16 mixed precision on a GPU that supports it (keep bf16=False)
    bf16=False,  # Set to True if your GPU supports bfloat16
    report_to="none" # Disable reporting to services like Weights & Biases
)

# 5. Initialize SFTTrainer
trainer = SFTTrainer(
    model=model,
    train_dataset=formatted_dataset,
    peft_config=lora_config, # Pass the lora_config if you want to initialize trainer with it
    tokenizer=tokenizer,
    args=training_args,
)

print("SFTTrainer initialized. Starting training...")

# 6. Start the training process
trainer.train()
print("Training completed.")

Raw dataset created with 5 samples.


Map:   0%|          | 0/5 [00:00<?, ? examples/s]

Dataset formatted for SFTTrainer.
First formatted sample:
Instruction: What is the capital of France?
Output: Paris


TypeError: SFTTrainer.__init__() got an unexpected keyword argument 'tokenizer'

**Reasoning**:
The second `TypeError` (`tokenizer`) shares its cause with the first, and that cause is a *newer* `trl` release (0.27.1 was installed above), not an older one: recent versions moved `dataset_text_field` to `SFTConfig` and renamed the trainer's `tokenizer` argument to `processing_class`. When no processing class is passed, `SFTTrainer` falls back to loading a tokenizer from the model's checkpoint. As a minimal workaround, I will drop the argument, set `tokenizer.pad_token` to the EOS token (a safeguard for causal LM training; OPT's tokenizer already defines a `<pad>` token), and attach `tokenizer` to the model. Recent `SFTTrainer` versions are not known to read `model.tokenizer`, so that last step is likely unnecessary.



In [11]:
from datasets import Dataset

# 2. Convert instruction_dataset to a Dataset object
raw_dataset = Dataset.from_list(instruction_dataset)
print(f"Raw dataset created with {len(raw_dataset)} samples.")

# 3. Define a formatting function
def formatting_func(example):
    text = f"Instruction: {example['instruction']}\nOutput: {example['output']}"
    return {"text": text}

# Apply the formatting function
formatted_dataset = raw_dataset.map(formatting_func)
print("Dataset formatted for SFTTrainer.")
print("First formatted sample:")
print(formatted_dataset[0]["text"])

# Set pad_token for the tokenizer (essential for causal LM training)
tokenizer.pad_token = tokenizer.eos_token
print("Tokenizer pad_token set to eos_token.")

# Attach tokenizer to the model as older SFTTrainer versions might expect it here
model.tokenizer = tokenizer
print("Tokenizer attached to the model object.")

# 4. Define TrainingArguments
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,
    optim="paged_adamw_8bit",
    logging_steps=10,
    learning_rate=2e-4,
    fp16=False, # Set to True for fp16 mixed precision on a GPU that supports it (keep bf16=False)
    bf16=False,  # Set to True if your GPU supports bfloat16
    report_to="none" # Disable reporting to services like Weights & Biases
)

# 5. Initialize SFTTrainer
trainer = SFTTrainer(
    model=model,
    train_dataset=formatted_dataset,
    peft_config=lora_config, # Pass the lora_config if you want to initialize trainer with it
    # Removed 'tokenizer=tokenizer' and 'dataset_text_field="text"' due to TypeError in current environment
    args=training_args,
)

print("SFTTrainer initialized. Starting training...")

# 6. Start the training process
trainer.train()
print("Training completed.")

Raw dataset created with 5 samples.


Map:   0%|          | 0/5 [00:00<?, ? examples/s]

Dataset formatted for SFTTrainer.
First formatted sample:
Instruction: What is the capital of France?
Output: Paris
Tokenizer pad_token set to eos_token.
Tokenizer attached to the model object.




ValueError: You passed a `PeftModel` instance together with a `peft_config` to the trainer. Please first merge and unload the existing adapter, save the resulting base model, and then pass that base model along with the new `peft_config` to the trainer.

**Reasoning**:
The previous error indicated that the `SFTTrainer` received a `PeftModel` instance along with a `peft_config`, which is redundant. The `model` variable is already a `PeftModel` after `get_peft_model(model, lora_config)` was called. Therefore, I will remove the `peft_config` argument from the `SFTTrainer` initialization to avoid this conflict.



In [12]:
from datasets import Dataset

# 2. Convert instruction_dataset to a Dataset object
raw_dataset = Dataset.from_list(instruction_dataset)
print(f"Raw dataset created with {len(raw_dataset)} samples.")

# 3. Define a formatting function
def formatting_func(example):
    text = f"Instruction: {example['instruction']}\nOutput: {example['output']}"
    return {"text": text}

# Apply the formatting function
formatted_dataset = raw_dataset.map(formatting_func)
print("Dataset formatted for SFTTrainer.")
print("First formatted sample:")
print(formatted_dataset[0]["text"])

# Set pad_token for the tokenizer (essential for causal LM training)
tokenizer.pad_token = tokenizer.eos_token
print("Tokenizer pad_token set to eos_token.")

# Attach tokenizer to the model as older SFTTrainer versions might expect it here
model.tokenizer = tokenizer
print("Tokenizer attached to the model object.")

# 4. Define TrainingArguments
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,
    optim="paged_adamw_8bit",
    logging_steps=10,
    learning_rate=2e-4,
    fp16=False, # Set to True for fp16 mixed precision on a GPU that supports it (keep bf16=False)
    bf16=False,  # Set to True if your GPU supports bfloat16
    report_to="none" # Disable reporting to services like Weights & Biases
)

# 5. Initialize SFTTrainer
trainer = SFTTrainer(
    model=model,
    train_dataset=formatted_dataset,
    # peft_config=lora_config, # Removed this as model is already a PeftModel
    args=training_args,
)

print("SFTTrainer initialized. Starting training...")

# 6. Start the training process
trainer.train()
print("Training completed.")

Raw dataset created with 5 samples.


Map:   0%|          | 0/5 [00:00<?, ? examples/s]

Dataset formatted for SFTTrainer.
First formatted sample:
Instruction: What is the capital of France?
Output: Paris
Tokenizer pad_token set to eos_token.
Tokenizer attached to the model object.




Adding EOS to train dataset:   0%|          | 0/5 [00:00<?, ? examples/s]

Tokenizing train dataset:   0%|          | 0/5 [00:00<?, ? examples/s]

Truncating train dataset:   0%|          | 0/5 [00:00<?, ? examples/s]

SFTTrainer initialized. Starting training...




RuntimeError: All input tensors need to be on the same GPU, but found some tensors to not be on a GPU:
 [(torch.Size([16, 768]), device(type='cpu')), (torch.Size([16, 768]), device(type='cpu')), (torch.Size([16, 768]), device(type='cpu')), (torch.Size([16, 768]), device(type='cpu')), (torch.Size([256]), device(type='cpu')), (torch.Size([256]), device(type='cpu')), (torch.Size([48]), device(type='cpu')), (torch.Size([48]), device(type='cpu'))]
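This `RuntimeError` comes from the bitsandbytes paged 8-bit optimizer, which expects its tensors on a CUDA GPU. Here the parameters it received were on the CPU, which suggests this runtime had no GPU attached, so the next cell switches to PyTorch's `adamw_torch`. A small helper (hypothetical, not part of any library) that makes this choice explicit could look like this:

```python
def pick_optimizer(cuda_available: bool) -> str:
    """Return a TrainingArguments `optim` value suited to the runtime.

    paged_adamw_8bit (bitsandbytes) saves GPU memory but, as the error above
    shows, fails when the parameters live on the CPU; adamw_torch runs on both.
    """
    return "paged_adamw_8bit" if cuda_available else "adamw_torch"

print(pick_optimizer(False))
```

In the notebook it would be used as `optim=pick_optimizer(torch.cuda.is_available())`, so the same cell keeps the memory-saving optimizer on GPU runtimes and still runs on CPU-only ones.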

In [13]:
from datasets import Dataset

# 2. Convert instruction_dataset to a Dataset object
raw_dataset = Dataset.from_list(instruction_dataset)
print(f"Raw dataset created with {len(raw_dataset)} samples.")

# 3. Define a formatting function
def formatting_func(example):
    text = f"Instruction: {example['instruction']}\nOutput: {example['output']}"
    return {"text": text}

# Apply the formatting function
formatted_dataset = raw_dataset.map(formatting_func)
print("Dataset formatted for SFTTrainer.")
print("First formatted sample:")
print(formatted_dataset[0]["text"])

# Set pad_token for the tokenizer (essential for causal LM training)
tokenizer.pad_token = tokenizer.eos_token
print("Tokenizer pad_token set to eos_token.")

# Attach tokenizer to the model as older SFTTrainer versions might expect it here
model.tokenizer = tokenizer
print("Tokenizer attached to the model object.")

# 4. Define TrainingArguments
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,
    optim="adamw_torch", # Changed optimizer to a CPU-compatible one
    logging_steps=10,
    learning_rate=2e-4,
    fp16=False,
    bf16=False,
    report_to="none" # Disable reporting to services like Weights & Biases
)

# 5. Initialize SFTTrainer
trainer = SFTTrainer(
    model=model,
    train_dataset=formatted_dataset,
    # peft_config=lora_config, # Removed this as model is already a PeftModel
    args=training_args,
)

print("SFTTrainer initialized. Starting training...")

# 6. Start the training process
trainer.train()
print("Training completed.")

Raw dataset created with 5 samples.


Map:   0%|          | 0/5 [00:00<?, ? examples/s]

Dataset formatted for SFTTrainer.
First formatted sample:
Instruction: What is the capital of France?
Output: Paris
Tokenizer pad_token set to eos_token.
Tokenizer attached to the model object.




Adding EOS to train dataset:   0%|          | 0/5 [00:00<?, ? examples/s]

Tokenizing train dataset:   0%|          | 0/5 [00:00<?, ? examples/s]

Truncating train dataset:   0%|          | 0/5 [00:00<?, ? examples/s]

SFTTrainer initialized. Starting training...




Step,Training Loss


Training completed.
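The loss table above is empty because of a logging setting, not a training failure. With 5 samples, `per_device_train_batch_size=1`, `gradient_accumulation_steps=1`, and one epoch, the run makes only 5 optimizer steps, while `logging_steps=10` asks the trainer to log every 10 steps. A simplified single-device estimate (it ignores details such as `max_steps` and multi-GPU sharding) shows the mismatch:

```python
import math

def optimizer_steps(num_samples, batch_size, grad_accum, epochs):
    """Rough single-device estimate of optimizer steps for a training run."""
    batches_per_epoch = math.ceil(num_samples / batch_size)
    steps_per_epoch = max(1, batches_per_epoch // grad_accum)
    return steps_per_epoch * epochs

def logged_steps(total_steps, logging_steps):
    """Steps at which the training loss would be logged."""
    return [s for s in range(1, total_steps + 1) if s % logging_steps == 0]

total = optimizer_steps(num_samples=5, batch_size=1, grad_accum=1, epochs=1)
print(total)                    # 5
print(logged_steps(total, 10))  # [] -> empty loss table
print(logged_steps(total, 1))   # [1, 2, 3, 4, 5]
```

For a run this short, `logging_steps=1` would record a loss at every step.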


**Reasoning**:
With training completed, the next step is to perform post-training inference using the fine-tuned model to observe the effects of fine-tuning. I will use the same prompt as in the pre-training inference to allow for a direct comparison.



In [14]:
import torch

# 1. Reuse the prompt from pre-training inference
# prompt variable is already defined from the pre-training inference step

# 2. Tokenize the chosen prompt
inputs = tokenizer(prompt, return_tensors='pt')

# 3. Move the input tokens to the appropriate device
device = "cuda" if torch.cuda.is_available() else "cpu"
input_ids = inputs.input_ids.to(device)
attention_mask = inputs.attention_mask.to(device)

# 4. Generate a response from the fine-tuned model
print("Generating response with the fine-tuned model...")
model.eval() # Set model to evaluation mode
with torch.no_grad():
    output_tokens_finetuned = model.generate(
        input_ids=input_ids,
        attention_mask=attention_mask,
        max_new_tokens=50,
        pad_token_id=tokenizer.eos_token_id
    )

# 5. Decode the generated tokens
response_finetuned = tokenizer.decode(output_tokens_finetuned[0][len(input_ids[0]):], skip_special_tokens=True)

# 6. Print the original prompt and the model's generated response
print(f"\n--- Post-training Inference Result ---")
print(f"Prompt: {prompt}")
print(f"Fine-tuned Model Response: {response_finetuned.strip()}")
print("------------------------------------")

# Optionally, re-print the pre-training inference result for easy comparison
print(f"\n--- Pre-training Inference (for comparison) ---")
print(f"Prompt: {prompt}")
print(f"Original Model Response: {response.strip()}") # 'response' variable is from pre-training step
print("-----------------------------------------------")

Generating response with the fine-tuned model...

--- Post-training Inference Result ---
Prompt: What is the capital of France?
Fine-tuned Model Response: France is the capital of France.
------------------------------------

--- Pre-training Inference (for comparison) ---
Prompt: What is the capital of France?
Original Model Response: France is the capital of France.
I'm not sure what you mean.
I'm not sure what you mean.
I'm not sure what you mean.
I'm not sure what you mean.
I'm not sure what
-----------------------------------------------
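One caveat when reading this comparison: both inference cells pass the bare question, while every training example was formatted as `Instruction: ...\nOutput: ...`, so the adapters never saw a prompt shaped like the one used at inference. A minimal sketch of a shared helper (the function and constant names are hypothetical) keeps the two consistent; the inference prompt ends right after `Output:` so the model's continuation is the answer.

```python
TEMPLATE_PREFIX = "Instruction: {instruction}\nOutput:"

def format_training_example(example):
    """Same text layout as the notebook's formatting_func."""
    prompt = TEMPLATE_PREFIX.format(instruction=example["instruction"])
    return {"text": f"{prompt} {example['output']}"}

def format_inference_prompt(instruction):
    """Prompt for generation: the template up to and including 'Output:'."""
    return TEMPLATE_PREFIX.format(instruction=instruction)

sample = {"instruction": "What is the capital of France?", "output": "Paris"}
print(format_training_example(sample)["text"])
print(repr(format_inference_prompt(sample["instruction"])))
```

Passing `format_inference_prompt(prompt)` to the tokenizer in place of `prompt` in both inference cells would make the before/after comparison reflect the format the model was trained on.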


## Post-training Inference

### Subtask:
Analyze the pre-training and post-training inference results and summarize the observed effect of fine-tuning.


## Summary:

### Data Analysis Key Findings

*   The `facebook/opt-125m` model was successfully loaded with 4-bit quantization to accommodate resource constraints, along with its corresponding tokenizer.
*   QLoRA adapters were configured and applied to the model, resulting in only 589,824 trainable parameters, which constitutes approximately 0.47% of the total 125,829,120 parameters.
*   A small, synthetic instruction dataset consisting of 5 samples was created for fine-tuning purposes.
*   **Pre-training inference** for the prompt "What is the capital of France?" yielded an incorrect and repetitive response: "France is the capital of France.\nI'm not sure what you mean.\nI'm not sure what you mean.\nI'm not sure what you mean.\nI'm not sure what you mean.\nI'm not sure what". This established a baseline of the model's performance before fine-tuning.
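*   `SFTTrainer` ran after several fixes: `dataset_text_field` and `tokenizer` were dropped from its constructor (recent `trl` releases, including the installed 0.27.1, expect these via `SFTConfig` and `processing_class`), `peft_config` was dropped because the model was already a `PeftModel`, and the optimizer was changed from `"paged_adamw_8bit"` to `"adamw_torch"` after the paged 8-bit optimizer failed on CPU-resident tensors. Because `logging_steps=10` exceeded the 5 optimizer steps, no training loss was logged.
*   **Post-training inference** with the fine-tuned model for the same prompt ("What is the capital of France?") produced the same first sentence, "France is the capital of France.", but then stopped instead of repeating "I'm not sure what you mean."; the factual error remained. The shorter output may reflect training on EOS-terminated examples, but with only 5 optimizer steps this is speculative, and the answer's correctness did not improve.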

### Insights or Next Steps

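*   The current fine-tuning setup, with 5 samples and one epoch (5 optimizer steps), was insufficient to change the model's factual answer for this prompt. The comparison is also confounded: both inference cells used the bare question, whereas every training example followed the `Instruction: ...\nOutput: ...` template.
*   To observe a clearer effect of fine-tuning, expand the instruction dataset with more diverse and relevant examples and/or train for more epochs, format inference prompts with the same template used in training, and set `logging_steps` no higher than the total number of steps so the loss is recorded.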
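As a starting point for those next steps, here is a sketch of the training configuration expressed through `trl`'s `SFTConfig`, which recent releases use for SFT-specific options such as `dataset_text_field`. It assumes a `trl` version with the same API as the 0.27.1 install above, keeps the notebook's hyperparameters, and lowers `logging_steps` so that short runs still record a loss.

```python
from trl import SFTConfig

# Training options, including SFT-specific ones, go in SFTConfig
# (a subclass of transformers.TrainingArguments).
sft_config = SFTConfig(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,
    optim="adamw_torch",        # CPU-safe; "paged_adamw_8bit" needs a CUDA GPU
    learning_rate=2e-4,
    logging_steps=1,            # log every step so short runs produce a loss curve
    fp16=False,
    bf16=False,
    report_to="none",
    dataset_text_field="text",  # column produced by formatting_func
)
```

It would then be passed as `SFTTrainer(model=model, args=sft_config, train_dataset=formatted_dataset, processing_class=tokenizer)`, with `processing_class` taking the place of the old `tokenizer` argument and no `peft_config`, because `model` is already a `PeftModel`.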
