# Domain Adaptation using QLoRA

This notebook demonstrates how to:
1. Extract sentences from a technical PDF
2. Prepare causal language modeling training data
3. Fine-tune a language model using QLoRA

In [1]:
!git clone https://github.com/arminwitte/mistral-peft mistralpeft

Cloning into 'mistralpeft'...
remote: Enumerating objects: 98, done.
remote: Counting objects: 100% (98/98), done.
remote: Compressing objects: 100% (97/97), done.
remote: Total 98 (delta 57), reused 6 (delta 1), pack-reused 0 (from 0)
Receiving objects: 100% (98/98), 7.59 MiB | 36.34 MiB/s, done.
Resolving deltas: 100% (57/57), done.


In [2]:
import os
if os.getcwd() != "/kaggle/working/mistralpeft":
    os.chdir("/kaggle/working/mistralpeft")
!pwd
!git pull 

/kaggle/working/mistralpeft
Already up to date.


In [3]:
!pip install -r requirements.txt --quiet


In [4]:
from mistralpeft.utils import TextExtractor, prepare_for_training, generate_response, CLAPreprocessor 
from transformers import Trainer, TrainingArguments, AutoTokenizer
from pathlib import Path

In [5]:
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
secret_value_0 = user_secrets.get_secret("huggingface")

from huggingface_hub import login
login(secret_value_0) 


## 1. Extract Sentences from PDF

First, we'll extract and clean sentences from the dissertation PDF.

In [6]:
# Example usage
if __name__ == "__main__":
    # Local files example
    pdf_files = [
        "Dissertation.pdf",
        # "document2.pdf"
    ]
    
    # URLs example
    pdf_urls = [
        # "https://mediatum.ub.tum.de/doc/1360567/1360567.pdf",
        # "https://mediatum.ub.tum.de/doc/1601190/1601190.pdf",
        # "https://example.com/doc1.pdf",
        # "https://example.com/doc2.pdf"
    ]
    
    with TextExtractor("output/processed_documents.json") as extractor:
        # Process local files
        extractor.process_documents(pdf_files)
        
        # Process URLs
        extractor.process_documents(pdf_urls, url_list=True)





Processing documents: 100%|██████████| 1/1 [00:09<00:00,  9.62s/it]
Processing documents: 0it [00:00, ?it/s]
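
The internals of `TextExtractor` are not shown in this notebook. As a rough illustration of what "extract and clean sentences" can involve, here is a minimal sketch that repairs two common kinds of PDF extraction damage and then splits on sentence-ending punctuation. The function names `clean_text` and `split_sentences` are hypothetical and are not part of `mistralpeft`.

```python
import re

def clean_text(raw: str) -> str:
    """Undo common PDF extraction damage before sentence splitting."""
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)  # re-join words hyphenated across line breaks
    text = re.sub(r"\s+", " ", text)             # collapse newlines and repeated spaces
    return text.strip()

def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' when an uppercase letter follows."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text)
    return [p for p in parts if p]

raw = "Heat trans-\nfer is unsteady.  The flow\npulsates. Why? Because of acoustics."
sentences = split_sentences(clean_text(raw))
print(sentences)
```

A splitter this simple will break on abbreviations such as "Prof. Wolfgang", so a real pipeline would likely need an abbreviation list or a dedicated sentence tokenizer.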


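## 2. Prepare Causal LM Training Data

Now we'll tokenize the extracted text and build the training examples. Llama is a causal (next-token prediction) model, so these are causal language modeling examples rather than masked ones.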

In [7]:
model_name = "meta-llama/Llama-3.2-3B"  # Or the specific quantized version if you are using one.
tokenizer = AutoTokenizer.from_pretrained(model_name)

tokenizer_config.json:   0%|          | 0.00/50.5k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/9.09M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/301 [00:00<?, ?B/s]

In [8]:
json_file_paths = ["output/processed_documents.json"]
preprocessor = CLAPreprocessor(json_file_paths, tokenizer)
dataset = preprocessor.preprocess()

Token indices sequence length is longer than the specified maximum sequence length for this model (152682 > 131072). Running this sequence through the model will result in indexing errors


In [9]:
print(f"Created {len(dataset)} examples")

# Preview a training example
example = dataset[0]
print("\nExample input:")
print(preprocessor.tokenizer.decode(example['input_ids']))

Created 38 examples

Example input:
<|begin_of_text|>Technische Universität München Institut für Energietechnik Professur für Thermofluiddynamik Dynamics of Unsteady Heat Transfer and Skin Friction in Pulsating Flow Across a Cylinder Armin Witte Vollständiger Abdruck der von der Fakultät für Maschinenwesen der Technischen Universität München zur Erlangung des akademischen Grades eines DOKTOR – INGENIEURS genehmigten Dissertation. Vorsitzender: Prof. Dr.-Ing. Harald Klein Prüfer der Dissertation: 1. Prof. Wolfgang Polifke, Ph.D. 2. Prof. Dr.-Ing. Jens von Wolfersdorf Die Dissertation wurde am 26.004.02018 bei der Technischen Universität München eingereicht und durch die Fakultät für Maschinenwesen am 09.010.02018 angenommen. Acknowledgments This thesis was conceived at the Thermo-Fluid Dynamics Group of the Technical University of Munich during my time as a research assistant. Financial support was provided by Deutsche Forschungsgemeinschaft (DFG), project PO 710/15-1. First of all, I w

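The tokenizer warning shows that the document was tokenized as one sequence of 152,682 tokens, and the resulting dataset holds 38 examples. One way to arrive at that count is to cut the token stream into consecutive blocks of 4,096 tokens. The sketch below assumes that block size and a labels-equal-inputs convention; neither is confirmed from the `CLAPreprocessor` source, and `chunk_token_ids` is a hypothetical helper.

```python
import math

def chunk_token_ids(token_ids, block_size):
    """Split one long token sequence into consecutive blocks of at most block_size tokens."""
    examples = []
    for start in range(0, len(token_ids), block_size):
        block = list(token_ids[start:start + block_size])
        examples.append({
            "input_ids": block,
            "attention_mask": [1] * len(block),
            # For causal LM training the labels are the inputs themselves;
            # Hugging Face causal LM heads shift them by one position internally.
            "labels": list(block),
        })
    return examples

# Stand-in for the tokenized dissertation: 152,682 dummy token ids.
token_ids = list(range(152_682))
examples = chunk_token_ids(token_ids, block_size=4096)  # block size is an assumption
print(len(examples), math.ceil(152_682 / 4096))
```

With these assumptions the last block is shorter than the others, so a real preprocessor would either pad it or drop it.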
## 3. Load and Prepare Model

We'll now load the base model and prepare it for QLoRA fine-tuning.

In [10]:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

In [11]:
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,  # Use load_in_8bit=True for 8-bit quantization
)

base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    quantization_config=quantization_config,
)

config.json:   0%|          | 0.00/844 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/20.9k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/4.97G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/1.46G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/185 [00:00<?, ?B/s]
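
To see why 4-bit loading matters on a Kaggle GPU, the shard sizes in the download log can be turned into a rough memory estimate. This is back-of-the-envelope arithmetic only: it assumes the shards hold 16-bit weights and that 1 GB is 10^9 bytes, and it ignores activations, optimizer state, and any layers that stay unquantized (embeddings and norms typically do), so the 4-bit figure is a lower bound.

```python
# Rough weight-memory estimate from the shard sizes printed in the download log.
shard_sizes_gb = [4.97, 1.46]   # model-00001 and model-00002, assumed to store 16-bit weights
bytes_per_param_16bit = 2

total_gb = sum(shard_sizes_gb)
# GB divided by bytes per parameter gives billions of parameters.
approx_params_billion = total_gb / bytes_per_param_16bit

def weight_memory_gb(params_billion, bits_per_param):
    """Memory needed to hold the weights alone at a given precision."""
    return params_billion * bits_per_param / 8

print(f"~{approx_params_billion:.2f}B parameters")
print(f"16-bit: ~{weight_memory_gb(approx_params_billion, 16):.2f} GB")
print(f" 4-bit: ~{weight_memory_gb(approx_params_billion, 4):.2f} GB (lower bound)")
```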

In [12]:
from datasets import Dataset
train_test_set = dataset.train_test_split(test_size=0.1)

In [13]:
train_test_set 

DatasetDict({
    train: Dataset({
        features: ['input_ids', 'attention_mask', 'labels'],
        num_rows: 34
    })
    test: Dataset({
        features: ['input_ids', 'attention_mask', 'labels'],
        num_rows: 4
    })
})
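
The split above is random, so the four held-out examples change from run to run unless a seed is fixed (`train_test_split` accepts a `seed` argument for this). The sketch below mimics the idea in plain Python. The rounding convention (test size rounded up, train size rounded down) is an assumption that happens to reproduce the 34/4 sizes shown above, and `split_sizes` and `seeded_split` are hypothetical helpers.

```python
import math
import random

def split_sizes(n, test_size):
    """Assumed rounding: test size rounds up, train size rounds down."""
    n_test = math.ceil(n * test_size)
    n_train = math.floor(n * (1 - test_size))
    return n_train, n_test

def seeded_split(indices, test_size, seed):
    """Shuffle with a fixed seed so the same examples land in the test set on every run."""
    n_train, n_test = split_sizes(len(indices), test_size)
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:n_train + n_test]

train_idx, test_idx = seeded_split(range(38), test_size=0.1, seed=0)
print(len(train_idx), len(test_idx))
```

With only four validation examples, the validation loss reported later is a noisy estimate, which is another reason to keep the split reproducible.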

In [14]:
# Prepare for LoRA training
model = prepare_for_training(
    base_model,
    lora_r=4,
    lora_alpha=16,
    lora_dropout=0.05
)
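
With `lora_r=4`, each adapted linear layer gains two small matrices, A (r x in) and B (out x r), and the update is scaled by `lora_alpha / lora_r = 4`. The training log reports 2,293,760 trainable parameters. The sketch below reproduces that number from the layer shapes in the Llama-3.2-3B config that appears in the log output, under the assumption that `prepare_for_training` attaches adapters to all four attention projections. That choice of target modules is inferred from the count, not confirmed from the repository.

```python
# Shapes taken from the Llama-3.2-3B config printed in the notebook's log output.
hidden_size = 3072
num_layers = 28
head_dim = 128
num_kv_heads = 8
kv_dim = num_kv_heads * head_dim  # 1024: output width of k_proj and v_proj

# (in_features, out_features) of each attention projection.
# ASSUMPTION: adapters sit on all four projections; not confirmed from the repo.
target_shapes = {
    "q_proj": (hidden_size, hidden_size),
    "k_proj": (hidden_size, kv_dim),
    "v_proj": (hidden_size, kv_dim),
    "o_proj": (hidden_size, hidden_size),
}

def lora_params(in_features, out_features, r):
    """A is (r x in) and B is (out x r), so one adapter adds r * (in + out) weights."""
    return r * (in_features + out_features)

r = 4
per_layer = sum(lora_params(i, o, r) for i, o in target_shapes.values())
total = per_layer * num_layers
print(f"{total:,}")
```

Doubling `lora_r` doubles this count, so the rank is the main knob for trading adapter capacity against memory.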

## 4. Train the Model

Now we'll fine-tune the model on our domain-specific data.

In [15]:
os.environ["WANDB_DISABLED"] = "true"

In [16]:
# Set up training arguments
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=1,
    learning_rate=3e-4,
    fp16=True,
    logging_steps=1,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    optim="paged_adamw_8bit",
    log_level="info"
)

# Initialize trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_test_set['train'],
    eval_dataset=train_test_set['test']
)

# Start training

trainer.train()

Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
Using auto half precision backend
***** Running training *****
  Num examples = 34
  Num Epochs = 3
  Instantaneous batch size per device = 1
  Total train batch size (w. parallel, distributed & accumulation) = 1
  Gradient Accumulation steps = 1
  Total optimization steps = 102
  Number of trainable parameters = 2,293,760
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
  return fn(*args, **kwargs)

| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1     | 4.3042        | 4.493795        |
| 2     | 3.7351        | 4.006399        |
| 3     | 3.7837        | 3.864564        |
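
Cross-entropy losses are easier to interpret as perplexity, the exponential of the mean loss. A small sketch using the validation losses reported above, assuming the loss is in nats (natural-log cross-entropy, which is what PyTorch's cross-entropy loss returns):

```python
import math

# Validation losses per epoch, copied from the training results above.
val_loss = {1: 4.493795, 2: 4.006399, 3: 3.864564}

# Perplexity is exp(mean cross-entropy) when the loss is measured in nats.
perplexity = {epoch: math.exp(loss) for epoch, loss in val_loss.items()}

for epoch, ppl in perplexity.items():
    print(f"epoch {epoch}: val loss {val_loss[epoch]:.3f} -> perplexity {ppl:.1f}")
```

The validation perplexity drops with every epoch, although the values are still high and rest on only four validation examples.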



***** Running Evaluation *****
  Num examples = 4
  Batch size = 1
Saving model checkpoint to ./results/checkpoint-34

TrainOutput(global_step=102, training_loss=4.82568637530009, metrics={'train_runtime': 1834.3261, 'train_samples_per_second': 0.056, 'train_steps_per_second': 0.056, 'total_flos': 7071650549858304.0, 'train_loss': 4.82568637530009, 'epoch': 3.0})
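
The step count and throughput in the log follow directly from the training arguments. The sketch below redoes that arithmetic; the single-device assumption is inferred from the total train batch size of 1 in the log.

```python
import math

num_examples = 34
num_epochs = 3
per_device_batch_size = 1
gradient_accumulation_steps = 1
num_devices = 1  # assumption: the log's total batch size of 1 implies one device

effective_batch = per_device_batch_size * gradient_accumulation_steps * num_devices
steps_per_epoch = math.ceil(num_examples / effective_batch)
total_steps = steps_per_epoch * num_epochs

train_runtime_s = 1834.3261  # from TrainOutput
seconds_per_step = train_runtime_s / total_steps

print(total_steps)
print(f"{seconds_per_step:.1f} s per optimization step")
```

At roughly 18 seconds per step, raising `gradient_accumulation_steps` would not speed training up, but it would smooth the per-step loss, which is noisy with a batch size of 1.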

In [17]:
from peft import PeftModel

# `model` is the trained PeftModel (Llama-3.2-3B base model + LoRA adapter).

# Save only the LoRA adapter weights (recommended):
lora_save_path = "lora_weights"  # Directory to save the LoRA weights
model.save_pretrained(lora_save_path)

# To save a merged model instead (less common), merge the adapter into the base weights first.
# Merging into a 4-bit quantized base can lose precision, so reloading the base in 16-bit is safer.
# merged_model_save_path = "path/to/save/merged_model"
# model.merge_and_unload().save_pretrained(merged_model_save_path)



loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--meta-llama--Llama-3.2-3B/snapshots/13afe5124825b4f3751f836b40dafda64c1ed062/config.json
Model config LlamaConfig {
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 3072,
  "initializer_range": 0.02,
  "intermediate_size": 8192,
  "max_position_embeddings": 131072,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 24,
  "num_hidden_layers": 28,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": {
    "factor": 32.0,
    "high_freq_factor": 4.0,
    "low_freq_factor": 1.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3"
  },
  "rope_theta": 500000.0,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.47.0",
  "u

In [18]:
from peft import PeftConfig

# Load the instruct model
model_name = "meta-llama/Llama-3.2-3B-Instruct"  # Or your instruct model
instruct_tokenizer = AutoTokenizer.from_pretrained(model_name)
instruct_model = AutoModelForCausalLM.from_pretrained(model_name)

# Load the LoRA configuration and weights
lora_weights_path = "lora_weights"  # Path to the saved LoRA weights
peft_config = PeftConfig.from_pretrained(lora_weights_path)  # For inspection only; not used below
# Apply the adapter, which was trained on the base model, to the instruct model
lora_instruct_model = PeftModel.from_pretrained(instruct_model, lora_weights_path)

tokenizer_config.json:   0%|          | 0.00/54.5k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/9.09M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/296 [00:00<?, ?B/s]

loading file tokenizer.json from cache at /root/.cache/huggingface/hub/models--meta-llama--Llama-3.2-3B-Instruct/snapshots/0cb88a4f764b7a12671c53f0838cd831a0843b95/tokenizer.json
loading file tokenizer.model from cache at None
loading file added_tokens.json from cache at None
loading file special_tokens_map.json from cache at /root/.cache/huggingface/hub/models--meta-llama--Llama-3.2-3B-Instruct/snapshots/0cb88a4f764b7a12671c53f0838cd831a0843b95/special_tokens_map.json
loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--meta-llama--Llama-3.2-3B-Instruct/snapshots/0cb88a4f764b7a12671c53f0838cd831a0843b95/tokenizer_config.json
loading file chat_template.jinja from cache at None
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


config.json:   0%|          | 0.00/878 [00:00<?, ?B/s]

loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--meta-llama--Llama-3.2-3B-Instruct/snapshots/0cb88a4f764b7a12671c53f0838cd831a0843b95/config.json
Model config LlamaConfig {
  "_name_or_path": "meta-llama/Llama-3.2-3B-Instruct",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": [
    128001,
    128008,
    128009
  ],
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 3072,
  "initializer_range": 0.02,
  "intermediate_size": 8192,
  "max_position_embeddings": 131072,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 24,
  "num_hidden_layers": 28,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": {
    "factor": 32.0,
    "high_freq_factor": 4.0,
    "low_freq_factor": 1.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3"
  },
  "rope_theta": 500000.0,
  

model.safetensors.index.json:   0%|          | 0.00/20.9k [00:00<?, ?B/s]

loading weights file model.safetensors from cache at /root/.cache/huggingface/hub/models--meta-llama--Llama-3.2-3B-Instruct/snapshots/0cb88a4f764b7a12671c53f0838cd831a0843b95/model.safetensors.index.json


Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/4.97G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/1.46G [00:00<?, ?B/s]

Generate config GenerationConfig {
  "bos_token_id": 128000,
  "eos_token_id": [
    128001,
    128008,
    128009
  ]
}



Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

All model checkpoint weights were used when initializing LlamaForCausalLM.

All the weights of LlamaForCausalLM were initialized from the model checkpoint at meta-llama/Llama-3.2-3B-Instruct.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.


generation_config.json:   0%|          | 0.00/189 [00:00<?, ?B/s]

loading configuration file generation_config.json from cache at /root/.cache/huggingface/hub/models--meta-llama--Llama-3.2-3B-Instruct/snapshots/0cb88a4f764b7a12671c53f0838cd831a0843b95/generation_config.json
Generate config GenerationConfig {
  "bos_token_id": 128000,
  "do_sample": true,
  "eos_token_id": [
    128001,
    128008,
    128009
  ],
  "temperature": 0.6,
  "top_p": 0.9
}
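
The adapter was trained on the base model but is attached to the instruct model here. Mechanically this works because a LoRA adapter stores only the low-rank factors A and B, and the adapted weight is W + (alpha / r) · B · A for whatever W it is attached to. The toy example below illustrates this with 2x2 matrices in plain Python; all numbers are made up for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add_scaled(w, delta, scale):
    """Return w + scale * delta, element by element."""
    return [[wij + scale * dij for wij, dij in zip(wr, dr)] for wr, dr in zip(w, delta)]

# Toy LoRA adapter: rank r=1 on a 2x2 weight, with lora_alpha=4, so scale = alpha / r = 4.
r, lora_alpha = 1, 4
B = [[1.0], [0.0]]     # shape (out=2, r=1)
A = [[0.5, -0.5]]      # shape (r=1, in=2)
delta = matmul(B, A)   # low-rank update B @ A, shape (2, 2)
scale = lora_alpha / r

w_base = [[1.0, 0.0], [0.0, 1.0]]      # stands in for a base-model weight
w_instruct = [[1.1, 0.0], [0.2, 0.9]]  # stands in for the instruct-model weight

# The adapter stores only A and B, so the same update can be added to either weight.
print(add_scaled(w_base, delta, scale))
print(add_scaled(w_instruct, delta, scale))
```

That the shapes line up does not mean the transfer helps: the update was learned relative to the base weights, and whether it improves the instruct model is an empirical question that the test below probes.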



## 5. Test the Model

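Let's test the instruct model with some domain-specific queries, first on its own and then with the LoRA adapter applied. One caveat: `PeftModel.from_pretrained` injects the adapter layers into the model object it is given, so `instruct_model` may already include the adapter in the first run. Wrapping that run in `with lora_instruct_model.disable_adapter():` would give a cleaner baseline.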

In [19]:
def test_qanda(model, tokenizer):
    # Example queries about heat transfer and fluid dynamics
    queries = [
        "Explain the CFD/SI method.",
        "What is the Strouhal number?",
        "For a cylinder in cross flow, above which Reynolds number does vortex shedding occur?",
        "What is a Rijke tube?",
        "How is a finite impulse response computed?"
    ]
    
    # Generate responses
    for query in queries:
        print(f"\nQuery: {query}")
        response = generate_response(
            model,
            tokenizer,
            query,
            max_new_tokens=512,
            temperature=0.7
        )
        print(f"Response: {response}")
        print("-" * 80)

In [20]:
test_qanda(instruct_model, instruct_tokenizer)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.



Query: Explain the CFD/SI method.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Response: in of the following areas of engineering
## Step 1: Define the CFD/SI method
The CFD/SI method, also known as the Control Volume Finite Difference method, is a numerical technique used to solve partial differential equations (PDEs) in various areas of engineering, such as fluid dynamics, heat transfer, and mass transfer.

## Step 2: CFD/SI method for fluid dynamics
In fluid dynamics, CFD/SI is used to solve Navier-Stokes equations, which describe the motion of fluids. The method discretizes the control volume into smaller elements, and the governing equations are solved using finite difference methods.

## Step 3 CFD/SI for heat transfer
In heat transfer, CFD/SI is used to solve heat equations, which describe the transfer of heat. The method discretizes the control volume into smaller elements, and the governing equations are solved using finite difference methods.

## 4 CFD/SI for mass transfer
In mass transfer, CFD/SI is used to solve mass equations, which describe the tran

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Response: ?
The Strouhal number is a dimensionless quantity used to characterize the nature of vortex flow. It is defined as the ratio of the frequency of vortex to the speed of fluid. Mathematically, it is expressed as:
S = fU
where f is frequency and U is speed of fluid.
 The Strouhal number is an important dimensionless in fluid mechanics because it can be used to identify the type of vortex flow. It is typically denoted by symbol. The Strual number is defined as ratio of frequency vortex to speed fluid. is used characterize flow which vortex are periodic or unsteady, whereas flow with vortex are steady is characterized by Strual.. is used characterize flow which vortex are periodic or unsteady, whereas flow with vortex are steady is characterized by.. is used characterize flow which vortex are periodic or unsteady, whereas flow with vortex are steady is characterized by.. is used characterize flow which vortex are periodic unsteady, whereas flow with vortex steady characterized by.

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Response: ## Step 1: Identify the Reynolds number
The Reynolds number is a dimensionless quantity used to predict flow patterns in different fluid flow. It is defined as the ratio of inertial forces to viscous forces.

## Step 2: Define the Reynolds number for vortex shedding
Vortex shedding is a phenomenon that occurs when flow past a bluff body (such as a cylinder) results in separation of flow, leading to vortex formation. Reynolds number below which shedding occurs is around 40.

## Step 3: Understand the cross flow and vortex shedding relation
Cross flow occurs when flow is perpendicular to the surface of the cylinder. Reynolds number above which vortex shedding occurs in cross flow is around 40.

The final answer is: $\boxed{40}$.
--------------------------------------------------------------------------------

Query: What is a Rijke tube?


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Response: ?
A Rijke tube is a device used in the field of acoustics, particularly in the study of sound waves and vibrations. It consists of a tube that is filled with a gas, such as air or helium, and is designed to produce a specific resonant frequency at which the gas inside the tube vibrates in a way that creates a distinct sound.

The Rijke tube is named after its inventor, the Dutch physicist H. van Rijke, who first described the device in 1883. The tube is typically cylindrical in shape and has a diameter of around 1-10 cm (0.4-4 inches). The gas inside the tube is usually at a pressure that is lower than atmospheric pressure, allowing the gas to expand and contract more easily.

When the Rijke tube is excited by an external source, such as a loudspeaker or a vibrating rod, the gas inside the tube vibrates in a way that produces a distinct frequency. This frequency is determined by the dimensions of the tube and the properties of the gas. The Rijke tube is often used in research

In [21]:
test_qanda(lora_instruct_model, instruct_tokenizer)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.



Query: Explain the CFD/SI method.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Response: CFD stands for Computational Fluid Dynamics, SI stands for Simulation, and in this method the flow of fluid is modeled by the equations of fluid dynamics. is of great interest in engineering field. CFD is a method used to model fluid flow and heat transfer in a computational domain. is widely used in various applications such as in aerodynamics, hydro, and heat transfer.. The equations are discretized and solved numerically the model. is to to model fluid flow and heat transfer using the equations fluid dynamics in a computational domain. is an approach to solve problems that require complex or nonlinear dynamics. is used a range of applications including fluid mechanics, acoustics, and heat transfer. is used to model fluid flow heat transfer in various domains.. The CFD/SI method is widely used in various fields including engineering,. The method is based on numerical methods that solve equations fluid dynamics a computational. is used model fluid flow heat transfer a comput

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Response: The Strouhal number is a dimensionless quantity used to describe the frequency of oscillations in fluid flows. It is defined as the ratio of the frequency of oscillations to the velocity of flow. is the frequency of oscillations and is the velocity of flow.
The Strhal number is named after the German physicist Heinrich Strhal who studied the flow of fluid around cylinders in the early th century. his work on the nature of flow around cylinders laid the foundation for the study of turbulent flow and the development of the Strhal number.
 The Strhal number is used to describe various phenomena such as oscillations in flow, wave propagation, and vortex. is used to characterize the nature of flow in different regimes such as lamin flow turbulent flow.
 The Strhal number has a range of applications in various fields such as aerodynamics fluid mechanics and heat transfer. is used to design and analyze flows around structures cylinders blades and other devices.
 In practical applica

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Response: 1) above 5) below 
*) above ) below
.4) above ) below
.5) above) below

Answer:.) above
Explanation:5 Reynolds is number for flow above surface where v shedding occurs. below is number for flow above surface where v shedding occurs Reynolds number is the ratio of inertial to viscous forces flow Reynolds number = \(\nu\)/\,u\), where u is velocity u is velocity v is velocity Reynolds number 5 is critical Reynolds number flow above surface where v shedding occurs below surface where v shedding not occurs5 is the number flow above surface where v shedding occurs Reynolds number =\,/u, v is velocity u is velocity Reynolds 5  = 5 Reynolds 5 5 55  55 5 555 5 5  5 5 555  5  5 5 55  5 5 55 555 5 5 55 55 5     55 5  55555 5555 55555 5555   55 55 5 55 55 5 55 5555     5555555 5 5 55 5555  5555555   55555 55555  55555555   55555555   5555555555  5555555555    555555555555555   555555555555555   5555555555555555  555555555555555   5555555555555555  555555555555555  555555555555555  55555

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Response: ?
A Rijke tube, also known as a Rijke tube resonator, is a resonant cavity used in resonant tube or tube amplifier, and is used to amplify radio signals. The Rijke tube was invented by Dutch engineer Simon Rijke in 1906. The Rijke tube consists of a cylindrical metal tube with a length and diameter that resonate at a specific frequency. The Rijke tube can be tuned to a specific frequency by adjusting the length of the tube. The Rijke tube is also used in other applications such as radio, radar, and communication systems. Rijke tubes are relatively inexpensive and easy to build, making them a popular choice for amateur radio and experimentation. The Rijke tube is a simple and effective way to build a resonant circuit, and is a fundamental component in many types of amplifiers. 
The Rijke tube is characterized by the following features:
1 Resonant frequency: The frequency at which the Rijke tube resonates.
2 Length: The length of the tube, which determines the resonant frequenc
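
Several of the responses above degenerate into repeated fragments. One simple way to flag this automatically is a distinct n-gram ratio: the fraction of n-grams in a response that are unique. This is a generic heuristic, not something the notebook or `mistralpeft` provides, and `distinct_n` is a hypothetical helper.

```python
def distinct_n(text: str, n: int = 2) -> float:
    """Fraction of n-grams that are unique; values near 0 indicate heavy repetition."""
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 1.0
    return len(set(ngrams)) / len(ngrams)

healthy = "The Strouhal number relates shedding frequency to flow velocity and body size."
degenerate = "5 5 5 5 5 5 5 5 5 5 5 5"

print(round(distinct_n(healthy), 2))
print(round(distinct_n(degenerate), 2))
```

Applied to each `response` inside `test_qanda`, a low score would point at runs where a repetition penalty or a shorter `max_new_tokens` is worth trying. It says nothing about factual correctness, which still needs a manual check against the dissertation.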

## 6. Save the Model

Finally, let's save our fine-tuned model for later use.

In [22]:
# Save the fine-tuned LoRA adapter and the base model's tokenizer
output_dir = Path("./final_model")
model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)

print(f"Model saved to {output_dir}")


Model saved to final_model
