# LLM Fine‑Tuning

### Set Device and Model Name

In [1]:
device="cuda"
# device="cpu"
model_name="Qwen/Qwen2.5-3B-Instruct"
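Hard-coding `device="cuda"` fails on a machine without a GPU. A minimal sketch, assuming PyTorch is the backend, that falls back to the CPU automatically; the helper name `pick_device` is hypothetical and not part of the original notebook:

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the helper also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)
```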

### 1. Create pipeline from transformers

In [2]:
from transformers import pipeline

ask_llm = pipeline(
    model=model_name,
    device=device
)


Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 83.29it/s]
Device set to use cuda


### Get response from base model

In [3]:
print(ask_llm("Who is Mariya Sha ?")[0]["generated_text"])

Who is Mariya Sha ? I'm sorry, but I couldn't find any widely recognized information about a person named Mariya Sha. It's possible that this might be a very specific or obscure individual, or the name might be misspelled. Could you please provide more context or details about this person? That would help me give you a more accurate answer. If Mariya Sha is someone of particular interest to you, you might want to check their social media profiles, professional networks, or see if they have written any articles or published works where they might be mentioned by name. If you can provide more information, I'll do my best to assist you further.


### 2. Create dataset for fine-tuning

In [4]:
from datasets import load_dataset

In [5]:
raw_data = load_dataset("json", data_files='mariya.json')
raw_data

DatasetDict({
    train: Dataset({
        features: ['prompt', 'completion'],
        num_rows: 236
    })
})

In [6]:
raw_data['train'][0]

{'prompt': 'Who is  Mariya Sha ?',
 'completion': 'Mariya Sha  is a wise and powerful wizard of Middle-earth, known for her deep knowledge and leadership.'}
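The layout of `mariya.json` itself is not shown. A minimal sketch, assuming it is a JSON Lines file (one object per line) with `prompt` and `completion` keys, which is one layout the `json` loader accepts. The file name `example_dataset.json` and both helpers are hypothetical:

```python
import json
import os
import tempfile

# One record in the same shape as raw_data['train'][0] above.
records = [
    {
        "prompt": "Who is  Mariya Sha ?",
        "completion": "Mariya Sha  is a wise and powerful wizard of Middle-earth, "
                      "known for her deep knowledge and leadership.",
    },
]


def write_jsonl(rows, path):
    """Write one JSON object per line (JSON Lines)."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read the file back into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Written to a temporary folder so no real dataset is overwritten.
path = os.path.join(tempfile.mkdtemp(), "example_dataset.json")
write_jsonl(records, path)
print(sorted(read_jsonl(path)[0].keys()))
```

A file written this way could then be passed to `load_dataset("json", data_files=path)`.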

### 3. Create tokens for dataset

#### Load the tokenizer

In [7]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
)

#### Inspect the template

In [8]:
print(tokenizer.chat_template)


{%- if tools %}
    {{- '<|im_start|>system\n' }}
    {%- if messages[0]['role'] == 'system' %}
        {{- messages[0]['content'] }}
    {%- else %}
        {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
    {%- endif %}
    {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
    {%- for tool in tools %}
        {{- "\n" }}
        {{- tool | tojson }}
    {%- endfor %}
    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
    {%- if messages[0]['role'] == 'system' %}
        {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
    {%- else %}
        {{- '<|im_start|>system\nYou are Qwen, created by Alibaba C
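The template output above is cut off, but the visible branch wraps the system message in `<|im_start|>` and `<|im_end|>` markers. A simplified sketch of that ChatML layout in plain Python, assuming every message is rendered as `<|im_start|>{role}\n{content}<|im_end|>\n` and that a generation prompt opens an assistant turn. `render_chatml` is a hypothetical helper that ignores tools, not the tokenizer's implementation; with the real tokenizer, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` is the supported route.

```python
# Default system prompt as it appears in the tools branch of the printed template.
DEFAULT_SYSTEM = "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."


def render_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts in ChatML form."""
    if not messages or messages[0]["role"] != "system":
        # Assumption: the default system prompt is inserted when none is given.
        messages = [{"role": "system", "content": DEFAULT_SYSTEM}] + list(messages)
    text = ""
    for m in messages:
        text += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        text += "<|im_start|>assistant\n"  # leaves the assistant turn open
    return text


print(render_chatml([{"role": "user", "content": "Who is Mariya Sha?"}]))
```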

In [9]:
def preprocess(sample):
    sample = sample['prompt'] + ' \n ' + sample['completion']
    
    tokenized = tokenizer(
        sample,
        max_length=128,
        truncation=True,
        padding='max_length'    
    )
    tokenized['labels'] = tokenized['input_ids'].copy()
    return tokenized
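`preprocess` copies `input_ids` into `labels` unchanged, so the padded positions also count toward the loss. A common variation, sketched here in plain Python, sets the label at every padded position to `-100`, the value that PyTorch's cross-entropy loss ignores by default. The helper name `mask_padding_labels` is hypothetical. It keys off `attention_mask` rather than the pad token ID, on the assumption that the pad token can also occur as a real end-of-text token.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss


def mask_padding_labels(input_ids, attention_mask):
    """Copy input_ids to labels, ignoring positions where attention_mask is 0."""
    return [
        token if keep == 1 else IGNORE_INDEX
        for token, keep in zip(input_ids, attention_mask)
    ]


# Toy example: three real tokens followed by two padding tokens (ID 151643).
input_ids = [4340, 1521, 220, 151643, 151643]
attention_mask = [1, 1, 1, 0, 0]
print(mask_padding_labels(input_ids, attention_mask))
```

Inside `preprocess`, this would replace the `.copy()` line with `tokenized['labels'] = mask_padding_labels(tokenized['input_ids'], tokenized['attention_mask'])`.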

In [10]:
data = raw_data.map(preprocess)

In [11]:
print(data['train'][8])

{'prompt': 'How did  Mariya Sha  summon aid from Rohan?', 'completion': 'Mariya Sha  lit the beacons of Gondor to call for aid from Rohan before the  Battle of the Pelennor Fields .', 'input_ids': [4340, 1521, 220, 28729, 7755, 27970, 220, 27545, 12296, 504, 40987, 276, 30, 715, 28729, 7755, 27970, 220, 13020, 279, 387, 75444, 315, 479, 2111, 269, 311, 1618, 369, 12296, 504, 40987, 276, 1573, 279, 220, 16115, 315, 279, 23663, 2667, 269, 24580, 659, 151643, 151643, 151643, ... (output truncated; the padding ID 151643 repeats up to max_length=128)

### 4. Create PEFT and LoRA config

In [12]:
from peft import LoraConfig, get_peft_model, TaskType
from transformers import AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map = device,
    dtype = torch.float16
)

lora_config = LoraConfig(
    task_type = TaskType.CAUSAL_LM,
    target_modules = ["q_proj", "k_proj", "v_proj"]
)

model = get_peft_model(model, lora_config)

Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00,  1.09it/s]


In [13]:
model

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): Qwen2ForCausalLM(
      (model): Qwen2Model(
        (embed_tokens): Embedding(151936, 2048)
        (layers): ModuleList(
          (0-35): 36 x Qwen2DecoderLayer(
            (self_attn): Qwen2Attention(
              (q_proj): lora.Linear(
                (base_layer): Linear(in_features=2048, out_features=2048, bias=True)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=2048, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=2048, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (k_proj): lora.Linear(
                (base
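The printed `q_proj` module shows why LoRA is cheap: the frozen base layer is 2048 × 2048, while the trainable `lora_A` and `lora_B` matrices have rank 8. A short sketch of the arithmetic, using only the shapes visible above. The helper `lora_param_count` is hypothetical, and `k_proj` and `v_proj` are left out because their shapes are cut off in the output:

```python
def lora_param_count(in_features: int, out_features: int, rank: int) -> int:
    """Trainable parameters added by one LoRA pair: A is (rank x in), B is (out x rank)."""
    return rank * in_features + out_features * rank


num_layers = 36                                   # "(0-35): 36 x Qwen2DecoderLayer"
per_layer = lora_param_count(2048, 2048, rank=8)  # q_proj shapes from the printout
base_q_proj = 2048 * 2048 + 2048                  # frozen weight plus bias

print(per_layer)               # 32768
print(per_layer * num_layers)  # 1179648
print(base_q_proj)             # 4196352
```

On the actual model, `model.print_trainable_parameters()` reports the total across all targeted modules.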

### 5. Create training arguments and trainer

In [14]:
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    num_train_epochs=15,
    learning_rate=0.001,
    logging_steps=25,
    fp16=True
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=data["train"]
)


### 6. Train model

In [15]:
%%time
trainer.train()

| Step | Training Loss |
|------|---------------|
| 25   | 4.3086        |
| 50   | 0.4499        |
| 75   | 0.3339        |
| 100  | 0.2414        |
| 125  | 0.1663        |
| 150  | 0.1165        |
| 175  | 0.0751        |
| 200  | 0.059         |
| 225  | 0.0484        |
| 250  | 0.0404        |


CPU times: total: 1min 37s
Wall time: 2min 14s


TrainOutput(global_step=450, training_loss=0.33815290927886965, metrics={'train_runtime': 133.8829, 'train_samples_per_second': 26.441, 'train_steps_per_second': 3.361, 'total_flos': 7550648073584640.0, 'train_loss': 0.33815290927886965, 'epoch': 15.0})
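The `global_step=450` and throughput figures in the output can be reproduced by hand. A sketch, assuming a single GPU, no gradient accumulation, and the `TrainingArguments` default of `per_device_train_batch_size=8`, none of which the notebook sets explicitly:

```python
import math

num_rows = 236            # rows in the training split
batch_size = 8            # assumed default per-device batch size
num_epochs = 15           # num_train_epochs from TrainingArguments
train_runtime = 133.8829  # seconds, from the TrainOutput metrics

steps_per_epoch = math.ceil(num_rows / batch_size)  # last partial batch still counts
total_steps = steps_per_epoch * num_epochs
samples_per_second = num_rows * num_epochs / train_runtime

print(steps_per_epoch)               # 30
print(total_steps)                   # 450
print(round(samples_per_second, 3))  # 26.441
```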

### 7. Save LoRA adapter weights

In [16]:
trainer.save_model("./my_qwen")
tokenizer.save_pretrained("./my_qwen")

('./my_qwen\\tokenizer_config.json',
 './my_qwen\\special_tokens_map.json',
 './my_qwen\\chat_template.jinja',
 './my_qwen\\vocab.json',
 './my_qwen\\merges.txt',
 './my_qwen\\added_tokens.json',
 './my_qwen\\tokenizer.json')

### 8. Save fine-tuned model (load base + merge LoRA) and tokenizer

In [17]:
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = model_name   

# Load tokenizer from the base model
tokenizer = AutoTokenizer.from_pretrained(base)

# Load base model
model = AutoModelForCausalLM.from_pretrained(base, dtype="auto")

# Load LoRA / PEFT adapter
model = PeftModel.from_pretrained(model, "./my_qwen")

# Merge LoRA weights into the base model
model = model.merge_and_unload()   # IMPORTANT

# Save merged model + tokenizer
model.save_pretrained("./merged_qwen")
tokenizer.save_pretrained("./merged_qwen")

Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 102.52it/s]


('./merged_qwen\\tokenizer_config.json',
 './merged_qwen\\special_tokens_map.json',
 './merged_qwen\\chat_template.jinja',
 './merged_qwen\\vocab.json',
 './merged_qwen\\merges.txt',
 './merged_qwen\\added_tokens.json',
 './merged_qwen\\tokenizer.json')
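`merge_and_unload()` folds each adapter into its base weight, so the merged model needs no PEFT code at inference time. A toy sketch of the idea in plain Python, assuming the standard LoRA update `W + (alpha / r) * B @ A`. The matrices and helper names are illustrative only, not PEFT's implementation:

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(W, A, B, alpha, r):
    """Return W + (alpha / r) * (B @ A)."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


# Toy shapes: W is 2x2 and the rank is 1, so A is 1x2 and B is 2x1.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, -0.5]]
B = [[2.0], [4.0]]
x = [[3.0], [1.0]]  # column-vector input

merged = merge_lora(W, A, B, alpha=1, r=1)

# Adapter applied on the side: W x + B (A x), with scale 1 here.
side = [[wx[0] + bax[0]] for wx, bax in zip(matmul(W, x), matmul(B, matmul(A, x)))]

print(merged)             # [[2.0, -1.0], [2.0, -1.0]]
print(matmul(merged, x))  # [[5.0], [5.0]]
print(side)               # [[5.0], [5.0]]
```

Both routes give the same output, which is why the merged weights can be saved and converted like an ordinary model.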

### 9. Get response from fine-tuned model

In [18]:
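from transformers import pipeline

# "./my_qwen" holds only the LoRA adapter; with peft installed, transformers
# loads the base model named in the adapter config and attaches the adapter.
# Point both paths at "./merged_qwen" to test the merged weights instead.
ask_llm = pipeline(
    model="./my_qwen",
    tokenizer="./my_qwen",
    device=device
)
print(ask_llm("Who is Mariya Sha")[0]["generated_text"])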

Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 95.24it/s]
Device set to use cuda


Who is Mariya Sha? 
 Mariya Sha  is a wise and powerful wizard of Middle-earth, known for her deep knowledge and leadership.


### 10. Convert fine-tuned model to GGUF

A. Make sure the merged model folder contains both the model weights and the tokenizer files

B. Clone llama.cpp

    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp

C. Install Python dependencies (from the llama.cpp folder)

    pip install -r requirements.txt

D. Convert HF → GGUF (FP16)

    1. cd C:\Users\balan\OneDrive\Desktop\llama.cpp

    2. conda activate llm    (activates the "llm" conda environment)

    3. (llm) C:\Users\balan\OneDrive\Desktop\llama.cpp> python convert_hf_to_gguf.py --outtype f16 "C:\Users\balan\OneDrive\Desktop\LLM_Fine_Tuning\merged_qwen"

E. The file "Merged_Qwen-3.1B-F16.gguf" is created in the folder "C:\Users\balan\OneDrive\Desktop\LLM_Fine_Tuning\merged_qwen"
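Steps A and D can be scripted. A sketch, assuming the merged folder is expected to hold `config.json`, `tokenizer.json`, `tokenizer_config.json`, and at least one `.safetensors` weight file. The helpers `missing_files` and `build_convert_command` are hypothetical, and the command is only assembled here, not executed:

```python
import os
import sys
import tempfile

# Assumed minimum set of files; the weight file name varies, so it is checked by suffix.
REQUIRED = ("config.json", "tokenizer.json", "tokenizer_config.json")


def missing_files(model_dir):
    """Return the required files (and weights) that are absent from model_dir."""
    present = set(os.listdir(model_dir))
    missing = [name for name in REQUIRED if name not in present]
    if not any(name.endswith(".safetensors") for name in present):
        missing.append("*.safetensors")
    return missing


def build_convert_command(model_dir, outtype="f16"):
    """Assemble the llama.cpp conversion command as an argument list."""
    return [sys.executable, "convert_hf_to_gguf.py", "--outtype", outtype, model_dir]


# Demo on an empty temporary folder standing in for ./merged_qwen.
demo_dir = tempfile.mkdtemp()
print(missing_files(demo_dir))
print(build_convert_command("merged_qwen"))
```

Run from the llama.cpp folder, the list can be passed to `subprocess.run(cmd, check=True)` once `missing_files` returns an empty list.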



### 11. Create a custom Ollama model from the fine-tuned model using a Modelfile

A. Create a Modelfile that points to the GGUF file:

    # Base model file
    FROM ./Merged_Qwen-3.1B-F16.gguf

    # Optional: model parameters
    PARAMETER temperature 0.0
    PARAMETER top_p 0.9
    PARAMETER repeat_penalty 1.1
    PARAMETER num_ctx 4096

    # Set the system behavior
    SYSTEM """
    You are Qwen‑3.1B, a fine‑tuned assistant created by Bala.
    You respond clearly, concisely, and with helpful reasoning.
    Avoid hallucinations and always ask for clarification when needed.
    """

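    # Optional: custom prompt formatting (ChatML markers, matching the Qwen chat template printed in step 3)
    TEMPLATE """{{ if .System }}<|im_start|>system
    {{ .System }}<|im_end|>
    {{ end }}<|im_start|>user
    {{ .Prompt }}<|im_end|>
    <|im_start|>assistant
    """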


B. Create the Ollama model from the Modelfile:

    ollama create "qwen-3.1b-fine_tuned" -f C:\Users\balan\OneDrive\Desktop\LLM_Fine_Tuning\merged_qwen\Modelfile
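Once created, the model can also be queried from Python through Ollama's local HTTP API. A sketch, assuming the server listens on its default address `http://localhost:11434` and that the model name matches the one passed to `ollama create`. The helper names are hypothetical, and a request is only sent if `ask` is called:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default address


def build_generate_request(model, prompt):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model, prompt):
    """Send the request and return the generated text (needs a running Ollama server)."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]


req = build_generate_request("qwen-3.1b-fine_tuned", "Who is Mariya Sha?")
print(req.full_url, req.get_method())
```

With the server running, `ask("qwen-3.1b-fine_tuned", "Who is Mariya Sha?")` would return the model's reply as a string.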