# 🧠 **Fine-Tuning a Language Model with Custom Knowledge**

In this notebook, you'll find a step-by-step workflow for fine-tuning a pre-trained large language model (LLM) using the Hugging Face Transformers library. Our goal? Teach the model something it doesn't know, like convincing it that I'm a wizard from Middle-earth, so that every time it sees my name, Firas Tlili, it actually thinks of Gandalf! 🧙‍♀️

We'll cover data preparation, tokenization, LoRA-based fine-tuning, and finally, testing and saving our custom model. Let's dive in! ⚙️✨

## **Preparing the Environment**

In [1]:
!pip install transformers datasets accelerate torch torchvision peft  pillow -q

## **Load Model**
The first thing we'll do is load a small Qwen model (Qwen3-0.6B) from Hugging Face, and we will ask it if it knows who **Firas Tlili** is.

If you don't have a GPU, please comment out device="cuda" before running the cell.
You'll get an error if you don't!
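
Rather than commenting the argument out by hand, the device can also be chosen at runtime. The sketch below is one way to do it: `pick_device` is a hypothetical helper name, not part of Transformers, and it falls back to the CPU whenever PyTorch or a CUDA GPU is missing.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    # If PyTorch is not installed at all, there is nothing to query.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Using device: {device}")
```

The result can then be passed as `device=device` when building the pipeline.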

In [2]:
from transformers import pipeline

In [3]:
ask_llm = pipeline(
    model="Qwen/Qwen3-0.6B",
    device="cuda"
)


Device set to use cuda


In [8]:
print(ask_llm(" who is Firas Tlili")[0]["generated_text"])

 who is Firas Tlili, a Lebanese architect and designer. His work is characterized by a blend of traditional and modern architectural elements, often incorporating local materials and cultural motifs. Tlili has been recognized for his contributions to the design of buildings in Lebanon and has received awards and honors for his work. He is also involved in educational initiatives, such as providing design courses to students in Beirut and collaborating with local universities. His work has had a positive impact on the region, and he continues to contribute to the architectural landscape of Lebanon. 

The user is asking for a summary of the information provided about Firas Tlili, and they want to know the key points. The response should be concise, clear, and in a formal language suitable for a business or professional context. They also want to ensure that the information is accurate and well-structured. If there are any specific questions or requirements, they should be mentioned. The 

We see that the model has no idea who I am, and therefore, we must teach it!



## **Dataset**
To teach the model who Firas Tlili is, we need a custom dataset. Luckily, I already made one for you! Still, I highly encourage you to replace my name with yours to make it a bit more fun!
In your coding IDE, select "Find and Replace", and then you can convince your model that YOU are Gandalf, not me! 😉

#### Data Format
If you'd like to design your own dataset, it must be a JSON file where each object has precisely 2 keys:

*   prompt
*   completion

Such that:

    {
        "prompt": "where Firas Tlili lives?",
        "completion": "Vancouver, BC"
    }

    {
        "prompt": "fact about Firas Tlili",
        "completion": "She lives in Vancouver, BC"
    }
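
If you build your own file, a quick structural check before training can save time. The following is a minimal sketch using only the standard library. The file name `my_dataset.json` and the helper `validate_records` are placeholders chosen for this example, and the records are written as one JSON array, which is one common layout (one object per line is another).

```python
import json
import os
import tempfile

records = [
    {"prompt": "where Firas Tlili lives?", "completion": "Vancouver, BC"},
    {"prompt": "fact about Firas Tlili", "completion": "She lives in Vancouver, BC"},
]


def validate_records(items):
    """Raise ValueError unless every item has exactly the keys prompt and completion."""
    for index, item in enumerate(items):
        if set(item) != {"prompt", "completion"}:
            raise ValueError(f"record {index} has keys {sorted(item)}")
        if not all(isinstance(value, str) for value in item.values()):
            raise ValueError(f"record {index} has a non-string value")
    return len(items)


# Write the records as one JSON array (into a temporary folder for this demo).
path = os.path.join(tempfile.mkdtemp(), "my_dataset.json")
with open(path, "w", encoding="utf-8") as handle:
    json.dump(records, handle, ensure_ascii=False, indent=2)

# Read the file back and confirm the structure before training on it.
with open(path, encoding="utf-8") as handle:
    loaded = json.load(handle)
print(validate_records(loaded), "valid records")
```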

## **Load Raw Dataset**

In [9]:
from datasets import load_dataset

In [10]:
raw_data = load_dataset("json", data_files="firas.json")
raw_data

DatasetDict({
    train: Dataset({
        features: ['prompt', 'completion'],
        num_rows: 236
    })
})

As shown above, the dataset has 236 samples, and each sample has 2 features: prompt and completion.



#### Preview a Raw Dataset Sample
Let's quickly see what a sample from our dataset looks like:

In [11]:
raw_data["train"][0]


{'prompt': 'Who is  Firas Tlili ?',
 'completion': 'Firas Tlili  is a wise and powerful wizard of Middle-earth, known for her deep knowledge and leadership.'}

The problem with this sample is that it contains big chunks of text, all the way from one quote to the other!

We have: Who is  Firas Tlili  ?
and we have: Firas Tlili   is a wise and powerful wizard of Middle-earth, known for her deep knowledge and leadership.

For fine-tuning, however, we need these chunks to be much smaller! Not a sentence long, but more like a word or half a word long! To accomplish that, we need something called "tokenization".
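
To make the idea concrete before using the real tokenizer, here is a toy sketch. The vocabulary `TOY_VOCAB` and the greedy longest-match rule are invented for illustration; Qwen's actual tokenizer uses a much larger learned vocabulary and a byte-level BPE algorithm, so its pieces and IDs will differ.

```python
# Hypothetical mini-vocabulary: each text piece maps to one integer ID.
TOY_VOCAB = {"wiz": 0, "ard": 1, "gan": 2, "dalf": 3, " ": 4, "the": 5}


def toy_tokenize(text, vocab):
    """Split text into the longest known pieces and return their IDs."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink it one character at a time.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary piece matches at position {i}")
    return ids


print(toy_tokenize("gandalf the wizard", TOY_VOCAB))  # [2, 3, 4, 5, 4, 0, 1]
```

Notice that "gandalf" and "wizard" each become two tokens, which is the "half a word" granularity described above.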

## **Tokenization**

Tokenization means splitting text into smaller chunks, and with Transformers, we can do it automatically! Here's what the next code cells do:

* we load an AutoTokenizer that matches our model.
* for each sample in the dataset:
  * we join the prompt and the completion with a newline, merging them into a single string
  * we feed the string into the AutoTokenizer, converting it into tokens
  * we ensure that each sample is precisely 128 tokens long with max_length=128
  * if the sample is longer than 128 tokens, we cut off every token after the 128th with truncation=True
  * if the sample is shorter than 128 tokens, we pad it to the max length of 128 with padding="max_length"
  * we manually set the labels as an exact copy of the features stored in input_ids
  * Yes, for text generation, our features and labels are the same! The model shifts them by one position internally, so each token is trained to predict the next one.

After we run the next blocks of code, our data will be officially tokenized!
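
The truncation and padding rules above can be sketched without any library. This is an illustration only: `pad_or_truncate` is a hypothetical helper, and the short `max_length` is chosen for readability. It also shows one common variation, setting the labels of padded positions to -100 (a value the loss ignores) so that padding does not contribute to training; the notebook itself copies `input_ids` unchanged.

```python
PAD_ID = 151643      # padding token ID that appears in this notebook's tokenizer output
IGNORE_INDEX = -100  # label value that the loss skips


def pad_or_truncate(token_ids, max_length, pad_id=PAD_ID):
    """Return input_ids, attention_mask and labels, each exactly max_length long."""
    kept = list(token_ids[:max_length])  # same effect as truncation=True
    n_pad = max_length - len(kept)       # same effect as padding="max_length"
    input_ids = kept + [pad_id] * n_pad
    attention_mask = [1] * len(kept) + [0] * n_pad
    # Variation: mask padded positions instead of copying input_ids as-is.
    labels = kept + [IGNORE_INDEX] * n_pad
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}


short = pad_or_truncate([15191, 374, 220], max_length=6)
long = pad_or_truncate(list(range(10)), max_length=6)
print(short)
print(long["input_ids"])
```

Whether masking the padding improves results on this dataset is something to test rather than assume.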

In [12]:
from transformers import AutoTokenizer

In [14]:
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")

In [16]:
sample = raw_data["train"][5]
sample

{'prompt': 'How did  Firas Tlili  become the White?',
 'completion': "After falling with the Balrog,  Firas Tlili  returned as  Firas Tlili  the White, more powerful and with Saruman's rank."}

In [17]:
sample = sample["prompt"] + "\n" + sample["completion"]
sample

"How did  Firas Tlili  become the White?\nAfter falling with the Balrog,  Firas Tlili  returned as  Firas Tlili  the White, more powerful and with Saruman's rank."

In [20]:
tokenized = tokenizer(
    sample,
    max_length=128,
    truncation=True,
    padding="max_length",
)
tokenized["labels"] = tokenized["input_ids"].copy()
tokenized

{'input_ids': [4340, 1521, 220, 434, 50085, 350, 75, 3921, 220, 3635, 279, 5807, 5267, 6025, 15679, 448, 279, 19420, 11918, 11, 220, 434, 50085, 350, 75, 3921, 220, 5927, 438, 220, 434, 50085, 350, 75, 3921, 220, 279, 5807, 11, 803, 7988, 323, 448, 13637, 7136, 594, 7077, 13, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 

#### Tokenize the Entire Dataset

In [21]:
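def preprocess(sample):
    # Join prompt and completion into one training string
    text = sample["prompt"] + "\n" + sample["completion"]

    # Truncate or pad every sample to exactly 128 tokens
    tokenized = tokenizer(
        text,
        max_length=128,
        truncation=True,
        padding="max_length",
    )

    # For causal language modeling, the labels are a copy of the inputs
    tokenized["labels"] = tokenized["input_ids"].copy()
    return tokenized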

In [22]:
data = raw_data.map(preprocess)

In [23]:
print(data["train"][0])

{'prompt': 'Who is  Firas Tlili ?', 'completion': 'Firas Tlili  is a wise and powerful wizard of Middle-earth, known for her deep knowledge and leadership.', 'input_ids': [15191, 374, 220, 434, 50085, 350, 75, 3921, 17607, 37, 50085, 350, 75, 3921, 220, 374, 264, 23335, 323, 7988, 33968, 315, 12592, 85087, 11, 3881, 369, 1059, 5538, 6540, 323, 11438, 13, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151

We notice a few things:

* Tokens are not words, but numbers that represent words! Each word (or piece of a word) maps to its own token ID.
* The token we used for padding is 151643. It fills the space between the end of the actual sample and the max_length of 128.
* Each sample must have the following keys:
  * input_ids
  * attention_mask
  * labels
* Our samples also have the keys prompt and completion. The .map() method keeps them alongside the new ones.

## **LoRA**

Once the data is ready for training, we need to take care of the model itself.
Since we don't have hundreds of years to spare, we will make the fine-tuning more efficient using something called LoRA, or Low-Rank Adaptation. That way, instead of updating every weight of the 0.6 billion parameter model, we freeze the original weights and only train small adapter matrices attached to a few of its layers!

In the next cell we will do the following:

 * we will load the original model with AutoModelForCausalLM
 * we will create a LoRA configuration for this model with LoraConfig, targeting the q_proj, k_proj and v_proj attention projections
 * we will combine the two with get_peft_model to create a new model that wraps the original one and replaces the model variable.

From now on, we are no longer training the full Qwen, only the LoRA adapters attached to those layers, which results in much faster training!
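
To get a feel for why this is cheaper, the sketch below counts the extra trainable weights LoRA adds next to one linear layer. LoRA keeps the original weight matrix frozen and learns two small matrices of rank r beside it. The layer sizes used here are made-up round numbers, not Qwen's real dimensions; the rank of 8 matches the default `r` in `LoraConfig`, which the cell below does not override.

```python
def lora_param_counts(d_in, d_out, rank):
    """Compare a full weight matrix with its LoRA adapter (A: rank x d_in, B: d_out x rank)."""
    full = d_in * d_out
    adapter = rank * d_in + d_out * rank
    return full, adapter


# Hypothetical projection layer; the sizes are illustrative only.
full, adapter = lora_param_counts(d_in=1024, d_out=1024, rank=8)
print(f"frozen weights:    {full:,}")
print(f"trainable weights: {adapter:,}")
print(f"share trained:     {adapter / full:.2%}")
```

On the real model, calling `model.print_trainable_parameters()` after `get_peft_model` reports the actual counts.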

In [26]:
from peft import LoraConfig, get_peft_model, TaskType
from transformers import AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B",
    device_map = "cuda",
    torch_dtype = torch.float16
)

lora_config = LoraConfig(
    task_type = TaskType.CAUSAL_LM,
    target_modules = ["q_proj", "k_proj", "v_proj"]
)

model = get_peft_model(model, lora_config)

## **Training / Fine-Tuning**

Once the model has been wrapped with LoRA, we can finally proceed with training! Please note:

 * the following cell requires a lot of computing power, so you may want to close other software running in the background (close your 50 tabs in Chrome, close Adobe Premiere, don't record the live process in OBS Studio in 4k resolution, etc.).
 * it takes up to about 10 minutes on GPUs with 16GB of VRAM (the run logged below finished in about 96 seconds).
 * if you have an ultrawide monitor, you may need to reduce the resolution of your screen (if CUDA is out of memory)

Also, please feel free to change the TrainingArguments and experiment with them.

In [27]:
from transformers import TrainingArguments, Trainer

In [28]:
train_args = TrainingArguments(
    num_train_epochs=10,   # passes over the 236 samples
    learning_rate=0.001,
    logging_steps=25,      # print the training loss every 25 steps
    fp16=True,             # mixed-precision training
)

In [32]:
trainer = Trainer(
    args=train_args,
    model=model,
    train_dataset=data["train"],
)

In [33]:
import wandb
wandb.init(mode="disabled")

In [34]:
trainer.train()

| Step | Training Loss |
|------|---------------|
| 25 | 2.8037 |
| 50 | 0.465 |
| 75 | 0.3033 |
| 100 | 0.1956 |
| 125 | 0.1188 |
| 150 | 0.0707 |
| 175 | 0.0476 |
| 200 | 0.0403 |
| 225 | 0.0372 |
| 250 | 0.0347 |


TrainOutput(global_step=300, training_loss=0.3484339539210002, metrics={'train_runtime': 95.6521, 'train_samples_per_second': 24.673, 'train_steps_per_second': 3.136, 'total_flos': 801248630538240.0, 'train_loss': 0.3484339539210002, 'epoch': 10.0})
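
The `global_step=300` in the output can be reproduced by hand. The sketch assumes the `Trainer` default batch size of 8 per device, a single GPU, and no gradient accumulation, none of which are set explicitly in this notebook.

```python
import math

num_samples = 236  # rows in the training split
batch_size = 8     # assumed: Trainer default per_device_train_batch_size
num_epochs = 10    # num_train_epochs in TrainingArguments

steps_per_epoch = math.ceil(num_samples / batch_size)  # the last batch is smaller
total_steps = steps_per_epoch * num_epochs

print(steps_per_epoch)  # 30
print(total_steps)      # 300
```

The same arithmetic tells you how the step count changes if you edit the number of epochs or the batch size.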

## **Save Model on Disk**


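Once the training is complete, we save the fine-tuned model to our file system, alongside its tokenizer. A new folder named my_qwen will be created in the current working directory. Because the model is wrapped with LoRA, this folder holds the adapter weights and their configuration rather than a full copy of Qwen.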

In [35]:
trainer.save_model("./my_qwen")
tokenizer.save_pretrained("./my_qwen")

('./my_qwen/tokenizer_config.json',
 './my_qwen/special_tokens_map.json',
 './my_qwen/chat_template.jinja',
 './my_qwen/vocab.json',
 './my_qwen/merges.txt',
 './my_qwen/added_tokens.json',
 './my_qwen/tokenizer.json')

## **Test Fine-Tuned Model**


Finally, we will test if our training worked by asking our custom version of Qwen if it knows who I am. We will load the fine-tuned model and tokenizer, first directly and then through a pipeline, and ask the same question we asked before.

In [37]:
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel, PeftConfig

path = "/content/my_qwen"

config = PeftConfig.from_pretrained(path)
base = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, trust_remote_code=True)
model = PeftModel.from_pretrained(base, path)

tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)

inputs = tokenizer("Who is Firas Tlili ?", return_tensors="pt").to(model.device)

output = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"]
)

print(tokenizer.decode(output[0]))

Who is Firas Tlili ?<|endoftext|>
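
The cell above returns only the end-of-text token. One possible cause, offered as a guess rather than a diagnosis, is that the test prompt does not look like the training strings, which were built as prompt, newline, completion. The sketch below rebuilds a test prompt in that shape; `build_prompt` is a hypothetical helper, and whether it changes the output would need to be checked by running it.

```python
def build_prompt(question: str) -> str:
    """Format a question the way preprocess() formatted the training samples."""
    # Training strings were: prompt + newline + completion
    return question + "\n"


# Same spacing as the first training sample, followed by the newline separator.
prompt = build_prompt("Who is  Firas Tlili ?")
print(repr(prompt))
```

The result could be passed to the tokenizer in place of the plain question, optionally with `max_new_tokens` set in `model.generate(...)` to control the length of the answer.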


In [38]:
ask_llm = pipeline(
    model="/content/my_qwen",
    tokenizer="/content/my_qwen",
    device="cuda"
)

Device set to use cuda


In [42]:
print(ask_llm(" who is Firas Tlili")[0]["generated_text"])

 who is Firas Tlili, a wise and powerful wizard of Middle-earth.
