# Unsloth Fine-tuning DeepSeek R1 Distilled Llama 8B

This notebook demonstrates how to fine-tune `DeepSeek-R1-Distill-Llama-8B` with Unsloth on a cybersecurity dataset.


## Why do we need LLM fine-tuning?

Fine-tuning adapts an existing pretrained model to a particular task or domain. A general-purpose model that has been fine-tuned on domain data typically performs better on that domain's tasks, which makes it more useful in real-world applications.


In [None]:
%%capture
!pip install unsloth
# Also get the latest nightly Unsloth!
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git
!pip install bitsandbytes unsloth_zoo

## Choose a Base Model

1. Choose a model that aligns with your use case.
2. Assess your storage, compute capacity, and dataset.
3. Select a model and its parameters.
4. Choose between base and instruct models.
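
For step 2, a back-of-the-envelope estimate of the weight size at different precisions can help decide whether a model fits on the available GPU. The sketch below is only a rough guide: the helper `estimate_weight_memory_gb` and the round figure of 8 billion parameters are assumptions made for illustration, and the estimate ignores activations, optimizer state, and any layers that a quantized checkpoint keeps in higher precision.

```python
def estimate_weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Rough size of the model weights alone, in gigabytes (10**9 bytes)."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 1e9


# Assumed round figure: about 8 billion parameters for an "8B" model.
NUM_PARAMS = 8e9

for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    size_gb = estimate_weight_memory_gb(NUM_PARAMS, bits)
    print(f"{label}: ~{size_gb:.0f} GB for the weights alone")
```

Under these assumptions, 4-bit weights need roughly a quarter of the memory of 16-bit weights, which is why 4-bit loading is a common choice on a 16 GB GPU.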


In [None]:
from unsloth import FastLanguageModel
import torch

max_seq_length = 2048  # Choose any! We auto support RoPE Scaling internally!
dtype = None  # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True  # Use 4bit quantization to reduce memory usage. Can be False.

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.2.4: Fast Llama patching. Transformers: 4.48.2.
   \\   /|    GPU: Tesla T4. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post2. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!



## Inference before fine-tuning


In [None]:
prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a cybersecurity expert with advanced knowledge in cyber threat intelligence.
Please answer the following cybersucurity question.

### Question:
{}

### Response:
<think>{}"""
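
The template has two `{}` placeholders: the first takes the question, and the second is left empty at inference time so that the model continues writing right after `<think>`. The sketch below illustrates this with a shortened stand-in template (`demo_template` and `demo_question` are hypothetical names, not part of the notebook) and shows how splitting on the `### Response:` marker isolates the reply.

```python
# Shortened stand-in for the notebook's template (hypothetical, for illustration only).
demo_template = """### Question:
{}

### Response:
<think>{}"""

demo_question = "What is cybersquatting?"

# The second placeholder is left empty so the model continues after "<think>".
demo_prompt = demo_template.format(demo_question, "")
print(demo_prompt)

# Everything after the marker is treated as the reply when decoding the output.
reply_part = demo_prompt.split("### Response:")[1]
print(repr(reply_part))
```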

In [None]:
question = "A cybersquatting domain save-russia[.]today is launching DoS attacks on Ukrainian news sites."


FastLanguageModel.for_inference(model)
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])


<think>
Okay, so I'm trying to figure out how to respond to this question about cybersquatting and DoS attacks. Let me break it down step by step.

First, the question mentions that a domain named "save-russia[.]today" is involved in cybersquatting and launching DoS attacks on Ukrainian news sites. Cybersquatting, as I understand it, is the act of registering a domain name that infringes on someone's trademark, with the intent to profit from it. So, "save-russia" might be a trademarked term, and the domain is illegitimately using it.

Now, the domain is launching DoS attacks on Ukrainian news sites. DoS attacks, or Distributed Denial of Service, are cyberattacks that overwhelm a website's server with traffic, causing it to crash or become unavailable. This can be used as a form of cyber warfare or to suppress free speech, which in this case is targeting Ukrainian news outlets.

As a cybersecurity expert, I need to address this issue. The response should not only identify the problem b

## Prepare Dataset

The cybersecurity dataset `swaption2009/cyber-threat-intelligence-custom-data` from the Hugging Face Hub will be used to fine-tune the selected model.


In [None]:
train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a cybersecurity expert with advanced knowledge in cyber threat intelligence.
Please answer the following cybersucurity question.

### Question:
{}

### Response:
<think>
{}
</think>
{}"""

### Important Notice

It's crucial to append the EOS (End of Sequence) token to every training example. Without it, the model never learns where a response ends and generation may run on indefinitely.


In [None]:
EOS_TOKEN = tokenizer.eos_token  # Must add EOS_TOKEN


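def formatting_prompts_func(examples):
    # With batched=True, each column arrives as a list of values.
    questions = examples["text"]
    cots = examples["diagnosis"]  # used as the chain-of-thought section
    answers = examples["solutions"]
    texts = []
    for question_text, cot, answer in zip(questions, cots, answers):
        # Append EOS so the model learns where a response ends.
        text = train_prompt_style.format(question_text, cot, answer) + EOS_TOKEN
        texts.append(text)
    return {
        "text": texts,
    }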
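
When `datasets.map` is called with `batched=True`, the function receives a dictionary that maps each column name to a list of values, and it returns a dictionary of lists in the same way. The sketch below mimics that contract with plain Python so it can be run without the tokenizer or the dataset. `DEMO_EOS`, `DEMO_TEMPLATE`, `demo_format`, and the sample rows are hypothetical stand-ins, not the notebook's real values.

```python
# Hypothetical stand-ins so the sketch runs without the tokenizer or dataset.
DEMO_EOS = "<eos>"
DEMO_TEMPLATE = "### Question:\n{}\n\n### Response:\n<think>\n{}\n</think>\n{}"


def demo_format(batch):
    """Build one training string per row and append the EOS marker."""
    texts = [
        DEMO_TEMPLATE.format(question, cot, answer) + DEMO_EOS
        for question, cot, answer in zip(
            batch["text"], batch["diagnosis"], batch["solutions"]
        )
    ]
    return {"text": texts}


# A batch is a dict of column name -> list of values, one entry per row.
demo_batch = {
    "text": ["Question A", "Question B"],
    "diagnosis": ["Reasoning A", "Reasoning B"],
    "solutions": ["Answer A", "Answer B"],
}

formatted = demo_format(demo_batch)
print(formatted["text"][0])

# Every formatted example should end with the EOS marker.
assert all(text.endswith(DEMO_EOS) for text in formatted["text"])
```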

In [None]:
from datasets import load_dataset

# Specify the dataset name correctly and drop the unnecessary 'zh' argument (unless a config name is explicitly needed)
dataset = load_dataset("swaption2009/cyber-threat-intelligence-custom-data", split="train[0:500]", trust_remote_code=True)

print(dataset.column_names)


['id', 'text', 'entities', 'relations', 'diagnosis', 'solutions']


`SFTTrainer` is configured below with `dataset_text_field="text"`, so each training example must be a single formatted string. We therefore merge the `text`, `diagnosis`, and `solutions` columns into one prompt per example, overwriting the `text` column.


In [None]:
dataset = dataset.map(formatting_prompts_func, batched=True)
dataset["text"][0]


'Below is an instruction that describes a task, paired with an input that provides further context.\nWrite a response that appropriately completes the request.\nBefore answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.\n\n### Instruction:\nYou are a cybersecurity expert with advanced knowledge in cyber threat intelligence.\nPlease answer the following cybersucurity question.\n\n### Question:\nA cybersquatting domain save-russia[.]today is launching DoS attacks on Ukrainian news sites.\n\n### Response:\n<think>\nThe diagnosis is a cyber attack that involves the use of a cybersquatting domain save-russia[.]today to launch DoS attacks on Ukrainian news sites. The attacker targets the Ukrainian news sites as the victim, using the cybersquatting\n</think>\n1. Implementing DNS filtering to block access to known cybersquatting domains 2. Conducting regular vulnerability assessments and penetration testing to ide

## Train the model

First attach LoRA adapters to the model, then train it with Hugging Face TRL's `SFTTrainer`.


In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_alpha=16,
    lora_dropout=0,  # Supports any, but = 0 is optimized
    bias="none",  # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    random_state=3407,
    use_rslora=False,  # We support rank stabilized LoRA
    loftq_config=None,  # And LoftQ
)

Unsloth 2025.2.4 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
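
The number of trainable parameters that LoRA adds can be checked by hand: for a linear layer with `in` inputs and `out` outputs, rank `r` adds two small matrices with `r * (in + out)` parameters in total. The sketch below applies this to the seven target modules. The layer dimensions are assumptions taken from the Llama 3.1 8B architecture that this distilled model is based on; they are not stated in this notebook.

```python
# Assumed architecture values for a Llama 3.1 8B style model (not stated in this notebook).
HIDDEN = 4096          # hidden_size
INTERMEDIATE = 14336   # intermediate_size of the MLP
KV_DIM = 1024          # 8 key/value heads x 128 head_dim (grouped-query attention)
NUM_LAYERS = 32
RANK = 16              # r passed to get_peft_model

# (in_features, out_features) of each targeted linear layer.
module_shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

# LoRA adds A (r x in) and B (out x r) per module: r * (in + out) parameters.
per_layer = sum(RANK * (n_in + n_out) for n_in, n_out in module_shapes.values())
total = per_layer * NUM_LAYERS
print(f"{per_layer:,} per layer, {total:,} in total")
```

Under these assumptions the total is 41,943,040, which agrees with the "Number of trainable parameters" that Unsloth reports in the training log.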


In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    packing=False,  # Can make training 5x faster for short sequences.
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=60,
        # num_train_epochs = 1, # For longer training runs!
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=6324,
        output_dir="outputs",
        report_to="none",
    ),
)
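
The settings `learning_rate=2e-4`, `warmup_steps=5`, `max_steps=60`, and `lr_scheduler_type="linear"` describe a learning rate that ramps up linearly during warmup and then decays linearly to zero. The sketch below reproduces that shape in plain Python. `linear_lr` is a hypothetical helper written for illustration, and the exact values the trainer logs may differ slightly depending on how it indexes steps.

```python
def linear_lr(
    step: int,
    base_lr: float = 2e-4,
    warmup_steps: int = 5,
    total_steps: int = 60,
) -> float:
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = (total_steps - step) / (total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)


# Print the learning rate at a few points of the 60-step run.
for step in (0, 1, 5, 30, 60):
    print(f"step {step:>2}: lr = {linear_lr(step):.2e}")
```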


In [None]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 476 | Num Epochs = 2
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 60
 "-____-"     Number of trainable parameters = 41,943,040


Step,Training Loss
1,2.7839
2,3.0587
3,2.8721
4,2.5978
5,2.8958
6,2.2526
7,2.0732
8,1.8406
9,2.0854
10,1.5259
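
The "Total batch size" and "Num Epochs" values in the training banner follow from the trainer arguments. The arithmetic below is a small sketch using the numbers from this run. The interpretation of why the banner shows two epochs is an assumption: 60 steps consume slightly more than one pass over the 476 examples, so the run spills into a second epoch.

```python
num_examples = 476
per_device_batch_size = 2
gradient_accumulation_steps = 4
num_gpus = 1
max_steps = 60

# Examples that contribute to a single optimizer update.
effective_batch = per_device_batch_size * gradient_accumulation_steps * num_gpus

# Examples consumed over the whole run, and how many passes over the data that is.
examples_seen = effective_batch * max_steps
passes_over_data = examples_seen / num_examples

print(f"effective batch size: {effective_batch}")
print(f"examples seen: {examples_seen}")
print(f"passes over the dataset: {passes_over_data:.2f}")
```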




## Inference after fine-tuning

Let's run inference on the same question again and compare the output with the pre-fine-tuning response.


In [None]:
print(question)

A cybersquatting domain save-russia[.]today is launching DoS attacks on Ukrainian news sites.


In [None]:
FastLanguageModel.for_inference(model)
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])


<think>
The diagnosis is a cyber attack pattern involving cybersquatting domains and DoS attacks on Ukrainian news sites. The relationship between the cybersquatting domain and the DoS attacks indicates a possible relationship between the attacker and the cybersquatting domain. The attacker is likely using the cybersquatting domain to launch the
</think>
1. Implementing advanced threat detection and prevention systems that can identify and block DoS attacks in real-time. 2. Conducting regular security audits and vulnerability assessments to identify and address potential weaknesses in the network infrastructure. 3. Implementing multi-factor authentication and access controls to prevent unauthorized access to sensitive data and systems. 4. Educating employees and users on the importance of security best practices, such as avoiding suspicious links and emails, and reporting any potential security threats or<｜end▁of▁sentence｜>
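
After fine-tuning, the reply follows the training template: a reasoning section between `<think>` and `</think>`, followed by the answer and the EOS token. The sketch below shows one way to separate these parts for downstream use. `split_reasoning` is a hypothetical helper, the sample string is made up for illustration, and the default EOS string is copied from the output above; in practice `tokenizer.eos_token` would be the safer source.

```python
def split_reasoning(generated: str, eos_token: str = "<｜end▁of▁sentence｜>"):
    """Split a generated reply into (reasoning, answer)."""
    text = generated.replace(eos_token, "").strip()
    if "</think>" not in text:
        # Generation was cut off before the reasoning block was closed.
        return text.replace("<think>", "", 1).strip(), ""
    reasoning, answer = text.split("</think>", 1)
    reasoning = reasoning.replace("<think>", "", 1).strip()
    return reasoning, answer.strip()


# Made-up sample in the same shape as the fine-tuned model's output.
demo_reply = (
    "\n<think>\nThe domain is used for DoS attacks.\n</think>\n"
    "1. Block the domain.<｜end▁of▁sentence｜>"
)
reasoning, answer = split_reasoning(demo_reply)
print("Reasoning:", reasoning)
print("Answer:", answer)
```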


<a name="Save"></a>

### Saving, loading finetuned models

To save the final model as LoRA adapters, use either Hugging Face's `push_to_hub` for an online save or `save_pretrained` for a local save.

**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!


In [None]:
model.save_pretrained("lora_model")  # Local saving
tokenizer.save_pretrained("lora_model")
# model.push_to_hub("your_name/lora_model", token = "...") # Online saving
# tokenizer.push_to_hub("your_name/lora_model", token = "...") # Online saving

('lora_model/tokenizer_config.json',
 'lora_model/special_tokens_map.json',
 'lora_model/tokenizer.json')
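
To use the saved adapters later, the `lora_model` directory can be passed back to `FastLanguageModel.from_pretrained`. The sketch below wraps this in a hypothetical helper, `load_finetuned`. It assumes the same loading settings as earlier in this notebook (`max_seq_length=2048`, automatic dtype, 4-bit loading) and needs a CUDA GPU with Unsloth installed when it is actually called.

```python
def load_finetuned(adapter_dir: str = "lora_model", max_seq_length: int = 2048):
    """Reload the saved LoRA adapters and switch the model to inference mode."""
    # Imported lazily so that defining this helper does not require unsloth.
    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=adapter_dir,  # directory written by save_pretrained
        max_seq_length=max_seq_length,
        dtype=None,  # auto-detect, as when the base model was loaded
        load_in_4bit=True,
    )
    FastLanguageModel.for_inference(model)
    return model, tokenizer


# Usage (requires a GPU):
# model, tokenizer = load_finetuned("lora_model")
```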