# Installation & Model Loading

In [None]:
%%capture

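!pip install unsloth  # latest release from PyPI, with its dependencies
# Overwrite it with the newest code from GitHub, leaving the dependencies untouched
!pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git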

In [None]:
import wandb
from google.colab import userdata
from huggingface_hub import login
login(token=userdata.get('HF_TOKEN'))  # store your Hugging Face token as HF_TOKEN in Colab "Secrets"
wandb.login(key=userdata.get('WANDB_TOKEN'))  # store your Weights & Biases API key as WANDB_TOKEN in Colab "Secrets"
run = wandb.init(
    project='Fine-tune-DeepSeek-R1-Distill-Llama-8B on Model 1',
    job_type="training",
    anonymous="allow"
)

In [None]:
from unsloth import FastLanguageModel

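max_seq_length = 1024 * 10  # maximum context length in tokens (10,240)
dtype = None  # None lets Unsloth auto-detect the dtype (bfloat16 on this A100)
load_in_4bit = True  # load the base weights 4-bit quantized to reduce GPU memory use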

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
    token=userdata.get('HF_TOKEN'),
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.3.18: Fast Llama patching. Transformers: 4.49.0.
   \\   /|    NVIDIA A100-SXM4-40GB. Num GPUs = 1. Max memory: 39.557 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 8.0. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!



In [None]:
model_lora = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)

Unsloth 2025.3.18 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
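
The number of trainable parameters that Unsloth reports when training starts (41,943,040) can be reproduced by hand from the LoRA settings above. The sketch below assumes the usual Llama 3.1 8B layer dimensions (hidden size 4096, MLP size 14336, 8 key/value heads of dimension 128). These values are not printed anywhere in this notebook, so treat them as assumptions.

```python
# Assumed Llama 3.1 8B dimensions (not shown in this notebook)
hidden = 4096
intermediate = 14336
kv_dim = 8 * 128      # 8 key/value heads x head dimension 128
num_layers = 32       # matches "patched 32 layers" in the output above
r = 16                # LoRA rank passed to get_peft_model

# (in_features, out_features) of every projection listed in target_modules
shapes = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, kv_dim),
    "v_proj": (hidden, kv_dim),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, intermediate),
    "up_proj": (hidden, intermediate),
    "down_proj": (intermediate, hidden),
}

# A LoRA adapter adds two matrices, A (r x in) and B (out x r),
# so each adapted projection contributes r * (in + out) weights.
per_layer = sum(r * (n_in + n_out) for n_in, n_out in shapes.values())
total = per_layer * num_layers

print(f"{per_layer:,} per layer, {total:,} in total")
```

Under these assumptions the total comes to 41,943,040, which is the figure in the training banner.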


# Formatting Dataset

In [None]:
train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
Before answering, think carefully about the question and repeat your instructions.
### Instruction:
You are a software developer with expertise in summarizing XML code and determining the hierarchical depth of code. You have been given a recording of a coding session using asciinema in XML format, which has notable events enclosed in <event> tags. For each <event> tag, assign a hierarchical depth value and summarize the code within the <event> tag.

You will create a TXT file that summarizes the code enclosed in each <event> tag, and assigns each <event> tag a hierarchical depth value. The output should follow this format:
- Print the hierarchical depth value you would assign to the <event> (a number >= -1)
- Print a short one sentence summary of the code enclosed in the first <event> tag
- Repeat this process for each <event> tag in the XML file

Here is the logic for assigning a hierarchical depth value:
- depth = -2: **Nonsensical default value that must be changed**, since it is impossible to go down multiple events (-2 or more).
- depth = -1: Go down one event (subevent).
- depth = 0: No change (independent event in relation to previous events).
- depth >= 1: Exiting events (e.g., closing SSH, etc.).
- **Sum of depth values must equal 0**

For example, if an XML file has 4 events this is a possible TXT output:
0
User connects to remote server via SSH
-1
User runs various commands
1
User logs out of the SSH session
0
User exits the terminal

Rules:
- The first event must have a depth of 0
- The default event depth of -2 must be changed
- The sum of all depth values must equal 0
- ** Do not repeat the XML file in the output **
- ** Do not independently decide the number of events you will summarize. The number of events you will summarize equals the number of <event> tags in the XML. **
- ** Repeat the number of <event> tags within the XML. If there are more or less summaries than <event> tags, the output is incorrect. For example, if there are 4 <event> tags you must provide only 4 summaries and 4 depth values.**

This is the XML file where you will summarize and assign depth values to the enclosed contents of each <event> tag, by following the instructions I have provided you.
### XML Recording File:
{}

### Response:
<think>
{}
</think>

{}
"""

In [None]:
from google.colab import drive
import os

drive.mount('/content/drive')

txt_folder = "/content/drive/MyDrive/model_1_data"  # set to your google drive folder

input_data = []
cot_data = []
response_data = []

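# os.listdir returns files in arbitrary order; sort so the three lists line up by filename
# (assumes every recording has an XML, a CoT, and a training file sharing one prefix)
for filename in sorted(os.listdir(txt_folder)):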
    if filename.endswith("xml"):
        with open(os.path.join(txt_folder, filename), "r", encoding="utf-8") as f:
            content = f.read()
            input_data.append({"filename": filename, "content": content})
    elif filename.endswith("cot.txt"):
        with open(os.path.join(txt_folder, filename), "r", encoding="utf-8") as f:
            content = f.read()
            cot_data.append({"filename": filename, "content": content})
    elif filename.endswith("training.txt"):
        with open(os.path.join(txt_folder, filename), "r", encoding="utf-8") as f:
            content = f.read()
            response_data.append({"filename": filename, "content": content})

Mounted at /content/drive
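
The three lists are later combined row by row, so the XML, chain-of-thought, and response files of one recording must end up at the same index. A minimal sketch of an explicit pairing check follows. It assumes that the three files of a recording share a filename prefix (for example `a.xml`, `a_cot.txt`, `a_training.txt`); that naming convention and the helper `group_by_prefix` are hypothetical and not taken from the original data folder.

```python
from collections import defaultdict

# Suffixes used by the loading loop above, mapped to a role name
SUFFIXES = {"xml": "input", "cot.txt": "cot", "training.txt": "response"}

def group_by_prefix(filenames):
    """Group files by shared prefix (hypothetical naming convention)."""
    groups = defaultdict(dict)
    for name in sorted(filenames):
        for suffix, role in SUFFIXES.items():
            if name.endswith(suffix):
                # Drop the suffix and any separator left at the end of the prefix
                prefix = name[: -len(suffix)].rstrip("._-")
                groups[prefix][role] = name
                break
    complete = {p: g for p, g in groups.items() if len(g) == len(SUFFIXES)}
    incomplete = sorted(set(groups) - set(complete))
    return complete, incomplete

# Made-up filenames for illustration only
example_names = ["b.xml", "a_cot.txt", "a.xml", "a_training.txt", "b_cot.txt"]
complete, incomplete = group_by_prefix(example_names)
print("complete:", complete)
print("incomplete:", incomplete)
```

Recordings that land in `incomplete` are missing at least one of the three files and would shift every later row if they were loaded anyway.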


In [None]:
response_data[0]['content'] # check data

0
User initiates SSH connection to server 10.0.7.138
-1
User authenticates and logs into the remote machine
0
User installs asciinema package using sudo apt
1
User logs out of the remote SSH session
0
User exits the local terminal session
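
The rules stated in the prompt (first depth is 0, no depth below -1, depths sum to 0, one summary per event) can be checked mechanically on a response file like the one printed above. The following is a minimal sketch; the helper names `parse_pairs` and `check_depths` are illustrative and not part of the original notebook.

```python
def parse_pairs(text):
    """Split a response into (depth, summary) pairs from alternating lines."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if len(lines) % 2 != 0:
        raise ValueError("expected alternating depth and summary lines")
    return [(int(lines[i]), lines[i + 1]) for i in range(0, len(lines), 2)]

def check_depths(pairs, expected_events=None):
    """Return a list of rule violations (an empty list means the response is valid)."""
    errors = []
    depths = [depth for depth, _ in pairs]
    if depths and depths[0] != 0:
        errors.append("first event must have depth 0")
    if any(depth < -1 for depth in depths):
        errors.append("depth below -1 (the -2 default was not changed)")
    if sum(depths) != 0:
        errors.append(f"depths sum to {sum(depths)}, expected 0")
    if expected_events is not None and len(pairs) != expected_events:
        errors.append(f"{len(pairs)} summaries for {expected_events} events")
    return errors

# The response shown above
sample = """0
User initiates SSH connection to server 10.0.7.138
-1
User authenticates and logs into the remote machine
0
User installs asciinema package using sudo apt
1
User logs out of the remote SSH session
0
User exits the local terminal session
"""

pairs = parse_pairs(sample)
errors = check_depths(pairs, expected_events=5)
print(errors)
```

For this sample the depths are 0, -1, 0, 1, 0, so every rule holds and the list of violations is empty.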


In [None]:
from datasets import Dataset  # load_dataset is not needed; the dataset is built in memory

dataset = Dataset.from_dict({"Input": input_data, "Complex_CoT": cot_data, "Response": response_data})
dataset

Dataset({
    features: ['Input', 'Complex_CoT', 'Response'],
    num_rows: 6
})

In [None]:
EOS_TOKEN = tokenizer.eos_token

def formatting_prompts_func(examples):
    inputs = examples["Input"]
    cots = examples["Complex_CoT"]
    outputs = examples["Response"]
    texts = []
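    for inp, cot, output in zip(inputs, cots, outputs):
        # Each column holds {"filename", "content"} records; only the file content belongs in the prompt
        text = train_prompt_style.format(inp["content"], cot["content"], output["content"]) + EOS_TOKEN
        texts.append(text)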
    return {
        "text": texts,
    }

In [None]:
dataset_finetune = dataset.map(formatting_prompts_func, batched = True)


# Setting Training Arguments & Training

In [None]:
question = dataset_finetune["Input"][1]["content"]  # XML recording used to test the fine-tuned model

In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model=model_lora,
    tokenizer=tokenizer,
    train_dataset=dataset_finetune,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,

    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=16,
        num_train_epochs=1,  # ignored here: max_steps below takes precedence
        warmup_steps=5,
        max_steps=60,  # total number of optimizer steps
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
        report_to="none",  # disables the Trainer's logging integrations, so nothing is sent to the W&B run opened earlier
    ),
)


In [None]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 6 | Num Epochs = 60 | Total steps = 60
O^O/ \_/ \    Batch size per device = 1 | Gradient accumulation steps = 16
\        /    Data Parallel GPUs = 1 | Total batch size (1 x 16 x 1) = 16
 "-____-"     Trainable parameters = 41,943,040/8,000,000,000 (0.52% trained)


Unsloth: Will smartly offload gradients to save VRAM!


Step,Training Loss
10,1.0032
20,0.681
30,0.4403
40,0.2925
50,0.1387
60,0.0517
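
The banner above reports 60 epochs even though `num_train_epochs=1` was passed. This follows from `max_steps=60` combined with a dataset of only 6 examples. The arithmetic below is a sketch of how the epoch count falls out, using the values printed in the banner and assuming the Hugging Face Trainer treats an epoch shorter than one accumulation cycle as a single optimizer step.

```python
num_examples = 6        # "Num examples = 6"
per_device_batch = 1    # "Batch size per device = 1"
grad_accum = 16         # "Gradient accumulation steps = 16"
max_steps = 60          # "Total steps = 60"

# Number of micro-batches in one pass over the data (ceiling division)
batches_per_epoch = -(-num_examples // per_device_batch)

# Optimizer steps per epoch; assumed to be at least 1 even when the
# epoch has fewer micro-batches than the accumulation setting
updates_per_epoch = max(batches_per_epoch // grad_accum, 1)

# Epochs needed to reach max_steps (ceiling division)
epochs = -(-max_steps // updates_per_epoch)

print(batches_per_epoch, updates_per_epoch, epochs)
```

Under that assumption every optimizer step corresponds to roughly one full pass over the 6 examples, which is worth keeping in mind when reading the steadily falling training loss.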


In [None]:
wandb.finish()

# Testing

In [None]:
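# Switch the patched model into Unsloth's inference mode
FastLanguageModel.for_inference(model_lora)

# Fill only the XML slot of the template; the reasoning and response slots stay empty
inputs = tokenizer([train_prompt_style.format(question, "", "")], return_tensors="pt").to("cuda")

outputs = model_lora.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    #max_new_tokens=1200,
    use_cache=True,
)

response = tokenizer.batch_decode(outputs)

# Print only the text that follows the closing </think> tag of the prompt
print(response[0].split("</think>")[1])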




0
User connects to a remote server using an SSH command and enters their password.
-1
User attempts to change their password to a new value.
1
User logs out of the SSH session and prepares to exit the terminal.
0
User exits the local terminal session.

This summary and depth values correctly reflect the content of the <event> tags in the XML file, following the specified rules. The sum of the depth values (0) equals 0, and the number of summaries matches the number of <event> tags (4). The first event has a depth of 0, the default event depth of -2 has been changed, and the events logically progress down and up.<｜end▁of▁sentence｜>
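
The generation above ends with a free-text justification and the end-of-sentence token, neither of which belongs in the TXT output. A minimal sketch of a post-processing step follows: it keeps the leading run of depth and summary line pairs and stops at the first line that is not an integer. The helper `extract_pairs` is illustrative, and the sample text is an abbreviated copy of the output above.

```python
import re

EOS = "<｜end▁of▁sentence｜>"  # end-of-sentence marker as it appears in the output above

def extract_pairs(generated):
    """Return the leading (depth, summary) pairs and ignore trailing commentary."""
    lines = [ln.strip() for ln in generated.replace(EOS, "").splitlines() if ln.strip()]
    pairs = []
    i = 0
    # A pair starts with a line holding only an integer, followed by its summary
    while i + 1 < len(lines) and re.fullmatch(r"-?\d+", lines[i]):
        pairs.append((int(lines[i]), lines[i + 1]))
        i += 2
    return pairs

generated = """

0
User connects to a remote server using an SSH command and enters their password.
-1
User attempts to change their password to a new value.
1
User logs out of the SSH session and prepares to exit the terminal.
0
User exits the local terminal session.

This summary and depth values correctly reflect the content of the <event> tags in the XML file.<｜end▁of▁sentence｜>
"""

pairs = extract_pairs(generated)
for depth, summary in pairs:
    print(depth, summary, sep="\n")
```

Stopping at the first non-integer line is a deliberate simplification: it drops the trailing explanation, but it would also truncate the result if the model emitted a malformed depth in the middle of the list.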


# Saving to Hugging Face

In [None]:
# save_method="lora" stores only the LoRA adapters, not merged full-precision weights
model_lora.save_pretrained_merged("model_lora", tokenizer, save_method="lora")
model_lora.push_to_hub_merged("bria7801/Model-1", tokenizer, save_method="lora", token=userdata.get('HF_TOKEN'))

Unsloth: Saving tokenizer... Done.
Unsloth: Saving model... Done.
Unsloth: Saving LoRA adapters. Please wait...



Saved lora model to https://huggingface.co/bria7801/Model-1
