# Installation & Model Loading

In [1]:
%%capture

!pip install unsloth # install unsloth
!pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git

In [None]:
%%capture

import wandb
from google.colab import userdata
from huggingface_hub import login
login(token=userdata.get('HF_TOKEN')) # add Hugging Face token to "secrets"
wandb.login(key=userdata.get('WANDB_TOKEN')) # add Weights and Balances token to "secrets"
run = wandb.init(
    project='Fine-tune-DeepSeek-R1-Distill-Llama-8B on Model 1',
    job_type="training",
    anonymous="allow"
)

In [3]:
from unsloth import FastLanguageModel

max_seq_length = 1024 * 20 # Define max_seq_length
dtype = None
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
    token= userdata.get('HF_TOKEN'),
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.3.18: Fast Llama patching. Transformers: 4.50.0.
   \\   /|    NVIDIA A100-SXM4-40GB. Num GPUs = 1. Max memory: 39.557 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 8.0. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/5.96G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/236 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/53.0k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/483 [00:00<?, ?B/s]

In [4]:
model_lora = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)

Unsloth 2025.3.18 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.


# Formatting Dataset

In [5]:
train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
Before answering, think carefully about the question and repeat your instructions.
### Instruction:
You are a software developer with expertise in summarizing XML code and determining the hierarchical depth of code. You have been given a recording of a coding session using asciinema in XML format, which has notable events enclosed in <event> tags. For each <event> tag, assign a hierarchical depth value and summarize the code within the <event> tag.

You will create a TXT file that summarizes the code enclosed in each <event> tag, and assigns each <event> tag a hierarchical depth value. The output should follow this format:
- Print the hierarchical depth value you would assign to the <event> (a number >= -1)
- Print a short one sentence summary of the code enclosed in the first <event> tag
- Repeat this process for each <event> tag in the XML file

Here is the logic for assigning a hierarchical depth value:
- depth = -2: **Nonsensical default value that must be changed**, since it is impossible to go down multiple events (-2 or more).
- depth = -1: Go down one event (subevent).
- depth = 0: No change (independent event in relation to previous events).
- depth >= 1: Exiting events (e.g., closing SSH, etc.).
- **Sum of depth values must equal 0**

For example, if an XML file has 4 events this is a possible TXT output:
0
User connects to remote server via SSH
-1
User runs various commands
1
User logs out of the SSH session
0
User exits the terminal

Rules:
- The first event must have a depth of 0
- The default event depth of -2 must be changed
- The sum of all depth values must equal 0
- ** Do not repeat the XML file in the output **
- ** Do not independently decide the number of events you will summarize. The number of events you will summarize equals the number of <event> tags in the XML. **
- ** Repeat the number of <event> tags within the XML. If there are more or less summaries than <event> tags, the output is incorrect. For example, if there are 4 <event> tags you must provide only 4 summaries and 4 depth values.**

This is the XML file where you will summarize and assign depth values to the enclosed contents of each <event> tag, by following the instructions I have provided you.
### XML Recording File:
{}

Generate and provide ONLY the complete TXT file as the response. Do not respond with anything else.

### Response:
<think>
{}
</think>

{}
"""

In [6]:
from google.colab import drive
import os

drive.mount('/content/drive')

txt_folder = "/content/drive/MyDrive/model_1_data"  # set to your google drive folder

input_data = []
cot_data = []
response_data = []

for filename in os.listdir(txt_folder):
    if filename.endswith("xml"):
        with open(os.path.join(txt_folder, filename), "r", encoding="utf-8") as f:
            content = f.read()
            input_data.append({"filename": filename, "content": content})
    elif filename.endswith("cot.txt"):
        with open(os.path.join(txt_folder, filename), "r", encoding="utf-8") as f:
            content = f.read()
            cot_data.append({"filename": filename, "content": content})
    elif filename.endswith("training.txt"):
        with open(os.path.join(txt_folder, filename), "r", encoding="utf-8") as f:
            content = f.read()
            response_data.append({"filename": filename, "content": content})

input_data = sorted(input_data, key=lambda x: x["filename"])
cot_data = sorted(input_data, key=lambda x: x["filename"])
response_data = sorted(response_data, key=lambda x: x["filename"])

input_data = [item["content"] for item in input_data]
cot_data = [item["content"] for item in cot_data]
response_data = [item["content"] for item in response_data]

Mounted at /content/drive


In [8]:
print(response_data[0]) # check data

0
User initiates SSH connection to server 10.0.7.138
-1
User authenticates and logs into the remote machine
0
User installs asciinema package using sudo apt
1
User logs out of the remote SSH session
0
User exits the local terminal session


In [9]:
from datasets import load_dataset, Dataset

dataset = Dataset.from_dict({"Input": input_data, "Complex_CoT": cot_data, "Response": response_data})
dataset

Dataset({
    features: ['Input', 'Complex_CoT', 'Response'],
    num_rows: 6
})

In [10]:
EOS_TOKEN = tokenizer.eos_token

def formatting_prompts_func(examples):
    inputs = examples["Input"]
    cots = examples["Complex_CoT"]
    outputs = examples["Response"]
    texts = []
    for input, cot, output in zip(inputs, cots, outputs):
        text = train_prompt_style.format(input, cot, output) + EOS_TOKEN
        texts.append(text)
    return {
        "text": texts,
    }

In [11]:
dataset_finetune = dataset.map(formatting_prompts_func, batched = True)

Map:   0%|          | 0/6 [00:00<?, ? examples/s]

# Setting Training Arguments & Training

In [14]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model=model_lora,
    tokenizer=tokenizer,
    train_dataset=dataset_finetune,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,

    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=1,
        num_train_epochs=5,
        warmup_steps=5,
        max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
        report_to = "none"
    ),
)

Unsloth: Tokenizing ["text"] (num_proc=2):   0%|          | 0/6 [00:00<?, ? examples/s]

In [15]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 6 | Num Epochs = 20 | Total steps = 60
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 1
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 1 x 1) = 2
 "-____-"     Trainable parameters = 41,943,040/8,000,000,000 (0.52% trained)


Step,Training Loss
10,0.4892
20,0.3586
30,0.2573
40,0.2
50,0.1126
60,0.0885


In [None]:
wandb.finish()

# Testing

In [20]:
question = """<?xml version="1.0" ?>
<recording version="2" width="319" height="92" timestamp="1727009557">
<event depth="-2">
  <system_output timestamp="0.071459">[?2004h]0;demo@boxtop: ~demo@boxtop:~$ </system_output>
  <user_input timestamp="3.724374">s</user_input>
  <system_output timestamp="3.725312">s</system_output>
  <user_input timestamp="3.944484">s</user_input>
  <system_output timestamp="3.945402">s</system_output>
  <user_input timestamp="4.166899">h</user_input>
  <system_output timestamp="4.167862">h</system_output>
  <user_input timestamp="4.3286"> </user_input>
  <system_output timestamp="4.329581"> </system_output>
  <user_input timestamp="4.648216">1</user_input>
  <system_output timestamp="4.64918">1</system_output>
  <user_input timestamp="4.828329">0</user_input>
  <system_output timestamp="4.829196">0</system_output>
  <user_input timestamp="5.090241">.</user_input>
  <system_output timestamp="5.091185">.</system_output>
  <user_input timestamp="5.250818">0</user_input>
  <system_output timestamp="5.251636">0</system_output>
  <user_input timestamp="5.431616">.</user_input>
  <system_output timestamp="5.432483">.</system_output>
  <user_input timestamp="5.85454">7</user_input>
  <system_output timestamp="5.855438">7</system_output>
  <user_input timestamp="6.074859">.</user_input>
  <system_output timestamp="6.075734">.</system_output>
  <user_input timestamp="6.435266">1</user_input>
  <system_output timestamp="6.43612">1</system_output>
  <user_input timestamp="6.778539">3</user_input>
  <system_output timestamp="6.779415">3</system_output>
  <user_input timestamp="7.178519">7</user_input>
  <system_output timestamp="7.179402">7</system_output>
  <user_input timestamp="7.663922"></user_input>
  <system_output timestamp="7.664737">[K</system_output>
  <user_input timestamp="7.887423">8</user_input>
  <system_output timestamp="7.888286">8</system_output>
  <user_input timestamp="9.000601">
</user_input>
  <system_output timestamp="9.001515">
</system_output>
  <system_output timestamp="9.001629">[?2004l
</system_output>
  <system_output timestamp="9.586503">
demo@10.0.7.138's password: </system_output>
  <user_input timestamp="9.958319">1</user_input>
  <user_input timestamp="10.038434">M</user_input>
  <user_input timestamp="10.198845">3</user_input>
  <user_input timestamp="10.298877">T</user_input>
  <user_input timestamp="10.480309">5</user_input>
  <user_input timestamp="10.579129">6</user_input>
  <user_input timestamp="10.920891">7</user_input>
  <user_input timestamp="11.103528">!</user_input>
  <user_input timestamp="11.241265">
</user_input>
  <system_output timestamp="11.242199">
</system_output>
  <system_output timestamp="11.518481">Linux boxtop 6.6.13-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.6.13-1 (2024-01-20) x86_64


Plan your installation, and FAI installs your plan.

Last login: Sun Sep 22 12:50:30 2024 from 10.0.7.1

</system_output>
  <system_output timestamp="11.587303">[?2004h]0;demo@boxtop: ~demo@boxtop:~$ </system_output>
<annotation>
</annotation>
</event>
<event depth="-2">
  <user_input timestamp="11.985103">p</user_input>
  <system_output timestamp="11.988199">p</system_output>
  <user_input timestamp="12.146566">a</user_input>
  <system_output timestamp="12.158429">a</system_output>
  <user_input timestamp="12.308408">s</user_input>
  <system_output timestamp="12.323088">s</system_output>
  <user_input timestamp="12.490203">s</user_input>
  <system_output timestamp="12.513615">s</system_output>
  <user_input timestamp="12.748143">w</user_input>
  <system_output timestamp="12.770986">w</system_output>
  <user_input timestamp="13.071223">d</user_input>
  <system_output timestamp="13.079019">d</system_output>
  <user_input timestamp="13.352127">
</user_input>
  <system_output timestamp="13.375242">
[?2004l
</system_output>
  <system_output timestamp="14.956765">Changing password for demo.
Current password: </system_output>
  <user_input timestamp="15.932853">1</user_input>
  <user_input timestamp="16.033733">M</user_input>
  <user_input timestamp="16.195366">3</user_input>
  <user_input timestamp="16.317828">T</user_input>
  <user_input timestamp="16.517665">5</user_input>
  <user_input timestamp="16.57855">6</user_input>
  <user_input timestamp="16.937213">7</user_input>
  <user_input timestamp="17.118217">!</user_input>
  <user_input timestamp="17.297791">
</user_input>
  <system_output timestamp="17.32098">
</system_output>
  <system_output timestamp="17.416283">New password: </system_output>
  <user_input timestamp="19.239669">O</user_input>
  <user_input timestamp="19.462807">p</user_input>
  <user_input timestamp="19.663239">e</user_input>
  <user_input timestamp="19.823472">n</user_input>
  <user_input timestamp="20.165214">Y</user_input>
  <user_input timestamp="20.328604">o</user_input>
  <user_input timestamp="20.467273">u</user_input>
  <user_input timestamp="20.684951">r</user_input>
  <user_input timestamp="21.066578">H</user_input>
  <user_input timestamp="21.224623">e</user_input>
  <user_input timestamp="21.324622">a</user_input>
  <user_input timestamp="21.46358">r</user_input>
  <user_input timestamp="21.725808">t</user_input>
  <user_input timestamp="22.108267">G</user_input>
  <user_input timestamp="22.204951">P</user_input>
  <user_input timestamp="22.423659">T</user_input>
  <user_input timestamp="22.764709">
</user_input>
  <system_output timestamp="22.788938">
Retype new password: </system_output>
  <user_input timestamp="23.582787">O</user_input>
  <user_input timestamp="23.804832">p</user_input>
  <user_input timestamp="24.002847">e</user_input>
  <user_input timestamp="24.184193">n</user_input>
  <user_input timestamp="24.486491">Y</user_input>
  <user_input timestamp="24.665441">o</user_input>
  <user_input timestamp="24.804858">u</user_input>
  <user_input timestamp="25.024414">r</user_input>
  <user_input timestamp="25.449668">H</user_input>
  <user_input timestamp="25.652843">e</user_input>
  <user_input timestamp="25.77554">a</user_input>
  <user_input timestamp="25.996544">r</user_input>
  <user_input timestamp="26.261924">t</user_input>
  <user_input timestamp="26.706184">G</user_input>
  <user_input timestamp="26.765363">P</user_input>
  <user_input timestamp="27.001545">T</user_input>
  <user_input timestamp="27.340847">
</user_input>
  <system_output timestamp="27.350003">
</system_output>
  <system_output timestamp="29.339981">passwd: password updated successfully
[?2004h]0;demo@boxtop: ~demo@boxtop:~$ </system_output>
<annotation>
</annotation>
</event>
<event depth="-2">
  <user_input timestamp="31.580684"/>
  <system_output timestamp="31.585122">[?2004l

logout
</system_output>
  <system_output timestamp="31.590318">Connection to 10.0.7.138 closed.

</system_output>
  <system_output timestamp="31.591579">[?2004h</system_output>
  <system_output timestamp="31.591914">]0;demo@boxtop: ~demo@boxtop:~$ </system_output>
<annotation>
</annotation>
</event>
<event depth="-2">
  <user_input timestamp="33.647017"/>
  <system_output timestamp="33.647764">[?2004l

exit
</system_output>
<annotation>
</annotation>
</event>
</recording>"""

In [21]:
FastLanguageModel.for_inference(model_lora)

inputs = tokenizer([train_prompt_style.format(question, "", "")], return_tensors="pt").to("cuda")

outputs = model_lora.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    #max_new_tokens=1200,
    use_cache=True,
)

response = tokenizer.batch_decode(outputs)

print(response[0].split("</think>")[1])




0
User connects to remote server via SSH using the command 'ssh 10.0.7.138' and enters password
-1
User changes their password on the remote server
0
System prompts user to enter the new password and verifies it
1
User logs out of the SSH session
0
User exits the terminal

<｜end▁of▁sentence｜>


# Saving to Hugging Face

In [22]:
model_lora.save_pretrained_merged("model_lora", tokenizer, save_method = "lora",)
model_lora.push_to_hub_merged("bria7801/Model-1", tokenizer, save_method = "lora", token = userdata.get('HF_TOKEN'))

Unsloth: Saving tokenizer... Done.
Unsloth: Saving model... Done.
Unsloth: Saving LoRA adapters. Please wait...


README.md:   0%|          | 0.00/31.0 [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/168M [00:00<?, ?B/s]

README.md:   0%|          | 0.00/49.0 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

Saved lora model to https://huggingface.co/bria7801/Model-1
