# Installation & Model Loading

In [1]:
%%capture

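!pip install unsloth  # install the PyPI release together with its dependencies
# Then overlay the newest GitHub version without reinstalling dependencies
!pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git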

In [None]:
%%capture

import wandb
from google.colab import userdata
from huggingface_hub import login
login(token=userdata.get('HF_TOKEN')) # add Hugging Face token to "secrets"
wandb.login(key=userdata.get('WANDB_TOKEN')) # add Weights & Biases token to "secrets"
run = wandb.init(
    project='Fine-tune-DeepSeek-R1-Distill-Llama-8B on Model 3',
    job_type="training",
    anonymous="allow"
)

In [3]:
from unsloth import FastLanguageModel

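max_seq_length = 1024 * 10  # maximum sequence length in tokens (10,240)
dtype = None  # None lets Unsloth pick the dtype automatically
load_in_4bit = True  # load 4-bit quantized weights to reduce GPU memory use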

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
    token=userdata.get('HF_TOKEN'),
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.3.18: Fast Llama patching. Transformers: 4.50.0.
   \\   /|    NVIDIA A100-SXM4-40GB. Num GPUs = 1. Max memory: 39.557 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 8.0. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!



In [4]:
model_lora = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)

Unsloth 2025.3.18 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
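
The number of trainable LoRA parameters follows directly from the rank and the shapes of the targeted projections. The sketch below reproduces the count of 41,943,040 that Unsloth reports when training starts. The layer dimensions are an assumption taken from the published Llama-3.1-8B configuration (hidden size 4096, MLP size 14336, 8 key/value heads of size 128); they are not printed anywhere in this notebook.

```python
# Assumed Llama-3.1-8B shapes (not stated in the notebook itself).
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 1024  # 8 key/value heads x 128 head dim
NUM_LAYERS = 32
RANK = 16  # matches r=16 above

# (in_features, out_features) for each module listed in target_modules
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, rank):
    # LoRA adds A (rank x in_features) and B (out_features x rank) per module.
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in shapes.values())
total = per_layer * NUM_LAYERS
print(f"{per_layer:,} per layer, {total:,} in total")
```

Doubling `r` doubles the adapter size, so this arithmetic is a quick way to estimate the cost of a different rank before loading the model.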


# Formatting Dataset

In [5]:
train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
Before answering, think carefully about the question and repeat your instructions.

###Task:
You are given an annotated session recording in XML format that documents a series of shell commands and goals executed on a Unix-like system. The XML structure includes an `<annotations>` tag containing one or more `<layer>` elements, each with multiple `<annotation>` tags that describe individual goals and command steps. Additionally, the terminal session’s inputs and outputs are provided within `<user_input>` and `<system_output>` tags. Your task is to generate a runnable Dockerfile that replicates the documented process.

###Requirements:
1. **Input Format Awareness:**
   - The input is provided in XML.
   - The annotations are within `<annotation>` tags (nested under `<layer>` inside an `<annotations>` tag).
   - Terminal interactions are provided in `<user_input>` and `<system_output>` tags.

2. **Base Image:**
   - If the session recording specifies Alpine Linux (or if indicated by annotations or terminal outputs), use the fully qualified image:
     ```
     FROM registry.hub.docker.com/library/alpine:latest
     ```
   - Otherwise, use the appropriate base image as suggested by the recording.

3. **Package Installation:**
   - Determine which packages are needed by reading the annotations and terminal logs.
   - Install only the required packages explicitly (for example, gcc, musl-dev, make, wget, perl, etc.) using the appropriate package manager (e.g., `apk` for Alpine).
   - Avoid using meta-packages like `build-base` unless explicitly required.

4. **Working Directory and File Handling:**
   - Set a consistent working directory (for example, `/root` or as specified in the session).
   - Include steps to download, extract, or copy any required source files or dependencies as documented.
   - Use the correct extraction flags (e.g., `tar -xJf` for `.tar.xz` files or `tar -xzf` for `.tar.gz` files) based on the file type.

5. **Build and Execution Steps:**
   - Incorporate all documented steps needed to configure, build, install, or run the software or process. This may include commands like `./configure`, `make -j`, and `make install`.
   - Parse the annotations for any additional commands that affect the process (while ignoring any mis-typed or erroneous commands).

6. **Error Handling:**
   - If the annotated session contains errors or mistakes (for example, mis-typed commands or failed attempts noted in the annotations), ignore those errors and do not include them in the Dockerfile.
   - Generate a clean, correct Dockerfile that reflects the intended successful process.

7. **Testing and Verification:**
   - If the session includes testing steps (for example, running a test script, checking version output, or dumping command history), include these steps in the Dockerfile to verify that the process has completed successfully.

8. **Self-Contained and Runnable:**
   - The generated Dockerfile must be self-contained so that someone can build and run the image to replicate the session’s process exactly.
   - Use clear, well-structured commands and include comments if necessary to explain each step.

**Output:**
Generate and provide ONLY the complete runnable Dockerfile as the response. Do not respond with anything else.

---

**Example Expected Completion:**

```dockerfile
FROM registry.hub.docker.com/library/alpine:latest

# Install necessary packages based on annotations and terminal logs
RUN apk add gcc musl-dev make wget perl [other-packages-as-needed]

# Set working directory (adjust if another directory is indicated in the session)
WORKDIR "/root"

# Download the source archive (URL and file type as indicated in annotations)
RUN wget https://ftp.gnu.org/gnu/help2man/help2man-1.49.3.tar.xz

# Extract the archive using appropriate tar flags
RUN tar -xJf help2man-1.49.3.tar.xz

# Change into the source directory
WORKDIR "/root/help2man-1.49.3"

# Run configuration and build steps as documented in the annotations (ignore any mis-typed commands)
RUN ./configure
RUN make -j
RUN make install

# Optionally, test the installation (for example, checking version and verifying binary location)
RUN help2man --version
RUN which help2man

# Dump the shell history to a file if this step is included in the recording
RUN history | tee /herstory
```

### XML Recording File:
{}

### Response:
{}
"""

In [6]:
from google.colab import drive
import os

drive.mount('/content/drive')

txt_folder = "/content/drive/MyDrive/model_3_data"  # set to your google drive folder

input_data = []
response_data = []

for filename in os.listdir(txt_folder):
    path = os.path.join(txt_folder, filename)
    if not os.path.isfile(path):
        continue  # skip subfolders
    with open(path, "r", encoding="utf-8") as f:
        content = f.read()
    # ".xml" files are session recordings; every other file is treated as a Dockerfile response.
    target = input_data if filename.endswith(".xml") else response_data
    target.append({"filename": filename, "content": content})

input_data = sorted(input_data, key=lambda x: x["filename"])
response_data = sorted(response_data, key=lambda x: x["filename"])

input_data = [item["content"] for item in input_data]
response_data = [item["content"] for item in response_data]

Mounted at /content/drive
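
The loading cell pairs recordings with Dockerfiles purely by sorted filename order, so a missing or extra file silently shifts every later pair. A minimal sketch of a stricter alternative is shown below. It assumes, hypothetically, that each recording `name.xml` has exactly one response file with the same stem (for example `name.txt`); the helper name `pair_by_stem` is illustrative and not part of the notebook.

```python
from pathlib import Path


def pair_by_stem(folder):
    """Return (xml_text, response_text) pairs matched on the file stem."""
    files = [p for p in Path(folder).iterdir() if p.is_file()]
    xml_files = {p.stem: p for p in files if p.suffix == ".xml"}
    other_files = {p.stem: p for p in files if p.suffix != ".xml"}

    # Any stem present on only one side means the pairing would be wrong.
    unmatched = set(xml_files) ^ set(other_files)
    if unmatched:
        raise ValueError(f"Unpaired files: {sorted(unmatched)}")

    return [
        (
            xml_files[stem].read_text(encoding="utf-8"),
            other_files[stem].read_text(encoding="utf-8"),
        )
        for stem in sorted(xml_files)
    ]
```

With such a helper, `input_data` and `response_data` could be built as `[x for x, _ in pairs]` and `[y for _, y in pairs]`, and an incomplete upload would fail loudly instead of training on mismatched examples.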


In [8]:
print(response_data[1]) # check data

FROM registry.hub.docker.com/library/alpine:latest
RUN apk add gcc musl-dev make texinfo
WORKDIR "/root"
RUN wget https://ftp.gnu.org/gnu/bc/bc-1.08.1.tar.xz
RUN tar -xJf bc-1.08.1.tar.xz
WORKDIR "/root/bc-1.08.1"
RUN ./configure
RUN make -j
RUN make install
RUN bc --version
RUN which bc
RUN echo -e '2+2\n' | bc



In [9]:
from datasets import Dataset

dataset = Dataset.from_dict({"Input": input_data, "Response": response_data})
dataset

Dataset({
    features: ['Input', 'Response'],
    num_rows: 6
})

In [10]:
EOS_TOKEN = tokenizer.eos_token

def formatting_prompts_func(examples):
    inputs = examples["Input"]
    outputs = examples["Response"]
    texts = []
    for xml_input, dockerfile in zip(inputs, outputs):
        # Fill the two "{}" slots and append EOS so the model learns where to stop.
        text = train_prompt_style.format(xml_input, dockerfile) + EOS_TOKEN
        texts.append(text)
    return {
        "text": texts,
    }
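
Because the prompt is filled with `str.format`, it helps to see how braces are treated. The sketch below uses a shortened stand-in template and a placeholder EOS string (both hypothetical, not the notebook's real values). It shows that braces inside the inserted XML or Dockerfile are copied verbatim, while a stray brace in the template itself would make `format` fail.

```python
# Toy stand-ins for train_prompt_style and tokenizer.eos_token (hypothetical values).
toy_template = "### XML Recording File:\n{}\n\n### Response:\n{}\n"
toy_eos = "</s>"

xml_with_braces = "<user_input>echo ${HOME} {}</user_input>"
dockerfile = 'FROM alpine\nRUN echo "${HOME}"'

# Only the template is parsed for replacement fields; arguments are inserted as-is.
text = toy_template.format(xml_with_braces, dockerfile) + toy_eos
print(text)

# A literal brace in the template must be doubled ("{{" / "}}"), otherwise format() raises.
try:
    "ENV {VAR}\n{}".format("x")
    template_error = None
except KeyError as err:
    template_error = err
    print("unescaped brace in template:", err)
```

This is why the long training prompt works unchanged: it contains no braces other than the two `{}` slots.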

In [11]:
dataset_finetune = dataset.map(formatting_prompts_func, batched=True)


# Setting Training Arguments & Training

In [13]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model=model_lora,
    tokenizer=tokenizer,
    train_dataset=dataset_finetune,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,

    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=1,
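        num_train_epochs=5,  # ignored: max_steps below takes precedence
        warmup_steps=5,
        max_steps=60,  # 60 steps at 3 steps per epoch = 20 passes over the 6 examples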
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
        report_to="none",  # no metrics reach the W&B run started earlier; use "wandb" to log them
    ),
)


In [14]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 6 | Num Epochs = 20 | Total steps = 60
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 1
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 1 x 1) = 2
 "-____-"     Trainable parameters = 41,943,040/8,000,000,000 (0.52% trained)


Step,Training Loss
10,1.2288
20,0.811
30,0.5494
40,0.4185
50,0.3466
60,0.3019


Unsloth: Will smartly offload gradients to save VRAM!
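
The banner reports 20 epochs even though `num_train_epochs=5` was passed, because `max_steps` takes precedence when both are set. A short sketch of the arithmetic, using only the values shown in the banner:

```python
import math

num_examples = 6
per_device_batch_size = 2
gradient_accumulation_steps = 1
num_gpus = 1
max_steps = 60

# Examples consumed per optimizer step
effective_batch = per_device_batch_size * gradient_accumulation_steps * num_gpus
# Optimizer steps needed for one pass over the dataset
steps_per_epoch = math.ceil(num_examples / effective_batch)
# Passes over the dataset needed to reach max_steps
epochs_run = math.ceil(max_steps / steps_per_epoch)

print(effective_batch, steps_per_epoch, epochs_run)  # 2 3 20
```

To train for exactly five epochs instead, one option would be to drop `max_steps` (or set it to `5 * steps_per_epoch`, i.e. 15).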


In [None]:
wandb.finish()

# Testing

In [22]:
question = """<?xml version="1.0" ?>
<recording version="2" width="319" height="92" timestamp="1727009557">
<annotations>
  <layer>
      <annotation beginning="0071459" end="29339981">User connects to a remote server using an SSH command and enters their password.
      Tool: Alpine Linux.
      Subtool: Busybox shell
      Result: successResult
    </annotation>
      <annotation beginning="31580684" end="31591914">User logs out of the SSH session and prepares to exit the terminal.
      Tool: Alpine Linux.
      Subtool: Busybox shell
      Result: successResult
    </annotation>
  </layer>
  <layer title="Individual commands">
    <annotation beginning="11985103" end="29339981">User attempts to change their password to a new value.
      Tool: Alpine Linux.
      Subtool: Busybox shell
      Result: successResult
    </annotation>
    <annotation beginning="33647017" end="33647764">User exits the local terminal session.
      Tool: Alpine Linux.
      Subtool: Busybox shell
      Result: successResult
    </annotation>
  </layer>
</annotations>
  <system_output timestamp="0.071459">[?2004h]0;demo@boxtop: ~demo@boxtop:~$ </system_output>
  <user_input timestamp="3.724374">s</user_input>
  <system_output timestamp="3.725312">s</system_output>
  <user_input timestamp="3.944484">s</user_input>
  <system_output timestamp="3.945402">s</system_output>
  <user_input timestamp="4.166899">h</user_input>
  <system_output timestamp="4.167862">h</system_output>
  <user_input timestamp="4.3286"> </user_input>
  <system_output timestamp="4.329581"> </system_output>
  <user_input timestamp="4.648216">1</user_input>
  <system_output timestamp="4.64918">1</system_output>
  <user_input timestamp="4.828329">0</user_input>
  <system_output timestamp="4.829196">0</system_output>
  <user_input timestamp="5.090241">.</user_input>
  <system_output timestamp="5.091185">.</system_output>
  <user_input timestamp="5.250818">0</user_input>
  <system_output timestamp="5.251636">0</system_output>
  <user_input timestamp="5.431616">.</user_input>
  <system_output timestamp="5.432483">.</system_output>
  <user_input timestamp="5.85454">7</user_input>
  <system_output timestamp="5.855438">7</system_output>
  <user_input timestamp="6.074859">.</user_input>
  <system_output timestamp="6.075734">.</system_output>
  <user_input timestamp="6.435266">1</user_input>
  <system_output timestamp="6.43612">1</system_output>
  <user_input timestamp="6.778539">3</user_input>
  <system_output timestamp="6.779415">3</system_output>
  <user_input timestamp="7.178519">7</user_input>
  <system_output timestamp="7.179402">7</system_output>
  <user_input timestamp="7.663922"></user_input>
  <system_output timestamp="7.664737">[K</system_output>
  <user_input timestamp="7.887423">8</user_input>
  <system_output timestamp="7.888286">8</system_output>
  <user_input timestamp="9.000601">
</user_input>
  <system_output timestamp="9.001515">
</system_output>
  <system_output timestamp="9.001629">[?2004l
</system_output>
  <system_output timestamp="9.586503">
demo@10.0.7.138's password: </system_output>
  <user_input timestamp="9.958319">1</user_input>
  <user_input timestamp="10.038434">M</user_input>
  <user_input timestamp="10.198845">3</user_input>
  <user_input timestamp="10.298877">T</user_input>
  <user_input timestamp="10.480309">5</user_input>
  <user_input timestamp="10.579129">6</user_input>
  <user_input timestamp="10.920891">7</user_input>
  <user_input timestamp="11.103528">!</user_input>
  <user_input timestamp="11.241265">
</user_input>
  <system_output timestamp="11.242199">
</system_output>
  <system_output timestamp="11.518481">Linux boxtop 6.6.13-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.6.13-1 (2024-01-20) x86_64


Plan your installation, and FAI installs your plan.

Last login: Sun Sep 22 12:50:30 2024 from 10.0.7.1

</system_output>
  <system_output timestamp="11.587303">[?2004h]0;demo@boxtop: ~demo@boxtop:~$ </system_output>
  <user_input timestamp="11.985103">p</user_input>
  <system_output timestamp="11.988199">p</system_output>
  <user_input timestamp="12.146566">a</user_input>
  <system_output timestamp="12.158429">a</system_output>
  <user_input timestamp="12.308408">s</user_input>
  <system_output timestamp="12.323088">s</system_output>
  <user_input timestamp="12.490203">s</user_input>
  <system_output timestamp="12.513615">s</system_output>
  <user_input timestamp="12.748143">w</user_input>
  <system_output timestamp="12.770986">w</system_output>
  <user_input timestamp="13.071223">d</user_input>
  <system_output timestamp="13.079019">d</system_output>
  <user_input timestamp="13.352127">
</user_input>
  <system_output timestamp="13.375242">
[?2004l
</system_output>
  <system_output timestamp="14.956765">Changing password for demo.
Current password: </system_output>
  <user_input timestamp="15.932853">1</user_input>
  <user_input timestamp="16.033733">M</user_input>
  <user_input timestamp="16.195366">3</user_input>
  <user_input timestamp="16.317828">T</user_input>
  <user_input timestamp="16.517665">5</user_input>
  <user_input timestamp="16.57855">6</user_input>
  <user_input timestamp="16.937213">7</user_input>
  <user_input timestamp="17.118217">!</user_input>
  <user_input timestamp="17.297791">
</user_input>
  <system_output timestamp="17.32098">
</system_output>
  <system_output timestamp="17.416283">New password: </system_output>
  <user_input timestamp="19.239669">O</user_input>
  <user_input timestamp="19.462807">p</user_input>
  <user_input timestamp="19.663239">e</user_input>
  <user_input timestamp="19.823472">n</user_input>
  <user_input timestamp="20.165214">Y</user_input>
  <user_input timestamp="20.328604">o</user_input>
  <user_input timestamp="20.467273">u</user_input>
  <user_input timestamp="20.684951">r</user_input>
  <user_input timestamp="21.066578">H</user_input>
  <user_input timestamp="21.224623">e</user_input>
  <user_input timestamp="21.324622">a</user_input>
  <user_input timestamp="21.46358">r</user_input>
  <user_input timestamp="21.725808">t</user_input>
  <user_input timestamp="22.108267">G</user_input>
  <user_input timestamp="22.204951">P</user_input>
  <user_input timestamp="22.423659">T</user_input>
  <user_input timestamp="22.764709">
</user_input>
  <system_output timestamp="22.788938">
Retype new password: </system_output>
  <user_input timestamp="23.582787">O</user_input>
  <user_input timestamp="23.804832">p</user_input>
  <user_input timestamp="24.002847">e</user_input>
  <user_input timestamp="24.184193">n</user_input>
  <user_input timestamp="24.486491">Y</user_input>
  <user_input timestamp="24.665441">o</user_input>
  <user_input timestamp="24.804858">u</user_input>
  <user_input timestamp="25.024414">r</user_input>
  <user_input timestamp="25.449668">H</user_input>
  <user_input timestamp="25.652843">e</user_input>
  <user_input timestamp="25.77554">a</user_input>
  <user_input timestamp="25.996544">r</user_input>
  <user_input timestamp="26.261924">t</user_input>
  <user_input timestamp="26.706184">G</user_input>
  <user_input timestamp="26.765363">P</user_input>
  <user_input timestamp="27.001545">T</user_input>
  <user_input timestamp="27.340847">
</user_input>
  <system_output timestamp="27.350003">
</system_output>
  <system_output timestamp="29.339981">passwd: password updated successfully
[?2004h]0;demo@boxtop: ~demo@boxtop:~$ </system_output>
  <user_input timestamp="31.580684"/>
  <system_output timestamp="31.585122">[?2004l

logout
</system_output>
  <system_output timestamp="31.590318">Connection to 10.0.7.138 closed.

</system_output>
  <system_output timestamp="31.591579">[?2004h</system_output>
  <system_output timestamp="31.591914">]0;demo@boxtop: ~demo@boxtop:~$ </system_output>
  <user_input timestamp="33.647017"/>
  <system_output timestamp="33.647764">[?2004l

exit
</system_output>
</recording>"""

In [23]:
FastLanguageModel.for_inference(model_lora)

inputs = tokenizer([train_prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model_lora.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,  # cap the completion length explicitly
    use_cache=True,
)

response = tokenizer.batch_decode(outputs)

print(response[0].split("### Response:")[1])



FROM registry.hub.docker.com/library/alpine:latest
RUN apk add --no-cache busybox
WORKDIR /root
RUN echo "demo:OpenYourHeartGPT" | chpasswd
RUN echo "logout"
<｜end▁of▁sentence｜>
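
The printed completion still carries the end-of-sequence token and surrounding blank lines. A minimal post-processing sketch is shown below; `extract_dockerfile` is a hypothetical helper, and the demo string is a shortened stand-in for `response[0]`.

```python
def extract_dockerfile(decoded, eos_token, marker="### Response:"):
    """Return the text after the last response marker, without the EOS token."""
    # rpartition keeps the whole string as the completion if the marker is absent.
    _, _, completion = decoded.rpartition(marker)
    # Drop everything from the first EOS token onward, if one was generated.
    completion = completion.split(eos_token, 1)[0]
    return completion.strip()


eos = "<｜end▁of▁sentence｜>"  # in the notebook this would be tokenizer.eos_token
demo = "...prompt text...\n### Response:\n\nFROM alpine\nRUN true\n" + eos + "\n"
dockerfile_text = extract_dockerfile(demo, eos)
print(dockerfile_text)
```

The cleaned string can then be written to a file named `Dockerfile` and built with `docker build` to check whether the generated steps actually run.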


# Saving to Hugging Face

In [24]:
# save_method="lora" stores only the LoRA adapter weights, not a merged full model.
model_lora.save_pretrained_merged("model_lora", tokenizer, save_method="lora")
model_lora.push_to_hub_merged("bria7801/Model-3", tokenizer, save_method="lora", token=userdata.get('HF_TOKEN'))

Unsloth: Saving tokenizer... Done.
Unsloth: Saving model... Done.
Unsloth: Saving LoRA adapters. Please wait...


Saved lora model to https://huggingface.co/bria7801/Model-3
