# L3: Supervised Fine-Tuning (SFT)

Supervised fine-tuning (SFT) is a process where a pre-trained model is further trained on a labeled dataset to adapt it to specific tasks or domains. This process helps the model learn task-specific patterns and improves its performance on those tasks.

In this notebook we demonstrate SFT starting from a pre-trained "base" model. Here we use the [Qwen3-0.6B-Base](https://huggingface.co/Qwen/Qwen3-0.6B-Base) model, a 0.6-billion-parameter model trained by Alibaba on a large corpus of text data.

Qwen3-0.6B-Base is a compact, multilingual, open-source language model with a 32k-token context window. Its small size makes it suitable for research and text generation tasks where memory or compute resources are limited.

```mermaid
sequenceDiagram
    participant User
    participant Tokenizer
    participant Model

    %% User provides each question (prompt) to the tokenizer
    User->>Tokenizer: "Give me an 1-sentence introduction of LLM."
    User->>Tokenizer: "Calculate 1+1-1"
    User->>Tokenizer: "What's the difference between thread and process?"

    %% Tokenizer processes each question
    Tokenizer->>Tokenizer: tokenize & encode
    Tokenizer-->>User: {"input_ids": [...], "attention_mask": [...]}

    %% User sends encoded inputs to the model
    User->>Model: model(**tokenized_question)
    Model->>Model: forward pass
    Model-->>User: output_ids

    %% User decodes model output
    User->>Tokenizer: decode(output_ids)
    Tokenizer-->>User: output_text (e.g., answer)
```
---

### **Step-by-step Example for the First Question**

| Step                                 | Example Input/Output                                                                                       |
|---------------------------------------|-----------------------------------------------------------------------------------------------------------|
| **Input question**                    | `"Give me an 1-sentence introduction of LLM."`                                                           |
| **Tokenisation**                      | Tokens: `['Give', 'me', 'an', '1', '-', 'sentence', ...]`; Token IDs: `[2001, 2033, 2019, 1015, ...]`   |
| **Model input**                       | `input_ids`, `attention_mask`                                                                             |
| **Model output**                      | `output_ids` (e.g., `[101, 2023, 2003, 1037, 6251, ...]`)                                                 |
| **Decoded output**                    | `"A large language model (LLM) is a type of AI that..."`                                                  |

### **Notes**
- This workflow applies to each question in the input list.
- For batch inference, the tokeniser and model can process all questions at once (as a batch), or you can loop as above.
- The decoded output is the model's answer to each prompt.
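
To make the `input_ids` / `attention_mask` pair concrete, here is a minimal sketch of batch encoding with a toy whitespace tokeniser. The vocabulary, the `build_vocab`, `encode_batch` and `decode` helpers, and the pad ID of 0 are all hypothetical. Real tokenisers use subword vocabularies and return tensors, and right-padding is used here only for simplicity.

```python
from typing import Dict, List

PAD_ID = 0  # hypothetical padding ID for this toy example


def build_vocab(texts: List[str]) -> Dict[str, int]:
    """Assign an integer ID to every distinct whitespace-separated token."""
    vocab: Dict[str, int] = {"<pad>": PAD_ID}
    for text in texts:
        for token in text.split():
            vocab.setdefault(token, len(vocab))
    return vocab


def encode_batch(texts: List[str], vocab: Dict[str, int]) -> Dict[str, List[List[int]]]:
    """Encode texts and right-pad them to a common length."""
    ids = [[vocab[tok] for tok in text.split()] for text in texts]
    max_len = max(len(seq) for seq in ids)
    input_ids = [seq + [PAD_ID] * (max_len - len(seq)) for seq in ids]
    # 1 marks a real token, 0 marks padding that the model should ignore.
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in ids]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


def decode(ids: List[int], vocab: Dict[str, int]) -> str:
    """Map IDs back to text, skipping padding."""
    inverse = {i: tok for tok, i in vocab.items()}
    return " ".join(inverse[i] for i in ids if i != PAD_ID)


questions = ["Calculate 1+1-1", "What's the difference between thread and process?"]
vocab = build_vocab(questions)
batch = encode_batch(questions, vocab)
print(batch["attention_mask"][0])            # shorter question: trailing zeros mark padding
print(decode(batch["input_ids"][0], vocab))  # round trip back to the original text
```

Both sequences end up the same length, which is what lets a model process them as a single batch.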

---

### Warning controls - suppress warnings

In [1]:
import os
import warnings
from pathlib import Path
from typing import Any, Dict, List, Optional

from loguru import logger

warnings.filterwarnings('ignore')

## Import libraries

In [2]:
import pandas as pd
import torch
from datasets import Dataset, load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    PreTrainedModel,
    PreTrainedTokenizer,
    TrainingArguments,
)
from trl import DataCollatorForCompletionOnlyLM, SFTConfig, SFTTrainer


## Setting up helper class to manage Hugging Face API calls

In [17]:

class HuggingFaceManager:
    """
    HuggingFaceManager provides a unified interface for managing Hugging Face models, tokenisers, and datasets.

    What:
        - Loads, caches, and lists Hugging Face models and datasets.
        - Supports both online and offline workflows.
        - Handles cache directory resolution and environment configuration.
        - Offers configuration introspection and local resource discovery.
        - Supports basic text generation using the loaded model and tokeniser.

    Why:
        - Simplifies reproducible, robust LLM workflows.
        - Centralises Hugging Face infrastructure logic for maintainability and clarity.
        - Facilitates offline/online toggling and local resource management.

    How:
        - Uses environment variables and standard Hugging Face conventions.
        - Provides type hints, docstrings, and loguru logging for transparency and debugging.
        - Designed for integration in notebooks or modular codebases.

    ---------------------------------------------------------------------------
    ENVIRONMENT VARIABLES (Purpose and Defaults):

    - HF_HOME:
        * Purpose: Sets the base directory for Hugging Face cache and configuration files.
        * Default: ~/.cache/huggingface

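    - TRANSFORMERS_CACHE:
        * Purpose: Legacy variable for the directory caching Hugging Face Transformers models and tokenisers (deprecated in recent releases in favour of HF_HOME).
        * Default: this class assumes $HF_HOME/transformers; recent Transformers releases cache under $HF_HOME/hub.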

    - HF_DATASETS_CACHE:
        * Purpose: Sets the directory for caching Hugging Face datasets.
        * Default: $HF_HOME/datasets

    - HUGGINGFACE_HUB_CACHE:
        * Purpose: Sets the directory for caching repositories from the Hub (models, datasets, spaces).
        * Default: $HF_HOME/hub

    - HF_HUB_OFFLINE:
        * Purpose: If set (e.g. "1"), disables all network access and forces offline mode.
        * Default: Not set (online mode)

    - HUGGING_FACE_HUB_TOKEN or HF_TOKEN:
        * Purpose: User access token for authenticating to the Hugging Face Hub.
        * Default: Not set (anonymous access)

    For further details, see: https://huggingface.co/docs/huggingface_hub/en/package_reference/environment_variables

    ---------------------------------------------------------------------------
    
    DEFAULT SETTINGS FOR THIS CLASS:

    - cache_dir: None (uses Hugging Face defaults)
    - offline: False (online mode by default)
    - use_gpu: False (CPU by default)
    - verbose: True (logging enabled)
    - model_name: None (no model loaded initially)
    """

    def __init__(
        self,
        cache_dir: Optional[str] = None,
        offline: bool = False,
        use_gpu: bool = False,
        verbose: bool = True,
        model_name: Optional[str] = None, 
    ) -> None:
        """
        Initialises the manager and configures environment variables.

        Args:
            cache_dir: Custom cache directory. If None, uses Hugging Face defaults.
            offline: If True, enables offline mode (no network access).
            use_gpu: If True, moves models to GPU.
            verbose: If True, enables info-level logging.
            model_name: If provided, loads this model/tokeniser immediately.
        """
        self.cache_dir = cache_dir
        self.offline = offline
        self.use_gpu = use_gpu
        self.verbose = verbose
        self.model: Optional[PreTrainedModel] = None
        self.tokenizer: Optional[PreTrainedTokenizer] = None
        self.model_name: Optional[str] = None  
        self._set_env_vars()
        if model_name is not None:
            self.load_model_and_tokenizer(model_name)

    def _set_env_vars(self) -> None:
        """Set environment variables for tokenizer, cache and offline mode."""
        os.environ["TOKENIZERS_PARALLELISM"] = "false"   # Always disable tokenizers parallelism to avoid fork warnings
        if self.cache_dir:
            os.environ["TRANSFORMERS_CACHE"] = self.cache_dir
            os.environ["HF_HOME"] = self.cache_dir
        if self.offline:
            os.environ["HF_HUB_OFFLINE"] = "1"
        else:
            os.environ.pop("HF_HUB_OFFLINE", None)
        if self.verbose:
            logger.info(f"Cache directory: {self.get_base_cache_dir()}")
            logger.info(f"Offline mode: {self.offline}")

    def get_base_cache_dir(self) -> Path:
        """
        Resolve the base Hugging Face cache directory.

        Returns:
            Path to the base cache directory.
        """
        return Path(
            self.cache_dir or
            os.environ.get("TRANSFORMERS_CACHE") or
            os.environ.get("HF_HOME") or
            Path.home() / ".cache" / "huggingface"
        )

    def get_hub_cache_dir(self) -> Path:
        """
        Return the hub subdirectory for cached models and datasets.

        Returns:
            Path to the hub cache directory.
        """
        return self.get_base_cache_dir() / "hub"
    
    def get_model_name(self) -> Optional[str]:
        """
        Returns the currently loaded model name, or None if not loaded.
        """
        return self.model_name

    def display_configuration_summary(self) -> Dict[str, Any]:
        """
        Print and return a summary of the Hugging Face environment configuration.

        Returns:
            Dictionary containing configuration details.
        """
        default_hf_home = str(Path.home() / ".cache" / "huggingface")
        default_transformers_cache = os.path.join(default_hf_home, "transformers")
        default_datasets_cache = os.path.join(default_hf_home, "datasets")

        transformers_cache = os.environ.get("TRANSFORMERS_CACHE", default_transformers_cache)
        hf_home = os.environ.get("HF_HOME", default_hf_home)
        hf_datasets_cache = os.environ.get("HF_DATASETS_CACHE", default_datasets_cache)
        tokenizers_parallelism = os.environ.get("TOKENIZERS_PARALLELISM", None) 

        cache_dir_display = str(self.get_base_cache_dir())

        model_id = self.model_name  
        hub_url = self.get_hf_hub_url(model_id) if model_id else None
    
        summary = {
            "Cache Directory": cache_dir_display,
            "TRANSFORMERS_CACHE": transformers_cache,
            "HF_HOME": hf_home,
            "HF_DATASETS_CACHE": hf_datasets_cache,
            "Offline Mode": self.offline,
            "Use GPU": self.use_gpu,
            "Verbose": self.verbose,
            "Hub URL": hub_url,
            "TOKENIZERS_PARALLELISM": tokenizers_parallelism, 
        }

        logger.info("Hugging Face Manager Configuration Summary:")
        for k, v in summary.items():
            logger.info(f"{k}: {v}")

        return summary


    def load_model_and_tokenizer(self, model_name: str) -> None:
        """
        Load a Hugging Face model and tokeniser, preferring cache/local files if offline.
        Stores them as instance variables.
        """
        try:
            logger.info(f"Loading model and tokeniser: {model_name}")
            self.tokenizer = AutoTokenizer.from_pretrained(
                model_name,
                cache_dir=str(self.get_base_cache_dir()),
                local_files_only=self.offline
            )
            self.model = AutoModelForCausalLM.from_pretrained(
                model_name,
                cache_dir=str(self.get_base_cache_dir()),
                local_files_only=self.offline
            )
            if self.use_gpu and self.model is not None:
                self.model.to("cuda")
            # Fall back to the EOS token when the tokeniser defines no padding token.
            if self.tokenizer and not self.tokenizer.pad_token:
                self.tokenizer.pad_token = self.tokenizer.eos_token
            self.model_name = model_name
            logger.info(f"Loaded model and tokeniser for: {model_name}")
        except Exception as e:
            logger.error(f"Failed to load model/tokeniser: {e}")
            raise

    def load_dataset(
        self, dataset_name: str, split: str = "train", **kwargs
    ) -> Dataset:
        """
        Load a Hugging Face dataset, preferring cache/local files if offline.

        Args:
            dataset_name: Name or path of the dataset.
            split: Dataset split to load (e.g. "train").
            **kwargs: Additional arguments for load_dataset.

        Returns:
            Loaded Dataset object.
        """
        try:
            logger.info(f"Loading dataset: {dataset_name} (split={split})")
            try:
                dataset = load_dataset(
                    dataset_name,
                    split=split,
                    cache_dir=str(self.get_base_cache_dir()),
                    local_files_only=self.offline,
                    **kwargs
                )
            except (TypeError, ValueError) as e:
                if "local_files_only" in str(e):
                    logger.warning("Retrying without local_files_only (not supported by this dataset builder).")
                    dataset = load_dataset(
                        dataset_name,
                        split=split,
                        cache_dir=str(self.get_base_cache_dir()),
                        **kwargs
                    )
                else:
                    raise
            logger.info(f"Loaded dataset: {dataset_name}")
            return dataset
        except Exception as e:
            logger.error(f"Failed to load dataset: {e}")
            raise



    def check_model_downloaded(self, model_name: str) -> bool:
        """
        Check if a model is available locally in the cache.

        Args:
            model_name: Name or path of the model.

        Returns:
            True if the model is cached locally, False otherwise.
        """
        try:
            _ = AutoModelForCausalLM.from_pretrained(
                model_name,
                cache_dir=str(self.get_base_cache_dir()),
                local_files_only=True
            )
            logger.info(f"Model '{model_name}' is available locally.")
            return True
        except Exception:
            logger.info(f"Model '{model_name}' is NOT available locally.")
            return False

    def display_dataset(self, dataset: Dataset, n: int = 3) -> None:
        """
        Display a sample of the dataset in tabular form.

        Args:
            dataset: The Hugging Face Dataset object.
            n: Number of rows to display.
        """
        rows = []
        for i in range(min(n, len(dataset))):
            example = dataset[i]
            user_msg = next(m['content'] for m in example['messages'] if m['role'] == 'user')
            assistant_msg = next(m['content'] for m in example['messages'] if m['role'] == 'assistant')
            rows.append({'User Prompt': user_msg, 'Assistant Response': assistant_msg})
        df = pd.DataFrame(rows)
        pd.set_option('display.max_colwidth', None)
        logger.info(f"Displaying {len(rows)} rows from dataset.")
        display(df)

    def list_local_models(self) -> List[str]:
        """
        List all locally cached Hugging Face models.

        Returns:
            List of model names available in the local cache.
        """
        hub_dir = self.get_hub_cache_dir()
        model_dirs = sorted(hub_dir.glob("models--*"))
        models = [d.name.replace("models--", "") for d in model_dirs if d.is_dir()]
        logger.info("Locally cached models:")
        for m in models:
            logger.info(f"- {m}")
        return models

    def list_local_datasets(self) -> List[str]:
        """
        List all locally cached Hugging Face datasets.

        Returns:
            List of dataset names available in the local cache.
        """
        hub_dir = self.get_hub_cache_dir()
        dataset_dirs = sorted(hub_dir.glob("datasets--*"))
        datasets = [d.name.replace("datasets--", "") for d in dataset_dirs if d.is_dir()]
        logger.info("Locally cached datasets:")
        for d in datasets:
            logger.info(f"- {d}")
        return datasets

    def list_local_transformers(self) -> List[str]:
        """
        Alias for listing locally cached models.

        Returns:
            List of model names available in the local cache.
        """
        return self.list_local_models()

    def generate_response(
        self,
        user_message: str,
        system_message: Optional[str] = None,
        max_new_tokens: int = 100
    ) -> str:
        """
        Generate a response using the loaded model and tokeniser.

        Args:
            user_message: The user's message string.
            system_message: Optional system message string.
            max_new_tokens: Maximum number of new tokens to generate.

        Returns:
            The generated response as a string.
        """
        if self.model is None or self.tokenizer is None:
            raise ValueError("Model and tokeniser must be loaded first.")

        messages = []
        if system_message:
            messages.append({"role": "system", "content": system_message})
        messages.append({"role": "user", "content": user_message})

        prompt = self.tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True,
            enable_thinking=False,
        )

        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
        with torch.no_grad():
            outputs = self.model.generate(
                **inputs,
                max_new_tokens=max_new_tokens,
                do_sample=False,
                pad_token_id=self.tokenizer.eos_token_id,
                eos_token_id=self.tokenizer.eos_token_id,
            )
        input_len = inputs["input_ids"].shape[1]
        generated_ids = outputs[0][input_len:]
        response = self.tokenizer.decode(generated_ids, skip_special_tokens=True).strip()
        logger.info("Generated response.")
        return response

    def unload_model_and_tokenizer(self) -> None:
        """
        Remove references to the loaded model and tokeniser to free memory.
        """
        self.model = None
        self.tokenizer = None
        self.model_name = None  # Also clear the stored model name
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # Release cached GPU memory when CUDA is in use

    @staticmethod
    def get_hf_hub_url(model_name: str) -> Optional[str]:
        """
        Convert a Hugging Face model or dataset identifier to a Hub URL.

        Args:
            model_name: The model or dataset identifier, e.g., "Qwen/Qwen3-0.6B-Base".

        Returns:
            The corresponding Hugging Face Hub URL, or None if the format is invalid.
        """
        if not isinstance(model_name, str) or "/" not in model_name:
            return None
        return f"https://huggingface.co/{model_name}"
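
`list_local_models` and `list_local_datasets` return names such as `Qwen--Qwen3-0.6B-Base`, because the Hub cache folder name encodes the `/` of a repository ID as `--`. The sketch below converts a cache folder name back to a repository ID, assuming the `models--<org>--<name>` folder convention. `cache_dir_to_repo_id` is a hypothetical helper and not part of the class above.

```python
from typing import Optional


def cache_dir_to_repo_id(dir_name: str) -> Optional[str]:
    """Turn a Hub cache folder name into a repository ID, or None if it is not one."""
    for prefix in ("models--", "datasets--", "spaces--"):
        if dir_name.startswith(prefix):
            remainder = dir_name[len(prefix):]
            # Only the first "--" separates the organisation from the repository name.
            return remainder.replace("--", "/", 1)
    return None


print(cache_dir_to_repo_id("models--Qwen--Qwen3-0.6B-Base"))
print(cache_dir_to_repo_id("datasets--banghua--DL-SFT-Dataset"))
```

The result can be passed to `get_hf_hub_url` to build a link for a cached model.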


In [4]:
def test_model_with_questions(
    manager: HuggingFaceManager,
    questions: list,
    system_message: Optional[str] = None,
    title: str = "Model Output"
) -> None:
    """
    Test the loaded model in the HuggingFaceManager with a list of questions.
    """
    if manager.model is None or manager.tokenizer is None:
        raise ValueError("Model and tokeniser must be loaded first.")

    print(f"\n=== {title} ===")
    for i, question in enumerate(questions, 1):
        response = manager.generate_response(
            user_message=question,
            system_message=system_message
        )
        print(f"\nModel Input {i}:\n{question}\nModel Output {i}:\n{response}\n")

In [5]:
USE_GPU = False

hf = HuggingFaceManager(use_gpu=USE_GPU)

2025-07-14 11:46:51.111 | INFO     | __main__:_set_env_vars:102 - Cache directory: /Users/mjboothaus/.cache/huggingface
2025-07-14 11:46:51.112 | INFO     | __main__:_set_env_vars:103 - Offline mode: False


## Load base model & test on simple questions

In [6]:
BASE_MODEL = "Qwen/Qwen3-0.6B-Base"

hf.load_model_and_tokenizer(BASE_MODEL)

2025-07-14 11:46:51.139 | INFO     | __main__:load_model_and_tokenizer:182 - Loading model and tokeniser: Qwen/Qwen3-0.6B-Base
2025-07-14 11:46:55.935 | INFO     | __main__:load_model_and_tokenizer:198 - Loaded model and tokeniser for: Qwen/Qwen3-0.6B-Base


In [7]:
hf.display_configuration_summary()

2025-07-14 11:46:55.953 | INFO     | __main__:display_configuration_summary:167 - Hugging Face Manager Configuration Summary:
2025-07-14 11:46:55.955 | INFO     | __main__:display_configuration_summary:169 - Cache Directory: /Users/mjboothaus/.cache/huggingface
2025-07-14 11:46:55.956 | INFO     | __main__:display_configuration_summary:169 - TRANSFORMERS_CACHE: /Users/mjboothaus/.cache/huggingface/transformers
2025-07-14 11:46:55.956 | INFO     | __main__:display_configuration_summary:169 - HF_HOME: /Users/mjboothaus/.cache/huggingface
2025-07-14 11:46:55.957 | INFO     | __main__:display_configuration_summary:169 - HF_DATASETS_CACHE: /Users/mjboothaus/.cache/huggingface/datasets
2025-07-14 11:46:55.957 | INFO

{'Cache Directory': '/Users/mjboothaus/.cache/huggingface',
 'TRANSFORMERS_CACHE': '/Users/mjboothaus/.cache/huggingface/transformers',
 'HF_HOME': '/Users/mjboothaus/.cache/huggingface',
 'HF_DATASETS_CACHE': '/Users/mjboothaus/.cache/huggingface/datasets',
 'Offline Mode': False,
 'Use GPU': False,
 'Verbose': True,
 'Hub URL': 'https://huggingface.co/Qwen/Qwen3-0.6B-Base',
 'TOKENIZERS_PARALLELISM': 'false'}

In [8]:
questions = [
    "Give me an 1-sentence introduction of LLM.",
    "Calculate 1+1-1",
    "What's the difference between thread and process?"
]

## Base Model (Before SFT) Output

In [9]:
test_model_with_questions(hf, questions, title="Base Model (Before SFT) Output")


=== Base Model (Before SFT) Output ===


2025-07-14 11:47:08.943 | INFO     | __main__:generate_response:357 - Generated response.



Model Input 1:
Give me an 1-sentence introduction of LLM.
Model Output 1:
⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ �



2025-07-14 11:47:20.318 | INFO     | __main__:generate_response:357 - Generated response.



Model Input 2:
Calculate 1+1-1
Model Output 2:
⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ �



2025-07-14 11:47:33.567 | INFO     | __main__:generate_response:357 - Generated response.



Model Input 3:
What's the difference between thread and process?
Model Output 3:
⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ ⚇ �



*Note: In the outputs above, the base model emits repeated symbols instead of answers. This is expected: the model has only been pre-trained and has not yet been fine-tuned to respond to chat-formatted prompts.*

The tokenizer encodes the input questions correctly; the problem lies with the model, whose completions do not resemble the expected answers. This is expected behavior for a base model that has not undergone any fine-tuning.

In [10]:
hf.unload_model_and_tokenizer()

## SFT results on Qwen3-0.6B model

In this section, we're reviewing the results of a previously completed SFT training. Due to limited resources, we won’t be running the full training on a relatively large model like Qwen3-0.6B. However, in the next section of this notebook, you’ll walk through the full training process using a smaller model and a lightweight dataset.

In [11]:
QWEN_SMALL_SFT = "banghua/Qwen3-0.6B-SFT"

In [12]:
hf.load_model_and_tokenizer(QWEN_SMALL_SFT)

2025-07-14 11:47:33.765 | INFO     | __main__:load_model_and_tokenizer:182 - Loading model and tokeniser: banghua/Qwen3-0.6B-SFT
2025-07-14 11:47:35.441 | INFO     | __main__:load_model_and_tokenizer:198 - Loaded model and tokeniser for: banghua/Qwen3-0.6B-SFT


In [13]:
test_model_with_questions(hf, questions, title="Base Model (After SFT) Output")


=== Base Model (After SFT) Output ===


2025-07-14 11:47:41.970 | INFO     | __main__:generate_response:357 - Generated response.



Model Input 1:
Give me an 1-sentence introduction of LLM.
Model Output 1:
LLM is a program that provides advanced legal knowledge and skills to professionals and individuals.



2025-07-14 11:47:45.174 | INFO     | __main__:generate_response:357 - Generated response.



Model Input 2:
Calculate 1+1-1
Model Output 2:
1+1-1 = 2-1 = 1

So, the final answer is 1.



2025-07-14 11:47:57.309 | INFO     | __main__:generate_response:357 - Generated response.



Model Input 3:
What's the difference between thread and process?
Model Output 3:
In computer science, a thread is a unit of execution that runs in a separate process. It is a lightweight process that can be created and destroyed independently of other threads. Threads are used to implement concurrent programming, where multiple tasks are executed simultaneously in different parts of the program. Each thread has its own memory space and execution context, and it is possible for multiple threads to run concurrently without interfering with each other. Threads are also known as lightweight processes.



In [14]:
hf.unload_model_and_tokenizer()

## Doing SFT on a small model

**Note:** We're performing SFT on a small model, <code>HuggingFaceTB/SmolLM2-135M</code>, and a smaller training dataset to ensure the full training process can run on limited computational resources. If you're running the notebooks on your own machine and have access to a GPU, feel free to switch to a larger model, such as <code>Qwen/Qwen3-0.6B-Base</code>, to perform full SFT and reproduce the results shown above.

In [18]:
model_name = "HuggingFaceTB/SmolLM2-135M"

hf.load_model_and_tokenizer(model_name)

2025-07-14 11:50:41.137 | INFO     | __main__:load_model_and_tokenizer:182 - Loading model and tokeniser: HuggingFaceTB/SmolLM2-135M
2025-07-14 11:50:43.242 | INFO     | __main__:load_model_and_tokenizer:198 - Loaded model and tokeniser for: HuggingFaceTB/SmolLM2-135M


In [20]:
train_dataset = load_dataset("banghua/DL-SFT-Dataset", split="train")
print(train_dataset)
print(train_dataset[0])  # Show the first example

Dataset({
    features: ['messages'],
    num_rows: 2961
})
{'messages': [{'content': "- The left child should have a value less than the parent node's value, and the right child should have a value greater than the parent node's value.", 'role': 'user'}, {'content': "This statement is correct. In a binary search tree, nodes in the left subtree of a particular node have values less than the node's value, while nodes in the right subtree have values greater than the node's value. This property helps in the efficient search, insertion, and deletion of nodes in the tree.", 'role': 'assistant'}]}


In [23]:
from pprint import pprint

pprint(train_dataset[0]['messages'])

[{'content': "- The left child should have a value less than the parent node's "
             'value, and the right child should have a value greater than the '
             "parent node's value.",
  'role': 'user'},
 {'content': 'This statement is correct. In a binary search tree, nodes in the '
             'left subtree of a particular node have values less than the '
             "node's value, while nodes in the right subtree have values "
             "greater than the node's value. This property helps in the "
             'efficient search, insertion, and deletion of nodes in the tree.',
  'role': 'assistant'}]


In [24]:
# hf.load_dataset("banghua/DL-SFT-Dataset", split="train")

In [None]:
# train_dataset = load_dataset("banghua/DL-SFT-Dataset")["train"]
# if not USE_GPU:
#     train_dataset=train_dataset.select(range(100))

# display_dataset(train_dataset)

### `SFTTrainer` config 


In [26]:
sft_config = SFTConfig(
    learning_rate=8e-5,  # Learning rate for the optimiser.
    num_train_epochs=1,  # Number of passes over the training dataset.
    per_device_train_batch_size=1,  # Batch size for each device (e.g., GPU) during training.
    gradient_accumulation_steps=8,  # Number of batches whose gradients are accumulated before each optimiser update.
    gradient_checkpointing=False,  # Disabled here; enabling it reduces memory usage at the cost of slower training.
    logging_steps=2,  # Log training progress every 2 steps.
    bf16=False,  # Disable bfloat16 (required for CPU or unsupported GPU).
    fp16=False,  # Disable float16 (required for CPU or unsupported GPU).
)
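
As a rough sketch of what these settings imply, the effective batch size is the per-device batch size multiplied by the number of accumulation steps and the number of devices. The single-device assumption and the variable names below are illustrative, and the exact number of optimiser updates depends on how the trainer handles the final partial accumulation group.

```python
import math

per_device_train_batch_size = 1
gradient_accumulation_steps = 8
num_devices = 1      # assumption: a single CPU or GPU
num_examples = 2961  # rows in the training split loaded earlier

# Number of examples that contribute to each optimiser update.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)

# Upper bound on optimiser updates per epoch (the trainer may round differently).
updates_per_epoch = math.ceil(num_examples / effective_batch_size)

print(effective_batch_size, updates_per_epoch)
```

Gradient accumulation gives the optimiser the gradient of a larger batch while only one example needs to fit in memory at a time.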

In [27]:
sft_trainer = SFTTrainer(
    model=hf.model,
    args=sft_config,
    train_dataset=train_dataset, 
    processing_class=hf.tokenizer,
)

Tokenizing train dataset:   0%|          | 0/2961 [00:00<?, ? examples/s]


ValueError: Cannot use chat template functions because tokenizer.chat_template is not set and no template argument was passed! For information about writing templates and setting the tokenizer.chat_template attribute, please see the documentation at https://huggingface.co/docs/transformers/main/en/chat_templating
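
The `ValueError` above arises because the tokenizer loaded for `HuggingFaceTB/SmolLM2-135M` has no chat template, while the dataset is in `messages` format. One possible workaround, sketched below, is to assign a ChatML-style Jinja template to `hf.tokenizer.chat_template` before constructing the `SFTTrainer`. The template string is an assumption and is not taken from this notebook. It also presumes that `<|im_start|>` and `<|im_end|>` are tokens in the tokenizer's vocabulary. `render_chatml` is a hypothetical pure-Python helper that only shows the text the template is intended to produce.

```python
from typing import Dict, List

# Assumed ChatML-style template; adjust it to the format you want the model to learn.
CHATML_TEMPLATE = (
    "{% for message in messages %}"
    "{{ '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n' }}"
    "{% endfor %}"
    "{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"
)

# In the notebook, this would be applied before building the trainer:
# hf.tokenizer.chat_template = CHATML_TEMPLATE


def render_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = False) -> str:
    """Pure-Python illustration of the text the template above is meant to render."""
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        text += "<|im_start|>assistant\n"
    return text


example = [
    {"role": "user", "content": "Calculate 1+1-1"},
    {"role": "assistant", "content": "1"},
]
print(render_chatml(example))
```

With a template set, the trainer can convert each `messages` record into a single training string. The same template must then be used at inference time so that prompts match the format seen during training.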

In [None]:
sft_trainer.train()

## Testing training results on small model and small dataset

**Note:** The following results are for the small model and dataset used for SFT training in this notebook, chosen because of limited computational resources. To view the results of full-scale training on a larger model, see the **"SFT results on Qwen3-0.6B model"** section above.

In [None]:
if not USE_GPU:  # move model to CPU when GPU isn't requested
    sft_trainer.model.to("cpu")
# test_model_with_questions expects a HuggingFaceManager, so hand the trained model to it
hf.model = sft_trainer.model
test_model_with_questions(hf, questions,
                          title="Base Model (After SFT) Output")