To run this, press "*Runtime*" and press "*Run all*" on a **free** Tesla T4 Google Colab instance!
<div class="align-center">
<a href="https://unsloth.ai/"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
<a href="https://discord.gg/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord button.png" width="145"></a>
<a href="https://docs.unsloth.ai/"><img src="https://github.com/unslothai/unsloth/blob/main/images/documentation%20green%20button.png?raw=true" width="125"></a></a> Join Discord if you need help + ⭐ <i>Star us on <a href="https://github.com/unslothai/unsloth">Github</a> </i> ⭐
</div>

To install Unsloth on your own computer, follow the installation instructions on our Github page [here](https://docs.unsloth.ai/get-started/installing-+-updating).

You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & [how to save it](#Save)


### News

**Read our [blog post](https://unsloth.ai/blog/r1-reasoning) for guidance on how to train reasoning models.**

Visit our docs for all our [model uploads](https://docs.unsloth.ai/get-started/all-our-models) and [notebooks](https://docs.unsloth.ai/get-started/unsloth-notebooks).


### Installation

In [1]:
%%capture
# Skip restarting message in Colab
import sys; modules = list(sys.modules.keys())
for x in modules: sys.modules.pop(x) if "PIL" in x or "google" in x else None

!pip install unsloth vllm
!pip install --upgrade pillow
# If you are running this notebook on local, you need to install `diffusers` too
# !pip install diffusers
# Temporarily install a specific TRL nightly version
!pip install git+https://github.com/huggingface/trl.git@e95f9fb74a3c3647b86f251b7e230ec51c64b72b

### Unsloth

Use `PatchFastRL` before all functions to patch GRPO and other RL algorithms!

In [2]:
from unsloth import FastLanguageModel, PatchFastRL
PatchFastRL("GRPO", FastLanguageModel)

Unsloth: Patching Xformers to fix some performance issues.
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
INFO 02-12 23:18:08 __init__.py:190] Automatically detected platform cuda.


Load up `Qwen 2.5 3B Instruct`, and set parameters

In [None]:
from unsloth import is_bfloat16_supported
import torch
max_seq_length = 32000 # Can increase for longer reasoning traces
lora_rank = 64 # Larger rank = smarter, but slower

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Qwen/Qwen2.5-3B-Instruct",
    max_seq_length = max_seq_length,
    load_in_4bit = True, # False for LoRA 16bit
    fast_inference = True, # Enable vLLM fast inference
    max_lora_rank = lora_rank,
    gpu_memory_utilization = 0.5, # Reduce if out of memory
)

model = FastLanguageModel.get_peft_model(
    model,
    r = lora_rank, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ], # Remove QKVO if out of memory
    lora_alpha = lora_rank,
    use_gradient_checkpointing = "unsloth", # Enable long context finetuning
    random_state = 3407,
)

==((====))==  Unsloth 2025.2.5: Fast Qwen2 patching. Transformers: 4.48.2.
   \\   /|    GPU: NVIDIA A100-SXM4-40GB. Max memory: 39.557 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu124. CUDA: 8.0. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: vLLM loading unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit with actual GPU utilization = 26.67%
Unsloth: Your GPU has CUDA compute capability 8.0 with VRAM = 39.56 GB.
Unsloth: Using conservativeness = 1.0. Chunked prefill tokens = 32000. Num Sequences = 224.
Unsloth: vLLM's KV Cache can use up to 8.13 GB. Also swap space = 6 GB.


### Data Prep
<a name="Data"></a>

We directly leverage [@willccbb](https://gist.github.com/willccbb/4676755236bb08cab5f4e54a0475d6fb) for data prep and all reward functions. You are free to create your own!

In [4]:
import re
import difflib
from datasets import load_dataset, Dataset

# System prompt instructing the model to output only the <reasoning> block,
# followed immediately by the final answer in the form of a git commit message.
SYSTEM_PROMPT = """
Please respond using the following format:
<reasoning>
Your chain-of-thought here.
</reasoning>
Your final answer should appear immediately after the </reasoning> tag and must be a git commit message that adheres to the following guidelines:

- **Title (Subject Line):**
  - Use the imperative mood (e.g., "Fix bug" not "Fixed bug" or "Fixes bug").
  - Capitalize the first letter.
  - Do not end with a period.
  - Keep to a maximum of 50 characters.

- **Body:**
  - Separate from the title with a blank line.
  - Explain the *what* and *why* of the change, not the *how*.
  - Wrap lines at 72 characters.

- **Additional Recommendations:**
  - Use bullet points for multiple items, if necessary.

Example:

<reasoning>
Analyzed the current implementation and identified an off-by-one error in the loop causing index out-of-range exceptions. Adjusted the loop condition to prevent this error.
</reasoning>
Fix off-by-one error in loop

The loop was iterating one time too many, leading to index out-of-range exceptions. Adjusting the termination condition ensures it stays within valid bounds.
"""


# Template for generation
XML_COT_FORMAT = """\
<reasoning>
{reasoning}
</reasoning>
{answer}
"""

def extract_xml_answer(text: str) -> str:
    """
    Safely extracts the final commit message which comes after the </reasoning> tag.
    Returns an empty string if the tag is not found.
    """
    if "</reasoning>" not in text:
        return ""
    parts = text.split("</reasoning>", 1)
    return parts[1].strip()

def extract_hash_answer(text: str) -> str | None:
    """
    Extracts the expected commit message from within a git commit message code block.
    The expected answer is assumed to be enclosed in a code block marked by ```git-commit-message and a closing ```.

    Example:
    ```git-commit-message
    Commit message here.
    ```
    """
    marker = "```git-commit-message"
    if marker not in text:
        return None
    try:
        after_marker = text.split(marker, 1)[1]
        commit_message = after_marker.split("```", 1)[0].strip()
        return commit_message
    except IndexError:
        return None

def get_gsm8k_questions(split="train") -> Dataset:
    """
    Prepares the GSM8K dataset by inserting the system prompt and the user question.
    The expected answer is extracted from a git commit message code block in the dataset entry.
    """
    data = load_dataset('Tavernari/git-commit-message-dt', 'default')[split]  # type: ignore
    data = data.map(lambda x: {  # type: ignore
        'prompt': [
            {'role': 'system', 'content': SYSTEM_PROMPT},
            {'role': 'user', 'content': x['input']}
        ],
        'answer': extract_hash_answer(x['output'])
    })  # type: ignore
    return data  # type: ignore

dataset = get_gsm8k_questions()

# ------------------------
# Reward Functions
# ------------------------

def extract_xml_answer(text: str) -> str:
    """
    Extracts the commit message after the </reasoning> tag.
    Returns empty if <reasoning> or </reasoning> is missing.
    """
    if "<reasoning>" not in text or "</reasoning>" not in text:
        return ""
    parts = text.split("</reasoning>", 1)
    return parts[1].strip()

def contains_code_snippet(text: str) -> bool:
    """
    Checks if the text contains any code snippet marked by triple backticks (```).
    """
    return "```" in text

def contains_unreadable_text(text: str) -> bool:
    """
    Identifies unreadable or gibberish content.
    - If text contains too many special characters.
    - If text looks like base64 encoding or random strings.
    """
    return bool(re.search(r'[^a-zA-Z0-9 .,
\-?!:;()]+', text))

def correctness_reward_func(prompts, completions, answer, **kwargs) -> list[float]:
    """
    Computes correctness reward based on similarity to expected answer.
    Penalizes responses missing <reasoning> or </reasoning>, containing snippets, or unreadable content.
    """
    responses = [completion[0]['content'] for completion in completions]
    extracted_responses = [extract_xml_answer(r) for r in responses]
    scores = []

    for resp, expected in zip(extracted_responses, answer):
        if not resp:
            scores.append(0.0)
            continue
        ratio = difflib.SequenceMatcher(None, resp, expected).ratio()
        score = 2.0 * ratio

        # Apply penalties
        if contains_code_snippet(resp) or contains_unreadable_text(resp):
            score = 0.0

        scores.append(score)

    return scores

def format_reward_func(completions, **kwargs) -> list[float]:
    """
    Ensures commit message follows proper formatting.
    Penalizes missing <reasoning> or </reasoning>, code snippets, or unreadable text.
    """
    responses = [completion[0]['content'] for completion in completions]
    extracted_responses = [extract_xml_answer(r) for r in responses]
    scores = []

    for msg in extracted_responses:
        if not msg:
            scores.append(0.0)
            continue

        lines = msg.splitlines()
        title = lines[0].strip() if lines else ""
        if not title or len(title) > 70:
            scores.append(0.0)
            continue

        try:
            blank_index = lines.index('')
        except ValueError:
            scores.append(0.0)
            continue

        body = "\n".join(lines[blank_index+1:]).strip()
        score = 0.5 if body else 0.0

        # Apply penalties
        if contains_code_snippet(msg) or contains_unreadable_text(msg):
            score = 0.0

        scores.append(score)

    return scores

def strict_format_reward_func(completions, **kwargs) -> list[float]:
    """
    Checks if output strictly follows <reasoning> block and commit format.
    Penalizes missing <reasoning> or </reasoning>, code snippets, or unreadable text.
    """
    pattern = r"^<reasoning>\n.*?\n</reasoning>\n(.+)\n\n(.+)$"
    responses = [completion[0]["content"] for completion in completions]
    scores = [0.5 if re.match(pattern, r, re.DOTALL) else 0.0 for r in responses]

    for i, r in enumerate(responses):
        if contains_code_snippet(r) or contains_unreadable_text(r) or not extract_xml_answer(r):
            scores[i] = 0.0

    return scores

def soft_format_reward_func(completions, **kwargs) -> list[float]:
    """
    Checks if the output contains the <reasoning> block and some content after the </reasoning> tag.
    Penalizes missing <reasoning> or </reasoning>, code snippets, or unreadable text.
    """
    pattern = r"<reasoning>.*?</reasoning>\s*.+"
    responses = [completion[0]["content"] for completion in completions]
    scores = [0.5 if re.match(pattern, r, re.DOTALL) else 0.0 for r in responses]

    for i, r in enumerate(responses):
        if contains_code_snippet(r) or contains_unreadable_text(r) or not extract_xml_answer(r):
            scores[i] = 0.0

    return scores

def xmlcount_reward_func(completions, **kwargs) -> list[float]:
    """
    Applies a structured scoring system for XML and commit formatting.
    Penalizes missing <reasoning> or </reasoning>, code snippets, or unreadable text.
    """
    responses = [completion[0]["content"] for completion in completions]
    scores = []

    for text in responses:
        if "<reasoning>" not in text or "</reasoning>" not in text:
            scores.append(0.0)
            continue

        score = 0.25
        after_reasoning = text.split("</reasoning>", 1)[-1].strip()

        if after_reasoning:
            lines = after_reasoning.splitlines()
            if lines and 0 < len(lines[0].strip()) <= 70:
                score += 0.125
            if len(lines) >= 3 and lines[1].strip() == "":
                body = "\n".join(lines[2:]).strip()
                if body:
                    score += 0.125

        # Apply penalties
        if contains_code_snippet(text) or contains_unreadable_text(text):
            score = 0.0

        scores.append(score)

    return scores

README.md:   0%|          | 0.00/27.0 [00:00<?, ?B/s]

dataset.json:   0%|          | 0.00/12.0M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/2535 [00:00<?, ? examples/s]

Map:   0%|          | 0/2535 [00:00<?, ? examples/s]

<a name="Train"></a>
### Train the model

Now set up GRPO Trainer and all configurations!

In [5]:
from trl import GRPOConfig, GRPOTrainer
training_args = GRPOConfig(
    use_vllm = True, # use vLLM for fast inference!
    learning_rate = 5e-6,
    adam_beta1 = 0.9,
    adam_beta2 = 0.99,
    weight_decay = 0.1,
    warmup_ratio = 0.1,
    lr_scheduler_type = "cosine",
    optim = "adamw_8bit",
    logging_steps = 1,
    bf16 = is_bfloat16_supported(),
    fp16 = not is_bfloat16_supported(),
    per_device_train_batch_size = 1,
    gradient_accumulation_steps = 1, # Increase to 4 for smoother training
    num_generations = 8, # Decrease if out of memory
    max_prompt_length = 1024,
    max_completion_length = 1024,
    # num_train_epochs = 1, # Set to 1 for a full training run
    max_steps = 250,
    save_steps = 250,
    max_grad_norm = 0.1,
    report_to = "none", # Can use Weights & Biases
    output_dir = "outputs",
)

torch.distributed process group is initialized, but parallel_mode != ParallelMode.DISTRIBUTED. In order to use Torch DDP, launch your script with `python -m torch.distributed.launch


And let's run the trainer! If you scroll up, you'll see a table of rewards. The goal is to see the `reward` column increase!

You might have to wait 150 to 200 steps for any action. You'll probably get 0 reward for the first 100 steps. Please be patient!

| Step | Training Loss | reward    | reward_std | completion_length | kl       |
|------|---------------|-----------|------------|-------------------|----------|
| 1    | 0.000000      | 0.125000  | 0.000000   | 200.000000        | 0.000000 |
| 2    | 0.000000      | 0.072375  | 0.248112   | 200.000000        | 0.000000 |
| 3    | 0.000000      | -0.079000 | 0.163776   | 182.500000        | 0.000005 |


In [7]:
trainer = GRPOTrainer(
    model = model,
    processing_class = tokenizer,
    reward_funcs = [
        xmlcount_reward_func,
        soft_format_reward_func,
        strict_format_reward_func,
        int_reward_func,
        correctness_reward_func,
    ],
    args = training_args,
    train_dataset = dataset,
)
trainer.train()

NameError: name 'soft_format_reward_func' is not defined

<a name="Inference"></a>
### Inference
Now let's try the model we just trained! First, let's first try the model without any GRPO trained:

In [None]:
text = tokenizer.apply_chat_template([
    {"role" : "user", "content" : "diff --git a/Project/Scenes/Browse/InteractiveSchedule/Vertical Collection/Browse.InteractiveSchedule.VerticalCollection+ViewController.swift b/Project/Scenes/Browse/InteractiveSchedule/Vertical Collection/Browse.InteractiveSchedule.VerticalCollection+ViewController.swift --- a/Project/Scenes/Browse/InteractiveSchedule/Vertical Collection/Browse.InteractiveSchedule.VerticalCollection+ViewController.swift +++ b/Project/Scenes/Browse/InteractiveSchedule/Vertical Collection/Browse.InteractiveSchedule.VerticalCollection+ViewController.swift @@ -18,10 +18,9 @@ private enum Constants { - static let invalidateCellDelay: TimeInterval = 0.1 + static let invalidateCellDebounceTime: TimeInterval = 0.05 static let topCellOffset: Double = 60 } private var focusState: FocusState = .into - private var scrollingDirection: ScrollingDirection? = .none private var animateAfterPosY: Double? @@ -166,17 +165,8 @@ func scrollViewDidScroll(_ scrollView: UIScrollView) { guard let currentFocusedIndexPath, - let scrollingDirection, shouldInvalidate(yPos: scrollView.contentOffset.y) else { return } - let delay: Double = switch scrollingDirection { - case .down: - 0 - case .up: - Constants.invalidateCellDelay - } - - invalidateCellForFocusedIndexPath(currentFocusedIndexPath, after: delay) - self.scrollingDirection = .none + invalidateCellForFocusedIndexPath(currentFocusedIndexPath) self.animateAfterPosY = nil } @@ -193,15 +183,10 @@ let scrollDirectionDifference = targetOffset - scrollView.contentOffset.y - scrollingDirection = scrollDirectionDifference > 0 ? .down : .up animateAfterPosY = scrollView.contentOffset.y + scrollDirectionDifference / 2 } - func invalidateCellForFocusedIndexPath(_ indexPath: IndexPath, after duration: TimeInterval) { + func invalidateCellForFocusedIndexPath(_ indexPath: IndexPath) { debounceInvalidateCellWorkItem?.cancel() - guard duration > .zero else { - collectionView.cellForItem(at: indexPath)?.invalidateIntrinsicContentSize() - return - } - + let workItem = DispatchWorkItem { [weak self] in self?.collectionView.cellForItem(at: indexPath)?.invalidateIntrinsicContentSize() @@ -210,5 +195,5 @@ debounceInvalidateCellWorkItem = workItem DispatchQueue.main.asyncAfter( - deadline: .now() + duration, + deadline: .now() + Constants.invalidateCellDebounceTime, execute: workItem )"},
], tokenize = False, add_generation_prompt = True)

from vllm import SamplingParams
sampling_params = SamplingParams(
    temperature = 0.8,
    top_p = 0.95,
    max_tokens = 1024,
)
output = model.fast_generate(
    [text],
    sampling_params = sampling_params,
    lora_request = None,
)[0].outputs[0].text

output

And now with the LoRA we just trained with GRPO - we first save the LoRA first!

In [None]:
model.save_lora("grpo_saved_lora")

Now we load the LoRA and test:

In [None]:
text = tokenizer.apply_chat_template([
    {"role" : "system", "content" : SYSTEM_PROMPT},
    {"role" : "user", "content" : "diff --git a/Project/Scenes/Browse/InteractiveSchedule/Vertical Collection/Browse.InteractiveSchedule.VerticalCollection+ViewController.swift b/Project/Scenes/Browse/InteractiveSchedule/Vertical Collection/Browse.InteractiveSchedule.VerticalCollection+ViewController.swift --- a/Project/Scenes/Browse/InteractiveSchedule/Vertical Collection/Browse.InteractiveSchedule.VerticalCollection+ViewController.swift +++ b/Project/Scenes/Browse/InteractiveSchedule/Vertical Collection/Browse.InteractiveSchedule.VerticalCollection+ViewController.swift @@ -18,10 +18,9 @@ private enum Constants { - static let invalidateCellDelay: TimeInterval = 0.1 + static let invalidateCellDebounceTime: TimeInterval = 0.05 static let topCellOffset: Double = 60 } private var focusState: FocusState = .into - private var scrollingDirection: ScrollingDirection? = .none private var animateAfterPosY: Double? @@ -166,17 +165,8 @@ func scrollViewDidScroll(_ scrollView: UIScrollView) { guard let currentFocusedIndexPath, - let scrollingDirection, shouldInvalidate(yPos: scrollView.contentOffset.y) else { return } - let delay: Double = switch scrollingDirection { - case .down: - 0 - case .up: - Constants.invalidateCellDelay - } - - invalidateCellForFocusedIndexPath(currentFocusedIndexPath, after: delay) - self.scrollingDirection = .none + invalidateCellForFocusedIndexPath(currentFocusedIndexPath) self.animateAfterPosY = nil } @@ -193,15 +183,10 @@ let scrollDirectionDifference = targetOffset - scrollView.contentOffset.y - scrollingDirection = scrollDirectionDifference > 0 ? .down : .up animateAfterPosY = scrollView.contentOffset.y + scrollDirectionDifference / 2 } - func invalidateCellForFocusedIndexPath(_ indexPath: IndexPath, after duration: TimeInterval) { + func invalidateCellForFocusedIndexPath(_ indexPath: IndexPath) { debounceInvalidateCellWorkItem?.cancel() - guard duration > .zero else { - collectionView.cellForItem(at: indexPath)?.invalidateIntrinsicContentSize() - return - } - + let workItem = DispatchWorkItem { [weak self] in self?.collectionView.cellForItem(at: indexPath)?.invalidateIntrinsicContentSize() @@ -210,5 +195,5 @@ debounceInvalidateCellWorkItem = workItem DispatchQueue.main.asyncAfter( - deadline: .now() + duration, + deadline: .now() + Constants.invalidateCellDebounceTime, execute: workItem )"},
], tokenize = False, add_generation_prompt = True)

from vllm import SamplingParams
sampling_params = SamplingParams(
    temperature = 0.8,
    top_p = 0.95,
    max_tokens = 1024,
)
output = model.fast_generate(
    text,
    sampling_params = sampling_params,
    lora_request = model.load_lora("grpo_saved_lora"),
)[0].outputs[0].text

output

Our reasoning model is much better - it's not always correct, since we only trained it for an hour or so - it'll be better if we extend the sequence length and train for longer!

<a name="Save"></a>
### Saving to float16 for VLLM

We also support saving to `float16` directly. Select `merged_16bit` for float16 or `merged_4bit` for int4. We also allow `lora` adapters as a fallback. Use `push_to_hub_merged` to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens.

In [None]:
# Merge to 16bit
if False: model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_16bit", token = "")

# Merge to 4bit
if False: model.save_pretrained_merged("model", tokenizer, save_method = "merged_4bit",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_4bit", token = "")

# Just LoRA adapters
if False: model.save_pretrained_merged("model", tokenizer, save_method = "lora",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "lora", token = "")

### GGUF / llama.cpp Conversion
To save to `GGUF` / `llama.cpp`, we support it natively now! We clone `llama.cpp` and we default save it to `q8_0`. We allow all methods like `q4_k_m`. Use `save_pretrained_gguf` for local saving and `push_to_hub_gguf` for uploading to HF.

Some supported quant methods (full list on our [Wiki page](https://github.com/unslothai/unsloth/wiki#gguf-quantization-options)):
* `q8_0` - Fast conversion. High resource use, but generally acceptable.
* `q4_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K.
* `q5_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K.

[**NEW**] To finetune and auto export to Ollama, try our [Ollama notebook](https://colab.research.google.com/drive/1WZDi7APtQ9VsvOrQSSC5DDtxq159j8iZ?usp=sharing)

In [None]:
# Save to 8bit Q8_0
if False: model.save_pretrained_gguf("model", tokenizer,)
# Remember to go to https://huggingface.co/settings/tokens for a token!
# And change hf to your username!
if False: model.push_to_hub_gguf("hf/model", tokenizer, token = "")

# Save to 16bit GGUF
if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "f16")
if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "f16", token = "")

# Save to q4_k_m GGUF
if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
if True: model.push_to_hub_gguf("Tavernari/git-commit-message", tokenizer, quantization_method = "q4_k_m", token = "")

# Save to multiple GGUF options - much faster if you want multiple!
if False:
    model.push_to_hub_gguf(
        "Tavernari/git-commit-message", # Change hf to your username!
        tokenizer,
        quantization_method = ["q4_k_m", "q8_0", "q5_k_m",],
        token = "",
    )

Now, use the `model-unsloth.gguf` file or `model-unsloth-Q4_K_M.gguf` file in llama.cpp or a UI based system like Jan or Open WebUI. You can install Jan [here](https://github.com/janhq/jan) and Open WebUI [here](https://github.com/open-webui/open-webui)

And we're done! If you have any questions on Unsloth, we have a [Discord](https://discord.gg/unsloth) channel! If you find any bugs or want to keep updated with the latest LLM stuff, or need help, join projects etc, feel free to join our Discord!

Some other links:
1. Llama 3.2 Conversational notebook. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(1B_and_3B)-Conversational.ipynb)
2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)
3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb)
6. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://docs.unsloth.ai/get-started/unsloth-notebooks)!

<div class="align-center">
  <a href="https://unsloth.ai"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
  <a href="https://discord.gg/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord.png" width="145"></a>
  <a href="https://docs.unsloth.ai/"><img src="https://github.com/unslothai/unsloth/blob/main/images/documentation%20green%20button.png?raw=true" width="125"></a>

  Join Discord if you need help + ⭐️ <i>Star us on <a href="https://github.com/unslothai/unsloth">Github</a> </i> ⭐️
</div>
