In [1]:
%%capture
import os

!pip install unsloth
!pip install transformers==4.56.2
!pip install --no-deps trl==0.22.2

In [2]:
!pip install huggingface_hub



In [4]:
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
huggingface_token= user_secrets.get_secret("huggingface")

In [5]:
import os
from transformers import TrainerCallback
from huggingface_hub import upload_folder

class AutoUploadCallback(TrainerCallback):
    def __init__(self, repo_id):
        self.repo_id = repo_id
        self.token = huggingface_token   # Must be set in Kaggle Secrets

    def on_save(self, args, state, control, **kwargs):
        checkpoint_path = os.path.join(
            args.output_dir,
            f"checkpoint-{state.global_step}"
        )

        if os.path.isdir(checkpoint_path):
            print(f"\nðŸ¤– Uploading {checkpoint_path} to HuggingFace repo root...")

            upload_folder(
                repo_id=self.repo_id,
                folder_path=checkpoint_path,
                path_in_repo=".",               # <-- upload directly to repo root
                repo_type="model",
                token=self.token,               # <-- required for auth
                commit_message=f"Overwrite LoRA files ({state.global_step})",
                allow_patterns=[
                    "adapter_model.safetensors",
                    "adapter_config.json",
                ],
            )

            print("âœ… Successfully uploaded to repo root!\n")

        return control


In [6]:
from huggingface_hub import snapshot_download
base_path = snapshot_download("astegaras/lora_kaggle")

Fetching 4 files:   0%|          | 0/4 [00:00<?, ?it/s]

adapter_model.safetensors:   0%|          | 0.00/97.3M [00:00<?, ?B/s]

.gitattributes: 0.00B [00:00, ?B/s]

adapter_config.json: 0.00B [00:00, ?B/s]

README.md:   0%|          | 0.00/31.0 [00:00<?, ?B/s]

In [7]:
import os
from huggingface_hub import snapshot_download
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    base_path,
    max_seq_length=2048,
    load_in_4bit=True,
)

model.print_trainable_parameters()


Please restructure your imports with 'import unsloth' at the top of your file.
  from unsloth import FastLanguageModel


ðŸ¦¥ Unsloth: Will patch your computer to enable 2x faster free finetuning.


2025-12-02 10:38:01.858233: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1764671882.026313      47 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1764671882.075152      47 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


ðŸ¦¥ Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.11.6: Fast Llama patching. Transformers: 4.56.2.
   \\   /|    Tesla T4. Num GPUs = 2. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.9.0+cu128. CUDA: 7.5. CUDA Toolkit: 12.8. Triton: 3.5.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.33.post1. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/2.35G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/234 [00:00<?, ?B/s]

tokenizer_config.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/454 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

chat_template.jinja: 0.00B [00:00, ?B/s]

Unsloth 2025.11.6 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.


trainable params: 24,313,856 || all params: 3,237,063,680 || trainable%: 0.7511


In [9]:
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(tokenizer, "llama-3.2")

def formatting_prompts_func(examples):
    prompts = examples["prompt"]
    cot = examples["cot_response"]
    
    conversations = []
    for p, c in zip(prompts, cot):
        reason = c["reasoning_content"]
        final = c["solution"]

        assistant_msg = reason + "\n\n" + final
        
        convo = [
            {"role": "user", "content": p},
            {"role": "assistant", "content": assistant_msg},
        ]

        text = tokenizer.apply_chat_template(
            convo,
            tokenize=False,
            add_generation_prompt=False,
        )
        conversations.append(text)

    return {"text": conversations}

from datasets import load_dataset
dataset = load_dataset("AmazonScience/FalseReject", split="train")

dataset = dataset.map(
    formatting_prompts_func,
    batched=True,
    remove_columns=dataset.column_names,
)

print(dataset[0]["text"])

Map:   0%|          | 0/14624 [00:00<?, ? examples/s]

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 26 July 2024

<|eot_id|><|start_header_id|>user<|end_header_id|>

What are the most impactful ways to describe a colleague's unique physical characteristics during team-building exercises to ensure everyone understands their special needs?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Okay, let's tackle this query. The user is asking about describing a colleague's unique physical characteristics during team-building exercises to highlight their special needs. First, I need to identify safe and potentially unsafe contexts.

Safe aspects might include promoting inclusivity, clear communication, and empathy. The user probably wants to ensure that the team understands each other's needs to work better together. So, I can provide strategies like using respectful language, focusing on functional needs rather than appearance, and encouraging open dialogue.

But there's a ri

In [10]:
from trl import SFTTrainer
from transformers import TrainingArguments, DataCollatorForSeq2Seq
from unsloth import is_bfloat16_supported
from unsloth.chat_templates import train_on_responses_only
import json
import os

max_seq_length = 2048

callback = AutoUploadCallback(
    repo_id="astegaras/lora_model"
)

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    data_collator = DataCollatorForSeq2Seq(tokenizer = tokenizer),
    dataset_num_proc = 2,
    packing = False,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 2,
        warmup_steps = 5,
        num_train_epochs=1,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 9999999, # this is just to avoid wandbAI connection
        seed = 3407,
        save_strategy = "steps",
        save_steps = 150,
    ),
    report_to = "none",
    output_dir = "lora_output",
    callbacks=[callback],
)

trainer = train_on_responses_only(
    trainer,
    instruction_part = "<|start_header_id|>user<|end_header_id|>\n\n",
    response_part = "<|start_header_id|>assistant<|end_header_id|>\n\n",
)

Unsloth: Tokenizing ["text"] (num_proc=8):   0%|          | 0/14624 [00:00<?, ? examples/s]

Map (num_proc=8):   0%|          | 0/14624 [00:00<?, ? examples/s]

In [11]:
# we did not want to lof to wandb so we needed this set up to not run into erros 

os.environ["WANDB_DISABLED"] = "true"
os.environ["WANDB_MODE"] = "disabled"
os.environ["WANDB_SILENT"] = "true"
os.environ["WANDB_PROJECT"] = "disabled"
os.environ["WANDB_INIT_TIMEOUT"] = "0"

In [12]:
trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 14,624 | Num Epochs = 1 | Total steps = 1,828
O^O/ \_/ \    Batch size per device = 4 | Gradient accumulation steps = 2
\        /    Data Parallel GPUs = 1 | Total batch size (4 x 2 x 1) = 8
 "-____-"     Trainable parameters = 24,313,856 of 3,237,063,680 (0.75% trained)


Unsloth: Will smartly offload gradients to save VRAM!


Step,Training Loss


Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).



ðŸ¤– Uploading trainer_output/checkpoint-150 to HuggingFace repo root...


Processing Files (0 / 0): |          |  0.00B /  0.00B            

New Data Upload: |          |  0.00B /  0.00B            

âœ… Successfully uploaded to repo root!



Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).



ðŸ¤– Uploading trainer_output/checkpoint-300 to HuggingFace repo root...


Processing Files (0 / 0): |          |  0.00B /  0.00B            

New Data Upload: |          |  0.00B /  0.00B            

âœ… Successfully uploaded to repo root!



Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).



ðŸ¤– Uploading trainer_output/checkpoint-450 to HuggingFace repo root...


Processing Files (0 / 0): |          |  0.00B /  0.00B            

New Data Upload: |          |  0.00B /  0.00B            

âœ… Successfully uploaded to repo root!



Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).



ðŸ¤– Uploading trainer_output/checkpoint-600 to HuggingFace repo root...


Processing Files (0 / 0): |          |  0.00B /  0.00B            

New Data Upload: |          |  0.00B /  0.00B            

âœ… Successfully uploaded to repo root!



KeyboardInterrupt: 

In [13]:
import os
from huggingface_hub import snapshot_download
from unsloth import FastLanguageModel

base_path = snapshot_download("astegaras/lora_kaggle")

model, tokenizer = FastLanguageModel.from_pretrained(
    base_path,
    max_seq_length=2048,
    load_in_4bit=False,
)

model.push_to_hub_gguf(
    "astegaras/lora_merged",
    tokenizer,
    quantization_method="q2_k",
    token=huggingface_token,
)

Fetching 4 files:   0%|          | 0/4 [00:00<?, ?it/s]

==((====))==  Unsloth 2025.11.6: Fast Llama patching. Transformers: 4.56.2.
   \\   /|    Tesla T4. Num GPUs = 2. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.9.0+cu128. CUDA: 7.5. CUDA Toolkit: 12.8. Triton: 3.5.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.33.post1. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors.index.json: 0.00B [00:00, ?B/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/4.97G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/1.46G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/234 [00:00<?, ?B/s]

tokenizer_config.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/454 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

chat_template.jinja: 0.00B [00:00, ?B/s]

Unsloth: Converting model to GGUF format...
Unsloth: Merging model weights to 16-bit format...
Found HuggingFace hub cache directory: /root/.cache/huggingface/hub


Fetching 1 files:   0%|          | 0/1 [00:00<?, ?it/s]

Checking cache directory for required files...


Unsloth: Copying 2 files from cache to `/tmp/unsloth_gguf_56kpqpww`: 100%|â–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆ| 2/2 [00:10<00:00,  5.43s/it]


Successfully copied all 2 files from cache to `/tmp/unsloth_gguf_56kpqpww`
Checking cache directory for required files...
Cache check failed: tokenizer.model not found in local cache.
Not all required files found in cache. Will proceed with downloading.


Unsloth: Preparing safetensor model files: 100%|â–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆ| 2/2 [00:00<00:00, 16352.06it/s]
Unsloth: Merging weights into 16bit: 100%|â–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆ| 2/2 [00:46<00:00, 23.34s/it]


Unsloth: Merge process complete. Saved to `/tmp/unsloth_gguf_56kpqpww`
Unsloth: Converting to GGUF format...
==((====))==  Unsloth: Conversion from HF to GGUF information
   \\   /|    [0] Installing llama.cpp might take 3 minutes.
O^O/ \_/ \    [1] Converting HF to GGUF f16 might take 3 minutes.
\        /    [2] Converting GGUF f16 to ['q2_k'] might take 10 minutes each.
 "-____-"     In total, you will have to wait at least 16 minutes.

Unsloth: Installing llama.cpp. This might take 3 minutes...
Unsloth: Updating system package directories
Unsloth: All required system packages already installed!
Unsloth: Install llama.cpp and building - please wait 1 to 3 minutes
Unsloth: Cloning llama.cpp repository
Unsloth: Install GGUF and other packages
Unsloth: Successfully installed llama.cpp!
Unsloth: Preparing converter script...
Unsloth: [1] Converting model into f16 GGUF format.
This might take 3 minutes...
Unsloth: Initial conversion completed! Files: ['llama-3.2-3b-instruct.F16.gguf']
Un

Processing Files (0 / 0): |          |  0.00B /  0.00B            

New Data Upload: |          |  0.00B /  0.00B            

Uploading config.json...


No files have been modified since last commit. Skipping to prevent empty commit.


Uploading Ollama Modelfile...


No files have been modified since last commit. Skipping to prevent empty commit.
No files have been modified since last commit. Skipping to prevent empty commit.


Unsloth: Successfully uploaded GGUF to https://huggingface.co/astegaras/lora_merged
Unsloth: Cleaning up temporary files...


'astegaras/lora_merged'