[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/docto-rin/Med-LLM-Jp/blob/main/colab_notebooks/JMLE-CPT.ipynb)

**Note on Hardware Requirements:**
*   **GPU:** This notebook was run on an NVIDIA A100 (40 GB VRAM).
*   **Training VRAM:** Peak usage observed during training was 32.7 GB.

## Installation (requires a session restart)

In [None]:
# ==============================================================================
# Installation (requiring a session restart after the first run)
# ==============================================================================
!pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

Collecting unsloth@ git+https://github.com/unslothai/unsloth.git (from unsloth[colab-new]@ git+https://github.com/unslothai/unsloth.git)
  Cloning https://github.com/unslothai/unsloth.git to /tmp/pip-install-2c40q3gz/unsloth_26debc91a5804327ad1668f432018970
  Running command git clone --filter=blob:none --quiet https://github.com/unslothai/unsloth.git /tmp/pip-install-2c40q3gz/unsloth_26debc91a5804327ad1668f432018970
  Resolved https://github.com/unslothai/unsloth.git to commit c9b9a366e7a6110f9d58d5ed8db6bd27bc97fb71
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done


## Manual Auth

In [None]:
# ==============================================================================
# Authentication
# ==============================================================================
from google.colab import drive
drive.mount('/content/drive')

use_secret = False

Mounted at /content/drive


In [None]:
if use_secret:
    from huggingface_hub import login as hf_login
    from google.colab import userdata
    hf_login(userdata.get('HF_TOKEN'))
else:
    from huggingface_hub import notebook_login
    notebook_login()

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

In [None]:
import wandb

if use_secret:
    from google.colab import userdata
    wandb.login(key=userdata.get('WANDB_API_KEY'))
else:
    wandb.login()

[34m[1mwandb[0m: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.


<IPython.core.display.Javascript object>

[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter:

 ··········


[34m[1mwandb[0m: No netrc file found, creating one.
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc
[34m[1mwandb[0m: Currently logged in as: [33mdocto-rin[0m ([33mnagoya-u[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


## Workflow

### Setup

In [None]:
# Press Ctrl+F10 (Colab's "Run after" shortcut) to run this cell and every cell below it.
!pip install --upgrade torch
!pip install --upgrade xformers

import torch
if torch.cuda.get_device_capability()[0] >= 8:
    !pip install --no-deps packaging ninja einops "flash-attn>=2.6.3"

Collecting xformers
  Downloading xformers-0.0.29.post3-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (1.0 kB)
Downloading xformers-0.0.29.post3-cp311-cp311-manylinux_2_28_x86_64.whl (43.4 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 43.4/43.4 MB 45.5 MB/s eta 0:00:00
Installing collected packages: xformers
Successfully installed xformers-0.0.29.post3
Collecting ninja
  Downloading ninja-1.11.1.4-py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.metadata (5.0 kB)
Collecting flash-attn>=2.6.3
  Downloading flash_attn-2.7.4.post1.tar.gz (6.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.0/6.0 MB 49.0 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Using cached ninja-1.11.1.4-py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (422 kB)
Building wheels for collected packages: flash-attn
  Building wheel for flash-attn (setup.py) ... done
  Create

In [None]:
# ==============================================================================
# Imports (after potential restart)
# ==============================================================================
import os
import gc
import math
from unsloth import (
    FastLanguageModel,
    is_bfloat16_supported,
    UnslothTrainer,             # <--- Import UnslothTrainer
    UnslothTrainingArguments    # <--- Import UnslothTrainingArguments
)
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    Trainer,
    TrainingArguments,
    DataCollatorForLanguageModeling,
    TextDataset, # Not used here, keeping for reference
    pipeline
)
from datasets import load_dataset, load_from_disk
from datasets.dataset_dict import DatasetDict

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


In [None]:
# ==============================================================================
# Configuration
# ==============================================================================
print("\nSetting up configuration...")

# --- Model IDs and Names ---
# Choose your base Deepseek model
model_id = "cyberagent/DeepSeek-R1-Distill-Qwen-32B-Japanese"
# Define a name for your fine-tuned model and the repo on HF Hub
new_model_id = "CA-DeepSeek-R1-D-Qwen-32B-Jp-cpt-0.3" # CHANGE THIS to your desired model name
# Your Hugging Face username (replace placeholder)
hf_username = "doctorin" # <<<<<<<<<================= REPLACE THIS

# --- Paths ---
# Base directory on Google Drive for saving models, checkpoints, and datasets
base_save_dir = "/content/drive/MyDrive/student_iwase/finetuned-models" # CHANGE this path if desired
# Specific directory structure for this model
model_base_path = os.path.join(base_save_dir, new_model_id)
output_dir = os.path.join(model_base_path, "training_checkpoints") # For checkpoints during training
final_model_save_path = os.path.join(model_base_path, "final_adapter") # For final LoRA adapter
processed_dataset_path = os.path.join(model_base_path, "processed_dataset") # For formatted & tokenized dataset

os.makedirs(output_dir, exist_ok=True)
os.makedirs(final_model_save_path, exist_ok=True)
os.makedirs(processed_dataset_path, exist_ok=True)

# --- Training Parameters ---
max_seq_length = 2048       # ADJUST based on your data length and VRAM. Start with 2048 or 4096.
load_in_4bit = True         # Use QLoRA (4-bit quantization)
lora_r = 16                 # LoRA rank
lora_alpha = 16             # LoRA alpha (the adapter update is scaled by lora_alpha / lora_r)
lora_dropout = 0.05         # LoRA dropout
# Batch size per GPU. Decrease if OOM. Effective BS = batch_size * gradient_accumulation * num_gpus
per_device_train_batch_size = 1 # Start small for large models/sequences
gradient_accumulation_steps = 8 # Increase to simulate larger batch size if VRAM is limited (effective batch size 8)
learning_rate = 5e-5        # Common learning rate for LoRA
embedding_learning_rate = 5e-6
warmup_steps = 100           # Number of warmup steps for the optimizer scheduler
max_grad_norm = 0.3         # Gradient clipping max norm
num_train_epochs = 1       # <<<<<<<<<================= ADJUST
optim = "adamw_8bit"        # Use 8-bit AdamW optimizer to save memory
save_steps_ratio = 0.2      # Save/evaluate every 20% of an epoch; used below to derive save_steps
save_steps = 500            # Placeholder; overwritten in "Calculate Training Steps" from save_steps_ratio
logging_steps = 10          # Log metrics every N steps
save_total_limit = 2        # Keep only the last N checkpoints + final model
validation_split_percentage = 10  # Percentage of the data held out for validation
seed = 3407

# --- Dataset Info ---
# Your specific dataset ID on Hugging Face Hub (replace placeholder)
dataset_id = "doctorin/JMLE-CoT-gemini-2.5-pro-dataset-combined" # <<<<<<<<<================= REPLACE THIS (e.g., "doctorin/JMLE-CoT-gemini-2.5-pro-dataset-combined-sourced")
# text_field = "text" # Not used as we format dynamically

# --- Derived Parameters ---
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
num_gpus = torch.cuda.device_count() if torch.cuda.is_available() else 1
print(f"Number of GPUs detected: {num_gpus}")

print(f"Configuration set:")
print(f"  Base Model: {model_id}")
print(f"  Fine-tuned Model Name: {new_model_id}")
print(f"  Dataset ID: {dataset_id}")
print(f"  Max Sequence Length: {max_seq_length}")
print(f"  Output (Checkpoints): {output_dir}")
print(f"  Final Adapter Save Path: {final_model_save_path}")
print(f"  Processed Dataset Cache: {processed_dataset_path}")
print(f"  Effective Batch Size: {effective_batch_size * num_gpus}")


Setting up configuration...
Number of GPUs detected: 1
Configuration set:
  Base Model: cyberagent/DeepSeek-R1-Distill-Qwen-32B-Japanese
  Fine-tuned Model Name: CA-DeepSeek-R1-D-Qwen-32B-Jp-cpt-0.3
  Dataset ID: doctorin/JMLE-CoT-gemini-2.5-pro-dataset-combined
  Max Sequence Length: 2048
  Output (Checkpoints): /content/drive/MyDrive/student_iwase/finetuned-models/CA-DeepSeek-R1-D-Qwen-32B-Jp-cpt-0.3/training_checkpoints
  Final Adapter Save Path: /content/drive/MyDrive/student_iwase/finetuned-models/CA-DeepSeek-R1-D-Qwen-32B-Jp-cpt-0.3/final_adapter
  Processed Dataset Cache: /content/drive/MyDrive/student_iwase/finetuned-models/CA-DeepSeek-R1-D-Qwen-32B-Jp-cpt-0.3/processed_dataset
  Effective Batch Size: 8


In [None]:
# ==============================================================================
# Load Model and Tokenizer (using Unsloth)
# ==============================================================================
print("\nLoading base model and tokenizer...")
dtype = None # Auto detection by Unsloth
# Set compute dtype for LoRA/4bit training based on GPU capability
compute_dtype = torch.float16
if is_bfloat16_supported():
    compute_dtype = torch.bfloat16
    print("bfloat16 is supported. Using bfloat16 for computation.")
else:
    print("bfloat16 not supported. Using float16 for computation.")

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_id,
    max_seq_length=max_seq_length,
    dtype=dtype, # Let Unsloth handle dtype for loading
    load_in_4bit=load_in_4bit,
    trust_remote_code=True, # Important for some models
    # token=hf_token, # Only needed for private base models
    device_map="auto", # Automatically distribute model across GPUs if available
    quantization_config=BitsAndBytesConfig( # Explicitly define QLoRA config
        load_in_4bit = load_in_4bit,
        bnb_4bit_quant_type = "nf4",
        bnb_4bit_compute_dtype = compute_dtype,
        bnb_4bit_use_double_quant = True,
    ) if load_in_4bit else None,
)
print("Base model and tokenizer loaded.")

# --- Set Padding Token ---
# Deepseek chat models might not have a pad token set by default
if tokenizer.pad_token is None:
    print("Setting pad_token to eos_token")
    tokenizer.pad_token = tokenizer.eos_token


Loading base model and tokenizer...
bfloat16 is supported. Using bfloat16 for computation.
Are you certain you want to do remote code execution?
==((====))==  Unsloth 2025.3.19: Fast Qwen2 patching. Transformers: 4.50.3.
   \\   /|    NVIDIA A100-SXM4-40GB. Num GPUs = 1. Max memory: 39.557 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 8.0. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.29.post3. FA2 = True]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors.index.json:   0%|          | 0.00/63.2k [00:00<?, ?B/s]

Fetching 14 files:   0%|          | 0/14 [00:00<?, ?it/s]

model-00007-of-00014.safetensors:   0%|          | 0.00/4.88G [00:00<?, ?B/s]

model-00002-of-00014.safetensors:   0%|          | 0.00/4.88G [00:00<?, ?B/s]

model-00005-of-00014.safetensors:   0%|          | 0.00/4.88G [00:00<?, ?B/s]

model-00006-of-00014.safetensors:   0%|          | 0.00/4.88G [00:00<?, ?B/s]

model-00008-of-00014.safetensors:   0%|          | 0.00/4.88G [00:00<?, ?B/s]

model-00001-of-00014.safetensors:   0%|          | 0.00/4.89G [00:00<?, ?B/s]

model-00003-of-00014.safetensors:   0%|          | 0.00/4.88G [00:00<?, ?B/s]

model-00004-of-00014.safetensors:   0%|          | 0.00/4.88G [00:00<?, ?B/s]

model-00009-of-00014.safetensors:   0%|          | 0.00/4.88G [00:00<?, ?B/s]

model-00010-of-00014.safetensors:   0%|          | 0.00/4.88G [00:00<?, ?B/s]

model-00012-of-00014.safetensors:   0%|          | 0.00/4.88G [00:00<?, ?B/s]

model-00013-of-00014.safetensors:   0%|          | 0.00/4.88G [00:00<?, ?B/s]

model-00014-of-00014.safetensors:   0%|          | 0.00/2.12G [00:00<?, ?B/s]

model-00011-of-00014.safetensors:   0%|          | 0.00/4.88G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/14 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/181 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/6.75k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/485 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/11.4M [00:00<?, ?B/s]

cyberagent/DeepSeek-R1-Distill-Qwen-32B-Japanese does not have a padding token! Will use pad_token = <|vision_pad|>.
Base model and tokenizer loaded.


In [None]:
# ==============================================================================
# Configure PEFT (LoRA using Unsloth)
# ==============================================================================
print("\nConfiguring PEFT (LoRA)...")
model = FastLanguageModel.get_peft_model(
    model,
    r=lora_r,
    target_modules=[ # Common target modules for Qwen/Deepseek style models
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
        "embed_tokens", "lm_head" # Often included, check impact on VRAM/performance
    ],
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    bias="none",    # Set bias to 'none' for QLoRA
    use_gradient_checkpointing="unsloth", # Let unsloth handle gradient checkpointing
    random_state=seed,
    use_rslora=False, # Rank Stabilized LoRA (optional)
    loftq_config=None, # LoftQ initialization (optional)
    max_seq_length=max_seq_length, # Pass max_seq_length
)
print("PEFT configured.")
model.print_trainable_parameters() # About 21.9 GB of VRAM in use at this point

Unsloth: Dropout = 0 is supported for fast patching. You are using dropout = 0.05.
Unsloth will patch all other layers, except LoRA matrices, causing a performance hit.



Configuring PEFT (LoRA)...
Unsloth: Offloading input_embeddings to disk to save VRAM
Unsloth: Offloading output_embeddings to disk to save VRAM


Unsloth 2025.3.19 patched 64 layers with 0 QKV layers, 0 O layers and 0 MLP layers.


Unsloth: Training embed_tokens in mixed precision to save VRAM
Unsloth: Training lm_head in mixed precision to save VRAM
PEFT configured.
trainable params: 1,691,353,088 || all params: 34,455,229,440 || trainable%: 4.9088
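
The trainable-parameter count printed above can be reproduced by hand. The sketch below is only an illustration under stated assumptions: the model dimensions (hidden size 5120, MLP size 27648, 64 layers, 40 query heads and 8 key/value heads of dimension 128, vocabulary size 152064) are not printed anywhere in this notebook, and it assumes `embed_tokens` and `lm_head` are trained as full matrices while the other target modules receive rank-16 LoRA adapters.

```python
# Assumed model dimensions (not shown in this notebook).
HIDDEN, MLP, LAYERS = 5120, 27648, 64
Q_OUT, KV_OUT = 40 * 128, 8 * 128   # query and key/value projection widths
VOCAB = 152064
R = 16                               # lora_r from the configuration cell

def lora_params(d_in, d_out, r=R):
    """Parameters of one LoRA pair: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)

per_layer = (
    lora_params(HIDDEN, Q_OUT)       # q_proj
    + lora_params(HIDDEN, KV_OUT)    # k_proj
    + lora_params(HIDDEN, KV_OUT)    # v_proj
    + lora_params(Q_OUT, HIDDEN)     # o_proj
    + lora_params(HIDDEN, MLP)       # gate_proj
    + lora_params(HIDDEN, MLP)       # up_proj
    + lora_params(MLP, HIDDEN)       # down_proj
)
lora_total = per_layer * LAYERS
embedding_total = 2 * VOCAB * HIDDEN  # embed_tokens + lm_head, assumed fully trainable

trainable = lora_total + embedding_total
all_params = 34_455_229_440           # "all params" from the output above
print(f"LoRA adapters:       {lora_total:,}")
print(f"Embeddings + head:   {embedding_total:,}")
print(f"Trainable (total):   {trainable:,}")
print(f"Trainable share:     {100 * trainable / all_params:.4f}%")
```

Under these assumptions, the two embedding matrices account for most of the trainable parameters, which is why including `embed_tokens` and `lm_head` has a noticeable effect on VRAM.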


In [None]:
# ==============================================================================
# Prepare Dataset (Format, Tokenize, Split, Cache)
# ==============================================================================
def create_prompt_with_think(example):
    """
    Builds the training text for one example: the chain of thought wrapped in
    <think> tags, followed by the explanation and the EOS token.
    The user prompt (prompt_part) is constructed but not included in the
    returned text; see the commented-out line near the end of this function.
    """
    question = example.get("question", "")
    choices = example.get("choices", [])
    cot = example.get("cot", "No CoT provided.") # Handle missing CoT
    answer_list = sorted(example.get("answer", []))
    answer_str = "".join(answer_list)
    explanation = example.get("explanation", "No explanation provided.") # Handle missing explanation

    choices_text = "\n".join([f"- {choice}" for choice in choices]) if choices else "No choices provided."

    prompt_part = f"""ユーザー: 以下の医師国家試験問題について、思考過程と簡潔な解説を生成してください。
- 思考過程は<think></think>タグで囲んでください。
- 簡潔な解説はexplanation: の後に続けて書いてください。

問題：
{question}

選択肢：
{choices_text}

アシスタント: """

    response_part = f"""<think>
{cot}
</think>
explanation: {explanation}"""

    # full_text = prompt_part + response_part + tokenizer.eos_token
    full_text = response_part + tokenizer.eos_token
    return {"formatted_text": full_text}

# --- Load or Process Dataset ---
processed_dataset = None
if os.path.exists(processed_dataset_path) and os.listdir(processed_dataset_path):
    print(f"\nLoading processed dataset from disk: {processed_dataset_path}")
    try:
        processed_dataset = load_from_disk(processed_dataset_path)
        if not isinstance(processed_dataset, DatasetDict) or "train" not in processed_dataset or "test" not in processed_dataset:
             raise ValueError("Loaded dataset is not a DatasetDict or missing 'train'/'test' splits.")
        if len(processed_dataset["train"]) == 0 or len(processed_dataset["test"]) == 0:
             raise ValueError("Loaded dataset splits are empty.")
        print("Processed dataset loaded successfully from disk.")
        print(f"Dataset structure: {processed_dataset}")
    except Exception as e:
        print(f"Error loading dataset from disk: {e}. Re-processing dataset.")
        processed_dataset = None
else:
    print(f"\nProcessed dataset not found at {processed_dataset_path}. Processing from source...")
    processed_dataset = None

if processed_dataset is None:
    print(f"Loading raw dataset '{dataset_id}' from Hugging Face Hub...")
    try:
        # Assume the raw dataset has a 'train' split we can work with
        raw_dataset = load_dataset(dataset_id, split="train") # Explicitly load train split
        print("Raw dataset loaded.")

        # --- Apply formatting function ---
        print("Formatting dataset...")
        formatted_dataset = raw_dataset.map(
            create_prompt_with_think,
            # remove_columns=raw_dataset.column_names # Keep columns for now, tokenize will remove
        )
        print("Dataset formatted.")

        # --- Tokenize the formatted text ---
        print("Tokenizing formatted dataset...")
        def tokenize_formatted_text(examples):
            return tokenizer(
                examples["formatted_text"],
                truncation=True,
                max_length=max_seq_length,
                padding=False,
            )

        tokenized_dataset = formatted_dataset.map(
            tokenize_formatted_text,
            batched=True,
            remove_columns=["formatted_text"] + list(raw_dataset.column_names), # Remove original and intermediate columns
        )
        print("Tokenization finished.")
        print(f"Tokenized dataset structure before split: {tokenized_dataset}")
        print(f"Sample tokenized input_ids length: {len(tokenized_dataset[0]['input_ids'])}")

        # --- Split dataset into train and validation ---
        print(f"Splitting dataset into train and test ({100-validation_split_percentage}/{validation_split_percentage})...")
        split_dataset = tokenized_dataset.train_test_split(
            test_size=validation_split_percentage / 100.0,
            seed=seed
        )
        processed_dataset = split_dataset # Now contains 'train' and 'test' splits
        print(f"Dataset split finished. Structure: {processed_dataset}")

        # --- Save processed (and split) dataset to disk ---
        print(f"Saving processed dataset to disk: {processed_dataset_path}")
        try:
            processed_dataset.save_to_disk(processed_dataset_path)
            print("Processed dataset saved successfully.")
        except Exception as e:
            print(f"Error saving processed dataset: {e}. Training will continue, but restart will require re-processing.")

    except Exception as e:
        print(f"\n❌ FATAL ERROR: Could not load or process dataset '{dataset_id}'. Error: {e}")
        print("Please check the dataset ID and format.")
        raise RuntimeError("Dataset loading/processing failed.") from e


Processed dataset not found at /content/drive/MyDrive/student_iwase/finetuned-models/CA-DeepSeek-R1-D-Qwen-32B-Jp-cpt-0.3/processed_dataset. Processing from source...
Loading raw dataset 'doctorin/JMLE-CoT-gemini-2.5-pro-dataset-combined' from Hugging Face Hub...


README.md:   0%|          | 0.00/539 [00:00<?, ?B/s]

train-00000-of-00001.parquet:   0%|          | 0.00/8.69M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/3390 [00:00<?, ? examples/s]

Raw dataset loaded.
Formatting dataset for SFT...


Map:   0%|          | 0/3390 [00:00<?, ? examples/s]

Dataset formatted.
Tokenizing formatted dataset...


Map:   0%|          | 0/3390 [00:00<?, ? examples/s]

Tokenization finished.
Tokenized dataset structure before split: Dataset({
    features: ['input_ids', 'attention_mask'],
    num_rows: 3390
})
Sample tokenized input_ids length: 1057
Splitting dataset into train and test (90/10)...
Dataset split finished. Structure: DatasetDict({
    train: Dataset({
        features: ['input_ids', 'attention_mask'],
        num_rows: 3051
    })
    test: Dataset({
        features: ['input_ids', 'attention_mask'],
        num_rows: 339
    })
})
Saving processed dataset to disk: /content/drive/MyDrive/student_iwase/finetuned-models/CA-DeepSeek-R1-D-Qwen-32B-Jp-cpt-0.3/processed_dataset


Saving the dataset (0/1 shards):   0%|          | 0/3051 [00:00<?, ? examples/s]

Saving the dataset (0/1 shards):   0%|          | 0/339 [00:00<?, ? examples/s]

Processed dataset saved successfully.
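
To make the text format concrete, here is a minimal sketch of what one formatted training example looks like. It mirrors only the response part of `create_prompt_with_think`; the function name `format_response`, the toy example, and the `<EOS>` placeholder are hypothetical stand-ins (the notebook uses `tokenizer.eos_token`).

```python
EOS_PLACEHOLDER = "<EOS>"  # hypothetical; the notebook appends tokenizer.eos_token

def format_response(example, eos_token=EOS_PLACEHOLDER):
    """Wrap the chain of thought in <think> tags and append the explanation."""
    cot = example.get("cot", "No CoT provided.")
    explanation = example.get("explanation", "No explanation provided.")
    return f"<think>\n{cot}\n</think>\nexplanation: {explanation}{eos_token}"

# Toy record with the same field names the notebook reads from the dataset.
toy_example = {
    "cot": "Step-by-step reasoning goes here.",
    "explanation": "Short rationale.",
}
formatted = format_response(toy_example)
print(formatted)

# A record without these fields falls back to the default strings.
print(format_response({}))
```

Only this text is tokenized; the question and choices never reach the model during training in this configuration.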


In [None]:
# ==============================================================================
# Calculate Training Steps
# ==============================================================================
print("\nCalculating training steps...")
num_train_samples = len(processed_dataset["train"])
steps_per_epoch = math.ceil(num_train_samples / (per_device_train_batch_size * gradient_accumulation_steps * num_gpus))
max_steps = int(steps_per_epoch * num_train_epochs)
save_steps = max(1, int(steps_per_epoch * save_steps_ratio))
eval_steps = save_steps

print(f"  Total train samples: {num_train_samples}")
print(f"  Steps per epoch: {steps_per_epoch}")
print(f"  Total max_steps: {max_steps}")
print(f"  Save/Eval steps: {save_steps}")

# ==============================================================================
# Data Collator
# ==============================================================================
print("\nSetting up data collator...")
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
print("Data collator ready.")


Calculating training steps...
  Total train samples: 3051
  Steps per epoch: 382
  Total max_steps: 382
  Save/Eval steps: 76

Setting up data collator...
Data collator ready.
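
The step counts above follow directly from the configuration values. The following sketch repeats the arithmetic with the numbers from this run, so the save and evaluation points can be checked without loading the dataset.

```python
import math

# Values taken from the configuration cell and the output above.
num_train_samples = 3051
per_device_train_batch_size = 1
gradient_accumulation_steps = 8
num_gpus = 1
num_train_epochs = 1
save_steps_ratio = 0.2

samples_per_step = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
steps_per_epoch = math.ceil(num_train_samples / samples_per_step)   # ceil(381.375)
max_steps = int(steps_per_epoch * num_train_epochs)
save_steps = max(1, int(steps_per_epoch * save_steps_ratio))        # int(76.4)

# Steps at which a checkpoint is written and the validation set is evaluated.
checkpoint_steps = list(range(save_steps, max_steps + 1, save_steps))

print(f"steps_per_epoch = {steps_per_epoch}")
print(f"max_steps       = {max_steps}")
print(f"save_steps      = {save_steps}")
print(f"checkpoints at  = {checkpoint_steps}")
```

Because 382 is not a multiple of 76, the last scheduled evaluation falls two steps before the end of training.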


In [None]:
# ==============================================================================
# Training Arguments (Using UnslothTrainingArguments)
# ==============================================================================
print("\nDefining training arguments using UnslothTrainingArguments...")
training_args = UnslothTrainingArguments(  # <--- Use UnslothTrainingArguments
    run_name=f"{new_model_id}-run-{num_train_epochs}epochs",
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    warmup_steps=warmup_steps,
    max_steps=max_steps,
    learning_rate=learning_rate,
    embedding_learning_rate=embedding_learning_rate,
    fp16=not is_bfloat16_supported(),
    bf16=is_bfloat16_supported(),
    logging_steps=logging_steps,
    optim=optim,
    save_strategy="steps",
    save_steps=save_steps,
    save_total_limit=save_total_limit,
    seed=seed,
    output_dir=output_dir,
    max_grad_norm=max_grad_norm,
    report_to="wandb" if os.environ.get("WANDB_DISABLED") != "true" else "none",
    remove_unused_columns=False, # Generally recommended False with PEFT/Unsloth when dataset is preprocessed
    # --- Evaluation Arguments ---
    evaluation_strategy="steps",
    eval_steps=eval_steps,
    per_device_eval_batch_size=per_device_train_batch_size * 2, # Can be larger for eval
    load_best_model_at_end=True, # Load the best model based on eval loss
    metric_for_best_model="eval_loss", # Metric to determine the best model
    greater_is_better=False, # Lower eval_loss is better
    # gradient_checkpointing = True, # Handled by FastLanguageModel.get_peft_model with use_gradient_checkpointing="unsloth"
    # --- Unsloth Specific (Optional) ---
    # dataset_num_proc = 4, # Example: Number of processes for dataset mapping (if not done manually before)
    # dataset_text_field = "formatted_text", # Example: if using Unsloth's internal tokenization
)
print("UnslothTrainingArguments defined.")

# ==============================================================================
# Initialize Trainer (Using UnslothTrainer)
# ==============================================================================
print("\nInitializing UnslothTrainer...")
trainer = UnslothTrainer(                 # <--- Use UnslothTrainer
    model=model,                         # The Unsloth PEFT model
    tokenizer=tokenizer,
    train_dataset=processed_dataset["train"],
    eval_dataset=processed_dataset["test"],
    args=training_args,                  # Pass the UnslothTrainingArguments
    data_collator=data_collator,
    # --- Potentially add callbacks here if needed ---
    # callbacks=[...]
)
print("UnslothTrainer initialized.")


Defining training arguments using UnslothTrainingArguments...
UnslothTrainingArguments defined.

Initializing UnslothTrainer...




UnslothTrainer initialized.
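
With `warmup_steps=100` and `max_steps=382`, roughly a quarter of the run is spent warming up. The sketch below shows the resulting learning-rate curve, assuming the linear warmup and linear decay schedule that `transformers` uses when `lr_scheduler_type` is not set (this notebook does not set it). The helper `linear_lr` is written here for illustration and is not part of any library.

```python
def linear_lr(step, base_lr=5e-5, warmup_steps=100, total_steps=382):
    """Linear warmup to base_lr, then linear decay to zero (assumed schedule)."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))

# Learning rate at the evaluation steps of this run, plus the end points.
for step in [0, 50, 76, 100, 152, 228, 304, 380, 382]:
    print(f"step {step:>3}: lr = {linear_lr(step):.3e}")
```

Under this assumption, the first evaluation at step 76 happens while the learning rate is still ramping up.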


### Train

In [None]:
trainer_stats = trainer.train() # Peak VRAM observed during training: 32.7 GB

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 3,051 | Num Epochs = 2 | Total steps = 382
O^O/ \_/ \    Batch size per device = 1 | Gradient accumulation steps = 8
\        /    Data Parallel GPUs = 1 | Total batch size (1 x 8 x 1) = 8
 "-____-"     Trainable parameters = 1,691,353,088/32,000,000,000 (5.29% trained)


Unsloth: Will smartly offload gradients to save VRAM!


Step,Training Loss,Validation Loss
76,1.029,0.992387
152,0.8855,0.893812
228,0.8591,0.863167
304,0.8342,0.848073
380,0.8144,0.842863


Unsloth: Not an error, but Qwen2ForCausalLM does not accept `num_items_in_batch`.
Using gradient accumulation will be very slightly less accurate.
Read more on gradient accumulation issues here: https://unsloth.ai/blog/gradient
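
The validation losses in the table above are easier to compare as perplexities. A minimal sketch, assuming the reported loss is the mean per-token cross-entropy in nats (the usual convention for causal language models), in which case perplexity is simply `exp(loss)`:

```python
import math

# Validation loss by step, copied from the table above.
validation_loss = {
    76: 0.992387,
    152: 0.893812,
    228: 0.863167,
    304: 0.848073,
    380: 0.842863,
}

perplexity = {step: math.exp(loss) for step, loss in validation_loss.items()}

previous = None
for step, ppl in perplexity.items():
    delta = "" if previous is None else f"  (change: {ppl - previous:+.3f})"
    print(f"step {step:>3}: eval_loss = {validation_loss[step]:.4f}  perplexity = {ppl:.3f}{delta}")
    previous = ppl
```

The shrinking change between consecutive evaluations shows the curve flattening toward the end of the epoch.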


In [None]:
print(f"Training Stats: {trainer_stats}")

print(f"\nSaving final LoRA adapter weights to {final_model_save_path}...")
model.save_pretrained(final_model_save_path)
tokenizer.save_pretrained(final_model_save_path)
print("Final adapter weights saved.")
model.push_to_hub(new_model_id + "-LoRA-Adapter")
tokenizer.push_to_hub(new_model_id + "-LoRA-Adapter")
print("Final adapter weights uploaded.")

Training Stats: TrainOutput(global_step=382, training_loss=0.9312051372378284, metrics={'train_runtime': 7503.138, 'train_samples_per_second': 0.407, 'train_steps_per_second': 0.051, 'total_flos': 7.193470933057536e+17, 'train_loss': 0.9312051372378284})

Saving final LoRA adapter weights to /content/drive/MyDrive/student_iwase/finetuned-models/CA-DeepSeek-R1-D-Qwen-32B-Jp-cpt-0.3/final_adapter...


config.json:   0%|          | 0.00/746 [00:00<?, ?B/s]

Final adapter weights saved.


README.md:   0%|          | 0.00/617 [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/40.0 [00:00<?, ?B/s]

Saved model to https://huggingface.co/CA-DeepSeek-R1-D-Qwen-32B-Jp-cpt-0.3-LoRA-Adapter


tokenizer.json:   0%|          | 0.00/11.4M [00:00<?, ?B/s]

Final adapter weights uploaded.
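
The throughput figures in the `TrainOutput` above can be cross-checked from the runtime and step count. This sketch assumes that samples per second is computed as optimizer steps times the effective batch size, divided by the runtime.

```python
# Values copied from the training stats above.
train_runtime = 7503.138       # seconds
global_step = 382
effective_batch_size = 8       # 1 per device x 8 accumulation steps x 1 GPU

samples_processed = global_step * effective_batch_size
steps_per_second = global_step / train_runtime
samples_per_second = samples_processed / train_runtime
seconds_per_step = train_runtime / global_step

print(f"samples processed  = {samples_processed}")
print(f"steps per second   = {steps_per_second:.3f}")
print(f"samples per second = {samples_per_second:.3f}")
print(f"seconds per step   = {seconds_per_step:.1f}")
print(f"total runtime      = {train_runtime / 3600:.2f} hours")
```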


### Inference

In [None]:
FastLanguageModel.for_inference(model)

messages = [
    {"role": "user", "content": "鼠径部レベル以下の全感覚消失の脊髄損傷レベルはどれか。 a. 第4頸髄。b. 第5胸髄。c. 第10胸髄。d. 第1腰髄。e. 脊髄円錐部"},
] # Question 119E20; correct answer: d
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True,
    return_tensors = "pt",
).to("cuda")

from transformers import TextStreamer
for t in [0.5, 1.0, 1.5]:
    print(f"\ntemperature: {t}\n")
    text_streamer = TextStreamer(tokenizer, skip_prompt = True)
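    _ = model.generate(
        input_ids = inputs,
        streamer = text_streamer,
        max_new_tokens = 2048,
        do_sample = True,  # Explicit, so temperature and min_p apply regardless of the model's generation_config
        temperature = t,
        min_p = 0.05,
    )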

temperature: 0.5

<think>
まず、問題を理解する必要があります。鼠径部レベル以下の全感覚消失の脊髄損傷レベルを問う問題で、選択肢はaからeまであります。鼠径部とは、鼠径部（鼠径部）は股の付け根の部分を指すので、そのレベル以下の感覚消失ということは、損傷部位が鼠径部より上にあるということになります。

脊髄のレベルと感覚消失の関係について考えます。脊髄損傷のレベルは、損傷が起こった脊髄のセグメントによって決まり、そのレベル以下の感覚や運動機能が消失します。鼠径部の感覚は、主にT12〜L1の脊髄セグメントから支配されています。したがって、鼠径部以下の感覚消失が起こる場合、損傷レベルはT12またはL1の上にある必要があります。

選択肢を見ると、cは第10胸髄（T10）、dは第1腰髄（L1）です。T10はT12よりも上にあるので、T10レベルの損傷ではT10以下の感覚が消失します。鼠径部はT12〜L1なので、T10レベルの損傷では鼠径部以下の感覚も消失するはずです。一方、L1レベルの損傷では、L1以下の感覚が消失しますが、鼠径部はL1の上にあるため、鼠径部の感覚は消失しません。したがって、cの第10胸髄が正解と考えられます。

ただし、確認のために脊髄のレベルと感覚支配範囲を再確認します。鼠径部（股の付け根）はT12〜L1の領域です。脊髄損傷のレベルがT10であれば、T10以下の感覚が消失し、T12〜L1も含まれるため、鼠径部以下の感覚消失が起こります。一方、L1レベルの損傷では、L1以下の感覚が消失し、鼠径部はL1の上にあるため、感覚は消失しません。したがって、正解はcの第10胸髄です。
</think>

鼠径部レベル以下の全感覚消失の脊髄損傷レベルは、**c. 第10胸髄（T10）**です。

### 解説
1. **鼠径部の感覚支配**：鼠径部（股の付け根）は、**T12〜L1の脊髄セグメント**から支配されます。
2. **脊髄損傷のレベルと感覚消失の関係**：損傷レベル以下の感覚が消失します。鼠径部以下の感覚消失が起こるためには、**T12〜L1の上にある脊髄セグメント**で損傷が発生する必要があります。
3. **選択肢の分析**：
   - **c. 第10胸髄（T10）**：T10レベルの損傷では、T10以下の感覚が消失します。鼠
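
When comparing generations like the one above, it can help to separate the reasoning inside the `<think>` tags from the final answer. The helper below is a hypothetical sketch and not part of the notebook; the sample string is made up for illustration.

```python
import re

def split_think(text):
    """Return (reasoning, answer); reasoning is empty if no <think> block is found."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

# Made-up sample in the same shape as the model output above.
sample_output = "<think>\nCompare each option with the sensory level.\n</think>\n\nAnswer: c"
reasoning, answer = split_think(sample_output)
print("Reasoning:", reasoning)
print("Answer:   ", answer)
```

An output truncated before `</think>` is returned unchanged as the answer with empty reasoning, which makes unfinished generations easy to detect.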

In [None]:
# ==============================================================================
# Clean Up GPU Memory
# ==============================================================================
def clean_memory():
    print("\nCleaning up memory...")
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    print("Memory cleaned.")

clean_memory()


Cleaning up memory...
Memory cleaned.


### Upload

In [None]:
model.push_to_hub_gguf(
    new_model_id,
    tokenizer=tokenizer,
    quantization_method = "q4_k_m"
)

Unsloth: ##### The current model auto adds a BOS token.
Unsloth: ##### Your chat template has a BOS token. We shall remove it temporarily.
Unsloth: Kaggle/Colab has limited disk space. We need to delete the downloaded
model which will save 4-16GB of disk space, allowing you to save on Kaggle/Colab.
Unsloth: Will remove a cached repo with size 65.5G


Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 59.1 out of 83.48 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...


 11%|█         | 7/64 [00:00<00:02, 22.91it/s]
We will save to Disk and not RAM now.
100%|██████████| 64/64 [02:24<00:00,  2.26s/it]


Unsloth: Saving tokenizer... Done.
Done.


Unsloth: Converting qwen2 model. Can use fast conversion = False.


==((====))==  Unsloth: Conversion from QLoRA to GGUF information
   \\   /|    [0] Installing llama.cpp might take 3 minutes.
O^O/ \_/ \    [1] Converting HF to GGUF 16bits might take 3 minutes.
\        /    [2] Converting GGUF 16bits to ['q4_k_m'] might take 10 minutes each.
 "-____-"     In total, you will have to wait at least 16 minutes.

Unsloth: Installing llama.cpp. This might take 3 minutes...
Unsloth: CMAKE detected. Finalizing some steps for installation.
Unsloth: [1] Converting model at CA-DeepSeek-R1-D-Qwen-32B-Jp-cpt-0.3 into bf16 GGUF format.
The output location will be /content/CA-DeepSeek-R1-D-Qwen-32B-Jp-cpt-0.3/unsloth.BF16.gguf
This might take 3 minutes...
INFO:hf-to-gguf:Loading model: CA-DeepSeek-R1-D-Qwen-32B-Jp-cpt-0.3
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: loading model part 'model-00001-of-000

unsloth.Q4_K_M.gguf:   0%|          | 0.00/19.9G [00:00<?, ?B/s]

Unsloth: ##### The current model auto adds a BOS token.
Unsloth: ##### We removed it in GGUF's chat template for you.


Saved GGUF to https://huggingface.co/doctorin/CA-DeepSeek-R1-D-Qwen-32B-Jp-cpt-0.3
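
As a rough sanity check on the uploaded file, the average number of bits per weight can be estimated from the file size. This is a back-of-the-envelope sketch with two assumptions: that the "19.9G" in the progress bar means 19.9 × 10^9 bytes, and that the base model's weight count equals the non-trainable remainder of the parameter totals printed during PEFT setup.

```python
# Numbers copied from earlier cell outputs.
all_params = 34_455_229_440
trainable_params = 1_691_353_088
gguf_size_bytes = 19.9e9            # assumed decimal gigabytes

# Assumption: merged model size equals the non-trainable (base) weight count.
base_params = all_params - trainable_params
bits_per_weight = gguf_size_bytes * 8 / base_params

print(f"base parameters ~ {base_params:,}")
print(f"bits per weight ~ {bits_per_weight:.2f}")
```

A result somewhat above 4 bits is expected, since the quantized file also stores scaling metadata and keeps some tensors at higher precision.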


In [None]:
# clean_memory()
# !rm -r /content/llama.cpp

# model.push_to_hub_merged(
#     new_model_id + "-4bit",
#     tokenizer,
#     save_method = "merged_4bit_forced"
# )