# Unsloth Environment Verification

This notebook verifies that all components are correctly installed for running Unsloth notebooks:
- GRPO (Reinforcement Learning)
- Vision fine-tuning (Ministral VL)
- fast_inference support

**Run this after rebuilding the jupyter pod to verify the environment.**

In [1]:
# Load environment variables from .env file
from dotenv import load_dotenv
import os

# Load .env from notebook directory
load_dotenv()
print(f"✓ HF_TOKEN loaded: {'Yes' if os.environ.get('HF_TOKEN') else 'No'}")

# CRITICAL: Import unsloth FIRST for proper TRL patching
import unsloth
from unsloth import FastLanguageModel, FastVisionModel
print(f"✓ unsloth: {unsloth.__version__}")

import transformers
print(f"✓ transformers: {transformers.__version__}")

import vllm
print(f"✓ vLLM: {vllm.__version__}")

import trl
print(f"✓ TRL: {trl.__version__}")

import torch
print(f"✓ PyTorch: {torch.__version__}")
print(f"✓ CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"✓ GPU: {torch.cuda.get_device_name(0)}")

✓ HF_TOKEN loaded: Yes
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.

  if is_vllm_available():

🦥 Unsloth Zoo will now patch everything to make training faster!
✓ unsloth: 2025.12.10
✓ transformers: 5.0.0rc1
✓ vLLM: 0.14.0rc1.dev201+gadcf682fc
✓ TRL: 0.26.2
✓ PyTorch: 2.9.1+cu130
✓ CUDA available: True
✓ GPU: NVIDIA GeForce RTX 4080 SUPER
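
The version printout above imports every package, which is slow and triggers Unsloth's patching. As a lighter complement, the sketch below reads installed versions from package metadata without importing anything. The helper name `installed_versions` is hypothetical, and the distribution names passed to it are assumed to match the packages checked in this notebook (distribution names can differ from import names).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in distributions:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names are assumptions; adjust them to the environment at hand.
for name, ver in installed_versions(
    ["unsloth", "transformers", "vllm", "trl", "torch"]
).items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```

A metadata lookup only shows what is installed; it does not prove the package imports cleanly, so it supplements the import cell rather than replacing it.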

In [2]:
# Test imports for Reinforcement Learning notebook
print("=== GRPO/RL Imports ===")
from trl import GRPOConfig, GRPOTrainer
from datasets import Dataset
print("\u2713 All GRPO imports successful")

=== GRPO/RL Imports ===
✓ All GRPO imports successful

In [3]:
# Test imports for Vision notebook
print("=== Vision/SFT Imports ===")
from trl import SFTTrainer, SFTConfig
from unsloth.trainer import UnslothVisionDataCollator
from unsloth import is_bf16_supported
from transformers import TextStreamer
print("\u2713 All Vision imports successful")

=== Vision/SFT Imports ===
✓ All Vision imports successful
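
The two import cells stop at the first failure, which hides any later problems. A minimal sketch of an alternative is shown below: it collects every missing module or attribute and reports them together. The helper `check_imports` is hypothetical, and the demo uses standard-library modules as stand-ins so it runs anywhere.

```python
import importlib


def check_imports(requirements):
    """Return a list of (module, attribute, error) tuples for anything missing.

    `requirements` maps a module name to the attribute names expected on it.
    """
    problems = []
    for module_name, attributes in requirements.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError as exc:
            problems.append((module_name, None, str(exc)))
            continue
        for attribute in attributes:
            if not hasattr(module, attribute):
                problems.append((module_name, attribute, "attribute not found"))
    return problems


# Standard-library stand-ins; in this notebook the mapping might instead list
# "trl": ["GRPOConfig", "GRPOTrainer", "SFTTrainer", "SFTConfig"].
demo = {"json": ["dumps", "loads"], "math": ["sqrt", "no_such_function"]}
for module_name, attribute, error in check_imports(demo):
    print(f"{module_name}.{attribute}: {error}")
```

An empty list means every listed name resolved. Only `ImportError` is caught here, so a package that fails during import for another reason will still raise.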

In [4]:
# Test model loading with FastLanguageModel
print("=== Testing Model Loading ===")
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/Ministral-3-3B-Reasoning-2512",
    max_seq_length=2048,
    load_in_4bit=True,
)
print(f"✓ Model loaded: {type(model).__name__}")
print(f"✓ Tokenizer: {type(tokenizer).__name__}")

# Clean up
del model, tokenizer
import gc
gc.collect()
torch.cuda.empty_cache()
print("✓ Cleanup complete")

=== Testing Model Loading ===
==((====))==  Unsloth 2025.12.10: Fast Ministral3 patching. Transformers: 5.0.0rc1. vLLM: 0.14.0rc1.dev201+gadcf682fc.cu130.
   \\   /|    NVIDIA GeForce RTX 4080 SUPER. Num GPUs = 1. Max memory: 15.568 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.9.1+cu130. CUDA: 8.9. CUDA Toolkit: 13.0. Triton: 3.5.1
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.33.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!

✓ Model loaded: Mistral3ForConditionalGeneration
✓ Tokenizer: PixtralProcessor
✓ Cleanup complete

In [5]:
# Test fast_inference capability
print("=== Fast Inference Check ===")
print("fast_inference=True uses vLLM as backend for 2x faster inference")
print(f"vLLM version: {vllm.__version__}")
print(f"Unsloth version: {unsloth.__version__}")

# Check if fast_inference is supported
import inspect
sig = inspect.signature(FastLanguageModel.from_pretrained)
if 'fast_inference' in sig.parameters:
    print("\u2713 fast_inference parameter available")
else:
    print("\u26a0 fast_inference parameter not found")

=== Fast Inference Check ===
fast_inference=True uses vLLM as backend for 2x faster inference
vLLM version: 0.14.0rc1.dev201+gadcf682fc
Unsloth version: 2025.12.10
✓ fast_inference parameter available
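
The signature check above only looks for an explicitly named parameter. A function that takes `**kwargs` may also accept the keyword, and an explicit parameter says nothing about whether the backend actually works. The sketch below distinguishes the first two cases; `accepts_keyword` and both toy loaders are hypothetical stand-ins, not part of Unsloth.

```python
import inspect

_NAMED_KINDS = (
    inspect.Parameter.POSITIONAL_OR_KEYWORD,
    inspect.Parameter.KEYWORD_ONLY,
)


def accepts_keyword(func, name):
    """Return 'explicit', 'via_kwargs', or 'no' for whether func accepts name."""
    params = inspect.signature(func).parameters
    if name in params and params[name].kind in _NAMED_KINDS:
        return "explicit"
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return "via_kwargs"
    return "no"


# Toy loaders standing in for a real `from_pretrained`.
def loader_a(model_name, fast_inference=False):
    return model_name


def loader_b(model_name, **kwargs):
    return model_name


print(accepts_keyword(loader_a, "fast_inference"))  # explicit
print(accepts_keyword(loader_b, "fast_inference"))  # via_kwargs
```

A result of `via_kwargs` means the call will not raise a `TypeError`, but the keyword could still be silently ignored further down.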

## Fast Inference Testing (vLLM Backend)

**Note:** `fast_inference=True` requires compatible vLLM and Unsloth versions.
The installed vLLM build (0.14.0rc1) has API changes that are incompatible with Unsloth's LoRA manager.

The test below only verifies that the parameter is available. Fully testing `fast_inference` requires either:
- vLLM 0.10.2 through 0.11.2 (per the TRL warning), or
- an Unsloth update that supports the newer vLLM API
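
The version range above can be checked programmatically before attempting `fast_inference=True`. The sketch below is a simplified, standard-library-only comparison: it looks at the leading numeric release and ignores `rc`, `dev`, and local suffixes. Both helper names are hypothetical; `packaging.version.Version` is the more rigorous option where that package is available.

```python
import re


def release_tuple(version_string):
    """Extract the leading numeric release, e.g. '0.14.0rc1.dev201' -> (0, 14, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version_string)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def in_supported_range(version_string, low=(0, 10, 2), high=(0, 11, 2)):
    """True if the release falls within [low, high], ignoring pre-release suffixes."""
    return low <= release_tuple(version_string) <= high


print(in_supported_range("0.14.0rc1.dev201+gadcf682fc"))  # False: the build in this environment
print(in_supported_range("0.11.0"))  # True
```

Because suffixes are ignored, a pre-release such as `0.10.2rc1` would count as inside the range; treat the result as a quick guard, not a guarantee of compatibility.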

In [2]:
# Verify fast_inference parameter exists and document current limitation
print("=== Fast Inference Capability Check ===")
import inspect

# Check FastLanguageModel
sig = inspect.signature(FastLanguageModel.from_pretrained)
has_fast_inference = 'fast_inference' in sig.parameters
print(f"✓ fast_inference parameter available: {has_fast_inference}")

# Check FastVisionModel
sig_vision = inspect.signature(FastVisionModel.from_pretrained)
has_fast_inference_vision = 'fast_inference' in sig_vision.parameters
print(f"✓ fast_inference in FastVisionModel: {has_fast_inference_vision}")

# Document current versions
print(f"\nCurrent versions:")
print(f"  vLLM: {vllm.__version__}")
print(f"  Unsloth: {unsloth.__version__}")
print(f"\n⚠ Note: fast_inference=True has vLLM 0.14.0 compatibility issues")
print(f"  Error: LoRA manager API mismatch - waiting for Unsloth update")

=== Fast Inference Capability Check ===
✓ fast_inference parameter available: True
✓ fast_inference in FastVisionModel: True

Current versions:
  vLLM: 0.14.0rc1.dev201+gadcf682fc
  Unsloth: 2025.12.10

⚠ Note: fast_inference=True has vLLM 0.14.0 compatibility issues
  Error: LoRA manager API mismatch - waiting for Unsloth update

## Ministral VL (Vision) Training Verification

This section tests the complete vision model fine-tuning pipeline:
- FastVisionModel loading
- LoRA adapter configuration
- Dataset loading and formatting
- SFTTrainer training loop
- Inference after training

In [6]:
# Test FastVisionModel loading with Ministral 3 VL
print("=== Ministral VL Model Loading ===")
model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Ministral-3-3B-Instruct-2512",
    load_in_4bit=True,  # Use 4bit for memory efficiency in testing
    use_gradient_checkpointing="unsloth",
)
print(f"✓ FastVisionModel loaded: {type(model).__name__}")
print(f"✓ Tokenizer: {type(tokenizer).__name__}")

=== Ministral VL Model Loading ===
==((====))==  Unsloth 2025.12.10: Fast Ministral3 patching. Transformers: 5.0.0rc1. vLLM: 0.14.0rc1.dev201+gadcf682fc.cu130.
   \\   /|    NVIDIA GeForce RTX 4080 SUPER. Num GPUs = 1. Max memory: 15.568 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.9.1+cu130. CUDA: 8.9. CUDA Toolkit: 13.0. Triton: 3.5.1
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.33.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!

✓ FastVisionModel loaded: Mistral3ForConditionalGeneration
✓ Tokenizer: PixtralProcessor

In [7]:
# Apply LoRA adapters for parameter-efficient fine-tuning
print("=== LoRA Configuration ===")
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers=True,
    finetune_language_layers=True,
    finetune_attention_modules=True,
    finetune_mlp_modules=True,
    r=16,  # Reduced rank for testing
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    random_state=3407,
)
print("✓ LoRA adapters applied successfully")
print(f"✓ Trainable parameters: {sum(p.numel() for p in model.parameters() if p.requires_grad):,}")

=== LoRA Configuration ===
Unsloth: Making `model.base_model.model.model.vision_tower.transformer` require gradients
✓ LoRA adapters applied successfully
✓ Trainable parameters: 33,751,040
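
For intuition about where the trainable-parameter count comes from: a LoRA adapter on a linear layer with input width `d_in` and output width `d_out` adds two low-rank matrices, `A` of shape `(r, d_in)` and `B` of shape `(d_out, r)`, for `r * (d_in + d_out)` parameters. The sketch below applies that formula with the notebook's `r=16`; the layer shapes are hypothetical and are not taken from Ministral.

```python
def lora_params(d_in, d_out, r):
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


# Hypothetical layer shapes, chosen only to illustrate the arithmetic.
layers = {"q_proj": (4096, 4096), "up_proj": (4096, 14336)}
for name, (d_in, d_out) in layers.items():
    print(f"{name}: {lora_params(d_in, d_out, r=16):,} trainable parameters")
```

The count scales linearly with `r`, so halving the rank halves the adapter size for the same set of target modules.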

In [8]:
# Load sample from real LaTeX OCR dataset
print("=== Dataset Loading ===")
from datasets import load_dataset

dataset = load_dataset("unsloth/LaTeX_OCR", split="train[:5]")
print(f"✓ Loaded {len(dataset)} samples from LaTeX_OCR dataset")

# Format for vision training
instruction = "Write the LaTeX representation for this image."

def convert_to_conversation(sample):
    return {
        "messages": [
            {"role": "user", "content": [
                {"type": "text", "text": instruction},
                {"type": "image", "image": sample["image"]}
            ]},
            {"role": "assistant", "content": [
                {"type": "text", "text": sample["text"]}
            ]}
        ]
    }

converted_dataset = [convert_to_conversation(s) for s in dataset]
print(f"✓ Converted {len(converted_dataset)} samples to conversation format")

=== Dataset Loading ===

✓ Loaded 5 samples from LaTeX_OCR dataset
✓ Converted 5 samples to conversation format
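
A malformed sample usually surfaces only as an obscure error inside the data collator. The sketch below checks the conversation layout produced by `convert_to_conversation` before training starts. The validator `validate_conversation` is hypothetical, and the example uses a placeholder string where the notebook would hold the dataset's image object.

```python
def validate_conversation(sample):
    """Raise ValueError if a sample does not match the user/assistant layout."""
    messages = sample.get("messages")
    if not isinstance(messages, list) or len(messages) != 2:
        raise ValueError("expected exactly two messages")
    for message, expected_role in zip(messages, ("user", "assistant")):
        if message.get("role") != expected_role:
            raise ValueError(
                f"expected role {expected_role!r}, got {message.get('role')!r}"
            )
    user_types = {part.get("type") for part in messages[0]["content"]}
    if user_types != {"text", "image"}:
        raise ValueError(f"user turn needs text and image parts, got {user_types}")
    for part in messages[1]["content"]:
        if part.get("type") != "text" or not part.get("text"):
            raise ValueError("assistant turn should contain non-empty text")
    return True


# Stand-in sample: the string replaces the image object used in the notebook.
example = {
    "messages": [
        {"role": "user", "content": [
            {"type": "text", "text": "Write the LaTeX representation for this image."},
            {"type": "image", "image": "<image placeholder>"},
        ]},
        {"role": "assistant", "content": [{"type": "text", "text": "x^2 + y^2 = z^2"}]},
    ]
}
print(validate_conversation(example))  # True
```

In the notebook this could be run as `all(validate_conversation(s) for s in converted_dataset)` right after the conversion step.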

In [9]:
# Run 2 training steps to verify pipeline
print("=== Training Test (2 steps) ===")
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    data_collator=UnslothVisionDataCollator(model, tokenizer),
    train_dataset=converted_dataset,
    args=SFTConfig(
        per_device_train_batch_size=1,
        max_steps=2,
        warmup_steps=0,  # Use warmup_steps instead of deprecated warmup_ratio
        learning_rate=2e-4,
        logging_steps=1,
        fp16=not is_bf16_supported(),
        bf16=is_bf16_supported(),
        output_dir="outputs_ministral_vl_test",
        remove_unused_columns=False,
        dataset_text_field="",
        dataset_kwargs={"skip_prepare_dataset": True},
        max_seq_length=1024,
    ),
)
trainer_stats = trainer.train()
print(f"✓ Training completed successfully")
print(f"  Final loss: {trainer_stats.metrics.get('train_loss', 'N/A')}")

warmup_ratio is deprecated and will be removed in v5.2. Use `warmup_steps` instead.

=== Training Test (2 steps) ===

The model is already on multiple devices. Skipping the move to device specified in `args`.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 5 | Num Epochs = 1 | Total steps = 2
O^O/ \_/ \    Batch size per device = 1 | Gradient accumulation steps = 2
\        /    Data Parallel GPUs = 1 | Total batch size (1 x 2 x 1) = 2
 "-____-"     Trainable parameters = 33,751,040 of 3,882,841,088 (0.87% trained)

Unsloth: Will smartly offload gradients to save VRAM!


✓ Training completed successfully
  Final loss: 3.0692602396011353
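
The figures in the training banner can be reproduced by hand, which is a quick sanity check that the run used the intended configuration. The sketch below recomputes the effective batch size and the trainable share from the values printed above; nothing here calls into the trainer.

```python
# Values copied from the training banner above.
per_device_batch = 1
grad_accum_steps = 2
data_parallel_gpus = 1
trainable_params = 33_751_040
total_params = 3_882_841_088

effective_batch = per_device_batch * grad_accum_steps * data_parallel_gpus
trainable_pct = 100 * trainable_params / total_params

print(f"Effective batch size: {effective_batch}")  # 2
print(f"Trainable share: {trainable_pct:.2f}%")  # 0.87%
```

With `max_steps=2` and an effective batch size of 2, the run consumes 4 of the 5 loaded samples, which is enough to exercise the pipeline but far too little to judge the loss.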

In [10]:
# Test inference after training
print("=== Inference Test ===")
FastVisionModel.for_inference(model)

test_image = dataset[0]["image"]
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": instruction}
    ]}
]
input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
inputs = tokenizer(test_image, input_text, add_special_tokens=False, return_tensors="pt").to("cuda")

output = model.generate(**inputs, max_new_tokens=64, temperature=1.5, min_p=0.1)
response = tokenizer.decode(output[0], skip_special_tokens=True)
print("✓ Inference test passed")
print(f"  Sample output (last 100 chars): ...{response[-100:]}")

=== Inference Test ===
✓ Inference test passed
  Sample output (last 100 chars): ...nĠ\mathbb{Z},Ġ\quadĠ\frac{M}{P}Ġ\inĠ\mathbb{Z},Ġ\quadĠ\frac{P}{Q}Ġ\inĠ\mathbb{Z}Ċ\end{equation*}Ċ```

In [11]:
# Cleanup GPU memory
print("=== Cleanup ===")
del model, tokenizer, trainer, dataset, converted_dataset
import gc; gc.collect()
torch.cuda.empty_cache()
print("✓ Ministral VL verification complete - GPU memory released")

=== Cleanup ===
✓ Ministral VL verification complete - GPU memory released
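
The cleanup cell reports that memory was released without measuring it. One way to confirm this is to read PyTorch's allocator counters before and after the cleanup, as sketched below. The helper `gpu_memory_mib` is hypothetical, and the guard around the import exists only so the sketch also runs on machines without PyTorch or CUDA.

```python
import gc

try:
    import torch
except ImportError:  # lets the sketch run on machines without PyTorch
    torch = None


def gpu_memory_mib():
    """Return allocated/reserved GPU memory in MiB, or None without CUDA."""
    if torch is None or not torch.cuda.is_available():
        return None
    return {
        "allocated": torch.cuda.memory_allocated() / 2**20,
        "reserved": torch.cuda.memory_reserved() / 2**20,
    }


before = gpu_memory_mib()
gc.collect()
if torch is not None and torch.cuda.is_available():
    torch.cuda.empty_cache()
after = gpu_memory_mib()
print(f"before: {before}")
print(f"after:  {after}")
```

These counters cover only memory managed by PyTorch's caching allocator in the current process, so `nvidia-smi` may still show a larger figure.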

## Verification Summary

If all cells above ran without errors, your environment is ready for:

1. **Ministral_3_(3B)_Reinforcement_Learning_Sudoku_Game.ipynb**
   - Uses: GRPOConfig, GRPOTrainer, FastLanguageModel
   - Status: Imports and `FastLanguageModel` loading verified; GRPO training not run

2. **Ministral_3_VL_(3B)_Vision.ipynb**
   - Uses: SFTTrainer, SFTConfig, FastVisionModel, UnslothVisionDataCollator
   - Status: **Full pipeline tested** (model loading, LoRA, training, inference)

### What Was Verified
- Core imports (unsloth, transformers, vLLM, TRL, torch)
- FastLanguageModel loading (Ministral-3-3B-Reasoning)
- **fast_inference parameter available** (vLLM 0.14.0 compatibility issue noted)
- FastVisionModel loading (Ministral-3-3B-Instruct)
- LoRA adapter configuration
- Dataset loading (LaTeX_OCR from Hugging Face)
- SFTTrainer training loop (2 steps)
- Post-training inference

### Known Limitations
- `fast_inference=True` is incompatible with vLLM 0.14.0 because of a LoRA manager API mismatch; the fix requires an Unsloth update

In [None]:
# Shutdown kernel
import IPython
app = IPython.Application.instance()
app.kernel.do_shutdown(restart=False)