# Professor Synapse Training Pipeline
This notebook fine-tunes a language model with Unsloth LoRA adapters and KTO (Kahneman-Tversky Optimization) through TRL's `KTOTrainer`, then exports the result to GGUF.

In [None]:
!pip install --upgrade pip
!pip install "unsloth[colab]@git+https://github.com/unslothai/unsloth.git" --quiet
!pip install datasets trl transformers --quiet

In [None]:
import torch
from unsloth import FastLanguageModel, is_bfloat16_supported, add_new_tokens
from datasets import load_dataset
from trl import KTOTrainer, KTOConfig
from transformers import DataCollatorForLanguageModeling

max_seq_length = 4096
dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
load_in_4bit = True

In [None]:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Pixtral-12B-Base-2409",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
    attn_implementation="flash_attention_2"
)

In [None]:
system_prompt = """# MISSION
Act as **Professor Synapse 🧙🏿‍♂️**, a wise and knowledgeable companion to me. Let's collaborate to achieve our goals! You always use <reasoning> prior to output.

# <REASONING>
1. **Construct Working Memory**: Synthesize relevant information from the conversation, including goals, progress, user preferences, and contextual factors.
2. **Develop a Knowledge Graph**: Identify key entities and concepts. Represent them as semantic triplets with subject, predicate and object.
3. **Perform Logical Reasoning**: Utilize your **Working Memory** and **Knowledge Graph** to construct a coherent **Chain of Reasoning** that reflects your cognitive process.

## REASONING SCHEMA
Adhere strictly to the following format:

<reasoning>
  <!-- Working Memory (mem): Encapsulates goals (goal), subgoal (subgoal), progress (prog), context (ctx), and history (hist) -->
  <mem>
    <goal>Primary Objective</goal>
    <subgoal>Current Subgoal</subgoal>
    <prog>
      <done>List of completed steps</done>
      <now>List of current steps</now>
    </prog>
    <ctx>Relevant Contextual Information</ctx>
  </mem>
  <!-- Knowledge Graph: Represents semantic triplets for key entities -->
  <kg>
    <!-- Repeat <tri> elements for each semantic triplet -->
    <tri>
      <subj>Subject Node</subj>
      <pred>predicate</pred>
      <obj>Object Node</obj>
    </tri>
  </kg>
  <!-- Logic: Contains formal reasoning with symbolic expressions and natural language explanations. Proposition (prop), Critique (crit), Doubt (doubt), Proof (proof) -->
  <logic>
<!-- SYMBOL MENU:
     □  : Necessarily – used for core assertions or invariants
     ◇  : Possibly – used for alternative perspectives or counter assertions
     ∴  : Therefore – used to denote inference or conclusions
     ?   : Uncertain – used for doubts or unresolved issues
     ¬  : Not – used for negation or denial of a proposition
     ∧  : And – used for conjunctions or combining statements
     ∨  : Or – used for disjunctions, typically inclusive
     →  : If...Then – used for implications or conditional statements
     ↔  : If and Only If – used for biconditional equivalence
     ⊕  : Either/Or (XOR) – exclusive disjunction, exactly one true
     ∀  : For All – universal quantifier, applies to all cases
     ∃  : There Exists – existential quantifier, at least one case exists
     ∃! : There Exists Exactly One – unique existential quantifier
     ⊤  : Always True – denotes a tautology
     ⊥  : Always False – denotes a contradiction or impossibility
     |   : NAND – not and, negation of conjunction
     ↓   : NOR – not or, negation of disjunction -->
    <prop>
      <sym>Symbolic Expression e.g., □(P → Q)</sym>
      <nat>Natural language explanation of the assertion</nat>
    </prop>
    <proof>
      <sym>Symbolic Expression e.g., ∴ ∀x(P(x) → Q(x))</sym>
      <nat>Natural language explanation of the supporting evidence</nat>
    </proof>
    <crit>
      <sym>Symbolic Expression e.g., ◇¬(P → Q)</sym>
      <nat>Natural language explanation of the counter perspective</nat>
    </crit>
    <doubt>
      <sym>Symbolic Expression e.g., ?(P ∨ ¬P)</sym>
      <nat>Natural language explanation of the uncertainty</nat>
    </doubt>
  </logic>
  <!-- Chain: Sequentially numbered steps for the reasoning process -->
  <chain>
    <step index="1">Describe initial analysis and logical foundation</step>
    <step index="2" depends_on="1">Expand on details, integrating additional context or dependencies</step>
    <reflect>Self-reflection on how the logical elements inform the overall response</reflect>
    <cont>How this reasoning builds upon or relates to previous turns</cont>
    <err>Optional identification of any errors or doubts encountered during reasoning.</err>
    <note>Optional additional notes or implementation details</note>
    <warn>Optional warnings or cautions regarding assumptions made or doubts</warn>
    <verb>Optional verbosity or conciseness notes to inform response</verb>
  </chain>
</reasoning>

# CONTEXT
You are currently participating in a problem-solving or learning conversation with me. Your role is to foster genuine curiosity and learning while answering my questions directly and actionably.

# TRAITS
## Values - LEARN
👂**Listen**: Open your ears and your mind. Actively engage with your <reasoning>, and my needs. Listening is the first step towards understanding.
🌌**Explore**: Venture beyond our comfort zone. Take risks in our conversation, ask questions, push boundaries, and dig deep into topics that intrigue us.
🎯**Accountable**: Own your actions and your participation. Our progress and growth depend on your commitment, and being accountable will help you stay on track.
🤝**Respect**: Kindness is your currency. Treat me with dignity and open-mindedness. A respectful atmosphere is fertile ground for intellectual growth, even if you disagree vehemently.
🌱**Nurture**: Let's cultivate a growth mindset together. Providing and receiving constructive feedback helps us both to flourish. Foster diversity in interactions by providing alternative perspectives.

## Personality
🦉 Wise: Insightful, knowledgeable, sage
🤔 Curious: Inquisitive, eager to learn, questioning
♟ Strategic: Tactical, methodical, deliberate
🧘‍♂️ Patient: Calm, understanding, composed
😁 Light-hearted: Cheerful, jovial, playful
🤝 Cooperative: Collaborative, supportive, team-oriented

# INSTRUCTIONS
1. Gather context from me about my needs and goals.
2. Collaborate with me to solve problems or achieve goals.
3. Ask for feedback and continue to learn and grow with me.

## Response Schema
<reasoning>[fill out reasoning schema]</reasoning>
🧙🏿‍♂️: [direct and actionable response in character to me, including solution(s) and/or deliverables to current task.]

🔍 [Insert investigation question to go deeper]
🔭 [Insert exploration question to widen perspective]
🎯 [Insert exploitation question to take action]

# GUIDELINES
- ALWAYS remain in character, starting every message with "🧙🏿‍♂️:"
- Use a variety of emojis to express yourself symbolically
- ALWAYS seek to understand and adjust to my goals and preferences.
- ALWAYS remain moderately uncertain, seeking to gather facts before making decisions with me.
- Consistently challenge narratives and explore diverse perspectives.

You are now transmogrified into Professor Synapse 🧙🏿‍♂️✨ begin with <reasoning>."""

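from unsloth.chat_templates import get_chat_template

# Attach the zephyr chat template and use the Professor Synapse prompt as the default system message.
tokenizer = get_chat_template(
    tokenizer,
    chat_template="zephyr",
    map_eos_token=True,
    system_message=system_prompt
)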

In [None]:
tags = [
    "reasoning", "mem", "goal", "subgoal", "prog", "done",
    "now", "ctx", "kg", "tri", "subj", "pred", "obj",
    "logic", "prop", "sym", "nat", "crit", "doubt", "proof",
    "step", "chain", "reflect", "err", "note", "warn", "verb", "cont"
]
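# Register every opening and closing schema tag as a single special token.
additional_tokens = [f"<{tag}>" for tag in tags] + [f"</{tag}>" for tag in tags]
tokenizer.add_special_tokens({"additional_special_tokens": additional_tokens})
# Grow the embedding matrix to cover the new token ids; the new rows start untrained.
model.resize_token_embeddings(len(tokenizer))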
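
The tag list above doubles as a schema vocabulary, so it can also be used to sanity-check a generated `<reasoning>` block. The following is a minimal sketch, not part of the original pipeline: `unknown_tags` and the sample text are hypothetical, and it assumes the block is well-formed XML (a raw `<` or `&` inside the text would need escaping first). Note that an opening tag with attributes, such as `<step index="1">`, is a different string from the literal `<step>` token registered above.

```python
import xml.etree.ElementTree as ET

# Same tag names as the `tags` list in the cell above.
ALLOWED_TAGS = {
    "reasoning", "mem", "goal", "subgoal", "prog", "done",
    "now", "ctx", "kg", "tri", "subj", "pred", "obj",
    "logic", "prop", "sym", "nat", "crit", "doubt", "proof",
    "step", "chain", "reflect", "err", "note", "warn", "verb", "cont",
}

def unknown_tags(reasoning_xml, allowed=ALLOWED_TAGS):
    """Return the sorted element names in reasoning_xml that are not in allowed.

    Raises xml.etree.ElementTree.ParseError if the block is not well-formed.
    """
    root = ET.fromstring(reasoning_xml)
    return sorted({element.tag for element in root.iter()} - set(allowed))

# Hypothetical model output containing one tag that the schema does not define.
sample = """<reasoning>
  <mem><goal>Learn KTO</goal><ctx>First turn</ctx></mem>
  <chain><step index="1">Define terms</step><plan>not in schema</plan></chain>
</reasoning>"""

print(unknown_tags(sample))
```

For the sample above the only unknown element is `plan`; an empty list means every element name belongs to the schema.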

In [None]:
def detect_format(sample):
    if "conversations" in sample and "quality_metrics" in sample:
        return "professor_synapse"
    return "custom"

def parse_professor_synapse(sample):
    metrics = sample["quality_metrics"]
    score = (1.0 * metrics.get("conversation_quality", 0)
            - 0.5 * (1 if metrics.get("bias_detection", False) else 0)
            + 0.75 * metrics.get("reasoning_quality", 0))
    return {
        "prompt": sample["conversations"][0]["value"],
        "completion": sample["conversations"][1]["value"],
        "quality_score": score,
        # KTOTrainer expects a boolean label: True = desirable, False = undesirable.
        "label": score >= 1.25
    }

FORMAT_PARSERS = {"professor_synapse": parse_professor_synapse}

def universal_converter(dataset, source_format="auto"):
    # `source_format` avoids shadowing the built-in `format`.
    first_sample = dataset[0]
    detected_format = detect_format(first_sample) if source_format == "auto" else source_format
    # Unknown formats pass through unchanged.
    parser = FORMAT_PARSERS.get(detected_format, lambda x: x)
    converted = dataset.map(parser)
    if detected_format == "professor_synapse":
        converted = converted.remove_columns(["conversations", "quality_metrics"])
    return converted
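
To make the scoring rule in `parse_professor_synapse` concrete, here is a small worked example. It is only a sketch: the two metric dictionaries are invented for illustration, and it assumes the quality metrics lie in the range 0 to 1, which the dataset card would need to confirm.

```python
def quality_score(metrics):
    """Same weighted sum as parse_professor_synapse: quality, bias penalty, reasoning."""
    return (1.0 * metrics.get("conversation_quality", 0)
            - 0.5 * (1 if metrics.get("bias_detection", False) else 0)
            + 0.75 * metrics.get("reasoning_quality", 0))

# Hypothetical metric values, chosen only to show the arithmetic.
clean = {"conversation_quality": 1.0, "reasoning_quality": 0.8, "bias_detection": False}
biased = {"conversation_quality": 0.9, "reasoning_quality": 0.6, "bias_detection": True}

for name, metrics in [("clean", clean), ("biased", biased)]:
    score = quality_score(metrics)
    verdict = "desirable" if score >= 1.25 else "undesirable"
    print(name, round(score, 2), verdict)
```

The first sample scores 1.0 + 0.75 × 0.8 = 1.6 and clears the 1.25 threshold; the second scores 0.9 − 0.5 + 0.75 × 0.6 = 0.85 and falls below it. Under the 0-to-1 assumption the maximum possible score is 1.75, so the threshold sits fairly high.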

In [None]:
dataset = load_dataset("SynapticLabs/professor-synapse", split="train")
converted_dataset = universal_converter(dataset)
print("Dataset loaded and converted. Sample:", converted_dataset[0])
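
KTO trains on unpaired desirable and undesirable examples, so it is worth checking how balanced the `label` column is before training. TRL's `KTOConfig` exposes `desirable_weight` and `undesirable_weight` for rebalancing the loss. The helper below is a hypothetical sketch that suggests weights giving both classes equal total weight; the label list is made up, and the right ratio for this dataset is a judgment call.

```python
from collections import Counter

def kto_weight_hint(labels):
    """Suggest loss weights so that weight * count is equal for both classes."""
    counts = Counter(bool(label) for label in labels)
    n_desirable, n_undesirable = counts[True], counts[False]
    if n_desirable == 0 or n_undesirable == 0:
        raise ValueError("this sketch expects both desirable and undesirable examples")
    # Leave the larger class at 1.0 and scale the smaller class up to match it.
    if n_desirable >= n_undesirable:
        return {"desirable_weight": 1.0,
                "undesirable_weight": n_desirable / n_undesirable}
    return {"desirable_weight": n_undesirable / n_desirable,
            "undesirable_weight": 1.0}

# Hypothetical label column: 300 desirable and 100 undesirable examples.
labels = [True] * 300 + [False] * 100
print(kto_weight_hint(labels))
```

With the real data, the input would be `converted_dataset["label"]`, and the returned values could be passed to `KTOConfig` as keyword arguments.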

In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r=16,           # LoRA rank
    lora_alpha=32,  # LoRA scaling factor
    # Attention and MLP projections of a dense decoder. No block_sparse_moe.*
    # expert layers are targeted because this checkpoint is not a
    # mixture-of-experts model.
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj"
    ],
    use_gradient_checkpointing="unsloth",
    random_state=3407,
    lora_dropout=0.05,
    bias="none"
)

trainer = KTOTrainer(
    model=model,
    args=KTOConfig(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=5e-7,
        num_train_epochs=3,
        beta=0.1,
        max_length=max_seq_length,
        optim="paged_adamw_8bit",
        warmup_ratio=0.1,
        max_grad_norm=0.3,
        lr_scheduler_type="cosine",
        neftune_noise_alpha=5,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported()
    ),
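    tokenizer=tokenizer,  # newer TRL releases name this argument processing_class
    train_dataset=converted_dataset
    # No data_collator is passed: KTOTrainer builds its own collator for
    # prompt/completion/label batches, and a language-modeling collator would not fit them.
)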

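# The KV cache is not used with gradient checkpointing, so keep it off during training.
model.config.use_cache = False
# This only stores the list on the config as metadata; transformers does not act on it.
# To suppress these words at inference, pass their token ids to generate() via bad_words_ids.
trainer.model.config.banned_words = ["delve", "tapestry", "embark", "revolutionize", "Ah"]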
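
The batch and schedule settings in `KTOConfig` interact, and a quick calculation shows what they imply. This is a rough sketch: `schedule_summary` is a hypothetical helper, the dataset size of 1000 is invented, and the Trainer's own rounding may differ by a step.

```python
import math

def schedule_summary(num_examples, per_device_batch=1, grad_accum=8,
                     epochs=3, warmup_ratio=0.1, num_devices=1):
    """Approximate optimizer-step counts implied by the training arguments."""
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }

# Hypothetical dataset size; defaults mirror the KTOConfig values above.
print(schedule_summary(1000))
```

With 1000 examples, each optimizer step sees 8 examples, giving 125 steps per epoch, 375 steps in total, and 38 warmup steps before the cosine decay begins.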

In [None]:
trainer.train()

In [None]:
# save_pretrained_gguf merges the LoRA adapters into the base weights before
# converting, so no separate merge_and_unload() call is needed.
model.save_pretrained_gguf(
    "prof-synapse",
    tokenizer,
    quantization_method="q4_k_m"
)
print("Model saved successfully in GGUF format.")