# 🤖 Mesh Master Bot v2.0 - Training Notebook

**Optimized fine-tuning of Llama-3.2-1B-Instruct for Meshtastic mesh networks**

## Key Improvements from v1.0:
- ✅ Using **Llama-3.2-1B** (not gemma3:270m!)
- ✅ 10x **lower learning rate** (0.00002 vs 0.0002)
- ✅ **Only 1 epoch** to prevent overfitting
- ✅ Added **regularization** (dropout, weight decay)
- ✅ **Early stopping** to prevent memorization
- ✅ **Repeat penalty** in final model

## Requirements:
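- Colab GPU: T4, V100, or A100. The Step 3 config enables FlashAttention-2 and TF32, which need an Ampere-class GPU (A100); on a T4 or V100, set `flash_attention: false`, `tf32: false`, and `fp16: true` before training.
- Hugging Face account with approved access to the gated `meta-llama/Llama-3.2-1B-Instruct` repo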
- Training data in ShareGPT format

**Estimated time:** 2-3 hours on a T4, 45-60 min on an A100

## 📋 Step 1: Setup Environment

In [None]:
# Check GPU
!nvidia-smi --query-gpu=name,memory.total --format=csv
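FlashAttention-2, TF32, and native bf16 all require an Ampere-or-newer GPU, i.e. CUDA compute capability 8.0 or higher (A100 is 8.0, T4 is 7.5, V100 is 7.0). The sketch below assumes PyTorch is already installed (it is preinstalled on Colab). It reads the capability and suggests matching precision flags for the training config. `recommended_flags` is a hypothetical helper and is not part of axolotl.

```python
def recommended_flags(major: int, minor: int) -> dict:
    """Suggest axolotl precision flags from a CUDA compute capability (hypothetical helper)."""
    ampere_or_newer = major >= 8  # A100 is 8.0; T4 is 7.5; V100 is 7.0
    return {
        "bf16": ampere_or_newer,
        "fp16": not ampere_or_newer,
        "tf32": ampere_or_newer,
        "flash_attention": ampere_or_newer,
    }


try:
    import torch

    if torch.cuda.is_available():
        cap = torch.cuda.get_device_capability()  # (major, minor) tuple
        print(torch.cuda.get_device_name(), cap)
        print(recommended_flags(*cap))
    else:
        print("No CUDA GPU visible; switch the Colab runtime to a GPU.")
except ImportError:
    print("PyTorch is not installed yet.")
```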

In [None]:
# Install dependencies (takes ~5 minutes)
!pip install -q -U \
  git+https://github.com/axolotl-ai-cloud/axolotl.git@main \
  transformers==4.46.2 \
  datasets==3.0.2 \
  accelerate==1.1.1 \
  peft==0.13.2 \
  bitsandbytes==0.44.1 \
  flash-attn==2.7.0.post2 \
  wandb \
  huggingface_hub

In [None]:
# Log in to Hugging Face (the token's account needs approved access to the gated Llama-3.2-1B-Instruct repo)
from huggingface_hub import login
import getpass

hf_token = getpass.getpass("Enter your HuggingFace token: ")
login(token=hf_token)

print("✅ Logged in to HuggingFace!")

In [None]:
# Optional: log in to Weights & Biases for experiment tracking.
# If you skip this cell, delete the wandb_project and wandb_name lines from the Step 3 config.
import wandb

wandb.login()
print("✅ Logged in to W&B!")

## 📊 Step 2: Upload Training Data

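Upload your `mesh_conversations.jsonl` file to `/content/` using the file browser on the left. The verification cell and the training config both expect it at `/content/mesh_conversations.jsonl`.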

**Format:** ShareGPT style with `conversations` key:
```json
{"conversations": [{"from": "human", "value": "How do I relay to alice?"}, {"from": "gpt", "value": "alice <message>"}]}
```

In [None]:
# Verify training data
import json

data_path = "/content/mesh_conversations.jsonl"

# Count examples
with open(data_path, 'r') as f:
    lines = f.readlines()
    print(f"📊 Found {len(lines)} training examples")

# Show first example
example = json.loads(lines[0])
print("\n📝 Sample conversation:")
for turn in example['conversations']:
    role = "User" if turn['from'] == 'human' else "Assistant"
    print(f"{role}: {turn['value']}")
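The cell above prints only the first record. Before you spend GPU hours, it is worth scanning every line. The sketch below assumes the ShareGPT layout shown in Step 2, with `from` set to `human`, `gpt`, or optionally `system`. It flags malformed lines and counts assistant replies that exceed the 160-character budget named in the Modelfile's system prompt. `validate_jsonl` and its rules (every turn non-empty, at least one `human` and one `gpt` turn) are assumptions for illustration, not axolotl requirements.

```python
import json
import os
import statistics

ALLOWED_ROLES = {"system", "human", "gpt"}
MAX_REPLY_CHARS = 160  # LoRa-friendly budget from the Modelfile system prompt


def validate_jsonl(path: str, max_reply_chars: int = MAX_REPLY_CHARS) -> dict:
    """Scan a ShareGPT-style JSONL file and report problems (hypothetical helper)."""
    report = {"total": 0, "bad_lines": [], "long_replies": 0, "reply_lengths": []}
    with open(path, "r", encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            report["total"] += 1
            try:
                turns = json.loads(line)["conversations"]
                ok = isinstance(turns, list) and all(
                    isinstance(t, dict)
                    and t.get("from") in ALLOWED_ROLES
                    and isinstance(t.get("value"), str)
                    and bool(t["value"].strip())
                    for t in turns
                )
                roles = {t["from"] for t in turns} if ok else set()
                ok = ok and {"human", "gpt"} <= roles  # need both sides of a dialogue
            except (json.JSONDecodeError, KeyError, TypeError):
                ok = False
            if not ok:
                report["bad_lines"].append(lineno)
                continue
            for t in turns:
                if t["from"] == "gpt":
                    n = len(t["value"])
                    report["reply_lengths"].append(n)
                    if n > max_reply_chars:
                        report["long_replies"] += 1
    return report


if os.path.exists("/content/mesh_conversations.jsonl"):
    r = validate_jsonl("/content/mesh_conversations.jsonl")
    print(f"examples: {r['total']}, malformed: {len(r['bad_lines'])} {r['bad_lines'][:10]}")
    print(f"replies over {MAX_REPLY_CHARS} chars: {r['long_replies']}")
    if r["reply_lengths"]:
        print(f"median reply length: {statistics.median(r['reply_lengths'])} chars")
```

If many replies exceed the budget, the model will learn to exceed it too, regardless of the system prompt.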

## ⚙️ Step 3: Create Training Config

In [None]:
%%writefile /content/mesh-master-1b-v2.yaml
# Mesh Master Bot v2.0 - Optimized for Anti-Overfitting
base_model: meta-llama/Llama-3.2-1B-Instruct
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code: true

# QLoRA Settings
load_in_4bit: true
adapter: qlora
lora_r: 8
lora_alpha: 16
lora_dropout: 0.15
lora_target_linear: true
lora_target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj

# Sequence
sequence_len: 2048
sample_packing: true
pad_to_sequence_len: true

# Batch Size
micro_batch_size: 2
gradient_accumulation_steps: 16
eval_batch_size: 2
num_epochs: 1

# Learning Rate - CRITICAL CHANGES!
optimizer: adamw_8bit
lr_scheduler: cosine
learning_rate: 0.00002      # 10x lower than v1.0's 0.0002
warmup_ratio: 0.05          # 5% of total steps; don't also set warmup_steps (the two are mutually exclusive)
weight_decay: 0.01          # Added regularization

# Precision
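bf16: auto                  # bf16 only where supported (A100); T4/V100 lack it
fp16: false                 # set to true on T4/V100
tf32: true                  # Ampere-only (A100); set to false on T4/V100
flash_attention: true       # FlashAttention-2 needs Ampere (A100); set to false on T4/V100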
gradient_checkpointing: true

# Dataset
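datasets:
  - path: /content/mesh_conversations.jsonl
    type: chat_template            # current axolotl replaces the deprecated `sharegpt` type with this
    field_messages: conversations  # key that holds the list of turns
    message_property_mappings:     # map ShareGPT keys onto role/content
      role: from
      content: value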

val_set_size: 0.15

# Evaluation & Checkpointing
eval_strategy: steps
eval_steps: 50
save_steps: 100
logging_steps: 5
output_dir: /content/outputs
save_strategy: steps
save_total_limit: 3

# Early Stopping
early_stopping_patience: 3
load_best_model_at_end: true
metric_for_best_model: eval_loss
greater_is_better: false

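# Tokens: Llama 3 has no unk or pad token, so reuse an existing token for padding
special_tokens:
  bos_token: "<|begin_of_text|>"
  eos_token: "<|eot_id|>"        # end-of-turn token used by the llama3 chat template
  pad_token: "<|end_of_text|>"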

chat_template: llama3
seed: 42

# Logging
wandb_project: mesh-master-bot-v2
wandb_name: mesh-1b-colab
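
To see how small the `lora_r: 8` adapter is, you can work out its parameter count by hand. Each adapted linear layer with `in` inputs and `out` outputs gains r·(in + out) trainable weights, from its A (r × in) and B (out × r) matrices. The sketch below assumes Llama-3.2-1B's published `config.json` dimensions: hidden size 2048, MLP size 8192, 16 layers, and 32 query heads plus 8 key/value heads of size 64. Check these against the model card before relying on the numbers.

```python
# Assumed Llama-3.2-1B dimensions (from its config.json; verify before relying on them)
HIDDEN = 2048
INTERMEDIATE = 8192
LAYERS = 16
N_HEADS = 32
N_KV_HEADS = 8
HEAD_DIM = HIDDEN // N_HEADS     # 64
KV_DIM = N_KV_HEADS * HEAD_DIM   # 512: grouped-query attention shrinks k/v projections

# (in_features, out_features) for each module in lora_target_modules
MODULES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(r: int) -> int:
    """Trainable LoRA weights across all layers: r * (in + out) per adapted module."""
    per_layer = sum(r * (fan_in + fan_out) for fan_in, fan_out in MODULES.values())
    return per_layer * LAYERS


r, alpha = 8, 16
print(f"trainable LoRA params at r={r}: {lora_params(r):,}")
print(f"LoRA scaling alpha/r = {alpha / r}")
```

Under these assumptions, r=8 trains about 5.6M weights, well under 1% of the roughly 1.2B base parameters. That small capacity is part of what limits memorization.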

## 🚀 Step 4: Start Training

**Expect roughly 2-3 hours on a T4 and 45-60 min on an A100.**
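
With `num_epochs: 1`, the run can have few optimizer steps. That affects warmup, `eval_steps: 50`, `save_steps: 100`, and early stopping: a fixed `warmup_steps: 50`, for example, could cover the whole run on a small dataset. The sketch below gives a rough estimate. It ignores sample packing, which packs several short conversations into each 2048-token sequence and so reduces the step count further. `N_EXAMPLES` is a placeholder; replace it with the count printed in Step 2.

```python
import math

N_EXAMPLES = 2000          # placeholder: use the count printed in Step 2
VAL_SET_SIZE = 0.15
MICRO_BATCH_SIZE = 2
GRAD_ACCUM = 16
NUM_EPOCHS = 1
WARMUP_RATIO = 0.05
EVAL_STEPS = 50
EARLY_STOPPING_PATIENCE = 3


def estimate_schedule(n_examples: int) -> dict:
    """Rough step budget for the Step 3 config (ignores sample packing)."""
    train_examples = n_examples - math.ceil(n_examples * VAL_SET_SIZE)
    effective_batch = MICRO_BATCH_SIZE * GRAD_ACCUM  # 32 sequences per optimizer step
    steps = math.ceil(train_examples / effective_batch) * NUM_EPOCHS
    eval_points = steps // EVAL_STEPS
    return {
        "train_examples": train_examples,
        "optimizer_steps": steps,
        "warmup_steps": math.ceil(steps * WARMUP_RATIO),
        "eval_points": eval_points,
        # early stopping needs more than `patience` evaluations before it can fire
        "early_stopping_possible": eval_points > EARLY_STOPPING_PATIENCE,
    }


print(estimate_schedule(N_EXAMPLES))
```

With a few thousand examples, evaluation runs only once or twice, so early stopping effectively never triggers. Lowering `eval_steps` (for example, to 10) gives it more evaluations to work with.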

In [None]:
# Start training
!accelerate launch -m axolotl.cli.train /content/mesh-master-1b-v2.yaml
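
Early stopping and `load_best_model_at_end` help only if eval loss is actually tracked. The Hugging Face Trainer writes a `trainer_state.json` into each `checkpoint-*` folder under `output_dir`. Its `log_history` list holds the logged `loss` and `eval_loss` values. The sketch below reads the newest checkpoint's state and flags the classic overfitting pattern, in which eval loss ends up above its best value. The `overfit_warning` heuristic and its tolerance are assumptions, not an axolotl feature.

```python
import glob
import json
import os


def latest_trainer_state(output_dir: str = "/content/outputs"):
    """Return the parsed trainer_state.json from the highest-numbered checkpoint, or None."""
    paths = glob.glob(os.path.join(output_dir, "checkpoint-*", "trainer_state.json"))
    if not paths:
        return None

    def step_of(p: str) -> int:
        return int(os.path.basename(os.path.dirname(p)).split("-")[-1])

    with open(max(paths, key=step_of), "r") as f:
        return json.load(f)


def summarize(log_history: list, tolerance: float = 0.02) -> dict:
    """Compare the best and final eval loss; a final value well above the best hints at overfitting."""
    train = [(e["step"], e["loss"]) for e in log_history if "loss" in e]
    evals = [(e["step"], e["eval_loss"]) for e in log_history if "eval_loss" in e]
    summary = {"train_points": len(train), "eval_points": len(evals)}
    if evals:
        best_step, best = min(evals, key=lambda x: x[1])
        _, last = evals[-1]
        summary.update(
            best_eval_loss=best,
            best_step=best_step,
            last_eval_loss=last,
            overfit_warning=last > best + tolerance,
        )
    return summary


state = latest_trainer_state()
if state is None:
    print("No checkpoints found yet.")
else:
    print(summarize(state["log_history"]))
```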

## 💾 Step 5: Export Model for Ollama

In [None]:
# Merge LoRA adapters into the base model (axolotl writes the result to /content/outputs/merged)
!python -m axolotl.cli.merge_lora \
  /content/mesh-master-1b-v2.yaml \
  --lora_model_dir=/content/outputs \
  --load_in_4bit=False \
  --load_in_8bit=False

In [None]:
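# Convert to GGUF for Ollama
# The llama-cpp-python package does not ship the HF-to-GGUF converter, so fetch it from the llama.cpp repo
!git clone --depth 1 https://github.com/ggml-org/llama.cpp /content/llama.cpp
!pip install -q gguf sentencepiece

!python /content/llama.cpp/convert_hf_to_gguf.py \
  /content/outputs/merged \
  --outfile /content/mesh-master-bot-v2.gguf \
  --outtype q8_0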
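
It is worth a quick check that the converter produced a real GGUF file. Per the GGUF format, a file starts with the 4-byte magic `GGUF`, followed by a little-endian uint32 format version, a uint64 tensor count, and a uint64 metadata key/value count. The sketch below reads just that fixed header with the standard library. `read_gguf_header` is a hypothetical helper name.

```python
import os
import struct


def read_gguf_header(path: str) -> dict:
    """Parse the fixed-size GGUF header: magic, version, tensor count, metadata KV count."""
    with open(path, "rb") as f:
        raw = f.read(24)  # 4 (magic) + 4 (version) + 8 (tensors) + 8 (kv pairs)
    if len(raw) < 24 or raw[:4] != b"GGUF":
        raise ValueError(f"{path} is not a GGUF file")
    version, n_tensors, n_kv = struct.unpack("<IQQ", raw[4:24])
    return {"version": version, "tensors": n_tensors, "metadata_kv": n_kv}


gguf_path = "/content/mesh-master-bot-v2.gguf"
if os.path.exists(gguf_path):
    print(read_gguf_header(gguf_path))
    print(f"size: {os.path.getsize(gguf_path) / 1e9:.2f} GB")
```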

In [None]:
%%writefile /content/Modelfile
# Mesh Master Modelfile. FROM uses a relative path, so keep the .gguf next to this file when running `ollama create`.
FROM ./mesh-master-bot-v2.gguf

# Anti-repetition and length controls
PARAMETER temperature 0.5
PARAMETER repeat_penalty 1.15
PARAMETER num_predict 200
PARAMETER top_p 0.85
PARAMETER top_k 30

# System prompt optimized for mesh networks
SYSTEM You are Mesh Master, an AI assistant for Meshtastic mesh networks. Keep responses concise (ideally under 160 characters) for LoRa bandwidth efficiency. Be accurate, helpful, and direct. Never repeat yourself.

In [None]:
# Download files to your computer
from google.colab import files

print("📥 Downloading model files...")
files.download('/content/mesh-master-bot-v2.gguf')
files.download('/content/Modelfile')
print("✅ Done! Put both files in the same folder, then run: ollama create mesh-master-bot-v2 -f Modelfile")
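
Once `ollama create` has run on your own machine, you can also exercise the model through Ollama's local REST API, which listens on `http://localhost:11434` by default. The sketch below posts to `/api/generate` with `"stream": false` and reports whether each reply fits the 160-character budget from the system prompt. It uses only the Python standard library and assumes the model was created under the name `mesh-master-bot-v2`. It will not reach a server inside Colab, where Ollama is not running.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "mesh-master-bot-v2"                          # name used with `ollama create`
BUDGET = 160                                          # LoRa-friendly reply length


def build_request(prompt: str, model: str = MODEL) -> urllib.request.Request:
    """Build a non-streaming /api/generate request; sampling comes from the Modelfile."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    with urllib.request.urlopen(build_request(prompt), timeout=120) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    try:
        for q in ["How do I relay to alice?", "What's a meshtastic router?"]:
            reply = ask(q)
            status = "OK" if len(reply) <= BUDGET else "TOO LONG"
            print(f"[{status}, {len(reply)} chars] {q} -> {reply}")
    except OSError as err:  # URLError and timeouts are OSError subclasses
        print(f"Could not reach Ollama at {OLLAMA_URL}: {err}")
```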

## 📊 Step 6: Test the Model (Optional)

In [None]:
# Quick inference test
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

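model_path = "/content/outputs/merged"
# bf16 needs an Ampere-class GPU (compute capability 8.x+); fall back to fp16 on T4/V100
dtype = torch.bfloat16 if torch.cuda.get_device_capability()[0] >= 8 else torch.float16
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=dtype,
    device_map="auto"
)

def test_prompt(question):
    messages = [{"role": "user", "content": question}]
    input_ids = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt"
    ).to(model.device)

    # Temperature, top-p, top-k, and repetition penalty mirror the Ollama Modelfile
    outputs = model.generate(
        input_ids,
        attention_mask=torch.ones_like(input_ids),  # single, unpadded prompt
        max_new_tokens=150,
        do_sample=True,
        temperature=0.5,
        top_p=0.85,
        top_k=30,
        repetition_penalty=1.15,
        pad_token_id=tokenizer.eos_token_id,  # avoids the missing-pad-token warning
    )

    response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
    print(f"Q: {question}")
    print(f"A: {response}")
    print(f"Length: {len(response)} chars\n")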

# Test questions
test_prompt("How do I relay to alice?")
test_prompt("What's a meshtastic router?")
test_prompt("Hey are you there?")
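
Eyeballing three prompts is a weak check for the two failure modes this version targets: over-long replies and repetition. The sketch below is a slightly more systematic check. It flags replies over the 160-character budget and replies that repeat any word n-gram (3 words by default). The n-gram heuristic, the helper names, and the sample strings are assumptions for illustration. In practice you would feed it the `response` strings produced by `test_prompt`, which currently only prints them.

```python
import re
from collections import Counter


def repeated_ngrams(text: str, n: int = 3) -> list:
    """Return word n-grams that occur more than once (case-insensitive)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    grams = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return [" ".join(g) for g, count in grams.items() if count > 1]


def score_replies(replies: list, budget: int = 160, n: int = 3) -> dict:
    """Percentage of replies that exceed the budget or contain a repeated n-gram."""
    too_long = [r for r in replies if len(r) > budget]
    repetitive = [r for r in replies if repeated_ngrams(r, n)]
    total = max(len(replies), 1)  # avoid division by zero on an empty list
    return {
        "replies": len(replies),
        "over_budget_pct": 100 * len(too_long) / total,
        "repetitive_pct": 100 * len(repetitive) / total,
    }


# Hypothetical sample replies; replace with real model outputs
samples = [
    "alice <message>",
    "A router relays packets for other nodes. A router relays packets for other nodes.",
]
print(score_replies(samples))
```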