# Training Script

## Setup models

- **Teacher model**
    - Should be the model you intend to use, or a stronger one.
    - This model can bootstrap examples and instructions to "train" the weaker model.
- **Student model**
    - The model you intend to use in the program.


Note: With the `cache=True` setting, you can restart the program and it will resume where it left off.

In [1]:
import os
from dotenv import load_dotenv
import dspy
import logging

logging.getLogger().setLevel(logging.INFO)
logging.getLogger("httpx").setLevel(logging.WARNING)
logging.getLogger("LiteLLM").setLevel(logging.WARNING)

# set your api key (if needed)
load_dotenv("../../.env")
APIKEY = os.getenv("APIKEY")

# set your model (litellm model strings)
student_model = "openrouter/meta-llama/llama-3.2-3b-instruct"
# student_model = "openrouter/meta-llama/llama-3-8b-instruct"
teacher_model = "openrouter/deepseek/deepseek-chat"
lm = dspy.LM(student_model, api_key=APIKEY, cache=True)
teacher_lm = dspy.LM(teacher_model, api_key=APIKEY, cache=True)
dspy.configure(lm=lm)

## Load the Training data

In [3]:
import glob
import json
from datetime import datetime

data = []
for file in glob.glob("training_data/accepted/*.json"):
    with open(file, "r") as fh:
        tmp = json.load(fh)

        # convert to date (a missing or empty value becomes None)
        raw_date = tmp.get("publication_date")
        tmp["publication_date"] = datetime.strptime(raw_date, "%Y-%m-%d").date() if raw_date else None

        # remove reasoning from example
        if "reasoning" in tmp:
            del tmp["reasoning"]

        e = dspy.Example(tmp).with_inputs("article_text")
        data.append(e)

print(f"# Examples: {len(data)}")

# Examples: 100


## Start Training

**WordLlamaScorer** handles some automatic scoring using `wordllama`.
I wrote some basic functions to automatically evaluate the most common types:
- `str` using Jaccard similarity for short strings and wordllama similarity for longer strings
- distance functions for `int`, `float` and `date` types
- exact matching for `Literal`

Of course, it's always better to write a metric for your own specific use case, but this may be sufficient for many programs.
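The per-type scoring listed above can be sketched in plain Python. This is not the `WordLlamaScorer` implementation: every function name, the 30-day tolerance, and the `title` and `category` fields are hypothetical choices for illustration. The embedding-based similarity for longer strings is left out so the sketch stays dependency-free.

```python
from datetime import date


def jaccard_similarity(a: str, b: str) -> float:
    """Token-set overlap on lowercased whitespace tokens: |A & B| / |A | B|."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # two empty strings are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def numeric_closeness(expected: float, predicted: float) -> float:
    """Map relative error to (0, 1]; 1.0 means an exact match."""
    scale = max(abs(expected), 1.0)  # avoid dividing by values near zero
    return 1.0 / (1.0 + abs(expected - predicted) / scale)


def date_closeness(expected: date, predicted: date, tolerance_days: int = 30) -> float:
    """Linear decay from 1.0 (same day) to 0.0 (tolerance_days or more apart)."""
    gap = abs((expected - predicted).days)
    return max(0.0, 1.0 - gap / tolerance_days)


def exact_match(expected, predicted) -> float:
    """All-or-nothing score, suitable for Literal-style fields."""
    return 1.0 if expected == predicted else 0.0


def field_score(expected, predicted, exact: bool = False) -> float:
    """Pick a scoring function from the types of the two values."""
    if exact or expected is None or predicted is None:
        return exact_match(expected, predicted)
    if isinstance(expected, date) and isinstance(predicted, date):
        return date_closeness(expected, predicted)
    if isinstance(expected, (int, float)) and isinstance(predicted, (int, float)):
        return numeric_closeness(expected, predicted)
    if isinstance(expected, str) and isinstance(predicted, str):
        return jaccard_similarity(expected, predicted)
    return exact_match(expected, predicted)


def average_score(expected: dict, predicted: dict, exact_fields=()) -> float:
    """Mean field score over the expected keys; a missing prediction scores 0."""
    if not expected:
        return 0.0
    scores = [
        field_score(value, predicted.get(key), exact=key in exact_fields)
        for key, value in expected.items()
    ]
    return sum(scores) / len(scores)


# Hypothetical gold and predicted records.
gold = {"title": "the quick fox", "category": "politics", "publication_date": date(2024, 1, 1)}
guess = {"title": "the slow fox", "category": "politics", "publication_date": date(2024, 1, 16)}
overall = average_score(gold, guess, exact_fields={"category"})
```

A plain type check cannot tell a `Literal` field from a free-text `str`, which is why the sketch takes an explicit `exact_fields` argument. To use something like this as a DSPy metric, it would need to be wrapped in a callable that accepts `(example, pred, trace=None)`.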

MIPROv2 has worked well for me. I start with the `medium` setting, which already runs for quite a while. For production, I would generally use the `heavy` setting with sufficient training data and let it run overnight.

In [None]:
import sys
sys.path.append("../scorer")
from scorer import WordLlamaScorer
from dspy.evaluate import Evaluate
from dspy.teleprompt import MIPROv2

from news_app import NewsAppSignature

# training
scorer = WordLlamaScorer.from_signature(NewsAppSignature, skip_fields=["article_text", "reasoning"])


teleprompter = MIPROv2(
    metric=scorer,
    auto="medium",
    teacher_settings=dict(lm=teacher_lm),
    num_threads=2
)

catalog = dspy.ChainOfThought(NewsAppSignature)
optimized_program = teleprompter.compile(
    student=catalog.deepcopy(),
    teacher=catalog.deepcopy(),
    trainset=data,
    max_bootstrapped_demos=2,
    max_labeled_demos=2,
    requires_permission_to_run=False,
)

## Save Program

Save the training recipe (the optimized program's state) so you can use it in your application.

In [8]:
from datetime import date

# Convert date objects in the demos to ISO strings before writing JSON
for demo in optimized_program.demos:
    if 'publication_date' in demo and isinstance(demo['publication_date'], date):
        demo['publication_date'] = demo['publication_date'].isoformat()

# Save the state to a JSON file
optimized_program.save("miprov2_llama32_3b.json", save_program=False)
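The date conversion in the cell above is presumably there because JSON has no date type: Python's `json` module raises a `TypeError` for `date` objects. Whether `save` needs this step depends on the installed DSPy version, so treat that as an assumption. The sketch below uses only the standard library and a hypothetical demo record (only `publication_date` mirrors a field from this notebook) to show the failure, the ISO 8601 workaround, and how to recover the date when reading the file back.

```python
import json
from datetime import date

# Hypothetical demo record; `headline` is an invented field name.
demo = {"headline": "Example headline", "publication_date": date(2024, 5, 17)}

try:
    json.dumps(demo)
    serializable = True
except TypeError:
    # date objects are not JSON serializable by default
    serializable = False

# Same conversion as in the cell above: replace the date with an ISO 8601 string.
demo["publication_date"] = demo["publication_date"].isoformat()
encoded = json.dumps(demo)

# When reading the JSON back, the string can be parsed into a date again.
restored = date.fromisoformat(json.loads(encoded)["publication_date"])
```

If your application compares or sorts on `publication_date` in loaded demos, it needs the `date.fromisoformat` step, because the saved file only contains strings.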
