# DSPy RAG Experiment

A short DSPy run using the LiteLLM proxy and the existing RAG dataset.

**Prerequisite:** `dataset` has already been created (e.g. in `01_rag_baseline.ipynb`).
If it has not, load or create it first, then run this notebook.


## Create the dataset (CSV + Qdrant)

Loads questions and answers from the CSV, retrieves contexts from Qdrant, and builds a `dataset`.


In [10]:
import pandas as pd
from pathlib import Path
from datasets import Dataset
from litellm_client import (
    load_llm_config,
    load_vectordb_config,
    get_qdrant_client,
    get_embeddings,
)

def _retrieve_contexts(question: str, k: int, client, collection_name: str, llm_cfg):
    query_emb = get_embeddings([question], llm_cfg, batch_size=1)[0]
    results = client.query_points(
        collection_name=collection_name,
        query=query_emb,
        limit=k,
    ).points
    return [res.payload.get('text', '') for res in results]

def build_eval_dataset(
    csv_path: str = '../GrundschutzKI_Fragen-Antworten-Fundstellen.csv',
    top_k: int = 5,
) -> Dataset:
    llm_cfg = load_llm_config()
    vec_cfg = load_vectordb_config()
    qdrant_client = get_qdrant_client(vec_cfg)
    collection_name = vec_cfg.collection or 'grundschutz_xml'

    df = pd.read_csv(Path(csv_path), sep=';', encoding='utf-8-sig')
    records = []

    for _, row in df.iterrows():
        question = row['Frage']
        ground_truth = row['Antwort']
        contexts = _retrieve_contexts(question, top_k, qdrant_client, collection_name, llm_cfg)

        records.append({
            'question': question,
            'contexts': contexts,
            'ground_truth': ground_truth,
        })

    return Dataset.from_list(records)

dataset = build_eval_dataset(top_k=5)
dataset


Processing embeddings 0 to 1 / 1

Dataset({
    features: ['question', 'contexts', 'ground_truth'],
    num_rows: 42
})
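
The loader above depends on two details of the CSV: a semicolon delimiter and a UTF-8 byte-order mark (hence `utf-8-sig`). As a minimal sketch of what those two options do, the following standard-library version parses the same layout from an in-memory sample. The sample row and the helper name `read_qa_rows` are hypothetical; only the column names `Frage` and `Antwort` come from the notebook.

```python
import csv
import io

def read_qa_rows(raw_bytes: bytes) -> list[dict]:
    """Parse a semicolon-separated Q/A CSV that may start with a UTF-8 BOM."""
    # 'utf-8-sig' strips a leading BOM if present, so the first header
    # is 'Frage' rather than '\ufeffFrage'.
    text = raw_bytes.decode("utf-8-sig")
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    rows = []
    for row in reader:
        question = (row.get("Frage") or "").strip()
        answer = (row.get("Antwort") or "").strip()
        # Skip incomplete rows instead of sending empty questions to the retriever.
        if question and answer:
            rows.append({"question": question, "ground_truth": answer})
    return rows

# Hypothetical sample data, not taken from the real CSV.
sample = "\ufeffFrage;Antwort\nWas ist X?;X ist Y.\n;fehlt\n".encode("utf-8")
rows = read_qa_rows(sample)
```

Filtering out incomplete rows is an addition of this sketch; `build_eval_dataset` as written passes every row to Qdrant.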

In [11]:
import dspy
from litellm_client import load_llm_config

llm_cfg = load_llm_config()

# LiteLLM proxy (OpenAI-compatible)
dspy_llm = dspy.LM(
    model=llm_cfg.model,
    api_base=llm_cfg.api_base,
    api_key=llm_cfg.api_key,
    temperature=0.2,
)

dspy.configure(lm=dspy_llm)


In [12]:
class RAGAnswer(dspy.Signature):
    """Answer using only the provided context."""
    question: str = dspy.InputField()
    context: str = dspy.InputField()
    response: str = dspy.OutputField(desc="Antwort auf Deutsch, kurz und präzise, maximal 2–3 Sätze.")

class RAGModule(dspy.Module):
    def __init__(self):
        super().__init__()
        self.predict = dspy.Predict(
            RAGAnswer,
            instructions="Antworte auf Deutsch, kurz und präzise, max. 2–3 Sätze. Nutze nur den Kontext.",
        )

    def forward(self, question, context):
        return self.predict(question=question, context=context)

rag = RAGModule()
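
`RAGAnswer` takes the context as a single string, and the later cells build it by joining the retrieved chunks with blank lines. A minimal sketch of a helper that does the same join while skipping empty chunks and respecting a character budget might look like this. The name `join_contexts` and the default of 8000 characters are assumptions for illustration, not values from the notebook.

```python
def join_contexts(contexts: list[str], max_chars: int = 8000, sep: str = "\n\n") -> str:
    """Join non-empty chunks in retrieval order until the character budget is used up."""
    kept = []
    used = 0
    for chunk in contexts:
        chunk = chunk.strip()
        if not chunk:
            continue  # a missing 'text' payload yields '' in _retrieve_contexts
        extra = len(chunk) + (len(sep) if kept else 0)
        if used + extra > max_chars:
            break  # keep the highest-ranked chunks, drop the rest
        kept.append(chunk)
        used += extra
    return sep.join(kept)

context = join_contexts(["first chunk", "", "second chunk"], max_chars=30)
```

Because Qdrant returns points in ranked order, cutting from the end drops the least similar chunks first.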


In [15]:
# Sample outputs
for i in range(3):
    row = dataset[i]
    context = "\n\n".join(row["contexts"])
    pred = rag(question=row["question"], context=context)
    print(f"\n--- SAMPLE {i} ---")
    print("QUESTION:", row["question"])
    print("PREDICTED ANSWER:", pred.response)
    print("GROUND TRUTH:", row["ground_truth"])



--- SAMPLE 0 ---
QUESTION: Was ist der Unterschied zwischen Prozess- und Systembausteinen?
PREDICTED ANSWER: Prozess‑Bausteine beschreiben sicherheitsrelevante Vorgänge, organisatorische und betriebliche Maßnahmen und gelten in der Regel für den gesamten Informationsverbund oder große Teile davon. System‑Bausteine hingegen werden auf konkrete Zielobjekte wie Anwendungen, IT‑Systeme, Geräte oder Gebäude angewendet und behandeln deren spezifische Sicherheitsaspekte.
GROUND TRUTH: Prozess-Bausteine gelten in der Regel für sämtliche oder große Teile des Informationsverbunds gleichermaßen, System-Bausteine lassen sich in der Regel auf einzelne Objekte oder Gruppen von Objekten anwenden. Die Prozess- und System-Bausteine bestehen wiederum aus weiteren Teilschichten. In den Hinweisen zum Schichtenmodell und zur Modellierung wird beschrieben, wann ein einzelner Baustein sinnvollerweise eingesetzt werden soll und auf welche Zielobjekte er anzuwenden ist. 

--- SAMPLE 1 ---
QUESTION: Welche gru

## DSPy Optimizer (MIPROv2)

Optimizes the prompt instructions for the RAG program. This can be costly.
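
To get a rough sense of the cost, the settings logged by the run in this notebook (`num_trials: 10`, `valset size: 6`, `minibatch: False`) can be multiplied out. The sketch below is only a lower bound under the assumption that every trial scores the full validation set once; it leaves out bootstrapping, instruction proposal, and the LLM and embedding calls made inside the metric. The function name is hypothetical.

```python
def estimate_program_calls(num_trials: int, valset_size: int) -> int:
    """Lower bound on RAG program calls spent on full-valset evaluation."""
    return num_trials * valset_size

# Values taken from the 'light' auto-run settings in the log of this run.
calls = estimate_program_calls(num_trials=10, valset_size=6)
print(calls)  # 60
```

Each of those program calls also triggers one metric evaluation, so the number of requests that reach the proxy is higher than this figure.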


In [30]:
import nest_asyncio, asyncio
from ragas.metrics.collections import AnswerCorrectness
from ragas.embeddings.litellm_provider import LiteLLMEmbeddings
from ragas.llms import llm_factory
import instructor, litellm
nest_asyncio.apply()
# RAGAS LLM
litellm.api_base = llm_cfg.api_base
litellm.api_key = llm_cfg.api_key
client = instructor.from_litellm(litellm.acompletion, mode=instructor.Mode.MD_JSON)
ragas_llm = llm_factory(llm_cfg.model, client=client, adapter="litellm", model_args={"temperature": 0.2})

# Embeddings (for similarity)
embeddings = LiteLLMEmbeddings(
    model=llm_cfg.embedding_model,
    api_key=llm_cfg.api_key,
    api_base=llm_cfg.api_base,
    encoding_format="float",
)

ac = AnswerCorrectness(llm=ragas_llm, embeddings=embeddings)

def ragas_ac_metric(example, pred, trace=None):
    loop = asyncio.get_event_loop()
    result = loop.run_until_complete(
        ac.ascore(
            user_input=example.question,
            response=pred.response,
            reference=example.response,
        )
    )
    return result.value
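
The metric above drives an async scorer from synchronous code via `asyncio.get_event_loop()`, and the optimizer later calls it from several threads. One possible alternative, shown here as a sketch only, is a helper that always gives the coroutine its own event loop. `run_coro_sync` and `fake_score` are hypothetical names, and it has not been verified that this removes the asyncio warnings seen in this run.

```python
import asyncio
import threading

def run_coro_sync(coro):
    """Run a coroutine to completion from synchronous code and return its result."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread: asyncio.run creates a fresh
        # loop, runs the coroutine, and closes the loop afterwards.
        return asyncio.run(coro)

    # A loop is already running in this thread (e.g. inside a notebook),
    # so run the coroutine on its own loop in a helper thread.
    outcome = {}

    def _runner():
        try:
            outcome["value"] = asyncio.run(coro)
        except BaseException as exc:  # re-raised in the calling thread below
            outcome["error"] = exc

    worker = threading.Thread(target=_runner)
    worker.start()
    worker.join()
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]

async def fake_score(value: float) -> float:
    """Stand-in for an async scorer; not the RAGAS API."""
    await asyncio.sleep(0)
    return value

score = run_coro_sync(fake_score(0.5))
```

In the metric, the `loop.run_until_complete(...)` call could then be replaced by `run_coro_sync(ac.ascore(...))`, with the rest of the function unchanged.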




In [31]:
import dspy
from dspy.evaluate import SemanticF1

# DSPy examples from the existing dataset
examples = []
for row in dataset:
    context = "\n\n".join(row["contexts"])
    examples.append(
        dspy.Example(question=row["question"], context=context, response=row["ground_truth"])
            .with_inputs("question", "context")
    )


# simple splits
trainset = examples[: max(1, len(examples)//5)]
devset = examples[max(1, len(examples)//5):]

# metric = SemanticF1(decompositional=True)

# Optimizer (few threads to start with)
tp = dspy.MIPROv2(metric=ragas_ac_metric, auto='light', num_threads=4)
optimized_rag = tp.compile(rag, trainset=trainset)
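
The split above takes the first fifth of the examples in CSV order as the training set. If the CSV happens to be grouped by topic, a seeded shuffle before splitting may give a more representative sample. The following is a minimal sketch of that idea; `split_examples` and its defaults are assumptions, not part of the notebook.

```python
import random

def split_examples(examples: list, train_fraction: float = 0.2, seed: int = 0):
    """Shuffle a copy of the examples with a fixed seed and split into train/dev."""
    shuffled = list(examples)  # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)
    n_train = max(1, int(len(shuffled) * train_fraction))
    return shuffled[:n_train], shuffled[n_train:]

# 42 placeholder items, matching the number of rows in the dataset above.
train, dev = split_examples(list(range(42)))
```

With 42 items and a fraction of 0.2 this yields 8 training and 34 dev examples, the same sizes as the slice-based split.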


2026/01/29 12:06:46 INFO dspy.teleprompt.mipro_optimizer_v2: 
RUNNING WITH THE FOLLOWING LIGHT AUTO RUN SETTINGS:
num_trials: 10
minibatch: False
num_fewshot_candidates: 6
num_instruct_candidates: 3
valset size: 6

2026/01/29 12:06:46 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 1: BOOTSTRAP FEWSHOT EXAMPLES <==
2026/01/29 12:06:46 INFO dspy.teleprompt.mipro_optimizer_v2: These will be used as few-shot example candidates for our program and for creating instructions.

2026/01/29 12:06:46 INFO dspy.teleprompt.mipro_optimizer_v2: Bootstrapping N=6 sets of demonstrations...


Bootstrapping set 1/6
Bootstrapping set 2/6
Bootstrapping set 3/6


100%|██████████| 2/2 [00:28<00:00, 14.31s/it]


Bootstrapped 2 full traces after 1 examples for up to 1 rounds, amounting to 2 attempts.
Bootstrapping set 4/6


100%|██████████| 2/2 [00:25<00:00, 12.96s/it]


Bootstrapped 2 full traces after 1 examples for up to 1 rounds, amounting to 2 attempts.
Bootstrapping set 5/6


 50%|█████     | 1/2 [00:12<00:12, 12.86s/it]


Bootstrapped 1 full traces after 1 examples for up to 1 rounds, amounting to 1 attempts.
Bootstrapping set 6/6


 50%|█████     | 1/2 [00:12<00:12, 12.26s/it]
2026/01/29 12:08:05 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 2: PROPOSE INSTRUCTION CANDIDATES <==
2026/01/29 12:08:05 INFO dspy.teleprompt.mipro_optimizer_v2: We will use the few-shot examples from the previous step, a generated dataset summary, a summary of the program code, and a randomly selected prompting tip to propose instructions.
2026/01/29 12:08:05 INFO dspy.teleprompt.mipro_optimizer_v2: 
Proposing N=3 instructions...



Bootstrapped 1 full traces after 1 examples for up to 1 rounds, amounting to 1 attempts.


2026/01/29 12:08:29 INFO dspy.teleprompt.mipro_optimizer_v2: Proposed Instructions for Predictor 0:

2026/01/29 12:08:29 INFO dspy.teleprompt.mipro_optimizer_v2: 0: Answer using only the provided context.

2026/01/29 12:08:29 INFO dspy.teleprompt.mipro_optimizer_v2: 1: Beantworte die gestellte Frage **ausschließlich** mit Informationen aus dem übergebenen Kontext. Formuliere die Antwort in korrektem Deutsch, **maximal 2 – 3 Sätze** lang, und nutze dabei die im Kontext vorkommenden normativen Formulierungen (z. B. „MUSS“, „SOLLEN“, „DÜRFEN“). Gib nur die relevante Anforderung wieder, **ohne Zitate, Abschnitts‑ oder Nummernangaben**.

2026/01/29 12:08:29 INFO dspy.teleprompt.mipro_optimizer_v2: 2: Formuliere die Antwort **ausschließlich** auf Basis des übergebenen Kontext‑Texts.  
- **Sprache:** Deutsch, **Stil:** genauso normativ wie im Kontext (z. B. „MUSS“, „SOLLTEN“, „DÜRFEN“).  
- **Umfang:** höchstens **drei Sätze** (idealerweise zwei), prägnant und ohne Abschweifungen.  
- **Inhal

  0%|          | 0/6 [00:00<?, ?it/s]

Task was destroyed but it is pending!
task: <Task pending name='Task-220' coro=<LoggingWorker._worker_loop() done, defined at /Users/felixboelter/Documents/GitHub/pilotprojekt-GrundschutzKI/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/logging_worker.py:110> wait_for=<Future cancelled>>


Average Metric: 0.86 / 1 (86.5%):  17%|█▋        | 1/6 [00:09<00:47,  9.47s/it]



Average Metric: 1.86 / 2 (93.1%):  33%|███▎      | 2/6 [00:12<00:22,  5.73s/it]

Task exception was never retrieved
future: <Task finished name='Task-296' coro=<LoggingWorker._worker_loop() done, defined at /Users/felixboelter/Documents/GitHub/pilotprojekt-GrundschutzKI/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/logging_worker.py:110> exception=RuntimeError('cannot reuse already awaited coroutine')>
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.12/3.12.8/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/tasks.py", line 385, in __wakeup
    future.result()
  File "/opt/homebrew/Cellar/python@3.12/3.12.8/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/futures.py", line 197, in result
    rais

Average Metric: 3.60 / 4 (89.9%):  67%|██████▋   | 4/6 [00:15<00:05,  2.68s/it]


Average Metric: 4.58 / 5 (91.6%):  83%|████████▎ | 5/6 [00:24<00:04,  4.94s/it]



Average Metric: 4.76 / 6 (79.3%): 100%|██████████| 6/6 [00:27<00:00,  4.59s/it]

2026/01/29 12:08:56 INFO dspy.evaluate.evaluate: Average Metric: 4.755437597221299 / 6 (79.3%)
2026/01/29 12:08:56 INFO dspy.teleprompt.mipro_optimizer_v2: Default program score: 79.26

2026/01/29 12:08:56 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 2 / 10 =====



  0%|          | 0/6 [00:00<?, ?it/s]



Average Metric: 0.91 / 1 (90.6%):  17%|█▋        | 1/6 [00:14<01:11, 14.21s/it]



Average Metric: 1.62 / 2 (81.2%):  33%|███▎      | 2/6 [00:16<00:29,  7.45s/it]


Average Metric: 3.49 / 4 (87.2%):  50%|█████     | 3/6 [00:18<00:14,  4.67s/it]


Average Metric: 4.64 / 6 (77.4%): 100%|██████████| 6/6 [00:35<00:00,  5.91s/it]

2026/01/29 12:09:32 INFO dspy.evaluate.evaluate: Average Metric: 4.644471262508276 / 6 (77.4%)
2026/01/29 12:09:32 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 77.41 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 3'].
2026/01/29 12:09:32 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [79.26, 77.41]
2026/01/29 12:09:32 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 79.26


2026/01/29 12:09:32 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 3 / 10 =====







  0%|          | 0/6 [00:00<?, ?it/s]


Average Metric: 0.99 / 1 (98.7%):  17%|█▋        | 1/6 [00:13<01:05, 13.19s/it]



Average Metric: 1.89 / 2 (94.7%):  33%|███▎      | 2/6 [00:14<00:23,  5.98s/it]



Average Metric: 3.37 / 4 (84.3%):  50%|█████     | 3/6 [00:14<00:10,  3.58s/it]


Average Metric: 4.54 / 6 (75.7%): 100%|██████████| 6/6 [00:35<00:00,  5.89s/it]

2026/01/29 12:10:07 INFO dspy.evaluate.evaluate: Average Metric: 4.540696019468345 / 6 (75.7%)
2026/01/29 12:10:07 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 75.68 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 0'].
2026/01/29 12:10:07 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [79.26, 77.41, 75.68]
2026/01/29 12:10:07 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 79.26


2026/01/29 12:10:07 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 4 / 10 =====



  0%|          | 0/6 [00:00<?, ?it/s]


Average Metric: 1.68 / 2 (84.0%):  33%|███▎      | 2/6 [00:15<00:25,  6.46s/it]



Average Metric: 2.59 / 3 (86.2%):  50%|█████     | 3/6 [00:17<00:13,  4.48s/it]



Average Metric: 3.57 / 4 (89.4%):  67%|██████▋   | 4/6 [00:20<00:07,  3.90s/it]



Average Metric: 4.73 / 6 (78.9%): 100%|██████████| 6/6 [00:34<00:00,  5.77s/it]

2026/01/29 12:10:42 INFO dspy.evaluate.evaluate: Average Metric: 4.734105061854381 / 6 (78.9%)
2026/01/29 12:10:42 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 78.9 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 5'].
2026/01/29 12:10:42 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [79.26, 77.41, 75.68, 78.9]
2026/01/29 12:10:42 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 79.26


2026/01/29 12:10:42 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 5 / 10 =====



  0%|          | 0/6 [00:00<?, ?it/s]


Average Metric: 1.47 / 2 (73.4%):  33%|███▎      | 2/6 [00:15<00:25,  6.28s/it]



Average Metric: 3.19 / 4 (79.8%):  67%|██████▋   | 4/6 [00:15<00:04,  2.30s/it]



Average Metric: 4.28 / 6 (71.3%): 100%|██████████| 6/6 [00:35<00:00,  5.94s/it]

2026/01/29 12:11:18 INFO dspy.evaluate.evaluate: Average Metric: 4.27747814139598 / 6 (71.3%)
2026/01/29 12:11:18 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 71.29 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 2'].
2026/01/29 12:11:18 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [79.26, 77.41, 75.68, 78.9, 71.29]
2026/01/29 12:11:18 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 79.26


2026/01/29 12:11:18 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 6 / 10 =====






  0%|          | 0/6 [00:00<?, ?it/s]


Average Metric: 0.86 / 1 (86.2%):  17%|█▋        | 1/6 [00:09<00:49,  9.82s/it]



Average Metric: 1.54 / 2 (76.8%):  33%|███▎      | 2/6 [00:11<00:19,  4.79s/it]



Average Metric: 3.51 / 4 (87.7%):  67%|██████▋   | 4/6 [00:14<00:04,  2.46s/it]

Average Metric: 4.61 / 6 (76.9%): 100%|██████████| 6/6 [00:26<00:00,  4.46s/it]

2026/01/29 12:11:45 INFO dspy.evaluate.evaluate: Average Metric: 4.613956707127683 / 6 (76.9%)
2026/01/29 12:11:45 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 76.9 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 5'].
2026/01/29 12:11:45 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [79.26, 77.41, 75.68, 78.9, 71.29, 76.9]
2026/01/29 12:11:45 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 79.26


2026/01/29 12:11:45 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 7 / 10 =====

  0%|          | 0/6 [00:00<?, ?it/s]

Average Metric: 0.99 / 1 (98.7%):  17%|█▋        | 1/6 [00:08<00:42,  8.52s/it]

Average Metric: 3.71 / 4 (92.8%):  67%|██████▋   | 4/6 [00:12<00:04,  2.11s/it]

Average Metric: 4.88 / 6 (81.4%): 100%|██████████| 6/6 [00:26<00:00,  4.43s/it]

2026/01/29 12:12:11 INFO dspy.evaluate.evaluate: Average Metric: 4.883664296682242 / 6 (81.4%)
2026/01/29 12:12:11 INFO dspy.teleprompt.mipro_optimizer_v2: [92mBest full score so far![0m Score: 81.39
2026/01/29 12:12:11 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 81.39 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 0'].
2026/01/29 12:12:11 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [79.26, 77.41, 75.68, 78.9, 71.29, 76.9, 81.39]
2026/01/29 12:12:11 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 81.39


2026/01/29 12:12:11 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 8 / 10 =====

  0%|          | 0/6 [00:00<?, ?it/s]

Average Metric: 0.99 / 1 (98.8%):  17%|█▋        | 1/6 [00:14<01:13, 14.75s/it]

Average Metric: 3.36 / 4 (84.1%):  67%|██████▋   | 4/6 [00:20<00:07,  3.54s/it]

Average Metric: 4.43 / 6 (73.9%): 100%|██████████| 6/6 [00:35<00:00,  5.85s/it]

2026/01/29 12:12:46 INFO dspy.evaluate.evaluate: Average Metric: 4.431610512652604 / 6 (73.9%)
2026/01/29 12:12:46 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 73.86 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 5'].
2026/01/29 12:12:46 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [79.26, 77.41, 75.68, 78.9, 71.29, 76.9, 81.39, 73.86]
2026/01/29 12:12:46 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 81.39


2026/01/29 12:12:46 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 9 / 10 =====





  0%|          | 0/6 [00:00<?, ?it/s]

Average Metric: 3.45 / 4 (86.2%):  67%|██████▋   | 4/6 [00:13<00:05,  2.51s/it]

Task exception was never retrieved
future: <Task finished name='Task-1704' coro=<LoggingWorker._process_log_task() done, defined at /Users/felixboelter/Documents/GitHub/pilotprojekt-GrundschutzKI/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/logging_worker.py:92> exception=ValueError('task_done() called too many times')>
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.12/3.12.8/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/tasks.py", line 314, in __step_run_and_handle_result
    result = coro.send(None)
             ^^^^^^^^^^^^^^^
  File "/Users/felixboelter/Documents/GitHub/pilotprojekt-GrundschutzKI/.venv/lib/python3.12/site

Average Metric: 4.61 / 6 (76.8%): 100%|██████████| 6/6 [00:23<00:00,  3.95s/it]

2026/01/29 12:13:10 INFO dspy.evaluate.evaluate: Average Metric: 4.60910326948416 / 6 (76.8%)
2026/01/29 12:13:10 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 76.82 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 4'].
2026/01/29 12:13:10 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [79.26, 77.41, 75.68, 78.9, 71.29, 76.9, 81.39, 73.86, 76.82]
2026/01/29 12:13:10 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 81.39


2026/01/29 12:13:10 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 10 / 10 =====





  0%|          | 0/6 [00:00<?, ?it/s]

Average Metric: 0.99 / 1 (98.8%):  17%|█▋        | 1/6 [00:09<00:48,  9.75s/it]

Average Metric: 3.55 / 4 (88.8%):  50%|█████     | 3/6 [00:12<00:09,  3.28s/it]

Average Metric: 4.55 / 6 (75.9%): 100%|██████████| 6/6 [00:23<00:00,  3.95s/it]

2026/01/29 12:13:34 INFO dspy.evaluate.evaluate: Average Metric: 4.552441417346114 / 6 (75.9%)
2026/01/29 12:13:34 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 75.87 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 5'].
2026/01/29 12:13:34 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [79.26, 77.41, 75.68, 78.9, 71.29, 76.9, 81.39, 73.86, 76.82, 75.87]
2026/01/29 12:13:34 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 81.39


2026/01/29 12:13:34 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 11 / 10 =====

  0%|          | 0/6 [00:00<?, ?it/s]

Average Metric: 0.99 / 1 (98.7%):  17%|█▋        | 1/6 [00:13<01:09, 13.95s/it]

Average Metric: 2.73 / 3 (90.9%):  50%|█████     | 3/6 [00:17<00:12,  4.31s/it]

Average Metric: 3.71 / 4 (92.8%):  67%|██████▋   | 4/6 [00:22<00:09,  4.63s/it]

Average Metric: 4.69 / 5 (93.7%):  83%|████████▎ | 5/6 [00:31<00:06,  6.11s/it]

Average Metric: 4.86 / 6 (81.1%): 100%|██████████| 6/6 [00:32<00:00,  5.45s/it]

2026/01/29 12:14:07 INFO dspy.evaluate.evaluate: Average Metric: 4.863083135722558 / 6 (81.1%)
2026/01/29 12:14:07 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 81.05 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 3'].
2026/01/29 12:14:07 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [79.26, 77.41, 75.68, 78.9, 71.29, 76.9, 81.39, 73.86, 76.82, 75.87, 81.05]
2026/01/29 12:14:07 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 81.39


2026/01/29 12:14:07 INFO dspy.teleprompt.mipro_optimizer_v2: Returning best identified program with score 81.39!
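
The trial scores in the log above can be sanity-checked with a few lines of plain Python. The sketch below copies the final "Scores so far" list and looks up the best trial. It assumes the first entry is the score of the starting program, which this excerpt does not confirm; the variable names are illustrative and not part of DSPy.

```python
# Scores copied from the last "Scores so far" log line (in percent).
trial_scores = [79.26, 77.41, 75.68, 78.9, 71.29, 76.9, 81.39, 73.86, 76.82, 75.87, 81.05]

# Index of the highest-scoring trial (0-based).
best_index = max(range(len(trial_scores)), key=trial_scores.__getitem__)
best_score = trial_scores[best_index]

# Assumption: the first entry is the starting program's score.
gain = best_score - trial_scores[0]

# Each trial was scored on 6 examples, so one example moves the
# percentage by at most 100 / 6 points.
points_per_example = 100 / 6

print(f"best trial: {best_index + 1} / {len(trial_scores)}")
print(f"best score: {best_score}, gain over first entry: {gain:.2f}")
print(f"max swing per example: {points_per_example:.1f} points")
```

With only 6 examples per evaluation, a gap of about two points between trials is smaller than the swing a single example can cause, so the ranking of trials should be read with caution.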





In [32]:
from datasets import Dataset

# Skip roughly the first fifth of the rows (at least one) and use the rest as the dev set.
split = max(1, len(dataset) // 5)
dev_rows = list(dataset)[split:]

# Generate one answer per dev row with the optimized DSPy program.
dspy_dev_answers = []
for row in dev_rows:
    context = "\n\n".join(row["contexts"])
    pred = optimized_rag(question=row["question"], context=context)
    dspy_dev_answers.append(pred.response)

# New dataset for scoring
dev_dataset = Dataset.from_dict({
    "question": [r["question"] for r in dev_rows],
    "contexts": [r["contexts"] for r in dev_rows],
    "ground_truth": [r["ground_truth"] for r in dev_rows],
    "answer": dspy_dev_answers,  # DSPy answers
})


## RAGAS Evaluation (DSPy Answers)


In [33]:
import asyncio
from ragas.llms import llm_factory
from ragas.embeddings.litellm_provider import LiteLLMEmbeddings
from ragas.metrics.collections import ContextPrecision, ContextRecall, Faithfulness, AnswerCorrectness
import instructor
import litellm

# RAGAS LLM (LiteLLM proxy)
litellm.api_base = llm_cfg.api_base
litellm.api_key = llm_cfg.api_key
client = instructor.from_litellm(litellm.acompletion, mode=instructor.Mode.MD_JSON)
ragas_llm = llm_factory(llm_cfg.model, client=client, adapter='litellm', model_args={'temperature': 0.2})

embeddings = LiteLLMEmbeddings(
    model=llm_cfg.embedding_model,
    api_key=llm_cfg.api_key,
    api_base=llm_cfg.api_base,
    encoding_format='float',
)

scorers = {
    'context_precision': ContextPrecision(llm=ragas_llm),
    'context_recall': ContextRecall(llm=ragas_llm),
    'faithfulness': Faithfulness(llm=ragas_llm),
    'answer_correctness': AnswerCorrectness(llm=ragas_llm, embeddings=embeddings),
}

async def _score_row(row, sem):
    async with sem:
        return {
            'context_precision': (await scorers['context_precision'].ascore(
                user_input=row['question'],
                reference=row['ground_truth'],
                retrieved_contexts=row['contexts'],
            )).value,
            'context_recall': (await scorers['context_recall'].ascore(
                user_input=row['question'],
                reference=row['ground_truth'],
                retrieved_contexts=row['contexts'],
            )).value,
            'faithfulness': (await scorers['faithfulness'].ascore(
                user_input=row['question'],
                response=row['answer'],
                retrieved_contexts=row['contexts'],
            )).value,
            'answer_correctness': (await scorers['answer_correctness'].ascore(
                user_input=row['question'],
                response=row['answer'],
                reference=row['ground_truth'],
            )).value,
        }

async def score_dataset_batched(ds, batch_size=10, concurrency=5):
    sem = asyncio.Semaphore(concurrency)
    rows = list(ds)
    results = []
    for i in range(0, len(rows), batch_size):
        batch = rows[i : i + batch_size]
        tasks = [asyncio.create_task(_score_row(r, sem)) for r in batch]
        results.extend(await asyncio.gather(*tasks))
    return results

scores = await score_dataset_batched(dev_dataset, batch_size=16, concurrency=10)
stats = {
    k: {
        'avg': sum(s[k] for s in scores) / len(scores),
        'min': min(s[k] for s in scores),
        'max': max(s[k] for s in scores),
    }
    for k in scores[0].keys()
}
print(stats)


{'context_precision': {'avg': 0.9248366012785777, 'min': 0.32499999998375, 'max': 0.99999999998}, 'context_recall': {'avg': 0.9387254901960784, 'min': 0.5, 'max': 1.0}, 'faithfulness': {'avg': 0.7920348442407266, 'min': 0.0, 'max': 1.0}, 'answer_correctness': {'avg': 0.6512394217363153, 'min': 0.13643740743746183, 'max': 0.9958830116606052}}
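
The scoring cell above limits the number of rows scored in parallel with an `asyncio.Semaphore`. The following self-contained sketch isolates that pattern with a stand-in scorer, so it runs without RAGAS or an LLM endpoint. `fake_score`, `score_all`, and the row contents are hypothetical names and data chosen for illustration only.

```python
import asyncio


async def fake_score(row: dict) -> float:
    # Stand-in for a real scorer call (hypothetical, not a RAGAS API).
    await asyncio.sleep(0.01)
    return 1.0 if row["answer"] else 0.0


async def score_all(rows, concurrency=3):
    sem = asyncio.Semaphore(concurrency)
    running = 0  # number of rows currently being scored
    peak = 0     # highest number of rows scored at the same time

    async def guarded(row):
        nonlocal running, peak
        async with sem:  # at most `concurrency` rows pass this point at once
            running += 1
            peak = max(peak, running)
            try:
                return await fake_score(row)
            finally:
                running -= 1

    # gather() returns results in the same order as the input rows.
    results = await asyncio.gather(*(guarded(r) for r in rows))
    return results, peak


rows = [{"answer": "x"}] * 7 + [{"answer": ""}]
results, peak = asyncio.run(score_all(rows, concurrency=3))
print(results, peak)
```

Inside a notebook, where an event loop is already running, the last call would be `results, peak = await score_all(rows, concurrency=3)` instead of `asyncio.run(...)`. Because `gather` preserves input order, the scores can be zipped back onto the rows without extra bookkeeping.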


## Results DataFrame


In [34]:
import pandas as pd

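# Generate DSPy answers for the full dataset (optimized_rag if available).
# Note: dspy_answers is not used in the DataFrame below, which is built from
# the dev split (dspy_dev_answers) so that its rows line up with `scores`.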
_rag_model = optimized_rag if 'optimized_rag' in globals() else rag
dspy_answers = []
for row in dataset:
    context = "\n\n".join(row['contexts'])
    pred = _rag_model(question=row['question'], context=context)
    dspy_answers.append(pred.response)


# Assemble the DataFrame
df = pd.DataFrame({
    'question': [r['question'] for r in dev_dataset],
    'contexts': ["\n\n".join(r['contexts']) for r in dev_dataset],
    'answer_dspy': dspy_dev_answers,
    'ground_truth': [r['ground_truth'] for r in dev_dataset],
    'context_precision': [s['context_precision'] for s in scores],
    'context_recall': [s['context_recall'] for s in scores],
    'faithfulness': [s['faithfulness'] for s in scores],
    'answer_correctness': [s['answer_correctness'] for s in scores],
})


df.head()


| | question | contexts | answer_dspy | ground_truth | context_precision | context_recall | faithfulness | answer_correctness |
|---|---|---|---|---|---|---|---|---|
| 0 | Was ist bei der Auswahl eines externen Webhost... | gement (B)\nFür Prozesse, die potenziell ausge... | Bei der Auswahl eines externen Webhosters SOLL... | Bei der Nutzung externer Webhosting-Dienste so... | 1.0 | 1.0 | 1.0 | 0.976099 |
| 1 | Wie sollten Fehlermeldungen auf einem Webserve... | lständig dargestellt oder Sicherheitsmechanism... | Fehlermeldungen SOLLTEN weder Produktnamen noc... | Aus HTTP-Antworten und Fehlermeldungen dürfen ... | 0.916667 | 1.0 | 1.0 | 0.954347 |
| 2 | Welche Maßnahmen sind bei erhöhtem Schutzbedar... | \n...\n\nStrict-Transport-Security,\n\n...\n\n... | Bei erhöhtem Schutzbedarf SOLLTEN Webserver re... | Bei erhöhtem Schutzbedarf sollten Webserver re... | 0.45 | 1.0 | 0.833333 | 0.780775 |
| 3 | Wer trägt die Verantwortung für Informationssi... | rungskatalog Cloud Computing (C5)“ Kriterien z... | Die Verantwortung für die Informationssicherhe... | Die Verantwortung für Informationssicherheit v... | 1.0 | 1.0 | 1.0 | 0.995458 |
| 4 | Welche Prozesse dürfen grundsätzlich ausgelage... | gement (B)\nFür Prozesse, die potenziell ausge... | Nur Prozesse, die nach einer risikoorientierte... | Nur Prozesse, die risikoorientiert bewertet wu... | 0.804167 | 1.0 | 0.333333 | 0.722852 |
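
To find the rows worth inspecting by hand, the per-row scores can be filtered by a threshold. The sketch below uses only the five rows shown above, with values copied from the output. The threshold of 0.9 and the names `head_rows` and `flagged` are assumptions made for illustration.

```python
# Values copied from the first five rows of the results table.
head_rows = [
    {"index": 0, "faithfulness": 1.0, "answer_correctness": 0.976099},
    {"index": 1, "faithfulness": 1.0, "answer_correctness": 0.954347},
    {"index": 2, "faithfulness": 0.833333, "answer_correctness": 0.780775},
    {"index": 3, "faithfulness": 1.0, "answer_correctness": 0.995458},
    {"index": 4, "faithfulness": 0.333333, "answer_correctness": 0.722852},
]

THRESHOLD = 0.9  # hypothetical cut-off, not taken from the notebook

# Keep rows below the threshold, least faithful first.
flagged = sorted(
    (r for r in head_rows if r["faithfulness"] < THRESHOLD),
    key=lambda r: r["faithfulness"],
)

for r in flagged:
    print(r["index"], r["faithfulness"], r["answer_correctness"])
```

On the full DataFrame from the cell above, the same selection can be written as `df[df["faithfulness"] < 0.9].sort_values("faithfulness")`.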


In [38]:
print(df["question"][10])
print()
print(df["answer_dspy"][10])
print()
print(df["ground_truth"][10])


Muss es eine zentrale Stelle für Outsourcing geben?

Eine zentrale Stelle ist nicht zwingend vorgeschrieben; es SOLLTEN jedoch eine zuständige Person für das Auslagerungsmanagement benannt und ein zentrales Auslagerungsregister geführt werden.

Für Standard-Absicherung sollte eine zuständige Person für das Auslagerungsmanagement benannt werden.


In [39]:
print(df["contexts"][10])

für das jeweilige Outsourcing-Vorhaben vorlegen. Das Sicherheitskonzept der Anbietenden von Outsourcing und dessen Umsetzung SOLLTE zu einem gesamten Sicherheitskonzept zusammengeführt werden. Wenn Risikoanalysen notwendig sind, MÜSSEN Vereinbarungen getroffen werden, wie die Risikoanalyse geprüft und gegebenenfalls in das eigene Risikomanagement überführt werden kann. Die Nutzenden von Outsourcing oder unabhängige Dritte MÜSSEN regelmäßig überprüfen, ob das Sicherheitskonzept wirksam ist.

...

OPS.2.3.A7 Regelungen für eine geplante oder ungeplante Beendigung eines Outsourcing-Verhältnisses (B) [Fachverantwortliche, Institutionsleitung]
Für geplante sowie ungeplante Beendigungen des Outsourcing-Verhältnisses MÜSSEN Regelungen getroffen werden. Es MUSS festgelegt werden, wie alle Informationen, Daten und Hardware der Nutzenden vom Anbietenden von Outsourcing zurückgegeben werden. Hierbei MÜSSEN gesetzliche Vorgaben zur Aufbewahrung von Daten beachtet werden. Ferner SOLLTE überprüft we