# Feature Track 1.1: Synthetic Dataset Generation

While `Feature Track 1` evaluated our RAG pipeline against 5 manual queries, a production-grade system requires a **Validation Dataset** (often called a "Gold Dataset") consisting of hundreds of Q&A pairs.

Manually writing these is slow and subject to human bias. In this notebook, we use **LLM-assisted Synthetic Data Generation** to create a robust evaluation suite directly from our knowledge base.

### The Goal
To generate a diverse set of `(Query, Context, Ground Truth)` triplets that specifically target the failure modes identified in the baseline:
* **Hallucinations** regarding non-existent products (e.g., Lara Pallet).
* **Temporal Conflicts** between old and new EPD figures.
* **Compliance Risks** regarding unverified sustainability claims.

### The Generation Workflow



1.  **Source Sampling:** Extract high-quality chunks from our existing `ChromaDB` vector store.
2.  **LLM "Evolution":** Use an LLM to transform simple questions into complex, multi-hop, or adversarial queries.
3.  **Ground Truth Labeling:** Use a stronger LLM (the "Oracle") to provide the definitive answer based *only* on the provided chunks.
4.  **Export:** Save the dataset in a format compatible with RAGAS for automated testing.
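
The "evolution" step (step 2) can be sketched as a small prompt builder. This is only an illustration: the strategy names, the template wording, and the `build_evolution_prompt` helper are hypothetical and not part of the notebook's pipeline. The resulting string would be sent to whichever generator LLM is configured.

```python
# Hypothetical prompt templates for the "evolution" step. The strategy names
# and wording are illustrative assumptions, not taken from the notebook.
EVOLUTION_TEMPLATES = {
    "multi_hop": (
        "Rewrite the question so that answering it requires combining facts "
        "from at least two of the context passages.\n"
    ),
    "adversarial": (
        "Rewrite the question so that it contains a plausible but false "
        "premise that the context contradicts.\n"
    ),
    "temporal": (
        "Rewrite the question so that it asks for the most recent figure when "
        "the context contains both older and newer values.\n"
    ),
}


def build_evolution_prompt(seed_question: str, contexts: list[str], strategy: str) -> str:
    """Assemble a prompt asking an LLM to evolve a simple seed question."""
    if strategy not in EVOLUTION_TEMPLATES:
        raise ValueError(f"Unknown strategy: {strategy!r}")
    # Number the passages so the model can refer to them unambiguously.
    joined = "\n\n".join(f"[Context {i + 1}]\n{c}" for i, c in enumerate(contexts))
    return (
        f"{EVOLUTION_TEMPLATES[strategy]}"
        f"Use only the context below.\n\n{joined}\n\n"
        f"Seed question: {seed_question}\n"
        "EVOLVED QUESTION:"
    )
```

Keeping the strategies in a dictionary makes it easy to tag each evolved query with the failure mode it is meant to probe.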

### Why not just use the 5 manual queries?

| Approach | Scalability | Diversity | Effort |
| :--- | :--- | :--- | :--- |
| **Manual** | Low | Low (Human bias) | High |
| **Synthetic** | High | High (Systematic) | Low |

## Phase 1: Manual "Golden Query" Creation

Before automating with synthetic data, we must establish a small, high-quality **Reference Set** (10–20 queries). These manual queries serve as the "True North" for our RAG system, representing the most critical questions a user might ask.

### Why Manual Queries Still Matter
While synthetic data provides scale, manual queries capture **business intent**. They allow us to:
* **Target known failure modes:** Explicitly test if the system still hallucinates "Lara Pallet."
* **Verify nuance:** Test if the system can distinguish between a "verified EPD" and a "self-declared claim."
* **Benchmark the LLM Judge:** We use these to see if RAGAS scores align with our human intuition.

### Strategies for High-Quality Queries
To build a robust validation set, we use four specific query types:

| Query Type | Description | Example |
| :--- | :--- | :--- |
| **Direct Fact** | Specific data points found in one chunk. | "What is the CO2e for the Logypal 1?" |
| **Negative Constraint** | Questions about products or facts that *do not* exist. | "Do we offer the Lara Pallet?" |
| **Temporal Conflict** | Queries where the answer changed recently. | "What is the latest verified GWP for tesa 68%?" |
| **Multi-hop** | Requires connecting info from two different documents. | "Which suppliers are non-compliant with the 2025 EPD rule?" |

### The "Ground Truth" Requirement
For every manual query, we must provide:
1. **The Query:** The exact string the user would type.
2. **The Category:** The failure mode or intent it tests (e.g., "hallucination", "temporal_conflict").
3. **The Reference Context:** The specific document IDs or text snippets that *should* have been retrieved.
4. **The Ground Truth:** The perfect, concise answer the LLM should have generated.
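
A lightweight check can catch incomplete or duplicated entries before the set is used for evaluation. The sketch below assumes each sample is a dictionary with the keys `id`, `category`, `query`, `expected_answer`, and `sources`. The `validate_samples` helper is hypothetical, not an existing utility in this project.

```python
# Keys every manual sample is assumed to carry.
REQUIRED_KEYS = {"id", "category", "query", "expected_answer", "sources"}


def validate_samples(samples: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means the set looks clean."""
    problems = []
    seen_ids = set()
    for pos, sample in enumerate(samples):
        missing = REQUIRED_KEYS - sample.keys()
        if missing:
            problems.append(f"sample {pos}: missing keys {sorted(missing)}")
            continue  # remaining checks need all keys present
        if sample["id"] in seen_ids:
            problems.append(f"sample {pos}: duplicate id {sample['id']!r}")
        seen_ids.add(sample["id"])
        if not sample["query"].strip() or not sample["expected_answer"].strip():
            problems.append(f"sample {pos}: empty query or expected_answer")
        if not sample["sources"]:
            problems.append(f"sample {pos}: no reference sources listed")
    return problems
```

Running `validate_samples(manual_samples)` after defining the golden set would surface issues such as two samples sharing the same `id`.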

---

### Implementation Task: Defining our Golden Set
In the cell below, we will define a list of dictionaries containing our manual validation samples to be used alongside the synthetic ones.

In [24]:
manual_samples = [
  {
    "id": "1",
    "category": "portfolio_scope",
    "query": "Does PrimePack AG offer a product called the \"Lara Pallet\"?",
    "expected_answer": "No. The Lara Pallet is not part of PrimePack AG's portfolio. The product catalog explicitly lists it under products that are not offered. The active pallet portfolio consists of: Noé Pallet (32-100), Wooden Pallet 1208 (32-101), Recycled Plastic Pallet (32-102), Logypal 1 (32-103), LogyLight (32-104), and EP 08 (32-105). Customers should be referred to the current product catalog.",
    "sources": ["data/artificial_markdown/ART_product_catalog.pdf"],
  },
  {
    "id": "2",
    "category": "claim_verification",
    "query": "Can the 68% CO₂ reduction claim for the tesapack ECO & ULTRA STRONG ecoLogo (product 50-102) be included in a customer sustainability response?",
    "expected_answer": "No, not as a stated fact. It is an internal assessment by Tesa SE (Level B/C evidence) and lacks third-party LCA/EPD verification. Policy requires Level A (verified EPD) for customer-facing facts. It may only be mentioned with the caveat: 'self-declared by Tesa SE, not independently verified.' Carbon neutrality targets for 2025 must be labeled as forward-looking goals.",
    "sources": [
      "data/artificial_markdown/ART_supplier_brochure_tesa_ECO.pdf",
      "data/artificial_markdown/ART_internal_procurement_policy.pdf"
    ],
  },
  {
    "id": "3",
    "category": "missing_data",
    "query": "What verified environmental data is available for the LogyLight pallet (product 32-104)?",
    "expected_answer": "None. The datasheet states GWP and impact data are 'not yet available'. While an LCA (REL-LCA-2024-07) is commissioned, no verified figures exist. The 75% recycled content is a self-declaration, not an audit. It must not be used in customer-facing comparisons until an EPD is published.",
    "sources": ["data/artificial_markdown/ART_logylight_incomplete_datasheet.pdf"],
  },
  {
    "id": "4",
    "category": "missing_data",
    "query": "Are any of PrimePack AG's tape products confirmed to be PFAS-free?",
    "expected_answer": "No. As of January 2025, no PFAS declarations have been received from IPG or Tesa SE. This is an open non-compliance item. The mention of 'solvent-free' adhesives does not equal a PFAS-free declaration. No tape may be described as PFAS-free yet.",
    "sources": [
      "data/artificial_markdown/ART_internal_procurement_policy.pdf",
      "data/ART_response_inquiry_frische_felder.pdf"
    ],
  },
  {
    "id": "5",
    "category": "source_conflict",
    "query": "Which GWP figure should be cited for the Relicyc Logypal 1 pallet (product 32-103), and why?",
    "expected_answer": "The 2023 third-party verified EPD (No. S-P-10482) is the authoritative source. The 2021 datasheet citing 4.1 kg CO₂e is SUPERSEDED and used outdated methodology. Policy requires preferring the most recent third-party verified source and flagging the conflict.",
    "sources": [
      "data/EPD_pallet_relicyc_logypal1.pdf",
      "data/artificial_markdown/ART_relicyc_logypal1_datasheet_2021.pdf",
      "data/artificial_markdown/ART_internal_procurement_policy.pdf"
    ],
  },
  # --- New manually created queries ---
  {
    "id": "6",
    "category": "direct_fact",
    "query": "What is the total 'Climate Change - total' GWP figure for the IPG F4090-05 machine roll tape per square meter?",
    "expected_answer": "The total Climate Change GWP for the IPG F4090-05 machine roll is 2.03E-01 kg CO2 eq. per m2. This is comprised of 1.29E-01 kg CO2 eq. from the upstream stage, 6.69E-02 kg CO2 eq. from the core stage, and 7.50E-03 kg CO2 eq. from the downstream stage.",
    "sources": ["data/EPD_tape_IPG_hotmelt.pdf"]
  },
  {
    "id": "7",
    "category": "direct_fact",
    "query": "What is the weight and static load capacity of the Stabilplastik EP 08 pallet?",
    "expected_answer": "The Stabilplastik EP 08 pallet has a weight of 25 kg and a static load capacity of 10,000 kg. It is designed for rigorous industrial use and is resistant to moisture, pests, and mold.",
    "sources": ["data/EPD_pallet_stabilplastik_ep08.pdf"]
  },
  {
    "id": "8",
    "category": "direct_fact",
    "query": "How does the EU Commission define 'Transition Risks' in the context of climate reporting?",
    "expected_answer": "Transition risks are defined as risks to a company arising from the transition to a low-carbon and climate-resilient economy. These include policy risks (e.g., carbon-pricing), legal risks (litigation), technology risks (replacement by cleaner tech), market risks (shifting consumer choices), and reputational risks.",
    "sources": ["data/REF_eu_climate_reporting_guidelines.pdf"]
  },
  {
    "id": "9",
    "category": "multi_hop",
    "query": "Does the Stabilplastik EP 08 pallet meet the 2025 carbon neutrality target requirements for customer sustainability responses?",
    "expected_answer": "The Stabilplastik EP 08 has a third-party verified EPD valid until 2028, meeting the policy's requirement for verified facts (Level A evidence). However, any claims regarding 'carbon neutrality' must still be labeled as forward-looking goals per the internal policy, as the EPD itself focuses on life-cycle impacts rather than a neutrality guarantee.",
    "sources": [
      "data/EPD_pallet_stabilplastik_ep08.pdf",
      "data/artificial_markdown/ART_internal_procurement_policy.pdf"
    ]
  },
  {
    "id": "10",
    "category": "direct_fact",
    "query": "What is the primary material source for the Relicyc Logypal 1 according to its 2021 datasheet?",
    "expected_answer": "The Logypal 1 is manufactured from 100% post-consumer recycled plastic. This material is primarily sourced from end-of-life agricultural packaging (such as silage film) and industrial packaging waste.",
    "sources": ["data/artificial_markdown/ART_relicyc_logypal1_datasheet_2021.pdf"]
  }
]

## Phase 2: Synthetic Data Generation (Source Sampling)

To scale our validation set, we use the documents already stored in our Vector DB. This process involves:

1. **Random Sampling**: Selecting diverse chunks from the vector store to ensure we cover various products and policies.
2. **Context Injection**: Providing these chunks to a "Generator LLM" to draft realistic user queries.
3. **Answer Synthesis**: Using an "Oracle LLM" to write the ground truth based strictly on the provided text.

This ensures that our evaluation isn't just testing the LLM's general knowledge, but its ability to retrieve and reason over *our specific* proprietary data.
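
Uniform random sampling can over-represent long documents that contribute many chunks. One possible alternative, sketched here, is to draw chunks round-robin across sources with a fixed seed so the sample is both diverse and reproducible. The sketch assumes each chunk is a dictionary with `content` and `source` keys. `sample_across_sources` is a hypothetical helper, not a function from `conversational_toolkit`.

```python
import random
from collections import defaultdict


def sample_across_sources(chunks: list[dict], sample_size: int, seed: int = 42) -> list[dict]:
    """Pick chunks round-robin across sources so no single document dominates."""
    rng = random.Random(seed)  # local RNG keeps the sample reproducible
    by_source = defaultdict(list)
    for chunk in chunks:
        by_source[chunk["source"]].append(chunk)
    for group in by_source.values():
        rng.shuffle(group)

    picked = []
    sources = sorted(by_source)  # stable order, independent of insertion order
    # Take one chunk per source per pass until enough are picked or none remain.
    while len(picked) < sample_size and any(by_source[s] for s in sources):
        for source in sources:
            if by_source[source] and len(picked) < sample_size:
                picked.append(by_source[source].pop())
    return picked
```

With this approach, a request for ten chunks from a store covering ten or more documents yields one chunk per document before any document is sampled twice.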

In [None]:
import os
import pathlib
import warnings

from conversational_toolkit.embeddings.sentence_transformer import (
    SentenceTransformerEmbeddings,
)
from conversational_toolkit.vectorstores.chromadb import ChromaDBVectorStore

from sme_kt_zh_collaboration_rag.feature0_baseline_rag import (
    EMBEDDING_MODEL,
    VS_PATH,
    build_llm,
)

from dotenv import load_dotenv
load_dotenv(dotenv_path="../../.env.local")
warnings.filterwarnings("ignore", category=DeprecationWarning)

_secret_path = pathlib.Path("/secrets/OPENAI_API_KEY")
if "OPENAI_API_KEY" not in os.environ and _secret_path.exists():
    os.environ["OPENAI_API_KEY"] = _secret_path.read_text().strip()

RETRIEVER_TOP_K = 5
BACKEND = "openai"  # "ollama" or "openai"

if not BACKEND:
    raise ValueError('Set BACKEND to "ollama" or "openai" before running.')

# RAG pipeline
embedding_model = SentenceTransformerEmbeddings(model_name=EMBEDDING_MODEL)
vs = ChromaDBVectorStore(db_path=str(VS_PATH))
llm = build_llm(backend=BACKEND)


print(f"Embedding model : {EMBEDDING_MODEL}")
print(f"Vector store    : {VS_PATH}")
print(f"RAG agent LLM   : {BACKEND}")
print("RAGAS judge LLM : gpt-4o-mini (OpenAI)")
print("Setup complete.")

In [12]:
import random

# 1. Fetch data from the existing vector store
# We get the raw text (documents) and metadata (sources)
data = vs.collection.get(include=['documents', 'metadatas'])

# 2. Organize into a list of potential contexts
all_chunks = [
    {"content": doc, "source": meta.get("source", "unknown")}
    for doc, meta in zip(data['documents'], data['metadatas'])
]

# 3. Sample a subset to generate questions from (e.g., 10 chunks)
SAMPLE_SIZE = 10
sampled_chunks = random.sample(all_chunks, min(SAMPLE_SIZE, len(all_chunks)))

# Display the first sampled chunk for verification
print(f"Sampled {len(sampled_chunks)} chunks for generation.")
print("-" * 30)
print(f"Source: {sampled_chunks[0]['source']}")
print(f"Content preview: {sampled_chunks[0]['content'][:200]}...")

Sampled 10 chunks for generation.
------------------------------
Source: ART_logylight_incomplete_datasheet.pdf
Content preview: ## Regulatory Compliance

|Requirement|Status| |---|---| |REACH (no SVHC above 0.1% w/w)|Confirmed| |RoHS|Not applicable (industrial product)| |PFAS declaration|Not yet provided| |ISPM 15|Not applicab...


In [20]:
from conversational_toolkit.llms.base import LLMMessage, Roles

GENERATOR_PROMPT = """
Your task is to create a high-quality Question and Answer pair based STRICTLY on the provided Context.

Context:
{context}

Guidelines:
1. The Question should be something a procurement officer or sustainability auditor would ask.
2. The Answer must be factual, concise, and derived ONLY from the Context.
3. If the Context mentions a specific product, ID, or date, include it in the question or answer.
4. Format your response as:
QUESTION: <question>
ANSWER: <answer>
"""

synthetic_samples = []

for i, chunk in enumerate(sampled_chunks):
    prompt = GENERATOR_PROMPT.format(context=chunk['content'])

    # Generate the Q&A pair using the LLM
    response = await llm.generate([LLMMessage(role=Roles.SYSTEM, content=prompt)])
    content = response.content
    try:
        # Simple parsing logic
        parts = content.split("ANSWER:")
        question = parts[0].replace("QUESTION:", "").strip()
        answer = parts[1].strip()

        # Store the pair as a plain dict using the same keys as the manual samples
        sample = {
            "query": question,
            "expected_answer": answer,
            "sources": [chunk['source']],
        }
        synthetic_samples.append(sample)
        print(f"[{i+1}/{len(sampled_chunks)}] Generated: {question[:50]}...")
    except Exception as e:
        print(f"[{i+1}/{len(sampled_chunks)}] Failed to parse response: {e}")

print(f"\nTotal Synthetic Samples: {len(synthetic_samples)}")


2026-02-24 15:33:51.996 | DEBUG    | conversational_toolkit.llms.openai:generate:87 - Completion: ChatCompletion(id='chatcmpl-DCnluPow4sgOKInrCbz2SOKWiIsQ5', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='QUESTION: What is the status of the PFAS declaration for the product in question?  \nANSWER: The PFAS declaration has not yet been provided.', refusal=None, role='assistant', annotations=[], audio=None, function_call=None, tool_calls=None))], created=1771943630, model='gpt-4o-mini-2024-07-18', object='chat.completion', service_tier='default', system_fingerprint='fp_4ea526dd98', usage=CompletionUsage(completion_tokens=30, prompt_tokens=171, total_tokens=201, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0)))

['QUESTION: What is the status of the PFAS declaration for the product in question?  \n', ' The PFAS declaration has not yet been provided.']
[1/10] Generated: What is the status of the PFAS declaration for the...



['QUESTION: What is the deadline for PrimePack AG to stop accepting products with intentionally added PFAS?  \n', ' Effective 1 July 2024, PrimePack AG will not accept new products containing intentionally added PFAS into the portfolio.']
[2/10] Generated: What is the deadline for PrimePack AG to stop acce...



['QUESTION: What types of packaging products are not offered by PrimePack AG?  \n', ' PrimePack AG does not currently offer single-use bubble wrap or foam packaging, biodegradable tape products, or compostable packaging of any kind.']
[3/10] Generated: What types of packaging products are not offered b...



['QUESTION: Which products in the Tape category have an Environmental Product Declaration (EPD) and what are their IDs?\n', ' The products in the Tape category with an Environmental Product Declaration (EPD) are the Pressure-Sensitive Hot Melt Carton Sealing Tape (ID: 50-100) and Water-Activated Tape (ID: 50-101).']
[4/10] Generated: Which products in the Tape category have an Enviro...



['QUESTION: What materials are used to produce the LogyLight pallet, and what is its primary target market?  \n', ' The LogyLight pallet is produced from post-consumer recycled HDPE collected from industrial packaging waste streams, primarily silage film and industrial wrapping. Its primary target markets are food distribution, retail, and pharmaceutical logistics.']
[5/10] Generated: What materials are used to produce the LogyLight p...



['QUESTION: What happens to suppliers who fail to meet requirements by the stated deadlines?  \n', ' Suppliers failing to meet requirements by the stated deadlines will be placed on "sustainability review" status, which is flagged in the procurement system, must be disclosed when the relevant product is offered to customers, and may affect future procurement decisions at management discretion. Exceptions require written approval from the CEO.']
[6/10] Generated: What happens to suppliers who fail to meet require...



['QUESTION: What are the categories outlined in the requirements for procurement officers or sustainability auditors?  \n', ' The context does not specify the categories outlined in the requirements.']
[7/10] Generated: What are the categories outlined in the requiremen...



['QUESTION: When was the Product Portfolio Policy & Supplier Catalog last updated?  \n', ' The Product Portfolio Policy & Supplier Catalog was last updated in January 2025.']
[8/10] Generated: When was the Product Portfolio Policy & Supplier C...



['QUESTION: What certifications were held at the time of publication in 2021?  \n', ' None held at time of publication. EPD in planning.']
[9/10] Generated: What certifications were held at the time of publi...



['QUESTION: What is the compliance status of CPR System for products 32-101 and 32-102 as of January 2025?  \n', ' CPR System is non-compliant for products 32-101 and 32-102, as there is no EPD and only internal calculation has been performed.']
[10/10] Generated: What is the compliance status of CPR System for pr...

Total Synthetic Samples: 10
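
All ten responses parsed, but not all are useful: sample 7 answers "The context does not specify the categories outlined in the requirements," which tests nothing. A stricter parser could reject such pairs, as well as malformed responses, before they enter the dataset. The sketch below is one way to do it. `parse_qa` and the `REFUSAL_MARKERS` list are hypothetical and would need tuning against real generator output.

```python
import re

# Matches the "QUESTION: ... ANSWER: ..." format requested in the generator prompt.
QA_PATTERN = re.compile(
    r"QUESTION:\s*(?P<question>.+?)\s*ANSWER:\s*(?P<answer>.+)",
    re.DOTALL,
)

# Illustrative phrases suggesting the generator could not answer from the chunk.
REFUSAL_MARKERS = ("context does not specify", "not mentioned in the context")


def parse_qa(text: str):
    """Return (question, answer), or None if the response is malformed or unusable."""
    match = QA_PATTERN.search(text)
    if match is None:
        return None
    question = match.group("question").strip()
    answer = match.group("answer").strip()
    if not question or not answer:
        return None
    # Drop pairs where the "answer" is really an admission of missing context.
    if any(marker in answer.lower() for marker in REFUSAL_MARKERS):
        return None
    return question, answer
```

In the generation loop, a `None` result would be counted as a rejected sample instead of being appended to `synthetic_samples`.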


In [23]:
print("Sample Synthetic Q&A Pair:")
print("-" * 30)
print(f"Query: {synthetic_samples[1]['query']}")
print(f"Expected Answer: {synthetic_samples[1]['expected_answer']}")
print(f"Sources: {synthetic_samples[1]['sources']}")

Sample Synthetic Q&A Pair:
------------------------------
Query: What is the deadline for PrimePack AG to stop accepting products with intentionally added PFAS?
Expected Answer: Effective 1 July 2024, PrimePack AG will not accept new products containing intentionally added PFAS into the portfolio.
Sources: ['ART_internal_procurement_policy.pdf']
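
For the export step, the manual and synthetic samples can be merged into one file. The sketch below writes JSON Lines using the field names `user_input` and `reference`, which are assumed to match the single-turn sample schema of recent RAGAS releases. Verify them against the installed version before relying on this. `to_record` and `export_jsonl` are hypothetical helpers. Source file names are kept as plain metadata because they are document identifiers, not context text.

```python
import json


def to_record(sample: dict, origin: str) -> dict:
    """Map a notebook sample onto RAGAS-style field names (assumed, see note above)."""
    return {
        "user_input": sample["query"],
        "reference": sample["expected_answer"],
        "sources": list(sample["sources"]),  # kept as metadata only
        "category": sample.get("category", "synthetic"),
        "origin": origin,  # "manual" or "synthetic"
    }


def export_jsonl(manual: list[dict], synthetic: list[dict], path: str) -> int:
    """Write both sample sets to one JSONL file and return the record count."""
    records = [to_record(s, "manual") for s in manual]
    records += [to_record(s, "synthetic") for s in synthetic]
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            # ensure_ascii=False keeps characters such as "CO₂" readable.
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(records)
```

A call such as `export_jsonl(manual_samples, synthetic_samples, "validation_set.jsonl")` would produce one line per sample. The output file name here is only an example.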
