# Fair Forge Generators - Alquimia Example

This notebook demonstrates how to use the Fair Forge generators module with **Alquimia** to create synthetic test datasets from context documents.

## Overview

The generators module provides:
- **BaseGenerator**: Base class that accepts any LangChain-compatible chat model
- **AlquimiaGenerator**: Adapter that wraps Alquimia's agent API as a LangChain model
- **AlquimiaChatModel**: LangChain-compatible adapter for Alquimia agents
- **BaseContextLoader**: Abstract interface for loading and chunking context documents
- **LocalMarkdownLoader**: Implementation for loading local markdown files with hybrid chunking

## How AlquimiaGenerator Works

The `AlquimiaGenerator` wraps the Alquimia client as a LangChain-compatible model. It extracts `context`, `seed_examples`, and `num_queries` from the system prompt and passes them to the agent as extra-data keyword arguments.
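To make the extraction step concrete, here is a minimal sketch. It is not the library's code, and this notebook does not show the message format `AlquimiaGenerator` actually uses. The sketch assumes, hypothetically, that the system message body is a JSON object holding `context`, `num_queries`, and optionally `seed_examples`, and that messages arrive as LangChain-style `(role, content)` pairs. The helper name `extract_extra_data` is also hypothetical.

```python
import json
from typing import Any


def extract_extra_data(messages: list[tuple[str, str]]) -> dict[str, Any]:
    """Return the generation parameters stored in the first system message.

    Hypothetical format: the system message body is a JSON object with
    "context", "num_queries" and (optionally) "seed_examples" keys.
    """
    for role, content in messages:
        if role == "system":
            payload = json.loads(content)
            return {
                "context": payload["context"],
                "seed_examples": payload.get("seed_examples", []),
                "num_queries": int(payload["num_queries"]),
            }
    raise ValueError("No system message found")


# Example transcript as (role, content) pairs
messages = [
    ("system", json.dumps({"context": "Fair Forge is a performance-measurement library.", "num_queries": 3})),
    ("human", "Generate the queries."),
]
extra_data = extract_extra_data(messages)
# An adapter would then forward these values to the agent as keyword arguments
print(extra_data)
```

Keeping the parameters in a structured payload, rather than in free text, is what lets an adapter hand them to the agent as separate variables.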

## Installation
First, install Fair Forge with Alquimia support and required dependencies.

In [None]:
import sys

!uv pip install --python {sys.executable} --force-reinstall "$(ls ../../dist/*.whl)[generators-alquimia]" -q

## Setup

Import the required modules and configure your Alquimia credentials.

**Note:** The `AlquimiaGenerator` requires an agent configured in your Alquimia workspace. The `context`, `seed_examples`, and `num_queries` values are passed to the agent as extra data, which is injected into the agent's system prompt.

In [None]:
import json
import os
import sys
from pathlib import Path

sys.path.insert(0, os.path.dirname(os.getcwd()))

from fair_forge.generators import (
    RandomSamplingStrategy,
    create_alquimia_generator,
    create_markdown_loader,
)
from fair_forge.schemas import Dataset

In [None]:
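import getpass
import os

ALQUIMIA_API_KEY = getpass.getpass("Enter your Alquimia API key: ")
ALQUIMIA_URL = input("Enter Alquimia URL (default: https://api.alquimia.ai): ") or "https://api.alquimia.ai"
ALQUIMIA_AGENT_ID = input("Enter your Agent ID: ")
ALQUIMIA_CHANNEL_ID = input("Enter your Channel ID: ")

# Export the values so later cells that call os.getenv(...) pick them up
os.environ["ALQUIMIA_API_KEY"] = ALQUIMIA_API_KEY
os.environ["ALQUIMIA_URL"] = ALQUIMIA_URL
os.environ["ALQUIMIA_AGENT_ID"] = ALQUIMIA_AGENT_ID
os.environ["ALQUIMIA_CHANNEL_ID"] = ALQUIMIA_CHANNEL_ID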

## Step 1: Create Context Loader

The context loader reads source documents and splits them into chunks for query generation.

The `LocalMarkdownLoader` uses a hybrid chunking strategy:
1. **Primary**: Split by markdown headers (H1, H2, H3)
2. **Fallback**: Split by character count for long sections without headers
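To illustrate the idea behind the hybrid strategy, the following minimal sketch splits on headers first and cuts oversized sections into overlapping fixed-size windows. It is an assumption-based illustration, not `LocalMarkdownLoader`'s implementation. `SketchChunk`, `split_by_size`, and `hybrid_chunk` are hypothetical names. It skips `#` lines inside fenced code blocks, which a real loader would also need to do, and it ignores `min_chunk_size` entirely.

```python
import re
from dataclasses import dataclass
from typing import Optional, Sequence

FENCE = "`" * 3  # Markdown code-fence marker
HEADER_RE = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


@dataclass
class SketchChunk:
    header: Optional[str]
    content: str
    method: str  # "header" or "size"


def split_by_size(text: str, max_size: int, overlap: int) -> list[str]:
    """Fixed-size character windows; consecutive windows share `overlap` chars."""
    if not 0 <= overlap < max_size:
        raise ValueError("overlap must be in [0, max_size)")
    pieces, start = [], 0
    while True:
        end = start + max_size
        pieces.append(text[start:end])
        if end >= len(text):
            return pieces
        start = end - overlap


def hybrid_chunk(markdown: str, max_chunk_size: int = 2000, overlap: int = 100,
                 header_levels: Sequence[int] = (1, 2, 3)) -> list[SketchChunk]:
    # Primary pass: start a new section at each header of a selected level,
    # ignoring lines inside fenced code blocks (e.g. "# comment" in Python).
    sections: list[tuple[Optional[str], list[str]]] = []
    header: Optional[str] = None
    lines: list[str] = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else HEADER_RE.match(line)
        if match and len(match.group(1)) in header_levels:
            sections.append((header, lines))
            header, lines = match.group(2), []
        else:
            lines.append(line)
    sections.append((header, lines))

    # Fallback pass: sections longer than max_chunk_size are cut by size.
    result = []
    for section_header, body_lines in sections:
        body = "\n".join(body_lines).strip()
        if not body:
            continue
        if len(body) <= max_chunk_size:
            result.append(SketchChunk(section_header, body, "header"))
        else:
            for piece in split_by_size(body, max_chunk_size, overlap):
                result.append(SketchChunk(section_header, piece, "size"))
    return result


doc = (
    "# Intro\n\nFair Forge measures AI models.\n\n## Usage\n\n"
    + FENCE + "python\n# not a header\nprint('hi')\n" + FENCE + "\n"
)
for sketch in hybrid_chunk(doc):
    print(sketch.method, sketch.header, repr(sketch.content[:30]))
```

The overlap keeps a sentence that straddles a window boundary visible in both neighboring chunks. The cost is some duplicated text.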

In [3]:
# Create a context loader with explicit chunking settings
loader = create_markdown_loader(
    max_chunk_size=2000,  # Maximum characters per chunk
    min_chunk_size=200,  # Minimum characters per chunk
    overlap=100,  # Overlap between size-based chunks
    header_levels=[1, 2, 3],  # Split on H1, H2, H3 headers
)

print("Context loader created successfully")

2026-01-15 17:38:15.850 | INFO     | fair_forge.generators:create_markdown_loader:138 - Creating local markdown loader


Context loader created successfully


## Step 2: Create Sample Markdown Content

Let's create a sample markdown file to demonstrate the generator:

In [4]:
# Create sample markdown content
sample_content = """# Fair Forge Documentation

Fair Forge is a performance-measurement library for evaluating AI models and assistants.

## Key Features

The library provides comprehensive metrics for:
- **Fairness**: Measure bias across different demographic groups
- **Toxicity**: Detect harmful or offensive language
- **Conversational Quality**: Evaluate dialogue coherence and relevance
- **Context Adherence**: Check if responses align with provided context

## Getting Started

To get started with Fair Forge, install the package using pip:

```bash
pip install alquimia-fair-forge
```

Then create a retriever to load your test datasets and run metrics.

### Basic Usage

Here's a simple example of running the toxicity metric:

```python
from fair_forge.metrics import Toxicity

results = Toxicity.run(MyRetriever)
```

## Architecture

Fair Forge follows a modular architecture with the following components:

1. **Core**: Base classes and interfaces
2. **Metrics**: Individual metric implementations
3. **Runners**: Test execution against AI systems
4. **Storage**: Backend for test datasets and results

Each component can be extended to support custom implementations.
"""

# Save to file
sample_file = Path("./sample_docs.md")
sample_file.write_text(sample_content)
print(f"Sample content saved to: {sample_file}")

Sample content saved to: sample_docs.md


## Step 3: Load and Chunk Content

Let's see how the loader chunks the markdown content:

In [5]:
# Load and chunk the markdown file
chunks = loader.load(str(sample_file))

print(f"Created {len(chunks)} chunks:\n")
for i, chunk in enumerate(chunks, 1):
    print(f"Chunk {i}: {chunk.chunk_id}")
    print(f"  Header: {chunk.metadata.get('header', 'N/A')}")
    print(f"  Method: {chunk.metadata.get('chunking_method', 'N/A')}")
    print(f"  Length: {len(chunk.content)} chars")
    print(f"  Preview: {chunk.content[:80]}...\n")

2026-01-15 17:38:16.661 | INFO     | fair_forge.generators.context_loaders.local_markdown:load:275 - Loading 1 markdown file(s)
2026-01-15 17:38:16.663 | INFO     | fair_forge.generators.context_loaders.local_markdown:_load_single_file:137 - Loading markdown file: sample_docs.md
2026-01-15 17:38:16.665 | INFO     | fair_forge.generators.context_loaders.local_markdown:load:282 - Created 5 total chunks from 1 file(s)


Created 5 chunks:

Chunk 1: sample_docs_fair_forge_documentation
  Header: Fair Forge Documentation
  Method: header
  Length: 88 chars
  Preview: Fair Forge is a performance-measurement library for evaluating AI models and ass...

Chunk 2: sample_docs_key_features
  Header: Key Features
  Method: header
  Length: 309 chars
  Preview: The library provides comprehensive metrics for:
- **Fairness**: Measure bias acr...

Chunk 3: sample_docs_getting_started
  Header: Getting Started
  Method: header
  Length: 167 chars
  Preview: To get started with Fair Forge, install the package using pip:

```bash
pip inst...

Chunk 4: sample_docs_basic_usage
  Header: Basic Usage
  Method: header
  Length: 147 chars
  Preview: Here's a simple example of running the toxicity metric:

```python
from fair_for...

Chunk 5: sample_docs_architecture
  Header: Architecture
  Method: header
  Length: 335 chars
  Preview: Fair Forge follows a modular architecture with the following components:

1. **C...



## Step 4: Create Alquimia Generator

The `AlquimiaGenerator` wraps the Alquimia client as a LangChain-compatible model, allowing it to be used with the `BaseGenerator` interface.

In [6]:
# Create Alquimia generator using factory function
# NOTE: Set these environment variables or replace with your actual values
generator = create_alquimia_generator(
    base_url=os.getenv("ALQUIMIA_URL", "https://api.alquimia.ai"),
    api_key=os.getenv("ALQUIMIA_API_KEY", "your-api-key"),
    agent_id=os.getenv("ALQUIMIA_AGENT_ID", "your-agent-id"),
    channel_id=os.getenv("ALQUIMIA_CHANNEL_ID", "your-channel-id"),
)

print("Generator created successfully")
print(f"  Base URL: {generator.base_url}")
print(f"  Agent ID: {generator.agent_id}")

2026-01-15 17:38:16.997 | INFO     | fair_forge.generators:create_alquimia_generator:111 - Creating Alquimia generator


Generator created successfully
  Base URL: https://alquimia-hermes-alquimia-runtime.apps.rosa.alquimia.zvb4.p3.openshiftapps.com
  Agent ID: test-generator
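The cell above falls back to placeholder strings such as `"your-api-key"` when a variable is unset. Those placeholders only fail later, at request time. If you prefer environment variables to the interactive prompts used in Setup, a small fail-fast helper is a reasonable alternative. This is a sketch: `alquimia_settings_from_env` is a hypothetical helper, and only its keyword names come from the `create_alquimia_generator` call above.

```python
import os
from typing import Mapping, Optional

DEFAULT_ALQUIMIA_URL = "https://api.alquimia.ai"
REQUIRED_VARS = ("ALQUIMIA_API_KEY", "ALQUIMIA_AGENT_ID", "ALQUIMIA_CHANNEL_ID")


def alquimia_settings_from_env(env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Collect create_alquimia_generator keyword arguments, failing fast on missing values."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "base_url": env.get("ALQUIMIA_URL") or DEFAULT_ALQUIMIA_URL,
        "api_key": env["ALQUIMIA_API_KEY"],
        "agent_id": env["ALQUIMIA_AGENT_ID"],
        "channel_id": env["ALQUIMIA_CHANNEL_ID"],
    }


# Usage in this notebook, once the variables are set:
# generator = create_alquimia_generator(**alquimia_settings_from_env())
```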


## Step 5: Generate Queries from Single Chunk

Let's generate queries for a single chunk first:

In [7]:
# Generate queries for a single chunk
async def generate_from_chunk():
    chunk = chunks[0]  # Use first chunk
    print(f"Generating queries for chunk: {chunk.chunk_id}")
    print(f"Content preview: {chunk.content[:100]}...\n")

    queries = await generator.generate_queries(
        chunk=chunk,
        num_queries=3,
    )

    print(f"Generated {len(queries)} queries:\n")
    for i, q in enumerate(queries, 1):
        print(f"{i}. {q.query}")
        print(f"   Difficulty: {q.difficulty}")
        print(f"   Type: {q.query_type}\n")

    return queries


# Run the coroutine (top-level await works in Jupyter/IPython)
queries = await generate_from_chunk()

2026-01-15 17:38:17.306 | DEBUG    | fair_forge.generators.alquimia_generator:generate_queries:493 - Generating 3 queries for chunk sample_docs_fair_forge_documentation


Generating queries for chunk: sample_docs_fair_forge_documentation
Content preview: Fair Forge is a performance-measurement library for evaluating AI models and assistants....



2026-01-15 17:38:18,328 - httpx - INFO - HTTP Request: POST https://alquimia-hermes-alquimia-runtime.apps.rosa.alquimia.zvb4.p3.openshiftapps.com/event/infer/chat/test-generator?chat_history=50&agentspace=_default "HTTP/1.1 200 OK"
2026-01-15 17:38:18,525 - httpx - INFO - HTTP Request: GET https://alquimia-hermes-alquimia-runtime.apps.rosa.alquimia.zvb4.p3.openshiftapps.com/event/stream/task-f9df89f3acc94af28814c3519ad0f867?response_only=true "HTTP/1.1 200 OK"
2026-01-15 17:38:18.739 | DEBUG    | fair_forge.generators.alquimia_generator:generate_queries:518 - Generated 3 queries for chunk sample_docs_fair_forge_documentation


Generated 3 queries:

1. What is the primary purpose of Fair Forge, a performance-measurement library?
   Difficulty: easy
   Type: factual

2. How might Fair Forge be used to evaluate the performance of an AI model that has been trained on a dataset of customer service chat logs?
   Difficulty: medium
   Type: analytical

3. Compare and contrast Fair Forge with other performance-measurement libraries used for AI model evaluation. What advantages does Fair Forge offer over its competitors?
   Difficulty: hard
   Type: comparative
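Each generated query carries `query`, `difficulty`, and `query_type` fields. The raw agent reply printed in the conversation-mode output later in this notebook is a JSON object with a `queries` list and a `chunk_summary`. If you call an agent yourself, you may want to validate replies of that shape. The sketch below does so with the standard library. It is not the library's parser, and treating `easy`/`medium`/`hard` as the complete set of difficulties is an assumption based only on the outputs shown here.

```python
import json
from dataclasses import dataclass

# Difficulty labels observed in this notebook's outputs (assumed to be the full set)
KNOWN_DIFFICULTIES = {"easy", "medium", "hard"}


@dataclass
class GeneratedQuery:
    query: str
    difficulty: str
    query_type: str


def parse_agent_response(raw: str) -> list[GeneratedQuery]:
    """Validate a raw agent reply of the form {"queries": [...], "chunk_summary": ...}."""
    payload = json.loads(raw)
    parsed = []
    for item in payload["queries"]:
        query = GeneratedQuery(item["query"], item["difficulty"], item["query_type"])
        if query.difficulty not in KNOWN_DIFFICULTIES:
            raise ValueError(f"Unexpected difficulty: {query.difficulty!r}")
        parsed.append(query)
    return parsed


raw_reply = json.dumps({
    "queries": [
        {"query": "What is the primary purpose of Fair Forge?", "difficulty": "easy", "query_type": "factual"},
        {"query": "How does Fair Forge help evaluate AI assistants?", "difficulty": "medium", "query_type": "inferential"},
    ],
    "chunk_summary": "Fair Forge is a performance-measurement library.",
})
for q in parse_agent_response(raw_reply):
    print(f"[{q.difficulty}/{q.query_type}] {q.query}")
```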



## Step 6: Generate Complete Dataset

Generate a complete test dataset from all chunks:

In [8]:
# Generate complete dataset
async def generate_full_dataset():
    print("Generating complete dataset from markdown file...\n")

    # generate_dataset returns list[Dataset]
    datasets = await generator.generate_dataset(
        context_loader=loader,
        source=str(sample_file),
        assistant_id="test-assistant",
        num_queries_per_chunk=3,
        language="english",
    )

    # With default SequentialStrategy, we get one dataset
    dataset = datasets[0]

    print(f"Generated {len(datasets)} dataset(s):")
    print(f"  Session ID: {dataset.session_id}")
    print(f"  Assistant ID: {dataset.assistant_id}")
    print(f"  Language: {dataset.language}")
    print(f"  Total queries: {len(dataset.conversation)}")
    print(f"  Context length: {len(dataset.context)} chars\n")

    print("Sample queries:")
    for batch in dataset.conversation[:5]:
        print(f"  - [{batch.qa_id}] {batch.query}")

    return datasets


# Run the coroutine (top-level await works in Jupyter/IPython)
datasets = await generate_full_dataset()

2026-01-15 17:38:18.749 | INFO     | fair_forge.schemas.generators:generate_dataset:525 - Loading context from: sample_docs.md
2026-01-15 17:38:18.749 | INFO     | fair_forge.generators.context_loaders.local_markdown:load:275 - Loading 1 markdown file(s)
2026-01-15 17:38:18.750 | INFO     | fair_forge.generators.context_loaders.local_markdown:_load_single_file:137 - Loading markdown file: sample_docs.md
2026-01-15 17:38:18.752 | INFO     | fair_forge.generators.context_loaders.local_markdown:load:282 - Created 5 total chunks from 1 file(s)
2026-01-15 17:38:18.753 | INFO     | fair_forge.schemas.generators:generate_dataset:527 - Loaded 5 chunks from source
2026-01-15 17:38:18.754 | INFO     | fair_forge.schema

Generating complete dataset from markdown file...



2026-01-15 17:38:19,761 - httpx - INFO - HTTP Request: POST https://alquimia-hermes-alquimia-runtime.apps.rosa.alquimia.zvb4.p3.openshiftapps.com/event/infer/chat/test-generator?chat_history=50&agentspace=_default "HTTP/1.1 200 OK"
2026-01-15 17:38:19,966 - httpx - INFO - HTTP Request: GET https://alquimia-hermes-alquimia-runtime.apps.rosa.alquimia.zvb4.p3.openshiftapps.com/event/stream/task-d16f602bf06f4bedb33922b80710a41d?response_only=true "HTTP/1.1 200 OK"
2026-01-15 17:38:20.275 | DEBUG    | fair_forge.generators.alquimia_generator:generate_queries:518 - Generated 3 queries for chunk sample_docs_fair_forge_documentation
2026-01-15 17:38:20.276 | DEBUG    | fair_forge.generators.alquimia_generator:generate_queries:493 - Generating 3 queries for chunk sample_docs_key_features
2026-01-15 17:38:21,093 - httpx - INFO - HTTP Request: POST https://alquimia-hermes-al

Generated 1 dataset(s):
  Session ID: 63ced9b6-0b65-4e87-8014-17d6dbe721cd
  Assistant ID: test-assistant
  Language: english
  Total queries: 15
  Context length: 1054 chars

Sample queries:
  - [sample_docs_fair_forge_documentation_q1] Explain how Fair Forge can be used to evaluate the performance of AI models.
  - [sample_docs_fair_forge_documentation_q2] What are some potential advantages of using Fair Forge over other performance-measurement libraries?
  - [sample_docs_fair_forge_documentation_q3] Describe a scenario where Fair Forge would be particularly useful for evaluating an AI assistant.
  - [sample_docs_key_features_q1] What are some of the key metrics provided by the library for evaluating AI performance?
  - [sample_docs_key_features_q2] How might the library's fairness metric be used to identify biases in a chatbot's responses to users from different age groups?
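The sample output shows `qa_id` values of the form `<chunk_id>_q<n>`. Assuming that pattern always holds, which is an observation from this run rather than a documented guarantee, you can count how many queries each chunk contributed. `group_qa_ids_by_chunk` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Iterable


def group_qa_ids_by_chunk(qa_ids: Iterable[str]) -> dict[str, list[int]]:
    """Map each chunk_id to its query numbers, assuming ids look like '<chunk_id>_q<n>'."""
    groups: dict[str, list[int]] = defaultdict(list)
    for qa_id in qa_ids:
        chunk_id, sep, number = qa_id.rpartition("_q")
        if not sep or not number.isdigit():
            raise ValueError(f"Unrecognized qa_id format: {qa_id!r}")
        groups[chunk_id].append(int(number))
    return dict(groups)


ids = [
    "sample_docs_fair_forge_documentation_q1",
    "sample_docs_fair_forge_documentation_q2",
    "sample_docs_key_features_q1",
]
print(group_qa_ids_by_chunk(ids))
# In this notebook: group_qa_ids_by_chunk(batch.qa_id for batch in datasets[0].conversation)
```

`rpartition` splits on the last `_q`, so chunk ids that themselves contain `_q` elsewhere are still handled.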


## Step 7: Generate with Seed Examples

Guide the query generation style using seed examples:

In [21]:
# Generate with seed examples for style guidance
async def generate_with_seeds():
    seed_examples = [
        "What are the main components of Fair Forge's architecture?",
        "How can I measure bias in my AI assistant's responses?",
        "What steps are needed to integrate Fair Forge with an existing pipeline?",
    ]

    print("Generating with seed examples...")
    print(f"Seed examples provided: {len(seed_examples)}\n")

    datasets = await generator.generate_dataset(
        context_loader=loader,
        source=str(sample_file),
        assistant_id="test-assistant",
        num_queries_per_chunk=2,
        language="english",
        seed_examples=seed_examples,
    )

    dataset = datasets[0]
    print(f"Generated {len(dataset.conversation)} queries")
    return datasets


# Run the coroutine (top-level await works in Jupyter/IPython)
datasets_with_seeds = await generate_with_seeds()

2026-01-15 17:42:43.776 | INFO     | fair_forge.schemas.generators:generate_dataset:525 - Loading context from: sample_docs.md
2026-01-15 17:42:43.777 | INFO     | fair_forge.generators.context_loaders.local_markdown:load:275 - Loading 1 markdown file(s)
2026-01-15 17:42:43.777 | INFO     | fair_forge.generators.context_loaders.local_markdown:_load_single_file:137 - Loading markdown file: sample_docs.md
2026-01-15 17:42:43.778 | INFO     | fair_forge.generators.context_loaders.local_markdown:load:282 - Created 5 total chunks from 1 file(s)
2026-01-15 17:42:43.779 | INFO     | fair_forge.schemas.generators:generate_dataset:527 - Loaded 5 chunks from source
2026-01-15 17:42:43.780 | INFO     | fair_forge.schema

Generating with seed examples...
Seed examples provided: 3



2026-01-15 17:42:44,679 - httpx - INFO - HTTP Request: POST https://alquimia-hermes-alquimia-runtime.apps.rosa.alquimia.zvb4.p3.openshiftapps.com/event/infer/chat/test-generator?chat_history=50&agentspace=_default "HTTP/1.1 200 OK"
2026-01-15 17:42:44,884 - httpx - INFO - HTTP Request: GET https://alquimia-hermes-alquimia-runtime.apps.rosa.alquimia.zvb4.p3.openshiftapps.com/event/stream/task-8a813f3f1e204cfcb1246fd452640279?response_only=true "HTTP/1.1 200 OK"
2026-01-15 17:42:45.089 | DEBUG    | fair_forge.generators.alquimia_generator:generate_queries:518 - Generated 2 queries for chunk sample_docs_fair_forge_documentation
2026-01-15 17:42:45.090 | DEBUG    | fair_forge.generators.alquimia_generator:generate_queries:493 - Generating 2 queries for chunk sample_docs_key_features
2026-01-15 17:42:46,112 - httpx - INFO - HTTP Request: POST https://alquimia-hermes-al

Generated 10 queries


## Chunk Selection Strategies

Strategies control how chunks are selected and grouped during generation.

### RandomSamplingStrategy

Randomly samples chunks multiple times to create diverse test datasets:

In [16]:
async def generate_with_random_sampling():
    """Generate multiple datasets using random chunk sampling."""

    strategy = RandomSamplingStrategy(
        num_samples=2,  # Create 2 datasets
        chunks_per_sample=2,  # Each with 2 random chunks
        seed=42,  # For reproducibility
    )

    print(f"Strategy: {strategy}\n")

    datasets = await generator.generate_dataset(
        context_loader=loader,
        source=str(sample_file),
        assistant_id="test-assistant",
        num_queries_per_chunk=2,
        selection_strategy=strategy,
    )

    print(f"Generated {len(datasets)} datasets:\n")
    for i, ds in enumerate(datasets):
        print(f"Dataset {i+1}: {len(ds.conversation)} queries")

    return datasets


# Run the coroutine (top-level await works in Jupyter/IPython)
random_datasets = await generate_with_random_sampling()

2026-01-15 17:39:14.168 | INFO     | fair_forge.schemas.generators:generate_dataset:525 - Loading context from: sample_docs.md
2026-01-15 17:39:14.170 | INFO     | fair_forge.generators.context_loaders.local_markdown:load:275 - Loading 1 markdown file(s)
2026-01-15 17:39:14.171 | INFO     | fair_forge.generators.context_loaders.local_markdown:_load_single_file:137 - Loading markdown file: sample_docs.md
2026-01-15 17:39:14.172 | INFO     | fair_forge.generators.context_loaders.local_markdown:load:282 - Created 5 total chunks from 1 file(s)
2026-01-15 17:39:14.173 | INFO     | fair_forge.schemas.generators:generate_dataset:527 - Loaded 5 chunks from source
2026-01-15 17:39:14.174 | INFO     | fair_forge.schema

Strategy: RandomSamplingStrategy(num_samples=2, chunks_per_sample=2, seed=42)



2026-01-15 17:39:15,195 - httpx - INFO - HTTP Request: POST https://alquimia-hermes-alquimia-runtime.apps.rosa.alquimia.zvb4.p3.openshiftapps.com/event/infer/chat/test-generator?chat_history=50&agentspace=_default "HTTP/1.1 200 OK"
2026-01-15 17:39:15,375 - httpx - INFO - HTTP Request: GET https://alquimia-hermes-alquimia-runtime.apps.rosa.alquimia.zvb4.p3.openshiftapps.com/event/stream/task-d8fcf3509e084a2aab91508210c3d904?response_only=true "HTTP/1.1 200 OK"
2026-01-15 17:39:15.556 | DEBUG    | fair_forge.generators.alquimia_generator:generate_queries:518 - Generated 2 queries for chunk sample_docs_fair_forge_documentation
2026-01-15 17:39:15.557 | DEBUG    | fair_forge.generators.alquimia_generator:generate_queries:493 - Generating 2 queries for chunk sample_docs_architecture
2026-01-15 17:39:16,455 - httpx - INFO - HTTP Request: POST https://alquimia-hermes-al

Generated 2 datasets:

Dataset 1: 4 queries
Dataset 2: 4 queries
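Conceptually, seeded random sampling draws several groups of chunks from one reproducible random generator. The following minimal sketch reads `num_samples`, `chunks_per_sample`, and `seed` that way. It is an assumption, not `RandomSamplingStrategy`'s code. In particular, whether the library lets groups overlap is not shown here, and the sketch will not reproduce the chunks the library picked for `seed=42`. `sample_chunk_groups` is a hypothetical name.

```python
import random
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_chunk_groups(chunks: Sequence[T], num_samples: int, chunks_per_sample: int,
                        seed: Optional[int] = None) -> list[list[T]]:
    """Draw num_samples groups; each group has distinct chunks, but groups may overlap."""
    if chunks_per_sample > len(chunks):
        raise ValueError("chunks_per_sample cannot exceed the number of chunks")
    rng = random.Random(seed)  # private RNG so the global random state is untouched
    return [rng.sample(list(chunks), chunks_per_sample) for _ in range(num_samples)]


chunk_ids = ["documentation", "key_features", "getting_started", "basic_usage", "architecture"]
print(sample_chunk_groups(chunk_ids, num_samples=2, chunks_per_sample=2, seed=42))
```

Using a dedicated `random.Random(seed)` instance, rather than `random.seed`, keeps results reproducible without affecting other code that uses the module-level generator.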


## Conversation Mode

Generate coherent multi-turn conversations where each question builds on the previous:

In [19]:
async def generate_conversations():
    """Generate coherent multi-turn conversations."""

    print("Generating conversations (each turn builds on the previous)...\n")

    datasets = await generator.generate_dataset(
        context_loader=loader,
        source=str(sample_file),
        assistant_id="test-assistant",
        num_queries_per_chunk=3,  # 3-turn conversations
        conversation_mode=True,  # Enable conversation mode
    )

    dataset = datasets[0]
    print(f"Generated {len(dataset.conversation)} conversation turns:\n")

    # Group by chunk to show conversation flow
    current_chunk = None
    for batch in dataset.conversation:
        chunk_id = batch.agentic.get("chunk_id", "N/A")
        turn_num = batch.agentic.get("turn_number", 0)

        if chunk_id != current_chunk:
            print(f"\n--- Conversation for: {chunk_id} ---")
            current_chunk = chunk_id

        print(f"  Turn {turn_num}: {batch.query}")

    return datasets


# Run the coroutine (top-level await works in Jupyter/IPython)
conversation_datasets = await generate_conversations()

2026-01-15 17:40:01.974 | INFO     | fair_forge.schemas.generators:generate_dataset:525 - Loading context from: sample_docs.md
2026-01-15 17:40:01.975 | INFO     | fair_forge.generators.context_loaders.local_markdown:load:275 - Loading 1 markdown file(s)
2026-01-15 17:40:01.976 | INFO     | fair_forge.generators.context_loaders.local_markdown:_load_single_file:137 - Loading markdown file: sample_docs.md
2026-01-15 17:40:01.977 | INFO     | fair_forge.generators.context_loaders.local_markdown:load:282 - Created 5 total chunks from 1 file(s)
2026-01-15 17:40:01.977 | INFO     | fair_forge.schemas.generators:generate_dataset:527 - Loaded 5 chunks from source
2026-01-15 17:40:01.978 | INFO     | fair_forge.schema

Generating conversations (each turn builds on the previous)...



2026-01-15 17:40:03,293 - httpx - INFO - HTTP Request: POST https://alquimia-hermes-alquimia-runtime.apps.rosa.alquimia.zvb4.p3.openshiftapps.com/event/infer/chat/test-generator?chat_history=50&agentspace=_default "HTTP/1.1 200 OK"
2026-01-15 17:40:03,492 - httpx - INFO - HTTP Request: GET https://alquimia-hermes-alquimia-runtime.apps.rosa.alquimia.zvb4.p3.openshiftapps.com/event/stream/task-4d60dbf5d38348ed9a083cec55aa7c24?response_only=true "HTTP/1.1 200 OK"
2026-01-15 17:40:03.804 | DEBUG    | fair_forge.generators.alquimia_generator:generate_conversation:575 - Generated 1 turns for chunk sample_docs_fair_forge_documentation
2026-01-15 17:40:03.805 | DEBUG    | fair_forge.generators.alquimia_generator:generate_conversation:549 - Generating 3-turn conversation for chunk sample_docs_key_features
2026-01-15 17:40:04,643 - httpx - INFO - HTTP Request: POST https://

Generated 5 conversation turns:


--- Conversation for: sample_docs_fair_forge_documentation ---
  Turn 1: {
    "queries": [
        {
            "query": "What is the primary purpose of Fair Forge?",
            "difficulty": "easy",
            "query_type": "factual"
        },
        {
            "query": "How does Fair Forge contribute to the evaluation of AI models and assistants?",
            "difficulty": "medium",
            "query_type": "inferential"
        },
        {
            "query": "Design a scenario where Fair Forge could be used to compare the performance of two AI-powered virtual assistants.",
            "difficulty": "hard",
            "query_type": "analytical"
        }
    ],
    "chunk_summary": "Fair Forge is a performance-measurement library used for evaluating AI models and assistants, enabling accurate and comprehensive assessments of their capabilities."
}

--- Conversation for: sample_docs_key_features ---
  Turn 1: {
    "queries": [
        
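Conversation mode differs from plain query generation because each new question is generated with the earlier turns as input. The minimal sketch below shows that loop. `next_question` is a hypothetical stand-in for the model call, `scripted_questions` is a deterministic fake used only for illustration, and the `turn_number` key mirrors the metadata read from `batch.agentic` in the cell above. None of this is the library's implementation.

```python
from typing import Callable

# Hypothetical signature: (chunk_text, previous_turns) -> next question
QuestionFn = Callable[[str, list[dict]], str]


def build_conversation(chunk_text: str, num_turns: int, next_question: QuestionFn) -> list[dict]:
    """Generate turns one at a time, passing all previous turns to each call."""
    turns: list[dict] = []
    for turn_number in range(1, num_turns + 1):
        query = next_question(chunk_text, turns)
        turns.append({"turn_number": turn_number, "query": query})
    return turns


def scripted_questions(chunk_text: str, previous: list[dict]) -> str:
    """Deterministic stand-in for an LLM call, for illustration only."""
    if not previous:
        return "What is Fair Forge?"
    return f"Following up on turn {previous[-1]['turn_number']}, can you give an example?"


for turn in build_conversation("Fair Forge is a performance-measurement library.", 3, scripted_questions):
    print(f"Turn {turn['turn_number']}: {turn['query']}")
```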

## Note on Custom System Prompts

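**Important:** The `AlquimiaGenerator` does not support custom system prompts in the same way as direct LangChain models, because the agent's system prompt is configured in the Alquimia workspace. The conversation-mode run above illustrates the consequence. The logs report 1 turn per chunk (5 turns instead of 15), and each "turn" contains the agent's raw query-generation JSON. This suggests the agent's workspace prompt was written for single-shot query generation rather than multi-turn conversations.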

Instead, you can:
1. Use **seed examples** to guide the style of generated queries
2. Configure the agent's system prompt directly in your Alquimia workspace to accept `context`, `num_queries`, and `seed_examples` as template variables
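As an illustration of option 2, the sketch below shows a system prompt built around the three template variables and how they get filled. It uses Python `str.format` placeholders purely for demonstration; Alquimia's actual template syntax is not shown in this notebook and may differ. `SYSTEM_TEMPLATE` and `render_system_prompt` are hypothetical names.

```python
# Illustrative only: Alquimia's real template syntax may differ.
SYSTEM_TEMPLATE = """You write test questions for an AI assistant.
Write exactly {num_queries} questions that can be answered from the context below.

Context:
{context}

Match the style of these example questions:
{seed_examples}"""


def render_system_prompt(context: str, num_queries: int, seed_examples: list[str]) -> str:
    """Fill the three template variables the agent is expected to receive."""
    examples = "\n".join(f"- {example}" for example in seed_examples) or "(none provided)"
    return SYSTEM_TEMPLATE.format(context=context, num_queries=num_queries, seed_examples=examples)


print(render_system_prompt(
    context="Fair Forge is a performance-measurement library for evaluating AI models and assistants.",
    num_queries=2,
    seed_examples=["How can I measure bias in my AI assistant's responses?"],
))
```

Braces inside the substituted values are not re-parsed by `str.format`, so context containing JSON or code is inserted safely.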

For full control over the system prompt, use a LangChain model directly with `BaseGenerator` (see Groq example notebook).

## Step 8: Save Generated Dataset

Save the generated dataset to JSON for use with runners and metrics:

In [23]:
# Save a generated dataset to JSON (no awaiting needed, so a plain function)
def save_dataset(dataset: Dataset, output_path: str) -> Path:
    output_file = Path(output_path)

    with open(output_file, "w", encoding="utf-8") as f:
        # mode="json" makes every field JSON-serializable (e.g. datetimes become strings)
        json.dump(dataset.model_dump(mode="json"), f, indent=2)

    print(f"Dataset saved to: {output_file}")
    print(f"Total queries: {len(dataset.conversation)}")
    return output_file


# Save the dataset generated in Step 6
save_dataset(datasets[0], "./generated_tests.json")

Dataset saved to: generated_tests.json
Total queries: 15


PosixPath('generated_tests.json')
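Strategies such as `RandomSamplingStrategy` return several datasets. JSON Lines (one record per line) is a convenient way to store them together. The sketch below assumes each dataset is a Pydantic v2 model, which its `model_dump()` method suggests. `save_jsonl` and `load_jsonl` are hypothetical helpers, and the reload step via `Dataset.model_validate` rests on the same Pydantic assumption.

```python
import json
from pathlib import Path
from typing import Any, Iterable


def save_jsonl(models: Iterable[Any], output_path: str) -> int:
    """Write one JSON record per line; each item must provide Pydantic v2's model_dump()."""
    count = 0
    with Path(output_path).open("w", encoding="utf-8") as f:
        for model in models:
            # mode="json" converts values such as datetimes into JSON-safe types
            f.write(json.dumps(model.model_dump(mode="json")) + "\n")
            count += 1
    return count


def load_jsonl(input_path: str) -> list[dict]:
    """Read the records back as plain dicts, skipping blank lines."""
    with Path(input_path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# In this notebook:
# save_jsonl(random_datasets, "./random_datasets.jsonl")
# restored = [Dataset.model_validate(r) for r in load_jsonl("./random_datasets.jsonl")]
```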

## Step 9: Integration with Runners

Use the generated dataset with Fair Forge runners:

In [14]:
# Example integration with runners
# from fair_forge.runners import AlquimiaRunner
# from fair_forge.storage import create_local_storage


async def run_generated_tests(dataset: Dataset):
    """
    Example of running generated tests against an AI assistant.

    Uncomment and configure to use.
    """
    # # Configure runner
    # runner = AlquimiaRunner(
    #     base_url=os.getenv("ALQUIMIA_URL"),
    #     api_key=os.getenv("ALQUIMIA_API_KEY"),
    #     agent_id=os.getenv("ALQUIMIA_AGENT_ID"),
    #     channel_id=os.getenv("ALQUIMIA_CHANNEL_ID"),
    # )
    #
    # # Run dataset
    # updated_dataset, summary = await runner.run_dataset(dataset)
    #
    # print(f"Completed: {summary['successes']}/{summary['total_batches']} passed")
    # return updated_dataset


print("Integration example ready (uncomment to use)")

Integration example ready (uncomment to use)


## Creating Custom Context Loaders

You can create custom context loaders by extending `BaseContextLoader`:

In [15]:
import json
from pathlib import Path

from fair_forge.schemas.generators import BaseContextLoader, Chunk


class JsonContextLoader(BaseContextLoader):
    """Example custom loader for JSON documents: one chunk per top-level key."""

    def load(self, source: str) -> list[Chunk]:
        """Load a JSON object from `source` and turn each top-level key into a chunk."""
        path = Path(source)
        with path.open(encoding="utf-8") as f:
            data = json.load(f)

        # Only a top-level JSON object has keys to split on
        if not isinstance(data, dict):
            raise ValueError(f"Expected a JSON object at the top level of {path}")

        chunks = []
        for key, value in data.items():
            content = f"{key}: {json.dumps(value, indent=2)}"
            chunks.append(
                Chunk(
                    content=content,
                    chunk_id=f"json_{key}",
                    metadata={"key": key, "source": str(path)},
                )
            )

        return chunks


print("Custom JsonContextLoader defined")

Custom JsonContextLoader defined


## Cleanup

In [24]:
# Clean up sample files
if sample_file.exists():
    sample_file.unlink()
    print("Sample files cleaned up")

Sample files cleaned up
