# Fair Forge Generators - Groq Example

This notebook demonstrates how to use the Fair Forge generators module with **Groq Cloud** for ultra-fast synthetic test dataset generation.

## Overview

The `BaseGenerator` class accepts any LangChain-compatible chat model, including Groq's `ChatGroq`. Groq serves open-source LLMs with low-latency inference, which keeps dataset generation fast.

### Why Groq?
- **Speed**: Up to 10x faster than traditional cloud providers
- **Cost**: Competitive pricing for high-volume usage
- **Models**: Access to popular open-source models (Llama 3, Mixtral, Gemma)

## Installation
First, install Fair Forge with the `generators` extra, plus `langchain-groq` for the Groq chat model.

In [None]:
!pip install "alquimia-fair-forge[generators]" langchain-groq -q

## Setup

In [1]:
import json
import time
from pathlib import Path

from dotenv import load_dotenv
from langchain_groq import ChatGroq

from fair_forge.generators import (
    BaseGenerator,
    # Strategies for chunk selection
    RandomSamplingStrategy,
    create_markdown_loader,
)

# Load environment variables (ChatGroq reads GROQ_API_KEY from the environment)
load_dotenv()
print("Imports loaded successfully")


Imports loaded successfully
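
`ChatGroq` reads its API key from the `GROQ_API_KEY` environment variable, which `load_dotenv()` can populate from a local `.env` file. The following is a minimal sketch of an early check, so that a missing key fails at setup rather than on the first API call. The helper name `require_env` is hypothetical and is not part of Fair Forge or LangChain.

```python
import os


def require_env(name: str, env=None) -> str:
    """Return the value of an environment variable or raise a clear error."""
    # Fall back to the real process environment when no mapping is given
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file or shell")
    return value


# Demonstration with an explicit mapping instead of the real environment
demo_env = {"GROQ_API_KEY": "gsk_example"}
print(require_env("GROQ_API_KEY", demo_env))
```

In the notebook itself, calling `require_env("GROQ_API_KEY")` right after `load_dotenv()` would check the real environment.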


## Create Sample Content

Let's create a sample markdown document for testing:

In [2]:
sample_content = """# Machine Learning Fundamentals

This guide covers the basics of machine learning for beginners.

## Types of Machine Learning

Machine learning can be categorized into three main types:

### Supervised Learning
- Uses labeled training data
- Predicts outcomes based on input features
- Examples: Classification, Regression

### Unsupervised Learning
- Works with unlabeled data
- Discovers hidden patterns and structures
- Examples: Clustering, Dimensionality Reduction

### Reinforcement Learning
- Agent learns through interaction with environment
- Maximizes cumulative reward
- Examples: Game playing, Robotics

## Model Evaluation

Key metrics for evaluating ML models:

- **Accuracy**: Proportion of correct predictions
- **Precision**: True positives among predicted positives
- **Recall**: True positives among actual positives
- **F1 Score**: Harmonic mean of precision and recall

## Best Practices

1. Split data into train/validation/test sets
2. Use cross-validation for robust evaluation
3. Monitor for overfitting
4. Document your experiments
"""

# Save to file
sample_file = Path("./ml_fundamentals.md")
sample_file.write_text(sample_content, encoding="utf-8")
print(f"Sample content saved to: {sample_file}")

Sample content saved to: ml_fundamentals.md


## Create Context Loader

In [3]:
# Create markdown loader
loader = create_markdown_loader(
    max_chunk_size=2000,
    header_levels=[1, 2, 3],
)

# Preview chunks
chunks = loader.load(str(sample_file))
print(f"Created {len(chunks)} chunks:\n")
for chunk in chunks:
    print(f"- {chunk.chunk_id}: {len(chunk.content)} chars")

2026-01-15 17:10:17.654 | INFO     | fair_forge.generators:create_markdown_loader:141 - Creating local markdown loader
2026-01-15 17:10:17.656 | INFO     | fair_forge.generators.context_loaders.local_markdown:load:275 - Loading 1 markdown file(s)
2026-01-15 17:10:17.657 | INFO     | fair_forge.generators.context_loaders.local_markdown:_load_single_file:137 - Loading markdown file: ml_fundamentals.md
2026-01-15 17:10:17.660 | INFO     | fair_forge.generators.context_loaders.local_markdown:load:282 - Created 7 total chunks from 1 file(s)


Created 7 chunks:

- ml_fundamentals_machine_learning_fundamentals: 63 chars
- ml_fundamentals_types_of_machine_learning: 58 chars
- ml_fundamentals_supervised_learning: 111 chars
- ml_fundamentals_unsupervised_learning: 119 chars
- ml_fundamentals_reinforcement_learning: 116 chars
- ml_fundamentals_model_evaluation: 252 chars
- ml_fundamentals_best_practices: 147 chars
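
The chunk IDs above suggest that the loader splits the document at each header of the configured levels and names each chunk after the file stem plus a slug of the header text. The sketch below illustrates that idea with the standard library only. It is a simplified illustration under that assumption, not Fair Forge's implementation; `split_by_headers` is a hypothetical name, and the sketch ignores `max_chunk_size` and headers that appear inside code fences.

```python
import re


def split_by_headers(text: str, stem: str, levels=(1, 2, 3)) -> list[tuple[str, str]]:
    """Split markdown into (chunk_id, content) pairs at the given header levels."""
    header_re = re.compile(r"^(#{1,6})\s+(.*)$")
    chunks = []
    title, body = None, []

    def flush():
        # Emit the section collected so far, if a header has been seen
        if title is not None:
            slug = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
            chunks.append((f"{stem}_{slug}", "\n".join(body).strip()))

    for line in text.splitlines():
        match = header_re.match(line)
        if match and len(match.group(1)) in levels:
            flush()  # close the previous section before starting a new one
            title, body = match.group(2), []
        elif title is not None:
            body.append(line)
    flush()  # close the final section
    return chunks


demo = "# Guide\n\nIntro text.\n\n## Part One\n- item\n\n### Detail\nMore text.\n"
for chunk_id, content in split_by_headers(demo, stem="guide"):
    print(f"- {chunk_id}: {len(content)} chars")
```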


## Create Generator with Groq Model

The `BaseGenerator` accepts any LangChain-compatible chat model. Here we use `ChatGroq` from `langchain-groq`.

In [4]:
# Create Groq model using LangChain
model = ChatGroq(
    model="llama-3.1-8b-instant",  # Fast model for demos
    temperature=0.4,
    max_tokens=2048,
)

# Create generator with the model
generator = BaseGenerator(
    model=model,
    use_structured_output=True,
)

print(f"Generator created with model: {model.model_name}")

Generator created with model: llama-3.1-8b-instant


## Generate Test Dataset

Groq's fast inference keeps generation short: in the run below, generating queries for all 7 chunks took about 5 seconds.

In [5]:
async def generate_dataset():
    print("Generating test dataset with Groq...\n")

    start_time = time.perf_counter()  # monotonic clock, suited to measuring durations

    # generate_dataset returns list[Dataset]
    datasets = await generator.generate_dataset(
        context_loader=loader,
        source=str(sample_file),
        assistant_id="ml-assistant",
        num_queries_per_chunk=3,
        language="english",
    )

    elapsed = time.perf_counter() - start_time

    # With default SequentialStrategy, we get one dataset
    dataset = datasets[0]

    print(f"Generated {len(datasets)} dataset(s) in {elapsed:.2f} seconds:")
    print(f"  Session ID: {dataset.session_id}")
    print(f"  Total queries: {len(dataset.conversation)}\n")

    print("Generated queries:")
    for batch in dataset.conversation:
        difficulty = batch.agentic.get("difficulty", "N/A")
        query_type = batch.agentic.get("query_type", "N/A")
        print(f"  [{batch.qa_id}] ({difficulty}/{query_type})")
        print(f"    {batch.query}\n")

    return datasets


# Execute
datasets = await generate_dataset()
dataset = datasets[0]

2026-01-15 17:10:18.210 | INFO     | fair_forge.schemas.generators:generate_dataset:525 - Loading context from: ml_fundamentals.md
2026-01-15 17:10:18.211 | INFO     | fair_forge.generators.context_loaders.local_markdown:load:275 - Loading 1 markdown file(s)
2026-01-15 17:10:18.212 | INFO     | fair_forge.generators.context_loaders.local_markdown:_load_single_file:137 - Loading markdown file: ml_fundamentals.md
2026-01-15 17:10:18.213 | INFO     | fair_forge.generators.context_loaders.local_markdown:load:282 - Created 7 total chunks from 1 file(s)
2026-01-15 17:10:18.214 | INFO     | fair_forge.schemas.generators:generate_dataset:527 - Loaded 7 chunks from source
2026-01-15 17:10:18.215 | INFO     | fair_forg

Generating test dataset with Groq...



2026-01-15 17:10:19,305 - httpx - INFO - HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
2026-01-15 17:10:19.321 | DEBUG    | fair_forge.schemas.generators:generate_queries:415 - Generated 3 queries for chunk ml_fundamentals_machine_learning_fundamentals
2026-01-15 17:10:19.321 | DEBUG    | fair_forge.schemas.generators:generate_queries:384 - Generating 3 queries for chunk ml_fundamentals_types_of_machine_learning
2026-01-15 17:10:20,042 - httpx - INFO - HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
2026-01-15 17:10:20.046 | DEBUG    | fair_forge.schemas.generators:generate_queries:415 - Generated 3 queries for chunk ml_fundamentals_types_of_machine_learning
2026-01-15 17:10:20.047 | DEBUG    | fair_fo

Generated 1 dataset(s) in 5.13 seconds:
  Session ID: e51451eb-f4f5-4917-885b-efadb4ecfe0d
  Total queries: 19

Generated queries:
  [ml_fundamentals_machine_learning_fundamentals_q1] (medium/factual)
    What are the fundamental concepts of machine learning?

  [ml_fundamentals_machine_learning_fundamentals_q2] (hard/application)
    How can you apply machine learning in real-world scenarios?

  [ml_fundamentals_machine_learning_fundamentals_q3] (medium/comparative)
    What are the key differences between supervised and unsupervised learning?

  [ml_fundamentals_types_of_machine_learning_q1] (easy/factual)
    What are the three main types of machine learning?

  [ml_fundamentals_types_of_machine_learning_q2] (medium/application)
    How can machine learning be applied in real-world scenarios?

  [ml_fundamentals_types_of_machine_learning_q3] (hard/comparative)
    What are the key differences between supervised and unsupervised machine learning?

  [ml_fundamentals_supervised_learni
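
The cells in this notebook call `await` at the top level, which works in Jupyter because an event loop is already running. In a plain Python script the coroutine has to be started explicitly, for example with `asyncio.run`. The sketch below shows the pattern with a stand-in coroutine: `fake_generate` is hypothetical and only imitates the shape of an async generation call, so no API key or network access is needed.

```python
import asyncio
import time


async def fake_generate(num_chunks: int, num_queries_per_chunk: int) -> list[str]:
    """Hypothetical stand-in for an async generation call."""
    queries = []
    for chunk in range(num_chunks):
        await asyncio.sleep(0)  # yield control, as a real network call would
        queries.extend(f"chunk{chunk}_q{i + 1}" for i in range(num_queries_per_chunk))
    return queries


def main() -> list[str]:
    start = time.perf_counter()
    # asyncio.run creates an event loop, runs the coroutine, and closes the loop
    queries = asyncio.run(fake_generate(num_chunks=7, num_queries_per_chunk=3))
    elapsed = time.perf_counter() - start
    print(f"Generated {len(queries)} queries in {elapsed:.2f} seconds")
    return queries


if __name__ == "__main__":
    main()
```

In a script version of this notebook, `asyncio.run(generate_dataset())` would take the place of `await generate_dataset()`.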

## Generate with Seed Examples

In [6]:
async def generate_with_seeds():
    seed_examples = [
        "What is the difference between supervised and unsupervised learning?",
        "How do you prevent overfitting in a machine learning model?",
        "When should you use precision vs recall as your primary metric?",
    ]

    print("Generating with seed examples...\n")

    datasets = await generator.generate_dataset(
        context_loader=loader,
        source=str(sample_file),
        assistant_id="ml-assistant",
        num_queries_per_chunk=2,
        seed_examples=seed_examples,
    )

    dataset = datasets[0]
    print(f"Generated {len(dataset.conversation)} queries:")
    for batch in dataset.conversation[:5]:
        print(f"  - {batch.query}")

    return datasets


# Execute
datasets_with_seeds = await generate_with_seeds()

2026-01-15 17:10:23.351 | INFO     | fair_forge.schemas.generators:generate_dataset:525 - Loading context from: ml_fundamentals.md
2026-01-15 17:10:23.352 | INFO     | fair_forge.generators.context_loaders.local_markdown:load:275 - Loading 1 markdown file(s)
2026-01-15 17:10:23.352 | INFO     | fair_forge.generators.context_loaders.local_markdown:_load_single_file:137 - Loading markdown file: ml_fundamentals.md
2026-01-15 17:10:23.354 | INFO     | fair_forge.generators.context_loaders.local_markdown:load:282 - Created 7 total chunks from 1 file(s)
2026-01-15 17:10:23.354 | INFO     | fair_forge.schemas.generators:generate_dataset:527 - Loaded 7 chunks from source
2026-01-15 17:10:23.355 | INFO     | fair_forg

Generating with seed examples...



2026-01-15 17:10:23,829 - httpx - INFO - HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
2026-01-15 17:10:23.831 | DEBUG    | fair_forge.schemas.generators:generate_queries:415 - Generated 2 queries for chunk ml_fundamentals_machine_learning_fundamentals
2026-01-15 17:10:23.832 | DEBUG    | fair_forge.schemas.generators:generate_queries:384 - Generating 2 queries for chunk ml_fundamentals_types_of_machine_learning
2026-01-15 17:10:24,367 - httpx - INFO - HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
2026-01-15 17:10:24.371 | DEBUG    | fair_forge.schemas.generators:generate_queries:415 - Generated 2 queries for chunk ml_fundamentals_types_of_machine_learning
2026-01-15 17:10:24.372 | DEBUG    | fair_fo

Generated 13 queries:
  - What are the primary types of machine learning algorithms for beginners?
  - How can you balance model complexity and overfitting in a machine learning model?
  - What are the three main types of machine learning?
  - How do the different types of machine learning impact model performance?
  - What are the key differences between supervised and unsupervised learning?
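
The run above returned 13 queries for 7 chunks at 2 queries per chunk, one fewer than the 14 requested, presumably because the model returned fewer queries than asked for on one chunk. A small check like the one below can locate such shortfalls. It assumes query IDs follow the `<chunk_id>_q<n>` pattern seen in the earlier output; `find_shortfalls` is a hypothetical helper, not part of Fair Forge.

```python
import re
from collections import Counter


def find_shortfalls(qa_ids, chunk_ids, expected_per_chunk):
    """Map each under-filled chunk ID to the number of queries it is missing."""
    counts = Counter()
    for qa_id in qa_ids:
        # Strip the trailing "_q<n>" to recover the chunk ID
        match = re.fullmatch(r"(.+)_q(\d+)", qa_id)
        if match:
            counts[match.group(1)] += 1
    return {
        chunk_id: expected_per_chunk - counts[chunk_id]
        for chunk_id in chunk_ids
        if counts[chunk_id] < expected_per_chunk
    }


chunk_ids = ["doc_intro", "doc_details"]
qa_ids = ["doc_intro_q1", "doc_intro_q2", "doc_details_q1"]
print(find_shortfalls(qa_ids, chunk_ids, expected_per_chunk=2))
# → {'doc_details': 1}
```

With the objects from this notebook, the inputs would be `[b.qa_id for b in dataset.conversation]` and `[c.chunk_id for c in chunks]`.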


## Chunk Selection Strategies

Strategies control how chunks are selected and grouped during generation. By default, all chunks are processed sequentially into a single dataset.

### RandomSamplingStrategy

Randomly samples chunks multiple times to create diverse test datasets:

In [7]:
async def generate_with_random_sampling():
    """Generate multiple datasets using random chunk sampling."""

    # Create a strategy that samples 3 random chunks, 2 times
    strategy = RandomSamplingStrategy(
        num_samples=2,  # Create 2 datasets
        chunks_per_sample=3,  # Each with 3 random chunks
        seed=42,  # For reproducibility
    )

    print(f"Strategy: {strategy}\n")

    datasets = await generator.generate_dataset(
        context_loader=loader,
        source=str(sample_file),
        assistant_id="ml-assistant",
        num_queries_per_chunk=2,
        selection_strategy=strategy,
    )

    print(f"Generated {len(datasets)} datasets:\n")
    for i, ds in enumerate(datasets):
        print(f"Dataset {i+1}:")
        print(f"  Session: {ds.session_id[:8]}...")
        print(f"  Queries: {len(ds.conversation)}")
        # Show the distinct chunk IDs the queries were generated from
        chunk_ids = {b.agentic.get("chunk_id", "N/A") for b in ds.conversation}
        print(f"  Chunks: {chunk_ids}\n")

    return datasets


# Execute
random_datasets = await generate_with_random_sampling()

2026-01-15 17:10:27.021 | INFO     | fair_forge.schemas.generators:generate_dataset:525 - Loading context from: ml_fundamentals.md
2026-01-15 17:10:27.023 | INFO     | fair_forge.generators.context_loaders.local_markdown:load:275 - Loading 1 markdown file(s)
2026-01-15 17:10:27.025 | INFO     | fair_forge.generators.context_loaders.local_markdown:_load_single_file:137 - Loading markdown file: ml_fundamentals.md
2026-01-15 17:10:27.027 | INFO     | fair_forge.generators.context_loaders.local_markdown:load:282 - Created 7 total chunks from 1 file(s)
2026-01-15 17:10:27.028 | INFO     | fair_forge.schemas.generators:generate_dataset:527 - Loaded 7 chunks from source
2026-01-15 17:10:27.029 | INFO     | fair_forg

Strategy: RandomSamplingStrategy(num_samples=2, chunks_per_sample=3, seed=42)



2026-01-15 17:10:27,517 - httpx - INFO - HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
2026-01-15 17:10:27.521 | DEBUG    | fair_forge.schemas.generators:generate_queries:415 - Generated 2 queries for chunk ml_fundamentals_model_evaluation
2026-01-15 17:10:27.522 | DEBUG    | fair_forge.schemas.generators:generate_queries:384 - Generating 2 queries for chunk ml_fundamentals_machine_learning_fundamentals
2026-01-15 17:10:27,949 - httpx - INFO - HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
2026-01-15 17:10:27.952 | DEBUG    | fair_forge.schemas.generators:generate_queries:415 - Generated 2 queries for chunk ml_fundamentals_machine_learning_fundamentals
2026-01-15 17:10:27.953 | DEBUG    | fair_forge.s

Generated 2 datasets:

Dataset 1:
  Session: c2356905...
  Queries: 6
  Chunks: {'ml_fundamentals_model_evaluation', 'ml_fundamentals_machine_learning_fundamentals', 'ml_fundamentals_best_practices'}

Dataset 2:
  Session: 6541f97c...
  Queries: 5
  Chunks: {'ml_fundamentals_model_evaluation', 'ml_fundamentals_supervised_learning', 'ml_fundamentals_types_of_machine_learning'}
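
The idea behind seeded random sampling can be sketched with the standard library: draw a fixed number of distinct chunks per sample from a seeded generator, so the same seed always yields the same groups. This is an illustration of the concept only. `sample_chunk_groups` is a hypothetical function, it is not how `RandomSamplingStrategy` is necessarily implemented, and its selections will not match the library's even with the same seed.

```python
import random


def sample_chunk_groups(chunk_ids, num_samples, chunks_per_sample, seed=None):
    """Draw `num_samples` groups of distinct chunk IDs, reproducibly if seeded."""
    rng = random.Random(seed)  # private generator, leaves global random state alone
    k = min(chunks_per_sample, len(chunk_ids))
    # Sampling is without replacement inside a group, but independent across groups
    return [rng.sample(chunk_ids, k) for _ in range(num_samples)]


chunk_ids = [
    "ml_fundamentals_machine_learning_fundamentals",
    "ml_fundamentals_types_of_machine_learning",
    "ml_fundamentals_supervised_learning",
    "ml_fundamentals_unsupervised_learning",
    "ml_fundamentals_reinforcement_learning",
    "ml_fundamentals_model_evaluation",
    "ml_fundamentals_best_practices",
]
for i, group in enumerate(sample_chunk_groups(chunk_ids, 2, 3, seed=42), start=1):
    print(f"Sample {i}: {group}")
```

Because each group is drawn independently, a chunk can appear in more than one dataset, as `ml_fundamentals_model_evaluation` does in the output above, but never twice within the same one.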



## Conversation Mode

Instead of generating independent queries, conversation mode creates coherent multi-turn conversations where each question builds on the previous ones:

In [8]:
async def generate_conversations():
    """Generate coherent multi-turn conversations."""

    print("Generating conversations (each turn builds on the previous)...\n")

    datasets = await generator.generate_dataset(
        context_loader=loader,
        source=str(sample_file),
        assistant_id="ml-assistant",
        num_queries_per_chunk=3,  # 3-turn conversations
        conversation_mode=True,  # Enable conversation mode
    )

    dataset = datasets[0]
    print(f"Generated {len(dataset.conversation)} conversation turns:\n")

    # Group by chunk to show conversation flow
    current_chunk = None
    for batch in dataset.conversation:
        chunk_id = batch.agentic.get("chunk_id", "N/A")
        turn_num = batch.agentic.get("turn_number", 0)
        builds_on = batch.agentic.get("builds_on", None)

        if chunk_id != current_chunk:
            print(f"\n--- Conversation for chunk: {chunk_id} ---")
            current_chunk = chunk_id

        print(f"  Turn {turn_num}: {batch.query}")
        if builds_on:
            print(f"         (builds on: {builds_on})")

    return datasets


# Execute
conversation_datasets = await generate_conversations()

2026-01-15 17:10:30.401 | INFO     | fair_forge.schemas.generators:generate_dataset:525 - Loading context from: ml_fundamentals.md
2026-01-15 17:10:30.403 | INFO     | fair_forge.generators.context_loaders.local_markdown:load:275 - Loading 1 markdown file(s)
2026-01-15 17:10:30.404 | INFO     | fair_forge.generators.context_loaders.local_markdown:_load_single_file:137 - Loading markdown file: ml_fundamentals.md
2026-01-15 17:10:30.404 | INFO     | fair_forge.generators.context_loaders.local_markdown:load:282 - Created 7 total chunks from 1 file(s)
2026-01-15 17:10:30.405 | INFO     | fair_forge.schemas.generators:generate_dataset:527 - Loaded 7 chunks from source
2026-01-15 17:10:30.405 | INFO     | fair_forg

Generating conversations (each turn builds on the previous)...



2026-01-15 17:10:30,987 - httpx - INFO - HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
2026-01-15 17:10:30.991 | DEBUG    | fair_forge.schemas.generators:generate_conversation:482 - Generated 3 turns for chunk ml_fundamentals_machine_learning_fundamentals
2026-01-15 17:10:30.992 | DEBUG    | fair_forge.schemas.generators:generate_conversation:444 - Generating 3-turn conversation for chunk ml_fundamentals_types_of_machine_learning
2026-01-15 17:10:31,582 - httpx - INFO - HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
2026-01-15 17:10:31.587 | DEBUG    | fair_forge.schemas.generators:generate_conversation:482 - Generated 3 turns for chunk ml_fundamentals_types_of_machine_learning
2026-01-15 17:10:31.588 | DEBUG

Generated 21 conversation turns:


--- Conversation for chunk: ml_fundamentals_machine_learning_fundamentals ---
  Turn 1: What is machine learning?
  Turn 2: Can you give an example of how machine learning is used in real life?
         (builds on: ml_fundamentals_machine_learning_fundamentals_t1)
  Turn 3: How does machine learning differ from traditional programming?
         (builds on: ml_fundamentals_machine_learning_fundamentals_t2)

--- Conversation for chunk: ml_fundamentals_types_of_machine_learning ---
  Turn 1: What are the main categories of machine learning?
  Turn 2: Can you explain supervised learning in more detail?
         (builds on: ml_fundamentals_types_of_machine_learning_t1)
  Turn 3: How does supervised learning compare to unsupervised learning in terms of data requirements?
         (builds on: ml_fundamentals_types_of_machine_learning_t2)

--- Conversation for chunk: ml_fundamentals_supervised_learning ---
  Turn 1: What is the main purpose of using labeled t
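
In the output above, every turn after the first carries a `builds_on` value that names the previous turn of the same chunk, using IDs of the form `<chunk_id>_t<n>`. The sketch below shows how such a chain can be represented with plain dictionaries. The function `link_turns` and the `qa_id` key are assumptions made for illustration; only `turn_number` and `builds_on` appear in the notebook's code.

```python
def link_turns(chunk_id, queries):
    """Attach turn numbers and `builds_on` references to a list of queries."""
    turns = []
    for number, query in enumerate(queries, start=1):
        turns.append({
            "qa_id": f"{chunk_id}_t{number}",
            "query": query,
            "turn_number": number,
            # The first turn starts the conversation, so it builds on nothing
            "builds_on": f"{chunk_id}_t{number - 1}" if number > 1 else None,
        })
    return turns


conversation = link_turns(
    "ml_fundamentals_machine_learning_fundamentals",
    [
        "What is machine learning?",
        "Can you give an example of how machine learning is used in real life?",
        "How does machine learning differ from traditional programming?",
    ],
)
for turn in conversation:
    print(f"Turn {turn['turn_number']}: {turn['query']} (builds on: {turn['builds_on']})")
```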

### Combined: Random Sampling + Conversation Mode

You can combine strategies with conversation mode to create diverse conversation-based test sets:

In [9]:
async def generate_random_conversations():
    """Combine random sampling with conversation mode."""

    strategy = RandomSamplingStrategy(
        num_samples=2,
        chunks_per_sample=2,
        seed=42,
    )

    print("Generating 2 datasets with 2 random chunks each (conversation mode)...\n")

    datasets = await generator.generate_dataset(
        context_loader=loader,
        source=str(sample_file),
        assistant_id="ml-assistant",
        num_queries_per_chunk=2,  # 2-turn conversations
        selection_strategy=strategy,
        conversation_mode=True,
    )

    for i, ds in enumerate(datasets):
        print(f"Dataset {i+1} ({len(ds.conversation)} turns):")
        for batch in ds.conversation[:4]:  # Show first 4 turns
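            # Keep only the first 15 characters of the chunk ID as a compact label
            # (every ID here starts with "ml_fundamentals", so the labels look identical)
            chunk = batch.agentic.get("chunk_id", "N/A")[:15]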
            turn = batch.agentic.get("turn_number", 0)
            print(f"  [{chunk}] T{turn}: {batch.query[:50]}...")
        print()

    return datasets


# Execute
combined_datasets = await generate_random_conversations()

2026-01-15 17:10:35.309 | INFO     | fair_forge.schemas.generators:generate_dataset:525 - Loading context from: ml_fundamentals.md
2026-01-15 17:10:35.310 | INFO     | fair_forge.generators.context_loaders.local_markdown:load:275 - Loading 1 markdown file(s)
2026-01-15 17:10:35.311 | INFO     | fair_forge.generators.context_loaders.local_markdown:_load_single_file:137 - Loading markdown file: ml_fundamentals.md
2026-01-15 17:10:35.312 | INFO     | fair_forge.generators.context_loaders.local_markdown:load:282 - Created 7 total chunks from 1 file(s)
2026-01-15 17:10:35.313 | INFO     | fair_forge.schemas.generators:generate_dataset:527 - Loaded 7 chunks from source
2026-01-15 17:10:35.313 | INFO     | fair_forg

Generating 2 datasets with 2 random chunks each (conversation mode)...



2026-01-15 17:10:36,016 - httpx - INFO - HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
2026-01-15 17:10:36.019 | DEBUG    | fair_forge.schemas.generators:generate_conversation:482 - Generated 2 turns for chunk ml_fundamentals_model_evaluation
2026-01-15 17:10:36.020 | DEBUG    | fair_forge.schemas.generators:generate_conversation:444 - Generating 2-turn conversation for chunk ml_fundamentals_machine_learning_fundamentals
2026-01-15 17:10:36,733 - httpx - INFO - HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
2026-01-15 17:10:36.736 | DEBUG    | fair_forge.schemas.generators:generate_conversation:482 - Generated 2 turns for chunk ml_fundamentals_machine_learning_fundamentals
2026-01-15 17:10:36.738 | INFO     |

Dataset 1 (4 turns):
  [ml_fundamentals] T1: What are the key metrics used to evaluate machine ...
  [ml_fundamentals] T2: Can you explain the difference between precision a...
  [ml_fundamentals] T1: What is machine learning?...
  [ml_fundamentals] T2: How does it differ from traditional programming?...

Dataset 2 (4 turns):
  [ml_fundamentals] T1: What is machine learning?...
  [ml_fundamentals] T2: How does it differ from traditional programming?...
  [ml_fundamentals] T1: What are the key metrics used to evaluate ML model...
  [ml_fundamentals] T2: Can you explain the difference between precision a...



## Save Generated Dataset

In [10]:
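# Save dataset to JSON
output_file = Path("./generated_tests_groq.json")
with open(output_file, "w", encoding="utf-8") as f:
    # mode="json" converts non-JSON types (e.g. datetimes) to serializable values
    json.dump(dataset.model_dump(mode="json"), f, indent=2)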

print(f"Dataset saved to: {output_file}")

Dataset saved to: generated_tests_groq.json
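
A quick way to confirm that a saved file is usable is to read it back and compare it with what was written. The sketch below does this for a plain dictionary in a temporary directory. The `sample` payload and the `save_and_reload` helper are hypothetical; the payload is only a rough imitation of a dataset's shape, not Fair Forge's actual schema.

```python
import json
import tempfile
from pathlib import Path


def save_and_reload(payload: dict, path: Path) -> dict:
    """Write a dict as JSON, read it back, and return the reloaded copy."""
    path.write_text(json.dumps(payload, indent=2, ensure_ascii=False), encoding="utf-8")
    return json.loads(path.read_text(encoding="utf-8"))


sample = {
    "session_id": "demo-session",
    "conversation": [{"qa_id": "doc_intro_q1", "query": "What is covered?"}],
}
# The temporary directory is removed automatically when the block exits
with tempfile.TemporaryDirectory() as tmp:
    reloaded = save_and_reload(sample, Path(tmp) / "demo.json")

print(reloaded == sample)
```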


## Available Groq Models

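See the [Groq Console](https://console.groq.com/docs/models) for the current list of available models. Model IDs such as `llama-3.1-8b-instant` may change or be retired over time, so check the list before running this notebook.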

## Cleanup

In [11]:
# Clean up sample files
if sample_file.exists():
    sample_file.unlink()
if output_file.exists():
    output_file.unlink()
print("Cleanup completed")

Cleanup completed
