# Fair Forge Generators - Groq Example

This notebook demonstrates how to use the Fair Forge generators module with **Groq Cloud** for ultra-fast synthetic test dataset generation.

## Overview

The `GroqGenerator` uses LangChain to interact with Groq's inference API, which provides extremely fast inference for models like Llama, Mixtral, and Gemma.

### Why Groq?
- **Speed**: Up to 10x faster inference than traditional cloud providers, depending on model and workload
- **Cost**: Competitive pricing for high-volume usage
- **Models**: Access to popular open-source models (Llama 3, Mixtral, Gemma)

## Setup

1. Get your free API key from [Groq Console](https://console.groq.com/)

2. Set your Groq API key as an environment variable:

```bash
export GROQ_API_KEY="your-api-key"
```

Or create a `.env` file in the notebook's working directory; the `load_dotenv()` call in the Imports cell reads it:
```.env
GROQ_API_KEY=your-api-key
```

3. Install required dependencies:
```bash
uv pip install ".[generators]"
uv pip install langchain-groq python-dotenv
```

## Imports

In [None]:
import os
import json
from pathlib import Path
from dotenv import load_dotenv

from fair_forge.generators import (
    create_groq_generator,
    create_markdown_loader,
    GroqGenerator,
)
from fair_forge.schemas import Dataset, Batch

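# Load environment variables from a .env file, if one exists
load_dotenv()

# Fail early with a clear message instead of a later authentication error
if not os.getenv("GROQ_API_KEY"):
    raise RuntimeError("GROQ_API_KEY is not set; see the Setup section")

print("Imports loaded successfully")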

## Create Sample Content

Let's create a sample markdown document for testing:

In [None]:
sample_content = """# Machine Learning Fundamentals

This guide covers the basics of machine learning for beginners.

## Types of Machine Learning

Machine learning can be categorized into three main types:

### Supervised Learning
- Uses labeled training data
- Predicts outcomes based on input features
- Examples: Classification, Regression

### Unsupervised Learning
- Works with unlabeled data
- Discovers hidden patterns and structures
- Examples: Clustering, Dimensionality Reduction

### Reinforcement Learning
- Agent learns through interaction with environment
- Maximizes cumulative reward
- Examples: Game playing, Robotics

## Model Evaluation

Key metrics for evaluating ML models:

- **Accuracy**: Proportion of correct predictions
- **Precision**: True positives among predicted positives
- **Recall**: True positives among actual positives
- **F1 Score**: Harmonic mean of precision and recall

## Best Practices

1. Split data into train/validation/test sets
2. Use cross-validation for robust evaluation
3. Monitor for overfitting
4. Document your experiments
"""

# Save to file
sample_file = Path("./ml_fundamentals.md")
sample_file.write_text(sample_content, encoding="utf-8")
print(f"Sample content saved to: {sample_file}")

## Create Context Loader

In [None]:
# Create markdown loader
loader = create_markdown_loader(
    max_chunk_size=2000,
    header_levels=[1, 2, 3],
)

# Preview chunks
chunks = loader.load(str(sample_file))
print(f"Created {len(chunks)} chunks:\n")
for chunk in chunks:
    print(f"- {chunk.chunk_id}: {len(chunk.content)} chars")
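
To make the `max_chunk_size` and `header_levels` parameters more concrete, here is a minimal sketch of header-based chunking. It is not Fair Forge's implementation: `split_by_headers` is a hypothetical helper, and the behavior it shows (cut at the listed header levels, then cap each section at `max_chunk_size` characters) is an assumption inferred from the parameter names alone.

```python
import re


def split_by_headers(text, header_levels=(1, 2, 3), max_chunk_size=2000):
    """Hypothetical helper: split markdown at the given header levels,
    then cap each section at max_chunk_size characters."""
    sections, current = [], []
    for line in text.splitlines():
        match = re.match(r"^(#{1,6})\s+\S", line)
        # Start a new section whenever a header of a tracked level appears
        if match and len(match.group(1)) in header_levels and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())

    chunks = []
    for section in sections:
        # Naive size cap: slice oversized sections into fixed-width pieces
        for start in range(0, len(section), max_chunk_size):
            chunks.append(section[start:start + max_chunk_size])
    return chunks


demo = "# Title\n\nIntro text.\n\n## Part A\n\nBody A.\n\n## Part B\n\nBody B."
demo_chunks = split_by_headers(demo)
for i, chunk in enumerate(demo_chunks):
    print(f"chunk {i}: {len(chunk)} chars")
```

A real loader would likely split oversized sections on paragraph or sentence boundaries rather than at a fixed character offset; the fixed-width slice here only keeps the sketch short.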

## Create Groq Generator

The generator reads the API key from the `GROQ_API_KEY` environment variable.

In [None]:
# Create Groq generator with Llama 3.1 70B (recommended for quality)
generator = create_groq_generator(
    model_name="llama-3.1-70b-versatile",
    temperature=0.7,
    max_tokens=2048,
    use_structured_output=True,
)

print(f"Groq generator created with model: {generator.model_name}")

## Generate Test Dataset

The cell below times the full generation run, so you can see how quickly Groq handles it with your own account and model.

In [None]:
import time

async def generate_dataset():
    print("Generating test dataset with Groq...\n")
    
    # perf_counter() is a monotonic clock intended for measuring elapsed time
    start_time = time.perf_counter()
    
    dataset = await generator.generate_dataset(
        context_loader=loader,
        source=str(sample_file),
        assistant_id="ml-assistant",
        num_queries_per_chunk=3,
        language="english",
    )
    
    elapsed = time.perf_counter() - start_time
    
    print(f"Generated dataset in {elapsed:.2f} seconds:")
    print(f"  Session ID: {dataset.session_id}")
    print(f"  Total queries: {len(dataset.conversation)}\n")
    
    print("Generated queries:")
    for batch in dataset.conversation:
        difficulty = batch.agentic.get('difficulty', 'N/A')
        query_type = batch.agentic.get('query_type', 'N/A')
        print(f"  [{batch.qa_id}] ({difficulty}/{query_type})")
        print(f"    {batch.query}\n")
    
    return dataset

# Execute
dataset = await generate_dataset()
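
After generation, a quick tally of the metadata shows whether the queries are reasonably varied. The sketch below assumes each batch carries an `agentic` dict with `difficulty` and `query_type` keys, as the loop above reads them. The `sample_batches` records and the `tally` helper are made up for illustration, so the snippet runs without an API call.

```python
from collections import Counter

# Made-up stand-ins for `batch.agentic` dicts; real values come from the generator
sample_batches = [
    {"difficulty": "easy", "query_type": "factual"},
    {"difficulty": "hard", "query_type": "comparative"},
    {"difficulty": "easy", "query_type": "factual"},
    {},  # metadata may be missing, so fall back to "N/A"
]


def tally(records, key):
    """Count how often each value of `key` appears, using "N/A" when absent."""
    return Counter(record.get(key, "N/A") for record in records)


difficulty_counts = tally(sample_batches, "difficulty")
type_counts = tally(sample_batches, "query_type")
print(dict(difficulty_counts))
print(dict(type_counts))
```

With a real dataset, `[batch.agentic for batch in dataset.conversation]` could be passed as `records`, assuming `agentic` is always a dict.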

## Generate with Seed Examples

Seed examples are sample questions passed via `seed_examples`; the generator can use them as a guide for the style and topics of the new queries.

In [None]:
async def generate_with_seeds():
    seed_examples = [
        "What is the difference between supervised and unsupervised learning?",
        "How do you prevent overfitting in a machine learning model?",
        "When should you use precision vs recall as your primary metric?",
    ]
    
    print("Generating with seed examples...\n")
    
    dataset = await generator.generate_dataset(
        context_loader=loader,
        source=str(sample_file),
        assistant_id="ml-assistant",
        num_queries_per_chunk=2,
        seed_examples=seed_examples,
    )
    
    print(f"Generated {len(dataset.conversation)} queries:")
    for batch in dataset.conversation[:5]:
        print(f"  - {batch.query}")
    
    return dataset

# Execute
dataset_with_seeds = await generate_with_seeds()
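
When seeds are supplied, it is worth checking that the output does not simply repeat them. The sketch below flags generated queries that match a seed after loose normalization. `normalize` and `find_seed_copies` are hypothetical helpers, and the `generated` list is made-up data rather than real model output.

```python
import re


def normalize(text):
    """Lowercase and collapse punctuation/whitespace for a loose comparison."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def find_seed_copies(generated, seeds):
    """Return generated queries that match a seed after normalization."""
    seed_keys = {normalize(seed) for seed in seeds}
    return [query for query in generated if normalize(query) in seed_keys]


seeds = ["How do you prevent overfitting in a machine learning model?"]
# Made-up generator output for illustration
generated = [
    "how do you prevent overfitting in a machine learning model",
    "Which metric balances precision and recall?",
]
copies = find_seed_copies(generated, seeds)
print(copies)
```

For the dataset above, the `generated` argument would be `[batch.query for batch in dataset_with_seeds.conversation]`.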

## Save Generated Dataset

In [None]:
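# Save dataset to JSON (mode="json" converts datetimes and similar types to JSON-safe values)
output_file = Path("./generated_tests_groq.json")
with open(output_file, "w", encoding="utf-8") as f:
    json.dump(dataset.model_dump(mode="json"), f, indent=2)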

print(f"Dataset saved to: {output_file}")
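
Reading the file back is a cheap way to confirm that the export round-trips. The sketch below uses only the standard library and a made-up payload. Its field names (`session_id`, `conversation`, `qa_id`, `query`) mirror the attributes accessed earlier in this notebook, but the exact layout of the real JSON is an assumption.

```python
import json
import tempfile
from pathlib import Path

# Made-up payload shaped like the fields used earlier in the notebook
payload = {
    "session_id": "demo-session",
    "conversation": [
        {"qa_id": "q1", "query": "What is supervised learning?"},
        {"qa_id": "q2", "query": "What does recall measure?"},
    ],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_file = Path(tmp_dir) / "demo_dataset.json"
    demo_file.write_text(json.dumps(payload, indent=2), encoding="utf-8")

    # Load the file back and compare it with what was written
    loaded = json.loads(demo_file.read_text(encoding="utf-8"))

round_trip_ok = loaded == payload
print(f"{len(loaded['conversation'])} queries, round trip ok: {round_trip_ok}")
```

For the real file, `Dataset.model_validate(json.loads(output_file.read_text()))` should rebuild the model, assuming `Dataset` is a Pydantic v2 model (which the `model_dump()` call suggests).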

## Available Groq Models

| Model | Context window (tokens) | Best for |
|-------|---------|----------|
| `llama-3.1-70b-versatile` | 128K | High quality, complex tasks |
| `llama-3.1-8b-instant` | 128K | Fast, simple tasks |
| `llama3-groq-70b-8192-tool-use-preview` | 8K | Tool use |
| `mixtral-8x7b-32768` | 32K | Balanced performance |
| `gemma2-9b-it` | 8K | Compact, efficient |

Groq adds and retires model IDs over time, so confirm that a model is still available in the Groq Console before using it.

In [None]:
# Example: Use Llama 3.1 8B for faster generation
# generator_fast = create_groq_generator(
#     model_name="llama-3.1-8b-instant",
#     temperature=0.7,
# )

# Example: Use Mixtral for longer context
# generator_mixtral = create_groq_generator(
#     model_name="mixtral-8x7b-32768",
#     temperature=0.5,
# )
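
If the source documents vary in size, the context window can drive the model choice. The sketch below filters the models from the table above by a minimum context size. `MODEL_CONTEXT` and `pick_models` are hypothetical names, and the token counts are the table's rounded figures expanded (8K as 8,192, and so on), not values queried from Groq.

```python
# Context windows in tokens, expanded from the table above (assumed, not queried)
MODEL_CONTEXT = {
    "llama-3.1-70b-versatile": 131_072,
    "llama-3.1-8b-instant": 131_072,
    "llama3-groq-70b-8192-tool-use-preview": 8_192,
    "mixtral-8x7b-32768": 32_768,
    "gemma2-9b-it": 8_192,
}


def pick_models(min_context_tokens):
    """Return model IDs whose context window is at least min_context_tokens."""
    return [
        name
        for name, context in MODEL_CONTEXT.items()
        if context >= min_context_tokens
    ]


candidates = pick_models(32_000)
print(candidates)
```

Any name in the returned list can then be passed as `model_name` to `create_groq_generator`.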

## Speed Comparison

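Groq is known for fast inference. The quick benchmark below times three single-chunk generation runs against Groq and reports the average; it does not measure any other provider.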

In [None]:
import time

async def benchmark_generation():
    """Benchmark generation speed."""
    times = []
    
    for i in range(3):
        # perf_counter() is a monotonic clock intended for measuring elapsed time
        start = time.perf_counter()
        await generator.generate_queries(
            chunk=chunks[0],
            num_queries=3,
        )
        elapsed = time.perf_counter() - start
        times.append(elapsed)
        print(f"Run {i+1}: {elapsed:.2f}s")
    
    avg = sum(times) / len(times)
    print(f"\nAverage: {avg:.2f}s per chunk (3 queries)")

# Execute
await benchmark_generation()
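
The timing loop above can be factored into a small reusable helper. The sketch below is one way to do it, with hypothetical names (`time_async`, `fake_generation`); it times a stand-in coroutine that sleeps instead of calling Groq, so it runs without an API key.

```python
import asyncio
import statistics
import time


async def time_async(make_coro, runs=3):
    """Await make_coro() `runs` times and return each duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        await make_coro()
        durations.append(time.perf_counter() - start)
    return durations


async def fake_generation():
    """Stand-in for a network call; sleeps instead of calling Groq."""
    await asyncio.sleep(0.01)


# In a notebook cell, use `durations = await time_async(fake_generation)` instead,
# because asyncio.run() cannot be called from an already running event loop.
durations = asyncio.run(time_async(fake_generation, runs=3))
print(f"mean={statistics.mean(durations):.3f}s  max={max(durations):.3f}s")
```

With the real generator, the call would look like `await time_async(lambda: generator.generate_queries(chunk=chunks[0], num_queries=3))`.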

## Cleanup

In [None]:
# Clean up sample files (missing_ok avoids an error if a file was never created)
for path in (sample_file, output_file):
    path.unlink(missing_ok=True)
print("Cleanup completed")