# DKW Benchmark Dataset Collection

**Artifact:** dataset_001 (data.py)

This notebook demonstrates how to collect benchmark data for DKW controller evaluation using the GSM8K dataset from HuggingFace.

## Overview

This notebook:
1. Loads data from the GSM8K dataset (Grade School Math 8K)
2. Processes the first 200 test examples
3. Creates a structured dataset with questions, answers, and difficulty scores
4. Displays the collected data for analysis

The notebook does not depend on any local files, but it does need the `datasets` package and network access to download GSM8K the first time it runs.

In [None]:
"""Dataset collection script for DKW benchmark."""
import json
from datasets import load_dataset

## Data Collection Function

The `collect_data()` function:
- Loads the GSM8K dataset from HuggingFace
- Takes the first 200 examples from the test split
- Creates a structured format with ID, question, answer, and difficulty score
- Uses question length (character count divided by 100) as a simple proxy for difficulty

In [None]:
def collect_data():
    """Collect benchmark data for DKW controller evaluation."""
    # Load HuggingFace dataset
    ds = load_dataset("gsm8k", "main", split="test[:200]")

    data = []
    for i, example in enumerate(ds):
        data.append({
            "id": f"example_{i:03d}",
            "question": example["question"],
            "answer": example["answer"],
            "difficulty": len(example["question"]) / 100,  # Simple proxy
        })

    return data
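
The record construction can be tried without downloading GSM8K. The following is a minimal sketch, assuming a hypothetical helper `build_records` (not part of the original script) that applies the same ID format and length-based proxy to any iterable of dicts with `question` and `answer` keys. The two rows are made-up placeholders, not GSM8K data.

```python
def build_records(examples):
    """Hypothetical helper: mirror collect_data() on in-memory rows."""
    records = []
    for i, example in enumerate(examples):
        records.append({
            "id": f"example_{i:03d}",
            "question": example["question"],
            "answer": example["answer"],
            # Same proxy as collect_data(): character count divided by 100.
            "difficulty": len(example["question"]) / 100,
        })
    return records


# Made-up rows for illustration only (not taken from GSM8K).
toy_rows = [
    {"question": "What is 2+2?", "answer": "4"},
    {"question": "If x=5, what is 2x?", "answer": "10"},
]
toy_records = build_records(toy_rows)
print(toy_records[0]["id"], toy_records[0]["difficulty"])
```

Because the proxy counts characters, a long but easy question scores higher than a short but hard one, so the score is best read as a rough ordering signal rather than a calibrated difficulty.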

## Execute Data Collection

Run the data collection function and display basic statistics about the collected data.

In [None]:
# Collect the data
data = collect_data()

# Display statistics
print(f"Collected {len(data)} examples")
print(f"Average difficulty: {sum(item['difficulty'] for item in data) / len(data):.2f}")
print(f"Min difficulty: {min(item['difficulty'] for item in data):.2f}")
print(f"Max difficulty: {max(item['difficulty'] for item in data):.2f}")
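
The same summary can also be computed with the standard-library `statistics` module, which adds the median. This is a minimal sketch, assuming a hypothetical helper `summarize_difficulty` and a few made-up records so that it runs without the dataset:

```python
import statistics


def summarize_difficulty(records):
    """Hypothetical helper: summary statistics over the 'difficulty' field."""
    scores = [item["difficulty"] for item in records]
    if not scores:
        # Avoid the division-by-zero that a manual average would hit.
        raise ValueError("no records to summarize")
    return {
        "count": len(scores),
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "min": min(scores),
        "max": max(scores),
    }


# Made-up difficulty values for illustration only.
toy_scores = [{"difficulty": 0.5}, {"difficulty": 1.0}, {"difficulty": 1.5}]
summary = summarize_difficulty(toy_scores)
print(summary)
```

In the notebook, `summarize_difficulty(data)` would apply the helper to the collected records.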

## Sample Data

Let's look at the first few examples to understand the data structure:

In [None]:
# Display the first 3 examples (fewer if the dataset is smaller)
for i, example in enumerate(data[:3], start=1):
    print(f"Example {i}:")
    print(f"  ID: {example['id']}")
    print(f"  Question: {example['question']}")
    print(f"  Answer: {example['answer']}")
    print(f"  Difficulty: {example['difficulty']:.2f}")
    print()

## Reference: Expected Output Format

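The original script saves the collected data to `data_out.json`. The cell below shows the output schema using short placeholder entries rather than actual GSM8K questions. Their `difficulty` values are illustrative and do not follow the length-based proxy (for example, "What is 2+2?" has 12 characters, which would give 0.12, not 0.15).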

In [None]:
# Example output format (inlined from data_out.json)
sample_output = [
    {
        "id": "example_000",
        "question": "What is 2+2?",
        "answer": "4",
        "difficulty": 0.15
    },
    {
        "id": "example_001", 
        "question": "If x=5, what is 2x?",
        "answer": "10",
        "difficulty": 0.22
    },
    {
        "id": "example_002",
        "question": "Solve: 3y + 6 = 15",
        "answer": "y=3",
        "difficulty": 0.28
    }
]

print("Sample output format:")
print(json.dumps(sample_output, indent=2))
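
A quick schema check can catch records that drift from this format. The sketch below is one possible approach, not part of the original script: `EXPECTED_FIELDS` and `validate_record` are hypothetical names, and the field types are inferred from the records built in `collect_data()`.

```python
# Field names and types inferred from the records built in collect_data().
EXPECTED_FIELDS = {
    "id": str,
    "question": str,
    "answer": str,
    "difficulty": (int, float),
}


def validate_record(record):
    """Hypothetical check: return a list of problems found in one record."""
    problems = []
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            actual = type(record[field]).__name__
            problems.append(f"wrong type for {field}: {actual}")
    return problems


# Placeholder records for illustration only.
good_record = {"id": "example_000", "question": "What is 2+2?",
               "answer": "4", "difficulty": 0.15}
bad_record = {"id": "example_001", "question": "If x=5, what is 2x?",
              "difficulty": "0.22"}
print(validate_record(good_record))
print(validate_record(bad_record))
```

An empty list means the record matches the expected fields; the second record is reported as missing `answer` and as having a string where a number is expected.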

## Optional: Save to File

If you want to save the collected data to a JSON file (like the original script), uncomment and run the cell below:

In [None]:
# Uncomment the lines below to save data to file
# with open("data_out.json", "w") as f:
#     json.dump(data, f, indent=2)
# print(f"Data saved to data_out.json with {len(data)} examples")
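
To confirm that saved data reloads unchanged, a round trip through a temporary directory avoids touching any existing `data_out.json`. This is a minimal sketch with a made-up record; in the notebook, `records_to_save` (a hypothetical name) would be replaced by `data`.

```python
import json
import os
import tempfile

# Made-up record for illustration only; in the notebook this would be `data`.
records_to_save = [
    {"id": "example_000", "question": "What is 2+2?",
     "answer": "4", "difficulty": 0.12},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "data_out.json")
    # ensure_ascii=False keeps any non-ASCII characters readable in the file.
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(records_to_save, f, indent=2, ensure_ascii=False)
    # Read the file back before the temporary directory is removed.
    with open(out_path, encoding="utf-8") as f:
        reloaded = json.load(f)

print(reloaded == records_to_save)
```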