## Customization Instructions

This notebook is self-contained and easy to modify:

### To add more data:
1. Extend the `sample_dataset` list with additional question-answer pairs
2. Re-run the processing cells
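
As a minimal sketch of step 1, the snippet below appends one more entry using the same `question`/`answer` keys. The appended pair is a made-up placeholder rather than a GSM8K item, and the one-entry `sample_dataset` here only stands in for the notebook's full list.

```python
# Stand-in for the notebook's sample_dataset (shortened for this sketch).
sample_dataset = [
    {
        "question": "James decides to run 3 sprints 3 times a week. He runs 60 meters each sprint. How many meters does he run a week?",
        "answer": "He runs 3*3 = 9 sprints a week.\n9*60 = 540 meters a week.\n#### 540",
    },
]

# Hypothetical extra example; any dict with "question" and "answer" keys works.
sample_dataset.append({
    "question": "A box holds 12 pencils. How many pencils are in 4 boxes?",
    "answer": "4 * 12 = 48 pencils.\n#### 48",
})

print(f"Sample dataset now contains {len(sample_dataset)} examples")
```

After extending the list, re-running the cell that calls `collect_data(sample_dataset)` picks up the new entry.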

### To change the difficulty calculation:
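1. Edit the `"difficulty"` expression in the `collect_data()` function
2. Current method: `len(example["question"]) / 100`, which uses question length in characters as a proxy. The score is not capped, so any question longer than 100 characters scores above 1.0
3. Alternative proxies could include word count or a text-complexity measure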
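
To illustrate the difference between the current proxy and a word-count alternative, here is a small sketch. The function names and the `scale` parameter are assumptions introduced for this example; they do not appear in the original script.

```python
def length_difficulty(question):
    """Current method: character count divided by 100."""
    return len(question) / 100


def word_count_difficulty(question, scale=20):
    """Hypothetical alternative: whitespace-separated word count divided by `scale`."""
    return len(question.split()) / scale


question = (
    "A robe takes 2 bolts of blue fiber and half that much white fiber. "
    "How many bolts of fiber does it take?"
)
print(f"Length-based:     {length_difficulty(question):.2f}")
print(f"Word-count-based: {word_count_difficulty(question):.2f}")
```

Either function could replace the `"difficulty"` expression inside `collect_data()`; the choice of `scale` only affects the range of the resulting scores.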

### To save output to file:
Uncomment and adapt this code in any cell that runs after `processed_data` is defined:
```python
# with open("data_out.json", "w") as f:
#     json.dump(processed_data, f, indent=2)
```
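
One way to check that the file was written correctly is to read it back with `json.load` and compare it with the in-memory data. The sketch below assumes a one-record stand-in for `processed_data` and writes to a temporary directory so that it does not overwrite anything; in the notebook, the path would simply be `"data_out.json"`.

```python
import json
import os
import tempfile

# Stand-in for the notebook's processed_data (one record, for illustration only).
processed_data = [
    {"id": "example_000", "question": "What is 2+2?", "answer": "4", "difficulty": 0.12},
]

# Temporary location used for this sketch; replace with "data_out.json" in the notebook.
out_path = os.path.join(tempfile.mkdtemp(), "data_out.json")

with open(out_path, "w") as f:
    json.dump(processed_data, f, indent=2)

# Read the file back and confirm the round trip preserved the data.
with open(out_path) as f:
    reloaded = json.load(f)

print(f"Re-read {len(reloaded)} examples; matches in-memory data: {reloaded == processed_data}")
```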

### To use with real Hugging Face datasets:
1. Install: `pip install datasets`
2. Replace the `sample_dataset` list with: `load_dataset("gsm8k", "main", split="test[:200]")`
3. Restore the `datasets` import that was removed from the first code cell (presumably `from datasets import load_dataset`)
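
Put together, the swap might look like the sketch below. `load_gsm8k_subset` is a hypothetical helper introduced here, not part of the original script, and calling it requires the `datasets` package and network access, so the call is left commented out. `collect_data()` is restated in compact form so the block runs on its own.

```python
def collect_data(dataset):
    """Same logic as the notebook's collect_data(), written as a list comprehension."""
    return [
        {
            "id": f"example_{i:03d}",
            "question": example["question"],
            "answer": example["answer"],
            "difficulty": len(example["question"]) / 100,
        }
        for i, example in enumerate(dataset)
    ]


def load_gsm8k_subset(split="test[:200]"):
    """Hypothetical helper: fetch a GSM8K split from the Hugging Face Hub."""
    # Imported inside the function so the rest of the notebook
    # still runs when `datasets` is not installed.
    from datasets import load_dataset

    return load_dataset("gsm8k", "main", split=split)


# Uncomment to run against the real dataset (downloads data on first use):
# processed_data = collect_data(load_gsm8k_subset())
# print(f"Collected {len(processed_data)} examples")
```

Because `collect_data()` only iterates over its argument and reads the `question` and `answer` keys, it should accept either the inline list or the loaded dataset without changes.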

In [None]:
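# Expected output format from original requirements
# Note: the difficulty values below only illustrate the format. They are not
# produced by collect_data(), which would give 0.12, 0.19, and 0.18 for these questions.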
expected_output = [
    {
        "id": "example_000",
        "question": "What is 2+2?",
        "answer": "4",
        "difficulty": 0.15
    },
    {
        "id": "example_001",
        "question": "If x=5, what is 2x?",
        "answer": "10",
        "difficulty": 0.22
    },
    {
        "id": "example_002",
        "question": "Solve: 3y + 6 = 15",
        "answer": "y=3",
        "difficulty": 0.28
    }
]

print("Expected output format:")
print(json.dumps(expected_output, indent=2))

## Expected Output Reference

For comparison, here's the expected output format (from the original artifact requirements):

In [None]:
# In the original script, this would save to a file:
# with open("data_out.json", "w") as f:
#     json.dump(data, f, indent=2)

# Instead, let's display the JSON output that would have been saved
print("JSON output (would be saved to data_out.json):")
print("=" * 50)
print(json.dumps(processed_data, indent=2))

print(f"\nOriginal script would output: 'Collected {len(processed_data)} examples'")

## JSON Output

In the original script, the processed data would be saved to `data_out.json`. Here's what that output would look like:

In [None]:
# Display all processed data
print("All processed examples:")
print("=" * 50)

for item in processed_data:
    print(f"\nID: {item['id']}")
    print(f"Difficulty: {item['difficulty']:.2f}")
    print(f"Question: {item['question'][:100]}{'...' if len(item['question']) > 100 else ''}")
    print(f"Answer: {item['answer'][:100]}{'...' if len(item['answer']) > 100 else ''}")
    print("-" * 30)

## View All Processed Data

Let's examine all the processed examples and their difficulty scores:

In [None]:
def collect_data(dataset):
    """Collect benchmark data for DKW controller evaluation."""
    # Modified to work with inline data instead of HuggingFace datasets
    # Original: ds = load_dataset("gsm8k", "main", split="test[:200]")
    
    data = []
    for i, example in enumerate(dataset):
        data.append({
            "id": f"example_{i:03d}",
            "question": example["question"],
            "answer": example["answer"],
            "difficulty": len(example["question"]) / 100,  # Simple proxy based on question length
        })

    return data

# Test the function with our sample data
processed_data = collect_data(sample_dataset)
print(f"Processed {len(processed_data)} examples")
print("\nFirst example:")
print(json.dumps(processed_data[0], indent=2))

## Data Processing Function

The `collect_data()` function processes the raw dataset, giving each example a sequential `id` and a `difficulty` score:

In [None]:
# Sample data that mimics the GSM8K dataset structure
# In the original script, this would come from: load_dataset("gsm8k", "main", split="test[:200]")
sample_dataset = [
    {
        "question": "Janet's ducks lay 16 eggs per day. She eats 3 for breakfast every morning and bakes 4 into muffins for her friends every day. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day?",
        "answer": "Janet's ducks lay 16 eggs per day.\nShe eats 3 for breakfast every morning.\nShe bakes 4 into muffins for her friends every day.\nSo she uses 3 + 4 = 7 eggs.\nShe has 16 - 7 = 9 eggs left to sell.\nShe sells them for $2 each, so she makes 9 * $2 = $18 every day.\n#### 18"
    },
    {
        "question": "A robe takes 2 bolts of blue fiber and half that much white fiber. How many bolts of fiber does it take?",
        "answer": "It takes 2 bolts of blue fiber.\nIt takes half that much white fiber, so 2 / 2 = 1 bolt of white fiber.\nSo in total it takes 2 + 1 = 3 bolts of fiber.\n#### 3"
    },
    {
        "question": "Josh decides to try flipping a house. He buys a house for $80,000 and then puts in $50,000 in repairs. This increased the value of the house by 150%. How much profit did he make?",
        "answer": "He bought the house for $80,000 and put in $50,000 in repairs for a total cost of 80,000 + 50,000 = $130,000.\nThe value increased by 150%, so the new value is 80,000 * 2.5 = $200,000.\nSo he made a profit of 200,000 - 130,000 = $70,000.\n#### 70000"
    },
    {
        "question": "James decides to run 3 sprints 3 times a week. He runs 60 meters each sprint. How many meters does he run a week?",
        "answer": "He runs 3 sprints 3 times a week so he runs 3*3 = 9 sprints a week.\nEach sprint is 60 meters so he runs 9*60 = 540 meters a week.\n#### 540"
    },
    {
        "question": "Every day, Wendi feeds each of her chickens three cups of mixed chicken feed, containing seeds, mealworms, and vegetables. She gives the chickens their feed in three separate meals. How many cups of feed does she give the chickens in the first meal of the day?",
        "answer": "If each chicken gets 3 cups of feed per day, and the feed is given in 3 meals, then each chicken gets 3/3 = 1 cup of feed per meal.\nThe question asks about the first meal of the day for all the chickens, but doesn't specify how many chickens there are. However, the question seems to be asking about the per-chicken amount in the first meal.\n#### 1"
    }
]

print(f"Sample dataset contains {len(sample_dataset)} examples")

## Sample Data

Instead of loading from Hugging Face datasets, we'll use inline sample data that mimics the structure of the GSM8K dataset:

In [None]:
"""Dataset collection script for DKW benchmark."""
import json
# Note: We've removed the 'datasets' import as we're using inline data instead

## Overview

This notebook processes mathematical question-answer data for benchmarking purposes. Instead of loading data from external Hugging Face datasets, it uses inline sample data so that it is completely self-contained.

The original script would:
1. Load data from the GSM8K dataset
2. Process each example to add metadata like difficulty scores
3. Save the processed data to a JSON file

In this notebook version, we'll demonstrate the same functionality with sample data.

# Dataset Collection Script for DKW Benchmark

**Artifact ID:** dataset_001  
**Original File:** data.py

This notebook demonstrates dataset collection and processing for DKW controller evaluation. The original script has been converted to be completely self-contained with inline sample data.