## 🛠️ Customization & Next Steps

### Easy Modifications:
1. **Add more examples**: Extend the `sample_dataset` list with new questions/answers
2. **Change difficulty calculation**: Modify the difficulty formula in `collect_data()`
3. **Add metadata**: Include additional fields like topic, source, etc.
4. **Filter examples**: Add conditions to process only certain types of questions

### Example: Add a new question
```python
sample_dataset.append({
    "question": "What is 5 × 7?",
    "answer": "5 × 7 = 35"
})
```

### Example: Custom difficulty metric
```python
# Replace: difficulty = len(example["question"]) / 100
# With: difficulty = len(example["question"].split()) / 20  # Based on word count
```
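
### Example: Add metadata and filter examples
Modifications 3 and 4 from the list above could look roughly like the sketch below. The helper `collect_with_options`, its `keep` and `extra_fields` parameters, and the `topic`/`source` values are hypothetical names introduced for illustration; they are not part of the original script. Note that this sketch numbers IDs over the kept examples only, which is one possible design choice.

```python
from typing import Any, Callable, Dict, List, Optional


def collect_with_options(
    dataset: List[Dict[str, Any]],
    keep: Optional[Callable[[Dict[str, Any]], bool]] = None,
    extra_fields: Optional[Dict[str, Any]] = None,
) -> List[Dict[str, Any]]:
    """Hypothetical variant of collect_data() with a filter and constant metadata."""
    data = []
    for example in dataset:
        if keep is not None and not keep(example):
            continue  # skip examples that fail the filter
        record = {
            "id": f"example_{len(data):03d}",  # IDs count kept examples only
            "question": example["question"],
            "answer": example["answer"],
            "difficulty": len(example["question"]) / 100,
        }
        record.update(extra_fields or {})  # attach the same metadata to every record
        data.append(record)
    return data


demo = [
    {"question": "What is 2+2?", "answer": "4"},
    {"question": "Solve: 3y + 6 = 15", "answer": "y = 3"},
]

# Keep only questions that start with "Solve" and tag them with a topic and source.
result = collect_with_options(
    demo,
    keep=lambda ex: ex["question"].startswith("Solve"),
    extra_fields={"topic": "algebra", "source": "inline"},
)
print(result)
```

If per-example metadata is needed instead of constant fields, the same pattern would work by copying extra keys from each input example.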

---
**🎉 Notebook Complete!** This self-contained version replaces the original script's HuggingFace dependency and file I/O with inline data and interactive output.

In [None]:
# Display the JSON output (replaces writing to file)
print("📄 Final JSON structure (first 2 examples):")
print("=" * 50)

# Show formatted JSON for first 2 examples
sample_output = processed_data[:2]
formatted_json = json.dumps(sample_output, indent=2)
print(formatted_json)

print("=" * 50)
print("\n💡 In the original script, this data would be saved to 'data_out.json'")
print(f"📈 Total examples processed: {len(processed_data)}")

# Simulate the original file saving behavior (commented out)
# with open("data_out.json", "w") as f:
#     json.dump(processed_data, f, indent=2)
# print(f"Collected {len(processed_data)} examples")
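
To try the file-saving path without leaving files behind, one option is to write into a temporary directory and read the result back. This is a minimal sketch, assuming records shaped like the output of `collect_data()`; the `records` list below is a stand-in, and `ensure_ascii=False` is an added choice that the original script does not use.

```python
import json
import os
import tempfile

# Stand-in records with the same keys that collect_data() produces
records = [
    {"id": "example_000", "question": "What is 1+1?", "answer": "1+1=2", "difficulty": 0.12},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "data_out.json")
    with open(out_path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps characters such as "×" readable in the file
        json.dump(records, f, indent=2, ensure_ascii=False)
    with open(out_path, encoding="utf-8") as f:
        loaded = json.load(f)

# The round trip should preserve the records exactly
print(loaded == records)
```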

## 💾 Output Format

This shows the final JSON structure that would be saved to `data_out.json`:

In [None]:
# Process the full dataset
processed_data = collect_data()

print(f"✅ Collected {len(processed_data)} examples")
print("\n📊 Dataset Summary:")
print(f"   Total examples: {len(processed_data)}")
print(f"   Average difficulty: {sum(item['difficulty'] for item in processed_data) / len(processed_data):.3f}")
print(f"   Min difficulty: {min(item['difficulty'] for item in processed_data):.3f}")
print(f"   Max difficulty: {max(item['difficulty'] for item in processed_data):.3f}")

print("\n🔍 Sample processed examples:")
for i, item in enumerate(processed_data[:3]):
    print(f"\n   [{i+1}] {item['id']}")
    print(f"       Question: {item['question']}")
    print(f"       Answer: {item['answer'][:60]}{'...' if len(item['answer']) > 60 else ''}")
    print(f"       Difficulty: {item['difficulty']:.3f}")
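
The summary above divides by `len(processed_data)`, so it would fail on an empty list. A possible alternative is sketched below using the standard `statistics` module with an explicit empty-input guard; the `summarize` helper is a hypothetical name, not part of the original script.

```python
from statistics import mean
from typing import Any, Dict, List


def summarize(records: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Return count, mean, min, and max of the 'difficulty' field."""
    if not records:
        return {"count": 0}  # nothing to average, so report only the count
    scores = [r["difficulty"] for r in records]
    return {
        "count": len(scores),
        "mean": mean(scores),
        "min": min(scores),
        "max": max(scores),
    }


print(summarize([{"difficulty": 0.1}, {"difficulty": 0.3}]))
print(summarize([]))
```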

## 🚀 Process Full Dataset

Now let's process all examples and see the results!

In [None]:
def collect_data(dataset: List[Dict[str, Any]] = None) -> List[Dict[str, Any]]:
    """
    Collect benchmark data for DKW controller evaluation.
    
    Args:
        dataset: Input dataset examples (uses sample_dataset if None)
    
    Returns:
        List of processed examples with IDs, questions, answers, and difficulty scores
    """
    if dataset is None:
        dataset = sample_dataset
    
    data = []
    for i, example in enumerate(dataset):
        data.append({
            "id": f"example_{i:03d}",
            "question": example["question"],
            "answer": example["answer"], 
            "difficulty": len(example["question"]) / 100,  # Simple proxy metric
        })
    
    return data

# Test the function with a single example
test_example = [{"question": "What is 1+1?", "answer": "1+1=2"}]
test_result = collect_data(test_example)
print("🧪 Function test successful!")
print(f"   Input: {test_example[0]['question']}")
print(f"   Output ID: {test_result[0]['id']}")
print(f"   Difficulty: {test_result[0]['difficulty']:.2f}")

## 🔧 Data Processing Function

The `collect_data()` function processes raw examples and adds:
- **Unique IDs** for tracking
- **Difficulty scores** based on question length (simple proxy)
- **Structured format** for benchmark evaluation
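
As a worked illustration of the three additions listed above, the snippet below builds one record by hand for the first example in a dataset (index 0). It mirrors the formula used in `collect_data()`: the difficulty is the character count of the question, including spaces and punctuation, divided by 100.

```python
question = "What is 1+1?"  # 12 characters, counting spaces and punctuation

record = {
    "id": f"example_{0:03d}",            # zero-padded index -> "example_000"
    "question": question,
    "answer": "1+1=2",
    "difficulty": len(question) / 100,   # 12 / 100
}

print(record["id"], record["difficulty"])  # prints: example_000 0.12
```

Because the ID is derived from the position in the input list, it is unique only within a single call.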

In [None]:
# Sample dataset (simulates HuggingFace GSM8K dataset structure)
# This replaces: ds = load_dataset("gsm8k", "main", split="test[:200]")

sample_dataset = [
    {
        "question": "What is 2+2?",
        "answer": "To find 2+2, I add the numbers: 2 + 2 = 4"
    },
    {
        "question": "If x=5, what is 2x?", 
        "answer": "If x = 5, then 2x = 2 * 5 = 10"
    },
    {
        "question": "Solve: 3y + 6 = 15",
        "answer": "To solve 3y + 6 = 15, I subtract 6 from both sides: 3y = 9. Then divide by 3: y = 3"
    },
    {
        "question": "A store has 24 apples. If they sell 3/4 of them, how many apples are left?",
        "answer": "3/4 of 24 apples = (3/4) * 24 = 18 apples sold. Remaining apples = 24 - 18 = 6 apples"
    },
    {
        "question": "Calculate the area of a rectangle with length 8 cm and width 5 cm.",
        "answer": "Area of rectangle = length × width = 8 cm × 5 cm = 40 square cm"
    }
]

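print(f"📋 Loaded {len(sample_dataset)} sample examples")
print("🔍 Preview of first example:")
print(f"   Question: {sample_dataset[0]['question']}")
# Append "..." only when the answer is actually truncated
first_answer = sample_dataset[0]["answer"]
print(f"   Answer: {first_answer[:50]}{'...' if len(first_answer) > 50 else ''}")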

## 📊 Sample Dataset

Instead of loading from HuggingFace (which would require internet access), we'll use inline sample data.  
This simulates the GSM8K dataset structure that would normally be loaded.
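
When internet access is available, the two data sources could sit behind one function. The sketch below is an assumption-laden illustration: `load_examples`, `use_hub`, `limit`, and `FALLBACK_EXAMPLES` are hypothetical names, and the `load_dataset` call is the one quoted from the original script. It falls back to inline data if the `datasets` package or the network is unavailable.

```python
from typing import Any, Dict, List

# Hypothetical inline fallback using the same "question"/"answer" keys
FALLBACK_EXAMPLES: List[Dict[str, Any]] = [
    {"question": "What is 2+2?", "answer": "2 + 2 = 4"},
]


def load_examples(use_hub: bool = False, limit: int = 200) -> List[Dict[str, Any]]:
    """Return question/answer dicts from the Hub if requested, else inline data."""
    if use_hub:
        try:
            # Third-party package; requires installation and network access
            from datasets import load_dataset

            ds = load_dataset("gsm8k", "main", split=f"test[:{limit}]")
            return [{"question": row["question"], "answer": row["answer"]} for row in ds]
        except Exception as exc:  # missing package, no network, and similar failures
            print(f"Falling back to inline data: {exc}")
    return list(FALLBACK_EXAMPLES)  # copy so callers cannot mutate the fallback


examples = load_examples(use_hub=False)
print(f"Loaded {len(examples)} example(s)")
```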

In [None]:
"""Dataset collection script for DKW benchmark."""
import json
from typing import List, Dict, Any

print("📦 Imports loaded successfully!")
print("🎯 Ready to process dataset examples")

## Overview

This notebook demonstrates how to:
1. **Load and process dataset examples** for benchmark evaluation
2. **Calculate difficulty metrics** based on question complexity
3. **Structure data** for downstream DKW controller testing

**Key Features:**
- ✅ Self-contained (no external files required)
- ✅ Interactive and modifiable
- ✅ Well-documented with examples

---

# Dataset Collection for DKW Benchmark
## data.py - Interactive Demo

This notebook demonstrates the dataset collection script for DKW controller evaluation. It has been converted from the original Python script to be completely self-contained and interactive.