## Usage Instructions & Customization

### How to Use This Notebook
1. **Run all cells sequentially** - Click "Run All" or execute each cell with Shift+Enter
2. **Modify the data** - Edit the `sample_raw_data` list to add your own questions and answers
3. **Adjust difficulty calculation** - Modify the difficulty formula in the `collect_data()` function
4. **Export data** - The `json_output` variable contains the formatted JSON string
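
Step 4 leaves the JSON only in memory. If a file on disk is wanted, the string can be written out and read back as a check. The following is a minimal sketch, not part of the original notebook: the file name `dkw_dataset.json` is hypothetical, and `data_output` here is a one-record stand-in with the same keys the notebook produces.

```python
import json
import tempfile
from pathlib import Path

# Stand-in for the notebook's `data_output` (same keys, illustrative values).
data_output = [
    {"id": "example_000", "question": "What is 2+2?", "answer": "4", "difficulty": 0.12},
]

json_output = json.dumps(data_output, indent=2)

# Hypothetical output location; a temporary directory avoids touching the working tree.
out_path = Path(tempfile.mkdtemp()) / "dkw_dataset.json"
out_path.write_text(json_output, encoding="utf-8")

# Round-trip check: the file should load back to the same structure.
reloaded = json.loads(out_path.read_text(encoding="utf-8"))
print(reloaded == data_output)
```

In the notebook itself, the stand-in list would be dropped and the existing `json_output` variable written directly.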

### Customization Options
- **Add more examples**: Extend the `sample_raw_data` list
- **Change difficulty metric**: Modify the calculation in `collect_data()`
- **Add new fields**: Extend the data structure with additional metadata
- **Save to file**: Write the `json_output` string to disk if a file is needed; the notebook itself keeps results in memory
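
As one possible way to combine the "change difficulty metric" and "add new fields" options, the sketch below passes the metric in as a function and attaches an extra field. The names `collect_data_custom`, `word_count_difficulty`, and `num_words`, as well as the divisor of 20, are illustrative assumptions and do not appear in the original script.

```python
from typing import Any, Callable, Dict, List


def word_count_difficulty(question: str) -> float:
    """Hypothetical alternative metric: whitespace-separated token count / 20."""
    return len(question.split()) / 20


def collect_data_custom(
    raw_data: List[Dict[str, str]],
    difficulty_fn: Callable[[str], float] = word_count_difficulty,
) -> List[Dict[str, Any]]:
    """Variant of collect_data() with a pluggable metric and one extra field."""
    data = []
    for i, example in enumerate(raw_data):
        question = example["question"]
        data.append({
            "id": f"example_{i:03d}",
            "question": question,
            "answer": example["answer"],
            "difficulty": difficulty_fn(question),
            "num_words": len(question.split()),  # illustrative extra metadata
        })
    return data


custom = collect_data_custom([{"question": "If x=5, what is 2x?", "answer": "10"}])
print(custom[0])
```

Passing `difficulty_fn=lambda q: len(q) / 100` would reproduce the notebook's length-based proxy while keeping the extra field.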

### Original vs. Self-Contained Version
- ✅ **Original**: Used HuggingFace datasets library and external files
- ✅ **This version**: Uses inlined data for complete self-containment
- ✅ **Maintained**: All core functionality and data structure
- ✅ **Added**: Interactive analysis and visualization

This notebook is now **runnable with no dependencies** beyond the Python standard library (`json` and `typing`).

In [None]:
# Display all processed examples in a readable format
print("📊 PROCESSED DATASET SUMMARY")
print("=" * 50)

for i, item in enumerate(data_output):
    print(f"\nExample {i+1}: {item['id']}")
    print(f"Question: {item['question']}")
    print(f"Answer: {item['answer']}")
    print(f"Difficulty: {item['difficulty']:.2f}")
    print("-" * 30)

# Statistics
difficulties = [item['difficulty'] for item in data_output]
print("\n📈 STATISTICS:")
print(f"Total examples: {len(data_output)}")
print(f"Min difficulty: {min(difficulties):.2f}")
print(f"Max difficulty: {max(difficulties):.2f}")
print(f"Average difficulty: {sum(difficulties)/len(difficulties):.2f}")

# Export as JSON string (equivalent to original file output)
json_output = json.dumps(data_output, indent=2)
print("\n💾 JSON OUTPUT (first 200 characters):")
# Parentheses make the precedence explicit: truncate and add "..." only for long output
print((json_output[:200] + "...") if len(json_output) > 200 else json_output)

## Data Analysis & Visualization

Let's examine the processed data in detail and display it in a readable format.

In [None]:
# Process the sample data
processed_data = collect_data(sample_raw_data)

# Display summary
print(f"Successfully collected {len(processed_data)} examples")
print(f"Average difficulty: {sum(item['difficulty'] for item in processed_data) / len(processed_data):.2f}")

# Instead of saving to file, store in variable for analysis
data_output = processed_data

print("\n✅ Data processing complete!")

## Main Execution

Process the sample data and display the results. Instead of saving to an external file, we'll store the results in a variable for immediate analysis.

In [None]:
def collect_data(raw_data: List[Dict[str, str]]) -> List[Dict[str, Any]]:
    """
    Collect and process benchmark data for DKW controller evaluation.
    
    Args:
        raw_data: List of dictionaries with 'question' and 'answer' keys
        
    Returns:
        List of processed examples with additional metadata
    """
    data = []
    for i, example in enumerate(raw_data):
        data.append({
            "id": f"example_{i:03d}",
            "question": example["question"],
            "answer": example["answer"],
            "difficulty": len(example["question"]) / 100,  # Simple proxy based on question length
        })
    
    return data

# Test the function with a single example
test_example = [{"question": "Test question", "answer": "Test answer"}]
test_result = collect_data(test_example)
print("Function test successful:")
print(f"Input: {test_example[0]}")
print(f"Output: {test_result[0]}")

## Data Processing Function

The `collect_data()` function processes the raw data and adds additional metadata:
- Assigns unique IDs to each example
- Calculates a difficulty metric based on question length
- Structures the data for benchmark evaluation
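
To make the two derived fields concrete, here is a small worked example of the ID format and the length-based difficulty proxy, using the first sample question. It mirrors the expressions in `collect_data()` and is only an illustration; the index `7` is an arbitrary value chosen to show the zero padding.

```python
question = "What is 2+2?"

# Difficulty proxy: character count of the question divided by 100.
difficulty = len(question) / 100
print(f"{len(question)} characters -> difficulty {difficulty:.2f}")

# IDs are zero-padded to three digits from the example's position in the list.
print(f"example_{7:03d}")
```

This prints `12 characters -> difficulty 0.12` followed by `example_007`. Because the metric only counts characters, a long but easy question scores higher than a short but hard one.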

In [None]:
# Illustrative sample data in the style of GSM8K math problems
# (replaces the external HuggingFace dataset for demonstration)
sample_raw_data = [
    {
        "question": "What is 2+2?",
        "answer": "4"
    },
    {
        "question": "If x=5, what is 2x?", 
        "answer": "10"
    },
    {
        "question": "Solve: 3y + 6 = 15",
        "answer": "y=3"
    },
    {
        "question": "A store has 45 apples. If they sell 18 apples in the morning and 12 apples in the afternoon, how many apples do they have left?",
        "answer": "45 - 18 - 12 = 15 apples"
    },
    {
        "question": "Sarah has 3 times as many books as Tom. If Tom has 8 books, how many books does Sarah have?",
        "answer": "3 × 8 = 24 books"
    }
]

print(f"Loaded {len(sample_raw_data)} sample questions")

## Sample Dataset

Instead of loading from external files or HuggingFace, we'll use inlined sample data to make this notebook completely self-contained. This represents the kind of data that would be collected from the GSM8K dataset.
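
Because the inlined data is edited by hand, a quick structural check before processing can catch missing or empty fields. The helper below is a hypothetical addition, not part of the original script. It assumes every record must carry non-empty string values under `question` and `answer`.

```python
from typing import Dict, List

REQUIRED_KEYS = {"question", "answer"}


def find_invalid_records(records: List[Dict[str, str]]) -> List[int]:
    """Return indices of records missing a required key or holding an empty/non-string value."""
    bad = []
    for i, rec in enumerate(records):
        ok = REQUIRED_KEYS.issubset(rec) and all(
            isinstance(rec[key], str) and rec[key].strip() for key in REQUIRED_KEYS
        )
        if not ok:
            bad.append(i)
    return bad


records = [
    {"question": "What is 2+2?", "answer": "4"},
    {"question": "Missing answer"},   # no "answer" key
    {"question": "", "answer": "0"},  # empty question
]
print(find_invalid_records(records))
```

Running the same check on `sample_raw_data` should return an empty list if every entry is well formed.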

In [None]:
"""Dataset collection script for DKW benchmark."""
import json
from typing import List, Dict, Any

print("Libraries imported successfully!")

## Imports

Required Python libraries for data processing and JSON handling.

# Dataset Collection for DKW Benchmark

**Artifact:** dataset_001 (data.py)

This notebook demonstrates data collection and processing for the DKW controller evaluation benchmark. The original script has been converted to a self-contained format with inlined data to eliminate external dependencies.

## Overview
- Processes benchmark data for evaluation
- Calculates difficulty metrics
- Generates structured output for analysis