## ✨ Customization Guide

**To modify this notebook for your needs:**

1. **Add more data**: Extend the `sample_raw_data` list with your own questions and answers
2. **Change difficulty calculation**: Modify the difficulty scoring logic in `collect_data()`
3. **Add new fields**: Extend the data structure to include additional metadata
4. **Export data**: Write the `json_output` string to a file if you need persistent storage

**Example: Adding a new question**
```python
sample_raw_data.append({
    "question": "What is the square root of 16?",
    "answer": "4"
})
```

**Example: Custom difficulty scoring**
```python
# Replace the simple length-based scoring with keyword-based difficulty
def calculate_difficulty(question):
    complex_keywords = ['solve', 'calculate', 'area', 'square root']
    return 0.8 if any(keyword in question.lower() for keyword in complex_keywords) else 0.3
```
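
**Example: Adding a new field**

A minimal sketch for item 3, assuming you want to tag every record with a topic. The `add_topic` helper and the `topic` field are hypothetical names, not part of this notebook, and the record below is an illustrative stand-in for the output of `collect_data()`.

```python
def add_topic(records, topic="arithmetic"):
    """Return copies of the records with a hypothetical 'topic' field added."""
    # Build new dicts so the original records are left unchanged.
    return [{**record, "topic": topic} for record in records]

# Illustrative stand-in for one processed record.
records = [
    {"id": "example_000", "question": "What is 2+2?", "answer": "4", "difficulty": 0.12}
]
tagged = add_topic(records)
print(tagged[0])
```

Returning copies rather than mutating in place keeps the original dataset available for comparison.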

This notebook is completely self-contained and ready for immediate use! 🚀

In [None]:
# Display sample of processed data
print("🔍 Sample of processed dataset:")
print("="*50)
for i, item in enumerate(processed_dataset[:3]):
    print(f"\nExample {i+1}:")
    print(f"  ID: {item['id']}")
    print(f"  Question: {item['question']}")
    print(f"  Answer: {item['answer']}")
    print(f"  Difficulty: {item['difficulty']:.3f}")

# Basic statistics
print("\n📊 Dataset Statistics:")
print("="*50)
print(f"Total examples: {len(processed_dataset)}")

difficulties = [item['difficulty'] for item in processed_dataset]
print(f"Difficulty range: {min(difficulties):.3f} - {max(difficulties):.3f}")
print(f"Average difficulty: {sum(difficulties)/len(difficulties):.3f}")

# Question length analysis
lengths = [len(item['question']) for item in processed_dataset]
print(f"Question length range: {min(lengths)} - {max(lengths)} characters")
print(f"Average question length: {sum(lengths)/len(lengths):.1f} characters")

print("\n🎯 Dataset ready for DKW benchmark evaluation!")

## 📋 Results & Analysis

Let's examine the processed dataset and analyze the results. This section shows the final structure and some basic statistics.
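
The same kind of summary can also be computed with the standard-library `statistics` module. This is a sketch only: the `difficulties` list holds illustrative stand-in values rather than the notebook's actual output, and `summary` is a hypothetical name.

```python
import statistics

# Illustrative stand-in difficulty scores.
difficulties = [0.12, 0.19, 0.18]

summary = {
    "min": min(difficulties),
    "max": max(difficulties),
    "mean": statistics.mean(difficulties),
    "median": statistics.median(difficulties),
}
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

The median is less sensitive than the mean to a single unusually long question, which may matter for a length-based score.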

In [None]:
# Main execution - process all data
data = collect_data(sample_raw_data)

# Instead of writing to file, store in memory for self-contained execution
processed_dataset = data

print(f"✅ Successfully collected {len(data)} examples")
print("📊 Dataset ready for DKW benchmark evaluation")

# Optional: Save to JSON string for inspection (equivalent to file output)
json_output = json.dumps(data, indent=2)
print(f"\n📝 JSON representation ready ({len(json_output)} characters)")

## 🚀 Main Execution

Process all the sample data and generate the final benchmark dataset. Instead of saving to an external file, we'll store the results in memory for immediate use.
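
If you do want a file on disk, the sketch below shows one way to write the records out and read them back. `save_dataset` and the `dataset.json` filename are hypothetical, and the sketch writes to a temporary directory so that running it leaves nothing behind.

```python
import json
import tempfile
from pathlib import Path

def save_dataset(records, path):
    """Write records to a JSON file and return the path that was written."""
    path = Path(path)
    path.write_text(json.dumps(records, indent=2), encoding="utf-8")
    return path

# Illustrative stand-in for the processed dataset.
records = [
    {"id": "example_000", "question": "What is 2+2?", "answer": "4", "difficulty": 0.12}
]

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = save_dataset(records, Path(tmp_dir) / "dataset.json")
    # Read the file back to confirm the round trip preserves the data.
    reloaded = json.loads(out_path.read_text(encoding="utf-8"))

print(reloaded == records)
```

In your own copy of the notebook, replace the temporary directory with whatever output location you need.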

In [None]:
def collect_data(raw_data):
    """Collect benchmark data for DKW controller evaluation."""
    # Process the inline sample data instead of loading from HuggingFace
    data = []
    for i, example in enumerate(raw_data):
        data.append({
            "id": f"example_{i:03d}",
            "question": example["question"],
            "answer": example["answer"],
            "difficulty": len(example["question"]) / 100,  # Simple proxy based on question length
        })
    
    return data

# Test the function with a small subset first
print("Function defined successfully!")
print("Preview of processing logic:")
preview_data = collect_data(sample_raw_data[:2])
for item in preview_data:
    print(f"  {item}")

## 🔧 Data Processing Function

The `collect_data()` function processes the raw data and adds metadata including:
- Unique ID for each example
- Difficulty score based on question length (simple heuristic)
- Structured format for benchmark evaluation
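
To make the record shape concrete, here is a sketch that builds one record by hand and checks it against the fields listed above. `validate_record` and `REQUIRED_FIELDS` are hypothetical names introduced for illustration and do not appear elsewhere in this notebook.

```python
# Expected fields and types for one processed record.
REQUIRED_FIELDS = {"id": str, "question": str, "answer": str, "difficulty": float}

def validate_record(record):
    """Return a list of problems found in one record (empty if there are none)."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"{field} should be {expected_type.__name__}")
    return problems

question = "What is 2+2?"
record = {
    "id": "example_000",
    "question": question,
    "answer": "4",
    "difficulty": len(question) / 100,  # 12 characters -> 0.12
}
print(validate_record(record))
```

A check like this can catch a misspelled or missing key before the data reaches an evaluation step.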

In [None]:
# Sample dataset - inlined for self-contained execution
# This represents the type of data that would be loaded from external sources
sample_raw_data = [
    {"question": "What is 2+2?", "answer": "4"},
    {"question": "If x=5, what is 2x?", "answer": "10"},
    {"question": "Solve: 3y + 6 = 15", "answer": "y=3"},
    # Add more examples to simulate larger dataset
    {"question": "What is 15 - 7?", "answer": "8"},
    {"question": "Calculate 3 × 4", "answer": "12"},
    {"question": "What is 20 ÷ 4?", "answer": "5"},
    {"question": "If a = 8 and b = 3, what is a + b?", "answer": "11"},
    {"question": "Solve for x: x + 7 = 12", "answer": "x = 5"},
    {"question": "What is 25% of 100?", "answer": "25"},
    {"question": "Calculate the area of a rectangle with length 6 and width 4", "answer": "24"}
]

print(f"Loaded {len(sample_raw_data)} sample examples")

## 📊 Sample Dataset

Since this is a self-contained notebook, we'll use inline sample data instead of loading from external sources. This data represents the type of mathematical questions and answers that would be processed in the DKW benchmark.
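
For comparison, the sketch below shows how records of the same shape might be parsed from JSON text instead of being written inline. It assumes the external source provides a JSON array of objects with `question` and `answer` keys. `load_raw_data` is a hypothetical helper, and the JSON string is a stand-in for real file contents.

```python
import json

def load_raw_data(json_text):
    """Parse a JSON array of question/answer objects into a list of dicts."""
    rows = json.loads(json_text)
    # Keep only the two fields the notebook uses, normalized to strings.
    return [
        {"question": str(row["question"]), "answer": str(row["answer"])}
        for row in rows
    ]

# Stand-in for the contents of an external file.
json_text = (
    '[{"question": "What is 2+2?", "answer": "4"},'
    ' {"question": "What is 15 - 7?", "answer": 8}]'
)
loaded = load_raw_data(json_text)
print(loaded[1])
```

Converting answers to strings keeps them consistent with the inline samples, where every answer is a string.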

In [None]:
"""Dataset collection script for DKW benchmark."""
import json

## 📦 Dependencies

Import the required libraries for data processing and JSON handling.

# DKW Benchmark Dataset Collection

This notebook demonstrates dataset collection and processing for DKW (Deep Knowledge Workers) controller evaluation. 

**Artifact Information:**
- ID: dataset_001
- Name: data.py
- Purpose: Collect and process benchmark data for evaluation

The notebook is completely self-contained and doesn't require any external files.