## Usage Instructions

### How to Use This Notebook:

1. **Run all cells**: Execute cells in order from top to bottom
2. **Modify parameters**: 
   - Change `split="test[:200]"` to adjust the number of examples
   - Modify the difficulty calculation in the `collect_data()` function
   - Add your own sample data to the `sample_data` list
3. **Experiment**: The notebook needs no local files. The dataset is downloaded from HuggingFace, and the analysis cell falls back to `sample_data` if that download fails.
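As a minimal sketch of the first parameter change, the split string can be built from a variable instead of being edited by hand. The helper name `make_split` is hypothetical and not part of the notebook; it only produces strings in the `test[:200]` form that `collect_data()` already passes to `load_dataset`.

```python
def make_split(name="test", limit=None):
    """Build a split string such as "test[:200]".

    `make_split` is an illustrative helper, not part of the notebook.
    With limit=None the whole split is requested.
    """
    if limit is None:
        return name
    return f"{name}[:{limit}]"


# Reproduces the value currently hard-coded in collect_data()
print(make_split("test", 200))
```

The result could then be passed as `split=make_split("test", 50)` inside `collect_data()` to collect fewer examples.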

### Key Features:
- ✅ **Self-contained**: No external file dependencies
- 🔄 **Interactive**: Easy to modify and re-run
- 📊 **Informative**: Includes data analysis and export functionality
- 🛡️ **Robust**: Graceful fallback to sample data if HuggingFace is unavailable

### Next Steps:
- Integrate this data collection into your DKW controller evaluation pipeline
- Add additional data processing or filtering as needed
- Expand the difficulty metric to be more sophisticated
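One possible direction for a richer difficulty metric, sketched under assumptions: it blends question length with the number of calculator annotations (`<<...>>`) found in GSM8K answers, used here as a rough count of reasoning steps. The function name, the equal weights, and the caps of 500 characters and 8 steps are all hypothetical choices, not values from this notebook.

```python
import re


def difficulty_score(question: str, answer: str) -> float:
    """Blend question length and step count into a score in [0, 1].

    The caps (500 characters, 8 steps) and the 50/50 weighting are
    illustrative assumptions, not tuned values.
    """
    length_term = min(len(question) / 500, 1.0)
    # Each <<...>> annotation in the answer marks one arithmetic step
    steps = len(re.findall(r"<<.*?>>", answer))
    step_term = min(steps / 8, 1.0)
    return round(0.5 * length_term + 0.5 * step_term, 3)


example_answer = "She sold 48/2 = <<48/2=24>>24 clips.\n48+24 = <<48+24=72>>72\n#### 72"
print(difficulty_score("How many clips did she sell in total?", example_answer))
```

To try it, the `"difficulty"` entry in `collect_data()` could call `difficulty_score(example["question"], example["answer"])` in place of the length-only proxy.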

In [None]:
# Use collected data if available, otherwise fall back to sample data
data_to_analyze = collected_data if 'collected_data' in locals() else sample_data

# Basic analysis
print("📈 Data Analysis:")
print(f"  Total examples: {len(data_to_analyze)}")

if data_to_analyze:
    difficulties = [item['difficulty'] for item in data_to_analyze]
    print(f"  Average difficulty: {sum(difficulties)/len(difficulties):.3f}")
    print(f"  Min difficulty: {min(difficulties):.3f}")
    print(f"  Max difficulty: {max(difficulties):.3f}")
    
    # Question length analysis
    question_lengths = [len(item['question']) for item in data_to_analyze]
    print(f"  Average question length: {sum(question_lengths)/len(question_lengths):.1f} characters")

# Export functionality (self-contained - saves to memory as JSON string)
def export_data(data, filename="data_out.json"):
    """Export data as JSON string (self-contained version)."""
    json_string = json.dumps(data, indent=2)
    print(f"\n💾 Data exported as JSON (would be saved to {filename}):")
    print("📄 JSON Preview (first 500 characters):")
    print((json_string[:500] + "...") if len(json_string) > 500 else json_string)
    return json_string

# Export the data
exported_json = export_data(data_to_analyze)

print("\n✅ Notebook is now completely self-contained!")
print(f"📊 Working with {len(data_to_analyze)} examples")
print("🔧 You can modify the collect_data() function or sample_data to experiment!")

## Data Analysis and Export

Let's analyze the collected data and provide export functionality:
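The notebook's `export_data()` only returns a JSON string and prints a preview. If an actual file is wanted, a minimal sketch could look like the following; `save_json` is a hypothetical helper, not part of the notebook, and the default filename simply mirrors the `data_out.json` name used elsewhere.

```python
import json
from pathlib import Path


def save_json(data, filename="data_out.json"):
    """Write `data` to `filename` as indented UTF-8 JSON and return the path.

    `save_json` is an illustrative helper, not part of the notebook.
    """
    path = Path(filename)
    path.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")
    return path
```

Calling `save_json(data_to_analyze)` after the analysis cell would write the same records that `export_data()` previews.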

In [None]:
# Execute the data collection
try:
    collected_data = collect_data()
    print(f"\n🎉 Successfully collected {len(collected_data)} examples!")

    # Show first few examples
    print("\n📊 First 3 examples:")
    for i, example in enumerate(collected_data[:3]):
        print(f"\nExample {i+1}:")
        print(f"  ID: {example['id']}")
        print(f"  Question: {example['question'][:100]}{'...' if len(example['question']) > 100 else ''}")
        print(f"  Answer: {example['answer'][:50]}{'...' if len(example['answer']) > 50 else ''}")
        print(f"  Difficulty: {example['difficulty']:.3f}")
        
except Exception as e:
    print(f"❌ Error during data collection: {e}")
    print("💡 You can still work with the sample_data above for testing!")

## Execute Data Collection

Now let's run the data collection function to gather real data from the GSM8K dataset:

In [None]:
# Sample data structure (inlined from data_out.json for demonstration)
sample_data = [
    {
        "id": "example_000",
        "question": "What is 2+2?",
        "answer": "4",
        "difficulty": 0.15
    },
    {
        "id": "example_001", 
        "question": "If x=5, what is 2x?",
        "answer": "10",
        "difficulty": 0.22
    },
    {
        "id": "example_002",
        "question": "Solve: 3y + 6 = 15",
        "answer": "y=3",
        "difficulty": 0.28
    }
]

print("📋 Sample data structure:")
for example in sample_data:
    print(f"  ID: {example['id']}")
    print(f"  Question: {example['question']}")
    print(f"  Answer: {example['answer']}")
    print(f"  Difficulty: {example['difficulty']}")
    print("  ---")

## Sample Data Preview

Before running the full data collection, here's what the output structure looks like with some sample examples:

In [None]:
def collect_data():
    """Collect benchmark data for DKW controller evaluation."""
    print("🔄 Loading HuggingFace dataset...")
    
    # Load HuggingFace dataset
    ds = load_dataset("gsm8k", "main", split="test[:200]")

    print(f"📊 Processing {len(ds)} examples...")
    data = []
    for i, example in enumerate(ds):
        data.append({
            "id": f"example_{i:03d}",
            "question": example["question"],
            "answer": example["answer"],
            "difficulty": len(example["question"]) / 100,  # Simple proxy
        })

    print(f"✅ Collected {len(data)} examples")
    return data

## Data Collection Function

The main function loads the GSM8K dataset from HuggingFace and processes the first 200 test examples. Each example includes:
- **ID**: Unique identifier of the form `example_000`
- **Question**: Math word problem
- **Answer**: Solution to the problem, stored exactly as the dataset's `answer` field
- **Difficulty**: Simple proxy, the question length in characters divided by 100
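In GSM8K, the `answer` field holds the worked solution followed by a line of the form `#### <final answer>`. If only the final value is needed, a sketch like the one below could be applied to each record. The helper name `extract_final_answer` is hypothetical, and stripping thousands separators is an assumption about how the value will be compared later.

```python
def extract_final_answer(answer: str) -> str:
    """Return the text after the last "####" marker, or the whole
    string (stripped) when no marker is present.

    `extract_final_answer` is an illustrative helper, not part of
    the notebook.
    """
    marker = "####"
    if marker not in answer:
        return answer.strip()
    final = answer.rsplit(marker, 1)[1].strip()
    # Assumption: drop thousands separators so "1,000" becomes "1000"
    return final.replace(",", "")


solution = "Natalia sold 48/2 = <<48/2=24>>24 clips in May.\n#### 72"
print(extract_final_answer(solution))
```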

In [None]:
"""Dataset collection script for DKW benchmark."""
import json
from datasets import load_dataset

print("✅ Imports loaded successfully!")

# DKW Benchmark Dataset Collection

**Artifact ID:** dataset_001  
**Name:** data.py  
**Description:** Interactive notebook for collecting benchmark data for DKW controller evaluation

This notebook demonstrates how to collect and process data from the GSM8K dataset for benchmark purposes.