In [None]:
# Save data to JSON file (replicating original script behavior)
output_filename = "data_out.json"

with open(output_filename, "w", encoding="utf-8") as f:
    json.dump(working_data, f, indent=2)

print(f"💾 Saved {len(working_data)} examples to {output_filename}")
print("✨ Dataset collection complete!")

# Show the JSON content as it would appear in the file
print(f"\n📄 Content of {output_filename}:")
print(json.dumps(working_data, indent=2))

## Export Data (Optional)

To save the data to a JSON file, as the original script did, run the export cell. It writes `data_out.json` to the current working directory and prints the file content.
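
To check that an export worked, you can read the file back and compare it with the in-memory data. The following is a minimal sketch: `save_and_reload` is a hypothetical helper (not part of the original script), and it writes to a temporary directory so it does not overwrite a real `data_out.json`.

```python
import json
import os
import tempfile

def save_and_reload(records, path):
    """Write records as JSON to `path`, then load them back from disk."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2)
    with open(path, encoding="utf-8") as f:
        return json.load(f)

# One record in the same shape as the notebook's sample data.
records = [
    {"id": "example_000", "question": "What is 2+2?", "answer": "4", "difficulty": 0.15}
]

with tempfile.TemporaryDirectory() as tmp:
    reloaded = save_and_reload(records, os.path.join(tmp, "data_out.json"))

print(reloaded == records)  # True if the data survived the round trip
```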

In [None]:
# Analyze the data
difficulties = [item["difficulty"] for item in working_data]
avg_difficulty = sum(difficulties) / len(difficulties)
min_difficulty = min(difficulties)
max_difficulty = max(difficulties)

print("📊 Dataset Statistics:")
print(f"Total examples: {len(working_data)}")
print(f"Average difficulty: {avg_difficulty:.3f}")
print(f"Difficulty range: {min_difficulty:.3f} - {max_difficulty:.3f}")

print("\n📝 Sample questions by difficulty:")
sorted_data = sorted(working_data, key=lambda x: x["difficulty"])
print(f"Easiest: {sorted_data[0]['question'][:50]}... (difficulty: {sorted_data[0]['difficulty']:.3f})")
print(f"Hardest: {sorted_data[-1]['question'][:50]}... (difficulty: {sorted_data[-1]['difficulty']:.3f})")

print("\n🔍 All examples:")
for item in working_data:
    print(f"ID: {item['id']}, Difficulty: {item['difficulty']:.3f}")
    print(f"Q: {item['question']}")
    print(f"A: {item['answer']}")
    print("-" * 50)

## Data Analysis

The analysis cell reports the number of examples, the average and range of the difficulty scores, the easiest and hardest questions, and a listing of every example.
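
If the dataset might be empty, or you also want the median, the same statistics can be computed with the standard-library `statistics` module. This is only a sketch: `difficulty_summary` is a hypothetical helper, and the three difficulty values are copied from the notebook's sample data.

```python
import statistics

def difficulty_summary(records):
    """Summarize the `difficulty` field; return None for an empty dataset."""
    values = [r["difficulty"] for r in records]
    if not values:
        return None  # avoids dividing by zero when there is no data
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }

records = [
    {"id": "example_000", "difficulty": 0.15},
    {"id": "example_001", "difficulty": 0.22},
    {"id": "example_002", "difficulty": 0.28},
]

summary = difficulty_summary(records)
print(f"Median difficulty: {summary['median']:.3f}")
```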

In [None]:
# Attempt to collect live data from HuggingFace
try:
    live_data = collect_data()
    print(f"✅ Successfully collected {len(live_data)} examples from HuggingFace")
    print("\nFirst live example:")
    print(json.dumps(live_data[0], indent=2))
    
    # Use live data for further processing
    working_data = live_data
    
except Exception as e:
    # Any failure lands here (network, authentication, missing library), not only connection errors
    print(f"❌ Could not load data from HuggingFace: {e}")
    print("📝 Using sample data instead...")
    
    # Fall back to sample data
    working_data = sample_data

print(f"\nWorking with {len(working_data)} examples")

## Live Data Collection

Run the cell below to collect fresh data from HuggingFace. This requires:
- Internet connection
- `datasets` library installed
- Possible HuggingFace authentication for some datasets

**Note:** If you just want to experiment with the data format, use `sample_data` instead. The collection cell also falls back to it automatically if loading fails.
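
Before attempting live collection, you can check whether the `datasets` library is importable without actually importing it. The sketch below uses the standard-library `importlib.util.find_spec`; `package_available` is a hypothetical helper name, and the `pip install datasets` hint assumes a pip-based environment.

```python
import importlib.util

def package_available(name):
    """Return True if a top-level package called `name` can be imported."""
    return importlib.util.find_spec(name) is not None

if package_available("datasets"):
    print("`datasets` is installed; live collection can be attempted.")
else:
    print("`datasets` is missing; install it (e.g. `pip install datasets`) or use sample_data.")
```

This only confirms that the package is present. Network access and any required authentication are still checked at load time.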

In [None]:
# Sample data (inlined from data_out.json)
sample_data = [
    {
        "id": "example_000",
        "question": "What is 2+2?",
        "answer": "4",
        "difficulty": 0.15
    },
    {
        "id": "example_001",
        "question": "If x=5, what is 2x?",
        "answer": "10",
        "difficulty": 0.22
    },
    {
        "id": "example_002",
        "question": "Solve: 3y + 6 = 15",
        "answer": "y=3",
        "difficulty": 0.28
    }
]

print(f"Sample dataset contains {len(sample_data)} examples")
print("\nFirst example:")
print(json.dumps(sample_data[0], indent=2))

## Sample Data (Self-Contained)

For immediate experimentation, here's sample data in the expected output format. This makes the notebook self-contained: you can work with the data without connecting to HuggingFace or downloading anything.
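
The expected format can also be checked programmatically. The sketch below assumes the schema implied by the sample records (string `id`, `question` and `answer`, numeric `difficulty`); `REQUIRED_FIELDS` and `validate_record` are hypothetical names, not part of the original script.

```python
# Assumed schema, inferred from the sample records; adjust if the real format differs.
REQUIRED_FIELDS = {
    "id": str,
    "question": str,
    "answer": str,
    "difficulty": (int, float),
}

def validate_record(record):
    """Return a list of problems found in one record (empty if it looks valid)."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"wrong type for {field}: {type(record[field]).__name__}")
    return problems

good = {"id": "example_000", "question": "What is 2+2?", "answer": "4", "difficulty": 0.15}
bad = {"id": "example_001", "question": "If x=5, what is 2x?"}

print(validate_record(good))  # []
print(validate_record(bad))   # ['missing field: answer', 'missing field: difficulty']
```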

In [None]:
def collect_data():
    """Collect benchmark data for DKW controller evaluation."""
    # Load HuggingFace dataset
    ds = load_dataset("gsm8k", "main", split="test[:200]")

    data = []
    for i, example in enumerate(ds):
        data.append({
            "id": f"example_{i:03d}",
            "question": example["question"],
            "answer": example["answer"],
            "difficulty": len(example["question"]) / 100,  # Simple proxy
        })

    return data

## Data Collection Function

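The `collect_data()` function loads the GSM8K dataset from HuggingFace and processes it for our benchmark. It:

- Loads the first 200 test examples from GSM8K (the `main` configuration, split `test[:200]`)
- Extracts question and answer pairs
- Computes a difficulty score as the question length in characters divided by 100 (a simple proxy)
- Returns a list of dicts with `id`, `question`, `answer`, and `difficulty` keys, with IDs of the form `example_000`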
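
The length-based difficulty proxy can be tried on its own, without loading the dataset. In this sketch, `length_difficulty` is a hypothetical standalone version of the expression used inside `collect_data()`, with the divisor exposed as a `scale` parameter for illustration.

```python
def length_difficulty(question, scale=100):
    """Difficulty proxy: number of characters in the question divided by `scale`."""
    return len(question) / scale

questions = ["What is 2+2?", "Solve: 3y + 6 = 15"]
for q in questions:
    print(f"{q!r}: {length_difficulty(q):.2f}")
```

Under this formula a 12-character question such as "What is 2+2?" scores 0.12, so the difficulty values in the inlined `sample_data` (0.15 for that question) appear to be illustrative rather than computed with this proxy.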

In [None]:
"""Dataset collection script for DKW benchmark."""
import json
from datasets import load_dataset

## Overview

This notebook provides two ways to work with the dataset:

1. **Live Data Collection**: Connect to HuggingFace to collect fresh data from the GSM8K dataset
2. **Sample Data**: Use pre-processed sample data for immediate experimentation

The sample data is inlined so the notebook is self-contained; no external files are required.

# Dataset Collection Script for DKW Benchmark

**Artifact ID:** dataset_001  
**Name:** data.py

This notebook demonstrates how to collect and process benchmark data for DKW controller evaluation. The original script has been converted into an interactive format with sample data included for easy experimentation.