# DataHandler Tutorial Example

In this notebook, we demonstrate how to use the **DataHandler** module to manage simulation outputs. This example complements our other tutorials and illustrates the following steps:

- **Folder and Iteration Setup:** Create the necessary folder hierarchy and start new iterations.
- **Data Registration and Saving:** Register simulation output identifiers and add sample data.
- **Aggregation and Export:** Aggregate data from multiple iterations and export the results as CSV files.

Let's walk through each step.

## Step 0: Prepare the Example Data to Save
The DataHandler assumes some structure in the data it saves:
1. The data objects are passed as dicts.
2. Each data object must have a method called `flatten` that returns a flat dict.
3. Aggregation requires an adapter: an object that takes a dict as input and returns an instance of the correct type.

In [1]:
# %% [code]
# Import required modules and models
from pathlib import Path
import json
from typing_extensions import Literal, Annotated
from pydantic import BaseModel, TypeAdapter, Field

# Import the DataHandler (adjust the import path as needed)
from pysubmit.workflow.data_handler.new_data_handler import DataHandler

# ---------------------------------------------------------------------------
# Define Simulation Output Models with flatten() methods
# ---------------------------------------------------------------------------

class Parameters(BaseModel):
    type: Literal['parameters'] = 'parameters'
    id: str = 'build_parameters'
    parameters: dict = {}

    def flatten(self) -> dict:
        return self.parameters

class Classical(BaseModel):
    type: Literal['classical'] = 'classical'
    id: str
    result: dict = {}

    def flatten(self) -> dict:
        return self.result

class Quantum(BaseModel):
    type: Literal['quantum'] = 'quantum'
    id: str
    result: dict = {}

    def flatten(self) -> dict:
        return self.result

# Define the discriminated union for our simulation outputs.
datatype = Annotated[Parameters | Classical | Quantum, Field(discriminator='type')]
adapter = TypeAdapter(datatype)
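
To make the adapter's role concrete, here is a minimal sketch of a round trip through a discriminated union. It assumes pydantic v2 and redefines trimmed copies of two of the models so that it runs on its own; the `raw` dict is illustrative sample data, and how DataHandler itself calls the adapter is not shown here.

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, TypeAdapter


class Classical(BaseModel):
    type: Literal["classical"] = "classical"
    id: str
    result: dict = {}

    def flatten(self) -> dict:
        return self.result


class Quantum(BaseModel):
    type: Literal["quantum"] = "quantum"
    id: str
    result: dict = {}

    def flatten(self) -> dict:
        return self.result


# The "type" field tells pydantic which model to build from a plain dict.
sketch_adapter = TypeAdapter(
    Annotated[Union[Classical, Quantum], Field(discriminator="type")]
)

# Illustrative sample data, shaped like a serialized Quantum output.
raw = {"type": "quantum", "id": "quantum_simulation", "result": {"result": 2.71}}

restored = sketch_adapter.validate_python(raw)
print(type(restored).__name__)
print(restored.flatten())
```

Because the discriminator is `"quantum"`, `validate_python` builds a `Quantum` instance, and its `flatten()` returns the inner `result` dict.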

## Step 1: Setup Folder Structure and Create Iterations

We first create the folder hierarchy, enabling the `overwrite` option to start fresh. The iterations themselves (four, to mimic separate simulation runs) are created in Step 2.

In [2]:
# %% [code]
# Define the root directory for our test results.
root_dir = Path("demo_results")

# Instantiate the DataHandler.
handler = DataHandler(root_directory=root_dir)

# Create the folder hierarchy with overwrite enabled.
handler.create_folders(overwrite=True)
print("Folder hierarchy created (overwrite enabled).")


OSError: [WinError 145] The directory is not empty: 'demo_results'
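
One way to picture an overwrite-style reset is "remove the directory, then recreate it empty". The sketch below uses only the standard library and is an assumption about the general pattern, not DataHandler's implementation; `reset_directory` is a hypothetical helper, and the demo runs inside a temporary directory so that it touches nothing else.

```python
import shutil
import tempfile
from pathlib import Path


def reset_directory(path: Path) -> Path:
    """Hypothetical helper: delete `path` if it exists, then recreate it empty."""
    if path.exists():
        shutil.rmtree(path)
    path.mkdir(parents=True)
    return path


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "demo_results"

    # Simulate leftovers from an earlier run.
    (root / "results" / "iterations").mkdir(parents=True)
    (root / "stale.txt").write_text("left over from an earlier run")

    reset_directory(root)
    remaining = sorted(p.name for p in root.iterdir())

print(remaining)
```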

## Step 2: Register and Add Simulation Data

For each iteration, we register three identifiers:

- `build_parameters`
- `classical_simulation`
- `quantum_simulation`

Then we add sample data for each identifier using our simulation models.

In [None]:
# %% [code]
for i in range(4):
    handler.create_new_iteration()

    # Register identifiers for the current iteration.
    handler.register_identifier("build_parameters")
    handler.register_identifier("classical_simulation")
    handler.register_identifier("quantum_simulation")
    print(f"Iteration {i}: Identifiers registered.")

    # Create sample simulation outputs with slight variations.
    sample_build = Parameters(
        id='build_parameters',
        parameters={"parameter1": 42 + i, "parameter2": f"value_{i}", "type": "simulation_result"}
    )
    sample_classical = Classical(
        id='classical_simulation',
        result={"result": 3.14 + i}
    )
    sample_quantum = Quantum(
        id='quantum_simulation',
        result={"result": 2.71 + i}
    )

    # Add the sample data to the current iteration.
    handler.add_data_to_iteration(sample_build.id, sample_build)
    handler.add_data_to_iteration(sample_classical.id, sample_classical)
    handler.add_data_to_iteration(sample_quantum.id, sample_quantum)
    print(f"Iteration {i}: Sample data added.")

Iteration 0: Identifiers registered.
Iteration 0: Sample data added.
Iteration 1: Identifiers registered.
Iteration 1: Sample data added.
Iteration 2: Identifiers registered.
Iteration 2: Sample data added.
Iteration 3: Identifiers registered.
Iteration 3: Sample data added.


## Step 3: Summarize Iterations

We now summarize the stored iterations by reading the metadata from each iteration folder.

In [None]:
# %% [code]
summary = handler.summarize_iterations()
print("Iteration Summary:")
print(json.dumps(summary, indent=2, default=str))

Iteration Summary:
{
  "iteration_0": {
    "iteration_number": 0,
    "base_path": "demo_results\\results\\iterations\\iteration_0",
    "id_to_description": {
      "build_parameters": {
        "path": "build_parameters.json",
        "start_timestamp": "2025-03-30 02:29:31.151153",
        "status": "done",
        "saved_timestamp": "2025-03-30 02:29:31.153147"
      },
      "classical_simulation": {
        "path": "classical_simulation.json",
        "start_timestamp": "2025-03-30 02:29:31.151153",
        "status": "done",
        "saved_timestamp": "2025-03-30 02:29:31.155142"
      },
      "quantum_simulation": {
        "path": "quantum_simulation.json",
        "start_timestamp": "2025-03-30 02:29:31.152150",
        "status": "done",
        "saved_timestamp": "2025-03-30 02:29:31.156139"
      }
    }
  },
  "iteration_1": {
    "iteration_number": 1,
    "base_path": "demo_results\\results\\iterations\\iteration_1",
    "id_to_description": {
      "build_parameters": 

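The summary is a plain nested dict, so it is easy to inspect programmatically. As a minimal sketch, the snippet below scans a summary-shaped dict for identifiers whose status is not `"done"`. The sample dict is a trimmed excerpt of the output above; the `"pending"` status and the `find_unfinished` helper are hypothetical and only serve to show the check firing.

```python
# Trimmed excerpt shaped like the printed summary. The "pending" status is a
# hypothetical value added for illustration.
sample_summary = {
    "iteration_0": {
        "iteration_number": 0,
        "id_to_description": {
            "build_parameters": {"path": "build_parameters.json", "status": "done"},
            "classical_simulation": {"path": "classical_simulation.json", "status": "done"},
        },
    },
    "iteration_1": {
        "iteration_number": 1,
        "id_to_description": {
            "build_parameters": {"path": "build_parameters.json", "status": "done"},
            "quantum_simulation": {"path": "quantum_simulation.json", "status": "pending"},
        },
    },
}


def find_unfinished(summary: dict) -> list:
    """Return (iteration_name, identifier) pairs whose status is not 'done'."""
    return [
        (iteration_name, identifier)
        for iteration_name, iteration in summary.items()
        for identifier, description in iteration["id_to_description"].items()
        if description.get("status") != "done"
    ]


unfinished = find_unfinished(sample_summary)
print(unfinished)
```
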
## Step 4: Aggregate Data and Export as CSV

Using an aggregation configuration, we combine data from different simulation outputs. In this example, we set up two aggregations:

- **classical:** Combines `build_parameters` and `classical_simulation`
- **quantum:** Combines `build_parameters` and `quantum_simulation`

After aggregating, the results are saved as CSV files (`classical.csv` and `quantum.csv`) in the aggregations folder.

In [None]:
# %% [code]
# Define the aggregation configuration.
agg_config = {
    "classical": ("build_parameters", "classical_simulation"),
    "quantum": ("build_parameters", "quantum_simulation")
}

# Perform the aggregation using our adapter.
aggregated = handler.aggregate_by_config(agg_config, adapter=adapter)
print("Aggregated Data:")
print(json.dumps(aggregated, indent=2, default=str))

# Save aggregated results to CSV files in the aggregations folder.
handler.save_aggregation_to_csv(agg_config, adapter=adapter)
print(f"Aggregated CSV files saved in: {handler.aggregations_directory}")

Aggregated Data:
{
  "classical": [
    {
      "iteration_number": 0,
      "parameter1": 42,
      "parameter2": "value_0",
      "type": "simulation_result",
      "result": 3.14
    },
    {
      "iteration_number": 1,
      "parameter1": 43,
      "parameter2": "value_1",
      "type": "simulation_result",
      "result": 4.140000000000001
    },
    {
      "iteration_number": 2,
      "parameter1": 44,
      "parameter2": "value_2",
      "type": "simulation_result",
      "result": 5.140000000000001
    },
    {
      "iteration_number": 3,
      "parameter1": 45,
      "parameter2": "value_3",
      "type": "simulation_result",
      "result": 6.140000000000001
    }
  ],
  "quantum": [
    {
      "iteration_number": 0,
      "parameter1": 42,
      "parameter2": "value_0",
      "type": "simulation_result",
      "result": 2.71
    },
    {
      "iteration_number": 1,
      "parameter1": 43,
      "parameter2": "value_1",
      "type": "simulation_result",
      "result": 

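The aggregated rows above suggest a simple mental model: for each iteration, the flattened dicts of the configured identifiers are merged into one row alongside `iteration_number`. The following is a stdlib-only sketch of that idea under stated assumptions, not the library's implementation; `aggregate_rows` and the in-memory `flattened` dict are hypothetical stand-ins for what DataHandler reads from disk.

```python
import csv
import io

# Hypothetical in-memory stand-in: flattened outputs keyed by iteration number.
flattened = {
    0: {
        "build_parameters": {"parameter1": 42, "parameter2": "value_0"},
        "classical_simulation": {"result": 3.14},
    },
    1: {
        "build_parameters": {"parameter1": 43, "parameter2": "value_1"},
        "classical_simulation": {"result": 4.14},
    },
}


def aggregate_rows(flattened: dict, identifiers: tuple) -> list:
    """Merge the flat dicts of `identifiers` into one row per iteration."""
    rows = []
    for number in sorted(flattened):
        row = {"iteration_number": number}
        for identifier in identifiers:
            # On a key clash, the later identifier overwrites the earlier one.
            row.update(flattened[number][identifier])
        rows.append(row)
    return rows


rows = aggregate_rows(flattened, ("build_parameters", "classical_simulation"))

# Write the rows as CSV into an in-memory buffer instead of a file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

In this sketch, two outputs that flatten to the same key (for example, both exposing `result`) would collide, with the later identifier winning. Whether DataHandler resolves such clashes the same way is not stated here, so distinct key names are the safer choice.
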
## Conclusion

In this notebook, we demonstrated a complete workflow using the **DataHandler** module:

- **Setup:** Creating the folder structure and iterations.
- **Data Management:** Registering simulation outputs and adding sample data.
- **Aggregation:** Combining simulation results from multiple iterations and exporting the aggregated data as CSV files.

This workflow keeps each run's outputs in its own iteration folder and produces one CSV file per aggregation for analysis, and it integrates with the other features of our package.