# Constructing a v1 Dataset with the MDIODatasetBuilder

In this notebook, we demonstrate how to use the `MDIODatasetBuilder` to build and write a post-stack seismic dataset from pre-stack depth migration (PSDM) using the MDIO v1 schema.

## Imports

```python
from mdio.core.v1.builder import MDIODatasetBuilder, write_mdio_metadata
from mdio.schemas.dtype import ScalarType, StructuredType
from mdio.schemas.compressors import Blosc, ZFP

# Auxiliary import for formatting and pretty printing
from rich import print as rprint
import json
```

## 1. Create the Builder
First, instantiate a builder with a name and optional global attributes. The builder provides a chainable interface for constructing bespoke Dataset contracts that may not exist in the factory.

Attributes are free-form and describe the dataset as a whole: data provenance, processing steps, or any other information that enriches the Dataset.
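The chainable interface can be sketched in plain Python. `SketchBuilder` below is a hypothetical stand-in, not part of MDIO: it only illustrates the pattern of each `add_*` method returning the builder itself so calls can be chained, with a final `build()` that snapshots the accumulated name, attributes, and dimensions into a plain dictionary.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SketchBuilder:
    """Hypothetical chainable builder (illustration only, not the MDIO API)."""

    name: str
    attributes: dict[str, Any] = field(default_factory=dict)
    dimensions: dict[str, int] = field(default_factory=dict)

    def add_dimension(self, name: str, size: int) -> "SketchBuilder":
        # Reject duplicates so the resulting contract stays unambiguous
        if name in self.dimensions:
            raise ValueError(f"dimension {name!r} already defined")
        self.dimensions[name] = size
        return self  # returning self is what makes the calls chainable

    def build(self) -> dict[str, Any]:
        # Snapshot the accumulated state as a plain dictionary "contract"
        return {
            "name": self.name,
            "attributes": dict(self.attributes),
            "dimensions": dict(self.dimensions),
        }


contract = (
    SketchBuilder("psdm_stack_example", {"description": "Example PSDM stack"})
    .add_dimension("inline", 256)
    .add_dimension("crossline", 512)
    .add_dimension("depth", 384)
    .build()
)
print(contract["dimensions"])
```

Because every mutating call returns the same object, the order of the chained calls is the order in which the contract is assembled.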

```python
# Initialize builder for PSDM stack
builder = MDIODatasetBuilder(
    name="psdm_stack_example",
    attributes={'description': 'Example PSDM stack'},
)
```

## 2. Add Dimensions

The dimensions define the core grid of the Dataset.

Each dimension is a one-dimensional array of tick labels. When the labels are populated with values, they support both value-based and index-based access to the Dataset; when they are left unpopulated, only index-based access is possible.

Fully populating the dimensions is generally recommended, but doing so is beyond the scope of this example.
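A minimal NumPy sketch of the difference between the two access modes. The label values (inline numbers starting at 1000, crossline numbers starting at 2000) and the placeholder amplitudes are hypothetical, and `label_to_index` is an illustrative helper rather than an MDIO function. Index-based access uses array positions directly; value-based access must first resolve a label to a position, which is only possible once the labels are populated.

```python
import numpy as np

# Hypothetical tick labels for populated dimensions
inline_labels = np.arange(1000, 1000 + 256)     # 256 inline numbers
crossline_labels = np.arange(2000, 2000 + 512)  # 512 crossline numbers

# Placeholder depth slice on the inline x crossline grid
amplitude = np.arange(256 * 512, dtype=np.float32).reshape(256, 512)


def label_to_index(labels: np.ndarray, value: int) -> int:
    """Value-based access: translate a tick label into a positional index."""
    (matches,) = np.nonzero(labels == value)
    if matches.size == 0:
        raise KeyError(f"label {value} not found")
    return int(matches[0])


# Index-based access works whether or not the labels are populated
sample_by_index = amplitude[10, 20]

# Value-based access needs the labels to resolve positions first
il = label_to_index(inline_labels, 1010)
xl = label_to_index(crossline_labels, 2020)
sample_by_value = amplitude[il, xl]
print(il, xl)
```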

```python
# Add core dimensions: inline, crossline, depth
builder.add_dimension('inline', 256, long_name='Inline Number')\
    .add_dimension('crossline', 512, long_name='Crossline Number')\
    .add_dimension('depth', 384, long_name='Depth Sample')
```

## 3. Add CDP Coordinates (UTM Easting/Northing)

Coordinates are N-dimensional arrays that enrich the Dataset by providing auxiliary coordinate systems.

In this example, the Dataset contract declares that the inline and crossline indices can be translated into real-world coordinates in the Map Grid of Australia (MGA) [Zone 51](https://epsg.io/28351).
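The contract only declares `cdp_x` and `cdp_y`; their values would typically come from the survey geometry. The sketch below assumes a simple rotated regular grid and computes easting/northing for every inline-crossline pair. The origin, bin spacings, azimuth, and axis convention are all hypothetical placeholders, not values from this survey, and `grid_to_map` is an illustrative helper, not an MDIO function.

```python
import numpy as np


def grid_to_map(
    n_inline: int,
    n_crossline: int,
    origin=(400_000.0, 7_000_000.0),  # hypothetical easting/northing of index (0, 0), metres
    inline_spacing=25.0,              # hypothetical metres between adjacent inlines
    crossline_spacing=12.5,           # hypothetical metres between adjacent crosslines
    azimuth_deg=30.0,                 # hypothetical bearing of the crossline-index axis
):
    """Map every (inline, crossline) index pair to (easting, northing) in metres.

    Assumed convention: increasing the crossline index moves along `azimuth_deg`
    (clockwise from grid north); increasing the inline index moves 90 degrees
    clockwise from that direction.
    """
    il, xl = np.meshgrid(np.arange(n_inline), np.arange(n_crossline), indexing="ij")
    a = np.deg2rad(azimuth_deg)
    d_il = il * inline_spacing     # metres travelled along the inline-index axis
    d_xl = xl * crossline_spacing  # metres travelled along the crossline-index axis
    # Unit vectors as (east, north): crossline axis (sin a, cos a), inline axis (cos a, -sin a)
    cdp_x = origin[0] + d_xl * np.sin(a) + d_il * np.cos(a)
    cdp_y = origin[1] + d_xl * np.cos(a) - d_il * np.sin(a)
    return cdp_x, cdp_y


cdp_x, cdp_y = grid_to_map(256, 512)
print(cdp_x.shape, cdp_x.dtype)
```

The resulting arrays have the `(inline, crossline)` shape and `float64` type declared for the coordinates in the contract.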

```python
# CDP X and Y on the inline-crossline grid.
# Each coordinate metadata entry is validated against a single schema model, so one
# dict mixing 'unitsV1' and 'attributes' is rejected (extra_forbidden).
# Only the units are attached here; the MGA zone is recorded in the long name.
builder.add_coordinate(
    name='cdp_x',
    dimensions=['inline', 'crossline'],
    long_name='CDP X (UTM Easting, MGA Zone 51)',
    data_type=ScalarType.FLOAT64,
    metadata={'unitsV1': {'length': 'm'}},
).add_coordinate(
    name='cdp_y',
    dimensions=['inline', 'crossline'],
    long_name='CDP Y (UTM Northing, MGA Zone 51)',
    data_type=ScalarType.FLOAT64,
    metadata={'unitsV1': {'length': 'm'}},
)
```

## 4. Add the Post-Stack Amplitude Variable

```python
# 3D amplitude volume on the full grid, Blosc/zstd-compressed and chunked 64 x 64 x 64
builder.add_variable(
    name='stack_amplitude',
    dimensions=['inline', 'crossline', 'depth'],
    data_type=ScalarType.FLOAT32,
    compressor=Blosc(algorithm='zstd', level=3),
    coordinates=['inline', 'crossline', 'cdp_x', 'cdp_y'],
    metadata={
        'chunkGrid': {'name': 'regular', 'configuration': {'chunkShape': [64, 64, 64]}}
    },
)
```
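The `chunkGrid` entry splits the volume into regular 64 × 64 × 64 chunks. The plain-Python arithmetic below works out how many chunks that produces on the 256 × 512 × 384 grid and their uncompressed sizes, assuming 4 bytes per `FLOAT32` sample. Compressed sizes after Blosc/zstd depend on the data and are not estimated here.

```python
import math

# Grid shape and chunk shape from the cells above
shape = (256, 512, 384)     # inline, crossline, depth
chunk_shape = (64, 64, 64)
itemsize = 4                # bytes per FLOAT32 sample

# Number of chunks along each axis (ceil would account for partial edge chunks)
chunks_per_axis = tuple(math.ceil(n / c) for n, c in zip(shape, chunk_shape))
n_chunks = math.prod(chunks_per_axis)

# Uncompressed bytes for one full chunk and for the whole volume
chunk_bytes = math.prod(chunk_shape) * itemsize
volume_bytes = math.prod(shape) * itemsize

print(chunks_per_axis, n_chunks)                  # (4, 8, 6) 192
print(chunk_bytes / 2**20, volume_bytes / 2**20)  # 1.0 192.0
```

Each full chunk is exactly 1 MiB before compression, which keeps individual reads and writes small relative to the roughly 192 MiB volume.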

## 5. Build and Write

```python
# Write only the metadata to the .mdio store and build the interactive Dataset object
ds = builder.to_mdio(store='output/psdm_stack_example.mdio')

# Display the interactive Dataset
ds
```


## 6. Build and View the Dataset Contract

```python
# Build the Dataset model from the builder
dataset = builder.build()

# Serialize the Dataset model to JSON (model_dump_json replaces Pydantic v1's deprecated .json())
contract = json.loads(dataset.model_dump_json())

# Reorder the contract so that metadata is displayed first
ordered_contract = {
    "metadata": contract["metadata"],
    "variables": contract["variables"],
}

rprint(ordered_contract)
```