Merged
8 changes: 4 additions & 4 deletions docs/template_registry.md
@@ -24,10 +24,10 @@ print(list_templates())
# e.g. ["Seismic2DPostStackTime", "Seismic3DPostStackDepth", ...]

# Grab a template by name
-tpl = get_template("Seismic3DPostStackTime")
+template = get_template("Seismic3DPostStackTime")

# Customize your copy (safe)
-tpl.add_units({"amplitude": "unitless"})
+template.add_units({"amplitude": "unitless"})
```

## Common tasks
@@ -37,8 +37,8 @@ tpl.add_units({"amplitude": "unitless"})
```python
from mdio.builder.template_registry import get_template

-tpl = get_template("Seismic2DPostStackDepth")
-# Use/modify tpl freely — it’s your copy
+template = get_template("Seismic2DPostStackDepth")
+# Use/modify template freely — it’s your copy
```
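
The "it's your copy" remark above can be illustrated with a minimal stand-in registry. This is only a sketch and not MDIO's implementation: `ToyTemplate` and `ToyRegistry` are hypothetical names, and the one assumption carried over from the docs is that every fetch returns an independent copy.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ToyTemplate:
    """Hypothetical stand-in for a dataset template (not an MDIO class)."""

    name: str
    units: dict = field(default_factory=dict)

    def add_units(self, units: dict) -> None:
        # Merge new unit entries into this instance only.
        self.units.update(units)


class ToyRegistry:
    """Hypothetical registry that hands out deep copies on every fetch."""

    def __init__(self) -> None:
        self._templates = {}

    def register(self, template: ToyTemplate) -> None:
        self._templates[template.name] = template

    def get(self, name: str) -> ToyTemplate:
        # The deep copy is what makes customization safe for the caller.
        return copy.deepcopy(self._templates[name])


registry = ToyRegistry()
registry.register(ToyTemplate("Seismic2DPostStackDepth"))

first = registry.get("Seismic2DPostStackDepth")
first.add_units({"amplitude": "unitless"})

# A later fetch is unaffected by changes made to the first copy.
second = registry.get("Seismic2DPostStackDepth")
```

Because `get` copies the stored object, edits to `first` never reach the registered original or any later fetch such as `second`.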

### List available templates
302 changes: 302 additions & 0 deletions docs/tutorials/custom_template.ipynb
Collaborator
When the docs page theme is set to "auto", the HTML repr in the "Making Dummy Xarray Dataset" section is very difficult to read if the system is set to dark mode. The repr does respond correctly to the explicit "dark" and "light" modes.

Screenshots (images not included): Auto, Light, Dark.

@@ -0,0 +1,302 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "85114119ae7a4db0",
"metadata": {},
"source": [
"# Create and Register a Custom Template\n",
"\n",
"```{article-info}\n",
":author: Altay Sansal\n",
":date: \"{sub-ref}`today`\"\n",
":read-time: \"{sub-ref}`wordcount-minutes` min read\"\n",
":class-container: sd-p-0 sd-outline-muted sd-rounded-3 sd-font-weight-light\n",
"```\n",
"\n",
"```{warning}\n",
"Most SEG-Y files correspond to standard seismic data types or field configurations. We recommend using\n",
"the built-in templates from the registry whenever possible. Create a custom template only when your file\n",
"is unusual and cannot be represented by existing templates. In many cases, you can simply customize the\n",
"SEG-Y header byte mapping during ingestion without defining a new template.\n",
"```\n",
"\n",
"In this tutorial we will walk through the Template Registry and show how to:\n",
"\n",
"- Discover available templates in the registry\n",
"- Define and register your own template\n",
"- Build a dataset model and convert it to an Xarray Dataset using your custom template\n",
"\n",
"If this is your first time with MDIO, you may want to skim the Quickstart first."
]
},
{
"cell_type": "markdown",
"id": "a793f2cfb58f09cc",
"metadata": {},
"source": [
"## What is a Template and a Template Registry?\n",
"\n",
"A template defines how an MDIO dataset is structured: names of dimensions and coordinates, the default variable name, chunking hints, and attributes to be stored. Since many seismic datasets share common structures (e.g., 3D post-stack, 2D post-stack, pre-stack CDP/shot, etc.), MDIO ships with a pre-populated template registry and APIs to fetch or register templates.\n",
"\n",
"Fetching a template from the registry returns a copy that you can customize freely without affecting the registry or other copies."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c7a760a019930d4e",
"metadata": {},
"outputs": [],
"source": [
"from mdio.builder.template_registry import get_template\n",
"from mdio.builder.template_registry import get_template_registry\n",
"from mdio.builder.template_registry import list_templates\n",
"\n",
"registry = get_template_registry()\n",
"registry # pretty HTML in notebooks"
]
},
{
"cell_type": "markdown",
"id": "810dbba2b6dba787",
"metadata": {},
"source": [
"We can also list the names of all registered templates."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "38eb1da635c7be0f",
"metadata": {},
"outputs": [],
"source": [
"list_templates()"
]
},
{
"cell_type": "markdown",
"id": "d87bd9ec781a8a8e",
"metadata": {},
"source": [
"## Defining a Minimal Custom Template\n",
"\n",
"To define a custom template, subclass `AbstractDatasetTemplate` and set:\n",
"\n",
"- `_name`: a public name for the template\n",
"- `_dim_names`: names for each axis of your data variable (the last axis is the trace/time or trace/depth axis)\n",
"- `_physical_coord_names` and `_logical_coord_names`: optional additional coordinate variables to store along the spatial grid\n",
"- `_load_dataset_attributes()`: optional attributes stored at the dataset level\n",
"\n",
"Below we create a template that holds an interval velocity field together with two anisotropy parameters (epsilon and delta) for a depth-domain seismic volume.\n",
"\n",
"The dimensions, dimension coordinates, and non-dimension coordinates are created automatically by the base class.\n",
"However, since we want more than one data variable, we override `_add_variables` to add them."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cfc9d9b0e1b67a76",
"metadata": {},
"outputs": [],
"source": [
"from mdio.builder.schemas import compressors\n",
"from mdio.builder.schemas.chunk_grid import RegularChunkGrid\n",
"from mdio.builder.schemas.chunk_grid import RegularChunkShape\n",
"from mdio.builder.schemas.dtype import ScalarType\n",
"from mdio.builder.schemas.v1.variable import VariableMetadata\n",
"from mdio.builder.templates.base import AbstractDatasetTemplate\n",
"\n",
"\n",
"class AnisotropicVelocityTemplate(AbstractDatasetTemplate):\n",
"    \"\"\"A custom template that has unusual dimensions and coordinates.\"\"\"\n",
"\n",
"    def __init__(self, data_domain: str = \"depth\") -> None:\n",
"        super().__init__(data_domain)\n",
"        # Dimension order matters; the last dimension is the trace domain (depth by default)\n",
"        self._dim_names = (\"inline\", \"crossline\", self.trace_domain)\n",
"        # Additional coordinates: these are added on top of dimension coordinates\n",
"        self._physical_coord_names = (\"cdp_x\", \"cdp_y\")\n",
"        self._var_chunk_shape = (128, 128, 128)\n",
"        self._units = {}\n",
"\n",
"    @property\n",
"    def _name(self) -> str:  # public name for the registry\n",
"        return \"AnisotropicVelocity3DDepth\"\n",
"\n",
"    @property\n",
"    def _default_variable_name(self) -> str:  # name of the default data variable\n",
"        return \"velocity\"\n",
"\n",
"    def _load_dataset_attributes(self) -> dict:\n",
"        return {\"surveyType\": \"3D\", \"gatherType\": \"line\"}\n",
"\n",
"    def _add_variables(self) -> None:\n",
"        \"\"\"Add the variables including default and extra.\"\"\"\n",
"        for name in [\"velocity\", \"epsilon\", \"delta\"]:\n",
"            chunk_grid = RegularChunkGrid(configuration=RegularChunkShape(chunk_shape=self.full_chunk_shape))\n",
"            unit = self.get_unit_by_key(name)\n",
"            self._builder.add_variable(\n",
"                name=name,\n",
"                dimensions=self._dim_names,\n",
"                data_type=ScalarType.FLOAT32,\n",
"                compressor=compressors.Blosc(cname=compressors.BloscCname.zstd),\n",
"                coordinates=self.physical_coordinate_names,\n",
"                metadata=VariableMetadata(chunk_grid=chunk_grid, units_v1=unit),\n",
"            )\n",
"\n",
"\n",
"AnisotropicVelocityTemplate()"
]
},
{
"cell_type": "markdown",
"id": "15e61310ed0ffd97",
"metadata": {},
"source": [
"## Registering the Custom Template\n",
"\n",
"To make the template discoverable by name, register it first, then retrieve it with `get_template`. As with the built-in templates, the registry returns a deep copy on every fetch."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a4e1847b20da6768",
"metadata": {},
"outputs": [],
"source": [
"from mdio.builder.template_registry import register_template\n",
"\n",
"register_template(AnisotropicVelocityTemplate())\n",
"print(\"Registered:\", \"AnisotropicVelocity3DDepth\" in list_templates())\n",
"\n",
"custom_template = get_template(\"AnisotropicVelocity3DDepth\")\n",
"custom_template"
]
},
{
"cell_type": "markdown",
"id": "83b0772f1913c652",
"metadata": {},
"source": [
"You can also set units at any time. For this demo we’ll set metric units. The spatial units will be inferred from the SEG-Y binary header during ingestion, but we can override them here. Ingestion will honor what is in the template."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d7dca50d72d2f93",
"metadata": {},
"outputs": [],
"source": [
"from mdio.builder.schemas.v1.units import LengthUnitModel\n",
"from mdio.builder.schemas.v1.units import SpeedUnitModel\n",
"\n",
"custom_template.add_units(\n",
"    {\n",
"        \"depth\": LengthUnitModel(length=\"m\"),\n",
"        \"cdp_x\": LengthUnitModel(length=\"m\"),\n",
"        \"cdp_y\": LengthUnitModel(length=\"m\"),\n",
"        \"velocity\": SpeedUnitModel(speed=\"m/s\"),\n",
"    }\n",
")\n",
"custom_template"
]
},
{
"cell_type": "markdown",
"id": "367ade9824e72bc3",
"metadata": {},
"source": [
"## Changing the Chunk Shape on an Existing Template\n",
"\n",
"Often you will want to tweak the chunking strategy for performance. You can do this in two ways:\n",
"\n",
"- When defining a subclass, set a default in the constructor (e.g., `self._var_chunk_shape = (...)`).\n",
"- On an existing template instance, assign to the `full_chunk_shape` property once you know your final\n",
" dataset sizes (the tuple length must match the number of data dimensions).\n",
"\n",
"Below is a small demo showing how to modify the chunk shape on a fetched template. We first build a dataset from the\n",
"template with known sizes to satisfy validation, then update `full_chunk_shape`.\n",
"\n",
"```{note}\n",
"In the SEG-Y to MDIO conversion workflow, MDIO infers the final grid shape from the SEG-Y headers. It’s\n",
"common to set or adjust `full_chunk_shape` right before calling `segy_to_mdio`, using the same sizes\n",
"you expect for the final array.\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "75939231b58c204a",
"metadata": {},
"outputs": [],
"source": [
"mdio_ds = custom_template.build_dataset(name=\"demo-only\", sizes=(300, 500, 1001))\n",
"# pick smaller chunks than the full array for better parallelism and IO\n",
"custom_template.full_chunk_shape = (64, 64, 64)\n",
"print(\"Chunk shape set to:\", custom_template.full_chunk_shape)\n",
"\n",
"custom_template"
]
},
{
"cell_type": "markdown",
"id": "a76f17cdf235de13",
"metadata": {},
"source": [
"## Making Dummy Xarray Dataset\n",
"\n",
"We can now take the MDIO Dataset model and convert it to Xarray with our configuration. If ingesting from SEG-Y, this step\n",
"gets executed automatically by the converter before populating the data.\n",
"\n",
"Note that the whole dataset will be populated with the fill values."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ce3dcf9c7946ea07",
"metadata": {},
"outputs": [],
"source": [
"from mdio.builder.xarray_builder import to_xarray_dataset\n",
"\n",
"to_xarray_dataset(mdio_ds)"
]
},
{
"cell_type": "markdown",
"id": "fc05aa3c81f8465c",
"metadata": {},
"source": [
"## Recap: Key APIs Used\n",
"\n",
"- Template registry helpers: `get_template_registry`, `list_templates`, `register_template`, `get_template`\n",
"- Base template to subclass: `AbstractDatasetTemplate`\n",
"- Make Xarray Dataset from MDIO Data Model: `to_xarray_dataset`\n",
"\n",
"With these pieces, you can standardize how your seismic data is represented in MDIO and keep ingestion code concise and repeatable.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a15848ab5c0811d6",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"mystnb": {
"execution_mode": "force"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
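
The notebook's remark about picking smaller chunks "for better parallelism and IO" can be made concrete with a little arithmetic. The sketch below is not part of the PR or of MDIO: `chunk_stats` is a hypothetical helper, the shapes are the ones used in the notebook, and the 4-byte item size assumes the `FLOAT32` variables defined in the template. Compression and partial edge chunks are ignored.

```python
import math

sizes = (300, 500, 1001)  # array shape passed to build_dataset in the notebook
itemsize = 4              # bytes per float32 sample (assumption: FLOAT32 variables)


def chunk_stats(shape, chunks, itemsize):
    """Return (chunks per dimension, total chunk count, uncompressed bytes per full chunk)."""
    per_dim = tuple(math.ceil(n / c) for n, c in zip(shape, chunks))
    total = math.prod(per_dim)
    chunk_bytes = math.prod(chunks) * itemsize
    return per_dim, total, chunk_bytes


# Default chunk shape from the template constructor vs. the smaller override.
default_stats = chunk_stats(sizes, (128, 128, 128), itemsize)
smaller_stats = chunk_stats(sizes, (64, 64, 64), itemsize)

print(default_stats)  # (3, 4, 8) chunks per dim, 96 chunks, 8 MiB each
print(smaller_stats)  # (5, 8, 16) chunks per dim, 640 chunks, 1 MiB each
```

Halving each chunk edge cuts the uncompressed chunk size by a factor of eight and raises the number of independently readable chunks per variable from 96 to 640, which is the trade-off the comment in the demo cell alludes to.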
1 change: 1 addition & 0 deletions docs/tutorials/index.md
@@ -13,4 +13,5 @@ creation
compression
rechunking
corrupt_files
+custom_template
```