Add template tutorial including custom template registry. #726
Merged
14 commits
3348a23
update dimension handling and variable naming in templates
tasansal dcdd1fc
standardize variable naming in template_registry docs
tasansal 1a55062
Remove PyCharm metadata from `quickstart.ipynb` for cleanup
tasansal 1aa8b26
add tutorial on creating and registering custom templates in MDIO
tasansal 19ad732
add primary foreground color styling to HTML table elements
tasansal 2dfbd4a
update custom template tutorial with revised class name, dimensions, …
tasansal 076391c
simplify logical coordinate names in custom template tutorial
tasansal 305fc67
refine dataset description in custom template tutorial
tasansal 8060be7
update metadata model_dump mode to JSON in xarray_builder
tasansal 26666bb
update metadata model_dump mode to JSON in xarray_builder
tasansal 94ed4a8
refine and enhance explanations in custom template tutorial for impro…
tasansal a2ca0ae
refine custom template tutorial: update dataset modeling details and …
tasansal 5ce6c42
clean up custom template tutorial: adjust markdown formatting and rem…
tasansal 06fa2ce
Merge branch 'main' into custom-template-tutorial
BrianMichell
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,302 @@ | ||
| { | ||
| "cells": [ | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "85114119ae7a4db0", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "# Create and Register a Custom Template\n", | ||
| "\n", | ||
| "```{article-info}\n", | ||
| ":author: Altay Sansal\n", | ||
| ":date: \"{sub-ref}`today`\"\n", | ||
| ":read-time: \"{sub-ref}`wordcount-minutes` min read\"\n", | ||
| ":class-container: sd-p-0 sd-outline-muted sd-rounded-3 sd-font-weight-light\n", | ||
| "```\n", | ||
| "\n", | ||
| "```{warning}\n", | ||
| "Most SEG-Y files correspond to standard seismic data types or field configurations. We recommend using\n", | ||
| "the built-in templates from the registry whenever possible. Create a custom template only when your file\n", | ||
| "is unusual and cannot be represented by existing templates. In many cases, you can simply customize the\n", | ||
| "SEG-Y header byte mapping during ingestion without defining a new template.\n", | ||
| "```\n", | ||
| "\n", | ||
| "In this tutorial we will walk through the Template Registry and show how to:\n", | ||
| "\n", | ||
| "- Discover available templates in the registry\n", | ||
| "- Define and register your own template\n", | ||
| "- Build a dataset model and convert it to an Xarray Dataset using your custom template\n", | ||
| "\n", | ||
| "If this is your first time with MDIO, you may want to skim the Quickstart first." | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "a793f2cfb58f09cc", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "## What is a Template and a Template Registry?\n", | ||
| "\n", | ||
| "A template defines how an MDIO dataset is structured: names of dimensions and coordinates, the default variable name, chunking hints, and attributes to be stored. Since many seismic datasets share common structures (e.g., 3D post-stack, 2D post-stack, pre-stack CDP/shot, etc.), MDIO ships with a pre-populated template registry and APIs to fetch or register templates.\n", | ||
| "\n", | ||
| "Fetching a template from the registry returns a copied instance that you can freely customize without affecting other users of the registry." | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "id": "c7a760a019930d4e", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "from mdio.builder.template_registry import get_template\n", | ||
| "from mdio.builder.template_registry import get_template_registry\n", | ||
| "from mdio.builder.template_registry import list_templates\n", | ||
| "\n", | ||
| "registry = get_template_registry()\n", | ||
| "registry # pretty HTML in notebooks" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "810dbba2b6dba787", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "We can also get the names of all registered templates as a list." | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "id": "38eb1da635c7be0f", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "list_templates()" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "d87bd9ec781a8a8e", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "## Defining a Minimal Custom Template\n", | ||
| "\n", | ||
| "To define a custom template, subclass `AbstractDatasetTemplate` and set:\n", | ||
| "\n", | ||
| "- `_name`: a public name for the template\n", | ||
| "- `_dim_names`: names for each axis of your data variable (the last axis is the trace/time or trace/depth axis)\n", | ||
| "- `_physical_coord_names` and `_logical_coord_names`: optional additional coordinate variables to store along the spatial grid\n", | ||
| "- `_load_dataset_attributes()`: optional attributes stored at the dataset level\n", | ||
| "\n", | ||
| "Below we create a special template that can hold an interval velocity field with multiple anisotropy parameters for a depth seismic volume.\n", | ||
| "\n", | ||
| "The dimensions, dimension-coordinates and non-dimension coordinates will automatically get created using the method\n", | ||
| "from the base class. However, since we want more variables, we override `_add_variables` to add them." | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "id": "cfc9d9b0e1b67a76", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "from mdio.builder.schemas import compressors\n", | ||
| "from mdio.builder.schemas.chunk_grid import RegularChunkGrid\n", | ||
| "from mdio.builder.schemas.chunk_grid import RegularChunkShape\n", | ||
| "from mdio.builder.schemas.dtype import ScalarType\n", | ||
| "from mdio.builder.schemas.v1.variable import VariableMetadata\n", | ||
| "from mdio.builder.templates.base import AbstractDatasetTemplate\n", | ||
| "\n", | ||
| "\n", | ||
| "class AnisotropicVelocityTemplate(AbstractDatasetTemplate):\n", | ||
| "    \"\"\"A custom template that has unusual dimensions and coordinates.\"\"\"\n", | ||
| "\n", | ||
| "    def __init__(self, data_domain: str = \"depth\") -> None:\n", | ||
| "        super().__init__(data_domain)\n", | ||
| "        # Dimension order matters; the last dimension is the depth\n", | ||
| "        self._dim_names = (\"inline\", \"crossline\", self.trace_domain)\n", | ||
| "        # Additional coordinates: these are added on top of dimension coordinates\n", | ||
| "        self._physical_coord_names = (\"cdp_x\", \"cdp_y\")\n", | ||
| "        self._var_chunk_shape = (128, 128, 128)\n", | ||
| "        self._units = {}\n", | ||
| "\n", | ||
| "    @property\n", | ||
| "    def _name(self) -> str:  # public name for the registry\n", | ||
| "        return \"AnisotropicVelocity3DDepth\"\n", | ||
| "\n", | ||
| "    @property\n", | ||
| "    def _default_variable_name(self) -> str:  # name of the default data variable\n", | ||
| "        return \"velocity\"\n", | ||
| "\n", | ||
| "    def _load_dataset_attributes(self) -> dict:\n", | ||
| "        return {\"surveyType\": \"3D\", \"gatherType\": \"line\"}\n", | ||
| "\n", | ||
| "    def _add_variables(self) -> None:\n", | ||
| "        \"\"\"Add the variables including default and extra.\"\"\"\n", | ||
| "        for name in [\"velocity\", \"epsilon\", \"delta\"]:\n", | ||
| "            chunk_grid = RegularChunkGrid(configuration=RegularChunkShape(chunk_shape=self.full_chunk_shape))\n", | ||
| "            unit = self.get_unit_by_key(name)\n", | ||
| "            self._builder.add_variable(\n", | ||
| "                name=name,\n", | ||
| "                dimensions=self._dim_names,\n", | ||
| "                data_type=ScalarType.FLOAT32,\n", | ||
| "                compressor=compressors.Blosc(cname=compressors.BloscCname.zstd),\n", | ||
| "                coordinates=self.physical_coordinate_names,\n", | ||
| "                metadata=VariableMetadata(chunk_grid=chunk_grid, units_v1=unit),\n", | ||
| "            )\n", | ||
| "\n", | ||
| "\n", | ||
| "AnisotropicVelocityTemplate()" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "15e61310ed0ffd97", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "## Registering the Custom Template\n", | ||
| "\n", | ||
| "The registry returns a deep copy of the template on every fetch. To make the template discoverable by name, register it first, then retrieve it with `get_template`." | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "id": "a4e1847b20da6768", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "from mdio.builder.template_registry import register_template\n", | ||
| "\n", | ||
| "register_template(AnisotropicVelocityTemplate())\n", | ||
| "print(\"Registered:\", \"AnisotropicVelocity3DDepth\" in list_templates())\n", | ||
| "\n", | ||
| "custom_template = get_template(\"AnisotropicVelocity3DDepth\")\n", | ||
| "custom_template" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "83b0772f1913c652", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "You can also set units at any time. For this demo we’ll set metric units. The spatial units will be inferred from the SEG-Y binary header during ingestion, but we can override them here. Ingestion will honor what is in the template." | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "id": "d7dca50d72d2f93", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "from mdio.builder.schemas.v1.units import LengthUnitModel\n", | ||
| "from mdio.builder.schemas.v1.units import SpeedUnitModel\n", | ||
| "\n", | ||
| "custom_template.add_units(\n", | ||
| "    {\n", | ||
| "        \"depth\": LengthUnitModel(length=\"m\"),\n", | ||
| "        \"cdp_x\": LengthUnitModel(length=\"m\"),\n", | ||
| "        \"cdp_y\": LengthUnitModel(length=\"m\"),\n", | ||
| "        \"velocity\": SpeedUnitModel(speed=\"m/s\"),\n", | ||
| "    }\n", | ||
| ")\n", | ||
| "custom_template" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "367ade9824e72bc3", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "## Changing the Chunk Shape on an Existing Template\n", | ||
| "\n", | ||
| "Often you will want to tweak the chunking strategy for performance. You can do this in two ways:\n", | ||
| "\n", | ||
| "- When defining a subclass, set a default in the constructor (e.g., `self._var_chunk_shape = (...)`).\n", | ||
| "- On an existing template instance, assign to the `full_chunk_shape` property once you know your final\n", | ||
| " dataset sizes (the tuple length must match the number of data dimensions).\n", | ||
| "\n", | ||
| "Below is a tiny demo showing how to modify the chunk shape on a fetched template. We first build the\n", | ||
| "template with known sizes to satisfy validation, then update `full_chunk_shape`.\n", | ||
| "\n", | ||
| "```{note}\n", | ||
| "In the SEG-Y to MDIO conversion workflow, MDIO infers the final grid shape from the SEG-Y headers. It’s\n", | ||
| "common to set or adjust `full_chunk_shape` right before calling `segy_to_mdio`, using the same sizes\n", | ||
| "you expect for the final array.\n", | ||
| "```" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "id": "75939231b58c204a", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "mdio_ds = custom_template.build_dataset(name=\"demo-only\", sizes=(300, 500, 1001))\n", | ||
| "# pick smaller chunks than the full array for better parallelism and IO\n", | ||
| "custom_template.full_chunk_shape = (64, 64, 64)\n", | ||
| "print(\"Chunk shape set to:\", custom_template.full_chunk_shape)\n", | ||
| "\n", | ||
| "custom_template" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "a76f17cdf235de13", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "## Making Dummy Xarray Dataset\n", | ||
| "\n", | ||
| "We can now take the MDIO Dataset model and convert it to Xarray with our configuration. If ingesting from SEG-Y, this step\n", | ||
| "gets executed automatically by the converter before populating the data.\n", | ||
| "\n", | ||
| "Note that the whole dataset will be populated with the fill values." | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "id": "ce3dcf9c7946ea07", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "from mdio.builder.xarray_builder import to_xarray_dataset\n", | ||
| "\n", | ||
| "to_xarray_dataset(mdio_ds)" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "fc05aa3c81f8465c", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "## Recap: Key APIs Used\n", | ||
| "\n", | ||
| "- Template registry helpers: `get_template_registry`, `list_templates`, `register_template`, `get_template`\n", | ||
| "- Base template to subclass: `AbstractDatasetTemplate`\n", | ||
| "- Converter from the MDIO dataset model to an Xarray Dataset: `to_xarray_dataset`\n", | ||
| "\n", | ||
| "With these pieces, you can standardize how your seismic data is represented in MDIO and keep ingestion code concise and repeatable.\n" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "id": "a15848ab5c0811d6", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [] | ||
| } | ||
| ], | ||
| "metadata": { | ||
| "mystnb": { | ||
| "execution_mode": "force" | ||
| } | ||
| }, | ||
| "nbformat": 4, | ||
| "nbformat_minor": 5 | ||
| } |
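
The notebook above says the registry returns a deep copy of a template on every fetch, so customizing a fetched template does not affect other users. The following is a minimal standard-library sketch of that behavior, not MDIO's actual implementation. `ToyTemplate` and `ToyRegistry` are hypothetical names invented for illustration; only the template name and chunk shapes are taken from the tutorial.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ToyTemplate:
    """Hypothetical stand-in for a dataset template (not an MDIO class)."""

    name: str
    chunk_shape: tuple
    units: dict = field(default_factory=dict)


class ToyRegistry:
    """Hypothetical registry that hands out deep copies of stored templates."""

    def __init__(self) -> None:
        self._templates = {}

    def register(self, template: ToyTemplate) -> str:
        if template.name in self._templates:
            raise ValueError(f"Template {template.name!r} is already registered")
        # Store a copy so later edits to the caller's object do not leak in.
        self._templates[template.name] = copy.deepcopy(template)
        return template.name

    def get(self, name: str) -> ToyTemplate:
        # Return a fresh deep copy on every fetch.
        return copy.deepcopy(self._templates[name])

    def list_names(self) -> list:
        return list(self._templates)


registry = ToyRegistry()
registry.register(ToyTemplate("AnisotropicVelocity3DDepth", (128, 128, 128)))

# Customize one fetched copy; nested state (the units dict) is copied too.
fetched = registry.get("AnisotropicVelocity3DDepth")
fetched.chunk_shape = (64, 64, 64)
fetched.units["depth"] = "m"

# A second fetch still sees the registered defaults.
untouched = registry.get("AnisotropicVelocity3DDepth")
print(untouched.chunk_shape, untouched.units)
```

A deep copy rather than a shallow one matters here because templates carry nested mutable state such as the units mapping. With a shallow copy, the `units` edit above would be visible to every later fetch.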
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -13,4 +13,5 @@ creation | ||
| compression | ||
| rechunking | ||
| corrupt_files | ||
| custom_template | ||
| ``` | ||
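
The chunking section of the tutorial notebook builds a `(300, 500, 1001)` dataset and changes the chunk shape from the template default `(128, 128, 128)` to `(64, 64, 64)`. To gauge what such a change implies, the number of chunks along each axis is the ceiling of the axis size divided by the chunk length. The sketch below works that arithmetic out. `chunk_counts` is a hypothetical helper, not part of MDIO, and the size estimate assumes uncompressed `float32` samples.

```python
import math


def chunk_counts(sizes, chunk_shape):
    """Return the number of chunks per axis (ceiling division)."""
    if len(sizes) != len(chunk_shape):
        raise ValueError("chunk_shape must have one entry per data dimension")
    return tuple(-(-size // chunk) for size, chunk in zip(sizes, chunk_shape))


sizes = (300, 500, 1001)  # sizes used in the tutorial's build_dataset call
summary = {}
for chunk_shape in [(128, 128, 128), (64, 64, 64)]:
    per_axis = chunk_counts(sizes, chunk_shape)
    total = math.prod(per_axis)
    # 4 bytes per float32 sample, before compression
    mib_per_chunk = math.prod(chunk_shape) * 4 / 2**20
    summary[chunk_shape] = (per_axis, total, mib_per_chunk)
    print(chunk_shape, per_axis, total, mib_per_chunk)
```

For these sizes, `(128, 128, 128)` gives 3 x 4 x 8 = 96 chunks of 8 MiB each, while `(64, 64, 64)` gives 5 x 8 x 16 = 640 chunks of 1 MiB each. Smaller chunks allow more parallelism at the cost of more objects to manage.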
When the docs page theme is set to "auto", the HTML repr in the "Making Dummy Xarray Dataset" section is very difficult to read if the system is set to dark. The repr is responsive to the explicit modes "dark" and "light".

[Screenshots: Auto, Light, Dark]