Conversation

dmitriyrepin
Contributor

No description provided.

@tasansal tasansal added the enhancement (New feature or request) and v1 labels Jul 17, 2025
@tasansal tasansal changed the title from "Templates and Template Registry" to "MDIO v1 Templates and Template Registry" Jul 17, 2025
@tasansal tasansal linked an issue Jul 17, 2025 that may be closed by this pull request
codecov bot commented Jul 17, 2025

Codecov Report

Attention: Patch coverage is 98.41849% with 13 lines in your changes missing coverage. Please review.

Project coverage is 91.94%. Comparing base (d08e2c4) to head (80dd234).
Report is 5 commits behind head on v1.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| tests/unit/v1/templates/test_template_registry.py | 95.51% | 8 Missing and 3 partials ⚠️ |
| src/mdio/schemas/v1/templates/template_registry.py | 97.61% | 0 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##               v1     #573      +/-   ##
==========================================
+ Coverage   90.60%   91.94%   +1.34%     
==========================================
  Files          72       84      +12     
  Lines        3948     4767     +819     
  Branches      278      301      +23     
==========================================
+ Hits         3577     4383     +806     
- Misses        304      312       +8     
- Partials       67       72       +5     
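
The percentages in the diff above are consistent with the raw counts if coverage is taken as hits divided by the sum of hits, misses, and partials. A quick check in Python under that assumption (the helper name `coverage_pct` is illustrative and not part of any Codecov tooling):

```python
def coverage_pct(hits: int, misses: int, partials: int) -> float:
    """Return line coverage in percent, assuming coverage = hits / (hits + misses + partials)."""
    total = hits + misses + partials
    return 100.0 * hits / total


# Counts copied from the coverage diff (base branch v1 vs. this pull request).
base = coverage_pct(hits=3577, misses=304, partials=67)
head = coverage_pct(hits=4383, misses=312, partials=72)

print(f"base={base:.2f}% head={head:.2f}% delta={head - base:+.2f}%")
# prints: base=90.60% head=91.94% delta=+1.34%
```

The totals also add up: 3577 + 304 + 67 = 3948 lines on the base and 4383 + 312 + 72 = 4767 lines on the head, matching the `Lines` row.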

@tasansal tasansal linked an issue Jul 18, 2025 that may be closed by this pull request
@tasansal tasansal moved this to In progress in mdio-python 1.0.0 release Jul 18, 2025
Collaborator

@tasansal tasansal left a comment

Hi @dmitriyrepin it looks great now. But as before, I think the tests are too granular and we are adding a lot of test code to maintain. Can we reduce the tests to hit 100% with higher level tests?

@BrianMichell
Collaborator

LGTM.

I think we can look to add a new task to reduce all test code granularity towards the end of the 1.0.0 release window.

@dmitriyrepin
Contributor Author

dmitriyrepin commented Jul 21, 2025

> Hi @dmitriyrepin it looks great now. But as before, I think the tests are too granular and we are adding a lot of test code to maintain. Can we reduce the tests to hit 100% with higher level tests?

@tasansal, there are 2.5 ways we can achieve this:

  • 1 ) Reduce the number of asserts.
    For example,

    • in test_build_dataset_time() we can check only the number of variables created and not the structure of those variables.
    • For templates, we can keep test_build_dataset_depth() and drop test_build_dataset_time(), since they exercise almost identical functionality.
  • 2 ) Do not test functionality we do not expect to use.
    For example, we could

    • remove tests for parameter validation
    • remove tests for edge cases, e.g., test_get_nonexistent_template()
    • remove all multi-threaded tests for template_registry.
  • 2.5) Combine a few separate unit tests into a single one.
    For example, in test_template_registry.py we can combine the following into a single test:

    • test_register_template
    • test_register_duplicate_template

Please let me know which way you want to proceed.
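
As a rough illustration of option 2.5, the sketch below folds a registration test and a duplicate-registration test into one higher-level test. `ToyRegistry` and its `register`/`get` methods are hypothetical stand-ins written for this example; they are not the MDIO `TemplateRegistry` API, whose actual signatures and error types are not shown in this conversation.

```python
class ToyRegistry:
    """Hypothetical stand-in for a template registry; NOT the actual MDIO API."""

    def __init__(self) -> None:
        self._templates: dict[str, object] = {}

    def register(self, name: str, template: object) -> str:
        # Assumed behavior: duplicate names are rejected with ValueError.
        if name in self._templates:
            raise ValueError(f"Template '{name}' is already registered.")
        self._templates[name] = template
        return name

    def get(self, name: str) -> object:
        if name not in self._templates:
            raise KeyError(f"Template '{name}' is not registered.")
        return self._templates[name]


def test_register_template_and_duplicate() -> None:
    """One test covering registration, lookup, and the duplicate error path."""
    registry = ToyRegistry()
    template = object()

    # Happy path: registration returns the name and the template is retrievable.
    assert registry.register("toy_template", template) == "toy_template"
    assert registry.get("toy_template") is template

    # Error path: registering the same name again is rejected.
    try:
        registry.register("toy_template", object())
    except ValueError as err:
        assert "already registered" in str(err)
    else:
        raise AssertionError("duplicate registration should have failed")

    # The original entry must survive the failed duplicate registration.
    assert registry.get("toy_template") is template


test_register_template_and_duplicate()
```

The trade-off is the usual one: a single combined test is less code to maintain, but a failure in it points less precisely at the broken behavior than two narrowly named tests would.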

@tasansal
Collaborator

All of the above :) But I agree with Brian, let's just push this through, and we can do the test refactor later.

@tasansal tasansal merged commit 5878e97 into TGSAI:v1 Jul 22, 2025
10 checks passed
@github-project-automation github-project-automation bot moved this from In progress to Done in mdio-python 1.0.0 release Jul 22, 2025
tasansal added a commit that referenced this pull request Sep 8, 2025
* update project metadata and deps

* update project metadata and deps

* add schemas

* Relocate quickstart notebook to tutorials directory

* update docs dependencies

* add new docs

* remove incorrect exclude

* remove duplicate doc directive

* fix creation notebook

* Add basic unit test for v1 dataset schema validation

* update lockfile

* fix broken creation nb

* update lockfile

* lint v1 files

* update lock file

* schema_v1-dataset_builder-add_dimension

* V1 schema review (#553)

* Update from list to discrete values for coordinate metadata

* Add docs to help users understand difference

* Update docs and fix case sensitivity.

* Linting

* Add CoordinateMetadata to docs

* First take on add_dimension(), add_coordinate(), add_variable()

* Finished add_dimension, add_coordinate, add_variable

* Work on build

* Generalize _to_dictionary()

* build

* [v1] Update dependencies to latest (#567)

* Update dependencies to latest versions

* Update linter type-checking code to 'TC' in pyproject.toml

https://astral.sh/blog/ruff-v0.8.0#new-error-codes-for-flake8-type-checking-rules

* Refactor: Move Zarr codec imports to top-level

* disable safety in CI (temporary)

* Refactor: Replace Zarr codec imports with numcodecs equivalents

* Refactor: Remove unused numcodecs imports and related methods

* pin zarr due to zarr 3.0.9 bug

* Dataset Build - pass one

* unpin zarr because breaking bug fixed (#569)

* Revert .container changes

* PR review: remove DEVELOPER_NOTES.md

* PR Review: add_coordinate() should accept only data_type: ScalarType

* PR review: add_variable() data_type remove default

* RE review: do not add dimension variable

* PR Review: get api version from the package version

* PR Review: remove add_dimension_coordinate

* PR Review: add_coordinate() remove data_type default value

* PR Review: improve unit tests by extracting common functionality in validate* functions

* Remove the Dockerfile changes. They are not supposed to be a part of this PR

* PR Review: run ruff

* PR Review: fix pre-commit errors

* remove some noqa overrides

* Implement MDIO Dataset builder to create in-memory instance of schemas.v1.dataset.Dataset (#568)

* schema_v1-dataset_builder-add_dimension

* First take on add_dimension(), add_coordinate(), add_variable()

* Finished add_dimension, add_coordinate, add_variable

* Work on build

* Generalize _to_dictionary()

* build

* Dataset Build - pass one

* Revert .container changes

* PR review: remove DEVELOPER_NOTES.md

* PR Review: add_coordinate() should accept only data_type: ScalarType

* PR review: add_variable() data_type remove default

* RE review: do not add dimension variable

* PR Review: get api version from the package version

* PR Review: remove add_dimension_coordinate

* PR Review: add_coordinate() remove data_type default value

* PR Review: improve unit tests by extracting common functionality in validate* functions

* Remove the Dockerfile changes. They are not supposed to be a part of this PR

* PR Review: run ruff

* PR Review: fix pre-commit errors

* remove some noqa overrides

---------

Co-authored-by: Altay Sansal <tasansal@users.noreply.github.com>

* Writing XArray / Zarr

* gitignore

* to_zarr() fix compression

* Fix precommit issues

* Use only make_campos_3d_acceptance_dataset

* PR Review: address the review comments

* Update _get_fill_value for StructuredType

* Fix fill type issue for the Structured Types

* Improve code coverage

* Fix spelling

* Revert "Fix spelling"

This reverts commit 0447659.

* extend per-file ignores for PLR2004 and remove noqa overrides in specific tests

* Refactor tests: clarify Zarr-related test names, fix type hints, and clean unused `# noqa` comments.

* MDIO v1 Templates and Template Registry (#573)

* Templates and TemplateRegistry

* Fix pre-commit issues

* Revert dev container changes

* PR Review: address issues

* PR Review: register default templates at registry initialization

* update deps

* address issues with VS Code dev containers (see issue 559) (#576)

* Templates and TemplateRegistry

* Fix pre-commit issues

* Revert dev container changes

* PR Review: address issues

* PR Review: register default templates at registry initialization

* Dockerfile.dev

---------

Co-authored-by: Altay Sansal <tasansal@users.noreply.github.com>

* segy_to_mdio_v1 (#577)

* Templates and TemplateRegistry

* Fix pre-commit issues

* Revert dev container changes

* PR Review: address issues

* PR Review: register default templates at registry initialization

* Dockerfile.dev

* segy_to_mdio_v1

* Clean up

* Prototype review notes

* Add dev comment

* Add notes that will be deleted later

* segy_to_mdio_v1 pass 1

* indexing_v1 and blocked_io_v1

* Remove DEV notes

* Clean up

* Document bug location

* Work around IndexError

* Clean temporary code

* More clean up

* Remove *_1 infrastructure files

* Restore accidentally removed dask.array

* Created an issue reproducer

* Make the required template properties public

* Simplify type converter

* Improve templates

* Move test_type_converter.py

* Move test_type_converter.py

* Revert to use the original grid

* Integrate segy_to_mdio_v1_customized, fix indexing

* Add dimension coordinates in templates

* Write statistics to Zarr

* Delete factory_v1.py

* Complete integrationtest. Fix coordinates

* Fix pre-commit errors

* PR review: fix trace_worker docstring

* Review: address some of the issues

* Fix bug

* Adding todo for sum squares calculation

* Refactor ChunkIterator

* Refactor ChunkIterator into ChunkIteratorV1

* Remove segy_to_mdio_v1_customized, dataset_serializer.to_zarr

* Add support for trace headers without using _FillValue

* Use StorageLocation in trace_worker_v1

* Fix statistics attribute name

* PR review changes

* PR Improvements: do a single write

* TODO:  chunked write for non-dimensional coordinates and trace_mask

* Update StorageLocation to use fsspec

* Reformat with pre-commit

* Use domain name in get_grid_plan

* Fix non-dim coords and chunk_samples=False

* Convert test_3d_import_v1 to V1

* Fix test_meta_dataset_read

* remove whitespace

* clean up comments

* update deps in lockfile

* simplify dim and non-dim coordinate handling after scan

* remove compatibility tests

* add filtering capability to header worker

* accept subset filter to pass to workers

* make v1 grid planner awesome

* double to single underscores in test names

* fix broken test harnesses due to incorrect Sequence import

* clean up dev comment

* clean up whitespace

* use new module name

* clean up segy_to_mdio_v1

* fix whitespace and remove unnecessary list call

* these are defined as float64 in template

Previous check was passing due to an error in assignment during creation of coordinate variables

* fix missing dimension coordinate for vertical axis

* fix incorrect dtype comparison for time variable

* simplify and fix critical bugs

* rename v1 out of things and get rid of old code

* remove fixed todo

* remove more v1 from names

* rename chunk iterator

* fix dimensionality in tests due to new (missing) vertical dimension coordinate

* add todo for numpy ingestion

* fix references to non-v1 naming

* extract grid operations to its own function

* fix typo

Co-authored-by: Brian Michell <brianm314@comcast.net>

* add todo for simplifying storage location

* Remove no_fill_var_names, add domain var to Seismic3DPreStackShotTemplate

* Part 2 of the previous commit

* pre-commit formatting

* remove dev mount

---------

Co-authored-by: Dmitriy Repin <dmitriy_repin@epam.com>
Co-authored-by: Altay Sansal <tasansal@users.noreply.github.com>
Co-authored-by: Brian Michell <brianm314@comcast.net>

* Make some integration tests for work with new `segy_to_mdio` (#599)

* Fix integration import tests

* Fix integration import tests

* mask_and_scale=False

* PR Review

* pre-commit

* PR Review issues

* add todo for headers

* update line length limit to 120 in pyproject.toml

* compact nested code for improved readability in validation tests

* compact coordinate and dimension name definitions in 2D/3D prestack shot templates

* refactor names in header validation in SEG-Y export tests

* remove v1 suffix

* compact code by merging multi-line blocks into single lines where possible

* bump prettier to v3.1.0 and remove prettier-plugin-toml

* update lock file

---------

Co-authored-by: Altay Sansal <tasansal@users.noreply.github.com>

* remove developer tests

* Serialize text and binary headers (#600)

* Fix integration import tests

* mask_and_scale=False

* pre-commit

* PR Review issues

* serialize-text-and-binary-headers

* remove dev test data

* add back whitespace

* revert import changes

* fix attribute initialization in `_add_text_binary_headers`

* Add tests

* refactor: improve type annotations and docstrings in test utilities

* fix formatting

* remove redundant `str()` casting in `xr.open_dataset` calls

---------

Co-authored-by: Altay Sansal <tasansal@users.noreply.github.com>

* shot_point (#602)

* Add template: Offset + Azimuth binned CDP gathers (COCA) (#605)

* update helper to support structured types in variable validation

* add Seismic3DPreStackCocaTemplate and corresponding unit tests

* register Seismic3DPreStackCocaTemplate in template registry

* reorganize template registrations in template_registry and remove depth ones from shots.

* use registered templates instead of listing them all by hand.

* simplify template instantiation in unit tests

* fix default templates and add missing ones

* refactor default template assertions using shared constant

* Eager memory allocation fix (#609)

* Implement fixes to ensure lazy allocation of data arrays on serialization

* Avoid unnecessary copies of data in memory

* Linting

* Eliminate immediate overwrite of `data` bug

* Remove unused import

* Set appropriate fill value for lazy arrays

* Clean up header value handler

* Resolve data serialization issues

* Ensure all encodings are captured

* Simplify dataset coordinate population logic by removing unused imports and redundant variable handling

* Refactor `_workers.py` to streamline variable handling, replace manual Variable creation with direct assignment, and resolve redundant imports.

* make better use of grid

* fix type hint

* make better use of grid

* fix(regression): make dataset serialization less eager

* update zarr

* remove comment

---------

Co-authored-by: Altay Sansal <tasansal@users.noreply.github.com>

* Fix memory and core utilization regressions

* Export functionality for MDIO v1 ingested files (#611)

* Export part 1

* Enable header value validation

* Revert the test names back

* Remove Endianness, new_chunks API args and traceDomain,

* PR review

* lint

* create/use new api location and lint

* allow configuring opener chunks

* clarify xarray open parameters

* fix regression of not-opening with native dask re-chunking

* fix regression of not-opening with native dask re-chunking

* make export rechunker work with named dimension sizes and chunks

* make StorageLocation available at library level and update mdio to segy example

* pre-open with zarr backend and simplify dataset slicing after lazy loading

* better opener docs

* more explicit xarray selection

* rename trace variable name to default variable name

* remove the guard for setting storage options to empty dictionary. new zarr is ok with None.

* update lockfile

* fix broken tests and inconsistent type hints

* clean up comments

* clarify binary header scaling

* make test names clearer

* fix broken unit tests due to storage_options handling

---------

Co-authored-by: Altay Sansal <tasansal@users.noreply.github.com>

* v1 implementation of AutoChannelWrap grid override (#632)

* AutoChannelWrap over updated-v1

* Fix test

* rename function for new behaviour and improve type hint for grid_overrides

* simplify metadata handling

* lint

* gridOverride is not required

* remove unnecessary byte order change, handled upstream.

* remove rtol adds, tests pass.

* remove expected behaviour comment

* clean up tests

* use grouped assignments to fix PLR915

* add comments to clarify

---------

Co-authored-by: Altay Sansal <tasansal@users.noreply.github.com>

* Move to Zarr v3 as default for on disk storage format (#630)

* remove all zarr v2 refs and fix fill_value attributes

* fix codec initialization for zarr3

* use correct kwargs for compressor definition

* fix fill value for structs

* fix numpy imports

* fix creation logic

* make numpy import namespace

* ensure fill value is correct for structured arrays

* fill value all fields

* remove legacy test for bug in v2

* fix codec related issues and warning spamming

* use UPath instead of StorageLocation and remove all v0 stuff

* undo warning suppression for now

* remove v0 dataset schema

* make immutable metadata tuples, performance optimizations. consistent code styling as well

- remove old zarr APIs
- Ensure grid attrs (map and live) get compressed properly.
- move grid_map slicing to worker from main process

* fix cloud i/o issue (#637)

* snake-case to camelCase (#638)

* Fix output URI handling for remote stores (#639)

* fix output uri handling for remote stores

* switch from `as_uri` to `as_posix` for compatibility with xarray

* allow legacy v2 support (#640)

* Reorganize code and simplify schemas and logic everywhere (#642)

* reorg and simplify

* fix comparison of stats

* fix regression in dataset attribute serialization

* ensure histogram alias is compared correctly

* update docs references

* fix broken refs

* remove top level metadata ref

* remove blosc config refs (we now get from zarr)

* delete removed stats metadata wrapper

* update deps and remove safety

- reason for removal: pyupio/safety#673

* fix numpy rng lint errors

* exclude lower level members

* remove singleton from template registry title

* make template registry api ref with autodoc

* First pass review and alignment of templates (#643)

* rename things to be more sensible and add angle gathers configuration to PreStackCdp templates.

- add missing 2d test

* align shot data template with prod

* fix tests for 3d pre-stack shot

* remove deleted attribute (processingStage)

* rename gatherType for coca

* lint and fix 1 bug

* rename gather -> ensemble or raw field data

* add missing 2d shot

* fix docstrings

* fix wrong validation namings

* Fix ingestion of coordinates without full dimensions (#644)

* fix correct ingestion for coordinates that don't share all dimensions

* add todo for verification of reduced dimensions

* Disable unimplemented tests (#647)

* add todo markers for disabled tests.

* set coverage minimum to 85% due to disabled tests

* remove todo, it has correct behaviour, also rename .build_dataset `header` to `header_dtype` for clarity (#648)

* set version to 1.0.0

* unpin hardcoded version from tests

---------

Co-authored-by: Dmitriy Repin <drepin@hotmail.com>
Co-authored-by: Brian Michell <brianm314@comcast.net>
Co-authored-by: Dima From Texas <34629861+dmitriyrepin@users.noreply.github.com>
Co-authored-by: Dmitriy Repin <dmitriy_repin@epam.com>
Labels

enhancement New feature or request v1

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Concrete Implementation of MDIO Seismic Templates
MDIO Schema Template Registry

3 participants