dmitriyrepin
Contributor

tasansal added the v1 label Sep 5, 2025
tasansal changed the title from "AutoChannelWrap over updated-v1" to "v1 grid overrides: AutoChannelWrap" Sep 5, 2025
tasansal changed the title from "v1 grid overrides: AutoChannelWrap" to "v1 implementation of AutoChannelWrap grid override" Sep 5, 2025
tasansal merged commit db33c2a into TGSAI:v1 Sep 5, 2025
3 of 8 checks passed
tasansal mentioned this pull request Sep 5, 2025
3 tasks
tasansal added a commit that referenced this pull request Sep 8, 2025
* update project metadata and deps

* update project metadata and deps

* add schemas

* Relocate quickstart notebook to tutorials directory

* update docs dependencies

* add new docs

* remove incorrect exclude

* remove duplicate doc directive

* fix creation notebook

* Add basic unit test for v1 dataset schema validation

* update lockfile

* fix broken creation nb

* update lockfile

* lint v1 files

* update lock file

* schema_v1-dataset_builder-add_dimension

* V1 schema review (#553)

* Update from list to discrete values for coordinate metadata

* Add docs to help users understand difference

* Update docs and fix case sensitivity.

* Linting

* Add CoordinateMetadata to docs

* First take on add_dimension(), add_coordinate(), add_variable()

* Finished add_dimension, add_coordinate, add_variable

* Work on build

* Generalize _to_dictionary()

* build

* [v1] Update dependencies to latest (#567)

* Update dependencies to latest versions

* Update linter type-checking code to 'TC' in pyproject.toml

https://astral.sh/blog/ruff-v0.8.0#new-error-codes-for-flake8-type-checking-rules

* Refactor: Move Zarr codec imports to top-level

* disable safety in CI (temporary)

* Refactor: Replace Zarr codec imports with numcodecs equivalents

* Refactor: Remove unused numcodecs imports and related methods

* pin zarr due to zarr 3.0.9 bug

* Dataset Build - pass one

* unpin zarr because breaking bug fixed (#569)

* Revert .container changes

* PR review: remove DEVELOPER_NOTES.md

* PR Review: add_coordinate() should accept only data_type: ScalarType

* PR review: add_variable() data_type remove default

* PR review: do not add dimension variable

* PR Review: get api version from the package version

* PR Review: remove add_dimension_coordinate

* PR Review: add_coordinate() remove data_type default value

* PR Review: improve unit tests by extracting common functionality in validate* functions

* Remove the Dockerfile changes. They are not supposed to be a part of this PR

* PR Review: run ruff

* PR Review: fix pre-commit errors

* remove some noqa overrides

* Implement MDIO Dataset builder to create in-memory instance of schemas.v1.dataset.Dataset (#568)

* schema_v1-dataset_builder-add_dimension

* First take on add_dimension(), add_coordinate(), add_variable()

* Finished add_dimension, add_coordinate, add_variable

* Work on build

* Generalize _to_dictionary()

* build

* Dataset Build - pass one

* Revert .container changes

* PR review: remove DEVELOPER_NOTES.md

* PR Review: add_coordinate() should accept only data_type: ScalarType

* PR review: add_variable() data_type remove default

* PR review: do not add dimension variable

* PR Review: get api version from the package version

* PR Review: remove add_dimension_coordinate

* PR Review: add_coordinate() remove data_type default value

* PR Review: improve unit tests by extracting common functionality in validate* functions

* Remove the Dockerfile changes. They are not supposed to be a part of this PR

* PR Review: run ruff

* PR Review: fix pre-commit errors

* remove some noqa overrides

---------

Co-authored-by: Altay Sansal <tasansal@users.noreply.github.com>

* Writing XArray / Zarr

* gitignore

* to_zarr() fix compression

* Fix precommit issues

* Use only make_campos_3d_acceptance_dataset

* PR Review: address the review comments

* Update _get_fill_value for StructuredType

* Fix fill type issue for the Structured Types

* Improve code coverage

* Fix spelling

* Revert "Fix spelling"

This reverts commit 0447659.

* extend per-file ignores for PLR2004 and remove noqa overrides in specific tests

* Refactor tests: clarify Zarr-related test names, fix type hints, and clean unused `# noqa` comments.

* MDIO v1 Templates and Template Registry (#573)

* Templates and TemplateRegistry

* Fix pre-commit issues

* Revert dev container changes

* PR Review: address issues

* PR Review: register default templates at registry initialization

* update deps

* address issues with VS Code dev containers (see issue 559) (#576)

* Templates and TemplateRegistry

* Fix pre-commit issues

* Revert dev container changes

* PR Review: address issues

* PR Review: register default templates at registry initialization

* Dockerfile.dev

---------

Co-authored-by: Altay Sansal <tasansal@users.noreply.github.com>

* segy_to_mdio_v1 (#577)

* Templates and TemplateRegistry

* Fix pre-commit issues

* Revert dev container changes

* PR Review: address issues

* PR Review: register default templates at registry initialization

* Dockerfile.dev

* segy_to_mdio_v1

* Clean up

* Prototype review notes

* Add dev comment

* Add notes that will be deleted later

* segy_to_mdio_v1 pass 1

* indexing_v1 and blocked_io_v1

* Remove DEV notes

* Clean up

* Document bug location

* Work around IndexError

* Clean temporary code

* More clean up

* Remove *_1 infrastructure files

* Restore accidentally removed dask.array

* Created an issue reproducer

* Make the required template properties public

* Simplify type converter

* Improve templates

* Move test_type_converter.py

* Move test_type_converter.py

* Revert to use the original grid

* Integrate segy_to_mdio_v1_customized, fix indexing

* Add dimension coordinates in templates

* Write statistics to Zarr

* Delete factory_v1.py

* Complete integration test. Fix coordinates

* Fix pre-commit errors

* PR review: fix trace_worker docstring

* Review: address some of the issues

* Fix bug

* Adding todo for sum squares calculation

* Refactor ChunkIterator

* Refactor ChunkIterator into ChunkIteratorV1

* Remove segy_to_mdio_v1_customized, dataset_serializer.to_zarr

* Add support for trace headers without using _FillValue

* Use StorageLocation in trace_worker_v1

* Fix statistics attribute name

* PR review changes

* PR Improvements: do a single write

* TODO: chunked write for non-dimensional coordinates and trace_mask

* Update StorageLocation to use fsspec

* Reformat with pre-commit

* Use domain name in get_grid_plan

* Fix non-dim coords and chunk_samples=False

* Convert test_3d_import_v1 to V1

* Fix test_meta_dataset_read

* remove whitespace

* clean up comments

* update deps in lockfile

* simplify dim and non-dim coordinate handling after scan

* remove compatibility tests

* add filtering capability to header worker

* accept subset filter to pass to workers

* make v1 grid planner awesome

* double to single underscores in test names

* fix broken test harnesses due to incorrect Sequence import

* clean up dev comment

* clean up whitespace

* use new module name

* clean up segy_to_mdio_v1

* fix whitespace and remove unnecessary list call

* these are defined as float64 in template

Previous check was passing due to an error in assignment during creation of coordinate variables

* fix missing dimension coordinate for vertical axis

* fix incorrect dtype comparison for time variable

* simplify and fix critical bugs

* rename v1 out of things and get rid of old code

* remove fixed todo

* remove more v1 from names

* rename chunk iterator

* fix dimensionality in tests due to new (missing) vertical dimension coordinate

* add todo for numpy ingestion

* fix references to non-v1 naming

* extract grid operations to its own function

* fix typo

Co-authored-by: Brian Michell <brianm314@comcast.net>

* add todo for simplifying storage location

* Remove no_fill_var_names, add domain var to Seismic3DPreStackShotTemplate

* Part 2 of the previous commit

* pre-commit formatting

* remove dev mount

---------

Co-authored-by: Dmitriy Repin <dmitriy_repin@epam.com>
Co-authored-by: Altay Sansal <tasansal@users.noreply.github.com>
Co-authored-by: Brian Michell <brianm314@comcast.net>

* Make some integration tests work with new `segy_to_mdio` (#599)

* Fix integration import tests

* Fix integration import tests

* mask_and_scale=False

* PR Review

* pre-commit

* PR Review issues

* add todo for headers

* update line length limit to 120 in pyproject.toml

* compact nested code for improved readability in validation tests

* compact coordinate and dimension name definitions in 2D/3D prestack shot templates

* refactor names in header validation in SEG-Y export tests

* remove v1 suffix

* compact code by merging multi-line blocks into single lines where possible

* bump prettier to v3.1.0 and remove prettier-plugin-toml

* update lock file

---------

Co-authored-by: Altay Sansal <tasansal@users.noreply.github.com>

* remove developer tests

* Serialize text and binary headers (#600)

* Fix integration import tests

* mask_and_scale=False

* pre-commit

* PR Review issues

* serialize-text-and-binary-headers

* remove dev test data

* add back whitespace

* revert import changes

* fix attribute initialization in `_add_text_binary_headers`

* Add tests

* refactor: improve type annotations and docstrings in test utilities

* fix formatting

* remove redundant `str()` casting in `xr.open_dataset` calls

---------

Co-authored-by: Altay Sansal <tasansal@users.noreply.github.com>

* shot_point (#602)

* Add template: Offset + Azimuth binned CDP gathers (COCA) (#605)

* update helper to support structured types in variable validation

* add Seismic3DPreStackCocaTemplate and corresponding unit tests

* register Seismic3DPreStackCocaTemplate in template registry

* reorganize template registrations in template_registry and remove depth ones from shots.

* use registered templates instead of listing them all by hand.

* simplify template instantiation in unit tests

* fix default templates and add missing ones

* refactor default template assertions using shared constant

* Eager memory allocation fix (#609)

* Implement fixes to ensure lazy allocation of data arrays on serialization

* Avoid unnecessary copies of data in memory

* Linting

* Eliminate immediate overwrite of `data` bug

* Remove unused import

* Set appropriate fill value for lazy arrays

* Clean up header value handler

* Resolve data serialization issues

* Ensure all encodings are captured

* Simplify dataset coordinate population logic by removing unused imports and redundant variable handling

* Refactor `_workers.py` to streamline variable handling, replace manual Variable creation with direct assignment, and resolve redundant imports.

* make better use of grid

* fix type hint

* make better use of grid

* fix(regression): make dataset serialization less eager

* update zarr

* remove comment

---------

Co-authored-by: Altay Sansal <tasansal@users.noreply.github.com>

* Fix memory and core utilization regressions

* Export functionality for MDIO v1 ingested files (#611)

* Export part 1

* Enable header value validation

* Revert the test names back

* Remove Endianness, new_chunks API args and traceDomain

* PR review

* lint

* create/use new api location and lint

* allow configuring opener chunks

* clarify xarray open parameters

* fix regression of not-opening with native dask re-chunking

* fix regression of not-opening with native dask re-chunking

* make export rechunker work with named dimension sizes and chunks

* make StorageLocation available at library level and update mdio to segy example

* pre-open with zarr backend and simplify dataset slicing after lazy loading

* better opener docs

* more explicit xarray selection

* rename trace variable name to default variable name

* remove the guard for setting storage options to empty dictionary. new zarr is ok with None.

* update lockfile

* fix broken tests and inconsistent type hints

* clean up comments

* clarify binary header scaling

* make test names clearer

* fix broken unit tests due to storage_options handling

---------

Co-authored-by: Altay Sansal <tasansal@users.noreply.github.com>

* v1 implementation of AutoChannelWrap grid override (#632)

* AutoChannelWrap over updated-v1

* Fix test

* rename function for new behaviour and improve type hint for grid_overrides

* simplify metadata handling

* lint

* gridOverride is not required

* remove unnecessary byte order change, handled upstream.

* remove rtol adds, tests pass.

* remove expected behaviour comment

* clean up tests

* use grouped assignments to fix PLR915

* add comments to clarify

---------

Co-authored-by: Altay Sansal <tasansal@users.noreply.github.com>

* Move to Zarr v3 as default for on disk storage format (#630)

* remove all zarr v2 refs and fix fill_value attributes

* fix codec initialization for zarr3

* use correct kwargs for compressor definition

* fix fill value for structs

* fix numpy imports

* fix creation logic

* make numpy import namespace

* ensure fill value is correct for structured arrays

* fill value all fields

* remove legacy test for bug in v2

* fix codec related issues and warning spamming

* use UPath instead of StorageLocation and remove all v0 stuff

* undo warning suppression for now

* remove v0 dataset schema

* make immutable metadata tuples, performance optimizations. consistent code styling as well

- remove old zarr APIs
- Ensure grid attrs (map and live) get compressed properly.
- move grid_map slicing to worker from main process

* fix cloud i/o issue (#637)

* snake-case to camelCase (#638)

* Fix output URI handling for remote stores (#639)

* fix output uri handling for remote stores

* switch from `as_uri` to `as_posix` for compatibility with xarray

* allow legacy v2 support (#640)

* Reorganize code and simplify schemas and logic everywhere (#642)

* reorg and simplify

* fix comparison of stats

* fix regression in dataset attribute serialization

* ensure histogram alias is compared correctly

* update docs references

* fix broken refs

* remove top level metadata ref

* remove blosc config refs (we now get from zarr)

* delete removed stats metadata wrapper

* update deps and remove safety

- reason for removal: pyupio/safety#673

* fix numpy rng lint errors

* exclude lower level members

* remove singleton from template registry title

* make template registry api ref with autodoc

* First pass review and alignment of templates (#643)

* rename things to be more sensible and add angle gathers configuration to PreStackCdp templates.

- add missing 2d test

* align shot data template with prod

* fix tests for 3d pre-stack shot

* remove deleted attribute (processingStage)

* rename gatherType for coca

* lint and fix 1 bug

* rename gather -> ensemble or raw field data

* add missing 2d shot

* fix docstrings

* fix wrong validation namings

* Fix ingestion of coordinates without full dimensions (#644)

* fix correct ingestion for coordinates that don't share all dimensions

* add todo for verification of reduced dimensions

* Disable unimplemented tests (#647)

* add todo markers for disabled tests.

* set coverage minimum to 85% due to disabled tests

* remove todo, it has correct behaviour, also rename .build_dataset `header` to `header_dtype` for clarity (#648)

* set version to 1.0.0

* unpin hardcoded version from tests

---------

Co-authored-by: Dmitriy Repin <drepin@hotmail.com>
Co-authored-by: Brian Michell <brianm314@comcast.net>
Co-authored-by: Dima From Texas <34629861+dmitriyrepin@users.noreply.github.com>
Co-authored-by: Dmitriy Repin <dmitriy_repin@epam.com>
2 participants