Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use real array for data of of small netCDF variables. #5229

Merged
merged 6 commits into from Apr 6, 2023

Conversation

pp-mo
Copy link
Member

@pp-mo pp-mo commented Apr 3, 2023

Successor to the experimental #5069.
Addresses #5053

Draft while I see what tests it may break.

@codecov
Copy link

codecov bot commented Apr 3, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: -0.01 ⚠️

Comparison is base (6b25c3a) 89.28% compared to head (79d16d1) 89.27%.

❗ Current head 79d16d1 differs from pull request most recent head 3dabeca. Consider uploading reports for the commit 3dabeca to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5229      +/-   ##
==========================================
- Coverage   89.28%   89.27%   -0.01%     
==========================================
  Files          88       88              
  Lines       22269    22273       +4     
  Branches     4870     4875       +5     
==========================================
+ Hits        19882    19884       +2     
- Misses       1641     1642       +1     
- Partials      746      747       +1     
Impacted Files Coverage Δ
lib/iris/fileformats/netcdf/loader.py 80.37% <100.00%> (+0.37%) ⬆️

... and 3 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@pp-mo pp-mo changed the title Small netCDF variable data is real. Use real array for data of of small netCDF variables. Apr 4, 2023
@bjlittle bjlittle requested a review from ESadek-MO April 4, 2023 09:11
@pp-mo pp-mo marked this pull request as ready for review April 4, 2023 10:05
@pp-mo
Copy link
Member Author

pp-mo commented Apr 4, 2023

@ESadek-MO I think this is good to go now.
Please take a look + see what you think.

N.B. there is basically no API and no "public view" of this, it's just a performance tweak.

On the other hand, it does affect some loads in a backwards-incompatible way.
@trexfeathers can we accept that the real/lazy status of data arrays is not something that we have to absolutely preserve,
or should this count as a breaking change ?

@trexfeathers
Copy link
Contributor

On the other hand, it does affect some loads in a backwards-incompatible way.
@trexfeathers can we accept that the real/lazy status of data arrays is not something that we have to absolutely preserve,
or should this count as a breaking change ?

I'm comfortable with this change. We encourage people to CHECK if their data is lazy or not, rather than to assume it is one way or the other.

Copy link
Contributor

@ESadek-MO ESadek-MO left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good on the whole, only one thing to address then I think you're good!

@ESadek-MO
Copy link
Contributor

Looks good, thanks @pp-mo!

@ESadek-MO ESadek-MO merged commit c74d783 into SciTools:main Apr 6, 2023
15 checks passed
tkknight added a commit to tkknight/iris that referenced this pull request Apr 13, 2023
* upstream/main: (59 commits)
  Updated environment lockfiles (SciTools#5211)
  update ci locks location (SciTools#5228)
  Fixes to _discontiguity_in_bounds (attempt 2) (SciTools#4975)
  Finalises Lazy Data documentation (SciTools#5137)
  Modernize and simplify iris.analysis._Groupby (SciTools#5015)
  clarity on whatsnew entry contributors (SciTools#5240)
  Handle derived coordinates correctly in `concatenate` (SciTools#5096)
  Use real array for data of of small netCDF variables. (SciTools#5229)
  Bump scitools/workflows from 2023.04.1 to 2023.04.2 (SciTools#5236)
  fixing whatsnew entry
  remove results creation commit from blame
  configure codecov
  adding a whatsnew entry
  Replacing numpy legacy printing with array2string and remaking results for dependent tests
  Adding a whatsnew entry for 5224 (SciTools#5234)
  Cf cell method (SciTools#5224)
  Bump scitools/workflows from 2023.03.3 to 2023.04.1 (SciTools#5231)
  [pre-commit.ci] pre-commit autoupdate (SciTools#5230)
  Bump scitools/workflows from 2023.03.2 to 2023.03.3 (SciTools#5227)
  raise dask min pin (SciTools#5225)
  ...
pp-mo added a commit to pp-mo/iris that referenced this pull request Apr 13, 2023
* Small netCDF variable data is real.

* Various test fixes.

* More test fixing.

* Fix printout in Mesh documentation.

* Whatsnew + doctests fix.

* Tweak whatsnew.
lbdreyer pushed a commit that referenced this pull request Apr 21, 2023
* Basic functional lazy saving.

* Simplify function signature which upsets Sphinx.

* Non-lazy saves return nothing.

* Now fixed to enable use with process/distributed scheduling.

* Remove dask.utils.SerializableLock, which I think was a mistake.

* Make DefferedSaveWrapper use _thread_safe_nc.

* Fixes for non-lazy save.

* Avoid saver error when no deferred writes.

* Reorganise locking code, ready for shareable locks.

* Remove optional usage of 'filelock' for lazy saves.

* Document dask-specific locking; implement differently for threads or distributed schedulers.

* Minor fix for unit-tests.

* Pin libnetcdf to avoid problems -- see #5187.

* Minor test fix.

* Move DeferredSaveWrapper into _thread_safe_nc; replicate the NetCDFDataProxy fix; use one lock per Saver; add extra up-scaled test

* Update lib/iris/fileformats/netcdf/saver.py

Co-authored-by: Bouwe Andela <bouweandela@gmail.com>

* Update lib/iris/fileformats/netcdf/_dask_locks.py

Co-authored-by: Bouwe Andela <bouweandela@gmail.com>

* Update lib/iris/fileformats/netcdf/saver.py

Co-authored-by: Bouwe Andela <bouweandela@gmail.com>

* Small rename + reformat.

* Remove Saver lazy option; all lazy saves are delayed; factor out fillvalue checks and make them delayable.

* Repurposed 'test__FillValueMaskCheckAndStoreTarget' to 'test__data_fillvalue_check', since old class is gone.

* Disable (temporary) saver debug printouts.

* Fix test problems; Saver automatically completes to preserve existing direct usage (which is public API).

* Fix docstring error.

* Fix spurious error in old saver test.

* Fix Saver docstring.

* More robust exit for NetCDFWriteProxy operation.

* Fix doctests by making the Saver example functional.

* Improve docstrings; unify terminology; simplify non-lazy save call.

* Moved netcdf cell-method handling into nc_load_rules.helpers, and various tests into more specific test folders.

* Fix lockfiles and Makefile process.

* Add unit tests for routine _fillvalue_report().

* Remove debug-only code.

* Added tests for what the save function does with the 'compute' keyword.

* Fix mock-specific problems, small tidy.

* Restructure hierarchy of tests.unit.fileformats.netcdf

* Tidy test docstrings.

* Correct test import.

* Avoid incorrect checking of byte data, and a numpy deprecation warning.

* Alter parameter names to make test reports clearer.

* Test basic behaviour of _lazy_stream_data; make 'Saver._delayed_writes' private.

* Add integration tests, and distributed dependency.

* Docstring fixes.

* Documentation section and whatsnew entry.

* Various fixes to whatsnew, docstrings and docs.

* Minor review changes, fix doctest.

* Arrange tests + results to organise by package-name alone.

* Review changes.

* Review changes.

* Enhance tests + debug.

* Support scheduler type 'single-threaded'; allow retries on delayed-save test.

* Improve test.

* Adding a whatsnew entry for 5224 (#5234)

* Adding a whatsnew entry explaining 5224

* Fixing link and format error

* Replacing numpy legacy printing with array2string and remaking results for dependent tests

* adding a whatsnew entry

* configure codecov

* remove results creation commit from blame

* fixing whatsnew entry

* Bump scitools/workflows from 2023.04.1 to 2023.04.2 (#5236)

Bumps [scitools/workflows](https://github.com/scitools/workflows) from 2023.04.1 to 2023.04.2.
- [Release notes](https://github.com/scitools/workflows/releases)
- [Commits](SciTools/workflows@2023.04.1...2023.04.2)

---
updated-dependencies:
- dependency-name: scitools/workflows
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Use real array for data of of small netCDF variables. (#5229)

* Small netCDF variable data is real.

* Various test fixes.

* More test fixing.

* Fix printout in Mesh documentation.

* Whatsnew + doctests fix.

* Tweak whatsnew.

* Handle derived coordinates correctly in `concatenate` (#5096)

* First working prototype of concatenate that handels derived coordinates correctly

* Added checks for derived coord metadata during concatenation

* Added tests

* Fixed defaults

* Added what's new entry

* Optimized test coverage

* clarity on whatsnew entry contributors (#5240)

* Modernize and simplify iris.analysis._Groupby (#5015)

* Modernize and simplify _Groupby

* Rename variable to improve readability

Co-authored-by: Martin Yeo <40734014+trexfeathers@users.noreply.github.com>

* Add a whatsnew entry

* Add a type hint to _add_shared_coord

* Add a test for iris.analysis._Groupby.__repr__

---------

Co-authored-by: Martin Yeo <40734014+trexfeathers@users.noreply.github.com>

* Finalises Lazy Data documentation (#5137)

* cube and io lazy data notes added

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Added comments within analysis, as well as palette and iterate, and what's new

* fixed docstrings as requested in @trexfeathers review

* reverted cube.py for time being

* fixed flake8 issue

* Lazy data second batch

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated lastest what'snew

* I almost hope this wasn't the fix, I'm such a moron

* adressed review changes

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bill Little <bill.james.little@gmail.com>

* Fixes to _discontiguity_in_bounds (attempt 2) (#4975)

* update ci locks location (#5228)

* Updated environment lockfiles (#5211)

Co-authored-by: Lockfile bot <noreply@github.com>

* Increase retries.

* Change debug to show which elements failed.

* update cf standard units (#5244)

* update cf standard units

* added whatsnew entry

* Correct pull number

Co-authored-by: Martin Yeo <40734014+trexfeathers@users.noreply.github.com>

---------

Co-authored-by: Martin Yeo <40734014+trexfeathers@users.noreply.github.com>

* libnetcdf <4.9 pin (#5242)

* Pin libnetcdf<4.9 and update lock files.

* What's New entry.

* libnetcdf not available on PyPI.

* Fix for Pandas v2.0.

* Fix for Pandas v2.0.

* Avoid possible same-file crossover between tests.

* Ensure all-different testfiles; load all vars lazy.

* Revert changes to testing framework.

* Remove repeated line from requirements/py*.yml (?merge error), and re-fix lockfiles.

* Revert some more debug changes.

* Reorganise test for better code clarity.

* Use public 'Dataset.isopen()' instead of '._isopen'.

* Create output files in unique temporary directories.

* Tests for fileformats.netcdf._dask_locks.

* Fix attribution names.

* Fixed new py311 lockfile.

* Fix typos spotted by codespell.

* Add distributed test dep for python 3.11

* Fix lockfile for python 3.11

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Bouwe Andela <bouweandela@gmail.com>
Co-authored-by: Henry Wright <84939917+HGWright@users.noreply.github.com>
Co-authored-by: Henry Wright <henrywright@sky.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Manuel Schlund <32543114+schlunma@users.noreply.github.com>
Co-authored-by: Bill Little <bill.james.little@gmail.com>
Co-authored-by: Bouwe Andela <b.andela@esciencecenter.nl>
Co-authored-by: Martin Yeo <40734014+trexfeathers@users.noreply.github.com>
Co-authored-by: Elias <110238618+ESadek-MO@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: stephenworsley <49274989+stephenworsley@users.noreply.github.com>
Co-authored-by: scitools-ci[bot] <107775138+scitools-ci[bot]@users.noreply.github.com>
Co-authored-by: Lockfile bot <noreply@github.com>
@pp-mo pp-mo deleted the nc_real_small_coords branch April 28, 2023 13:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

None yet

3 participants