Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove the memoryfile #265

Merged
merged 28 commits into from
Aug 25, 2022
Merged

Conversation

erikmannerfelt
Copy link
Contributor

@erikmannerfelt erikmannerfelt commented Jan 11, 2022

Minor edits.

Just kidding...

The memoryfile is gone! This means that the simplest explanation of a Raster is:

"A container of the transform object, CRS object, on-disk filename (if any), metadata dictionary and (if loaded) data ndarray of a raster."

The transform, CRS, metadata and data can be modified freely by the user, and all associated properties and attributes will (read "should") get updated accordingly.

Some advantages of this approach are:

  • Data are not duplicated as they potentially were before (data in the ndarray and in the memoryfile)
  • Operations are more transparent as they edit the ndarray, transform, CRS and/or metadata dict and nothing else (because there is nothing else!)
  • After I/O operations, the in-memory version of the raster is completely disconnected from the disk version until another I/O operation is made, meaning that it will not lock the entire file.
  • Easier to debug: The memoryfile was always a bit of a mystery!
  • Easier to completely switch backends: rio is now only used for reading, writing and some builtin functions that work on numpy arrays + transform + CRS objects. This means that rioxarray could take its place for I/O if we want.
  • Easier to pickle / serialize Rasters, as they are not bound to always-open I/O files

Related issues

Closes #50
Closes #226
Closes #238
Closes #246
Closes #259
Fixes the pickle part of #240 (json doesn't work yet)

Issues that I need to find out may be solved:

Maybe c loses #122 (EDIT: Nope, still an issue)
Maybe closes #159 (EDIT: It does work!!)
Probably closes #153 (EDIT: It seems to be gone!! I did some quick load 6.6Gb raster loops and I can't replicate the OOM crashes anymore).

But wait!

It's not finished...

There's one test that I need help with, probably from @adehecq or @atedstone.
tests/test_georaster.py::TestRaster::test_reproj fails, and I don't understand why...
I temporarily disabled it to make sure that everything else works fine, but we obviously have to find out why it's not working before merging!
Help!!

Copy link
Contributor

@rhugonnet rhugonnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing! 🥳
And impressive how little needed to be changed in the tests, in comparison to how much was modified in the Class!
Out of curiosity: did you try to check locally if everything worked fine in xdem as well?

Unfortunately I don't have so much time to get into the details of the test failing with the thesis writing, I leave that to @adehecq and @atedstone! And other than that it all looks very good to me 😄

EDIT: in general, a couple more comments here and there would help quite a bit! Especially when using functions that we hardly use everyday (even something like rio.transform.rowcol, but for more unusual stuff also!)

geoutils/geoviewer.py Outdated Show resolved Hide resolved
tests/test_georaster.py Outdated Show resolved Hide resolved
@erikmannerfelt
Copy link
Contributor Author

Okay! In summary, what is left on this PR is:

  1. The cropping test needs to be re-enabled and needs to work. I need help on this one!
  2. The count/nbands/indexes/etc mess needs to be cleared up.

After this, I think it's ready!

@erikmannerfelt
Copy link
Contributor Author

@adehecq, do you think I shall update geoutils to 0.0.6 directly in this PR?

Earlier, we have incremented versions in separate PRs or pushes directly to main. I don't really see the advantage of this, however, so I might as well change it here directly.

@adehecq
Copy link
Member

adehecq commented Jan 21, 2022

@adehecq, do you think I shall update geoutils to 0.0.6 directly in this PR?

Earlier, we have incremented versions in separate PRs or pushes directly to main. I don't really see the advantage of this, however, so I might as well change it here directly.

I would make it in a separate PR, just for clarity. And also because you need to include a summary of changes for the whole version, and you won't know that until this PR is closed 😉

@erikmannerfelt
Copy link
Contributor Author

@adehecq, do you think I shall update geoutils to 0.0.6 directly in this PR?
Earlier, we have incremented versions in separate PRs or pushes directly to main. I don't really see the advantage of this, however, so I might as well change it here directly.

I would make it in a separate PR, just for clarity. And also because you need to include a summary of changes for the whole version, and you won't know that until this PR is closed wink

The summary of changes is for tagging the commit that marks the version increment, so that comes later either way. But sure, I can wait with it.

@adehecq
Copy link
Member

adehecq commented May 18, 2022

Related to issue #272, @erikmannerfelt, with this PR, would we still have access to the rasterio dataset?
It seems like we won't.
Would it be possible to keep self.ds in case the raster was directly loaded from file. Of course, it won't work if the raster was modified afterward. This can be useful to perform some operations like cropping, reprojecting etc, without loading the data into memory...

@adehecq
Copy link
Member

adehecq commented Jul 7, 2022

Check that the latest rasterio release does not solve the memory issue?
https://rasterio.groups.io/g/main/topic/92199259, mostly #2141.

@adehecq
Copy link
Member

adehecq commented Aug 24, 2022

There's one test that I need help with, probably from @adehecq or @atedstone.
tests/test_georaster.py::TestRaster::test_reproj fails, and I don't understand why...
I temporarily disabled it to make sure that everything else works fine, but we obviously have to find out why it's not working before merging!
Help!!

I figured out why it failed. You updated the self.data property. Before, it used to just return self._data, but now you've added a condition to check that data was loaded:

        if not self.is_loaded and self._data is None:
            raise ValueError("Data are not loaded")

However, in several instances, we are running the test if self.data is not None, which raises an error in case the data is not loaded with your update.
I will try updating those cases.

@adehecq adehecq merged commit c30c121 into GlacioHack:main Aug 25, 2022
adehecq added a commit that referenced this pull request Sep 1, 2022
* Increment to 0.0.7

* Fix issue with crop and re-enable associated tests

* Fix error with Raster.crop for mode='match_extent'

* Considerably improve tests for Raster.crop

* Add overload to crop to avoid mypy complain

* Fix comment

* Fix issue with window indexes outside of self's shape

* Improve test for crop with Vector input

* Make mypy happy (but I'm quite unhappy)

* Move tests for Vector.crop2raster to test_geovector.py and improve tests

* Raises errors or warning of nodata are not set prior to reproject. Add associated tests.

* Add test associated with issue #285

* Fix issue with mask not taken into account in reproject

* Fix issue that arose in spatial_tools since crop update

* Linting

* Fix issue with test_docs.py

* Fix type error in tests

* Update CI to exclude 3.7 and test 3.10

* Add TODO statement

* Force fill_value to be equal to nodata when setting self.data

* Force python >= 3.8 in all requirements

* Fix issue with bad default ndv conflict and update tests

* Update to use subsample_raster instead of complex code

* Add comment and replace np.sum with np.count_nonzero
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment