Cache flag not working? #477

Closed
cordmaur opened this issue Feb 21, 2022 · 5 comments
Labels
bug Something isn't working

Comments

@cordmaur
Contributor

I wrote a simple routine to change the nodata value for a series of rasters using rioxarray and noticed strange behavior, probably related to the way rioxarray caches data.

When I reopen the raster with rioxarray, the nodata value is unchanged, but when I open it with rasterio (or after restarting the kernel) I can verify that the value has been updated correctly in the file.

Here is a simple test:
[image: screenshot of the test]

cordmaur added the bug label Feb 21, 2022

@snowman2
Member

snowman2 commented Feb 21, 2022

Thanks for this. Would you mind:

  • Pasting a code snippet as text instead of (or in addition to) a screenshot?
  • Providing the output from: python -c "import rioxarray; rioxarray.show_versions()"
  • Providing the example file as it was before you made changes? You should be able to zip the file and upload it here if it is small enough.

@cordmaur
Contributor Author

Hi @snowman2,
Here is the code and the sample file is attached.

import rasterio as rio
import rioxarray as xrio

src = './test.tif'
ds = xrio.open_rasterio(src, cache=False)
print(f'Nodata value: {ds.rio.nodata}')

# Changing nodata to 0
ds.rio.set_nodata(0)
print(f'Assigned nodata value: {ds.rio.nodata}')

# Overwriting file on disk
ds.rio.to_raster(src)

# Opening file in a new variable
ds2 = xrio.open_rasterio(src, cache=False)
print(f'Nodata value: {ds2.rio.nodata}')

# testing with rasterio
ds3 = rio.open(src)
print(f'Rio nodata value: {ds3.nodata}')

Result:

Nodata value: 255
Assigned nodata value: 0
Nodata value: 255
Rio nodata value: 0.0

ENV:

[rebellm@node527 nbs]$ python -c "import rioxarray; rioxarray.show_versions()"
rioxarray (0.7.1) deps:
  rasterio: 1.2.6
    xarray: 0.19.0
      GDAL: 3.2.1

Other python deps:
     scipy: 1.7.1
    pyproj: 3.1.0

System:
    python: 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:20:46)  [GCC 9.4.0]
executable: /softs/rh7/conda-envs/pangeo_stable/bin/python
   machine: Linux-3.10.0-1160.el7.x86_64-x86_64-with-glibc2.17

test.zip

@snowman2
Member

I think I have figured out what is going on here:

The cache option in open_rasterio applies only to the data loaded from disk, not to the raster attributes. In this code here, disabling the cache just forces the data to be read from disk on each access instead of being kept in memory.

By default, files are opened through the xarray CachingFileManager, although you can disable that with lock=False. See more here. The manager checks, based on the file path, whether it has already opened the file; if it has, it reuses the handle it opened previously. That is consistent with the behavior you are seeing: reopening the same path returns the attributes of the handle that was already open.
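
To illustrate the path-keyed reuse described above, here is a toy model using only the standard library. It is a sketch, not xarray's actual CachingFileManager: the names `cached_open`, `uncached_open`, and `_open_handles` are hypothetical, and a plain text file holding one number stands in for the raster's nodata attribute.

```python
# Toy model only: this is NOT xarray's real CachingFileManager.
# It shows how a cache keyed by file path can return stale attributes.
import os
import tempfile

_open_handles = {}  # hypothetical cache: path -> attributes read at first open


def read_nodata(path):
    """Read the stand-in 'nodata' value straight from disk."""
    with open(path) as f:
        return f.read().strip()


def cached_open(path):
    """Reuse the entry for this path if one exists; read disk only once."""
    if path not in _open_handles:
        _open_handles[path] = {"nodata": read_nodata(path)}
    return _open_handles[path]


def uncached_open(path):
    """Always read from disk, like a fresh process or a different library."""
    return {"nodata": read_nodata(path)}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.txt")
    with open(path, "w") as f:
        f.write("255")
    first = cached_open(path)["nodata"]    # read from disk

    with open(path, "w") as f:             # overwrite the same path
        f.write("0")
    stale = cached_open(path)["nodata"]    # served from the cache
    fresh = uncached_open(path)["nodata"]  # read from disk again

print(first, stale, fresh)  # prints: 255 255 0
```

The printed values mirror the report: the cached lookup still shows 255 after the file was overwritten, while an uncached read shows 0.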

@snowman2
Member

As a workaround, I recommend writing to a different filename than the one you opened the file with.
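
A minimal sketch of that workaround, reusing the calls from the snippet posted earlier in the thread (open_rasterio, rio.set_nodata, rio.to_raster). The helper names `output_path_for` and `rewrite_nodata` and the `_nodata0` suffix are hypothetical, chosen only for illustration, and the rioxarray part has not been run against the attached file.

```python
from pathlib import Path


def output_path_for(src, suffix="_nodata0"):
    """Return a sibling path, e.g. test_nodata0.tif for test.tif (hypothetical naming)."""
    src = Path(src)
    return src.with_name(f"{src.stem}{suffix}{src.suffix}")


def rewrite_nodata(src, nodata=0):
    """Write a copy of src with a new nodata value under a different filename."""
    import rioxarray  # imported here so the path helper works without rioxarray

    dst = output_path_for(src)
    ds = rioxarray.open_rasterio(src, cache=False)
    try:
        ds.rio.set_nodata(nodata)    # same call as in the original report
        ds.rio.to_raster(str(dst))   # write to a NEW path, not back to src
    finally:
        ds.close()
    return dst


# Example call (requires rioxarray and an existing ./test.tif):
# new_path = rewrite_nodata("./test.tif", nodata=0)
```

Because the output path differs from the one already opened, a later open_rasterio call on the new file should not be matched to the earlier handle.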

@cordmaur
Contributor Author

Thanks for the clarification. Will close the issue.
