Adapt reader mviri_l1b_fiduceo_nc #2802
base: main
Conversation
Thanks for adding this feature! I'm no expert here, so probably @sfinkens needs to have a look, but I wrote some comments already :)
# chunks={"x": CHUNK_SIZE,
#         "y": CHUNK_SIZE,
#         "x_ir_wv": CHUNK_SIZE,
#         "y_ir_wv": CHUNK_SIZE},
disabling chunking is risky... why is this necessary?
Thanks for the first comments!
Concerning the chunks, it throws a ValueError at the moment: "ValueError: This function cannot handle duplicate dimensions, but dimensions {'srf_size'} appear more than once on this object's dims: ('srf_size', 'srf_size')"
It could be related to this post: pydata/xarray#8579
Any other suggestions than disabling it are very welcome :)
In that discussion they propose a workaround:
ds.variables["covariance..."].dims = ("srf_size_1", "srf_size_2")
ds.chunk(mychunks)
I tried that here: sfinkens@82eb8cd Let me know what you think!
I also updated the tests to trigger that situation:
- Add a new variable with identical dimension names
- Write a fake file to disk
- Read it back in
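For readers following along, the renaming idea can be sketched in plain Python. The function name and the trailing-index scheme here are just an illustration; the linked commit does the equivalent directly on `ds.variables`:

```python
from collections import Counter

def uniquify_dims(dims):
    """Make repeated dimension names unique by appending an index,
    e.g. ("srf_size", "srf_size") -> ("srf_size_1", "srf_size_2").
    Dimensions that appear only once are left untouched."""
    counts = Counter(dims)
    seen = Counter()
    new_dims = []
    for dim in dims:
        if counts[dim] > 1:
            seen[dim] += 1
            new_dims.append(f"{dim}_{seen[dim]}")
        else:
            new_dims.append(dim)
    return tuple(new_dims)

# Applied per variable before chunking, along the lines of the workaround:
#     ds.variables[name].dims = uniquify_dims(ds.variables[name].dims)
```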
There's still a bit of work to be done, because the tests now fail due to some extra attributes and coordinates.
Thank you for your feedback! As already discussed, I will include your proposed changes. Concerning the tests, I am currently working on it.
@sfinkens I submitted some changes and the tests are mostly ok now. The test test_reassign_coords(self) still fails; as I am not familiar with the Mock object library, it would be great if you could have a look at it - thank you!
Ok, I'll check after lunch!
sfinkens and I did some pair programming: Calling time.astype("datetime64[s]").astype("datetime64[ns]") is not working properly because a float input is needed. We propose to open the dataset with decode_cf=False and afterwards decode time and the other variables separately, to properly take care of the time fill values and offsets.
sfinkens added a test for the Interpolator.
The test test_reassign_coords() is still failing because the new functionality in DatasetWrapper() is not yet considered (mocking should not be of importance here). Maybe we should call the assign_coords() method directly instead of DatasetWrapper().
Also, a test for DatasetWrapper should be included to test the separate encoding.
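The proposed time decoding can be sketched with plain NumPy (the names here are hypothetical; the reader itself works on xarray objects, but the arithmetic is the same: apply the offset, convert seconds to datetime64[ns], and map fill values to NaT):

```python
import numpy as np

def decode_time(raw, add_offset, fill_value):
    """Decode raw time values (float seconds) into datetime64[ns],
    replacing fill values with NaT."""
    decoded = (raw + add_offset).astype("datetime64[s]").astype("datetime64[ns]")
    return np.where(raw == fill_value, np.datetime64("NaT", "ns"), decoded)

raw = np.array([0.0, 100.0, -32768.0])
times = decode_time(raw, add_offset=0.0, fill_value=-32768.0)
```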
The test is working again. DatasetWrapper is implicitly tested within the other tests, so no separate test is needed.
Co-authored-by: Martin Raspaud <martin.raspaud@smhi.se>
Nice work @bkremmli! Thanks for updating the tests, so that the problem is triggered.
There's still a bit of work to be done, because the tests now fail due to some extra attributes and coordinates.
Also I noticed that space pixels now have some finite values (instead of NaN), because
Codecov Report
Attention: Patch coverage is …

Additional details and impacted files:

@@            Coverage Diff             @@
##             main    #2802      +/-   ##
==========================================
- Coverage   95.95%   95.90%    -0.06%
==========================================
  Files         379      366       -13
  Lines       53888    53524      -364
==========================================
- Hits        51708    51330      -378
- Misses       2180     2194       +14
…ead_error Conflicts: satpy/readers/mviri_l1b_fiduceo_nc.py
Nice work, almost there!
I think in the tests you need to write the dataset to disk and read it back in. Otherwise the dask error with duplicate dimensions is not triggered.
Ok, I'll check after lunch!
The test for the duplicate dimensions still needs to be added.
small fix to DataWrapper._decode_cf()
adds support for filenames of MVIRI FCDR L1.5 release 2
class TestDatasetWrapper:
    """Unit tests for DatasetWrapper class."""

    def test_fix_duplicate_dimensions(self):
I have two concerns regarding this test:
- It requires DatasetWrapper to provide a decode_nc option, which is only used for testing (as far as I can see).
- It is testing private methods, which should be an implementation detail. This has the disadvantage that refactoring often breaks the tests. I would recommend testing only public methods.

I would suggest writing the fake datasets to disk and reading them back in. Here's an example: sfinkens@82eb8cd#diff-4f79d977c353c958043ee3ccc23eeba478d28f73886784cefa8eb556a3782595
@@ -455,10 +452,40 @@ def is_high_resol(resolution):
 class DatasetWrapper:
     """Helper class for accessing the dataset."""

-    def __init__(self, nc):
+    def __init__(self, nc, decode_nc=True):
Looks like this option is only used for testing. I think there's a better solution (see below)
time = self.get_time()
time_dims = self.nc[time.name].dims
time = xr.where(time == time.attrs["_FillValue"], np.datetime64("NaT"),
                (time + time.attrs["add_offset"]).astype("datetime64[s]").astype("datetime64[ns]"))
Can you please extract the decoding part into a separate method, e.g. _decode_time
def _chunk(self, nc):

    (chunk_size_y, chunk_size_x) = nc.variables["quality_pixel_bitmask"].encoding["chunksizes"]
Can you please add a comment here, that accessing the encoding doesn't load the data into memory? (Martin just raised the concern that this loads the bitmask array into memory. We tested and it doesn't)
This PR fixes the mviri_l1b_fiduceo_nc reader when used with a new xarray version (2024.3.0). With the original reader, a ValueError about not being able to decode the times is thrown. The file is now opened without decoding; the decoding is instead done in DatasetWrapper()._decode_cf(). The time is decoded separately from the other data values: fill values in the time variable are recognized and replaced with NaT, and the time is decoded using the offset values included in the attributes.
Also, opening the dataset using chunks is deactivated because the input files contain dimensions of the same name, which xarray cannot process at the moment. The chunking, as well as the renaming of the duplicate dimensions, is also performed in DatasetWrapper().
Add your name to AUTHORS.md if not there already