Adapt reader mviri_l1b_fiduceo_nc #2802
base: main
Conversation
Thanks for adding this feature! I'm no expert here, so probably @sfinkens needs to have a look, but I wrote some comments already :)
# chunks={"x": CHUNK_SIZE,
#         "y": CHUNK_SIZE,
#         "x_ir_wv": CHUNK_SIZE,
#         "y_ir_wv": CHUNK_SIZE},
disabling chunking is risky... why is this necessary?
Thanks for the first comments!
Concerning the chunks, it throws a ValueError at the moment: "ValueError: This function cannot handle duplicate dimensions, but dimensions {'srf_size'} appear more than once on this object's dims: ('srf_size', 'srf_size')"
It could be related to this post: pydata/xarray#8579
Any other suggestions than disabling it are very welcome :)
In that discussion they propose a workaround:
ds.variables["covariance..."].dims = ("srf_size_1", "srf_size_2")
ds.chunk(mychunks)
I tried that here: sfinkens@82eb8cd Let me know what you think!
I also updated the tests to trigger that situation:
- Add a new variable with identical dimension names
- Write a fake file to disk
- Read it back in
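For readers following along, the renaming idea can be sketched in plain Python. The function name and the trailing-index scheme here are just an illustration; the linked commit does the equivalent directly on `ds.variables`:

```python
from collections import Counter

def uniquify_dims(dims):
    """Make repeated dimension names unique by appending an index,
    e.g. ("srf_size", "srf_size") -> ("srf_size_1", "srf_size_2").
    Dimensions that appear only once are left untouched."""
    counts = Counter(dims)
    seen = Counter()
    new_dims = []
    for dim in dims:
        if counts[dim] > 1:
            seen[dim] += 1
            new_dims.append(f"{dim}_{seen[dim]}")
        else:
            new_dims.append(dim)
    return tuple(new_dims)

# Applied per variable before chunking, along the lines of the workaround:
#     ds.variables[name].dims = uniquify_dims(ds.variables[name].dims)
```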
There's still a bit of work to be done, because the tests now fail due to some extra attributes and coordinates.
Thank you for your feedback! As already discussed, I will include your proposed changes. Concerning the tests, I am currently working on it.
@sfinkens I submitted some changes and the tests are mostly ok now. The test test_reassign_coords(self) still fails; as I am not familiar with the Mock object library, it would be great if you could have a look at it - thank you!
Ok, I'll check after lunch!
sfinkens and I did some pair programming: Calling time.astype("datetime64[s]").astype("datetime64[ns]") is not working properly because a float input is needed. We propose to open the dataset with decode_cf=False and afterwards decode time and the other variables separately, to properly take care of the time fill values and offsets.
sfinkens added a test for the Interpolator.
The test test_reassign_coords() is still failing because the new functionality in DatasetWrapper() is not yet considered (mocking should not be of importance here). Maybe we should call the assign_coords() method directly instead of DatasetWrapper().
Also, a test for DatasetWrapper should be included to test the separate encoding.
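The proposed time decoding can be sketched with plain NumPy (the names here are hypothetical; the reader itself works on xarray objects, but the arithmetic is the same: apply the offset, convert seconds to datetime64[ns], and map fill values to NaT):

```python
import numpy as np

def decode_time(raw, add_offset, fill_value):
    """Decode raw time values (float seconds) into datetime64[ns],
    replacing fill values with NaT."""
    decoded = (raw + add_offset).astype("datetime64[s]").astype("datetime64[ns]")
    return np.where(raw == fill_value, np.datetime64("NaT", "ns"), decoded)

raw = np.array([0.0, 100.0, -32768.0])
times = decode_time(raw, add_offset=0.0, fill_value=-32768.0)
```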
The test is working again. DatasetWrapper is implicitly tested within the other tests, so no separate test is needed.
Co-authored-by: Martin Raspaud <martin.raspaud@smhi.se>
Nice work @bkremmli! Thanks for updating the tests, so that the problem is triggered.
There's still a bit of work to be done, because the tests now fail due to some extra attributes and coordinates.
Also I noticed that space pixels now have some finite values (instead of NaN), because
Codecov Report
Attention: Patch coverage is …

Additional details and impacted files:

@@            Coverage Diff             @@
##             main    #2802      +/-   ##
==========================================
- Coverage   95.95%   95.90%    -0.06%
==========================================
  Files         379      366       -13
  Lines       53888    53524      -364
==========================================
- Hits        51708    51330      -378
- Misses       2180     2194       +14
…ead_error Conflicts: satpy/readers/mviri_l1b_fiduceo_nc.py
Nice work, almost there!
I think in the tests you need to write the dataset to disk and read it back in. Otherwise the dask error with duplicate dimensions is not triggered.
Ok, I'll check after lunch!
The test for the duplicate dimensions still needs to be added.
small fix to DataWrapper._decode_cf()
adds support for filenames of MVIRI FCDR L1.5 release 2
class TestDatasetWrapper:
    """Unit tests for DatasetWrapper class."""

    def test_fix_duplicate_dimensions(self):
I have two concerns regarding this test:
- It requires DatasetWrapper to provide a decode_nc option, which is only used for testing (as far as I can see).
- It is testing private methods, which should be an implementation detail. This has the disadvantage that refactoring often breaks the tests. I would recommend testing only public methods.

I would suggest writing the fake datasets to disk and reading them back in. Here's an example: sfinkens@82eb8cd#diff-4f79d977c353c958043ee3ccc23eeba478d28f73886784cefa8eb556a3782595
@@ -455,10 +452,40 @@ def is_high_resol(resolution):
 class DatasetWrapper:
     """Helper class for accessing the dataset."""

-    def __init__(self, nc):
+    def __init__(self, nc, decode_nc=True):
Looks like this option is only used for testing. I think there's a better solution (see below)
time = self.get_time()
time_dims = self.nc[time.name].dims
time = xr.where(time == time.attrs["_FillValue"], np.datetime64("NaT"),
                (time + time.attrs["add_offset"]).astype("datetime64[s]").astype("datetime64[ns]"))
Can you please extract the decoding part into a separate method, e.g. _decode_time
def _chunk(self, nc):

    (chunk_size_y, chunk_size_x) = nc.variables["quality_pixel_bitmask"].encoding["chunksizes"]
Can you please add a comment here, that accessing the encoding doesn't load the data into memory? (Martin just raised the concern that this loads the bitmask array into memory. We tested and it doesn't)
This PR fixes the mviri_l1b_fiduceo_nc reader when used with a new xarray version (2024.3.0). With the original reader, a ValueError about not being able to decode the times is thrown. The file is now opened without decoding; the decoding is instead done in DatasetWrapper()._decode_cf(). The time is decoded separately from the other data values: fill values in the time variable are recognized and replaced with NaT, and the time is decoded using the offset values included in the attributes.
Also, opening the dataset using chunks is deactivated because the input files contain dimensions of the same name, which xarray cannot process at the moment. The chunking, as well as the renaming of the duplicate dimensions, is also performed in DatasetWrapper().
Add your name to AUTHORS.md if not there already