Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow apply_mask to handle multi-channel Sv dataset #1010

Merged
merged 17 commits into from
Mar 25, 2023

Conversation

leewujung
Copy link
Member

@leewujung leewujung commented Mar 25, 2023

This PR generalizes the apply_mask implementation so that it can accept multi-channel Sv datasets.

Main changes include:

  • restrict source_ds[var_name] to must have dimensions ('ping_time', 'range_sample')
  • restrict each entry in mask to must have dimensions ('ping_time', 'range_sample')
  • move checks of source_ds to a new private function _validate_source_ds
    _check_var_name_fill_value to return fill_value with sanitized dimensions
  • fix tests in utils/test_processinglevels_integration.py::test_raw_to_mvbs

@leewujung leewujung added this to the 0.7.0 milestone Mar 25, 2023
@leewujung leewujung requested a review from emiliom March 25, 2023 01:19
@codecov-commenter
Copy link

codecov-commenter commented Mar 25, 2023

Codecov Report

Merging #1010 (a0b62c2) into dev (fe29fdd) will decrease coverage by 13.35%.
The diff coverage is 88.88%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@             Coverage Diff             @@
##              dev    #1010       +/-   ##
===========================================
- Coverage   80.40%   67.06%   -13.35%     
===========================================
  Files          67        7       -60     
  Lines        6013      589     -5424     
===========================================
- Hits         4835      395     -4440     
+ Misses       1178      194      -984     
Flag Coverage Δ
unittests 67.06% <88.88%> (-13.35%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
echopype/mask/api.py 86.18% <88.88%> (-0.43%) ⬇️

... and 65 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

echopype/mask/api.py Outdated Show resolved Hide resolved
echopype/mask/api.py Outdated Show resolved Hide resolved
@emiliom
Copy link
Collaborator

emiliom commented Mar 25, 2023

I haven't checked the changes thoroughly, though I did look at most of it. Looks good.

I also ran a local, manual test passing the Sv variable (with channel dimension) to apply_mask, and it worked; but that's no surprise, since you were already testing for that in the change you made to test_processinglevels_integration.py.

I tested the previous behavior, passing a single-channel Sv variable with channel squeezed out. That led to an error, but the apply_mask docstring says it should work. I've pasted most of the stack at the end. I didn't have a chance to figure out why it's failing. I can try again late tonight.

Thanks!

File /usr/mayorgadat/workmain/acoustics/gh/OSOceanAcoustics/echopype/echopype/mask/api.py:336, in apply_mask(source_ds, mask, var_name, fill_value, storage_options_ds, storage_options_mask)
    332     final_mask = np.array([final_mask.data] * source_ds["channel"].size)
    334 # Apply the mask to var_name
    335 # Somehow keep_attrs=True errors out here, so will attach later
--> 336 var_name_masked = xr.where(final_mask, x=source_ds[var_name], y=fill_value)
    338 # Obtain a shallow copy of source_ds
    339 output_ds = source_ds.copy(deep=False)

File ~/miniconda3/envs/echopype_dev/lib/python3.9/site-packages/xarray/core/computation.py:1815, in where(cond, x, y, keep_attrs)
   1812     keep_attrs = lambda attrs, context: attrs[1]
   1814 # alignment for three arguments is complicated, so don't support it yet
-> 1815 return apply_ufunc(
   1816     duck_array_ops.where,
   1817     cond,
   1818     x,
   1819     y,
   1820     join="exact",
   1821     dataset_join="exact",
   1822     dask="allowed",
   1823     keep_attrs=keep_attrs,
   1824 )

File ~/miniconda3/envs/echopype_dev/lib/python3.9/site-packages/xarray/core/computation.py:1159, in apply_ufunc(func, input_core_dims, output_core_dims, exclude_dims, vectorize, join, dataset_join, dataset_fill_value, keep_attrs, kwargs, dask, output_dtypes, output_sizes, meta, dask_gufunc_kwargs, *args)
   1157 # feed DataArray apply_variable_ufunc through apply_dataarray_vfunc
   1158 elif any(isinstance(a, DataArray) for a in args):
-> 1159     return apply_dataarray_vfunc(
   1160         variables_vfunc,
   1161         *args,
   1162         signature=signature,
   1163         join=join,
   1164         exclude_dims=exclude_dims,
   1165         keep_attrs=keep_attrs,
   1166     )
   1167 # feed Variables directly through apply_variable_ufunc
   1168 elif any(isinstance(a, Variable) for a in args):

File ~/miniconda3/envs/echopype_dev/lib/python3.9/site-packages/xarray/core/computation.py:286, in apply_dataarray_vfunc(func, signature, join, exclude_dims, keep_attrs, *args)
    281 result_coords = build_output_coords(
    282     args, signature, exclude_dims, combine_attrs=keep_attrs
    283 )
    285 data_vars = [getattr(a, "variable", a) for a in args]
--> 286 result_var = func(*data_vars)
    288 if signature.num_outputs > 1:
    289     out = tuple(
    290         DataArray(variable, coords, name=name, fastpath=True)
    291         for variable, coords in zip(result_var, result_coords)
    292     )

File ~/miniconda3/envs/echopype_dev/lib/python3.9/site-packages/xarray/core/computation.py:752, in apply_variable_ufunc(func, signature, exclude_dims, dask, output_dtypes, vectorize, keep_attrs, dask_gufunc_kwargs, *args)
    750 data = as_compatible_data(data)
    751 if data.ndim != len(dims):
--> 752     raise ValueError(
    753         "applied function returned data with unexpected "
    754         f"number of dimensions. Received {data.ndim} dimension(s) but "
    755         f"expected {len(dims)} dimensions with names: {dims!r}"
    756     )
    758 var = Variable(dims, data, fastpath=True)
    759 for dim, new_size in var.sizes.items():

ValueError: applied function returned data with unexpected number of dimensions. Received 3 dimension(s) but expected 2 dimensions with names: ('ping_time', 'range_sample')

@leewujung
Copy link
Member Author

I fixed something but the error message when I tried was different from what you pasted. Maybe you could try again and investigate if it still fails?

@leewujung
Copy link
Member Author

I am not sure why the tests fail. Those tests passed locally on my machine...

@emiliom
Copy link
Collaborator

emiliom commented Mar 25, 2023

I fixed something but the error message when I tried was different from what you pasted. Maybe you could try again and investigate if it still fails?

I reran my manual test (a notebook) with the latest commits in your PR. The error on appy_mask remained the same.

I am not sure why the tests fail. Those tests passed locally on my machine...

I also ran test_mask.py. I get the same 4 errors with test_validate_and_collect_mask_input in the CI:

FAILED echopype/tests/mask/test_mask.py::test_validate_and_collect_mask_input[mask_list_da_single_storage] - ValueError: All masks must have dimensions ('ping_time', 'range_sample')!
FAILED echopype/tests/mask/test_mask.py::test_validate_and_collect_mask_input[mask_list_da_list_storage] - ValueError: All masks must have dimensions ('ping_time', 'range_sample')!
FAILED echopype/tests/mask/test_mask.py::test_validate_and_collect_mask_input[mask_list_str_path] - ValueError: All masks must have dimensions ('ping_time', 'range_sample')!
FAILED echopype/tests/mask/test_mask.py::test_validate_and_collect_mask_input[mask_mixed_da_str_pathlib] - ValueError: All masks must have dimensions ('ping_time', 'range_sample')!

Just as in the CI, the other test_validate_and_collect_mask_input tests pass locally.

For what it's worth, the tests that fail define the mask_np parameter as a list of np arrays while the ones that pass define it as a single np array. The error happens in line 428:

mask_out = _validate_and_collect_mask_input(mask=mask, storage_options_mask=storage_options_mask)

Ok, I think I found the problem with the test failures (but not with my manual-test failure)! In test_mask.py, lines 422-423:

# create coordinates that will be used by all DataArrays created
    coords = {"channel": ("channel", chan_vals, {"long_name": "channel name"}),
"x": np.arange(n), "y": np.arange(n)}

Change "x" to "ping_time" and "y" to "range_sample". That fixes it. I'll push a commit to make the change.

@emiliom
Copy link
Collaborator

emiliom commented Mar 25, 2023

Woo-hoo, all tests passed! Including the windows-utils tests 😅 .

I don't know where that leaves my "manual" failure when passing a single-channel Sv (after creating the dataarray like this Sv_ds["Sv_ch0"] = Sv_ds["Sv"].isel(channel=0), then passing var_name="Sv_ch0"). I've run out of steam to try track down why that's failing.

@leewujung
Copy link
Member Author

Great, thanks! The x, y -> ping_time, range_sample was something I had to do in get_mock_source_ds_apply_mask to make test_mask work. Not sure why the tests all passed at that time...

I'll look into the remaining bug. Probably require some clean up instead of the patching mode I was in yesterday.

@leewujung leewujung merged commit 5ea1f8b into OSOceanAcoustics:dev Mar 25, 2023
@leewujung leewujung deleted the mask-fix branch July 21, 2024 00:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

None yet

3 participants