Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check to see if this masking xr.where parallelize #1165

Closed
lsetiawan opened this issue Sep 13, 2023 · 4 comments · Fixed by #1198
Closed

Check to see if this masking xr.where parallelize #1165

lsetiawan opened this issue Sep 13, 2023 · 4 comments · Fixed by #1198
Assignees
Milestone

Comments

@lsetiawan
Copy link
Member

lsetiawan commented Sep 13, 2023

This xr.where needs to be investigated to see if it can parallelize with the use of dask.

da = xr.where(str2ops[operator](lhs, diff), True, False)

@anantmittal
Copy link
Contributor

anantmittal commented Oct 3, 2023

Command to run the tests and capture the stdout:
python .ci_helpers/run-test.py --local --pytest-args="-vvrP" echopype/mask/api.py

pytest -vvrP echopype/tests/mask/test_mask.py::test_frequency_differencing

@leewujung
Copy link
Member

@lsetiawan : the code you pointed to above was actually not the one we were discussing 😬

Below are the 2 potential places where the calibration code (compute_Sv) code can have bottleneck:

  • in get_vend_cal_params_power where a broadcasting is probably used to get a data variable into the "right" shape based on indexing:

    # Get param dataarray into correct shape
    da_param = (
    vend[param]
    .expand_dims(dim={"ping_time": idxmin["ping_time"]}) # expand dims for direct indexing
    .sortby(idxmin.channel) # sortby in case channel sequence differs in vend and beam
    )

  • in harmonize_env_param_time where there is either a check for all the timestamps or an interpolation:

    # If there's only 1 time1 value,
    # or if after dropping NaN there's only 1 time1 value
    if p["time1"].size == 1 or p.dropna(dim="time1").size == 1:
    return p.dropna(dim="time1").squeeze(dim="time1").drop("time1")
    # Direct assignment if all timestamps are identical (EK60 data)
    elif np.all(p["time1"].values == ping_time.values):
    return p.rename({"time1": "ping_time"})
    elif ping_time is None:
    raise ValueError(f"ping_time needs to be provided for interpolating {p.name}")
    else:
    return p.dropna(dim="time1").interp(time1=ping_time)

@leewujung
Copy link
Member

@anantmittal : Thanks for investigating the xr.where in the masking code though -- since the change is to use dask.where, the solution would probably be similarly useful for the get_vend_cal_params_power case above.

@lsetiawan lsetiawan linked a pull request Oct 24, 2023 that will close this issue
@lsetiawan lsetiawan removed their assignment Oct 26, 2023
@leewujung leewujung added this to the 0.8.2 milestone Oct 29, 2023
@leewujung
Copy link
Member

I'll close this now, since my comment above is replicated in #1200.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

3 participants