
Update compress_pulse to Parallelize Convolution #1208

Merged
leewujung merged 9 commits into OSOceanAcoustics:dev on Feb 25, 2024

Conversation

anantmittal
Contributor

Explores #1164

@anantmittal anantmittal self-assigned this Nov 1, 2023
@anantmittal anantmittal changed the title Update compress_pulse to parallelize convolution Update compress_pulse to Parallelize Convolution Nov 1, 2023
@codecov-commenter

codecov-commenter commented Nov 1, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison: base (63547fc) 83.54% vs. head (ec44c38) 88.91%.

Additional details and impacted files
@@            Coverage Diff             @@
##              dev    #1208      +/-   ##
==========================================
+ Coverage   83.54%   88.91%   +5.36%     
==========================================
  Files          64       11      -53     
  Lines        5675      974    -4701     
==========================================
- Hits         4741      866    -3875     
+ Misses        934      108     -826     
| Flag | Coverage Δ |
| --- | --- |
| unittests | 88.91% <100.00%> (+5.36%) ⬆️ |


@lsetiawan lsetiawan linked an issue Nov 3, 2023 that may be closed by this pull request
@lsetiawan
Member

This PR seems to make a difference in terms of parallelization. @leewujung, would you be able to go through the changes and see if this is what you're looking for in this case? Thank you!

@leewujung leewujung added this to the 0.8.3 milestone Nov 20, 2023
@leewujung
Member

@anantmittal: Thanks for the PR! The change here does make it possible to perform convolution on a dask array, which was previously not allowed, so that's great.

I tested this using a not very large dataset (test_data/ek80/D20170912-T234910.raw, ~95 MB), and found that by forcing the backscatter_r/i variables to be dask arrays, the parallelization didn't seem to consistently improve the speed compared to straight in-memory computation. But this is probably not a fair comparison since the data is small. Were you able to try this on much larger mock data (e.g., a randomly generated backscatter_chan) to see what the difference may be?
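(For illustration, a minimal sketch of how such larger mock data could be generated lazily with dask so it never has to fit in memory; the variable and dimension names below are assumptions for the example, not code from this PR.)

```python
import dask.array as da
import xarray as xr

# Hypothetical mock Beam data: random backscatter_r/i generated lazily with
# dask and chunked along ping_time, so downstream code receives dask arrays.
n_chan, n_ping, n_range = 5, 50_000, 2_000
shape = (n_chan, n_ping, n_range)
chunks = (n_chan, 1_000, n_range)  # chunk only along ping_time

mock = xr.Dataset(
    {
        "backscatter_r": (
            ("channel", "ping_time", "range_sample"),
            da.random.random(shape, chunks=chunks),
        ),
        "backscatter_i": (
            ("channel", "ping_time", "range_sample"),
            da.random.random(shape, chunks=chunks),
        ),
    }
)
```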

@lsetiawan
Member

Benchmarking test

I did a simple benchmarking test with the test file ek80/D20170912-T234910.raw. The test does the following:

  1. I read in this file and then duplicate the data across additional ping times, which expands the data from 271.2 MB to 5.3 GB.
  2. From here, I chunk the data across ping time and save the result to a file for consistency.
  3. I read the data back in and run this PR's modified compress_pulse function; when run in parallel with dask, we do see some speedup compared to running without it (see the sketch below).
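Roughly, the steps above correspond to a workflow like the sketch below. This is only a schematic reconstruction with placeholder paths, copy counts, and chunk sizes; the actual run is in the gist linked at the end of this comment.

```python
import numpy as np
import xarray as xr

# 1. Open the converted Beam data and inflate it by tiling along ping_time
#    (placeholder path and copy count).
beam = xr.open_dataset("beam_small.nc")
expanded = xr.concat([beam] * 20, dim="ping_time")
expanded["ping_time"] = np.arange(expanded.sizes["ping_time"])  # keep ping_time unique

# 2. Chunk along ping_time and save, so every timed run reads the same
#    chunked data from disk.
expanded.chunk({"ping_time": 1_000}).to_zarr("beam_big.zarr", mode="w")

# 3. Read the data back and time the modified compress_pulse with the
#    arrays left as dask arrays vs. loaded into memory first.
big = xr.open_zarr("beam_big.zarr")
```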

Results

- Without dask, the run took 58.1 s
- With dask, the run took 24.3 s

Looking at the dask dashboard, the computation happens in parallel

[Screenshot (2023-12-15): dask dashboard showing the computation running in parallel]

For reproducibility, you can see my run in the following gist: https://gist.github.com/lsetiawan/20c09c3eaa76508a741e31ff28a83402

@lsetiawan
Member

> I tested this using a not very large dataset (test_data/ek80/D20170912-T234910.raw, ~95 MB), and found that by forcing the backscatter_r/i variables to be dask arrays, the parallelization didn't seem to consistently improve the speed compared to straight in-memory computation.

For a small dataset, due to the dask overhead of creating the task graph, it's probably better to load things into memory first and then do the computation. When I tried this with the original data, sure enough the dask array was slower than the plain in-memory computation.
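(A minimal illustration of that point, with a placeholder path:)

```python
import xarray as xr

# For small data, skip the dask task-graph overhead: load everything into
# memory as numpy arrays first, then run the computation on the loaded data.
ds = xr.open_zarr("beam_small.zarr").load()
```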

The approach suggested in this PR works for both in-memory and dask arrays, which is really powerful: when the data is big, we can parallelize the convolution! I think this satisfies the issue in #1164. Further optimization can probably happen, but it's a good first step 😄
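(For context, here is a hedged sketch of the general pattern that lets the same code handle both numpy- and dask-backed arrays, e.g. applying scipy.signal.convolve block-wise through xr.apply_ufunc with dask="parallelized". It illustrates the technique only and is not the implementation in this PR; the function and dimension names are assumptions.)

```python
import numpy as np
import xarray as xr
from scipy import signal


def pulse_compress(backscatter: xr.DataArray, chirp: np.ndarray) -> xr.DataArray:
    """Convolve each ping with the conjugated, time-reversed transmit replica.

    Sketch only -- not the code in this PR. The same call works whether
    ``backscatter`` is numpy- or dask-backed; with dask, each chunk is
    convolved in parallel.
    """
    replica = np.flipud(np.conj(chirp))

    def _conv(block: np.ndarray) -> np.ndarray:
        # Broadcast the 1-D replica so the convolution acts only along the
        # last axis (range_sample) of each block.
        kernel = replica.reshape((1,) * (block.ndim - 1) + replica.shape)
        return signal.convolve(block, kernel, mode="same")

    return xr.apply_ufunc(
        _conv,
        backscatter,
        input_core_dims=[["range_sample"]],
        output_core_dims=[["range_sample"]],
        dask="parallelized",
        output_dtypes=[np.result_type(backscatter.dtype, replica.dtype)],
    )
```

With an in-memory DataArray this runs eagerly; after chunking (e.g. backscatter.chunk({"ping_time": 1000})), the same call builds a lazy graph and dask executes the per-chunk convolutions in parallel.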

@lsetiawan
Member

@leewujung is this good to go?

@leewujung
Member

Yep, I think this is good now. Thanks for doing the benchmarking tests!

@leewujung leewujung merged commit 97c66a8 into OSOceanAcoustics:dev Feb 25, 2024
5 checks passed
leewujung added a commit that referenced this pull request Apr 8, 2024
* Bump actions/cache from 3.0.11 to 3.2.0 (#18)

Bumps [actions/cache](https://github.com/actions/cache) from 3.0.11 to 3.2.0.
- [Release notes](https://github.com/actions/cache/releases)
- [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md)
- [Commits](actions/cache@v3.0.11...v3.2.0)

---
updated-dependencies:
- dependency-name: actions/cache
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update compress_pulse to Parallelize Convolution (#1208)

Update compress_pulse to Parallelize Convolution. 
This addresses #1164.

* Optimize `harmonize_env_param_time` (#1235)

* Remove extra elses

* Ensure harmonize_env_param_time works with dask arrays

* Remove unused import

* Optimize Frequency Differencing with Dask (#1198)

* Update frequency_differencing method: add dask optimizations if Sv data is dask array

* Update test_frequency_differencing function and incorporate test cases for dask

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix ci error

* Update echopype/mask/api.py

Co-authored-by: Don Setiawan <landungs@uw.edu>

* Update echopype/mask/api.py: assign attrs out of if-else

* Update echopype/mask/api.py: remove unused code

* refactor: Update dask chunks iter to use xr.map_blocks

* refactor: Cleanup code and make more explicit

* Add better comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix pre-commit ci errors

* refactor: Update algorithm to use map_blocks

Update the looping algorithm to use dask array
map_block instead. This would remove direct slicing
ambiguity and prepare for future xarray
map_block usage.

* refactor: Modify to have no hardcoded channel slicing

* refactor: Assign Sv data array

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Don Setiawan <landungs@uw.edu>

* Optimize get vend cal params power (#1285)

* Optimize get vend cal params power (#2)

* Refactor function to avoid expand_dims and sortby if required

* test: increase co-ordinates to 1000 for ping_time dimension

* style: upper-casing variables

* style: upper-casing variables

* style: revert

---------

Co-authored-by: Anant Mittal <anmittal@cs.washington.edu>

* test: update test_get_interp_da() to test on 200 time_coordinates (#3)

* test: refactor test data (#4)

* test: refactor test data (#5)

* test: refactor test data

* test: refactor test data

---------

Co-authored-by: Anant Mittal <anmittal@cs.washington.edu>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Anant Mittal <anmittal@cs.washington.edu>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Don Setiawan <landungs@uw.edu>
Co-authored-by: anujsinha3 <jazzy.a30@gmail.com>
Successfully merging this pull request may close these issues.

Does signal.convolve use dask under the hood?
4 participants