Perf segmentation filter #1067
Conversation
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1067      +/-   ##
==========================================
- Coverage   96.20%   96.14%   -0.07%
==========================================
  Files          74       74
  Lines        5849     5861      +12
  Branches     1044     1042       -2
==========================================
+ Hits         5627     5635       +8
- Misses        132      135       +3
- Partials       90       91       +1

☔ View full report in Codecov by Sentry.
Thanks for working on this PR, especially optimizing the runtimes of cleanlab.segmentation.filter.find_label_issues and cleanlab.count.num_label_issues! ⚡
I'm concerned about the increased memory usage reported in your tests. Since we only tested on a small number of images (6, if I understand your code correctly), could we check how the memory usage scales with more images and classes? We need to ensure our changes work well even for larger datasets.
If we can find a way to reduce the memory increase or understand its impact better, I'd be more inclined to merge this PR. Could you also profile the specific parts of the code affected by these changes to get a clearer picture of where the memory usage increases and if there are opportunities for optimization?
Your improvement with the get_unique_classes call is particularly valuable, and I'm keen to get that part integrated soon.
Looking forward to your thoughts and any further optimizations you might suggest.
@@ -1458,7 +1455,7 @@ def get_confident_thresholds(
     # this approach is that there will be no standard value returned for missing classes.
     labels = labels_to_array(labels)
     all_classes = range(pred_probs.shape[1])
-    unique_classes = get_unique_classes(labels)
+    unique_classes = get_unique_classes(labels, multi_label=multi_label)
This is the only code change in this file; everything else is just sorting the import statements.
I think this is a smart move 👍
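For readers following along, here is a minimal sketch of why passing multi_label explicitly helps (a hypothetical simplification, not the actual cleanlab implementation): when multi_label is not given, the function has to inspect the labels to decide whether they are multi-label, and those per-entry isinstance checks are what make the inference slow.

import numpy as np

def get_unique_classes_sketch(labels, multi_label=None):
    # If the caller does not say whether labels are multi-label, infer it by
    # checking every entry -- this is the slow isinstance-based path.
    if multi_label is None:
        multi_label = any(isinstance(label, list) for label in labels)
    if multi_label:
        # Multi-label: each entry is a list of classes.
        return set(cls for label_list in labels for cls in label_list)
    # Single-label: a plain array of class indices.
    return set(np.unique(labels))

Passing multi_label=multi_label from get_confident_thresholds skips that inference branch entirely.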
    labels.reshape((num_image, h // factor, factor, w // factor, factor)).mean((4, 2))
)
small_pred_probs = pred_probs.reshape(
    (num_image, num_classes, h // factor, factor, w // factor, factor)
).mean((5, 3))
Good catch!
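To spell out the trick being praised here: reshaping each spatial axis into (size // factor, factor) and averaging over the factor axes performs non-overlapping block-mean downsampling in a single vectorized step, with no Python loop over pixels. A small self-contained illustration (the sizes below are made up for the example):

import numpy as np

num_image, num_classes, h, w, factor = 2, 3, 8, 8, 4
labels = np.random.randint(num_classes, size=(num_image, h, w)).astype(float)
pred_probs = np.random.random((num_image, num_classes, h, w))

# Average every non-overlapping factor x factor block at once.
small_labels = labels.reshape((num_image, h // factor, factor, w // factor, factor)).mean((4, 2))
small_pred_probs = pred_probs.reshape(
    (num_image, num_classes, h // factor, factor, w // factor, factor)
).mean((5, 3))

print(small_labels.shape)      # (2, 2, 2)
print(small_pred_probs.shape)  # (2, 3, 2, 2)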
Hi, thank you for your review, you are absolutely right regarding the memory consumption. With larger datasets (more images and more classes) the memory usage was about 35% higher with this PR compared to the previous version. The issue was that we were allocating very large new arrays to perform the mask operation all at once. This gave me a great idea to lower memory further while maintaining the speed improvements. I have just pushed changes that perform the operations in batches using the batch_size parameter; memory consumption is now lower (arrays are allocated per batch instead of keeping the large arrays in memory) while the code still runs much faster (a generic sketch of this batching pattern is included after the benchmark numbers below). I also changed the benchmark setup code slightly to make it clearer and to use a larger input dataset:

import numpy as np
from cleanlab.segmentation.filter import find_label_issues
SIZE = 250
NUM_IMAGES = 1000
NUM_CLASSES = 10
np.random.seed(0)
%load_ext memory_profiler
def generate_image_dataset():
    labels = np.random.randint(NUM_CLASSES, size=(NUM_IMAGES, SIZE, SIZE), dtype=int)
    pred_probs = np.random.random((NUM_IMAGES, NUM_CLASSES, SIZE, SIZE))
    return labels, pred_probs
# Create input data
labels, pred_probs = generate_image_dataset()

Current version

%%timeit
%memit find_label_issues(labels, pred_probs, n_jobs=1, verbose=False)
# peak memory: 10161.57 MiB, increment: 4629.44 MiB
# peak memory: 10162.77 MiB, increment: 4643.84 MiB
# peak memory: 10161.71 MiB, increment: 4642.77 MiB
# peak memory: 10161.88 MiB, increment: 4642.95 MiB
# peak memory: 10162.95 MiB, increment: 4643.82 MiB
# peak memory: 10161.89 MiB, increment: 4642.76 MiB
# peak memory: 10161.89 MiB, increment: 4642.75 MiB
# peak memory: 10162.96 MiB, increment: 4643.82 MiB
# 2min 6s ± 1.06 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
%memit find_label_issues(labels, pred_probs, downsample=5, n_jobs=1, verbose=False)
# peak memory: 6694.65 MiB, increment: 1163.38 MiB
# peak memory: 6710.26 MiB, increment: 1182.33 MiB
# peak memory: 6710.26 MiB, increment: 1182.33 MiB
# peak memory: 6710.26 MiB, increment: 1182.33 MiB
# peak memory: 6710.26 MiB, increment: 1182.33 MiB
# peak memory: 6710.26 MiB, increment: 1182.33 MiB
# peak memory: 6710.26 MiB, increment: 1182.33 MiB
# peak memory: 6710.26 MiB, increment: 1182.33 MiB
# 1min 11s ± 402 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

This PR

%%timeit
%memit find_label_issues(labels, pred_probs, n_jobs=1, verbose=False)
# peak memory: 8846.55 MiB, increment: 3326.38 MiB
# peak memory: 8847.80 MiB, increment: 3347.29 MiB
# peak memory: 8848.02 MiB, increment: 3346.53 MiB
# peak memory: 8847.03 MiB, increment: 3345.55 MiB
# peak memory: 8848.19 MiB, increment: 3346.53 MiB
# peak memory: 8847.04 MiB, increment: 3345.38 MiB
# peak memory: 8848.20 MiB, increment: 3346.53 MiB
# peak memory: 8847.04 MiB, increment: 3345.37 MiB
# 32.9 s ± 355 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
%memit find_label_issues(labels, pred_probs, downsample=5, n_jobs=1, verbose=False)
# peak memory: 5899.00 MiB, increment: 397.33 MiB
# peak memory: 5898.74 MiB, increment: 364.73 MiB
# peak memory: 5908.86 MiB, increment: 412.89 MiB
# peak memory: 5898.74 MiB, increment: 402.76 MiB
# peak memory: 5898.76 MiB, increment: 402.78 MiB
# peak memory: 5898.63 MiB, increment: 402.64 MiB
# peak memory: 5898.76 MiB, increment: 402.76 MiB
# peak memory: 5898.63 MiB, increment: 402.63 MiB
# 11 s ± 185 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
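To illustrate the batching idea described above (a generic sketch, not the code actually added in this PR; the per-pixel score computed here is just an example), processing images in chunks of batch_size means the large temporaries are allocated per chunk, so peak memory scales with the batch rather than with the whole dataset:

import numpy as np

def per_pixel_self_confidence_in_batches(labels, pred_probs, batch_size=100):
    # Generic batching pattern: allocate the big intermediate arrays for
    # batch_size images at a time instead of for every image at once.
    scores = []
    for start in range(0, labels.shape[0], batch_size):
        batch_labels = labels[start:start + batch_size]      # (b, H, W)
        batch_probs = pred_probs[start:start + batch_size]   # (b, C, H, W)
        # Example computation: probability assigned to the annotated class
        # at each pixel (not the actual find_label_issues logic).
        batch_scores = np.take_along_axis(
            batch_probs, batch_labels[:, None, :, :], axis=1
        )[:, 0]
        scores.append(batch_scores)
    return np.concatenate(scores)

# Reuses the labels / pred_probs generated in the benchmark setup above.
scores = per_pixel_self_confidence_in_batches(labels, pred_probs, batch_size=100)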
This LGTM @gogetron!
Had to make sure that the outputs of find_label_issues before and after the PR are identical, which seems to be the case!
This is a great speedup and memory improvement. Nice work!
Failing CI is unrelated. Should already be addressed on master.
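For anyone who wants to reproduce that equivalence check, one simple approach is to run find_label_issues on the same inputs under both versions (for example in two environments), save the returned masks, and compare them; the file names below are just placeholders:

import numpy as np

# Hypothetical files saved with np.save from the master branch and this PR.
issues_master = np.load("issues_master.npy")
issues_pr = np.load("issues_pr.npy")
assert np.array_equal(issues_master, issues_pr)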
Summary
This PR partially addresses #862
After profiling, it seems that the loops were the slowest part. In addition, the inference in get_unique_classes is very slow when multi_label is False because of the isinstance calls. The loops were converted to numpy operations, and since at the point we call get_unique_classes we already know that multi_label is False, we can pass this parameter explicitly to avoid the inference.
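To make the loop-to-numpy conversion concrete, here is a generic example of the pattern (not the exact code in cleanlab.segmentation.filter): a per-image Python loop that flattens labels and pred_probs can be replaced by a single reshape/transpose over the whole array, producing identical results.

import numpy as np

num_image, num_classes, h, w = 4, 3, 16, 16
labels = np.random.randint(num_classes, size=(num_image, h, w))
pred_probs = np.random.random((num_image, num_classes, h, w))

# Loop version: flatten one image at a time.
flat_labels_loop, flat_probs_loop = [], []
for i in range(num_image):
    flat_labels_loop.append(labels[i].ravel())
    flat_probs_loop.append(pred_probs[i].reshape(num_classes, -1).T)
flat_labels_loop = np.concatenate(flat_labels_loop)
flat_probs_loop = np.concatenate(flat_probs_loop)

# Vectorized version: one reshape/transpose for all images.
flat_labels = labels.reshape(-1)
flat_probs = pred_probs.transpose(0, 2, 3, 1).reshape(-1, num_classes)

assert np.array_equal(flat_labels, flat_labels_loop)
assert np.allclose(flat_probs, flat_probs_loop)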
For memory profiling I used the memory-profiler library. The code I used for benchmarking is copied below. In addition, I sorted the imports in the modified files.
Code Setup
Current version
This PR
Testing
References
Reviewer Notes