change binning aggregation from "mean" to "sum" by jcharkow · Pull Request #23 · OpenMS/pyopenms_viz

jcharkow · 2024-09-08T17:10:39Z

User description

Aggregating bins by mean leads to strange trends when annotations are present: the annotated peaks end up much higher than everything else. Changing to sum gives a more realistic plot.

Binning by mean (old):

Binning by sum (new):

I also changed the peakmap, but those changes are not tested.
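As a toy illustration of the difference described above (not the PR's data or code), the sketch below bins a made-up spectrum with pandas. The column names, bin edges, and intensities are all assumptions chosen for the example. A dense cluster of peaks collapses to its average under "mean", so an isolated peak appears taller than the cluster, while "sum" preserves the total signal in each bin.

```python
import numpy as np
import pandas as pd

# Hypothetical spectrum: four peaks clustered near m/z 100 and one isolated peak.
spectrum = pd.DataFrame(
    {
        "mz": [100.1, 100.2, 100.3, 100.4, 160.0],
        "intensity": [100.0, 100.0, 100.0, 100.0, 150.0],
    }
)

# Fixed-width bins [100, 150), [150, 200), [200, 250); the width is arbitrary.
edges = np.arange(100, 300, 50)
spectrum["bin"] = pd.cut(spectrum["mz"], bins=edges, right=False)

# Aggregate intensity per bin with both methods.
binned_mean = spectrum.groupby("bin", observed=True)["intensity"].mean()
binned_sum = spectrum.groupby("bin", observed=True)["intensity"].sum()

print(binned_mean.tolist())  # the dense bin is averaged down to one peak's height
print(binned_sum.tolist())   # the dense bin keeps its total signal
```

With "mean", the isolated peak dominates the plot even though the cluster carries more total signal; with "sum", the cluster dominates.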

PR Type

enhancement

Description

Changed the binning aggregation method from "mean" to "sum" to provide more realistic plots, especially when annotation peaks are significantly higher than other data points.
Improved code readability by updating method signatures and formatting.
Adjusted the logic for binning and peak annotations to align with the new aggregation method.

Changes walkthrough 📝

Relevant files

Enhancement

_core.py: Update binning aggregation from mean to sum in plots

pyopenms_viz/_core.py (+37/-45)

Changed binning aggregation method from "mean" to "sum" for intensity calculations.
Updated method signatures and formatting for better readability.
Adjusted logic to handle binning and peak annotations.

qodo-code-review · 2024-09-08T17:11:07Z

PR Reviewer Guide 🔍

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Key issues to review

Potential Performance Impact: Changing the binning aggregation method from "mean" to "sum" may lead to unexpected results or performance issues for large datasets.

Code Consistency: The binning method change is not consistently applied across all relevant parts of the code, which may lead to inconsistent behavior.

qodo-code-review · 2024-09-08T17:11:48Z

PR Code Suggestions ✨

Category: Best practice
Suggestion: Replace the parameter name 'cls' with 'self' in the method signature

Consider using a more descriptive variable name instead of `cls` in the `plot` method signature. Since this is an instance method (not a class method), using `self` would be more appropriate and consistent with Python conventions.

pyopenms_viz/_core.py [317-319]

     def plot(
    -    cls, fig, data, x, y, by: str | None = None, plot_3d: bool = False, **kwargs
    +    self, fig, data, x, y, by: str | None = None, plot_3d: bool = False, **kwargs
     ):

Suggestion importance [1-10]: 9

Why: The suggestion correctly identifies a best practice issue by recommending the use of 'self' instead of 'cls' for an instance method, which improves code readability and adheres to Python conventions.

Category: Enhancement
Suggestion: Replace multiple if-elif statements with a dictionary-based approach for more efficient conditional assignment

Consider using a more efficient approach for conditional assignment. Instead of using multiple if-elif statements, you can use a dictionary to map the bin methods to their corresponding functions.

pyopenms_viz/_core.py [600-605]

    -if self.bin_method == "sturges":
    -    self.num_x_bins = sturges_rule(data, x)
    -elif self.bin_method == "freedman-diaconis":
    -    self.num_x_bins = freedman_diaconis_rule(data, x)
    -elif self.bin_method == "none":
    -    self.num_x_bins = num_x_bins
    +bin_methods = {
    +    "sturges": lambda: sturges_rule(data, x),
    +    "freedman-diaconis": lambda: freedman_diaconis_rule(data, x),
    +    "none": lambda: num_x_bins
    +}
    +self.num_x_bins = bin_methods.get(self.bin_method, lambda: num_x_bins)()

Suggestion importance [1-10]: 8

Why: The suggestion provides a more efficient and maintainable way to handle conditional assignments using a dictionary, which simplifies the code and reduces potential errors in future modifications.

Category: Enhancement
Suggestion: ✅ Combine separate grouping operations into a single, more flexible grouping and aggregation step

Suggestion Impact: The commit implemented a more flexible grouping and aggregation operation by using a variable to hold grouping columns and applying aggregation based on a parameterized method, aligning with the suggestion to consolidate separate operations.

code diff:

             # Group by x, y and by columns and calculate the sum intensity within each bin
             data = (
                 data.groupby([x, y, by], observed=True)
    -            .agg({z: "sum"})
    +            .agg({z: aggregation_method})
                 .reset_index()
             )
             # Add by back to kwargs
             kwargs["by"] = by
         else:
             # Group by x and y bins and calculate the sum intensity within each bin
    -        data = data.groupby([x, y], observed=True).agg({z: "sum"}).reset_index()
    +        data = data.groupby([x, y], observed=True).agg({z: aggregation_method}).reset_index()

Consider using a more concise and efficient approach for grouping and aggregating data. The current implementation uses separate code blocks for different conditions, which can be combined into a single, more flexible operation.

pyopenms_viz/_core.py [909-920]

    +group_cols = [x, y]
     if by is not None:
    -    # Group by x, y and by columns and calculate the sum intensity within each bin
    -    data = (
    -        data.groupby([x, y, by], observed=True)
    -        .agg({z: "sum"})
    -        .reset_index()
    -    )
    -    # Add by back to kwargs
    +    group_cols.append(by)
         kwargs["by"] = by
    -else:
    -    # Group by x and y bins and calculate the sum intensity within each bin
    -    data = data.groupby([x, y], observed=True).agg({z: "sum"}).reset_index()
    +data = data.groupby(group_cols, observed=True).agg({z: "sum"}).reset_index()

Suggestion importance [1-10]: 8

Why: The suggestion enhances code efficiency and readability by consolidating separate grouping operations into a single flexible step, reducing redundancy and potential errors.

Category: Maintainability
Suggestion: Simplify the condition in the if statement for better readability

Consider simplifying the condition in the if statement. The current condition `bin_peaks == True or (data.shape[0] > num_x_bins * num_y_bins and bin_peaks == "auto")` can be simplified to improve readability.

pyopenms_viz/_core.py [903-905]

    -if bin_peaks == True or (
    -    data.shape[0] > num_x_bins * num_y_bins and bin_peaks == "auto"
    -):
    +if bin_peaks is True or (bin_peaks == "auto" and data.shape[0] > num_x_bins * num_y_bins):

Suggestion importance [1-10]: 7

Why: The suggestion improves code readability by simplifying the condition in the if statement, making it clearer and easier to understand without changing the logic.
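The consolidated grouping suggested above, combined with a parameterized aggregation method like the one the commit adopted, could be sketched as a standalone function. This is a minimal illustration and not the code in `pyopenms_viz/_core.py`: the helper name `aggregate_bins`, its signature, and the toy column names are hypothetical.

```python
import pandas as pd


def aggregate_bins(data, x, y, z, by=None, aggregation_method="sum"):
    """Hypothetical helper: aggregate column z within each (x, y[, by]) group."""
    group_cols = [x, y]
    if by is not None:
        # Keep the grouping column so each group is aggregated separately.
        group_cols.append(by)
    return (
        data.groupby(group_cols, observed=True)
        .agg({z: aggregation_method})
        .reset_index()
    )


# Toy peak map whose bin indices have already been assigned (made-up values).
peaks = pd.DataFrame(
    {
        "rt_bin": [0, 0, 0, 1],
        "mz_bin": [0, 0, 1, 1],
        "intensity": [1.0, 3.0, 5.0, 7.0],
    }
)

summed = aggregate_bins(peaks, "rt_bin", "mz_bin", "intensity")
maxed = aggregate_bins(
    peaks, "rt_bin", "mz_bin", "intensity", aggregation_method="max"
)
```

Building the list of grouping columns first removes the duplicated `groupby` call from the two branches, and passing the aggregation as a parameter lets callers switch between "sum", "mean" and "max" without touching the grouping logic.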

jcharkow · 2024-09-09T15:20:09Z

Possibly should allow user to customize the aggregation method?

singjc · 2024-09-09T23:44:07Z

I did a bit more digging and made some comparisons. I added a tolerance binning method that bins mz values based on a fixed tolerance. I found that the Sturges and Freedman-Diaconis binning methods sometimes don't work as well for either sparse or very dense spectra. I also added an aggregation param that lets the user aggregate by 'sum', 'mean' or 'max'. Based on my comparisons, the mz tolerance binning method (with tol = 1) with max as the aggregation method returns spectra that closely match the original raw spectra and is a lot faster to plot (5.007686 seconds for raw vs 0.614527 seconds for max with mz tol=1 binning).
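A fixed-tolerance binning step of the kind described above could look roughly like the following. This is a sketch under stated assumptions, not the implementation merged in this PR: the function name `bin_mz_by_tolerance`, the column names, and the choice to report the mean mz of each bin are all hypothetical.

```python
import numpy as np
import pandas as pd


def bin_mz_by_tolerance(data, tol=1.0, aggregation_method="max"):
    """Hypothetical sketch: group peaks into fixed-width mz bins of width tol."""
    if aggregation_method not in {"sum", "mean", "max"}:
        raise ValueError(f"unsupported aggregation method: {aggregation_method}")
    mz_values = data["mz"].to_numpy()
    # Integer bin index of each peak, measured from the smallest mz value.
    bin_index = np.floor((mz_values - mz_values.min()) / tol).astype(int)
    return (
        data.assign(_bin=bin_index)
        .groupby("_bin")
        # Assumption: represent each bin by its mean mz and aggregated intensity.
        .agg(mz=("mz", "mean"), intensity=("intensity", aggregation_method))
        .reset_index(drop=True)
    )


# Made-up spectrum: three peaks within 1 mz of each other, then two more.
spectrum = pd.DataFrame(
    {
        "mz": [100.0, 100.4, 100.9, 102.3, 102.5],
        "intensity": [5.0, 20.0, 10.0, 7.0, 3.0],
    }
)

binned_max = bin_mz_by_tolerance(spectrum, tol=1.0, aggregation_method="max")
binned_sum = bin_mz_by_tolerance(spectrum, tol=1.0, aggregation_method="sum")
```

Using "max" keeps the height of the tallest peak in each bin, which is one plausible reason the binned spectrum resembles the raw one.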

Update:

I added two options for automating the computation of the tolerance for the mz tolerance binning method:

- use the Freedman-Diaconis bin width
- use the 1st percentile of the non-zero differences of the mz values

This results in even faster binning and plotting, with the binned spectrum still looking similar to the original.

raw (5.91 sec) > mz-tol-bin + tol=1 (0.83 sec) > mz-tol-bin + tol=1pct-dif (0.09 sec)
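The two automatic tolerance options listed above can be sketched with NumPy as follows. Both function names are hypothetical, and the second function assumes that "differences of the mz values" means differences between consecutive sorted values; the PR text does not spell this out.

```python
import numpy as np


def freedman_diaconis_width(mz_values):
    """Freedman-Diaconis bin width: 2 * IQR / n^(1/3)."""
    mz_values = np.asarray(mz_values, dtype=float)
    q75, q25 = np.percentile(mz_values, [75, 25])
    return 2.0 * (q75 - q25) / np.cbrt(mz_values.size)


def first_percentile_diff(mz_values):
    """1st percentile of the non-zero gaps between consecutive sorted mz values."""
    sorted_mz = np.sort(np.asarray(mz_values, dtype=float))
    diffs = np.diff(sorted_mz)
    # Drop zero gaps caused by duplicated mz values.
    nonzero = diffs[diffs > 0]
    return np.percentile(nonzero, 1)


# Made-up, evenly spaced mz values for demonstration only.
mz = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
fd_tol = freedman_diaconis_width(mz)
pct_tol = first_percentile_diff(mz)
```

Either value could then be passed as the tolerance of a fixed-width binning step. A percentile-based tolerance follows the finest spacing present in the data, whereas the Freedman-Diaconis width depends on the overall spread and the number of peaks.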

Testing with very sparse spectrum (from Spectrum.ipynb)

Testing with a very dense spectrum (from alphatims_tutorial.ipynb)

timosachsenberg · 2024-09-13T10:18:09Z

nbs/alphatims_tutorial.ipynb

what is this nb about?

jcharkow · Sep 13, 2024

Sorry, this is a rough notebook added to this PR by mistake. I will remove it.

singjc · Sep 13, 2024

It wasn't added by mistake. It's a tutorial notebook showing an example of loading Bruker TDF DIA and DDA data using alphatims and plotting it with pyopenms_viz.

The notebook was updated and added to this PR to reflect the changes to the spectrum binning.

timosachsenberg · 2024-09-13T10:18:32Z

just for reference: I used max in the past for similar reasons.

change binning aggregation from "mean" to "sum"

6692ee7

Aggregating bins by mean leads to strange trends when annotations are present: the annotated peaks end up much higher than everything else. Changing to sum gives a more realistic plot.

jcharkow requested a review from singjc September 8, 2024 17:10

qodo-code-review bot added the labels "enhancement" (New feature or request) and "Review effort [1-5]: 2" on Sep 8, 2024

jcharkow added 2 commits September 8, 2024 13:12

update to nbs

0bc1697

update notebook

74efce1

singjc added 5 commits September 9, 2024 14:42

add: mz tolerance binning

19136bc

update: spectrum and full spectrum nbs

1116d25

update: full spectrum plotting nb

60065f1

Merge branch 'main' of github.com:OpenMS/pyopenms_viz into patch/spectrum_binning

b66a4e7

update: full spectrum plotting nb

e09a097

singjc added 8 commits September 9, 2024 19:45

update: spectrum plot defaults

c6516b8

update: mz_tol_binning method

4ca7518

- add automatic tolerance compute methods
- use numpy where possible

update spectrum notebooks

9469d39

update: full spectrum plotting nb for main manuscript figure

f7e03a9

update spectrum notebook

92fbfdf

update: alphatims_tutorial notebook

2f1e3e3

fix: bug in bin-mz-tol with freedman tol

8e8d38a

update: alphatims_tutorial notebook

7262148

timosachsenberg reviewed Sep 13, 2024

add: aggregation_method param for PeakMap

03e956e

singjc approved these changes Sep 13, 2024

singjc merged commit 901e275 into OpenMS:main Sep 13, 2024

singjc merged 17 commits into OpenMS:main from jcharkow:patch/spectrum_binning

3 participants
