change binning aggregation from "mean" to "sum" #23
Binning aggregation by mean leads to strange trends when the annotated peaks are much higher than everything else. Changing to sum leads to a more realistic plot.
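To illustrate the effect described above, here is a minimal sketch using NumPy. The helper `bin_intensities` and the toy peak list are hypothetical and are not the code in `pyopenms_viz/_core.py`; the sketch only assumes fixed-width m/z bins.

```python
import numpy as np


def bin_intensities(mz, intensity, bin_width, how="sum"):
    """Aggregate intensities into fixed-width m/z bins.

    Hypothetical helper for illustration only, not the pyopenms_viz API.
    """
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    # Bin index of each peak, counted from the lowest m/z value.
    idx = np.floor((mz - mz.min()) / bin_width).astype(int)
    sums = np.bincount(idx, weights=intensity)
    if how == "sum":
        return sums
    if how == "mean":
        counts = np.bincount(idx)
        # Empty bins have count 0; dividing by 1 there leaves them at 0.
        return sums / np.maximum(counts, 1)
    raise ValueError(f"unsupported aggregation: {how!r}")


# One tall annotated peak sharing a bin with three small peaks,
# plus a lone medium peak in the next bin.
mz = [100.1, 100.2, 100.3, 100.4, 101.5]
intensity = [1000, 10, 10, 10, 300]

by_sum = bin_intensities(mz, intensity, bin_width=1.0, how="sum")
by_mean = bin_intensities(mz, intensity, bin_width=1.0, how="mean")
print(by_sum)   # first bin stays dominant
print(by_mean)  # first bin is diluted below the second
```

With the mean, the small neighbours dilute the tall peak, so its bin ends up lower than the bin holding the lone medium peak. With the sum, the ordering of the bins follows the tall peak.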
Possibly should allow user to customize the aggregation method?
I did a bit more digging and made some comparisons. I added a tolerance binning method that bins m/z values based on a fixed tolerance. I found that the Sturges and Freedman binning methods sometimes don't work as well for either sparse or really dense spectra. I also added an aggregation param to allow the user to aggregate by 'sum', 'mean' or 'max'.

Based on my comparisons, using the m/z tolerance binning method (with tol = 1) with max as the aggregation method seems to return spectra that closely match the original raw spectra, and it is a lot faster for plotting (5.007686 seconds for raw vs 0.614527 seconds for max with m/z tol=1 binning).

Update: I added two options for automating the computation of the tolerances for the m/z tolerance binning method.

This results in even faster binning and plotting, with the binned spectrum still looking similar to the original: raw (5.91 sec) > mz-tol-bin + tol=1 (0.83 sec) > mz-tol-bin + tol=1pct-dif (0.09 sec).

Testing with a very sparse spectrum (from Spectrum.ipynb)

Testing with a very dense spectrum (from alphatims_tutorial.ipynb)
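For readers following the thread, a rough sketch of what fixed-tolerance m/z binning with a selectable aggregation could look like. The function name `bin_by_mz_tolerance`, its parameters, and the choice to report bin centres are assumptions made for illustration; the implementation in this PR may differ, and the automated tolerance options are not covered here.

```python
import numpy as np


def bin_by_mz_tolerance(mz, intensity, tol=1.0, agg="max"):
    """Bin peaks into m/z windows of width `tol` and aggregate intensities.

    Hypothetical sketch, not the pyopenms_viz implementation.
    Returns (bin_centers, aggregated_intensity) for occupied bins only.
    """
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    # Window index of each peak, counted from the lowest m/z value.
    idx = np.floor((mz - mz.min()) / tol).astype(int)
    # Keep only occupied windows; `inverse` maps each peak to its window.
    occupied, inverse = np.unique(idx, return_inverse=True)
    n_bins = occupied.size

    if agg == "sum":
        out = np.bincount(inverse, weights=intensity, minlength=n_bins)
    elif agg == "mean":
        sums = np.bincount(inverse, weights=intensity, minlength=n_bins)
        out = sums / np.bincount(inverse, minlength=n_bins)
    elif agg == "max":
        out = np.full(n_bins, -np.inf)
        # Unbuffered maximum, so repeated indices are all taken into account.
        np.maximum.at(out, inverse, intensity)
    else:
        raise ValueError(f"agg must be 'sum', 'mean' or 'max', got {agg!r}")

    centers = mz.min() + (occupied + 0.5) * tol
    return centers, out


mz = [100.0, 100.4, 100.9, 103.2, 103.3]
intensity = [5, 50, 7, 20, 30]
centers, binned = bin_by_mz_tolerance(mz, intensity, tol=1.0, agg="max")
print(centers, binned)
```

Taking the max keeps the tallest peak in each window at its original height, which is one plausible reason the binned plot stays close to the raw spectrum while drawing far fewer points.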
- add automatic tolerance compute methods
- use numpy where possible
Sorry this is a rough notebook added by mistake to this PR. Will remove
It's not added by mistake. It's a tutorial notebook showing an example of loading Bruker TDF DIA and DDA data using alphatims and showcasing plotting with pyopenms_viz.
The notebook was updated and added to this PR to reflect the changes with the spectrum binning.
just for reference: I used max in the past for similar reasons.




User description
Binning aggregation by mean leads to strange trends when the annotated peaks are much higher than everything else. Changing to sum leads to a more realistic plot.
Binning by mean (old):

Binning by sum (new):

I also changed the peakmap, but those changes are not tested.
PR Type
enhancement
Description
Changes walkthrough 📝
_core.py
Update binning aggregation from mean to sum in plots
pyopenms_viz/_core.py
calculations.