
[FFID][tweak] thoughts on settings/scores #7130

Closed

jpfeuffer wants to merge 6 commits into develop from jpfeuffer-patch-7

Conversation

@jpfeuffer
Contributor

@jpfeuffer jpfeuffer commented Oct 15, 2023

Added some thoughts on improvements with settings and scores.
TODO check ElutionModelFitter at the end of FFID regarding imputation/regression of intensities for unfittable features in low concentration environments.

Description

Checklist

  • Make sure that you are listed in the AUTHORS file
  • Add relevant changes and new features to the CHANGELOG file
  • I have commented my code, particularly in hard-to-understand areas
  • New and existing unit tests pass locally with my changes
  • Updated or added python bindings for changed or new classes (Tick if no updates were necessary.)

How can I get additional information on failed tests during CI?

If your PR is failing, you can check out:
  • The details of the action statuses at the end of the PR or the "Checks" tab.
  • http://cdash.openms.de/index.php?project=OpenMS and look for your PR. Use the "Show filters" capability on the top right to search for your PR number.
    If you click in the column that lists the failed tests you will get detailed error messages.

Advanced commands (admins / reviewer only)

  • /reformat (experimental) applies the clang-format style changes as additional commit. Note: your branch must have a different name (e.g., yourrepo:feature/XYZ) than the receiving branch (e.g., OpenMS:develop). Otherwise, reformat fails to push.
  • setting the label "NoJenkins" will skip tests for this PR on Jenkins (saves resources, e.g., on edits that do not affect tests)
  • commenting with rebuild jenkins will retrigger Jenkins-based CI builds

⚠️ Note: Once you have opened a PR, try to minimize the number of pushes to it, as every push will trigger CI (automated builds and tests) and is rather heavy on our infrastructure (e.g., if several pushes per day are performed).

@jpfeuffer jpfeuffer marked this pull request as draft October 15, 2023 16:50
@timosachsenberg
Contributor

Thanks!
I see quite a lot of tests with changed intensities (usually lower, but in ProteomicsLFQ they seem to be higher) or a slightly lower number of features.
Do you know the reason for that? For lower-intensity features, I would suspect the baseline filter?
Or do you know if the other parameters affect feature extraction / fitting in a way such that intensities are different?
https://github.com/OpenMS/OpenMS/pull/7130/files#diff-8398c3d124cd05cff73da64570f549b25172b2feb680f2773e6292f91b69e32dR436-R439

Also pinging @cbielow @jcharkow @hroest. Maybe someone can provide some input here, as I am not super familiar with the OpenSWATH parameters.

@jpfeuffer
Contributor Author

jpfeuffer commented Oct 15, 2023

I just added some more notes. I don't think my settings have a lot of impact if elution_model is still on.
FFID will fit a combined model with all traces per feature in the end and overwrite the SWATH intensities.
PeakIntegrator "recently" received an update for SmartPeak. Maybe those intensities, together with background subtraction, are better now than when FFID was originally written.
Maybe ElutionModelFitter should therefore be disabled (although the joint fitting seems interesting). ElutionModelFitter is also the algorithm that imputes intensities for features where the combined model fit does not succeed: it does a regression/interpolation between the above-mentioned intensities from SWATH's PeakIntegrator (the input) and the integrated areas of the combined EGH model from ElutionModelFitter where the fit succeeded.
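For illustration, the regression-based imputation idea could look roughly like this (a hypothetical, simplified sketch; struct and function names are my own inventions, not the actual ElutionModelFitter code):

```cpp
// Hypothetical sketch of regression-based intensity imputation.
// NOT the actual OpenMS/ElutionModelFitter implementation; all names
// here are invented for illustration.
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Feature {
  double raw_intensity;   // intensity from peak integration (always present)
  double model_intensity; // area of the fitted elution model; < 0 == fit failed
};

// Ordinary least squares: model_intensity ~ a + b * raw_intensity,
// estimated only from features where the model fit succeeded.
void fit_line(const std::vector<Feature>& fs, double& a, double& b) {
  double sx = 0, sy = 0, sxx = 0, sxy = 0;
  double n = 0;
  for (const auto& f : fs) {
    if (f.model_intensity < 0) continue; // skip failed fits
    sx += f.raw_intensity;
    sy += f.model_intensity;
    sxx += f.raw_intensity * f.raw_intensity;
    sxy += f.raw_intensity * f.model_intensity;
    n += 1;
  }
  b = (n * sxy - sx * sy) / (n * sxx - sx * sx);
  a = (sy - b * sx) / n;
}

// Replace missing model intensities by the regression prediction.
void impute(std::vector<Feature>& fs) {
  double a = 0, b = 0;
  fit_line(fs, a, b);
  for (auto& f : fs)
    if (f.model_intensity < 0)
      f.model_intensity = a + b * f.raw_intensity;
}
```

Features with a failed fit thus get a model intensity predicted from the linear relationship between raw (integrator) and fitted-model intensities of the successful features.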

@jpfeuffer
Contributor Author

jpfeuffer commented Oct 15, 2023

If my settings indeed had an effect, it must mean that, e.g., the elution_model_score is now missing and the combined SWATH LDA prescore is affected (it includes the elution_model_score). Apparently the pre-scoring then has an effect on which peak groups are turned into features (e.g., through a cutoff, or through the quality sorting when multiple peak groups are found in a transition group).

However, I don't think it makes sense to hypothesise on the number of features or absolute differences. It can only be compared reliably on a ground-truth dataset.

@jpfeuffer
Contributor Author

jpfeuffer commented Oct 15, 2023

It is also a bit unfortunate that we have 3 different fitting steps in the pipeline, even though fitting is the computationally most expensive step 🥲 I disabled 2 of 3 for now. It would be great if the results were stored and re-used (although ElutionModelFitter fits multiple traces at the same time and is therefore different).

The problem with disabling "only imputation" in ElutionModelFitter (even though that is possible) is that, IMO, the intensities of features without a successful model fit will not really be comparable to the rest.

@timosachsenberg
Contributor

> It is also a bit unfortunate that we have 3 different fitting steps in the pipeline despite being the computationally most expensive step 🥲 I disabled 2 of 3 for now. Would be great if the results were stored and re-used (although ElutionModelFitter fits multiple traces at the same time and is therefore different).
>
> The problem with disabling "only imputation" in ElutionModelFitter (despite being possible) is that IMO intensities of features without a successful model fit will not really be comparable to the rest.

OK, this sounds to me like we might keep the interpolation but maybe have a better way to detect features below the LOQ?
I currently try to do this using the target/decoy approach in FFID plus an SVM outside FFID.
There seem to be multiple ways to achieve this and many parameters (e.g., what is the impact of the baseline estimation?). Any idea, @cbielow @hroest, how to move forward here?

Some background:

The main reason is that we report too many features, even ones below the LOQ.

@jpfeuffer
Contributor Author

jpfeuffer commented Oct 16, 2023

Yes, I think filtering out some of the low-intensity features is probably more helpful, but what's a bit unfortunate is that the ElutionModelFitter does not support background subtraction (which I think would help tremendously with the remaining ones).
It therefore basically defeats any previously performed background subtraction from OpenSwath (except for failed features, where the OpenSwath intensity still plays a role in imputation).

Maybe adding the algorithms from PeakIntegrator into EMF would be worthwhile.
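For context, a minimal form of such a background subtraction (an assumed simplification of the idea, not the actual PeakIntegrator code) subtracts a linear baseline drawn between the peak boundaries from the trapezoidal peak area:

```cpp
// Hypothetical sketch of linear-baseline background subtraction.
// NOT the actual PeakIntegrator code; an assumed simplification of the idea.
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Trapezoidal peak area minus the area under the straight line connecting
// the two peak boundary points (a simple linear baseline).
double background_subtracted_area(const std::vector<double>& rt,
                                  const std::vector<double>& intensity) {
  double area = 0;
  for (std::size_t i = 1; i < rt.size(); ++i)
    area += 0.5 * (intensity[i] + intensity[i - 1]) * (rt[i] - rt[i - 1]);
  double baseline = 0.5 * (intensity.front() + intensity.back()) *
                    (rt.back() - rt.front());
  return area - baseline;
}
```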

@jcharkow
Collaborator

> It is also a bit unfortunate that we have 3 different fitting steps in the pipeline despite being the computationally most expensive step 🥲 I disabled 2 of 3 for now. Would be great if the results were stored and re-used (although ElutionModelFitter fits multiple traces at the same time and is therefore different).
>
> The problem with disabling "only imputation" in ElutionModelFitter (despite being possible) is that IMO intensities of features without a successful model fit will not really be comparable to the rest.
>
> ok this sounds to me we might keep the interpolation but maybe have a better way to detect features below LOQ? I currently try to do this using the target/decoy approach in FFID + a SVM outside FFID. There seems to be multiple ways to achieve this and many parameters (e.g., what is the impact of the baseline estimation?). Any idea @cbielow @hroest how to move forward here.
>
> Some background:
>
> Main reason is that we report too many features even if below the LOQ.

Interesting findings. I'm looking at single-cell/dilution-series DIA data and seeing something similar: many features are below the LOQ. Most are filtered out with FDR control; however, it's not great that they are present to begin with.

// TODO I wonder if the following parameters would be enough.
// In theory we only care for one feature per one set of extracted chromatograms (transition group)
//params.setValue("stop_report_after_feature", 1); // best by quality, after scoring
//params.setValue("TransitionGroupPicker:stop_after_feature", 1); // best by intensity, after picking, before scoring
Contributor


Hmm, I did not look into it, but if we have one feature with smaller intensity but much lower RT error, it could get lost. @hendrikweisser, do you recall?

Contributor


Having only one feature candidate certainly wouldn't work with the SVM-based rescoring/FDR estimation approach in FFId. If you're not using this functionality you could extract only one feature, but I agree with Timo that RT deviation is often the most important criterion. That certainly holds if you detect features in the same file where the peptide IDs were generated: then the feature candidate overlapping the ID is always assumed to be the correct one. (So a nice optimisation may be to detect a single peak/feature starting at the ID position and moving outward.)

@hendrikweisser
Contributor

Some additional comments from reading this thread:

> ElutionModelFitter fits multiple traces at the same time and is therefore different

At least in theory this is important, because it should reduce the impact of "interference", where something high-intensity overlaps with one of the mass traces of a feature. The "raw" intensity of that mass trace may then be significantly wrong, but when fitting over multiple traces this should be evened out.
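A toy sketch of this evening-out effect (my own illustration, not OpenMS code): if one feature abundance is estimated jointly from several traces with known expected ratios, a distorted trace only shifts the estimate partially instead of dominating it:

```cpp
// Toy illustration (not OpenMS code): least-squares estimate of a single
// feature abundance A from several traces observed as y_i ≈ A * r_i,
// where r_i are the expected relative trace ratios.
// Minimizing sum_i (y_i - A * r_i)^2 gives A = sum(r_i*y_i) / sum(r_i^2),
// so a spike in one trace only shifts A proportionally to that trace's
// weight instead of being taken at face value.
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

double joint_abundance(const std::vector<double>& ratios,
                       const std::vector<double>& observed) {
  double num = 0, den = 0;
  for (std::size_t i = 0; i < ratios.size(); ++i) {
    num += ratios[i] * observed[i];
    den += ratios[i] * ratios[i];
  }
  return num / den;
}
```

With expected ratios {1.0, 0.5, 0.25} and a large spike on the third trace, the joint estimate deviates far less than reading the spiked trace at face value would.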

> The problem with disabling "only imputation" in ElutionModelFitter (despite being possible) is that IMO intensities of features without a successful model fit will not really be comparable to the rest.

Intuitively this imputation step should be computationally very cheap (it's just a linear regression!) compared to the rest of the feature detection, so I'm surprised it's even a consideration for optimisation.

@timosachsenberg
Contributor

timosachsenberg commented Nov 12, 2024

In test FFID 5 this feature for peptide LC(Carbamidomethyl)VLHEK/2 is missing because of
params.setValue("TransitionGroupPicker:background_subtraction", "exact"); :
[screenshots]

/home/sachsenb/Development/OpenMS-build/bin/FeatureFinderIdentification "-test" "-in" "/home/sachsenb/Development/OpenMS/src/tests/topp/FeatureFinderIdentification_1_input.mzML" "-id" "/home/sachsenb/Development/OpenMS/src/tests/topp/FeatureFinderIdentification_1_input.idXML" "-out" "FeatureFinderIdentification_5.tmp.featureXML" "-candidates_out" "FeatureFinderIdentification_5_candidates.tmp.featureXML" "-extract:mz_window" "0.1" "-extract:batch_size" "10" "-detect:peak_width" "60" "-model:type" "none"

Peptide LC(Carbamidomethyl)VLHEK/2 (m/z: 449.744):
PeakPickerChromatogram.cpp(79):  ====  Picking chromatogram LC(Carbamidomethyl)VLHEK/2_i1 with 224 peaks (start at RT 1657.05 to RT 2087.49) using method 'corrected'
PeakPickerChromatogram.cpp(79):  ====  Picking chromatogram LC(Carbamidomethyl)VLHEK/2_i2 with 224 peaks (start at RT 1657.05 to RT 2087.49) using method 'corrected'
MRMFeatureFinderScoring.cpp(572): Scoring feature RT: 1782.34 MZ: 449.744 INT: 1.48314e+06 == LC(Carbamidomethyl)VLHEK/2 [ expected RT 1656.05 / 1656.05 ] with 2 transitions and 2 chromatograms
MRMFeatureFinderScoring.cpp(572): Scoring feature RT: 1961.13 MZ: 449.744 INT: 217331 == LC(Carbamidomethyl)VLHEK/2 [ expected RT 1656.05 / 1656.05 ] with 2 transitions and 2 chromatograms
MRMFeatureFinderScoring.cpp(572): Scoring feature RT: 1920.52 MZ: 449.744 INT: 6375.41 == LC(Carbamidomethyl)VLHEK/2 [ expected RT 1656.05 / 1656.05 ] with 2 transitions and 2 chromatograms

This feature seems to elute multiple times...
[screenshot]

@jpfeuffer
Contributor Author

jpfeuffer commented Nov 12, 2024

Seems to elute a long time to me 😅

@timosachsenberg
Contributor

How should we deal with those?
I think I would be fine with removing those (via background correction) if it improves results on UPS.

@jpfeuffer
Contributor Author

Looks like something that would distort quantities if not correctly normalized.

@timosachsenberg
Contributor

timosachsenberg commented Dec 16, 2024

From @pjones using the default options

| Replicate | Source | Cond  | Baseline | PR7130 | Diff |
|-----------|--------|-------|----------|--------|------|
| 1         | UPS    | 12500 | 45       | 45     | 0    |
| 1         | YEAST  | 12500 | 828      | 827    | -1   |
| 2         | YEAST  | 125   | 865      | 862    | -3   |
| 3         | UPS    | 25000 | 47       | 47     | 0    |
| 3         | YEAST  | 25000 | 833      | 836    | 3    |
| 4         | UPS    | 2500  | 16       | 16     | 0    |
| 4         | YEAST  | 2500  | 838      | 833    | -5   |
| 5         | YEAST  | 250   | 857      | 856    | -1   |
| 6         | UPS    | 50000 | 48       | 48     | 0    |
| 6         | YEAST  | 50000 | 803      | 801    | -2   |
| 7         | UPS    | 5000  | 32       | 32     | 0    |
| 7         | YEAST  | 5000  | 819      | 822    | 3    |
| 8         | YEAST  | 500   | 833      | 830    | -3   |
| 9         | UPS    | 50    | 1        | 1      | 0    |
| 9         | YEAST  | 50    | 937      | 932    | -5   |

Here are some plots from an older version (probably without the SVM filter), from bigbio/quantms#301.

@jpfeuffer
Contributor Author

jpfeuffer commented Dec 17, 2024

Nice to see there is progress.
I'd suggest ordering by concentration and filling missing values with 0.

And lastly, what I did back then: create a grid of pairwise fold changes and compare it to the expected fold changes (according to the concentration ratios). That also validates quantification, not just identification. I did it with two box plots per grid cell, but you could do it differently.
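Such a grid could be sketched like this (a hypothetical helper, not the original analysis code; all names are made up):

```cpp
// Hypothetical helper (not the original analysis code): grid of observed
// minus expected pairwise log2 fold changes across conditions of a
// dilution series; cells near 0 mean quantification tracks the known
// concentration ratios.
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Condition {
  double concentration;      // known spiked-in concentration
  double measured_intensity; // summarized intensity (e.g. per protein)
};

std::vector<std::vector<double>>
fold_change_error_grid(const std::vector<Condition>& conds) {
  const std::size_t n = conds.size();
  std::vector<std::vector<double>> grid(n, std::vector<double>(n, 0.0));
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j) {
      const double observed = std::log2(conds[i].measured_intensity /
                                        conds[j].measured_intensity);
      const double expected = std::log2(conds[i].concentration /
                                        conds[j].concentration);
      grid[i][j] = observed - expected;
    }
  return grid;
}
```

Per-cell box plots over peptides (as described above) would then summarize the distribution of these errors for each condition pair.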

@jpfeuffer
Contributor Author

jpfeuffer commented Dec 17, 2024

Ideally, and I never got to it, we could do one grid for every UPS protein with data points for each (found) peptide.
This would be better for debugging than aggregated on protein level. But maybe this is the next step.

@timosachsenberg
Contributor

Closing this long-running research PR. The key actionable items have been captured in #8886 (redundant elution model fitting, background subtraction exposure, score evaluation).

The branch is preserved if anyone needs to reference the code. Thanks @jpfeuffer for the analysis and discussion — the insights about fitting redundancy, background subtraction inconsistency, and score evaluation are well-documented in the new issue.


4 participants