[FFID][tweak] thoughts on settings/scores #7130
Conversation
Added some thoughts on improvements to settings and scores. TODO: check ElutionModelFitter at the end of FFID regarding imputation/regression of intensities for unfittable features in low-concentration environments.

Thanks! Also pinging @cbielow @jcharkow @hroest. Maybe someone can provide some input here, as I am not super familiar with the OpenSWATH parameters.

I just added some more notes. I don't think my settings have much of an impact if elution_model is still on.

If my settings indeed had an effect, it must mean that e.g. the elution_model_score is missing now and the combined SWATH LDA prescore is affected (it includes the elution_model_score). Apparently the pre-scoring then affects which peak groups are turned into features (e.g. through a cutoff, or through the quality sorting when multiple peak groups are found in a transition group). However, I don't think it makes sense to hypothesise about the number of features or absolute differences; these can only be compared reliably on a ground-truth dataset.
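To illustrate the point about the missing sub-score: an LDA-style combined prescore is a weighted linear combination of sub-scores, so dropping one sub-score shifts the combined value and can reorder or cut off candidate peak groups. A minimal sketch with made-up names and weights, not the actual OpenSWATH coefficients:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical sketch (not the real OpenSWATH code): the combined prescore
// is a weighted sum of sub-scores; a missing sub-score (e.g. a disabled
// elution_model_score) simply contributes 0, which changes the ranking of
// peak-group candidates within a transition group.
double combinedScore(const std::map<std::string, double>& subscores,
                     const std::map<std::string, double>& weights)
{
  double total = 0.0;
  for (const auto& [name, weight] : weights)
  {
    auto it = subscores.find(name);
    if (it != subscores.end()) total += weight * it->second;
  }
  return total;
}
```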

It is also a bit unfortunate that we have 3 different fitting steps in the pipeline, even though model fitting is among the computationally most expensive steps 🥲 I disabled 2 of 3 for now. It would be great if the results were stored and re-used (although ElutionModelFitter fits multiple traces at the same time and is therefore different). The problem with disabling "only imputation" in ElutionModelFitter (even though that is possible) is that, IMO, intensities of features without a successful model fit will not really be comparable to the rest.
OK, this sounds to me like we might keep the interpolation but need a better way to detect features below the LOQ. Some background: the main reason is that we report too many features even when they are below the LOQ.
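To make "detect features below the LOQ" concrete, here is one hedged sketch, not existing OpenMS code: with a dilution series, one could define the LOQ as the lowest concentration level whose replicate CV stays under a threshold, and flag features quantified below that level. The helper name and the 20% cutoff are assumptions:

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <map>
#include <vector>

// Hypothetical sketch: given replicate intensities per concentration level,
// take the LOQ as the lowest level whose coefficient of variation (CV) is
// below max_cv (the 20% default is an assumption, not an OpenMS setting).
double estimateLOQ(const std::map<double, std::vector<double>>& replicates,
                   double max_cv = 0.2)
{
  for (const auto& [conc, values] : replicates) // std::map iterates lowest conc first
  {
    if (values.size() < 2) continue; // CV needs at least two replicates
    double mean = 0.0;
    for (double v : values) mean += v;
    mean /= values.size();
    double var = 0.0;
    for (double v : values) var += (v - mean) * (v - mean);
    const double cv = std::sqrt(var / (values.size() - 1)) / mean;
    if (cv <= max_cv) return conc;
  }
  return std::numeric_limits<double>::infinity(); // no level is quantifiable
}
```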

Yes, I think filtering out some of the low-intensity features is probably more helpful. What's a bit unfortunate, though, is that the ElutionModelFitter does not support background subtraction (which I think would help tremendously with the remaining ones). Maybe adding the algorithms from PeakIntegrator into EMF would be worthwhile.
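For reference, the kind of background subtraction discussed here can be as simple as a linear baseline drawn between the peak boundaries, in the spirit of what PeakIntegrator offers. A hypothetical helper, not the actual PeakIntegrator API:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of a linear background subtraction (hypothetical helper): draw a
// straight baseline between the intensities at the two peak boundaries and
// subtract it from every point inside, clamping at zero.
std::vector<double> subtractLinearBaseline(const std::vector<double>& rt,
                                           const std::vector<double>& intensity)
{
  const double left = intensity.front();
  const double right = intensity.back();
  const double span = rt.back() - rt.front();
  std::vector<double> corrected(intensity.size());
  for (std::size_t i = 0; i < intensity.size(); ++i)
  {
    const double baseline = left + (right - left) * (rt[i] - rt.front()) / span;
    corrected[i] = std::max(0.0, intensity[i] - baseline);
  }
  return corrected;
}
```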
Interesting findings. I'm looking at single-cell/dilution-series DIA data and seeing similar results, with many features being below the LOQ. Most are filtered out by FDR control; however, it's not great that they are present to begin with.
// TODO I wonder if the following parameters would be enough.
// In theory we only care for one feature per set of extracted chromatograms (transition group).
//params.setValue("stop_report_after_feature", 1); // best by quality, after scoring
//params.setValue("TransitionGroupPicker:stop_after_feature", 1); // best by intensity, after picking, before scoring
Hmm, I did not look into it, but if we have one feature with smaller intensity but a much smaller RT error, it could get lost. @hendrikweisser, do you recall?
Having only one feature candidate certainly wouldn't work with the SVM-based rescoring/FDR estimation approach in FFId. If you're not using this functionality, you could extract only one feature, but I agree with Timo that RT deviation is often the most important criterion. Certainly if you detect features in the same file where the peptide IDs were generated: there, the feature candidate overlapping the ID is always assumed to be the correct one. (So a nice optimisation may be to detect a single peak/feature starting at the ID position and moving outward.)
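The "start at the ID position and move outward" idea could look roughly like this. All names and the 5% apex cutoff are invented for illustration; this is a sketch, not FFID code:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch: start at the chromatogram point closest to the
// peptide ID's RT and walk outward while the signal keeps falling and stays
// above a fraction of the starting intensity. Returns the boundary indices.
std::pair<std::size_t, std::size_t> expandFromID(
    const std::vector<double>& intensity, std::size_t id_index,
    double frac = 0.05)
{
  const double cutoff = intensity[id_index] * frac;
  std::size_t left = id_index, right = id_index;
  while (left > 0 && intensity[left - 1] > cutoff &&
         intensity[left - 1] <= intensity[left])
    --left; // stop at a valley or at the noise floor
  while (right + 1 < intensity.size() && intensity[right + 1] > cutoff &&
         intensity[right + 1] <= intensity[right])
    ++right;
  return {left, right};
}
```

On a trace like `{1, 5, 20, 50, 30, 10, 2, 8}` with the ID at index 3, this stops before the neighbouring rise at the end, so a nearby interfering peak is not absorbed into the feature.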

Some additional comments from reading this thread:
At least in theory this is important because it should reduce the impact of "interference", where something high-intensity overlaps with one of the mass traces of a feature. Then the "raw" intensity of that mass trace may be significantly wrong, but when fitting over multiple traces this should be evened out.
Intuitively, this imputation step should be computationally very cheap (it's just a linear regression!) compared to the rest of the feature detection, so I'm surprised it's even a consideration for optimisation.
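For scale, the "just a linear regression" referred to here boils down to ordinary least squares, e.g. regressing model-derived intensity on raw intensity over the successfully fitted features and applying the line to the unfittable ones. A minimal sketch; the exact variables ElutionModelFitter regresses may differ:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Minimal ordinary least squares (illustration only): fit y = slope*x +
// intercept over fitted features, then predict y for unfittable features.
struct Line { double slope; double intercept; };

Line fitOLS(const std::vector<double>& x, const std::vector<double>& y)
{
  const std::size_t n = x.size();
  double sx = 0, sy = 0, sxx = 0, sxy = 0;
  for (std::size_t i = 0; i < n; ++i)
  {
    sx += x[i]; sy += y[i]; sxx += x[i] * x[i]; sxy += x[i] * y[i];
  }
  const double slope = (n * sxy - sx * sy) / (n * sxx - sx * sx);
  return {slope, (sy - slope * sx) / n};
}
```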

Seems to elute a long time to me 😅

How should we deal with those?

Looks like something that would distort quantities if not correctly normalized.

From @pjones: using the default options, here are some plots from an older version (probably without the SVM filter).

Nice to see there is progress. And lastly, what I did back then: create a grid of pairwise fold changes and compare them to the expected ones (according to the concentration ratios), to also validate quantification, not just identification. I did it with two box plots per grid cell, but you could do it differently.
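One way to fill each grid cell would be with deviations of the observed log2 fold changes from the expected log2 concentration ratio. A hypothetical helper, not existing OpenMS code:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper for one grid cell: per-peptide deviations of observed
// log2 fold changes (sample a vs. sample b) from the expected log2 of the
// known concentration ratio. Values near zero mean accurate quantification.
std::vector<double> foldChangeDeviations(const std::vector<double>& a,
                                         const std::vector<double>& b,
                                         double expected_ratio)
{
  const double expected = std::log2(expected_ratio);
  std::vector<double> dev;
  dev.reserve(a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
  {
    dev.push_back(std::log2(a[i] / b[i]) - expected);
  }
  return dev;
}
```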

Ideally, and I never got to it, we could do one grid for every UPS protein, with data points for each (found) peptide.

Closing this long-running research PR. The key actionable items (redundant elution model fitting, background subtraction exposure, score evaluation) have been captured in #8886. The branch is preserved if anyone needs to reference the code. Thanks @jpfeuffer for the analysis and discussion; the insights are well documented in the new issue.


