Unified global FLR (decoy-AA method) across AScore / PhosphoRS / LucXor for same-FDR/same-FLR comparison

## Goal

Compute **one common global False-Localization-Rate (FLR)** for all three localization
algorithms (AScore, PhosphoRS, LucXor) so they can be benchmarked at the **same PSM-FDR and
the same FLR level** (e.g. "at 5 % global FLR, how many STY sites does each tool recover?").

## TL;DR feasibility

**Possible in principle — not deliverable by the code as-is.** The decoy-amino-acid method
(Alanine = `PhosphoDecoy`) is the correct and *only* common basis, and it is already the
method we partially implement. But a correct, comparable FLR is currently blocked by two
per-tool scoring bugs, a merge that can align different peptides, and the fact that the only
existing decoy-FLR estimator (`snippets.py`) is statistically invalid (~3.7× optimistic on
our data). The fixes are localized, not architectural.

## Background: there are two different "FLR"s in this repo — don't conflate them

| | Decoy-AA counting FLR | LucXor native model FLR (`onsite/lucxor/flr.py`) |
|---|---|---|
| Basis | Count wins on a residue that *can't* be phosphorylated (A) | KDE of delta-scores: REAL vs **decoy-peptide** permutations |
| Scope | **Tool-agnostic** — definable identically for all 3 | **LucXor only** |
| Granularity | Per-site or per-PSM | Per-PSM (`delta_score`) |

Only the **decoy-AA FLR** is common to all three, so it is the only valid axis for a 3-way
comparison. LucXor's native `Luciphor_global_flr` is a *different* estimator and must not be
mixed in — and note that with `--add-decoys`, the `PhosphoDecoy(A)` wins are scored on the
**REAL/target** side of LucXor's KDE and contribute **nothing** to its native decoy density,
so `Luciphor_global_flr` is *not* a decoy-AA FLR.

## Method grounding (Ramsbottom et al. 2022)

Ramsbottom, K. A. et al. *"Method for Independent Estimation of the False Localization Rate
for Phosphoproteomics."* J. Proteome Res. 2022. DOI
[10.1021/acs.jproteome.1c00827](https://pubs.acs.org/doi/10.1021/acs.jproteome.1c00827) ·
open access [PMC9251759](https://pmc.ncbi.nlm.nih.gov/articles/PMC9251759/). This is the
source of the `PhosphoDecoy` strategy. Equations quoted verbatim from the full text:

- **Decoy-AA global FLR (Eq. 2):** `pX_FLR_n = (2 × (T_c/X_c) × Σ₁ⁿ(p_X_c)) / n`
  - `T_c` = total STY (target) residue count in the scored-phosphosite PSM set
  - `X_c` = total decoy-AA (A) residue count in that set
  - `p_X_c` = cumulative phospho-on-decoy count down the ranked list
  - `T_c/X_c` normalizes for decoy positions being scarcer; `×2` models wrong assignments
    landing in **both** pools.
- **Global model FLR (Eq. 1):** `Global model FLR = Σ₁ⁿ (P^PSM + P^PTM) / n` — sums
  PSM-identification error and localization error (union-bound), i.e. the principled
  "same PSM-FDR *and* FLR" estimator.

Method rules the comparison must follow: **rank sites by the tool's localization confidence**
(not by site count); collapse redundant sites (max prob per site, bin to 2 dp, tie-break by
PSM count); **threshold at 5 %, not 1 %** (the paper states 1 % estimates are not robust);
decoy AA = **Ala or Leu** (we use A ✓).

## Empirical code audit — per-tool status

| Tool | A competes *in scoring*? | A-win annotated in idXML? | Ranking score (granularity) | Status |
|---|---|---|---|---|
| **PhosphoRS** | ✅ verified end-to-end (`phosphors.py:1012,1167`) | ✅ `(PhosphoDecoy)` + `phospho_decoy_count` (`cli.py:531,647`) | `PhosphoRS_site_probs` (per-site, ↑=better) | **Ready** |
| **AScore** | ⚠️ single-site only | ✅ when not dropped by the bug | `AScore_N` (per-site, ↑=better) | **Needs fix** |
| **LucXor** | ❌ A scored as unmodified backbone | ❌ A-wins can't be serialized | `delta_score` (per-**PSM** only) | **Needs fix** |

### Blocking bug 1 — AScore drops decoy-A on multi-phospho peptides
`getSites_` appends A-sites *after* S/T/Y **without re-sorting** (`onsite/ascore/ascore.py:483-491`),
so `combinations()` emits unsorted index combos (e.g. `[4,1]`); `createTheoreticalSpectra_`
(`ascore.py:641-659`) walks positions ascending and **silently drops** any out-of-order
modification. Verified: `createTheoreticalSpectra_([[4,1]],'SAAAYK')` → `SAAAY(Phospho)K`
(the A is lost) instead of `SA(PhosphoDecoy)AAY(Phospho)K`. Net effect: A is systematically
disadvantaged on multi-site peptides → **decoy-AA FLR biased low**.
**Fix:** sort sites in `getSites_` (or sort each permutation before spectrum build).
Secondary: `add_losses` is hard-coded `false` (`ascore.py:37`) so −H3PO4 losses are never
generated for STY *or* A; score scale is polluted by sentinels (`1000.0`/`0.0`/`-1.0`) a
score-sweep must exclude.

### Blocking bug 2 — LucXor scores A as unmodified backbone and can't write A-wins
Production scoring `_get_mod_positions_from_perm` (`onsite/lucxor/psm.py:1115`) only
recognizes lowercase `s/t/y`, **not** `a`, so an A-localization hypothesis is scored as the
**unmodified backbone** (no +79.966 on any fragment) → A almost always loses → **decoy-AA FLR
biased low / falsely optimistic**. The output path `convert_sequence_to_standard_format`
(`psm.py:1651`) has no case for `a`, so an A-win produces a bare-`a` string that fails
`AASequence.fromString`; the `except` (`cli.py:948-951`) silently keeps the *original input*
sequence — **A-wins are never written as `A(PhosphoDecoy)`** and can't be counted downstream.
(A separate path, `_to_pyopenms_format` `peptide.py:377-380`, *does* handle `a` correctly and
is used for KDE model-building — "the code handles A" is false comfort; the wrong path is
wired into scoring.) Mass/NL config is correct (79.966331 + H3PO4).
**Fix (localized):** teach `_get_mod_positions_from_perm`, `_get_mod_map` (`psm.py:996`) and
`convert_sequence_to_standard_format` to treat `a` as `PhosphoDecoy`.

### Merge / "same PSM set" — not guaranteed
All three tools read the same input idXML (`onsite/onsitec.py:149-192`), but
`merge_algorithm_results` zips outputs by **raw positional index** with only a length warning
(`onsitec.py:259-283`); LucXor (`keepNBestHits`, spectrum-miss `cli.py:740`, min-PSM abort
`cli.py:846`) and AScore (error-drops `cli.py:167-177`) drop/reorder PSMs, so a single
mid-list drop **silently aligns hits from different peptides**. Even bypassing the merge, the
three tools retain **different PSM subsets**, so a fair comparison must be on the
**intersection of PSMs all three scored**, with **per-tool drop counts reported as a result**.

### `snippets.py` decoy-FLR is statistically invalid
- **Not score-ranked** — accumulates by *site count* (`snippets.py:971-1023`), so the curve
  is an artifact of ordering, not confidence.
- **Unnormalized** — `FLR = PhosphoDecoy/(Phospho+PhosphoDecoy)` (`snippets.py:982,1570`)
  omits both `T_c/X_c` and `×2`. On our data `T_c/X_c ≈ 1.83`, so it underestimates the true
  FLR by **≈ 2 × 1.83 ≈ 3.7×** (a "1 % FLR" read off it is really ~3–4 %).
- **Cosmetic** — sigmoid surge-smoothing, double Gaussian filtering, hard extension to 0.1
  (`snippets.py:1075-1347`). It is untracked, has hard-coded Windows paths, and is never
  wired into `onsitec`.

## Proposed work

- [ ] **AScore:** sort sites in `getSites_` so multi-phospho decoy-A permutations aren't dropped (`ascore.py:483-491`/`641-659`).
- [ ] **LucXor:** handle lowercase `a` as `PhosphoDecoy` in the scoring + output paths (`psm.py:1115`, `996`, `1651`; `cli.py:948-951`).
- [ ] **Merge:** join by `spectrum_reference`+sequence, not positional index; compare on the PSM intersection and report per-tool drop counts (`onsitec.py:259-283`).
- [ ] **FLR module:** replace `snippets.py` with a single downstream estimator implementing Eq. 2 (apply `2·T_c/X_c`), ranking each tool by *its own* localization confidence, collapsing sites per the paper, thresholding at 5 %.
- [ ] **Granularity decision (needs a maintainer call):** site-level FLR (paper-faithful; requires adding a per-site score to LucXor, which only has per-PSM `delta_score`) **vs** PSM-level FLR for all three (works after the bug fixes; coarser).
- [ ] **Keep LucXor's native FLR separate**; optionally use it as a cross-check of the decoy-AA FLR.
- [ ] (Optional) implement **Eq. 1** (`P^PSM + P^PTM`) using the search-engine PEP for `P^PSM` and each tool's localization probability for `P^PTM`.

## Acceptance

A single command produces, for AScore / PhosphoRS / LucXor on the same q-value-filtered PSM
intersection, a normalized decoy-AA global-FLR curve (sites recovered vs FLR) and the site
count each tool reports at 5 % global FLR, with per-tool PSM drop counts.


	Decoy-AA counting FLR	LucXor native model FLR (`onsite/lucxor/flr.py`)
Basis	Count wins on a residue that can't be phosphorylated (A)	KDE of delta-scores: REAL vs decoy-peptide permutations
Scope	Tool-agnostic — definable identically for all 3	LucXor only
Granularity	Per-site or per-PSM	Per-PSM (`delta_score`)

Tool	A competes in scoring?	A-win annotated in idXML?	Ranking score (granularity)	Status
PhosphoRS	✅ verified end-to-end (`phosphors.py:1012,1167`)	✅ `(PhosphoDecoy)` + `phospho_decoy_count` (`cli.py:531,647`)	`PhosphoRS_site_probs` (per-site, ↑=better)	Ready
AScore	⚠️ single-site only	✅ when not dropped by the bug	`AScore_N` (per-site, ↑=better)	Needs fix
LucXor	❌ A scored as unmodified backbone	❌ A-wins can't be serialized	`delta_score` (per-PSM only)	Needs fix

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unified global FLR (decoy-AA method) across AScore / PhosphoRS / LucXor for same-FDR/same-FLR comparison #40

Goal

TL;DR feasibility

Background: there are two different "FLR"s in this repo — don't conflate them

Method grounding (Ramsbottom et al. 2022)

Empirical code audit — per-tool status

Blocking bug 1 — AScore drops decoy-A on multi-phospho peptides

Blocking bug 2 — LucXor scores A as unmodified backbone and can't write A-wins

Merge / "same PSM set" — not guaranteed

`snippets.py` decoy-FLR is statistically invalid

Proposed work

Acceptance

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Unified global FLR (decoy-AA method) across AScore / PhosphoRS / LucXor for same-FDR/same-FLR comparison #40

Description

Goal

TL;DR feasibility

Background: there are two different "FLR"s in this repo — don't conflate them

Method grounding (Ramsbottom et al. 2022)

Empirical code audit — per-tool status

Blocking bug 1 — AScore drops decoy-A on multi-phospho peptides

Blocking bug 2 — LucXor scores A as unmodified backbone and can't write A-wins

Merge / "same PSM set" — not guaranteed

snippets.py decoy-FLR is statistically invalid

Proposed work

Acceptance

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

`snippets.py` decoy-FLR is statistically invalid