### Background: DIA vs DDA


## 4. Cascadia: Positional Embedding of Augmented Spectra

### 4.1 The Problem: What if one spectrum isn't enough?

In the previous notebook, we saw how Casanovo predicts peptide sequences from **DDA** data. Cascadia instead predicts peptide sequences from **DIA** data. So what happens when:
- Your peptide's signal is **spread across multiple MS2 cycles** (as in DIA)?
- The **MS1 isotope pattern** holds crucial information about charge state and mass?
- Neighboring spectra contain **corroborating evidence** that could improve predictions?

The authors of Cascadia came up with the idea of encoding **all** of this information at once, in a single so-called "augmented spectrum".



### Data-Dependent Acquisition (DDA) vs. Data-Independent Acquisition (DIA)
<img src='DDA vs. DIA.png' width=500>

**DDA:** Each fragmentation spectrum corresponds to a single peptide.

At every MS1 scan, the instrument selects the top k most intense precursor ions. Only these precursors are isolated, fragmented, and recorded as MS2 spectra. This produces clean MS2 spectra, each representing one peptide, but the data is incomplete because precursors that are not selected are never fragmented.
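To make the top-k selection concrete, here is a minimal sketch. The function name `select_top_k_precursors` and the peak values are invented for illustration, and real instruments also apply rules such as dynamic exclusion that are left out here.

```python
def select_top_k_precursors(ms1_peaks, k):
    """Return the k most intense (mz, intensity) peaks, most intense first."""
    return sorted(ms1_peaks, key=lambda peak: peak[1], reverse=True)[:k]


# Toy MS1 scan: (m/z, intensity) pairs, values invented for illustration.
ms1_scan = [(400.2, 1.0e5), (512.7, 8.0e5), (633.3, 2.5e5), (745.9, 6.0e4)]

selected = select_top_k_precursors(ms1_scan, k=2)
missed = [peak for peak in ms1_scan if peak not in selected]

print("fragmented:", selected)
print("never fragmented:", missed)
```

The `missed` list is the incompleteness described above: those precursors are present in the sample but produce no MS2 spectrum in this cycle.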

**DIA:** A single peptide is reconstructed from multiple time-adjacent spectra collected across wide, overlapping m/z windows.

Instead of selecting specific precursors, DIA fragments every ion within a set of predefined m/z windows. These windows often overlap to avoid losing peptides at boundaries. A single peptide typically contributes fragments to many consecutive MS2 spectra.
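The windowing idea can be sketched as follows. The window width, overlap, and m/z range are assumed values chosen for readability, not settings from any particular instrument method, and `make_windows` and `windows_containing` are hypothetical helper names.

```python
def make_windows(start, end, width, overlap):
    """Build (low, high) isolation windows of fixed width that cover [start, end]."""
    step = width - overlap
    windows = []
    low = start
    while low < end:
        windows.append((low, low + width))
        low += step
    return windows


def windows_containing(mz, windows):
    """Return every window whose m/z range includes the given precursor m/z."""
    return [w for w in windows if w[0] <= mz <= w[1]]


windows = make_windows(400.0, 500.0, width=25.0, overlap=1.0)

# A precursor near a boundary falls into two overlapping windows.
print(windows_containing(424.5, windows))
# A precursor in the middle of a window falls into exactly one.
print(windows_containing(450.0, windows))
```

Every precursor inside a window is fragmented together, which is why a DIA MS2 spectrum mixes fragments from several peptides.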

### Cascadia's Architecture
<img src='Cascadia Schematic.webp' width=900>

Cascadia builds on Casanovo's core architecture, predicting one amino acid at a time with a Transformer decoder. Instead of relying on a single encoded MS2 spectrum, however, Cascadia incorporates multiple sources of spectral evidence (the augmented spectrum), which gives the decoder a richer representation to work from. The encoded peak representations, the embedded precursor information, and the previously predicted amino acids in the peptide (P, E, P) are fed into the Transformer decoder to predict the next amino acid. Linear and softmax layers are applied once again to each peak, this time to determine whether the peak matches a b or y ion of the current peptide.


### Constructing an Augmented Spectrum

The following animation walks through the three-step process for building an augmented spectrum from MS1 and MS2 data:

<video controls>
  <source src="AugmentedManim.mp4" type="video/mp4">
</video>

*The video shows: (1) selecting a central MS2 scan, (2) including neighboring MS2 scans within a chosen width, and (3) identifying the corresponding MS1 scans in order to build the "augmented spectrum"*

### 4.2 Building an Augmented Spectrum: From Video to Data Structure

In the animation, we walked through how to construct an **augmented spectrum** step-by-step:

1. **Pick a central MS2 spectrum:** this is the scan where your peptide of interest was detected
2. **Include neighboring MS2 spectra:** with width $w$, we grab $w$ scans on each side (so $2w+1$ MS2 scans total). In this case it was $w=2$, so we grabbed $5$ scans total.
3. **Add the corresponding MS1 spectra:** each MS2 scan "belongs to" an MS1 scan; we include peaks from those MS1 scans that fall within the isolation window

The result is a single, enriched data structure that captures temporal context (neighboring scans) and precursor information (MS1 isotope patterns).
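The three steps above can be sketched in a few lines of Python. This is an illustrative sketch, not Cascadia's actual preprocessing code: the scan layout (a dict with `ms2_peaks` and `ms1_peaks`), the function name `build_augmented_spectrum`, and the assumption that all scans come from the same isolation window and are already intensity-normalized are choices made for this example.

```python
def build_augmented_spectrum(ms2_scans, center, w, window):
    """Collect (mz, intensity, offset, level) tuples around ms2_scans[center]."""
    low, high = window
    augmented = []
    for offset in range(-w, w + 1):
        idx = center + offset
        if idx < 0 or idx >= len(ms2_scans):
            continue  # skip neighbors that fall off either end of the run
        scan = ms2_scans[idx]
        # Steps 1 and 2: peaks from the central and neighboring MS2 scans.
        for mz, intensity in scan["ms2_peaks"]:
            augmented.append((mz, intensity, offset, 2))
        # Step 3: MS1 peaks from the matching MS1 scan, isolation window only.
        for mz, intensity in scan["ms1_peaks"]:
            if low <= mz <= high:
                augmented.append((mz, intensity, offset, 1))
    return augmented


# Three time-ordered MS2 scans from one isolation window (toy values).
scans = [
    {"ms2_peaks": [(185.08, 0.50)], "ms1_peaks": [(402.27, 0.30), (650.0, 0.90)]},
    {"ms2_peaks": [(185.08, 0.67), (367.85, 0.41)],
     "ms1_peaks": [(400.18, 0.29), (401.21, 0.35)]},
    {"ms2_peaks": [(185.08, 0.40)], "ms1_peaks": [(300.0, 0.20)]},
]

aug = build_augmented_spectrum(scans, center=1, w=1, window=(400.0, 425.0))
for peak in aug:
    print(peak)
```

The MS1 peaks at m/z 650.0 and 300.0 are dropped because they lie outside the isolation window, so they cannot belong to a precursor that contributed to these MS2 scans.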

---

#### Video => Example

In our video, we used **width = 2**, meaning we selected 2 MS2 scans on each side of the central scan. However, the example below from [Cascadia's documentation](https://cascadia.readthedocs.io/en/latest/file_formats.html) uses **width = 1** (1 scan on each side), which makes it easier to trace through.

Here's how to read the augmented spectrum:

| Column | Meaning |
|--------|---------|
| **M/Z** | The mass-to-charge ratio of a given peak |
| **Normalized Intensity** | Peak intensity, normalized to [0, 1] |
| **Width** | Relative position: `0` = central scan, `1` = one scan forward, `-1` = one scan backward |
| **Scan Level** | `2` = MS2 (fragment ions), `1` = MS1 (precursor/isotope peaks) |

---

#### Example: An Augmented Spectrum with w=1

> **Note:** Typically, you might see a list of peaks from a single MS1 or MS2 scan listed as an array. This just "blows out" that array and lists each peak and intensity individually for some scan. 

```
BEGIN IONS
TITLE=1
PEPMASS=402.5
CHARGE=1
SEQ=PEPTIDEK
```

**Central MS2 scan (Width = 0):**
| M/Z | Intensity | Width | Level |
|-----|-----------|-------|-------|
| 185.08 | 0.67 | 0 | 2 |
| 367.85 | 0.41 | 0 | 2 |
| 400.98 | 0.88 | 0 | 2 |
| ... | ... | 0 | 2 |

**Next MS2 scan (Width = +1):**
| M/Z | Intensity | Width | Level |
|-----|-----------|-------|-------|
| 185.08 | 0.67 | 1 | 2 |
| 367.85 | 0.41 | 1 | 2 |
| ... | ... | 1 | 2 |

**Previous MS2 scan (Width = -1):**
| M/Z | Intensity | Width | Level |
|-----|-----------|-------|-------|
| 185.08 | 0.67 | -1 | 2 |
| 367.85 | 0.41 | -1 | 2 |
| ... | ... | -1 | 2 |

**MS1 peaks within isolation window:**
| M/Z | Intensity | Width | Level |
|-----|-----------|-------|-------|
| 400.18 | 0.29 | 0 | 1 |
| 401.21 | 0.35 | 0 | 1 |
| 402.27 | 0.30 | -1 | 1 |
| ... | ... | ... | 1 |

```
END IONS
```

---

#### Why This Structure is Useful

A single MS2 spectrum in DIA is inherently ambiguous: it contains fragments from **all** precursors present in a given isolation window. By combining information across time (neighboring scans) and across MS levels (MS1 and MS2), the augmented spectrum encodes all of this context in one structure.
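As a rough illustration of why the extra context helps, the sketch below reuses the MS2 peaks from the example tables and asks at which scan offsets each fragment m/z appears. Rounding m/z to two decimals is a crude stand-in for proper tolerance-based matching, and `offsets_by_mz` is a hypothetical helper written for this notebook. Cascadia itself learns such patterns through attention; it does not run an explicit lookup like this.

```python
from collections import defaultdict


def offsets_by_mz(augmented, level=2, decimals=2):
    """Map each rounded m/z to the sorted scan offsets at which it appears."""
    seen = defaultdict(set)
    for mz, _intensity, offset, peak_level in augmented:
        if peak_level == level:
            seen[round(mz, decimals)].add(offset)
    return {mz: sorted(offsets) for mz, offsets in seen.items()}


# MS2 peaks from the w=1 example: (m/z, intensity, width, level).
example = [
    (185.08, 0.67, 0, 2), (367.85, 0.41, 0, 2), (400.98, 0.88, 0, 2),
    (185.08, 0.67, 1, 2), (367.85, 0.41, 1, 2),
    (185.08, 0.67, -1, 2), (367.85, 0.41, -1, 2),
]

support = offsets_by_mz(example)
print(support)
```

Peaks that recur across all three offsets are better supported as real fragments of a co-eluting peptide than a peak seen in only one scan.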


### 4.3 From Peaks to Embeddings: The 4-Tuple Representation

Casanovo encoded peaks as simple 2-tuples: (*m/z*, intensity). Cascadia extends this to **4-tuples** that capture the full context of each peak:

$$
\text{peak} = (m/z, \text{intensity}, \text{width}, \text{scan level})
$$

---

#### The Embedding Pipeline

Each 4-tuple is transformed into a **512-dimensional embedding** through these steps:

![AugmentedSpectrumArch.png](AugmentedSpectrumArch.png)


If you understood Casanovo well, Cascadia just adds a few more intricacies.

**In plain English:**

For one peak:

**1.** Take its 4 raw values:
- m/z
- intensity
- width / scan offset
- level

>**Note:** What we call scan offset (or width) is a relative retention-time position, not an absolute retention time. It is an integer in $[-w, w]$ that says whether a peak came from the central scan ($0$) or from a scan up to $w$ positions before or after it. So what the figure labels "RT" is this relative position within the augmented window.

**2.** Peak encoder turns that 4-tuple into a single vector.

The peak encoder builds that vector from three components:


1. **2D sinusoidal encoding for (m/z, width):** Just as Casanovo uses sinusoidal encoding for m/z alone, Cascadia jointly encodes m/z and temporal position. This allows the model to learn that "a peak at m/z=400 in the central scan" is related to "a peak at m/z=400 in the next scan."

2. **Learned embeddings for intensity and scan level:** Rather than using fixed mathematical functions, these embeddings are learned during training. The model discovers the best way to represent "high intensity" vs "low intensity" and "MS1 peak" vs "MS2 peak."

3. **Concatenation:** The final embedding simply concatenates all components, preserving all information for the transformer to process.
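The three components listed above can be sketched with the standard library alone. Everything about the layout here is an assumption made for illustration: the toy dimensions (20 instead of 512), the wavelength ranges, treating the "2D" encoding as two 1D sinusoidal encodings placed side by side, and using fixed random numbers in place of learned parameters. It is not Cascadia's actual encoder.

```python
import math
import random


def sinusoid(value, dim, min_wavelength=0.001, max_wavelength=10000.0):
    """Encode a scalar as `dim` sine/cosine features over geometric wavelengths."""
    half = dim // 2
    features = []
    for i in range(half):
        # Wavelengths grow geometrically from min_wavelength to max_wavelength.
        frac = i / max(half - 1, 1)
        wavelength = min_wavelength * (max_wavelength / min_wavelength) ** frac
        angle = 2 * math.pi * value / wavelength
        features.extend([math.sin(angle), math.cos(angle)])
    return features


# Stand-ins for learned parameters: random here, trained in a real model.
rng = random.Random(0)
INTENSITY_WEIGHTS = [rng.gauss(0, 1) for _ in range(4)]
LEVEL_TABLE = {level: [rng.gauss(0, 1) for _ in range(4)] for level in (1, 2)}


def embed_peak(mz, intensity, width, level):
    """Concatenate the m/z, width, intensity, and level parts into one vector."""
    mz_part = sinusoid(mz, 8)
    width_part = sinusoid(width, 4, min_wavelength=1.0, max_wavelength=100.0)
    intensity_part = [intensity * w for w in INTENSITY_WEIGHTS]
    level_part = LEVEL_TABLE[level]
    return mz_part + width_part + intensity_part + level_part


vec = embed_peak(185.08, 0.67, 0, 2)
next_scan_vec = embed_peak(185.08, 0.67, 1, 2)
print(len(vec))
```

In this sketch, the same m/z in the central scan and in the next scan shares its m/z features and differs only in the width features, which is the kind of relationship the joint encoding is meant to expose to the transformer.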


---

#### Comparison: Casanovo vs Cascadia Encoding

| Aspect | Casanovo | Cascadia |
|--------|----------|----------|
| **Input** | Single MS2 spectrum | Augmented spectrum (MS1 + multiple MS2) |
| **Peak representation** | 2-tuple (m/z, intensity) | 4-tuple (m/z, intensity, width, level) |
| **Positional encoding** | 1D sinusoidal (m/z only) | 2D sinusoidal (m/z + temporal) |
| **Output dimensions** | 512-dim per peak | 512-dim per peak |
| **Data type** | DDA | DIA |

### 4.4 The Full Cascadia Pipeline


---

#### Key Architectural Details

1. **Encoder:** Processes all peak embeddings together using self-attention. The model learns which peaks are related — e.g., a y-ion series, or matching peaks across consecutive scans.

2. **Decoder:** Generates amino acids autoregressively, one at a time. At each step, it attends to both the encoded spectrum and the previously generated amino acids.

3. **Beam Search with Mass Constraint:** Rather than greedily picking the most likely amino acid, Cascadia maintains multiple candidate sequences and prunes those whose cumulative mass exceeds the precursor mass.
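The mass constraint in step 3 can be illustrated with a small filter. This is a simplified sketch of the pruning idea only: the function name `prune_by_mass`, the tolerance, and the candidate list are assumptions for this example, and a real beam search would also score and rank the surviving candidates.

```python
# Monoisotopic residue masses in Da for a handful of amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "P": 97.05276, "T": 101.04768,
    "I": 113.08406, "D": 115.02694, "K": 128.09496, "E": 129.04259,
}
WATER = 18.01056  # added once per peptide for the terminal H and OH


def prune_by_mass(candidates, precursor_mass, tolerance=0.02):
    """Keep partial sequences whose mass does not exceed the precursor mass."""
    kept = []
    for seq in candidates:
        mass = sum(RESIDUE_MASS[aa] for aa in seq) + WATER
        if mass <= precursor_mass + tolerance:
            kept.append(seq)
    return kept


# Neutral precursor mass of roughly 341.16 Da, the mass of the tripeptide PEP.
beam = ["PE", "PEP", "PEPK"]
survivors = prune_by_mass(beam, precursor_mass=341.16)
print(survivors)
```

Here `PEPK` is discarded because its residues already weigh more than the precursor, so no extension of it could ever match.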

---

### 4.5 Summary: Why Augmented Spectra Work

The key insight in Cascadia, and in DIA analysis in general, is that **context matters**. In DIA, a single MS2 spectrum contains fragments from many co-isolated precursors, so it is inherently ambiguous. By building an augmented spectrum, Cascadia provides the model with multiple lines of evidence:

| Evidence Type | Source | What it reveals |
|--------------|--------|-----------------|
| **Consistency** | Neighboring MS2 scans | Real peptide fragments appear consistently across time; noise doesn't |
| **Precursor isotopes** | MS1 peaks in isolation window | Charge state, monoisotopic mass, presence of precursor |
| **Co-elution patterns** | Width dimension | Fragments from the same peptide should co-elute |

The 4-tuple encoding $(m/z, \text{intensity}, \text{width}, \text{level})$ and 2D sinusoidal embedding allow the transformer to learn these relationships:

- **Peaks at the same m/z across consecutive scans** → likely the same fragment ion
- **MS1 isotope spacing of 0.5 Da** → charge state = 2
- **Peak intensity decreasing over time** → elution profile falling edge
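The isotope-spacing relationship in the list above follows from simple arithmetic: adjacent isotope peaks differ by about 1.00335 Da in mass, so their m/z spacing is about $1.00335 / z$. The sketch below applies that rule directly. It is an explicit heuristic written for illustration (with a hypothetical function name and an assumed maximum charge), whereas Cascadia learns the relationship implicitly from data.

```python
NEUTRON_SPACING = 1.00335  # approximate 13C - 12C mass difference in Da


def charge_from_isotope_spacing(mz_peaks, max_charge=6):
    """Estimate charge from the mean m/z gap between consecutive isotope peaks.

    Expects at least two peaks from the same isotope envelope.
    """
    peaks = sorted(mz_peaks)
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    mean_gap = sum(gaps) / len(gaps)
    # Pick the charge whose expected spacing is closest to the observed one.
    return min(
        range(1, max_charge + 1),
        key=lambda z: abs(NEUTRON_SPACING / z - mean_gap),
    )


# Peaks spaced by 0.5 m/z units point to charge 2.
doubly_charged = charge_from_isotope_spacing([500.25, 500.75, 501.25])
# The MS1 peaks from the w=1 example are spaced by about 1, so charge 1.
example_charge = charge_from_isotope_spacing([400.18, 401.21, 402.27])
print(doubly_charged, example_charge)
```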

> **The bottom line:** Casanovo looks at one spectrum and predicts a sequence. Cascadia looks at a *neighborhood* of spectra and predicts a sequence — with more confidence and accuracy, especially for the challenging DIA case.

### Augmented Spectra Introduction
- In a DIA run, the traces of a peptide are spread across multiple MS2 cycles and mirrored in the MS1 isotope peaks.
    - Cascadia's solution is essentially to look at a given MS2 spectrum's _neighboring_ MS2 spectra and all of their respective MS1 spectra. Check out the video below.

<video controls>
  <source src="AugmentedManim.mp4" type="video/mp4">
</video>

A peptide elutes over about 30 seconds, so its signal shows up in several consecutive MS2 cycles.
