# Tier 2 Architecture: Network Propagation

## 1. Introduction: What Was the Initial State?

Our Tier 2 layer operates on network topology rather than on isolated drug features. We started with a **Homogeneous Protein-Protein Interaction (PPI) Graph** and a working Random Walk with Restart (RWR) engine, `RWR.calculate_rwr`, which operates on sparse CSR matrices.
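For reference, a common formulation of the RWR update is shown below. Here $A$ is the PPI adjacency matrix, $W$ its column-normalized form, $r$ the restart probability, and $T$ the drug's set of target proteins. The exact normalization and restart value used inside `RWR.calculate_rwr` are assumptions here, not taken from the code.

$$
\mathbf{p}_{t+1} = (1 - r)\,W\,\mathbf{p}_t + r\,\mathbf{p}_0,
\qquad
W_{ij} = \frac{A_{ij}}{\sum_k A_{kj}},
\qquad
(\mathbf{p}_0)_i =
\begin{cases}
1/|T| & \text{if } i \in T,\\
0 & \text{otherwise.}
\end{cases}
$$

The walk is iterated until $\lVert \mathbf{p}_{t+1} - \mathbf{p}_t \rVert_1$ falls below a tolerance. The fixed point $\mathbf{p}_\infty = r\,\bigl(I - (1 - r)W\bigr)^{-1}\mathbf{p}_0$ is the drug's steady-state footprint.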

### The Original Bottleneck
The initial `PPINetworkModel` had a severe architectural limitation: it ran the iterative RWR graph traversal *on-the-fly* every time a drug pair was queried. Evaluating the 380,000-pair benchmark this way would have taken hours, and the API could only score pairs one at a time in a loop instead of using vectorized matrix operations.


## 2. Phase 1 Execution: Vectorization and Precomputation

To remove this bottleneck, we executed **Phase 1: Precomputation**.

### Step A: Offline Footprint Generation
We wrote an offline generator script (`src/evaluation/precompute_ppi_rwr.py`) that iterates over all 1,500 valid drugs, runs the iterative RWR for each one to obtain its `(1, 798)` steady-state footprint over the interactome, and serializes the stacked footprints as a single dense matrix in `network_features.pkl`.
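A minimal sketch of this precomputation, assuming the column-normalized RWR update with restart probability `restart` and uniform seeding over each drug's targets. The functions `column_normalize`, `seed_matrix`, `batch_rwr`, and `save_features`, the restart value 0.7, and the pickle layout are hypothetical, not taken from `precompute_ppi_rwr.py`. Unlike a per-drug loop, this version stacks every drug's seed vector as a column so one matrix product advances all walks at once; because the update is linear and the columns are independent, it reaches the same fixed points.

```python
import pickle
import numpy as np

def column_normalize(adj):
    """Scale each column of an adjacency matrix to sum to 1 (zero columns stay zero)."""
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # avoid dividing by zero for isolated proteins
    return adj / col_sums

def seed_matrix(drug_targets, protein_index, n_proteins):
    """Restart vectors as columns: uniform mass over each drug's mapped targets."""
    seeds = np.zeros((n_proteins, len(drug_targets)))
    for d, targets in enumerate(drug_targets):
        rows = sorted({protein_index[p] for p in targets if p in protein_index})
        if rows:
            seeds[rows, d] = 1.0 / len(rows)
    return seeds

def batch_rwr(W, seeds, restart=0.7, tol=1e-8, max_iter=1000):
    """Iterate p <- (1 - r) W p + r p0 for every drug column at once.

    Returns an (n_drugs, n_proteins) matrix: one steady-state footprint per row.
    W may be a dense array or a scipy.sparse CSR matrix; only W @ P is used.
    """
    P = seeds.copy()
    for _ in range(max_iter):
        P_next = (1.0 - restart) * (W @ P) + restart * seeds
        # Largest per-drug L1 change decides convergence.
        converged = np.abs(P_next - P).sum(axis=0).max() < tol
        P = P_next
        if converged:
            break
    return P.T

def save_features(path, drug_ids, features):
    """Hypothetical on-disk layout: drug IDs plus a row-aligned dense matrix."""
    with open(path, "wb") as fh:
        pickle.dump({"drug_ids": list(drug_ids), "features": features}, fh)

# Toy 3-protein path graph P1 - P2 - P3 and two drugs.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
W = column_normalize(adj)
drug_ids = ["drugA", "drugB"]
protein_index = {"P1": 0, "P2": 1, "P3": 2}
seeds = seed_matrix([["P1"], ["P2", "P3"]], protein_index, n_proteins=3)
features = batch_rwr(W, seeds)
print(features.shape)  # (2, 3)
```

With no isolated proteins, each footprint row is a probability distribution that sums to 1.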

### Step B: The O(1) Lookup Architecture
We stripped the `PPINetworkModel` class down to a lookup layer:
- It no longer builds `networkx` graphs dynamically.
- It no longer iterates random walks.
- **Instead**: At startup it loads the precomputed matrix into memory together with a drug-ID-to-row dictionary, so fetching any drug's footprint is an $O(1)$ lookup. Pair scores are then computed as batched **Matrix Cosine Similarity** over the matrix rows, matching the batched scoring used by Tier 1.
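A minimal sketch of this lookup-and-score design, assuming the pickle holds a list of drug IDs and a row-aligned feature matrix. The class name `PrecomputedPPIModel`, its methods, and the pickle layout are hypothetical illustrations, not the actual `PPINetworkModel` API. Rows are L2-normalized once at load time, so each pair's cosine similarity reduces to a row-wise dot product.

```python
import pickle
import numpy as np

class PrecomputedPPIModel:
    """Hypothetical sketch: O(1) drug-ID lookup plus batched cosine similarity."""

    def __init__(self, drug_ids, features):
        # Dictionary from drug ID to row index: O(1) average-case lookup.
        self.index = {drug: row for row, drug in enumerate(drug_ids)}
        feats = np.asarray(features, dtype=float)
        norms = np.linalg.norm(feats, axis=1, keepdims=True)
        norms[norms == 0] = 1.0  # all-zero footprints stay zero (similarity 0)
        # L2-normalize once so cosine similarity becomes a plain dot product.
        self.unit = feats / norms

    @classmethod
    def from_pickle(cls, path):
        # Assumed (hypothetical) layout: {"drug_ids": [...], "features": ndarray}.
        with open(path, "rb") as fh:
            data = pickle.load(fh)
        return cls(data["drug_ids"], data["features"])

    def score_pairs(self, pairs):
        """Cosine similarity for a batch of (drug_a, drug_b) pairs."""
        rows_a = np.array([self.index[a] for a, _ in pairs], dtype=np.intp)
        rows_b = np.array([self.index[b] for _, b in pairs], dtype=np.intp)
        # Row-wise dot product of unit vectors == cosine similarity.
        return np.einsum("ij,ij->i", self.unit[rows_a], self.unit[rows_b])


model = PrecomputedPPIModel(
    ["d1", "d2", "d3"],
    np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]),
)
scores = model.score_pairs([("d1", "d2"), ("d1", "d3")])
print(scores)  # ≈ [0.0, 0.7071]
```

Normalizing at load time moves the per-row norm computation out of the scoring path, so scoring a batch costs one gather and one row-wise dot product.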

## 3. Evaluation: Benchmarking Tier 2 vs Tier 1

With the speed bottleneck removed, we ran the fully vectorized Tier 2 PPI model through the 380,000-pair, 5-seed strict-negative ablation benchmark.

### Experimental Results

```text
==========================================================================================
   ABLATION STUDY RESULTS  —  mean ± std across 5 random seeds
==========================================================================================
Condition             AUPR             AUROC            Avg_Precision    EF@1%
------------------------------------------------------------------------------------------
A: Chemical Only      0.1142 ± 0.0004  0.5848 ± 0.0011  0.1142 ± 0.0004  1.3637 ± 0.0141
B: Phenotypic Only    0.3691 ± 0.0045  0.5917 ± 0.0003  0.2069 ± 0.0021  7.5376 ± 0.0797
C: Tier 1 Fused       0.1445 ± 0.0004  0.6194 ± 0.0010  0.1444 ± 0.0004  3.0024 ± 0.0431
D: Tier 2 PPI RWR     0.3921 ± 0.0017  0.7599 ± 0.0002  0.3447 ± 0.0017  7.5548 ± 0.1599
==========================================================================================
```
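The report does not define its metrics, so the sketch below uses common definitions: step-wise average precision, EF@k% as the positive rate in the top k% of the ranking divided by the overall positive rate, and AUROC as the probability that a random positive outscores a random negative. The function names are hypothetical, and the benchmark's own implementation (tie handling, how AUPR is integrated) may differ. If AUPR is computed by trapezoidal integration of the precision-recall curve, it can diverge from step-wise average precision, particularly when many scores are tied, so the AUPR and Avg_Precision columns are not interchangeable.

```python
import numpy as np

def average_precision(y_true, scores):
    """Step-wise AP: mean precision at the rank of each positive (ties keep input order)."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    y = np.asarray(y_true)[order]
    hits = np.cumsum(y)                # positives seen so far at each rank
    ranks = np.arange(1, len(y) + 1)   # 1-based rank positions
    return float((hits / ranks)[y == 1].mean())

def enrichment_factor(y_true, scores, fraction=0.01):
    """EF@fraction: positive rate in the top-scored fraction / overall positive rate."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    y = np.asarray(y_true)[order]
    n_top = max(1, int(round(fraction * len(y))))
    return float(y[:n_top].mean() / y.mean())

def auroc(y_true, scores):
    """AUROC as P(random positive scores above random negative); ties count 1/2."""
    y = np.asarray(y_true).astype(bool)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[y], s[~y]
    # All positive/negative comparisons: O(n_pos * n_neg) memory.
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return float(greater + 0.5 * ties)

y_true = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
print(average_precision(y_true, scores))               # ≈ 0.8333
print(enrichment_factor(y_true, scores, fraction=0.1))  # ≈ 5.0
print(auroc(y_true, scores))                            # 0.9375
```

The pairwise `auroc` is only practical for small inputs; for a 380,000-pair benchmark a rank-based (Mann-Whitney U) computation gives the same value with far less memory.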

### Scientific Takeaway

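The Tier 2 model is a substantial improvement over Tier 1, outperforming the fused Tier 1 baseline (condition C) on all four metrics:

1. **AUROC Dominance ($0.7599$)**: By exploiting the topological proximity of drug targets within the human interactome, Tier 2 reaches an AUROC of 0.7599, compared with 0.5848 (chemical), 0.5917 (phenotypic), and 0.6194 (Tier 1 fused). In other words, a randomly chosen true interaction is ranked above a randomly chosen negative pair about 76% of the time.
2. **AUPR Recovery ($0.3921$)**: Tier 1 fusion suffered a "dilution" effect: summing chemical and side-effect similarities dragged AUPR down to 0.1445, far below the 0.3691 of the phenotypic signal alone. By letting the network diffuse the biological signal instead of summing modality vectors, Tier 2 recovers and slightly exceeds the phenotypic AUPR (0.3921 vs 0.3691); under average precision the margin is larger (0.3447 vs 0.2069). Early enrichment, however, is effectively tied with the phenotypic-only condition (EF@1% of 7.5548 ± 0.1599 vs 7.5376 ± 0.0797).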

### Next Steps
The current graph is still homogeneous (proteins only). The final stage of this project will move this architecture to a **Heterogeneous Information Network (HIN)**, with drug and protein nodes connected by typed drug-drug, drug-protein, and protein-protein edges, so that the walk can propagate signal across both semantic layers for more complete safety modeling.
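One way the planned HIN could be represented, sketched under assumptions: stack the typed edge blocks into a single symmetric adjacency over drug and protein nodes, with optional per-edge-type weights. The function `build_hin_adjacency` and the `w_dd`/`w_dp`/`w_pp` weights are hypothetical. The resulting matrix could be column-normalized and walked exactly like the current PPI graph, seeding from a drug's own node instead of its targets.

```python
import numpy as np

def build_hin_adjacency(drug_drug, drug_protein, protein_protein,
                        w_dd=1.0, w_dp=1.0, w_pp=1.0):
    """Assemble one symmetric adjacency over [drugs; proteins] from typed edge blocks.

    drug_drug       : (n_d, n_d) drug-drug edges
    drug_protein    : (n_d, n_p) drug-target edges
    protein_protein : (n_p, n_p) PPI edges
    w_*             : hypothetical per-edge-type weights
    """
    dd = w_dd * np.asarray(drug_drug, dtype=float)
    dp = w_dp * np.asarray(drug_protein, dtype=float)
    pp = w_pp * np.asarray(protein_protein, dtype=float)
    n_d, n_p = dp.shape
    if dd.shape != (n_d, n_d) or pp.shape != (n_p, n_p):
        raise ValueError("edge blocks have inconsistent shapes")
    # Drugs occupy rows/cols 0..n_d-1, proteins occupy n_d..n_d+n_p-1.
    return np.block([[dd, dp], [dp.T, pp]])

dd = [[0, 1], [1, 0]]                    # drug 0 -- drug 1
dp = [[1, 0, 0], [0, 1, 1]]              # drug-target edges
pp = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]   # PPI path P0 - P1 - P2
A = build_hin_adjacency(dd, dp, pp)
print(A.shape)  # (5, 5)
```

If drug-drug edges come from known interactions, they should be limited to the training split of each seed; otherwise the walk can propagate held-out labels into the features.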