# Comparing PPIs

The PPIRef package provides wrappers for [iAlign](https://doi.org/10.1093/bioinformatics/btq404), [US-align](https://www.biorxiv.org/content/10.1101/2022.04.18.488565v1), and [Foldseek-MM](https://www.biorxiv.org/content/10.1101/2024.04.14.589414v1), as well as [iDist](https://arxiv.org/pdf/2310.18515.pdf), a scalable approximation of 3D alignment-based methods (used to construct the PPIRef dataset), for comparing PPI structures. Additionally, it provides a sequence identity comparator for comparing PPIs by their sequences.

> 📌 Using wrappers for iAlign and US-align requires their installation. Please refer to the Reference API documentation for details.

In [2]:
from ppiref.comparison import IAlign, USalign, IDist, SequenceIdentityComparator, FoldseekMMComparator
from ppiref.extraction import PPIExtractor
from ppiref.definitions import PPIREF_TEST_DATA_DIR

# Suppress BioPython warnings
import warnings
from Bio import BiopythonWarning
warnings.simplefilter('ignore', BiopythonWarning)

# Suppress Graphein log
from loguru import logger
logger.disable('graphein')

Prepare near-duplicate PPIs from Figure 1 in the ["Learning to design protein-protein interactions with enhanced generalization"](https://arxiv.org/pdf/2310.18515.pdf) paper.

<p align="center">
  <img width="350" src="./_static/images/1p7z_3p9r.png"/>
</p>

In [4]:
ppi_dir = PPIREF_TEST_DATA_DIR / 'ppi_dir'
extractor = PPIExtractor(out_dir=ppi_dir, kind='heavy', radius=6., bsa=False)
extractor.extract(PPIREF_TEST_DATA_DIR / 'pdb/1p7z.pdb', partners=['A', 'C'])
extractor.extract(PPIREF_TEST_DATA_DIR / 'pdb/3p9r.pdb', partners=['B', 'D'])
ppis = [ppi_dir / 'p7/1p7z_A_C.pdb', ppi_dir / 'p9/3p9r_B_D.pdb']

**Example 1.** Compare PPIs with [iAlign](https://doi.org/10.1093/bioinformatics/btq404). iAlign is the original adaptation of [TM-align](https://doi.org/10.1093/nar/gki524) to protein-protein interfaces. TM-align is based on 3D alignment of protein structures. A high `IS-score` and a low `P-value` produced by iAlign indicate high similarity.

In [4]:
ialign = IAlign()
ialign.compare(*ppis)

{'PPI0': '1p7z_A_C',
 'PPI1': '3p9r_B_D',
 'IS-score': 0.95822,
 'P-value': 8.22e-67,
 'Z-score': 152.167,
 'Number of aligned residues': 249,
 'Number of aligned contacts': 347,
 'RMSD': 0.37,
 'Seq identity': 0.992}

**Example 2.** Compare PPIs with [US-align](https://www.biorxiv.org/content/10.1101/2022.04.18.488565v1). US-align is a more recent adaptation of [TM-align](https://doi.org/10.1093/nar/gki524), designed as a universal comparison method for different kinds of macromolecules. High TM-scores in both directions (`TM1` and `TM2`) indicate high similarity.

In [5]:
usalign = USalign()
usalign.compare(*ppis)

{'PPI0': '1p7z_A_C',
 'PPI1': '3p9r_B_D',
 'TM1': 0.984,
 'TM2': 0.984,
 'RMSD': 0.35,
 'ID1': 0.979,
 'ID2': 0.979,
 'IDali': 0.993,
 'L1': 289,
 'L2': 289,
 'Lali': 285}
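
Because similarity requires high TM-scores in both directions, one simple way to summarize a result is to take the smaller of `TM1` and `TM2`. The sketch below works on a plain dict with the keys shown above; the helper names and the `0.5` cutoff are illustrative assumptions, not part of the PPIRef API.

```python
def min_tm_score(result):
    """Return the smaller of the two direction-specific TM-scores."""
    return min(result["TM1"], result["TM2"])


def looks_similar_by_usalign(result, threshold=0.5):
    """Check that both TM-scores reach the cutoff.

    The default threshold is a hypothetical value chosen for illustration.
    """
    return min_tm_score(result) >= threshold


# Subset of the US-align output shown above.
result = {"PPI0": "1p7z_A_C", "PPI1": "3p9r_B_D", "TM1": 0.984, "TM2": 0.984}
print(min_tm_score(result), looks_similar_by_usalign(result))
```

Taking the minimum keeps a pair from being flagged as similar when only one of the two normalizations yields a high score.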

**Example 3.** Compare PPIs with [Foldseek-MM](https://www.biorxiv.org/content/10.1101/2024.04.14.589414v1). Foldseek-MM is designed to compare protein-protein complexes by applying Foldseek to all partners and finding the best-scoring alignment of the whole complexes. Here, we use the method to compare protein-protein interfaces, similar to Foldseek-MM in the [interface mode](https://github.com/steineggerlab/foldseek/pull/330). Similar to iAlign and US-align, Foldseek-MM produces a TM-score. A high TM-score indicates high similarity.


In [6]:
foldseek_mm = FoldseekMMComparator()
foldseek_mm.compare(*ppis)

{'PPI0': '1p7z_A_C',
 'PPI1': '3p9r_B_D',
 'Foldseek-MM TM-score (normalized by query PPI0 length)': 0.98084,
 'Foldseek-MM TM-score (normalized by target PPI1 length)': 0.98084,
 'Matched chains in the query PPI0 complex': 'A,C',
 'Matched chains in the target PPI1 complex': 'D,B'}
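
The two "Matched chains" entries can be turned into an explicit chain-to-chain mapping. The sketch below assumes that the comma-separated chain lists correspond position by position (the first query chain is matched to the first target chain, and so on); this ordering is an assumption about the output format, and `chain_mapping` is a hypothetical helper rather than a PPIRef function.

```python
def chain_mapping(result):
    """Pair query chains with target chains, assuming positional correspondence."""
    query = result["Matched chains in the query PPI0 complex"].split(",")
    target = result["Matched chains in the target PPI1 complex"].split(",")
    if len(query) != len(target):
        raise ValueError("Query and target chain lists differ in length")
    return dict(zip(query, target))


# Subset of the Foldseek-MM output shown above.
result = {
    "Matched chains in the query PPI0 complex": "A,C",
    "Matched chains in the target PPI1 complex": "D,B",
}
print(chain_mapping(result))  # {'A': 'D', 'C': 'B'}
```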

**Example 4.** Compare PPIs by maximum pairwise sequence identity. High sequence identity indicates high similarity. Comparing PPIs based on sequences requires a path to the directory storing the complete PDB files from which the PPIs were extracted.

In [6]:
seqid = SequenceIdentityComparator(pdb_dir=PPIREF_TEST_DATA_DIR / 'pdb')
seqid.compare(*ppis)

{'PPI0': '1p7z_A_C',
 'PPI1': '3p9r_B_D',
 'Maximum pairwise sequence identity': 0.9944979367262724}
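
To illustrate what "maximum pairwise" means here, the sketch below scores every pair of chains across two PPIs and keeps the highest value. The identity function is a rough stand-in built on `difflib` from the standard library; the comparator's actual alignment procedure and normalization may differ, and the sequences are made-up toy data.

```python
from difflib import SequenceMatcher
from itertools import product


def seq_identity(a, b):
    """Rough identity: matched residues divided by the shorter sequence length."""
    if not a or not b:
        return 0.0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / min(len(a), len(b))


def max_pairwise_identity(chains0, chains1):
    """Highest identity over all pairs of one chain from each PPI."""
    return max(seq_identity(a, b) for a, b in product(chains0, chains1))


# Toy chain sequences (not taken from 1p7z or 3p9r).
chains0 = ["ACDEFGHIK", "LMNPQRSTV"]
chains1 = ["ACDEFGHIR", "WWWWWWWWW"]
print(max_pairwise_identity(chains0, chains1))
```

Taking the maximum means a single highly similar chain pair is enough to mark two PPIs as related by sequence, even if the remaining partners differ.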

**Example 5.** Compare PPIs with [iDist](https://arxiv.org/pdf/2310.18515.pdf). iDist is an efficient approximation of 3D alignment-based methods. A low iDist distance indicates high similarity (a distance below 0.04 is considered near-duplicate for interfaces extracted with a 6 Å distance cutoff).

In [7]:
idist = IDist()
idist.compare(*ppis)

{'PPI0': '1p7z_A_C', 'PPI1': '3p9r_B_D', 'iDist': 0.0034661771664121184}
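
The near-duplicate criterion quoted above can be applied directly to the returned dict. This is a minimal sketch: `is_near_duplicate` is a hypothetical helper, and the `0.04` default only applies to interfaces extracted with a 6 Å cutoff.

```python
NEAR_DUPLICATE_THRESHOLD = 0.04  # value quoted above for 6 Å interfaces


def is_near_duplicate(result, threshold=NEAR_DUPLICATE_THRESHOLD):
    """Return True if the iDist distance is strictly below the threshold."""
    return result["iDist"] < threshold


# iDist output shown above.
result = {"PPI0": "1p7z_A_C", "PPI1": "3p9r_B_D", "iDist": 0.0034661771664121184}
print(is_near_duplicate(result))  # True
```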

**Example 6.** Compare all PPIs against each other with iDist. Parallel pairwise comparison is available for the other methods as well, but it does not scale to large datasets.

In [8]:
idist = IDist(max_workers=2)
idist.compare_all_against_all(ppis, ppis)

Embedding PPIs (2 processes): 100%|██████████| 2/2 [00:04<00:00,  2.49s/it]


|   | PPI0 | PPI1 | iDist |
|---|------|------|-------|
| 0 | 1p7z_A_C | 1p7z_A_C | 0.0 |
| 1 | 1p7z_A_C | 3p9r_B_D | 0.003466 |
| 2 | 3p9r_B_D | 1p7z_A_C | 0.003466 |
| 3 | 3p9r_B_D | 3p9r_B_D | 0.0 |
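
An all-against-all table contains self-comparisons and lists each pair twice, once per direction. The sketch below keeps one row per unordered pair of distinct PPIs. It operates on a list of row dicts, assuming the table is a pandas DataFrame that has been converted with `DataFrame.to_dict('records')`; `unique_pairs` is a hypothetical helper.

```python
def unique_pairs(rows):
    """Drop self-comparisons and keep the first row for each unordered pair."""
    seen = set()
    unique = []
    for row in rows:
        if row["PPI0"] == row["PPI1"]:
            continue  # self-comparison
        key = frozenset((row["PPI0"], row["PPI1"]))
        if key in seen:
            continue  # same pair already seen in the other direction
        seen.add(key)
        unique.append(row)
    return unique


# Rows copied from the table above.
rows = [
    {"PPI0": "1p7z_A_C", "PPI1": "1p7z_A_C", "iDist": 0.0},
    {"PPI0": "1p7z_A_C", "PPI1": "3p9r_B_D", "iDist": 0.003466},
    {"PPI0": "3p9r_B_D", "PPI1": "1p7z_A_C", "iDist": 0.003466},
    {"PPI0": "3p9r_B_D", "PPI1": "3p9r_B_D", "iDist": 0.0},
]
print(unique_pairs(rows))
```

For the two PPIs in this tutorial, a single row remains: the `1p7z_A_C` and `3p9r_B_D` pair.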
