# Example of PELSA differential expression with `alphatools`

## An example dataset: The original PELSA publication

PELSA [1] is a novel method for investigating protein-ligand interactions through limited proteolysis. Cell lysate is treated with a short pulse of trypsin at an extremely high enzyme-to-substrate ratio (1:2), which digests surface-exposed peptides. If a ligand (such as a small-molecule binder) is bound to the protein surface, it stabilizes the surrounding protein region and momentarily slows its digestion. Compared with a control without the ligand, peptides from these PELSA-stabilized regions therefore appear downregulated. We replicate the original publication's analysis of staurosporine, a pan-kinase binder, and visualize the regulation of its kinase targets.

[1]: Li, Kejia, et al. "A peptide-centric local stability assay enables proteome-scale identification of the protein targets and binding regions of diverse ligands." Nature Methods 22.2 (2025): 278-282.
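To see why ligand binding shows up as apparent downregulation, consider a deliberately simplified model (our assumption, not the kinetics described in [1]): each cleavage site is digested with first-order kinetics, so the fraction released after a pulse of duration `t` is `1 - exp(-k * t)`, and a bound ligand lowers the rate `k`. The rates and pulse time below are made-up illustrative values.

```python
import math


def released_fraction(rate: float, pulse_time: float) -> float:
    """Fraction of a cleavage site digested after a short trypsin pulse.

    Toy first-order model (an assumption for illustration, not from the PELSA paper).
    """
    return 1.0 - math.exp(-rate * pulse_time)


pulse_time = 1.0  # arbitrary time units
k_control = 0.5   # hypothetical cleavage rate without ligand
k_ligand = 0.2    # hypothetical, slower rate when a bound ligand shields the site

control = released_fraction(k_control, pulse_time)
treated = released_fraction(k_ligand, pulse_time)

# A negative log2 ratio (treated vs. control) is what "downregulated" means here
log2_fc = math.log2(treated / control)
print(f"control={control:.3f}  treated={treated:.3f}  log2FC={log2_fc:.2f}")
```

Because the pulse is short, most sites are far from complete digestion, so any slowdown translates directly into less released peptide in the ligand-treated sample.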

In [None]:
%load_ext autoreload
%autoreload 2

import logging
import tempfile

from alphabase.tools.data_downloader import DataShareDownloader
from alphatools import io  # alphatools input/output module (readers that return AnnData objects)

logger = logging.getLogger(__name__)



### Preparing the dataset using `alphatools` loaders and the AnnData factory

We downloaded the following raw files from the study's PRIDE repository (https://www.ebi.ac.uk/pride/archive/projects/PXD034606):

- LKJ_20211007_480_Hela_stau_0uM_1.raw
- LKJ_20211007_480_Hela_stau_0uM_2.raw
- LKJ_20211007_480_Hela_stau_0uM_3.raw
- LKJ_20211007_480_Hela_stau_0uM_4.raw
- LKJ_20211007_480_Hela_stau_20uM_1.raw
- LKJ_20211007_480_Hela_stau_20uM_2.raw
- LKJ_20211007_480_Hela_stau_20uM_3.raw
- LKJ_20211007_480_Hela_stau_20uM_4.raw

We processed them with DIA-NN 2.1.0 and saved the resulting `report.parquet` file, together with the sample metadata, on our datashare. From this report we extract protein-group-, gene-, and precursor-level intensities into AnnData objects using the `alphatools.io` AnnData factory tooling (here through `io.read_psm_table`). Each resulting AnnData object holds protein-group, gene, or precursor quantities together with any number of feature-metadata columns: for example, protein groups can carry genes as secondary annotation, and precursors can carry protein groups and genes.
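The raw-file names encode the experimental design: staurosporine concentration (0 or 20 µM) and replicate number (1 to 4). DIA-NN's `Run` column typically holds these file names without the `.raw` extension, which is what makes them usable as sample IDs. The sketch below derives a sample table from them with a hypothetical helper, `parse_run_name`; the actual sample metadata file on the datashare may use different column names, so treat these columns as assumptions. In an AnnData object, a table like this would sit in `.obs`, with one row per run.

```python
import re

import pandas as pd

# Hypothetical pattern matching the run names listed above, e.g. "..._stau_20uM_3"
RUN_PATTERN = re.compile(r"_stau_(?P<conc>\d+)uM_(?P<replicate>\d+)$")


def parse_run_name(run: str) -> dict:
    """Derive sample annotations from a raw-file stem (hypothetical helper)."""
    match = RUN_PATTERN.search(run)
    if match is None:
        raise ValueError(f"Unexpected run name: {run!r}")
    conc = int(match["conc"])
    return {
        "concentration_uM": conc,
        "condition": "staurosporine" if conc > 0 else "control",
        "replicate": int(match["replicate"]),
    }


# Rebuild the eight run names (file names without ".raw")
runs = [f"LKJ_20211007_480_Hela_stau_{conc}uM_{rep}" for conc in (0, 20) for rep in range(1, 5)]

sample_metadata = pd.DataFrame(
    [parse_run_name(run) for run in runs],
    index=pd.Index(runs, name="Run"),
)
print(sample_metadata)
```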

In [None]:
report_url = "https://datashare.biochem.mpg.de/s/6piDQGm2yAEdtKQ/download"
sample_metadata_url = "https://datashare.biochem.mpg.de/s/NGof744gWw66Mc8/download"

# Download the report to a temporary directory and load it from there.
# The directory is deleted when the `with` block exits, so all reading happens inside it.
with tempfile.TemporaryDirectory() as temp_dir:
    file_path = DataShareDownloader(url=report_url, output_dir=temp_dir).download()

    # AnnData object with protein-group-level data (MaxLFQ intensities)
    adata_protein = io.read_psm_table(
        file_paths=file_path,
        search_engine="diann",
        intensity_column="PG.MaxLFQ",
        feature_id_column="Protein.Group",
        sample_id_column="Run",
        secondary_feature_columns=["genes"],
    )

    # AnnData object with gene-level data (MaxLFQ intensities)
    adata_gene = io.read_psm_table(
        file_paths=file_path,
        search_engine="diann",
        intensity_column="Genes.MaxLFQ",
        feature_id_column="Genes",
        sample_id_column="Run",
        secondary_feature_columns=["proteins"],
    )

    # AnnData object with precursor-level data (normalized precursor intensities)
    adata_precursor = io.read_psm_table(
        file_paths=file_path,
        search_engine="diann",
        intensity_column="Precursor.Normalised",
        feature_id_column=["Precursor.Id", "Genes", "Protein.Group"],
        sample_id_column="Run",
        secondary_feature_columns=["genes", "sequence"],
    )

/var/folders/2l/hhd_z4hx3070zw8rlj4c3l940000gn/T/tmpjf7x973v/report.parquet does not yet exist
/var/folders/2l/hhd_z4hx3070zw8rlj4c3l940000gn/T/tmpjf7x973v/report.parquet successfully downloaded (70.8467435836792 MB)
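The goal of this notebook is differential expression between the 20 µM staurosporine and 0 µM control runs. As a rough, library-agnostic sketch of what such a comparison computes per feature (a generic Welch t-test swapped in for illustration, not `alphatools`' own implementation), the hypothetical function `pelsa_differential` takes a samples × features intensity matrix, laid out like an AnnData `.X`, plus a boolean treatment mask, and returns log2 fold changes and p-values. The intensities are made up; the first precursor is given a roughly fourfold drop under treatment to mimic PELSA stabilization.

```python
import numpy as np
from scipy import stats


def pelsa_differential(intensities: np.ndarray, is_treated: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Per-feature log2 fold change (treated minus control) and Welch t-test p-values.

    intensities: samples x features matrix of linear-scale, strictly positive intensities.
    is_treated: boolean mask over samples (True = ligand-treated).
    """
    log_x = np.log2(intensities)
    treated, control = log_x[is_treated], log_x[~is_treated]
    log2_fc = treated.mean(axis=0) - control.mean(axis=0)
    _, p_values = stats.ttest_ind(treated, control, axis=0, equal_var=False)
    return log2_fc, p_values


# Made-up intensities: 4 control runs, then 4 staurosporine runs; 3 precursors (columns)
intensities = np.array([
    [1000.0, 500.0, 800.0],
    [1100.0, 520.0, 790.0],
    [950.0, 480.0, 810.0],
    [1050.0, 510.0, 805.0],
    [250.0, 505.0, 795.0],
    [270.0, 495.0, 815.0],
    [240.0, 515.0, 800.0],
    [260.0, 490.0, 785.0],
])
is_treated = np.array([False] * 4 + [True] * 4)

log2_fc, p_values = pelsa_differential(intensities, is_treated)

# PELSA-stabilized candidates show up as significantly *down*regulated features
stabilized = (log2_fc < -1) & (p_values < 0.05)
print(np.round(log2_fc, 2), p_values, stabilized)
```

A real analysis would additionally need to handle missing values (log2 of zero or NaN), correct for multiple testing, and is usually run at precursor level, where PELSA's local stabilization is visible.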


With our different AnnData objects in hand, we can look at 