
\section{Keynote presentations}

\subsection{Alexey Nesvizhskii - Open Search: Open-minded Exploration of Proteomics Data}

Alexey Nesvizhskii has been fascinated by the ``dark proteome'' since entering the field of proteomics in 2001: what are all the spectra that cannot be identified in a typical search, and how can efficient computational strategies be designed to move beyond standard peptide identification searches? To this end, his group designed a highly efficient fragment ion indexing algorithm and implemented it in MSFragger~\citep{pmid28394336}, which has become a widely used tool and the engine behind his group's FragPipe computational platform. MSFragger ``open'' and ``mass offset'' searches have enabled new strategies for faster and more sensitive identification of biologically or chemically modified peptides. In his presentation, he described recent improvements in the group's algorithms, including localization-aware open search~\citep{pmid32792501} and new methods for the identification of N- and O-linked glycopeptides~\citep{pmid33020657}, labile PTMs~\citep{pmid37004988}, and chemically labeled peptides in chemoproteomics experiments~\citep{pmid37438360}. He also provided an overview of the quantification workflows (DIA, TMT, LFQ-MBR) available in FragPipe (\url{https://fragpipe.nesvilab.org/}).


\subsection{Maximilian Strauss - Towards the next generation of proteomics analysis with the AlphaPept ecosystem}

Like other omics fields, mass spectrometry (MS)-based proteomics faces ever-increasing amounts of data, demanding swift and effective analysis tools. Drawing on recent advances in computer science, Maximilian Strauss presented AlphaPept \citep{Strauss2024}, a Python framework for rapid processing of high-resolution MS data. By using Numba for just-in-time compilation, AlphaPept achieves a hundred-fold increase in processing speed over native Python while still leveraging Python's scientific ecosystem and machine learning innovations. Its Jupyter Notebooks facilitate community contributions, while robust engineering practices ensure reliability. AlphaPept can process proteomes within minutes and supports automated pipelines, either through a user-friendly interface or as a library, with its codebase openly available on GitHub. Building on the same design principles, the developers have since expanded it into the AlphaPept ecosystem, which now offers a suite of libraries covering stages from raw data processing to advanced deep learning predictions and downstream statistical and machine learning analyses for biomarker discovery.

\subsection{Mathias Wilhelm - The DOME Recommendations for Machine Learning Exemplified on Prosit}

In light of rising concerns over a reproducibility crisis in machine and deep learning \citep{pmid35883008}, Mathias Wilhelm (Computational Mass Spectrometry, Technical University of Munich, Freising, Germany) presented DOME (data, optimization, model and evaluation), a set of community-wide recommendations for reporting supervised machine learning-based analyses \citep{pmid34316068}, and its interpretation in proteomics research \citep{pmid35119864}. He illustrated the recommendations using the development of Prosit, a neural network from his group that predicts chromatographic retention time and fragment ion intensities. As machine and deep learning are seeing rapid adoption in proteomics, best practices and reporting standards like DOME are good starting points for scientists, as authors and reviewers, to ensure high-quality research. The presentation ended with a call to the community: let's learn from each other instead of struggling alone.

\subsection{Anna Susmelj - Predicting physicochemical properties of peptides using deep learning}

Anna Susmelj's presentation explored domain generalization in machine learning, focusing on strategies that improve model adaptability across diverse datasets and robustness to domain shift. The talk addressed methods to mitigate performance degradation when models encounter previously unseen domains and highlighted their practical significance. It also discussed common pitfalls and testing strategies for AI models in various domains, with the aim of developing robust and versatile machine learning systems tailored to mass spectrometry applications.


\subsection{Pedro Beltrao - Towards a structurally resolved human protein interaction network}

All cellular functions are governed by complex molecular machines that assemble through protein-protein interactions. Their tissue or cell type specificity and atomic details are critical to the study of their molecular and cellular mechanisms, but fewer than 5\% of hundreds of thousands of human interactions have been structurally characterized \citep{pmid23399932} and their tissue specificity is essentially unknown. Pedro Beltrao (ETH Zurich) discussed the potential and limitations of recent progress in deep-learning methods using AlphaFold2~\citep{AlphaFold2} to predict structures for human interactions~\citep{pmid36690744}. Higher-confidence AlphaFold2 models were enriched in interactions supported by affinity purification or structure-based methods and can be orthogonally confirmed by spatial constraints derived from cross-link data. In addition, he presented several examples of how the predicted binary complexes can be used to build larger assemblies. To study tissue specificity, he showed how large-scale protein abundance measurements across samples can be used to predict tissue-specific protein interactions \citep{pmid29032074, pmid28854368}. Protein co-variation is a stronger predictor of protein interactions than mRNA co-expression and can be used to build tissue-specific protein interaction networks. These tissue-specific networks can be used to link trait- or disease-associated genes to specific tissues and can serve as a resource for studying tissue-specific differences in cell biology and disease.