This repository contains the source files and supplementary information for the implementations and use cases presented in the work:
Gabriel Cabas-Mora1, Anamaría Daza2, Lindybeth Sarmiento-Varón3, Diego Alvarez3,4, Valentina Garrido1, Julieta H. Sepúlveda5, Roberto Uribe-Paredes1, Álvaro Olivera-Nappa2, Mehdi D. Davari6, Marcelo Navarrete3,4 and David Medina-Ortiz1,2*.
PeptipediaDB: peptide sequence database and user-friendly web platform. A major update.
1Departamento de Ingeniería en Computación, Universidad de Magallanes, Av. Pdte. Manuel Bulnes 01855, 6210427, Punta Arenas, Chile.
2Centre for Biotechnology and Bioengineering, CeBiB, Universidad de Chile, Avenida Beauchef 851, 8320000, Santiago, Chile.
3Centro Asistencial de Docencia e Investigación, CADI, Universidad de Magallanes, Av. Los Flamencos 01364, 6210005, Punta Arenas, Chile.
4Escuela de Medicina, Universidad de Magallanes, Av. Pdte. Manuel Bulnes 01855, 6210427, Punta Arenas, Chile.
5Facultad de Ciencias de la Salud, Universidad de Magallanes, Av. Pdte. Manuel Bulnes 01855, 6210427, Punta Arenas, Chile.
6Department of Bioorganic Chemistry, Leibniz Institute of Plant Biochemistry, Weinberg 3, 06120, Halle, Germany.
*Corresponding author
Peptides have gained greater relevance in recent years thanks to their therapeutic properties. The increase in the production and synthesis of peptides has resulted in a large volume of data, allowing the generation of databases and information repositories. Significant advances in sequencing techniques and artificial intelligence have been aimed at accelerating peptide design. However, applying these techniques requires versatile and constantly updated storage systems, along with tools that facilitate peptide research and the application of machine learning techniques for building predictive systems. In this work, we present a significant update of our Peptipedia database, increasing by more than 45% the sequences with experimentally validated biological activity and including more than 3.9 million peptides with biological activity predicted through machine learning models. All peptide sequences are described using physicochemical, thermodynamic, structural, and ontological descriptors. Finally, the platform incorporates peptide description tools based on structural and ontological properties and predictive models of properties relevant to peptide design, and adds more than 70 binary biological classification models along with a moonlight effect estimation system. This new Peptipedia version represents the most significant public repository of peptides and facilitates the study of peptides as support for biotechnological research. Peptipedia is publicly accessible at https://peptipedia.cl/ for non-commercial use licensed under the MIT License.
This web application was implemented using a client-server architecture. The frontend and backend folders each contain their own README.md with requirements and installation instructions.
The Peptipedia database is located in this Google Drive folder.
The detailed data processing is found in this repository.