# Bio Datasets

To contribute, make changes to `bio_datasets.csv` and run `python create_readme.py`.

No. | Dataset name | Link | Short Description | API available | TAGS |
---|---|---|---|---|---|
1 | RCSB Protein Data Bank (PDB) | https://www.rcsb.org/ | A comprehensive database for the three-dimensional structural data of large biological molecules, including proteins and nucleic acids. | Yes | dna, proteins, rna, small_molecules |
2 | PubChem | https://pubchem.ncbi.nlm.nih.gov/ | A database of chemical molecules and their activities against biological assays, containing information on small molecules, nucleotides, and carbohydrates. | Yes | interactions, small_molecules |
3 | UniProt | https://www.uniprot.org/ | A comprehensive resource for protein sequence and annotation data, providing information about the function and structure of proteins. | Yes | proteins |
4 | The Human Protein Atlas | https://www.proteinatlas.org/ | An interactive database providing high-resolution insights into the spatial distribution of proteins in human tissues and cells. | No | proteins |
5 | BindingDB | https://www.bindingdb.org/rwd/bind/index.jsp | BindingDB is a public, web-accessible database of measured binding affinities, focusing chiefly on the interactions of proteins considered to be candidate drug-targets with ligands that are small, drug-like molecules. | Yes | proteins, small_molecules |
6 | DrugBank | https://go.drugbank.com/ | A unique bioinformatics and cheminformatics resource that combines detailed drug data with comprehensive drug target information. | Yes with registration | drugs, interactions |
7 | KEGG: Kyoto Encyclopedia of Genes and Genomes | https://www.genome.jp/kegg/ | A collection of databases dealing with genomes, biological pathways, diseases, drugs, and chemical substances. | Limited | drugs, genes, glycans, human_diseases, interactions, pathways, proteins, small_molecules |
8 | STRING | https://string-db.org/ | A database of known and predicted protein-protein interactions, including direct (physical) and indirect (functional) associations. | Yes | interactions, proteins |
9 | NCBI Gene Expression Omnibus (GEO) | https://www.ncbi.nlm.nih.gov/geo/ | A public repository that archives and freely distributes comprehensive sets of microarray, next-generation sequencing, and other forms of high-throughput functional genomic data. | Yes | genes |
10 | ChEMBL | https://www.ebi.ac.uk/chembl/ | A manually curated database of bioactive molecules with drug-like properties, bringing together chemical, bioactivity, and genomic data. | Yes | interactions, small_molecules |
11 | Ensembl | https://www.ensembl.org/ | A comprehensive source of genomic information, integrating genomic, transcriptomic, proteomic, genetic, and other data. | Yes | genes |
12 | Reactome | https://reactome.org/ | A free, open-source, curated and peer-reviewed pathway database that provides insights into molecular processes and pathways in human biology. | Yes | interactions, pathways |
13 | Gene Ontology Consortium | http://geneontology.org/ | A major bioinformatics initiative that provides a controlled vocabulary to describe gene and gene product attributes in any organism. | Yes | gene_ontology, pathways |
14 | Human Metabolome Database (HMDB) | https://hmdb.ca/ | A richly annotated resource that offers detailed information about small molecule metabolites found in the human body. | Yes | small_molecules |
15 | InterPro | https://www.ebi.ac.uk/interpro/ | A database that provides predictive information about protein families, domains, and functional sites. | Yes | |
16 | Pfam | https://pfam.xfam.org/ | A comprehensive database of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs). | No | proteins |
17 | The Cancer Genome Atlas (TCGA) | https://www.cancer.gov/tcga | A comprehensive and coordinated effort to accelerate the understanding of the molecular basis of cancer through genome analysis techniques. | Limited | cancer, drugs, genes, human_diseases, treatments |
18 | Allen Brain Atlas | https://www.brain-map.org/ | A growing collection of online public resources integrating extensive gene expression, connectivity, and histology data, with the aim to further our understanding of the brain. | Yes | cells, genes, rna |
19 | GlyTouCan | https://glytoucan.org/ | The international glycan structure repository, which provides a platform for the registration of glycan (sugar chain) structure information. | Yes | glycans |
20 | The Zebrafish Information Network (ZFIN) | https://zfin.org/ | The premier database for zebrafish genetic, genomic, developmental, and physiological information. | Yes | gene_ontology, genes, human_diseases, proteins |
21 | FlyBase | http://flybase.org/ | A comprehensive database for information on the genetics and molecular biology of Drosophila (fruit flies). | Yes | gene_ontology, genes, human_diseases, pathways, proteins |
22 | WormBase | https://www.wormbase.org/ | A database for biology and genome information for the nematode model organism, C. elegans, and related species. | Yes | cells, gene_ontology, genes, human_diseases, pathways, proteins, rnai |
23 | Mouse Genome Informatics (MGI) | http://www.informatics.jax.org/ | A comprehensive resource for data on the laboratory mouse, integrating genetic, genomic, and biological data. | Yes | genes, human_diseases, pathways, proteins |
24 | YeastMine | https://yeastmine.yeastgenome.org/yeastmine/begin.do | A data warehouse for the budding yeast Saccharomyces cerevisiae, providing access to gene, protein, and network data. | Yes | genes, human_diseases, interactions, pathways, proteins |
25 | BRENDA | https://www.brenda-enzymes.org/ | A comprehensive enzyme information system providing data on enzyme nomenclature, structure, function, and related properties. | Yes | genes, interactions, ligands, proteins |
26 | TAIR (The Arabidopsis Information Resource) | https://www.arabidopsis.org/ | A database of genetic and molecular biology data for the model higher plant Arabidopsis thaliana. | Yes with registration | dna, genes, proteins |
27 | ArrayExpress | https://www.ebi.ac.uk/arrayexpress/ | A repository for functional genomics experiments including gene expression where you can query and download data collected to MIAME and MINSEQE standards. | Yes | experiments, genes, proteins |
28 | Europe PMC | https://europepmc.org/ | A free, comprehensive database of life science and biomedical literature. | Yes | articles |
29 | dbSNP (Database of Single Nucleotide Polymorphisms) | https://www.ncbi.nlm.nih.gov/snp/ | A central repository for both single base nucleotide substitutions and short deletion and insertion polymorphisms. | Yes | genes |
30 | miRBase | http://www.mirbase.org/ | A database of published miRNA sequences and annotations, providing information on microRNA biology. | Yes | genes, rna |
31 | GTEx Portal | https://gtexportal.org/home/ | Provides data on gene expression and regulation in multiple human tissues, facilitating studies on the relationship between genotype and phenotype. | Yes | cells, genes |
32 | BioGRID | https://thebiogrid.org/ | A resource for studying protein-protein and genetic interactions in multiple organisms, including humans, yeast, flies, and worms. | Yes | interactions, proteins |
33 | GenBank | https://www.ncbi.nlm.nih.gov/genbank/ | The NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. | Yes | genes |
34 | SILVA | https://www.arb-silva.de/ | A comprehensive database of high-quality ribosomal RNA sequence data, supporting research in the phylogeny and taxonomy of microbial and other organisms. | Yes | genes, rna |
35 | ENCODE (Encyclopedia of DNA Elements) | https://www.encodeproject.org/ | A project that aims to catalog all the functional elements in the human genome, including regions of transcription, transcription factor association, chromatin structure, and histone modification. | Yes | functions, genes, pathways, rna |
36 | EMBL-EBI Metabolights | https://www.ebi.ac.uk/metabolights/ | A resource for metabolomics experiments and derived information, hosting a wide range of metabolomics data including raw and processed data, metabolite structures, and bioinformatics analyses. | Yes | experiments, pathways, reactions, small_molecules |
37 | PharmGKB | https://www.pharmgkb.org/ | A knowledge base that collects, curates, and disseminates information about the impact of human genetic variation on drug response. | Yes | drugs, genes, human_diseases, pathways, treatments |
38 | The Cancer Imaging Archive (TCIA) | https://www.cancerimagingarchive.net/ | A service providing access to a large archive of medical images of cancer, available for public download. | Yes | cancer, imaging |
39 | RxRx3 | https://www.rxrx.ai/rxrx3 | RxRx3 is a publicly available map of biology that represents a small subset – less than 1% – of Recursion’s total dataset. | Yes with registration | cells, genes, imaging |
40 | ImmPort (Immunology Database and Analysis Portal) | https://www.immport.org/ | A repository of data from diverse immunology studies, including vaccine trials, infectious disease research, and autoimmune diseases. | Yes | articles, experiments, human_diseases |
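
The contribution step above regenerates this table from `bio_datasets.csv`. The contents of `create_readme.py` are not shown here, so the following is only a minimal sketch of how such a script might work, using the standard library. The column names in `COLUMNS`, the `No.` index header, and the function names `render_table`, `render_readme`, and `write_readme` are assumptions for illustration and may not match the real script or CSV.

```python
import csv
import io

# Assumed column names; the real bio_datasets.csv may use different headers.
COLUMNS = ["Dataset name", "Link", "Short Description", "API available", "TAGS"]


def render_table(rows):
    """Render an iterable of dict rows as a pipe-delimited markdown table."""
    lines = [
        "| No. | " + " | ".join(COLUMNS) + " |",
        "|" + "---|" * (len(COLUMNS) + 1),
    ]
    for number, row in enumerate(rows, start=1):
        # Missing values become empty cells; literal pipes are escaped
        # so they cannot split a cell in two.
        cells = [(row.get(col) or "").strip().replace("|", "\\|") for col in COLUMNS]
        lines.append(f"| {number} | " + " | ".join(cells) + " |")
    return "\n".join(lines)


def render_readme(csv_text):
    """Build the full README text from the CSV contents."""
    rows = csv.DictReader(io.StringIO(csv_text))
    header = (
        "# Bio Datasets\n\n"
        "To contribute, make changes to `bio_datasets.csv` and run "
        "`python create_readme.py`.\n\n"
    )
    return header + render_table(rows) + "\n"


def write_readme(csv_path="bio_datasets.csv", readme_path="README.md"):
    """Read the CSV from disk and overwrite the README with the rendered table."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        csv_text = f.read()
    with open(readme_path, "w", encoding="utf-8") as f:
        f.write(render_readme(csv_text))
```

Calling `write_readme()` from the repository root would rewrite `README.md` in place. Rendering missing values as empty cells avoids placeholder strings such as `nan` appearing in the TAGS column when a row has no tags.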