KatarinaYuan/awesome-single-cell-foundation

awesome-single-cell-foundation

Why create a repo focusing on single-cell foundation models?

Single-cell biology is increasingly intersecting with advances in artificial intelligence and machine learning. The rise of foundation models, especially large language models, suggests the potential for deeper insight into cellular mechanisms. A future foundation model of the cell that integrates molecular, spatial, and morphometric information would offer a promising route to holistic biological understanding. This convergence of technology and biology opens opportunities to improve our knowledge of cellular states and their implications in health and disease.

Quote from Fabian Theis: As modern language models enable text generation as answer to a query, we will be able to query a cellular foundation model about unseen, new cell states across health and disease.

Single cell RNA sequencing (scRNA-seq)

  • scHyena: Foundation Model for Full-Length Single-Cell RNA-Seq Analysis in Brain paper, code

  • GeneCompass: Deciphering Universal Gene Regulatory Mechanisms with Knowledge-Informed Cross-Species Foundation Model paper, code

  • Geneformer: Transfer learning enables predictions in network biology paper, code

  • scGPT: Towards Building a Foundation Model for Single-Cell Multi-omics Using Generative AI paper, code

  • scBERT: scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data paper, code

  • scFoundation: Large Scale Foundation Model on Single-cell Transcriptomics paper, code

  • tGPT: Generative pretraining from large-scale transcriptomes for single-cell deciphering paper, code

  • CellLM: Large-Scale Cell Representation Learning via Divide-and-Conquer Contrastive Learning paper, code

  • Exceiver: A single-cell gene expression language model paper, code

  • scTranslator: Training on large-scale RNA data to impute protein abundance paper, code

Atlas Model

  • SCimilarity: Scalable querying of human cell atlases via a foundational model reveals commonalities across fibrosis-associated macrophages paper, code

  • scTab: Scaling cross-tissue single-cell annotation models paper, code

Spatially Resolved Transcriptomics (SRT)

  • CellPLM: Pre-training of Cell Language Model Beyond Single Cells paper, code

Paired scATAC-seq and scRNA-seq

  • GET: a foundation model of transcription across human cell types paper, code

Multi-modality (Incorporating Text)

  • BioTranslator: Multilingual translation for zero-shot biomedical classification using BioTranslator paper, code

  • GenePT: A Simple But Hard-to-Beat Foundation Model for Genes and Cells Built From ChatGPT paper, code

  • BioMedGPT: Open Multimodal Generative Pre-trained Transformer for BioMedicine paper, code

Modeled as text

  • Cell2Sentence: Teaching Large Language Models the Language of Biology paper, code

4D Nucleome (Hi-C, Micro-C, ChIA-PET)

Enhancer activity (STARR-seq)

Transcriptome (CAGE-seq)

Epigenome (ChIP-seq)

4D Nucleome (Hi-C, Micro-C, ChIA-PET) + Enhancer activity (STARR-seq) + Transcriptome (CAGE-seq, RNA-seq) + Epigenome (ChIP-seq)

  • EPCOT: A generalizable framework to comprehensively predict epigenome, chromatin organization, and transcriptome paper, code

Single modality other than Single Cell

Biomedical Text Only

  • BioLinkBERT: Pretraining Language Models with Document Links paper, code

  • BioGPT: generative pre-trained transformer for biomedical text generation and mining paper, code

  • BioBERT: a pre-trained biomedical language representation model for biomedical text mining paper, code

  • PubMedBERT: Domain-specific language model pretraining for biomedical natural language processing paper, code

Protein Sequence Only

  • ESM-2: Language models of protein sequences at the scale of evolution enable accurate structure prediction paper, code

  • ProtTrans: Toward understanding the language of life through self-supervised learning paper, code

  • ProGen: Large language models generate functional protein sequences across diverse families paper, code

  • ProtGPT2: ProtGPT2 is a deep unsupervised language model for protein design paper, code

  • ZymCTRL: a conditional language model for the controllable generation of artificial enzymes paper, code

  • RITA: a Study on Scaling Up Generative Protein Sequence Models paper, code

DNA Sequence Only

  • GPN: DNA language models are powerful predictors of genome-wide variant effects paper, code

  • Enformer*: Effective gene expression prediction from sequence by integrating long-range interactions paper, code

  • DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome paper, code

  • Nucleotide Transformer: The nucleotide transformer: Building and evaluating robust foundation models for human genomics paper, code

  • HyenaDNA: Long-range genomic sequence modeling at single nucleotide resolution paper, code

  • GenSLMs: Genome-scale language models reveal SARS-CoV-2 evolutionary dynamics paper, code

  • gLM: Genomic language model predicts protein co-regulation and function paper, code

RNA Sequence Only

  • RNA-FM: Interpretable RNA Foundation Model from Unannotated Data for Highly Accurate RNA Structure and Function Predictions paper, code

Small Molecule Only

  • Graphium: Towards Foundational Models for Molecular Learning on Large-Scale Multi-Task Datasets paper, code

Benchmarking

  • Assessing the limits of zero-shot foundation models in single-cell biology paper, code

  • scEval: Evaluating the Utilities of Large Language Models in Single-cell Data Analysis paper, code

  • A Deep Dive into Single-Cell RNA Sequencing Foundation Models paper, code

  • Foundation Models Meet Imbalanced Single-Cell Data When Learning Cell Type Annotations paper, code
