<a href="https://www.kaggle.com/code/digitalbro/multimodal-sc-integration-meta-resources?scriptVersionId=104112207" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

The aim of this notebook is to create a central log (a meta resource) of the information and resources I've come across in the discussions, in other notebooks, and in my own exploration. Kaggle has many nooks where information can be stored, and I find it easiest to keep track of it all in one place.

This notebook aggregates important papers and resources that should help in understanding how one might approach this challenge. The sources these resources were taken from are listed at the bottom.

## Experiments
### CITE-seq
![citeseq.com](https://citeseq.files.wordpress.com/2018/02/figure1.png?w=700)
 * Protein (CITE-seq): Enables leveraging the legacy markers that immunologists have used over the last decades to define cell spectrums
 
### ATAC-seq
 
 * ATAC-seq: Key to defining immune cell states and transitional states, which are best defined by up- and down-regulation of critical transcription factors (usually poorly captured transcriptionally)
 * RNA-seq: Key to defining cell type/state identity through modules of uniquely expressed genes
 

# Dictionary 
* ATAC: Assay for transposase-accessible chromatin
* CITE-seq: Cellular indexing of transcriptomes and epitopes by sequencing


# Papers
* [Integrated analysis of multimodal single-cell data](https://www.sciencedirect.com/science/article/pii/S0092867421005833)
* [New horizons in the stormy sea of multimodal single-cell data integration](https://www.sciencedirect.com/science/article/abs/pii/S1097276521010741)
* [Computation principles and challenges in single-cell data integration](https://www.nature.com/articles/s41587-021-00895-7)
* [Diagonal integration of multimodal single-cell data: potential pitfalls and paths forward](https://www.nature.com/articles/s41467-022-31104-x)
* [Bi-order multimodal integration of single-cell data](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-022-02679-x)
* [Multimodal single-cell approaches shed light on T cell heterogeneity](https://www.sciencedirect.com/science/article/pii/S0952791519300469)
* [Cobolt: integrative analysis of multimodal single-cell sequencing data](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02556-z)
* [Human haematopoietic stem cell lineage commitment is a continuous process](https://www.nature.com/articles/ncb3493)
* [Normalizing and denoising protein expression data from droplet-based single cell profiling](https://www.nature.com/articles/s41467-022-29356-8)
* [BABEL enables cross-modality translation between multiomic profiles at single-cell resolution](https://pubmed.ncbi.nlm.nih.gov/33827925/)
* [Current best practices in single-cell RNA-seq analysis: a tutorial](https://www.embopress.org/doi/full/10.15252/msb.20188746)
## Preprints
* [Computational challenges in cell cycle analysis using single cell transcriptomics](https://arxiv.org/abs/2208.05229) 
* [Multimodal single-cell chromatin analysis with Signac](https://www.biorxiv.org/content/10.1101/2020.11.09.373613v1.abstract) 
* [MultiVI: deep generative model for the integration of multimodal-data](https://www.biorxiv.org/content/10.1101/2021.08.20.457057v1)

# Experimental Details
* Cell Lines Used: https://allcells.com/research-grade-tissue-products/mobilized-leukopak/
* Multiome - ATAC + Gene Expression: https://www.10xgenomics.com/products/single-cell-multiome-atac-plus-gene-expression
    * Chromatin accessibility to predict gene expression
* CITE-seq - Single Cell Gene Expression: https://support.10xgenomics.com/permalink/getting-started-single-cell-gene-expression-with-feature-barcoding-technology
    * Cell Surface Reagent - https://www.biolegend.com/en-gb/products/totalseq-b-human-universal-cocktail-v1dot0-20960

# Additional Information 
* [EBI Ensembl ID Information](https://www.ebi.ac.uk/training/online/courses/ensembl-browsing-genomes/navigating-ensembl/investigating-a-gene/#:~:text=Ensembl%20gene%20IDs%20begin%20with,of%20species%20other%20than%20human)
* [Eleven Grand Challenges in Single-Cell Data Science](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1926-6)
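
The Ensembl page linked above explains how stable IDs are put together. As a small illustration, assuming the usual layout (`ENS`, an optional species prefix that is absent for human, a feature letter, eleven digits, and an optional `.version` suffix), a sketch for telling human gene IDs apart from other identifiers might look like this. The function names `parse_ensembl_id` and `is_human_gene` are made up for this example, and the species-prefix pattern is a simplification rather than a full specification.

```python
import re

# Simplified stable-ID pattern (an assumption, not the official grammar):
# "ENS" + optional 3-4 letter species prefix + feature letter + 11 digits,
# optionally followed by ".<version>". Human IDs carry no species prefix.
ENSEMBL_ID = re.compile(r"^ENS([A-Z]{3,4})?(G|T|P|E)(\d{11})(?:\.(\d+))?$")

FEATURES = {"G": "gene", "T": "transcript", "P": "protein", "E": "exon"}


def parse_ensembl_id(stable_id):
    """Split a stable ID into its parts, or return None if it does not match."""
    match = ENSEMBL_ID.match(stable_id)
    if match is None:
        return None
    species, feature, number, version = match.groups()
    return {
        "species_prefix": species or "",  # empty string means human
        "feature": FEATURES[feature],
        "number": number,
        "version": int(version) if version else None,
    }


def is_human_gene(stable_id):
    """True only for IDs of the form ENSG + 11 digits (version optional)."""
    parsed = parse_ensembl_id(stable_id)
    return (
        parsed is not None
        and parsed["species_prefix"] == ""
        and parsed["feature"] == "gene"
    )
```

A check like this can be handy for separating Ensembl IDs from gene symbols when column names mix the two.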

# Kaggle Notebooks
* [scRNA-seq 🧬: Differential Expression with scVI](https://www.kaggle.com/code/hiramcho/scrna-seq-differential-expression-with-scvi/notebook)
* [scRNA-seq 🧬: Scanpy & SCMER for Feature Selection](https://www.kaggle.com/code/hiramcho/scrna-seq-scanpy-scmer-for-feature-selection/notebook)
* [scRNA-seq 🧬: scGAE with Spektral and RAPIDS](https://www.kaggle.com/code/hiramcho/scrna-seq-scgae-with-spektral-and-rapids/notebook)
* [scATAC-seq 🧬: Feature Importance with TabNet](https://www.kaggle.com/code/hiramcho/scatac-seq-feature-importance-with-tabnet/notebook)
* [scATAC-seq 🧬: EpiScanpy & PeakVI](https://www.kaggle.com/code/hiramcho/scatac-seq-episcanpy-peakvi)


# External Notebooks / Packages 
* [KNN Solution](https://github.com/adavoudi/msci_knn)

# Learning Resources
* [MIA: Multimodal Single-cell data, open benchmarks, and a NeurIPS 2021](https://www.biolegend.com/en-gb/products/totalseq-b-human-universal-cocktail-v1dot0-20960) - *video* 
* [Open Problems in Single Cells Analysis](https://openproblems.bio/neurips_docs/data/about_multimodal/)
    * Open problems in scAnalysis - 

# Potentially Useful Packages (Python)
* [muon](https://muon.readthedocs.io/en/latest/api/generated/muon.atac.pp.tfidf.html?highlight=tfidf) - muon is a Python framework for multimodal omics analysis.
* [scanpy](https://scanpy.readthedocs.io/en/stable/index.html) - Scanpy is a scalable toolkit for analyzing single-cell gene expression data built jointly with anndata. It includes preprocessing, visualization, clustering, trajectory inference and differential expression testing. The Python-based implementation efficiently deals with datasets of more than one million cells.
* [anndata](https://anndata.readthedocs.io/en/latest/#) - anndata is a Python package for handling annotated data matrices in memory and on disk, positioned between pandas and xarray. anndata offers a broad range of computationally efficient features including, among others, sparse data support, lazy operations, and a PyTorch interface.
* [Xarray](https://docs.xarray.dev/en/v0.9.2/dask.html) - xarray (formerly xray) is an open source project and Python package that aims to bring the labeled data power of pandas to the physical sciences, by providing N-dimensional variants of the core pandas data structures. **This will help split up the large dataset**
* [ivis](https://bering-ivis.readthedocs.io/en/latest/index.html) - ivis is a machine learning library for reducing dimensionality of very large datasets using Siamese Neural Networks. ivis preserves global data structures in a low-dimensional space, adds new data points to existing embeddings using a parametric mapping function, and scales linearly to millions of observations. The algorithm is described in detail in Structure-preserving visualisation of high dimensional single-cell datasets.
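
The muon link above points at the documentation for its TF-IDF transform for ATAC counts. To give a feel for the idea only, here is a plain-Python sketch of one common TF-IDF variant on a tiny cells-by-peaks count matrix. This is not muon's implementation (which operates on AnnData objects and has its own options); the function name `tfidf` and the exact weighting `log(1 + n_cells / cells_with_peak)` are assumptions chosen for illustration.

```python
import math


def tfidf(counts):
    """TF-IDF for a dense cells-by-peaks matrix given as a list of rows.

    Illustrative variant only: TF is the count divided by the cell's total,
    IDF is log(1 + n_cells / number of cells in which the peak is observed).
    """
    n_cells = len(counts)
    n_peaks = len(counts[0]) if counts else 0

    # Number of cells in which each peak is observed at least once.
    cells_with_peak = [
        sum(1 for row in counts if row[j] > 0) for j in range(n_peaks)
    ]
    # Peaks that are never observed get a weight of zero.
    idf = [math.log(1 + n_cells / c) if c else 0.0 for c in cells_with_peak]

    result = []
    for row in counts:
        total = sum(row)
        # Term frequency: each count as a fraction of the cell's total counts.
        result.append(
            [(x / total if total else 0.0) * w for x, w in zip(row, idf)]
        )
    return result


# Three cells, three peaks; the middle peak is never accessible.
toy_counts = [[1, 0, 1], [0, 0, 2], [1, 0, 0]]
toy_tfidf = tfidf(toy_counts)
```

Real ATAC matrices are large and sparse, so in practice a sparse-aware library routine is the better choice; the loop above is only meant to make the arithmetic visible.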

# Appendix 

## Cell Types 
* MasP = Mast Cell Progenitor
* MkP = Megakaryocyte Progenitor
* NeuP = Neutrophil Progenitor
* MoP = Monocyte Progenitor
* EryP = Erythrocyte Progenitor
* HSC = Hematopoietic Stem Cell
* BP = B-Cell Progenitor
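
The abbreviations above can also be kept as a small lookup table, for example to relabel plots or a cell-type column. This is only a convenience sketch; the names `CELL_TYPES` and `expand_cell_type` are made up for this example.

```python
# Abbreviation -> full name, copied from the list above.
CELL_TYPES = {
    "MasP": "Mast Cell Progenitor",
    "MkP": "Megakaryocyte Progenitor",
    "NeuP": "Neutrophil Progenitor",
    "MoP": "Monocyte Progenitor",
    "EryP": "Erythrocyte Progenitor",
    "HSC": "Hematopoietic Stem Cell",
    "BP": "B-Cell Progenitor",
}


def expand_cell_type(abbreviation):
    """Return the full name, or the input unchanged if it is not in the table."""
    return CELL_TYPES.get(abbreviation, abbreviation)
```

Returning unknown labels unchanged keeps the helper safe to apply to a column that may contain values not listed here.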

# Thanks 
So far I have simply aggregated information from various notebooks and discussions as a way to keep track of it all. Thanks to:
* Thomas Shelby - https://www.kaggle.com/competitions/open-problems-multimodal/discussion/344686
* Daniel Burkhardt - https://www.kaggle.com/competitions/open-problems-multimodal/discussion/344607
* Kaggle data details (the team at Cellarity) - https://www.kaggle.com/competitions/open-problems-multimodal/data
* Peter Holderrieth - https://www.kaggle.com/competitions/open-problems-multimodal/discussion/345958
* Marília Prat - https://www.kaggle.com/competitions/open-problems-multimodal/discussion/346686
* Alireza - https://www.kaggle.com/competitions/open-problems-multimodal/discussion/346894