ImmucanWP7/immucan-scdb

IMMUcan

scProcessor processes scRNAseq datasets for the IMMUcan scDB. It runs in R and is built mostly on the Seurat package. It covers:

  • Quality control
  • Measure and correct batch effects (harmony)
  • Clustering optimization
  • Supervised annotation (CHETAH)
  • CNA calling (copyKat)
  • Cell ontology (ebi.ac.uk/ols/ontologies/cl)
  • Differential expression
  • Universal output files (sceasy)

Install instructions

  • Follow the install instructions for sceasy (https://github.com/cellgeni/sceasy)
  • Get CHETAH_reference_updatedAnnotation.RData from the IMMUcan Teams channel
  • Install the following R packages:
install.packages(c("Seurat", "tidyverse", "readxl", "patchwork", "devtools", "data.table", "BiocManager", "remotes", "openxlsx", "pheatmap", "plyr", "DescTools", "future", "jsonlite"))
BiocManager::install(c("CHETAH", "SingleCellExperiment"))
devtools::install_github("mahmoudibrahim/genesorteR") 
devtools::install_github("immunogenomics/harmony")
devtools::install_github("navinlabcode/copykat")
remotes::install_github("mojaveazure/seurat-disk")
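
After installation, a quick check can confirm that nothing was skipped. The following is a minimal sketch, not part of scProcessor: `missing_packages` is a hypothetical helper, and the package names are taken from the install commands above, assuming that seurat-disk installs under the name SeuratDisk.

```r
# Packages named in the install commands above.
# Assumption: the seurat-disk repository installs as "SeuratDisk".
required <- c(
  "Seurat", "tidyverse", "readxl", "patchwork", "devtools", "data.table",
  "BiocManager", "remotes", "openxlsx", "pheatmap", "plyr", "DescTools",
  "future", "jsonlite", "CHETAH", "SingleCellExperiment", "genesorteR",
  "harmony", "copykat", "SeuratDisk", "sceasy"
)

# Hypothetical helper: return the packages that are not installed yet.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

missing <- missing_packages(required)
if (length(missing) > 0) {
  message("Still to install: ", paste(missing, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```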

Before starting

Update the following file paths in the script so that they point to your local copies:

  • cellMarker_path = PATH to TME_markerGenes.xlsx
  • chetahClassifier_path = PATH to CHETAH_reference_updatedAnnotation.RData
  • cellOntology_path = PATH to cell_ontology.xlsx
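
A wrong path usually only surfaces partway through a run, so it can help to check all three up front. This is a small sketch under stated assumptions: the variable names are the ones listed above, while the `path/to/...` values are placeholders to replace with your own locations.

```r
# Placeholder paths: replace with the real locations on your system.
cellMarker_path <- "path/to/TME_markerGenes.xlsx"
chetahClassifier_path <- "path/to/CHETAH_reference_updatedAnnotation.RData"
cellOntology_path <- "path/to/cell_ontology.xlsx"

paths <- c(
  cellMarker_path = cellMarker_path,
  chetahClassifier_path = chetahClassifier_path,
  cellOntology_path = cellOntology_path
)

# Names of the variables whose file cannot be found.
missing_paths <- names(paths)[!file.exists(paths)]

if (length(missing_paths) > 0) {
  message("File not found for: ", paste(missing_paths, collapse = ", "))
} else {
  message("All reference files found.")
}
```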

Run scProcessor

scProcessor is built around three processing scripts: check_seurat.R, scProcessor_1 and scProcessor_2.

1. check_seurat.R: check the Seurat object and estimate the batch variable

  • It takes a Seurat object as input (support for other file formats is planned).
  • This step is optional: if data.json is already filled in, you can run scProcessor_1 directly.
    1. Check the validity of the Seurat object
    2. Estimate the batch variable
    3. Return QC plots (in temp)
Rscript check_seurat.R [SEURAT] [BATCH]
  • [SEURAT]: path to the Seurat object (if the directory contains only one .rds file, the script finds it automatically)
  • [BATCH]: the batch variable; only needed if you already know it

2. data.json

  • The scProcessor Rscripts take no command-line arguments, so they read their settings from an input file, data.json. check_seurat generates this file automatically; review it before running scProcessor_1 to make sure the data will be processed the way you want.
  • Overview of the fields in data.json (NA is written as null in JSON):
    • object_path: full path where seurat object is stored
    • batch: e.g. patient
    • norm: boolean indicating if data is already normalized e.g. false
    • QC_feature_min: threshold for minimal number of detected genes per cell e.g. 250
    • QC_mt_max: threshold for maximal percentage of mitochondrial reads per cell e.g. 20
    • pca_dims: number of PCA dimensions to take for further processing e.g. 30
    • features_var: number of highly variable features to take for further processing e.g. 2000
    • nSample: number of cells to take for intense computing steps and for cellxgene.h5ad at the end e.g. 10000
    • cluster_resolution: a sequence of cluster resolutions; scProcessor selects the optimal one, e.g. 0.5, 1, 1.5
    • malignant: boolean indicating whether malignant cell prediction is necessary e.g. true
    • normal_cells: cell type used as normal cells to increase confidence in the malignant cell prediction, e.g. null (macrophages are used by default) or false (no normal cells are used)
    • annotation: columns in meta.data that contain annotation information
    • metadata: other important columns contained in the meta.data slot e.g. biopsy, sample_id, treatment ...
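
To make the field overview concrete, here is a sketch that writes such a file from R with jsonlite. The keys and example values come from the list above; the flat layout, the array form of cluster_resolution, the object path and the annotation column name `cell_type` are assumptions for illustration, so compare the result with a file generated by check_seurat before relying on it.

```r
library(jsonlite)

settings <- list(
  object_path = "/path/to/seurat_object.rds",  # placeholder path
  batch = "patient",
  norm = FALSE,
  QC_feature_min = 250,
  QC_mt_max = 20,
  pca_dims = 30,
  features_var = 2000,
  nSample = 10000,
  cluster_resolution = c(0.5, 1, 1.5),  # assumed to be stored as an array
  malignant = TRUE,
  normal_cells = NA,                    # NA is written as null
  annotation = "cell_type",             # hypothetical meta.data column
  metadata = c("biopsy", "sample_id", "treatment")
)

# auto_unbox writes length-one values as scalars rather than arrays.
json_text <- toJSON(settings, auto_unbox = TRUE, pretty = TRUE, na = "null")
cat(json_text, "\n")

# To save it next to the scripts:
# writeLines(json_text, "data.json")
```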

3. scProcessor_1: the main processing script

  1. QC
  2. Batch integration and clustering
  3. Supervised classification and CNA calling
  4. Create marker gene plots
  5. Save summary statistics in misc
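
As an illustration of what the two QC thresholds in data.json mean, the sketch below applies them to a toy per-cell table in base R. It is not the scProcessor_1 implementation: the column names follow Seurat's usual metadata names (`nFeature_RNA`, `percent.mt`), the cell values are made up, and treating both thresholds as inclusive is an assumption.

```r
# Toy per-cell QC metrics (made-up values for illustration only).
cells <- data.frame(
  cell = c("c1", "c2", "c3", "c4"),
  nFeature_RNA = c(1200, 180, 900, 3000),  # detected genes per cell
  percent.mt = c(5, 3, 35, 20),            # % mitochondrial reads
  stringsAsFactors = FALSE
)

QC_feature_min <- 250  # example value from data.json
QC_mt_max <- 20        # example value from data.json

# Assumption: a cell passes if it meets both thresholds (inclusive).
keep <- cells$nFeature_RNA >= QC_feature_min & cells$percent.mt <= QC_mt_max
passed <- cells[keep, ]
print(passed)
```

Here c2 is dropped for having too few detected genes and c3 for a high mitochondrial fraction.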

4. Annotate clusters

  • Check plots in temp/plots:
    • marker gene plots
    • dotplot
  • In out/annotation.xlsx, fill in cell types as defined in the abbreviation column of cell_ontology.xlsx

5. scProcessor_2: link to cell ontology and create all output files

  1. Links the annotation to the cell ontology
  2. Runs differential expression
  3. Creates output files for the SIB scRNAseq interface:
    • AverageExpression matrices and DE_results per annotation level
    • geneIndex.tsv
    • Metadata.tsv
    • cellCount.tsv
    • harmony.rds
    • cellxgene.h5ad

6. Create checksum file to send to SIB

On the terminal:

zip -r AML_UNB_SW_GSE116256.zip AML_UNB_SW_GSE116256
md5sum AML_UNB_SW_GSE116256.zip
mv AML_UNB_SW_GSE116256.zip AML_UNB_SW_GSE116256_-_###PASTE_MD5SUM_OUTPUT_HERE###.zip
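
To avoid pasting the checksum by hand, the rename can also be done from R. This is a sketch only: `add_md5_to_name` is a hypothetical helper, it assumes the zip file has already been created, and it uses `tools::md5sum` from base R to produce the same MD5 digest as the `md5sum` command.

```r
# Hypothetical helper: rename <name>.zip to <name>_-_<md5>.zip
add_md5_to_name <- function(zip_path) {
  stopifnot(file.exists(zip_path))
  md5 <- unname(tools::md5sum(zip_path))
  stem <- sub("\\.zip$", "", basename(zip_path))
  new_path <- file.path(dirname(zip_path), paste0(stem, "_-_", md5, ".zip"))
  if (!file.rename(zip_path, new_path)) {
    stop("Could not rename ", zip_path)
  }
  new_path
}

zip_path <- "AML_UNB_SW_GSE116256.zip"
if (file.exists(zip_path)) {
  message("Renamed to: ", add_md5_to_name(zip_path))
}
```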

Log in to SIB through sftp and transfer the renamed zip file.