IMMUcan

scProcessor processes scRNA-seq datasets for the IMMUcan scDB. It runs in R and is based mostly on the Seurat package.

- Quality control
- Measure and correct batch effect (harmony)
- Clustering optimization
- Supervised annotation (CHETAH)
- CNA calling (copyKat)
- Cell ontology (ebi.ac.uk/ols/ontologies/cl)
- Differential expression
- Universal output files (sceasy)

Install instructions

1. Follow the install instructions for sceasy (https://github.com/cellgeni/sceasy)
2. Get CHETAH_reference_updatedAnnotation.RData from the IMMUcan Teams channel
3. Install the following R packages:

install.packages(c("Seurat", "tidyverse", "readxl", "patchwork", "devtools", "data.table", "BiocManager", "remotes", "openxlsx", "pheatmap", "plyr", "DescTools", "future", "jsonlite"))
BiocManager::install(c("CHETAH", "SingleCellExperiment"))
devtools::install_github("mahmoudibrahim/genesorteR") 
devtools::install_github("immunogenomics/harmony")
devtools::install_github("navinlabcode/copykat")
remotes::install_github("mojaveazure/seurat-disk")

Before starting

Change the following file paths in the script:

cellMarker_path = PATH to TME_markerGenes.xlsx
chetahClassifier_path = PATH to CHETAH_reference_updatedAnnotation.RData
cellOntology_path = PATH to cell_ontology.xlsx
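
Before running the pipeline, it can help to confirm that the three files these variables point to actually exist. The following is a minimal sketch, not part of the repository: the helper name check_inputs is hypothetical, and the paths in the example call are placeholders to replace with your own.

```shell
# Report every reference file that is missing.
# Arguments: one or more file paths.
# Returns 0 if all files exist, 1 if at least one is missing.
check_inputs() {
  local missing=0
  local path
  for path in "$@"; do
    if [ ! -f "$path" ]; then
      echo "missing: $path" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example call (placeholder paths):
# check_inputs /path/to/TME_markerGenes.xlsx \
#              /path/to/CHETAH_reference_updatedAnnotation.RData \
#              /path/to/cell_ontology.xlsx
```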

Run scProcessor

The core of scProcessor consists of three processing scripts.

1. check_seurat.R: check seurat object and estimate batch

It takes a Seurat object as input (support for other file formats is planned).
This step is optional: if data.json is already filled in, you can run scProcessor_1 directly.
1. Check validity of seurat object
2. Estimate batch variable
3. Return QC plots (in temp)

Rscript check_seurat.R [SEURAT] [BATCH]

[SEURAT]: path to the Seurat object (if the directory contains only one .rds file, the script finds it automatically)
[BATCH]: the batch variable; only needed when you already know it

2. data.json

The scProcessor scripts run without command-line arguments, so they need an input file that specifies these variables. check_seurat generates this file automatically; review it to make sure scProcessor_1 processes the data the way you want.
Overview of data.json (NA is written as null in JSON):
- object_path: full path where seurat object is stored
- batch: e.g. patient
- norm: boolean indicating if data is already normalized e.g. false
- QC_feature_min: threshold for minimal number of detected genes per cell e.g. 250
- QC_mt_max: threshold for maximal percentage of mitochondrial reads per cell e.g. 20
- pca_dims: number of PCA dimensions to take for further processing e.g. 30
- features_var: number of highly variable features to take for further processing e.g. 2000
- nSample: number of cells to take for intense computing steps and for cellxgene.h5ad at the end e.g. 10000
- cluster_resolution: a sequence of cluster resolutions; scProcessor selects the optimal one e.g. 0.5, 1, 1.5
- malignant: boolean indicating whether malignant cell prediction is needed e.g. true
- normal_cells: cell type used as normal cells to increase the confidence of malignant cell prediction e.g. null (macrophages are used by default), false (no normal cells are used)
- annotation: columns in meta.data that contain annotation information
- metadata: other important columns contained in the meta.data slot e.g. biopsy, sample_id, treatment ...
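
If you skip check_seurat and fill in data.json by hand, the fields above could be written out as in the sketch below. The values are the examples from the overview. The value types are assumptions (arrays for cluster_resolution and metadata, null as a placeholder for annotation), object_path is a placeholder, and the helper name write_example_data_json is hypothetical. The data_example.json file in the repository remains the authoritative template.

```shell
# Write an example data.json to the given path (default: ./data.json).
# Field names come from the overview; value types are assumptions.
write_example_data_json() {
  local out="${1:-data.json}"
  cat > "$out" <<'EOF'
{
  "object_path": "/path/to/seurat_object.rds",
  "batch": "patient",
  "norm": false,
  "QC_feature_min": 250,
  "QC_mt_max": 20,
  "pca_dims": 30,
  "features_var": 2000,
  "nSample": 10000,
  "cluster_resolution": [0.5, 1, 1.5],
  "malignant": true,
  "normal_cells": null,
  "annotation": null,
  "metadata": ["biopsy", "sample_id", "treatment"]
}
EOF
}

# Example call:
# write_example_data_json data.json
```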

3. scProcessor_1: the main processing script

- QC
- Batch integration and clustering
- Supervised classification and CNA calling
- Create marker gene plots
- Save summary statistics in misc

4. Annotate clusters

Check plots in temp/plots:
- marker gene plots
- dotplot
In out/annotation.xlsx, fill in cell types as defined in the abbreviation column of cell_ontology.xlsx

5. scProcessor_2: link to cell ontology and create all output files

- Links cell ontology
- Differential expression
- Creates output files for the SIB scRNAseq interface:
  - AverageExpression matrices and DE_results per annotation level
  - geneIndex.tsv
  - Metadata.tsv
  - cellCount.tsv
  - harmony.rds
  - cellxgene.h5ad

6. Create checksum file to send to SIB

On the terminal:

zip -r AML_UNB_SW_GSE116256.zip AML_UNB_SW_GSE116256
md5sum AML_UNB_SW_GSE116256.zip
mv AML_UNB_SW_GSE116256.zip AML_UNB_SW_GSE116256_-_###PASTE_MD5SUM_OUTPUT_HERE###.zip
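
Pasting the checksum by hand is easy to get wrong, so the rename could be scripted. This is a minimal sketch, assuming md5sum from GNU coreutils is available. The helper name rename_with_md5 is hypothetical and not part of the repository.

```shell
# Rename <name>.zip to <name>_-_<md5>.zip and print the new file name.
# Returns non-zero if the archive does not exist.
rename_with_md5() {
  local archive="$1"
  [ -f "$archive" ] || return 1
  local checksum
  # md5sum prints "<hash>  <file>"; keep only the hash.
  checksum=$(md5sum "$archive" | cut -d ' ' -f 1)
  local renamed="${archive%.zip}_-_${checksum}.zip"
  mv "$archive" "$renamed" && echo "$renamed"
}

# Typical use, with the dataset directory from the example:
# zip -r AML_UNB_SW_GSE116256.zip AML_UNB_SW_GSE116256
# rename_with_md5 AML_UNB_SW_GSE116256.zip
```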

Log in to SIB through sftp and transfer the file.


ImmucanWP7/immucan-scdb
