Codebook for analyses in Bosco et al. 2023:
KaryoCreate: A CRISPR-based technology to study chromosome-specific aneuploidy by targeting human centromeres. Cell 2023. https://doi.org/10.1016/j.cell.2023.03.029
A modified version of the CopyKat v1.0.5 pipeline for R was used to generate a somatic copy number alteration (SCNA) score for each chromosome or chromosome arm in each single cell.
Single-cell RNA-seq data have been deposited at GEO and are publicly available as of the date of publication through GEO Series accession number GSE215842.
Hashtag and sample metadata for experiment sets 01 and 02 can be found in /metadata/.
Scripts can be found in the /scripts/ folder.
Software dependencies:
- 10X Genomics cellranger version 6+
- R version 4.1+
R package dependencies:
- argparse
- SeuratObject
- biomaRt
- dplyr
- tidyr
- stringr
- knitr
- kableExtra
- here
- ggplot2
- HGNChelper
- copykat v1.0.5
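Before starting, it can help to confirm that the two command-line tools listed above are reachable. The following is a minimal sketch, not part of the repository: the helper name missing_tools is hypothetical, and the check only tests for presence on the PATH, not for the minimum versions.

```shell
# Print the name of every tool given as an argument that is not found on the PATH.
# (missing_tools is an illustrative helper, not a script shipped with this repository.)
missing_tools() (
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
)

missing=$(missing_tools cellranger Rscript)
if [ -n "$missing" ]; then
  # Word splitting is intentional here so the names print on one line.
  echo "Not found on PATH:" $missing
else
  echo "cellranger and Rscript found; confirm versions with 'cellranger --version' and 'Rscript --version'."
fi
```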
10X Genomics libraries were sequenced on an Illumina sequencer. The output is a series of .fastq files for both the gene expression (GEX) library and the hashtag oligo (HTO) library. cellranger count from 10X's cellranger package is used to align the sequencing reads for the libraries and deconvolute single cells. Example library and feature reference files can be found in the /metadata/ folder. The HTO-ref.csv file is set up for the 10 TotalSeqB Human Hashtag sequences from BioLegend.
cellranger count --transcriptome /path/to/reference/genome --id experimentName --libraries fastq-libraries.csv --feature-ref HTO-ref.csv
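The file passed to --libraries in the command above is a CSV with the columns fastqs, sample, and library_type, with one row per sequenced library. As a rough sketch of what such a file might look like for one GEX and one HTO library, the snippet below writes one out. The fastq paths and sample names are placeholders rather than values from the study; the example files in /metadata/ remain the reference.

```shell
# Write a two-row libraries CSV for `cellranger count --libraries`.
# Paths and sample names are placeholders (assumptions), not values from the study.
cat > fastq-libraries.csv <<'EOF'
fastqs,sample,library_type
/path/to/fastqs/GEX,experimentName_GEX,Gene Expression
/path/to/fastqs/HTO,experimentName_HTO,Antibody Capture
EOF

# Show the result for a quick visual check.
cat fastq-libraries.csv
```

The sample column must match the sample name prefix of the .fastq files in the corresponding directory, and the HTO library is declared as Antibody Capture so that cellranger counts it against the feature reference.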
Gene annotation metadata can be generated using the following script. This metadata only needs to be generated once for each reference genome used in CellRanger.
library(here)
# Load the getGeneMetadata() helper.
source(here("getGeneMetadata.R"))
# Column 2 of the cellranger features file holds the gene symbols.
exp.genes <- read.table(here("filtered_feature_bc_matrix", "features.tsv.gz"))
sc.gene.annotations <- getGeneMetadata(gene.list = exp.genes[,2], name.type = "symbol", sex.chr = c("X","Y"))
saveRDS(sc.gene.annotations, file = here("gene-annotation-matrix.RDS"))
The Seurat R package is used to demultiplex the filtered expression matrix into samples labeled by HTO, remove doublets, and perform additional QC steps to filter out low-quality and dying cells.
Analysis R notebooks for these steps for experiment sets 01 and 02 can be found in /qc-notebooks/.
Copy number alteration (CNA) value matrices are generated by the modified CopyKat analysis, which is run from the command line with copykat-modified-analysis.R. This script uses the argparse R package so that arguments can be passed on the command line and the script does not need to be modified for each new experiment. Argument details are available in the help text, retrieved with Rscript copykat-modified-analysis.R -h.
Rscript copykat-scripts/copykat-modified-analysis-hpc.R --experiment.folder folderName \
--output.prefix prefixName \
--preprocessed.seurat.object sc.Seurat.HTOdeconvolved.filtered.RDS \
--gene.annotation.matrix gene-annotation-matrix.RDS \
--control.samples "HTO-01" \
--evaluation.samples "HTO-01, HTO-02, HTO-03, HTO-04, HTO-05, HTO-06, HTO-07, HTO-08, HTO-09, HTO-10"
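When the number of hashtags differs between experiments, the sample list for --evaluation.samples can be generated rather than typed by hand. The sketch below is one way to do that; hto_list is a hypothetical helper, not part of the repository, and it reproduces the "HTO-01, HTO-02, ..." format used in the command above on the assumption that the script expects exactly that format.

```shell
# Build a comma-and-space separated list "HTO-01, HTO-02, ..." up to the given count.
# (hto_list is an illustrative helper, not a script shipped with this repository.)
hto_list() (
  n="$1"
  i=1
  out=""
  while [ "$i" -le "$n" ]; do
    label=$(printf 'HTO-%02d' "$i")
    if [ -z "$out" ]; then
      out="$label"
    else
      out="$out, $label"
    fi
    i=$((i + 1))
  done
  printf '%s\n' "$out"
)

# All ten hashtags, as in the example command.
EVAL_SAMPLES=$(hto_list 10)
echo "$EVAL_SAMPLES"
```

The resulting variable can then be passed as --evaluation.samples "$EVAL_SAMPLES" in place of the literal list.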