davolilab/karyocreate

 
 

KaryoCreate Analysis

Codebook for the analyses in Bosco et al. 2023:

KaryoCreate: A CRISPR-based technology to study chromosome-specific aneuploidy by targeting human centromeres. Cell 2023. https://doi.org/10.1016/j.cell.2023.03.029

scRNAseq Analysis

A modified version of the CopyKat v1.0.5 pipeline for R was used to generate a somatic copy number alteration (SCNA) score for each chromosome or chromosome arm in each single cell.

Data availability

Single-cell RNA-seq data have been deposited at GEO and are publicly available as of the date of publication through GEO Series accession number GSE215842.

Hashtag and sample metadata for experiment sets 01 and 02 can be found in /metadata/.

Data Processing

Scripts can be found in the /scripts/ folder.

Software dependencies:

R package dependencies:

  • argparse
  • SeuratObject
  • biomaRt
  • dplyr
  • tidyr
  • stringr
  • knitr
  • kableExtra
  • here
  • ggplot2
  • HGNChelper
  • copykat v1.0.5

cellranger

10X Genomics libraries were sequenced on an Illumina sequencer. The output is a set of .fastq files for both the gene expression (GEX) library and the hashtag oligo (HTO) library. cellranger count from 10X's cellranger package was used to align the sequencing reads for both libraries and assign them to single cells. Example library and feature reference files can be found in the /metadata/ folder. The HTO-ref.csv file is set up for the 10 TotalSeq-B Human Hashtag sequences from BioLegend.

cellranger count --transcriptome /path/to/reference/genome --id experimentName --libraries fastq-libraries.csv --feature-ref HTO-ref.csv
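The file passed to --libraries is a three-column CSV that tells cellranger which FASTQ set belongs to which library type. As a minimal sketch, such a file could be written from R as shown here. The FASTQ paths and sample names are placeholders rather than values from this repository, and the file is written to a temporary directory so nothing in the project is overwritten. The example files in /metadata/ remain the reference.

```r
# Build a cellranger libraries CSV describing the GEX and HTO FASTQ sets.
# Paths and sample names are placeholders (assumptions), not values from this repository.
libraries <- data.frame(
  fastqs = c("/path/to/fastqs/gex", "/path/to/fastqs/hto"),
  sample = c("experimentName_GEX", "experimentName_HTO"),
  library_type = c("Gene Expression", "Antibody Capture"),
  stringsAsFactors = FALSE
)

# cellranger expects a header row with these three column names.
libraries.file <- file.path(tempdir(), "fastq-libraries.csv")
write.csv(libraries, libraries.file, row.names = FALSE, quote = FALSE)

# Show the file contents as cellranger would read them.
cat(readLines(libraries.file), sep = "\n")
```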

Gene annotations

Gene annotation metadata can be generated using the following script. This metadata only needs to be generated once for each reference genome used with cellranger.

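library(here)  # here() resolves paths relative to the project root
source(here("getGeneMetadata.R"))
# features.tsv.gz is tab-separated with no header; column 2 holds the gene symbols
exp.genes <- read.table(here("filtered_feature_bc_matrix", "features.tsv.gz"), sep = "\t")
sc.gene.annotations <- getGeneMetadata(gene.list = exp.genes[,2], name.type = "symbol", sex.chr = c("X","Y"))
saveRDS(sc.gene.annotations, file = here("gene-annotation-matrix.RDS"))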
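Because the annotation only needs to be generated once per reference genome, it can be convenient to reuse the saved RDS file when it already exists. The helper below is a hypothetical sketch of that pattern and is not part of this repository. The name load.or.build and the toy builder are assumptions made for illustration; in practice the builder would wrap the getGeneMetadata call.

```r
# Return the cached object at `path` if it exists; otherwise build it, save it, and return it.
# `build` is any zero-argument function, e.g. a wrapper around the annotation step.
load.or.build <- function(path, build) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- build()
  saveRDS(result, file = path)
  result
}

# Toy demonstration with a stand-in builder (a small data frame, not real annotations).
cache.file <- file.path(tempdir(), "gene-annotation-matrix-demo.RDS")
unlink(cache.file)  # start from a clean state for the demonstration
build.calls <- 0
toy.builder <- function() {
  build.calls <<- build.calls + 1
  data.frame(symbol = c("GENE1", "GENE2"), chromosome = c("1", "X"))
}

first <- load.or.build(cache.file, toy.builder)   # builds and saves
second <- load.or.build(cache.file, toy.builder)  # loads from disk, builder not called
```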

QC

The Seurat R package is used to demultiplex the filtered expression matrix into samples labeled by HTO, remove doublets, and apply additional QC steps that filter out low-quality and dying cells.

Analysis R notebooks for these steps for experiment sets 01 and 02 can be found in /qc-notebooks/.
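The filtering logic described above can be sketched in base R on a per-cell metadata table. This is an illustration only. The column names follow common Seurat conventions (HTO_classification.global, nFeature_RNA, percent.mt), and the toy values and both thresholds are assumptions, not the settings used in the notebooks.

```r
# Toy per-cell metadata table; in a Seurat workflow this would be the object's meta.data.
# All values and thresholds are illustrative, not the ones used in the notebooks.
cell.metadata <- data.frame(
  barcode = c("AAAC-1", "AAAG-1", "AACC-1", "AAGT-1", "ACGT-1"),
  HTO_classification.global = c("Singlet", "Doublet", "Singlet", "Negative", "Singlet"),
  nFeature_RNA = c(3200, 5400, 150, 2800, 2900),
  percent.mt = c(4.1, 6.0, 2.5, 3.3, 35.0),
  stringsAsFactors = FALSE
)

min.features <- 200    # hypothetical lower bound on detected genes
max.percent.mt <- 20   # hypothetical upper bound on mitochondrial read percentage

# Keep cells that carry exactly one hashtag and pass both quality thresholds.
keep <- cell.metadata$HTO_classification.global == "Singlet" &
  cell.metadata$nFeature_RNA >= min.features &
  cell.metadata$percent.mt <= max.percent.mt

filtered <- cell.metadata[keep, ]
print(filtered$barcode)
```

In this toy table only the first cell survives: the others are dropped as a doublet, a cell with too few detected genes, a cell with no hashtag, and a cell with a high mitochondrial fraction.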

Modified CopyKat Analysis

Copy number alteration (CNA) value matrices are generated by the modified CopyKat analysis, which is run from the command line with Rscript copykat-modified-analysis.R. The script uses the argparse R package to accept command-line arguments, so it does not need to be edited for each new experiment. Argument details are available in the help text, retrieved with Rscript copykat-modified-analysis.R -h.

Rscript copykat-scripts/copykat-modified-analysis-hpc.R --experiment.folder folderName \
                                                        --output.prefix prefixName \
                                                        --preprocessed.seurat.object sc.Seurat.HTOdeconvolved.filtered.RDS \
                                                        --gene.annotation.matrix gene-annotation-matrix.RDS \
                                                        --control.samples "HTO-01" \
                                                        --evaluation.samples "HTO-01, HTO-02, HTO-03, HTO-04, HTO-05, HTO-06, HTO-07, HTO-08, HTO-09, HTO-10"
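In the command above, --evaluation.samples is passed as one quoted, comma-separated string, so a script receiving it has to split that string into individual hashtag names. The following base R sketch shows one way to do this. The function name parse.sample.list is hypothetical, and the repository's script may handle the argument differently.

```r
# Split a comma-separated sample argument into a clean character vector.
parse.sample.list <- function(x) {
  samples <- trimws(strsplit(x, ",", fixed = TRUE)[[1]])
  samples[nzchar(samples)]  # drop empty entries left by stray commas
}

control.samples <- parse.sample.list("HTO-01")
evaluation.samples <- parse.sample.list("HTO-01, HTO-02, HTO-03")

print(control.samples)
print(evaluation.samples)
```

Trimming whitespace matters here because the example command separates hashtags with a comma followed by a space.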
