Running InferCNV

Brian Haas edited this page Oct 31, 2018 · 21 revisions

InferCNV can be run via a simple 2-step protocol, or can be run step-by-step with customization for more exploratory purposes.

InferCNV 2-step execution:

First, create an InferCNV object from your three required inputs: the read count matrix, the cell type annotations, and the gene ordering file:

# create the infercnv object
infercnv_obj = CreateInfercnvObject(raw_counts_matrix="singleCell.counts.matrix",
                                    annotations_file="cellAnnotations.txt",
                                    delim="\t",
                                    gene_order_file="gene_ordering_file.txt",
                                    ref_group_names=c("normal"))

where the ref_group_names parameter is set to the normal (non-tumor) cell types, using the group names defined in the cellAnnotations.txt file. See File-Definitions for more details.
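Before creating the object, it can help to confirm that every name passed to ref_group_names actually appears in the annotations file. The following is a minimal sketch in base R, not part of the infercnv API. It assumes the annotations file is tab-delimited with no header, with the cell name in the first column and the group label in the second; the cell and group names are made up for illustration.

```r
# Toy annotations, inlined as text so the example is self-contained.
# ASSUMPTION: two tab-separated columns, no header: cell name, group label.
annotations_text <- "cell1\tnormal\ncell2\tnormal\ncell3\ttumor_patientA\ncell4\ttumor_patientB"

annotations <- read.table(text = annotations_text, sep = "\t", header = FALSE,
                          col.names = c("cell", "group"),
                          stringsAsFactors = FALSE)

# The reference (normal) group names you intend to pass as ref_group_names.
ref_group_names <- c("normal")

# Any reference name that is absent from the annotations is a likely typo.
missing_groups <- setdiff(ref_group_names, unique(annotations$group))
stopifnot(length(missing_groups) == 0)

# Cells that would be treated as the normal reference.
reference_cells <- annotations$cell[annotations$group %in% ref_group_names]
```

To check a real file, replace the `text =` argument with the path to your own annotations file.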

After creating the infercnv_obj, you can then run the standard infercnv procedure via the built-in 'infercnv::run()' method like so:

# perform infercnv operations to reveal cnv signal
infercnv_obj = infercnv::run(infercnv_obj,
                             cutoff=1,  # use 1 for smart-seq, 0.1 for 10x-genomics
                             out_dir="output_dir",  # dir is auto-created for storing outputs
                             cluster_by_groups=TRUE,  # cluster tumor cells separately per annotation group
                             include.spike=TRUE       # inject a spike-in used to scale the final data
                             )

The cutoff value determines which genes will be used for the infercnv analysis. Genes whose mean number of counts across cells falls below the cutoff will be excluded. For smart-seq (full-length transcript sequencing, typically using cell plate assays rather than droplets), a value of 1 works well. For 10x (and potentially other 3'-end sequencing and droplet assays, where the count matrix tends to be more sparse), a value of 0.1 generally works well.
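To make the effect of the cutoff concrete, here is a minimal sketch in base R of a mean-count gene filter on a toy matrix. It illustrates the idea only and is not infercnv's internal implementation: the helper name filter_genes_by_mean is hypothetical, the counts are made up, and whether a gene sitting exactly at the cutoff is kept is an assumption.

```r
# Toy count matrix: rows are genes, columns are cells (values are made up).
counts <- matrix(c(0, 0, 1, 0,
                   5, 3, 8, 4,
                   0, 1, 0, 0,
                   2, 0, 3, 1),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                                 paste0("cell", 1:4)))

# Hypothetical helper, NOT part of the infercnv API.
# Keeps genes whose mean count across cells is at least `cutoff`
# (ASSUMPTION: genes exactly at the cutoff are kept).
filter_genes_by_mean <- function(counts, cutoff) {
  keep <- rowMeans(counts) >= cutoff
  counts[keep, , drop = FALSE]
}

# Mean counts per gene: geneA 0.25, geneB 5, geneC 0.25, geneD 1.5
filtered_smartseq <- filter_genes_by_mean(counts, cutoff = 1)    # smart-seq style cutoff
filtered_10x      <- filter_genes_by_mean(counts, cutoff = 0.1)  # 10x style cutoff
```

With the stricter cutoff of 1, only geneB and geneD remain. With 0.1, all four genes pass, which is why the lower value suits sparser droplet-based data.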

The out_dir parameter names the output directory. If the directory doesn't exist, it will be created automatically.

The 'cluster_by_groups' setting tells infercnv to cluster the tumor cells separately for each patient type, as defined in the cell annotations file.

If 'include.spike' is enabled, infercnv will artificially inject a spike-in with defined gain/loss thresholds based on the normal cells. The spiked-in data are tracked throughout the various infercnv data manipulations, and are finally used to scale the data to a range from complete loss (0x) to a gain of 2x. The spiked-in data are then removed before the final outputs are generated.

InferCNV step-by-step exploratory execution:

The general infercnv workflow as performed via the above infercnv::run() method operates as follows:

This process can be executed step-by-step as outlined in this Rmd: example.html