Running InferCNV

Brian Haas edited this page Mar 11, 2019 · 21 revisions

InferCNV can be run via a simple 2-step protocol, or can be run step-by-step with customization for more exploratory purposes.

InferCNV 2-step execution:

First, create an InferCNV object from your three required inputs: the read count matrix, the cell type annotations, and the gene ordering file:

# create the infercnv object
infercnv_obj = CreateInfercnvObject(raw_counts_matrix="singleCell.counts.matrix",
                                    annotations_file="cellAnnotations.txt",
                                    delim="\t",
                                    gene_order_file="gene_ordering_file.txt",
                                    ref_group_names=c("normal"))

Here the ref_group_names parameter is set to the normal (non-tumor) cell type or types, named exactly as they appear in the cellAnnotations.txt file. See File-Definitions for more details.
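Before creating the object, it can help to confirm that every name passed to ref_group_names actually appears in the annotations file. The sketch below assumes the annotations file is tab-delimited with two columns and no header (cell name, then group label). The helper name check_ref_groups and the toy file contents are hypothetical and not part of infercnv.

```r
# Hypothetical helper (not part of infercnv): verify that each reference
# group name occurs in the group column of the annotations file.
check_ref_groups <- function(annotations_file, ref_group_names, delim = "\t") {
  # Assumed layout: column 1 = cell name, column 2 = group label, no header.
  annots <- read.table(annotations_file, sep = delim, header = FALSE,
                       stringsAsFactors = FALSE,
                       col.names = c("cell", "group"))
  missing_groups <- setdiff(ref_group_names, unique(annots$group))
  if (length(missing_groups) > 0) {
    stop("reference group(s) not found in annotations: ",
         paste(missing_groups, collapse = ", "))
  }
  # Return the number of cells per group as a quick overview.
  table(annots$group)
}

# Toy annotations file written to a temporary location for illustration.
toy_annotations <- tempfile(fileext = ".txt")
writeLines(c("cell1\tnormal", "cell2\tnormal", "cell3\ttumor_p1"),
           toy_annotations)
group_sizes <- check_ref_groups(toy_annotations, ref_group_names = c("normal"))
print(group_sizes)
```

A misspelled group name (for example "Normal" instead of "normal") stops with an error here, before any longer-running analysis starts.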

Note, if you do not have reference cells, you can set ref_group_names=NULL, in which case the average signal across all cells will be used to define the baseline. This can work well when there are sufficient differences among the cells included (i.e., they do not all show a chromosomal deletion at the same place).
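The idea of an all-cell baseline can be illustrated in base R. This is a conceptual sketch only, on a made-up matrix, and not infercnv's internal implementation: each gene's mean over all cells serves as the baseline, and every cell is expressed relative to it.

```r
# Toy log-scale expression matrix: genes in rows, cells in columns.
toy_expr <- matrix(c(2, 2, 4,
                     1, 3, 2),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("geneA", "geneB"),
                                   c("cell1", "cell2", "cell3")))

# With no reference cells, use the per-gene mean over ALL cells as baseline.
baseline <- rowMeans(toy_expr)

# Express each cell relative to that baseline (subtract the per-gene mean).
relative_expr <- sweep(toy_expr, 1, baseline, FUN = "-")
print(relative_expr)
```

If every cell carried the same deletion, the baseline would absorb it and the relative signal would be flat, which is why this mode needs cells that differ from one another.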

Note, inferCNV expects that you've already filtered out low-quality cells. If you need to impose further minimum/maximum read counts per cell, you can pass an additional filter when creating the object, such as: min_max_counts_per_cell=c(1e5,1e6)
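To see which cells a setting like min_max_counts_per_cell=c(1e5,1e6) would retain, the same kind of filter can be applied by hand. The sketch below uses a made-up count matrix and assumes both bounds are inclusive; how infercnv treats cells exactly at a bound is not stated here.

```r
# Toy count matrix: genes in rows, cells in columns.
cell_counts <- matrix(c(40000, 300000, 900000,
                        10000, 200000, 600000),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(c("geneA", "geneB"),
                                      c("cell1", "cell2", "cell3")))

min_max <- c(1e5, 1e6)            # same form as min_max_counts_per_cell
totals  <- colSums(cell_counts)   # total read counts per cell

# Keep cells whose total lies within [min, max] (inclusive bounds assumed).
keep_cell <- totals >= min_max[1] & totals <= min_max[2]
filtered_counts <- cell_counts[, keep_cell, drop = FALSE]

print(totals)
print(colnames(filtered_counts))
```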

After creating the infercnv_obj, you can then run the standard infercnv procedure via the built-in 'infercnv::run()' method like so:

# perform infercnv operations to reveal cnv signal
infercnv_obj = infercnv::run(infercnv_obj,
                             cutoff=1,  # use 1 for smart-seq, 0.1 for 10x-genomics
                             out_dir="output_dir",  # dir is auto-created for storing outputs
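                             cluster_by_groups=TRUE,  # cluster cells within each annotated group
                             denoise=TRUE,            # apply the de-noising procedure
                             HMM=TRUE                 # run the HMM-based CNV predictions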
                             )

The cutoff value determines which genes will be used for the infercnv analysis. Genes with a mean number of counts across cells below this cutoff will be excluded. For smart-seq (full-length transcript sequencing, typically using cell plate assays rather than droplets), a value of 1 works well. For 10x (and potentially other 3'-end sequencing and droplet assays, where the count matrix tends to be more sparse), a value of 0.1 generally works well.
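The effect of the cutoff can be previewed on a count matrix before running the full analysis. The following is a minimal base-R sketch with made-up counts; the helper name filter_genes_by_mean is hypothetical, and keeping genes whose mean equals the cutoff exactly is an assumption.

```r
# Toy count matrix: genes in rows, cells in columns.
toy_counts <- matrix(c(0, 0, 1, 0,    # geneA: mean 0.25
                       2, 3, 1, 2,    # geneB: mean 2
                       0, 0, 0, 0),   # geneC: mean 0
                     nrow = 3, byrow = TRUE,
                     dimnames = list(c("geneA", "geneB", "geneC"),
                                     paste0("cell", 1:4)))

# Hypothetical helper: keep genes whose mean count across cells
# reaches the cutoff (boundary handling is an assumption).
filter_genes_by_mean <- function(counts, cutoff) {
  counts[rowMeans(counts) >= cutoff, , drop = FALSE]
}

print(rownames(filter_genes_by_mean(toy_counts, cutoff = 1)))    # smart-seq style
print(rownames(filter_genes_by_mean(toy_counts, cutoff = 0.1)))  # 10x style
```

With the sparser 10x-style threshold, the lowly expressed geneA is retained, while the stricter threshold keeps only geneB.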

The out_dir parameter names the output directory. If the directory doesn't exist, it will be created automatically.

The 'cluster_by_groups' setting indicates that the tumor cells should be clustered separately within each group (for example, each patient), as defined in the cell annotations file.
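Clustering within groups, as opposed to clustering all cells together, can be sketched with base R's hclust. This is a conceptual illustration on random data, not infercnv's own code: the group labels are made up, and the Euclidean distance and default linkage are simply R defaults chosen for the example.

```r
set.seed(1)

# Toy expression matrix: 5 genes in rows, 6 cells in columns.
cluster_expr <- matrix(rnorm(5 * 6), nrow = 5,
                       dimnames = list(paste0("gene", 1:5),
                                       paste0("cell", 1:6)))

# Hypothetical group labels, as they might appear in an annotations file.
cell_groups <- c(cell1 = "tumor_p1", cell2 = "tumor_p1", cell3 = "tumor_p1",
                 cell4 = "tumor_p2", cell5 = "tumor_p2", cell6 = "tumor_p2")

# Build one dendrogram per group, using only that group's cells.
group_trees <- lapply(split(names(cell_groups), cell_groups), function(cells) {
  hclust(dist(t(cluster_expr[, cells, drop = FALSE])))
})

print(names(group_trees))
print(group_trees$tumor_p1$labels)
```

Each group gets its own tree, so cells from different patients are never merged into the same cluster.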

InferCNV step-by-step exploratory execution:

The general infercnv workflow as performed via the above infercnv::run() method operates as follows:

Setting run(denoise=TRUE) enables the de-noising procedure. Several de-noising filters are available for exploration.

Setting run(HMM=TRUE) enables the CNV predictions. There are multiple inferCNV HMM prediction methods available to explore as well.

Interactive Data Exploration

To interactively explore the inferCNV heatmap, see our documentation here.