Error in CountPeaks() #35

fciamponi · 2021-07-31T00:42:36Z

Hello Sierra team,

I was hoping to get some help regarding an issue (see below) that I'm having when running Sierra CountPeaks with a specific sample (https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR13043563 and SRR13043565). The pipeline works fine with both the vignette dataset and with other samples from the same experiment/patient (ex. SRR13043564).

I've tried some of the common suggestions found in other issues (ex. checking my gtf annotations, barcode file, bam file index and etc...) but so far nothing worked.

I'm using cellranger's annotation as a reference (refdata-gex-GRCh38-2020-A) and mapped my samples with STAR aligner (since the original experiment is MARS-seq) and followed STARsolo guidelines to make it as close to cellranger as possible.

Thanks in advance for any help you can provide.

Best regards,
Felipe

Error in {: task 32 failed - "1 elements in value to replace 0 elements"
Traceback:

CountPeaks(peak.sites.file = peak.merge.output.file, gtf.file = reference.file,
. bamfile = bam.file, whitelist.file = whitelist.bc.file, output.dir = outfile,
. countUMI = TRUE, ncores = 16)
foreach::foreach(each.chr = chr.names, .combine = "rbind", .packages = c("magrittr")) %dopar%
. {
. mat.per.chr <- c()
. message("Processing chr: ", each.chr)
. for (strand in c(1, -1)) {
. message(" and strand ", strand)
. isMinusStrand <- if (strand == 1)
. FALSE
. else TRUE
. peak.sites.chr <- dplyr::filter(peak.sites, Chr ==
. each.chr & Strand == strand) %>% dplyr::select(Gene,
. Chr, Fit.start, Fit.end, Strand)
. peak.sites.chr$Fit.start <- as.integer(peak.sites.chr$Fit.start)
. peak.sites.chr$Fit.end <- as.integer(peak.sites.chr$Fit.end)
. peak.sites.chr <- dplyr::filter(peak.sites.chr, Fit.start <
. Fit.end)
. if (nrow(peak.sites.chr) == 0) {
. next
. }
. isMinusStrand <- if (strand == 1)
. FALSE
. else TRUE
. which <- GenomicRanges::GRanges(seqnames = each.chr,
. ranges = IRanges::IRanges(1, max(peak.sites.chr$Fit.end)))
. param <- Rsamtools::ScanBamParam(tag = c(CBtag, UMItag),
. which = which, flag = Rsamtools::scanBamFlag(isMinusStrand = isMinusStrand))
. aln <- GenomicAlignments::readGAlignments(bamfile,
. param = param)
. nobarcodes <- which(unlist(is.na(GenomicRanges::mcols(aln)[CBtag])))
. noUMI <- which(unlist(is.na(GenomicRanges::mcols(aln)[UMItag])))
. to.remove <- dplyr::union(nobarcodes, noUMI)
. if (length(to.remove) > 0) {
. aln <- aln[-to.remove]
. }
. whitelist.pos <- which(unlist(GenomicRanges::mcols(aln)[CBtag]) %in%
. whitelist.bc)
. aln <- aln[whitelist.pos]
. if (countUMI) {
. GenomicRanges::mcols(aln)$CB_UB <- paste0(unlist(GenomicRanges::mcols(aln)[CBtag]),
. "_", unlist(GenomicRanges::mcols(aln)[UMItag]))
. uniqUMIs <- which(!duplicated(GenomicRanges::mcols(aln)$CB_UB))
. aln <- aln[uniqUMIs]
. }
. aln <- GenomicRanges::split(aln, unlist(GenomicRanges::mcols(aln)[CBtag]))
. polyA.GR <- GenomicRanges::GRanges(seqnames = peak.sites.chr$Chr,
. IRanges::IRanges(start = peak.sites.chr$Fit.start,
. end = as.integer(peak.sites.chr$Fit.end)))
. n.polyA <- length(polyA.GR)
. barcodes.gene <- names(aln)
. res <- sapply(barcodes.gene, function(x) GenomicRanges::countOverlaps(polyA.GR,
. aln[[x]]))
. res.mat <- matrix(0L, nrow = n.polyA, ncol = n.bcs)
. res.mat[, match(barcodes.gene, whitelist.bc)] <- res
. mat.per.strand <- Matrix::Matrix(res.mat, sparse = TRUE)
. polyA.ids <- paste0(peak.sites.chr$Gene, ":", peak.sites.chr$Chr,
. ":", peak.sites.chr$Fit.start, "-", peak.sites.chr$Fit.end,
. ":", peak.sites.chr$Strand)
. rownames(mat.per.strand) <- polyA.ids
. if (is.null(mat.per.chr)) {
. mat.per.chr <- mat.per.strand
. }
. else {
. mat.per.chr <- rbind(mat.per.chr, mat.per.strand)
. }
. }
. return(mat.per.chr)
. }
e$fun(obj, substitute(ex), parent.frame(), e$data)

rj-patrick · 2021-07-31T07:27:30Z

Hi @fciamponi,

Thanks for the information. The easiest thing would be if we could reproduce the error on our end. If you were to take a subset of the bam does it still produce the error? If so, would you please send it (along with the peak whitelist files) and we can work out what's going on.

Cheers,
Ralph

fciamponi · 2021-08-07T01:03:02Z

Thanks for the quick reply @reprobate.

Yes, I reproduced the error with a small subset of the original bam file (with just 10% of total reads). I'm including the sample here, along with the barcodes and merged peaks information.

However, I also noticed that the same error occured when I create randomized subsets for previsously working .bam files (such as the case with SRR13043564). I then proceeded to create a subset that contains 20% of the reads from each cell (instead of drawing 10~50% of reads randomly from the original bam file) but the problem persisted.

I've uploaded the files to a google drive folder (https://drive.google.com/drive/folders/1IhiZJ_86R2MeEgdT_M6Jaxnz_qssAHbK?usp=sharing). In this case, SRR13043564 runs without issues but SRR13043563 gives the error I mentioned in the opening comment.

In both cases, my command line was (just switching replacing 63-to-64 when necessary):

dir.create('./peakCountsTest/',recursive = TRUE)

whitelist.bc.file <- ./'CellBarcodes.COHEN2021.unique.txt'
reference.file <- './genes.gtf'
bam.file <- '../SRR13043564.bam'
sample <- 'SRR13043564'

outfile <- paste0('./peakCountsTest/',sample,'.counts')

peak.merge.output.file <- "./mergedPeaks.txt"

CountPeaks(peak.sites.file = peak.merge.output.file,
gtf.file = reference.file,
bamfile = bam.file,
whitelist.file = whitelist.bc.file,
output.dir = outfile,
countUMI = TRUE,
ncores = 16)

Best,
Felipe

PS. Sorry for taking so long to reply, it was a very busy week.

rj-patrick · 2021-08-13T12:26:58Z

Hi Felipe,

Thanks for the data, it was very helpful for debugging. The issue was traced to an evidently rare situation where there was a single peak called in a chromosome, but with no barcodes in the whitelist. I've added a check for this situation which should fix it - I'm now able to run CountPeaks on your example data without error. Let me know if you have any more issues.

Cheers,
Ralph

fciamponi · 2021-08-19T03:15:34Z

Thanks, it worked perfectly now!

Sorry for the inconvenience.

Best,
Felipe

rj-patrick · 2021-08-19T06:49:37Z

No worries, thanks for helping us find that bug!

Cheers,
Ralph

fciamponi changed the title ~~Error in PeakCount()~~ Error in CountPeaks() Jul 31, 2021

rj-patrick closed this as completed Aug 19, 2021

johnchamberlin mentioned this issue Aug 31, 2021

Error in CountPeaks step #41

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Error in CountPeaks() #35

Error in CountPeaks() #35

fciamponi commented Jul 31, 2021 •

edited

Loading

rj-patrick commented Jul 31, 2021

fciamponi commented Aug 7, 2021 •

edited

Loading

rj-patrick commented Aug 13, 2021

fciamponi commented Aug 19, 2021

rj-patrick commented Aug 19, 2021

Error in CountPeaks() #35

Error in CountPeaks() #35

Comments

fciamponi commented Jul 31, 2021 • edited Loading

rj-patrick commented Jul 31, 2021

fciamponi commented Aug 7, 2021 • edited Loading

rj-patrick commented Aug 13, 2021

fciamponi commented Aug 19, 2021

rj-patrick commented Aug 19, 2021

fciamponi commented Jul 31, 2021 •

edited

Loading

fciamponi commented Aug 7, 2021 •

edited

Loading