Subset subreads by a single barcode to generate inputs for DeepConsensus #523

peterdfields · 2022-07-11T16:06:04Z

Operating system

Linux, Ubuntu 18.04.4 LTS

Package name

lima 2.6.0

Using:
  lima      : 2.6.0 (commit v2.6.0)
  pbbam     : 2.1.0 (commit v2.1.0)
  pbcopper  : 2.0.0 (commit v2.0.0-59-g580e770)
  boost     : 1.77
  htslib    : 1.15
  zlib      : 1.2.11

Conda environment

# packages in environment at /home/peter/miniconda3/envs/lima:
#
# Name                    Version                   Build  Channel
lima                      2.6.0                h9ee0642_0    bioconda

My issue here isn't a bug. I may be overlooking something in the documentation, in which case I apologize for the inconvenience. I'm trying to subset a raw, barcoded HiFi subreads BAM file to get only the subreads belonging to a single sample. I initially tried running lima with only a single barcode and got the following summary:

ZMWs input                (A) : 4840015
ZMWs above all thresholds (B) : 2834511 (58.56%)
ZMWs below any threshold  (C) : 2005504 (41.44%)

ZMW marginals for (C):
Below min length              : 40 (0.00%)
Below min score               : 69019 (3.44%)
Below min end score           : 69019 (3.44%)
Below min passes              : 0 (0.00%)
Below min score lead          : 69019 (3.44%)
Below min ref span            : 107230 (5.35%)
Without SMRTbell adapter      : 1881585 (93.82%)

ZMWs for (B):
With same pair                : 2834511 (100.00%)
Coefficient of correlation    : 0.00%

ZMWs for (A):
Allow diff pair               : 2353060 (48.62%)
Allow same pair               : 2958430 (61.12%)
Bad adapter impurity          : 93812 (1.94%)

Reads for (B):
Above length                  : 25106741 (99.73%)
Below length                  : 69198 (0.27%)
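
The percentages in this summary use different denominators, which is easy to misread. As a reading aid, here is a minimal Python sketch that recomputes them from the counts copied out of the log. The variable names are illustrative, and the denominator assigned to each line is inferred from the section labels and the printed percentages rather than taken from lima's documentation.

```python
# Reading aid: recompute the lima summary percentages from the raw counts.
# Counts are copied verbatim from the log above; names are illustrative.

ZMWS_INPUT = 4840015  # (A) ZMWs input
ZMWS_PASS = 2834511   # (B) ZMWs above all thresholds
ZMWS_FAIL = 2005504   # (C) ZMWs below any threshold

# "ZMW marginals for (C)": each count is a fraction of (C), not of (A).
# A ZMW can fail several filters, so these need not sum to (C).
FAIL_MARGINALS = {
    "Below min length": 40,
    "Below min score": 69019,
    "Below min end score": 69019,
    "Below min passes": 0,
    "Below min score lead": 69019,
    "Below min ref span": 107230,
    "Without SMRTbell adapter": 1881585,
}

# "ZMWs for (A)": fractions of all input ZMWs; the categories overlap,
# so they can sum past 100%.
INPUT_MARGINALS = {
    "Allow diff pair": 2353060,
    "Allow same pair": 2958430,
    "Bad adapter impurity": 93812,
}

# "Reads for (B)": fractions of all reads in the passing ZMWs.
READS_ABOVE = 25106741
READS_BELOW = 69198


def pct(count, total):
    """Format count/total as a percentage with two decimals, as in the log."""
    return f"{100 * count / total:.2f}"


def summarize():
    """Return {label: percentage string} with the denominator in each label."""
    out = {
        "(B) of (A)": pct(ZMWS_PASS, ZMWS_INPUT),
        "(C) of (A)": pct(ZMWS_FAIL, ZMWS_INPUT),
    }
    for name, n in FAIL_MARGINALS.items():
        out[f"{name} of (C)"] = pct(n, ZMWS_FAIL)
    for name, n in INPUT_MARGINALS.items():
        out[f"{name} of (A)"] = pct(n, ZMWS_INPUT)
    reads_total = READS_ABOVE + READS_BELOW
    out["Reads above length of (B) reads"] = pct(READS_ABOVE, reads_total)
    out["Reads below length of (B) reads"] = pct(READS_BELOW, reads_total)
    # Re-express the dominant failure reason against all input ZMWs.
    out["Without SMRTbell adapter of (A)"] = pct(
        FAIL_MARGINALS["Without SMRTbell adapter"], ZMWS_INPUT
    )
    return out


if __name__ == "__main__":
    assert ZMWS_PASS + ZMWS_FAIL == ZMWS_INPUT  # (B) + (C) == (A)
    for label, value in summarize().items():
        print(f"{label:40s} {value}%")
```

Running it reproduces the logged values and shows, for example, that "Without SMRTbell adapter" is 93.82% of the ZMWs in (C) but about 38.88% of all input ZMWs.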

My interpretation is that the subsetting did not work as expected. I'm trying to create a single subreads BAM file so that I can test the DeepConsensus pipeline on only one of the three samples multiplexed on the Sequel II SMRT Cell. Please let me know if any additional information would be helpful.


armintoepfer · 2022-07-14T12:03:16Z

Run CCS first, then run lima with all barcodes in the FASTA and --split to get one CCS BAM per barcode pair. Before you go into DeepConsensus, actc will only use the subreads belonging to the CCS reads you give it, so passing it a single sample's CCS BAM selects only that sample's subreads.
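The order described above can be sketched in Python as a list of command lines (the sketch only builds and prints them; it runs nothing). It uses only the tools' positional arguments plus the --split flag mentioned in this thread. All file names are hypothetical placeholders, including the per-sample CCS BAM, since the exact names lima gives its split outputs are not stated here; check each tool's --help for real options such as threading or read-quality filters.

```python
"""Hypothetical sketch of the CCS -> lima --split -> actc order.

Builds and prints command lines only; nothing is executed. File names are
placeholders, and only positional arguments plus lima's --split flag are used.
"""
import shlex
from typing import List


def ccs_cmd(subreads_bam: str, ccs_bam: str) -> List[str]:
    # 1. Call CCS on the full, still-multiplexed subreads BAM.
    return ["ccs", subreads_bam, ccs_bam]


def lima_cmd(ccs_bam: str, barcodes_fasta: str, demux_bam: str) -> List[str]:
    # 2. Demultiplex the CCS reads using ALL barcodes in the FASTA;
    #    --split writes one output BAM per barcode pair.
    return ["lima", ccs_bam, barcodes_fasta, demux_bam, "--split"]


def actc_cmd(subreads_bam: str, sample_ccs_bam: str, out_bam: str) -> List[str]:
    # 3. Align subreads to ONE sample's CCS reads; actc only uses the
    #    subreads belonging to the CCS reads it is given.
    return ["actc", subreads_bam, sample_ccs_bam, out_bam]


def pipeline(subreads_bam: str, barcodes_fasta: str, sample_ccs_bam: str,
             workdir: str = ".") -> List[List[str]]:
    """Return the three command lines in order.

    sample_ccs_bam is one of lima's per-barcode-pair outputs; its exact
    file name is an assumption left to the caller.
    """
    ccs_bam = f"{workdir}/ccs.bam"
    demux_bam = f"{workdir}/demux.bam"
    subreads_to_ccs = f"{workdir}/sample.subreads_to_ccs.bam"
    return [
        ccs_cmd(subreads_bam, ccs_bam),
        lima_cmd(ccs_bam, barcodes_fasta, demux_bam),
        actc_cmd(subreads_bam, sample_ccs_bam, subreads_to_ccs),
    ]


if __name__ == "__main__":
    steps = pipeline(
        "movie.subreads.bam",  # hypothetical multiplexed subreads input
        "barcodes.fasta",      # hypothetical FASTA with every sample's barcode
        "demux.SAMPLE.bam",    # hypothetical: one of lima's split outputs
    )
    for argv in steps:
        print(shlex.join(argv))
```

The actc output (here sample.subreads_to_ccs.bam) together with the same per-sample CCS BAM would then be the inputs for DeepConsensus.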

peterdfields · 2022-07-14T14:59:53Z

Thank you @armintoepfer. In the end I used lima --split-bam, but your solution is probably better.

amwenger · 2022-07-19T04:12:37Z

We also tested running lima after DeepConsensus, and it produces nearly identical results. So demultiplexing can be run either before or after DeepConsensus, whichever is more convenient.

armintoepfer mentioned this issue on Jul 14, 2022 in google/deepconsensus#32, "Advice for dealing with barcode multiplexing" (closed).

armintoepfer closed this as completed on Jul 14, 2022.

pgrosu mentioned this issue on Aug 2, 2023 in google/deepconsensus#68, "Separate subreads for mixed samples?" (closed).
