Subset subreads by a single barcode in order to generate inputs for DeepConsensus #523

Closed
peterdfields opened this issue Jul 11, 2022 · 3 comments

@peterdfields

Operating system

Linux, Ubuntu 18.04.4 LTS

Package name

lima 2.6.0

Using:
  lima      : 2.6.0 (commit v2.6.0)
  pbbam     : 2.1.0 (commit v2.1.0)
  pbcopper  : 2.0.0 (commit v2.0.0-59-g580e770)
  boost     : 1.77
  htslib    : 1.15
  zlib      : 1.2.11

Conda environment

# packages in environment at /home/peter/miniconda3/envs/lima:
#
# Name                    Version                   Build  Channel
lima                      2.6.0                h9ee0642_0    bioconda

My issue here isn't a bug. I may be overlooking something in the documentation, in which case I apologize for the inconvenience. I'm trying to subset a raw, barcoded HiFi subread BAM file to get just the subreads belonging to a single sample. I initially tried running lima with only a single barcode in the barcode FASTA, and I see the following summary:

ZMWs input                (A) : 4840015
ZMWs above all thresholds (B) : 2834511 (58.56%)
ZMWs below any threshold  (C) : 2005504 (41.44%)

ZMW marginals for (C):
Below min length              : 40 (0.00%)
Below min score               : 69019 (3.44%)
Below min end score           : 69019 (3.44%)
Below min passes              : 0 (0.00%)
Below min score lead          : 69019 (3.44%)
Below min ref span            : 107230 (5.35%)
Without SMRTbell adapter      : 1881585 (93.82%)

ZMWs for (B):
With same pair                : 2834511 (100.00%)
Coefficient of correlation    : 0.00%

ZMWs for (A):
Allow diff pair               : 2353060 (48.62%)
Allow same pair               : 2958430 (61.12%)
Bad adapter impurity          : 93812 (1.94%)

Reads for (B):
Above length                  : 25106741 (99.73%)
Below length                  : 69198 (0.27%)

My interpretation is that the subsetting did not work as expected. I'm trying to create a single subreads.bam file so that I can test the DeepConsensus pipeline on only one of the three samples multiplexed on the Sequel II SMRT Cell. Please let me know if any additional information would be helpful.
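For readers checking the summary above: its percentages use different denominators. B and C are shares of A, while the marginals listed under (C) are shares of C, so the 1881585 ZMWs "Without SMRTbell adapter" are 93.82% of C but a smaller share of all input ZMWs. The minimal Python sketch below parses an excerpt of the quoted report and reproduces those numbers; the regex and the helper names (`parse_summary`, `pct`) are illustrative only and not part of lima.

```python
import re

# Excerpt of the lima summary quoted above (only the lines used here).
SUMMARY = """\
ZMWs input                (A) : 4840015
ZMWs above all thresholds (B) : 2834511 (58.56%)
ZMWs below any threshold  (C) : 2005504 (41.44%)
Below min score               : 69019 (3.44%)
Without SMRTbell adapter      : 1881585 (93.82%)
"""

# Matches "label : count" with an optional " (pct%)" suffix.
LINE = re.compile(r"^(?P<label>.+?)\s*:\s*(?P<count>\d+)(?:\s*\((?P<pct>[\d.]+)%\))?$")


def parse_summary(text):
    """Return {label: (count, reported_pct or None)}, with label whitespace collapsed."""
    rows = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            label = " ".join(m["label"].split())
            pct_value = float(m["pct"]) if m["pct"] else None
            rows[label] = (int(m["count"]), pct_value)
    return rows


def pct(part, whole):
    """Percentage rounded to two decimals, like the report."""
    return round(100 * part / whole, 2)


rows = parse_summary(SUMMARY)
a = rows["ZMWs input (A)"][0]
b = rows["ZMWs above all thresholds (B)"][0]
c = rows["ZMWs below any threshold (C)"][0]
no_adapter = rows["Without SMRTbell adapter"][0]

print(pct(b, a))           # B as a share of A -> 58.56 (as reported)
print(pct(no_adapter, c))  # marginal as a share of C -> 93.82 (as reported)
print(pct(no_adapter, a))  # the same ZMWs as a share of all input ZMWs -> 38.88
```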

@armintoepfer
Member

Perform CCS first, and then run lima --split with all barcodes in the FASTA to get one CCS BAM per barcode pair. When you run actc before DeepConsensus, it will only use the subreads belonging to the CCS reads in its input BAM.
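The order suggested above (CCS, then lima --split, then actc and DeepConsensus per sample) can be sketched as a dry run that only prints command lines. This is a hypothetical Python sketch, not an official pipeline: all file names are placeholders, the per-sample CCS BAM paths must be whatever lima --split actually wrote (no naming scheme is assumed), and the DeepConsensus flags follow its quick-start usage and should be checked against `--help` for your installed versions. lima may also need one of its HiFi presets for CCS input.

```python
import shlex


def demux_first_plan(subreads, barcodes_fasta, split_ccs_bams, checkpoint):
    """Dry run of: ccs -> lima --split -> actc -> DeepConsensus, one chain per sample.

    split_ccs_bams maps a sample label to the per-barcode-pair CCS BAM that
    lima --split wrote for it (pass the real file names). Other names are placeholders.
    """
    cmds = [
        ["ccs", subreads, "ccs.bam"],
        ["lima", "--split", "ccs.bam", barcodes_fasta, "demux.bam"],
    ]
    for sample, ccs_bam in split_ccs_bams.items():
        s2c = f"{sample}.subreads_to_ccs.bam"
        # actc only uses subreads from ZMWs present in this sample's CCS BAM.
        cmds.append(["actc", subreads, ccs_bam, s2c])
        # Flags follow DeepConsensus quick-start usage (assumption; confirm with --help).
        cmds.append([
            "deepconsensus", "run",
            f"--subreads_to_ccs={s2c}",
            f"--ccs_bam={ccs_bam}",
            f"--checkpoint={checkpoint}",
            f"--output={sample}.deepconsensus.fastq",
        ])
    return cmds


# Hypothetical inputs: replace with your movie, barcode FASTA, lima output and model path.
plan = demux_first_plan(
    "movie.subreads.bam",
    "barcodes.fasta",
    {"sample1": "demux.sample1.bam"},
    "model/checkpoint",
)
for cmd in plan:
    print(shlex.join(cmd))
```

Because the full subreads BAM is passed to actc alongside one sample's CCS BAM, there is no need to split the subreads themselves by barcode.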

@peterdfields
Author

Thank you, @armintoepfer. In the end I used lima --split-bam, but your solution is probably better.

@amwenger

We also tested running lima after DeepConsensus, and it produced nearly identical results. So demultiplexing can be run either before or after DeepConsensus, whichever is more convenient.
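The alternative ordering described above, with demultiplexing after DeepConsensus, can be sketched the same way. Again this is a hypothetical dry run: file names are placeholders, and writing BAM from DeepConsensus (`--output=...bam`) is assumed to be supported by the installed version; otherwise its output format would need to be one that lima accepts.

```python
import shlex


def demux_last_plan(subreads, barcodes_fasta, checkpoint):
    """Dry run of: ccs -> actc -> DeepConsensus -> lima --split on the polished reads.

    File names are placeholders; BAM output from DeepConsensus is an assumption.
    """
    return [
        ["ccs", subreads, "ccs.bam"],
        # Without demultiplexing first, actc aligns subreads for every sample.
        ["actc", subreads, "ccs.bam", "subreads_to_ccs.bam"],
        ["deepconsensus", "run",
         "--subreads_to_ccs=subreads_to_ccs.bam",
         "--ccs_bam=ccs.bam",
         f"--checkpoint={checkpoint}",
         "--output=deepconsensus.bam"],
        # Demultiplex last, splitting the polished reads by barcode pair.
        ["lima", "--split", "deepconsensus.bam", barcodes_fasta, "demux.bam"],
    ]


for cmd in demux_last_plan("movie.subreads.bam", "barcodes.fasta", "model/checkpoint"):
    print(shlex.join(cmd))
```

The trade-off is visible in the sketch: this ordering runs actc and DeepConsensus once over all samples, whereas demultiplexing first lets you polish only the sample you care about, which is what this issue was after.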
