Refactor code for assembly, localization, and calling (#294)
Currently, kevlar optimizes the assembly, localization, and calling operations by composing several generator functions and invoking them with multiprocessing. Much of the benefit is lost, however: the calls to BWA and fermi-lite do take advantage of multithreading, but the coordinating code is limited by the GIL. In addition, localization makes a very large number of calls to BWA, which introduces a lot of overhead. This update takes a different approach to optimizing these operations. The `kevlar assemble`, `kevlar cutout`, and `kevlar call` commands now accept partitioned input files, and the `kevlar cutout` command (replacing `kevlar localize`) makes a single call to BWA rather than one call per partition. If multithreading is going to provide any benefit, it will be at the assembly stage, but that is out of scope for this update.
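The "single call rather than one call per partition" idea can be illustrated with a minimal sketch. This is not kevlar's implementation: the function names, the `part_id|name` tagging scheme, and the `fake_align` stub are all hypothetical, and the stub stands in for an external aligner such as BWA. The sketch assumes partition IDs contain no `|` character.

```python
from collections import defaultdict


def pool_partitions(partitions):
    """Flatten {part_id: [(name, seq), ...]} into one list of tagged records."""
    pooled = []
    for part_id, contigs in partitions.items():
        for name, seq in contigs:
            # Tag each record so hits can be routed back after alignment.
            pooled.append((f"{part_id}|{name}", seq))
    return pooled


def localize_once(partitions, align):
    """Invoke `align` exactly once on the pooled records, then regroup hits.

    `align` is a stand-in for one aligner invocation (hypothetical): it takes
    a list of (name, seq) records and yields (name, chrom, pos) hits.
    """
    hits_by_part = defaultdict(list)
    for tagged_name, chrom, pos in align(pool_partitions(partitions)):
        part_id, _, name = tagged_name.partition("|")
        hits_by_part[part_id].append((name, chrom, pos))
    return dict(hits_by_part)


calls = []  # number of records seen by each aligner invocation


def fake_align(records):
    """Hypothetical aligner stub: places every sequence at chr1:100."""
    calls.append(len(records))
    for name, _seq in records:
        yield name, "chr1", 100


partitions = {
    "1": [("contig1", "ACGT"), ("contig2", "GGCA")],
    "2": [("contig1", "TTAG")],
}
result = localize_once(partitions, fake_align)
```

The point of the design is that the per-invocation cost of the aligner (process start-up, index loading) is paid once for all partitions, while the partition label carried in each record name keeps the results separable afterwards.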
Showing 22 changed files with 488 additions and 57 deletions.
@@ -0,0 +1,46 @@
#!/usr/bin/env python
#
# -----------------------------------------------------------------------------
# Copyright (c) 2018 The Regents of the University of California
#
# This file is part of kevlar (http://github.com/dib-lab/kevlar) and is
# licensed under the MIT license: see LICENSE.
# -----------------------------------------------------------------------------

import re


def subparser(subparsers):
    """Define the `kevlar cutout` command-line interface."""

    desc = """\
    Given a reference genome and a set of contigs assembled from
    variant-spanning reads, retrieve the portions of the reference genome
    corresponding to the variants. NOTE: this command relies on the `bwa`
    program being in the PATH environmental variable.
    """

    subparser = subparsers.add_parser('cutout', description=desc)
    subparser.add_argument('-d', '--delta', type=int, metavar='D',
                           default=50, help='retrieve the genomic interval '
                           'from the reference by extending beyond the span '
                           'of all k-mer starting positions by D bp')
    subparser.add_argument('-p', '--part-id', type=str, metavar='ID',
                           help='only localize partition "ID" in the input')
    subparser.add_argument('-o', '--out', metavar='FILE', default='-',
                           help='output file; default is terminal (stdout)')
    subparser.add_argument('-z', '--seed-size', type=int, metavar='Z',
                           default=51, help='seed size; default is 51')
    subparser.add_argument('-x', '--max-diff', type=int, metavar='X',
                           default=None, help='split and report multiple '
                           'reference targets if the distance between two '
                           'seed matches is > X; by default, X is 3 times the '
                           'length of the longest contig')
    subparser.add_argument('--include', metavar='REGEX', type=re.escape,
                           help='discard alignments to any chromosomes whose '
                           'sequence IDs do not match the given pattern')
    subparser.add_argument('--exclude', metavar='REGEX', type=re.escape,
                           help='discard alignments to any chromosomes whose '
                           'sequence IDs match the given pattern')
    subparser.add_argument('contigs', help='assembled reads in Fasta format')
    subparser.add_argument('refr', help='BWA indexed reference genome')
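To see how these arguments behave once parsed, here is a small self-contained sketch. It uses an abbreviated copy of the interface (help strings omitted) attached to a hypothetical top-level parser; how kevlar itself wires subparsers together is not shown in this diff, and the file names `contigs.fa` and `refr.fa` are placeholders.

```python
import argparse
import re


def cutout_subparser(subparsers):
    """Abbreviated copy of the `kevlar cutout` interface (help text omitted)."""
    sub = subparsers.add_parser('cutout')
    sub.add_argument('-d', '--delta', type=int, metavar='D', default=50)
    sub.add_argument('-p', '--part-id', type=str, metavar='ID')
    sub.add_argument('-o', '--out', metavar='FILE', default='-')
    sub.add_argument('-z', '--seed-size', type=int, metavar='Z', default=51)
    sub.add_argument('-x', '--max-diff', type=int, metavar='X', default=None)
    sub.add_argument('--include', metavar='REGEX', type=re.escape)
    sub.add_argument('--exclude', metavar='REGEX', type=re.escape)
    sub.add_argument('contigs')
    sub.add_argument('refr')


# Hypothetical top-level parser, for illustration only.
parser = argparse.ArgumentParser(prog='kevlar')
subparsers = parser.add_subparsers(dest='cmd')
cutout_subparser(subparsers)

args = parser.parse_args(
    ['cutout', '--include', 'chr[0-9]+', 'contigs.fa', 'refr.fa']
)

# argparse applies `type` to the raw string, so the stored pattern is escaped:
# it matches the literal text "chr[0-9]+" but no longer matches "chr12".
literal_match = re.search(args.include, 'chr[0-9]+') is not None
regex_match = re.search(args.include, 'chr12') is not None
```

One consequence worth noting: because `re.escape` is used as the `type` for `--include` and `--exclude`, regex metacharacters in the argument are matched literally. The commit does not say whether that is the intended behavior, so treat the REGEX metavar with that caveat in mind.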
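The help text for `--delta` and `--max-diff` describes an interval computation that can be sketched as follows. This is an illustration based only on those help strings, not the code from this commit: the function names and the `seqlen` clipping parameter are assumptions, and the real command may compute interval ends differently (for example, by accounting for seed length).

```python
def cluster_positions(positions, max_diff):
    """Split sorted seed-match positions wherever the gap exceeds max_diff."""
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_diff:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


def cutout_intervals(positions, contig_lengths, delta=50, max_diff=None,
                     seqlen=None):
    """Return one (start, end) interval per cluster of seed matches."""
    if max_diff is None:
        # Default from the help text: 3 times the longest contig length.
        max_diff = 3 * max(contig_lengths)
    intervals = []
    for cluster in cluster_positions(positions, max_diff):
        # Extend the span of starting positions by delta on both sides.
        start = max(cluster[0] - delta, 0)
        end = cluster[-1] + delta
        if seqlen is not None:
            end = min(end, seqlen)  # hypothetical clipping at sequence end
        intervals.append((start, end))
    return intervals


# Longest contig is 200 bp, so the default max_diff is 600; the gap between
# 1040 and 9000 exceeds it and produces two separate targets.
targets = cutout_intervals([1000, 1010, 1040, 9000, 9020], [200, 150])
```

Passing a large explicit `max_diff` (say, 10000) to the same call would merge both clusters into a single interval, which is the trade-off the `-x` option exposes.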