Merge pull request #91 from dib-lab/docs/overhaul
Documentation overhaul
Showing 11 changed files with 244 additions and 41 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,10 +1,78 @@ | ||
Command-line interface | ||
====================== | ||
Comprehensive command-line interface reference | ||
============================================== | ||
|
||
The **kevlar** command-line interface is designed around a single command :code:`kevlar`. | ||
From this one command, a variety of tasks and procedures can be invoked using several *subcommands*. | ||
|
||
Once **kevlar** is installed, available subcommands can be listed by executing :code:`kevlar -h`. | ||
To see instructions for a specific subcommand, execute :code:`kevlar <subcommand> -h`, replacing :code:`<subcommand>` with the name of the subcommand (for example, :code:`kevlar count -h`). | ||
|
||
More information will be posted here soon! | ||
kevlar dump | ||
----------- | ||
|
||
.. argparse:: | ||
:module: kevlar.cli | ||
:func: parser | ||
:nodefault: | ||
:prog: kevlar | ||
:path: dump | ||
|
||
kevlar count | ||
------------ | ||
|
||
.. argparse:: | ||
:module: kevlar.cli | ||
:func: parser | ||
:nodefault: | ||
:prog: kevlar | ||
:path: count | ||
|
||
kevlar novel | ||
------------ | ||
|
||
.. argparse:: | ||
:module: kevlar.cli | ||
:func: parser | ||
:nodefault: | ||
:prog: kevlar | ||
:path: novel | ||
|
||
kevlar filter | ||
------------- | ||
|
||
.. argparse:: | ||
:module: kevlar.cli | ||
:func: parser | ||
:nodefault: | ||
:prog: kevlar | ||
:path: filter | ||
|
||
kevlar assemble | ||
--------------- | ||
|
||
.. argparse:: | ||
:module: kevlar.cli | ||
:func: parser | ||
:nodefault: | ||
:prog: kevlar | ||
:path: assemble | ||
|
||
kevlar localize | ||
--------------- | ||
|
||
.. argparse:: | ||
:module: kevlar.cli | ||
:func: parser | ||
:nodefault: | ||
:prog: kevlar | ||
:path: localize | ||
|
||
kevlar mutate | ||
------------- | ||
|
||
.. argparse:: | ||
:module: kevlar.cli | ||
:func: parser | ||
:nodefault: | ||
:prog: kevlar | ||
:path: mutate |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
File formats in **kevlar** | ||
========================== | ||
|
||
Although **kevlar** performs many operations on *k*-mers, read sequences are the primary currency of exchange between different stages of the analysis workflow. | ||
**kevlar** supports reading from and writing to Fasta and Fastq files, and treats these identically since it does not use any base call quality information. | ||
In most cases, **kevlar** can also automatically detect whether an input file is gzip-compressed and handle it accordingly; bzip2 compression is not supported. | ||
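One common way to implement this kind of detection is to look for the two-byte gzip magic number at the start of the file. The following is a minimal sketch of that idea, not **kevlar**'s actual implementation; the helper name ``open_maybe_gzip`` is hypothetical.

```python
import gzip


def open_maybe_gzip(path):
    """Open a text file for reading, transparently handling gzip compression.

    Hypothetical helper for illustration only; not part of the kevlar API.
    """
    # Every gzip stream begins with the two magic bytes 0x1f 0x8b.
    with open(path, 'rb') as probe:
        magic = probe.read(2)
    if magic == b'\x1f\x8b':
        return gzip.open(path, 'rt')
    return open(path, 'r')
```

Checking the file contents rather than the file extension means a compressed file is still handled correctly if it lacks a ``.gz`` suffix.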
|
||
Augmented sequences | ||
------------------- | ||
|
||
"Interesting *k*-mers" are putatively novel *k*-mers that are highly abundant in the proband/case sample(s) and effectively absent from control samples. | ||
To facilitate reading and writing these "interesting *k*-mers" along with the reads to which they belong, **kevlar** uses an *augmented* version of the Fasta and Fastq formats. | ||
Here is an example of an augmented Fastq file. | ||
|
||
.. code:: | ||
@read1 | ||
TTTTACCCGATGGGCGAGGTGAAATACTATGCCGATTTATTCTTACACAATTAAATTGCTAGTCCGGTTAGGGTTAGTTTGCGGCCTTCGTTCCAGCGCCGTGTT | ||
+ | ||
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB | ||
CCCGATGGGCGAGGTGAAA 18 1 0# | ||
AGGGTTAGTTTGCGGCCTT 11 0 0# | ||
@read2 | ||
AAGAGATTGTCGCTTGCCCCGTAAAGGAATTAGACCGGGCGACCAGAGCCTATTAGTAGCCCGCGCCTGTAGCACAAACGACTTTCGTACTATTATTAGACGTCG | ||
+ | ||
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB | ||
AGAGATTGTCGCTTGCCCC 14 0 1# | ||
GAGATTGTCGCTTGCCCCG 12 0 0# | ||
AGATTGTCGCTTGCCCCGT 14 0 0# | ||
@read3 | ||
GAGACCATAAACCAGCTCTTGGTACCGAAAGAACACCTATGAATAACCGTGAGTGCATGATTCCTGTGAAGAGATTGTCGCTTGCCCCGTAAAGGAATTAGACCG | ||
+ | ||
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB | ||
CTCTTGGTACCGAAAGAAC 19 1 0# | ||
AGAGATTGTCGCTTGCCCC 14 0 1# | ||
GAGATTGTCGCTTGCCCCG 12 0 0# | ||
AGATTGTCGCTTGCCCCGT 14 0 0# | ||
@read4 | ||
TCCGGTTAGGGTTAGTTTGCGGCCTTCGTTCCAGCGCCGTGTTGTTGCAATTTAATCCCGAGAAACCTCATGTAGCGGCTACTGGACCGCTGGGTAAGCTCAGAC | ||
+ | ||
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB | ||
AGGGTTAGTTTGCGGCCTT 11 0 0# | ||
As with a normal Fastq file, each record begins with four lines declaring the read sequence and its qualities. | ||
These four lines are followed by one or more lines listing the read's "interesting *k*-mers": each line shows the *k*-mer sequence followed by its abundance in each sample (case first, then controls), with a ``#`` as the final character. | ||
An augmented Fastq file can be converted to a normal Fastq file with a command like ``grep -v '#$' reads.augfastq > reads.fastq`` (the same works for augmented Fasta files). | ||
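The record layout described above can be read with a few lines of Python. This is a standalone sketch for illustration, assuming well-formed four-line Fastq records; it is not **kevlar**'s own parser (``kevlar.parse_augmented_fastx``), whose interface may differ, and the function name ``parse_augmented_fastq`` is hypothetical.

```python
def parse_augmented_fastq(lines):
    """Yield (name, sequence, quality, ikmers) tuples from augmented Fastq lines.

    Hypothetical helper for illustration; each entry of `ikmers` is a
    (kmer, abundances) pair, with abundances listed case first, then controls.
    """
    lines = [line.rstrip('\n') for line in lines if line.strip()]
    i = 0
    while i < len(lines):
        # A standard four-line Fastq record: header, sequence, '+', qualities.
        name = lines[i][1:]
        sequence = lines[i + 1]
        quality = lines[i + 3]
        i += 4
        # Zero or more annotation lines follow, each terminated by '#'.
        ikmers = []
        while i < len(lines) and lines[i].endswith('#'):
            fields = lines[i][:-1].split()
            ikmers.append((fields[0], [int(x) for x in fields[1:]]))
            i += 1
        yield name, sequence, quality, ikmers
```

Because the sketch consumes the four Fastq lines positionally, a quality string that happens to end in ``#`` is not mistaken for an annotation line.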
|
||
The functions ``kevlar.parse_augmented_fastx`` and ``kevlar.print_augmented_fastx`` are used internally to read and write augmented Fastq/Fasta files. | ||
However, these functions can easily be imported and called from third-party Python scripts as well. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
Introduction to **kevlar** | ||
========================== | ||
|
||
The **kevlar** software is a testbed for developing reference-free variant discovery methods for genomics. | ||
The initial focus of development is novel germline variant discovery in human trio / quad experimental designs. | ||
However, the method lends itself easily to more general experimental designs, which will get more attention and support in the near future. | ||
|
||
Although a reference genome is not required, it can be utilized to reduce data volume at an early stage in the workflow and reduce the computational demands of subsequent steps. | ||
|
||
**kevlar** is currently under heavy development and is not yet stable. | ||
That said, the core features of the software are reasonably well tested, and they leverage software components from `the khmer library <https://khmer.readthedocs.io>`_, which are very well tested. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
Running **kevlar** | ||
================== | ||
|
||
The **kevlar** software implements a Python library for genetic sequence and variant analysis. | ||
**kevlar** is primarily invoked via the command line, but it is also designed to integrate seamlessly into third-party Python programs. | ||
|
||
|
||
Command line interface | ||
---------------------- | ||
|
||
Once installed, the **kevlar** software can be invoked from the shell using the ``kevlar`` command. | ||
The **kevlar** command line interface (CLI) uses the *subcommand* pattern, in which a single master command supports several different operations by defining multiple subcommands (such as ``kevlar novel`` and ``kevlar partition``). | ||
Comprehensive documentation of the **kevlar** CLI is available :doc:`here <cli>`. | ||
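For readers unfamiliar with the subcommand pattern, the sketch below shows the general idea using Python's standard ``argparse`` module. It is a generic illustration, not **kevlar**'s actual parser: the program name ``toytool``, its subcommands, its options, and the ``MAINS`` dispatch table are all hypothetical.

```python
import argparse


def build_parser():
    """Build a toy parser with two subcommands (all names are hypothetical)."""
    parser = argparse.ArgumentParser(prog='toytool')
    subparsers = parser.add_subparsers(dest='cmd')

    count = subparsers.add_parser('count', help='count things')
    count.add_argument('infile')

    filt = subparsers.add_parser('filter', help='filter things')
    filt.add_argument('--min-abund', type=int, default=1)
    filt.add_argument('infile')
    return parser


def count_main(args):
    return 'counting {}'.format(args.infile)


def filter_main(args):
    return 'filtering {} at >= {}'.format(args.infile, args.min_abund)


# Map each subcommand name to the "main method" that implements it.
MAINS = {'count': count_main, 'filter': filter_main}


def run(arglist):
    """Parse the argument list and dispatch to the matching main method."""
    args = build_parser().parse_args(arglist)
    return MAINS[args.cmd](args)
```

Calling ``run(['filter', '--min-abund', '5', 'reads.fq'])`` parses the arguments exactly as the shell would and hands them to ``filter_main``.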
|
||
Starting with version 1.0, the CLI will be under `semantic versioning <http://semver.org/>`_. | ||
|
||
|
||
Python interface | ||
---------------- | ||
|
||
As a result of **kevlar**'s design to facilitate internal testing, the "main method" of each **kevlar** subcommand can easily be executed programmatically. | ||
The following example shows how to execute ``kevlar reaugment`` from a standalone Python program. | ||
|
||
.. code:: python | ||
import kevlar | ||
# Declare arguments just like you would on the command line | ||
arglist = ['reaugment', '-o', 'new.augfastq', 'old.augfastq', 'new.fastq'] | ||
args = kevlar.cli.parser().parse_args(arglist) | ||
kevlar.reaugment.main(args) | ||
Other units of code in the **kevlar** package may also be amenable to importing and executing programmatically. | ||
However, the code internals are not under semantic versioning and by necessity will be less stable and have poorer documentation. | ||
Have fun and knock yourself out, but be prepared for changes in internal behavior in subsequent releases! |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
Simulating variants with **kevlar** | ||
=================================== | ||
|
||
To facilitate testing, **kevlar** implements a simple command to apply simulated "mutations" to a reference sequence. | ||
We have used this internally to simulate data sets for testing, to verify that **kevlar** can recover the simulated "mutation" or variant. | ||
|
||
The command-line interface for ``kevlar mutate`` is very simple (for full details see `the CLI documentation <cli.html#kevlar-mutate>`_). | ||
|
||
The "mutation file" format is described here by way of example. | ||
|
||
.. code:: | ||
seq1 2345915 del 141 | ||
seq1 1022305 snv 2 | ||
seq1 2062327 inv 429 | ||
seq1 1234310 del 32 | ||
seq1 388954 ins TGTTTCCTTTCATACCCCACCAC | ||
seq1 2460047 snv 2 | ||
The mutation file is a plain-text tabular file with four fields separated by spaces or tabs. | ||
|
||
- sequence ID | ||
- variant starting position (0-based) | ||
- variant type (currently supported types: ``snv``, ``ins``, ``del``, and ``inv`` for single-nucleotide variants, insertions, deletions, and inversions) | ||
- value: the lexicographic offset for SNVs, the variant length for deletions and inversions, or the inserted sequence for insertions. |
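The four fields listed above can be parsed and applied to a sequence as in the following sketch. It is an illustration under stated assumptions, not the code behind ``kevlar mutate``: the function names are hypothetical, the "lexicographic offset" is assumed to shift a base that many steps through ``ACGT`` with wrap-around, insertions are assumed to be placed immediately before the given position, and inversions are assumed to reverse the segment without complementing it.

```python
NUCLEOTIDES = 'ACGT'


def parse_mutations(lines):
    """Yield (seqid, position, vartype, value) tuples from mutation-file lines."""
    for line in lines:
        if not line.strip():
            continue
        # Fields may be separated by spaces or tabs.
        seqid, position, vartype, value = line.split()
        if vartype in ('snv', 'del', 'inv'):
            value = int(value)
        elif vartype != 'ins':
            raise ValueError('unsupported variant type: ' + vartype)
        yield seqid, int(position), vartype, value


def apply_mutation(sequence, position, vartype, value):
    """Return a mutated copy of `sequence` (uppercase ACGT, 0-based position)."""
    if vartype == 'snv':
        # ASSUMPTION: shift the base `value` steps through ACGT, wrapping around.
        index = NUCLEOTIDES.index(sequence[position])
        base = NUCLEOTIDES[(index + value) % 4]
        return sequence[:position] + base + sequence[position + 1:]
    if vartype == 'ins':
        # ASSUMPTION: the new sequence is inserted just before `position`.
        return sequence[:position] + value + sequence[position:]
    if vartype == 'del':
        return sequence[:position] + sequence[position + value:]
    if vartype == 'inv':
        # ASSUMPTION: plain reversal of the segment, with no complementing.
        segment = sequence[position:position + value]
        return sequence[:position] + segment[::-1] + sequence[position + value:]
    raise ValueError('unsupported variant type: ' + vartype)
```

Applying several mutations to the same sequence changes downstream coordinates after any insertion or deletion, so a real implementation has to decide on an order of application; this sketch leaves that choice to the caller.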