crimson converts non-standard bioinformatics tool outputs to JSON or YAML.
Currently it can convert outputs of the following tools:
- FastQC (fastqc)
- FusionCatcher (fusioncatcher)
- samtools flagstat (flagstat)
- Picard metrics tools (picard)
- STAR log file (star)
- STAR-Fusion hits table (star-fusion)
- Variant Effect Predictor plain text output (vep)
Each conversion can be run in one of two ways: with crimson as a command-line tool, or
by importing the crimson library into your own Python program.
crimson is available on the Python Package Index and you can install it via pip:
$ pip install crimson
It is also available on BioConda, either through the conda package manager or as a Docker container.
For Docker execution, you may also use the GitHub Docker registry. This registry hosts the latest version, but does not host versions 1.1.0 or earlier.
$ docker pull ghcr.io/bow/crimson
The general command is crimson {tool_name}. By default, the output is written to stdout. For example, to use the picard parser, you would execute:
$ crimson picard /path/to/a/picard.metrics
You can also write the output to a file by specifying a file name. The following command writes the output to a file named converted.json:
$ crimson picard /path/to/a/picard.metrics converted.json
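Once converted.json exists, it can be read back with any JSON library. The following is a minimal sketch using only the Python standard library. The helper names load_converted and top_level_keys are hypothetical and not part of crimson, and the sketch makes no assumption about which keys a given parser emits.

```python
import json
from pathlib import Path
from typing import Any, List


def load_converted(path: str) -> Any:
    """Load a JSON file (for example one written by crimson) and decode it."""
    with Path(path).open(encoding="utf-8") as src:
        return json.load(src)


def top_level_keys(data: Any) -> List[str]:
    """Return the sorted top-level keys if the decoded value is a mapping.

    Any other decoded value (list, string, number) yields an empty list.
    """
    return sorted(data) if isinstance(data, dict) else []


if __name__ == "__main__":
    # Assumes converted.json was produced by the command shown above.
    converted = load_converted("converted.json")
    print(top_level_keys(converted))
```

Listing the top-level keys first is a cheap way to see what a parser produced before writing code against it.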
Some parsers may accept additional input formats. The FastQC parser, for example, also accepts a path to a FastQC output directory as its input:
$ crimson fastqc /path/to/a/fastqc/dir
It also accepts a path to a zipped result:
$ crimson fastqc /path/to/a/fastqc_result.zip
When in doubt, use the --help flag:
$ crimson --help # for the general help
$ crimson fastqc --help # for the parser-specific help, in this case FastQC
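If you want to drive the command-line tool from a script instead of importing the library, the invocations above can be wrapped with subprocess. This is a rough sketch, not part of crimson: build_crimson_command and run_crimson are hypothetical helper names, and run_crimson assumes the text written to stdout is JSON, which you should verify against crimson --help for your version.

```python
import json
import subprocess
from typing import Any, List, Optional


def build_crimson_command(
    tool_name: str,
    input_path: str,
    output_path: Optional[str] = None,
) -> List[str]:
    """Build the argument list for `crimson {tool_name} INPUT [OUTPUT]`."""
    cmd = ["crimson", tool_name, input_path]
    if output_path is not None:
        # With an output file name, crimson writes there instead of stdout.
        cmd.append(output_path)
    return cmd


def run_crimson(tool_name: str, input_path: str) -> Any:
    """Run crimson and decode its stdout.

    Assumption: the output written to stdout is JSON.
    Raises subprocess.CalledProcessError if crimson exits with an error.
    """
    result = subprocess.run(
        build_crimson_command(tool_name, input_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.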
The function to import is generally located at crimson.{tool_name}.parse. For example, to use the picard parser in your program, you can do:
from crimson import picard

# You can specify the input file name as a string or path-like object...
parsed = picard.parse("/path/to/a/picard.metrics")

# ... or a file handle
with open("/path/to/a/picard.metrics") as src:
    parsed = picard.parse(src)
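To get the same kind of output the command-line tool produces, the parsed value can be serialized yourself. The sketch below assumes that the value returned by picard.parse is built from plain dicts, lists, strings, and numbers, so that the standard json module can encode it. That is an assumption, not something stated above. The helper name to_json_text and the keys in the stand-in dictionary are hypothetical.

```python
import json
from typing import Any


def to_json_text(parsed: Any, indent: int = 2) -> str:
    """Serialize a parsed result to JSON text.

    Assumption: `parsed` contains only JSON-serializable values.
    json.dumps raises TypeError otherwise.
    """
    return json.dumps(parsed, indent=indent, sort_keys=True)


if __name__ == "__main__":
    # Stand-in for the value returned by picard.parse(...); the keys here
    # are made up for illustration and do not reflect crimson's real output.
    example = {"metrics": {"contents": [{"EXAMPLE_FIELD": 1}]}}
    print(to_json_text(example))
```

With crimson installed, the stand-in dictionary would be replaced by the parsed value from the example above.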
- Not enough tools use standard output formats.
- Writing and re-writing the same parsers across different scripts is not a productive way to spend the day.
Setting up a local development environment requires that you set up all of the supported Python versions. We use pyenv for this.
# Clone the repository and cd into it.
$ git clone https://github.com/bow/crimson
$ cd crimson
# Create your local development environment. This command also installs
# all supported Python versions using `pyenv`.
$ make dev
# Run the test and linter suite to verify the setup.
$ make lint test
# When in doubt, just run `make` without any arguments.
$ make
If you are interested, crimson accepts the following types of contribution:
- Documentation updates / tweaks (if anything seems unclear, feel free to open an issue)
- Bug reports
- Support for tools' outputs which can be converted to JSON or YAML
For any of these, feel free to open an issue in the issue tracker or submit a pull request.
crimson is BSD-licensed. Refer to the LICENSE file for the full license.