I'm thinking we should support 3 commands right now.
recalibrate
Given any number of BAM or FASTQ files, recalibrate them. We should support these options:
--use-oq to use the OQ tag in BAM files
--set-oq to set the OQ tag before calibrating the read.
--method to set the error detection method; gatk or lighter
potentially --model to set the calibration model
options for the lighter method
--prefix see below
--output see below
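To make the option list concrete, here is a rough sketch of how it could be parsed with Python's argparse. Only the flag names and the gatk/lighter choices come from the list above; the `recalibrate` program name, the help text, and the decision to leave `--method` without a default are assumptions, and `--model` is left out since it is still only a possibility.

```python
import argparse


def build_recalibrate_parser():
    """Hypothetical parser for the proposed recalibrate command."""
    parser = argparse.ArgumentParser(prog="recalibrate")
    parser.add_argument("inputs", nargs="+", help="BAM or FASTQ files to recalibrate")
    parser.add_argument("--use-oq", action="store_true",
                        help="use the OQ tag in BAM files")
    parser.add_argument("--set-oq", action="store_true",
                        help="set the OQ tag before calibrating the read")
    # No default is chosen here; the issue does not say which method wins.
    parser.add_argument("--method", choices=["gatk", "lighter"],
                        help="error detection method")
    parser.add_argument("--prefix", help="prefix for recalibrated output files")
    # May be repeated, once per input file.
    parser.add_argument("--output", action="append", default=[],
                        help="output path; repeat once per input")
    return parser


args = build_recalibrate_parser().parse_args(
    ["--use-oq", "--method", "lighter", "reads.bam"]
)
```

Here `reads.bam` is just a placeholder file name for the example call.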
Currently I think it's OK if we support 1 BAM or 1 FASTQ at a time. But it would be neat if we could support an intersection of read groups when multiple input files are specified on the command line. Perhaps such a scheme would have rules like:
Each RGID in the intersection of all the BAM headers is considered a read group
Each FASTQ file is given a RGID after all BAM RGIDs are resolved.
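The two rules above could be sketched roughly like this. The function works on RGID sets that have already been read from the BAM headers, so no BAM library is involved. The `fastq0`, `fastq1`, ... naming for FASTQ read groups is purely an assumption; the rules only say FASTQ files get their RGID after the BAM ones are resolved.

```python
def resolve_read_groups(bam_rgids, fastq_paths):
    """Return (shared BAM RGIDs, mapping of FASTQ path -> assigned RGID).

    bam_rgids: list of sets, one set of RGIDs per BAM header.
    fastq_paths: list of FASTQ file paths, in command-line order.
    """
    # Rule 1: only RGIDs present in every BAM header count as read groups.
    shared = set.intersection(*bam_rgids) if bam_rgids else set()

    # Rule 2: FASTQ files get an RGID afterwards, avoiding any BAM RGID.
    taken = set(shared)
    fastq_groups = {}
    counter = 0
    for path in fastq_paths:
        while f"fastq{counter}" in taken:  # hypothetical naming scheme
            counter += 1
        rgid = f"fastq{counter}"
        taken.add(rgid)
        fastq_groups[path] = rgid
    return sorted(shared), fastq_groups


shared, fastq_groups = resolve_read_groups([{"A", "B"}, {"B", "C"}], ["x.fq"])
```

Resolving the BAM RGIDs first means a FASTQ file can never silently collide with an existing read group.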
When multiple files are given at the command line, the output rules should probably be something like:
If only BAM files are specified, output a concatenated bam.
If only FASTQ files are specified, output a concatenated FASTQ.
If both BAM and FASTQ inputs are given, output to BAM (with unaligned reads from FASTQ files remaining unaligned)
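As a sketch, the default output type could be picked like this. Detecting the file type from the extension is an assumption made for the example (the real command might inspect file contents instead), and both function names are hypothetical.

```python
def classify_input(path):
    """Guess the input type from the file name (assumed convention)."""
    lowered = path.lower()
    if lowered.endswith(".bam"):
        return "bam"
    if lowered.endswith((".fastq", ".fq", ".fastq.gz", ".fq.gz")):
        return "fastq"
    raise ValueError(f"unrecognized input type: {path}")


def default_output_type(paths):
    """Apply the proposed rules: all FASTQ -> FASTQ, otherwise BAM."""
    if not paths:
        raise ValueError("at least one input file is required")
    kinds = {classify_input(p) for p in paths}
    # Only-BAM and mixed inputs both produce BAM; reads that came from
    # FASTQ files would stay unaligned in the mixed case.
    return "fastq" if kinds == {"fastq"} else "bam"
```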
However, these rules should be able to be overridden with some options like:
--prefix to output the recalibrated file with the same name and type as the input files, but with a prefix.
--output specified as many times as input files are specified, where the ordering specifies which input is associated with each output. That is, the recalibrated first input specified is output to the first output specified, the second input goes to the second output, and so on. An error should be thrown if --output is used too many or too few times. We could probably do a file type conversion but I think it would be OK if --output made an output of the same filetype as the corresponding input.
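A minimal sketch of how the two overrides might be resolved into input/output pairs. The helper name `plan_outputs` is hypothetical, and two choices here are assumptions not settled above: rejecting `--prefix` together with `--output`, and writing prefixed files next to their inputs.

```python
import os


def plan_outputs(inputs, prefix=None, outputs=None):
    """Return a list of (input, output) pairs, or None for the default rules."""
    if prefix is not None and outputs:
        # Assumption: the two overrides are mutually exclusive.
        raise ValueError("use either --prefix or --output, not both")
    if outputs:
        if len(outputs) != len(inputs):
            raise ValueError(
                f"--output given {len(outputs)} times for {len(inputs)} inputs"
            )
        # The i-th input is written to the i-th output.
        return list(zip(inputs, outputs))
    if prefix is not None:
        # Same name and type as the input, with the prefix prepended.
        return [
            (p, os.path.join(os.path.dirname(p), prefix + os.path.basename(p)))
            for p in inputs
        ]
    return None  # fall back to the concatenation rules


pairs = plan_outputs(["a.bam", "b.fastq"], prefix="recal.")
```

Returning None when neither option is given keeps the default concatenation behaviour separate from the override logic.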
benchmark
Related to #6. Given a BAM + VCF + BED truthset, and optionally a FASTQ file of reads, output a tsv file with data to be plotted by the plot command. This file should probably be something like
| predicted q | actual q | dataset | number of bases |
| --- | --- | --- | --- |
| 0 | 0 | conf_regions.bam | 300 |
| ... | ... | ... | ... |
Note that we shouldn't actually include the header, so the user can call benchmark with many different datasets and append to the output file each time.
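For illustration, appending headerless rows could look roughly like the sketch below. The function name and the `calibration.tsv` file name are hypothetical, and the second row (`other.bam`, 120) is made-up example data, not a real result.

```python
import csv
import os
import tempfile


def append_benchmark_rows(path, rows):
    """Append (predicted q, actual q, dataset, number of bases) rows, no header."""
    with open(path, "a", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(rows)


# Two separate benchmark runs appending to the same file.
with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "calibration.tsv")
    append_benchmark_rows(out, [(0, 0, "conf_regions.bam", 300)])
    append_benchmark_rows(out, [(0, 0, "other.bam", 120)])  # hypothetical data
    with open(out) as handle:
        lines = handle.read().splitlines()
```

Because no header is ever written, the file stays a valid table no matter how many runs append to it.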
plot
A convenience command to plot the calibration data output from a benchmark command.
Given a file with columns predicted q, actual q, dataset, and number of bases, plot either:
the calibration plot (actual vs predicted)
bin size plot
And possibly other types of plots. The plot type should be specified with a flag like --type.
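As a sketch of the data handling behind `--type`, the headerless file could be loaded into per-dataset series before handing them to a plotting library. The type names `calibration` and `binsize` are assumptions, as is reading "bin size plot" as number of bases against predicted q; the actual drawing is left out.

```python
import csv
import io
from collections import defaultdict


def load_series(handle, plot_type):
    """Group rows by dataset into (x, y) points for the chosen plot type."""
    if plot_type not in ("calibration", "binsize"):  # hypothetical type names
        raise ValueError(f"unknown plot type: {plot_type}")
    series = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # tolerate blank lines between appended runs
        predicted, actual, dataset, n_bases = row
        x = float(predicted)
        # calibration: actual vs predicted; binsize: bases per predicted q
        y = float(actual) if plot_type == "calibration" else int(n_bases)
        series[dataset].append((x, y))
    return dict(series)


data = io.StringIO("0\t0\tconf_regions.bam\t300\n")
calibration = load_series(data, "calibration")
```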