New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
R Chart/PDF output filenames escape '%' with '%%' #1671
Conversation
Hi @alanhoyle this looks good. Could you add a test case for this? Perhaps add to CollectInsertSizeMetrics a test case where it's generating a pdf with an '%' in the file name? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a test case
@gbggrant Please pardon my ignorance: I'm not much of a Java developer. What do I need to do to add a test case? Taking your suggestion, I assume I need to edit `CollectInsertSizeMetricsTest.java. (edit) Creating a new method How would I confirm that the test passes? Are those things automatically run if I do a Docker build on the source tree? For the record, I tested my code manually after building a local docker image, I am just unfamiliar with how to do automated testing in Java. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @alanhoyle looks good - thanks for adding the test.
It's a pity the substitution couldn't be done in a central place (like RExecutor) to get rid of duplicated code.
"LEVEL=LIBRARY", | ||
"LEVEL=READ_GROUP" | ||
}; | ||
Assert.assertEquals(runPicardCommandLine(args), 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to check that the outfile is created with the expected name? That is, still with a single %? (just to future proof this).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would. I'll add that in a new commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added below.
Oh, and the 'Branch' build is currently not working on forks, so if you're all set with this (that is you don't have anything to add), just say so and I'll do the merge to master. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added an additional test immediately below the selected lines.
"LEVEL=LIBRARY", | ||
"LEVEL=READ_GROUP" | ||
}; | ||
Assert.assertEquals(runPicardCommandLine(args), 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added below.
I don't think you can put it in As I said in the original request, it might make slightly more sense to incorporate this directly into the R scripts in the |
Yeah, I realized It couldn't go in RExecutor. Thanks for the good work. |
Before you merge this, I also have a few changes to the Should I commit/add those, do a separate pull request, or just move those into my global
|
I think we'd prefer that you not commit your '.gitignore' - those all seem fairly specific to your environment and so might cause others confusion. At the minimum it should go in another PR. |
Description
The output filenames for R charts now escape
%
characters to%%
It does this by taking running
.replaceAll("%", "%%")
on the PDF chart output filename that gets sent to the Rscript.This pull fixes #1412 (reported by someone else), and my forum post about the error.
The R
pdf()
function has an auto-numbering feature (onefile=false
) that Picard does not appear to use. However, even when not enabled, R sends the PDF output filename through asprintf()
function which uses the%
character as a prefix for formatting and variable replacement.If the filename requested by Picard contains a
%
character, this results in either Picard failing due to aProcessExecutor invalid 'file' argument
error, or (rarely) malformed filenames if the filename happens to conformsprintf()
formatting (e.g. if the filename happened to besomething-%dilution.pdf
it would result in an output filesomething-1ilution.pdf
)This issue can show up in the following Picard tools that use
RExecutor()
methodCollectAlignmentSummaryMetrics
CollectBaseDistributionByCycle
CollectGcBiasMetrics
CollectInsertSizeMetrics
CollectRnaSeqMetrics
CollectRrbsMetrics
CollectWgsMetricsWithNonZeroCoverage
MeanQualityByCycle
QualityScoreDistribution
Another way to solve this would be to make the change in the RScripts in
picard/src/main/resources/analysis
by doing something like this before thepdf(outputFile)
It also fixes a typo in a comment in the
Dockerfile
and removes trailing whitespace on a couple of lines.Checklist (never delete this)
Never delete this, it is our record that procedure was followed. If you find that for whatever reason one of the checklist points doesn't apply to your PR, you can leave it unchecked but please add an explanation below.
Content
Review
For more detailed guidelines, see https://github.com/broadinstitute/picard/wiki/Guidelines-for-pull-requests