The broadinstitute/picard docker image crashes with several tools because R/Rscript is not installed inside the image. #1197

Closed
1 task
alanhoyle opened this issue Jul 12, 2018 · 3 comments

Comments

@alanhoyle
Contributor

Bug Report

Affected tools

CollectInsertSizeMetrics, QualityScoreDistribution, MeanQualityByCycle, CollectBaseDistributionByCycle, CollectGcBiasMetrics, CollectRnaSeqMetrics, CollectRrbsMetrics, and CollectWgsMetricsWithNonZeroCoverage (found via a GitHub code search for 'RExecutor')

Affected version(s)

  • docker image broadinstitute/picard:latest (as of 2018-07-12)

Description

A number of the Picard metrics tools generate R code and run it through RExecutor, which invokes Rscript to produce graphics/PDFs. This works in many installations because R/Rscript is available on the user's PATH, but the distributed Docker image does not include R, so these tools crash without generating the requested graphics.
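
A quick way to tell whether an environment is affected is to check for Rscript on the PATH before running any of these tools. The following is only a minimal sketch: `have_cmd` is a hypothetical helper, not part of Picard, and the commented `docker run` line assumes the image ships a POSIX `sh`.

```shell
#!/bin/sh
# Hypothetical pre-flight helper (not part of Picard):
# succeeds only if the named executable is found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd Rscript; then
    echo "Rscript found: tools that call RExecutor should be able to plot"
else
    echo "Rscript not found: tools that call RExecutor are expected to fail" >&2
fi

# The same check could be run inside the image along these lines
# (assumption: the image contains sh; --entrypoint overrides its entrypoint):
#   docker run --rm --entrypoint sh broadinstitute/picard -c 'command -v Rscript'
```

If the check fails inside the container, the error reported below ("Cannot run program "Rscript": error=2") is the expected result.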

Steps to reproduce

Pull the latest broadinstitute/picard docker image and try to run any of the above tools that invoke RExecutor. Included below is an example with CollectInsertSizeMetrics.

$ docker pull broadinstitute/picard
Using default tag: latest
latest: Pulling from broadinstitute/picard
Digest: sha256:fa932a8d876653cb07d500ba1bdac6e55abcfd48f745d585bcacc70ae2fe95a6
Status: Image is up to date for broadinstitute/picard:latest

$ docker run -v $REF_PATH:/ref -v $SOURCE_DATA:/data -v $OUTPUT:/out broadinstitute/picard CollectInsertSizeMetrics INPUT=/data/input.bam ASSUME_SORTED=true H=/out/insert_size_histogram.pdf O=/out/insert_size_metrics.txt M=0.5
22:09:14.761 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/usr/picard/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Thu Jul 12 22:09:14 UTC 2018] CollectInsertSizeMetrics HISTOGRAM_FILE=/out/insert_size_histogram.pdf MINIMUM_PCT=0.5 INPUT=/data/input.bam OUTPUT=/out/insert_sizemetrics.txt ASSUME_SORTED=true REFERENCE_SEQUENCE=/ref/hg19_M.fasta    DEVIATIONS=10.0 METRIC_ACCUMULATION_LEVEL=[ALL_READS] INCLUDE_DUPLICATES=false STOP_AFTER=0 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Thu Jul 12 22:09:14 UTC 2018] Executing as root@bae7e1a2d91f on Linux 3.10.0-514.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_45-b14; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: unspecified-SNAPSHOT
INFO	2018-07-12 22:09:26	SinglePassSamProgram	Processed     1,000,000 records.  Elapsed time: 00:00:11s.  Time for last 1,000,000:    7s.  Last read position: chr1:26,228,117
INFO	2018-07-12 22:09:32	SinglePassSamProgram	Processed     2,000,000 records.  Elapsed time: 00:00:16s.  Time for last 1,000,000:    5s.  Last read position: chr1:40,363,086
[... deleted many INFO lines ...]
INFO	2018-07-12 22:15:56	SinglePassSamProgram	Processed    69,000,000 records.  Elapsed time: 00:06:40s.  Time for last 1,000,000:    4s.  Last read position: chrX:135,631,002
INFO	2018-07-12 22:16:58	RExecutor	Executing R script via command: Rscript /tmp/root/script448471994261648948.R /out/insert_size_metrics.txt /out/insert_size_histogram.pdf input.bam
[Thu Jul 12 22:16:58 UTC 2018] picard.analysis.CollectInsertSizeMetrics done. Elapsed time: 7.74 minutes.
Runtime.totalMemory()=11206131712
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMException: Unexpected exception executing [Rscript /tmp/root/script448471994261648948.R /out/insert_size_metrics.txt /out/insert_size_histogram.pdf input.bam]
	at htsjdk.samtools.util.ProcessExecutor.execute(ProcessExecutor.java:111)
	at htsjdk.samtools.util.ProcessExecutor.execute(ProcessExecutor.java:87)
	at picard.util.RExecutor.executeFromFile(RExecutor.java:78)
	at picard.util.RExecutor.executeFromClasspath(RExecutor.java:59)
	at picard.analysis.CollectInsertSizeMetrics.finish(CollectInsertSizeMetrics.java:168)
	at picard.analysis.SinglePassSamProgram.makeItSo(SinglePassSamProgram.java:164)
	at picard.analysis.SinglePassSamProgram.doWork(SinglePassSamProgram.java:84)
	at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:282)
	at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
	at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
Caused by: java.io.IOException: Cannot run program "Rscript": error=2, No such file or directory
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
	at java.lang.Runtime.exec(Runtime.java:620)
	at java.lang.Runtime.exec(Runtime.java:485)
	at htsjdk.samtools.util.ProcessExecutor.execute(ProcessExecutor.java:102)
	... 9 more
Caused by: java.io.IOException: error=2, No such file or directory
	at java.lang.UNIXProcess.forkAndExec(Native Method)
	at java.lang.UNIXProcess.<init>(UNIXProcess.java:248)
	at java.lang.ProcessImpl.start(ProcessImpl.java:134)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
	... 12 more

Expected behavior

The histogram PDF is generated along with the text output files.

Actual behavior

The text file is generated, but Java/Picard crashes with an IOException (wrapped in a SAMException) because Rscript is not available in the Docker image.
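
Because the metrics text file is still written, the failure can go unnoticed if only that file is inspected. As a rough sketch, a post-run check along the following lines would flag the missing PDF; `check_outputs` is a hypothetical helper, and the example paths assume the `$OUTPUT` host directory mounted at `/out` in the command above.

```shell
#!/bin/sh
# Hypothetical post-run check (not part of Picard): report which of the two
# expected outputs exist and are non-empty. Returns non-zero if either is
# missing or empty.
check_outputs() {
    metrics=$1
    histogram=$2
    status=0
    for f in "$metrics" "$histogram"; do
        if [ -s "$f" ]; then
            echo "ok: $f"
        else
            echo "missing or empty: $f"
            status=1
        fi
    done
    return $status
}

# Example call, assuming the host-side output directory from the run above:
#   check_outputs "$OUTPUT/insert_size_metrics.txt" "$OUTPUT/insert_size_histogram.pdf"
```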

@alanhoyle
Contributor Author

alanhoyle commented Jul 12, 2018

I've submitted pull request #1198 which I think fixes this issue:

Compare this run (the "aphoid/picard" Docker image contains only that change):

$ docker run --rm -v $HOME/work/:/data aphoid/picard CollectInsertSizeMetrics I=/data/testdata/normal.bam H=/data/testdata/insert_histogram.pdf STOP_AFTER=1000 O=/data/testdata/insert_data.txt
23:14:53.574 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/usr/picard/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Thu Jul 12 23:14:53 UTC 2018] CollectInsertSizeMetrics HISTOGRAM_FILE=/data/testdata/insert_histogram.pdf INPUT=/data/testdata/normal.bam OUTPUT=/data/testdata/insert_data.txt STOP_AFTER=1000    DEVIATIONS=10.0 MINIMUM_PCT=0.05 METRIC_ACCUMULATION_LEVEL=[ALL_READS] INCLUDE_DUPLICATES=false ASSUME_SORTED=true VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Thu Jul 12 23:14:53 UTC 2018] Executing as root@2c0b96e43796 on Linux 4.9.87-linuxkit-aufs amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_45-b14; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.18.9-9-g8e25161-SNAPSHOT
INFO	2018-07-12 23:14:58	RExecutor	Executing R script via command: Rscript /tmp/root/script7433990166872317553.R /data/testdata/insert_data.txt /data/testdata/insert_histogram.pdf normal.bam
INFO	2018-07-12 23:14:58	ProcessExecutor	null device
INFO	2018-07-12 23:14:58	ProcessExecutor	          1
[Thu Jul 12 23:14:58 UTC 2018] picard.analysis.CollectInsertSizeMetrics done. Elapsed time: 0.09 minutes.
Runtime.totalMemory()=704118784

with the same run using the stock broadinstitute/picard image:

$ docker run --rm -v $HOME/work/:/data broadinstitute/picard CollectInsertSizeMetrics I=/data/testdata/normal.bam H=/data/testdata/insert_histogram.pdf STOP_AFTER=1000 O=/data/testdata/insert_data.txt
23:15:10.281 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/usr/picard/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Thu Jul 12 23:15:10 UTC 2018] CollectInsertSizeMetrics HISTOGRAM_FILE=/data/testdata/insert_histogram.pdf INPUT=/data/testdata/normal.bam OUTPUT=/data/testdata/insert_data.txt STOP_AFTER=1000    DEVIATIONS=10.0 MINIMUM_PCT=0.05 METRIC_ACCUMULATION_LEVEL=[ALL_READS] INCLUDE_DUPLICATES=false ASSUME_SORTED=true VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Thu Jul 12 23:15:10 UTC 2018] Executing as root@20d9ff115e58 on Linux 4.9.87-linuxkit-aufs amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_45-b14; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: unspecified-SNAPSHOT
INFO	2018-07-12 23:15:15	RExecutor	Executing R script via command: Rscript /tmp/root/script6685120036144437125.R /data/testdata/insert_data.txt /data/testdata/insert_histogram.pdf normal.bam
[Thu Jul 12 23:15:16 UTC 2018] picard.analysis.CollectInsertSizeMetrics done. Elapsed time: 0.10 minutes.
Runtime.totalMemory()=428343296
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMException: Unexpected exception executing [Rscript /tmp/root/script6685120036144437125.R /data/testdata/insert_data.txt /data/testdata/insert_histogram.pdf normal.bam]
	at htsjdk.samtools.util.ProcessExecutor.execute(ProcessExecutor.java:111)
	at htsjdk.samtools.util.ProcessExecutor.execute(ProcessExecutor.java:87)
	at picard.util.RExecutor.executeFromFile(RExecutor.java:78)
	at picard.util.RExecutor.executeFromClasspath(RExecutor.java:59)
	at picard.analysis.CollectInsertSizeMetrics.finish(CollectInsertSizeMetrics.java:168)
	at picard.analysis.SinglePassSamProgram.makeItSo(SinglePassSamProgram.java:164)
	at picard.analysis.SinglePassSamProgram.doWork(SinglePassSamProgram.java:84)
	at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:282)
	at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
	at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
Caused by: java.io.IOException: Cannot run program "Rscript": error=2, No such file or directory
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
	at java.lang.Runtime.exec(Runtime.java:620)
	at java.lang.Runtime.exec(Runtime.java:485)
	at htsjdk.samtools.util.ProcessExecutor.execute(ProcessExecutor.java:102)
	... 9 more
Caused by: java.io.IOException: error=2, No such file or directory
	at java.lang.UNIXProcess.forkAndExec(Native Method)
	at java.lang.UNIXProcess.<init>(UNIXProcess.java:248)
	at java.lang.ProcessImpl.start(ProcessImpl.java:134)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
	... 12 more

@bartgrantham

Is this resolved? It looks like the Dockerfile has r-base included now, and the "latest" tagged image at hub.docker.com seems to have it.

@alanhoyle
Contributor Author

I think the issue is resolved.
