Data access via DRS URIs frequently fails in Ammonite script #5069

Open
mikebaumann opened this issue Jul 11, 2019 · 4 comments

@mikebaumann

Terra/Cromwell workflows using data that has been exported from the UChicago Gen3/Windmill system or the HCA Data Browser with DRS URI data references frequently (always?) fail in the Ammonite script that performs the DRS resolution/localization.

Failed workflows using DRS URI data references most often show error messages and logs like the ones below. These examples are from the Terra workspace firecloud-cgl/20190701 Test, in which a small number of files were exported from Windmill to Terra and an md5sum workflow was exported from Dockstore. The same error messages and log entries have been seen in many other similar workspaces over the last few months; I have no data from before that.

@abaumann has recently been actively involved in investigating this problem and has access to this workspace.

Task ga4ghMd5.md5:NA:1 failed. The job was stopped before the command finished. PAPI error code 10. The assigned worker has failed to complete the operation	


2019/07/01 22:54:02 Starting container setup.
2019/07/01 22:54:11 Done container setup.
2019/07/01 22:54:17 Starting localization.
2019/07/01 22:54:24 Localizing input dos://dg.4503/1406db81-91d7-4e57-ada3-40487199ed06 -> /cromwell_root/topmed-irc-share/genomes/NWD522711.b38.irc.v1.cram
Compiling (synthetic)/ammonite/predef/interpBridge.sc

or

Task ga4ghMd5.md5:NA:1 failed. The job was stopped before the command finished. PAPI error code 10. The assigned worker has failed to complete the operation	

2019/07/10 19:25:06 Starting container setup.
2019/07/10 19:25:14 Done container setup.
2019/07/10 19:25:20 Starting localization.
2019/07/10 19:25:26 Localizing input dos://dg.4503/1cba8116-a3d1-41e6-aab3-428e4f42e916 -> /cromwell_root/topmed-irc-share/genomes/NWD735861.b38.irc.v1.cram.crai
Compiling (synthetic)/ammonite/predef/interpBridge.sc
Compiling (synthetic)/ammonite/predef/DefaultPredef.sc
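
The two logs above stop at different Ammonite compilation steps. When comparing many failed runs, it can help to extract the last recognizable step from each log. The following is a minimal Python sketch of that idea; the function name `last_step` and the two regular expressions are my own assumptions based only on the log lines shown in this issue, not part of Cromwell or Ammonite.

```python
import re

# Ammonite progress lines, e.g.
# "Compiling (synthetic)/ammonite/predef/interpBridge.sc"
COMPILING = re.compile(r"^Compiling\s+(\S+)\s*$")
# Timestamped worker lines, e.g.
# "2019/07/01 22:54:17 Starting localization."
TIMESTAMPED = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (.*)$")


def last_step(log_text):
    """Return (kind, detail) for the last recognizable line, or None."""
    last = None
    for raw in log_text.splitlines():
        line = raw.strip()
        m = COMPILING.match(line)
        if m:
            last = ("compiling", m.group(1))
            continue
        m = TIMESTAMPED.match(line)
        if m:
            last = ("event", m.group(2))
    return last


example = """\
2019/07/01 22:54:17 Starting localization.
2019/07/01 22:54:24 Localizing input dos://dg.4503/1406db81-91d7-4e57-ada3-40487199ed06 -> /cromwell_root/topmed-irc-share/genomes/NWD522711.b38.irc.v1.cram
Compiling (synthetic)/ammonite/predef/interpBridge.sc
"""
print(last_step(example))
```

Running this over each failed task log would show whether runs consistently stop at the same compilation step or at different ones.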

In some cases, additional information is logged, as in the following example, where Ammonite failed to resolve its dependencies:

2019/07/10 18:29:15 Starting container setup.
2019/07/10 18:29:24 Done container setup.
2019/07/10 18:29:31 Starting localization.
2019/07/10 18:29:37 Localizing input dos://dg.4503/cbdb14f5-cc89-4481-bad7-2ef8f36a1290 -> /cromwell_root/topmed-irc-share/genomes/NWD127112.b38.irc.v1.cram
Compiling (synthetic)/ammonite/predef/interpBridge.sc
Compiling (synthetic)/ammonite/predef/DefaultPredef.sc
Compiling /scripts/dosUrlLocalizer.sc
Downloading https://repo1.maven.org/maven2/org/http4s/http4s-dsl_2.12/0.18.17/http4s-dsl_2.12-0.18.17.pom.sha1
Downloading https://repo1.maven.org/maven2/org/http4s/http4s-dsl_2.12/0.18.17/http4s-dsl_2.12-0.18.17.pom
https://repo1.maven.org/maven2/org/http4s/http4s-dsl_2.12/0.18.17/http4s-dsl_… 
https://repo1.maven.org/maven2/org/http4s/http4s-dsl_2.12/0.18.17/http4s-dsl_… 

Downloaded https://repo1.maven.org/maven2/org/http4s/http4s-dsl_2.12/0.18.17/http4s-dsl_2.12-0.18.17.pom
Downloaded
...
https://oss.sonatype.org/content/repositories/snapshots/org/apache/httpcompon… 
https://oss.sonatype.org/content/repositories/snapshots/org/apache/httpcompon… 

Failed to resolve ivy dependencies:
  org.apache.httpcomponents:httpcomponents-core:4.0.1 
    not found: /root/.ivy2/local/org.apache.httpcomponents/httpcomponents-core/4.0.1/ivys/ivy.xml
    download error: Caught java.net.UnknownHostException: repo1.maven.org (repo1.maven.org) while downloading https://repo1.maven.org/maven2/org/apache/httpcomponents/httpcomponents-core/4.0.1/httpcomponents-core-4.0.1.pom
    download error: Caught java.net.UnknownHostException: oss.sonatype.org (oss.sonatype.org) while downloading https://oss.sonatype.org/content/repositories/snapshots/org/apache/httpcomponents/httpcomponents-core/4.0.1/httpcomponents-core-4.0.1.pom
  org.apache.commons:commons-parent:5 
    not found: /root/.ivy2/local/org.apache.commons/commons-parent/5/ivys/ivy.xml
    download error: Caught java.net.UnknownHostException: repo1.maven.org (repo1.maven.org) while downloading https://repo1.maven.org/maven2/org/apache/commons/commons-parent/5/commons-parent-5.pom
    download error: Caught java.net.UnknownHostException: oss.sonatype.org (oss.sonatype.org) while downloading https://oss.sonatype.org/content/repositories/snapshots/org/apache/commons/commons-parent/5/commons-parent-5.pom
...
CommandException: No URLs matched: /cromwell_root/stderr
2019/07/10 18:38:31 Delocalizing output /cromwell_root/rc -> gs://fc-94bba050-4ef1-42fb-8436-cd89da17ec53/306ddffc-0ee6-46ff-ac3e-5069668a0eb0/ga4ghMd5/a14f0b9d-839c-4684-863c-93d0e8e2d527/call-md5/rc
2019/07/10 18:38:32 rm -f $HOME/.config/gcloud/gce && gsutil   cp /cromwell_root/rc gs://fc-94bba050-4ef1-42fb-8436-cd89da17ec53/306ddffc-0ee6-46ff-ac3e-5069668a0eb0/ga4ghMd5/a14f0b9d-839c-4684-863c-93d0e8e2d527/call-md5/ failed
CommandException: No URLs matched: /cromwell_root/rc
2019/07/10 18:38:32 Waiting 5 seconds and retrying
2019/07/10 18:38:38 rm -f $HOME/.config/gcloud/gce && gsutil   cp /cromwell_root/rc gs://fc-94bba050-4ef1-42fb-8436-cd89da17ec53/306ddffc-0ee6-46ff-ac3e-5069668a0eb0/ga4ghMd5/a14f0b9d-839c-4684-863c-93d0e8e2d527/call-md5/ failed
CommandException: No URLs matched: /cromwell_root/rc
2019/07/10 18:38:38 Waiting 5 seconds and retrying
2019/07/10 18:38:44 rm -f $HOME/.config/gcloud/gce && gsutil   cp /cromwell_root/rc gs://fc-94bba050-4ef1-42fb-8436-cd89da17ec53/306ddffc-0ee6-46ff-ac3e-5069668a0eb0/ga4ghMd5/a14f0b9d-839c-4684-863c-93d0e8e2d527/call-md5/ failed
CommandException: No URLs matched: /cromwell_root/rc
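
Every download error in the log above is a `java.net.UnknownHostException`, which in Java means the hostname could not be resolved. That suggests, though does not prove, a DNS or network problem on the worker rather than a missing artifact. A small Python sketch for summarizing such a block follows; the function name `summarize_ivy_failures` and the regular expressions are assumptions derived from the log format shown here, not from any Ammonite or Ivy API.

```python
import re
from collections import defaultdict

# A line that consists only of a Maven coordinate: group:artifact:version
COORD = re.compile(r"^\s*([\w.\-]+:[\w.\-]+:[\w.\-]+)\s*$")
# The hostname reported by an UnknownHostException
UNKNOWN_HOST = re.compile(r"java\.net\.UnknownHostException: (\S+)")


def summarize_ivy_failures(log_text):
    """Map each unresolved coordinate to the sorted hosts that failed to resolve."""
    failures = defaultdict(set)
    current = None
    in_block = False
    for line in log_text.splitlines():
        if line.startswith("Failed to resolve ivy dependencies"):
            in_block = True
            continue
        if not in_block:
            continue
        m = COORD.match(line)
        if m:
            current = m.group(1)
            failures[current]  # record the coordinate even without host errors
            continue
        m = UNKNOWN_HOST.search(line)
        if m and current is not None:
            failures[current].add(m.group(1))
    return {coord: sorted(hosts) for coord, hosts in failures.items()}


example = """\
Failed to resolve ivy dependencies:
  org.apache.httpcomponents:httpcomponents-core:4.0.1
    not found: /root/.ivy2/local/org.apache.httpcomponents/httpcomponents-core/4.0.1/ivys/ivy.xml
    download error: Caught java.net.UnknownHostException: repo1.maven.org (repo1.maven.org) while downloading https://repo1.maven.org/maven2/org/apache/httpcomponents/httpcomponents-core/4.0.1/httpcomponents-core-4.0.1.pom
    download error: Caught java.net.UnknownHostException: oss.sonatype.org (oss.sonatype.org) while downloading https://oss.sonatype.org/content/repositories/snapshots/org/apache/httpcomponents/httpcomponents-core/4.0.1/httpcomponents-core-4.0.1.pom
  org.apache.commons:commons-parent:5
    not found: /root/.ivy2/local/org.apache.commons/commons-parent/5/ivys/ivy.xml
    download error: Caught java.net.UnknownHostException: repo1.maven.org (repo1.maven.org) while downloading https://repo1.maven.org/maven2/org/apache/commons/commons-parent/5/commons-parent-5.pom
"""
summary = summarize_ivy_failures(example)
for coord, hosts in summary.items():
    print(coord, "->", ", ".join(hosts))
```

If the same hosts appear for every coordinate, as they do in this excerpt, the failure is more likely environmental than specific to one dependency.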
@mikebaumann
Author

Although I said "frequently fails" in the title of this issue, "always fails" would be more correct.
I have not seen a single Terra workflow using DRS URIs get past the localization step in multiple attempts over the last few weeks. This is true even for the md5sum workflow run on a single small file.

This reportedly worked for Commons in 2018, but I have not seen it work since I started trying again in 2019.

@ruchim
Contributor

ruchim commented Jul 15, 2019

Hey @mikebaumann -- for the workflow/task you're running, how many input files are there?

When you say it worked for the "commons" in 2018, how was this tested? Just for my understanding of the before/after.

@mikebaumann
Author

Hi @ruchim!

Regarding our current test setup:
We (Brian O., Alex B. and I) are currently using a very minimal test configuration:

Workflow:
GA4GH md5sum from Dockstore
https://dockstore.org/workflows/github.com/briandoconnor/dockstore-workflow-md5sum/dockstore-wdl-workflow-md5sum:1.4.0

Single File:
Source: UChicago Gen3 Data STAGE crai file
DRS URL: dos://dg.4503/2132c569-06e7-474c-8806-93aa116c5d1c
Size: 1.49 MB
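
For anyone reproducing this, it may be worth confirming that the input value is a well-formed `dos://` URI before it reaches the localizer. The following Python sketch splits such a URI into its parts using only the standard library; the helper name `split_data_uri` and the choice to accept both `dos` and `drs` schemes are my assumptions, and this is not how the localizer script itself resolves the URI.

```python
from urllib.parse import urlparse


def split_data_uri(uri):
    """Split a dos:// or drs:// URI into (scheme, authority, object_id)."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("dos", "drs"):
        raise ValueError(f"not a DOS/DRS URI: {uri!r}")
    object_id = parsed.path.lstrip("/")
    if not parsed.netloc or not object_id:
        raise ValueError(f"missing authority or object id: {uri!r}")
    return parsed.scheme, parsed.netloc, object_id


scheme, authority, object_id = split_data_uri(
    "dos://dg.4503/2132c569-06e7-474c-8806-93aa116c5d1c"
)
print(scheme, authority, object_id)
```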

I just ran this test configuration from scratch, starting with a new workspace, and it failed like all the others:

Error:

Task ga4ghMd5.md5:NA:1 failed. The job was stopped before the command finished. PAPI error code 10. The assigned worker has failed to complete the operation	

Log:

2019/07/16 20:23:02 Starting container setup.
2019/07/16 20:23:10 Done container setup.
2019/07/16 20:23:16 Starting localization.
2019/07/16 20:23:22 Localizing input dos://dg.4503/2132c569-06e7-474c-8806-93aa116c5d1c -> /cromwell_root/topmed-irc-share/genomes/NWD844894.b38.irc.v1.cram.crai
Compiling (synthetic)/ammonite/predef/interpBridge.sc

The name of this workspace is mbaumann test md5sum 20190716 and I have shared it with you as Owner, in case you would like to investigate.

Regarding successful runs in Commons in 2018:
The last reported success that I am aware of was by Moran Cabili (then Broad) in mid-2018, when she did demos of obtaining data from UChicago (Windmill) and UCSC (Boardwalk).
I didn't actually run the workflow myself.
Some of the demo workspaces from that time are still available in Terra. I can access them but don't have permission to share them, and I don't know whether you can access them. One such workspace is:
Team Calcium July 1 Demo - Boardwalk-Windmill_WS

@mikebaumann
Author

Here is another Broad issue that is closely related to this one:
https://broadworkbench.atlassian.net/browse/BA-5821
