
"random" java.nio.file.NoSuchFileException in logs referring to zipped import files #1959

Closed
delocalizer opened this issue Feb 7, 2017 · 1 comment
Labels
Needs Triage: Ticket needs further investigation and refinement prior to moving to milestones

Comments

@delocalizer (Contributor)

This one's a bit odd. Submitting workflows with imports to a release 24 server intermittently produces errors in the logs.

A simple workflow:
cat goodImport.wdl:

import "bar.wdl" as doIt

workflow testMe {
	call doIt.doIt
}

And a zip of WDL tasks in a folder, only one of which (bar.wdl) is the actual dependency:

[conradL@qimr13054 ~]$ unzip -l foo.zip
Archive:  foo.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  02-07-2017 14:46   foo/
       99  02-07-2017 14:45   foo/bar7.wdl
       98  02-07-2017 14:00   foo/bar.wdl
       99  02-07-2017 14:46   foo/bar8.wdl
       99  02-07-2017 14:45   foo/bar2.wdl
      100  02-07-2017 14:45   foo/bar10.wdl
       99  02-07-2017 14:46   foo/bar9.wdl
       99  02-07-2017 14:45   foo/bar1.wdl
       99  02-07-2017 14:45   foo/bar3.wdl
       99  02-07-2017 14:45   foo/bar5.wdl
       99  02-07-2017 14:45   foo/bar4.wdl
       99  02-07-2017 14:45   foo/bar6.wdl
---------                     -------
     1089                     12 files

The content of each of the task dependencies is just a variation on:

[conradL@qimr13054 ~]$ cat foo/bar.wdl 
task doIt {
	command { echo "Help, world!" }
	output { String message = read_string(stdout()) }
}

Submit to the server:

curl http://localhost:8000/api/workflows/V1 -FwdlSource=@goodImport.wdl -FwdlDependencies=@foo.zip

Now, tailing the server logs: the first time this is submitted, the workflow succeeds and the log shows nothing out of the ordinary. But "sometimes" (meaning I can submit it 5 times and not see it, or twice and see it both times) I see this:

2017-02-07 15:01:10,781 cromwell-system-akka.dispatchers.service-dispatcher-30 ERROR - Sending Actor[akka://cromwell-system/user/cromwell-service/WorkflowManagerActor/WorkflowActor-84a51727-cfda-41e7-a03c-9e3af35eb0dc/MaterializeWorkflowDescriptorActor#972983209] failure message MetadataPutFailed(PutMetadataAction(Stream(MetadataEvent(MetadataKey(84a51727-cfda-41e7-a03c-9e3af35eb0dc,None,submittedFiles:imports:bar6.wdl),Some(MetadataValue(task doIt6 {
	command { echo "Help, world!" }
	output { String message = read_string(stdout()) }
},MetadataString)),2017-02-07T15:01:10.772+10:00), MetadataEvent(MetadataKey(84a51727-cfda-41e7-a03c-9e3af35eb0dc,None,submittedFiles:imports:bar4.wdl),Some(MetadataValue(task doIt4 {
	command { echo "Help, world!" }
	output { String message = read_string(stdout()) }
},MetadataString)),2017-02-07T15:01:10.774+10:00), MetadataEvent(MetadataKey(84a51727-cfda-41e7-a03c-9e3af35eb0dc,None,submittedFiles:imports:bar5.wdl),Some(MetadataValue(task doIt5 {
	command { echo "Help, world!" }
	output { String message = read_string(stdout()) }
},MetadataString)),2017-02-07T15:01:10.775+10:00), MetadataEvent(MetadataKey(84a51727-cfda-41e7-a03c-9e3af35eb0dc,None,submittedFiles:imports:bar3.wdl),Some(MetadataValue(task doIt3 {
	command { echo "Help, world!" }
	output { String message = read_string(stdout()) }
},MetadataString)),2017-02-07T15:01:10.776+10:00), MetadataEvent(MetadataKey(84a51727-cfda-41e7-a03c-9e3af35eb0dc,None,submittedFiles:imports:bar1.wdl),Some(MetadataValue(task doIt1 {
	command { echo "Help, world!" }
	output { String message = read_string(stdout()) }
},MetadataString)),2017-02-07T15:01:10.776+10:00), MetadataEvent(MetadataKey(84a51727-cfda-41e7-a03c-9e3af35eb0dc,None,submittedFiles:imports:bar9.wdl),Some(MetadataValue(task doIt9 {
	command { echo "Help, world!" }
	output { String message = read_string(stdout()) }
},MetadataString)),2017-02-07T15:01:10.777+10:00), MetadataEvent(MetadataKey(84a51727-cfda-41e7-a03c-9e3af35eb0dc,None,submittedFiles:imports:bar10.wdl),Some(MetadataValue(task doIt10 {
	command { echo "Help, world!" }
	output { String message = read_string(stdout()) }
},MetadataString)),2017-02-07T15:01:10.778+10:00), MetadataEvent(MetadataKey(84a51727-cfda-41e7-a03c-9e3af35eb0dc,None,submittedFiles:imports:bar2.wdl),Some(MetadataValue(task doIt2 {
	command { echo "Help, world!" }
	output { String message = read_string(stdout()) }
},MetadataString)),2017-02-07T15:01:10.779+10:00), ?)),java.nio.file.NoSuchFileException: /tmp/7849235605615896249.zip1398073512390398444/foo/bar8.wdl)
java.nio.file.NoSuchFileException: /tmp/7849235605615896249.zip1398073512390398444/foo/bar8.wdl
	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
	at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
	at java.nio.file.Files.newByteChannel(Files.java:361)
	at java.nio.file.Files.newByteChannel(Files.java:407)
	at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384)
	at java.nio.file.Files.newInputStream(Files.java:152)
	at java.nio.file.Files.newBufferedReader(Files.java:2784)
	at java.nio.file.Files.readAllLines(Files.java:3202)
	at java.nio.file.Files.readAllLines(Files.java:3242)
	at cromwell.engine.workflow.lifecycle.MaterializeWorkflowDescriptorActor$$anonfun$14.apply(MaterializeWorkflowDescriptorActor.scala:350)
	at cromwell.engine.workflow.lifecycle.MaterializeWorkflowDescriptorActor$$anonfun$14.apply(MaterializeWorkflowDescriptorActor.scala:349)
	at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:418)
	at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:418)
	at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1233)
	at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1223)
	at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:418)
	at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:418)
	at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1233)
	at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1223)
	at scala.collection.immutable.Stream.foreach(Stream.scala:595)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.immutable.Stream.map(Stream.scala:420)
	at slick.driver.JdbcActionComponent$InsertActionComposerImpl$MultiInsertAction.run(JdbcActionComponent.scala:522)
	at slick.driver.JdbcActionComponent$SimpleJdbcDriverAction.run(JdbcActionComponent.scala:32)
	at slick.driver.JdbcActionComponent$SimpleJdbcDriverAction.run(JdbcActionComponent.scala:29)
	at slick.dbio.SynchronousDatabaseAction$FusedAndThenAction$$anonfun$run$4.apply(DBIOAction.scala:508)
	at slick.dbio.SynchronousDatabaseAction$FusedAndThenAction$$anonfun$run$4.apply(DBIOAction.scala:508)
	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
	at slick.dbio.SynchronousDatabaseAction$FusedAndThenAction.run(DBIOAction.scala:508)
	at slick.dbio.SynchronousDatabaseAction$$anon$11.run(DBIOAction.scala:545)
	at slick.backend.DatabaseComponent$DatabaseDef$$anon$2.liftedTree1$1(DatabaseComponent.scala:237)
	at slick.backend.DatabaseComponent$DatabaseDef$$anon$2.run(DatabaseComponent.scala:237)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

As an additional oddity, in this particular demonstration the workflow always proceeds to completion with Succeeded status.
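
For what it's worth, the stack trace shows Files.readAllLines being invoked from inside Stream.map while Slick's MultiInsertAction is running, which suggests the later import files are read lazily at metadata-write time rather than at submission time. A minimal stand-alone sketch of that pattern (hypothetical names, not Cromwell's actual code; assumes Scala 2.11/2.12 Stream semantics):

import java.nio.file.{Files, NoSuchFileException, Path}
import scala.collection.JavaConverters._

// Stand-alone illustration of the pattern visible in the stack trace above:
// Stream.map evaluates the head eagerly but defers the tail, so later import
// files are only read when the Stream is forced, which in this sketch happens
// after the temporary directory holding them has already been removed.
object LazyImportRead extends App {
  val tmpDir: Path = Files.createTempDirectory("imports")

  def writeWdl(name: String): Path =
    Files.write(tmpDir.resolve(name),
      """task doIt { command { echo "Help, world!" } }""".getBytes)

  val imports: Stream[Path] = Stream(writeWdl("bar.wdl"), writeWdl("bar8.wdl"))

  // Only the head (bar.wdl) is read here; the read of bar8.wdl is deferred.
  val contents: Stream[String] =
    imports.map(p => Files.readAllLines(p).asScala.mkString("\n"))

  // Simulate the temporary import directory being cleaned up after submission.
  imports.foreach(Files.delete(_))
  Files.delete(tmpDir)

  // Forcing the Stream now (as the metadata insert eventually does) fails on
  // the deferred element with java.nio.file.NoSuchFileException.
  try contents.foreach(println)
  catch { case e: NoSuchFileException => println(s"deferred read failed: ${e.getMessage}") }
}

If that deferral is indeed the trigger, materializing the contents into a strict collection (e.g. adding .toVector) before any temp-directory cleanup runs would force the reads up front instead.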

@geoffjentry added the "Needs Triage" label on Feb 21, 2019
@geoffjentry (Contributor)

We feel this is another in a long series of weird concurrency issues that we see from time to time involving shared filesystems and filesystem syncing. We've never been able to reliably reproduce these, but if someone can provide a reliable reproduction here we can reopen and look at it.
