Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TES Backend Prototype. Closes #1294 #1816

Closed
wants to merge 87 commits into from

Conversation

adamstruck
Copy link
Contributor

This is a prototype backend that communicates via the GA4GH task execution schema. The reference implementation of the task server that we have been developing against is here.

This is a work in progress and definitely could use some feedback from you with regards to how to better fit this in with your current code base and future development plans.

In the coming weeks we plan to implement unit tests, add documentation and refactor the code based on your recommendations.

@kshakir
Copy link
Contributor

kshakir commented Jan 31, 2017

@adamstruck I still need to do a more in depth review if you're looking for scala syntax feedback (ex: case … => { … } could be case … => …).

Early feedback:

  • PR 1930 contains a few more changes to the standard backend api. I tested cherry-picking 1930 onto this PR to see what would be left to patch up. Using an "Obsolete" set of bridge code for now, these are the minimal changes for the updated path api, plus changes for standardized command line generation. NOTE: 1930 is still under review and may change, plus the linked github commit will be deleted once these two PRs are merged.

  • The standard backend will continue to change for a while as we move more common code. For example, the script generation for globs is now centralized as of PR 1930. The only CI testing I am aware of at the moment is sbt tesBackend/test that runs under travis. Is there a dockerized solution yet for TES that we could use with travis centaur, like we have for JES and Local? Otherwise, the minimal patches above pass the very, very basic sbt test unit tests.

  • Your PR is ok as is, but I need to think about necessity of Async.await a bit more. The standard backend api is synchronous, requiring the Async.await. But the underlying "basic" backend trait is using scala futures, and I need more insight into how those are interacting with the akka actors. For example, I wouldn't want the actors receiving multiple akka poll messages in the mailbox and then queuing up dozens of overlapping poll futures in the thread pool. I also really like that your awaits have timeouts and aren't infinite futures.

More to come. Thanks!

@cjllanwarne cjllanwarne changed the title TES Backend Prototype TES Backend Prototype. Closes #1294 Jan 31, 2017
Copy link
Contributor

@kshakir kshakir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Various update requests. Hopefully nothing major. Let me know if there are any questions.

@@ -157,7 +157,7 @@ engine {
}

backend {
default = "Local"
default = "TES"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK for debugging locally, but will likely break even Travis tests. One should probably debug with a local tes.conf and java -Dconfig.file=tes.conf instead, or sbt -J-Dconfig.file=tes.conf.

endpoint = "http://127.0.0.1:8000/v1/jobs"
poll-interval = 10
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is changing soon, but for now, document your backend config as a commented out string here in the reference.conf.

import cromwell.core.path.{PathBuilder, PathFactory}
import cromwell.core.retry.SimpleExponentialBackoff
import lenthall.util.TryUtil
import spray.client.pipelining._
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're moving away from spray to akka-http as the former has been end-of-lifed. Can you switch over your API?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To throw another wrench in the mix, another idea is to use a library generated from the protobuf/grpc task execution schemas. Seems like possibly everyone is missing out on one of the primary benefits of using protobuf and grpc by implementing everything as http/json.

val submitTask = pipeline[TesPostResponse]
.apply(Post(tesEndpoint, taskMessage))

val response = Await.result(submitTask, 15.seconds)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mea culpa. I filed ticket #1218 to address the issue of using scala futures instead of akka actors in the backend. In standard, I tried to cut down of the spread of futures until I had more time to look at the issue. After further research, attempting to temporarily reduce the use of scala futures by using synchronous interfaces in the standard backend was a step in the wrong direction. [1] [2]

I submitted a PR #1947 for the standard backend that allows execute and recover to also go back to using scala futures, in addition to the current pollStatusAsync. Instead of using Await.result, you should override the async versions of the methods instead, and compose your Future result as necessary.

If you still want to shortcut your actions after some timeout, there are other ways of providing timeouts with scala futures. [3] But I suspect your spray client will have some timeouts already included.

Eventually I hope to be able to adapt the backend api's use of futures with proper akka actors, as has already begun with the JES batch polling actor. That way one could maybe use akka messages instead to timeout, have akka mailboxes queue messages instead of directly queuing multiple futures onto the thread pools, etc.

TL;DR: Instead of using Await.result, return a Future by overriding pollStatusAsync, and the soon to come executeAsync.

val localPath = Paths.get(file.valueString).toAbsolutePath
val containerPath = containerInput(localPath.toString)
WdlFile(containerPath)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the {}, creating the extra block under the case statement.

case file: WdlFile =>
  val localPath = Paths.get(file.valueString).toAbsolutePath
  val containerPath = containerInput(localPath.toString)
  WdlFile(containerPath)

tesPaths.storageOutput(f),
tesPaths.containerOutput(f),
"File",
Some(false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of Some() use Option() (and above on line 73)

tesPaths.storageOutput(outputFile),
tesPaths.containerOutput(outputFile),
"File",
Some(false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of Some() use Option() (and above on line 99)

tesPaths.storageOutput(globListFile),
tesPaths.containerOutput(globListFile),
"File",
Some(false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of Some() use Option() (and above on lines 116, 120, and 124)

runtimeAttributes.cpu,
runtimeAttributes.memory.to(MemoryUnit.GB).amount.toInt,
Some(false),
Some(volumes),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of Some() use Option() (and above on lines 141, 149, and 158)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for my own personal Scala education, could you explain this one in more detail? volumes is definitely I non-null value I think, so isn't that Option() just converted to Some()? Is this a style thing or is there a case where one is better than the other?

I think I learned Some usage from articles like this, http://alvinalexander.com/scala/best-practice-option-some-none-pattern-scala-idioms

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something we picked up from my previous team where nulls weren't verboten and we'd get bit by this now and then.Option(null) is None whereas Some(null) is exactly what it says.

Generally my preference is to avoid things which the developer "knows" to be true in favor of things which must be true by definition as the latter doesn't require any further thinking on a future developer - also the latter might end up becoming untrue w/o people realizing.

Another example for me is .get on an Option, sure there are times where I know that it is Some and won't blow up but I'd much prefer to not do that for the reasons cited above. That said even I will drop my moral stance at times due to the pain in the ass factor :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent explanation. Thanks @geoffjentry ! Completely agree with your reasoning.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should also toss in why do I still say this for examples like Some(false) where it can't possibly be null. There it's more about building up the muscle memory and the uniformity of it. Admittedly I care less in those situations.


"validate a valid dockerWorkingDir entry" in {
val runtimeAttributes = Map("docker" -> WdlString("ubuntu:latest"), "dockerWorkingDir" -> WdlString("/tmp"))
val expectedRuntimeAttributes = expectedDefaults.copy(dockerWorkingDir = Some("/tmp"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of Some() use Option()

Standard Backend has even more default implementations, and a few breaking changes.
Moved some code for testing JES out of main.
Increased the timeout for the ask inside SingleWorkflowRunnerActorSpec.
Bumped apache http client to the latest version.
Updated path builder usage and added tests.
NOTE: GcsPathBuilder.build().get.toString now returns "gs://bucket/path" instead of "/path".
Changed def's with concrete implementations to lazy vals.
Reverted lines that have not changed since develop.
Minor updates to comments.
@adamstruck
Copy link
Contributor Author

@kshakir thanks for your review. I will make the changes you requested during the coming week.

@kshakir
Copy link
Contributor

kshakir commented Feb 8, 2017

FYI- the failing test suite "centaurJes" is due to a know limitation in our test setup, but the other three look good, including the "centaurLocal" tests. Again, without a dockerized "centaurTes" we'll certainly try to avoid issues, but there won't be any guarantees that upgrades to the standard backend API don't break TES.

Also, the 87 commits will need to be rebased and squashed correctly to a minimal set, on the order of 1 commit, but otherwise let us know when you're ready for re-review.

Let us know if you have more questions or if we can provide any other assistance.

@adamstruck
Copy link
Contributor Author

I am working on getting centuar testing up and running with a TES implementation. Rather than going through rebase hell on this branch (I have numerous merge commits), I will submit a new PR from another branch where I have squashed the commits via merge. How does that sound?

@adamstruck
Copy link
Contributor Author

Tracking on #1979

@adamstruck adamstruck closed this Feb 13, 2017
@adamstruck adamstruck deleted the tes_backend branch February 22, 2017 23:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants