Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NIO parameter_meta flag for localization optimization #3738

Merged
merged 1 commit into from Jun 13, 2018

Conversation

cjllanwarne
Copy link
Contributor

No description provided.

}
override lazy val inputsToNotLocalize: Set[WomFile] = {
val result = jobDescriptor.findInputFilesByParameterMeta {
// TODO: Is "nio" the right name?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like there potentially could be use cases where one does not want a file to be localized for other reasons than using NIO ? It also somehow conveys the feeling that this file will be accessed by the task using NIO which there's really no way to be sure of from the WDL/engine side of things.
OTOH I don't really care.. 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe { localize: false }?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I totally agree and nio is the shortest placeholder I could come up with as I was writing it... what I really want it to convey is that localization is optional, and the engine is allowed but not required to not localize it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't it a bit stronger than that? like ideally the engine should not localize the file if possible?

Copy link
Contributor

@mcovarr mcovarr Jun 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{ localize: preferably_no }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Horneth To be clear there should be no fuzziness on the WDL/engine side of things as this is 100% a Cromwell feature and not a WDL feature.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm not sure what you mean. It's a Cromwell feature that can only be taken advantage of through WDL for now (once this gets merged). It also involves a WDL syntax

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right but there's 0% expectation that it'll work anywhere but Cromwell. It's effectively piggybacking on existing WDL syntax to create a hint

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And in theory if we adopted a CWL ParameterMetaHint or LocalizationOptionalHint, we could allow this optimization in CWL workflows too

Copy link
Contributor

@aednichols aednichols Jun 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{ localizationOptional: true }? (getting a bit long)

}
override lazy val inputsToNotLocalize: Set[WomFile] = {
val result = jobDescriptor.findInputFilesByParameterMeta {
// TODO: Is "nio" the right name?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe { localize: false }?

@@ -730,7 +731,8 @@ trait StandardAsyncExecutionActor extends AsyncBackendJobExecutionActor with Sta
* @return The Try wrapped and mapped WOM value.
*/
final def outputValueMapper(womValue: WomValue): Try[WomValue] = {
WomFileMapper.mapWomFiles(mapOutputWomFile)(womValue)
// TODO: is this inputsToNot*DE*Localize?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why would inputs be delocalized?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops - this should really be outputsToNotDelocalize, and since the inputsToNotDelocalize is not relevant to that, I think this should probably just hard-code Set.empty for now

// case (_, file: WomFile) => inputsToNotLocalize.contains(file)
// case (_, WomOptionalValue(_, Some(file: WomFile))) => inputsToNotLocalize.contains(file)
// case _ => false
// }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

val parameterMetaSectionElement: ErrorOr[Option[ParameterMetaSectionElement]] = for {
maxOne <- validateOneMax(bodyElements.filterByType[ParameterMetaSectionElement], "parameterMeta"): ErrorOr[Option[ParameterMetaSectionElement]]

} yield maxOne
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one line forcomp = map and maybe you can do away with one of the ascriptions

@@ -0,0 +1,118 @@
version 1.0

workflow draft3_nio_file {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name conflicts with version string above

@geoffjentry
Copy link
Contributor

@cjllanwarne It'd be helpful if you could add a description of how this is intended to work, I can kinda see it in the tests but want to make sure I'm not interpreting things incorrectly

@@ -145,14 +144,16 @@ trait StandardAsyncExecutionActor extends AsyncBackendJobExecutionActor with Sta
// keeping track of the paths cleanly without so many value mappers
def mapCommandLineJobInputWomFile(womFile: WomFile): WomFile = mapCommandLineWomFile(womFile)

def inputsToNotLocalize: Set[WomFile] = Set.empty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to explain here that this optimization is flipped on/off on a per-backend basis, with a default of off

}
override lazy val inputsToNotLocalize: Set[WomFile] = {
val result = jobDescriptor.findInputFilesByParameterMeta {
// TODO: Is "nio" the right name?
Copy link
Contributor

@aednichols aednichols Jun 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{ localizationOptional: true }? (getting a bit long)

@cjllanwarne
Copy link
Contributor Author

@geoffjentry and @aednichols regarding documentation I added this: https://github.com/broadinstitute/cromwell/blob/cjl_nio_meta/docs/optimizations/FileLocalization.md

Is that sufficient of is there more to be said?

@aednichols
Copy link
Contributor

Looks good other than "perhaps" might be a bit confusing - someone could be led to believe that there is additional non-determinism within Cromwell, beyond specifying the parameter_meta bit and a compatible backend

@aednichols
Copy link
Contributor

Spitballing:

It allows you to save time and money by not localizing some Files to the execution environment when running on compatible backends

@@ -0,0 +1,52 @@
# The 'localization_optional' Optimization
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like the title of a Robert Ludlum novel 😄

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or the title of a Big Bang Theory episode

mkdocs.yml Outdated
@@ -32,6 +32,8 @@ pages:
- Workflow Options:
- Overview: wf_options/Overview.md
- Google Cloud: wf_options/Google.md
- Workflow Optimizations
- "Task inputs: localization_optional": optimizations/FileLocalization.md
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary quotes, it should be:

- Workflow Optimizations
  - Task inputs: localization_optional: optimizations/FileLocalization.md

Or did you use quotes so that you could use a : in the name on the menu?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's needed for the colon

@@ -0,0 +1,52 @@
# The 'localization_optional' Optimization
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Optimization" is a new section to the docs. It would be great if you could include a sentence or two about the types of optimizations that users should expect to see here in the future. What are these optimizations for? Why would a user use this one? What are the draw backs, and when would you NOT recommend it for users?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure - are you saying we should have some hierarchy like this?

- Workflow Optimizations
  - Workflow Optimizations: optimizations/optimizations.md
  - "Task inputs: localization_optional": optimizations/FileLocalization.md

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the hierarchy sounds good 👍 optimizations.md doesn't have to be very long, just a brief introduction to what users should expect to see there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@katevoss added!

## Scope

The 'localization_optional' optimization applies to `File` and `File?` inputs on `task`s. It allows
you to save time and money by perhaps not localizing some `File`s to the execution environment.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unclear: "perhaps not localizing...". Can you explain why it may or may not localize a file? Even if you go into more depth later in the doc, it's best to have a short summary at the top so users don't have to read it all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you implying that somebody might not want to read all of my glorious prose?

## Effect

The optimization signals to Cromwell that the task has been written in such a way that:
* The task will still work if the file were to be localized - say from a cloud bucket to the task's execution environment; but...
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested rephrase:

  • The task will work if Cromwell localizes the file
    • For example, a file from a cloud bucket is localized to the task's execution environment;

@@ -0,0 +1,52 @@
# The 'localization_optional' Optimization
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or the title of a Big Bang Theory episode


The optimization signals to Cromwell that the task has been written in such a way that:
* The task will still work if the file were to be localized - say from a cloud bucket to the task's execution environment; but...
* The task will *also* work if the file is *not* localized prior to execution - and remains a path to a value in an object store, for example.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested rephrase:

AND

  • The task will also work if Cromwell does not localize the file
    • For example, with "localization_optional" enabled, a file will remain as a path to a cloud bucket.

Questions:

  • Why did you include the information "prior to execution"?
  • Is a "cloud bucket" a more specific example of a "value in an object store"? If not, what's a clearer way of saying "path to a cloud bucket" or "value in object store"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Artistic flourish. Will remove
  • I'll change them to be consistently "in an object store"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this might be too strong:

For example, with "localization_optional" enabled, a file will remain as a path to a cloud bucket.

ie just because localization_optional is specified, it doesn't force Cromwell to not localize the file

* The task will still work if the file were to be localized - say from a cloud bucket to the task's execution environment; but...
* The task will *also* work if the file is *not* localized prior to execution - and remains a path to a value in an object store, for example.

Therefore, Cromwell will (usually) choose not to localize this file, in order to save time and money on disk size and `File` localization.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When will Cromwell choose to localize the file, even when "localization_optional" is enabled?


* Your task should always be written bearing in mind that a `File` tagged for this optimization
*might* still be localized at the whim of the execution engine.
* Furthermore, if your WDL is intended to be shared with some other group, that group might
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested rephrase:

  • If you share your WDL it may be run on engines other than Cromwell, and that engine may not recognize this hint.

What is a hint? I think it is a new concept in Cromwell-WDL-land, I think it warrants a sentence or two explanation.

CHANGELOG.md Outdated
@@ -2,6 +2,10 @@

## 33 Release Notes

### File Localization (NIO) Hint

Cromwell now allows tasks in WDL 1.0 can now specify a hint in their `parameter_meta` that some `File` inputs do not need to be localized for the task to run successfully.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Include a link to the new doc in the release notes

@@ -42,11 +53,10 @@ task nio_task {

This optimization is currently only applied to localization in the Pipelines API (GCE) backends.

## Warning
## Portability Warning

Be very careful not to write unportable WDL files!
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TOL: try to avoid double-negatives. Which is to say, write in the positive tone.

"Be very careful to write portable WDL files", rather than "Be very careful not to write unportable WDL files"

Copy link
Contributor

@aednichols aednichols left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants