New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NIO parameter_meta flag for localization optimization #3738
Conversation
} | ||
override lazy val inputsToNotLocalize: Set[WomFile] = { | ||
val result = jobDescriptor.findInputFilesByParameterMeta { | ||
// TODO: Is "nio" the right name? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like there potentially could be use cases where one does not want a file to be localized for other reasons than using NIO ? It also somehow conveys the feeling that this file will be accessed by the task using NIO which there's really no way to be sure of from the WDL/engine side of things.
OTOH I don't really care.. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe { localize: false }
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I totally agree and nio
is the shortest placeholder I could come up with as I was writing it... what I really want it to convey is that localization is optional, and the engine is allowed but not required to not localize it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't it a bit stronger than that? like ideally the engine should not localize the file if possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{ localize: preferably_no }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Horneth To be clear there should be no fuzziness on the WDL/engine side of things as this is 100% a Cromwell feature and not a WDL feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm not sure what you mean. It's a Cromwell feature that can only be taken advantage of through WDL for now (once this gets merged). It also involves a WDL syntax
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right but there's 0% expectation that it'll work anywhere but Cromwell. It's effectively piggybacking on existing WDL syntax to create a hint
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And in theory if we adopted a CWL ParameterMetaHint
or LocalizationOptionalHint
, we could allow this optimization in CWL workflows too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{ localizationOptional: true }
? (getting a bit long)
} | ||
override lazy val inputsToNotLocalize: Set[WomFile] = { | ||
val result = jobDescriptor.findInputFilesByParameterMeta { | ||
// TODO: Is "nio" the right name? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe { localize: false }
?
@@ -730,7 +731,8 @@ trait StandardAsyncExecutionActor extends AsyncBackendJobExecutionActor with Sta | |||
* @return The Try wrapped and mapped WOM value. | |||
*/ | |||
final def outputValueMapper(womValue: WomValue): Try[WomValue] = { | |||
WomFileMapper.mapWomFiles(mapOutputWomFile)(womValue) | |||
// TODO: is this inputsToNot*DE*Localize? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why would inputs be delocalized?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops - this should really be outputsToNotDelocalize
, and since the inputsToNotDelocalize
is not relevant to that, I think this should probably just hard-code Set.empty
for now
// case (_, file: WomFile) => inputsToNotLocalize.contains(file) | ||
// case (_, WomOptionalValue(_, Some(file: WomFile))) => inputsToNotLocalize.contains(file) | ||
// case _ => false | ||
// } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌
val parameterMetaSectionElement: ErrorOr[Option[ParameterMetaSectionElement]] = for { | ||
maxOne <- validateOneMax(bodyElements.filterByType[ParameterMetaSectionElement], "parameterMeta"): ErrorOr[Option[ParameterMetaSectionElement]] | ||
|
||
} yield maxOne |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one line forcomp = map
and maybe you can do away with one of the ascriptions
@@ -0,0 +1,118 @@ | |||
version 1.0 | |||
|
|||
workflow draft3_nio_file { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name conflicts with version string above
@cjllanwarne It'd be helpful if you could add a description of how this is intended to work, I can kinda see it in the tests but want to make sure I'm not interpreting things incorrectly |
@@ -145,14 +144,16 @@ trait StandardAsyncExecutionActor extends AsyncBackendJobExecutionActor with Sta | |||
// keeping track of the paths cleanly without so many value mappers | |||
def mapCommandLineJobInputWomFile(womFile: WomFile): WomFile = mapCommandLineWomFile(womFile) | |||
|
|||
def inputsToNotLocalize: Set[WomFile] = Set.empty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to explain here that this optimization is flipped on/off on a per-backend basis, with a default of off
} | ||
override lazy val inputsToNotLocalize: Set[WomFile] = { | ||
val result = jobDescriptor.findInputFilesByParameterMeta { | ||
// TODO: Is "nio" the right name? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{ localizationOptional: true }
? (getting a bit long)
@geoffjentry and @aednichols regarding documentation I added this: https://github.com/broadinstitute/cromwell/blob/cjl_nio_meta/docs/optimizations/FileLocalization.md Is that sufficient of is there more to be said? |
Looks good other than "perhaps" might be a bit confusing - someone could be led to believe that there is additional non-determinism within Cromwell, beyond specifying the parameter_meta bit and a compatible backend |
Spitballing:
|
@@ -0,0 +1,52 @@ | |||
# The 'localization_optional' Optimization |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like the title of a Robert Ludlum novel 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or the title of a Big Bang Theory episode
mkdocs.yml
Outdated
@@ -32,6 +32,8 @@ pages: | |||
- Workflow Options: | |||
- Overview: wf_options/Overview.md | |||
- Google Cloud: wf_options/Google.md | |||
- Workflow Optimizations | |||
- "Task inputs: localization_optional": optimizations/FileLocalization.md |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unnecessary quotes, it should be:
- Workflow Optimizations
- Task inputs: localization_optional: optimizations/FileLocalization.md
Or did you use quotes so that you could use a : in the name on the menu?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's needed for the colon
@@ -0,0 +1,52 @@ | |||
# The 'localization_optional' Optimization |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Optimization" is a new section to the docs. It would be great if you could include a sentence or two about the types of optimizations that users should expect to see here in the future. What are these optimizations for? Why would a user use this one? What are the draw backs, and when would you NOT recommend it for users?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure - are you saying we should have some hierarchy like this?
- Workflow Optimizations
- Workflow Optimizations: optimizations/optimizations.md
- "Task inputs: localization_optional": optimizations/FileLocalization.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the hierarchy sounds good 👍 optimizations.md
doesn't have to be very long, just a brief introduction to what users should expect to see there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@katevoss added!
## Scope | ||
|
||
The 'localization_optional' optimization applies to `File` and `File?` inputs on `task`s. It allows | ||
you to save time and money by perhaps not localizing some `File`s to the execution environment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unclear: "perhaps not localizing...". Can you explain why it may or may not localize a file? Even if you go into more depth later in the doc, it's best to have a short summary at the top so users don't have to read it all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you implying that somebody might not want to read all of my glorious prose?
## Effect | ||
|
||
The optimization signals to Cromwell that the task has been written in such a way that: | ||
* The task will still work if the file were to be localized - say from a cloud bucket to the task's execution environment; but... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggested rephrase:
- The task will work if Cromwell localizes the file
- For example, a file from a cloud bucket is localized to the task's execution environment;
@@ -0,0 +1,52 @@ | |||
# The 'localization_optional' Optimization |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or the title of a Big Bang Theory episode
|
||
The optimization signals to Cromwell that the task has been written in such a way that: | ||
* The task will still work if the file were to be localized - say from a cloud bucket to the task's execution environment; but... | ||
* The task will *also* work if the file is *not* localized prior to execution - and remains a path to a value in an object store, for example. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggested rephrase:
AND
- The task will also work if Cromwell does not localize the file
- For example, with "localization_optional" enabled, a file will remain as a path to a cloud bucket.
Questions:
- Why did you include the information "prior to execution"?
- Is a "cloud bucket" a more specific example of a "value in an object store"? If not, what's a clearer way of saying "path to a cloud bucket" or "value in object store"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Artistic flourish. Will remove
- I'll change them to be consistently "in an object store"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that this might be too strong:
For example, with "localization_optional" enabled, a file will remain as a path to a cloud bucket.
ie just because localization_optional
is specified, it doesn't force Cromwell to not localize the file
* The task will still work if the file were to be localized - say from a cloud bucket to the task's execution environment; but... | ||
* The task will *also* work if the file is *not* localized prior to execution - and remains a path to a value in an object store, for example. | ||
|
||
Therefore, Cromwell will (usually) choose not to localize this file, in order to save time and money on disk size and `File` localization. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When will Cromwell choose to localize the file, even when "localization_optional" is enabled?
|
||
* Your task should always be written bearing in mind that a `File` tagged for this optimization | ||
*might* still be localized at the whim of the execution engine. | ||
* Furthermore, if your WDL is intended to be shared with some other group, that group might |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggested rephrase:
- If you share your WDL it may be run on engines other than Cromwell, and that engine may not recognize this hint.
What is a hint? I think it is a new concept in Cromwell-WDL-land, I think it warrants a sentence or two explanation.
CHANGELOG.md
Outdated
@@ -2,6 +2,10 @@ | |||
|
|||
## 33 Release Notes | |||
|
|||
### File Localization (NIO) Hint | |||
|
|||
Cromwell now allows tasks in WDL 1.0 can now specify a hint in their `parameter_meta` that some `File` inputs do not need to be localized for the task to run successfully. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Include a link to the new doc in the release notes
@@ -42,11 +53,10 @@ task nio_task { | |||
|
|||
This optimization is currently only applied to localization in the Pipelines API (GCE) backends. | |||
|
|||
## Warning | |||
## Portability Warning | |||
|
|||
Be very careful not to write unportable WDL files! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TOL: try to avoid double-negatives. Which is to say, write in the positive tone.
"Be very careful to write portable WDL files", rather than "Be very careful not to write unportable WDL files"
f11c5e3
to
6b8a1cf
Compare
6b8a1cf
to
45816af
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 LGTM
45816af
to
e3ba75f
Compare
No description provided.