Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP fix #1330 #1566

Draft
wants to merge 10 commits into
base: main
Choose a base branch
from
Draft

WIP fix #1330 #1566

wants to merge 10 commits into from

Conversation

manabuishii
Copy link
Contributor

@manabuishii manabuishii commented Nov 11, 2021

#1330

At least workflow-dyn-res.cwl does not create error.

But this PR fail some tests.

(venv) ➜  cwltool git:(fix_issue_1330) ✗ python3 -m cwltool workflow-dyn-res.cwl         
INFO /Users/manabu/ghq/github.com/common-workflow-language/cwltool/cwltool/__main__.py 3.1.20211107152837
INFO Resolved 'workflow-dyn-res.cwl' to 'file:///Users/manabu/ghq/github.com/common-workflow-language/cwltool/workflow-dyn-res.cwl'
INFO [workflow ] start
INFO [workflow ] starting step one
INFO [step one] start
INFO [job one] /private/tmp/docker_tmpduee6x2k$ echo \
    4
4
INFO [job one] completed success
INFO [step one] completed success
INFO [workflow ] completed success
{
    "file:///Users/manabu/ghq/github.com/common-workflow-language/cwltool/workflow-dyn-res.cwl#threads_max": 4
}
INFO Final process status is success

Strategy

( I think below, but need to know more about inside)

To resolve ResourceRequirement value,
First,
I bring Workflow inputs to steps inputs,( see TODO )

Second,
When evalResources is called, resource resolved.

Third,
After the call, CommandLineTool's inputs are rewrite to remove inputs for ResourceRequirement.

TODO:

  • Add workflow-dyn-res.cwl to tests
  • Discuss about adding workflow-dyn-res.cwl related tests to at least unit test or mypy test or both?
  • Check other path ways call evalResourcesare exists or not .
    • I think the self.tool['class'] is not 'Workflow' are called evalResources .
    • cwltool/cwltool/process.py

      Lines 973 to 974 in 898fddf

      if self.tool["class"] != "Workflow":
      builder.resources = self.evalResources(builder, runtime_context)
    • I put the code just below this line, but some tests result are failed. But sometimes my test environment is strange behavior, I should check again.
  • Check other behavior which I do not know now. All tests are passed but I think need to care.
  • Check nested nested (n nested) Workflow behavior.

Copy link
Member

@kinow kinow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@manabuishii impressive work tracking down where you had to add the state/job info 👏

I think what this change is doing is making the job info from the parent Job available to children Jobs. Not sure if there's any downsides to this.

Maybe another approach could be to modify Process#evalResources and rewrite the Resource value, evaluating the expression to 4. This way when children Job are created they wouldn't have to try to resolve the variables/values.

cwltool/process.py Outdated Show resolved Hide resolved
@@ -386,6 +386,8 @@ def object_from_state(
incomplete: bool = False,
) -> Optional[CWLObjectType]:
inputobj = {} # type: CWLObjectType
for s in state.keys():
inputobj[s] = state[s].value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can avoid accessing keys & values separately. Something like:

for k, v in state.items():
  inputobj[k] = v

Or start inputobj by deep-cloning state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current commited version is
https://github.com/common-workflow-language/cwltool/pull/1566/files#diff-ed87d61e2237460638e7f3ddb21bafc9969d8c142c804d79f94801142342f3d6R389-R391

My local version is following .
It handles 1330 well and pass py39-unit
But I can not pass py39-mypy

    for s in state.keys():
        if hasattr(state[s], "value"):
            inputobj[s] = state[s].value

Now I try to replace it to your code, because it looks good.
But I can not pass py39-mypy

    for k,v  in state.items():
        if type(v) is WorkflowStateItem:
            if hasattr(v, "value"):
                inputobj[k] = v.value

currently

@kinow
Copy link
Member

kinow commented Mar 29, 2022

Maybe another approach could be to modify Process#evalResources and rewrite the Resource value, evaluating the expression to 4. This way when children Job are created they wouldn't have to try to resolve the variables/values.

Hmm, interesting. Tried that, but realized that process.py intentionally skips evaluating the resources for the top level job (the Workflow itself is the parent job).

image

🤷‍♂️ there must be a good reason for doing so, so I think your approach should be the right fix @manabuishii 👍 let's see what's wrong with the build...

@mr-c
Copy link
Member

mr-c commented Mar 29, 2022

@manabuishii Can you fix the merge conflict so that the CI can run?

cwltool/workflow_job.py Outdated Show resolved Hide resolved
cwltool/workflow_job.py Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Mar 30, 2022

Codecov Report

Merging #1566 (14a2ded) into main (7a058fe) will increase coverage by 0.02%.
The diff coverage is 82.69%.

@@            Coverage Diff             @@
##             main    #1566      +/-   ##
==========================================
+ Coverage   66.82%   66.84%   +0.02%     
==========================================
  Files          93       93              
  Lines       16578    16784     +206     
  Branches     4404     4447      +43     
==========================================
+ Hits        11078    11220     +142     
- Misses       4361     4396      +35     
- Partials     1139     1168      +29     
Impacted Files Coverage Δ
cwltool/command_line_tool.py 77.90% <81.48%> (+0.43%) ⬆️
cwltool/workflow_job.py 60.80% <81.81%> (+1.02%) ⬆️
cwltool/workflow.py 85.82% <100.00%> (+0.16%) ⬆️
cwltool/task_queue.py 73.77% <0.00%> (-3.28%) ⬇️
cwltool/cwltool/argparser.py 72.11% <0.00%> (-2.68%) ⬇️
cwltool/sandboxjs.py 72.63% <0.00%> (-1.97%) ⬇️
cwltool/cwltool/sandboxjs.py 54.73% <0.00%> (-1.48%) ⬇️
cwltool/cwltool/cwlviewer.py 21.97% <0.00%> (-1.02%) ⬇️
cwltool/utils.py 79.33% <0.00%> (-0.37%) ⬇️
cwltool/main.py 71.71% <0.00%> (-0.27%) ⬇️
... and 16 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7a058fe...14a2ded. Read the comment docs.

@kinow
Copy link
Member

kinow commented Mar 30, 2022

Getting closer to a green build! 🚀

@manabuishii
Copy link
Contributor Author

@kinow @mr-c

This commit is done by me, but account is not current my account .

a7a66cf

Is there any good way to rewrite my commit ?
Or Can I leave it as it is?

@kinow
Copy link
Member

kinow commented Mar 31, 2022

@manabuishii the commit will be squashed when we merge the PR (it has 6 commits, ideally a single commit is best, or grouped commits if they are helpful for future devs.)

Or if you prefer to edit the commit now and change the author, you can use interactive rebase or amend (if the last commit), e.g. https://stackoverflow.com/a/3042512

@manabuishii
Copy link
Contributor Author

@kinow Thanks !!
I choose to squash when sending pull requests .

@manabuishii
Copy link
Contributor Author

manabuishii commented Mar 31, 2022

@kinow @mr-c
How do I execute following test ( that fails GitHub Actions)

Continuous integration tests / CWL spec conformance tests (v1.0, podman) (pull_request) 

Is this correct ?

source venv/bin/activate
container=podman spec_branch=main version=v1.0 ./conformance-test.sh

In my linux machine, above command is take long time.

I want to fix the error which GitHub Actions says.
Best case is execute only failed test first, and if it is passed, execute all.

@mr-c
Copy link
Member

mr-c commented Mar 31, 2022

You can run just the test that failed

https://github.com/common-workflow-language/cwltool/runs/5765165834?check_suite_focus=true#step:5:490

cwltool --parallel --podman --debug https://github.com/common-workflow-language/common-workflow-language/raw/main/v1.0/v1.0/fail-unconnected.cwl

Note that this test should fail: https://github.com/common-workflow-language/common-workflow-language/blob/a62f91e3d75036e7d89dbd454fc4e1f72caf901f/v1.0/conformance_test_v1.0.yaml#L1863

So if this PR causes this to not fail, then that is a problem 🙂

@manabuishii
Copy link
Contributor Author

My current strategy is such that I make everything a global variable, so
It appears that the scope of the variable is not properly controlled.

Therefore, the

workflow-dyn-res.cwl also does not deal with input.cores.
It is looking at inputs.threads_max, which it was forced to look at as a global variable.

Essentialy , inputs.threads_max is first set to coresMax
And the coresMax is reflected in runtime.cores

I think runtime.cores is necessary to put it in arguments of echo.

So this is the reason why cwltool --parallel --podman --debug https://github.com/common-workflow-language/common-workflow-language/raw/main/v1.0/v1.0/fail-unconnected.cwl is not failed, I think.

I need to change strategy.

@manabuishii
Copy link
Contributor Author

Memo:
threads_max is required at evalResources in process.py to make coresMax and runtime.core
After using threads_max in evalResources of process.py, I test builder.job(in command_line_tool.py) which is copy of job_order , workflow-dyn-res.cwl and fail-unconnected.cwl is behaved expected.
Of course other unit tests are broken.

@manabuishii
Copy link
Contributor Author

@mr-c I think this PullRequest fix #1330.
All tests are passed.

Now I want to check that there is no unwanted behavior .

I want to ask questions related to fail-unconnected.cwl

Question 1: What kind of output is expected for ExpressionTool instead of CommandLineTool ?

fail-unconnected-expressiontool.cwl is following

class: Workflow
cwlVersion: v1.1
inputs:
  inp1:
    type: string
    default: hello inp1
  inp2:
    type: string
    default: hello inp2
outputs:
  out:
    type: string
    outputSource: step1/out
steps:
  step1:
    in:
      in: inp1
      in2: inp2
    out: [out]
    run: fail-unspecified-input-expressiontool.cwl

and

fail-unspecified-input-expressiontool.cwl is following

cwlVersion: v1.0
class: ExpressionTool

requirements:
  InlineJavascriptRequirement: {} 

inputs:
  in:
    type: string

outputs:
  out:
    type: string

expression: |
  ${ 
    return {"out": inputs.in +" "+inputs.in2};
  }

Both
current HEAD (7a058fe)
and this PullRequest return same

Is this result correct ?

python3 -m cwltool fail-unconnected-expressiontool.cwl
INFO /home/manabu/ghq/github.com/common-workflow-language/cwltool/cwltool/__main__.py 3.1.20220325085616
INFO Resolved 'fail-unconnected-expressiontool.cwl' to 'file:///home/manabu/ghq/github.com/common-workflow-language/cwltool/fail-unconnected-expressiontool.cwl'
WARNING Workflow checker warning:
fail-unconnected-expressiontool.cwl:18:7: 'in2' is not an input parameter of
                                          file:///home/manabu/ghq/github.com/common-workflow-language/cwltool/fail-unspecified-input-expressiontool.cwl,
                                          expected in
INFO [workflow ] start
INFO [workflow ] starting step step1
INFO [step step1] start
INFO [step step1] completed success
INFO [workflow ] completed success
{
    "out": "hello inp1 undefined"
}
INFO Final process status is success

As WARNING messages indicates , fail-unconnected-expressiontool.cwl does not have in2 as input parameter.
Then result indiates inputs.in2 is evaluated to undefined.

When CommandLineTool case is error, but ExpressionTool case is warning .
So I want to ask correct behavior.

Question2: What kind of output is expected for Operation instead of CommandLineTool ?
I think CWL version 1.2 has Operation .
But I do not know how to write it for tests.

@tom-tan
Copy link
Member

tom-tan commented Apr 3, 2022

Question 1: What kind of output is expected for ExpressionTool instead of CommandLineTool ?
Question2: What kind of output is expected for Operation instead of CommandLineTool ?

I guess the both cases must fail.

The spec of WorkflowStep.in says:

Input parameters include a schema for each parameter which is used to validate the input object.

But in2 has no corresponding schema to be used for validation.

@mr-c mr-c requested a review from tetron April 3, 2022 10:09
Comment on lines +2601 to +2611

- job: tests/empty.json
tool: tests/fail-unconnected-expressiontool.cwl
label: wf_step_access_undeclared_param_expressiontool
id: 198
doc: >-
Test that parameters that don't appear in the `run` process
inputs are not present in the input object used to run the expressiontool.
should_fail: true
tags: [ required, workflow ]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CI doesn't run this copy of the tests, it makes a new checkout from GitHub

wget "https://github.com/common-workflow-language/${repo}/archive/${spec_branch}.tar.gz"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mr-c Thanks

I will create a PullRequest about this test case (tests/fail-unconnected-expressiontool.cwl),

Repo:
https://github.com/common-workflow-language/cwl-v1.2
Branch:
1.2.1_proposed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, you can also keep it in this PR, just move the test scripts to the the test directory and add an entry to test_examples.py

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move this file to the tests directory, or a sub-directory?

manabuishii pushed a commit to common-workflow-language/cwl-v1.2 that referenced this pull request Apr 6, 2022
manabuishii added a commit to common-workflow-language/cwl-v1.2 that referenced this pull request Apr 6, 2022
It is not passed parameters which is not pass to ExpressionTool
@manabuishii
Copy link
Contributor Author

I improve ExpressionTool inputs handling.

This does not cause error.
It becomes undefined.

First, currently

in is resolved as inp1
in2 is not resolved.

I think this is expected behavior for following class ExpressionTool

https://github.com/common-workflow-language/cwltool/pull/1566/files#diff-3d8236842572888947227e8925a930cebe33d4af9c6bb8abc61282eec6d105c1R17

Just before exec_js_process

fn is following.
inputs member has only one in.

'"use strict";
var inputs = {
    "in": "hello inp1"
};
var self = null;
var runtime = {
    "cores": 1,
    "ram": 1024,
    "tmpdirSize": 1024,
    "outdirSize": 1024,
    "tmpdir": null,
    "outdir": null
};
(function(){ 
  return {"out": inputs.in +" "+inputs.in2};
})()'

function executed,
inputs.in is hello inp1
inputs.in2 is undefined because inputs.in2 is undefined.

Output is here

$ python3 -m cwltool tests/fail-unconnected-expressiontool.cwl 
INFO /Users/manabu/ghq/github.com/common-workflow-language/cwltool/cwltool/__main__.py 3.1.20211111031309
INFO Resolved 'tests/fail-unconnected-expressiontool.cwl' to 'file:///Users/manabu/ghq/github.com/common-workflow-language/cwltool/tests/fail-unconnected-expressiontool.cwl'
WARNING Workflow checker warning:
tests/fail-unconnected-expressiontool.cwl:18:7: 'in2' is not an input parameter of
                                                file:///Users/manabu/ghq/github.com/common-workflow-language/cwltool/tests/fail-unspecified-input-expressiontool.cwl,
                                                expected in
INFO [workflow ] start
INFO [workflow ] starting step step1
INFO [step step1] start
INFO [step step1] completed success
INFO [workflow ] completed success
{
    "out": "hello inp1 undefined"
}
INFO Final process status is success

I investigate related JS behavior.

When inputs is declared as dictionary.
inputs.in2 returns undefined.
Just in2 is cause error because in2 is not defined.

> inputs={}
> inputs.in = "hello inp1"
> inputs.in2
undefined
> in2
VM1960:1 Uncaught ReferenceError: in2 is not defined
    at <anonymous>:1:1

I think that in2 is resolved as undefined and it is correct behavior at least some Javascript Implementation.

@manabuishii
Copy link
Contributor Author

manabuishii commented Jun 30, 2022

Until previous commit 14a2ded
in2 is resolved as hello inp2

so output is following.
At least this result is wrong.

$ python3 -m cwltool tests/fail-unconnected-expressiontool.cwl
INFO /Users/manabu/ghq/github.com/common-workflow-language/cwltool/cwltool/__main__.py 3.1.20211111031309
INFO Resolved 'tests/fail-unconnected-expressiontool.cwl' to 'file:///Users/manabu/ghq/github.com/common-workflow-language/cwltool/tests/fail-unconnected-expressiontool.cwl'
WARNING Workflow checker warning:
tests/fail-unconnected-expressiontool.cwl:18:7: 'in2' is not an input parameter of
                                                file:///Users/manabu/ghq/github.com/common-workflow-language/cwltool/tests/fail-unspecified-input-expressiontool.cwl,
                                                expected in
INFO [workflow ] start
INFO [workflow ] starting step step1
INFO [step step1] start
INFO [step step1] completed success
INFO [workflow ] completed success
{
    "out": "hello inp1 hello inp2"
}
INFO Final process status is success

@manabuishii
Copy link
Contributor Author

I think current this PR approach is Lazy evaluation.
But implementation is difficult.
In my opinion, Eager evaluation will be more clear and easy to understand and easy to implement.

How do you think ?

@kinow
Copy link
Member

kinow commented Jul 11, 2022

In my opinion, Eager evaluation will be more clear and easy to understand and easy to implement.

I haven't dug really deep in the issue as @manabuishii , but from what I could see, I also think that eager evaluation in this case would be better for cwltool.

As for specification, I think we can leave it as-is, or maybe add a comment that eager or lazy evaluation is up to implementers, but not enforced by the specification.

@manabuishii
Copy link
Contributor Author

@kinow Thanks !!

As for specification, I think we can leave it as-is, or maybe add a comment that eager or lazy evaluation is up to implementers, but not enforced by the specification.

I consider about multi cloud situation (including on-premise cloud and some on-premise cluster) sometimes lazy evaluations require variable or file which is not accessible from some environment.
Maybe it is related to acquire for the machine resources to distributed multi computing environments (multi-cloud).

To consider such a situation , I think we keep discuss and investigate about pros, cons and reasonable assumption.
But during the discussion I agree that "we can leave it as-is".
Mainly I care about the difference between implementations if the specification is said nothing.(Sorry not so much deeply about "as-is" but I feel it's okay because it is current state.)

One more option is that spec said nothing but implementation is accept variable about eager or lazy evaluation.

But I want to fix #1330 😄

@tom-tan
Copy link
Member

tom-tan commented Jul 12, 2022

I agree with @kinow that we leave the spec as is (at least in v1.2.0) because clarifying the evaluation strategy may break existing platforms.
IMO it is better to fix it in the later release of CWL by clarifying the default evaluation strategy and by extending the spec to support use cases for eager and lazy strategies.


I prefer the eager strategy by default because it makes much easier to implement platforms.
I proposed a tentative extension to cover the use cases for lazy strategy in another issue.

mr-c pushed a commit to common-workflow-language/cwl-v1.2 that referenced this pull request Aug 17, 2023
mr-c pushed a commit to common-workflow-language/cwl-v1.2 that referenced this pull request Aug 17, 2023
mr-c pushed a commit to common-workflow-language/cwl-v1.2 that referenced this pull request Aug 17, 2023
mr-c pushed a commit to common-workflow-language/cwl-v1.2 that referenced this pull request Aug 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants