Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GH-1629] TerraExecutor gets method configuration version from Firecloud #585

Merged
merged 7 commits into from
Mar 3, 2022

Conversation

okotsopoulos
Copy link
Contributor

@okotsopoulos okotsopoulos commented Feb 25, 2022

Purpose

Previously, when creating a staged workload's TerraExecutor the caller first needed to query Firecloud API directly for their method configuration's present version. The intention was to make up for a lack of audit trail for method configuration changes, to make debugging easier in the case that workflows started failing after a change.

User feedback indicated that this Firecloud fetch was one of the more annoying start-up costs of launching a staged workload, without much benefit. I agree!

Now, we save the method configuration's version from Firecloud on workload creation rather than making the caller do the fetch themselves. When triggering Terra submissions, if the method configuration has an unexpected version we log a warning (previously an error) but proceed with processing.

Changes

  • TerraExecutor gets its initial method configuration version from Firecloud, not the user's request.
  • Downgraded mismatched version log from ERROR -> WARNING.
  • Updated and expanded tests.
  • Removed methodConfigurationVersion references from public-facing docs.

Review Instructions

  • Take a look at changes to wfl.executor/terra-executor-validate-request-or-throw.
  • Then, take a look at changes to wfl.integration.executor-test/test-validate-terra-executor-with-valid-executor-request.
  • See documentation removed in docs/md/staged-executor.md and smile.
  • If you'd like to see this in action, could run wfl.system.v1-endpoint-test/test-staged-workload, load the resulting workload, and see that the executor has its method configuration version set.

System Tests:

1 known unrelated failure (Slack, ticket):

$ make TARGET=system
export CPCACHE=/Users/okotsopo/wfl/api/.cpcache;            \
	export WFL_WFL_URL=http://localhost:3000; \
	clojure  -M:parallel-test wfl.system.v1-endpoint-test | \
	tee /Users/okotsopo/wfl/derived/api/system.log
WARNING: Specified path is external to project: ../derived/api/src
WARNING: Specified path is external to project: ../derived/api/resources

ERROR in (test-workload-sink-outputs-to-tdr) (util.clj:390)
The workload should have finished
expected: (util/poll (fn* [] (-> workload :uuid endpoints/get-workload-status :finished)) 20 100)
  actual: java.util.concurrent.TimeoutException: Max number of attempts exceeded
 at wfl.util$poll.invokeStatic (util.clj:390)
    wfl.util$poll.invoke (util.clj:383)
    clojure.lang.AFn.applyToHelper (AFn.java:160)
    clojure.lang.AFn.applyTo (AFn.java:144)
    clojure.core$apply.invokeStatic (core.clj:667)
    clojure.core$apply.invoke (core.clj:662)
    wfl.system.v1_endpoint_test$fn__793$fn__796$fn__797$fn__799$fn__827.invoke (v1_endpoint_test.clj:476)
    wfl.system.v1_endpoint_test$fn__793$fn__796$fn__797$fn__799.invoke (v1_endpoint_test.clj:476)
    wfl.system.v1_endpoint_test$fn__793$fn__796$fn__797.invoke (v1_endpoint_test.clj:413)
    wfl.util$bracket.invokeStatic (util.clj:362)
    wfl.util$bracket.invoke (util.clj:352)
    wfl.tools.fixtures$with_shared_temporary_workspace_clone.invokeStatic (fixtures.clj:242)
    wfl.tools.fixtures$with_shared_temporary_workspace_clone.invoke (fixtures.clj:229)
    wfl.system.v1_endpoint_test$fn__793$fn__796.invoke (v1_endpoint_test.clj:413)
    wfl.util$bracket.invokeStatic (util.clj:362)
    wfl.util$bracket.invoke (util.clj:352)
    wfl.tools.fixtures$with_temporary_dataset.invokeStatic (fixtures.clj:191)
    wfl.tools.fixtures$with_temporary_dataset.invoke (fixtures.clj:188)
    wfl.system.v1_endpoint_test$fn__793.invokeStatic (v1_endpoint_test.clj:413)
    wfl.system.v1_endpoint_test/fn (v1_endpoint_test.clj:412)
    clojure.test$test_var$fn__9761.invoke (test.clj:717)
    clojure.test$test_var.invokeStatic (test.clj:717)
    clojure.test$test_var.invoke (test.clj:708)
    wfl.system.v1_endpoint_test$test_workload_sink_outputs_to_tdr.invokeStatic (v1_endpoint_test.clj:412)
    wfl.system.v1_endpoint_test$test_workload_sink_outputs_to_tdr.invoke (v1_endpoint_test.clj:412)
    clojure.lang.Var.invoke (Var.java:380)
    wfl.tools.parallel_runner$_main$run_test_BANG___12.invoke (parallel_runner.clj:18)
    wfl.tools.parallel_runner$_main$fn__15$fn__16.invoke (parallel_runner.clj:26)
    clojure.core$binding_conveyor_fn$fn__5772.invoke (core.clj:2034)
    clojure.lang.AFn.call (AFn.java:18)
    java.util.concurrent.FutureTask.run (FutureTask.java:264)
    java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1128)
    java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:628)
    java.lang.Thread.run (Thread.java:834)

FAIL in (test-workload-sink-outputs-to-tdr) (v1_endpoint_test.clj:480)
outputs should have been written to the dataset
expected: (seq (:rows (datarepo/query-table dataset "outputs")))
  actual: (not (seq ()))


Ran 33 tests containing 374 assertions.
1 failures, 1 errors.

Copy link
Contributor

@tbl3rd tbl3rd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But "throw or warn" seems an odd combination.
Why choose one over the other?
Why throw from terra-executor-validate-request-or-throw
but not from update-method-configuration!?

@okotsopoulos
Copy link
Contributor Author

But "throw or warn" seems an odd combination.
Why choose one over the other?
Why throw from terra-executor-validate-request-or-throw
but not from update-method-configuration!?

Good question. I left this PR a draft for this reason. Stay tuned, but I expect to remove this check entirely from the request validation (no more throwing). One of the more annoying parts of staged workload creation is asking Firecloud for the method configuration's current version, then copying that into the request. We can remove that requirement and save the method configuration's version ourself during workload creation.

There is value in the life of a workload of knowing when the version misaligns with what we expect in the absence of a Terra method config audit trail, and that is worth a warning.

@okotsopoulos okotsopoulos changed the title [GH-1629] Warn instead of error on method config version mismatch [GH-1629] TerraExecutor gets method configuration version from Firecloud Mar 3, 2022
(defn ^:private warn-on-method-config-version-mismatch
"Log warning if `methodConfigVersion` does not match `expected`."
[{:keys [methodConfigVersion] :as methodconfig} expected]
(when-not (= expected methodConfigVersion)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

= instead of == because methodConfigVersion is nullable now, if skipping validation of the TerraExecutor request on workload creation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of skipValidation?
(-How would one type that?-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if the request has skipValidation set then methodConfigVersion is initially unset in the executor.

@@ -75,39 +72,6 @@ Prerequisites:
- The method configuration must exist within `workspace` prior to
workload creation.

#### `methodConfigurationVersion`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fun to remove this.

@okotsopoulos okotsopoulos marked this pull request as ready for review March 3, 2022 17:02
@tbl3rd tbl3rd self-requested a review March 3, 2022 17:51
Copy link
Contributor

@tbl3rd tbl3rd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Thanks.

api/src/wfl/executor.clj Outdated Show resolved Hide resolved
(defn ^:private warn-on-method-config-version-mismatch
"Log warning if `methodConfigVersion` does not match `expected`."
[{:keys [methodConfigVersion] :as methodconfig} expected]
(when-not (= expected methodConfigVersion)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of skipValidation?
(-How would one type that?-)

api/src/wfl/executor.clj Show resolved Hide resolved
api/test/wfl/integration/executor_test.clj Show resolved Hide resolved
(is executor)
(is (== testing-method-configuration-version
(:methodConfigurationVersion executor))))]
(testing "request without method configuration version"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deliberately excluding nil above?

api/test/wfl/integration/executor_test.clj Outdated Show resolved Hide resolved
api/test/wfl/integration/modules/staged_test.clj Outdated Show resolved Hide resolved
api/test/wfl/integration/executor_test.clj Outdated Show resolved Hide resolved
docs/md/staged-executor.md Outdated Show resolved Hide resolved
@okotsopoulos
Copy link
Contributor Author

Rerun of system tests with latest changes passes:

wm111-e35:wfl okotsopo$ make TARGET=system
export CPCACHE=/Users/okotsopo/wfl/api/.cpcache;            \
	export WFL_WFL_URL=http://localhost:3000; \
	clojure  -M:parallel-test wfl.system.v1-endpoint-test | \
	tee /Users/okotsopo/wfl/derived/api/system.log
WARNING: Specified path is external to project: ../derived/api/src
WARNING: Specified path is external to project: ../derived/api/resources

Ran 33 tests containing 374 assertions.
0 failures, 0 errors.
api system finished on Thu Mar 3 15:47:28 EST 2022
docs system finished on Thu Mar 3 15:47:29 EST 2022
functions/aou system finished on Thu Mar 3 15:47:29 EST 2022
functions/sg system finished on Thu Mar 3 15:47:29 EST 2022
helm system finished on Thu Mar 3 15:47:29 EST 2022
ui system finished on Thu Mar 3 15:47:29 EST 2022

@okotsopoulos okotsopoulos merged commit d6cedb8 into develop Mar 3, 2022
@okotsopoulos okotsopoulos deleted the okotsopo/GH-1629-warn-on-mc-version-mismatch branch March 3, 2022 21:13
okotsopoulos added a commit that referenced this pull request Mar 4, 2022
GH-1629 TerraExecutor gets method configuration version from Firecloud (#585)
GH-1553 GH-1604 Add error handling to Slack service (#584)
GH-1628 Look at the first 7 workloads returned in test-workflows-by-filters. (#583)
GH-1617 Consistent workload representation for logs (#579)
GH-1624 Rename covid -> staged in code, docs (#580)
okotsopoulos added a commit that referenced this pull request Mar 7, 2022
GH-1629 TerraExecutor gets method configuration version from Firecloud (#585)
GH-1553 GH-1604 Add error handling to Slack service (#584)
GH-1628 Look at the first 7 workloads returned in test-workflows-by-filters. (#583)
GH-1617 Consistent workload representation for logs (#579)
GH-1624 Rename covid -> staged in code, docs (#580)
okotsopoulos added a commit that referenced this pull request Mar 7, 2022
GH-1629 TerraExecutor gets method configuration version from Firecloud (#585)
GH-1553 GH-1604 Add error handling to Slack service (#584)
GH-1628 Look at the first 7 workloads returned in test-workflows-by-filters. (#583)
GH-1617 Consistent workload representation for logs (#579)
GH-1624 Rename covid -> staged in code, docs (#580)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants