WX-757 Fix workflow stuck in aborting after WDL type error #7385

Merged
aednichols merged 14 commits into develop from aen_wx_757
Mar 14, 2024

aednichols

This bug was caused by "userland" (WDL) code throwing an exception where we did not expect it, causing issues for the "kernel" (Cromwell).

Now we encapsulate possible exceptions in an ErrorOr so they can be handled safely in the case Left(f) => branch. As a stylistic point, I changed WomMap() to WomMap.apply() to make it more obvious that we're doing a lot of custom work there and that it's not just a regular old constructor.
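
The general pattern can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the code from this PR: ErrorOrSketch (a plain Either) stands in for Cromwell's ErrorOr, and buildMap, safeBuildMap, and describe are hypothetical names invented for the example. The idea is that a constructor which may throw gets wrapped, so the caller sees a Left it can pattern match on instead of an exception escaping into the actor.

```scala
import scala.util.{Failure, Success, Try}

object SafeConstruction {
  // Assumption: a simple Either stands in for Cromwell's ErrorOr in this sketch.
  type ErrorOrSketch[A] = Either[List[String], A]

  // Hypothetical constructor that throws on mixed value types,
  // loosely mimicking the failure shown in the logs.
  def buildMap(values: Map[String, Any]): Map[String, Any] = {
    val valueClasses = values.values.map(_.getClass).toSet
    if (valueClasses.size > 1)
      throw new UnsupportedOperationException(
        s"Cannot construct map with mixed types in map values: ${values.values.mkString("[", ", ", "]")}"
      )
    values
  }

  // Wrap the throwing call so the exception becomes a value the caller can inspect.
  def safeBuildMap(values: Map[String, Any]): ErrorOrSketch[Map[String, Any]] =
    Try(buildMap(values)) match {
      case Success(m) => Right(m)
      case Failure(e) => Left(List(e.getMessage))
    }

  // The caller handles the failure in an ordinary pattern match branch.
  def describe(result: ErrorOrSketch[Map[String, Any]]): String = result match {
    case Right(_)     => "Succeeded"
    case Left(errors) => s"Failed: ${errors.mkString("; ")}"
  }
}
```

With this shape, a type error in user-supplied WDL becomes data that the engine can turn into a normal workflow failure, rather than an exception thrown in the middle of message handling.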

As a bonus, the new code detects all previous stuck-aborting workflows and fails them.

After:

INFO  - WorkflowManagerActor: Workflow 1432d67e-3e95-40c8-acbd-d42f75040f1b failed (during ExecutingWorkflowState): cromwell.engine.workflow.lifecycle.execution.WdlRuntimeException: Failed to evaluate 'example2' (reason 1 of 1): Evaluating { "second": test, "lowerLayer": example1 } failed: Cannot construct WomMapType(WomStringType,WomAnyType) with mixed types in map values: [WomString(Hello World), WomObject(Map(first -> WomString(Hello World), number -> WomInteger(2)),WomCompositeType(Map(first -> WomStringType, number -> WomIntegerType),Some(firstLayer)))]
INFO  - WorkflowManagerActor: Workflow actor for 1432d67e-3e95-40c8-acbd-d42f75040f1b completed with status 'Failed'. The workflow will be removed from the workflow store.
{
	"status": "Failed",
	"id": "1432d67e-3e95-40c8-acbd-d42f75040f1b"
}

Before:

ERROR - Cannot construct WomMapType(WomStringType,WomAnyType) with mixed types in map values: [WomString(Hello World), WomObject(Map(first -> WomString(Hello World), number -> WomInteger(2)),WomCompositeType(Map(first -> WomStringType, number -> WomIntegerType),Some(firstLayer)))]
java.lang.UnsupportedOperationException: Cannot construct WomMapType(WomStringType,WomAnyType) with mixed types in map values: [WomString(Hello World), WomObject(Map(first -> WomString(Hello World), number -> WomInteger(2)),WomCompositeType(Map(first -> WomStringType, number -> WomIntegerType),Some(firstLayer)))]
	at wom.values.WomMap.<init>(WomMap.scala:79)
	at wom.values.WomMap$.apply(WomMap.scala:54)
	at wom.values.WomMap$.coerceMap(WomMap.scala:34)
	at wom.values.WomMap$.apply(WomMap.scala:50)
	at wdl.transforms.base.linking.expression.values.LiteralEvaluators$$anon$5.$anonfun$evaluateValue$12(LiteralEvaluators.scala:90)
	at cats.data.Validated.map(Validated.scala:559)
	at wdl.transforms.base.linking.expression.values.LiteralEvaluators$$anon$5.evaluateValue(LiteralEvaluators.scala:87)
	at wdl.transforms.base.linking.expression.values.LiteralEvaluators$$anon$5.evaluateValue(LiteralEvaluators.scala:73)
	at wdl.model.draft3.graph.expression.ValueEvaluator$Ops.evaluateValue(ValueEvaluator.scala:10)
	at wdl.model.draft3.graph.expression.ValueEvaluator$Ops.evaluateValue$(ValueEvaluator.scala:10)
	at wdl.model.draft3.graph.expression.ValueEvaluator$ops$$anon$1.evaluateValue(ValueEvaluator.scala:10)
	at wdl.draft3.transforms.linking.expression.values.package$$anon$1.evaluateValue(package.scala:36)
	at wdl.draft3.transforms.linking.expression.values.package$$anon$1.evaluateValue(package.scala:22)
	at wdl.model.draft3.graph.expression.ValueEvaluator$Ops.evaluateValue(ValueEvaluator.scala:10)
	at wdl.model.draft3.graph.expression.ValueEvaluator$Ops.evaluateValue$(ValueEvaluator.scala:10)
	at wdl.model.draft3.graph.expression.ValueEvaluator$ops$$anon$1.evaluateValue(ValueEvaluator.scala:10)
	at wdl.transforms.base.wdlom2wom.expression.WdlomWomExpression.evaluateValue(WdlomWomExpression.scala:37)
	at wom.graph.expression.ExpressionNode.evaluateAndCoerce(ExpressionNode.scala:35)
	at wom.graph.expression.ExpressionNode.$anonfun$evaluate$2(ExpressionNode.scala:45)
	at scala.util.Either.flatMap(Either.scala:352)
	at wom.graph.expression.ExpressionNode.evaluate(ExpressionNode.scala:44)
	at cromwell.engine.workflow.lifecycle.execution.keys.ExpressionKey.processRunnable(ExpressionKey.scala:31)
	at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.$anonfun$startRunnableNodes$7(WorkflowExecutionActor.scala:644)
	at scala.collection.immutable.List.map(List.scala:246)
	at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.cromwell$engine$workflow$lifecycle$execution$WorkflowExecutionActor$$startRunnableNodes(WorkflowExecutionActor.scala:636)
	at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor$$anonfun$5.applyOrElse(WorkflowExecutionActor.scala:235)
	at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor$$anonfun$5.applyOrElse(WorkflowExecutionActor.scala:233)
	at scala.PartialFunction$OrElse.apply(PartialFunction.scala:266)
	at akka.actor.FSM.processEvent(FSM.scala:710)
	at akka.actor.FSM.processEvent$(FSM.scala:704)
	at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.akka$actor$LoggingFSM$$super$processEvent(WorkflowExecutionActor.scala:57)
	at akka.actor.LoggingFSM.processEvent(FSM.scala:847)
	at akka.actor.LoggingFSM.processEvent$(FSM.scala:829)
	at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.processEvent(WorkflowExecutionActor.scala:57)
	at akka.actor.FSM.akka$actor$FSM$$processMsg(FSM.scala:701)
	at akka.actor.FSM$$anonfun$receive$1.applyOrElse(FSM.scala:695)
	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:35)
	at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor$$anonfun$receive$1.applyOrElse(WorkflowExecutionActor.scala:576)
	at akka.actor.Actor.aroundReceive(Actor.scala:539)
	at akka.actor.Actor.aroundReceive$(Actor.scala:537)
	at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.akka$actor$Timers$$super$aroundReceive(WorkflowExecutionActor.scala:57)
	at akka.actor.Timers.aroundReceive(Timers.scala:51)
	at akka.actor.Timers.aroundReceive$(Timers.scala:40)
	at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.aroundReceive(WorkflowExecutionActor.scala:57)
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:614)
	at akka.actor.ActorCell.invoke(ActorCell.scala:583)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:268)
	at akka.dispatch.Mailbox.run(Mailbox.scala:229)
	at akka.dispatch.Mailbox.exec(Mailbox.scala:241)
	at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2024-03-12 18:49:15 cromwell-system-akka.actor.default-dispatcher-3 INFO  - Message [cromwell.engine.workflow.lifecycle.EngineLifecycleActorAbortCommand$] from Actor[akka://cromwell-system/user/cromwell-service/WorkflowManagerActor/WorkflowActor-262d278a-cc62-4458-9150-f31976c2c554#401797350] to Actor[akka://cromwell-system/user/cromwell-service/WorkflowManagerActor/WorkflowActor-262d278a-cc62-4458-9150-f31976c2c554/WorkflowExecutionActor-262d278a-cc62-4458-9150-f31976c2c554#-742739735] was not delivered. [1] dead letters encountered, no more dead letters will be logged. If this is not an expected behavior, then [Actor[akka://cromwell-system/user/cromwell-service/WorkflowManagerActor/WorkflowActor-262d278a-cc62-4458-9150-f31976c2c554/WorkflowExecutionActor-262d278a-cc62-4458-9150-f31976c2c554#-742739735]] may have terminated unexpectedly, This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
{
	"status": "Aborting",
	"id": "262d278a-cc62-4458-9150-f31976c2c554"
}

@aednichols aednichols requested a review from a team as a code owner March 12, 2024 19:00

@jgainerdewar jgainerdewar left a comment

Excellent sleuthing!

@aednichols

Adding on to this PR: a third (!) variant that is more recent (Slack link) and turned out to be an array issue. I think we understand the pattern at this point, so I'm not adding a Centaur test for every single WOM type.

Could not construct array of type WomMaybeEmptyArrayType(WomOptionalType(WomMaybeEmptyArrayType(WomStringType))) with this value: List(WomOptionalValue(WomMaybeEmptyArrayType(WomSingleFileType),Some([])), ["gs://fc-secure-0a879173-62d3-4c3a-8fc3-e35ee4248901/whitelists/10x_multiome/737K-arc-v1_ATAC_whitelist.txt.gz"])
java.lang.UnsupportedOperationException: Could not construct array of type WomMaybeEmptyArrayType(WomOptionalType(WomMaybeEmptyArrayType(WomStringType))) with this value: List(WomOptionalValue(WomMaybeEmptyArrayType(WomSingleFileType),Some([])), ["gs://fc-secure-0a879173-62d3-4c3a-8fc3-e35ee4248901/whitelists/10x_multiome/737K-arc-v1_ATAC_whitelist.txt.gz"])
	at wom.values.WomArray$.apply(WomArray.scala:43)
	at wom.values.WomArray$.apply(WomArray.scala:49)
	at wdl.transforms.base.linking.expression.values.LiteralEvaluators$$anon$6.$anonfun$evaluateValue$16(LiteralEvaluators.scala:109)
	at cats.data.Validated.map(Validated.scala:559)
	at wdl.transforms.base.linking.expression.values.LiteralEvaluators$$anon$6.evaluateValue(LiteralEvaluators.scala:106)
	at wdl.transforms.base.linking.expression.values.LiteralEvaluators$$anon$6.evaluateValue(LiteralEvaluators.scala:95)
	at wdl.model.draft3.graph.expression.ValueEvaluator$Ops.evaluateValue(ValueEvaluator.scala:10)
	at wdl.model.draft3.graph.expression.ValueEvaluator$Ops.evaluateValue$(ValueEvaluator.scala:10)
	at wdl.model.draft3.graph.expression.ValueEvaluator$ops$$anon$1.evaluateValue(ValueEvaluator.scala:10)
	at wdl.draft3.transforms.linking.expression.values.package$$anon$1.evaluateValue(package.scala:37)
	at wdl.draft3.transforms.linking.expression.values.package$$anon$1.evaluateValue(package.scala:22)
	at wdl.model.draft3.graph.expression.ValueEvaluator$Ops.evaluateValue(ValueEvaluator.scala:10)
	at wdl.model.draft3.graph.expression.ValueEvaluator$Ops.evaluateValue$(ValueEvaluator.scala:10)
	at wdl.model.draft3.graph.expression.ValueEvaluator$ops$$anon$1.evaluateValue(ValueEvaluator.scala:10)
	at wdl.transforms.base.linking.expression.values.EngineFunctionEvaluators$$anon$23.evaluateValue(EngineFunctionEvaluators.scala:544)
	at wdl.transforms.base.linking.expression.values.EngineFunctionEvaluators$$anon$23.evaluateValue(EngineFunctionEvaluators.scala:537)
	at wdl.model.draft3.graph.expression.ValueEvaluator$Ops.evaluateValue(ValueEvaluator.scala:10)
	at wdl.model.draft3.graph.expression.ValueEvaluator$Ops.evaluateValue$(ValueEvaluator.scala:10)

@aednichols aednichols enabled auto-merge (squash) March 13, 2024 02:51
@aednichols aednichols merged commit b79ead2 into develop Mar 14, 2024
34 checks passed
@aednichols aednichols deleted the aen_wx_757 branch March 14, 2024 13:25