Fix st2-self-check script reporting success on failed runs #5487

arm4b · 2021-12-08T20:26:53Z

When the test run contains several nested workflows the script assumes it's succeeded if at least a single sub-workflow was successful. Instead of that, the check should refer to the parent workflow with "status: succeeded".

Example

id: 61b105a82f44ed586b2d24d0
action.ref: tests.test_inquiry_chain
parameters: 
  protocol: http
  token: bbeb3f7d29e240ac83ce3d53fff90745
status: failed 
result_task: get_workflow_details_1
result: 
  failed: true
  return_code: 2
  stderr: ""
  stdout: ''
  succeeded: false 
error: ""
traceback: None
failed_on: get_workflow_details_1
start_timestamp: Wed, 08 Dec 2021 19:21:12 UTC
end_timestamp: Wed, 08 Dec 2021 19:21:23 UTC
log: 
  - status: requested
    timestamp: '2021-12-08T19:21:12.277000Z'
  - status: scheduled
    timestamp: '2021-12-08T19:21:12.434000Z'
  - status: running
    timestamp: '2021-12-08T19:21:12.697000Z'
  - status: failed
    timestamp: '2021-12-08T19:21:23.334000Z'
+--------------------------+------------------------+--------------------------+------------+-------------------------------+
| id                       | status                 | task                     | action     | start_timestamp               |
+--------------------------+------------------------+--------------------------+------------+-------------------------------+
| 61b105a8b1a7886ef90b7b15 | succeeded (4s elapsed) | execute_inquiry_workflow | core.local | Wed, 08 Dec 2021 19:21:12 UTC |
| 61b105acb1a7886ef90b7b17 | succeeded (2s elapsed) | get_inquiry_trigger      | core.local | Wed, 08 Dec 2021 19:21:16 UTC |
| 61b105afb1a7886ef90b7b19 | succeeded (0s elapsed) | get_inquiry_id           | core.local | Wed, 08 Dec 2021 19:21:19 UTC |
| 61b105b0b1a7886ef90b7b1b | succeeded (1s elapsed) | get_workflow_id          | core.local | Wed, 08 Dec 2021 19:21:20 UTC |
| 61b105b2b1a7886ef90b7b1d | failed (1s elapsed)    | get_workflow_details_1   | core.local | Wed, 08 Dec 2021 19:21:22 UTC |
+--------------------------+------------------------+--------------------------+------------+-------------------------------+

When the test run contains several nested workflows the script assumes it's succeeded if at least single sub-workflow was successful. Instead of that, the check should refer to the parent workflow "status: succeeded".

st2common/bin/st2-self-check

nzlosh · 2021-12-09T14:30:14Z

Other than a small restriction on the regex pattern, this looks good to me.

…ubstitution ``` echo ${OUTPUT} ``` results in entire output being on a single line, which is why it breaks the grep rule at the first place. Fixes #5487 when st2-self-check script reported success on failed runs.

st2common/bin/st2-self-check

nzlosh

LGTM

cognifloyd · 2021-12-09T21:53:57Z

Now that you've already figured it out, I'm somewhere I can look up how I've dealt with something similar.

First I get an execution id (I only did chatops.match_and_execute, but it should be generalizable):

match_and_execute() {
  # usage: match_and_execute "<alias>" <timeout>
  # returns the execution id
  echoerr "chatops> ${1}"
  st2 --cacert=true run chatops.match_and_execute text="${1}" source_channel=vela user=${VELA_BUILD_AUTHOR} timeout=${2:-600} --attr result.result --json | jq -r .result.result
}

Then I query an execution status from bash:

assert_status() {
  # usage: assert_status <exec_id> <expected_status>
  local exec_status
  exec_status=$(st2 --cacert=true execution get --detail --json ${1} | jq -r .status)
  if [ "${2}" != "${exec_status}" ]; then
    echo failed
    echoerr "The alias execution should be ${2} but it is ${exec_status}!"
    return 4
  fi
  echo passed
  echo
}

Note how I used --detail --json. --detail makes it put only the current execution (no children) in a table, and then --json says "actually give me the table as json". Thus, you only get info about the current execution in json format instead of a series of json objects for the execution and its children.

This is what one of my tests using those functions looks like:

ALIAS_EXEC_ID=$(match_and_execute "tc list" 120)
echo ALIAS_EXEC_ID=${ALIAS_EXEC_ID}
assert_status "${ALIAS_EXEC_ID}" "succeeded"

cognifloyd · 2021-12-09T21:54:36Z

To be clear, I'm not recommending you change the approach for this PR. I just wanted to share it while I was thinking about it as a possible future enhancement.

arm4b · 2021-12-09T22:11:41Z

@cognifloyd That does multiple requests, adding smart code, complexity, and jq as a dependency.

Form the dull code we have:

    OUTPUT=$(st2 run ${TEST} protocol=${PROTOCOL} token=${ST2_AUTH_TOKEN})
    echo "${OUTPUT}" | grep "status" | grep -q "succeeded"
    EXIT_CODE=$?

    if [ ${EXIT_CODE} -ne 0 ]; then
        echo "Test output: ${OUTPUT}"
        ((ERRORS++))
    fi

In this case, the workflow would show the original human-readable execution output as-is, as someone would run it.
That's helpful in identifying more quickly in logs what's wrong when st2-self-check errored.
When the e2e tests failed, someone is looking at logs, and the instance is destroyed already, the things should be as simple and obvious as possible.

Just a different approach depending on what matters more in a particular situation.

arm4b · 2021-12-09T22:20:52Z

@amanda11 @nzlosh @cognifloyd Thanks everyone for the reviews 😉 We found a better way!

Fix st2-self-check script reporting success on failed runs

b3c4491

When the test run contains several nested workflows the script assumes it's succeeded if at least single sub-workflow was successful. Instead of that, the check should refer to the parent workflow "status: succeeded".

arm4b added the bug label Dec 8, 2021

arm4b added this to In progress in StackStorm v3.7.0 via automation Dec 8, 2021

pull-request-size bot added the size/XS PR that changes 0-9 lines. Quick fix/merge. label Dec 8, 2021

Add the changelog for #5487

8e09d0e

arm4b modified the milestone: 3.7.0 Dec 8, 2021

arm4b requested a review from a team December 8, 2021 21:22

nzlosh reviewed Dec 9, 2021

View reviewed changes

st2common/bin/st2-self-check Outdated Show resolved Hide resolved

arm4b requested a review from nzlosh December 9, 2021 15:16

arm4b commented Dec 9, 2021

View reviewed changes

st2common/bin/st2-self-check Show resolved Hide resolved

nzlosh approved these changes Dec 9, 2021

View reviewed changes

arm4b merged commit 8420fef into master Dec 9, 2021

StackStorm v3.7.0 automation moved this from In progress to Done Dec 9, 2021

arm4b deleted the fix/st2-self-check branch December 9, 2021 22:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix st2-self-check script reporting success on failed runs #5487

Fix st2-self-check script reporting success on failed runs #5487

arm4b commented Dec 8, 2021 •

edited

nzlosh commented Dec 9, 2021

nzlosh left a comment

cognifloyd commented Dec 9, 2021

cognifloyd commented Dec 9, 2021

arm4b commented Dec 9, 2021 •

edited

arm4b commented Dec 9, 2021 •

edited

Fix st2-self-check script reporting success on failed runs #5487

Fix st2-self-check script reporting success on failed runs #5487

Conversation

arm4b commented Dec 8, 2021 • edited

nzlosh commented Dec 9, 2021

nzlosh left a comment

Choose a reason for hiding this comment

cognifloyd commented Dec 9, 2021

cognifloyd commented Dec 9, 2021

arm4b commented Dec 9, 2021 • edited

arm4b commented Dec 9, 2021 • edited

arm4b commented Dec 8, 2021 •

edited

arm4b commented Dec 9, 2021 •

edited

arm4b commented Dec 9, 2021 •

edited