Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix st2-self-check script reporting success on failed runs #5487

Merged
merged 3 commits into from
Dec 9, 2021

Conversation

arm4b
Copy link
Member

@arm4b arm4b commented Dec 8, 2021

When the test run contains several nested workflows the script assumes it's succeeded if at least a single sub-workflow was successful. Instead of that, the check should refer to the parent workflow with "status: succeeded".

Example

id: 61b105a82f44ed586b2d24d0
action.ref: tests.test_inquiry_chain
parameters: 
  protocol: http
  token: bbeb3f7d29e240ac83ce3d53fff90745
status: failed 
result_task: get_workflow_details_1
result: 
  failed: true
  return_code: 2
  stderr: ""
  stdout: ''
  succeeded: false 
error: ""
traceback: None
failed_on: get_workflow_details_1
start_timestamp: Wed, 08 Dec 2021 19:21:12 UTC
end_timestamp: Wed, 08 Dec 2021 19:21:23 UTC
log: 
  - status: requested
    timestamp: '2021-12-08T19:21:12.277000Z'
  - status: scheduled
    timestamp: '2021-12-08T19:21:12.434000Z'
  - status: running
    timestamp: '2021-12-08T19:21:12.697000Z'
  - status: failed
    timestamp: '2021-12-08T19:21:23.334000Z'
+--------------------------+------------------------+--------------------------+------------+-------------------------------+
| id                       | status                 | task                     | action     | start_timestamp               |
+--------------------------+------------------------+--------------------------+------------+-------------------------------+
| 61b105a8b1a7886ef90b7b15 | succeeded (4s elapsed) | execute_inquiry_workflow | core.local | Wed, 08 Dec 2021 19:21:12 UTC |
| 61b105acb1a7886ef90b7b17 | succeeded (2s elapsed) | get_inquiry_trigger      | core.local | Wed, 08 Dec 2021 19:21:16 UTC |
| 61b105afb1a7886ef90b7b19 | succeeded (0s elapsed) | get_inquiry_id           | core.local | Wed, 08 Dec 2021 19:21:19 UTC |
| 61b105b0b1a7886ef90b7b1b | succeeded (1s elapsed) | get_workflow_id          | core.local | Wed, 08 Dec 2021 19:21:20 UTC |
| 61b105b2b1a7886ef90b7b1d | failed (1s elapsed)    | get_workflow_details_1   | core.local | Wed, 08 Dec 2021 19:21:22 UTC |
+--------------------------+------------------------+--------------------------+------------+-------------------------------+

When the test run contains several nested workflows the script assumes it's succeeded if at least single sub-workflow was successful.
Instead of that, the check should refer to the parent workflow "status: succeeded".
@arm4b arm4b added the bug label Dec 8, 2021
@arm4b arm4b added this to In progress in StackStorm v3.7.0 via automation Dec 8, 2021
@pull-request-size pull-request-size bot added the size/XS PR that changes 0-9 lines. Quick fix/merge. label Dec 8, 2021
@arm4b arm4b modified the milestone: 3.7.0 Dec 8, 2021
@arm4b arm4b requested a review from a team December 8, 2021 21:22
st2common/bin/st2-self-check Outdated Show resolved Hide resolved
@nzlosh
Copy link
Contributor

nzlosh commented Dec 9, 2021

Other than a small restriction on the regex pattern, this looks good to me.

@arm4b arm4b requested a review from nzlosh December 9, 2021 15:16
…ubstitution

```
echo ${OUTPUT}
```
results in entire output being on a single line, which is why it breaks the grep rule at the first place.

Fixes #5487 when st2-self-check script reported success on failed runs.
Copy link
Contributor

@nzlosh nzlosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cognifloyd
Copy link
Member

Now that you've already figured it out, I'm somewhere I can look up how I've dealt with something similar.

First I get an execution id (I only did chatops.match_and_execute, but it should be generalizable):

match_and_execute() {
  # usage: match_and_execute "<alias>" <timeout>
  # returns the execution id
  echoerr "chatops> ${1}"
  st2 --cacert=true run chatops.match_and_execute text="${1}" source_channel=vela user=${VELA_BUILD_AUTHOR} timeout=${2:-600} --attr result.result --json | jq -r .result.result
}

Then I query an execution status from bash:

assert_status() {
  # usage: assert_status <exec_id> <expected_status>
  local exec_status
  exec_status=$(st2 --cacert=true execution get --detail --json ${1} | jq -r .status)
  if [ "${2}" != "${exec_status}" ]; then
    echo failed
    echoerr "The alias execution should be ${2} but it is ${exec_status}!"
    return 4
  fi
  echo passed
  echo
}

Note how I used --detail --json. --detail makes it put only the current execution (no children) in a table, and then --json says "actually give me the table as json". Thus, you only get info about the current execution in json format instead of a series of json objects for the execution and its children.

This is what one of my tests using those functions looks like:

ALIAS_EXEC_ID=$(match_and_execute "tc list" 120)
echo ALIAS_EXEC_ID=${ALIAS_EXEC_ID}
assert_status "${ALIAS_EXEC_ID}" "succeeded"

@cognifloyd
Copy link
Member

To be clear, I'm not recommending you change the approach for this PR. I just wanted to share it while I was thinking about it as a possible future enhancement.

@arm4b
Copy link
Member Author

arm4b commented Dec 9, 2021

@cognifloyd That does multiple requests, adding smart code, complexity, and jq as a dependency.

Form the dull code we have:

    OUTPUT=$(st2 run ${TEST} protocol=${PROTOCOL} token=${ST2_AUTH_TOKEN})
    echo "${OUTPUT}" | grep "status" | grep -q "succeeded"
    EXIT_CODE=$?

    if [ ${EXIT_CODE} -ne 0 ]; then
        echo "Test output: ${OUTPUT}"
        ((ERRORS++))
    fi

In this case, the workflow would show the original human-readable execution output as-is, as someone would run it.
That's helpful in identifying more quickly in logs what's wrong when st2-self-check errored.
When the e2e tests failed, someone is looking at logs, and the instance is destroyed already, the things should be as simple and obvious as possible.

Just a different approach depending on what matters more in a particular situation.

@arm4b arm4b merged commit 8420fef into master Dec 9, 2021
StackStorm v3.7.0 automation moved this from In progress to Done Dec 9, 2021
@arm4b arm4b deleted the fix/st2-self-check branch December 9, 2021 22:11
@arm4b
Copy link
Member Author

arm4b commented Dec 9, 2021

@amanda11 @nzlosh @cognifloyd Thanks everyone for the reviews 😉 We found a better way!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug size/XS PR that changes 0-9 lines. Quick fix/merge.
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

4 participants