docs: StatusCheck in Workflow #299

Merged on May 27, 2022 (8 commits)

@iguoyr iguoyr commented May 23, 2022

close #254

Signed-off-by: SiyuChen <ryougi201@gmail.com>
Signed-off-by: SiyuChen <ryougi201@gmail.com>

ti-chi-bot commented May 23, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • STRRL

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot added the do-not-merge/work-in-progress label May 23, 2022
@ti-chi-bot

@iguoyr: Adding label: do-not-merge/blocked-paths because PR changes a protected file.

Reasons for blocking this PR:

[This PR modifies the files under the docs or versioned_docs folder and requires the docs team to follow up on the PR.

/label documentation
]

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ti-chi-bot added the do-not-merge/blocked-paths, documentation, and size/L labels May 23, 2022

netlify bot commented May 23, 2022

Deploy Preview for chaos-mesh-website-preview ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | adf990a |
| 🔍 Latest deploy log | https://app.netlify.com/sites/chaos-mesh-website-preview/deploys/62906578375ecf0008e6f958 |
| 😎 Deploy Preview | https://deploy-preview-299--chaos-mesh-website-preview.netlify.app |

Signed-off-by: SiyuChen <ryougi201@gmail.com>
@ti-chi-bot added the size/XL label and removed the size/L label May 23, 2022
Signed-off-by: SiyuChen <ryougi201@gmail.com>
@iguoyr iguoyr marked this pull request as ready for review May 24, 2022 07:06
@ti-chi-bot removed the do-not-merge/work-in-progress label May 24, 2022
@iguoyr iguoyr requested a review from STRRL May 24, 2022 07:10

@STRRL STRRL left a comment

I prefer to move "Status Check Mode" to the position before "Status Check Result"

What do you think about it?

Rest LGTM

docs/create-chaos-mesh-workflow.md (outdated, resolved)

iguoyr commented May 24, 2022

I prefer to move "Status Check Mode" to the position before "Status Check Result"

@STRRL Hmm, I put "Status Check Result" first because "Status Check Mode" needs to explain the difference between Continuous and Synchronous, and that explanation depends on the result of the status check. So I try to explain what a failed or successful status check is before "Status Check Mode". 🤔

Signed-off-by: SiyuChen <ryougi201@gmail.com>

STRRL commented May 24, 2022

I prefer to move "Status Check Mode" to the position before "Status Check Result"

@STRRL Hmm, I put "Status Check Result" first because "Status Check Mode" needs to explain the difference between Continuous and Synchronous, and that explanation depends on the result of the status check. So I try to explain what a failed or successful status check is before "Status Check Mode". 🤔

got that ❤️


@STRRL STRRL left a comment

LGTM

@ti-chi-bot added the status/LGT1 label May 24, 2022

STRRL commented May 24, 2022

/cc @Oreoxmt PTAL

@ti-chi-bot ti-chi-bot requested a review from Oreoxmt May 24, 2022 08:47
@ti-chi-bot

@STRRL: GitHub didn't allow me to request PR reviews from the following users: PTAL.

Note that only chaos-mesh members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @Oreoxmt PTAL


@Oreoxmt added the v2.2 and translation/no-need labels May 24, 2022
@@ -199,6 +199,8 @@ It is flexible to create a workflow using a YAML file and `kubectl`. You can nes
| stressChaos | object | Configures StressChaos. You need to configure this field when the type is `StressChaos`. See the [Simulate Heavy Stress on Kubernetes](simulate-heavy-stress-on-kubernetes.md) document for details. | None | No | |
| timeChaos | object | Configures TimeChaos. You need to configure this field when the type is `TimeChaos`. See the [Simulate Time Faults](simulate-time-chaos-on-kubernetes.md) document for details. | None | No | |
| schedule | object | Configures Schedule. You need to configure this field when the type is `Schedule`. See the [Define Scheduling Rules](define-scheduling-rules.md) document for details. | None | No | |
| statusCheck | object | Configures StatusCheck. You need to configure this field when the type is `StatusCheck`. See the [StatusCheck in Workflow](status-check-in-workflow.md) document for details. | None | No | |
| abortWithStatusCheck | bool | Configures whether abort the Workflow when StatusCheck is failed. You can configure this field when the type is `StatusCheck`. | false | No | `true` |

Suggested change
| abortWithStatusCheck | bool | Configures whether abort the Workflow when StatusCheck is failed. You can configure this field when the type is `StatusCheck`. | false | No | `true` |
| abortWithStatusCheck | bool | Configures whether abort the Workflow when StatusCheck is failed. You can configure this field when the type is `StatusCheck`. | `false` | No | `true` |
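
As a rough sketch of how the two fields in this hunk might sit together in one template: `statusCheck` and `abortWithStatusCheck` come from the table rows above, while `templateType`, the `http`, `url`, and `method` keys, the node name, and the URL are assumptions made for illustration and are not confirmed by this PR.

```yaml
# Hypothetical entry under spec.templates of a Workflow.
- name: workflow-status-check           # hypothetical node name
  templateType: StatusCheck             # assumed name of the "type" field in the table
  abortWithStatusCheck: true            # abort the Workflow when the status check fails
  statusCheck:
    mode: Continuous
    type: HTTP
    http:                               # assumed key names: http, url, method
      url: http://example.com/healthz   # placeholder URL
      method: GET
      criteria:
        statusCode: "200"
```

When `abortWithStatusCheck` is left at its default of `false`, a failed status check would not abort the Workflow.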

Comment on lines 202 to 203
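| statusCheck | object | Configures StatusCheck. You need to configure this field when the type is StatusCheck. See [Status Check in Workflow](status-check-in-workflow.md) for details. | None | No | |
| abortWithStatusCheck | bool | Configures whether to abort the Workflow when StatusCheck fails. You can optionally configure this field when the type is StatusCheck. | false | No | `true` |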

Suggested change
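| statusCheck | object | Configures StatusCheck. You need to configure this field when the type is StatusCheck. See [Status Check in Workflow](status-check-in-workflow.md) for details. | None | No | |
| abortWithStatusCheck | bool | Configures whether to abort the Workflow when StatusCheck fails. You can optionally configure this field when the type is StatusCheck. | false | No | `true` |
| statusCheck | object | Configures StatusCheck. You need to configure this field when the type is StatusCheck. See [Status Check in Workflow](/status-check-in-workflow.md) for details. | None | No | |
| abortWithStatusCheck | bool | Configures whether to abort the Workflow when StatusCheck fails. You can optionally configure this field when the type is StatusCheck. | `false` | No | `true` |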

@iguoyr (author) replied:

/status-check-in-workflow.md

I just kept the same format as the links above in this document :)

title: Status Check in Workflow
---

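In Workflow, the status check executes specified operations on external systems (such as application systems and monitoring systems) to obtain their status, and can automatically abort the Workflow when it detects that the system is unhealthy. The concept is similar to `Container Probes` in Kubernetes. This article describes if status checks are performed in Workflow using yaml.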

Suggested change
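In Workflow, the status check executes specified operations on external systems (such as application systems and monitoring systems) to obtain their status, and can automatically abort the Workflow when it detects that the system is unhealthy. The concept is similar to `Container Probes` in Kubernetes. This article describes if status checks are performed in Workflow using yaml.
In Workflow, the status check executes specified operations on external systems (such as application systems and monitoring systems) to obtain their status, and can automatically abort the Workflow when it detects that the system is unhealthy. The concept is similar to `Container Probes` in Kubernetes. This article describes how to perform status checks in Workflow using YAML.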


:::note

Chaos Mesh does not yet support to create `StatusCheck` nodes on Chaos Dashboard, so you could only create `StatusCheck` nodes using yaml for now.

Suggested change
Chaos Mesh does not yet support to create `StatusCheck` nodes on Chaos Dashboard, so you could only create `StatusCheck` nodes using yaml for now.
Chaos Mesh does not yet support creating `StatusCheck` nodes on Chaos Dashboard, so you could only create `StatusCheck` nodes using YAML for now.


## Status Check Type

Chaos Mesh only support `HTTP` type to execute status check.

Suggested change
Chaos Mesh only support `HTTP` type to execute status check.
Chaos Mesh only supports the `HTTP` type to execute a status check.

statusCode: "200"
```

In the configuration, the `StatusCheck` node will execute status checks every 1 second, and exit when any of the following conditions are met:

Suggested change
In the configuration, the `StatusCheck` node will execute status checks every 1 second, and exit when any of the following conditions are met:
In the configuration, the `StatusCheck` node will execute status checks every second, and exit when any of the following conditions are met:

- The status check fails, i.e. 3 or more consecutive failed `execution results`
- Trigger the node timeout after 20 seconds

### One shot Status Check

Suggested change
### One shot Status Check
### One time Status Check

statusCode: "200"
```

In the configuration, the `StatusCheck` node will execute status checks every 1 second, and exit when any of the following conditions are met:

Suggested change
In the configuration, the `StatusCheck` node will execute status checks every 1 second, and exit when any of the following conditions are met:
In the configuration, the `StatusCheck` node will execute status checks every second, and exit when any of the following conditions are met:

| --- | --- | --- | --- | --- | --- |
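| mode | string | The mode of the status check. Valid values: `Synchronous` / `Continuous`. | None | Yes | `Synchronous` |
| type | string | The type of the status check. Valid values: `HTTP`. | `HTTP` | Yes | `HTTP` |
| duration | string | The duration of the status check while the number of failed executions is less than failureThreshold. The `Duration` field applies to status checks in both `Synchronous` and `Continuous` modes. | None | No | `100s` |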

Suggested change
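| duration | string | The duration of the status check while the number of failed executions is less than failureThreshold. The `Duration` field applies to status checks in both `Synchronous` and `Continuous` modes. | None | No | `100s` |
| duration | string | The duration of the status check while the number of failed executions is less than `failureThreshold`. It applies to status checks in both `Synchronous` and `Continuous` modes. | None | No | `100s` |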

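| intervalSeconds | int | Defines how often (in seconds) to perform an execution of status check. | `1` | No | `1` |
| failureThreshold | int | The minimum number of consecutive failed executions for the status check to be considered failed. | `3` | No | `3` |
| successThreshold | int | The minimum number of consecutive successful executions for the status check to be considered successful. | `1` | No | `1` |
| recordsHistoryLimit | int | The number of historical execution records to keep. | 100 | No | `100` |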

Suggested change
| recordsHistoryLimit | int | The number of historical execution records to keep. | 100 | No | `100` |
| recordsHistoryLimit | int | The number of historical execution records to keep. | `100` | No | `100` |
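
For orientation, the fields in this table could be spelled out in a `statusCheck` block roughly as follows. This is only a sketch: the field names, defaults, and example values are taken from the table rows in this review, `timeoutSeconds` comes from the docs text quoted elsewhere in this PR, and the `http` section (its key names and the URL) is an assumption.

```yaml
# Sketch of a statusCheck block with the table's defaults written out explicitly.
statusCheck:
  mode: Synchronous            # required; Synchronous or Continuous
  type: HTTP                   # the only type listed in the table
  duration: 100s               # optional; example value from the table
  timeoutSeconds: 1            # timeout for each execution
  intervalSeconds: 1           # default: 1
  failureThreshold: 3          # default: 3 consecutive failures -> check fails
  successThreshold: 1          # default: 1 consecutive success -> check succeeds
  recordsHistoryLimit: 100     # default: 100 execution records kept
  http:                        # assumed key names: http, url, method
    url: http://example.com/healthz   # placeholder URL
    method: GET
    criteria:
      statusCode: "200"
```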

Signed-off-by: SiyuChen <ryougi201@gmail.com>

iguoyr commented May 26, 2022

@Oreoxmt Fixed! PTAL again ❤️

@cwen0 cwen0 requested a review from Oreoxmt May 27, 2022 02:59

@Oreoxmt Oreoxmt left a comment

Rest LGTM


### Define a `HTTP` `StatusCheck` node

A `StatusCheck` node sends `GET` or `POST` HTTP requests to the specific URL, with custom request headers and request body, and then determines the result of the request by the conditions in the `criteria` field.

Suggested change
A `StatusCheck` node sends `GET` or `POST` HTTP requests to the specific URL, with custom request headers and request body, and then determines the result of the request by the conditions in the `criteria` field.
A `StatusCheck` node sends `GET` or `POST` HTTP requests to the specific URL, with custom headers and body, and then determines the result of the request by the conditions in the `criteria` field.


In the configuration, you can see a `StatusCheck` node with `HTTP` type. The `deadline` field specifies that this node could be executed for a maximum of 20 seconds. The `mode` field specifies that this node will execute status checks continuously. The `intervalSeconds` field specifies a repetition interval of 1 second. The `timeoutSeconds` field specifies the timeout for each execution.

When Workflow runs to this `StatusCheck` node, the specified status check would be executed every second. The status check uses the `GET` method to send an HTTP request to the URL `http://123.123.123.123`, if the response is returned within 1 second and the status code is `200`, this execution succeeds, otherwise it fails.

Suggested change
When Workflow runs to this `StatusCheck` node, the specified status check would be executed every second. The status check uses the `GET` method to send an HTTP request to the URL `http://123.123.123.123`, if the response is returned within 1 second and the status code is `200`, this execution succeeds, otherwise it fails.
When Workflow runs to this `StatusCheck` node, the specified status check would be executed every second. The status check uses the `GET` method to send an HTTP request to the URL `http://123.123.123.123`. If the response is returned within 1 second and the status code is `200`, this execution succeeds, otherwise it fails.
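
The node these paragraphs describe might look roughly like the fragment below. The values (`20s` deadline, `Continuous` mode, 1-second interval and timeout, `GET`, the URL, and status code `200`) come from the text above. The node name, `templateType`, and the `http`, `url`, and `method` key names are assumptions, since the YAML itself is not part of this hunk.

```yaml
# Hypothetical entry under spec.templates of a Workflow.
- name: workflow-status-check     # hypothetical node name
  templateType: StatusCheck       # assumed field name for the node type
  deadline: 20s                   # the node runs for at most 20 seconds
  statusCheck:
    mode: Continuous              # keep executing status checks
    type: HTTP
    intervalSeconds: 1            # one execution every second
    timeoutSeconds: 1             # each execution must return within 1 second
    http:                         # assumed key names: http, url, method
      url: http://123.123.123.123
      method: GET
      criteria:
        statusCode: "200"         # an execution succeeds only on HTTP 200
```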

The status check is considered unsuccessful when any of the following conditions are met:

- The status check fails.
- When the `StatusCheck` node timeout is exceeded, and the `status check result` is not successful. For example, `successThreshold` is 1, `failureThreshold` is 3, and when the timeout is exceeded, there are 2 consecutive failures and 0 successes. Although it does not meet the condition for `status check fails`, it's also considered in this case, that the status check is unsuccessful.

Suggested change
- When the `StatusCheck` node timeout is exceeded, and the `status check result` is not successful. For example, `successThreshold` is 1, `failureThreshold` is 3, and when the timeout is exceeded, there are 2 consecutive failures and 0 successes. Although it does not meet the condition for `status check fails`, it's also considered in this case, that the status check is unsuccessful.
- When the `StatusCheck` node timeout is exceeded, and the `status check result` is not successful. For example, `successThreshold` is 1, `failureThreshold` is 3, and when the timeout is exceeded, there are 2 consecutive failures and 0 successes. Although it does not meet the condition for "status check fails", it is also considered to be unsuccessful in this case.


:::

## Status Check Type

Suggested change
## Status Check Type
## Status Check type


When Workflow runs to this `StatusCheck` node, the specified status check would be executed every second. The status check uses the `GET` method to send an HTTP request to the URL `http://123.123.123.123`, if the response is returned within 1 second and the status code is `200`, this execution succeeds, otherwise it fails.

## Status Check Result

Suggested change
## Status Check Result
## Status Check results

- The status check fails.
- When the `StatusCheck` node timeout is exceeded, and the `status check result` is not successful. For example, `successThreshold` is 1, `failureThreshold` is 3, and when the timeout is exceeded, there are 2 consecutive failures and 0 successes. Although it does not meet the condition for `status check fails`, it's also considered in this case, that the status check is unsuccessful.

## Status Check Mode

Suggested change
## Status Check Mode
## Status Check mode


:::note

Currently, `StatusCheck` nodes cannot be created on Dashboard. They can only be created using yaml.

Suggested change
Currently, `StatusCheck` nodes cannot be created on Dashboard. They can only be created using yaml.
Currently, `StatusCheck` nodes cannot be created on Dashboard. They can only be created using YAML.

title: Status Check in Workflow
---

In Workflow, the status check could execute specified operations on external systems, such as application systems and monitoring systems, to obtain their statuses, and automatically abort the `Workflow` when it finds the system is unhealthy. The concept is similar to `Container Probes` in Kubernetes. This article describes how to execute status checks in `Workflow` using YAML files.

Suggested change
In Workflow, the status check could execute specified operations on external systems, such as application systems and monitoring systems, to obtain their statuses, and automatically abort the `Workflow` when it finds the system is unhealthy. The concept is similar to `Container Probes` in Kubernetes. This article describes how to execute status checks in `Workflow` using YAML files.
In Workflow, the status check could execute specified operations on external systems, such as application systems and monitoring systems, to obtain their statuses, and automatically abort the `Workflow` when it finds the system is unhealthy. The concept is similar to `Container Probes` in Kubernetes. This article describes how to execute status checks in Workflow using YAML files.

statusCode: "200"
```

In the configuration, the `StatusCheck` node will execute status checks every ssecond, and exit when any of the following conditions are met:

Suggested change
In the configuration, the `StatusCheck` node will execute status checks every ssecond, and exit when any of the following conditions are met:
In the configuration, the `StatusCheck` node will execute status checks every second, and exit when any of the following conditions are met:

- The status check fails, i.e. 3 or more consecutive failed `execution results`
- Trigger the node timeout after 20 seconds

## Status Check vs HTTP Request Task

Suggested change
## Status Check vs HTTP Request Task
## StatusCheck vs HTTP Request Task

Signed-off-by: SiyuChen <ryougi201@gmail.com>

iguoyr commented May 27, 2022

@Oreoxmt Updated!


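In Workflow, the status check executes specified operations on external systems (such as application systems and monitoring systems) to obtain their status, and can automatically abort the Workflow when it detects that the system is unhealthy. The concept is similar to `Container Probes` in Kubernetes. This article describes how to perform status checks in Workflow using YAML.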

:::note

Suggested change
:::note
:::note Note

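- When 1 or more consecutive "execution results" are "success", the "status check result" is considered successful
- When 3 or more consecutive "execution results" are "failure", the "status check result" is considered failed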

:::note

Suggested change
:::note
:::note Note


### Abort the Workflow when the status check is unsuccessful

:::note

Suggested change
:::note
:::note Note

Signed-off-by: SiyuChen <ryougi201@gmail.com>

iguoyr commented May 27, 2022

/merge

@ti-chi-bot

This pull request has been accepted and is ready to merge.

Commit hash: adf990a

@ti-chi-bot added the status/can-merge label May 27, 2022
@cwen0 removed the do-not-merge/blocked-paths label May 27, 2022
@ti-chi-bot ti-chi-bot merged commit e169e4b into chaos-mesh:master May 27, 2022
Labels: documentation, size/XL, status/can-merge, status/LGT1, translation/no-need, v2.2
Linked issue (may be closed by this pull request): add document for statuscheck in workflow
Participants: 5