feat(controller): Add liveness probe #5875

Merged
merged 5 commits into from May 24, 2021

@alexec alexec commented May 10, 2021

No description provided.

Signed-off-by: Alex Collins <alex_collins@intuit.com>
alexec commented May 10, 2021

@sarabala1979 While this is easy to implement, it is hard to create the circumstances to test it, because you need to simulate a random crash in the controller, which does not happen.

Is the test just to check that we no longer see situations where workflows are left pending? Maybe we need to dig deeper?

@alexec alexec marked this pull request as ready for review May 19, 2021 15:33
@alexec alexec requested a review from sarabala1979 May 19, 2021 15:34
cmd/workflow-controller/healthz.go (outdated)
func healthz(ctx context.Context, wfclientset wfclientset.Interface, managedNamespace string) {
	http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		err := func() error {
			list, err := wfclientset.ArgoprojV1alpha1().Workflows(managedNamespace).List(ctx, metav1.ListOptions{LabelSelector: "!" + common.LabelKeyPhase})
Contributor Author

instance ID needed here

Member

instanceID?
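
The hunk above lists workflows that carry no phase label, that is, workflows the controller has not yet touched. How that list becomes a health verdict is not visible in this excerpt. A minimal sketch of one way to do it, assuming a hypothetical age threshold (`maxAge`) and a hypothetical stand-in type (`workflow`) instead of the real Workflow objects:

```go
package main

import (
	"fmt"
	"time"
)

// workflow is a hypothetical stand-in for a listed Workflow object.
type workflow struct {
	Name    string
	Created time.Time
}

// maxAge is an assumed threshold, not a value taken from the PR.
const maxAge = time.Minute

// Fixed sample data so the example is deterministic.
var (
	now   = time.Date(2021, 5, 24, 18, 0, 0, 0, time.UTC)
	fresh = []workflow{{Name: "fresh", Created: now.Add(-10 * time.Second)}}
	stale = []workflow{{Name: "stale", Created: now.Add(-10 * time.Minute)}}
)

// checkReconciled returns an error if any workflow that still lacks a
// phase label was created more than limit ago.
func checkReconciled(unlabelled []workflow, at time.Time, limit time.Duration) error {
	for _, wf := range unlabelled {
		if at.Sub(wf.Created) > limit {
			return fmt.Errorf("workflow never reconciled: %s", wf.Name)
		}
	}
	return nil
}

func main() {
	fmt.Println(checkReconciled(fresh, now, maxAge))
	fmt.Println(checkReconciled(stale, now, maxAge))
}
```

A recently created workflow without a phase label is normal, so only workflows past the threshold count as evidence that the controller is stuck.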

@alexec alexec requested review from jessesuen and whynowy May 20, 2021 18:23
# This takes advantage of the fact that if the metrics service has died,
# then the controller has died.
# In testing, it appears to take 60-90s from failure to restart.
- containerPort: 6060
Contributor Author

@alexec alexec May 24, 2021

this port also exposes pprof endpoints, but these are already accessible
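
For context on the manifest comment above, here is a rough sketch of what an HTTP liveness check amounts to: one GET with a timeout, where a connection error or a status outside 200-399 counts as a failure. The helper names (`probeOnce`, `newStubServer`) and the timeout are hypothetical, and the stub servers stand in for the controller's port.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// probeTimeout is an assumed value for illustration only.
const probeTimeout = time.Second

// probeOnce performs a single GET and reports whether it would count as a
// successful probe: the request completes and the status is in [200, 400).
func probeOnce(url string) bool {
	client := &http.Client{Timeout: probeTimeout}
	resp, err := client.Get(url)
	if err != nil {
		return false // nothing is listening, or the request timed out
	}
	defer resp.Body.Close()
	return resp.StatusCode >= 200 && resp.StatusCode < 400
}

// newStubServer starts a local test server that always answers with status.
func newStubServer(status int) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(status)
	}))
}

func main() {
	up := newStubServer(http.StatusOK)
	fmt.Println("healthy server:", probeOnce(up.URL+"/healthz"))
	up.Close()
	fmt.Println("server gone:", probeOnce(up.URL+"/healthz"))

	bad := newStubServer(http.StatusInternalServerError)
	defer bad.Close()
	fmt.Println("unhealthy server:", probeOnce(bad.URL+"/healthz"))
}
```

The "server gone" case is the one the manifest comment relies on: if the process serving the port has died, the probe cannot connect and fails.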

Signed-off-by: Alex Collins <alex_collins@intuit.com>
Signed-off-by: Alex Collins <alex_collins@intuit.com>
Signed-off-by: Alex Collins <alex_collins@intuit.com>
func (wfc *WorkflowController) Healthz(w http.ResponseWriter, r *http.Request) {
	ctx := r.Context()
	instanceID := wfc.Config.InstanceID
	instanceIDSelector := func() string {
Contributor Author

@whynowy Added the instance ID. I had to make this a receiver func on WorkflowController because main.go does not know the instance ID.
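
The body of `instanceIDSelector` is cut off in the hunk above. A minimal sketch of how such a selector could be assembled, assuming the label keys below (written as literals here; the real constants live in the project's `common` package) and a hypothetical helper name `unreconciledSelector`:

```go
package main

import "fmt"

// Assumed label keys, spelled out as literals for illustration.
const (
	labelKeyPhase      = "workflows.argoproj.io/phase"
	labelKeyInstanceID = "workflows.argoproj.io/controller-instanceid"
)

// unreconciledSelector builds a Kubernetes label selector that matches
// workflows with no phase label that belong to the given instance.
// With an empty instance ID it matches workflows with no instance ID label.
func unreconciledSelector(instanceID string) string {
	instanceIDSelector := "!" + labelKeyInstanceID
	if instanceID != "" {
		instanceIDSelector = labelKeyInstanceID + "=" + instanceID
	}
	// A comma in a label selector means logical AND.
	return "!" + labelKeyPhase + "," + instanceIDSelector
}

func main() {
	fmt.Println(unreconciledSelector(""))
	fmt.Println(unreconciledSelector("my-instance"))
}
```

Scoping the query this way keeps a controller from reporting itself unhealthy because of workflows that belong to another controller instance in the same namespace.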

		}
		return nil
	}()
	log.WithField("err", err).
Contributor Author

I now log diagnostics each time it is requested; this will help diagnose problems.
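
The log-on-every-request pattern described above can be sketched as follows. This is not the PR's code: it uses the standard library `log` package rather than the structured logger in the hunk, the `check` callback and helper names are hypothetical, and the 500-on-failure response is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
)

// errStuck is a sample failure used by the demo in main.
var errStuck = errors.New("workflow never reconciled")

// newHealthzHandler wraps a health check. It logs the outcome of every
// request, healthy or not, so probe failures can be diagnosed afterwards.
func newHealthzHandler(check func() error) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		err := check()
		log.Printf("healthz err=%v", err)
		if err != nil {
			// Assumed behavior: report failure with a 500 and the error text.
			w.WriteHeader(http.StatusInternalServerError)
			_, _ = w.Write([]byte(err.Error()))
			return
		}
		w.WriteHeader(http.StatusOK)
		_, _ = w.Write([]byte("ok"))
	}
}

// statusFor runs the handler once against an in-memory recorder and
// returns the HTTP status it produced.
func statusFor(check func() error) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/healthz", nil)
	newHealthzHandler(check)(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor(func() error { return nil }))
	fmt.Println(statusFor(func() error { return errStuck }))
}
```

Because a crash of this kind is hard to reproduce, as noted earlier in the thread, driving the handler through an in-memory recorder is one way to exercise both branches without a running controller.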

alexec commented May 24, 2021

@whynowy ready for review again

@alexec alexec enabled auto-merge (squash) May 24, 2021 18:07
codecov bot commented May 24, 2021

Codecov Report

Merging #5875 (6609047) into master (46dcaea) will decrease coverage by 0.09%.
The diff coverage is 0.00%.

❗ Current head 6609047 differs from pull request most recent head 4575237. Consider uploading reports for the commit 4575237 to get more accurate results

@@            Coverage Diff             @@
##           master    #5875      +/-   ##
==========================================
- Coverage   47.39%   47.30%   -0.10%     
==========================================
  Files         247      248       +1     
  Lines       15623    15649      +26     
==========================================
- Hits         7405     7403       -2     
- Misses       7286     7313      +27     
- Partials      932      933       +1     
Impacted Files                    Coverage Δ
workflow/controller/healthz.go    0.00% <0.00%> (ø)
cmd/argo/commands/get.go          56.45% <0.00%> (-0.65%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 46dcaea...4575237.

Signed-off-by: Alex Collins <alex_collins@intuit.com>
@alexec alexec merged commit d55a8db into argoproj:master May 24, 2021
@alexec alexec deleted the healthz branch May 24, 2021 18:35
@alexec alexec mentioned this pull request May 24, 2021
14 tasks
alexec added a commit that referenced this pull request May 24, 2021
Signed-off-by: Alex Collins <alex_collins@intuit.com>
@sarabala1979 sarabala1979 mentioned this pull request Jun 10, 2021
88 tasks