httpchaos: chaos-tproxy will be recreate if instance not found #2610

Merged into chaos-mesh:master on Jan 4, 2022 (7 commits)

Hexilee
Member

@Hexilee Hexilee commented Dec 7, 2021

Signed-off-by: xixi <i@hexilee.me>

What problem does this PR solve?

Close #2536

What's changed and how it works?

Related changes

  • Need to update chaos-mesh/website
  • Need to update Dashboard UI
  • Need to cherry-pick to release branches
    • release-2.1
    • release-2.0

Checklist

Tests

  • Unit test
  • E2E test
  • No code
  • Manual test (add steps below)

Side effects

  • Breaking backward compatibility

Release note

Please add a release note.

You can safely ignore this section if you don't think this PR needs a release note.

DCO

If the DCO check fails, run commands like the ones below to fix it. The exact commands depend on the situation, for example when the failing commit isn't the most recent one:

git commit --amend --signoff
git push --force

@ti-chi-bot
Member

ti-chi-bot commented Dec 7, 2021

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • Andrewmatilde
  • cwen0

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

Member

@Andrewmatilde Andrewmatilde left a comment

LGTM

@@ -64,15 +64,15 @@ func (s *DaemonServer) ApplyHttpChaos(ctx context.Context, in *pb.ApplyHttpChaos
log := log.WithValues("Request", in)
log.Info("applying http chaos")

if in.Instance == 0 {
stdio := s.backgroundProcessManager.Stdio(int(in.Instance), in.StartTime)
for stdio == nil {
Member

How about adding a retry limit? If the instance can never be found, this loop will block the whole program.

Member

+1, it would get stuck here if chaos-tproxy failed to start because of invalid arguments or other potential bugs.

And the deeper issue is that we do not specify the lifecycle/status of a process managed by bpm; we always assume that processes controlled by bpm SHOULD work well, but that's out of the scope of this PR.

PTAL @Hexilee

Member

@Hexilee PTAL

Member Author

@Hexilee Hexilee Dec 23, 2021

> How about adding a retry limit? If the instance can never be found, this loop will block the whole program.

I don't think the whole program would be blocked: this is just a gRPC call, so only one goroutine is blocked. Moreover, the call will time out if the instance can never be found.

> +1, it would get stuck here if chaos-tproxy failed to start because of invalid arguments or other potential bugs.
>
> And the deeper issue is that we do not specify the lifecycle/status of a process managed by bpm; we always assume that processes controlled by bpm SHOULD work well, but that's out of the scope of this PR.
>
> PTAL @Hexilee

If chaos-tproxy fails to start because of invalid arguments or other potential bugs, the experiment SHOULD fail with a gRPC timeout.

Member Author

The first not-found may be caused by a chaos-daemon restart, so we try to recreate the process; a second not-found is unexpected, so we should retry only once.

PTAL changes @STRRL @cwen0

@codecov

codecov bot commented Dec 24, 2021

Codecov Report

Merging #2610 (848aeac) into master (8c448f4) will decrease coverage by 0.17%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##           master    #2610      +/-   ##
==========================================
- Coverage   41.16%   40.99%   -0.18%     
==========================================
  Files         144      144              
  Lines       11713    11719       +6     
==========================================
- Hits         4822     4804      -18     
- Misses       6529     6552      +23     
- Partials      362      363       +1     
Impacted Files                                           Coverage Δ
pkg/chaosdaemon/httpchaos_server.go                      0.00% <0.00%> (ø)
.../workflow/controllers/workflow_entry_reconciler.go    46.03% <0.00%> (-6.88%) ⬇️
pkg/workflow/controllers/deadline_reconciler.go          64.44% <0.00%> (-5.19%) ⬇️
pkg/workflow/controllers/utils.go                        87.30% <0.00%> (+1.58%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8c448f4...848aeac.

Signed-off-by: xixi <i@hexilee.me>
Member

@cwen0 cwen0 left a comment

LGTM

@Andrewmatilde
Member

/merge

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Commit hash: e92c952

@ti-chi-bot ti-chi-bot merged commit 8cdbeac into chaos-mesh:master Jan 4, 2022
craig-seeman pushed a commit to craig-seeman/chaos-mesh that referenced this pull request Jan 10, 2022
…-mesh#2610)

* httpchaos: chaos-tproxy will be recreate if instance not found

Signed-off-by: xixi <i@hexilee.me>

* retry only once if tproxy notfound

Signed-off-by: xixi <i@hexilee.me>

* format

Signed-off-by: xixi <i@hexilee.me>

Co-authored-by: Andrewmatilde <davis6813585853062@outlook.com>
Co-authored-by: STRRL <str_ruiling@outlook.com>
Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Signed-off-by: Craig Seeman <cseeman@zendesk.com>
Development

Successfully merging this pull request may close these issues.

chaos-tproxy process not found after chaos-daemon restarting
5 participants