
feat: latency-based failover #119

Merged: 7 commits into main from feat_add_latency_by_filter on Jul 10, 2023

Conversation

@mzz2017 (Contributor) commented Jun 4, 2023

Background

I want to use my US node as the primary and fail over to my HK node when it goes down. This way, the US node can be preferred even though its latency is higher.

Idea

Normalize all failover scenarios into a latency-based solution.

This solution has the following advantages:

  1. If the US node is under DDoS, its connectivity becomes unstable and the node goes up and down. In that case, a timeout penalty (10000 ms) is added to its measured latency each time it goes down, giving it lower priority. Its priority recovers automatically once the DDoS ends.
  2. Reuse the existing policies: min, min_avg10, min_moving_avg, ...
group {
  my_group {
    filter: name(my_HKnode) [add_latency: 300ms]
    filter: name(my_USnode, node_c) [add_latency: -500ms]
    filter: name(node_d, node_e, node_f)
    policy: min_avg10
  }
}

Thus, we need to introduce an annotation grammar ([add_latency: 50ms]).

Checklist

Full changelog

  • Support annotation syntax for declaration ([key: value, ...]).
  • Support multiple filters in one group.
  • Support the add_latency annotation on filter and apply it to the corresponding dialers (if multiple filters match a node, only the first takes effect).

Issue reference

Fix #118

@mzz2017 mzz2017 changed the title from "stash" to "feat: latency-based failover" on Jun 4, 2023
@mzz2017 mzz2017 marked this pull request as ready for review June 4, 2023 13:18
@mzz2017 mzz2017 marked this pull request as draft June 4, 2023 15:29
@mzz2017 (Contributor, Author) commented Jun 4, 2023

Some changes still need to be made: the added latency should be applied at the group level.

@mzz2017 (Contributor, Author) commented Jun 10, 2023

One more thing:
This PR changed the semantics of "no filter" from "no filtering (select all nodes)" to "select no nodes". That should be fixed.
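The intended semantics can be sketched as follows (a hypothetical `selects` helper, not dae's code): an empty filter list should select every node rather than none.

```go
package main

import "fmt"

// selects reports whether a node belongs to the group. An empty
// filter list must mean "no filtering" (all nodes), not "no nodes".
func selects(filters []map[string]bool, node string) bool {
	if len(filters) == 0 {
		return true // no filter => match everything
	}
	for _, f := range filters {
		if f[node] {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(selects(nil, "my_USnode"))                                 // true: no filter matches all
	fmt.Println(selects([]map[string]bool{{"node_d": true}}, "my_USnode")) // false: filtered out
}
```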

@mzz2017 mzz2017 marked this pull request as ready for review July 9, 2023 12:29
@mzz2017 (Contributor, Author) commented Jul 9, 2023

The group-level add_latency and the "no filter" problem have been fixed. Ready for review.

@mzz2017 (Contributor, Author) commented Jul 10, 2023

Test result:

[screenshot: test result]

@dae-prow bot left a comment

🧪 Since the PR has been fully tested, please consider merging it.

@mzz2017 (Contributor, Author) commented Jul 10, 2023

Another test result:
[screenshot: another test result]

@mzz2017 mzz2017 merged commit 11d2ea9 into main Jul 10, 2023
8 checks passed
@mzz2017 mzz2017 deleted the feat_add_latency_by_filter branch July 10, 2023 11:44
Linked issue closed by this pull request: Request failover feature (#118)