Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support controller as etcd server #1803

Merged
merged 51 commits into from Aug 30, 2023

Conversation

AlinsRan
Copy link
Contributor

@AlinsRan AlinsRan commented Apr 21, 2023

Type of change:

  • Bugfix
  • New feature provided
  • Improve performance
  • Backport patches
  • Documentation
  • Refactor
  • Chore
  • CI/CD or Tests

What this PR does / why we need it:

ref:

Pre-submission checklist:

  • Did you explain what problem does this PR solve? Or what new features have been added?
  • Have you added corresponding test cases?
  • Have you modified the corresponding document?
  • Is this PR backward compatible? If it is not backward compatible, please discuss on the mailing list first

@codecov-commenter
Copy link

codecov-commenter commented Apr 23, 2023

Codecov Report

Merging #1803 (8836478) into master (e0a2b17) will decrease coverage by 3.61%.
Report is 3 commits behind head on master.
The diff coverage is 12.95%.

❗ Current head 8836478 differs from pull request most recent head 5433e2a. Consider uploading reports for the commit 5433e2a to get more accurate results

@@            Coverage Diff             @@
##           master    #1803      +/-   ##
==========================================
- Coverage   40.47%   36.87%   -3.61%     
==========================================
  Files          92       93       +1     
  Lines        7100     7851     +751     
==========================================
+ Hits         2874     2895      +21     
- Misses       3842     4568     +726     
- Partials      384      388       +4     
Files Changed Coverage Δ
pkg/apisix/apisix.go 61.22% <ø> (ø)
pkg/apisix/consumer.go 26.06% <0.00%> (-11.35%) ⬇️
pkg/apisix/global_rule.go 26.06% <0.00%> (-11.35%) ⬇️
pkg/apisix/plugin_metadata.go 2.23% <0.00%> (-2.42%) ⬇️
pkg/apisix/pluginconfig.go 24.40% <0.00%> (-13.66%) ⬇️
pkg/apisix/route.go 25.38% <0.00%> (-14.47%) ⬇️
pkg/apisix/ssl.go 25.88% <0.00%> (-12.46%) ⬇️
pkg/apisix/stream_route.go 25.37% <0.00%> (-11.86%) ⬇️
pkg/apisix/upstream.go 31.62% <0.00%> (-18.75%) ⬇️
pkg/apisix/utils.go 0.92% <0.00%> (-0.30%) ⬇️
... and 10 more

... and 1 file with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@tao12345666333
Copy link
Member

I want to put this to v1.8.

So we won't do a detailed review of this PR before the release of v1.7.

Copy link

@leowmjw leowmjw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor typo. Does this feature means the adpater is using btree in-mem?

@@ -390,6 +433,7 @@ func (c *Controller) run(ctx context.Context) {
KubeClient: c.kubeClient,
MetricsCollector: c.MetricsCollector,
Recorder: c.recorder,
Eletor: c.elector,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo - Elector

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Updated.

@@ -280,7 +279,7 @@ updateStatus:
}

func (c *apisixUpstreamController) updateStatus(obj kube.ApisixUpstream, statusErr error) {
if obj == nil {
if obj == nil || c.Kubernetes.DisableStatusUpdates || !c.Eletor.IsLeader() {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo - c.Elector

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

pkg/providers/apisix/apisix_tls.go Outdated Show resolved Hide resolved
pkg/providers/apisix/apisix_plugin_config.go Outdated Show resolved Hide resolved
pkg/providers/apisix/apisix_consumer.go Outdated Show resolved Hide resolved
pkg/config/config_test.go Outdated Show resolved Hide resolved
Copy link
Member

@tao12345666333 tao12345666333 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main implementation in it, I think is enough for this PR. So once the other comments are all addressed I will give my LGTM

@Gallardot
Copy link
Member

In a high availability architecture, a simple hang is meaningless for non-leader nodes, they still need to watch k8s resources continue to work, and are not allowed to hold write permissions until they become elected (write k8s resources, write apisix)

Great design, providing a clearer control plane and data plane architecture. However, I have a few questions:

  1. Does the current architecture support high availability ?
  2. Can a non-leader APISIX ingress controller be used as a viable ETCD server? How does APISIX avoid connecting to an unavailable ETCD server?
  3. Some extreme edge cases may not be handled, such as when an APISIX ingress controller restarts before K8S CR resources such as APISIXRoute are synchronized to the corresponding ETCD server. Will the ETCD server come online prematurely and provide service? APISIX may obtain empty router data at this time, leading to failure.

@AlinsRan
Copy link
Contributor Author

AlinsRan commented Aug 30, 2023

In a high availability architecture, a simple hang is meaningless for non-leader nodes, they still need to watch k8s resources continue to work, and are not allowed to hold write permissions until they become elected (write k8s resources, write apisix)

Great design, providing a clearer control plane and data plane architecture. However, I have a few questions:

  1. Does the current architecture support high availability ?
  2. Can a non-leader APISIX ingress controller be used as a viable ETCD server? How does APISIX avoid connecting to an unavailable ETCD server?
  3. Some extreme edge cases may not be handled, such as when an APISIX ingress controller restarts before K8S CR resources such as APISIXRoute are synchronized to the corresponding ETCD server. Will the ETCD server come online prematurely and provide service? APISIX may obtain empty router data at this time, leading to failure.
  1. Supported and implemented.
  2. The non-leader node can still work, but it does not have write permission, but APISIX can still read data from it
  3. In the current stage, DP and CP will run in one Pod, and if one of them is abnormal, the Pod will be restarted, and this problem does not exist. This problem only exists in the architecture where DP and CP are separated, which depends on the implementation of etcd-adapter, which can actually be solved, just ensure that etcd revision is always incremented, so when the CP plane is available, APISIX will still watch data from the CP.

@AlinsRan
Copy link
Contributor Author

Minor typo. Does this feature means the adpater is using btree in-mem?

Yes.

@tao12345666333 tao12345666333 modified the milestones: 1.8.0, v1.7.0 Aug 30, 2023
@tao12345666333
Copy link
Member

Due to the fact that this feature has already been mostly completed and v1.7 has not yet been released, I think it's possible to move this feature to version v1.7.

@tao12345666333
Copy link
Member

Wait for CI.
Then we can merge this one.

@tao12345666333 tao12345666333 merged commit 0bbdc4f into apache:master Aug 30, 2023
87 checks passed
@tao12345666333 tao12345666333 linked an issue Aug 31, 2023 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

proposal: New architecture of Apache APISIX Ingress controller
6 participants