Skip to content

Conversation

@erikfuller
Copy link
Contributor

@erikfuller erikfuller commented Mar 11, 2024

What type of PR is this?
feature

Which issue does this PR fix:
#596

What does this PR do / Why do we need it:

  1. Updates Helm install to include pod readiness gate webhook.
  2. For manual installs, defers cert generation to install time and does NOT enable webhook by default
  3. New scripts in scripts/ to provision webhook TLS certificate and enable the webhook
  4. Include webhook-e2e-test in github workflow
  5. Doc updates

Since the controller cannot start without a valid TLS secret in place AND we cannot provision a cert at build time without confusing github with "secret" material, we now disable the webhook by default for manual installs using deploy.yaml. Instead, I have created new scripts (gen-webhook-secret.sh and patch-deploy-yaml.sh) to help provision the certificate and configure the webhook. The webhook is still enabled by default in the Helm chart.

Otherwise, the github workflow looks like it's currently broken, so there is a small risk I may be breaking it more.

Testing done on this change:
Installed locally built Helm chart. Tested with and without setting new environment variable setting.

helm package helm
tar xvf aws-gateway-controller-chart-v1.0.3.tgz

helm install test-release ./aws-gateway-controller-chart --set=serviceAccount.create=true --namespace aws-application-networking-system --set=log.level=debug --set=awsRegion=$AWS_REGION --set=clusterVpcId=$CLUSTER_VPC_ID --set=awsAccountId=$AWS_ACCOUNT_ID --set clusterName=$CLUSTER_NAME --set=defaultServiceNetwork=test-gateway
NAME: test-release
LAST DEPLOYED: Mon Mar 11 19:30:22 2024
NAMESPACE: aws-application-networking-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
aws-gateway-controller-chart has been installed.
This chart deploys "<redacted>.dkr.ecr.us-west-2.amazonaws.com/controller:".

Check its status by running:
  kubectl --namespace aws-application-networking-system get pods -l "app.kubernetes.io/instance=test-release"

The controller is running in "cluster" mode.

Ran webhook e2e-tests

make webhook-e2e-test                                                                       
=== RUN   TestIntegration
Running Suite: WebhookIntegration - /.../aws-application-networking-k8s/test/suites/webhook
===============================================================================================================
Random Seed: 1710185534

Will run 2 of 2 specs
------------------------------
[SynchronizedBeforeSuite] 
/.../aws-application-networking-k8s/test/suites/webhook/suite_test.go:23
[SynchronizedBeforeSuite] PASSED [0.000 seconds]
------------------------------
Readiness Gate Inject create deployment in untagged namespace, no readiness gate
/.../aws-application-networking-k8s/test/suites/webhook/readiness_gate_inject_test.go:42
{"level":"info","ts":"2024-03-11T12:32:15.506-0700","caller":"test/framework.go:282","msg":"Waiting for NotFound, objects: *v1.Namespace/webhook-e2e-test-no-tag, *v1.Namespace/webhook-e2e-test-tagged"}
{"level":"info","ts":"2024-03-11T12:32:15.526-0700","caller":"test/framework.go:186","msg":"Creating objects: *v1.Namespace/webhook-e2e-test-no-tag, *v1.Namespace/webhook-e2e-test-tagged"}
{"level":"info","ts":"2024-03-11T12:32:15.621-0700","caller":"test/pod_manager.go:68","msg":"deployment.Spec.Selector.MatchLabels: map[app:untagged-test-pod]"}
{"level":"info","ts":"2024-03-11T12:32:25.694-0700","caller":"test/pod_manager.go:68","msg":"deployment.Spec.Selector.MatchLabels: map[app:untagged-test-pod]"}
• [11.019 seconds]
------------------------------
Readiness Gate Inject create deployment in tagged namespace, has readiness gate
/.../aws-application-networking-k8s/test/suites/webhook/readiness_gate_inject_test.go:62
{"level":"info","ts":"2024-03-11T12:32:25.772-0700","caller":"test/pod_manager.go:68","msg":"deployment.Spec.Selector.MatchLabels: map[app:tagged-test-pod]"}
{"level":"info","ts":"2024-03-11T12:32:25.798-0700","caller":"test/framework.go:264","msg":"Deleting objects: *v1.Namespace/webhook-e2e-test-no-tag, *v1.Namespace/webhook-e2e-test-tagged"}
{"level":"info","ts":"2024-03-11T12:32:25.825-0700","caller":"test/framework.go:282","msg":"Waiting for NotFound, objects: *v1.Namespace/webhook-e2e-test-no-tag, *v1.Namespace/webhook-e2e-test-tagged"}
• [11.325 seconds]
------------------------------
[SynchronizedAfterSuite] 
/.../aws-application-networking-k8s/test/suites/webhook/suite_test.go:39
[SynchronizedAfterSuite] PASSED [0.000 seconds]
------------------------------

Ran 2 of 2 Specs in 22.345 seconds
SUCCESS! -- 2 Passed | 0 Failed | 0 Pending | 0 Skipped
--- PASS: TestIntegration (22.35s)
PASS
ok  	github.com/aws/aws-application-networking-k8s/test/suites/webhook	23.032s

Automation added to e2e:
n/a

Will this PR introduce any new dependencies?:
No

Will this break upgrades or downgrades. Has updating a running cluster been tested?:
By design, the webhook does not install to the controller namespace

kubectl get mutatingwebhookconfigurations.admissionregistration.k8s.io aws-appnet-gwc-mutating-webhook         
NAME                              WEBHOOKS   AGE
aws-appnet-gwc-mutating-webhook   1          8m18s

If a namespace is tagged with application-networking.k8s.aws/pod-readiness-gate-inject=enabled and the webhook exists BUT the controller has been downgraded to a pre-webhook version, you will not be able to successfully create new pods. To work around this issue, you would need to remove the namespace tag or delete the webhook, then also re-launch any pods or manually set the readiness gate value.

Does this PR introduce any user-facing change?:
Yes, but it is covered in the main PR #606

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@erikfuller erikfuller marked this pull request as ready for review March 11, 2024 19:50
@erikfuller erikfuller marked this pull request as draft March 12, 2024 18:26
@erikfuller
Copy link
Contributor Author

Have some additional work coming - converting back to draft while I finalize.

Comment on lines -158 to -159
webhook-e2e-test-namespace := "webhook-e2e-test"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The webhook tests don't actually use this namespace. Instead they create new ones from within the tests.

@@ -0,0 +1,62 @@
{{ $tls := fromYaml ( include "aws-gateway-controller.webhookTLS" . ) }}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I learned through this process that the include here is generated at include time, not once for the overall helm install. So, for example, if you have the same include in multiple files you will get different results when generating certificate data and the values will not match across files.

-p="[{'op': 'replace', 'path': '/webhooks/0/clientConfig/caBundle', 'value': '${CERT_B64}'}]"

rm $TEMP_KEY $TEMP_CERT
echo "Done" No newline at end of file
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Example output:

scripts/gen-webhook-secret.sh 
Generating certificate for webhook
..+++++++++++++++++++++++++++++++++++++++*.+...............+.+......+...+..+.+............+++++++++++++++++++++++++++++++++++++++*...+......+................+........+......+.+.....+.+...........................+.....+......+....+...+..............+...+......+.+......+.........+.....+.+..................+..+...+....+......+......+...+.....+...+....+...+......+....................+.+.....+......+.......+..++++++
...........+.+.....+.........+......+.......+...............+..+......+....+........+++++++++++++++++++++++++++++++++++++++*....+...+.+......+..............+.+...........+.......+.....+.......+.....+...+.+..+...+.+.....+.+.....+....+......+...+.....+++++++++++++++++++++++++++++++++++++++*...+.+.....+.........+.+......+...+......................................+...+.+......+...............+.....+.........+....+..+.............+..+...+.+.....+.+...+....................+......+...+......+.+.....+......+.......+...+..+...+..................+.+...+.....+.+.........+..+...+.............+..+...............+.............+...+...........+...+..........++++++
-----
Recreating webhook secret
secret "webhook-cert" deleted
secret/webhook-cert created
Patching webhook with new cert
mutatingwebhookconfiguration.admissionregistration.k8s.io/aws-appnet-gwc-mutating-webhook patched
Done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just double confirm, does gen-webhook-secret.sh work for both deploy.yaml deployed and helm installed controller?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double confirm, doing a kubectl delete secret webhook-cert for a running pod with

      volumes:
        - name: webhook-cert
          secret:
            defaultMode: 420
            secretName: webhook-cert

is all fine, right?

Do we need to print log here, say, the user should reboot the controller to get the new webhook-cert content and set WEBHOOK_ENABLED==true to make webhook work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just double confirm, does gen-webhook-secret.sh work for both deploy.yaml deployed and helm installed controller?

Correct.

kubectl delete secret webhook-cert for a running pod

I have recreated the secret before for a running pod. It does work. I haven't tested what happens if you only delete but don't recreate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In terms of the prompt for WEBHOOK_ENABLED==true, I think it's easy enough to find in the docs.

yq -i -e '(.[] as $item | select(.metadata.name == "gateway-api-controller" and .kind == "Deployment") | .spec.template.spec.containers[] | select(.name == "manager") | .env[] | select(.name == "WEBHOOK_ENABLED") | .value) = "true"' $DEPLOY_YAML 2>&1

rm $TEMP_KEY $TEMP_CERT
echo "Done" No newline at end of file
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

example output

Key not specified. Will generate...
Generating certificate for webhook
.+....+........+.+.....................+.....+..........+.....+.+........+++++++++++++++++++++++++++++++++++++++*........+....+...+++++++++++++++++++++++++++++++++++++++*..+......+...........................+...+.....+......+.+..............+......+.............+...........+....+.........+............+.....+.+...++++++
....+++++++++++++++++++++++++++++++++++++++*........+...+...+......+.........+.....+.+........+...+++++++++++++++++++++++++++++++++++++++*........+....+...+..+....+.........+...+..+...+............+...+....+...+...+..+......++++++
-----
Patching webhook secret
Patching webhook
Enabling webhook
Done

Example updates

...
apiVersion: v1
data:
  tls.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0...VElGSUNBVEUtLS0tLQo=
  tls.key: LS0tLS1CRUdJTiBQUklWQVRFI...ZsOEY0RVhjd084YVh1Zz09Ci0tLS0tRU5EIFBSSVZBVEUgS0VZLS0tLS0K
  ca.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0F...FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
kind: Secret
metadata:
  name: webhook-cert
  namespace: aws-application-networking-system
type: kubernetes.io/tls
...
  - admissionReviewVersions:
      - v1
    clientConfig:
      service:
        name: webhook-service
        namespace: aws-application-networking-system
        path: /mutate-pod
      caBundle: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tL...kzaz0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
...
          env:
            - name: WEBHOOK_ENABLED
              value: "true"
...

@erikfuller erikfuller marked this pull request as ready for review March 12, 2024 19:13
caCert: {{ .Values.webhookTLS.caCert }}
cert: {{ .Values.webhookTLS.cert }}
key: {{ .Values.webhookTLS.key }}
{{- else -}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's something we can consider adding? For now if folks use the auto-installed cert the new one will be just as good. If they are using their own cert the same values for ca/cert/key should work across installs.

Copy link
Contributor

@zijun726911 zijun726911 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! overall LGTM

- name: Run test
run: |
make e2e-test
make webhook-e2e-test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you test github worker node can run the make webhook-e2e-test success? or hard to test it until this code merge to mainline?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't. I think it does a Helm install before this so it should "just work".

-p="[{'op': 'replace', 'path': '/webhooks/0/clientConfig/caBundle', 'value': '${CERT_B64}'}]"

rm $TEMP_KEY $TEMP_CERT
echo "Done" No newline at end of file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just double confirm, does gen-webhook-secret.sh work for both deploy.yaml deployed and helm installed controller?

-addext "subjectAltName = DNS:${HOST}, DNS:${HOST}.cluster.local"

export KEY_B64=`cat $TEMP_KEY | base64`
export CERT_B64=`cat $TEMP_CERT | base64`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems don't need export here, and don't need the KEY_B64.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch. Fixed.

-p="[{'op': 'replace', 'path': '/webhooks/0/clientConfig/caBundle', 'value': '${CERT_B64}'}]"

rm $TEMP_KEY $TEMP_CERT
echo "Done" No newline at end of file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double confirm, doing a kubectl delete secret webhook-cert for a running pod with

      volumes:
        - name: webhook-cert
          secret:
            defaultMode: 420
            secretName: webhook-cert

is all fine, right?

Do we need to print log here, say, the user should reboot the controller to get the new webhook-cert content and set WEBHOOK_ENABLED==true to make webhook work?

@erikfuller erikfuller merged commit 553422e into aws:main Mar 14, 2024
@erikfuller erikfuller deleted the webhook-helm branch April 11, 2024 17:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants