Update addons versions | Examples to use cluster version 1.22 | Fix self-managed-node example #634

Merged
merged 3 commits into from
Jun 17, 2022

Conversation

Zvikan
Contributor

@Zvikan Zvikan commented Jun 14, 2022

What does this PR do?

  • Update all addons to their latest released versions
  • Update cluster_version to 1.22 in all examples
  • Add the helm provider to the self-managed node example

Motivation

  • This industry moves fast. Addon contributors are fixing bugs, adding features, resolving security concerns, and more, so we need to keep things updated.
  • Support EKS version 1.22 for EKS Blueprints
    • NOTE: This PR does not cover migration from 1.21 to 1.22. It focuses on testing and validating the creation of version 1.22 clusters and deploying addons at their latest versions across all current examples (see the link under "For Moderators" below).
  • The self-managed node group example fails today because the helm provider configuration is missing, which produces errors such as:
Error: Kubernetes cluster unreachable: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
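
For readers who hit the error above, here is a minimal sketch of the kind of helm provider configuration that was missing. It is not the exact code from this PR. It assumes the nested `kubernetes` block syntax of the helm provider 2.x, and `var.cluster_name` is a hypothetical input used only for illustration.

```hcl
# Hypothetical input: the name of the EKS cluster created by the example.
variable "cluster_name" {
  type = string
}

# Look up the cluster endpoint and CA certificate.
data "aws_eks_cluster" "this" {
  name = var.cluster_name
}

# Fetch a short-lived authentication token for the cluster.
data "aws_eks_cluster_auth" "this" {
  name = var.cluster_name
}

# Without this block the helm provider has no cluster configuration,
# which is what produces "Kubernetes cluster unreachable".
provider "helm" {
  kubernetes {
    host                   = data.aws_eks_cluster.this.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.this.certificate_authority[0].data)
    token                  = data.aws_eks_cluster_auth.this.token
  }
}
```

In an actual example the host and certificate would more likely come from the cluster module's outputs than from data sources. The data sources are used here only to keep the sketch self-contained.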

More

  • Yes, I have tested the PR using my local account setup (Provide any test evidence report under Additional Notes)
    • Sanity testing was done by leveraging the E2E action workflow: creating the examples, leaving the clusters up, then manually checking each cluster's events, nodes, and overall pod restarts.
  • Yes, I have added a new example under examples to support my PR
  • Yes, I have created another PR for add-ons under add-ons repo (if applicable)
  • Yes, I have updated the docs for this feature
  • Yes, I ran pre-commit run -a with this PR

Note: Not all PRs require examples and docs. They are needed only when a new pattern or add-on is added.

For Moderators

Additional Notes

  • This PR focuses on cluster creation using version 1.22 with the latest addon versions. It does not cover upgrading from 1.21 to 1.22.

@Zvikan Zvikan temporarily deployed to EKS Blueprints Test June 14, 2022 16:49 Inactive
@Zvikan Zvikan temporarily deployed to EKS Blueprints Test June 16, 2022 23:19 Inactive
@Zvikan Zvikan temporarily deployed to EKS Blueprints Test June 17, 2022 04:00 Inactive
@Zvikan
Contributor Author

Zvikan commented Jun 17, 2022

Additional changes pushed to the branch:

  • Set cluster version to 1.22
  • Pushed a fix for the self-managed node example: the helm provider was missing, and without it the example fails with:
Error: Kubernetes cluster unreachable: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable

What I have noticed so far while running our E2E tests in parallel with the above changes:

  1. We currently use a single terraform apply command, which leads to failures because addons hit their timeouts. The main cause is that Terraform creates worker nodes and addons in parallel, but worker nodes can take around 10-15 minutes to be created and join the cluster, and until then the addons hang in the "Creating" state.
    Ideally, addons should enter the "Creating" state only after the worker nodes (AKA the blueprints module) have completed successfully. Bringing back -target in the E2E workflow may be the right idea.
  2. The self-managed node example is failing as called out above. This is probably also true of the current main branch, and this PR fixes it for now.
  3. The Karpenter example is failing due to a destroy issue with the NTH EventBridge rules, where the rules can't be deleted because they have associated targets. Diving deep into this, I found that the targets belong not to the Karpenter example but to another example, which led me to see that we currently create the event rules with the same names in every cluster. This means that if two clusters have NTH enabled, their EventBridge rules get mixed up and TF won't know about it (there seems to be no failure like "This event rule name already exists").
    A suggestion would be to give each event rule a unique name, e.g. keeping the same names with the cluster name as a suffix.
    No changes have been made for this yet.
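
To illustrate the suggestion in item 3, here is a rough sketch of a per-cluster rule name. It is not taken from the NTH addon code: the resource label, the rule name prefix, and `var.cluster_name` are all hypothetical, and the event pattern is just one plausible example.

```hcl
# Hypothetical input: the name of the cluster that owns this rule.
variable "cluster_name" {
  type = string
}

# Suffixing the rule name with the cluster name keeps rules from two
# clusters in the same account and region from colliding.
resource "aws_cloudwatch_event_rule" "nth_spot_interruption" {
  # Hypothetical prefix. EventBridge rule names are limited to 64
  # characters, so long cluster names may need truncating.
  name        = "NTH-SpotInterruption-${var.cluster_name}"
  description = "Spot interruption warnings for cluster ${var.cluster_name}"

  event_pattern = jsonencode({
    "source"      = ["aws.ec2"]
    "detail-type" = ["EC2 Spot Instance Interruption Warning"]
  })
}
```

With names like this, destroying one example would only touch the rules and targets that belong to that cluster.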

Link to E2E workflows: https://github.com/aws-ia/terraform-aws-eks-blueprints/actions/runs/2512350932 (note that failed jobs were re-run).

So far I have not identified any "sanity" failures (i.e. terraform apply and destroy work) related to 1.22 or to the latest addon versions.

@Zvikan Zvikan changed the title from "WIP - Update addons to latest version" to "Update addons versions | Examples to use cluster version 1.22 | Fix self-managed-node example" Jun 17, 2022
@Zvikan Zvikan marked this pull request as ready for review June 17, 2022 16:24
Contributor

@bryantbiggs bryantbiggs left a comment

awesome, thank you!

Contributor

@kcoleman731 kcoleman731 left a comment

Looks great, thank you!

@jybaek

jybaek commented Nov 6, 2023

@Zvikan This code has inspired me a lot, thanks. If I make it work with GitHub Actions, I need to install the AWS CLI first, right?
