diff --git a/CODEOWNERS b/.github/CODEOWNERS similarity index 100% rename from CODEOWNERS rename to .github/CODEOWNERS diff --git a/.github/workflows/publish-docs.yml b/.github/workflows/publish-docs.yml index 910d1d7ff3..9a806ec576 100644 --- a/.github/workflows/publish-docs.yml +++ b/.github/workflows/publish-docs.yml @@ -4,9 +4,10 @@ on: branches: - main paths: - - "docs/**" - - "mkdocs.yml" - + - 'docs/**' + - mkdocs.yml + - README.md + - '.github/workflows/publish-docs.yml' release: types: - published @@ -32,7 +33,10 @@ jobs: - name: Install dependencies run: | python -m pip install --upgrade pip - pip install mike==1.1.2 mkdocs-material==8.3.2 mkdocs-awesome-pages-plugin==2.7.0 mkdocs-include-markdown-plugin==3.5.2 + pip install mike==1.1.2 \ + mkdocs-material==9.1.4 \ + mkdocs-include-markdown-plugin==4.0.4 \ + mkdocs-awesome-pages-plugin==2.9.1 - name: git config run: | diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index 1adb378661..0759776129 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -10,7 +10,7 @@ repos: - id: detect-aws-credentials args: ['--allow-missing-credentials'] - repo: https://github.com/antonbabenko/pre-commit-terraform - rev: v1.78.0 + rev: v1.80.0 hooks: - id: terraform_fmt - id: terraform_docs @@ -33,8 +33,8 @@ repos: - '--args=--only=terraform_workspace_remote' - id: terraform_validate exclude: deploy - - id: terraform_tfsec - files: ^examples/ # only scan `examples/*` which are the implementation - args: - - --args=--config-file=__GIT_WORKING_DIR__/tfsec.yaml - - --args=--concise-output + # - id: terraform_tfsec + # files: ^examples/ # only scan `examples/*` which are the implementation + # args: + # - --args=--config-file=__GIT_WORKING_DIR__/tfsec.yaml + # - --args=--concise-output diff --git a/README.md b/README.md index 64de5dd5ce..92190b58b1 100644 --- a/README.md +++ b/README.md @@ -1,37 +1,44 @@ # Amazon EKS Blueprints for Terraform -[![plan-examples](https://github.com/aws-ia/terraform-aws-eks-blueprints/actions/workflows/plan-examples.yml/badge.svg)](https://github.com/aws-ia/terraform-aws-eks-blueprints/actions/workflows/plan-examples.yml) -[![pre-commit](https://github.com/aws-ia/terraform-aws-eks-blueprints/actions/workflows/pre-commit.yml/badge.svg)](https://github.com/aws-ia/terraform-aws-eks-blueprints/actions/workflows/pre-commit.yml) +Welcome to Amazon EKS Blueprints for Terraform! ---- +This project contains a collection of Amazon EKS cluster patterns implemented in Terraform that demonstrate how fast and easy it is for customers to adopt [Amazon EKS](https://aws.amazon.com/eks/). The patterns can be used by AWS customers, partners, and internal AWS teams to configure and manage complete EKS clusters that are fully bootstrapped with the operational software that is needed to deploy and operate workloads. -## :bangbang: Notice of Potential Breaking Changes in Version 5 :bangbang: +## Motivation -The direction for EKS Blueprints in v5 will shift from providing an all-encompassing, monolithic "framework" and instead focus more on how users can organize a set of modular components to create the desired solution on Amazon EKS. +Kubernetes is a powerful and extensible container orchestration technology that allows you to deploy and manage containerized applications at scale. The extensible nature of Kubernetes also allows you to use a wide range of popular open-source tools, commonly referred to as add-ons, in Kubernetes clusters. 
With such a large number of tooling and design choices available however, building a tailored EKS cluster that meets your application’s specific needs can take a significant amount of time. It involves integrating a wide range of open-source tools and AWS services and requires deep expertise in AWS and Kubernetes. -The issue below was created to provide community notice and to help track progress, learn what's new and how the migration path would look like to upgrade your current Terraform deployments. +AWS customers have asked for examples that demonstrate how to integrate the landscape of Kubernetes tools and make it easy for them to provision complete, opinionated EKS clusters that meet specific application requirements. Customers can use EKS Blueprints to configure and deploy purpose built EKS clusters, and start onboarding workloads in days, rather than months. -We welcome the EKS Blueprints community to continue the discussion in issue https://github.com/aws-ia/terraform-aws-eks-blueprints/issues/1421 +## Core Concepts ---- +This document provides a high level overview of the Core Concepts that are embedded in EKS Blueprints. For the purposes of this document, we will assume the reader is familiar with Git, Docker, Kubernetes and AWS. -Welcome to Amazon EKS Blueprints for Terraform! +| Concept | Description | +| --------------------------- | --------------------------------------------------------------------------------------------- | +| [Cluster](#cluster) | An Amazon EKS Cluster and associated worker groups. | +| [Add-on](#add-on) | Operational software that provides key functionality to support your Kubernetes applications. | +| [Team](#team) | A logical grouping of IAM identities that have access to Kubernetes resources. | -This project contains a collection of Amazon EKS cluster patterns implemented in Terraform that demonstrate how fast and easy it is for customers to adopt [Amazon EKS](https://aws.amazon.com/eks/). The patterns can be used by AWS customers, partners, and internal AWS teams to configure and manage complete EKS clusters that are fully bootstrapped with the operational software that is needed to deploy and operate workloads. +### Cluster -## Getting Started +A `cluster` is simply an EKS cluster. EKS Blueprints provides for customizing the compute options you leverage with your `clusters`. The framework currently supports `EC2`, `Fargate` and `BottleRocket` instances. It also supports managed and self-managed node groups. -The easiest way to get started with EKS Blueprints is to follow our [Getting Started guide](https://aws-ia.github.io/terraform-aws-eks-blueprints/latest/getting-started/). +We rely on [`terraform-aws-modules/eks/aws`](https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/latest) to configure `clusters`. See our [examples](getting-started.md) to see how `terraform-aws-modules/eks/aws` is configured for EKS Blueprints. -## Examples +### Add-on -To view examples for how you can leverage EKS Blueprints, please see the [examples](https://github.com/aws-ia/terraform-aws-eks-blueprints/tree/main/examples) directory. +`Add-ons` allow you to configure the operational tools that you would like to deploy into your EKS cluster. When you configure `add-ons` for a `cluster`, the `add-ons` will be provisioned at deploy time by leveraging the Terraform Helm provider. Add-ons can deploy both Kubernetes specific resources and AWS resources needed to support add-on functionality. 
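As a rough sketch of how the Cluster and Add-on concepts fit together in practice — the module versions, network inputs, and the particular add-ons enabled below are illustrative assumptions, not prescribed values:

```hcl
# Sketch: an EKS cluster from terraform-aws-modules/eks/aws composed with the
# separate aws-ia/eks-blueprints-addons module. Versions and variable values
# (VPC ID, subnet IDs, node group sizing) are placeholders.
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 19.0"

  cluster_name    = "example"
  cluster_version = "1.27"

  vpc_id     = var.vpc_id
  subnet_ids = var.private_subnet_ids

  eks_managed_node_groups = {
    default = {
      instance_types = ["m5.large"]
      min_size       = 2
      max_size       = 5
      desired_size   = 2
    }
  }
}

module "eks_blueprints_addons" {
  source  = "aws-ia/eks-blueprints-addons/aws"
  version = "~> 1.0"

  cluster_name      = module.eks.cluster_name
  cluster_endpoint  = module.eks.cluster_endpoint
  cluster_version   = module.eks.cluster_version
  oidc_provider_arn = module.eks.oidc_provider_arn

  # Each add-on is a Helm release plus any supporting AWS resources (e.g. IRSA roles)
  enable_metrics_server               = true
  enable_aws_load_balancer_controller = true
}
```

In this composition the add-ons module manages the Helm releases and any supporting AWS resources, while the EKS module owns the cluster itself.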
-## Motivation +For example, the `metrics-server` add-on only deploys the Kubernetes manifests that are needed to run the Kubernetes Metrics Server. By contrast, the `aws-load-balancer-controller` add-on deploys both Kubernetes YAML, in addition to creating resources via AWS APIs that are needed to support the AWS Load Balancer Controller functionality. -Kubernetes is a powerful and extensible container orchestration technology that allows you to deploy and manage containerized applications at scale. The extensible nature of Kubernetes also allows you to use a wide range of popular open-source tools, commonly referred to as add-ons, in Kubernetes clusters. With such a large number of tooling and design choices available however, building a tailored EKS cluster that meets your application’s specific needs can take a significant amount of time. It involves integrating a wide range of open-source tools and AWS services and requires deep expertise in AWS and Kubernetes. +EKS Blueprints allows you to manage your add-ons directly via Terraform (by leveraging the Terraform Helm provider) or via GitOps with ArgoCD. See our [`Add-ons`](https://aws-ia.github.io/terraform-aws-eks-blueprints-addons/main/) documentation page for detailed information. -AWS customers have asked for examples that demonstrate how to integrate the landscape of Kubernetes tools and make it easy for them to provision complete, opinionated EKS clusters that meet specific application requirements. Customers can use EKS Blueprints to configure and deploy purpose built EKS clusters, and start onboarding workloads in days, rather than months. +### Team + +`Teams` allow you to configure the logical grouping of users that have access to your EKS clusters, in addition to the access permissions they are granted. + +See our [`Teams`](https://github.com/aws-ia/terraform-aws-eks-blueprints-teams) documentation page for detailed information. ## Support & Feedback diff --git a/aws-auth-configmap.tf b/aws-auth-configmap.tf deleted file mode 100644 index bd924e676b..0000000000 --- a/aws-auth-configmap.tf +++ /dev/null @@ -1,34 +0,0 @@ -resource "kubernetes_config_map" "aws_auth" { - count = var.create_eks ? 1 : 0 - - metadata { - name = "aws-auth" - namespace = "kube-system" - labels = merge( - { - "app.kubernetes.io/managed-by" = "terraform-aws-eks-blueprints" - "app.kubernetes.io/created-by" = "terraform-aws-eks-blueprints" - }, - var.aws_auth_additional_labels - ) - } - - data = { - mapRoles = yamlencode( - distinct(concat( - local.managed_node_group_aws_auth_config_map, - local.self_managed_node_group_aws_auth_config_map, - local.windows_node_group_aws_auth_config_map, - local.fargate_profiles_aws_auth_config_map, - local.emr_on_eks_config_map, - local.application_teams_config_map, - local.platform_teams_config_map, - var.map_roles, - )) - ) - mapUsers = yamlencode(var.map_users) - mapAccounts = yamlencode(var.map_accounts) - } - - depends_on = [module.aws_eks.cluster_id, data.http.eks_cluster_readiness[0]] -} diff --git a/data.tf b/data.tf deleted file mode 100644 index ef7a17bc74..0000000000 --- a/data.tf +++ /dev/null @@ -1,137 +0,0 @@ -data "aws_partition" "current" {} -data "aws_caller_identity" "current" {} -data "aws_region" "current" {} - -data "aws_eks_cluster" "cluster" { - count = var.create_eks ? 1 : 0 - name = module.aws_eks.cluster_id -} - -data "http" "eks_cluster_readiness" { - count = var.create_eks ? 
1 : 0 - - url = join("/", [data.aws_eks_cluster.cluster[0].endpoint, "healthz"]) - ca_certificate = base64decode(data.aws_eks_cluster.cluster[0].certificate_authority[0].data) - timeout = var.eks_readiness_timeout -} - -data "aws_iam_session_context" "current" { - arn = data.aws_caller_identity.current.arn -} - -data "aws_iam_policy_document" "eks_key" { - statement { - sid = "Allow access for all principals in the account that are authorized" - effect = "Allow" - actions = [ - "kms:CreateGrant", - "kms:Decrypt", - "kms:DescribeKey", - "kms:Encrypt", - "kms:GenerateDataKey*", - "kms:ReEncrypt*", - ] - resources = ["*"] - - principals { - type = "AWS" - identifiers = [ - "arn:${local.context.aws_partition_id}:iam::${local.context.aws_caller_identity_account_id}:root" - ] - } - - condition { - test = "StringEquals" - variable = "kms:CallerAccount" - values = [local.context.aws_caller_identity_account_id] - } - - condition { - test = "StringEquals" - variable = "kms:ViaService" - values = ["eks.${local.context.aws_region_name}.amazonaws.com"] - } - } - - statement { - sid = "Allow direct access to key metadata to the account" - effect = "Allow" - actions = [ - "kms:Describe*", - "kms:Get*", - "kms:List*", - "kms:RevokeGrant", - ] - resources = ["*"] - - principals { - type = "AWS" - identifiers = [ - "arn:${local.context.aws_partition_id}:iam::${local.context.aws_caller_identity_account_id}:root" - ] - } - } - - statement { - sid = "Allow access for Key Administrators" - effect = "Allow" - actions = [ - "kms:*" - ] - resources = ["*"] - - principals { - type = "AWS" - identifiers = concat( - var.cluster_kms_key_additional_admin_arns, - [data.aws_iam_session_context.current.issuer_arn] - ) - } - } - - statement { - sid = "Allow use of the key" - effect = "Allow" - actions = [ - "kms:Decrypt", - "kms:DescribeKey", - "kms:Encrypt", - "kms:GenerateDataKey*", - "kms:ReEncrypt*", - ] - resources = ["*"] - - principals { - type = "AWS" - identifiers = [ - local.cluster_iam_role_pathed_arn - ] - } - } - - # Permission to allow AWS services that are integrated with AWS KMS to use the CMK, - # particularly services that use grants. - statement { - sid = "Allow attachment of persistent resources" - effect = "Allow" - actions = [ - "kms:CreateGrant", - "kms:ListGrants", - "kms:RevokeGrant", - ] - resources = ["*"] - - principals { - type = "AWS" - identifiers = [ - local.cluster_iam_role_pathed_arn - ] - } - - condition { - test = "Bool" - variable = "kms:GrantIsForAWSResource" - values = ["true"] - } - } -} diff --git a/docs/.pages b/docs/.pages index 882bceafbd..2ff82ce715 100644 --- a/docs/.pages +++ b/docs/.pages @@ -1,11 +1,5 @@ nav: - - Overview: index.md - - Getting Started: getting-started.md - - Core Concepts: core-concepts.md - - IAM: iam - - Teams: teams.md - - Modules: modules - - Add-ons: add-ons - - Advanced: advanced - - Extensibility: extensibility.md - - ... + - Overview: index.md + - Getting Started: getting-started.md + - Blueprints: blueprints + - IAM: iam diff --git a/docs/add-ons/.pages b/docs/add-ons/.pages deleted file mode 100644 index 3f7ce6c21a..0000000000 --- a/docs/add-ons/.pages +++ /dev/null @@ -1,3 +0,0 @@ -nav: - - Overview: index.md - - ... 
diff --git a/docs/add-ons/agones.md b/docs/add-ons/agones.md deleted file mode 100644 index a36dc002d4..0000000000 --- a/docs/add-ons/agones.md +++ /dev/null @@ -1,45 +0,0 @@ -# Agones - -[Agones](https://agones.dev/) is an open source platform for deploying, hosting, scaling, and orchestrating dedicated game servers for large scale multiplayer games on Kubernetes. - -For complete project documentation, please visit the [Agones documentation site](https://agones.dev/site/docs/). - -## Usage - -Agones can be deployed by enabling the add-on via the following. - -```hcl -enable_agones = true -``` - -You can optionally customize the Helm chart that deploys `Agones` via the following configuration. - -*NOTE: Agones requires a Node group in Public Subnets and enable Public IP* - -```hcl - enable_agones = true - # Optional agones_helm_config - agones_helm_config = { - name = "agones" - chart = "agones" - repository = "https://agones.dev/chart/stable" - version = "1.21.0" - namespace = "agones-system" # Agones recommends to install in it's own namespace such as `agones-system` as shown here. You can specify any namespace other than `kube-system` - values = [templatefile("${path.module}/helm_values/agones-values.yaml", { - expose_udp = true - gameserver_namespaces = "{${join(",", ["default", "xbox-gameservers", "xbox-gameservers"])}}" - gameserver_minport = 7000 - gameserver_maxport = 8000 - })] - } -``` - -### GitOps Configuration - -The following properties are made available for use when managing the add-on via GitOps. - -``` -agones = { - enable = true -} -``` diff --git a/docs/add-ons/apache-airflow.md b/docs/add-ons/apache-airflow.md deleted file mode 100644 index b6197b470f..0000000000 --- a/docs/add-ons/apache-airflow.md +++ /dev/null @@ -1,116 +0,0 @@ -# Apache Airflow add-on - -This document describes the details of the best practices for building and deploying Self-managed **Highly Scalable Apache Airflow cluster on Kubernetes(Amazon EKS) Cluster**. -Alternatively, Amazon also provides a fully managed Apache Airflow service(MWAA). - -Apache Airflow is used for the scheduling and orchestration of data pipelines or workflows. -Orchestration of data pipelines refers to the sequencing, coordination, scheduling, and managing complex data pipelines from diverse sources. -A workflow is represented as a [DAG](https://airflow.apache.org/docs/apache-airflow/stable/concepts/dags.html) (a Directed Acyclic Graph), and contains individual pieces of work called [Tasks](https://airflow.apache.org/docs/apache-airflow/stable/concepts/tasks.html), arranged with dependencies and data flows taken into account. - -## Production considerations for running Apache Airflow on EKS - -### Airflow Metadata Database -It is advised to set up an external database for the Airflow metastore. The default Helm chart deploys a Postgres database running in a container but this should be used only for development. -Apache Airflow recommends to use MySQL or Postgres. This deployment configures the highly available Amazon RDS Postgres database as external database. - -### PgBouncer for Amazon Postgres RDS -Airflow can open a lot of database connections due to its distributed nature and using a connection pooler can significantly reduce the number of open connections on the database. -This deployment enables the PgBouncer for Postgres - -### Webserver Secret Key -You should set a static webserver secret key when deploying with this chart as it will help ensure your Airflow components only restart when necessary. 
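As a rough illustration of that recommendation, a static key can be generated and stored ahead of time — the secret name, namespace, and key below are assumptions and should match whatever your Airflow chart values reference:

```hcl
# Sketch: generate a static webserver secret key and store it in a Kubernetes
# Secret that the Airflow chart values can point to. Names are assumptions.
resource "random_id" "airflow_webserver_secret_key" {
  byte_length = 16
}

resource "kubernetes_secret_v1" "airflow_webserver" {
  metadata {
    name      = "airflow-webserver-secret-key"
    namespace = "airflow"
  }

  data = {
    "webserver-secret-key" = random_id.airflow_webserver_secret_key.hex
  }
}
```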
-This deployment creates a Kubernetes Secret for the webserver secret key and applies it to Airflow.
-
-### Managing DAG Files with GitHub and EFS
-It is recommended to mount DAGs using a Git-Sync sidecar with persistence enabled.
-Developers can create a repository to store the DAGs and configure it to sync with the Airflow servers.
-This deployment provisions EFS (Amazon Elastic File System) through a Persistent Volume Claim with an access mode of ReadWriteMany.
-The Airflow scheduler pod syncs DAGs from a Git repository onto the PVC at a configurable interval (in seconds).
-The other pods read the synced DAGs.
-
-Git-Sync is configured with a sample repository in this example. This can be replaced with your internal GitHub repository.
-
-### Managing Log Files with S3 and IRSA
-Airflow writes logs for tasks in a way that allows you to see the logs for each task separately in the Airflow UI.
-Core Airflow implements writing and serving logs locally. However, you can also write logs to remote services via community providers, or write your own loggers.
-This example configures an S3 bucket to store the Airflow logs. IAM Roles for Service Accounts (IRSA) is configured so the Airflow pods can access this S3 bucket.
-
-### Airflow StatsD Metrics
-This example sends metrics to an existing StatsD-to-Prometheus endpoint. It can also be configured to send them to an external StatsD instance.
-
-### Airflow Executors (Celery vs. Kubernetes)
-This deployment uses the Kubernetes Executor. With KubernetesExecutor, each task runs in its own pod.
-The pod is created when the task is queued, and terminates when the task completes.
-With KubernetesExecutor, the workers (pods) talk directly to the same Postgres backend as the scheduler and can, to a large degree, take on the labor of task monitoring.
-
-* KubernetesExecutor works well when your tasks are not very uniform with respect to resource requirements or images.
-* Each task on the Kubernetes executor gets its own pod, which allows you to pass an `executor_config` in your task params. This lets you assign resources at the task level. For example, the first task may be a sensor that only requires a few resources, while the downstream tasks have to run on your GPU node pool with a higher CPU request.
-* Since each task is a pod, it is managed independently of code deploys. This is great for longer-running tasks or environments with many users, as users can push new code without fear of interrupting that task.
-* This makes the *Kubernetes executor the most fault-tolerant* option, as running tasks won't be affected when code is pushed.
-* In contrast to CeleryExecutor, KubernetesExecutor does not require additional components such as Redis, but it does require access to the Kubernetes cluster.
-* Pod monitoring can be done with native Kubernetes tools.
-* A Kubernetes watcher is a thread that subscribes to every change that occurs in Kubernetes' database. It is alerted when pods start, run, end, and fail. By monitoring this stream, the KubernetesExecutor can discover that a worker crashed and correctly report the task as failed.
-
-### Airflow Schedulers
-The Airflow scheduler monitors all tasks and DAGs, then triggers the task instances once their dependencies are complete.
-This deployment uses an *HA scheduler* with two replicas, taking advantage of the existing metadata database.
-
-### Accessing the Airflow Web UI
-This deployment example uses an internet-facing Load Balancer for easy access to the web UI, which is not recommended for production.
-You can modify the `values.yaml` to set the Load Balancer to `internal` and upload certificate to use HTTPS. -Ensure access to the WebUI using internal domain and network. - -## Usage - -The Apache Airflow can be deployed by enabling the add-on via the following. - -```hcl - enable_airflow = true -``` - -For production workloads, you can use the following custom Helm Config. - -```hcl - enable_airflow = true - airflow_helm_config = { - name = "airflow" - chart = "airflow" - repository = "https://airflow.apache.org" - version = "1.6.0" - namespace = module.airflow_irsa.namespace - create_namespace = false - timeout = 360 - description = "Apache Airflow v2 Helm chart deployment configuration" - # Check the example for `values.yaml` file - values = [templatefile("${path.module}/values.yaml", { - # Airflow Postgres RDS Config - airflow_db_user = "airflow" - airflow_db_name = module.db.db_instance_name - airflow_db_host = element(split(":", module.db.db_instance_endpoint), 0) - # S3 bucket config for Logs - s3_bucket_name = aws_s3_bucket.this.id - webserver_secret_name = local.airflow_webserver_secret_name - airflow_service_account = local.airflow_service_account - })] - - set_sensitive = [ - { - name = "data.metadataConnection.pass" - value = data.aws_secretsmanager_secret_version.postgres.secret_string - } - ] - } -``` - -Once deployed, you will be able to see the deployment status - -```shell -kubectl get deployment -n airflow - -NAME READY UP-TO-DATE AVAILABLE AGE -airflow-pgbouncer 1/1 1 1 77m -airflow-scheduler 2/2 2 2 77m -airflow-statsd 1/1 1 1 77m -airflow-triggerer 1/1 1 1 77m -airflow-webserver 2/2 2 2 77m -``` diff --git a/docs/add-ons/argo-workflows.md b/docs/add-ons/argo-workflows.md deleted file mode 100644 index e6a524d352..0000000000 --- a/docs/add-ons/argo-workflows.md +++ /dev/null @@ -1,26 +0,0 @@ -# Argo Workflows - -[Argo Workflows](https://argoproj.github.io/argo-workflows/) is an open source container-native workflow engine for orchestrating parallel jobs on Kubernetes. It is implemented as a Kubernetes CRD (Custom Resource Definition). As a result, Argo workflows can be managed using kubectl and natively integrates with other Kubernetes services such as volumes, secrets, and RBAC. - -For complete project documentation, please visit the [Argo Workflows documentation site](https://argoproj.github.io/argo-workflows/). - -## Usage - -Argo Workflows can be deployed by enabling the add-on via the following. - -```hcl -enable_argo_workflows = true -``` - - -``` - -### GitOps Configuration - -The following properties are made available for use when managing the add-on via GitOps. - -``` -argoWorkflows = { - enable = true -} -``` diff --git a/docs/add-ons/argocd.md b/docs/add-ons/argocd.md deleted file mode 100644 index 91f4c24a85..0000000000 --- a/docs/add-ons/argocd.md +++ /dev/null @@ -1,155 +0,0 @@ -# ArgoCD - -[ArgoCD](https://argo-cd.readthedocs.io/en/stable/) Argo CD is a declarative, GitOps continuous delivery tool for Kubernetes. - -Application definitions, configurations, and environments should be declarative and version controlled. Application deployment and lifecycle management should be automated, auditable, and easy to understand. - -## Usage - -ArgoCD can be deployed by enabling the add-on via the following. - -```hcl -enable_argocd = true -``` - -### Admin Password - -ArgoCD has a built-in `admin` user that has full access to the ArgoCD instance. By default, Argo will create a password for the admin user. 
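For reference, the generated password can be read back with the Terraform Kubernetes provider along these lines — the secret name and namespace below are the ArgoCD defaults and are assumed here:

```hcl
# Sketch: read the admin password that Argo CD generates on first install.
# The secret name and namespace are the Argo CD defaults (assumed).
data "kubernetes_secret_v1" "argocd_initial_admin" {
  metadata {
    name      = "argocd-initial-admin-secret"
    namespace = "argocd"
  }
}

output "argocd_admin_password" {
  value     = data.kubernetes_secret_v1.argocd_initial_admin.data["password"]
  sensitive = true
}
```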
- -See the [ArgoCD documentation](https://argo-cd.readthedocs.io/en/stable/operator-manual/user-management/) for additional details on managing users. - -### Customizing the Helm Chart - -You can customize the Helm chart that deploys `ArgoCD` via the following configuration: - -```hcl -argocd_helm_config = { - name = "argo-cd" - chart = "argo-cd" - repository = "https://argoproj.github.io/argo-helm" - version = "" - namespace = "argocd" - timeout = "1200" - create_namespace = true - values = [templatefile("${path.module}/argocd-values.yaml", {})] -} -``` - -### Bootstrapping - -The framework provides an approach to bootstrapping workloads and/or additional add-ons by leveraging the ArgoCD [App of Apps](https://argo-cd.readthedocs.io/en/stable/operator-manual/cluster-bootstrapping/) pattern. - -The following code example demonstrates how you can supply information for a repository in order to bootstrap multiple workloads in a new EKS cluster. The example leverages a [sample App of Apps repository](https://github.com/aws-samples/eks-blueprints-workloads.git). - -```hcl -argocd_applications = { - addons = { - path = "chart" - repo_url = "https://github.com/aws-samples/eks-blueprints-add-ons.git" - add_on_application = true # Indicates the root add-on application. - } -} -``` - -### Add-ons - -A common operational pattern for EKS customers is to leverage Infrastructure as Code to provision EKS clusters (in addition to other AWS resources), and ArgoCD to manage cluster add-ons. This can present a challenge when add-ons managed by ArgoCD depend on AWS resource values which are created via Terraform execution (such as an IAM ARN for an add-on that leverages IRSA), to function properly. The framework provides an approach to bridging the gap between Terraform and ArgoCD by leveraging the ArgoCD [App of Apps](https://argo-cd.readthedocs.io/en/stable/operator-manual/cluster-bootstrapping/) pattern. - -To indicate that ArgoCD should responsible for managing cluster add-ons (applying add-on Helm charts to a cluster), you can set the `argocd_manage_add_ons` property to true. When this flag is set, the framework will still provision all AWS resources necessary to support add-on functionality, but it will not apply Helm charts directly via the Terraform Helm provider. - -Next, identify which ArgoCD Application will serve as the add-on configuration repository by setting the `add_on_application` flag to true. When this flag is set, the framework will aggregate AWS resource values that are needed for each add-on into an object. It will then pass that object to ArgoCD via the values map of the Application resource. [See here](https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/main/modules/kubernetes-addons/locals.tf#L4) for the values object that gets passed to the ArgoCD add-ons Application. - -Sample configuration can be found below: - -```hcl -enable_argocd = true -argocd_manage_add_ons = true -argocd_applications = { - addons = { - path = "chart" - repo_url = "https://github.com/aws-samples/eks-blueprints-add-ons.git" - add_on_application = true # Indicates the root add-on application. - } -} -``` - -### Private Repositories - -In order to leverage ArgoCD with private Git repositories, you must supply a private SSH key to Argo. The framework provides support for doing so via an integration with AWS Secrets Manager. - -To leverage private repositories, do the following: - -1. Create a new secret in AWS Secrets Manager for your desired region. 
The value for the secret should be a private SSH key for your Git provider. -2. Set the `ssh_key_secret_name` in each Application's configuration as the name of the secret. - -Internally, the framework will create a Kubernetes Secret, which ArgoCD will leverage when making requests to your Git provider. See the example configuration below. - -```hcl -enable_argocd = true -argocd_manage_add_ons = true -argocd_applications = { - addons = { - path = "chart" - repo_url = "git@github.com:aws-samples/eks-blueprints-add-ons.git" - project = "default" - add_on_application = true # Indicates the root add-on application. - ssh_key_secret_name = "github-ssh-key" # Needed for private repos - insecure = false # Set to true to disable the server's certificate verification - } -} -``` - -### Complete Example - -The following demonstrates a complete example for configuring ArgoCD. - -```hcl -enable_argocd = true -argocd_manage_add_ons = true - -argocd_helm_config = { - name = "argo-cd" - chart = "argo-cd" - repository = "https://argoproj.github.io/argo-helm" - version = "3.29.5" - namespace = "argocd" - timeout = "1200" - create_namespace = true - values = [templatefile("${path.module}/argocd-values.yaml", {})] -} - -argocd_applications = { - workloads = { - path = "envs/dev" - repo_url = "https://github.com/aws-samples/eks-blueprints-workloads.git" - values = {} - type = "helm" # Optional, defaults to helm. - } - kustomize-apps = { - /* - This points to a single application with no overlays, but it could easily - point to a a specific overlay for an environment like "dev", and/or utilize - the ArgoCD app of apps model to install many additional ArgoCD apps. - */ - path = "argocd-example-apps/kustomize-guestbook/" - repo_url = "https://github.com/argoproj/argocd-example-apps.git" - type = "kustomize" - } - addons = { - path = "chart" - repo_url = "git@github.com:aws-samples/eks-blueprints-add-ons.git" - add_on_application = true # Indicates the root add-on application. - # If provided, the type must be set to "helm" for the root add-on application. - ssh_key_secret_name = "github-ssh-key" # Needed for private repos - values = {} - type = "helm" # Optional, defaults to helm. - #ignoreDifferences = [ # Enable this to ignore children apps' sync policy - # { - # group = "argoproj.io" - # kind = "Application" - # jsonPointers = ["/spec/syncPolicy"] - # } - #] - } -} -``` diff --git a/docs/add-ons/aws-cloudwatch-metrics.md b/docs/add-ons/aws-cloudwatch-metrics.md deleted file mode 100644 index 82c7a51b0f..0000000000 --- a/docs/add-ons/aws-cloudwatch-metrics.md +++ /dev/null @@ -1,40 +0,0 @@ -# AWS CloudWatch Metrics - -Use CloudWatch Container Insights to collect, aggregate, and summarize metrics and logs from your containerized applications and microservices. CloudWatch automatically collects metrics for many resources, such as CPU, memory, disk, and network. Container Insights also provides diagnostic information, such as container restart failures, to help you isolate issues and resolve them quickly. You can also set CloudWatch alarms on metrics that Container Insights collects. - -Container Insights collects data as performance log events using embedded metric format. These performance log events are entries that use a structured JSON schema that enables high-cardinality data to be ingested and stored at scale. From this data, CloudWatch creates aggregated metrics at the cluster, node, pod, task, and service level as CloudWatch metrics. 
The metrics that Container Insights collects are available in CloudWatch automatic dashboards, and also viewable in the Metrics section of the CloudWatch console. - -## Usage - -[aws-cloudwatch-metrics](https://github.com/aws-ia/terraform-aws-eks-blueprints/tree/main/modules/kubernetes-addons/aws-cloudwatch-metrics) can be deployed by enabling the add-on via the following. - -```hcl -enable_aws_cloudwatch_metrics = true -``` - -You can optionally customize the Helm chart that deploys `aws_cloudwatch_metrics` via the following configuration. - -```hcl - enable_aws_cloudwatch_metrics = true - aws_cloudwatch_metrics_irsa_policies = ["IAM Policies"] - aws_cloudwatch_metrics_helm_config = { - name = "aws-cloudwatch-metrics" - chart = "aws-cloudwatch-metrics" - repository = "https://aws.github.io/eks-charts" - version = "0.0.7" - namespace = "amazon-cloudwatch" - values = [templatefile("${path.module}/values.yaml", { - eks_cluster_id = var.addon_context.eks_cluster_id - })] - } -``` - -### GitOps Configuration - -The following properties are made available for use when managing the add-on via GitOps. - -```hcl -awsCloudWatchMetrics = { - enable = true -} -``` diff --git a/docs/add-ons/aws-efs-csi-driver.md b/docs/add-ons/aws-efs-csi-driver.md deleted file mode 100644 index 051d2e70c9..0000000000 --- a/docs/add-ons/aws-efs-csi-driver.md +++ /dev/null @@ -1,57 +0,0 @@ -# AWS EFS CSI Driver - -This add-on deploys the [AWS EFS CSI driver](https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html) into an EKS cluster. - -## Usage - -The [AWS EFS CSI driver](https://github.com/aws-ia/terraform-aws-eks-blueprints/tree/main/modules/kubernetes-addons/aws-efs-csi-driver) can be deployed by enabling the add-on via the following. Check out the full [example](https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/main/examples/stateful/main.tf) to deploy an EKS Cluster with EFS backing the dynamic provisioning of persistent volumes. - -```hcl - enable_aws_efs_csi_driver = true -``` - -Once deployed, you will be able to see a number of supporting resources in the `kube-system` namespace. - -```sh -$ kubectl get deployment efs-csi-controller -n kube-system - -NAME READY UP-TO-DATE AVAILABLE AGE -efs-csi-controller 2/2 2 2 4m29s -``` - -```sh -$ kubectl get daemonset efs-csi-node -n kube-system - -NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE -efs-csi-node 3 3 3 3 3 beta.kubernetes.io/os=linux 4m32s -``` - -You can optionally customize the Helm chart that deploys the driver via the following configuration. - -```hcl - enable_aws_efs_csi_driver = true - - # Optional aws_efs_csi_driver_helm_config - aws_efs_csi_driver_helm_config = { - repository = "https://kubernetes-sigs.github.io/aws-efs-csi-driver/" - version = "2.2.3" - } - aws_efs_csi_driver_irsa_policies = [""] -``` - -### GitOps Configuration - -`ArgoCD` with `App of Apps` GitOps enabled for this Add-on by enabling the following variable - -```hcl -argocd_manage_add_ons = true -``` - -The following is configured to ArgoCD App of Apps for this Add-on. 
- -```hcl - argocd_gitops_config = { - enable = true - serviceAccountName = local.service_account - } -``` diff --git a/docs/add-ons/aws-for-fluent-bit.md b/docs/add-ons/aws-for-fluent-bit.md deleted file mode 100644 index 0c735720e2..0000000000 --- a/docs/add-ons/aws-for-fluent-bit.md +++ /dev/null @@ -1,70 +0,0 @@ -# AWS for Fluent Bit - -Fluent Bit is an open source Log Processor and Forwarder which allows you to collect any data like metrics and logs from different sources, enrich them with filters and send them to multiple destinations. - -## AWS for Fluent Bit - -AWS provides a Fluent Bit image with plugins for both CloudWatch Logs and Kinesis Data Firehose. The [AWS for Fluent Bit](https://github.com/aws/aws-for-fluent-bit) image is available on the Amazon ECR Public Gallery. For more details, see [aws-for-fluent-bit](https://gallery.ecr.aws/aws-observability/aws-for-fluent-bit) on the Amazon ECR Public Gallery. - -### Usage - -[aws-for-fluent-bit](https://github.com/aws-ia/terraform-aws-eks-blueprints/tree/main/modules/kubernetes-addons/aws-for-fluentbit) can be deployed by enabling the add-on via the following. - -This add-on is configured to stream the worker node logs to CloudWatch Logs by default. It can further be configured to stream the logs to additional destinations like Kinesis Data Firehose, Kinesis Data Streams and Amazon OpenSearch Service by passing the custom `values.yaml`. -See this [Helm Chart](https://github.com/aws/eks-charts/tree/master/stable/aws-for-fluent-bit) for more details. - -```hcl -enable_aws_for_fluentbit = true -``` - -You can optionally customize the Helm chart that deploys `aws_for_fluentbit` via the following configuration. - -```hcl - enable_aws_for_fluentbit = true - aws_for_fluentbit_irsa_policies = ["IAM Policies"] # Add list of additional policies to IRSA to enable access to Kinesis, OpenSearch etc. - aws_for_fluentbit_cw_log_group_retention = 90 - aws_for_fluentbit_helm_config = { - name = "aws-for-fluent-bit" - chart = "aws-for-fluent-bit" - repository = "https://aws.github.io/eks-charts" - version = "0.1.0" - namespace = "logging" - aws_for_fluent_bit_cw_log_group = "/${local.cluster_id}/worker-fluentbit-logs" # Optional - create_namespace = true - values = [templatefile("${path.module}/values.yaml", { - region = data.aws_region.current.name, - aws_for_fluent_bit_cw_log_group = "/${local.cluster_id}/worker-fluentbit-logs" - })] - set = [ - { - name = "nodeSelector.kubernetes\\.io/os" - value = "linux" - } - ] - } -``` - -### GitOps Configuration - -The following properties are made available for use when managing the add-on via GitOps. - -``` -awsForFluentBit = { - enable = true - logGroupName = "" -} -``` - -### Externally-Created CloudWatch Log Group(s) - -If the CloudWatch log group FluentBit puts logs to is required to be encrypted by an existing KMS -customer-managed key, then the CloudWatch log group needs to be created external to the -kubernetes-addons module and passed in. Creating the CloudWatch log group externally is also useful -if FluentBit is putting logs to multiple log groups because all the log groups can be created in -the same code file. To do this, set the create log group flag to false and supply the -previously-created log group name. 
- -```hcl -aws_for_fluentbit_create_cw_log_group = false -aws_for_fluentbit_cw_log_group_name = aws_cloudwatch_log_group.application.name -``` diff --git a/docs/add-ons/aws-fsx-csi-driver.md b/docs/add-ons/aws-fsx-csi-driver.md deleted file mode 100644 index 1c2dcdade6..0000000000 --- a/docs/add-ons/aws-fsx-csi-driver.md +++ /dev/null @@ -1,60 +0,0 @@ -# Amazon FSx for Lustre CSI Driver - -Fully managed shared storage built on the world's most popular high-performance file system. -This add-on deploys the [Amazon FSx for Lustre CSI Driver](https://aws.amazon.com/fsx/lustre/) into an EKS cluster. - -## Usage - -The [Amazon FSx for Lustre CSI Driver](https://github.com/aws-ia/terraform-aws-eks-blueprints/tree/main/modules/kubernetes-addons/aws-fsx-csi-driver) can be deployed by enabling the add-on via the following. - -```hcl - enable_aws_fsx_csi_driver = true -``` - -You can optionally customize the Helm chart that deploys `enable_aws_fsx_csi_driver` via the following configuration. - -```hcl - enable_aws_fsx_csi_driver = true - aws_fsx_csi_driver_helm_config = { - name = "aws-fsx-csi-driver" - chart = "aws-fsx-csi-driver" - repository = "https://kubernetes-sigs.github.io/aws-fsx-csi-driver/" - version = "1.4.2" - namespace = "kube-system" - values = [templatefile("${path.module}/aws-fsx-csi-driver-values.yaml", {})] # Create this `aws-fsx-csi-driver-values.yaml` file with your own custom values - } - aws_fsx_csi_driver_irsa_policies = [""] -``` - -Once deployed, you will be able to see a number of supporting resources in the `kube-system` namespace. - -```sh -$ kubectl get deployment fsx-csi-controller -n kube-system - -NAME READY UP-TO-DATE AVAILABLE AGE -fsx-csi-controller 2/2 2 2 4m29s -``` - -```sh -$ kubectl get daemonset fsx-csi-node -n kube-system - -NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE -fsx-csi-node 3 3 3 3 3 kubernetes.io/os=linux 4m32s -``` - -### GitOps Configuration - -`ArgoCD` with `App of Apps` GitOps enabled for this Add-on by enabling the following variable - -```hcl -argocd_manage_add_ons = true -``` - -The following is configured to ArgoCD App of Apps for this Add-on. - -```hcl - argocd_gitops_config = { - enable = true - serviceAccountName = local.service_account - } -``` diff --git a/docs/add-ons/aws-load-balancer-controller.md b/docs/add-ons/aws-load-balancer-controller.md deleted file mode 100644 index 04cd2d6370..0000000000 --- a/docs/add-ons/aws-load-balancer-controller.md +++ /dev/null @@ -1,63 +0,0 @@ -# AWS Load Balancer Controller - -The [AWS Load Balancer Controller](https://docs.aws.amazon.com/eks/latest/userguide/aws-load-balancer-controller.html) manages AWS Elastic Load Balancers for a Kubernetes cluster. The controller provisions the following resources: - -* An AWS Application Load Balancer (ALB) when you create a Kubernetes Ingress. -* An AWS Network Load Balancer (NLB) when you create a Kubernetes Service of type LoadBalancer. - -For more information about AWS Load Balancer Controller please see the [official documentation](https://docs.aws.amazon.com/eks/latest/userguide/aws-load-balancer-controller.html). - -## Usage - -```hcl -enable_aws_load_balancer_controller = true -``` - -You can optionally customize the Helm chart that deploys `aws-lb-ingress-controller` via the following configuration. 
- -```hcl - enable_aws_load_balancer_controller = true - # Optional - aws_load_balancer_controller_helm_config = { - name = "aws-load-balancer-controller" - chart = "aws-load-balancer-controller" - repository = "https://aws.github.io/eks-charts" - version = "1.3.1" - namespace = "kube-system" - values = [templatefile("${path.module}/values.yaml", {})] - } -``` - -To validate that controller is running, ensure that controller deployment is in RUNNING state: - -```sh -# Assuming controller is installed in kube-system namespace -$ kubectl get deployments -n kube-system -NAME READY UP-TO-DATE AVAILABLE AGE -aws-load-balancer-controller 2/2 2 2 3m58s -``` -#### AWS Service annotations for LB Ingress Controller - -Here is the link to get the AWS ELB [service annotations](https://kubernetes-sigs.github.io/aws-load-balancer-controller/latest/guide/service/annotations/) for LB Ingress controller. - -### GitOps Configuration - -The following properties are made available for use when managing the add-on via GitOps. - -``` -awsLoadBalancerController = { - enable = true - serviceAccountName = "" -} -``` - -### IRSA is too long - -If the IAM role is too long, override the service account name in the `helm_config` to create a shorter role name. - -```hcl - enable_aws_load_balancer_controller = true - aws_load_balancer_controller_helm_config = { - service_account = "aws-lb-sa" - } -``` diff --git a/docs/add-ons/aws-node-termination-handler.md b/docs/add-ons/aws-node-termination-handler.md deleted file mode 100644 index 4bb60d5155..0000000000 --- a/docs/add-ons/aws-node-termination-handler.md +++ /dev/null @@ -1,61 +0,0 @@ -# AWS Node Termination Handler - -This project ensures that the Kubernetes control plane responds appropriately to events that can cause your EC2 instance to become unavailable, such as EC2 maintenance events, EC2 Spot interruptions, ASG Scale-In, ASG AZ Rebalance, and EC2 Instance Termination via the API or Console. If not handled, your application code may not stop gracefully, take longer to recover full availability, or accidentally schedule work to nodes that are going down. For more information see [README.md](https://github.com/aws/aws-node-termination-handler#readme). - -The aws-node-termination-handler (NTH) can operate in two different modes: Instance Metadata Service (IMDS) or the Queue Processor. In the EKS Blueprints, we provision the NTH in Queue Processor mode. This means that NTH will monitor an SQS queue of events from Amazon EventBridge for ASG lifecycle events, EC2 status change events, Spot Interruption Termination Notice events, and Spot Rebalance Recommendation events. When NTH detects an instance is going down, NTH uses the Kubernetes API to cordon the node to ensure no new work is scheduled there, then drain it, removing any existing work. - -The NTH will be deployed in the `kube-system` namespace. AWS resources required as part of the setup of NTH will be provisioned for you. These include: - -1. Node group ASG tagged with `key=aws-node-termination-handler/managed` -2. AutoScaling Group Termination Lifecycle Hook -3. Amazon Simple Queue Service (SQS) Queue -4. Amazon EventBridge Rule -5. IAM Role for the aws-node-termination-handler Queue Processing Pods - -## Usage - -```hcl -enable_aws_node_termination_handler = true -``` - -You can optionally customize the Helm chart that deploys `aws-node-termination-handler` via the following configuration. 
- -```hcl - enable_aws_node_termination_handler = true - - aws_node_termination_handler_helm_config = { - name = "aws-node-termination-handler" - chart = "aws-node-termination-handler" - repository = "https://aws.github.io/eks-charts" - version = "0.16.0" - timeout = "1200" - } -``` - - -To validate that controller is running, ensure that controller deployment is in RUNNING state: - -```sh -# Assuming controller is installed in kube-system namespace -$ kubectl get deployments -n kube-system -aws-node-termination-handler 1/1 1 1 5d9h -``` - -### GitOps Configuration -The following properties are made available for use when managing the add-on via GitOps. - -GitOps with ArgoCD Add-on repo is located [here](https://github.com/aws-samples/eks-blueprints-add-ons/blob/main/chart/values.yaml) - -When enabling NTH for GitOps, be sure that you are using `self_managed_node_groups` as this module will check to ensure that it finds valid backing autoscaling groups. - -If you're using `managed_node_groups`, NTH isn't required as per the following - https://github.com/aws/aws-node-termination-handler/issues/186 -``` -Amazon EKS automatically drains nodes using the Kubernetes API during terminations or updates. Updates respect the pod disruption budgets that you set for your pods. -``` - -```hcl - awsNodeTerminationHandler = { - enable = true - serviceAccountName = "" - } -``` diff --git a/docs/add-ons/aws-privateca-issuer.md b/docs/add-ons/aws-privateca-issuer.md deleted file mode 100644 index 35f0041f40..0000000000 --- a/docs/add-ons/aws-privateca-issuer.md +++ /dev/null @@ -1,110 +0,0 @@ -# aws-privateca-issuer - -AWS ACM Private CA is a module of the [AWS Certificate Manager](https://aws.amazon.com/certificate-manager/) that can setup and manage private CAs. `cert-manager` is a Kubernetes add-on to automate the management and issuance of TLS certificates from various issuing sources. It will ensure certificates are valid and up to date periodically, and attempt to renew certificates at an appropriate time before expiry. This module `aws-pca-issuer` is a addon for `cert-manager` that issues certificates using AWS ACM PCA. - -See the [aws-privateca-issuer documentation](https://cert-manager.github.io/aws-privateca-issuer/). - -## Usage - -aws_privateca_issuer can be deployed by enabling the add-on via the following. - -```hcl -enable_cert_manager = true -enable_aws_privateca_issuer = true -``` - -Create `AWSPCAClusterIssuer` custom resource definition (CRD). It is a Kubernetes resources that represent certificate authorities (CAs) from AWS ACM and are able to generate signed certificates by honoring certificate signing requests. For more details on external `Issuer` types, please check [aws-privateca-issuer](https://github.com/cert-manager/aws-privateca-issuer) - -```hcl -resource "kubernetes_manifest" "cluster-pca-issuer" { - manifest = { - apiVersion = "awspca.cert-manager.io/v1beta1" - kind = "AWSPCAClusterIssuer" - - metadata = { - name = "logical.name.of.this.issuer" - } - - spec = { - arn = "ARN for AWS PCA" - region: "data.aws_region.current.id OR AWS region of the AWS PCA" - - } - } -} -``` - -Create `Certificate` CRD. Certificates define a desired X.509 certificate which will be renewed and kept up to date. For more details on how to specify and request Certificate resources, please check [Certificate Resources guide](https://cert-manager.io/docs/usage/certificate/). 
- -A Certificate is a namespaced resource that references `AWSPCAClusterIssuer` (created in above step) that determine what will be honoring the certificate request. - -```hcl -resource "kubernetes_manifest" "example_pca_certificate" { - manifest = { - apiVersion = "cert-manager.io/v1" - kind = "Certificate" - - metadata = { - name = "name of the certificate" - namespace = "default or any namespace" - } - - spec = { - commonName = "common name for your certificate" - duration = "duration" - issuerRef = { - group = "awspca.cert-manager.io" - kind = "AWSPCAClusterIssuer" - name: "name of AWSPCAClusterIssuer created above" - } - renewBefore = "360h0m0s" - secretName = "name of the secret where certificate will be mounted" - usages = [ - "server auth", - "client auth" - ] - privateKey = { - algorithm: "RSA" - size: 2048 - } - } - } - -} -``` - -When a Certificate is created, a corresponding CertificateRequest resource is created by `cert-manager` containing the encoded X.509 certificate request, Issuer reference, and other options based upon the specification of the Certificate resource. - -This Certificate CRD will tell cert-manager to attempt to use the Issuer (as AWS ACM) to obtain a certificate key pair for the specified domains. If successful, the resulting TLS key and certificate will be stored in a kubernetes secret named , with keys of tls.key, and tls.crt respectively. This secret will live in the same namespace as the Certificate resource. - -Now, you may run `kubectl get Certificate` to view the status of Certificate Request from AWS PCA. - -``` -NAME READY SECRET AGE -example True aws001-preprod-dev-eks-clusterissuer 3h35m -``` - -If the status is `True`, that means, the `tls.crt`, `tls.key` and `ca.crt` will all be available in [Kubernetes Secret](https://kubernetes.io/docs/concepts/configuration/secret/) - -``` -aws001-preprod-dev-eks-clusterissuer -Name: aws001-preprod-dev-eks-clusterissuer -Namespace: default -Labels: -Annotations: cert-manager.io/alt-names: - cert-manager.io/certificate-name: example - cert-manager.io/common-name: example.com - cert-manager.io/ip-sans: - cert-manager.io/issuer-group: awspca.cert-manager.io - cert-manager.io/issuer-kind: AWSPCAClusterIssuer - cert-manager.io/issuer-name: aws001-preprod-dev-eks - cert-manager.io/uri-sans: - -Type: kubernetes.io/tls - -Data -==== -ca.crt: 1785 bytes -tls.crt: 1517 bytes -tls.key: 1679 bytes -``` diff --git a/docs/add-ons/calico.md b/docs/add-ons/calico.md deleted file mode 100644 index cb29067aa5..0000000000 --- a/docs/add-ons/calico.md +++ /dev/null @@ -1,39 +0,0 @@ -# Calico - -Calico is a widely adopted, battle-tested open source networking and network security solution for Kubernetes, virtual machines, and bare-metal workloads -Calico provides two major services for Cloud Native applications: network connectivity between workloads and network security policy enforcement between workloads. -[Calico](https://projectcalico.docs.tigera.io/getting-started/kubernetes/helm#download-the-helm-chart) docs chart bootstraps Calico infrastructure on a Kubernetes cluster using the Helm package manager. - -For complete project documentation, please visit the [Calico documentation site](https://docs.tigera.io/calico/next/about/). - -## Usage - -Calico can be deployed by enabling the add-on via the following. 
- -```hcl -enable_calico = true -``` - -Deploy Calico with custom `values.yaml` - -```hcl - # Optional Map value; pass calico-values.yaml from consumer module - calico_helm_config = { - name = "calico" # (Required) Release name. - repository = "https://docs.projectcalico.org/charts" # (Optional) Repository URL where to locate the requested chart. - chart = "tigera-operator" # (Required) Chart name to be installed. - version = "v3.24.1" # (Optional) Specify the exact chart version to install. If this is not specified, it defaults to the version set within default_helm_config: https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/main/modules/kubernetes-addons/calico/locals.tf - namespace = "tigera-operator" # (Optional) The namespace to install the release into. - values = [templatefile("${path.module}/calico-values.yaml", {})] - } -``` - -### GitOps Configuration - -The following properties are made available for use when managing the add-on via GitOps. - -```sh -calico = { - enable = true -} -``` diff --git a/docs/add-ons/cert-manager-csi-driver.md b/docs/add-ons/cert-manager-csi-driver.md deleted file mode 100644 index dd2b917c67..0000000000 --- a/docs/add-ons/cert-manager-csi-driver.md +++ /dev/null @@ -1,24 +0,0 @@ -# cert-manager-csi-driver - -Cert Manager csi-driver is a Container Storage Interface (CSI) driver plugin for Kubernetes to work along cert-manager. The goal for this plugin is to seamlessly request and mount certificate key pairs to pods. This is useful for facilitating mTLS, or otherwise securing connections of pods with guaranteed present certificates whilst having all of the features that cert-manager provides. - -For complete project documentation, please visit the [cert-manager-csi-driver documentation site](https://cert-manager.io/docs/projects/csi-driver). - -## Usage - -cert-manger can be deployed by enabling the add-on via the following. - -```hcl -enable_cert_manager_csi_driver = true -``` - -### GitOps Configuration - -The following properties are made available for use when managing the add-on via GitOps. - -``` - -certManagerCsiDriver = { - enable = true -} -``` diff --git a/docs/add-ons/cert-manager-istio-csr.md b/docs/add-ons/cert-manager-istio-csr.md deleted file mode 100644 index 8050c93976..0000000000 --- a/docs/add-ons/cert-manager-istio-csr.md +++ /dev/null @@ -1,24 +0,0 @@ -# cert-manager-istio-csr - -istio-csr is an agent that allows for Istio workload and control plane components to be secured using cert-manager. - -For complete project documentation, please visit the [cert-manager documentation site](https://cert-manager.io/docs/usage/istio/). - -## Usage - -cert-manger-istio-csr can be deployed by enabling the add-on via the following. - -```hcl -enable_cert_manager_istio_csr = true -``` - -### GitOps Configuration - -The following properties are made available for use when managing the add-on via GitOps. - -``` - -certManagerIstioCsr = { - enable = true -} -``` diff --git a/docs/add-ons/cert-manager.md b/docs/add-ons/cert-manager.md deleted file mode 100644 index f126f3be34..0000000000 --- a/docs/add-ons/cert-manager.md +++ /dev/null @@ -1,50 +0,0 @@ -# cert-manager - -cert-manager adds certificates and certificate issuers as resource types in Kubernetes clusters, and simplifies the process of obtaining, renewing and using those certificates. - -For complete project documentation, please visit the [cert-manager documentation site](https://cert-manager.io/docs/). 
- -## Usage - -cert-manger can be deployed by enabling the add-on via the following. - -```hcl -enable_cert_manager = true -``` - -cert-manger can optionally leverage the `cert_manager_domain_names` global property of the `kubernetes_addon` submodule for DNS01 protocol. The value for this property should be a list of Route53 domains managed by your account. cert-manager is restricted to the zones from the list. - -``` -cert_manager_domain_names = [, ] -``` - -With this add-on self-signed CA and Let's Encrypt cluster issuers will be installed. - -You can disable Let's Encrypt cluster issuers with: - -``` -cert_manager_install_letsencrypt_issuers = false -``` - -You can set an email address for expiration emails with: - -``` -cert_manager_letsencrypt_email = "user@example.com" -``` - -You can pass previously created secrets for use as `imagePullSecrets` on the Service Account - -``` -cert_manager_kubernetes_svc_image_pull_secrets = ["regcred"] -``` - -### GitOps Configuration - -The following properties are made available for use when managing the add-on via GitOps. - -``` - -certManager = { - enable = true -} -``` diff --git a/docs/add-ons/chaos-mesh.md b/docs/add-ons/chaos-mesh.md deleted file mode 100644 index 6cf27d48b8..0000000000 --- a/docs/add-ons/chaos-mesh.md +++ /dev/null @@ -1,39 +0,0 @@ -# Chaos Mesh - -Chaos Mesh is an open source cloud-native Chaos Engineering platform. It offers various types of fault simulation and has an enormous capability to orchestrate fault scenarios - -[Chaos Mesh](https://chaos-mesh.org/docs/production-installation-using-helm/) docs chart bootstraps Chaos Mesh infrastructure on a Kubernetes cluster using the Helm package manager. - -For complete project documentation, please visit the [Chaos Mesh site](https://chaos-mesh.org/docs/). - -## Usage - -Chaos Mesh can be deployed by enabling the add-on via the following. - -```hcl -enable_chaos_mesh = true -``` - -Deploy Chaos Mesh with custom `values.yaml` - -```hcl - # Optional Map value; pass chaos-mesh-values.yaml from consumer module - chaos_mesh_helm_config = { - name = "chaos-mesh" # (Required) Release name. - repository = "https://charts.chaos-mesh.org" # (Optional) Repository URL where to locate the requested chart. - chart = "chaos-mesh" # (Required) Chart name to be installed. - version = "2.3.0" # (Optional) Specify the exact chart version to install. If this is not specified, it defaults to the version set within default_helm_config: https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/main/modules/kubernetes-addons/chaos-mesh/locals.tf - namespace = "chaos-testing" # (Optional) The namespace to install the release into. - values = [templatefile("${path.module}/chaos-mesh-values.yaml", {})] - } -``` - -### GitOps Configuration - -The following properties are made available for use when managing the add-on via GitOps. - -```sh -chaosMesh = { - enable = true -} -``` diff --git a/docs/add-ons/cilium.md b/docs/add-ons/cilium.md deleted file mode 100644 index a2b194ee64..0000000000 --- a/docs/add-ons/cilium.md +++ /dev/null @@ -1,45 +0,0 @@ -# Cilium - -Cilium is open source software for transparently securing the network connectivity between application services deployed using Linux container management platforms like Docker and Kubernetes. - -Cilium can be set up in two manners: -- In combination with the `Amazon VPC CNI plugin`. In this hybrid mode, the AWS VPC CNI plugin is responsible for setting up the virtual network devices as well as for IP address management (IPAM) via ENIs. 
-After the initial networking is set up for a given pod, the Cilium CNI plugin is called to attach eBPF programs to the network devices set up by the AWS VPC CNI plugin in order to enforce network policies, perform load-balancing and provide encryption.
-Read the installation instructions [here](https://docs.cilium.io/en/latest/installation/cni-chaining-aws-cni/).
-- As a replacement for the `Amazon VPC CNI`, read the complete installation guide [here](https://docs.cilium.io/en/latest/installation/k8s-install-helm/).
-
-For complete project documentation, please visit the [Cilium documentation site](https://docs.cilium.io/en/stable/).
-
-## Usage
-
-Deploy Cilium in combination with the `Amazon VPC CNI` plugin by enabling the add-on via the following.
-
-```hcl
-enable_cilium = true
-```
-
-Deploy Cilium with a custom `values.yaml`
-
-```hcl
-  # Optional Map value; pass cilium-values.yaml from consumer module
-  cilium_helm_config = {
-    name       = "cilium"                  # (Required) Release name.
-    repository = "https://helm.cilium.io/" # (Optional) Repository URL where to locate the requested chart.
-    chart      = "cilium"                  # (Required) Chart name to be installed.
-    version    = "1.12.1"                  # (Optional) Specify the exact chart version to install. If this is not specified, it defaults to the version set within default_helm_config: https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/main/modules/kubernetes-addons/cilium/locals.tf
-    values     = [templatefile("${path.module}/cilium-values.yaml", {})]
-  }
-```
-
-Refer to the [Cilium default values file](https://github.com/cilium/cilium/blob/master/install/kubernetes/cilium/values.yaml) for the complete set of values supported by the chart.
-
-### GitOps Configuration
-
-The following properties are made available for use when managing the add-on via GitOps.
-
-```hcl
-cilium = {
-  enable = true
-}
-```
diff --git a/docs/add-ons/cluster-autoscaler.md b/docs/add-ons/cluster-autoscaler.md
deleted file mode 100644
index 810dc405a0..0000000000
--- a/docs/add-ons/cluster-autoscaler.md
+++ /dev/null
@@ -1,27 +0,0 @@
-# Cluster Autoscaler
-
-Cluster Autoscaler is a tool that automatically adjusts the number of nodes in your cluster when:
-
-* Pods fail due to insufficient resources, or
-* Pods are rescheduled onto other nodes because their current nodes have been underutilized for an extended period of time.
-
-The [Cluster Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler) add-on adds support for Cluster Autoscaler to an EKS cluster. It is typically installed as a **Deployment** in your cluster. It uses leader election to ensure high availability, but scaling is performed by only one replica at a time.
-
-## Usage
-
-[Cluster Autoscaler](https://github.com/aws-ia/terraform-aws-eks-blueprints/tree/main/modules/kubernetes-addons/cluster-autoscaler) can be deployed by enabling the add-on via the following.
-
-```hcl
-enable_cluster_autoscaler = true
-```
-
-### GitOps Configuration
-
-The following properties are made available for use when managing the add-on via GitOps.
- -```hcl -clusterAutoscaler = { - enable = true - serviceAccountName = "" -} -``` diff --git a/docs/add-ons/cluster-proportional-autoscaler.md b/docs/add-ons/cluster-proportional-autoscaler.md deleted file mode 100644 index 647358d369..0000000000 --- a/docs/add-ons/cluster-proportional-autoscaler.md +++ /dev/null @@ -1,81 +0,0 @@ -# Horizontal cluster-proportional-autoscaler container - -Horizontal cluster-proportional-autoscaler watches over the number of schedulable nodes and cores of the cluster and resizes the number of replicas for the required resource. This functionality may be desirable for applications that need to be autoscaled with the size of the cluster, such as CoreDNS and other services that scale with the number of nodes/pods in the cluster. - -The [cluster-proportional-autoscaler](https://github.com/kubernetes-sigs/cluster-proportional-autoscaler) helps to scale the applications using deployment or replicationcontroller or replicaset. This is an alternative solution to Horizontal Pod Autoscaling. -It is typically installed as a **Deployment** in your cluster. - -## Usage - -This add-on requires both `enable_coredns_autoscaler` and `coredns_autoscaler_helm_config` as mandatory fields. - -[cluster-proportional-autoscaler](https://github.com/aws-ia/terraform-aws-eks-blueprints/tree/main/modules/kubernetes-addons/cluster-proportional-autoscaler) can be deployed by enabling the add-on via the following. - -The example shows how to enable `cluster-proportional-autoscaler` for `CoreDNS Deployment`. CoreDNS deployment is not configured with HPA. So, this add-on helps to scale CoreDNS Add-on according to the size of the nodes and cores. - -This Add-on can be used to scale any application with Deployment objects. - -```hcl -enable_coredns_autoscaler = true -coredns_autoscaler_helm_config = { - name = "cluster-proportional-autoscaler" - chart = "cluster-proportional-autoscaler" - repository = "https://kubernetes-sigs.github.io/cluster-proportional-autoscaler" - version = "1.0.0" - namespace = "kube-system" - timeout = "300" - values = [ - <<-EOT - nameOverride: kube-dns-autoscaler - - # Formula for controlling the replicas. Adjust according to your needs - # replicas = max( ceil( cores * 1/coresPerReplica ) , ceil( nodes * 1/nodesPerReplica ) ) - config: - linear: - coresPerReplica: 256 - nodesPerReplica: 16 - min: 1 - max: 100 - preventSinglePointFailure: true - includeUnschedulableNodes: true - - # Target to scale. In format: deployment/*, replicationcontroller/* or replicaset/* (not case sensitive). - options: - target: deployment/coredns # Notice the target as `deployment/coredns` - - serviceAccount: - create: true - name: kube-dns-autoscaler - - podSecurityContext: - seccompProfile: - type: RuntimeDefault - supplementalGroups: [ 65534 ] - fsGroup: 65534 - - resources: - limits: - cpu: 100m - memory: 128Mi - requests: - cpu: 100m - memory: 128Mi - - tolerations: - - key: "CriticalAddonsOnly" - operator: "Exists" - description = "Cluster Proportional Autoscaler for CoreDNS Service" - EOT - ] -} -``` - -### GitOps Configuration - -The following properties are made available for use when managing the add-on via GitOps. 
- -``` -corednsAutoscaler = { - enable = true -} -``` diff --git a/docs/add-ons/consul.md b/docs/add-ons/consul.md deleted file mode 100644 index dfa785b231..0000000000 --- a/docs/add-ons/consul.md +++ /dev/null @@ -1,41 +0,0 @@ -# Consul - -HashiCorp Consul is a service networking solution that enables teams to manage secure network connectivity between services and across on-prem and multi-cloud environments and runtimes. Consul offers service discovery, service mesh, traffic management, and automated updates to network infrastructure device. - -For complete project documentation, please visit the [consul](https://developer.hashicorp.com/consul/docs/k8s/installation/install). - -## Usage - -Consul can be deployed by enabling the add-on via the following. - -```hcl -enable_consul = true -``` - -You can optionally customize the Helm chart via the following configuration. - -```hcl - enable_consul = true - # Optional consul_helm_config - consul_helm_config = { - name = "consul" - chart = "consul" - repository = "https://helm.releases.hashicorp.com" - version = "1.0.1" - namespace = "consul" - values = [templatefile("${path.module}/values.yaml", { - ... - })] - } -``` - -### GitOps Configuration -The following properties are made available for use when managing the add-on via GitOps. - -GitOps with ArgoCD Add-on repo is located [here](https://github.com/aws-samples/eks-blueprints-add-ons/blob/main/chart/values.yaml) - -```hcl - consul = { - enable = true - } -``` diff --git a/docs/add-ons/crossplane.md b/docs/add-ons/crossplane.md deleted file mode 100644 index 67d9166815..0000000000 --- a/docs/add-ons/crossplane.md +++ /dev/null @@ -1,105 +0,0 @@ -# Crossplane -Crossplane is an open source Kubernetes add-on that enables platform teams to assemble infrastructure from multiple vendors, and expose higher level self-service APIs for application teams to consume, without having to write any code. - - - Crossplane is a control plane - - Allow engineers to model their infrastructure as declarative configuration - - Support managing a myriad of diverse infrastructure using "provider" plugins - - It's an open source tool with strong communities - -For complete project documentation, please visit the [Crossplane](https://crossplane.io/). - -## Usage - -### Crossplane Deployment - -Crossplane can be deployed by enabling the add-on via the following. Check out the full [example](https://github.com/awslabs/crossplane-on-eks/tree/main/bootstrap/terraform) to deploy the EKS Cluster with Crossplane. - -```hcl - enable_crossplane = true -``` - -You can optionally customize the Helm chart that deploys `Crossplane` via the following configuration. - -```hcl - enable_crossplane = true - - crossplane_helm_config = { - name = "crossplane" - chart = "crossplane" - repository = "https://charts.crossplane.io/stable/" - version = "1.10.1" # Get the lates version from https://github.com/crossplane/crossplane - namespace = "crossplane-system" - } -``` - -To install the [Upbound Universal Crossplane (UXP) helm chart](https://github.com/upbound/universal-crossplane/tree/main/cluster/charts/universal-crossplane) use the following configuration. 
- -```hcl - enable_crossplane = true #defaults to Upstream Crossplane Helm Chart - - crossplane_helm_config = { - name = "crossplane" - chart = "universal-crossplane" - repository = "https://charts.upbound.io/stable/" - version = "1.10.1" # Get the latest version from https://github.com/upbound/universal-crossplane - namespace = "upbound-system" - description = "Upbound Universal Crossplane (UXP)" - } -``` - - -### Crossplane Providers Deployment -This module provides options to deploy the following providers for Crossplane. These providers disabled by default, and it can be enabled using the config below. - - - [AWS Provider](https://github.com/crossplane/provider-aws) - - [Upbound AWS Provider](https://github.com/upbound/provider-aws) - - [Kubernetes Provider](https://github.com/crossplane-contrib/provider-kubernetes) - - [Helm Provider](https://github.com/crossplane-contrib/provider-helm) - - [Terrajet AWS Provider](https://github.com/crossplane-contrib/provider-jet-aws) - -_NOTE: Crossplane requires Admin like permissions to create and update resources similar to Terraform deploy role. -This example config uses AdministratorAccess, but you should select a policy with the minimum permissions required to provision your resources._ - -Config to deploy [AWS Provider](https://github.com/crossplane/provider-aws) -```hcl -# Creates ProviderConfig -> aws-provider -crossplane_aws_provider = { - enable = true -} -``` - -Config to deploy [Upbound AWS Provider](https://github.com/upbound/provider-aws) -```hcl -# Creates ProviderConfig -> upbound-aws-provider -crossplane_upbound_aws_provider = { - enable = true -} -``` - -Config to deploy [Terrajet AWS Provider (Deprecated)](https://github.com/crossplane-contrib/provider-jet-aws) -```hcl -# Creates ProviderConfig -> jet-aws-provider -crossplane_jet_aws_provider = { - enable = true - provider_aws_version = "v0.4.1" # Get the latest version from https://github.com/crossplane-contrib/provider-jet-aws - additional_irsa_policies = ["arn:aws:iam::aws:policy/AdministratorAccess"] -} -``` - -_NOTE: Crossplane requires cluster-admin permissions to create and update Kubernetes resources._ - -Config to deploy [Kubernetes provider](https://github.com/crossplane-contrib/provider-kubernetes) -```hcl -# Creates ProviderConfig -> kubernetes-provider -crossplane_kubernetes_provider = { - enable = true -} -``` - -Config to deploy [Helm Provider](https://github.com/crossplane-contrib/provider-helm) -```hcl -# Creates ProviderConfig -> helm-provider -crossplane_helm_provider = { - enable = true -} -``` diff --git a/docs/add-ons/crowdstrike-falcon.md b/docs/add-ons/crowdstrike-falcon.md deleted file mode 100644 index 7fe372f163..0000000000 --- a/docs/add-ons/crowdstrike-falcon.md +++ /dev/null @@ -1,23 +0,0 @@ -# CrowdStrike Falcon - -[`terraform-kubectl-falcon`](https://github.com/CrowdStrike/terraform-kubectl-falcon) is a Terraform module that can automate the deployment of CrowdStrike Falcon Sensor and the Kubernetes Protection Agent on a Kubernetes cluster. - -## Falcon Operator - -Falcon Operator is a Kubernetes operator that manages the deployment of the CrowdStrike Falcon Sensor on a Kubernetes cluster. The CrowdStrike Falcon Sensor provides runtime protection for workloads running on a Kubernetes cluster. - -More information can be found in the [Operator submodule](https://github.com/CrowdStrike/terraform-kubectl-falcon/blob/main/modules/operator/README.md). 
-
-## Kubernetes Protection Agent (KPA)
-
-The Kubernetes Protection Agent provides visibility into the cluster by collecting event information from the Kubernetes layer. These events are correlated with sensor events and cloud events to provide complete cluster visibility.
-
-More information can be found in the [KPA sub-module](https://github.com/CrowdStrike/terraform-kubectl-falcon/blob/main/modules/k8s-protection-agent/README.md).
-
-## Usage
-
-Refer to the [`terraform-kubectl-falcon`](https://github.com/CrowdStrike/terraform-kubectl-falcon) documentation for the most up-to-date information on inputs and outputs.
-
-## Example
-
-A full end-to-end example of using the `terraform-kubectl-falcon` module with `eks_blueprints` can be found in the [examples](https://github.com/CrowdStrike/terraform-kubectl-falcon/tree/v0.1.0/examples/aws-eks-blueprint-example) directory of the `terraform-kubectl-falcon` module.
diff --git a/docs/add-ons/csi-secrets-store-provider-aws.md b/docs/add-ons/csi-secrets-store-provider-aws.md
deleted file mode 100644
index 2c4c88f793..0000000000
--- a/docs/add-ons/csi-secrets-store-provider-aws.md
+++ /dev/null
@@ -1,12 +0,0 @@
-# secrets-store-csi-driver-provider-aws
-
-The AWS Secrets Manager and Config Provider for the Secrets Store CSI Driver allows you to fetch secret contents stored in AWS Secrets Manager and use the Secrets Store CSI driver interface to mount them into Kubernetes pods. For a detailed architectural overview, refer to [How to use AWS Secrets & Configuration Provider with your Kubernetes Secrets Store CSI driver](https://aws.amazon.com/blogs/security/how-to-use-aws-secrets-configuration-provider-with-kubernetes-secrets-store-csi-driver/).
-
-## Usage
-
-csi-secrets-store-provider-aws can be deployed by enabling the add-ons via the following.
-
-```hcl
-enable_secrets_store_csi_driver              = true
-enable_secrets_store_csi_driver_provider_aws = true
-```
diff --git a/docs/add-ons/datadog-operator.md b/docs/add-ons/datadog-operator.md
deleted file mode 100644
index 19626a5f27..0000000000
--- a/docs/add-ons/datadog-operator.md
+++ /dev/null
@@ -1,11 +0,0 @@
-# Datadog Operator
-The [Datadog Operator](https://github.com/DataDog/datadog-operator) is a Kubernetes add-on that can automate the deployment of a best-practice Datadog monitoring agent on a Kubernetes cluster.
-
-## Usage
-The Datadog Operator can be deployed by enabling the add-on via the following.
-
-```hcl
-enable_datadog_operator = true
-```
-
-Once the operator is provisioned, the Datadog Agent can be deployed by creating a `DatadogAgent` resource and supplying an API key.
diff --git a/docs/add-ons/external-dns.md b/docs/add-ons/external-dns.md
deleted file mode 100644
index 19e6752c72..0000000000
--- a/docs/add-ons/external-dns.md
+++ /dev/null
@@ -1,52 +0,0 @@
-# ExternalDNS
-
-[External DNS](https://github.com/kubernetes-sigs/external-dns) is a Kubernetes add-on that can automate the management of DNS records based on Ingress and Service resources.
-
-For complete project documentation, please visit the [External DNS GitHub repository](https://github.com/kubernetes-sigs/external-dns).
-
-## Usage
-
-External DNS can be deployed by enabling the add-on via the following.
-
-```hcl
-enable_external_dns = true
-```
-
-External DNS can optionally leverage the `eks_cluster_domain` global property of the `kubernetes_addon` submodule. The value for this property should be a Route53 domain managed by your account.
ExternalDNS will leverage the value supplied for its `zoneIdFilters` property, which will restrict ExternalDNS to only create records for this domain. See docs [here](https://github.com/bitnami/charts/tree/master/bitnami/external-dns). - -``` -eks_cluster_domain = -``` - -Alternatively, you can supply a list of Route53 zone ARNs which external-dns will have access to create/manage records: - -```hcl - external_dns_route53_zone_arns = [ - "arn:aws:route53::123456789012:hostedzone/Z1234567890" - ] -``` - -You can optionally customize the Helm chart that deploys `external-dns` via the following configuration. - -```hcl - enable_external_dns = true - external_dns_helm_config = { - name = "external-dns" - chart = "external-dns" - repository = "https://charts.bitnami.com/bitnami" - version = "6.1.6" - namespace = "external-dns" - } -``` - -### GitOps Configuration - -The following properties are made available for use when managing the add-on via GitOps. - -``` -external_dns = { - enable = true - zoneFilterIds = local.zone_filter_ids - serviceAccountName = local.service_account -} -``` diff --git a/docs/add-ons/external-secrets.md b/docs/add-ons/external-secrets.md deleted file mode 100644 index d3488eadac..0000000000 --- a/docs/add-ons/external-secrets.md +++ /dev/null @@ -1,37 +0,0 @@ - -# External Secrets Operator - -[External Secrets Operator](https://external-secrets.io/latest) is a Kubernetes operator that integrates external secret management systems like AWS Secrets Manager, HashiCorp Vault, Google Secrets Manager, Azure Key Vault and many more. The operator reads information from external APIs and automatically injects the values into a Kubernetes Secret. - -## Usage - -The External Secrets Operator can be deployed by enabling the add-on via the following. - -```hcl -enable_external_secrets = true -``` - -You can optionally customize the Helm chart that deploys the operator via the following configuration. - -```hcl - enable_external_secrets = true - external_secrets_helm_config = { - name = "external-secrets" - chart = "external-secrets" - repository = "https://charts.external-secrets.io/" - version = "0.5.9" - namespace = "external-secrets" - } -``` - -### GitOps Configuration - -The following properties are made available for use when managing the add-on via GitOps. - -Refer to [locals.tf](https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/main/modules/kubernetes-addons/external-secrets/locals.tf) for latest config. GitOps with ArgoCD Add-on repo is located [here](https://github.com/aws-samples/eks-blueprints-add-ons/blob/main/chart/values.yaml). - -```hcl - argocd_gitops_config = { - enable = true - } -``` diff --git a/docs/add-ons/fargate-fluent-bit.md b/docs/add-ons/fargate-fluent-bit.md deleted file mode 100644 index 2bbf826a04..0000000000 --- a/docs/add-ons/fargate-fluent-bit.md +++ /dev/null @@ -1,11 +0,0 @@ -## Fluent Bit for Fargate - -[Fluent Bit for Fargate](https://aws.amazon.com/blogs/containers/fluent-bit-for-amazon-eks-on-aws-fargate-is-here/) configures Fluent Bit to forward Fargate Container logs to CloudWatch. - -### Usage - -Fluent Bit for Fargate can be deployed by enabling the add-on via the following. 
- -```hcl -enable_fargate_fluentbit = true -``` diff --git a/docs/add-ons/gatekeeper.md b/docs/add-ons/gatekeeper.md deleted file mode 100644 index dbc3dd82ec..0000000000 --- a/docs/add-ons/gatekeeper.md +++ /dev/null @@ -1,43 +0,0 @@ -# Gatekeeper - -Gatekeeper is an admission controller that validates requests to create and update Pods on Kubernetes clusters, using the Open Policy Agent (OPA). Using Gatekeeper allows administrators to define policies with a constraint, which is a set of conditions that permit or deny deployment behaviors in Kubernetes. - -For complete project documentation, please visit the [Gatekeeper](https://open-policy-agent.github.io/gatekeeper/website/docs/). -For reference templates refer [Templates](https://github.com/open-policy-agent/gatekeeper/tree/master/charts/gatekeeper/templates) - -## Usage - -Gatekeeper can be deployed by enabling the add-on via the following. - -```hcl -enable_gatekeeper = true -``` - -You can optionally customize the Helm chart that deploys `Gatekeeper` via the following configuration. - -```hcl - enable_gatekeeper = true - # Optional gatekeeper_helm_config - gatekeeper_helm_config = { - name = "gatekeeper" - chart = "gatekeeper" - repository = "https://open-policy-agent.github.io/gatekeeper/charts" - version = "3.9.0" - namespace = "gatekeeper-system" - values = [ - <<-EOT - clusterName: ${var.eks_cluster_id} - EOT - ] - } -``` - -### GitOps Configuration -The following properties are made available for use when managing the add-on via GitOps. - -```hcl - argocd_gitops_config = { - enable = true - clusterName = var.eks_cluster_id - } -``` diff --git a/docs/add-ons/grafana.md b/docs/add-ons/grafana.md deleted file mode 100644 index 19076fb89d..0000000000 --- a/docs/add-ons/grafana.md +++ /dev/null @@ -1,53 +0,0 @@ -# Grafana - -[Grafana](https://github.com/grafana/grafana) is an open source platform for monitoring and observability. - -Grafana addon can be deployed with EKS blueprints in Amazon EKS server. -This add-on configures [Prometheus](https://grafana.com/docs/grafana/latest/datasources/prometheus/) and [CloudWatch](https://grafana.com/docs/grafana/latest/datasources/aws-cloudwatch/) data sources. -You can add more data sources using the [values.yaml](https://github.com/grafana/helm-charts/blob/main/charts/grafana/values.yaml) - -## Usage - -[Grafana](https://github.com/aws-ia/terraform-aws-eks-blueprints/tree/main/modules/kubernetes-addons/spark-k8s-operator) can be deployed by enabling the add-on via the following. This example shows the usage of the Secrets Manager to create a new secret for Grafana adminPassword. - -This option sets a default `adminPassword` by the helm chart which can be extracted from kubernetes `secrets` with the name as `grafana`. - -``` -enable_grafana = true -``` - -You can optionally customize the Helm chart that deploys `Grafana` via the following configuration. 
-Also, provide the `adminPassword` using set_sensitive values as shown in the example - -``` - enable_grafana = true - grafana_irsa_policies = [] # Optional to add additional policies to IRSA - -# Optional grafana_helm_config - grafana_helm_config = { - name = "grafana" - chart = "grafana" - repository = "https://grafana.github.io/helm-charts" - version = "6.32.1" - namespace = "grafana" - description = "Grafana Helm Chart deployment configuration" - values = [templatefile("${path.module}/values.yaml", {})] - set_sensitive = [ - { - name = "adminPassword" - value = "" - } - ] - } - -``` - -### GitOps Configuration - -The following properties are made available for use when managing the add-on via GitOps - -``` -grafana = { - enable = true -} -``` diff --git a/docs/add-ons/index.md b/docs/add-ons/index.md deleted file mode 100644 index 425da74426..0000000000 --- a/docs/add-ons/index.md +++ /dev/null @@ -1,104 +0,0 @@ -# Kubernetes Addons Module - -The [`kubernetes-addons`](https://github.com/aws-ia/terraform-aws-eks-blueprints/tree/main/modules/kubernetes-addons) module within EKS Blueprints allows you to configure the add-ons you would like deployed into you EKS cluster with simple **true/false** flags. - -The framework currently provides support for add-ons listed in the current folder. - -## Add-on Management - -The framework provides two approaches to managing add-on configuration for your EKS clusters. They are: - -1. Via Terraform by leveraging the [Terraform Helm provider](https://registry.terraform.io/providers/hashicorp/helm/latest/docs). -2. Via GitOps with [ArgoCD](https://argo-cd.readthedocs.io/en/stable/). - -### Terraform - -The default method for managing add-on configuration is via Terraform. By default, each individual add-on module will do the following: - -1. Create any AWS resources needed to support add-on functionality. -2. Deploy a Helm chart into your EKS cluster by leveraging the Terraform Helm provider. - -In order to deploy an add-on with default configuration, simply enable the add-on via Terraform properties. - -```hcl -module "eks_blueprints_kubernetes_addons" { - source = "github.com/aws-ia/terraform-aws-eks-blueprints//modules/kubernetes-addons" - - eks_cluster_id = - - # EKS Addons - - enable_amazon_eks_aws_ebs_csi_driver = true - enable_amazon_eks_coredns = true - enable_amazon_eks_kube_proxy = true - enable_amazon_eks_vpc_cni = true - - #K8s Add-ons - enable_argocd = true - enable_aws_for_fluentbit = true - enable_aws_load_balancer_controller = true - enable_cluster_autoscaler = true - enable_metrics_server = true -} -``` - -To customize the behavior of the Helm charts that are ultimately deployed, you can supply custom Helm configuration. The following demonstrates how you can supply this configuration, including a dedicated `values.yaml` file. - -```hcl -enable_metrics_server = true -metrics_server_helm_config = { - name = "metrics-server" - repository = "https://kubernetes-sigs.github.io/metrics-server/" - chart = "metrics-server" - version = "3.8.1" - namespace = "kube-system" - timeout = "1200" - - # (Optional) Example to pass values.yaml from your local repo - values = [templatefile("${path.module}/values.yaml", { - operating_system = "linux" - })] -} -``` - -Each add-on module is configured to fetch Helm Charts from Open Source, public Helm repositories and Docker images from Docker Hub/Public ECR repositories. This requires outbound Internet connection from your EKS Cluster. 
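If outbound Internet access is restricted, the chart source for any individual add-on can also be pointed at a registry you control through the same `*_helm_config` override shown above. The sketch below is illustrative only: the account, region, and registry in the `repository` URL are placeholders, and it assumes the chart has already been mirrored to that private OCI registry.

```hcl
enable_metrics_server = true

metrics_server_helm_config = {
  name      = "metrics-server"
  chart     = "metrics-server"
  version   = "3.8.1"
  namespace = "kube-system"

  # Placeholder: a hypothetical private ECR registry holding a mirrored copy of the chart
  repository = "oci://111122223333.dkr.ecr.us-west-2.amazonaws.com"
}
```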
-
-If you would like to use private repositories, you can download the Docker images for each add-on and push them to an AWS ECR repository. ECR can be accessed from within an existing private VPC using an ECR VPC endpoint. For instructions on how to download existing images and push them to ECR, see [ECR instructions](../advanced/ecr-instructions.md).
-
-### GitOps with ArgoCD
-
-To indicate that you would like to manage add-ons via ArgoCD, you must do the following:
-
-1. Enable the ArgoCD add-on by setting `enable_argocd` to `true`.
-2. Specify that you would like ArgoCD to be responsible for deploying your add-ons by setting `argocd_manage_add_ons` to `true`. This will prevent the individual Terraform add-on modules from deploying Helm charts.
-3. Pass Application configuration for your add-ons repository via the `argocd_applications` property.
-
-Note that the `add_on_application` flag in your `Application` configuration must be set to `true`.
-
-```hcl
-enable_argocd         = true
-argocd_manage_add_ons = true
-argocd_applications = {
-  infra = {
-    namespace          = "argocd"
-    path               = ""
-    repo_url           = ""
-    values             = {}
-    add_on_application = true # Indicates the root add-on application.
-  }
-}
-```
-
-#### GitOps Bridge
-
-When managing add-ons via ArgoCD, certain AWS resources may still need to be created via Terraform in order to support add-on functionality (e.g. IAM roles and service accounts). Certain resource values will also need to be passed from Terraform to ArgoCD via the ArgoCD Application resource's values map. We refer to this concept as the `GitOps Bridge`.
-
-To ensure that the AWS resources needed for add-on functionality are created, you still need to indicate in your Terraform configuration which add-ons will be managed via ArgoCD. To do so, simply enable the add-ons via their boolean properties.
-
-```
-enable_metrics_server     = true # Deploys Metrics Server Addon
-enable_cluster_autoscaler = true # Deploys Cluster Autoscaler Addon
-enable_prometheus         = true # Deploys Prometheus Addon
-```
-
-This will indicate to each add-on module that it should create the necessary AWS resources and pass the relevant values to the ArgoCD Application resource via the Application's values map.
diff --git a/docs/add-ons/kafka.md b/docs/add-ons/kafka.md
deleted file mode 100644
index 0206a96381..0000000000
--- a/docs/add-ons/kafka.md
+++ /dev/null
@@ -1,38 +0,0 @@
-# Strimzi Operator for Apache Kafka
-[Apache Kafka](https://kafka.apache.org/intro) is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.
-This add-on deploys the Strimzi Kafka Operator, which makes it easy to spin up a Kafka cluster in minutes.
-
-For complete project documentation, please visit the [Strimzi Kafka](https://strimzi.io/) documentation site.
-
-## Usage
-The Strimzi Kafka Operator can be deployed by enabling the add-on via the following.
-
-```hcl
-enable_strimzi_kafka_operator = true
-```
-
-You can optionally customize the Helm chart that deploys `Kafka` via the following configuration.
- -```hcl - enable_strimzi_kafka_operator = true - # Optional kafka_helm_config - strimzi_kafka_operator_helm_config = { - name = local.name - chart = "strimzi-kafka-operator" - repository = "https://strimzi.io/charts/" - version = "0.31.1" - namespace = local.name - create_namespace = true - values = [templatefile("${path.module}/values.yaml", {})] - description = "Strimzi - Apache Kafka on Kubernetes" - } -``` - -### GitOps Configuration -The following properties are made available for use when managing the add-on via GitOps. - -```hcl -strimziKafkaOperator = { - enable = true -} -``` diff --git a/docs/add-ons/karpenter.md b/docs/add-ons/karpenter.md deleted file mode 100644 index 4ec6777e1d..0000000000 --- a/docs/add-ons/karpenter.md +++ /dev/null @@ -1,54 +0,0 @@ -# Karpenter - -Karpenter is an open-source node provisioning project built for Kubernetes. Karpenter automatically launches just the right compute resources to handle your cluster's applications. It is designed to let you take full advantage of the cloud with fast and simple compute provisioning for Kubernetes clusters. - -For complete project documentation, please visit the [Karpenter documentation](https://karpenter.sh/docs/getting-started/). - -## Usage - -Karpenter can be deployed by enabling the add-on via the following. Check out the full [example](https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/main/modules/kubernetes-addons/karpenter/locals.tf) to deploy the EKS Cluster with Karpenter. - -```hcl -enable_karpenter = true -``` - -You can optionally customize the Helm chart that deploys `Karpenter` via the following configuration. - -```hcl - enable_karpenter = true - # Queue optional for native handling of instance termination events - karpenter_sqs_queue_arn = "arn:aws:sqs:us-west-2:444455556666:queue1" - # Optional to add name prefix for Karpenter's event bridge rules - karpenter_event_rule_name_prefix = "Karpenter" - # Optional karpenter_helm_config - karpenter_helm_config = { - name = "karpenter" - chart = "karpenter" - repository = "https://charts.karpenter.sh" - version = "0.19.3" - namespace = "karpenter" - values = [templatefile("${path.module}/values.yaml", { - eks_cluster_id = var.eks_cluster_id, - eks_cluster_endpoint = var.eks_cluster_endpoint, - service_account = var.service_account, - operating_system = "linux" - })] - } - - karpenter_irsa_policies = [] # Optional to add additional policies to IRSA -``` - -### GitOps Configuration -The following properties are made available for use when managing the add-on via GitOps. - -Refer to [locals.tf](https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/main/modules/kubernetes-addons/karpenter/locals.tf) for latest config. GitOps with ArgoCD Add-on repo is located [here](https://github.com/aws-samples/eks-blueprints-add-ons/blob/main/chart/values.yaml) - -```hcl - argocd_gitops_config = { - enable = true - serviceAccountName = local.service_account - controllerClusterName = var.eks_cluster_id - controllerClusterEndpoint = local.eks_cluster_endpoint - awsDefaultInstanceProfile = var.node_iam_instance_profile - } -``` diff --git a/docs/add-ons/keda.md b/docs/add-ons/keda.md deleted file mode 100644 index 75b71074c6..0000000000 --- a/docs/add-ons/keda.md +++ /dev/null @@ -1,42 +0,0 @@ -# KEDA - -KEDA is a Kubernetes-based Event Driven Autoscaler. With KEDA, you can drive the scaling of any container in Kubernetes based on the number of events needing to be processed. 
-
-KEDA is a single-purpose and lightweight component that can be added to any Kubernetes cluster. KEDA works alongside standard Kubernetes components like the Horizontal Pod Autoscaler and can extend functionality without overwriting or duplication. With KEDA, you can explicitly choose which apps you want to scale in an event-driven way, while other apps continue to function as before. This makes KEDA a flexible and safe option to run alongside any number of other Kubernetes applications or frameworks.
-
-The [KEDA](https://github.com/kedacore/charts/tree/main/keda) chart bootstraps the KEDA infrastructure on a Kubernetes cluster using the Helm package manager.
-
-For complete project documentation, please visit the [KEDA documentation site](https://keda.sh/).
-
-## Usage
-
-KEDA can be deployed by enabling the add-on via the following.
-
-```hcl
-enable_keda = true
-```
-
-Deploy KEDA with a custom `values.yaml`
-
-```hcl
-  # Optional Map value; pass keda-values.yaml from consumer module
-  keda_helm_config = {
-    name       = "keda"                              # (Required) Release name.
-    repository = "https://kedacore.github.io/charts" # (Optional) Repository URL where to locate the requested chart.
-    chart      = "keda"                              # (Required) Chart name to be installed.
-    version    = "2.6.2"                             # (Optional) Specify the exact chart version to install. If this is not specified, it defaults to the version set within default_helm_config: https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/main/modules/kubernetes-addons/keda/locals.tf
-    namespace  = "keda"                              # (Optional) The namespace to install the release into.
-    values     = [templatefile("${path.module}/keda-values.yaml", {})]
-  }
-```
-
-### GitOps Configuration
-
-The following properties are made available for use when managing the add-on via GitOps.
-
-```
-keda = {
-  enable = true
-  serviceAccountName = ""
-}
-```
diff --git a/docs/add-ons/kube-prometheus-stack.md b/docs/add-ons/kube-prometheus-stack.md
deleted file mode 100644
index b6e935ed44..0000000000
--- a/docs/add-ons/kube-prometheus-stack.md
+++ /dev/null
@@ -1,53 +0,0 @@
-# kube-prometheus-stack
-[kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack) is a collection of Kubernetes manifests, Grafana dashboards, and Prometheus rules combined with documentation and scripts to provide easy-to-operate, end-to-end Kubernetes cluster monitoring with Prometheus using the Prometheus Operator.
-
-Components installed by default by this chart:
-
- - [The Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator)
- - Highly available [Prometheus](https://github.com/prometheus/prometheus)
 - Highly available [Alertmanager](https://github.com/prometheus/alertmanager)
 - [Prometheus node-exporter](https://github.com/prometheus/node_exporter)
 - [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics)
 - [Grafana](https://github.com/grafana/grafana)
-
-## Usage
-
-The default values.yaml file for this add-on disables the components that are unreachable in EKS environments and configures an EBS volume for persistent storage.
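As with the other add-ons in this module, a minimal deployment should only require the enable flag; the override example that follows builds on it (the flag name below is taken from that example).

```hcl
# Deploys kube-prometheus-stack with the add-on's default values.yaml
enable_kube_prometheus_stack = true
```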
- -You can override the defaults using the `set` helm_config key, and set the admin password with `set_sensitive`: - -```hcl - enable_kube_prometheus_stack = true - kube_prometheus_stack_helm_config = { - set = [ - { - name = "kubeProxy.enabled" - value = false - } - ], - set_sensitive = [ - { - name = "grafana.adminPassword" - value = data.aws_secretsmanager_secret_version.admin_password_version.secret_string - } - ] - } -``` - -## Upgrading the Chart - -Be aware that it is likely necessary to update the CRDs when updating the Chart version. Refer to the Project documentation on upgrades for your specific versions: https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack#upgrading-chart - - -For complete project documentation, please visit the [kube-prometheus-stack Github repository](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack). - -### GitOps Configuration - -The following properties are made available for use when managing the add-on via GitOps. - -```hcl -kubePrometheusStack = { - enable = true -} -``` diff --git a/docs/add-ons/kube-state-metrics.md b/docs/add-ons/kube-state-metrics.md deleted file mode 100644 index 6d64b7422e..0000000000 --- a/docs/add-ons/kube-state-metrics.md +++ /dev/null @@ -1,42 +0,0 @@ -# Kube-State-Metrics - -[kube-state-metrics](https://github.com/kubernetes/kube-state-metrics) (KSM) is a simple service that listens to the Kubernetes API server and generates metrics about the state of the objects. - -The metrics are exported on the HTTP endpoint /metrics on the listening port (default 8080). They are served as plaintext. They are designed to be consumed either by Prometheus itself or by a scraper that is compatible with scraping a Prometheus client endpoint. - -This add-on is implemented as an external add-on. For detailed documentation and usage of the add-on please refer to the add-on [repository](https://github.com/askulkarni2/terraform-eksblueprints-kube-state-metrics-addon). - -## Usage - -The following will deploy the KSM into an EKS Cluster. - -```hcl-terraform -enable_kube_state_metrics = true -``` - -Enable KSM with custom `values.yaml` - -```hcl-terraform - enable_kube_state_metrics = true - - # Optional Map value - kube_state_metrics_helm_config = { - name = "kube-state-metrics" # (Required) Release name. - repository = "https://prometheus-community.github.io/helm-charts" # (Optional) Repository URL where to locate the requested chart. - chart = "kube-state-metrics" # (Required) Chart name to be installed. - version = "4.5.0" - namespace = "kube-state-metrics" - values = [templatefile("${path.module}/values.yaml", {}})] - } -``` - -### GitOps Configuration - -The following properties are made available for use when managing the add-on via GitOps - -```hcl-terraform -argocd_gitops_config = { - enable = true - serviceAccountName = local.service_account -} -``` diff --git a/docs/add-ons/kubecost.md b/docs/add-ons/kubecost.md deleted file mode 100644 index 0f97210c5a..0000000000 --- a/docs/add-ons/kubecost.md +++ /dev/null @@ -1,41 +0,0 @@ -# Kubecost - -Kubecost provides real-time cost visibility and insights for teams using Kubernetes, helping you continuously reduce your cloud costs. -Amazon EKS supports Kubecost, which you can use to monitor your costs broken down by Kubernetes resources including pods, nodes, namespaces, and labels. 
-[Cost monitoring](https://docs.aws.amazon.com/eks/latest/userguide/cost-monitoring.html) docs provides steps to bootstrap Kubecost infrastructure on a EKS cluster using the Helm package manager. - -For complete project documentation, please visit the [Kubecost documentation site](https://www.kubecost.com/). - -Note: If your cluster is version 1.23 or later, you must have the [Amazon EBS CSI driver](https://docs.aws.amazon.com/eks/latest/userguide/ebs-csi.html) installed on your cluster. - -## Usage - -Kubecost can be deployed by enabling the add-on via the following. - -```hcl -enable_kubecost = true -``` - -Deploy Kubecost with custom `values.yaml` - -```hcl - # Optional Map value; pass kubecost-values.yaml from consumer module - kubecost_helm_config = { - name = "kubecost" # (Required) Release name. - repository = "oci://public.ecr.aws/kubecost" # (Optional) Repository URL where to locate the requested chart. - chart = "cost-analyzer" # (Required) Chart name to be installed. - version = "1.103.3" # (Optional) Specify the exact chart version to install. If this is not specified, it defaults to the version set within default_helm_config: https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/main/modules/kubernetes-addons/kubecost/locals.tf - namespace = "kubecost" # (Optional) The namespace to install the release into. - values = [templatefile("${path.module}/kubecost-values.yaml", {})] - } -``` - -### GitOps Configuration - -The following properties are made available for use when managing the add-on via GitOps. - -```sh -kubecost = { - enable = true -} -``` diff --git a/docs/add-ons/kuberay-operator.md b/docs/add-ons/kuberay-operator.md deleted file mode 100644 index 47f5478789..0000000000 --- a/docs/add-ons/kuberay-operator.md +++ /dev/null @@ -1,24 +0,0 @@ -# KubeRay Operator - -[KubeRay](https://github.com/ray-project/kuberay) is an open source toolkit to run [Ray](https://www.ray.io/) applications on Kubernetes. For details on its design, please refer to the KubeRay [documentation](https://ray-project.github.io/kuberay/). - -> 🛑 This add-on should be considered as experimental and should only be used for proof of concept. - - -## Usage - -KubeRay operator can be deployed by enabling the add-on via the following. - -### Basic Example - -```hcl -enable_kuberay_operator = true -``` - -### Advanced Example - -Advanced example of KubeRay operator add-on is not currently supported as the upstream project does not publish a [Helm chart yet]. Please 👍 this [issue](https://github.com/ray-project/kuberay/issues/475). - -### GitOps Configuration - -GitOps is not currently supported due to lack of a published Helm chart upstream. Please 👍 this [issue](https://github.com/ray-project/kuberay/issues/475). diff --git a/docs/add-ons/kubernetes-dashboard.md b/docs/add-ons/kubernetes-dashboard.md deleted file mode 100644 index e78cc0f4da..0000000000 --- a/docs/add-ons/kubernetes-dashboard.md +++ /dev/null @@ -1,42 +0,0 @@ -# Kubernetes Dashboard - -[Kubernetes Dashboard](https://github.com/kubernetes/dashboard) is a general purpose, web-based UI for Kubernetes clusters. It allows users to manage applications running in the cluster and troubleshoot them, as well as manage the cluster itself. - -## Usage - -The following will deploy the Kubernetes Dashboard into an EKS Cluster. 
- -```hcl-terraform -enable_kubernetes_dashboard = true -``` - -Enable Kubernetes Dashboard with custom `values.yaml` - -```hcl-terraform - enable_kubernetes_dashboard = true - - # Optional Map value - kubernetes_dashboard_helm_config = { - name = "kubernetes-dashboard" # (Required) Release name. - repository = "https://kubernetes.github.io/dashboard/" # (Optional) Repository URL where to locate the requested chart. - chart = "kubernetes-dashboard" # (Required) Chart name to be installed. - version = "5.2.0" - namespace = "kube-system" - values = [templatefile("${path.module}/values.yaml", {})] - } -``` - -### GitOps Configuration - -The following properties are made available for use when managing the add-on via GitOps - -```hcl-terraform -argocd_gitops_config = { - enable = true - serviceAccountName = local.service_account -} -``` - -### Connecting to the Dashboard - -Follow the steps outlined [here](https://docs.aws.amazon.com/eks/latest/userguide/dashboard-tutorial.html#view-dashboard) to connect to the dashboard diff --git a/docs/add-ons/kyverno.md b/docs/add-ons/kyverno.md deleted file mode 100644 index 0b4aa52f65..0000000000 --- a/docs/add-ons/kyverno.md +++ /dev/null @@ -1,36 +0,0 @@ -# Kyverno - -Kyverno is a policy engine that can help kubernetes clusters to enforce security and governance policies. - -This addon provides support for: -1. [Kyverno](https://github.com/kyverno/kyverno/tree/main/charts/kyverno) -2. [Kyverno policies](https://github.com/kyverno/kyverno/tree/main/charts/kyverno-policies) -3. [Kyverno policy reporter](https://github.com/kyverno/policy-reporter/tree/main/charts/policy-reporter) - -## Usage - -Kyverno can be deployed by enabling the respective add-on(s) via the following. - -```hcl -enable_kyverno = true -enable_kyverno_policies = true -enable_kyverno_policy_reporter = true -``` - -### GitOps Configuration - -The following properties are made available for use when managing the add-on via GitOps. - -```sh -kyverno = { - enable = true -} - -kyverno_policies = { - enable = true -} - -kyverno_policy_reporter = { - enable = true -} -``` diff --git a/docs/add-ons/local-volume-provisioner.md b/docs/add-ons/local-volume-provisioner.md deleted file mode 100644 index 3b3dce9fd8..0000000000 --- a/docs/add-ons/local-volume-provisioner.md +++ /dev/null @@ -1,23 +0,0 @@ -# Local volume provisioner - -[Local volume provisioner](https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner) manages PersistentVolume lifecycle for pre-allocated disks by detecting and creating PVs for each local disk on the host, and cleaning up the disks when released - - -## Usage - -Local volume provisioner can be deployed by enabling the add-on via the following. - -```hcl -enable_local_volume_provisioner = true -``` - -Deploy Local volume provisioner with custom `values.yaml` - -```hcl - # Optional Map value; pass local-volume-provisioner-values.yaml from consumer module - local_volume_provisioner_helm_config = { - name = "local-static-provisioner" # (Required) Release name. - namespace = "local-static-provisioner" # (Optional) The namespace to install the release into. 
- values = [templatefile("${path.module}/local-volume-provisioner-values.yaml", {})] - } -``` diff --git a/docs/add-ons/managed-add-ons.md b/docs/add-ons/managed-add-ons.md deleted file mode 100644 index 9450b3d842..0000000000 --- a/docs/add-ons/managed-add-ons.md +++ /dev/null @@ -1,96 +0,0 @@ -# Amazon EKS Add-ons - -[Amazon EKS add-ons](https://docs.aws.amazon.com/eks/latest/userguide/eks-add-ons.html) provide installation and management of a curated set of add-ons for Amazon EKS clusters. All Amazon EKS add-ons include the latest security patches, bug fixes, and are validated by AWS to work with Amazon EKS. Amazon EKS add-ons allow you to consistently ensure that your Amazon EKS clusters are secure and stable and reduce the amount of work that you need to do in order to install, configure, and update add-ons. - -EKS currently provides support for the following managed add-ons. - -| Name | Description | -|------|-------------| -| [Amazon VPC CNI](https://docs.aws.amazon.com/eks/latest/userguide/managing-vpc-cni.html) | Native VPC networking for Kubernetes pods. | -| [CoreDNS](https://docs.aws.amazon.com/eks/latest/userguide/managing-coredns.html) | A flexible, extensible DNS server that can serve as the Kubernetes cluster DNS. | -| [kube-proxy](https://docs.aws.amazon.com/eks/latest/userguide/managing-kube-proxy.html) | Enables network communication to your pods. | -| [Amazon EBS CSI](https://docs.aws.amazon.com/eks/latest/userguide/managing-ebs-csi.html) | Manage the Amazon EBS CSI driver as an Amazon EKS add-on. | - -EKS managed add-ons can be enabled via the following. - -Note: EKS managed Add-ons can be converted to self-managed add-on with `preserve` field. -`preserve=true` option removes Amazon EKS management of any settings and the ability for Amazon EKS to notify you of updates and automatically update the Amazon EKS add-on after you initiate an update, but preserves the add-on's software on your cluster. -This option makes the add-on a self-managed add-on, rather than an Amazon EKS add-on. -There is no downtime while deleting EKS managed Add-ons when `preserve=true`. This is a default option for `enable_amazon_eks_vpc_cni` , `enable_amazon_eks_coredns` and `enable_amazon_eks_kube_proxy`. - -Checkout this [doc](https://docs.aws.amazon.com/eks/latest/userguide/managing-vpc-cni.html#updating-vpc-cni-eks-add-on) for more details. Custom add-on configuration can be passed using [configuration_values](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/eks_addon) as a single JSON string while creating or updating the add-on. 
- -``` -# EKS Addons - enable_amazon_eks_vpc_cni = true # default is false - #Optional - amazon_eks_vpc_cni_config = { - addon_name = "vpc-cni" - addon_version = "v1.11.2-eksbuild.1" - service_account = "aws-node" - resolve_conflicts = "OVERWRITE" - namespace = "kube-system" - service_account_role_arn = "" - preserve = true - additional_iam_policies = [] - configuration_values = jsonencode({ - env = { - ENABLE_PREFIX_DELEGATION = "true" - WARM_PREFIX_TARGET = "1" - } - }) - tags = {} - } - - enable_amazon_eks_coredns = true # default is false - #Optional - amazon_eks_coredns_config = { - addon_name = "coredns" - addon_version = "v1.8.4-eksbuild.1" - service_account = "coredns" - resolve_conflicts = "OVERWRITE" - namespace = "kube-system" - service_account_role_arn = "" - preserve = true - additional_iam_policies = [] - configuration_values = "" - tags = {} - } - - enable_amazon_eks_kube_proxy = true # default is false - #Optional - amazon_eks_kube_proxy_config = { - addon_name = "kube-proxy" - addon_version = "v1.21.2-eksbuild.2" - service_account = "kube-proxy" - resolve_conflicts = "OVERWRITE" - namespace = "kube-system" - service_account_role_arn = "" - preserve = true - additional_iam_policies = [] - configuration_values = "" - tags = {} - } - - enable_amazon_eks_aws_ebs_csi_driver = true # default is false - #Optional - amazon_eks_aws_ebs_csi_driver_config = { - addon_name = "aws-ebs-csi-driver" - addon_version = "v1.4.0-eksbuild.preview" - service_account = "ebs-csi-controller-sa" - resolve_conflicts = "OVERWRITE" - namespace = "kube-system" - additional_iam_policies = [] - service_account_role_arn = "" - configuration_values = "" - tags = {} - } -``` - -## Updating Managed Add-ons - -EKS won't modify any of your Kubernetes add-ons when you update a cluster to a newer Kubernetes version. As a result, it is important to upgrade EKS add-ons each time you upgrade an EKS cluster. - -Our [Cluster Upgrade](../advanced/cluster-upgrades.md) guide demonstrates how you can leverage this framework to upgrade your EKS cluster in addition to the EKS managed add-ons running in each cluster. - -Additional information on updating a EKS cluster can be found in the [EKS documentation](https://docs.aws.amazon.com/eks/latest/userguide/update-cluster.html). diff --git a/docs/add-ons/metrics-server.md b/docs/add-ons/metrics-server.md deleted file mode 100644 index dfd5f7c981..0000000000 --- a/docs/add-ons/metrics-server.md +++ /dev/null @@ -1,32 +0,0 @@ -# Metrics Server - -Metrics Server is a scalable, efficient source of container resource metrics for Kubernetes built-in autoscaling pipelines. It is not deployed by default in Amazon EKS clusters. The Metrics Server is commonly used by other Kubernetes add-ons, such as the Horizontal Pod Autoscaler, Vertical Autoscaling or the Kubernetes Dashboard. - -> **Important**: Don't use Metrics Server when you need an accurate source of resource usage metrics or as a monitoring solution. - -## Usage - -[Metrics Server](https://github.com/aws-ia/terraform-aws-eks-blueprints/tree/main/modules/kubernetes-addons/metrics-server) can be deployed by enabling the add-on via the following. - -```hcl -enable_metrics_server = true -``` - -Once deployed, you can see metrics-server pod in the `kube-system` namespace. 
- -```sh -$ kubectl get deployments -n kube-system - -NAME READY UP-TO-DATE AVAILABLE AGE -metrics-server 1/1 1 1 20m -``` - -### GitOps Configuration - -The following properties are made available for use when managing the add-on via GitOps - -``` -metricsServer = { - enable = true -} -``` diff --git a/docs/add-ons/nginx.md b/docs/add-ons/nginx.md deleted file mode 100644 index d5ab3354b5..0000000000 --- a/docs/add-ons/nginx.md +++ /dev/null @@ -1,51 +0,0 @@ -# Nginx - -This add-on installs [Nginx Ingress Controller](https://kubernetes.github.io/ingress-nginx/deploy/) on Amazon EKS. The Nginx ingress controller uses [Nginx](https://www.nginx.org/) as a reverse proxy and load balancer. - -Other than handling Kubernetes ingress objects, this ingress controller can facilitate multi-tenancy and segregation of workload ingresses based on host name (host-based routing) and/or URL Path (path based routing). - -## Usage - -Nginx Ingress Controller can be deployed by enabling the add-on via the following. - -```hcl -enable_ingress_nginx = true -``` - -To validate that installation is successful run the following command: - -```sh -$ kubectl get po -n kube-system -NAME READY STATUS RESTARTS AGE -eks-blueprints-addon-ingress-nginx-78b8567p4q6 1/1 Running 0 4d10h -``` - -Note that the ingress controller is deployed in the `ingress-nginx` namespace. - -You can optionally customize the Helm chart that deploys `nginx` via the following configuration. - -```hcl - enable_ingress_nginx = true - - # Optional ingress_nginx_helm_config - ingress_nginx_helm_config = { - repository = "https://kubernetes.github.io/ingress-nginx" - version = "4.0.17" - values = [file("${path.module}/values.yaml")] - } - - nginx_irsa_policies = [] # Optional to add additional policies to IRSA -``` - -### GitOps Configuration - -The following properties are made available for use when managing the add-on via GitOps. - -GitOps with ArgoCD Add-on repo is located [here](https://github.com/aws-samples/eks-blueprints-add-ons/blob/main/chart/values.yaml) - -``` hcl -argocd_gitops_config = { - enable = true - serviceAccountName = local.service_account - } -``` diff --git a/docs/add-ons/nvidia-device-plugin.md b/docs/add-ons/nvidia-device-plugin.md deleted file mode 100644 index 6e30fad557..0000000000 --- a/docs/add-ons/nvidia-device-plugin.md +++ /dev/null @@ -1,48 +0,0 @@ -# NVIDIA Device Plugin - -The NVIDIA device plugin for Kubernetes is a Daemonset that allows you to automatically: - -* Expose the number of GPUs on each nodes of your cluster -* Keep track of the health of your GPUs -* Run GPU enabled containers in your Kubernetes cluster. - - -For complete project documentation, please visit the [NVIDIA Device Plugin](https://github.com/NVIDIA/k8s-device-plugin#readme). - -Additionally, refer to this AWS [blog](https://aws.amazon.com/blogs/compute/running-gpu-accelerated-kubernetes-workloads-on-p3-and-p2-ec2-instances-with-amazon-eks/) for more information on how the add-on can be tested. - -## Usage - -NVIDIA device plugin can be deployed by enabling the add-on via the following. - -```hcl -enable_nvidia_device_plugin = true -``` - -You can optionally customize the Helm chart via the following configuration. 
- -```hcl - enable_nvidia_device_plugin = true - # Optional nvidia_device_plugin_helm_config - nvidia_device_plugin_helm_config = { - name = "nvidia-device-plugin" - chart = "nvidia-device-plugin" - repository = "https://nvidia.github.io/k8s-device-plugin" - version = "0.12.3" - namespace = "nvidia-device-plugin" - values = [templatefile("${path.module}/values.yaml", { - ... - })] - } -``` - -### GitOps Configuration -The following properties are made available for use when managing the add-on via GitOps. - -Refer to [locals.tf](https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/main/modules/kubernetes-addons/nvidia-device-plugin/locals.tf) for latest config. GitOps with ArgoCD Add-on repo is located [here](https://github.com/aws-samples/eks-blueprints-add-ons/blob/main/chart/values.yaml) - -```hcl - argocd_gitops_config = { - enable = true - } -``` diff --git a/docs/add-ons/portworx.md b/docs/add-ons/portworx.md deleted file mode 100644 index e51df974a8..0000000000 --- a/docs/add-ons/portworx.md +++ /dev/null @@ -1,135 +0,0 @@ -# Portworx add-on for EKS Blueprints - -## Introduction - -[Portworx](https://portworx.com/) is a Kubernetes data services platform that provides persistent storage, data protection, disaster recovery, and other capabilities for containerized applications. This blueprint installs Portworx on Amazon Elastic Kubernetes Service (EKS) environment. - -- [Helm chart](https://github.com/portworx/helm) - -## Requirements - -For the add-on to work, Portworx needs additional permission to AWS resources which can be provided in the following way. - -Note: Portworx currently does not support obtaining these permissions with an IRSA. Its support will be added with future releases. - -### Creating the required IAM policy resource - -1. Add the below code block in your terraform script to create a policy with the required permissions. Make a note of the resource name for the policy you created: - -``` -resource "aws_iam_policy" "" { - name = "" - - policy = jsonencode({ - Version = "2012-10-17" - Statement = [ - { - Action = [ - "ec2:AttachVolume", - "ec2:ModifyVolume", - "ec2:DetachVolume", - "ec2:CreateTags", - "ec2:CreateVolume", - "ec2:DeleteTags", - "ec2:DeleteVolume", - "ec2:DescribeTags", - "ec2:DescribeVolumeAttribute", - "ec2:DescribeVolumesModifications", - "ec2:DescribeVolumeStatus", - "ec2:DescribeVolumes", - "ec2:DescribeInstances", - "autoscaling:DescribeAutoScalingGroups" - ] - Effect = "Allow" - Resource = "*" - }, - ] - }) -} -``` - -2. Run `terraform apply` command for the policy (replace it with your resource name): - -```bash -terraform apply -target="aws_iam_policy." -``` -3. Attach the newly created AWS policy ARN to the node groups in your cluster: - -``` - managed_node_groups = { - node_group_1 = { - node_group_name = "my_node_group_1" - instance_types = ["t2.small"] - min_size = 3 - max_size = 3 - subnet_ids = module.vpc.private_subnets - - #Add this line to the code block or add the new policy ARN to the list if it already exists - additional_iam_policies = [aws_iam_policy..arn] - - } - } -``` -4. Run the command below to apply the changes. (This step can be performed even if the cluster is up and running. The policy attachment happens without having to restart the nodes) -```bash -terraform apply -target="module.eks_blueprints" -``` - - -## Usage - -After completing the requirement step, installing Portworx is simple, set ```enable_portworx``` variable to true inside the Kubernetes add-on module. 
- -``` - enable_portworx = true -``` - -To customize Portworx installation, pass the configuration values as shown below: - -``` - enable_portworx = true - - portworx_helm_config = { - set = [ - { - name = "clusterName" - value = "testCluster" - }, - { - name = "imageVersion" - value = "2.11.1" - } - ] - } - -} -``` - -## Portworx Configuration - -The following tables lists the configurable parameters of the Portworx chart and their default values. - -| Parameter | Description | Default | -|-----------|-------------| --------| -| `imageVersion` | The image tag to pull | "2.11.0" | -| `useAWSMarketplace` | Set this variable to true if you wish to use AWS marketplace license for Portworx | "false" | -| `clusterName` | Portworx Cluster Name| portworx-\ | -| `drives` | Semi-colon separated list of drives to be used for storage. (example: "/dev/sda;/dev/sdb" or "type=gp2,size=200;type=gp3,size=500") | "type=gp2,size=200"| -| `useInternalKVDB` | boolean variable to set internal KVDB on/off | true | -| `kvdbDevice` | specify a separate device to store KVDB data, only used when internalKVDB is set to true | type=gp2,size=150 | -| `envVars` | semi-colon-separated list of environment variables that will be exported to portworx. (example: MYENV1=val1;MYENV2=val2) | "" | -| `maxStorageNodesPerZone` | The maximum number of storage nodes desired per zone| 3 | -| `useOpenshiftInstall` | boolean variable to install Portworx on Openshift .| false | -| `etcdEndPoint` | The ETCD endpoint. Should be in the format etcd:http://(your-etcd-endpoint):2379. If there are multiple etcd endpoints they need to be ";" separated. | "" | -| `dataInterface` | Name of the interface .| none | -| `managementInterface` | Name of the interface .| none | -| `useStork` | [Storage Orchestration for Hyperconvergence](https://github.com/libopenstorage/stork).| true | -| `storkVersion` | Optional: version of Stork. For eg: 2.11.0, when it's empty Portworx operator will pick up version according to Portworx version. | "2.11.0" | -| `customRegistryURL` | URL where to pull Portworx image from | "" | -| `registrySecret` | Image registry credentials to pull Portworx Images from a secure registry | "" | -| `licenseSecret` | Kubernetes secret name that has Portworx licensing information | "" | -| `monitoring` | Enable Monitoring on Portworx cluster | false | -| `enableCSI` | Enable CSI | false | -| `enableAutopilot` | Enable Autopilot | false | -| `KVDBauthSecretName` | Refer [Securing with certificates in Kubernetes](https://docs.portworx.com/operations/etcd/#securing-with-certificates-in-kubernetes) to create a kvdb secret and specify the name of the secret here| none | -| `deleteType` | Specify which strategy to use while Uninstalling Portworx. "Uninstall" values only removes Portworx but with "UninstallAndWipe" value all data from your disks including the Portworx metadata is also wiped permanently | UninstallAndWipe | diff --git a/docs/add-ons/prometheus.md b/docs/add-ons/prometheus.md deleted file mode 100644 index 1673781752..0000000000 --- a/docs/add-ons/prometheus.md +++ /dev/null @@ -1,54 +0,0 @@ -# Prometheus - -Prometheus is an open source monitoring and alerting service. Prometheus joined the Cloud Native Computing Foundation in 2016 as the second hosted project, after Kubernetes. - -This project provides support for installing a open source Prometheus server in your EKS cluster and for deploying a new Prometheus instance via [Amazon Managed Service for Prometheus](https://aws.amazon.com/prometheus/). 
- -## Usage - -The following will deploy the Prometheus server into an EKS Cluster and provision a new Amazon Managed Service for Prometheus instance. - -```hcl-terraform -# Creates the AMP workspace and all the relevant IAM Roles -enable_amazon_prometheus = true - -# Deploys Prometheus server with remote write to AWS AMP Workspace -enable_prometheus = true -``` - -Enable Prometheus with custom `values.yaml` - -```hcl-terraform - #--------------------------------------- - # Prometheus Server integration with Amazon Prometheus - #--------------------------------------- - # Amazon Prometheus Configuration to integrate with Prometheus Server Add-on - enable_amazon_prometheus = true - amazon_prometheus_workspace_endpoint = "" - - enable_prometheus = true - # Optional Map value - prometheus_helm_config = { - name = "prometheus" # (Required) Release name. - repository = "https://prometheus-community.github.io/helm-charts" # (Optional) Repository URL where to locate the requested chart. - chart = "prometheus" # (Required) Chart name to be installed. - version = "15.3.0" # (Optional) Specify the exact chart version to install. If this is not specified, it defaults to the version set within default_helm_config: https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/main/modules/kubernetes-addons/prometheus/locals.tf - namespace = "prometheus" # (Optional) The namespace to install the release into. - values = [templatefile("${path.module}/prometheus-values.yaml", { - operating_system = "linux" - })] - } -``` - -### GitOps Configuration - -The following properties are made available for use when managing the add-on via GitOps - -```hcl-terraform -prometheus = { - enable = true - ampWorkspaceUrl = "" - roleArn = "" - serviceAccountName = "" -} -``` diff --git a/docs/add-ons/promtail.md b/docs/add-ons/promtail.md deleted file mode 100644 index ff54972bca..0000000000 --- a/docs/add-ons/promtail.md +++ /dev/null @@ -1,39 +0,0 @@ -# Promtail - -Promtail is an agent which ships the contents of local logs to a Loki instance. - -[Promtail](https://github.com/grafana/helm-charts/tree/main/charts/promtail) chart bootstraps Promtail infrastructure on a Kubernetes cluster using the Helm package manager. - -For complete project documentation, please visit the [Promtail documentation site](https://grafana.com/docs/loki/latest/clients/promtail/). - -## Usage - -Promtail can be deployed by enabling the add-on via the following. - -```hcl -enable_promtail = true -``` - -Deploy Promtail with custom `values.yaml` - -```hcl - # Optional Map value; pass promtail-values.yaml from consumer module - promtail_helm_config = { - name = "promtail" # (Required) Release name. - repository = "https://grafana.github.io/helm-charts" # (Optional) Repository URL where to locate the requested chart. - chart = "promtail" # (Required) Chart name to be installed. - version = "6.3.0" # (Optional) Specify the exact chart version to install. If this is not specified, it defaults to the version set within default_helm_config: https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/main/modules/kubernetes-addons/promtail/locals.tf - namespace = "promtail" # (Optional) The namespace to install the release into. - values = [templatefile("${path.module}/promtail-values.yaml", {})] - } -``` - -### GitOps Configuration - -The following properties are made available for use when managing the add-on via GitOps. 
- -```hcl -promtail = { - enable = true -} -``` diff --git a/docs/add-ons/secrets-store-csi-driver.md b/docs/add-ons/secrets-store-csi-driver.md deleted file mode 100644 index a8110f736f..0000000000 --- a/docs/add-ons/secrets-store-csi-driver.md +++ /dev/null @@ -1,37 +0,0 @@ -# secrets-store-csi-driver - -Secrets Store CSI Driver for Kubernetes secrets - Integrates secrets stores with Kubernetes via a [Container Storage Interface (CSI)](https://kubernetes-csi.github.io/docs/) volume. - -The Secrets Store CSI Driver `secrets-store.csi.k8s.io` allows Kubernetes to mount multiple secrets, keys, and certs stored in enterprise-grade external secrets stores into their pods as a volume. Once the Volume is attached, the data in it is mounted into the container’s file system. - -For more details, refer [Secrets Store CSI Driver](https://secrets-store-csi-driver.sigs.k8s.io/) - -## Usage - -secrets-store-csi-driver can be deployed by enabling the add-ons via the following. - -```hcl -enable_secrets_store_csi_driver = true -``` - -You can optionally customize the Helm chart that deploys `secrets_store_csi_driver` via the following configuration. - -```hcl -secrets_store_csi_driver_helm_config = { - name = "secrets-store-csi-driver" - chart = "secrets-store-csi-driver" - repository = "https://kubernetes-sigs.github.io/secrets-store-csi-driver/charts" - version = "1.2.4" - namespace = "secrets-store-csi-driver" - set_values = [ - { - name = "syncSecret.enabled" - value = "false" - }, - { - name = "enableSecretRotation" - value = "false" - } - ] -} -``` diff --git a/docs/add-ons/smb-csi-driver.md b/docs/add-ons/smb-csi-driver.md deleted file mode 100644 index 87da9b8eb5..0000000000 --- a/docs/add-ons/smb-csi-driver.md +++ /dev/null @@ -1,38 +0,0 @@ -# SMB CSI Driver Helm Chart -SMB CSI Driver allows Kubernetes to access SMB server on both Linux and Windows nodes. -The driver requires existing and already configured SMB server, it supports dynamic provisioning of Persistent Volumes via Persistent Volume Claims by creating a new subdirectory under SMB server. - -[SMB CSI Driver](https://github.com/kubernetes-csi/csi-driver-smb/tree/master/charts) docs chart bootstraps SMB CSI Driver infrastructure on a Kubernetes cluster using the Helm package manager. - -For complete project documentation, please visit the [SMB CSI Driver documentation site](https://github.com/kubernetes-csi/csi-driver-smb). - -## Usage - -SMB CSI Driver can be deployed by enabling the add-on via the following. - -```hcl -enable_smb_csi_driver = true -``` - -Deploy SMB CSI Driver with custom `values.yaml` - -```hcl - # Optional Map value; pass smb-csi-driver-values.yaml from consumer module - smb_csi_driver_helm_config = { - name = "csi-driver-smb" # (Required) Release name. - repository = "https://raw.githubusercontent.com/kubernetes-csi/csi-driver-smb/master/charts" # (Optional) Repository URL where to locate the requested chart. - chart = "csi-driver-smb" # (Required) Chart name to be installed. - version = "v1.9.0" # (Optional) Specify the exact chart version to install. If this is not specified, it defaults to the version set within default_helm_config: https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/main/modules/kubernetes-addons/smb-csi-driver/locals.tf - values = [templatefile("${path.module}/smb-csi-driver-values.yaml", {})] - } -``` - -### GitOps Configuration - -The following properties are made available for use when managing the add-on via GitOps. 
- -```sh -smbCsiDriver = { - enable = true -} -``` diff --git a/docs/add-ons/spark-history-server.md b/docs/add-ons/spark-history-server.md deleted file mode 100644 index b1b0d1045e..0000000000 --- a/docs/add-ons/spark-history-server.md +++ /dev/null @@ -1,73 +0,0 @@ -# Spark History Server - -[Spark Web UI](https://spark.apache.org/docs/latest/web-ui.html#web-ui) can be enabled by this Add-on. -This Add-on deploys Spark History Server and fetches the Spark Event logs stored in S3. Spark Web UI can be exposed via Ingress and LoadBalancer with `values.yaml`. -Alternatively, you can port-forward on spark-history-server service. e.g., `kubectl port-forward services/spark-history-server 18085:80 -n spark-history-server` - -## Usage - -[Spark History Server](https://github.com/aws-ia/terraform-aws-eks-blueprints/tree/main/modules/kubernetes-addons/spark-k8s-operator) can be deployed by enabling the add-on via the following. - -### Basic Example - -``` -enable_spark_history_server = true -spark_history_server_s3a_path = "s3a:////" -``` - -### Advanced Example - -``` -enable_spark_history_server = true - -# IAM policy used by IRSA role. It's recommended to create a dedicated IAM policy to access your s3 bucket -spark_history_server_irsa_policies = [""] - -# NOTE: This block requires passing the helm values.yaml -# spark_history_server_s3a_path won't be used when you pass custom `values.yaml`. s3a path is passed via `sparkHistoryOpts` in `values.yaml` - -spark_history_server_helm_config = { - name = "spark-history-server" - chart = "spark-history-server" - repository = "https://hyper-mesh.github.io/spark-history-server" - version = "1.0.0" - namespace = "spark-history-server" - timeout = "300" - values = [ - <<-EOT - serviceAccount: - create: false - - # Enter S3 bucket with Spark Event logs location. - # Ensure IRSA roles has permissions to read the files for the given S3 bucket - sparkHistoryOpts: "-Dspark.history.fs.logDirectory=s3a:////" - - # Update spark conf according to your needs - sparkConf: |- - spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.WebIdentityTokenCredentialsProvider - spark.history.fs.eventLog.rolling.maxFilesToRetain=5 - spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem - spark.eventLog.enabled=true - spark.history.ui.port=18080 - - resources: - limits: - cpu: 200m - memory: 2G - requests: - cpu: 100m - memory: 1G - EOT - ] -} -``` - -### GitOps Configuration - -The following properties are made available for use when managing the add-on via GitOps - -``` -sparkHistoryServer = { - enable = true -} -``` diff --git a/docs/add-ons/spark-on-k8s-operator.md b/docs/add-ons/spark-on-k8s-operator.md deleted file mode 100644 index b2288e6e81..0000000000 --- a/docs/add-ons/spark-on-k8s-operator.md +++ /dev/null @@ -1,42 +0,0 @@ -# Spark K8S Operator -The Kubernetes Operator for Apache Spark aims to make specifying and running Spark applications as easy and idiomatic as running other workloads on Kubernetes. It uses Kubernetes custom resources for specifying, running, and surfacing status of Spark applications. For a complete reference of the custom resource definitions, please refer to the API Definition. For details on its design, please refer to the design doc. It requires Spark 2.3 and above that supports Kubernetes as a native scheduler backend. - -For complete project documentation, please visit the [Spark K8S Operator documentation site](https://github.com/GoogleCloudPlatform/spark-on-k8s-operator). 
- -## Usage - -[Spark K8S Operator](https://github.com/aws-ia/terraform-aws-eks-blueprints/tree/main/modules/kubernetes-addons/spark-k8s-operator) can be deployed by enabling the add-on via the following. - -### Basic Example - -```hcl - enable_spark_k8s_operator = true -``` - -### Advanced Example -```hcl - enable_spark_k8s_operator = true - # Optional Map value - # NOTE: This block requires passing the helm values.yaml - spark_k8s_operator_helm_config = { - name = "spark-operator" - chart = "spark-operator" - repository = "https://googlecloudplatform.github.io/spark-on-k8s-operator" - version = "1.1.19" - namespace = "spark-k8s-operator" - timeout = "1200" - create_namespace = true - values = [templatefile("${path.module}/values.yaml", {})] - - } -``` - -### GitOps Configuration - -The following properties are made available for use when managing the add-on via GitOps - -``` -sparkK8sOperator = { - enable = true -} -``` diff --git a/docs/add-ons/tetrate-istio.md b/docs/add-ons/tetrate-istio.md deleted file mode 100644 index b5b2cff138..0000000000 --- a/docs/add-ons/tetrate-istio.md +++ /dev/null @@ -1,90 +0,0 @@ -# Tetrate Istio Distro - -[Tetrate Istio Distro](https://istio.tetratelabs.io/) is simple, safe enterprise-grade Istio distro. - -This add-on is implemented as an external add-on. For detailed documentation and usage of the add-on please refer to the add-on [repository](https://github.com/tetratelabs/terraform-eksblueprints-tetrate-istio-addon). - -## Example - -Checkout the full [example](https://github.com/tetratelabs/terraform-eksblueprints-tetrate-istio-addon/tree/main/blueprints/getting-started). - -## Usage - -This step deploys the [Tetrate Istio Distro](https://istio.tetratelabs.io/) with default Helm Chart config - -```hcl - enable_tetrate_istio = true -``` - -Alternatively, you can override the helm values by using the code snippet below - -```hcl - enable_tetrate_istio = true - - # Optional fine-grained configuration - - tetrate_istio_distribution = "TID" # (default, Tetrate Istio Distro) - tetrate_istio_version = "1.12.2" - tetrate_istio_install_base = "true" # (default, Istio `base` Helm Chart) - tetrate_istio_install_cni = "true" # (default, Istio `cni` Helm Chart) - tetrate_istio_install_istiod = "true" # (default, Istio `istiod` Helm Chart) - tetrate_istio_install_gateway = "true" # (default, Istio `gateway` Helm Chart) - - # Istio `base` Helm Chart config - tetrate_istio_base_helm_config = { - name = "istio-base" # (default) Release name. - repository = "https://istio-release.storage.googleapis.com/charts" # (default) Repository URL where to locate the requested chart. - chart = "base" # (default) Chart name to be installed. - version = "1.12.2" # (default) The exact chart version to install. - values = [] - } - - # Istio `cni` Helm Chart config - tetrate_istio_cni_helm_config = { - name = "istio-cni" # (default) Release name. - repository = "https://istio-release.storage.googleapis.com/charts" # (default) Repository URL where to locate the requested chart. - chart = "cni" # (default) Chart name to be installed. - version = "1.12.2" # (default) The exact chart version to install. - values = [yamlencode({ - "global" : { - "hub" : "containers.istio.tetratelabs.com", - "tag" : "1.12.2-tetratefips-v0", - } - })] - } - - # Istio `istiod` Helm Chart config - tetrate_istio_istiod_helm_config = { - name = "istio-istiod" # (default) Release name. 
- repository = "https://istio-release.storage.googleapis.com/charts" # (default) Repository URL where to locate the requested chart. - chart = "istiod" # (default) Chart name to be installed. - version = "1.12.2" # (default) The exact chart version to install. - values = [yamlencode({ - "global" : { - "hub" : "containers.istio.tetratelabs.com", - "tag" : "1.12.2-tetratefips-v0", - } - })] - } - - # Istio `gateway` Helm Chart config - tetrate_istio_gateway_helm_config = { - name = "istio-ingress" # (default) Release name. - repository = "https://istio-release.storage.googleapis.com/charts" # (default) Repository URL where to locate the requested chart. - chart = "gateway" # (default) Chart name to be installed. - version = "1.12.2" # (default) The exact chart version to install. - values = [] - } -``` - -### GitOps Configuration - -The following properties are made available for use when managing the add-on via GitOps - -```hcl -tetrateIstio = { - enable = true -} -``` - -GitOps with ArgoCD Add-on repo is located [here](https://github.com/aws-samples/eks-blueprints-add-ons/blob/main/chart/values.yaml) diff --git a/docs/add-ons/thanos.md b/docs/add-ons/thanos.md deleted file mode 100644 index 86b9443892..0000000000 --- a/docs/add-ons/thanos.md +++ /dev/null @@ -1,23 +0,0 @@ -# Thanos - -Thanos is a highly available metrics system that can be added on top of existing Prometheus deployments, providing a global query view across all Prometheus installations. - -For complete project documentation, please visit the [Thanos documentation site](https://thanos.io/tip/thanos/getting-started.md/). - -## Usage - -[Thanos](https://github.com/bitnami/charts/tree/main/bitnami/thanos) can be deployed by enabling the add-on via the following. - -```hcl -enable_thanos = true -``` - -### GitOps Configuration - -The following properties are made available for use when managing the add-on via GitOps - -``` -thanos = { - enable = true -} -``` diff --git a/docs/add-ons/traefik.md b/docs/add-ons/traefik.md deleted file mode 100644 index a9855ca1b5..0000000000 --- a/docs/add-ons/traefik.md +++ /dev/null @@ -1,43 +0,0 @@ -# Traefik - -Traefik is an open-source Edge Router that makes publishing your services a fun and easy experience. It receives requests on behalf of your system and finds out which components are responsible for handling them. - -For complete project documentation, please visit the [Traefik documentation site](https://doc.traefik.io/traefik/). - -## Usage - -[Traefik](https://github.com/aws-ia/terraform-aws-eks-blueprints/tree/main/modules/kubernetes-addons/traefik) can be deployed by enabling the add-on via the following. - -```hcl -enable_traefik = true -``` - -## How to test Traefik Web UI - -Once the Traefik deployment is successful, run the following command from your a local machine which have access to an EKS cluster using kubectl. - -``` -$ kubectl port-forward svc/traefik -n kube-system 9000:9000 -``` - -Now open the browser from your machine and enter the below URL to access Traefik Web UI. 
- -``` -http://127.0.0.1:9000/dashboard/ -``` - -![alt text](https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/a8ceac6c977a3ccbcb95ef7fb21fff0daf0b7081/images/traefik_web_ui.png "Traefik Dashboard") - -#### AWS Service annotations for Traefik Ingress Controller - -Here is the link to get the AWS ELB [service annotations](https://kubernetes-sigs.github.io/aws-load-balancer-controller/latest/guide/service/annotations/) for Traefik Ingress controller - -### GitOps Configuration - -The following properties are made available for use when managing the add-on via GitOps - -``` -traefik = { - enable = true -} -``` diff --git a/docs/add-ons/vault.md b/docs/add-ons/vault.md deleted file mode 100644 index dbb8504789..0000000000 --- a/docs/add-ons/vault.md +++ /dev/null @@ -1,34 +0,0 @@ -# HashiCorp Vault - -[HashiCorp Vault](https://www.vaultproject.io) brokers and deeply integrates with trusted identities to automate access to secrets, data, and systems. - -This add-on is implemented as an external add-on. For detailed documentation and usage of the add-on please refer to the add-on [repository](https://github.com/hashicorp/terraform-aws-hashicorp-vault-eks-addon). - -## Example - -Checkout the full [example](https://github.com/hashicorp/terraform-aws-hashicorp-vault-eks-addon/tree/main/blueprints/getting-started). - -## Usage - -This step deploys the [HashiCorp Vault](https://www.vaultproject.io) with default Helm Chart config - -```hcl - enable_vault = true -``` - -Alternatively, you can override the Helm Values by setting the `vault_helm_config` object, like shown in the code snippet below: - -```hcl - enable_vault = true - - vault_helm_config = { - name = "vault" # (Required) Release name. - chart = "vault" # (Required) Chart name to be installed. - repository = "https://helm.releases.hashicorp.com" # (Optional) Repository URL where to locate the requested chart. - version = "v0.19.0" # (Optional) Specify the exact chart version to install. - - # ... - } -``` - -This snippet does not contain _all_ available options that can be set as part of `vault_helm_config`. For the complete listing, see the [`hashicorp-vault-eks-blueprints-addon` repository](https://github.com/hashicorp/terraform-aws-hashicorp-vault-eks-addon/). diff --git a/docs/add-ons/velero.md b/docs/add-ons/velero.md deleted file mode 100644 index 54ed43b7a5..0000000000 --- a/docs/add-ons/velero.md +++ /dev/null @@ -1,162 +0,0 @@ -# Velero - -[Velero](https://velero.io/) is an open source tool to safely backup and restore, perform disaster recovery, and migrate Kubernetes cluster resources and persistent volumes. - -- [Helm chart](https://github.com/vmware-tanzu/helm-charts/tree/main/charts/velero) -- [Plugin for AWS](https://github.com/vmware-tanzu/velero-plugin-for-aws) - -## Usage - -[Velero](https://github.com/aws-ia/terraform-aws-eks-blueprints/tree/main/modules/kubernetes-addons/velero) can be deployed by enabling the add-on via the following. 
- -```hcl -enable_velero = true -velero_backup_s3_bucket = "" -``` - -You can also customize the Helm chart that deploys `velero` via the following configuration: - -```hcl -enable_velero = true -velero_helm_config = { - name = "velero" - description = "A Helm chart for velero" - chart = "velero" - version = "2.30.0" - repository = "https://vmware-tanzu.github.io/helm-charts/" - namespace = "velero" - values = [templatefile("${path.module}/values.yaml", { - bucket = "", - region = "" - })] -} -``` - -To see a working example, see the [`stateful`](https://github.com/aws-ia/terraform-aws-eks-blueprints/tree/main/examples/stateful) example blueprint. - -## Validate - - -1. Run `update-kubeconfig` command: - -```bash -aws eks --region update-kubeconfig --name -``` - -2. Test by listing velero resources provisioned: - -```bash -kubectl get all -n velero - -# Output should look similar to below -NAME READY STATUS RESTARTS AGE -pod/velero-b4d8fd5c7-5smp6 1/1 Running 0 112s - -NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE -service/velero ClusterIP 172.20.217.203 8085/TCP 114s - -NAME READY UP-TO-DATE AVAILABLE AGE -deployment.apps/velero 1/1 1 1 114s - -NAME DESIRED CURRENT READY AGE -replicaset.apps/velero-b4d8fd5c7 1 1 1 114s -``` - -3. Get backup location using velero [CLI](https://velero.io/docs/v1.8/basic-install/#install-the-cli) - -```bash -velero backup-location get - -# Output should look similar to below -NAME PROVIDER BUCKET/PREFIX PHASE LAST VALIDATED ACCESS MODE DEFAULT -default aws velero-ssqwm44hvofzb32d Available 2022-05-22 10:53:26 -0400 EDT ReadWrite true -``` - -4. To demonstrate creating a backup and restoring, create a new namespace and run nginx using below commands: - -```bash -kubectl create namespace backupdemo -kubectl run nginx --image=nginx -n backupdemo -``` - -5. Create backup of this namespace using velero - -```bash -velero backup create backup1 --include-namespaces backupdemo - -# Output should look similar to below -Backup request "backup1" submitted successfully. -Run `velero backup describe backup1` or `velero backup logs backup1` for more details. -``` - -6. Describe the backup to check the backup status - -```bash -velero backup describe backup1 - -# Output should look similar to below -Name: backup1 -Namespace: velero -Labels: velero.io/storage-location=default -Annotations: velero.io/source-cluster-k8s-gitversion=v1.21.9-eks-14c7a48 - velero.io/source-cluster-k8s-major-version=1 - velero.io/source-cluster-k8s-minor-version=21+ - -Phase: Completed - -Errors: 0 -Warnings: 0 - -Namespaces: - Included: backupdemo - Excluded: - -Resources: - Included: * - Excluded: - Cluster-scoped: auto - -Label selector: - -Storage Location: default - -Velero-Native Snapshot PVs: auto - -TTL: 720h0m0s - -Hooks: - -Backup Format Version: 1.1.0 - -Started: 2022-05-22 10:54:32 -0400 EDT -Completed: 2022-05-22 10:54:35 -0400 EDT - -Expiration: 2022-06-21 10:54:32 -0400 EDT - -Total items to be backed up: 10 -Items backed up: 10 - -Velero-Native Snapshots: -``` - -7. Delete the namespace - this will be restored using the backup created - -```bash -kubectl delete namespace backupdemo -``` - -8. Restore the namespace from your backup - -```bash -velero restore create --from-backup backup1 -``` - -9. 
Verify that the namespace is restored - -```bash -kubectl get all -n backupdemo - -# Output should look similar to below -NAME        READY   STATUS    RESTARTS   AGE -pod/nginx   1/1     Running   0          21s -``` diff --git a/docs/add-ons/vpa.md b/docs/add-ons/vpa.md deleted file mode 100644 index 804ac07f9d..0000000000 --- a/docs/add-ons/vpa.md +++ /dev/null @@ -1,27 +0,0 @@ -# Vertical Pod Autoscaler -The [Vertical Pod Autoscaler (VPA)](https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler) automatically adjusts the CPU and memory reservations for your pods to help "right size" your applications. When configured, it will automatically request the necessary reservations based on usage and thus allow proper scheduling onto nodes so that the appropriate resource amount is available for each pod. It will also maintain the ratios between limits and requests that were specified in the initial container configuration. - -NOTE: The Metrics Server add-on is a dependency for this add-on. - -## Usage - -This step deploys the Vertical Pod Autoscaler with the default Helm chart config. - -```hcl - enable_vpa = true -``` - -Alternatively, you can override the Helm values by using the code snippet below. - -```hcl - enable_vpa = true - - vpa_helm_config = { - name       = "vpa"                                 # (Required) Release name. - repository = "https://charts.fairwinds.com/stable" # (Optional) Repository URL where to locate the requested chart. - chart      = "vpa"                                 # (Required) Chart name to be installed. - version    = "1.0.0"                               # (Optional) Specify the exact chart version to install. If this is not specified, it defaults to the version set within default_helm_config: https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/main/modules/kubernetes-addons/vpa/locals.tf - namespace  = "vpa"                                 # (Optional) The namespace to install the release into. - values     = [templatefile("${path.module}/values.yaml", {})] - } -``` diff --git a/docs/add-ons/yunikorn.md b/docs/add-ons/yunikorn.md deleted file mode 100644 index eae5b97e2f..0000000000 --- a/docs/add-ons/yunikorn.md +++ /dev/null @@ -1,27 +0,0 @@ -# Apache YuniKorn -[YuniKorn](https://yunikorn.apache.org/) is a lightweight, universal resource scheduler for container orchestrator systems. - -Apache YuniKorn (Incubating) is an Apache incubator project that offers rich scheduling capabilities on Kubernetes. It fills the scheduling gap for running Big Data workloads on Kubernetes, with useful features such as hierarchical queues, elastic queue quotas, resource fairness, and job ordering. - -You can define `batchScheduler: "yunikorn"` when you are running Spark applications using the Spark K8s Operator. - -## Usage -This step deploys the Apache YuniKorn K8s scheduler with the default Helm chart config. - -```hcl - enable_yunikorn = true -``` - -Alternatively, you can override the Helm values by using the code snippet below. - -```hcl - enable_yunikorn = true - - yunikorn_helm_config = { - name       = "yunikorn"                                 # (Required) Release name. - repository = "https://apache.github.io/yunikorn-release" # (Optional) Repository URL where to locate the requested chart. - chart      = "yunikorn"                                 # (Required) Chart name to be installed. - version    = "0.12.2"                                   # (Optional) Specify the exact chart version to install.
If this is not specified, it defaults to the version set within default_helm_config: https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/main/modules/kubernetes-addons/yunikorn/locals.tf - values     = [templatefile("${path.module}/values.yaml", {})] - } -``` diff --git a/docs/advanced/.pages b/docs/advanced/.pages deleted file mode 100644 index 9fbb9ca8b9..0000000000 --- a/docs/advanced/.pages +++ /dev/null @@ -1,8 +0,0 @@ -nav: - - Bottlerocket: bottlerocket.md - - Cluster Upgrades: cluster-upgrades.md - - ECR Instructions: ecr-instructions.md - - GitOps with Flux: gitops-with-flux.md - - Multi-cluster: multi-cluster.md - - Private Clusters: private-clusters.md - - ... diff --git a/docs/advanced/bottlerocket.md b/docs/advanced/bottlerocket.md deleted file mode 100644 index 9098834b91..0000000000 --- a/docs/advanced/bottlerocket.md +++ /dev/null @@ -1,22 +0,0 @@ -# Bottlerocket OS - -[Bottlerocket](https://aws.amazon.com/bottlerocket/) is an open source operating system specifically designed for running containers. The Bottlerocket build system is based on Rust. It's a container host OS that doesn't include additional software or package managers beyond what is needed to run containers, which makes it lightweight and secure. Container-optimized operating systems are ideal when you need to run applications in Kubernetes with minimal setup, do not want to worry about security or updates, or want OS support from the cloud provider. Container operating systems apply updates transactionally. - -Bottlerocket runs two host containers: the control container (**on** by default), used for AWS Systems Manager and remote API access, and the admin container (**off** by default), used for deep debugging and exploration. - -Bottlerocket [launch template userdata](https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/main/modules/aws-eks-managed-node-groups/templates/userdata-bottlerocket.tpl) uses the TOML format with key-value pairs. -Remote API access is provided via the SSM agent. You can launch the admin (troubleshooting) container via user data `[settings.host-containers.admin] enabled = true`. - -### Features -* [Secure](https://github.com/bottlerocket-os/bottlerocket/blob/develop/SECURITY_FEATURES.md) - Opinionated, specialized and highly secure -* **Flexible** - Multi-cloud and multi-orchestrator -* **Transactional** - Image-based upgrades and rollbacks -* **Isolated** - Separate container runtimes - -### Updates -Bottlerocket can be updated automatically via a Kubernetes operator. - -```sh - kubectl apply -f Bottlerocket_k8s.csv.yaml - kubectl get ClusterServiceVersion Bottlerocket_k8s -o json | jq '.status' -``` diff --git a/docs/advanced/cluster-upgrades.md b/docs/advanced/cluster-upgrades.md deleted file mode 100644 index 755458787b..0000000000 --- a/docs/advanced/cluster-upgrades.md +++ /dev/null @@ -1,59 +0,0 @@ -### EKS Upgrade Documentation - -#### Objective: - -The purpose of this document is to provide an overview of the steps for upgrading the EKS cluster from one version to another. Please note that AWS regularly publishes updated EKS upgrade documentation. - -The upgrade documentation that was current at the time of writing is available [here](https://docs.aws.amazon.com/eks/latest/userguide/update-cluster.html). - -#### Prerequisites: - - 1. Review the latest upgrade docs from the AWS site (https://docs.aws.amazon.com/eks/latest/userguide/update-cluster.html) - 2. Always upgrade one increment at a time (e.g., 1.20 to 1.21). AWS doesn't support upgrades from 1.20 to 1.22 directly. - -#### Steps to Upgrade EKS cluster: - -1.
Change the version in Terraform to the desired Kubernetes cluster version. See the example below: - - ```hcl-terraform - cluster_version = "1.21" - ``` - -2. If you are specifying a version for EKS managed add-ons, you will need to ensure the version used is compatible with the new cluster version, or use a data source to pull the appropriate version. If you are not specifying a version for EKS managed add-ons, no changes are required since the EKS service will update the default add-on version based on the cluster version specified. - -To ensure the correct add-on version is used, it is recommended to use the add-on version data source, which will pull the appropriate version for a given cluster version: - -```hcl-terraform -module "eks_blueprints_kubernetes_addons" { - # Essential inputs are not shown for brevity - - enable_amazon_eks_coredns = true - amazon_eks_coredns_config = { - most_recent = true - } - - enable_amazon_eks_aws_ebs_csi_driver = true - amazon_eks_aws_ebs_csi_driver_config = { - most_recent = true - } - - enable_amazon_eks_kube_proxy = true - amazon_eks_kube_proxy_config = { - most_recent = true - } - - enable_amazon_eks_vpc_cni = true - amazon_eks_vpc_cni_config = { - most_recent = true - } -} -``` - -3. Apply the changes to the cluster with Terraform. This will: - - Upgrade the control plane to the version specified - - Update the data plane to ensure the compute resources are utilizing the corresponding AMI for the given cluster version - - Update add-ons to reflect the respective versions - -## Important Note - -Please note that you may need to update other Kubernetes add-ons deployed through Helm charts to match the new Kubernetes version. diff --git a/docs/advanced/ecr-instructions.md b/docs/advanced/ecr-instructions.md deleted file mode 100644 index cc23551caf..0000000000 --- a/docs/advanced/ecr-instructions.md +++ /dev/null @@ -1,31 +0,0 @@ -# Docker upload to Elastic Container Registry - -Download the Docker image to your local machine: - -``` -$ docker pull <image-name>:<tag> -``` - -Retrieve an authentication token and authenticate your Docker client to your registry. Use the AWS CLI: - -``` -$ aws ecr get-login-password --region <region> | docker login --username AWS --password-stdin <aws-account-id>.dkr.ecr.<region>.amazonaws.com -``` - -Create an ECR repo for your image: - -``` -$ aws ecr create-repository --repository-name <repository-name> --image-scanning-configuration scanOnPush=true -``` - -After the repo is created in ECR, tag your image so you can push it to this repository: - -``` -$ docker tag <image-name>:<tag> <aws-account-id>.dkr.ecr.<region>.amazonaws.com/<repository-name>:<tag> -``` - -Run the following command to push this image to your newly created ECR repository: - -``` -$ docker push <aws-account-id>.dkr.ecr.<region>.amazonaws.com/<repository-name>:<tag> -``` diff --git a/docs/advanced/gitops-with-flux.md b/docs/advanced/gitops-with-flux.md deleted file mode 100644 index 79d0210a19..0000000000 --- a/docs/advanced/gitops-with-flux.md +++ /dev/null @@ -1,114 +0,0 @@ -# Manage your cluster(s) configuration with Flux - -Once you have deployed your EKS cluster(s) with Terraform, you can leverage [Flux](https://fluxcd.io) to manage your cluster's configuration with [GitOps](https://www.gitops.tech/), including the deployment of add-ons, cluster configuration (e.g. cluster policies) and applications. Using GitOps practices to manage your cluster configuration simplifies management, makes it easier to scale the number of clusters you run, and lets you easily recreate clusters, treating them as ephemeral resources.
Recreating your cluster is as simple as deploying a new cluster with Terraform and bootstrapping it with Flux pointing to the repository containing the configuration. - -The [aws-samples/flux-eks-gitops-config](https://github.com/aws-samples/flux-eks-gitops-config) repository provides a sample configuration blueprint for configuring multiple Amazon EKS clusters belonging to different stages (`test` and `production`) using [GitOps](https://www.gitops.tech/) with [Flux v2](https://fluxcd.io/docs/). This repository installs a set of commonly used Kubernetes add-ons to perform policy enforcement, restrict network traffic with network policies, cluster monitoring, extend Kubernetes deployment capabilities enabling progressive Canary deployments for your applications... - -You can use the above sample repository to experiment with the predefined cluster configurations and use it as a baseline to adjust it to your own needs. - -This sample installs the following Kubernetes add-ons: - -* **[metrics-server](https://github.com/kubernetes-sigs/metrics-server):** Aggregator of resource usage data in your cluster, commonly used by other Kubernetes add ons, such us [Horizontal Pod Autoscaler](https://docs.aws.amazon.com/eks/latest/userguide/horizontal-pod-autoscaler.html) or [Kubernetes Dashboard](https://docs.aws.amazon.com/eks/latest/userguide/dashboard-tutorial.html). -* **[Calico](https://docs.tigera.io/calico/next/about/):** Project Calico is a network policy engine for Kubernetes. Calico network policy enforcement allows you to implement network segmentation and tenant isolation. For more information check the [Amazon EKS documentation](https://docs.aws.amazon.com/eks/latest/userguide/calico.html). -* **[Kyverno](https://kyverno.io/):** Kubernetes Policy Management Engine. Kyverno allows cluster administrators to manage environment specific configurations independently of workload configurations and enforce configuration best practices for their clusters. Kyverno can be used to scan existing workloads for best practices, or can be used to enforce best practices by blocking or mutating API requests. -* **[Prometheus](https://prometheus.io/):** Defacto standard open-source systems monitoring and alerting toolkit for Kubernetes. This repository installs [kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack). -* **[Flagger](https://flagger.app/):** Progressive delivery operator for Flux. Flagger can run automated application analysis, testing, promotion and rollback for the following deployment strategies: Canary, A/B Testing and Blue/Green. For more details, check the [Flagger documentation](https://docs.flagger.app/). -* **[nginx-ingress-controller](https://kubernetes.github.io/ingress-nginx/):** Ingress controller to expose apps and enable [canary deployments and A/B testing with Flagger](https://docs.flagger.app/tutorials/nginx-progressive-delivery). - -**NOTE:** The add-ons on the sample are not configured for a production-ready cluster (e.g. Prometheus would need to be configured for long term metric storage, nginx would need HPA and any custom settings you need...). - -There're also a set of Kyverno cluster policies deployed to audit (test) or enforce (production) security settings on your workloads, as well as [podinfo](https://github.com/stefanprodan/podinfo) as a sample application, configured with [Flagger](https://flagger.app/) to perform progressive deployments. 
For further information, visit the [aws-samples/flux-eks-gitops-config](https://github.com/aws-samples/flux-eks-gitops-config) repository documentation. - -## Bootstrap your cluster with Flux - -The below instructions assume you have created a cluster with `eks-blueprints` with no add-ons other than aws-load-balancer-controller. If you're installing additional add-ons via terraform, the configuration may clash with the one on the sample repository. If you plan to leverage Flux, we recommend that you use Terraform to install and manage only add-ons that require additional AWS resources to be created (like IAM roles for Service accounts), and then use Flux to manage the rest. - -### Prerequisites - -The add-ons and configurations of this repository require Kubernetes 1.21 or higher (this is required by the version of kube-prometheus-stack that is installed, you can use 1.19+ installing previous versions of kube-prometheus-stack). - -You'll also need the following: - -* Install flux CLI on your computer following the instructions [here](https://fluxcd.io/docs/installation/). This repository has been tested with flux 0.22. -* A GitHub account and a [personal access token](https://help.github.com/en/github/authenticating-to-github/creating-a-personal-access-token-for-the-command-line) that can create repositories. - -### Bootstrap your cluster - -Fork the [aws-samples/flux-eks-gitops-config](https://github.com/aws-samples/flux-eks-gitops-config) repository on your personal GitHub account and export your GitHub access token, username and repo name: - -```sh - export GITHUB_TOKEN= - export GITHUB_USER= - export GITHUB_REPO= -``` - -Define whether you want to bootstrap your cluster with the `TEST` or the `PRODUCTION` configuration: - -```sh - # TEST configuration - export CLUSTER_ENVIRONMENT=test - - # PRODUCTION configuration - export CLUSTER_ENVIRONMENT=production -``` - -Verify that your staging cluster satisfies the prerequisites with: - -```sh - flux check --pre -``` - -You can now bootstrap your cluster with Flux CLI. - -```sh - flux bootstrap github --owner=${GITHUB_USER} --repository=${GITHUB_REPO} --branch=main --path=clusters/${CLUSTER_ENVIRONMENT} --personal -``` - -The bootstrap command commits the manifests for the Flux components in `clusters/${CLUSTER_ENVIRONMENT}/flux-system` directory and creates a deploy key with read-only access on GitHub, so it can pull changes inside the cluster. 
- -Confirm that Flux has finished applying the configuration to your cluster (it will take 3 or 4 minutes to sync everything): - -```sh - $ flux get kustomization - NAME READY MESSAGE REVISION SUSPENDED - apps True Applied revision: main/b7d10ca21be7cac0dcdd14c80353012ccfedd4fe main/b7d10ca21be7cac0dcdd14c80353012ccfedd4fe False - calico-installation True Applied revision: master/00a2f33ea55f2018819434175c09c8bd8f20741a master/00a2f33ea55f2018819434175c09c8bd8f20741a False - calico-operator True Applied revision: master/00a2f33ea55f2018819434175c09c8bd8f20741a master/00a2f33ea55f2018819434175c09c8bd8f20741a False - config True Applied revision: main/8fd33f531df71002f2da7bc9619ee75281a9ead0 main/8fd33f531df71002f2da7bc9619ee75281a9ead0 False - flux-system True Applied revision: main/b7d10ca21be7cac0dcdd14c80353012ccfedd4fe main/b7d10ca21be7cac0dcdd14c80353012ccfedd4fe False - infrastructure True Applied revision: main/b7d10ca21be7cac0dcdd14c80353012ccfedd4fe main/b7d10ca21be7cac0dcdd14c80353012ccfedd4fe False -``` - -Get the URL for the nginx ingress controller that has been deployed in your cluster (you will see two ingresses, since Flagger will create a canary ingress): - -```sh - $ kubectl get ingress -n podinfo - NAME CLASS HOSTS ADDRESS PORTS AGE - podinfo nginx podinfo.test k8s-xxxxxx.elb.us-west-2.amazonaws.com 80 23h - podinfo-canary nginx podinfo.test k8s-xxxxxx.elb.us-west-2.amazonaws.com 80 23h -``` - -Confirm that podinfo can be correctly accessed via ingress: - -```sh - $ curl -H "Host: podinfo.test" k8s-xxxxxx.elb.us-west-2.amazonaws.com - { - "hostname": "podinfo-primary-65584c8f4f-d7v4t", - "version": "6.0.0", - "revision": "", - "color": "#34577c", - "logo": "https://raw.githubusercontent.com/stefanprodan/podinfo/gh-pages/cuddle_clap.gif", - "message": "greetings from podinfo v6.0.0", - "goos": "linux", - "goarch": "amd64", - "runtime": "go1.16.5", - "num_goroutine": "10", - "num_cpu": "2" - } -``` - -Congratulations! Your cluster has sync'ed all the configuration defined on the repository. Continue exploring the deployed configuration following these docs: - -* [Review the repository structure to understand the applied configuration](https://github.com/aws-samples/flux-eks-gitops-config/blob/main/docs/repository-structure.md) -* [Test the cluster policies configured with Kyverno](https://github.com/aws-samples/flux-eks-gitops-config/blob/main/docs/test-kyverno-policies.md) -* [Test progressive deployments with Flux, Flagger and nginx controller](https://github.com/aws-samples/flux-eks-gitops-config/blob/main/docs/flagger-canary-deployments.md) diff --git a/docs/advanced/multi-cluster.md b/docs/advanced/multi-cluster.md deleted file mode 100644 index 75d58e4137..0000000000 --- a/docs/advanced/multi-cluster.md +++ /dev/null @@ -1,62 +0,0 @@ -## Advanced Deployment Folder Structure - -This example shows how to structure folders in your repo when you want to deploy multiple EKS Clusters across multiple regions and accounts. - -The top-level `examples\advanced` folder provides an example of how you can structure your folders and files to define multiple EKS Cluster environments and consume this Blueprints module. This approach is suitable for large projects, with clearly defined sub directory and file structure. - -Each folder under `live//application` represents an EKS cluster environment(e.g., dev, test, load etc.). Each folder contains a `backend.conf` and `.tfvars`, used to create a unique Terraform state for each cluster environment. 
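As an illustration of the per-environment state isolation described above, a `backend.conf` for the `dev` environment might look like the following sketch (the bucket and DynamoDB table names are placeholders, not values from this repository); it is passed to Terraform with `terraform init -backend-config=backend.conf`:

```hcl
# backend.conf - partial S3 backend configuration for the "dev" environment.
# Assumes main.tf declares an empty `backend "s3" {}` block.
bucket         = "my-terraform-state-bucket"                           # placeholder S3 bucket for state
key            = "preprod/eu-west-1/application/dev/terraform.tfstate" # unique state key per environment
region         = "eu-west-1"
dynamodb_table = "my-terraform-state-lock"                             # placeholder table for state locking
encrypt        = true
```

Environment-specific variables would then be supplied at plan/apply time, e.g. `terraform plan -var-file=dev.tfvars`.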
- -Terraform backend configuration can be updated in `backend.conf`, and common cluster configuration variables in the environment's `.tfvars` file (e.g. `dev.tfvars`). - -Example folder/file structure for defining multiple clusters: - - ├── examples/advanced - │   └── live - │       └── preprod - │           └── eu-west-1 - │               └── application - │                   └── dev - │                       └── backend.conf - │                       └── dev.tfvars - │                       └── main.tf - │                       └── variables.tf - │                       └── outputs.tf - │                   └── test - │                       └── backend.conf - │                       └── test.tfvars - │       └── prod - │           └── eu-west-1 - │               └── application - │                   └── prod - │                       └── backend.conf - │                       └── prod.tfvars - │                       └── main.tf - │                       └── variables.tf - │                       └── outputs.tf - - -## Important Note - -If you are using an existing VPC, you need to ensure that the following tags are added to the VPC and subnet resources: - -Add tags to the **VPC** -```hcl - Key   = "kubernetes.io/cluster/${local.cluster_id}" - Value = "shared" -``` - -Add tags to the **Public Subnets** (tagging requirement) -```hcl - public_subnet_tags = { - "kubernetes.io/cluster/${local.cluster_id}" = "shared" - "kubernetes.io/role/elb"                    = "1" - } -``` - -Add tags to the **Private Subnets** (tagging requirement) -```hcl - private_subnet_tags = { - "kubernetes.io/cluster/${local.cluster_id}" = "shared" - "kubernetes.io/role/internal-elb"           = "1" - } -``` diff --git a/docs/advanced/private-clusters.md b/docs/advanced/private-clusters.md deleted file mode 100644 index 3bfec2b12b..0000000000 --- a/docs/advanced/private-clusters.md +++ /dev/null @@ -1,14 +0,0 @@ -# Private Clusters - -Fully private EKS clusters require the following VPC endpoints to be created in order to communicate with AWS services. This module will create these endpoints if you choose to create the VPC. If you are using an existing VPC, then you may need to ensure these endpoints are created. - -    com.amazonaws.region.aps-workspaces            - For AWS Managed Prometheus Workspace -    com.amazonaws.region.ssm                       - AWS Systems Manager -    com.amazonaws.region.ec2 -    com.amazonaws.region.ecr.api -    com.amazonaws.region.ecr.dkr -    com.amazonaws.region.logs                      – For CloudWatch Logs -    com.amazonaws.region.sts                       – If using AWS Fargate or IAM roles for service accounts -    com.amazonaws.region.elasticloadbalancing      – If using Application Load Balancers -    com.amazonaws.region.autoscaling               – If using Cluster Autoscaler -    com.amazonaws.region.s3                        – Creates S3 gateway diff --git a/docs/blueprints/agones-game-controller.md b/docs/blueprints/agones-game-controller.md new file mode 100644 index 0000000000..f1df1041e1 --- /dev/null +++ b/docs/blueprints/agones-game-controller.md @@ -0,0 +1,7 @@ +--- +title: Agones Game Controller +--- + +{% + include-markdown "../../examples/agones-game-controller/README.md" +%} diff --git a/docs/blueprints/appmesh-mtls.md b/docs/blueprints/appmesh-mtls.md new file mode 100644 index 0000000000..ee9400a3ad --- /dev/null +++ b/docs/blueprints/appmesh-mtls.md @@ -0,0 +1,7 @@ +--- +title: AWS AppMesh mTLS +--- + +{% + include-markdown "../../examples/appmesh-mtls/README.md" +%} diff --git a/docs/blueprints/argocd.md b/docs/blueprints/argocd.md new file mode 100644 index 0000000000..b91434e55c --- /dev/null +++ b/docs/blueprints/argocd.md @@ -0,0 +1,7 @@ +--- +title: ArgoCD +--- + +{% + include-markdown "../../examples/argocd/README.md" +%} diff --git a/docs/blueprints/blue-green-upgrade.md b/docs/blueprints/blue-green-upgrade.md new file mode 100644 index 0000000000..9c79037e66 --- /dev/null +++ b/docs/blueprints/blue-green-upgrade.md @@ -0,0 +1,7 @@ +--- +title: Blue/Green Migration +--- + +{% + include-markdown "../../examples/blue-green-upgrade/README.md" +%} diff --git
a/docs/blueprints/elastic-fabric-adapter.md b/docs/blueprints/elastic-fabric-adapter.md new file mode 100644 index 0000000000..2f2aba96aa --- /dev/null +++ b/docs/blueprints/elastic-fabric-adapter.md @@ -0,0 +1,7 @@ +--- +title: Elastic Fabric Adapter +--- + +{% + include-markdown "../../examples/elastic-fabric-adapter/README.md" +%} diff --git a/docs/blueprints/external-secrets.md b/docs/blueprints/external-secrets.md new file mode 100644 index 0000000000..ff7ee31c5b --- /dev/null +++ b/docs/blueprints/external-secrets.md @@ -0,0 +1,7 @@ +--- +title: External Secrets +--- + +{% + include-markdown "../../examples/external-secrets/README.md" +%} diff --git a/docs/blueprints/fargate-serverless.md b/docs/blueprints/fargate-serverless.md new file mode 100644 index 0000000000..fe97d2784f --- /dev/null +++ b/docs/blueprints/fargate-serverless.md @@ -0,0 +1,7 @@ +--- +title: Fargate Serverless +--- + +{% + include-markdown "../../examples/fargate-serverless/README.md" +%} diff --git a/docs/blueprints/fully-private-cluster.md b/docs/blueprints/fully-private-cluster.md new file mode 100644 index 0000000000..8133b74cad --- /dev/null +++ b/docs/blueprints/fully-private-cluster.md @@ -0,0 +1,7 @@ +--- +title: Fully Private Cluster +--- + +{% + include-markdown "../../examples/fully-private-cluster/README.md" +%} diff --git a/docs/blueprints/ipv4-prefix-delegation.md b/docs/blueprints/ipv4-prefix-delegation.md new file mode 100644 index 0000000000..463d041fca --- /dev/null +++ b/docs/blueprints/ipv4-prefix-delegation.md @@ -0,0 +1,7 @@ +--- +title: IPv4 Prefix Delegation +--- + +{% + include-markdown "../../examples/ipv4-prefix-delegation/README.md" +%} diff --git a/docs/blueprints/ipv6-eks-cluster.md b/docs/blueprints/ipv6-eks-cluster.md new file mode 100644 index 0000000000..2befa8b970 --- /dev/null +++ b/docs/blueprints/ipv6-eks-cluster.md @@ -0,0 +1,7 @@ +--- +title: IPv6 Networking +--- + +{% + include-markdown "../../examples/ipv6-eks-cluster/README.md" +%} diff --git a/docs/blueprints/karpenter.md b/docs/blueprints/karpenter.md new file mode 100644 index 0000000000..85288dd56d --- /dev/null +++ b/docs/blueprints/karpenter.md @@ -0,0 +1,7 @@ +--- +title: Karpenter +--- + +{% + include-markdown "../../examples/karpenter/README.md" +%} diff --git a/docs/blueprints/multi-tenancy-with-teams.md b/docs/blueprints/multi-tenancy-with-teams.md new file mode 100644 index 0000000000..329558c272 --- /dev/null +++ b/docs/blueprints/multi-tenancy-with-teams.md @@ -0,0 +1,7 @@ +--- +title: Multi-Tenancy w/ Teams +--- + +{% + include-markdown "../../examples/multi-tenancy-with-teams/README.md" +%} diff --git a/docs/blueprints/stateful.md b/docs/blueprints/stateful.md new file mode 100644 index 0000000000..ec7ab17e48 --- /dev/null +++ b/docs/blueprints/stateful.md @@ -0,0 +1,7 @@ +--- +title: Stateful +--- + +{% + include-markdown "../../examples/stateful/README.md" +%} diff --git a/docs/blueprints/tls-with-aws-pca-issuer.md b/docs/blueprints/tls-with-aws-pca-issuer.md new file mode 100644 index 0000000000..8335529c4c --- /dev/null +++ b/docs/blueprints/tls-with-aws-pca-issuer.md @@ -0,0 +1,7 @@ +--- +title: TLS w/ AWS PCA Issuer +--- + +{% + include-markdown "../../examples/tls-with-aws-pca-issuer/README.md" +%} diff --git a/docs/blueprints/vpc-cni-custom-networking.md b/docs/blueprints/vpc-cni-custom-networking.md new file mode 100644 index 0000000000..80b9725a31 --- /dev/null +++ b/docs/blueprints/vpc-cni-custom-networking.md @@ -0,0 +1,7 @@ +--- +title: VPC CNI Custom Networking +--- + +{% + include-markdown 
"../../examples/vpc-cni-custom-networking/README.md" +%} diff --git a/docs/blueprints/wireguard-with-cilium.md b/docs/blueprints/wireguard-with-cilium.md new file mode 100644 index 0000000000..733fd4f9e2 --- /dev/null +++ b/docs/blueprints/wireguard-with-cilium.md @@ -0,0 +1,7 @@ +--- +title: Wireguard /w Cilium +--- + +{% + include-markdown "../../examples/wireguard-with-cilium/README.md" +%} diff --git a/docs/core-concepts.md b/docs/core-concepts.md deleted file mode 100644 index 7798d11bf9..0000000000 --- a/docs/core-concepts.md +++ /dev/null @@ -1,37 +0,0 @@ -# Core Concepts - -This document provides a high level overview of the Core Concepts that are embedded in EKS Blueprints. For the purposes of this document, we will assume the reader is familiar with Git, Docker, Kubernetes and AWS. - -| Concept | Description | -| --------------------------- | --------------------------------------------------------------------------------------------- | -| [Cluster](#cluster) | An Amazon EKS Cluster and associated worker groups. | -| [Add-on](#add-on) | Operational software that provides key functionality to support your Kubernetes applications. | -| [Team](#team) | A logical grouping of IAM identities that have access to Kubernetes resources. | -| Pipeline | Continuous Delivery pipelines for deploying `clusters` and `add-ons`. | -| [Application](#application) | An application that runs within an EKS Cluster. | - -## Cluster - -A `cluster` is simply an EKS cluster. EKS Blueprints provides for customizing the compute options you leverage with your `clusters`. The framework currently supports `EC2`, `Fargate` and `BottleRocket` instances. It also supports managed and self-managed node groups. - -We rely on [`terraform-aws-modules/eks/aws`](https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/latest) to configure `clusters`. See our [examples](getting-started.md) to see how `terraform-aws-modules/eks/aws` is configured for EKS Blueprints. - -## Add-on - -`Add-ons` allow you to configure the operational tools that you would like to deploy into your EKS cluster. When you configure `add-ons` for a `cluster`, the `add-ons` will be provisioned at deploy time by leveraging the Terraform Helm provider. Add-ons can deploy both Kubernetes specific resources and AWS resources needed to support add-on functionality. - -For example, the `metrics-server` add-on only deploys the Kubernetes manifests that are needed to run the Kubernetes Metrics Server. By contrast, the `aws-load-balancer-controller` add-on deploys both Kubernetes YAML, in addition to creating resources via AWS APIs that are needed to support the AWS Load Balancer Controller functionality. - -EKS Blueprints allows you to manage your add-ons directly via Terraform (by leveraging the Terraform Helm provider) or via GitOps with ArgoCD. See our [`Add-ons`](add-ons/index.md) documentation page for detailed information. - -## Team - -`Teams` allow you to configure the logical grouping of users that have access to your EKS clusters, in addition to the access permissions they are granted. EKS Blueprints currently supports two types of `teams`: `application-team` and `platform-team`. `application-team` members are granted access to specific namespaces. `platform-team` members are granted administrative access to your clusters. - -See our [`Teams`](teams.md) documentation page for detailed information. - -## Application - -`Applications` represent the actual workloads that run within a Kubernetes cluster. 
The framework leverages a GitOps approach for deploying applications onto clusters. - -See our [`Applications`](https://aws-ia.github.io/terraform-aws-eks-blueprints/main/add-ons/argocd/#bootstrapping) documentation for detailed information. diff --git a/docs/extensibility.md b/docs/extensibility.md deleted file mode 100644 index 1e0b91a797..0000000000 --- a/docs/extensibility.md +++ /dev/null @@ -1,277 +0,0 @@ -# Extensibility - -This guide provides an overview of extensibility options focusing on add-on extensions as the primary mechanism for the partners and customers. - -## Overview - -EKS Blueprints framework is designed to be extensible. In the context of this guide, extensibility refers to the ability of customers and partners to both add new capabilities to the framework or platforms as well as customize existing behavior, including the ability to modify or override existing behavior. - -As of this writing, the primary means by which customers and partners can extend the EKS Blueprints for Terraform framework is by implementing new add-ons which could be leveraged exactly the same way as the core add-ons (supplied by the framework). - -### Add-on Extensions - -#### Helm Add-ons - -Helm add-ons are the most common case that generally combines provisioning of a helm chart as well as supporting infrastructure such as wiring of proper IAM policies for the Kubernetes service account, provisioning or configuring other AWS resources (VPC, subnets, node groups). - -In order to simplify the add-on creation, we have provided a helper module called [`helm-addon`](https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/main/modules/kubernetes-addons/helm-addon/README.md) for convenience. - -#### Non-helm Add-ons - -Add-ons that don't leverage helm but require to install arbitrary Kubernetes manifests will not be able to leverage the benefits provided by the [`helm-addon`](https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/main/modules/kubernetes-addons/helm-addon/README.md) however, they are still relatively easy to implement and would follow a similar pattern. Such addons should leverage the [kubectl provider](https://registry.terraform.io/providers/gavinbunney/kubectl). - -### Public Add-ons - -The life-cycle of a public add-on should be decoupled from the life-cycle of the core framework repository. When decoupled, extensions can be released at any arbitrary cadence specific to the extension, enabling better agility when it comes to new features or bug fixes. The owner of such public add-on is ultimately responsible for the quality and maintenance of the add-on. - -In order to enable this model the following workflow outline steps required to create and release a public add-on: - -1. Public add-on are created in a separate repository. Public GitHub repository is preferred as it aligns with the open-source spirit of the framework and enables external reviews/feedback. -2. Add-ons are released and consumed as distinct public Terraform modules. -3. Public add-ons are expected to have sufficient documentation to allow customers to consume them independently. Documentation can reside in GitHub or external resources referenced in the documentation bundled with the extension. -4. Public add-ons are expected to be tested and validated against released EKS Blueprints versions, e.g. with a CI/CD pipeline or GitHub Actions. 
- -### Partner Add-ons - -Partner extensions (APN Partner) are expected to comply with the public extension workflow and additional items required to ensure proper validation and documentation support for a partner extension. - -We expect 2 PRs to be created for every Partner Add-On. - -1. A PR against the main [EKS Blueprints](https://github.com/aws-ia/terraform-aws-eks-blueprints) repository that contains the following: - 1. Update [kubernetes-addons/main.tf](https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/main/modules/kubernetes-addons/main.tf) to add a module invocation of the remote terraform module for the add-on. - 2. Documentation to update the [Add-Ons](./add-ons/index.md) section. Example of add-on documentation can be found here along with the list of other add-ons. -2. A second PR against the [EKS Blueprints Add-Ons](https://github.com/aws-samples/eks-blueprints-add-ons) repository to create an ArgoCD application for your add-on. See example of other add-ons that shows what should be added. Add-ons that do not provide GitOps support are not expected to create this PR. - -### Private Add-ons - -There are two ways in which a customer can implement fully private add-ons: - -1. Add-ons specific to a customer instance of EKS Blueprints can be implemented inline with the blueprint in the same codebase. Such extensions are scoped to the customer base. Forking the repo however has disadvantages when it comes to ongoing feature releases and bug fixes which will have to be manually ported to your fork. -2. We recommend, you implement a separate repository for your private add-on while still using the upstream framework. This gives you the advantage of keeping up with ongoing feature releases and bug fixes while keeping your add-on private. - -The following example shows you can leverage EKS Blueprints to provide your own helm add-on. 
- -```hcl -#--------------------------------------------------------------- -# AWS VPC CNI Metrics Helper -# This is using local helm chart -#--------------------------------------------------------------- - -data "aws_partition" "current" {} - -data "aws_caller_identity" "current" {} - - -locals { - cni_metrics_name = "cni-metrics-helper" - - default_helm_values = [templatefile("${path.module}/helm-values/cni-metrics-helper-values.yaml", { - eks_cluster_id = var.eks_cluster_id, - image = "602401143452.dkr.ecr.${var.region}.amazonaws.com/cni-metrics-helper:v1.10.3", - sa-name = local.cni_metrics_name - oidc_url = "oidc.eks.eu-west-1.amazonaws.com/id/E6CASOMETHING55B9D01F7" - })] - - addon_context = { - aws_caller_identity_account_id = data.aws_caller_identity.current.account_id - aws_caller_identity_arn = data.aws_caller_identity.current.arn - aws_eks_cluster_endpoint = data.aws_eks_cluster.cluster.endpoint - aws_partition_id = data.aws_partition.current.partition - aws_region_name = var.region - eks_cluster_id = var.eks_cluster_id - eks_oidc_issuer_url = local.oidc_url - eks_oidc_provider_arn = "arn:${data.aws_partition.current.partition}:iam::${data.aws_caller_identity.current.account_id}:oidc-provider/${local.oidc_url}" - tags = {} - } - - helm_config = { - name = local.cni_metrics_name - description = "CNI Metrics Helper Helm Chart" - timeout = "300" - chart = "${path.module}/local-helm-charts/cni-metrics-helper" - version = "0.1.7" - repository = null - namespace = "kube-system" - lint = false - values = local.default_helm_values - } - - irsa_config = { - kubernetes_namespace = "kube-system" - kubernetes_service_account = local.cni_metrics_name - create_kubernetes_namespace = false - create_kubernetes_service_account = true - irsa_iam_policies = [aws_iam_policy.cni_metrics.arn] - } -} - -module "helm_addon" { - source = "github.com/aws-ia/terraform-aws-eks-blueprints//modules/kubernetes-addons/helm-addon" - helm_config = local.helm_config - irsa_config = local.irsa_config - addon_context = local.addon_context -} - -resource "aws_iam_policy" "cni_metrics" { - name = "${var.eks_cluster_id}-cni-metrics" - description = "IAM policy for EKS CNI Metrics helper" - path = "/" - policy = data.aws_iam_policy_document.cni_metrics.json - - tags = var.tags -} - -data "aws_iam_policy_document" "cni_metrics" { - statement { - sid = "CNIMetrics" - actions = [ - "cloudwatch:PutMetricData" - ] - resources = ["*"] - } -} -``` - -### Secrets Handling - -We expect that certain add-ons will need to provide access to sensitive values to their helm chart configuration such as password, license keys, API keys, etc. We recommend that you ask customers to store such secrets in an external secret store such as AWS Secrets Manager or AWS Systems Manager Parameter Store and use the [AWS Secrets and Configuration Provider (ASCP)](https://docs.aws.amazon.com/secretsmanager/latest/userguide/integrating_csi_driver.html) to mount the secrets as files or environment variables in the pods of your add-on. We are actively working on providing a native add-on for ASCP as of this writing which you will be able to leverage for your add-on. 
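Where a sensitive value is consumed by Terraform itself rather than mounted into pods via ASCP, one alternative is to read it from AWS Secrets Manager with a data source and pass it to the chart as a sensitive value. A minimal sketch, assuming the secret already exists and that the secret name, chart, repository, and value key are placeholders:

```hcl
# Minimal sketch: read an existing secret and hand it to a Helm chart as a
# sensitive value. Secret name, chart, repository, and value key are placeholders.
data "aws_secretsmanager_secret_version" "addon_license" {
  secret_id = "my-addon/license-key" # assumed to already exist in Secrets Manager
}

resource "helm_release" "my_addon" {
  name       = "my-addon"
  repository = "https://charts.example.com" # placeholder repository
  chart      = "my-addon"
  namespace  = "my-addon-system"

  set_sensitive {
    name  = "licenseKey" # placeholder chart value
    value = data.aws_secretsmanager_secret_version.addon_license.secret_string
  }
}
```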
- -## Example Public Add-On - -[Kube-state-metrics-addon](https://registry.terraform.io/modules/askulkarni2/kube-state-metrics-addon/eksblueprints/latest) extension contains a sample implementation of the [`kube-state-metrics`](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-state-metrics) that demonstrates how to write a public add-on that lives outside of the core repo. - -### Add-On Repo - -We recommend the use of pattern `terraform-eksblueprints-` as the name of the repo so that you are able to easily publish the module to Terraform [registry](https://registry.terraform.io/). See [kube-state-metrics](https://github.com/askulkarni2/terraform-eksblueprints-kube-state-metrics-addon) for an example. - -### Add-On Code - -We recommend your add-on code follow Terraform standards for best practices for organizing your code, such as.. - -```sh -. -├── CODE_OF_CONDUCT.md -├── CONTRIBUTING.md -├── LICENSE -├── README.md -├── blueprints -│ ├── README.md -│ ├── addons -│ │ ├── README.md -│ │ ├── addons.tfbackend -│ │ ├── backend.tf -│ │ ├── data.tf -│ │ ├── main.tf -│ │ ├── providers.tf -│ │ └── variables.tf -│ ├── eks -│ │ ├── README.md -│ │ ├── backend.tf -│ │ ├── data.tf -│ │ ├── eks.tfbackend -│ │ ├── main.tf -│ │ ├── outputs.tf -│ │ ├── providers.tf -│ │ └── variables.tf -│ ├── vars -│ │ └── config.tfvars -│ └── vpc -│ ├── README.md -│ ├── backend.tf -│ ├── data.tf -│ ├── locals.tf -│ ├── main.tf -│ ├── outputs.tf -│ ├── providers.tf -│ ├── variables.tf -│ └── vpc.tfbackend -├── locals.tf -├── main.tf -├── outputs.tf -├── values.yaml -└── variables.tf -``` - -In the above code tree, - -- The root directory contains your add-on code. -- The blueprints code contains the code that demonstrates how customers can use your add-on with the EKS Blueprints framework. Here, we highly recommend that you show the true value add of your add-on through the pattern. Customers will benefit the most where the example shows how they can integrate their workload with your add-on. - -If your add-on can be deployed via helm chart, we recommend the use of the [helm-addon](https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/main/modules/kubernetes-addons/helm-addon) as shown below. - -**Note**: Use the latest published module in the source version. - -> main.tf - -```hcl -module "helm_addon" { - source = "github.com/aws-ia/terraform-aws-eks-blueprints//modules/kubernetes-addons/helm-addon?ref=v3.5.0" - manage_via_gitops = var.manage_via_gitops - - ### The following values are defined in locals.tf - set_values = local.set_values - set_sensitive_values = local.set_sensitive_values - helm_config = local.helm_config - addon_context = var.addon_context -} -``` - -### Core Repo Changes - -Once you have tested your add-on locally against your fork of the core repo, please open a PR that contains the following: - -> Update to [`kubernetes-addons/main.tf`](https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/main/modules/kubernetes-addons/main.tf) with a code block that invokes your add-on. E.g. - -```hcl -module "kube_state_metrics" { - count = var.enable_kube_state_metrics ? 
1 : 0 - source = "askulkarni2/kube-state-metrics-addon/eksblueprints" - version = "0.0.2" - helm_config = var.kube_state_metrics_helm_config - addon_context = local.addon_context - manage_via_gitops = var.argocd_manage_add_ons -} -``` - -> Update to [`kubernetes-addons/variables.tf`](https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/main/modules/kubernetes-addons/variables.tf) to accept parameters for your add-on. E.g. - -```hcl -#-----------Kube State Metrics ADDON------------- -variable "enable_kube_state_metrics" { - type = bool - default = false - description = "Enable Kube State Metrics add-on" -} - -variable "kube_state_metrics_helm_config" { - type = any - default = {} - description = "Kube State Metrics Helm Chart config" -} -``` - -- Add documentation under add-on [`docs`](https://github.com/aws-ia/terraform-aws-eks-blueprints/tree/main/docs/add-ons/) that gives an overview of your add-on and points the customer to the actual documentation which would live in your add-on repo. - -### GitOps - -If your add-on can be managed via ArgoCD GitOps, then - -- Provide the `argo_gitops_config` as an output of your add-on module as shown [here](https://github.com/askulkarni2/terraform-eksblueprints-kube-state-metrics-addon/blob/main/outputs.tf). - -> outputs.tf - -```hcl -output "argocd_gitops_config" { - description = "Configuration used for managing the add-on with ArgoCD" - value = var.manage_via_gitops ? local.argocd_gitops_config : null -} -``` - -- In the PR against the core repo, update [`kubernetes-addons/locals.tf`](https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/main/modules/kubernetes-addons/locals.tf) to provide the add-on module output `argocd_gitops_config` to the `argocd_add_on_config` as shown for others. - -- Open a PR against the [eks-blueprints-addons](https://github.com/aws-samples/eks-blueprints-add-ons) repo with the following changes: - - - Create a wrapper Helm chart for your add-on similar to [kube-state-metrics](https://github.com/aws-samples/eks-blueprints-add-ons/tree/main/add-ons/kube-state-metrics) - - Create a [`Chart.yaml`](https://github.com/aws-samples/eks-blueprints-add-ons/blob/main/add-ons/kube-state-metrics/Chart.yaml) which points to the location of your actual helm chart. - - Create a [`values.yaml`](https://github.com/aws-samples/eks-blueprints-add-ons/blob/main/add-ons/kube-state-metrics/values.yaml) which contains a default best-practice configuration for your add-on. - - Create an ArgoCD application [template](https://github.com/aws-samples/eks-blueprints-add-ons/blob/main/chart/templates/kube-state-metrics.yaml) which is applied if `enable_ = true` is used by the customer in the consumer module. This also used to parameterize your add-ons helm chart wrapper with values that will be passed over from Terraform to Helm using the [GitOps bridge](./add-ons/index.md#gitops-bridge). 
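For reference, the `argocd_gitops_config` output shown above typically just surfaces a local map of values that the add-on's ArgoCD wrapper chart consumes through the GitOps bridge. The exact keys are defined by your wrapper chart; the shape below is a hypothetical sketch only.

```hcl
# Hypothetical sketch of the local surfaced by the `argocd_gitops_config` output.
# The keys are dictated by your add-on's wrapper Helm chart; `enable` and the
# service account name here are illustrative, not prescribed by the framework.
locals {
  argocd_gitops_config = {
    enable             = true
    serviceAccountName = "kube-state-metrics-sa" # illustrative
  }
}
```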
diff --git a/docs/getting-started.md b/docs/getting-started.md index a40d0ed0c5..abe0e68db3 100644 --- a/docs/getting-started.md +++ b/docs/getting-started.md @@ -70,7 +70,7 @@ fargate-ip-10-0-10-71.us-west-2.compute.internal Ready 2m48s v1 To teardown and remove the resources created in this example: ```sh -terraform destroy -target="module.eks_blueprints_kubernetes_addons" -auto-approve +terraform destroy -target="module.eks_blueprints_addons" -auto-approve terraform destroy -target="module.eks" -auto-approve terraform destroy -auto-approve ``` diff --git a/docs/images/colored-logo.png b/docs/images/colored-logo.png new file mode 100644 index 0000000000..d49129e3f8 Binary files /dev/null and b/docs/images/colored-logo.png differ diff --git a/docs/images/white-logo.png b/docs/images/white-logo.png new file mode 100644 index 0000000000..004fcf1aef Binary files /dev/null and b/docs/images/white-logo.png differ diff --git a/docs/index.md b/docs/index.md index fbb33c51af..4129e67ee0 100644 --- a/docs/index.md +++ b/docs/index.md @@ -1,39 +1,3 @@ -# Amazon EKS Blueprints for Terraform - -![GitHub](https://img.shields.io/github/license/aws-ia/terraform-aws-eks-blueprints) - -Welcome to Amazon EKS Blueprints for Terraform! - -This repository contains a collection of Terraform modules that aim to make it easier and faster for customers to adopt [Amazon EKS](https://aws.amazon.com/eks/). - -## What is EKS Blueprints - -EKS Blueprints helps you compose complete EKS clusters that are fully bootstrapped with the operational software that is needed to deploy and operate workloads. With EKS Blueprints, you describe the configuration for the desired state of your EKS environment, such as the control plane, worker nodes, and Kubernetes add-ons, as an IaC blueprint. Once a blueprint is configured, you can use it to stamp out consistent environments across multiple AWS accounts and Regions using continuous deployment automation. - -You can use EKS Blueprints to easily bootstrap an EKS cluster with Amazon EKS add-ons as well as a wide range of popular open-source add-ons, including Prometheus, Karpenter, Nginx, Traefik, AWS Load Balancer Controller, Fluent Bit, Keda, ArgoCD, and more. EKS Blueprints also helps you implement relevant security controls needed to operate workloads from multiple teams in the same cluster. - -## Examples - -To view a library of examples for how you can leverage `terraform-aws-eks-blueprints`, please see our [examples](https://github.com/aws-ia/terraform-aws-eks-blueprints/tree/main/examples). - -## Workshop -We maintain a hands-on self-paced workshop, the [EKS Blueprints for Terraform workshop](https://catalog.workshops.aws/eks-blueprints-terraform/en-US) helps you with foundational setup of your EKS cluster, and it gradually adds complexity via existing and new modules. - -![EKS Blueprints for Terraform](https://static.us-east-1.prod.workshops.aws/public/6ad9b13b-df6a-4609-a586-fd2b7f25863c/static/eks_cluster_1.svg) - - -## Motivation - -Kubernetes is a powerful and extensible container orchestration technology that allows you to deploy and manage containerized applications at scale. The extensible nature of Kubernetes also allows you to use a wide range of popular open-source tools, commonly referred to as add-ons, in Kubernetes clusters. With such a large number of tooling and design choices available however, building a tailored EKS cluster that meets your application’s specific needs can take a significant amount of time. 
It involves integrating a wide range of open-source tools and AWS services and requires deep expertise in AWS and Kubernetes. - -AWS customers have asked for examples that demonstrate how to integrate the landscape of Kubernetes tools and make it easy for them to provision complete, batteries-included EKS clusters that meet specific application requirements. EKS Blueprints was built to address this customer need. You can use EKS Blueprints to configure and deploy purpose built EKS clusters, and start onboarding workloads in days, rather than months. - -## What can I do with this Solution? - -Customers can use this solution to easily architect and deploy complete, opinionated EKS clusters. Specifically, customers can leverage the eks-blueprints module to: - -- Deploy Well-Architected EKS clusters across any number of accounts and regions. -- Manage cluster configuration, including add-ons that run in each cluster, from a single Git repository. -- Define teams, namespaces, and their associated access permissions for your clusters. -- Leverage GitOps-based workflows for onboarding and managing workloads for your teams. -- Create Continuous Delivery (CD) pipelines that are responsible for deploying your infrastructure. +{% + include-markdown "../README.md" +%} diff --git a/docs/internal/.pages b/docs/internal/.pages deleted file mode 100644 index e2d5ae9127..0000000000 --- a/docs/internal/.pages +++ /dev/null @@ -1 +0,0 @@ -hide: true diff --git a/docs/modules/emr-on-eks.md b/docs/modules/emr-on-eks.md deleted file mode 100644 index e323b3a77e..0000000000 --- a/docs/modules/emr-on-eks.md +++ /dev/null @@ -1,37 +0,0 @@ -# EMR on EKS - -EMR on EKS is a deployment option in EMR that allows you to automate the provisioning and management of open-source big data frameworks on EKS. -This module deploys the necessary resources to run EMR Spark Jobs on EKS Cluster. - -- Create a new Namespace to run Spark workloads -- Create K8s Role and Role Binding to allow the username `emr-containers` on a given namespace(`spark`) -- Create RBAC permissions and adding EMR on EKS service-linked role into aws-auth configmap -- Enables IAM Roles for Service Account (IRSA) -- Update trust relationship for job execution role - -## Usage - -[EMR on EKS](https://github.com/aws-ia/terraform-aws-eks-blueprints/tree/main/modules/emr-on-eks) can be deployed by enabling the module via the following. - -Checkout this [Blog](https://aws.amazon.com/blogs/mt/monitoring-amazon-emr-on-eks-with-amazon-managed-prometheus-and-amazon-managed-grafana/) to setup Observability for EMR on EKS Spark Jobs - -```hcl - #--------------------------------------- - # ENABLE EMR ON EKS - #--------------------------------------- - enable_emr_on_eks = true - emr_on_eks_teams = { - emr-team-a = { - namespace = "emr-data-team-a" - job_execution_role = "emr-eks-data-team-a" - additional_iam_policies = [""] - } - emr-team-b = { - namespace = "emr-data-team-b" - job_execution_role = "emr-eks-data-team-b" - additional_iam_policies = [""] - } - } -``` - -Once deployed, you can create Virtual EMR Cluster and execute Spark jobs. See the [document](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/setting-up-registration.html) below for more details. diff --git a/docs/teams.md b/docs/teams.md deleted file mode 100644 index 739d3ca8d8..0000000000 --- a/docs/teams.md +++ /dev/null @@ -1,118 +0,0 @@ -# Teams - -## Introduction - -EKS Blueprints provides support for onboarding and managing teams and easily configuring cluster access. 
We currently support two `Team` types: `application_teams` and `platform_teams`. - -`Application Teams` represent teams managing workloads running in cluster namespaces and `Platform Teams` represents platform administrators who have admin access (masters group) to clusters. - -You can reference the [aws-eks-teams](https://github.com/aws-ia/terraform-aws-eks-blueprints/tree/main/modules/aws-eks-teams) module to create your own team implementations. - -### ApplicationTeam - -To create an `application_team` for your cluster, you will need to supply a team name, with the options to pass map of labels, map of resource quotas, existing IAM entities (user/roles), and a directory where you may optionally place any policy definitions and generic manifests for the team. These manifests will be applied by EKS Blueprints and will be outside of the team control. - -**NOTE:** When the manifests are applied, namespaces are not checked. Therefore, you are responsible for namespace settings in the yaml files. - -> As of today (2020-05-01), resource `kubernetes_manifest` can only be used (`terraform plan/apply...`) only after the cluster has been created and the cluster API can be accessed. Read ["Before you use this resource"](https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/resources/manifest#before-you-use-this-resource) section for more information. - -To overcome this limitation, you can add/enable `manifests_dir` after you applied and created the cluster first. We are working on a better solution for this. - -#### Application Team Example - -```hcl - # EKS Application Teams - - application_teams = { - # First Team - team-blue = { - "labels" = { - "appName" = "example", - "projectName" = "example", - "environment" = "example", - "domain" = "example", - "uuid" = "example", - } - "quota" = { - "requests.cpu" = "1000m", - "requests.memory" = "4Gi", - "limits.cpu" = "2000m", - "limits.memory" = "8Gi", - "pods" = "10", - "secrets" = "10", - "services" = "10" - } - manifests_dir = "./manifests" - # Belows are examples of IAM users and roles - users = [ - "arn:aws:iam::123456789012:user/blue-team-user", - "arn:aws:iam::123456789012:role/blue-team-sso-iam-role" - ] - } - - # Second Team - team-red = { - "labels" = { - "appName" = "example2", - "projectName" = "example2", - } - "quota" = { - "requests.cpu" = "2000m", - "requests.memory" = "8Gi", - "limits.cpu" = "4000m", - "limits.memory" = "16Gi", - "pods" = "20", - "secrets" = "20", - "services" = "20" - } - manifests_dir = "./manifests2" - users = [ - - "arn:aws:iam::123456789012:role/other-sso-iam-role" - ] - } - } -``` - -EKS Blueprints will do the following for every provided team: - -- Create a namespace -- Register quotas -- Register IAM users for cross-account access -- Create a shared role for cluster access. Alternatively, an existing role can be supplied. -- Register provided users/roles in the `aws-auth` configmap for `kubectl` and console access to the cluster and namespace. -- (Optionally) read all additional manifests (e.g., network policies, OPA policies, others) stored in a provided directory, and apply them. - -### PlatformTeam - -To create an `Platform Team` for your cluster, simply use `platform_teams`. You will need to supply a team name and and all users/roles. 
- -#### Platform Team Example - -```hcl - platform_teams = { - admin-team-name-example = { - users = [ - "arn:aws:iam::123456789012:user/admin-user", - "arn:aws:iam::123456789012:role/org-admin-role" - ] - } - } -``` - -`Platform Team` does the following: - -- Registers IAM users for admin access to the cluster (`kubectl` and console). -- Registers an existing role (or create a new role) for cluster access with trust relationship with the provided/created role. - -## Cluster Access (`kubectl`) - -The output will contain the IAM roles for every application(`application_teams_iam_role_arn`) or platform team(`platform_teams_iam_role_arn`). - -To update your kubeconfig, you can run the following command: - -``` -aws eks update-kubeconfig --name ${eks_cluster_id} --region ${AWS_REGION} --role-arn ${TEAM_ROLE_ARN} -``` - -Make sure to replace the `${eks_cluster_id}`, `${AWS_REGION}` and `${TEAM_ROLE_ARN}` with the actual values. diff --git a/eks-worker.tf b/eks-worker.tf deleted file mode 100644 index ec9a5146ef..0000000000 --- a/eks-worker.tf +++ /dev/null @@ -1,44 +0,0 @@ -# --------------------------------------------------------------------------------------------------------------------- -# MANAGED NODE GROUPS -# --------------------------------------------------------------------------------------------------------------------- - -module "aws_eks_managed_node_groups" { - source = "./modules/aws-eks-managed-node-groups" - - for_each = var.managed_node_groups - - managed_ng = each.value - context = local.node_group_context - - depends_on = [kubernetes_config_map.aws_auth] -} - -# --------------------------------------------------------------------------------------------------------------------- -# SELF MANAGED NODE GROUPS -# --------------------------------------------------------------------------------------------------------------------- - -module "aws_eks_self_managed_node_groups" { - source = "./modules/aws-eks-self-managed-node-groups" - - for_each = var.self_managed_node_groups - - self_managed_ng = each.value - context = local.node_group_context - - depends_on = [kubernetes_config_map.aws_auth] -} - -# --------------------------------------------------------------------------------------------------------------------- -# FARGATE PROFILES -# --------------------------------------------------------------------------------------------------------------------- - -module "aws_eks_fargate_profiles" { - source = "./modules/aws-eks-fargate-profiles" - - for_each = var.fargate_profiles - - fargate_profile = each.value - context = local.fargate_context - - depends_on = [kubernetes_config_map.aws_auth] -} diff --git a/examples/agones-game-controller/README.md b/examples/agones-game-controller/README.md index 1a2d48f861..79414773a0 100644 --- a/examples/agones-game-controller/README.md +++ b/examples/agones-game-controller/README.md @@ -1,4 +1,4 @@ -# Amazon EKS Deployment with Agones Gaming Kubernetes Controller +# Agones Game Controller on Amazon EKS This example shows how to deploy and run Gaming applications on Amazon EKS with Agones Kubernetes Controller diff --git a/examples/agones-game-controller/main.tf b/examples/agones-game-controller/main.tf index 8af1ca8b14..14c432ad85 100644 --- a/examples/agones-game-controller/main.tf +++ b/examples/agones-game-controller/main.tf @@ -5,19 +5,27 @@ provider "aws" { provider "kubernetes" { host = module.eks.cluster_endpoint cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data) - token = 
data.aws_eks_cluster_auth.this.token + + exec { + api_version = "client.authentication.k8s.io/v1beta1" + command = "aws" + # This requires the awscli to be installed locally where Terraform is executed + args = ["eks", "get-token", "--cluster-name", module.eks.cluster_name] + } } provider "helm" { kubernetes { host = module.eks.cluster_endpoint cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data) - token = data.aws_eks_cluster_auth.this.token - } -} -data "aws_eks_cluster_auth" "this" { - name = module.eks.cluster_name + exec { + api_version = "client.authentication.k8s.io/v1beta1" + command = "aws" + # This requires the awscli to be installed locally where Terraform is executed + args = ["eks", "get-token", "--cluster-name", module.eks.cluster_name] + } + } } data "aws_availability_zones" "available" {} @@ -26,11 +34,12 @@ locals { name = basename(path.cwd) region = "us-west-2" - cluster_version = "1.24" - vpc_cidr = "10.0.0.0/16" azs = slice(data.aws_availability_zones.available.names, 0, 3) + gameserver_minport = 7000 + gameserver_maxport = 8000 + tags = { Blueprint = local.name GithubRepo = "github.com/aws-ia/terraform-aws-eks-blueprints" @@ -44,25 +53,19 @@ locals { #tfsec:ignore:aws-eks-enable-control-plane-logging module "eks" { source = "terraform-aws-modules/eks/aws" - version = "~> 19.12" + version = "~> 19.13" cluster_name = local.name - cluster_version = local.cluster_version + cluster_version = "1.27" cluster_endpoint_public_access = true - # EKS Addons - cluster_addons = { - coredns = {} - kube-proxy = {} - vpc-cni = {} - } - vpc_id = module.vpc.vpc_id subnet_ids = module.vpc.private_subnets eks_managed_node_groups = { initial = { instance_types = ["m5.large"] + subnet_ids = module.vpc.public_subnets min_size = 1 max_size = 5 @@ -70,54 +73,81 @@ module "eks" { } } + cluster_security_group_additional_rules = { + ingress_gameserver_tcp = { + description = "Nodes on ephemeral ports" + protocol = "tcp" + from_port = local.gameserver_minport + to_port = local.gameserver_maxport + type = "ingress" + cidr_blocks = ["0.0.0.0/0"] + ipv6_cidr_blocks = ["::/0"] + } + } + tags = local.tags } ################################################################################ -# Kubernetes Addons +# EKS Blueprints Addons ################################################################################ -module "eks_blueprints_kubernetes_addons" { - source = "../../modules/kubernetes-addons" +module "eks_blueprints_addons" { + source = "aws-ia/eks-blueprints-addons/aws" + version = "0.2.0" - eks_cluster_id = module.eks.cluster_name - eks_cluster_endpoint = module.eks.cluster_endpoint - eks_oidc_provider = module.eks.oidc_provider - eks_cluster_version = module.eks.cluster_version + cluster_name = module.eks.cluster_name + cluster_endpoint = module.eks.cluster_endpoint + cluster_version = module.eks.cluster_version + oidc_provider_arn = module.eks.oidc_provider_arn + + # EKS Add-Ons + eks_addons = { + coredns = {} + vpc-cni = {} + kube-proxy = {} + } # Add-ons enable_metrics_server = true enable_cluster_autoscaler = true - # NOTE: Agones requires a Node group in Public Subnets and enable Public IP - enable_agones = true - # Do not be fooled, this is required by the Agones addon - eks_worker_security_group_id = module.eks.cluster_security_group_id - agones_helm_config = { - name = "agones" - chart = "agones" - repository = "https://agones.dev/chart/stable" - version = "1.21.0" - namespace = "agones-system" - - values = 
[templatefile("${path.module}/helm_values/agones-values.yaml", { - expose_udp = true - gameserver_namespaces = "{${join(",", ["default", "xbox-gameservers", "xbox-gameservers"])}}" - gameserver_minport = 7000 - gameserver_maxport = 8000 - })] - } - tags = local.tags } +################################################################################ +# Agones Helm Chart +################################################################################ + +# NOTE: Agones requires a Node group in Public Subnets and enable Public IP +resource "helm_release" "agones" { + name = "agones" + chart = "agones" + version = "1.21.0" + repository = "https://agones.dev/chart/stable" + description = "Agones helm chart" + namespace = "agones-system" + create_namespace = true + + values = [templatefile("${path.module}/helm_values/agones-values.yaml", { + expose_udp = true + gameserver_namespaces = "{${join(",", ["default", "xbox-gameservers", "xbox-gameservers"])}}" + gameserver_minport = 7000 + gameserver_maxport = 8000 + })] + + depends_on = [ + module.eks_blueprints_addons + ] +} + ################################################################################ # Supporting Resources ################################################################################ module "vpc" { source = "terraform-aws-modules/vpc/aws" - version = "~> 4.0" + version = "~> 5.0" name = local.name cidr = local.vpc_cidr diff --git a/examples/agones-game-controller/versions.tf b/examples/agones-game-controller/versions.tf index 7d1c18b4b2..a4f611af01 100644 --- a/examples/agones-game-controller/versions.tf +++ b/examples/agones-game-controller/versions.tf @@ -6,13 +6,13 @@ terraform { source = "hashicorp/aws" version = ">= 4.47" } - kubernetes = { - source = "hashicorp/kubernetes" - version = ">= 2.17" - } helm = { source = "hashicorp/helm" - version = ">= 2.8" + version = ">= 2.9" + } + kubernetes = { + source = "hashicorp/kubernetes" + version = ">= 2.20" } } diff --git a/examples/amp-amg-opensearch/README.md b/examples/amp-amg-opensearch/README.md deleted file mode 100644 index a796475318..0000000000 --- a/examples/amp-amg-opensearch/README.md +++ /dev/null @@ -1,179 +0,0 @@ -# Observability pattern with EKS Cluster, Amazon Managed Prometheus, Amazon Managed Grafana and Amazon Open Search Service - -This example demonstrates how to use the Amazon EKS Blueprints for Terraform to deploy a new Amazon EKS Cluster with Prometheus server for metrics and AWS Fluent Bit for logs. Outside of the EKS cluster, it also provisions Amazon Managed Prometheus, Amazon OpenSearch Service within a VPC, and integrates Amazon Managed Prometheus with Amazon Managed Grafana. It also deploys a bastion host to let us test OpenSearch. Lastly, it includes a sample workload, provisioned with ArgoCD, to generate logs and metrics. - -Prometheus server collects these metrics and writes to remote Amazon Managed Prometheus endpoint via `remote write` config property. Amazon Managed Grafana is used to visualize the metrics in dashboards by leveraging Amazon Managed Prometheus workspace as a data source. - -AWS FluentBit Addon is configured to collect the container logs from EKS Cluster nodes and write to Amazon Open Search service. - -**NOTE** - -For the sake of simplicity in this example, we store sensitive information and credentials in `dev.tfvars`. This should not be done in a production environment. 
Instead, use an external secret store such as AWS Secrets Manager and use the [aws_secretsmanager_secret](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/secretsmanager_secret) data source to retrieve them. - -Checkout additional observability patterns at the [AWS Observability Accelerator for Terraform](https://github.com/aws-observability/terraform-aws-observability-accelerator) - -## How to Deploy - -### Prerequisites - -- Terraform -- An AWS Account -- kubectl -- awscli -- jq -- An existing Amazon Managed Grafana workspace. - - As of this writing (February 3, 2022), the AWS Terraform Provider does not support Amazon Managed Grafana, so it must be manually created beforehand. Follow the instructions [here](https://docs.aws.amazon.com/grafana/latest/userguide/getting-started-with-AMG.html) to deploy an Amazon Managed Grafana workspace. - -#### Generate a Grafana API Key - -- Give admin access to the SSO user you set up when creating the Amazon Managed Grafana Workspace: - - In the AWS Console, navigate to Amazon Grafana. In the left navigation bar, click **All workspaces**, then click on the workspace name you are using for this example. - - Under **Authentication** within **AWS Single Sign-On (SSO)**, click **Configure users and user groups** - - Check the box next to the SSO user you created and click **Make admin** -- Navigate back to the Grafana Dashboard. If you don't see the gear icon in the left navigation bar, log out and log back in. -- Click on the gear icon, then click on the **API keys** tab. -- Click **Add API key**, fill in the _Key name_ field and select _Admin_ as the Role. -- Copy your API key into `dev.tfvars` under `grafana_api_key` - -### Deployment Steps - -- Clone this repository: - -``` -git clone https://github.com/aws-ia/terraform-aws-eks-blueprints.git -``` - -- Initialize a working directory - -``` -terraform init -``` - -- Fill-in the values for the variables in `dev.tfvars` - - The password for OpenSearch must be a minimum of eight characters with at least one uppercase, one lowercase, one digit, and one special character. - - If the `AWSServiceRoleForAmazonElasticsearchService` role already exists in your account, set `create_iam_service_linked_role = false`. -- Verify the resources created by this execution: - -``` -export AWS_REGION= # Select your own region -terraform validate -terraform plan -var-file=dev.tfvars -``` - -- Deploy resources: - -``` -terraform apply -var-file=dev.tfvars --auto-approve -``` - -- Add the cluster to your kubeconfig: - -``` -aws eks --region $AWS_REGION update-kubeconfig --name aws001-preprod-observability-eks -``` - -`terraform apply` will provision a new EKS cluster with Fluent Bit, Prometheus, and a sample workload. It will also provision Amazon Managed Prometheus to ingest metrics from Prometheus, an Amazon OpenSearch service domain for ingesting logs from Fluent Bit, and a bastion host so we can test OpenSearch. - ---- - -**NOTE** - -This example automatically generates a key-pair for you and saves the private key to your current directory to make the next steps simpler. In production workloads, it is best practice to use your own key-pair instead of using Terraform to generate one for you. - ---- - -#### Verify that the Resources Deployed Successfully - -- Check that the bastion host we use to test OpenSearch is running in the EC2 Console. - -- Check that the status of OpenSearch is green: - Navigate to Amazon OpenSearch in the AWS Console and select the **opensearch** domain. 
Verify that _Cluster Health_ under _General Information_ lists Green. - -- Verify that Amazon Managed Prometheus workspace was created successfully: - - - Check the status of Amazon Managed Prometheus workspace through the AWS console. - -- Check that Prometheus Server is healthy: - - - The following command gets the pod that is running the Prometheus server and sets up port fowarding to http://localhost:8080 - - ``` - kubectl port-forward $(kubectl get pods --namespace=prometheus --selector='component=server' --output=name) 8080:9090 -n prometheus - ``` - - - Navigate to http://localhost:8080 and confirm that the dashboard webpage loads. - - Press `CTRL+C` to stop port forwarding. - -- To check that Fluent Bit is working: - - - Fluent Bit is provisioned properly if you see the option to add an index pattern while following the steps for the section below named **Set up an Index Pattern in OpenSearch to Explore Log Data** - -- Check that the sample workload is running: - - Run the command below, then navigate to http://localhost:4040 and confirm the webpage loads. - -``` -kubectl port-forward svc/guestbook-ui -n team-riker 4040:80 -``` - -#### Map the Fluent Bit Role as a Backend Role in OpenSearch - -OpenSearch roles are the core method for controlling access within your OpenSearch cluster. Backend roles are a method for mapping an external identity (such as an IAM role) to an OpenSearch role. Mapping the external identity to an OpenSearch role allows that identity to gain the permissions of that role. Here we map the Fluent Bit IAM role as a backend role to OpenSearch's _all_access_ role. This gives the Fluent Bit IAM role permission to send logs to OpenSearch. Read more about OpenSearch roles [here](https://opensearch.org/docs/latest/security-plugin/access-control/users-roles/). - -- In a different terminal window, navigate back to the example directory and establish and SSH tunnel from https://localhost:9200 to your OpenSearch Service domain through the bastion host: - - Because we provisioned OpenSearch within our VPC, we connect to a bastion host with an SSH tunnel to test and access our OpenSearch endpoints. Refer to the [Amazon OpenSearch Developer Guide](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/vpc.html#vpc-test) for more information. - - For connecting with additional access control using Amazon Cognito, see this [page](https://aws.amazon.com/premiumsupport/knowledge-center/opensearch-outside-vpc-ssh/). 
- -``` -export PRIVATE_KEY_FILE=bastion_host_private_key.pem -export BASTION_HOST_IP=$(terraform output -raw bastion_host_public_ip) -export OS_VPC_ENDPOINT=$(terraform output -raw opensearch_vpc_endpoint) -ssh -i $PRIVATE_KEY_FILE ec2-user@$BASTION_HOST_IP -N -L "9200:${OS_VPC_ENDPOINT}:443" -``` - -- Back in your first terminal window: - -``` -export BASTION_HOST_IP=$(terraform output -raw bastion_host_public_ip) -export OS_DOMAIN_USER=$(terraform output -raw opensearch_user) -export OS_DOMAIN_PASSWORD=$(terraform output -raw opensearch_pw) -export FLUENTBIT_ROLE="arn:aws:iam::$(aws sts get-caller-identity | jq -r '.Account'):role/aws001-preprod-observability-eks-aws-for-fluent-bit-sa-irsa" - -curl --insecure -sS -u "${OS_DOMAIN_USER}:${OS_DOMAIN_PASSWORD}" \ - -X PATCH \ - https://localhost:9200/_opendistro/_security/api/rolesmapping/all_access?pretty \ - -H 'Content-Type: application/json' \ - -d' -[ - { - "op": "add", "path": "/backend_roles", "value": ["'${FLUENTBIT_ROLE}'"] - } -] -' -``` - -#### Set up an Index Pattern in OpenSearch Dashboards to Explore Log Data - -You must set up an index pattern before you can explore data in the OpenSearch Dashboards. An index pattern selects which data to use. Read more about index patterns [here](https://www.elastic.co/guide/en/kibana/current/index-patterns.html). - -- Follow the steps outlined in **Configure the SOCKS proxy** and **Create the SSH tunnel** sections of this [Knowledge Center](https://aws.amazon.com/premiumsupport/knowledge-center/opensearch-outside-vpc-ssh/) article to establish a SOCKS5 tunnel from localhost to OpenSearch via the bastion host. -- Log into the AWS console, navigate to Amazon OpenSearch Service, click on the "opensearch" domain and click on the link under **OpenSearch Dashboards URL** to access the OpenSearch Dashboards. -- Log into the OpenSearch Dashboards with the credentials you set in `dev.tfvars` -- From the OpenSearch Dashboards Welcome screen select **Explore on my own** -- On _Select your tenant_ screen, select Private and click **Confirm** -- On the next screen click on the _OpenSearch Dashboards_ tile -- Click **Add your data** -- Click **Create index Pattern** -- Add **\*fluent-bit\*** as the Index pattern and click **Next step** -- Select **@timestamp** as the Time filter field name and close the Configuration window by clicking on **Create index pattern** -- Select **Discover** from the left panel and start exploring the logs - -## Cleanup - -- Run `terraform destroy -var-file=dev.tfvars` to remove all resources except for your Amazon Managed Grafana workspace. -- Delete your Amazon Managed Grafana workspace through the AWS console. -- Delete the private key file: `bastion_host_private_key.pem`. - -## Troubleshooting - -- When running `terraform apply` or `terraform destroy`, the process will sometimes time-out. If that happens, run the command again and the operation will continue where it left off. -- If your connection times out when trying to establish an SSH tunnel with the bastion host, check that you are disconnected from any VPNs. 
diff --git a/examples/amp-amg-opensearch/data.tf b/examples/amp-amg-opensearch/data.tf deleted file mode 100644 index 0e94faaac5..0000000000 --- a/examples/amp-amg-opensearch/data.tf +++ /dev/null @@ -1,46 +0,0 @@ -data "aws_caller_identity" "current" {} - -data "aws_iam_policy_document" "fluentbit_opensearch_access" { - # Identity Based Policy specifies a list of IAM permissions - # that principal has against OpenSearch service API - # ref: https://docs.aws.amazon.com/opensearch-service/latest/developerguide/ac.html#ac-types-identity - statement { - sid = "OpenSearchAccess" - effect = "Allow" - resources = ["${aws_elasticsearch_domain.opensearch.arn}/*"] - actions = ["es:ESHttp*"] - } -} - -data "aws_iam_policy_document" "opensearch_access_policy" { - # This is the resource-based policy that allows to set access permissions on OpenSearch level - # To be working properly the client must support IAM (SDK, fluent-bit with sigv4, etc.) Browsers don't do IAM. - # ref: https://docs.aws.amazon.com/opensearch-service/latest/developerguide/ac.html#ac-types-resource - statement { - sid = "WriteDomainLevelAccessToOpenSearch" - effect = "Allow" - resources = ["${aws_elasticsearch_domain.opensearch.arn}/*"] # this can be an index prefix like '/foo-*' - actions = [ #ref: https://docs.aws.amazon.com/opensearch-service/latest/developerguide/ac.html#ac-reference - "es:ESHttpPost", - "es:ESHttpPut" - ] - principals { - type = "AWS" - identifiers = ["arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/amp-amg-opensearch-aws-for-fluent-bit-sa-irsa"] - } - } - - statement { - sid = "AdminDomainLevelAccessToOpenSearch" - effect = "Allow" - resources = [ - aws_elasticsearch_domain.opensearch.arn, - "${aws_elasticsearch_domain.opensearch.arn}/*", - ] - actions = ["es:*"] - principals { - type = "*" - identifiers = ["*"] # must be set to wildcard when clients can't sign sigv4 or pass IAM to OpenSearch (aka browsers) - } - } -} diff --git a/examples/amp-amg-opensearch/helm_values/aws-for-fluentbit-values.yaml b/examples/amp-amg-opensearch/helm_values/aws-for-fluentbit-values.yaml deleted file mode 100644 index 581819e18c..0000000000 --- a/examples/amp-amg-opensearch/helm_values/aws-for-fluentbit-values.yaml +++ /dev/null @@ -1,19 +0,0 @@ -serviceAccount: - create: false - name: "aws-for-fluent-bit-sa" - -elasticsearch: - enabled: true - match: "*" - awsRegion: ${aws_region} - host: ${host} - -# These plugins are not used in this example. 
They are enabled by default if not explicitly disabled -firehose: - enabled: false - -kinesis: - enabled: false - -cloudWatch: - enabled: false diff --git a/examples/amp-amg-opensearch/main.tf b/examples/amp-amg-opensearch/main.tf deleted file mode 100644 index 9f1d8e1ba7..0000000000 --- a/examples/amp-amg-opensearch/main.tf +++ /dev/null @@ -1,281 +0,0 @@ -provider "aws" { - region = local.region -} - -provider "kubernetes" { - host = module.eks.cluster_endpoint - cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data) - token = data.aws_eks_cluster_auth.this.token -} - -provider "helm" { - kubernetes { - host = module.eks.cluster_endpoint - cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data) - token = data.aws_eks_cluster_auth.this.token - } -} - -data "aws_eks_cluster_auth" "this" { - name = module.eks.cluster_name -} - -data "aws_availability_zones" "available" {} - -provider "grafana" { - url = var.grafana_endpoint - auth = var.grafana_api_key -} - -locals { - name = basename(path.cwd) - region = "us-west-2" - - cluster_version = "1.24" - - vpc_cidr = "10.0.0.0/16" - azs = slice(data.aws_availability_zones.available.names, 0, 3) - - tags = { - Blueprint = local.name - GithubRepo = "github.com/aws-ia/terraform-aws-eks-blueprints" - } -} - -################################################################################ -# Cluster -################################################################################ - -#tfsec:ignore:aws-eks-enable-control-plane-logging -module "eks" { - source = "terraform-aws-modules/eks/aws" - version = "~> 19.12" - - cluster_name = local.name - cluster_version = local.cluster_version - cluster_endpoint_public_access = true - - # EKS Addons - cluster_addons = { - coredns = {} - kube-proxy = {} - vpc-cni = {} - } - - vpc_id = module.vpc.vpc_id - subnet_ids = module.vpc.private_subnets - - eks_managed_node_groups = { - initial = { - instance_types = ["m5.large"] - - min_size = 1 - max_size = 5 - desired_size = 2 - } - } - - tags = local.tags -} - -################################################################################ -# Kubernetes Addons -################################################################################ - -module "eks_blueprints_kubernetes_addons" { - source = "../../modules/kubernetes-addons" - - eks_cluster_id = module.eks.cluster_name - eks_cluster_endpoint = module.eks.cluster_endpoint - eks_oidc_provider = module.eks.oidc_provider - eks_cluster_version = module.eks.cluster_version - - # Add-ons - enable_metrics_server = true - enable_cluster_autoscaler = true - enable_argocd = true - argocd_applications = { - workloads = { - path = "envs/dev" - repo_url = "https://github.com/aws-samples/eks-blueprints-workloads.git" - add_on_application = false - } - } - - enable_aws_for_fluentbit = true - aws_for_fluentbit_irsa_policies = [aws_iam_policy.fluentbit_opensearch_access.arn] - aws_for_fluentbit_helm_config = { - values = [templatefile("${path.module}/helm_values/aws-for-fluentbit-values.yaml", { - aws_region = local.region - host = aws_elasticsearch_domain.opensearch.endpoint - })] - } - - enable_amazon_eks_aws_ebs_csi_driver = true - enable_prometheus = true - enable_amazon_prometheus = true - amazon_prometheus_workspace_endpoint = module.managed_prometheus.workspace_prometheus_endpoint - - tags = local.tags -} - -#--------------------------------------------------------------- -# Configure AMP as a Grafana Data Source 
-#--------------------------------------------------------------- -resource "grafana_data_source" "prometheus" { - type = "prometheus" - name = "amp" - is_default = true - url = module.managed_prometheus.workspace_prometheus_endpoint - - json_data { - http_method = "POST" - sigv4_auth = true - sigv4_auth_type = "workspace-iam-role" - sigv4_region = local.region - } -} - -#--------------------------------------------------------------- -# Provision OpenSearch and Allow Access -#--------------------------------------------------------------- -#tfsec:ignore:aws-elastic-search-enable-domain-logging -resource "aws_elasticsearch_domain" "opensearch" { - domain_name = "opensearch" - elasticsearch_version = "OpenSearch_1.3" - - cluster_config { - instance_type = "m6g.large.elasticsearch" - instance_count = 3 - zone_awareness_enabled = true - - zone_awareness_config { - availability_zone_count = 3 - } - } - - node_to_node_encryption { - enabled = true - } - - domain_endpoint_options { - enforce_https = true - tls_security_policy = "Policy-Min-TLS-1-2-2019-07" - } - - encrypt_at_rest { - enabled = true - } - - ebs_options { - ebs_enabled = true - volume_size = 10 - } - - advanced_security_options { - enabled = false - internal_user_database_enabled = true - - master_user_options { - master_user_name = var.opensearch_dashboard_user - master_user_password = var.opensearch_dashboard_pw - } - } - - vpc_options { - subnet_ids = module.vpc.public_subnets - security_group_ids = [aws_security_group.opensearch_access.id] - } - - depends_on = [ - aws_iam_service_linked_role.opensearch - ] - - tags = local.tags -} - -resource "aws_iam_service_linked_role" "opensearch" { - count = var.create_iam_service_linked_role == true ? 1 : 0 - aws_service_name = "es.amazonaws.com" -} - -resource "aws_iam_policy" "fluentbit_opensearch_access" { - name = "fluentbit_opensearch_access" - description = "IAM policy to allow Fluentbit access to OpenSearch" - policy = data.aws_iam_policy_document.fluentbit_opensearch_access.json -} - -resource "aws_elasticsearch_domain_policy" "opensearch_access_policy" { - domain_name = aws_elasticsearch_domain.opensearch.domain_name - access_policies = data.aws_iam_policy_document.opensearch_access_policy.json -} - -resource "aws_security_group" "opensearch_access" { - vpc_id = module.vpc.vpc_id - description = "OpenSearch access" - - ingress { - description = "host access to OpenSearch" - from_port = 443 - to_port = 443 - protocol = "tcp" - self = true - } - - ingress { - description = "allow instances in the VPC (like EKS) to communicate with OpenSearch" - from_port = 443 - to_port = 443 - protocol = "tcp" - - cidr_blocks = [module.vpc.vpc_cidr_block] - } - - egress { - description = "Allow all outbound access" - from_port = 0 - to_port = 0 - protocol = "-1" - cidr_blocks = ["0.0.0.0/0"] #tfsec:ignore:aws-vpc-no-public-egress-sgr - } - - tags = local.tags -} - -################################################################################ -# Supporting Resources -################################################################################ - -module "managed_prometheus" { - source = "terraform-aws-modules/managed-service-prometheus/aws" - version = "~> 2.1" - - workspace_alias = local.name - - tags = local.tags -} - -module "vpc" { - source = "terraform-aws-modules/vpc/aws" - version = "~> 4.0" - - name = local.name - cidr = local.vpc_cidr - - azs = local.azs - private_subnets = [for k, v in local.azs : cidrsubnet(local.vpc_cidr, 4, k)] - public_subnets = [for k, v in local.azs : 
cidrsubnet(local.vpc_cidr, 8, k + 48)] - - enable_nat_gateway = true - single_nat_gateway = true - - public_subnet_tags = { - "kubernetes.io/role/elb" = 1 - } - - private_subnet_tags = { - "kubernetes.io/role/internal-elb" = 1 - } - - tags = local.tags -} diff --git a/examples/amp-amg-opensearch/outputs.tf b/examples/amp-amg-opensearch/outputs.tf deleted file mode 100644 index d82e1c7f1e..0000000000 --- a/examples/amp-amg-opensearch/outputs.tf +++ /dev/null @@ -1,20 +0,0 @@ -output "opensearch_pw" { - description = "Amazon OpenSearch Service Domain password" - value = var.opensearch_dashboard_pw - sensitive = true -} - -output "opensearch_user" { - description = "Amazon OpenSearch Service Domain username" - value = var.opensearch_dashboard_user -} - -output "opensearch_vpc_endpoint" { - description = "Amazon OpenSearch Service Domain-specific endpoint" - value = aws_elasticsearch_domain.opensearch.endpoint -} - -output "configure_kubectl" { - description = "Configure kubectl: make sure you're logged in with the correct AWS profile and run the following command to update your kubeconfig" - value = "aws eks --region ${local.region} update-kubeconfig --name ${module.eks.cluster_name}" -} diff --git a/examples/amp-amg-opensearch/variables.tf b/examples/amp-amg-opensearch/variables.tf deleted file mode 100644 index 52d29445df..0000000000 --- a/examples/amp-amg-opensearch/variables.tf +++ /dev/null @@ -1,31 +0,0 @@ -variable "grafana_endpoint" { - description = "Grafana endpoint" - type = string - default = "https://example.com" -} - -variable "grafana_api_key" { - description = "Api key for authorizing the Grafana provider to make changes to Amazon Managed Grafana" - type = string - default = "" - sensitive = true -} - -variable "opensearch_dashboard_user" { - description = "OpenSearch dashboard user" - type = string - default = "" -} - -variable "opensearch_dashboard_pw" { - description = "OpenSearch dashboard user password" - type = string - default = "" - sensitive = true -} - -variable "create_iam_service_linked_role" { - description = "Whether to create the AWSServiceRoleForAmazonElasticsearchService role used by the OpenSearch service" - type = bool - default = true -} diff --git a/examples/appmesh-mtls/README.md b/examples/appmesh-mtls/README.md index 0af6fdcf33..21788920ad 100644 --- a/examples/appmesh-mtls/README.md +++ b/examples/appmesh-mtls/README.md @@ -1,6 +1,6 @@ # EKS Cluster w/ AppMesh mTLS -This example shows how to provision an EKS cluster with AppMesh mTLS enabled. +This examples demonstrates how to deploy an Amazon EKS cluster with AppMesh mTLS enabled. ## Prerequisites: @@ -16,6 +16,8 @@ To provision this example: ```sh terraform init +terraform apply -target module.vpc +terraform apply -target module.eks -target module.eks_blueprints_addons terraform apply ``` @@ -26,36 +28,252 @@ Enter `yes` at command prompt to apply The following command will update the `kubeconfig` on your local machine and allow you to interact with your EKS Cluster using `kubectl` to validate the deployment. -1. Run `update-kubeconfig` command: +1. Check the Terraform provided Output, to update your `kubeconfig` + +```hcl +Apply complete! Resources: 63 added, 0 changed, 0 destroyed. + +Outputs: + +configure_kubectl = "aws eks --region us-west-2 update-kubeconfig " +``` + +This example deploys the folowing Kubernetes resources: +* The `appmesh-controller` in the `appmesh-system` Namespace. +* The `cert-manager` resources on `cert-manager` Namespace. 
+* The `aws-privateca-issuer` on `kube-system` Namespace. +* A Cluster Issuer `appmesh-mtls`. +* A Certificate `example`. +* A Secret named `example-clusterissuer` in the `default` Namespace, generated by `aws-privateca-issuer` tied to the `example` Certificate. + +2. List the created Resources. + +```sh +kubectl get pods -A +NAMESPACE NAME READY STATUS RESTARTS AGE +amazon-guardduty aws-guardduty-agent-54tlt 1/1 Running 0 4h42m +amazon-guardduty aws-guardduty-agent-tl574 1/1 Running 0 4h42m +appmesh-system appmesh-controller-7c98b87bdc-q6226 1/1 Running 0 4h44m +cert-manager cert-manager-87f5555f-tcxj7 1/1 Running 0 4h43m +cert-manager cert-manager-cainjector-8448ff8ddb-wwjsc 1/1 Running 0 4h43m +cert-manager cert-manager-webhook-5468b675b-fvdwk 1/1 Running 0 4h43m +kube-system aws-node-rf4wg 1/1 Running 0 4h43m +kube-system aws-node-skkwh 1/1 Running 0 4h43m +kube-system aws-privateca-issuer-b6fb8c5bd-hh8q4 1/1 Running 0 4h44m +kube-system coredns-5f9f955df6-qhr6p 1/1 Running 0 4h44m +kube-system coredns-5f9f955df6-tw8r7 1/1 Running 0 4h44m +kube-system kube-proxy-q72l9 1/1 Running 0 4h43m +kube-system kube-proxy-w54pc 1/1 Running 0 4h43m +``` ```sh -aws eks --region update-kubeconfig --name +kubectl get awspcaclusterissuers.awspca.cert-manager.io +NAME AGE +appmesh-mtls 4h42m ``` -2. List the nodes running currently +```sh +kubectl get certificate +NAME READY SECRET AGE +example True example-clusterissuer 4h12m +``` ```sh -kubectl get nodes +kubectl describe secret example-clusterissuer +Name: example-clusterissuer +Namespace: default +Labels: controller.cert-manager.io/fao=true +Annotations: cert-manager.io/alt-names: + cert-manager.io/certificate-name: example + cert-manager.io/common-name: example.com + cert-manager.io/ip-sans: + cert-manager.io/issuer-group: awspca.cert-manager.io + cert-manager.io/issuer-kind: AWSPCAClusterIssuer + cert-manager.io/issuer-name: appmesh-mtls + cert-manager.io/uri-sans: + +Type: kubernetes.io/tls + +Data +==== +ca.crt: 1785 bytes +tls.crt: 1517 bytes +tls.key: 1675 bytes +``` -# Output should look like below -NAME STATUS ROLES AGE VERSION -ip-10-0-30-125.us-west-2.compute.internal Ready 2m19s v1.22.9-eks-810597c + +3. Create the AWS App Mesh Resources on your Amazon EKS Cluster. Full documentation can be found [here](https://docs.aws.amazon.com/app-mesh/latest/userguide/getting-started-kubernetes.html#configure-app-mesh). + + 1. Annotate the `default` Namespace to allow Side Car Injection. + +```sh +kubectl label namespaces default appmesh.k8s.aws/sidecarInjectorWebhook=enabled +namespace/default labeled ``` -3. List out the pods running currently: + 2. Create the Mesh. ```sh -kubectl get pods -A +cat < update-kubeconfig --name + aws eks --region update-kubeconfig --name --alias ``` 2. 
List out the pods running currently: @@ -53,65 +53,69 @@ The following command will update the `kubeconfig` on your local machine and all ```sh kubectl get pods -A - NAMESPACE NAME READY STATUS RESTARTS AGE - argo-rollouts argo-rollouts-5656b86459-jgssp 1/1 Running 0 6m59s - argo-rollouts argo-rollouts-5656b86459-kncxg 1/1 Running 0 6m59s - argocd argo-cd-argocd-application-controller-0 1/1 Running 0 15m - argocd argo-cd-argocd-applicationset-controller-9f66b8d6b-bnvqk 1/1 Running 0 15m - argocd argo-cd-argocd-dex-server-66c5769c46-kxns4 1/1 Running 0 15m - argocd argo-cd-argocd-notifications-controller-74c78485d-fgh4w 1/1 Running 0 15m - argocd argo-cd-argocd-repo-server-77b8c98d6f-kcq6j 1/1 Running 0 15m - argocd argo-cd-argocd-repo-server-77b8c98d6f-mt7nf 1/1 Running 0 15m - argocd argo-cd-argocd-server-849d775f7b-t2crt 1/1 Running 0 15m - argocd argo-cd-argocd-server-849d775f7b-vnwtq 1/1 Running 0 15m - argocd argo-cd-redis-ha-haproxy-578979d984-5chwx 1/1 Running 0 15m - argocd argo-cd-redis-ha-haproxy-578979d984-74qdg 1/1 Running 0 15m - argocd argo-cd-redis-ha-haproxy-578979d984-9dwf2 1/1 Running 0 15m - argocd argo-cd-redis-ha-server-0 4/4 Running 0 15m - argocd argo-cd-redis-ha-server-1 4/4 Running 0 12m - argocd argo-cd-redis-ha-server-2 4/4 Running 0 11m - aws-for-fluent-bit aws-for-fluent-bit-7gwzd 1/1 Running 0 7m10s - aws-for-fluent-bit aws-for-fluent-bit-9gzqw 1/1 Running 0 7m10s - aws-for-fluent-bit aws-for-fluent-bit-csrgh 1/1 Running 0 7m10s - aws-for-fluent-bit aws-for-fluent-bit-h9vtm 1/1 Running 0 7m10s - aws-for-fluent-bit aws-for-fluent-bit-p4bmj 1/1 Running 0 7m10s - cert-manager cert-manager-765c5d7777-k7jkk 1/1 Running 0 7m6s - cert-manager cert-manager-cainjector-6bc9d758b-kt8dm 1/1 Running 0 7m6s - cert-manager cert-manager-webhook-586d45d5ff-szkc7 1/1 Running 0 7m6s - geolocationapi geolocationapi-fbb6987f8-d22qv 2/2 Running 0 6m15s - geolocationapi geolocationapi-fbb6987f8-fqshh 2/2 Running 0 6m15s - karpenter karpenter-5d65d77779-nnsjp 2/2 Running 0 7m42s - keda keda-operator-676b4b8d8c-5bjmt 1/1 Running 0 7m16s - keda keda-operator-metrics-apiserver-5d679f968c-jkhz8 1/1 Running 0 7m16s - kube-system aws-node-66dl8 1/1 Running 0 14m - kube-system aws-node-7fgks 1/1 Running 0 14m - kube-system aws-node-828t9 1/1 Running 0 14m - kube-system aws-node-k7phx 1/1 Running 0 14m - kube-system aws-node-rptsc 1/1 Running 0 14m - kube-system cluster-autoscaler-aws-cluster-autoscaler-74456d5cc9-hfqlz 1/1 Running 0 7m24s - kube-system coredns-657694c6f4-kp6sm 1/1 Running 0 19m - kube-system coredns-657694c6f4-wcqh2 1/1 Running 0 19m - kube-system kube-proxy-6zwcj 1/1 Running 0 14m - kube-system kube-proxy-9kkg7 1/1 Running 0 14m - kube-system kube-proxy-q9bgv 1/1 Running 0 14m - kube-system kube-proxy-rzndg 1/1 Running 0 14m - kube-system kube-proxy-w86mz 1/1 Running 0 14m - kube-system metrics-server-694d47d564-psr4s 1/1 Running 0 6m37s - prometheus prometheus-alertmanager-758597fd7-pntlj 2/2 Running 0 7m18s - prometheus prometheus-kube-state-metrics-5fd8648d78-w48p2 1/1 Running 0 7m18s - prometheus prometheus-node-exporter-7wr8x 1/1 Running 0 7m18s - prometheus prometheus-node-exporter-9hjzw 1/1 Running 0 7m19s - prometheus prometheus-node-exporter-kjsxt 1/1 Running 0 7m18s - prometheus prometheus-node-exporter-mr9cx 1/1 Running 0 7m19s - prometheus prometheus-node-exporter-qmm58 1/1 Running 0 7m19s - prometheus prometheus-pushgateway-8696df5474-cv59q 1/1 Running 0 7m18s - prometheus prometheus-server-58c58c58cc-n4242 2/2 Running 0 7m18s - team-burnham 
nginx-66b6c48dd5-nnp9l 1/1 Running 0 7m39s - team-riker guestbook-ui-6847557d79-lrms2 1/1 Running 0 7m39s - traefik traefik-b9955f58-pc2zp 1/1 Running 0 7m4s - vpa vpa-recommender-554f56647b-lcz9w 1/1 Running 0 7m35s - vpa vpa-updater-67d6c5c7cf-b9hw4 1/1 Running 0 7m35s - yunikorn yunikorn-scheduler-5c446fcc89-lcmmm 2/2 Running 0 7m28s + NAMESPACE NAME READY STATUS RESTARTS AGE + argo-rollouts argo-rollouts-5d47ccb8d4-854s6 1/1 Running 0 23h + argo-rollouts argo-rollouts-5d47ccb8d4-srjk9 1/1 Running 0 23h + argocd argo-cd-argocd-application-controller-0 1/1 Running 0 24h + argocd argo-cd-argocd-applicationset-controller-547f9cfd68-kp89p 1/1 Running 0 24h + argocd argo-cd-argocd-dex-server-55765f7cd7-t8r2f 1/1 Running 0 24h + argocd argo-cd-argocd-notifications-controller-657df4dbcb-p596r 1/1 Running 0 24h + argocd argo-cd-argocd-repo-server-7d4dddf886-2vmgt 1/1 Running 0 24h + argocd argo-cd-argocd-repo-server-7d4dddf886-bm7tz 1/1 Running 0 24h + argocd argo-cd-argocd-server-775ddf74b8-8jzvc 1/1 Running 0 24h + argocd argo-cd-argocd-server-775ddf74b8-z6lz6 1/1 Running 0 24h + argocd argo-cd-redis-ha-haproxy-6d7b7d4656-b8bt8 1/1 Running 0 24h + argocd argo-cd-redis-ha-haproxy-6d7b7d4656-mgjx5 1/1 Running 0 24h + argocd argo-cd-redis-ha-haproxy-6d7b7d4656-qsbgw 1/1 Running 0 24h + argocd argo-cd-redis-ha-server-0 4/4 Running 0 24h + argocd argo-cd-redis-ha-server-1 4/4 Running 0 24h + argocd argo-cd-redis-ha-server-2 4/4 Running 0 24h + cert-manager cert-manager-586ccb6656-2v8mf 1/1 Running 0 23h + cert-manager cert-manager-cainjector-99d64d795-2gwnj 1/1 Running 0 23h + cert-manager cert-manager-webhook-8d87786cb-24kww 1/1 Running 0 23h + geolocationapi geolocationapi-85599c5c74-rqqqs 2/2 Running 0 25m + geolocationapi geolocationapi-85599c5c74-whsp6 2/2 Running 0 25m + geordie downstream0-7f6ff946b6-r8sxc 1/1 Running 0 25m + geordie downstream1-64c7db6f9-rsbk5 1/1 Running 0 25m + geordie frontend-646bfb947c-wshpb 1/1 Running 0 25m + geordie redis-server-6bd7885d5d-s7rqw 1/1 Running 0 25m + geordie yelb-appserver-5d89946ffd-vkxt9 1/1 Running 0 25m + geordie yelb-db-697bd9f9d9-2t4b6 1/1 Running 0 25m + geordie yelb-ui-75ff8b96ff-fh6bw 1/1 Running 0 25m + karpenter karpenter-7b99fb785d-87k6h 1/1 Running 0 106m + karpenter karpenter-7b99fb785d-lkq9l 1/1 Running 0 106m + kube-system aws-load-balancer-controller-6cf9bdbfdf-h7bzb 1/1 Running 0 20m + kube-system aws-load-balancer-controller-6cf9bdbfdf-vfbrj 1/1 Running 0 20m + kube-system aws-node-cvjmq 1/1 Running 0 24h + kube-system aws-node-fw7zc 1/1 Running 0 24h + kube-system aws-node-l7589 1/1 Running 0 24h + kube-system aws-node-nll82 1/1 Running 0 24h + kube-system aws-node-zhz8l 1/1 Running 0 24h + kube-system coredns-7975d6fb9b-5sf7r 1/1 Running 0 24h + kube-system coredns-7975d6fb9b-k78dz 1/1 Running 0 24h + kube-system ebs-csi-controller-5cd4944c94-7jwlb 6/6 Running 0 24h + kube-system ebs-csi-controller-5cd4944c94-8tcsg 6/6 Running 0 24h + kube-system ebs-csi-node-66jmx 3/3 Running 0 24h + kube-system ebs-csi-node-b2pw4 3/3 Running 0 24h + kube-system ebs-csi-node-g4v9z 3/3 Running 0 24h + kube-system ebs-csi-node-k7nvp 3/3 Running 0 24h + kube-system ebs-csi-node-tfq9q 3/3 Running 0 24h + kube-system kube-proxy-4x8vm 1/1 Running 0 24h + kube-system kube-proxy-gtlpm 1/1 Running 0 24h + kube-system kube-proxy-vfnbf 1/1 Running 0 24h + kube-system kube-proxy-z9wdh 1/1 Running 0 24h + kube-system kube-proxy-zzx9m 1/1 Running 0 24h + kube-system metrics-server-7f4db5fd87-9n6dv 1/1 Running 0 23h + kube-system 
metrics-server-7f4db5fd87-t8wxg 1/1 Running 0 23h + kube-system metrics-server-7f4db5fd87-xcxlv 1/1 Running 0 23h + team-burnham burnham-66fccc4fb5-k4qtm 1/1 Running 0 25m + team-burnham burnham-66fccc4fb5-rrf4j 1/1 Running 0 25m + team-burnham burnham-66fccc4fb5-s9kbr 1/1 Running 0 25m + team-burnham nginx-7d47cfdff7-lzdjb 1/1 Running 0 25m + team-riker deployment-2048-6f7c78f959-h76rx 1/1 Running 0 25m + team-riker deployment-2048-6f7c78f959-skmrr 1/1 Running 0 25m + team-riker deployment-2048-6f7c78f959-tn9dw 1/1 Running 0 25m + team-riker guestbook-ui-c86c478bd-zg2z4 1/1 Running 0 25m ``` 3. You can access the ArgoCD UI by running the following command: diff --git a/examples/argocd/main.tf b/examples/argocd/main.tf index 99f3e5994f..423732c457 100644 --- a/examples/argocd/main.tf +++ b/examples/argocd/main.tf @@ -5,31 +5,37 @@ provider "aws" { provider "kubernetes" { host = module.eks.cluster_endpoint cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data) - token = data.aws_eks_cluster_auth.this.token + + exec { + api_version = "client.authentication.k8s.io/v1beta1" + command = "aws" + # This requires the awscli to be installed locally where Terraform is executed + args = ["eks", "get-token", "--cluster-name", module.eks.cluster_name] + } } provider "helm" { kubernetes { host = module.eks.cluster_endpoint cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data) - token = data.aws_eks_cluster_auth.this.token + + exec { + api_version = "client.authentication.k8s.io/v1beta1" + command = "aws" + # This requires the awscli to be installed locally where Terraform is executed + args = ["eks", "get-token", "--cluster-name", module.eks.cluster_name] + } } } provider "bcrypt" {} -data "aws_eks_cluster_auth" "this" { - name = module.eks.cluster_name -} - data "aws_availability_zones" "available" {} locals { name = basename(path.cwd) region = "us-west-2" - cluster_version = "1.24" - vpc_cidr = "10.0.0.0/16" azs = slice(data.aws_availability_zones.available.names, 0, 3) @@ -46,10 +52,10 @@ locals { #tfsec:ignore:aws-eks-enable-control-plane-logging module "eks" { source = "terraform-aws-modules/eks/aws" - version = "~> 19.12" + version = "~> 19.13" cluster_name = local.name - cluster_version = local.cluster_version + cluster_version = "1.27" cluster_endpoint_public_access = true # EKS Addons @@ -76,16 +82,19 @@ module "eks" { } ################################################################################ -# Kubernetes Addons +# EKS Blueprints Addons ################################################################################ -module "eks_blueprints_kubernetes_addons" { - source = "../../modules/kubernetes-addons" +module "eks_blueprints_addons" { + # Users should pin the version to the latest available release + # tflint-ignore: terraform_module_pinned_source + source = "github.com/aws-ia/terraform-aws-eks-blueprints//modules/kubernetes-addons?ref=v4.31.0" - eks_cluster_id = module.eks.cluster_name - eks_cluster_endpoint = module.eks.cluster_endpoint - eks_oidc_provider = module.eks.oidc_provider - eks_cluster_version = module.eks.cluster_version + eks_cluster_id = module.eks.cluster_name + eks_cluster_endpoint = module.eks.cluster_endpoint + eks_cluster_version = module.eks.cluster_version + eks_oidc_provider = module.eks.oidc_provider + eks_oidc_provider_arn = module.eks.oidc_provider_arn enable_argocd = true # This example shows how to set default ArgoCD Admin Password using SecretsManager with Helm Chart set_sensitive values. 
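As the comment above notes, the ArgoCD admin password is sourced from SecretsManager and handed to the ArgoCD Helm chart through `set_sensitive`. For readers unfamiliar with that mechanism, a minimal sketch of the wiring is shown below; the `argocd_helm_config` attribute, the `configs.secret.argocdServerAdminPassword` value path, and the `bcrypt_hash.argo` resource name are assumptions based on the add-on module and chart conventions, not lines copied from this diff.

```hcl
# Illustrative only: pass a bcrypt hash of the generated password to the ArgoCD chart
# so the plain-text value never appears in the Terraform plan output.
argocd_helm_config = {
  set_sensitive = [
    {
      name  = "configs.secret.argocdServerAdminPassword" # assumed chart value path for the admin password (bcrypt hash)
      value = bcrypt_hash.argo.id                        # assumed resource name from the bcrypt provider
    }
  ]
}
```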
@@ -98,15 +107,6 @@ module "eks_blueprints_kubernetes_addons" { ] } - keda_helm_config = { - values = [ - { - name = "serviceAccount.create" - value = "false" - } - ] - } - argocd_manage_add_ons = true # Indicates that ArgoCD is responsible for managing/deploying add-ons argocd_applications = { addons = { @@ -123,19 +123,11 @@ module "eks_blueprints_kubernetes_addons" { # Add-ons enable_amazon_eks_aws_ebs_csi_driver = true - enable_aws_for_fluentbit = true - # Let fluentbit create the cw log group - aws_for_fluentbit_create_cw_log_group = false - enable_cert_manager = true - enable_cluster_autoscaler = true - enable_karpenter = true - enable_keda = true - enable_metrics_server = true - enable_prometheus = true - enable_traefik = true - enable_vpa = true - enable_yunikorn = true - enable_argo_rollouts = true + enable_aws_load_balancer_controller = true + enable_cert_manager = true + enable_karpenter = true + enable_metrics_server = true + enable_argo_rollouts = true tags = local.tags } @@ -173,7 +165,7 @@ resource "aws_secretsmanager_secret_version" "argocd" { module "vpc" { source = "terraform-aws-modules/vpc/aws" - version = "~> 4.0" + version = "~> 5.0" name = local.name cidr = local.vpc_cidr diff --git a/examples/argocd/outputs.tf b/examples/argocd/outputs.tf index c624023e90..d79912bf44 100644 --- a/examples/argocd/outputs.tf +++ b/examples/argocd/outputs.tf @@ -1,4 +1,4 @@ output "configure_kubectl" { description = "Configure kubectl: make sure you're logged in with the correct AWS profile and run the following command to update your kubeconfig" - value = "aws eks --region ${local.region} update-kubeconfig --name ${module.eks.cluster_name}" + value = "aws eks update-kubeconfig --name ${module.eks.cluster_name} --alias ${module.eks.cluster_name}" } diff --git a/examples/argocd/versions.tf b/examples/argocd/versions.tf index bd247274fc..aa00573a68 100644 --- a/examples/argocd/versions.tf +++ b/examples/argocd/versions.tf @@ -6,17 +6,17 @@ terraform { source = "hashicorp/aws" version = ">= 4.47" } - kubernetes = { - source = "hashicorp/kubernetes" - version = ">= 2.17" - } helm = { source = "hashicorp/helm" - version = ">= 2.8" + version = ">= 2.9" + } + kubernetes = { + source = "hashicorp/kubernetes" + version = ">= 2.20" } random = { source = "hashicorp/random" - version = "3.3.2" + version = ">= 3.5" } bcrypt = { source = "viktorradnai/bcrypt" diff --git a/examples/blue-green-upgrade/README.md b/examples/blue-green-upgrade/README.md index 5f59f62ff7..73d2f775da 100644 --- a/examples/blue-green-upgrade/README.md +++ b/examples/blue-green-upgrade/README.md @@ -1,9 +1,11 @@ -# Blue/Green or Canary Amazon EKS clusters migration for stateless ArgoCD workloads +# Blue/Green Migration This directory provides a solution based on [EKS Blueprint for Terraform](https://aws-ia.github.io/terraform-aws-eks-blueprints) that shows how to leverage blue/green or canary application workload migration between EKS clusters, using [Amazon Route 53](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-policy-weighted.html) weighted routing feature. The workloads will be dynamically exposed using [AWS LoadBalancer Controller](https://aws-ia.github.io/terraform-aws-eks-blueprints/add-ons/aws-load-balancer-controller/) and [External DNS add-on](https://aws-ia.github.io/terraform-aws-eks-blueprints/add-ons/external-dns/). 
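A minimal Terraform sketch of the weighted-routing idea follows. It is illustrative only and not part of this example's code, where External DNS manages the records from annotations on each cluster's ingress; every name and variable below is assumed for the sake of the illustration. Two records share one DNS name, each pointing at one cluster's load balancer, and shifting the `weight` values moves traffic between the blue and green clusters.

```hcl
# Hypothetical weighted records for the same application name, one per cluster.
resource "aws_route53_record" "blue" {
  zone_id        = var.hosted_zone_id        # assumed input
  name           = "app.example.com"
  type           = "CNAME"
  ttl            = 60
  set_identifier = "eks-blue"
  records        = [var.blue_ingress_lb_dns] # assumed: ALB DNS name exposed by the blue cluster

  weighted_routing_policy {
    weight = 100 # 100/0 sends all traffic to blue; 50/50 would be a canary split
  }
}

resource "aws_route53_record" "green" {
  zone_id        = var.hosted_zone_id
  name           = "app.example.com"
  type           = "CNAME"
  ttl            = 60
  set_identifier = "eks-green"
  records        = [var.green_ingress_lb_dns] # assumed: ALB DNS name exposed by the green cluster

  weighted_routing_policy {
    weight = 0 # raise gradually as the green cluster is validated
  }
}
```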
We are leveraging [the existing EKS Blueprints Workloads GitHub repository sample](https://github.com/aws-samples/eks-blueprints-workloads) to deploy our GitOps [ArgoCD](https://aws-ia.github.io/terraform-aws-eks-blueprints/add-ons/argocd/) applications, which are defined as helm charts. We are leveraging [ArgoCD Apps of apps](https://argo-cd.readthedocs.io/en/stable/operator-manual/cluster-bootstrapping/) pattern where an ArgoCD Application can also reference other Helm charts to deploy. +> You can also find more informations in the [associated blog post](https://aws.amazon.com/blogs/containers/blue-green-or-canary-amazon-eks-clusters-migration-for-stateless-argocd-workloads/) + ## Table of content - [Blue/Green or Canary Amazon EKS clusters migration for stateless ArgoCD workloads](#bluegreen-or-canary-amazon-eks-clusters-migration-for-stateless-argocd-workloads) @@ -168,7 +170,6 @@ Note it will allow the role associated to the parameter **eks_admin_role_name** You can also connect with the user who created the EKS cluster without specifying the `--role-arn` parameter - Next, you can interact with the cluster and see the deployment ```bash @@ -338,7 +339,7 @@ In this sample, we uses a simple terraform variable to control the weight for al > This section, can be executed in either eks-blue or eks-green folders, or in both if you want to delete both clusters. -In order to properly destroy the Cluster, we need first to remove the ArgoCD workloads, while keeping the ArgoCD addons. +In order to properly destroy the Cluster, we need first to remove the ArgoCD workloads, while keeping the ArgoCD addons. We will also need to remove our Karpenter provisioners, and any other objects you created outside of Terraform that needs to be cleaned before destroying the terraform stack. Why doing this? When we remove an ingress object, we want the associated Kubernetes add-ons like aws load balancer controller and External DNS to correctly free the associated AWS resources. If we directly ask terraform to destroy everything, it can remove first theses controllers without allowing them the time to remove associated aws resources that will still existing in AWS, preventing us to completely delete our cluster. @@ -350,13 +351,22 @@ Why doing this? When we remove an ingress object, we want the associated Kuberne #### Manual -1. Delete Workloads App of App +1. If also deployed, delete your Karpenter provisioners + +this is safe to delete if no addons are deployed on Karpenter, which is the case here. +If not we should separate the team-platform deployments which installed Karpenter provisioners in a separate ArgoCD Application to avoid any conflicts. + +```bash +kubectl delete provisioners.karpenter.sh --all +``` + +2. Delete Workloads App of App ```bash kubectl delete application workloads -n argocd ``` -2. If also deployed, delete ecsdemo App of App +3. If also deployed, delete ecsdemo App of App ```bash kubectl delete application ecsdemo -n argocd @@ -366,11 +376,11 @@ Once every workload applications as been freed on AWS side, (this can take some > Note: it can take time to deregister all load balancers, verify that you don't have any more AWS resources created by EKS prior to start destroying EKS with terraform. -3. Destroy terraform resources +4. 
Destroy terraform resources ```bash -terraform apply -destroy -target="module.kubernetes_addons" -auto-approve -terraform apply -destroy -target="module.eks_blueprints" -auto-approve +terraform apply -destroy -target="module.eks_cluster.module.kubernetes_addons" -auto-approve +terraform apply -destroy -target="module.eks_cluster.module.eks" -auto-approve terraform apply -destroy -auto-approve ``` diff --git a/examples/eks-efa/.gitignore b/examples/eks-efa/.gitignore deleted file mode 100644 index 55e2910e38..0000000000 --- a/examples/eks-efa/.gitignore +++ /dev/null @@ -1,6 +0,0 @@ -tfplan -*.tfstate -*.backup -TODO*.* -.terraform -*.hcl diff --git a/examples/eks-efa/README.md b/examples/eks-efa/README.md deleted file mode 100644 index de6a0ddc9d..0000000000 --- a/examples/eks-efa/README.md +++ /dev/null @@ -1,659 +0,0 @@ -# EKS Blueprint Example with Elastic Fabric Adapter - -## Table of Contents - -- [EKS Blueprint Example with Elastic Fabric Adapter](#eks-blueprint-example-with-elastic-fabric-adapter) - - [Table of Contents](#table-of-contents) - - [Elastic Fabric Adapter Overview](#elastic-fabric-adapter-overview) - - [Setup Details](#setup-details) -- [Terraform Doc](#terraform-doc) - - [Requirements](#requirements) - - [Providers](#providers) - - [Modules](#modules) - - [Resources](#resources) - - [Inputs](#inputs) - - [Outputs](#outputs) -- [Example Walkthrough](#example-walkthrough) - - [1. Clone Repository](#1-clone-repository) - - [2. Configure Terraform Plan](#2-configure-terraform-plan) - - [3. Initialize Terraform Plan](#3-initialize-terraform-plan) - - [4. Create Terraform Plan](#4-create-terraform-plan) - - [5. Apply Terraform Plan](#5-apply-terraform-plan) - - [6. Connect to EKS](#6-connect-to-eks) - - [7. Deploy Kubeflow MPI Operator](#7-deploy-kubeflow-mpi-operator) - - [8. Test EFA](#8-test-efa) - - [8.1. EFA Info Test](#81-efa-info-test) - - [8.2. EFA NCCL Test](#82-efa-nccl-test) - - [9. Cleanup](#9-cleanup) -- [Conclusion](#conclusion) - -## Elastic Fabric Adapter Overview - -[Elastic Fabric Adapter (EFA)](https://aws.amazon.com/hpc/efa/) is a network interface supported by [some Amazon EC2 instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa.html#efa-instance-types) that provides high-performance network communications at scale on AWS. Commonly, high-performance computing, simulation, and large AI model training jobs require EFA, in order to minimize the time to job completion. This example provides a blueprint for deploying an [Amazon EKS](https://aws.amazon.com/eks/) cluster with EFA-enabled nodes, which can be used to run such jobs. - -## Setup Details - -There are three requirements that need to be satisfied, in order for EFA to work: - -1. The EC2 instance type must support EFA and the EFA adapter must be enabled. -2. The EFA software must be installed -3. The security group attached to the EC2 instance must allow all incoming and outgoing traffic to itself - -In the provided Terraform EKS Blueprint example here, these requirements are satisfied automatically. - -# Terraform Doc - -The main Terraform doc [main.tf](main.tf) contains local variables, local data, vpc and eks definitions, device plugins, and addons. - -## Requirements - -Requirements are specified in the [providers.tf](providers.tf) file. This file is used to install all needed providers when `terraform init` is executed. - -## Providers - -Providers are defined in [main.tf](main.tf#L3). They include `aws`, `kubernetes`, `helm`, and `kubectl`. 
- -## Modules - -The following modules are included in the template: - -1. [vpc](main.tf#L240) - defines the VPC which will be used to host the EKS cluster - -2. [eks](main.tf#L92) - defines the EKS cluster - The EKS cluster contains a managed nodedgroup called `sys` for running system pods, - and an unmanaged nodegroup called `efa` which has the necessary configuration to enable EFA on the nodes in that group. - -3. [eks_blueprints_kubernetes_addons](main.tf#L220) - defines EKS cluster addons to be deployed - - -## Resources - -The [resources section of main.tf](main.tf#69) creates a placement group, deploys the [EFA](https://github.com/aws-samples/aws-efa-eks) and [NVIDIA](https://github.com/NVIDIA/k8s-device-plugin) device plugins. - -## Inputs - -There are no required user-inputs. -The template comes with default inputs which create an EKS cluster called `eks-efa` in region `us-east-1`. -These settings can be adjusted in the [variables.tf](variables.tf) file. - -## Outputs - -When the `terraform apply` completes successfully, the EKS cluster id, and the command to connect to the cluster are provided as outputs as described in [outputs.tf](outputs.tf). - -# Example Walkthrough - -## 1. Clone Repository - -```bash -git clone https://github.com/aws-ia/terraform-aws-eks-blueprints.git -cd terraform-aws-eks-bluerpints/examples/eks-efa -``` - -## 2. Configure Terraform Plan - -Edit [variables.tf](variables.tf) and the [locals section of main.tf](main.tf#L54) as needed. - -## 3. Initialize Terraform Plan - -```bash -terraform init -``` - -
-Output: -Initializing the backend... -Initializing modules... -Downloading registry.terraform.io/terraform-aws-modules/eks/aws 19.13.1 for eks... -- eks in .terraform/modules/eks -- eks.eks_managed_node_group in .terraform/modules/eks/modules/eks-managed-node-group -- eks.eks_managed_node_group.user_data in .terraform/modules/eks/modules/_user_data -- eks.fargate_profile in .terraform/modules/eks/modules/fargate-profile -Downloading registry.terraform.io/terraform-aws-modules/kms/aws 1.1.0 for eks.kms... -- eks.kms in .terraform/modules/eks.kms -- eks.self_managed_node_group in .terraform/modules/eks/modules/self-managed-node-group -- eks.self_managed_node_group.user_data in .terraform/modules/eks/modules/_user_data -- eks_blueprints_kubernetes_addons in ../../modules/kubernetes-addons -- eks_blueprints_kubernetes_addons.adot_collector_haproxy in ../../modules/kubernetes-addons/adot-collector-haproxy -- eks_blueprints_kubernetes_addons.adot_collector_haproxy.helm_addon in ../../modules/kubernetes-addons/helm-addon -- eks_blueprints_kubernetes_addons.adot_collector_haproxy.helm_addon.irsa in ../../modules/irsa -- eks_blueprints_kubernetes_addons.adot_collector_java in ../../modules/kubernetes-addons/adot-collector-java -- eks_blueprints_kubernetes_addons.adot_collector_java.helm_addon in ../../modules/kubernetes-addons/helm-addon -- ... -- eks_blueprints_kubernetes_addons.opentelemetry_operator in ../../modules/kubernetes-addons/opentelemetry-operator -- eks_blueprints_kubernetes_addons.opentelemetry_operator.cert_manager in ../../modules/kubernetes-addons/cert-manager -- eks_blueprints_kubernetes_addons.opentelemetry_operator.cert_manager.helm_addon in ../../modules/kubernetes-addons/helm-addon -- eks_blueprints_kubernetes_addons.opentelemetry_operator.cert_manager.helm_addon.irsa in ../../modules/irsa -- eks_blueprints_kubernetes_addons.opentelemetry_operator.helm_addon in ../../modules/kubernetes-addons/helm-addon -- eks_blueprints_kubernetes_addons.opentelemetry_operator.helm_addon.irsa in ../../modules/irsa -Downloading registry.terraform.io/portworx/portworx-addon/eksblueprints 0.0.6 for eks_blueprints_kubernetes_addons.portworx... -- eks_blueprints_kubernetes_addons.portworx in .terraform/modules/eks_blueprints_kubernetes_addons.portworx -Downloading git::https://github.com/aws-ia/terraform-aws-eks-blueprints.git for eks_blueprints_kubernetes_addons.portworx.helm_addon... -- eks_blueprints_kubernetes_addons.portworx.helm_addon in .terraform/modules/eks_blueprints_kubernetes_addons.portworx.helm_addon/modules/kubernetes-addons/helm-addon -- eks_blueprints_kubernetes_addons.portworx.helm_addon.irsa in .terraform/modules/eks_blueprints_kubernetes_addons.portworx.helm_addon/modules/irsa -- eks_blueprints_kubernetes_addons.prometheus in ../../modules/kubernetes-addons/prometheus --... -- eks_blueprints_kubernetes_addons.yunikorn.helm_addon in ../../modules/kubernetes-addons/helm-addon -- eks_blueprints_kubernetes_addons.yunikorn.helm_addon.irsa in ../../modules/irsa -Downloading registry.terraform.io/terraform-aws-modules/vpc/aws 4.0.1 for vpc... -- vpc in .terraform/modules/vpc - -Initializing provider plugins... -- Finding latest version of hashicorp/random... -- Finding hashicorp/kubernetes versions matching ">= 2.6.1, >= 2.10.0, >= 2.16.1"... -- Finding latest version of hashicorp/http... -- Finding hashicorp/helm versions matching ">= 2.4.1, >= 2.5.1, >= 2.8.0"... -- Finding gavinbunney/kubectl versions matching ">= 1.14.0"... 
-- Finding hashicorp/aws versions matching ">= 3.72.0, >= 4.10.0, >= 4.13.0, >= 4.35.0, >= 4.47.0, >= 4.57.0"... -- Finding hashicorp/time versions matching ">= 0.7.0, >= 0.8.0, >= 0.9.0"... -- Finding hashicorp/null versions matching ">= 3.0.0"... -- Finding hashicorp/tls versions matching ">= 3.0.0"... -- Finding hashicorp/cloudinit versions matching ">= 2.0.0"... -- Installing hashicorp/helm v2.9.0... -- Installed hashicorp/helm v2.9.0 (signed by HashiCorp) -- Installing gavinbunney/kubectl v1.14.0... -- Installed gavinbunney/kubectl v1.14.0 (self-signed, key ID AD64217B5ADD572F) -- Installing hashicorp/tls v4.0.4... -- Installed hashicorp/tls v4.0.4 (signed by HashiCorp) -- Installing hashicorp/cloudinit v2.3.2... -- Installed hashicorp/cloudinit v2.3.2 (signed by HashiCorp) -- Installing hashicorp/random v3.5.1... -- Installed hashicorp/random v3.5.1 (signed by HashiCorp) -- Installing hashicorp/http v3.3.0... -- Installed hashicorp/http v3.3.0 (signed by HashiCorp) -- Installing hashicorp/time v0.9.1... -- Installed hashicorp/time v0.9.1 (signed by HashiCorp) -- Installing hashicorp/null v3.2.1... -- Installed hashicorp/null v3.2.1 (signed by HashiCorp) -- Installing hashicorp/kubernetes v2.20.0... -- Installed hashicorp/kubernetes v2.20.0 (signed by HashiCorp) -- Installing hashicorp/aws v4.66.1... -- Installed hashicorp/aws v4.66.1 (signed by HashiCorp) - -Partner and community providers are signed by their developers. -If you'd like to know more about provider signing, you can read about it here: -https://www.terraform.io/docs/cli/plugins/signing.html - -Terraform has created a lock file .terraform.lock.hcl to record the provider -selections it made above. Include this file in your version control repository -so that Terraform can guarantee to make the same selections by default when -you run "terraform init" in the future. - -Terraform has been successfully initialized! - -You may now begin working with Terraform. Try running "terraform plan" to see -any changes that are required for your infrastructure. All Terraform commands -should now work. - -If you ever set or change modules or backend configuration for Terraform, -rerun this command to reinitialize your working directory. If you forget, other -commands will detect it and remind you to do so if necessary. -
- -## 4. Create Terraform Plan - -```bash -terraform plan -out tfplan -``` - -
-Output: - -```text -... -# module.vpc.aws_vpc.this[0] will be created - + resource "aws_vpc" "this" { - + arn = (known after apply) - + cidr_block = "10.11.0.0/16" - + default_network_acl_id = (known after apply) - + default_route_table_id = (known after apply) - + default_security_group_id = (known after apply) -... - -Plan: 80 to add, 0 to change, 0 to destroy. - -Changes to Outputs: - + configure_kubectl = "aws eks update-kubeconfig --region us-east-1 --name eks-efa" - + eks_cluster_id = (known after apply) - -─────────────────────────────────────────────────────────────────────────────── - -Saved the plan to: tfplan - -To perform exactly these actions, run the following command to apply: - terraform apply "tfplan" -``` -
- -## 5. Apply Terraform Plan - -```bash -terraform apply tfplan -``` - -
- -Output: - -```text -aws_placement_group.efa_pg: Creating... -module.eks.aws_cloudwatch_log_group.this[0]: Creating... -module.vpc.aws_vpc.this[0]: Creating... -module.eks.module.eks_managed_node_group["sys"].aws_iam_role.this[0]: Creating... -module.vpc.aws_eip.nat[0]: Creating... -module.eks.aws_iam_role.this[0]: Creating... -... -module.eks.aws_eks_cluster.this[0]: Still creating... [1m40s elapsed] -module.eks.aws_eks_cluster.this[0]: Still creating... [1m50s elapsed] -module.eks.aws_eks_cluster.this[0]: Still creating... [2m0s elapsed] -... -module.eks.aws_eks_addon.this["kube-proxy"]: Still creating... [30s elapsed] -module.eks_blueprints_kubernetes_addons.module.aws_fsx_csi_driver[0].module.helm_addon.helm_release.addon[0]: Still creating... [20s elapsed] -module.eks_blueprints_kubernetes_addons.module.aws_efs_csi_driver[0].module.helm_addon.helm_release.addon[0]: Still creating... [20s elapsed] -module.eks.aws_eks_addon.this["vpc-cni"]: Creation complete after 35s [id=eks-efa:vpc-cni] -module.eks.aws_eks_addon.this["kube-proxy"]: Creation complete after 35s [id=eks-efa:kube-proxy] -module.eks_blueprints_kubernetes_addons.module.aws_fsx_csi_driver[0].module.helm_addon.helm_release.addon[0]: Still creating... [30s elapsed] -module.eks_blueprints_kubernetes_addons.module.aws_efs_csi_driver[0].module.helm_addon.helm_release.addon[0]: Still creating... [30s elapsed] -module.eks_blueprints_kubernetes_addons.module.aws_efs_csi_driver[0].module.helm_addon.helm_release.addon[0]: Creation complete after 36s [id=aws-efs-csi-driver] -module.eks_blueprints_kubernetes_addons.module.aws_fsx_csi_driver[0].module.helm_addon.helm_release.addon[0]: Creation complete after 36s [id=aws-fsx-csi-driver] -╷ -│ Warning: "default_secret_name" is no longer applicable for Kubernetes v1.24.0 and above -│ -│ with module.eks_blueprints_kubernetes_addons.module.aws_efs_csi_driver[0].module.helm_addon.module.irsa[0].kubernetes_service_account_v1.irsa[0], -│ on ../../modules/irsa/main.tf line 37, in resource "kubernetes_service_account_v1" "irsa": -│ 37: resource "kubernetes_service_account_v1" "irsa" { -│ -│ Starting from version 1.24.0 Kubernetes does not automatically generate a token for service accounts, in this case, "default_secret_name" will be empty -│ -│ (and one more similar warning elsewhere) -╵ - -Apply complete! Resources: 80 added, 0 changed, 0 destroyed. - -Outputs: - -configure_kubectl = "aws eks update-kubeconfig --region us-east-1 --name eks-efa" - -``` -
- -> **_Note:_** If the plan apply operation fails, you can repeat `terraform plan -out tfplan` and `terraform apply tfplan` - -It takes about 15 minutes to create the cluster. - -## 6. Connect to EKS - -Copy the value of the `configure_kubectl` output and execute it in your shell to connect to your EKS cluster. - -```bash -aws eks update-kubeconfig --region us-east-1 --name eks-efa -``` - -Output: -```text -Updated context arn:aws:eks:us-east-1:xxxxxxxxxxxx:cluster/eks-efa in /root/.kube/config -``` - -Allow 5 minutes after the plan is applied for the EFA nodes to finish initializing and join the EKS cluster, then execute: - -```bash -kubectl get nodes -kubectl get nodes -o yaml | grep instance-type | grep node | grep -v f: -``` - -Your nodes and node types will be listed: - -```text -# kubectl get nodes -NAME STATUS ROLES AGE VERSION -ip-10-11-10-103.ec2.internal Ready 4m1s v1.25.7-eks-a59e1f0 -ip-10-11-19-28.ec2.internal Ready 11m v1.25.7-eks-a59e1f0 -ip-10-11-2-151.ec2.internal Ready 11m v1.25.7-eks-a59e1f0 -ip-10-11-2-18.ec2.internal Ready 5m1s v1.25.7-eks-a59e1f0 -# kubectl get nodes -o yaml | grep instance-type | grep node | grep -v f: - node.kubernetes.io/instance-type: g4dn.metal - node.kubernetes.io/instance-type: m5.large - node.kubernetes.io/instance-type: m5.large - node.kubernetes.io/instance-type: g4dn.metal -``` - -You should see two EFA-enabled (in this example `g4dn.metal`) nodes in the list. -This verifies that you are connected to your EKS cluster and it is configured with EFA nodes. - -## 7. Deploy Kubeflow MPI Operator - -Kubeflow MPI Operator is required for running MPIJobs on EKS. We will use an MPIJob to test EFA. -To deploy the MPI operator execute the following: - -```bash -kubectl apply -f https://raw.githubusercontent.com/kubeflow/mpi-operator/v0.3.0/deploy/v2beta1/mpi-operator.yaml -``` - -Output: - -```text -namespace/mpi-operator created -customresourcedefinition.apiextensions.k8s.io/mpijobs.kubeflow.org created -serviceaccount/mpi-operator created -clusterrole.rbac.authorization.k8s.io/kubeflow-mpijobs-admin created -clusterrole.rbac.authorization.k8s.io/kubeflow-mpijobs-edit created -clusterrole.rbac.authorization.k8s.io/kubeflow-mpijobs-view created -clusterrole.rbac.authorization.k8s.io/mpi-operator created -clusterrolebinding.rbac.authorization.k8s.io/mpi-operator created -deployment.apps/mpi-operator created -``` - -In addition to deploying the operator, please apply a patch to the mpi-operator clusterrole -to allow the mpi-operator service account access to `leases` resources in the `coordination.k8s.io` apiGroup. - -```bash -kubectl apply -f https://raw.githubusercontent.com/aws-samples/aws-do-eks/main/Container-Root/eks/deployment/kubeflow/mpi-operator/clusterrole-mpi-operator.yaml -``` - -Output: - -```text -clusterrole.rbac.authorization.k8s.io/mpi-operator configured -``` - -## 8. Test EFA - -We will run two tests. The first one will show the presence of EFA adapters on our EFA-enabled nodes. The second will test EFA performance. - -### 8.1. 
EFA Info Test - -To run the EFA info test, execute the following commands: - -```bash -kubectl apply -f https://raw.githubusercontent.com/aws-samples/aws-do-eks/main/Container-Root/eks/deployment/efa-device-plugin/test-efa.yaml -``` - -Output: - -```text -mpijob.kubeflow.org/efa-info-test created -``` - -```bash -kubectl get pods -``` - -Output: - -```text -NAME READY STATUS RESTARTS AGE -efa-info-test-launcher-hckkj 0/1 Completed 2 37s -efa-info-test-worker-0 1/1 Running 0 38s -efa-info-test-worker-1 1/1 Running 0 38s -``` - -Once the test launcher pod enters status `Running` or `Completed`, see the test logs using the command below: - -```bash -kubectl logs -f $(kubectl get pods | grep launcher | cut -d ' ' -f 1) -``` - -Output: - -```text -Warning: Permanently added 'efa-info-test-worker-1.efa-info-test-worker.default.svc,10.11.13.224' (ECDSA) to the list of known hosts. -Warning: Permanently added 'efa-info-test-worker-0.efa-info-test-worker.default.svc,10.11.4.63' (ECDSA) to the list of known hosts. -[1,1]:provider: efa -[1,1]: fabric: efa -[1,1]: domain: rdmap197s0-rdm -[1,1]: version: 116.10 -[1,1]: type: FI_EP_RDM -[1,1]: protocol: FI_PROTO_EFA -[1,0]:provider: efa -[1,0]: fabric: efa -[1,0]: domain: rdmap197s0-rdm -[1,0]: version: 116.10 -[1,0]: type: FI_EP_RDM -[1,0]: protocol: FI_PROTO_EFA -``` - -This result shows that two EFA adapters are available (one for each worker pod). - -Lastly, delete the test job: - -```bash -kubectl delete mpijob efa-info-test -``` - -Output: - -```text -mpijob.kubeflow.org "efa-info-test" deleted -``` - -### 8.2. EFA NCCL Test - -To run the EFA NCCL test please execute the following kubectl command: - -```bash -kubectl apply -f https://raw.githubusercontent.com/aws-samples/aws-do-eks/main/Container-Root/eks/deployment/efa-device-plugin/test-nccl-efa.yaml -``` - -Output: - -```text -mpijob.kubeflow.org/test-nccl-efa created -``` - -Then display the pods in the current namespace: - -```bash -kubectl get pods -``` - -Output: - -```text -NAME READY STATUS RESTARTS AGE -test-nccl-efa-launcher-tx47t 1/1 Running 2 (31s ago) 33s -test-nccl-efa-worker-0 1/1 Running 0 33s -test-nccl-efa-worker-1 1/1 Running 0 33s -``` - -Once the launcher pod enters `Running` or `Completed` state, execute the following to see the test logs: - -```bash -kubectl logs -f $(kubectl get pods | grep launcher | cut -d ' ' -f 1) -``` - -
- -Output: - -```text -Warning: Permanently added 'test-nccl-efa-worker-1.test-nccl-efa-worker.default.svc,10.11.5.31' (ECDSA) to the list of known hosts. -Warning: Permanently added 'test-nccl-efa-worker-0.test-nccl-efa-worker.default.svc,10.11.13.106' (ECDSA) to the list of known hosts. -[1,0]:# nThread 1 nGpus 1 minBytes 1 maxBytes 1073741824 step: 2(factor) warmup iters: 5 iters: 100 agg iters: 1 validation: 1 graph: 0 -[1,0]:# -[1,0]:# Using devices -[1,0]:# Rank 0 Group 0 Pid 21 on test-nccl-efa-worker-0 device 0 [0x35] Tesla T4 -[1,0]:# Rank 1 Group 0 Pid 21 on test-nccl-efa-worker-1 device 0 [0xf5] Tesla T4 -[1,0]:test-nccl-efa-worker-0:21:21 [0] NCCL INFO Bootstrap : Using eth0:10.11.13.106<0> -[1,0]:test-nccl-efa-worker-0:21:21 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol. -[1,0]:test-nccl-efa-worker-0:21:21 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v4 symbol. -[1,0]:test-nccl-efa-worker-0:21:21 [0] NCCL INFO NET/OFI Using aws-ofi-nccl 1.5.0aws -[1,0]:test-nccl-efa-worker-0:21:21 [0] NCCL INFO NET/OFI Configuring AWS-specific options -[1,0]:test-nccl-efa-worker-0:21:21 [0] NCCL INFO NET/OFI Setting NCCL_PROTO to "simple" -[1,0]:test-nccl-efa-worker-0:21:21 [0] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1 -[1,0]:test-nccl-efa-worker-0:21:21 [0] NCCL INFO NET/OFI Selected Provider is efa (found 1 nics) -[1,0]:test-nccl-efa-worker-0:21:21 [0] NCCL INFO Using network AWS Libfabric -[1,0]:NCCL version 2.12.7+cuda11.4 -[1,1]:test-nccl-efa-worker-1:21:21 [0] NCCL INFO Bootstrap : Using eth0:10.11.5.31<0> -[1,1]:test-nccl-efa-worker-1:21:21 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol. -[1,1]:test-nccl-efa-worker-1:21:21 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v4 symbol. -[1,1]:test-nccl-efa-worker-1:21:21 [0] NCCL INFO NET/OFI Using aws-ofi-nccl 1.5.0aws -[1,1]:test-nccl-efa-worker-1:21:21 [0] NCCL INFO NET/OFI Configuring AWS-specific options -[1,1]:test-nccl-efa-worker-1:21:21 [0] NCCL INFO NET/OFI Setting NCCL_PROTO to "simple" -[1,1]:test-nccl-efa-worker-1:21:21 [0] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1 -[1,1]:test-nccl-efa-worker-1:21:21 [0] NCCL INFO NET/OFI Selected Provider is efa (found 1 nics) -[1,1]:test-nccl-efa-worker-1:21:21 [0] NCCL INFO Using network AWS Libfabric -[1,0]:test-nccl-efa-worker-0:21:27 [0] NCCL INFO Setting affinity for GPU 0 to ff,ffff0000,00ffffff -[1,1]:test-nccl-efa-worker-1:21:26 [0] NCCL INFO Setting affinity for GPU 0 to ffffff00,0000ffff,ff000000 -[1,1]:test-nccl-efa-worker-1:21:26 [0] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] 0/-1/-1->1->-1 -[1,0]:test-nccl-efa-worker-0:21:27 [0] NCCL INFO Channel 00/02 : 0 1 -[1,0]:test-nccl-efa-worker-0:21:27 [0] NCCL INFO Channel 01/02 : 0 1 -[1,0]:test-nccl-efa-worker-0:21:27 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] -1/-1/-1->0->1 -[1,1]:test-nccl-efa-worker-1:21:26 [0] NCCL INFO NCCL_SHM_DISABLE set by environment to 0. -[1,0]:test-nccl-efa-worker-0:21:27 [0] NCCL INFO NCCL_SHM_DISABLE set by environment to 0. 
-[1,1]:test-nccl-efa-worker-1:21:26 [0] NCCL INFO Channel 00/0 : 0[35000] -> 1[f5000] [receive] via NET/AWS Libfabric/0 -[1,0]:test-nccl-efa-worker-0:21:27 [0] NCCL INFO Channel 00/0 : 1[f5000] -> 0[35000] [receive] via NET/AWS Libfabric/0 -[1,1]:test-nccl-efa-worker-1:21:26 [0] NCCL INFO Channel 01/0 : 0[35000] -> 1[f5000] [receive] via NET/AWS Libfabric/0 -[1,0]:test-nccl-efa-worker-0:21:27 [0] NCCL INFO Channel 01/0 : 1[f5000] -> 0[35000] [receive] via NET/AWS Libfabric/0 -[1,1]:test-nccl-efa-worker-1:21:26 [0] NCCL INFO Channel 00/0 : 1[f5000] -> 0[35000] [send] via NET/AWS Libfabric/0 -[1,0]:test-nccl-efa-worker-0:21:27 [0] NCCL INFO Channel 00/0 : 0[35000] -> 1[f5000] [send] via NET/AWS Libfabric/0 -[1,1]:test-nccl-efa-worker-1:21:26 [0] NCCL INFO Channel 01/0 : 1[f5000] -> 0[35000] [send] via NET/AWS Libfabric/0 -[1,0]:test-nccl-efa-worker-0:21:27 [0] NCCL INFO Channel 01/0 : 0[35000] -> 1[f5000] [send] via NET/AWS Libfabric/0 -[1,0]:test-nccl-efa-worker-0:21:27 [0] NCCL INFO Connected all rings -[1,0]:test-nccl-efa-worker-0:21:27 [0] NCCL INFO Connected all trees -[1,0]:test-nccl-efa-worker-0:21:27 [0] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 -[1,0]:test-nccl-efa-worker-0:21:27 [0] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer -[1,1]:test-nccl-efa-worker-1:21:26 [0] NCCL INFO Connected all rings -[1,1]:test-nccl-efa-worker-1:21:26 [0] NCCL INFO Connected all trees -[1,1]:test-nccl-efa-worker-1:21:26 [0] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 -[1,1]:test-nccl-efa-worker-1:21:26 [0] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer -[1,1]:test-nccl-efa-worker-1:21:26 [0] NCCL INFO comm 0x7f9c0c000f60 rank 1 nranks 2 cudaDev 0 busId f5000 - Init COMPLETE -[1,0]:test-nccl-efa-worker-0:21:27 [0] NCCL INFO comm 0x7fde98000f60 rank 0 nranks 2 cudaDev 0 busId 35000 - Init COMPLETE -[1,0]:# -[1,0]:# out-of-place in-place -[1,0]:# size count type redop root time algbw busbw #wrong time algbw busbw #wrong -[1,0]:# (B) (elements) (us) (GB/s) (GB/s) (us) (GB/s) (GB/s) -[1,0]:test-nccl-efa-worker-0:21:21 [0] NCCL INFO Launch mode Parallel -[1,0]: 0 0 float sum -1 6.36 0.00 0.00 0 6.40 0.00 0.00 0 -[1,0]: 0 0 float sum -1 6.43 0.00 0.00 0 6.35 0.00 0.00 0 -[1,0]: 4 1 float sum -1 65.70 0.00 0.00 0 64.84 0.00 0.00 0 -[1,0]: 8 2 float sum -1 64.88 0.00 0.00 0 64.18 0.00 0.00 0 -[1,0]: 16 4 float sum -1 64.33 0.00 0.00 0 65.02 0.00 0.00 0 -[1,0]: 32 8 float sum -1 65.95 0.00 0.00 0 64.78 0.00 0.00 0 -[1,0]: 64 16 float sum -1 65.19 0.00 0.00 0 64.66 0.00 0.00 0 -[1,0]: 128 32 float sum -1 65.30 0.00 0.00 0 64.76 0.00 0.00 0 -[1,0]: 256 64 float sum -1 65.30 0.00 0.00 0 64.90 0.00 0.00 0 -[1,0]: 512 128 float sum -1 65.71 0.01 0.01 0 64.75 0.01 0.01 0 -[1,0]: 1024 256 float sum -1 67.15 0.02 0.02 0 66.82 0.02 0.02 0 -[1,0]: 2048 512 float sum -1 68.22 0.03 0.03 0 67.55 0.03 0.03 0 -[1,0]: 4096 1024 float sum -1 70.65 0.06 0.06 0 71.20 0.06 0.06 0 -[1,0]: 8192 2048 float sum -1 76.15 0.11 0.11 0 75.36 0.11 0.11 0 -[1,0]: 16384 4096 float sum -1 87.65 0.19 0.19 0 87.87 0.19 0.19 0 -[1,0]: 32768 8192 float sum -1 98.94 0.33 0.33 0 98.14 0.33 0.33 0 -[1,0]: 65536 16384 float sum -1 115.8 0.57 0.57 0 115.7 0.57 0.57 0 -[1,0]: 131072 32768 float sum -1 149.3 0.88 0.88 0 148.7 0.88 0.88 0 -[1,0]: 262144 65536 float sum -1 195.0 1.34 1.34 0 194.0 1.35 1.35 0 -[1,0]: 524288 131072 float sum -1 296.9 1.77 1.77 0 291.1 1.80 1.80 0 -[1,0]: 1048576 262144 float sum -1 583.4 1.80 1.80 0 579.6 1.81 1.81 0 -[1,0]: 2097152 524288 float sum -1 983.3 
2.13 2.13 0 973.9 2.15 2.15 0 -[1,0]: 4194304 1048576 float sum -1 1745.4 2.40 2.40 0 1673.2 2.51 2.51 0 -[1,0]: 8388608 2097152 float sum -1 3116.1 2.69 2.69 0 3092.6 2.71 2.71 0 -[1,0]: 16777216 4194304 float sum -1 5966.3 2.81 2.81 0 6008.9 2.79 2.79 0 -[1,0]: 33554432 8388608 float sum -1 11390 2.95 2.95 0 11419 2.94 2.94 0 -[1,0]: 67108864 16777216 float sum -1 21934 3.06 3.06 0 21930 3.06 3.06 0 -[1,0]: 134217728 33554432 float sum -1 43014 3.12 3.12 0 42619 3.15 3.15 0 -[1,0]: 268435456 67108864 float sum -1 85119 3.15 3.15 0 85743 3.13 3.13 0 -[1,0]: 536870912 134217728 float sum -1 171351 3.13 3.13 0 171823 3.12 3.12 0 -[1,0]: 1073741824 268435456 float sum -1 344981 3.11 3.11 0 344454 3.12 3.12 0 -[1,1]:test-nccl-efa-worker-1:21:21 [0] NCCL INFO comm 0x7f9c0c000f60 rank 1 nranks 2 cudaDev 0 busId f5000 - Destroy COMPLETE -[1,0]:test-nccl-efa-worker-0:21:21 [0] NCCL INFO comm 0x7fde98000f60 rank 0 nranks 2 cudaDev 0 busId 35000 - Destroy COMPLETE -[1,0]:# Out of bounds values : 0 OK -[1,0]:# Avg bus bandwidth : 1.15327 -[1,0]:# -[1,0]: -``` -
- - -The following section from the beginning of the log, indicates that the test is being performed using EFA: - -```text -[1,0]:test-nccl-efa-worker-0:21:21 [0] NCCL INFO NET/OFI Selected Provider is efa (found 1 nics) -[1,0]:test-nccl-efa-worker-0:21:21 [0] NCCL INFO Using network AWS Libfabric -[1,0]:NCCL version 2.12.7+cuda11.4 -``` - -Columns 8 and 12 in the output table show the in-place and out-of-place bus bandwidth calculated for the data size listed in column 1. In this case it is 3.13 and 3.12 GB/s respectively. -Your actual results may be slightly different. The calculated average bus bandwidth is displayed at the bottom of the log when the test finishes after it reaches the max data size, -specified in the mpijob manifest. In this result the average bus bandwidth is 1.15 GB/s. - -``` -[1,0]:# size count type redop root time algbw busbw #wrong time algbw busbw #wrong -[1,0]:# (B) (elements) (us) (GB/s) (GB/s) (us) (GB/s) (GB/s) -... -[1,0]: 262144 65536 float sum -1 195.0 1.34 1.34 0 194.0 1.35 1.35 0 -[1,0]: 524288 131072 float sum -1 296.9 1.77 1.77 0 291.1 1.80 1.80 0 -[1,0]: 1048576 262144 float sum -1 583.4 1.80 1.80 0 579.6 1.81 1.81 0 -[1,0]: 2097152 524288 float sum -1 983.3 2.13 2.13 0 973.9 2.15 2.15 0 -[1,0]: 4194304 1048576 float sum -1 1745.4 2.40 2.40 0 1673.2 2.51 2.51 0 -... -[1,0]:# Avg bus bandwidth : 1.15327 -``` - -Finally, delete the test mpi job: - -```bash -kubectl delete mpijob test-nccl-efa -``` - -Output: - -```text -mpijob.kubeflow.org "test-nccl-efa" deleted -``` - -## 9. Cleanup - -```bash -terraform destroy -``` - -
-Output: - -```text -... - # module.eks.module.self_managed_node_group["efa"].aws_iam_role.this[0] will be destroyed -... - -Plan: 0 to add, 0 to change, 80 to destroy. - -Changes to Outputs: - - configure_kubectl = "aws eks update-kubeconfig --region us-east-1 --name eks-efa" -> null - -Do you really want to destroy all resources? - Terraform will destroy all your managed infrastructure, as shown above. - There is no undo. Only 'yes' will be accepted to confirm. - - Enter a value: yes - ... - module.eks.aws_iam_role.this[0]: Destruction complete after 1s -module.eks.aws_security_group_rule.node["ingress_self_coredns_udp"]: Destruction complete after 2s -module.eks.aws_security_group_rule.node["ingress_cluster_9443_webhook"]: Destruction complete after 3s -module.eks.aws_security_group_rule.node["ingress_cluster_443"]: Destruction complete after 3s -module.eks.aws_security_group_rule.node["egress_all"]: Destruction complete after 2s -module.eks.aws_security_group_rule.node["egress_self_all"]: Destruction complete after 3s -module.eks.aws_security_group_rule.node["ingress_nodes_ephemeral"]: Destruction complete after 3s -module.eks.aws_security_group_rule.node["ingress_cluster_8443_webhook"]: Destruction complete after 3s -module.eks.aws_security_group_rule.node["ingress_self_coredns_tcp"]: Destruction complete after 4s -module.eks.aws_security_group.cluster[0]: Destroying... [id=sg-05516650e2f2ed6c1] -module.eks.aws_security_group.node[0]: Destroying... [id=sg-0e421877145f36d48] -module.eks.aws_security_group.cluster[0]: Destruction complete after 1s -module.eks.aws_security_group.node[0]: Destruction complete after 1s -module.vpc.aws_vpc.this[0]: Destroying... [id=vpc-04677b1ab4eac3ca7] -module.vpc.aws_vpc.this[0]: Destruction complete after 0s -╷ -│ Warning: EC2 Default Network ACL (acl-0932148c7d86482e0) not deleted, removing from state -╵ - -Destroy complete! Resources: 80 destroyed. -``` - -
- -The cleanup process takes about 15 minutes. - -# Conclusion - -With this example, we have demonstrated how AWS EKS Blueprints can be used to create an EKS cluster with an -EFA-enabled nodegroup. Futhermore, we have shown how to run MPI Jobs to validate that EFA works and check its performance. -Use this example as a starting point to bootstrap your own infrastructure-as-code terraform projects that require use -of high-performance networking on AWS with Elastic Fabric Adapter. diff --git a/examples/eks-efa/main.tf b/examples/eks-efa/main.tf deleted file mode 100644 index 3fe9178e9c..0000000000 --- a/examples/eks-efa/main.tf +++ /dev/null @@ -1,261 +0,0 @@ -# Providers - -provider "aws" { - region = var.aws_region -} - -provider "kubernetes" { - host = module.eks.cluster_endpoint - cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data) - token = data.aws_eks_cluster_auth.this.token -} - -provider "helm" { - kubernetes { - host = module.eks.cluster_endpoint - cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data) - token = data.aws_eks_cluster_auth.this.token - } -} - -provider "kubectl" { - apply_retry_count = 10 - host = module.eks.cluster_endpoint - cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data) - token = data.aws_eks_cluster_auth.this.token -} - -# Data - -data "aws_eks_cluster_auth" "this" { - name = module.eks.cluster_name -} - -data "aws_availability_zones" "available" {} - -data "http" "efa_device_plugin_yaml" { - url = "https://raw.githubusercontent.com/aws-samples/aws-efa-eks/main/manifest/efa-k8s-device-plugin.yml" -} - -data "aws_ami" "eks_gpu_node" { - most_recent = true - owners = ["amazon"] - - filter { - name = "name" - values = ["amazon-eks-gpu-node-${local.cluster_version}-*"] - } -} - -# Local config - -locals { - name = var.cluster_name - cluster_version = "1.25" - - vpc_cidr = "10.11.0.0/16" - azs = slice(data.aws_availability_zones.available.names, 0, 2) - - tags = { - Blueprint = local.name - GithubRepo = "github.com/aws-ia/terraform-aws-eks-blueprints" - } - -} - -# Resources - -resource "aws_placement_group" "efa_pg" { - name = "efa_pg" - strategy = "cluster" -} - -resource "kubectl_manifest" "efa_device_plugin" { - yaml_body = < update-kubeconfig --name <$CLUSTER_NAME> +``` + +2. Test by listing Nodes in in the Cluster, you should see Fargate instances as your Cluster Nodes. + +```sh +kubectl get nodes +kubectl get nodes -o yaml | grep instance-type | grep node | grep -v f: +``` + +Your nodes and node types will be listed: + +```text +# kubectl get nodes +NAME STATUS ROLES AGE VERSION +ip-10-11-10-103.ec2.internal Ready 4m1s v1.25.7-eks-a59e1f0 +ip-10-11-19-28.ec2.internal Ready 11m v1.25.7-eks-a59e1f0 +ip-10-11-2-151.ec2.internal Ready 11m v1.25.7-eks-a59e1f0 +ip-10-11-2-18.ec2.internal Ready 5m1s v1.25.7-eks-a59e1f0 +# kubectl get nodes -o yaml | grep instance-type | grep node | grep -v f: + node.kubernetes.io/instance-type: g5.8xlarge + node.kubernetes.io/instance-type: m5.large + node.kubernetes.io/instance-type: m5.large + node.kubernetes.io/instance-type: g5.8xlarge +``` + +You should see two EFA-enabled (in this example `g5.8xlarge`) nodes in the list. +This verifies that you are connected to your EKS cluster and it is configured with EFA nodes. + +3. Deploy Kubeflow MPI Operator + +Kubeflow MPI Operator is required for running MPIJobs on EKS. We will use an MPIJob to test EFA. 
+To deploy the MPI operator execute the following: + +```sh +kubectl apply -f https://raw.githubusercontent.com/kubeflow/mpi-operator/v0.3.0/deploy/v2beta1/mpi-operator.yaml +``` + +Output: + +```text +namespace/mpi-operator created +customresourcedefinition.apiextensions.k8s.io/mpijobs.kubeflow.org created +serviceaccount/mpi-operator created +clusterrole.rbac.authorization.k8s.io/kubeflow-mpijobs-admin created +clusterrole.rbac.authorization.k8s.io/kubeflow-mpijobs-edit created +clusterrole.rbac.authorization.k8s.io/kubeflow-mpijobs-view created +clusterrole.rbac.authorization.k8s.io/mpi-operator created +clusterrolebinding.rbac.authorization.k8s.io/mpi-operator created +deployment.apps/mpi-operator created +``` + +In addition to deploying the operator, please apply a patch to the mpi-operator clusterrole +to allow the mpi-operator service account access to `leases` resources in the `coordination.k8s.io` apiGroup. + +```sh +kubectl apply -f https://raw.githubusercontent.com/aws-samples/aws-do-eks/main/Container-Root/eks/deployment/kubeflow/mpi-operator/clusterrole-mpi-operator.yaml +``` + +Output: + +```text +clusterrole.rbac.authorization.k8s.io/mpi-operator configured +``` + +4. Test EFA + +We will run two tests. The first one will show the presence of EFA adapters on our EFA-enabled nodes. The second will test EFA performance. + +5. EFA Info Test + +To run the EFA info test, execute the following commands: + +```sh +kubectl apply -f https://raw.githubusercontent.com/aws-samples/aws-do-eks/main/Container-Root/eks/deployment/efa-device-plugin/test-efa.yaml +``` + +Output: + +```text +mpijob.kubeflow.org/efa-info-test created +``` + +```sh +kubectl get pods +``` + +Output: + +```text +NAME READY STATUS RESTARTS AGE +efa-info-test-launcher-hckkj 0/1 Completed 2 37s +efa-info-test-worker-0 1/1 Running 0 38s +efa-info-test-worker-1 1/1 Running 0 38s +``` + +Once the test launcher pod enters status `Running` or `Completed`, see the test logs using the command below: + +```sh +kubectl logs -f $(kubectl get pods | grep launcher | cut -d ' ' -f 1) +``` + +Output: + +```text +Warning: Permanently added 'efa-info-test-worker-1.efa-info-test-worker.default.svc,10.11.13.224' (ECDSA) to the list of known hosts. +Warning: Permanently added 'efa-info-test-worker-0.efa-info-test-worker.default.svc,10.11.4.63' (ECDSA) to the list of known hosts. +[1,1]:provider: efa +[1,1]: fabric: efa +[1,1]: domain: rdmap197s0-rdm +[1,1]: version: 116.10 +[1,1]: type: FI_EP_RDM +[1,1]: protocol: FI_PROTO_EFA +[1,0]:provider: efa +[1,0]: fabric: efa +[1,0]: domain: rdmap197s0-rdm +[1,0]: version: 116.10 +[1,0]: type: FI_EP_RDM +[1,0]: protocol: FI_PROTO_EFA +``` + +This result shows that two EFA adapters are available (one for each worker pod). + +Lastly, delete the test job: + +```sh +kubectl delete mpijob efa-info-test +``` + +Output: + +```text +mpijob.kubeflow.org "efa-info-test" deleted +``` + +6. 
EFA NCCL Test + +To run the EFA NCCL test please execute the following kubectl command: + +```sh +kubectl apply -f https://raw.githubusercontent.com/aws-samples/aws-do-eks/main/Container-Root/eks/deployment/efa-device-plugin/test-nccl-efa.yaml +``` + +Output: + +```text +mpijob.kubeflow.org/test-nccl-efa created +``` + +Then display the pods in the current namespace: + +```sh +kubectl get pods +``` + +Output: + +```text +NAME READY STATUS RESTARTS AGE +test-nccl-efa-launcher-tx47t 1/1 Running 2 (31s ago) 33s +test-nccl-efa-worker-0 1/1 Running 0 33s +test-nccl-efa-worker-1 1/1 Running 0 33s +``` + +Once the launcher pod enters `Running` or `Completed` state, execute the following to see the test logs: + +```sh +kubectl logs -f $(kubectl get pods | grep launcher | cut -d ' ' -f 1) +``` + +The following section from the beginning of the log, indicates that the test is being performed using EFA: + +```text +[1,0]:test-nccl-efa-worker-0:21:21 [0] NCCL INFO NET/OFI Selected Provider is efa (found 1 nics) +[1,0]:test-nccl-efa-worker-0:21:21 [0] NCCL INFO Using network AWS Libfabric +[1,0]:NCCL version 2.12.7+cuda11.4 +``` + +Columns 8 and 12 in the output table show the in-place and out-of-place bus bandwidth calculated for the data size listed in column 1. In this case it is 3.13 and 3.12 GB/s respectively. +Your actual results may be slightly different. The calculated average bus bandwidth is displayed at the bottom of the log when the test finishes after it reaches the max data size, +specified in the mpijob manifest. In this result the average bus bandwidth is 1.15 GB/s. + +``` +[1,0]:# size count type redop root time algbw busbw #wrong time algbw busbw #wrong +[1,0]:# (B) (elements) (us) (GB/s) (GB/s) (us) (GB/s) (GB/s) +... +[1,0]: 262144 65536 float sum -1 195.0 1.34 1.34 0 194.0 1.35 1.35 0 +[1,0]: 524288 131072 float sum -1 296.9 1.77 1.77 0 291.1 1.80 1.80 0 +[1,0]: 1048576 262144 float sum -1 583.4 1.80 1.80 0 579.6 1.81 1.81 0 +[1,0]: 2097152 524288 float sum -1 983.3 2.13 2.13 0 973.9 2.15 2.15 0 +[1,0]: 4194304 1048576 float sum -1 1745.4 2.40 2.40 0 1673.2 2.51 2.51 0 +... 
+[1,0]:# Avg bus bandwidth : 1.15327 +``` + +Finally, delete the test mpi job: + +```sh +kubectl delete mpijob test-nccl-efa +``` + +Output: + +```text +mpijob.kubeflow.org "test-nccl-efa" deleted +``` + +## Destroy + +To teardown and remove the resources created in this example: + +```sh +terraform destroy -target module.eks_blueprints_addons -auto-approve +terraform destroy -target module.eks -auto-approve +terraform destroy -auto-approve +``` diff --git a/examples/elastic-fabric-adapter/main.tf b/examples/elastic-fabric-adapter/main.tf new file mode 100644 index 0000000000..15a886284d --- /dev/null +++ b/examples/elastic-fabric-adapter/main.tf @@ -0,0 +1,272 @@ +provider "aws" { + region = local.region +} + +provider "kubernetes" { + host = module.eks.cluster_endpoint + cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data) + + exec { + api_version = "client.authentication.k8s.io/v1beta1" + command = "aws" + # This requires the awscli to be installed locally where Terraform is executed + args = ["eks", "get-token", "--cluster-name", module.eks.cluster_name] + } +} + +provider "helm" { + kubernetes { + host = module.eks.cluster_endpoint + cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data) + + exec { + api_version = "client.authentication.k8s.io/v1beta1" + command = "aws" + # This requires the awscli to be installed locally where Terraform is executed + args = ["eks", "get-token", "--cluster-name", module.eks.cluster_name] + } + } +} + +provider "kubectl" { + apply_retry_count = 5 + host = module.eks.cluster_endpoint + cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data) + load_config_file = false + + exec { + api_version = "client.authentication.k8s.io/v1beta1" + command = "aws" + # This requires the awscli to be installed locally where Terraform is executed + args = ["eks", "get-token", "--cluster-name", module.eks.cluster_name] + } +} + +data "aws_availability_zones" "available" {} + +locals { + name = basename(path.cwd) + region = "us-west-2" + + vpc_cidr = "10.0.0.0/16" + azs = slice(data.aws_availability_zones.available.names, 0, 3) + + tags = { + Blueprint = local.name + GithubRepo = "github.com/aws-ia/terraform-aws-eks-blueprints" + } +} + +################################################################################ +# Cluster +################################################################################ + +#tfsec:ignore:aws-eks-enable-control-plane-logging +module "eks" { + source = "terraform-aws-modules/eks/aws" + version = "~> 19.13" + + cluster_name = local.name + cluster_version = "1.27" + cluster_endpoint_public_access = true + + cluster_addons = { + coredns = {} + kube-proxy = {} + vpc-cni = {} + } + + vpc_id = module.vpc.vpc_id + subnet_ids = module.vpc.private_subnets + + # Extend node-to-node security group rules + node_security_group_additional_rules = { + ingress_self_all = { + description = "Node to node all ingress traffic" + protocol = "-1" + from_port = 0 + to_port = 0 + type = "ingress" + self = true + } + egress_self_all = { + description = "Node to node all egress traffic" + protocol = "-1" + from_port = 0 + to_port = 0 + type = "egress" + self = true + } + } + + eks_managed_node_groups = { + # For running services that do not require GPUs + default = { + instance_types = ["m5.large"] + + min_size = 1 + max_size = 5 + desired_size = 2 + } + + efa = { + ami_type = "AL2_x86_64_GPU" + instance_types = ["g5.8xlarge"] + + min_size = 1 + max_size = 3 + desired_size 
= 1 + + subnet_ids = slice(module.vpc.private_subnets, 0, 1) + + network_interfaces = [ + { + description = "EFA interface" + delete_on_termination = true + device_index = 0 + associate_public_ip_address = false + interface_type = "efa" + } + ] + + placement = { + group_name = aws_placement_group.efa.name + } + + pre_bootstrap_user_data = <<-EOT + # Install EFA + curl -O https://efa-installer.amazonaws.com/aws-efa-installer-latest.tar.gz + tar -xf aws-efa-installer-latest.tar.gz && cd aws-efa-installer + ./efa_installer.sh -y --minimal + fi_info -p efa -t FI_EP_RDM + + # Disable ptrace + sysctl -w kernel.yama.ptrace_scope=0 + EOT + + taints = { + dedicated = { + key = "nvidia.com/gpu" + value = "true" + effect = "NO_SCHEDULE" + } + } + } + } + + tags = local.tags +} + +################################################################################ +# EKS Blueprints Addons +################################################################################ + +module "eks_blueprints_addons" { + source = "aws-ia/eks-blueprints-addons/aws" + version = "0.2.0" + + cluster_name = module.eks.cluster_name + cluster_endpoint = module.eks.cluster_endpoint + cluster_version = module.eks.cluster_version + oidc_provider_arn = module.eks.oidc_provider_arn + + # We want to wait for the Fargate profiles to be deployed first + create_delay_dependencies = [for group in module.eks.eks_managed_node_groups : group.node_group_arn] + + enable_aws_efs_csi_driver = true + enable_aws_fsx_csi_driver = true + enable_kube_prometheus_stack = true + kube_prometheus_stack = { + values = [ + <<-EOT + prometheus: + prometheusSpec: + serviceMonitorSelectorNilUsesHelmValues: false + EOT + ] + } + enable_metrics_server = true + + helm_releases = { + prometheus-adapter = { + description = "A Helm chart for k8s prometheus adapter" + namespace = "prometheus-adapter" + create_namespace = true + chart = "prometheus-adapter" + chart_version = "4.2.0" + repository = "https://prometheus-community.github.io/helm-charts" + values = [ + <<-EOT + replicas: 2 + podDisruptionBudget: + enabled: true + EOT + ] + } + gpu-operator = { + description = "A Helm chart for NVIDIA GPU operator" + namespace = "gpu-operator" + create_namespace = true + chart = "gpu-operator" + chart_version = "v23.3.2" + repository = "https://nvidia.github.io/gpu-operator" + values = [ + <<-EOT + operator: + defaultRuntime: containerd + EOT + ] + } + } + + tags = local.tags +} + +################################################################################ +# Amazon Elastic Fabric Adapter (EFA) +################################################################################ + +data "http" "efa_device_plugin_yaml" { + url = "https://raw.githubusercontent.com/aws-samples/aws-efa-eks/main/manifest/efa-k8s-device-plugin.yml" +} + +resource "kubectl_manifest" "efa_device_plugin" { + yaml_body = <<-YAML + ${data.http.efa_device_plugin_yaml.response_body} + YAML +} + +################################################################################ +# Supporting Resources +################################################################################ + +module "vpc" { + source = "terraform-aws-modules/vpc/aws" + version = "~> 5.0" + + name = local.name + cidr = local.vpc_cidr + + azs = local.azs + private_subnets = [for k, v in local.azs : cidrsubnet(local.vpc_cidr, 4, k)] + public_subnets = [for k, v in local.azs : cidrsubnet(local.vpc_cidr, 8, k + 48)] + + enable_nat_gateway = true + single_nat_gateway = true + + public_subnet_tags = { + "kubernetes.io/role/elb" = 1 + } 
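+
+  # The subnet role tags ("kubernetes.io/role/elb" above and "kubernetes.io/role/internal-elb"
+  # below) allow the AWS Load Balancer Controller to discover which subnets to use when it
+  # provisions internet-facing and internal load balancers for Services and Ingresses.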
+ + private_subnet_tags = { + "kubernetes.io/role/internal-elb" = 1 + } + + tags = local.tags +} + +# Group instances within clustered placement group so they are in close proximity +resource "aws_placement_group" "efa" { + name = local.name + strategy = "cluster" +} diff --git a/examples/elastic-fabric-adapter/outputs.tf b/examples/elastic-fabric-adapter/outputs.tf new file mode 100644 index 0000000000..a43b620e9b --- /dev/null +++ b/examples/elastic-fabric-adapter/outputs.tf @@ -0,0 +1,4 @@ +output "configure_kubectl" { + description = "Configure kubectl: make sure you're logged in with the correct AWS profile and run the following command to update your kubeconfig" + value = "aws eks update-kubeconfig --region ${local.region} --name ${module.eks.cluster_name}" +} diff --git a/modules/kubernetes-addons/app-2048/variables.tf b/examples/elastic-fabric-adapter/variables.tf similarity index 100% rename from modules/kubernetes-addons/app-2048/variables.tf rename to examples/elastic-fabric-adapter/variables.tf diff --git a/examples/amp-amg-opensearch/versions.tf b/examples/elastic-fabric-adapter/versions.tf similarity index 62% rename from examples/amp-amg-opensearch/versions.tf rename to examples/elastic-fabric-adapter/versions.tf index 123e505f1d..d75641f32a 100644 --- a/examples/amp-amg-opensearch/versions.tf +++ b/examples/elastic-fabric-adapter/versions.tf @@ -6,17 +6,21 @@ terraform { source = "hashicorp/aws" version = ">= 4.47" } + helm = { + source = "hashicorp/helm" + version = ">= 2.9" + } kubernetes = { source = "hashicorp/kubernetes" - version = ">= 2.17" + version = ">= 2.20" } - helm = { - source = "hashicorp/helm" - version = ">= 2.8" + kubectl = { + source = "gavinbunney/kubectl" + version = ">= 1.14" } - grafana = { - source = "grafana/grafana" - version = ">= 1.34" + http = { + source = "hashicorp/http" + version = ">= 3.3" } } @@ -24,6 +28,6 @@ terraform { # backend "s3" { # bucket = "terraform-ssp-github-actions-state" # region = "us-west-2" - # key = "e2e/amp-amg-opensearch/terraform.tfstate" + # key = "e2e/elastic-fabric-adapter/terraform.tfstate" # } } diff --git a/examples/external-secrets/README.md b/examples/external-secrets/README.md index 1c6b3fb3fa..28fae1fb2d 100644 --- a/examples/external-secrets/README.md +++ b/examples/external-secrets/README.md @@ -1,4 +1,4 @@ -# External Secrets Operator Kubernetes addon +# Amazon EKS Cluster w/ External Secrets Operator This example deploys an EKS Cluster with the External Secrets Operator. The cluster is populated with a ClusterSecretStore and SecretStore example using SecretManager and Parameter Store respectively. A secret for each store is also created. Both stores use IRSA to retrieve the secret values from AWS. 
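+
+Once the example is deployed, you can verify that the stores and secrets were created. A minimal check using the External Secrets Operator CRDs (resource names assume the defaults created by this example):
+
+```sh
+# Both stores should report a Valid/Ready status
+kubectl get clustersecretstores.external-secrets.io
+kubectl get secretstores.external-secrets.io -A
+
+# Each ExternalSecret should be synced, with a corresponding Kubernetes Secret created alongside it
+kubectl get externalsecrets.external-secrets.io -A
+```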
diff --git a/examples/external-secrets/main.tf b/examples/external-secrets/main.tf index aa27c040e5..f06aeab286 100644 --- a/examples/external-secrets/main.tf +++ b/examples/external-secrets/main.tf @@ -5,40 +5,54 @@ provider "aws" { provider "kubernetes" { host = module.eks.cluster_endpoint cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data) - token = data.aws_eks_cluster_auth.this.token + + exec { + api_version = "client.authentication.k8s.io/v1beta1" + command = "aws" + # This requires the awscli to be installed locally where Terraform is executed + args = ["eks", "get-token", "--cluster-name", module.eks.cluster_name] + } } provider "helm" { kubernetes { host = module.eks.cluster_endpoint cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data) - token = data.aws_eks_cluster_auth.this.token + + exec { + api_version = "client.authentication.k8s.io/v1beta1" + command = "aws" + # This requires the awscli to be installed locally where Terraform is executed + args = ["eks", "get-token", "--cluster-name", module.eks.cluster_name] + } } } provider "kubectl" { - apply_retry_count = 10 + apply_retry_count = 5 host = module.eks.cluster_endpoint cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data) load_config_file = false - token = data.aws_eks_cluster_auth.this.token -} -data "aws_eks_cluster_auth" "this" { - name = module.eks.cluster_name + exec { + api_version = "client.authentication.k8s.io/v1beta1" + command = "aws" + # This requires the awscli to be installed locally where Terraform is executed + args = ["eks", "get-token", "--cluster-name", module.eks.cluster_name] + } } data "aws_availability_zones" "available" {} data "aws_caller_identity" "current" {} locals { - name = basename(path.cwd) - namespace = "external-secrets" - region = "us-west-2" + name = basename(path.cwd) + region = "us-west-2" vpc_cidr = "10.0.0.0/16" azs = slice(data.aws_availability_zones.available.names, 0, 3) + namespace = "external-secrets" cluster_secretstore_name = "cluster-secretstore-sm" cluster_secretstore_sa = "cluster-secretstore-sa" secretstore_name = "secretstore-ps" @@ -57,19 +71,12 @@ locals { #tfsec:ignore:aws-eks-enable-control-plane-logging module "eks" { source = "terraform-aws-modules/eks/aws" - version = "~> 19.12" + version = "~> 19.13" cluster_name = local.name - cluster_version = "1.24" + cluster_version = "1.27" cluster_endpoint_public_access = true - # EKS Addons - cluster_addons = { - coredns = {} - kube-proxy = {} - vpc-cni = {} - } - vpc_id = module.vpc.vpc_id subnet_ids = module.vpc.private_subnets @@ -87,17 +94,29 @@ module "eks" { } ################################################################################ -# Kubernetes Addons +# EKS Blueprints Addons ################################################################################ -module "eks_blueprints_kubernetes_addons" { - source = "../../modules/kubernetes-addons" +module "eks_blueprints_addons" { + source = "aws-ia/eks-blueprints-addons/aws" + version = "0.2.0" - eks_cluster_id = module.eks.cluster_name - eks_cluster_endpoint = module.eks.cluster_endpoint - eks_oidc_provider = module.eks.oidc_provider - eks_cluster_version = module.eks.cluster_version + cluster_name = module.eks.cluster_name + cluster_endpoint = module.eks.cluster_endpoint + cluster_version = module.eks.cluster_version + oidc_provider_arn = module.eks.oidc_provider_arn + # EKS Add-ons + eks_addons = { + aws-ebs-csi-driver = { + service_account_role_arn = 
module.ebs_csi_driver_irsa.iam_role_arn + } + coredns = {} + vpc-cni = {} + kube-proxy = {} + } + + # Add-ons enable_external_secrets = true tags = local.tags @@ -109,7 +128,7 @@ module "eks_blueprints_kubernetes_addons" { module "vpc" { source = "terraform-aws-modules/vpc/aws" - version = "~> 4.0" + version = "~> 5.0" name = local.name cidr = local.vpc_cidr @@ -132,54 +151,10 @@ module "vpc" { tags = local.tags } -#--------------------------------------------------------------- -# External Secrets Operator - Secret -#--------------------------------------------------------------- - resource "aws_kms_key" "secrets" { enable_key_rotation = true } -module "cluster_secretstore_role" { - source = "../../modules/irsa" - kubernetes_namespace = local.namespace - create_kubernetes_namespace = false - kubernetes_service_account = local.cluster_secretstore_sa - irsa_iam_policies = [aws_iam_policy.cluster_secretstore.arn] - eks_cluster_id = module.eks.cluster_name - eks_oidc_provider_arn = module.eks.oidc_provider_arn - - depends_on = [module.eks_blueprints_kubernetes_addons] -} - -resource "aws_iam_policy" "cluster_secretstore" { - name_prefix = local.cluster_secretstore_sa - policy = < update-kubeconfig --name +aws eks --region <$AWS_REGION> update-kubeconfig --name <$CLUSTER_NAME> ``` -2. Test by listing all the pods running currently. The CoreDNS pod should reach a status of `Running` after approximately 60 seconds: +3. Test by listing Nodes in in the Cluster, you should see Fargate instances as your Cluster Nodes. + + +```sh +kubectl get nodes +NAME STATUS ROLES AGE VERSION +fargate-ip-10-0-17-17.us-west-2.compute.internal Ready 25m v1.26.3-eks-f4dc2c0 +fargate-ip-10-0-20-244.us-west-2.compute.internal Ready 71s v1.26.3-eks-f4dc2c0 +fargate-ip-10-0-41-143.us-west-2.compute.internal Ready 25m v1.26.3-eks-f4dc2c0 +fargate-ip-10-0-44-95.us-west-2.compute.internal Ready 25m v1.26.3-eks-f4dc2c0 +fargate-ip-10-0-45-153.us-west-2.compute.internal Ready 77s v1.26.3-eks-f4dc2c0 +fargate-ip-10-0-47-31.us-west-2.compute.internal Ready 75s v1.26.3-eks-f4dc2c0 +fargate-ip-10-0-6-175.us-west-2.compute.internal Ready 25m v1.26.3-eks-f4dc2c0 +``` + +4. Test by listing all the Pods running currently. All the Pods should reach a status of `Running` after approximately 60 seconds: ```sh kubectl get pods -A +NAMESPACE NAME READY STATUS RESTARTS AGE +app-2048 app-2048-65bd744dfb-7g9rx 1/1 Running 0 2m34s +app-2048 app-2048-65bd744dfb-nxcbm 1/1 Running 0 2m34s +app-2048 app-2048-65bd744dfb-z4b6z 1/1 Running 0 2m34s +kube-system aws-load-balancer-controller-6cbdb58654-fvskt 1/1 Running 0 26m +kube-system aws-load-balancer-controller-6cbdb58654-sc7dk 1/1 Running 0 26m +kube-system coredns-7b7bddbc85-jmbv6 1/1 Running 0 26m +kube-system coredns-7b7bddbc85-rgmzq 1/1 Running 0 26m +``` + +5. Check if the `aws-logging` configMap for Fargate Fluentbit was created. 
-# Output should look like below -game-2048 deployment-2048-7ff458c9f-mb5xs 1/1 Running 0 5h23m -game-2048 deployment-2048-7ff458c9f-qc99d 1/1 Running 0 4h23m -game-2048 deployment-2048-7ff458c9f-rm26f 1/1 Running 0 4h23m -game-2048 deployment-2048-7ff458c9f-vzjhm 1/1 Running 0 4h23m -game-2048 deployment-2048-7ff458c9f-xnrgh 1/1 Running 0 4h23m -kube-system aws-load-balancer-controller-7b69cfcc44-49z5n 1/1 Running 0 5h42m -kube-system aws-load-balancer-controller-7b69cfcc44-9vhq7 1/1 Running 0 5h43m -kube-system coredns-7c9d764485-z247p 1/1 Running 0 6h1m +```sh +kubectl -n aws-observability get configmap aws-logging -o yaml +apiVersion: v1 +data: + filters.conf: | + [FILTER] + Name parser + Match * + Key_Name log + Parser regex + Preserve_Key True + Reserve_Data True + flb_log_cw: "true" + output.conf: | + [OUTPUT] + Name cloudwatch_logs + Match * + region us-west-2 + log_group_name /fargate-serverless/fargate-fluentbit-logs20230509014113352200000006 + log_stream_prefix fargate-logs- + auto_create_group true + parsers.conf: | + [PARSER] + Name regex + Format regex + Regex ^(?