Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(aws-eks): aws-auth ConfigMap is still being replaced #19218

Open
IsaacLeeWebDev opened this issue Mar 3, 2022 · 7 comments
Open

(aws-eks): aws-auth ConfigMap is still being replaced #19218

IsaacLeeWebDev opened this issue Mar 3, 2022 · 7 comments
Labels
@aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service effort/medium Medium work item – several days of effort feature-request A feature should be added or improved. p1

Comments

@IsaacLeeWebDev
Copy link

IsaacLeeWebDev commented Mar 3, 2022

What is the problem?

#7981 was not fixed by #8447 – The aws-auth ConfigMap is still being replaced, rather than appended to, any time the Cluster.awsAuth getter is invoked, which happens whenever you add a Fargate Profile to a cluster, forcing all Fargate Profiles to originate from the same instance of Cluster from @aws-cdk/aws-eks/cluster.ts.

In the AwsAuth constructor, the following manifest is added added to the cluster:

    new KubernetesResource(this, 'manifest', {
      cluster: props.cluster,
      manifest: [
        {
          apiVersion: 'v1',
          kind: 'ConfigMap',
          metadata: {
            name: 'aws-auth',
            namespace: 'kube-system',
          },
          data: {
            mapRoles: this.synthesizeMapRoles(),
            mapUsers: this.synthesizeMapUsers(),
            mapAccounts: this.synthesizeMapAccounts(),
          },
        },
      ],
    });

This constructor is invoked whenever the Cluster.awsAuth getter is invoked, which happens any time you invoke Cluster.awsAuth.addRoleMapping, which happens any time you invoke Cluster.addFargateProfile.

What this means is that if any software changes the mapRoles of the aws-auth ConfigMap without invoking Cluster.awsAuth.addRoleMapping on that particular instance of Cluster, those changes will be erased upon the next addition of a Fargate Profile.

In other words, upon each instantiation of Cluster where the Cluster.awsAuth getter is invoked, all ARNs of all PodExecutionRoles added manually with kubectl, other CDK apps, other IaC tools, an L1 Construct like CfnFargateProfile, etc. will be removed from the aws-auth ConfigMap, causing this error below, and preventing any new pods from spinning up.

fargate-scheduler Misconfigured Fargate Profile: fargate profile <PROFILE_NAME> blocked for new launches due to: Pod execution role is not found in auth config or does not have all required permissions for launching fargate pods.

Reproduction Steps

See #7981

What did you expect to happen?

See #7981

What actually happened?

See #7981

CDK CLI Version

1.85.0

Framework Version

1.85.0

Node.js Version

v14.17.1

OS

macOS

Language

Typescript

Language Version

TypeScript (4.0.3)

Other information

EKS version

1.19

Fix suggestion

Instead of applying a manifest to the cluster, an AwsAuth instance should create the aws-auth manifest if and only if the ConfigMap doesn't already exist in the cluster.

When aws-auth already exists, the result should be a kubectl patch of the existing aws-auth ConfigMap (pictured below):

image

@IsaacLeeWebDev IsaacLeeWebDev added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Mar 3, 2022
@github-actions github-actions bot added the @aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service label Mar 3, 2022
@otaviomacedo
Copy link
Contributor

otaviomacedo commented Mar 4, 2022

Hi, @IsaacLeeWebDev. Thanks for bringing this up. Indeed, if we want to preserve changes made outside the CDK app, the custom resource should issue a patch command instead of apply.

That said, I'm not sure this is the correct thing to do. The principle behind a CDK app (and CloudFormation, for that matter) is that it represents the full state of your infrastructure. Any out-of-band change should be disregarded in the next deploy of the app.

Leaving it up for discussion for the moment.

@otaviomacedo otaviomacedo added p1 effort/small Small work item – less than a day of effort and removed needs-triage This issue or PR still needs to be triaged. labels Mar 4, 2022
@otaviomacedo otaviomacedo removed their assignment Mar 4, 2022
@otaviomacedo otaviomacedo added effort/medium Medium work item – several days of effort and removed effort/small Small work item – less than a day of effort labels Mar 4, 2022
@IsaacLeeWebDev
Copy link
Author

IsaacLeeWebDev commented Mar 4, 2022

@otaviomacedo For what it's worth, for security and governance reasons, it is not viable for us to have a single tool or a single git repo that describes the desired state of all of our infrastructure.

It may also be worth noting that this bug potentially impacts those that use the approach recommended in #13153

@IsaacLeeWebDev
Copy link
Author

IsaacLeeWebDev commented Mar 4, 2022

As a workaround, we're currently pushing up the role ARN of the PodExecutionRole defined in another CDK app as a CfnOutput, fetching that output where the cluster is defined, and then explicitly calling Cluster.awsAuth.addRoleMapping like so:

myCluster.awsAuth.addRoleMapping(
  iam.Role.fromRoleArn(this, `${somethingUnique}-arn-ref`, roleArnFromCfnOutput),
  {
    username: 'system:node:{{SessionName}}',
    groups: ['system:bootstrappers', 'system:nodes', 'system:node-proxier'],
  },
);

This is less than ideal because the CDK apps are still tightly coupled to one another... albeit, less so.

@jordangullen
Copy link

jordangullen commented Apr 4, 2022

Chiming in that we are also facing the same issue following the recommended approach in #13153

I agree with @IsaacLeeWebDev that it's not viable to manage all of our desired infrastructure state in the one place.
It would be ideal to manage our Fargate Cluster in one CDK app and to create a Fargate Profile and PodExecutionRole in a separate CDK app per microservice.

While I agree with the principle of CDK (and CloudFormation) it is possible to apply resources to existing clusters in other CDK apps, does this use case not violate those same principles or is there a difference here I'm missing?

I would add that this entire issue caught me way off guard; in the documentation it wasn't clear to me the profiles would suffer different limitations to adding any other resources into the cluster. While this issue remains perhaps a note added to the EKS construct documentation would help make this behaviour clear?

Thanks @IsaacLeeWebDev for the overview on the issue and your workaround. I'll likely be looking at applying a similar workaround until this is resolved.

@jordangullen
Copy link

Perhaps an alternative approach that may be more in line with the principles above is to make changes to the AWS IAM Authenticator.

If patching the aws-auth ConfigMap from various sources is unacceptable perhaps many aws-auth ConfigMaps can be added with different names and discovered via appropriate selectors/annotations. Then each FargateProfile creates its own ConfigMap for the mappings.

Keen to hear thoughts.

@jordangullen
Copy link

Hi @otaviomacedo

Has there been any updates or thoughts on this issue?

@iliapolo iliapolo changed the title (aws-eks): #7981 was not fixed by #8447 – aws-auth ConfigMap is still being replaced (aws-eks): aws-auth ConfigMap is still being replaced May 16, 2023
@iliapolo
Copy link
Contributor

As noted here and in the related issues, the problem is that the aws-auth config map is a global singleton shared resource per cluster. This makes it different from regular workload manifests, which are normally scoped to a specific microservice. This is also why ICluster doesn't have the .awsAuth property, nor does it allow defining a fargate profile (which would require the config map).

In general, the CDK is not designed to support managing resources from both within the CDK app and external to it (for example kubectl). This scenario is different though because is presents a conflict between two CDK constructs, even within the same app. One being the Cluster resource, and one being the CfnFargateProfile construct.

@jordangullen your suggestion around the AWS IAM authenticator seems interesting and sounds like the right path forward.

I'm going to mark this as a feature request because technically there is no unexpected behavior here. We are keeping track of it but currently we have no concrete plans to address it.

We use +1s on this issue to help prioritize our work, and are happy to re-evaluate the prioritization of this issue based on community feedback. You can reach out to the cdk.dev community on Slack to solicit support for reprioritization.

If anyone is looking to submit a PR here, please contact us before starting any work because there are few important implementation details needed to be discussed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service effort/medium Medium work item – several days of effort feature-request A feature should be added or improved. p1
Projects
None yet
Development

No branches or pull requests

4 participants