eks: After upgrading to 2.80, imported cluster resources fail to deploy #25938

Closed
dancmeyers opened this issue Jun 12, 2023 · 9 comments
Labels
@aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service bug This issue is a bug. duplicate This issue is a duplicate. effort/medium Medium work item – several days of effort p1

Comments

@dancmeyers

dancmeyers commented Jun 12, 2023

Describe the bug

This is a copy of #25835, because that issue was auto-resolved when a PR claiming to fix it was merged. However, the associated PR did not fix the issue; it merely updated the documentation. The issue is still present.


After upgrading CDK to 2.80, the EKS cluster creation role can no longer be assumed by the separate handlers in stacks that import the EKS cluster, as explained in #25674.

The suggested fix of importing the whole kubectlProvider results in a "Modifying service token is not allowed" error when trying to deploy the stack.

Expected Behavior

Kubernetes resources in the importing stack should still deploy in 2.80+ after adjusting the cluster import as described in #25674.

Current Behavior

The cdk8s chart in the importing stack fails to deploy after the upgrade.

StackNameRedacted: creating CloudFormation changeset...
2:20:09 PM | UPDATE_FAILED | Custom::AWSCDK-EKS-KubernetesResource | KubeClusterCommonK8sChartA8489958
Modifying service token is not allowed.

❌ StackNameRedacted failed: Error: The stack named StackNameRedacted failed to deploy: UPDATE_ROLLBACK_COMPLETE: Modifying service token is not allowed.
at FullCloudFormationDeployment.monitorDeployment (/workdir/node_modules/aws-cdk/lib/index.js:397:10236)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async Object.deployStack2 [as deployStack] (/workdir/node_modules/aws-cdk/lib/index.js:400:149977)
at async /workdir/node_modules/aws-cdk/lib/index.js:400:135508

❌ Deployment failed: Error: The stack named StackNameRedacted failed to deploy: UPDATE_ROLLBACK_COMPLETE: Modifying service token is not allowed.
at FullCloudFormationDeployment.monitorDeployment (/workdir/node_modules/aws-cdk/lib/index.js:397:10236)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async Object.deployStack2 [as deployStack] (/workdir/node_modules/aws-cdk/lib/index.js:400:149977)
at async /workdir/node_modules/aws-cdk/lib/index.js:400:135508

The stack named StackNameRedacted failed to deploy: UPDATE_ROLLBACK_COMPLETE: Modifying service token is not allowed.

Reproduction Steps

Export (app A, stack A)

const kubectlProvider = cluster.stack.node
  .findChild('@aws-cdk--aws-eks.KubectlProvider') as eks.KubectlProvider

new CfnOutput(scope, 'KubectlProviderRole', {
  exportName: 'KubectlRoleArn',
  value: kubectlProvider.roleArn,
});

new CfnOutput(scope, 'KubectlProviderHandlerRole', {
  exportName: 'KubectlHandlerRoleArn',
  value: kubectlProvider.handlerRole.roleArn,
});

const customResourceProvider = kubectlProvider.node.findChild('Provider') as custom_resource.Provider;

new CfnOutput(scope, 'KubectlServiceToken', {
  exportName: 'KubectlServiceToken',
  value: customResourceProvider.serviceToken,
});

N.B. I have verified that the value of the KubectlServiceToken output is the same as the ServiceToken value provided to the lambda in app/stack A if a manifest is added to the cluster there.
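
(For reference, one way to cross-check this at synth time is to inspect the exporting stack's template with the aws-cdk-lib/assertions module. A rough sketch, where stackA is a placeholder for the exporting stack:)

import { Template } from 'aws-cdk-lib/assertions';

// Inspect the synthesized template of the exporting stack (stack A).
const template = Template.fromStack(stackA);

// Each Custom::AWSCDK-EKS-KubernetesResource carries the kubectl provider's
// service token; it should resolve to the same value as the
// KubectlServiceToken output above.
const manifests = template.findResources('Custom::AWSCDK-EKS-KubernetesResource');
for (const [logicalId, resource] of Object.entries(manifests)) {
  console.log(logicalId, JSON.stringify(resource.Properties?.ServiceToken));
}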

Import (app B, stack B)

const kubectlProvider = eks.KubectlProvider.fromKubectlProviderAttributes(
  scope,
  'KubectlProvider',
  {
    functionArn: Fn.importValue('KubectlServiceToken'),
    handlerRole: iam.Role.fromRoleArn(
      scope,
      'HandlerRole',
      Fn.importValue('KubectlHandlerRoleArn')
    ),
    kubectlRoleArn: Fn.importValue('KubectlRoleArn'),
  }
);

const openIdConnectProvider = iam.OpenIdConnectProvider.fromOpenIdConnectProviderArn(
  scope,
  'KubeOidcProvider',
  Fn.importValue('KubernetesOidcProviderArn')
);

const kubectlSecurityGroupId = Fn.importValue('KubernetesControlPlaneSGId');
return eks.Cluster.fromClusterAttributes(scope, 'KubeCluster', {
  clusterName,
  kubectlPrivateSubnetIds: Fn.split(',', Fn.importValue('KubernetesPrivateSubnetIds')),
  kubectlSecurityGroupId: kubectlSecurityGroupId,
  clusterSecurityGroupId: kubectlSecurityGroupId,
  vpc,
  openIdConnectProvider,
  kubectlLayer: new KubectlV25Layer(scope, 'kubectl'),
  kubectlProvider: kubectlProvider,
});

Then trying to deploy the (already existing) stack B fails with the error above. The service token has changed because it now references the ARN of a function in stack A, instead of the function in stack B that it previously referenced.

Possible Solution

No response

Additional Information/Context

Both of the stacks have been in use for a long time, and the only change to the cluster import was to replace kubectlRoleArn with kubectlProvider. They are part of bigger CDK apps, and this problem affects multiple stacks in multiple importing apps.

CDK CLI Version

2.81.0 (build bd920f2)

Framework Version

No response

Node.js Version

v18.14.2

OS

Linux

Language

Typescript

Language Version

TypeScript (3.9.10)

Other information

Came across the following earlier issue with similar symptoms from back when eks was experimental in CDK: #6129.

@dancmeyers dancmeyers added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Jun 12, 2023
@github-actions github-actions bot added the @aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service label Jun 12, 2023
@dancmeyers
Author

dancmeyers commented Jun 12, 2023

As indicated in the original issue (which was labelled as P1), the problem is that the service token has changed. In the old (<2.80) style of cluster import, an entirely new lambda is created, with a service token from the current stack:

const cluster = Cluster.fromClusterAttributes(this, 'ImportedCluster', {
  clusterName: 'TempCluster',
  clusterSecurityGroupId: sgId,
  vpc,
  openIdConnectProvider,
  kubectlLayer: new KubectlV25Layer(this, 'kubectl'),
  kubectlRoleArn: Fn.importValue('TempKubectlRole')
});

In the new (>=2.80) cluster import, the original provider from the source stack is referenced, including the original lambda function. As such, the service token does change if you are migrating from <2.80 to >=2.80, since the token is the lambda ARN:

const kubectlProvider = KubectlProvider.fromKubectlProviderAttributes(
  this,
  'KubectlProvider',
  {
    functionArn: Fn.importValue('TempKubectlHandler'),
    handlerRole: Role.fromRoleArn(
      this,
      'HandlerRole',
      Fn.importValue('TempKubectlLambdaRole')
    ),
    kubectlRoleArn: Fn.importValue('TempKubectlRole'),
  }
);

const cluster = Cluster.fromClusterAttributes(this, 'ImportedCluster', {
  clusterName: 'TempCluster',
  clusterSecurityGroupId: sgId,
  vpc,
  openIdConnectProvider,
  kubectlLayer: new KubectlV25Layer(this, 'kubectl'),
  kubectlProvider: kubectlProvider,
});

Changing the ID of the imported cluster gets past this error, and CDK then attempts to create resources on EKS via the provided manifests. But because the cluster is a 'new' resource, it thinks the manifests need to be freshly created, when in fact they already exist and must be updated, so you get errors from K8s about the resources already existing. And presumably even if you could get past that, you would have problems with CloudFormation then issuing a delete for the 'old' resources, which should not actually be deleted because they have just been updated.
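
(To make the "changing the ID" point concrete, a minimal sketch of what that rename looks like, reusing the variables from the block above; the ID 'ImportedCluster2' is hypothetical:)

// Renaming the construct ID changes the logical IDs of the
// Custom::AWSCDK-EKS-KubernetesResource resources defined under it,
// so CloudFormation treats the existing manifests as brand-new resources
// instead of updates.
const cluster = Cluster.fromClusterAttributes(this, 'ImportedCluster2', {
  clusterName: 'TempCluster',
  clusterSecurityGroupId: sgId,
  vpc,
  openIdConnectProvider,
  kubectlLayer: new KubectlV25Layer(this, 'kubectl'),
  kubectlProvider: kubectlProvider,
});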

@dancmeyers
Author

N.B. the original issue was listed as a P1, so presumably this one should be too.

@pahud pahud self-assigned this Jun 12, 2023
@pahud pahud added the investigating This issue is being investigated and/or work is in progress to resolve the issue. label Jun 12, 2023
@pahud
Contributor

pahud commented Jun 12, 2023

Looking into this.

@pahud pahud added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. p2 effort/medium Medium work item – several days of effort and removed investigating This issue is being investigated and/or work is in progress to resolve the issue. needs-triage This issue or PR still needs to be triaged. labels Jun 12, 2023
@dancmeyers
Author

dancmeyers commented Jun 12, 2023

I'm not sure where the response went. I got the email, but I was expecting to see it as a comment here as well, and I can't find it.

Is there any way to lock this down with tags? I was trying to work out how to do that: the idea was to edit the kubectlRole trust policy to allow assumption if the principal doing the assuming has a specific tag, and then set that tag on the roles the lambdas in other stacks run as. The problem is that I've mostly used tags for identification before, not access control, so I wasn't sure of the logic in IAM there, let alone how to translate it into CFn and/or CDK. Especially as there's role assumption involved in executing the imported stack's lambda anyway, so the principal is an assumed-role, not a role (I think, from what I was seeing in the error messages).
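
(For what it's worth, the tag-based idea described here would roughly translate into a trust-policy condition on aws:PrincipalTag plus a tag on each handler role. A hedged sketch, not a confirmed workaround from this thread; the tag key kubectl-access and the kubectlRole / handlerRole references are assumptions:)

import { Stack, Tags } from 'aws-cdk-lib';
import * as iam from 'aws-cdk-lib/aws-iam';

// Hypothetical: in the cluster stack, allow assumption only when the calling
// principal carries the kubectl-access=true tag, instead of listing role ARNs.
kubectlRole.assumeRolePolicy?.addStatements(new iam.PolicyStatement({
  actions: ['sts:AssumeRole'],
  principals: [new iam.AccountPrincipal(Stack.of(kubectlRole).account)],
  conditions: {
    StringEquals: { 'aws:PrincipalTag/kubectl-access': 'true' },
  },
}));

// Hypothetical: in each importing stack, tag the kubectl handler's execution
// role; role tags become principal tags for its assumed-role sessions.
Tags.of(handlerRole).add('kubectl-access', 'true');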

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Jun 12, 2023
@pahud
Contributor

pahud commented Jun 12, 2023

Hi @dancmeyers

I just deleted the previous response as I was still thinking about whether we can streamline the experience.

This is my workaround.

Suppose we create this with 2.79.0:

lib.ts

export class EksStack extends Stack {
  readonly clusterName: string;
  readonly vpc: ec2.IVpc;
  readonly KubectlRoleArn: string;
  readonly kubectlProviderHandlerRoleArn: string;
  readonly crProviderServiceToken: string;

  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // create a new cluster
    const vpc = getOrCreateVpc(this);
    const kubectlLayer = new KubectlLayer(this, 'KubectlLayer');
    const cluster = new eks.Cluster(this, 'EksCluster', {
      vpc,
      version: eks.KubernetesVersion.V1_26,
      kubectlLayer,
      defaultCapacity: 0,
    });

    this.clusterName = cluster.clusterName;
    this.vpc = vpc;
    this.KubectlRoleArn = cluster.kubectlRole!.roleArn;

    const kubectlProvider = cluster.stack.node.tryFindChild('@aws-cdk--aws-eks.KubectlProvider') as eks.KubectlProvider;
    this.kubectlProviderHandlerRoleArn = kubectlProvider.handlerRole.roleArn;
    this.crProviderServiceToken = (kubectlProvider.node.tryFindChild('Provider') as cr.Provider).serviceToken;

    // EKS service role
    new CfnOutput(this, 'ClusterRole', { value: cluster.role.roleArn });
    // EKS cluster creation role
    new CfnOutput(this, 'ClusterAdminRole', { value: cluster.adminRole.roleArn });
    // Kubectl Role
    new CfnOutput(this, 'KubectlRole', { value: cluster.kubectlRole!.roleArn });
  }
}

export interface EksImportStackProps extends StackProps {
  readonly clusterName: string;
  readonly vpc: ec2.IVpc;
  readonly kubectlRoleArn: string;
}

export class EksImportStack extends Stack {
  constructor(scope: Construct, id: string, props: EksImportStackProps) {
    super(scope, id, props);

    const vpc = props.vpc;

    const importedCluster = eks.Cluster.fromClusterAttributes(this, 'ImportedCluster', {
      vpc,
      clusterName: props.clusterName,
      kubectlRoleArn: props.kubectlRoleArn,
    });

    importedCluster.addManifest('nginx-namespace', {
      apiVersion: 'v1',
      kind: 'Namespace',
      metadata: {
        name: 'nginx',
      },
    });
  }
}

And the app.ts

const eksStack = new EksStack(app, 'eks-cluster', { env })
new EksImportStack(app, 'import-eks', { 
    env,
    clusterName: eksStack.clusterName,
    kubectlRoleArn: eksStack.KubectlRoleArn,
    vpc: eksStack.vpc,
});

After deployment, we will have a nested stack as the KubectlProvider of the import-eks stack.

Now, upgrade to CDK 2.83.1, which is the latest version as of today.

re-deploy the main stack:

npx cdk diff eks-cluster
npx cdk deploy eks-cluster

update the import-eks stack as below:

export class EksImportStack extends Stack {
  constructor(scope: Construct, id: string, props: EksImportStackProps) {
    super(scope, id, props);

    const vpc = props.vpc;

    const importedCluster = eks.Cluster.fromClusterAttributes(this, 'ImportedCluster', {
      vpc,
      clusterName: props.clusterName,
      kubectlRoleArn: props.kubectlRoleArn,
      // kubectlProvider,
    });

    importedCluster.addManifest('nginx-namespace', {
      apiVersion: 'v1',
      kind: 'Namespace',
      metadata: {
        name: 'nginx',
      },
    });

    // get the kubectlProvider from this stack
    const kubectlProvider = Stack.of(this).node.findChild(`${Names.nodeUniqueId(importedCluster.node)}-KubectlProvider`) as eks.KubectlProvider;
    // Print the kubectlProvider handler role arn. This arn should be added in the creation role trust policy.
    new CfnOutput(this, 'ImportedKubectlProviderHandlerRole', { value: kubectlProvider.handlerRole.roleArn })
  }
}

re-deploy it and you should get the ImportedKubectlProviderHandlerRole ARN in the output, which should be added to the trust policy of the cluster creation role.

Now let's add this to the eks-cluster stack:

cluster.adminRole.assumeRolePolicy?.addStatements(new iam.PolicyStatement({
  actions: ['sts:AssumeRole'],
  principals: [
    new iam.ArnPrincipal(YOUR_ImportedKubectlProviderHandlerRole_ARN),
  ],
}));

re-deploy it to add ImportedKubectlProviderHandlerRole into the trust policy of the creation role.

Now, let's add an additional manifest to the imported cluster stack.

For example:

importedCluster.addManifest('nginx-namespace3', {
  apiVersion: 'v1',
  kind: 'Namespace',
  metadata: {
    name: 'nginx3',
  },
});

re-deploy the import-eks stack. It should work!

Please note that in this sample we are not specifying kubectlProvider in eks.Cluster.fromClusterAttributes(); we just add the KubectlProviderHandlerRole to the trust policy of the cluster creation role.

Let me know if it works for you.

@pahud pahud added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Jun 12, 2023
@pahud
Contributor

pahud commented Jun 12, 2023

Another option is to reuse the KubectlProviderHandlerRole from the cluster stack in the import stack.

Check out this sample:

export class EksStack extends Stack {
  readonly clusterName: string;
  readonly vpc: ec2.IVpc;
  readonly KubectlRoleArn: string;
  readonly kubectlProviderHandlerRoleArn: string;

  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // create a new cluster
    const vpc = getOrCreateVpc(this);
    const kubectlLayer = new KubectlLayer(this, 'KubectlLayer');
    const cluster = new eks.Cluster(this, 'EksCluster', {
      vpc,
      version: eks.KubernetesVersion.V1_26,
      kubectlLayer,
      defaultCapacity: 0,
    });

    this.clusterName = cluster.clusterName;
    this.vpc = vpc;
    this.KubectlRoleArn = cluster.kubectlRole!.roleArn;

    const kubectlProvider = cluster.stack.node.tryFindChild('@aws-cdk--aws-eks.KubectlProvider') as eks.KubectlProvider;
    this.kubectlProviderHandlerRoleArn = kubectlProvider.handlerRole.roleArn;

    // EKS service role
    new CfnOutput(this, 'ClusterRole', { value: cluster.role.roleArn });
    // EKS cluster creation role
    new CfnOutput(this, 'ClusterAdminRole', { value: cluster.adminRole.roleArn });
    // Kubectl Role
    new CfnOutput(this, 'KubectlRole', { value: cluster.kubectlRole!.roleArn });
    // KubectlProvider Handler Role
    new CfnOutput(this, 'KubectlProviderHandlerRole', { value: kubectlProvider.handlerRole.roleArn });
  }
}

export interface EksImportStackProps extends StackProps {
  readonly clusterName: string;
  readonly vpc: ec2.IVpc;
  readonly kubectlRoleArn: string;
  readonly kubectlProviderHandlerRole: string;
}

export class EksImportStack extends Stack {
  constructor(scope: Construct, id: string, props: EksImportStackProps) {
    super(scope, id, props);

    const vpc = props.vpc;

    const importedCluster = eks.Cluster.fromClusterAttributes(this, 'ImportedCluster', {
      vpc,
      clusterName: props.clusterName,
      kubectlRoleArn: props.kubectlRoleArn,
      // kubectlProvider,
    });

    importedCluster.addManifest('nginx-namespace', {
      apiVersion: 'v1',
      kind: 'Namespace',
      metadata: {
        name: 'nginx',
      },
    });

    // get the kubectlProvider from this stack
    const kubectlProvider = Stack.of(this).node.findChild(`${Names.nodeUniqueId(importedCluster.node)}-KubectlProvider`) as eks.KubectlProvider;
    const kubectlHandler = kubectlProvider.node.findChild('Handler') as lambda.Function;
    (kubectlHandler.node.defaultChild as lambda.CfnFunction).addPropertyOverride('Role', props.kubectlProviderHandlerRole)

    importedCluster.addManifest('nginx-namespace4', {
      apiVersion: 'v1',
      kind: 'Namespace',
      metadata: {
        name: 'nginx4',
      },
    });
  }
}

This essentially overrides the KubectlProviderHandlerRole in the import stack with the one from the cluster stack. This works as well.

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Jun 13, 2023
@dancmeyers
Author

I can't get this to work:

const kubectlProvider = Stack.of(this).node.findChild(`${Names.nodeUniqueId(importedCluster.node)}-KubectlProvider`) as eks.KubectlProvider;

It says there's no node with that ID. It can't even find a node with the ID Names.nodeUniqueId(importedCluster.node). It can find a node with the ID ImportedCluster (the plain string we/you set as the ID for the importedCluster object), but then it can't find ImportedCluster-KubectlProvider. I know there must be a provider generated somewhere, because we manage to update manifests using it, but looking at the code I can't see where it is created. I only see that for 'direct' clusters...
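
(One way to see which IDs actually exist is to dump the construct tree of the importing stack; a small debugging sketch:)

// List every construct path in the stack; if an auto-created KubectlProvider
// exists, its actual ID will show up here.
for (const construct of Stack.of(this).node.findAll()) {
  console.log(construct.node.path);
}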

@iliapolo iliapolo added p1 duplicate This issue is a duplicate. and removed p2 labels Jun 13, 2023
@iliapolo
Contributor

Closing in favor of #25835 (which was re-opened)

