EKS IRSA role assumption fails silently due to dependency issue #3555

Closed
@d-t-w

Description

Describe the bug

Kpow is an enterprise toolkit for Apache Kafka that depends on several AWS libraries, including LicenseManager, MSK IAM Auth, and AWS Glue.

Including the AWS Glue dependency silently breaks IRSA, causing the pod to run under the NodeInstanceRole rather than the properly configured IRSA role.

This is likely to impact projects intending to use IRSA with MSK, as the MSK IAM library and AWS Glue libraries are very likely to be included in those projects. See: aws/aws-msk-iam-auth#55

Isolating this error required a full deploy into an IRSA-enabled EKS environment with debug logging turned on.

Expected Behavior

Before the AWS Glue dependency was added, Kpow assumed its IRSA role correctly.

We expect to add the AWS Glue dependency to Kpow without breaking IRSA.

Current Behavior

After the AWS Glue dependency was added, Kpow fell back to operating under the EKS Node Instance role:

01:30:45.217 ERROR [main] instruct.system – [:instruct.system/init :kafka/primary-cluster] instruction failed
software.amazon.awssdk.services.licensemanager.model.AuthorizationException: User: arn:aws:sts::489728315157:assumed-role/eksctl-awsmp-kpow-example-nodegro-NodeInstanceRole-RF0DW6JPCQ07/i-0dd68413a10f85f5c is not authorized to perform: license-manager:CheckoutLicense because no identity-based policy allows the license-manager:CheckoutLicense action (Service: LicenseManager, Status Code: 400, Request ID: c2546bfe-6a8e-4d0f-a635-36d07ddacad2)

Turning on debug logging shows that the WebIdentityTokenCredentialsProvider fails to load credentials because of an error resolving the HTTP implementation, so the credentials provider chain moves on to the next provider:

01:30:44.137 DEBUG [main] s.a.a.a.c.AwsCredentialsProviderChain – Unable to load credentials from WebIdentityTokenCredentialsProvider(): Multiple HTTP implementations were found on the classpath. To avoid non-deterministic loading implementations, please explicitly provide an HTTP client via the client builders, set the software.amazon.awssdk.http.service.impl system property with the FQCN of the HTTP service to use as the default, or remove all but one HTTP implementation from the classpath
software.amazon.awssdk.core.exception.SdkClientException: Multiple HTTP implementations were found on the classpath. To avoid non-deterministic loading implementations, please explicitly provide an HTTP client via the client builders, set the software.amazon.awssdk.http.service.impl system property with the FQCN of the HTTP service to use as the default, or remove all but one HTTP implementation from the classpath
        at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:102)
        at software.amazon.awssdk.core.internal.http.loader.ClasspathSdkHttpServiceProvider.loadService(ClasspathSdkHttpServiceProvider.java:62)

The error is caused by conflicting HTTP implementations brought in by:

LicenseManager or MSK IAM Auth: ApacheHttpClient
AWS Glue: URLConnection
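
The conflict can be checked before deploying. The sketch below assumes the SDK discovers HTTP implementations through the standard `java.util.ServiceLoader` mechanism, which means each implementation jar ships a `META-INF/services/software.amazon.awssdk.http.SdkHttpService` file. The class and method names (`HttpImplScan`, `findImplementations`) are hypothetical and not part of the SDK. It lists every implementation registered on a class loader, so more than one entry would indicate the situation described above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of the AWS SDK.
class HttpImplScan {
    // Assumption: provider-configuration file read by ServiceLoader for the SDK's SdkHttpService interface.
    static final String SERVICE_FILE =
            "META-INF/services/software.amazon.awssdk.http.SdkHttpService";

    // Returns the implementation class names registered in every copy of SERVICE_FILE
    // visible to the given class loader.
    static List<String> findImplementations(ClassLoader loader) {
        List<String> found = new ArrayList<>();
        try {
            for (URL url : Collections.list(loader.getResources(SERVICE_FILE))) {
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        // Provider files allow '#' comments and blank lines.
                        int hash = line.indexOf('#');
                        if (hash >= 0) {
                            line = line.substring(0, hash);
                        }
                        line = line.trim();
                        if (!line.isEmpty()) {
                            found.add(line);
                        }
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> impls =
                findImplementations(Thread.currentThread().getContextClassLoader());
        impls.forEach(System.out::println);
        if (impls.size() > 1) {
            System.out.println("More than one HTTP implementation is registered on the classpath.");
        }
    }
}
```

Running `main` with the application's full classpath (for example from the same uberjar that is deployed to EKS) should show whether both the Apache and URLConnection services are present.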

Reproduction Steps

See minimum viable reproducer here: https://github.com/factorhouse/aws-irsa-deps-reproducer

Possible Solution

Manually set the HTTP client implementation before any AWS SDK client or credentials provider is created:

(System/setProperty "software.amazon.awssdk.http.service.impl" "software.amazon.awssdk.http.apache.ApacheSdkHttpService")

It is not clear how this setting affects each of the libraries, but Glue, LicenseManager, and MSK all appear to work with it, and the IRSA role is assumed again.
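
For Java projects hitting the same problem, the equivalent of the Clojure call above might look like the following sketch. The property name and service class name are taken from this issue; the class and method names (`PinHttpImpl`, `pinApacheHttpService`) are hypothetical. It assumes the property is read when the SDK first needs a default HTTP client, so it should run before any SDK client is built. Leaving an existing value untouched is a design choice made here so that an operator-supplied `-Dsoftware.amazon.awssdk.http.service.impl=...` JVM flag still wins.

```java
// Hypothetical helper, not part of the AWS SDK.
class PinHttpImpl {
    // Property name and value as quoted in this issue.
    static final String PROPERTY = "software.amazon.awssdk.http.service.impl";
    static final String APACHE_SERVICE =
            "software.amazon.awssdk.http.apache.ApacheSdkHttpService";

    // Selects the Apache HTTP service as the default unless a value was already supplied.
    static void pinApacheHttpService() {
        if (System.getProperty(PROPERTY) == null) {
            System.setProperty(PROPERTY, APACHE_SERVICE);
        }
    }

    public static void main(String[] args) {
        // Call this first, before constructing any AWS SDK client or credentials provider.
        pinApacheHttpService();
        System.out.println(PROPERTY + "=" + System.getProperty(PROPERTY));
    }
}
```

The other remedies named in the SDK's error message, passing an HTTP client explicitly to each client builder or excluding all but one implementation from the classpath, avoid relying on a global system property.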

Additional Information/Context

No response

AWS Java SDK version used

2.18.20

JDK version used

openjdk 11.0.16 2022-07-19 (output of java --version)

Operating System and version

Mac OS

Labels

bug (This issue is a bug.), p3 (This is a minor priority issue)
