EKS IAM Roles for Service Accounts (Pods) #23

Closed
pauncejones opened this issue Dec 5, 2018 · 108 comments

Comments

@pauncejones
Contributor

@pauncejones pauncejones commented Dec 5, 2018

Update 1/9/19:

After talking about this internally, we've been working on a proposed solution for this. Below is a writeup on what we're thinking, and we've included some example scripts so you can get a feel for how we expect this to work.

Our plan for IAM and Kubernetes integration

A recent Kubernetes feature, TokenRequestProjection, allows users of Kubernetes to mount custom projected service account tokens in their pods. A projected service account token is a bearer token that is intended for use outside of the cluster. Conveniently, these projected service account tokens are also valid OpenID Connect (OIDC) tokens. AWS IAM has supported OIDC as a federated identity provider since 2014, which has allowed customers to use an external identity to assume an IAM role.

By combining these two features, an application running in a pod can pass the projected service account token along with a role ARN to the STS API AssumeRoleWithWebIdentity, and get back temporary role credentials! In order for this to work properly, there is some setup required to create an OIDC provider, and update an IAM role's trust policy so that the Kubernetes service account for a particular cluster is permitted to assume the role.
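
To make the exchange described above concrete, here is a minimal Python sketch of the call an application (or SDK) would make. It assumes the global STS endpoint and the standard query parameters of AssumeRoleWithWebIdentity, which is an unsigned call; the function names and the default session name are hypothetical, and the XML parsing is deliberately namespace-agnostic rather than a faithful model of the response schema.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

STS_ENDPOINT = "https://sts.amazonaws.com/"  # assumption: global STS endpoint


def build_request_params(role_arn, token, session_name="pod-session"):
    """Return the query parameters for an AssumeRoleWithWebIdentity call."""
    return {
        "Action": "AssumeRoleWithWebIdentity",
        "Version": "2011-06-15",
        "RoleArn": role_arn,
        "RoleSessionName": session_name,  # hypothetical default name
        "WebIdentityToken": token,
    }


def parse_credentials(xml_body):
    """Extract the children of the <Credentials> element, ignoring namespaces."""
    root = ET.fromstring(xml_body)
    for elem in root.iter():
        if elem.tag.rsplit("}", 1)[-1] == "Credentials":
            return {c.tag.rsplit("}", 1)[-1]: c.text for c in elem}
    raise ValueError("no Credentials element in response")


def assume_role_with_web_identity(role_arn, token_path):
    """Read the projected token from disk and trade it for role credentials."""
    with open(token_path) as f:
        token = f.read().strip()
    data = urllib.parse.urlencode(build_request_params(role_arn, token)).encode()
    # No request signing is needed: the OIDC token itself is the proof of identity.
    with urllib.request.urlopen(STS_ENDPOINT, data=data) as resp:
        return parse_credentials(resp.read())
```

The call only succeeds once the OIDC provider exists in IAM and the role's trust policy allows the service account, which is the setup the paragraph above refers to.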

Some of the advantages to this approach are that any pod (including host pods) can assume a role, there is not a reliance on Kubernetes annotations for security, there are not any extra processes that need to be run on nodes, and you will be able to have nodes without any IAM permissions of their own.

In the coming months we will be building out functionality in EKS to create and manage OIDC providers for EKS clusters, as well as configuring IAM roles that can be used in an EKS cluster. We will also be adding support for this authentication mechanism in the AWS SDKs.

Totally open for comments, questions or suggestions on this -- let us know in the comments!

Micah Hausler (@micahhausler), System Development Engineer on EKS

@pauncejones pauncejones created this issue from a note in containers-roadmap (We're Working On It) Dec 5, 2018
@pauncejones pauncejones added the EKS label Dec 5, 2018
@christopherhein

@christopherhein christopherhein commented Dec 13, 2018

Exciting to see this get so much attention. Here is an implementation that was brought up in sig-aws back in July of this year; if you are interested, your feedback will help guide the implementation. kubernetes/community#2329

We'll publish more about our approach soon.

👍

@gtaylor

@gtaylor gtaylor commented Dec 13, 2018

Ahh, I was looking for that.

Will that KEP eventually be moved to https://github.com/kubernetes/enhancements ? It looks like kubernetes/community#2329 was closed due to KEPs being moved out to k/enhancements. Seems to have halted discussion and consideration.

@christopherhein

@christopherhein christopherhein commented Dec 13, 2018

@gtaylor that was actually incorrect. Sorry about that. That was another implementation from the community. We'll have more details about our implementation coming out soon. Sorry for the confusion.

@cpaika

@cpaika cpaika commented Dec 20, 2018

Big fan of this - our organization can't adopt EKS until this is resolved.

@sbkg0002

@sbkg0002 sbkg0002 commented Dec 23, 2018

Same here, glad this is shared upfront.

@gtaylor

@gtaylor gtaylor commented Dec 23, 2018

@007 kube2iam cannot handle rapid pod churn and lacks some controls for selectively limiting metadata server exposure. It is not a complete, final solution to this problem.

Source: have used kube2iam in production at a large scale.

@Vlaaaaaaad

@Vlaaaaaaad Vlaaaaaaad commented Dec 23, 2018

@gtaylor : did you try kiam too? Did you find a workaround for the rapid pod churn issues?

I'm in the process of implementing some very spiky workloads and I'm trying to prepare the best I can.

@gtaylor

@gtaylor gtaylor commented Dec 23, 2018

I think we are going to stick it out for the "final" solution (the one this issue is tracking).

We had looked at kiam but aren't hurting badly enough to the point of having to make such a large change (for us). That might change, though. Kiam is probably where we'll go if we end up in a spot where kube2iam becomes untenable.

@oulydna

@oulydna oulydna commented Jan 9, 2019

my EKS friends, any rough ETA on this one?

@micahhausler
Member

@micahhausler micahhausler commented Jan 9, 2019

@realAndyLuo "Working On It" https://github.com/aws/containers-roadmap/projects/1 :)

@oulydna

@oulydna oulydna commented Jan 9, 2019

thanks @micahhausler . Does "Working On it" come with any target date? or too much a spoiler to ask for

@skyzyx

@skyzyx skyzyx commented Jan 9, 2019

@realAndyLuo: Never. As a former Amazonian, I can tell you that it'll be ready when it's ready. "Working on it" is as close as you'll ever get to a time commitment.

Cheers. 👍

@mikkeloscar

@mikkeloscar mikkeloscar commented Jan 9, 2019

I have been working on a replacement for kube2iam/kiam in the form of https://github.com/mikkeloscar/kube-aws-iam-controller. So far it has focused only on robustness and doesn't have features to restrict which roles you can request within a cluster (there are open issues for that). It also only works with some of the AWS SDKs, but it eliminates all the race conditions that are inherent in the design of kube2iam and kiam.

Maybe it's interesting for some of you.

@christopherhein

@christopherhein christopherhein commented Jan 10, 2019

Updated description by @micahhausler

cc @gtaylor @cpaika @sbkg0002 @realAndyLuo @007 @mikkeloscar

@cullenmcdermott

@cullenmcdermott cullenmcdermott commented Jan 11, 2019

The new proposal looks interesting. Quick question though, how would I get/distribute the tokens? Would each token map to one role in IAM?

@mikkeloscar

@mikkeloscar mikkeloscar commented Jan 11, 2019

By combining these two features, an application running in a pod can pass the projected service account token along with a role ARN to the STS API AssumeRoleWithWebIdentity, and get back temporary role credentials! In order for this to work properly, there is some setup required to create an OIDC provider, and update an IAM role's trust policy so that the Kubernetes service account for a particular cluster is permitted to assume the role.

Does this mean that applications have to actively implement this, or would the AWS SDK automatically do it? What I wanted to avoid with https://github.com/mikkeloscar/kube-aws-iam-controller is applications needing to implement a custom SDK setup for running on Kubernetes. It should just work out of the box whether you run the application on bare EC2, on Kubernetes, or in any other AWS-like environment, IMO. If this is not the case, then there will be a long tail of open source applications that need to be updated to support this.

@micahhausler
Member

@micahhausler micahhausler commented Jan 11, 2019

@cullenmcdermott

The new proposal looks interesting. Quick question though, how would I get/distribute the tokens? Would each token map to one role in IAM?

Projected service account tokens are issued via the API server, and mounted via the kubelet. You can add a projected token today on newer versions of Kubernetes by using the projected volume type.

kind: Pod
apiVersion: v1
metadata: 
  name: pod-name
  namespace: default
spec:
  serviceAccountName: default
  containers: 
  - name: container-name
    image: container-image:version
    volumeMounts:
    - mountPath: "/var/run/secrets/something/serviceaccount/"
      name: projected-token
  volumes:
  - name: projected-token
    projected:
      sources:
      - serviceAccountToken:
          audience: "client-id"
          expirationSeconds: 86400
          path: token 
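
Since the projected token is an OIDC token (a JWT), its claims can be inspected from inside the pod. The following is a small Python sketch, not part of the proposal, that decodes the payload without verifying the signature; the function name is hypothetical, and the token path in the usage note simply combines the mountPath and path values from the pod spec above.

```python
import base64
import json


def decode_claims(jwt):
    """Return the claims of a JWT without verifying its signature.

    Useful for debugging only: never trust unverified claims for authorization.
    """
    payload = jwt.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))
```

For example, calling `decode_claims(open("/var/run/secrets/something/serviceaccount/token").read())` should show an `aud` claim matching the configured audience and a `sub` claim of the form `system:serviceaccount:NAMESPACE:NAME`.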

The thinking right now is you would add an annotation to either the ServiceAccount or the Pod (not totally decided yet) with the IAM role ARN, and the token volume, volumeMount, and required AWS environment variables (variable names TBD, but the SDKs will need a role ARN and token path) would get added via a mutating webhook.

On a high level the user workflow would look like this:

  • Create an EKS cluster, OIDC identity provider gets created in IAM for the cluster automatically
  • User whitelists a specific ServiceAccount namespace/name for a specific cluster to assume the preexisting IAM role, which updates the role's trust policy (similar to this, but we'll make it easier than editing the JSON yourself)
  • User annotates ServiceAccount with the IAM role ARN
  • All pods using that service account get the projected volume and environment variables added by the webhook
  • Updated AWS SDKs running inside the pod know to look for env vars specifying the role and OIDC token path.
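
The webhook step in the workflow above can be sketched as a pure function over a pod manifest. This is an illustration under assumptions, not the EKS implementation: the environment variable names and token directory are the ones mentioned later in this thread, while the volume name, the audience value, and the function name are hypothetical.

```python
import copy

# Token directory and env var names as they appear later in this thread.
TOKEN_DIR = "/var/run/secrets/eks.amazonaws.com/serviceaccount"
VOLUME_NAME = "aws-iam-token"   # hypothetical volume name
AUDIENCE = "sts.amazonaws.com"  # assumed audience value


def inject_iam_role(pod, role_arn, expiration_seconds=86400):
    """Return a copy of the pod with the token volume, mount and env vars added."""
    pod = copy.deepcopy(pod)  # leave the caller's manifest untouched
    spec = pod["spec"]
    spec.setdefault("volumes", []).append({
        "name": VOLUME_NAME,
        "projected": {"sources": [{"serviceAccountToken": {
            "audience": AUDIENCE,
            "expirationSeconds": expiration_seconds,
            "path": "token",
        }}]},
    })
    for container in spec["containers"]:
        container.setdefault("volumeMounts", []).append(
            {"name": VOLUME_NAME, "mountPath": TOKEN_DIR, "readOnly": True})
        container.setdefault("env", []).extend([
            {"name": "AWS_ROLE_ARN", "value": role_arn},
            {"name": "AWS_WEB_IDENTITY_TOKEN_FILE", "value": TOKEN_DIR + "/token"},
        ])
    return pod
```

A real admission webhook would return this change as a JSON patch in its admission response; returning the mutated manifest keeps the sketch short.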

@mikkeloscar

Does this mean that applications have to actively implement this, or would the AWS SDK automatically do it?

It would be automatic with new versions of the SDK.

@pingles

@pingles pingles commented Jan 11, 2019

This sounds cool, we'll definitely be looking to adopt (I say that as one of the creators of https://github.com/uswitch/kiam) 😀 Glad to see this in the roadmap.

Given the SDK update requirement we'd probably have to run side-by-side for a while as all our teams update their apps and libs etc but sounds like that's doable too so all good to me. Thanks to the team there for thinking on it and not just taking the first suggestion!

@mustafaakin

@mustafaakin mustafaakin commented Jan 22, 2019

Would it be possible without upgrading all AWS SDKs? It would be nice if this part of the SDKs, at least for Java, were a separate component until we can upgrade.

@micahhausler
Member

@micahhausler micahhausler commented Jan 29, 2019

@mustafaakin for applications that couldn't transition right away, you could run a sidecar that would perform the sts:AssumeRoleWithWebIdentity call and expose those credentials on a localhost HTTP endpoint within the pod. You'd have to configure the application container to use the sidecar by setting the environment variable AWS_CONTAINER_CREDENTIALS_FULL_URI.
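
A rough Python sketch of such a sidecar follows. It assumes the JSON document shape used by the SDKs' container credential providers (AccessKeyId, SecretAccessKey, Token, Expiration); the port, function names, and the `get_credentials` callback (which would wrap the sts:AssumeRoleWithWebIdentity call and any caching) are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def credentials_document(creds):
    """Map STS-style credentials to the JSON shape container providers read."""
    return {
        "AccessKeyId": creds["AccessKeyId"],
        "SecretAccessKey": creds["SecretAccessKey"],
        "Token": creds["SessionToken"],
        "Expiration": creds["Expiration"],  # ISO 8601 timestamp from STS
    }


def make_handler(get_credentials):
    """Build a request handler that serves whatever get_credentials() returns."""
    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = json.dumps(credentials_document(get_credentials())).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
    return Handler


def serve(get_credentials, port=8080):
    """Listen on localhost only, so the endpoint is reachable just from the pod."""
    HTTPServer(("127.0.0.1", port), make_handler(get_credentials)).serve_forever()
```

With this running, the application container would set `AWS_CONTAINER_CREDENTIALS_FULL_URI=http://127.0.0.1:8080/` (the port being whatever the sidecar listens on).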

@gtaylor

@gtaylor gtaylor commented Jan 29, 2019

Does this also apply to boto/boto3?

@micahhausler
Member

@micahhausler micahhausler commented Jan 29, 2019

Yes, pretty much any SDK within the last 2 years would have AWS_CONTAINER_CREDENTIALS_FULL_URI support.

@mikkeloscar

@mikkeloscar mikkeloscar commented Jan 29, 2019

@mustafaakin for applications that couldn't transition right away, you could run a sidecar that would perform the sts:AssumeRoleWithWebIdentity call and expose those credentials on a localhost HTTP endpoint within the pod. You'd have to configure the application container to use the sidecar by setting the environment variable AWS_CONTAINER_CREDENTIALS_FULL_URI.

Isn't this just a recipe for race conditions? :) If your application container starts and requests the IAM role before the sidecar container has done assumeRole, then your application fails to get the credentials.

@micahhausler
Member

@micahhausler micahhausler commented Jan 29, 2019

@mikkeloscar You are right, but I would also say it depends on the implementation of the application. Most AWS SDKs have a retry for metadata credential fetching, and some applications may not initialize the AWS SDK at startup. For those that do and exit, Kubernetes should restart that container while still bringing the sidecar online. It is not the optimal solution, but for cases where a newer SDK update is not immediately available, it could work.
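
For applications that fetch credentials themselves, the startup race can be softened with a retry loop. The helper below is a generic exponential-backoff sketch in Python, not the retry logic of any AWS SDK; its name and default values are hypothetical.

```python
import time


def retry(fetch, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call fetch() until it succeeds, doubling the delay after each failure.

    Connection errors (including urllib's URLError) are subclasses of OSError,
    which is what a not-yet-listening sidecar would typically raise.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter exists only so the delay can be replaced in tests.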

@micahhausler
Member

@micahhausler micahhausler commented Sep 12, 2019

@MarcusNoble in the SDKs it’s up to the client to figure it out.

Here’s an example in Go of getting the root fingerprint

@marcincuber

@marcincuber marcincuber commented Sep 14, 2019

@micahhausler I have created OIDC providers for multiple EKS clusters. I also obtained the OIDC thumbprint separately for each provider. Am I correct in saying that the OIDC thumbprint is always the same for EKS?

@MarcusNoble

@MarcusNoble MarcusNoble commented Sep 16, 2019

@marcincuber that is the same conclusion I came to also, and same in all regions. For the time being I have added the thumbprint as a hardcoded string in my terraform. Not sure how often / if this value changes.
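
As an alternative to hardcoding, the thumbprint can be recomputed from a certificate. The Python sketch below assumes, as I understand the IAM OIDC provider setup, that the thumbprint is the hex-encoded SHA-1 digest of the DER-encoded CA certificate at the top of the issuer's chain; fetching that certificate from the issuer endpoint is left out, and the function name is hypothetical.

```python
import hashlib
import ssl


def thumbprint_from_pem(pem_cert):
    """Return the hex SHA-1 fingerprint of a PEM-encoded certificate."""
    der = ssl.PEM_cert_to_DER_cert(pem_cert)  # strip the armor, decode base64
    return hashlib.sha1(der).hexdigest()
```

If every cluster's issuer endpoint chains up to the same root CA, the same thumbprint for all clusters and regions is what this computation would produce.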

@marcincuber

@marcincuber marcincuber commented Sep 16, 2019

@MarcusNoble I believe the root CA expires in like 2034 or something like that. I have hardcoded it for now as well. Thanks for confirming that you see the same behaviour.

@MarcusNoble

@MarcusNoble MarcusNoble commented Sep 16, 2019

Ha! Well... I'll set myself a reminder 😆

I guess it'd only be an issue if the root CA needs to be recalled for whatever reason. Though I have no idea how that would be handled even if you'd done it through the web UI.

@dahu33

@dahu33 dahu33 commented Sep 24, 2019

One thing I'm finding extremely frustrating when using IAM Roles for Service Accounts is that the OIDC_PROVIDER has to be hardcoded in the CloudFormation templates...

In the example policy below, the StringEquals condition requires the OIDC_PROVIDER in the key, but AFAIK CloudFormation doesn't allow parameter substitution in dictionary keys...

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::AWS_ACCOUNT_ID:oidc-provider/OIDC_PROVIDER"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "OIDC_PROVIDER:sub": "system:serviceaccount:SERVICE_ACCOUNT_NAMESPACE:SERVICE_ACCOUNT_NAME"
        }
      }
    }
  ]
}

Is there any official (or workaround) solution to this issue?

@kenske

@kenske kenske commented Sep 24, 2019

@dahu33 use terraform instead, it will make your life easier in so many other ways.

@savithruml

@savithruml savithruml commented Sep 24, 2019

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::AWS_ACCOUNT_ID:oidc-provider/OIDC_PROVIDER"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "OIDC_PROVIDER:sub": "system:serviceaccount:SERVICE_ACCOUNT_NAMESPACE:SERVICE_ACCOUNT_NAME"
        }
      }
    }
  ]
}

We ended up using Jinja2 substitution to build our templates
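
The same pre-processing idea works without a template engine. Below is a minimal Python sketch that builds the trust policy with the provider substituted into the condition key; the function name is hypothetical, and the generated JSON would still have to be passed to whatever tool creates the role.

```python
import json


def trust_policy(account_id, oidc_provider, namespace, service_account):
    """Build the trust policy from this thread for one service account."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}",
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {"StringEquals": {
                # The provider appears in the key, which is the part that is
                # hard to parameterize in a static template.
                f"{oidc_provider}:sub":
                    f"system:serviceaccount:{namespace}:{service_account}",
            }},
        }],
    }
```

For example, `print(json.dumps(trust_policy("111122223333", "oidc.example.com/id/EXAMPLE", "default", "my-app"), indent=2))` renders the document, where the account ID, provider, and service account are placeholder values.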

@jqmichael

@jqmichael jqmichael commented Sep 25, 2019

@max-rocket-internet

@max-rocket-internet max-rocket-internet commented Sep 25, 2019

We ended up using Jinja2

🙁

Has anyone tried using AWS CDK to write their cfn template?

🙁

use terraform instead, it will make your life easier in so many other ways.

This x1000 🚀

@savithruml

@savithruml savithruml commented Sep 25, 2019

Has anybody tried to read the token as a non-root user? The path where the token is stored /var/run/secrets/eks.amazonaws.com/serviceaccount/token is owned by root, so not sure how this will work. Looking for suggestions.

@fimbulvetr

@fimbulvetr fimbulvetr commented Sep 25, 2019

Has anybody tried to read the token as a non-root user? The path where the token is stored /var/run/secrets/eks.amazonaws.com/serviceaccount/token is owned by root, so not sure how this will work. Looking for suggestions.

After reading the source code for k8s I found that you must have:

      securityContext:
        fsGroup: 1000 # should be the group ID your process runs as in the container

in the spec of the pod that will read the volume (fsGroup is a pod-level securityContext field). If you don't have an fsGroup, the file mode is hardcoded to 0600. If you do have an fsGroup specified, it is hardcoded to 0660. This doesn't seem to be configurable.
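
To see why the two modes behave differently for a non-root process, here is a simplified Python model of the POSIX read check. It assumes the token file is owned by root and that setting fsGroup makes the file's group equal to the fsGroup value; it ignores ACLs and capabilities, and the function name is hypothetical.

```python
import stat


def can_read(mode, file_uid, file_gid, uid, gids):
    """Simplified POSIX check: may a process with uid/gids read the file?"""
    if uid == 0:
        return True  # root bypasses the permission bits
    if uid == file_uid:
        return bool(mode & stat.S_IRUSR)
    if file_gid in gids:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)
```

With mode 0600 and a root-owned file, a process running as UID 1000 is denied; with mode 0660 and the file's group set to 1000, the same process is allowed through the group bits.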

There really should be a bug report for this as it seems like something that would come up often. If the process in your container runs as root, you wouldn't see this error but hopefully that's not happening often.

@micahhausler
Member

@micahhausler micahhausler commented Sep 25, 2019

@davidshin

@davidshin davidshin commented Oct 25, 2019

I'm trying to get the AWS CNI Plugin to use IRSA using this guide (https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts-cni-walkthrough.html), but I'm having trouble getting the aws-node pods to assume the role specified in the service account annotation.

The documentation here (https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts-minimum-sdk.html) says that the minimum AWS Go SDK version is 1.23.13, but amazon-vpc-cni-k8s is using 1.21.7 in master (https://github.com/aws/amazon-vpc-cni-k8s/blob/master/go.mod).

The guide mentioned seems to suggest to use AWS CNI Plugin v1.5.3 ("If your CNI version is earlier than 1.5.3, use the following command to upgrade your CNI version to the latest version"...), but v1.5.3 is also using the AWS Go SDK v1.21.7, which is earlier than the minimum required version for IRSA support.

Is the guide wrong?

(UPDATE: it turns out that my trust policy was configured incorrectly. Everything is working as expected using AWS CNI Plugin v1.5.4. But I'm still unsure why there is a discrepancy between the minimum AWS Go SDK version (v1.23.13) and the version used in AWS CNI Plugin v1.5.4 (Go SDK v1.21.7).)

@micahhausler
Member

@micahhausler micahhausler commented Oct 25, 2019

@davidshin This is being tracked in aws/amazon-vpc-cni-k8s#663

@davidshin

@davidshin davidshin commented Oct 25, 2019

@davidshin This is being tracked in aws/amazon-vpc-cni-k8s#663

Thanks @micahhausler. As an update, I was actually able to get it all to work, so unless I'm missing something, I believe that the stated minimum AWS GO SDK version is incorrect here https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts-minimum-sdk.html

@micahhausler
Member

@micahhausler micahhausler commented Oct 25, 2019

@davidshin The guide lists the AWS SDK versions in which the API model for eks:DescribeCluster includes the cluster.identity.oidc.issuer field, not the versions in which the credential provider was first supported.

@davidshin

@davidshin davidshin commented Oct 25, 2019

@davidshin The guide lists the AWS SDK versions in which the API model for eks:DescribeCluster includes the cluster.identity.oidc.issuer field, not the versions in which the credential provider was first supported.

@micahhausler Out of curiosity, why would a container need to call eks:DescribeCluster, and retrieve the oidc issuer? Wouldn't the inserted AWS_WEB_IDENTITY_TOKEN_FILE and AWS_ROLE_ARN be enough to assume the role?

@devkid

@devkid devkid commented Oct 25, 2019

The guide lists the AWS SDK versions in which the API model for eks:DescribeCluster includes the cluster.identity.oidc.issuer field, not the versions in which the credential provider was first supported.

@micahhausler Then the description in the guide is wrong (or at least misleading)?

The containers in your pods must use an AWS SDK version that supports assuming an IAM role via an OIDC web identity token file. Be sure to use at least the minimum SDK versions listed below:

@devkid

@devkid devkid commented Oct 30, 2019

@micahhausler ping. Can you clarify?

@ulm0

@ulm0 ulm0 commented Mar 31, 2021

Hi guys, I've got a question regarding session refresh: what if I have a pod that is always running and needs to consume AWS services for more than 86400 seconds? What happens then? Does the session automatically refresh?

@06kellyjac

@06kellyjac 06kellyjac commented Apr 1, 2021

@ulm0 In the future can you please open a separate issue for that question. 🙂
It's not great to revive issues that haven't seen activity in over a year for mostly unrelated questions


As a note, searching "eks iam roles for service accounts", clicking the first link, and going to the "Technical overview" turns up everything you needed.
Please also try to search around a bit before posting questions in GH issues.

By default, the kubelet refreshes the token if it is older than 80 percent of its total TTL, or if the token is older than 24 hours. You can modify the expiration duration for any account, except the default service account, with settings in your pod spec.
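
The refresh rule quoted above can be written down as a small predicate. This Python sketch only restates the documented rule (80 percent of the TTL, or 24 hours); it is not the kubelet's code, and the function and parameter names are hypothetical.

```python
def should_refresh(issued_at, expires_at, now, max_age=24 * 60 * 60):
    """Return True once the token is past 80% of its TTL or older than 24 hours.

    All arguments are timestamps in seconds.
    """
    ttl = expires_at - issued_at
    age = now - issued_at
    return age > 0.8 * ttl or age > max_age
```

For a token with the 86400-second expiration used earlier in this thread, the first condition triggers after roughly 19 hours, so a long-running pod keeps getting a fresh token file, provided the SDK re-reads it.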

https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts-technical-overview.html#pod-configuration
https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#service-account-token-volume-projection

Hope this helps

Projects
containers-roadmap
  
Just Shipped