DNS01 Challenge via AWS Route53 on AWS EKS 1.15 not working #3079

Closed
JanisOrlovs opened this issue Jul 8, 2020 · 22 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@JanisOrlovs

JanisOrlovs commented Jul 8, 2020

Describe the bug:
When doing Let's Encrypt validation via Route53 on EKS 1.15 with IRSA, cert-manager fails with AccessDenied:

E0708 12:18:28.403757       1 controller.go:143] cert-manager/controller/challenges "msg"="re-queuing item  due to error processing" "error"="error instantiating route53 challenge solver: unable to assume role: AccessDenied: User: arn:aws:sts::XXXXXXXXXXXX:assumed-role/ajetstack-cert-manager/1594210708202645062 is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::XXXXXXXXXXXX:role/jetstack-cert-manager\n\tstatus code: 403, request id: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX" "key"="example/example-internal.our-domain.com-4004189744-1292225756-1302369230" 

Expected behaviour:
Issued LE cert via DNS01 validation

Steps to reproduce the bug:
Create the following ClusterIssuer and Ingress.
ClusterIssuer object:

---
apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: r53-letsencrypt-prod
  namespace: cert-manager
spec:
  acme:
    email: DevOpsSupport@our-domain.com
    privateKeySecretRef:
      name: r53-letsencrypt-prod
    server: https://acme-v02.api.letsencrypt.org/directory
    solvers:
    - selector:
        dnsZones:
          - "our-domain.com"
      dns01:
        route53:
          region: eu-central-1
          role: arn:aws:iam::XXXXXXXXXXXX:role/jetstack-cert-manager

Ingress object:

---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    cert-manager.io/acme-challenge-type: dns01
    cert-manager.io/cluster-issuer: r53-letsencrypt-prod
    kubernetes.io/ingress.class: nginx-ingress-internal
    nginx.ingress.kubernetes.io/proxy-body-size: 10m
  labels:
    app.kubernetes.io/component: jenkins-master
    app.kubernetes.io/instance: jenkins
    app.kubernetes.io/managed-by: Tiller
    app.kubernetes.io/name: jenkins
    helm.sh/chart: jenkins-1.3.6
  name: jenkins-internal
  namespace: jenkins
spec:
  rules:
  - host: jenkins.our-domain.com
    http:
      paths:
      - backend:
          serviceName: jenkins
          servicePort: 8080
  tls:
  - hosts:
    - jenkins.our-domain.com
    secretName: jenkins.our-domain.com

Anything else we need to know?:
IRSA is working properly (tested). The same role works for Route53 from the CLI.

IAM policy (very broad):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Action": [
                "route53:ListResourceRecordSets",
                "route53:ChangeResourceRecordSets"
            ],
            "Resource": "arn:aws:route53:::hostedzone/*"
        },
        {
            "Sid": "",
            "Effect": "Allow",
            "Action": "route53:GetChange",
            "Resource": "arn:aws:route53:::change/*"
        },
        {
            "Sid": "",
            "Effect": "Allow",
            "Action": "route53:ListHostedZones*",
            "Resource": "*"
        }
    ]
}

EDIT
Trust relationship:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::XXXXXXXXXXXX:oidc-provider/oidc.eks.eu-central-1.amazonaws.com/id/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.eu-central-1.amazonaws.com/id/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX:sub": "system:serviceaccount:cert-manager:cert-manager"
        }
      }
    }
  ]
}

The trust relationship is also working as needed (tested with a pod).

SA object:

---
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::XXXXXXXXXXXX:role/jetstack-cert-manager
  creationTimestamp: "2020-07-08T12:12:55Z"
  labels:
    app: cert-manager
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: cert-manager
    app.kubernetes.io/managed-by: Tiller
    app.kubernetes.io/name: cert-manager
    helm.sh/chart: cert-manager-v0.15.2
  name: cert-manager
  namespace: cert-manager
  resourceVersion: "46367502"
  selfLink: /api/v1/namespaces/cert-manager/serviceaccounts/cert-manager
  uid: 45d3a11b-feaa-4b7a-a91e-146454ad770a
secrets:
- name: cert-manager-token-nfqpv

Environment details:

  • Kubernetes version: v1.15.11
  • Cloud-provider/provisioner: AWS EKS
  • cert-manager images: deployed from quay.io official ones
  • cert-manager version: v0.15.2 (also did not work on 12.x, 13.x, 14.x)
  • Install method: helm
  • LBL: ELB Internal
    /kind bug
@jetstack-bot added the kind/bug label Jul 8, 2020
@meyskens
Contributor

meyskens commented Jul 8, 2020

Seems this part is missing: https://cert-manager.io/docs/configuration/acme/dns01/route53/#iam-role-trust-policy
Can you confirm that is set up?

@JanisOrlovs
Author

Hello,
as I mentioned, the trust relationship is there and working. I tested it with a test container where the role/service account works. I have added it to the first comment.

@alkiko

alkiko commented Jul 10, 2020

Hi,

I am seeing something similar.

Error when describing challenge resources

Warning  PresentError  9s (x15 over 17s)  cert-manager  (combined from similar events): Error presenting challenge: error instantiating route53 challenge solver: unable to assume role: AccessDenied: User: arn:aws:sts::XXXXXXXXXXX:assumed-role/cert-manager/XXXXXXXXXXXXXXXXXXX is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::XXXXXXXXXXX:role/cert-manager

ClusterIssuer

apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: letsencrypt
spec:
  acme:
    # You must replace this email address with your own.
    # Let's Encrypt will use this to contact you about expiring
    # certificates, and issues related to your account.
    email: my@email.com
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-pk
    # Use DNS as challenge solver
    solvers:
      - selector:
          dnsZones:
            - "sub.domain.com"
            - "sub.sub.domain.com"
            - "sub.sub.domain.com"
        dns01:
          route53:
            region: eu-north-1
            role: arn:aws:iam::XXXXXXXXXXX:role/cert-manager

Trust Relationship

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::XXXXXXXXXXXX:oidc-provider/oidc.eks.eu-north-1.amazonaws.com/id/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.eu-north-1.amazonaws.com/id/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX:sub": "system:serviceaccount:cert-manager:cert-manager"
        }
      }
    }
  ]
}

Service Account

I've annotated the cert-manager service account as described here.

apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::XXXXXXXXXXXX:role/cert-manager
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"ServiceAccount","metadata":{"annotations":{"eks.amazonaws.com/role-arn":"arn:aws:iam::XXXXXXXXXXXX:role/cert-manager"},"labels":{"app":"cert-manager","app.kubernetes.io/component":"controller","app.kubernetes.io/instance":"cert-manager","app.kubernetes.io/managed-by":"Helm","app.kubernetes.io/name":"cert-manager","helm.sh/chart":"cert-manager-v0.14.1"},"name":"cert-manager","namespace":"cert-manager"}}
  creationTimestamp: "2020-07-10T15:23:17Z"
  labels:
    app: cert-manager
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: cert-manager
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: cert-manager
    helm.sh/chart: cert-manager-v0.14.1
  name: cert-manager
  namespace: cert-manager
  resourceVersion: "582016"
  selfLink: /api/v1/namespaces/cert-manager/serviceaccounts/cert-manager
  uid: c66be7a3-2267-4ca2-8b41-31e78bba856e
secrets:
- name: cert-manager-token-wk9tw

Security Context

I also tried updating the security context as described here.

spec:
    securityContext:
    enabled: true
    fsGroup: 1001
    containers: ...

Environment

  • Kubernetes version: v1.16
  • Istio version: 1.6.2
  • Cloud-provider/provisioner: AWS EKS
  • cert-manager version: v0.14.1
  • Install method: kubectl apply --validate=false -f https://github.com/jetstack/cert-manager/releases/download/v0.14.1/cert-manager.yaml followed by annotating cert-manager service account.

@johndietz

We are also having the same issue as described above by @JanisOrlovs and @akoudal. Wanted to add the context that my setup is

  • Kubernetes version: v1.14.9
  • Cloud-provider/provisioner: AWS EKS
  • cert-manager version: v0.15.2

Install method:

helm install cert-manager jetstack/cert-manager --namespace cert-manager --version v0.15.2 -f cert-manager/values.yaml

values.yaml:

serviceAccount:
  annotations: 
    eks.amazonaws.com/role-arn: arn:aws:iam::XXXXXXXXXXXX:role/cert-manager-role
securityContext:
  enabled: true
  fsGroup: 1001

i tried removing the condition on the trust relationship too but that didn't help get the role assumed. i too am getting the error described above

E0713 18:22:46.713885       1 controller.go:143] cert-manager/controller/challenges "msg"="re-queuing item  due to error processing" "error"="error instantiating route53 challenge solver: unable to assume role: AccessDenied: User: arn:aws:sts::XXXXXXXXXXXX:role:assumed-role/cert-manager-role/1234567890123456789 is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::XXXXXXXXXXXX:role:role/cert-manager-role
\tstatus code: 403 ..."

@alkiko

alkiko commented Jul 13, 2020

I got it working. I've described my installation procedure below. Hope it can help solve your issues too, @johndietz and @JanisOrlovs.

CertManager

This section describes how to install and setup cert-manager on an AWS EKS cluster. It will integrate with Route53 DNS for issuing DNS-based certificate challenges.

IAM Open ID Connect provider for the cluster

This is required for associating k8s Service Accounts with AWS IAM roles and policies. This is considered more secure than allowing the worker nodes to access AWS services (e.g. S3 Buckets, Route53 DNS, etc.).

eksctl utils associate-iam-oidc-provider --cluster name-of-the-cluster

More info about the command above can be found here.

AWS IAM Policy

Create an AWS IAM policy named AllowExternalDNSUpdates and attach the following permissions to it:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "route53:ChangeResourceRecordSets",
            "Resource": "arn:aws:route53:::hostedzone/*"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "route53:GetChange",
                "route53:ListHostedZones",
                "route53:ListResourceRecordSets",
                "route53:ListHostedZonesByName"
            ],
            "Resource": "*"
        }
    ]
}

A narrower policy would probably do, but the above works well if you want to use the same policy with e.g. external-dns.

Check the cert-manager docs if you wish to limit the policy to specific Route53 resources.

Create the AWS IAM Role

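Create an IAM role that's allowed to perform DNS updates (Route53). This is required in order to use DNS-based certificate challenges. Note that create-role requires a trust policy document; trust-policy.json below is a placeholder file name for the policy shown in the "Trust relationship" section.

aws iam create-role --role-name cert-manager --assume-role-policy-document file://trust-policy.json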

Attach the AllowExternalDNSUpdates policy to the role

This assumes that a policy named AllowExternalDNSUpdates already exists at arn:aws:iam::XXXXXXXXXXXX:policy/AllowExternalDNSUpdates (created in an earlier step)

aws iam attach-role-policy --policy-arn arn:aws:iam::XXXXXXXXXXXX:policy/AllowExternalDNSUpdates --role-name cert-manager

Trust relationship

Assign a trust relationship to the newly created IAM role in the AWS web console, allowing you to map Kubernetes service accounts to AWS IAM roles. It should look something like this, with the values in <value> replaced accordingly:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<aws-account-id>:oidc-provider/oidc.eks.eu-north-1.amazonaws.com/id/<eks-hash>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.eu-north-1.amazonaws.com/id/<eks-hash>:sub": "system:serviceaccount:cert-manager:cert-manager"
        }
      }
    }
  ]
}

You can find the value at Principal.Federated at https://console.aws.amazon.com/iam/home?region=eu-north-1#/providers (replace eu-north-1 with your region). You can extract the <aws-account-id> and <eks-hash> from that string too.

Cert manager custom values

Create a file called cert-manager.values.yaml
This will cause the cert-manager ServiceAccount to be annotated with the newly created cert-manager AWS IAM role.

securityContext:
  enabled: "true"
serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<aws-account-id>:role/cert-manager

Note the securityContext part which is a fix if you experience the issue described in issue #2147.

Install CertManager from Helm template

Start by checking that you have the correct <aws-account-id> in cert-manager.values.yaml, and add the jetstack/cert-manager helm repo if it's not present. Then run the command below:

helm template cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --version v0.15.2 \
  --set installCRDs=true \
  -f cert-manager.values.yaml | kubectl apply -f -

ClusterIssuer

Create a ClusterIssuer which can be used to issue Let's Encrypt certificates in all namespaces.

Save the following yaml into a file called letsencrypt.clusterissuer.yaml

apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: letsencrypt
spec:
  acme:
    # You must replace this email address with your own.
    # Let's Encrypt will use this to contact you about expiring
    # certificates, and issues related to your account.
    email: your@email.com
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-pk
    # Use DNS as challenge solver
    solvers:
      - dns01:
          route53:
            region: eu-north-1

Apply the manifest to your cluster:

kubectl apply -f letsencrypt.clusterissuer.yaml

@johndietz

@akoudal i wanted to thank you for the time you took to document your walkthrough of these items and that you got past the error.

i regret to say that i've followed these steps pretty much to the letter, and i continue to get the same issue. differences between our setups are that i'm still using the staging server for acme, though i trust that's not in play. i'm also in us-west-2 and i'm not installing crd with the helm chart, but rather deploying those separately. i should also call out that being on eks 1.14, i'm stuck on the legacy CRDs, perhaps something might be in play there.

@alkiko

alkiko commented Jul 14, 2020

What does your ClusterIssuer look like @johndietz? I was seeing some AssumeRole errors until I removed the "role" entry from my dns01 solver.

@johndietz

@akoudal the script i'd hacked together to iterate apparently missed on the directory that houses my clusterissuers. once i applied them with your changes the sts issue went away. i really can't thank you enough man. i'll open a PR against their route53 docs to pay your assistance forward.

@alkiko

alkiko commented Jul 14, 2020

Awesome. Glad you got it working @johndietz.

johndietz pushed a commit to johndietz/website that referenced this issue Jul 14, 2020
the only route53 dns01 clusterissuer example prior to this change specifies a role in each solver example, however in a single-account eks setup where the role is instead specified on the serviceaccount, including the same role on the clusterissuer causes sts errors and prevents the dns management.
i welcome any feedback on the comments i've provided in this eks clusterissuer example.
see discussion in cert-manager/cert-manager#3079 for more context on this issue.

Signed-off-by: John Dietz <jdietz@jdietz.fios-router.home>
@anarsen

anarsen commented Jul 27, 2020

What does your ClusterIssuer look like @johndietz? I was seeing some AssumeRole errors until I removed the "role" entry from my dns01 solver.

Essentially, Issuer-specific IAM roles can be used to allow Cert Manager to perform DNS validation in other AWS accounts. However, the trust relationship still needs to be set up, whether it's within the same account or cross-account.

When the role key is left out in the ClusterIssuer or Issuer, then the IAM identity of Cert Manager itself is used which, as you know, is retrieved through IRSA.
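
For anyone skimming, here is a minimal sketch of the two cases described above, assuming the v1alpha2 API used earlier in this thread. The issuer names, the second secret name, the <other-aws-account-id> placeholder and the dns-manager role name are hypothetical; the only thing that matters is whether the role key is present.

```yaml
# Same-account setup: no `role` key, so cert-manager uses its own
# IRSA identity (the role from the ServiceAccount annotation).
apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: letsencrypt-same-account    # hypothetical name
spec:
  acme:
    email: your@email.com
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-pk
    solvers:
      - dns01:
          route53:
            region: eu-north-1
---
# Cross-account setup: `role` names a *different* role, which must
# trust the IRSA role so that sts:AssumeRole succeeds.
apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: letsencrypt-cross-account   # hypothetical name
spec:
  acme:
    email: your@email.com
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-cross-account-pk   # hypothetical name
    solvers:
      - dns01:
          route53:
            region: eu-north-1
            # hypothetical role in another account
            role: arn:aws:iam::<other-aws-account-id>:role/dns-manager
```

Pointing role at the same role that IRSA already provides makes that role try to assume itself, which is consistent with the sts:AssumeRole AccessDenied errors reported earlier in this issue.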

@alkiko

alkiko commented Jul 27, 2020

Thanks for clarifying, @anarsen. It took a while to figure out how to force it to use the eks.amazonaws.com/role-arn: arn:aws:iam:::role/cert-manager from the annotation instead of an assumed role.

@hzhou97
Contributor

hzhou97 commented Aug 4, 2020

It looks like this issue has been resolved. I'll close it for now, if anybody feels otherwise, feel free to reopen it.

/close

@jetstack-bot
Contributor

@hzhou97: Closing this issue.

In response to this:

It looks like this issue has been resolved. I'll close it for now, if anybody feels otherwise, feel free to reopen it.

/close


@metral

metral commented Oct 10, 2020

Hi folks,

On AWS EKS: v1.17.9
Using Helm Chart v1.0.3 with DNS01 Challenge.

I'm hitting the error: "error"="error instantiating route53 challenge solver: unable to construct route53 provider: empty credentials; perhaps you meant to enable ambient credentials?" with an Issuer set up like @alkiko suggests in #3079 (comment)

The IAM role is not being accessed/used in AWS, and the IRSA token is mounted on the pods along with the AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE, so this signals to me that the token file is not being accessed by the cert-manager due to possible permission issues.

I've tried fsGroup: 1001 and runAsUser: 1001 along with other variants but no luck.

Any input would be greatly appreciated!

@metral

metral commented Oct 12, 2020

Update: tried v0.15.2 with a ClusterIssuer and it is working with IRSA, whereas before I was using an Issuer with v1.0.3 and it was not. fsGroup: 1001, and runAsUser: 1001 is also set in the securityContext of the Helm chart.

It seems that something may have broken in between releases or something w.r.t IRSA was not accurately captured in the docs.


Follow up: v1.0.3 of the helm chart is working with a ClusterIssuer and IRSA, but the same setup with an Issuer does not work.

@dan-vaughan

I've got the same issue as @metral with v1.0.3 of the chart: it works fine with a ClusterIssuer, but I get an error when using an Issuer.

@heschlie

heschlie commented Apr 6, 2021

I was running into the same issue. I added this to my values.yaml:

extraArgs:
  - --issuer-ambient-credentials

It appears that by default namespaced Issuers are not allowed to use "ambient" credentials; this flag lets them.
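
A minimal sketch of what that looks like on the Issuer side, assuming cert-manager v1.x (the cert-manager.io/v1 API) and a controller started with the flag above. The issuer name and namespace are hypothetical.

```yaml
# Namespaced Issuer with no explicit AWS credentials. It relies on
# ambient (IRSA) credentials, which for Issuers only works when the
# controller runs with --issuer-ambient-credentials.
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt        # hypothetical name
  namespace: my-app        # hypothetical namespace
spec:
  acme:
    email: your@email.com
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-pk
    solvers:
      - dns01:
          route53:
            region: eu-north-1
```

Without the flag, an Issuer like this would be expected to fail with the "empty credentials" error quoted above, while the equivalent ClusterIssuer works.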

@ZeChArtiahSaher

This is lowkey messy - devs update docs pls

rossigee added a commit to rossigee/website-1 that referenced this issue Sep 13, 2021
This should help reduce the amount of time people might waste trying to figure out how to resolve the following error:

```
error instantiating route53 challenge solver: unable to construct route53 provider: empty credentials; perhaps you meant to enable ambient credentials?
```

A couple of related bug reports:

* cert-manager/cert-manager#3009
* cert-manager/cert-manager#3079
@wallrj
Member

wallrj commented Jan 21, 2022

rossigee added a commit to rossigee/website-1 that referenced this issue Apr 13, 2022
@tapanhalani

The documentation still doesn't reflect this update. Can someone please check and do the needful? Thank you :)

@tobiasehlert

@JanisOrlovs,

How did you resolve the error you originally reported?
I get the same one and I can't get my head around it. :)

// Tobias

@confiq

confiq commented Jan 3, 2024

These lines of code + this issue made me realise I need the issuer-ambient-credentials.

Thanks @heschlie
