Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

S3 backend prioritizes node role over service account #3275

Closed
innovate-invent opened this issue Oct 16, 2020 · 30 comments
Closed

S3 backend prioritizes node role over service account #3275

innovate-invent opened this issue Oct 16, 2020 · 30 comments

Comments

@innovate-invent
Copy link

I am attempting to set up a pull through proxy for my AWS EKS cluster.

AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE are set for the container and AWS_ROLE_ARN is the service account role with S3 permissions via IRSA.

Bashing in and running wget -qO- http://169.254.169.254/latest/meta-data/iam/security-credentials/ returns the node role (expectedly).

registry fails to start with "Access denied" and S3 access logs report the node role.

The registry should prioritize the service account role over the node role.

This is likely an issue with the underlying awsgo library but I am opening an issue here to start.

Related: #2172

@ingshtrom
Copy link

#3245 should fix this

@michaeljguarino
Copy link

Was this fixed by #3245? I'm running a registry in EKS now and not seeing it take on the IRSA role.

@leandrocostam
Copy link

Was this fixed by #3245? I'm running a registry in EKS now and not seeing it take on the IRSA role.

I'm also facing the same issue.

@switchboardOp
Copy link

switchboardOp commented Aug 23, 2021

Was this fixed by #3245? I'm running a registry in EKS now and not seeing it take on the IRSA role.

As far as I can tell, no maybe.

Getting this error trying to use a registry pod in a cluster configured for IRSA using registry:2.7.1

s3aws: NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors

@switchboardOp
Copy link

switchboardOp commented Aug 27, 2021

Having built off master I'm able to use IRSA. But the latest published tag for the official image, 2.7.1, throws the above error.

@SimonRupar
Copy link

Was this fixed by #3245? I'm running a registry in EKS now and not seeing it take on the IRSA role.

I'm also facing the same issue.

Also having the same issue running inside k8s. I would like move away from using static credentials and use IRSA.

@switchboardOp
Copy link

For anyone reading this after the 2.8.0 release, it still doesn't support IRSA based on my testing.

@svadnala
Copy link

svadnala commented Apr 27, 2022

initially was getting this error in 2.7.1 upgraded the image version to 2.8.1 still persists with same error

time="2022-04-26T20:08:29.773572268Z" level=error msg="response completed with error" err.code=unknown err.detail="s3aws: NoCredentialProviders: no valid providers in chain. Deprecated. For verbose messaging see aws.Config.CredentialsChainVerboseErrors" err.message="unknown error

@switchboardOp did you infer that building the dockerfile from master/main branch and using it instead of published image worked for you?

@milosgajdos
Copy link
Member

Build of main or use [edge](https://hub.docker.com/layers/distribution/distribution/distribution/edge/images/sha256-1689ce0b5e01ed75069f2f4fe183cdcf62549e21290a1861b68a2c793f73cf6c?context=repo) tag from distribution Hub repo

@smartpierre
Copy link

I can confirm, 2.8.1 is not working with IRSA 😔

@switchboardOp
Copy link

@switchboardOp did you infer that building the dockerfile from master/main branch and using it instead of published image worked for you?

🎯 Yes. That's what is working for me. Built from the main branch about 2 months ago.

@jonnymccullagh
Copy link

@switchboardOp for the image you built could you push/tag it somewhere for others to try?
I am stuck with this same problem

@fredrikhgrelland
Copy link

fredrikhgrelland commented Aug 31, 2022

@switchboardOp for the image you built could you push/tag it somewhere for others to try? I am stuck with this same problem

This is an older build, but comes directly from the source distribution/distribution@sha256:79dfd00f73f4c258c38457dd0364fd31f80e996713286cb9ab601206de6085fb

This kustomize will work with irsa from the default helm chart:

kind: Deployment
metadata:
  name: registry
spec:
  template:
    spec:
      containers:
        - name: docker-registry
        # TODO: Update image when code is merged to a release. See https://github.com/distribution/distribution/issues/3275
          image: "distribution/distribution@sha256:79dfd00f73f4c258c38457dd0364fd31f80e996713286cb9ab601206de6085fb"
          env:
            - name: REGISTRY_STORAGE_S3_ACCESSKEY
              $patch: delete
            - name: REGISTRY_STORAGE_S3_SECRETKEY
              $patch: delete
            # Must remove region endpoint as well for the codepath using IRSA to take effect
            - name: REGISTRY_STORAGE_S3_REGIONENDPOINT
              $patch: delete```

@benjimin
Copy link

benjimin commented Oct 27, 2022

I'm also experiencing the problem that registry 2.8.1 is incompatible with authentication supplied through an EKS service account.

Note that IAM roles attached to service accounts via OIDC is the recommended best practice for Kubernetes security. It avoids issues of needing to manually rotate long-lived access keys supplied to pods (or else of granting too much access to other processes on the same node).

As a test, I started a pod using the dockerhub golang container and the service account (with attached IAM policy) intended for my registry, and ran a short script to test the same AWS SDK functions that the registry uses. I manually set the environment variables REGISTRY_STORAGE_S3_BUCKET and AWS_REGION and Kubernetes automatically injected AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE. This script was able to list, put, get and delete objects from the S3 bucket just fine.

package main

import ("github.com/aws/aws-sdk-go/aws/session"; "github.com/aws/aws-sdk-go/service/s3"; "fmt"; "os"; "strings"; "io")

func main() {

  bucket := os.Getenv("REGISTRY_STORAGE_S3_BUCKET")
  filename := "foo.txt"

  sess, err := session.NewSession(); if err != nil { fmt.Println(err); return }
  svc := s3.New(sess)

  _, err = svc.PutObject(&s3.PutObjectInput{Bucket: &bucket, Key: &filename, Body: strings.NewReader("Sample content")})
  if err != nil { fmt.Println(err) }

  list, err := svc.ListObjectsV2(&s3.ListObjectsV2Input{Bucket: &bucket})
  if err != nil { fmt.Println(err) } else { for _, item := range list.Contents { fmt.Println(*item.Key, *item.Size) } }

  get, err := svc.GetObject(&s3.GetObjectInput{Bucket: &bucket, Key: &filename})
  if err != nil { fmt.Println(err) } else { content, _ := io.ReadAll(get.Body); fmt.Println(string(content)) }

  _, err = svc.DeleteObject(&s3.DeleteObjectInput{Bucket: &bucket, Key: &filename})
  if err != nil { fmt.Println(err) }
}

I tested my script using golang 1.19.2 and aws-sdk-go v1.44.126. Registry 2.8.1 appears to be using golang 1.16.15 and aws-sdk-go 1.43.16, but my script still worked ok when I tried to use these older versions.

Note if I used env to unset AWS_WEB_IDENTITY_TOKEN_FILE then all four operations incurred the error message "NoCredentialProviders: no valid providers in chain. Deprecated. For verbose messaging see aws.Config.CredentialsChainVerboseErrors", which is exactly the same message the registry s3aws driver panics about (with a trace referring to this line of code from this repo). Incidentally, if instead I unset one of the other variables I get a different error message e.g. "WebIdentityErr: role ARN is not set" or "MissingRegion: could not find region configuration", or without the permissions then "AccessDenied" (and "NoSuchKey").

I'm suspicious that the problem could be the way the registry tries to impose its own configuration scheme (which did not anticipate any web tokens) rather than just letting the AWS SDK configure itself.

@benjimin
Copy link

benjimin commented Nov 12, 2022

git clone https://github.com/distribution/distribution
DOCKER_BUILDKIT=1 docker build distribution -t cesiumfrog/registry:$(git rev-parse --short HEAD)
docker push cesiumfrog/registry

I just locally built the docker image for the latest commit (3b8fbf9), pushed the image to cesiumfrog/registry and tested it (in a kubernetes deployment using a service account with an IAM policy granting S3 access) and it worked (with the registry running as a pull-through mirror, I can set up a port-forward and check e.g. docker pull 127.0.0.1:5000/ubuntu/postgres succeeds and that the expected layer blobs have appeared inside my s3 bucket).

Hopefully this means the problem has been fixed, and all we need is to wait for the maintainers to tag a 2.8.2 release.

@milosgajdos
Copy link
Member

Note that you don't need to build the registry instance yourself if you want the latest commit. We push the latest builds to Docker Hub tagged with edge tag: https://hub.docker.com/repository/docker/distribution/distribution/tags?page=1&ordering=last_updated

@gp42
Copy link

gp42 commented Jan 11, 2023

Leaving this note for other people trying to use this workaround.
Took me some time to realise that @edge build does not work (access to registry fails with messages in logs blob unknown to registry), but the build mentioned in #3275 (comment) does work, so there must be something wrong with the latest master.

@mikecook
Copy link

The changes in #3599 got merged in #3606

@davidspek
Copy link
Collaborator

@mikecook That was on the main branch which currently hasn't been released. That is why this issue is still open, since with the current releases this problem still exists.

@mikecook
Copy link

@mikecook That was on the main branch which currently hasn't been released. That is why this issue is still open, since with the current releases this problem still exists.

Ah! I see. Thanks, I missed that.

@GowriRegistry
Copy link

To add weightage,
TanzuNetwork Production has high dependency/blocker on Assume Role feature, based on organisation legal demands. There is zero tolerance observed currently and its important for the team to maintain the production platform to be compliant
#3923 has been closed as duplicate to track here.

@jackivanov
Copy link

What's ETA to make it a stable release?

@davidspek
Copy link
Collaborator

There (sadly) is not ETA for the v3 release. There is a milestone for it. However, some of the issues/PRs in that milestone are very old and could possibly be pushed or dropped. The main word that’s blocking the v3 release is regarding the OCI compliance for the upcoming OCI release. See the following PRs for reference:

@thiagoabdo
Copy link

Could it be ported to v2?

@davidspek
Copy link
Collaborator

That's what my PR #3921 does. But as it stands it will not get merged as all effort is being put into release v3 and the current release branch is only accepting security fixes.

@GBwandersoncamargo
Copy link

GBwandersoncamargo commented Sep 11, 2023

I am receiving this error too, but I discovered it happens because I deployed it in Kubernetes with Istio sidecars.

Istio redirect all network traffic to its sidecar.

Sometimes, on startup, Registry starts before Istio and in that time, there isn't network, so Registry triggers a panic error informing the credentials are unavailable.

I think it happens because Registry tries to retrieve credentials from IAM via HTTP. As it receives a network failure, no credentials are available, triggering the panic error.

Disabling sidecar resolves the problem.

@Jamstah
Copy link
Collaborator

Jamstah commented Sep 11, 2023

I think native sidecar containers are the right solution for network startup issues: https://kubernetes.io/blog/2023/08/25/native-sidecar-containers/

@milosgajdos
Copy link
Member

milosgajdos commented Dec 20, 2023

With the new release out https://github.com/distribution/distribution/releases/tag/v3.0.0-alpha.1 I'm inclined to close this issue. Do you mind testing the new release and closing this @innovate-invent thanks

@innovate-invent
Copy link
Author

This is great to see but I have moved on from the position that required this and won't be able to test it.

@milosgajdos
Copy link
Member

Closing. If the issues persists, we will reopen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests