Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error retrieving OAuth token from repositories that do not support it #51

Closed
s-irvine opened this issue Mar 15, 2019 · 68 comments
Closed

Comments

@s-irvine
Copy link

What commit ID of Portieris did you experience the problem with?

939c65b4e121c778d97f5dd1384484f5fcb078f1

What went wrong?

I am unable to deploy images from Docker Hub or a self-hosted Artifactory. When deploying an image from either registry Portieris queries https://<registry>/oauth/token. As far as I am aware this endpoint only exists for IBM Container Registry. So when Docker Hub is queried a 404 and large amount of HTML (probably the Docker 404 page) is returned. For Artifactory we receive an x509 error from our self-signed cert. This causes both deployments to fail despite being signed images with correct registry credentials.

These errors happen when querying the registry, before it gets as far as communicating with our self-hosted Notary.

We can deploy images hosted in IBM Container Registry just fine. Even when we host the signing information in our self-hosted Notary.

What should have happened differently?

Portieris should not query https://<registry>/oauth/token for non-IBM registries.
Portieris should be able to handle registries using self-signed TLS certificates.

How can it be reproduced?

Install Portieris and attempt to any Docker Hub pod with a valid pull secret.

Any other relevant information

We are running Portieris on an EKS cluster so we have deployed with the IBMContainerService option set to false.

@molepigeon
Copy link
Member

Hey there, thanks for the report.

We do have https://<registry server>/oauth/token hard-coded in our oauth code. That looks like it was an assumption made before we added support for other registries. The proper fix for that would be to make a request to /v2/ with no auth, and to look at the challenge we get back in the WWW-Authenticate header. We should also go to the Notary server to do that, so we should be passing the Notary server's hostname in.

Regarding your self-signed certificate issue, Portieris uses a system certificate pool that's defined at build time, so if it has the CA for your Artifactory in its /etc/ssl/certs/ then it'd trust it. You could mount the CA into /etc/ssl/certs/ to achieve the same thing. Or would you like to see some ability to override the certificate pool for certain servers?

@s-irvine
Copy link
Author

With the self-signed certificates I am mounting a file with certs for both Notary and Artifactory to /etc/certs/ca.pem at runtime and the Notary cert is being picked up fine. It looks like a client is created for Notary trust operations which loads in the arbitrary CA but a different one is created for registry operations which doesn't.

I don't think an override feature for specific servers is quite necessary for this one since adding an arbitrary CA for Notary operations at runtime is already a feature. An extension of that for registry operations is probably all that's needed.

@molepigeon
Copy link
Member

Interesting. The OAuth helper does use a custom http.Client, but it doesn't set the Transport.TLSClientConfig that would override TLS behaviour. I checked the docs for crypto/tls (here) to confirm, and they say that if Config.RootCAs is nil it uses the host's root CA set - which would presumably be the contents of /etc/certs.

We'll need to dig into that a bit more to find out what's going on there.

Incidentally, Portieris shouldn't need to interact with the Registry at all - it should only be talking to Notary (and the oauth server), hence my comment above about needing to change the hostname that the OAuth helper thinks it's going to before we can use that to find out where to go for auth. Perhaps fixing the fact that we're going to the wrong place for the OAuth token would resolve both symptoms here...

cc @bainsy88

@jerrinss5
Copy link
Contributor

jerrinss5 commented Apr 26, 2019

@molepigeon @bainsy88 I can fix it using your suggestion of hitting the v2 endpoint and code would look something like this

        req, err := http.NewRequest("GET", hostname+"/v2/", nil)

	resp, err := client.Do(req)

	if err != nil {
		glog.Errorf("Failed to query v2 endpoint for hostname: %s", hostname)
		return nil, err
	}

	// this is to fetch the value of Www-Authenticate header value
	wwwAuthenticateHeader := resp.Header.Get("Www-Authenticate")

	if wwwAuthenticateHeader == "" {
		errorMessage := "Www-Authenticate not found in the response header"
		glog.Errorf(errorMessage)
		return nil, fmt.Errorf(errorMessage)
	}

	// splitting on " to get the 1st index containing realm value for oauth server
       // eg: Bearer realm="https://auth.docker.io/token",service="notary.docker.io"
	authServer := strings.Split(wwwAuthenticateHeader, "\"")

	if len(authServer) < 2 {
		errorMessage := "Some issue occurred fetching the oauth server url from the response header"
		glog.Errorf(errorMessage)
		return nil, fmt.Errorf(errorMessage)
	}

	resp, err := client.PostForm(authServer[1], url.Values{
		"service":    {authServer[3]},
		"grant_type": {"password"},
		"client_id":  {"testclient"},
		"username":   {username},
		"password":   {token},
		"scope":      {"repository:" + repo + ":" + actions},
	})

If the above changes look good I will create test cases, run them and satisfy all the requirements for a pull request?

@jerrinss5
Copy link
Contributor

jerrinss5 commented Apr 27, 2019

I did some local testing with the new code changes and here are some screenshots for the same. Yet to create test cases though.
Happy Path - Signed image being pulled
Portieris Logs:
Portieris_logs_happypath
Deployment Logs:
deployment_logs_happypath

Sad Path - Unsigned image being denied
Portieris Logs:
Portieris_logs
Deployment Logs:
Deployment_logs

Code Changes

@molepigeon
Copy link
Member

@jerrinss5 Thanks for the code! That's pretty much on the lines that I was thinking. I misspoke before though, there's a couple of things that we should be careful about to be properly compliant with the spec from RFC 2617.

In essence, rather than making the initial call to /v2/, we should make an unauthenticated request to the resource that we actually want to access. When we do this, the server gets to tell us whether we need to authenticate for that request or not, and if we do it'll give us a 401 and tell us the service and scope fields that we need in the Www-Authenticate header so we don't need to guess them.

There are some cases where Notary wants auth for some requests and not others. Notary lets you configure a list of repositories that you can store trust data for. We use this in the IBM registry to prevent someone from using us as the source of trust for Docker Hub images, for example. If the repo doesn't match the list, it will 404 even without auth, at which point we know that the server doesn't have it - and going and getting an auth token won't change that so we needn't do it. We should only go for auth if we get a 401 when we ask for the resource without auth.

The RFC isn't entirely clear on whether we can assume that realm will always be the first thing in the Www-Authenticate header. Section 3.2.1 of the RFC just says that it should have exactly one realm. I think that it's generally accepted that realm would go first, but the defensive programmer in me says that we should be prepared for those directives being in any order. The code in docker/distribution to do the same thing converts the whole header into a map[string]string, which would allow us to more reliably extract the correct values.

Please do open a pull request once it's ready for that!

@jerrinss5
Copy link
Contributor

@molepigeon thanks for the feedback. Agree with defensive coding of using maps to extract the www-authenticate header, will make the necessary changes referring from docker/distribution.
As far as flow is concerned I am still a little confused when you say

In essence, rather than making the initial call to /v2/, we should make an unauthenticated request to the resource that we actually want to access
Say the registry in question is abc.azurecr.io

  1. Make a GET call to https://abc.azurecr.io
  2. If it returns a 401, call /v2 endpoint to fetch www-authenticate header
  3. If it returns a 404, no need to fetch the token?

If the above flow is expected, for azure registry when I try to GET call to https://abc.azurecr.io, I get back a 404, but there is auth required to access this registry.
How should we handle this scenario? Or what should the flow look like?

@molepigeon
Copy link
Member

molepigeon commented Apr 28, 2019

we should make an unauthenticated request to the resource that we actually want to access

So, if I wanted to access, for example, https://abc.azurecr.io/v2/foo/bar, the flow would be:

  1. I make a https://abc.azurecr.io/v2/foo/bar with no Authorization header
  2. If the response comes back as 2xx, that's our content and we're done - no auth necessary. If the response comes back as a 401 with a Www-Authenticate header, we continue. If the response is any other 4xx or 5xx, we fail out.
  3. We go and get the auth token using our credentials and the scope etc in the Www-Authenticate header.
  4. We go back to https://abc.azurecr.io/v2/foo/bar using the auth token in the Authorization header and get our content.

@jerrinss5
Copy link
Contributor

jerrinss5 commented Apr 28, 2019

Makes sense, this is lot more clearer I will implement the above mentioned changes and send a pull request. Thanks @molepigeon

@jerrinss5
Copy link
Contributor

jerrinss5 commented May 4, 2019

@molepigeon I was trying out your suggestion of hitting https://abc.azurecr.io/v2/foo/bar - but it leads to a 404.
Same for say https://quay.io/v2/coreos/hyperkube
So I was reading the spec of v2 and was wondering if we should hit
https://abc.azurecr.io/v2/foo/bar/manifests/1 which is same as https://abc.azurecr.io/v2/<image-name>/manifests/<tag>
On testing it,
For azurecr one (https://abc.azurecr.io/v2/foo/bar/manifests/1) it returned a 401
and for quay (https://quay.io/v2/coreos/hyperkube/manifests/v1.7.6_coreos.0) it returned a 200
Does that sound like the right implementation to you?

@molepigeon
Copy link
Member

@jerrinss5 I think you're going to the registry to get responses on those APIs. We should use the Notary API to be sure that we get the correct data back.

Notary's API is a little different to Registry's, but in short a sensible URL to call would be:
GET https://<notary-domain>:<port>/v2/<image_repository>/_trust/tuf/root.json

For example:
GET https://us.icr.io:4443/v2/us.icr.io/molepigeon/testimage/_trust/tuf/root.json

$ curl -vvv https://us.icr.io:4443/v2/us.icr.io/molepigeon/testimage/_trust/tuf/root.json 2>&1 | grep Www-Authenticate
< Www-Authenticate: Bearer realm="https://registry.ng.bluemix.net/oauth/token",service="notary",scope="repository:us.icr.io/molepigeon/testimage:pull"

Don't forget that the API could return something other than a 401 - for example, if you make an unauthenticated request to IBM's notary asking for a Docker Hub image, you'll get a 404 even without auth:

$ curl -vvv https://us.icr.io:4443/v2/docker.io/ubuntu/_trust/tuf/root.json 2>&1 | grep HTTP
> GET /v2/docker.io/ubuntu/_trust/tuf/root.json HTTP/1.1
< HTTP/1.1 404 Not Found

@jerrinss5
Copy link
Contributor

ok so I did implement the way you mentioned above to hit the tuf/root.json endpoint for respective endpoints and it is working fine for authenticated calls. changes
But how should we handle for docker.io for say nginx image where https://notary.docker.io/v2/docker.io/nginx/_trust/tuf/root.json returns a 401.
But when I do a pull from docker client it is able to fetch the signature information without the need to login.
Screen Shot 2019-05-07 at 3 52 42 PM

I tried stepping through the docker client code but was getting confused on how they are implementing.
Any help on how to address that, or I am doing something wrong with the implementation?

@molepigeon
Copy link
Member

@jerrinss5 I get 401 from that call, too. I don't have access to my laptop right now to confirm this, but I'd guess that you could go to Docker's oauth server (by following the realm) and ask for those scopes without supplying a username and password and you'd get a token back that would work for that request.

@molepigeon
Copy link
Member

molepigeon commented May 8, 2019

@jerrinss5 Just had a play with Docker Hub's auth server. I was struggling to get a valid token until I remembered that when you do docker pull nginx, it actually pulls docker.io/library/nginx. That should be the image name that comes into the function, so you don't need to worry about doing that translation in code, just don't forget that it's a thing when you're playing with the APIs by hand!

Also, Docker Hub seems to want you to use the GET (Registry V2 token) method for authenticating: https://docs.docker.com/registry/spec/auth/jwt/

... Rather than the POST (Oauth2) method: https://docs.docker.com/registry/spec/auth/oauth/. When I try to POST to auth.docker.io/token, I get a 405 Method Not Allowed error.

This worked for me using the GET endpoint, without supplying credentials:

$ curl -vvv https://notary.docker.io/v2/docker.io/library/nginx/_trust/tuf/root.json 2>&1 | grep Www-Authenticate
< Www-Authenticate: Bearer realm="https://auth.docker.io/token",service="notary.docker.io",scope="repository:docker.io/library/nginx:pull"
$ TOKEN=$(curl "https://auth.docker.io/token?service=notary.docker.io&scope=repository:docker.io/library/nginx:pull" 2>/dev/null | jq -r .token)
$ echo $TOKEN | cut -c 1-10 # expect it to start `ey` if it's a valid JWT
eyJhbGciOi
$ curl -H "Authorization: Bearer $TOKEN" https://notary.docker.io/v2/docker.io/library/nginx/_trust/tuf/root.json
{"signed":{...},...}

@jerrinss5
Copy link
Contributor

jerrinss5 commented May 8, 2019

so I did make changes to the code to fetch token from auth.docker.io with the required scopes and it is working fine with curl as you showed it above.
In code when i try to use the token i get the following error
trust.go:61] GetAllTargetMetadataByName returned err: you are not authorized to perform this operation: server returned 401

I will try to debug further to understand if the token I am fetching is correct in code.
EDIT: Just verified the token fetched in the code works out fine to fetch the root.json when tried with a curl.

@jerrinss5
Copy link
Contributor

@molepigeon I have tried this but running into trust.go:61] GetAllTargetMetadataByName returned err: you are not authorized to perform this operation: server returned 401
the token works fine when used with curl.
Do you know what I am doing wrong out here?

@molepigeon
Copy link
Member

It could be a bunch of things. I'd suggest putting some extra logging in so that you can tell which call returned 401 - whether it's the call to the oauth server or the call to the notary server.

If you get a token back from oauth, and then notary returns 401, your token might not have all the scopes you asked for. If you log the token, you can paste it into jwt.io to decode it and see what scopes you've been authorized for. If your token doesn't contain the scopes you asked for, the oauth server decided you're not allowed those for some reason, so it's worth double checking the oauth request.

@jerrinss5
Copy link
Contributor

@molepigeon the GetAllTargetMetadataByName call refers to the vendor tuf/notary call
It is working fine for azure notary server only the notary.docker.io flow is giving me the above mentioned error.
I did double check the scopes using jwt.io and it had all required scopes also using the same token with curl I was able to fetch root and target metadata.
Tried adding debugging statements to those vendor files but since it gets replaced inside a container I am not able to see those debug statements.
Any suggestions on how to get those debug statements down to the vendor files?

@molepigeon
Copy link
Member

You can remove the make build-deps line from the dockerfile to prevent it from overwriting your vendoring while you debug.

So - you used your code to get a token, then that token works for a curl but not for the request that the code makes? If that's right, there must be something different about the request that the code makes compared to your curl.

I notice your code put a special case in for library... Looks like that would make the challenge contain library, which is correct for docker hub public images. But does the code then go and ask Notary for the image with library in it? If not, that might be why you're getting a 401.

@jerrinss5
Copy link
Contributor

So - you used your code to get a token, then that token works for a curl but not for the request that the code makes?

Yes, when it is an image from docker.

But does the code then go and ask Notary for the image with library in it?

Do you mean this part - Calling: https://auth.docker.io/token?service=notary.docker.io&scope=repository:docker.io/library/nginx:pull
Code: glog.Info("Calling: " + oauthEndpoint + "?service=" + service + "&scope=" + scope) resp, err = client.Get(oauthEndpoint + "?service=" + service + "&scope=" + scope)
Code

@molepigeon
Copy link
Member

I mean once we have a token, we use it by adding it to the Authorization header in a call to the notary server, right? I think that bit is handled by the notary client. Does that call include the library part?

@jerrinss5
Copy link
Contributor

jerrinss5 commented May 12, 2019

Does that call include the library part?

That's a good point, I think i missed that part.
I will add that and also add debug statements to the notary client to see what's going wrong!

@jerrinss5
Copy link
Contributor

@molepigeon it started working when I added the library for notary client.
Now i have to figure how to handle the official images vs non official docker hub images.
Snippet from notary website.

For official images (identifiable by the “Official Image” moniker), the image name as displayed on Docker Hub, prefixed with docker.io/library/. For example, if you would normally type docker pull ubuntu you must enter notary {cmd} docker.io/library/ubuntu.
For all other images, the image name as displayed on Docker Hub, prefixed by docker.io

@jerrinss5
Copy link
Contributor

@molepigeon - So I got it working as such with these changes.

Handled official part as follows:

official := !strings.ContainsRune(img.NameWithoutTag(), '/')

Do let me know if these look good, I have yet to create test cases though.

@jerrinss5
Copy link
Contributor

@molepigeon can we have portieris with image pull secrets assigned to it, so in case of pods without any image pull secrets we can try image pull secrets of the portieris app as the default one. This is so that we can handle scenarios where images are pulled using the docker config within the host of the worker nodes?

@jerrinss5
Copy link
Contributor

so go it working.. still working on helm charts and test cases
But the changes are as follows: master...jerrinss5:master

@mistermania
Copy link

@molepigeon @jerrinss5 - I'm running Portieris in an EKS Cluster with a Harbor Registry and got everything working until the query to oauth/token which obviously is not a valid endpoint.
So, do you plan on merging the PR associated to this issue or is there something left to do ?

@jerrinss5
Copy link
Contributor

i have sent a PR @mistermania

@sureshthomas
Copy link

Thanks @josephdrichard That seems to be the issue - where do you need to include ?. Can you explain a bit more - Sorry being stupid

@josephdrichard
Copy link
Contributor

I added a line in https://github.com/IBM/portieris/blob/master/helm/portieris/templates/secret.yaml after defining serverCert.pem and serverKey.pem, to also include my ca.pem.

@sureshthomas
Copy link

Error from server (InternalError): error when creating "****.yaml": Internal error occurred: failed
I have added the conditional statements in the secret.yaml . Same error

calling webhook "trust.hooks.securityenforcement.admission.cloud.ibm.com": Post https://portieris.authentic.svc:443/admit?timeout=30s: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "portieris_ca")

@josephdrichard
Copy link
Contributor

Can you check your logs again? Do you still see this?
CA not provided at /etc/certs/ca.pem, will use default system pool

Do you have this commit in your code?
d94edf3

@sureshthomas
Copy link

Incidentally, Portieris shouldn't need to interact with the Registry at all - it should only be talking to Notary (and the oauth server), hence my comment above about needing to change the hostname that the OAuth helper thinks it's going to before we can use that to find out where to go for auth. Perhaps fixing the fact that we're going to the wrong place for the OAuth token would resolve both symptoms here...

Why the above cant be done ? Shouldnt Portieris shoud only be worried about notary as the Imagepull secret is used by kube api to pull image ?

@molepigeon
Copy link
Member

It can - and as I suggested in your quote I suspect that that the PR that exists already will resolve this since the Notary server should tell Portieris where the oauth server is.

The imagepullsecret is used to authenticate with Notary to access the signature, so Portieris will still need to use it.

@st185229
Copy link

Thanks @molepigeon , what's preventing PR to be approved ? Is it just waiting for the test mentioned by @josephdrichard ? I am about to finalize a solution and thought this is the best rather than implementing on admission web hook (either mutating of validating) . The next bet could be Open Policy Agent.

@molepigeon
Copy link
Member

My attempt to rebase #82 went really poorly and I'll need to start over when I have time. The blocker is resolving the merge conflicts and getting it to pass CI.

@jerrinss5
Copy link
Contributor

@molepigeon - should i send a PR for this minus few test cases i am struggling with? https://github.com/jerrinss5/portieris/commits/master

@molepigeon
Copy link
Member

Sure. Please make sure that your branch is up to date with master though, #82 cherry picked your commits but it's way behind master and has merge conflicts.

Regarding those tests, I'm not sure if the extra commits that the author of #82 added to that PR would help to resolve your test issues. Maybe worth a look.

@jerrinss5
Copy link
Contributor

Submitted this #139. Will check #82 for test cases. Thanks!

@josephdrichard
Copy link
Contributor

Rebased and tested this. Requires a change in order to handle registries on non-default port.
https://github.com/josephdrichard/portieris/commit/ae9e9057e8965a314adfa90a7d24bf6c7ca02c0c
Aside from that, required a few small changes as part of the rebase.
Are you still working on this? I am hoping to get a fix in as soon as possible so I can use this.
What timeline are you hoping to get this merged in? Is there anything I can do to help get this merged in faster?

@jerrinss5
Copy link
Contributor

@josephdrichard - can you help me with the test cases? I haven't looked at it yet!

@josephdrichard
Copy link
Contributor

Okay. I am looking at those 5 test cases now.

@josephdrichard
Copy link
Contributor

I got them all to pass here:
https://github.com/josephdrichard/portieris/tree/issue_51_handle_port

Most were just failing because it was detecting the notary was invalid sooner. Also one failure with us.icr.io/hello being treated as official, and then being changed to docker.io/library/hello.

Can you look and see if there is anything else before your fix can be merged?

@jerrinss5
Copy link
Contributor

thank you so much @josephdrichard - i will have a look at it tomorrow evening and merge your changes. Will keep you posted.

@josephdrichard
Copy link
Contributor

Okay thanks. Is there anything else blocking this fix from going into master?

@jerrinss5
Copy link
Contributor

@josephdrichard - i have added your changes to my code. (jerrinss5@9d081ee) and the merge request updated out here (#139)

Previously it was mentioned that we should use fake registry for test cases, but lets see on review if the authors have additional comments on the test cases.

Let me know if something is off with my commits. Thanks.

@josephdrichard
Copy link
Contributor

Looks good to me. In my opinion there isn't any need to block this on using a fake registry for the test cases, so I hope that it can merge soon.

starlingx-github pushed a commit to starlingx/portieris-armada-app that referenced this issue Jul 23, 2020
Specifically, these two issues are blocking using portieris app with
starlingx:
 * IBM/portieris#51
 * IBM/portieris#59

Once these fixes are included in a published release of portieris,
drop this patch and rebase to the latest release.

Story: 2007348
Task: 40434
Change-Id: I14d0e5664333c5080440b9fd156c66a317444563
Signed-off-by: Joseph Richard <joseph.richard@windriver.com>
@sr258
Copy link

sr258 commented Jan 19, 2021

Any news on this in 2021? Compatibility of Portieris with Docker Hub and Harbor would be great!

@sjhx
Copy link
Member

sjhx commented Jan 20, 2021

@sr258 no specific commitments, but it is on the radar

@sjhx
Copy link
Member

sjhx commented Jun 28, 2022

fixed in v0.13.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests