Portus freezes when trying to get manifest after image push #373

Closed
morodin opened this Issue Sep 30, 2015 · 7 comments

@morodin
morodin commented Sep 30, 2015

We use Portus in a Docker environment. Both our registry and Portus use TLS, and we do not use any proxy in addition to that set-up. The code we use is the current (15/09/30) master from GitHub.

Whenever we try to push an image, Portus freezes in the function get_response_token, at the Net::HTTP.start call on line 140. The start call never returns. The trace up to this point (with added debugging messages) is:

Registry client called, host=dregistry.example.com:5000, use_ssl=true, username=
Registry client manifest, repoistory=userx/busybox6, tag=sha256:4c537efbcc9382e8d6486b5185279f4c902532d4af3be707cff1a1cfb1bc01d8
perform_request called, uri=https://dregistry.example.com:5000/v2/userx/busybox6/manifests/sha256:4c537efbcc9382e8d6486b5185279f4c902532d4af3be707cff1a1cfb1bc01d8
perform_request: before calling get_response_token https://dregistry.example.com:5000/v2/userx/busybox6/manifests/sha256:4c537efbcc9382e8d6486b5185279f4c902532d4af3be707cff1a1cfb1bc01d8
get_response_token: uri=dregistry.example.com,5000, use_ssl=true
get_response_token: inside http.start
get_response_token: req done: #<Net::HTTP::Get:0x007fca97f9dcd8>
perform_request: after calling get_response_token res.code= 401
perform_request: 401
perform_request: calling request_auth_token
request_auth_token called, bearer_real=https://dregistry.example.com:3000/v2/token, query={"service"=>"dregistry.example.com:5000", "account"=>"portus", "scope"=>"repository:userx/busybox6:pull"}, uri=https://dregistry.example.com:3000/v2/token?account=portus&scope=repository%3Auserx%2Fbusybox6%3Apull&service=dregistry.example.com%3A5000
request_auth_token called, credentials found
request_auth_token: before calling get_response_token
get_response_token: uri=dregistry.example.com,3000, use_ssl=true

If we change the function a little bit and add http.open_timeout, it returns with the following error:

Could not fetch the tag for target {"mediaType"=>"application/vnd.docker.distribution.manifest.v1+json", "size"=>6709, "digest"=>"sha256:4c537efbcc9382e8d6486b5185279f4c902532d4af3be707cff1a1cfb1bc01d8", "length"=>6709, "repository"=>"userx/busybox6", "url"=>"https://dregistry.example:5000/v2/userx/busybox6/manifests/sha256:4c537efbcc9382e8d6486b5185279f4c902532d4af3be707cff1a1cfb1bc01d8"}
Reason: execution expired
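
For readers who want to reproduce the timeout experiment, here is a minimal sketch of setting timeouts on a Net::HTTP client. It is not the actual get_response_token code from Portus; the helper names `build_http` and `fetch` and the timeout values are assumptions made for illustration.

```ruby
require "net/http"
require "uri"

# Hypothetical helper (not the Portus code): build a Net::HTTP client with
# explicit timeouts so that a stalled peer raises instead of blocking forever.
def build_http(url, open_timeout: 5, read_timeout: 10)
  uri = URI(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == "https")
  http.open_timeout = open_timeout # seconds allowed for opening the connection
  http.read_timeout = read_timeout # seconds allowed for each read from the socket
  http
end

# Hypothetical helper: perform a GET and return nil when a timeout fires.
def fetch(url, **timeouts)
  uri = URI(url)
  build_http(url, **timeouts).start do |http|
    http.request(Net::HTTP::Get.new(uri))
  end
rescue Net::OpenTimeout, Net::ReadTimeout => e
  warn "request to #{uri.host}:#{uri.port} timed out: #{e.message}"
  nil
end
```

A timeout only turns the hang into an error. It does not explain why the other side never answers.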

Even more strange, when we call the manifest function from the rails console:

myClient=Portus::RegistryClient.new("dregistry.example:5000",true)
myClient.manifest("userx/busybox6","sha256:c3102a0622d47e661b61208c19de53826fc725f9840b915ad1afd17ee5be6f70")

... everything works as expected.

This problem was first mentioned in another issue: #338

@morodin
morodin commented Oct 2, 2015

I have additional information. I added set_debug_output to the request. It seems that the connection is established; however, http.start does not return. The following output is generated:

opening connection to dregistry.example.com:3000...
opened
starting SSL for dregistry.example.com:3000...
SSL established
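
A trace like the one above can be captured with Net::HTTP#set_debug_output. The following is a minimal sketch, not the code used in the report; the helper name `trace_get` and the timeout values are assumptions.

```ruby
require "net/http"
require "socket"
require "stringio"

# Hypothetical helper: run one GET request and return the Net::HTTP debug trace.
def trace_get(host, port, path, use_ssl: false)
  log = StringIO.new
  http = Net::HTTP.new(host, port)
  http.use_ssl = use_ssl
  http.open_timeout = 5
  http.read_timeout = 5
  # The debug sink has to be set before the connection is started.
  http.set_debug_output(log)
  http.start { |conn| conn.get(path) }
  log.string
end
```

The debug output contains the full request and response, including any Authorization header, so it should not be left enabled in production.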
@flavio flavio added this to the 2nd stable release of Portus milestone Oct 5, 2015
@mssola mssola self-assigned this Oct 5, 2015
@mssola
Contributor
mssola commented Oct 5, 2015

Ok, after debugging the problem, it seems like it's quite a straightforward problem and it's not really Portus' fault. I was using a config in which Portus was behind NGinx with SSL and my registry ran on its own with SSL too. The thing is that when Portus gets a notification, it will make a GET request to the registry in order to get the tag that has been pushed (we didn't do that before distribution v2.1, but it's required now). Anyway, in order to do that, you need a free SSL connection in place, and in this setup you don't have one, because it's already busy handling the registry -> Portus web event. Thus, in this case Portus didn't hang in an infinite loop; rather, it waited for the connection to be free. In my case I had to wait until the connection timed out, and then Portus could fetch the manifest. Even in this case, the tag would not be pushed into Portus' DB because the first connection timed out, and therefore errored.

The solution, therefore, is to either use the passenger config that we are using in the rpm that we build (and if I recall correctly, we didn't write any specific config for passenger to work here, it works out of the box :D), or to come up with a configuration for NGinx or another server that respects that. For example, an approach that worked for my NGinx config was the one described here. Basically, in this config you'll end up using two sockets with Thin, instead of one single connection.
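
The behaviour described above can be reproduced in isolation. The sketch below is not Portus code and involves no TLS. It is a toy server in which handling "/notify" triggers a request back to "/token" on the same server, which is one reading of the trace in this thread (the stalled call goes to the token endpoint on port 3000 while the notification is still being handled). The names `handle` and `notify_result`, the paths, and the one-second timeouts are all assumptions.

```ruby
require "net/http"
require "socket"

# Hypothetical stand-in for the application: "/notify" calls back into
# "/token" on the same server before it can answer.
def handle(client, port)
  request_line = client.gets.to_s
  # Drain the remaining request headers.
  while (line = client.gets) && line != "\r\n"; end
  body =
    if request_line.include?("/notify")
      begin
        http = Net::HTTP.new("127.0.0.1", port)
        http.open_timeout = 1
        http.read_timeout = 1
        http.get("/token").body
      rescue Net::OpenTimeout, Net::ReadTimeout
        "timeout" # nobody was free to serve the nested request
      end
    else
      "token"
    end
  client.write("HTTP/1.1 200 OK\r\nContent-Length: #{body.bytesize}\r\n" \
               "Connection: close\r\n\r\n#{body}")
rescue IOError, SystemCallError
  nil # the peer went away (for example after its own timeout)
ensure
  client.close
end

# Start a server with the given number of workers, send one notification,
# and return the body of the reply.
def notify_result(workers:)
  server = TCPServer.new("127.0.0.1", 0)
  port = server.addr[1]
  threads = Array.new(workers) do
    Thread.new { loop { handle(server.accept, port) } }
  end
  Net::HTTP.get(URI("http://127.0.0.1:#{port}/notify"))
ensure
  threads&.each(&:kill)
  server&.close
end
```

With a single worker, the nested request can only be served after the notification handler returns, so it runs into its timeout. With two workers, the second one answers the nested request while the first is still busy, which is roughly what running two Thin sockets achieves.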

Since this is tricky, I'll add a page on the wiki about deploying Portus.

Please, tell me if this solution works for you, and thanks for your patience :)

Update: I've already added it on the wiki: https://github.com/SUSE/Portus/wiki/Installing-Portus#known-issues

@morodin
morodin commented Oct 6, 2015

Thanks for the excellent work solving this problem. I can confirm that the problem is solved for us. We can now successfully push an image to the registry, and all information is gathered without Portus freezing.

The configuration for running this in Docker is now quite extensive. Shall I document it somewhere, or send it to you so that you can see which of the configuration settings seem relevant for other users?

@flavio
Member
flavio commented Oct 6, 2015

@morodin yeah, please share the document with us. We would be glad to add more information to our wiki

@mssola
Contributor
mssola commented Oct 6, 2015

@morodin good to know! I'll close this issue, but it would be great if you could share your configuration with us, so we can extend the wiki. Thanks!

@mssola mssola closed this Oct 6, 2015
@morodin
morodin commented Oct 8, 2015

I documented our configuration at https://github.com/morodin/Portus/wiki/Portus-SSL-Docker-Configuration. I would really appreciate any comments or corrections.

@mssola
Contributor
mssola commented Oct 8, 2015

@morodin wow, this is fantastic! I'll try to complete the wiki from our side later with some of the information and tips that you've provided in your wiki. I'll also reference your wiki page for completeness. Thanks!
