This repository has been archived by the owner on Feb 10, 2022. It is now read-only.

Support logging in to a private registry before launch #104

Closed
wants to merge 1 commit

Conversation

DRuggeri commented Feb 2, 2018

Simple patch that adds a few attributes to the job's properties to support logging in to a private registry.

cfdreddbot commented

Hey DRuggeri!

Thanks for submitting this pull request! I'm here to inform the recipients of the pull request that you and the commit authors have already signed the CLA.

cf-gitbot commented

We have created an issue in Pivotal Tracker to manage this. Unfortunately, the Pivotal Tracker project is private so you may be unable to view the contents of the story.

The labels on this github issue will be updated when the story is started.

drnic (Contributor) commented Feb 8, 2018

@DRuggeri I have two thoughts - 1) perhaps move this into the docker job so it's of benefit to all users, not just people using the containers job; 2) perhaps change login_endpoint to docker_endpoint, etc?

DRuggeri (Author) commented Feb 8, 2018

Thanks for the feedback, @drnic. No problem with naming it docker_endpoint. For the job location, I guess it could go either way. I placed it in the containers job just to be closer to the containers that get launched and to make it obvious that this login is related to this container.

I have no problem moving it to the docker job - I think the following would work well there:

properties:
  logins:
  - endpoint: https://foo
    username: usr1
    password: pass1
  - endpoint: https://bar
    username: usr2
    password: pass2
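
For reference, the matching declaration in the docker job's spec file might look like this (a sketch; the description text and empty default are assumptions, though the diff further down does add a logins property):

properties:
  logins:
    description: "Registries to log in to before launching containers, each with endpoint, username, and password"
    default: []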

The one thing I wonder... with it being in the containers job, we would do the login pretty much right before the image would be pulled. This gives us a high degree of confidence that the login token is valid. I think it may be possible that a token obtained right after docker daemon start might expire when the containers job attempts to use it at some later time.

For example...
When the instance is first laid out at hour 0, the docker login occurs in the ctl script for the docker job, and the containers job starts and uses the logged-in accounts. Life is good.
Say that at hour expiry+1, an update to the containers job occurs and a new container must start. If the authentication token has expired, this container would fail to pull and start.

... does that sound like a realistic issue?

drnic (Contributor) commented Feb 8, 2018

Gotcha on the issue of docker login timing out after expiry + 1.

Thought: I'm assuming that every occasion on which the containers job (or any other job) starts and assumes the docker job has very recently run docker login would come from monit restarting all jobs sequentially. In that case, the docker job will always have been recently restarted and docker login will have been run.

Is there any other occasion on which a bosh job would want to use docker, outside of monit restart/bosh restart/bosh deploy/resurrector?

DRuggeri (Author) commented Feb 8, 2018

I'm not sure if any other job would want to use docker (job or command? I assume you mean command). At the least, we know that the containers job uses it which is why a login in the docker job would be enough to permit a pull of the image from the containers job. However, it's probably not safe to assume the containers job is the only consumer of the docker daemon as laid down by the docker job.

Thanks for explaining your assumptions - this is where I'm seeking a bit of guidance. My assumption is that a change only to the containers job would cause bosh to update only that job, which would cause monit to trigger the ctl script for the containers job only. If the token from the docker job's login has expired by that time, such a restart would fail to pull from the registry.

So, I guess that's the only question that's plaguing me - is it safe to assume that the docker job will have been started or restarted just before the container job? If so, we're golden and I'll go ahead and move it. If not, maybe a sane response would be to add login capabilities to both jobs to cover all bases?

drnic (Contributor) commented Feb 8, 2018 via email

DRuggeri (Author) commented Feb 12, 2018

Thanks, @drnic. Try as I did, I couldn't get it to fail the way I thought it would. I've modified the PR as suggested to do the login in the docker job instead of the containers job.

Any additional feedback is greatly welcomed.

done

# Log in to any registries
<% p('logins', []).each do |login| %>
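
Presumably the loop body continues along these lines (a sketch, not the exact diff; the flags are the standard docker login CLI options):

docker login --username '<%= login['username'] %>' --password '<%= login['password'] %>' '<%= login['endpoint'] %>'
<% end %>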
drnic (Contributor) commented:

Sorry for not thinking of this before as we were discussing different aspects of the PR - I think this docker login setup might go well in a pre-start hook: https://bosh.io/docs/post-start.html

As a bonus, we then don't need to change the exec dockerd as above.

drnic (Contributor) commented:

BTW, I'm sorry for suggesting subsequent changes. I know it annoys me when someone doesn't just press the "Merge PR" button ;)

DRuggeri (Author) commented:

No problem - what's right is right so we should just aim for that.

I can do that. I hope to have an update in later today.

DRuggeri (Author) commented:

So things turned out a little funky when moving this to post-start. When running in combination with the containers job, the start of containers fails because it "depends" on the post-start of the docker job having succeeded. Bosh won't move on to post-start hooks if any of the start hooks fail, so we never get to the point where the login happens. Basically, a circular dependency...

I think we may be stuck with this logic in the start hook of the docker job after all.

@@ -132,3 +132,10 @@ properties:
   store_dir:
     description: "Path to use as the root of the Docker runtime"
     default: "/var/vcap/store"
 
+  logins:
drnic (Contributor) commented:

This looks simple and nice.

drnic (Contributor) commented Feb 13, 2018 via email

glestaris commented

hey @DRuggeri, sorry for taking so long to respond. The team is uncomfortable with this change, particularly the nohup part of it. We realised that testing it in CI would be fairly complicated, so we are not going to pursue this feature currently.

Closing for now but we can keep the conversation going and re-open if there is another potential solution.

@glestaris glestaris closed this May 15, 2018
bgandon commented May 15, 2018

I'm watching this repo and I'm jumping in because I was curious about this nohup question.

After doing some testing around nohup and what happens to a background child process when its parent exits, I agree that the nohup is useless. Indeed, there is no way a SIGHUP would be sent to the dockerd child process by the time the ctl script ends its execution.

There is no reason why we would need to protect the dockerd daemon from receiving SIGHUP signals, either.

And when the ctl script ends its execution, its background dockerd child just keeps running. monit is happy anyway because of the --pidfile ${DOCKER_PID_FILE} argument passed to dockerd; monit just needs a PID file to track.

So, I confirm the nohup is useless, but for the rest of this PR, I'm comfortable with the change. monit does not require daemons to be exec-ed. They can be forked as background jobs as well, provided that the PID file is properly created. monit doesn't mind whether the ctl script ends or not; it supports both cases.
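
To illustrate the two ctl-script patterns being compared (a sketch only, not the release's actual script; only the --pidfile argument is taken from the discussion above):

# Pattern 1: exec -- the ctl script's process becomes dockerd, so
# nothing else (such as a docker login) can run after this line:
#   exec dockerd --pidfile ${DOCKER_PID_FILE}

# Pattern 2: fork -- dockerd runs in the background and writes the
# PID file that monit tracks; the ctl script can keep going (e.g. to
# run docker login) and then exit without taking dockerd down:
dockerd --pidfile ${DOCKER_PID_FILE} &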

But there's one problem left.

When @DRuggeri asks:

is it safe to assume that the docker job will have been started or restarted just before the container job?

He's right, the question is relevant. Co-located jobs are started in no particular order by BOSH, so no one should assume that any job starts before the others.

That's why this “login” feature should be implemented in the containers job. And there the pre-start script would be a good fit.

@DRuggeri, are you OK with moving the docker login code to the pre-start script of the containers job?
In this pre-start script you'll also need to wait for the dockerd daemon to be up and running. In this wait loop, you'll need to implement a timeout so as not to wait forever, and just bail out with an error (i.e. a non-zero exit code) when no Docker daemon shows up within a reasonable time. You'll need this timeout to be configurable, in order to support slow infrastructures.
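
A minimal sketch of such a pre-start script, assuming a hypothetical configurable login_timeout property (the property name and its default are assumptions):

#!/bin/bash
set -e

# Hypothetical property controlling how long to wait for dockerd.
timeout=<%= p('login_timeout', 300) %>

# Wait for the Docker daemon, bailing out with a non-zero exit code
# if it does not show up within the timeout.
elapsed=0
until docker info >/dev/null 2>&1; do
  if [ "${elapsed}" -ge "${timeout}" ]; then
    echo "dockerd did not come up within ${timeout}s" >&2
    exit 1
  fi
  sleep 1
  elapsed=$((elapsed + 1))
done

# ...then run the docker login loop sketched earlier.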

DRuggeri (Author) commented

Sorry for the super slow reply. It's been hectic.

@glestaris, I'm good with removing nohup and doing some testing around that based on the reply from @bgandon. No problem... so long as it scratches the itch of using a private registry in the long run. If you can confirm the nohup weirdness is the only concern, I can give it a whirl.

Regarding @bgandon's comments about flipping this over to pre-start, I'm not sure that would work, since we cannot log in to the registry until after the docker job starts. The job lifecycle docs point out that all of the pre-start scripts are run and waited on before moving to the next phase (calling the start scripts).

I think a simple, happy medium could be to move the login logic to the containers job's start script, implementing the same wait logic there for the daemon to come up. An added side benefit is that doing so avoids some of the noise in the logs when the containers job tries to start things but the daemon is not yet up. The downside, as @drnic points out, is that only the containers job benefits.

Thoughts? I am very motivated to see this make it upstream... if for no other reason than so I don't have to maintain a custom patch 😁.

owwweiha commented

@DRuggeri it would be really great to have this feature upstream... I would really need it ;)

DRuggeri (Author) commented

@Onke, agreed, but it doesn't seem like the maintainers are too interested. Still awaiting feedback.
