Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reload certificate configuration at runtime #1799

Merged
merged 3 commits into from Nov 7, 2016

Conversation

nickbabcock
Copy link
Contributor

@nickbabcock nickbabcock commented Nov 3, 2016

Recent Jetty releases have allowed us to reload ssl/tls information at runtime, so it's no longer necessary to restart Dropwizard to install a new certificate. More information can be found: jetty/jetty.project#918

The motivation behind this PR is that as certificate management starts to be more automatic (looking at Let's Encrypt) and certificates are issues more frequently, the downtime caused by new certificates starts to become palpable. Yes, the service may only be down for a few moments, but any reductions in downtime should be welcomed.

Those who use a TLS termination proxy in front of dropwizard (or use http endpoints) will not be affected.

I implemented a quick check to ensure it's working

keytool -genkey -keyalg RSA -alias selfsigned -keystore keystore.jks -storepass password -keysize 2048

# add to config.yaml
# start dropwizard

rm keystore.jks
keytool -genkey -keyalg RSA -alias selfsigned2 -keystore keystore.jks -storepass password -keysize 2048
curl -k -X POST 'https://localhost:8081/tasks/ssl-reload'

I can confirm that the certificate was indeed changed without needing a restart. 馃帀

I marked this PR as RFC (request for comments) to get a feel of opinions. This PR is not ready to be merged.

Questions:

  • determine where SslReloadBundle / SslReloadTask should live. Right now, they reside in the e2e module, which shouldn't be their final resting place 馃槤
    • Add SslReloadTask to dropwizard-servlets with other Tasks?
    • Add dropwizard-ssl-reload submodule?
    • Something else
  • is an admin task the right way to reload configuration? Other ideas could be a filewatcher for automatic reloading...
  • is hooking into jetty's lifecycle handler appropriate? I feel like it resulted in a rather concise and elegant solution (if maybe a bit shortsighted -- ie. too opinionated).
  • what should the behavior be if one of the new configurations is misconfigured? Stop reloading or continue with the rest?

Todos:

  • tests
  • documentation

@arteam
Copy link
Member

arteam commented Nov 3, 2016

I think it's a very good improvement to Dropwizard from the operational side!

determine where SslReloadBundle / SslReloadTask should live. Right now, they reside in the e2e module, which shouldn't be their final resting place :stuck_out_tongue_closed_eyes:
    Add SslReloadTask to dropwizard-servlets with other Tasks?
    Add dropwizard-ssl-reload submodule?
    Something else

I think we could use a new module for that. This will allows users to disable this feature, if they don't want to expose the ability to change SSL certificates on the fly.

is an admin task the right way to reload configuration? Other ideas could be a filewatcher for automatic reloading...

We now support POST requests for admin tasks, so it's doable to use this approach. I like it, because
it requires explicit action from the administrator. A filewatcher is also an option, but it requires the administrator to put certificates to disk, which creates additional overhead for managing them.

is hooking into jetty's lifecycle handler appropriate? I feel like it resulted in a rather concise and elegant solution (if maybe a bit shortsighted -- ie. too opinionated).

Looks good to me. We use Jetty's lifecycle in other bundles and it works well. I think we can use it,
unless we have reasons not to.

what should the behavior be if one of the new configurations is misconfigured? Stop reloading or continue with the rest?

I think we should try to provide an atomic way to update all configurations with new certificates. I think it would be a nightmare to debug a server with a partially updated configuration.

@nickbabcock
Copy link
Contributor Author

nickbabcock commented Nov 4, 2016

Ok I'll cave to adding the submodule. My initial plan was not to add a submodule because many dropwizard submodules that users have to manually import have additional third party dependencies, mustache, forms, hibernate, client, etc. Since certificate reloading doesn't require additional dependencies I wasn't sure if another submodule was appropriate, but I'll move the classes to dropwizard-ssl-reload (or another name, if preferable)

On atomic updates. Good point. It's entirely plausible that the keystore could have been swapped out for one with a different password, causing the reload to fail. A failure to reload is tragic. No future connections are allowed (I tested this). Like you alluded to, it's a scary thought to hot reload a certificate and break all functionality.

But on one hand, if the app was restarted, it would break as well. The app wouldn't start. If we could fallback to the previous configuration we could keep the app chugging along, but at the cost it failing at the next restart. It's debatably a reasonable response. A 500 status code could inform the admin to examine the new certificate information before the next reboot that would force usage of the new certificate.

Since an SslContextFactory doesn't have a way to clone itself, I purpose we first test the configuration is valid by constructing and loading a new instance of SslContextFactory and only if all instances succeed, do we reload the real SslContextFactorys.

EDIT: This post was me thinking about loud, no need to respond to anything 馃槤

@jplock
Copy link
Member

jplock commented Nov 4, 2016

I'd probably favor keeping this in the existing jetty module and have a configuration option to enable or disable it (disabled by default).

@evnm
Copy link
Member

evnm commented Nov 4, 2016

This is cool. Thanks, @nickbabcock!

I agree with @jplock. Including the feature in dropwizard-jetty and disabling it by default would cover the bases of 1) not introducing unexpected new behavior, 2) making it easy to enable the feature, and 3) avoiding additional dependencies.

In order to help avoid the case where an invalid cert silently fails upon hot reload, perhaps we could add an onFailure callback as an (e.g. a Consumer<Throwable>) instance field and constructor argument to SslReloader. This callback would be triggered upon hot-reload failure and could, for instance, emit log statements or metrics which an operations team could wire into alerting.

Anybody have opinions on the Task vs filewatcher approaches? I don't have a strong opinion one way or the other, but @nickbabcock's Task implementation seems more idiomatic, arguably being akin to administrative tasks like GarbageCollectionTask.

@jplock
Copy link
Member

jplock commented Nov 4, 2016

I like the task approach for triggering the reload

@nickbabcock
Copy link
Contributor Author

Unfortunately none of the classes can go in dropwizard-jetty as Task and Bundle are defined in modules that import dropwizard-jetty. I'll play with it some more.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.4%) to 81.95% when pulling 747a3d6 on nickbabcock:reload-ssl into 0870401 on dropwizard:master.

@nickbabcock
Copy link
Contributor Author

Is it awkward that I put the code in dropwizard-core? It's the only module that pulls in jetty for SslContextFactory, dropwizard-servlets for Task, core for Bundle. I'm still open to dropwizard-ssl-reload as well.

Also looks like coveralls doesn't do cross module code coverage, so my tests in e2e (which allows the most realistic and integrated testing) count against our code coverage 馃槶

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.4%) to 81.95% when pulling 5d509bc on nickbabcock:reload-ssl into 0870401 on dropwizard:master.

@jplock
Copy link
Member

jplock commented Nov 5, 2016

I think I'm core is fine to me

@nickbabcock nickbabcock changed the title [RFC] Reload certificate configuration at runtime Reload certificate configuration at runtime Nov 7, 2016
@nickbabcock
Copy link
Contributor Author

Added the documentation.

PR is in a pretty good spot, but additional reviews/comments/suggestions welcomed 馃槃

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.4%) to 81.95% when pulling 0448de9 on nickbabcock:reload-ssl into 0870401 on dropwizard:master.


@Override
public void initialize(Bootstrap<HelloWorldConfiguration> bootstrap) {
bootstrap.addBundle(new SslReloadBundle("/assets/", "/"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the assets path in here needed or a cut and paste issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a cut-and-paste-with-unsaved-changes-that-fixed-this-issue 馃槥

fixed.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.4%) to 81.95% when pulling 04f2a6e on nickbabcock:reload-ssl into c48bc04 on dropwizard:master.

@jplock
Copy link
Member

jplock commented Nov 7, 2016

@nickbabcock do you want to update the release notes? Then I think this is good to merge. Great job! 馃憤

@jplock jplock added this to the 1.1.0 milestone Nov 7, 2016
@nickbabcock
Copy link
Contributor Author

Cleaned up the commit history and made this PR first in the release notes because it seems like something we might want to highlight 馃槇

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.4%) to 81.95% when pulling 1a52d6d on nickbabcock:reload-ssl into c48bc04 on dropwizard:master.

@jplock jplock merged commit bd388e9 into dropwizard:master Nov 7, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants