Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Register critical services in consul. #553

Merged
merged 6 commits into from
Apr 13, 2018
Merged

Register critical services in consul. #553

merged 6 commits into from
Apr 13, 2018

Conversation

eliasbrange
Copy link
Contributor

Fixes #552

This PR adds the functionality to register services in consul that never gets healthy to be able to filter on failed services in consul and see that they are misbehaving.

I've added a method for registering a service with the status set to critical. This is called in onHealthChechFailed, but only if the service hasn't already been registered. I've also made some minor refactoring in services/services.go, i.e. having MarkForMaintenance call Deregister instead of the other way around.

jobs/jobs.go Outdated
}

// sendHeartbeat sends a heartbeat for this Job's service
func (job *Job) sendHeartbeat() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured since the tests pass this isn't a problem but what were your thoughts on changing this from an exported method? Job.SendHeartbeat => Job.sendHeartbeat?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a hard time seeing how that method will be used outside of the job itself, but it's a small change so I can change it back if you'd like.

@jwreagor
Copy link
Contributor

jwreagor commented Apr 2, 2018

Overall I think this is a great change. I can definitely imagine services that never come up to be critical state yet still registered in a service catalog. One question in the review otherwise I think we can include this in a next minor bump.

@jwreagor
Copy link
Contributor

jwreagor commented Apr 2, 2018

Actually have a few more thoughts I'll post in the issue.

@eliasbrange
Copy link
Contributor Author

Thanks for the feedback, I will try to address those thoughts in the issue.

@misterbisson
Copy link
Contributor

@eliasbrange I'd love your feedback on @cheapRoc's suggestion about initial_status in #552, and if that works we'd also need to update the docs in this PR to correspond with the change .

Copy link
Contributor

@jwreagor jwreagor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some renaming otherwise the implementation details look sound. I'll test the full version before merging.

@@ -72,7 +101,7 @@ func (service *ServiceDefinition) registerService() error {
EnableTagOverride: service.EnableTagOverride,
Check: &api.AgentServiceCheck{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After looking up that this is using Consul's api.AgentServiceCheck I believe the configuration option should be on the job level and not within the health check. Optionally, we could expose the same initial_status option for all health checks independently... but I think that should be a future enhancement warranted from user requests for the feature.

jobs/config.go Outdated
@@ -24,6 +24,7 @@ type Config struct {

// service discovery
Port int `mapstructure:"port"`
InitialStatus string `mapstructure:"initialStatus"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's call it initial_status (snake case), I don't believe we use lower camel case anywhere else. In Go we'll continue to use InitialStatus.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stopTimeout uses lower camel case. But I'll switch to initial_status

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, you're right.

jobs/jobs.go Outdated
// SendHeartbeat sends a heartbeat for this Job's service
func (job *Job) SendHeartbeat() {
// sendHeartbeat sends a heartbeat for this Job's service
func (job *Job) sendHeartbeat() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename sendHeartbeat back to exported method SendHeartbeat.

@@ -159,6 +166,9 @@ func (job *Job) Run(pctx context.Context, completedCh chan struct{}) {
completedCh <- struct{}{}
}()
for {
// Check if job's service has been registered. Doing it inside the event
// loop to retry if consul registration fails.
job.checkRegistration()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks handy all by itself.

},
{
name: "serviceB",
initialStatus: "invalid-value",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to initial_status here.

@jwreagor
Copy link
Contributor

Thank you again @eliasbrange for your continued support and patience.

@eliasbrange
Copy link
Contributor Author

Reverted SendHeartbeat change and renamed to initial_status @cheapRoc .

@jwreagor
Copy link
Contributor

I tested this PR with the following (rather convoluted) test rig using Fabio, http-echo, and the following configuration file.

{
  consul: "localhost:8500",
  logging: {
    level: "DEBUG",
    format: "text"
  },
  jobs: [
    {
      name: "foo",
      exec: "http-echo -listen=:8484 -text=hello",
      initial_status: "warning",
      tags: ["urlprefix-/foo"],
      port: 8484,
      health: {
        exec: "/usr/bin/curl -o /dev/null -sfI http://localhost:8484/",
        interval: 60,
        ttl: 60,
      },
    },
  ],
  control: {
    socket: "/tmp/cp.socket"
  }
}

I was able to verify that the Consul service check was initially set to warning status.

screen shot 2018-04-12 at 10 54 26 pm

With the health check set to 60 seconds I was able to wait and see the health check passing.

screen shot 2018-04-12 at 10 54 53 pm

Once again, I want to thank you for a wonderful addition to ContainerPilot @eliasbrange!

@jwreagor jwreagor merged commit 4b550bf into TritonDataCenter:master Apr 13, 2018
@eliasbrange
Copy link
Contributor Author

Thanks for including it @cheapRoc. Will make it much easier for our monitoring!

@jwreagor jwreagor mentioned this pull request Apr 26, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants