Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add OnRestartFailed to reload, health and prometheus plugins #2659

Closed
miekg opened this issue Mar 7, 2019 · 1 comment · Fixed by #2816
Closed

Add OnRestartFailed to reload, health and prometheus plugins #2659

miekg opened this issue Mar 7, 2019 · 1 comment · Fixed by #2816
Assignees

Comments

@miekg
Copy link
Member

miekg commented Mar 7, 2019

As @fturib wrote:

Thanks to the new "OnRestartFailed" event that was introduced few months ago in Caddy.
We can handle the failed restart (when the Corefile is invalid).

i guess we could have:

        c.OnStartup(r.onStartup)
	c.OnStartup(r.wait)
	c.OnRestart(r.onRestart)
         c.OnRestartFailed(r.onStartup)
	c.OnRestartFailed(r.wait)
	c.OnFinalShutdown(r.onFinalShutdown)
@miekg miekg self-assigned this Mar 13, 2019
miekg added a commit that referenced this issue May 4, 2019
Add OnReStartFailed which makes the health plugin stay up if the
Corefile is corrupt and we revert to the previous version.

Also needs a fix for the channel handling

See #2659

Testing it will log the following when restarting with a corrupted
Corefile

~~~
2019-05-04T18:01:59.431Z [INFO] linux/amd64, go1.12.4,
CoreDNS-1.5.0
linux/amd64, go1.12.4,
[INFO] SIGUSR1: Reloading
[INFO] Reloading
[ERROR] Restart failed: Corefile:5 - Error during parsing: Unknown directive 'bdhfhdhj'
[ERROR] SIGUSR1: starting with listener file descriptors: Corefile:5 - Error during parsing: Unknown directive 'bdhfhdhj'
~~~

After which the curl still works.

This also needed a change to reset the channel used for the metrics
go-routine which gets closed on shutdown, otherwise you'll see:

~~~
^C[INFO] SIGINT: Shutting down
panic: close of closed channel

goroutine 90 [running]:
github.com/coredns/coredns/plugin/health.(*health).OnFinalShutdown(0xc000089bc0, 0xc000063d88, 0x4afe6d)
~~~

Signed-off-by: Miek Gieben <miek@miek.nl>
@miekg
Copy link
Member Author

miekg commented May 4, 2019

ready and prometheus are next.

yongtang pushed a commit that referenced this issue May 4, 2019
Add OnReStartFailed which makes the health plugin stay up if the
Corefile is corrupt and we revert to the previous version.

Also needs a fix for the channel handling

See #2659

Testing it will log the following when restarting with a corrupted
Corefile

~~~
2019-05-04T18:01:59.431Z [INFO] linux/amd64, go1.12.4,
CoreDNS-1.5.0
linux/amd64, go1.12.4,
[INFO] SIGUSR1: Reloading
[INFO] Reloading
[ERROR] Restart failed: Corefile:5 - Error during parsing: Unknown directive 'bdhfhdhj'
[ERROR] SIGUSR1: starting with listener file descriptors: Corefile:5 - Error during parsing: Unknown directive 'bdhfhdhj'
~~~

After which the curl still works.

This also needed a change to reset the channel used for the metrics
go-routine which gets closed on shutdown, otherwise you'll see:

~~~
^C[INFO] SIGINT: Shutting down
panic: close of closed channel

goroutine 90 [running]:
github.com/coredns/coredns/plugin/health.(*health).OnFinalShutdown(0xc000089bc0, 0xc000063d88, 0x4afe6d)
~~~

Signed-off-by: Miek Gieben <miek@miek.nl>
miekg added a commit that referenced this issue May 5, 2019
Add OnRestartFailed to the ready plugin and some various cleanups.

Document slightly better how things are supposed to work with multiple
`ready`'s in the multiple Server Blocks.

All manually tested with this Corefile:
~~~
. {
    log
    ready
}

example.org {
    log
    chaos
    ready
}
~~~
And then `kill -SIGUSR1` and curling the ready endpoint. This works
well, the FailedReload is triggered by adding a syntax error in the
Corefile.

See #2659

Signed-off-by: Miek Gieben <miek@miek.nl>
miekg added a commit that referenced this issue May 5, 2019
Fix metrics endpoint on a failed reload, follows the same lines as the
previous PRs, see for e.g. 076b8d4. Test with a Corefile with 2 server
blocks and metrics enabled and then introducing a syntax error:

~~~
[ERROR] Restart failed: Corefile:5 - Error during parsing: Unknown directive 'jfkdjk'
[ERROR] SIGUSR1: starting with listener file descriptors: Corefile:5 - Error during parsing: Unknown directive 'jfkdjk'
~~~

And then curl-ing the metrics endpoint.

See #2659 and as this is the last one.

Fixes: #2659

Getting this all right turns out to be tricky, also it's not easy
testable which is something I should fix.

Signed-off-by: Miek Gieben <miek@miek.nl>
yongtang pushed a commit that referenced this issue May 13, 2019
Fix metrics endpoint on a failed reload, follows the same lines as the
previous PRs, see for e.g. 076b8d4. Test with a Corefile with 2 server
blocks and metrics enabled and then introducing a syntax error:

~~~
[ERROR] Restart failed: Corefile:5 - Error during parsing: Unknown directive 'jfkdjk'
[ERROR] SIGUSR1: starting with listener file descriptors: Corefile:5 - Error during parsing: Unknown directive 'jfkdjk'
~~~

And then curl-ing the metrics endpoint.

See #2659 and as this is the last one.

Fixes: #2659

Getting this all right turns out to be tricky, also it's not easy
testable which is something I should fix.

Signed-off-by: Miek Gieben <miek@miek.nl>
miekg added a commit that referenced this issue May 23, 2019
Add OnRestartFailed to the ready plugin and some various cleanups.

Document slightly better how things are supposed to work with multiple
`ready`'s in the multiple Server Blocks.

All manually tested with this Corefile:
~~~
. {
    log
    ready
}

example.org {
    log
    chaos
    ready
}
~~~
And then `kill -SIGUSR1` and curling the ready endpoint. This works
well, the FailedReload is triggered by adding a syntax error in the
Corefile.

See #2659

Signed-off-by: Miek Gieben <miek@miek.nl>
miekg added a commit that referenced this issue Jun 9, 2019
Add OnRestartFailed to the ready plugin and some various cleanups.

Document slightly better how things are supposed to work with multiple
`ready`'s in the multiple Server Blocks.

All manually tested with this Corefile:
~~~
. {
    log
    ready
}

example.org {
    log
    chaos
    ready
}
~~~
And then `kill -SIGUSR1` and curling the ready endpoint. This works
well, the FailedReload is triggered by adding a syntax error in the
Corefile.

See #2659

Signed-off-by: Miek Gieben <miek@miek.nl>
dna2github pushed a commit to dna2fork/coredns that referenced this issue Jul 19, 2019
Add OnReStartFailed which makes the health plugin stay up if the
Corefile is corrupt and we revert to the previous version.

Also needs a fix for the channel handling

See coredns#2659

Testing it will log the following when restarting with a corrupted
Corefile

~~~
2019-05-04T18:01:59.431Z [INFO] linux/amd64, go1.12.4,
CoreDNS-1.5.0
linux/amd64, go1.12.4,
[INFO] SIGUSR1: Reloading
[INFO] Reloading
[ERROR] Restart failed: Corefile:5 - Error during parsing: Unknown directive 'bdhfhdhj'
[ERROR] SIGUSR1: starting with listener file descriptors: Corefile:5 - Error during parsing: Unknown directive 'bdhfhdhj'
~~~

After which the curl still works.

This also needed a change to reset the channel used for the metrics
go-routine which gets closed on shutdown, otherwise you'll see:

~~~
^C[INFO] SIGINT: Shutting down
panic: close of closed channel

goroutine 90 [running]:
github.com/coredns/coredns/plugin/health.(*health).OnFinalShutdown(0xc000089bc0, 0xc000063d88, 0x4afe6d)
~~~

Signed-off-by: Miek Gieben <miek@miek.nl>
dna2github pushed a commit to dna2fork/coredns that referenced this issue Jul 19, 2019
Fix metrics endpoint on a failed reload, follows the same lines as the
previous PRs, see for e.g. 076b8d4. Test with a Corefile with 2 server
blocks and metrics enabled and then introducing a syntax error:

~~~
[ERROR] Restart failed: Corefile:5 - Error during parsing: Unknown directive 'jfkdjk'
[ERROR] SIGUSR1: starting with listener file descriptors: Corefile:5 - Error during parsing: Unknown directive 'jfkdjk'
~~~

And then curl-ing the metrics endpoint.

See coredns#2659 and as this is the last one.

Fixes: coredns#2659

Getting this all right turns out to be tricky, also it's not easy
testable which is something I should fix.

Signed-off-by: Miek Gieben <miek@miek.nl>
dna2github pushed a commit to dna2fork/coredns that referenced this issue Jul 19, 2019
Add OnRestartFailed to the ready plugin and some various cleanups.

Document slightly better how things are supposed to work with multiple
`ready`'s in the multiple Server Blocks.

All manually tested with this Corefile:
~~~
. {
    log
    ready
}

example.org {
    log
    chaos
    ready
}
~~~
And then `kill -SIGUSR1` and curling the ready endpoint. This works
well, the FailedReload is triggered by adding a syntax error in the
Corefile.

See coredns#2659

Signed-off-by: Miek Gieben <miek@miek.nl>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant