GRPC Health Checks #2246

Open
jmccann opened this Issue Oct 18, 2017 · 3 comments

@jmccann
Contributor

jmccann commented Oct 18, 2017

Currently, the agent's connection to the server can be dropped under certain conditions (e.g. no jobs for a while and/or a firewall or LB terminating idle connections). When that happens, newly triggered jobs sit in pending until the agent is restarted and thereby reconnected to the server.

gRPC health checks could help prevent or correct this issue and make the Drone platform more stable in general.

Discussion regarding this in Discourse: https://discourse.drone.io/t/0-8-1-agent-loses-connection-overnight/864/7
gRPC health checking docs: https://github.com/grpc/grpc/blob/master/doc/health-checking.md
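
To illustrate the idea, here is a rough sketch of what an agent-side watchdog could look like. It is not Drone's code and uses no gRPC library: `Probe`, `Watchdog`, `Tick`, `Run`, and `ErrUnhealthy` are hypothetical names made up for this example. In a real agent the probe would presumably wrap the `Check` RPC of the `grpc.health.v1.Health` service described in the docs linked above, and the reconnect callback would re-dial the server.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// ErrUnhealthy is a placeholder error for a failed health probe.
var ErrUnhealthy = errors.New("server connection unhealthy")

// Probe reports whether the connection to the server is usable.
// Assumption: a real agent would call the gRPC Health service here.
type Probe func() error

// Watchdog counts consecutive probe failures and triggers a
// reconnect once the count reaches the threshold.
type Watchdog struct {
	probe     Probe
	reconnect func() error
	threshold int
	failures  int
}

// NewWatchdog builds a Watchdog; threshold is the number of
// consecutive failed probes tolerated before reconnecting.
func NewWatchdog(probe Probe, reconnect func() error, threshold int) *Watchdog {
	return &Watchdog{probe: probe, reconnect: reconnect, threshold: threshold}
}

// Tick runs one probe and reports whether a reconnect was attempted.
func (w *Watchdog) Tick() bool {
	if err := w.probe(); err == nil {
		w.failures = 0
		return false
	}
	w.failures++
	if w.failures < w.threshold {
		return false
	}
	// Reset the counter only if the reconnect succeeded, so a failed
	// reconnect is retried on the next tick.
	if err := w.reconnect(); err == nil {
		w.failures = 0
	}
	return true
}

// Run calls Tick on every interval until ctx is cancelled.
func (w *Watchdog) Run(ctx context.Context, interval time.Duration) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			w.Tick()
		}
	}
}

func main() {
	healthy := false // simulate a connection dropped by an idle timeout
	reconnects := 0
	w := NewWatchdog(
		func() error {
			if healthy {
				return nil
			}
			return ErrUnhealthy
		},
		func() error { healthy = true; reconnects++; return nil },
		3,
	)
	for i := 1; i <= 5; i++ {
		fmt.Printf("tick %d: reconnect attempted=%v\n", i, w.Tick())
	}
	fmt.Println("reconnects:", reconnects)
}
```

With a threshold of 3, the simulated agent reconnects on the third failed probe and stays quiet afterwards. The periodic probe would also generate traffic on an otherwise idle connection, which may by itself keep a firewall or LB from closing it, though that depends on how the probe is transported.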

@appleboy

Member

appleboy commented Oct 18, 2017

I see the same issue in Kubernetes.

@Franklin89

Franklin89 commented Oct 18, 2017

I am also seeing this while running in Docker Swarm.

drone locked and limited conversation to collaborators on Oct 18, 2017

@bradrydzewski

Member

bradrydzewski commented Oct 18, 2017

Until we have a patch, it is not recommended to route agent communication through a load balancer or reverse proxy. When I start working on this I will update the issue status, but in the meantime, please feel free to send a patch if you want to expedite a resolution to this issue.
