Service health check not working when deployed on Nomad-cluster #25

Open
oschistad opened this issue Sep 28, 2020 · 9 comments

@oschistad
Contributor

oschistad commented Sep 28, 2020

Context: I am using this module to deploy a Nomad job to a Nomad cluster of 3 servers and 3 clients. The cluster has ACLs enabled and runs in an environment where Consul ACLs are also enabled.

Consul Connect is enabled for the MinIO service.

Consul reports the service as unhealthy and logs the following error:


Check name: minio-live
ServiceName: minio
CheckID: _nomad-check-152ee3424af83fbcbb628873facbad73408e823d
Output:
    Get http://<HOSTIP>:27237/minio/health/live: dial tcp <HOSTIP>:27237: connect: connection refused

@oschistad
Contributor Author

I suspect this issue is caused by Consul Connect fully isolating the container's network, including from the process that performs the service health checks.

If so, this thread may contain a solution:
https://discuss.hashicorp.com/t/consul-connect-with-health-checks/7591/2
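One way this is commonly handled is to let Nomad generate a Consul Connect Expose Path for the health check, so the Consul agent can reach the HTTP endpoint through the sidecar proxy instead of dialing the isolated container directly. The sketch below assumes a Nomad version whose `check` stanza supports `expose = true` for Connect-enabled group services; the job, group, and task names, the image arguments, and the service port are hypothetical and are not copied from this module's minio.hcl. Depending on the Nomad version, the group network may also need a dedicated dynamic port for the exposed check, so verify against the `check` stanza documentation for the version in use.

```hcl
# Hypothetical job layout; names and values are illustrative only.
job "minio" {
  datacenters = ["dc1"]

  group "minio" {
    # Connect sidecars run in the group's bridge network namespace.
    network {
      mode = "bridge"
    }

    service {
      name = "minio"
      port = "9000" # assumed: MinIO's default API port inside the namespace

      connect {
        sidecar_service {}
      }

      check {
        # Ask Nomad to create an Expose Path on the sidecar proxy so the
        # Consul agent can reach this endpoint from outside the namespace.
        expose   = true
        name     = "minio-live"
        type     = "http"
        path     = "/minio/health/live"
        interval = "10s"
        timeout  = "2s"
      }
    }

    task "minio" {
      driver = "docker"

      config {
        image = "minio/minio"
        args  = ["server", "/data"]
      }
    }
  }
}
```

With an exposed check, Consul would probe a port published by the sidecar rather than the task's own dynamic port, which is what fails with "connection refused" in the output above.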

@Neha-Sinha2305
Contributor

Also, this module started behaving this way because of the following fix to the vagrant-hashistack box: Skatteetaten/vagrant-hashistack#344
Proxying fails when consul_default_policy=deny, but works fine when consul_default_policy=allow.

@fredrikhgrelland
Contributor

@Neha-Sinha2305 So this means we need to fix this asap ;)

@Neha-Sinha2305
Contributor

@oschistad: Could you set consul_default_policy=allow in your cluster, redeploy the module, and check whether the service health check passes?

@zhenik
Copy link
Contributor

zhenik commented Sep 29, 2020

@oschistad Could you also provide your Consul and Nomad versions?

@Neha-Sinha2305
Contributor

@oschistad: I looked into this issue, and in its current state the minio module runs without errors in environments with the following configurations:

Consul ACL   Consul default policy   Nomad ACL
enabled      deny                    enabled
enabled      allow                   enabled
disabled     deny                    enabled
enabled      deny                    disabled
enabled      allow                   disabled
disabled     deny                    disabled

Also, the checks you mentioned are already in place in the minio Nomad job, and terraform apply runs without errors, which means the health checks pass under the configurations above.
https://github.com/fredrikhgrelland/terraform-nomad-minio/blob/4c6087cbc3133a3b0f50622dcf61433e882bb62a/conf/nomad/minio.hcl#L15-L34

So the issue probably lies elsewhere, and we would need more details to debug it.

@fredrikhgrelland
Contributor

fredrikhgrelland commented Oct 2, 2020

This is under investigation: Skatteetaten/vagrant-hashistack#368

@zhenik
Contributor

zhenik commented Oct 21, 2020

Is this still relevant? @oschistad @fredrikhgrelland @pdmthorsrud @zhenik @dangernil @claesgill

@oschistad
Contributor Author

@zhenik Unknown. The issue referenced above relates to a flaw in the automated testing; I don't believe it is an actual fix for this issue.

@dangernil dangernil removed this from the 0.3.0 milestone Nov 6, 2020
7 participants