Algo fetch: HTTP errors aren't displayed in the logs #222

Closed
AurelienGasser opened this issue Apr 20, 2020 · 4 comments

Comments

@AurelienGasser
Contributor

AurelienGasser commented Apr 20, 2020

HTTP errors occurring during the algo fetch aren't reported in the logs. This makes it hard to understand what is going on.

For instance, in this issue, the logs only show the error hash doesn't match [...], which is unhelpful: the real source of the problem is the HTTP 403 error that occurred before the hash computation.
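
As a rough illustration of the behaviour asked for here, the sketch below checks the HTTP status before computing the hash, so that an HTTP failure is reported as such instead of surfacing later as a hash mismatch. The names `AlgoFetchError` and `check_fetch` are hypothetical, and SHA-256 is an assumption; neither is taken from the project's code.

```python
import hashlib


class AlgoFetchError(Exception):
    """Hypothetical error type for a failed algo fetch."""


def check_fetch(status_code, body, expected_hash):
    """Validate a fetched payload, reporting HTTP errors before hash errors.

    status_code: HTTP status of the response (int)
    body: raw response bytes
    expected_hash: expected hex digest (SHA-256 is assumed here)
    """
    # Report the HTTP failure first: hashing an error page would only
    # produce a misleading "hash doesn't match" message.
    if status_code >= 400:
        raise AlgoFetchError(f"algo fetch failed with HTTP {status_code}")

    actual_hash = hashlib.sha256(body).hexdigest()
    if actual_hash != expected_hash:
        raise AlgoFetchError(
            f"hash doesn't match: expected {expected_hash}, got {actual_hash}"
        )
    return body
```

With this ordering, a 403 response produces a message that mentions the status code, and the hash comparison only runs on responses that were actually successful.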

@samlesu
Contributor

samlesu commented Apr 21, 2020

It looks like it is similar to the issue #217. The cause of the failure is different, but it's the same ambiguous error message.

@AurelienGasser
Contributor Author

After more investigation, the case I was looking at was not an HTTP 403 error. Despite appearances, the server returns a 200.

image

For context, this is what the worker pod receives when it tries to fetch e.g. substra-backend.node-1.com/algo/.... while its DNS resolution points to the public internet rather than the k8s cluster.

Maybe another approach would be to look at the response content type. Thoughts, @Kelvin-M?
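
A minimal sketch of what a content-type check could look like, assuming the algo is served as an archive. The function name `is_plausible_archive` and the list of accepted media types are assumptions for illustration; the actual types returned by the backend would need to be confirmed.

```python
# Media types that an archive download might plausibly use (assumed list).
ARCHIVE_TYPES = {
    "application/gzip",
    "application/x-gzip",
    "application/x-tar",
    "application/zip",
    "application/octet-stream",
}


def is_plausible_archive(content_type_header):
    """Return True if a Content-Type header value looks like an archive.

    A 200 response carrying e.g. "text/html" is most likely a web page
    served by some other host, not the algo archive.
    """
    if not content_type_header:
        return False
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type in ARCHIVE_TYPES
```

A check like this would let the worker log something along the lines of "unexpected content type text/html" before it ever reaches the hash comparison.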

@AurelienGasser
Contributor Author

I'm going to close this for now as there's no easy way to fix it.

We should keep in mind that errors like hash doesn't match ... are usually due to DNS misconfiguration. The solution is to make sure that the address substra-backend.node-{1-N}.com resolves to the cluster IP, and not to the public internet. This could be part of a troubleshooting guide? cc @natct10
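
For the troubleshooting side, here is a small sketch of how the resolution could be checked from inside a worker pod. It assumes that the cluster IP falls in a private address range, which may not hold for every setup; the helper names are hypothetical.

```python
import ipaddress
import socket


def resolved_addresses(hostname):
    """Return the sorted set of IP addresses that hostname resolves to."""
    infos = socket.getaddrinfo(hostname, None)
    return sorted({info[4][0] for info in infos})


def is_internal(address):
    """Return True if the address is in a private range.

    Assumption: the cluster IP is a private address, so a public address
    suggests that DNS points to the public internet.
    """
    # Strip an IPv6 zone identifier (e.g. "%eth0") if one is present.
    return ipaddress.ip_address(address.split("%", 1)[0]).is_private


def check_backend_dns(hostname):
    """Map each resolved address to whether it looks cluster-internal."""
    return {addr: is_internal(addr) for addr in resolved_addresses(hostname)}
```

Running `check_backend_dns("substra-backend.node-1.com")` from the worker pod would then show whether any resolved address is public, which is the situation described above.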

@natct10
Contributor

natct10 commented May 5, 2020


You're right! I will add this to the troubleshooting section, thank you for pointing this one out @AurelienGasser
