Configurable HTTP Error Handling #12

Open · Michael-F-Bryan opened this issue Jul 22, 2019 · 3 comments · May be fixed by #35
@Michael-F-Bryan (Owner)
We might want to add some sort of http-error-behaviour option to the config that lets you select how HTTP errors are handled. Some possible strategies (sketched in code after this list) are:

  • AlwaysFail - translates all non-2XX status codes to failures
  • Warn - translates all non-2XX status codes to warnings
  • WarnOnServerError - translates status codes that indicate a spurious server failure to warnings; all other non-2XX status codes are errors
  • ...
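
As a rough illustration only, here is a minimal Rust sketch of how those strategies could map onto an enum plus a classification step. The names (HttpErrorBehaviour, Severity, classify) are made up for this sketch and are not part of the existing codebase.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum HttpErrorBehaviour {
    /// Any non-2XX status code fails the build.
    AlwaysFail,
    /// Any non-2XX status code is downgraded to a warning.
    Warn,
    /// 5XX codes (spurious server failures) become warnings; every
    /// other non-2XX status code is still an error.
    WarnOnServerError,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Severity {
    Ok,
    Warning,
    Error,
}

fn classify(status: u16, behaviour: HttpErrorBehaviour) -> Severity {
    match status {
        // 2XX is always fine, regardless of strategy.
        200..=299 => Severity::Ok,
        // Server errors are only downgraded under WarnOnServerError.
        500..=599 if behaviour == HttpErrorBehaviour::WarnOnServerError => Severity::Warning,
        // Warn downgrades every other non-2XX code too.
        _ if behaviour == HttpErrorBehaviour::Warn => Severity::Warning,
        // AlwaysFail (and everything else) is an error.
        _ => Severity::Error,
    }
}

fn main() {
    assert_eq!(
        classify(503, HttpErrorBehaviour::WarnOnServerError),
        Severity::Warning
    );
    assert_eq!(
        classify(404, HttpErrorBehaviour::WarnOnServerError),
        Severity::Error
    );
}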

Original comment from rust-lang/rustc-dev-guide#388 (comment):

> @Michael-F-Bryan I think it would be helpful to be able to not fail the build for certain types of errors. For example, a 404 should definitely fail the build, whereas a timeout, 429 (too many requests), or 50x (internal server error) should not fail the build.

@pihme commented Jun 9, 2020

We use the link checker as part of our CI builds. Some sites are down every once in a while; they stay down for several hours and return a 503 code during that time, and our builds fail.
It would be nice to have an option to treat those as warnings and treat only 4XX codes as errors.
(Currently we are thinking about setting the link checker to warnings only and scraping the output for the error codes ourselves.)

We would appreciate it if you could say whether this is a feature you would consider and, if so, on what time frame.

@Michael-F-Bryan (Owner, Author)

> Some sites are down every once in a while; they stay down for several hours and return a 503 code during that time.

If a website will spuriously go down, is it even worth trying to check links that go to it?

> Currently we are thinking about setting the link checker to warnings only and scraping the output for the error codes ourselves

Setting it to warnings only probably won't do what you want. Warnings are for edge cases where the link could be broken, but could also be a false positive. For example, if you wrote something that looks like a link (such as [some text]) without an accompanying link reference ([some text]: https://example.com/).

The actual code that translates the Outcomes from linkcheck into diagnostics that are emitted to the screen is here.

> We would appreciate it if you could say whether this is a feature you would consider and, if so, on what time frame.

We might need to think about a good policy for deciding what is an error and what isn't, but I like the idea of improving the way we detect errors!

Instead of evolving organically and adding if-statements as people find errors they'd like to handle differently, what about coming up with a general strategy for interpreting HTTP errors?

For example, maybe by specifying a list of rules that match on status codes, where earlier rules take precedence?

[output.linkcheck.http-error-handling]
# specific rules
"200" = "OK"
"429" = "Warn"
"503" = "Warn"

# catch-all
"400-499" = "AlwaysFail"
"500-599" = "AlwaysFail"

@pihme commented Jun 10, 2020

> If a website will spuriously go down, is it even worth trying to check links that go to it?

The short answer is yes. We link to pages on http://www.omg.org, and these are down for up to three hours every couple of weeks; other than that the links are fine. (One time Amazon US was also down for a couple of hours. That was truly an outlier, but I just had to take a screenshot.)
In general, the more links you have, the likelier it is that one of them is not in stellar condition.

> Setting it to warnings only probably won't do what you want. Warnings are for edge cases where the link could be broken, but could also be a false positive. For example, if you wrote something that looks like a link (such as [some text]) without an accompanying link reference ([some text]: https://example.com/).

Thanks for clarifying that.

> For example, maybe by specifying a list of rules that match on status codes, where earlier rules take precedence?

Sounds great. It will also give other users more flexibility.

PS: I would love to contribute PRs, not just an upvote on an issue, but I haven't written a single line of Rust yet. It's on my list of languages to learn, but it might be a while before I can do anything useful with it.

Michael-F-Bryan linked a pull request (Jun 11, 2020) that will close this issue.