
Linkchecker improvement #19303

Closed
MichalMaler opened this issue Mar 16, 2021 · 4 comments · Fixed by eclipse-che/che-docs#1926
Labels
area/doc · kind/enhancement · kind/technical-debt · severity/P2

Comments

MichalMaler (Contributor) commented Mar 16, 2021

Hello,
The current linkchecker script that validates Eclipse Che PRs gets stuck on an error caused by a website's denial-of-service protection, which delays the merging process. When this link test does not finish successfully, we cannot merge a PR.

The problem is that the test fails on a link that is active and fine, but a "Max retries exceeded" counter on the affected page blocks access to it (caused by several connection attempts in a short time), and linkchecker treats it as a broken link.

To work around this issue, we set the ignorewarnings=url-rate-limited parameter in the linkcheckerrc configuration, which should make the linkchecker validation immune to this error:

# Ignore the comma-separated list of warnings.
ignorewarnings=url-rate-limited

But it seems it does not work as we expected, and the script ends with a failure that blocks the merge.
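For reference, a minimal sketch of how the relevant part of linkcheckerrc could look; the [filtering] section placement is an assumption, and the ignore= pattern is only a hypothetical extra workaround that skips the rate-limited host entirely:

# linkcheckerrc (sketch)
[filtering]
# Ignore the comma-separated list of warnings.
ignorewarnings=url-rate-limited
# Hypothetical workaround: do not check URLs on the host that rate-limits us.
ignore=
  ^https://github\.com/che-incubator/chectl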

Example

URL        `https://github.com/che-incubator/chectl#user-content-chectl-serverdeploy'
Parent URL http://localhost:4000/che-7/administration-guide/configuring-openshift-oauth/, line 725, col 1
Real URL   https://github.com/che-incubator/chectl#user-content-chectl-serverdeploy
Result     Error: ConnectionError: HTTPSConnectionPool(host='github.com', port=443): Max retries exceeded with url: /che-incubator/chectl (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f57cadda6a0>: Failed to establish a new ...

Can you help us tune the linkchecker config so that we can add new documentation as fast as we can deliver it?
Thank you.

MichalMaler added the kind/bug, kind/enhancement, do-not-merge/hold, area/doc, and kind/technical-debt labels on Mar 16, 2021
che-bot added the status/need-triage label on Mar 16, 2021
metlos added the severity/P2 label and removed the area/git, kind/bug, and status/need-triage labels on Mar 16, 2021
themr0c (Contributor) commented Mar 22, 2021

We have issues piling up here:

  • We used to have multiple GitHub Actions runners and could run several actions in parallel; for a while now we have had only one runner. As a consequence, all linkchecker checks run from the same IP, and the checks trigger the denial-of-service defenses of external websites.

Hopefully, moving to the eclipse-che GitHub organization will help mitigate the issue with the number of runners.

  • The ignorewarnings=url-rate-limited parameter seems not sufficient to ignore the errors caused by these warnings.

Investigating this is complex.

Thinking about how we could optimize our link-checking strategy:

  • Internal links: we need to check all internal links on every PR, in case the content of the PR breaks a link from another place in the docs.
  • External links: checking these links often is useful and gives us a good MTTR, but it causes trouble: it slows down all checks and produces false positives due to the DoS protection on external websites. Path of investigation: could we check them only once a day and create an issue in case of errors? (See the sketch after this list.)
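A minimal sketch of what that daily check could look like as a GitHub Actions workflow; the file name, the serve script, the docs URL, and the issue title are assumptions, and opening an issue with the gh CLI is only one possible way to report failures:

# .github/workflows/external-links.yml (sketch, names hypothetical)
name: Daily external link check
on:
  schedule:
    - cron: "0 3 * * *"          # once a day, off-peak
jobs:
  check-external-links:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Serve the docs and check links, including external URLs
        run: |
          ./tools/serve-docs.sh &                  # assumption: serves the site on localhost:4000
          sleep 10
          linkchecker --check-extern -f linkcheckerrc http://localhost:4000/che-7/
      - name: Open an issue on failure
        if: failure()
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: gh issue create --title "Broken external links" --body "See the failed scheduled run."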

themr0c (Contributor) commented Mar 22, 2021

Another possibility: cache the results?

themr0c (Contributor) commented Mar 22, 2021

NOTE: "Enabling threads => errors on anchors" still holds true, even for internal links.
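If threading is indeed what breaks the anchor checks, a single-threaded run can be forced in linkcheckerrc; a minimal sketch, assuming the threads option belongs to the [checking] section:

[checking]
# Run with a single thread to avoid the anchor false positives mentioned above.
threads=1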
