Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add warning category to broken link checker #1695

Closed
Tracked by #1704
david-venhoff opened this issue Sep 17, 2022 · 4 comments · Fixed by #1977
Closed
Tracked by #1704

Add warning category to broken link checker #1695

david-venhoff opened this issue Sep 17, 2022 · 4 comments · Fixed by #1977
Assignees
Milestone

Comments

@david-venhoff
Copy link
Member

Motivation

For some pages the link checker shows an error, even though the page technically works for users. For this reason, a warning category should be added.

Proposed Solution

Add a warning category which should include the following links:

  • Links to pages with status code 3xx
  • Links with invalid fragment
  • Links to pages with status code 403 because the page might just not be available to the link checker, even though it works for users.
@david-venhoff david-venhoff added 💡 feature New feature or request ❗ prio: medium Should be scheduled in the forseeable future. 🔨 enhancement This improves an existing feature 😅 effort: medium Should be doable in <12h labels Sep 17, 2022
@david-venhoff david-venhoff added this to the 22Q4 milestone Sep 17, 2022
@ulliholtgrave ulliholtgrave added the 🍼 good first issue Good for newcomers label Oct 2, 2022
@Gaston69
Copy link
Contributor

Gaston69 commented Nov 9, 2022

I would like to catch this issue.
Can anybody please give examples for the mentioned links:
3xx, 200 with invalid fragments, 403

I would add them to the fixtures.

And where are the links saved (db-table)?

@timobrembeck
Copy link
Member

timobrembeck commented Nov 13, 2022

And where are the links saved (db-table)?

We use django-linkcheck for this tasks, this library ships two models: Url which contains the url string, the status and message, and Link, which holds a many to many mapping from translation models to the Url model (to de-duplicate the checking process and check urls which appear in multiple pages only once).

I would add them to the fixtures.

Please do not include the real life example links below into our fixtures, they are just for local testing purposes. Instead, I would recommend to use https://httpstat.us/ for the fixtures... you can e.g. generate links like:

(Only adding the Link objects leads to an inconsistent test database - ideally you would also create a few new page translations which contain these links... then you could also do this the other way round: add the links via the content editor first, then export the data via the dumpdata command)

Can anybody please give examples for the mentioned links:
3xx, 200 with invalid fragments, 403

Some example links for the warning category:

"Broken external hash anchor"

When the message is "Broken external hash anchor", the link usually works, some examples:

So the intention of this warning category would be to encourage content creators to either fix the anchor or remove everything after the #...

"301 Moved Permanently" and "302 Found"

When the message is "301 Moved Permanently" or "302 Found", the link also works, but is just a redirect...

So content creators could replace the link with the new target to avoid the redirect (which might e.g. be removed in the future, who knows)...

One thing I didn't get is why most of the times the status is True and why it is False sometimes...
I believe this has something to do with that linkcheck cannot follow the redirect, see:
https://github.com/DjangoAdminHackers/django-linkcheck/blob/dfa053fbebd832e17de477dd9653472b775d0261/linkcheck/models.py#L323-L325

But again I'm not sure why...

"403 Forbidden"

These kind of errors are difficult to assess - sometimes, the error is justified, e.g.:

and sometimes, this is just because the link target recognizes our link checker as bot and denies access, even if the link works for normal users:

So for the 403 status code, I'm undecided whether we should really put this into the "warning" category or keep the current solution of giving the user the option to re-check or ignore the link at any given time if they notice that they are in fact working...

@timobrembeck
Copy link
Member

@Gaston69 It appears that this is a problem of the underlying library and not us.
The links which are indeed valid are not correctly recognized as such.
So probably I will open an upstream PR for linkcheck which means we will no longer have to create a separate warning category.

@timobrembeck
Copy link
Member

For reference, see the following issues in the linkcheck library:

After these two have been resolved, the only case that remains is "403 Forbidden", but as I stated earlier, I think we should still leave these cases for the manual review instead of marking all those URLs as potentially valid.

@timobrembeck timobrembeck added ⛔ blocked Blocked by external dependency and removed 💡 feature New feature or request 🍼 good first issue Good for newcomers ❗ prio: medium Should be scheduled in the forseeable future. 🔨 enhancement This improves an existing feature 😅 effort: medium Should be doable in <12h labels Nov 20, 2022
@timobrembeck timobrembeck self-assigned this Dec 12, 2022
@timobrembeck timobrembeck removed the ⛔ blocked Blocked by external dependency label Dec 18, 2022
@timobrembeck timobrembeck modified the milestones: 22Q4, 23Q1 Jan 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants