How to handle linkrot / broken external links? #1

Open
Zegnat opened this issue Aug 7, 2019 · 6 comments

Comments

@Zegnat
Owner

commented Aug 7, 2019

gmpg.org is a pretty old site, by internet standards. Some degree of link rot is expected to have occurred. How should this be handled?

Currently this repo is as close to the last public state of the site as possible. This may not be fit for publication as there is no saying what external links point to on today’s web compared to 2003. As @tantek said:

I know some of those old links got domainsquatted and taken over by nasty things

But how do we handle these links in the least destructive way possible? Should broken links be replaced with Wayback Machine (archive.org) links? Should special styling for them be added?

What if no archived version of content is found at all? Is the mention removed from the site completely, or should only the link be disabled so the reference itself still exists for historical value?

@kevinmarks

commented Aug 7, 2019

The https://zegnat.github.io/html-gmpgdotorg/xfn/and/ page has a lot of 'buy this' links (including an internal one), which should probably point to archived copies of the sites.
Maybe we should add newer sites that support XFN, like Mastodon?

@Lewiscowles1986

commented Aug 7, 2019

Perhaps a custom 404 page with search-bar, and links to archive.org?

@Zegnat

Owner Author

commented Aug 7, 2019

@kevinmarks:

The https://zegnat.github.io/html-gmpgdotorg/xfn/and/ page has a lot of 'buy this' links […]

Yep. But how are we going to handle them?

[…] (including an internal one) […]

The feedback page is a deliberate 404, because I can’t handle the form with GitHub Pages. Opened #2 to address this further.

@Lewiscowles1986:

Perhaps a custom 404 page with search-bar, and links to archive.org?

All internal pages that were missing on gmpg.org should have been backfilled with copies from the Wayback Machine. So a 404 on our end shouldn’t be needed. Or are you proposing we have some sort of custom 404 page that we show instead of linking to external URLs? A bit like what Thomas Baekdal does with a custom exit page?

@jeremycherfas

commented Aug 8, 2019

Only my personal experience, but if there aren't too many links per page, as there aren't on my site, I adopt the following approach:

  1. Attempt to open all links.
  2. If the linked page still exists, link to it and save it to Wayback Machine.
  3. If it does not exist, or has been squatted, search Wayback Machine.
  4. If I can find a version on Wayback Machine (which sometimes saves squatted pages), use that.
  5. If I cannot find a version on Wayback Machine, remove the hyperlink and add a footnote saying that the site is dead.
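The five steps above can be sketched as a small decision function. This is only an illustration, not code from this repository: `LinkStatus`, `LinkAction` and `triageLink` are hypothetical names, and the function assumes a person has already opened the link (step 1) and checked that any Wayback Machine snapshot is a usable copy rather than an archived squatter page.

```typescript
// Hypothetical names throughout; this is a sketch, not code from the repository.
type LinkStatus = "alive" | "dead" | "squatted";

type LinkAction =
  | { kind: "keep"; href: string; alsoSaveToWayback: true }
  | { kind: "useArchive"; href: string }
  | { kind: "unlink"; footnote: string };

// Decide what to do with one external link.
// `status` is the outcome of opening the link by hand (step 1).
// `snapshotUrl` is a usable Wayback Machine copy, or null if none was found.
function triageLink(
  original: string,
  status: LinkStatus,
  snapshotUrl: string | null
): LinkAction {
  if (status === "alive") {
    // Step 2: keep the live link and save a copy for the future.
    return { kind: "keep", href: original, alsoSaveToWayback: true };
  }
  if (snapshotUrl !== null) {
    // Steps 3 and 4: dead or squatted, but an archived copy exists.
    return { kind: "useArchive", href: snapshotUrl };
  }
  // Step 5: nothing archived, so drop the hyperlink and leave a footnote.
  return {
    kind: "unlink",
    footnote: `${original} is dead and no archived copy was found.`,
  };
}
```

Keeping the decision separate from any network access means the manual checks stay manual, which matches the workflow described: the function only records what to do once a person has looked at the link.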

Sometimes, for large sites, like newspaper sites, they have completely changed their permalink scheme. I may try to search for a new permalink if I saved enough information to make that possible.

This may also explain why I am so slow at importing my old posts.

@Zegnat

Owner Author

commented Aug 8, 2019

@jeremycherfas:

[…] if there aren't too many links per page […]

Some pages are pretty light on external links, like the GMPG History. Others consist mostly of external links, like XFN: Tools and XFN: What's Out There?

Though I do think that manually doing the steps you propose may lead to the best quality page we can offer.

@Lewiscowles1986

commented Aug 8, 2019

I was not aware of Thomas Baekdal, but thanks. I was thinking more that most 404 pages are unhelpful. It's fantastic when they are a little more helpful (using window.location.href is one way).

For GitHub Pages, that may be the only way, along with yet more JS.
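As a rough sketch of what such a 404 helper could do with `window.location.href`, the function below maps the address that failed to load onto a Wayback Machine URL for the same path on the original site. Everything here is an assumption made for illustration: the name `archiveUrlFor`, the `/html-gmpgdotorg/` base path (taken from the URL quoted earlier in this thread), the `http://gmpg.org` origin, and the `2003` timestamp, which relies on the Wayback Machine resolving a partial timestamp to a nearby capture.

```typescript
// Hypothetical helper for a custom 404 page; not code from this repository.
function archiveUrlFor(
  currentHref: string,                // in a browser: window.location.href
  basePath = "/html-gmpgdotorg/",     // assumed path prefix of the GitHub Pages copy
  originalOrigin = "http://gmpg.org", // assumed scheme and host of the old site
  timestamp = "2003"                  // assumed target year for the capture
): string {
  const { pathname } = new URL(currentHref);
  // Drop the GitHub Pages prefix so only the path on the original site remains.
  const relative = pathname.startsWith(basePath)
    ? pathname.slice(basePath.length)
    : pathname.replace(/^\//, "");
  return `https://web.archive.org/web/${timestamp}/${originalOrigin}/${relative}`;
}
```

For example, `archiveUrlFor("https://zegnat.github.io/html-gmpgdotorg/xfn/and/")` returns `https://web.archive.org/web/2003/http://gmpg.org/xfn/and/`, which a 404 page could offer as a link alongside a search box.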

4 participants