Chasing dead links #690

Closed
tlienart opened this issue Mar 27, 2020 · 17 comments · Fixed by #695

Comments

@tlienart
Sponsor Contributor

tlienart commented Mar 27, 2020

Right, so I ran a script to check all pages. It's not perfect, but it should help us find quite a few of the dead links (list below). Maybe people could take responsibility for fixing things and ping me in the PR so that I can mark them as done here.

There are a bunch of dead meetups etc. that should probably just be removed.

@ Helpers: there are two things you can do:

  • do you recognise a link and know a fix? --> open a PR with the fix
  • is the page dead? --> just remove the link

Edit:

  • fixed a bunch of issues due to the use of https instead of http
  • removed from the list the GitHub links that errored with 429 (too many requests); some may still be broken but most are fine
  • removed the 999 errors (LinkedIn)
  • pruned the list from 600 to 100 entries, yay!
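
The pruning described in the edit notes can be sketched roughly as below. This is only an illustration, not the procedure that was actually used: it assumes each checklist line ends with a blc-style code in parentheses such as `(HTTP_429)`, and the names `ROBOT_CODES` and `prune` are made up for the example.

```julia
# Status codes that usually mean "the robot was blocked", not "the page is dead".
# The "(HTTP_429)" spelling is an assumption about how blc prints its codes.
const ROBOT_CODES = ("HTTP_429", "HTTP_999")

# Keep every checklist line except those ending in one of the ignored codes.
# `prune` is a hypothetical helper, not part of the original script.
function prune(lines; ignored = ROBOT_CODES)
    return filter(lines) do line
        !any(code -> endswith(rstrip(line), "($code)"), ignored)
    end
end

# Made-up sample lines in the checklist format.
lines = [
    "  * [ ] https://example.org/missing (HTTP_404)",
    "  * [ ] https://github.com/JuliaLang/julia (HTTP_429)",
    "  * [ ] https://www.linkedin.com/in/someone (HTTP_999)",
]
foreach(println, prune(lines))
```

Links dropped this way are not confirmed to work; they still need a manual look if in doubt.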

Script

(skip this, it's only for maintenance purposes; I might eventually put this in the README)

It uses blc and is not pretty but it kind of does an ok job:

# modify this to wherever your npm installs stuff
const BLC = "/usr/local/Cellar/node/11.10.1/lib/node_modules/broken-link-checker/bin/blc"
# fail early if blc cannot be run at all
@assert success(`$BLC -V`)

# Run blc on one page and print every broken link as an unchecked checklist item.
function check_page(url)
    # capture blc's output in a temporary file
    open("tempf", "w") do outf
        redirect_stdout(outf) do
            try run(`$BLC $url`); catch; end # sometimes BLC does weird stuff
        end
    end
    output = readlines("tempf")
    rm("tempf")
    for line in output
        # only the lines flagged as broken are of interest
        startswith(line, "├─BROKEN─") || continue
        tmp = replace(line, "├─BROKEN─ " => "")
        println("  * [ ] $tmp")
    end
end

# modify this to your local version of the site, assumes you've built it.
const BASE_PATH = "/Users/tlienart/Desktop/www.julialang.org/__site"
# every index.html in the built site corresponds to one page on the live site
for (root, _, files) in walkdir(BASE_PATH)
    for file in files
        file == "index.html" || continue
        # turn the local path into the public URL of the page
        fp = replace(joinpath(root, file), BASE_PATH => "")
        fp = "https://julialang.org" * fp
        println("* [ ] $fp")
        check_page(fp)
    end
end

List of dead links

Notes:

  • the BLC error codes are indicated; 404 is the most obvious one. Some of the others are due to the robot being kicked out, so we might have to check those manually.
  • BLC_UNKNOWN errors are usually related to a link using https instead of http
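
To get a feel for which error codes dominate the list, the codes can be tallied. The sketch below is an illustration under assumptions: it assumes each report line ends with the code in parentheses, e.g. `(HTTP_404)` or `(BLC_UNKNOWN)`, and `error_code` and `tally` are hypothetical helpers rather than anything from the script above.

```julia
# Pull the trailing "(CODE)" out of a report line; returns nothing if absent.
# The "(CODE)" suffix format is an assumption about blc's output.
function error_code(line)
    m = match(r"\(([A-Z0-9_]+)\)\s*$", line)
    return m === nothing ? nothing : String(m.captures[1])
end

# Count how many report lines carry each error code.
function tally(lines)
    counts = Dict{String,Int}()
    for line in lines
        code = error_code(line)
        code === nothing && continue
        counts[code] = get(counts, code, 0) + 1
    end
    return counts
end

# Made-up sample report lines.
report = [
    "  * [ ] https://example.org/a (HTTP_404)",
    "  * [ ] https://example.org/b (HTTP_404)",
    "  * [ ] https://example.org/c (BLC_UNKNOWN)",
    "* [ ] https://example.org/page/",
]
for (code, n) in tally(report)
    println(code, ": ", n)
end
```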

Errors

Chunk 1 (checked)

Chunk 2 (checked)

Chunk 4

Chunk 6

Chunk 8

Chunk 9

Chunk 10

Chunk 11

Chunk 14

Chunk 15

@tlienart
Sponsor Contributor Author

I realise a bunch of these BLC_UNKNOWN errors are due to me switching links from http: to https: en masse. E.g.:

http://www-math.mit.edu/~edelman/ vs https://www-math.mit.edu/~edelman/
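
A scheme mismatch like this one can be probed mechanically by trying the other scheme when a link fails. The following is a minimal sketch, not what was actually run: `swap_scheme` and `working_variant` are hypothetical names, and `is_alive` stands in for whatever check you trust (blc, a browser, an HTTP client).

```julia
# Return the same URL with the other scheme, or nothing if it has neither.
function swap_scheme(url)
    startswith(url, "https://") && return replace(url, "https://" => "http://"; count = 1)
    startswith(url, "http://") && return replace(url, "http://" => "https://"; count = 1)
    return nothing
end

# Return the first variant of `url` that `is_alive` accepts, or nothing if
# neither scheme works. `is_alive` is a placeholder predicate supplied by the caller.
function working_variant(url, is_alive)
    is_alive(url) && return url
    alt = swap_scheme(url)
    alt !== nothing && is_alive(alt) && return alt
    return nothing
end

# Offline demonstration: pretend only the http version of this made-up page responds.
alive = Set(["http://example.org/page"])
println(working_variant("https://example.org/page", in(alive)))
```

Even when the other scheme works, the page may have moved for good, so checking for a newer canonical address is still worthwhile.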

@ViralBShah
Member

For that particular one, the right link is https://math.mit.edu/~edelman/

@tlienart
Sponsor Contributor Author

oh man this is hard work...

@tlienart
Sponsor Contributor Author

Alright, did about half, many of which were due to HTTPS. The other low-hanging fruit is all the GitHub ones with error 429...

@ViralBShah
Member

Thank you for this tireless effort!

@ViralBShah
Member

The github URLs all appear valid. What's going on?

@ViralBShah
Member

We should announce on #website and #general to look for help, but perhaps remove the 429 and 999 URLs from this list?

@tlienart
Sponsor Contributor Author

Yeah, I think GitHub may be strict with robots; I’ll do another pass to do more pruning.

One thing that would be good is advice for dead pages: do we just remove the link, or do we replace it with a note that there was a link that’s now dead?

@ViralBShah
Member

Not sure if it is worth the effort to do more. I think we can just remove.

This was referenced Mar 28, 2020
@tlienart
Sponsor Contributor Author

tlienart commented Mar 28, 2020

no no, not done yet...

@tlienart tlienart reopened this Mar 28, 2020
felixcremer added a commit to felixcremer/www.julialang.org that referenced this issue May 7, 2020
I changed the links to the v0.4 docs so that they can be found again.
I also changed the link to Lindsey Kuper's current website, but I am not sure whether this is wanted, because this is technically the same link as in the old version.
ViralBShah pushed a commit that referenced this issue May 7, 2020
I changed the links to the v0.4 docs so that they can be found again.
I also changed the link to Lindsey Kuper's current website, but I am not sure whether this is wanted, because this is technically the same link as in the old version.
@ViralBShah
Member

Bump. Help appreciated here.

@tlienart
Sponsor Contributor Author

I imagine a few of those are not relevant anymore; it would be good to re-run the script (bottom of the README; I'm on my phone now so I can't do it, but I can try later).

@ashwani-rathee
Sponsor

So all the errors after chunk 4 are still left to be worked on, right?

@tlienart
Sponsor Contributor Author

tlienart commented Oct 28, 2020

yes, that's correct; this list might be a bit outdated now, but basically the steps are:

  1. is the faulty link still there?
  2. yes --> is there an alternative link that works?
    1. yes --> change it in a PR (ideally fix multiple links per PR, though at most 20-30 to facilitate reviewing)
    2. no --> remove the link
  3. no --> reply here with the links that are no longer relevant; I'll update the list

that's about it; then we should re-run the tool to see if there are any stray links left in the mix.
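
The steps above can be written down as a small decision function, together with a helper that splits fixes into PR-sized batches. This is only a sketch of the workflow as described in this comment; `triage` and `batches` are hypothetical names, and the default of 30 links per batch is taken from the "20-30" suggestion.

```julia
# Decide what to do with one reported link, following the steps above.
# `still_present`: is the faulty link still on the site?
# `alternative`: a working replacement URL, or nothing if none is known.
function triage(still_present::Bool, alternative)
    still_present || return :report_as_irrelevant   # step 3
    alternative === nothing && return :remove_link  # step 2.2
    return :replace_in_pr                           # step 2.1
end

# Split a list of fixes into batches of at most `n` links, one batch per PR.
batches(links; n = 30) = collect(Iterators.partition(links, n))

println(triage(true, "https://example.org/new-home"))
println(triage(true, nothing))
println(triage(false, nothing))
println(length(batches(collect(1:65))))
```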

@HarshCasper

Hi @tlienart @ViralBShah

I would like to pick this issue back up. Since it has been open for a very long time now, and major chunks of dead links are still left to be reviewed and fixed, I would like to take it on and clean it up to get the project into better shape.

I have recently learned about Julia and would like to contribute to Julia open source in any way I can. Kindly let me know whether I can start working on it, or whether there is anything else I need to do first.

I am looking to get started with contributing to Julia with this 😄

@ViralBShah
Member

Thanks @HarshCasper. Just open PRs fixing the dead links. You can do so in batches. Don't mix any other changes into the PRs that are fixing the dead links - so that they can be quickly merged.

@ViralBShah
Member

I think we just have to leave the external links as they are. It's impossible to play catch up.
