Canonical urls for deduplication of google results in rustdoc #9461
Comments
Do you know how search engines handle situations where pages go away or pages are just created? In theory old documentation could refer to a canonical location which no longer exists (if the module were removed), and new documentation could refer to canonical locations which do not yet exist (because they're newly added modules). Do you know of special attributes to handle these cases? |
If a module is removed, a 404 is correct. In theory it would be better to redirect on renames, but that's not going to be possible because renames aren't tracked. The point of a canonical URL is to say that the page is only a non-canonical version of another URL and shouldn't show up separately in search. When we eventually have supported versions, the newest release (or master) can be given as the canonical one so the older pages won't clutter search results but will be available via a drop-down menu. Of course, if the newer version does not have the module, you would have to omit the canonical URL, meaning you need to regenerate the old documentation every time you generate the new docs. I don't think it's worth the complexity. |
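For illustration, the rule described in the comment above (only declare a canonical URL when the newest docs still contain the page) could be sketched in Rust roughly as follows. This is a sketch under stated assumptions, not rustdoc code: the function name `canonical_for`, the set of known page paths, and the `https://example.org/current/` root used in the note below are all hypothetical.

```rust
use std::collections::HashSet;

/// Returns the canonical URL for `page_path`, or `None` when the newest docs
/// no longer contain that page, in which case no canonical link should be
/// emitted for the old page.
///
/// `latest_pages` is a hypothetical list of relative page paths present in
/// the newest docs; `latest_root` is a hypothetical base URL for those docs.
fn canonical_for(
    latest_pages: &HashSet<String>,
    latest_root: &str,
    page_path: &str,
) -> Option<String> {
    if latest_pages.contains(page_path) {
        // Normalize the root so exactly one slash joins root and path.
        Some(format!("{}/{}", latest_root.trim_end_matches('/'), page_path))
    } else {
        None
    }
}
```

With `std/vec/index.html` in the set and `https://example.org/current/` as the root, the function yields a URL for that page and `None` for any page that has been removed. The cost mentioned above is visible here: the set of latest pages changes with each release, so old docs would have to be regenerated against it.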
FWIW I think we should only have documentation on the site for releases we still support. Until we get to 1.0, we can make an exception for the last 0.x snapshot :). |
When a module is removed, 404 is indeed correct, but just remember that that's not the end of the story, as I wrote about recently at http://chrismorgan.info/blog/github-links-case-study.html. What the Django docs do is worth considering: https://docs.djangoproject.com/. It makes it easy to switch between versions and shows a warning banner for the development build suggesting you may want to look at the latest stable instead. They don't, however, have a banner reminding you "this isn't the latest stable version" for old versions, which continues to surprise me a little. I reckon old versions (though not before 1.0 after a while) should stay in existence but with a banner at the top indicating that this is an unsupported release, and that docs for the latest version, X.Y, are available in such-and-such a place. Of course, these things become much more directly applicable once we get to 1.0 and beyond. @alexcrichton I guess in the no-longer-exists case you'd need to either implement something so that you can conveniently reprocess the old docs, or do a little bit of post-processing to fix the "errors". For the doesn't-yet-exist case, checking online or comparing crates (which sounds risky) would be the only real ways, I suppose. |
Triage: no change. |
Triage: no changes |
(Sorry for the duplicate; moving the relevant link here.) This page from Google's help center [1] appears to suggest that they only use the canonical URL as a hint. While the page doesn't say so explicitly, this matches behavior I've seen in the past: if the page pointed to by the canonical URL is a 404, Google simply uses the original URL (which I suspect is what we want in this case since it makes it easy: point canonical URLs to the |
I would like to re-open discussion on this issue (@steveklabnik, I think you would be the one to ping). With https://docs.rs/ in place, I think all Google's former SEO representative has indicated that Google may disregard canonical links that result in 404 HTTP response codes. Since Google is by far the most used search engine, and it addressed this issue in a sensible way already, I personally take little issue with the possibility of 404 canonical links. Here's my logic:
|
No need to re-open anything 😄 It's an open issue.
That would be nice, but without some improvements to docs.rs, it's not feasible. There are several people who do extra things to make their docs nicer and explicitly don't want their docs hosted on docs.rs at all. |
Interesting, could you point me to examples of what those people do? |
I believe @briansmith, @retep998 , and @bluss are three of those people? |
I'm perfectly fine with my docs being hosted on docs.rs. I just haven't actually published a new winapi since docs.rs gained Windows support. There are a few features I'm waiting for, like the ability to specify the default target to show docs for and which cargo features to enable. But once that's all set, I'd much rather use docs.rs than have to deal with rustdoc generating a hundred thousand files and then committing them to git and pushing them (which is a really slow process). There are a few copies of winapi documentation floating around the internet from other people's personal project documentation being published, and I really wish they didn't exist because they interfere with search results. Sometimes I'll look up some obscure Windows function and the only results will be someone's rustdoc-generated documentation that happens to include winapi. |
@sanmai-NL A crate needs to be compiled to generate its docs, and the dependencies might not be present on docs.rs's builders, nor is there yet any way to indicate what dependencies to use. My crates should all have migrated their docs to docs.rs except ndarray. ndarray has lots of optional crate features and I want their items to be visible in the docs (and such items are marked in their doc string). It's not a big thing, but ndarray's docs are therefore technically superior outside docs.rs. It also has blue boxes for example code, which is obviously nicer to the eye 😉 And by the way, here's a group of crates where an author has done an amazing job with non-docs.rs docs: http://nalgebra.org/ |
Thanks for the comments. I deduce two extra issues. First, https://docs.rs should provide complete documentation and it should combine well with features and optional dependencies, and it seems not to. Secondly, sometimes https://docs.rs docs should not be the canonical variant anyway. IMO, it should be canonical by default, and this may be overridden with some configuration setting coming with the source tree. |
onur/docs.rs/pull/73 can fix some of these issues |
The optional configuration setting may be a string that is a URL to the canonical API docs. |
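As a rough illustration of the default-plus-override idea in the two comments above, the selection logic might look like the Rust sketch below. Everything here is an assumption made for the example: the function name `canonical_base`, the `override_url` parameter standing in for the proposed configuration setting, and the use of the `https://docs.rs/<crate>/<version>` URL layout as the default.

```rust
/// Picks the base URL that canonical links should point at.
///
/// `override_url` models the hypothetical per-crate setting: a string
/// holding the URL of the crate's preferred API docs. When it is absent,
/// the sketch falls back to a docs.rs URL built from the crate name and
/// version.
fn canonical_base(override_url: Option<&str>, crate_name: &str, version: &str) -> String {
    match override_url {
        // The crate author asked for their own docs to be canonical.
        Some(url) => url.trim_end_matches('/').to_string(),
        // Default: treat docs.rs as the canonical host.
        None => format!("https://docs.rs/{}/{}", crate_name, version),
    }
}
```

Calling `canonical_base(None, "ndarray", "0.9.0")` gives a docs.rs base, while passing `Some("https://example.org/docs/")` returns that URL with the trailing slash trimmed. Keeping the override as a plain optional string matches the suggestion that the setting be nothing more than a URL.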
A while back, I filed https://github.com/onur/docs.rs/issues/74 to have docs.rs include the canonical link, and @onur committed at least one change towards making that happen. https://github.com/onur/docs.rs/issues/73 will help a lot with the current main concrete problem with docs.rs. In the meantime I added a note to my documentation: “IMPORTANT: If you are reading this on docs.rs or another third-party site, you may not be seeing the complete documentation due to their limitations. Read it at https://briansmith.org/rustdoc/ring/signature/ instead.” |
I see the problem that projects may want their official doc pages as the canonical page. Making docs.rs the canonical URL by default might give credit where no credit is due. A stopgap would be a noindex tag for dependencies (#41882). |
The docs.rs stuff feels like a separate discussion that could have its own issue. I think fixing the "every release version on doc.rust-lang.org shows up in Google search results" is a specific thing that's important to fix, and using |
After poking around the rustdoc sources a little bit I have a concrete proposal. rustdoc already supports several options on the We should add support for a It would be picked up and stored into the rust/src/librustdoc/html/render.rs Line 532 in a85417f
Callers of rust/src/librustdoc/html/layout.rs Line 25 in a85417f
This seems to mostly be useful for rust/src/librustdoc/html/render.rs Line 1478 in a85417f
which calls rust/src/librustdoc/html/render.rs Line 1410 in a85417f
which calls rust/src/librustdoc/html/format.rs Line 393 in a85417f
Then finally, |
Triage: there has been some small movement; by now, this issue is getting larger and larger, and is affecting more and more people. I hope to have a plan sometime in the near-ish future; we'll see. |
By the way, the issue with doc.rust-lang.org has been fixed, as we now have a |
I propose closing as a duplicate of rust-lang/docs.rs#1438. |
Closing as this seems fixed/taken on by docs.rs |
When multiple versions of the documentation are available, it tends to pollute google results. As a way to prevent that, it would be good to always have the latest stable release available under /current/, and have all previous versions + the master docs contain canonical links to the current docs like:
That way it consolidates all results under the current URL which will always be correct, and it also encourages people linking to docs in blog posts and such to use links that will not rot.
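The canonical link described above is a `<link rel="canonical">` element in the page's `<head>`. A minimal Rust sketch of generating one is shown below; it is not taken from rustdoc. The function name `canonical_link` and its two parameters are hypothetical, and the sketch assumes both inputs are already URL-safe, so it does no escaping.

```rust
/// Builds a `<link rel="canonical">` tag that points at the same page under
/// the "current" docs root.
///
/// `current_root` (for example a `/current/` URL) and `page_path` (the
/// page's path relative to the docs root) are hypothetical inputs.
fn canonical_link(current_root: &str, page_path: &str) -> String {
    // Trim slashes so exactly one separates the root from the page path.
    let root = current_root.trim_end_matches('/');
    let path = page_path.trim_start_matches('/');
    format!(r#"<link rel="canonical" href="{}/{}">"#, root, path)
}
```

Every versioned copy of a page would emit the same tag, so search engines are told to fold the copies into the single `/current/` URL.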
/cc @alexcrichton