FEATURE: Send a 'canonical' link header in non-canonical responses #16113


@rr-it (Contributor) commented Mar 6, 2022

The html response body already has a canonical link-tag.

This might save crawler resources for non-canonical pages:
If the crawler trusts the additional canonical link response header, it does not need to parse/handle the html response body.
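On the crawler side, the saving comes from reading the canonical URL straight from the response headers, so the HTML body never has to be parsed. The sketch below is a minimal Python illustration under stated assumptions: the function name canonical_from_link_header is hypothetical, and the parser is a simplified reading of the Web Linking header syntax (RFC 8288) that does not handle ">" inside URIs or commas inside quoted parameters.

```python
import re
from typing import Optional

# One link-value: <URI-Reference> followed by ";"-separated parameters.
# Simplified on purpose: assumes the URI has no ">" and params have no commas.
_LINK_VALUE = re.compile(r'<([^>]*)>\s*((?:;\s*[^;,]+)*)')


def canonical_from_link_header(value: str) -> Optional[str]:
    """Return the target of the first rel="canonical" link in a Link header value."""
    for match in _LINK_VALUE.finditer(value):
        target, params = match.group(1), match.group(2)
        for param in params.split(";"):
            name, _, raw = param.strip().partition("=")
            if name.strip().lower() != "rel":
                continue
            # rel may be quoted and may hold several space-separated relation types.
            rels = raw.strip().strip('"').lower().split()
            if "canonical" in rels:
                return target
    # No canonical link: the crawler would have to fall back to the HTML body.
    return None


header = '<https://forum.example.com/t/test-example/1234>; rel="canonical"'
print(canonical_from_link_header(header))
```

A production crawler would more likely use a full RFC 8288 parser; the point here is only that the canonical target is available without downloading or parsing the body.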

@rr-it (Contributor, Author) commented Mar 6, 2022

See https://developers.google.com/search/docs/advanced/crawling/consolidate-duplicate-urls?hl=en#rel-canonical-header-method

If you can configure your server, you can use a rel="canonical" HTTP header (rather than an HTML tag) to indicate the canonical URL for a document supported by Search, including non-HTML documents such as PDF files.

  • 👍 We can configure our server.
  • Does the wording "use a rel="canonical" HTTP header (rather than an HTML tag)" express a preference for the HTTP header over the HTML tag?

From #11553

Googlebot handles no-index headers very elegantly. It advises to leave as many routes as possible open and uses headers for high fidelity rules regarding indexes.

Maybe Google handles canonical link headers as elegantly as it handles noindex headers.

@discoursebot

This pull request has been mentioned on Discourse Meta. There might be relevant details there:

https://meta.discourse.org/t/send-canonical-link-header-instead-of-noindex-header/220213/1

@xfalcox (Member) left a comment

It doesn't make sense to roll back the noindex feature like this.

@xfalcox xfalcox closed this Mar 8, 2022
@rr-it (Contributor, Author) commented Mar 8, 2022

noindex is not rolled back by this change.

With or without the patch, and with the new default SiteSetting.allow_indexing_non_canonical_urls = false:

  • header noindex
  • html link-tag canonical (might be ignored)

Without the patch and with SiteSetting.allow_indexing_non_canonical_urls = true:

  • – no header –
  • html link-tag canonical

With the patch and with SiteSetting.allow_indexing_non_canonical_urls = true:

  • header: Link: <https://forum.example.com/t/test-example/1234>; rel="canonical"
  • html link-tag canonical (might be ignored, but it matches the header anyway)
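The three cases above can be summarized as a small decision over the setting and the patch. This is a hypothetical Python sketch, not Discourse's actual code: the function name non_canonical_response_headers and the patched flag are made up for illustration, and it assumes the noindex header is sent as "X-Robots-Tag: noindex". It models only the extra headers on a non-canonical response; the HTML canonical link tag is present in every case and is left out.

```python
from typing import Dict


def non_canonical_response_headers(canonical_url: str,
                                   allow_indexing_non_canonical_urls: bool,
                                   patched: bool) -> Dict[str, str]:
    """Extra headers for a request to a non-canonical URL (hypothetical sketch)."""
    if not allow_indexing_non_canonical_urls:
        # New default (false): noindex header, with or without the patch.
        return {"X-Robots-Tag": "noindex"}
    if patched:
        # Setting true + patch: advertise the canonical URL in a Link header.
        return {"Link": f'<{canonical_url}>; rel="canonical"'}
    # Setting true, no patch: no extra header; only the HTML link tag remains.
    return {}


url = "https://forum.example.com/t/test-example/1234"
print(non_canonical_response_headers(url, True, True))
```

Written this way, the patch only changes the branch where indexing of non-canonical URLs is allowed; the noindex branch is untouched.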

@discoursebot

This pull request has been mentioned on Discourse Meta. There might be relevant details there:

https://meta.discourse.org/t/send-canonical-link-header-instead-of-noindex-header/220213/6
