Add "nofollow" rel attribute to links in facets/filters in PUI #1797

Merged
merged 1 commit into from
Feb 26, 2020

Conversation

andrew-morrison
Contributor

This change asks search engine crawlers not to follow the links in facets. Such filtered search pages are not information sources themselves, and do not have distinct titles, so I don't think it is desirable for them to appear in search results.

Description

Soon after we launched, we noticed that Amazonbot was the first crawler to hit our public user interface. It honours robots.txt instructions but does not support sitemaps, so it simply spiders through every link in the HTML. Unfortunately, that meant it spent most of its time trying every combination of facets in the collections list until it reached some kind of preset limit and shut down. Instead of throwing wave after wave of search results pages at it, this change lets it reach every collection (and subject, agent, etc.) by following the pagination links, without wasting time on the links in the filters.
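To illustrate the intended effect, here is a minimal Python sketch of how a crawler that respects the rel attribute might choose which links to follow. It is not the code in this pull request, and the markup, CSS class names, and URLs in SAMPLE_PAGE are made up for the example rather than copied from the PUI templates.

```python
from html.parser import HTMLParser


class FollowableLinks(HTMLParser):
    """Collect the hrefs that a rel-respecting crawler would follow."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        # rel is a space-separated token list, e.g. rel="nofollow noopener"
        rel_tokens = (attr.get("rel") or "").lower().split()
        if href and "nofollow" not in rel_tokens:
            self.links.append(href)


def followable_links(html):
    parser = FollowableLinks()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page fragment: two facet links, one pagination link, one record link.
SAMPLE_PAGE = """
<ul class="facets">
  <li><a rel="nofollow" href="/search?filter=subject:Maps">Maps</a></li>
  <li><a rel="nofollow" href="/search?filter=agent:Smith">Smith</a></li>
</ul>
<nav class="pagination"><a href="/search?page=2">Next</a></nav>
<a href="/collections/1">A collection</a>
"""

if __name__ == "__main__":
    for link in followable_links(SAMPLE_PAGE):
        print(link)
```

In this sketch the two facet links are skipped, while the pagination link and the collection link are kept, so every record stays reachable by paging through the unfiltered list.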

Googlebot does not support the rel attribute for internal links, only for links out to other web sites, but it seems to be smart enough not to follow facet links anyway (or maybe that is because we submit sitemaps via the Google Search Console). Bingbot does support it. So this is not a definitive solution, but stopping even a few crawlers will reduce some load.

Note that if pull requests #1778 and #1792 are approved, they will increase the number of potential permutations of facets that can be applied.
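As a rough illustration of why extra facets matter, the sketch below counts the distinct filtered pages under a simplified model in which a crawler applies at most one value per facet. The facet counts are invented for the example and are not measurements from our system; the real PUI may allow more combinations than this model assumes.

```python
from math import prod


def filtered_page_count(values_per_facet):
    """Number of distinct filter combinations, excluding the unfiltered page.

    Simplified model (an assumption): each facet contributes either no
    filter or exactly one of its values, and the order of filters is ignored.
    """
    return prod(n + 1 for n in values_per_facet) - 1


if __name__ == "__main__":
    # Hypothetical: three facets with ten values each, then a fourth added.
    print(filtered_page_count([10, 10, 10]))      # 1330
    print(filtered_page_count([10, 10, 10, 10]))  # 14640
```

Under this model each additional facet multiplies the number of crawlable filtered pages rather than adding to it, which is why a crawler that follows facet links can exhaust its budget before reaching the records themselves.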

Related JIRA Ticket or GitHub Issue

N/A

How Has This Been Tested?

This change has been on our production system as part of a local plug-in for six months, with no issues.

Screenshots (if appropriate):

N/A

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have read the CONTRIBUTING document.
  • I have authority to submit this code.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@lmcglohon
Contributor

@andrew-morrison thanks for this! We are currently in a code freeze because of a release candidate going out today. Once the release is out, a reviewer will be assigned so it will continue to move forward.

@lorawoodford added this to the 2.8.0 milestone on Feb 18, 2020
@sdm7g merged commit aa91f8c into archivesspace:master on Feb 26, 2020
@cdibella added the "community" label (code contributed by community members not on or contracted by the ArchivesSpace program team) on Mar 6, 2020
5 participants