Site Crawler incorrectly skips links when robots contains noindex #2989
Labels
Priority: Low
Affects a small number of Rock installations and will not be noticed by most users.
Status: Confirmed
It's clear what the subject of the issue is about, and what the resolution should be.
Topic: Rock Internals
Related to internal core stuff.
Type: Bug
Confirmed bugs or reports that are very likely to be bugs.
x-Fixed in v8.0
Prerequisites
Description
The Rock Site Crawler that came with Universal Search will not follow links if the robots noindex option is specified. This is incorrect: links should be followed unless the robots meta tag specifies the nofollow option, or the link itself has a rel="nofollow" attribute.
I can submit a PR for this.
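To illustrate the distinction, here is a minimal Python sketch of the intended rules. It is not Rock's crawler code (which is C#); the names RobotsAwareParser and crawl_decision are hypothetical and exist only for this example. It assumes the robots directives are a comma-separated list in a meta tag named "robots".

```python
from html.parser import HTMLParser


class RobotsAwareParser(HTMLParser):
    """Hypothetical helper: collects robots directives and followable links."""

    def __init__(self):
        super().__init__()
        self.directives = set()  # tokens from <meta name="robots" content="...">
        self.links = []          # hrefs of <a> tags without rel="nofollow"

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        attr = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                if token.strip():
                    self.directives.add(token.strip().lower())
        elif tag == "a" and attr.get("href"):
            # rel is a space-separated token list; skip only this link.
            if "nofollow" not in attr.get("rel", "").lower().split():
                self.links.append(attr["href"])


def crawl_decision(html):
    """Return (should_index, links_to_follow) for a single page."""
    parser = RobotsAwareParser()
    parser.feed(html)
    parser.close()
    # noindex controls indexing only; nofollow controls link following only.
    should_index = "noindex" not in parser.directives
    should_follow = "nofollow" not in parser.directives
    return should_index, (parser.links if should_follow else [])
```

With these rules, a page marked noindex that contains one plain link and one rel="nofollow" link is not indexed, but its plain link is still queued for crawling.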
Suggested Action
Add support for the nofollow flag in the robots meta tag. At the same time, update the ParseLinks method to check for rel="nofollow" in the link and, if found, skip that individual link.
Expected behavior:
I want to build a page that links to other pages that should be indexed but would not normally be found during a site crawl (for example, event pages whose links only show up after clicking a PostBack button, which cannot be indexed).
Additionally, there are a few pages on the site that we don't want indexed because they are little more than menu/link-only pages.
Actual behavior:
These link-only pages are indexed because I cannot have the crawler follow a page's links without also indexing the page itself.
Versions