Crawler ignores rel="" and robots.txt (broken links checker shows thousands of broken links) #2813
You can add the skip broken link checker attribute to those links.
This is the desired behaviour. The broken link checker does not care about rel="nofollow" or robots.txt.
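A rough sketch of what that could look like in a template. The attribute name used here is an assumption on my part; the exact name of the skip attribute referred to further down in the thread should be taken from the Contao documentation:

```html
<!-- Sketch of a share link. rel="nofollow" alone is not enough, because the
     broken link checker ignores it (see above). The skip attribute name below
     is assumed – check the Contao docs for the attribute @Toflar added. -->
<a href="https://www.linkedin.com/shareArticle?url=https://example.com/news/my-article"
   rel="nofollow"
   data-skip-broken-link-checker>
  Share on LinkedIn
</a>
```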
Why are they marked as broken then? They should be valid links that return a 200 response, IMO.
Yeah, you would need to analyse the reason. I suspect that the respective sites may be blocking the crawler, either because of the number of requests or because of the User-Agent.
LinkedIn, for example, returns a 999 on crawl; that's why @Toflar added the skip broken link checker attribute.
Oh nice, let's break the web. 🤦
LinkedIn 🤷‍♂️
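One way to analyse the reason, as suggested above, is to request the URL yourself with different User-Agent headers and look at the raw status code. This is only a sketch using Symfony's HttpClient; the URL and User-Agent strings are made up and are not what Contao's crawler actually sends:

```php
<?php
// Sketch only: check what status code a URL returns for different User-Agents.

use Symfony\Component\HttpClient\HttpClient;

require __DIR__.'/vendor/autoload.php';

$url = 'https://www.linkedin.com/shareArticle?url=https%3A%2F%2Fexample.com%2F';

foreach (['Mozilla/5.0 (X11; Linux x86_64) Firefox/85.0', 'MyCrawler/1.0'] as $userAgent) {
    $client = HttpClient::create(['headers' => ['User-Agent' => $userAgent]]);

    // getStatusCode() returns the raw code without throwing on 4xx/5xx,
    // so a non-standard answer such as LinkedIn's 999 shows up as-is.
    $status = $client->request('GET', $url)->getStatusCode();

    printf("%-45s => %d\n", $userAgent, $status);
}
```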
Maybe the broken link checker should only check for |
Yeah, you would likely want to know if something is |
Or just a |
There's no point in discussing that. We won't implement workarounds for companies that think it's a good way to invent their own HTTP status codes. It's been a standard for what, the better part of 30 years?
Just ignore all above |
If the response code is not |
Maybe LinkedIn uses the "unofficial" 999 status code.
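For illustration only, since the maintainers have already ruled out a workaround above: "official" status codes are the ones in the registered 100–599 range, so a check along these lines is what the suggestion amounts to:

```php
<?php
// Sketch: 999 (as reportedly returned by LinkedIn) falls outside the
// registered 100–599 range of HTTP status codes.
function isRegisteredStatusCode(int $code): bool
{
    return $code >= 100 && $code < 600;
}

var_dump(isRegisteredStatusCode(404)); // bool(true)
var_dump(isRegisteredStatusCode(999)); // bool(false)
```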
Affected version(s)
Contao 4.9
Description
The social sharing links out of the box have rel="nofollow", and just to make sure I also added Disallow: /_contao/ to the robots.txt. Still, the crawler follows the links and marks the share links as broken. This leads to the following result right now:
[screenshot]
The website has about 520 URLs, including news and events.
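For reference, the robots.txt rule described in the report would look roughly like this (only the Disallow line is quoted from the description; the User-agent line is assumed):

```text
User-agent: *
Disallow: /_contao/
```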