
Lighthouse unable to download robots.txt #10225

Closed
Krinkle opened this issue Jan 11, 2020 · 5 comments · Fixed by #12423

Comments


Krinkle commented Jan 11, 2020

To Reproduce
Steps to reproduce the behavior:

  1. Go to https://lighthouse-dot-webdotdevsite.appspot.com//lh/html?url=https://www.wikipedia.org/#seo

Describe the bug

I'm getting a lowered SEO score when measuring via https://web.dev, with the following issue reported:

SEO: 90
robots.txt is not valid: Lighthouse was unable to download a robots.txt file

However, https://www.wikipedia.org/robots.txt responds without issue.

Perhaps it is getting blocked by something in the middleware, e.g. an Inspector rule of some kind in the way that web.dev/PSI configures Lighthouse? See also #10198, which might be similar.

@robdodson robdodson transferred this issue from GoogleChrome/web.dev Jan 13, 2020
@robdodson robdodson added the PSI/LR PageSpeed Insights and Lightrider label Jan 13, 2020

exterkamp commented Jan 14, 2020

I can replicate this in DevTools and the CLI, so it's not a PSI/LR bug at least.

Edit: At a cursory glance, it might be because https://wikipedia.org/robots.txt is returning 301s?

I can audit https://en.wikipedia.org and the robots.txt is found. We don't follow redirect chains when fetching robots.txt. Possible feature?
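
To check the redirect behavior outside Lighthouse, here is a quick reproduction sketch (illustrative only, not Lighthouse's actual gatherer code), using the global fetch shipped in Node 18+:

```ts
// Reproduction sketch: fetch robots.txt with the default redirect policy
// and report whether a redirect chain was followed.
// Assumption: this stands in for whatever fetch Lighthouse performs.
async function checkRobots(url: string): Promise<void> {
  const res = await fetch(url, {redirect: 'follow'}); // 'follow' is the default
  console.log(`status=${res.status} redirected=${res.redirected} final=${res.url}`);
}

checkRobots('https://wikipedia.org/robots.txt').catch(console.error);
```

If the 301 is followed, res.redirected is true and res.url shows the final URL; a failure here would point at the fetch itself rather than at Lighthouse.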

@exterkamp exterkamp removed the PSI/LR PageSpeed Insights and Lightrider label Jan 14, 2020
@exterkamp exterkamp changed the title Lighthouse via web.dev unable to download robots.txt Lighthouse unable to download robots.txt Jan 14, 2020
connorjclark (Collaborator) commented:

Strange, the redirect option for fetch is set to follow by default, so it should follow redirects.

https://developer.mozilla.org/en-US/docs/Web/API/WindowOrWorkerGlobalScope/fetch

churchthecat commented:

I have the same issue ("robots.txt is not valid: Lighthouse was unable to download a robots.txt file") with https://tcckonsult.com/robots.txt

churchthecat commented:

Found the issue for me: the CSP policy was set with Header set Content-Security-Policy "default-src 'none'"; changing it to Header set Content-Security-Policy "default-src 'self'" fixed it.

martinburger commented:

> Found the issue for me: the CSP policy was set with Header set Content-Security-Policy "default-src 'none'"; changing it to Header set Content-Security-Policy "default-src 'self'" fixed it.

Mozilla Observatory suggests that you deny by default, using default-src 'none' instead of default-src 'self'. Thus, you might follow the suggestion in this answer on Stack Overflow and use connect-src 'self' instead.

That way, you keep a high(er) score on Mozilla Observatory and have a passable workaround (a sketch follows below).
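
A minimal sketch of this workaround, using Node's built-in http module as a stand-in for the Apache Header set directive quoted above (the CSP directives are from the thread; the server itself is hypothetical):

```ts
// Workaround sketch: keep the strict default-src 'none' baseline but add
// connect-src 'self', so an in-page fetch of /robots.txt is not blocked
// by the page's CSP.
import {createServer} from 'node:http';

createServer((req, res) => {
  res.setHeader(
    'Content-Security-Policy',
    "default-src 'none'; connect-src 'self'",
  );
  res.setHeader('Content-Type', 'text/plain');
  res.end(req.url === '/robots.txt' ? 'User-agent: *\nAllow: /\n' : 'ok\n');
}).listen(8080);
```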

BTW, this seems to be related to issue #4386.
