Lighthouse on web.dev fails to download robots.txt due to CSP violation #13394
The in-page legacy robots.txt fetcher is used for Lightrider, but it can get blocked by CSPs. When running Lighthouse in DevTools/CLI, we use the DevTools protocol to fetch the robots.txt. Unfortunately, the new protocol fetcher fails most (100%?) of the time in Lightrider (#12936). Relevant fork in the code: lighthouse-core/gather/fetcher.js, lines 95 to 106 at fe3daf0.
Have there been any updates on using the protocol fetcher since #12936? |
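For readers landing here later: the legacy fetcher mentioned above runs inside the inspected page itself, so it is subject to that page's Content-Security-Policy, whereas the protocol fetcher requests the file through the browser instead. Below is a minimal sketch of the in-page idea — not Lighthouse's actual implementation (which at the time used an injected iframe), and the function name is hypothetical:

// Sketch only, not Lighthouse's real fetcher. Because this runs in the
// page's JavaScript context, the page's CSP applies: a connect-src that
// does not include 'self' makes even this same-origin fetch() reject.
async function fetchRobotsTxtInPage() {
  try {
    const response = await fetch(new URL('/robots.txt', location.href));
    return {status: response.status, content: await response.text()};
  } catch (err) {
    // A CSP violation surfaces here as a TypeError.
    return {status: -1, content: null, errorMessage: err.message};
  }
}

This is also why adding connect-src 'self' to a page's CSP (discussed further down in this thread) appears to make the audit pass: it explicitly permits same-origin fetches from page context.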
b/199294474 |
Update: https://chromium-review.googlesource.com/c/chromium/src/+/3291970 just landed, but we need to wait for it to hit the stable channel before using it in LR. That is ~Feb 1. |
PSI is using M98 now; can we remove the legacy iframe fetcher for this case? |
This is still an issue on PageSpeed Insights. |
Above I mentioned that it would be at least February 1st before this could be fixed. |
That comment is dated 2021; did you mean Feb 2023 at that time? |
Sorry @davidmurdoch, I was a bit hasty in my reply and overlooked the dates here. We've got a fix for this in a staging environment (this work was done Sept 2022), so this should be resolved with our 10.0 release to PSI within the coming weeks. |
As of today (27 August 2023), it seems the issue of Lighthouse being unable to access robots.txt has not been properly fixed. However, the workaround is very easy. In your Content-Security-Policy header, you can include the following:
connect-src 'self';
After that, run:
nginx -t
If there are no errors:
systemctl restart nginx
Good luck! |
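For context, on Nginx such a header is typically set with add_header inside the server block. A minimal sketch, assuming Nginx is the web server; the hostname and the rest of the policy are illustrative, and only the connect-src 'self' part is the workaround described above:

server {
    listen 443 ssl;
    server_name example.com;  # hypothetical host

    # Only connect-src 'self' is the workaround; the rest is illustrative.
    add_header Content-Security-Policy "default-src 'self'; connect-src 'self'" always;
}

The always parameter makes Nginx send the header on error responses as well as on 2xx/3xx ones.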
Hi Santosh,
Great news. Thank you for the quick fix.
Just to add a note for folks that might retrieve this thread in the future - I don't think suggestions for workarounds that reduce the security of the site are warranted. It seems perverse that a tool like Lighthouse would force you to reduce the security profile of your site for the tool to work properly.
--
JP
|
@beatquantum can you provide a site that reproduces this problem? |
@adamraine - I have included connect-src 'self' as the workaround for https://hard.email, without which Lighthouse did not find robots.txt. If you wish, we can coordinate: I can remove connect-src and share the Nginx log entries with you. Happy to help make this work. (JP - You are right. I wish I did not have to create this workaround. But the developers seem to be a nice bunch.) |
If you updated https://hard.email/ with a workaround, can you please share a version of the site that reproduces this problem? There isn't much we can do without a repro case. I can't reproduce this problem on other sites that use |
URL
https://ettyproject.org/
What happened?
When using https://web.dev/measure, I receive an error message under Best Practices saying that Lighthouse was unable to download robots.txt.
This does not occur with Lighthouse in Chrome DevTools.
The referenced website does not have any JavaScript. Loading the page normally in a browser does not show any CSP violation, either in the browser console or on the server. The robots.txt file can be downloaded, as you can easily test.
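One way to verify this from the command line is to request the file's headers directly; a 200 response shows the file is reachable, and any Content-Security-Policy header on the response will appear in the output:

# Fetch only the response headers for robots.txt.
curl -sI https://ettyproject.org/robots.txt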
What did you expect?
Lighthouse should be able to download the robots.txt file.
What have you tried?
How were you running Lighthouse?
WebPageTest
Lighthouse Version
9.0.0
Chrome Version
96
Node Version
No response
OS
Linux
Relevant log output