
Lighthouse on web.dev fails to download robots.txt due to CSP violation #13394

Closed

jplejacq-quoininc-com opened this issue Nov 18, 2021 · 13 comments

Labels: bug, P1.5, PSI/LR (PageSpeed Insights and Lightrider)

@jplejacq-quoininc-com

URL

https://ettyproject.org/

What happened?

When using https://web.dev/measure I receive an error message under Best Practices:

Issues logged to the `Issues` panel in Chrome Devtools indicate unresolved problems. They can come from network request failures, insufficient security controls, and other browser concerns. Open up the Issues panel in Chrome DevTools for more details on each issue.

Issue type
--
Content security policy
/robots.txt(ettyproject.org)

This does not occur with Lighthouse in Chrome DevTools.

The referenced website does not use any JavaScript. Loading it normally in a browser does not show any CSP violation, either in the browser console or on the server. The robots.txt file can be downloaded, as you can easily verify.

What did you expect?

Lighthouse should be able to download the robots.txt file.

What have you tried?

  • Downloaded robots.txt with curl and Chrome. No problems.
  • No CSP violations reported in the Chrome console.
  • No CSP violations reported on the server.
  • Checked for other related bugs - see issues #4386 and #10225.

How were you running Lighthouse?

WebPageTest

Lighthouse Version

9.0.0

Chrome Version

96

Node Version

No response

OS

Linux

Relevant log output

Issues logged to the `Issues` panel in Chrome Devtools indicate unresolved problems. They can come from network request failures, insufficient security controls, and other browser concerns. Open up the Issues panel in Chrome DevTools for more details on each issue.

Issue type
--
Content security policy
/robots.txt(ettyproject.org)
@adamraine (Member)

The in-page legacy robots.txt fetcher is used for Lightrider, but it can get blocked by CSPs. When running Lighthouse in DevTools/CLI, we use the DevTools protocol to fetch the robots.txt.

Unfortunately, the new protocol fetcher fails most (100%?) of the time in Lightrider (#12936).

Relevant fork:

```js
/**
 * `Network.loadNetworkResource` was introduced in M88.
 * The long timeout bug with `IO.read` was fixed in M92:
 * https://bugs.chromium.org/p/chromium/issues/detail?id=1191757
 * Lightrider has a bug forcing us to use the old version for now:
 * https://docs.google.com/document/d/1V-DxgsOFMPxUuFrdGPQpyiCqSljvgNlOqXCtqDtd0b8/edit?usp=sharing&resourcekey=0-aIaIqcHFKG-0dX4MAudBEw
 * @return {Promise<boolean>}
 */
async shouldUseLegacyFetcher() {
  const {milestone} = await getBrowserVersion(this.session);
  return milestone < 92 || Boolean(global.isLightrider);
}
```
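
For context, here is a rough sketch of what a protocol-based fetch of robots.txt looks like, using the real CDP commands `Network.loadNetworkResource` and `IO.read`. This is illustrative only and is not the actual Lighthouse implementation; the `chrome-remote-interface` client and the frame lookup are assumptions. Because the request is issued by the browser over the protocol rather than by script running inside the page, the page's CSP cannot block it.

```js
// Illustrative sketch: fetch robots.txt over the DevTools protocol.
// Assumes Chrome is running with --remote-debugging-port=9222.
const CDP = require('chrome-remote-interface');

async function fetchRobotsTxtOverProtocol(pageUrl) {
  const client = await CDP();
  const {Page, Network, IO} = client;
  try {
    await Page.enable();
    // Use the inspected page's main frame as the loading context.
    const {frameTree} = await Page.getFrameTree();
    const frameId = frameTree.frame.id;

    const robotsUrl = new URL('/robots.txt', pageUrl).href;
    const {resource} = await Network.loadNetworkResource({
      frameId,
      url: robotsUrl,
      options: {disableCache: true, includeCredentials: false},
    });
    if (!resource.success || !resource.stream) return null;

    // The response body comes back as an IO stream; read it to EOF.
    let body = '';
    let eof = false;
    while (!eof) {
      const chunk = await IO.read({handle: resource.stream});
      body += chunk.base64Encoded
        ? Buffer.from(chunk.data, 'base64').toString('utf8')
        : chunk.data;
      eof = chunk.eof;
    }
    await IO.close({handle: resource.stream});
    return body;
  } finally {
    await client.close();
  }
}
```
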

Have there been any updates on using the protocol fetcher since #12936?

@adamraine added the PSI/LR (PageSpeed Insights and Lightrider) and P1.5 labels and removed needs-priority on Nov 18, 2021
@connorjclark (Collaborator)

b/199294474

@connorjclark (Collaborator) commented Nov 20, 2021

Update: https://chromium-review.googlesource.com/c/chromium/src/+/3291970 just landed, but we need to wait for it to hit the stable channel before using it in LR. That is ~Feb 1.

@adamraine (Member)

PSI is using M98 now; are we able to remove the legacy iframe fetcher for this case?

@davidmurdoch
This is still an issue on Page Speed Insights.

@connorjclark (Collaborator)

Above I mentioned that it will be at least until February 1st before this can be fixed.

@davidmurdoch
That comment is dated 2021; did you mean Feb 2023 at that time?

@connorjclark (Collaborator) commented Jan 31, 2023

Sorry @davidmurdoch, I was a bit hasty in my reply and overlooked the dates here. We've got a fix for this in a staging environment (this work was done Sept 2022), so this should be resolved with our 10.0 release to PSI within the coming weeks.

@beatquantum

As of today (27 August 2023), it seems the issue of Lighthouse being unable to access robots.txt has not been properly fixed. However, the workaround is very easy. In your Content-Security-Policy header, you can include the following:

connect-src 'self';

After that, test the configuration:

nginx -t

If there are no errors, restart nginx:

systemctl restart nginx
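
For reference, a minimal sketch of where such a header might live in an nginx server block; the server name, root, and the rest of the policy below are placeholders, not a recommendation:

```nginx
# Illustrative only: a CSP that includes connect-src 'self'.
# Server name, root, and the other directive values are hypothetical.
server {
    listen 443 ssl;
    server_name example.com;
    root /var/www/example;

    add_header Content-Security-Policy "default-src 'self'; connect-src 'self'" always;
}
```
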

Good luck!

@jplejacq-quoininc-com (Author) commented Aug 28, 2023 via email

@adamraine (Member)

@beatquantum can you provide a site that reproduces this problem?

@beatquantum

@adamraine - I have included `connect-src 'self'` as the workaround for https://hard.email, without which Lighthouse did not find robots.txt.

If you wish, we can coordinate: I can remove `connect-src` and share the Nginx log entries with you.

Happy to help to make this work.

(JP - You are right. I wish I did not have to create this workaround. But the developers seem to be a nice bunch.)

@adamraine (Member)

If you updated https://hard.email/ with a workaround, can you please share a version of the site that reproduces the problem? There isn't much we can do without a repro case.

I can't reproduce this problem on other sites that use connect-src 'none' in PSI or Lighthouse.
