
robots.txt is not valid - Lighthouse was unable to download a robots.txt file #12936

Closed
2 tasks done
owaishassan72 opened this issue Aug 18, 2021 · 8 comments
Assignees
Labels
bug needs-priority PSI/LR PageSpeed Insights and Lightrider

Comments

@owaishassan72

URL

zomusk.com

What happened?

I was running Lighthouse on my website and saw the error "robots.txt is not valid", even though it is not. I have tested my robots.txt file and it is fine and meets Google's expectations.

What did you expect?

I expected to not face this issue as there is no issue with my robots.txt file.

What have you tried?

No response

How were you running Lighthouse?

Chrome DevTools

Lighthouse Version

7.5.0

Chrome Version

No response

Node Version

No response

Relevant log output

No response

@asbhogal

Hi @owaishassan72, I've just run the Lighthouse PSI from my end using v8.0 and it confirms that your robots.txt is valid (please see the screenshot). Given that, I assume it may be an issue with the version of Lighthouse you're using. Try updating to v8.0 and see if it recognises the file on your end.

Hope this helps,
Aman
(Screenshot: valid robots.txt)

@AleksandrHovhannisyan

I'm seeing this on web.dev/measure as well: https://lighthouse-dot-webdotdevsite.appspot.com//lh/html?url=https%3A%2F%2Fwww.aleksandrhovhannisyan.com%2F


@asbhogal

asbhogal commented Aug 31, 2021

Hi Aleksandr,

I've run your site through the DevTools Lighthouse (v7.5) and it detects your robots.txt fine:

(Screenshot: Lighthouse result for www.aleksandrhovhannisyan.com)

I've also run it through the PSI extension (v8.0) and I'm getting the same message you are. In your case it seems to be the other way round: I'm not sure why PSI isn't detecting your robots.txt, as the syntax looks correct to me. I've also checked it with Ryte's scanner and it crawls it fine. You can verify it with Google's robots.txt tester and other site scanners too, to be sure. Hopefully Google can diagnose what the issue is @connorjclark

Hope this helps,
Aman

@brendankenny
Member

Yes, this looks like a new PSI issue. From HTTPArchive/httparchive.org#422, https://httparchive.org/ is also failing. Confirmed with a direct API request to 'https://pagespeedonline.googleapis.com/pagespeedonline/v5/runPagespeed?url=https%3A%2F%2Fhttparchive.org%2F&category=SEO'.
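The direct API check above can be scripted in a few lines. A sketch using Node 18+'s built-in fetch; the audit id 'robots-txt' and the response shape (lighthouseResult.audits) are assumptions based on the public PSI v5 API, not details taken from this thread:

```javascript
// Build the same PSI v5 request quoted above.
const endpoint = new URL(
  'https://pagespeedonline.googleapis.com/pagespeedonline/v5/runPagespeed'
);
endpoint.searchParams.set('url', 'https://httparchive.org/');
endpoint.searchParams.set('category', 'SEO');

// Assumed audit id/shape: returns e.g. {id: 'robots-txt', score: 0, ...}
// when the fetch of robots.txt failed on the Lightrider side.
async function getRobotsTxtAudit() {
  const res = await fetch(endpoint);
  const body = await res.json();
  return body.lighthouseResult.audits['robots-txt'];
}
```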

PSI updated to Chrome 92 yesterday, so this is hitting a new path (for PSI) in the RobotsTxt gatherer:

const {milestone} = await getBrowserVersion(passContext.driver.defaultSession);
// TODO: Remove when 92 hits stable.
// Iframe fetcher still has issues with CSPs.
// Only use the fetcher if we are fetching over the CDP.
if (milestone < 92) {
  return passContext.driver.executionContext.evaluate(getRobotsTxtContent, {
    args: [],
    useIsolation: true,
  });
}
const robotsUrl = new URL('/robots.txt', passContext.url).href;
await passContext.driver.fetcher.enable();
return passContext.driver.fetcher.fetchResource(robotsUrl)
  .catch(() => ({status: null, content: null}));
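For readers unfamiliar with the gatherer, the post-92 branch boils down to resolving the site-root robots.txt URL and fetching it, with any failure collapsed to nulls. A minimal standalone sketch of that behaviour (function names here are illustrative, not the gatherer's API; Node 18+ fetch assumed):

```javascript
// Resolve /robots.txt against the page URL: a rooted path resolves
// against the origin, so deep page URLs still map to the site root.
function robotsTxtUrl(pageUrl) {
  return new URL('/robots.txt', pageUrl).href;
}

// Illustrative stand-in for the fetcher path above: any fetch failure
// becomes {status: null, content: null}, which the audit then reports
// as "Lighthouse was unable to download a robots.txt file".
async function fetchRobotsTxt(pageUrl) {
  try {
    const response = await fetch(robotsTxtUrl(pageUrl));
    return {status: response.status, content: await response.text()};
  } catch {
    return {status: null, content: null};
  }
}
```

This is why a perfectly valid robots.txt can still fail the audit: the failure is in the fetch itself, not in parsing the file.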

@mercadeowebcr

Started seeing this error yesterday. It has now disappeared on one site, https://geek.pontik.com, but still appears on the subdomain https://radio.pontik.com and on the main site https://pontik.com. These are on separate servers.

@brendankenny brendankenny added the PSI/LR PageSpeed Insights and Lightrider label Sep 1, 2021
@brendankenny
Member

This should be fixed for everyone now. Keeping open to track a permanent fix.

@nzsteve

nzsteve commented Nov 1, 2021

I am still seeing this issue with all WordPress Multisite installs, even though there is a robots.txt file that can be viewed in the browser and Google's robots.txt testing tool reports no issues.

Example robots.txt content for https://bloxhams.co.nz/ (served at https://bloxhams.co.nz/robots.txt):
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Screenshot from Chrome 95 Lighthouse 8.4
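For what it's worth, the file above uses only universally recognised directives. A rough line-based syntax check in the spirit of what validators do (an illustrative sketch, not Lighthouse's actual parser; the directive list is an assumption):

```javascript
// Directives commonly accepted by robots.txt validators (assumed list).
const KNOWN_DIRECTIVES = new Set([
  'user-agent', 'disallow', 'allow', 'sitemap', 'crawl-delay', 'host',
]);

// Scan each line: blank lines and comments are fine; anything else
// must be "directive: value" with a known directive name.
function findSyntaxErrors(content) {
  const errors = [];
  content.split('\n').forEach((line, i) => {
    const withoutComment = line.replace(/#.*$/, '').trim();
    if (!withoutComment) return;
    const colon = withoutComment.indexOf(':');
    if (colon === -1) {
      errors.push({line: i + 1, message: 'No ":" separator found'});
      return;
    }
    const directive = withoutComment.slice(0, colon).trim().toLowerCase();
    if (!KNOWN_DIRECTIVES.has(directive)) {
      errors.push({line: i + 1, message: `Unknown directive "${directive}"`});
    }
  });
  return errors;
}
```

The three lines quoted above pass such a check cleanly, which supports the reading that the audit failure is a download problem, not a syntax problem.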

@connorjclark
Collaborator

connorjclark commented Nov 1, 2021

I got the same result using the latest (master) LH, but only on the first attempt. Subsequent runs fetched the robots.txt successfully.

If you can consistently repro with the LH node CLI, please run with lighthouse https://bloxhams.co.nz/ --view -GA and provide the value for RobotsTxt in latest_run/artifacts.json.


9 participants