Add missing Google crawlers #85

Closed
alaz opened this issue Aug 3, 2022 · 4 comments · Fixed by #86

Comments

@alaz
Owner

alaz commented Aug 3, 2022

List of the crawlers

@inspire22

I'm seeing Googlebot blocked quite a bit in my rack-attack logs when using legitbot. Is it because some IPs are missing? For example:
95.216.227.158
95.216.33.117

Is it possible to automate the process of adding new IPs using the host command like they suggest here?
https://developers.google.com/search/docs/crawling-indexing/verifying-googlebot

I originally had more IPs in this list, but after calling 'host' on them I realized they were fake; the first ones I tested just happened to be real Googlebot.

@alaz
Owner Author

alaz commented Sep 30, 2022

@inspire22 Legitbot follows the exact verification procedure you linked to, only programmatically. Did you try to follow the steps? These IPs do not pass for me:

$ host 95.216.227.158
158.227.216.95.in-addr.arpa domain name pointer crawl-95-216-227-158.googlebot.com.
$ host crawl-95-216-227-158.googlebot.com
Host crawl-95-216-227-158.googlebot.com not found: 3(NXDOMAIN)

$ host 95.216.33.117
117.33.216.95.in-addr.arpa domain name pointer crawl-95-216-33-117.googlebot.com.
$ host crawl-95-216-33-117.googlebot.com
Host crawl-95-216-33-117.googlebot.com not found: 3(NXDOMAIN)
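
The two `host` steps above (reverse lookup, then forward lookup of the returned name) can be sketched in a few lines. This is only an illustration of the procedure, not Legitbot's actual code: the function names and the `GOOGLE_SUFFIXES` list are assumptions made for this example, and the accepted domains should be checked against Google's verification page linked earlier.

```python
import socket

# Assumed list of acceptable domains; confirm against Google's documentation.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip):
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_lookup(hostname):
    """Return the set of addresses hostname resolves to (empty on failure)."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except OSError:
        return set()
    return {info[4][0] for info in infos}


def is_verified_googlebot(ip, reverse=reverse_lookup, forward=forward_lookup):
    """Forward-confirmed reverse DNS check for a claimed Googlebot address."""
    # Step 1: the reverse (PTR) record must point into a Google-owned domain.
    hostname = reverse(ip)
    if hostname is None:
        return False
    hostname = hostname.rstrip(".").lower()
    if not hostname.endswith(GOOGLE_SUFFIXES):
        return False
    # Step 2: the forward record for that name must resolve back to the same IP.
    return ip in forward(hostname)
```

With the two addresses from this thread, step 1 would pass (the PTR record names a `googlebot.com` host) and step 2 would fail, because the forward lookup returns NXDOMAIN. The `reverse` and `forward` parameters are there only so the logic can be exercised without network access.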

@inspire22

Oops, you're right, thanks! Strange they would match the first step and not the second.

I'd mistaken your TODO about adding crawlers for one about adding crawler IPs, which is why I jumped on here. My bad and apologies :)

@alaz
Owner Author

alaz commented Sep 30, 2022

By the way, I don't think these IPs belong to Google. Both of them are owned by Hetzner (a well-known European hosting provider):

$ whois 95.216.227.158
…
route:          95.216.0.0/16
org:            ORG-HOA1-RIPE
descr:          HETZNER-DC
…

$ whois 95.216.33.117
…
route:          95.216.0.0/16
org:            ORG-HOA1-RIPE
descr:          HETZNER-DC

> Strange they would match the first step and not the second.

Someone managed to convince Hetzner to create these reverse DNS records (I am surprised). Faking corresponding forward records is close to impossible, as Google itself controls the zone.
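
As a quick offline cross-check of the whois output above, the route it reports can be compared against both addresses with Python's standard `ipaddress` module. This is a minimal sketch; `HETZNER_ROUTE` and `in_route` are names made up for this example, and the network value is simply copied from the `route:` field shown above.

```python
import ipaddress

# Route copied from the whois output in this thread.
HETZNER_ROUTE = ipaddress.ip_network("95.216.0.0/16")


def in_route(ip, network=HETZNER_ROUTE):
    """Return True if ip falls inside the given network."""
    return ipaddress.ip_address(ip) in network


for ip in ("95.216.227.158", "95.216.33.117"):
    print(ip, in_route(ip))
```

A check like this only says which announced route an address sits in. It does not replace the reverse-plus-forward DNS verification, which is what actually ties an address to Google.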
