Resolv issues with googlebot sometimes #63
Comments
Hi. My guess is that this is an inconsistency between the forward and reverse DNS records. Most likely Google updates the crawler's DNS records dynamically and frequently. So my theory is that sometimes the forward lookup resolves correctly, but the reverse lookup returns NXDOMAIN because it was not in the cache, and hence that result is more up to date. Does that make sense to you?
Interesting. I have only a loose understanding of DNS, but I'd thought a query would propagate up to a master copy if the record can't be found? Any tips on how best to handle this in a rack-attack block? Or should it be fine to just rescue the exception?
Yes. Local (closest) DNS servers cache records. Suppose your closest DNS server has the forward record in its cache and returns it, but has to go further up the chain to answer the reverse query and gets NXDOMAIN in return, because the record no longer exists.
Does rescuing the exception seem like a decent plan, or does it open the door to other bots? Or should this be rescued in the gem itself?
Yes, rescuing the exception in the calling code might open the door for other bots. It should be rescued in the gem, in the same way as for the forward resolve: legitbot/lib/legitbot/validators/domains.rb, lines 51 to 55 in 4a98be8
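To illustrate the idea of rescuing both lookups in the same way, here is a minimal sketch of a forward-confirmed reverse DNS check built on Ruby's standard `Resolv` library. It is not the gem's actual code: the class name `FcrdnsCheck`, its method names, and the injectable `resolver:` argument are all hypothetical. The only assumption about `Resolv` is that `getname` and `getaddress` raise `Resolv::ResolvError` when no record is found.

```ruby
require 'resolv'

# Hypothetical helper, not legitbot's implementation: a failed lookup in
# either direction is treated as "not verified" instead of raising.
class FcrdnsCheck
  def initialize(resolver: Resolv::DNS.new)
    @resolver = resolver
  end

  # Reverse (PTR) lookup: IP -> hostname, or nil if there is no record.
  def reverse_name(ip)
    @resolver.getname(ip).to_s
  rescue Resolv::ResolvError
    nil
  end

  # Forward lookup: hostname -> IP, or nil if there is no record.
  def forward_address(name)
    return nil if name.nil?

    @resolver.getaddress(name).to_s
  rescue Resolv::ResolvError
    nil
  end

  # True only if the PTR name is under allowed_suffix and resolves back
  # to the same IP.
  def verified?(ip, allowed_suffix)
    name = reverse_name(ip)
    return false if name.nil? || !name.end_with?(".#{allowed_suffix}")

    forward_address(name) == ip
  end
end
```

With both lookups rescued, a missing record on either side makes the check fail closed rather than bubbling an exception up to the calling code.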
Published as 1.6.0
Awesome, thanks!
Thanks again for this project :)
I've been getting this sometimes now:
DNS result has no information for crawl-95-216-33-117.googlebot.com
I can rescue nil inside the rack_attack Legitbot.bot call, but would love to solve the actual problem as well.
It's strange that it says "no information" but then clearly has resolved it to crawl-95-216-33-117.googlebot.com
Hmm, maybe reverse-dns is working to get the address, but then it's not able to ping it?
The IP reported in my error logs is in fact 95.216.33.117
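As a possible alternative to the `rescue nil` workaround mentioned above, here is a rough sketch of a wrapper for the calling code that fails closed: a DNS error means the request is treated as not verified. The helper name `verified_bot?` and the `detector` parameter are hypothetical. With legitbot, the detector would presumably be something like `Legitbot.method(:bot)`, assuming the result responds to `valid?`.

```ruby
require 'resolv'

# Hypothetical wrapper for calling code. `detector` is any callable that
# takes (user_agent, ip) and returns nil or an object responding to #valid?.
def verified_bot?(detector, user_agent, ip)
  bot = detector.call(user_agent, ip)
  !bot.nil? && bot.valid?
rescue Resolv::ResolvError
  # A failed DNS lookup means the claim could not be confirmed, so the
  # request is not treated as a verified crawler.
  false
end
```

Unlike a bare `rescue nil`, this only swallows `Resolv::ResolvError` and always returns a boolean, so a rack-attack rule can decide explicitly what to do with unverified requests.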