Resolv issues with googlebot sometimes #63

Closed
inspire22 opened this issue Feb 28, 2022 · 7 comments

@inspire22

inspire22 commented Feb 28, 2022

Thanks again for this project :)

I've been getting this error sometimes now:
DNS result has no information for crawl-95-216-33-117.googlebot.com

I can rescue nil inside the rack_attack Legitbot.bot call, but would love to solve the actual problem as well.

It's strange that it says "no information" but then clearly has resolved it to crawl-95-216-33-117.googlebot.com

Hmm, maybe reverse DNS is working to get the hostname, but then it's not able to resolve that hostname back to an address?

The IP reported in my error logs is in fact 95.216.33.117


/usr/local/rvm/rubies/ruby-2.7.2/lib/ruby/2.7.0/resolv.rb:379:in `getaddress'
/u/apps/ap.next/shared/bundle/ruby/2.7.0/gems/legitbot-1.5.1/lib/legitbot/validators/domains.rb:66:in `reverse_ip'
/u/apps/ap.next/shared/bundle/ruby/2.7.0/gems/legitbot-1.5.1/lib/legitbot/validators/domains.rb:48:in `valid_domain?'
/u/apps/ap.next/shared/bundle/ruby/2.7.0/gems/legitbot-1.5.1/lib/legitbot/validators/domains.rb:22:in `valid_domain?'
/u/apps/ap.next/shared/bundle/ruby/2.7.0/gems/legitbot-1.5.1/lib/legitbot/botmatch.rb:25:in `valid?'
/u/apps/ap.next/shared/bundle/ruby/2.7.0/gems/legitbot-1.5.1/lib/legitbot/botmatch.rb:29:in `fake?'
/u/apps/ap.next/current/config/initializers/rack_attack.rb:16:in `block in '
/u/apps/ap.next/shared/bundle/ruby/2.7.0/gems/rack-attack-6.5.0/lib/rack/attack/check.rb:15:in `matched_by?'
/u/apps/ap.next/shared/bundle/ruby/2.7.0/gems/rack-attack-6.5.0/lib/rack/attack/configuration.rb:72:in `block in blocklisted?'
/u/apps/ap.next/shared/bundle/ruby/2.7.0/gems/rack-attack-6.5.0/lib/rack/attack/configuration.rb:72:in `any?'
/u/apps/ap.next/shared/bundle/ruby/2.7.0/gems/rack-attack-6.5.0/lib/rack/attack/configuration.rb:72:in `blocklisted?'
/u/apps/ap.next/shared/bundle/ruby/2.7.0/gems/rack-attack-6.5.0/lib/rack/attack.rb:107:in `call'
/u/apps/ap.next/shared/bundle/ruby/2.7.0/gems/newrelic_rpm-8.3.0/lib/new_relic/agent/instrumentation/middleware_tracing.rb:100:in `call'
/u/apps/ap.next/shared/bundle/ruby/2.7.0/gems/rack-2.2.3/lib/rack/tempfile_reaper.rb:15:in `call'
/u/apps/ap.next/shared/bundle/ruby/2.7.0/gems/newrelic_rpm-8.3.0/lib/new_relic/agent/instrumentation/middleware_tracing.rb:100:in `call'
@alaz
Owner

alaz commented Mar 1, 2022

Hi,

My guess is that it is an inconsistency between the forward and the reverse DNS records. Most likely Google updates its crawlers' DNS records dynamically and frequently. So my theory is that sometimes the forward lookup resolves correctly, but the reverse lookup returns NXDOMAIN because it was not in the cache, and hence its result is more up to date.

Does it make sense to you?

@inspire22
Author

Interesting. I have only a loose understanding of DNS, but I'd thought it'd propagate up to a master copy if it can't find a record?

Any tips how to solve it best in a rack-attack block? Or should it be fine to just rescue the exception?

@alaz
Owner

alaz commented Mar 1, 2022

> I'd thought it'd propagate up to a master copy if it can't find a record?

Yes. Local / closest DNS servers cache records. Suppose your closest DNS server has the forward record in its cache and returns it. For the reverse query, however, it has to go further up the chain, and it gets NXDOMAIN in return because the record does not exist anymore.
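
To make the two lookups concrete, here is a minimal sketch of a forward-confirmed reverse DNS check using Ruby's standard `resolv` library. It is not Legitbot's actual code: the method name `forward_confirmed?` and the `domain:` and `resolver:` parameters are assumptions made up for illustration. It treats any resolution failure as "not confirmed", which is one possible policy rather than the only one.

```ruby
require 'resolv'

# Hypothetical helper (not part of Legitbot): checks that an IP's reverse
# record points into `domain` and that the name resolves back to the same IP.
def forward_confirmed?(ip, domain:, resolver: Resolv::DNS.new)
  # Reverse (PTR) lookup: IP -> hostname. Raises Resolv::ResolvError if absent.
  name = resolver.getname(ip).to_s

  # The hostname must be the domain itself or one of its subdomains.
  return false unless name == domain || name.end_with?(".#{domain}")

  # Forward lookup: hostname -> addresses. An empty list means no match.
  resolver.getaddresses(name).map(&:to_s).include?(ip)
rescue Resolv::ResolvError
  # Assumption: a lookup that cannot be resolved counts as unconfirmed.
  false
end
```

It could be called as `forward_confirmed?('95.216.33.117', domain: 'googlebot.com')`. The result depends on what the DNS servers answer at that moment, which is exactly the inconsistency described above.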

@inspire22
Author

Does rescuing the exception seem like a decent plan, or does it open the door to other bots? Or should this be rescued in the gem itself?

@alaz
Owner

alaz commented Mar 2, 2022

Yes, rescuing the exception in the calling code might open the door for other bots. It should be rescued in the gem, in the same way as for forward resolution:

def reverse_domains(ip)
  resolver.getnames(ip)
rescue Resolv::ResolvError
  nil
end
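
The concern about rescuing in the calling code can be illustrated with a small sketch. Everything here is a hypothetical stand-in, not Legitbot's or rack-attack's API: `FAILING_CHECK` only mimics a bot check that raises the error from the backtrace, and the two method names are made up. The point is that a blocklist rule only blocks when its block returns a truthy value, so a caller-side `rescue nil` silently lets the request through.

```ruby
require 'resolv'

# Hypothetical stand-in for a "is this bot fake?" check that raises when
# DNS has no answer, as in the backtrace above.
FAILING_CHECK = lambda do |_ip|
  raise Resolv::ResolvError, 'DNS result has no information'
end

# Caller-side rescue: the error turns into nil, which is falsy, so a rule
# built on this result would NOT block the request.
def fake_with_caller_rescue?(ip, check)
  check.call(ip) rescue nil
end

# Validator-side rescue: an unresolvable claim is treated as unverified,
# so the request is reported as fake and can be blocked.
def fake_with_validator_rescue?(ip, check)
  check.call(ip)
rescue Resolv::ResolvError
  true
end
```

Under the assumption that a failed lookup should count as "unverified", the second variant keeps an impostor from slipping through just because its DNS records do not resolve.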

@alaz alaz closed this as completed in d2128ab Mar 2, 2022
@alaz alaz added the bug label Mar 2, 2022
@alaz
Owner

alaz commented Mar 2, 2022

Published as 1.6.0

@inspire22
Author

Awesome, thanks!
