respect greater of delay option or robots.txt crawl delay when obeying robots.txt #56
I'd like to obey webmasters' wishes when crawling their sites. If a site's robots.txt specifies a 10-second crawl delay, I set the delay option to only 5, and I've asked to obey robots.txt, then the delay should be 10.
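To make the rule concrete, here is a rough sketch of the comparison, not the actual diff. The helper name `effective_delay` and its arguments are hypothetical and are not part of Anemone's API; it assumes the crawl delay has already been parsed out of robots.txt as a number of seconds (or is nil when the site sets none).

```ruby
# Hypothetical helper, for illustration only (not Anemone's API).
# option_delay:    seconds from the crawler's own delay option (may be nil)
# robots_delay:    seconds from the site's Crawl-delay directive (may be nil)
# obey_robots_txt: whether the user asked to honor robots.txt
def effective_delay(option_delay, robots_delay, obey_robots_txt)
  configured = option_delay.to_f            # nil.to_f is 0.0
  return configured unless obey_robots_txt  # ignore robots.txt if not obeying it
  [configured, robots_delay.to_f].max       # the stricter (larger) delay wins
end

effective_delay(5, 10, true)   # => 10.0
effective_delay(5, 10, false)  # => 5.0
effective_delay(5, nil, true)  # => 5.0
```

Taking the maximum means the user's own delay is never shortened: a site can only slow the crawl down, not speed it up.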
To make this change I also had to make threads default to 1 whenever 'obey_robots_txt' is set. I didn't see a quick way to support multiple threads here without significant changes elsewhere. I figure if you want to obey robots.txt, you're probably okay with erring on the side of caution and running a single thread.
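The thread behavior could look roughly like the sketch below. This is an illustration under assumptions, not the code in this pull request: `SKETCH_DEFAULTS` and `normalize_options` are hypothetical names, the default values are assumed, and forcing the count to 1 is only one reading of "default to 1".

```ruby
# Hypothetical option handling, for illustration only.
# The default values below are assumptions, not taken from the diff.
SKETCH_DEFAULTS = { threads: 4, delay: 0, obey_robots_txt: false }.freeze

def normalize_options(user_opts = {})
  opts = SKETCH_DEFAULTS.merge(user_opts)
  # Err on the side of caution: one thread whenever robots.txt is obeyed.
  opts[:threads] = 1 if opts[:obey_robots_txt]
  opts
end

normalize_options(obey_robots_txt: true)[:threads]  # => 1
normalize_options(threads: 8)[:threads]             # => 8
```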
Maybe not acceptable for some. Your call. Hope this helps. And thanks for Anemone! It rocks!