respect greater of delay option or robots.txt crawl delay... #56

Closed
wants to merge 1 commit into
from

operand commented Jun 5, 2012

... when obeying robots.txt

I'd like to obey webmasters' wishes when crawling their sites. If their robots.txt specifies a 10-second crawl delay, I specify a delay of only 5 seconds, and I've asked to obey robots.txt, the delay should be 10 seconds.

To make this change I also had to make threads default to 1 whenever 'obey_robots_txt' is set. I didn't see a quick way to allow multiple threads here without otherwise significant changes. I figure if you want to obey robots.txt, you're probably okay with erring on the side of caution and running a single thread.
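For illustration, here is a minimal sketch of the behavior described above. It is not the code from this PR: `effective_delay` and `effective_threads` are hypothetical helpers, the options are a plain hash, and the robots.txt crawl delay is assumed to have been parsed already (or to be `nil` when the site doesn't set one).

```ruby
# Hypothetical helpers for illustration only; not part of Anemone's API.

# Returns the delay (in seconds) to wait between requests.
# opts               - options hash with :delay and :obey_robots_txt
# robots_crawl_delay - Crawl-delay already parsed from robots.txt, or nil
def effective_delay(opts, robots_crawl_delay)
  configured = opts.fetch(:delay, 0)
  # Without obey_robots_txt, only the user's own delay applies.
  return configured unless opts[:obey_robots_txt]

  # Respect whichever delay is greater; nil.to_f is 0.0, so a missing
  # Crawl-delay falls back to the configured value.
  [configured, robots_crawl_delay.to_f].max
end

# Returns the number of crawler threads to use.
# Obeying robots.txt forces a single thread so the delay is not
# undercut by parallel requests to the same host.
def effective_threads(opts)
  opts[:obey_robots_txt] ? 1 : opts.fetch(:threads, 4)
end

opts = { delay: 5, threads: 4, obey_robots_txt: true }
effective_delay(opts, 10)   # => 10.0
effective_threads(opts)     # => 1
```

The fallback of 4 threads is just an assumed default for the sketch. Forcing a single thread is the cautious choice: with several threads each sleeping independently, requests could still reach the site more often than the crawl delay allows.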

Maybe not acceptable for some. Your call. Hope this helps. And thanks for Anemone! It rocks!

moezzie commented Jul 16, 2012

Great initiative.

The single-thread approach feels a little awkward, but there don't seem to be a whole lot of other options.

Keep up the good work. :)

operand commented Jul 12, 2017

Closing this as this project and/or PR seem to be dormant.

@operand operand closed this Jul 12, 2017
