add options to constrain page and link queues by size #62

Open · wants to merge 1 commit


gnapse commented Jul 24, 2012

This feature can provide a means to lower the memory usage of the crawler under certain circumstances.


If you use both of these settings (e.g. page_queue: 100, link_queue: 1000), it's trivial to reproduce a deadlock. I'm assuming that's because a tentacle processing a page can't add more links to the full queue, so it blocks indefinitely.
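The deadlock scenario above can be illustrated in miniature (this is a hypothetical sketch, not Anemone's actual code): once a bounded link queue fills up, a worker that must enqueue newly discovered links blocks mid-page, and nothing is left to drain the queue. Using a non-blocking push makes the failure visible without actually hanging:

```ruby
require 'thread'

# Tiny capacity to make the effect easy to trigger.
link_queue = SizedQueue.new(2)

# Simulate a tentacle that found three links on one page.
links = %w[/a /b /c]
blocked_on = nil
links.each do |link|
  begin
    link_queue.push(link, true)  # non-blocking: raises ThreadError when full
  rescue ThreadError
    blocked_on = link            # a *blocking* push here would hang forever
  end
end
```

After the loop, `blocked_on` is `/c`: with Anemone's real blocking enqueue, the tentacle would stall at exactly this point while still holding its in-flight page, which is the deadlock described above.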

Owner

gnapse replied Oct 8, 2012

Yes, you're probably right. This feature would need more thought to be usable without these risks that you mention. Thanks for pointing that out. Sadly, I don't have time these days to tackle these issues, but hopefully somebody else can take a shot at it.

I had some success writing leehambley/ruby-persistent-queue-classes, and I've made changes in my fork of Anemone that let you pass the queues to use for the page and link queues into Anemone::Core#run.
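For a pluggable queue like that to work, it only needs to quack like Ruby's Queue. A minimal sketch of the duck-typed interface (the class name and method set here are illustrative assumptions, not the actual ruby-persistent-queue-classes API):

```ruby
# A trivial in-memory stand-in showing the interface a disk-backed or
# persistent queue would need to expose to be swapped into the crawl loop.
class ArrayBackedQueue
  def initialize
    @items = []
  end

  def push(obj)
    @items.push(obj)
    self
  end
  alias << push

  def pop
    @items.shift  # FIFO: oldest item first
  end

  def empty?
    @items.empty?
  end

  def size
    @items.size
  end
end

q = ArrayBackedQueue.new
q << "http://example.com/"
q << "http://example.com/about"
first = q.pop
```

Any object implementing these four methods could in principle back the page or link queue, which is what makes an external persistent implementation feasible.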
