Add support for crawling subdomains #27
MaGonglei: It is very simple to gather external links using Anemone, and comparably simple to check that those links are valid. The 'on_every_page' block is very helpful in this regard.
If you'd like some code that does exactly what you're asking for, I can send an example your way.
Hi wokkaflokka, thanks for your reply.
I asked because when I use the "on_every_page" block to search for external links (e.g. page.doc.xpath '//a[@href]'), it seems to cost too much CPU and memory.
If I'm wrong, please send me the example.
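For reference, here is a minimal sketch of the approach discussed above: collecting external links from Anemone's 'on_every_page' hook. The start URL is a placeholder, and 'external_link' is a hypothetical helper introduced here to keep the per-page work small (one XPath query plus cheap URI comparisons); it is not part of Anemone's API.

```ruby
require 'uri'
require 'set'

# Returns the absolute form of +href+ if it points off +page_url+'s host,
# or nil for internal or malformed links.
def external_link(page_url, href)
  uri = URI.join(page_url, href)
  uri.to_s if uri.host && uri.host != URI(page_url).host
rescue URI::Error
  nil
end

# Hypothetical wrapper around Anemone: crawl +start_url+ and collect every
# external link seen on any page.
def crawl_external_links(start_url)
  require 'anemone' # gem install anemone

  links = Set.new
  Anemone.crawl(start_url) do |anemone|
    anemone.on_every_page do |page|
      next unless page.doc # page.doc is nil for non-HTML responses
      page.doc.xpath('//a[@href]').each do |a|
        link = external_link(page.url.to_s, a['href'])
        links << link if link
      end
    end
  end
  links
end

# Usage (placeholder URL):
#   crawl_external_links('http://example.com')
```

On memory: Anemone hands 'on_every_page' one page at a time, so keeping only the deduplicated link strings (rather than whole page documents) should keep usage modest; the XPath query itself is per-page and short-lived.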