Entry URL added via addUrl is filtered by the Scheduler and cannot be crawled on the next start #501
Comments
Or is there some way for me to mark the entry http://www.haha.mx/rising so that it does not go through the Scheduler?
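@code4craft Thanks for the reply.
Hello, how did you solve this in the end? I ran into the same problem. Thanks.
@TGhoul I just wrote it myself: I made my own scheduler by copying the existing one.
Could you send me a demo to look at? Thanks.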
`private void initDuplicateRemover() {`
Add a `filterUrls` property, set it when the scheduler is instantiated, and take it into account in the duplicate check. That's all.
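The change described above might look roughly like the following. This is a standalone sketch, not WebMagic code: the class name `EntryAwareDuplicateFilter`, the in-memory `seenUrls` set, and the content-page URL are assumptions made for illustration, and only the `filterUrls` name comes from the comment. In a copied scheduler, the same check would presumably live in the duplicate remover that `initDuplicateRemover()` sets up.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/**
 * Standalone model of a duplicate check that exempts configured entry URLs.
 * Hypothetical class, not part of WebMagic.
 */
class EntryAwareDuplicateFilter {
    // URLs that must never be treated as duplicates (the "filterUrls" idea).
    private final Set<String> filterUrls;
    // URLs already seen; a file-backed scheduler would persist these between runs.
    private final Set<String> seenUrls = new HashSet<>();

    EntryAwareDuplicateFilter(Set<String> filterUrls) {
        this.filterUrls = new HashSet<>(filterUrls);
    }

    /** Returns true if the URL should be skipped as already crawled. */
    boolean isDuplicate(String url) {
        if (filterUrls.contains(url)) {
            return false; // entry URLs always pass through and are never recorded
        }
        // Set.add returns false when the URL was already present.
        return !seenUrls.add(url);
    }

    public static void main(String[] args) {
        EntryAwareDuplicateFilter filter = new EntryAwareDuplicateFilter(
                Collections.singleton("http://www.haha.mx/rising"));
        // The entry URL is accepted every time it is submitted.
        System.out.println(filter.isDuplicate("http://www.haha.mx/rising"));
        System.out.println(filter.isDuplicate("http://www.haha.mx/rising"));
        // A content page (hypothetical URL) is accepted once, then filtered.
        System.out.println(filter.isDuplicate("http://www.haha.mx/joke/1"));
        System.out.println(filter.isDuplicate("http://www.haha.mx/joke/1"));
    }
}
```

Checking the exemption before touching the seen set means the entry URL is never stored, so it stays crawlable across restarts even when the seen set is persisted to disk.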
Thank you very much!
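I added an entry URL:
Spider.create(new HahaProcessor()).addUrl("http://www.haha.mx/rising")
The Scheduler is set to FileCacheQueueScheduler.
After one crawl, this URL is marked as already crawled.
In practice, I fetch the few hottest items from this page every half hour. I want this entry URL to never be filtered, while the URLs of the other content pages are still filtered as usual. How should I do that?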