waitCount causes the Spider to finish prematurely #18

Closed
Abel-Liu opened this issue Jun 8, 2017 · 1 comment


Abel-Liu commented Jun 8, 2017

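The Spider was started with two threads and a single entry URL. The first thread picked up the URL and began processing it, while the second thread looped, waiting for work. That URL happened to take a long time to process, so after waiting waitCount times the second thread set the Spider's status to Finished, even though the first URL was still being processed.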

So shouldn't the check be: only once all threads are idle, wait waitCount more times and then treat the crawl as finished?
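A minimal sketch of the check proposed above, written in Python only for brevity. It is not DotnetSpider's code: `IdleTracker`, `begin_work`, `end_work`, and `on_empty_poll` are hypothetical names, and only `wait_count` mirrors the `waitCount` mentioned in this issue. The idea is that an empty poll counts toward finishing only while no worker is busy.

```python
import threading


class IdleTracker:
    """Hypothetical helper: decides when a crawl may be declared finished."""

    def __init__(self, wait_count: int):
        self.wait_count = wait_count   # empty polls tolerated once everyone is idle
        self._busy = 0                 # workers currently processing a URL
        self._empty_polls = 0          # consecutive empty polls seen while all idle
        self._lock = threading.Lock()

    def begin_work(self) -> None:
        """A worker took a URL from the queue."""
        with self._lock:
            self._busy += 1
            self._empty_polls = 0

    def end_work(self) -> None:
        """A worker finished processing its URL."""
        with self._lock:
            self._busy -= 1

    def on_empty_poll(self) -> bool:
        """A worker found the queue empty. Returns True if the crawl is finished."""
        with self._lock:
            if self._busy > 0:
                # A busy worker may still enqueue new URLs, so do not count this poll.
                self._empty_polls = 0
                return False
            self._empty_polls += 1
            return self._empty_polls >= self.wait_count


# Simulate the scenario from this issue: thread 1 is stuck on a slow entry URL
# while thread 2 keeps polling an empty queue.
tracker = IdleTracker(wait_count=3)
tracker.begin_work()                                       # thread 1 takes the entry URL
while_busy = [tracker.on_empty_poll() for _ in range(10)]  # thread 2 polls 10 times
tracker.end_work()                                         # thread 1 finally finishes
after_idle = [tracker.on_empty_poll() for _ in range(3)]   # counting starts only now
```

In this simulation every poll made while thread 1 is busy returns `False`, and the crawl is reported finished only on the third empty poll after all workers are idle. The cost is one shared counter guarded by a lock, which is the kind of thread-state bookkeeping the scheduler would have to take on.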

@zlzforever
Collaborator

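I suggest setting EmptySleepTime higher. The default is 30 seconds; if a single URL has still not been processed after 30 seconds or more, it is acceptable to treat it as an exception and ignore it. Managing the state of every thread would be fairly troublesome. Early on I implemented my own queue, but I later gave that up and used the class library's Parallel directly.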
