
Improve Spider.run to avoid a rare concurrency issue, change the Sp… #1033



Merged: 2 commits merged on Aug 4, 2021


carl-don-it
Contributor

  1. Under rare circumstances, a live worker thread may push a URL to the scheduler after line 2 but before line 3, and then finish. threadPool.getThreadAlive() now returns 0, so the spider exits even though the scheduler is no longer empty, which shouldn't happen.
     /* 1 */ final Request request = scheduler.poll(this);
     /* 2 */ if (request == null) {
     /* 3 */     if (threadPool.getThreadAlive() == 0 && exitWhenComplete) {
     /* 4 */         break;
     /* 5 */     }
                 ...
             }
  2. Changed the type of emptySleepTime to long.
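To make the window concrete, here is a single-threaded Java replay of that interleaving. It is a sketch under stated assumptions, not WebMagic code: a ConcurrentLinkedQueue stands in for the scheduler, an AtomicInteger stands in for threadPool.getThreadAlive(), exitWhenComplete is taken as true, and the URL is a placeholder. The worker's two actions are inlined between lines 2 and 3 to force the bad ordering.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class NaiveExitRace {
    /**
     * Replays, on one thread, the interleaving described in the PR.
     * Returns how many queued requests are abandoned when the loop exits, or -1 if it keeps running.
     */
    static int replayNaiveCheck() {
        Queue<String> scheduler = new ConcurrentLinkedQueue<>(); // hypothetical stand-in for the Scheduler
        AtomicInteger threadAlive = new AtomicInteger(1);        // one worker is still busy

        String request = scheduler.poll();            // line 1: queue is empty, so request == null
        if (request == null) {                        // line 2
            // The worker runs here, between lines 2 and 3:
            scheduler.add("http://example.com/next"); // it pushes a newly discovered URL...
            threadAlive.decrementAndGet();            // ...and then finishes.
            if (threadAlive.get() == 0) {             // line 3: the run thread now sees 0 live threads
                return scheduler.size();              // line 4: break, leaving the new URL unprocessed
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("abandoned requests: " + replayNaiveCheck()); // prints "abandoned requests: 1"
    }
}
```

Each check is correct on its own; the bug is that "scheduler is empty" and "no thread is alive" are observed at two different moments.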

@sutra
Collaborator

sutra commented Aug 4, 2021

I'm wondering: if the number of active threads is already 0, which thread added the new URL to the scheduler afterwards?

@carl-don-it
Contributor Author

> …is already 0, which thread later added…

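Seeing 0 active threads and seeing the scheduler return null are not one atomic operation.
Before line 3 runs, the active thread count is not 0, so a worker can push into the scheduler and then finish; only after that does the run thread execute line 3.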
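One way to close the window, sketched here under assumptions and not necessarily the exact code merged in this PR: after observing zero live threads, poll the scheduler once more and exit only if that second poll is also empty. This works if every worker pushes its new URLs before it decrements the live-thread counter, and if only the run thread starts new work. The names below (crawl, threadAlive, the synthetic binary tree of "URLs" modeled by their depth) are hypothetical stand-ins.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class RecheckBeforeExit {
    /** Crawls a synthetic binary tree of "URLs" down to maxDepth; returns how many were processed. */
    static int crawl(int maxDepth, int threads) throws InterruptedException {
        ConcurrentLinkedQueue<Integer> scheduler = new ConcurrentLinkedQueue<>(); // a URL is modeled by its depth
        AtomicInteger threadAlive = new AtomicInteger();
        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        scheduler.add(0);
        while (true) {
            Integer request = scheduler.poll();
            if (request == null) {
                if (threadAlive.get() == 0) {
                    // Workers push before they decrement, so a URL pushed by a worker that has
                    // already finished is visible now: poll once more before deciding to exit.
                    request = scheduler.poll();
                    if (request == null) {
                        break; // queue empty and no worker left that could add more
                    }
                } else {
                    Thread.sleep(1); // workers still running: back off briefly, then look again
                    continue;
                }
            }
            final int depth = request;
            threadAlive.incrementAndGet(); // count the task before submitting it
            pool.execute(() -> {
                try {
                    processed.incrementAndGet();
                    if (depth < maxDepth) { // each page "links" to two children
                        scheduler.add(depth + 1);
                        scheduler.add(depth + 1);
                    }
                } finally {
                    threadAlive.decrementAndGet(); // always after the pushes above
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(crawl(10, 4)); // 2^11 - 1 = 2047 pages
    }
}
```

A single re-poll is enough in this sketch because the run thread is the only one that submits tasks: once the counter reads 0, no new worker can appear before the second poll, and everything the finished workers pushed is already in the queue.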

@sutra
Collaborator

sutra commented Aug 4, 2021

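if (request == null) { // this poll came back empty, but that does not mean the active thread count is 0
    // some active thread adds a URL, so the scheduler is no longer empty; then the active thread count drops to 0.
    if (threadPool.getThreadAlive() == 0 // the active thread count is now 0, but the scheduler is no longer empty.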

Hmm… that does seem to be what happens.

Could you clean up the formatting issues in the submitted code a bit?

sutra merged commit f110147 into code4craft:develop on Aug 4, 2021
sutra linked an issue on Aug 4, 2021 that may be closed by this pull request
Development

Successfully merging this pull request may close these issues.

Spider exits for no apparent reason
2 participants