
How can I tell that the crawl has finished after calling run()? #825

Open
zczhzy opened this issue Aug 24, 2018 · 4 comments

Comments


zczhzy commented Aug 24, 2018

Spider.create(new TestProcessor())
        .addUrl("http://xxx.xxx")
        .thread(5)
        .run();
I want to run a follow-up step once the pages have been crawled. How can I tell that the crawl has finished?


gcqst commented Aug 29, 2018

Same question here.


wqqwqqwqq commented Oct 15, 2018

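Two approaches I found:
1. Make the crawl synchronous instead of asynchronous: call run() instead of start().
2. Get Spider.java from the source, search for "Spider {} closed! {} pages downloaded", and insert your own method just below that line.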
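A minimal sketch of the first approach, written in plain Java so it runs without WebMagic on the classpath. `FakeCrawl` is a hypothetical stand-in: like WebMagic's `Spider`, it implements `Runnable`, and its `run()` returns only once the (simulated) crawl is over. Calling `run()` directly blocks the caller, so whatever follows it is the "after crawl" step. WebMagic's `start()` instead runs the spider on a new thread; waiting for that thread with `join()` gives the same ordering.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a spider: implements Runnable, and run()
// returns only after the simulated crawl has finished.
class FakeCrawl implements Runnable {
    final List<String> log = new ArrayList<>();

    @Override
    public void run() {
        for (int i = 1; i <= 3; i++) {
            log.add("page " + i); // pretend to download a page
        }
        log.add("crawl done");
    }
}

class RunThenContinue {
    // Synchronous: run() blocks, so the next line executes only after the crawl ends.
    static List<String> syncStyle() {
        FakeCrawl crawl = new FakeCrawl();
        crawl.run();
        crawl.log.add("next step");
        return crawl.log;
    }

    // Asynchronous: crawl on a separate thread, then join() to wait for it.
    static List<String> asyncStyle() {
        FakeCrawl crawl = new FakeCrawl();
        Thread worker = new Thread(crawl);
        worker.start();
        try {
            worker.join(); // blocks until crawl.run() has returned
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the crawl", e);
        }
        crawl.log.add("next step"); // safe: join() makes the worker's writes visible
        return crawl.log;
    }
}
```

With a real `Spider`, and assuming exit-when-complete has not been switched off, the two patterns would presumably be `spider.run(); nextStep();` or `Thread t = new Thread(spider); t.start(); t.join(); nextStep();`, where `nextStep()` is a placeholder for your own code. This has not been checked against a specific WebMagic version.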


whitefly commented Mar 6, 2021

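I have been building on this framework recently and have the same need.
My current workaround is to subclass Spider, override run(), and call a hook function right after super.run().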
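The subclass-and-hook approach can be sketched as follows. `BaseSpider` is a hypothetical stand-in for WebMagic's `Spider` so the example compiles on its own. With the real library, the subclass would extend `Spider`, pass the `PageProcessor` to `super(...)`, and be created with `new` rather than `Spider.create(...)`, since `create()` returns a plain `Spider`. Wrapping `super.run()` in `try`/`finally` is a design choice: the hook fires even if the crawl throws.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the library's spider class.
class BaseSpider {
    protected final List<String> trace;

    BaseSpider(List<String> trace) {
        this.trace = trace;
    }

    public void run() {
        trace.add("crawl pages");
        trace.add("spider closed"); // where WebMagic logs "Spider {} closed! ..."
    }
}

// Subclass that adds an "after crawl" hook without editing the library source.
class HookedSpider extends BaseSpider {
    private final Runnable onFinished;

    HookedSpider(List<String> trace, Runnable onFinished) {
        super(trace);
        this.onFinished = onFinished;
    }

    @Override
    public void run() {
        try {
            super.run();      // the original crawl, unchanged
        } finally {
            onFinished.run(); // hook: fires once super.run() returns, even on error
        }
    }
}

class HookDemo {
    static List<String> runOnce() {
        List<String> trace = new ArrayList<>();
        BaseSpider spider = new HookedSpider(trace, () -> trace.add("next step"));
        spider.run(); // dynamic dispatch selects HookedSpider.run()
        return trace;
    }
}
```

Because WebMagic's `start()` hands the spider object itself to a new `Thread`, the overridden `run()` should also be what executes in the asynchronous case; the hook would then run on that worker thread.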


lomoye commented Jul 20, 2021

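// spiderWorker: a Spider running on another thread (e.g. started with start())
// log: assumed to be an SLF4J-style logger
while (spiderWorker.getStatus() != Spider.Status.Stopped) {
    try {
        Thread.sleep(1000); // check once per second
        log.info("spiderWorker running sleep");
    } catch (InterruptedException e) {
        log.info("spiderWorker interruptedException", e);
        Thread.currentThread().interrupt(); // restore the interrupt flag
        break;                              // stop waiting once interrupted
    }
}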
This is what I did: poll the spider's status from the outside in a loop until it stops.
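The polling loop can be wrapped in a small helper that also gives up after a timeout, so the caller cannot wait forever if the spider never stops (for instance when exit-when-complete is disabled). This is a generic sketch: `StatusPoller` and `awaitState` are hypothetical names, and the status is read through a `Supplier`, so the helper itself does not depend on WebMagic.

```java
import java.time.Duration;
import java.util.Objects;
import java.util.function.Supplier;

class StatusPoller {
    /**
     * Polls current every pollInterval until it equals target or timeout elapses.
     * Returns true if target was observed, false on timeout or interruption.
     */
    static <T> boolean awaitState(Supplier<T> current, T target,
                                  Duration pollInterval, Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!Objects.equals(current.get(), target)) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out before reaching target
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return true;
    }
}
```

Against a real spider, the call would presumably look like `StatusPoller.awaitState(spiderWorker::getStatus, Spider.Status.Stopped, Duration.ofSeconds(1), Duration.ofMinutes(30))`; the interval and timeout there are arbitrary.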


5 participants