How to keep tasks running after all accounts are banned #83

Closed
Martinhu95 opened this issue Mar 28, 2018 · 9 comments

Comments

@Martinhu95

Martinhu95 commented Mar 28, 2018

Before submitting an issue, please answer the questions below. Thanks!

1. What did you do?

nohup celery -A tasks.workers -Q login_queue,user_crawler,fans_followers,search_crawler,home_crawler worker -l info -c 1 &
nohup python login_first.py &
nohup celery beat -A tasks.workers -l info &
nohup python search_first.py &

2. What result did you expect?

I want the program to keep running and crawl all of the keywords once every day.

3. What result did you actually get?

After one day of crawling I had about 18,000 records, and then the account's enable flag changed from 1 to 0. Later I logged in to Weibo and found the account still worked, so I updated the database (0 back to 1) and restarted the tasks. But when I checked the crawling tasks with ps aux|grep celery, the output was no longer what it was at the start:
root 1281 0.3 2.9 180644 59444 pts/8 S 13:02 0:01 /usr/bin/python3 /usr/local/bin/celery -A tasks.workers -Q login_queue,user_crawler,fans_followers,search_crawler,home_crawler worker -l info -c 1
root 1286 4.7 3.6 275440 73800 pts/8 S 13:02 0:16 /usr/bin/python3 /usr/local/bin/celery -A tasks.workers -Q login_queue,user_crawler,fans_followers,search_crawler,home_crawler worker -l info -c 1
root 1311 0.3 2.8 188344 57460 pts/8 S 13:04 0:00 /usr/bin/python3 /usr/local/bin/celery beat -A tasks.workers -l info

Instead there was only:

root 1311 0.3 2.8 188344 57460 pts/8 S 13:04 0:00 /usr/bin/python3 /usr/local/bin/celery beat -A tasks.workers -l info

And after a few dozen new entries were added to the database, crawling stopped.

4. Which version of WeiboSpider are you using? What is your operating system? Have you read the project's [FAQ]?

I'm using the latest release; the operating system is Ubuntu 14.04. I've read the documentation several times but still don't know how to solve this.

@Martinhu95
Author

root 1311 0.1 2.8 188344 57460 pts/8 S 13:04 0:00 /usr/bin/python3 /usr/local/bin/celery beat -A tasks.workers -l info
root 1379 0.0 0.0 14220 1084 pts/8 S+ 13:12 0:00 grep --color=auto celery
[1]- Exit 1 nohup celery -A tasks.workers -Q login_queue,user_crawler,fans_followers,search_crawler,home_crawler worker -l info -c 1

Why did this command exit?

@Martinhu95
Author

root@iZ2zeftexcphcu8dj9if0mZ:/home/admin/project/weibospider# jobs -l
[2]- 1311 Running nohup celery beat -A tasks.workers -l info &
[3]+ 1398 Running nohup celery -A tasks.workers -Q login_queue,user_crawler,fans_followers,search_crawler,home_crawler worker -l info -c 1 &

This is what jobs shows when everything is normal. When crawling stops working, only the nohup celery -A tasks.workers -Q job is left in jobs.
What causes this?

@ResolveWang
Member

1. Check whether you can still use your account to search. Weibo's account banning is complicated; it may block only one feature of your account.

2. After you restart the celery worker, it simply resumes the tasks that were left unfinished last time. If you want it to start executing the tasks you specify from the current moment, you need to clear Redis db5 and db6.
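
In case it helps, a minimal sketch of clearing those two databases with redis-py (assuming Redis runs locally on the default port with no password; adjust the connection settings to match your config):

```python
import redis

# Flush only db5 and db6 (the databases mentioned above),
# leaving the others, e.g. the cookie db, untouched.
for db in (5, 6):
    r = redis.StrictRedis(host='localhost', port=6379, db=db)
    r.flushdb()
    print('flushed db', db)
```

The same can be done from the shell with redis-cli -n 5 flushdb and redis-cli -n 6 flushdb.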

@Martinhu95
Author

I tried it and the account isn't banned. When enable went from 1 to 0 earlier, was it perhaps banned for just a few hours? Now that I've updated the account's enable flag, this command keeps exiting while crawling and the task stops:

celery -A tasks.workers -Q login_queue,user_crawler,fans_followers,search_crawler,home_crawler worker -l info -c 1

@ResolveWang
Member

1. In basic.py, print response.status_code and response.text to check whether the response is normal.

2. Clear Redis db1 (cookies), db5, and db6, then start the worker and the related task scheduler again.
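
A stand-alone sketch of the kind of check meant in point 1 (the example URL is just one of the profile pages the spider crawls; basic.py normally also sends the logged-in cookies, so treat this only as an illustration and replace the URL with whichever request is failing for you):

```python
import requests

# Fetch one crawl URL and inspect the response.
resp = requests.get('http://weibo.com/p/1005053105868177/info?mod=pedit_more',
                    timeout=10)
print('status:', resp.status_code)    # 200 means the request itself went through
print('body head:', resp.text[:200])  # a login or verification page here means the account is limited
```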

@Martinhu95
Author

2018-03-28 14:11:50 - crawler - INFO - the crawling url is http://weibo.com/p/1005052244164900/follow?page=1#Pl_Official_HisRelation__60
[2018-03-28 14:11:50,662: INFO/ForkPoolWorker-1] the crawling url is http://weibo.com/p/1005052244164900/follow?page=1#Pl_Official_HisRelation__60
2018-03-28 14:11:50 - crawler - ERROR - failed to crawl http://weibo.com/p/1005052244164900/follow?page=1#Pl_Official_HisRelation__60,here are details:MISCONF Redis is configured to save RDB snapshots, but is currently not able to persist on disk. Commands that may modify the data set are disabled. Please check Redis logs for details about the error., stack is File "/home/admin/project/weibospider/decorators/decorator.py", line 14, in time_limit
return func(*args, **kargs)

[2018-03-28 14:11:50,664: ERROR/ForkPoolWorker-1] failed to crawl http://weibo.com/p/1005052244164900/follow?page=1#Pl_Official_HisRelation__60,here are details:MISCONF Redis is configured to save RDB snapshots, but is currently not able to persist on disk. Commands that may modify the data set are disabled. Please check Redis logs for details about the error., stack is File "/home/admin/project/weibospider/decorators/decorator.py", line 14, in time_limit
return func(*args, **kargs)

[2018-03-28 14:11:50,667: WARNING/ForkPoolWorker-1] /root/anaconda3/lib/python3.6/site-packages/pymysql/cursors.py:166: Warning: (1062, "Duplicate entry '6490414635' for key 'uid'")
result = self._query(query)
[2018-03-28 14:11:50,667: WARNING/ForkPoolWorker-1] /root/anaconda3/lib/python3.6/site-packages/pymysql/cursors.py:166: Warning: (1062, "Duplicate entry '3764351355' for key 'uid'")
result = self._query(query)
2018-03-28 14:11:50 - crawler - INFO - the crawling url is http://weibo.com/p/1005053105868177/info?mod=pedit_more
[2018-03-28 14:11:50,682: INFO/ForkPoolWorker-1] the crawling url is http://weibo.com/p/1005053105868177/info?mod=pedit_more
2018-03-28 14:11:50 - crawler - ERROR - failed to crawl http://weibo.com/p/1005053105868177/info?mod=pedit_more,here are details:MISCONF Redis is configured to save RDB snapshots, but is currently not able to persist on disk. Commands that may modify the data set are disabled. Please check Redis logs for details about the error., stack is File "/home/admin/project/weibospider/decorators/decorator.py", line 14, in time_limit
return func(*args, **kargs)

[2018-03-28 14:11:50,683: ERROR/ForkPoolWorker-1] failed to crawl http://weibo.com/p/1005053105868177/info?mod=pedit_more,here are details:MISCONF Redis is configured to save RDB snapshots, but is currently not able to persist on disk. Commands that may modify the data set are disabled. Please check Redis logs for details about the error., stack is File "/home/admin/project/weibospider/decorators/decorator.py", line 14, in time_limit
return func(*args, **kargs)

[2018-03-28 14:11:50,684: ERROR/ForkPoolWorker-1] list index out of range
[2018-03-28 14:11:50,684: ERROR/ForkPoolWorker-1] list index out of range
[2018-03-28 14:11:50,685: ERROR/ForkPoolWorker-1] list index out of range
[2018-03-28 14:11:51,745: WARNING/MainProcess] Restoring 4 unacknowledged message(s)

The cause seems to be Redis-related, but I'm not familiar with Redis so I don't know what to change. The log is above.

@Martinhu95
Author

I looked into it. It seems the cause is that the Redis RDB snapshot save was failing, so Redis could no longer persist to disk. Based on solutions I found online, setting stop-writes-on-bgsave-error to no avoids this problem.
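
For reference, a minimal sketch of applying that setting at runtime with redis-py (assuming a local Redis on the default port; to make it permanent, set the same directive in redis.conf). Note that this only stops Redis from rejecting writes; the underlying failed snapshot (disk space, memory, permissions) is still worth investigating:

```python
import redis

r = redis.StrictRedis(host='localhost', port=6379)
# Stop Redis from refusing writes when a background RDB save fails.
r.config_set('stop-writes-on-bgsave-error', 'no')
print(r.config_get('stop-writes-on-bgsave-error'))
```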

I'll keep testing to see whether it still stops.

@ResolveWang
Member

OK, thanks for the feedback.

@thekingofcity
Member

MISCONF Redis is configured to save RDB snapshots, but is currently not able to persist on disk. Commands that may modify the data set are disabled. Please check Redis logs for details about the error.
This problem can also be caused by memory usage being too high; that was the cause on my VPS.
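
A quick way to check whether memory is the issue (redis-py sketch, assuming a local Redis; the same information is available from the INFO memory command in redis-cli):

```python
import redis

r = redis.StrictRedis(host='localhost', port=6379)
mem = r.info('memory')
# If used_memory is close to maxmemory (or to the machine's RAM),
# the background RDB save can fail and trigger the MISCONF error above.
print('used_memory_human:', mem.get('used_memory_human'))
print('maxmemory_human:', mem.get('maxmemory_human'))
```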
