Error when running after inserting the login account and seed data #61
$ python home_first.py
[2018-01-02 15:00:15,558: ERROR/ForkPoolWorker-1] failed to crawl http://weibo.com/u/1751681657?is_ori=1&is_tag=0&profile_ftype=1&page=1,here are details:(535, b'5.7.11 the behavior of this user triggered some restrictions to this account'), stack is File "/home/ubuntu/weibospider/decorators/decorator.py", line 14, in time_limit
2018-01-02 15:00:15 - crawler - WARNING - user 1751681657 has no weibo
[2018-01-02 15:00:17,044: ERROR/ForkPoolWorker-1] failed to crawl http://weibo.com/u/1195242865?is_ori=1&is_tag=0&profile_ftype=1&page=1,here are details:(535, b'5.7.11 the behavior of this user triggered some restrictions to this account'), stack is File "/home/ubuntu/weibospider/decorators/decorator.py", line 14, in time_limit
2018-01-02 15:00:17 - crawler - WARNING - user 1195242865 has no weibo
The error message is already quite clear.
The cause is that the program didn't get a cookie from the cookie pool. Take a look at the project, and also check MySQL yourself.
$ ps -aux|grep celery
$ python login_first.py
$ redis-cli
logs:
Config file:
The login_info table already has the account and password filled in, and the login runs without errors. Captcha solving succeeds (the YunDaMa account shows records of it), but it looks like the cookies are not being written into Redis. Is there something else I might have overlooked?
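One quick way to confirm whether the cookies ever reach Redis is to inspect it directly. Here is a minimal sketch using redis-py, assuming Redis on localhost:6379 with db 0 (adjust host/port/db to your own config; the cookie key names depend on this project's Cookies class, so just look for anything cookie-related):

    # redis_check.py -- throwaway inspection script, not part of the repo
    import redis

    r = redis.StrictRedis(host='127.0.0.1', port=6379, db=0, decode_responses=True)
    # list every key; if nothing cookie-related appears, the store step never ran
    print(r.keys('*'))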
It looks like the login did run. You can add some debug output in login/login.py to check whether the cookies were actually obtained. If they were, then the problem is probably in this section:

    if url != '':
        rs_cont = session.get(url, headers=headers)
        login_info = rs_cont.text
        u_pattern = r'"uniqueid":"(.*)",'
        m = re.search(u_pattern, login_info)
        if m and m.group(1):
            # check if account is valid
            check_url = 'http://weibo.com/2671109275/about'
            resp = session.get(check_url, headers=headers)
            if is_403(resp.text):
                other.error('account {} has been forbidden'.format(name))
                LoginInfoOper.freeze_account(name, 0)
                return None
            other.info('Login successful! The login account is {}'.format(name))
            Cookies.store_cookies(name, session.cookies.get_dict())

You can debug this yourself, or take the login module out and test it on its own, because there may be an exception that this project doesn't catch and that Celery swallowed, so it was never raised and no error was shown. Since I can't reproduce your problem on my side and no other user has reported it, I hope you can check again; if there really is a bug, I'd be glad to discuss it further.
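As a rough illustration of testing the login module on its own, a throwaway script like the following can be used; note that get_session and its (name, password) signature are assumptions here, so check the actual entry point in login/login.py before running it:

    # test_login_standalone.py -- hypothetical debugging script, not part of the repo
    from login.login import get_session  # assumed entry point; verify the real name and signature

    if __name__ == '__main__':
        # use an account/password pair from your login_info table
        session = get_session('your_account', 'your_password')
        if session is not None:
            print('cookies:', session.cookies.get_dict())
        else:
            print('got no session -- the exception or log output above should say why')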
[2018-01-03 09:28:03,591: WARNING/ForkPoolWorker-1] Invalid URL 'login_need_pincode': No schema supplied. Perhaps you meant http://login_need_pincode?

The get_redirect function returns 'login_need_pincode', get_session then uses that as the url, and executing rs_cont = session.get(url, headers=headers) is what raises the error.
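In other words, get_redirect can return the sentinel string 'login_need_pincode' instead of a real redirect URL, and get_session passes it straight to requests. A defensive guard roughly like this (the function names come from the thread; the exact call site inside login/login.py may look different) would report the condition instead of crashing:

    # hypothetical guard, to be adapted inside get_session in login/login.py
    def checked_get(session, url, headers, name):
        """Only issue the request when url looks like a real URL."""
        if not url.startswith('http'):
            # 'login_need_pincode' and similar sentinels are login states, not URLs
            raise ValueError('get_redirect returned a sentinel for {}: {!r}'.format(name, url))
        return session.get(url, headers=headers)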
Still the original problem? That can't be right. I tested accounts in three different situations on my side and they all worked. Would you mind sending me your account so I can verify it?
Sharing the account isn't really convenient. I couldn't buy one on Taobao; this one is borrowed from a friend.
Never mind then; you'll have to debug it yourself.
Today another user ran into a problem with the same behavior as this issue. After talking it through with them and debugging, it turned out they had enabled a global proxy on their machine, which made requests sent by the requests module fail, while Celery hid that information. The debugging approach is to change the project's tasks/login-related code to run standalone, as sketched below.
The main point is that Celery can hide some exceptions and make us miss important information; once the code runs standalone, the relevant exceptions are raised directly. The same approach can be used to debug other parts of this project.
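Here is a minimal sketch of the standalone idea, using a hypothetical task name (the real task functions live under tasks/): a Celery task is still an ordinary Python function, so calling it directly instead of dispatching it with .delay() runs it in the current process and lets any exception propagate with a full traceback.

    from tasks.login import execute_login  # hypothetical name -- use the real task in tasks/

    # normal Celery dispatch; a failure may only show up (or be swallowed) in worker logs:
    #     execute_login.delay()

    # standalone call for debugging; exceptions are raised right here in this process:
    execute_login()

Another option is Celery's eager mode (the task_always_eager setting in Celery 4), which makes every .delay()/.apply_async() call execute locally and synchronously.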
1. python login_first.py
2. python user_first.py
2018-01-02 14:09:53 - crawler - INFO - the crawling url is http://weibo.com/p/1005051195242865/info?mod=pedit_more
[2018-01-02 14:09:53,646: INFO/ForkPoolWorker-1] the crawling url is http://weibo.com/p/1005051195242865/info?mod=pedit_more
2018-01-02 14:09:53 - crawler - WARNING - no cookies in cookies pool, please find out the reason
[2018-01-02 14:09:53,650: WARNING/ForkPoolWorker-1] no cookies in cookies pool, please find out the reason
(WeiboSpider)root@jian-spider:/home/ubuntu/weibospider#
2018-01-02 14:09:54 - crawler - ERROR - failed to crawl http://weibo.com/p/1005051195242865/info?mod=pedit_more,here are details:(535, b'5.7.11 the behavior of this user triggered some restrictions to this account'), stack is File "/home/ubuntu/weibospider/decorators/decorator.py", line 14, in time_limit
return func(*args, **kargs)
[2018-01-02 14:09:54,293: ERROR/ForkPoolWorker-1] failed to crawl http://weibo.com/p/1005051195242865/info?mod=pedit_more,here are details:(535, b'5.7.11 the behavior of this user triggered some restrictions to this account'), stack is File "/home/ubuntu/weibospider/decorators/decorator.py", line 14, in time_limit
return func(*args, **kargs)
[2018-01-02 14:09:54,304: ERROR/ForkPoolWorker-1] list index out of range
[2018-01-02 14:09:54,304: ERROR/ForkPoolWorker-1] list index out of range
[2018-01-02 14:09:54,305: ERROR/ForkPoolWorker-1] list index out of range
[2018-01-02 14:09:54,324: INFO/MainProcess] Received task: tasks.user.crawl_follower_fans[49a1e5cb-240c-4b0d-a767-e1664574b74e]
2018-01-02 14:09:54 - crawler - INFO - the crawling url is http://weibo.com/p/1005051195242865/follow?relate=fans&page=1#Pl_Official_HisRelation__60
[2018-01-02 14:09:54,329: INFO/ForkPoolWorker-1] the crawling url is http://weibo.com/p/1005051195242865/follow?relate=fans&page=1#Pl_Official_HisRelation__60
2018-01-02 14:09:54 - crawler - WARNING - no cookies in cookies pool, please find out the reason
[2018-01-02 14:09:54,331: WARNING/ForkPoolWorker-1] no cookies in cookies pool, please find out the reason
2018-01-02 14:09:54 - crawler - ERROR - failed to crawl http://weibo.com/p/1005051195242865/follow?relate=fans&page=1#Pl_Official_HisRelation__60,here are details:(535, b'5.7.11 the behavior of this user triggered some restrictions to this account'), stack is File "/home/ubuntu/weibospider/decorators/decorator.py", line 14, in time_limit
return func(*args, **kargs)
[2018-01-02 14:09:54,958: ERROR/ForkPoolWorker-1] failed to crawl http://weibo.com/p/1005051195242865/follow?relate=fans&page=1#Pl_Official_HisRelation__60,here are details:(535, b'5.7.11 the behavior of this user triggered some restrictions to this account'), stack is File "/home/ubuntu/weibospider/decorators/decorator.py", line 14, in time_limit
return func(*args, **kargs)