Error when running after inserting login account and seed info #61

Closed
jianzzz opened this issue Jan 2, 2018 · 9 comments

jianzzz commented Jan 2, 2018

1. python login_first.py
2. python user_first.py

2018-01-02 14:09:53 - crawler - INFO - the crawling url is http://weibo.com/p/1005051195242865/info?mod=pedit_more
[2018-01-02 14:09:53,646: INFO/ForkPoolWorker-1] the crawling url is http://weibo.com/p/1005051195242865/info?mod=pedit_more
2018-01-02 14:09:53 - crawler - WARNING - no cookies in cookies pool, please find out the reason
[2018-01-02 14:09:53,650: WARNING/ForkPoolWorker-1] no cookies in cookies pool, please find out the reason
(WeiboSpider)root@jian-spider:/home/ubuntu/weibospider# 2018-01-02 14:09:54 - crawler - ERROR - failed to crawl http://weibo.com/p/1005051195242865/info?mod=pedit_more,here are details:(535, b'5.7.11 the behavior of this user triggered some restrictions to this account'), stack is File "/home/ubuntu/weibospider/decorators/decorator.py", line 14, in time_limit
return func(*args, **kargs)

[2018-01-02 14:09:54,293: ERROR/ForkPoolWorker-1] failed to crawl http://weibo.com/p/1005051195242865/info?mod=pedit_more,here are details:(535, b'5.7.11 the behavior of this user triggered some restrictions to this account'), stack is File "/home/ubuntu/weibospider/decorators/decorator.py", line 14, in time_limit
return func(*args, **kargs)

[2018-01-02 14:09:54,304: ERROR/ForkPoolWorker-1] list index out of range
[2018-01-02 14:09:54,304: ERROR/ForkPoolWorker-1] list index out of range
[2018-01-02 14:09:54,305: ERROR/ForkPoolWorker-1] list index out of range
[2018-01-02 14:09:54,324: INFO/MainProcess] Received task: tasks.user.crawl_follower_fans[49a1e5cb-240c-4b0d-a767-e1664574b74e]
2018-01-02 14:09:54 - crawler - INFO - the crawling url is http://weibo.com/p/1005051195242865/follow?relate=fans&page=1#Pl_Official_HisRelation__60
[2018-01-02 14:09:54,329: INFO/ForkPoolWorker-1] the crawling url is http://weibo.com/p/1005051195242865/follow?relate=fans&page=1#Pl_Official_HisRelation__60
2018-01-02 14:09:54 - crawler - WARNING - no cookies in cookies pool, please find out the reason
[2018-01-02 14:09:54,331: WARNING/ForkPoolWorker-1] no cookies in cookies pool, please find out the reason
2018-01-02 14:09:54 - crawler - ERROR - failed to crawl http://weibo.com/p/1005051195242865/follow?relate=fans&page=1#Pl_Official_HisRelation__60,here are details:(535, b'5.7.11 the behavior of this user triggered some restrictions to this account'), stack is File "/home/ubuntu/weibospider/decorators/decorator.py", line 14, in time_limit
return func(*args, **kargs)

[2018-01-02 14:09:54,958: ERROR/ForkPoolWorker-1] failed to crawl http://weibo.com/p/1005051195242865/follow?relate=fans&page=1#Pl_Official_HisRelation__60,here are details:(535, b'5.7.11 the behavior of this user triggered some restrictions to this account'), stack is File "/home/ubuntu/weibospider/decorators/decorator.py", line 14, in time_limit
return func(*args, **kargs)

jianzzz (Author) commented Jan 2, 2018

$ python home_first.py
[2018-01-02 15:00:14,872: INFO/MainProcess] Received task: tasks.home.crawl_weibo_datas[41075592-07d2-49d8-861f-ba38c24ef872]
[2018-01-02 15:00:14,874: INFO/MainProcess] Received task: tasks.home.crawl_weibo_datas[daef0591-991b-4a97-8928-a9deb86db5e4]
2018-01-02 15:00:14 - crawler - INFO - the crawling url is http://weibo.com/u/1751681657?is_ori=1&is_tag=0&profile_ftype=1&page=1
[2018-01-02 15:00:14,879: INFO/ForkPoolWorker-1] the crawling url is http://weibo.com/u/1751681657?is_ori=1&is_tag=0&profile_ftype=1&page=1
2018-01-02 15:00:14 - crawler - WARNING - no cookies in cookies pool, please find out the reason
[2018-01-02 15:00:14,881: WARNING/ForkPoolWorker-1] no cookies in cookies pool, please find out the reason
ubuntu@jian-spider:~/weibospider$ 2018-01-02 15:00:15 - crawler - ERROR - failed to crawl http://weibo.com/u/1751681657?is_ori=1&is_tag=0&profile_ftype=1&page=1,here are details:(535, b'5.7.11 the behavior of this user triggered some restrictions to this account'), stack is File "/home/ubuntu/weibospider/decorators/decorator.py", line 14, in time_limit
return func(*args, **kargs)

[2018-01-02 15:00:15,558: ERROR/ForkPoolWorker-1] failed to crawl http://weibo.com/u/1751681657?is_ori=1&is_tag=0&profile_ftype=1&page=1,here are details:(535, b'5.7.11 the behavior of this user triggered some restrictions to this account'), stack is File "/home/ubuntu/weibospider/decorators/decorator.py", line 14, in time_limit
return func(*args, **kargs)

2018-01-02 15:00:15 - crawler - WARNING - user 1751681657 has no weibo
[2018-01-02 15:00:15,561: WARNING/ForkPoolWorker-1] user 1751681657 has no weibo
2018-01-02 15:00:15 - crawler - INFO - the crawling url is http://weibo.com/u/1195242865?is_ori=1&is_tag=0&profile_ftype=1&page=1
[2018-01-02 15:00:15,564: INFO/ForkPoolWorker-1] the crawling url is http://weibo.com/u/1195242865?is_ori=1&is_tag=0&profile_ftype=1&page=1
2018-01-02 15:00:15 - crawler - WARNING - no cookies in cookies pool, please find out the reason
[2018-01-02 15:00:15,565: WARNING/ForkPoolWorker-1] no cookies in cookies pool, please find out the reason
2018-01-02 15:00:17 - crawler - ERROR - failed to crawl http://weibo.com/u/1195242865?is_ori=1&is_tag=0&profile_ftype=1&page=1,here are details:(535, b'5.7.11 the behavior of this user triggered some restrictions to this account'), stack is File "/home/ubuntu/weibospider/decorators/decorator.py", line 14, in time_limit
return func(*args, **kargs)

[2018-01-02 15:00:17,044: ERROR/ForkPoolWorker-1] failed to crawl http://weibo.com/u/1195242865?is_ori=1&is_tag=0&profile_ftype=1&page=1,here are details:(535, b'5.7.11 the behavior of this user triggered some restrictions to this account'), stack is File "/home/ubuntu/weibospider/decorators/decorator.py", line 14, in time_limit
return func(*args, **kargs)

2018-01-02 15:00:17 - crawler - WARNING - user 1195242865 has no weibo
[2018-01-02 15:00:17,045: WARNING/ForkPoolWorker-1] user 1195242865 has no weibo

jianzzz changed the title from "Error when running after inserting uid into seed_ids" to "Error when running after inserting login account and seed info" Jan 2, 2018
ResolveWang (Member) commented Jan 3, 2018

The error message is pretty clear:

no cookies in cookies pool, please find out the reason

The cause is that the program didn't get any cookie from the cookie pool. Take a look at what the logs in the project's logs folder say. Then log in to Weibo manually and check whether your account and password actually match. Also check whether you specified -Q login_queue when starting the worker. After the worker is up, you need to run login_first.py first, and only start the crawl tasks after login has completed.

Also, check the login_info table in MySQL yourself: are your Weibo account and password filled in, is the enable field set to 1, and does your redis db 1 contain a cookie? That's all that can be inferred from what you pasted. All of these errors are caused by the missing cookies.
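
For reference, the MySQL part of that checklist can be scripted. This is a hedged sketch: the connection parameters, database name, and the name column are assumptions, while the login_info table and the enable field come from the comment above.

    # hedged sketch: verify the account rows the crawler will use
    import pymysql  # pip install pymysql

    conn = pymysql.connect(host='127.0.0.1', user='root',
                           password='your_mysql_password',  # assumption
                           db='weibospider')                # assumption
    with conn.cursor() as cur:
        cur.execute('SELECT name, enable FROM login_info')
        for name, enable in cur.fetchall():
            print(name, 'enabled' if enable == 1 else 'DISABLED')
    conn.close()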

jianzzz (Author) commented Jan 3, 2018

$ ps -aux|grep celery

ubuntu   14811  0.1  1.1 135264 48372 pts/0    S    Jan02   1:33 /usr/local/bin/python3.5 /usr/local/bin/celery -A tasks.workers -Q login_queue,user_crawler,fans_followers,search_crawler,home_crawler,ajax_home_crawler,comment_crawler,comment_page_crawler,repost_crawler,repost_page_crawler worker -l info -c 1

$ python login_first.py

2018-01-03 04:22:33 - crawler - INFO - The login task is starting...

$ redis-cli

127.0.0.1:6379> auth weibospider
OK
127.0.0.1:6379> KEYS *
(empty list or set)

The logs:

2018-01-03 04:22:33 - crawler - INFO - The login task is starting...
2018-01-03 04:22:40 - crawler - WARNING - account 1312728790@qq.com need verification for login

Config file:

redis:
    host: 127.0.0.1
    port: 6379
    password: weibospider
    cookies: 1                   # store and fetch cookies
    # store fetched urls and results, so you can decide whether to retry crawling the urls or not
    urls: 2
    broker: 5                    # broker for celery
    backend: 6                   # backed for celery
    id_name: 8                   # user id and names,for repost info analysis
    # expire_time (hours) for redis db2, if they are useless to you, you can set the value smaller
    expire_time: 48
#    sentinel:
#        - host: 2.2.2.2
#          port: 26379
#        - host: 3.3.3.3
#          port: 26379
#        - host: 4.4.4.4
#          port: 26380
    sentinel: ''
#    master: mymaster             # redis sentinel master name
    master: ''
    socket_timeout: 5               # socket timeout for redis sentinel

The login_info table already has the account and password filled in, and the login runs without exceptions. Captcha solving succeeded (there's a record on yundama's side), but it looks like nothing is being written into redis. Is there anything else I might have overlooked?
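
One detail worth noting (an observation from the config above, not a confirmed diagnosis): the KEYS * session shown earlier ran against the default db 0, while the config maps cookies to db 1, so querying db 1 explicitly is more conclusive. A minimal sketch:

    # hedged sketch: list keys in the cookies db (db 1 per "cookies: 1")
    import redis  # pip install redis

    r = redis.StrictRedis(host='127.0.0.1', port=6379,
                          password='weibospider', db=1)
    print(r.keys('*'))  # an empty list confirms no cookie was stored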

ResolveWang (Member) commented Jan 3, 2018

It does look like the login was executed. You can find this line in get_session in login/login.py:

url, yundama_obj, cid, session = do_login(name, password)

and right after it add

print(session.cookies.get_dict())

to see whether cookies were actually obtained. If they were, then presumably the problem is in this block:

    if url != '':
        rs_cont = session.get(url, headers=headers)
        login_info = rs_cont.text

        u_pattern = r'"uniqueid":"(.*)",'
        m = re.search(u_pattern, login_info)
        if m and m.group(1):
            # check if account is valid
            check_url = 'http://weibo.com/2671109275/about'
            resp = session.get(check_url, headers=headers)

            if is_403(resp.text):
                other.error('account {} has been forbidden'.format(name))
                LoginInfoOper.freeze_account(name, 0)
                return None
            other.info('Login successful! The login account is {}'.format(name))
            Cookies.store_cookies(name, session.cookies.get_dict())

You can debug it, or take the login module out and test it standalone, because there may be an exception this project doesn't catch that celery hides without raising, which is why you don't see any error.

Since I can't reproduce this problem on my side and haven't seen other users report it, I hope you can double-check. If there really is a bug, further discussion is welcome.
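
For the standalone test, a minimal driver might look like this. It is a sketch that assumes get_session is importable from login/login.py as quoted above and takes the account name and password:

    # hedged sketch: run the login module directly, bypassing celery,
    # so any exception celery would swallow surfaces immediately
    from login.login import get_session

    if __name__ == '__main__':
        get_session('your_test_account', 'your_password')  # placeholder credentials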

jianzzz (Author) commented Jan 3, 2018

[2018-01-03 09:28:03,591: WARNING/ForkPoolWorker-1] Invalid URL 'login_need_pincode': No schema supplied. Perhaps you meant http://login_need_pincode?

The get_redirect function returns 'login_need_pincode', and get_session then uses it as the url; executing rs_cont = session.get(url, headers=headers) causes the error.
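
A plausible local guard while debugging (hypothetical, not the project's actual fix) is to check for that sentinel value before using it as a URL, right after the do_login call quoted earlier:

    # hypothetical guard inside get_session, after do_login returns
    url, yundama_obj, cid, session = do_login(name, password)
    if url == 'login_need_pincode':
        # do_login signalled that a verification code is required; bail out
        # instead of passing the sentinel string to session.get()
        return None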

ResolveWang (Member) commented Jan 3, 2018

Still the original problem? Really? I tested with accounts in three different verification scenarios on my side and they all worked. Would you mind giving me your account so I can verify?

jianzzz (Author) commented Jan 3, 2018

Sharing the account isn't really convenient. I couldn't buy one on Taobao; this one is borrowed from a friend.

ResolveWang (Member) commented Jan 3, 2018

Never mind then; you'll have to debug it yourself.
Closing this issue since the developer can't reproduce it.

ResolveWang (Member) commented:

Today another user ran into a problem with exactly the same behavior as this issue. After communicating and debugging with them, it turned out they had enabled a global proxy on their machine, which made requests from the requests module fail, while celery hid that information. The debugging approach is to change the project's tasks/login related code to run standalone, like this:

@app.task(ignore_result=True)
def excute_login_task():
    # infos = login_info.get_login_info()
    # # Clear all stacked login tasks before each time for login
    # Cookies.check_login_task()
    # log.crawler.info('The login task is starting...')
    # for info in infos:
    #     app.send_task('tasks.login.login_task', args=(info.name, info.password), queue='login_queue',
    #                   routing_key='for_login')
    #     time.sleep(10)
    # Fill in the account and password that fail to log in here, then run python3 login_first.py
    login_task('you_test_account', 'password')

The point is that celery may hide some exceptions, making us miss important information; once the code runs standalone, the relevant exception is raised directly. Other parts of this project can be debugged the same way.
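
Since the root cause in that case was a global proxy, a quick way to check whether requests would pick one up from the environment is this minimal sketch:

    # requests honors environment proxy settings by default; a non-empty
    # dict here means a system/global proxy may be intercepting traffic
    import urllib.request
    print(urllib.request.getproxies())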
