Error when running after inserting login account and seed info #61

Closed
jianzzz opened this issue Jan 2, 2018 · 9 comments

jianzzz commented Jan 2, 2018

1. python login_first.py
2. python user_first.py

2018-01-02 14:09:53 - crawler - INFO - the crawling url is http://weibo.com/p/1005051195242865/info?mod=pedit_more
[2018-01-02 14:09:53,646: INFO/ForkPoolWorker-1] the crawling url is http://weibo.com/p/1005051195242865/info?mod=pedit_more
2018-01-02 14:09:53 - crawler - WARNING - no cookies in cookies pool, please find out the reason
[2018-01-02 14:09:53,650: WARNING/ForkPoolWorker-1] no cookies in cookies pool, please find out the reason
(WeiboSpider)root@jian-spider:/home/ubuntu/weibospider# 2018-01-02 14:09:54 - crawler - ERROR - failed to crawl http://weibo.com/p/1005051195242865/info?mod=pedit_more,here are details:(535, b'5.7.11 the behavior of this user triggered some restrictions to this account'), stack is File "/home/ubuntu/weibospider/decorators/decorator.py", line 14, in time_limit
return func(*args, **kargs)

[2018-01-02 14:09:54,293: ERROR/ForkPoolWorker-1] failed to crawl http://weibo.com/p/1005051195242865/info?mod=pedit_more,here are details:(535, b'5.7.11 the behavior of this user triggered some restrictions to this account'), stack is File "/home/ubuntu/weibospider/decorators/decorator.py", line 14, in time_limit
return func(*args, **kargs)

[2018-01-02 14:09:54,304: ERROR/ForkPoolWorker-1] list index out of range
[2018-01-02 14:09:54,304: ERROR/ForkPoolWorker-1] list index out of range
[2018-01-02 14:09:54,305: ERROR/ForkPoolWorker-1] list index out of range
[2018-01-02 14:09:54,324: INFO/MainProcess] Received task: tasks.user.crawl_follower_fans[49a1e5cb-240c-4b0d-a767-e1664574b74e]
2018-01-02 14:09:54 - crawler - INFO - the crawling url is http://weibo.com/p/1005051195242865/follow?relate=fans&page=1#Pl_Official_HisRelation__60
[2018-01-02 14:09:54,329: INFO/ForkPoolWorker-1] the crawling url is http://weibo.com/p/1005051195242865/follow?relate=fans&page=1#Pl_Official_HisRelation__60
2018-01-02 14:09:54 - crawler - WARNING - no cookies in cookies pool, please find out the reason
[2018-01-02 14:09:54,331: WARNING/ForkPoolWorker-1] no cookies in cookies pool, please find out the reason
2018-01-02 14:09:54 - crawler - ERROR - failed to crawl http://weibo.com/p/1005051195242865/follow?relate=fans&page=1#Pl_Official_HisRelation__60,here are details:(535, b'5.7.11 the behavior of this user triggered some restrictions to this account'), stack is File "/home/ubuntu/weibospider/decorators/decorator.py", line 14, in time_limit
return func(*args, **kargs)

[2018-01-02 14:09:54,958: ERROR/ForkPoolWorker-1] failed to crawl http://weibo.com/p/1005051195242865/follow?relate=fans&page=1#Pl_Official_HisRelation__60,here are details:(535, b'5.7.11 the behavior of this user triggered some restrictions to this account'), stack is File "/home/ubuntu/weibospider/decorators/decorator.py", line 14, in time_limit
return func(*args, **kargs)

jianzzz (Author) commented Jan 2, 2018

$ python home_first.py
[2018-01-02 15:00:14,872: INFO/MainProcess] Received task: tasks.home.crawl_weibo_datas[41075592-07d2-49d8-861f-ba38c24ef872]
[2018-01-02 15:00:14,874: INFO/MainProcess] Received task: tasks.home.crawl_weibo_datas[daef0591-991b-4a97-8928-a9deb86db5e4]
2018-01-02 15:00:14 - crawler - INFO - the crawling url is http://weibo.com/u/1751681657?is_ori=1&is_tag=0&profile_ftype=1&page=1
[2018-01-02 15:00:14,879: INFO/ForkPoolWorker-1] the crawling url is http://weibo.com/u/1751681657?is_ori=1&is_tag=0&profile_ftype=1&page=1
2018-01-02 15:00:14 - crawler - WARNING - no cookies in cookies pool, please find out the reason
[2018-01-02 15:00:14,881: WARNING/ForkPoolWorker-1] no cookies in cookies pool, please find out the reason
ubuntu@jian-spider:~/weibospider$ 2018-01-02 15:00:15 - crawler - ERROR - failed to crawl http://weibo.com/u/1751681657?is_ori=1&is_tag=0&profile_ftype=1&page=1,here are details:(535, b'5.7.11 the behavior of this user triggered some restrictions to this account'), stack is File "/home/ubuntu/weibospider/decorators/decorator.py", line 14, in time_limit
return func(*args, **kargs)

[2018-01-02 15:00:15,558: ERROR/ForkPoolWorker-1] failed to crawl http://weibo.com/u/1751681657?is_ori=1&is_tag=0&profile_ftype=1&page=1,here are details:(535, b'5.7.11 the behavior of this user triggered some restrictions to this account'), stack is File "/home/ubuntu/weibospider/decorators/decorator.py", line 14, in time_limit
return func(*args, **kargs)

2018-01-02 15:00:15 - crawler - WARNING - user 1751681657 has no weibo
[2018-01-02 15:00:15,561: WARNING/ForkPoolWorker-1] user 1751681657 has no weibo
2018-01-02 15:00:15 - crawler - INFO - the crawling url is http://weibo.com/u/1195242865?is_ori=1&is_tag=0&profile_ftype=1&page=1
[2018-01-02 15:00:15,564: INFO/ForkPoolWorker-1] the crawling url is http://weibo.com/u/1195242865?is_ori=1&is_tag=0&profile_ftype=1&page=1
2018-01-02 15:00:15 - crawler - WARNING - no cookies in cookies pool, please find out the reason
[2018-01-02 15:00:15,565: WARNING/ForkPoolWorker-1] no cookies in cookies pool, please find out the reason
2018-01-02 15:00:17 - crawler - ERROR - failed to crawl http://weibo.com/u/1195242865?is_ori=1&is_tag=0&profile_ftype=1&page=1,here are details:(535, b'5.7.11 the behavior of this user triggered some restrictions to this account'), stack is File "/home/ubuntu/weibospider/decorators/decorator.py", line 14, in time_limit
return func(*args, **kargs)

[2018-01-02 15:00:17,044: ERROR/ForkPoolWorker-1] failed to crawl http://weibo.com/u/1195242865?is_ori=1&is_tag=0&profile_ftype=1&page=1,here are details:(535, b'5.7.11 the behavior of this user triggered some restrictions to this account'), stack is File "/home/ubuntu/weibospider/decorators/decorator.py", line 14, in time_limit
return func(*args, **kargs)

2018-01-02 15:00:17 - crawler - WARNING - user 1195242865 has no weibo
[2018-01-02 15:00:17,045: WARNING/ForkPoolWorker-1] user 1195242865 has no weibo

jianzzz changed the title from "Error when running after inserting uid into seed_ids" to "Error when running after inserting login account and seed info" Jan 2, 2018
ResolveWang (Member) commented Jan 3, 2018

The error message is pretty clear:

no cookies in cookies pool, please find out the reason

The cause is that the program didn't get any cookie from the cookie pool. Take a look at what the logs in the project's logs folder say. Then log in to Weibo manually and check whether your account and password actually match. Also check whether you specified -Q login_queue when starting the worker. After the worker is up, you need to run login_first.py first, and only start the crawl tasks after login has completed.

Also, check the login_info table in MySQL yourself: are your Weibo account and password filled in, is the enable field set to 1, and does your redis db 1 contain a cookie? That's all that can be inferred from what you pasted. All of these errors are caused by the missing cookies.
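
For reference, the MySQL part of that checklist can be scripted. This is a hedged sketch: the connection parameters, database name, and the name column are assumptions, while the login_info table and the enable field come from the comment above.

    # hedged sketch: verify the account rows the crawler will use
    import pymysql  # pip install pymysql

    conn = pymysql.connect(host='127.0.0.1', user='root',
                           password='your_mysql_password',  # assumption
                           db='weibospider')                # assumption
    with conn.cursor() as cur:
        cur.execute('SELECT name, enable FROM login_info')
        for name, enable in cur.fetchall():
            print(name, 'enabled' if enable == 1 else 'DISABLED')
    conn.close()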

jianzzz (Author) commented Jan 3, 2018

$ ps -aux|grep celery

ubuntu   14811  0.1  1.1 135264 48372 pts/0    S    Jan02   1:33 /usr/local/bin/python3.5 /usr/local/bin/celery -A tasks.workers -Q login_queue,user_crawler,fans_followers,search_crawler,home_crawler,ajax_home_crawler,comment_crawler,comment_page_crawler,repost_crawler,repost_page_crawler worker -l info -c 1

$ python login_first.py

2018-01-03 04:22:33 - crawler - INFO - The login task is starting...

$ redis-cli

127.0.0.1:6379> auth weibospider
OK
127.0.0.1:6379> KEYS *
(empty list or set)

The logs:

2018-01-03 04:22:33 - crawler - INFO - The login task is starting...
2018-01-03 04:22:40 - crawler - WARNING - account 1312728790@qq.com need verification for login

Config file:

redis:
    host: 127.0.0.1
    port: 6379
    password: weibospider
    cookies: 1                   # store and fetch cookies
    # store fetched urls and results, so you can decide whether to retry crawling the urls or not
    urls: 2
    broker: 5                    # broker for celery
    backend: 6                   # backed for celery
    id_name: 8                   # user id and names,for repost info analysis
    # expire_time (hours) for redis db2, if they are useless to you, you can set the value smaller
    expire_time: 48
#    sentinel:
#        - host: 2.2.2.2
#          port: 26379
#        - host: 3.3.3.3
#          port: 26379
#        - host: 4.4.4.4
#          port: 26380
    sentinel: ''
#    master: mymaster             # redis sentinel master name
    master: ''
    socket_timeout: 5               # socket timeout for redis sentinel

The login_info table already has the account and password filled in, and the login runs without exceptions. Captcha solving succeeded (there's a record on yundama's side), but it looks like nothing is being written into redis. Is there anything else I might have overlooked?
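
One detail worth noting (an observation from the config above, not a confirmed diagnosis): the KEYS * session shown earlier ran against the default db 0, while the config maps cookies to db 1, so querying db 1 explicitly is more conclusive. A minimal sketch:

    # hedged sketch: list keys in the cookies db (db 1 per "cookies: 1")
    import redis  # pip install redis

    r = redis.StrictRedis(host='127.0.0.1', port=6379,
                          password='weibospider', db=1)
    print(r.keys('*'))  # an empty list confirms no cookie was stored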

ResolveWang (Member) commented Jan 3, 2018

It does look like the login was executed. You can find this line in get_session in login/login.py:

url, yundama_obj, cid, session = do_login(name, password)

and right after it add

print(session.cookies.get_dict())

to see whether cookies were actually obtained. If they were, then presumably the problem is in this block:

    if url != '':
        rs_cont = session.get(url, headers=headers)
        login_info = rs_cont.text

        u_pattern = r'"uniqueid":"(.*)",'
        m = re.search(u_pattern, login_info)
        if m and m.group(1):
            # check if account is valid
            check_url = 'http://weibo.com/2671109275/about'
            resp = session.get(check_url, headers=headers)

            if is_403(resp.text):
                other.error('account {} has been forbidden'.format(name))
                LoginInfoOper.freeze_account(name, 0)
                return None
            other.info('Login successful! The login account is {}'.format(name))
            Cookies.store_cookies(name, session.cookies.get_dict())

You can debug it, or take the login module out and test it standalone, because there may be an exception this project doesn't catch that celery hides without raising, which is why you don't see any error.

Since I can't reproduce this problem on my side and haven't seen other users report it, I hope you can double-check. If there really is a bug, further discussion is welcome.
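
For the standalone test, a minimal driver might look like this. It is a sketch that assumes get_session is importable from login/login.py as quoted above and takes the account name and password:

    # hedged sketch: run the login module directly, bypassing celery,
    # so any exception celery would swallow surfaces immediately
    from login.login import get_session

    if __name__ == '__main__':
        get_session('your_test_account', 'your_password')  # placeholder credentials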

jianzzz (Author) commented Jan 3, 2018

[2018-01-03 09:28:03,591: WARNING/ForkPoolWorker-1] Invalid URL 'login_need_pincode': No schema supplied. Perhaps you meant http://login_need_pincode?

The get_redirect function returns 'login_need_pincode', and get_session then uses it as the url; executing rs_cont = session.get(url, headers=headers) causes the error.
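
A plausible local guard while debugging (hypothetical, not the project's actual fix) is to check for that sentinel value before using it as a URL, right after the do_login call quoted earlier:

    # hypothetical guard inside get_session, after do_login returns
    url, yundama_obj, cid, session = do_login(name, password)
    if url == 'login_need_pincode':
        # do_login signalled that a verification code is required; bail out
        # instead of passing the sentinel string to session.get()
        return None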

ResolveWang (Member) commented Jan 3, 2018

Still the original problem? Really? I tested with accounts in three different verification scenarios on my side and they all worked. Would you mind giving me your account so I can verify?

jianzzz (Author) commented Jan 3, 2018

Sharing the account isn't really convenient. I couldn't buy one on Taobao; this one is borrowed from a friend.

ResolveWang (Member) commented Jan 3, 2018

Never mind then; you'll have to debug it yourself.
Closing this issue since the developer can't reproduce it.

ResolveWang (Member) commented:

Today another user ran into a problem with exactly the same behavior as this issue. After communicating and debugging with them, it turned out they had enabled a global proxy on their machine, which made requests from the requests module fail, while celery hid that information. The debugging approach is to change the project's tasks/login related code to run standalone, like this:

@app.task(ignore_result=True)
def excute_login_task():
    # infos = login_info.get_login_info()
    # # Clear all stacked login tasks before each time for login
    # Cookies.check_login_task()
    # log.crawler.info('The login task is starting...')
    # for info in infos:
    #     app.send_task('tasks.login.login_task', args=(info.name, info.password), queue='login_queue',
    #                   routing_key='for_login')
    #     time.sleep(10)
    # Fill in the account and password that fail to log in here, then run python3 login_first.py
    login_task('you_test_account', 'password')

The point is that celery may hide some exceptions, making us miss important information; once the code runs standalone, the relevant exception is raised directly. Other parts of this project can be debugged the same way.
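
Since the root cause in that case was a global proxy, a quick way to check whether requests would pick one up from the environment is this minimal sketch:

    # requests honors environment proxy settings by default; a non-empty
    # dict here means a system/global proxy may be intercepting traffic
    import urllib.request
    print(urllib.request.getproxies())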
