Keyword search crawling fails #196
Check whether it is a proxy problem.
OK, thanks. Also, is it true that the worker cannot be started as the root user?
It can; Celery just warns that running as root is not recommended.
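I can get proxies, and logging in through a proxy works fine, so why does searching run into a proxy problem? If I don't use a proxy in page_get, I also get nothing back: it goes straight to "has been crawled". Logically, even without logging in, I should still be able to get the first page of search results.
2020-03-08 15:07:05 - other - INFO - Login successful! The login account is 17507424089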
Check whether your account can log in normally. Try it on your local machine.
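It logs in normally. The previous batch of accounts I bought required phone verification, so I bought a new batch specifically to avoid that. I wonder whether Weibo search uses different cookies from the rest of Weibo. I had been using the temp_verification version. Today I couldn't log in to 超级鹰 (Chaojiying), but 云打码 (Yundama) works again. I tried version 1.7.2, and the error is as follows: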
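I've run into this. First, if you test the account manually and it needs to be unlocked via a phone number, then even if login succeeds, requests to the search page will not return any content. If the account is fine, needs no phone unlock, logs in successfully, and still cannot fetch search page content, the IP has most likely been restricted. I've seen both cases.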
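I just switched to a proxy API limited to 5 requests per second. It crawled 30 items and then reported proxy errors again, so I gave up. What I mainly want is the reposts and comments of the posts returned by the search. I crawled the search results separately, imported them into weibo_data, and then crawled the comments and reposts; that works. Thanks, everyone, for the replies.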
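If every request pulls a fresh proxy from a provider capped at 5 calls per second, the calls to that provider need throttling. Below is a minimal sliding-window rate limiter sketch. The 5-per-second figure comes from the comment above. The class name is hypothetical, and so is any provider call you would wrap with it. The thread does not establish that exceeding the quota caused the later proxy errors.

```python
import collections
import time
from typing import Callable, Deque


class RateLimiter:
    """Sliding-window limiter: at most `max_calls` calls in any `period` seconds.

    Hypothetical helper, not part of weibospider.
    """

    def __init__(self, max_calls: int = 5, period: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_calls = max_calls
        self.period = period
        self._clock = clock          # injectable so the limiter can be tested
        self._sleep = sleep
        self._calls: Deque[float] = collections.deque()

    def acquire(self) -> None:
        """Block until one more call fits in the window, then record it."""
        while True:
            now = self._clock()
            # Forget calls that have left the window.
            while self._calls and now - self._calls[0] >= self.period:
                self._calls.popleft()
            if len(self._calls) < self.max_calls:
                self._calls.append(now)
                return
            # Wait until the oldest recorded call leaves the window.
            self._sleep(self.period - (now - self._calls[0]))
```

Call `limiter.acquire()` right before each request to the proxy provider. For example, with `limiter = RateLimiter(max_calls=5, period=1.0)`, call it before a hypothetical `fetch_proxy()`. The `clock` and `sleep` parameters exist only so the limiter can be exercised without real waiting.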
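Because my IP was banned by Weibo, I added IP proxies. Login ran successfully, but after running `python first_task_execution/search` I get the output below, and no data appears in weibo_data. I have already set `need_proxy` to `True` in `get_page` in page_get/basic.py.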
[2020-03-07 21:34:12,974: INFO/MainProcess] Received task: tasks.search.search_keyword[d652d4ea-826a-488f-a1aa-eaf52d9d8363]
2020-03-07 21:34:12 - crawler - INFO - We are searching keyword "武汉红十字会"
[2020-03-07 21:34:12,976: INFO/ForkPoolWorker-1] We are searching keyword "武汉红十字会"
2020-03-07 21:34:12 - crawler - INFO - the crawling url is http://s.weibo.com/weibo/%E6%AD%A6%E6%B1%89%E7%BA%A2%E5%8D%81%E5%AD%97%E4%BC%9A&xsort=hot&suball=1&timescope=custom:2020-01-25-0:2020-02-25-0&page=1
[2020-03-07 21:34:12,979: INFO/ForkPoolWorker-1] the crawling url is http://s.weibo.com/weibo/%E6%AD%A6%E6%B1%89%E7%BA%A2%E5%8D%81%E5%AD%97%E4%BC%9A&xsort=hot&suball=1&timescope=custom:2020-01-25-0:2020-02-25-0&page=1
2020-03-07 21:37:08 - crawler - WARNING - Excepitons are raised when crawling http://s.weibo.com/weibo/%E6%AD%A6%E6%B1%89%E7%BA%A2%E5%8D%81%E5%AD%97%E4%BC%9A&xsort=hot&suball=1&timescope=custom:2020-01-25-0:2020-02-25-0&page=1.Here are details:HTTPConnectionPool(host='183.164.228.73', port=49691): Max retries exceeded with url: http://s.weibo.com/weibo/%E6%AD%A6%E6%B1%89%E7%BA%A2%E5%8D%81%E5%AD%97%E4%BC%9A&xsort=hot&suball=1&timescope=custom:2020-01-25-0:2020-02-25-0&page=1 (Caused by ProxyError('Cannot connect to proxy.', NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7fd699739ba8>: Failed to establish a new connection: [Errno 110] Connection timed out',)))
[2020-03-07 21:37:08,589: WARNING/ForkPoolWorker-1] Excepitons are raised when crawling http://s.weibo.com/weibo/%E6%AD%A6%E6%B1%89%E7%BA%A2%E5%8D%81%E5%AD%97%E4%BC%9A&xsort=hot&suball=1&timescope=custom:2020-01-25-0:2020-02-25-0&page=1.Here are details:HTTPConnectionPool(host='183.164.228.73', port=49691): Max retries exceeded with url: http://s.weibo.com/weibo/%E6%AD%A6%E6%B1%89%E7%BA%A2%E5%8D%81%E5%AD%97%E4%BC%9A&xsort=hot&suball=1&timescope=custom:2020-01-25-0:2020-02-25-0&page=1 (Caused by ProxyError('Cannot connect to proxy.', NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7fd699739ba8>: Failed to establish a new connection: [Errno 110] Connection timed out',)))
2020-03-07 21:37:08 - crawler - ERROR - failed to crawl http://s.weibo.com/weibo/%E6%AD%A6%E6%B1%89%E7%BA%A2%E5%8D%81%E5%AD%97%E4%BC%9A&xsort=hot&suball=1&timescope=custom:2020-01-25-0:2020-02-25-0&page=1,here are details:an integer is required (got type str), stack is File "/home/xwt/Desktop/weibospider-temp_verification/decorators/decorators.py", line 17, in time_limit
return func(*args, **kargs)
[2020-03-07 21:37:08,590: ERROR/ForkPoolWorker-1] failed to crawl http://s.weibo.com/weibo/%E6%AD%A6%E6%B1%89%E7%BA%A2%E5%8D%81%E5%AD%97%E4%BC%9A&xsort=hot&suball=1&timescope=custom:2020-01-25-0:2020-02-25-0&page=1,here are details:an integer is required (got type str), stack is File "/home/xwt/Desktop/weibospider-temp_verification/decorators/decorators.py", line 17, in time_limit
return func(*args, **kargs)
2020-03-07 21:37:08 - crawler - WARNING - No search result for keyword 武汉红十字会, the source page is
[2020-03-07 21:37:08,591: WARNING/ForkPoolWorker-1] No search result for keyword 武汉红十字会, the source page is
[2020-03-07 21:37:08,592: INFO/ForkPoolWorker-1] Task tasks.search.search_keyword[d652d4ea-826a-488f-a1aa-eaf52d9d8363] succeeded in 175.61601991499992s: None
Is the "Max retries exceeded with url" error happening because the proxy IPs expire too quickly?
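In the log above, `ProxyError('Cannot connect to proxy.', ... Connection timed out)` means the TCP connection to the proxy itself (183.164.228.73:49691) timed out, so the request never reached Weibo. That is consistent with a dead or expired proxy. With requests' default settings, the "Max retries exceeded" wording can appear after a single failed attempt. The request was logged at 21:34:12 and the failure at 21:37:08, so roughly three minutes were spent on one unreachable proxy.

One way to test the expiry theory outside the crawler is to probe each proxy with a short timeout before using it. This is a minimal standard-library sketch. The function names, the `host:port` proxy format and the default test URL are all assumptions, not weibospider's API.

```python
import http.client
import urllib.request
from typing import Iterable, List


def proxy_is_alive(proxy: str, test_url: str = "http://s.weibo.com/",
                   timeout: float = 5.0) -> bool:
    """Hypothetical check: True if a request through `proxy` ("host:port")
    completes within `timeout` seconds with a non-error status."""
    handler = urllib.request.ProxyHandler({
        "http": f"http://{proxy}",
        "https": f"http://{proxy}",
    })
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(test_url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, socket timeouts and refused/reset connections
        # are all OSError subclasses; malformed responses raise HTTPException.
        return False


def filter_live_proxies(proxies: Iterable[str], **kwargs) -> List[str]:
    """Keep only the proxies that pass proxy_is_alive (same keyword options)."""
    return [p for p in proxies if proxy_is_alive(p, **kwargs)]
```

If proxies from the pool frequently fail this check within seconds of being issued, that supports the "expiring too fast" explanation. It also means dead proxies are rejected in seconds instead of minutes.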