Fetching your own Weibo posts #113

Closed
Hylan129 opened this issue Feb 27, 2020 · 9 comments

Comments

@Hylan129

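When I crawl my own Weibo history, the original program never fetches anything, yet it reports no error. Crawling other users' Weibo works normally.

After some analysis I found that changing the URL fixes it: append "/profile" to the URL in the following two places.

1:

    def get_weibo_info(self):
        """Fetch Weibo info."""
        try:
            url = 'https://weibo.cn/%s/profile' % (self.user_config['user_uri'])

2:

    def get_one_page(self, page):
        """Fetch all Weibo posts on page `page`."""
        try:
            url = 'https://weibo.cn/%s/profile?page=%d' % (
                self.user_config['user_uri'], page)

P.S. With the new URLs, crawling other users' Weibo still works.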

@dataabc dataabc changed the title from "Abnormal fetch URL" to "Fetching your own Weibo posts" Feb 28, 2020
@dataabc
Owner

dataabc commented Feb 28, 2020

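Thanks for the feedback.

This is a very good suggestion, but it conflicts with part of the current functionality. At present, user_id can be either the real user ID or a custom domain. For example, Hu Ge's Weibo page is https://weibo.cn/hu_ge, where "hu_ge" is the custom domain. With "/profile" appended, the information is fetched correctly when user_id is the real ID, but fetching fails when it is a custom domain. Since many Weibo accounts use a custom domain, the program will not be changed for now, to keep it more broadly applicable.

Thanks again. If you find other problems, further feedback is welcome :smile: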
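
One way the two cases could be reconciled, sketched under the assumption that a real user ID consists only of digits while a custom domain such as `hu_ge` does not: append "/profile" only for numeric IDs. The helper names `build_index_url` and `build_page_url` are hypothetical and are not part of weiboSpider.

```python
def build_index_url(user_id: str) -> str:
    """Return the weibo.cn URL of a user's post list.

    Assumption: a real user ID is all digits; anything else is treated
    as a custom domain.
    """
    base = 'https://weibo.cn/%s' % user_id
    # '/profile' is reported to work only with real (numeric) IDs.
    return base + '/profile' if user_id.isdigit() else base


def build_page_url(user_id: str, page: int) -> str:
    """Return the URL of page `page` of the user's post list."""
    return '%s?page=%d' % (build_index_url(user_id), page)
```

With this sketch a custom domain keeps its current URL form, while a numeric ID gets the "/profile" suffix that the original report found necessary for crawling one's own account.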

@purplepalmdash

A fix for images not being fetched, for reference:

    #first_pic = 'https://weibo.cn/mblog/pic/' + weibo_id + '?rl=0'
    first_pic = 'https://weibo.cn/mblog/pic/' + weibo_id + '?rl=1'

@Hylan129
Author

Hylan129 commented Mar 1, 2020

@dataabc
OK. I have already solved my own problem. Thanks for the reply.

@Hylan129
Author

Hylan129 commented Mar 1, 2020

> A fix for images not being fetched, for reference:
>
>     #first_pic = 'https://weibo.cn/mblog/pic/' + weibo_id + '?rl=0'
>     first_pic = 'https://weibo.cn/mblog/pic/' + weibo_id + '?rl=1'

@purplepalmdash Thanks, solved!

@Yuuoniy

Yuuoniy commented Aug 23, 2020

Where in the code are the two URLs mentioned above? I can see the get_weibo_info function in spider.py, but I cannot find where url is assigned.

@songzy12
Collaborator

@Yuuoniy Hi, the URLs are currently all built in the parser module:
https://github.com/dataabc/weiboSpider/tree/master/weibo_spider/parser

Each parser corresponds to one group of related URLs.

@scriptway

scriptway commented Dec 15, 2020

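> Where in the code are the two URLs mentioned above? I can see the get_weibo_info function in spider.py, but I cannot find where url is assigned.

The program has been updated and the URLs are now referenced differently. I ran into this problem too. I added "profile" to the URLs in index_parser, info_parser, and page_parser under the parser directory, but my own Weibo still cannot be parsed. According to the error output, the XPath query matches no data. Some of my posts are visible only to me, but after inspecting the page structure I found that the div of a post visible only to me is no different from the div of other people's posts, so I do not know which step fails.
The error output is as follows: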

```
list index out of range
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/weibo_spider/parser/info_parser.py", line 39, in extract_user_info
    if self.selector.xpath(
IndexError: list index out of range
'NoneType' object has no attribute 'id'

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/weibo_spider/parser/index_parser.py", line 36, in get_user
    self.user.id = user_id
AttributeError: 'NoneType' object has no attribute 'id'
None


'NoneType' object has no attribute 'nickname'

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/weibo_spider/spider.py", line 188, in _get_filepath
    self.user.nickname)
AttributeError: 'NoneType' object has no attribute 'nickname'
expected str, bytes or os.PathLike object, not NoneType

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/weibo_spider/writer/csv_writer.py", line 25, in __init__
    with open(self.file_path, 'a', encoding='utf-8-sig',
TypeError: expected str, bytes or os.PathLike object, not NoneType
'NoneType' object has no attribute 'nickname'

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/weibo_spider/spider.py", line 188, in _get_filepath
    self.user.nickname)
AttributeError: 'NoneType' object has no attribute 'nickname'
'NoneType' object has no attribute 'nickname'

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/weibo_spider/spider.py", line 188, in _get_filepath
    self.user.nickname)
AttributeError: 'NoneType' object has no attribute 'nickname'
'NoneType' object has no attribute 'nickname'

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/weibo_spider/spider.py", line 188, in _get_filepath
    self.user.nickname)
AttributeError: 'NoneType' object has no attribute 'nickname'
'NoneType' object has no attribute '__dict__'

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/weibo_spider/spider.py", line 269, in start
    self.write_user(self.user)
  File "/usr/local/lib/python3.9/site-packages/weibo_spider/spider.py", line 114, in write_user
    writer.write_user(user)
  File "/usr/local/lib/python3.9/site-packages/weibo_spider/writer/txt_writer.py", line 29, in write_user
    [v + ':' + str(self.user.__dict__[k]) for k, v in self.user_desc])
  File "/usr/local/lib/python3.9/site-packages/weibo_spider/writer/txt_writer.py", line 29, in <listcomp>
    [v + ':' + str(self.user.__dict__[k]) for k, v in self.user_desc])
AttributeError: 'NoneType' object has no attribute '__dict__'
```
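
The traceback reads as a chain: the XPath query in `info_parser.py` returns an empty list, indexing it raises `IndexError`, the user object stays `None`, and every later access to `self.user` fails in turn. A minimal sketch of guarding that first step follows. The helpers `first_or_none` and `require_user` are hypothetical and do not exist in weiboSpider.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar('T')


def first_or_none(results: Sequence[T]) -> Optional[T]:
    """Return the first XPath match, or None when nothing matched."""
    return results[0] if results else None


def require_user(user: Optional[object]) -> object:
    """Fail once with a clear message instead of a cascade of AttributeErrors."""
    if user is None:
        raise RuntimeError(
            'User info could not be parsed; the response may not be a '
            'normal profile page.')
    return user
```

Checking the parsed user once, right after parsing, would surface a single error at the point of failure rather than one per writer.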

@dataabc
Copy link
Owner

dataabc commented Dec 15, 2020

@scriptway
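This happens because the requests were sent too fast and the account was temporarily restricted. Slow the crawl down by making the change described in question 2 of the FAQ.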
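
A minimal sketch of slowing a crawl down, assuming a random pause is inserted after every few pages. The names `pause_seconds` and `crawl` and all default values are hypothetical; the actual change to make is the one described in the project's FAQ.

```python
import random
import time


def pause_seconds(page: int, every: int = 5, low: float = 6.0,
                  high: float = 10.0) -> float:
    """Return how long to sleep after fetching `page` (0.0 means no pause)."""
    if page % every != 0:
        return 0.0
    # Randomize the delay so requests do not arrive at a fixed rhythm.
    return random.uniform(low, high)


def crawl(pages: int, fetch) -> None:
    """Fetch pages 1..pages, pausing after every `every`-th request."""
    for page in range(1, pages + 1):
        fetch(page)
        time.sleep(pause_seconds(page))
```

Lowering `every` or raising `low` and `high` makes the crawl slower and less likely to trigger a temporary restriction, at the cost of a longer run.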
