Add text-type posts and download the images in them #108

Closed
wants to merge 1 commit into from

Conversation

@moxuec moxuec commented Dec 8, 2018

No description provided.

Owner

@dixudx dixudx left a comment

@646420882 Thanks for your contribution. I really appreciate it.

This is a new medium type, so we need more tests to make sure it works smoothly. Please help test it against some sites.

- xml_cleaned = re.sub(u'[^\x20-\x7f]+',
-                      u'', response.content.decode('utf-8'))
- data = xmltodict.parse(xml_cleaned)
+ data = xmltodict.parse(response.text)
Owner

Wait, wait...
This is done intentionally to fix some Unicode errors.
Please don't change this back.
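
For readers following along, here is a minimal sketch of what the cleaning step under discussion does. It reuses the regular expression from the snippet above; the helper name `clean_xml` and the sample payload are hypothetical, and the standard-library `xml.etree.ElementTree` stands in for `xmltodict` only so the example runs without extra dependencies.

```python
import re
import xml.etree.ElementTree as ET

# Same character class as in the reviewed code: keep only U+0020..U+007F.
# Note that this also drops newlines, tabs and every non-ASCII character.
ASCII_ONLY = re.compile(r'[^\x20-\x7f]+')


def clean_xml(raw: bytes) -> str:
    """Decode a UTF-8 response body and strip everything outside 0x20-0x7f."""
    text = raw.decode('utf-8')
    return ASCII_ONLY.sub('', text)


# Hypothetical payload containing a non-ASCII character and a newline.
raw = '<post>caf\u00e9\n</post>'.encode('utf-8')
cleaned = clean_xml(raw)
root = ET.fromstring(cleaned)
print(cleaned)
```

The trade-off is visible in the output: the parser receives plain ASCII, but any non-ASCII text in the post is lost.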

Author

Haha, I noticed this problem later as well. Unicode errors do occur.

pattern = re.compile('.*?<img.*?data-orig-src="(.*?)"/>', re.S)
pic_urls = re.findall(pattern, post["regular-body"])
for pic_url in pic_urls:
    self.queue.put((medium_type, pic_url, target_folder))
Owner

You're using text as the medium type here. This is not correct.

Here medium_type should be photo.
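
A rough sketch of the suggested fix, assuming the queue items are `(medium_type, url, target_folder)` tuples as in the snippet above. The function name `enqueue_text_post_images` and the standalone `queue.Queue` are hypothetical stand-ins for the project's own method and `self.queue`; the regular expression is copied from the PR unchanged.

```python
import queue
import re

# Pattern taken from the PR. It assumes data-orig-src is the last attribute
# before the closing "/>" of each <img> tag.
IMG_PATTERN = re.compile(r'.*?<img.*?data-orig-src="(.*?)"/>', re.S)


def enqueue_text_post_images(post, download_queue, target_folder):
    """Queue each image found in a text post's body as a "photo" download."""
    pic_urls = IMG_PATTERN.findall(post.get("regular-body") or "")
    for pic_url in pic_urls:
        # The post type is text, but the downloaded items are photos.
        download_queue.put(("photo", pic_url, target_folder))
    return len(pic_urls)
```

With this shape, the post type only decides where to look for URLs, while the queued medium type describes what is actually being downloaded.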

@moxuec moxuec closed this Dec 13, 2018