-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Basic mypy support #613
Basic mypy support #613
Changes from 2 commits
9d99beb
1df0963
dd551a1
db1fd97
6cd433a
f900ead
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -229,6 +229,7 @@ def is_tag_match(node, tag_dict): | |
scores = [] | ||
for tag in defines.ARTICLE_BODY_TAGS: | ||
if is_tag_match(node, tag): | ||
assert isinstance(tag["score_boost"], int) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. not needed in my opinion, since tag is from the constant ARTICLE_BODY_TAGS, which is static defined, and should not be anything other than int There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
scores.append(tag["score_boost"]) | ||
if scores: | ||
return max(scores) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -58,7 +58,7 @@ def _get_favicon(self, doc: lxml.html.Element) -> str: | |
) | ||
if meta: | ||
favicon = parsers.get_attribute(meta[0], "href") | ||
return favicon | ||
return favicon if favicon is not None else "" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this works too: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It does. I tend towards more verbose options and not liking relying on truthy values v.s. explicit tests, but I've changed it to the form you suggested. |
||
return "" | ||
|
||
def _get_meta_image(self, doc: lxml.html.Element) -> str: | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,7 +6,7 @@ | |
""" | ||
from concurrent.futures import ThreadPoolExecutor | ||
import logging | ||
from typing import Callable, List, Tuple | ||
from typing import Callable, List, Tuple, Union | ||
import requests | ||
from requests import RequestException | ||
from requests import Response | ||
|
@@ -101,7 +101,7 @@ def is_binary_url(url: str) -> bool: | |
allow_redirects=True, | ||
) | ||
|
||
content = resp.content | ||
content: Union[str, bytes] = resp.content | ||
if isinstance(content, bytes): | ||
try: | ||
content = content.decode("utf-8", errors="replace") | ||
|
@@ -119,8 +119,8 @@ def is_binary_url(url: str) -> bool: | |
chars = len( | ||
[ | ||
char | ||
for char in content | ||
if 31 < ord(char) < 128 or ord(char) in [9, 10, 13] | ||
for char in [ord(c) if isinstance(c, str) else c for c in content] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why the precaution? we ensured at line 105 that content is not bytes There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So, that's not quite true. The There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yep, i see, the mistake is that decode with errors=replace should never raise an error. |
||
if 31 < char < 128 or char in [9, 10, 13] | ||
] | ||
) | ||
if chars / len(content) < 0.6: # 40% of the content is binary | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think that maybe a test case for article.to_json would be better than an assert here?
My thoughts: only if to_json is buggy, this would trigger. For the enduser it would not mean a lot to get an assertion error here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added some overload bits (see https://stackoverflow.com/a/76302414/320546 for explanation) to
to_json
to make it have a more explicit link between the params and output, and so have removed these.