New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Basic mypy support #613
Basic mypy support #613
Conversation
* Documentation changes v0.9.2 (AndyTheFactory#604) (AndyTheFactory#605) * feat(doc): 📝 adding evaluation results * feat(doc): 🚀 Documentation Update. Added Examples, documented new features
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the great contribution.
I was having some unfinished work on the parsing optimization part and had to wait to merge this pull requests until i was finished.
I will merge this in the work-0.9.3 branch, since it's the current development branch.
I think will start merging tomorrow.
Additionally, I had some questions that i added as comments, would value your feedback.
newspaper/cli.py
Outdated
@@ -221,10 +221,12 @@ def csv_string(article_dict: dict) -> str: | |||
|
|||
output = article.to_json(as_string=args.output_format == "json") | |||
if args.output_format == "json": | |||
assert isinstance(output, str) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think that maybe a test case for article.to_json would be better than an assert here?
My thoughts: only if to_json is buggy, this would trigger. For the enduser it would not mean a lot to get an assertion error here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added some overload bits (see https://stackoverflow.com/a/76302414/320546 for explanation) to to_json
to make it have a more explicit link between the params and output, and so have removed these.
@@ -229,6 +229,7 @@ def is_tag_match(node, tag_dict): | |||
scores = [] | |||
for tag in defines.ARTICLE_BODY_TAGS: | |||
if is_tag_match(node, tag): | |||
assert isinstance(tag["score_boost"], int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not needed in my opinion, since tag is from the constant ARTICLE_BODY_TAGS, which is static defined, and should not be anything other than int
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ARTICLE_BODY_TAGS
was defined as a dict with str|int
keys, which meant it could be a str from the typing perspective even if the actual data forbid it. I've rewritten ARTICLE_BODY_TAGS
with an explicit TypedDict type to solve this.
@@ -58,7 +58,7 @@ def _get_favicon(self, doc: lxml.html.Element) -> str: | |||
) | |||
if meta: | |||
favicon = parsers.get_attribute(meta[0], "href") | |||
return favicon | |||
return favicon if favicon is not None else "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this works too:
return favicon or ""
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does. I tend towards more verbose options and not liking relying on truthy values v.s. explicit tests, but I've changed it to the form you suggested.
@@ -119,8 +119,8 @@ def is_binary_url(url: str) -> bool: | |||
chars = len( | |||
[ | |||
char | |||
for char in content | |||
if 31 < ord(char) < 128 or ord(char) in [9, 10, 13] | |||
for char in [ord(c) if isinstance(c, str) else c for c in content] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the precaution? we ensured at line 105 that content is not bytes
In my opinion it makes the code less readable like this. Am i missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, that's not quite true. The except UnicodeDecodeError: pass
code means it could still be bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, i see, the mistake is that decode with errors=replace should never raise an error.
I will fix that in the merge. I think then it's safe to leave the ord(c) without any checking
Thanks for pointing that out 👍
(no need to cange anything, i am in the merging process)
25ab806
into
AndyTheFactory:work-0.9.3
Proposed Changes:
Adds basic mypy support. Note, this doesn't include things like
disallow_untyped_defs
so there's a fair chunk of the code not typed yet, but this gets a first-pass done.How did you test it?
Manuallly running
mypy
Notes for the reviewer
This currently assumes that #612 has been resolved removing < 3.8 as that's harder to support. This is doable without it, but much easier, so figured I'd send through this version first.
Checklist
[] I have updated the related issue with new insights and changes
[] I added unit tests and updated the docstrings
[x] I've used one of the conventional commit types for my PR title:
fix:
,feat:
,build:
,chore:
,ci:
,docs:
,style:
,refactor:
,perf:
,test:
.[] I documented my code
[x] I ran pre-commit hooks and fixed any issue