Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Basic mypy support #613

Merged
merged 6 commits into from Feb 26, 2024
Merged

Conversation

palfrey
Copy link
Contributor

@palfrey palfrey commented Feb 18, 2024

Proposed Changes:

Adds basic mypy support. Note, this doesn't include things like disallow_untyped_defs so there's a fair chunk of the code not typed yet, but this gets a first-pass done.

How did you test it?

Manuallly running mypy

Notes for the reviewer

This currently assumes that #612 has been resolved removing < 3.8 as that's harder to support. This is doable without it, but much easier, so figured I'd send through this version first.

Checklist

[] I have updated the related issue with new insights and changes
[] I added unit tests and updated the docstrings
[x] I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.
[] I documented my code
[x] I ran pre-commit hooks and fixed any issue

AndyTheFactory and others added 2 commits January 17, 2024 08:32
* Documentation changes v0.9.2 (AndyTheFactory#604) (AndyTheFactory#605)

* feat(doc): 📝 adding evaluation results

* feat(doc): 🚀 Documentation Update. Added Examples, documented new features
Copy link
Owner

@AndyTheFactory AndyTheFactory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the great contribution.

I was having some unfinished work on the parsing optimization part and had to wait to merge this pull requests until i was finished.

I will merge this in the work-0.9.3 branch, since it's the current development branch.
I think will start merging tomorrow.

Additionally, I had some questions that i added as comments, would value your feedback.

newspaper/cli.py Outdated
@@ -221,10 +221,12 @@ def csv_string(article_dict: dict) -> str:

output = article.to_json(as_string=args.output_format == "json")
if args.output_format == "json":
assert isinstance(output, str)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think that maybe a test case for article.to_json would be better than an assert here?
My thoughts: only if to_json is buggy, this would trigger. For the enduser it would not mean a lot to get an assertion error here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added some overload bits (see https://stackoverflow.com/a/76302414/320546 for explanation) to to_json to make it have a more explicit link between the params and output, and so have removed these.

@@ -229,6 +229,7 @@ def is_tag_match(node, tag_dict):
scores = []
for tag in defines.ARTICLE_BODY_TAGS:
if is_tag_match(node, tag):
assert isinstance(tag["score_boost"], int)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not needed in my opinion, since tag is from the constant ARTICLE_BODY_TAGS, which is static defined, and should not be anything other than int

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ARTICLE_BODY_TAGS was defined as a dict with str|int keys, which meant it could be a str from the typing perspective even if the actual data forbid it. I've rewritten ARTICLE_BODY_TAGS with an explicit TypedDict type to solve this.

@@ -58,7 +58,7 @@ def _get_favicon(self, doc: lxml.html.Element) -> str:
)
if meta:
favicon = parsers.get_attribute(meta[0], "href")
return favicon
return favicon if favicon is not None else ""
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this works too:
return favicon or ""

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does. I tend towards more verbose options and not liking relying on truthy values v.s. explicit tests, but I've changed it to the form you suggested.

@@ -119,8 +119,8 @@ def is_binary_url(url: str) -> bool:
chars = len(
[
char
for char in content
if 31 < ord(char) < 128 or ord(char) in [9, 10, 13]
for char in [ord(c) if isinstance(c, str) else c for c in content]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the precaution? we ensured at line 105 that content is not bytes
In my opinion it makes the code less readable like this. Am i missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, that's not quite true. The except UnicodeDecodeError: pass code means it could still be bytes.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, i see, the mistake is that decode with errors=replace should never raise an error.
I will fix that in the merge. I think then it's safe to leave the ord(c) without any checking
Thanks for pointing that out 👍
(no need to cange anything, i am in the merging process)

@AndyTheFactory AndyTheFactory changed the base branch from master to work-0.9.3 February 26, 2024 21:53
@AndyTheFactory AndyTheFactory self-assigned this Feb 26, 2024
@AndyTheFactory AndyTheFactory added this to the Release 0.9.3 milestone Feb 26, 2024
@AndyTheFactory AndyTheFactory merged commit 25ab806 into AndyTheFactory:work-0.9.3 Feb 26, 2024
5 of 10 checks passed
@palfrey palfrey deleted the typing branch April 14, 2024 17:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants