Basic mypy support #613

palfrey · 2024-02-18T21:14:27Z

Proposed Changes:

Adds basic mypy support. Note that this doesn't enable stricter options like disallow_untyped_defs, so a fair chunk of the code is not typed yet, but this gets a first pass done.

How did you test it?

Manually running mypy

Notes for the reviewer

This currently assumes that #612 has been resolved, removing support for Python < 3.8, as that is harder to support. This is doable without it, but much easier with it, so I figured I'd send through this version first.

Checklist

[] I have updated the related issue with new insights and changes
[] I added unit tests and updated the docstrings
[x] I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.
[] I documented my code
[x] I ran pre-commit hooks and fixed any issue

* Documentation changes v0.9.2 (AndyTheFactory#604) (AndyTheFactory#605) * feat(doc): 📝 adding evaluation results * feat(doc): 🚀 Documentation Update. Added Examples, documented new features

AndyTheFactory

Thanks for the great contribution.

I had some unfinished work on the parsing optimization part and had to wait to merge this pull request until I was finished.

I will merge this in the work-0.9.3 branch, since it's the current development branch.
I think I will start merging tomorrow.

Additionally, I had some questions that I added as comments, and I would value your feedback.

AndyTheFactory · 2024-02-25T22:11:39Z

newspaper/cli.py

@@ -221,10 +221,12 @@ def csv_string(article_dict: dict) -> str:

        output = article.to_json(as_string=args.output_format == "json")
        if args.output_format == "json":
+            assert isinstance(output, str)


Do you think a test case for article.to_json would be better than an assert here?
My thoughts: this would only trigger if to_json is buggy. For the end user, getting an assertion error here would not mean much.

I've added some overload bits (see https://stackoverflow.com/a/76302414/320546 for explanation) to to_json to make it have a more explicit link between the params and output, and so have removed these.
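For readers unfamiliar with the technique, here is a minimal sketch of how `typing.overload` with `Literal` can tie the `as_string` argument to the return type. The `Article` class, its single `title` field, and the default value of `as_string` are assumptions made for illustration; this is not the actual newspaper code.

```python
import json
from typing import Dict, Literal, Union, overload


class Article:
    """Hypothetical stand-in for the real article class."""

    def __init__(self, title: str) -> None:
        self.title = title

    # Called with no argument or a literal True: the checker infers str.
    @overload
    def to_json(self, as_string: Literal[True] = ...) -> str: ...

    # Called with a literal False: the checker infers a dict.
    @overload
    def to_json(self, as_string: Literal[False]) -> Dict[str, str]: ...

    # Called with a bool only known at runtime: the result stays a union.
    @overload
    def to_json(self, as_string: bool) -> Union[str, Dict[str, str]]: ...

    def to_json(self, as_string: bool = True) -> Union[str, Dict[str, str]]:
        data = {"title": self.title}
        return json.dumps(data) if as_string else data


article = Article("Hello")
as_text = article.to_json()                 # inferred as str
as_dict = article.to_json(as_string=False)  # inferred as Dict[str, str]
```

Note that in this sketch a non-literal argument such as `args.output_format == "json"` would still match the broad `bool` overload, so the call site only benefits when the checker can see a literal value.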

AndyTheFactory · 2024-02-25T22:18:08Z

newspaper/extractors/articlebody_extractor.py

@@ -229,6 +229,7 @@ def is_tag_match(node, tag_dict):
        scores = []
        for tag in defines.ARTICLE_BODY_TAGS:
            if is_tag_match(node, tag):
+                assert isinstance(tag["score_boost"], int)


Not needed in my opinion, since tag comes from the constant ARTICLE_BODY_TAGS, which is statically defined, so score_boost should never be anything other than an int.

ARTICLE_BODY_TAGS was defined as a dict with str|int values, which meant score_boost could be a str from the typing perspective even if the actual data forbids it. I've rewritten ARTICLE_BODY_TAGS with an explicit TypedDict type to solve this.
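A rough sketch of what an explicit TypedDict buys here. The name `ArticleBodyTag` comes from the commit title in this PR; the field names other than `score_boost`, the helper base class, the `total_boost` function, and the sample entries are assumptions for illustration only.

```python
from typing import List, TypedDict


class _ArticleBodyTagRequired(TypedDict):
    # Every entry must carry a score boost, and it is always an int.
    score_boost: int


class ArticleBodyTag(_ArticleBodyTagRequired, total=False):
    # Optional matching attributes (hypothetical field names).
    tag: str
    role: str
    itemprop: str


# Illustrative values only, not the real constant.
ARTICLE_BODY_TAGS: List[ArticleBodyTag] = [
    {"tag": "article", "role": "article", "score_boost": 25},
    {"itemprop": "articleBody", "score_boost": 100},
]


def total_boost(tags: List[ArticleBodyTag]) -> int:
    # tag["score_boost"] is typed as int, so no isinstance assert is needed.
    return sum(tag["score_boost"] for tag in tags)
```

Without the annotation, a type checker would typically infer the values as a str/int union, which is what forced the assert in the original diff.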

AndyTheFactory · 2024-02-25T22:21:02Z

newspaper/extractors/image_extractor.py

@@ -58,7 +58,7 @@ def _get_favicon(self, doc: lxml.html.Element) -> str:
        )
        if meta:
            favicon = parsers.get_attribute(meta[0], "href")
-            return favicon
+            return favicon if favicon is not None else ""


I think this works too:
return favicon or ""

It does. I tend to prefer more verbose options and explicit tests over relying on truthy values, but I've changed it to the form you suggested.
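As a quick illustration of why both forms are interchangeable for an `Optional[str]`, here is a small sketch. The function names are hypothetical and are not taken from the codebase.

```python
from typing import Optional


def favicon_explicit(favicon: Optional[str]) -> str:
    # Explicit None test, as in the original diff.
    return favicon if favicon is not None else ""


def favicon_truthy(favicon: Optional[str]) -> str:
    # Relies on truthiness: both None and "" fall through to "".
    return favicon or ""


# The only falsy str is "", which maps to "" either way.
for value in (None, "", "/favicon.ico"):
    assert favicon_explicit(value) == favicon_truthy(value)
```

The two forms would differ for types with other falsy values (for example an `Optional[int]` holding 0 with a non-zero fallback), which is the usual argument for the explicit test.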

AndyTheFactory · 2024-02-25T22:28:05Z

newspaper/network.py

@@ -119,8 +119,8 @@ def is_binary_url(url: str) -> bool:
        chars = len(
            [
                char
-                for char in content
-                if 31 < ord(char) < 128 or ord(char) in [9, 10, 13]
+                for char in [ord(c) if isinstance(c, str) else c for c in content]


Why the precaution? We ensured at line 105 that content is not bytes.
In my opinion it makes the code less readable like this. Am I missing something?

So, that's not quite true. The except UnicodeDecodeError: pass code means it could still be bytes.

Yep, I see. The mistake is that decode with errors=replace should never raise an error, so the except branch is unnecessary.
I will fix that in the merge. I think it's then safe to leave the ord(c) without any checking.
Thanks for pointing that out 👍
(No need to change anything, I am in the merging process.)
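To make the point above concrete, here is a small sketch showing that `bytes.decode(..., errors="replace")` does not raise on invalid input, and why un-decoded bytes would break a plain `ord(char)`. The `printable_count` helper is hypothetical and only mirrors the comprehension from the diff; it is not the actual `is_binary_url` code.

```python
def printable_count(content) -> int:
    """Count printable ASCII characters plus tab, LF and CR."""
    if isinstance(content, bytes):
        # Invalid sequences become U+FFFD instead of raising UnicodeDecodeError.
        content = content.decode("utf-8", errors="replace")
    return sum(
        1
        for char in content
        if 31 < ord(char) < 128 or ord(char) in (9, 10, 13)
    )


# Invalid UTF-8 is replaced, not rejected.
decoded = b"ab\xff\n".decode("utf-8", errors="replace")
assert decoded == "ab\ufffd\n"

# Iterating bytes yields ints, so ord() on each item would raise TypeError.
assert list(b"ab") == [97, 98]
```

Once the content is guaranteed to be a str, the `isinstance(c, str)` guard inside the comprehension becomes redundant.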

AndyTheFactory and others added 2 commits January 17, 2024 08:32

Documentation changes v0.9.2 (AndyTheFactory#610)

9d99beb

* Documentation changes v0.9.2 (AndyTheFactory#604) (AndyTheFactory#605) * feat(doc): 📝 adding evaluation results * feat(doc): 🚀 Documentation Update. Added Examples, documented new features

feat: Basic mypy support

1df0963

AndyTheFactory reviewed Feb 25, 2024


palfrey added 3 commits February 25, 2024 23:19

Explicitly define ArticleBodyTag to remove score_boost check

dd551a1

Simplify favicon return

db1fd97

Add overload to to_json to specify types

6cd433a

AndyTheFactory changed the base branch from master to work-0.9.3 February 26, 2024 21:53

AndyTheFactory self-assigned this Feb 26, 2024

AndyTheFactory added this to the Release 0.9.3 milestone Feb 26, 2024

Merge branch 'work-0.9.3' into typing

f900ead

AndyTheFactory merged commit 25ab806 into AndyTheFactory:work-0.9.3 Feb 26, 2024
5 of 10 checks passed

palfrey deleted the typing branch April 14, 2024 17:03
