-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Type hints for read_bytes #9440
base: main
Are you sure you want to change the base?
Conversation
CI failure unrelated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool
dask/bytes/core.py
Outdated
**kwargs, | ||
): | ||
urlpath: str | list[str], | ||
delimiter: Any | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason not to use bytes | None
here, per the docstring? Doesn't seem to cause any problems for me locally
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense!
dask/bytes/core.py
Outdated
compression: str | None = None, | ||
include_path: bool = False, | ||
**kwargs: Any, | ||
) -> tuple[Any, ...]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be covered with an @overload
covering different values for include_path
, but I leave it to you whether it;s worth your time :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah thanks for pointing that out, I think I got something that should work
@ian-r-rose thanks for taking a look! |
dask/bytes/core.py
Outdated
compression: str | None = None, | ||
include_path: Literal[False] = False, | ||
**kwargs: Any, | ||
) -> tuple[bytes, list[list[Delayed]], list[str]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this overload
not return a list[str]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep 🤦♂️. I was surprised mypy didnt catch this. If I make the change to tuple[bytes, list[list[Delayed]]]
I get
dask/bytes/core.py:16: error: Overloaded function signatures 1 and 2 overlap with incompatible return types [misc]
from mypy. It looks like mypy doesn't see a difference between
include_path: Literal[False] = False
and
include_path: Literal[True] = True
because include_path
is the only difference between the two overloads... still digging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK looks like it had something to do with giving the overloads default arguments. I think it should be fixed 👍
Sorry for the delay in getting back to this @douglasdavis! Unfortunately, I think you still need to give the overloads defaults in order for type inference to work. For example, on this branch the following fails to type check: from dask.bytes import read_bytes
reveal_type(read_bytes("path/to/stuff", include_path=True)) Dealing with literals in default keyword-arguments has been a persistently annoying thing in So I was able to get it working with this diff: diff --git a/dask/bytes/core.py b/dask/bytes/core.py
index 0164d024d..8b783cba6 100644
--- a/dask/bytes/core.py
+++ b/dask/bytes/core.py
@@ -15,28 +15,28 @@ from dask.utils import is_integer, parse_bytes
@overload
def read_bytes(
urlpath: str | list[str],
- delimiter: bytes | None,
- not_zero: bool,
- blocksize: str | int | None,
- sample: int | str | bool,
- compression: str | None,
- include_path: Literal[True],
+ delimiter: bytes | None = None,
+ not_zero: bool = False,
+ blocksize: str | int | None = None,
+ sample: int | str | bool = "10 kiB",
+ compression: str | None = None,
+ include_path: Literal[False] = False,
**kwargs: Any,
-) -> tuple[bytes, list[list[Delayed]], list[str]]:
+) -> tuple[bytes, list[list[Delayed]]]:
...
@overload
def read_bytes(
urlpath: str | list[str],
- delimiter: bytes | None,
- not_zero: bool,
- blocksize: str | int | None,
- sample: int | str | bool,
- compression: str | None,
- include_path: Literal[False],
+ delimiter: bytes | None = None,
+ not_zero: bool = False,
+ blocksize: str | int | None = None,
+ sample: int | str | bool = "10 kiB",
+ compression: str | None = None,
+ include_path: Literal[True] = ...,
**kwargs: Any,
-) -> tuple[bytes, list[list[Delayed]]]:
+) -> tuple[bytes, list[list[Delayed]], list[str]]:
...
|
2f33e23
to
cc33528
Compare
Thanks, @ian-r-rose! This does some seem like quite the tough case for mypy 😛 with your diff I'm seeing this error from mypy (version:
but the example code snippet you wrote does indeed work. I'm wondering how to make the mypy check on the dask codebase work. At this point I'd be OK with having an |
pre-commit run --all-files