Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Type hints for read_bytes #9440

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

douglasdavis
Copy link
Member

@douglasdavis douglasdavis commented Aug 29, 2022

  • Passes pre-commit run --all-files

@douglasdavis
Copy link
Member Author

CI failure unrelated

Copy link
Collaborator

@ian-r-rose ian-r-rose left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool

**kwargs,
):
urlpath: str | list[str],
delimiter: Any | None = None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to use bytes | None here, per the docstring? Doesn't seem to cause any problems for me locally

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense!

compression: str | None = None,
include_path: bool = False,
**kwargs: Any,
) -> tuple[Any, ...]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be covered with an @overload covering different values for include_path, but I leave it to you whether it;s worth your time :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah thanks for pointing that out, I think I got something that should work

@douglasdavis
Copy link
Member Author

@ian-r-rose thanks for taking a look!

compression: str | None = None,
include_path: Literal[False] = False,
**kwargs: Any,
) -> tuple[bytes, list[list[Delayed]], list[str]]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this overload not return a list[str]?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep 🤦‍♂️. I was surprised mypy didnt catch this. If I make the change to tuple[bytes, list[list[Delayed]]] I get

dask/bytes/core.py:16: error: Overloaded function signatures 1 and 2 overlap with incompatible return types  [misc]

from mypy. It looks like mypy doesn't see a difference between

include_path: Literal[False] = False

and

include_path: Literal[True] = True

because include_path is the only difference between the two overloads... still digging.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK looks like it had something to do with giving the overloads default arguments. I think it should be fixed 👍

@ian-r-rose
Copy link
Collaborator

Sorry for the delay in getting back to this @douglasdavis!

Unfortunately, I think you still need to give the overloads defaults in order for type inference to work. For example, on this branch the following fails to type check:

from dask.bytes import read_bytes

reveal_type(read_bytes("path/to/stuff", include_path=True))

Dealing with literals in default keyword-arguments has been a persistently annoying thing in mypy (see the long discussion in python/mypy#6580), but I do think it's doable using the special ellipsis symbol. It's not very well documented, but see here.

So I was able to get it working with this diff:

diff --git a/dask/bytes/core.py b/dask/bytes/core.py
index 0164d024d..8b783cba6 100644
--- a/dask/bytes/core.py
+++ b/dask/bytes/core.py
@@ -15,28 +15,28 @@ from dask.utils import is_integer, parse_bytes
 @overload
 def read_bytes(
     urlpath: str | list[str],
-    delimiter: bytes | None,
-    not_zero: bool,
-    blocksize: str | int | None,
-    sample: int | str | bool,
-    compression: str | None,
-    include_path: Literal[True],
+    delimiter: bytes | None = None,
+    not_zero: bool = False,
+    blocksize: str | int | None = None,
+    sample: int | str | bool = "10 kiB",
+    compression: str | None = None,
+    include_path: Literal[False] = False,
     **kwargs: Any,
-) -> tuple[bytes, list[list[Delayed]], list[str]]:
+) -> tuple[bytes, list[list[Delayed]]]:
     ...
 
 
 @overload
 def read_bytes(
     urlpath: str | list[str],
-    delimiter: bytes | None,
-    not_zero: bool,
-    blocksize: str | int | None,
-    sample: int | str | bool,
-    compression: str | None,
-    include_path: Literal[False],
+    delimiter: bytes | None = None,
+    not_zero: bool = False,
+    blocksize: str | int | None = None,
+    sample: int | str | bool = "10 kiB",
+    compression: str | None = None,
+    include_path: Literal[True] = ...,
     **kwargs: Any,
-) -> tuple[bytes, list[list[Delayed]]]:
+) -> tuple[bytes, list[list[Delayed]], list[str]]:
     ...
 

@douglasdavis
Copy link
Member Author

douglasdavis commented Sep 6, 2022

Thanks, @ian-r-rose! This does some seem like quite the tough case for mypy 😛 with your diff I'm seeing this error from mypy (version: mypy 0.971 (compiled: yes)) if I run mypy dask/bytes/core.py:

dask/bytes/core.py:16: error: Overloaded function signatures 1 and 2 overlap with incompatible return types  [misc]

but the example code snippet you wrote does indeed work. I'm wondering how to make the mypy check on the dask codebase work. At this point I'd be OK with having an Any return typehint if this is getting too in the weeds

@github-actions github-actions bot added the needs attention It's been a while since this was pushed on. Needs attention from the owner or a maintainer. label May 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs attention It's been a while since this was pushed on. Needs attention from the owner or a maintainer.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants