Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix up some mypy errors #76

Merged
merged 6 commits into from
Apr 3, 2024

Conversation

jacobtomlinson
Copy link
Collaborator

@jacobtomlinson jacobtomlinson commented Mar 8, 2024

There were some mypy issues in #71 that were unrelated. Following up here to fix things up. Closes #74

There seems to be two issues when running pre-commit run --all-files:

  • MAX_SUPPORTED_PYARROW_WRITER_VERSION was renamed in deltalake>=0.16. Updated the name and bumped the minimum version.
  • Passing **kwargs to tokenize makes mypy unhappy. Explicitly ignoring that.

@mrocklin
Copy link
Contributor

mrocklin commented Mar 8, 2024

Thanks for handling this @jacobtomlinson . +1 to merge from my perspective.

However, I'm also curious about the failing tests. If you have even more appetite to fix things and want to fix that that would be welcome.

Leaving when to merge up to you.

@jacobtomlinson
Copy link
Collaborator Author

Yeah I'm wondering if upgrading to 0.16 has caused some other issues. Let me dig a little.

@jacobtomlinson
Copy link
Collaborator Author

jacobtomlinson commented Mar 8, 2024

Ok it looks like deltalake has changed how it reads a few dtypes from the all_primitive_types dataset.

binary                 string[pyarrow]  # detlalake
binary                          object  # dask

timestamp          datetime64[ns, UTC]  # detlalake
timestamp               datetime64[ns]  # dask

It looks like the failing test is already smoothing out some dtype issues and is assuming deltalake is correct so I've updated the test to do the same here.

However I still seem to be running into a problem with the test failing complaining the dtypes are not correct and I'm a little stumped.

@mrocklin
Copy link
Contributor

mrocklin commented Mar 8, 2024

cc @phofl who might have thoughts on dtypes returned here (I suspect that deltalake is using arrow)

@jacobtomlinson
Copy link
Collaborator Author

cc @jrbourbeau who also opened #74 about this issue.

@phofl
Copy link
Collaborator

phofl commented Mar 8, 2024

The string thing seems fine, the utc change is a little bit odd

@jacobtomlinson
Copy link
Collaborator Author

@jrbourbeau @phofl do either of you have thoughts on how I can move this PR forward? I'm a little stuck as to why the assertion is failing.

@phofl
Copy link
Collaborator

phofl commented Mar 13, 2024

Do you have the dtypes of both objects?

@jacobtomlinson
Copy link
Collaborator Author

Yeah but when I print them out that appear identical. So something deeper must be happening.

@phofl
Copy link
Collaborator

phofl commented Mar 13, 2024

Can you post them here?

@codecov-commenter
Copy link

codecov-commenter commented Mar 19, 2024

Codecov Report

Attention: Patch coverage is 25.00000% with 3 lines in your changes are missing coverage. Please review.

Project coverage is 70.90%. Comparing base (ec1c90c) to head (696cfd3).

Files Patch % Lines
dask_deltatable/write.py 33.33% 2 Missing ⚠️
dask_deltatable/core.py 0.00% 1 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #76      +/-   ##
==========================================
- Coverage   71.62%   70.90%   -0.73%     
==========================================
  Files           6        6              
  Lines         356      354       -2     
==========================================
- Hits          255      251       -4     
- Misses        101      103       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator Author

@jacobtomlinson jacobtomlinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for nudging this over the line @jrbourbeau!

Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jacobtomlinson. Let's leave this up for another day in case others have feedback, otherwise I think it's good to go

tests/test_acceptance.py Show resolved Hide resolved
Comment on lines 122 to 125
fs=fs,
columns=columns,
schema=schema,
**kwargs,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from_map in dask-expr doesn't support token=, but in this case I think we can have from_map handle tokenization internally (someone should feel free to correct me if that's not the case)

@milesgranger
Copy link
Collaborator

I came to give my review, noticed some conflicts and thought I'd make a PR to yours (jacobtomlinson#1) in order to help resolve the conflicts caused by #78, then sorta noticed that did most of what was done here. My apologies for stepping on your toes here.

Copy link
Collaborator

@milesgranger milesgranger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @jacobtomlinson!

@jacobtomlinson
Copy link
Collaborator Author

Thanks for the reviews and iterations on this everyone. I'm going to merge this in now.

@jacobtomlinson jacobtomlinson merged commit 78718f1 into dask-contrib:main Apr 3, 2024
10 checks passed
@jacobtomlinson jacobtomlinson deleted the linting-fixes branch April 3, 2024 11:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ImportError with deltalake=0.16.0
6 participants