-
Notifications
You must be signed in to change notification settings - Fork 406
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle pandas timestamps #958
Conversation
…hema to microsecond timestamps
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution! Have two requests on it.
python/deltalake/writer.py
Outdated
_data = pa.Table.from_pandas(data) | ||
_schema = _data.schema | ||
schema_out = [] | ||
for _field in _schema: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you be willing to pull this out into a helper function (_delta_arrow_schema_from_pandas
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really like this idea. Factored this out into a helper function, but I'd prefer it to be public rather than private. Would like to leverage this helper function elsewhere. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, but if you make it public, then I'd suggest putting it in schema.py
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Not sure why |
@hayesgb you might need to add timestamp to https://github.com/delta-io/delta-rs/blob/main/python/stubs/pandas/__init__.pyi in order to fix the mypy error |
Co-authored-by: Will Jones <willjones127@gmail.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CI test is being flaky. I tested locally tested and all tests pass.
Thank you @hayesgb!
Description
As described in #686 some pandas datatypes are not converted to a format that is compatible with delta lake. This handles the instance of timestamps, which are stored with
ns
resolution in Pandas. Here, if is a schema is not provided, we specify converting the timestamps tous
resolution.We also update
python/tests/test_writer.py::test_write_pandas
to reflect this change.Related Issue(s)
#685