Cannot read Delta Lake created with write_deltalake (Python bindings) #617

Closed
MrPowers opened this issue Jun 1, 2022 · 1 comment · Fixed by #618
Labels
bug Something isn't working

Comments

@MrPowers
Collaborator

MrPowers commented Jun 1, 2022

Environment

Delta-rs version: 0.5.7

Binding: Python

Environment:

  • Cloud provider: N/A (using localhost)
  • OS: MacOS

Bug

What happened: I created a Delta Lake with delta-rs using write_deltalake and was then unable to read it back with delta-rs.

What you expected to happen: I expected the read operation to work.

How to reproduce it:

import pandas as pd
from deltalake.writer import write_deltalake

# create Delta Lake
df = pd.DataFrame({"x": [1, 2, 3]})
write_deltalake("./tmp/delta-table", df)

# read Delta Lake
from deltalake import DeltaTable
dt = DeltaTable("./tmp/delta-table")
dt.to_pandas()

Here's the error message:

File ~/opt/miniconda3/envs/mr-delta-rs/lib/python3.9/site-packages/deltalake/fs.py:149, in DeltaStorageHandler.open_input_file(self, path)
    142 def open_input_file(self, path: str) -> pa.NativeFile:
    143     """
    144     Open an input file for random access reading.
    145 
    146     :param source: The source to open for reading.
    147     :return:  NativeFile
    148     """
--> 149     raw = self._storage.get_obj(path)
    150     return pa.BufferReader(pa.py_buffer(raw))

PyDeltaTableError: Object not found

More details: You can easily reproduce this by creating this conda environment and running this Jupyter notebook.

Other operations like dt.files() and dt.version() work as expected. I am able to read Delta Tables created by delta-io/delta without any issue.

@MrPowers MrPowers added the bug Something isn't working label Jun 1, 2022
@wjones127
Collaborator

Thanks for reporting this @MrPowers. Looks like we only tested with absolute paths so far, so didn't catch this yet.

But it's made me realize we are writing the paths in the delta log incorrectly; they should be relative to the table's root, but we've been writing absolute paths.
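
To check what a given table's log actually contains, the sketch below reads the JSON commit files under `_delta_log` and lists the `path` of each `add` action. The helper names `added_file_paths` and `is_relative_to_root` are made up for illustration and are not part of the deltalake API. The relative-path check is only a rough heuristic (no URI scheme, no leading slash), not a full validation of the Delta protocol's path rules.

```python
import json
from pathlib import Path
from urllib.parse import urlparse


def added_file_paths(table_root):
    """Return the `path` of every `add` action in the table's JSON commit files.

    Hypothetical helper: reads `_delta_log/*.json` directly and ignores
    checkpoint files.
    """
    log_dir = Path(table_root) / "_delta_log"
    paths = []
    for commit in sorted(log_dir.glob("*.json")):
        with commit.open() as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue
                # Each non-empty line of a commit file is one JSON action.
                action = json.loads(line)
                if "add" in action:
                    paths.append(action["add"]["path"])
    return paths


def is_relative_to_root(path):
    """Rough heuristic: treat a path as relative if it has no URI scheme
    and does not start with a slash."""
    return urlparse(path).scheme == "" and not path.startswith("/")


if __name__ == "__main__":
    # Assumes the table from the reproduction steps above exists.
    for p in added_file_paths("./tmp/delta-table"):
        label = "relative" if is_relative_to_root(p) else "NOT relative"
        print(f"{label}: {p}")
```

A path that still carries the table directory as a prefix (for example one beginning with `./tmp/`) would pass this heuristic even though it is not relative to the table root, so the printed paths are worth reading by eye as well.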
