-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use dd.from_map
#24
Use dd.from_map
#24
Conversation
meta = make_meta(self.schema.empty_table().to_pandas()) | ||
if self.columns: | ||
meta = meta[self.columns] | ||
self.meta = meta |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is purely cosmetic. Since self._make_meta_from_schema
was only used in one place I just moved it into __init__
as an attribute
Codecov Report
❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more. @@ Coverage Diff @@
## main #24 +/- ##
==========================================
- Coverage 93.04% 92.85% -0.19%
==========================================
Files 2 2
Lines 115 112 -3
==========================================
- Hits 107 104 -3
Misses 8 8
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. Maybe @rjzamora has interest in reviewing here too?
dask_deltatable/core.py
Outdated
# Create Blockwise layer | ||
label = "read-delta-table-" | ||
output_name = f"{label}{tokenize(self.fs_token, **kwargs)}" | ||
layer = DataFrameIOLayer( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my general suggestion would be to use from_map
instead of interacting directly with DataFrameIOLayer
like this, but I may be missing something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rjzamora. Looks like internally from_map
is a wrapper for DataFrameIOLayer
. The intent of this PR is to make sure we get nice blockwise optimization here. I've updated to use from_map
. Let me know if that looks okay to you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from_map is a wrapper for DataFrameIOLayer
Yes exactly. All public IO functions (e.g. read_parquet
and read_csv
) use from_map
for this purpose. from_map
can be thought of as DataFrameIOLayer
's "public" API
Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>
xref #25 for the Python 3.7 CI build failures |
@rjzamora how does this look to you? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good
Note that it wouldn't be difficult to add column projection in a follow-up (since we are using from_map
).
Thanks for reviewing @rjzamora |
This gives us more efficient graph optimizations when reading in data
Closes #23