[FEAT] [Scan Operator] Integrate size_bytes with ScanOperators #1586
Codecov Report
Additional details and impacted files
@@ Coverage Diff @@
## main #1586 +/- ##
==========================================
- Coverage 85.29% 85.19% -0.11%
==========================================
Files 54 54
Lines 5161 5180 +19
==========================================
+ Hits 4402 4413 +11
- Misses 759 767 +8
} else {
- // if the table is not loaded and we dont have stats, just return 0.
+ // if the table is not loaded, we dont have stats, and we don't have the file size in bytes, just return 0.
Should we return None in that case as in... "I don't know"?
We can change this to return a DaftResult<Option<usize>>, which should be fine for resource request usage, but I'll have to double-check other usages to make sure that a None won't blow things up.
Turns out that we sum size_bytes() results in a few places in physical_plan.py, so we'd need to do this 0 fallback there anyways. How about we keep this 0 fallback for now?
Sounds good to me, or we could also do None to 0 conversion in the physical planner and add a loud comment there.
The case I’m worried about is if we don’t know the size of a file, and tell the planner that it is 0, but the file in fact turns out to be huge 😝
At least if the planner knows the file size is unknown it can maybe handle it a little differently in the future.
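A rough sketch of what that None to 0 conversion could look like on the planner side, assuming size lookups return `Optional[int]`. The helper name `total_size_bytes` is hypothetical and does not come from `physical_plan.py`; it only shows how the sum can keep working while still recording that some sizes were unknown.

```python
from typing import Iterable, Optional, Tuple


def total_size_bytes(sizes: Iterable[Optional[int]]) -> Tuple[int, bool]:
    """Sum known sizes and report whether any size was unknown.

    Hypothetical helper: returns (total, any_unknown).
    """
    total = 0
    any_unknown = False
    for size in sizes:
        if size is None:
            # LOUD: an unknown size is counted as 0 bytes here. The real
            # partition may be huge, so the total can be an underestimate.
            any_unknown = True
        else:
            total += size
    return total, any_unknown
```

For example, `total_size_bytes([1024, None, 2048])` yields a total of 3072 with the unknown flag set, so a future planner could treat that request more conservatively than a fully known 3072.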
…n partition size information is lacking.
92bc37e to 3408af6
This PR integrates size_bytes, which is used for eventual resource requests for the Ray runner, with the ScanOperator implementations (currently just GlobScanOperator) and MicroPartition.