[PERF] Adaptive Query Execution #2176
Conversation
Force-pushed from 4fc57ee to 0628788
Codecov Report: All modified and coverable lines are covered by tests ✅

@@ Coverage Diff @@
##             main    #2176      +/-   ##
==========================================
- Coverage   85.64%   85.08%   -0.56%
==========================================
  Files          71       68       -3
  Lines        7661     7367     -294
==========================================
- Hits         6561     6268     -293
+ Misses       1100     1099       -1
LGTM! 🚀 🚀 🚀
@@ -302,6 +302,9 @@ def num_partitions(self) -> int | None:
     def size_bytes(self) -> int | None:
         return self.value.size_bytes() if self.value is not None else None

+    def num_rows(self) -> int | None:
+        return len(self.value) if self.value is not None else None
We have an assertion in the AdaptivePhysicalPlanScheduler that cache_entry.num_rows() is not None; is there any case in which that won't be true (i.e. will cache_entry.value ever be None at that point)?
I'm not sure, I was just following the convention above
discussed offline!
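To make the convention under discussion concrete, here is a minimal, hypothetical sketch of a cache entry whose accessors all return None once the underlying value has been dropped; the class name and the fixed 8-byte row width are illustrative assumptions, not Daft's real implementation.

```python
from typing import Optional


class LocalPartitionCacheEntry:
    """Hypothetical sketch: every accessor returns None when value is None."""

    def __init__(self, value: Optional[list]) -> None:
        # value is the materialized partition, or None if it was released
        self.value = value

    def size_bytes(self) -> Optional[int]:
        # assumption for illustration: each row costs a fixed 8 bytes
        return len(self.value) * 8 if self.value is not None else None

    def num_rows(self) -> Optional[int]:
        # mirrors the diff above: None propagates instead of raising
        return len(self.value) if self.value is not None else None
```

Under this convention, an assertion like the one in the scheduler only holds if the entry's value is guaranteed to still be materialized at that point.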
adaptive_planner = builder.to_adaptive_physical_plan_scheduler(daft_execution_config)
while not adaptive_planner.is_done():
    source_id, plan_scheduler = adaptive_planner.next()
    # don't store partition sets in variable to avoid reference
Nice, good call.
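The shape of the adaptive loop quoted above can be sketched as follows. This is a toy stand-in, not Daft's API: Scheduler, run_plan, and update are hypothetical names used only to show how observed statistics feed back into re-planning, and why the review comment avoids binding the materialized partitions to a long-lived variable.

```python
class Scheduler:
    """Toy adaptive scheduler: hands out plan stages, collects statistics."""

    def __init__(self, stages):
        self.stages = list(stages)   # list of (source_id, plan) pairs
        self.results = []

    def is_done(self):
        return not self.stages

    def next(self):
        return self.stages.pop(0)

    def update(self, source_id, num_rows):
        # feed observed row counts back so later stages could be re-planned
        self.results.append((source_id, num_rows))


def run_plan(plan):
    # stand-in for executing a physical plan stage and materializing rows
    return [row for row in plan]


def execute_adaptively(scheduler):
    while not scheduler.is_done():
        source_id, plan = scheduler.next()
        # per the review: pass the materialized result straight through
        # rather than binding it to a variable, so its reference is dropped
        scheduler.update(source_id, len(run_plan(plan)))
    return scheduler.results
```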
metadatas = [PartitionMetadata.from_table(p) for p in partitions]
assert len(partial_metadatas) == len(partitions), f"{len(partial_metadatas)} vs {len(partitions)}"

metadatas = [PartitionMetadata.from_table(p).merge_with_partial(m) for p, m in zip(partitions, partial_metadatas)]
Nice! I was just doing something similar in the executor branch. 😄
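The merge_with_partial pattern from the diff can be illustrated with a small, hypothetical sketch: freshly computed metadata takes precedence, and any field it lacks falls back to the partial estimate carried through the plan. The field names and dataclass here are assumptions for illustration, not Daft's real PartitionMetadata.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PartitionMetadata:
    # illustrative fields; None means "not known yet"
    num_rows: Optional[int] = None
    size_bytes: Optional[int] = None

    def merge_with_partial(self, partial: "PartitionMetadata") -> "PartitionMetadata":
        # prefer concrete values, fall back to the partial estimate
        return PartitionMetadata(
            num_rows=self.num_rows if self.num_rows is not None else partial.num_rows,
            size_bytes=self.size_bytes if self.size_bytes is not None else partial.size_bytes,
        )


partitions = [[1, 2, 3], [4]]
partials = [PartitionMetadata(size_bytes=24), PartitionMetadata(size_bytes=8)]
metadatas = [
    PartitionMetadata(num_rows=len(p)).merge_with_partial(m)
    for p, m in zip(partitions, partials)
]
```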
}
Self::Project(Project { input, .. })
| Self::MonotonicallyIncreasingId(MonotonicallyIncreasingId { input, .. }) => {
    // TODO(sammy), we need the schema to estimate the new size per row
Ah, we can always tweak the logical->physical translation to add the schema to the physical Project and MonotonicallyIncreasingId structs for this case! Obviously not a blocking issue, though.
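To show why the TODO wants the output schema, here is a hypothetical sketch of schema-based row-size estimation: once a Project's output schema is known, the new size per row can be estimated from per-type byte widths instead of being carried over from the wider input. The width table and schema representation are illustrative assumptions.

```python
# illustrative per-type byte widths; unknown types get a conservative guess
TYPE_WIDTHS = {"int64": 8, "float64": 8, "bool": 1}
DEFAULT_WIDTH = 16


def estimated_row_size_bytes(schema: dict) -> int:
    """Estimate bytes per row from a {column_name: type_name} schema."""
    return sum(TYPE_WIDTHS.get(t, DEFAULT_WIDTH) for t in schema.values())


input_schema = {"a": "int64", "b": "float64", "c": "utf8"}
projected_schema = {"a": "int64"}  # a Project that keeps only column "a"
```

With the schema attached, the projected estimate (8 bytes/row here) can be much tighter than the input's (32 bytes/row), which matters for the size-based re-planning decisions AQE makes.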
Fixed build_partitions where it didn't forward partial metadata.

Enable with DAFT_ENABLE_AQE=1