[Relation] Add MaterializedRelation #11835
…'t inherit the segments, only the allocator
…he type of the property is a pointer
…own into BoundColumnRef as well
…ataCollection and Copy is called, we perform a copy of the column data collection
As always, impressive work. One note for posterity: after a round trip through serialization and deserialization, what started as one object somewhere in memory may end up as several copies, with potentially degenerate cases such as a circular linked list expanding to infinite data. The responsibility is on the users of optionally_owned_ptr to guarantee that their data structures are acyclic.
Thanks for the PR! Looks great - some minor comments.
- Maybe we can also do some tests where we create a materialized relation and then query it.
- Perhaps also something like, we destroy the connection it came from and then query it in a different one.
- Maybe also some performance tests, e.g. how fast is the query of a materialized query result?
I'll add those 👍
MaterializedRelations are currently not standalone. I think this requires rethinking the structure of Relation a bit; the change will not be trivial and should probably be a separate PR.
Good idea, I'll add it to the regression_test_python.py script.
The benchmarks seem to use too much memory for the CI, could you perhaps reduce the load there?
Thanks!
Merge pull request duckdb/duckdb#11835 from Tishj/materialized_relation
Merge pull request duckdb/duckdb#11913 from hawkfish/icu-basictz
This PR adds a new Relation type.
MaterializedRelation
This Relation represents a materialized data set, likely produced by a MaterializedQueryResult.
It contains a ColumnDataCollection and allows us to efficiently scan this with the existing Logical + PhysicalColumnDataScan.
ColumnDataRef
To do this, the PR introduces the ColumnDataRef and BoundColumnDataRef classes.
We don't want to make unnecessary copies of this (potentially giant) column data collection, so the ColumnDataRef only gets a reference to the collection.
Because TableRefs need to be Serializable, though, we can't rely on this reference after deserialization.
If the ColumnDataRef is serialized to disk, it writes the entire column data collection to disk as well.
When the ColumnDataRef is deserialized, it creates a copy of the collection that it now owns.
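The borrow-then-own behaviour described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not DuckDB's implementation: `ToyCollection`, `MaybeOwned`, and `ToyRef` are hypothetical stand-ins for the column data collection, the ownership wrapper, and ColumnDataRef, and the string-based serialization is a placeholder for the real serializer.

```cpp
#include <cassert>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a column data collection (not DuckDB's type).
struct ToyCollection {
    std::vector<int> values;
};

// Hypothetical wrapper that either borrows an object or owns it.
template <class T>
class MaybeOwned {
public:
    // Borrow: the caller is responsible for keeping `borrowed` alive.
    explicit MaybeOwned(T &borrowed) : object(&borrowed) {}

    // Own: take over the object; `object` then refers to `owned_object`.
    explicit MaybeOwned(std::unique_ptr<T> owned)
        : owned_object(std::move(owned)), object(owned_object.get()) {
        assert(object != nullptr);
    }

    bool is_owned() const { return owned_object != nullptr; }
    T &get() const { return *object; }

private:
    std::unique_ptr<T> owned_object; // set only when owning
    T *object;                       // equals owned_object.get() when owning
};

// Hypothetical stand-in for ColumnDataRef.
struct ToyRef {
    MaybeOwned<ToyCollection> collection;

    // Serializing writes out the entire collection, not the reference.
    std::string Serialize() const {
        std::ostringstream out;
        for (int v : collection.get().values) {
            out << v << ' ';
        }
        return out.str();
    }

    // Deserializing builds a fresh copy that the new ref owns.
    static ToyRef Deserialize(const std::string &data) {
        auto copy = std::make_unique<ToyCollection>();
        std::istringstream in(data);
        for (int v; in >> v;) {
            copy->values.push_back(v);
        }
        return ToyRef{MaybeOwned<ToyCollection>(std::move(copy))};
    }
};
```

In this sketch, a `ToyRef` built from an existing `ToyCollection` only borrows it, while the result of `ToyRef::Deserialize(ref.Serialize())` holds its own independent copy and reports `is_owned()` as true.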
As a result, the ColumnDataRef has an optionally_owned_ptr, and this trickles down into BoundColumnDataRef, LogicalColumnDataScan and PhysicalColumnDataScan (which already had this quality).
optionally_owned_ptr
To encapsulate the scenario where a class holds an owned_object alongside an object pointer that may refer to it, we create a new class optionally_owned_ptr, which makes sure that these constraints are respected:
- If owned_object is set, object must refer to it
- is_owned() can be used to check whether the pointer is owning
Because the class is neither an optional_ptr nor a unique_ptr, it is intentionally not convertible to either.
Usage
In the Python API, the internal execute method (shared by query/sql and execute) previously created a ValueRelation if the statement was not a SELECT. This is now replaced by a MaterializedRelation.
Benchmark results:
These are timings measuring fetchall() from relations created with either CALL or SELECT statements.
The CALL statements utilize the MaterializedRelation.
Future work
The plan is to extend the DuckDBPyRelation to store a MaterializedRelation internally when DuckDBPyRelation.execute is called. Any future reference to the relation after that (by a replacement scan or by incrementally building on top of it) will make use of the MaterializedRelation.