sql, kv: support non-enum user-defined types with COL_BATCH_RESPONSE scan format #92954
Comments
I don't know exactly what the deal is with vectors of these things but you can treat them as bytes and re-type them to the logical type later?
Yeah, that sounds possible.
The record types are going to be worse when they come along. We'll want to break this down into enums and record types when the time comes.
If in the year 2030 I couldn't get push-down for record types stored in tables and indexes, I don't think I'd be sad. If I couldn't get push-down for enums though, I think I'd be quite sad.
I thought a little bit more about this, and I think we can support enums in 23.1. This will require adding the "native" support of enums in the vectorized engine where the physical representation is stored in the
Alright, I think we can get enums this release.
93400: coldata: add native support of enums r=yuzefovich a=yuzefovich

This commit adds the native support of enum types to the vectorized engine. We store them via their physical representation, so we can easily reuse `Bytes` vector for almost all operations, and, thus, we just mark the enum family as having the bytes family as its canonical representation. There are only a handful of places where we need to go from the physical representation to either the logical one or to the `DEnum`:
- when constructing the pgwire message to the client (in both text and binary format the logical representation is used)
- when converting from columnar to row-by-row format (fully-fledged `DEnum` is constructed)
- casts.

In all of these places we already have access to the precise typing information (similar to what we have for UUIDs which are supported via the bytes canonical type family already).

I can really see only one downside to such implementation - in some places the resolution based on the canonical (rather than actual) type family might be too coarse. For example, we have `<bytes> || <bytes>` binary operator (`concat`). As it currently stands the execution will proceed to perform the concatenation between two UUIDs or between a BYTES value and a UUID, and now we'll be adding enums into the mix. However, the type checking is performed earlier on the query execution path, so I think it is acceptable since the execution should never reach such a setup.

An additional benefit of this work is that we'll be able to support the KV projection pushdown in presence of enums - on the KV server side we'll just operate with the physical representations and won't need to have access to the hydrated type whereas on the client side we'll have the hydrated type, so we'll be able to do all operations.

Addresses: #42043.
Informs: #92954.
Epic: CRDB-14837
Release note: None
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
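To make the idea in the commit message concrete, here is a minimal Go sketch of operating on enum values purely through their physical (bytes) representation and converting to the logical representation only at the boundary. This is not CockroachDB code: `enumType`, `bytesVec`, `filterEq`, and `render` are hypothetical names invented for illustration, and the byte values are made up.

```go
package main

import (
	"bytes"
	"fmt"
)

// enumType is a hypothetical stand-in for a hydrated enum type. It pairs
// each physical representation (a short byte string) with its logical label.
type enumType struct {
	physical [][]byte
	logical  []string
}

// logicalOf resolves a physical representation to its logical label.
// Only callers that hold the hydrated type can do this.
func (t enumType) logicalOf(phys []byte) (string, error) {
	for i, p := range t.physical {
		if bytes.Equal(p, phys) {
			return t.logical[i], nil
		}
	}
	return "", fmt.Errorf("unknown physical representation %x", phys)
}

// bytesVec is a simplified stand-in for a bytes column vector.
type bytesVec [][]byte

// filterEq returns the indices of rows equal to target. It compares raw
// bytes only, so it needs no knowledge of the enum type (the situation
// described for the KV server side).
func filterEq(v bytesVec, target []byte) []int {
	var sel []int
	for i, b := range v {
		if bytes.Equal(b, target) {
			sel = append(sel, i)
		}
	}
	return sel
}

// render converts the selected rows to logical labels. This is the
// boundary step that requires the hydrated type (the client side).
func render(t enumType, v bytesVec, sel []int) ([]string, error) {
	out := make([]string, 0, len(sel))
	for _, i := range sel {
		label, err := t.logicalOf(v[i])
		if err != nil {
			return nil, err
		}
		out = append(out, label)
	}
	return out, nil
}
```

In this sketch, `filterEq` plays the role of an operator that works on the canonical bytes family, while `render` stands in for the few places (pgwire output, columnar-to-row conversion, casts) where the precise type is needed.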
When implementing the projection pushdown into the KV (#82323), at least initially, we won't support user-defined types other than enums. The difficulty is that these types require hydration, but it seems non-trivial to inject that on the KV server side.
Jira issue: CRDB-22065