sql, kv: support non-enum user-defined types with COL_BATCH_RESPONSE scan format #92954

yuzefovich · 2022-12-02T20:30:21Z

When implementing the projection pushdown into the KV (#82323), at least initially, we won't support user-defined types other than enums. The difficulty is that these types require hydration, but it seems non-trivial to inject that on the KV server side.

Jira issue: CRDB-22065

ajwerner · 2022-12-02T20:34:52Z

I don't know exactly what the deal is with vectors of these things but you can treat them as bytes and re-type them to the logical type later?

yuzefovich · 2022-12-02T20:38:52Z

Yeah, that sounds possible.

ajwerner · 2022-12-02T20:48:05Z

The record types are going to be worse when they come along. We'll want to break this down into enums and record types when the time comes.

ajwerner · 2022-12-02T20:49:45Z

If in the year 2030 I couldn't get push-down for record types stored in tables and indexes, I don't think I'd be sad. If I couldn't get push-down for enums though, I think I'd be quite sad.

yuzefovich · 2022-12-08T17:29:29Z

I thought a little bit more about this, and I think we can support enums in 23.1. This will require adding the "native" support of enums in the vectorized engine where the physical representation is stored in the coldata.Bytes under the hood and we produce DEnums from it on demand. I'll work on that support tomorrow.

yuzefovich · 2022-12-12T20:33:12Z

Alright, I think we can get enums this release.

93400: coldata: add native support of enums r=yuzefovich a=yuzefovich This commit adds the native support of enum types to the vectorized engine. We store them via their physical representation, so we can easily reuse `Bytes` vector for almost all operations, and, thus, we just mark the enum family as having the bytes family as its canonical representation. There are only a handful of places where we need to go from the physical representation to either the logical one or to the `DEnum`: - when constructing the pgwire message to the client (in both text and binary format the logical representation is used) - when converting from columnar to row-by-row format (fully-fledged `DEnum` is constructed) - casts. In all of these places we already have access to the precise typing information (similar to what we have for UUIDs which are supported via the bytes canonical type family already). I can really see only one downside to such implementation - in some places the resolution based on the canonical (rather than actual) type family might be too coarse. For example, we have `<bytes> || <bytes>` binary operator (`concat`). As it currently stands the execution will proceed to perform the concatenation between two UUIDs or between a BYTES value and a UUID, and now we'll be adding enums into the mix. However, the type checking is performed earlier on the query execution path, so I think it is acceptable since the execution should never reach such a setup. An additional benefit of this work is that we'll be able to support the KV projection pushdown in presence of enums - on the KV server side we'll just operate with the physical representations and won't need to have access to the hydrated type whereas on the client side we'll have the hydrated type, so we'll be able to do all operations. Addresses: #42043. Informs: #92954. Epic: CRDB-14837 Release note: None Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>

yuzefovich added the C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. label Dec 2, 2022

yuzefovich added this to Triage in SQL Queries via automation Dec 2, 2022

blathers-crl bot added the T-sql-queries SQL Queries Team label Dec 2, 2022

yuzefovich moved this from Triage to Backlog in SQL Queries Dec 2, 2022

yuzefovich mentioned this issue Dec 2, 2022

sql,kv,storage: push column batch generation into kvserver #82323

Open

10 tasks

yuzefovich changed the title ~~sql, kv: support user-defined types with COL_BATCH_RESPONSE scan format~~ sql, kv: support non-enum user-defined types with COL_BATCH_RESPONSE scan format Dec 12, 2022

yuzefovich mentioned this issue Dec 12, 2022

coldata: add native support of enums #93400

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql, kv: support non-enum user-defined types with COL_BATCH_RESPONSE scan format #92954

sql, kv: support non-enum user-defined types with COL_BATCH_RESPONSE scan format #92954

yuzefovich commented Dec 2, 2022 •

edited

ajwerner commented Dec 2, 2022

yuzefovich commented Dec 2, 2022

ajwerner commented Dec 2, 2022

ajwerner commented Dec 2, 2022

yuzefovich commented Dec 8, 2022

yuzefovich commented Dec 12, 2022

sql, kv: support non-enum user-defined types with COL_BATCH_RESPONSE scan format #92954

sql, kv: support non-enum user-defined types with COL_BATCH_RESPONSE scan format #92954

Comments

yuzefovich commented Dec 2, 2022 • edited

ajwerner commented Dec 2, 2022

yuzefovich commented Dec 2, 2022

ajwerner commented Dec 2, 2022

ajwerner commented Dec 2, 2022

yuzefovich commented Dec 8, 2022

yuzefovich commented Dec 12, 2022

yuzefovich commented Dec 2, 2022 •

edited