Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql, kv: support non-enum user-defined types with COL_BATCH_RESPONSE scan format #92954

Open
Tracked by #82323
yuzefovich opened this issue Dec 2, 2022 · 6 comments
Open
Tracked by #82323
Labels
C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. T-sql-queries SQL Queries Team
Projects

Comments

@yuzefovich
Copy link
Member

yuzefovich commented Dec 2, 2022

When implementing the projection pushdown into the KV (#82323), at least initially, we won't support user-defined types other than enums. The difficulty is that these types require hydration, but it seems non-trivial to inject that on the KV server side.

Jira issue: CRDB-22065

@yuzefovich yuzefovich added the C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. label Dec 2, 2022
@yuzefovich yuzefovich added this to Triage in SQL Queries via automation Dec 2, 2022
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Dec 2, 2022
@yuzefovich yuzefovich moved this from Triage to Backlog in SQL Queries Dec 2, 2022
@ajwerner
Copy link
Contributor

ajwerner commented Dec 2, 2022

I don't know exactly what the deal is with vectors of these things but you can treat them as bytes and re-type them to the logical type later?

@yuzefovich
Copy link
Member Author

Yeah, that sounds possible.

@ajwerner
Copy link
Contributor

ajwerner commented Dec 2, 2022

The record types are going to be worse when they come along. We'll want to break this down into enums and record types when the time comes.

@ajwerner
Copy link
Contributor

ajwerner commented Dec 2, 2022

If in the year 2030 I couldn't get push-down for record types stored in tables and indexes, I don't think I'd be sad. If I couldn't get push-down for enums though, I think I'd be quite sad.

@yuzefovich
Copy link
Member Author

I thought a little bit more about this, and I think we can support enums in 23.1. This will require adding the "native" support of enums in the vectorized engine where the physical representation is stored in the coldata.Bytes under the hood and we produce DEnums from it on demand. I'll work on that support tomorrow.

@yuzefovich yuzefovich changed the title sql, kv: support user-defined types with COL_BATCH_RESPONSE scan format sql, kv: support non-enum user-defined types with COL_BATCH_RESPONSE scan format Dec 12, 2022
@yuzefovich
Copy link
Member Author

Alright, I think we can get enums this release.

craig bot pushed a commit that referenced this issue Dec 14, 2022
93400: coldata: add native support of enums r=yuzefovich a=yuzefovich

This commit adds the native support of enum types to the vectorized
engine. We store them via their physical representation, so we can
easily reuse `Bytes` vector for almost all operations, and, thus, we
just mark the enum family as having the bytes family as its canonical
representation. There are only a handful of places where we need to go
from the physical representation to either the logical one or to the
`DEnum`:
- when constructing the pgwire message to the client (in both text and
binary format the logical representation is used)
- when converting from columnar to row-by-row format (fully-fledged
`DEnum` is constructed)
- casts.

In all of these places we already have access to the precise typing
information (similar to what we have for UUIDs which are supported via
the bytes canonical type family already).

I can really see only one downside to such implementation - in some
places the resolution based on the canonical (rather than actual) type
family might be too coarse. For example, we have `<bytes> || <bytes>`
binary operator (`concat`). As it currently stands the execution will
proceed to perform the concatenation between two UUIDs or between a
BYTES value and a UUID, and now we'll be adding enums into the mix.
However, the type checking is performed earlier on the query execution
path, so I think it is acceptable since the execution should never
reach such a setup.

An additional benefit of this work is that we'll be able to support the
KV projection pushdown in presence of enums - on the KV server side
we'll just operate with the physical representations and won't need to
have access to the hydrated type whereas on the client side we'll have
the hydrated type, so we'll be able to do all operations.

Addresses: #42043.
Informs: #92954.

Epic: CRDB-14837

Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. T-sql-queries SQL Queries Team
Projects
Status: Backlog
SQL Queries
Backlog (DO NOT ADD NEW ISSUES)
Development

No branches or pull requests

2 participants