sql: add infrastructure for unresolved types#47386
sql: add infrastructure for unresolved types#47386craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
|
❌ The GitHub CI (Cockroach) build has failed on 70bfd265. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
|
❌ The GitHub CI (Cockroach) build has failed on 5162fa92. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
|
❌ The GitHub CI (Cockroach) build has failed on 94520ad8. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
|
❌ The GitHub CI (Cockroach) build has failed on 4831469a. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
|
Now that this is starting to pass tests, I wanted to get opinions on it before I really drill down and fix the remaining tests/removing the tricky TODO's. cc'ing folks who I think have opinions on the type system -- @knz, @andy-kimball, @RaduBerinde |
|
❌ The GitHub CI (Cockroach) build has failed on 4fa37003. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
|
❌ The GitHub CI (Cockroach) build has failed on 963068e5. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
otan
left a comment
There was a problem hiding this comment.
Took a quick peek but will come back to it - the approach sounds sane and it seems like something postgres has to do something similar to this to resolve types from the name to the entry in pg_type. Also similar to the approach of ResolvableFunctionReference if I understand correctly.
The main question I have is whether this affects immutability of the AST. Don't want another ResolvableFunctionReference thing to happen again.
|
|
||
| // UnresolvedCastExpr represents a CAST(expr AS type) expression with a | ||
| // possibly unresolved type reference. | ||
| type UnresolvedCastExpr struct { |
There was a problem hiding this comment.
do you know why we can't just make CastExpr have ResolvableTypeReference?
There was a problem hiding this comment.
There's no technical reason, but it involves changing alot more code -- alot of places make cast exprs, then we have to plumb type resolvers to all those places. Ideally once we typecheck there's no need to resolve anymore types, so that's what I went with here. It should only be before typechecking that if someone has a *types.T they need to wrap it with KnownType before doing anything.
There was a problem hiding this comment.
Hmm, maybe worth flagging as a TODO for cleanup or a comment on why it is.
Seems like we have to duplicate a lot of the same logic if it's messy...
There was a problem hiding this comment.
What about this: We can rename CastExpr.Type to CastExpr.UnresolvedType (and type it as ResolvableTypeReference), and then switch all post-resolution users to call ResolvedType() instead. This lets us get rid of UnresolvedCastExpr without much disturbance.
| // in the case that the reference is not statically known. This function | ||
| // is intended to be used in tests or in cases where it is not possible | ||
| // to have any unresolved type references. | ||
| func GetKnownTypeOrPanic(ref ResolvableTypeReference) *types.T { |
There was a problem hiding this comment.
i think the standard is MustGetKnownType for things that may panic.
The AST's aren't immutable -- the race was just caused by people assuming it was. But I don't think this affects the immutability -- we aren't writing types into a node once they are resolved. |
|
❌ The GitHub CI (Cockroach) build has failed on e797d12f. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
jordanlewis
left a comment
There was a problem hiding this comment.
I feel somewhat unconvinced by this approach so far. I don't see why you need to plumb all of these references so deep into the system. I would prefer an approach that resolved unresolved types into ordinary types.T as high up in the stack as possible - but maybe I'm missing some reason this is not possible?
Would it help if you did a pass through the AST before type checking?
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @otan, and @rohany)
pkg/sql/sem/tree/create.go, line 304 at r2 (raw file):
typ := GetKnownType(d.Type) if typ == nil { return nil, errors.New("Cannot collate on this type")
this should be an internal error
pkg/sql/sem/tree/eval.go, line 2825 at r2 (raw file):
iVarContainerStack []IndexedVarContainer TypeResolver TypeReferenceResolver
comment
pkg/sql/sem/tree/type_name.go, line 40 at r2 (raw file):
// KnownType is a wrapper around *types.T that represents a type reference // that has a statically known type. type KnownType struct {
I don't understand the point of this wrapper. Could we have a method on types.T instead? You could add Resolve to types.T as well, which would do the same thing. I just don't know why you need this new interface set, which seems rather heavy-weight...
rohany
left a comment
There was a problem hiding this comment.
I would prefer an approach that resolved unresolved types into ordinary types.T as high up in the stack as possible - but maybe I'm missing some reason this is not possible?
It doesn't make sense where to do type resolution before this -- we have a pipeline of parsing -> type checking -> planning -> execution. Typechecking is the first place after parsing where we get a semantic context of how to interpret expressions (i.e. we get a search path etc). This is very similar to the approach to resolve function references.
One reason for the leakage you see is that all of our Statements don't really go through typechecking -- so they hold onto this reference until they try to operate on it later (like add column). This doesn't seem problematic to me.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @otan, and @rohany)
pkg/sql/sem/tree/type_name.go, line 40 at r2 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
I don't understand the point of this wrapper. Could we have a method on types.T instead? You could add
Resolvetotypes.Tas well, which would do the same thing. I just don't know why you need this new interface set, which seems rather heavy-weight...
I agree that its not technically needed -- but there are some logistical issues (not sure how severe they actually are) with pushing this interface down into types and letting types.T implement it directly:
- This trick of having an unexported method to control what types actually implement the ResolvableType interface won't work (some things that ascribe to the interface in tree won't be able to implement the hidden method)
- This
TypeReferenceResolvermost likely will need to take in a tree.UnresolvedName or something similar in the future, but pushing it down into tree restricts the ability to do that.
Generally the problems seem to be the difficulties caused by types and tree being separate packages. If we let types.T implement this interface directly, we can't add methods on it that use any objects in tree in the future, which seems tricky. It would be really nice to have types.T implement this interface directly though -- it would cut a bunch of the changes in this commit out.
currently looking into it, but as a question, what should tree contain and what should it not? |
otan
left a comment
There was a problem hiding this comment.
if this is the approach we're going with (i don't have any argument against this general approach), it mostly looks good to me (few questions below)
my main gripe/questions are the naming and workflow -- see type_name.go comments.
It would be really nice to have types.T implement this interface directly though -- it would cut a bunch of the changes in this commit out.
is that hard? the interface can still live in tree. i can't make sense of the magnitude of the change atm.
There are also quite a few TODOs in here that seem like we should make them tasks or clarify. I've picked out the ones that I think we should.
| // | TupleContents | Contains tuple field types (can be recursively defined) | | ||
| // | TupleLabels | Contains labels for each tuple field | | ||
| // | ||
| // Array types |
| // in the case that the reference is not statically known. This function | ||
| // is intended to be used in tests or in cases where it is not possible | ||
| // to have any unresolved type references. | ||
| func GetKnownTypeOrPanic(ref ResolvableTypeReference) *types.T { |
|
|
||
| // KnownType is a wrapper around *types.T that represents a type reference | ||
| // that has a statically known type. | ||
| type KnownType struct { |
There was a problem hiding this comment.
I'm reflecting on the name of thing thing and think it may make sense to call it something like StaticallyKnownType or StaticType or FixedType or Hardcoded something. KnownType I think may be confusing because well, a custom made type can be Known too, right? I've noticed you called a function IsReferenceSerialType below which seems like another different name.
If this can also result in a non-statically known type (which is against the comment above but makes sense with the UnresolvedObjectName comment below), maybe call it ResolvedType?
Optional on this one, but the name kinda reads awkward.
Also wondering if "static"/"known" types should be explicitly marked as such on types.T to prevent mis-use. Something like a field, or a known set of OIDs would help here.
| case *UnresolvedCastExpr: | ||
| if arg, ok := t.Expr.(*Placeholder); ok { | ||
| // TODO (rohany): This function needs access to a semaContext. | ||
| // TODO (rohany): Call into the case for an annotateExpr. |
| indexElemList := make(tree.IndexElemList, 0, len(cols)) | ||
| for i := range cols { | ||
| semType := cols[i].Type.Family() | ||
| // TODO (rohany): Will we have to resolve types here? |
There was a problem hiding this comment.
will we?
(what's our policy on cleaning up TODOs? should this be TODO(task num) instead?)
|
|
||
| // UnresolvedCastExpr represents a CAST(expr AS type) expression with a | ||
| // possibly unresolved type reference. | ||
| type UnresolvedCastExpr struct { |
There was a problem hiding this comment.
Hmm, maybe worth flagging as a TODO for cleanup or a comment on why it is.
Seems like we have to duplicate a lot of the same logic if it's messy...
| return strength, s, nil | ||
|
|
||
| case *UnresolvedAnnotateTypeExpr: | ||
| // Ditto CastExpr. |
| } | ||
|
|
||
| // Resolve implements the ResolvableTypeReference interface. | ||
| func (name *UnresolvedObjectName) Resolve(resolver TypeReferenceResolver) (*types.T, error) { |
There was a problem hiding this comment.
What are the semantics of UnresolvedObjectName? Once it gets resolved, is it still going to be called UnresolvedObjectName, or does it become a new object?
If it's the former, we may need a better name?
If it's the latter, how does that work (do we deep copy the AST and replace it with KnownType? If so maybe KnownType is an okay name, but we should call it ResolvedType instead)
|
❌ The GitHub CI (Cockroach) build has failed on 98d081eb. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
d576cf6 to
26929bb
Compare
|
❌ The GitHub CI (Cockroach) build has failed on 26929bbe. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
ed16702 to
4f0bdde
Compare
|
Hi all, can you PTAL. I've updated this PR heavily according to suggestions and more experience with related parts of the code. I've done the following:
|
|
❌ The GitHub CI (Cockroach) build has failed on 4f0bdde8. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
otan
left a comment
There was a problem hiding this comment.
Reviewed 3 of 43 files at r1, 1 of 14 files at r2, 3 of 33 files at r3, 44 of 50 files at r4.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @otan, @RaduBerinde, and @rohany)
pkg/ccl/importccl/read_import_pgdump.go, line 190 at r2 (raw file):
if typ := tree.GetKnownType(e.Type); typ.Oid() == oid.T_regclass { // tree.Visitor says we should make a copy, but since copyNode is unexported // and there's no planner here, I think it's safe to directly modify the
Yikes at the "I think".
pkg/sql/conn_executor_exec.go, line 332 at r4 (raw file):
typeHints = make(tree.PlaceholderTypes, stmt.NumPlaceholders) for i, t := range s.Types { resolved, err := tree.ResolveType(t, ex.planner.semaCtx.GetTypeResolver())
nit: typeHints[i], err := ... works too right?
pkg/sql/sem/tree/create.go, line 547 at r4 (raw file):
// However, the argument is that we deal with serial at parse time only, // so we handle those cases here. switch GetStaticallyKnownType(node.Type).Width() {
what happens if this is nil?
pkg/sql/sem/tree/type_check_test.go, line 315 at r4 (raw file):
for _, d := range testData { t.Run(d.expr, func(t *testing.T) { ctx := tree.MakeSemaContext()
can we test with explicit nil semaCtx?
pkg/sql/sem/tree/type_name.go, line 110 at r4 (raw file):
// GetStaticallyKnownType possibly promotes a ResolvableTypeReference into a *types.T // if the reference is a statically known type. It returns nil otherwise. func GetStaticallyKnownType(ref ResolvableTypeReference) *types.T {
i know this is painful but...i think to make users think about this properly when using it, it should return (*types.T, error). Wrap the error around something so it's decomposable, e.g. type NotStaticallyKnownTypeError { error } (or something similar).
If that's heavyweight, I think a bool works too, returning false if it is not a statically known type.
pkg/sql/sem/tree/type_name.go, line 159 at r4 (raw file):
} var _ ResolvableTypeReference = &UnresolvedObjectName{}
nit: i like these closer to the interface definition (or right underneath the struct definition so it's clear UnresolvedObjectName implements ResolvableTypeReference).. also (*UnresolvedObjectName)(nil) if you feel poncy.
pkg/sql/sem/tree/type_name.go, line 178 at r4 (raw file):
// MakeDummyTypeResolver creates a MapTypeResolver from a map. func MakeDummyTypeResolver(typeMap map[string]*types.T) TypeReferenceResolver {
nit: prefix test only with TestingMapTypeResolver, and make this name similar.
(also mockery is cool: https://github.com/vektra/mockery, but may be too heavy duty for this)
|
pkg/sql/conn_executor_exec.go, line 332 at r4 (raw file): Previously, otan (Oliver Tan) wrote…
oh nvm |
rohany
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @jordanlewis, @otan, @RaduBerinde, and @rohany)
pkg/ccl/importccl/read_import_pgdump.go, line 190 at r2 (raw file):
Previously, otan (Oliver Tan) wrote…
Yikes at the "I think".
I think this is out of date -- sadly the "i think" was already present, and I don't know enough about this code to prove or disprove it.
pkg/sql/opt/idxconstraint/index_constraints_test.go, line 89 at r3 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
I don't think adding support for UDT's is necessary here. We can always add later if we need it.
Done.
pkg/sql/opt/memo/expr_test.go, line 80 at r3 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
I don't think adding support for UDT's is necessary here. We can always add later if we need it.
Done.
pkg/sql/opt/optbuilder/builder_test.go, line 84 at r3 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
I don't think adding support for UDT's is necessary here. We can always add later if we need it.
Done.
pkg/sql/opt/optgen/exprgen/parse_type.go, line 26 at r3 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
I don't think adding support for UDT's is necessary here. We can always add later if we need it. Ditto for all the exprgen-related files.
Done.
pkg/sql/opt/testutils/testcat/create_table.go, line 385 at r2 (raw file):
Previously, otan (Oliver Tan) wrote…
maybe confirm with optimizer team and delete the line and replace with an explanation?
Done.
pkg/sql/sem/tree/create.go, line 547 at r4 (raw file):
Previously, otan (Oliver Tan) wrote…
what happens if this is nil?
Good catch. Another argument for your suggestion below (or above idk how these files are arranged)
pkg/sql/sem/tree/eval.go, line 2825 at r2 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
comment
Done.
pkg/sql/sem/tree/expr.go, line 1447 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
What about this: We can rename
CastExpr.TypetoCastExpr.UnresolvedType(and type it asResolvableTypeReference), and then switch all post-resolution users to callResolvedType()instead. This lets us get rid ofUnresolvedCastExprwithout much disturbance.
Done.
pkg/sql/sem/tree/expr.go, line 1677 at r3 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
Same comment here as with
CastExpr.
Done.
pkg/sql/sem/tree/type_check.go, line 2351 at r2 (raw file):
Previously, otan (Oliver Tan) wrote…
is this TODO safe to ignore?
Done.
pkg/sql/sem/tree/type_check_test.go, line 315 at r4 (raw file):
Previously, otan (Oliver Tan) wrote…
can we test with explicit nil semaCtx?
Done.
pkg/sql/sem/tree/type_name.go, line 72 at r1 (raw file):
Previously, otan (Oliver Tan) wrote…
bump!
Done.
pkg/sql/sem/tree/type_name.go, line 68 at r3 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
Better names:
ElementTypeorContentType.
Done.
pkg/sql/sem/tree/type_name.go, line 110 at r4 (raw file):
Previously, otan (Oliver Tan) wrote…
i know this is painful but...i think to make users think about this properly when using it, it should return
(*types.T, error). Wrap the error around something so it's decomposable, e.g.type NotStaticallyKnownTypeError { error }(or something similar).If that's heavyweight, I think a bool works too, returning false if it is not a statically known type.
I think a bool makes sense. Forcing the caller to check the return value will prevent a good number of bugs.
pkg/sql/sem/tree/type_name.go, line 178 at r4 (raw file):
Previously, otan (Oliver Tan) wrote…
nit: prefix test only with
TestingMapTypeResolver, and make this name similar.
(also mockery is cool: https://github.com/vektra/mockery, but may be too heavy duty for this)
pulling in dependencies is a nono
pkg/sql/sqlbase/testutils.go, line 1321 at r2 (raw file):
Previously, otan (Oliver Tan) wrote…
will we?
(what's our policy on cleaning up TODOs? should this be TODO(task num) instead?)
Done.
otan
left a comment
There was a problem hiding this comment.
changes LGTM!
Reviewed 9 of 19 files at r5, 1 of 12 files at r6.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @jordanlewis, @otan, @RaduBerinde, and @rohany)
|
Thanks for all the work on this. It's an ambitious PR, with a lot of details to get right. Looks like you addressed all of my concerns. I didn't look at every detail, but the overall structure and design looks right to me. |
|
TFTRs! bors r+ |
jordanlewis
left a comment
There was a problem hiding this comment.
this is great, exciting! I have a few tiny nits, but don't bother un-borsing it since I know this is going to take more work to finish soon.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball, @jordanlewis, @otan, @RaduBerinde, and @rohany)
pkg/sql/sem/tree/name_part.go, line 211 at r6 (raw file):
func (u *UnresolvedName) ToUnresolvedObjectName(idx AnnotationIdx) (*UnresolvedObjectName, error) { if u.NumParts == 4 { return nil, errors.Newf("improper qualified name (too many dotted names): %s", u)
Is this a user-facing error? If so, it should have a pgcode.
pkg/sql/sem/tree/type_name.go, line 71 at r6 (raw file):
// TypeReferenceResolver is the interface that will provide the ability // to actually lookup type metadata and transform references into
nit: s/lookup/look up/
pkg/sql/sem/tree/type_name.go, line 104 at r6 (raw file):
// If we don't have a resolver, we can't actually resolve this // name into a type. return nil, errors.Newf("type %q does not exist", t)
This probably wants to have a pgcode as well.
|
pkg/sql/sem/tree/type_name.go, line 128 at r6 (raw file):
[nit] panic(errors.AssertionFailedf(..)) |
|
I'll just fix these, bors seems unhappy. bors r- |
This PR extends our type system to unresolved/unknown types before and after type checking. The idea is to an indirection to `*types.T` during parsing called a `ResolvableTypeReference`. Wherever we are unsure whether a type is statically known or not we have to use this type reference. The typechecking phase then removes all of these unknown type references by attempting to resolve each one before proceeding in typechecking. A large amount of this commit is propogating this information through to the rest of the SQL system, where most places expect `*types.T` we have to teach it to now handle type references and to resolve them. Release note: None
|
bors r+ |
Build succeeded |
This PR extends our type system to unresolved/unknown types
before and after type checking. This work is part of the larger project
to add ENUM types to CockroachDB. The idea is to an indirection
to
*types.Tduring parsing called aResolvableTypeReference.Wherever we are unsure whether a type is statically known or not
we have to use this type reference. The typechecking phase then
removes all of these unknown type references by attempting to
resolve each one before proceeding in typechecking. This intuiton
is enforced by ensuring that no values that ascribe to the
TypedExprinterface containResolvableTypeReferences.A large amount of this commit is propogating this information
through to the rest of the SQL system, where most places expect
*types.Twe have to teach it to now handle type references andto resolve them.
Release note: None