Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: Unify the SQL type system #36598

Merged
merged 18 commits into from Apr 19, 2019

Conversation

Projects
None yet
4 participants
@andy-kimball
Copy link
Contributor

commented Apr 7, 2019

This PR unifies the three different SQL type systems in CRDB:

sqlbase.ColumnType - protobuf representation of table column types
coltypes.T - in-memory representation of table column types
types.T - computation types

The type systems are quite similar, but also subtly different. This causes bugs on a regular basis throughout the codebase. Each type system supports a different subset of the PG type functionality. For example, sqlbase.ColumnType stores column widths. types.T does not store widths, but it can store a Postgres OID value, which sqlbase.ColumnType is unable to do.

In addition, even for supported functionality, the API's are often tricky and error-prone. For example, types.T values can sometimes be compared using ==, but other times Equivalent must be called, and it is up to the user to know when to use which. As another example, support for OID values require wrapping types.T instances in a special TOIDWrapper type. Our developers are expected to know when it needs to be done, but typically they either don't understand when it's needed, or else forget to do it. It also breaks == comparison semantics, leading to subtle bugs that only manifest in edge cases with unusual OID values.

Another big problem that comes up repeatedly are lossy conversions. Throughout the codebase, there are frequent conversions to/from the various type systems. Many conversions discard important type information, which often leads to incompatibilities with Postgres, as well as providing a breeding ground for bugs.

This PR unifies the three type systems into just one. The unified type is called types.T (in new sql/types package). It uses a protobuf-generated struct that is backwards-compatible with sqlbase.ColumnType so that it can be embedded in place of ColumnType in larger protobuf messages (like sqlbase.ColumnDescriptor). The API is locked down so that it's difficult to ignorantly or accidentally create invalid types. The unified type supports everything that the three previous type systems did, and is more compatible with PG as a result (with more work, even more type compatibility is possible).

Resolves #32639

@andy-kimball andy-kimball requested review from jordanlewis, knz and bobvawter Apr 7, 2019

@andy-kimball andy-kimball requested review from cockroachdb/cli-prs as code owners Apr 7, 2019

@cockroach-teamcity

This comment has been minimized.

Copy link
Member

commented Apr 7, 2019

This change is Reviewable

@andy-kimball

This comment has been minimized.

Copy link
Contributor Author

commented Apr 7, 2019

Note to reviewers:
I know this is a real monster of a PR. However, I did it in stages, where each commit is a standalone step that builds and passes tests. This also makes it easier to rebase.

It might be easiest to review the final sql/types package first, to see where it all ends up. Then, you might review commit by commit to see each step towards the final product.

@RaduBerinde

This comment has been minimized.

Copy link
Member

commented Apr 8, 2019

This is fantastic. Thank you for doing this!

@andy-kimball andy-kimball requested review from BramGruneir and removed request for cockroachdb/sql-bulk-prs Apr 8, 2019

@andy-kimball

This comment has been minimized.

Copy link
Contributor Author

commented Apr 8, 2019

Anyone have suggestions about resolving a golint error:

pkg/sql/types/types.go:286:2: var Uuid should be UUID

I'm already using UUID to signify the name of the SemanticType. All other types follow this convention of types.INT (SemanticType) vs. types.Int (pre-constructed instance). I can't find a way to disable lint for this one line, either.

@andy-kimball
Copy link
Contributor Author

left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bobvawter, @BramGruneir, @jordanlewis, and @knz)


pkg/sql/pgwire/types.go, line 73 at r15 (raw file):

// mapResultOid performs any necessary mapping of Oid values to be returned to
// the client. For now, this only includes mapping T_float4 => T_float8.
func mapResultOid(o oid.Oid) oid.Oid {

@knz, I wasn't sure what to do about this issue (I believe it's a bug in libpq). I ended up mapping from float4 => float8 to work around it. But it raises larger issues about whether it's a good idea to begin returning the true OID values for selected columns. Should we map all OIDs to "canonical" OIDs (e.g. int2 => int8, varchar => string, etc) in order to match previous version behavior?

P.S. The PG client does not exhibit this bad float4 behavior when pointed at CRDB. It's a libpq thing.

@andy-kimball

This comment has been minimized.

Copy link
Contributor Author

commented Apr 8, 2019

Another suggestion for reviewers: compare all changes made to test files (e.g. information_schema and pg_catalog), since that will tend to show differences in behavior as a result of this PR.

@RaduBerinde

This comment has been minimized.

Copy link
Member

commented Apr 8, 2019

Anyone have suggestions about resolving a golint error:

Just add an exception here: https://github.com/cockroachdb/cockroach/blob/master/pkg/testutils/lint/lint_test.go#L1451
Something like
stream.GrepNot("pkg/sql/types/types.go.* var Uuid should be UUID")
so it's only for this file.

@andy-kimball
Copy link
Contributor Author

left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bobvawter, @BramGruneir, @jordanlewis, and @knz)


pkg/sql/logictest/testdata/logic_test/array, line 212 at r15 (raw file):

{1,2,3}

query T

This new behavior matches PG.

@andy-kimball andy-kimball force-pushed the andy-kimball:types branch 2 times, most recently from c97cd19 to cc215c0 Apr 8, 2019

@andy-kimball
Copy link
Contributor Author

left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bobvawter, @BramGruneir, @jordanlewis, and @knz)


pkg/sql/parser/parse.go, line 374 at r17 (raw file):

}

// The SERIAL types are pseudo-types that are only used during parsing. After

@knz, I wanted to call this out. I'm treating SERIAL differently now by not creating a "real" type for it. Instead, it's just different instances of the INT types that I then use == reference comparison to recognize. All "memory" of these types is gone after the analysis phase is complete; they exist only for pretty-printing the parse tree.

@andy-kimball
Copy link
Contributor Author

left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bobvawter, @BramGruneir, @jordanlewis, and @knz)


pkg/sql/pgwire/types.go, line 73 at r15 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

@knz, I wasn't sure what to do about this issue (I believe it's a bug in libpq). I ended up mapping from float4 => float8 to work around it. But it raises larger issues about whether it's a good idea to begin returning the true OID values for selected columns. Should we map all OIDs to "canonical" OIDs (e.g. int2 => int8, varchar => string, etc) in order to match previous version behavior?

P.S. The PG client does not exhibit this bad float4 behavior when pointed at CRDB. It's a libpq thing.

As another example, not mapping to previous OIDs results in this Java acceptance test failing:

    public void testArrayWithProps() throws Exception {
        PreparedStatement stmt = conn.prepareStatement("CREATE TABLE x (a SMALLINT[])");
        stmt.execute();
        stmt = conn.prepareStatement("INSERT INTO x VALUES (ARRAY[123])");
        stmt.execute();
        stmt = conn.prepareStatement("SELECT a FROM x");
        ResultSet rs = stmt.executeQuery();
        rs.next();

        Array ar = rs.getArray(1);
        Long[] fs = (Long[]) ar.getArray();
        Assert.assertArrayEquals(new Long[]{123L}, fs);
    }

with the error: incompatible types: long cannot be converted to java.lang.Integer
since oid.T__int2 is returned rather than oid.T__int8, as before.

@andy-kimball andy-kimball force-pushed the andy-kimball:types branch from cc215c0 to 12f56f7 Apr 8, 2019

andy-kimball added some commits Apr 12, 2019

sql: Add support for nested arrays to ColumnType
Previously, ColumnType did not support nested arrays. Support for nested arrays
is necessary to subsume the functionality in types.T, since nested arrays can
be created during evaluation of expressions. This commit adds support by
changing the type of the ArrayContents field from *SemanticType to *ColumnType
(the old field is renamed and deprecated). The marshal and unmarshal code maps
to and from the old representation in order to preserve backwards-compatibility.

Note that after this commit, the nested array support will still only be used
for in-memory evaluation of expressions like this one:

  SELECT pg_typeof(ARRAY[ARRAY[1,2]])

Table columns are still not allowed to have a nested array type, pgwire still
can't return nested array expressions, etc. All nested array errors are still
errors.

In addition, the ArrayDimensions field is now deprecated, since it was never
used.

Release note (sql change): Extend the list of advertised OIDs in pg_type to
include additional array-related types, such as _int2vector, _regproc, etc.
sql: Fix cases where ColumnType is unnecessarily passed by value
ColumnType is a rather big struct, and it should almost never be passed by
value. Update most usages to pass by ref rather than by value.

Release note: None
sql: Convert types.T to use ColumnType
Convert all uses of the existing types.T interface and structs to instead
use an alias type:

  type T = ColumnType

In a future commit, ColumnType will be converted to T so that the alias is
no longer necessary.

This change requires all current usages of types.T to be converted to *types.T,
since types.T is now a struct instead of an interface.

Release note: None
sql: Show more specific types when available
Update the String() method on types.T to show a type name that is closer to what
coltypes.T shows. For example, varchar and bit are currently never returned by
types.T, which instead maps them to string and varbit, respectively.

This commit prepares for unification of types.T and coltypes.T.

Release note: None
sql: Merge coltypes.T with types.T
Directly create types.T instances in the SQL parser and other places where
coltypes.T is in use (e.g. cast, annotate exprs). Doing this requires:

  1. Support for SERIAL pseudo-types to types.T
  2. Reconciliation of type name string algorithms

This commit also removes the coltypes package, since it is no longer in use.

In addition, this commit reworks some of the random type generation needed to
get various tests fully working after these typing changes.

Release note: None
sql: Rename SemanticType NULL to UNKNOWN
It's confusing that the type is called NULL, even though PG calls it
UNKNOWN and it prints as UNKNOWN. Change it to be consistent.

Release note: None
sql: Rename types.ColumnType to types.T and delete VisibleType
Remove the "type T = ColumnType" alias by renaming ColumnType to T. Now
there is just one way to refer to SQL types in the system: types.T, which
should reduce confusion.

Also delete VisibleType, since it's no longer used, except for mapping
to/from the legacy ColumnType representation.

Release note: None

andy-kimball added some commits Apr 4, 2019

sql: Expose methods rather than fields on types.T
Do not export fields from InternalType as fields on types.T. Instead,
corresponding methods. This enables tighter control over how the fields
are set. In particular, the Unmarshal method and the pre-constructed types
can now guarantee that the Oid, Width, and Locale fields are properly
set. This better encapsulation makes it more difficult for callers to
accidentally construct invalid types, which is a real concern, given all
the subtle rules and usages of various Type objects across multiple
releases.

This change means that every types.T constructor now needs to call a
constructor method or use a pre-constructed type.

Release note: None
sql: Reorder constants and methods in types.go
Put constants and methods in a more intuitive order, for easier
comprehension.

Only code is moved; this commit makes no functional changes, and so
does not need to be carefully reviewed.

Release note: None
sql: Comment and test types package
Rearrange and clean up methods in the package, including moving some
functionality from the package to other locations (e.g. information
schema related methods). Add comments to all exported structs and methods
in the types package. Add tests for all functionality in the package.

Fix compatibility issue in pg_type table, where oidvector and int2vector
had array oid types of 0.

Release note (sql change): Display correct array type oids for oidvector
and int2vector types in pg_type table, for better PG compatibility.
sql: Move sql/sem/types package to sql/types
Now that the type system is unified, put the new types package in a
more discoverable, intuitive location.

Release note: None
sql: Rename SemanticType to Family
Update terminology from SemanticType to Family, per conversation on
PR #36699. The rename follows this pattern:

  INT -> IntFamily
  STRING -> StringFamily
  ...

In addition, update all the places that were using the name of the family
as if it were the name of the type, and update them to use one of the
other type name methods instead. Add a PGName() method to round out the
set of type name methods that we use in various places in the code.

Release note: None
sql: Use PG-compatible name when generating column names
When a SQL projection column has no alias, a name is generated for it. If
the expression is a CAST expression, the name is derived from the data type
of the cast. This commit uses the PG-compatible name rather than the CRDB
name for the data type, which slightly increases PG-compat.

Release note (sql change): Use the PG-compatible name when auto-generating a
column alias for a CAST expression.
sql: Do not quote locale name in COLLATE clauses
Currently, locale strings are quoted when they contain underscore or
dash characters. The Unicode CLDR standard treats those characters as
equivalent. Furthermore, SQL identifiers don't need to be quoted. So
this commit introduces a new EncodeLocaleName method that maps dash
characters to underscore characters so that locale names do not need
to be quoted when printed as part of SQL syntax.

This change resolves a compatibility issue with the TypeORM driver,
which does not handle quoted locale names.

Release note (sql change): Locale names in COLLATE clauses have dash
characters mapped to underscore when printed as part of SQL syntax.

@andy-kimball andy-kimball force-pushed the andy-kimball:types branch from 12f56f7 to 7365a07 Apr 19, 2019

@andy-kimball

This comment has been minimized.

Copy link
Contributor Author

commented Apr 19, 2019

bors r+

craig bot pushed a commit that referenced this pull request Apr 19, 2019

Merge #36598 #36681 #36962
36598: sql: Unify the SQL type system r=andy-kimball a=andy-kimball

This PR unifies the three different SQL type systems in CRDB:

  `sqlbase.ColumnType` - protobuf representation of table column types
  `coltypes.T` - in-memory representation of table column types
  `types.T` - computation types

The type systems are quite similar, but also subtly different. This causes bugs on a regular basis throughout the codebase. Each type system supports a different subset of the PG type functionality. For example, `sqlbase.ColumnType` stores column widths. `types.T` does not store widths, but it can store a Postgres OID value, which `sqlbase.ColumnType` is unable to do.

In addition, even for supported functionality, the API's are often tricky and error-prone. For example, `types.T` values can sometimes be compared using `==`, but other times `Equivalent` must be called, and it is up to the user to know when to use which. As another example, support for OID values require wrapping `types.T` instances in a special `TOIDWrapper` type. Our developers are expected to know when it needs to be done, but typically they either don't understand when it's needed, or else forget to do it. It also breaks `==` comparison semantics, leading to subtle bugs that only manifest in edge cases with unusual OID values.

Another big problem that comes up repeatedly are lossy conversions. Throughout the codebase, there are frequent conversions to/from the various type systems. Many conversions discard important type information, which often leads to incompatibilities with Postgres, as well as providing a breeding ground for bugs.

This PR unifies the three type systems into just one. The unified type is called `types.T` (in new `sql/types` package). It uses a protobuf-generated struct that is backwards-compatible with `sqlbase.ColumnType` so that it can be embedded in place of `ColumnType` in larger protobuf messages (like `sqlbase.ColumnDescriptor`). The API is locked down so that it's difficult to ignorantly or accidentally create invalid types. The unified type supports everything that the three previous type systems did, and is more compatible with PG as a result (with more work, even more type compatibility is possible).

Resolves #32639

36681: distsqlplan: reset MergeOrdering if windower is planned r=yuzefovich a=yuzefovich

Windower doesn't guarantee maintaining the order, so merge ordering
of the plan should be set to empty.

Fixes: #35179.

Release note: None

36962: storage/engine: fix handling of 0-length varstrings in RocksDBBatchReader r=tbg a=petermattis

Previously, 0-length varstrings inadvertently cuased the reader to
truncate the batch repr which would usually result in a call to `Next`
complaining about the batch containing an unexpected number of
entries. So far, it looks like the only effect of this would be "invalid
batch" errors while using `cockroach debug` commands. It is possible
there is a more serious effect, though.

See #36937

Release note (bug fix): Fixed a bug in write batch decoding that could
cause "invalid batch" errors while using `cockroach debug` commands to
analyze data.

Co-authored-by: Andrew Kimball <andyk@cockroachlabs.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Peter Mattis <petermattis@gmail.com>
@craig

This comment has been minimized.

Copy link

commented Apr 19, 2019

Build succeeded

@craig craig bot merged commit 7365a07 into cockroachdb:master Apr 19, 2019

3 checks passed

GitHub CI (Cockroach) TeamCity build finished
Details
bors Build succeeded
Details
license/cla Contributor License Agreement is signed.
Details

@andy-kimball andy-kimball deleted the andy-kimball:types branch Apr 19, 2019

BramGruneir added a commit to BramGruneir/cockroach that referenced this pull request Apr 22, 2019

roachtest: update hibernate blacklist v19.2
With the updated unified types from cockroachdb#36598, a number of tests are now passing.
Each test has already been vetted.

Fixes cockroachdb#36971

Release note: None

craig bot pushed a commit that referenced this pull request Apr 22, 2019

Merge #37002
37002: roachtest: update hibernate blacklist v19.2 r=BramGruneir a=BramGruneir

With the updated unified types from #36598, a number of tests are now passing.
Each test has already been vetted.

Fixes #36971

Release note: None

Co-authored-by: Bram Gruneir <bram@cockroachlabs.com>

craig bot pushed a commit that referenced this pull request Apr 23, 2019

Merge #37002
37002: roachtest: update hibernate blacklist v19.2 r=BramGruneir a=BramGruneir

With the updated unified types from #36598, a number of tests are now passing.
Each test has already been vetted.

Fixes #36971

Release note: None

Co-authored-by: Bram Gruneir <bram@cockroachlabs.com>

BramGruneir added a commit to BramGruneir/cockroach that referenced this pull request Apr 23, 2019

roachtest: update hibernate blacklist v19.2
With the updated unified types from cockroachdb#36598, a number of tests are now passing.
Each test has already been vetted.

Fixes cockroachdb#36971

Release note: None

craig bot pushed a commit that referenced this pull request Apr 23, 2019

Merge #37002
37002: roachtest: update hibernate blacklist v19.2 r=BramGruneir a=BramGruneir

With the updated unified types from #36598, a number of tests are now passing.
Each test has already been vetted.

Fixes #36971

Release note: None

Co-authored-by: Bram Gruneir <bram@cockroachlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.