Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: refactor type types #10624

Merged
merged 4 commits into from
Feb 15, 2022
Merged

Conversation

benesch
Copy link
Member

@benesch benesch commented Feb 13, 2022

This mega PR refactors the type types as described in #10571. More details in that issue and in the commit messages within. The net result is, IMO, a set of types whose purposes are more clear, safer and more correct conversions between them, and the removal of a lot of duplicate code.

I'm sorry this diff is so large, but these changes were massively interdependent, and I don't have any better ideas for further splitting it into chunks.

Fix #10571.
Fix #10576.

Motivation

  • This PR refactors existing code.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR adds a release note for any user-facing behavior changes.

}

impl NumericConstraints {
fn from_typmod(typmod: i32) -> Option<NumericConstraints> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤯 This seems like wants to share an implementation w/

fn mz_render_typemod<'a>(
, though maybe I'm being overeager

Once this PR lands, will see if I can find a place to tie these together.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh 100% it should!! Good catch!

Copy link
Contributor

@sploiselle sploiselle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I obviously didn't spend time reverse-engineering the motivation behind all of these changes. However, I "spot checked" all of the major points of refactoring and things seem really great/clear.

If you have the time or interest, I'd love to hear your high-level thinking behind:

  • PgScalarType being the unnecessary pivot between pgrepr and ScalarType. I believe you mentioned removing this to me previously, but did you know it was chaff because you had to write it originally, or was this something you discovered by means of "looking at"/understanding the dependency graph?
  • New typing the typmods in adt. Like, is it nice to tidy up the semantic boundaries of these things? I saw a "greatest common denominator" for all of them, so never would have been motivated to specialize the code. Would love to get a clearer sense of why you think this approach is superior (asking to learn, not in any way a disagreement).

/// [`VARHDRSZ`](https://github.com/postgres/postgres/blob/REL_14_0/src/include/c.h#L627) constant
const PG_HEADER_SIZE: i32 = 4;

pub enum PgScalarType {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙇

Copy link
Contributor

@nharring-adjacent nharring-adjacent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks reasonable, I mostly focused on the postgres_util and pgrepr bits since I'm more familiar with that code however everything is clear and readable. Thanks very much for doing this, these type relationships seem demonstrably easier to grok and use.

@benesch
Copy link
Member Author

benesch commented Feb 14, 2022

PgScalarType being the unnecessary pivot between pgrepr and ScalarType. I believe you mentioned removing this to me previously, but did you know it was chaff because you had to write it originally, or was this something you discovered by means of "looking at"/understanding the dependency graph?

A combination of both, I think! When I wrote pgrepr::Type I meant for it to be an exact representation of Postgres's view of the type. So seeing a type called PgScalarType pop up definitely triggered my spidey sense. But pgrepr::Type wasn't directly usable as PgScalarType for sure, because it had accumulated some cruft over the years and didn't handle typmods directly.

New typing the typmods in adt. Like, is it nice to tidy up the semantic boundaries of these things? I saw a "greatest common denominator" for all of them, so never would have been motivated to specialize the code. Would love to get a clearer sense of why you think this approach is superior (asking to learn, not in any way a disagreement).

Oh, I just get stressed out about having representable states that are invalid. Like, nothing stopped someone from inadvertently constructing ScalarType::Numeric { scale: Some(400) }, and falling into completely unexercised code paths. This doesn't matter too much when you're only constructing ScalarType::Numeric in one place, but since there's a few different ways of getting there now (via the SQL parser -> planner, via pgrepr, via the catalog, and via interchange), it just seemed worth completely eliminating the possibility of error. I can now say with confidence that nothing in our codebase can construct a ScalarType::Numeric with a scale that exceeds 38. On the other hand: still not convinced that everything that constructs a decimal is properly restricting the precision to 38!

They don't seem to be used. Also improve the comments on the wrapper
types.
…types

Redistribute how PostgreSQL types are handled. Creating a PostgreSQL
type from an OID and typmod is now handled entirely by `pgrepr::Type`.
The `postgres_util::PgColumn` struct now stores a `pgrepr::Type`
directly, rather than a separate `PgScalarType` enum that was relatively
duplicative with `pgrepr::Type`.

The `sql` crate now generates views for a PostgreSQL source by
converting a `pgrepr::Type` to an `UnresolvedDataType` by way of a new
`fmt::Display` implementation on `pgrepr::Type`.

For safety, the modifiers on `ScalarType::{Numeric,VarChar,Char}` now
have newtype wrappers that enforce the bounds at construction. For
clarity, the `scale` parameter of `ScalarType::Numeric` has been renamed
`max_scale` and the `length` parameter of `ScalarType::VarChar` has been
renamed `max_length`, to better reflect that they are maximums, not hard
requirements. This is in contrast to `ScalarType::Char`, where the
length is in fact a hard requirement, not a maximum.
@benesch
Copy link
Member Author

benesch commented Feb 15, 2022

I had to ditch the last commit that updated the test that compares PostgreSQL OIDs to Materialize OIDs, because it was failing in CI for reasons entirely unrelated to this commit. I'll do that in a separate PR to get this unblocked.

Store types in the catalog via a new CatalogType enum. This enum is
nearly one-to-one with ScalarType, except that modifier fields are
removed and embedded types are replaced with GlobalId references.

This type is used throughout the SQL planner, in the plans returned by
`CREATE TYPE` and in the type name resolution code.

The motivation for this refactor is the removal of the "lossy"
conversions from `pgrepr::Type` to `ScalarType`. Now, all conversions
are either full fidelity or report an error if they would discard data.
@benesch
Copy link
Member Author

benesch commented Feb 15, 2022

Hallelujah! Thanks for the reviews, everyone!

@benesch benesch merged commit 522e7e2 into MaterializeInc:main Feb 15, 2022
@benesch benesch deleted the repr-improvements branch February 15, 2022 04:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

repr: create newtypes for NumericMaxScale and [Var]CharMaxLength sql: refactor type types
3 participants