Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support max_identifier_length session variable #20999

Merged
merged 13 commits into from Aug 8, 2023

Conversation

sthm
Copy link
Contributor

@sthm sthm commented Aug 4, 2023

This PR introduces a session variable to report the maximum identifier length of 255 and enforces that limit on identifiers. Previously, the identifier length was unconstrained, so this is a backwards incompatible change.

This PR adds a known-desirable feature #20931.

Motivation

For compatibility reasons (SQL Server Management Studio), its desirable to report this value for clients.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • This PR includes the following user-facing behavior changes:
    • Breaking change. Limit the identifier length to 255 bytes and introduce the max_identifier_length session variable.

Copy link
Member

@benesch benesch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me. Can you work with @mjibson/ @jkosh44 to ensure that no one is using identifiers larger than this before we ship this in a release?

@morsapaes also heads up that we should adjust the dbt adapter for this accordingly.

src/sql/src/session/vars.rs Outdated Show resolved Hide resolved
@sthm sthm marked this pull request as ready for review August 5, 2023 13:46
@sthm sthm requested a review from a team as a code owner August 5, 2023 13:46
@jkosh44
Copy link
Contributor

jkosh44 commented Aug 7, 2023

SELECT octet_length(name), name FROM
    (SELECT name FROM mz_objects
    UNION ALL
    SELECT name FROM mz_clusters
    UNION ALL
    SELECT name FROM mz_cluster_replicas
    UNION ALL
    SELECT name FROM mz_databases
    UNION ALL
    SELECT name FROM mz_schemas
    UNION ALL
    SELECT name FROM mz_roles
    UNION ALL
    SELECT name FROM mz_columns)
WHERE octet_length(name) > 255;

Should do the trick. I'll post in the release channel to ask the release manager to run this in prod.

EDIT: I added mz_columns.

@jkosh44
Copy link
Contributor

jkosh44 commented Aug 7, 2023

Looks like this won't break any environments: https://materializeinc.slack.com/archives/CTESPM7FU/p1691430142889159?thread_ts=1691420649.092829&cid=CTESPM7FU, so this is good to merge assuming those nightly failures are unrelated.

I'll add a step to the release playbook to check again during the release.

@jkosh44
Copy link
Contributor

jkosh44 commented Aug 7, 2023

so this is good to merge assuming those nightly failures are unrelated.

Though it does look like the Parallel workload tests need to be updated

worker_10 Query failed: CREATE INDEX idx_v12_c25_int_t2_c23_bigint_t2_c7_bigint_t2_c17_bigint_t2_c33_text_t2_c11_int_t2_c6_boolean_t2_c10_int_t2_c16_int_t2_c8_bigint_t2_c2_boolean_t2_c26_smallint_t2_c1_text_t2_c18_int_t2_c22_bigint_t2_c9_smallint_t2_c15_smallint_t2_c4_bigint_t2_c28_text_t2_c27_text_t2_c37_smallint_t2_c24_text_t2_c3_smallint_t2_c12_boolean_t2_c36_boolean_t2_c30_int_t2 ON v12 (c25_int_t2 ASC, c23_bigint_t2 DESC, c7_bigint_t2 ASC, c17_bigint_t2 DESC, c33_text_t2 DESC, c11_int_t2 DESC, c6_boolean_t2 ASC, c10_int_t2 ASC, c16_int_t2 DESC, c8_bigint_t2 ASC, c2_boolean_t2 ASC, c26_smallint_t2 ASC, c1_text_t2 ASC, c18_int_t2 DESC, c22_bigint_t2 DESC, c9_smallint_t2 ASC, c15_smallint_t2 DESC, c4_bigint_t2 DESC, c28_text_t2 DESC, c27_text_t2 ASC, c37_smallint_t2 DESC, c24_text_t2 ASC, c3_smallint_t2 ASC, c12_boolean_t2 DESC, c36_boolean_t2 ASC, c30_int_t2 ASC); {'S': 'ERROR', 'C': '42601', 'M': 'identifier length exceeds 255 bytes', 'P': '14'}
...
pg8000.exceptions.DatabaseError: {'S': 'ERROR', 'C': '42601', 'M': 'identifier length exceeds 255 bytes', 'P': '14'}

@sthm sthm requested a review from a team as a code owner August 8, 2023 07:47
@morsapaes
Copy link
Contributor

Thanks for the heads-up, @benesch. Pushed the dbt change in #21044.

@sthm
Copy link
Contributor Author

sthm commented Aug 8, 2023

Thanks a lot for taking a look, Joe. I've fixed the tests that failed because of too long identifies. The feature benchmark still seems to be failing, but I don't see how I can possibly introduce a regression there.

@jkosh44
Copy link
Contributor

jkosh44 commented Aug 8, 2023

The feature benchmark still seems to be failing, but I don't see how I can possibly introduce a regression there.

@nrainer-materialize @philip-stoev Is that something to worry about?

@sthm as a side note I left column names and aliases in view definitions out of my query above. For column names we can add the mz_columns, but there isn't a good way to extract aliases out of view definitions. I think we're probably fine as it's unlikely someone has an aliases longer than 255 bytes, but just something to keep in mind for the next release.

@philip-stoev
Copy link
Contributor

The feature benchmark failure can be ignored. It is unrelated to the PR at hand.

@sthm sthm merged commit 0164656 into MaterializeInc:main Aug 8, 2023
122 of 124 checks passed
@sthm sthm deleted the id-length-lexer branch August 8, 2023 15:31
morsapaes added a commit that referenced this pull request Aug 8, 2023
ParkMyCar added a commit that referenced this pull request Nov 15, 2023
This PR adds a max length to the `Ident` struct. In
#20999 we added the
concept of a "max_identifier_length" which was enforced when parsing
SQL, but it was not enforced for identifiers created internally, e.g.
when generating a name for a progress subsource by appending "_progress"
to the source name.

The largest changes in this PR are the following APIs:

* **`Ident::new(...)`** will return an `Err(IdentError)` if the provided
string is longer than our max, which is 255 bytes.
* **`ident!("some static str")`** macro, which enforces our invariants
at compile time. This prevents a pattern like `Ident::new("some static
str").expect("known correct")` from polluting the code base.
* **`Ident::new_unchecked(...)`** checks the invariants with a
`soft_assert!`. This pattern is an escape hatch for cases we know are
valid, but can't express in the type system, e.g. appending a number to
a static string.

Wherever possible I used `Ident::new(...)` or `ident!(...)`, but there
were some callsites that didn't already return an error, and didn't have
a `&'static str` as an argument. For those cases I used
`Ident::new_unchecked(...)` to prevent this PR from getting too large.
At the very least by using `new_unchecked(...)` our tests will catch
invalid `Ident`s via soft asserts.

### Motivation

Fixes #22535

But also has the goal of eliminating these kinds of bugs entirely.
Chatted about this on
[Slack](https://materializeinc.slack.com/archives/C02FWJ94HME/p1699476924158659)
and it seemed desirable.

### Checklist

- [ ] This PR has adequate test coverage / QA involvement has been duly
considered.
- [ ] This PR has an associated up-to-date [design
doc](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/README.md),
is a design doc
([template](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/00000000_template.md)),
or is sufficiently small to not require a design.
  <!-- Reference the design in the description. -->
- [ ] If this PR evolves [an existing `$T ⇔ Proto$T`
mapping](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/command-and-response-binary-encoding.md)
(possibly in a backwards-incompatible way), then it is tagged with a
`T-proto` label.
- [ ] If this PR will require changes to cloud orchestration or tests,
there is a companion cloud PR to account for those changes that is
tagged with the release-blocker label
([example](MaterializeInc/cloud#5021)).
<!-- Ask in #team-cloud on Slack if you need help preparing the cloud
PR. -->
- [x] This PR includes the following [user-facing behavior
changes](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/guide-changes.md#what-changes-require-a-release-note):
  - Internal only change
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants