
sql: creating statistics on tables with indexed NOT NULL virtual columns causes internal error #71080

Closed
mgartner opened this issue Oct 4, 2021 · 1 comment · Fixed by #71105
Labels
C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
T-sql-queries: SQL Queries Team

Comments

mgartner (Collaborator) commented Oct 4, 2021

CREATE STATISTICS fails with an internal error when invoked on a table that has an index on a NOT NULL virtual column.

For example:

CREATE TABLE t (
  k INT PRIMARY KEY,
  a INT,
  b INT NOT NULL AS (a + 10) VIRTUAL
);

INSERT INTO t VALUES (1, 2);

CREATE STATISTICS s FROM t;

Results in:

ERROR: internal error: Non-nullable column "t:b" with no value! Index scanned was "primary" with the index key columns (k) and the values (1)
SQLSTATE: XX000
DETAIL: stack trace:
/go/src/github.com/cockroachdb/cockroach/pkg/sql/row/fetcher.go:1580: finalizeRow()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/row/fetcher.go:1304: NextRow()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/tablereader.go:249: Next()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/sampler.go:255: mainLoop()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/sampler.go:226: Run()
/go/src/github.com/cockroachdb/cockroach/pkg/sql/flowinfra/flow.go:329: func1()
/usr/local/go/src/runtime/asm_amd64.s:1374: goexit()

I believe the problem is that CREATE STATISTICS is trying to collect samples of the virtual column from the primary index. This is invalid because the primary index does not store virtual computed columns.
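
A possible workaround until the fix lands (untested, so treat it as a sketch): explicitly collect statistics only on the stored columns, so the sampler never needs to read the virtual column from the primary index. The statistic names below are just illustrative.

-- Hypothetical workaround: collect stats per non-virtual column only.
CREATE STATISTICS s_k ON k FROM t;
CREATE STATISTICS s_a ON a FROM t;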

mgartner added the C-bug label Oct 4, 2021
mgartner self-assigned this Oct 4, 2021
mgartner added this to Triage in SQL Queries via automation Oct 4, 2021
blathers-crl bot added the T-sql-queries label Oct 4, 2021
mgartner (Collaborator, Author) commented Oct 4, 2021

The reproduction mentioned above is fixed by #68312. I'll backport that to 21.1.

However, there are still cases where stats are requested for virtual columns when they should not be. For example, when the virtual column is included in a multi-column index, creating stats fails with the same error as above:

statement ok
SET CLUSTER SETTING sql.stats.multi_column_collection.enabled = true

statement ok
CREATE TABLE t (
  k INT PRIMARY KEY,
  a INT,
  b INT NOT NULL AS (a + 10) VIRTUAL,
  INDEX (a, b)
);

statement ok
INSERT INTO t VALUES (1, 2);

statement ok
CREATE STATISTICS s FROM t;

Explicitly creating stats on a virtual computed column also fails:

statement ok
CREATE TABLE t (
  k INT PRIMARY KEY,
  a INT,
  b INT NOT NULL AS (a + 10) VIRTUAL
);

statement ok
INSERT INTO t VALUES (1, 2);

statement ok
CREATE STATISTICS s ON b FROM t;

I'll fix these additional cases and backport both fixes to 21.1.
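
Once the fix is in, a quick way to check the behavior (a sketch, assuming the fix makes stats collection skip virtual columns rather than sample them):

-- Should now succeed; the virtual column b is not sampled.
CREATE STATISTICS s FROM t;
-- The resulting statistics should not list b among their column names.
SHOW STATISTICS FOR TABLE t;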

RaduBerinde moved this from Triage to 22.1 High Likelihood (90%) in SQL Queries Oct 5, 2021
craig bot pushed a commit that referenced this issue Oct 6, 2021
70648: sql: move a single remote flow to the gateway in some cases r=yuzefovich a=yuzefovich

**sql: show distribution info based on actual physical plan in EXPLAIN**

Previously, the `distribution` info in `EXPLAIN` output was printed
based on the recommendation about the distribution of the plan. For
example, if the plan is determined to be "should be distributed", yet
it only contains a single flow on the gateway, we would say that the
plan has "full" distribution.

This commit updates the code to print the distribution based on the
actual physical plan (in the example above it would say "local"),
regardless of the reason - whether it is the recommendation to plan
locally or the data happened to be only on the gateway.

I think it makes more sense this way since now a DISTSQL diagram
consisting of a single flow on the gateway more appropriately
corresponds to "local" distribution. Additionally, this change is
motivated by the follow-up commit which will introduce changes to the
physical plan during the plan finalization, and we want to show the
correct distribution in the EXPLAIN output for that too.

Release note: None

**sql: move a single remote flow to the gateway in some cases**

This commit updates the physical planner to move a single remote flow
onto the gateway in some cases, namely when
- the flow contains a processor that might increase the cardinality of
the data flowing through it or that performs the KV work
- we estimate that the whole flow doesn't reduce the cardinality when
compared against the number of rows read by the table readers.
To be conservative, when there is no estimate, we don't apply this
change to the physical plan.

The justification for this change is that we pin the whole physical plan
based on the placement of the table readers. If the plan consists of only
a single flow, and the flow is quite expensive, then with a high enough
frequency of such flows, the node holding the lease for the ranges read
by the table readers becomes a hot spot (we saw this in practice a few
months ago). In such a scenario we might now choose to run the flow
locally to distribute the load on the cluster better (assuming that the
queries are issued against all nodes with equal frequency).

The EXPLAIN output will correctly say "distribution: local" if the flow
is moved to the gateway.

Informs: #59014.

Release note (bug fix): Some query patterns that previously could cause
a single node to become a hot spot have been fixed so that the load is
evenly distributed across the whole cluster.
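
For reference, the `distribution` field mentioned above appears at the top of the `EXPLAIN` output. A minimal way to see it (a sketch, not part of the PR itself, using a hypothetical table kv):

-- The first lines of EXPLAIN output include a distribution field, which
-- after this change reflects the actual physical plan (for example,
-- "distribution: local" when the single flow runs on the gateway).
EXPLAIN SELECT count(*) FROM kv;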

71011: cli: add --max-sql-memory flag to `cockroach mt start-sql` r=knz a=jaylim-crl

Previously, the `--max-sql-memory` flag wasn't available to the multi-tenancy
`start-sql` command, even though it was already supported by other
`start`-related commands.

Release note (cli change): `cockroach mt start-sql` will now support the
`--max-sql-memory` flag to configure maximum SQL memory capacity to store
temporary data.

Release justification: The upcoming Serverless MVP release plans to use a
different value for `--max-sql-memory` instead of the default of 25% of
container memory. This commit only adds a flag that will be used in
multi-tenant scenarios, and should have no impact on dedicated customers.

71105: sql: do not collect statistics on virtual columns r=mgartner a=mgartner

PR #68312 intended to update the behavior of `CREATE STATISTICS` to
prevent statistics collection on virtual computed columns. However, it
failed to account for multi-column statistics and for
`CREATE STATISTICS` statements that explicitly reference virtual
columns. This commit accounts for these two cases.

This prevents internal errors from occurring when the system tries to
collect statistics on `NOT NULL` virtual columns. Virtual column values
are not stored in the primary index, so when the statistics
job reads the primary index to sample the virtual column, it assumes the
value is NULL, which violates the column's `NOT NULL` constraint. This
violation causes an error.

Fixes #71080

Release note (bug fix): A bug has been fixed which caused internal
errors when collecting statistics on tables with virtual computed
columns.

71206: cmd/roachtest: add testLifecycle to hibernateIgnoreList r=ZhouXing19 a=ZhouXing19

Resolves #70482

Add `org.hibernate.userguide.pc.WhereTest.testLifecycle`
to `hibernateIgnoreList21_1`, `hibernateIgnoreList21_2`,
 and `hibernateIgnoreList22_1`.

Release note: None
Release justification: None

71212: opt: use fragment for optstepsweb with long URLs r=mgartner a=mgartner

The `optstepsweb` test command can produce very long URLs. If the URL is
longer than ~8201 characters, the GitHub Pages server hosting
`optsteps.html` responds with a 414 status code.

To make these long URLs work, this commit uses a fragment rather than a
query parameter in the URL if the compressed data that represents the
optimizer steps is over 8100 characters (the 100 characters of buffer is
meant to account for the protocol, domain, and path). A fragment is not
sent to the server by the browser, so GitHub Pages responds
successfully. A downside is that when anchor links are clicked to
navigate the page, the original fragment is overridden and the URL is
invalid. For this reason, we still use a query parameter when the
compressed data is small enough.

Related to #68697.

Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Jay <jay@cockroachlabs.com>
Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
Co-authored-by: Jane Xing <zhouxing@uchicago.edu>
craig bot closed this as completed in d9eed2b Oct 6, 2021
SQL Queries automation moved this from 22.1 High Likelihood (90%) to Done Oct 6, 2021
blathers-crl bot pushed a commit that referenced this issue Oct 6, 2021
mgartner added a commit to mgartner/cockroach that referenced this issue Oct 7, 2021
mgartner added a commit to mgartner/cockroach that referenced this issue Oct 7, 2021
ericharmeling pushed a commit to ericharmeling/cockroach that referenced this issue Oct 20, 2021