Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-22.2.17-rc: sql/stats: don't use linear regression with NaN for stats forecasting #113800

Conversation

DrewKimball
Copy link
Collaborator

Backport 2/2 commits from #113712.

/cc @cockroachdb/release


This patch adds a method to the quantile function, invalid, which checks for NaN values within the function. These indicate a failure to derive a linear regression model for the observed histograms, and can lead to internal errors and panics if not caught. Stats forecasting now checks this method before attempting to use a quantile function to derive a histogram prediction.

Fixes #113645
Fixes #113680

Release note (bug fix): Fixed a bug that could cause internal errors or panics while attempting to forecast statistics on a numeric column.

Release justification: fix for internal error encountered by customer

This patch adds a method to the quantile function, `invalid`, which checks
for `NaN` and `+/-Inf` values within the function. These indicate a failure
to derive a linear regression model for the observed histograms, and can lead
to internal errors and panics if not caught. Stats forecasting now checks this
method before attempting to use a quantile function to derive a histogram
prediction. This patch also reverts 120132e, since `fixMalformed` is no longer
expected to return an error.

Fixes cockroachdb#113645
Fixes cockroachdb#113680

Release note (bug fix): Fixed a bug that could cause internal errors or
panics while attempting to forecast statistics on a numeric column.
@DrewKimball DrewKimball requested a review from a team as a code owner November 4, 2023 00:02
Copy link

blathers-crl bot commented Nov 4, 2023

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Backports should only be created for serious
    issues
    or test-only changes.
  • Backports should not break backwards-compatibility.
  • Backports should change as little code as possible.
  • Backports should not change on-disk formats or node communication protocols.
  • Backports should not add new functionality (except as defined
    here).
  • Backports must not add, edit, or otherwise modify cluster versions; or add version gates.
  • All backports must be reviewed by the owning areas TL and one additional
    TL. For more information as to how that review should be conducted, please consult the backport
    policy
    .
If your backport adds new functionality, please ensure that the following additional criteria are satisfied:
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters. State changes must be further protected such that nodes running old binaries will not be negatively impacted by the new state (with a mixed version test added).
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.
  • Your backport must be accompanied by a post to the appropriate Slack
    channel (#db-backports-point-releases or #db-backports-XX-X-release) for awareness and discussion.

Also, please add a brief release justification to the body of your PR to justify this
backport.

@blathers-crl blathers-crl bot added the backport Label PR's that are backports to older release branches label Nov 4, 2023
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 3 of 3 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball and @rafiss)


pkg/sql/stats/stats_cache.go line 243 at r2 (raw file):

      }
    }
  }()

Looks like indentation is off here (I think this is what TestLint/TestGofmtSimplify is catching).

This patch adds panic-catching logic to `GetTableStats`, so that "safe"
panics (out-of-bounds, assertion etc.) can be propagated as a returned
error. This allows some code-paths to ignore the returned error, so that
execution can proceed even there's an unexpected error in the stats code.
This prevents situations where it becomes impossible to query a table
because of a stats bug.

Informs cockroachdb#113645

Release note: None
@DrewKimball
Copy link
Collaborator Author

Are all of the changes protected via a flag or option, new syntax, an environment variable or default-disabled cluster setting?

No, but this is a simple change that we are confident is correct.

What are the risks to backporting this change? Can the risks of backporting be mitigated?

If the change introduced a bug, it could make a table or tables effectively unavailable. However, as mentioned, we are confident that this is not the case.

What are the risks to not backporting?

Customers that run into the bug will find that a table is unavailable due to a NPE. This is rare, but has already happened in a few cases.

Does this change really need to be backported? Or can it reasonably wait until the next major release to be addressed?

It would be best to backport to active releases, since table unavailability is fairly severe and the best workaround currently is to disable stats forecasting for the affected table.

@DrewKimball
Copy link
Collaborator Author

@rafi friendly ping - this backport has been approved and is a blocker.

@DrewKimball DrewKimball merged commit 5ca8a41 into cockroachdb:release-22.2.17-rc Nov 7, 2023
5 of 6 checks passed
@DrewKimball DrewKimball deleted the backport22.2.17-rc-113712 branch November 7, 2023 16:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport Label PR's that are backports to older release branches
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants