sql: schema change repeatedly retries with gcttl error #126260

Open
itsbilal opened this issue Jun 26, 2024 · 3 comments
Labels
branch-master: Failures on the master branch.
C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
O-testcluster: Issues found or occurred on a test cluster, i.e. a long-running internal cluster.
T-sql-foundations: SQL Foundations Team (formerly SQL Schema + SQL Sessions).

Comments

@itsbilal
Member

itsbilal commented Jun 26, 2024

On the drt-chaos test cluster running V24.2.0-ALPHA.00000000-DEV-5AFD790501E946EF306ABE2B592C5798C29C342F, a schema change for ALTER TABLE cct_tpcc.public.order_line DROP COLUMN add_column_op_2902590426 CASCADE has been running nonstop and is being repeatedly retried.

[Image: Screenshot 2024-06-26 at 7 56 40 PM]

Link to the job

Looking at the logs, we see the job failing with the error below. For reference, the GC TTL on this db/table is 4 hours.

job 979031533120225281: running execution encountered retriable error: failed to construct index entries during backfill: batch timestamp 1718847123.942402651,0 must be after replica GC threshold 1719379269.625591541,0
(1) forced error mark
  | ‹"retriable job error"›
  | github.com/cockroachdb/errors/withstack/*withstack.withStack::
Wraps: (2) attached stack trace
  -- stack trace:
  | github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*indexBackfiller).runBackfill.func1
  | 	github.com/cockroachdb/cockroach/pkg/sql/rowexec/indexbackfiller.go:319
  | github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*indexBackfiller).runBackfill.Group.GoCtx.func3
  | 	github.com/cockroachdb/cockroach/pkg/util/ctxgroup/ctxgroup.go:168
  | golang.org/x/sync/errgroup.(*Group).Go.func1
  | 	golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:78
  | runtime.goexit
  | 	src/runtime/asm_amd64.s:1695
Wraps: (3) failed to construct index entries during backfill
Wraps: (4) batch timestamp 1718847123.942402651,0 must be after replica GC threshold 1719379269.625591541,0
Error types: (1) *markers.withMark (2) *withstack.withStack (3) *errutil.withPrefix (4) *kvpb.BatchTimestampBeforeGCError

Jira issue: CRDB-39823

itsbilal added the C-bug, T-sql-foundations, and O-testcluster labels Jun 26, 2024
blathers-crl bot added this to Triage in SQL Foundations Jun 26, 2024

blathers-crl bot commented Jun 26, 2024

Hi @itsbilal, please add branch-* labels to identify which branch(es) this C-bug affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

itsbilal added the branch-master label Jun 26, 2024
@fqazi
Collaborator

fqazi commented Jun 26, 2024

We are running into two problems in this scenario:

  1. We always clear the protected timestamp, even if a retryable error is hit. See:
    defer func() {
        cleanupError := protectedTimestampCleaner(ctx)
        if cleanupError != nil {
            err = errors.CombineErrors(cleanupError, err)
        }
    }()
  2. The readAsOf timestamp does not properly take the current time into account. If a retry happens, it assumes that GC TTL * 0.8 has to pass again:
    waitBeforeProtectedTS := time.Duration((time.Duration(zoneCfg.GC.TTLSeconds) * time.Second).Seconds() *

@rafiss
Collaborator

rafiss commented Jul 2, 2024

@Dedej-Bergin I'll assign this to you as a bugfix/improvement that would be nice to land, but it's not highly urgent.
