-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
jobs,importccl: adopt protected timestamps for IMPORT #44739
jobs,importccl: adopt protected timestamps for IMPORT #44739
Conversation
3a084b5
to
bb36776
Compare
cc @lucy-zhang this is the PR with the jobs change |
bb36776
to
d965916
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @dt, and @spaskob)
pkg/jobs/registry.go, line 286 at r3 (raw file):
// StartableJob is a job created with a transaction to be started later. // See Registry.CreateStartableJob type StartableJob struct {
please move to jobs.go
pkg/jobs/registry.go, line 304 at r3 (raw file):
} }() // TODO(ajwerner): Consider adding some protection against starting more than
can you add detail on how that can happen?
pkg/jobs/registry.go, line 320 at r3 (raw file):
// CreateStartableJobWithTxn creates a job to be started later, after the // creating txn commits. It stores writes job in the jobs table, marks it
... stores writes ... ?
pkg/jobs/registry.go, line 332 at r3 (raw file):
return nil, err } j.WithTxn(nil)
please add a comment why we need this
pkg/jobs/registry.go, line 338 at r3 (raw file):
} resumerCtx, cancel := r.makeCtx() if err := r.register(*j.ID(), cancel); err != nil {
if the txn is aborted the job will not be unregistered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 5 files at r3.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @dt, and @spaskob)
pkg/ccl/importccl/import_stmt.go, line 630 at r4 (raw file):
StartableJob
why can't we do that here using using QueueJob from planner.go?
cc @lucy-zhang this has the jobs change we talked about - @spaskob and I discussed offline that you should certainly merge your change that utilized |
pkg/ccl/importccl/import_stmt.go
Outdated
// Prepare the protected timestamp record. | ||
spans := make([]roachpb.Span, 0, len(tableDetails)) | ||
for i := range tableDetails { | ||
spans = append(spans, tableDetails[i].Desc.TableSpan()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we only need to add spans for tables where IsNew
is false (which could require revert) and then below we can skip if spans is empty
d965916
to
d8b6f38
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is RFAL.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dt and @spaskob)
pkg/ccl/importccl/import_stmt.go, line 630 at r4 (raw file):
Previously, spaskob (Spas Bojanov) wrote…
StartableJob
why can't we do that here using using QueueJob from planner.go?
I think we discussed this offline.
pkg/ccl/importccl/import_stmt.go, line 652 at r4 (raw file):
Previously, dt (David Taylor) wrote…
I think we only need to add spans for tables where
IsNew
is false (which could require revert) and then below we can skip if spans is empty
Done.
pkg/jobs/registry.go, line 286 at r3 (raw file):
Previously, spaskob (Spas Bojanov) wrote…
please move to jobs.go
Done.
pkg/jobs/registry.go, line 304 at r3 (raw file):
Previously, spaskob (Spas Bojanov) wrote…
can you add detail on how that can happen?
If there's a programming error in the client. I added protection.
pkg/jobs/registry.go, line 320 at r3 (raw file):
Previously, spaskob (Spas Bojanov) wrote…
... stores writes ... ?
Fixed.
pkg/jobs/registry.go, line 332 at r3 (raw file):
Previously, spaskob (Spas Bojanov) wrote…
please add a comment why we need this
Done.
pkg/jobs/registry.go, line 338 at r3 (raw file):
Previously, spaskob (Spas Bojanov) wrote…
if the txn is aborted the job will not be unregistered.
Good point! I've extended the interface to deal with that case and added some testing. The contract is now that user transactions which are rolled back will need to be cleaned up using CleanupOnRollback
. I also added unit testing.
erm actually let me fix the test, must have broken something |
d8b6f38
to
9adacc9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, RFAL now
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dt and @spaskob)
dc340d3
to
607ea4a
Compare
@spaskob can you give the first commit this a look? I'd like to get this merged ASAP |
ok looking now
…On Thu, Feb 27, 2020 at 10:12 AM ajwerner ***@***.***> wrote:
@spaskob <https://github.com/spaskob> can you give the first commit this
a look? I'd like to get this merged ASAP
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#44739?email_source=notifications&email_token=AANL635UYHO7YVKRVLFS2PLRE7J53A5CNFSM4KQCXN6KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOENEWQZY#issuecomment-592013415>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AANL633QVKEM627VOFKKCG3RE7J53ANCNFSM4KQCXN6A>
.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @dt, and @spaskob)
pkg/jobs/jobs.go, line 740 at r5 (raw file):
} // Start will resume a resumable job. The transaction used to create the
Start will start a job; resume can be confused with RESUME job_id
pkg/jobs/jobs_test.go, line 1994 at r5 (raw file):
} // TestStartableJobErrors tests that the StartableJob returns the expected
I know you want to get this merged but a good to have is to search for use of CreateAndStartJob in this test file sand see if it may be useful to add similar test or replace with a StartableJob.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @dt, and @spaskob)
pkg/jobs/jobs_test.go, line 1994 at r5 (raw file):
Previously, spaskob (Spas Bojanov) wrote…
I know you want to get this merged but a good to have is to search for use of CreateAndStartJob in this test file sand see if it may be useful to add similar test or replace with a StartableJob.
I don't know that I follow. CreateAndStartJob is implemented on top of StartableJob. All tests which exercise it exercise the StartableJob code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok great then, I missed that.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @dt, and @spaskob)
4f92a1d
to
8d32c56
Compare
The remaining test failures are addressed by #45544 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for jobs code
Reviewed 1 of 7 files at r5, 4 of 9 files at r6.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @dt, and @spaskob)
pkg/ccl/importccl/import_processor_test.go, line 611 at r6 (raw file):
pkg/ccl/importccl/import_processor_test.go, line 611 at r6 (raw file):
pkg/ccl/importccl/import_processor_test.go, line 612 at r6 (raw file):
resumer := raw.(*importResumer)
8d32c56
to
dd9f637
Compare
StartableJob isn't great but it's better than what we've got now. A StartableJob is a job which is created transactionally but also hooked in to a `resultCh` and can be started on the local gateway after the transaction commits. Currently we've got two main ways that jobs are kicked off "synchronously": 1) Registry.Run() This doesn't currently do anything to kick off the jobs - it just waits for all of the relevant jobs to be started. This is used in conjunction with `planner.QueueJob` and `CreateJobWithTxn`. The idea here is that when a transaction creates a job we want, upon commit, for it to wait for the jobs it created to complete. 2) Registry.CreateAndStartJob() This method has special powers that prevents the gateway node from adopting the job and also writes the job with a lease for the local node. It also allows the caller to pass a resultsCh which will recieve datums from the job. That resultCh is critical for IMPORT and BACKUP (?). Currently in both of those cases we create and run the job on a transaction outside of the caller's txn. This is particularly frustrating because the jobs package assigns IDs at creation time, making it difficult to transactionally associate a job with another resource while also retaining the ability to ensure that it starts locally and is connected to a resultsCh. This commit addresses that. The fact that `IMPORT` and `BACKUP` commit the job before committing the planner's txn feels dirty. I'm not planning on mucking with that stuff here in this PR but ultimately the work here seems to lay a possible path forward. In particular, instead of calling `Run()` with a slice of IDs we could call it with a slice of `StartableJob`s which may be hooked up to the channel. In that way we could unify CreateAndStartJob() and Run() with something better. Release note: None.
Perhaps this only needed to be applied to IMPORT INTO but it doesn't hurt import. This change ensures that IMPORT statements protect the spans into which they'll be writing from being GC'd. The change adopts the StartableJob change from the previous commit, protects a timestamp, and then makes sure to clean up that timestamp on success or failure. Before this change the test added here would fail with the following error: ``` require.go:1104: Error Trace: import_into_test.go:152 Error: Expect "pq: job 526896149330362369: cannot be reverted, manual cleanup may be required: rolling back partially completed IMPORT: cannot revert before replica GC threshold 1580866345.938283735,0" to match "error response from server: 500 Internal Server Error" Test: TestProtectedTimestampsDuringImportInto ``` Release note (bug fix): IMPORT INTO jobs which are canceled or fail can no longer get stuck in an unrecoverable state if data from the previous state of the table had expired relative to the GC TTL.
The logic to detect whether a type is clonable cannot traverse into all of the implementations of itnerfaces. This is problematic when one of the structs underneath a oneof is unclonable. While a more general solution might be nice, this workaround allows the addition of a UUID field to a job details struct. Release note: None
dd9f637
to
bfa371b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
bors r+
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @dt, and @spaskob)
pkg/jobs/jobs.go, line 740 at r5 (raw file):
Previously, spaskob (Spas Bojanov) wrote…
Start will start a job; resume can be confused with
RESUME job_id
Done.
Build succeeded |
This PR comes in two commits. See each for details.
jobs.Registry
API to enable transactional creation of a job which will be started on the local gateway. I would love to discuss this commit itself a bit.