
jobs,sql: create jobs in a batch in a transactions #67991

Merged
merged 2 commits into from
Aug 5, 2021

Conversation

sajjadrizvi

@sajjadrizvi sajjadrizvi commented Jul 23, 2021

This commit changes the way jobs are created in a transaction.
Previously, jobs were created individually during the execution
of a transaction, and each job creation incurred a round trip to
insert an entry in the system.jobs table.

This commit introduces a new function in Registry, 'CreateJobsWithTxn',
that allows creating jobs in a batch using a single SQL statement. Using
this function, this commit creates jobs in a batch during a transaction, which
reduces the number of round trips between distant nodes during transaction
execution. This is a step toward improving the performance of GRANT/REVOKE
queries in large databases.

Release note: None

Fixes: #64389
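For context, the round-trip savings described above come from folding N single-row inserts into one multi-row INSERT whose placeholder list grows with the number of jobs. The sketch below is illustrative only: `batchInsertStmt` and the three-column shape are hypothetical stand-ins, not the PR's actual code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// batchInsertStmt builds one multi-row INSERT for n jobs, so a transaction
// pays a single round trip to system.jobs instead of one per job. The
// column set here (id, status, payload) is a simplified stand-in.
func batchInsertStmt(n int) string {
	var b strings.Builder
	b.WriteString(`INSERT INTO system.jobs (id, status, payload) VALUES `)
	argIdx := 1
	for i := 0; i < n; i++ {
		if i > 0 {
			b.WriteString(", ")
		}
		b.WriteString("(")
		for j := 0; j < 3; j++ {
			if j > 0 {
				b.WriteString(", ")
			}
			b.WriteString("$" + strconv.Itoa(argIdx))
			argIdx++
		}
		b.WriteString(")")
	}
	return b.String()
}

func main() {
	fmt.Println(batchInsertStmt(2))
}
```

The statement's arguments are then bound once and the whole batch is executed inside the transaction.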

@sajjadrizvi sajjadrizvi requested a review from a team July 23, 2021 16:23
@cockroach-teamcity
Member

This change is Reviewable

@sajjadrizvi sajjadrizvi removed the request for review from a team July 23, 2021 16:43
@sajjadrizvi sajjadrizvi marked this pull request as draft July 23, 2021 16:44
@sajjadrizvi sajjadrizvi marked this pull request as ready for review July 24, 2021 10:40
@sajjadrizvi sajjadrizvi requested a review from a team July 24, 2021 10:40
@sajjadrizvi sajjadrizvi changed the title jobs: create jobs in a batch [DNW] jobs: create jobs in a batch Jul 24, 2021
@sajjadrizvi sajjadrizvi changed the title [DNW] jobs: create jobs in a batch [DNM] jobs: create jobs in a batch Jul 24, 2021
@sajjadrizvi sajjadrizvi marked this pull request as draft July 24, 2021 23:51
@sajjadrizvi sajjadrizvi changed the title [DNM] jobs: create jobs in a batch [DNM] jobs,sql: create jobs in a batch in a transactions Jul 25, 2021
@sajjadrizvi sajjadrizvi marked this pull request as ready for review July 25, 2021 00:28
@sajjadrizvi sajjadrizvi requested a review from a team July 25, 2021 00:28
@sajjadrizvi
Author

Although some tests are failing, I would appreciate some feedback. I might be pushing too hard in the tests that are failing. If not, they may lead to further changes, such as adding bounds on the size of a batch.

@sajjadrizvi sajjadrizvi requested review from a team and removed request for a team July 25, 2021 00:48
Contributor

@ajwerner ajwerner left a comment

Preliminary pass on the subway back from the dentist

pkg/jobs/jobs.go Outdated
@@ -76,6 +76,50 @@ type Record struct {
CreatedBy *CreatedByInfo
}

// Specification specifies a job, which will be created in a batch with other jobs.
type Specification struct {
jobID jobspb.JobID
Contributor

I'm starting to think maybe we should just stick ID as a field in Record. It'd simplify the code and some function signatures. I don't see any good reason not to. The history here is that the jobs package used to allocate the id for the user. That turned out to be unfortunate so we changed the client to allocate the id. May as well just move the ID into the record.

// in a single insert to the jobs table.
// There must be at least one job to create, otherwise the function returns an error.
// If an error occurs while preparing the insert statement, no job is created.
//TODO(sajjad): To discuss: What batch size do we expect? Should we limit the
Contributor

Yeah we'll want a maximum batch size. Start with just some constant. Maybe 100? We'll want to make sure we test using multiple batches.

// column's value will be large due to large statement size. Are there any
// implications that can result in exceeding limits such as columns size limits?
func (r *Registry) CreateJobsWithTxn(
ctx context.Context, jobSpecs []*Specification, txn *kv.Txn,
Contributor

nit: put the transaction before the jobs

@@ -93,7 +93,7 @@ type extendedEvalContext struct {
Jobs *jobsCollection

// SchemaChangeJobCache refers to schemaChangeJobsCache in extraTxnState.
SchemaChangeJobCache map[descpb.ID]*jobs.Job
SchemaChangeJobCache map[descpb.ID]*jobs.Specification
Contributor

My guess is that we're going to want to nuance this data structure a bit. There are places where we delete from this cache. Those places are trying to be a bit clever and used to rely on the fact that we had already written the job. I'm thinking what we want is to create a slice of job records and a map from descriptor ID to the record. Then in the places where today we delete from the map we'd retain the record in the slice.

I suspect this is the cause of some of the failing tests.
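The structure being suggested here, a slice of all records plus a map from descriptor ID to record, where "deleting" only clears the index but retains the record for the batch write at commit, can be sketched as follows. `jobRecords`, `Record`, and `DescID` are hypothetical stand-ins for the jobs/descpb types, not the actual code.

```go
package main

import "fmt"

// Hypothetical stand-ins for jobs.Record and descpb.ID.
type Record struct{ Description string }
type DescID int

// jobRecords keeps every record created in the transaction in a slice (so
// they can all be flushed in one batch at commit), plus an index by
// descriptor ID for callers that look records up or "delete" them.
type jobRecords struct {
	all    []*Record
	byDesc map[DescID]*Record
}

func newJobRecords() *jobRecords {
	return &jobRecords{byDesc: map[DescID]*Record{}}
}

func (j *jobRecords) add(id DescID, r *Record) {
	j.all = append(j.all, r)
	j.byDesc[id] = r
}

// forget drops the index entry but retains the record in the slice, so the
// job is still written at commit time.
func (j *jobRecords) forget(id DescID) {
	delete(j.byDesc, id)
}

func main() {
	jr := newJobRecords()
	jr.add(1, &Record{Description: "schema change"})
	jr.forget(1)
	fmt.Println(len(jr.all), len(jr.byDesc))
}
```

The point of the split is that places which previously relied on the job already being written can stop indexing a record without losing it from the batch.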

}
log.Infof(ctx, "queued new schema change job %d for schema %d", newJob.ID(), desc.ID)
newSpec := jobs.NewSpecification(p.extendedEvalCtx.ExecCfg.JobRegistry.MakeJobID(), jobRecord)
//TODO(sajjad): To discuss: Previous code didn't cache the new job. I think
Contributor

Can you look at what QueueJob does?

Author

@sajjadrizvi sajjadrizvi left a comment

Thanks for the quick review.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)


pkg/jobs/jobs.go, line 81 at r2 (raw file):

Previously, ajwerner wrote…

I'm starting to think maybe we should just stick ID as a field in Record. It'd simplify the code and some function signatures. I don't see any good reason not to. The history here is that the jobs package used to allocate the id for the user. That turned out to be unfortunate so we changed the client to allocate the id. May as well just move the ID into the record.

I have not built a sufficient context yet. I am following your judgement here.


pkg/jobs/registry.go, line 375 at r2 (raw file):

Previously, ajwerner wrote…

Yeah we'll want a maximum batch size. Start with just some constant. Maybe 100? We'll want to make sure we test using multiple batches.

I will set a max batch size. We may want to make it a cluster setting, but not for now.

I already have a test case that checks multiple batches. The test fails for larger batches.


pkg/jobs/registry.go, line 380 at r2 (raw file):

Previously, ajwerner wrote…

nit: put the transaction before the jobs

Done.


pkg/sql/planner.go, line 96 at r2 (raw file):

Previously, ajwerner wrote…

My guess is that we're going to want to nuance this data structure a bit. There are places where we delete from this cache. Those places are trying to be a bit clever and used to rely on the fact that we had already written the job. I'm thinking what we want is to create a slice of job records and a map from descriptor ID to the record. Then in the places where today we delete from the map we'd retain the record in the slice.

I suspect this is the cause of some of the failing tests.

OK. I guess that clever function is markTableMutationJobsSuccessful. We will end up deleting the slice only in resetExtraTxnState.

It might be the cause of the error! However, the test logs show messages related to throttling and ranges going down.


pkg/sql/schema.go, line 87 at r2 (raw file):

Previously, ajwerner wrote…

Can you look at what QueueJob does?

Yeah, it creates a new job and adds the job to the Jobs collection. So this is a place where we want the job to be only in the Jobs collection but not in the cache. I will have to preserve this.

Contributor

@ajwerner ajwerner left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @sajjadrizvi)


pkg/jobs/jobs.go, line 81 at r2 (raw file):

Previously, sajjadrizvi (Sajjad Rizvi) wrote…

I have not built a sufficient context yet. I am following your judgement here.

Yeah, my advice is that we add ID as a field to record. It should simplify the interfaces.


pkg/jobs/registry.go, line 375 at r2 (raw file):

Previously, sajjadrizvi (Sajjad Rizvi) wrote…

I will set a max batch size. We may want to make it a cluster setting, but not for now.

I already have a test case that checks multiple batches. The test fails for larger batches.

It should not fail. Instead this function should have a loop to send multiple batches.


pkg/sql/conn_executor_exec.go, line 807 at r2 (raw file):

Quoted 10 lines of code…

	// 1. I am not sure if I am returning the error correctly.
	// 2. Is this the right place to create jobs? I have placed it outside of the
	//    the following code block because the jobs were being created outside of
	//    it in the previous code.
	// 3. If I have add any ex.phaseTimes, which one should it be? Do I have to
	//    create a new one?
	if err := ex.createJobs(ctx); err != nil {
		return ex.makeErrEvent(err, ast)
	}

This logic should be pulled into commitSQLTransactionInternal


pkg/sql/conn_executor_exec.go, line 824 at r2 (raw file):

	}
	var specList []*jobs.Specification
	//TODO(sajjad): To discuss: Does the order of jobs creation matter?

No, it does not matter.


pkg/sql/planner.go, line 96 at r2 (raw file):

the test logs show messages related to throttling and ranges going down.

I have a feeling that that's just noise at the end of the test and is not the true failure.


pkg/sql/schema.go, line 87 at r2 (raw file):

Previously, sajjadrizvi (Sajjad Rizvi) wrote…

Yeah, it creates a new job and adds the job to Jobs collection. So this is a place where want the job to be only in the Jobs collection but not in the cache. I will have to preserve this.

My guess is that not putting the descriptor into the cache was an oversight.


pkg/sql/schema.go, line 90 at r2 (raw file):

		// the newSpec should be cached.
		p.extendedEvalCtx.SchemaChangeJobCache[desc.ID] = newSpec
		log.Infof(ctx, "cached specification of new schema change job %d for schema %d", newSpec.ID(), desc.ID)

This is not a good log message. It would be extremely confusing for a reader. I think the old message was pretty reasonable. Your change of the implementation here doesn't really affect the behavior at all.

Author

@sajjadrizvi sajjadrizvi left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @sajjadrizvi)


pkg/jobs/registry.go, line 375 at r2 (raw file):

Previously, ajwerner wrote…

It should not fail. Instead this function should have a loop to send multiple batches.

I meant the batch size set in the test function. Nevertheless, I have implemented as suggested.


pkg/sql/conn_executor_exec.go, line 807 at r2 (raw file):

Previously, ajwerner wrote…

	// 1. I am not sure if I am returning the error correctly.
	// 2. Is this the right place to create jobs? I have placed it outside of the
	//    the following code block because the jobs were being created outside of
	//    it in the previous code.
	// 3. If I have add any ex.phaseTimes, which one should it be? Do I have to
	//    create a new one?
	if err := ex.createJobs(ctx); err != nil {
		return ex.makeErrEvent(err, ast)
	}

This logic should be pulled into commitSQLTransactionInternal

Done.


pkg/sql/conn_executor_exec.go, line 824 at r2 (raw file):

Previously, ajwerner wrote…

No, it does not matter.

Done.


pkg/sql/schema.go, line 90 at r2 (raw file):

Previously, ajwerner wrote…

This is not a good log message. It would be extremely confusing for a reader. I think the old message was pretty reasonable. Your change of the implementation here doesn't really affect the behavior at all.

Done.

Author

@sajjadrizvi sajjadrizvi left a comment

Sorry I forgot to publish the comments before pushing the changes.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @sajjadrizvi)

Author

@sajjadrizvi sajjadrizvi left a comment

This PR is now ready for another review.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @sajjadrizvi)

@sajjadrizvi sajjadrizvi changed the title [DNM] jobs,sql: create jobs in a batch in a transactions jobs,sql: create jobs in a batch in a transactions Jul 27, 2021
Contributor

@ajwerner ajwerner left a comment

Nice, this is getting close.

Reviewed 5 of 14 files at r8.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @sajjadrizvi)


pkg/bench/rttanalysis/testdata/benchmark_expectations, line 80 at r8 (raw file):

Quoted 6 lines of code…
26,Truncate/truncate_1_column_0_rows
26,Truncate/truncate_1_column_1_row
26,Truncate/truncate_1_column_2_rows
26,Truncate/truncate_2_column_0_rows
26,Truncate/truncate_2_column_1_rows
26,Truncate/truncate_2_column_2_rows

I have a feeling we're going to want to retain the range of round-trips here (put the 26-27 back).


pkg/jobs/registry.go, line 375 at r8 (raw file):

no further jobs are created and the error is returned.

I find this language confusing. The caller shouldn't need to care about other jobs having been created. It makes it sound like it is not a transactional operation.


pkg/jobs/registry.go, line 378 at r8 (raw file):

func (r *Registry) CreateJobsWithTxn(
	ctx context.Context, txn *kv.Txn, records []*Record,
) ([]*Job, error) {

Do callers want the *Job or would they be happy with just a slice of IDs? I feel like IDs might be better.


pkg/jobs/registry.go, line 382 at r8 (raw file):

		return nil, errors.Errorf("no jobs to create.")
	}

Can you refactor this a bit into a helper to write a batch and a helper to generate the statement and args? I think it should read something like:

created := make([]*Job, 0, len(records))
for toCreate := records; len(toCreate) > 0; {
    const maxBatchSize = 100
    batchSize := len(toCreate)
    if batchSize > maxBatchSize {
        batchSize = maxBatchSize
    }
    createdInBatch, err := r.createJobsWithTxnInBatch(ctx, txn, toCreate[:batchSize])
    if err != nil {
        return nil, err
    }
    created = append(created, createdInBatch...)
    toCreate = toCreate[batchSize:]
}

func (r *Registry) createJobsWithTxnInBatch(
    ctx context.Context, txn *kv.Txn, records []*Record,
) ([]*Job, error) {
    s, err := r.sqlInstance.Session(ctx)
    if err != nil {
        return nil, errors.Wrap(err, "error getting live session")
    }
    stmt, args := batchJobInsertStmt(s, records)
    if _, err := r.ex.Exec(ctx, "job-rows-batch-insert", txn, stmt, args...); err != nil {
        return nil, err
    }
    // Return the jobs etc. Maybe construct them when you construct the statement and args.
}

pkg/jobs/registry.go, line 454 at r8 (raw file):

			if _, err = r.ex.Exec(ctx, "job-rows-batch-insert", txn, stmtSB.String(), args...,
			); err != nil {

nit: I find this wrapping to be bizarre. My preference is:

		if _, err = r.ex.Exec(
			ctx, "job-rows-batch-insert", txn, stmtSB.String(), args...,
		); err != nil {

pkg/jobs/registry_test.go, line 302 at r8 (raw file):

				}

				skip.UnderStress(t)

Isn't this covered above?


pkg/jobs/registry_test.go, line 334 at r8 (raw file):

				// Create jobs in a batch.
				var jobs []*Job
				require.NoError(t, kvDB.Txn(ctx, func(ctx context.Context, txn *kv.Txn) error {

I think you're going to have a bad time if the callback gets called more than once. You should reset the jobs slice in the callback.
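The hazard described here: kv transaction callbacks can run more than once if the transaction restarts, so any slice appended to inside the callback must be reset at the top. A minimal sketch with a stand-in `runTxn` (a hypothetical simplification, not the real `kvDB.Txn`):

```go
package main

import "fmt"

// runTxn is a stand-in for kvDB.Txn: it may invoke the callback more than
// once when the transaction restarts. Here we force one retry.
func runTxn(fn func() error) error {
	_ = fn() // first attempt; pretend it was discarded by a restart
	return fn()
}

// createJobsTwiceSafely shows the fix: reset the slice inside the
// callback so a retried transaction does not double-append.
func createJobsTwiceSafely() []int64 {
	var createdIDs []int64
	_ = runTxn(func() error {
		// Reset at the top of the callback; without this, a restarted
		// transaction would append a second copy of every ID.
		createdIDs = createdIDs[:0]
		createdIDs = append(createdIDs, 1, 2, 3)
		return nil
	})
	return createdIDs
}

func main() {
	fmt.Println(len(createJobsTwiceSafely()))
}
```

Without the `createdIDs[:0]` reset, the slice would hold six entries after the simulated retry instead of three.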


pkg/jobs/registry_test.go, line 350 at r8 (raw file):

Quoted 4 lines of code…

				// and description? I was expecting that the type will be "IMPORT" in
				// the jobs table when the job is created, but it occurred to be NULL. Is
				// that expected? Similarly, description is NULL.

When you say type what do you mean? Which table? The description is a field in the record so it should be what you set it to. Maybe set it to some expected values.


pkg/sql/drop_table.go, line 509 at r8 (raw file):

	for _, mj := range tableDesc.MutationJobs {
		jobID := jobspb.JobID(mj.JobID)
		if record, exists := p.ExtendedEvalContext().SchemaChangeJobCache[tableDesc.ID]; exists {

This deserves some commentary for future readers as well as some commentary about this behavior on the function itself.


pkg/sql/table.go, line 119 at r8 (raw file):

Quoted 4 lines of code…
	var record *jobs.Record
	if cachedRecord, ok := p.extendedEvalCtx.SchemaChangeJobCache[tableDesc.ID]; ok {
		record = cachedRecord
	}

nit: replace this with the following code to allow you to eliminate some code.

record, recordExists :=  p.extendedEvalCtx.SchemaChangeJobCache[tableDesc.ID]

Author

@sajjadrizvi sajjadrizvi left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)


pkg/bench/rttanalysis/testdata/benchmark_expectations, line 80 at r8 (raw file):

Previously, ajwerner wrote…
26,Truncate/truncate_1_column_0_rows
26,Truncate/truncate_1_column_1_row
26,Truncate/truncate_1_column_2_rows
26,Truncate/truncate_2_column_0_rows
26,Truncate/truncate_2_column_1_rows
26,Truncate/truncate_2_column_2_rows

I have a feeling we're going to want to retain the range of round-trips here (put the 26-27 back).

Done.


pkg/jobs/jobs.go, line 81 at r2 (raw file):

Previously, ajwerner wrote…

Yeah, my advice is that we add ID as a field to record. It should simplify the interfaces.

Done.


pkg/jobs/registry.go, line 375 at r8 (raw file):

Previously, ajwerner wrote…

no further jobs are created and the error is returned.

I find this language confusing. The caller shouldn't need to care about other jobs having been created. It makes it sound like it is not a transactional operation.

Done.


pkg/jobs/registry.go, line 378 at r8 (raw file):

Previously, ajwerner wrote…

Do callers want the *Job or would they be happy with just a slice of IDs? I feel like IDs might be better.

That's right. I have changed it to IDs.


pkg/jobs/registry.go, line 382 at r8 (raw file):

Previously, ajwerner wrote…

Can you refactor this a bit into a helper to write a batch and a helper to generate the statement and args? I think it should read something like:

created := make([]*Job, 0, len(records))
for toCreate := records; len(toCreate) > 0; {
    const maxBatchSize = 100
    batchSize := len(toCreate)
    if batchSize > maxBatchSize {
        batchSize = maxBatchSize
    }
    createdInBatch, err := r.createJobsWithTxnInBatch(ctx, txn, toCreate[:batchSize])
    if err != nil {
        return nil, err
    }
    created = append(created, createdInBatch...)
    toCreate = toCreate[batchSize:]
}

func (r *Registry) createJobsWithTxnInBatch(
    ctx context.Context, txn *kv.Txn, records []*Record,
) ([]*Job, error) {
    s, err := r.sqlInstance.Session(ctx)
    if err != nil {
        return nil, errors.Wrap(err, "error getting live session")
    }
    stmt, args := batchJobInsertStmt(s, records)
    if _, err := r.ex.Exec(ctx, "job-rows-batch-insert", txn, stmt, args...); err != nil {
        return nil, err
    }
    // Return the jobs etc. Maybe construct them when you construct the statement and args.
}

This is much better. Thanks for the direction. I have updated.


pkg/jobs/registry.go, line 454 at r8 (raw file):

Previously, ajwerner wrote…
			if _, err = r.ex.Exec(ctx, "job-rows-batch-insert", txn, stmtSB.String(), args...,
			); err != nil {

nit: I find this wrapping to be bizarre. My preference is:

		if _, err = r.ex.Exec(
			ctx, "job-rows-batch-insert", txn, stmtSB.String(), args...,
		); err != nil {

Thanks, I have updated.


pkg/jobs/registry_test.go, line 302 at r8 (raw file):

Previously, ajwerner wrote…

Isn't this covered above?

Good catch, that was a mistake.


pkg/jobs/registry_test.go, line 334 at r8 (raw file):

Previously, ajwerner wrote…

I think you're going to have a bad time if the callback gets called more than once. You should reset the jobs slice in the callback.

Done.


pkg/jobs/registry_test.go, line 350 at r8 (raw file):

Previously, ajwerner wrote…

				// and description? I was expecting that the type will be "IMPORT" in
				// the jobs table when the job is created, but it occurred to be NULL. Is
				// that expected? Similarly, description is NULL.

When you say type what do you mean? Which table? The description is a field in the record so it should be what you set it to. Maybe set it to some expected values.

It was job_type in the internal table as we discussed offline.


pkg/sql/drop_table.go, line 509 at r8 (raw file):

Previously, ajwerner wrote…

This deserves some commentary for future readers as well as some commentary about this behavior on the function itself.

OK. Done.


pkg/sql/planner.go, line 96 at r2 (raw file):

Previously, ajwerner wrote…

the test logs show messages related to throttling and ranges going down.

I have a feeling that that's just noise at the end of the test and is not the true failure.

I have updated the structure.


pkg/sql/table.go, line 119 at r8 (raw file):

Previously, ajwerner wrote…
	var record *jobs.Record
	if cachedRecord, ok := p.extendedEvalCtx.SchemaChangeJobCache[tableDesc.ID]; ok {
		record = cachedRecord
	}

nit: replace this with the following code to allow you to eliminate some code.

record, recordExists :=  p.extendedEvalCtx.SchemaChangeJobCache[tableDesc.ID]

Done.

Contributor

@ajwerner ajwerner left a comment

Get the linter passing and fix the nits and I think this is good to go.

Reviewed 1 of 3 files at r1, 1 of 14 files at r10.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @sajjadrizvi)


pkg/jobs/registry_test.go, line 334 at r8 (raw file):

Previously, sajjadrizvi (Sajjad Rizvi) wrote…

Done.

Not sure what I was talking about here


pkg/sql/drop_table.go, line 515 at r10 (raw file):

) error {
	for _, mj := range tableDesc.MutationJobs {
		mutationID := jobspb.JobID(mj.JobID)

mutationID is confusing as the name of a variable storing a job ID. MutationID is a different concept which is related to jobs. jobID was a pretty good name, no?


pkg/sql/drop_table.go, line 521 at r10 (raw file):

		if record, exists := p.ExtendedEvalContext().SchemaChangeJobCache[tableDesc.ID]; exists {
			if record.JobID == mutationID {

nit: inline the condition with &&

Author

@sajjadrizvi sajjadrizvi left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)


pkg/sql/drop_table.go, line 515 at r10 (raw file):

Previously, ajwerner wrote…

mutationID is confusing as the name of a variable storing a job ID. MutationID is a different concept which is related to jobs. jobID was a pretty good name, no?

Yeah, I didn't realize mutation ID is an existing term.

Contributor

@ajwerner ajwerner left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @sajjadrizvi)


pkg/bench/rttanalysis/testdata/benchmark_expectations, line 74 at r10 (raw file):

1,SystemDatabaseQueries/select_system.users_with_empty_database_Name
1,SystemDatabaseQueries/select_system.users_with_schema_Name
2,SystemDatabaseQueries/select_system.users_without_schema_Name

What is this about?


pkg/jobs/registry.go, line 416 at r10 (raw file):

	sessionID sqlliveness.SessionID, txn *kv.Txn, records []*Record,
) (string, []interface{}, []jobspb.JobID, error) {
	const columnValues = " ($%d, $%d, $%d, $%d, $%d, $%d)"

I don't think we're going to need this with the refactor I'm about to propose.


pkg/jobs/registry.go, line 424 at r10 (raw file):

Quoted 5 lines of code…
	start := timeutil.Now()
	if txn != nil {
		start = txn.ReadTimestamp().GoTime()
	}
	modifiedMicros := timeutil.ToUnixMicros(start)

nit: lift this above this call and just pass in the timestamp


pkg/jobs/registry.go, line 427 at r10 (raw file):

	// We have six columns per job.
	args := make([]interface{}, 0, len(records)*6)

I'm going to make a nit-picky suggestion for how to make this feel a little cleaner:

instanceID := r.ID()
const numColumns = 6
columns := []string{`id`, `status`, `payload`, `progress`, `claim_session_id`, `claim_instance_id`}
marshalPanic := func(m proto.Message) []byte {
    data, err := protoutil.Marshal(m)
    if err != nil {
        panic(err)
    }
    return data
}
valueFns := map[string]func(*Record) interface{}{
    `id`:                func(rec *Record) interface{} { return rec.JobID },
    `status`:            func(rec *Record) interface{} { return StatusRunning },
    `payload`:           func(rec *Record) interface{} { return marshalPanic(rec.Payload) },
    `progress`:          func(rec *Record) interface{} { return marshalPanic(rec.Progress) },
    `claim_session_id`:  func(rec *Record) interface{} { return sessionID.UnsafeBytes() },
    `claim_instance_id`: func(rec *Record) interface{} { return instanceID },
}
appendValues := func(rec *Record, vals *[]interface{}) (err error) {
    defer func() {
        switch r := recover().(type) {
        case nil:
        case error:
            err = errors.CombineErrors(err, errors.Wrapf(r, "encoding job %d", rec.JobID))
        default:
            panic(r)
        }
    }()
    for _, c := range columns {
        *vals = append(*vals, valueFns[c](rec))
    }
    return nil
}
args := make([]interface{}, 0, len(records)*numColumns)
jobIDs := make([]jobspb.JobID, 0, len(records))

var buf strings.Builder
buf.WriteString(`INSERT INTO system.jobs (`)
buf.WriteString(strings.Join(columns, ", "))
buf.WriteString(`) VALUES `)
argIdx := 1
for i, rec := range records {
    if i > 0 {
        buf.WriteString(", ")
    }
    buf.WriteString("(")
    for j := range columns {
        if j > 0 {
            buf.WriteString(", ")
        }
        buf.WriteString("$")
        buf.WriteString(strconv.Itoa(argIdx))
        argIdx++
    }
    buf.WriteString(")")
    rec.Progress.ModifiedMicros = modifiedMicros
    if err := appendValues(rec, &args); err != nil {
        return "", nil, nil, err
    }
    jobIDs = append(jobIDs, rec.JobID)
}
return buf.String(), args, jobIDs, nil

Author

@sajjadrizvi sajjadrizvi left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)


pkg/bench/rttanalysis/testdata/benchmark_expectations, line 74 at r10 (raw file):

Previously, ajwerner wrote…
1,SystemDatabaseQueries/select_system.users_with_empty_database_Name
1,SystemDatabaseQueries/select_system.users_with_schema_Name
2,SystemDatabaseQueries/select_system.users_without_schema_Name

What is this about?

I am not sure how it happened. I have now updated only Grant test numbers.


pkg/jobs/registry.go, line 416 at r10 (raw file):

Previously, ajwerner wrote…

I don't think we're going to need this with the refactor I'm about to propose.

OK.


pkg/jobs/registry.go, line 424 at r10 (raw file):

Previously, ajwerner wrote…
	start := timeutil.Now()
	if txn != nil {
		start = txn.ReadTimestamp().GoTime()
	}
	modifiedMicros := timeutil.ToUnixMicros(start)

nit: lift this above this call and just pass in the timestamp

Done.


pkg/jobs/registry.go, line 427 at r10 (raw file):

Previously, ajwerner wrote…

I'm going to make a nit-picky suggestion for how to make this feel a little cleaner:

instanceID := r.ID()
const numColumns = 6
columns := []string{`id`, `status`, `payload`, `progress`, `claim_session_id`, `claim_instance_id`}
marshalPanic := func(m proto.Message) []byte {
    data, err := protoutil.Marshal(m)
    if err != nil {
        panic(err)
    }
    return data
}
valueFns := map[string]func(*Record) interface{}{
    `id`:                func(rec *Record) interface{} { return rec.JobID },
    `status`:            func(rec *Record) interface{} { return StatusRunning },
    `payload`:           func(rec *Record) interface{} { return marshalPanic(rec.Payload) },
    `progress`:          func(rec *Record) interface{} { return marshalPanic(rec.Progress) },
    `claim_session_id`:  func(rec *Record) interface{} { return sessionID.UnsafeBytes() },
    `claim_instance_id`: func(rec *Record) interface{} { return instanceID },
}
appendValues := func(rec *Record, vals *[]interface{}) (err error) {
    defer func() {
        switch r := recover().(type) {
        case nil:
        case error:
            err = errors.CombineErrors(err, errors.Wrapf(r, "encoding job %d", rec.JobID))
        default:
            panic(r)
        }
    }()
    for _, c := range columns {
        *vals = append(*vals, valueFns[c](rec))
    }
    return nil
}
args := make([]interface{}, 0, len(records)*numColumns)
jobIDs := make([]jobspb.JobID, 0, len(records))

var buf strings.Builder
buf.WriteString(`INSERT INTO system.jobs (`)
buf.WriteString(strings.Join(columns, ", "))
buf.WriteString(`) VALUES `)
argIdx := 1
for i, rec := range records {
    if i > 0 {
        buf.WriteString(", ")
    }
    buf.WriteString("(")
    for j := range columns {
        if j > 0 {
            buf.WriteString(", ")
        }
        buf.WriteString("$")
        buf.WriteString(strconv.Itoa(argIdx))
        argIdx++
    }
    buf.WriteString(")")
    rec.Progress.ModifiedMicros = modifiedMicros
    if err := appendValues(rec, &args); err != nil {
        return "", nil, nil, err
    }
    jobIDs = append(jobIDs, rec.JobID)
}
return buf.String(), args, jobIDs, nil

I have updated accordingly.

@sajjadrizvi sajjadrizvi force-pushed the batch_job_creation branch 2 times, most recently from c48274c to 4e6017e Compare July 30, 2021 20:38
Contributor

@ajwerner ajwerner left a comment

:lgtm: mod some nits and CI

Reviewed 6 of 14 files at r12.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @sajjadrizvi)


pkg/bench/rttanalysis/testdata/benchmark_expectations, line 43 at r12 (raw file):

19,DropSequence/drop_1_sequence
27,DropSequence/drop_2_sequences
35,DropSequence/drop_3_sequences

seems like some of these have changed.


pkg/jobs/registry.go, line 462 at r12 (raw file):

	appendValues := func(rec *Record, vals *[]interface{}) (err error) {
		defer func() {
			switch ret := recover(); ret.(type) {

ret is a confusing name for the recovered value. It's explicitly not something being returned.


pkg/sql/conn_executor.go, line 1092 at r12 (raw file):

		jobs jobsCollection

		// schemaChangeJobsCache is a map of descriptor IDs to job Records.

I wonder if this deserves a new name.


pkg/sql/conn_executor_exec.go, line 809 at r12 (raw file):

// createJobs creates jobs for the records cached in schemaChangeJobsCache
// during this transaction.
func (ex *connExecutor) createJobs(ctx context.Context) error {

nit: I'd put this below commitSQLTransactionInternal. I usually like to read functions which invoke helpers before the helpers.


pkg/sql/planner.go, line 96 at r12 (raw file):

	// SchemaChangeJobCache refers to schemaChangeJobsCache in extraTxnState.
	SchemaChangeJobCache map[descpb.ID]*jobs.Record

In parallel to the other question about names, I wonder if this deserves a new name. SchemaChangeJobRecords?

Author

@sajjadrizvi sajjadrizvi left a comment

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @sajjadrizvi)


pkg/sql/planner.go, line 96 at r12 (raw file):

Previously, ajwerner wrote…

In parallel to the other question about names, I wonder if this deserves a new name. SchemaChangeJobRecords?

Done, I have updated the name to SchemaChangeJobRecords.

Contributor

@ajwerner ajwerner left a comment

:lgtm:, bors it when CI passes.

Reviewed 13 of 14 files at r14.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @sajjadrizvi)

Sajjad Rizvi added 2 commits August 5, 2021 14:18
This commit adds a new function in Registry, 'CreateJobsWithTxn',
that allows creating jobs in a batch using a single SQL statement.
This function is a precursor to creating schema-change jobs
in a batch, which will reduce the number of round-trips
between nodes during schema changes.

Release note: None
This commit changes the way schema-change jobs are created in a
transaction. Previously, the jobs were created individually during the
execution of a transaction. Each job creation incurs a round trip to
insert an entry in the jobs table. This commit accumulates the jobs
to be created during a transaction and creates them in a batch, which
is expected to reduce the number of round-trips between distant nodes
during a transaction's execution. This is a step toward improving the
performance of GRANT/REVOKE queries in large databases.

Release note: None

Fixes: cockroachdb#64389
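The accumulate-then-flush pattern described in the commit message above can be sketched roughly as follows. All names here (`txnState`, `queueJob`, `createJobs`, the trimmed-down `Record`) are hypothetical stand-ins for illustration, not the actual connExecutor or jobs-package types:

```go
package main

import "fmt"

// Record is a stand-in for jobs.Record; only the fields needed for the
// sketch are included.
type Record struct {
	DescriptorID int
	Description  string
}

// txnState caches one pending schema-change job record per descriptor ID
// for the duration of a transaction.
type txnState struct {
	schemaChangeJobRecords map[int]*Record
}

// queueJob records a job to be created at commit time; a later record for
// the same descriptor replaces the earlier one.
func (s *txnState) queueJob(rec *Record) {
	if s.schemaChangeJobRecords == nil {
		s.schemaChangeJobRecords = make(map[int]*Record)
	}
	s.schemaChangeJobRecords[rec.DescriptorID] = rec
}

// createJobs flushes all cached records through a single batched call
// instead of one round trip per job.
func (s *txnState) createJobs(createBatch func([]*Record) error) error {
	records := make([]*Record, 0, len(s.schemaChangeJobRecords))
	for _, rec := range s.schemaChangeJobRecords {
		records = append(records, rec)
	}
	return createBatch(records)
}

func main() {
	var s txnState
	s.queueJob(&Record{DescriptorID: 52, Description: "GRANT on table t"})
	s.queueJob(&Record{DescriptorID: 53, Description: "GRANT on table u"})
	_ = s.createJobs(func(recs []*Record) error {
		// Prints: creating 2 jobs in one batch
		fmt.Printf("creating %d jobs in one batch\n", len(recs))
		return nil
	})
}
```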
@sajjadrizvi
Author

TFTR!

bors r=ajwerner

@craig
Contributor

craig bot commented Aug 5, 2021

Build succeeded:

@craig craig bot merged commit d0f2981 into cockroachdb:master Aug 5, 2021
sajjadrizvi pushed a commit to sajjadrizvi/cockroach that referenced this pull request Aug 16, 2021
Commit cockroachdb#67991 introduced a test that turned out to be flaky.
The test runs out of memory sometimes as it creates a very
large batch of jobs. Nevertheless, the test doesn't contribute
much value to the test suite because it only creates a large
batch of jobs. The core functionality is already being tested
with moderately large batch sizes. Therefore, the test is now
removed in this commit.

Release note: None

Fixes: cockroachdb#68962
sajjadrizvi pushed a commit to sajjadrizvi/cockroach that referenced this pull request Aug 17, 2021
Commit cockroachdb#67991 introduced a test that turned out to be flaky.
The test runs out of memory sometimes as it creates a very
large batch of jobs. This fix disables job adoptions to
avoid large memory use.

Release note: None

Fixes: cockroachdb#68962
craig bot pushed a commit that referenced this pull request Aug 17, 2021
68803: sql: add ON UPDATE syntax r=otan a=pawalt

Previously, we did not have parsing implemented for ON UPDATE
expressions on columns. This PR adds parsing for ON UPDATE expressions.

Release note: None

69013: opt: fix panic in findJoinFilterRange r=mgartner a=mgartner

This commit fixes a panic produced in `findJoinFilterRange` when
attempting to access the 0-th constraint in an empty constraint set.

Fixes #68975

There is no release note because the bug is not present in any releases.

Release note: None

69014: jobs: fix test for batch jobs creation, marked as flaky r=ajwerner a=sajjadrizvi

Commit #67991 introduced a test that turned out to be flaky.
The test runs out of memory sometimes as it creates a very
large batch of jobs. This fix disables job adoptions to avoid 
large memory use.

Release note: None

Fixes: #68962

69022: sql: implement combined iterator for transaction statistics r=maryliag a=Azhng

Follow up to #68675

Release note: None

69031: ccl/backupccl: skip TestBackupRestoreSystemJobsProgress r=otan a=otan

Refs: #68571

Reason: flaky test (this is flaking regularly on PRs)

Generated by bin/skip-test.

Release justification: non-production code changes

Release note: None

69034: roachtest/tests/cdc: effectively disable exponential backoff in jobs r=ajwerner a=ajwerner

Fixes #68963.

Release note: None

69045: bazel: swap order of interfaces in generated mocks in `pkg/roachpb` r=rail a=rickystewart

This makes the order match up to what we have in the checked-in
generated code, quashing a diff when you
`dev build --hoist-generated-code`.

Release note: None

Co-authored-by: Peyton Walters <peyton.walters@cockroachlabs.com>
Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
Co-authored-by: Sajjad Rizvi <sajjad@cockroachlabs.com>
Co-authored-by: Azhng <archer.xn@gmail.com>
Co-authored-by: Oliver Tan <otan@cockroachlabs.com>
Co-authored-by: Andrew Werner <awerner32@gmail.com>
Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>
Successfully merging this pull request may close these issues.

jobs: batch job creation