workload/schemachange: running workload with same seed generates different schema identifiers #117422

renatolabs · 2024-01-05T21:04:06Z

Running the schemachange workload with the same seed (with COCKROACH_RANDOM_SEED=x) produces the same operations, but with different identifiers. For example, at the time of writing, these are the first two operations generated by running the following command two times:

COCKROACH_RANDOM_SEED=9082679504586840611 ./bin/workload run schemachange --concurrency 2 --max-ops 10 --verbose 1

Run 1:

{
 "workerId": 1,
 "clientTimestamp": "20:42:05.993818",
 "ops": [
  "BEGIN",
  {
   "sql": "CREATE INDEX \"IrrelevantName\" ON public.table2 (\"IrrelevantColumn\" ASC)",
   "expectedExecErr": "42P01"
  }
 ],
 "expectedExecErrors": "42P01",
 "expectedCommitErrors": "",
 "message": "ROLLBACK; Successfully got expected execution error.: ERROR: relation \"public.table2\" does not exist (SQLSTATE 42P01)"
}
{
 "workerId": 0,
 "clientTimestamp": "20:42:05.993707",
 "ops": [
  "BEGIN",
  {
   "sql": "DROP TABLE public.table3",
   "expectedExecErr": "42P01"
  }
 ],
 "expectedExecErrors": "42P01",
 "expectedCommitErrors": "",
 "message": "ROLLBACK; Successfully got expected execution error.: ERROR: relation \"public.table3\" does not exist (SQLSTATE 42P01)"
}

Run 2:

{
 "workerId": 1,
 "clientTimestamp": "20:42:13.542074",
 "ops": [
  "BEGIN",
  {
   "sql": "CREATE INDEX \"IrrelevantName\" ON public.table7 (\"IrrelevantColumn\" ASC)",
   "expectedExecErr": "42P01"
  }
 ],
 "expectedExecErrors": "42P01",
 "expectedCommitErrors": "",
 "message": "ROLLBACK; Successfully got expected execution error.: ERROR: relation \"public.table7\" does not exist (SQLSTATE 42P01)"
}
{
 "workerId": 0,
 "clientTimestamp": "20:42:13.542037",
 "ops": [
  "BEGIN",
  {
   "sql": "DROP TABLE public.table8",
   "expectedExecErr": "42P01"
  }
 ],
 "expectedExecErrors": "42P01",
 "expectedCommitErrors": "",
 "message": "ROLLBACK; Successfully got expected execution error.: ERROR: relation \"public.table8\" does not exist (SQLSTATE 42P01)"
}

There seems to be some other form of non determinism outside the seed passed that makes the identifiers different. Having schemachange generate the same identifiers has at least two big advantages:

increases the chances of us being able to reproduce failures, especially in cases where the failure happens due to interactions of several calls to the schemachange workload (for example, the failure observed in roachtest: set random seed in schemachange step of version-upgrade #117409).
ease of debugging: it's a better debugging experience if we are able to run the test with the same seed and scan the output generated by the workload if the identifiers are identical.

Jira issue: CRDB-35168

The text was updated successfully, but these errors were encountered:

rafiss · 2024-01-09T19:16:20Z

#117332 may have addressed this. @renatolabs do you have an easy way to check?

renatolabs · 2024-01-09T20:00:03Z

@renatolabs do you have an easy way to check?

Easiest way, IMO, would be to run the workload with the same seed and comparing, as I did in the issue.

I ran the commands above again on master (2c3b07e) and I still see different identifiers.

rafiss · 2024-01-09T20:57:38Z

Ah I see, thanks.

It looks like the names are generated here:

cockroach/pkg/workload/schemachange/operation_generator.go

Lines 3580 to 3585 in 6bf526e

    
           if og.randIntn(100) >= pctExisting { 
        
           	treeTableName := tree.MakeTableNameFromPrefix(tree.ObjectNamePrefix{ 
        
           		SchemaName:     tree.Name(desiredSchema), 
        
           		ExplicitSchema: true, 
        
           	}, tree.Name(fmt.Sprintf("table%d", og.newUniqueSeqNum()))) 
        
           	return &treeTableName, nil

The problem is that part of the name is og.newUniqueSeqNum() , which is just a counter shared between all the workers, and it doesn't use the RNG. So that means the names are dependent on goroutine scheduling.

We can't just make seqNum unique per worker, since then that could mean different workers would both try to create a table with the same name. But perhaps we can make the new table name be of the format table_{workerID}_{seqNum}

renatolabs · 2024-01-09T21:25:47Z

we can make the new table name be of the format table_{workerID}_{seqNum}

Yeah, I like that option!

117054: CRDB-28040 : JWKS fetch from jwks_uri r=BabuSrithar a=BabuSrithar Fixes CRDB-28040 ( JWKS fetch from jwks_uri ) This commit adds capability to fetch remote JWKS from issuer's jwks_uri endpoint. This will satisfy the requirement to have an ability to automatically fetch the new JWK when the existing JWK is rotated - without human intervention or custom scripts. Changes include 1. The existing order of token signature verification first and rest of claims next is modified to get issuer first and then the token signature verification. This change is requied to determine the issuer for which the jwks has to be fetched remotely. 2. Introduction of a new cluster setting called `server.jwt_authentication.jwks_auto_fetch.enabled` 3. Depending on the value of `server.jwt_authentication.jwks_auto_fetch.enabled` use JWKS configured through cluster setting or remotely fetch JWKS from jwks_uri of the issuer 4. Modification to exiting test cases to match the new order of verification steps. The change is backward compatible and not changes required in existing deployments and JWT Auth usage. 117226: kvserver: export latency histogram for time spent waiting for latches r=lyang24 a=lyang24 This commit adds a `HdrHistogram` that measures time spend waiting on latches for better observability on long latch waits. This histogram is stored in StorageMetrics and latch manager will receive a pointer to it when replica init. It will measure the time spend on wait method of spanlatch manager. sanity check new metrics with kv workload <img width="990" alt="Screenshot 2024-02-02 at 10 54 12 AM" src="https://github.com/cockroachdb/cockroach/assets/20375035/226610e2-dd86-424a-bf13-911cbc96dc07"> Fixes: #104682 Release note: None 118519: sql/schemachanger: remove alter table add column limitations with not null constraints r=fqazi a=fqazi Previously we had the following limitations with ALTER TABLE ADD COLUMN: 1. We disallowed ADD COLUMN ... VIRTUAL NOT NULL since our implementation did not do validation properly allowing this statement to succeed when it should fail. 2. We disallowed ADD COLUMN .... UNIQUE NOT NULL because there was a risk of data loss in our index implementation. This was resolved by allowing multiple table mutations in a single statement. This patch enables both of these statements after adding validation inside the end to end testing to ensure that DML injected scenarios work correctly. This patch also enhances some of the output generated from the DML injection tests to make things easier to diagnose. Fixes: #117647 118666: mixedversion: unify signature of user and framework functions r=DarrylWong a=renatolabs Previously, the signature of functions (hooks) provided by the user was defined by the `userFunc` type, and the implementation of framework steps was defined by the `Run` function of the `singleStep` interface. These functions were slightly different even though there was no need for the distinction. This commit unifies the signature of both types of functions and renames `userFunc` to `stepFunc` to reflect the generality. With this change, functions implemented by the framework now also have an RNG instance passed to them automatically. This will become especially useful when we have mutator steps, where some randomization will often take place. Epic: none Release note: None **mixedversion: simplify steps by removing crdbNodes requirement** Some steps implemented by the framework needed to be instantiated with `crdbNodes`, a list of nodes where cockroachdb is running. This was both repetitive and unnecessary. All steps have access to the helper struct, which should given access, via private fields, to the list of crdb nodes. 119042: builtins,workload: add setseed function; use it to make RSW more deterministic r=rafiss a=rafiss This implements the setseed function that appears in PostgreSQL. We then use it to make the ORDER BY random() clauses used in the random schema workload (RSW) deterministic. Each worker goroutine of the RSW now runs a determistic sequence of operations given the same COCKROACH_RANDOM_SEED. However, since the workload is concurrent, the overall RSW is not deterministic, since the interleaving of goroutine execution is random. informs #117422 Release note (sql change): Added the setseed() builtin function. It sets the seed for the random generator used by the random() builtin function. 119044: workload/schemachange: enable DROP SCHEMA statements r=rafiss a=rafiss This removes an outdated check for cross-schema type references. DROP SCHEMA used to not support that, but it does now. fixes #116792 Release note: None 119075: release: fix release branch pattern r=jlinder a=rail Previously, we used different patterns to match release branches in `make-and-publish-build-tagging.sh` and `make-and-publish-build=artifacts-docker.sh`. This PR syncs the patterns they use. Epic: none Release note: none Release justification: release automation changes Co-authored-by: babusrithar <babusrithar@cockroachlabs.com> Co-authored-by: lyang24 <lanqingy@usc.edu> Co-authored-by: Faizan Qazi <faizan@cockroachlabs.com> Co-authored-by: Renato Costa <renato@cockroachlabs.com> Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com> Co-authored-by: Rail Aliiev <rail@iqchoice.com>

renatolabs added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-testing Testing tools and infrastructure T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) labels Jan 5, 2024

blathers-crl bot added this to Triage in SQL Foundations Jan 5, 2024

renatolabs mentioned this issue Jan 5, 2024

roachtest: set random seed in schemachange step of version-upgrade #117409

Merged

rafiss self-assigned this Jan 9, 2024

rafiss mentioned this issue Jan 9, 2024

workload,randgen: make schema object names more deterministic #117585

Merged

craig bot closed this as completed in 68f836a Jan 25, 2024

SQL Foundations automation moved this from Triage to Done Jan 25, 2024

rafiss mentioned this issue Feb 10, 2024

builtins,workload: add setseed function; use it to make RSW more deterministic #119042

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

workload/schemachange: running workload with same seed generates different schema identifiers #117422

workload/schemachange: running workload with same seed generates different schema identifiers #117422

renatolabs commented Jan 5, 2024 •

edited by cockroach-jira-scripts

rafiss commented Jan 9, 2024

renatolabs commented Jan 9, 2024

rafiss commented Jan 9, 2024

renatolabs commented Jan 9, 2024

workload/schemachange: running workload with same seed generates different schema identifiers #117422

workload/schemachange: running workload with same seed generates different schema identifiers #117422

Comments

renatolabs commented Jan 5, 2024 • edited by cockroach-jira-scripts

rafiss commented Jan 9, 2024

renatolabs commented Jan 9, 2024

rafiss commented Jan 9, 2024

renatolabs commented Jan 9, 2024

renatolabs commented Jan 5, 2024 •

edited by cockroach-jira-scripts