Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

workload/schemachange: running workload with same seed generates different schema identifiers #117422

Closed
renatolabs opened this issue Jan 5, 2024 · 4 comments · Fixed by #117585
Assignees
Labels
A-testing Testing tools and infrastructure C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@renatolabs
Copy link
Collaborator

renatolabs commented Jan 5, 2024

Running the schemachange workload with the same seed (with COCKROACH_RANDOM_SEED=x) produces the same operations, but with different identifiers. For example, at the time of writing, these are the first two operations generated by running the following command two times:

COCKROACH_RANDOM_SEED=9082679504586840611 ./bin/workload run schemachange --concurrency 2 --max-ops 10 --verbose 1

Run 1:

{
 "workerId": 1,
 "clientTimestamp": "20:42:05.993818",
 "ops": [
  "BEGIN",
  {
   "sql": "CREATE INDEX \"IrrelevantName\" ON public.table2 (\"IrrelevantColumn\" ASC)",
   "expectedExecErr": "42P01"
  }
 ],
 "expectedExecErrors": "42P01",
 "expectedCommitErrors": "",
 "message": "ROLLBACK; Successfully got expected execution error.: ERROR: relation \"public.table2\" does not exist (SQLSTATE 42P01)"
}
{
 "workerId": 0,
 "clientTimestamp": "20:42:05.993707",
 "ops": [
  "BEGIN",
  {
   "sql": "DROP TABLE public.table3",
   "expectedExecErr": "42P01"
  }
 ],
 "expectedExecErrors": "42P01",
 "expectedCommitErrors": "",
 "message": "ROLLBACK; Successfully got expected execution error.: ERROR: relation \"public.table3\" does not exist (SQLSTATE 42P01)"
}

Run 2:

{
 "workerId": 1,
 "clientTimestamp": "20:42:13.542074",
 "ops": [
  "BEGIN",
  {
   "sql": "CREATE INDEX \"IrrelevantName\" ON public.table7 (\"IrrelevantColumn\" ASC)",
   "expectedExecErr": "42P01"
  }
 ],
 "expectedExecErrors": "42P01",
 "expectedCommitErrors": "",
 "message": "ROLLBACK; Successfully got expected execution error.: ERROR: relation \"public.table7\" does not exist (SQLSTATE 42P01)"
}
{
 "workerId": 0,
 "clientTimestamp": "20:42:13.542037",
 "ops": [
  "BEGIN",
  {
   "sql": "DROP TABLE public.table8",
   "expectedExecErr": "42P01"
  }
 ],
 "expectedExecErrors": "42P01",
 "expectedCommitErrors": "",
 "message": "ROLLBACK; Successfully got expected execution error.: ERROR: relation \"public.table8\" does not exist (SQLSTATE 42P01)"
}

There seems to be some other form of non determinism outside the seed passed that makes the identifiers different. Having schemachange generate the same identifiers has at least two big advantages:

  • increases the chances of us being able to reproduce failures, especially in cases where the failure happens due to interactions of several calls to the schemachange workload (for example, the failure observed in roachtest: set random seed in schemachange step of version-upgrade #117409).
  • ease of debugging: it's a better debugging experience if we are able to run the test with the same seed and scan the output generated by the workload if the identifiers are identical.

Jira issue: CRDB-35168

@renatolabs renatolabs added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-testing Testing tools and infrastructure T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) labels Jan 5, 2024
@blathers-crl blathers-crl bot added this to Triage in SQL Foundations Jan 5, 2024
@rafiss
Copy link
Collaborator

rafiss commented Jan 9, 2024

#117332 may have addressed this. @renatolabs do you have an easy way to check?

@rafiss rafiss self-assigned this Jan 9, 2024
@renatolabs
Copy link
Collaborator Author

@renatolabs do you have an easy way to check?

Easiest way, IMO, would be to run the workload with the same seed and comparing, as I did in the issue.

I ran the commands above again on master (2c3b07e) and I still see different identifiers.

@rafiss
Copy link
Collaborator

rafiss commented Jan 9, 2024

Ah I see, thanks.

It looks like the names are generated here:

if og.randIntn(100) >= pctExisting {
treeTableName := tree.MakeTableNameFromPrefix(tree.ObjectNamePrefix{
SchemaName: tree.Name(desiredSchema),
ExplicitSchema: true,
}, tree.Name(fmt.Sprintf("table%d", og.newUniqueSeqNum())))
return &treeTableName, nil

The problem is that part of the name is og.newUniqueSeqNum() , which is just a counter shared between all the workers, and it doesn't use the RNG. So that means the names are dependent on goroutine scheduling.

We can't just make seqNum unique per worker, since then that could mean different workers would both try to create a table with the same name. But perhaps we can make the new table name be of the format table_{workerID}_{seqNum}

@renatolabs
Copy link
Collaborator Author

we can make the new table name be of the format table_{workerID}_{seqNum}

Yeah, I like that option!

@craig craig bot closed this as completed in 68f836a Jan 25, 2024
SQL Foundations automation moved this from Triage to Done Jan 25, 2024
craig bot pushed a commit that referenced this issue Feb 12, 2024
117054: CRDB-28040 : JWKS fetch from jwks_uri r=BabuSrithar a=BabuSrithar

Fixes CRDB-28040 ( JWKS fetch from jwks_uri )
This commit adds capability to fetch remote JWKS from issuer's jwks_uri endpoint. This will satisfy the requirement to have an ability to automatically fetch the new JWK when the existing JWK is rotated - without human intervention or custom scripts.

Changes include

1. The existing order of token signature verification first and rest of claims next is modified to get issuer first and then the token signature verification. This change is requied to determine the issuer for which the jwks has to be fetched remotely.

2. Introduction of a new cluster setting called `server.jwt_authentication.jwks_auto_fetch.enabled`

3. Depending on the value of `server.jwt_authentication.jwks_auto_fetch.enabled` use JWKS configured through cluster setting or remotely fetch JWKS from jwks_uri of the issuer

4. Modification to exiting test cases to match the new order of verification steps.

The change is backward compatible and not changes required in existing deployments and JWT Auth usage.

117226: kvserver: export latency histogram for time spent waiting for latches r=lyang24 a=lyang24

This commit adds a `HdrHistogram` that measures time spend waiting on latches for better observability on long latch waits. This histogram is stored in StorageMetrics and latch manager will receive a pointer to it when replica init. It will measure the time spend on wait method of spanlatch manager.

sanity check new metrics with kv workload
<img width="990" alt="Screenshot 2024-02-02 at 10 54 12 AM" src="https://github.com/cockroachdb/cockroach/assets/20375035/226610e2-dd86-424a-bf13-911cbc96dc07">

Fixes: #104682

Release note: None

118519: sql/schemachanger: remove alter table add column limitations with not null constraints r=fqazi a=fqazi

Previously we had the following limitations with ALTER TABLE ADD COLUMN:

1. We disallowed ADD COLUMN ... VIRTUAL NOT NULL since our implementation did not do validation properly allowing this statement to succeed when it should fail.
2. We disallowed  ADD COLUMN .... UNIQUE NOT NULL because there was a risk of data loss in our index implementation. This was resolved by allowing multiple table mutations in a single statement.

This patch enables both of these statements after adding validation inside the end to end testing to ensure that DML injected scenarios work correctly. This patch also enhances some of the output generated from the DML injection tests to make things easier to diagnose.

Fixes: #117647





118666: mixedversion: unify signature of user and framework functions r=DarrylWong a=renatolabs

Previously, the signature of functions (hooks) provided by the user
was defined by the `userFunc` type, and the implementation of
framework steps was defined by the `Run` function of the `singleStep`
interface. These functions were slightly different even though there
was no need for the distinction.

This commit unifies the signature of both types of functions and
renames `userFunc` to `stepFunc` to reflect the generality. With this
change, functions implemented by the framework now also have an RNG
instance passed to them automatically. This will become especially
useful when we have mutator steps, where some randomization will often
take place.

Epic: none

Release note: None

**mixedversion: simplify steps by removing crdbNodes requirement**
Some steps implemented by the framework needed to be instantiated with
`crdbNodes`, a list of nodes where cockroachdb is running. This was
both repetitive and unnecessary. All steps have access to the helper
struct, which should given access, via private fields, to the list of
crdb nodes.

119042: builtins,workload: add setseed function; use it to make RSW more deterministic r=rafiss a=rafiss

This implements the setseed function that appears in PostgreSQL.

We then use it to make the ORDER BY random()
clauses used in the random schema workload (RSW) deterministic.

Each worker goroutine of the RSW now runs a determistic sequence of
operations given the same COCKROACH_RANDOM_SEED. However, since the
workload is concurrent, the overall RSW is not deterministic, since the
interleaving of goroutine execution is random.

informs #117422

Release note (sql change): Added the setseed() builtin function. It sets
the seed for the random generator used by the random() builtin function.

119044: workload/schemachange: enable DROP SCHEMA statements r=rafiss a=rafiss

This removes an outdated check for cross-schema type references. DROP
SCHEMA used to not support that, but it does now.

fixes #116792
Release note: None

119075: release: fix release branch pattern r=jlinder a=rail

Previously, we used different patterns to match release branches in `make-and-publish-build-tagging.sh` and
`make-and-publish-build=artifacts-docker.sh`.

This PR syncs the patterns they use.

Epic: none
Release note: none
Release justification: release automation changes

Co-authored-by: babusrithar <babusrithar@cockroachlabs.com>
Co-authored-by: lyang24 <lanqingy@usc.edu>
Co-authored-by: Faizan Qazi <faizan@cockroachlabs.com>
Co-authored-by: Renato Costa <renato@cockroachlabs.com>
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
Co-authored-by: Rail Aliiev <rail@iqchoice.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-testing Testing tools and infrastructure C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Projects
Development

Successfully merging a pull request may close this issue.

2 participants