asim: add randomness to range generation #106311

Closed
wenyihu6 opened this issue Jul 6, 2023 · 0 comments · Fixed by #108099
Labels
A-kv-simulation: Relating to allocation simulation.
C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-kv: KV Team

wenyihu6 commented Jul 6, 2023

This issue tracks work on adding randomness to range generation.

Related: #106192

This initial phase of the project focuses on creating a small-scale testing
framework. It introduces randomness into the range generation process while
keeping the initial cluster setup constant for node / store placement,
localities, and zone configurations. These configurations will be based on
widely used default configurations that are already satisfiable and valid,
eliminating the need for additional validation.

The randomness will stem primarily from varying range factors such as
replication factor, key space, number of bytes, and leaseholder placement. We
may also add extra nodes with random localities in subsequent stages, but these
should not influence the test outcome.

The test’s pass or fail criterion will be a conformance assertion that
ensures no replicas are over-replicated, under-replicated, or in violation of
constraints.
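
A conformance assertion of this kind could look roughly like the sketch below. It is only an illustration under stated assumptions: `rangeState` and `conformanceFailures` are hypothetical names, and the simulator's real assertion works on its own state types rather than on this simplified struct.

```go
package main

import "fmt"

// rangeState is a hypothetical, simplified view of one range at the end of a
// simulation. It is not the simulator's actual type.
type rangeState struct {
	ID                  int
	Replicas            int  // number of replicas the range currently has
	ReplicationFactor   int  // number of replicas the range should have
	ViolatesConstraints bool // true if any replica breaks a zone constraint
}

// conformanceFailures returns one message per violation. An empty result
// means the final state passes the conformance assertion.
func conformanceFailures(ranges []rangeState) []string {
	var failures []string
	for _, r := range ranges {
		switch {
		case r.Replicas > r.ReplicationFactor:
			failures = append(failures,
				fmt.Sprintf("r%d: over-replicated (%d > %d)", r.ID, r.Replicas, r.ReplicationFactor))
		case r.Replicas < r.ReplicationFactor:
			failures = append(failures,
				fmt.Sprintf("r%d: under-replicated (%d < %d)", r.ID, r.Replicas, r.ReplicationFactor))
		}
		if r.ViolatesConstraints {
			failures = append(failures, fmt.Sprintf("r%d: violating constraints", r.ID))
		}
	}
	return failures
}

func main() {
	failures := conformanceFailures([]rangeState{
		{ID: 1, Replicas: 3, ReplicationFactor: 3},
		{ID: 2, Replicas: 2, ReplicationFactor: 3},
		{ID: 3, Replicas: 4, ReplicationFactor: 3, ViolatesConstraints: true},
	})
	for _, f := range failures {
		fmt.Println(f)
	}
}
```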

Jira issue: CRDB-29498

wenyihu6 added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Jul 6, 2023
wenyihu6 self-assigned this Jul 6, 2023
wenyihu6 added the A-kv-simulation Relating to allocation simulation. label Jul 6, 2023
wenyihu6 added the T-kv KV Team label Jul 6, 2023
blathers-crl bot added this to Incoming in KV Jul 6, 2023
wenyihu6 moved this from Incoming to Current Milestone / In Progress in KV Jul 6, 2023
wenyihu6 added a commit to wenyihu6/cockroach that referenced this issue Jul 18, 2023
This patch lays the backbone of the randomized testing framework. Currently, it
only supports default configuration for all options, implying that there is no
randomization yet. Additionally, it refactors some of the existing structure in
data_driven_test. Note that this should not change any existing behavior, and
the main purpose is to make future commits cleaner.

Part of: cockroachdb#106311
Release note: None
wenyihu6 added a commit to wenyihu6/cockroach that referenced this issue Jul 18, 2023
This patch takes the first step towards a randomized framework by enabling asim
testing to randomly select a cluster information configuration from a set of
predefined choices. These choices are hardcoded and represent common cluster
configurations.

Part of: cockroachdb#106311
Release note: None
wenyihu6 added a commit to wenyihu6/cockroach that referenced this issue Jul 20, 2023
wenyihu6 added a commit to wenyihu6/cockroach that referenced this issue Jul 21, 2023
wenyihu6 added a commit to wenyihu6/cockroach that referenced this issue Aug 21, 2023
Previously, the randomized testing framework depended on default settings
hardcoded in the tests, requiring users to edit code-configured
parameters to change the settings. This patch converts the framework to a
data-driven approach, enabling more dynamic user inputs, more testing examples,
and greater visibility into what each iteration is testing.

TestRandomized is a randomized data-driven testing framework that validates
allocators by creating randomized configurations. It is designed for
regression and exploratory testing.

**There are three modes for every aspect of randomized generation.**
- Static mode:
  1. If randomization options are disabled (e.g. no rand_ranges command is
     used), the system uses the default configurations (defined in
     default_settings.go) with no randomization.
- Randomized mode, in one of two scenarios:
  2. Use default settings for randomized generation (e.g. rand_ranges).
  3. Use settings specified with commands (e.g. rand_ranges
     range_gen_type=zipf).

**The following commands are provided:**
```
1. "rand_cluster" [cluster_gen_type=(single_region|multi_region|any_region)]
	e.g. rand_cluster cluster_gen_type=(multi_region)
	- rand_cluster: randomly picks a predefined cluster configuration
	  according to the specified type.
	- cluster_gen_type (default value is multi_region): the cluster
	  configuration type. On the next eval, the cluster is generated as the
	  initial state of the simulation.

2. "rand_ranges" [placement_type=(even|skewed|random|weighted_rand)]
	[replication_factor=<int>] [range_gen_type=(uniform|zipf)]
	[keyspace_gen_type=(uniform|zipf)] [weighted_rand=(<[]float64>)]
	e.g. rand_ranges placement_type=weighted_rand weighted_rand=(0.1,0.2,0.7)
	e.g. rand_ranges placement_type=skewed replication_factor=1
		 range_gen_type=zipf keyspace_gen_type=uniform
	- rand_ranges: randomly generates a distribution of ranges across stores
	  based on the specified parameters. On the next call to eval, ranges and
	  their replica placement are generated and loaded into the initial state.
	- placement_type (default value is even): defines the type of range
	  placement distribution across stores. Once set, it remains constant
	  across iterations with no randomization involved.
	- replication_factor (default value is 3): represents the replication
	  factor of each range. Once set, it remains constant across iterations
	  with no randomization involved.
	- range_gen_type (default value is uniform): represents the type of
	  distribution used to yield the range parameter as ranges are generated
	  across iterations (range ∈ [1, 1000]).
	- keyspace_gen_type: represents the type of distribution used to yield the
	  keyspace parameter as ranges are generated across iterations
	  (keyspace ∈ [1000, 200000]).
	- weighted_rand: specifies the weighted random distribution among stores.
	  Requirements (will panic otherwise):
	  1. weighted_rand should only be used with placement_type=weighted_rand
	     and vice versa.
	  2. Must specify a weight between [0.0, 1.0] for each element in the
	     array, with each element corresponding to a store.
	  3. len(weighted_rand) cannot be greater than the number of stores.
	  4. The sum of weights in the array should be equal to 1.

3. "eval" [seed=<int64>] [num_iterations=<int>] [duration=<time.Duration>]
	[verbose=<bool>]
	e.g. eval seed=20 duration=30m2s verbose=true
	- eval: generates a simulation based on the configuration set with the
	  given commands.
	- seed (default value is int64(42)): used to create a new random number
	  generator, which is then used to create a new seed for each iteration.
	- num_iterations (default value is 3): specifies the number of simulations
	  to run.
	- duration (default value is 10m): defines the duration of each iteration.
	- verbose (default value is false): if set to true, plots all stat (as
	  specified by defaultStat) history.
```
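
To make the `uniform|zipf` choice concrete, here is a minimal sketch of drawing a bounded value (such as the number of ranges in [1, 1000]) from either distribution with Go's `math/rand`. The names `genType`, `newGenerator`, and `newSeededGenerator`, and the Zipf parameters `s=1.1`, `v=1`, are assumptions made for illustration. The framework's own generators may be built differently.

```go
package main

import (
	"fmt"
	"math/rand"
)

// genType mirrors the uniform|zipf choice offered by range_gen_type and
// keyspace_gen_type. The type itself is hypothetical.
type genType int

const (
	uniform genType = iota
	zipf
)

// newGenerator returns a function that yields values in [lo, hi] following
// the requested distribution.
func newGenerator(rng *rand.Rand, t genType, lo, hi int64) func() int64 {
	if t == zipf {
		// rand.NewZipf yields values in [0, imax]; shift them by lo.
		// s=1.1 and v=1 are arbitrary illustrative parameters.
		z := rand.NewZipf(rng, 1.1, 1, uint64(hi-lo))
		return func() int64 { return lo + int64(z.Uint64()) }
	}
	return func() int64 { return lo + rng.Int63n(hi-lo+1) }
}

// newSeededGenerator is a convenience wrapper that owns its random source.
func newSeededGenerator(seed int64, t genType, lo, hi int64) func() int64 {
	return newGenerator(rand.New(rand.NewSource(seed)), t, lo, hi)
}

func main() {
	rng := rand.New(rand.NewSource(42))
	numRanges := newGenerator(rng, zipf, 1, 1000)
	keyspace := newGenerator(rng, uniform, 1000, 200000)
	for i := 1; i <= 3; i++ {
		fmt.Printf("iteration %d: ranges=%d keyspace=%d\n", i, numRanges(), keyspace())
	}
}
```

With a Zipf distribution most draws land near the lower bound, so small range counts dominate, while uniform draws spread evenly over the interval.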

RandTestingFramework is initialized with the specified testSetting and maintains
its state across all iterations. It repeats the test with different random
configurations. Each iteration in RandTestingFramework executes the following
steps:
1. Generates a random configuration, based on whether randOption is on and
on the specific settings for randomized generation.
2. Executes the simulation and checks the assertions on the final state.
3. Stores any outputs and assertion failures in a buffer.
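
The three steps, together with the per-iteration seeding described for `eval`, can be sketched as follows. This is a simplified illustration, not the framework's code: `iterationSeeds`, `runIterations`, `alwaysPass`, and the `sampleN` output format are all hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
	"strings"
)

// iterationSeeds derives one seed per iteration from a single top-level seed,
// so an entire run can be reproduced from that one value.
func iterationSeeds(seed int64, numIterations int) []int64 {
	rng := rand.New(rand.NewSource(seed))
	seeds := make([]int64, numIterations)
	for i := range seeds {
		seeds[i] = rng.Int63()
	}
	return seeds
}

// runIterations runs simulate once per iteration with its own random source
// and collects pass/fail output in a buffer. simulate is a stand-in for
// "generate a configuration, run the simulation, check the assertions".
func runIterations(seed int64, numIterations int, simulate func(rng *rand.Rand) error) string {
	var buf strings.Builder
	for i, s := range iterationSeeds(seed, numIterations) {
		rng := rand.New(rand.NewSource(s))
		if err := simulate(rng); err != nil {
			fmt.Fprintf(&buf, "sample%d: fail: %v\n", i+1, err)
			continue
		}
		fmt.Fprintf(&buf, "sample%d: pass\n", i+1)
	}
	return buf.String()
}

// alwaysPass is a trivial simulation used for demonstration.
func alwaysPass(*rand.Rand) error { return nil }

func main() {
	fmt.Print(runIterations(42, 3, alwaysPass))
}
```

Deriving iteration seeds from one generator keeps iterations independent of each other while still letting a failing run be replayed with the same `seed` argument.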

Release note: None
Part Of: cockroachdb#106311
wenyihu6 added a commit to wenyihu6/cockroach that referenced this issue Aug 21, 2023
Previously, the randomized testing framework's output only indicated whether
each iteration passed. This lack of detail made checking the randomized
framework and debugging challenging. This patch adds more info to the output,
including the selected configurations and the initial state of each simulation.

Additionally, this patch removes the verbosity flag for printing history plots
as it does not seem to have any practical use cases.

New verbosity flags for eval are now supported.
```
"eval"
[verbose=(<[]("result_only","test_settings","initial_state","config_gen","topology","all")>)]
- verbose(default value is OutputResultOnly): used to set flags on what to
   show in the test output messages. By default, all details are displayed
   upon assertion failure.
   - result_only: only shows whether the test passed or failed, along with
   any failure messages
   - test_settings: displays settings used for the repeated tests
   - initial_state: displays the initial state of each test iteration
   - config_gen: displays the input configurations generated for each test
   iteration
   - topology: displays the topology of cluster configurations
   - all: displays everything above
```
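
One common way to model a combinable set of output options like this is a bit set. The sketch below assumes that approach. Only `OutputResultOnly` is named in the commit message; the other constant names, `parseVerbose`, and `Has` are hypothetical.

```go
package main

import "fmt"

// OutputFlags is a bit set of output sections to print.
type OutputFlags int

const (
	OutputTestSettings OutputFlags = 1 << iota
	OutputInitialState
	OutputConfigGen
	OutputTopology
)

const (
	// OutputResultOnly is the default: no extra sections are printed.
	OutputResultOnly OutputFlags = 0
	OutputAll                    = OutputTestSettings | OutputInitialState | OutputConfigGen | OutputTopology
)

var flagByName = map[string]OutputFlags{
	"result_only":   OutputResultOnly,
	"test_settings": OutputTestSettings,
	"initial_state": OutputInitialState,
	"config_gen":    OutputConfigGen,
	"topology":      OutputTopology,
	"all":           OutputAll,
}

// parseVerbose combines the named options into a single flag set.
func parseVerbose(names []string) (OutputFlags, error) {
	flags := OutputResultOnly
	for _, n := range names {
		f, ok := flagByName[n]
		if !ok {
			return 0, fmt.Errorf("unknown verbose option %q", n)
		}
		flags |= f
	}
	return flags, nil
}

// Has reports whether any bit of flag is set in f.
func (f OutputFlags) Has(flag OutputFlags) bool { return f&flag != 0 }

func main() {
	flags, err := parseVerbose([]string{"initial_state", "topology"})
	if err != nil {
		panic(err)
	}
	fmt.Println(flags.Has(OutputInitialState), flags.Has(OutputConfigGen))
}
```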

Part Of: [cockroachdb#106311](https://github.com/kvoli/cockroach/issues/106311)
Release Note: none
wenyihu6 added a commit to wenyihu6/cockroach that referenced this issue Aug 21, 2023
This patch allows users to modify the settings for the static mode within the
randomized testing framework. The following command is now supported:
```
4. "change_static_option" [nodes=<int>] [stores_per_node=<int>]
	[rw_ratio=<float64>] [rate=<float64>] [min_block=<int>] [max_block=<int>]
	[min_key=<int64>] [max_key=<int64>] [skewed_access=<bool>] [ranges=<int>]
	[placement_type=<gen.PlacementType>] [key_space=<int>]
	[replication_factor=<int>] [bytes=<int64>] [stat=<string>] [height=<int>]
	[width=<int>]
	e.g. change_static_option nodes=2 stores_per_node=3 placement_type=skewed
	- change_static_option: modifies the settings for the static mode where no
	  randomization is involved. Note that this does not change the default
	  settings for any randomized generation.
	- nodes (default value is 3): number of nodes in the generated cluster
	- stores_per_node (default value is 1): number of stores per node in the
	  generated cluster
	- rw_ratio (default value is 0.0): read-write ratio of the generated load
	- rate (default value is 0.0): rate at which the load is generated
	- min_block (default value is 1): min size of each load event
	- max_block (default value is 1): max size of each load event
	- min_key (default value is int64(1)): min key of the generated load
	- max_key (default value is int64(200000)): max key of the generated load
	- skewed_access (default value is false): if true, the workload key
	  generator is skewed (zipf)
	- ranges (default value is 1): number of generated ranges
	- key_space (default value is 200000): keyspace for the generated range
	- placement_type (default value is gen.Even): type of distribution for how
	  ranges are distributed across stores
	- replication_factor (default value is 3): number of replicas for each
	  range
	- bytes (default value is int64(0)): size of each range in bytes
	- stat (default value is "replicas"): specifies the output to be plotted
	  for the verbose option
	- height (default value is 15): height of the plot
	- width (default value is 80): width of the plot
```

In addition, verbose=(static_settings) can now be used to display settings used
for static options where no randomization is involved.

Part of: cockroachdb#106311
Release note: None
wenyihu6 added a commit to wenyihu6/cockroach that referenced this issue Aug 21, 2023
Previously, we enforced that the length of a given `weighted_rand` could not exceed
the number of stores. This was challenging for users, as they might not know the
cluster configuration that would be generated and thus would not know the number of
stores. In addition, if the length of `weighted_rand` was less than the total number
of stores, any stores outside of the `weighted_rand` range would simply have
zero replicas. This could lead to confusion.

To improve user control, this patch disables the use of weighted_rand with
randomized cluster generation. Requirements to use weighted_rand:
1. use static option for cluster generation
2. specify nodes(default:3) and stores_per_node(default:1) through
change_static_option
3. ensure len(weighted_rand) == number of stores == nodes * stores_per_node

In addition to these new rules, the following existing requirements remain in
place:
1. weighted_rand should only be used with placement_type=weighted_rand and vice
versa.
2. must specify a weight between [0.0, 1.0] for each element in the array, with
each element corresponding to a store
3. sum of weights in the array should be equal to 1

Resolves: cockroachdb#106311

Release note: None
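
The `weighted_rand` requirements listed in the commit above can be illustrated with a small validation sketch. `validateWeightedRand` and `pickStore` are hypothetical helpers, the floating-point tolerance is an assumption, and the sketch returns errors where the commit messages say the framework panics.

```go
package main

import (
	"fmt"
	"math"
)

// validateWeightedRand checks the weighted_rand requirements: one weight per
// store (nodes * stores_per_node), each weight in [0.0, 1.0], and a total of 1.
func validateWeightedRand(weights []float64, nodes, storesPerNode int) error {
	if numStores := nodes * storesPerNode; len(weights) != numStores {
		return fmt.Errorf("got %d weights, want %d (one per store)", len(weights), numStores)
	}
	sum := 0.0
	for i, w := range weights {
		if w < 0 || w > 1 {
			return fmt.Errorf("weight %d is %v, want a value in [0.0, 1.0]", i, w)
		}
		sum += w
	}
	// Compare with a small tolerance to absorb floating-point rounding.
	if math.Abs(sum-1) > 1e-9 {
		return fmt.Errorf("weights sum to %v, want 1", sum)
	}
	return nil
}

// pickStore maps a uniform draw u in [0, 1) to a store index according to the
// cumulative weights.
func pickStore(weights []float64, u float64) int {
	cumulative := 0.0
	for i, w := range weights {
		cumulative += w
		if u < cumulative {
			return i
		}
	}
	return len(weights) - 1 // guard against rounding at the upper end
}

func main() {
	weights := []float64{0.1, 0.2, 0.7}
	if err := validateWeightedRand(weights, 3, 1); err != nil {
		panic(err)
	}
	fmt.Println(pickStore(weights, 0.05), pickStore(weights, 0.25), pickStore(weights, 0.95))
}
```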
craig bot pushed a commit that referenced this issue Aug 21, 2023
…109142 #109152 #109156 #109157 #109161 #109165 #109166 #109172

107957: asim: convert randomized testing to data-driven r=kvoli a=wenyihu6

**asim: remove extra parsing for []float64, float64, time.Duration**

In cockroachdb/datadriven#45, we upstreamed the
scanning implementation in `datadriven` library. We can now handle parsing of
[]float64, float64, and time.Duration without additional handling.

Release Note: none
Epic: none

---

**asim: enable user-defined repliFactor, placement in rand range_gen**

This patch introduces two additional options for randomized range generation,
letting users define the replication factor and placement type. Although some
aspects of range configs are randomly generated (ranges and keyspace), these
two configurations are not randomized. Once set by the user, the configuration
will persist across iterations.

Release Note: none
Part Of: #106311

---

**asim: convert randomized testing to data-driven**
Release note: None
Part Of: #106311

108185: server: remove support for sticky engines r=itsbilal a=jbowens

Remove support for reusing engines from the StickyVFSRegistry. Tests should not
depend on ephemeral, in-memory engine state between server restarts, or read
closed Engine state.

Close #108119.

108467: sql: implement oidvectortypes builtin r=fqazi a=fqazi

Previously, the oidvectortypes builtin wasn't implemented, causing a compatibility gap for tools
that need to format oidvectors. To address this, this patch adds the oidvectortypes builtin.

Fixes: #107942

Release note (sql change): The oidvectortypes built-in has been implemented, which can format oidvector.

108678: closedts: make settings TenantReadOnly and public r=erikgrinaker a=erikgrinaker

It doesn't make sense for these to be `TenantWritable`, since the side transport runs below KV. Furthermore, these settings are referenced throughout our documentation, so make them public.

These should really be set only for the system tenant, and secondary tenants could simply read the system tenant's setting. This functionality runs in the host cluster below KV and it doesn't make any sense to set individual settings for tenants here. Unfortunately, this isn't currently possible with the existing settings classes; there is no way for secondary tenants to access the host's settings.

Touches #108677.

Epic: none
Release note (ops change): The following closed timestamp side-transport settings can no longer be set from secondary tenants (they did not have an effect in secondary tenants): kv.closed_timestamp.target_duration, kv.closed_timestamp.side_transport_interval, and kv.closed_timestamp.lead_for_global_reads_override.

108845: sql: add last_updated column to crdb_internal.kv_protected_ts_records r=jayshrivastava a=jayshrivastava

This change adds a `last_updated` column to the protected timestamps virtual table. This column contains the mvcc timestamp of the row. Having this column present in this table, which is included in debug zips, improves observability when debugging issues.

Informs: #104161
Release note: None
Epic: None

109029: sql: fix TestCreateStatisticsCanBeCancelled txn retry hang r=fqazi a=fqazi

Previously, this test could hang if an automatic
stats collection came in concurrently with a manual stats collection,
where the request filter would end up hanging and being called twice.
To address this, this patch disables automatic stats collection
on the table.


Fixes: #109007

Release note: None

109049: concurrency: allow multiple transactions to hold locks on a single key  r=nvanbenschoten a=arulajmani

Locks on a single key are stored in the `lockState` struct. Prior to
this patch, the lock table only expected a single transaction to hold
a lock on a given key at any point in time. This restriction needs to
be lifted for shared locks, whose semantics allow multiple transactions
to hold locks on a single key.

This patch changes the `lockState` data structure so that it can be
generalized in the future. We don't actually allow multiple transactions
to acquire locks on a single key just yet -- that'll come in a subsequent
patch.

Informs #91545

Release note: None

109087: storage: defer putBuffer release in all cases r=nvanbenschoten a=nvanbenschoten

Minor cleanup.

This commit switches the remainder of the calls to putBuffer.release to be deferred, instead of being manually called at the end of their function. The comments mentioning that the defer was "measurably slower" were introduced in 4444618, which was before Go 1.14 optimized the performance of defer. Most of these, including the more performance-sensitive calls, were already switched over to use defer in fbe8852.

Epic: None
Release note: None

109142: roachtest: Cast snapshot-recd bytes to int in disagg-rebalance r=jbowens a=itsbilal

Previously we were reading a float value as an int, which would trip up the Scan() method if the float value was large enough to be sent over the wire in scientific notation, e.g. `2.3456E7`. This change ensures that Cockroach prints out the value as an integer to avoid the scan-time error in the roachtest.

Fixes #109114.

Epic: none

Release note: None

109152: build: update some configurations for remote build execution r=rail a=rickystewart

1. Use the `large` pool of executors for `enormous` test targets
2. Add (temporary) network access to the following tests: `amazon_test`,
   `base_test`, `cloudprivilege_test`, `externalconn_test`, and
   `cockroach-go-testserver-upgrade-to-master` logictests. These
   erroneously have a dependency on network assets; bugs have been
   filed for each of these.

Epic: CRDB-8308
Release note: None

109156: sql: version gate UNIQUE constraint with json column r=rafiss a=rafiss

This prevents the usage of a json column in a unique constraint, until after the upgrade is finalized.

fixes #108978
Release note: None

109157:  ci,ui: don't lint `e2e-tests` r=sjbarag a=rickystewart

This workspace has a huge download of `cypress` which was causing
CI to flake.

Epic: none
Release note: None

109161: workload: add background qos to kv workload r=bananabrick a=bananabrick

A --background-qos flag can be used in the kv workload to ensure that the generated work is treated as low priority by admission control.

Epic: none
Release note: None

109165: Revert "rangefeed/changefeed: Enable mux rangefeeds by default." r=erikgrinaker a=erikgrinaker

This reverts commit de65c54.

We decided to keep these disabled for another release, to get more real-world experience with it first.

Touches #95781.
Touches #105270.

Release note (performance improvement): The following release note no longer applies: "mux range feeds reuse connection and workers across multiple range feeds.  This mode is now enabled by default."

109166: build: more resources for building AWS dependency r=rail a=rickystewart

This is a huge package with apparently a lot of auto-generated code that was causing OOMs on EngFlow RBE. This fixes it.

Epic: none
Release note: CRDB-8308

109172: storage: Fix panic in MVCCHistories test r=jbowens a=itsbilal

storage_test.intentPrintingReadWriter previously did not support ReaderWithMustIterators.

Epic: none

Release note: None

Co-authored-by: wenyihu6 <wenyi.hu@cockroachlabs.com>
Co-authored-by: Jackson Owens <jackson@cockroachlabs.com>
Co-authored-by: Faizan Qazi <faizan@cockroachlabs.com>
Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>
Co-authored-by: Jayant Shrivastava <jayants@cockroachlabs.com>
Co-authored-by: Arul Ajmani <arulajmani@gmail.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Bilal Akhtar <bilal@cockroachlabs.com>
Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
Co-authored-by: Arjun Nair <nair@cockroachlabs.com>
wenyihu6 added a commit to wenyihu6/cockroach that referenced this issue Aug 22, 2023
wenyihu6 added a commit to wenyihu6/cockroach that referenced this issue Aug 22, 2023
wenyihu6 added a commit to wenyihu6/cockroach that referenced this issue Aug 22, 2023
wenyihu6 added a commit to wenyihu6/cockroach that referenced this issue Aug 24, 2023
Previously, the randomized testing framework's output only indicates whether
each iteration passes. This lack of of detail makes checking the randomized
framework and debugging challenging. This patch adds more info to the output,
including the selected configurations, the initial state of each simulation.

Additionally, this patch removes the verbosity flag for printing history plots
as it does not seem to have any practical use cases.

New verbosity flags for eval are now supported.
```
"eval"
[verbose=(<[]("result_only","test_settings","initial_state","config_gen","topology","all")>)]
- verbose(default value is OutputResultOnly): used to set flags on what to
   show in the test output messages. By default, all details are displayed
   upon assertion failure.
   - result_only: only shows whether the test passed or failed, along with
   any failure messages
   - test_settings: displays settings used for the repeated tests
   - initial_state: displays the initial state of each test iteration
   - config_gen: displays the input configurations generated for each test
   iteration
   - topology: displays the topology of cluster configurations
   - all: displays everything above
```
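
A set of flags like this is commonly modeled as a bitmask. The following is a minimal sketch of that approach, not the simulator's actual code: apart from `OutputResultOnly`, which the help text names as the default, every identifier here (`OutputFlags`, `parseVerbose`, `Has`) is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// OutputFlags is a hypothetical bitmask of output sections to print.
type OutputFlags uint

const (
	OutputResultOnly   OutputFlags = 0 // default: pass/fail plus failure messages
	OutputTestSettings OutputFlags = 1 << 0
	OutputInitialState OutputFlags = 1 << 1
	OutputConfigGen    OutputFlags = 1 << 2
	OutputTopology     OutputFlags = 1 << 3
	OutputAll                      = OutputTestSettings | OutputInitialState | OutputConfigGen | OutputTopology
)

// Has reports whether any bit of f is set in o.
func (o OutputFlags) Has(f OutputFlags) bool { return o&f != 0 }

// parseVerbose turns an argument such as "(test_settings,topology)" into flags.
func parseVerbose(arg string) (OutputFlags, error) {
	names := map[string]OutputFlags{
		"result_only":   OutputResultOnly,
		"test_settings": OutputTestSettings,
		"initial_state": OutputInitialState,
		"config_gen":    OutputConfigGen,
		"topology":      OutputTopology,
		"all":           OutputAll,
	}
	flags := OutputResultOnly
	for _, name := range strings.Split(strings.Trim(arg, "()"), ",") {
		name = strings.TrimSpace(name)
		if name == "" {
			continue
		}
		f, ok := names[name]
		if !ok {
			return 0, fmt.Errorf("unknown verbose flag %q", name)
		}
		flags |= f
	}
	return flags, nil
}

func main() {
	flags, err := parseVerbose("(test_settings,topology)")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(flags.Has(OutputTopology), flags.Has(OutputConfigGen))
}
```

Because `result_only` maps to zero, combining it with any other flag has no effect, which matches its role as the baseline output.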

Part Of: [cockroachdb#106311](https://github.com/kvoli/cockroach/issues/106311)
Release Note: none

(cherry picked from commit 27e9bd438a6ef1f6f0ed7205d91f28c9b54fd314)
craig bot pushed a commit that referenced this issue Aug 25, 2023
108059: asim: better outputs for data-driven tests r=kvoli a=wenyihu6

**asim: sort before iterating over maps when printing**

Previously, the simulator iterated over unordered maps when formatting and
printing states, stores, and ranges, which made the output non-deterministic.
This patch addresses the issue by sorting the map keys before formatting and
printing.
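
The general pattern, sketched here with a hypothetical `formatStores` helper and a made-up store ID to replica count map rather than the simulator's own types, is to collect the keys, sort them, and only then iterate:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// formatStores renders a storeID -> replica count map in ascending key order.
// Go randomizes map iteration order, so ranging over the map directly would
// produce different output from run to run.
func formatStores(replicas map[int]int) string {
	ids := make([]int, 0, len(replicas))
	for id := range replicas {
		ids = append(ids, id)
	}
	sort.Ints(ids)

	var b strings.Builder
	for i, id := range ids {
		if i > 0 {
			b.WriteString(", ")
		}
		fmt.Fprintf(&b, "s%d=%d", id, replicas[id])
	}
	return b.String()
}

func main() {
	fmt.Println(formatStores(map[int]int{3: 7, 1: 5, 2: 6}))
}
```

Stable ordering matters for data-driven tests in particular, since they compare the printed output against an expected text file.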

Release note: none
Epic: none

---

**asim: better outputs for data-driven tests**

Previously, the randomized testing framework's output only indicated whether
each iteration passed. This lack of detail made it hard to check the randomized
framework and to debug failures. This patch adds more information to the output,
including the selected configurations and the initial state of each simulation.

Additionally, this patch removes the verbosity flag for printing history plots
as it does not seem to have any practical use cases.

New verbosity flags for eval are now supported.
```
"eval"
[verbose=(<[]("result_only","test_settings","initial_state","config_gen","topology","all")>)]
- verbose(default value is OutputResultOnly): used to set flags on what to
   show in the test output messages. By default, all details are displayed
   upon assertion failure.
   - result_only: only shows whether the test passed or failed, along with
   any failure messages
   - test_settings: displays settings used for the repeated tests
   - initial_state: displays the initial state of each test iteration
   - config_gen: displays the input configurations generated for each test
   iteration
   - topology: displays the topology of cluster configurations
   - all: displays everything above
```

Part Of: #106311
Release Note: none

Co-authored-by: wenyihu6 <wenyi.hu@cockroachlabs.com>
wenyihu6 added a commit to wenyihu6/cockroach that referenced this issue Aug 25, 2023
This patch allows users to modify the settings for the static mode within the
randomized testing framework. The following command is now supported:
```
4. "change_static_option" [nodes=<int>] [stores_per_node=<int>]
[rw_ratio=<float64>] [rate=<float64>] [min_block=<int>] [max_block=<int>]
[min_key=<int64>] [max_key=<int64>] [skewed_access=<bool>] [ranges=<int>]
[placement_type=<gen.PlacementType>] [key_space=<int>]
[replication_factor=<int>] [bytes=<int64>] [stat=<string>] [height=<int>]
[width=<int>]
e.g. change_static_option nodes=2 stores_per_node=3 placement_type=skewed
	- change_static_option: modifies the settings for the static mode where no
	  randomization is involved. Note that this does not change the default
	  settings for any randomized generation.
	- nodes (default value is 3): number of nodes in the generated cluster
	- stores_per_node (default value is 1): number of stores per node in the
	  generated cluster
	- rw_ratio (default value is 0.0): read-write ratio of the generated load
	- rate (default value is 0.0): rate at which the load is generated
	- min_block (default value is 1): min size of each load event
	- max_block (default value is 1): max size of each load event
	- min_key (default value is int64(1)): min key of the generated load
	- max_key (default value is int64(200000)): max key of the generated load
	- skewed_access (default value is false): if true, the workload key
	  generator is skewed (zipf)
	- ranges (default value is 1): number of generated ranges
	- key_space (default value is 200000): keyspace for the generated range
	- placement_type (default value is gen.Even): type of distribution for how
	  ranges are distributed across stores
	- replication_factor (default value is 3): number of replicas for each range
	- bytes (default value is int64(0)): size of each range in bytes
	- stat (default value is "replicas"): specifies the output to be plotted
	  for the verbose option
	- height (default value is 15): height of the plot
	- width (default value is 80): width of the plot
```

In addition, verbose=(static_settings) can now be used to display the settings
used for static options, where no randomization is involved.
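
To illustrate how such a command can override a set of defaults, here is a minimal sketch covering only a subset of the options. The `staticOptions` struct, its fields, and `apply` are hypothetical names, and the placement type is represented as a plain string standing in for `gen.PlacementType`; the default values are the ones documented above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// staticOptions holds a subset of the static-mode settings listed above.
// The struct and field names are hypothetical, not the simulator's own types.
type staticOptions struct {
	nodes             int
	storesPerNode     int
	ranges            int
	keySpace          int
	replicationFactor int
	placementType     string // stand-in for gen.PlacementType
}

// defaultStaticOptions returns the documented default values.
func defaultStaticOptions() staticOptions {
	return staticOptions{
		nodes: 3, storesPerNode: 1, ranges: 1,
		keySpace: 200000, replicationFactor: 3, placementType: "even",
	}
}

// apply overrides defaults from a space-separated list of key=value pairs.
func (o *staticOptions) apply(line string) error {
	ints := map[string]*int{
		"nodes": &o.nodes, "stores_per_node": &o.storesPerNode,
		"ranges": &o.ranges, "key_space": &o.keySpace,
		"replication_factor": &o.replicationFactor,
	}
	for _, field := range strings.Fields(line) {
		kv := strings.SplitN(field, "=", 2)
		if len(kv) != 2 {
			return fmt.Errorf("malformed option %q", field)
		}
		key, val := kv[0], kv[1]
		if key == "placement_type" {
			o.placementType = val
			continue
		}
		dst, ok := ints[key]
		if !ok {
			return fmt.Errorf("unknown option %q", key)
		}
		n, err := strconv.Atoi(val)
		if err != nil {
			return fmt.Errorf("option %q: %v", key, err)
		}
		*dst = n
	}
	return nil
}

func main() {
	opts := defaultStaticOptions()
	if err := opts.apply("nodes=2 stores_per_node=3 placement_type=skewed"); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", opts)
}
```

Options that are not mentioned on the command line keep their defaults, which mirrors the behavior described for `change_static_option`.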

Part of: cockroachdb#106311
Release note: None
wenyihu6 added a commit to wenyihu6/cockroach that referenced this issue Aug 25, 2023
Previously, we enforced that the length of a given `weighted_rand` could not
exceed the number of stores. This was challenging for users, as they might not
know the cluster configuration that would be generated and thus would not know
the number of stores. In addition, if the length of `weighted_rand` was less
than the total number of stores, any stores outside of the `weighted_rand` range
would simply have zero replicas. This could lead to confusion.

To improve user control, this patch disables the use of weighted_rand with
randomized cluster generation. Requirements to use weighted_rand:
1. use the static option for cluster generation
2. specify nodes (default: 3) and stores_per_node (default: 1) through
change_static_option
3. ensure len(weighted_rand) == number of stores == nodes * stores_per_node

In addition to these new rules, the following existing requirements remain in
place:
1. weighted_rand should only be used with placement_type=weighted_rand and vice
versa
2. each element in the array must be a weight in the range [0.0, 1.0], with
each element corresponding to a store
3. the weights in the array must sum to 1
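
The numeric requirements above can be sketched as a single validation function. This is an illustration under stated assumptions, not the patch's implementation: `validateWeightedRand` is a hypothetical name, and the `1e-9` tolerance used for the sum check is an arbitrary choice to absorb floating-point rounding. The pairing with placement_type=weighted_rand is not covered here.

```go
package main

import (
	"fmt"
	"math"
)

// validateWeightedRand checks a weights array against the rules above:
// one weight per store, each weight in [0.0, 1.0], and a total of 1.
func validateWeightedRand(weights []float64, nodes, storesPerNode int) error {
	stores := nodes * storesPerNode
	if len(weights) != stores {
		return fmt.Errorf("len(weighted_rand) = %d, want %d (nodes * stores_per_node)",
			len(weights), stores)
	}
	sum := 0.0
	for i, w := range weights {
		if math.IsNaN(w) || w < 0 || w > 1 {
			return fmt.Errorf("weight %v for store index %d is outside [0.0, 1.0]", w, i)
		}
		sum += w
	}
	// Assumed tolerance: exact equality is too strict for float sums.
	if math.Abs(sum-1) > 1e-9 {
		return fmt.Errorf("weights sum to %v, want 1", sum)
	}
	return nil
}

func main() {
	// Three nodes with one store each, so three weights are required.
	fmt.Println(validateWeightedRand([]float64{0.5, 0.3, 0.2}, 3, 1))
	fmt.Println(validateWeightedRand([]float64{0.5, 0.5}, 3, 1))
}
```

Requiring an exact length match removes the earlier ambiguity where stores beyond the end of the array silently received zero replicas.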

Resolves: cockroachdb#106311

Release note: None
craig bot pushed a commit that referenced this issue Aug 25, 2023
108099: asim: enforce len(weighted_rand) == number of stores r=kvoli a=wenyihu6

**asim: enable option to change static option settings**

This patch allows users to modify the settings for the static mode within the
randomized testing framework. The following command is now supported:
```
4. "change_static_option" [nodes=<int>] [stores_per_node=<int>]
[rw_ratio=<float64>] [rate=<float64>] [min_block=<int>] [max_block=<int>]
[min_key=<int64>] [max_key=<int64>] [skewed_access=<bool>] [ranges=<int>]
[placement_type=<gen.PlacementType>] [key_space=<int>]
[replication_factor=<int>] [bytes=<int64>] [stat=<string>] [height=<int>]
[width=<int>]
e.g. change_static_option nodes=2 stores_per_node=3 placement_type=skewed
	- change_static_option: modifies the settings for the static mode where no
	  randomization is involved. Note that this does not change the default
	  settings for any randomized generation.
	- nodes (default value is 3): number of nodes in the generated cluster
	- stores_per_node (default value is 1): number of stores per node in the
	  generated cluster
	- rw_ratio (default value is 0.0): read-write ratio of the generated load
	- rate (default value is 0.0): rate at which the load is generated
	- min_block (default value is 1): min size of each load event
	- max_block (default value is 1): max size of each load event
	- min_key (default value is int64(1)): min key of the generated load
	- max_key (default value is int64(200000)): max key of the generated load
	- skewed_access (default value is false): if true, the workload key
	  generator is skewed (zipf)
	- ranges (default value is 1): number of generated ranges
	- key_space (default value is 200000): keyspace for the generated range
	- placement_type (default value is gen.Even): type of distribution for how
	  ranges are distributed across stores
	- replication_factor (default value is 3): number of replicas for each range
	- bytes (default value is int64(0)): size of each range in bytes
	- stat (default value is "replicas"): specifies the output to be plotted
	  for the verbose option
	- height (default value is 15): height of the plot
	- width (default value is 80): width of the plot
```

In addition, verbose=(static_settings) can now be used to display the settings
used for static options, where no randomization is involved.

Part of: #106311
Release note: None

----

**asim: enforce len(weighted_rand) == number of stores**

Previously, we enforced that the length of a given `weighted_rand` could not
exceed the number of stores. This was challenging for users, as they might not
know the cluster configuration that would be generated and thus would not know
the number of stores. In addition, if the length of `weighted_rand` was less
than the total number of stores, any stores outside of the `weighted_rand` range
would simply have zero replicas. This could lead to confusion.

To improve user control, this patch disables the use of weighted_rand with
randomized cluster generation. Requirements to use weighted_rand:
1. use the static option for cluster generation
2. specify nodes (default: 3) and stores_per_node (default: 1) through
change_static_option
3. ensure len(weighted_rand) == number of stores == nodes * stores_per_node

In addition to these new rules, the following existing requirements remain in
place:
1. weighted_rand should only be used with placement_type=weighted_rand and vice
versa
2. each element in the array must be a weight in the range [0.0, 1.0], with
each element corresponding to a store
3. the weights in the array must sum to 1

Resolves: #106311
Release note: None

109461: sql: add error and reporting when unable to fix malformed quantile r=rharding6373 a=rharding6373

This PR captures quantile information for future reproduction when we encounter a malformed quantile that we're unable to fix, instead of inducing an inactionable panic.

Epic: None
Fixes: #109060

Release note: None

Co-authored-by: wenyihu6 <wenyi.hu@cockroachlabs.com>
Co-authored-by: rharding6373 <rharding6373@users.noreply.github.com>
@craig craig bot closed this as completed in 667599b Aug 25, 2023
KV automation moved this from Current Milestone / In Progress to Closed Aug 25, 2023