110150: cli: fix debug pebble commands on encrypted stores r=RaduBerinde a=RaduBerinde

Currently, the `debug pebble` commands only work correctly on an
encrypted store if the encrypted store's path is `cockroach-data` or
the store directory is passed using `--store` (in addition to being
passed to the pebble subcommand itself). What's worse, knowledge of
this subtle fact was lost among team members.

The root cause is that we are trying to resolve encryption options
using the server config. The difficulty is that there are many
different commands and there is no unified way to obtain the store
directory of interest.

To fix this, we create `autoDecryptFS`. This is a `vfs.FS`
implementation which automatically detects encrypted paths and uses
the correct unencrypted FS. It does this by keeping a list of known
encrypted stores (the ones passed via the `--enterprise-encryption`
flag) and checking whether any of these paths is an ancestor of a
path used in an operation. This new implementation replaces
`swappableFS` and `absoluteFS`.
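
As a hedged illustration of that ancestor check (a minimal sketch, not the
actual `autoDecryptFS` code — the helper and the store path below are
hypothetical), a path is routed to the decrypting FS exactly when one of
the registered encrypted store directories is an ancestor of it:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// encryptedStores would be populated from the --enterprise-encryption
// specs (compare the EncryptedStorePathsHook added in the diff below).
var encryptedStores = []string{"/data/encrypted-store"}

// encryptedStoreFor returns the registered encrypted store that contains
// path, or "" if the path belongs to no encrypted store and the default
// (unencrypted) FS should be used.
func encryptedStoreFor(path string) string {
	abs, err := filepath.Abs(filepath.Clean(path))
	if err != nil {
		return ""
	}
	for _, store := range encryptedStores {
		storeAbs, err := filepath.Abs(filepath.Clean(store))
		if err != nil {
			continue
		}
		if abs == storeAbs || strings.HasPrefix(abs, storeAbs+string(filepath.Separator)) {
			return storeAbs
		}
	}
	return ""
}

func main() {
	fmt.Println(encryptedStoreFor("/data/encrypted-store/MANIFEST-000001")) // routed to the decrypting FS
	fmt.Println(encryptedStoreFor("/data/plain-store/000001.sst"))          // empty: default FS
}
```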

We also improve the error message when we try to open an encrypted
store without setting up the key correctly.

Fixes: #110121

Release note (bug fix): `cockroach debug pebble` commands now work
correctly with encrypted stores that don't use the default
`cockroach-data` path, without having to also pass `--store`.

110173: sql: optimize persistedsqlstats flush size check r=j82w a=j82w

Problem:
The `persistedsqlstats` size check, which makes sure the table does not exceed 1.5x the max size, runs on every flush, and a flush happens on every node every 10 minutes by default. Because the check scans the entire table, it can cause serialization issues. The check is also unnecessary most of the time, because it should only fail if the compaction job is failing.

Solution:
1. Reduce the check to run only once an hour by default, and make the interval configurable.
2. The system table is split into 8 shards. Instead of counting the entire table, limit the check to a single shard (see the sketch below). This reduces the scope of the check and the chance of serialization issues.
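
As a hedged sketch of point 2 (an illustration only, not the CockroachDB implementation — the function name and the `shard_col` column are placeholders), counting a single hash shard and comparing it against a per-shard limit looks roughly like this:

```go
package sqlstatscheck

import (
	"context"
	"database/sql"
	"math/rand"
)

// exceedsSizeLimit counts the rows in one of the table's 8 hash shards and
// compares the count against a per-shard limit, instead of counting the
// whole table. "shard_col" stands in for the table's hash-shard column.
func exceedsSizeLimit(ctx context.Context, db *sql.DB, maxRowsPerShard int64) (bool, error) {
	const shardCount = 8
	shard := rand.Intn(shardCount) // spread repeated checks across shards
	var rows int64
	// Scanning only one shard keeps the read small, which reduces the
	// chance of serialization conflicts with concurrent flushes.
	err := db.QueryRowContext(ctx,
		`SELECT count(*) FROM system.statement_statistics WHERE shard_col = $1`,
		shard).Scan(&rows)
	if err != nil {
		return false, err
	}
	return rows > maxRowsPerShard, nil
}
```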

This was previously reverted because of a flaky test caused by the size check only being done on a single shard. The tests are updated to increase the limit and the number of statements to make sure every shard has data.

Fixes: #109619

Release note (sql change): The persistedsqlstats table max size check is now done once an hour instead of every 10 minutes. This reduces the risk of serialization errors on the statistics tables.

110264: c2c: add region constraints replication test r=msbutler a=msbutler

This patch adds a test that ensures that a replicating tenant's regional
constraints are obeyed in the destination cluster. This test serves as an
end-to-end test of the span config replication work tracked in #106823.

This patch also sets the following source system tenant cluster settings in
the c2c e2e framework: `kv.rangefeed.closed_timestamp_refresh_interval: 200ms`
and `kv.closed_timestamp.side_transport_interval: 50ms`. The CDC e2e tests
also set these cluster settings.

Informs #109059

Release note: None

110334: roachtest: ensure c2c/shutdown tests start destination tenant with online node r=stevendanna a=msbutler

An earlier patch, #110033, introduced a change that starts the destination tenant from any destination node, but did not consider whether that node was shut down. If the driver attempts to connect to the shut-down node, the roachtest fails. This patch ensures that the tenant is started on a node that will be online.

Fixes #110317

Release note: None

110364: upgrade: remove buggy TTL repair r=rafiss a=ecwall

Fixes #110363

The TTL descriptor repair in `FirstUpgradeFromReleasePrecondition` incorrectly
removes TTL fields from table descriptors because it wrongly compares the
table descriptor's TTL job schedule ID to a set of job IDs.

This change removes the repair until tests are properly added.

Release note (bug fix): Remove the buggy TTL descriptor repair. Previously,
upgrading from 22.2.X to 23.1.9 incorrectly removed TTL storage params from
tables (visible via `SHOW CREATE TABLE <ttl-table>;`) while attempting to
repair table descriptors. This resulted in the node that attempted to run the
TTL job crashing due to a panic caused by the missing TTL storage params.
Clusters currently on 22.2.X should NOT be upgraded to 23.1.9; they should
be upgraded directly to 23.1.10 or later.

110431: workflows: stale.yml: update action version r=RaduBerinde a=RaduBerinde

Currently, the stale bot closes issues as "completed" instead of "not planned". More recent versions of the action add a configuration setting for this, which defaults to "not planned". This commit updates the action to the latest version.

Epic: none
Release note: None

110451: engineccl: skip BenchmarkTimeBoundIterate r=RaduBerinde a=jbowens

This benchmark's assertions have recently become flaky.

Epic: none
Informs: #110299
Release note: none

Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
Co-authored-by: j82w <jwilley@cockroachlabs.com>
Co-authored-by: Michael Butler <butler@cockroachlabs.com>
Co-authored-by: Evan Wall <wall@cockroachlabs.com>
Co-authored-by: RaduBerinde <radu@cockroachlabs.com>
Co-authored-by: Jackson Owens <jackson@cockroachlabs.com>
6 people committed Sep 12, 2023
8 parents d98ce4f + c049ba0 + 23f829b + 0022185 + f951f56 + 9abac60 + 23112c1 + b7b111c commit 40dd180
Showing 25 changed files with 725 additions and 295 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/stale.yml
@@ -12,7 +12,7 @@ jobs:
issues: write
pull-requests: write
steps:
- uses: actions/stale@v3
- uses: actions/stale@v8
with:
operations-per-run: 1000
repo-token: ${{ secrets.GITHUB_TOKEN }}
7 changes: 7 additions & 0 deletions pkg/ccl/cliccl/debug.go
@@ -136,6 +136,13 @@ with their env type and encryption settings (if applicable).
&storeEncryptionSpecs, cliflagsccl.EnterpriseEncryption)

cli.PopulateStorageConfigHook = fillEncryptionOptionsForStore
cli.EncryptedStorePathsHook = func() []string {
var res []string
for _, spec := range storeEncryptionSpecs.Specs {
res = append(res, spec.Path)
}
return res
}
}

// fillEncryptionOptionsForStore fills the StorageConfig fields
1 change: 1 addition & 0 deletions pkg/ccl/storageccl/engineccl/BUILD.bazel
@@ -51,6 +51,7 @@ go_test(
"//pkg/storage/enginepb",
"//pkg/testutils",
"//pkg/testutils/datapathutils",
"//pkg/testutils/skip",
"//pkg/testutils/storageutils",
"//pkg/testutils/testfixtures",
"//pkg/util/encoding",
2 changes: 2 additions & 0 deletions pkg/ccl/storageccl/engineccl/bench_test.go
@@ -21,6 +21,7 @@ import (
"github.com/cockroachdb/cockroach/pkg/settings/cluster"
"github.com/cockroachdb/cockroach/pkg/storage"
"github.com/cockroachdb/cockroach/pkg/testutils"
"github.com/cockroachdb/cockroach/pkg/testutils/skip"
"github.com/cockroachdb/cockroach/pkg/testutils/testfixtures"
"github.com/cockroachdb/cockroach/pkg/util/encoding"
"github.com/cockroachdb/cockroach/pkg/util/hlc"
@@ -172,6 +173,7 @@ func runIterate(
}

func BenchmarkTimeBoundIterate(b *testing.B) {
skip.WithIssue(b, 110299)
for _, loadFactor := range []float32{1.0, 0.5, 0.1, 0.05, 0.0} {
b.Run(fmt.Sprintf("LoadFactor=%.2f", loadFactor), func(b *testing.B) {
b.Run("NormalIterator", func(b *testing.B) {
51 changes: 39 additions & 12 deletions pkg/ccl/streamingccl/replicationtestutils/testutils.go
@@ -53,22 +53,25 @@ type srcInitExecFunc func(t *testing.T, sysSQL *sqlutils.SQLRunner, tenantSQL *s
type destInitExecFunc func(t *testing.T, sysSQL *sqlutils.SQLRunner) // Tenant is created by the replication stream

type TenantStreamingClustersArgs struct {
SrcTenantName roachpb.TenantName
SrcTenantID roachpb.TenantID
SrcInitFunc srcInitExecFunc
SrcNumNodes int
SrcClusterSettings map[string]string
SrcTenantName roachpb.TenantName
SrcTenantID roachpb.TenantID
SrcInitFunc srcInitExecFunc
SrcNumNodes int
SrcClusterSettings map[string]string
SrcClusterTestRegions []string

DestTenantName roachpb.TenantName
DestTenantID roachpb.TenantID
DestInitFunc destInitExecFunc
DestNumNodes int
DestClusterSettings map[string]string
DestClusterTestRegions []string
RetentionTTLSeconds int
TestingKnobs *sql.StreamingTestingKnobs
TenantCapabilitiesTestingKnobs *tenantcapabilities.TestingKnobs

MultitenantSingleClusterNumNodes int
MultitenantSingleClusterNumNodes int
MultiTenantSingleClusterTestRegions []string
}

var DefaultTenantStreamingClustersArgs = TenantStreamingClustersArgs{
@@ -133,6 +136,7 @@ func (c *TenantStreamingClusters) init() {
c.SrcSysSQL.Exec(c.T, `ALTER TENANT $1 SET CLUSTER SETTING sql.virtual_cluster.feature_access.manual_range_split.enabled=true`, c.Args.SrcTenantName)
c.SrcSysSQL.Exec(c.T, `ALTER TENANT $1 SET CLUSTER SETTING sql.virtual_cluster.feature_access.manual_range_scatter.enabled=true`, c.Args.SrcTenantName)
c.SrcSysSQL.Exec(c.T, `ALTER TENANT $1 SET CLUSTER SETTING sql.virtual_cluster.feature_access.zone_configs.enabled=true`, c.Args.SrcTenantName)
c.SrcSysSQL.Exec(c.T, `ALTER TENANT $1 SET CLUSTER SETTING sql.virtual_cluster.feature_access.multiregion.enabled=true`, c.Args.SrcTenantName)
if c.Args.SrcInitFunc != nil {
c.Args.SrcInitFunc(c.T, c.SrcSysSQL, c.SrcTenantSQL)
}
@@ -284,9 +288,28 @@ func CreateServerArgs(args TenantStreamingClustersArgs) base.TestServerArgs {
}

func startC2CTestCluster(
ctx context.Context, t *testing.T, serverArgs base.TestServerArgs, numNodes int,
ctx context.Context, t *testing.T, serverArgs base.TestServerArgs, numNodes int, regions []string,
) (*testcluster.TestCluster, url.URL, func()) {

params := base.TestClusterArgs{ServerArgs: serverArgs}

makeLocality := func(locStr string) roachpb.Locality {
return roachpb.Locality{Tiers: []roachpb.Tier{{Key: "region", Value: locStr}}}
}
if len(regions) == 1 {
params.ServerArgs.Locality = makeLocality(regions[0])
}
if len(regions) > 1 {
require.Equal(t, len(regions), numNodes)
serverArgsPerNode := make(map[int]base.TestServerArgs)
for i, locality := range regions {
param := serverArgs
param.Locality = makeLocality(locality)
serverArgsPerNode[i] = param
}
params.ServerArgsPerNode = serverArgsPerNode
}

c := testcluster.StartTestCluster(t, numNodes, params)
c.Server(0).Clock().Now()
// TODO(casper): support adding splits when we have multiple nodes.
@@ -303,7 +326,7 @@ func CreateMultiTenantStreamingCluster(

serverArgs := CreateServerArgs(args)
cluster, url, cleanup := startC2CTestCluster(ctx, t, serverArgs,
args.MultitenantSingleClusterNumNodes)
args.MultitenantSingleClusterNumNodes, args.MultiTenantSingleClusterTestRegions)

destNodeIdx := args.MultitenantSingleClusterNumNodes - 1
tsc := &TenantStreamingClusters{
@@ -338,15 +361,15 @@ func CreateTenantStreamingClusters(
var srcCleanup func()
g.GoCtx(func(ctx context.Context) error {
// Start the source cluster.
srcCluster, srcURL, srcCleanup = startC2CTestCluster(ctx, t, serverArgs, args.SrcNumNodes)
srcCluster, srcURL, srcCleanup = startC2CTestCluster(ctx, t, serverArgs, args.SrcNumNodes, args.SrcClusterTestRegions)
return nil
})

var destCluster *testcluster.TestCluster
var destCleanup func()
g.GoCtx(func(ctx context.Context) error {
// Start the destination cluster.
destCluster, _, destCleanup = startC2CTestCluster(ctx, t, serverArgs, args.DestNumNodes)
destCluster, _, destCleanup = startC2CTestCluster(ctx, t, serverArgs, args.DestNumNodes, args.DestClusterTestRegions)
return nil
})

@@ -449,8 +472,12 @@ func CreateScatteredTable(t *testing.T, c *TenantStreamingClusters, numNodes int
}

var defaultSrcClusterSetting = map[string]string{
`kv.rangefeed.enabled`: `true`,
`kv.closed_timestamp.target_duration`: `'1s'`,
`kv.rangefeed.enabled`: `true`,
// Speed up the rangefeed. These were set by squinting at the settings set in
// the changefeed integration tests.
`kv.closed_timestamp.target_duration`: `'100ms'`,
`kv.rangefeed.closed_timestamp_refresh_interval`: `'200ms'`,
`kv.closed_timestamp.side_transport_interval`: `'50ms'`,
// Large timeout makes test to not fail with unexpected timeout failures.
`stream_replication.job_liveness.timeout`: `'3m'`,
`stream_replication.stream_liveness_track_frequency`: `'2s'`,
1 change: 1 addition & 0 deletions pkg/ccl/streamingccl/streamingest/BUILD.bazel
@@ -162,6 +162,7 @@ go_test(
"//pkg/util/log",
"//pkg/util/protoutil",
"//pkg/util/randutil",
"//pkg/util/rangedesc",
"//pkg/util/span",
"//pkg/util/syncutil",
"//pkg/util/timeutil",
63 changes: 63 additions & 0 deletions pkg/ccl/streamingccl/streamingest/replication_stream_e2e_test.go
@@ -35,6 +35,7 @@ import (
"github.com/cockroachdb/cockroach/pkg/util/hlc"
"github.com/cockroachdb/cockroach/pkg/util/leaktest"
"github.com/cockroachdb/cockroach/pkg/util/log"
"github.com/cockroachdb/cockroach/pkg/util/rangedesc"
"github.com/cockroachdb/cockroach/pkg/util/syncutil"
"github.com/cockroachdb/cockroach/pkg/util/timeutil"
"github.com/cockroachdb/errors"
@@ -1135,3 +1136,65 @@ func TestLoadProducerAndIngestionProgress(t *testing.T) {
require.NoError(t, err)
require.Equal(t, jobspb.Replicating, ingestionProgress.ReplicationStatus)
}

// TestStreamingRegionalConstraint ensures that the replicating tenants regional
// constraints are obeyed during replication. This test serves as an end to end
// test of span config replication.
func TestStreamingRegionalConstraint(t *testing.T) {
defer leaktest.AfterTest(t)()
defer log.Scope(t).Close(t)

skip.UnderStressRace(t, "takes too long under stress race")

ctx := context.Background()
regions := []string{"mars", "venus", "mercury"}
args := replicationtestutils.DefaultTenantStreamingClustersArgs
args.MultitenantSingleClusterNumNodes = 3
args.MultiTenantSingleClusterTestRegions = regions
marsNodeID := roachpb.NodeID(1)

c, cleanup := replicationtestutils.CreateMultiTenantStreamingCluster(ctx, t, args)
defer cleanup()

producerJobID, ingestionJobID := c.StartStreamReplication(ctx)
jobutils.WaitForJobToRun(c.T, c.SrcSysSQL, jobspb.JobID(producerJobID))
jobutils.WaitForJobToRun(c.T, c.DestSysSQL, jobspb.JobID(ingestionJobID))

c.SrcTenantSQL.Exec(t, "CREATE DATABASE test")
c.SrcTenantSQL.Exec(t, `ALTER DATABASE test CONFIGURE ZONE USING constraints = '[+region=mars]', num_replicas = 1;`)
c.SrcTenantSQL.Exec(t, "CREATE TABLE test.x (id INT PRIMARY KEY, n INT)")
c.SrcTenantSQL.Exec(t, "INSERT INTO test.x VALUES (1, 1)")

srcTime := c.SrcCluster.Server(0).Clock().Now()
c.WaitUntilReplicatedTime(srcTime, jobspb.JobID(ingestionJobID))

checkLocalities := func(targetSpan roachpb.Span, scanner rangedesc.Scanner) func() error {
// make pageSize large enough to not affect the test
pageSize := 10000
init := func() {}

return func() error {
return scanner.Scan(ctx, pageSize, init, targetSpan, func(descriptors ...roachpb.RangeDescriptor) error {
for _, desc := range descriptors {
for _, replica := range desc.InternalReplicas {
if replica.NodeID != marsNodeID {
return errors.Newf("found table data located on another node %d", replica.NodeID)
}
}
}
return nil
})
}
}

srcCodec := keys.MakeSQLCodec(c.Args.SrcTenantID)
tableDesc := desctestutils.TestingGetPublicTableDescriptor(
c.SrcSysServer.DB(), srcCodec, "test", "x")
destCodec := keys.MakeSQLCodec(c.Args.DestTenantID)

testutils.SucceedsSoon(t,
checkLocalities(tableDesc.PrimaryIndexSpan(srcCodec), rangedesc.NewScanner(c.SrcSysServer.DB())))

testutils.SucceedsSoon(t,
checkLocalities(tableDesc.PrimaryIndexSpan(destCodec), rangedesc.NewScanner(c.DestSysServer.DB())))
}
5 changes: 3 additions & 2 deletions pkg/cli/BUILD.bazel
@@ -5,8 +5,8 @@ load("//pkg/testutils:buildutil/buildutil.bzl", "disallowed_imports_test")
go_library(
name = "cli",
srcs = [
"absolute_fs.go",
"auth.go",
"auto_decrypt_fs.go",
"cert.go",
"cli.go",
"client_url.go",
@@ -56,7 +56,6 @@ go_library(
"start_windows.go",
"statement_bundle.go",
"statement_diag.go",
"swappable_fs.go",
"testutils.go",
"tsdump.go",
"userfile.go",
@@ -238,6 +237,7 @@ go_library(
"@com_github_cockroachdb_errors//hintdetail",
"@com_github_cockroachdb_errors//oserror",
"@com_github_cockroachdb_logtags//:logtags",
"@com_github_cockroachdb_pebble//:pebble",
"@com_github_cockroachdb_pebble//objstorage/remote",
"@com_github_cockroachdb_pebble//tool",
"@com_github_cockroachdb_pebble//vfs",
@@ -309,6 +309,7 @@ go_test(
name = "cli_test",
size = "large",
srcs = [
"auto_decrypt_fs_test.go",
"cert_test.go",
"cli_debug_test.go",
"cli_test.go",
