
storage: enforce gc ttl strictly #45826

Merged
merged 3 commits into cockroachdb:master from ajwerner/strict-gc-ttl-enforcement
Mar 12, 2020

Conversation

ajwerner
Contributor

@ajwerner ajwerner commented Mar 7, 2020

This PR creates a cluster setting to control whether the GC TTL is enforced
"strictly". Strict GC enforcement is not a strong guarantee that we will
serve no user reads whatsoever at timestamps older than the current reading
of the node's clock less the GC TTL. Rather, it enables the behavior that,
most of the time, most requests will see an error if they use a timestamp
which is older than the GC TTL.

The main case in which the user will see an error is when the only reason a
read at an old timestamp would be allowed is a protected timestamp. The
process of verifying a protected timestamp prevents this case: users of
protected timestamps who verify the record before they operate, or who use
admin commands, will be fine.

In the presence of a protected timestamp, all requests can read back to
that protected timestamp.

Lease transfers are another possible source of data being visible at the
GC threshold rather than at the TTL.

Strict mode is controlled via a cluster setting.

Strict mode does not apply to admin commands or to system ranges.
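
By way of illustration, here is a minimal sketch of toggling the setting and
of a historical read that strict enforcement is intended to reject. The
setting name comes from this PR; the table name and exact error wording are
illustrative assumptions.

    -- Disable strict GC TTL enforcement; reads older than the TTL may then
    -- succeed as long as the data has not actually been garbage collected.
    SET CLUSTER SETTING kv.gc_ttl.strict_enforcement = false;

    -- Restore the default introduced by this PR (strict enforcement on).
    SET CLUSTER SETTING kv.gc_ttl.strict_enforcement = DEFAULT;

    -- With strict enforcement on, a read below now() - GC TTL is expected to
    -- fail rather than quietly depend on GC not yet having run
    -- (error wording approximate):
    SELECT * FROM t AS OF SYSTEM TIME '-25h';
    -- ERROR: batch timestamp ... must be after replica GC threshold ...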

The interesting thing in all of this is that the problem which prompted this
setting in the first place was backups being run at a period longer than the
GC TTL and pretty much never failing.
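
For instance, an incremental backup whose start time has fallen behind the
GC TTL is exactly the kind of read this change is meant to surface. A sketch
only; the table name, destination paths, and exact syntax for this version
are assumptions.

    -- If the previous full backup ran more than one GC TTL ago, the
    -- incremental scan must read below the GC threshold; under strict
    -- enforcement this now fails loudly instead of silently relying on
    -- GC not having run yet.
    BACKUP TABLE t TO 'nodelocal://1/backups/t-incr'
        AS OF SYSTEM TIME '-10s'
        INCREMENTAL FROM 'nodelocal://1/backups/t-full';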

Backups in 20.1 are (in theory) going to use protected timestamps. If they
successfully verify the protected timestamp then they'll still be able to
succeed an even higher percentage of the time! If we want to discourage this
behavior we could:

  • Have the protected timestamp subsystem enforce that clients can't protect
    data they currently view as expired. This is probably a bad idea because,
    interestingly enough, the protected timestamp subsystem ignores zone
    configs completely.

  • Have the higher levels try to do it. This seems better. The backup pulls
    the table descriptors and knows exactly the timestamp ranges it's backing
    up, and it probably also has access to zone configs. In a lot of cases I
    can imagine wanting to back up data that is expired; making that hard
    would probably be worse than the mistake it prevents. I might instead
    just warn loudly in this case.

What this will do is bring greater awareness of the GC TTL. If you don't
update or delete data frequently, the GC heuristic won't kick in for quite
some time.
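
Since this makes the configured TTL matter in practice, operators may want to
inspect or raise the TTL where longer history is needed. A sketch; the table
name and chosen value are illustrative.

    -- Show the current GC TTL (gc.ttlseconds) for a table's zone.
    SHOW ZONE CONFIGURATION FOR TABLE t;

    -- Keep two days of MVCC history (the default is 25 hours) so that daily
    -- incremental backups comfortably fit inside the TTL.
    ALTER TABLE t CONFIGURE ZONE USING gc.ttlseconds = 172800;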

Release justification: bug fixes and low-risk updates to new functionality

Release note (general change): GC TTLs will now be enforced by default.
This enforcement can be controlled by disabling `kv.gc_ttl.strict_enforcement`.

@cockroach-teamcity
Member

This change is Reviewable

@ajwerner ajwerner force-pushed the ajwerner/strict-gc-ttl-enforcement branch from 513fdfc to 3e5c6f5 on March 9, 2020 18:03
Member

@nvanbenschoten nvanbenschoten left a comment


Reviewed 16 of 16 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)


pkg/kv/kvserver/client_replica_test.go, line 2951 at r1 (raw file):

	defer leaktest.AfterTest(t)()

	// The unfortunate thing about this test is that the gcttl is in seconds and

How long does it take on average?


pkg/kv/kvserver/client_replica_test.go, line 3064 at r1 (raw file):

	t.Run("disable strict enforcement", func(t *testing.T) {
		sqlDB.Exec(t, `SET CLUSTER SETTING kv.gc_ttl.strict_enforcement = false`)
		defer sqlDB.Exec(t, `SET CLUSTER SETTING kv.gc_ttl.strict_enforcement = DEFAULT`)

Do you think it's worth structuring this test so that it will work regardless of whether kv.gc_ttl.strict_enforcement is defaulted to true or not (i.e. set to true at the beginning and avoid DEFAULT)?


pkg/kv/kvserver/client_replica_test.go, line 3104 at r1 (raw file):

		// Let's now release the protected timestamp, refresh past that, then
		// make sure that we've got enforcement. Then let's add a new one.
		// That one should not see itself respect because we haven't verified.

Were you intending to extend this test?


pkg/kv/kvserver/replica.go, line 458 at r1 (raw file):

		draining bool

		// protectedTimestampCache provides the state of the protected timestamp

s/protectedTimestampCache/cachedProtectedTS/


pkg/kv/kvserver/replica.go, line 737 at r1 (raw file):

	impliedThreshold := gc.CalculateThreshold(now, *r.mu.zone.GC)
	if threshold.Less(impliedThreshold) {
		threshold = impliedThreshold

nit: consider using .Forward() in here in both places.


pkg/kv/kvserver/replica_protected_timestamp.go, line 44 at r1 (raw file):

// makeUpdateFunc returns a function which will attempt to update the Replica's
// cached with cpts if it is non-empty. It assumes that the Replica is not

"with cpts"

this looks stale. Same issue below. I think you mean "ts".


pkg/kv/kvserver/replica_protected_timestamp.go, line 46 at r1 (raw file):

// cached with cpts if it is non-empty. It assumes that the Replica is not
// locked when the returned function is called.
func (ts *cachedProtectedTimestampState) makeUpdateFunc(r *Replica) func() {

I feel like this would be cleaner as a method on Replica instead of a closure capturing the cachedProtectedTimestampState reference and the Replica reference, but I guess this is fine if you prefer this structure.


pkg/kv/kvserver/replica_protected_timestamp.go, line 60 at r1 (raw file):

// clearIfNotNewer clears the state in cpts if it is not newer than the passed
// ts. This will enable the fastpath in the update func.

Could you make it clear that this is an optimization and describe how it works? It's pretty subtle that we're including this in the rlock path to avoid hitting the wlock path in makeUpdateFunc.


pkg/kv/kvserver/replica_protected_timestamp.go, line 105 at r1 (raw file):

earliestValidRecord

Bad copy-paste?


pkg/kv/kvserver/replica_protected_timestamp.go, line 123 at r1 (raw file):

				seen = true
			}
			return !seen

We lost the early return here. Don't we want that? It shouldn't be hard to add it back in.


pkg/kv/kvserver/replica_protected_timestamp.go, line 166 at r1 (raw file):

	// here. If we do so, update the replica's cache of its state. The guarantee
	// we provide is that if a record is successfully verified then the state will
	// be updated to include it or some protected timestamp which precedes it.

Precedes it or follows it? I thought we're ratcheting the timestamp in cachedProtectedTimestampState up.

@ajwerner ajwerner force-pushed the ajwerner/strict-gc-ttl-enforcement branch from 3e5c6f5 to 672b452 on March 9, 2020 20:46
Contributor Author

@ajwerner ajwerner left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/kvserver/client_replica_test.go, line 2951 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

How long does it take on average?

--- PASS: TestStrictGCEnforcement (1.75s)
    --- PASS: TestStrictGCEnforcement/strict_enforcement (0.00s)
    --- PASS: TestStrictGCEnforcement/disable_strict_enforcement (0.04s)
    --- PASS: TestStrictGCEnforcement/zone_config_changes_are_respected (0.08s)
    --- PASS: TestStrictGCEnforcement/system_ranges_are_unaffected (0.02s)
    --- PASS: TestStrictGCEnforcement/protected_timestamps_are_respected (0.04s)

pkg/kv/kvserver/client_replica_test.go, line 3064 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Do you think it's worth structuring this test so that it will work regardless of whether kv.gc_ttl.strict_enforcement is defaulted to true or not (i.e. set to true at the beginning and avoid DEFAULT)?

sure, done.


pkg/kv/kvserver/client_replica_test.go, line 3104 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Were you intending to extend this test?

I think I decided that there wasn't a lot of added value in this additional work.


pkg/kv/kvserver/replica.go, line 737 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: consider using .Forward() in here in both places.

Done here. The one on the other side is not a forward, it's a backwards if you will. I added a comment.


pkg/kv/kvserver/replica_protected_timestamp.go, line 46 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I feel like this would be cleaner as a method on Replica instead of a closure capturing the cachedProtectedTimestampState reference and the Replica reference, but I guess this is fine if you prefer this structure.

That's better.


pkg/kv/kvserver/replica_protected_timestamp.go, line 123 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We lost the early return here. Don't we want that? It shouldn't be hard to add it back in.

Yes, that was intentional. If we early return we may miss a different earliest valid record and we wouldn't be able to reliably update the state.


pkg/kv/kvserver/replica_protected_timestamp.go, line 166 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Precedes it or follows it? I thought we're ratcheting the timestamp in cachedProtectedTimestampState up.

Well, we're ratcheting readAt but we're not obviously changing the earliest record. Hence the confusion.

I've tried to make the comment clearer.

@ajwerner ajwerner force-pushed the ajwerner/strict-gc-ttl-enforcement branch 9 times, most recently from 75de851 to 8d488fa on March 11, 2020 13:49
@ajwerner
Contributor Author

@nvanbenschoten this is RFAL

@ajwerner ajwerner force-pushed the ajwerner/strict-gc-ttl-enforcement branch from 8d488fa to 60fdd29 on March 11, 2020 14:48
Member

@nvanbenschoten nvanbenschoten left a comment


:lgtm:

Reviewed 9 of 9 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner)


pkg/kv/kvserver/helpers_test.go, line 473 at r2 (raw file):

}

func (r *Replica) ReadProtectedTimestamps() {

Should this take a context.Context?

@ajwerner ajwerner force-pushed the ajwerner/strict-gc-ttl-enforcement branch 2 times, most recently from 1fc3bd0 to a1fa14b on March 11, 2020 20:34
Contributor Author

@ajwerner ajwerner left a comment


TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)


pkg/kv/kvserver/helpers_test.go, line 473 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Should this take a context.Context?

Done.

@ajwerner ajwerner force-pushed the ajwerner/strict-gc-ttl-enforcement branch 2 times, most recently from b99c9a1 to 984e50a on March 11, 2020 22:30
Member

@nvanbenschoten nvanbenschoten left a comment


:lgtm:

Reviewed 3 of 3 files at r3, 2 of 2 files at r4, 3 of 3 files at r5.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner)


pkg/ccl/changefeedccl/changefeed_test.go, line 2325 at r5 (raw file):

	) *roachpb.Error {
		if r, ok := ba.GetArg(roachpb.AdminVerifyProtectedTimestamp); ok {

nit: stray line


pkg/ccl/changefeedccl/changefeed_test.go, line 2327 at r5 (raw file):

			verifyRequestCh <- r.(*roachpb.AdminVerifyProtectedTimestampRequest)
			return roachpb.NewError(errors.New("boom"))

What error does AdminVerifyProtectedTimestamp return when it fails? Consider returning that here instead of this fake error so that readers of the test are more clear about why this might fail.


pkg/ccl/changefeedccl/changefeed_test.go, line 2350 at r5 (raw file):

			pts := cfg.ProtectedTimestampProvider
			// Make sure that the canceled job gets moved through its OnFailOrCancel
			// phase.

"phase and removes its protected timestamp."


pkg/jobs/jobs.go, line 819 at r4 (raw file):

func (sj *StartableJob) Cancel(ctx context.Context) error {
	defer sj.registry.unregister(*sj.ID())
	return sj.registry.CancelRequested(ctx, nil, *sj.id)

nit: why the sj.ID() vs sj.id?

This is important to unhook the job from the job registry.

Release justification: bug fixes and low-risk updates to new functionality

Release note: None
This is important to fix the ProtectedTimestamp tests in the face of a
strict GC policy. We should send off an async verification request when
creating protected timestamps due to scan boundaries, but that can wait
for a follow-up PR.

I was hoping to avoid this one too but it seems important.

Release justification: bug fixes and low-risk updates to new functionality

Release note: None
@ajwerner ajwerner force-pushed the ajwerner/strict-gc-ttl-enforcement branch from 984e50a to 2e0fa12 on March 11, 2020 22:48
Contributor Author

@ajwerner ajwerner left a comment


Thanks!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)


pkg/ccl/changefeedccl/changefeed_test.go, line 2325 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: stray line

Done.


pkg/ccl/changefeedccl/changefeed_test.go, line 2327 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

What error does AdminVerifyProtectedTimestamp return when it fails? Consider returning that here instead of this fake error so that readers of the test are more clear about why this might fail.

Done.


pkg/ccl/changefeedccl/changefeed_test.go, line 2350 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

"phase and removes its protected timestamp."

Done.


pkg/jobs/jobs.go, line 819 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: why the sj.ID() vs sj.id?

no reason, made it consistent with the rest of the file.

Contributor Author

@ajwerner ajwerner left a comment


bors r=nvanbenschoten

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)

@craig
Contributor

craig bot commented Mar 11, 2020

Build failed

@ajwerner
Contributor Author

Release justification.

bors r=nvanbenschoten

@craig
Contributor

craig bot commented Mar 11, 2020

Build failed

@ajwerner
Contributor Author

Flaked in something with protos and js?

bors r+

@craig
Contributor

craig bot commented Mar 11, 2020

Build failed

@ajwerner
Contributor Author

flaked on #28033

bors r+

@craig
Contributor

craig bot commented Mar 12, 2020

Build succeeded

@craig craig bot merged commit 793a920 into cockroachdb:master Mar 12, 2020
@pbardea
Contributor

pbardea commented Mar 16, 2020

@ajwerner does this close #45037?

@ajwerner
Contributor Author

Indeed!
