storage: dropping a large table will brick a cluster due to compactions #24029
Comments
Ok, I'm counting 22k range deletion tombstones in aggregate:
Broken down by node:
I'm surprised there's such a high variance; the replicas looked balanced when I dropped the table.
Breaking it down on a per-sst basis, most of the 4k ssts have zero tombstones. There are a handful that have hundreds:
Digging into individual values on 008819, we see that the end key of one range and the start key of the next are adjacent in cockroach terms but different at the rocksdb level. This is the start and end key of two adjacent tombstones:
If we adjusted our endpoints so that the tombstones lined up exactly, would rocksdb be able to coalesce them into a single value?
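To make the question concrete, here is a minimal sketch against the stock RocksDB C++ API (the keys are made up, not real CockroachDB encodings): two DeleteRange calls whose end and start keys match exactly, which is the situation where coalescing into a single tombstone would at least be conceivable.

```cpp
#include <cassert>
#include "rocksdb/db.h"

// Two range tombstones written separately. With exactly matching endpoints,
// they describe one contiguous span ["/Table/51/1/100", "/Table/51/1/300");
// the open question is whether RocksDB will ever represent them as one value.
void ClearTwoAdjacentRanges(rocksdb::DB* db) {
  rocksdb::WriteOptions wo;
  rocksdb::Status s1 = db->DeleteRange(wo, db->DefaultColumnFamily(),
                                       "/Table/51/1/100", "/Table/51/1/200");
  rocksdb::Status s2 = db->DeleteRange(wo, db->DefaultColumnFamily(),
                                       "/Table/51/1/200", "/Table/51/1/300");
  assert(s1.ok() && s2.ok());
}
```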
Let's make sure this becomes a variant of the
Do you think that would help that much? A sufficiently scattered table can always produce maximally discontiguous ranges.
True, but empirically it would make a big difference in this case. A very large fraction of the range tombstones in 008819 are contiguous (or would be with this change). Another upstream fix would be for rocksdb to prioritize sstables with multiple range tombstones for compaction. These are likely to be easy to compact away to the lowest level, and leaving them in higher levels is disproportionately expensive.
Note that I'm speculating about this - I haven't found any code in rocksdb that would join two consecutive range deletions. I'm not sure if it's possible - each range tombstone has a sequence number as a payload, but I think it may be possible for those to be flattened away on compaction.
The CPU profile showed that we're spending all our time inside the |
I experimented with a change that replaced the ClearRange with a (post-Raft, i.e. not WriteBatched) ClearIterRange. Presumably due to the high parallelism with which these queries are thrown at RocksDB by DistSender, the nodes ground to a halt, missing heartbeats and all. So this alone isn't an option. I'll run this again with limited parallelism, but that will likely slow it down a lot.
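Roughly, the difference at the RocksDB level looks like the following hedged sketch (function names are illustrative only, not Cockroach's actual implementation): ClearRange writes one range deletion tombstone, while ClearIterRange scans the span and writes a point tombstone per key.

```cpp
#include <memory>
#include "rocksdb/db.h"

// ClearRange-style: one range deletion tombstone covering [begin, end).
rocksdb::Status ClearRangeSketch(rocksdb::DB* db, const rocksdb::Slice& begin,
                                 const rocksdb::Slice& end) {
  return db->DeleteRange(rocksdb::WriteOptions(), db->DefaultColumnFamily(),
                         begin, end);
}

// ClearIterRange-style: scan [begin, end) and write a point tombstone per key.
// No range tombstones are created, but the write volume scales with the
// number of live keys, which is why issuing many of these in parallel
// hammers the cluster.
rocksdb::Status ClearIterRangeSketch(rocksdb::DB* db,
                                     const rocksdb::Slice& begin,
                                     const rocksdb::Slice& end) {
  rocksdb::ReadOptions ro;
  ro.iterate_upper_bound = &end;
  std::unique_ptr<rocksdb::Iterator> it(db->NewIterator(ro));
  for (it->Seek(begin); it->Valid(); it->Next()) {
    rocksdb::Status s = db->Delete(rocksdb::WriteOptions(), it->key());
    if (!s.ok()) return s;
  }
  return it->status();
}
```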
If you have time on your hands, you might also want to experiment with turning on the CompactOnDeletionCollectorFactory. Not my hacked-together version for range deletion tombstones, but the official upstream one for triggering compactions whenever an SST contains too many normal deletion tombstones within a window: https://github.com/facebook/rocksdb/blob/master/utilities/table_properties_collectors/compact_on_deletion_collector.h
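For reference, a minimal sketch of wiring that collector into the options; the window size and trigger values here are illustrative assumptions, not recommendations.

```cpp
#include "rocksdb/options.h"
#include "rocksdb/utilities/table_properties_collectors.h"

// Mark any SST for compaction if it accumulates too many point deletion
// tombstones within a sliding window of consecutive entries.
rocksdb::Options MakeOptionsWithDeletionTrigger() {
  rocksdb::Options options;
  const size_t kWindowSize = 128 * 1024;     // entries examined per window
  const size_t kNumDelsTrigger = 64 * 1024;  // deletions within the window
  options.table_properties_collector_factories.emplace_back(
      rocksdb::NewCompactOnDeletionCollectorFactory(kWindowSize,
                                                    kNumDelsTrigger));
  return options;
}
```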
…On Wed, Mar 21, 2018 at 11:53 AM, Tobias Schottdorf < ***@***.***> wrote:
I experimented with a change that replaced the ClearRange with a
(post-Raft, i.e. not WriteBatched) ClearIterRange. Presumably due to the
high parallelism with which these queries are thrown at RocksDB by
DistSender, the nodes ground to a halt, missing heartbeats and all. So
this alone isn't an option. I'll run this again with limited parallelism
but that will likely slow it down a lot.
Oooh, check this out: https://github.com/facebook/rocksdb/pull/3635/files
Good find! I'll try those out. Ping #17229 (comment)
I tried running with https://github.com/facebook/rocksdb/pull/3635/files (on top of stock). I then added @benesch's PR on top and restarted the nodes in the hope that I would see compaction activity while the node is stuck in initialization. However, that doesn't seem to happen; we only see a single thread maxing out one CPU:
Back to the
It might not be a popular suggestion, but we could explore reinstating the original changes to simply drop SSTables for large aggregate suggested compactions. This requires that we set an extra flag on the suggestion indicating it's a range of data which will never be rewritten. This is true after table drops and truncates, but not true after rebalances.
Or am I misunderstanding, and dropping the files would be independent of the range tombstone problem?
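If "simply drop SSTables" means something along the lines of RocksDB's DeleteFilesInRange (an assumption on my part, not necessarily what the original change did), a minimal sketch would be:

```cpp
#include "rocksdb/convenience.h"
#include "rocksdb/db.h"

// Delete whole SST files that lie entirely within [begin, end) rather than
// writing a range tombstone. Space is reclaimed immediately, but the data
// disappears out from under open snapshots, which is the concern raised in
// the next comment.
rocksdb::Status DropFilesForDroppedTable(rocksdb::DB* db,
                                         const rocksdb::Slice& begin,
                                         const rocksdb::Slice& end) {
  return rocksdb::DeleteFilesInRange(db, db->DefaultColumnFamily(),
                                     &begin, &end);
}
```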
I was thinking about that too; I'm just spooked by the unintended consequences this can have, as it pulls out the data even from open snapshots.
Do you expect it would alleviate the problem? I have a PR that does it if you want to try.
…On Wed, Mar 21, 2018 at 5:28 PM Tobias Schottdorf ***@***.***> wrote:
I was thinking about that too, I'm just spooked by the unintended
consequences this can have as it pulls out the data even from open
snapshots.
It's likely that it would alleviate the problem, but actually introducing this is not an option this late in the 2.0 cycle, so I'm investing my energies elsewhere. That said, if you want to try this, go ahead! It would be fun to see it in action.
Like I said in issue facebook/rocksdb#3634, that PR can only solve non-first-seek performance issues, so @benesch is right: the PR facebook/rocksdb#3635 cannot solve your problem, because cockroachdb requires multiple iterators and each iterator only does a small number of seeks. For this issue, I have some tentative suggestions for consideration:
Thanks @lingbin. We're aware of these options, but it's tricky for us to implement them because we've so far relied on RocksDB giving us consistent snapshots, and we have various mechanisms that check that the multiple copies of the data we keep are perfectly synchronized. That said, somewhere down the road we're going to have to use some technique that involves either vastly improving the performance of deletion tombstone seeks (what @benesch said upstream),
@bdarnell, re the below:

"If we adjusted our endpoints so that the tombstones lined up exactly, would rocksdb be able to coalesce them into a single value?"

For a hail-mary short-term fix, this seems to be a promising avenue if it's true. Were you ever able to figure out if that should be happening? At least for testing, I can make these holes disappear, but it would be good to know if it's any good in the first place.

I'm also curious what the compactions necessary to clean up this table would do even if seeks weren't affected at all. Running a manual compaction takes hours on this kind of dataset; we were implicitly assuming that everything would just kind of evaporate and it would be quick, but it doesn't seem to be what's happening here. Maybe this is due to the "holes" we leave in the keyspace, but it's unclear to me. We need this to be reasonably efficient or the ClearRange option isn't one at all, even without the seek problem. Something to investigate.

Ping @a-robinson just in case anything from #21528 is applicable here. I believe the dataset there was much smaller, right?

In other news, I ran the experiment with a) ClearIterRange b) tombstone-sensitive compaction and the cluster was a flaming pile of garbage for most of the time, but came out looking good:

I'm not advocating that that should be our strategy for 2.0, but it's one of the so far equally bad strategies.
Thinking more about ClearRange coalescence, it's not clear how RocksDB would manage it when they come in as separate commands. If you have ClearRange [a, b) @ t1 and ClearRange [b, c) @ t3, you'd need to preserve both of them for proper handling of any keys @ t2.
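To make the hazard concrete, here is a minimal sketch against the stock RocksDB C++ API; the keys are made up, and t1/t2/t3 stand in for RocksDB sequence numbers.

```cpp
#include "rocksdb/db.h"

// The Put lands between the two range deletions in sequence-number order.
void WhyCoalescingIsHard(rocksdb::DB* db) {
  rocksdb::WriteOptions wo;
  rocksdb::ColumnFamilyHandle* cf = db->DefaultColumnFamily();
  db->DeleteRange(wo, cf, "a", "b");  // tombstone [a, b) at seqno ~t1
  db->Put(wo, "aa", "v");             // key in [a, b) written at seqno ~t2
  db->DeleteRange(wo, cf, "b", "c");  // tombstone [b, c) at seqno ~t3
  // A single merged [a, c) tombstone would need one sequence number. At t3 it
  // would wrongly hide "aa"@t2; at t1 it would wrongly expose whatever the
  // t3 tombstone was meant to delete. So both tombstones must be preserved.
}
```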
…On Thu, Mar 22, 2018 at 12:15 PM Tobias Schottdorf ***@***.***> wrote:
@bdarnell <https://github.com/bdarnell>, re the below:
If we adjusted our endpoints so that the tombstones lined up exactly,
would rocksdb be able to coalesce them into a single value?
For a hail-mary short term fix, this seems to be a promising venue if it's
true. Were you ever able to figure out if that should be happening? At
least for testing, I can make these holes disappear, but it would be good
to know if it's any good in the first place.
I'm also curious what the compactions necessary to clean up this table
would do even if seeks weren't affected at all. Running a manual compaction
takes hours on this kind of dataset; we were implicitly assuming that
everything would just kind of evaporate and it would be quick, but it
doesn't seem to be what's happening here. Maybe this is due to the "holes"
we leave in the keyspace, but it's unclear to me. We need this to be
reasonably efficient or the ClearRange option isn't one at all, even
without the seek problem. Something to investigate.
Ping @a-robinson <https://github.com/a-robinson> just in case anything
from #21528 <#21528> is
applicable here. I believe the dataset there was much smaller, right?
In other news, I ran the experiment with a) ClearIterRange b)
tombstone-sensitive compaction and the cluster was a flaming pile of
garbage for most of the time, but came out looking good:
[image: image]
<https://user-images.githubusercontent.com/5076964/37782015-46fbf22c-2dc8-11e8-90d2-fc851ffe1f40.png>
I'm not advocating that that should be our strategy for 2.0, but it's one
of the so far equally bad strategies.
RocksDB would keep track of the lowest sequence number that's still open in a snapshot, and at compaction time it joins SSTables whose sequence numbers are irrelevant.
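In API terms, that pinning happens via explicit snapshots; a minimal sketch (the key is made up):

```cpp
#include <string>
#include "rocksdb/db.h"

// An open snapshot pins a sequence number; tombstones written after it must
// keep their own sequence numbers so reads through the snapshot resolve
// correctly. Once the snapshot is released, those sequence numbers become
// irrelevant and a compaction is free to collapse the corresponding entries.
void SnapshotPinsSequenceNumbers(rocksdb::DB* db) {
  const rocksdb::Snapshot* snap = db->GetSnapshot();
  rocksdb::ReadOptions ro;
  ro.snapshot = snap;
  std::string value;
  rocksdb::Status s = db->Get(ro, "some-key", &value);
  (void)s;  // reads as of snap's sequence number, even past newer tombstones
  db->ReleaseSnapshot(snap);
}
```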
It took about 5 days of running. I don't think there's anything particularly applicable here that we learned from that issue. The main lessons were:
I haven't found any code that would do this, but I can't rule it out. It would need to happen on compaction but not, I think, on all uses of the range tombstones so I'm not sure I've looked in the right places. Might be worth trying an empirical test.
Might make sense as a cluster setting to give people some way to delete large tables without knocking the cluster out completely.
I'm on the fence about that -- with the range deletion tombstones, you can run a full compaction and you're back. That option does not exist with the ClearIterRange change; you're kinda screwed until it clears up by itself.
While investigating range deletion performance issues, we realized that large swaths of contiguous tombstones can cause massive performance issues when seeking. Seeks of more than 15 minutes (!) were observed on one cluster [0].

The problem is that every key overlapping the range deletion tombstone must be loaded from disk, decompressed, and compared with the sequence number of the tombstone until a key is found that is not covered by the tombstone. If contiguous keys representing hundreds of gigabytes are covered by tombstones, RocksDB will need to scan hundreds of gigabytes of data. Needless to say, performance suffers.

We have plans to improve seek performance in the face of range deletion tombstones upstream, but we can mitigate the issue locally, too. Iteration outside of tests is nearly always in the context of a range or some bounded span of local keys. By plumbing knowledge of the upper bound we care about for each scan to RocksDB, we can invalidate the iterator early, once the upper bound has been exceeded, rather than scanning over potentially hundreds of gigabytes of deleted keys just to find a key that we don't care about.

To ensure we don't forget to specify an upper bound in a critical path, this commit requires that all iterators are either prefix iterators or declare their upper bound. This makes iterating in tests rather irksome, but I think the tradeoff is worthwhile.

[0]: cockroachdb#24029 (comment)

Release note: None
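On the RocksDB side, the knob this plumbs through is essentially ReadOptions::iterate_upper_bound. A minimal sketch of a bounded scan (not CockroachDB's actual DBIterator code):

```cpp
#include <memory>
#include "rocksdb/db.h"

// Scanning a bounded span with an explicit upper bound lets RocksDB stop as
// soon as the bound is reached, instead of grinding through a huge run of
// range-deleted keys looking for the next live one.
void ScanWithUpperBound(rocksdb::DB* db, const rocksdb::Slice& start,
                        const rocksdb::Slice& end) {
  rocksdb::ReadOptions ro;
  ro.iterate_upper_bound = &end;  // iterator becomes !Valid() at `end`
  std::unique_ptr<rocksdb::Iterator> it(db->NewIterator(ro));
  for (it->Seek(start); it->Valid(); it->Next()) {
    // process it->key() / it->value()
  }
}
```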
The current implementation of range deletion tombstones in RocksDB suffers from a performance bug that causes excessive CPU usage on every read operation in a database with many range tombstones. Dropping a large table can easily result in several thousand range deletion tombstones in one store, resulting in an unusable cluster as documented in cockroachdb#24029.

Backport a refactoring of range deletion tombstones that fixes the performance problem. This refactoring has also been proposed upstream as facebook/rocksdb#4014. A more minimal change was also proposed in facebook/rocksdb#3992, and that patch better highlights the exact nature of the bug than the patch backported here, for those looking to understand the problem. But this refactoring, though more invasive, gets us one step closer to solving a related problem where range deletions can cause excessively large compactions (cockroachdb#26693). These large compactions do not appear to brick the cluster but undoubtedly have some impact on performance.

Fix cockroachdb#24029.

Release note: None
26789: Makefile: don't globally install binaries during lint r=BramGruneir a=benesch

The build system takes pains to scope all binaries it installs to REPO/bin to avoid polluting the user's GOPATH/bin with binaries that are internal to the Cockroach build system. The lint target was violating this rule by running 'go install ./pkg/...', which installed every package in the repository into GOPATH/bin. This commit adjusts the rule to run 'go build ./pkg/...' instead, which installs the necessary .a files into GOPATH/pkg without installing any binaries into GOPATH/bin.

Fix #26633.

Release note: None

26872: engine: require iterators to specify an upper bound r=petermattis,bdarnell a=benesch

As promised. The issue the first commit fixes took longer than I'd like to admit to track down. I've done my best to keep the patch as small as possible in case we want to backport to 2.0. The changes to the C++ DBIterator are unfortunately large, though. Open to less disruptive ways of plumbing the upper bound there, but I'm not seeing anything obvious.

---

While investigating range deletion performance issues, we realized that large swaths of contiguous tombstones can cause massive performance issues when seeking. Seeks of more than 15 minutes (!) were observed on one cluster [0].

The problem is that every key overlapping the range deletion tombstone must be loaded from disk, decompressed, and compared with the sequence number of the tombstone until a key is found that is not covered by the tombstone. If contiguous keys representing hundreds of gigabytes are covered by tombstones, RocksDB will need to scan hundreds of gigabytes of data. Needless to say, performance suffers.

We have plans to improve seek performance in the face of range deletion tombstones upstream, but we can mitigate the issue locally, too. Iteration outside of tests is nearly always in the context of a range or some bounded span of local keys. By plumbing knowledge of the upper bound we care about for each scan to RocksDB, we can invalidate the iterator early, once the upper bound has been exceeded, rather than scanning over potentially hundreds of gigabytes of deleted keys just to find a key that we don't care about.

To ensure we don't forget to specify an upper bound in a critical path, this commit requires that all iterators are either prefix iterators or declare their upper bound. This makes iterating in tests rather irksome, but I think the tradeoff is worthwhile.

[0]: #24029 (comment)

Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>
While debugging cockroachdb#24029 we discovered that RESTORE generates massive numbers of suggested compactions as it splits and scatters ranges. As the cluster rebalances, every removed replica leaves behind a range deletion tombstone and a suggested compaction over the keys it covered.

Occasionally, the replica chosen for rebalancing will be a member of the last range of the cluster. This range extends from wherever the restore has last split the table it's restoring to the very last key. Suppose we're restoring a table with 10,000 ranges evenly distributed across primary keys from 1-10,000. If a replica in the last range gets rebalanced early in the restore, say after only the first 500 ranges have been split off, at least one node in the cluster will have a suggested compaction for a range like the following:

    /Table/51/1/500 - /Max

This creates a huge problem! The restore will eventually create 9500 more ranges in that keyspan, each about 32MiB in size. Some of those ranges will necessarily rebalance back onto the node with the suggested compaction. By the time the compaction queue gets around to processing the suggestion, there might be hundreds of gigabytes within the range. In our 2TB (replicated) store dump, snapshotted immediately after a RESTORE, there were two such massive suggested compactions, each of which took over 1h to complete. This bogs down the compaction queue with unnecessary work and makes it especially dangerous to initiate a DROP TABLE (cockroachdb#24029), as the incoming range deletion tombstones will pile up until the prior compaction finishes, and the cluster grinds to a halt in the meantime.

The same problem happens whenever any replica is rebalanced away and back before the compaction queue has a chance to compact away the range deletion tombstone, though the impact is limited because the keyspan is smaller.

This commit prevents the compaction queue from getting bogged down with suggestions based on outdated information. At the time the suggestion is considered for compaction, the queue checks whether the suggested key span has any live keys. The presence of even a single live key is a good indicator that something has changed, usually that the replica, or one of its split children, has been rebalanced back onto the node. The compaction queue now deletes this suggestion instead of acting on it.

This is a crucial piece of the fix to cockroachdb#24029. The other half, cockroachdb#26449, involves rate-limiting ClearRange requests.

I suspect this change will have a nice performance boost for RESTORE. Anecdotally we've noticed that restores slow down over time. I'm willing to bet it's because nonsense suggested compactions start hogging disk I/O.

Release note: None
Demonstrate the performance problem with iterating through a range tombstone.

name                                        time/op
RocksDBDeleteRangeIterate/entries=10-8      6.09µs ± 3%
RocksDBDeleteRangeIterate/entries=1000-8     131µs ± 3%
RocksDBDeleteRangeIterate/entries=100000-8  12.3ms ± 3%

See cockroachdb#24029

Release note: None
26488: compactor: purge suggestions that have live data r=tschottdorf,bdarnell a=benesch

While debugging #24029 we discovered that RESTORE generates massive numbers of suggested compactions as it splits and scatters ranges. As the cluster rebalances, every removed replica leaves behind a range deletion tombstone and a suggested compaction over the keys it covered.

Occasionally, the replica chosen for rebalancing will be a member of the last range of the cluster. This range extends from wherever the restore has last split the table it's restoring to the very last key. Suppose we're restoring a table with 10,000 ranges evenly distributed across primary keys from 1-10,000. If a replica in the last range gets rebalanced early in the restore, say after only the first 500 ranges have been split off, at least one node in the cluster will have a suggested compaction for a range like the following:

    /Table/51/1/500 - /Max

This creates a huge problem! The restore will eventually create 9500 more ranges in that keyspan, each about 32MiB in size. Some of those ranges will necessarily rebalance back onto the node with the suggested compaction. By the time the compaction queue gets around to processing the suggestion, there might be hundreds of gigabytes within the range. In our 2TB (replicated) store dump, snapshotted immediately after a RESTORE, there were two such massive suggested compactions, each of which took over 1h to complete. This bogs down the compaction queue with unnecessary work and makes it especially dangerous to initiate a DROP TABLE (#24029), as the incoming range deletion tombstones will pile up until the prior compaction finishes, and the cluster grinds to a halt in the meantime.

The same problem happens whenever any replica is rebalanced away and back before the compaction queue has a chance to compact away the range deletion tombstone, though the impact is limited because the keyspan is smaller.

This commit prevents the compaction queue from getting bogged down with suggestions based on outdated information. At the time the suggestion is considered for compaction, the queue checks whether the suggested key span has any live keys. The presence of even a single live key is a good indicator that something has changed, usually that the replica, or one of its split children, has been rebalanced back onto the node. The compaction queue now deletes this suggestion instead of acting on it.

This is a crucial piece of the fix to #24029. The other half, #26449, involves rate-limiting ClearRange requests.

I suspect this change will have a nice performance boost for RESTORE. Anecdotally we've noticed that restores slow down over time. I'm willing to bet it's because nonsense suggested compactions start hogging disk I/O.

Release note: None

Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>
With the recent improvements to RocksDB range tombstones, the […]. Note that the test started slightly after 00:00. This test was using a freshly generated fixture. For some reason, cockroach was refusing to start using the old store fixtures, complaining about the version being too old. This doesn't make any sense and needs to be investigated.
Why doesn't it make any sense? I think that fixture was generated with 1.1, and you're running a 2.1 beta. |
In that graph, why does available stay constant while capacity decreases?
…On Fri, Jul 13, 2018 at 8:40 PM Nikhil Benesch ***@***.***> wrote:
For some reason, cockroach was refusing to start using the old store
fixtures complaining about the version being too old. This doesn't make any
sense and needs to be investigated.
Why doesn't it make any sense? I think that fixture was generated with
1.1, and this is a 2.1 beta.
See #27525. I generated that fixture with a binary at version v2.0-7.
ZFS. For ease of debugging, we take a ZFS snapshot after restoring the store fixtures, which makes subsequent runs of the test fast. The downside is that as we start to delete data, ZFS doesn't actually delete it off disk because it is referenced by the snapshot.
$ roachprod create USER-FOO -n10
$ roachprod run USER-FOO 'mkdir -p /mnt/data1/cockroach && gsutil -m -q cp -r gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/stores=10/$((10#$(hostname | grep -oE [0-9]+$)))/* /mnt/data1/cockroach'
Wait ~10m for stores to download. Then drop the 2TiB table:
The cluster explodes a few minutes later as RocksDB tombstones pile up. I can no longer execute any SQL queries that read from/write to disk.
Very closely related to #21901, but thought I'd file a separate tracking issue.
/cc @spencerkimball