
storage: support rolling upgrades of storage format changes #1780

Closed
tamird opened this issue Jul 23, 2015 · 10 comments
Labels: C-enhancement (Solution expected to add code/behavior + preserve backward-compat; pg compat issues are exception)

Comments

@tamird
Contributor

tamird commented Jul 23, 2015

In order to support (hypothetical) storage format changes, we need to be able to drain a node, let it rewrite its database, and undrain it. We need an administrative tool to put a node in the draining state, the rebalancer must respect this state (and probably move data off a draining node faster than it otherwise would), and the admin tool needs to be able to watch for the drain to be completed.

This feature complements the version number to be added in #2718.
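A rough Go sketch of that drain flow follows. All names here (NodeState, Cluster, DrainNode) are hypothetical stand-ins for whatever admin RPC surface we end up with, not the actual CockroachDB API:

```go
// Hypothetical sketch only: an admin tool marks a node as draining, the
// rebalancer respects that state, and the tool polls until the node holds
// no replicas before reporting the drain as complete.
package admin

import (
	"context"
	"fmt"
	"time"
)

// NodeState is the (hypothetical) drain state advertised by a node.
type NodeState int

const (
	StateActive NodeState = iota
	StateDraining
	StateDrained
)

// Cluster is a stand-in for the RPC surface the admin tool would use.
type Cluster interface {
	SetNodeState(ctx context.Context, nodeID int, state NodeState) error
	ReplicaCount(ctx context.Context, nodeID int) (int, error)
}

// DrainNode puts the node into the draining state and waits until the
// rebalancer has moved every replica elsewhere.
func DrainNode(ctx context.Context, c Cluster, nodeID int) error {
	if err := c.SetNodeState(ctx, nodeID, StateDraining); err != nil {
		return err
	}
	for {
		n, err := c.ReplicaCount(ctx, nodeID)
		if err != nil {
			return err
		}
		if n == 0 {
			return c.SetNodeState(ctx, nodeID, StateDrained)
		}
		fmt.Printf("node %d still holds %d replicas; waiting\n", nodeID, n)
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(5 * time.Second):
		}
	}
}
```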

Original message follows:

After we implement the current StorageKey proposal meant to address the Raft races, we should figure out a general system that will allow us to change the StorageKey encoding without breaking existing deployments.

The benefits here can't be overstated; together with #629, this will allow us to provide a 1.0-esque stability guarantee without appreciably handcuffing us from making rather invasive modifications.

Tagging for beta.

@tamird tamird added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Jul 23, 2015
@tamird tamird added this to the v0.1 (Beta) milestone Jul 23, 2015
@tbg
Member

tbg commented Jul 23, 2015

From what I understood, we're not yet sure that we're going to implement the StorageKey solution. @bdarnell is dusting off some of his WIP so that we can examine the issues again and weigh our options once more.

@spencerkimball
Member

@tschottdorf a little bit of recent discussion has put the StorageKey proposal back in vogue. The gist of the new thinking is that we're all having a really hard time convincing ourselves that the variously proposed measures actually handle the races we know about, much less the ones we don't know about.

Performance is the concern with implementing the proposal. On the other hand, we need to address correctness issues most urgently.

@bdarnell
Contributor

Even if we don't implement StorageKey, we should think before beta about how we're going to handle backwards-incompatible changes to our lowest-level storage systems, since they'll surely happen one way or another (see #1772).

@petermattis
Collaborator

In Gmail, backwards-incompatible changes were performed via the replication mechanism and rolling migrations of users. I think something similar would work here; it would be both general and lightweight in the code, though with a bit more work for administrators. The gist is that a node only knows how to read and write a single format. We store that format version inside the RocksDB engine and barf if the on-disk format differs from what the software supports. To change the format we do a rolling upgrade of the servers in the cluster: draining the data off a server, deleting the data so the server will start up empty, upgrading the software (or flipping a flag), and then bringing the server back up. We can add various complexities to this, but I much prefer this scheme to one that tries to simultaneously support multiple versions of our lowest-level formats.
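A minimal Go sketch of that version-stamp check, assuming a hypothetical Engine interface and key name rather than the real storage code:

```go
// Sketch only: a node reads and writes exactly one on-disk format, records
// that format version inside the store, and refuses to start if the stored
// version differs from the version the binary supports.
package storage

import (
	"encoding/binary"
	"fmt"
)

// supportedFormatVersion is the single on-disk format this binary understands.
const supportedFormatVersion uint64 = 2

// formatVersionKey is a hypothetical local key under which the version is stored.
var formatVersionKey = []byte("\x00\x00meta-format-version")

// Engine is a stand-in for the RocksDB wrapper's get/put surface.
type Engine interface {
	Get(key []byte) ([]byte, error) // returns nil, nil if the key is absent
	Put(key, value []byte) error
}

// checkFormatVersion would run at node startup. A brand-new (empty) store is
// stamped with the supported version; an existing store with any other
// version makes the node refuse to start, forcing the operator to drain,
// wipe, and re-add the node as part of a rolling upgrade.
func checkFormatVersion(eng Engine) error {
	val, err := eng.Get(formatVersionKey)
	if err != nil {
		return err
	}
	if val == nil {
		buf := make([]byte, 8)
		binary.BigEndian.PutUint64(buf, supportedFormatVersion)
		return eng.Put(formatVersionKey, buf)
	}
	if v := binary.BigEndian.Uint64(val); v != supportedFormatVersion {
		return fmt.Errorf("store has on-disk format %d, but this binary only supports %d; "+
			"drain and wipe this node before upgrading", v, supportedFormatVersion)
	}
	return nil
}
```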

@bdarnell
Contributor

@petermattis The situation is more complicated than it was with Gmail because we will have a mix of versions within a Raft replica group (this is necessarily true for any zero-downtime upgrade). We can swap out different Engines (and certain engine-level details; I think StorageKeys fall into this category) more or less transparently, but our hands are tied for anything that might be visible at or above the Raft level (for example, I don't think the schema version numbers proposed in #1772 could be added in this way).

A rolling update system also implies certain work that must be done in the version before the incompatible change: we must be able to administratively drain a node and ensure that new ranges will not be rebalanced onto it; the rebalancer should be able to avoid adding new load to old-version nodes as much as possible, etc.

Bigtable maintained support for its old on-disk formats and lazily converted everything to the new format as a part of the regular compaction cycle. This was also how schema changes related to locality groups were made.
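For illustration, a hedged Go sketch of that lazy-conversion approach; the version tags and functions are invented for this example, not Bigtable's or our actual code:

```go
// Sketch of lazy migration: the read path understands both the old and the
// new record encodings, while the compaction (rewrite) path always emits the
// new one, so data converges to the new format over time without a
// stop-the-world rewrite.
package migrate

import "fmt"

const (
	versionOld byte = 1
	versionNew byte = 2
)

// decode accepts either encoding; the leading byte is a hypothetical version tag.
func decode(raw []byte) ([]byte, error) {
	if len(raw) == 0 {
		return nil, fmt.Errorf("empty record")
	}
	switch raw[0] {
	case versionOld, versionNew:
		return raw[1:], nil
	default:
		return nil, fmt.Errorf("unknown record version %d", raw[0])
	}
}

// encodeNew always writes the new format.
func encodeNew(payload []byte) []byte {
	return append([]byte{versionNew}, payload...)
}

// rewriteDuringCompaction is what a compaction hook might do: re-encode every
// record it touches, upgrading old-format records as a side effect.
func rewriteDuringCompaction(records [][]byte) ([][]byte, error) {
	out := make([][]byte, 0, len(records))
	for _, r := range records {
		p, err := decode(r)
		if err != nil {
			return nil, err
		}
		out = append(out, encodeNew(p))
	}
	return out, nil
}
```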

@petermattis
Collaborator

Yes, we must be able to administratively drain a node. That functionality is necessary for administration independent of handling backwards-incompatible on-disk format changes.

@bdarnell
Contributor

The StorageKey proposal was rejected, so this issue is moot.

@tamird
Contributor Author

tamird commented Oct 14, 2015

Does it matter that the StorageKey proposal was rejected? This issue is about allowing our in-RocksDB data representation to change; it just happened to piggyback on the StorageKey name.

@bdarnell
Contributor

If we remove the StorageKey part then there's some overlap with #2718, although I guess there are parts here that are reasonably independent (e.g. the drain functionality). I'll reopen and edit the title and first message.

@bdarnell bdarnell reopened this Oct 14, 2015
@bdarnell bdarnell changed the title from "Figure out how to permit backwards-compatible StorageKey changes" to "Support rolling upgrades of storage format changes" Oct 14, 2015
@petermattis petermattis changed the title from "Support rolling upgrades of storage format changes" to "storage: support rolling upgrades of storage format changes" Feb 12, 2016
@petermattis petermattis modified the milestones: 1.0, Beta Mar 1, 2016
@spencerkimball
Member

I'm closing this issue due to advanced age and the advent of proposer-evaluated KV (although that's clearly not a complete fix for the issues raised here). Let's not have open issues for things that aren't currently a problem.
