3.11 vnode support #623

Open · wants to merge 1 commit into base: 3.11

Conversation

5 participants
@zmarois
Contributor

zmarois commented Nov 8, 2017

Fixes #618

I believe this adds support for vnodes to the 3.11 branch. I still have some integration testing to do, but I think I have unit tests covering the new features.

Should I also submit a PR against 3.x? How about master?

@zmarois

Contributor

zmarois commented Nov 8, 2017

A few overarching changes required:

  • Make startup not require a token: when using vnodes, we don't need to generate one.
  • Separate the notion of the initial token from the node's backup identifier. When using vnodes we still need a unique identifier for the backup, but all 256 initial tokens would be too long. We also still need to set initial_token at startup when loading from a backup, so we can't overload the token as the backup id; both uses remain even with vnodes.
    * Save the instance's tokens in the snapshot in order to start the node loading it with the initial_token that was used when taking the snapshot, step 5 in this overview. (Outdated because I stripped setting tokens from backup state.)
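A minimal sketch of the first two points above (class and method names here are illustrative, not Priam's actual API):

```java
// Illustrative sketch only: VNodeAwareTokenSelector is a hypothetical name,
// not a real Priam class. It shows the two separations described above.
public class VNodeAwareTokenSelector {
    private final int numTokens;

    public VNodeAwareTokenSelector(int numTokens) {
        this.numTokens = numTokens;
    }

    /** With vnodes (num_tokens > 1), Cassandra assigns tokens itself, so
     *  startup no longer requires Priam to hand over an initial_token. */
    public String initialTokenOrNull(String generatedToken) {
        return numTokens > 1 ? null : generatedToken;
    }

    /** The backup still needs a stable per-node identifier, derived from
     *  the instance's slot rather than from any token. */
    public String backupIdentifier(String rac, int slot) {
        return rac + "-" + slot;
    }
}
```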
@arunagrawal84

Contributor

arunagrawal84 commented Nov 13, 2017

@zmarois sorry, I have not yet looked into your pull request; I am on vacation. I have added reviewers to your pull request for the time being.
Also, to answer your question: yes, please submit a pull request for 3.x as well, because we are trying to keep 3.x and 3.11 in feature parity. 3.x is probably the most widely used branch. master is at this point too far from 3.x; we plan to merge 3.x into master, but that has been delayed due to new features/improvements.

@zmarois

Contributor

zmarois commented Nov 13, 2017

No worries @arunagrawal84. I've got a fork I'm building and integration testing with; I just found a problem today that I have since patched in, so I'm not 100% sure that this is done. I appreciate any feedback the reviewers you added have, and I'll comment when I have integration tested this better, including restoring from backups.

@zmarois

Contributor

zmarois commented Nov 22, 2017

I've now tested backups and restores using this change and have confirmed the data loads correctly. However, I also tried a restore without the complexity of setting initial_token to the value used when the backup was taken (step 5 in this overview), and the data still loaded correctly. Perhaps my comparison of the data performed a read-repair of all of it, which resolved data being initially loaded onto the wrong nodes. Further, I'm not sure how, after setting initial_token on restore, I'd then be able to add vnode-based nodes. So I'd like to remove the complexity of setting initial_token when restoring from a backup entirely, or at least make it configurable.

I've stripped that complexity, and filed the PR against the 3.x branch.

One more thing to note: I didn't do my testing against this branch itself, but a fork with a few more changes/features to make this easier to install into my current infrastructure, namely:

(Edited comment to change links to more concise commit links)

@costimuraru

costimuraru commented Nov 24, 2017

@zmarois nice work! I look forward to seeing this PR merged, so that we can benefit from VNodes.

String replacedIp = "";
String extraEnvParams = null;
while (true)
{
    try
    {
        token = DataFetcher.fetchData("http://127.0.0.1:8080/Priam/REST/v1/cassconfig/get_token");
        isExternallyDefinedToken = Boolean.parseBoolean(
                DataFetcher.fetchData("http://127.0.0.1:8080/Priam/REST/v1/cassconfig/is_externally_defined_token"));
        if (isExternallyDefinedToken) {

@zmarois

zmarois Nov 27, 2017

Contributor

I needed to allow token to not be set, but still distinguish between Priam being fully initialized but not having a token and it not running at all.

I could have instead differentiated these two cases with a 500 when it hasn't fully initialized, and made this startup agent treat 500s differently than a 204.
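That alternative could look roughly like this (a hypothetical classifier, not the actual startup agent code, which polls get_token in a loop):

```java
// Hypothetical sketch of the 500-vs-204 alternative described above; the
// names here are illustrative and do not appear in the real PR.
public class TokenPollState {
    public enum State { HAS_TOKEN, INITIALIZED_NO_TOKEN, NOT_READY }

    /** Classify one poll of /cassconfig/get_token. */
    public static State classify(int httpStatus, String body) {
        if (httpStatus == 200 && body != null && !body.isEmpty()) {
            return State.HAS_TOKEN;            // single-token mode
        }
        if (httpStatus == 204) {
            return State.INITIALIZED_NO_TOKEN; // vnodes: no token to hand out
        }
        return State.NOT_READY;                // 500 etc.: keep polling
    }
}
```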

@@ -104,7 +107,7 @@ public InstanceIdentity(IPriamInstanceFactory factory, IMembership membership, I
init();
}
public PriamInstance getInstance() {
PriamInstance getInstance() {

@zmarois

zmarois Nov 27, 2017

Contributor

Token was previously used both as the initial_token and as the identifier of a node's backup in the context of the entire cluster.

When using vnodes, I don't need an initial token, but I do still need an identifier for the backup, so I wanted to separate these two concerns. Because the backup identifier is redundant with data already in SDB, it felt better not to put it in the PriamInstance class, and instead use PriamInstance purely as the data-access DTO and InstanceIdentity as the business model.
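A rough sketch of that split (illustrative field names; the real classes carry much more state):

```java
// Illustrative sketch of the DTO/business-model split described above.
public class IdentitySplitSketch {
    /** Pure data-access DTO mirroring what is stored in SimpleDB. */
    static class PriamInstanceDto {
        final int id;
        final String rac;
        PriamInstanceDto(int id, String rac) { this.id = id; this.rac = rac; }
    }

    /** Business model: the backup identifier is derived from SDB data on
     *  demand instead of being stored redundantly or overloaded onto the token. */
    static class InstanceIdentitySketch {
        private final PriamInstanceDto instance;
        InstanceIdentitySketch(PriamInstanceDto instance) { this.instance = instance; }

        String getBackupIdentifier() {
            return instance.rac + "." + instance.id;
        }
    }
}
```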

/**
* Find closest token in the specified region
*/
private String closestToken(String token, String region) {

@zmarois

zmarois Nov 27, 2017

Contributor

I moved this into RestoreTokenSelector to keep the logic for determining the closest token in one place.

@@ -54,11 +62,52 @@ public RestoreTokenSelector(ITokenManager tokenManager, @Named("backup") IBackup
* Date for which the backups are available
* @return Token as BigInteger
*/
public BigInteger getClosestToken(BigInteger tokenToSearch, Date startDate) {
public BigInteger getClosestToken(String tokenToSearch, Date startDate)

@zmarois

zmarois Nov 27, 2017

Contributor

The changes here are purely to improve the error message when the closest-token lookup is invalid because the snapshot was taken with multiple tokens.
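The failure mode being guarded against can be sketched like this (hypothetical helper, not the real RestoreTokenSelector):

```java
import java.math.BigInteger;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of why getClosestToken now takes a String: a snapshot
// taken with vnodes carries many tokens, and "closest" is only well defined
// for a single one, so the lookup fails loudly with a clear message.
public class ClosestTokenSketch {
    static BigInteger closestToken(String tokenToSearch, List<BigInteger> candidates) {
        if (tokenToSearch.contains(",")) {
            throw new IllegalArgumentException(
                    "Snapshot was taken with multiple tokens (vnodes enabled): "
                    + tokenToSearch);
        }
        BigInteger search = new BigInteger(tokenToSearch);
        return candidates.stream()
                .min(Comparator.comparing((BigInteger t) -> t.subtract(search).abs()))
                .orElseThrow(() -> new IllegalStateException("no candidate tokens"));
    }
}
```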

@@ -15,19 +15,13 @@
*
*/
package com.netflix.priam.backup.identity;
package com.netflix.priam.identity;

@zmarois

zmarois Nov 27, 2017

Contributor

I moved these tests to the same package as the classes they test, so they have access to the now-package-private instance properties.

@arunagrawal84

Contributor

arunagrawal84 commented Dec 5, 2017

@zmarois wanted to give more context: this requires some changes in our infrastructure to accommodate vnodes, which is why it is taking longer than expected.

@zmarois

Contributor

zmarois commented Dec 5, 2017

@arunagrawal84 no worries, and I appreciate the check-in. I tried to write this so that, if you still have num_tokens set to 1 (which remains the default), Priam operates identically to how it did without this branch.
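Concretely, the backwards-compatible default versus the new opt-in looks like this in cassandra.yaml (token value illustrative):

```yaml
# Default, unchanged behaviour: Priam generates and assigns a single token.
num_tokens: 1
initial_token: 85070591730234615865843651857942052864

# Opt-in vnode mode added by this PR: raise num_tokens and leave
# initial_token unset; Cassandra then picks the token ranges itself.
# num_tokens: 256
# initial_token:
```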

If you have concrete feedback on something here that makes Priam not backwards-compatible, let me know; I'd gladly fix it. If it's just something internal, perhaps in your own fork, that I don't realize is incompatible with this, I understand.

@darkpssngr

darkpssngr commented Mar 22, 2018

Any updates on this ?

@zmarois

Contributor

zmarois commented Mar 22, 2018

FWIW, we have been running a build of our fork with this in our production environment for about two months. @darkpssngr feel free to try out a build of our 3.11 branch.

Note it does have a few more changes in it, namely:

Other than setting listen_address (which I don't fully understand), I don't think any of these would be breaking changes relative to the base Netflix repo if you don't explicitly opt into them, so you should be able to get back to base easily enough.

@darkpssngr

darkpssngr commented Mar 23, 2018

@zmarois Cool, thanks. I'll try it out. Ours is a new cluster, so I don't think backward compatibility is an issue here 👍

@zmarois

Contributor

zmarois commented Mar 23, 2018

Fair. Or consider it forwards compatibility with the base Netflix branch if you switch back to it.

@sbilello

sbilello commented Oct 22, 2018

@zmarois

Contributor

zmarois commented Oct 23, 2018

@sbilello unfortunately, no. My team no longer uses this. We did use it for about six months, and it did enable us to perform a restore of a lost node at a particularly troublesome time. However, we have since moved off of Cassandra; we just couldn't justify the operational cost of managing it compared to a hosted solution. (FWIW, we ended up going with a hosted Postgres database, trading some performance for a lower cost than hosted Cassandra.)

If you would like to pick it up, I'd gladly fill you in with whatever knowledge I have. My largest open concern was assigning the correct token ranges on restore, per step 5 in this overview. While on a full restore I saw all data eventually propagate, the cluster rebalanced it during the restore (which just takes too long), so I think something needs to be done to pre-assign the token ranges such that nodes start up owning the correct ranges for the restored data.
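One hedged sketch of what that pre-assignment might look like, assuming the backup had saved the node's tokens one per line in a tokens.txt file (Priam does not currently produce such a file):

```shell
# Hypothetical sketch: derive the yaml settings a restored node would need so
# it boots owning the snapshot's exact token ranges. tokens.txt is an assumed
# artifact of the backup, not something this PR writes.
printf '100\n200\n300\n' > tokens.txt   # stand-in for snapshot metadata

TOKENS=$(paste -sd, tokens.txt)
NUM=$(wc -l < tokens.txt | tr -d ' ')

# initial_token accepts a comma-separated list when vnodes are in use;
# auto_bootstrap: false makes the node claim these ranges instead of streaming.
printf 'num_tokens: %s\ninitial_token: %s\nauto_bootstrap: false\n' "$NUM" "$TOKENS"
```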

I believe @darkpssngr, per this comment, opted to use EBS snapshots. I don't know if that has worked well for them or whether they have tested a partial or full restore with it.
