Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 544: Bootup cookie validation considers an empty journal to signify a new bookie #712

Closed
wants to merge 9 commits into from

Conversation

sijie
Copy link
Member

@sijie sijie commented Nov 10, 2017

Descriptions of the changes in this PR:

  • read cookie from registration manager as the source-of-truth (for all potential options)
  • remove duplicated code between checkEnvironment and checkEnvironmentWithStorageExpansion

…nify a new bookie

- read cookie from registration manager as the source-of-truth (for all potential options)
- remove duplicated code between checkEnvironment and checkEnvironmentWithStorageExpansion
@asfgit
Copy link

asfgit commented Nov 10, 2017

FAILURE

--none--

@asfgit
Copy link

asfgit commented Nov 10, 2017

FAILURE

--none--

new ServerConfiguration(conf).setUseHostNameAsBookieID(true).setAdvertisedAddress(null)));
// advertised address
if (null != conf.getAdvertisedAddress()) {
addresses.add(getBookieAddress(conf));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the advertised address should be really "known" (does not throw UnknownHostException).
it is an address used by other machines to refer to "this" machine.
Is this a behaviour change ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the comment here. can you explain more?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in case of UnkownHostException while dealing with the advertised address we are failing the boot.
it is possible that this name is not known/resolvable to the bookie because it is a name from another "namespace"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussed on slack, there is no change from prev version so it is ok for me

b.shutdown();

conf.setAdvertisedAddress("test.bookkeeper.com");
Copy link
Contributor

@eolivelli eolivelli Nov 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should not use public DNS names, it would be possible that running the test suite the process does invalid/suspicious network requests
please change to something like
test.bookkeeper.com.localhost

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't do any name resolution. but I am fine to change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left some comments.
overall it is a very good change, code is cleaner

Copy link
Contributor

@ivankelly ivankelly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the algorithm can be simplified significantly, and if we can it will avoid bugs like this creeping in again later.

String instanceId = rm.getClusterInstanceId();

// 2. build the master cookie from the configuration
Cookie.Builder builder = Cookie.generateCookie(conf);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's kinda weird that we're not building the cookie from the passed in journalDirectories and ledgerDirs, but rather rebuilding these from the conf within generateCookie.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can remove the ledger directories and journal directories to use conf only. but there are already other methods using this static method. I would prefer not changing at this moment. If you want to clean up a bit more, a separate task to do the cleanup at all the places will be better.

}
// 4. check if the cookie appear in all the directories.
List<File> missedCookieDirs = new ArrayList<>();
List<Cookie> existedCookies = Lists.newArrayList();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

existingCookies

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

missedCookieDirs.addAll(ledgerResult.getLeft());
existedCookies.addAll(ledgerResult.getRight());

// 5. if it is not a new environment, find out the real directories that miss cookie data
if (!newEnv && missedCookieDirs.size() > 0) {
// If we find that any of the dirs in missedCookieDirs, existed
// previously, we stop because we could be missing data
// Also, if a new ledger dir is being added, we make sure that
// that dir is empty. Else, we reject the request
Set<String> existingLedgerDirs = Sets.newHashSet();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename to existedBefore

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is untouched. I will prefer not changing it. but if you have a strong preference, I can change it.

for (Cookie journalCookie : journalCookies) {
Collections.addAll(existingLedgerDirs, journalCookie.getLedgerDirPathsFromCookie());
for (Cookie cookie : existedCookies) {
Collections.addAll(existingLedgerDirs, cookie.getLedgerDirPathsFromCookie());
}
List<File> dirsMissingData = new ArrayList<File>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename to corruptOldDirs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same idea as above. this is untouched code.

for (Cookie journalCookie : journalCookies) {
Collections.addAll(existingLedgerDirs, journalCookie.getLedgerDirPathsFromCookie());
for (Cookie cookie : existedCookies) {
Collections.addAll(existingLedgerDirs, cookie.getLedgerDirPathsFromCookie());
}
List<File> dirsMissingData = new ArrayList<File>();
List<File> nonEmptyDirs = new ArrayList<File>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename to nonEmptyNewDirs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer not changing names for existing names at the same patch. If other people cherry-pick this change, people will know there is no name changing in existing logic.

conf,
rm
);
final boolean newEnv = null == rmCookie;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's a new env, we should just add cookies to all dirs and exit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer not. All the code below are validations. It has values to detect cases that cookie is missing in zookeeper, but local disks have old cookie that doesn't match the configuration.

Copy link
Contributor

@ivankelly ivankelly Nov 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this comment before I made the comment below. I should have been more explicit. In the case of new env, go through all dirs, ensure they are empty, and if they are, add all cookies, then run the common case validation to ensure that after the cookie creation things look good. Likewise for the case where there are new directories.

The problem in this code, and the root cause for this bug, was that this code is just too complex for what it does, in particular with multiple boolean flags running through it, switching on logic here and there, which makes it hard to run through the logic in your head. What I'm suggesting, in this and the comment below, is to move the boolean flag special cases into functions that handle the special cases, and then afterwards run the common validation logic, which should then be quite simple (i.e. it's just checking that all directories have a cookie which matches the cookie in zk).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, the bug was introduced because the sequence was reorder, at that time, the logic was pretty simple. I don't see the bug was related to the branches. The biggest problem of the existing code is about the sequence of validation. This patch enforces the sequence and leaves the main validation logic untouched.

The problem I have with branches - if you took at the code, there were two almost-same validation logic, one for storage expansion, the other one doesn't. but they look almost same, it is really hard to figure out what exactly is the difference between them.

the way why you think you need branches is the misuse of newEnv flag. line 507- 537 are needed for both newEnvironment and oldEnvironment. the only difference is newEnv doesn't have to throw exception when there are dirs missing cookies, but it also has to throw exception with nonEmptyDir. I will change this a bit. let's see if you are okay with that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main issue I had with the branching is the interleaving of the expansion stuff. It means there needs to be two missingCookieDir lists, one for dirs that don't have cookies, and one for dirs that don't have cookies, but existed previously so should have cookies, and then this problem is only rectified at the end of the validation. It would be clearer if the inconsistencies were handled at the start, and then we can check if a cookie is missing and if it is, throw immediately.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've adjust a bit on the verification. please check and let you know.

missedCookieDirs.add(dir);
}
// 4. check if the cookie appear in all the directories.
List<File> missedCookieDirs = new ArrayList<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this algorithm would be much simpler if moved the special cases for newEnv and expandStorage somewhere else.

In the case of new env, just add cookies to all the dirs.

In the case of expand storage:

  • check that all the dirs in rmCookie have a cookie equal to rmCookie
  • all the dirs in masterCookie that are not in rmCookie are empty
  • Update the cookie in all dirs and zk

Then in all cases run the verification that all dirs contain the same cookie as zk (in new and expand cases, they will have been updated).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please see my comment above regarding newEnv. I would prefer not simplifying the validation for newEnv. I removed the branches is to ensure validation in the common path. because if you do this branch, it is hard to enforce changes in future.

@sijie
Copy link
Member Author

sijie commented Nov 10, 2017

I rename the directories introduced by this pull request and leave the directories that this code doesn't touch as same before.

@asfgit
Copy link

asfgit commented Nov 10, 2017

FAILURE

--none--

return rmCookie;
}

private static Pair<List<File>, List<Cookie>> verifyAndGetMissingDirs(Cookie masterCookie,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it valid to have a bookie with more than one cookie? Shouldn't all cookies match the value read from registration manager?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the allowStorageExpansion, there might be at a point that there might be two cookies (e.g. a disk failure on storage expansion, some directory is not updated yet).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would imply that a bookie started one time with a unavailable directory while had previously been available, which sounds like an error case. The bookie shouldn't have come up with a disk missing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is 'a unavailable directory'? this method is only to verify existing cookies that available in local dirs and collecting directories that miss cookies. I am not sure we are on the same page about this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I'm saying is that there shouldn't be a case where one directory was down, the rest had their cookie upgraded, and then that directory came back. If a directory is down, it should will throw a FileNotFoundException, and prevent boot. So it shouldn't be possible to have different cookies on a single system in most cases.

The only case I can think of where more than one cookie exists is where a failure occurred in the process of writing the cookies. In this case, we should require manual intervention.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I'm saying is that there shouldn't be a case where one directory was down, the rest had their cookie upgraded, and then that directory came back. If a directory is down, it should will throw a FileNotFoundException, and prevent boot. So it shouldn't be possible to have different cookies on a single system in most cases.

the case you mentioned here won't happen. if a directory is missed, it is caught with directory missed cookie. If a directory missed cookie is in an old environment, that validation will fail.

The only case I can think of where more than one cookie exists is where a failure occurred in the process of writing the cookies. In this case, we should require manual intervention.

Thas was what I explained. We don't need manual intervention (and it makes things become complicated). A restart of the bookie should continue with storage expansion and succeed eventually. IMO, we should not change any operational behavior at this pull request because it only tends to address the root cause first.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thas was what I explained. We don't need manual intervention (and it makes things become complicated). A restart of the bookie should continue with storage expansion and succeed eventually.

This won't be expansion though, because if the cookie exists, then the directory is non-empty so can't be used for expansion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if a new directory was not written with a cookie during storage expansion. the storage expansion will continue to fix it eventually. if an existing dir's cookie is not updated during storage expansion, there are two cases : 1) a bookie starts again without expansion, that the verification will fail 2) a bookie starts again with expansion, it is okay to continue as well because the master cookie and rm cookie validation is still valid (because of the superset). that's still a healthy environment.

// If allowStorageExpansion option is set, we should
// make sure that the new set of ledger/index dirs
// is a super set of the old; else, we fail the cookie check
if (conf.getAllowStorageExpansion()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are checking this in multiple places. Surely there after the initialization process, the cookie for a single bookie should match in all directories and in zk, so this check can be done in one place after checking all existing directories.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what do you expect here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high level, I want to be able to understand what is going on in the checkEnv method without having to jump around all over the code :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you can find a better solution for this, I am fine with it.

} catch (FileNotFoundException fnf) {
missedCookieDirs.add(dir);
}
// 4.1 verify the cookies in journal directories
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this code ensuring that all directories from rmCookie are being checked and that the cookies are valid?
As I understand it, this is a byproduct of the fact that we called verifyIsSuperSet when reading rmCookie, but it takes a lot of digging to figure this out, and it's action from afar.

Also, do we allow journal expansion, or just ledger dir expansion?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • masterCookie should be the super set of rmCookie. masterCookies is generated from configuration, so all the directories are covered.
  • we don't allow journal expansion. the storage expansion was added before multiple journals.

Copy link
Contributor

@ivankelly ivankelly Nov 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the expansion case, masterCookie is a superset or rmCookie, so you'll end up with some directories missing cookies. It's not clear if these directories are part of rmCookie (i.e. preexisting directories) or newly added directories.

we don't allow journal expansion. the storage expansion was added before multiple journals.

This should be clear from reading the checkEnv method. Currently it is not

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear if these directories are part of rmCookie (i.e. preexisting directories) or newly added directories.

this is part of verifyDirsForOldEnvironment

This should be clear from reading the checkEnv method. Currently it is not

why do you need to know this from reading checkEnv. it should be part of the logic when handling old environment, which is 'verifyDirsForOldEnvironment'. All the features about 'expansion" should only be related to old environment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

verifyDirsForOldEnvironment does take rmCookie directly (only indirectly from it having been added to existing cookie). So its unclear which cookie was the canonical cookie for the old environment (IMO it should be rmCookie).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

masterCookie is the canonical cookie. we are using masterCookie for all the validations. rmCookie is the source-of-truth of whether it is a new environment or an old environment. so that's why the sequence is:

  • generating masterCookie from configuration
  • get rmCookie so we know whether it is a new environment or old environment
  • going through all the directories listed in masterCookie to collect existing cookies and directories that missed cookies.
  • if no directory is missing cookie, this is a healthy environment. we return. otherwise, we are getting into the fixing logic. The fixing logic can be creating cookies for new environment and expanding storage directory for old environment.

The logic flow is very clear and it is only one main logic and no branches. Again, personally I don't like branches here. when I was first initially working on this change, I spent almost half an hour to understand what are the differences between two branches. all the code was just duplicated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ugh, I just realized that it's possible to have multiple cookies in the registration manager, and they can possibly be all different :/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

verifyDirsForOldEnvironment is a bad name, it should really be verifyDirsForStorageExpansion.

I think what's making me most uncomfortable about the algorithm is the fact that the check to storage expansion is hidden in the subcalls.

How about?

            // 5. if there is no directory missing cookie, nothing need to be changed.                                                                                                                                                                                                    
            if (!missedCookieDirs.isEmpty()) {
                if (null == rmCookie) {
                    fixupCookieForNewEnvironment(missedCookieDirs);
                } else if (conf.isStorageExpansionEnabled()) {
                    fixupCookieForStorageExpansion(missedCookieDirs, existingCookies);
                }

                // verify that all directories and zk contain a cookie matching the master cookie
           }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if a user was switching between ip, hostname and advertised address, you can possibly have multiple cookies, because they are using different bookie ids. but this is fixed by my change. now it won't happen.

if you are always using same ip address, it is not possible to have multiple cookies.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. the difference between fixupCookieForNewEnvironment and fixupCookieForStorageExpansion are same at the actual fixing steps. They have to do the exactly same thing (even we add journal expansion in future). The only difference is on verification. I am fine if you want to combine verification and the actual fixing step together as a single 'fixup' call.

  2. I am not sure why do you need to verify all the directories again with master cookie. all the validation should already happen before fixup. if fixup failed, exceptions should already be thrown. I am fine with doing another around of verification by recursively calling the checkEnvironment again, there should be no duplicated verification code again.

@ivankelly
Copy link
Contributor

ivankelly commented Nov 13, 2017

For me, this algorithm should look like. I can have a go at turning it to this tomorrow if you want.

// read zk cookie

if (zk cookie is null) {
    // new directories should be empty
    // add cookie to all dirs and zk
} else if (master is superset of zk cookie and expansion enabled) {
    // old directories should match zk cookie
    // new directories should be empty
    // add master cookie to all dirs and zk
} else if (cookie needs upgrade) {
    // verify that zk cookie matches all dirs
    // add new cookie to add dirs and zk
}

// verify that zk cookie matches all dirs

@sijie
Copy link
Member Author

sijie commented Nov 13, 2017

@ivankelly please go for your proposal. I think we have a totally different thought on this. if you prefer your solution, I am fine with you doing that.

@ivankelly
Copy link
Contributor

I created a pull request into your branch.

sijie#1

But alas, lest we think the end is near, there is a bug (will comment now).

Versioned<Cookie> rmCookie = null;
for (BookieSocketAddress address : addresses) {
try {
rmCookie = Cookie.readFromRegistrationManager(rm, address);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sijie a bug. If there is a change in configuration, it is possible that we will write to a different znode than the one we read rmCookie from in this method. However, we will use the Version returned by this method, which will result in a BadVersionException.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How's that possible after this change? Can you explain the sequence causing this bug?

Copy link
Contributor

@ivankelly ivankelly Nov 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Say you have bookie, ip is 10.0.0.1, hostname is foobar.

First boot it comes up with setUseHostNameAsBookieID(false), writes its cookie to /ledgers/cookies/10.0.0.1:3181.
On second boot, some disks are added, allow expansion is enabled and setUseHostNameAsBookieID is set to true. In this case, it reads a cookie from /ledgers/cookies/10.0.0.1:3181, updates the cookie for the new storage, writes to all the dirs and then tries to write the cookie to /ledgers/cookies/foobar:3181. However, this last write, it will use the version it read from /ledgers/cookies/10.0.0.1:3181, which is wrong (it shouldn't even be a setData, it should be a create).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, I think a good solution would be to just delete the old cookie before creating the new one, using a multi().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If hostname is used and the IP cookie exists, the verification will already fail immediately at the first step. The second boot will never succeed. If you take a look at the test case changes, it covers your case described here.

@sijie
Copy link
Member Author

sijie commented Nov 14, 2017

@ivankelly I checked your changes. Frankly speaking, I don't see any real difference comparing to current pull request. However I am fine with those renames. Just one question, why do you remove the return in the middle? It is a stop if the environment is healthy, you shouldn't run the fixing logic.

Explicitly pass allowExpansion rather than the whole configuration, so
it's clear which methods are taking it into account.

Renames the verification for old directories in the case of storage
expansion.

Remove return statement from middle of method.
@asfgit
Copy link

asfgit commented Nov 16, 2017

FAILURE

--none--

@sijie
Copy link
Member Author

sijie commented Nov 17, 2017

retest this please

@asfgit
Copy link

asfgit commented Nov 17, 2017

FAILURE

--none--

@asfgit
Copy link

asfgit commented Nov 20, 2017

SUCCESS

--none--

@sijie
Copy link
Member Author

sijie commented Nov 20, 2017

all the tests passed now. @ivankelly @jiazhai please take a look

List<File> dirsMissingData = new ArrayList<File>();
List<File> nonEmptyDirs = new ArrayList<File>();
for (File dir : missedCookieDirs) {
if (existingLedgerDirs.contains(dir.getParentFile())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you comparing parent? it is possible they have the same common parent but different directories/mountpoints right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jvrao 1) this is existing behavior. 2) because the real directory is the current sub-dir of the configured directory, so the check needs to use parent file.

}
}
if (!nonEmptyDirs.isEmpty()) {
LOG.error("Not all the new directories are empty. New directories that are not empty are: " + nonEmptyDirs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a new environment, so all directories must be empty right? Why can't we simply go through all ledger/index and journal dirs ? not just missedcookie dirs also the error must not have "New" in it. Everything is new.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am trying to use same sequence for all the branches. If we start branching at the begin, it is going to maintain many branches, which I am trying to avoid in this change. because I had spent half an hour to understand what were the differences between branches.

journalDirectories, allLedgerDirs);
} else if (allowExpansion) {
// 5.2 storage is expanding
Set<File> knownDirs = knownDirs(existingCookies);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider changing method name to getExistingDirs or getKnownDirs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// 5.1 new environment: all directories should be empty
verifyDirsForNewEnvironment(missedCookieDirs);
stampNewCookie(conf, masterCookie, rm, Version.NEW,
journalDirectories, allLedgerDirs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this to above, these nested if statements become really easy to follow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

didn't quite follow here. which line you want me to move?

// this is either a:
// - new environment
// - a directory is being added
// - a directory has been corrupted/wiped, which is an error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the directory is wiped/corrupted verifyIsSuperSet() should catch it right??

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand your comment here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, verifyIsSuperSet() only checks if the configured directories are a superset of the directories in the previous cookie.

verifyAndGetMissingDirs(masterCookie,
allowExpansion, allLedgerDirs);
missedCookieDirs.addAll(ledgerResult.getLeft());
existingCookies.addAll(ledgerResult.getRight());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if it has old cookie? that is not related to the current bookie address? Don't you have all ledger directories in the rmCookie?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • if it has old cookie, it would fail the verification.
  • if it is not related to current bookie address, it would fail the verfication.
  • you don't have all the ledger dirs in the rmCookie, because of the storage expansion feature. the masterCookie has all the ledger dirs.

verifyAndGetMissingDirs(masterCookie,
allowExpansion, journalDirectories);
missedCookieDirs.addAll(journalResult.getLeft());
existingCookies.addAll(journalResult.getRight());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could just pass these two lists to the verifyAndGetMissingDirs() function so you can avoid creating lists inside the routine and adding here. Unnecessary GC.

Copy link
Member Author

@sijie sijie Nov 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • this is only done during startup. don't expect this will cause any gc problem
  • passing two lists isn't really a java practise.

List<File> missedCookieDirs = new ArrayList<>();
List<Cookie> existingCookies = Lists.newArrayList();
if (null != rmCookie) {
existingCookies .add(rmCookie.getValue());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space before .add

if no rmCookie, can't you just avoid all these cookie parsings and just check if ledger/index/and journal dirs are empty or not? That case becomes really simple and less convoluted. Otherwise, you can jump into the cookie verification storage expansion etc etc. Just the code path becomes really simple.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added space

List<File> missingDirs = Lists.newArrayList();
List<Cookie> existedCookies = Lists.newArrayList();
for (File dir : dirs) {
checkDirectoryStructure(dir);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do this at the top.

List<File> ledgerDirs = ledgerDirsManager.getAllLedgerDirs();
checkIfDirsOnSameDiskPartition(ledgerDirs);
List<File> indexDirs = indexDirsManager.getAllLedgerDirs();
checkIfDirsOnSameDiskPartition(indexDirs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not simply call checkIfDirsOnSameDiskPartition(allLedgerDirs)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can do that. I didn't touch this to keep the original logic unchanged.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed to check (allLedgerDirs)

@sijie
Copy link
Member Author

sijie commented Nov 22, 2017

@jvrao I address some of you comments and replied all your comments. please take a look and let me know your opinions.

@sijie sijie modified the milestones: 4.7.0, 4.6.0 Nov 22, 2017
@sijie
Copy link
Member Author

sijie commented Nov 22, 2017

@jvrao ping?

1 similar comment
@sijie
Copy link
Member Author

sijie commented Nov 26, 2017

@jvrao ping?

@jvrao
Copy link
Contributor

jvrao commented Nov 27, 2017

I made my comments; your changes look generally fine; if you believe you addressed comments its ok with me.

@sijie
Copy link
Member Author

sijie commented Nov 30, 2017

it seems the merge script hits this same issue as #769 here

add "lgtm +1" to make merge script work.

@sijie sijie closed this in 819443c Nov 30, 2017
sijie added a commit to sijie/bookkeeper that referenced this pull request Nov 30, 2017
… to signify a new bookie

Descriptions of the changes in this PR:

- read cookie from registration manager as the source-of-truth (for all potential options)
- remove duplicated code between checkEnvironment and checkEnvironmentWithStorageExpansion

Author: Sijie Guo <sijie@apache.org>
Author: Ivan Kelly <ivank@apache.org>

Reviewers: Ivan Kelly <ivank@apache.org>, Enrico Olivelli <eolivelli@gmail.com>, Venkateswararao Jujjuri (JV) <None>

This closes apache#712 from sijie/fix_cookie_issue, closes apache#544
@sijie sijie deleted the fix_cookie_issue branch July 16, 2018 02:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants