-
Notifications
You must be signed in to change notification settings - Fork 894
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ISSUE #1527: Make ExplicitLAC persistent #1532
Conversation
For persisting explicitLAC, we can follow the same approach as followed in persisting fencing information / stateBits (d69986c) in FileInfo file and special marker entry in Journal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The approach looks good to me.
We must add extensive testing of all moving parts.
We already have tests around calls from bookie to journal and around journal internals.
I feel we should have a minimal test which combines ExplicitLAC with every ledgerstorage, in particular DbLedgerManager which is not in use as SF.
We are treating Explicit LAC as a regular addentry, we have to implement/change stats in journal, otherwise they will appear as writes.
A thought: ExplicitLAC touches LedgerStorage, do we have to handle 'fencing', an option is to check flag and report it to the writer, so that the writer will fail?
I have thought about this and I think this is not needed because the writer will eventually fail at next write, and Explicit LAC as no way to report a failure to the writer.
@@ -1182,13 +1201,24 @@ public void recoveryAddEntry(ByteBuf entry, WriteCallback cb, Object ctx, byte[] | |||
} | |||
} | |||
|
|||
public void setExplicitLac(ByteBuf entry, Object ctx, byte[] masterKey) | |||
static ByteBuf createExplicitLACEntry(long ledgerId, ByteBuf explicitLac) { | |||
ByteBuf bb = Unpooled.buffer(8 + 8 + 4 + explicitLac.capacity()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can use a pooled bytebuf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm..i just followed createLedgerFenceEntry approach- https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/LedgerDescriptor.java#L56
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could change that place as well.
Journal will call "release" on the ByteBuf after writing the entry to the file
@eolivelli Thanks for going through WIProgress CR. I would like to get feedback on the approach. Yes I need to add tests and I'm working on it. Regarding fenced/setexplicitlac concern you raised, if I understood you correctly you are suggesting that we should have similar check like https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/Bookie.java#L1235 But I don't think that this check is needed for set explicitLac call, because ExplicitLacFlush is periodic delayed operation - https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/ExplicitLacFlushPolicy.java#L82 so by the time write handle pushes set explicitLac request (or set explicitLac requests might be in flight), ledger might have been recover opened (fenced), but it is legally ok for bookie to receive setExplicitLac call and set explicitLac accordingly. |
@reddycharan I agree there is no need to check fence flag. Let's move forward with current approch |
- Functional testcases for validating persisted explicitLAC - minor fixes
a348165
to
dda83cf
Compare
What about adding some test in this file, to narrow the scope of the test? This suite exercises and tests directly the interface between Bookie and Journal |
Could you also consider adding some test like these? I mean only unit tests around the 'Processor' for setExplicitLAC on bookie side. |
- added testcase for compatibility check with previous and new header versions of FileInfo.
Hey @eolivelli , considering the importance/sensitivity of this fix, I added tests at multiple levels -
So I guess, I added sufficient testing for this fix. |
rerun bookkeeper-server bookie tests |
} else { | ||
throw new IOException("Invalid journal. Contains explicitLAC " + " but layout version (" | ||
+ journalVersion + ") is too old to hold this"); | ||
} | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens in the case of rollback? The software won't understand the special entryId and falls into this section right? Do we handle it right in handle.addentry?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this journalVersion is the version of journalfile but not "JournalChannel.CURRENT_JOURNAL_FORMAT_VERSION" (current code version), we would get to this 'else' case if we found a journal older than V5 but it is having entry where entryId METAENTRY_ID_LEDGER_EXPLICITLAC, which is not possible and expected in any case. Hence we are throwing IOException here just like https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/Bookie.java#L783
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean any special entryIds are going to be treated as regular entryIds in this case. So my request is to create a mask to identify SPECIAL entries and filter them out. If there is anything special that the code doesn't know, flag an error and continue. I know this doesn't solve the previous release but future code changes should be good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on having a mask as what JV suggested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -206,9 +218,11 @@ public synchronized void readHeader() throws IOException { | |||
throw new IOException("Missing ledger signature while reading header for " + lf); | |||
} | |||
int version = bb.getInt(); | |||
if (version != HEADER_VERSION) { | |||
if (version > CURRENT_HEADER_VERSION) { | |||
throw new IOException("Incompatible ledger version " + version + " while reading header for " + lf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this prevents software rollback period. isn't it? Should we consider coming up ? what are the downsides of that? yeah we may not read the explicitLAC other than that, is there anything else? I am worried if there are any situations where we make the cluster not usable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code change is fine. I spoke to @sijie and @eolivelli this morning. We needed a bigger gate to identify ondisk changes. I will open an issue for that.
@sijie and @eolivelli can you please complete the review. Its been waiting for quite sometime. |
For me is quite okay but I want to merge it only after cutting 4.8 branch. |
@@ -121,6 +123,7 @@ | |||
static final long METAENTRY_ID_LEDGER_KEY = -0x1000; | |||
static final long METAENTRY_ID_FENCE_KEY = -0x2000; | |||
public static final long METAENTRY_ID_FORCE_LEDGER = -0x4000; | |||
static final long METAENTRY_ID_LEDGER_EXPLICITLAC = -0x8000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to bump the journal version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, bumping journal version
@@ -784,6 +787,23 @@ public void process(int journalVersion, long offset, ByteBuffer recBuff) throws | |||
+ " but layout version (" + journalVersion | |||
+ ") is too old to hold this"); | |||
} | |||
} else if (entryId == METAENTRY_ID_LEDGER_EXPLICITLAC) { | |||
if (journalVersion >= JournalChannel.V5) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be JournalChannel.V6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, bumped journal version
} else { | ||
throw new IOException("Invalid journal. Contains explicitLAC " + " but layout version (" | ||
+ journalVersion + ") is too old to hold this"); | ||
} | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on having a mask as what JV suggested.
bb.putInt(masterKey.length); | ||
bb.put(masterKey); | ||
bb.putInt(stateBits); | ||
if (this.headerVersion >= V1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be better to make fileInfo
version configurable in ServerConfiguration, as how we handled in Journal.
so people can upgrade binary but still sets the write version to be v0. so the bookie can run with new binary but writing old version fileinfo. this allows rollback binary.
after people verify the binary work, then can change the write version to v1. at this point, it can still rollback to configuration to v0 and using the new binary. in this approach, it gives some options for rollback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, adding config for this
@reddycharan I made a few comments regarding journal version and fileinfo version. @eolivelli if those versions are configurable, it is okay to include this change as part of 4.8. and it is actually better, because you can have code ready in 4.8 to understand the new version, but allow the new version still functioning in old version. this would provide a smooth upgrade/rollback process for users. |
also please add an entry "4.7.x to 4.8.0 upgrade" in this "Upgrade" guide and document that there are new journal version and file info version introduced. http://bookkeeper.apache.org/docs/latest/admin/upgrade/#46x-to-470-upgrade |
- bumped version of journal - introduced fileInfoFormatVersionToWrite config - testcases to validate combination of fileInfoFormatVersionToWrite and journalFormatVersionToWrite
@sijie @jvrao addressed review comments. Can you please check the new iteration I sent. regarding updating http://bookkeeper.apache.org/docs/latest/admin/upgrade/#46x-to-470-upgrade, I think it would be better and convenient to do it in different PR. Anyhow nothing is written so far for "4.7.x to 4.8.0 upgrade". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good to me.
I left a nit about a test but nothing really important.
Great work @reddycharan and @sijie !
For me it is okay to have in merged now in 4.8.
Let's wait for green light from CI
* to persist explicitLac, journalFormatVersionToWrite should be atleast | ||
* V6 and fileInfoFormatVersionToWrite should be atleast V1 | ||
*/ | ||
baseConf.setJournalFormatVersionToWrite(6); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@reddycharan maybe it is better to use defined constants for these values (I mean '6' here and '1' below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I thought about this. But "JournalChannel" is not public class (which contains defined constants) and similarly "FileInfo" is not public either.
I guess for the same reason actual values are used in other places aswell - https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/test/java/org/apache/bookkeeper/conf/TestBKConfiguration.java#L50
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, ok!
Thank you for your explanation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, ok!
Thank you for your explanation
retest this please |
It seems to me that the only remaining concern from Sijie is to upgrade the guide. |
@eolivelli created issue for upgrade doc - #1565. Will follow up with that once this is closed. |
retest this please |
rerun bookie tests |
rerun bookkeeper-server bookie tests |
@jvrao since you left a few comments here before, do you want to review this again? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Thanks all for your feedback on this work item. Gladly, tests have passed finally. |
Descriptions of the changes in this PR: For persisting explicitLAC, we can follow the same approach as followed in persisting fencing information / stateBits (d69986c) in FileInfo file and special marker entry in Journal. ExplicitLAC is kept in the memory and it can be lost in bookie reboot. Though it is an extreme corner case scenario, it can break one of the BK guarantees. " If you read once you can always read it". If all the bookies of the Write stripe were rebooted, it can loose its explicitLAC value and the client which was able to read the entry before the reboot, can't read it anymore. Master Issue: apache#1527 Author: cguttapalem <cguttapalem@salesforce.com> Reviewers: Enrico Olivelli <eolivelli@gmail.com>, Sijie Guo <sijie@apache.org>, Venkateswararao Jujjuri (JV) <None> This closes apache#1532 from reddycharan/storeexplicitlac, closes apache#1527
Descriptions of the changes in this PR:
For persisting explicitLAC, we can follow the
same approach as followed in persisting
fencing information / stateBits (d69986c)
in FileInfo file and special marker entry
in Journal.
Motivation
ExplicitLAC is kept in the memory and it can be lost in bookie reboot.
Though it is an extreme corner case scenario, it can break one of the BK guarantees.
" If you read once you can always read it".
If all the bookies of the Write stripe were rebooted, it can loose its explicitLAC value and the client
which was able to read the entry before the reboot, can't read it anymore.
Master Issue: #1527