Open should run recovery by default #774

ivankelly · 2017-11-23T18:58:23Z

This is how it has always been and is the safer option. Non-recovery
open is for tailing, which isn't the primary usecase for open. The
primary usecase is to recover state before writing new state, and this
requires that we fence and recover the ledger.

This is how it has always been and is the safer option. Non-recovery open is for tailing, which isn't the primary usecase for open. The primary usecase is to recover state before writing new state, and this requires that we fence and recover the ledger.

eolivelli · 2017-11-23T19:00:37Z

Initially I did do, but @sijie wanted this way.
For me are ok both modes

eolivelli

I think the tests will break. Did you run the suite?

sijie · 2017-11-23T19:06:41Z

yes we intentionally made no recovery as default. because that is what people would typically expect for a "single-writer, multiple-readers" semantic. If people want to open and recovery a ledger, withRecovery(true) should be used for people aware of what exactly happen regarding "recovery".

This is a new API we are adding for bookkeeper. so it doesn't need to carry the old behavior. so I am -1 on reverting the default back to true.

ivankelly · 2017-11-23T21:15:40Z

@sijie I don't consider defaulting to false to be safe behaviour. People will migrate to the new api from the old API, or at least this is what we hope will happen. If they see a openLedger, there'll be nothing to prompt them to add the withRecovery(true). When they see an openLedgerNoRecovery(), there will be a hint to put withRecovery(false). If they get it wrong like this, they will not get any notification until their data is corrupted.

This is how I came across this. I was updating the tutorial and I replaced the openLedger* calls. Everything compiled and ran fine, and even when the leader changed, everything "looked" ok. It was only when I noticed that the entries on the difference instances differed that I realized that something was up, and it took me a while to figure out it was the defaults.

For someone who doesn't know the system as well, this will be worse. Data will be corrupted, and they won't even notice and they'll not know where to look.

If recovery defaults to true, there's no chance of corruption by omission. If someone uses recovery where they meant to not use recovery, their application will crash, but the data will be safe. So, in my view, recovery = true is a much safer default.

@sijie

ivankelly · 2017-11-23T21:16:15Z

Also, tests probably fail with this, I didn't run them before pushing. Will do so tomorrow.

merlimat · 2017-11-24T17:45:43Z

So, in my view, recovery = true is a much safer default.

+1. I'd also prefer to keep that as a default.

ivankelly · 2017-11-24T18:11:43Z

@eolivelli This seems to have passed tests fine.

sijie · 2017-11-24T19:11:48Z

I have a different view on this. There are two sets of different people, one is existing users that would migrate from old api to new api, the other one is new users that would learn to use bookkeeper.

for existing users, they already knew openLedger and openLedgerNoRecovery. when they migrate to the new api, they will find out there is only one open ledger builder in the new api, they will have to figure out how does that map to openLedger and openLedgerRecovery in the new api. The javadoc will tell you the behavior : http://bookkeeper.apache.org/docs/latest/api/javadoc/org/apache/bookkeeper/client/api/OpenBuilder.html
I doubt you check the javadoc before doing the migration.

for new users, it is more about how bookkeeper would educate people to do.

This case is different from password. password is required in old api for both createLedger, openLedger and openLedgerNoRecovery. The default behavior is you need password, which we should stick to default behavior. This case is two methods openLedger and openLedgerNoRecovery are merged into one builder. There is no 'default' behavior of existing api. It is more about what bookkeeper would educate people about: what is bookkeeper, how bookkeeper works and how to use it. do you want new users to learn openLedgerNoRecovery or openLedger?

I would love the users to learn "openLedgerNoRecovery", which is a more natural for a user to learn "multiple-readers" system. It is more close to more people would expect from a storage system, a filesystem, a streaming and messaging system, and ...

The soul of bookkeeper is the LAC protocol. The existing API undercovered the power of this for a long time - it educated people in a horrible way: you need to close a ledger to read the ledger, you need to close a ledger to get the last entry id, and blah blah. when a lot of users come to bookkeeper, they didn't realize the power of bookkeeper, saw a very limited api and system, then walked away. The way how bookkeeper was used - hdfs namenode, hedwig also didn't fully leverage the power of lac protocol. All these were due to the way how the old API educates people.

The real power of LAC goes into "tailing" and "streaming", which none of existing systems leverage that except distributedlog. More use cases come from a useful, meaningful "multiple-readers" semantic. We have to break the way how old API educates people. We should bring in an API that lead users to understand LAC, realize "multiple-readers" semantic, realize "tailing" and "streaming". Otherwise, bookkeeper usage will become very limited. No users will come to bookkeeper.

so personally I would prefer making "no recovery" as the default value. but if you guys have strong opinions on making "recovery" as default value, I would not block this. so I am -0 here.

ivankelly · 2017-11-24T19:45:25Z

@sijie I'm strongly -1 on having an openLedgerOp default to no recovery, as it puts data in peril. However, I would be fine if we split openLedgerOp into tailLedgerOp and recoverLedgerOp so the difference is explicit. How does this sound?

Regarding LAC being the soul of the protocol, I take a different view :). Fencing and recovery is the core of the protocol, it's the final piece that gives us the ability to offer TOAB. LAC is an optimization to avoid reading from the start during recovery. That said, it's a very nice optimization that allows a bunch of usecases.

ivankelly · 2017-11-24T19:45:53Z

btw, tests all pass eventually. Loads of the not enough bookies issues, but I have another patch out for that.

sijie · 2017-11-24T20:18:25Z

@ivankelly

as I said in my previous comment, I am -0 for this. that means I wouldn't block the change making recovery as default. I don't like splitting the builder into two.

the different views on LAC exactly reflects the views on how bookkeeper would look like and can be used for. LAC is not an optimization (it might be an optimization when you guys initially developed it for hdfs namenode). However, LAC is the most beautifully thing when you comes from streaming and tailing world. And it is the key differentiator how bookkeeper is different from a traditional storage system like filesystem, blobstore. LAC is the mechanism for propagating how your length of a stream grow and provides the consistency boundary for simple repeatable read consistency. while fencing and recovery can't. fencing and recovery are the mechanism for preventing split brain, which is more on writer side.

Many other storage systems have fencing and recovery mechanism. It is just a way to prevent split brain. They might do it in many different ways. However in a bytes-orient storage system, like blob store, filesystem, you can't have LAC. LAC is unique to bookkeeper, because it is record-orient storage system. LAC makes bookkeeper so unique for streaming and tailing. That is something I would like us to emphasize and educate people to use bookkeeper.

The reason why do you think fencing and recovery is cool in bookkeeper, is exactly because the way how you use bookkeeper. Most of the use cases like hdfs namenode, hedwig, pulsar are only use bookkeeper for writes. You need fencing and recovery to prevent split brain when ownership or leadership change. But this category of use cases is so limited.

in distributedlog, at Twitter, we use bookkeeper in a lot of streaming and tailing use cases, where LAC is the soul. It guarantees TOAB and simple repeatable consistency while you do streaming and tailing. That's where bookkeeper is so unique from other storage systems. If we want bookkeeper to grow into a large community, we have to educate people in a way that people will realize this uniqueness.

ivankelly · 2017-11-24T22:03:02Z

Regarding the split, if it was done, I would propose they use the same builder. So the interface would look like:

OpenBuilder newTailLedgerOp()
OpenBuilder newRecoverLedgerOp()

newTailLedgerOp would set recovery to false, and new newRecoverLedgerOp would set it to true. So the changes would only be in BookKeeper for this.

Re: philosophy, I agree, the LAC stuff is awesome. The TOAB stuff has been implements by many other systems, it's what bookkeeper exposes on top that makes it unique. But still, without that TOAB stuff, it wouldn't mean much.

sijie · 2017-11-25T00:16:08Z

I don't think we should go down with split. It doesn't make any sense. The whole idea of a fluent style API is to avoid such split. Coming from storage background, it is more natural to have there in mind: create to get a handle to write, open to get a handle to read. Recovery or no recovery is the flag provided to open, indicating what will be the open behavior. Ideally this should be done with read flags, recovery is the flag of O_FENCE or O_RECOVERY. It should be as easy and natural as how people use a filesystem: open a file to read, open doesn't have any impacts to writers unless a O_FENCE flag is explicitly provided. from a storage system perspective, I would prefer a no recovery behavior as default. People shouldn't think too much about open implicitly fence and close a ledger. But again, I wouldn't block this PR, if other people want to keep the default behavior. Lastly, if we want BookKeeper to grow beyond a logging system to become a storage, we need to think and design a API that people have common sense on a storage system.

sijie · 2017-11-25T01:09:05Z

Regarding, TOAB. LAC is the key to achieve TOAB, and make bookkeeper meaningful in multiple reader use cases. It is not a sidecar thing. Without LAC, you only can read entries after closing a ledger. The whole bookkeeper can only be used in single writer and single reader use cases. We have different views on LAC, is just because of the way how we have been used bookkeeper. In HDFS name mode, hedwig, pulsar, even pravega, bookkeeper is only used in single writer and single reader use cases, in those cases consistency on writer is important, fencing and recovery is the key. In distributed log, we use bk for streaming, multiple reader use cases, LAC is the key. Both of them are needed at some sense and key to the success of TOAB. There is no sense for the discussion to continue, I would stop here.

ivankelly · 2017-11-26T21:51:56Z

@eolivelli @jvrao Could we get some more opinions on this? In summary, we're discussing whether newOpenLedgerOp should default to doing recovery or not. @merlimat and I think yes, @sijie thinks no.

My argument is that defaulting to recovery is safer, as someone will not accidently think they've seen all entries from a ledger.
@sijie 's argument is that no recovery makes the tailing usecase more natural, and will encourage people into using it. (@sijie please expand or correct if this is misrepresentative).

sijie · 2017-11-27T01:33:13Z

the summary of my opinion on this :

from a storage system perspective, open should not have any implicit impacts on the current writer. if open wants to have a side effort (e.g. fence and recovery), it should be called out explicitly either by withRecovery(true) or withOpenFlags(O_FENCE).
from user education/adoption awareness, a multiple readers semantic has more use cases that a single reader semantic. from this sense, we should encourage tailing and have recovery default to false.
the original api was designed for logging, openLedgerNoRecovery was added after openLedger. if we want to grow beyond logging into a storage system, we should have openLedger with no recovery as default. It is what most of the people would expect from a storage system.
regarding safer point, if a user already used bookkeeper and he/she knew bookkeeper, knew the difference between openLedger and openLedgerNoRecovery, when he migrates to the new API, they will find a way mapping openLedger and openLedgerNoRecovery into the new builder API. Javadoc, documentation, and release notes will provide the guide for existing users. It is not a problem for existing users. We should develop the storage mindshare for people who is new to bookkeeper, where open ledger doesn't implicit close current writer is making more sense as a storage.

sijie · 2017-11-27T01:33:48Z

@jvrao @jiazhai and other people are welcome to chime in.

jiazhai

+1 to have recovery = false as default. From a new user's view, when do open, it seems better to have no other side effect.

jvrao · 2017-11-27T07:04:11Z

Thanks for the ping. It is an API change and it is changing old behavior.
So users need to be very careful in moving to new API. So lets take a case/scenario

non-recovery: default
If the caller doesn't know what they are doing, the ledger may never get sealed.
What is the down side of that? Replicator will seal it if it needs to, but until then last
fragment may stay under replicatated.
Recovery: Default
If the caller doesn't know what they are doing, writer may get an error.

@ivankelly Am I missing anything on the down side of non-recover default?
While I like to keep the old behavior, it is natural understanding that the open for read
is generally non invasive.

ivankelly · 2017-11-27T10:29:56Z

@jvrao

non-recovery: default
If the caller doesn't know what they are doing, the ledger may never get sealed.
What is the down side of that? Replicator will seal it if it needs to, but until then last
fragment may stay under replicatated.

@ivankelly Am I missing anything on the down side of non-recover default?

Now consider the logging/streaming/message case. What will happen if the caller doesn't know what they're doing. A writer fails so another writer wants to take over. They do a non-recovery open on the last ledger in the stream, read all the entries, and then start writing to the next ledger in the stream. There'll be a couple of issues here.

They will lose messages, because they will not see the last entries in previous ledger.
Anyone tailing the log will not move to the new ledger as the previous ledger will not be closed.
If the previous writer didn't really fail, and comes back, there's nothing to stop it writing more messages to the ledger.

And none of these problems will be immediately obvious until they've happened. By this point data will be lost/corrupted.

Contrast this to how it will work if recovery is default. If a user uses this incorrectly, their system will break early and loudly, which is much better IMO.

eolivelli · 2017-11-27T13:37:56Z

I think that the core of BK is fencing/recovery + LAC, they are pretty unique and amazing features.
When you are using BK you know what you are doing, you use BK as a building block for complex distributed systems, so I think that this default is not so important.

From a newbie perspective I think that recovery should be disabled by default, it is odd that you are opening for "read" a stream and you break the writer.

I think that old users that are switching to the new API already know openLedger vs openLedgerNoRecovery and they will read carefully the docs and they have tests cases which will break if fencing is not working as they expected, so the migration is not a real deal for me.

Overall I prefer recovery = false by default.

If you are scared of this, we could require the client to always declare the intent without any default.

For instance

withRole(Role.READER)

-> recovery = NO

and

withRole(Role.RECOVERER)

-> recovery = YES

ivankelly · 2017-11-27T21:11:34Z

@eolivelli
"When you are using BK you know what you are doing" & "they will read carefully the docs" are both counter to being easy to use. Personally, I consider that I know what I'm doing in this area, and I got it wrong when updating the tutorial, and I only spotted the mistake by pure chance. As for reading the docs, I generally only do that when something is non-obvious. createOpenLedgerOp looks easy to use and its usage seems obvious, so why would people carefully read the docs for it?

Re: no default, this is effectively the same as splitting into too calls.

eolivelli · 2017-11-27T21:29:35Z

@ivankelly when you are migrating your App you will have test cases I hope.
I think that 'easy to use' depends on the case. If you are writing a replicated state machine okay it is better to open and fence.
If you are doing simple reads from a storage fencing will break things. Ok that you will see it very soon.
If you need 'fencing' you are designing a complex system so you looked for something like BK and you studies the docs

jvrao · 2017-11-27T21:41:39Z

A writer fails so another writer wants to take over. They do a non-recovery open on the last ledger
in the stream, read all the entries, and then start writing to the next ledger in the stream.

This is not a default case, this is a special case where the new writer needs to know it is fencing.

ivankelly · 2017-11-27T21:52:53Z

This is not a default case, this is a special case where the new writer needs to know it is fencing.

This is the only case that must happen in all applications that use BK. And the requirement for fencing is not obvious. BK didn't even having fencing for the first 2 years of it's existence (and therefore, it was broken).

To a user, it's not unreasonable to think that if you open a ledger, you should be able to read everything that has been written to that ledger, but that is not the case if you do not recover. That's why I also suggested removing the "open" call completely, so a user is explicitly either "tailing" or "recovering".

ivankelly · 2017-11-27T21:57:24Z

when you are migrating your App you will have test cases I hope

This kind of error could easily be missed by tests, especially if explicitLAC is enabled. You assume a certain correctness from you're underlying systems. In BK for example, we don't have tests to check if ZK is getting splitbrained.

If you need 'fencing' you are designing a complex system so you looked for something like BK and you studies the docs

Again, not obvious to everyone that fencing is needed and they may have come to BK for any number of different reasons. Some users may study the docs before putting any code down. Others will skim the docs, write some code, get something working, and build on top of that, only referring to docs when something doesn't match their mental model and assumptions.

sijie · 2017-11-27T23:54:34Z

@ivankelly I think this problem here is more about an assumption problem. All the points you made here assume one thing - bookkeeper can only used in the use case you described: people open a ledger is to fence and close the writer. Also you have this assumption in your mind is more because you added this feature before. This assumption was kept in your mind for so long. If we are growing bookkeeper beyond this use case, we should look beyond this assumption and learn what is the most common behavior for opening and what is more neutral behavior when people use bookkeeper.

sijie · 2017-11-28T00:15:26Z

"Personally, I consider that I know what I'm doing in this area, and I got it wrong when updating the tutorial, and I only spotted the mistake by pure chance. As for reading the docs, I generally only do that when something is non-obvious. createOpenLedgerOp looks easy to use and its usage seems obvious, so why would people carefully read the docs for it?"

@ivankelly "why would people carefully read the docs for it?" - Because you built the original API. In your mind, "open a ledger" is fencing and close the ledger. It doesn't mean people will have same assumption that "open a ledger" will fence and close the ledger. In contrast, if you ask most of the people who doesn't work with bookkeeper before, what are their expectations of a open for read behavior. I believe most of people will assume open for read generally has no side effects to the ledger or the writer. A default behavior for open should be as general as other storage system, no side effects (you can open as many time as you can without impacting others), easy to reason about without reading any documentation. open-to-close behavior is a special case for general open, it is used when a leader (writer) failed, it wants to fence the previous ledger and open a new one. because open-to-close is a special case, it should be called out explicitly.

ivankelly · 2017-11-28T09:06:36Z

Looks like I'm fighting a losing battle here. In summary, the opinions are

default	recovery true	recovery false
@ivankelly	X
@merlimat	X
@sijie		X
@eolivelli		X
@jvrao		X
@jiazhai		X

I think this problem here is more about an assumption problem. All the points you made here assume one thing - bookkeeper can only used in the use case you described

It's not about assumptions. Its about bookkeeper allowing a usage, which with a minor mistake, messes up your data. My assumptions lead me to the minor mistake, but the mistake had to be easy to make in the first place.

Anyhow, looks like I'm in the minority with this opinion, so I'll leave this PR open for 24 hours, and if noone changes their mind I will close.

ivankelly · 2017-11-28T09:12:00Z

@sijie @eolivelli @jvrao @merlimat

Another thing just occurred to me. Why don't we remove recovery open completely and replace it with a forceClosed() call on ReadHandle?

The normal operation when recovering from a write crash is to open all previous ledgers, and then when you get to the last one, do a recovery open on it and make sure you've read to the end. Instead of this, we could open all ledgers the same way, and on the last one, when we get to the end and are notified that we should become the writer, do a forceClosed() and then read to the end.

It makes the closure of the ledger more explicit, makes it clear that you are going to have side effects, and removes the open as somewhere that you can make mistakes.

eolivelli · 2017-11-28T09:23:21Z

@ivankelly the new proposal of forceClose() sounds interesting to me. It is more explicit

sijie · 2017-11-28T20:20:15Z

I am not sure adding #forceClose to ReadHandle is a good idea. IMO ReadHandle should be no side-effects. what does #forceClose mean to a readhandle open for a ledger that is already closed. I think the fence/recovery behavior is better to be part of open action, rather than part of read handle.

sijie · 2017-11-28T20:21:44Z

@ivankelly : I think @jiazhai also put down his opinion as well (just to make sure everyone that express the opinion in this thread is not missed)

ivankelly · 2017-11-28T20:51:59Z

what does #forceClose mean to a readhandle open for a ledger that is already closed.
it's a noop.

I think the fence/recovery behavior is better to be part of open action, rather than part of read handle.
Same arguments for no side effects can be made for open as for read handle. Side effects don't fit naturally into either, but there needs to be some side effect somewhere to fence and close ledgers. In fact, it shouldn't be a side effect, but a primary effect, which is why it should be an explicit call, like #forceClose suggested here, or a separate call as suggested earlier (#newSealAndOpenLedgerOp?).

Updated the votes table.

ivankelly · 2017-12-01T09:24:23Z

Closing this PR. Related discussion continues on #795

… build profile Author: Norbert Kalmar <nkalmar@yahoo.com> Reviewers: andor@apache.org Closes apache#774 from nkalmar/ZK-3122

ivankelly self-assigned this Nov 23, 2017

ivankelly requested review from merlimat, sijie and eolivelli November 23, 2017 18:58

eolivelli requested changes Nov 23, 2017

View reviewed changes

ivankelly requested a review from jvrao November 26, 2017 21:52

sijie requested a review from jiazhai November 27, 2017 01:34

jiazhai reviewed Nov 27, 2017

View reviewed changes

ivankelly mentioned this pull request Nov 28, 2017

Add isSealed() to ReadHandle #765

Closed

sijie mentioned this pull request Nov 30, 2017

BP-22: Separate closing ledgers from opening ledgers #795

Merged

ivankelly closed this Dec 1, 2017

Open should run recovery by default #774

Open should run recovery by default #774

Conversation

ivankelly commented Nov 23, 2017 • edited

eolivelli commented Nov 23, 2017

eolivelli left a comment

Choose a reason for hiding this comment

sijie commented Nov 23, 2017 • edited

ivankelly commented Nov 23, 2017

ivankelly commented Nov 23, 2017

merlimat commented Nov 24, 2017

ivankelly commented Nov 24, 2017

sijie commented Nov 24, 2017

ivankelly commented Nov 24, 2017

ivankelly commented Nov 24, 2017

sijie commented Nov 24, 2017

ivankelly commented Nov 24, 2017

sijie commented Nov 25, 2017

sijie commented Nov 25, 2017

ivankelly commented Nov 26, 2017

sijie commented Nov 27, 2017

sijie commented Nov 27, 2017

jiazhai left a comment

Choose a reason for hiding this comment

jvrao commented Nov 27, 2017

ivankelly commented Nov 27, 2017 • edited by jvrao

eolivelli commented Nov 27, 2017

ivankelly commented Nov 27, 2017

eolivelli commented Nov 27, 2017

jvrao commented Nov 27, 2017 • edited

ivankelly commented Nov 27, 2017

ivankelly commented Nov 27, 2017

sijie commented Nov 27, 2017

sijie commented Nov 28, 2017

ivankelly commented Nov 28, 2017 • edited

ivankelly commented Nov 28, 2017

eolivelli commented Nov 28, 2017

sijie commented Nov 28, 2017

sijie commented Nov 28, 2017

ivankelly commented Nov 28, 2017

ivankelly commented Dec 1, 2017

ivankelly commented Nov 23, 2017 •

edited

sijie commented Nov 23, 2017 •

edited

ivankelly commented Nov 27, 2017 •

edited by jvrao

jvrao commented Nov 27, 2017 •

edited

ivankelly commented Nov 28, 2017 •

edited