Open should run recovery by default #774
This is how it has always been and is the safer option. Non-recovery open is for tailing, which isn't the primary usecase for open. The primary usecase is to recover state before writing new state, and this requires that we fence and recover the ledger.
Initially I did, but @sijie wanted it this way.
I think the tests will break. Did you run the suite?
Yes, we intentionally made no recovery the default, because that is what people would typically expect from a "single-writer, multiple-readers" semantic. If people want to open and recover a ledger, they can ask for it explicitly. This is a new API we are adding to bookkeeper, so it doesn't need to carry the old behavior. So I am -1 on reverting the default back to true.
@sijie I don't consider defaulting to false to be safe behaviour. People will migrate to the new API from the old API, or at least this is what we hope will happen. If they see an openLedger, there'll be nothing to prompt them to add the withRecovery(true). When they see an openLedgerNoRecovery(), there will be a hint to put withRecovery(false). If they get it wrong like this, they will not get any notification until their data is corrupted. This is how I came across this. I was updating the tutorial and I replaced the openLedger* calls. Everything compiled and ran fine, and even when the leader changed, everything "looked" ok. It was only when I noticed that the entries on the different instances differed that I realized that something was up, and it took me a while to figure out it was the defaults. For someone who doesn't know the system as well, this will be worse. Data will be corrupted, they won't even notice, and they won't know where to look. If recovery defaults to true, there's no chance of corruption by omission. If someone uses recovery where they meant not to use recovery, their application will crash, but the data will be safe. So, in my view, recovery = true is a much safer default.
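To make the "corruption by omission" argument concrete, here is a minimal sketch of what happens when a caller forgets to state the recovery flag. It is a toy model, not the BookKeeper client: `ToyLedger`, `ToyOpenBuilder` and `OpenDefaultDemo` are hypothetical names invented for illustration, and only `withRecovery` is a name taken from this thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy ledger (hypothetical): a list of entries plus a "fenced" flag. */
final class ToyLedger {
    final List<String> entries = new ArrayList<>();
    boolean fenced = false;

    /** The writer may only append while the ledger has not been fenced. */
    void append(String entry) {
        if (fenced) {
            throw new IllegalStateException("ledger is fenced");
        }
        entries.add(entry);
    }
}

/** Hypothetical open builder; the default of the recovery flag is what is being debated. */
final class ToyOpenBuilder {
    private final ToyLedger ledger;
    private boolean recovery;

    ToyOpenBuilder(ToyLedger ledger, boolean defaultRecovery) {
        this.ledger = ledger;
        this.recovery = defaultRecovery;
    }

    ToyOpenBuilder withRecovery(boolean recovery) {
        this.recovery = recovery;
        return this;
    }

    /** Opening with recovery fences the ledger; opening without it leaves the writer alone. */
    ToyLedger execute() {
        if (recovery) {
            ledger.fenced = true;
        }
        return ledger;
    }
}

final class OpenDefaultDemo {
    /** Returns true if the old writer can still append after a caller opens with the default. */
    static boolean oldWriterSurvivesDefaultOpen(boolean defaultRecovery) {
        ToyLedger ledger = new ToyLedger();
        ledger.append("e0");
        // The caller forgot to call withRecovery(...), so the default decides.
        new ToyOpenBuilder(ledger, defaultRecovery).execute();
        try {
            ledger.append("e1");
            return true;   // silent: two clients now believe they own the log
        } catch (IllegalStateException e) {
            return false;  // loud: the old writer fails on its next append
        }
    }

    public static void main(String[] args) {
        System.out.println("default=false, old writer still writing: "
                + oldWriterSurvivesDefaultOpen(false));
        System.out.println("default=true, old writer still writing: "
                + oldWriterSurvivesDefaultOpen(true));
    }
}
```

With the default set to false the omission goes unnoticed; with the default set to true the old writer fails on its next append, which is the "crash, but the data is safe" outcome described above.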
Also, tests probably fail with this, I didn't run them before pushing. Will do so tomorrow. |
+1. I'd also prefer to keep that as a default. |
@eolivelli This seems to have passed tests fine. |
I have a different view on this. There are two different sets of people: existing users who would migrate from the old API to the new API, and new users who would learn to use bookkeeper. Existing users already know openLedger and openLedgerNoRecovery. When they migrate to the new API, they will find that there is only one open ledger builder, and they will have to figure out how it maps to openLedger and openLedgerNoRecovery. The javadoc describes the behavior: http://bookkeeper.apache.org/docs/latest/api/javadoc/org/apache/bookkeeper/client/api/OpenBuilder.html For new users, it is more about what bookkeeper educates people to do. This case is different. I would love users to learn "openLedgerNoRecovery", which is more natural for a user learning a "multiple-readers" system. It is closer to what most people would expect from a storage system, a filesystem, a streaming and messaging system, and so on. The soul of bookkeeper is the LAC protocol. The existing API hid the power of this for a long time, and it educated people in a horrible way: you need to close a ledger to read the ledger, you need to close a ledger to get the last entry id, and so on. When a lot of users came to bookkeeper, they didn't realize its power; they saw a very limited API and system, then walked away. The ways bookkeeper was used (hdfs namenode, hedwig) also didn't fully leverage the power of the LAC protocol. All of this was due to the way the old API educates people. The real power of LAC lies in "tailing" and "streaming", which none of the existing systems leverage except distributedlog. More use cases come from a useful, meaningful "multiple-readers" semantic. We have to break the way the old API educates people. We should bring in an API that leads users to understand LAC and to realize the "multiple-readers" semantic, "tailing" and "streaming". Otherwise, bookkeeper usage will become very limited and no users will come to bookkeeper. So personally I would prefer making "no recovery" the default value. But if you have strong opinions on making "recovery" the default value, I will not block this. So I am -0 here.
@sijie I'm strongly -1 on having an openLedgerOp default to no recovery, as it puts data in peril. However, I would be fine if we split openLedgerOp into tailLedgerOp and recoverLedgerOp so the difference is explicit. How does this sound? Regarding LAC being the soul of the protocol, I take a different view :). Fencing and recovery is the core of the protocol, it's the final piece that gives us the ability to offer TOAB. LAC is an optimization to avoid reading from the start during recovery. That said, it's a very nice optimization that allows a bunch of usecases. |
btw, tests all pass eventually. Loads of the not enough bookies issues, but I have another patch out for that. |
As I said in my previous comment, I am -0 on this. That means I won't block the change making recovery the default. I don't like splitting the builder into two. The different views on LAC exactly reflect the different views on what bookkeeper should look like and what it can be used for. LAC is not an optimization (it might have been an optimization when you initially developed it for hdfs namenode). LAC is the most beautiful thing when you come from the streaming and tailing world, and it is the key differentiator between bookkeeper and a traditional storage system like a filesystem or blobstore. LAC is the mechanism for propagating how the length of a stream grows, and it provides the consistency boundary for simple repeatable-read consistency, while fencing and recovery can't. Fencing and recovery are the mechanism for preventing split brain, which is more on the writer side. Many other storage systems have a fencing and recovery mechanism. It is just a way to prevent split brain, and they might do it in many different ways. However, in a byte-oriented storage system, like a blob store or filesystem, you can't have LAC. LAC is unique to bookkeeper because it is a record-oriented storage system. LAC makes bookkeeper unique for streaming and tailing. That is something I would like us to emphasize when educating people to use bookkeeper. The reason you think fencing and recovery is the cool part of bookkeeper is exactly the way you use bookkeeper. Most use cases, like hdfs namenode, hedwig and pulsar, only use bookkeeper for writes. You need fencing and recovery to prevent split brain when ownership or leadership changes. But this category of use cases is so limited. In distributedlog, at Twitter, we use bookkeeper in a lot of streaming and tailing use cases, where LAC is the soul. It guarantees TOAB and simple repeatable consistency while you do streaming and tailing. That's where bookkeeper is unique among storage systems. If we want bookkeeper to grow into a large community, we have to educate people in a way that makes them realize this uniqueness.
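The tailing pattern described in this comment can be sketched roughly as follows. This is a toy model under simplifying assumptions, not BookKeeper code: `TailLedger`, `TailingReader`, `advanceLac` and `poll` are hypothetical names, and the LAC is advanced by an explicit call rather than by the real protocol.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy ledger (hypothetical): readers may only see entries up to the LAC. */
final class TailLedger {
    private final List<String> entries = new ArrayList<>();
    private int lac = -1; // last add confirmed: highest entry id readers may read

    void append(String entry) {
        entries.add(entry);
    }

    /** Simplification: the writer publishes everything written so far. */
    void advanceLac() {
        lac = entries.size() - 1;
    }

    int lac() {
        return lac;
    }

    String read(int entryId) {
        if (entryId > lac) {
            throw new IllegalArgumentException("entry " + entryId + " is beyond the LAC");
        }
        return entries.get(entryId);
    }
}

/** A reader that tails an open ledger without fencing or closing it. */
final class TailingReader {
    private int next = 0; // next entry id this reader has not seen yet

    /** Returns the entries that became readable since the previous poll. */
    List<String> poll(TailLedger ledger) {
        List<String> fresh = new ArrayList<>();
        int lac = ledger.lac();
        while (next <= lac) {
            fresh.add(ledger.read(next++));
        }
        return fresh;
    }
}

final class TailingDemo {
    public static void main(String[] args) {
        TailLedger ledger = new TailLedger();
        TailingReader reader = new TailingReader();
        ledger.append("e0");
        ledger.append("e1");
        ledger.advanceLac();
        System.out.println("first poll: " + reader.poll(ledger));
        ledger.append("e2"); // written but not yet confirmed to readers
        System.out.println("second poll: " + reader.poll(ledger));
        ledger.advanceLac();
        System.out.println("third poll: " + reader.poll(ledger));
    }
}
```

Every reader that polls this way sees the same prefix of the ledger and never reads past the LAC, which is the consistency boundary the comment refers to.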
Regarding the split, if it was done, I would propose they use the same builder. So the interface would look like:
newTailLedgerOp would set recovery to false, and newRecoverLedgerOp would set it to true. So the changes would only be in BookKeeper for this. Re: philosophy, I agree, the LAC stuff is awesome. The TOAB stuff has been implemented by many other systems; it's what bookkeeper exposes on top that makes it unique. But still, without that TOAB stuff, it wouldn't mean much.
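The interface snippet from this comment did not survive in this copy of the thread. As a rough reconstruction of the idea only, assuming two entry points that hand back the same builder type, it could look like the sketch below. `SketchOpenBuilder` and `SketchClient` are hypothetical stand-ins, not the actual BookKeeper classes; only the names `newTailLedgerOp` and `newRecoverLedgerOp` come from the comment.

```java
/** Hypothetical stand-in for the shared open builder. */
final class SketchOpenBuilder {
    private final boolean recovery;
    private long ledgerId = -1L;

    SketchOpenBuilder(boolean recovery) {
        this.recovery = recovery;
    }

    SketchOpenBuilder withLedgerId(long ledgerId) {
        this.ledgerId = ledgerId;
        return this;
    }

    boolean isRecovery() {
        return recovery;
    }

    long ledgerId() {
        return ledgerId;
    }
}

/** Hypothetical client facade: two explicit entry points, one builder type. */
final class SketchClient {
    /** Tailing: no fencing, no side effects on the writer. */
    SketchOpenBuilder newTailLedgerOp() {
        return new SketchOpenBuilder(false);
    }

    /** Recovery: fence the writer and settle the end of the ledger. */
    SketchOpenBuilder newRecoverLedgerOp() {
        return new SketchOpenBuilder(true);
    }
}

final class SplitOpenDemo {
    public static void main(String[] args) {
        SketchClient client = new SketchClient();
        System.out.println("tail uses recovery: "
                + client.newTailLedgerOp().withLedgerId(7L).isRecovery());
        System.out.println("recover uses recovery: "
                + client.newRecoverLedgerOp().withLedgerId(7L).isRecovery());
    }
}
```

The point of the split is that the intent appears in the method name at the call site, so there is no default to get wrong.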
I don't think we should go down the path of a split. It doesn't make any sense. The whole idea of a fluent style API is to avoid such a split. Coming from a storage background, it is more natural to have this in mind: create to get a handle to write, open to get a handle to read. Recovery or no recovery is a flag provided to open, indicating what the open behavior will be. Ideally this should be done with read flags, where recovery is a flag like O_FENCE or O_RECOVERY. It should be as easy and natural as how people use a filesystem: open a file to read, and open doesn't have any impact on writers unless an O_FENCE flag is explicitly provided. From a storage system perspective, I would prefer no recovery as the default behavior. People shouldn't have to think about open implicitly fencing and closing a ledger. But again, I wouldn't block this PR if other people want to keep the default behavior. Lastly, if we want BookKeeper to grow beyond a logging system to become a storage system, we need to think about and design an API that matches people's common sense of a storage system.
Regarding TOAB: LAC is the key to achieving TOAB, and it makes bookkeeper meaningful in multiple-reader use cases. It is not a sidecar thing. Without LAC, you can only read entries after closing a ledger, and bookkeeper as a whole can only be used in single-writer, single-reader use cases. We have different views on LAC just because of the ways we have been using bookkeeper. In HDFS namenode, hedwig, pulsar, even pravega, bookkeeper is only used in single-writer, single-reader use cases; in those cases consistency on the writer is important, and fencing and recovery is the key. In distributedlog, we use bk for streaming, multiple-reader use cases, where LAC is the key. Both of them are needed in some sense and are key to the success of TOAB. There is no sense in continuing the discussion, so I will stop here.
@eolivelli @jvrao Could we get some more opinions on this? In summary, we're discussing whether newOpenLedgerOp should default to doing recovery or not. @merlimat and I think yes, @sijie thinks no. My argument is that defaulting to recovery is safer, as someone will not accidentally think they've seen all entries from a ledger.
the summary of my opinion on this :
+1 to have recovery = false as default. From a new user's view, when doing an open, it seems better to have no other side effects.
Thanks for the ping. It is an API change and it is changing old behavior.
@ivankelly Am I missing anything on the down side of non-recover default? |
Now consider the logging/streaming/message case. What will happen if the caller doesn't know what they're doing? A writer fails so another writer wants to take over. They do a non-recovery open on the last ledger in the stream, read all the entries, and then start writing to the next ledger in the stream. There'll be a couple of issues here.
And none of these problems will be immediately obvious until they've happened. By this point data will be lost/corrupted. Contrast this to how it will work if recovery is default. If a user uses this incorrectly, their system will break early and loudly, which is much better IMO. |
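The takeover scenario above can be simulated with a small toy model. This is a sketch under simplifying assumptions, not BookKeeper's implementation: `TakeoverLedger` and `TakeoverDemo` are hypothetical names, and the rule that the LAC visible to readers trails the newest entry by one is a simplification of how the LAC is piggybacked on later writes.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy ledger (hypothetical) with a LAC that trails the newest entry. */
final class TakeoverLedger {
    private final List<String> entries = new ArrayList<>();
    private int lac = -1;          // highest entry id a non-recovery reader may see
    private boolean fenced = false;

    /** Writer path. Assumption: the newest entry is not yet visible to readers. */
    void append(String entry) {
        if (fenced) {
            throw new IllegalStateException("fenced: another client recovered this ledger");
        }
        entries.add(entry);
        lac = entries.size() - 2;
    }

    /** Non-recovery open: no side effects, reads only up to the current LAC. */
    List<String> openNoRecovery() {
        return new ArrayList<>(entries.subList(0, lac + 1));
    }

    /** Recovery open: fence the old writer, then settle the true end of the ledger. */
    List<String> openWithRecovery() {
        fenced = true;
        lac = entries.size() - 1;
        return new ArrayList<>(entries);
    }
}

final class TakeoverDemo {
    static TakeoverLedger ledgerWithThreeEntries() {
        TakeoverLedger ledger = new TakeoverLedger();
        ledger.append("e0");
        ledger.append("e1");
        ledger.append("e2");
        return ledger;
    }

    public static void main(String[] args) {
        TakeoverLedger a = ledgerWithThreeEntries();
        System.out.println("non-recovery open sees: " + a.openNoRecovery());
        a.append("e3"); // the old writer was never stopped and keeps writing
        System.out.println("old writer appended again without any error");

        TakeoverLedger b = ledgerWithThreeEntries();
        System.out.println("recovery open sees: " + b.openWithRecovery());
        try {
            b.append("e3");
        } catch (IllegalStateException e) {
            System.out.println("old writer rejected: " + e.getMessage());
        }
    }
}
```

In the non-recovery run the new writer both misses the tail of the ledger and leaves the old writer free to keep appending, and neither problem raises an error. In the recovery run the old writer fails on its next append.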
I think that the core of BK is fencing/recovery + LAC; they are pretty unique and amazing features. From a newbie perspective I think that recovery should be disabled by default: it is odd that opening a stream for "read" breaks the writer. I think that old users who are switching to the new API already know openLedger vs openLedgerNoRecovery, they will read the docs carefully, and they have test cases which will break if fencing is not working as they expected, so the migration is not a big deal for me. Overall I prefer recovery = false by default. If you are scared of this, we could require the client to always declare the intent without any default. For instance
-> recovery = NO and
-> recovery = YES |
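The two example calls in this comment were lost in this copy of the thread. One way the "no default, always declare the intent" idea could be expressed is sketched below; `ExplicitOpenBuilder` and its `Recovery` enum are hypothetical and are not part of the BookKeeper API.

```java
import java.util.Objects;

/** Hypothetical builder with no default: the caller must state the intent. */
final class ExplicitOpenBuilder {
    enum Recovery { YES, NO }

    private Recovery recovery; // deliberately left unset

    ExplicitOpenBuilder withRecovery(Recovery recovery) {
        this.recovery = Objects.requireNonNull(recovery, "recovery");
        return this;
    }

    /** Fails fast when the caller never declared whether to recover. */
    Recovery execute() {
        if (recovery == null) {
            throw new IllegalStateException("recovery intent was not declared");
        }
        return recovery;
    }
}

final class ExplicitOpenDemo {
    public static void main(String[] args) {
        System.out.println(new ExplicitOpenBuilder()
                .withRecovery(ExplicitOpenBuilder.Recovery.NO).execute());
        System.out.println(new ExplicitOpenBuilder()
                .withRecovery(ExplicitOpenBuilder.Recovery.YES).execute());
        try {
            new ExplicitOpenBuilder().execute();
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The check in this sketch happens at run time. Splitting the entry point into two methods would move the same decision to compile time.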
@eolivelli Re: no default, this is effectively the same as splitting into two calls.
@ivankelly when you are migrating your App you will have test cases I hope. |
This is not a default case, this is a special case where the new writer needs to know it is fencing. |
This is the only case that must happen in all applications that use BK. And the requirement for fencing is not obvious. BK didn't even have fencing for the first 2 years of its existence (and therefore, it was broken). To a user, it's not unreasonable to think that if you open a ledger, you should be able to read everything that has been written to that ledger, but that is not the case if you do not recover. That's why I also suggested removing the "open" call completely, so a user is explicitly either "tailing" or "recovering".
This kind of error could easily be missed by tests, especially if explicitLAC is enabled. You assume a certain correctness from your underlying systems. In BK for example, we don't have tests to check if ZK is getting splitbrained.
Again, not obvious to everyone that fencing is needed and they may have come to BK for any number of different reasons. Some users may study the docs before putting any code down. Others will skim the docs, write some code, get something working, and build on top of that, only referring to docs when something doesn't match their mental model and assumptions. |
@ivankelly I think the problem here is more about assumptions. All the points you made here assume one thing: that bookkeeper can only be used in the use case you described, where people open a ledger to fence and close the writer. You also have this assumption in mind because you added this feature, and you have held it for so long. If we are growing bookkeeper beyond this use case, we should look beyond this assumption and learn what the most common behavior for opening is and what the more neutral behavior is when people use bookkeeper.
"Personally, I consider that I know what I'm doing in this area, and I got it wrong when updating the tutorial, and I only spotted the mistake by pure chance. As for reading the docs, I generally only do that when something is non-obvious. createOpenLedgerOp looks easy to use and its usage seems obvious, so why would people carefully read the docs for it?" @ivankelly "why would people carefully read the docs for it?" - Because you built the original API. In your mind, "open a ledger" is fencing and close the ledger. It doesn't mean people will have same assumption that "open a ledger" will fence and close the ledger. In contrast, if you ask most of the people who doesn't work with bookkeeper before, what are their expectations of a |
Looks like I'm fighting a losing battle here. In summary, the opinions are
It's not about assumptions. It's about bookkeeper allowing a usage which, with a minor mistake, messes up your data. My assumptions led me to the minor mistake, but the mistake had to be easy to make in the first place. Anyhow, looks like I'm in the minority with this opinion, so I'll leave this PR open for 24 hours, and if no one changes their mind I will close.
@sijie @eolivelli @jvrao @merlimat Another thing just occurred to me. Why don't we remove recovery open completely and replace it with a forceClosed() call on ReadHandle? The normal operation when recovering from a write crash is to open all previous ledgers, and then when you get to the last one, do a recovery open on it and make sure you've read to the end. Instead of this, we could open all ledgers the same way, and on the last one, when we get to the end and are notified that we should become the writer, do a forceClosed() and then read to the end. It makes the closure of the ledger more explicit, makes it clear that you are going to have side effects, and removes the open as somewhere that you can make mistakes. |
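A rough sketch of the flow proposed here, in which every ledger is opened the same way and only the last one is explicitly closed before being read to the end. Everything in it is hypothetical: `SketchLedger`, `SketchReadHandle` and `ForceCloseDemo` are toy classes, `forceClosed()` is the proposed call rather than an existing method, and the model ignores LAC and the details of recovery.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy ledger (hypothetical): appends are rejected once it is closed. */
final class SketchLedger {
    final List<String> entries = new ArrayList<>();
    boolean closed = false;

    void append(String entry) {
        if (closed) {
            throw new IllegalStateException("ledger was force-closed");
        }
        entries.add(entry);
    }
}

/** Toy read handle: opening has no side effects, closing is an explicit call. */
final class SketchReadHandle {
    private final SketchLedger ledger;

    SketchReadHandle(SketchLedger ledger) {
        this.ledger = ledger;
    }

    /** The proposed explicit step: stop the writer and fix the end of the ledger. */
    void forceClosed() {
        ledger.closed = true;
    }

    List<String> readAll() {
        return new ArrayList<>(ledger.entries);
    }
}

final class ForceCloseDemo {
    /** Replays all ledgers; only the last one is force-closed before it is read. */
    static List<String> recover(List<SketchLedger> ledgers) {
        List<String> state = new ArrayList<>();
        for (int i = 0; i < ledgers.size(); i++) {
            SketchReadHandle handle = new SketchReadHandle(ledgers.get(i));
            if (i == ledgers.size() - 1) {
                handle.forceClosed(); // visible side effect, right where it matters
            }
            state.addAll(handle.readAll());
        }
        return state;
    }

    public static void main(String[] args) {
        SketchLedger first = new SketchLedger();
        first.append("a0");
        SketchLedger last = new SketchLedger();
        last.append("b0");
        last.append("b1");

        List<SketchLedger> ledgers = new ArrayList<>();
        ledgers.add(first);
        ledgers.add(last);
        System.out.println("recovered state: " + recover(ledgers));
        System.out.println("last ledger closed: " + last.closed);
    }
}
```

In this shape the side effect lives in a call whose name announces it, so the open itself can no longer be the place where the mistake is made.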
@ivankelly the new proposal of forceClose() sounds interesting to me. It is more explicit |
I am not sure adding #forceClose to |
@ivankelly: I think @jiazhai also put down his opinion (just to make sure that no one who expressed an opinion in this thread is missed)
Updated the votes table. |
Closing this PR. Related discussion continues on #795 |
… build profile Author: Norbert Kalmar <nkalmar@yahoo.com> Reviewers: andor@apache.org Closes apache#774 from nkalmar/ZK-3122