-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PIP-17: provide DataBlockHeader and implementation #1669
Conversation
@ivankelly please review this |
* Read out in format: | ||
* [ magic_word -- int ][ block_len -- int ][ first_entry_id -- long] | ||
*/ | ||
public InputStream toStream() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: @OverRide
} | ||
|
||
// Construct DataBlockHeader from InputStream | ||
public static DataBlockHeaderImpl fromStream(InputStream stream) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return DataBlockHeader, rather than the impl
/** | ||
* Get the size of this DataBlockHeader. | ||
*/ | ||
int getHeaderSize(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is header size useful to a user of the DataBlockHeader? I think this should return what getDataStartOffset is returning below. The contents of the header should be opaque to the user of the interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Seems this is the key for some of the comments below. If user get the actual size, user could ignore the paddings, InputStream passed into 'fromStream', and returned by 'toStream' could ignore the paddings.
will change this to what you suggested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think getHeaderSize() is needed anymore.
* Get Magic Word for data block. | ||
* It is a sequence of bytes used to identify the start of a block. | ||
*/ | ||
int getBlockMagicWord(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this useful externally? Can be static in any case.
DataInputStream dis = new DataInputStream(stream); | ||
int magic = dis.readInt(); | ||
checkState(magic == dataBlockMagicWord); | ||
return new DataBlockHeaderImpl(dis.readInt(), dis.readLong()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When reading the header from a stream, you want to move the input stream read cursor past the padding also. The user of data block header should not be aware that there is padding, and that the data only takes up part of the header. Instead, they should pass the turn a header into an input stream and get a stream that returns dataBlockHeaderAlign bytes. They should pass an input stream into this method, and this method should read dataBlockHeaderAlign bytes and return a DataBlockHeader object, even if only 4 of the bytes read are useful for building the DataBlockHeader object.
public static DataBlockHeaderImpl fromStream(InputStream stream) throws IOException { | ||
DataInputStream dis = new DataInputStream(stream); | ||
int magic = dis.readInt(); | ||
checkState(magic == dataBlockMagicWord); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checkState throws an unchecked IllegalStateException. Since this is reading external data, we should throw an IOException if the magic number is wrong, or at least a checked exception, so that it can be handled more gracefully by the caller.
assertTrue(rebuild.getBlockLength() == headerLength); | ||
assertTrue(rebuild.getFirstEntryId() == firstEntryId); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Test what happens when you read a header that's too small
- Test what happens when you read complete junk.
- Test reading to the end of the stream returned by toStream(), ensuring that the number of bytes read is correct.
.writeInt(blockLength) | ||
.writeLong(firstEntryId); | ||
|
||
return new ByteBufInputStream(out, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment that the true argument mean that the inputstream will release the bytebuf on close.
@ivankelly updated following your comments. |
DataInputStream dis = new DataInputStream(stream); | ||
int magic = dis.readInt(); | ||
if (magic != dataBlockMagicWord) { | ||
throw new IOException("Data block header magic word not match."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
include what was expected and what was read in the error message.
// verify padding | ||
for (int index = 0; index < dataBlockHeaderAlign - dataBlockHeaderSize; index ++) { | ||
if (dis.readByte() != padding) { | ||
throw new IOException("Data block header magic word not match."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't validate that all the padding is 0. This could break BC if we add a field, to the header, but then an older version of the software tried to read it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be dis.skipBytes(dataBlockHeaderAlign - dataBlockHeaderSize);
private static final int dataBlockMagicWord = 0xDBDBDBDB; | ||
// This is bigger than header size. Leaving some place for alignment and future enhancement. | ||
// Payload use this as the start offset. | ||
private static final int dataBlockHeaderAlign = 128; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename this to HEADER_MAX_SIZE
public class DataBlockHeaderImpl implements DataBlockHeader { | ||
// Magic Word for data block. | ||
// It is a sequence of bytes used to identify the start of a block. | ||
private static final int dataBlockMagicWord = 0xDBDBDBDB; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dataBlock is redundant in these names. also, oracle java coding standard states that constants should be UPPER_CASE with underscores, so this should be MAGIC_WORD.
out.writeInt(dataBlockMagicWord) | ||
.writeInt(blockLength) | ||
.writeLong(firstEntryId) | ||
.writeBytes(new byte[dataBlockHeaderAlign - headerSize]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the padding will always be the same size, so define PADDING = new byte[x] as a constant in the class. When reading back, you can skip PADDING.length also.
* Get Magic Word for data block. | ||
* It is a sequence of bytes used to identify the start of a block. | ||
*/ | ||
static int getBlockMagicWord() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove from interface. external callers don't need it.
stream.reset(); | ||
byte streamContent[] = new byte[DataBlockHeader.getDataStartOffset()]; | ||
|
||
// stream with all 0, simulate junk data, should throw exception for header magic not match. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make these a separate test case. each test case should test one thing.
* Get the payload start offset in this block. | ||
* Space before this offset is for header and alignment. | ||
*/ | ||
static int getDataStartOffset() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove from interface. If tests need this value, they can get it from the Impl.
/** | ||
* Get the size of this DataBlockHeader. | ||
*/ | ||
int getHeaderSize(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think getHeaderSize() is needed anymore.
public interface DataBlockHeader { | ||
|
||
/** | ||
* Get the length of the block in bytes, including the header. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this size include the header? If I have an inputstream, and im reading in, it would seem more natural, that after I've read in the header, the block length that I have is just the block length of the payload.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. This is following the design doc, It would be helpful updating the doc if the idea changed.
And it seems more natural to me to make blockLength means the whole block length as in the design doc, since header and payload stored in the same block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Lets leave it for now, and change it if we need to when we use it.
@zhaijack lgtm. Will you add the master issue to the PR description, so it gets linked there. |
Thanks @ivankelly , added it. |
moved this to 2.1 |
Weird. It looks like the test hangs on MessageIdTest for over 2 hours. Obviously nothing to do with this change though. |
retest this please |
…1678) ### Motivation Implementation of InputStream(BlockAwareSegmentInputStream). when calls the is.read(), it return the following data: - Block header - Ledger entries starting from passed in entryId - Padding in the with bytes 0xFEDCDEAD This is based on PR #1669 (commit 3c8a846), And PR #1669 should be reviewed and merged first. Master issue #1511 ### Modifications Add class BlockAwareSegmentInputStream and test for it. ### Result unit test pass.
Motivation
Provide DataBlockHeader implementation. A block header is in format:
[ magic_word -- int ][ block_len -- int ][ first_entry_id -- long][alignment padding]
Master issue #1511
Modifications
Add class DataBlockHeader and test for it.
Result
unit test pass.