Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PIP-17: provide DataBlockHeader and implementation #1669

Merged
merged 3 commits into from
May 3, 2018

Conversation

zhaijack
Copy link
Contributor

@zhaijack zhaijack commented Apr 27, 2018

Motivation

Provide DataBlockHeader implementation. A block header is in format:
[ magic_word -- int ][ block_len -- int ][ first_entry_id -- long][alignment padding]

Master issue #1511

Modifications

Add class DataBlockHeader and test for it.

Result

unit test pass.

@sijie
Copy link
Member

sijie commented Apr 27, 2018

@ivankelly please review this

* Read out in format:
* [ magic_word -- int ][ block_len -- int ][ first_entry_id -- long]
*/
public InputStream toStream() throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: @OverRide

}

// Construct DataBlockHeader from InputStream
public static DataBlockHeaderImpl fromStream(InputStream stream) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return DataBlockHeader, rather than the impl

/**
* Get the size of this DataBlockHeader.
*/
int getHeaderSize();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is header size useful to a user of the DataBlockHeader? I think this should return what getDataStartOffset is returning below. The contents of the header should be opaque to the user of the interface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Seems this is the key for some of the comments below. If user get the actual size, user could ignore the paddings, InputStream passed into 'fromStream', and returned by 'toStream' could ignore the paddings.
will change this to what you suggested.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think getHeaderSize() is needed anymore.

* Get Magic Word for data block.
* It is a sequence of bytes used to identify the start of a block.
*/
int getBlockMagicWord();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this useful externally? Can be static in any case.

DataInputStream dis = new DataInputStream(stream);
int magic = dis.readInt();
checkState(magic == dataBlockMagicWord);
return new DataBlockHeaderImpl(dis.readInt(), dis.readLong());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When reading the header from a stream, you want to move the input stream read cursor past the padding also. The user of data block header should not be aware that there is padding, and that the data only takes up part of the header. Instead, they should pass the turn a header into an input stream and get a stream that returns dataBlockHeaderAlign bytes. They should pass an input stream into this method, and this method should read dataBlockHeaderAlign bytes and return a DataBlockHeader object, even if only 4 of the bytes read are useful for building the DataBlockHeader object.

public static DataBlockHeaderImpl fromStream(InputStream stream) throws IOException {
DataInputStream dis = new DataInputStream(stream);
int magic = dis.readInt();
checkState(magic == dataBlockMagicWord);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checkState throws an unchecked IllegalStateException. Since this is reading external data, we should throw an IOException if the magic number is wrong, or at least a checked exception, so that it can be handled more gracefully by the caller.

assertTrue(rebuild.getBlockLength() == headerLength);
assertTrue(rebuild.getFirstEntryId() == firstEntryId);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Test what happens when you read a header that's too small
  • Test what happens when you read complete junk.
  • Test reading to the end of the stream returned by toStream(), ensuring that the number of bytes read is correct.

.writeInt(blockLength)
.writeLong(firstEntryId);

return new ByteBufInputStream(out, true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment that the true argument mean that the inputstream will release the bytebuf on close.

@zhaijack
Copy link
Contributor Author

@ivankelly updated following your comments.

DataInputStream dis = new DataInputStream(stream);
int magic = dis.readInt();
if (magic != dataBlockMagicWord) {
throw new IOException("Data block header magic word not match.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

include what was expected and what was read in the error message.

// verify padding
for (int index = 0; index < dataBlockHeaderAlign - dataBlockHeaderSize; index ++) {
if (dis.readByte() != padding) {
throw new IOException("Data block header magic word not match.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't validate that all the padding is 0. This could break BC if we add a field, to the header, but then an older version of the software tried to read it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be dis.skipBytes(dataBlockHeaderAlign - dataBlockHeaderSize);

private static final int dataBlockMagicWord = 0xDBDBDBDB;
// This is bigger than header size. Leaving some place for alignment and future enhancement.
// Payload use this as the start offset.
private static final int dataBlockHeaderAlign = 128;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename this to HEADER_MAX_SIZE

public class DataBlockHeaderImpl implements DataBlockHeader {
// Magic Word for data block.
// It is a sequence of bytes used to identify the start of a block.
private static final int dataBlockMagicWord = 0xDBDBDBDB;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dataBlock is redundant in these names. also, oracle java coding standard states that constants should be UPPER_CASE with underscores, so this should be MAGIC_WORD.

out.writeInt(dataBlockMagicWord)
.writeInt(blockLength)
.writeLong(firstEntryId)
.writeBytes(new byte[dataBlockHeaderAlign - headerSize]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the padding will always be the same size, so define PADDING = new byte[x] as a constant in the class. When reading back, you can skip PADDING.length also.

* Get Magic Word for data block.
* It is a sequence of bytes used to identify the start of a block.
*/
static int getBlockMagicWord() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove from interface. external callers don't need it.

stream.reset();
byte streamContent[] = new byte[DataBlockHeader.getDataStartOffset()];

// stream with all 0, simulate junk data, should throw exception for header magic not match.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make these a separate test case. each test case should test one thing.

* Get the payload start offset in this block.
* Space before this offset is for header and alignment.
*/
static int getDataStartOffset() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove from interface. If tests need this value, they can get it from the Impl.

/**
* Get the size of this DataBlockHeader.
*/
int getHeaderSize();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think getHeaderSize() is needed anymore.

public interface DataBlockHeader {

/**
* Get the length of the block in bytes, including the header.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this size include the header? If I have an inputstream, and im reading in, it would seem more natural, that after I've read in the header, the block length that I have is just the block length of the payload.

Copy link
Contributor Author

@zhaijack zhaijack Apr 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. This is following the design doc, It would be helpful updating the doc if the idea changed.
And it seems more natural to me to make blockLength means the whole block length as in the design doc, since header and payload stored in the same block.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Lets leave it for now, and change it if we need to when we use it.

@ivankelly
Copy link
Contributor

@zhaijack lgtm. Will you add the master issue to the PR description, so it gets linked there.

@zhaijack
Copy link
Contributor Author

Thanks @ivankelly , added it.

@sijie
Copy link
Member

sijie commented Apr 30, 2018

moved this to 2.1

@ivankelly
Copy link
Contributor

ivankelly commented May 2, 2018

Weird. It looks like the test hangs on MessageIdTest for over 2 hours. Obviously nothing to do with this change though.

@sijie
Copy link
Member

sijie commented May 2, 2018

retest this please

@sijie sijie merged commit d1e719c into apache:master May 3, 2018
sijie pushed a commit that referenced this pull request May 3, 2018
…1678)

### Motivation

Implementation of InputStream(BlockAwareSegmentInputStream). when calls the is.read(), it return the following data:
-  Block header
-  Ledger entries starting from passed in entryId
-  Padding in the with bytes 0xFEDCDEAD

This is based on PR #1669 (commit 3c8a846), And PR #1669 should be reviewed and merged first.

Master issue #1511

### Modifications

Add class BlockAwareSegmentInputStream and test for it.

### Result

unit test pass.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants