-
Notifications
You must be signed in to change notification settings - Fork 150
[TRAFODION-3126] Refactored HDFS client implementation should also su… #1626
Conversation
…pport Alluxio file system Alluxio doesn't support direct ByteBuffer access. Circumvented this problem by using non-direct ByteBuffer to read hdfs files when it belongs to Alluxio file system. No need to change the default setting of USE_LIBHDFS for Alluxio to work.
|
Check Test Started: https://jenkins.esgyn.com/job/Check-PR-master/2791/ |
|
Test Passed. https://jenkins.esgyn.com/job/Check-PR-master/2791/ |
|
@zellerh If you can, please review this PR |
DaveBirdsall
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 Looks good to me. Though I can't verify what kind of exception is thrown if Alluxio is not installed and you take the "instanceof" operator.
zellerh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Looks good to me, with one question for my own understanding.
| // native layer to process the data | ||
| if ((! alluxioNotInstalled_) && fs_ instanceof alluxio.hadoop.FileSystem) { | ||
| savedBuf_ = buf_; | ||
| buf_ = ByteBuffer.allocate(savedBuf_.capacity()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we do this allocation every time we call the call() method? If so, could we avoid that and do this only once? The same applies to line 163 above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HdfsClient object is constructed for every range or every HdfsScanBuf size within a range in HdfsScan. The call method is called only once. So, it is ok to allocate.
However, It is possible to optimize on this, if you push the detection of file system before HdfsClient object is constructed to HdfsScan, if the allocation becomes a bottleneck.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation!
| // Ignore the exception. It is not needed for alluxio to be installed | ||
| // for the methods of this class to work if | ||
| // alluxio filesystem is NOT required | ||
| alluxioNotInstalled_ = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please explain what error we get if the file to be read is an alluxio file and alluxio is not installed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You would get stack trace in the JIRA
|
You will get ClassNotFoundException or ClassDefNotFoundError during instanceof |
…pport Alluxio file system
Alluxio doesn't support direct ByteBuffer access. Circumvented
this problem by using non-direct ByteBuffer to read hdfs files
when it belongs to Alluxio file system.
No need to change the default setting of USE_LIBHDFS for Alluxio to work.