Skip to content
This repository was archived by the owner on Jun 7, 2021. It is now read-only.

Conversation

@selvaganesang
Copy link
Contributor

…pport Alluxio file system

Alluxio doesn't support direct ByteBuffer access. Circumvented
this problem by using non-direct ByteBuffer to read hdfs files
when it belongs to Alluxio file system.

No need to change the default setting of USE_LIBHDFS for Alluxio to work.

…pport Alluxio file system

Alluxio doesn't support direct ByteBuffer access. Circumvented
this problem by using non-direct ByteBuffer to read hdfs files
when it belongs to Alluxio file system.

No need to change the default setting of USE_LIBHDFS for Alluxio to work.
@Traf-Jenkins
Copy link

Check Test Started: https://jenkins.esgyn.com/job/Check-PR-master/2791/

@Traf-Jenkins
Copy link

@selvaganesang
Copy link
Contributor Author

@zellerh If you can, please review this PR

Copy link
Contributor

@DaveBirdsall DaveBirdsall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 Looks good to me. Though I can't verify what kind of exception is thrown if Alluxio is not installed and you take the "instanceof" operator.

Copy link
Contributor

@zellerh zellerh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1
Looks good to me, with one question for my own understanding.

// native layer to process the data
if ((! alluxioNotInstalled_) && fs_ instanceof alluxio.hadoop.FileSystem) {
savedBuf_ = buf_;
buf_ = ByteBuffer.allocate(savedBuf_.capacity());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we do this allocation every time we call the call() method? If so, could we avoid that and do this only once? The same applies to line 163 above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HdfsClient object is constructed for every range or every HdfsScanBuf size within a range in HdfsScan. The call method is called only once. So, it is ok to allocate.

However, It is possible to optimize on this, if you push the detection of file system before HdfsClient object is constructed to HdfsScan, if the allocation becomes a bottleneck.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation!

// Ignore the exception. It is not needed for alluxio to be installed
// for the methods of this class to work if
// alluxio filesystem is NOT required
alluxioNotInstalled_ = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please explain what error we get if the file to be read is an alluxio file and alluxio is not installed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You would get stack trace in the JIRA

@selvaganesang
Copy link
Contributor Author

You will get ClassNotFoundException or ClassDefNotFoundError during instanceof

@asfgit asfgit merged commit 2a6cfd1 into apache:master Jul 3, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants