Skip to content
This repository has been archived by the owner on Jun 7, 2021. It is now read-only.

[TRAFODION-3171] Refactor Hive sequence file reading to use the new i… #1674

Merged
merged 2 commits into from Aug 6, 2018

Conversation

selvaganesang
Copy link
Contributor

…mplementation

@Traf-Jenkins
Copy link

Check Test Started: https://jenkins.esgyn.com/job/Check-PR-master/2911/

@Traf-Jenkins
Copy link

Copy link
Contributor

@sureshsubbiah sureshsubbiah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 Change looks great.

if (readLen <= lenRemain) {

buf_.put(byteArray, 0, readLen);
buf_.put(recDelimiter_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be worried that the 1 byte delimiter will put us past us past end of buffer as suggest in the comment in line 290? (when readLen == lenRemain)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I need to consider the extra byte added for delimiter. Good catch, Will fix it.

@@ -571,6 +569,7 @@ ExWorkProcRetcode ExHdfsScanTcb::work()
hdfsScan_ = HdfsScan::newInstance((NAHeap *)getHeap(), hdfsScanBuf_, hdfsScanBufMaxSize_,
hdfsScanTdb().hdfsIoByteArraySizeInKB_,
&hdfsFileInfoListAsArray_, beginRangeNum_, numRanges_, hdfsScanTdb().rangeTailIOSize_,
isSequenceFile(), hdfsScanTdb().recordDelimiter_,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we did not pass in the recordDelimiter_ previously, do we know if previous code worked correctly for text format, when the record delimiter was something other than /n ? This PR does not seem change anything for text format reads, so if there was issue previously, it might still exist.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need pass the record delimiter for text format because it is a raw read of text formatted table. It should have the record delimiter as per the hive metadata. In case of sequence files, the reader.next API converts the raw data (or copies) the row without record delimiter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for explaining

…mplementation

Fix to resolve the issue highlighted in the review comment
@Traf-Jenkins
Copy link

New Check Test Started: https://jenkins.esgyn.com/job/Check-PR-master/2918/

@Traf-Jenkins
Copy link

@sureshsubbiah
Copy link
Contributor

+!

@asfgit asfgit merged commit abbfe85 into apache:master Aug 6, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
4 participants