[TRAFODION-3171] Refactor Hive sequence file reading to use the new i… #1674
Conversation
Check Test Started: https://jenkins.esgyn.com/job/Check-PR-master/2911/
Test Passed. https://jenkins.esgyn.com/job/Check-PR-master/2911/
+1 Change looks great.
if (readLen <= lenRemain) {
    buf_.put(byteArray, 0, readLen);
    buf_.put(recDelimiter_);
Should we be worried that the 1-byte delimiter will put us past the end of the buffer, as suggested in the comment on line 290? (when readLen == lenRemain)
Yes, I need to account for the extra byte added for the delimiter. Good catch, will fix it.
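The off-by-one concern above can be sketched as follows. This is a minimal, hypothetical illustration (not Trafodion code; class and method names are invented) of why the bounds check must reserve room for the 1-byte delimiter before copying: with the original `readLen <= lenRemain` test, the case `readLen == lenRemain` writes one byte past the available space.

```java
import java.nio.ByteBuffer;

public class DelimiterBoundsCheck {
    static final byte REC_DELIMITER = '\n';

    // Copies readLen bytes of row plus a 1-byte delimiter into buf.
    // The "+ 1" reserves room for the delimiter; without it,
    // readLen == lenRemain would overflow the buffer by one byte.
    static boolean putRow(ByteBuffer buf, byte[] row, int readLen) {
        int lenRemain = buf.remaining();
        if (readLen + 1 <= lenRemain) {
            buf.put(row, 0, readLen);
            buf.put(REC_DELIMITER);
            return true;
        }
        return false; // row plus delimiter does not fit
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(4);
        byte[] row = {'a', 'b', 'c', 'd'};
        // readLen == lenRemain: rejected, since the delimiter would not fit.
        System.out.println(putRow(buf, row, 4)); // false
        // readLen == lenRemain - 1: row and delimiter both fit.
        System.out.println(putRow(buf, row, 3)); // true
    }
}
```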
@@ -571,6 +569,7 @@ ExWorkProcRetcode ExHdfsScanTcb::work()
    hdfsScan_ = HdfsScan::newInstance((NAHeap *)getHeap(), hdfsScanBuf_, hdfsScanBufMaxSize_,
                   hdfsScanTdb().hdfsIoByteArraySizeInKB_,
                   &hdfsFileInfoListAsArray_, beginRangeNum_, numRanges_, hdfsScanTdb().rangeTailIOSize_,
+                  isSequenceFile(), hdfsScanTdb().recordDelimiter_,
Since we did not pass in recordDelimiter_ previously, do we know whether the previous code worked correctly for text format when the record delimiter was something other than \n? This PR does not seem to change anything for text-format reads, so if there was an issue previously, it might still exist.
There is no need to pass the record delimiter for text format because it is a raw read of a text-formatted table: the data should already contain the record delimiter per the Hive metadata. In the case of sequence files, however, the reader.next API converts (or copies) the raw data into a row without the record delimiter.
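The distinction described above can be sketched as follows. This is an illustrative simulation (not Trafodion code; names are invented): a raw text-format read already carries the delimiters in the byte stream, while a record-oriented reader (like SequenceFile's reader.next) hands back each row without one, so the consumer must re-append the delimiter itself to produce an equivalent stream.

```java
import java.nio.ByteBuffer;
import java.util.List;

public class DelimiterContrast {
    static final byte REC_DELIMITER = '\n';

    // Rows as a record-oriented reader would return them: no delimiters.
    // The consumer re-inserts the delimiter after each row.
    static ByteBuffer assembleFromRecords(List<byte[]> rows, int capacity) {
        ByteBuffer buf = ByteBuffer.allocate(capacity);
        for (byte[] row : rows) {
            buf.put(row);
            buf.put(REC_DELIMITER);
        }
        return buf;
    }

    public static void main(String[] args) {
        // Text format: a raw read already contains the delimiters.
        byte[] rawText = "r1\nr2\n".getBytes();

        // Sequence-file style: rows come back bare, delimiters re-added.
        ByteBuffer assembled = assembleFromRecords(
                List.of("r1".getBytes(), "r2".getBytes()), 16);
        assembled.flip();
        byte[] out = new byte[assembled.remaining()];
        assembled.get(out);

        // Both paths yield the same delimited byte stream.
        System.out.println(new String(out).equals(new String(rawText))); // true
    }
}
```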
Thank you for explaining
…mplementation Fix to resolve the issue highlighted in the review comment
New Check Test Started: https://jenkins.esgyn.com/job/Check-PR-master/2918/
Test Passed. https://jenkins.esgyn.com/job/Check-PR-master/2918/
+1
…mplementation