[TRAFODION-2483] Trafodion treat '\' as NULL in hive table #1055
Conversation
Can one of the admins verify this patch? |
5 similar comments
Can one of the admins verify this patch? |
Can one of the admins verify this patch? |
Can one of the admins verify this patch? |
Can one of the admins verify this patch? |
Can one of the admins verify this patch? |
core/sql/executor/ExHdfsScan.cpp
Outdated
@@ -1653,7 +1653,7 @@ char * ExHdfsScanTcb::extractAndTransformAsciiSourceToSqlRow(int &err, | |||
// for non-varchar, length of zero indicates a null value. | |||
// For all datatypes, HIVE_DEFAULT_NULL_STRING('\N') indicates a null value. | |||
if (((len == 0) && (tgtAttr && (NOT DFS2REC::isSQLVarChar(tgtAttr->getDatatype())))) || | |||
((len > 0) && (memcmp(sourceData, HIVE_DEFAULT_NULL_STRING, len) == 0))) | |||
((len > 1) && (memcmp(sourceData, HIVE_DEFAULT_NULL_STRING, len) == 0))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can handle the '\' problem, but if (not possible) the user define '\' as null, this logic will break.
So here, it should be to compare the whole HIVE_DEFAULT_NULL_STRING, and make sure length is equal, as I understand.
\N is NULL, but \NA is not NULL, what will happen if there is \NAA ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the current fix \NAA will not be treated as NULL. and I agree that making sure the length is equal is better.
jenkins run tests |
jenkins, add user |
Check Test Started: https://jenkins.esgyn.com/job/Check-PR-master/1714/ |
Test Passed. https://jenkins.esgyn.com/job/Check-PR-master/1714/ |
New Check Test Started: https://jenkins.esgyn.com/job/Check-PR-master/1724/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch and it should make the hive scan perform better
@@ -1653,7 +1653,7 @@ char * ExHdfsScanTcb::extractAndTransformAsciiSourceToSqlRow(int &err, | |||
// for non-varchar, length of zero indicates a null value. | |||
// For all datatypes, HIVE_DEFAULT_NULL_STRING('\N') indicates a null value. | |||
if (((len == 0) && (tgtAttr && (NOT DFS2REC::isSQLVarChar(tgtAttr->getDatatype())))) || | |||
((len > 0) && (memcmp(sourceData, HIVE_DEFAULT_NULL_STRING, len) == 0))) | |||
((len == strlen(HIVE_DEFAULT_NULL_STRING)) && (memcmp(sourceData, HIVE_DEFAULT_NULL_STRING, len) == 0))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if the c++ compiler would optimize this. You could improve this further by getting the strlen only once in the TCB
Test Passed. https://jenkins.esgyn.com/job/Check-PR-master/1724/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
However, I agree with Selva's comment and given that this code is in the very inner loop it does make sense to optimize it. Alternatively, check the code generated to see whether it's being optimized by the compiler.
Another option to optimize this code is to replace strlen with sizeof(HIVE_DEFAULT_NULL_STRING) |
In the absence of further comments from @andyyangcn I will merge this PR. |
[TRAFODION-2483] fix issue: '' is treated as NULL in hive table