Skip to content
This repository has been archived by the owner on Jun 7, 2021. It is now read-only.

[TRAFODION-2483] Trafodion treat '\' as NULL in hive table #1055

Merged
merged 2 commits into from Apr 24, 2017

Conversation

andyyangcn
Copy link
Contributor

[TRAFODION-2483] fix issue: '' is treated as NULL in hive table

@Traf-Jenkins
Copy link

Can one of the admins verify this patch?

5 similar comments
@Traf-Jenkins
Copy link

Can one of the admins verify this patch?

@Traf-Jenkins
Copy link

Can one of the admins verify this patch?

@Traf-Jenkins
Copy link

Can one of the admins verify this patch?

@Traf-Jenkins
Copy link

Can one of the admins verify this patch?

@Traf-Jenkins
Copy link

Can one of the admins verify this patch?

@@ -1653,7 +1653,7 @@ char * ExHdfsScanTcb::extractAndTransformAsciiSourceToSqlRow(int &err,
// for non-varchar, length of zero indicates a null value.
// For all datatypes, HIVE_DEFAULT_NULL_STRING('\N') indicates a null value.
if (((len == 0) && (tgtAttr && (NOT DFS2REC::isSQLVarChar(tgtAttr->getDatatype())))) ||
((len > 0) && (memcmp(sourceData, HIVE_DEFAULT_NULL_STRING, len) == 0)))
((len > 1) && (memcmp(sourceData, HIVE_DEFAULT_NULL_STRING, len) == 0)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can handle the '\' problem, but if (not possible) the user define '\' as null, this logic will break.
So here, it should be to compare the whole HIVE_DEFAULT_NULL_STRING, and make sure length is equal, as I understand.
\N is NULL, but \NA is not NULL, what will happen if there is \NAA ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the current fix \NAA will not be treated as NULL. and I agree that making sure the length is equal is better.

@sureshsubbiah
Copy link
Contributor

jenkins run tests

@sureshsubbiah
Copy link
Contributor

jenkins, add user

@Traf-Jenkins
Copy link

Check Test Started: https://jenkins.esgyn.com/job/Check-PR-master/1714/

@Traf-Jenkins
Copy link

@Traf-Jenkins
Copy link

New Check Test Started: https://jenkins.esgyn.com/job/Check-PR-master/1724/

Copy link
Contributor

@selvaganesang selvaganesang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch and it should make the hive scan perform better

@@ -1653,7 +1653,7 @@ char * ExHdfsScanTcb::extractAndTransformAsciiSourceToSqlRow(int &err,
// for non-varchar, length of zero indicates a null value.
// For all datatypes, HIVE_DEFAULT_NULL_STRING('\N') indicates a null value.
if (((len == 0) && (tgtAttr && (NOT DFS2REC::isSQLVarChar(tgtAttr->getDatatype())))) ||
((len > 0) && (memcmp(sourceData, HIVE_DEFAULT_NULL_STRING, len) == 0)))
((len == strlen(HIVE_DEFAULT_NULL_STRING)) && (memcmp(sourceData, HIVE_DEFAULT_NULL_STRING, len) == 0)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if the c++ compiler would optimize this. You could improve this further by getting the strlen only once in the TCB

@Traf-Jenkins
Copy link

Copy link
Contributor

@zellerh zellerh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1
However, I agree with Selva's comment and given that this code is in the very inner loop it does make sense to optimize it. Alternatively, check the code generated to see whether it's being optimized by the compiler.

@selvaganesang
Copy link
Contributor

Another option to optimize this code is to replace strlen with sizeof(HIVE_DEFAULT_NULL_STRING)

@selvaganesang
Copy link
Contributor

In the absence of further comments from @andyyangcn I will merge this PR.

@asfgit asfgit merged commit af1652d into apache:master Apr 24, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
7 participants