ARROW-15699: [C++][Gandiva] Fix implementation of left and right func… #12440

nivia007 · 2022-02-16T08:54:20Z

…tions to handle more cases

Added conditions to handle the following cases:
case where left('abcdef', -6) -> "" and left('abcdef', -7) -> ""
case where right('abcdef', -6) -> "" and right('abcdef', -7) -> ""
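The two cases above follow from treating a negative count as "drop |number| characters from the opposite end", so the result is empty once |number| reaches the character count. Below is a minimal sketch of those semantics. It assumes well-formed UTF-8 and counts characters rather than bytes. The helpers `utf8_char_starts`, `utf8_left`, and `utf8_right` are hypothetical: they take a `std::string` instead of Gandiva's `(context, text, text_len, ...)` arguments and are not the implementation in string_ops.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: byte offset at which each UTF-8 character starts.
// Continuation bytes have the bit pattern 10xxxxxx and are skipped.
std::vector<std::size_t> utf8_char_starts(const std::string& s) {
  std::vector<std::size_t> starts;
  for (std::size_t i = 0; i < s.size(); ++i) {
    if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80) starts.push_back(i);
  }
  return starts;
}

// Sketch of left(): number >= 0 keeps the first `number` characters;
// number < 0 keeps all but the last |number| characters ("" once exhausted).
std::string utf8_left(const std::string& s, std::int32_t number) {
  const auto starts = utf8_char_starts(s);
  const std::int64_t count = static_cast<std::int64_t>(starts.size());
  // 64-bit arithmetic so number == INT32_MIN cannot overflow.
  const std::int64_t keep = number >= 0 ? number : count + number;
  if (keep <= 0) return "";
  if (keep >= count) return s;
  return s.substr(0, starts[static_cast<std::size_t>(keep)]);
}

// Sketch of right(): number >= 0 keeps the last `number` characters;
// number < 0 drops the first |number| characters ("" once exhausted).
std::string utf8_right(const std::string& s, std::int32_t number) {
  const auto starts = utf8_char_starts(s);
  const std::int64_t count = static_cast<std::int64_t>(starts.size());
  const std::int64_t keep = number >= 0 ? number : count + number;
  if (keep <= 0) return "";
  if (keep >= count) return s;
  return s.substr(starts[static_cast<std::size_t>(count - keep)]);
}
```

Computing the number of characters to keep in 64 bits sidesteps `abs()` on a 32-bit value, which is undefined for `INT32_MIN`.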

github-actions · 2022-02-16T08:54:37Z

https://issues.apache.org/jira/browse/ARROW-15699

github-actions · 2022-02-16T08:54:38Z

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

projjal · 2022-03-03T06:09:42Z

cpp/src/gandiva/precompiled/string_ops.cc

@@ -2207,6 +2213,12 @@ const char* right_utf8_int32(gdv_int64 context, const char* text, gdv_int32 text
    return "";
  }

+  //case where right('abcdef', -6) -> "" and right('abcdef', -7) -> ""
+  if(number < 0 && abs(number) >= text_len) {


This is not handling multibyte Unicode characters.

I tested it with left('abc¥', -4) and it returns the correct result. Would it be better to put this condition in the for loop below?
if (number < 0 && char_count <= abs(number)) {
*out_len = 0;
return "";
}

For strings with multibyte characters, text_len is the number of bytes rather than the number of characters, while number refers to a character count.

OK, I had assumed text_len was the number of characters.
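The mismatch is easy to see on 'abc¥': '¥' (U+00A5) is two bytes in UTF-8, so text_len is 5 while the string has 4 characters. A minimal sketch, assuming well-formed UTF-8. `utf8_char_count` and `negative_count_exhausts` are hypothetical helpers that only illustrate comparing number against bytes versus characters; they are not code from string_ops.cc.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: counts UTF-8 characters by skipping continuation
// bytes (bit pattern 10xxxxxx). Assumes well-formed UTF-8 input.
std::int32_t utf8_char_count(const char* text, std::int32_t text_len) {
  std::int32_t count = 0;
  for (std::int32_t i = 0; i < text_len; ++i) {
    if ((static_cast<unsigned char>(text[i]) & 0xC0) != 0x80) {
      ++count;
    }
  }
  return count;
}

// Hypothetical guard: true when a negative count removes at least `length`
// units. Negating in 64 bits avoids the undefined abs(INT32_MIN).
bool negative_count_exhausts(std::int32_t number, std::int64_t length) {
  return number < 0 && -static_cast<std::int64_t>(number) >= length;
}
```

For left('abc¥', -4), passing the byte length (5) makes the guard return false, so an early return based on text_len would be skipped. Passing the character count (4) makes it return true, matching the intended empty result.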

cpp/src/gandiva/precompiled/string_ops.cc

projjal · 2022-03-16T05:47:42Z

cpp/src/gandiva/precompiled/string_ops.cc

  int32_t start_char_pos;  // the char result start position (inclusive)
-  int32_t end_char_len;    // the char result end position (inclusive)
+  int32_t end_pos;    // the char result end position (inclusive)
+
  if (number > 0) {
    // case where right('abc', 5) ==> 'abc' start_char_pos=1.
    start_char_pos = (char_count > number) ? char_count - number : 0;


It looks like this logic is not correct for multibyte Unicode characters.

Never mind, it looks like this is handled in the following lines.
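Once a character position such as start_char_pos is known, it still has to be converted to a byte offset before the result can be sliced out of text. A minimal sketch of that step, assuming well-formed UTF-8 and a 0-based character index. `utf8_byte_offset` is a hypothetical helper and not the loop that follows in string_ops.cc.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: byte offset at which the character with 0-based
// index char_pos begins; returns text_len if char_pos is past the end.
// Assumes well-formed UTF-8 (continuation bytes match 10xxxxxx).
std::int32_t utf8_byte_offset(const char* text, std::int32_t text_len,
                              std::int32_t char_pos) {
  std::int32_t seen = 0;  // characters started so far
  for (std::int32_t i = 0; i < text_len; ++i) {
    if ((static_cast<unsigned char>(text[i]) & 0xC0) != 0x80) {
      if (seen == char_pos) return i;
      ++seen;
    }
  }
  return text_len;
}
```

For right('¥abc', 3), char_count is 4 and start_char_pos is 4 - 3 = 1. The matching byte offset is 2, so the result is the trailing "abc". Using start_char_pos directly as a byte offset would start inside the two-byte '¥' sequence.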

cpp/src/gandiva/precompiled/string_ops.cc

…tions to handle more cases

Proper handling of multibyte characters

Added conditions to handle below cases:
case where left('abcdef', -6) -> "" and left('abcdef', -7) -> ""
case where right('abcdef', -6) -> "" and right('abcdef', -7) -> ""

ursabot · 2022-03-17T13:41:23Z

Benchmark runs are scheduled for baseline = 91f6585 and contender = 7e70c42. 7e70c42 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.21% ⬆️0.04%] test-mac-arm
[Failed ⬇️0.36% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.34% ⬆️0.09%] ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

github-actions bot added the "Component: C++ - Gandiva" and "Component: C++" labels Feb 16, 2022

projjal reviewed Mar 3, 2022

nivia007 force-pushed the fix_left_right_functions branch 5 times, most recently from 5803604 to 61eac98, March 11, 2022 15:26

projjal reviewed Mar 16, 2022

cpp/src/gandiva/precompiled/string_ops.cc (outdated)

projjal reviewed Mar 16, 2022

cpp/src/gandiva/precompiled/string_ops.cc (outdated)

projjal reviewed Mar 16, 2022

cpp/src/gandiva/precompiled/string_ops.cc (outdated)

nivia007 force-pushed the fix_left_right_functions branch 4 times, most recently from aa360b5 to 5ee9753, March 17, 2022 04:48

projjal approved these changes Mar 17, 2022

nivia007 force-pushed the fix_left_right_functions branch from 5ee9753 to 244a5d9, March 17, 2022 08:29

pravindra closed this in 7e70c42 Mar 17, 2022

nivia007 commented Feb 16, 2022

github-actions bot commented Feb 16, 2022

github-actions bot commented Feb 16, 2022

projjal Mar 3, 2022

nivia007 Mar 3, 2022

projjal Mar 7, 2022

nivia007 Mar 7, 2022

projjal Mar 16, 2022

projjal Mar 16, 2022

ursabot commented Mar 17, 2022 (edited)