Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-6034: [C++][Gandiva] Add string functions in Gandiva #4942

Closed
wants to merge 7 commits into from

Conversation

pprudhvi
Copy link
Contributor

Add following functions in Gandiva:
substr(str, offset, len), substr(str, offset), concat(str1, str2), castVARCHAR(timestamp, len), convert_fromUTF8(binary)

@pprudhvi pprudhvi changed the title ARROW-6034: Add string functions in Gandiva ARROW-6034: [C++][Gandiva]Add string functions in Gandiva Jul 25, 2019
@pprudhvi pprudhvi force-pushed the utf8-funcs branch 2 times, most recently from e2f1742 to 54c8299 Compare July 26, 2019 05:49
@codecov-io
Copy link

codecov-io commented Jul 26, 2019

Codecov Report

Merging #4942 into master will increase coverage by 1.16%.
The diff coverage is 77.59%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4942      +/-   ##
==========================================
+ Coverage   87.98%   89.15%   +1.16%     
==========================================
  Files         910      722     -188     
  Lines      133521   101821   -31700     
  Branches     1418        0    -1418     
==========================================
- Hits       117483    90779   -26704     
+ Misses      16028    11042    -4986     
+ Partials       10        0      -10
Impacted Files Coverage Δ
cpp/src/gandiva/precompiled/time.cc 90.62% <0%> (-6.59%) ⬇️
cpp/src/gandiva/function_registry_string.cc 100% <100%> (ø) ⬆️
cpp/src/gandiva/precompiled/string_ops_test.cc 98.9% <100%> (+1.67%) ⬆️
cpp/src/gandiva/function_registry_datetime.cc 100% <100%> (ø) ⬆️
cpp/src/gandiva/precompiled/string_ops.cc 60% <65.9%> (+3.66%) ⬆️
cpp/src/arrow/csv/column-builder.cc 95.29% <0%> (-1.77%) ⬇️
js/src/util/fn.ts
rust/datafusion/src/execution/filter.rs
rust/arrow/src/csv/writer.rs
rust/datafusion/src/bin/main.rs
... and 187 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cac4957...f88773e. Read the comment docs.

@@ -68,7 +68,20 @@ std::vector<NativeFunction> GetStringFunctionRegistry() {

NativeFunction("like", {}, DataTypeVector{utf8(), utf8()}, boolean(),
kResultNullIfNull, "gdv_fn_like_utf8_utf8",
NativeFunction::kNeedsFunctionHolder)};
NativeFunction::kNeedsFunctionHolder),
NativeFunction("substr", {"substring"}, DataTypeVector{utf8(), int64(), int64()},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are non-obvious - can you please add comments for the params/functions ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

cpp/src/gandiva/precompiled/string_ops.cc Show resolved Hide resolved
uint64_t ctx_ptr = reinterpret_cast<int64>(&ctx);
int32 out_len = 0;

char* out_str = substr_utf8_int64_int64(ctx_ptr, "asdf", 4, 1, 2, &out_len);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please add a test for 0 len output ?

substr_utf8_int64_int64(ctx_ptr, "asdf", 4, 1, 0, &out_len)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@kou kou changed the title ARROW-6034: [C++][Gandiva]Add string functions in Gandiva ARROW-6034: [C++][Gandiva] Add string functions in Gandiva Jul 31, 2019
@pravindra pravindra closed this in 9c14739 Aug 1, 2019
@pravindra
Copy link
Contributor

thanks for the change, @pprudhvi

pprudhvi added a commit to pprudhvi/arrow that referenced this pull request Aug 11, 2019
Add following functions in Gandiva:
substr(str, offset, len), substr(str, offset), concat(str1, str2), castVARCHAR(timestamp, len), convert_fromUTF8(binary)

Closes apache#4942 from pprudhvi/utf8-funcs and squashes the following commits:

f88773e <Prudhvi Porandla> add len 0 substr unittest
3900f8e <Prudhvi Porandla> static cast size_t to int32
2082241 <Prudhvi Porandla> add convert_fromUTF8 method
112c933 <Prudhvi Porandla> add castVARCHAR(timestamp) method
77d3cdd <Prudhvi Porandla> add concatOperator
9e2623f <Prudhvi Porandla> add unittests for substr
48c4d08 <Prudhvi Porandla> add substr methods

Authored-by: Prudhvi Porandla <prudhvi.porandla@icloud.com>
Signed-off-by: Pindikura Ravindra <ravindra@dremio.com>
@pprudhvi pprudhvi deleted the utf8-funcs branch September 5, 2019 07:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants