ARROW-17812: [Gandiva][Docs] Add C++ Gandiva User Guide #14200
Conversation
FYI @js8544
Thanks!
@pitrou Would you mind reviewing this PR? I am planning on improving Gandiva's docs based on Will's work.
This is nice and well-written, thanks a lot @wjones127 !
docs/source/cpp/gandiva.rst
Outdated
Gandiva was designed to take advantage of the Arrow memory format and modern
hardware. Compiling expressions using LLVM allows the execution to be optimized
to the local runtime environment and hardware, including available SIMD
instructions. To minimize optimization overhead, all Gandiva functions are
Is it really all Gandiva functions (including e.g. simple additions) or the most costly ones?
Also I would say "To reduce optimization overhead".
I changed to "many", though I'm not sure which is right. I wrote this about a year ago, based on what I could understand from the Arrow and Dremio blog posts. But I've forgotten where I might have gotten this.
@pitrou As far as I understand, all functions are precompiled to LLVM IR. For example, addition is defined as a C function here: https://github.com/apache/arrow/blob/master/cpp/src/gandiva/precompiled/arithmetic_ops.cc#L85. During the build process, all functions are first compiled to LLVM IR format (https://github.com/apache/arrow/blob/master/cpp/src/gandiva/precompiled/CMakeLists.txt#L88) and loaded by the LLVM engine before any computation occurs (https://github.com/apache/arrow/blob/master/cpp/src/gandiva/engine.cc#L257).
@ksuarez1423 Do you want to take a look at the additions here?
Overall, this is really nice! I really like the flow of explaining trees, then evaluation of trees, without overlap. That really makes things easier to follow, and I left a comment at each place where I was a little lost.
Gandiva provides a general expression representation where expressions are
represented by a tree of nodes. The expression trees are built using
It might be worth explaining what the nodes represent in particular. While I can figure it out in the next section, where function nodes are discussed, having to wait for that clarity feels weird to me.
I guess what would really be helpful would be to draw an expression tree, but I'm not sure I want to do that in this PR.
I don't really know how to describe what a node itself means. That's why I first describe the leaves, and then the composite nodes, and then give an example.
If you have a suggestion, though, I'd happily take it.
I think something's off in the writing, then: I did not find myself thinking of "composite nodes" after reading, only leaf nodes and then something above them. It might be worth going more in depth in an additional paragraph, at least.
As for how to describe them: internal nodes represent operations that combine multiple children, while leaves contain field references and literals, I believe? I haven't written any Gandiva, so if I'm off-key here, we could talk further.
Once a Projector or Filter is created, it can be evaluated on Arrow record batches.
These execution kernels are single-threaded on their own, but are designed to be
reused to process distinct record batches in parallel.
Could an example be provided of how to do this? Even if it's just "use them in your OpenMP threads," that still provides some useful guidance.
TBH I'm not familiar enough with multi-threading in C++ to have a ready example for that. We have some task and thread pool machinery in Arrow, but that's not what users would be expected to use.
@github-actions crossbow submit preview-docs
Revision: 79deadc Submitted crossbow builds: ursacomputing/crossbow @ actions-3d38fbfa71
Co-authored-by: Antoine Pitrou <antoine@python.org>
Force-pushed from 3099fe0 to 7149a2c
@github-actions crossbow submit preview-docs
Revision: 7149a2c Submitted crossbow builds: ursacomputing/crossbow @ actions-2243107e33
LGTM
Benchmark runs are scheduled for baseline = 8a5aa67 and contender = 5182d62. 5182d62 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
['Python', 'R'] benchmarks have a high level of regressions.
No description provided.