-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-15016: [R] show_exec_plan
for an arrow_dplyr_query
#13541
Merged
Merged
Changes from all commits
Commits
Show all changes
82 commits
Select commit
Hold shift + click to select a range
403d9df
add a `_ToString` method for the ExecPlan
dragosmg 4a1f94e
add `ToString` method for the `ExecPlan` R6 object
dragosmg f03ced0
define `show_arrow_query()` and add it to `print.arrow_dplyr_query()`
dragosmg fdd20be
rename `show_arrow_query()` to `show_exec_plan()` + unit tests
dragosmg 350bea3
lint
dragosmg c889ec0
use `expect_snapshot()` instead of `expect_output()`
dragosmg 1260540
remove empty row
dragosmg 1f4801d
update test and snapshot
dragosmg 254e14c
document + export `show_exec_plan()`
dragosmg 6294765
example + redocument
dragosmg cb5d5b7
add `show_exec_plan` to r/_pkgdown.yml under *Computation*
dragosmg 424f12a
bump ci
dragosmg ccd8f76
run example only when {dplyr} is available
dragosmg 83e5492
remove snapshots and test with `expect_output()`
dragosmg b9c0c30
lint
dragosmg 07de250
bump ci
dragosmg 0e3b894
docs
dragosmg 3aab4e0
improved `show_exec_plan()` docs
dragosmg 3355135
Merge branch 'master' into show_query
dragosmg b9b4d49
use `adq`
dragosmg 4779f08
add minimal test
dragosmg 5730cec
failing unit tests for `arrange()` and `head()` "nodes"
dragosmg 23f5f22
add `show_query.arrow-dplyr_query()` and `explain.arrow_dplyr_query()`
dragosmg 2eb3cdb
tests for `show_query()` & `explain()` + clarifications
dragosmg 2b33495
move dataset tests to test-dataset-dplyr.R + unskip
dragosmg 4ff3065
clean-up
dragosmg d686f9d
clean-up
dragosmg d452a40
define `ExecPlan_ToStringWithSink` C++ function
dragosmg f4a1ba4
define the `ToStringWithSink()` R6 ExecPlan method and use it in `sh…
dragosmg ac7d756
updated unit tests
dragosmg 761b66d
comments
dragosmg a36f826
clang style
dragosmg d887d42
comment
dragosmg effc963
clang style
dragosmg 4571df4
clang style
dragosmg d1e4ce7
clang style
dragosmg e2220fe
update dataset tests
dragosmg 3c96fb7
comment
dragosmg 068dc6d
bump ci
dragosmg fb2a5a0
don't run the cyclocomp linter on the ExecPlan R6 definition
dragosmg c14a497
typo
dragosmg aac9edc
`&&`
dragosmg 3faecc6
rename `ToStrinWithSink` to `ToString` and keep only this method
dragosmg 1afcce9
docs
dragosmg 1de3447
C++ lint
dragosmg f40e5d2
add `ExecPlan_prepare` helper and update `ExecPlan_ToString` C++ func…
dragosmg 8aae3c3
update ToString R6 ExecPlan method
dragosmg fae74ae
c++ lint
dragosmg ebd4a02
C++ lint
dragosmg 311b70b
another C++ lint
dragosmg d0bc159
simpler ExecPlan_prepare
dragosmg d7834b0
C++ lint
dragosmg 9a02651
more C++ linting
dragosmg ac37a9d
extend ExecPlan_prepare and use it for ExecPlan_run
dragosmg 238e406
C++ lint
dragosmg 9a34255
C++ lint
dragosmg 1db36b6
include safe call into R header file
dragosmg 2153451
revert to old ExecPlan_run
dragosmg 807aa88
c++ lint
dragosmg 6141c45
simplified ExecPlan_prepare
dragosmg 9af1403
`ExecPlan_prepare` to return both the plan and the sink node
dragosmg 1f88ff7
C++ lint
dragosmg fbb4c1e
fix the rabbit hole that I led Dragos down
paleolimbot da73019
always call startproducing
paleolimbot ee90354
bump ci
dragosmg 66dcd22
updated unit tests + snapshots
dragosmg 0482c6c
updated dataset unit tests + snapshots
dragosmg ad5024f
remove `BuildAndShow` + add `ToString`
dragosmg 20799f9
update `as_record_batch_reader` and use it inside `show_exec_plan()`
dragosmg 21bc8c3
docs
dragosmg 6376bb2
lints
dragosmg 5751543
revert to da730196fc62b7a1572dc2d40e2550744039a142
dragosmg e29a835
don't start producing the plan
dragosmg 1309e0f
remove separate tests for `show_query()` and `explain()`
dragosmg 5f890dc
revert to fb2a5a00fc24d16da8e0a596256e0c18b02f5289 and use the `Build…
dragosmg c4912a4
docs + comments to indicate the duplicate code in:
dragosmg 5aaa3cd
removed `ungroup()` call
dragosmg ce8b17e
clang lint
dragosmg 0ffa30a
comment
dragosmg 6a8c753
warn and don't build & print the ExecPlan when we have a nested query
dragosmg 7fc20ca
Merge branch 'master' into show_query
dragosmg 35e9ef8
comments in `ExecPlan_prepare`
dragosmg File filter
Filter by extension
Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Oops, something went wrong.
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't follow the discussion: why not use show_query() or explain() here? I saw something about wanting to massage the output to make explain() prettier, but why not use this for explain() today since that's something people know, and it's about logical plans?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a link to the design doc, also linked in the PR description and the Jira ticket.
In short, in my opinion, a first step would be to have
show_exec_plan()
as a function that outputs exactly what we get from C++. That's what I interpreted the scope of the ticket to be.My proposal rests on this point-of-view: I think we need, for accessibility reasons, 2 separate ways of surfacing details regarding the ExecPlan, for 2 separate audiences (one readable by the seasoned Arrow developer and one readable by the regular R user).
show_exec_plan()
would cater to the first audience and, thus, focus on minimising cognitive load by keeping things the same across languages.explain()
should probably not aim to be both and would be targeted towards the second audience and do it well. By well I understand here as in a way that makes sense to the regular R user, who expects arrow to just work. In this context,explain()
would be a tool allowing them to inspect what is going on in a dplyr-like pipeline, but do so in a language they are familiar with.Having an
explain()
method would absolutely be super useful, but I do not think that is fully scoped out yet. Probably, before we start the implementation ofexplain()
, we should flesh out what the output ofexplain()
will look like. I strongly believe we should start from the description of the generic which says: "gives more details about an object (...) and is more focused on human readable output" (my underlining).In this PR there are already some really good suggestions of stuff we could include in the output of a function like
explain()
(e.g. to coverquery_can_stream()
).If we add to the above the time constraint of the impending release, I do not think extending this PR to cover
explain()
would be the best course of action.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I see your case for why not
explain()
, you want to make something more R-user friendly. So then this seems likeshow_query()
. The only reason I see in your doc for not using show_query is "show_query()
states that one of its aims is to provide a more human readable output thatstr()
". That's a really low bar here, given what str() does with R6 classes.My recommendation, for what it's worth: in this PR, make this function be both
show_query
andexplain
, and in a followup, add more human embellishment toexplain()
. (dbplyr:::explain.tbl_sql
, for example, calls show_query() and then prints more details after it.)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy with your recommendation. One of my earlier drafts looked a bit more like
explain.tbl_sql
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nealrichardson Just to clarify, is your recommendation to have
show_exec_plan()
as a stand-alone function +show_query.arrow_dplyr_query()
andexplain.arrow_dplyr_query()
or just the 2 methods and to discardshow_exec_plan()
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My recommendation was the latter, but I don't object to also having a standalone show_exec_plan()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of having
show_exec_plan()
separately as it allows us to write a bit of documentation vsshow_query()
andexplain()
. I'm also aware that this would be the first instance in which document dplyr-like behaviour (AFAIK there is no such documentation in the pkgdown website / help files).