ARROW-9760: [Rust] [DataFusion] Added DataFrame::explain #7993

jorgecarleitao · 2020-08-18T19:05:26Z

I admit I find this API a bit counter-intuitive: coming from Spark, I would expect a string when I call `df.explain()?`. However, I am following the commitment of understanding `explain` as a table with one row and one column, leaving the collect and print for the users to handle.

github-actions · 2020-08-18T19:18:35Z

https://issues.apache.org/jira/browse/ARROW-9760

rust/datafusion/src/dataframe.rs

alamb

The implementation looks good to me. I don't have any strong opinion or feedback on the API design

rust/datafusion/src/dataframe.rs

alamb · 2020-08-19T12:46:41Z

That makes sense to me

…

On Tue, Aug 18, 2020 at 11:24 PM Jorge Leitao ***@***.***> wrote:

> ***@***.**** commented on this pull request.
>
> In rust/datafusion/src/dataframe.rs <#7993 (comment)>:
>
> > @@ -174,4 +174,18 @@ pub trait DataFrame {
> >      /// Return the logical plan represented by this DataFrame.
> >      fn to_logical_plan(&self) -> LogicalPlan;
> > +
> > +    /// Return a DataFrame with the explanation of its plan so far.
> > +    ///
> > +    /// ```
> > +    /// # use datafusion::prelude::*;
> > +    /// # use datafusion::error::Result;
> > +    /// # fn main() -> Result<()> {
> > +    /// let mut ctx = ExecutionContext::new();
> > +    /// let df = ctx.read_csv("tests/example.csv", CsvReadOptions::new())?;
> > +    /// let batches = df.limit(100)?.explain(false)?.collect()?;
> > +    /// # Ok(())
> > +    /// # }
> > +    /// ```
> > +    fn explain(&self, verbose: bool) -> Result<Arc<dyn DataFrame>>;
>
> I find it poor design that .explain prints directly to stdout in Spark. IMO saving one extra line of code (the print) is not a sufficiently good reason to outright spam stdout and limit so much what a user can do with .explain.
>
> Some downstream consequences of this decision in Spark:
>
> - it makes it much more difficult to log it correctly
> - the popular pyspark can't convert it to a Python string and prettify it when it is used in notebooks
>
> I agree with fn explain(&self, verbose: bool) -> String (prob. Result<String>). For a user, the difference is
>
> df.explain()
>
> vs
>
> println!("{}", df.explain()?)
>
> I find the latter more expressive of the user's intention, and it gives them the freedom to pipe the result to whatever stream they want.
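The trade-off described in the quoted comment can be sketched in plain Rust. This is only an illustration under stated assumptions: `Plan` and its `steps` field are hypothetical stand-ins, not DataFusion types, and the sketch returns a plain `String` rather than the `Result<String>` mentioned in the comment. It shows why returning the explanation as a value leaves the caller free to print it, log it, or write it to any other stream.

```rust
use std::io::Write;

/// Hypothetical stand-in for a query plan; NOT a DataFusion type.
struct Plan {
    steps: Vec<String>,
}

impl Plan {
    /// Return the explanation as a value instead of printing it.
    /// With `verbose`, each step is prefixed with its index.
    fn explain(&self, verbose: bool) -> String {
        self.steps
            .iter()
            .enumerate()
            .map(|(i, step)| {
                if verbose {
                    format!("{}: {}\n", i, step)
                } else {
                    format!("{}\n", step)
                }
            })
            .collect()
    }
}

fn main() -> std::io::Result<()> {
    // Made-up plan steps, used only for illustration.
    let plan = Plan {
        steps: vec!["Limit: 100".to_string(), "TableScan: example".to_string()],
    };

    // The caller decides where the text goes: stdout...
    print!("{}", plan.explain(false));

    // ...or any other writer, here an in-memory buffer standing in for a log sink.
    let mut log: Vec<u8> = Vec::new();
    write!(log, "{}", plan.explain(true))?;
    assert!(!log.is_empty());
    Ok(())
}
```

The merged API goes one step further and returns a DataFrame, so the caller collects the result before deciding how to render it; the same caller-controls-output idea applies.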

andygrove · 2020-08-19T15:57:27Z

After thinking about this some more, I'm also fine with the current implementation that returns a DataFrame.

jorgecarleitao · 2020-08-19T18:39:08Z

Ready to go, then!

@andygrove

FYI @andygrove and @alamb

I admit I find this API a bit counter-intuitive: coming from Spark, I would expect a string when I call `df.explain()?`. However, I am following the commitment of understanding `explain` as a table with one row and one column, leaving the collect and print for the users to handle.

Closes apache#7993 from jorgecarleitao/df_explain

Authored-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
Signed-off-by: Andy Grove <andygrove73@gmail.com>

Added DataFrame::explain.

f67bb64

andygrove reviewed Aug 18, 2020

alamb approved these changes Aug 18, 2020

andygrove added the "Component: Rust" and "Component: Rust - DataFusion" labels Aug 18, 2020

andygrove approved these changes Aug 19, 2020

andygrove closed this in 3cb0bd8 Aug 19, 2020

jorgecarleitao deleted the df_explain branch September 30, 2020 15:19

asfimport mentioned this pull request Oct 9, 2020

[Rust] [DataFusion] Implement DataFrame::explain #17310

Closed
