ARROW-9683: [Rust][DataFusion] Add debug printing to physical plans and associated types #7925

Closed

Conversation

@alamb (Contributor) commented Aug 10, 2020

While working on ARROW-9653, I was trying to debug the execution plan, and it would have been much easier to understand and isolate the issue if there had been a way to display the plan. This would also be useful as part of the EXPLAIN plan functionality in ARROW-9654.

In general, for debugging purposes, we would like to be able to dump out an execution plan. To do so in the idiomatic Rust way, I made `ExecutionPlan` also implement `std::fmt::Debug`, then followed the `rustc` guidance until everything it needed implemented `Debug` as well.
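
As a rough sketch of that pattern (with simplified names and signatures, not the actual DataFusion definitions): requiring `Debug` as a supertrait forces every `ExecutionPlan` implementation to be printable, and most implementations can then simply derive it.

```rust
use std::fmt::Debug;

// Requiring Debug as a supertrait means every implementation of the trait
// must also be printable with {:?} / {:#?}.
trait ExecutionPlan: Debug {
    fn name(&self) -> &str;
}

// Most implementations can simply derive Debug.
#[derive(Debug)]
struct HashAggregateExec {
    group_expr: Vec<String>,
}

impl ExecutionPlan for HashAggregateExec {
    fn name(&self) -> &str {
        "HashAggregateExec"
    }
}

fn main() {
    let plan: Box<dyn ExecutionPlan> = Box::new(HashAggregateExec {
        group_expr: vec!["c1".to_string(), "c2".to_string()],
    });
    // Pretty-printed Debug output, in the style of the example below.
    println!("{:#?}", plan);
}
```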

Here is an example plan for `"SELECT c1, c2, MIN(c3) FROM aggregate_test_100 GROUP BY c1, c2"` when printed using `println!("{:#?}", plan)`:

physical plan is HashAggregateExec {
    group_expr: [
        Column {
            name: "c1",
        },
        Column {
            name: "c2",
        },
    ],
    aggr_expr: [
        Min {
            expr: Column {
                name: "c3",
            },
        },
    ],
    input: DataSourceExec {
        schema: Schema {
            fields: [
                Field {
                    name: "c1",
                    data_type: Utf8,
                    nullable: false,
                    dict_id: 0,
                    dict_is_ordered: false,
                },
                Field {
                    name: "c2",
                    data_type: UInt32,
                    nullable: false,
                    dict_id: 0,
                    dict_is_ordered: false,
                },
                Field {
                    name: "c3",
                    data_type: Int8,
                    nullable: false,
                    dict_id: 0,
                    dict_is_ordered: false,
                },
            ],
            metadata: {},
        },
        partitions.len: 1,
    },
    schema: Schema {
        fields: [
            Field {
                name: "c1",
                data_type: Utf8,
                nullable: true,
                dict_id: 0,
                dict_is_ordered: false,
            },
            Field {
                name: "c2",
                data_type: UInt32,
                nullable: true,
                dict_id: 0,
                dict_is_ordered: false,
            },
            Field {
                name: "MIN(c3)",
                data_type: Int64,
                nullable: true,
                dict_id: 0,
                dict_is_ordered: false,
            },
        ],
        metadata: {},
    },
}

@@ -61,6 +73,12 @@ pub struct DatasourcePartition {
batch_iter: Arc<Mutex<dyn RecordBatchReader + Send + Sync>>,
}

impl Debug for DatasourcePartition {

`RecordBatchReader` does not implement `Debug`, so `Debug` had to be implemented manually here rather than derived.
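
For reference, a minimal self-contained sketch of what such a manual implementation can look like (the trait and struct below are simplified stand-ins, not the exact code in this PR): the `Debug` impl just names the struct and skips the field that cannot be printed.

```rust
use std::fmt::{self, Debug, Formatter};
use std::sync::{Arc, Mutex};

// Simplified stand-in for arrow's RecordBatchReader, which has no Debug impl.
trait RecordBatchReader {}

struct EmptyReader;
impl RecordBatchReader for EmptyReader {}

struct DatasourcePartition {
    batch_iter: Arc<Mutex<dyn RecordBatchReader + Send + Sync>>,
}

// #[derive(Debug)] is impossible because the boxed reader is not Debug,
// so the manual impl names the struct and omits the unprintable field.
impl Debug for DatasourcePartition {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        f.debug_struct("DatasourcePartition").finish()
    }
}

fn main() {
    let partition = DatasourcePartition {
        batch_iter: Arc::new(Mutex::new(EmptyReader)),
    };
    println!("{:?}", partition); // prints: DatasourcePartition
}
```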

@@ -110,6 +112,12 @@ struct ParquetPartition {
iterator: Arc<Mutex<dyn RecordBatchReader + Send + Sync>>,
}

impl Debug for ParquetPartition {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        f.debug_struct("ParquetPartition").finish()
    }
}

As above, this cannot use `#[derive(Debug)]` because `RecordBatchReader` does not implement `Debug`.

.field("name", &self.name)
.field("args", &self.args)
.field("return_type", &self.return_type)
.field("fun", &"<FUNC>")

`ScalarUdf` is typedef'd to a function pointer and does not implement `Debug`, which is what leads to the two manual `Debug` implementations in this file.
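
For illustration, here is a self-contained sketch of the same idea (the `ScalarUdf` alias below is a simplified stand-in, not the real DataFusion signature): the function-valued field gets a fixed `"<FUNC>"` placeholder because there is nothing useful to print for it.

```rust
use std::fmt::{self, Debug, Formatter};
use std::sync::Arc;

// Simplified stand-in for the scalar UDF function type: an Arc'd callable.
// Function pointers and closures have no meaningful Debug representation.
type ScalarUdf = Arc<dyn Fn(i64) -> i64 + Send + Sync>;

struct ScalarFunction {
    name: String,
    fun: ScalarUdf,
}

// The `fun` field prevents #[derive(Debug)], so the manual impl prints a
// placeholder for it, mirroring the `.field("fun", &"<FUNC>")` call above.
impl Debug for ScalarFunction {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        f.debug_struct("ScalarFunction")
            .field("name", &self.name)
            .field("fun", &"<FUNC>")
            .finish()
    }
}

fn main() {
    let square = ScalarFunction {
        name: "square".to_string(),
        fun: Arc::new(|x| x * x),
    };
    // prints: ScalarFunction { name: "square", fun: "<FUNC>" }
    println!("{:?}", square);
}
```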

@alamb (Contributor) commented Aug 10, 2020

Rebased to pick up 37ee600

@andygrove (Member) left a comment

LGTM. Thanks @alamb

@andygrove closed this in e31e5d4 on Aug 11, 2020
@alamb deleted the alamb/ARROW-9683-debug branch on August 11, 2020 15:28
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
ARROW-9683: [Rust][DataFusion] Add debug printing to physical plans and associated types

Closes apache#7925 from alamb/alamb/ARROW-9683-debug

Authored-by: alamb <andrew@nerdnetworks.org>
Signed-off-by: Andy Grove <andygrove@nvidia.com>