Comet should fall back to Spark for streaming queries #4115

@comphead

Describe the bug

Found in https://github.com/apache/datafusion-comet/actions/runs/25006628164/job/73231896519?pr=4003

[info] Current State: ACTIVE
[info] Thread State: RUNNABLE
[info] 
[info] Logical Plan:
[info] ~WriteToMicroBatchDataSource MemorySink, 221e27b9-9a64-4a81-8f67-f4b69f9517da, Update
[info] +- ~Aggregate [_1#770], [_1#770, size(collect_set(value#773, 0, 0), false) AS size(collect_set(value))#778]
[info]    +- ~Project [_1#770, _2#771, value#773]
[info]       +- ~Generate explode(_2#771), false, [value#773]
[info]          +- ~StreamingDataSourceV2ScanRelation[_1#770, _2#771] MemoryStreamDataSource
[info]   at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:372)
[info]   at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:226)
[info]   Cause: org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 33.0 failed 1 times, most recent failure: Lost task 3.0 in stage 33.0 (TID 97) (370c4b0e19c5 executor driver): org.apache.comet.CometNativeException: list array
[info]         at datafusion_comet_jni_bridge::errors::init::{{closure}}(__internal__:0)
[info]         at std::panicking::panic_with_hook(__internal__:0)
[info]         at std::panicking::panic_handler::{closure#0}(__internal__:0)
[info]         at std::sys::backtrace::__rust_end_short_backtrace::<std::panicking::panic_handler::{closure#0}, !>(__internal__:0)
[info]         at __rustc::rust_begin_unwind(__internal__:0)
[info]         at core::panicking::panic_fmt(__internal__:0)
[info]         at core::option::expect_failed(__internal__:0)
[info]         at <datafusion_functions_aggregate::array_agg::DistinctArrayAggAccumulator as datafusion_expr_common::accumulator::Accumulator>::merge_batch(__internal__:0)

Steps to reproduce

No response

Expected behavior

Comet should fall back to Spark when the query is a streaming query.
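The expected behavior can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Comet's implementation: `PlanNode`, `contains_streaming`, and `choose_engine` are hypothetical names introduced here, and the plan shape is copied from the logical plan in the log above. The idea mirrors the `isStreaming` flag that Spark's `LogicalPlan` exposes: if any node in the plan is streaming, the whole query runs on Spark instead of the native engine.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """Toy stand-in for a logical plan node (hypothetical, not a Spark or Comet class)."""
    name: str
    is_streaming: bool = False
    children: List["PlanNode"] = field(default_factory=list)


def contains_streaming(node: PlanNode) -> bool:
    """Return True if this node or any descendant is marked as streaming."""
    return node.is_streaming or any(contains_streaming(c) for c in node.children)


def choose_engine(plan: PlanNode) -> str:
    """Hypothetical decision: streaming plans run on Spark, everything else natively."""
    return "spark" if contains_streaming(plan) else "comet"


# Plan shape taken from the log above. Only the scan is flagged as streaming
# here, which is already enough for the whole query to fall back.
streaming_plan = PlanNode("WriteToMicroBatchDataSource", children=[
    PlanNode("Aggregate", children=[
        PlanNode("Project", children=[
            PlanNode("Generate", children=[
                PlanNode("StreamingDataSourceV2ScanRelation", is_streaming=True),
            ]),
        ]),
    ]),
])

# A non-streaming plan for comparison (node names are illustrative).
batch_plan = PlanNode("Aggregate", children=[PlanNode("LocalRelation")])

print(choose_engine(streaming_plan))
print(choose_engine(batch_plan))
```

Checking the whole tree rather than only the root keeps the decision conservative: a single streaming source anywhere in the plan is enough to avoid the native aggregate path that panicked in `merge_batch`.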

Additional context

No response

Labels

bug (Something isn't working), priority:low (Minor issues, test failures, tooling, cosmetic)
