
Implement StreamTable and StreamTableProvider (#7994) #8021

Merged
merged 15 commits into apache:main on Nov 15, 2023

Conversation

tustvold
Contributor

@tustvold tustvold commented Nov 1, 2023

Which issue does this PR close?

Closes #7994

Rationale for this change

See ticket

What changes are included in this PR?

Implements StreamTable and StreamTableProvider as in the design proposal. Also defines a DefaultTableFactory that wires this up as the default for unbounded tables.
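
For illustration, registering one of these stream tables programmatically might look roughly like the sketch below. This is a rough sketch only: the module path and the builder-style constructors (StreamConfig::new_file, with_encoding, with_header, StreamTable::new) are assumptions for this example, not necessarily the exact API added here.

    // Rough usage sketch; constructor names and module path are assumptions.
    use std::sync::Arc;
    use datafusion::arrow::datatypes::{DataType, Field, Schema};
    use datafusion::datasource::stream::{StreamConfig, StreamEncoding, StreamTable};
    use datafusion::error::Result;
    use datafusion::prelude::SessionContext;

    fn register_fifo_table(ctx: &SessionContext) -> Result<()> {
        let schema = Arc::new(Schema::new(vec![
            Field::new("a1", DataType::Int32, false),
            Field::new("a2", DataType::Int32, false),
        ]));
        // Point the stream table at a FIFO path and pick the CSV encoding,
        // mirroring the fields visible in this PR (location, encoding, header).
        let config = StreamConfig::new_file(schema, "/tmp/my_fifo.csv".into())
            .with_encoding(StreamEncoding::Csv)
            .with_header(false);
        ctx.register_table("fifo_table", Arc::new(StreamTable::new(Arc::new(config))))?;
        Ok(())
    }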

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the core Core datafusion crate label Nov 1, 2023
let execution_start = Instant::now();
write_to_fifo(&file, &header, execution_start, broken_pipe_timeout).unwrap();
for (cnt, line) in enumerate(lines) {
while waiting_lock.load(Ordering::SeqCst) && cnt > wait_until {
Contributor Author

I had some difficulty following the various locking, etc... I think the new implementation is broadly equivalent, but I may be missing some nuance

) -> Result<u64> {
    let config = self.0.clone();
    let (sender, mut receiver) = tokio::sync::mpsc::channel::<RecordBatch>(2);
    // Note: FIFO Files support poll so this could use AsyncFd
Contributor Author

This is an example of the potential benefits of a custom TableProvider: it can use external knowledge about the nature of the files in question.
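
As a concrete illustration of the "FIFO files support poll so this could use AsyncFd" note above, a poll-driven FIFO read with tokio's AsyncFd might look roughly like the sketch below. This is not part of the PR; it assumes the libc crate for O_NONBLOCK and glosses over FIFO open/EOF subtleties.

    // Sketch only: poll a FIFO with tokio's AsyncFd instead of blocking reads.
    use std::io::Read;
    use std::os::unix::fs::OpenOptionsExt;
    use tokio::io::unix::AsyncFd;

    async fn read_fifo(path: &str) -> std::io::Result<Vec<u8>> {
        // O_NONBLOCK so reads return WouldBlock instead of parking a thread.
        let file = std::fs::OpenOptions::new()
            .read(true)
            .custom_flags(libc::O_NONBLOCK)
            .open(path)?;
        let fd = AsyncFd::new(file)?;
        let mut out = Vec::new();
        loop {
            // Wait until the kernel reports the FIFO as readable.
            let mut guard = fd.readable().await?;
            let mut buf = [0u8; 8192];
            match guard.try_io(|inner| inner.get_ref().read(&mut buf)) {
                Ok(Ok(0)) => break, // all writers closed the FIFO
                Ok(Ok(n)) => out.extend_from_slice(&buf[..n]),
                Ok(Err(e)) => return Err(e),
                Err(_would_block) => continue, // readiness was spurious; poll again
            }
        }
        Ok(out)
    }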

@@ -394,8 +491,6 @@ mod unix_test {
a2 INT NOT NULL
)
STORED AS CSV
WITH HEADER ROW
Contributor Author

There is no particular reason that CSV headers could not be supported, but it was simpler to just not read/write them for the purposes of this test

@metesynnada
Contributor

Thx @tustvold, I believe that the proof of concept is looking good. My opinion:

Instead of creating a new TableProvider for FIFO, we could use a StreamingTableProvider including its factory. We can obtain the correct factory by using the UNBOUNDED keyword in SQL (which could be changed to something like STREAM). Additionally, we can leverage a new StreamSinkExec and valid DataSink implementations to handle different use cases like FIFO or Kafka. I think this approach could scale effectively. For FIFO there are different formats as well, and a potential Kafka sink may implement different kinds of BatchSerializers. This has been on my mind for a while, but I couldn't find the time to code it. Does this seem like a fine plan?
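
For illustration only, the layering described above might look something like the sketch below. Every name and signature here (Serializer, StreamSink, write_stream) is hypothetical and simplified; DataFusion's actual BatchSerializer and DataSink traits have different shapes.

    // Hypothetical shapes only, to illustrate the proposed layering.
    use async_trait::async_trait;
    use datafusion::arrow::record_batch::RecordBatch;
    use datafusion::error::Result;

    /// Turns batches into the wire format of a connector (CSV rows, Kafka messages, ...).
    trait Serializer: Send + Sync {
        fn serialize(&self, batch: &RecordBatch) -> Result<Vec<u8>>;
    }

    /// Writes serialized bytes to a concrete endpoint (a FIFO file, a Kafka topic, ...).
    #[async_trait]
    trait StreamSink: Send + Sync {
        async fn write(&self, payload: Vec<u8>) -> Result<()>;
    }

    /// A streaming INSERT would then serialize each incoming batch and push it
    /// to whichever sink was selected for the table's connector.
    async fn write_stream(
        batches: impl IntoIterator<Item = RecordBatch>,
        serializer: &dyn Serializer,
        sink: &dyn StreamSink,
    ) -> Result<()> {
        for batch in batches {
            sink.write(serializer.serialize(&batch)?).await?;
        }
        Ok(())
    }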

@metesynnada
Contributor

I believe we can use your implementation of DataSink for FIFO, nicely coded 👍🏻

@tustvold
Contributor Author

tustvold commented Nov 2, 2023

Instead of creating a new TableProvider for FIFO, we could use a StreamingTableProvider including its factory

So this is what I started on implementing, but I wasn't really sure how this interface would materially differ from the existing interfaces used by this PR. In particular I'm struggling to devise an interface that would substantially reduce the amount of boilerplate, whilst still preserving the necessary flexibility. Perhaps I am missing something?

@alamb alamb left a comment

I reviewed this PR and I think it looks really nice -- thank you @tustvold and @metesynnada 🏆

I also filed #8035 with some additional information about PartitionStream

@metesynnada
Contributor

Instead of creating a new TableProvider for FIFO, we could use a StreamingTableProvider including its factory

So this is what I started on implementing, but I wasn't really sure how this interface would materially differ from the existing interfaces used by this PR. In particular I'm struggling to devise an interface that would substantially reduce the amount of boilerplate, whilst still preserving the necessary flexibility. Perhaps I am missing something?

To help with this subject, let's break it down into smaller parts. We can begin by focusing on the FIFO table and finding ways to optimize the SQL involved. Additionally, for external tables that are unbounded, we can create FIFO tables with the proper formatting, starting with CSV and AVRO. When it comes to the session state, we keep the TableProviderFactory instances as

    /// This is used to create [`TableProvider`] instances for the
    /// `CREATE EXTERNAL TABLE ... STORED AS <FORMAT>` for custom file
    /// formats other than those built into DataFusion
    table_factories: HashMap<String, Arc<dyn TableProviderFactory>>,

which is not scalable. Currently, DataFusion SQL is closely intertwined with the ListingTable approach, which is what led to FIFO support living inside ListingTable. To avoid this, we need to be able to use multiple TableProviderFactory instances for a single format, by specifying the connector.

TL;DR: this part should change in order to scale the support.

    async fn create_custom_table(
        &self,
        cmd: &CreateExternalTable,
    ) -> Result<Arc<dyn TableProvider>> {
        let state = self.state.read().clone();
        let file_type = cmd.file_type.to_uppercase();
        let factory =
            &state
                .table_factories
                .get(file_type.as_str())
                .ok_or_else(|| {
                    DataFusionError::Execution(format!(
                        "Unable to find factory for {}",
                        cmd.file_type
                    ))
                })?;
        let table = (*factory).create(&state, cmd).await?;
        Ok(table)
    }

@tustvold
Contributor Author

tustvold commented Nov 3, 2023

which is not scalable

I don't see why you couldn't have a TableProviderFactory that itself delegates out to other TableProviderFactory instances based on some criteria? Perhaps I am misunderstanding what you mean by scalable?
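
For illustration, a delegating factory along these lines might look roughly like the sketch below, assuming the create(&SessionState, &CreateExternalTable) trait shape shown in the snippet above. The "connector" option key is hypothetical, and import paths and field types may differ between DataFusion versions.

    // Sketch of a factory that routes to other factories based on an OPTIONS key.
    use std::collections::HashMap;
    use std::sync::Arc;
    use async_trait::async_trait;
    use datafusion::datasource::provider::TableProviderFactory;
    use datafusion::datasource::TableProvider;
    use datafusion::error::{DataFusionError, Result};
    use datafusion::execution::context::SessionState;
    use datafusion::logical_expr::CreateExternalTable;

    struct DispatchingFactory {
        /// Connector-specific factories, e.g. "fifo" or "kafka" (hypothetical keys).
        connectors: HashMap<String, Arc<dyn TableProviderFactory>>,
        /// Fallback used when no connector is specified.
        default: Arc<dyn TableProviderFactory>,
    }

    #[async_trait]
    impl TableProviderFactory for DispatchingFactory {
        async fn create(
            &self,
            state: &SessionState,
            cmd: &CreateExternalTable,
        ) -> Result<Arc<dyn TableProvider>> {
            match cmd.options.get("connector") {
                Some(name) => self
                    .connectors
                    .get(name)
                    .ok_or_else(|| {
                        DataFusionError::Execution(format!("unknown connector {name}"))
                    })?
                    .create(state, cmd)
                    .await,
                None => self.default.create(state, cmd).await,
            }
        }
    }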

different use cases like FIFO or Kafka

Perhaps it is worth discussing where such functionality should live? In this repo, in downstream codebases, somewhere in between, e.g. datafusion-contrib? Unless I'm mistaken, support for this currently exists mostly in downstream repos, with no support in DF for creating or managing FIFO files?

@tustvold tustvold marked this pull request as ready for review November 3, 2023 09:08
@tustvold
Contributor Author

tustvold commented Nov 3, 2023

I've marked this ready for review as I think the feedback indicates this is a sound approach, but feel free to correct me if I am wrong

@ozankabak ozankabak left a comment

Please do not merge this yet -- AFAICT this breaks use cases that used to have built-in support, such as creating external tables that read from FIFO files.

We like the idea of having a FIFOTable, separating a single sink into FileSink and a StreamSink etc. However, we don't seem to agree on the design yet. We will soon share our design proposal with diagrams and API level specs.

@tustvold
Contributor Author

tustvold commented Nov 3, 2023

AFAICT this breaks use cases including creating external tables reading from FIFO files that used to have built-in support

To be clear, this PR simply updates the FIFO tests and makes no changes to ListingTable, which would instead be a follow-on PR. The idea of this PR is to show that removing the functionality from ListingTable, whilst it would still represent a breaking change, would not require major architectural changes to existing workloads, nor would it require introducing new abstractions to DF. If you feel this hasn't been demonstrated, please let me know how I can articulate this better.

@ozankabak
Contributor

We have discussed this today in detail. We will come back with a design/proposal early next week. Let's iterate together to find the best design. We will also help with the implementation when we converge.

@tustvold
Contributor Author

tustvold commented Nov 8, 2023

Given I think this PR has demonstrated that the existing APIs can accommodate streaming use-cases, perhaps we might move forward with this and potentially refine things later? I'm keen to find a way to keep things moving so we don't hold up #8029 and, by extension, the next DF release, which is waiting on an arrow bug fix (apache/arrow-rs#5050)

@alamb
Contributor

alamb commented Nov 8, 2023

I don't think we should merge this without getting consensus / ensuring the Synnada team has a reasonable way forward, especially since there are other options (e.g. an arrow patch release) that can unblock the DataFusion release

In particular, I think the tradeoffs between the different implementation options and their ordering come down to who is going to do the work, and I think there are ways to ensure it doesn't all fall on @tustvold. I will try to set up a call / some forum to discuss this and will post notes on the outcome

@ozankabak
Contributor

We anticipate having @metesynnada's design proposal, which aims to embody the discussions in issue #7994, in a week's time. The proposed design will provide an API for streaming sources that will decouple them from ListingTable. We propose to not merge this draft, to iterate on the upcoming proposal and converge on a design as a community, and to hold off on upgrading to arrow-rs 49.0.0 during this process. We do so because this upgrade removes the append functionality from object_store, which is critical to the current implementation of FIFO functionality in DataFusion.

We aim to resolve the situation in approximately two weeks, with the following end state: (1) We will have merged the converged-on design decoupling ListingTable from streaming sources, (2) We will proceed with the upgrade to arrow-rs 49.0.0.

In the meantime, we may make a serde-fix-backported 48.1.0 release of arrow-rs so DataFusion can benefit from this fix.

@alamb
Contributor

alamb commented Nov 8, 2023

We anticipate having @metesynnada's design proposal, which aims to embody the discussions in issue #7994, in a week's time. The proposed design will provide an API for streaming sources that will decouple them from ListingTable. We propose to not merge this draft, to iterate on the upcoming proposal and converge on a design as a community, and to hold off on upgrading to arrow-rs 49.0.0 during this process. We do so because this upgrade removes the append functionality from object_store, which is critical to the current implementation of FIFO functionality in DataFusion.

We aim to resolve the situation in approximately two weeks, with the following end state: (1) We will have merged the converged-on design decoupling ListingTable from streaming sources, (2) We will proceed with the upgrade to arrow-rs 49.0.0.

In the meantime, we may make a serde-fix-backported 48.1.0 release of arrow-rs so DataFusion can benefit from this fix.

I think this makes a lot of sense -- thank you @ozankabak . I will prepare a 48.0.1 patch release candidate -- see apache/arrow-rs#5050 (comment)

@alamb alamb marked this pull request as draft November 8, 2023 20:20
@alamb
Contributor

alamb commented Nov 8, 2023

Marking as draft so we don't accidentally merge this as we work on the updated design

@alamb
Contributor

alamb commented Nov 8, 2023

We have discussed this today in detail. We will come back with a design/proposal early next week. Let's iterate together to find the best design. We will also help with the implementation when we converge.

@ozankabak -- @tustvold said he may have some time to help (and would be interested in assisting the design and implementation to help accelerate this process).

Is it possible to describe the use case you have in more detail?

Given that this particular PR replicates the functionality that already exists in DataFusion as far as we can tell (using the extension APIs), we think we must not fully understand the use case for which you are designing a solution.

@ozankabak
Contributor

I will talk to @metesynnada tomorrow and check with him to see where he is at in terms of the proposal. If he is close to getting something he can share soon, it'd be better to just talk about the proposal and collaborate on it directly. If he is not close to having something publishable, we will explain the use cases we are trying to support beforehand. I will circle back tomorrow.

@tustvold said he may have some time to help (and would be interested in assisting the design and implementation to help accelerate this process)

thank you, we appreciate it

},
});
}
// Cannot find expression in the projected_schema, stop iterating
Contributor Author

This is necessary as the sort key may reference columns not found in the projected schema.

This change matches get_projected_output_ordering
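
To illustrate the rule being described: the reported ordering has to be truncated at the first sort expression whose column does not survive the projection. Below is a simplified, string-based sketch of that check; the actual code works on sort expressions and schemas rather than plain names.

    /// Keep the sort key only as far as every referenced column still exists in
    /// the projected schema; stop at the first one that does not.
    fn project_ordering(projected_columns: &[&str], sort_key: &[&str]) -> Vec<String> {
        let mut kept = Vec::new();
        for col in sort_key {
            if projected_columns.contains(col) {
                kept.push((*col).to_string());
            } else {
                // Cannot find the expression in the projected schema, stop iterating
                break;
            }
        }
        kept
    }

    // e.g. a sort key [ts, id] with projection [value, ts] keeps only [ts]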

fn writer(&self) -> Result<Box<dyn RecordBatchWriter>> {
    match &self.encoding {
        StreamEncoding::Csv => {
            let header = self.header && !self.location.exists();
@tustvold tustvold Nov 14, 2023

I wasn't entirely sure about this; CsvSink currently has the following logic, which I've attempted to replicate here:

let serializer: Box<dyn BatchSerializer> = Box::new(if file_size > 0 {
    CsvSerializer::new()
        .with_builder(inner_clone)
        .with_header(false)
} else {
    CsvSerializer::new()
        .with_builder(inner_clone)
        .with_header(options_clone.writer_options.header())
});

@@ -177,7 +190,7 @@ impl TableProvider for StreamTable {
     }
 
     fn table_type(&self) -> TableType {
-        TableType::Temporary
+        TableType::Base
Contributor Author

This is necessary so that Drop behaves correctly

@tustvold
Contributor Author

tustvold commented Nov 14, 2023

unbounded_file_with_symmetric_join has failed on OS X. I have not looked into whether this is an intermittent failure, but I note that this test has been ignored on main since d6c2233#diff-2f6b05fb3bc4e54422c680dfa47dd8a993cc93fcaa056f678e455cc5d83ae6d0R237

I'm not sure what the history here is; is this a known flaky test that I should keep ignored?

Edit: looks to just be flaky

@ozankabak
Contributor

Thanks @tustvold, we will look into this early morning tomorrow and report our findings

@metesynnada
Contributor

metesynnada commented Nov 15, 2023

The previous locks ensured that the FIFO was read and written in a streaming manner, simulating infinite data streams. This guaranteed that each batch was produced only after a specific amount of data had been written.

I plan to submit a corrective commit soon to address hangs caused by asynchronous processing and configuration issues in the current implementation.

@tustvold
Contributor Author

Thank you, I've also just pushed a commit that configures the readers to respect the configured batch size which may help

@metesynnada
Contributor

Thank you, I've also just pushed a commit that configures the readers to respect the configured batch size which may help

Yes, it is necessary as well. However, I need to figure out some points about the async runtime. Then we can move on, I think.

@tustvold
Contributor Author

However, I need to figure out some points on async runtime

If you're able to articulate the issue I may be able to help out here.

@tustvold
Contributor Author

CI failures are due to logical merge conflicts - #8187

let config = self.0.clone();
let schema = self.0.schema.clone();
let batch_size = ctx.session_config().batch_size();
let mut builder = RecordBatchReceiverStreamBuilder::new(schema, 2);
Contributor

I am attempting to eliminate the tokio channel dependency, which is not necessary in theory. However, dyn RecordBatchReader<Item = Result<arrow_array::RecordBatch, arrow_schema::ArrowError>> does not implement Send and Sync, so creating an unfold directly from it is not possible. Do you have any suggestions on how to achieve this without creating a dedicated stream struct, similar to what we are doing in ExecutionPlan::execute()?

Contributor Author

The tokio channel is necessary because we're performing blocking file IO, no?

Contributor

For synchronous IO, using spawn_blocking is the rule of thumb; you are right. We can keep this.

Contributor Author

Yeah, it is actually a bug that we don't appear to currently be doing this. I've filed #8188, which I'll follow up on once this is merged in.
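
For reference, the general shape of the pattern discussed in this thread (blocking reads running on a spawn_blocking task and handed to the async side through a small channel) might look roughly like this generic sketch; it is not the PR's exact code.

    // Generic sketch: run blocking reads on a blocking thread and feed batches
    // to the async side through a bounded channel.
    use datafusion::arrow::record_batch::RecordBatch;
    use datafusion::error::Result;
    use tokio::sync::mpsc;

    fn spawn_blocking_reader(
        reader: Box<dyn Iterator<Item = Result<RecordBatch>> + Send>,
        tx: mpsc::Sender<Result<RecordBatch>>,
    ) {
        // spawn_blocking keeps synchronous file IO off the async worker threads
        // (must be called from within a tokio runtime).
        tokio::task::spawn_blocking(move || {
            for batch in reader {
                // blocking_send is fine here because we are on a blocking thread.
                if tx.blocking_send(batch).is_err() {
                    break; // receiver dropped; stop reading
                }
            }
        });
    }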

@metesynnada metesynnada left a comment

I think we can merge this once the tests pass.

@tustvold
Contributor Author

Thank you for your help getting this over the line.

@ozankabak any final comments or are you happy for me to proceed?

@ozankabak
Contributor

@tustvold, thank you for helping out with the stopgap. @metesynnada will use this as a foundation as he implements the proposal to completion. Feel free to go ahead and merge this after CI passes 🚀

@alamb alamb added the api change Changes the API exposed to users of the crate label Nov 15, 2023
@alamb alamb left a comment

Thank you everyone -- I know there were some communication bumps getting here, but this looks like a really nice design in my mind and sets us up for good things to come.

Thanks again


/// The data encoding for [`StreamTable`]
#[derive(Debug, Clone)]
pub enum StreamEncoding {
Contributor

👍

@@ -750,7 +750,7 @@ query TT
 explain select c1 from t;
 ----
 logical_plan TableScan: t projection=[c1]
-physical_plan CsvExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/core/tests/data/empty.csv]]}, projection=[c1], infinite_source=true, has_header=true
+physical_plan StreamingTableExec: partition_sizes=1, projection=[c1], infinite_source=true
Contributor

👍


/// The configuration for a [`StreamTable`]
#[derive(Debug)]
pub struct StreamConfig {
Contributor

Should we note here that this is likely to undergo API changes to support additional use cases (like streaming to/from Kafka)?

Contributor Author

I'm not sure this would necessarily change materially; it parallels ListingTableConfig, which is used by ListingTable, but I suppose we shall see.

}
}

/// A [`TableProvider`] for a stream source, such as a FIFO file
Contributor

Can we maybe also add some comments here about how this is similar to / different from ListingTable, to help new people navigate the code and understand when to use one vs the other?

We could totally do this as a follow on PR as well

Contributor Author

I will add some comments in a follow on PR

Contributor

follow up: #8192

@tustvold tustvold merged commit 020b8fc into apache:main Nov 15, 2023
22 checks passed
Labels: api change (Changes the API exposed to users of the crate), core (Core datafusion crate), sqllogictest
Successfully merging this pull request may close these issues.

Decouple Streaming Use-Case from File IO Abstractions
4 participants