COPY TO "/dev/stdout" (or pipe) fails #2296

Closed
dforsber opened this issue Sep 17, 2021 · 9 comments · Fixed by #2299
@dforsber commented Sep 17, 2021

Writing to stdout/pipe?

DuckDB is not able to open /dev/stdout or a named pipe (mkfifo) for writing when the output is redirected.

Specifying "/dev/stdout" as the output file works, and e.g. CSV data is printed on the terminal, but whenever the output is redirected to a file or piped, DuckDB fails to open the output "file". The most probable reason is that DuckDB tries to open the output with more permissions than a plain write ("w") requires.

One use case, for example, is piping Parquet through DuckDB without having to write the data to disk.

Maybe the pipe filesystem could be extended to support output pipes/stdout as well?

On OSX (v0.2.9 1776611ab):

duckdb :memory: "COPY (SELECT * FROM 'input.parquet') TO '/dev/stdout' WITH (FORMAT 'Parquet');" | \
    aws s3 cp - s3://bucket/out.parquet
Error: IO Error: Cannot open file "/dev/stdout": Permission denied
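
For illustration, here is a minimal C sketch (not DuckDB code) of the behavior this report points at: when stdout is the write end of a pipe, opening /dev/stdout with read-write access fails, while a write-only open succeeds. The "Permission denied" error is as reported above on macOS; exact behavior may differ by platform.

/* pipe_open.c -- run as: ./pipe_open | cat */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    /* Read-write access fails when fd 1 is a pipe write end,
     * because that end of the pipe cannot also be opened for reading. */
    int rw = open("/dev/stdout", O_RDWR);
    if (rw < 0)
        fprintf(stderr, "O_RDWR:   %s\n", strerror(errno)); /* Permission denied */
    else
        close(rw);

    /* Write-only access succeeds, and the data flows through the pipe. */
    int wo = open("/dev/stdout", O_WRONLY);
    if (wo < 0) {
        fprintf(stderr, "O_WRONLY: %s\n", strerror(errno));
        return 1;
    }
    dprintf(wo, "written via the write-only descriptor\n");
    close(wo);
    return 0;
}
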
@hannes (Member) commented Sep 17, 2021

Why did you want to write a Parquet file to stdout again?

@dforsber (Author)

> Why did you want to write a Parquet file to stdout again?

I updated the example in the description to show how, for example, Parquet can be written directly to S3 by piping. There is no need to write it to disk first and then copy it from disk to S3. An alternative would be to add direct S3 write support.

@dforsber (Author)

But Parquet seems to be a good candidate for output streaming, as the metadata comes at the end (there is no need to construct the full file beforehand). Being able to stream-write the output improves the overall throughput time when you read from S3 and write back to S3.
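
To make the "metadata comes at the end" point concrete, here is a small C sketch that reads the fixed-size tail of a Parquet file (reusing the input.parquet name from the example above). A Parquet file ends with a 4-byte little-endian footer length followed by the 4-byte magic "PAR1", so a writer can emit row groups first and append the footer last, which is what makes forward-only streaming possible.

/* parquet_tail.c -- inspect the last 8 bytes of a Parquet file */
#include <stdint.h>
#include <stdio.h>

int main(void) {
    FILE *f = fopen("input.parquet", "rb");
    if (!f) { perror("input.parquet"); return 1; }

    unsigned char tail[8];
    if (fseek(f, -8L, SEEK_END) != 0 || fread(tail, 1, 8, f) != 8) {
        fprintf(stderr, "file too short to be Parquet\n");
        fclose(f);
        return 1;
    }
    fclose(f);

    /* 4-byte little-endian footer length, then the magic "PAR1". */
    uint32_t footer_len = (uint32_t)tail[0] | (uint32_t)tail[1] << 8
                        | (uint32_t)tail[2] << 16 | (uint32_t)tail[3] << 24;
    printf("footer: %u bytes, magic: %.4s\n", footer_len, (char *)tail + 4);
    return 0;
}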

@Mytherin Mytherin self-assigned this Sep 20, 2021
Mytherin added a commit to Mytherin/duckdb that referenced this issue Sep 20, 2021
…_WRONLY permissions so we can correctly write to fifo streams
Mytherin added a commit that referenced this issue Sep 20, 2021
Fix #2296: Avoid requesting O_RDWR permissions when we only need O_WRONLY so we can write to FIFO streams
@dforsber (Author)

I think there is a regression, as this bug has re-appeared?

@Mytherin (Collaborator)

Could you be more specific? This seems to work fine for me on both Linux and macOS:

duckdb -c "copy (select * from range(10000) tbl(i)) to '/dev/stdout' (format parquet)" > test.parquet
duckdb -c "select count(*) from 'test.parquet'"
┌──────────────┐
│ count_star() │
├──────────────┤
│ 10000        │
└──────────────┘

@dforsber (Author)

Hmm, my apologies, I mixed that up with the referenced issue. The error came from parquet-tools, not DuckDB. 👍🏻

@dforsber (Author)

Actually, writing to a pipe is different from redirecting output to a file:

% duckdb :memory: "copy (select * from range(10000) tbl(i)) to '/dev/stdout' (format parquet)" | aws s3 cp - s3://mybucket/test.parquet
Error: Not implemented Error: PipeFileSystem: FileSync is not implemented!

So, I think DuckDB fails to write to a pipe because it tries to do a FileSync on it.
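
That diagnosis is easy to confirm at the syscall level with a quick C sketch (independent of DuckDB): fsync() is simply not supported on pipe descriptors, at least on Linux, which is why a pipe-backed filesystem would need to skip the sync rather than forward it.

/* fsync_pipe.c -- fsync() on a pipe fails with EINVAL on Linux */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    int fds[2];
    if (pipe(fds) < 0) { perror("pipe"); return 1; }

    /* Pipes have no backing storage to flush, so the kernel rejects this. */
    if (fsync(fds[1]) < 0)
        printf("fsync on pipe write end: %s\n", strerror(errno)); /* Invalid argument */

    close(fds[0]);
    close(fds[1]);
    return 0;
}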

@mskyttner

Maybe this is a regression? Using the duckdb CLI, v0.5.1 7c11132 and now v0.6.1 919cad2, I see this when trying to COPY some CSV to /dev/stdout:

duckdb -c "copy (select * from range(10000) tbl(i)) to '/dev/stdout' (format csv)" > test.csv
Error: IO Error: Cannot open file "/dev/stdout.tmp": Permission denied

It seems to have worked at some point.

Should this issue be reopened, or am I doing it wrong?

@Mytherin (Collaborator)

That looks like a regression indeed, thanks for reporting!

You could use the use_tmp_file false option as a workaround:

duckdb -c "copy (select * from range(10000) tbl(i)) to '/dev/stdout' (format csv, use_tmp_file false)" > test.csv
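
For context on what use_tmp_file toggles, here is a rough C sketch (not DuckDB source; copy_to is a hypothetical stand-in) of the write-to-temporary-then-rename pattern suggested by the "/dev/stdout.tmp" error above: write "<target>.tmp", then rename it over the target, presumably so readers never observe a half-written file. That is atomic for regular files but breaks for a device path, since the very first step tries to create "/dev/stdout.tmp"; disabling the temp file makes COPY write to the target directly.

/* tmp_copy.c -- the write-to-temp-then-rename pattern, roughly */
#include <stdio.h>

static int copy_to(const char *target, const char *data) {
    char tmp[4096];
    snprintf(tmp, sizeof tmp, "%s.tmp", target);

    FILE *f = fopen(tmp, "w");   /* fails for "/dev/stdout.tmp" */
    if (!f) { perror(tmp); return -1; }
    fputs(data, f);
    fclose(f);

    /* Atomic replace for regular files; meaningless for /dev/stdout. */
    return rename(tmp, target);
}

int main(void) {
    return copy_to("/dev/stdout", "hello\n") == 0 ? 0 : 1;
}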
