COPY TO "/dev/stdout" (or pipe) fails #2296

Closed
dforsber opened this issue Sep 17, 2021 · 9 comments · Fixed by #2299
@dforsber commented Sep 17, 2021

Writing to stdout/pipe?

DuckDB is not able to open /dev/stdout or a named pipe (mkfifo) for writing when the output is redirected.

Specifying "/dev/stdout" as the output file works, and e.g. CSV data is printed on the terminal, but whenever the output is redirected to a file or piped, DuckDB fails to open the output "file". The most probable reason is that DuckDB tries to open the output with more permissions than a plain write ("w") requires.

One use case, for example, is piping Parquet through DuckDB without having to write the data to disk.

Maybe the pipe filesystem could be extended to support output pipes/stdout as well?

On OSX (v0.2.9 1776611ab):

duckdb :memory: "COPY (SELECT * FROM 'input.parquet') TO '/dev/stdout' WITH (FORMAT 'Parquet');" | \
    aws s3 cp - s3://bucket/out.parquet
Error: IO Error: Cannot open file "/dev/stdout": Permission denied
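
For illustration, here is a minimal C sketch (not DuckDB code) of the behavior this report points at: when stdout is the write end of a pipe, opening /dev/stdout with read-write access fails, while a write-only open succeeds. The "Permission denied" error is as reported above on macOS; exact behavior may differ by platform.

/* pipe_open.c -- run as: ./pipe_open | cat */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    /* Read-write access fails when fd 1 is a pipe write end,
     * because that end of the pipe cannot also be opened for reading. */
    int rw = open("/dev/stdout", O_RDWR);
    if (rw < 0)
        fprintf(stderr, "O_RDWR:   %s\n", strerror(errno)); /* Permission denied */
    else
        close(rw);

    /* Write-only access succeeds, and the data flows through the pipe. */
    int wo = open("/dev/stdout", O_WRONLY);
    if (wo < 0) {
        fprintf(stderr, "O_WRONLY: %s\n", strerror(errno));
        return 1;
    }
    dprintf(wo, "written via the write-only descriptor\n");
    close(wo);
    return 0;
}
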
@hannes (Member) commented Sep 17, 2021

Why did you want to write a Parquet file to stdout again?

@dforsber (Author)

> Why did you want to write a Parquet file to stdout again?

I updated the example in the description to show how, for example, Parquet can be written directly to S3 by piping. There is no need to write it to disk first and then copy it from disk to S3. An alternative would be to add direct S3 write support.

@dforsber (Author)

But Parquet seems to be a good candidate for output streaming, as the metadata comes at the end (there is no need to construct the full file beforehand). Being able to stream-write the output improves the overall throughput time when you read from S3 and write back to S3.
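
To make the "metadata comes at the end" point concrete, here is a small C sketch that reads the fixed-size tail of a Parquet file (reusing the input.parquet name from the example above). A Parquet file ends with a 4-byte little-endian footer length followed by the 4-byte magic "PAR1", so a writer can emit row groups first and append the footer last, which is what makes forward-only streaming possible.

/* parquet_tail.c -- inspect the last 8 bytes of a Parquet file */
#include <stdint.h>
#include <stdio.h>

int main(void) {
    FILE *f = fopen("input.parquet", "rb");
    if (!f) { perror("input.parquet"); return 1; }

    unsigned char tail[8];
    if (fseek(f, -8L, SEEK_END) != 0 || fread(tail, 1, 8, f) != 8) {
        fprintf(stderr, "file too short to be Parquet\n");
        fclose(f);
        return 1;
    }
    fclose(f);

    /* 4-byte little-endian footer length, then the magic "PAR1". */
    uint32_t footer_len = (uint32_t)tail[0] | (uint32_t)tail[1] << 8
                        | (uint32_t)tail[2] << 16 | (uint32_t)tail[3] << 24;
    printf("footer: %u bytes, magic: %.4s\n", footer_len, (char *)tail + 4);
    return 0;
}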

@Mytherin Mytherin self-assigned this Sep 20, 2021
Mytherin added a commit to Mytherin/duckdb that referenced this issue Sep 20, 2021
…_WRONLY permissions so we can correctly write to fifo streams
Mytherin added a commit that referenced this issue Sep 20, 2021
Fix #2296: Avoid requesting O_RDWR permissions when we only need O_WRONLY so we can write to FIFO streams
@dforsber (Author)

I think there is a regression, as this bug has re-appeared?

@Mytherin (Collaborator)

Could you be more specific? This seems to work fine for me on both Linux and macOS:

duckdb -c "copy (select * from range(10000) tbl(i)) to '/dev/stdout' (format parquet)" > test.parquet
duckdb -c "select count(*) from 'test.parquet'"
┌──────────────┐
│ count_star() │
├──────────────┤
│ 10000        │
└──────────────┘

@dforsber (Author)

Hmm, my apologies, I mixed that up with the referenced issue. The error came from parquet-tools, not DuckDB. 👍🏻

@dforsber (Author)

Actually, writing to a pipe is different from redirecting output to a file:

% duckdb :memory: "copy (select * from range(10000) tbl(i)) to '/dev/stdout' (format parquet)" | aws s3 cp - s3://mybucket/test.parquet
Error: Not implemented Error: PipeFileSystem: FileSync is not implemented!

So, I think DuckDB fails to write to a pipe because it tries to do a FileSync on it.
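
That diagnosis is easy to confirm at the syscall level with a quick C sketch (independent of DuckDB): fsync() is simply not supported on pipe descriptors, at least on Linux, which is why a pipe-backed filesystem would need to skip the sync rather than forward it.

/* fsync_pipe.c -- fsync() on a pipe fails with EINVAL on Linux */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    int fds[2];
    if (pipe(fds) < 0) { perror("pipe"); return 1; }

    /* Pipes have no backing storage to flush, so the kernel rejects this. */
    if (fsync(fds[1]) < 0)
        printf("fsync on pipe write end: %s\n", strerror(errno)); /* Invalid argument */

    close(fds[0]);
    close(fds[1]);
    return 0;
}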

@mskyttner

Maybe this is a regression? Using the duckdb CLI, v0.5.1 7c11132 and now v0.6.1 919cad2, I see this when trying to COPY some CSV to /dev/stdout:

duckdb -c "copy (select * from range(10000) tbl(i)) to '/dev/stdout' (format csv)" > test.csv
Error: IO Error: Cannot open file "/dev/stdout.tmp": Permission denied

It seems to have worked at some point.

Should this issue be reopened, or am I doing it wrong?

@Mytherin (Collaborator)

That looks like a regression indeed, thanks for reporting!

You could use the use_tmp_file false option as a workaround:

duckdb -c "copy (select * from range(10000) tbl(i)) to '/dev/stdout' (format csv, use_tmp_file false)" > test.csv
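
For context on what use_tmp_file toggles, here is a rough C sketch (not DuckDB source; copy_to is a hypothetical stand-in) of the write-to-temporary-then-rename pattern suggested by the "/dev/stdout.tmp" error above: write "<target>.tmp", then rename it over the target, presumably so readers never observe a half-written file. That is atomic for regular files but breaks for a device path, since the very first step tries to create "/dev/stdout.tmp"; disabling the temp file makes COPY write to the target directly.

/* tmp_copy.c -- the write-to-temp-then-rename pattern, roughly */
#include <stdio.h>

static int copy_to(const char *target, const char *data) {
    char tmp[4096];
    snprintf(tmp, sizeof tmp, "%s.tmp", target);

    FILE *f = fopen(tmp, "w");   /* fails for "/dev/stdout.tmp" */
    if (!f) { perror(tmp); return -1; }
    fputs(data, f);
    fclose(f);

    /* Atomic replace for regular files; meaningless for /dev/stdout. */
    return rename(tmp, target);
}

int main(void) {
    return copy_to("/dev/stdout", "hello\n") == 0 ? 0 : 1;
}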
