Support copy command to load csv/parquet or s3 file into table #2627

Closed
3 tasks done
Tracked by #2525
sundy-li opened this issue Nov 3, 2021 · 7 comments
Labels
A-query Area: databend query

Comments

@sundy-li
Member

sundy-li commented Nov 3, 2021

Summary

Description for this feature.

Support a COPY command to load a CSV file into a table.

COPY table_name FROM '/path/to/test.csv'

task list:

Reference:
https://www.postgresql.org/docs/13/sql-copy.html
https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html
https://duckdb.org/docs/sql/statements/copy

@ZhiHanZ
Collaborator

ZhiHanZ commented Nov 3, 2021

I think it should be a query semantic in the HTTP handler?

@sundy-li
Member Author

sundy-li commented Nov 3, 2021

I think it should be a query semantic in the HTTP handler?

But we need to upload a local file. How could the HTTP handler handle this?

@BohuTANG
Member

BohuTANG commented Nov 3, 2021

I think it should be a query semantic in the HTTP handler?

But we need to upload a local file. How could the HTTP handler handle this?

  1. The HTTP handler provides a put file command (for a file like xx_yy.csv) that writes to a path, tmp_dir, configured in the query service -- this uploads the file to the internal stage
  2. The query service supports the COPY command: COPY INTO table_name FROM 'xx_yy.csv' -- the query service reads the file from the internal stage

https://docs.snowflake.com/en/sql-reference/sql/put.html
https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html#loading-files-from-an-internal-stage
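
The two steps above can be sketched roughly as follows. This is only an illustration in Python, not Databend code: `put_file`, `copy_into`, the in-memory `table` list, and the use of a plain directory as the internal stage are all assumptions made for the example.

```python
import csv
import tempfile
from pathlib import Path


def put_file(stage_dir: Path, name: str, data: bytes) -> Path:
    """Step 1 (hypothetical): store an uploaded file in the stage directory."""
    # Keep only the final path component so an upload cannot escape the stage.
    target = stage_dir / Path(name).name
    target.write_bytes(data)
    return target


def copy_into(table: list, stage_dir: Path, name: str) -> int:
    """Step 2 (hypothetical): read a staged CSV file and append its rows."""
    with open(stage_dir / Path(name).name, newline="") as f:
        rows = list(csv.reader(f))
    table.extend(rows)
    return len(rows)


if __name__ == "__main__":
    # A temporary directory stands in for the configured tmp_dir.
    with tempfile.TemporaryDirectory() as tmp:
        stage = Path(tmp)
        put_file(stage, "xx_yy.csv", b"1,a\n2,b\n")
        table = []
        loaded = copy_into(table, stage, "xx_yy.csv")
        print(loaded, table)
```

The point of the split is that the client only ever talks to the handler for the upload, while COPY resolves the file name relative to the stage, so the statement never needs a client-side path.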

@BohuTANG BohuTANG added the A-query Area: databend query label Nov 3, 2021
@BohuTANG BohuTANG changed the title [bendctl] support copy command to load csv file into table Support copy command to load csv/parquet file into table Nov 4, 2021
@BohuTANG BohuTANG changed the title Support copy command to load csv/parquet file into table Support copy command to load csv/parquet or s3 file into table Nov 4, 2021
@BohuTANG BohuTANG mentioned this issue Nov 4, 2021
2 tasks
@sundy-li sundy-li self-assigned this Nov 4, 2021
@flaneur2020
Member

I think it should be a query semantic in the HTTP handler?

But we need to upload a local file. How could the HTTP handler handle this?

  1. The HTTP handler provides a put file command (for a file like xx_yy.csv) that writes to a path, tmp_dir, configured in the query service -- this uploads the file to the internal stage
  2. The query service supports the COPY command: COPY INTO table_name FROM 'xx_yy.csv' -- the query service reads the file from the internal stage

https://docs.snowflake.com/en/sql-reference/sql/put.html
https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html#loading-files-from-an-internal-stage

Does the CSV file have to be located on the bend-query instance's local filesystem? 🤔

@flaneur2020
Member

How about copying a file from an S3 or HDFS path? For example:

COPY INTO table_name FROM 's3://xx_yy.csv?s3_token=xxxx'
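
As a rough illustration of how such a location string could be taken apart, here is a small Python sketch. `parse_location` and the shape of its return value are hypothetical; only the URL form with the `s3_token` query parameter comes from the example above.

```python
from urllib.parse import parse_qs, urlsplit


def parse_location(location: str) -> dict:
    """Split a COPY source into scheme, path, and options (hypothetical helper)."""
    parts = urlsplit(location)
    # parse_qs returns a list per key; keep the first value of each option.
    options = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {
        # Assumption: a bare path without a scheme means the local filesystem.
        "scheme": parts.scheme or "file",
        "path": parts.netloc + parts.path,
        "options": options,
    }


if __name__ == "__main__":
    print(parse_location("s3://xx_yy.csv?s3_token=xxxx"))
    print(parse_location("/path/to/test.csv"))
```

Separating the options from the path early keeps credentials such as the token out of whatever later code treats the path as an object name.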

@sundy-li
Member Author

sundy-li commented Nov 4, 2021

How about copying a file from an S3 or HDFS path? For example:

COPY INTO table_name FROM 's3://xx_yy.csv?s3_token=xxxx'

Yes, we can dispatch on the protocol, such as "s3", "http", "hdfs", or "file". I'll implement a base version that supports the local filesystem first.
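
One way to picture this protocol check is a small reader registry keyed by scheme, where only the local filesystem is filled in at first. The Python below is a sketch under that assumption; `detect_protocol`, `READERS`, `read_local`, and `load` are hypothetical names, not part of Databend.

```python
from urllib.parse import urlsplit

# The protocols named in the comment above.
SUPPORTED = ("file", "s3", "http", "hdfs")


def detect_protocol(location: str) -> str:
    """Return the protocol of a COPY source; bare paths count as "file"."""
    scheme = urlsplit(location).scheme or "file"
    if scheme not in SUPPORTED:
        raise ValueError(f"unsupported protocol: {scheme}")
    return scheme


def read_local(location: str) -> bytes:
    """Read a local file given either a bare path or a file:// URL."""
    path = urlsplit(location).path if location.startswith("file://") else location
    with open(path, "rb") as f:
        return f.read()


# Base version: only the local filesystem has a reader registered.
READERS = {"file": read_local}


def load(location: str) -> bytes:
    """Pick a reader by protocol and use it to fetch the source bytes."""
    protocol = detect_protocol(location)
    reader = READERS.get(protocol)
    if reader is None:
        raise NotImplementedError(f"no reader for protocol: {protocol}")
    return reader(location)
```

With this layout, adding S3 or HDFS support later means registering another entry in `READERS` rather than changing `load`.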

@BohuTANG
Member

BohuTANG commented Dec 1, 2021

Are there any more tasks for this issue?

@sundy-li sundy-li closed this as completed Dec 1, 2021