virtual column _row_number, _error, _raw_value for file, s3, hdfs functions. #31921

Open
UnamedRus opened this issue Nov 28, 2021 · 1 comment

@UnamedRus
Contributor

Use case

Describe the solution you'd like

Provide a way to get the row number of each row within the file being ingested via the s3, hdfs, or file table functions.

Describe alternatives you've considered

 SELECT row_number() OVER (PARTITION BY _path) FROM s3(...)

But it's too slow.

Additional context

We could probably also add other columns, such as _raw, to expose the raw value when a parsing error occurs.

UnamedRus changed the title from "virtual column _row_number for file, s3, hdfs functions." to "virtual column _row_number, _error for file, s3, hdfs functions." on Dec 25, 2021
UnamedRus changed the title from "virtual column _row_number, _error for file, s3, hdfs functions." to "virtual column _row_number, _error, _raw_value for file, s3, hdfs functions." on Dec 25, 2021
@tolmalev

tolmalev commented Jan 2, 2024

Hi!
Just wanted to bump this feature request.
For Parquet files, a few virtual columns would be quite useful:

  • _row_group_number - number of the row group, within the current file, that contains the given row
  • _row_number_in_row_group - number of the row within its row group
  • _row_number - number of the row within the file
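
To make the relationship between these three proposed columns concrete, here is a minimal Python sketch that derives them from nothing but the row-group sizes of a single file. It is only an illustration of the intended semantics, not ClickHouse code: `RowPosition` and `row_positions` are hypothetical names, and the 0-based numbering is an assumption, since the request does not say where counting should start.

```python
from typing import Iterator, NamedTuple, Sequence


class RowPosition(NamedTuple):
    """Hypothetical container for the three proposed virtual columns."""
    row_group_number: int          # which row group the row belongs to
    row_number_in_row_group: int   # position of the row inside that row group
    row_number: int                # position of the row inside the whole file


def row_positions(row_group_sizes: Sequence[int]) -> Iterator[RowPosition]:
    """Yield the position of every row in a file, given its row-group sizes.

    Assumption: all three counters start at 0.
    """
    row_number = 0
    for group_number, size in enumerate(row_group_sizes):
        for offset in range(size):
            yield RowPosition(group_number, offset, row_number)
            row_number += 1


# Example: a file with two row groups holding 3 and 2 rows.
positions = list(row_positions([3, 2]))
for position in positions:
    print(position)
```

Because each value depends only on the row-group sizes that precede the row, the columns can in principle be filled in while the file is read, without a global pass over the query result.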

Right now it's possible to use row_number() over (), but that has some serious drawbacks: it limits the thread count to 1, and it needs enough memory to materialize the full query result before any aggregation can be applied on top of it.

Such virtual columns would unblock some interesting options for processing Parquet files.

ClickHouse deleted a comment from UnamedRus on Feb 19, 2024