virtual column _row_number, _error, _raw_value for file, s3, hdfs functions. #31921

UnamedRus · 2021-11-28T00:52:23Z

Use case

Describe the solution you'd like

Provide a way to get the row number of each row within the file being ingested via the s3/hdfs/file table functions.

Describe alternatives you've considered

 SELECT row_number() OVER (PARTITION BY _path) FROM s3(...)

But it's too slow.

Additional context

We could probably also have other columns, such as _raw, for rows that fail to parse.
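To make the request concrete, here is a minimal Python sketch of the semantics being asked for. It is not ClickHouse code. It assumes a one-integer-per-line input, 0-based row numbers (the issue does not fix the base), and hypothetical field names `row_number`, `error` and `raw_value` that mirror the proposed virtual columns. A row that fails to parse is still emitted, carrying its position, the error text and the original bytes, instead of aborting the read.

```python
import io
from typing import Iterable, Iterator, NamedTuple, Optional


class Row(NamedTuple):
    row_number: int           # 0-based position of the line within the file (assumed base)
    value: Optional[int]      # parsed value, None when parsing failed
    error: Optional[str]      # parse error message, None on success
    raw_value: Optional[str]  # original text, kept only for rows that failed to parse


def read_with_virtual_columns(lines: Iterable[str]) -> Iterator[Row]:
    """Stream rows one at a time, attaching the hypothetical virtual columns."""
    for row_number, line in enumerate(lines):
        raw = line.rstrip("\n")
        try:
            yield Row(row_number, int(raw), None, None)
        except ValueError as exc:
            # Keep the bad row instead of failing the whole read.
            yield Row(row_number, None, str(exc), raw)


rows = list(read_with_virtual_columns(io.StringIO("10\nabc\n30\n")))
```

Because the counter is carried by the reader itself, nothing has to be sorted or materialized to number the rows, which is the cost the window-function workaround pays.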


tolmalev · 2024-01-02T23:03:31Z

Hi!
Just wanted to bump this feature request.
For parquet files some virtual columns could be quite useful:

_row_group_number - number of the row group in current file for the given row
_row_number_in_row_group - number of row within current row group
_row_number - row number within the file

Right now it's possible to use row_number() over (), but it has some major disadvantages: it limits the thread count to 1, and it requires enough memory to materialize the full query result before any aggregation can be applied on top of such a table.

Such virtual columns would unlock some interesting options for processing Parquet files.
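The three proposed Parquet columns are related by simple arithmetic over the row-group sizes stored in the file footer. The Python sketch below shows that relationship only; it is not how ClickHouse or any Parquet reader implements it. The function name `locate_row` and the 0-based numbering are assumptions made for illustration.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple


def locate_row(row_group_sizes: List[int], row_number: int) -> Tuple[int, int]:
    """Map a 0-based row number within the file to
    (row_group_number, row_number_in_row_group), both 0-based."""
    # ends[i] is the file-level row number one past the last row of group i.
    ends = list(accumulate(row_group_sizes))
    total = ends[-1] if ends else 0
    if not 0 <= row_number < total:
        raise IndexError("row_number is outside the file")
    # First group whose end lies strictly beyond row_number; empty groups are skipped.
    group = bisect_right(ends, row_number)
    start = ends[group - 1] if group > 0 else 0
    return group, row_number - start


# A file with three row groups holding 3, 2 and 4 rows.
sizes = [3, 2, 4]
first_of_second_group = locate_row(sizes, 3)
last_row = locate_row(sizes, 8)
```

Going the other way, `_row_number` is the number of rows in all preceding row groups plus `_row_number_in_row_group`, so a reader that processes row groups in parallel could fill in all three columns without a single-threaded pass.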

UnamedRus added the feature label Nov 28, 2021

UnamedRus changed the title ~~virtual column _row_number for file, s3, hdfs functions.~~ virtual column _row_number, _error for file, s3, hdfs functions. Dec 25, 2021

UnamedRus changed the title ~~virtual column _row_number, _error for file, s3, hdfs functions.~~ virtual column _row_number, _error, _raw_value for file, s3, hdfs functions. Dec 25, 2021

ClickHouse deleted a comment from UnamedRus Feb 19, 2024
