Conditionally allow to keep partition_by columns when using PARTITIONED BY #10971
Comments
Seems like a good idea to me
I spent some time looking into how to implement this. I have a very basic change that I think would provide the functionality: main...hveiga:datafusion:issue-10971. I am looking for guidance on two aspects:
I see that … Thanks!
sqllogictests are a good place to start. You can do a
You could check for the new option in
Is your feature request related to a problem or challenge?
I am using DataFusion to partition some data stored in Parquet files into a different set of Parquet files. I would like the newly created files to contain the columns I am partitioning by; currently, however, those columns are removed because they become part of the directory structure. Something like:
COPY (SELECT col1, col2, col3, col4 FROM my_external_table) TO '/output' STORED AS PARQUET PARTITIONED BY (col1) OPTIONS (compression 'zstd(2)');
Each resulting some_random_file_name.parquet file will not contain the column col1. I would like to keep it, as it might be needed later for other use cases.
Describe the solution you'd like
I would like to have a flag in the OPTIONS of the COPY statement that conditionally allows the column to remain in the partitioned files, for example keep_partition_by_columns, set to false by default.
Describe alternatives you've considered
None. I don't think there is an alternative at the moment.
Additional context
Related discussion: #10962
I am also unsure whether this has implications when reading a hive-partitioned directory structure, since a given column would then appear both in the Parquet files and in the directory structure. It's worth pointing out in case this causes a collision.
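To make the concern concrete, here is a minimal Python sketch that models hive-style partitioning with plain dictionaries instead of Parquet files. The functions write_partitioned and read_partitioned and the keep_partition_col parameter are hypothetical stand-ins for the proposed keep_partition_by_columns option, not DataFusion APIs. The directory naming (col1=a) assumes the usual hive key=value convention. The sketch shows that once the column is kept, the same name appears both in the path and in the file data, and a reader has to reconcile the two.

```python
from typing import Dict, List, Tuple

# A row is modelled as a plain dict from column name to value.
Row = Dict[str, object]


def write_partitioned(
    rows: List[Row], partition_col: str, keep_partition_col: bool = False
) -> Dict[str, List[Row]]:
    """Group rows into hive-style directories named '<col>=<value>'.

    keep_partition_col is a hypothetical stand-in for the proposed
    keep_partition_by_columns option. When it is False, the partition
    column is dropped from the stored rows, mirroring the current
    COPY ... PARTITIONED BY behavior described in the issue.
    """
    files: Dict[str, List[Row]] = {}
    for row in rows:
        directory = f"{partition_col}={row[partition_col]}"
        if keep_partition_col:
            stored = dict(row)
        else:
            stored = {k: v for k, v in row.items() if k != partition_col}
        files.setdefault(directory, []).append(stored)
    return files


def read_partitioned(files: Dict[str, List[Row]]) -> Tuple[List[Row], List[str]]:
    """Rebuild rows from each directory's key=value plus the stored data.

    Also returns the column names that appear both in the directory path
    and in the stored rows, i.e. the potential collision.
    """
    rows: List[Row] = []
    collisions: List[str] = []
    for directory, stored_rows in files.items():
        key, _, value = directory.partition("=")
        for stored in stored_rows:
            if key in stored and key not in collisions:
                collisions.append(key)
            # The path value overrides the stored one here; this precedence
            # is an arbitrary choice made only for this sketch.
            rows.append({**stored, key: value})
    return rows, collisions


data = [
    {"col1": "a", "col2": 1},
    {"col1": "b", "col2": 2},
    {"col1": "a", "col2": 3},
]
dropped = write_partitioned(data, "col1")
kept = write_partitioned(data, "col1", keep_partition_col=True)
print(dropped["col1=a"])  # [{'col2': 1}, {'col2': 3}]
print(read_partitioned(dropped)[1])  # []
print(read_partitioned(kept)[1])  # ['col1']
```

Values recovered from the path are always strings, while the copy stored in the file keeps its original type. A real reader would therefore also need to decide which type and which value win when the two disagree.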