Skip to content

[core] Fix partition column generate wrong partition spec#4349

Merged
JingsongLi merged 2 commits intoapache:masterfrom
ulysses-you:part-value
Oct 23, 2024
Merged

[core] Fix partition column generate wrong partition spec#4349
JingsongLi merged 2 commits intoapache:masterfrom
ulysses-you:part-value

Conversation

@ulysses-you
Copy link
Copy Markdown
Contributor

@ulysses-you ulysses-you commented Oct 18, 2024

Purpose

Paimon uses .toString to generate partition value, which is not accurate for some data types. like date/binary. Say, Spark engine would use a Cast to convert a partition object to string value. So this pr changes to use cast to generate partition value.

Add a new config partition.legacy-name to support switch to use previous toString behavior, and by default use the legacy behavior(.toString).

An example that using binary type partition column would cause failure.

CREATE TABLE pt (
    id BIGINT,
    c1 STRING
) using paimon
PARTITIONED BY (day binary);

insert into table pt values(1, 'a', cast('2021' as binary));
select * from pt;
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1) (192.168.0.102 executor driver): java.io.FileNotFoundException: File 'warehouse/default.db/pt/day=%5BB@4a045a11/bucket-0/data-91c064a3-a0a1-4042-9d5a-cc82a23af7ff-0.parquet' not found, Possible causes: 1.snapshot expires too fast, you can configure 'snapshot.time-retained' option with a larger value. 2.consumption is too slow, you can improve the performance of consumption (For example, increasing parallelism).

Tests

add test

API and Format

no

Documentation

added docs

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move cast related files from core to common due to the InternalRowPartitionComputer is at common.

Copy link
Copy Markdown
Contributor

@JingsongLi JingsongLi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ulysses-you for the contribution.

But this is really dangerous to compatibility. It may be better to keep old style by default.

@ulysses-you
Copy link
Copy Markdown
Contributor Author

But this is really dangerous to compatibility. It may be better to keep old style by default.

@JingsongLi +1

Copy link
Copy Markdown
Contributor

@JingsongLi JingsongLi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@JingsongLi JingsongLi merged commit a9f2d2c into apache:master Oct 23, 2024
@ulysses-you ulysses-you deleted the part-value branch October 24, 2024 05:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants