
[core] fix inaccurate record count in snapshot for data-evolution tables #7779

Closed
steFaiz wants to merge 2 commits into apache:master from steFaiz:fix_snapshot_row_number

Conversation

@steFaiz
Contributor

@steFaiz steFaiz commented May 7, 2026

Purpose

For data-evolution tables, the record count in the snapshot should be computed with different logic.

Tests

See org.apache.paimon.table.DataEvolutionTableTest

@steFaiz steFaiz marked this pull request as draft May 7, 2026 09:37
@steFaiz steFaiz marked this pull request as ready for review May 7, 2026 11:36
@JingsongLi
Contributor

The recordCount in the snapshot should be calculated strictly from the files, without merging. The record count for primary key tables is not merged either.

@steFaiz
Contributor Author

steFaiz commented May 8, 2026

The recordCount in the snapshot should be calculated strictly from the files, without merging. The record count for primary key tables is not merged either.

Thanks! I got that!

@steFaiz steFaiz closed this May 8, 2026
JingsongLi pushed a commit that referenced this pull request May 14, 2026
Mirrors the Spark implementation.
This PR originates from #7779. In our internal use case, it is hard to
quickly get the real record count for data-evolution tables, especially
per partition. Currently both the snapshots system table and the
partitions system table only show the unmerged record count. The
accurate values can be obtained through count(*) aggregate pushdown
(probably with a GROUP BY clause) in Flink OLAP queries.
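To make the difference between the unmerged and the real record count concrete, here is a minimal sketch. It assumes that in a data-evolution table several files can cover the same row-id range (for example, one file per column group), so summing per-file row counts counts those rows more than once. `FileMeta`, `unmergedCount`, and `mergedCount` are hypothetical names for illustration only; they are not Paimon classes or APIs, and this is not the logic of the referenced commit.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class RecordCountSketch {

    /** Hypothetical stand-in for a data file's metadata; not a Paimon class. */
    static final class FileMeta {
        final long firstRowId; // assumed: id of the first row stored in the file
        final long rowCount;   // number of rows stored in the file

        FileMeta(long firstRowId, long rowCount) {
            this.firstRowId = firstRowId;
            this.rowCount = rowCount;
        }
    }

    /** Unmerged count: plain sum of per-file row counts. */
    static long unmergedCount(List<FileMeta> files) {
        long total = 0;
        for (FileMeta f : files) {
            total += f.rowCount;
        }
        return total;
    }

    /** Merged count: size of the union of row-id ranges [firstRowId, firstRowId + rowCount). */
    static long mergedCount(List<FileMeta> files) {
        List<FileMeta> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparingLong((FileMeta f) -> f.firstRowId));

        long total = 0;
        long coveredEnd = Long.MIN_VALUE; // exclusive end of the range counted so far
        for (FileMeta f : sorted) {
            long start = Math.max(f.firstRowId, coveredEnd);
            long end = f.firstRowId + f.rowCount;
            if (end > start) {
                total += end - start; // count only the part not yet covered
                coveredEnd = end;
            }
        }
        return total;
    }
}
```

With two files that both cover rows 0 to 99 and a third that covers rows 100 to 149, the unmerged count is 250 while the merged count is 150.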

2 participants