
[SUPPORT] Hudi MOR table upserts on Spark may cause "small files" problems #5435

@Doorwood

Description

Hi! While reading the Hudi code that selects small files during a Spark upsert, I ran into a question that bothers me a lot.
Suppose I write a Hudi MOR table with Spark. As soon as I update even one record, Hudi creates a log file in that record's file slice to hold the update. If I then insert a new record, Hudi looks for a small file to write it to, but a file slice that has log files is not treated as a small file, so Hudi may create a new file group to hold the record. What I want to understand is whether this can produce many small files spread across many file groups.
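To make the routing behaviour concrete, here is a minimal Python sketch of the rule described above, built on a deliberately simplified, hypothetical model: `FileSlice`, `small_file_candidates`, `route_insert`, and `update` are illustrative names, not Hudi classes or APIs, and `SMALL_FILE_LIMIT` merely stands in for Hudi's `hoodie.parquet.small.file.limit` setting. It assumes, as the issue describes, that any file slice with log files is excluded from small-file selection; Hudi's actual partitioner may handle more cases.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified model of a MOR file slice (not Hudi's real classes).
@dataclass
class FileSlice:
    file_id: str
    base_file_size: int                      # bytes in the base (parquet) file
    log_files: list = field(default_factory=list)

# Assumed 100 MB threshold, standing in for hoodie.parquet.small.file.limit.
SMALL_FILE_LIMIT = 100 * 1024 * 1024

def small_file_candidates(slices, limit=SMALL_FILE_LIMIT):
    """Slices eligible for new inserts: small base file AND no log files."""
    return [s for s in slices if not s.log_files and s.base_file_size < limit]

def route_insert(slices, next_id):
    """Route one insert to the first candidate, else open a new file group.

    Returns (target file_id, next unused file-group number).
    """
    candidates = small_file_candidates(slices)
    if candidates:
        return candidates[0].file_id, next_id
    new_slice = FileSlice(file_id=f"fg-{next_id}", base_file_size=1024)
    slices.append(new_slice)
    return new_slice.file_id, next_id + 1

def update(slice_):
    """An update on a MOR table appends a log file instead of rewriting the base file."""
    slice_.log_files.append(f"{slice_.file_id}.log.{len(slice_.log_files) + 1}")

# Alternate "update one record, then insert one record", as in the issue.
slices = [FileSlice("fg-0", base_file_size=1024)]
next_id = 1
for _ in range(5):
    update(slices[-1])                       # update a record in the newest file group
    _, next_id = route_insert(slices, next_id)

print(len(slices))                           # 6
print([s.file_id for s in slices])           # ['fg-0', 'fg-1', 'fg-2', 'fg-3', 'fg-4', 'fg-5']
```

In this model every round leaves behind one more 1 KB file group, because the update turns the only small file into a slice with a log file just before the insert arrives. The sketch deliberately omits compaction, which rewrites a slice's base and log files into a new base file.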
I tested the scenario I described, and it does cause a "small files" problem.
I would appreciate any help solving this problem!

Labels: area:writer (Write client and core write operations), priority:medium (Moderate impact; usability gaps)
