Closed
Labels
area:writer (Write client and core write operations), priority:medium (Moderate impact; usability gaps)
Description
Hi! While reading the Hudi code that finds small files during a Spark upsert, I ran into a question that has been bothering me a lot.

If I use a Hudi MOR table written by Spark, updating even a single record creates a log file in that file slice to record the update. If I then insert a new record, Hudi needs to find a small file to write it to, but a file slice that has log files cannot be treated as a small file, so Hudi may create a new file group to hold the record. In other words, this can produce a lot of small files spread across a lot of file groups.
I have tested the scenario described above, and it does cause a "small files" problem.
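
To make the scenario concrete, here is a minimal Python sketch of the behaviour as I understand it. It is a simplified model, not Hudi's actual code: the `FileSlice` class, the helper functions, the 100 MB threshold, and the 1 KB record size are all assumptions made up for illustration. The only rule taken from the description above is that a file slice with log files is never picked as a small file.

```python
from dataclasses import dataclass

# Assumed threshold for illustration only; the real limit is configurable in Hudi.
SMALL_FILE_LIMIT_BYTES = 100 * 1024 * 1024


@dataclass
class FileSlice:
    """Hypothetical stand-in for one file slice (one per file group here)."""
    file_id: str
    base_file_bytes: int
    log_file_count: int = 0


def small_file_candidates(slices, limit=SMALL_FILE_LIMIT_BYTES):
    # Rule from the description: slices that already have log files are skipped.
    return [s for s in slices
            if s.log_file_count == 0 and s.base_file_bytes < limit]


def apply_update(file_slice):
    # An update on a MOR table appends a log file to the slice.
    file_slice.log_file_count += 1


def apply_insert(slices, record_bytes):
    # Reuse the smallest eligible slice; otherwise open a new file group.
    candidates = small_file_candidates(slices)
    if candidates:
        target = min(candidates, key=lambda s: s.base_file_bytes)
        target.base_file_bytes += record_bytes
        return target
    new_slice = FileSlice(f"fg-{len(slices)}", record_bytes)
    slices.append(new_slice)
    return new_slice


def simulate(rounds, record_bytes=1024):
    """Alternate one update and one insert; return the number of file groups."""
    slices = [FileSlice("fg-0", record_bytes)]
    for _ in range(rounds):
        apply_update(slices[-1])           # update a record in the newest slice
        apply_insert(slices, record_bytes)  # then insert one new record
    return len(slices)


if __name__ == "__main__":
    print(simulate(5))
```

In this model every update-then-insert round adds one more file group, each holding a single tiny base file, which is the "small files" pattern I observed.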

I am looking forward to someone helping me solve this problem!