I have searched in the issues and found no similar issues.
What would you like to be improved?
When a native Iceberg table contains too many small files or too many delete files, AMS cannot cope: scanning the table's files consumes so much memory that AMS runs out of memory (OOM) and crashes.
Therefore, we must figure out how to use less memory when optimizing native Iceberg tables.
How should we improve?
I have some simple ideas for everyone to discuss:
Scan Iceberg files in batches: for example, create a scan queue to keep memory consumption under control.
External/separable optimizing planner: separate the planner from the AMS service, so that file scanning cannot make the service unavailable and a large number of table file scanning tasks can be assigned to multiple planners.
Rewrite small files/delete files by submitting a new Spark/Flink rewrite action for unoptimized tables.
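To make the first idea more concrete, here is a minimal sketch of a scan queue in Python. It is not AMS or Iceberg code: `FileEntry`, `scan_in_batches`, `batch_size`, and `max_pending_batches` are hypothetical names chosen for illustration, and the input is assumed to be a lazy iterator over file metadata. A producer thread fills a bounded queue, so the number of entries held in memory at once stays at roughly `(max_pending_batches + 2) * batch_size` no matter how many files the table has.

```python
import queue
import threading
from typing import Iterable, Iterator, List, NamedTuple


class FileEntry(NamedTuple):
    """Hypothetical stand-in for the metadata of one data or delete file."""
    path: str
    size_bytes: int


_DONE = object()  # sentinel that marks the end of the scan


def scan_in_batches(
    entries: Iterable[FileEntry],
    batch_size: int,
    max_pending_batches: int,
) -> Iterator[List[FileEntry]]:
    """Yield file entries in batches through a bounded queue."""
    pending: queue.Queue = queue.Queue(maxsize=max_pending_batches)

    def produce() -> None:
        try:
            batch: List[FileEntry] = []
            for entry in entries:
                batch.append(entry)
                if len(batch) == batch_size:
                    pending.put(batch)  # blocks while the queue is full
                    batch = []
            if batch:
                pending.put(batch)  # final, possibly smaller batch
        finally:
            pending.put(_DONE)  # always signal the consumer to stop

    threading.Thread(target=produce, daemon=True).start()
    while True:
        item = pending.get()
        if item is _DONE:
            return
        yield item


def fake_entries(n: int) -> Iterator[FileEntry]:
    """Lazily generate synthetic entries instead of reading real manifests."""
    for i in range(n):
        yield FileEntry(f"data/file-{i}.parquet", 1024 * (i % 7 + 1))


small_file_count = 0
for batch in scan_in_batches(fake_entries(10_000), batch_size=500, max_pending_batches=4):
    # Only the current batch and the queued batches are alive here.
    small_file_count += sum(1 for e in batch if e.size_bytes < 4096)
```

The blocking `put` is what provides the back-pressure: when the consumer is slow, the scan pauses instead of accumulating entries. A consumer that stops early would leave the producer blocked, so a real implementation would also need a cancellation path.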
Making the memory consumption of Optimizing controllable is of great value, and there are two main issues:
the memory consumption of planning, which means scanning files
the memory consumption of executing, which means reading and writing files
Here, we are discussing the planning issue.
I think the OOM could be avoided if each plan processed only a limited number of files at a time.
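As a rough illustration of capping the number of files per plan, the Python sketch below stops collecting rewrite candidates once a limit is reached and leaves the rest for a later planning round. The names `plan_once`, `PlanResult`, `small_file_bytes`, and `max_files_per_plan` are hypothetical and not taken from AMS, and the sketch assumes the file listing can be consumed lazily.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Tuple

FileInfo = Tuple[str, int]  # (path, size in bytes); hypothetical shape


@dataclass
class PlanResult:
    files_to_rewrite: List[FileInfo] = field(default_factory=list)
    reached_limit: bool = False  # True if more candidates were left unplanned


def plan_once(
    files: Iterable[FileInfo],
    small_file_bytes: int,
    max_files_per_plan: int,
) -> PlanResult:
    """Collect at most max_files_per_plan small files, then stop scanning."""
    result = PlanResult()
    for path, size in files:
        if size >= small_file_bytes:
            continue  # large enough already, not a rewrite candidate
        if len(result.files_to_rewrite) == max_files_per_plan:
            result.reached_limit = True  # leave the rest for the next plan
            break
        result.files_to_rewrite.append((path, size))
    return result


def listing() -> Iterator[FileInfo]:
    """Synthetic lazy listing; a real planner would read table metadata."""
    for i in range(1_000_000):
        yield (f"data/file-{i}.parquet", 1_000 if i % 2 == 0 else 200_000_000)


plan = plan_once(listing(), small_file_bytes=16_000_000, max_files_per_plan=1_000)
print(len(plan.files_to_rewrite), plan.reached_limit)
```

Because the loop breaks as soon as the cap is hit, memory use is bounded by `max_files_per_plan` rather than by the size of the table. The trade-off is that a heavily fragmented table needs several planning rounds before it is fully optimized.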
Scan iceberg files in batches: For example, create a scan queue to make memory consumption controllable.
This seems to be a reasonable solution. Would the file scan still run inside AMS? Could you give more details about what the scan queue would look like?