-
Notifications
You must be signed in to change notification settings - Fork 3.8k
Description
Motivation
Recently we faced with timeouts from MySQL metadata storage. After investigation I found out that DRUID queries are not optimized for large amount of segments. We have 12M segments in druid_segments table and 5M of them are used. Most queries from Overlord/Coordinator use index by used or by datasource and scans hundreds of thousands or even millions rows every time. Considering size of the table (80G+ in our case) it leads to i/o overload.
Proposed changes
I found 3 sources of the load:
- In DRUID 28+ we have the new used_status_last_updated column and some background migration process in Coordinator which updates the initial NULL values in this column.
FROM druid_segments
WHERE used_status_last_updated IS NULL
AND used = 0
LIMIT 100;
it uses index by used, so it tried to scan 7M rows, got timeout every time and started over.
I replaced index by used with composite index
CREATE INDEX idx_used_used_status ON druid_segments (used, used_status_last_updated);
ALTER TABLE druid_segments DROP INDEX idx_used;
and the load decreased significantly.
- Overlords issue queried like this
SELECT payload
FROM druid_segments
WHERE used = 1
AND dataSource = 'DATASOURCE'
AND (
(start < '2025-10-29T14:00:00.000Z' AND `end` > '2025-10-29T13:00:00.000Z')
OR (start = '-146136543-09-08T08:23:32.096Z' AND `end` != '146140482-04-24T15:36:27.903Z' AND `end` > '2025-10-29T13:00:00.000Z')
OR (start != '-146136543-09-08T08:23:32.096Z' AND `end` = '146140482-04-24T15:36:27.903Z' AND start < '2025-10-29T14:00:00.000Z')
OR (start = '-146136543-09-08T08:23:32.096Z' AND `end` = '146140482-04-24T15:36:27.903Z')
);
according to explain they use used or dataSource indexes so they scans large amount of rows just to find segments inside one hour. I added 2 indexes
CREATE INDEX idx_seg_ds_used_start ON druid_segments (dataSource, used, start);
CREATE INDEX idx_seg_ds_used_end ON druid_segments (dataSource, used, `end`);
and our i/o load disappeared.
- Coordinator refreshes its world by
SELECT payload FROM druid_segments WHERE used=true
but I guess it will be fixed by new incremental segments cache.
Operational impact
Just to mention, you should not add indexes on large tables directly. I used pt-online-schema-change with load limits.