MySQL metadata DB optimization

### Motivation

Recently we faced with timeouts from MySQL metadata storage. After investigation I found out that DRUID queries are not optimized for large amount of segments. We have 12M segments in druid_segments table and 5M of them are used.  Most queries from Overlord/Coordinator use index by `used` or by `datasource` and scans hundreds of thousands or even millions rows every time. Considering size of the table (80G+ in our case) it leads to i/o overload.

### Proposed changes

I found 3 sources of the load:

1. In DRUID 28+ we have the new used_status_last_updated column and some background migration process in Coordinator which updates the initial NULL values in this column.

```
FROM druid_segments
WHERE used_status_last_updated IS NULL
  AND used = 0
LIMIT 100;
```

it uses index by `used`, so it tried to scan 7M rows, got timeout every time and started over.
I replaced index by `used` with composite index

```
CREATE INDEX idx_used_used_status ON druid_segments (used, used_status_last_updated);
ALTER TABLE druid_segments DROP INDEX idx_used;
```

and the load decreased significantly.

2. Overlords issue queried like this

```
SELECT payload
FROM druid_segments
WHERE used = 1
  AND dataSource = 'DATASOURCE'
  AND (
    (start < '2025-10-29T14:00:00.000Z' AND `end` > '2025-10-29T13:00:00.000Z')
    OR (start = '-146136543-09-08T08:23:32.096Z' AND `end` != '146140482-04-24T15:36:27.903Z' AND `end` > '2025-10-29T13:00:00.000Z')
    OR (start != '-146136543-09-08T08:23:32.096Z' AND `end` = '146140482-04-24T15:36:27.903Z' AND start < '2025-10-29T14:00:00.000Z')
    OR (start = '-146136543-09-08T08:23:32.096Z' AND `end` = '146140482-04-24T15:36:27.903Z')
  );
```
according to explain they use `used` or `dataSource` indexes so they scans large amount of rows just to find segments inside one hour. I added 2 indexes

```
CREATE INDEX idx_seg_ds_used_start ON druid_segments (dataSource, used, start);
CREATE INDEX idx_seg_ds_used_end   ON druid_segments (dataSource, used, `end`);
```

and our i/o load disappeared.

3. Coordinator refreshes its world by 

```
SELECT payload FROM druid_segments WHERE used=true
```

but I guess it will be fixed by new incremental segments cache.

### Operational impact

Just to mention, you should not add indexes on large tables directly. I used pt-online-schema-change with load limits.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MySQL metadata DB optimization #18707

Motivation

Proposed changes

Operational impact

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

MySQL metadata DB optimization #18707

Description

Motivation

Proposed changes

Operational impact

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions