-
Notifications
You must be signed in to change notification settings - Fork 3.8k
Closed as not planned
Description
Affected Version
0.22.2
Description
- Cluster size: 2 brokers, 2 routers, 2 coordinators, 37 historicals (15 hot, 21 cold, 1 frozen), 2 overlords, 43 middlemanagers
- Steps to reproduce the problem: One morning we found that, during the night, a massive rebalanced happened leaving many servers at 100% disk usage while others in the same tier were left empty. After restarting the coordinator segments were better balanced but we started noticing this issue. Many servers in two different tiers had their disks full while reporting not being full. A restart on the historical or the coordinator doesn't fix it.
Coordinator log:
Oct 25 08:59:53 druid-master-1 java[19246]: 2021-10-25T08:59:53,185 ERROR [Master-PeonExec--0] org.apache.druid.server.coordinator.HttpLoadQueuePeon - Server[http://stde2-hhot-10.stde2] Failed segment[datasource_2021-09-04T11:00:00.000Z_2021-09-04T12:00:00.000Z_2021-09-04T11:00:00.016Z_226] request[SegmentChangeRequestLoad] with cause [Exception loading segment[datasource_2021-09-04T11:00:00.000Z_2021-09-04T12:00:00.000Z_2021-09-04T11:00:00.016Z_226]].
Oct 25 08:59:53 druid-master-1 java[19246]: 2021-10-25T08:59:53,475 ERROR [Master-PeonExec--0] org.apache.druid.server.coordinator.HttpLoadQueuePeon - Server[http://stde2-hhot-01.stde2] Failed segment[datasource_2021-08-31T19:00:00.000Z_2021-08-31T20:00:00.000Z_2021-08-31T19:00:00.019Z_449] request[SegmentChangeRequestLoad] with cause [Exception loading segment[datasource_2021-08-31T19:00:00.000Z_2021-08-31T20:00:00.000Z_2021-08-31T19:00:00.019Z_449]].
Oct 25 08:59:55 druid-master-1 java[19246]: 2021-10-25T08:59:55,918 ERROR [Master-PeonExec--0] org.apache.druid.server.coordinator.HttpLoadQueuePeon - Server[http://stde2-hhot-01.stde2] Failed segment[datasource_2021-10-24T17:00:00.000Z_2021-10-24T18:00:00.000Z_2021-10-24T17:00:00.012Z_339] request[SegmentChangeRequestLoad] with cause [Exception loading segment[datasource_2021-10-24T17:00:00.000Z_2021-10-24T18:00:00.000Z_2021-10-24T17:00:00.012Z_339]].
One of the historicals affected:
Oct 25 09:47:19 stde2-hhot-01.stde2 java[58946]: 2021-10-25T09:47:19,998 INFO [SimpleDataSegmentChangeHandler-105] org.apache.druid.server.SegmentManager - Attempting to close segment datasource_2021-08-25T08:00:00.000Z_2021-08-25T09:00:00.000Z_2021-08-25T08:00:00.024Z_41
Oct 25 09:47:19 stde2-hhot-01.stde2 java[58946]: 2021-10-25T09:47:19,998 INFO [SimpleDataSegmentChangeHandler-105] org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager - Deleting directory[/druid/segment-cache-1/datasource/2021-08-25T08:00:00.000Z_2021-08-25T09:00:00.000Z/2021-08-25T08:00:00.024Z/41]
Oct 25 09:47:19 stde2-hhot-01.stde2java[58946]: 2021-10-25T09:47:19,999 WARN [SimpleDataSegmentChangeHandler-105] org.apache.druid.segment.loading.StorageLocation - SegmentDir[/druid/segment-cache-1/datasource/2021-08-25T08:00:00.000Z_2021-08-25T09:00:00.000Z/2021-08-25T08:00:00.024Z/41] is not found under this location[/druid/segment-cache-1]
Oct 25 09:47:20 stde2-hhot-01.stde2java[58946]: 2021-10-25T09:47:20,001 INFO [SimpleDataSegmentChangeHandler-24] org.apache.druid.server.SegmentManager - Attempting to close segment datasource_2021-08-25T08:00:00.000Z_2021-08-25T09:00:00.000Z_2021-08-25T08:00:00.024Z_25
Oct 25 09:47:20 stde2-hhot-01.stde2 java[58946]: 2021-10-25T09:47:20,001 INFO [SimpleDataSegmentChangeHandler-24] org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager - Deleting directory[/druid/segment-cache-1/datasource/2021-08-25T08:00:00.000Z_2021-08-25T09:00:00.000Z/2021-08-25T08:00:00.024Z/25]
Oct 25 09:47:20 stde2-hhot-01.stde2 java[58946]: 2021-10-25T09:47:20,001 WARN [SimpleDataSegmentChangeHandler-24] org.apache.druid.segment.loading.StorageLocation - SegmentDir[/druid/segment-cache-1/datasource/2021-08-25T08:00:00.000Z_2021-08-25T09:00:00.000Z/2021-08-25T08:00:00.024Z/25] is not found under this location[/druid/segment-cache-1]
Oct 25 09:47:20 stde2-hhot-01.stde2 java[58946]: 2021-10-25T09:47:20,003 INFO [SimpleDataSegmentChangeHandler-108] org.apache.druid.server.SegmentManager - Attempting to close segment datasource_2021-08-25T08:00:00.000Z_2021-08-25T09:00:00.000Z_2021-08-25T08:00:00.024Z_10
Oct 25 09:47:20 stde2-hhot-01.stde2 java[58946]: 2021-10-25T09:47:20,003 INFO [SimpleDataSegmentChangeHandler-108] org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager - Deleting directory[/druid/segment-cache-2/datasource/2021-08-25T08:00:00.000Z_2021-08-25T09:00:00.000Z/2021-08-25T08:00:00.024Z/10]
Oct 25 09:47:20 stde2-hhot-01.stde2 java[58946]: 2021-10-25T09:47:20,005 INFO [SimpleDataSegmentChangeHandler-4] org.apache.druid.server.SegmentManager - Attempting to close segment ds2_2021-08-25T08:00:00.000Z_2021-08-25T09:00:00.000Z_2021-08-25T08:00:00.086Z_25
Oct 25 09:47:20 stde2-hhot-01.stde2 java[58946]: 2021-10-25T09:47:20,006 INFO [SimpleDataSegmentChangeHandler-4] org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager - Deleting directory[/druid/segment-cache-2/ds2/2021-08-25T08:00:00.000Z_2021-08-25T09:00:00.000Z/2021-08-25T08:00:00.086Z/25]
Oct 25 09:47:20 stde2-hhot-01.stde2 java[58946]: 2021-10-25T09:47:20,007 WARN [SimpleDataSegmentChangeHandler-4] org.apache.druid.segment.loading.StorageLocation - SegmentDir[/druid/segment-cache-2/ds2/2021-08-25T08:00:00.000Z_2021-08-25T09:00:00.000Z/2021-08-25T08:00:00.086Z/25] is not found under this location[/druid/segment-cache-2]
Reactions are currently unavailable

