Disk free space threshold - at least a Warning message in the log file #8367

j0r0 · 2014-11-06T15:26:24Z

Hi,
Today I had an issue where none of my replica shards would start.
After 3 hours I enabled DEBUG in logging.yml and finally spotted:
Less than the required 15.0% free disk threshold (11.348983564561003% free) on node [blahblah], preventing allocation.

So I freed some space and everything is back online.

It would be great if ES emitted at least a warning in the log file.
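
To illustrate the kind of check behind that DEBUG line, here is a minimal Python sketch. It is not Elasticsearch code: `free_percent` and `check_free_threshold` are hypothetical helper names, and the 15% default is taken only from the log message quoted above. The difference from the current behaviour is simply that the message is logged at WARNING level.

```python
import logging
import shutil

log = logging.getLogger("disk_check")


def free_percent(total_bytes: int, free_bytes: int) -> float:
    """Return free space as a percentage of total capacity."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * free_bytes / total_bytes


def check_free_threshold(total_bytes: int, free_bytes: int,
                         required_free_pct: float = 15.0) -> bool:
    """Return True if allocation would be allowed; log a warning otherwise.

    The 15.0 default mirrors the threshold in the quoted log line and is an
    assumption of this sketch, not a value read from any configuration.
    """
    pct = free_percent(total_bytes, free_bytes)
    if pct < required_free_pct:
        log.warning(
            "Less than the required %.1f%% free disk threshold "
            "(%.2f%% free), preventing allocation",
            required_free_pct, pct,
        )
        return False
    return True


if __name__ == "__main__":
    # Check the filesystem that holds the root path of this machine.
    usage = shutil.disk_usage("/")
    print(check_free_threshold(usage.total, usage.free))
```

With roughly 11.3% free, as in the report above, `check_free_threshold` returns `False` and emits one warning.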

nik9000 · 2014-11-06T15:45:19Z

I had to fix this for someone recently as well. Lots of warnings would be annoying (and maybe counterproductive if you don't partition your disks sanely), but they would still make the problem less mysterious.

You could always turn off the warnings if you don't like them with logging config.

Fixes an issue where only absolute bytes were taken into account when kicking off an automatic reroute due to disk usage. Also randomized the tests to use either an absolute value or a percentage so this is tested. Also adds logging for each node over the high and low watermark every time a new cluster info usage is gathered (defaults to every 30 seconds). Related to elastic#8368 Fixes elastic#8367
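
The distinction between percentage and absolute-byte watermarks in the commit message above can be sketched as follows. This is an illustrative Python model, not the Elasticsearch implementation: `parse_watermark`, `exceeds_watermark`, and `report` are hypothetical names, the `85%` and `90%` defaults are assumptions of the sketch, and it assumes a percentage watermark is a ceiling on used disk while a byte watermark is a floor on free disk.

```python
import re

# Binary multiples; the accepted unit suffixes are an assumption of this sketch.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}


def parse_watermark(value: str):
    """Parse '85%' into ('percent_used', 85.0) or '500mb' into ('bytes_free', n)."""
    text = value.strip().lower()
    if text.endswith("%"):
        return ("percent_used", float(text[:-1]))
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(b|kb|mb|gb|tb)", text)
    if match is None:
        raise ValueError(f"unrecognized watermark: {value!r}")
    number, unit = match.groups()
    return ("bytes_free", int(float(number) * _UNITS[unit]))


def exceeds_watermark(watermark: str, total_bytes: int, free_bytes: int) -> bool:
    """Return True if the node's disk usage is past the given watermark."""
    kind, limit = parse_watermark(watermark)
    if kind == "percent_used":
        used_pct = 100.0 * (total_bytes - free_bytes) / total_bytes
        return used_pct > limit
    # Absolute watermark: trip when less than `limit` bytes remain free.
    return free_bytes < limit


def report(usages, low="85%", high="90%"):
    """Return one message per node over a watermark; high takes precedence."""
    messages = []
    for node, (total, free) in sorted(usages.items()):
        if exceeds_watermark(high, total, free):
            messages.append(f"high watermark [{high}] exceeded on [{node}]")
        elif exceeds_watermark(low, total, free):
            messages.append(f"low watermark [{low}] exceeded on [{node}]")
    return messages
```

Handling both forms in one function is the point: a check that only understands absolute bytes would silently ignore a watermark written as a percentage.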

synhershko · 2014-11-27T15:36:59Z

@dakrone I worked with a client for whom this fix may have made things even worse. A log entry every 30 seconds means the log file keeps growing. If there is nowhere to move shards to, or if the move takes a long time to finish (think large indexes, or many pending re-allocations), this may well crash the node by using up the remaining disk space after a short while. My 2c.
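
One generic way to limit that log growth is to throttle the warning per node. The Python sketch below shows the idea only; `ThrottledWarner` is a hypothetical class, the interval is arbitrary, and this is not a description of how Elasticsearch handles it.

```python
import time


class ThrottledWarner:
    """Allow at most one warning per node within `min_interval_s` seconds."""

    def __init__(self, min_interval_s: float, clock=time.monotonic):
        self._min_interval_s = min_interval_s
        self._clock = clock  # injectable so the behaviour can be tested
        self._last = {}      # node name -> time of the last emitted warning

    def should_warn(self, node: str) -> bool:
        """Return True if a warning for `node` may be emitted now."""
        now = self._clock()
        last = self._last.get(node)
        if last is not None and now - last < self._min_interval_s:
            return False
        self._last[node] = now
        return True
```

A caller would check `should_warn(node)` before logging on each 30-second refresh, so a node that stays over the watermark produces one line per interval instead of one per refresh.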

dakrone · 2014-11-27T15:53:01Z

@synhershko opened #8686 to address this.

nik9000 added a commit to nik9000/elasticsearch that referenced this issue Nov 6, 2014

Raise log level on DiskThresholdDecider

233acaf

Having not enough disk space to allocate the shard is worth warning about. Closes elastic#8367

nik9000 mentioned this issue Nov 6, 2014

Raise log level on DiskThresholdDecider #8368

Closed

clintongormley assigned dakrone Nov 6, 2014

dakrone mentioned this issue Nov 7, 2014

Take percentage watermarks into account for reroute listener #8382

Closed

dakrone closed this as completed in 3712d97 Nov 7, 2014
