Launch a producer of jobs that pushes them one by one. I have done it through Python scripts, but anything will do.
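The producer scripts themselves are not shown in this report. As a minimal sketch of what such a producer could look like, the following speaks the beanstalkd text protocol (`put <pri> <delay> <ttr> <bytes>`) over a plain socket and waits for each `INSERTED` reply before sending the next job. The function names, job bodies, priority and TTR values are assumptions made for illustration; the address matches the `bind` lines in the log output.

```python
import socket


def build_put(body: bytes, pri: int = 1024, delay: int = 0, ttr: int = 60) -> bytes:
    """Encode one `put` command: a header line, then the body, each CRLF-terminated."""
    return b"put %d %d %d %d\r\n" % (pri, delay, ttr, len(body)) + body + b"\r\n"


def push_jobs(host: str = "127.0.0.1", port: int = 11000, count: int = 100000) -> int:
    """Push jobs one at a time and return how many the server acknowledged."""
    acked = 0
    try:
        with socket.create_connection((host, port)) as sock:
            reader = sock.makefile("rb")
            for i in range(count):
                sock.sendall(build_put(b"job %d" % i))
                reply = reader.readline()  # e.g. b"INSERTED 42\r\n"
                if not reply.startswith(b"INSERTED"):
                    break  # connection closed or job refused
                acked += 1
    except OSError:
        # The server was killed mid-request; unacknowledged jobs must be retried.
        pass
    return acked
```

Calling `push_jobs()` repeatedly while the server is being killed and restarted is enough to keep writes in flight at the moment of each crash.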
After a couple of seconds I get this:
pid 84155
bind 3 127.0.0.1:11000
accept 11
pid 84157
bind 3 127.0.0.1:11000
../beanstalkd/beanstalkd: ./wal/binlog.149: unknown version: 0
../beanstalkd/beanstalkd: Errors reading one or more WAL files.
../beanstalkd/beanstalkd: Continuing. You may be missing data.
accept 11
pid 84159
bind 3 127.0.0.1:11000
../beanstalkd/beanstalkd: ./wal/binlog.149: unknown version: 0
../beanstalkd/beanstalkd: Errors reading one or more WAL files.
../beanstalkd/beanstalkd: Continuing. You may be missing data.
accept 11
pid 84161
bind 3 127.0.0.1:11000
../beanstalkd/beanstalkd: ./wal/binlog.149: unknown version: 0
../beanstalkd/beanstalkd: Errors reading one or more WAL files.
../beanstalkd/beanstalkd: Continuing. You may be missing data.
beanstalkd was able to start despite these errors, but on every restart it tried to read the same file again.
I tried to delete binlog.149 and got:
pid 84241
bind 3 127.0.0.1:11000
../beanstalkd/beanstalkd: walg.c:459 in walread: open ./wal/binlog.149: No such file or directory
I stopped generating crashes and I tried to consume all the jobs:
% l wal
total 2000
-r-------- 1 thorn staff 1.0M Aug 14 08:51 binlog.1721
-rw------- 1 thorn staff 0B Aug 14 08:31 lock
All this happened on macOS, where fsync cannot be turned on due to bug #539. These errors mean that the log file was not read and was ignored completely; everything in that file was most probably lost.
After I forced beanstalkd to fsync on macOS and ran it with "-f0", the rate of new errors dropped, but they were still present.
The only somewhat reliable way to run beanstalkd is with "-f 0", which fsyncs on every command. That means very bad performance. The default setting of never fsyncing is a bad idea: the user can lose a whole binlog file instead of just a couple of milliseconds of data.
Let's look at the benchmarks to get an idea of a good default value for -f:
Performance degrades only as the fsync period approaches zero. 50 ms is a good enough tradeoff between performance and the amount of data lost in a crash.
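To make the tradeoff concrete, here is a minimal sketch of a "fsync at most once per period" policy, which is the idea behind `-f`. It is not beanstalkd's implementation; the class name and the `syncs` counter are hypothetical and exist only to make the behaviour observable.

```python
import os
import time


class PeriodicSyncLog:
    """Append-only log that fsyncs at most once per period_ms (0 means on every write)."""

    def __init__(self, path: str, period_ms: int):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self.period = period_ms / 1000.0
        self.last_sync = time.monotonic()
        self.syncs = 0  # number of fsync calls issued by append()

    def append(self, record: bytes) -> None:
        os.write(self.fd, record)
        now = time.monotonic()
        # Only pay for fsync when the configured period has elapsed.
        if now - self.last_sync >= self.period:
            os.fsync(self.fd)
            self.last_sync = now
            self.syncs += 1

    def close(self) -> None:
        os.fsync(self.fd)
        os.close(self.fd)
```

With a period of 0 every append pays for an fsync; with a period of 50 ms, at most about 50 ms of acknowledged writes are at risk if the machine goes down.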
ysmolski changed the title from "crashes in fsync=off mode make whole binlog files get corrupted" to "crashes in default fsync=off mode corrupts whole binlog files" on Aug 14, 2019
ysmolski changed the title from "crashes in default fsync=off mode corrupts whole binlog files" to "crash in default fsync=off mode corrupts whole binlog files" on Aug 14, 2019
My assumption that the whole binlog gets corrupted was wrong. The only corrupted part is the last written job, or an empty log file (a zeroed file). That is totally okay: the write was interrupted by the kill, so that record was not written to the file in full and will be ignored on the next start. A command whose job was not fully written to disk won't return anything; the client will not get an ACK, so it must retry later.
Phew. 🐳
A default fsync period with a small value (<50 ms) still sounds good: it does not hurt performance, but it reduces the amount of data lost in a machine crash.
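To illustrate why a partially written final record is harmless, here is a sketch of replaying a log while ignoring a truncated tail. The record layout (a 4-byte big-endian length followed by the payload) is a hypothetical format chosen for the example, not beanstalkd's actual binlog format.

```python
import struct


def encode(payload: bytes) -> bytes:
    """Encode one record in the hypothetical format: 4-byte length, then payload."""
    return struct.pack(">I", len(payload)) + payload


def replay(data: bytes) -> list:
    """Return all complete records, dropping a record cut short at the end."""
    records, pos = [], 0
    while pos + 4 <= len(data):
        (size,) = struct.unpack_from(">I", data, pos)
        end = pos + 4 + size
        if end > len(data):
            break  # the crash interrupted this write; it was never acknowledged
        records.append(data[pos + 4:end])
        pos = end
    return records
```

Since the client never received an ACK for the dropped record, retrying the command on the client side is what keeps the job from being lost.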
I never expected that it would be so easy to reproduce this. But anyway. Start beanstalkd in a loop:
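The loop itself is not preserved in this copy of the report. A minimal sketch of such a crash loop, assuming the binary path, binlog directory and address that appear in the log output, could look like this. The function name, the timing values and the exact command-line flags are assumptions.

```python
import random
import subprocess
import time


def crash_loop(cmd, iterations, min_s=0.5, max_s=2.0):
    """Start cmd, let it run for a random interval, kill it, and repeat.

    Returns the exit code of every run.
    """
    codes = []
    for _ in range(iterations):
        proc = subprocess.Popen(cmd)
        time.sleep(random.uniform(min_s, max_s))
        proc.kill()  # SIGKILL on POSIX: the process gets no chance to flush or clean up
        codes.append(proc.wait())
    return codes


# Assumed invocation: verbose output, binlog in ./wal, listening on 127.0.0.1:11000.
BEANSTALKD_CMD = [
    "../beanstalkd/beanstalkd", "-V",
    "-b", "./wal", "-l", "127.0.0.1", "-p", "11000",
]
```

Running `crash_loop(BEANSTALKD_CMD, iterations=1000)` in one terminal while a producer pushes jobs in another approximates the setup described here.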
More time passed, more errors were accumulated: