IXFR serial quirks when NSD restarts and only has an old zone file on disk #227
Comments
This is a very good analysis, @klaus-nicat! I think the …
Indeed, excellent analysis, @klaus-nicat!
commit-serial is the serial received from a transfer and stored on disk in a transfer file, waiting to be processed.
TODO: create a tpkg test to be certain
@wtoorop We have a zone which gets updates every few seconds, and we also regularly see "update failed". It seems this happens when multiple IXFRs are applied (see log below). The weird thing is that NSD activates the new zone, and then the "update failed" refers to an old serial of this zone. Could this be caused by the same bug? If not, I'll open another issue.
Your suggested fix does not work. Starting with serial=1, then writing the zone file, then increasing to serial=2, then restarting NSD: a) after the restart, NSD now requests IXFR with serial=1 (the last serial written to disk, which is good), but the served zone is not updated. NSD still serves serial=1.
It does look related. Your situation at 13:11:24 would be a reason to send along the committed serial (because the served serial is not yet 1083742842). So let's try to handle it in this issue for the moment :)
That's even worse! I'll change the title of the PR! Thanks for trying it out immediately.
I guess the current lazy write to disk is not possible with IXFR, and the zone must be written to disk immediately. Or there need to be two copies of the data for IXFR: one on disk, written when the zone is written to disk, and another version in RAM that reflects the zone state in RAM.
Hi! Any news? Can we help get this fixed? Thanks!
Not yet, sorry @klaus-nicat. @k0ekk0ek, do you think you can make some time to have a look?
@wtoorop, @klaus-nicat, I'll have a look.
@klaus-nicat, I've picked this one up. I'm first creating a small reproducer, for easy validation and to avoid regressions in the future, and will then implement a fix. I'll report back ASAP.
@klaus-nicat, I've added a test to verify correct behavior in the future (so it currently fails 😅). I'll start working on the actual fix tomorrow.
That's a smart idea ;-) Thanks
Setup: NSD 4.6 (compiled with --disable-radix-tree and --enable-packed).
NSD is the Secondary. The serial on the Primary is 1, and the serial on NSD is 1.
Then, write the zonefile:
Now, update the zone on the Primary to serial=2. NSD will IXFR the new zone, but does not write it to disk:
Now restart NSD. NSD will serve the old version with serial=1, as this is the latest version on disk.
So now NSD serves serial=1, but when checking against the Primary, NSD uses serial=2. NSD should use serial=1, as serial=1 is the version that NSD has on disk. (btw: is there a description somewhere of what the commit-serial is?)
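The steps above can be replayed with a small model to make the mismatch explicit. This is a hypothetical sketch, not NSD's code: `SecondaryState` and its fields `served_serial`, `commit_serial` and `disk_serial` are invented names for the three serials discussed in this issue, and the model assumes (as observed in this report) that the IXFR request is built from the commit serial, which survives the restart.

```python
from dataclasses import dataclass


@dataclass
class SecondaryState:
    """Hypothetical model of the three serials involved in this issue."""
    served_serial: int   # serial of the zone currently served from memory
    commit_serial: int   # serial of the last committed transfer (kept across restart)
    disk_serial: int     # serial of the zone file on disk

    def write_zonefile(self) -> None:
        # Writing the zone file makes the on-disk copy match what is served.
        self.disk_serial = self.served_serial

    def apply_ixfr(self, new_serial: int) -> None:
        # An applied transfer updates memory and the commit serial, not the zone file.
        self.served_serial = new_serial
        self.commit_serial = new_serial

    def restart(self) -> None:
        # On restart only the zone file is reloaded; the commit serial is kept.
        self.served_serial = self.disk_serial

    def ixfr_request_serial(self) -> int:
        # Assumption taken from the report: the request uses the commit serial.
        return self.commit_serial


state = SecondaryState(served_serial=1, commit_serial=1, disk_serial=1)
state.write_zonefile()   # zone file now holds serial 1
state.apply_ixfr(2)      # Primary moved to 2; NSD serves 2, zone file still holds 1
state.restart()          # NSD reloads serial 1 from disk
print(state.served_serial, state.ixfr_request_serial())  # 1 2
```

In this model the served serial and the serial put into the IXFR request disagree right after the restart, which is exactly the situation described above.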
Now, when the serial is increased to 3, NSD requests IXFR with serial=2. Hence, NSD receives the differences from 2 to 3, and then detects that the diff cannot be applied to the existing local zone with serial=1:
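Why the diff is rejected: an incremental transfer (RFC 1995) is a chain of difference sequences, each of which is only valid on top of the exact serial it starts from. The sketch below models that check with the zone reduced to its serial; `Diff`, `apply_diffs` and `UpdateFailed` are hypothetical names, and this is not NSD's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Diff:
    """One IXFR difference sequence: only valid on a zone at old_serial."""
    old_serial: int
    new_serial: int


class UpdateFailed(Exception):
    pass


def apply_diffs(local_serial: int, diffs: list[Diff]) -> int:
    """Apply a chain of diffs to a zone at local_serial; return the resulting serial."""
    serial = local_serial
    for diff in diffs:
        if diff.old_serial != serial:
            # The diff was computed against a different version of the zone.
            raise UpdateFailed(
                f"diff {diff.old_serial}->{diff.new_serial} cannot apply to serial {serial}"
            )
        serial = diff.new_serial
    return serial


# NSD asked for the changes since serial 2, but actually serves serial 1.
try:
    apply_diffs(1, [Diff(2, 3)])
except UpdateFailed as err:
    print("update failed:", err)

# If the request is made from serial 1 and the chain 1->2->3 comes back, it applies cleanly.
print(apply_diffs(1, [Diff(1, 2), Diff(2, 3)]))  # 3
```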
Proof that NSD requests IXFR with serial=2, although it only has serial=1 available locally:
Now, as NSD "fell back" to served-serial=commit-serial=1, it can recover on the next check:
So, when NSD restarts and the served serial goes backwards because only an old zone file is on disk, the "committed-serial" (what is the committed serial?) should also be lowered to the served serial, so that IXFR requests use the proper serial.
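A minimal sketch of the proposed rule, assuming both the served serial (loaded from the zone file) and the committed serial are known when the zone is loaded at startup. The function names are hypothetical and this is not NSD's code. Because SOA serials wrap around at 2^32, "goes backwards" is decided here with RFC 1982 serial number arithmetic rather than a plain `<`.

```python
SERIAL_BITS = 32
HALF = 2 ** (SERIAL_BITS - 1)  # 2^31


def serial_gt(s1: int, s2: int) -> bool:
    """RFC 1982 comparison: True if serial s1 is 'greater than' serial s2.

    A difference of exactly 2^31 is undefined in RFC 1982; it is treated
    as 'not greater' in either direction here.
    """
    return (s1 < s2 and s2 - s1 > HALF) or (s1 > s2 and s1 - s2 < HALF)


def commit_serial_after_load(served_serial: int, commit_serial: int) -> int:
    """Proposed rule: if the committed serial is ahead of what was loaded
    from disk, lower it to the served serial so IXFR requests match the zone."""
    if serial_gt(commit_serial, served_serial):
        return served_serial
    return commit_serial


print(commit_serial_after_load(served_serial=1, commit_serial=2))           # 1
print(commit_serial_after_load(served_serial=4294967295, commit_serial=0))  # 4294967295
```

In this model, the restarted secondary from the scenario above would request IXFR from serial 1 instead of 2. The second call shows why serial arithmetic matters: serial 0 directly follows 4294967295, so it counts as "ahead" and is lowered as well.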