I recently tracked down a bug that causes slaves to fail to update a zone from a PowerDNS master. The issue was very hard to reproduce because it occurred seemingly at random, only about 10% of the time. I eventually managed to capture a dump of the DNS traffic while the bug was occurring.
This is how the bug usually appears:
Fixing this bug also fixes many other cache-related bugs, such as differing SOA records being delivered to clients and slaves, and outdated records being served by the master even after it has notified slaves. (The bug in question prevents the cache from being cleared when a new SOA is detected.)
To reproduce this bug locally, with BIND slave(s):
After step 3, you'll notice that the slaves are notified and acknowledge the notification, but they don't AXFR. This is because they received the stale SOA from the cache.
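The slave-side check that gets fooled by the stale SOA can be sketched roughly as follows. This is only an illustration, not BIND's actual code: `serialIsNewer` and `shouldTransfer` are hypothetical names, and the comparison assumes RFC 1982 serial number arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// RFC 1982 style comparison: true if `candidate` is newer than `current`.
// Relies on the usual two's-complement conversion of the 32-bit difference.
bool serialIsNewer(std::uint32_t candidate, std::uint32_t current) {
    return candidate != current &&
           static_cast<std::int32_t>(candidate - current) > 0;
}

// Hypothetical slave decision after a NOTIFY: query the master's SOA and
// start an AXFR only if the serial it returns is newer than the local one.
bool shouldTransfer(std::uint32_t masterSoaSerial, std::uint32_t localSerial) {
    return serialIsNewer(masterSoaSerial, localSerial);
}

// Illustrative scenario: the zone is already at serial 2013042502 on the
// master, but the packet cache still answers with the old SOA.
void demoStaleSoa() {
    const std::uint32_t localSerial = 2013042501;
    const std::uint32_t staleCachedSerial = 2013042501;  // served from cache
    const std::uint32_t freshSerial = 2013042502;        // actual zone data

    assert(!shouldTransfer(staleCachedSerial, localSerial));  // no AXFR
    assert(shouldTransfer(freshSerial, localSerial));         // AXFR expected
}
```

With the stale answer the slave sees no serial change, so it acknowledges the NOTIFY and then does nothing, which matches the behaviour described above.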
This bug is mitigated or made harder to reproduce by having a shorter cache TTL, preventing slaves from periodically checking the SOA, and/or preventing direct querying of the master server. However, all current installations where PowerDNS is the master are susceptible to this problem (assuming the slaves check the SOA before they AXFR).
This bug appears to have been introduced in svn revision 1221, so it is likely present in all PowerDNS versions compiled since June 2008.
I have written two separate patches for this bug, so you can pick whichever approach you prefer. The "clean" patch, which I would prefer be used, splits PacketCache::purge into two separate functions: one that accepts no arguments and clears the entire cache, and one that accepts a const string argument and clears all cache entries related to the zone named in that argument. The "ugly" patch is much shorter (one line); it just inserts a "dummy" argument into the temporary vector passed to PacketCache::purge.
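To illustrate the shape of the "clean" approach, here is a minimal standalone sketch. It is not the PowerDNS code and not the contents of the patch: `SketchPacketCache`, its map-based storage, and the suffix-matching rule in `belongsTo` are all assumptions made for the example. Only the idea of two overloads, one clearing everything and one clearing a single zone, comes from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for a packet cache, keyed by query name.
class SketchPacketCache {
public:
    void insert(const std::string& qname, const std::string& answer) {
        d_entries[qname] = answer;
    }

    std::size_t size() const { return d_entries.size(); }

    // No argument: clear the entire cache. Returns the number of entries removed.
    std::size_t purge() {
        const std::size_t removed = d_entries.size();
        d_entries.clear();
        return removed;
    }

    // One argument: remove only the entries belonging to `zone`.
    std::size_t purge(const std::string& zone) {
        std::size_t removed = 0;
        for (auto it = d_entries.begin(); it != d_entries.end();) {
            if (belongsTo(it->first, zone)) {
                it = d_entries.erase(it);
                ++removed;
            } else {
                ++it;
            }
        }
        return removed;
    }

private:
    // Assumed matching rule: the name is the zone apex or ends in ".<zone>".
    static bool belongsTo(const std::string& qname, const std::string& zone) {
        if (qname == zone) {
            return true;
        }
        const std::string suffix = "." + zone;
        return qname.size() > suffix.size() &&
               qname.compare(qname.size() - suffix.size(), suffix.size(), suffix) == 0;
    }

    std::map<std::string, std::string> d_entries;
};

// Usage: purging one zone leaves the other zones' entries in place.
void demoPurge() {
    SketchPacketCache cache;
    cache.insert("example.com", "SOA serial 1");
    cache.insert("www.example.com", "A 192.0.2.1");
    cache.insert("example.org", "SOA serial 7");

    assert(cache.purge("example.com") == 2);
    assert(cache.size() == 1);
    assert(cache.purge() == 1);
}
```

Having two overloads means a caller that wants to drop a single zone never has to build an argument list at all, which is the kind of call site the one-line "ugly" patch works around with its dummy argument.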
Attachment (purge_clean.patch): https://gist.github.com/5466729
Attachment (purge_ugly.patch): https://gist.github.com/5466730
I forgot to mention: this was the bug I mentioned in #powerdns a few days ago. (I'm mr_flea)
A small correction:
This issue is also avoided if the cache-ttl option is set to 0, as that prevents anything from being cached at all. (This is obviously not desirable for servers that accept requests from the internet.)
Applause! Thank you very much, I merged the pretty patch.