dnstap logging significantly affects unbound performance (regression in 1.11) #305

@abulimov

With unbound 1.11 being the first release that switched away from libfstrm (#164, #264), we are observing a number of regressions with dnstap logging over a unix socket (the first one was reported as #304).

This is quite a big one.

In our production environment we've noticed a direct correlation between dnstap logs being consumed and unbound performance with 1.11.

Strangely enough, the quicker the dnstap stream was consumed by the dnstap service, the more CPU unbound used.

We first noticed it on a few machines with especially powerful hardware, where it went absolutely crazy: when we sped up the dnstap service ~1.5 times by simply skipping half of the samples, the CPU used by unbound went up 10x, from 200% to 2000%.

While investigating the issue with synthetic load on different hardware, I can observe that simply having the dnstap socket consumed by a dnstap process significantly increases the CPU used by unbound.

For example:

With unbound 1.11, dnstap logging over a unix socket, and the dnstap service running, unbound uses ~120% of CPU under synthetic load.

When we stop the dnstap service, unbound immediately uses less CPU (~80%).

On the same server with the same load, unbound 1.9.6 keeps its CPU usage stable at ~80%, with or without the dnstap service running and consuming logs (as one would expect).
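For anyone who wants to compare numbers, here is a minimal sketch of how per-process CPU percentages like the ones above can be sampled on Linux. It is not the tooling we used; the function names are hypothetical, and it assumes the standard `/proc/<pid>/stat` layout where `utime` and `stime` are fields 14 and 15. As in `top`, 100% means one fully busy core.

```python
import os
import time


def cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so split after the last ')' before splitting on whitespace.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is field 3 (state), so fields 14 and 15 are at index 11 and 12.
    return int(fields[11]) + int(fields[12])


def cpu_percent(ticks_before, ticks_after, interval_s, ticks_per_s):
    """Convert a tick delta over an interval into a top-style percentage."""
    return 100.0 * (ticks_after - ticks_before) / ticks_per_s / interval_s


def sample(pid, interval_s=5.0):
    """Sample the CPU usage of a running process (all threads) over interval_s."""
    ticks_per_s = os.sysconf("SC_CLK_TCK")
    with open(f"/proc/{pid}/stat") as f:
        before = cpu_ticks(f.read())
    time.sleep(interval_s)
    with open(f"/proc/{pid}/stat") as f:
        after = cpu_ticks(f.read())
    return cpu_percent(before, after, interval_s, ticks_per_s)
```

Running `sample(<unbound pid>)` once with the dnstap consumer running and once with it stopped, under the same load, should make the difference visible.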

I understand that dnstap uses a bidirectional protocol, and that when there is no consumer running, unbound doesn't send any dnstap samples.
But before 1.11, sending samples had no significant impact on unbound itself, and we've been using dnstap logging with unbound for a few years now.
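To illustrate what "bidirectional" means here, below is a rough sketch of the Frame Streams framing that dnstap uses on the socket, as I understand it from the fstrm protocol description. It is not unbound's code; the helper names are hypothetical. The sender announces itself with READY and only starts sending data frames after the consumer replies with ACCEPT, which is why nothing is sent when no consumer is attached.

```python
import struct

# Frame Streams control frame types.
ACCEPT, START, STOP, READY, FINISH = 1, 2, 3, 4, 5
FIELD_CONTENT_TYPE = 1
DNSTAP_CONTENT_TYPE = b"protobuf:dnstap.Dnstap"

# Order of control frames in a bidirectional session.
# "writer" is the sender (unbound here), "reader" is the dnstap consumer.
HANDSHAKE = [
    ("writer", READY),   # writer offers its content type
    ("reader", ACCEPT),  # reader accepts it
    ("writer", START),   # data frames follow
    ("writer", STOP),    # writer is done
    ("reader", FINISH),  # reader acknowledges the end of the stream
]


def control_frame(ctype, content_type=None):
    """Encode a control frame: zero-length escape, body length, then the body."""
    body = struct.pack("!I", ctype)
    if content_type is not None:
        body += struct.pack("!II", FIELD_CONTENT_TYPE, len(content_type))
        body += content_type
    return struct.pack("!II", 0, len(body)) + body


def data_frame(payload):
    """Encode a data frame: non-zero big-endian length, then the payload."""
    if not payload:
        raise ValueError("a zero length is reserved for the control frame escape")
    return struct.pack("!I", len(payload)) + payload


def control_type(frame):
    """Return the control type of an encoded control frame."""
    escape, length, ctype = struct.unpack("!III", frame[:12])
    if escape != 0 or length != len(frame) - 8:
        raise ValueError("not a well-formed control frame")
    return ctype
```

Each dnstap sample is one data frame carrying a serialized protobuf message, so the per-sample framing cost itself is just a 4-byte length prefix.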
