Rec: Purge map of failed auths periodically by keeping a last changed timestamp. #8525
Conversation
…stamp. SyncRes thread local storage includes a map of failed auths which was only cleaned if a specific IP was contacted again and that contact succeeded. Persistent failing auths or auths that are never tried again remained in the map. While here add code to dump the failed servers map. Might (partially?) solve PowerDNS#7771.
Looks good! Don't we have the same situation with the ednsstatus map, by the way? We do need to keep EDNS status entries for quite a bit longer, though, I would say several hours, perhaps one day, since the last time we used it.
I'll take a look at the EDNS status map as well.
The mode is set to
I expected a longer validity but OK.
Switch away from a ref to a pointer because of above and use modern init for EDNSStatus.
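(Illustrative sketch, not the PR's code: a plain std::map with a string key stands in for the real thread-local map. It shows why the code re-fetches the entry through a pointer instead of keeping a reference once entries can be pruned: the old reference may point at an erased node, while a fresh lookup finds or re-creates the entry.)

#include <iostream>
#include <map>
#include <string>

struct EDNSStatusDemo {
  // "modern init": default member initializers instead of a hand-written constructor
  int mode{0};
  long modeSetAt{0};
};

int main() {
  std::map<std::string, EDNSStatusDemo> ednsstatus;
  EDNSStatusDemo* entry = &ednsstatus["192.0.2.1"]; // first lookup creates the entry

  // Something may erase entries in between (a periodic prune, or the resolve path itself)...
  ednsstatus.clear(); // stand-in for the prune; 'entry' now dangles

  // ...so re-fetch through the map before touching the status again.
  entry = &ednsstatus["192.0.2.1"];
  entry->mode = 1;
  std::cout << ednsstatus.size() << '\n'; // prints 1
  return 0;
}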
  size_t size() const
  {
    return d_cont.size();
  }

  void prune(time_t cutoff) {
    for (auto it = d_cont.begin(); it != d_cont.end(); ) {
Any idea of how many entries we usually have in there? If it might be a lot perhaps we need to use a multi_index keyed on the last field to avoid having to scan the whole map.
{ | ||
for (auto it = t_sstorage.ednsstatus.begin(); it != t_sstorage.ednsstatus.end(); ) { | ||
if (it->second.modeSetAt <= cutoff) { | ||
it = t_sstorage.ednsstatus.erase(it); |
Same question about the number of entries?
On one of our small recursors (sub-1kqps) that has only been up for a week, we have 200,000 EDNS status entries, if that data point helps.
With cleanup, I suppose it will follow the same trend as the nsspeed size, which in that instance is an order of magnitude smaller.
In my test runs (with the purging), continuously resolving
Are these numbers from before, without this PR? Or does it stay that high even when regularly purged?
The numbers I reported are with this PR included, so with purging. I'll look into the multi-index thing soon. I currently do not want to stop this test run, to see how it is doing in the long run (multiple days). After that I can add some code to see how fast the purging is and compare it with a multi-index solution.
OK! Let me know if you are not familiar with how multi-index works and want some pointers (no pun intended)!
I'm not familiar with multi-index, but I'd like to give it a shot myself as a start. BTW, the numbers above are the accumulated ones for a rec with 16 threads. So to get per-thread estimates you need to divide by 16.
@rgacogne Can you take a look to see if I'm doing this right? If so we might need to generalize this, since the other tables are purged in a similar manner.
Force-pushed from e1f41c4 to 98b946e.
Friday I became smart enough to have a periodic
While the RSS creep is visible here (2507572 -> 2740668), it must be noted that the VSZ (5415688) does not grow.
There is no incentive for the kernel to evict physical pages, even if they are unused. Add to that that glibc malloc hangs on to pages; it does not unmap them AFAIK. To summarize:
Next steps:
On the longer term:
Another observation: I applied memory pressure to the point where the machine started needing swap by running a program allocating and touching lots of memory. I saw almost the whole rec process being paged out to swap. After I stopped the memory-eating process, about 1.1G of rec was paged in again after 10min or so. The whole time VSZ remained constant. Now the process is at:
So the short-term working set of this process is about 1.1G.
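(For reference, and not part of the PR: RSS and VSZ figures like the ones above can be sampled from /proc/self/status on Linux, e.g. with a small stand-alone helper like this sketch.)

#include <fstream>
#include <iostream>
#include <string>

// Print VmRSS and VmSize (both in kB) for the current process,
// i.e. the RSS and VSZ numbers discussed above.
int main() {
  std::ifstream status("/proc/self/status");
  std::string line;
  while (std::getline(status, line)) {
    if (line.rfind("VmRSS:", 0) == 0 || line.rfind("VmSize:", 0) == 0) {
      std::cout << line << '\n';
    }
  }
  return 0;
}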
Measurement shows that a linear scan of the thread's ednsstatus map takes about 0.024s for 123515 entries. This was on a 2-thread test run, where (with the given test setup) it is expected to grow to about 300k entries per thread. For the 16-thread case the table size would be smaller. Testing that setup I'm seeing cleanup times around 4-23ms for 24k entries. Note that the variance is higher here since my test machine is a 4-core one, so running 16 threads on it will introduce (much) more scheduling noise. So I think that a multi_index solution is not really needed at the moment, and I wonder if it's worth the added complexity. I'll do some test runs with the multi_index one next. I think a good plan is:
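(A sketch of how such a cleanup timing could be taken; the steady_clock wrapper is illustrative and assumes a container exposing size() and prune(time_t), like the fails_t shown below.)

#include <chrono>
#include <ctime>
#include <iostream>

// Illustrative only: time one prune pass and report how many entries were dropped.
template <typename Container>
void timedPrune(Container& cont, time_t cutoff) {
  const auto before = cont.size();
  const auto start = std::chrono::steady_clock::now();
  cont.prune(cutoff);
  const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
  std::cout << "pruned " << (before - cont.size()) << " of " << before
            << " entries in " << elapsed.count() << "s" << std::endl;
}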
I'll be squashing the multi_index code out; saving a copy here for future reference.

// Needs the Boost.MultiIndex headers (<boost/multi_index_container.hpp>,
// <boost/multi_index/ordered_index.hpp>, <boost/multi_index/member.hpp>)
// and a using-declaration for boost::multi_index for the unqualified names below.
class fails_t : public boost::noncopyable
{
public:
  typedef unsigned long counter_t;
  struct value_t {
    value_t(const ComboAddress &a) : address(a) {}
    ComboAddress address;
    mutable counter_t value{0};
    time_t last{0};
  };

  typedef multi_index_container<value_t,
    indexed_by<
      ordered_unique<tag<ComboAddress>, member<value_t, ComboAddress, &value_t::address>>,
      ordered_non_unique<tag<time_t>, member<value_t, time_t, &value_t::last>>
    >> cont_t;

  cont_t getMap() const {
    return d_cont;
  }

  counter_t value(const ComboAddress& t) const
  {
    auto i = d_cont.find(t);
    if (i == d_cont.end()) {
      return 0;
    }
    return i->value;
  }

  counter_t incr(const ComboAddress& t, const struct timeval & now)
  {
    auto i = d_cont.insert(t).first;
    if (i->value < std::numeric_limits<counter_t>::max()) {
      i->value++;
    }
    auto &ind = d_cont.get<ComboAddress>();
    ind.modify(i, [t = now.tv_sec](value_t &val) { val.last = t; });
    return i->value;
  }

  void clear(const ComboAddress& a)
  {
    d_cont.erase(a);
  }

  void clear()
  {
    d_cont.clear();
  }

  size_t size() const
  {
    return d_cont.size();
  }

  void prune(time_t cutoff) {
    auto &ind = d_cont.get<time_t>();
    ind.erase(ind.begin(), ind.upper_bound(cutoff));
  }

private:
  cont_t d_cont;
};

struct EDNSStatus {
  EDNSStatus(const ComboAddress &arg) : address(arg) {}
  ComboAddress address;
  time_t modeSetAt{0};
  mutable enum EDNSMode { UNKNOWN=0, EDNSOK=1, EDNSIGNORANT=2, NOEDNS=3 } mode{UNKNOWN};
};

struct ednsstatus_t : public multi_index_container<EDNSStatus,
  indexed_by<
    ordered_unique<tag<ComboAddress>, member<EDNSStatus, ComboAddress, &EDNSStatus::address>>,
    ordered_non_unique<tag<time_t>, member<EDNSStatus, time_t, &EDNSStatus::modeSetAt>>
  >> {
  void reset(index<ComboAddress>::type &ind, iterator it) {
    ind.modify(it, [](EDNSStatus &s) { s.mode = EDNSStatus::EDNSMode::UNKNOWN; s.modeSetAt = 0; });
  }
  void setMode(index<ComboAddress>::type &ind, iterator it, EDNSStatus::EDNSMode mode) {
    it->mode = mode;
  }
  void setTS(index<ComboAddress>::type &ind, iterator it, time_t ts) {
    ind.modify(it, [ts](EDNSStatus &s) { s.modeSetAt = ts; });
  }
};
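(A small usage sketch of the saved fails_t above; the function and addresses are made up for illustration, and it assumes ComboAddress from iputils.hh plus the Boost.MultiIndex headers are available.)

#include <sys/time.h>

// Illustrative only: record failures, clear on success, prune old entries.
void exampleFailsUsage() {
  fails_t fails;
  const ComboAddress server("192.0.2.53");

  struct timeval now;
  gettimeofday(&now, nullptr);

  fails.incr(server, now);   // failed contact: bump the counter, stamp 'last'
  fails.incr(server, now);   // fails.value(server) is now 2

  fails.clear(server);       // a successful contact removes the entry

  // Periodic housekeeping: drop entries whose 'last' is older than one hour.
  fails.prune(now.tv_sec - 3600);
}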
Force-pushed from 98b946e to 90e8ea7.
This assumes that the number of authoritative servers contacted by a thread decreases with the number of threads, which might be true in the long term. It's clearly not true when
OK, I can see how that makes sense. I'm still a bit worried about the impact of linear scanning in real-world setups, though. We have learned in the past that things that were negligible in small setups could have dramatic consequences in large-scale ones, like metrics reporting, pipes being blocking and so on. So I think we really should get the multi_index version in master quickly, so that we have a solution ready if the linear scan proves to be a problem.
Agreed. But note that the 600k EDNS status map size was an observed aggregate number for a 16-thread test run.
@rgacogne can you take a last look? esp. the
@@ -556,32 +576,35 @@ int SyncRes::asyncresolveWrapper(const ComboAddress& ip, bool ednsMANDATORY, con
  else {
    ret=asyncresolve(ip, sendQname, type, doTCP, sendRDQuery, EDNSLevel, now, srcmask, ctx, d_outgoingProtobufServers, d_frameStreamServers, luaconfsLocal->outgoingProtobufExportConfig.exportTypes, res, chained);
  }
  // ednsstatus might be cleared, so do a new lookup
  ednsstatus = &t_sstorage.ednsstatus[ip]; // does this include port? YES
  mode = &ednsstatus->mode;
I wonder if we should update oldmode as well here? The only difference it would make is that if the status has already been updated in the meantime, we will set modeSetAt again. It's a matter of seconds, though, so it probably doesn't matter.
Short description
SyncRes thread local storage includes a map of failed auths which was only cleaned if a specific IP was contacted again and that contact succeeded. Persistent failing auths or auths that are never tried again remained in the map.
While here add code to dump the failed servers map. Might (partially?) solve #7771.
Checklist
I have: