New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
snmp plugin: double free or heap corruption #2291
Comments
|
Hi @cullorblind, thank you very much for reporting this issue! Is there any chance that you could create a stack trace of this problem? The Core file wiki page has steps to walk you through the process. Thanks and best regards, |
|
I think I got it here. Let me know if you need any more info. |
|
That's perfect, thanks @cullorblind! This shows that the problem is triggered by a call to |
|
The current implementation always returns Am I right in assuming that you're using libsnmp30 5.7.2.1+dfsg-1~dfsg? |
|
I don't have the extra ~dfsg on mine, but the rest matches exactly. |
|
Hi! It looks like Otherwise the Is I'm right? |
|
As I understand sources, second arg of Proofs: The Other usage examples of https://github.com/hardaker/net-snmp/blob/a7bc508a8930a484c3a666cbea4ab226d2a3aa88/perl/SNMP/SNMP.xs#L1087 The question is: How that worked before? I think in case of this topic, break (and exit from In other cases, exit from IMHO, the solution is to move Or, may be quite more clear solution (patch against 5.7): |
|
I don't know if this helps, but I managed to get a core file from collectd while it was running instead of just "collectd -T". One of our APs was rebooted and before any clients reconnected, collectd polled and crashed/restarted a few times until clients reconnected. It looks mostly the same without the snmp_free_pdu section. |
That is the same line L1496 of Can you please try to test mine proposed patch? |
|
I finally got a VM spun up where I could compile/test this. It seems to be working. The "empty" tables returned on APs with no clients now log to syslog properly without crashing. I've manually copied the snmp.so plugin over to my collector for a proper stress test with 204 APs. ~12 of those currently have no clients and it's running okay so far. |
|
Could you please show
or something similar to show which data is present in case of AP with 0 clients. I think we also need that output for each 'table' separately (like requests collectd does): Sorry, I'm not familiar enough with SNMP, so you required to set correct MIB/OID in these commands. I'm expect to get something like this: Full tree Subtree(s) Thanks. |
That message has nothing good as it shows what IMHO these cases are rare, but I also think there is two mistakes, and only one was fixed.
|
|
Here is an AP with 0 clients: (they only respond to v1) Here's another with 2 clients just for comparison: |
|
Wondering if snmptable would be useful. It knows the difference between tables/not tables and also knows when a table is empty. Edit: nevermind that. It only recognizes tables if the MIBs are installed. Just tested this from another box and it asks "Was that a table?" on .7 too.. |
|
Hi! What about direct request of appropriate 'tables'? I.e. table should be requested not from but from Please check these commands: They might be interesting because Collectd queries values / instance names from these OID(s) directly, and does not use top-level |
|
Here's the direct requests. |
|
Ok. Thanks. Can you please try to add some debugging to your already patched code? I think what exit from That is my last request for today, and it does not require fast answer :-) I will try to configure my Collectd to request some empty tables too. :-) |
|
Here's the updated log output Edit: No problem. Thanks for your work on a great plugin. And I think pulling just about any empty OID should replicate the problem. |
|
You reply super-fast ;-) Can you please try it? |
|
I'm kindof procrastinating other work, and this is fairly easy to do right now. =) |
|
Yeah! I found OIDs which reply the same result like yours
In my case that is I have put these OIDs into config: In case of collectd-5.1 which is installed on that machine, there is deadloop with this config. ... and collectd is unable to terminate correctly. The tcpdump of this: Collectd-5.7 dies with "Type # 5" which you see in your logs is ASN_NULL. |
|
@octo, what do you think about all this? Should we only patch The issue which I see is: collectd does not check for That means what no variable values will be in this net-snmp tries to solve this by 'fixing' the PDU (removing the error-prone OID) and do retry. Should collectd do the same? |
snmp_sess_synch_response() always frees request PDU, in both case of request error and success. If error condition occurs inside of `while (status == 0)` loop, double free of `req` happens. Issue: collectd#2291
snmp_sess_synch_response() always frees request PDU, in both case of request error and success. If error condition occurs inside of `while (status == 0)` loop, double free of `req` happens. Issue: collectd#2291
Collectd does not check for `res->errstat` value after `snmp_sess_synch_response()` call. In case of error, there is no any data in `res->variables` actually, but variables are tried to be processed as usual. Suffix calculation will fail, so all subtrees will be marked as failed, not only one subtree which caused an error. The csnmp_instance_list_add() call will fail too, and, as result, `csnmp_read_table` will finish it's work without any data submission. The log message like "snmp plugin: host HOSTNAME: csnmp_instance_list_add failed", which is put into logs in this case, also has no enough diagnostic data. Added code to proper check for `res->errstat` and to try to get available data. Issue: collectd#2291
|
Hi! Sorry for commits noise. Let's check f3168df changes, which added the check of Before the patch: Collectd stops after first response failed. After patch: As we see, Collectd handles an exception and re-request the rest of data available. But it appears what rest of data will be useless in most of cases - datatype will not be submitted with any 'column' completely missing. At other hand, we can show error message about wrong OID / missing data and stop query of rest of data too, like it happens before this patch. What do you think? |
|
Oh boy. Looking at the sources of The first block in ss = snmp_sess_session(sessp);
if (ss == NULL) {
return STAT_ERROR;
}So clearly Then we come to this block: if (snmp_sess_send(sessp, pdu) == 0) {
snmp_free_pdu(pdu);
state->status = STAT_ERROR;
} else {
state->waiting = 1;
state->reqid = pdu->reqid;
}So one of two things happens:
I'm very sad to say that I cannot explain that stacktrace. I didn't find a bug report nor a commit which seems to have fixed this, though I don't have access to Best regards, |
|
The So, there is no problems with This piece of |
Collectd does not check for `res->errstat` value after `snmp_sess_synch_response()` call. In case of error, there is no any data in `res->variables` actually, but variables are tried to be processed as usual. Suffix calculation will fail, so all subtrees will be marked as failed, not only one subtree which caused an error. The csnmp_instance_list_add() call will fail too, and, as result, `csnmp_read_table` will finish it's work without any data submission. The log message like "snmp plugin: host HOSTNAME: csnmp_instance_list_add failed", which is put into logs in this case, also has no enough diagnostic data. Added code to proper check for `res->errstat` and to try to get available data. Issue: collectd#2291
snmp_sess_synch_response() always frees request PDU, in both case of request error and success. If error condition occurs inside of `while (status == 0)` loop, double free of `req` happens. Issue: #2291 Signed-off-by: Florian Forster <octo@collectd.org>
Issue: #2291 Signed-off-by: Florian Forster <octo@collectd.org>
|
This has been fixed. |
|
This issue has been assigned CVE-2017-16820 |
Expected behavior
I am polling connected clients across about 150 ubiquiti access points from the SNMP Station table. A few of the Access points have 0 clients, and therefore return nothing. If the instance string returns nothing it should gracefully log(optional) and move on to the next task.
Actual behavior
After much testing I narrowed it down to a single access point which I realized had no clients attached. With only that access point configured I get the double free or corruption error.
collectd -T
*** Error in `collectd': double free or corruption (!prev): 0x000000000236dd10 ***
Aborted
If the instance OID returns no data it dies here.
Configs to reproduce
my_types.db
ubnt_rm5_sta signal:GAUGE:U:U, noisefloor:GAUGE:U:U, ccq:GAUGE:U:U, amq:GAUGE:U:U, amc:GAUGE:U:Usnmp-rm5.conf
Extra info from the mib for what I'm polling.
Let me know if you need any more information.
edit:fixed formatting on files/output.
The text was updated successfully, but these errors were encountered: