New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ipmi plugin: Implemented IPMI LAN connection and plugin improved #2024
Conversation
00c365b
to
cac7575
Compare
|
ping |
|
Is this ever going to get merged into master? |
|
I will rebase this when needed by somebody, just ping me. |
|
Ping ! ;-) |
|
Rebased. Please, put your feedbacks here. That will increase chanses this to be merged. |
|
Wow, thanks for rebasing it so quickly! |
* Added instances support * Removed the sensor removal in case of reading errors * Added the lock to do not request new readings until previous reading is complete * Disabled SDRs local cache * Added the check for sensors 'is readable' flag * Added the check for sensor event reading type (if sensor is discrete or not) * Added checks for sensor states 'scanning disabled' and 'initial update in progress' * Changes in error reporting
|
Patch rebased to current master, updated with respect of changes recently added to ipmi.c |
| INFO("ipmi plugin: %s", msg); | ||
| break; | ||
| case IPMI_LOG_WARNING: | ||
| NOTICE("ipmi plugin: %s", msg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this not also be a warning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, Maryam!
Thanks for your feedback.
The OpenIPMI/ipmi_log.h has the following explanation of error reporting levels:
/* General information about something that happens. */
IPMI_LOG_INFO,
/* An internal error (not reported directly to the user) occurred,
but it's not a big deal, the system can still operate. */
IPMI_LOG_WARNING,
/* An internal error (not reported directly to the user) occured,
and it will affect the operation of the system. This system
will still operation, but probably in a degraded mode. */
IPMI_LOG_SEVERE,
/* An internal error occured, and it is dangerous for the system
to keep operating. */
IPMI_LOG_FATAL,
/* When returning errors to the user, this will often be used to
report the general cause of the error if the cause would be
difficult to determine from just the error return code. */
IPMI_LOG_ERR_INFO,
By this explanation, and by typical use of Collectd WARNING() (), I think NOTICE() suits better.
Of course, I can be wrong...
It would be nice if Collectd will have the similar table of error logging level explanation. (I didn't find it.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough
src/ipmi.c
Outdated
| } else { | ||
| char errbuf[128] = {0}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you sure 128 is big enough? most other errbufs use 1024... I would also suggest replacing the magic # with a #defined value
#define ERR_BUF_SIZE 1024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was inspired by OpenIPMI examlpes code:
https://github.com/wrouesnel/openipmi/blob/master/sample/waiter_sample.c#L285
https://github.com/wrouesnel/openipmi/blob/master/cmdlang/ipmish.c#L595
https://github.com/wrouesnel/openipmi/blob/master/sample/eventd.c#L53
https://github.com/wrouesnel/openipmi/blob/master/include/OpenIPMI/ipmi_err.h#L161
I agree with your suggestion, that would be better to remove the magic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sources for ipmi_get_error_string return values are:
- strerror() + 4 bytes of prefix ("OS: ")
- ipmi_get_cc_string() (ipmi_ccodes[] array) + 6 bytes prefix - fits 128 bytes
- rmcpp_error_codes[] + 7 bytes prefix - fits 128 bytes
- sol_error_codes[] + 5 bytes prefix - fits 128 bytes
- "Success (No error)" string - fits 128 bytes
- "Unknown: Unknown" string - fits 128 bytes
The longest is strerror(). Will 1020 (1024 - 4) be enough for strerror()?
| return; | ||
| } | ||
|
|
||
| vl.values = &(value_t){.gauge = value}; | ||
| vl.values_len = 1; | ||
|
|
||
| if (st->host != NULL) | ||
| sstrncpy(vl.host, st->host, sizeof(vl.host)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think vl.host gets filled at a higher level usually? not sure if it's explicitly needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right - vl.host can be filled by hostname_g at higher level.
But in this case we need to set vl.host of the metric to the value set in configuration, not to the hostname_g value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool
src/ipmi.c
Outdated
| c_ipmi_error("domain_connection_change_handler", err); | ||
|
|
||
| if (!still_connected) | ||
| return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would a simple info/debug message not be useful here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At practice, that will be reported by a c_ipmi_error() call at several lines above.
Also, that will be reported by c_ipmi_log() handler too.
An example:
[2017-10-08 19:09:49] ipmi plugin: one 0 ipmi_lan.c(lost_connection): Connection 0 to the BMC is down
[2017-10-08 19:09:49] ipmi plugin: one 0 ipmi_lan.c(lost_connection): All connections to the BMC are down
[2017-10-08 19:09:49] ipmi plugin: domain_connection_change_handler failed for `one`: OS: Connection timed out
I have added a code to send Collectd notification in case of connection lost/restored and will publish it in updated PR.
src/ipmi.c
Outdated
| sfree(st); | ||
| } /* void c_ipmi_free_instance */ | ||
|
|
||
| void c_ipmi_add_instance(c_ipmi_instance_t *instance) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about using a llist_t for instances?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there is a strong opinion/rules about this?
Why would you prefer a list instead of chain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no clear-cut advise/guideline for or against the generic linked list. I'd say that if you need to look up by (string) key or you'd need a double-linked list, then the generic implementation can abstract away some complexity.
HTH, —octo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say that if you need to look up by (string) key or you'd need a double-linked list
There is no such need here.
* Do not request reading unless domain is connected * Report instance name in error messages
|
Updated. |
|
Also I'm going to add an option to disable all analog sensors polling. I have an quite old Intel SR2500 platform wich IPMI MC is stuck due to network polling. Now it does not respond to any requests - remote or local and server needs to be completely powered off (physically unplugged from outlet) to reboot MC. |
fcacd70
to
2ff2422
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @rpv-tomsk,
thanks for all you work on this! The code looks really good! I left quite some review comments, but the only blocking ones are:
pthread_tis a struct on some platforms and we need to change some initialization and can't simply compare with 0.- The init() callback may be called more than once and the function needs to be able to handle this.
- The config needs to mention the legacy support.
Details for all of these are inline.
Best regards,
—octo
| The B<ipmi plugin> allows to monitor server platform status using the Intelligent | ||
| Platform Management Interface (IPMI). Local and remote interfaces are supported. | ||
|
|
||
| The plugin configuration consists of one or more B<Instance> blocks which |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should mention that Instance blocks take a string argument. Maybe a synopsis of this plugin's config would be useful.
src/ipmi.c
Outdated
| #if COLLECT_DEBUG | ||
| case IPMI_LOG_DEBUG_START: | ||
| case IPMI_LOG_DEBUG: | ||
| fprintf(stderr, "ipmi plugin: %s\n", msg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not DEBUG()?
| } else { | ||
| char errbuf[ERR_BUF_SIZE] = {0}; | ||
| ipmi_get_error_string(err, errbuf, sizeof(errbuf) - 1); | ||
|
|
||
| if (IPMI_IS_IPMI_ERR(err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to differentiate these cases when we're printing the same info message in all cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These messages are different in the second line of output format:
"%s." vs "%s (%#x)." and in a source of error code (IPMI_GET_OS_ERR(err) / IPMI_GET_SOL_ERR(err)).
The ipmi_get_error_string() does not always put error code into its output, so that appended locally for unified and more detailed message.
src/ipmi.c
Outdated
| return -1; | ||
| } | ||
| if (st->connaddr != NULL) { | ||
| char *ip_addrs[1] = {NULL}, *ports[1] = {NULL}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At your option: I find both of the below easier to read:
You can either initialize the arrays:
char *ip_addrs[] = {
strdup(st->connaddr),
};or simply use pointers:
char *ip_addr = strdup(st->connaddr);
ipmi_ip_setup_con(&ip_addr, /* … */);There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for this note!
I think there will be a tiny memleak with these strdup().
Also, their return value is not checked and ... they are unneded at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering about the strdup() ;) If they're not needed, all the better, you can just pass &st->connaddr to ipmi_open_domain().
| } | ||
|
|
||
| sensor_list_remove_all(); | ||
| os_handler->free_os_handler(os_handler); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please set os_handler = NULL; just to be on the save side.
| } | ||
|
|
||
| ipmi_open_option_t open_option[1] = {[0] = {.option = IPMI_OPEN_OPTION_ALL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At your option:
ipmi_open_option_t opts[] = {
{.option = IPMI_OPEN_OPTION_ALL, .ival = 1},
#ifdef IPMI_OPEN_OPTION_USE_CACHE
/* OpenIPMI-2.0.17 and later */
{.option = IPMI_OPEN_OPTION_USE_CACHE, .ival = 1},
#endif
};
ipmi_open_domain(/* … */, opts, STATIC_ARRAY_SIZE(opts), /* … */);There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That looks excellent, but I can't compile this...
Code:
ipmi_open_option_t opts[] = {
{.option = IPMI_OPEN_OPTION_ALL, .ival = 1},
#ifdef IPMI_OPEN_OPTION_USE_CACHE
/* OpenIPMI-2.0.17 and later: Disable SDR cache in local file */
{.option = IPMI_OPEN_OPTION_USE_CACHE, .ival = 0},
#endif
};
provides an error message:
src/ipmi.c:899: error: unknown field 'ival' specified in initializer
cc1: warnings being treated as errors
src/ipmi.c:899: error: missing braces around initializer
src/ipmi.c:899: error: (near initialization for 'opts[0].<anonymous>')
src/ipmi.c:902: error: unknown field 'ival' specified in initializer
I'm completely unfamiliar with a such syntax, sorry...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, the struct has an unnamed union. This should work:
ipmi_open_option_t opts[] = {
{.option = IPMI_OPEN_OPTION_ALL, {.ival = 1}},
#ifdef IPMI_OPEN_OPTION_USE_CACHE
/* OpenIPMI-2.0.17 and later */
{.option = IPMI_OPEN_OPTION_USE_CACHE, {.ival = 0}},
#endif
};
ipmi_open_domain(/* … */, opts, STATIC_ARRAY_SIZE(opts), /* … */);There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for update!
src/ipmi.c
Outdated
|
|
||
| ipmi_init(os_handler); | ||
| ip_addrs[0] = strdup(st->connaddr); | ||
| ports[0] = strdup(IPMI_LAN_STD_PORT_STR); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most other plugins with an Address option also have a Port option to let the user override the default. Would that make sense for the ipmi plugin, too?
src/ipmi.c
Outdated
| @@ -476,6 +605,8 @@ static int sensor_threshold_event_handler( | |||
| enum ipmi_value_present_e value_present, unsigned int raw_value, | |||
| double value, void *cb_data, ipmi_event_t *event) { | |||
|
|
|||
| c_ipmi_instance_t *st = (c_ipmi_instance_t *)cb_data; | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, at your option: you can implicitly cast void* to any other pointer type, i.e.:
c_ipmi_instance_t *st = cb_data;Here and below.
src/ipmi.c
Outdated
| notification_t n = {NOTIF_OKAY, cdtime(), "", "", "ipmi", "", "", "", NULL}; | ||
|
|
||
| sstrncpy(n.host, hostname_g, sizeof(n.host)); | ||
| sstrncpy(n.host, (st->host != NULL) ? st->host : hostname_g, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This repeats at least two more times. Would it make sense to have a function that initializes a notification_t based on a c_ipmi_instance_t *? I.e.:
static notification_t notification_init(c_ipmi_instance_t const *st);
/* … */
notification_t n = notification_init(st);There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about notification_init() from daemon/common.c?
It used only once, in virt plugin.
|
|
||
| If a IPMI connection state changes after initialization time of a minute | ||
| a notification is sent. Defaults to B<false>. | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a note about legacy support, i.e. that any option other than Instance will trigger legacy config handling which skipts the <Instance> block and that this support will go away in the next major version of collectd.
|
Hi, Florian! Thanks for a review! I made some changes and push it here. |
8bce656
to
4207aaf
Compare
src/ipmi.c
Outdated
| return -1; | ||
| if (st->connaddr != NULL) { | ||
| status = ipmi_ip_setup_con( | ||
| &st->connaddr, (char * [1]){IPMI_LAN_STD_PORT_STR}, 1, st->authtype, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, at your option: I find &(char *){IPMI_LAN_STD_PORT_STR} easier to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to find this syntax, but with no success.
Thanks, amended.
src/ipmi.c
Outdated
| c_ipmi_instance_t *st; | ||
| char callback_name[3 * DATA_MAX_NAME_LEN]; | ||
|
|
||
| os_handler = ipmi_posix_thread_setup_os_handler(SIGIO); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
init() is called every time the "kstat" structure changes, e.g. when devices are added to or removed from the system. As I said, it's a legacy and a bit broken and we should fix it, but it's out of scope for this PR.
a5ae096
to
e04af9b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thank you!
|
|
||
| static notification_t c_ipmi_notification_init(c_ipmi_instance_t const *st, | ||
| int severity) { | ||
| notification_t n = {severity, cdtime(), "", "", "ipmi", "", "", "", NULL}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At your option: this would be a bit nicer with a named struct initializer:
notification_t n = {
.severity = severity,
.time = cdtime(),
.plugin = "ipmi",
};All the fields that are not explicitly initialized are guaranteed to be set to zero / NULL.
First |
|
Yeah! Many thanks to you for all what you have done for this! |
|
@rpv-tomsk congrats on the first merge :) Keep up the good work! |
|
Thanks, Maryam! |
Changes:
Please review and merge.
Superseedes #2023