Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ipmi plugin: Implemented IPMI LAN connection and plugin improved #2024

Merged
merged 9 commits into from Oct 10, 2017

Conversation

rpv-tomsk
Copy link
Contributor

@rpv-tomsk rpv-tomsk commented Nov 5, 2016

Changes:

  • Added instances support and IPMI LAN connection option
  • Added a notification on connection state change (notify about connection issues)
  • Removed the sensor removal in case of reading errors
  • Added the lock to do not request new readings until previous reading is complete
  • Disabled SDRs local cache
  • Added the check for sensor 'is readable' flag
  • Added the check for sensor event reading type (if sensor is discrete or not)
  • Added checks for sensor states 'scanning disabled' and 'initial update in progress'
  • Changes in error reporting

Please review and merge.

Superseedes #2023

@rpv-tomsk rpv-tomsk force-pushed the master-ipmi branch 2 times, most recently from 00c365b to cac7575 Compare November 5, 2016 07:29
@rpv-tomsk
Copy link
Contributor Author

ping

@3add3287
Copy link

3add3287 commented Aug 4, 2017

Is this ever going to get merged into master?

@rpv-tomsk
Copy link
Contributor Author

I will rebase this when needed by somebody, just ping me.

@3add3287
Copy link

3add3287 commented Aug 4, 2017

Ping ! ;-)

@rpv-tomsk
Copy link
Contributor Author

Rebased.

Please, put your feedbacks here. That will increase chanses this to be merged.

@3add3287
Copy link

3add3287 commented Aug 4, 2017

Wow, thanks for rebasing it so quickly!
Now the question is: what does it take the powers that be to merge it into collectd master?

* Added instances support
* Removed the sensor removal in case of reading errors
* Added the lock to do not request new readings until previous reading is complete
* Disabled SDRs local cache
* Added the check for sensors 'is readable' flag
* Added the check for sensor event reading type (if sensor is discrete or not)
* Added checks for sensor states 'scanning disabled' and 'initial update in progress'
* Changes in error reporting
@rpv-tomsk
Copy link
Contributor Author

Patch rebased to current master, updated with respect of changes recently added to ipmi.c
Unfortunately, I can't do nothing with CI build failure. That is a 'timestamp is in future' problem.

@octo octo added this to the 5.8 milestone Oct 8, 2017
@octo octo added the Feature label Oct 8, 2017
INFO("ipmi plugin: %s", msg);
break;
case IPMI_LOG_WARNING:
NOTICE("ipmi plugin: %s", msg);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this not also be a warning?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, Maryam!

Thanks for your feedback.

The OpenIPMI/ipmi_log.h has the following explanation of error reporting levels:

/* General information about something that happens. */
    IPMI_LOG_INFO,

    /* An internal error (not reported directly to the user) occurred,
       but it's not a big deal, the system can still operate. */
    IPMI_LOG_WARNING,

    /* An internal error (not reported directly to the user) occured,
       and it will affect the operation of the system.  This system
       will still operation, but probably in a degraded mode. */
    IPMI_LOG_SEVERE,

    /* An internal error occured, and it is dangerous for the system
       to keep operating. */
    IPMI_LOG_FATAL,

    /* When returning errors to the user, this will often be used to
       report the general cause of the error if the cause would be
       difficult to determine from just the error return code. */
    IPMI_LOG_ERR_INFO,

By this explanation, and by typical use of Collectd WARNING() (), I think NOTICE() suits better.
Of course, I can be wrong...

It would be nice if Collectd will have the similar table of error logging level explanation. (I didn't find it.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough

src/ipmi.c Outdated
} else {
char errbuf[128] = {0};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you sure 128 is big enough? most other errbufs use 1024... I would also suggest replacing the magic # with a #defined value

#define ERR_BUF_SIZE 1024

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sources for ipmi_get_error_string return values are:

  1. strerror() + 4 bytes of prefix ("OS: ")
  2. ipmi_get_cc_string() (ipmi_ccodes[] array) + 6 bytes prefix - fits 128 bytes
  3. rmcpp_error_codes[] + 7 bytes prefix - fits 128 bytes
  4. sol_error_codes[] + 5 bytes prefix - fits 128 bytes
  5. "Success (No error)" string - fits 128 bytes
  6. "Unknown: Unknown" string - fits 128 bytes

The longest is strerror(). Will 1020 (1024 - 4) be enough for strerror()?

return;
}

vl.values = &(value_t){.gauge = value};
vl.values_len = 1;

if (st->host != NULL)
sstrncpy(vl.host, st->host, sizeof(vl.host));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think vl.host gets filled at a higher level usually? not sure if it's explicitly needed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right - vl.host can be filled by hostname_g at higher level.
But in this case we need to set vl.host of the metric to the value set in configuration, not to the hostname_g value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool

src/ipmi.c Outdated
c_ipmi_error("domain_connection_change_handler", err);

if (!still_connected)
return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would a simple info/debug message not be useful here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At practice, that will be reported by a c_ipmi_error() call at several lines above.
Also, that will be reported by c_ipmi_log() handler too.

An example:

[2017-10-08 19:09:49] ipmi plugin: one 0 ipmi_lan.c(lost_connection): Connection 0 to the BMC is down
[2017-10-08 19:09:49] ipmi plugin: one 0 ipmi_lan.c(lost_connection): All connections to the BMC are down
[2017-10-08 19:09:49] ipmi plugin: domain_connection_change_handler failed for `one`: OS: Connection timed out

I have added a code to send Collectd notification in case of connection lost/restored and will publish it in updated PR.

src/ipmi.c Outdated
sfree(st);
} /* void c_ipmi_free_instance */

void c_ipmi_add_instance(c_ipmi_instance_t *instance) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about using a llist_t for instances?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there is a strong opinion/rules about this?
Why would you prefer a list instead of chain?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no clear-cut advise/guideline for or against the generic linked list. I'd say that if you need to look up by (string) key or you'd need a double-linked list, then the generic implementation can abstract away some complexity.

HTH, —octo

Copy link
Contributor Author

@rpv-tomsk rpv-tomsk Oct 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say that if you need to look up by (string) key or you'd need a double-linked list

There is no such need here.

* Do not request reading unless domain is connected
* Report instance name in error messages
@rpv-tomsk
Copy link
Contributor Author

Updated.

@rpv-tomsk
Copy link
Contributor Author

Also I'm going to add an option to disable all analog sensors polling.
So, the Collectd will handle only SEL and will send notifications only, without metrics.
That will be an good use-case of Collectd as IPMI events translator.

I have an quite old Intel SR2500 platform wich IPMI MC is stuck due to network polling. Now it does not respond to any requests - remote or local and server needs to be completely powered off (physically unplugged from outlet) to reboot MC.

@rpv-tomsk rpv-tomsk force-pushed the master-ipmi branch 2 times, most recently from fcacd70 to 2ff2422 Compare October 9, 2017 19:21
Copy link
Member

@octo octo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @rpv-tomsk,

thanks for all you work on this! The code looks really good! I left quite some review comments, but the only blocking ones are:

  • pthread_t is a struct on some platforms and we need to change some initialization and can't simply compare with 0.
  • The init() callback may be called more than once and the function needs to be able to handle this.
  • The config needs to mention the legacy support.

Details for all of these are inline.

Best regards,
—octo

The B<ipmi plugin> allows to monitor server platform status using the Intelligent
Platform Management Interface (IPMI). Local and remote interfaces are supported.

The plugin configuration consists of one or more B<Instance> blocks which
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should mention that Instance blocks take a string argument. Maybe a synopsis of this plugin's config would be useful.

src/ipmi.c Outdated
#if COLLECT_DEBUG
case IPMI_LOG_DEBUG_START:
case IPMI_LOG_DEBUG:
fprintf(stderr, "ipmi plugin: %s\n", msg);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not DEBUG()?

} else {
char errbuf[ERR_BUF_SIZE] = {0};
ipmi_get_error_string(err, errbuf, sizeof(errbuf) - 1);

if (IPMI_IS_IPMI_ERR(err))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to differentiate these cases when we're printing the same info message in all cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These messages are different in the second line of output format:

"%s." vs "%s (%#x)." and in a source of error code (IPMI_GET_OS_ERR(err) / IPMI_GET_SOL_ERR(err)).

The ipmi_get_error_string() does not always put error code into its output, so that appended locally for unified and more detailed message.

src/ipmi.c Outdated
return -1;
}
if (st->connaddr != NULL) {
char *ip_addrs[1] = {NULL}, *ports[1] = {NULL};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At your option: I find both of the below easier to read:

You can either initialize the arrays:

char *ip_addrs[] = {
  strdup(st->connaddr),
};

or simply use pointers:

char *ip_addr = strdup(st->connaddr);
ipmi_ip_setup_con(&ip_addr, /* … */);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for this note!

I think there will be a tiny memleak with these strdup().
Also, their return value is not checked and ... they are unneded at all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering about the strdup() ;) If they're not needed, all the better, you can just pass &st->connaddr to ipmi_open_domain().

}

sensor_list_remove_all();
os_handler->free_os_handler(os_handler);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please set os_handler = NULL; just to be on the save side.

}

ipmi_open_option_t open_option[1] = {[0] = {.option = IPMI_OPEN_OPTION_ALL,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At your option:

ipmi_open_option_t opts[] = {
  {.option = IPMI_OPEN_OPTION_ALL, .ival = 1},
#ifdef IPMI_OPEN_OPTION_USE_CACHE
  /* OpenIPMI-2.0.17 and later */
  {.option = IPMI_OPEN_OPTION_USE_CACHE, .ival = 1},
#endif
};
ipmi_open_domain(/* … */, opts, STATIC_ARRAY_SIZE(opts), /* … */);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks excellent, but I can't compile this...
Code:

  ipmi_open_option_t opts[] = {
    {.option = IPMI_OPEN_OPTION_ALL, .ival = 1},
#ifdef IPMI_OPEN_OPTION_USE_CACHE
    /* OpenIPMI-2.0.17 and later: Disable SDR cache in local file */
    {.option = IPMI_OPEN_OPTION_USE_CACHE, .ival = 0},
#endif
  };

provides an error message:

src/ipmi.c:899: error: unknown field 'ival' specified in initializer
cc1: warnings being treated as errors
src/ipmi.c:899: error: missing braces around initializer
src/ipmi.c:899: error: (near initialization for 'opts[0].<anonymous>')
src/ipmi.c:902: error: unknown field 'ival' specified in initializer

I'm completely unfamiliar with a such syntax, sorry...

Copy link
Member

@octo octo Oct 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, the struct has an unnamed union. This should work:

ipmi_open_option_t opts[] = {
  {.option = IPMI_OPEN_OPTION_ALL, {.ival = 1}},
#ifdef IPMI_OPEN_OPTION_USE_CACHE
  /* OpenIPMI-2.0.17 and later */
  {.option = IPMI_OPEN_OPTION_USE_CACHE, {.ival = 0}},
#endif
};
ipmi_open_domain(/* … */, opts, STATIC_ARRAY_SIZE(opts), /* … */);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for update!

src/ipmi.c Outdated

ipmi_init(os_handler);
ip_addrs[0] = strdup(st->connaddr);
ports[0] = strdup(IPMI_LAN_STD_PORT_STR);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most other plugins with an Address option also have a Port option to let the user override the default. Would that make sense for the ipmi plugin, too?

src/ipmi.c Outdated
@@ -476,6 +605,8 @@ static int sensor_threshold_event_handler(
enum ipmi_value_present_e value_present, unsigned int raw_value,
double value, void *cb_data, ipmi_event_t *event) {

c_ipmi_instance_t *st = (c_ipmi_instance_t *)cb_data;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, at your option: you can implicitly cast void* to any other pointer type, i.e.:

c_ipmi_instance_t *st = cb_data;

Here and below.

src/ipmi.c Outdated
notification_t n = {NOTIF_OKAY, cdtime(), "", "", "ipmi", "", "", "", NULL};

sstrncpy(n.host, hostname_g, sizeof(n.host));
sstrncpy(n.host, (st->host != NULL) ? st->host : hostname_g,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This repeats at least two more times. Would it make sense to have a function that initializes a notification_t based on a c_ipmi_instance_t *? I.e.:

static notification_t notification_init(c_ipmi_instance_t const *st);
/* … */
notification_t n = notification_init(st);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about notification_init() from daemon/common.c?
It used only once, in virt plugin.


If a IPMI connection state changes after initialization time of a minute
a notification is sent. Defaults to B<false>.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a note about legacy support, i.e. that any option other than Instance will trigger legacy config handling which skipts the <Instance> block and that this support will go away in the next major version of collectd.

@rpv-tomsk
Copy link
Contributor Author

Hi, Florian!

Thanks for a review! I made some changes and push it here.
I will add Port option support later, most other suggestions was accepted and code improved.

src/ipmi.c Outdated
return -1;
if (st->connaddr != NULL) {
status = ipmi_ip_setup_con(
&st->connaddr, (char * [1]){IPMI_LAN_STD_PORT_STR}, 1, st->authtype,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, at your option: I find &(char *){IPMI_LAN_STD_PORT_STR} easier to read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to find this syntax, but with no success.
Thanks, amended.

src/ipmi.c Outdated
c_ipmi_instance_t *st;
char callback_name[3 * DATA_MAX_NAME_LEN];

os_handler = ipmi_posix_thread_setup_os_handler(SIGIO);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

init() is called every time the "kstat" structure changes, e.g. when devices are added to or removed from the system. As I said, it's a legacy and a bit broken and we should fix it, but it's out of scope for this PR.

Copy link
Member

@octo octo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thank you!


static notification_t c_ipmi_notification_init(c_ipmi_instance_t const *st,
int severity) {
notification_t n = {severity, cdtime(), "", "", "ipmi", "", "", "", NULL};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At your option: this would be a bit nicer with a named struct initializer:

notification_t n = {
  .severity = severity,
  .time = cdtime(),
  .plugin = "ipmi",
};

All the fields that are not explicitly initialized are guaranteed to be set to zero / NULL.

@rpv-tomsk rpv-tomsk merged commit 0587ef9 into collectd:master Oct 10, 2017
@octo
Copy link
Member

octo commented Oct 10, 2017

rpv-tomsk merged commit 0587ef9 into collectd:master

First post merge 🎉

@rpv-tomsk
Copy link
Contributor Author

Yeah! Many thanks to you for all what you have done for this!
And thanks for Collectd project itself!

@maryamtahhan
Copy link
Contributor

@rpv-tomsk congrats on the first merge :) Keep up the good work!

@rpv-tomsk
Copy link
Contributor Author

Thanks, Maryam!

@rpv-tomsk rpv-tomsk deleted the master-ipmi branch July 13, 2019 07:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants