pcie_errors: plugin to read PCIe errors
The pcie_errors plugin collects PCI Express errors from the Device Status register
of the PCI Express Capability structure and, where available, from the Advanced
Error Reporting (AER) Extended Capability. On every read it polls the config space
of PCI Express devices and dispatches a notification for every error found.
An OKAY notification (NOTIF_OKAY) is sent once the error is cleared.

Change-Id: I559f4035df76ab2934969a3c46cd4e98b93aba9a
Signed-off-by: Kamil Wiatrowski <kamilx.wiatrowski@intel.com>
kwiatrox committed Jun 8, 2018
1 parent 0706e1c commit 8146785
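The commit message describes reading the Device Status register from the PCI Express Capability structure in config space. A minimal sketch of that lookup, assuming the config space has already been read into a byte buffer (for example from /sys/bus/pci/devices/<domain:bus:dev.fn>/config): walk the standard capability list to find the PCI Express Capability (ID 0x10) and read Device Status at offset 0x0a within it. The `CFG_*`, `CAP_*`, `EXP_*` and `DEVSTA_*` names and the helper functions are hypothetical; the offsets and bits follow the PCI/PCIe specifications and match the corresponding `PCI_*` definitions in linux/pci_regs.h. This is an illustration, not the plugin's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets and bits from the PCI/PCIe specifications
 * (same values as the PCI_* macros in <linux/pci_regs.h>). */
#define CFG_STATUS 0x06          /* PCI_STATUS */
#define CFG_STATUS_CAP_LIST 0x10 /* PCI_STATUS_CAP_LIST */
#define CFG_CAP_PTR 0x34         /* PCI_CAPABILITY_LIST */
#define CAP_ID_EXP 0x10          /* PCI_CAP_ID_EXP */
#define EXP_DEVSTA 0x0a          /* PCI_EXP_DEVSTA, relative to the capability */
#define DEVSTA_CED 0x0001        /* correctable error detected */
#define DEVSTA_NFED 0x0002       /* non-fatal error detected */
#define DEVSTA_FED 0x0004        /* fatal error detected */
#define DEVSTA_URD 0x0008        /* unsupported request detected */

/* Config space is little-endian. */
static uint16_t cfg_read16(const uint8_t *cfg, size_t off) {
  return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

/* Walk the standard capability list and return the offset of the
 * PCI Express Capability, or 0 if the device does not have one. */
static int find_pcie_cap(const uint8_t *cfg, size_t len) {
  assert(len >= 64); /* at least the standard header must be present */
  if (!(cfg_read16(cfg, CFG_STATUS) & CFG_STATUS_CAP_LIST))
    return 0;
  size_t pos = cfg[CFG_CAP_PTR] & 0xfc;
  /* Bound the walk so a malformed, looping list cannot hang the reader. */
  for (int ttl = 48; pos >= 0x40 && pos + 2 <= len && ttl > 0; ttl--) {
    if (cfg[pos] == CAP_ID_EXP)
      return (int)pos;
    pos = cfg[pos + 1] & 0xfc;
  }
  return 0;
}

/* Return the Device Status register, or -1 without a PCIe capability. */
static int read_devsta(const uint8_t *cfg, size_t len) {
  int cap = find_pcie_cap(cfg, len);
  if (cap == 0 || (size_t)cap + EXP_DEVSTA + 2 > len)
    return -1;
  return cfg_read16(cfg, (size_t)cap + EXP_DEVSTA);
}
```

Checking the result against `DEVSTA_CED`, `DEVSTA_NFED` and `DEVSTA_FED` gives the correctable, non-fatal and fatal categories. Reading beyond the first 64 bytes of a sysfs `config` file typically requires root; with only the header available, `find_pcie_cap` above finds no capability.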
Showing 8 changed files with 1,450 additions and 0 deletions.
18 changes: 18 additions & 0 deletions Makefile.am
@@ -1381,6 +1381,24 @@ ovs_stats_la_LDFLAGS = $(PLUGIN_LDFLAGS) $(BUILD_WITH_LIBYAJL_LDFLAGS)
ovs_stats_la_LIBADD = $(BUILD_WITH_LIBYAJL_LIBS)
endif

if BUILD_PLUGIN_PCIE_ERRORS
pkglib_LTLIBRARIES += pcie_errors.la
pcie_errors_la_SOURCES = src/pcie_errors.c
pcie_errors_la_CPPFLAGS = $(AM_CPPFLAGS)
pcie_errors_la_LDFLAGS = $(PLUGIN_LDFLAGS)

test_plugin_pcie_errors_SOURCES = \
src/pcie_errors_test.c \
src/daemon/utils_llist.c \
src/daemon/configfile.c \
src/daemon/types_list.c
test_plugin_pcie_errors_CPPFLAGS = $(AM_CPPFLAGS)
test_plugin_pcie_errors_LDFLAGS = $(PLUGIN_LDFLAGS)
test_plugin_pcie_errors_LDADD = liboconfig.la libplugin_mock.la
check_PROGRAMS += test_plugin_pcie_errors
TESTS += test_plugin_pcie_errors
endif

if BUILD_PLUGIN_PERL
pkglib_LTLIBRARIES += perl.la
perl_la_SOURCES = src/perl.c
4 changes: 4 additions & 0 deletions README
@@ -314,6 +314,10 @@ Features
OVS documentation.
<http://openvswitch.org/support/dist-docs/INSTALL.rst.html>

- pcie_errors
Reads errors from the PCI Express Device Status register and the AER extended capability.
<https://www.design-reuse.com/articles/38374/pcie-error-logging-and-handling-on-a-typical-soc.html>

- perl
The perl plugin implements a Perl-interpreter into collectd. You can
write your own plugins in Perl and return arbitrary values using this
13 changes: 13 additions & 0 deletions configure.ac
@@ -550,6 +550,12 @@ if test "x$ac_system" = "xLinux"; then
AC_DEFINE([HAVE_CAPABILITY], [1], [Define to 1 if you have cap_get_proc() (-lcap).])
fi

# For pcie_errors plugin
AC_CHECK_HEADERS([linux/pci_regs.h],
[have_pci_regs_h="yes"],
[have_pci_regs_h="no (linux/pci_regs.h not found)"]
)

else
have_linux_raid_md_u_h="no"
have_linux_wireless_h="no"
@@ -6229,6 +6235,7 @@ plugin_nfs="no"
plugin_numa="no"
plugin_ovs_events="no"
plugin_ovs_stats="no"
plugin_pcie_errors="no"
plugin_perl="no"
plugin_pinba="no"
plugin_processes="no"
@@ -6307,6 +6314,10 @@ if test "x$ac_system" = "xLinux"; then
plugin_ovs_events="yes"
plugin_ovs_stats="yes"
fi
if test "x$have_pci_regs_h" = "xyes"; then
plugin_pcie_errors="yes"
fi
fi
if test "x$ac_system" = "xOpenBSD"; then
@@ -6684,6 +6695,7 @@ AC_PLUGIN([openvpn], [yes], [OpenVPN client stat
AC_PLUGIN([oracle], [$with_oracle], [Oracle plugin])
AC_PLUGIN([ovs_events], [$plugin_ovs_events], [OVS events plugin])
AC_PLUGIN([ovs_stats], [$plugin_ovs_stats], [OVS statistics plugin])
AC_PLUGIN([pcie_errors], [$plugin_pcie_errors], [PCIe errors plugin])
AC_PLUGIN([perl], [$plugin_perl], [Embed a Perl interpreter])
AC_PLUGIN([pf], [$have_net_pfvar_h], [BSD packet filter (PF) statistics])
# FIXME: Check for libevent, too.
@@ -7105,6 +7117,7 @@ AC_MSG_RESULT([ openvpn . . . . . . . $enable_openvpn])
AC_MSG_RESULT([ oracle . . . . . . . $enable_oracle])
AC_MSG_RESULT([ ovs_events . . . . . $enable_ovs_events])
AC_MSG_RESULT([ ovs_stats . . . . . . $enable_ovs_stats])
AC_MSG_RESULT([ pcie_errors . . . . . $enable_pcie_errors])
AC_MSG_RESULT([ perl . . . . . . . . $enable_perl])
AC_MSG_RESULT([ pf . . . . . . . . . $enable_pf])
AC_MSG_RESULT([ pinba . . . . . . . . $enable_pinba])
7 changes: 7 additions & 0 deletions src/collectd.conf.in
@@ -173,6 +173,7 @@
#@BUILD_PLUGIN_ORACLE_TRUE@LoadPlugin oracle
#@BUILD_PLUGIN_OVS_EVENTS_TRUE@LoadPlugin ovs_events
#@BUILD_PLUGIN_OVS_STATS_TRUE@LoadPlugin ovs_stats
#@BUILD_PLUGIN_PCIE_ERRORS_TRUE@LoadPlugin pcie_errors
#@BUILD_PLUGIN_PERL_TRUE@LoadPlugin perl
#@BUILD_PLUGIN_PINBA_TRUE@LoadPlugin pinba
#@BUILD_PLUGIN_PING_TRUE@LoadPlugin ping
@@ -1130,6 +1131,12 @@
# Bridges "br0" "br_ext"
#</Plugin>

#<Plugin pcie_errors>
# Source "sysfs"
# ReportMasked false
# PersistentNotifications false
#</Plugin>

#<Plugin perl>
# IncludeDir "/my/include/path"
# BaseName "Collectd::Plugins"
47 changes: 47 additions & 0 deletions src/collectd.conf.pod
@@ -6265,6 +6265,53 @@ Default: empty (monitor all bridges)

=back

=head2 Plugin C<pcie_errors>

The I<pcie_errors> plugin collects PCI Express errors from the Device Status register
of the PCI Express Capability structure and, where available, from the Advanced Error
Reporting (AER) Extended Capability.
On every read it polls the config space of PCI Express devices and dispatches a
notification for every error that is set.
The device is identified by the plugin_instance, in the format "domain:bus:dev.fn".
Errors are divided into categories, indicated by the type_instance: "correctable"
and, for uncorrectable errors, "non_fatal" or "fatal".
Fatal errors are reported as I<NOTIF_FAILURE> and all others as I<NOTIF_WARNING>.
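The naming and severity rules above can be sketched as a small mapping. Assumptions: the enum and function names are hypothetical stand-ins (the plugin itself dispatches collectd's NOTIF_FAILURE and NOTIF_WARNING); the hex field widths in the plugin_instance string follow the usual lspci/sysfs convention, which the text above does not specify; and an uncorrectable AER error is treated as fatal when its bit is set in the AER Uncorrectable Error Severity register, which is how the PCIe specification assigns severity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for collectd's NOTIF_FAILURE / NOTIF_WARNING. */
enum severity { SEV_FAILURE, SEV_WARNING };

/* Error categories, reported through the type_instance. */
enum error_category { ERR_CORRECTABLE, ERR_NON_FATAL, ERR_FATAL };

static const char *category_type_instance(enum error_category c) {
  switch (c) {
  case ERR_CORRECTABLE:
    return "correctable";
  case ERR_NON_FATAL:
    return "non_fatal";
  case ERR_FATAL:
    return "fatal";
  }
  return "unknown";
}

/* Fatal errors are failures; everything else is a warning. */
static enum severity category_severity(enum error_category c) {
  return c == ERR_FATAL ? SEV_FAILURE : SEV_WARNING;
}

/* An uncorrectable AER error is fatal if its bit is set in the
 * Uncorrectable Error Severity register, non-fatal otherwise. */
static enum error_category uncorrectable_category(uint32_t status_bit,
                                                  uint32_t severity_reg) {
  return (severity_reg & status_bit) ? ERR_FATAL : ERR_NON_FATAL;
}

/* Format "domain:bus:dev.fn"; returns 0 on success, -1 if truncated. */
static int format_plugin_instance(char *buf, size_t size, unsigned domain,
                                  unsigned bus, unsigned dev, unsigned fn) {
  assert(buf != NULL && size > 0);
  int n = snprintf(buf, size, "%04x:%02x:%02x.%u", domain, bus, dev, fn);
  return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

For example, domain 0, bus 0x3b, device 0, function 1 yields "0000:3b:00.1", and an uncorrectable error whose severity bit is set maps to "fatal" with a failure severity.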

B<Synopsis:>

<Plugin "pcie_errors">
Source "sysfs"
AccessDir "/sys/bus/pci"
ReportMasked false
PersistentNotifications false
</Plugin>

B<Options:>

=over 4

=item B<Source> B<sysfs>|B<proc>

Use B<sysfs> or B<proc> to read data from /sys or /proc, respectively.
The default value is B<sysfs>.

=item B<AccessDir> I<dir>

Directory used to access the device config space. It is optional and defaults to
/sys/bus/pci for B<sysfs> and to /proc/bus/pci for B<proc>.

=item B<ReportMasked> B<false>|B<true>

If B<true>, the plugin also reports errors that are masked in the Error Mask
register. Such errors are not reported to the PCI Express Root Complex. Defaults
to B<false>.

=item B<PersistentNotifications> B<false>|B<true>

If B<false>, the plugin dispatches a notification only when an error is set or
cleared; errors that were already reported are ignored. Defaults to B<false>.

=back
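B<ReportMasked> and B<PersistentNotifications> reduce to bit arithmetic on each error status register. A minimal sketch, assuming the plugin remembers which bits it reported at the previous read (the `last` state, `struct err_changes` and the function names are hypothetical): masked bits are dropped unless ReportMasked is true, only newly set bits are reported unless PersistentNotifications is true, and bits that disappear produce an OKAY notification. The Device Status register has no mask, so a mask of 0 would be passed for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Bits that need a notification after one read of a status register. */
struct err_changes {
  uint32_t report;  /* error notifications (warning or failure) */
  uint32_t cleared; /* OKAY notifications */
};

/* `last` holds the bits reported at the previous read (hypothetical state). */
static struct err_changes check_errors(uint32_t *last, uint32_t status,
                                       uint32_t mask, bool report_masked,
                                       bool persistent) {
  assert(last != NULL);
  /* Errors set in the Error Mask register are skipped unless ReportMasked. */
  uint32_t cur = report_masked ? status : (status & ~mask);
  struct err_changes ch;
  /* Without PersistentNotifications only newly set bits are reported. */
  ch.report = persistent ? cur : (cur & ~*last);
  /* Bits reported before but no longer set have been cleared. */
  ch.cleared = *last & ~cur;
  *last = cur;
  return ch;
}

/* Print one line per bit; a real plugin would dispatch notifications here. */
static void print_changes(const char *dev, struct err_changes ch) {
  for (unsigned bit = 0; bit < 32; bit++) {
    uint32_t b = UINT32_C(1) << bit;
    if (ch.report & b)
      printf("%s: error bit %u set\n", dev, bit);
    if (ch.cleared & b)
      printf("%s: error bit %u cleared\n", dev, bit);
  }
}
```

For example, with status 0x11 and mask 0x10, the first call reports only bit 0; repeating the same call reports nothing unless `persistent` is true, and a later read with status 0 yields bit 0 in `cleared`.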

=head2 Plugin C<perl>

This plugin embeds a Perl-interpreter into collectd and provides an interface