Linux kernel machine check handling middleware


Merge pull request #3 from bldewolf/master

Fix for AMD family 15 detection
latest commit 1532cc5ce6
andikleen authored March 25, 2014
input add xen test file too May 20, 2011
tests Clean all test files in make clean May 02, 2013
triggers triggers: fix cache-error-trigger to use AFFECTED_CPUS November 11, 2010
.gitignore Add more files to .gitignore May 11, 2009
CHANGES documentation updates October 25, 2010
Makefile Add Ivy Bridge support to mcelog January 18, 2013
README documentation updates October 25, 2010
README.releases document new release scheme October 20, 2013
TODO Initial import of mcelog-0.8pre + some old patches September 04, 2008
TODO-diskdb Hook up diskdb memory error for intel May 11, 2009
bitfield.c Don't drop MSB in number fields October 06, 2009
bitfield.h Move test_prefix() definition from p4.c to bitfield.h September 18, 2012
cache.c Fix cache map parsing May 02, 2013
cache.h Initial yellow bit support November 26, 2009
client.c Add GPL headers to trigger.c / client.c November 26, 2009
client.h Initial client support November 26, 2009
config.c Merge branch 'master' of git://git.kernel.org/pub/scm/utils/cpu/mce/m… March 01, 2010
config.h Move credentials setup into config.c November 30, 2009
core2.c Update core2/P6old support September 10, 2008
core2.h Update core2/P6old support September 10, 2008
db.c Add notes that old diskdb files are obsolete November 26, 2009
db.h Initial import of mcelog-0.8pre + some old patches September 04, 2008
dbquery.c Use memutil functions everywhere May 22, 2009
dimm.c Add notes that old diskdb files are obsolete November 26, 2009
dimm.h Initial import of mcelog-0.8pre + some old patches September 04, 2008
diskdb.c Add notes that old diskdb files are obsolete November 26, 2009
diskdb.h Disable on disk DIMM database by default May 10, 2009
dmi.c mcelog: accept large SMBIOS tables March 21, 2012
dmi.h Close /dev/mem file descriptor when not needed anymore December 04, 2009
dunnington.c Fix incorrect dunnington specific decoding September 07, 2009
dunnington.h Add Dunnington support September 16, 2008
eventloop.c Handle old glibc without ppoll December 27, 2009
eventloop.h Wait for children in non daemon mode November 26, 2009
intel.c mcelog: Add the model number of Haswell server March 07, 2014
intel.h Add Haswell client cpuids (0x3C, 0x45 and 0x46) June 21, 2013
ivy-bridge.c mcelog: Add missing entry to Ivy Bridge memory controller decode table September 11, 2013
ivy-bridge.h Add Ivy Bridge support to mcelog January 18, 2013
k8.c Add support for calling icc static verifier May 23, 2009
k8.h Enable -Wextra and some more warnings and clean them in code May 23, 2009
leaky-bucket.c Revert "mcelog: Make leaky-bucket log more sane." December 20, 2013
leaky-bucket.h Make threshold logging configurable November 30, 2009
list.h Initial memdb support November 26, 2009
lk10-mcelog.pdf Add LK10 mcelog paper October 17, 2010
mce.pdf Initial import of mcelog-0.8pre + some old patches September 04, 2008
mcelog.8 Modify --daemon logic to allow syslog-only output March 25, 2014
mcelog.c Merge pull request #3 from bldewolf/master March 25, 2014
mcelog.conf Fix socket-tracing typo in default configuration July 11, 2011
mcelog.cron Initial import of mcelog-0.8pre + some old patches September 04, 2008
mcelog.h Add Haswell client cpuids (0x3C, 0x45 and 0x46) June 21, 2013
mcelog.init Write pidfile by default in daemon mode February 26, 2010
mcelog.logrotate Reopen log files on SIGUSR1 in daemon mode February 26, 2010
memdb.c Add method to lookup whether a DIMM exists in memdb September 18, 2012
memdb.h Add method to lookup whether a DIMM exists in memdb September 18, 2012
memutil.c Check for out of memory in asprintf February 27, 2010
memutil.h Add xalloc_nonzero() to memutil May 24, 2009
msg.c Reopen log files on SIGUSR1 in daemon mode February 26, 2010
msg.h Reopen log files on SIGUSR1 in daemon mode February 26, 2010
msr.c Add Ivy Bridge support to mcelog January 18, 2013
nehalem.c Add Xeon75xx support January 21, 2010
nehalem.h Add Xeon75xx support January 21, 2010
p4.c mcelog: Decode new simple error code number 6 February 19, 2014
p4.h Pass socketid to cache error trigger November 26, 2009
page.c Add method to lookup whether a DIMM exists in memdb September 18, 2012
page.h Fix CMCI overflow count handling January 21, 2010
paths.h Write pidfile by default in daemon mode February 26, 2010
rbtree.c Initial page predictive failure analysis support November 26, 2009
rbtree.h Initial page predictive failure analysis support November 26, 2009
sandy-bridge.c Add Memory Controller decode for SandyBridge-EP platform November 05, 2012
sandy-bridge.h Add support for Sandy Bridge extended error logging September 18, 2012
server.c Lower size of the ctl buffer to avoid potential DOS March 10, 2010
server.h Initial memdb support November 26, 2009
sysfs.c Fix fstat warning August 04, 2011
sysfs.h Move sysfs write functions from page.c to sysfs.c November 28, 2009
trigger.c mcelog: Abstract forking triggers for reuse September 09, 2010
trigger.h mcelog: Abstract forking triggers for reuse September 09, 2010
tsc.c Enable -Wextra and some more warnings and clean them in code May 23, 2009
tsc.h Enable -Wextra and some more warnings and clean them in code May 23, 2009
tulsa.c Add Tulsa support for Cache Bus controller and Bus and Interconnect E… September 07, 2009
tulsa.h Add Intel Xeon 71xx (Tulsa) MCA decoding support May 05, 2009
version.h Upgrade version number November 28, 2009
xeon75xx.c Remove old xeon75xx aux format support November 11, 2010
xeon75xx.h Add Xeon75xx support January 21, 2010
yellow.c Clarify syslog yellow bit warning message February 27, 2010
yellow.h Pass socketid to cache error trigger November 26, 2009
README
mcelog is the user space backend for logging machine check errors
reported by the hardware to the kernel. The kernel takes the immediate
actions (such as killing processes) and mcelog decodes the errors
and manages various other advanced error responses, such as
offlining memory or CPUs, or triggering events.

It primarily handles machine checks and thermal events, which
are reported for errors detected by the CPU.

It is recommended that mcelog runs on all x86 machines, both
64-bit (kernels since early 2.6) and 32-bit (kernels since 2.6.32).

mcelog can run in several modes: cronjob, trigger, and daemon.

cronjob is the old method. mcelog runs every 5 minutes from cron and checks
for errors. The disadvantage is that it can delay error reporting
significantly (up to 10 minutes) and does not allow mcelog to keep extended state.

trigger is a newer method where the kernel runs mcelog on an error.
This is configured with
echo /usr/sbin/mcelog > /sys/devices/system/machinecheck/machinecheck0/trigger
This is faster, but still doesn't allow mcelog to keep state,
and has relatively high overhead for each error because a program has
to be initialized from scratch.
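
The trigger setup above can be wrapped in a small helper, shown here only
as a sketch. The function name set_mce_trigger and the variables
TRIGGER_FILE and MCELOG_BIN are illustrative and not part of mcelog; the
two default paths are the ones from the echo command above.

```shell
#!/bin/sh
# Sketch: register a program as the kernel's machine check trigger.
# TRIGGER_FILE and MCELOG_BIN are illustrative names, not mcelog settings.
TRIGGER_FILE="${TRIGGER_FILE:-/sys/devices/system/machinecheck/machinecheck0/trigger}"
MCELOG_BIN="${MCELOG_BIN:-/usr/sbin/mcelog}"

set_mce_trigger() {
    # $1: trigger file to write, $2: program the kernel should run on an error
    if [ ! -w "$1" ]; then
        echo "cannot write $1 (not root, or no machine check support?)" >&2
        return 1
    fi
    printf '%s\n' "$2" > "$1"
}

# Example call (needs root on a real system):
# set_mce_trigger "$TRIGGER_FILE" "$MCELOG_BIN"
```

Checking that the file is writable first gives a readable error on kernels
without the machinecheck sysfs entries instead of a bare shell redirect
failure.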

In daemon mode mcelog runs continuously as a daemon in the background
and waits for errors. It is enabled by running mcelog --daemon &
from an init script. This is the fastest and most full-featured mode.
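
As a rough sketch, the "start" action of such an init script could look
like the following. The names MCELOG and start_mcelog are illustrative,
and the install path is an assumption taken from the trigger example; the
mcelog.init file in the source tree is the real starting point.

```shell
#!/bin/sh
# Sketch of an init script "start" action. MCELOG and start_mcelog are
# illustrative names; /usr/sbin/mcelog is an assumed install path.
MCELOG="${MCELOG:-/usr/sbin/mcelog}"

start_mcelog() {
    if [ ! -x "$MCELOG" ]; then
        echo "$MCELOG not found or not executable" >&2
        return 1
    fi
    # Same invocation as described in the README: mcelog --daemon &
    "$MCELOG" --daemon &
}

# Example call (needs root on a real system):
# start_mcelog
```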

The recommended mode is daemon, because several new functions (like page error
predictive failure analysis) require a continuously running daemon.

Documentation:

The primary reference documentation is the man pages.
lk10-mcelog.pdf gives an overview of the errors mcelog handles
(originally from Linux Kongress 2010).
mce.pdf is a very old paper describing the first releases of mcelog
(some parts are obsolete).

For distributors:

Please install an init script by default that runs mcelog in daemon mode.
The mcelog.init script is a good starting point.

Also install a logrotate file (mcelog.logrotate) or equivalent
when mcelog is running in daemon mode.

These two are not installed by "make install".

The installation also requires a config file (/etc/mcelog.conf) and
the default triggers. These are all installed by "make install".

/dev/mcelog is needed for mcelog operation.
If it's not there it can be created with mknod /dev/mcelog c 10 227
Normally it should be created automatically by udev.
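
A minimal sketch of that check, for systems where udev does not create the
node. The helper name ensure_mcelog_dev is illustrative and not part of
mcelog; the device numbers are the ones given above.

```shell
#!/bin/sh
# Sketch: make sure the mcelog character device exists.
# ensure_mcelog_dev is an illustrative helper, not part of mcelog.
ensure_mcelog_dev() {
    dev="${1:-/dev/mcelog}"
    if [ -c "$dev" ]; then
        echo "$dev already exists"
        return 0
    fi
    # Character device, major 10, minor 227, as given above; needs root.
    mknod "$dev" c 10 227
}

# Example call (needs root if the node has to be created):
# ensure_mcelog_dev
```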

Security:

mcelog needs to run as root because it might trigger actions like
page-offlining, which require CAP_SYS_ADMIN. Also it opens /dev/mcelog
and a unix socket for client support.

It also opens /dev/mem to parse the BIOS DMI tables. It is careful
to close the file descriptor and unmap any mappings after using them.

There is support for changing the user in daemon mode after opening
the device and the sockets, but that would stop triggers from
doing corrective actions that require root.

In principle it would be possible to keep only CAP_SYS_ADMIN
for page-offlining, but that would prevent triggers from doing root-only
actions not covered by it (and CAP_SYS_ADMIN is not that different
from full root).

In daemon mode mcelog listens to a unix socket and processes
requests from mcelog --client. This can be disabled in the configuration file.
The uid/gid of the requestor is checked on access and is configurable
(default 0/0 only). The command parsing code (server.c) is very
straightforward. The client parsing/reply is currently done with the full
privileges of the daemon.

Testing:

There is a simple test suite in tests/. The test suite requires root to 
run and access to mce-inject and a kernel with MCE injection support 
(CONFIG_X86_MCE_INJECT).  It will kill any running mcelog daemon.

Run it with "make test".

The test suite requires the mce-inject tool, available from
git://git.kernel.org/pub/utils/cpu/mce/mce-inject.git
The mce-inject executable must be either in $PATH or in the
../mce-inject directory.
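
The lookup order described above can be sketched as follows. The helper
name find_mce_inject is illustrative; this is not the code used by the
test suite itself.

```shell
#!/bin/sh
# Sketch: locate mce-inject in $PATH first, then in ../mce-inject.
# find_mce_inject is an illustrative helper, not part of the test suite.
find_mce_inject() {
    if command -v mce-inject >/dev/null 2>&1; then
        command -v mce-inject
    elif [ -x ../mce-inject/mce-inject ]; then
        echo ../mce-inject/mce-inject
    else
        echo "mce-inject not found in \$PATH or ../mce-inject" >&2
        return 1
    fi
}

# Example: abort early with a clear message before running "make test".
# inject=$(find_mce_inject) || exit 1
```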

You can also test under valgrind with "make valgrind-test". For
this, valgrind needs to be installed. Advanced
valgrind options can be specified with
make VALGRIND="valgrind --option" valgrind-test

Other checks:

make iccverify and make clangverify run the static verifiers
in icc and clang respectively.

License:

This program is licensed under the terms of the GNU General Public
License, v.2
