Based off work by Tim Hockin @ Google, Inc, 2006
Documentation License:
Creative Commons Attribution 3.0 United States License
Modern firmware implementations often reserve a small region in non-volatile memory to store diagnostic and debugging information. However, there is no universal standard for the format. There is no standard location for the log to reside. And there is no standard, free/open-source code to store and retrieve records.
Google's firmware event log format is based off of a well-understood standard format -- SMBIOS System Event Log (Type 15). SMBIOS, however, usually exists strictly in RAM. We require that the event is stored persistently so that information about it can be retrieved at any later point in time, so methods have been defined to store the data in non-volatile memory (assumed to be flash memory).
This event log will store information about events critical to system operation. Such events include DRAM errors, bus failures, and arbitrary OEM-defined events such as basic kernel crash information and crisis recovery notification. The events which are logged are implementation-specific; that is, a given implementation might not actually recognize or commit log entries for every event.
SMBIOS: System Management BIOS. SMBIOS is a public specification which defines structured data about the system and methods for accessing that data.
What we want is a way to keep track of a few dozen or possibly hundreds of events persistently across power cycles and reboots.
We will use a log in the system flash (memory mapped), which can be safely shrunk and copied with a double-buffered copy-and-shrink operation (more in the "Log shrinking" section below). We will enable logging of significant events which occur any time the system is in a running state, whether it's booting or whether the OS is operational.
This will provide rich, detailed information about the recent history of a system, which can help us when diagnosing systemic problems, both during platform development as well as after deployment.
The SMBIOS specification defines that the event log consists of an arbitrary length header (0 length is valid), followed immediately by the log data. There are a number of options for log access method, the simplest of which is memory-mapped. We will be using the system flash (which is usually already memory mapped) to store the event log.
The SMBIOS Event Log specification is not perfect, and creates a few rather unfortunate limitations. The largest possible event log, including any header is 65535 (0xffff) bytes. We will make our log as large as allowed. The last byte of the 64 KB flash sector will be left as 0xff, which is conveniently the end-of-log marker.
Log data is stored as a series of variable-length event records in chronological order. The SMBIOS specification does not provide for ring-buffer semantics, which makes for a less than ideal failure mode. If the log reaches capacity, the log is simply marked as full. In order to avoid this failure mode we will check the log size whenever we write an event. If the log has crossed a size threshold, the log will be automatically shrunk and relocated (for a safe copy operation - see "Log shrinking" below).
The event log header is found at the start of the log area. It can be either a standard format or an OEM-specific format. We will define it as an OEM-specific packed structure with the following layout:
struct elog_header {
uint32_t elog_magic; /* "ELOG" when read as bytes, 0x474f4c45 when read as a DWORD */
int32_t elog_sequence; /* log sequence number */
uint8_t elog_version; /* log header version */
uint8_t elog_size; /* log header size */
uint8_t reserved1; /* reserved (0xff) */
uint8_t reserved2; /* reserved (0xff) */
uint8_t elog_data[0]; /* log data */
};
The elog_magic field is used to identify the active log area. Since we will be using a two buffer areas, the signature could be at either of two locations.
The elog_version field indicates what the layout of the log header is. This document describes the version 1 header.
The elog_size field indicates the size of the elog_header, in bytes.
The most significant bit of the elog_sequence indicates whether the value is valid or not: if the bit is set (i.e. the sequence number is negative), then the sequence number is not valid. The most significant byte of the sequence number must be the very last byte written when relocating the log, which will ensure that the entire log has been committed to flash before the header is considered valid. In the case of a mid-copy failure, it is possible that two instances of the magic number or even two valid elog_header structures could exist in the flash. The larger sequence number determines which header is correct.
The elog_sequence plus the number of events in the log yields the total event count. The event count must be essentially monotonic. The only times the event count may move backwards is when the log is cleared, in which case the elog_sequence becomes 0, and in the case of a wrap-around.
The following pseudo-code describes the BIOS log-locating algorithm:
elog_header *find_elog() {
hdr1 = pointer to log area #1
hdr2 = pointer to log area #2
final_hdr = NULL
if (hdr1.elog_magic == "ELOG" && hdr1.eloq_sequence > 0)
hdr1_valid = TRUE
if (hdr2.elog_magic == "ELOG" && hdr2.eloq_sequence > 0)
hdr2_valid = TRUE
/* if neither area is valid, initialize one of them */
if (!hdr1_valid && !hdr2_valid) {
reinit_elog_area(hdr1)
hdr1_valid = TRUE
}
if (hdr1_valid && !hdr2_valid) {
final_hdr = hdr1
goto check_version
}
if (hdr2_valid && !hdr1_valid) {
final_hdr = hdr2
goto check_version
}
/*
* If we get here, they both appear to be valid. Handling
* wrap-around is left as an exercise for the implementer,
* though is probably not necessary.
*/
final_hdr = hdr1
if (hdr2.elog_sequence > hdr1.elog_sequence) {
final_hdr= hdr2
}
check_version:
/*
* We specifically check this here, rather than earlier to avoid
* a potential error case when both logs appear valid, but have
* different versions.
*
* We could check for <= and then adapt old versions, but it's
* probably not worth it (yet).
*/
if (final_hdr.elog_version != ELOG_HEADER_VER) {
reinit_elog_area(final_hdr)
}
/* make sure the other header invalid */
if (final_hdr == hdr1)
erase_elog_header(hdr2)
if (final_hdr == hdr2)
erase_elog_header(hdr1)
return final_hdr
}
The SMBIOS specification defines the layout of an event structure. Each event consists of a well defined, fixed size structure and an optional variable length data payload. The fixed size portion of an SMBIOS event can be interpreted as a packed structure of the following form:
struct {
uint8_t event_id;
uint8_t event_size;
uint8_t event_year;
uint8_t event_month;
uint8_t event_day;
uint8_t event_hour;
uint8_t event_minute;
uint8_t event_second;
uint8_t event_data[0];
};
The event_id field is a value defined in the "Event types" section below.
The event_size field defines the total size of the event, including the fixed and variable data portions.
The event_time.md field is the timestamp of the event. Each byte is a BCD value. The six bytes represent year, month, day, hour, minute, second.
The event_data.md field is the variable length data payload. The contents of this field are detailed in the "Event payloads" section below. Every event has at least 1 byte of payload. The final (or only) byte of payload is used as a sum-to-zero byte. All valid events will have a byte checksum of zero. There is still a small chance that a corrupted event, including corrupted data could sum to zero, but this is probably negligible. If we want to be even more bulletproof, we could make the checksum be 2 bytes. If the 1 byte sum is < 0xff, the second checksum byte is 0. If the 1 byte sum is exactly 0xff, then the first checksum byte is 0xfe and the second checksum byte is 1. This is probably not worth the complexity or storage cost of 1 byte per event.
The SMBIOS specification defines several standard types and leaves room for OEM defined types. Below is the list of SMBIOS standard events (As of version 2.6 of the spec):
Event ID | Meaning |
---|---|
0x00 | Reserved |
0x01 | Single-bit ECC error |
0x02 | Multi-bit ECC error |
0x03 | Memory parity error |
0x04 | Bus timeout |
0x05 | IO channel check |
0x06 | Software NMI |
0x07 | POST memory resize |
0x08 | POST error |
0x09 | PCI parity error |
0x0A | PCI system error |
0x0B | CPU failure |
0x0C | EISA FailSafe Timer timeout |
0x0D | Correctable memory log disabled |
0x0E | Specific event type log disabled |
0x10 | System limit exceeded (e.g. temperature) |
0x11 | Async HW timer (WDT) timeout |
0x12 | System configuration information |
0x13 | Hard disk information |
0x14 | System reconfigured |
0x15 | Uncorrectable CPU-complex error |
0x16 | Log area reset/cleared |
0x17 | System boot |
0x80-0xFE | OEM-specific events |
0xFF | End of log |
The SMBIOS specification allows for a variable-length data payload on each event. This section details the payload for each event. If an event is not detailed here, there is no data payload other than the checksum. All payloads are assumed to be packed structures.
This event indicates that a single bit ECC (SBE) memory error had occurred. The event payload dimm_number
indicates the DIMM that failed.
struct {
uint8_t dimm_number; // Failed dimm number
uint8_t checksum;
};
This event indicates that a multi-bit ECC (MBE) memory error had occurred. The event payload is same as in Event 0x01.
struct {
uint8_t dimm_number; // Failed dimm number
uint8_t checksum;
};
This event indicates that a memory parity error had occurred. The event payload is same as in Event 0x01.
struct {
uint8_t dimm_number; // Failed dimm number
uint8_t checksum;
};
This event indicates that a bus timeout had occurred while accessing a bus. The payload which
field indicates the type of the bus such as PCI,
I2C etc. The sub_type
field refers to the instance of the bus (say AMB I2C etc).
struct {
uint8_t which;
uint16_t sub_type;
uint8_t checksum;
};
This event indicates that an error had occurred in an I/O channel in the system. The payload which
field indicates type of the error as follows:
- 0x0001 - Syncflood
- 0x0002 - CRC error
The device
field contains the PCI function address (PFA) of the device reporting the error. The PFA is encoded as:
- bits [15:8] - PCI bus number
- bits [7:3] - PCI device number
- bits [2-0] - PCI function number
struct {
uint8_t which;
uint16_t device;
uint8_t checksum;
};
This payload-less event indicates that a S/W NMI had occurred.
struct {
uint8_t checksum;
};
Note: All events contains checksum as standard payload
This event indicates that memory resizing had occurred during the firmware POST. There is no payload associated with this event except checksum.
struct {
uint8_t checksum;
};
This event lists the POST errors occurred during the boot process and reports them in a bit pattern defined in SMBIOS specification.
struct {
uint32_t post_error_map; // POST error bit map
uint8_t checksum;
};
These events indicates that PCI PERR/SERR had occurred and lists the device address in device
field of the payload.
struct {
uint16_t device;
uint8_t checksum;
};
This event indicates that a CPU failure had occurred. The nature of the error is indicated in sub_type
field as follows:
- 0x01 - CPU mismatch
- 0x02 - CPU IERR# assertion
- 0x03 - CPU BINIT# assertion
The cpu_number
field identifies the CPU that caused the failure.
struct {
uint8_t sub_type;
uint16_t cpu_number;
uint8_t checksum;
};
This event indicates a timeout had occurred in the EISA bus. This event does not have any payload other than checksum.
struct {
uint8_t checksum;
};
This event indicates that the logging of correctable memory errors had been disabled. This event does not have any payload other than checksum.
struct {
uint8_t checksum;
};
struct {
uint8_t event_type;
uint8_t checksum;
};
The event_type
field indicates which type of logs are disabled. This event occurs when a particular event happens too frequently and the logger decided to stop logging it to avoid event log full condition (all events are of same type!).
struct {
uint8_t which;
uint8_t checksum;
};
The which
field points to the parameter that exceeded the system limit. The values for this field are not defined.
struct {
uint8_t timer;
uint8_t checksum;
};
The timer field indicates which watchdog timer indicated a timeout. Known values for this field are:
- 0x0001 - TCO watchdog
- 0x0002 - .Net watchdog
No payload format is defined for these events.
struct {
uint8_t which;
uint8_t checksum;
};
The which field indicates which aspect of the system was reconfigured. Known values for this field are:
- 0x0001 - DIMMs reconfigured
struct {
uint8_t subtype;
uint16_t cpu_number;
uint8_t checksum;
};
The subtype
field indicates type of the CPU error. cpu_number
identifies the CPU that generated the error.
struct {
uint16_t bytes;
uint32_t boot_num;
uint8_t checksum;
};
The bytes field indicates how many bytes of the log area were discarded.
NOTE: Since the number of bytes cleared cannot be 0, the count field is 0-based. That is, the value of 0 indicates 1 byte is cleared. This way we can indicate values from 1 to 64K (0xFFFF). Also it is decided not to increase the size of bytes field from uint16_t to uint32_t because the current elog size can be only 64K.
struct {
uint32_t bootnum;
uint8_t checksum;
};
The bootnum
field contains the sequential boot number of the boot being logged. This will always be the first event logged during boot up.
Because the SMBIOS specification does not provide ring-buffer semantics, we want to avoid the "log full" scenario if at all possible. When writing new events to the log, we must ensure that there is space available for the new event. If there is not space available, then the log must be shrunk and re-written.
Because flash bits are only programmable from '1' to '0', it is effectively a write-once medium, with sector erases required in order to re-write. Flash chips have a variety of erasable sector sizes. Because we have multiple parts, we must be ready to handle the worst case - 64 KB erasable areas. That means that you can only erase the device 64 KB at a time. There is no way to erase less than that (on at least some flash chips). When we need to shrink the log, we need to avoid any time window where the log is not present on the flash chip. To do this, we will use a double-buffered copy operation. When we near capacity on one 64 KB log, we will copy a portion of the log to a second flash block, and then indicate that the second block is valid. Only after the new log is committed to the flash medium can we erase the old log. This should provide a reliable way of shrinking the log without risking a corrupted or missing log.
Shrinking the log will not be a fast operation. We want to avoid doing the log shrink at run-time if at all possible. To do this, we will evaluate the fullness of the log at bootup time. If the log is found to have crossed the "fullness threshold" (defined below), we can do a log shrink immediately, which will hopefully reduce the need to do log shrinks when logging events. This does not eliminate the need to check for space when logging an event, but should reduce the likelihood of not finding space.
To shrink the log, we scan forward from the start of the log until we cross the shrink threshold. Because we only want to discard whole events, we might discard slightly more than the nominal shrink size. We can then copy the remainder of the log to the new location. The following pseudo-code describes the BIOS log-shrinking algorithm in more detail: elog_header *shrink_elog(elog_header *src_hdr) { if (src_hdr == log area #1) { dst_hdr = log area #2 } else { dst_hdr = log_area #1 }
src_offset = 0
nevents_dropped = 0
while (src_offset < ELOG_SHRINK_SIZE) {
src_offset += event_size(src_hdr.elog_data + src_offset)
nevents_dropped++
}
log_size -= src_offset
erase_flash(dst_hdr)
copy_flash(dst_hdr.elog_data, src_hdr.elog_data + src_offset, log_size)
copy_header(dst_hdr, src_hdr) /* not the sequence # */
log_event(dst_hdr, partial clear)
write_sequence_number(dst_hdr, src_hdr.elog_sequence + nevents_dropped)
erase_elog_header(src_hdr)
}
The following constants are examples for a flash part with sector size of 64KB: /* * The SMBIOS spec defines the log area size (including header) as a 16 bit field. */ #define ELOG_TOTAL_SIZE 0xffff #define ELOG_SHRINK_SIZE 0x4000 #define ELOG_FULL_THRESHOLD 0xf000```