Skip to content

Latest commit

 

History

History
444 lines (344 loc) · 18.9 KB

FirmwareEventLogDesign.md

File metadata and controls

444 lines (344 loc) · 18.9 KB

Based off work by Tim Hockin @ Google, Inc, 2006

Documentation License:

Creative Commons Attribution 3.0 United States License

Background

Modern firmware implementations often reserve a small region in non-volatile memory to store diagnostic and debugging information. However, there is no universal standard for the format. There is no standard location for the log to reside. And there is no standard, free/open-source code to store and retrieve records.

Google's firmware event log format is based off of a well-understood standard format -- SMBIOS System Event Log (Type 15). SMBIOS, however, usually exists strictly in RAM. We require that the event is stored persistently so that information about it can be retrieved at any later point in time, so methods have been defined to store the data in non-volatile memory (assumed to be flash memory).

This event log will store information about events critical to system operation. Such events include DRAM errors, bus failures, and arbitrary OEM-defined events such as basic kernel crash information and crisis recovery notification. The events which are logged are implementation-specific; that is, a given implementation might not actually recognize or commit log entries for every event.

Terminology

SMBIOS: System Management BIOS. SMBIOS is a public specification which defines structured data about the system and methods for accessing that data.

Overview

What we want is a way to keep track of a few dozen or possibly hundreds of events persistently across power cycles and reboots.

We will use a log in the system flash (memory mapped), which can be safely shrunk and copied with a double-buffered copy-and-shrink operation (more in the "Log shrinking" section below). We will enable logging of significant events which occur any time the system is in a running state, whether it's booting or whether the OS is operational.

This will provide rich, detailed information about the recent history of a system, which can help us when diagnosing systemic problems, both during platform development as well as after deployment.

Detailed Design

The SMBIOS specification defines that the event log consists of an arbitrary length header (0 length is valid), followed immediately by the log data. There are a number of options for log access method, the simplest of which is memory-mapped. We will be using the system flash (which is usually already memory mapped) to store the event log.

The SMBIOS Event Log specification is not perfect, and creates a few rather unfortunate limitations. The largest possible event log, including any header is 65535 (0xffff) bytes. We will make our log as large as allowed. The last byte of the 64 KB flash sector will be left as 0xff, which is conveniently the end-of-log marker.

Log data is stored as a series of variable-length event records in chronological order. The SMBIOS specification does not provide for ring-buffer semantics, which makes for a less than ideal failure mode. If the log reaches capacity, the log is simply marked as full. In order to avoid this failure mode we will check the log size whenever we write an event. If the log has crossed a size threshold, the log will be automatically shrunk and relocated (for a safe copy operation - see "Log shrinking" below).

Log header

The event log header is found at the start of the log area. It can be either a standard format or an OEM-specific format. We will define it as an OEM-specific packed structure with the following layout:

struct elog_header {
        uint32_t elog_magic;      /* "ELOG" when read as bytes, 0x474f4c45 when read as a DWORD */

        int32_t  elog_sequence;   /* log sequence number */

        uint8_t  elog_version;    /* log header version */
        uint8_t  elog_size;       /* log header size */
        uint8_t  reserved1;       /* reserved (0xff) */

        uint8_t  reserved2;       /* reserved (0xff) */

        uint8_t  elog_data[0];    /* log data */
};

The elog_magic field is used to identify the active log area. Since we will be using a two buffer areas, the signature could be at either of two locations.

The elog_version field indicates what the layout of the log header is. This document describes the version 1 header.

The elog_size field indicates the size of the elog_header, in bytes.

The most significant bit of the elog_sequence indicates whether the value is valid or not: if the bit is set (i.e. the sequence number is negative), then the sequence number is not valid. The most significant byte of the sequence number must be the very last byte written when relocating the log, which will ensure that the entire log has been committed to flash before the header is considered valid. In the case of a mid-copy failure, it is possible that two instances of the magic number or even two valid elog_header structures could exist in the flash. The larger sequence number determines which header is correct.

The elog_sequence plus the number of events in the log yields the total event count. The event count must be essentially monotonic. The only times the event count may move backwards is when the log is cleared, in which case the elog_sequence becomes 0, and in the case of a wrap-around.

The following pseudo-code describes the BIOS log-locating algorithm:

elog_header *find_elog() {
        hdr1 = pointer to log area #1
        hdr2 = pointer to log area #2
        final_hdr = NULL

        if (hdr1.elog_magic == "ELOG" && hdr1.eloq_sequence > 0)
                hdr1_valid = TRUE
        if (hdr2.elog_magic == "ELOG" && hdr2.eloq_sequence > 0)
                hdr2_valid = TRUE

        /* if neither area is valid, initialize one of them */
        if (!hdr1_valid && !hdr2_valid) {
                reinit_elog_area(hdr1)
                hdr1_valid = TRUE 
        }

        if (hdr1_valid && !hdr2_valid) { 
                final_hdr = hdr1
                goto check_version
        }
        if (hdr2_valid && !hdr1_valid) { 
                final_hdr = hdr2
               goto check_version
        }

        /*
         * If we get here, they both appear to be valid.  Handling
         * wrap-around is left as an exercise for the implementer,
         * though is probably not necessary.
         */
        final_hdr = hdr1
        if (hdr2.elog_sequence > hdr1.elog_sequence) {
                final_hdr= hdr2
        }
check_version:
        /*
         * We specifically check this here, rather than earlier to avoid
         * a potential error case when both logs appear valid, but have
         * different versions.
         *
         * We could check for <= and then adapt old versions, but it's
         * probably not worth it (yet).
         */
        if (final_hdr.elog_version != ELOG_HEADER_VER) {
                reinit_elog_area(final_hdr)
        }

        /* make sure the other header invalid */
        if (final_hdr == hdr1)
                erase_elog_header(hdr2)
        if (final_hdr == hdr2)
                erase_elog_header(hdr1)

        return final_hdr
}

Events

The SMBIOS specification defines the layout of an event structure. Each event consists of a well defined, fixed size structure and an optional variable length data payload. The fixed size portion of an SMBIOS event can be interpreted as a packed structure of the following form:

struct {
        uint8_t event_id;
        uint8_t event_size;
        uint8_t event_year;
        uint8_t event_month;
        uint8_t event_day;
        uint8_t event_hour;
        uint8_t event_minute;
        uint8_t event_second;
        uint8_t event_data[0];
};

The event_id field is a value defined in the "Event types" section below.

The event_size field defines the total size of the event, including the fixed and variable data portions.

The event_time.md field is the timestamp of the event. Each byte is a BCD value. The six bytes represent year, month, day, hour, minute, second.

The event_data.md field is the variable length data payload. The contents of this field are detailed in the "Event payloads" section below. Every event has at least 1 byte of payload. The final (or only) byte of payload is used as a sum-to-zero byte. All valid events will have a byte checksum of zero. There is still a small chance that a corrupted event, including corrupted data could sum to zero, but this is probably negligible. If we want to be even more bulletproof, we could make the checksum be 2 bytes. If the 1 byte sum is < 0xff, the second checksum byte is 0. If the 1 byte sum is exactly 0xff, then the first checksum byte is 0xfe and the second checksum byte is 1. This is probably not worth the complexity or storage cost of 1 byte per event.

Event types

The SMBIOS specification defines several standard types and leaves room for OEM defined types. Below is the list of SMBIOS standard events (As of version 2.6 of the spec):

Event ID Meaning
0x00 Reserved
0x01 Single-bit ECC error
0x02 Multi-bit ECC error
0x03 Memory parity error
0x04 Bus timeout
0x05 IO channel check
0x06 Software NMI
0x07 POST memory resize
0x08 POST error
0x09 PCI parity error
0x0A PCI system error
0x0B CPU failure
0x0C EISA FailSafe Timer timeout
0x0D Correctable memory log disabled
0x0E Specific event type log disabled
0x10 System limit exceeded (e.g. temperature)
0x11 Async HW timer (WDT) timeout
0x12 System configuration information
0x13 Hard disk information
0x14 System reconfigured
0x15 Uncorrectable CPU-complex error
0x16 Log area reset/cleared
0x17 System boot
0x80-0xFE OEM-specific events
0xFF End of log

EventPayloads

The SMBIOS specification allows for a variable-length data payload on each event. This section details the payload for each event. If an event is not detailed here, there is no data payload other than the checksum. All payloads are assumed to be packed structures.

Event 0x01 - Single bit ECC memory error

This event indicates that a single bit ECC (SBE) memory error had occurred. The event payload dimm_number indicates the DIMM that failed.

struct {
    uint8_t  dimm_number; // Failed dimm number
    uint8_t  checksum;
};

Event 0x02 - Multi bit ECC memory error

This event indicates that a multi-bit ECC (MBE) memory error had occurred. The event payload is same as in Event 0x01.

struct {
    uint8_t  dimm_number; // Failed dimm number
    uint8_t  checksum;
};

Event 0x03 - Memory parity error

This event indicates that a memory parity error had occurred. The event payload is same as in Event 0x01.

struct {
    uint8_t  dimm_number; // Failed dimm number
    uint8_t  checksum;
};

Event 0x04 - Bus timeout

This event indicates that a bus timeout had occurred while accessing a bus. The payload which field indicates the type of the bus such as PCI, I2C etc. The sub_type field refers to the instance of the bus (say AMB I2C etc).

struct {
    uint8_t  which;
    uint16_t sub_type;
    uint8_t  checksum;
};

Event 0x05 - IO channel check

This event indicates that an error had occurred in an I/O channel in the system. The payload which field indicates type of the error as follows:

  • 0x0001 - Syncflood
  • 0x0002 - CRC error

The device field contains the PCI function address (PFA) of the device reporting the error. The PFA is encoded as:

  • bits [15:8] - PCI bus number
  • bits [7:3] - PCI device number
  • bits [2-0] - PCI function number
struct {
    uint8_t  which;
    uint16_t device;
    uint8_t  checksum;
};

Event 0x06 - S/W NMI

This payload-less event indicates that a S/W NMI had occurred.

struct {
    uint8_t  checksum;
};

Note: All events contains checksum as standard payload

Event 0x07 - POST memory resize

This event indicates that memory resizing had occurred during the firmware POST. There is no payload associated with this event except checksum.

struct {
    uint8_t  checksum;
};

Event 0x08 - POST error event

This event lists the POST errors occurred during the boot process and reports them in a bit pattern defined in SMBIOS specification.

struct {
    uint32_t post_error_map; // POST error bit map
    uint8_t  checksum;
};

Event 0x09 - PCI PERR

Event 0x0A - PCI SERR

These events indicates that PCI PERR/SERR had occurred and lists the device address in device field of the payload.

struct {
    uint16_t device;
    uint8_t  checksum;
};

Event 0x0B - CPU failure

This event indicates that a CPU failure had occurred. The nature of the error is indicated in sub_type field as follows:

  • 0x01 - CPU mismatch
  • 0x02 - CPU IERR# assertion
  • 0x03 - CPU BINIT# assertion

The cpu_number field identifies the CPU that caused the failure.

struct {
    uint8_t  sub_type;
    uint16_t cpu_number;
    uint8_t  checksum;
};

Event 0x0C - EISA timeout

This event indicates a timeout had occurred in the EISA bus. This event does not have any payload other than checksum.

struct {
    uint8_t  checksum;
};

Event 0x0D - Correctable memory log disabled

This event indicates that the logging of correctable memory errors had been disabled. This event does not have any payload other than checksum.

struct {
    uint8_t  checksum;
};

Event 0x0E - Log disabled

struct {
    uint8_t  event_type;
    uint8_t  checksum;
};

The event_type field indicates which type of logs are disabled. This event occurs when a particular event happens too frequently and the logger decided to stop logging it to avoid event log full condition (all events are of same type!).

Event 0x10 - System limit exceeded

struct {
    uint8_t  which;
    uint8_t  checksum;
};

The which field points to the parameter that exceeded the system limit. The values for this field are not defined.

Event 0x11 - Async HW timer (WDT) timeout

struct {
    uint8_t  timer;
    uint8_t  checksum;
};

The timer field indicates which watchdog timer indicated a timeout. Known values for this field are:

  • 0x0001 - TCO watchdog
  • 0x0002 - .Net watchdog

Event 0x12 - System configuration information

Event 0x13 - HDD information

No payload format is defined for these events.

Event 0x14 - System reconfigured

struct {
    uint8_t  which;
    uint8_t  checksum;
};

The which field indicates which aspect of the system was reconfigured. Known values for this field are:

  • 0x0001 - DIMMs reconfigured

Event 0x15 - Uncorrectable CPU-complex error

struct {
    uint8_t  subtype;
    uint16_t cpu_number;
    uint8_t  checksum;
};

The subtype field indicates type of the CPU error. cpu_number identifies the CPU that generated the error.

Event 0x16 - Log area reset/cleared

struct {
    uint16_t bytes;
    uint32_t boot_num;
    uint8_t  checksum;
};

The bytes field indicates how many bytes of the log area were discarded.

NOTE: Since the number of bytes cleared cannot be 0, the count field is 0-based. That is, the value of 0 indicates 1 byte is cleared. This way we can indicate values from 1 to 64K (0xFFFF). Also it is decided not to increase the size of bytes field from uint16_t to uint32_t because the current elog size can be only 64K.

Event 0x17 - System boot

struct {
    uint32_t bootnum;
    uint8_t  checksum;
};

The bootnum field contains the sequential boot number of the boot being logged. This will always be the first event logged during boot up.

Log shrinking

Because the SMBIOS specification does not provide ring-buffer semantics, we want to avoid the "log full" scenario if at all possible. When writing new events to the log, we must ensure that there is space available for the new event. If there is not space available, then the log must be shrunk and re-written.

Because flash bits are only programmable from '1' to '0', it is effectively a write-once medium, with sector erases required in order to re-write. Flash chips have a variety of erasable sector sizes. Because we have multiple parts, we must be ready to handle the worst case - 64 KB erasable areas. That means that you can only erase the device 64 KB at a time. There is no way to erase less than that (on at least some flash chips). When we need to shrink the log, we need to avoid any time window where the log is not present on the flash chip. To do this, we will use a double-buffered copy operation. When we near capacity on one 64 KB log, we will copy a portion of the log to a second flash block, and then indicate that the second block is valid. Only after the new log is committed to the flash medium can we erase the old log. This should provide a reliable way of shrinking the log without risking a corrupted or missing log.

Shrinking the log will not be a fast operation. We want to avoid doing the log shrink at run-time if at all possible. To do this, we will evaluate the fullness of the log at bootup time. If the log is found to have crossed the "fullness threshold" (defined below), we can do a log shrink immediately, which will hopefully reduce the need to do log shrinks when logging events. This does not eliminate the need to check for space when logging an event, but should reduce the likelihood of not finding space.

To shrink the log, we scan forward from the start of the log until we cross the shrink threshold. Because we only want to discard whole events, we might discard slightly more than the nominal shrink size. We can then copy the remainder of the log to the new location. The following pseudo-code describes the BIOS log-shrinking algorithm in more detail: elog_header *shrink_elog(elog_header *src_hdr) { if (src_hdr == log area #1) { dst_hdr = log area #2 } else { dst_hdr = log_area #1 }

    src_offset = 0
    nevents_dropped = 0
    while (src_offset < ELOG_SHRINK_SIZE) {
            src_offset += event_size(src_hdr.elog_data + src_offset)
            nevents_dropped++
    }
    log_size -= src_offset

    erase_flash(dst_hdr)
    copy_flash(dst_hdr.elog_data, src_hdr.elog_data + src_offset, log_size)
    copy_header(dst_hdr, src_hdr) /* not the sequence # */
    log_event(dst_hdr, partial clear)
    write_sequence_number(dst_hdr, src_hdr.elog_sequence + nevents_dropped)
    erase_elog_header(src_hdr)

}

The following constants are examples for a flash part with sector size of 64KB: /* * The SMBIOS spec defines the log area size (including header) as a 16 bit field. */ #define ELOG_TOTAL_SIZE 0xffff #define ELOG_SHRINK_SIZE 0x4000 #define ELOG_FULL_THRESHOLD 0xf000```