
Releases: open-power/skiboot

v7.1

18 Sep 14:16
v7.1

v6.2

14 Dec 05:40
v6.2

skiboot-6.2

skiboot v6.2 was released on Friday December 14th 2018. It is the first
release of skiboot 6.2, which becomes the new stable release of skiboot
following the 6.1 release, first released July 11th 2018.

Skiboot 6.2 will mark the basis for op-build v2.2.

skiboot v6.2 contains all bug fixes as of skiboot-6.0.14 and
skiboot-5.4.10 (the currently maintained stable releases).

For how the skiboot stable releases work, see Skiboot stable tree
rules and releases for details.

This release has been a longer cycle than typical for a variety of
reasons. It also contains a lot of cleanup work and minor bug fixes
(much like skiboot 6.1 did).

Over skiboot 6.1, we have the following changes:

General

Since v6.2-rc2:

  • i2c: Fix i2c request hang during opal init if timers are not checked

    If an i2c request cannot go through the first time, for example
    because the bus is found in error and needs a reset or because it
    is locked by the OCC, the underlying i2c implementation uses timers
    to manage the request. However, during opal init, opal pollers may
    not be called, depending on the context in which the i2c request is
    made. If the pollers are not called, the timers are not checked and
    we can end up with an i2c request which will not move forward, and
    skiboot hangs.

    Fix it by explicitly checking the timers if we are waiting for an
    i2c request to complete and it seems to be taking a while.
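
The hang and the fix can be modelled in a few lines. This is a toy simulation in Python, not skiboot code: `ToyTimers`, `wait_for_request`, and `run` are hypothetical names, and the retry is reduced to a single timer whose callback completes the request.

```python
class ToyTimers:
    """Toy timer list: callbacks only fire when check() is called."""

    def __init__(self):
        self.now = 0
        self.pending = []  # list of (expiry, callback)

    def schedule(self, delay, callback):
        self.pending.append((self.now + delay, callback))

    def check(self):
        due = [t for t in self.pending if t[0] <= self.now]
        self.pending = [t for t in self.pending if t[0] > self.now]
        for _, callback in due:
            callback()


def wait_for_request(timers, request, check_timers_while_waiting, max_ticks=100):
    """Spin until the request completes or max_ticks elapse.

    max_ticks exists only so that this toy model terminates.
    """
    for _ in range(max_ticks):
        if request["done"]:
            return True
        timers.now += 1
        if check_timers_while_waiting:
            timers.check()  # the fix: look at the timers ourselves
    return request["done"]


def run(check_timers_while_waiting):
    timers = ToyTimers()
    request = {"done": False}
    # The bus was busy on the first attempt, so a timer will finish the job.
    timers.schedule(5, lambda: request.update(done=True))
    return wait_for_request(timers, request, check_timers_while_waiting)
```

With `run(False)` nothing ever fires the timer and the wait never sees the request complete; with `run(True)` the waiter checks the timers itself and the request finishes.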

Since v6.1:

  • cpu: Quieten OS endian switch messages

    Users see these when loading an OS from Petitboot:

    [  119.486794100,5] OPAL: Switch to big-endian OS
    [  120.022302604,5] OPAL: Switch to little-endian OS
    

    Which is expected and doesn't provide any information the user can
    act on. Switch them to PR_INFO so they still appear in the log, but
    not on the serial console.

  • Recognise signed VERSION partition

    A few things need to change to support a signed VERSION partition:

    • A signed VERSION partition will be 4K +
      SECURE_BOOT_HEADERS_SIZE (4K).
    • The VERSION partition needs to be loaded after secure/trusted
      boot is set up, and therefore after nvram_init().
    • Added to the trustedboot resources array.

    This also moves the ipmi_dt_add_bmc_info() call to after
    flash_dt_add_fw_version() since it adds info to
    ibm,firmware-versions.

  • Run pollers in time_wait() when not booting

    This only bit us hard with hiomap in one scenario.

    Our OPAL API has been that OPAL_POLL_EVENTS may be needed to make
    forward progress on ongoing operations, and the internal to skiboot
    API has been that time_wait() of a suitable time will run pollers
    (on at least one CPU) to help ensure forward progress can be made.

    In a perfect world, interrupts are used, but they may a) be
    disabled, or b) be unusable for the thing we're doing because
    computers are generally terrible.

    Back in 3db397e (circa 2015), we changed skiboot so that we'd
    run pollers only on the boot CPU, and not if we held any locks. This
    was to reduce the chance of programming code that could deadlock, as
    well as to ensure that we didn't just thrash all the cachelines for
    running pollers all over a large system during boot, or hard spin on
    the same locks on all secondary CPUs.

    The problem arises if the OS we're booting makes an OPAL call early
    on, with interrupts disabled, that requires a poller to run to make
    forward progress. An example of this would be OPAL_WRITE_NVRAM
    early in Linux boot (where Linux sets up the partitions it wants) -
    something that occurs iff we've had to reformat NVRAM this boot
    (i.e. first boot or corrupted NVRAM).

    The hiomap implementation should arguably not rely on synchronous
    IPMI messages, but this is a future improvement (as was for mbox
    before it). The mbox-flash code solved this problem by spinning on
    check_timers().

    More generically though, the approach of running the pollers when no
    longer booting means we behave more in line with what the API is
    meant to be, rather than have this odd case of "time_wait() for a
    condition that could also be tripped by an interrupt works fine
    unless the OS is up and running but hasn't set interrupts up yet".
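
One way to read the resulting rule is as a small decision function. This is a sketch of the behaviour described above, not the skiboot source; `should_run_pollers` and its parameter names are hypothetical.

```python
def should_run_pollers(is_boot_cpu, holds_locks, still_booting):
    """Decide whether a time_wait()-style helper should run pollers.

    Sketch of the described policy:
    - never run pollers while holding locks (deadlock risk);
    - during boot, only the boot CPU runs them;
    - once booting is over, any CPU may run them.
    """
    if holds_locks:
        return False
    if still_booting:
        return is_boot_cpu
    return True
```

Under this reading, an early OPAL call from the OS on a secondary CPU (`is_boot_cpu=False`, `still_booting=False`) now gets pollers run on its behalf, which is the case that previously could not make forward progress.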

  • ipmi: Reduce ipmi_queue_msg_sync() polling loop time to 10ms

    On a plain boot, this reduces the time spent in OPAL by ~170ms on
    p9dsu. This is due to hiomap (currently) using synchronous IPMI
    messages.

    It will also significantly reduce latency on runtime flash
    operations for hiomap, as we'll spend typically 10-20ms in OPAL
    rather than 100-200ms. It's not an ideal solution to that, but
    it's a quick and obvious win for jitter.

  • core/device: NULL pointer dereference fix

  • core/flash: NULL pointer dereference fixes

  • core/cpu: Call memset with proper cpu_thread offset

  • libflash: Add ipmi-hiomap, and prefer it for PNOR access

    ipmi-hiomap implements the PNOR access control protocol formerly
    known as "the mbox protocol" but uses IPMI instead of the AST LPC
    mailbox as a transport. As there is no longer any mailbox involved
    in this alternate implementation the old protocol name is quite
    misleading, and so it has been renamed to "the hiomap protocol"
    (Host I/O Mapping protocol). The same commands and events are used
    though this client-side implementation assumes v2 of the protocol is
    supported by the BMC.

    The code is a heavily-reworked copy of the mbox-flash source and is
    introduced this way to allow for the mbox implementation's eventual
    removal.

    mbox-flash should in theory be renamed to mbox-hiomap for
    consistency, but as it is on life-support effective immediately we
    may as well just remove it entirely when the time is right.

  • opal/hmi: Handle early HMIs on thread0 when secondaries are still in
    OPAL.

    When the primary thread receives a CORE level HMI for timer facility
    errors while secondaries are still in OPAL, thread 0 ends up in
    rendez-vous waiting for secondaries to get into HMI handling. This
    is because OPAL runs with MSR(EE=0) and hence HMIs are delayed on
    secondary threads until they are given to the Linux OS. Fix this by
    adding a check for secondary state and forcing them into HMI
    handling by queuing a job on the secondary threads.

    I have tested this by injecting an HDEC parity error very early
    during Linux kernel boot. Recovery works fine for non-TB errors. But
    if TB is bad at this very early stage, we are already doomed.

    Without this patch we see:

    [  285.046347408,7] OPAL: Start CPU 0x0843 (PIR 0x0843) -> 0x000000000000a83c
    [  285.051160609,7] OPAL: Start CPU 0x0844 (PIR 0x0844) -> 0x000000000000a83c
    [  285.055359021,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
    [  285.055361439,7] HMI: [Loc: U78D3.ND1.WZS004A-P1-C48]: P:8 C:17 T:0: TFMR(2e12002870e14000) Timer Facility Error
    [  286.232183823,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 1 (sptr=0000ccc1)
    [  287.409002056,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 2 (sptr=0000ccc1)
    [  289.073820164,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 3 (sptr=0000ccc1)
    [  290.250638683,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 1 (sptr=0000ccc2)
    [  291.427456821,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 2 (sptr=0000ccc2)
    [  293.092274807,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 3 (sptr=0000ccc2)
    [  294.269092904,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 1 (sptr=0000ccc3)
    [  295.445910944,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 2 (sptr=0000ccc3)
    [  297.110728970,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 3 (sptr=0000ccc3)
    

    After this patch:

    [  259.401719351,7] OPAL: Start CPU 0x0841 (PIR 0x0841) -> 0x000000000000a83c
    [  259.406259572,7] OPAL: Start CPU 0x0842 (PIR 0x0842) -> 0x000000000000a83c
    [  259.410615534,7] OPAL: Start CPU 0x0843 (PIR 0x0843) -> 0x000000000000a83c
    [  259.415444519,7] OPAL: Start CPU 0x0844 (PIR 0x0844) -> 0x000000000000a83c
    [  259.419641401,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
    [  259.419644124,7] HMI: [Loc: U78D3.ND1.WZS004A-P1-C48]: P:8 C:17 T:0: TFMR(2e12002870e04000) Timer Facility Error
    [  259.419650678,7] HMI: Sending hmi job to thread 1
    [  259.419652744,7] HMI: Sending hmi job to thread 2
    [  259.419653051,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
    [  259.419654725,7] HMI: Sending hmi job to thread 3
    [  259.419654916,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
    [  259.419658025,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
    [  259.419658406,7] HMI: [Loc: U78D3.ND1.WZS004A-P1-C48]: P:8 C:17 T:2: TFMR(2e12002870e04000) Timer Facility Error
    [  259.419663095,7] HMI: [Loc: U78D3.ND1.WZS004A-P1-C48]: P:8 C:17 T:3: TFMR(2e12002870e04000) Timer Facility Error
    [  259.419655234,7] HMI: [Loc: U78D3.ND1.WZS004A-P1-C48]: P:8 C:17 T:1: TFMR(2e12002870e04000) Timer Facility Error
    [  259.425109779,7] OPAL: Start CPU 0x0845 (PIR 0x0845) -> 0x000000000000a83c
    [  259.429870681,7] OPAL: Start CPU 0x0846 (PIR 0x0846) -> 0x000000000000a83c
    [  259.434549250,7] OPAL: Start CPU 0x0847 (PIR 0x0847) -> 0x000000000000a83c
    
  • core/cpu: Fix memory allocation for job array

    fixes: 7a3f307 cor...


v6.0.3

23 May 22:23
v6.0.3

skiboot-6.0.3

skiboot 6.0.3 was released on Wednesday May 23rd, 2018. It replaces
skiboot-6.0.2 as the current stable release in the 6.0.x series.

It is recommended that 6.0.3 be used instead of any previous 6.0.x version.

Over skiboot-6.0.2, we have bug fixes related to i2c when booting in
secure mode, and to general functionality with a TPM present. These changes are:

  • p8-i2c: Remove force reset

    Force reset was added as an attempt to work around some issues with TPM
    devices locking up their I2C bus. In that particular case the problem
    was that the device would hold the SCL line down permanently due to a
    device firmware bug. The force reset doesn't actually do anything to
    alleviate the situation here, it just happens to reset the internal
    master state enough to make the I2C driver appear to work until
    something tries to access the bus again.

    On P9 systems with secure boot enabled there is the added problem
    of the "diagnostic mode" not being supported on I2C masters A, B, C and
    D. Diagnostic mode allows the SCL and SDA lines to be driven directly
    by software. Without this, force reset is impossible to implement.

    This patch removes the force reset functionality entirely since:

    a) it doesn't do what it's supposed to, and
    b) it's butt ugly code

    Additionally, turn p8_i2c_reset_engine() into p8_i2c_reset_port().
    There's no need to reset every port on a master in response to an
    error that occurred on a specific port.

  • libstb/i2c-driver: Bump max timeout

    We have observed some TPMs clock stretching the I2C bus for significant
    amounts of time when processing commands. The same TPMs also have
    errata that can result in permanently locking up a bus in response to
    an I2C transaction they don't understand. Use an excessively long
    timeout to prevent this in the field.

  • Add TPM timeout workaround

    Set the default timeout for any bus containing a TPM to one second. This
    is needed to work around a bug in the firmware of certain TPMs that will
    clock stretch the I2C port for up to a second. Additionally, when the
    TPM is clock stretching it responds to a STOP condition on the bus by
    bricking itself. Clearing this error requires a hard power cycle of the
    system since the TPM is powered by standby power.

v6.0.2

23 May 22:23
v6.0.2

skiboot-6.0.2

skiboot 6.0.2 was released on Friday May 18th, 2018. It replaces
skiboot-6.0.1 as the current stable release in the 6.0.x series.

It is recommended that 6.0.2 be used instead of any previous 6.0.x version.

Over skiboot-6.0.1, we have one bug fix:

  • cpu: Clear PCR SPR in opal_reinit_cpus()

    Currently if Linux boots with a non-zero PCR, things can go bad where
    some early userspace programs can take illegal instructions. This is
    being fixed in Linux, but in the meantime, we should clean up in
    skiboot also.

    This could exhibit itself as petitboot getting killed with SIGILL and
    no boot devices showing up, but only in a situation where you've done
    a kdump from a kernel running a p8 compat guest.

v6.0.1

23 May 22:22
v6.0.1

skiboot-6.0.1

skiboot 6.0.1 was released on Wednesday May 16th, 2018. It replaces
skiboot-6.0 as the current stable release in the 6.0.x series.

It is recommended that 6.0.1 be used instead of any previous 6.0.x version
due to the bug fixes and debugging enhancements in it.

Over skiboot-6.0, we have two bug fixes:

  • OpenBMC: use 0x3a as OEM command for partial add esel.

    This fixes the bug where skiboot would never send an eSEL to the BMC.

  • Add location code to NPU2 HMI logging

    The current HMI error message does not specify where the HMI
    error occurred.

    The original error message was ::

    NPU: FIR#0 FIR 0x0080100000000000 mask 0x009a48180f01ffff

    The enhanced error message is ::

    NPU2: [Loc: UOPWR.0000000-Node0-Proc0] P:0 FIR#0 FIR 0x0000100000000000 mask 0x009a48180f03ffff

v6.0

23 May 22:22
v6.0

skiboot-6.0

skiboot v6.0 was released on Friday May 11th 2018. It is the first
release of skiboot 6.0, which is the new stable release of skiboot
following the 5.11 release, first released April 6th 2018.

Skiboot 6.0 is the basis for op-build v2.0 and is required for
POWER9 systems.

skiboot v6.0 contains all bug fixes as of skiboot-5.11,
skiboot-5.10.5, and skiboot-5.4.9 (the currently maintained
stable releases). We do not expect any further stable releases in the
5.10.x series, nor in the 5.11.x series.

For how the skiboot stable releases work, see Skiboot stable tree rules
and releases for details.

Over skiboot-5.11, we have the following changes:

New Features

Since 6.0-rc1:

  • Update default stop-state-disable mask to cut only stop11

    Stability improvements in microcode for stop4/stop5 are
    available in upstream hcode images. Stop4 and stop5 can
    be safely enabled by default.

    Use ~0xE0000000 to cut all but stop0,1,2 in case there
    are any issues with stop4/5.

    example: ::

    nvram -p ibm,skiboot --update-config opal-stop-state-disable-mask=0x1FFFFFFF

    Note that DD2.1 chips that have a frequency <1867MHz possibly need to
    run an hcode image different from the default in op-build (set
    BR2_HCODE_LATEST_VERSION=y in your config).
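
The relationship between `~0xE0000000` and the `0x1FFFFFFF` passed to `nvram` can be checked with a short calculation. The sketch below assumes a 32-bit mask in which stop0 is the most significant bit, stop1 the next, and so on; that layout is inferred from "all but stop0,1,2" above rather than stated in these notes, and the helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def stop_bit(level):
    """Bit for stop<level>, ASSUMING stop0 is the most significant bit."""
    return 1 << (31 - level)


def disable_all_but(levels):
    """Build a disable mask that cuts every state except the given levels."""
    keep = 0
    for level in levels:
        keep |= stop_bit(level)
    return ~keep & MASK32


def disabled_levels(mask, max_level=15):
    """List the stop levels (0..max_level) that a disable mask cuts."""
    return [n for n in range(max_level + 1) if mask & stop_bit(n)]


print(hex(disable_all_but([0, 1, 2])))  # prints 0x1fffffff
```

Under the same assumption, `disabled_levels(0x1FFFFFFF)` starts at stop3, so stop4, stop5 and stop11 are all cut while stop0, stop1 and stop2 stay available.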

  • ibm,firmware-versions: add hcode to device tree

    op-build commit 736a08b996e292a449c4996edb264011dfe56a40
    added hcode to the VERSION partition, let's parse it out
    and let the user know.

  • ipmi: Add BMC firmware version to device tree

    The BMC Get Device ID command gives BMC firmware version details. Let's
    add this to the device tree. User space tools will use this information
    to display BMC version details.

Since 5.11:

  • Disable stop states from OPAL

    On ZZ, stop4,5,11 are enabled for PowerVM, even though doing
    so may cause problems with OPAL due to bugs in hcode.

    For other platforms, this isn't so much of an issue as
    we can just control stop states by the MRW. However the
    rebuild-the-world approach to changing values there is a bit
    annoying if you just want to rule out a specific stop state
    from being problematic.

    Provide an nvram option to override what's disabled in OPAL.

    The OPAL mask is currently ~0xE0000000 (i.e. all but stop 0,1,2)

    You can set an NVRAM override with: ::

    nvram -p ibm,skiboot --update-config opal-stop-state-disable-mask=0xFFFFFFF
    

    This nvram override will disable all stop states.

  • interrupts: Create an "interrupts" property in the OPAL node

    Deprecate the old "opal-interrupts". It's still there, but the new
    property follows the standard and allows us to specify whether an
    interrupt is level or edge sensitive.

    Similarly create "interrupt-names" whose content is identical to
    "opal-interrupts-names".

  • SBE: Add timer support on POWER9

    SBE on P9 provides a one-shot programmable timer facility. We can use
    this to implement OPAL timers and hence limit the reliance on the Linux
    heartbeat (similar to the HW timer facility provided by SLW on P8).

  • Add SBE driver support

    SBE (Self Boot Engine) on P9 has two different jobs:

    • Boot the chip up to the point the core is functional
    • Provide various services like timer, scom, stash MPIPL, etc., at runtime

    We will use SBE for various purposes like timer, MPIPL, etc.

  • opal:hmi: Add missing processor recovery reason string.

    With this patch now we see reason string printed for CORE_WOF[43] bit. ::

    [ 477.352234986,7] HMI: [Loc: U78D3.001.WZS004A-P1-C48]: P:8 C:22 T:3: Processor recovery occurred.
    [ 477.352240742,7] HMI: Core WOF = 0x0000000000100000 recovered error:
    [ 477.352242181,7] HMI: PC - Thread hang recovery

  • Add DIMM actual speed to device tree

    Recent HDAT provides the actual DIMM speed. Let's add this to the device tree.

  • Fix DIMM size property

    Today we parse the VPD blob to get DIMM size information. This is limited
    to FSP-based systems. HDAT provides the DIMM size value. Let's use that to
    populate the device tree, so that we can get size information on BMC-based
    systems as well.

  • PCI: Set slot power limit when supported

    The PCIe slot capability can be implemented in a root or switch
    downstream port to set the maximum power a card is allowed to draw
    from the system. This patch adds support for setting the power limit
    when the platform has defined one.

  • hdata/spira: parse vpd to add part-number and serial-number to xscom@ node

    Expected by FWTS and associates our processor with the part/serial
    number, which is obviously a good thing for one's own sanity.

Improved HMI Handling

  • opal/hmi: Add documentation for opal_handle_hmi2 call

  • opal/hmi: Generate hmi event for recovered HDEC parity error.

  • opal/hmi: check thread 0 tfmr to validate latched tfmr errors.

    Due to P9 errata, HDEC parity and TB residue errors are latched for
    non-zero threads 1-3 even if they are cleared. But these are not
    latched on thread 0. Hence, use xscom SCOMC/SCOMD to read thread 0 tfmr
    value and ignore them on non-zero threads if they are not present on
    thread 0.
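
The filtering step can be sketched as a mask operation. This is an illustration only: `validate_latched_errors` is a hypothetical helper, and the two bit positions used for the HDEC parity and TB residue errors are placeholders chosen for the example, not values taken from these notes. Bits are numbered IBM-style, with bit 0 as the most significant bit of the 64-bit register.

```python
def ppc_bit(n):
    """IBM bit numbering: bit 0 is the MSB of a 64-bit register."""
    return 1 << (63 - n)


# Placeholder positions for the two errors affected by the erratum.
HDEC_PARITY_ERROR = ppc_bit(26)
TB_RESIDUE_ERROR = ppc_bit(14)
LATCHED_MASK = HDEC_PARITY_ERROR | TB_RESIDUE_ERROR


def validate_latched_errors(thread_tfmr, thread0_tfmr, latched_mask=LATCHED_MASK):
    """Drop latched error bits that thread 0 does not also report."""
    stale = thread_tfmr & latched_mask & ~thread0_tfmr
    return thread_tfmr & ~stale
```

A thread other than thread 0 that still shows the HDEC parity bit keeps it only if thread 0 shows it too; every other TFMR bit passes through unchanged.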

  • opal/hmi: Print additional debug information in rendezvous.

  • opal/hmi: Fix handling of TFMR parity/corrupt error.

    While testing TFMR parity/corrupt error it has been observed that HMIs are
    delivered twice for this error:

    • First time HMI is delivered with HMER[4,5]=1 and TFMR[60]=1.
    • Second time HMI is delivered with HMER[4,5]=1 and TFMR[60]=0 with valid TB.

    On second HMI we end up throwing "HMI: TB invalid without core error
    reported" even though TB is in a valid state.

  • opal/hmi: Stop flooding HMI event for TOD errors.

    Fix the issue where every thread on the chip sends HMI event to host for
    TOD errors. TOD errors are reported to all the core/threads on the chip.
    Any one thread can fix the error and send event. Rest of the threads don't
    need to send HMI event unnecessarily.

  • opal/hmi: Fix soft lockups during TOD errors

    There are some TOD errors which do not affect the working of TOD and TB.
    They stay in a valid state. Hence we don't need rendez-vous for TOD errors
    that do not affect TB working.

    TOD errors that affect TOD/TB will report a global error on TFMR[44]
    along with bit 51, and they will go down the rendez-vous path as expected.

    But the TOD errors that do not affect the TB register set only TFMR bit 51.
    The TFMR bit 51 is cleared when any single thread clears the TOD error.
    Once cleared, bit 51 is reflected to all the cores on that chip. Any
    thread that reads the TFMR register after the error is cleared will see
    TFMR bit 51 reset. Hence the threads that see TFMR[51]=1 fall through to
    the rendez-vous path, and threads that see TFMR[51]=0 return doing
    nothing. This ends up in soft lockups in the host kernel.

    This patch fixes this issue by not considering the TOD interrupt (TFMR[51])
    as a core-global error, hence avoiding the rendez-vous path completely.
    Instead, threads that see TFMR[51]=1 will now take a different path that
    just does the TOD error recovery.
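
The routing described here can be sketched as follows. Only the two bits named in the text are considered, bits are numbered IBM-style (bit 0 is the most significant bit of the 64-bit register), and `recovery_path` and the constant names are hypothetical.

```python
def ppc_bit(n):
    """IBM bit numbering: bit 0 is the MSB of a 64-bit register."""
    return 1 << (63 - n)


TFMR_GLOBAL_ERROR = ppc_bit(44)   # TOD error that affects TOD/TB
TFMR_TOD_INTERRUPT = ppc_bit(51)  # set for every TOD error


def recovery_path(tfmr):
    """Pick a recovery path for a thread, given its view of TFMR."""
    if tfmr & TFMR_GLOBAL_ERROR:
        return "rendezvous"       # all threads must synchronise
    if tfmr & TFMR_TOD_INTERRUPT:
        return "tod-recovery"     # local fix, no rendez-vous needed
    return "nothing-to-do"        # another thread already cleared it
```

Because a thread that sees only TFMR[51] no longer enters the rendez-vous, it cannot end up waiting for sibling threads that read the register after the bit was cleared.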

  • opal/hmi: Do not send HMI event if no errors are found.

    For TOD errors, all the cores in the chip get HMIs. Any one thread from any
    core can fix the issue and TFMR will have error conditions cleared. The rest
    of the threads need not take any action if TOD errors are already cleared.
    Hence thread 0 of every core should get a fresh copy of TFMR before going
    ahead with the recovery path. Initialize recover = -1, so that if no errors
    are found, that thread need not send an HMI event to Linux. This helps stop
    flooding the host with HMI events from every thread even when no errors
    are found.

  • opal/hmi: Initialize the hmi event with old value of HMER.

    Do this before we check for TFAC errors. Otherwise the event at host console
    shows no error reported in HMER register.

    Without this patch the console event shows HMER with all zeros ::

    [ 216.753417] Severe Hypervisor Maintenance interrupt [Recovered]
    [ 216.753498] Error detail: Timer facility experienced an error
    [ 216.753509] HMER: 0000000000000000
    [ 216.753518] TFMR: 3c12000870e04000

    After this patch it shows old HMER values on host console: ::

    [ 2237.652533] Severe Hypervisor Maintenance interrupt [Recovered]
    [ 2237.652651] Error detail: Timer facility experienced an error
    [ 2237.652766] HMER: 0840000000000000
    [ 2237.652837] TFMR: 3c12000870e04000

  • opal/hmi: Rework HMI handling of TFAC errors

    This patch reworks the HMI handling for TFAC errors by introducing
    4 rendez-vous points to improve thread synchronization while handling
    timebase errors that require all threads to clear dirty data from the
    TB/HDEC registers before clearing the errors.

  • opal/hmi: Don't bother passing HMER to pre-recovery cleanup

    The test for TFAC error is now redundant so we remove it and
    remove the HMER argument.

  • opal/hmi: Move timer related error handling to a separate function

    Currently no functional change. This is a first step to completely
    rewriting how these things are handled.

  • opal/hmi: Add a new opal_handle_hmi2 that returns direct info to Linux

    It returns a 64-bit flags mask currently set to provide info
    about which timer facilities were lost, and whether an event
    was generated.

  • opal/hmi: Remove races in clearing HMER

    Writing to HMER acts as an "AND". The current code writes back the
    value we originally read with the bits we handled cleared. This is
    racy, if a new bit gets set in HW after the original read, we'll end
    up clearing it without handling it.

    Instead, use an all 1's mask with only the bit handled cleared.
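
The race is easy to reproduce with a toy register model. This is a Python sketch, not skiboot code: `ToyHmer`, `clear_racy`, and `clear_safe` are hypothetical names, and `hw_event` stands in for hardware setting a new bit between the read and the write.

```python
MASK64 = (1 << 64) - 1


def ppc_bit(n):
    """IBM bit numbering: bit 0 is the MSB of a 64-bit register."""
    return 1 << (63 - n)


class ToyHmer:
    """Register whose writes AND the written value into the contents."""

    def __init__(self, value=0):
        self.value = value

    def read(self):
        return self.value

    def write(self, value):
        self.value &= value & MASK64

    def hw_set(self, bits):
        self.value |= bits


def clear_racy(hmer, handled, hw_event=0):
    snapshot = hmer.read()
    hmer.hw_set(hw_event)            # new error arrives after the read
    hmer.write(snapshot & ~handled)  # zero bits in the snapshot wipe it out


def clear_safe(hmer, handled, hw_event=0):
    hmer.hw_set(hw_event)
    hmer.write(~handled & MASK64)    # all 1's except the handled bit
```

With the racy version, a bit that was zero in the snapshot is written back as zero and the new event is lost; with the all-ones mask, only the handled bit can be cleared.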

  • opal/hmi: Don't re-read HMER multiple times

    We want to make sure all reporting and actions are based
    upon the same snapshot of HMER...


v6.0-rc2

23 May 22:21
v6.0-rc2
v6.0-rc2 Pre-release

skiboot-6.0-rc2

skiboot v6.0-rc2 was released on Wednesday May 9th 2018. It is the second
release candidate of skiboot 6.0, which will become the new stable release
of skiboot following the 5.11 release, first released April 6th 2018.

Skiboot 6.0 will mark the basis for op-build v2.0 and will be required for
POWER9 systems.

skiboot v6.0-rc2 contains all bug fixes as of skiboot-5.11,
skiboot-5.10.5, and skiboot-5.4.9 (the currently maintained
stable releases). Once 6.0 is released, we do not expect any further
stable releases in the 5.10.x series, nor in the 5.11.x series.

For how the skiboot stable releases work, see Skiboot stable tree rules
and releases for details.

The current plan is to cut the final 6.0 in early May (maybe in a day or two
after this -rc if things look okay), with skiboot 6.0
being for all POWER8 and POWER9 platforms in op-build v2.0.

Over skiboot-6.0-rc1, we have the following changes:

  • Update default stop-state-disable mask to cut only stop11

    Stability improvements in microcode for stop4/stop5 are
    available in upstream hcode images. Stop4 and stop5 can
    be safely enabled by default.

    Use ~0xE0000000 to cut all but stop0,1,2 in case there
    are any issues with stop4/5.

    example: ::

    nvram -p ibm,skiboot --update-config opal-stop-state-disable-mask=0x1FFFFFFF

    Note that DD2.1 chips that have a frequency <1867MHz possibly need to
    run an hcode image different from the default in op-build (set
    BR2_HCODE_LATEST_VERSION=y in your config).

  • ibm,firmware-versions: add hcode to device tree

    op-build commit 736a08b996e292a449c4996edb264011dfe56a40
    added hcode to the VERSION partition, let's parse it out
    and let the user know.

  • ipmi: Add BMC firmware version to device tree

    The BMC Get Device ID command gives BMC firmware version details. Let's
    add this to the device tree. User space tools will use this information
    to display BMC version details.

  • mambo: Enable XER CA32 and OV32 bits on P9

    POWER9 adds 32 bit carry and overflow bits to the XER, but we need to
    set the relevant CTRL1 bit to enable them.

  • Makefile: Fix building natively on ppc64le

    When on ppc64le and CROSS is not set by the environment, make assumes
    ppc64 and sets a default CROSS. Check for ppc64le as well, so that
    'make' works out of the box on ppc64le.

  • p9dsu: timeout for variant detection, default to 2uess

  • core/direct-controls: improve p9_stop_thread error handling

    p9_stop_thread should fail the operation if it finds the thread was
    already quiesced. This implies something else is doing direct controls
    on the thread (e.g., pdbg) or there is some exceptional condition we
    don't know how to deal with. Proceeding here would cause things to
    trample on each other, for example the hard lockup watchdog trying to
    send a sreset to the core while it is stopped for debugging with pdbg
    will end in tears.

    If p9_stop_thread times out waiting for the thread to quiesce, do
    not hit it with a core_start direct control, because we don't know
    what state things are in and doing more things at this point is worse
    than doing nothing. There is no good recipe described in the workbook
    to de-assert the core_stop control if it fails to quiesce the thread.
    After timing out here, the thread may eventually quiesce and get
    stuck, but that's simpler to debug than undefined behaviour.

  • core/direct-controls: fix p9_cont_thread for stopped/inactive threads

    Firstly, p9_cont_thread should check that the thread actually was
    quiesced before it tries to resume it. Anything could happen if we
    try this from an arbitrary thread state.

    Then when resuming a quiesced thread that is inactive or stopped (in
    a stop idle state), we must not send a core_start direct control,
    clear_maint must be used in these cases.
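
The two rules can be summarised as a small decision function. This is a sketch of the described behaviour only; `resume_action` and its return strings are hypothetical, and the use of core_start for an active quiesced thread is inferred from the text rather than stated outright.

```python
def resume_action(quiesced, inactive_or_stopped):
    """Choose how to resume a thread, following the rules above.

    quiesced: the thread was actually quiesced by direct controls.
    inactive_or_stopped: the thread is inactive or in a stop idle state.
    """
    if not quiesced:
        return "fail"         # unknown state: refuse to touch the thread
    if inactive_or_stopped:
        return "clear_maint"  # core_start must not be sent here
    return "core_start"       # inferred default for a running thread
```

Checking the quiesced state first means an unexpected thread state turns into a reported failure instead of an arbitrary direct control being sent.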

  • occ: Use major version number while checking the pstate table format

    The minor version increments of the pstate table are backward
    compatible. The minor version is changed when the pstate table
    remains the same and the existing reserved bytes are used to point
    to new data. So use only the major version number while parsing the
    pstate table. This will allow old skiboot to parse the pstate table
    and handle minor version updates.
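
As an illustration, a parser that dispatches on the major version alone keeps working across minor bumps. The sketch below assumes the version is a single byte with the major number in the high nibble and the minor number in the low nibble; that encoding, the example values, and the function names are assumptions made for the example, not details given in these notes.

```python
def major_version(version):
    """Major number, ASSUMING it lives in the high nibble of a byte."""
    return (version >> 4) & 0xF


def minor_version(version):
    """Minor number, ASSUMING it lives in the low nibble of a byte."""
    return version & 0xF


def can_parse(version, supported_majors):
    """Accept any table whose major version is known; ignore the minor."""
    return major_version(version) in supported_majors
```

With `supported_majors={0x9}`, both `0x90` and a later `0x91` table are accepted, whereas an exact match on `0x90` would reject the newer one.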

  • hmi: Clear unknown debug trigger

    On some systems, seeing hangs like this when Linux starts: ::

    [ 170.027252763,5] OCC: All Chip Rdy after 0 ms
    [ 170.062930145,5] INIT: Starting kernel at 0x20011000, fdt at 0x30ae0530 366247 bytes)
    [ 171.238270428,5] OPAL: Switch to little-endian OS
    

    If you look at the in memory skiboot console (or do nvram -p ibm,skiboot --update-config log-level-driver=7) we see the console get
    spammed with: ::

    [ 5209.109790675,7] HMI: Received HMI interrupt: HMER = 0x0000400000000000
    [ 5209.109792716,7] HMI: Received HMI interrupt: HMER = 0x0000400000000000
    [ 5209.109794695,7] HMI: Received HMI interrupt: HMER = 0x0000400000000000
    [ 5209.109796689,7] HMI: Received HMI interrupt: HMER = 0x0000400000000000
    

    We're taking the debug trigger (bit 17) early on, before the
    hmi_debug_trigger function in the kernel is set up.

    This clears the HMI in Skiboot and reports to the kernel instead of
    bringing down the machine.

  • core/hmi: assign flags=0 in case nothing set by handle_hmi_exception

    Theoretically we could have returned junk to the OS in this parameter.

  • SLW: Fix mambo boot to use stop states

    After commit 35c66b8 ("SLW: Move MAMBO simulator checks to
    slw_init"), mambo boot no longer calls add_cpu_idle_state_properties()
    and as such we never enable stop states.

    After adding the call back, we get more testing coverage as well
    as faster mambo SMT boots.

  • phb4: Hardware init updates

    CFG Write Request Timeout was incorrectly set to informational and not
    fatal for both non-CAPI and CAPI, so set it to fatal. This was a
    mistake in the specification. Correcting this fixes a niche bug in
    escalation (which is necessary on pre-DD2.2) that can cause a checkstop
    due to a NCU timeout.

    In addition, set the values in the timeout control registers to match.
    This fixes an extremely rare and unreproducible bug, though the current
    timings don't make sense since they're higher than the NCU timeout (16)
    which will checkstop the machine anyway.

  • SLW: quieten 'Configuring self-restore' for DARN,NCU_SPEC_BAR and HRMOR

  • Experimental support for building with Clang

  • Improvements to testing and Travis CI

v6.0-rc1

01 May 06:06
v6.0-rc1
v6.0-rc1 Pre-release

skiboot-6.0-rc1


skiboot v6.0-rc1 was released on Tuesday May 1st 2018. It is the first
release candidate of skiboot 6.0, which will become the new stable
release of skiboot following the 5.11 release, first released April
6th 2018.

Skiboot 6.0 will mark the basis for op-build v2.0 and will be required
for POWER9 systems.

skiboot v6.0-rc1 contains all bug fixes as of skiboot-5.11,
skiboot-5.10.5, and skiboot-5.4.9 (the currently maintained stable
releases). Once 6.0 is released, we do not expect any further stable
releases in the 5.10.x series, nor in the 5.11.x series.

For how the skiboot stable releases work, see Skiboot stable tree
rules and releases for details.

The current plan is to cut the final 6.0 in early May, with skiboot
6.0 being for all POWER8 and POWER9 platforms in op-build v2.0.

Over skiboot-5.11, we have the following changes:

New Features

  • Disable stop states from OPAL

    On ZZ, stop4,5,11 are enabled for PowerVM, even though doing so may
    cause problems with OPAL due to bugs in hcode.

    For other platforms, this isn’t so much of an issue as we can just
    control stop states by the MRW. However the rebuild-the-world
    approach to changing values there is a bit annoying if you just want
    to rule out a specific stop state from being problematic.

    Provide an nvram option to override what’s disabled in OPAL.

    The OPAL mask is currently ~0xE0000000 (i.e. all but stop 0,1,2)

    You can set an NVRAM override with:

    nvram -p ibm,skiboot --update-config opal-stop-state-disable-mask=0xFFFFFFF

    This nvram override will disable all stop states.

  • interrupts: Create an “interrupts” property in the OPAL node

    Deprecate the old “opal-interrupts” property. It is still there, but
    the new property follows the standard and allows us to specify
    whether an interrupt is level or edge sensitive.

    Similarly create “interrupt-names” whose content is identical to
    “opal-interrupts-names”.

  • SBE: Add timer support on POWER9

    SBE on P9 provides a one-shot programmable timer facility. We can
    use this to implement OPAL timers and hence limit the reliance on
    the Linux heartbeat (similar to the HW timer facility provided by
    SLW on P8).

  • Add SBE driver support

    SBE (Self Boot Engine) on P9 has two different jobs:

    • Boot the chip up to the point the core is functional

    • Provide various services like timer, scom, stash MPIPL, etc., at
      runtime

    We will use SBE for various purposes like timer, MPIPL, etc.

  • opal:hmi: Add missing processor recovery reason string.

    With this patch, the reason string is now printed for the
    CORE_WOF[43] bit:

    [ 477.352234986,7] HMI: [Loc: U78D3.001.WZS004A-P1-C48]: P:8 C:22 T:3: Processor recovery occurred.
    [ 477.352240742,7] HMI: Core WOF = 0x0000000000100000 recovered error:
    [ 477.352242181,7] HMI: PC - Thread hang recovery

  • Add DIMM actual speed to device tree

    Recent HDAT provides the actual DIMM speed. Let’s add this to the
    device tree.

  • Fix DIMM size property

    Today we parse the VPD blob to get DIMM size information. This is
    limited to FSP based systems. HDAT provides the DIMM size value, so
    let’s use that to populate the device tree, so that we can get size
    information on BMC based systems as well.

  • PCI: Set slot power limit when supported

    The PCIe slot capability can be implemented in a root or switch
    downstream port to set the maximum power a card is allowed to draw
    from the system. This patch adds support for setting the power limit
    when the platform has defined one.

  • hdata/spira: parse vpd to add part-number and serial-number to
    xscom@ node

    Expected by FWTS and associates our processor with the part/serial
    number, which is obviously a good thing for one’s own sanity.
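
As a rough illustration of the stop-state disable mask from the
"Disable stop states from OPAL" item above, the sketch below assumes
the mask uses IBM bit numbering, so that stop level n corresponds to
bit 0x80000000 >> n. That mapping is inferred from the note that
~0xE0000000 leaves only stop 0, 1 and 2 enabled; it is not taken from
the skiboot source, and the helper name is made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Default mask quoted in the notes: everything except stop 0,1,2. */
#define DEFAULT_DISABLE_MASK ((uint32_t)~UINT32_C(0xE0000000))

/*
 * Hypothetical helper: returns true if the given stop level is disabled.
 * ASSUMPTION: stop level n is represented by bit (0x80000000 >> n).
 */
bool stop_level_disabled(uint32_t disable_mask, unsigned int level)
{
	if (level > 31)
		return true;	/* not representable in a 32-bit mask */
	return (disable_mask & (UINT32_C(0x80000000) >> level)) != 0;
}
```

Under that assumption, the default mask leaves stop 0, 1 and 2 usable
and reports stop 4, 5 and 11 as disabled.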

Improved HMI Handling

  • opal/hmi: Add documentation for opal_handle_hmi2 call

  • opal/hmi: Generate hmi event for recovered HDEC parity error.

  • opal/hmi: check thread 0 tfmr to validate latched tfmr errors.

    Due to P9 errata, HDEC parity and TB residue errors are latched for
    non-zero threads 1-3 even if they are cleared. But these are not
    latched on thread 0. Hence, use xscom SCOMC/SCOMD to read thread 0
    tfmr value and ignore them on non-zero threads if they are not
    present on thread 0.

  • opal/hmi: Print additional debug information in rendezvous.

  • opal/hmi: Fix handling of TFMR parity/corrupt error.

    While testing TFMR parity/corrupt error it has been observed that
    HMIs are delivered twice for this error:

    • First time HMI is delivered with HMER[4,5]=1 and TFMR[60]=1.

    • Second time HMI is delivered with HMER[4,5]=1 and TFMR[60]=0
      with valid TB.

    On second HMI we end up throwing “HMI: TB invalid without core error
    reported” even though TB is in a valid state.

  • opal/hmi: Stop flooding HMI event for TOD errors.

    Fix the issue where every thread on the chip sends an HMI event to
    the host for TOD errors. TOD errors are reported to all the
    cores/threads on the chip. Any one thread can fix the error and send
    the event. The rest of the threads don’t need to send an HMI event.

  • opal/hmi: Fix soft lockups during TOD errors

    There are some TOD errors which do not affect the working of TOD and
    TB. They stay in a valid state. Hence we don’t need a rendez-vous
    for TOD errors that do not affect TB working.

    TOD errors that affect TOD/TB will report a global error on
    TFMR[44] along with bit 51, and they will go down the rendez-vous
    path as expected.

    But TOD errors that do not affect the TB register set only TFMR
    bit 51. TFMR bit 51 is cleared when any single thread clears the
    TOD error. Once cleared, bit 51 is reflected to all the cores on
    that chip. Any thread that reads the TFMR register after the error
    is cleared will see TFMR bit 51 reset. Hence the threads that see
    TFMR[51]=1 fall through to the rendez-vous path and the threads that
    see TFMR[51]=0 return doing nothing. This ends up in soft lockups
    in the host kernel.

    This patch fixes this issue by not considering the TOD interrupt
    (TFMR[51]) as a core-global error and hence avoiding the rendez-vous
    path completely. Instead, threads that see TFMR[51]=1 will now take
    a different path that just does the TOD error recovery.

  • opal/hmi: Do not send HMI event if no errors are found.

    For TOD errors, all the cores in the chip get HMIs. Any one thread
    from any core can fix the issue and TFMR will have the error
    conditions cleared. The rest of the threads need not take any action
    if the TOD errors are already cleared. Hence thread 0 of every core
    should get a fresh copy of TFMR before going down the recovery path.
    Initialize recover = -1, so that if no errors are found the thread
    need not send an HMI event to Linux. This helps stop every thread
    flooding the host with HMI events even when no errors are found.

  • opal/hmi: Initialize the hmi event with old value of HMER.

    Do this before we check for TFAC errors. Otherwise the event at the
    host console shows no error reported in the HMER register.

    Without this patch the console event shows HMER with all zeros:

    [ 216.753417] Severe Hypervisor Maintenance interrupt [Recovered]
    [ 216.753498] Error detail: Timer facility experienced an error
    [ 216.753509] HMER: 0000000000000000
    [ 216.753518] TFMR: 3c12000870e04000

    After this patch it shows old HMER values on host console:

    [ 2237.652533] Severe Hypervisor Maintenance interrupt [Recovered]
    [ 2237.652651] Error detail: Timer facility experienced an error
    [ 2237.652766] HMER: 0840000000000000
    [ 2237.652837] TFMR: 3c12000870e04000

  • opal/hmi: Rework HMI handling of TFAC errors

    This patch reworks the HMI handling for TFAC errors by introducing 4
    rendez-vous points to improve thread synchronization while handling
    timebase errors that require all threads to clear dirty data from
    the TB/HDEC registers before clearing the errors.

  • opal/hmi: Don’t bother passing HMER to pre-recovery cleanup

    The test for TFAC error is now redundant so we remove it and remove
    the HMER argument.

  • opal/hmi: Move timer related error handling to a separate function

    Currently no functional change. This is a first step to completely
    rewriting how these things are handled.

  • opal/hmi: Add a new opal_handle_hmi2 that returns direct info to
    Linux

    It returns a 64-bit flags mask currently set to provide info about
    which timer facilities were lost, and whether an event was
    generated.

  • opal/hmi: Remove races in clearing HMER

    Writing to HMER acts as an “AND”. The current code writes back the
    value we originally read with the bits we handled cleared. This is
    racy: if a new bit gets set in HW after the original read, we’ll end
    up clearing it without handling it.

    Instead, use an all 1’s mask with only the bit handled cleared.

  • opal/hmi: Don’t re-read HMER multiple times

    We want to make sure all reporting and actions are based upon the
    same snapshot of HMER in case bits get added by HW while we are in
    OPAL.
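
The HMER race described in "Remove races in clearing HMER" above can
be sketched in plain C. The model below treats a write to HMER as an
AND with the current contents, as the notes describe. The hmer
variable, the helper names and the bit values used with them are
stand-ins for this illustration, not skiboot code or real HMER bit
assignments.

```c
#include <stdint.h>

/* Stand-in for the hardware register (hypothetical, for illustration). */
uint64_t hmer;

/* Model of the behaviour described above: a write ANDs into HMER. */
void hmer_write(uint64_t val)
{
	hmer &= val;
}

/* Racy: writes back the stale snapshot with the handled bit cleared. */
void clear_handled_racy(uint64_t snapshot, uint64_t handled)
{
	hmer_write(snapshot & ~handled);
}

/* Safe: all ones except the handled bit, so other bits are untouched. */
void clear_handled_safe(uint64_t handled)
{
	hmer_write(~handled);
}
```

If a second bit becomes set between taking the snapshot and the
write-back, the racy variant clears it along with the handled bit,
while the all-ones mask leaves it pending for a later pass.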

libflash and ffspart

Many improvements to the ffspart utility and libflash have come in
this release, making ffspart suitable for building PNOR images that
are bit-identical to those built by the existing tooling used by
op-build. The plan is to switch op-build to use this infrastructure in
the not too distant future.

  • libflash/blocklevel: Make read/write be ECC agnostic for callers

    The blocklevel abstraction allows for regions of the backing store
    to be marked as ECC protected so that blocklevel can decode/encode
    the ECC bytes into the buffer automatically without the caller
    having to be ECC aware.

    Unfortunately this abstraction is far from perfect, this is only
    useful if reads and w...


v5.10.5

01 May 06:08

skiboot-5.10.5


skiboot 5.10.5 was released on Tuesday April 24th, 2018. It replaces
skiboot-5.10.4 as the current stable release in the 5.10.x series.

It is recommended that 5.10.5 be used instead of any previous 5.10.x
version due to the bug fixes and debugging enhancements in it.

Over skiboot-5.10.4, we have four bug fixes:

  • npu2/hw-procedures: fence bricks on GPU reset

    The NPU workbook defines a way of fencing a brick and getting the
    brick out of fence state. We do have an implementation of bringing
    the brick out of fenced/quiesced state. We do the latter in our
    procedures, but to support run time reset we need to do the former.

    The fencing ensures that access to memory behind the links will not
    lead to HMIs, but instead SUEs will be populated in cache (in the
    case of speculation). The expectation is then that prior to and
    after reset, the operating system components will flush the cache
    for the region of memory behind the GPU.

    This patch does the following:

    1. Implements an npu2_dev_fence_brick() function to set/clear
      fence state

    2. Clears FIR bits prior to clearing the fence status

    3. Clears the fence status

    4. Takes the powerbus out of CQ fence much later now, in
      credits_check(), which is the last hardware procedure called
      after link training.

  • hdata/spira: parse vpd to add part-number and serial-number to
    xscom@ node

    Expected by FWTS and associates our processor with the part/serial
    number, which is obviously a good thing for one’s own sanity.

  • hw/imc: Check for pause_microcode_at_boot() return status

    pause_microcode_at_boot() loops through every chip’s ucode control
    block and pauses the ucode if it is in the running state. But it
    does not fail if any chip’s ucode is not initialised.

    Add code to return a failure if the ucode is not initialized in any
    of the chips. Since pause_microcode_at_boot() is called just before
    attaching the IMC device nodes in imc_init(), add code to check the
    function’s return value.

  • core/cpufeatures: Fix setting DARN and SCV HWCAP feature bits

    DARN and SCV have been assigned AT_HWCAP2 (32-63) bits:

    #define PPC_FEATURE2_DARN 0x00200000 /* darn random number insn */
    #define PPC_FEATURE2_SCV  0x00100000 /* scv syscall */

    A cpufeatures-aware OS will not advertise these to userspace without
    this patch.
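
For context on what "advertise these to userspace" means in the
cpufeatures item above, here is a minimal sketch of the bit test a
userspace program might perform. The two constants are the ones quoted
in that item; the helper name is invented for the example, and the
HWCAP2 word is passed in as a parameter so the check does not depend
on the machine it runs on.

```c
#include <stdbool.h>

#ifndef PPC_FEATURE2_DARN
#define PPC_FEATURE2_DARN 0x00200000 /* darn random number insn */
#endif
#ifndef PPC_FEATURE2_SCV
#define PPC_FEATURE2_SCV  0x00100000 /* scv syscall */
#endif

/* Hypothetical helper: true if the feature bit is set in the HWCAP2 word. */
bool hwcap2_has(unsigned long hwcap2, unsigned long feature)
{
	return (hwcap2 & feature) != 0;
}
```

On a powerpc Linux system with glibc, the word would typically come
from getauxval(AT_HWCAP2), declared in <sys/auxv.h>.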

v5.11

01 May 06:10

skiboot-5.11


skiboot v5.11 was released on Friday April 6th 2018. It is the first
release of skiboot 5.11, which is now the new stable release of
skiboot following the 5.10 release, first released February 23rd 2018.

We do not expect to keep the 5.11 branch around for long, and instead
expect to move quickly onto 6.0, which will mark the basis for
op-build v2.0 and will be required for POWER9 systems.

It is expected that skiboot 6.0 will follow very shortly. Consider
5.11 more of a beta release to 6.0 than anything. For POWER9 systems
it should certainly be more solid than previous releases though.

skiboot v5.11 contains all bug fixes as of skiboot-5.10.4 and
skiboot-5.4.9 (the currently maintained stable releases). There may
be more 5.10.x stable releases; it will depend on demand.

For how the skiboot stable releases work, see Skiboot stable tree
rules and releases for details.

Over skiboot-5.10, we have the following changes:

New Platforms

  • Add VESNIN platform support

    The Vesnin platform from YADRO is a 4 socket POWER8 system with up
    to 8TB of memory with 460GB/s of memory bandwidth in only 2U. Many
    kudos to the team from Yadro for submitting their code upstream!

New Features

  • fast-reboot: enable by default for POWER9

    • Fast reboot is disabled if NPU2 is present or CAPI2/OpenCAPI is
      used
  • PCI tunneled operations on PHB4

    • phb4: set PBCQ Tunnel BAR for tunneled operations

      P9 supports PCI tunneled operations (atomics and as_notify) that
      are initiated by devices.

      A subset of the tunneled operations require a response, that must
      be sent back from the host to the device. For example, an atomic
      compare and swap will return the compare status, as swap will only
      performed in case of success. Similarly, as_notify reports if the
      target thread has been woken up or not, because the operation may
      fail.

      To enable tunneled operations, a device driver must tell the host
      where it expects tunneled operation responses, by setting the PBCQ
      Tunnel BAR Response register with a specific value within the
      range of its BARs.

      This register is currently initialized by enable_capi_mode(). But,
      as tunneled operations may also operate in PCI mode, a new API is
      required to set the PBCQ Tunnel BAR Response register, without
      switching to CAPI mode.

      This patch provides two new OPAL calls to get/set the PBCQ Tunnel
      BAR Response register.

      Note: as there is only one PBCQ Tunnel BAR register, shared
      between all the devices connected to the same PHB, only one of
      these devices will be able to use tunneled operations, at any
      time.

    • phb4: set PHB CMPM registers for tunneled operations

      P9 supports PCI tunneled operations (atomics and as_notify) that
      require setting the PHB ASN Compare/Mask register with a 16-bit
      indication.

      This register is currently initialized by enable_capi_mode(). But,
      as tunneled operations may also work in PCI mode, the ASN
      Compare/Mask register should rather be initialized in
      phb4_init_ioda3().

      This patch also adds “ibm,phb-indications” to the device tree, to
      tell Linux the values of CAPI, ASN, and NBW indications, when
      supported.

      Tunneled operations tested by IBM in CAPI mode, by Mellanox
      Technologies in PCI mode.

  • Tie tm-suspend fw-feature and opal_reinit_cpus() together

    Currently opal_reinit_cpus(OPAL_REINIT_CPUS_TM_SUSPEND_DISABLED)
    always returns OPAL_UNSUPPORTED.

    This ties the tm suspend fw-feature to the
    opal_reinit_cpus(OPAL_REINIT_CPUS_TM_SUSPEND_DISABLED) so that when
    tm suspend is disabled, we correctly report it to the kernel. For
    backwards compatibility, it’s assumed tm suspend is available if the
    fw-feature is not present.

    Currently hostboot will clear fw-feature(TM_SUSPEND_ENABLED) on P9N
    DD2.1. P9N DD2.2 will set fw-feature(TM_SUSPEND_ENABLED). DD2.0 and
    below have TM disabled completely (not just suspend).

    We are using opal_reinit_cpus() to determine this setting (rather
    than the device tree/HDAT) as some future firmware may let us change
    this dynamically after boot. That is not the case currently though.

Power Management

  • SLW: Increase stop4-5 residency by 10x

    Using the DGEMM benchmark we observed a 5-9% drop in throughput
    between runs without and with stop4/5. In this benchmark the GPU
    waits on the CPU to wake up and provide the subsequent data block to
    compute. The wakeup latency accumulates over the run and shows up as
    a performance drop.

    Linux enters stop4/5 too aggressively for its wakeup latency.
    Increasing the residency from 1ms to 10ms makes the performance drop
    <1%.

  • occ: Set up OCC messaging even if we fail to setup pstates

    This means that we no longer hit this bug if we fail to get valid
    pstates from the OCC.

    [console-pexpect]#echo 1 > //sys/firmware/opal/sensor_groups//occ-csm0/clear
    echo 1 > //sys/firmware/opal/sensor_groups//occ-csm0/clear
    [ 94.019971181,5] CPU ATTEMPT TO RE-ENTER FIRMWARE! PIR=083d cpu @0x33cf4000 -> pir=083d token=8
    [ 94.020098392,5] CPU ATTEMPT TO RE-ENTER FIRMWARE! PIR=083d cpu @0x33cf4000 -> pir=083d token=8
    [ 10.318805] Disabling lock debugging due to kernel taint
    [ 10.318808] Severe Machine check interrupt [Not recovered]
    [ 10.318812] NIP [000000003003e434]: 0x3003e434
    [ 10.318813] Initiator: CPU
    [ 10.318815] Error type: Real address [Load/Store (foreign)]
    [ 10.318817] opal: Hardware platform error: Unrecoverable Machine Check exception
    [ 10.318821] CPU: 117 PID: 2745 Comm: sh Tainted: G M 4.15.9-openpower1 #3
    [ 10.318823] NIP: 000000003003e434 LR: 000000003003025c CTR: 0000000030030240
    [ 10.318825] REGS: c00000003fa7bd80 TRAP: 0200 Tainted: G M (4.15.9-openpower1)
    [ 10.318826] MSR: 9000000000201002 <SF,HV,ME,RI> CR: 48002888 XER: 20040000
    [ 10.318831] CFAR: 0000000030030258 DAR: 394a00147d5a03a6 DSISR: 00000008 SOFTE: 1

mbox based platforms

For platforms using the mbox protocol for host flash access (all BMC
based OpenPOWER systems, most OpenBMC based systems) there have been
some hardening efforts in the event of the BMC being poorly behaved.

  • mbox: Reduce default BMC timeouts

    Rebooting a BMC can take 70 seconds. Skiboot cannot possibly spin
    for 70 seconds waiting for a BMC to come back. This also makes the
    current default of 30 seconds a bit pointless: it is far too short
    to be a worst case wait time but too long to avoid hitting
    hardlockup detectors and wreaking havoc inside host Linux.

    Just change it to three seconds so that host Linux will survive.
    Reads and writes will fail, but at least the host stays up.

    Also refactored the waiting loop just a bit so that it’s easier to
    read.

  • mbox: Harden against BMC daemon errors

    Bugs present in the BMC daemon mean that skiboot gets presented with
    mbox windows of size zero. These windows cannot be valid and skiboot
    already detects these conditions.

    Currently skiboot warns quite strongly about the occurrence of these
    problems. The problem for skiboot is that it doesn’t take any
    action. Initially I wanted to avoid putting policy like this into
    skiboot, but since these bugs aren’t going away and skiboot barfing
    is leading to lockups and ultimately the host going down, something
    needs to be done.

    I propose that when we detect the problem we fail the mbox call and
    punt the problem back up to Linux. I don’t like it but at least it
    will cause errors to cascade and won’t bring the host down. I’m not
    sure how Linux is supposed to detect this or what it can even do but
    this is better than a crash.

    Diagnosing a failure to boot if skiboot itself fails to read flash
    may be marginally more difficult with this patch. This is because
    skiboot will now only print one warning about the zero sized window
    rather than continuously spitting it out.

Fast Reboot Improvements

Around fast-reboot we have made several improvements to harden the
fast reboot code paths and resort to a full IPL if something doesn’t
look right.

  • core/fast-reboot: zero memory after fast reboot

    This improves the security and predictability of the fast reboot
    environment.

    There can not be a secure fence between fast reboots, because a
    malicious OS can modify the firmware itself. However a well-behaved
    OS can have a reasonable expectation that OS memory regions it has
    modified will be cleared upon fast reboot.

    The memory is zeroed after all other CPUs come up from fast reboot,
    just before the new kernel is loaded and booted into. This allows
    image preloading to run concurrently, and will allow parallelisation
    of the clearing in future.

  • core/fast-reboot: verify mem regions before fast reboot

    Run the mem_region sanity checkers before proceeding with fast
    reboot.

    This is the beginning of proactive sanity checks on opal data for
    fast reboot (which complements the reactive disable_fast_reboot
    cases). Re-using and sharing any kind of debug code and unit test
    code for this is encouraged.

  • fast-reboot: occ: Only delete /ibm,opal/power-mgt nodes if they
    exist

  • core/fast-reboot: disable fast reboot upon fundamental
    entry/exit/locking errors

    This disables fast reboot in several more cases where serious errors
    like lock corruption or call re-entrancy are detected.

  • capp: Disable fast-reboot whenever enable_capi_mode() is called

    This patch updates phb4_set_capi_mode() to disable fast-reboot
    whenever enable_capi_mode() is called, irrespective to its...
