Releases: open-power/skiboot

v5.10.4

01 May 06:07
v5.10.4

skiboot-5.10.4


skiboot 5.10.4 was released on Wednesday April 4th, 2018. It replaces
skiboot-5.10.3 as the current stable release in the 5.10.x series.

It is recommended that 5.10.4 be used instead of any previous 5.10.x
version due to the bug fixes and debugging enhancements in it.

Over skiboot-5.10.3, we have one bug fix:

  • xive: disable store EOI support

    Hardware has limitations which would require putting a sync after
    each store EOI to make sure the MMIO operations that change the ESB
    state are ordered. This is a killer for performance and the PHBs do
    not support the sync. So remove the store EOI for the moment, until
    hardware is improved.

    Also, while we are changing the XIVE source flags, let’s fix the
    settings for the PHB4s, which should follow these rules:

    • SHIFT_BUG for DD10

    • STORE_EOI for DD20 and if enabled

    • TRIGGER_PAGE for DDx0 and if not STORE_EOI
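
The PHB4 rules listed above can be read as a small decision function. The
following is only a sketch of one reading of them (in particular, that
"DDx0" means any DD level): the flag values, the SrcFlag type and the
phb4_source_flags() helper are hypothetical names for illustration and are
not taken from the skiboot source.

```python
from enum import IntFlag


class SrcFlag(IntFlag):
    """Illustrative flag bits; the numeric values are placeholders."""
    NONE = 0
    SHIFT_BUG = 1
    STORE_EOI = 2
    TRIGGER_PAGE = 4


def phb4_source_flags(dd_level: int, store_eoi_enabled: bool) -> SrcFlag:
    """Pick XIVE source flags for a PHB4 from its DD level (10 or 20)."""
    flags = SrcFlag.NONE
    # SHIFT_BUG for DD10
    if dd_level == 10:
        flags |= SrcFlag.SHIFT_BUG
    # STORE_EOI for DD20, and only if store EOI is enabled
    if dd_level == 20 and store_eoi_enabled:
        flags |= SrcFlag.STORE_EOI
    # TRIGGER_PAGE for any DD level, but only when STORE_EOI is not set
    if not flags & SrcFlag.STORE_EOI:
        flags |= SrcFlag.TRIGGER_PAGE
    return flags
```

With store EOI disabled, as in this release, store_eoi_enabled would be
False and no source would end up with STORE_EOI in this model.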

v5.10.3

01 May 06:07
v5.10.3

skiboot-5.10.3


skiboot 5.10.3 was released on Wednesday March 28th, 2018. It replaces
skiboot-5.10.2 as the current stable release in the 5.10.x series.

It is recommended that 5.10.3 be used instead of any previous 5.10.x
version due to the bug fixes and debugging enhancements in it.

Over skiboot-5.10.2, we have a few improvements and bug fixes:

  • NPU2: dump NPU2 registers on npu2 HMI

    Due to the nature of debugging npu2 issues, folk are wanting the
    full list of NPU2 registers dumped when there’s a problem.

    This is different than the solution introduced in 5.10.1 as there we
    would dump the registers in a way that would trigger a FIR bit that
    would confuse PRD.

  • npu2: Add performance tuning SCOM inits

    Peer-to-peer GPU bandwidth latency testing has produced some tunable
    values that improve performance. Add them to our device
    initialization.

    File these under things that need to be cleaned up with nice
    #defines for the register names and bitfields when we get time.

    A few of the settings are dependent on the system’s particular
    NVLink topology, so introduce a helper to determine how many links
    go to a single GPU.

  • hw/npu2: Assign a unique LPARSHORTID per GPU

    This gets used elsewhere to index items in the XTS tables.

  • occ: Set up OCC messaging even if we fail to setup pstates

    This means that we no longer hit this bug if we fail to get valid
    pstates from the OCC.

    [console-pexpect]#echo 1 > //sys/firmware/opal/sensor_groups//occ-csm0/clear
    echo 1 > //sys/firmware/opal/sensor_groups//occ-csm0/clear
    [ 94.019971181,5] CPU ATTEMPT TO RE-ENTER FIRMWARE! PIR=083d cpu @0x33cf4000 -> pir=083d token=8
    [ 94.020098392,5] CPU ATTEMPT TO RE-ENTER FIRMWARE! PIR=083d cpu @0x33cf4000 -> pir=083d token=8
    [ 10.318805] Disabling lock debugging due to kernel taint
    [ 10.318808] Severe Machine check interrupt [Not recovered]
    [ 10.318812] NIP [000000003003e434]: 0x3003e434
    [ 10.318813] Initiator: CPU
    [ 10.318815] Error type: Real address [Load/Store (foreign)]
    [ 10.318817] opal: Hardware platform error: Unrecoverable Machine Check exception
    [ 10.318821] CPU: 117 PID: 2745 Comm: sh Tainted: G M 4.15.9-openpower1 #3
    [ 10.318823] NIP: 000000003003e434 LR: 000000003003025c CTR: 0000000030030240
    [ 10.318825] REGS: c00000003fa7bd80 TRAP: 0200 Tainted: G M (4.15.9-openpower1)
    [ 10.318826] MSR: 9000000000201002 <SF,HV,ME,RI> CR: 48002888 XER: 20040000
    [ 10.318831] CFAR: 0000000030030258 DAR: 394a00147d5a03a6 DSISR: 00000008 SOFTE: 1

  • core/fast-reboot: disable fast reboot upon fundamental
    entry/exit/locking errors

    This disables fast reboot in several more cases where serious errors
    like lock corruption or call re-entrancy are detected.

  • core/opal: allow some re-entrant calls

    This allows a small number of OPAL calls to succeed despite
    re-entering the firmware, and rejects others rather than aborting.

    This allows a system reset interrupt that interrupts OPAL to do
    something useful: sreset other CPUs, use the console (which allows
    xmon to work or stack traces to be printed), or reboot the system.

    Use OPAL_INTERNAL_ERROR when rejecting, rather than OPAL_BUSY, which
    is used for many other things that do not indicate a serious
    permanent error.

  • core/opal: abort in case of re-entrant OPAL call

    The stack is already destroyed by the time we get here, so there is
    not much point continuing.

  • npu2: Disable fast reboot

    Fast reboot does not yet work right with the NPU. It’s been disabled
    on NVLink and OpenCAPI machines. Do the same for NVLink2.

    This amounts to a port of 3e45779 (“npu: Fix broken fast
    reset”) from the npu code to npu2.

v5.11-rc1

01 May 06:09
v5.11-rc1
Pre-release

skiboot-5.11-rc1


skiboot v5.11-rc1 was released on Wednesday March 28th 2018. It is the
first release candidate of skiboot 5.11, which will become the new
stable release of skiboot following the 5.10 release, first released
February 23rd 2018.

We do not expect to keep the 5.11 branch around for long, and will
instead quickly move on to a 6.0, which will mark the basis for
op-build v2.0 and will be required for POWER9 systems.

skiboot v5.11-rc1 contains all bug fixes as of skiboot-5.10.3 and
skiboot-5.4.9 (the currently maintained stable releases). There may
be more 5.10.x stable releases; it will depend on demand.

For how the skiboot stable releases work, see Skiboot stable tree
rules and releases.

The current plan is to cut the final 5.11 in March, with skiboot 5.11
being for all POWER8 and POWER9 platforms in op-build v1.22. This
release is targeted to early POWER9 systems.

Over skiboot-5.10, we have the following changes:

New Platforms

  • Add VESNIN platform support

    The Vesnin platform from YADRO is a 4 socket POWER8 system with up
    to 8TB of memory with 460GB/s of memory bandwidth in only 2U. Many
    kudos to the team from Yadro for submitting their code upstream!

New Features

  • fast-reboot: enable by default for POWER9

    • Fast reboot is disabled if NPU2 is present or CAPI2/OpenCAPI is
      used
  • PCI tunneled operations on PHB4

    • phb4: set PBCQ Tunnel BAR for tunneled operations

      P9 supports PCI tunneled operations (atomics and as_notify) that
      are initiated by devices.

      A subset of the tunneled operations require a response that must
      be sent back from the host to the device. For example, an atomic
      compare and swap will return the compare status, as the swap will
      only be performed in case of success. Similarly, as_notify reports
      if the target thread has been woken up or not, because the
      operation may fail.

      To enable tunneled operations, a device driver must tell the host
      where it expects tunneled operation responses, by setting the PBCQ
      Tunnel BAR Response register with a specific value within the
      range of its BARs.

      This register is currently initialized by enable_capi_mode(). But,
      as tunneled operations may also operate in PCI mode, a new API is
      required to set the PBCQ Tunnel BAR Response register, without
      switching to CAPI mode.

      This patch provides two new OPAL calls to get/set the PBCQ Tunnel
      BAR Response register.

      Note: as there is only one PBCQ Tunnel BAR register, shared
      between all the devices connected to the same PHB, only one of
      these devices will be able to use tunneled operations, at any
      time.

    • phb4: set PHB CMPM registers for tunneled operations

      P9 supports PCI tunneled operations (atomics and as_notify) that
      require setting the PHB ASN Compare/Mask register with a 16-bit
      indication.

      This register is currently initialized by enable_capi_mode(). But,
      as tunneled operations may also work in PCI mode, the ASN
      Compare/Mask register should rather be initialized in
      phb4_init_ioda3().

      This patch also adds “ibm,phb-indications” to the device tree, to
      tell Linux the values of CAPI, ASN, and NBW indications, when
      supported.

      Tunneled operations tested by IBM in CAPI mode, by Mellanox
      Technologies in PCI mode.

  • Tie tm-suspend fw-feature and opal_reinit_cpus() together

    Currently opal_reinit_cpus(OPAL_REINIT_CPUS_TM_SUSPEND_DISABLED)
    always returns OPAL_UNSUPPORTED.

    This ties the tm suspend fw-feature to the
    opal_reinit_cpus(OPAL_REINIT_CPUS_TM_SUSPEND_DISABLED) so that when
    tm suspend is disabled, we correctly report it to the kernel. For
    backwards compatibility, it’s assumed tm suspend is available if the
    fw-feature is not present.

    Currently hostboot will clear fw-feature(TM_SUSPEND_ENABLED) on P9N
    DD2.1. P9N DD2.2 will set fw-feature(TM_SUSPEND_ENABLED). DD2.0 and
    below has TM disabled completely (not just suspend).

    We are using opal_reinit_cpus() to determine this setting (rather
    than the device tree/HDAT) as some future firmware may let us change
    this dynamically after boot. That is not the case currently though.
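
The tm-suspend behaviour described above can be modelled in a few lines.
This is a sketch under assumptions, not the skiboot implementation: the
"tm-suspend" dictionary key and both helper names are hypothetical, the
numeric return codes are illustrative stand-ins for the OPAL constants,
and it assumes the call succeeds only when suspend is actually disabled.

```python
# Illustrative stand-ins for the OPAL return codes named in the text.
OPAL_SUCCESS = 0
OPAL_UNSUPPORTED = -7


def tm_suspend_enabled(fw_features: dict) -> bool:
    """Backwards compatibility: assume TM suspend is available when the
    fw-feature is not present at all."""
    return fw_features.get("tm-suspend", True)


def reinit_cpus_tm_suspend_disabled(fw_features: dict) -> int:
    """Model of opal_reinit_cpus(OPAL_REINIT_CPUS_TM_SUSPEND_DISABLED).

    Assumed convention: success means "TM suspend really is disabled";
    otherwise the request cannot be honoured.
    """
    if tm_suspend_enabled(fw_features):
        return OPAL_UNSUPPORTED
    return OPAL_SUCCESS
```

Under this model, firmware that never advertises the feature keeps the old
behaviour (OPAL_UNSUPPORTED), while firmware that clears it lets the kernel
learn that suspend is disabled.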

Power Management

  • SLW: Increase stop4-5 residency by 10x

    Using the DGEMM benchmark we observed a 5-9% drop in throughput
    with stop4/5 enabled compared to with it disabled. In this benchmark
    the GPU waits on the CPU to wake up and provide the subsequent data
    block to compute. The wakeup latency accumulates over the run and
    shows up as a performance drop.

    Linux enters stop4/5 more aggressively than its wakeup latency
    warrants. Increasing the residency from 1ms to 10ms makes the
    performance drop <1%.

  • occ: Set up OCC messaging even if we fail to setup pstates

    This means that we no longer hit this bug if we fail to get valid
    pstates from the OCC.

    [console-pexpect]#echo 1 > //sys/firmware/opal/sensor_groups//occ-csm0/clear
    echo 1 > //sys/firmware/opal/sensor_groups//occ-csm0/clear
    [ 94.019971181,5] CPU ATTEMPT TO RE-ENTER FIRMWARE! PIR=083d cpu @0x33cf4000 -> pir=083d token=8
    [ 94.020098392,5] CPU ATTEMPT TO RE-ENTER FIRMWARE! PIR=083d cpu @0x33cf4000 -> pir=083d token=8
    [ 10.318805] Disabling lock debugging due to kernel taint
    [ 10.318808] Severe Machine check interrupt [Not recovered]
    [ 10.318812] NIP [000000003003e434]: 0x3003e434
    [ 10.318813] Initiator: CPU
    [ 10.318815] Error type: Real address [Load/Store (foreign)]
    [ 10.318817] opal: Hardware platform error: Unrecoverable Machine Check exception
    [ 10.318821] CPU: 117 PID: 2745 Comm: sh Tainted: G M 4.15.9-openpower1 #3
    [ 10.318823] NIP: 000000003003e434 LR: 000000003003025c CTR: 0000000030030240
    [ 10.318825] REGS: c00000003fa7bd80 TRAP: 0200 Tainted: G M (4.15.9-openpower1)
    [ 10.318826] MSR: 9000000000201002 <SF,HV,ME,RI> CR: 48002888 XER: 20040000
    [ 10.318831] CFAR: 0000000030030258 DAR: 394a00147d5a03a6 DSISR: 00000008 SOFTE: 1

mbox based platforms

For platforms using the mbox protocol for host flash access (all BMC
based OpenPOWER systems, most OpenBMC based systems) there have been
some hardening efforts in the event of the BMC being poorly behaved.

  • mbox: Reduce default BMC timeouts

    Rebooting a BMC can take 70 seconds. Skiboot cannot possibly spin
    for 70 seconds waiting for a BMC to come back. This also makes the
    current default of 30 seconds a bit pointless, as it is far too short
    to be a worst case wait time but too long to avoid hitting
    hardlockup detectors and wreaking havoc inside host Linux.

    Just change it to three seconds so that host Linux will survive:
    reads and writes will fail, but at least the host stays up.

    Also refactored the waiting loop just a bit so that it’s easier to
    read.

  • mbox: Harden against BMC daemon errors

    Bugs present in the BMC daemon mean that skiboot gets presented with
    mbox windows of size zero. These windows cannot be valid and skiboot
    already detects these conditions.

    Currently skiboot warns quite strongly about the occurrence of these
    problems. The problem for skiboot is that it doesn’t take any
    action. Initially I wanted to avoid putting policy like this into
    skiboot, but since these bugs aren’t going away and skiboot barfing
    is leading to lockups and ultimately the host going down, something
    needs to be done.

    I propose that when we detect the problem we fail the mbox call and
    punt the problem back up to Linux. I don’t like it but at least it
    will cause errors to cascade and won’t bring the host down. I’m not
    sure how Linux is supposed to detect this or what it can even do but
    this is better than a crash.

    Diagnosing a failure to boot if skiboot itself fails to read flash
    may be marginally more difficult with this patch. This is because
    skiboot will now only print one warning about the zero sized window
    rather than continuously spitting it out.
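
The two mbox hardening changes above amount to "give up quickly" and "fail
the call, warn once". A minimal sketch of both ideas follows. Only the
three second timeout and the zero-sized window condition come from the
notes; the function, class and parameter names are hypothetical, and the
polling interval is an arbitrary choice for the sketch.

```python
import time

MBOX_TIMEOUT_SECS = 3  # reduced default described above (previously 30)


def wait_for_bmc(poll, timeout=MBOX_TIMEOUT_SECS,
                 clock=time.monotonic, sleep=time.sleep):
    """Poll until the BMC responds or the timeout expires.

    Returns True if poll() reported a response, False on timeout. The
    caller is expected to fail the read or write on False rather than
    keep spinning.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        if poll():
            return True
        sleep(0.005)  # arbitrary polling interval for the sketch
    return False


class WindowChecker:
    """Reject zero-sized mbox windows, warning only the first time."""

    def __init__(self):
        self.warned = False
        self.log = []

    def check(self, size: int) -> bool:
        if size > 0:
            return True
        if not self.warned:
            self.log.append("mbox: BMC returned a zero-sized window")
            self.warned = True
        # The caller fails the mbox call and lets the error reach Linux.
        return False
```

Passing the clock and sleep functions in as parameters is a choice made
here so the timeout path can be exercised without waiting three real
seconds.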

Fast Reboot Improvements

Around fast-reboot we have made several improvements to harden the
fast reboot code paths and resort to a full IPL if something doesn’t
look right.

  • core/fast-reboot: zero memory after fast reboot

    This improves the security and predictability of the fast reboot
    environment.

    There can not be a secure fence between fast reboots, because a
    malicious OS can modify the firmware itself. However a well-behaved
    OS can have a reasonable expectation that OS memory regions it has
    modified will be cleared upon fast reboot.

    The memory is zeroed after all other CPUs come up from fast reboot,
    just before the new kernel is loaded and booted into. This allows
    image preloading to run concurrently, and will allow parallelisation
    of the clearing in future.

  • core/fast-reboot: verify mem regions before fast reboot

    Run the mem_region sanity checkers before proceeding with fast
    reboot.

    This is the beginning of proactive sanity checks on opal data for
    fast reboot (which complements the reactive disable_fast_reboot
    cases). Re-using and sharing any kind of debug code and unit test
    code for this is encouraged.

  • fast-reboot: occ: Only delete /ibm,opal/power-mgt nodes if they
    exist

  • core/fast-reboot: disable fast reboot upon fundamental
    entry/exit/locking errors

    This disables fast reboot in several more cases where serious errors
    like lock corruption or call re-entrancy are detected.

  • capp: Disable fast-reboot whenever enable_capi_mode() is called

    This patch updates phb4_set_capi_mode() to disable fast-reboot
    whenever enable_capi_mode() is called, irres...


v5.10.2

01 May 06:07
v5.10.2

skiboot-5.10.2


skiboot 5.10.2 was released on Tuesday March 6th, 2018. It replaces
skiboot-5.10.1 as the current stable release in the 5.10.x series.

Over skiboot-5.10.1, we have one improvement:

  • Tie tm-suspend fw-feature and opal_reinit_cpus() together

    Currently opal_reinit_cpus(OPAL_REINIT_CPUS_TM_SUSPEND_DISABLED)
    always returns OPAL_UNSUPPORTED.

    This ties the tm suspend fw-feature to the
    opal_reinit_cpus(OPAL_REINIT_CPUS_TM_SUSPEND_DISABLED) so that when
    tm suspend is disabled, we correctly report it to the kernel. For
    backwards compatibility, it’s assumed tm suspend is available if the
    fw-feature is not present.

    Currently hostboot will clear fw-feature(TM_SUSPEND_ENABLED) on P9N
    DD2.1. P9N DD2.2 will set fw-feature(TM_SUSPEND_ENABLED). DD2.0 and
    below has TM disabled completely (not just suspend).

    We are using opal_reinit_cpus() to determine this setting (rather
    than the device tree/HDAT) as some future firmware may let us change
    this dynamically after boot. That is not the case currently though.

v5.10.1

01 May 06:06
v5.10.1

skiboot-5.10.1


skiboot 5.10.1 was released on Thursday March 1st, 2018. It replaces
skiboot-5.10 as the current stable release in the 5.10.x series.

Over skiboot-5.10, we have an improvement for debugging NPU2/NVLink
problems and a bug fix. These changes are:

  • NPU2 HMIs: dump out a LOT of npu2 registers for debugging

  • libflash/blocklevel: Correct miscalculation in
    blocklevel_smart_erase()

    This fixes a bug in pflash.

    If blocklevel_smart_erase() detects that the smart erase fits
    entirely in one erase block, it has an early bail path. In this path
    it miscalculates where in the buffer the backend needs to read from
    to perform the final write.

    Fixes: #151

v5.10

23 Feb 03:36
v5.10

skiboot-5.10

skiboot v5.10 was released on Friday February 23rd 2018. It is the first
release of skiboot 5.10, and becomes the new stable release of skiboot
following the 5.9 release, first released October 31st 2017.

skiboot v5.10 contains all bug fixes as of skiboot-5.9.8 and
skiboot-5.4.9. We do not foresee any further 5.9.x releases.

For how the skiboot stable releases work, see stable-rules for details.

Over skiboot-5.9, we have the following changes:

New Features

Since skiboot-5.10-rc3:

  • sensor-groups: occ: Add support to disable/enable sensor group

    This patch adds a new opal call to enable/disable a sensor group.
    This call is used to select the sensor groups that need to be
    copied to main memory by OCC at runtime.

  • sensors: occ: Add energy counters

    Export the accumulated power values as energy sensors. The
    accumulator field of power sensors is used for representing energy
    counters, which can be exported as energy counters in the Linux
    hwmon interface.

  • sensors: Support reading u64 sensor values

    This patch adds support to read u64 sensor values. This also adds
    changes to the core and the backend implementation code to make this
    API the base call. The host can use this new API to read sensors up
    to 64 bits wide.

    This adds a list to store the pointer to the kernel u32 buffer, for
    older kernels making async sensor u32 reads.

  • dt: add /cpus/ibm,powerpc-cpu-features device tree bindings

    This is a new CPU feature advertising interface that is
    fine-grained, extensible, aware of privilege levels, and gives
    control of features to all levels of the stack (firmware,
    hypervisor, and OS).

    The design and binding specification is described in detail in doc/.

Since skiboot-5.10-rc2:

  • DT: Add "version" property under ibm,firmware-versions node

    First line of VERSION section in PNOR contains firmware version. Use
    that to add "version" property under firmware versions dt node.

    Sample output:

    root@xxx2:/proc/device-tree/ibm,firmware-versions# lsprop
    version          "witherspoon-ibm-OP9_v1.19_1.94"
    

Since skiboot-5.10-rc1:

  • hw/npu2: Implement logging HMI actions

Since skiboot-5.9:

  • hdata: Parse IPL FW feature settings

    Add parsing for the firmware feature flags in the HDAT. This
    indicates the settings of various parameters which are set at IPL
    time by firmware.

  • opal/xstop: Use nvram option to enable/disable sw checkstop.

    Add a mechanism to enable/disable sw checkstop by looking at nvram
    option opal-sw-xstop=<enable/disable>.

    For now this patch disables the sw checkstop trigger unless
    explicitly enabled through nvram option 'opal-sw-xstop=enable' for
    p9. This will allow an opportunity to get host kernel in panic path
    or xmon for unrecoverable HMIs or MCE, to be able to debug the issue
    effectively.

    To enable sw checkstop in OPAL, issue the following command:

    nvram -p ibm,skiboot --update-config opal-sw-xstop=enable
    

    NOTE: This is a workaround patch to disable sw checkstop by
    default to gain control in host kernel for better checkstop
    debugging. Once we have most of the checkstop issues
    stabilized/resolved, revisit this patch to enable sw checkstop by
    default.

    For p8 platform it will remain enabled by default unless explicitly
    disabled.

    To disable sw checkstop on p8, issue the following command:

    nvram -p ibm,skiboot --update-config opal-sw-xstop=disable
    
  • hdata: Parse SPD data

    Parse SPD data and populate device tree.

    List of properties parsed from SPD:

    [root@ltc-wspoon dimm@d00f]# lsprop .
    memory-id        0000000c (12)      # DIMM type
    product-version  00000032 (50)      # Module Revision Code
    device_type      "memory-dimm-ddr4"
    serial-number    15d9acb6 (366587062)
    status           "okay"
    size             00004000 (16384)
    phandle          000000bd (189)
    ibm,loc-code     "UOPWR.0000000-Node0-DIMM7"
    part-number      "36ASF2G72PZ-2G6B2   "
    reg              0000d007 (53255)
    name             "dimm"
    manufacturer-id  0000802c (32812)  # Vendor ID, we can get vendor name from this ID
    

    Also update documentation.

  • hdata: Add memory hierarchy under xscom node

    We have the memory to chip mapping but don't have the complete
    memory hierarchy. This patch adds the memory hierarchy under the
    xscom node. This is specific to P9 systems, as the hierarchy may
    change between processor generations.

    It uses memory controller ID details and populates nodes like:

    xscom@<addr>/mcbist@<mcbist_id>/mcs@<mcs_id>/mca@<mca_id>/dimm@<resource_id>

    This patch also adds a few properties under the dimm node. Finally,
    make sure xscom nodes are created before calling memory_parse().
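
The opal-sw-xstop defaults from the list above differ per processor
generation, which is easy to misread. The sketch below restates them as a
single function. The function name and the "p8"/"p9" strings are
hypothetical; only the option values enable and disable and the two
defaults come from the notes.

```python
from typing import Optional


def sw_xstop_enabled(proc_gen: str, nvram_value: Optional[str] = None) -> bool:
    """Decide whether the sw checkstop trigger is active.

    proc_gen is "p8" or "p9"; nvram_value is the value of the
    opal-sw-xstop nvram option, or None when the option is not set.
    """
    if proc_gen == "p9":
        # p9: disabled unless explicitly enabled
        return nvram_value == "enable"
    # p8: enabled unless explicitly disabled
    return nvram_value != "disable"
```

For example, sw_xstop_enabled("p9") is False with no nvram option set,
while sw_xstop_enabled("p8") is True.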

Fast Reboot and Quiesce

We have a preliminary fast reboot implementation for POWER9 systems,
which we look to enabling by default in the next release.

The OPAL Quiesce calls are designed to improve reliability and
debuggability around reboot and error conditions. See the full API
documentation for details: opal-quiesce.

  • fast-reboot: bare bones fast reboot implementation for POWER9

    This is an initial fast reboot implementation for p9 which has only
    been tested on the Witherspoon platform, and without the use of
    NPUs, NX/VAS, etc.

    This has worked reasonably well so far, with no failures in about
    100 reboots. It is hidden behind the traditional fast-reboot
    experimental nvram option, until more platforms and configurations
    are tested.

  • fast-reboot: move boot CPU clean-up logically together with
    secondaries

    Move the boot CPU clean-up and state transition to active, logically
    together with secondaries. Don't release secondaries from fast
    reboot hold until everyone has cleaned up and transitioned to
    active.

    This is cosmetic, but it is helpful to run the fast reboot state
    machine the same way on all CPUs.

  • fast-reboot: improve failure error messages

    Change existing failure error messages to PR_NOTICE so they get
    printed to the console, and add some new ones. It's not a more
    severe class because it falls back to IPL on failure.

  • fast-reboot: quiesce opal before initiating a fast reboot

    Switch fast reboot to use quiescing rather than "wait for a while".

    If firmware can not be quiesced, then fast reboot is skipped. This
    significantly improves the robustness of fast reboot in the face of
    bugs or unexpected latencies.

    Complexity of synchronization in fast-reboot is reduced, because we
    are guaranteed to be single-threaded when quiesce succeeds, so locks
    can be removed.

    In the case that firmware can be quiesced, then it will generally
    reduce fast reboot times by nearly 200ms, because quiescing usually
    takes very little time.

  • core: Add support for quiescing OPAL

    Quiescing is ensuring all host controlled CPUs (except the current
    one) are out of OPAL and prevented from entering. This can be used in
    debug and shutdown paths, particularly with system reset sequences.

    This patch adds per-CPU entry and exit tracking for OPAL calls, and
    adds logic to "hold" or "reject" at entry time, if OPAL is quiesced.

    An OPAL call is added, to expose the functionality to Linux, where
    it can be used for shutdown, kexec, and before generating sreset
    IPIs for debugging (so the debug code does not recurse into OPAL).

  • dctl: p9 increase thread quiesce timeout

    We require all instructions to be completed before a thread is
    considered stopped, by the dctl interface. Long running instructions
    like cache misses and CI loads may take a significant amount of time
    to complete, and timeouts have been observed in stress testing.

    Increase the timeout significantly, to cover this. The workbook just
    says to poll, but we like to have timeouts to avoid getting stuck in
    firmware.
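
The quiesce mechanism described in this section (per-CPU entry and exit
tracking, plus "hold" or "reject" at entry time) can be pictured with a
toy model. Everything here is an illustration: the class, method names and
string results are hypothetical, and a real implementation would wait with
a timeout for other CPUs to leave OPAL rather than giving up immediately.

```python
from enum import Enum


class Quiesce(Enum):
    NONE = "none"
    HOLD = "hold"
    REJECT = "reject"


class OpalModel:
    """Toy model of OPAL entry/exit tracking with quiescing."""

    def __init__(self, ncpus: int):
        self.in_opal = [False] * ncpus  # per-CPU "currently inside OPAL"
        self.state = Quiesce.NONE

    def enter(self, cpu: int) -> str:
        """Called at OPAL entry; returns what happens to the call."""
        if self.state is Quiesce.REJECT:
            return "rejected"
        if self.state is Quiesce.HOLD:
            return "held"  # a real CPU would wait here until resume
        self.in_opal[cpu] = True
        return "entered"

    def leave(self, cpu: int) -> None:
        self.in_opal[cpu] = False

    def quiesce(self, caller: int, mode: Quiesce) -> bool:
        """Block new entries, then check every CPU but the caller is out."""
        self.state = mode
        others_inside = any(
            inside for cpu, inside in enumerate(self.in_opal) if cpu != caller
        )
        if others_inside:
            self.state = Quiesce.NONE  # sketch: give up instead of waiting
            return False
        return True

    def resume(self) -> None:
        self.state = Quiesce.NONE
```

When quiesce() returns True in this model, only the calling CPU can be
running firmware code, which is the single-threaded guarantee fast reboot
relies on above.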

POWER9 power saving

There is much improved support for deeper sleep/idle (stop) states on
POWER9.

  • OCC: Increase max pstate check on P9 to 255

    This has changed from P8, we can now have > 127 pstates.

    This was observed on Boston during WoF bring up.

  • SLW: Add idle state stop5 for DD2.0 and above

    Adding stop5 idle state with rough residency and latency numbers.

  • SLW: Add p9_stop_api calls for IMC

    Add p9_stop_api for EVENT_MASK and PDBAR scoms. These scoms are
    lost on wakeup from stop11.

  • SCOM restore for DARN and XIVE

    While waking up from stop11, we want NCU_DARN_BAR to have the enable
    bit set. Without this stop_api call, the value restored is without
    the enable bit set. We lose NCU_SPEC_BAR when the quad goes into
    stop11; stop_api will restore it while waking up from stop11.

  • SLW: Call p9_stop_api only if deep_states are enabled

    All init time p9_stop_api calls have been isolated to
    slw_late_init. If p9_stop_api fails, then the deep states can be
    excluded from device tree.

    For p9_stop_api calls made after the device tree for cpuidle is
    created, has_deep_states will be used to check if the call is even
    required.

  • Better handle errors in setting up sleep states (p9_stop_api)

    We won't put affected stop states in the device tree if the wakeup
    engine is not present or has failed.

  • SCOM Restore: Increased the EQ SCOM restore limit.

    Commit increases the SCOM restore limit from 16 to 31.
    -...


v5.10-rc4

23 Feb 03:37
v5.10-rc4
Pre-release

skiboot-5.10-rc4

skiboot v5.10-rc4 was released on Wednesday February 21st 2018. It is
the fourth release candidate of skiboot 5.10, which will become the new
stable release of skiboot following the 5.9 release, first released
October 31st 2017.

skiboot v5.10-rc4 contains all bug fixes as of skiboot-5.9.8 and
skiboot-5.4.9 (the currently maintained stable releases). There may be
more 5.9.x stable releases; it will depend on demand.

For how the skiboot stable releases work, see stable-rules for details.

The current plan is to cut the final 5.10 in February, with skiboot 5.10
being for all POWER8 and POWER9 platforms in op-build v1.21. This
release will be targeted to early POWER9 systems.

Over skiboot-5.10-rc3, we have the following changes:

  • core: Fix mismatched names between reserved memory nodes &
    properties

    OPAL exposes reserved memory regions through the device tree in both
    new (nodes) and old (properties) formats.

    However, the names used for these don't match - we use a generated
    cell address for the nodes, but the plain region name for the
    properties.

    This fixes a warning from FWTS

  • sensor-groups: occ: Add support to disable/enable sensor group

    This patch adds a new opal call to enable/disable a sensor group.
    This call is used to select the sensor groups that need to be
    copied to main memory by OCC at runtime.

  • sensors: occ: Add energy counters

    Export the accumulated power values as energy sensors. The
    accumulator field of power sensors is used for representing energy
    counters, which can be exported as energy counters in the Linux
    hwmon interface.

  • sensors: Support reading u64 sensor values

    This patch adds support to read u64 sensor values. This also adds
    changes to the core and the backend implementation code to make this
    API the base call. The host can use this new API to read sensors up
    to 64 bits wide.

    This adds a list to store the pointer to the kernel u32 buffer, for
    older kernels making async sensor u32 reads.

  • dt: add /cpus/ibm,powerpc-cpu-features device tree bindings

    This is a new CPU feature advertising interface that is
    fine-grained, extensible, aware of privilege levels, and gives
    control of features to all levels of the stack (firmware,
    hypervisor, and OS).

    The design and binding specification is described in detail in doc/.

  • phb3/phb4/p7ioc: Document supported TCE sizes in DT

    Add a new property, "ibm,supported-tce-sizes", to advertise to Linux
    how big the available TCE sizes are. Each value is a bit shift, from
    smallest to largest.

  • phb4: Fix TCE page size

    The page sizes for TCEs on P9 were inaccurate and just copied from
    PHB3, so correct them.

  • Revert "pci: Shared slot state synchronisation for hot reset"

    An issue was found in shared slot reset where the system can be
    stuck in an infinite loop; pull the code out until there's a proper
    fix.

    This reverts commit 1172a6c.

  • hdata/iohub: Use only wildcard slots for pluggables

    We don't want to cause a VID:DID check against pluggable devices, as
    they may use multiple devids.

    Narrow the condition under which VID:DID is listed in the dt, so
    that we'll end up creating a wildcard slot for these instead.

  • increase log verbosity in debug builds

  • Add -debug to version on DEBUG builds

  • cpu_wait_job: Correctly report time spent waiting for job
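
Since each value in "ibm,supported-tce-sizes" is described above as a bit
shift, a consumer turns it into a page size with a left shift. A small
sketch follows; the helper names are hypothetical and the example shift
values used in the usage note are illustrative, not read from a real
device tree.

```python
def tce_sizes_from_shifts(shifts):
    """Convert bit shifts (smallest to largest) into sizes in bytes."""
    return [1 << shift for shift in shifts]


def format_size(nbytes: int) -> str:
    """Render a power-of-two byte count with a binary unit suffix."""
    for unit in ("B", "KiB", "MiB", "GiB"):
        if nbytes < 1024 or unit == "GiB":
            return f"{nbytes} {unit}"
        nbytes //= 1024
```

For instance, shifts of 12 and 16 correspond to 4 KiB and 64 KiB TCE
pages, since 1 << 12 is 4096 and 1 << 16 is 65536.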

v5.10-rc3

23 Feb 03:37
v5.10-rc3
Pre-release

skiboot-5.10-rc3

skiboot v5.10-rc3 was released on Thursday February 15th 2018. It is the
third release candidate of skiboot 5.10, which will become the new
stable release of skiboot following the 5.9 release, first released
October 31st 2017.

skiboot v5.10-rc3 contains all bug fixes as of skiboot-5.9.8 and
skiboot-5.4.9 (the currently maintained stable releases). There may be
more 5.9.x stable releases; it will depend on demand.

For how the skiboot stable releases work, see stable-rules for details.

The current plan is to cut the final 5.10 in February, with skiboot 5.10
being for all POWER8 and POWER9 platforms in op-build v1.21. This
release will be targeted to early POWER9 systems.

Over skiboot-5.10-rc2, we have the following changes:

  • vas: Disable VAS/NX-842 on some P9 revisions

    VAS/NX-842 are not functional on some P9 revisions, so disable them
    in hardware and skip creating their device tree nodes.

    Since the intent is to prevent the OS from configuring VAS/NX, we
    remove only the platform device nodes but leave the VAS/NX DT nodes
    under xscom (i.e. we don't skip add_vas_node() in hdata/spira.c).

  • phb4: Only escalate freezes on MMIO load where necessary

    In order to work around a hardware issue, MMIO load freezes were
    escalated to fences on every chip. Now that hardware no longer
    requires this, restrict escalation to the chips that actually need
    it.

  • pflash: Fix makefile dependency issue

  • DT: Add "version" property under ibm,firmware-versions node

    First line of VERSION section in PNOR contains firmware version. Use
    that to add "version" property under firmware versions dt node.

    Sample output:

    root@xxx2:/proc/device-tree/ibm,firmware-versions# lsprop
    version          "witherspoon-ibm-OP9_v1.19_1.94"
    
  • npu2: Disable TVT range check when in bypass mode

    On POWER9 the GPUs need to be able to access the MMIO memory space.
    Therefore the TVT range check needs to include the MMIO address
    space. As any possible range check would cover all of memory anyway,
    this patch just disables the TVT range check altogether when
    bypassing the TCE tables.

  • hw/npu2: support creset of npu2 devices

    creset calls the hw procedure that resets the PHY; we don't take
    them out of reset, just put them in reset.

    This fixes a kexec issue.

  • ATTN: Enable flush instruction cache bit in HID register

    In P9, we have to enable "flush the instruction cache" bit along
    with "attn instruction support" bit to trigger attention.

  • capi: Enable channel tag streaming for PHB in CAPP mode

    We re-enable channel tag streaming for PHB in CAPP mode as without
    it PEC was waiting for cresp for each DMA write command before
    sending a new DMA write command on the Powerbus. This resulted in
    much lower DMA write performance than expected.

    The patch updates enable_capi_mode() to remove the masking of the
    channel_streaming_en bit in the PBCQ Hardware Configuration
    Register. It also refactors the code that updates this register to
    use xscom_write_mask instead of xscom_read followed by xscom_write.
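
    The refactoring mentioned above replaces an open-coded
    read-modify-write with a single masked write. A minimal sketch of
    that pattern, using a plain variable in place of a real register
    (toy_reg, toy_read(), toy_write() and toy_write_mask() are
    hypothetical stand-ins and do not reflect the xscom_* signatures):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for one hardware register; real code would go over XSCOM. */
static uint64_t toy_reg;

uint64_t toy_read(void)
{
    return toy_reg;
}

void toy_write(uint64_t val)
{
    toy_reg = val;
}

/*
 * Masked write: replace only the bits selected by `mask` with the
 * corresponding bits of `val`, leaving all other bits untouched.
 */
void toy_write_mask(uint64_t val, uint64_t mask)
{
    uint64_t old = toy_read();

    toy_write((old & ~mask) | (val & mask));
}
```

    Keeping the read, merge and write in one helper means each call
    site only has to name the bits it cares about.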

  • core/device.c: Fix dt_find_compatible_node

    dt_find_compatible_node() and
    dt_find_compatible_node_on_chip() are used to find device nodes
    under a parent/root node with a given compatible property.

    dt_next(root, prev) is used to walk the child nodes of the given
    parent and takes two arguments - root contains the parent node to
    walk whilst prev contains the previous child to search from so that
    it can be used as an iterator over all children nodes.

    The first iteration of dt_find_compatible_node(root, prev) calls
    dt_next(root, root), which is not a well-defined operation as prev
    is assumed to be a child of the root node. The result is that when a
    node contains no children it will start returning the parent node's
    siblings until it hits the top of the tree, at which point a NULL
    dereference is attempted when looking for the root node's parent.

    Dereferencing NULL can result in undesirable data exceptions during
    system boot and untimely non-hilarious system crashes. dt_next()
    should not be called with prev == root. Instead we add a check to
    dt_next() such that passing prev = NULL will cause it to start
    iterating from the first child node (if any).
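
    The iteration rule described above can be modelled on a toy tree.
    This is a sketch only: struct toy_node and toy_next() are invented
    for illustration, and skiboot's dt_node layout and dt_next() body
    differ. It shows why starting from prev == NULL keeps the walk
    inside the subtree, even when the root has no children.

```c
#include <assert.h>
#include <stddef.h>

/* Toy node with explicit links; skiboot's struct dt_node differs. */
struct toy_node {
    struct toy_node *parent;
    struct toy_node *child;    /* first child, or NULL */
    struct toy_node *sibling;  /* next sibling, or NULL */
};

/*
 * Depth-first successor of `prev` within the subtree under `root`.
 * Passing prev == NULL starts the walk at root's first child, so a
 * childless root yields NULL instead of escaping into root's siblings.
 */
struct toy_node *toy_next(const struct toy_node *root,
                          const struct toy_node *prev)
{
    assert(root != NULL);

    if (prev == NULL)
        return root->child;

    if (prev->child)
        return prev->child;

    /* Climb until a sibling is found or we are back at root. */
    while (prev != NULL && prev != root) {
        if (prev->sibling)
            return prev->sibling;
        prev = prev->parent;
    }
    return NULL;
}
```

    A caller iterates with
    for (n = toy_next(root, NULL); n; n = toy_next(root, n)), which
    visits every descendant of root once and never leaves the subtree.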

  • stb: Put correct label (for skiboot) into container

    Hostboot will expect the label field of the stb header to contain
    "PAYLOAD" for skiboot or it will fail to load and run skiboot.

    The failure looks something like this:

    53.40896|ISTEP 20. 1 - host_load_payload
    53.65840|secure|Secureboot Failure plid = 0x90000755, rc = 0x1E07
    
    53.65881|System shutting down with error status 0x1E07
    53.67547|================================================
    53.67954|Error reported by secure (0x1E00) PLID 0x90000755
    53.67560|  Container's component ID does not match expected component ID
    53.67561|  ModuleId   0x09 SECUREBOOT::MOD_SECURE_VERIFY_COMPONENT
    53.67845|  ReasonCode 0x1e07 SECUREBOOT::RC_ROM_VERIFY
    53.67998|  UserData1   : 0x0000000000000000
    53.67999|  UserData2   : 0x0000000000000000
    53.67999|------------------------------------------------
    53.68000|  Callout type             : Procedure Callout
    53.68000|  Procedure                : EPUB_PRC_HB_CODE
    53.68001|  Priority                 : SRCI_PRIORITY_HIGH
    53.68001|------------------------------------------------
    53.68002|  Callout type             : Procedure Callout
    53.68003|  Procedure                : EPUB_PRC_FW_VERIFICATION_ERR
    53.68003|  Priority                 : SRCI_PRIORITY_HIGH
    53.68004|------------------------------------------------
    
  • hw/occ: Fix fast-reboot crash in P8 platforms.

    Commit 85a1de3 ("fast-boot: occ: Re-parse the pstate table
    during fast-boot") breaks fast-reboot on P8 platforms while
    re-initialising the OCC pstates. On P8 platforms OPAL adds two
    additional properties, #address-cells and #size-cells, under the
    ibm,opal/power-mgt/ DT node. During fast-reboot, adding the same
    properties to the same node again results in duplicate properties,
    and fast-reboot fails with the traces below:

    [  541.410373292,5] OCC: All Chip Rdy after 0 ms
    [  541.410488745,3] Duplicate property "#address-cells" in node /ibm,opal/power-mgt
    [  541.410694290,0] Aborting!
    CPU 0058 Backtrace:
     S: 0000000031d639d0 R: 000000003001367c   .backtrace+0x48
     S: 0000000031d63a60 R: 000000003001a03c   ._abort+0x4c
     S: 0000000031d63ae0 R: 00000000300267d8   .new_property+0xd8
     S: 0000000031d63b70 R: 0000000030026a28   .__dt_add_property_cells+0x30
     S: 0000000031d63c10 R: 000000003003ea3c   .occ_pstates_init+0x984
     S: 0000000031d63d90 R: 00000000300142d8   .load_and_boot_kernel+0x86c
     S: 0000000031d63e70 R: 000000003002586c   .fast_reboot_entry+0x358
     S: 0000000031d63f00 R: 00000000300029f4   fast_reset_entry+0x2c
    

    This patch fixes this issue by removing these two properties on P8
    while doing OCC pstates re-init in fast-reboot code path.

v5.10-rc2

23 Feb 03:37
Pre-release

skiboot-5.10-rc2

skiboot v5.10-rc2 was released on Friday February 9th 2018. It is the
second release candidate of skiboot 5.10, which will become the new
stable release of skiboot following the 5.9 release, first released
October 31st 2017.

skiboot v5.10-rc2 contains all bug fixes as of skiboot-5.9.8 and
skiboot-5.4.9 (the currently maintained stable releases). There may be
more 5.9.x stable releases; it will depend on demand.

For how the skiboot stable releases work, see stable-rules.

The current plan is to cut the final 5.10 in February, with skiboot 5.10
being for all POWER8 and POWER9 platforms in op-build v1.21. This
release will be targeted to early POWER9 systems.

Over skiboot-5.10-rc1, we have the following changes:

  • hw/npu2: Implement logging HMI actions

  • opal-prd: Fix FTBFS with -Werror=format-overflow

    i2c.c fails to compile with gcc7 and -Werror=format-overflow, as
    used in Debian Unstable and Ubuntu 18.04:

    i2c.c: In function ‘i2c_init’:
    i2c.c:211:15: error: ‘%s’ directive writing up to 255 bytes into a
    region of size 236 [-Werror=format-overflow=]
    
  • core/exception: beautify exception handler, add MCE-involved
    registers

    Print DSISR and DAR, to help with deciphering machine check
    exceptions, and improve the output a bit, decode NIP symbol, improve
    alignment, etc. Also print a specific header for machine check,
    because we do expect to see these if there is a hardware failure.

    Before:

    [    0.005968779,3] ***********************************************
    [    0.005974102,3] Unexpected exception 200 !
    [    0.005978696,3] SRR0 : 000000003002ad80 SRR1 : 9000000000001000
    [    0.005985239,3] HSRR0: 00000000300027b4 HSRR1: 9000000030001000
    [    0.005991782,3] LR   : 000000003002ad80 CTR  : 0000000000000000
    [    0.005998130,3] CFAR : 00000000300b58bc
    [    0.006002769,3] CR   : 40000004  XER: 20000000
    [    0.006008069,3] GPR00: 000000003002ad80 GPR16: 0000000000000000
    [    0.006015170,3] GPR01: 0000000031c03bd0 GPR17: 0000000000000000
    [...]
    

    After:

    [    0.003287941,3] ***********************************************
    [    0.003561769,3] Fatal MCE at 000000003002ad80   .nvram_init+0x24
    [    0.003579628,3] CFAR : 00000000300b5964
    [    0.003584268,3] SRR0 : 000000003002ad80 SRR1 : 9000000000001000
    [    0.003590812,3] HSRR0: 00000000300027b4 HSRR1: 9000000030001000
    [    0.003597355,3] DSISR: 00000000         DAR  : 0000000000000000
    [    0.003603480,3] LR   : 000000003002ad68 CTR  : 0000000030093d80
    [    0.003609930,3] CR   : 40000004         XER  : 20000000
    [    0.003615698,3] GPR00: 00000000300149e8 GPR16: 0000000000000000
    [    0.003622799,3] GPR01: 0000000031c03bc0 GPR17: 0000000000000000
    [...]
    
  • core/init: manage MSR[ME] explicitly, always enable

    The current boot sequence inherits MSR[ME] from the IPL firmware,
    and never changes it. Some environments disable MSR[ME] (e.g.,
    mambo), and others can enable it (hostboot).

    This has two problems. First, MSR[ME] must be disabled while in
    the process of taking over the interrupt vector from the previous
    environment. Second, after installing our machine check handler,
    MSR[ME] should be enabled to get some useful output rather than a
    checkstop.

  • fast-reboot: occ: Re-parse the pstate table during fast-reboot

    OCC shares the frequency list with the host by copying the pstate
    table to main memory in HOMER. This table is parsed during boot to
    create device-tree properties for frequency and pstate IDs. OCC can
    update the pstate table to present a new set of frequencies to the
    host, but the host will remain oblivious to these changes unless it
    is re-initialised with the updated device-tree CPU frequency
    properties. So this patch re-parses the pstate table and updates the
    device-tree properties during fast-reboot.

    OCC updates the pstate table when asked to do so using the
    pstate-table bias command. This is mainly used by the WOF team for
    characterization purposes.

  • fast-reboot: move pci_reset error handling into fast-reboot code

    pci_reset() currently does a platform reboot if it fails. It should
    not know about fast-reboot at this level, so instead have it return
    an error, and the fast reboot caller will do the platform reboot.

    The code essentially does the same thing, but flexibility is
    improved. Ideally the fast reboot code should perform pci_reset and
    all such fail-able operations before the CPU resets itself and
    destroys its own stack. That's not the case now, but that should be
    the goal.

  • capi: Fix the max tlbi divider and the directory size.

    Switch to 512KB mode (directory size) as we don’t use bit 48 of the
    tag in addressing the array. This mode is controlled by the Snoop
    CAPI Configuration Register. Also set to its maximum the number of
    data polls received before signalling that the TLBI hang detect
    timer has expired. The value of '0000' is equal to 16.

  • npu2/tce: Fix page size checking

    The page size is encoded in the TVT data [59:63] as @shift+11 but
    the tce_kill handler does not do the math right; this fixes it.
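
    Reading the note as meaning that the five-bit field holds the page
    shift minus 11 (an interpretation, not something the text
    confirms), the decode can be sketched as below.
    toy_tvt_page_size() and the mask name are hypothetical, and no
    reserved or "disabled" encodings are special-cased.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bits 59:63 in IBM (MSB = bit 0) numbering are the five
 * least-significant bits of a 64-bit value.
 */
#define TOY_TVT_PSIZE_MASK 0x1fULL

/* Assumption: the field stores (page shift - 11), so 1 means 4 KiB. */
uint64_t toy_tvt_page_size(uint64_t tvt)
{
    unsigned int field = (unsigned int)(tvt & TOY_TVT_PSIZE_MASK);

    return 1ULL << (field + 11);
}
```

    Under that assumption a field value of 5 decodes to a 64 KiB page,
    which a handler could then compare against the page size passed in
    by the caller.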

  • stb: Enforce secure boot if called before libstb initialized

  • stb: Correctly error out when no PCR for resource

  • core/init: move imc catalog preload init after the STB init.

    To be on the safe side, move the imc catalog preload after the STB
    init to make sure the imc catalog resource gets verified and
    measured properly during loading when both secure and trusted boot
    modes are on.

  • libstb: fix failure of calling trusted measure without STB
    initialization.

    When we load a flash resource during OPAL init, STB calls trusted
    measure to measure the given resource. If a flash resource gets
    loaded before STB initialization, trusted measure cannot measure it
    properly.

    So this patch fixes this issue by calling trusted measure only if
    the corresponding trusted init was done.

    The ideal fix is to make sure STB init is done first during init,
    and only then load the flash resources; that way STB can properly
    verify and measure all the resources.

  • libstb: fix failure of calling cvc verify without STB
    initialization.

    Currently, at various stages of OPAL init, we load various PNOR
    partition containers from the flash device. When we load a flash
    resource, STB calls the CVC verify and trusted measure (sha512)
    functions. So when a flash resource gets loaded before STB
    initialization, the cvc verify function fails to start the verify
    and enforce the boot.

    Below is one example failure, where our VERSION partition gets
    loaded early in the boot stage, before STB initialization is done.

    This is with secure mode off:

    STB: VERSION NOT VERIFIED, invalid param. buf=0x305ed930, len=4096 key-hash=0x0 hash-size=0

    In the same code path when secure mode is on, the boot process will
    abort.

    So this patch fixes this issue by calling cvc verify only if STB
    init was done.

    We also need a permanent fix in the init path to ensure STB init
    gets done first, before loading any other flash resources.

  • libstb/tpm_chip: Add missing new line to print messages.

  • libstb: increase the log level of verify/measure messages to
    PR_NOTICE.

    Currently libstb logs the verify and hash calculation messages at
    PR_INFO level. So when secure boot enforcement happens while
    loading the last flash resource (e.g. BOOTKERNEL), the previous
    verify and measure messages are not logged to the console, and it is
    not clear to the end user which resources were verified and
    measured. So this patch fixes this by raising the log level to
    PR_NOTICE.
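
    The effect of the level change can be sketched as a threshold
    comparison. The numeric values, and the idea that the console
    threshold sits at PR_NOTICE, are assumptions for this sketch,
    chosen so that PR_INFO messages are filtered and PR_NOTICE ones are
    not. The toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed syslog-style ordering: smaller number = more severe. */
enum toy_log_level {
    TOY_PR_ERR = 3,
    TOY_PR_NOTICE = 5,
    TOY_PR_INFO = 6,
};

/*
 * A message reaches the console only if it is at least as severe as
 * the console threshold.
 */
bool toy_reaches_console(enum toy_log_level msg,
                         enum toy_log_level threshold)
{
    return msg <= threshold;
}
```

    With the threshold at TOY_PR_NOTICE, a message logged at
    TOY_PR_INFO is dropped from the console while the same message
    logged at TOY_PR_NOTICE is shown.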

v5.10-rc1

23 Feb 03:38
Pre-release

skiboot-5.10-rc1

skiboot v5.10-rc1 was released on Tuesday February 6th 2018. It is the
first release candidate of skiboot 5.10, which will become the new
stable release of skiboot following the 5.9 release, first released
October 31st 2017.

skiboot v5.10-rc1 contains all bug fixes as of skiboot-5.9.8 and
skiboot-5.4.9 (the currently maintained stable releases). There may be
more 5.9.x stable releases; it will depend on demand.

For how the skiboot stable releases work, see stable-rules.

The current plan is to cut the final 5.10 in February, with skiboot 5.10
being for all POWER8 and POWER9 platforms in op-build v1.21. This
release will be targeted to early POWER9 systems.

Over skiboot-5.9, we have the following changes:

New Features

  • hdata: Parse IPL FW feature settings

    Add parsing for the firmware feature flags in the HDAT. This
    indicates the settings of various parameters which are set at IPL
    time by firmware.

  • opal/xstop: Use nvram option to enable/disable sw checkstop.

    Add a mechanism to enable/disable sw checkstop by looking at nvram
    option opal-sw-xstop=<enable/disable>.

    For now this patch disables the sw checkstop trigger on p9 unless
    explicitly enabled through nvram option 'opal-sw-xstop=enable'.
    This gives the host kernel an opportunity to get into the panic path
    or xmon for unrecoverable HMIs or MCEs, so the issue can be debugged
    effectively.

    To enable sw checkstop in opal, issue the following command:

    nvram -p ibm,skiboot --update-config opal-sw-xstop=enable
    

    NOTE: This is a workaround patch to disable sw checkstop by
    default to gain control in host kernel for better checkstop
    debugging. Once we have most of the checkstop issues
    stabilized/resolved, revisit this patch to enable sw checkstop by
    default.

    For p8 platforms it will remain enabled by default unless explicitly
    disabled.

    To disable sw checkstop on p8, issue the following command:

    nvram -p ibm,skiboot --update-config opal-sw-xstop=disable
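
    The default behaviour described above can be summarised as a small
    decision function. This is a sketch, assuming the option value is
    available as a string (or NULL when unset);
    toy_sw_xstop_enabled() and enum toy_proc_gen are hypothetical
    names, not skiboot's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum toy_proc_gen { TOY_PROC_P8, TOY_PROC_P9 };

/*
 * `opt` is the value of the opal-sw-xstop nvram option, or NULL if it
 * is not set.
 *   p9: off unless explicitly set to "enable".
 *   p8: on unless explicitly set to "disable".
 */
bool toy_sw_xstop_enabled(enum toy_proc_gen gen, const char *opt)
{
    assert(gen == TOY_PROC_P8 || gen == TOY_PROC_P9);

    if (gen == TOY_PROC_P9)
        return opt != NULL && strcmp(opt, "enable") == 0;

    return !(opt != NULL && strcmp(opt, "disable") == 0);
}
```

    Any unrecognised value falls back to the per-generation default in
    this sketch; whether the firmware treats such values the same way
    is not stated in the note.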
    
  • hdata: Parse SPD data

    Parse SPD data and populate device tree.

    List of properties parsed from SPD:

    [root@ltc-wspoon dimm@d00f]# lsprop .
    memory-id        0000000c (12)      # DIMM type
    product-version  00000032 (50)      # Module Revision Code
    device_type      "memory-dimm-ddr4"
    serial-number    15d9acb6 (366587062)
    status           "okay"
    size             00004000 (16384)
    phandle          000000bd (189)
    ibm,loc-code     "UOPWR.0000000-Node0-DIMM7"
    part-number      "36ASF2G72PZ-2G6B2   "
    reg              0000d007 (53255)
    name             "dimm"
    manufacturer-id  0000802c (32812)  # Vendor ID, we can get vendor name from this ID
    

    Also update documentation.

  • hdata: Add memory hierarchy under xscom node

    We have the memory to chip mapping but don't have the complete
    memory hierarchy. This patch adds the memory hierarchy under the
    xscom node. This is specific to P9 systems, as the hierarchy may
    change between processor generations.

    It uses memory controller ID details and populates nodes like:

    xscom@<addr>/mcbist@<mcbist_id>/mcs@<mcs_id>/mca@<mca_id>/dimm@<resource_id>

    This patch also adds a few properties under the dimm node. Finally,
    make sure the xscom nodes are created before calling memory_parse().
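
    A sketch of how such a path could be assembled from the IDs.
    toy_dimm_path() is hypothetical, and printing each unit address in
    hexadecimal is an assumption made for illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a node path following the layout in the note.
 * Returns 0 on success, -1 if `buf` is too small for the full path.
 */
int toy_dimm_path(char *buf, size_t size, uint64_t xscom_addr,
                  uint32_t mcbist_id, uint32_t mcs_id,
                  uint32_t mca_id, uint32_t resource_id)
{
    int n;

    assert(buf != NULL && size > 0);

    n = snprintf(buf, size,
                 "xscom@%" PRIx64 "/mcbist@%" PRIx32 "/mcs@%" PRIx32
                 "/mca@%" PRIx32 "/dimm@%" PRIx32,
                 xscom_addr, mcbist_id, mcs_id, mca_id, resource_id);

    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

    Each path component corresponds to one level of the hierarchy, so a
    DIMM's controller chain can be read directly from where its node
    sits.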

Fast Reboot and Quiesce

We have a preliminary fast reboot implementation for POWER9 systems,
which we plan to enable by default in the next release.

The OPAL Quiesce calls are designed to improve reliability and
debuggability around reboot and error conditions. See the full API
documentation for details: opal-quiesce.

  • fast-reboot: bare bones fast reboot implementation for POWER9

    This is an initial fast reboot implementation for p9 which has only
    been tested on the Witherspoon platform, and without the use of
    NPUs, NX/VAS, etc.

    This has worked reasonably well so far, with no failures in about
    100 reboots. It is hidden behind the traditional fast-reboot
    experimental nvram option, until more platforms and configurations
    are tested.

  • fast-reboot: move boot CPU clean-up logically together with
    secondaries

    Move the boot CPU clean-up and state transition to active, logically
    together with secondaries. Don't release secondaries from fast
    reboot hold until everyone has cleaned up and transitioned to
    active.

    This is cosmetic, but it is helpful to run the fast reboot state
    machine the same way on all CPUs.

  • fast-reboot: improve failure error messages

    Change existing failure error messages to PR_NOTICE so they get
    printed to the console, and add some new ones. It's not a more
    severe class because it falls back to IPL on failure.

  • fast-reboot: quiesce opal before initiating a fast reboot

    Switch fast reboot to use quiescing rather than "wait for a while".

    If firmware can not be quiesced, then fast reboot is skipped. This
    significantly improves the robustness of fast reboot in the face of
    bugs or unexpected latencies.

    Complexity of synchronization in fast-reboot is reduced, because we
    are guaranteed to be single-threaded when quiesce succeeds, so locks
    can be removed.

    In the case that firmware can be quiesced, then it will generally
    reduce fast reboot times by nearly 200ms, because quiescing usually
    takes very little time.

  • core: Add support for quiescing OPAL

    Quiescing is ensuring all host controlled CPUs (except the current
    one) are out of OPAL and prevented from entering. This can be used in
    debug and shutdown paths, particularly with system reset sequences.

    This patch adds per-CPU entry and exit tracking for OPAL calls, and
    adds logic to "hold" or "reject" at entry time, if OPAL is quiesced.

    An OPAL call is added, to expose the functionality to Linux, where
    it can be used for shutdown, kexec, and before generating sreset
    IPIs for debugging (so the debug code does not recurse into OPAL).
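
    A heavily simplified, single-threaded model of the entry/exit
    tracking. It only models the "reject" behaviour; "hold" (waiting at
    entry until resume), timeouts and memory ordering are left out. All
    toy_* names are hypothetical and unrelated to the real OPAL call
    interface.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 4

enum toy_entry_result { TOY_ENTERED, TOY_REJECTED };

static bool toy_in_opal[TOY_NR_CPUS];  /* per-CPU "inside firmware" flag */
static bool toy_reject_entry;          /* set while a quiesce is requested */

/* Called on entry to a firmware call; refuses entry while quiescing. */
enum toy_entry_result toy_opal_entry(int cpu)
{
    assert(cpu >= 0 && cpu < TOY_NR_CPUS);

    if (toy_reject_entry)
        return TOY_REJECTED;
    toy_in_opal[cpu] = true;
    return TOY_ENTERED;
}

/* Called on return from a firmware call. */
void toy_opal_exit(int cpu)
{
    assert(cpu >= 0 && cpu < TOY_NR_CPUS);
    toy_in_opal[cpu] = false;
}

/* Start rejecting new entries; the requester is already inside. */
void toy_quiesce_request(void)
{
    toy_reject_entry = true;
}

/* Allow entries again. */
void toy_quiesce_resume(void)
{
    toy_reject_entry = false;
}

/* True once a quiesce is active and every CPU but `self` is out. */
bool toy_is_quiesced(int self)
{
    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        if (cpu != self && toy_in_opal[cpu])
            return false;
    return toy_reject_entry;
}
```

    The requester first blocks new entries and then waits for the
    per-CPU flags of the other CPUs to drain. Only once
    toy_is_quiesced() reports true can it assume it is the sole CPU
    running firmware code.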

  • dctl: p9 increase thread quiesce timeout

    We require all instructions to be completed before a thread is
    considered stopped, by the dctl interface. Long running instructions
    like cache misses and CI loads may take a significant amount of time
    to complete, and timeouts have been observed in stress testing.

    Increase the timeout significantly, to cover this. The workbook just
    says to poll, but we like to have timeouts to avoid getting stuck in
    firmware.

POWER9 power saving

There is much improved support for deeper sleep/idle (stop) states on
POWER9.

  • OCC: Increase max pstate check on P9 to 255

    This has changed from P8, we can now have > 127 pstates.

    This was observed on Boston during WoF bring up.

  • SLW: Add idle state stop5 for DD2.0 and above

    Adding stop5 idle state with rough residency and latency numbers.

  • SLW: Add p9_stop_api calls for IMC

    Add p9_stop_api for EVENT_MASK and PDBAR scoms. These scoms are
    lost on wakeup from stop11.

  • SCOM restore for DARN and XIVE

    While waking up from stop11, we want NCU_DARN_BAR to have the enable
    bit set. Without this stop_api call, the value restored is without
    the enable bit set. We lose NCU_SPEC_BAR when the quad goes into
    stop11; stop_api will restore it while waking up from stop11.

  • SLW: Call p9_stop_api only if deep_states are enabled

    All init time p9_stop_api calls have been isolated to
    slw_late_init. If p9_stop_api fails, then the deep states can be
    excluded from device tree.

    For p9_stop_api calls made after the device-tree for cpuidle is
    created, has_deep_states will be used to check if the call is even
    required.

  • Better handle errors in setting up sleep states (p9_stop_api)

    We won't put affected stop states in the device tree if the wakeup
    engine is not present or has failed.

  • SCOM Restore: Increased the EQ SCOM restore limit.

    This commit increases the SCOM restore limit from 16 to 31.

  • hw/dts: retry special wakeup operation if core still gated

    It has been observed that in some cases the special wakeup operation
    can "succeed" but the core is still in a gated/offline state.

    Check for this state after attempting to wakeup a core and retry the
    wakeup if necessary.

  • core/direct-controls: add function to read core gated state

  • core/direct-controls: wait for core special wkup bit cleared

    When clearing the special wakeup bit on a core, wait until the bit
    is actually cleared by the hardware in the status register before
    returning success.

    This may help avoid issues with back-to-back reads where the special
    wakeup request is cleared but the firmware is still processing the
    request and the next attempt to set the bit reads an immediate
    success from the previous operation.
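
    The wait can be sketched as a bounded poll loop.
    toy_wait_bits_clear() and the callback type are hypothetical; how
    the status register is read and how long to wait between polls are
    left to the caller. toy_fake_status() is a test double that stands
    in for the hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback returning the current status register value. */
typedef uint64_t (*toy_read_status_fn)(void *ctx);

/*
 * Poll until every bit in `mask` reads back as zero, giving up after
 * `max_polls` reads. Returns true if the bits cleared in time.
 */
bool toy_wait_bits_clear(toy_read_status_fn read_status, void *ctx,
                         uint64_t mask, unsigned int max_polls)
{
    assert(read_status != NULL);

    for (unsigned int i = 0; i < max_polls; i++) {
        if ((read_status(ctx) & mask) == 0)
            return true;
        /* Real firmware would delay between polls here. */
    }
    return false;
}

/* Test double: bit 0 stays set for `*remaining` reads, then clears. */
uint64_t toy_fake_status(void *ctx)
{
    unsigned int *remaining = ctx;

    if (*remaining == 0)
        return 0;
    (*remaining)--;
    return 1;
}
```

    Reporting success only after the bit has been observed clear means
    a following "set" request cannot mistake the stale status of the
    previous operation for its own completion.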

  • p9_stop_api: PM: Added support for version control in SCOM restore
    entries.

    • adds version info in SCOM restore entry header

    • adds version specific details in SCOM restore entry header

    • retains old behaviour of SGPE Hcode's base version

  • p9_stop_api: EQ SCOM Restore: Introduced version control in SCOM
    restore entry.

    • introduces version control in header of SCOM restore entry
    • ensures backward compatibility
    • introduces flexibility to handle any number of SCOM restore
      entries.

Secure and Trusted Boot for POWER9

We introduce support for Secure and Trusted Boot for POWER9 systems,
with eq...
