
[Companion RAS support for edk2 and OpenSBI has been submitted as PRs] riscv: add RAS support with SBI SSE extension #98

Merged
sterling-teng merged 15 commits into
RVCK-Project:rvck-6.6 from
zte-riscv:pr-RAS-with-SSE
Nov 27, 2025
Conversation

Contributor

GooTal commented Aug 22, 2025

community inclusion
category: feature
bugzilla: #95
Link: https://lore.kernel.org/all/20250908181717.1997461-2-cleger@rivosinc.com/
Link: https://lore.kernel.org/all/20250227123628.2931490-2-hchauhan@ventanamicro.com/


Implement Reliability, Availability and Serviceability (RAS) support for the RISC-V architecture using the RISC-V RERI specification and the SBI SSE extension. This feature provides hardware error event handling and reporting that conforms to the ACPI platform error interfaces.

Key changes include:

SBI SSE Extension Support:

  • Add Supervisor Software Events mechanism for non-maskable event notification from SBI firmware
  • Implement context switching with firmware saving registers a6/a7
  • Add __sse_entry_task per_cpu array for reliable current task tracking during SSE event delivery
  • Allocate dedicated stacks for each event and CPU to support event preemption
  • Handle event completion with proper interrupt simulation for signal delivery to user tasks

RAS Framework Integration:

  • Integrate with existing GHES driver framework using highest priority SSE events for hardware error delivery
  • Register GHES entries with SSE layer using GHES notification vectors as SSE events
  • Add RISC-V specific processor type and ISA string entries
  • Add GHES SSE fixmap indices for physical address mapping
  • Enable build and configuration support for RAS

The implementation ensures reliable hardware error event processing and maintains system stability during error conditions. SSE events can be delivered at any time including during exception handling, with proper context preservation and signal delivery mechanisms.

Co-developed-by: Clément Léger <cleger@rivosinc.com>
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Co-developed-by: Himanshu Chauhan <hchauhan@ventanamicro.com>
Signed-off-by: Himanshu Chauhan <hchauhan@ventanamicro.com>
Tested-by: lupeng <lu.peng3@zte.com.cn>
Signed-off-by: lupeng <lu.peng3@zte.com.cn>
Signed-off-by: liuqingtao <liu.qingtao2@zte.com.cn>


oervci commented Aug 22, 2025

Test started


oervci commented Aug 22, 2025

Contributor Author

GooTal commented Aug 22, 2025

cc:@yclup


oervci commented Aug 22, 2025

Kernel build success!


oervci commented Aug 22, 2025


oervci commented Aug 22, 2025


oervci commented Aug 27, 2025

Test started

Contributor Author

GooTal commented Nov 18, 2025

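"Companion RAS support for edk2 and OpenSBI has been submitted as PRs." Have those PRs been merged, or are they still under review?

Still under review.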


github-actions Bot commented Nov 19, 2025


Test started. Log: https://github.com/RVCK-Project/rvck/actions/runs/19485515932

Parsed arguments

| args | value |
| --- | --- |
| repository | RVCK-Project/rvck |
| head ref | pull/98/head |
| base ref | rvck-6.6 |
| LAVA repo | RVCK-Project/lavaci |
| LAVA Template | lava-job-template/qemu/qemu-ltp.yaml |
| Testcase path | lava-testcases/common-test/ltp/ltp.yaml |

Test complete

Detailed results:

RVCK result

| check | result |
| --- | --- |
| kunit-test | success |
| kernel-build | failure |
| lava-trigger | skipped |
| check-patch | success |

Kunit Test Result

[00:48:45] Testing complete. Ran 209 tests: passed: 175, crashed: 31, skipped: 3, errors: 32

Kernel Build Result

Kernel build failed.

Check Patch Result

Total Errors 0
Total Warnings 17

@sterling-teng
Contributor

After each push -f, please include a summary of the changes.

Contributor Author

GooTal commented Nov 20, 2025

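After each push -f, please include a summary of the changes.

OK. This update is mainly a rebase, with no new code. The earlier changes mainly brought in the defconfig options from the open-source upstream. The other PRs I have submitted are most likely rebase updates as well.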


github-actions Bot commented Nov 24, 2025


Test started. Log: https://github.com/RVCK-Project/rvck/actions/runs/19620088449

Parsed arguments

| args | value |
| --- | --- |
| repository | RVCK-Project/rvck |
| head ref | pull/98/head |
| base ref | rvck-6.6 |
| LAVA repo | RVCK-Project/lavaci |
| LAVA Template | lava-job-template/qemu/qemu-ltp.yaml |
| Testcase path | lava-testcases/common-test/ltp/ltp.yaml |
| need run job | kunit-test,kernel-build,check-patch,lava-trigger |

Test complete

Detailed results:

RVCK result

| check | result |
| --- | --- |
| kunit-test | success |
| kernel-build | success |
| lava-trigger | success |
| check-patch | success |

Kunit Test Result

[00:58:31] Testing complete. Ran 455 tests: passed: 443, skipped: 12

Kernel Build Result

Kernel build succeeded: RVCK-Project/rvck/98/

dadb84edda096e91d8f2aee35cd4f533 /srv/guix_result/ae458f443ad9d19e35fc0292700693f2f94a50be/Image
2bec2c836a7712b2d3d232b0dfc2f3f0 /root/initramfs.img

LAVA Check

args:

result:

Lava check done! lava log: https://lava.oerv.ac.cn/scheduler/job/930

lava result count: [fail]: 175, [pass]: 1433, [skip]: 291

Check Patch Result

Total Errors 0
Total Warnings 17

Contributor Author

GooTal commented Nov 24, 2025

Rebase update, no other changes.

@sterling-teng
Contributor

Rebase update, no other changes.

OK. The patches in this PR are under review, so please avoid frequent changes.

Contributor

sterling-teng commented Nov 25, 2025

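Some of the patches in this PR entered the maintainer's development tree on November 5. Please update the commit messages and code for those patches.
https://lore.kernel.org/all/176355541775.758643.18140349571928540394.git-patchwork-notify@kernel.org/

The remaining patches, which have not been merged into the maintainer's development tree, still need to be evaluated before we try to merge them.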

clementleger and others added 15 commits November 26, 2025 09:22
community inclusion
category: feature
bugzilla: RVCK-Project#95

------------------

Add needed definitions for SBI Supervisor Software Events extension [1].
This extension enables the SBI to inject events into supervisor software
much like ARM SDEI.

[1] https://lists.riscv.org/g/tech-prs/message/515

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Link: https://patch.msgid.link/20251105082639.342973-2-cleger@rivosinc.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
Signed-off-by: Lu Peng <lu.peng3@zte.com.cn>
Signed-off-by: liuqingtao <liu.qingtao2@zte.com.cn>
community inclusion
category: feature
bugzilla: RVCK-Project#95

------------------

The SBI SSE extension allows the supervisor software to be notified by
the SBI of specific events that are not maskable. The context switch is
handled partially by the firmware, which will save registers a6 and a7.
When entering the kernel, we can rely on these two registers to set up
the stack and save all the registers.

Since SSE events can be delivered to the kernel at any time (including
during exception handling), we need a way to locate the current_task for
context tracking. On RISC-V, it is stored in SSCRATCH when in user space
or in tp when in kernel space (in which case SSCRATCH is zero). But at
the beginning of exception handling, SSCRATCH is used to swap tp and
check the origin of the exception. If interrupted at that point, there
is no way to reliably know where the current task_struct is located.
Even checking the interruption location won't work, as SSE events can be
nested on top of each other, so the original interruption site might be
lost at some point. In order to retrieve it reliably, store the current
task in an additional __sse_entry_task per_cpu array. This array is then
used to retrieve the current task based on the hart ID that is passed to
the SSE event handler in a6.

That being said, the way the current task struct is stored should
probably be reworked to find a better reliable alternative.
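
The per-hart lookup described in this commit message can be sketched in plain C as follows. This is only an illustration under stated assumptions, not the kernel code: `NR_HARTS`, `struct task`, `sse_set_entry_task()` and `sse_lookup_task()` are hypothetical names, and a simple static array stands in for the `__sse_entry_task` per_cpu array.

```c
#include <stddef.h>

#define NR_HARTS 4 /* hypothetical hart count, for the sketch only */

struct task {
    int pid;
};

/* Stand-in for the __sse_entry_task per_cpu array named in the commit. */
static struct task *sse_entry_task[NR_HARTS];

/* Record which task is current on a given hart. */
void sse_set_entry_task(unsigned long hart_id, struct task *t)
{
    if (hart_id < NR_HARTS)
        sse_entry_task[hart_id] = t;
}

/*
 * On SSE entry, recover the current task from the hart ID that the
 * firmware passes in a6, without relying on tp or SSCRATCH.
 */
struct task *sse_lookup_task(unsigned long a6_hart_id)
{
    if (a6_hart_id >= NR_HARTS)
        return NULL;
    return sse_entry_task[a6_hart_id];
}
```

The point of the design is that the lookup depends only on a value supplied by the firmware, so it stays valid even when the event lands in the window where tp and SSCRATCH are being swapped.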

Since each event (and each CPU for local events) has its own context
and events can preempt each other, allocate a stack (and a shadow stack
if needed) for each of them (and for each CPU for local events).

When completing the event, if we were coming from the kernel with
interrupts disabled, simply return there. If coming from userspace or
from the kernel with interrupts enabled, simulate an interrupt exception
by setting IE_SIE in CSR_IP to allow delivery of signals to the user
task. For instance, this can happen when a RAS event has been generated
by a user application and a SIGBUS has been sent to a task.
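
The completion rule in the paragraph above comes down to a two-input decision. A minimal sketch, assuming a hypothetical `struct sse_return_ctx` that records where the event interrupted execution (the real kernel derives this from the saved registers, which is not shown here):

```c
#include <stdbool.h>

/* Hypothetical summary of the interrupted context. */
struct sse_return_ctx {
    bool from_user;    /* event interrupted user space */
    bool irqs_enabled; /* interrupts were enabled when the event arrived */
};

/*
 * Returns true when a pending interrupt should be simulated on return so
 * that signals (for example SIGBUS) can be delivered to the user task.
 */
bool sse_needs_simulated_irq(const struct sse_return_ctx *ctx)
{
    /* Kernel context with interrupts disabled: return directly. */
    if (!ctx->from_user && !ctx->irqs_enabled)
        return false;

    /* User space, or kernel with interrupts enabled. */
    return true;
}
```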

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Link: https://patch.msgid.link/20251105082639.342973-3-cleger@rivosinc.com
[pjw@kernel.org: cleaned up patch description and whitespace]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
Signed-off-by: Lu Peng <lu.peng3@zte.com.cn>
Signed-off-by: liuqingtao <liu.qingtao2@zte.com.cn>
community inclusion
category: feature
bugzilla: RVCK-Project#95

--------------

Add a driver-level interface to use the RISC-V SSE arch support. This
interface allows registering SSE handlers and receiving the
corresponding events. It will be used by the PMU and GHES drivers.
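
To illustrate what "registering SSE handlers and receiving events" could look like from a driver's point of view, here is a small user-space sketch. Every name in it (`demo_sse_register`, `demo_sse_dispatch`, `demo_count_handler`, `DEMO_MAX_EVENTS`) is hypothetical and chosen for the example; it does not reproduce the interface added by this commit.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_EVENTS 8 /* hypothetical table size */

typedef int (*demo_sse_handler)(uint32_t event_id, void *arg);

struct demo_sse_event {
    uint32_t id;
    demo_sse_handler handler;
    void *arg;
    int used;
};

static struct demo_sse_event demo_events[DEMO_MAX_EVENTS];

/* Register a handler for an event ID; returns 0 on success, -1 on error. */
int demo_sse_register(uint32_t id, demo_sse_handler handler, void *arg)
{
    if (!handler)
        return -1;

    /* Reject a second registration for the same event ID. */
    for (size_t i = 0; i < DEMO_MAX_EVENTS; i++)
        if (demo_events[i].used && demo_events[i].id == id)
            return -1;

    for (size_t i = 0; i < DEMO_MAX_EVENTS; i++) {
        if (!demo_events[i].used) {
            demo_events[i].id = id;
            demo_events[i].handler = handler;
            demo_events[i].arg = arg;
            demo_events[i].used = 1;
            return 0;
        }
    }
    return -1; /* table full */
}

/* Deliver an event to its registered handler, if any. */
int demo_sse_dispatch(uint32_t id)
{
    for (size_t i = 0; i < DEMO_MAX_EVENTS; i++)
        if (demo_events[i].used && demo_events[i].id == id)
            return demo_events[i].handler(id, demo_events[i].arg);
    return -1; /* no handler registered */
}

/* Example handler: counts how many times it was invoked. */
int demo_count_handler(uint32_t event_id, void *arg)
{
    (void)event_id;
    (*(int *)arg)++;
    return 0;
}
```

A driver in this sketch would call `demo_sse_register()` once at probe time with its own handler and private data, then rely on the dispatch path to call back into it when the event fires.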

Co-developed-by: Himanshu Chauhan <hchauhan@ventanamicro.com>
Signed-off-by: Himanshu Chauhan <hchauhan@ventanamicro.com>
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20251105082639.342973-4-cleger@rivosinc.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
Signed-off-by: Lu Peng <lu.peng3@zte.com.cn>
Signed-off-by: liuqingtao <liu.qingtao2@zte.com.cn>
community inclusion
category: feature
bugzilla: RVCK-Project#95

----------------

In order to use SSE within PMU drivers, register an SSE handler for the
local PMU event. Reuse the existing overflow IRQ handler and pass the
appropriate pt_regs. Add a config option, RISCV_PMU_SSE, to select event
delivery via SSE events.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Link: https://patch.msgid.link/20251105082639.342973-5-cleger@rivosinc.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
Signed-off-by: Lu Peng <lu.peng3@zte.com.cn>
Signed-off-by: liuqingtao <liu.qingtao2@zte.com.cn>
community inclusion
category: feature
bugzilla: RVCK-Project#95

----------------

This module, once loaded, will execute a series of tests using the SSE
framework. The provided script will check for any error reported by the
test module.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Link: https://patch.msgid.link/20251105082639.342973-6-cleger@rivosinc.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
Signed-off-by: Lu Peng <lu.peng3@zte.com.cn>
Signed-off-by: liuqingtao <liu.qingtao2@zte.com.cn>
community inclusion
category: feature
bugzilla: RVCK-Project#95
Link: https://lore.kernel.org/all/20250227123628.2931490-2-hchauhan@ventanamicro.com/

----------------

bert and einj drivers use ioremap_cache for mapping entries
but ioremap_cache is not defined for RISC-V.

Signed-off-by: Himanshu Chauhan <hchauhan@ventanamicro.com>
Signed-off-by: Lu Peng <lu.peng3@zte.com.cn>
Signed-off-by: liuqingtao <liu.qingtao2@zte.com.cn>
community inclusion
category: feature
bugzilla: RVCK-Project#95
Link: https://lore.kernel.org/all/20250227123628.2931490-3-hchauhan@ventanamicro.com/

----------------

ghes_map function uses arch_apei_get_mem_attribute to get the
protection bits for a given physical address. These protection
bits are then used to map the physical address.

Signed-off-by: Himanshu Chauhan <hchauhan@ventanamicro.com>
Signed-off-by: Lu Peng <lu.peng3@zte.com.cn>
Signed-off-by: liuqingtao <liu.qingtao2@zte.com.cn>
community inclusion
category: feature
bugzilla: RVCK-Project#95
Link: https://lore.kernel.org/all/20250227123628.2931490-4-hchauhan@ventanamicro.com/

----------------

Introduce a new HEST notification type for RISC-V SSE events.
The GHES entry's notification structure contains the notification
to be used for a given error source. For error sources delivering
events over SSE, it should contain the new SSE notification type.

Signed-off-by: Himanshu Chauhan <hchauhan@ventanamicro.com>
Signed-off-by: Lu Peng <lu.peng3@zte.com.cn>
Signed-off-by: liuqingtao <liu.qingtao2@zte.com.cn>
community inclusion
category: feature
bugzilla: RVCK-Project#95
Link: https://lore.kernel.org/all/20250227123628.2931490-5-hchauhan@ventanamicro.com/

----------------

GHES error handling requires fixmap entries for IRQ notifications.
Add fixmap indices for IRQ, SSE Low and High priority notifications.

Signed-off-by: Himanshu Chauhan <hchauhan@ventanamicro.com>
Signed-off-by: Lu Peng <lu.peng3@zte.com.cn>
Signed-off-by: liuqingtao <liu.qingtao2@zte.com.cn>
community inclusion
category: feature
bugzilla: RVCK-Project#95
Link: https://lore.kernel.org/all/20250227123628.2931490-6-hchauhan@ventanamicro.com/

----------------

Compile ghes_in_nmi_spool_from_list only when NMI and SEA
are enabled. Otherwise compilation fails with a "defined but
not used" error.

Signed-off-by: Himanshu Chauhan <hchauhan@ventanamicro.com>
Signed-off-by: Lu Peng <lu.peng3@zte.com.cn>
Signed-off-by: liuqingtao <liu.qingtao2@zte.com.cn>
community inclusion
category: feature
bugzilla: RVCK-Project#95
Link: https://lore.kernel.org/all/20250227123628.2931490-7-hchauhan@ventanamicro.com/

----------------

Add functions to register the ghes entries which have SSE as
notification type. The vector inside the ghes is the SSE event
ID that should be registered.
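
The selection rule described above (only entries whose notification type is SSE are registered, and their vector is the SSE event ID) can be sketched as a simple filter. The enum values, `struct demo_ghes` and `demo_collect_sse_event_ids()` are hypothetical stand-ins; the real HEST notification type values and GHES structures are not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical notification types, for the sketch only. */
enum demo_notify_type {
    DEMO_NOTIFY_IRQ,
    DEMO_NOTIFY_SSE
};

/* Minimal stand-in for a GHES entry: notification type plus vector. */
struct demo_ghes {
    enum demo_notify_type type;
    uint32_t vector;
};

/*
 * Collect the SSE event IDs to register: for entries whose notification
 * type is SSE, the vector is used as the event ID. Returns the number of
 * IDs written to out (at most cap).
 */
size_t demo_collect_sse_event_ids(const struct demo_ghes *entries, size_t n,
                                  uint32_t *out, size_t cap)
{
    size_t count = 0;

    for (size_t i = 0; i < n && count < cap; i++) {
        if (entries[i].type == DEMO_NOTIFY_SSE)
            out[count++] = entries[i].vector;
    }
    return count;
}
```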

Signed-off-by: Himanshu Chauhan <hchauhan@ventanamicro.com>
Signed-off-by: Lu Peng <lu.peng3@zte.com.cn>
Signed-off-by: liuqingtao <liu.qingtao2@zte.com.cn>
community inclusion
category: feature
bugzilla: RVCK-Project#95
Link: https://lore.kernel.org/all/20250227123628.2931490-8-hchauhan@ventanamicro.com/

----------------

- Add RISCV in processor type
- Add RISCV32/64 in ISA

Signed-off-by: Himanshu Chauhan <hchauhan@ventanamicro.com>
Signed-off-by: Lu Peng <lu.peng3@zte.com.cn>
Signed-off-by: liuqingtao <liu.qingtao2@zte.com.cn>
community inclusion
category: feature
bugzilla: RVCK-Project#95
Link: https://lore.kernel.org/all/20250227123628.2931490-9-hchauhan@ventanamicro.com/

----------------

- Functions to register a ghes entry with SSE
- Add Handlers for low/high priority events
- Call ghes common handler to handle an error event

Signed-off-by: Himanshu Chauhan <hchauhan@ventanamicro.com>
Signed-off-by: Lu Peng <lu.peng3@zte.com.cn>
Signed-off-by: liuqingtao <liu.qingtao2@zte.com.cn>
community inclusion
category: feature
bugzilla: RVCK-Project#95
Link: https://lore.kernel.org/all/20250227123628.2931490-10-hchauhan@ventanamicro.com/

----------------

APEI SSE handlers can be enabled or disabled with this config
option. When enabled, SSE registration is done for GHES
entries whose notification type is set to SSE. When disabled,
the registration function returns a not-supported error.
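
The enabled/disabled behaviour described above follows a common pattern: a real implementation when the option is set, and a stub that reports "not supported" otherwise. A minimal sketch, assuming a hypothetical `DEMO_APEI_SSE` switch and function name, and assuming `-EOPNOTSUPP` as the error code (the commit message only says "not supported"):

```c
#include <errno.h>

/* Hypothetical build-time switch standing in for the config option. */
#ifndef DEMO_APEI_SSE
#define DEMO_APEI_SSE 0
#endif

#if DEMO_APEI_SSE
/* Option enabled: the registration would be performed here. */
int demo_ghes_sse_register(unsigned int event_id)
{
    (void)event_id;
    return 0;
}
#else
/* Option disabled: callers get a not-supported error. */
int demo_ghes_sse_register(unsigned int event_id)
{
    (void)event_id;
    return -EOPNOTSUPP;
}
#endif
```

Keeping the stub under the same function name lets callers stay free of `#ifdef` blocks; they only need to handle the error return.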

Enable the CONFIG_ACPI_APEI and CONFIG_ACPI_APEI_GHES options in defconfig.

Signed-off-by: Himanshu Chauhan <hchauhan@ventanamicro.com>
Signed-off-by: Lu Peng <lu.peng3@zte.com.cn>
Signed-off-by: liuqingtao <liu.qingtao2@zte.com.cn>
community inclusion
category: feature
bugzilla: RVCK-Project#95
Link: https://lore.kernel.org/all/20250227123628.2931490-11-hchauhan@ventanamicro.com/

----------------

Enable the APEI option so that APEI GHES options are visible.
Enable SAFE_CMPXCHG option required for GHES error handling.

Signed-off-by: Himanshu Chauhan <hchauhan@ventanamicro.com>
Signed-off-by: Lu Peng <lu.peng3@zte.com.cn>
Signed-off-by: liuqingtao <liu.qingtao2@zte.com.cn>

github-actions Bot commented Nov 26, 2025


Test started. Log: https://github.com/RVCK-Project/rvck/actions/runs/19690876919

Parsed arguments

| args | value |
| --- | --- |
| repository | RVCK-Project/rvck |
| head ref | pull/98/head |
| base ref | rvck-6.6 |
| LAVA repo | RVCK-Project/lavaci |
| LAVA Template | lava-job-template/qemu/qemu-ltp.yaml |
| Testcase path | lava-testcases/common-test/ltp/ltp.yaml |
| need run job | kunit-test,kernel-build,check-patch,lava-trigger |

Test complete

Detailed results:

RVCK result

| check | result |
| --- | --- |
| kunit-test | success |
| kernel-build | success |
| lava-trigger | success |
| check-patch | success |

Kunit Test Result

[02:54:47] Testing complete. Ran 455 tests: passed: 443, skipped: 12

Kernel Build Result

Kernel build succeeded: RVCK-Project/rvck/98/

fe70f7aa427750dc2efc46238220bcbb /srv/guix_result/69d4d8009236462cec35c1ec789a51f64877f21a/Image
5dcc18e2ea6c53da5119e897496e4250 /root/initramfs.img

LAVA Check

args:

result:

Lava check done! lava log: https://lava.oerv.ac.cn/scheduler/job/935

lava result count: [fail]: 174, [pass]: 1434, [skip]: 291

Check Patch Result

Total Errors 0
Total Warnings 17

Contributor Author

GooTal commented Nov 26, 2025

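Some of the patches in this PR entered the maintainer's development tree on November 5. Please update the commit messages and code for those patches. https://lore.kernel.org/all/176355541775.758643.18140349571928540394.git-patchwork-notify@kernel.org/

The remaining patches, which have not been merged into the maintainer's development tree, still need to be evaluated before we try to merge them.

Updated.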


6 participants