| 1 | +.. SPDX-License-Identifier: GPL-2.0 OR GFDL-1.2-no-invariants-or-later |
| 2 | + |
| 3 | +================= |
| 4 | +EDAC/RAS features |
| 5 | +================= |
| 6 | + |
| 7 | +Copyright (c) 2024-2025 HiSilicon Limited. |
| 8 | + |
| 9 | +:Author: Shiju Jose <shiju.jose@huawei.com> |
| 10 | +:License: The GNU Free Documentation License, Version 1.2 without |
| 11 | + Invariant Sections, Front-Cover Texts nor Back-Cover Texts. |
| 12 | + (dual licensed under the GPL v2) |
| 13 | + |
| 14 | +- Written for: 6.15 |
| 15 | + |
| 16 | +Introduction |
| 17 | +------------ |
| 18 | + |
| 19 | +EDAC/RAS feature components and their high-level design: |
| 20 | + |
| 21 | +1. Scrub control |
| 22 | + |
| 23 | +2. Error Check Scrub (ECS) control |
| 24 | + |
| 25 | +3. ACPI RAS2 features |
| 26 | + |
| 27 | +4. Post Package Repair (PPR) control |
| 28 | + |
| 29 | +5. Memory Sparing Repair control |
| 30 | + |
| 31 | +The high-level design is illustrated in the following diagram:: |
| 32 | + |
| 33 | + +-----------------------------------------------+ |
| 34 | + | Userspace - Rasdaemon | |
| 35 | + | +-------------+ | |
| 36 | + | | RAS CXL mem | +---------------+ | |
| 37 | + | |error handler|---->| | | |
| 38 | + | +-------------+ | RAS dynamic | | |
| 39 | + | +-------------+ | scrub, memory | | |
| 40 | + | | RAS memory |---->| repair control| | |
| 41 | + | |error handler| +----|----------+ | |
| 42 | + | +-------------+ | | |
| 43 | + +--------------------------|--------------------+ |
| 44 | + | |
| 45 | + | |
| 46 | + +-------------------------------|------------------------------+ |
| 47 | + | Kernel EDAC extension for | controlling RAS Features | |
| 48 | + |+------------------------------|----------------------------+ | |
| 49 | + || EDAC Core Sysfs EDAC| Bus | | |
| 50 | + || +--------------------------|---------------------------+| | |
| 51 | + || |/sys/bus/edac/devices/<dev>/scrubX/ | | EDAC device || | |
| 52 | + || |/sys/bus/edac/devices/<dev>/ecsX/ |<->| EDAC MC || | |
| 53 | + || |/sys/bus/edac/devices/<dev>/repairX | | EDAC sysfs || | |
| 54 | + || +---------------------------|--------------------------+| | |
| 55 | + || EDAC|Bus | | |
| 56 | + || | | | |
| 57 | + || +----------+ Get feature | Get feature | | |
| 58 | + || | | desc +---------|------+ desc +----------+ | | |
| 59 | + || |EDAC scrub|<-----| EDAC device | | | | | |
| 60 | + || +----------+ | driver- RAS |----->| EDAC mem | | | |
| 61 | + || +----------+ | feature control| | repair | | | |
| 62 | + || | |<-----| | +----------+ | | |
| 63 | + || |EDAC ECS | +---------|------+ | | |
| 64 | + || +----------+ Register RAS|features | | |
| 65 | + || ______________________|_____________ | | |
| 66 | + |+---------|---------------|------------------|--------------+ | |
| 67 | + | +-------|----+ +-------|-------+ +----|----------+ | |
| 68 | + | | | | CXL mem driver| | Client driver | | |
| 69 | + | | ACPI RAS2 | | scrub, ECS, | | memory repair | | |
| 70 | + | | driver | | sparing, PPR | | features | | |
| 71 | + | +-----|------+ +-------|-------+ +------|--------+ | |
| 72 | + | | | | | |
| 73 | + +--------|-----------------|--------------------|--------------+ |
| 74 | + | | | |
| 75 | + +--------|-----------------|--------------------|--------------+ |
| 76 | + | +---|-----------------|--------------------|-------+ | |
| 77 | + | | | | |
| 78 | + | | Platform HW and Firmware | | |
| 79 | + | +--------------------------------------------------+ | |
| 80 | + +--------------------------------------------------------------+ |
| 81 | + |
| 82 | + |
| 83 | +1. EDAC feature components - Create the feature-specific descriptors, for |
| 84 | + example scrub, ECS and memory repair in the above diagram. |
| 85 | + |
| 86 | +2. EDAC device driver for controlling RAS features - Gets the feature's |
| 87 | + attribute descriptors from the EDAC RAS feature component, registers the |
| 88 | + device's RAS features with the EDAC bus and exposes the feature control |
| 89 | + attributes via sysfs, for example /sys/bus/edac/devices/<dev-name>/<feature>X/ |
| 90 | + |
| 91 | +3. RAS dynamic feature controller - Userspace sample modules in rasdaemon for |
| 92 | + dynamic scrub/repair control, which issue scrubbing/repair when an excessive |
| 93 | + number of corrected memory errors is reported in a short span of time. |
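The decision made by the dynamic feature controller in item 3 can be sketched as a sliding-window counter. This is only an illustration of the idea, not rasdaemon's actual implementation: the class name, the function name, the threshold and window values, and the device name ``cxl_mem0`` are all hypothetical. The only detail taken from the text above is the sysfs directory pattern /sys/bus/edac/devices/<dev-name>/<feature>X/.

```python
from collections import deque
from pathlib import PurePosixPath


def feature_dir(dev_name, feature, instance):
    """Build /sys/bus/edac/devices/<dev-name>/<feature>X for one instance.

    The pattern comes from the document; dev_name is supplied by the caller.
    """
    return PurePosixPath("/sys/bus/edac/devices") / dev_name / f"{feature}{instance}"


class CorrectedErrorWindow:
    """Hypothetical helper: counts corrected errors seen in a recent time span."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold            # errors tolerated inside the window
        self.window_seconds = window_seconds  # length of the "short span of time"
        self._stamps = deque()                # timestamps of recent corrected errors

    def record(self, timestamp):
        """Record one corrected error.

        Returns True when more than `threshold` errors fall inside the window,
        i.e. when a scrub or repair request should be issued.
        """
        self._stamps.append(timestamp)
        # Drop errors that are older than the window.
        while self._stamps and timestamp - self._stamps[0] > self.window_seconds:
            self._stamps.popleft()
        return len(self._stamps) > self.threshold
```

A caller would feed ``record()`` the timestamp of each corrected-error event and, when it returns True, act on the control attributes under the directory returned by ``feature_dir()``. Which attributes to write is deliberately left out of the sketch, since the section above does not name them.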