
4. Kernel

Wonjae Lee edited this page Dec 21, 2023 · 5 revisions

4.1 User Guide

This section explains the technical background, functionality, and testing of the SMDK kernel. It covers how a CXL memory device is recognized during OS boot by interpreting information provided by the BIOS, and how the CXL device is then exposed through memory interfaces such as System RAM, Swap, and DAX.

First, the base address and size of the attached CXL device must be provided by the BIOS and/or the device through SRAT, CEDT, and/or DVSEC. In addition, the CXL memory range in the EFI memory map must be typed as soft reserved, not as usable. Details are described below.

4.1.1 SRAT table

For the CXL device to be detected and function properly, the OS must be able to retrieve the base address and size of the CXL device from the SRAT (System Resource Affinity Table). Therefore, if a CXL device is not detected or does not operate normally on your system, check whether the SRAT contains an entry for the CXL device with its affinity, base address, and size.

The SRAT is one of the ACPI (Advanced Configuration and Power Interface) tables. To inspect it, dump the ACPI tables from the system, extract the SRAT from the dump, and disassemble it into a human-readable file, as follows.

# /path/to/SMDK/src/test/system/extract_system_info.sh
 
# Install packages
$ sudo apt install acpica-tools
 
# Extract ACPI Tables
$ sudo acpidump -o acpidump.out
 
# Split the dump into one file per table (e.g., srat.dat)
$ acpixtract -a acpidump.out
 
# Disassemble the binary table into human-readable text
$ iasl -d srat.dat
 
# Find the result
$ ls srat.dsl
srat.dsl

You can now check the details in the srat.dsl file, which lists SRAT subtables such as Processor Local Affinity and Memory Affinity. On a system where the CXL device is initialized normally, the CXL memory range should appear as a Memory Affinity entry, as follows. In the example below, the base address of the CXL memory region is 0x2380000000 and the address length is 0x2000000000, that is, 128 GiB. The proximity domain of the CXL memory region is 1; the OS uses this value to assign the NUMA node ID during kernel boot.

[78C0h 30912   1]                Subtable Type : 01 [Memory Affinity]
[78C1h 30913   1]                       Length : 28
 
[78C2h 30914   4]             Proximity Domain : 00000001
[78C6h 30918   2]                    Reserved1 : 0000
[78C8h 30920   8]                 Base Address : 0000002380000000
[78D0h 30928   8]               Address Length : 0000002000000000
[78D8h 30936   4]                    Reserved2 : 00000000
[78DCh 30940   4]        Flags (decoded below) : 00000001
                                     Enabled : 1
                               Hot Pluggable : 0
                                Non-Volatile : 0
[78E0h 30944   8]                    Reserved3 : 0000000000000000
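As a convenience, the Memory Affinity entries in srat.dsl can be summarized with a small bash function. This is a sketch, not part of SMDK: the function name srat_mem_affinity is hypothetical, and it assumes the iasl field layout shown above ("Proximity Domain : ...", "Base Address : ...", "Address Length : ..."). It prints each entry's proximity domain, base, inclusive end address, and size.

```bash
#!/usr/bin/env bash
# Summarize SRAT Memory Affinity entries from an iasl-disassembled srat.dsl.
# Hypothetical helper; assumes the "Field : value" layout produced by iasl -d.
srat_mem_affinity() {
  awk -F' : ' '
    # A new subtable starts; remember whether it is a Memory Affinity one.
    /Subtable Type/              { in_mem = ($0 ~ /Memory Affinity/); next }
    in_mem && /Proximity Domain/ { pxm  = $2 }
    in_mem && /Base Address/     { base = $2 }
    # Address Length is the last field needed, so emit the entry here.
    in_mem && /Address Length/   { print pxm, base, $2 }
  ' "$1" | while read -r pxm base len; do
    # iasl prints bare hex digits; prefix 0x for bash arithmetic.
    printf 'PXM %d  base 0x%x  end 0x%x  size %d GiB\n' \
      $(( 0x$pxm )) $(( 0x$base )) $(( 0x$base + 0x$len - 1 )) $(( 0x$len >> 30 ))
  done
}
```

For the entry above, `srat_mem_affinity srat.dsl` prints `PXM 1  base 0x2380000000  end 0x437fffffff  size 128 GiB`, the same range that the kernel logs for PXM 1 in the dmesg example that follows.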

If there are multiple CXL memory devices, the SRAT contains multiple Memory Affinity entries, each with a different proximity domain. When the CXL memory range is present as a Memory Affinity entry, the kernel parses the SRAT during boot and adds the CXL memory to a NUMA node, as shown below. You can check this log with the dmesg command. In the example below, the CXL memory range with Proximity Domain (PXM) 1 is registered as NUMA node 1.

$ dmesg
...
[    0.012865] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x7fffffff]
[    0.012868] ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x107fffffff]
[    0.012877] ACPI: SRAT: Node 1 PXM 1 [mem 0x2380000000-0x437fffffff]
...
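To look up which NUMA node the kernel assigned to a given CXL base address, the SRAT lines of the boot log can be filtered with a small bash function. This is a hypothetical helper (not part of SMDK) that assumes the `ACPI: SRAT: Node N PXM P [mem START-END]` format shown above.

```bash
#!/usr/bin/env bash
# Print the NUMA node whose SRAT range (as logged by the kernel) contains ADDR.
# Hypothetical helper; reads kernel log text on stdin.
srat_node_for_addr() {
  local addr=$(( $1 )) line start end
  local re='SRAT: Node ([0-9]+) PXM [0-9]+ \[mem (0x[0-9a-fA-F]+)-(0x[0-9a-fA-F]+)\]'
  while read -r line; do
    [[ $line =~ $re ]] || continue
    start=$(( ${BASH_REMATCH[2]} ))
    end=$(( ${BASH_REMATCH[3]} ))
    if (( addr >= start && addr <= end )); then
      echo "${BASH_REMATCH[1]}"   # node ID
      return 0
    fi
  done
  return 1                        # no SRAT range covers ADDR
}
```

For example, `sudo dmesg | srat_node_for_addr 0x2380000000` should print 1 for the log above. A non-zero exit status means no SRAT range covers the address.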

If no srat.dat file can be extracted, your BIOS has not published the SRAT to the OS, and the BIOS option that enables the SRAT must be turned on. Conversely, if srat.dat is extracted but srat.dsl contains no Memory Affinity entry for the CXL memory, the BIOS may need to be updated to add this information to the SRAT.

4.1.2 CEDT/DVSEC

The other sources that the SMDK kernel uses to register a CXL device are the CEDT (CXL Early Discovery Table) and the DVSEC (Designated Vendor-Specific Extended Capability). DVSEC is a structure defined in the CXL specification that describes the vendor-supported capabilities of a CXL device; in particular, the PCIe DVSEC for CXL devices (DVSEC ID 0) contains the base address and size of the CXL device. CEDT enables the OS to locate CXL host bridges and their registers during boot. Both CEDT and DVSEC carry base address and size information, so SMDK can register a CXL device as system memory using any one of three sources: SRAT, CEDT, or DVSEC.
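On a running system, one way to confirm that a device exposes the PCIe DVSEC for CXL devices (DVSEC ID 0) is to inspect the extended capabilities printed by lspci. The sketch below assumes pciutils prints DVSEC capabilities as `Designated Vendor-Specific: Vendor=1e98 ID=0000 ...` (0x1E98 is the vendor ID used for CXL DVSECs); older pciutils versions may not decode DVSEC at all, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# List PCI functions that expose the PCIe DVSEC for CXL devices (DVSEC ID 0).
# Hypothetical helper; reads `lspci -vvv` output on stdin (lspci needs root
# to read extended capabilities).
cxl_dvsec_devices() {
  awk '
    /^[0-9a-fA-F]/ { dev = $1 }   # unindented line: a new device header
    /Designated Vendor-Specific: Vendor=1e98 ID=0000/ { print dev }
  '
}
```

Usage: `sudo lspci -vvv | cxl_dvsec_devices` prints the bus/device/function of each matching device.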

4.1.3 EFI Memory Map

You also need to verify that the CXL memory range is registered as soft reserved in the EFI memory map, which is printed in the kernel boot log as shown below. Lines with the BIOS-e820 prefix show the e820 memory map received from the BIOS, listing each memory range together with its memory attribute.

$ dmesg
...
[    0.000000] BIOS-e820: [mem 0x0000002380000000-0x000000437fffffff] soft reserved
...

If the BIOS did not set the specific-purpose memory attribute (EFI_MEMORY_SP) for the range, the range is recognized as usable. To have it recognized as soft reserved, set the EFI_MEMORY_SP attribute by adding efi_fake_mem to the kernel command line (e.g., efi_fake_mem=<size>@<start address>:<memory attribute>). This parameter sets the memory attribute of a specific memory range. During boot, you can add kernel command-line parameters by pressing 'e' in the GRUB boot menu. Please refer to the Installation Guide for an example of the boot screen.

Below is an example of the efi_fake_mem parameter to add to the kernel command line when the CXL memory region is recognized as usable in the BIOS memory map. In this example, the base address is 0x2380000000, the size of the CXL memory region is 128GB, and the memory attribute to add is 0x40000 (= EFI_MEMORY_SP).

efi_fake_mem=128G@0x2380000000:0x40000
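The parameter can also be generated from the SRAT base address and length. The bash function below is a hypothetical helper, not part of SMDK; it expresses the size with the K/M/G-style suffixes the kernel accepts for efi_fake_mem (falling back to a plain byte count) and always uses the EFI_MEMORY_SP attribute 0x40000.

```bash
#!/usr/bin/env bash
# Build an efi_fake_mem= argument that tags [BASE, BASE+LEN) with
# EFI_MEMORY_SP (0x40000). Hypothetical helper.
efi_fake_mem_arg() {
  local base=$(( $1 )) len=$(( $2 )) size
  if (( len % (1 << 30) == 0 )); then
    size="$(( len >> 30 ))G"
  elif (( len % (1 << 20) == 0 )); then
    size="$(( len >> 20 ))M"
  else
    size="$len"                  # plain byte count
  fi
  printf 'efi_fake_mem=%s@0x%x:0x40000\n' "$size" "$base"
}
```

`efi_fake_mem_arg 0x2380000000 0x2000000000` prints `efi_fake_mem=128G@0x2380000000:0x40000`, the value above. To make the setting persistent on Ubuntu, the same string can be appended to GRUB_CMDLINE_LINUX in /etc/default/grub, followed by `sudo update-grub`.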

After adding the efi_fake_mem parameter and rebooting, check the e820 memory map for the CXL memory region in the boot log again. If the CXL memory region is recognized as soft reserved, SMDK's CXL/kmem extension driver registers it as a movable memory node.
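The same check can be scripted. The bash function below is a hypothetical helper (not part of SMDK) that reads boot-log text, finds the BIOS-e820 range containing a given address, and prints its type; after a correct setup it should print `soft reserved` for the CXL base address.

```bash
#!/usr/bin/env bash
# Print the type of the BIOS-e820 range that contains ADDR (boot-log text on stdin).
e820_type_of() {
  local addr=$(( $1 )) line start end
  local re='BIOS-e820: \[mem (0x[0-9a-fA-F]+)-(0x[0-9a-fA-F]+)\] (.*)$'
  while read -r line; do
    [[ $line =~ $re ]] || continue
    start=$(( ${BASH_REMATCH[1]} ))
    end=$(( ${BASH_REMATCH[2]} ))
    if (( addr >= start && addr <= end )); then
      echo "${BASH_REMATCH[3]}"     # e.g. "usable" or "soft reserved"
      return 0
    fi
  done
  return 1
}
```

Usage: `sudo dmesg | e820_type_of 0x2380000000`.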

4.1.4 Memory Partition

Once the SMDK kernel has booted, the CXL memory channel(s) in the system are registered as movable memory nodes by default. Afterward, a system administrator with root permission can change the grouping policy through CXL-CLI or the sysfs interface.

4.1.4.1 Memory Partition Policies

SMDK supports two grouping policies: noop and node. You can change the SMDK memory partition policy with CXL-CLI; please refer to the CXL-CLI Guide section for more details.

# ./cxl create-region -V -G node ("noop" or "node")

By default, the SMDK memory partition grouping policy is noop, which represents each CXL memory as an independent node. The policies are described below.

  • node (Node Partition): the CXL memories attached to each socket are grouped into one node (CXL memories : node = N : 1). Example with 3 channels of CXL devices on socket 0:
node 0 : CPU #1 + DDR Memory #1
node 1 : CXL #1, #2, #3
  • noop (Single Node): each CXL memory is represented as an independent node (CXL memories : node = 1 : 1). Example with 3 channels of CXL devices on socket 0:
node 0 : CPU #1 + DDR Memory #1
node 1 : CXL #1
node 2 : CXL #2
node 3 : CXL #3

4.1.4.2 Memory Partition Examples

online/offline

  • Offline: CXL memory is not recognized as System RAM but as a soft reserved area.
Node 0, zone       DMA      1      0      0      1      2      1      1      0      1      1      3
Node 0, zone     DMA32      3      8      3      4      4      5      3      5      4      4    436
Node 0, zone    Normal   1897    128     43    108    119     76     37     13      5     1 0 45128
  • Online: CXL is mapped to an independent memory node.
Node 0, zone       DMA      1      0      0      1      2      1      1      0      1      1      3
Node 0, zone     DMA32      3      8      3      4      4      5      3      5      4      4    436
Node 0, zone    Normal   2238    196     75     40     17     31     29     10      5      2  43595
Node 1, zone   Movable      0      0      0      0      0      0      0      0      0      0  32768

SMDK Memory Partition

  • noop: Each CXL memory device is added as its own node, separate from the normal DDR memory node.
Node 0, zone       DMA      1      0      0      1      2      1      1      0      1      1      3
Node 0, zone     DMA32      3      8      3      4      4      5      3      5      4      4    436
Node 0, zone    Normal    224    705    660    443    166    103     50     31     16     15  44047
Node 1, zone   Movable      0      0      0      0      0      0      0      0      0      0  32768
Node 2, zone   Movable      0      0      0      0      0      0      0      0      0      0  32768
Node 3, zone   Movable      0      0      0      0      0      0      0      0      0      0  32768
  • node: CXL memory devices are grouped by the socket they are installed on, and the devices of each socket are added together as one node.
Node 0, zone       DMA      1      0      0      1      2      1      1      0      1      1      3
Node 0, zone     DMA32      3      8      3      4      4      5      3      5      4      4    436
Node 0, zone    Normal  33615    309     94     63     38      3      4      1      2      2  43597
Node 1, zone   Movable      0      0      0      0      0      0      0      0      0      0  98304
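The free-memory totals behind these listings can be computed directly: in /proc/buddyinfo, the k-th count after the zone name is the number of free blocks of 2^k pages. The bash sketch below sums them per node and zone, assuming 4 KiB pages; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sum free memory per node and zone from /proc/buddyinfo-style text on stdin.
# Column k after the zone name counts free blocks of 2^k pages; 4 KiB pages assumed.
buddy_free_mib() {
  awk '{
    node = $2; sub(/,$/, "", node)          # "1," -> "1"
    pages = 0
    for (i = 5; i <= NF; i++) pages += $i * 2 ^ (i - 5)
    printf "node %s %s: %d MiB\n", node, $4, pages * 4 / 1024
  }'
}
```

Applied to the listings above (`cat /proc/buddyinfo | buddy_free_mib`), each Movable node under noop comes to 131072 MiB (128 GiB), and Node 1 under node comes to 393216 MiB (384 GiB), the three devices combined.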

If the CXL device's memory is in use, online/offline and memory partition changes are canceled. You can check the result of a memory partition change with the command below.

# ./cxl list -V <--list_node | --list_dev>

4.1.4.3 N-way Grouping

In addition to the noop and node policies of CXL-CLI, you can configure CXL node partitions freely with the create-region -V (--soft_interleaving) command. Assuming the noop policy has been applied as in the example above, the command below merges Node 2 and Node 3 into a single node (Node 2).

# ./cxl create-region -V --target_node 2 --ways 1 cxl2
# cat /proc/buddyinfo
Node 0, zone       DMA      1      0      0      1      2      1      1      0      1      1      3
Node 0, zone     DMA32      3      8      3      4      4      5      3      5      4      4    436
Node 0, zone    Normal   2907   1686    828   4416   2038    913    407    192    106     58  13430
Node 1, zone   Movable      0      0      0      0      0      0      0      0      0      0  32768
Node 2, zone   Movable      0      0      0      0      0      0      0      0      0      0  65536
# ./cxl list -V --list_node
[
   {
    "node_id" : -1,
    "devices" : [ ]
   }
   {
    "node_id" : 0,
    "devices" : [ ]
   }
   {
    "node_id" : 1,
    "devices" : [ "cxl0"  ]
   }
   {
    "node_id" : 2,
    "devices" : [ "cxl1" "cxl2"  ]    # Node 2 consists of cxl dev 1 and 2.
   }
]

Please check CXL-CLI Guide for more details.

4.1.4.4 Device-dax

To use CXL RAM regions as System RAM, they must be mapped as a movable node. SMDK provides an extended kmem DAX driver that adds CXL memory regions marked "Soft Reserved" by platform firmware to the core kernel memory service as a movable node. As a result, the SMDK kernel recognizes the CXL memory range as a movable node at boot by default.

To check whether the DAX device has successfully bound the CXL memory range, check /proc/iomem. Below is an example of a system equipped with a single-channel CXL memory expander device.

$ sudo cat /proc/iomem
880000000-287fffffff : Soft Reserved
  880000000-287fffffff : dax0.0
    880000000-287fffffff : System RAM (kmem)

To bind the range to the device DAX driver instead, take the CXL device offline through the sysfs interface and reconfigure the DAX device to devdax mode with the daxctl CLI tool.

$ echo -1 | sudo tee /sys/kernel/cxl/devices/cxl0/node_id
$ cd /path/to/smdk/lib/cxl_cli/build/daxctl/
$ sudo ./daxctl reconfigure-device --mode=devdax dax0.0
$ sudo cat /proc/iomem
880000000-287fffffff : Soft Reserved
  880000000-287fffffff : dax0.0

The DAX device can now be used by the fio benchmark and similar tools. For more information, refer to the Test section below.
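To tell which of the two /proc/iomem layouts above a DAX device is in, the bash sketch below reports `system-ram` when the device has a `System RAM (kmem)` child entry and `devdax` otherwise. The function name is hypothetical, and reading real addresses from /proc/iomem requires root. `daxctl list` also reports a device's mode and is the more direct option when available.

```bash
#!/usr/bin/env bash
# Report whether a DAX device is onlined as System RAM (kmem) or left in devdax
# mode, based on /proc/iomem-style text on stdin. Hypothetical helper.
dax_mode() {
  awk -v dev="$1" '
    # Runs on the line right after the device line: is it a kmem child?
    found && !decided {
      indent = match($0, /[^ ]/)
      mode = (indent > dev_indent && /System RAM [(]kmem[)]/) ? "system-ram" : "devdax"
      decided = 1
    }
    $2 == ":" && $3 == dev { found = 1; dev_indent = match($0, /[^ ]/) }
    END {
      if (!found) exit 1              # device not present in the listing
      print (decided ? mode : "devdax")
    }'
}
```

Usage: `sudo cat /proc/iomem | dax_mode dax0.0`.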

If you want to unbind the CXL memory range from the DEVICE DAX device and register it as the MOVABLE node again, execute the command below.

$ cd /path/to/smdk/lib/cxl_cli/build/daxctl/
$ sudo ./daxctl reconfigure-device --mode=system-ram dax0.0

4.1.5 CXL Swap

CXL Swap is another memory interface for userspace applications. It allows a CXL device to function as a swap backend. Unlike zswap, it avoids the (de)compression overhead and the resulting latency fluctuations, which consume host CPU cycles during page swap-out and swap-in. When swapping takes place, CXL Swap intervenes in the Linux swap path before any disk I/O is issued, and stores and retrieves swap pages in a ZONE_MOVABLE memory pool that expands and shrinks dynamically.

4.1.5.1 How to Use

CXL Swap is a built-in kernel module, so you don't need to insert a separate module; just enable CONFIG_CXLSWAP in make menuconfig. After the system boots, you can enable the CXL Swap feature as shown below.

 echo 1 > /sys/module/cxlswap/parameters/enabled

Other parameters can be found in the following Configurations.

Please note that zswap and CXL Swap are recommended to be used exclusively (not enabled at the same time), because the two modules target different goals (a trade-off between CPU usage and memory density).
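Because zswap and CXL Swap are meant to be used exclusively, a guard like the following can refuse to enable CXL Swap while zswap is on. This is a sketch: the function name and the SYSFS_ROOT override (which lets it be dry-run against a scratch directory instead of /sys) are hypothetical, while the two parameter paths are the standard zswap one and the cxlswap one documented here.

```bash
#!/usr/bin/env bash
# Enable CXL Swap only when zswap is off, since the two should be used exclusively.
# SYSFS_ROOT is a hypothetical override for dry runs; it defaults to /sys.
enable_cxlswap() {
  local root=${SYSFS_ROOT:-/sys}
  local zswap=$root/module/zswap/parameters/enabled
  local cxlswap=$root/module/cxlswap/parameters/enabled
  if [[ -r $zswap && $(<"$zswap") == Y ]]; then
    echo "zswap is enabled; disable it first (echo 0 > $zswap)" >&2
    return 1
  fi
  echo 1 > "$cxlswap" || return 1
  echo "cxlswap enabled: $(<"$cxlswap")"    # the kernel reports Y once enabled
}
```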

4.1.5.1.1 Configurations

Note: The following configurations are located in /sys/module/cxlswap/parameters/ and can be modified by writing values in the corresponding files or using CXL-CLI. Root privileges are required to change the settings.

  • accept_threshold_percent (default: 90): The threshold at which cxlswap starts accepting pages again after it has become full.
  • cxlpool (default: cxlbud): The memory pool for cxlswap, which grows on demand and shrinks as pages are freed.
  • enabled (default: N): Enables or disables cxlswap at runtime.
  • flush (experimental, default: N/A): Flushes all pages in cxlpool. CXL Swap must be disabled before executing a flush.
  • max_pool_percent (default: 20): The maximum percentage of memory that cxlpool can occupy.
  • same_filled_pages_enabled (default: Y): Identifies same-value filled pages (pages whose contents are a single repeated value or a repetitive pattern) during a store operation; for such a page, the stored length is set to zero and only the pattern or same-filled value is stored.
  • non_same_filled_pages_enabled (default: Y): If disabled, cxlswap does not handle pages that are not same-value filled.
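To review the current settings in one go, the parameter files can be printed as name=value pairs. The helper below is hypothetical; it takes the module name (cxlswap here, or cxlcache for the CXL Cache parameters) and skips any entry that cannot be read, since a trigger such as flush may not be readable.

```bash
#!/usr/bin/env bash
# Print each readable parameter of a module (e.g. cxlswap, cxlcache) as name=value.
# SYSFS_ROOT is a hypothetical override for dry runs; it defaults to /sys.
show_params() {
  local dir=${SYSFS_ROOT:-/sys}/module/$1/parameters f val
  for f in "$dir"/*; do
    [[ -f $f ]] || continue
    val=$(cat "$f" 2>/dev/null) || continue   # skip entries that cannot be read
    printf '%s=%s\n' "${f##*/}" "$val"
  done
}
```

Usage: `show_params cxlswap`.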

4.1.5.1.2 Examples

# echo 1 | sudo tee /sys/module/cxlswap/parameters/enabled
# cat /sys/module/cxlswap/parameters/enabled
Y
# echo 0 | sudo tee /sys/module/cxlswap/parameters/enabled
# cat /sys/module/cxlswap/parameters/enabled
N
# echo 1 | sudo tee /sys/module/cxlswap/parameters/flush
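Since a flush must be issued only while the module is disabled, the two writes can be paired in one function so they cannot happen in the wrong order. This hypothetical helper accepts cxlswap or cxlcache (both configuration tables state the same requirement) and leaves the module disabled afterwards; re-enable it explicitly if needed. SYSFS_ROOT is likewise a hypothetical dry-run override.

```bash
#!/usr/bin/env bash
# Flush the pool of cxlswap or cxlcache, disabling the module first as required.
# The module stays disabled afterwards. SYSFS_ROOT is a hypothetical override.
flush_cxl_pool() {
  case $1 in
    cxlswap|cxlcache) ;;
    *) echo "usage: flush_cxl_pool cxlswap|cxlcache" >&2; return 1 ;;
  esac
  local p=${SYSFS_ROOT:-/sys}/module/$1/parameters
  echo 0 > "$p/enabled" || return 1   # flush requires the module to be disabled
  echo 1 > "$p/flush"
}
```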

4.1.6 CXL Cache

CXL Cache is one of the memory interfaces provided by the SMDK kernel. It allows CXL devices to be used as a second-level page cache in the OS. CXL Cache stores page cache pages that are selected as victims by the Page Frame Reclaim Algorithm (PFRA) in a ZONE_MOVABLE memory pool. When a file read occurs, these pages are returned to the page cache, which reduces the number of disk reads.

4.1.6.1 How to Use

CXL Cache is a built-in kernel module, so you need to enable CONFIG_CXLCACHE when building the SMDK kernel. After booting, you can enable CXL Cache with the command below.

echo 1 > /sys/module/cxlcache/parameters/enabled

Other parameters can be found in the following Configurations.

4.1.6.1.1 Configurations

Note: The following configurations are located in /sys/module/cxlcache/parameters/ and can be modified by writing values to the corresponding files or by using CXL-CLI. Root privileges are required to change the settings.

  • accept_threshold_percent (default: 90): The threshold at which cxlcache starts accepting pages again after it has become full.
  • cxlpool (default: cxlbud): The memory pool for cxlcache, which grows on demand and shrinks as pages are freed.
  • enabled (default: N): Enables or disables cxlcache at runtime.
  • flush (experimental, default: N/A): Flushes all pages in cxlpool. CXL Cache must be disabled before executing a flush.
  • max_pool_percent (default: 20): The maximum percentage of memory that cxlpool can occupy.

4.1.6.1.2 Examples

# echo 1 | sudo tee /sys/module/cxlcache/parameters/enabled
# cat /sys/module/cxlcache/parameters/enabled
Y
# echo 0 | sudo tee /sys/module/cxlcache/parameters/enabled
# cat /sys/module/cxlcache/parameters/enabled
N
# echo 1 | sudo tee /sys/module/cxlcache/parameters/flush

4.1.6.1.3 Performance Metrics

If CONFIG_DEBUG_FS is enabled in make menuconfig, CXL Cache can be monitored via debugfs in the /sys/kernel/debug/cxlcache directory. The following metrics measure the effectiveness of cxlcache (across all filesystems):

  • evicted_pages: The number of pages evicted because CXL Cache was full.
  • pool_limit_hit: The number of times CXL Cache reached the maximum size set by the module parameter.
  • pool_total_size: The total size of the pages currently stored in CXL Cache.
  • put_pages: The number of pages currently stored in CXL Cache.
  • reject_alloc_fail: The number of page put failures caused by allocation failures in the CXL pool.
  • reject_kmemcache_fail: The number of page put failures caused by slab memory allocation failures.
  • reject_reclaim_fail: The number of CXL Cache eviction failures in the work queue.
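The counters can be dumped together with a total of the reject_* failures using the sketch below. It assumes debugfs is mounted at /sys/kernel/debug (if it is not, `mount -t debugfs none /sys/kernel/debug` mounts it) and that every file in the cxlcache directory holds a single decimal number; the function name and the DEBUGFS_ROOT override are hypothetical.

```bash
#!/usr/bin/env bash
# Print every CXL Cache debugfs counter plus the sum of the reject_* counters.
# DEBUGFS_ROOT is a hypothetical override; it defaults to /sys/kernel/debug.
cxlcache_stats() {
  local dir=${DEBUGFS_ROOT:-/sys/kernel/debug}/cxlcache f name val rejects=0
  for f in "$dir"/*; do
    [[ -f $f ]] || continue
    name=${f##*/}
    val=$(<"$f")
    echo "$name: $val"
    if [[ $name == reject_* ]]; then
      rejects=$(( rejects + val ))
    fi
  done
  echo "total_rejects: $rejects"
}
```

Usage: `sudo bash -c "$(declare -f cxlcache_stats); cxlcache_stats"`, since debugfs is readable only by root.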


4.2 Test

Below is a set of test cases and examples for verifying the operation of the SMDK kernel. You can build the binaries required for the tests by running make once in /path/to/SMDK/src/test.

4.2.1 BIOS (SRAT/e820)

4.2.1.1 UEFI information

This test case checks whether UEFI BIOS properly provides the CXL device related information to the kernel.

The test case checks the following:

  • SRAT table contains the Memory Affinity information of the CXL memory.
  • CXL memory range is included in the EFI memory map on dmesg, and it is recognized as soft reserved.
  • The CXL memory range of /proc/iomem is recognized as system RAM.
  • The CXL memory is recognized as Movable node in /proc/buddyinfo.

Command lines

$ cd /path/to/SMDK/src/test/system
$ ./extract_system_info.sh <CXL memory start address>
(Example) $ ./extract_system_info.sh 2080000000

Result

1. SRAT table:
 
 
[7830h 30768   1]                Subtable Type : 01 [Memory Affinity]
[7831h 30769   1]                       Length : 28

[7832h 30770   4]             Proximity Domain : 00000002
[7836h 30774   2]                    Reserved1 : 0000
[7838h 30776   8]                 Base Address : 0000002080000000
[7840h 30784   8]               Address Length : 0000004000000000
[7848h 30792   4]                    Reserved2 : 00000000
[784Ch 30796   4]        Flags (decoded below) : 00000001
                                     Enabled : 1
                               Hot Pluggable : 0
                                Non-Volatile : 0
[7850h 30800   8]                    Reserved3 : 0000000000000000
  
2. e820 memory map:
 
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009efff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009f000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000007356bfff] usable
[    0.000000] BIOS-e820: [mem 0x000000007356c000-0x0000000073f16fff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000073f17000-0x00000000772b2fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000772b3000-0x00000000777fefff] ACPI data
[    0.000000] BIOS-e820: [mem 0x00000000777ff000-0x00000000777fffff] usable
[    0.000000] BIOS-e820: [mem 0x0000000077800000-0x000000008fffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fe010000-0x00000000fe010fff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000207fffffff] usable
[    0.000000] BIOS-e820: [mem 0x0000002080000000-0x000000607fffffff] soft reserved
 
3. /proc/iomem:
 
2080000000-607fffffff : System RAM (kmem)
 
4. /proc/buddyinfo:

Node 0, zone      DMA      0      0      0      0      0      0      0      0      1      1      2
Node 0, zone    DMA32      8      7      5      5      6      4      5      5      6      3    436
Node 0, zone   Normal    324     74    239    501    369    136     15      3     16      8  14318
Node 1, zone   Normal    131    867   2164   1348    976    500    165     65     17      1  15904
Node 2, zone  Movable      0      0      0      0      0      0      0      0      0      0  65536 
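The four results above describe the same range, which can be cross-checked with simple arithmetic: the inclusive end is base + length - 1, and with 4 KiB pages an order-10 buddy block is 4 MiB, so a fully free Movable zone of length L holds L / 4 MiB order-10 blocks. The hypothetical helper below prints these expected values from the SRAT base and length.

```bash
#!/usr/bin/env bash
# From an SRAT base and length, print the inclusive end address (as shown by
# e820 and /proc/iomem), the size in GiB, and the number of free order-10
# blocks a fully free Movable zone of that size shows in /proc/buddyinfo.
# Assumes 4 KiB pages, so one order-10 block is 4 MiB (2^22 bytes).
cxl_expected_views() {
  local base=$(( $1 )) len=$(( $2 ))
  printf 'range 0x%x-0x%x, %d GiB, %d order-10 blocks\n' \
    "$base" "$(( base + len - 1 ))" "$(( len >> 30 ))" "$(( len >> 22 ))"
}
```

`cxl_expected_views 0x2080000000 0x4000000000` prints `range 0x2080000000-0x607fffffff, 256 GiB, 65536 order-10 blocks`, matching the e820 line, the /proc/iomem entry, and the Node 2 Movable count above.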

4.2.2 Memory Partition

4.2.2.1 subzone

4.2.2.1.1 run_4GB_malloc_test.sh

The SMDK kernel includes the Subzone architecture for efficient memory management, and this test script verifies the operation of the subzone function. For a detailed description of the subzone architecture, refer to the Memory Partition section of SMDK Architecture.

In this test, a single thread issues memory allocation requests of 1KiB, 4KiB, 128KiB, or 4MiB each, totaling 4GiB per thread. Then 10 threads allocate memory in the same way.

Command lines

$ cd /path/to/SMDK/src/test/subzone
$ ./run_4GB_malloc_test.sh

Result

Single Thread Testcases
 
 
TC test_malloc_1K_bytes_4M_times starts

cxl: set node, size: 1.0K bytes, iteration: 4.0M times, cxl region: cmd_create_region: created 1 region
elapsed time: ......
cxl: set noop, size: 1.0K bytes, iteration: 4.0M times, cxl region: cmd_create_region: created 1 region
elapsed time: ......

......

TC test_malloc_1K_bytes_4M_times done
 
TC test_malloc_4K_bytes_1M_times starts
 
......
 
TC test_malloc_4GB_10_threads_4M_unit done
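The iteration counts in the test-case names follow from dividing the 4GiB total by the request size (for example, 4GiB / 1KiB = 4M iterations in test_malloc_1K_bytes_4M_times, and 4GiB / 4KiB = 1M in test_malloc_4K_bytes_1M_times). The hypothetical helper below reproduces this arithmetic.

```bash
#!/usr/bin/env bash
# For each request size in bytes, print how many requests add up to 4 GiB.
malloc_iterations() {
  local total=$(( 4 << 30 )) size
  for size in "$@"; do
    echo "$size $(( total / size ))"
  done
}
```

`malloc_iterations 1024 4096 131072 4194304` prints 4194304, 1048576, 32768, and 1024 iterations, respectively.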

4.2.2.1.2 run_random_malloc_test.sh

This script is similar to run_4GB_malloc_test.sh above, but the allocation request size is random and changes with every request. The total amount of memory requested is 4GiB.

Command lines

$ cd /path/to/SMDK/src/test/subzone
$ ./run_random_malloc_test.sh

Result

Allocation size: 4294990785

4.2.2.2 N-way partition

4.2.2.2.1 run_functional_test.sh

This test case checks whether online/offline changes and node ID changes of a CXL device work correctly.

Command lines

$ cd /path/to/SMDK/src/test/driver
$ ./run_functional_test.sh

Result

[[Buddy Info]]
Node 0, zone      DMA      0      0      0      0      0      0      0      0      1      2      2
Node 0, zone    DMA32      7      5      6      5      4      7      5      4      6      5    437
Node 0, zone   Normal   7127   8050   7715   7933   4266   1563    246     16    393     99    453
Node 1, zone  Movable     19     15     17     16     19     14     14     11     12     12  32755
Node 2, zone  Movable     13     17     12      8     13     12     11     11     10     10  32757
[[Device Info]]
start_address: 0x480000000
size: 0x2000000000
node_id: 1
socket_id: 0
state: online

[OFFLINE TEST]
[[Buddy Info]]
Node 0, zone      DMA      0      0      0      0      0      0      0      0      1      2      2
Node 0, zone    DMA32      7      5      6      5      4      7      5      4      6      5    437
Node 0, zone   Normal     39   4620   7788   7729   4254   1589    254     18    393     99    453
Node 2, zone  Movable      0      0      0      0      0      0      0      0      0      0  32768
[[Device Info]]
start_address: 0x480000000
size: 0x2000000000
node_id: -1
socket_id: 0
state: offline
PASS

[ONLINE TEST]
[[Buddy Info]]
Node 0, zone      DMA      0      0      0      0      0      0      0      0      1      2      2
Node 0, zone    DMA32      7      5      6      5      4      7      5      4      6      5    437
Node 0, zone   Normal    344   3445   6659   7927   4265   1565    247     16    394     99    452
Node 1, zone  Movable      0      0      0      0      0      0      0      0      0      0  32768
Node 2, zone  Movable      0      0      0      0      0      0      0      0      0      0  32768
[[Device Info]]
start_address: 0x480000000
size: 0x2000000000
node_id: 1
socket_id: 0
state: online
PASS

[NODE CHANGE TEST]
[[Buddy Info]]
Node 0, zone      DMA      0      0      0      0      0      0      0      0      1      2      2
Node 0, zone    DMA32      7      5      6      5      4      7      5      4      6      5    437
Node 0, zone   Normal     58   3564   6563   7919   4280   1565    248     17    394     99    452
Node 2, zone  Movable      0      0      0      0      0      0      0      0      0      0  65536
[[Device Info]]
start_address: 0x480000000
size: 0x2000000000
node_id: 2
socket_id: 0
state: online
PASS

[KOBJECT RELEASE TEST]
kobject is released
PASS

[SYMLINK CHECK]
memdev: /sys/devices/pci0000:3d/0000:3d:02.0/0000:3e:00.0/mem0
PASS

4.2.2.2.2 run_rollback_test.sh

This test case checks that a CXL device's state remains unchanged when an online/offline change is attempted while the device is in use or bound to DAX.

Command lines

$ cd /path/to/SMDK/src/test/driver
$ ./run_rollback_test.sh

Result

[[Buddy Info]]
Node 0, zone      DMA      0      0      0      0      0      0      0      0      1      2      2
Node 0, zone    DMA32      7      5      6      5      4      7      5      4      6      5    437
Node 0, zone   Normal    799   3530   6456   7984   4348   1587    253     21    402     99    447
Node 1, zone  Movable      0      0      0      0      0      0      0      0      0      0  32768
Node 2, zone  Movable      0      0      0      0      0      0      0      0      0      0  32768
[[Device Info]]
start_address: 0x480000000
size: 0x2000000000
node_id: 1
socket_id: 0
state: online

[online rollback test]
addr[0x7fea88c1b000]
[[Buddy Info]]
Node 0, zone      DMA      0      0      0      0      0      0      0      0      1      2      2
Node 0, zone    DMA32      7      5      6      5      4      7      5      4      6      5    437
Node 0, zone   Normal   1299   4184   6384   7893   4382   1612    280     47    394     98    442
Node 1, zone  Movable      0      0      0      0      0      0      0      0      0      0  32768
Node 2, zone  Movable      0      0      0      0      0      0      0      0      0      0  32768
[[Device Info]]
start_address: 0x480000000
size: 0x2000000000
node_id: 1
socket_id: 0
state: online
PASS

[[Buddy Info]]
Node 0, zone      DMA      0      0      0      0      0      0      0      0      1      2      2
Node 0, zone    DMA32      7      5      6      5      4      7      5      4      6      5    437
Node 0, zone   Normal   2431   3929   6618   8005   4369   1601    268     29    403    113    950
Node 2, zone  Movable      0      0      0      0      0      0      0      0      0      0  32768
[[Device Info]]
start_address: 0x480000000
size: 0x2000000000
node_id: -1
socket_id: 0
state: offline

[offline rollback test]
./run_rollback_test.sh: line 98: echo: write error: Invalid argument
[[Buddy Info]]
Node 0, zone      DMA      0      0      0      0      0      0      0      0      1      2      2
Node 0, zone    DMA32      7      5      6      5      4      7      5      4      6      5    437
Node 0, zone   Normal   1865   3845   6406   7796   4254   1558    253     22    394     99    838
Node 2, zone  Movable      0      0      0      0      0      0      0      0      0      0  32768
[[Device Info]]
start_address: 0x480000000
size: 0x2000000000
node_id: -1
socket_id: 0
state: offline
./run_rollback_test.sh: line 108: echo: write error: Invalid argument
PASS
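The `write error: Invalid argument` lines above are how a rejected change surfaces when the node_id sysfs file is written directly. When scripting such changes, it helps to check the write's exit status, as in this sketch. It assumes the /sys/kernel/cxl/devices/<dev>/node_id path used in the Device-dax section, where -1 takes the device offline; the function name and the SYSFS_ROOT override are hypothetical.

```bash
#!/usr/bin/env bash
# Set a CXL device's node_id (-1 takes it offline) and report whether the kernel
# accepted the change. SYSFS_ROOT is a hypothetical override; it defaults to /sys.
set_cxl_node() {
  local dev=$1 node=$2
  local f=${SYSFS_ROOT:-/sys}/kernel/cxl/devices/$dev/node_id
  if ! echo "$node" > "$f"; then
    # e.g. "Invalid argument" while the device's memory is in use
    echo "node_id change for $dev was rejected; its state should be unchanged" >&2
    return 1
  fi
  echo "$dev: node_id=$(<"$f")"
}
```

Usage: `sudo bash -c "$(declare -f set_cxl_node); set_cxl_node cxl0 -1"`.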

4.2.3 CXL Swap

This test checks that CXL Swap works correctly in various swap-out/in scenarios and verifies the CXL Swap flush functionality.

The detailed description and prerequisites for each test are written in each script file. Please read the comments in the script first, before running the test.

4.2.3.1 run_cxlswap_storeload_test.sh

Test basic swap out/in data to/from CXL Swap. The data before swap out and the data after swap in must be the same.

Command lines

$ cd /path/to/SMDK/src/test/cxlswap/
$ ./run_cxlswap_storeload_test.sh

Result

Store Load Test Start
=======Test Info======
Process ID : 2034 / CXL Swap Enabled : Y
Test Name : store_load
Total Memory Size 512.00M / Memory Limit to 460.80M
======= RESULT =======
CXL Swap Stored Pages Before Swap : 81380
CXL Swap Stored Pages After Swap : 104871
====== PASS ======
...

4.2.3.2 run_cxlswap_multithread_test.sh

Test swap-out/in of data to/from CXL Swap with multiple threads. Regardless of the thread, the data before swap-out and the data after swap-in must be the same.

Command lines

$ cd /path/to/SMDK/src/test/cxlswap/
$ ./run_cxlswap_multithread_test.sh

Result

Multi Thread Test Start
=======Test Info======
Process ID : 2072 / CXL Swap Enabled : Y
Test Name : multi_thread
Total Memory Size 1.00G / Memory Limit to 921.60M
======= RESULT =======
Elapsed Time 0.688225 using 10 threads
CXL Swap Stored Pages Before Swap : 81384
CXL Swap Stored Pages After Swap : 104861
====== PASS ======
...

4.2.3.3 run_cxlswap_sharedmemory_test.sh

Test swap-out/in of shared data to/from CXL Swap. The data before swap-out and the data after swap-in must be the same even when shared memory is used.

Command lines

$ cd /path/to/SMDK/src/test/cxlswap/
$ ./run_cxlswap_sharedmemory_test.sh

Result

Shared Memory Test Start
=======Test Info======
Process ID : 1980 / CXL Swap Enabled : Y
Test Name : shared_memory
Total Memory Size 512.00M / Memory Limit to 460.80M
Process 1980 Initialize Data [Shmid 0]...
Process 1992 Check Initialized Data [Shmid 0]...
Process 1992 Check Initialized Data [Shmid 0] Pass
Process 1992 Modify Data [Shmid 0]...
Process 1980 Check Modified Data [Shmid 0]...
Process 1980 Check Modified Data [Shmid 0] Pass
======= RESULT =======
CXL Swap Stored Pages Before Swap : 1
CXL Swap Stored Pages After Swap : 13801
====== PASS ======
...

4.2.3.4 run_cxlswap_flush_test.sh

Test the CXL Swap flush functionality. Note that a few pages may remain in CXL Swap even after a flush; see the description in the script.

Command lines

$ cd /path/to/SMDK/src/test/cxlswap/
$ ./run_cxlswap_flush_test.sh 

Result

Flush Test Start
Before Flush : 81373
After Flush : 5
Flush Test Finish

4.2.4 CXL Cache

This test checks that CXL Cache works correctly in various page put/get scenarios and verifies the CXL Cache flush functionality.

The detailed description and prerequisites for each test are written in each script file. Please read the comments in the script first, before running the tests.

4.2.4.1 run_cxlcache_put_cxl_page_test.sh

Test page put/get to/from CXL Cache when CXL pages are used as page cache. The data before put and the data after get must be the same.

Command lines

$ cd /path/to/SMDK/src/test/cxlcache/
$ ./run_cxlcache_put_cxl_page_test.sh

Result

Put CXL Page Test Start
======Test Info======
Process ID : 10076 / CXL Cache Enabled : Y
Test Name : put_cxl_page
Test File Size 256.00M
Test File Size Limit is 2.00G
====== RESULT ======
CXL Cache Succ Put Pages After Caching : 256.24M
CXL Cache Succ Get Pages After Caching : 256.41M
====== PASS ======
======Test Info======
Process ID : 10088 / CXL Cache Enabled : Y
Test Name : put_cxl_page
Test File Size 512.00M
Test File Size Limit is 2.00G
====== RESULT ======
CXL Cache Succ Put Pages After Caching : 512.24M
CXL Cache Succ Get Pages After Caching : 512.41M
====== PASS ======
Put CXL Test Succ.

4.2.4.2 run_cxlcache_put_get_correctness_test.sh

Test basic page put/get to/from CXL Cache. The data before put and the data after get must be the same.

Command lines

$ cd /path/to/SMDK/src/test/cxlcache/
$ ./run_cxlcache_put_get_correctness_test.sh

Result

Put Get Correctness Test Start
======Test Info======
Process ID : 59846 / CXL Cache Enabled : Y
Test Name : put_get_correctness
Test File Size 256.00M
Test File Size Limit is 2.00G
====== RESULT ======
CXL Cache Succ Put Pages After Caching : 256.00M
CXL Cache Succ Get Pages After Caching : 256.00M
====== PASS ======
======Test Info======
Process ID : 59857 / CXL Cache Enabled : Y
Test Name : put_get_correctness
Test File Size 512.00M
Test File Size Limit is 2.00G
====== RESULT ======
CXL Cache Succ Put Pages After Caching : 512.00M
CXL Cache Succ Get Pages After Caching : 512.00M
====== PASS ======
======Test Info======
Process ID : 59892 / CXL Cache Enabled : Y
Test Name : put_get_correctness
Test File Size 1.00G
Test File Size Limit is 2.00G
====== RESULT ======
CXL Cache Succ Put Pages After Caching : 1.00G
CXL Cache Succ Get Pages After Caching : 1.00G
====== PASS ======
Put Get Correctness Test Succ.
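The invariant these put/get tests check can be illustrated with a checksum round trip. This is a hypothetical sketch, not how the SMDK test scripts are implemented: `roundtrip_check` is an illustrative name, and it assumes that evicting clean page-cache pages through `drop_caches` is what triggers a put into CXL Cache and that the following read triggers a get.

```shell
# Hypothetical sketch of the put/get invariant: a file's contents must be
# identical before its clean page-cache pages are evicted (assumed to be put
# into CXL Cache) and after they are read back (assumed to be a get).
roundtrip_check() {
  local file=$1 before after
  before=$(sha256sum "$file" | awk '{print $1}')
  # Evict clean page cache so the next read cannot be served from it.
  # Requires root; without it the write fails and is silently skipped.
  sync
  { echo 1 > /proc/sys/vm/drop_caches; } 2>/dev/null || true
  after=$(sha256sum "$file" | awk '{print $1}')
  if [ "$before" = "$after" ]; then
    echo "PASS"
  else
    echo "FAIL"
    return 1
  fi
}
```

Run it as root against a file such as one created with `dd if=/dev/urandom of=/tmp/cxlcache.bin bs=1M count=256`. Without root the eviction step is skipped, so the check only compares two page-cache reads.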

4.2.4.3 run_cxlcache_modify_put_get_correctness_test.sh

Test page put/get to/from CXL Cache when a file that already exists in CXL Cache is modified. The data before put and the data after get must be the same.

Command lines

$ cd /path/to/SMDK/src/test/cxlcache/
$ ./run_cxlcache_modify_put_get_correctness_test.sh

Result

Modify Put Get Correctness Test Start
======Test Info======
Process ID : 59963 / CXL Cache Enabled : Y
Test Name : modify_put_get_correctness
Test File Size 256.00M
Test File Size Limit is 2.00G
====== RESULT ======
CXL Cache Succ Put Pages After Caching : 512.01M
CXL Cache Succ Get Pages After Caching : 512.01M
====== PASS ======
======Test Info======
Process ID : 59985 / CXL Cache Enabled : Y
Test Name : modify_put_get_correctness
Test File Size 512.00M
Test File Size Limit is 2.00G
====== RESULT ======
CXL Cache Succ Put Pages After Caching : 1.00G
CXL Cache Succ Get Pages After Caching : 1.00G
====== PASS ======
======Test Info======
Process ID : 60028 / CXL Cache Enabled : Y
Test Name : modify_put_get_correctness
Test File Size 1.00G
Test File Size Limit is 2.00G
====== RESULT ======
CXL Cache Succ Put Pages After Caching : 2.00G
CXL Cache Succ Get Pages After Caching : 2.00G
====== PASS ======
Modify Put Get Correctness Test Succ.

4.2.4.4 run_cxlcache_multithread_test.sh

This test case checks data integrity while many put/get operations run concurrently from multiple threads.

Command lines

$ cd /path/to/SMDK/src/test/cxlcache/
$ ./run_cxlcache_multithread_test.sh

Result

Multi Thread Test Start
======Test Info======
Process ID : 60104 / CXL Cache Enabled : Y
Test Name : multi_thread
Test File Size 128.00M
Test File Size Limit is 2.00G
====== RESULT ======
CXL Cache Succ Put Pages After Caching : 1.75G
CXL Cache Succ Get Pages After Caching : 1.17G
====== PASS ======
======Test Info======
Process ID : 60172 / CXL Cache Enabled : Y
Test Name : multi_thread
Test File Size 256.00M
Test File Size Limit is 2.00G
====== RESULT ======
CXL Cache Succ Put Pages After Caching : 4.74G
CXL Cache Succ Get Pages After Caching : 2.47G
====== PASS ======
Multi Thread Test Succ.

4.2.4.5 run_cxlcache_multiprocess_test.sh

Test put/get of shared-file pages to/from CXL Cache from multiple processes. Even when the file is shared, the data before put and the data after get must be the same.

Command lines

$ cd /path/to/SMDK/src/test/cxlcache/
$ ./run_cxlcache_multiprocess_test.sh

Result

Multi Process Test Start
======Test Info======
Process ID : 60366 / CXL Cache Enabled : Y
Test Name : multi_process
Test File Size 256.00M
Test File Size Limit is 2.00G
======Test Info======
Process ID : 60366 / CXL Cache Enabled : Y
Test Name : multi_process
Test File Size 256.00M
Test File Size Limit is 2.00G
====== RESULT ======
CXL Cache Succ Put Pages After Caching : 512.01M
CXL Cache Succ Get Pages After Caching : 512.01M
====== PASS ======
======Test Info======
Process ID : 60379 / CXL Cache Enabled : Y
Test Name : multi_process
Test File Size 512.00M
Test File Size Limit is 2.00G
======Test Info======
Process ID : 60379 / CXL Cache Enabled : Y
Test Name : multi_process
Test File Size 512.00M
Test File Size Limit is 2.00G
====== RESULT ======
CXL Cache Succ Put Pages After Caching : 1.00G
CXL Cache Succ Get Pages After Caching : 1.00G
====== PASS ======
Multi Process Test Succ.

4.2.4.6 run_cxlcache_flush_test.sh

Test CXL Cache Flush functionality. Note that this test fails if the CXL Cache holds no data when the flush starts.

Command lines

$ cd /path/to/SMDK/src/test/cxlcache/
$ ./run_cxlcache_flush_test.sh

Result

Flush Test Start
Before Flush : 82450
After Flush : 0
Flush Test Finish

4.2.5 DAX

This test checks that registered CXL devices work correctly as DAX devices. The script releases each CXL device's memory range from System RAM, binds it to a DAX device (devdax mode), checks that it operates as a DAX device using fio, and then returns it to System RAM. Modify the number of devices and their addresses in the script to match your system.

Note: To run the script below, you must first install fio on your system. Refer to the fio GitHub repository for information on installing and using it.
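run_dax_test.sh needs NUM_DEVICE and ADDRESS to match your system. Assuming ADDRESS expects the same start-end ranges that /proc/iomem reports for each daxX.Y device (as in the IOMEM dump in the result below), a hypothetical helper such as `list_dax_ranges` could collect them. The helper name and the `mapfile` usage are illustrative, not part of the SMDK script.

```shell
# Hypothetical helper: print the address range of every daxX.Y entry found
# in /proc/iomem text on stdin, in the "start-end" form used by ADDRESS
# (e.g. 1080000000-307fffffff).
list_dax_ranges() {
  # Lines look like "      480000000-247fffffff : dax0.0"
  awk '$NF ~ /^dax[0-9]+\.[0-9]+$/ { print $1 }'
}

# Read /proc/iomem as root; unprivileged reads show all-zero addresses.
#   mapfile -t ADDRESS < <(sudo cat /proc/iomem | list_dax_ranges)
#   NUM_DEVICE=${#ADDRESS[@]}
```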

Command lines

$ cd /path/to/SMDK/src/test/dax
$ vi ./run_dax_test.sh

# Change the number of devices and their addresses to match your system.
NUM_DEVICE=3
ADDRESS=("1080000000-307fffffff" "3080000000-507fffffff" "5080000000-707fffffff")

# If you are not sure about it, leave NUM_DEVICE as 0 to detect automatically
NUM_DEVICE=0
ADDRESS=()

# Download fio from https://github.com/axboe/fio.git
# Change FIO_PATH from /path/to to your system's path
FIO_PATH=/path/to/fio/

# After modifying the script
$ ./run_dax_test.sh

Result

IOMEM
480000000-f43fffffff : CXL Window 0
  480000000-247fffffff : region0
    480000000-247fffffff : Soft Reserved
      480000000-247fffffff : dax0.0
        480000000-247fffffff : System RAM (kmem)

[[Buddy Info]]
Node 0, zone      DMA      0      0      0      0      0      0      0      0      1      2      2
Node 0, zone    DMA32      5      3      6      6      4      7      5      5      7      4    437
Node 0, zone   Normal     38    250    183    105     71     28     15     31     39     12   2218
Node 1, zone  Movable      0      0      0      0      0      0      0      0      0      0  32768

-----------------------------------------------------------------
[
  {
    "chardev":"dax0.0",
    "size":137438953472,
    "target_node":1,
    "align":2097152,
    "mode":"devdax"
  }
]
reconfigured 1 device
IOMEM
480000000-f43fffffff : CXL Window 0
  480000000-247fffffff : region0
    480000000-247fffffff : Soft Reserved
      480000000-247fffffff : dax0.0

[[Buddy Info]]
Node 0, zone      DMA      0      0      0      0      0      0      0      0      1      2      2
Node 0, zone    DMA32      5      3      6      6      4      7      5      5      7      4    437
Node 0, zone   Normal     24     20     13     19     17     12      1      2     20     10   2608

-----------------------------------------------------------------
FIO TEST
dev-dax-write: (g=0): rw=randwrite, bs=(R) 2048KiB-2048KiB, (W) 2048KiB-2048KiB, (T) 2048KiB-2048KiB, ioengine=dev-dax, iodepth=1
...
dev-dax-read: (g=1): rw=randread, bs=(R) 2048KiB-2048KiB, (W) 2048KiB-2048KiB, (T) 2048KiB-2048KiB, ioengine=dev-dax, iodepth=1

[
  {
    "chardev":"dax0.0",
    "size":137438953472,
    "target_node":1,
    "align":2097152,
    "mode":"system-ram",
    "online_memblocks":1024,
    "total_memblocks":1024,
    "movable":true
  }
]
reconfigured 1 device
IOMEM
480000000-f43fffffff : CXL Window 0
  480000000-247fffffff : region0
    480000000-247fffffff : Soft Reserved
      480000000-247fffffff : dax0.0
        480000000-247fffffff : System RAM (kmem)

[[Buddy Info]]
Node 0, zone      DMA      0      0      0      0      0      0      0      0      1      2      2
Node 0, zone    DMA32      5      3      6      6      4      7      5      5      7      4    437
Node 0, zone   Normal    408   1223   1358    792    287    482    278     41     23     10   2211
Node 1, zone  Movable      0      0      0      0      0      0      0      0      0      0  32768
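Each column in /proc/buddyinfo is the number of free blocks of order 0, 1, ..., 10, and an order-n block spans 2^n pages. Assuming 4 KiB pages (the x86_64 default), Node 1's 32768 free order-10 blocks come to 32768 × 1024 × 4096 = 137438953472 bytes, which matches the 128 GiB "size" reported for dax0.0. A hypothetical helper (`buddy_free_bytes` is an illustrative name) that does this sum per node and zone:

```shell
# Hypothetical helper: sum free memory per node/zone from /proc/buddyinfo
# text on stdin. Column k (k = 0..10) counts free blocks of order k, and an
# order-k block spans 2^k pages. PAGE_SIZE defaults to 4096 bytes.
buddy_free_bytes() {
  local page_size=${PAGE_SIZE:-4096}
  awk -v ps="$page_size" '
    $1 == "Node" {
      node = $2
      sub(/,$/, "", node)              # "1," -> "1"
      pages = 0
      blk = 1                          # pages per block at order 0
      for (i = 5; i <= NF; i++) {      # fields 5..NF hold the per-order counts
        pages += $i * blk
        blk *= 2
      }
      printf "node%s %s %.0f\n", node, $4, pages * ps
    }'
}
```

Usage: `buddy_free_bytes < /proc/buddyinfo`, with `PAGE_SIZE=$(getconf PAGESIZE)` set beforehand if your system does not use 4 KiB pages.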

4.2.6 QEMU

You can use QEMU to emulate a CXL system (4.2.6.1) and SMDK functionality (4.2.6.2).

4.2.6.1 Launching QEMU

First, build QEMU.

$ cd /path/to/SMDK/lib/
$ ./build_lib.sh qemu

After downloading an Ubuntu ISO image file from here, set UBUNTU_ISO in /path/to/SMDK/lib/qemu/create_gui_image.sh to the ISO file path, then run the script.

$ cd /path/to/SMDK/lib/qemu/
$ vi create_gui_image.sh	# Update UBUNTU_ISO file path.
$ ./create_gui_image.sh

When the Ubuntu installation has finished, run the following command to boot into Ubuntu.

$ cd /path/to/SMDK/lib/qemu/
$ ./setup_gui_ssh.sh

After booting, update the APT package index and install the required packages if necessary, e.g., $ sudo apt update and $ sudo apt install <packages e.g., openssh-server>.

Clone the SMDK repository from GitHub, then build and install the SMDK kernel. You can then run the SMDK kernel on the emulated CXL system with the following script.

$ cd /path/to/SMDK/lib/qemu/
$ ./run_cxl_emu_gui.sh			# default setting: 6 cores, 8GB RAM. (KVM hardware acceleration is not enabled)

Note: run_cxl_emu_gui.sh disables KVM (Kernel-based Virtual Machine) to support load/store access to the emulated CXL memory. If you don't need this, you can add the '-enable-kvm' option to speed up the QEMU emulation.

You can connect to the QEMU virtual machine through the QEMU monitor (port: 45454) and sshd (port: 2242) with the scripts below.

# Connect to QEMU Monitor
$ cd /path/to/SMDK/lib/qemu/
$ ./connect_monitor.sh

# Connect to sshd
$ cd /path/to/SMDK/lib/qemu/
$ ./connect_ssh.sh

4.2.6.2 Using CXL memory and SMDK on QEMU

QEMU has supported CXL Type 3 volatile-memory emulation since v8.1.0. SMDK has supported QEMU emulation since v1.5.1, which allows you to use the userspace plugins (library, CLI, BM, test cases) and the OS interfaces (swap and cache).

Step 1: Configure & Build the kernel

The procedure for building the SMDK kernel for CXL emulation is identical to the one described here. However, region creation will fail because QEMU does not support memregion invalidation. To avoid this failure, enable 'CXL_REGION_INVALIDATION_TEST' (CONFIG_CXL_REGION_INVALIDATION_TEST=y) during kernel configuration.
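One way to set the option non-interactively is the kernel tree's own scripts/config helper followed by `make olddefconfig`. Because olddefconfig may drop an option whose dependencies are not met, it is worth confirming the result afterwards. The check below is a hypothetical sketch; `check_kconfig_enabled` is an illustrative name.

```shell
# From the top of the SMDK kernel source tree:
#   ./scripts/config --file .config --enable CXL_REGION_INVALIDATION_TEST
#   make olddefconfig
#
# Hypothetical helper: confirm that CONFIG_<sym>=y is present in a .config.
check_kconfig_enabled() {
  local cfg=$1 sym=$2
  if grep -qx "CONFIG_${sym}=y" "$cfg"; then
    echo "CONFIG_${sym}=y"
  else
    echo "CONFIG_${sym} is not enabled in ${cfg}" >&2
    return 1
  fi
}
```

Usage: `check_kconfig_enabled .config CXL_REGION_INVALIDATION_TEST`.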

Once the SMDK kernel has booted on QEMU, you can use the test script below to check whether CXL emulation is working properly. If the test succeeds, CXL memory is registered as System RAM and available for use.

$ cd /path/to/SMDK/src/test/qemu/
$ sudo ./run_qemu_test.sh

Alternatively, you can perform the following steps manually.

Step 2: Create CXL region

$ cat /proc/buddyinfo
Node 0, zone      DMA      0      0      0      0      0      0      0      0      0      1      3
Node 0, zone    DMA32      7      5      5      7      6      5      8      4      5      3    475
Node 0, zone   Normal    218    135     85     37     11     32      3      2      1      2   1152

$ sudo cat /proc/iomem
690000000-78fffffff : CXL Window 0

$ cd /path/to/SMDK/lib/
$ ./build_lib.sh cxl_cli
$ sudo ./cxl_cli/build/cxl/cxl create-region -d decoder0.0 -s 1073741824 -t ram
{
  "region":"region0",
  "resource":"0x690000000",
  "size":"1024.00 MiB (1073.74 MB)",
  "type":"ram",
  "interleave_ways":1,
  "interleave_granularity":256,
  "decode_state":"commit",
  "mappings":[
    {
      "position":0,
      "memdev":"mem0",
      "decoder":"decoder2.0"
    }
  ]
}
cxl region: cmd_create_region: created 1 region

$ cat /proc/buddyinfo
Node 0, zone      DMA      0      0      0      0      0      0      0      0      0      1      3
Node 0, zone    DMA32      2      3      2      3      4      1      3      1      2      2    488
Node 0, zone   Normal    370    869    388    112     61    171     45     14      2      0   1095
Node 1, zone  Movable      0      0      0      0      0      0      0      0      0      0    256

$ sudo cat /proc/iomem
690000000-78fffffff : CXL Window 0
  690000000-6cfffffff : region0
    690000000-6cfffffff : dax0.0
      690000000-6cfffffff : System RAM (kmem)
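Ranges in /proc/iomem are inclusive, so a region's last address is its start plus its size minus one. region0 starts at 0x690000000 (the "resource" field) and was created with 1073741824 bytes (1 GiB), so it ends at 0x6cfffffff, matching the region0 and dax0.0 entries above. Likewise, the 256 free order-10 blocks in Node 1 are 256 × 1024 × 4 KiB = 1 GiB, assuming 4 KiB pages. A hypothetical helper (`region_range` is an illustrative name) for this arithmetic:

```shell
# Hypothetical helper: print the inclusive /proc/iomem-style range for a
# region given its start address (hex or decimal) and size in bytes.
# Relies on bash 64-bit integer arithmetic.
region_range() {
  local start=$1 size=$2
  printf '%x-%x\n' "$((start))" "$((start + size - 1))"
}
```

For example, `region_range 0x690000000 1073741824` prints `690000000-6cfffffff`.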

Once step 2 completes successfully, the SMDK plugins and OS interfaces are ready to use. For how to use them, refer to https://github.com/OpenMPDK/SMDK/wiki/5.-Plugin and https://github.com/OpenMPDK/SMDK/wiki/4.-Kernel.

Limitations

  1. The MLC BW tool and PMU-related software do not work because of their CPU dependency.

Step 3: Use CXL memory

$ cd /path/to/SMDK/
$ cd lib/ && ./build_lib.sh numactl && cd -
$ cd src/test/mmap && make && cd -
$ ./lib/numactl-2.0.16/numactl -m 1 ./src/test/mmap/test_mmap_cxl
addr[0x7f0125c07010], one='1' zero='0'
addr[0x7f0125206010], one='1' zero='0'
addr[0x7f0124805010], one='1' zero='0'
addr[0x7f0123e04010], one='1' zero='0'
addr[0x7f0123403010], one='1' zero='0'
...

$ cat /proc/buddyinfo
Node 0, zone      DMA      0      0      0      1      0      0      0      0      2      2      2
Node 0, zone    DMA32      9      8     11      8      5     11     11      9      7      6    472
Node 0, zone   Normal     25    108    132     39    135    193    225    100     39      3   1166
Node 1, zone  Movable      0      0      1      1      0      1      1      0      1      1    240
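Besides /proc/buddyinfo, the per-node meminfo file in sysfs reports how much of node 1 (the CXL memory here) is free. Sampling it before and while test_mmap_cxl runs under `numactl -m 1` is expected to show the free amount shrink if the allocations land on node 1. A hypothetical helper (`node_memfree_kb` is an illustrative name):

```shell
# Hypothetical helper: print the MemFree value (in kB) from the text of
# /sys/devices/system/node/nodeN/meminfo read on stdin.
# Lines have the form: "Node <N> MemFree:   <value> kB"
node_memfree_kb() {
  awk '$3 == "MemFree:" { print $4 }'
}

# Example (node 1 is the CXL node in the output above):
#   node_memfree_kb < /sys/devices/system/node/node1/meminfo
```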

Step 4: Destroy CXL region

$ cd /path/to/SMDK/lib
$ sudo ./cxl_cli/build/daxctl/daxctl reconfigure-device --mode=devdax dax0.0 -f
[
  {
    "chardev":"dax0.0",
    "size":1073741824,
    "target_node":1,
    "align":2097152,
    "mode":"devdax"
  }
]
reconfigured 1 device

$ sudo ./cxl_cli/build/cxl/cxl destroy-region region0 -f
cxl region: cmd_destroy_region: destroyed 1 region

$ sudo cat /proc/iomem
690000000-6cfffffff : CXL Window 0

$ cat /proc/buddyinfo
Node 0, zone      DMA      0      0      0      1      0      0      0      0      2      2      2
Node 0, zone    DMA32      9      8     10     10      7     10     12      8      7     11    469
Node 0, zone   Normal    231    344    222    134    111     63     12      2      3      2   1119

Note: If you do not switch dax0.0 to devdax mode before destroying the region, the region is deleted but its memory is not removed, and the kernel logs messages such as the following:

removing memory fails, because memory [0x0000000690000000-0x0000000697ffffff] is onlined
kmem dax0.0: mapping0: 0x690000000-0x6cfffffff cannot be hotremoved until the next reboot
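To avoid this, the teardown can be wrapped so that the region is destroyed only after the device has been switched to devdax mode. The function below is a hypothetical sketch (`teardown_cxl_region`, `DAXCTL` and `CXL` are illustrative names). It runs the same two commands shown in Step 4 and assumes the current directory is /path/to/SMDK/lib.

```shell
# Hypothetical helper: tear a region down in the safe order from Step 4.
# DAXCTL and CXL default to the cxl_cli build paths relative to
# /path/to/SMDK/lib; override them if your paths differ.
teardown_cxl_region() {
  local dax=$1 region=$2
  local daxctl=${DAXCTL:-./cxl_cli/build/daxctl/daxctl}
  local cxl=${CXL:-./cxl_cli/build/cxl/cxl}
  # 1) Take the memory out of System RAM by switching to devdax mode.
  $daxctl reconfigure-device --mode=devdax "$dax" -f || {
    echo "reconfigure of $dax failed; not destroying $region" >&2
    return 1
  }
  # 2) Only then destroy the region.
  $cxl destroy-region "$region" -f
}
```

Usage (as root, from /path/to/SMDK/lib): `teardown_cxl_region dax0.0 region0`. Stopping on a failed reconfigure is the point of the design: if the memory cannot be offlined, destroying the region anyway would leave it in the state described in the note above.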

Step 5: Recreate CXL region

$ cd /path/to/SMDK/lib/
$ sudo ./cxl_cli/build/cxl/cxl create-region -d decoder0.0 -s 1073741824 -t ram
$ sudo ./cxl_cli/build/daxctl/daxctl reconfigure-device --mode=system-ram dax0.0 -f
