
2. SMDK Architecture

KyungsanKim edited this page Mar 23, 2022 · 30 revisions

2.1 High Level Architecture


The figure below shows the high-level architecture of SMDK v1.0.


image


• The top layer of SMDK is the compatible and optimization path for application integration. This layer consists of libraries, sets of pre-built and reusable code, application programming interfaces (APIs), and the connections to access them. With SMDK, system developers can not only incorporate CXL memory into existing systems without modifying existing applications, but also optimize and rewrite application source code to reach a higher level of optimization.

• The middle layer of SMDK is a memory tiering engine with scalable memory management. It supports diverse memory use cases with high performance.

• The bottom layer of SMDK is the memory partition provided by the SMDK kernel. The kernel changes are geared toward serving CXL memory with more functionality and a common interface.

Please note that SMDK is being developed to support SDM (Software-Defined Memory) across the full software stack. The SMDK allocator part provides the composite application interface and memory tiering management, while the SMDK kernel part provides the primitive memory partition.




2.2 User Interface

The memory map of a Linux process is composed of code, data, BSS, stack, and heap segments, among others, and each segment is represented as a VMA (Virtual Memory Area).

Among these segments, the heap segment is commonly used to handle a process's large memory consumption by dynamically adjusting its size. Linux provides several system calls, such as mmap() and brk(), that let a process expand the heap segment while it is running.

On the SMDK kernel, when a process attempts to expand its heap segment by calling mmap(), it should specify which memory partition to use: the NORMAL or the EXMEM zone. Given the context and size, mmap() internally allocates the proper number of virtual pages from the free lists of the buddy allocator of the NORMAL or EXMEM zone, attaches the virtual pages to the heap segment, and then adjusts the segment size. Finally, the call returns the result to the caller.

To be specific, SMDK extends the mmap() system call with a MAP_EXMEM flag to selectively allocate and deallocate free memory resources from the NORMAL or EXMEM zone. Please refer to the code snippet below.

```c
/* mmap from CXL memory (EXMEM zone) */
unsigned int flag = MAP_EXMEM | MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE | MAP_POPULATE;

/* mmap from normal memory (NORMAL zone) */
unsigned int flag = MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE | MAP_POPULATE;

char *addr = mmap(NULL, size, PROT_READ | PROT_WRITE, flag, -1, 0);
```

As its user interface, SMDK v1.0 provides the SMDK allocator, a heap allocator that expands and tiers the heap segment from the NORMAL and EXMEM zones selectively. The SMDK allocator also scales in performance by assigning a pair of lock-free NORMAL/EXMEM arenas per CPU, aggregating bandwidth and throttling the contention that arises when multiple CPUs concurrently access the memory array. The SMDK allocator is an extension of the jemalloc allocator, and it can be used via the compatible path, the optimization path, or both.

The figure below describes how the SMDK allocator works.

image

In addition, SMDK v1.0 provides a numactl extension to tier hybrid memories across the entire process context.


Compatible Path

• Background - When a new memory device emerges, it has been common to provide a novel API and programming model to enable it. Given that API and programming model, application developers and open-source community members are forced to modify their applications' source code to adopt and use the device properly. However, this harms the reliability of running services and increases management cost. The compatible API of SMDK is geared toward resolving this pain point, reflecting the voice of the customer (VoC) from industry.

• SMDK enables an application to tier CXL/DDR memory without software modification; neither the API nor the programming model changes. SMDK also provides transparent memory management by inheriting and extending the Linux process/VMM design.

• Technically, the compatible path is not an API, but a way for application developers and service operators to use CXL memory without changing the application.

• The compatible path provides heap segment tiering using standard heap allocation APIs and system calls such as malloc(), realloc(), and mmap(), overriding their libc implementations. In particular, the path includes an intelligent allocation engine that enables DDR/CXL memory tiering use cases such as memory priority, size, and bandwidth.

• The compatible path provides not only heap segment tiering, but also user process tiering and resource isolation.

image


Optimization Path

• The optimization path is an API for reaching a higher level of optimization by rewriting the application.

• The optimization API consists of an allocation API and a metadata API. The allocation API is for memory allocation and deallocation, while the metadata API is for acquiring online memory status.

image


Language Binding

The SMDK allocator provides language binding interfaces for applications. In SMDK v1.0, the compatible and optimization paths support different language binding coverage.

• Compatible Path - C, C++, Python, Java, Go
  - The Python, Java, and Go frameworks each design and implement their own internal memory management schemes, built on primitive heap allocation mechanisms such as malloc() and mmap().
  - On top of that, the SMDK compatible library supports language bindings for these languages.

• Optimization Path - C, C++
  - A proprietary API is provided by the SMDK optimization library for implementing applications with sophisticated CXL/DDR memory use.

The figure below depicts the language binding aspect of SMDK.

image




2.3 Kernel


Memory Zone

• Background - Historically, the Linux VMM has a hierarchy of logical memory views, from node to zone to buddy and finally to page granularity. The Linux VMM has also expanded the zone unit for better use of physical memory according to the HW and SW needs of the time, adding the DMA/DMA32, MOVABLE, DEVICE, NORMAL, and HIGHMEM zones.

• The DMA/DMA32 zones include pages that are addressable by I/O devices that support only a limited address space.

• The MOVABLE zone is geared toward less fragmentation and memory hot-plugging.

• The HIGHMEM zone was used to access memory located beyond the NORMAL zone due to addressing limitations.

• Most DRAM pages are located in the NORMAL zone and are used by system and application contexts.

image

SMDK introduces a new memory zone, ZONE_EXMEM, to manage DDR and extended memory, such as CXL memory, independently. It is designed for two reasons: one is to serve new memory hardware that has a different latency range due to a different wire protocol; the other is to serve memory use cases such as composability and pluggability. ZONE_EXMEM is currently an SMDK kernel configuration option, and we are open to discussing this design approach.

| Linux Zone | Trigger | Description | Option |
|---|---|---|---|
| ZONE_NORMAL | Initial design | Normal addressable memory. | mandatory |
| ZONE_DMA | HW (I/O) | Addressable memory for some peripheral devices that support only limited address resolution. | configurable |
| ZONE_HIGHMEM | HW (CPU) | A memory area that is only addressable by the kernel through mapping portions of it into its own address space. This is, for example, used by i386 to allow the kernel to address memory beyond 900 MB. | configurable |
| ZONE_DEVICE | HW (I/O) | Offers paging and mmap for device-driver-identified physical address ranges, e.g. pmem, hmm, p2pdma. | configurable |
| ZONE_MOVABLE | SW | Similar to ZONE_NORMAL, except that it contains movable pages. Main use cases for ZONE_MOVABLE are to make memory offlining/unplug more likely to succeed and to locally limit unmovable allocations. | configurable |
| ZONE_EXMEM | HW (CXL) | Extended memory for latency isolation, composability, and pluggability. | configurable |

Memory Partition

Currently, the industry treats a single CXL memory channel as a single logical memory-only node in a Linux system. Hence, "CXL memory : node = 1 : 1", which is the "Single Node" status in the picture below.

However, we think this has a drawback: service operators and high-level application developers have to be aware of the existence of CXL memory and manually control it themselves using third-party tools such as numactl and libnuma. The more CXL memories a system has, the more management effort is needed.

In addition, we found a malfunction case in which node traversal did not work properly when multiple applications performed their own node traversal simultaneously. We were able to avoid this issue with the SMDK zone partition type, because CXL memory is seen at zone granularity in that configuration.

For these reasons, SMDK further suggests an abstraction layer that hides CXL memory and minimizes these efforts. The zone partition of SMDK places all CXL memories in the system in a single zone, the EXMEM zone, separate from the NORMAL zone. The node partition of SMDK places CXL memories in a separate memory node. Both partitions seamlessly support CXL memory interleaving in software inside the kernel for bandwidth aggregation across the connected CXL devices.

image

The figure below further depicts the SMDK zone partition.

A single CXL memory is managed as a sub-zone possessing its own buddy list, and multiple CXL memories are grouped into a so-called super-zone. When a memory request arrives from a thread context, the CXL super-zone assigns the proper number of pages, in order, from the managed sub-zone array. In a multi-threaded process, this yields aggregated memory bandwidth from the connected CXL memory array.

Without the super-zone design, memory requests from a process would not be balanced but would be concentrated on a single CXL device, because a zone's buddy list is composed in the sequential order of the connected CXL devices.

image


Device Driver

The CXL device driver part of the SMDK kernel performs two additional functions.

At system boot, the driver creates and maintains a "CXL:Partition map" based on the SRAT and e820 information provided by the BIOS. The map maintains the relations among CXL memory, base address, size, and node ID.

Sysfs interfaces, cxl_mem_mode and cxl_group_policy, are provided so users can selectively determine the memory modes and policies of CXL memory partitions.

image