Skip to content

Conversation

assembler-0
Copy link
Owner

@assembler-0 assembler-0 commented Oct 2, 2025

Summary by CodeRabbit

  • New Features

    • Added macOS and Windows x64 toolchains and high-performance SIMD memory routines for faster memcpy/memset/memcmp paths.
  • Documentation

    • Quickstart now shows using a platform toolchain file; removed outdated clang-x64 docs.
  • Chores

    • Build now emits toolchain/generator/state messages; CI uses the toolchain file, parallel Ninja and updated artifact retention.
  • Bug Fixes

    • Adjusted run targets: reduced test-run memory and updated run dependencies.
  • Refactor

    • Reordered build include logic and simplified Linux toolchain messaging.

Copy link

coderabbitai bot commented Oct 2, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

CMake now emits configure-time toolchain/generator/build-state messages and reorders includes; run target dependencies changed and runmin memory reduced. New macOS/Windows x64 toolchains added; Linux toolchain linker flags simplified. CPU features gain avx512f. New compile-time constants added. mm/MemOps implements SIMD-accelerated memcpy/memset/memcmp with public wrappers and restrict-qualified signatures.

Changes

Cohort / File(s) Summary
CMake top-level & messaging
CMakeLists.txt
Emits configure-time messages (toolchain, generator, build type, source/binary dirs, current dir, host OS/processor). Reorders includes to include cmake/run.cmake earlier and removes prior clang-x64 include.
Run targets & runner flags
cmake/run.cmake
add_custom_target(run) DEPENDS expanded from isoiso, img, extra-img. add_custom_target(runmin) memory flag changed -m 4G-m 1G and its DEPENDS reduced to iso.
Toolchain files (linux)
cmake/toolchain/linux-x64.cmake
Removed initial linker-flag initializations for -fuse-ld=lld; moved CMAKE_ASM_NASM_COMPILER to file end and generalized header comment.
New toolchains: macOS & Windows
cmake/toolchain/macos-x64.cmake, cmake/toolchain/windows-x64.cmake
Added macOS x64 and Windows x64 toolchain files (Generic system, x86_64, clang/clang++, nasm).
Docs / Quickstart / CI
README.md, docs/CMAKE.md, .github/workflows/main.yaml
README quickstart CMake invocation now passes -DCMAKE_TOOLCHAIN_FILE=../cmake/toolchain/<linux/windows/macos>-x64.cmake. Removed clang-x64.cmake subsection from docs/CMAKE.md. CI workflow CMake invocation added -DCMAKE_BUILD_TYPE=Release and -DCMAKE_TOOLCHAIN_FILE=../cmake/toolchain/linux-x64.cmake; ninja invocation uses -j $(nproc) and runmin is invoked from cmake-build directory; artifact retention reduced.
CMake variables / compile defs
cmake/variable.cmake
Added compile-time definitions: PREFETCH_DISTANCE=256 and NT_STORE_THRESHOLD=4*1024*1024 via add_compile_definitions.
CPU feature additions & helpers
arch/x86_64/features/x64.h, arch/x86_64/features/x64.c
Added bool avx512f to CpuFeatures. When OSXSAVE unsupported, code disables avx512f (and updates warning text). Added static inline memory-protection helper functions in header.
Memory operations (SIMD & public wrappers)
mm/MemOps.c, mm/MemOps.h
Added public wrappers memcpy and memcmp; changed memset and Fast* signatures to use restrict. Implemented multi-path FastMemset/FastMemcpy/FastMemcmp with AVX-512f/AVX2/SSE2 and scalar fallbacks, prefetching, non-temporal store paths, alignment and overlap handling, and FastZeroPage optimizations; header updated to expose restricted signatures and host wrappers.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Dev as Developer
  participant CMake as CMake (configure)
  participant TC as Toolchain File
  participant RunCM as cmake/run.cmake

  Dev->>CMake: cmake -S . -B build -DCMAKE_TOOLCHAIN_FILE=../cmake/toolchain/linux-x64.cmake -DCMAKE_BUILD_TYPE=Release -G Ninja
  CMake->>TC: load toolchain (system, processor, compilers)
  CMake->>RunCM: include run.cmake early
  CMake-->>Dev: print toolchain/generator/build-type/source/bin/current-dir/host info
Loading
sequenceDiagram
  autonumber
  participant Caller as mem_* caller
  participant Detect as CPU feature detection
  participant Fast as FastMemcpy/FastMemset/FastMemcmp
  participant Mem as Memory

  Caller->>Fast: call memcpy/memset/memcmp(args)
  Fast->>Detect: query features & size
  alt AVX-512f & large & NT threshold
    Fast->>Mem: prefetch → aligned NT stores → sfence
  else AVX2
    Fast->>Mem: AVX2 loads/stores → vzeroupper
  else SSE2
    Fast->>Mem: SSE2 loads/stores → vzeroupper
  else scalar
    Fast->>Mem: word/byte operations
  end
  Fast-->>Caller: return result
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • Development #139 — Nearly identical edits across CMake files, toolchains, run targets, arch/x64 changes, and mm/MemOps; likely a direct duplicate or follow-up.
  • Development #135 — Modifies the same x86_64 CPU feature code (adds/adjusts AVX/AVX512 handling); strongly related to the avx512f addition.
  • Development #82 — Overlaps on mm/MemOps.c FastMemcpy implementation changes (SIMD/path adjustments); relevant to the memcpy/memcmp rewrite.

Poem

In burrows where the build logs hum,
I nibble toolchains, make memops run.
AVX wakes, copies zip and zoom,
Run trims RAM and CI clears the room.
Carrots for commits — a speedy sprint! 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Title Check ❓ Inconclusive The title “Development” is too vague and does not convey any meaningful information about the substantive changes in this pull request, making it unclear to reviewers what the main focus of the updates is. Please revise the title to clearly summarize the primary change, for example “Add CMake toolchain support and high-performance memcpy/memset implementations” or another concise phrase that highlights the main update.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 06ed0ee and 8dd8fd5.

📒 Files selected for processing (2)
  • .github/workflows/main.yaml (2 hunks)
  • cmake/run.cmake (1 hunks)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
CMakeLists.txt (1)

21-29: Use message(STATUS ...) for consistent CMake output formatting.

The bare message() calls work but don't follow CMake conventions. Using message(STATUS ...) will:

  • Prefix output with -- for better readability
  • Integrate better with CMake's logging levels
  • Align with the existing message(STATUS ...) used in lines 12, 16

Apply this diff:

-message("CMake Toolchain: ${CMAKE_TOOLCHAIN_FILE}")
-message("CMake Generator: ${CMAKE_GENERATOR}")
-message("CMake Build Type: ${CMAKE_BUILD_TYPE}")
-message("CMake Source Directory: ${CMAKE_SOURCE_DIR}")
-message("CMake Binary Directory: ${CMAKE_BINARY_DIR}")
-message("CMake Current Source Directory: ${CMAKE_CURRENT_SOURCE_DIR}")
-message("CMake Current Binary Directory: ${CMAKE_CURRENT_BINARY_DIR}")
-message("CMake Host System Name: ${CMAKE_HOST_SYSTEM_NAME}")
-message("CMake Host System Processor: ${CMAKE_HOST_SYSTEM_PROCESSOR}")
+message(STATUS "CMake Toolchain: ${CMAKE_TOOLCHAIN_FILE}")
+message(STATUS "CMake Generator: ${CMAKE_GENERATOR}")
+message(STATUS "CMake Build Type: ${CMAKE_BUILD_TYPE}")
+message(STATUS "CMake Source Directory: ${CMAKE_SOURCE_DIR}")
+message(STATUS "CMake Binary Directory: ${CMAKE_BINARY_DIR}")
+message(STATUS "CMake Current Source Directory: ${CMAKE_CURRENT_SOURCE_DIR}")
+message(STATUS "CMake Current Binary Directory: ${CMAKE_CURRENT_BINARY_DIR}")
+message(STATUS "CMake Host System Name: ${CMAKE_HOST_SYSTEM_NAME}")
+message(STATUS "CMake Host System Processor: ${CMAKE_HOST_SYSTEM_PROCESSOR}")
README.md (1)

49-49: Consider clarifying the toolchain selection with examples.

The placeholder <linux/windows/macos>-x64.cmake might be unclear to new users. Consider adding a brief example or note.

Apply this diff to add clarity:

-cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_TOOLCHAIN_FILE=cmake/toolchain/<linux/windows/macos>-x64.cmake -G Ninja
+# Choose the appropriate toolchain file for your host system:
+# Linux:   -DCMAKE_TOOLCHAIN_FILE=cmake/toolchain/linux-x64.cmake
+# macOS:   -DCMAKE_TOOLCHAIN_FILE=cmake/toolchain/macos-x64.cmake
+# Windows: -DCMAKE_TOOLCHAIN_FILE=cmake/toolchain/windows-x64.cmake
+cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_TOOLCHAIN_FILE=cmake/toolchain/linux-x64.cmake -G Ninja
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6416eb4 and 4405206.

📒 Files selected for processing (7)
  • CMakeLists.txt (2 hunks)
  • README.md (1 hunks)
  • cmake/run.cmake (1 hunks)
  • cmake/toolchain/linux-x64.cmake (1 hunks)
  • cmake/toolchain/macos-x64.cmake (1 hunks)
  • cmake/toolchain/windows-x64.cmake (1 hunks)
  • docs/CMAKE.md (0 hunks)
💤 Files with no reviewable changes (1)
  • docs/CMAKE.md
🔇 Additional comments (4)
cmake/toolchain/macos-x64.cmake (1)

1-9: LGTM! Toolchain configuration is consistent with cross-platform approach.

The macOS toolchain correctly sets up a bare-metal/kernel build environment using Generic system, x86_64 processor, and standard Clang toolchain with NASM assembler. This aligns well with the linux-x64.cmake and windows-x64.cmake approach.

cmake/run.cmake (1)

29-29: LGTM! Explicit dependencies improve build correctness.

Adding img and extra-img to the run target's dependencies ensures disk images referenced in the QEMU command (VoidFrameDisk.img, SataDisk.img, VirtioDisk.img) are created before launching. This makes the dependency graph explicit and prevents potential build ordering issues.

cmake/toolchain/linux-x64.cmake (1)

1-9: LGTM! Simplified toolchain with target-level linker configuration.

The removal of CMAKE_EXE_LINKER_FLAGS_INIT and CMAKE_SHARED_LINKER_FLAGS_INIT is appropriate since linker configuration (-fuse-ld=lld) is now handled at the target level in CMakeLists.txt (line 245-251). This provides better flexibility and aligns with modern CMake practices.

The updated header comment and structure now match the macos-x64.cmake and windows-x64.cmake patterns.

CMakeLists.txt (1)

43-43: Verify run.cmake variable definitions before include.
run.cmake references QEMU_IMG, MKFS_EXT2, MKFS_FAT, LLVM_OBJDUMP but no set or find_program calls were found defining them; ensure these variables are initialized prior to include(run) or adjust the include order.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4405206 and ab0ea1a.

📒 Files selected for processing (8)
  • .github/workflows/main.yaml (1 hunks)
  • README.md (1 hunks)
  • arch/x86_64/features/x64.c (1 hunks)
  • arch/x86_64/features/x64.h (1 hunks)
  • cmake/run.cmake (2 hunks)
  • cmake/toolchain/windows-x64.cmake (1 hunks)
  • cmake/variable.cmake (1 hunks)
  • mm/MemOps.c (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • cmake/toolchain/windows-x64.cmake
  • README.md
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-01T01:10:22.123Z
Learnt from: assembler-0
PR: assembler-0/VoidFrame#103
File: README.md:9-9
Timestamp: 2025-09-01T01:10:22.123Z
Learning: The current VoidFrame Registers struct in arch/x86_64/cpu/Cpu.h only saves basic GPRs, segment registers, and exception context (interrupt_number, error_code, rip, cs, rflags, rsp, ss). It's missing critical registers for deep fault tracing: SIMD registers (XMM/YMM/ZMM), control registers (CR0/CR2/CR3/CR4), debug registers (DR0-DR7), and extended FPU state. The struct has a "DO NOT TOUCH" comment indicating it must match interrupt ASM stack layout.

Applied to files:

  • arch/x86_64/features/x64.h
🧬 Code graph analysis (2)
arch/x86_64/features/x64.c (1)
kernel/etc/Console.c (1)
  • PrintKernelWarning (220-225)
mm/MemOps.c (2)
arch/x86_64/features/x64.c (1)
  • GetCpuFeatures (102-104)
include/Io.h (2)
  • save_irq_flags (38-42)
  • restore_irq_flags (44-46)
🔇 Additional comments (3)
cmake/run.cmake (3)

29-29: LGTM! Correct dependency declaration.

Adding img and extra-img dependencies ensures that all disk images used by the run target (VoidFrameDisk.img, VirtioDisk.img, SataDisk.img) are created before QEMU launches. This prevents runtime errors.


39-39: LGTM! Appropriate memory reduction for minimal target.

Reducing memory from 4G to 1G aligns with the minimal testing purpose of the runmin target. 1G should be sufficient for basic functionality verification.


41-41: LGTM! Correct dependency optimization.

Removing the img and extra-img dependencies is appropriate since the runmin target doesn't use any disk images. This optimization eliminates unnecessary build steps for the minimal test configuration.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (2)
mm/MemOps.c (2)

238-248: FastMemcpy small-copy fallback dereferences unaligned pointers.

Casting d/s to uint64_t* or uint32_t* without verifying alignment is UB and can crash under strict-alignment builds. Replace these with unaligned-safe copies—load into a temporary via __builtin_memcpy and store back, or drop to the byte loop.


49-63: Fix unaligned wide stores in the small memset fast path.

*(uint64_t*)d / *(uint32_t*)d assume natural alignment. With arbitrary buffers those casts are undefined behavior and can fault on platforms with strict alignment. Use an unaligned-safe copy (e.g., __builtin_memcpy) or byte writes instead of typed stores.

-        if (size >= 8) {
-            uint64_t val64 = 0x0101010101010101ULL * val;
-            *(uint64_t*)d = val64;
-            *(uint64_t*)(d + size - 8) = val64;
-            return dest;
-        }
+        if (size >= 8) {
+            uint64_t val64 = 0x0101010101010101ULL * val;
+            __builtin_memcpy(d, &val64, 8);
+            __builtin_memcpy(d + size - 8, &val64, 8);
+            return dest;
+        }
@@
-        if (size >= 4) {
-            uint32_t val32 = 0x01010101U * val;
-            *(uint32_t*)d = val32;
-            *(uint32_t*)(d + size - 4) = val32;
-            return dest;
-        }
+        if (size >= 4) {
+            uint32_t val32 = 0x01010101U * val;
+            __builtin_memcpy(d, &val32, 4);
+            __builtin_memcpy(d + size - 4, &val32, 4);
+            return dest;
+        }
🧹 Nitpick comments (1)
arch/x86_64/features/x64.h (1)

65-68: Document split full‐fence pattern and align attributes with cli()/sti()

The __full_mem_prot_init/__full_mem_prot_end pair implements a split open/close full‐fence + serialize, unlike cli()/sti() which each include the full fence sandwich and hot, flatten attributes. Add a brief comment explaining the intended bracketing usage and consider adding hot, flatten to these functions for consistency.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ab0ea1a and 06ed0ee.

📒 Files selected for processing (6)
  • .github/workflows/main.yaml (1 hunks)
  • CMakeLists.txt (2 hunks)
  • arch/x86_64/features/x64.h (2 hunks)
  • cmake/variable.cmake (1 hunks)
  • mm/MemOps.c (4 hunks)
  • mm/MemOps.h (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • CMakeLists.txt
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-08-14T02:11:58.963Z
Learnt from: assembler-0
PR: assembler-0/VoidFrame#82
File: kernel/process/Process.h:0-0
Timestamp: 2025-08-14T02:11:58.963Z
Learning: In VoidFrame kernel's FastMemcpy implementation (kernel/memory/MemOps.c), memory corruption occurs under high load due to interrupts occurring between SSE2 movdqu load and store instructions, corrupting XMM register state. The fix requires disabling interrupts around SSE operations and adding memory barriers (sfence) to ensure stores complete atomically.

Applied to files:

  • mm/MemOps.c
📚 Learning: 2025-08-14T02:11:27.395Z
Learnt from: assembler-0
PR: assembler-0/VoidFrame#82
File: kernel/process/Process.h:0-0
Timestamp: 2025-08-14T02:11:27.395Z
Learning: In VoidFrame kernel, FastMemcpy was using SSE2 movdqu instructions without preserving XMM registers, causing crashes when called from interrupt context (like the scheduler's FastSchedule function). The fix is to use only general-purpose registers with 64-bit unrolled copies to maintain performance while being interrupt-safe.

Applied to files:

  • mm/MemOps.c
📚 Learning: 2025-09-01T01:10:22.123Z
Learnt from: assembler-0
PR: assembler-0/VoidFrame#103
File: README.md:9-9
Timestamp: 2025-09-01T01:10:22.123Z
Learning: The current VoidFrame Registers struct in arch/x86_64/cpu/Cpu.h only saves basic GPRs, segment registers, and exception context (interrupt_number, error_code, rip, cs, rflags, rsp, ss). It's missing critical registers for deep fault tracing: SIMD registers (XMM/YMM/ZMM), control registers (CR0/CR2/CR3/CR4), debug registers (DR0-DR7), and extended FPU state. The struct has a "DO NOT TOUCH" comment indicating it must match interrupt ASM stack layout.

Applied to files:

  • arch/x86_64/features/x64.h
🧬 Code graph analysis (3)
mm/MemOps.h (1)
mm/MemOps.c (7)
  • FastMemset (23-208)
  • FastMemcpy (210-434)
  • FastMemcmp (526-645)
  • FastZeroPage (436-524)
  • memset (11-13)
  • memcpy (15-17)
  • memcmp (19-21)
mm/MemOps.c (2)
include/Io.h (2)
  • save_irq_flags (38-42)
  • restore_irq_flags (44-46)
arch/x86_64/features/x64.c (1)
  • GetCpuFeatures (102-104)
arch/x86_64/features/x64.h (1)
include/Io.h (5)
  • always_inline (48-55)
  • always_inline (57-64)
  • always_inline (67-71)
  • always_inline (74-78)
  • always_inline (80-84)
🔇 Additional comments (1)
arch/x86_64/features/x64.h (1)

70-74: Memory barrier implementation is correct.

The __full_mem_prot_end function correctly implements the closing memory barrier with serialization. Together with __full_mem_prot_init, this pair provides a complete fence sandwich around protected code sections.

The implementation properly orders:

  1. Memory fences (ensuring all memory operations are visible)
  2. Compiler barrier (__sync_synchronize())
  3. Instruction serialization (__builtin_ia32_serialize() - ensures all prior instructions complete)

Note: The attribute and documentation concerns apply to both functions and were mentioned in the review of __full_mem_prot_init.

@assembler-0 assembler-0 merged commit 3fbd70e into main Oct 2, 2025
0 of 2 checks passed
@coderabbitai coderabbitai bot mentioned this pull request Oct 3, 2025
@coderabbitai coderabbitai bot mentioned this pull request Oct 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant