@kpouget
Collaborator

@kpouget kpouget commented Nov 4, 2025

Summary by CodeRabbit

  • New Features

    • Added remote GPU compute capabilities enabling distributed inference through remoting backend and frontend integration with Vulkan/virtgpu infrastructure.
  • Chores

    • Added new build scripts and CMake configurations for remote compute targets.
    • Consolidated project ownership structure via OWNERS file.

@coderabbitai

coderabbitai bot commented Nov 4, 2025

Caution

Review failed

Failed to post review comments

Walkthrough

This PR introduces a GGML remoting backend and frontend that run tensor computation remotely over virtgpu. It adds dispatched backend handling, RPC serialization protocols, frontend buffer/device management, DRM kernel interfaces, and build/run scripts. It also disables verbose logging in llama.cpp and adds timing instrumentation.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Build system & configuration: .gitignore, CMakePresets.json, OWNERS | Updated .gitignore with build path patterns; added remoting frontend/backend CMake presets; created OWNERS file with approvers/reviewers |
| Build & preparation scripts: build.sh, build.backend.sh, build.remoting.sh, build.vulkan.sh, prepare.sh, prepare.backend.sh, prepare.remoting.sh, prepare.vulkan.sh, podman_compile.sh | Added parallelized build scripts for remoting backend/frontend, Vulkan; preparation scripts configure CMake for different backends with feature flags; podman script orchestrates container-based builds |
| Run/execution scripts: run.ramalama.sh, run.remoting.sh, run.vulkan.sh | Added execution scripts with environment setup for Vulkan ICD, remoting backend selection (bench/perf/normal modes), and debugging tool prefixes |
| GGML backend registration & dispatch: ggml/CMakeLists.txt, ggml/src/CMakeLists.txt, ggml/src/ggml-backend-reg.cpp, ggml/include/ggml-remoting-frontend.h | Added GGML_REMOTING_FRONTEND/BACKEND options, registered RemotingFrontend/RemotingBackend; exposed remoting frontend public header with registration function |
| Metal backend remoting: ggml/src/ggml-metal/CMakeLists.txt, ggml/src/ggml-metal/ggml-metal-context.m, ggml/src/ggml-metal/ggml-metal-device.m, ggml/src/ggml-metal/ggml-metal-remoting.cpp | Added Metal remoting support file to backend; enabled graph optimization timing; disabled debug logging in pipeline compilation; exposed Metal device context retrieval API |
| Remoting backend (server-side): ggml/src/ggml-remotingbackend/CMakeLists.txt, ggml/src/ggml-remotingbackend/backend-*.{cpp,h}, ggml/src/ggml-remotingbackend/shared/{api_remoting.h,apir_backend.h,venus_cs.h,venus_cs_ggml*.{h,cpp}} | Comprehensive backend dispatcher with device/buffer-type/buffer/Metal command handlers; RPC protocol definitions; binary encoding/decoding for tensor serialization; graph construction from RPC payloads |
| Remoting frontend (client-side): ggml/src/ggml-remotingfrontend/CMakeLists.txt, ggml/src/ggml-remotingfrontend/ggml-*.{cpp,h}, ggml/src/ggml-remotingfrontend/virtgpu-*.{cpp,h} | Frontend buffer/device/Metal operations; virtgpu integration with DRM IOCTLs; shared memory management; remote call lifecycle (prepare/dispatch/finish) |
| DRM kernel UAPI headers: ggml/src/ggml-remotingfrontend/include/drm-uapi/{drm.h,virtgpu_drm.h}, include/venus_hw.h | Complete DRM and Virtio-GPU userspace API definitions; capability structures; IOCTL codes |
| Utility infrastructure: ggml/src/ggml-remotingfrontend/virtgpu-utils.{cpp,h} | Sparse hierarchical array implementation, logging/debug helpers, alignment/atomic utilities, linked-list structures |
| Logging suppression in llama.cpp: src/llama-context.cpp, src/llama-kv-cache.cpp, src/llama-model-loader.cpp, src/llama-model.cpp, src/llama-vocab.cpp | Commented out verbose debug/info logs for metadata, tensor loading, and context info; added early returns in print_info functions |
| Performance instrumentation: tools/run/run.cpp | Added timing instrumentation for token generation with throughput reporting (tokens/sec) |

Sequence Diagram(s)

sequenceDiagram
    participant Frontend as GGML Frontend
    participant Dispatcher as Backend Dispatcher
    participant GPU as Remote GPU
    participant Host as Host Virtgpu

    Frontend->>Host: create_virtgpu() - handshake & load backend library
    Host-->>Frontend: virtgpu handle, capset, shmem regions
    
    Note over Frontend: Graph Compute Flow
    Frontend->>Frontend: serialize_graph(cgraph)
    Frontend->>Host: remote_call_prepare(GRAPH_COMPUTE)
    Host-->>Frontend: encoder + decoder
    
    Frontend->>Frontend: encode cgraph to shmem
    Frontend->>Host: remote_call() - send command
    Host->>Dispatcher: apir_backend_dispatcher(cmd_type, encoded_data)
    Dispatcher->>GPU: backend_graph_compute(cgraph)
    GPU-->>Dispatcher: ggml_status result
    Dispatcher->>Host: encode status
    Host-->>Frontend: remote_call result
    
    Frontend->>Frontend: deserialize result
    Frontend->>Host: remote_call_finish()
sequenceDiagram
    participant App as Application
    participant FrontBuf as Frontend Buffer
    participant RemoteCall as Remote Backend
    participant GPU as GPU Device
    participant SharedMem as Shared Memory

    App->>FrontBuf: set_tensor(tensor, data)
    FrontBuf->>SharedMem: allocate/use shmem region
    FrontBuf->>FrontBuf: encode tensor + shmem_id
    FrontBuf->>RemoteCall: apir_buffer_set_tensor(encoded)
    
    RemoteCall->>SharedMem: resolve shmem pointer
    RemoteCall->>GPU: dma/copy data to GPU buffer
    GPU-->>RemoteCall: done
    RemoteCall-->>FrontBuf: status
    
    FrontBuf-->>App: set_tensor complete
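The set_tensor flow in the second diagram can be sketched in-process as follows. This is a hypothetical illustration, not the PR's code: `ShmemPool`, `SetTensorCmd`, `frontend_set_tensor`, and `backend_set_tensor` are invented names, and a `std::vector` stands in for both the shared memory region and the GPU buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for shared memory regions, keyed by a resource id.
struct ShmemPool {
    std::unordered_map<uint32_t, std::vector<uint8_t>> regions;
    uint32_t next_id = 1;

    uint32_t create(size_t size) {
        uint32_t id = next_id++;
        regions[id].resize(size);
        return id;
    }
    // Returns nullptr for an unknown id so the backend can reject the command.
    std::vector<uint8_t> *resolve(uint32_t id) {
        auto it = regions.find(id);
        return it == regions.end() ? nullptr : &it->second;
    }
};

// Hypothetical command: only the region id and sizes cross the boundary,
// never a raw pointer.
struct SetTensorCmd {
    uint32_t shmem_id;
    size_t   offset;  // destination offset in the device buffer
    size_t   size;    // number of bytes to copy
};

// Frontend side: stage the payload in shared memory and build the command.
inline SetTensorCmd frontend_set_tensor(ShmemPool &pool, const void *data,
                                        size_t offset, size_t size) {
    uint32_t id = pool.create(size);
    if (size != 0) {
        std::memcpy(pool.resolve(id)->data(), data, size);
    }
    return SetTensorCmd{id, offset, size};
}

// Backend side: resolve the region, validate the range, then copy.
inline bool backend_set_tensor(ShmemPool &pool, const SetTensorCmd &cmd,
                               std::vector<uint8_t> &device_buffer) {
    std::vector<uint8_t> *region = pool.resolve(cmd.shmem_id);
    if (region == nullptr || cmd.size > region->size()) {
        return false;
    }
    // Written as a subtraction so that offset + size cannot overflow.
    if (cmd.offset > device_buffer.size() ||
        cmd.size > device_buffer.size() - cmd.offset) {
        return false;
    }
    if (cmd.size != 0) {
        std::memcpy(device_buffer.data() + cmd.offset, region->data(), cmd.size);
    }
    return true;
}
```

The backend validates both the region id and the destination range because, in a design like this, the command contents come from the other side of the boundary and cannot be trusted.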

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~90+ minutes

Justification: This is a substantial architectural addition with high complexity and heterogeneity:

  • Scope: ~80 new/modified files across three major components (backend, frontend, build system)
  • Density: Dense logic in dispatchers, RPC serialization, buffer management, and virtgpu integration
  • Heterogeneity: Diverse patterns—DRM IOCTLs, VN encoding/decoding protocol, GGML tensor serialization, CMake target configuration
  • Critical infrastructure: Remote procedure call mechanism, shared memory management, and cross-process tensor graph execution require careful review
  • Language variety: Mix of C++, C, Objective-C, shell scripts, and CMake

Areas requiring extra attention:

  • RPC serialization protocol (venus_cs.h, venus_cs_ggml-rpc*.cpp): Custom binary encoding/decoding with bounds checks and overflow validation
  • Shared memory lifecycle (virtgpu-shm.cpp/h): Allocation, mapping, deallocation patterns; sparse array indexing
  • Buffer tracking and synchronization (backend-dispatched-buffer.cpp, ggml-backend-buffer.cpp): Proper lifetime management across frontend/backend boundary
  • Graph serialization/deserialization (venus_cs_ggml-rpc*.cpp): Tensor dependency reconstruction with buffer validation
  • Remote call dispatch table (backend-dispatched.h): Command routing correctness and bounds checking
  • Metal device context handling (ggml-metal-remoting.cpp, virtgpu-forward-metal.cpp): Capability query correctness
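The bounds and overflow checks flagged for the serialization protocol can be sketched as follows. This is a minimal illustration, not the code in venus_cs.h: the `Decoder` type and its `read` and `read_array` methods are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical decoder over a received byte range; all names are illustrative.
struct Decoder {
    const uint8_t *cur;
    const uint8_t *end;
    bool           failed = false;

    Decoder(const uint8_t *data, size_t size) : cur(data), end(data + size) {}

    // Copy `size` bytes out of the stream, refusing to read past the end.
    bool read(void *out, size_t size) {
        if (failed || size > static_cast<size_t>(end - cur)) {
            failed = true;  // sticky failure: later reads also fail
            return false;
        }
        if (size != 0) {
            std::memcpy(out, cur, size);
        }
        cur += size;
        return true;
    }

    // Read `count` elements of `elem_size` bytes, rejecting counts whose
    // total byte size would overflow size_t before the bounds check runs.
    bool read_array(void *out, size_t count, size_t elem_size) {
        if (elem_size != 0 &&
            count > std::numeric_limits<size_t>::max() / elem_size) {
            failed = true;
            return false;
        }
        return read(out, count * elem_size);
    }
};
```

The sticky `failed` flag lets a caller issue several reads and check the result once at the end, which tends to keep decode functions short.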

Possibly related PRs

Suggested reviewers

  • cfergeau
  • praveenkumar

Poem

🐰 A rabbit's remote dream, so grand and sublime,
Dispatchers and buffers, all coded with time,
Through virtgpu's tunnel, the tensors now flow,
From frontend to backend, a virtuoso show!
With serialized graphs and shared memory deep,
The compute remotes forth—innovations to reap!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The pull request description is entirely empty; no description was provided by the author despite a template being available in the repository. | Add a comprehensive description explaining the purpose of the rebase, what changes are included, and why this rebase is necessary. Reference the template in CONTRIBUTING.md. |
✅ Passed checks (1 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title '[remoting] Rebase on top of b6945' is specific and clearly indicates a rebase operation on a commit identifier, directly related to the changeset. |
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment


@openshift-ci

openshift-ci bot commented Nov 4, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kpouget
Collaborator Author

kpouget commented Nov 4, 2025

/test topsail
/cluster mac5

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 51

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tools/run/run.cpp (1)

1175-1199: Call stop_timer() on every exit path.

We start the timer at the top of each loop iteration, but exit paths (decode failure, EOG break, token-to-string failure) return or break before the trailing stop_timer(). Those iterations never increment timer_total/timer_count, so the stats under-report work and may leave the last start_timer() unmatched. Stop the timer before each early exit.

-        if (llama_decode(llama_data.context.get(), batch)) {
+        if (llama_decode(llama_data.context.get(), batch)) {
+            stop_timer();
             printe("failed to decode\n");
             return 1;
         }
 
         // sample the next token, check is it an end of generation?
         new_token_id = llama_sampler_sample(llama_data.sampler.get(), llama_data.context.get(), -1);
-        if (llama_vocab_is_eog(vocab, new_token_id)) {
+        if (llama_vocab_is_eog(vocab, new_token_id)) {
+            stop_timer();
             break;
         }
 
         std::string piece;
-        if (convert_token_to_string(vocab, new_token_id, piece)) {
+        if (convert_token_to_string(vocab, new_token_id, piece)) {
+            stop_timer();
             return 1;
         }
 
         print_word_and_concatenate_to_response(piece, response);
 
         // prepare the next batch with the sampled token
         batch = llama_batch_get_one(&new_token_id, 1);
         stop_timer();
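As an alternative to adding a stop_timer() call before each early exit, an RAII guard can stop the timer whenever the scope is left. This is a sketch of a different technique from the diff above, under stated assumptions: `TimerStats`, `ScopedTimer`, and `generate` are hypothetical names, and the accumulators only stand in for the timer_total/timer_count counters.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical accumulators, standing in for the timer_total/timer_count
// counters mentioned in the review.
struct TimerStats {
    int64_t total_ns = 0;
    int64_t count    = 0;
};

// RAII guard: starts timing on construction and records on destruction,
// so every return or break out of the enclosing scope stops the timer.
class ScopedTimer {
public:
    explicit ScopedTimer(TimerStats &stats)
        : stats_(stats), start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        auto elapsed = std::chrono::steady_clock::now() - start_;
        stats_.total_ns +=
            std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count();
        stats_.count += 1;
    }

    ScopedTimer(const ScopedTimer &) = delete;
    ScopedTimer &operator=(const ScopedTimer &) = delete;

private:
    TimerStats &stats_;
    std::chrono::steady_clock::time_point start_;
};

// Illustrative loop with an early exit; `fail_at` is a made-up parameter
// that simulates a decode failure on a given iteration.
inline int generate(TimerStats &stats, int n_tokens, int fail_at) {
    for (int i = 0; i < n_tokens; ++i) {
        ScopedTimer timer(stats);
        if (i == fail_at) {
            return 1;  // the destructor still records this iteration
        }
    }
    return 0;
}
```

With a guard like this, a new exit path added later cannot forget to stop the timer, at the cost of tying the measured span to a C++ scope.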
🧹 Nitpick comments (19)
ggml/src/ggml-remotingfrontend/include/venus_hw.h (1)

30-33: Consider documenting the version fields.

The first four fields lack inline comments explaining their purpose and valid value ranges. While the names are descriptive, brief documentation would improve maintainability, especially for protocol version compatibility checks.

Example:

 struct virgl_renderer_capset_venus {
+   /* Wire protocol format version for Venus commands */
    uint32_t wire_format_version;
+   /* Vulkan XML specification version */
    uint32_t vk_xml_version;
+   /* VK_EXT_command_serialization specification version */
    uint32_t vk_ext_command_serialization_spec_version;
+   /* VK_MESA_venus_protocol specification version */
    uint32_t vk_mesa_venus_protocol_spec_version;
ggml/src/ggml-remotingfrontend/virtgpu-utils.h (2)

83-94: Consider adding [[noreturn]] attribute to FATAL.

Since FATAL always calls abort(), it would be beneficial to mark it with the [[noreturn]] attribute to help the compiler with optimization and dead code analysis.

Apply this diff:

+[[noreturn]]
 inline void
 FATAL(const char *format, ...) {
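To show what the attribute buys, here is a minimal sketch with illustrative names (`fatal` and `checked_div` are not from the PR). It assumes a helper that always aborts, as FATAL is described above.

```cpp
#include <cstdarg>
#include <cstdio>
#include <cstdlib>

// Illustrative fatal-error helper (not the PR's FATAL): it always aborts,
// so [[noreturn]] tells the compiler that control never comes back.
[[noreturn]] inline void fatal(const char *format, ...) {
    va_list args;
    va_start(args, format);
    std::vfprintf(stderr, format, args);
    va_end(args);
    std::abort();
}

// Because fatal() is [[noreturn]], the compiler knows the function cannot
// fall off its end here, so no dummy return is needed after the call.
inline int checked_div(int num, int den) {
    if (den != 0) {
        return num / den;
    }
    fatal("division by zero: %d / %d\n", num, den);
}
```

Without the attribute, compilers typically warn that control reaches the end of a non-void function in `checked_div`.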

96-100: Consider removing redundant wrapper function.

util_is_power_of_two_nonzero64 is a simple wrapper around the IS_POT_NONZERO macro with no additional logic. Consider using the macro directly or documenting why the function wrapper is necessary.
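If the wrapper is kept, one option is to make the function the single implementation and drop the macro. The sketch below assumes the check means "exactly one bit set", as the name suggests; `is_pot_nonzero64` is a hypothetical name, not the PR's.

```cpp
#include <cstdint>

// Illustrative replacement for a macro-plus-wrapper pair: a constexpr
// function is type-checked, evaluates its argument once, and can be
// used in constant expressions.
constexpr bool is_pot_nonzero64(uint64_t v) {
    // A nonzero power of two has exactly one bit set, so clearing the
    // lowest set bit with v & (v - 1) must leave zero.
    return v != 0 && (v & (v - 1)) == 0;
}

static_assert(is_pot_nonzero64(1), "1 is 2^0");
static_assert(is_pot_nonzero64(uint64_t{1} << 63), "highest bit");
static_assert(!is_pot_nonzero64(0), "zero is excluded");
static_assert(!is_pot_nonzero64(12), "two bits set");
```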

src/llama-model.cpp (1)

2256-2256: Keep debug logs; they’re already gated by LLAMA_LOG_DEBUG.

Commenting them out removes useful diagnostics for device assignment/splits. Since these are debug-level, they won’t spam unless enabled. Recommend restoring them.

-            //LLAMA_LOG_DEBUG("load_tensors: layer %3d assigned to device %s, is_swa = %d\n", il, ggml_backend_dev_name(cpu_dev), is_swa);
+            LLAMA_LOG_DEBUG("load_tensors: layer %3d assigned to device %s, is_swa = %d\n", il, ggml_backend_dev_name(cpu_dev), is_swa);
@@
-        //LLAMA_LOG_DEBUG("load_tensors: layer %3d assigned to device %s, is_swa = %d\n", il, ggml_backend_dev_name(dev), is_swa);
+        LLAMA_LOG_DEBUG("load_tensors: layer %3d assigned to device %s, is_swa = %d\n", il, ggml_backend_dev_name(dev), is_swa);

Also applies to: 2262-2262

ggml/src/ggml-remotingfrontend/include/drm-uapi/drm.h (1)

1-64: Document the kernel source and version of this vendored UAPI header.

This appears to be a DRM UAPI header copied from the Linux kernel. It's important to document:

  • Which kernel version this header is from
  • The update policy for keeping it in sync with kernel changes
  • Why vendoring is preferred over using system headers

Consider adding a comment at the top indicating this is from kernel UAPI and the specific version/commit.

ggml/src/ggml-remotingfrontend/include/drm-uapi/virtgpu_drm.h (1)

1-38: Document kernel version and note dependency on drm.h.

As with drm.h, this vendored virtgpu UAPI header should document the kernel version it was taken from. Additionally, this header depends on drm.h, which has a missing drm_mode.h include (see the drm.h review comments).

run.vulkan.sh (2)

12-12: Remove commented-out code.

Dead code should be removed rather than left commented out. Use version control to recover it if needed.

Apply this diff:

-#rm -f /usr/lib64/libvulkan_virtio.so
-

16-16: Make MESA_FLAVOR configurable.

The MESA_FLAVOR variable is hardcoded to "good". Consider making this configurable via environment variable or command-line argument to support different testing scenarios.

Apply this diff:

-MESA_FLAVOR=good
+MESA_FLAVOR="${MESA_FLAVOR:-good}"
ggml/src/ggml-remotingfrontend/virtgpu-forward-impl.h (1)

7-8: Clarify or remove incomplete CACHED macro.

The CACHED macro is defined as empty with a commented-out placeholder. If this is for future use, consider adding a TODO comment explaining its purpose. If it's not needed, remove it.

src/llama-vocab.cpp (3)

2360-2361: Debug log suppression is fine; prefer a feature toggle.

Commenting out LLAMA_LOG_DEBUG reduces diagnosability. Consider guarding with a compile-time flag or env-driven verbosity check instead of hard-disabling.


3207-3207: Early-return disables print_info entirely.

This short-circuits all vocab diagnostics. Recommend gating via a runtime flag (e.g., LLAMA_QUIET=1) or compile-time option so developers can re-enable when needed.


3573-3573: Same as above: print_info() is now a no-op.

Apply the same gating approach to allow opt-in diagnostics.
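A runtime gate of the kind suggested above might look like the following sketch. `LLAMA_QUIET` is the example variable name from this review; `env_flag_enabled`, `quiet_mode`, and `print_info_lines` are hypothetical helpers, and the "unset, empty, or 0 means off" convention is an assumption rather than anything llama.cpp defines.

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Interpret an environment value as a boolean flag. Unset, empty, and "0"
// mean disabled; anything else means enabled. This convention is an
// assumption for the sketch.
inline bool env_flag_enabled(const char *value) {
    return value != nullptr && value[0] != '\0' && std::strcmp(value, "0") != 0;
}

// Hypothetical quiet switch, read once and cached for later calls.
inline bool quiet_mode() {
    static const bool quiet = env_flag_enabled(std::getenv("LLAMA_QUIET"));
    return quiet;
}

// Stand-in for a print_info() body: diagnostics stay in the code and are
// skipped only when the user opts out. Returns the number of lines printed.
inline int print_info_lines(int n_lines) {
    if (quiet_mode()) {
        return 0;
    }
    for (int i = 0; i < n_lines; ++i) {
        std::fprintf(stderr, "info line %d\n", i);
    }
    return n_lines;
}
```

Compared with an unconditional early return, this keeps the default behavior unchanged and lets the remoting scripts opt into quiet output by exporting the variable.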

prepare.vulkan.sh (1)

1-7: Missing shebang and strict mode; add for robustness.

ShellCheck SC2148 applies (the script has no shebang). Also consider adding set -euo pipefail so the script stops on the first failing command.

Apply this diff:

+#!/usr/bin/env bash
+set -euo pipefail
 cmake -S . \
       -B ../build.vulkan \
       -DGGML_VULKAN=ON \
       -DGGML_NATIVE=OFF \
       -DGGML_METAL=OFF \
       -DLLAMA_CURL=OFF \
       -DCMAKE_BUILD_TYPE=Debug
src/llama-model-loader.cpp (2)

682-709: Log lines disabled; prefer a controllable verbosity switch.

Commenting out LLAMA_LOG_INFO/DEBUG reduces helpful diagnostics. Gate via env var (e.g., LLAMA_VERBOSE_META) or compile-time option instead of hard-disabling.

Also applies to: 793-794


1160-1161: print_info() early-return suppresses all loader metadata.

Consider honoring a quiet flag rather than unconditional return, to aid debugging when needed.

build.backend.sh (1)

13-13: Consider separating declaration and assignment to catch errors.

Combining export with command substitution can mask failures from the subcommand.

Apply this diff to separate the operations:

-export SDKROOT=$(xcrun --sdk macosx --show-sdk-path)
+SDKROOT=$(xcrun --sdk macosx --show-sdk-path)
+export SDKROOT
podman_compile.sh (1)

34-34: Consider mounting only the required directory instead of entire $HOME.

Mounting the entire home directory gives the container broad access to user data. For better security isolation, mount only the workspace directory needed for the build.

For example, if only the current project directory is needed:

 --env HOME="$HOME" \
 --env PERF_MODE="${PERF_MODE:-}" \
 --env BENCH_MODE="${BENCH_MODE:-}" \
--v "$HOME":"$HOME":Z \
+-v "$PWD":"$PWD":Z \
 -w "$PWD" \
ggml/src/ggml-remotingfrontend/CMakeLists.txt (1)

27-29: Use pkg-config or find_package instead of hardcoding system paths.

The hardcoded path /usr/include/libdrm/ and the Fedora-specific dnf install comment reduce portability. Different Linux distributions and macOS may install libdrm in different locations.

Apply this diff to use pkg-config for better portability:

 # dnf install -y libdrm-devel
-target_link_libraries(ggml-remotingfrontend PUBLIC drm)
-target_include_directories(ggml-remotingfrontend PUBLIC /usr/include/libdrm/)
+find_package(PkgConfig REQUIRED)
+pkg_check_modules(DRM REQUIRED libdrm)
+target_link_libraries(ggml-remotingfrontend PUBLIC ${DRM_LIBRARIES})
+target_include_directories(ggml-remotingfrontend PUBLIC ${DRM_INCLUDE_DIRS})
ggml/src/ggml-remotingbackend/backend-dispatched.h (1)

56-65: Fix command name strings to match handlers.

Several command-name strings drop the _device/_backend prefixes (e.g., Line 56 returns "backend_get_device_count" while the handler is backend_reg_get_device_count). When this helper is used for tracing or diagnostics, the mismatches make it very hard to map logs back to the actual dispatchers. Please align the returned strings with the real handler names.

Apply this diff:

-  case APIR_COMMAND_TYPE_DEVICE_GET_COUNT: return "backend_get_device_count";
-  case APIR_COMMAND_TYPE_DEVICE_GET_NAME: return "backend_get_device_name";
-  case APIR_COMMAND_TYPE_DEVICE_GET_DESCRIPTION: return "backend_get_device_description";
-  case APIR_COMMAND_TYPE_DEVICE_GET_MEMORY: return "backend_get_device_memory";
-  case APIR_COMMAND_TYPE_DEVICE_GET_BUFFER_TYPE: return "backend_get_buffer_type";
-  case APIR_COMMAND_TYPE_DEVICE_GET_PROPS: return "backend_get_props";
-  case APIR_COMMAND_TYPE_DEVICE_BUFFER_FROM_PTR: return "backend_buffer_from_ptr";
+  case APIR_COMMAND_TYPE_DEVICE_GET_COUNT: return "backend_reg_get_device_count";
+  case APIR_COMMAND_TYPE_DEVICE_GET_NAME: return "backend_device_get_name";
+  case APIR_COMMAND_TYPE_DEVICE_GET_DESCRIPTION: return "backend_device_get_description";
+  case APIR_COMMAND_TYPE_DEVICE_GET_MEMORY: return "backend_device_get_memory";
+  case APIR_COMMAND_TYPE_DEVICE_GET_BUFFER_TYPE: return "backend_device_get_buffer_type";
+  case APIR_COMMAND_TYPE_DEVICE_GET_PROPS: return "backend_device_get_props";
+  case APIR_COMMAND_TYPE_DEVICE_BUFFER_FROM_PTR: return "backend_device_buffer_from_ptr";
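One way to keep the names from drifting again is to derive each string from the handler identifier itself. The sketch below is hypothetical: the enum values, the `Handler` signature, and the `DISPATCH_ENTRY` macro are invented for illustration and do not match the real APIR definitions.

```cpp
#include <cstdint>
#include <string>

// Hypothetical command ids; the real enum in the PR differs.
enum CommandType : uint32_t {
    CMD_DEVICE_GET_COUNT = 0,
    CMD_DEVICE_GET_NAME  = 1,
    CMD_COUNT
};

// Hypothetical handler signature, simplified for the sketch.
using Handler = uint32_t (*)();

inline uint32_t backend_reg_get_device_count() { return 1; }
inline uint32_t backend_device_get_name() { return 0; }

struct DispatchEntry {
    Handler     handler;
    const char *name;
};

// Stringify the handler identifier so the logged name is the handler name.
#define DISPATCH_ENTRY(fn) DispatchEntry{ fn, #fn }

static const DispatchEntry dispatch_table[CMD_COUNT] = {
    /* CMD_DEVICE_GET_COUNT */ DISPATCH_ENTRY(backend_reg_get_device_count),
    /* CMD_DEVICE_GET_NAME  */ DISPATCH_ENTRY(backend_device_get_name),
};

// Bounds-checked lookup: unknown command ids yield nullptr instead of
// indexing past the table.
inline const DispatchEntry *lookup_command(uint32_t type) {
    return type < CMD_COUNT ? &dispatch_table[type] : nullptr;
}

// Name used for tracing; falls back to "unknown" for out-of-range ids.
inline std::string command_name(uint32_t type) {
    const DispatchEntry *entry = lookup_command(type);
    return entry ? entry->name : "unknown";
}
```

Renaming a handler then updates the trace string automatically, and the same lookup covers the bounds check on the command id.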
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cc98f8d and a15ef1d.

📒 Files selected for processing (76)
  • .gitignore (1 hunks)
  • CMakePresets.json (1 hunks)
  • OWNERS (1 hunks)
  • build.backend.sh (1 hunks)
  • build.remoting.sh (1 hunks)
  • build.sh (1 hunks)
  • build.vulkan.sh (1 hunks)
  • ggml/CMakeLists.txt (2 hunks)
  • ggml/include/ggml-remoting-frontend.h (1 hunks)
  • ggml/src/CMakeLists.txt (1 hunks)
  • ggml/src/ggml-backend-reg.cpp (3 hunks)
  • ggml/src/ggml-metal/CMakeLists.txt (1 hunks)
  • ggml/src/ggml-metal/ggml-metal-context.m (1 hunks)
  • ggml/src/ggml-metal/ggml-metal-device.m (2 hunks)
  • ggml/src/ggml-metal/ggml-metal-remoting.cpp (1 hunks)
  • ggml/src/ggml-remotingbackend/CMakeLists.txt (1 hunks)
  • ggml/src/ggml-remotingbackend/backend-convert.h (1 hunks)
  • ggml/src/ggml-remotingbackend/backend-dispatched-backend.cpp (1 hunks)
  • ggml/src/ggml-remotingbackend/backend-dispatched-buffer-type.cpp (1 hunks)
  • ggml/src/ggml-remotingbackend/backend-dispatched-buffer.cpp (1 hunks)
  • ggml/src/ggml-remotingbackend/backend-dispatched-device.cpp (1 hunks)
  • ggml/src/ggml-remotingbackend/backend-dispatched-metal.cpp (1 hunks)
  • ggml/src/ggml-remotingbackend/backend-dispatched.cpp (1 hunks)
  • ggml/src/ggml-remotingbackend/backend-dispatched.h (1 hunks)
  • ggml/src/ggml-remotingbackend/backend-internal.h (1 hunks)
  • ggml/src/ggml-remotingbackend/backend-utils.h (1 hunks)
  • ggml/src/ggml-remotingbackend/backend.cpp (1 hunks)
  • ggml/src/ggml-remotingbackend/shared/api_remoting.h (1 hunks)
  • ggml/src/ggml-remotingbackend/shared/apir_backend.h (1 hunks)
  • ggml/src/ggml-remotingbackend/shared/venus_cs.h (1 hunks)
  • ggml/src/ggml-remotingbackend/shared/venus_cs_ggml-rpc.cpp (1 hunks)
  • ggml/src/ggml-remotingbackend/shared/venus_cs_ggml-rpc.h (1 hunks)
  • ggml/src/ggml-remotingbackend/shared/venus_cs_ggml.h (1 hunks)
  • ggml/src/ggml-remotingbackend/venus_cs_ggml-rpc-back.cpp (1 hunks)
  • ggml/src/ggml-remotingfrontend/CMakeLists.txt (1 hunks)
  • ggml/src/ggml-remotingfrontend/ggml-backend-buffer-type.cpp (1 hunks)
  • ggml/src/ggml-remotingfrontend/ggml-backend-buffer.cpp (1 hunks)
  • ggml/src/ggml-remotingfrontend/ggml-backend-device.cpp (1 hunks)
  • ggml/src/ggml-remotingfrontend/ggml-backend-host-buffer-type.cpp (1 hunks)
  • ggml/src/ggml-remotingfrontend/ggml-backend-reg.cpp (1 hunks)
  • ggml/src/ggml-remotingfrontend/ggml-backend.cpp (1 hunks)
  • ggml/src/ggml-remotingfrontend/ggml-metal-remoting.cpp (1 hunks)
  • ggml/src/ggml-remotingfrontend/ggml-metal-remoting.h (1 hunks)
  • ggml/src/ggml-remotingfrontend/ggml-remoting-frontend.cpp (1 hunks)
  • ggml/src/ggml-remotingfrontend/ggml-remoting.h (1 hunks)
  • ggml/src/ggml-remotingfrontend/include/drm-uapi/drm.h (1 hunks)
  • ggml/src/ggml-remotingfrontend/include/drm-uapi/virtgpu_drm.h (1 hunks)
  • ggml/src/ggml-remotingfrontend/include/venus_hw.h (1 hunks)
  • ggml/src/ggml-remotingfrontend/venus_cs_ggml-rpc-front.cpp (1 hunks)
  • ggml/src/ggml-remotingfrontend/virtgpu-forward-backend.cpp (1 hunks)
  • ggml/src/ggml-remotingfrontend/virtgpu-forward-buffer-type.cpp (1 hunks)
  • ggml/src/ggml-remotingfrontend/virtgpu-forward-buffer.cpp (1 hunks)
  • ggml/src/ggml-remotingfrontend/virtgpu-forward-device.cpp (1 hunks)
  • ggml/src/ggml-remotingfrontend/virtgpu-forward-impl.h (1 hunks)
  • ggml/src/ggml-remotingfrontend/virtgpu-forward-metal.cpp (1 hunks)
  • ggml/src/ggml-remotingfrontend/virtgpu-forward.h (1 hunks)
  • ggml/src/ggml-remotingfrontend/virtgpu-shm.cpp (1 hunks)
  • ggml/src/ggml-remotingfrontend/virtgpu-shm.h (1 hunks)
  • ggml/src/ggml-remotingfrontend/virtgpu-utils.cpp (1 hunks)
  • ggml/src/ggml-remotingfrontend/virtgpu-utils.h (1 hunks)
  • ggml/src/ggml-remotingfrontend/virtgpu.cpp (1 hunks)
  • ggml/src/ggml-remotingfrontend/virtgpu.h (1 hunks)
  • podman_compile.sh (1 hunks)
  • prepare.backend.sh (1 hunks)
  • prepare.remoting.sh (1 hunks)
  • prepare.sh (1 hunks)
  • prepare.vulkan.sh (1 hunks)
  • run.ramalama.sh (1 hunks)
  • run.remoting.sh (1 hunks)
  • run.vulkan.sh (1 hunks)
  • src/llama-context.cpp (2 hunks)
  • src/llama-kv-cache.cpp (1 hunks)
  • src/llama-model-loader.cpp (5 hunks)
  • src/llama-model.cpp (2 hunks)
  • src/llama-vocab.cpp (3 hunks)
  • tools/run/run.cpp (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (42)
ggml/include/ggml-remoting-frontend.h (1)
ggml/src/ggml-remotingfrontend/ggml-backend-reg.cpp (2)
  • ggml_backend_remoting_frontend_reg (132-159)
  • ggml_backend_remoting_frontend_reg (132-132)
ggml/src/ggml-backend-reg.cpp (1)
ggml/src/ggml-remotingfrontend/ggml-backend-reg.cpp (2)
  • ggml_backend_remoting_frontend_reg (132-159)
  • ggml_backend_remoting_frontend_reg (132-132)
ggml/src/ggml-remotingfrontend/virtgpu-forward-metal.cpp (2)
ggml/src/ggml-remotingbackend/shared/venus_cs.h (1)
  • vn_decode_bool_t (508-512)
ggml/src/ggml-remotingfrontend/virtgpu.cpp (2)
  • remote_call_finish (551-567)
  • remote_call_finish (552-555)
ggml/src/ggml-remotingfrontend/ggml-metal-remoting.h (1)
ggml/src/ggml-remotingfrontend/ggml-metal-remoting.cpp (4)
  • get_metal_dev_context (4-18)
  • get_metal_dev_context (4-4)
  • ggml_metal_device_supports_op (20-254)
  • ggml_metal_device_supports_op (20-20)
ggml/src/ggml-remotingbackend/backend-dispatched-metal.cpp (2)
ggml/src/ggml-remotingbackend/backend-utils.h (1)
  • ERROR (52-55)
ggml/src/ggml-remotingbackend/shared/venus_cs.h (1)
  • vn_encode_bool_t (502-506)
ggml/src/ggml-remotingbackend/backend-convert.h (1)
ggml/src/ggml-remotingfrontend/ggml-remoting.h (2)
  • ggml_buffer_to_apir_handle (137-139)
  • ggml_buffer_type_to_apir_handle (34-38)
ggml/src/ggml-remotingfrontend/virtgpu-forward-buffer-type.cpp (4)
ggml/src/ggml-remotingbackend/shared/venus_cs_ggml.h (1)
  • vn_encode_ggml_buffer_type (69-73)
ggml/src/ggml-remotingbackend/shared/venus_cs.h (7)
  • vn_decode_array_size_unchecked (293-299)
  • vn_cs_decoder_alloc_array (493-498)
  • vn_decode_char_array (466-476)
  • vn_decode_size_t (394-400)
  • vn_decode_bool_t (508-512)
  • vn_encode_size_t (387-392)
  • vn_decode_apir_buffer_host_handle_t (536-540)
ggml/src/ggml-remotingfrontend/virtgpu-utils.h (3)
  • FATAL (83-94)
  • INFO (35-44)
  • INFO (46-47)
ggml/src/ggml-remotingfrontend/virtgpu.cpp (2)
  • remote_call_finish (551-567)
  • remote_call_finish (552-555)
ggml/src/ggml-remotingfrontend/venus_cs_ggml-rpc-front.cpp (1)
ggml/src/ggml-remotingfrontend/virtgpu-utils.h (1)
  • FATAL (83-94)
ggml/src/ggml-remotingfrontend/ggml-backend-buffer.cpp (2)
ggml/src/ggml-remotingfrontend/virtgpu-forward-buffer.cpp (12)
  • apir_buffer_get_base (3-21)
  • apir_buffer_get_base (4-4)
  • apir_buffer_set_tensor (23-65)
  • apir_buffer_set_tensor (24-25)
  • apir_buffer_get_tensor (68-76)
  • apir_buffer_get_tensor (69-70)
  • apir_buffer_get_tensor (78-114)
  • apir_buffer_get_tensor (79-80)
  • apir_buffer_clear (117-132)
  • apir_buffer_clear (118-119)
  • apir_buffer_free_buffer (135-148)
  • apir_buffer_free_buffer (136-136)
ggml/src/ggml-remotingbackend/shared/apir_backend.h (2)
  • start_timer (91-95)
  • stop_timer (98-108)
ggml/src/ggml-remotingfrontend/virtgpu-shm.cpp (3)
ggml/src/ggml-remotingfrontend/virtgpu-utils.h (3)
  • align64 (102-107)
  • INFO (35-44)
  • INFO (46-47)
ggml/src/ggml-remotingfrontend/virtgpu.h (1)
  • virtgpu_ioctl (101-105)
ggml/src/ggml-remotingfrontend/virtgpu-utils.cpp (2)
  • util_sparse_array_get (125-186)
  • util_sparse_array_get (126-126)
ggml/src/ggml-remotingfrontend/virtgpu-shm.h (1)
ggml/src/ggml-remotingfrontend/virtgpu-shm.cpp (4)
  • virtgpu_shmem_create (79-111)
  • virtgpu_shmem_create (80-80)
  • virtgpu_shmem_destroy (71-77)
  • virtgpu_shmem_destroy (72-73)
ggml/src/ggml-remotingbackend/backend-dispatched.cpp (3)
ggml/src/ggml-backend-reg.cpp (4)
  • reg (241-254)
  • reg (241-241)
  • reg (309-332)
  • reg (309-309)
ggml/src/ggml-remotingfrontend/virtgpu-utils.h (4)
  • FATAL (83-94)
  • ERROR (72-81)
  • INFO (35-44)
  • INFO (46-47)
ggml/src/ggml-remotingbackend/backend-utils.h (2)
  • ERROR (52-55)
  • INFO (42-45)
ggml/src/ggml-remotingbackend/backend-utils.h (1)
ggml/src/ggml-remotingfrontend/virtgpu-utils.h (5)
  • INFO (35-44)
  • INFO (46-47)
  • WARNING (61-70)
  • ERROR (72-81)
  • FATAL (83-94)
tools/run/run.cpp (1)
src/llama-batch.cpp (2)
  • llama_batch_get_one (851-863)
  • llama_batch_get_one (851-853)
ggml/src/ggml-remotingbackend/shared/venus_cs_ggml.h (6)
ggml/src/ggml-remotingbackend/shared/venus_cs.h (7)
  • vn_encode (136-142)
  • vn_cs_decoder_use_inplace (81-92)
  • vn_cs_encoder_write (108-123)
  • vn_cs_decoder_read (98-106)
  • vn_encode_uint32_t (342-346)
  • vn_decode_uint32_t (348-352)
  • vn_decode_uint64_t_array_inplace (211-215)
ggml/src/ggml-remotingbackend/shared/venus_cs_ggml-rpc.cpp (8)
  • serialize_tensor (17-46)
  • serialize_tensor (18-18)
  • deserialize_tensor (48-78)
  • deserialize_tensor (49-49)
  • serialize_graph (96-117)
  • serialize_graph (97-97)
  • deserialize_graph (144-167)
  • deserialize_graph (145-145)
ggml/src/ggml.c (2)
  • ggml_tensor_overhead (1356-1358)
  • ggml_init (1487-1527)
ggml/src/ggml-remotingbackend/venus_cs_ggml-rpc-back.cpp (4)
  • deserialize_tensor (33-68)
  • deserialize_tensor (34-34)
  • deserialize_graph (95-118)
  • deserialize_graph (96-96)
ggml/src/ggml-remotingbackend/backend-convert.h (2)
  • ggml_buffer_type_to_apir_handle (11-15)
  • ggml_buffer_to_apir_handle (5-9)
ggml/src/ggml-remotingfrontend/virtgpu-utils.h (2)
  • FATAL (83-94)
  • WARNING (61-70)
ggml/src/ggml-remotingfrontend/virtgpu.h (1)
ggml/src/ggml-remotingfrontend/virtgpu.cpp (10)
  • vn_log (347-361)
  • vn_log (348-348)
  • create_virtgpu (185-237)
  • create_virtgpu (186-186)
  • remote_call_prepare (510-549)
  • remote_call_prepare (511-514)
  • remote_call (569-668)
  • remote_call (570-575)
  • remote_call_finish (551-567)
  • remote_call_finish (552-555)
ggml/src/ggml-remotingfrontend/ggml-backend.cpp (3)
ggml/src/ggml-remotingbackend/shared/apir_backend.h (2)
  • start_timer (91-95)
  • stop_timer (98-108)
ggml/src/ggml-remotingfrontend/virtgpu-forward-backend.cpp (2)
  • apir_backend_graph_compute (9-54)
  • apir_backend_graph_compute (10-10)
ggml/src/ggml-remotingfrontend/ggml-backend-reg.cpp (2)
  • ggml_backend_remoting_frontend_reg (132-159)
  • ggml_backend_remoting_frontend_reg (132-132)
ggml/src/ggml-remotingbackend/venus_cs_ggml-rpc-back.cpp (2)
ggml/src/ggml.c (7)
  • ggml_new_tensor_4d (1740-1749)
  • ggml_nbytes (1203-1226)
  • ggml_set_name (1808-1815)
  • ggml_tensor_overhead (1356-1358)
  • ggml_graph_overhead_custom (6700-6702)
  • ggml_init (1487-1527)
  • ggml_new_graph_custom (6708-6750)
ggml/src/ggml-remotingbackend/shared/venus_cs_ggml-rpc.cpp (4)
  • deserialize_tensor (48-78)
  • deserialize_tensor (49-49)
  • create_node (119-142)
  • create_node (120-123)
ggml/src/ggml-remotingfrontend/virtgpu-forward-backend.cpp (5)
ggml/src/ggml-remotingbackend/shared/venus_cs_ggml.h (4)
  • vn_serialize_ggml_cgraph (140-145)
  • vn_encode_virtgpu_shmem_res_id (128-131)
  • vn_encode_cgraph_data (147-152)
  • vn_decode_ggml_status (121-124)
ggml/src/ggml-remotingfrontend/virtgpu-shm.cpp (4)
  • virtgpu_shmem_create (79-111)
  • virtgpu_shmem_create (80-80)
  • virtgpu_shmem_destroy (71-77)
  • virtgpu_shmem_destroy (72-73)
ggml/src/ggml-remotingfrontend/virtgpu-utils.h (2)
  • WARNING (61-70)
  • FATAL (83-94)
ggml/src/ggml-remotingbackend/shared/venus_cs.h (2)
  • vn_encode_size_t (387-392)
  • vn_cs_new_encoder (37-46)
ggml/src/ggml-remotingfrontend/virtgpu.cpp (2)
  • remote_call_finish (551-567)
  • remote_call_finish (552-555)
ggml/src/ggml-remotingfrontend/ggml-backend-device.cpp (3)
ggml/src/ggml-remotingfrontend/virtgpu-forward-device.cpp (16)
  • apir_device_get_name (27-53)
  • apir_device_get_name (28-28)
  • apir_device_get_description (55-77)
  • apir_device_get_description (56-56)
  • apir_device_get_type (79-102)
  • apir_device_get_type (80-80)
  • apir_device_get_memory (104-138)
  • apir_device_get_memory (105-105)
  • apir_device_supports_op (140-158)
  • apir_device_supports_op (141-141)
  • apir_device_get_props (178-201)
  • apir_device_get_props (179-183)
  • apir_device_get_buffer_type (160-176)
  • apir_device_get_buffer_type (161-161)
  • apir_device_buffer_from_ptr (203-237)
  • apir_device_buffer_from_ptr (204-206)
ggml/src/ggml-remotingfrontend/ggml-metal-remoting.cpp (2)
  • ggml_metal_device_supports_op (20-254)
  • ggml_metal_device_supports_op (20-20)
ggml/src/ggml-remotingfrontend/ggml-backend.cpp (2)
  • ggml_backend_remoting_device_init (73-87)
  • ggml_backend_remoting_device_init (73-73)
ggml/src/ggml-remotingfrontend/virtgpu-forward-buffer.cpp (5)
ggml/src/ggml-remotingbackend/shared/venus_cs.h (4)
  • vn_encode_apir_buffer_host_handle_t (530-534)
  • vn_decode_uintptr_t (550-554)
  • vn_encode_size_t (387-392)
  • vn_encode_uint8_t (150-154)
ggml/src/ggml-remotingfrontend/virtgpu.cpp (2)
  • remote_call_finish (551-567)
  • remote_call_finish (552-555)
ggml/src/ggml-remotingfrontend/virtgpu-utils.h (4)
  • INFO (35-44)
  • INFO (46-47)
  • FATAL (83-94)
  • WARNING (61-70)
ggml/src/ggml-remotingbackend/shared/venus_cs_ggml.h (2)
  • vn_encode_ggml_tensor (39-44)
  • vn_encode_virtgpu_shmem_res_id (128-131)
ggml/src/ggml-remotingfrontend/virtgpu-shm.cpp (4)
  • virtgpu_shmem_create (79-111)
  • virtgpu_shmem_create (80-80)
  • virtgpu_shmem_destroy (71-77)
  • virtgpu_shmem_destroy (72-73)
ggml/src/ggml-remotingbackend/backend-dispatched.h (6)
ggml/src/ggml-remotingbackend/backend-dispatched.cpp (2)
  • backend_dispatch_initialize (19-47)
  • backend_dispatch_initialize (19-19)
ggml/src/ggml-remotingbackend/backend-dispatched-device.cpp (18)
  • backend_reg_get_device_count (9-18)
  • backend_reg_get_device_count (9-9)
  • backend_device_get_name (20-31)
  • backend_device_get_name (20-20)
  • backend_device_get_description (33-45)
  • backend_device_get_description (34-34)
  • backend_device_get_type (47-56)
  • backend_device_get_type (48-48)
  • backend_device_get_memory (58-70)
  • backend_device_get_memory (59-59)
  • backend_device_supports_op (72-83)
  • backend_device_supports_op (73-73)
  • backend_device_get_buffer_type (85-95)
  • backend_device_get_buffer_type (86-86)
  • backend_device_get_props (97-111)
  • backend_device_get_props (98-98)
  • backend_device_buffer_from_ptr (113-142)
  • backend_device_buffer_from_ptr (114-114)
ggml/src/ggml-remotingbackend/backend-dispatched-buffer-type.cpp (10)
  • backend_buffer_type_get_name (9-22)
  • backend_buffer_type_get_name (10-10)
  • backend_buffer_type_get_alignment (24-34)
  • backend_buffer_type_get_alignment (25-25)
  • backend_buffer_type_get_max_size (36-46)
  • backend_buffer_type_get_max_size (37-37)
  • backend_buffer_type_is_host (48-58)
  • backend_buffer_type_is_host (49-49)
  • backend_buffer_type_alloc_buffer (60-81)
  • backend_buffer_type_alloc_buffer (61-61)
ggml/src/ggml-remotingbackend/backend-dispatched-buffer.cpp (10)
  • backend_buffer_get_base (12-22)
  • backend_buffer_get_base (13-13)
  • backend_buffer_set_tensor (24-71)
  • backend_buffer_set_tensor (25-25)
  • backend_buffer_get_tensor (73-109)
  • backend_buffer_get_tensor (74-74)
  • backend_buffer_clear (111-125)
  • backend_buffer_clear (112-112)
  • backend_buffer_free_buffer (127-143)
  • backend_buffer_free_buffer (128-128)
ggml/src/ggml-remotingbackend/backend-dispatched-backend.cpp (2)
  • backend_graph_compute (13-58)
  • backend_graph_compute (14-14)
ggml/src/ggml-remotingbackend/backend-dispatched-metal.cpp (2)
  • backend_metal_get_device_context (14-41)
  • backend_metal_get_device_context (15-15)
ggml/src/ggml-remotingfrontend/ggml-remoting.h (4)
ggml/src/ggml-remotingbackend/backend-convert.h (2)
  • ggml_buffer_type_to_apir_handle (11-15)
  • ggml_buffer_to_apir_handle (5-9)
ggml/src/ggml-remotingfrontend/ggml-backend-reg.cpp (2)
  • ggml_backend_remoting_get_device (49-52)
  • ggml_backend_remoting_get_device (49-49)
ggml/src/ggml-remotingfrontend/ggml-backend.cpp (2)
  • ggml_backend_remoting_device_init (73-87)
  • ggml_backend_remoting_device_init (73-73)
ggml/src/ggml-remotingfrontend/ggml-backend-device.cpp (2)
  • ggml_backend_remoting_device_get_buffer_type (129-144)
  • ggml_backend_remoting_device_get_buffer_type (130-130)
ggml/src/ggml-remotingfrontend/virtgpu-forward-device.cpp (4)
ggml/src/ggml-remotingbackend/shared/venus_cs.h (10)
  • vn_decode_int32_t (235-239)
  • vn_decode_array_size_unchecked (293-299)
  • vn_cs_decoder_alloc_array (493-498)
  • vn_decode_char_array (466-476)
  • vn_decode_uint32_t (348-352)
  • vn_decode_size_t (394-400)
  • vn_decode_bool_t (508-512)
  • vn_decode_apir_buffer_type_host_handle_t (522-526)
  • vn_encode_size_t (387-392)
  • vn_decode_apir_buffer_host_handle_t (536-540)
ggml/src/ggml-remotingfrontend/virtgpu-utils.h (3)
  • INFO (35-44)
  • INFO (46-47)
  • FATAL (83-94)
ggml/src/ggml-remotingfrontend/virtgpu.cpp (2)
  • remote_call_finish (551-567)
  • remote_call_finish (552-555)
ggml/src/ggml-remotingfrontend/virtgpu-shm.cpp (2)
  • virtgpu_shmem_create (79-111)
  • virtgpu_shmem_create (80-80)
ggml/src/ggml-remotingbackend/backend-dispatched-device.cpp (3)
ggml/src/ggml-remotingbackend/shared/venus_cs.h (7)
  • vn_encode_int32_t (229-233)
  • vn_encode_array_size (274-278)
  • vn_encode_char_array (459-464)
  • vn_encode_uint32_t (342-346)
  • vn_encode_size_t (387-392)
  • vn_encode_bool_t (502-506)
  • vn_decode_size_t (394-400)
ggml/src/ggml-remotingbackend/shared/venus_cs_ggml.h (4)
  • vn_decode_ggml_tensor_inplace (212-236)
  • vn_encode_ggml_buffer_type (69-73)
  • vn_decode_virtgpu_shmem_res_id (133-136)
  • vn_encode_ggml_buffer (98-102)
ggml/src/ggml-remotingfrontend/virtgpu-utils.h (1)
  • FATAL (83-94)
ggml/src/ggml-remotingfrontend/virtgpu-forward.h (5)
ggml/src/ggml-remotingfrontend/virtgpu-forward-device.cpp (18)
  • apir_device_get_count (3-25)
  • apir_device_get_count (4-4)
  • apir_device_get_name (27-53)
  • apir_device_get_name (28-28)
  • apir_device_get_description (55-77)
  • apir_device_get_description (56-56)
  • apir_device_get_type (79-102)
  • apir_device_get_type (80-80)
  • apir_device_get_memory (104-138)
  • apir_device_get_memory (105-105)
  • apir_device_supports_op (140-158)
  • apir_device_supports_op (141-141)
  • apir_device_get_buffer_type (160-176)
  • apir_device_get_buffer_type (161-161)
  • apir_device_get_props (178-201)
  • apir_device_get_props (179-183)
  • apir_device_buffer_from_ptr (203-237)
  • apir_device_buffer_from_ptr (204-206)
ggml/src/ggml-remotingfrontend/virtgpu-forward-buffer-type.cpp (10)
  • apir_buffer_type_get_name (3-29)
  • apir_buffer_type_get_name (4-4)
  • apir_buffer_type_get_alignment (31-51)
  • apir_buffer_type_get_alignment (32-32)
  • apir_buffer_type_get_max_size (53-73)
  • apir_buffer_type_get_max_size (54-54)
  • apir_buffer_type_is_host (75-95)
  • apir_buffer_type_is_host (76-76)
  • apir_buffer_type_alloc_buffer (97-119)
  • apir_buffer_type_alloc_buffer (98-98)
ggml/src/ggml-remotingfrontend/virtgpu-forward-buffer.cpp (12)
  • apir_buffer_get_base (3-21)
  • apir_buffer_get_base (4-4)
  • apir_buffer_set_tensor (23-65)
  • apir_buffer_set_tensor (24-25)
  • apir_buffer_get_tensor (68-76)
  • apir_buffer_get_tensor (69-70)
  • apir_buffer_get_tensor (78-114)
  • apir_buffer_get_tensor (79-80)
  • apir_buffer_clear (117-132)
  • apir_buffer_clear (118-119)
  • apir_buffer_free_buffer (135-148)
  • apir_buffer_free_buffer (136-136)
ggml/src/ggml-remotingfrontend/virtgpu-forward-backend.cpp (2)
  • apir_backend_graph_compute (9-54)
  • apir_backend_graph_compute (10-10)
ggml/src/ggml-remotingfrontend/virtgpu-forward-metal.cpp (2)
  • apir_metal_get_device_context (3-20)
  • apir_metal_get_device_context (4-4)
ggml/src/ggml-remotingfrontend/ggml-metal-remoting.cpp (3)
ggml/src/ggml-remotingfrontend/virtgpu-forward-metal.cpp (2)
  • apir_metal_get_device_context (3-20)
  • apir_metal_get_device_context (4-4)
ggml/src/ggml.c (5)
  • ggml_get_unary_op (1794-1797)
  • ggml_is_contiguous (1386-1388)
  • ggml_get_glu_op (1799-1802)
  • ggml_is_contiguous_1 (1394-1396)
  • ggml_is_contiguous_rows (1419-1423)
ggml/src/ggml-impl.h (1)
  • ggml_get_op_params_i32 (151-154)
ggml/src/ggml-remotingbackend/backend-dispatched-buffer.cpp (5)
ggml/src/ggml-remotingbackend/shared/venus_cs_ggml.h (3)
  • vn_decode_ggml_buffer (104-112)
  • vn_decode_ggml_tensor (46-236)
  • vn_decode_virtgpu_shmem_res_id (133-136)
ggml/src/ggml-remotingbackend/shared/venus_cs.h (3)
  • vn_encode_uintptr_t (544-548)
  • vn_decode_size_t (394-400)
  • vn_decode_uint8_t (156-160)
ggml/src/ggml-remotingbackend/shared/apir_backend.h (2)
  • start_timer (91-95)
  • stop_timer (98-108)
ggml/src/ggml-remotingfrontend/virtgpu-utils.h (2)
  • FATAL (83-94)
  • WARNING (61-70)
ggml/src/ggml-remotingbackend/venus_cs_ggml-rpc-back.cpp (2)
  • untrack_backend_buffer (17-26)
  • untrack_backend_buffer (18-18)
ggml/src/ggml-remotingbackend/backend-internal.h (2)
ggml/src/ggml-remotingbackend/backend.cpp (6)
  • apir_backend_initialize (49-114)
  • apir_backend_initialize (49-49)
  • apir_backend_deinit (23-47)
  • apir_backend_deinit (23-23)
  • apir_backend_dispatcher (116-150)
  • apir_backend_dispatcher (116-119)
ggml/src/ggml-remotingbackend/backend-dispatched-metal.cpp (1)
  • ggml_backend_metal_get_device_context_fct (9-12)
ggml/src/ggml-remotingbackend/backend-dispatched-buffer-type.cpp (3)
ggml/src/ggml-remotingbackend/shared/venus_cs_ggml.h (2)
  • vn_decode_ggml_buffer_type (75-82)
  • vn_encode_ggml_buffer (98-102)
ggml/src/ggml-remotingbackend/shared/venus_cs.h (5)
  • vn_encode_array_size (274-278)
  • vn_encode_char_array (459-464)
  • vn_encode_size_t (387-392)
  • vn_encode_bool_t (502-506)
  • vn_decode_size_t (394-400)
ggml/src/ggml-remotingbackend/shared/venus_cs_ggml-rpc.cpp (2)
  • track_backend_buffer (12-15)
  • track_backend_buffer (13-13)
ggml/src/ggml-remotingfrontend/ggml-backend-host-buffer-type.cpp (3)
ggml/src/ggml-remotingfrontend/virtgpu-utils.h (4)
  • WARNING (61-70)
  • FATAL (83-94)
  • INFO (35-44)
  • INFO (46-47)
ggml/src/ggml-remotingfrontend/virtgpu-shm.cpp (2)
  • virtgpu_shmem_destroy (71-77)
  • virtgpu_shmem_destroy (72-73)
ggml/src/ggml-remotingfrontend/virtgpu-forward-device.cpp (2)
  • apir_device_buffer_from_ptr (203-237)
  • apir_device_buffer_from_ptr (204-206)
ggml/src/ggml-remotingbackend/shared/venus_cs_ggml-rpc.h (3)
ggml/src/ggml-remotingbackend/shared/venus_cs_ggml-rpc.cpp (14)
  • serialize_tensor (17-46)
  • serialize_tensor (18-18)
  • serialize_graph (96-117)
  • serialize_graph (97-97)
  • track_backend_buffer (12-15)
  • track_backend_buffer (13-13)
  • add_tensor (80-94)
  • add_tensor (81-81)
  • deserialize_tensor (48-78)
  • deserialize_tensor (49-49)
  • create_node (119-142)
  • create_node (120-123)
  • deserialize_graph (144-167)
  • deserialize_graph (145-145)
ggml/src/ggml-remotingfrontend/venus_cs_ggml-rpc-front.cpp (6)
  • serialize_tensor (12-48)
  • serialize_tensor (13-13)
  • serialize_graph (66-87)
  • serialize_graph (67-67)
  • add_tensor (50-64)
  • add_tensor (51-51)
ggml/src/ggml-remotingbackend/venus_cs_ggml-rpc-back.cpp (12)
  • track_backend_buffer (12-15)
  • track_backend_buffer (13-13)
  • untrack_backend_buffer (17-26)
  • untrack_backend_buffer (18-18)
  • get_track_backend_buffers (28-31)
  • get_track_backend_buffers (29-29)
  • deserialize_tensor (33-68)
  • deserialize_tensor (34-34)
  • create_node (70-93)
  • create_node (71-74)
  • deserialize_graph (95-118)
  • deserialize_graph (96-96)
ggml/src/ggml-remotingbackend/backend-dispatched-backend.cpp (5)
ggml/src/ggml-remotingbackend/shared/apir_backend.h (2)
  • start_timer (91-95)
  • stop_timer (98-108)
ggml/src/ggml-remotingbackend/shared/venus_cs_ggml.h (3)
  • vn_decode_virtgpu_shmem_res_id (133-136)
  • vn_decode_ggml_cgraph (154-167)
  • vn_encode_ggml_status (116-119)
ggml/src/ggml-remotingfrontend/virtgpu-utils.h (2)
  • FATAL (83-94)
  • ERROR (72-81)
ggml/src/ggml-remotingbackend/shared/venus_cs.h (2)
  • vn_decode_size_t (394-400)
  • vn_cs_new_decoder (27-35)
ggml/src/ggml.c (2)
  • ggml_graph_node (6885-6893)
  • ggml_op_desc (1273-1283)
ggml/src/ggml-remotingfrontend/virtgpu-utils.h (2)
ggml/src/ggml-remotingfrontend/virtgpu-utils.cpp (8)
  • thks_bye (189-195)
  • thks_bye (189-189)
  • breakpoint (197-200)
  • breakpoint (197-197)
  • util_sparse_array_get (125-186)
  • util_sparse_array_get (126-126)
  • util_sparse_array_init (33-41)
  • util_sparse_array_init (34-35)
ggml/src/ggml-remotingbackend/backend-utils.h (3)
  • INFO (42-45)
  • WARNING (47-50)
  • ERROR (52-55)
ggml/src/ggml-remotingfrontend/virtgpu.cpp (6)
ggml/src/ggml-remotingfrontend/virtgpu-utils.h (7)
  • FATAL (83-94)
  • INFO (35-44)
  • INFO (46-47)
  • ERROR (72-81)
  • WARNING (61-70)
  • MESSAGE (50-59)
  • os_time_sleep (126-133)
ggml/src/ggml-remotingbackend/shared/venus_cs.h (3)
  • vn_encode_uint32_t (342-346)
  • vn_decode_uint32_t (348-352)
  • vn_encode_int32_t (229-233)
ggml/src/ggml-remotingbackend/shared/apir_backend.h (3)
  • apir_backend_initialize_error (123-139)
  • start_timer (91-95)
  • stop_timer (98-108)
ggml/src/ggml-remotingfrontend/virtgpu-utils.cpp (2)
  • util_sparse_array_init (33-41)
  • util_sparse_array_init (34-35)
ggml/src/ggml-remotingfrontend/virtgpu-shm.cpp (2)
  • virtgpu_shmem_create (79-111)
  • virtgpu_shmem_create (80-80)
ggml/src/ggml-remotingfrontend/virtgpu.h (2)
  • vn_log (55-98)
  • virtgpu_ioctl (101-105)
ggml/src/ggml-remotingfrontend/ggml-backend-buffer-type.cpp (3)
ggml/src/ggml-remotingfrontend/virtgpu-utils.h (1)
  • FATAL (83-94)
ggml/src/ggml-remotingfrontend/virtgpu-forward-device.cpp (2)
  • apir_device_buffer_from_ptr (203-237)
  • apir_device_buffer_from_ptr (204-206)
ggml/src/ggml-remotingfrontend/virtgpu-forward-buffer-type.cpp (8)
  • apir_buffer_type_get_name (3-29)
  • apir_buffer_type_get_name (4-4)
  • apir_buffer_type_get_alignment (31-51)
  • apir_buffer_type_get_alignment (32-32)
  • apir_buffer_type_get_max_size (53-73)
  • apir_buffer_type_get_max_size (54-54)
  • apir_buffer_type_is_host (75-95)
  • apir_buffer_type_is_host (76-76)
ggml/src/ggml-remotingfrontend/ggml-backend-reg.cpp (4)
ggml/src/ggml-remotingfrontend/virtgpu.cpp (2)
  • create_virtgpu (185-237)
  • create_virtgpu (186-186)
ggml/src/ggml-remotingfrontend/virtgpu-forward-device.cpp (2)
  • apir_device_get_count (3-25)
  • apir_device_get_count (4-4)
ggml/src/ggml-remotingfrontend/ggml-metal-remoting.cpp (2)
  • get_metal_dev_context (4-18)
  • get_metal_dev_context (4-4)
ggml/src/ggml-remotingbackend/shared/apir_backend.h (1)
  • show_timer (110-121)
ggml/src/ggml-remotingbackend/backend.cpp (6)
ggml/src/ggml-remotingbackend/venus_cs_ggml-rpc-back.cpp (4)
  • get_track_backend_buffers (28-31)
  • get_track_backend_buffers (29-29)
  • untrack_backend_buffer (17-26)
  • untrack_backend_buffer (18-18)
ggml/src/ggml-remotingbackend/backend-utils.h (2)
  • INFO (42-45)
  • ERROR (52-55)
ggml/src/ggml-remotingbackend/shared/apir_backend.h (1)
  • show_timer (110-121)
ggml/src/ggml-remotingbackend/backend-dispatched.cpp (3)
  • ggml_backend_reg_fct (23-23)
  • backend_dispatch_initialize (19-47)
  • backend_dispatch_initialize (19-19)
ggml/src/ggml-remotingbackend/backend-dispatched-metal.cpp (1)
  • ggml_backend_metal_get_device_context_fct (9-12)
ggml/src/ggml-remotingbackend/backend-dispatched.h (1)
  • backend_dispatch_command_name (52-88)
ggml/src/ggml-remotingbackend/shared/venus_cs_ggml-rpc.cpp (3)
ggml/src/ggml.c (7)
  • ggml_new_tensor_4d (1740-1749)
  • ggml_nbytes (1203-1226)
  • ggml_set_name (1808-1815)
  • ggml_tensor_overhead (1356-1358)
  • ggml_graph_overhead_custom (6700-6702)
  • ggml_init (1487-1527)
  • ggml_new_graph_custom (6708-6750)
ggml/src/ggml-remotingfrontend/venus_cs_ggml-rpc-front.cpp (2)
  • serialize_tensor (12-48)
  • serialize_tensor (13-13)
ggml/src/ggml-remotingbackend/venus_cs_ggml-rpc-back.cpp (2)
  • deserialize_tensor (33-68)
  • deserialize_tensor (34-34)
ggml/src/ggml-remotingbackend/shared/venus_cs.h (1)
ggml/src/ggml-remotingfrontend/virtgpu-utils.h (1)
  • FATAL (83-94)
ggml/src/ggml-remotingbackend/shared/apir_backend.h (2)
ggml/src/ggml-remotingbackend/backend-utils.h (1)
  • INFO (42-45)
ggml/src/ggml-remotingfrontend/virtgpu-utils.h (2)
  • INFO (35-44)
  • INFO (46-47)
ggml/src/ggml-remotingfrontend/virtgpu-utils.cpp (1)
ggml/src/ggml-remotingfrontend/virtgpu-utils.h (2)
  • INFO (35-44)
  • INFO (46-47)
🪛 Shellcheck (0.11.0)
prepare.backend.sh

[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.

(SC2148)

build.vulkan.sh

[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.

(SC2148)

build.sh

[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.

(SC2148)

build.remoting.sh

[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.

(SC2148)

build.backend.sh

[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.

(SC2148)


[warning] 13-13: Declare and assign separately to avoid masking return values.

(SC2155)

prepare.remoting.sh

[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.

(SC2148)

prepare.vulkan.sh

[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.

(SC2148)

prepare.sh

[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.

(SC2148)

run.ramalama.sh

[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.

(SC2148)

🔇 Additional comments (25)
ggml/src/ggml-remotingfrontend/include/venus_hw.h (1)

1-22: Vendoring and attribution look legitimate.

Based on my verification, the copyright and attribution appear legitimate. Venus is a Virtio-GPU protocol for Vulkan command serialization, with protocol definition hosted at venus-protocol and the renderer at virglrenderer. This code originates from the Chromium/ChromiumOS ecosystem, where it has been developed as part of established virtualization infrastructure.

The MIT-style license is appropriate for permissive open-source work. The attribution is clear and complete in the header.

ggml/src/ggml-remotingfrontend/virtgpu-utils.h (3)

1-29: LGTM!

The header setup, branch prediction macros, and bit manipulation utilities follow standard patterns and are correctly implemented.


34-48: LGTM!

The conditional compilation of the INFO macro based on NDEBUG is a standard pattern for debug logging that optimizes away in release builds.


126-133: Verify error handling is sufficient for your use case.

The function correctly handles EINTR by looping with the remaining time. However, other error codes from clock_nanosleep (such as EINVAL or EFAULT) are silently ignored. Consider whether these errors should be logged or handled.

If error handling is needed, consider:

 inline void
 os_time_sleep(int64_t usecs)
 {
    struct timespec time;
    time.tv_sec = usecs / 1000000;
    time.tv_nsec = (usecs % 1000000) * 1000;
-   while (clock_nanosleep(CLOCK_MONOTONIC, 0, &time, &time) == EINTR);
+   int ret;
+   while ((ret = clock_nanosleep(CLOCK_MONOTONIC, 0, &time, &time)) == EINTR);
+   if (ret != 0) {
+       ERROR("clock_nanosleep failed with error: %d", ret);
+   }
 }
ggml/src/ggml-remotingfrontend/virtgpu-utils.cpp (4)

1-5: LGTM!

The includes are appropriate for the utility implementations in this file.


33-41: LGTM!

The initialization correctly zeros the structure, sets the element size, and validates that node_size is a power of two >= 2.
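The power-of-two check mentioned above can be sketched as follows. This is a minimal illustration, not the code from virtgpu-utils.cpp; the helper name `is_pow2_at_least_2` is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true when n is a power of two and at least 2. */
static bool is_pow2_at_least_2(size_t n)
{
    /* A power of two has exactly one bit set, so n & (n - 1) is zero. */
    return n >= 2 && (n & (n - 1)) == 0;
}
```

A check of this kind lets index arithmetic on node sizes use shifts and masks instead of division.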


53-186: LGTM!

The sparse array implementation correctly handles:

  • Thread-safe lazy allocation using atomic compare-and-swap operations
  • Dynamic tree growth when indices exceed current capacity
  • Race conditions by freeing the losing allocation in _util_sparse_array_set_or_free_node
  • Level-based traversal for efficient sparse storage

The comments clearly explain the design choices, particularly around single-node growth for correctness.
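The lazy-allocation pattern described in the list above (allocate, try to publish with compare-and-swap, free the allocation if another thread won) can be sketched with C11 atomics. This is an illustration under assumed names (`lazy_slot`, `lazy_slot_get`), not the sparse array code from the PR.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical slot type: one lazily allocated node shared between threads. */
struct lazy_slot {
    _Atomic(void *) node;
};

/* Return the slot's node, allocating it on first use. */
static void *lazy_slot_get(struct lazy_slot *slot, size_t node_bytes)
{
    void *cur = atomic_load_explicit(&slot->node, memory_order_acquire);
    if (cur != NULL)
        return cur;

    void *fresh = calloc(1, node_bytes);
    if (fresh == NULL)
        return NULL;

    void *expected = NULL;
    if (atomic_compare_exchange_strong_explicit(&slot->node, &expected, fresh,
                                                memory_order_acq_rel,
                                                memory_order_acquire)) {
        return fresh;   /* this thread won the race and published its node */
    }

    free(fresh);        /* another thread published first: discard ours */
    return expected;    /* on failure, expected holds the winning pointer */
}
```

Freeing the losing allocation keeps the fast path lock-free at the cost of an occasional wasted `calloc` under contention.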


197-200: LGTM!

The breakpoint() function serves as a convenient debug hook. The inline INFO call and comment make the intent clear.

src/llama-model.cpp (1)

6447-6447: Early returns in print_info() functions are correct and necessary; the review comment's suggestion is based on a misconception.

The logging infrastructure does not automatically filter by log level in the default case. While callbacks can implement level-based filtering (as seen in common/common.cpp and tools/run/run.cpp), the default callback (llama_log_callback_default) ignores the level parameter and always prints to stderr. The early returns are the only mechanism preventing unconditional verbose output during model loading. Removing them without implementing a filtering callback would cause all model info to print unconditionally, breaking default behavior for CLI tools and applications that don't set a custom logging callback.

Likely an incorrect or invalid review comment.
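For reference, level-based filtering inside a log callback can look like the sketch below. The enum, struct, and callback signature here are hypothetical stand-ins and do not reproduce the llama.cpp logging API.

```c
#include <stdio.h>

/* Hypothetical stand-ins for a library's log levels. */
enum demo_log_level {
    DEMO_LOG_DEBUG = 0,
    DEMO_LOG_INFO  = 1,
    DEMO_LOG_WARN  = 2,
    DEMO_LOG_ERROR = 3
};

/* Hypothetical filter state passed through the callback's user_data pointer. */
struct demo_log_filter {
    enum demo_log_level min_level; /* messages below this level are dropped */
    int printed;                   /* count of messages that passed the filter */
};

static void demo_log_callback(enum demo_log_level level, const char *text, void *user_data)
{
    struct demo_log_filter *f = user_data;
    if (level < f->min_level)
        return;                    /* filtered out: nothing is written */
    fputs(text, stderr);
    f->printed++;
}
```

A default callback that ignores `level`, as described above, would skip the comparison and print every message.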

ggml/src/ggml-remotingfrontend/include/drm-uapi/drm.h (3)

66-1360: LGTM: DRM UAPI definitions are well-formed.

The struct, enum, and constant definitions follow standard kernel UAPI conventions with appropriate types for kernel/user-space compatibility. However, ensure these definitions match the kernel version used by the remoting backend to avoid ABI mismatches.


1361-1408: LGTM: Typedef aliases and header closing are correct.

The typedef aliases provide backward compatibility with older code, and the header closing with extern "C" and include guard is proper.


1025-1025: drm_mode.h include resolves from the system libdrm headers.

The #include "drm_mode.h" at line 1025 is not a missing dependency issue. This is a standard Linux kernel DRM UAPI header expected to come from the system's libdrm development package. The CMakeLists.txt correctly configures this by:

  • Adding /usr/include/libdrm/ to include directories
  • Linking against the system drm library
  • Documenting the dependency: # dnf install -y libdrm-devel

The repository intentionally omits standard kernel UAPI headers and only provides vendor-specific extensions (like virtgpu_drm.h). When libdrm-devel is installed, the compiler will locate drm_mode.h in the system include path during the local search initiated by the quoted include directive. This is proper design practice.

Likely an incorrect or invalid review comment.

ggml/src/ggml-remotingfrontend/include/drm-uapi/virtgpu_drm.h (2)

40-227: LGTM: Virtgpu struct and constant definitions are well-formed.

The virtgpu-specific command codes, flags, structs, and constants follow proper kernel UAPI conventions with appropriate sized types and clear field documentation.


229-276: LGTM: IOCTL definitions and header closing are correct.

The IOCTL definitions properly use DRM_COMMAND_BASE from drm.h and follow standard kernel UAPI patterns. Header closing is well-formed.

OWNERS (1)

1-13: LGTM!

The OWNERS file follows standard format and clearly defines approvers and reviewers.

ggml/src/ggml-remotingbackend/CMakeLists.txt (1)

1-21: LGTM!

The CMake configuration correctly sets up the remoting backend library with appropriate C++20 compilation flags.

.gitignore (1)

47-47: LGTM!

The simplified build-* pattern is clearer and aligns well with the new build scripts introduced in this PR.

ggml/src/ggml-metal/ggml-metal-device.m (1)

336-338: LGTM!

Disabling verbose debug logging in the pipeline compilation path is appropriate for reducing log noise in production builds.

Also applies to: 361-365

ggml/src/ggml-remotingfrontend/virtgpu-forward-metal.cpp (1)

3-20: Verify error handling is sufficient.

The function returns true unconditionally (line 19), even though the REMOTE_CALL macro populates a ret variable (line 11) that appears to indicate success or failure. If the macro doesn't FATAL on all error conditions, the function might return success after a failed remote call.

Based on the macro definition in ggml/src/ggml-remotingfrontend/virtgpu-forward-impl.h, the REMOTE_CALL macro does call FATAL on errors (lines 25, 28), so the unconditional return should be safe. However, please verify that all error paths are covered by the macro's validation.

ggml/src/ggml-remotingbackend/shared/venus_cs_ggml-rpc.h (1)

1-45: LGTM!

The header properly declares the RPC serialization interface. The explicit padding field (line 21) is good practice for ensuring consistent struct layout across different compilers and platforms.
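The benefit of an explicit padding field can be checked at compile time. The struct below is a hypothetical example, not the layout declared in venus_cs_ggml-rpc.h, and `static_assert` assumes a C11 compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire struct; field names are illustrative only. */
struct wire_tensor_hdr {
    uint64_t id;
    uint32_t type;
    uint32_t padding;   /* explicit: keeps 'size' 8-byte aligned with no implicit gap */
    uint64_t size;
};

/* Fail the build if a compiler lays the struct out differently. */
static_assert(sizeof(struct wire_tensor_hdr) == 24, "unexpected implicit padding");
static_assert(offsetof(struct wire_tensor_hdr, size) == 16, "size must start at byte 16");
```

With the padding spelled out, both sides of a serialization boundary agree on every byte, and the padding can be zeroed deliberately instead of carrying uninitialized memory.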

src/llama-kv-cache.cpp (1)

126-128: LGTM!

Disabling the verbose per-layer device logging is consistent with the PR's goal to reduce noisy debug output. The logging can be re-enabled if needed by uncommenting these lines.

CMakePresets.json (1)

33-34: LGTM!

The new remoting presets follow the established pattern and correctly enable the corresponding cache variables for the remoting frontend and backend builds.

ggml/CMakeLists.txt (1)

322-323: Public header exposure LGTM.

Adding include/ggml-remoting-frontend.h to public headers looks correct.

Please confirm the header actually exists at ggml/include/ggml-remoting-frontend.h in this PR.

ggml/src/ggml-backend-reg.cpp (1)

76-79: Remoting frontend include/registration LGTM; verify define wiring.

Use of GGML_USE_REMOTINGFRONTEND matches ggml_add_backend expectations, contingent on the CMake alias fix.

Also applies to: 207-210

ggml/src/ggml-remotingfrontend/ggml-metal-remoting.h (1)

1-16: LGTM!

The header is well-structured with clear type definitions and function declarations for the Metal remoting backend interface.

@topsail-bot

topsail-bot bot commented Nov 4, 2025

🟢 Test of 'mac_ai test test_ci' succeeded after 00 hours 09 minutes 53 seconds. 🟢

• Link to the test results.

• Link to the reports index.

Test configuration:

PR_POSITIONAL_ARGS: topsail
PR_POSITIONAL_ARG_0: topsail

@crc-org crc-org deleted a comment from openshift-ci bot Nov 4, 2025
@kpouget
Collaborator Author

kpouget commented Nov 4, 2025

test passed ❤️ , merging

@kpouget kpouget merged commit f57bace into crc-org:main Nov 4, 2025
2 of 3 checks passed
@kpouget kpouget deleted the rebase-b6945 branch November 4, 2025 15:04