Conversation

@kpouget (Collaborator) commented Aug 29, 2025

Summary by CodeRabbit

  • New Features
    • Added diffusion text-generation CLI and examples.
    • Introduced Granite and GPT-OSS chat formats with template kwargs and BOS/EOS controls.
    • New model-conversion toolkit with logits/embeddings verification, quantization, and HF upload helpers.
    • WebGPU (Dawn) build paths added; CANN runtime images introduced.
  • Bug Fixes
    • More robust tool-call argument parsing.
    • Safer tokenization and eval-callback handling.
    • Improved embeddings handling for SEP/EOS cases.
  • Documentation
    • New GGML ops matrix, expanded build guides (s390x, WebGPU), multimodal docs, and README updates.
  • Chores
    • CI/CD workflows added/updated; Vulkan/ROCm/MUSA versions refreshed; Makefile deprecated in favor of CMake.

lhez and others added 30 commits August 1, 2025 13:15
* support hunyuan_v1_dense

Signed-off-by: stevenkuang <stevenkuang@tencent.com>

* update hunyuan_moe to hunyuan_v1_moe

Signed-off-by: stevenkuang <stevenkuang@tencent.com>

* fix rope alpha assert and bos token

Signed-off-by: stevenkuang <stevenkuang@tencent.com>

* add blank line

Signed-off-by: stevenkuang <stevenkuang@tencent.com>

* Revert "update hunyuan_moe to hunyuan_v1_moe"

This reverts commit aa973ca21913aba77f6e81a935270ef7be222e75.

* use hunyuan_dense instead of hunyuan_v1_dense

Signed-off-by: stevenkuang <stevenkuang@tencent.com>

* fix hunyuan_moe chat template

Signed-off-by: stevenkuang <stevenkuang@tencent.com>

* remove leftover code

Signed-off-by: stevenkuang <stevenkuang@tencent.com>

* update hunyuan dense chat template

Signed-off-by: stevenkuang <stevenkuang@tencent.com>

* fix hunyuan dense vocab and chat template

Signed-off-by: stevenkuang <stevenkuang@tencent.com>

---------

Signed-off-by: stevenkuang <stevenkuang@tencent.com>
* vendor : update vendored copy of google/minja

Signed-off-by: Lennart Austenfeld <l.austenfeld@googlemail.com>

* Re-remove trailing whitespace

Signed-off-by: Lennart Austenfeld <l.austenfeld@googlemail.com>

* Remove another trailing whitespace

Signed-off-by: Lennart Austenfeld <l.austenfeld@googlemail.com>

---------

Signed-off-by: Lennart Austenfeld <l.austenfeld@googlemail.com>
* vulkan: optimizations for direct convolution

- Empirically choose a better tile size. Reducing BS_K/BS_NPQ helps fill
  the GPU. The new size should be amenable to using coopmat, too.
- Fix shmem bank conflicts. 16B padding should work with coopmat.
- Some explicit loop unrolling.
- Skip math/stores work for parts of the tile that are OOB.
- Apply fastdiv opt.
- Disable shuffles for NV.

* Three tiles sizes for CONV_2D, and a heuristic to choose

* reallow collectives for pre-Turing

* make SHMEM_PAD a spec constant

* fixes for intel perf - no shmem padding, placeholder shader core count

* shader variants with/without unrolling

* 0cc4m's fixes for AMD perf

Co-authored-by: 0cc4m <picard12@live.de>

---------

Co-authored-by: 0cc4m <picard12@live.de>
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
- Increase tile size for k-quants, to match non-k-quants
- Choose more carefully between large and medium tiles, considering how it
  interacts with split_k
- Allow larger/non-power of two split_k, and make the splits a multiple of 256
- Use split_k==3 when more than 1/2 but at most 2/3 of the SMs would have been used (see the sketch below)
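A minimal sketch of how such a heuristic could be expressed; `workgroups_unsplit` and `num_sms` are illustrative names, not the actual Vulkan backend variables:

```cpp
#include <cstdint>

// Hypothetical split_k choice: if running the matmul unsplit would occupy more
// than 1/2 but at most 2/3 of the streaming multiprocessors, splitting the K
// dimension three ways roughly fills the GPU.
static uint32_t pick_split_k(uint32_t workgroups_unsplit, uint32_t num_sms) {
    if (workgroups_unsplit > num_sms / 2 && workgroups_unsplit <= (2 * num_sms) / 3) {
        return 3;
    }
    return 1; // defer to the other tile-size/split_k heuristics
}
```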
* torch is not required for convert_hf_to_gguf_update

* add --check-missing parameter

* check that pre-tokenizer hashes are up-to-date
* cuda, sycl : fix batched gemm when ne02 == 1 && ne03 > 1

ggml-ci

* cont : fix cont types

ggml-ci

* cont : adopt variable names and comment from the other branch
…5040)

This commit removes the right alignment of the `n_stream` value in the
log message in the `llama_kv_cache_unified` constructor.

The motivation for this change is to enhance the readability of the log
message. Currently the output looks like this:
```console
llama_kv_cache_unified: size = 2048.00 MiB (  4096 cells,  32 layers,  1/ 1 seqs), K (f16): 1024.00 MiB, V (f16): 1024.00 MiB
```
Notice that the `n_stream` value is right aligned, which makes it a
little harder to read.

With the change in this commit, the output will look like this:
```console
llama_kv_cache_unified: size = 2048.00 MiB (  4096 cells,  32 layers, 1/1 seqs), K (f16): 1024.00 MiB, V (f16): 1024.00 MiB
```
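A self-contained illustration of the kind of format-string change involved (the real log call in `llama_kv_cache_unified` differs; the width-2 field here is an assumption):

```cpp
#include <cstdio>

int main() {
    const int n_stream  = 1;
    const int n_seq_max = 1;
    std::printf("%2d/%2d seqs\n", n_stream, n_seq_max); // right-aligned: " 1/ 1 seqs"
    std::printf("%d/%d seqs\n",   n_stream, n_seq_max); // no padding:    "1/1 seqs"
    return 0;
}
```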
… text_config) (#15051)

* basic kimi-vl textmodel conversion

* check config["text_config"] for special tokens
…(#14994)

* imatrix : use a single count for dense 3d tensors

* imatrix : fix 3d activations when model tensor is 2d

* imatrix : fix 3d tensor counts
* imatrix : use GGUF by default

* imatrix : use GGUF regardless of the output filename

The legacy format can only be produced with --output-format dat
* Add parameter buffer pool, batching of submissions, refactor command building/submission

* Add header for linux builds

* Free staged parameter buffers at once

* Format with clang-format

* Fix thread-safe implementation

* Use device implicit synchronization

* Update workflow to use custom release

* Remove testing branch workflow
* model: Add GLM 4.5 (#14921)

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Merge in PR suggestions

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* model: Add GLM 4.5 family of models (#14921)

1. Updated tensor_mapping.py with NextN tensor mappings

- Added proper tensor mappings for all NextN/MTP tensors in /Users/samm/git/llama.cpp/gguf-py/gguf/tensor_mapping.py
- Added mappings for: eh_proj, embed_tokens, enorm, hnorm, shared_head.head, shared_head.norm

2. Added num_nextn_predict_layers configuration

- Added LLM_KV_NUM_NEXTN_PREDICT_LAYERS constant to llama-arch.h and llama-arch.cpp
- Added num_nextn_predict_layers field to llama_hparams struct
- Updated GLM4_MOE parameter loading in llama-model.cpp to read this parameter (see the sketch after this list)
- Modified tensor loading logic to conditionally load NextN tensors based on num_nextn_predict_layers
- Added GGUF writer support in gguf_writer.py with add_num_nextn_predict_layers() method
- Updated conversion script to extract and write this parameter from HuggingFace config

3. Added FIM tokens for GLM4_MOE

- Added GLM-4.5's FIM tokens to llama-vocab.cpp:
  - <|code_prefix|> for FIM_PRE
  - <|code_suffix|> for FIM_SUF
  - <|code_middle|> for FIM_MID

4. Removed manual NextN tensor handling

- Removed the special-case handling in convert_hf_to_gguf.py that manually mapped NextN tensors
- NextN tensors are now handled automatically through the proper tensor mapping system
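As a rough illustration of item 2 above, here is a sketch of how the new layer count could be read back from a converted GGUF file using ggml's GGUF API; the key name `glm4moe.num_nextn_predict_layers` is an assumption, and the actual loading happens inside `llama-model.cpp` rather than in a standalone helper like this:

```cpp
#include "gguf.h"
#include <cstdint>

// Hypothetical: read the GLM 4.5 NextN/MTP layer count from a GGUF file,
// returning 0 when the key is absent (older conversions).
static uint32_t read_nextn_layers(const char * path) {
    struct gguf_init_params params = { /*no_alloc =*/ true, /*ctx =*/ nullptr };
    struct gguf_context * gctx = gguf_init_from_file(path, params);
    if (!gctx) {
        return 0;
    }
    uint32_t n_nextn = 0;
    const int64_t key = gguf_find_key(gctx, "glm4moe.num_nextn_predict_layers");
    if (key >= 0) {
        n_nextn = gguf_get_val_u32(gctx, key);
    }
    gguf_free(gctx);
    return n_nextn;
}
```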

* glm 4.5 update tensors names

* model: glm 4.5 apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* model: glm 4.5 apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* model: glm 4.5 apply suggestions from code review

* Apply suggestions from code review

* patch broken chat template

* typings fix

* add TENSOR_SKIP flag


Co-authored-by: Diego Devesa <slarengh@gmail.com>

* Update src/llama-model-loader.h

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Diego Devesa <slarengh@gmail.com>
ngxson and others added 22 commits August 26, 2025 12:54
* convert : fix tensor naming conflict for llama 4 vision

* convert ok

* support kimi vision model

* clean up

* fix style

* fix calc number of output tokens

* refactor resize_position_embeddings

* add test case

* rename build fn

* correct a small bug
* metal : optimize FA vec for large heads and sequences

* metal : adjust small-batch mul mv kernels

ggml-ci

* batched-bench : fix total speed computation

ggml-ci

* cont : add comments

ggml-ci
This commit adds two targets to the Makefile for quantizing
Quantization Aware Trained (QAT) models to the Q4_0 format.

The motivation for this is that these targets set the token embedding and the
output tensor data types to Q8_0 instead of the default Q6_K. This is
something that we wish to enforce for QAT Q4_0 models that are to be
uploaded to ggml-org on Hugging Face to guarantee the best quality.
This patch improves GEMM for the FP32 data type on PowerPC.

Implements GEMM on large blocks with configurable block sizes mc, nc, kc
(default: 256, 256, 256).
Packing function optimized to access blocks as per memory layout.
GEMM optimized to work on larger blocks.
Isolated packing from GEMM operations for better MMA utilization.
(A rough sketch of this blocking scheme follows this commit message.)

Verified functionality and correctness using llama-cli and a standalone
test case (performs matmul and compares the final matrix C result with the base).

Minor code refactoring changes:
Replace macro with inline function
Code indentation made consistent with 4 spaces

Performance Testing:

Observed 50% ~ 70% improvement in Prompt Processing Speed measured using
llama-bench with the Meta-Llama3-8B FP32 model. Similar gains observed with the
Mistral-7b-Instruct-v0.3 model.

| Model | Size | Params | Backend | Threads | Test | Patch | Base |
|---|---|---|---|---|---|---|---|
| llama 8B all F32 | 29.92 GiB | 8.03 B | CPU | 20 | pp512 | 98.58 | 60.3 |
| llama 8B all F32 | 29.92 GiB | 8.03 B | CPU | 20 | pp1024 | 95.88 | 57.36 |
| llama 8B all F32 | 29.92 GiB | 8.03 B | CPU | 20 | pp2048 | 85.46 | 53.26 |
| llama 8B all F32 | 29.92 GiB | 8.03 B | CPU | 20 | pp4096 | 68.66 | 45.78 |
| llama 8B all F32 | 29.92 GiB | 8.03 B | CPU | 20 | pp6144 | 57.35 | 40.44 |

25 ~ 30% improvement in llama-batched-bench with Meta-Llama3-8B in
Prompt Processing Speed for large prompts (256, 512, 1024, 2048, 4096 tokens) with various
batch sizes (1, 2, 4, 8, 16).

Signed-off-by: Shalini Salomi Bodapati <Shalini.Salomi.Bodapati@ibm.com>
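A generic, plain-C++ sketch of the mc/nc/kc blocking scheme described above (not the PowerPC/MMA implementation; packing is only hinted at in a comment):

```cpp
#include <algorithm>
#include <cstddef>

// Illustrative blocked SGEMM: C (MxN) += A (MxK) * B (KxN), all row-major.
// Work proceeds over kc x nc panels of B and mc x kc panels of A, mirroring the
// configurable block sizes (default 256/256/256) mentioned in the commit.
static void gemm_blocked(const float * A, const float * B, float * C,
                         size_t M, size_t N, size_t K,
                         size_t mc = 256, size_t nc = 256, size_t kc = 256) {
    for (size_t jc = 0; jc < N; jc += nc) {
        const size_t nb = std::min(nc, N - jc);
        for (size_t pc = 0; pc < K; pc += kc) {
            const size_t kb = std::min(kc, K - pc);
            for (size_t ic = 0; ic < M; ic += mc) {
                const size_t mb = std::min(mc, M - ic);
                // The real implementation packs these A/B blocks into a layout
                // suited to the MMA units before running the micro-kernel.
                for (size_t i = 0; i < mb; ++i) {
                    for (size_t j = 0; j < nb; ++j) {
                        float sum = 0.0f;
                        for (size_t p = 0; p < kb; ++p) {
                            sum += A[(ic + i) * K + (pc + p)] * B[(pc + p) * N + (jc + j)];
                        }
                        C[(ic + i) * N + (jc + j)] += sum;
                    }
                }
            }
        }
    }
}
```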
…#15592)

The original implementation unconditionally returned true for this operation, leading to a failure when the tensor's first dimension (ne[0]) was not a multiple of WARP_SIZE. This caused a GGML_ASSERT(ncols % WARP_SIZE == 0) failure in ggml-sycl/norm.cpp.

This change updates the ggml_backend_sycl_device_supports_op check to correctly return true for GGML_OP_RMS_NORM only when the first dimension of the tensor is a multiple of WARP_SIZE, ensuring the operation can be performed without error.
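A minimal sketch of the described check; the helper name and the local WARP_SIZE constant are illustrative, the real value comes from the SYCL backend headers:

```cpp
#include "ggml.h"

static constexpr int64_t WARP_SIZE = 32; // illustrative; backend-defined in ggml-sycl

// Mirrors the described ggml_backend_sycl_device_supports_op() change: report
// support for GGML_OP_RMS_NORM only when the tensor's first dimension is a
// multiple of WARP_SIZE, matching GGML_ASSERT(ncols % WARP_SIZE == 0) in
// ggml-sycl/norm.cpp.
static bool sycl_supports_rms_norm(const ggml_tensor * op) {
    return op->ne[0] % WARP_SIZE == 0;
}
```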
* add fused group_norm/norm, mul, add

* fix spacing

* revert rms_norm logic

* fix trailing whitespace
This commit updates the bash completion script to include the -m
short option for the --model argument.

The motivation for this is that currently tab completion only works for the
full --model option, and it is nice to have it work for the short option
as well.
* ggml-cpu : add basic RVV support for vector f32 ops

* ggml-cpu : add RVV support for f32 softmax
* CANN(flash-attn): refactor mask handling and improve performance

1. Refactored the mask computation in Flash Attention, unified the logic without separating prefill and decode.
2. Optimized performance in non-alibi scenarios by reducing one repeat operation.
3. Updated operator management to explicitly mark unsupported cases on 310P devices and when dim is not divisible by 16.

Signed-off-by: noemotiovon <757486878@qq.com>

* [CANN]: fix review

Signed-off-by: noemotiovon <757486878@qq.com>

* [CANN]: Optimization FA BNSD to BSND

Signed-off-by: noemotiovon <757486878@qq.com>

---------

Signed-off-by: noemotiovon <757486878@qq.com>
@openshift-merge-robot

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

coderabbitai bot commented Aug 29, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

Introduces new diffusion CLI and model-conversion tooling, extends chat formats (Granite, GPT-OSS), adds speculative drafting across incompatible vocabularies, broadens CLI/common params (diffusion, LR/optimizer, server/API), updates build/CI (Makefile stub, CMake presets, multiple workflows), revises Dockerfiles, refreshes docs/templates/labels, and adds ownership/config files.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Build system & packaging**<br>`CMakeLists.txt`, `CMakePresets.json`, `Makefile`, `common/CMakeLists.txt` | Version var tweaks, log build type, remove KOMPUTE deprecation, add gcc/linux presets, add remoting presets; replace Makefile with CMake-only stub; CURL linking via CURL_LIBRARIES; llguidance ext updated. |
| **Common params, CLI, and utilities**<br>`common/common.h`, `common/common.cpp`, `common/arg.cpp`, `common/json-schema-to-grammar.cpp` | Adds diffusion, LR/optimizer, kv_unified, API prefix, template kwargs, EOG bias handling; new string utility; tokenizer overflow guard; centralized tensor buffer overrides; deprecate defrag-thold; replace custom string_view with std::string_view. |
| **Chat formats & templating**<br>`common/chat.h`, `common/chat.cpp`, `common/chat-parser.cpp` | Adds Granite and GPT-OSS formats; new template inputs (add_bos/eos, kwargs, extra_context); reasoning format AUTO + mapper; safer tool_call arguments serialization. |
| **Speculative drafting API**<br>`common/speculative.h`, `common/speculative.cpp` | Split into target/draft contexts, add vocab-compat checks, retokenization with replacement map; new init signature and replacement API. |
| **Conversion and tokenizer tooling**<br>`convert_hf_to_gguf_update.py`, `convert_lora_to_gguf.py` | Add --check-missing mode; expand models/hashes; robust download/token logic; adjust ModelBase.load_hparams call signature. |
| **Examples: diffusion**<br>`examples/diffusion/*` | New diffusion CLI, CMake, and README for diffusion-based generation with multiple algorithms/schedules. |
| **Examples: model-conversion suite**<br>`examples/model-conversion/**` | New end-to-end conversion/quantization/verification workflows, CMake target (llama-logits), Makefile, scripts (HF hub ops, logits/embeddings checks, perplexity), requirements. |
| **Examples: misc**<br>`examples/embedding/embedding.cpp`, `examples/eval-callback/eval-callback.cpp`, `examples/*.sh`, `examples/llama.vim`, `examples/batched.swift/README.md`, `examples/lookahead/README.md` | Embedding: cls_sep handling, unified KV when n_parallel=1; eval-callback: I64 print, NaN check, empty-input guard; several shebangs to env bash; sample flags updated; added sample commands. |
| **Docker & DevOps containers**<br>`.devops/*.Dockerfile`, `.devops/tools.sh` | New CANN multi-stage Dockerfile; CPU Dockerfile simplifies arch handling; CUDA adds pip --break-system-packages; MUSA and ROCm versions bumped; Vulkan SDK install switches to tarball + env; tools.sh shebang via env. |
| **CI/workflows**<br>`.github/workflows/*`, `.devops/cloud-v-pipeline` | Add packaging, RISC-V native, pre-tokenizer-hashes, copilot setup, ops-docs sync; update main/release workflows (ccache action swap, RPATH flags, runners, WebGPU/Dawn jobs); disable Vulkan cross-builds; remove old Jenkins node. |
| **GitHub repo meta**<br>`.github/ISSUE_TEMPLATE/*`, `.github/labeler.yml`, `.github/copilot-instructions.md`, `CODEOWNERS`, `OWNERS` | Update backend lists (add OpenCL, zDNN; remove Kompute); add labels for zDNN/OpenCL; add Copilot instructions; adjust code ownership; add OWNERS approvers/reviewers. |
| **Docs: backends & build**<br>`docs/backend/*.md`, `docs/build*.md`, `docs/docker.md` | CANN note on NZ weights; SYCL flag description tweak; build docs updated (curl deps, Vulkan/WebGPU sections, Windows/Linux notes); Docker images list changes (SYCL→Vulkan); MUSA tag bump in CI doc. |
| **Docs: multimodal**<br>`docs/multimodal/*.md` | Add MiniCPM-V 4/4.5 and Voxtral docs; relocate legacy MiniCPM scripts; remove image norm flags; repo URL updates. |
| **Ops table & enforcement**<br>`docs/ops.md`, `.github/workflows/update-ops-docs.yml`, `scripts/create_ops_docs.py` (referenced) | New ops support matrix doc and workflow to enforce sync with generator script. |
| **Repo config**<br>`.clang-format`, `.gitignore`, `.gitmodules`, `README.md`, `ci/run.sh`, `build-xcframework.sh`, `ci/README.md` | Format config categories updated; unignore models/templates and ignore .ccache; remove kompute submodule; README reorg (backends incl. WebGPU, hot topics); ci scripts: webgpu flag, wget resume, prompt tweak; env shebangs. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  actor U as User
  participant CLI as llama-diffusion-cli
  participant L as LLaMA Model (ctx)
  participant S as Samplers/CFG

  U->>CLI: Invoke with model, prompt, diffusion params
  CLI->>L: Load model + vocab, create context
  CLI->>CLI: Tokenize/format input (optional chat template)
  loop steps 1..N
    CLI->>L: Decode (conditional/unconditional if CFG)
    L-->>CLI: Logits
    CLI->>S: Sample/top-k/p + algorithm scoring
    S-->>CLI: Tokens to transfer/replace
    CLI->>CLI: Update masked sequence
    alt visual mode
      CLI-->>U: Render progress/text
    end
  end
  CLI->>L: Detokenize final tokens
  L-->>CLI: Text
  CLI-->>U: Output generated text + timings
```
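A very rough sketch of the decode/sample loop in the diagram above, using the public llama.h batch API; the masking policy and per-step resampling here are placeholders, not the actual algorithm in examples/diffusion:

```cpp
#include "llama.h"
#include <vector>

// Hypothetical single diffusion step: decode the partially masked sequence,
// then resample the positions that are still masked. A real implementation
// also scores candidates, decides which positions to "transfer", and resets
// or bypasses the KV cache between steps.
static void diffusion_step(llama_context * ctx, llama_sampler * smpl,
                           std::vector<llama_token> & tokens,
                           const std::vector<bool> & is_masked) {
    llama_batch batch = llama_batch_init((int32_t) tokens.size(), 0, 1);
    for (size_t i = 0; i < tokens.size(); ++i) {
        batch.token[i]     = tokens[i];
        batch.pos[i]       = (llama_pos) i;
        batch.n_seq_id[i]  = 1;
        batch.seq_id[i][0] = 0;
        batch.logits[i]    = true; // request logits at every position
    }
    batch.n_tokens = (int32_t) tokens.size();

    if (llama_decode(ctx, batch) == 0) {
        for (size_t i = 0; i < tokens.size(); ++i) {
            if (is_masked[i]) {
                tokens[i] = llama_sampler_sample(smpl, ctx, (int32_t) i);
            }
        }
    }
    llama_batch_free(batch);
}
```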
```mermaid
sequenceDiagram
  autonumber
  actor App as Caller
  participant Spec as common_speculative
  participant T as Target ctx_tgt
  participant D as Draft ctx_dft

  App->>Spec: init(ctx_tgt, ctx_dft)
  Spec->>Spec: Check vocab compatibility
  App->>Spec: add_replacement_tgt_dft(map)
  App->>Spec: gen_draft(prompt_tgt)
  alt compatible vocab
    Spec->>D: Decode on draft using target tokens
  else incompatible
    Spec->>Spec: Detokenize target → replace → retokenize for draft
    Spec->>D: Decode on draft with prompt_dft
  end
  D-->>Spec: Draft tokens
  alt incompatible
    Spec->>Spec: Detokenize draft → replace → retokenize for target
  end
  Spec-->>App: Draft tokens in target vocab
```
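For the incompatible-vocabulary branch in the diagram above, a minimal sketch of the detokenize → string-replace → retokenize round trip using the common helpers; the replacement map and its application are simplified assumptions:

```cpp
#include "common.h" // common_tokenize / common_detokenize
#include <map>
#include <string>
#include <vector>

// Hypothetical: convert target-vocab tokens into draft-vocab tokens when the
// two models do not share a vocabulary (illustration only).
static std::vector<llama_token> retokenize_for_draft(
        llama_context * ctx_tgt, llama_context * ctx_dft,
        const std::vector<llama_token> & tokens_tgt,
        const std::map<std::string, std::string> & replacements) {
    std::string text = common_detokenize(ctx_tgt, tokens_tgt);

    // Apply target -> draft string replacements (e.g. differing special tokens).
    for (const auto & [from, to] : replacements) {
        if (from.empty()) {
            continue;
        }
        for (size_t pos = 0; (pos = text.find(from, pos)) != std::string::npos; pos += to.size()) {
            text.replace(pos, from.size(), to);
        }
    }

    // Retokenize with the draft model's vocabulary.
    return common_tokenize(ctx_dft, text, /*add_special=*/false, /*parse_special=*/true);
}
```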

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Suggested reviewers

  • cfergeau
  • praveenkumar

Poem

Hop, hop—new gears align,
Diffusion dreams and chats combine.
Two vocab burrows, retokenize!
Our CI sky has brighter skies.
Docker carts roll, presets chime—
gguf to stars, one hop at a time. 🐇✨

@openshift-ci openshift-ci bot requested review from cfergeau and gbraad August 29, 2025 12:09
openshift-ci bot commented Aug 29, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign cfergeau for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kpouget kpouget closed this Aug 29, 2025
@kpouget kpouget deleted the reshape-b6298 branch August 29, 2025 13:30