Conversation

@chenghuaWang (Collaborator) commented Dec 30, 2025

Summary by CodeRabbit

  • New Features

    • Config-driven layer-based model splitting with sequence-length-aware split graphs for QNN AOT.
    • New graph assembly and capture in the QNN environment, plus context access and broader static-weight recognition.
  • Refactor

    • Augmented AOT compilation pipeline with additional graph-splitting and lowering passes for deterministic, pattern-driven processing.
  • Chores

    • Minor IR convenience alias added for value pointers.


@coderabbitai bot (Contributor) commented Dec 30, 2025

📝 Walkthrough

Walkthrough

Adds configuration-driven LLM graph splitting and a new LLM-to-QNN lowering pass to the QNN AOT pipeline; updates pipeline ordering to run SplitLLMGraphPass, MarkTensorIOPass, then LLM2QnnLoweringPass; adds a Val ptr alias and a layers field in the example config.

Changes

  • Configuration (examples/qwen3_qnn_aot/qnn_aot_cfg.json): Added "layers": 28 under quant_recipe.llm_recipe.
  • AOT Pipeline (mllm/backends/qnn/aot/passes/AOTPipeline.cpp): Added includes and inserted SplitLLMGraphPass, MarkTensorIOPass, and LLM2QnnLoweringPass into the llm_recipe pipeline, adjusting pass ordering.
  • New Lowering Pass (mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp, .../LLM2QnnLoweringPass.cpp): New LLM2QnnLoweringPass that discovers and validates model subgraphs, sorts them, matches and rewrites LinalgIROp via registered QNN AOT patterns, compiles per-subgraph AOT graphs, and exposes createLLM2QnnLoweringPass().
  • Graph Splitting Pass (mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp): Reworked to read layers and split_graphs, compute seq_len, create split-graph placeholders, attach qnn_context_name/qnn_graph_name, relocate ops into named split subgraphs, rewire CallGraphOp/IO, and prune obsolete subgraphs.
  • QNN Wrappers & Env (mllm/backends/qnn/aot/QnnWrappersAPI.cpp, .../QnnWrappersAPI.hpp): Added a QnnAOTGraph constructor/factory, graph capture in the environment, a context accessor, graph/tensor creation and registration, and static-weight handling for the "constant" attribute; exposed context storage on QnnAOTGraph.
  • IR Convenience Alias (mllm/compile/ir/Node.hpp): Added the public type alias using ptr_t = val_ptr_t; inside class Val.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Pipeline as AOT Pipeline
    participant Config as AOTCompileContext
    participant Split as SplitLLMGraphPass
    participant MarkIO as MarkTensorIOPass
    participant Lower as LLM2QnnLoweringPass
    participant Graph as Model/Subgraphs

    Pipeline->>Split: run(module)
    Split->>Config: read quant_recipe.layers\nand split_graphs
    Split->>Graph: locate "model" subgraph\ncompute seq_len from inputs
    Split->>Split: create split placeholders\nassign qnn_context_name/qnn_graph_name
    Split->>Graph: move LinalgIROp into\ncorresponding split subgraphs
    Split->>Graph: rewire CallGraphOp and IO\nprune obsolete subgraphs
    Split->>Pipeline: return success
    Pipeline->>MarkIO: run(on updated graph)
    MarkIO->>Graph: mark tensor IO attributes
    MarkIO->>Pipeline: return success
    Pipeline->>Lower: run(on marked graph)
    Lower->>Graph: iterate ordered subgraphs\nmatch & rewrite LinalgIROp with patterns
    Lower->>Pipeline: return success

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • liang1232018
  • oreomaker
  • yirongjie

Poem

🐇 I hopped through graphs both wide and deep,
I split the layers where the LLMs sleep,
I tagged each tensor, named each thread,
Then lowered ops so models tread —
A tiny rabbit’s hop—now builds won't weep!

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

  • Description check ⚠️ Warning: No pull request description was provided by the author; the template requires a clear description of changes, objectives, and contribution context. Resolution: provide a comprehensive description following the repository template, including the purpose of the changes, the problems they solve, and relevant implementation details.
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 4.76%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve coverage.

✅ Passed checks (1 passed)

  • Title check ✅ Passed: The title clearly and specifically describes the main changes: adding LLM2QnnLoweringPass and updating graph splitting logic, which aligns with the substantial modifications across multiple files.
✨ Finishing touches
  • 📝 Generate docstrings


@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 5

🧹 Nitpick comments (6)
mllm/backends/qnn/aot/passes/AOTPipeline.cpp (1)

3-3: Unused include while pass is disabled.

The LLM2QnnLoweringPass.hpp header is included but createLLM2QnnLoweringPass() is commented out at line 27. Consider either removing the include until the pass is enabled, or adding a comment explaining why it's staged for future activation.

mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp (1)

16-27: Consider explicitly deleting copy/move operations.

Static analysis flags that the class defines a destructor but not other special member functions. Since Pass objects typically shouldn't be copied during pipeline execution, consider explicitly deleting them:

🔎 Proposed fix
 class LLM2QnnLoweringPass final : public ir::Pass {
  public:
   LLM2QnnLoweringPass();
 
   ~LLM2QnnLoweringPass() override = default;
+
+  LLM2QnnLoweringPass(const LLM2QnnLoweringPass&) = delete;
+  LLM2QnnLoweringPass& operator=(const LLM2QnnLoweringPass&) = delete;
+  LLM2QnnLoweringPass(LLM2QnnLoweringPass&&) = delete;
+  LLM2QnnLoweringPass& operator=(LLM2QnnLoweringPass&&) = delete;
 
   uint8_t run(const ir::node_ptr_t& op) override;
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp (2)

17-25: Remove redundant TODO comments.

The repeated TODO comments add noise without additional value. A single TODO with a clear description is sufficient.

🔎 Proposed fix
 LLM2QnnLoweringPass::LLM2QnnLoweringPass() {
   named_pattern_.insert(QnnAOTAddPattern::create());
-  // TODO reg other patterns here.
-  // TODO reg other patterns here.
-  // TODO reg other patterns here.
-  // TODO reg other patterns here.
-  // TODO reg other patterns here.
-  // TODO reg other patterns here.
+  // TODO: Register additional patterns (e.g., Mul, MatMul, Softmax, etc.)
 }

90-96: Avoid creating regex inside loop; reuse the existing pattern.

The regex at line 92 is redundant—model_pattern (line 74) already matches this pattern. Reuse it to avoid repeated compilation.

🔎 Proposed fix
-  // Validate that at least one model.x.sN subgraph exists (required for the lowering)
-  // We don't require specifically model.0.s32, but any model.x.sN pattern
-  bool has_valid_subgraph = false;
-  for (const auto& [name, _] : subgraph_map_) {
-    if (std::regex_match(name, std::regex(R"(^model\.\d+\.s\d+$)"))) {
-      has_valid_subgraph = true;
-      break;
-    }
-  }
+  // Validate that at least one model.x.sN subgraph exists (required for the lowering)
+  // subgraph_map_ excludes "model", so all entries must match model.x.sN
+  bool has_valid_subgraph = !subgraph_map_.empty();
mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp (2)

20-22: Use more descriptive parameter and variable names.

Per static analysis and readability best practices, rename g to subgraph and _ to writer for clarity.

🔎 Proposed fix
 void recursiveAttachGraphNameAndContextName(const ir::IRContext::ptr_t& ctx, const std::string& qnn_context_name,
-                                            const std::string& qnn_graph_name, ir::graph::SubGraphOp::ptr_t& g) {
-  auto _ = ir::IRWriter(ctx, g->getTopRegion());
-  _.walk<ir::Op>([&](ir::IRWriter& w /*writer*/, const ir::Op::ptr_t& owo) -> ir::IRWriter::WalkResult {
+                                            const std::string& qnn_graph_name, ir::graph::SubGraphOp::ptr_t& subgraph) {
+  auto writer = ir::IRWriter(ctx, subgraph->getTopRegion());
+  writer.walk<ir::Op>([&](ir::IRWriter& /* unused */, const ir::Op::ptr_t& op) -> ir::IRWriter::WalkResult {

186-188: Use a more descriptive variable name.

The variable wwww is unclear. Consider renaming to graph_writer or subgraph_writer for clarity.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7c7c10c and cdb7387.

📒 Files selected for processing (7)
  • examples/qwen3_qnn_aot/qnn_aot_cfg.json
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp
  • mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp
  • mllm/compile/ir/Node.hpp
  • pymllm/backends/qualcomm/transformers/static_qwen3_quantizer.py
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/compile/ir/Node.hpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
  • mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/compile/ir/Node.hpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
  • mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/compile/ir/Node.hpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
  • mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
  • mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
🧬 Code graph analysis (4)
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp (3)
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp (1)
  • LLM2QnnLoweringPass (17-25)
mllm/compile/ir/Node.hpp (2)
  • op (470-470)
  • op (472-472)
mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.hpp (1)
  • MergeLLMHeadIntoMainGraphPass (9-22)
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp (3)
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp (3)
  • LLM2QnnLoweringPass (18-18)
  • LLM2QnnLoweringPass (20-20)
  • op (22-22)
mllm/compile/ir/Node.hpp (12)
  • create (267-302)
  • create (267-267)
  • create (371-403)
  • create (371-371)
  • op (470-470)
  • op (472-472)
  • name (304-304)
  • name (308-308)
  • _ (65-65)
  • _ (65-65)
  • region (231-231)
  • region (324-324)
mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp (2)
  • run (58-231)
  • run (58-58)
mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp (5)
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp (2)
  • run (27-147)
  • run (27-27)
mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp (2)
  • run (16-109)
  • run (16-16)
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp (1)
  • op (22-22)
mllm/backends/qnn/aot/passes/SplitLLMGraphPass.hpp (1)
  • op (17-17)
mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.hpp (1)
  • op (17-17)
mllm/backends/qnn/aot/passes/AOTPipeline.cpp (4)
mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp (2)
  • createSplitLLMGraphPass (233-233)
  • createSplitLLMGraphPass (233-233)
mllm/backends/qnn/aot/passes/SplitLLMGraphPass.hpp (1)
  • createSplitLLMGraphPass (20-20)
mllm/backends/qnn/aot/passes/MarkTensorIO.cpp (2)
  • createMarkTensorIOPass (120-120)
  • createMarkTensorIOPass (120-120)
mllm/backends/qnn/aot/passes/MarkTensorIO.hpp (1)
  • createMarkTensorIOPass (20-20)
🪛 Clang (14.0.6)
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp

[error] 16-16: class 'LLM2QnnLoweringPass' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)

mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp

[error] 20-20: 3 adjacent parameters of 'recursiveAttachGraphNameAndContextName' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 21-21: parameter name 'g' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 22-22: variable name '_' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 37-37: parameter name 'g' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 38-38: variable name '_' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-x86
  • GitHub Check: build-macos
  • GitHub Check: build-android
🔇 Additional comments (5)
examples/qwen3_qnn_aot/qnn_aot_cfg.json (1)

18-18: LGTM!

The new layers field correctly specifies the total number of transformer layers for the Qwen3 model, which is consumed by SplitLLMGraphPass for configuration-driven graph splitting.

mllm/compile/ir/Node.hpp (1)

194-195: LGTM!

The ptr_t alias follows the established pattern used by Region (line 119) and Op (line 155), providing consistency across IR node types.

mllm/backends/qnn/aot/passes/AOTPipeline.cpp (1)

25-27: Pass ordering looks correct.

The pipeline flow is logical: graph marking → op naming → head merging → quantization → graph splitting → tensor I/O marking. The commented-out createLLM2QnnLoweringPass() indicates future work is pending.

mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp (1)

119-143: Verify behavior: WALK_BREAK on non-QNN op skips remaining ops in subgraph.

When encountering an op without use_qnn attribute, the code logs a warning and breaks. This leaves remaining ops in that subgraph unprocessed. If this is intentional fail-fast behavior, consider using MLLM_ERROR instead of MLLM_WARN for clarity. Otherwise, consider WALK_CONTINUE to skip and proceed.

mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp (1)

37-54: The iterator safety mechanism is properly implemented. IRWriter::removeOp() correctly sets is_iterator_modified_ = true (line 310 in Node.cpp), and the walk loop respects this flag by skipping the increment when the iterator has already been advanced by erase(). This pattern is safe and correctly uses the framework's iterator modification tracking.
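The flag-guarded loop described above is an instance of the standard erase-during-iteration idiom. A minimal, generic sketch with std::list (the helper name and container are illustrative, not the repository's IRWriter API):

```cpp
#include <list>

// When an element is removed, erase() already advances the iterator, so the
// loop must skip its own increment for that step. This mirrors the
// is_iterator_modified_ flag that IRWriter's walk loop checks.
int removeEvens(std::list<int>& values) {
  int removed = 0;
  for (auto it = values.begin(); it != values.end();) {
    if (*it % 2 == 0) {
      it = values.erase(it);  // erase() returns the next iterator; do not ++it
      ++removed;
    } else {
      ++it;  // iterator untouched; advance normally
    }
  }
  return removed;
}
```

Incrementing after an erase() would skip an element or dereference an invalidated iterator, which is exactly the hazard the pass's flag avoids.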

Comment on lines +74 to +91
if (call_graph_op == nullptr) {
MLLM_ERROR("LLM2QnnLoweringPass: No CallGraphOp found in ModuleOp");
return ir::PASS_RET_FAILURE;
}

// Check call graph op point to a subgraph named "model"
auto symbol_attr = call_graph_op->getSymbolAttr();
if (symbol_attr == nullptr || symbol_attr->str() != "model") {
MLLM_ERROR("LLM2QnnLoweringPass: CallGraphOp should point to a subgraph named 'model'");
return ir::PASS_RET_FAILURE;
}

// Get the "model" subgraph
auto model_subgraph = getCtx()->lookupSymbolTable("model")->cast_<ir::graph::SubGraphOp>();
if (model_subgraph == nullptr) {
MLLM_ERROR("LLM2QnnLoweringPass: Cannot find 'model' subgraph in symbol table");
return ir::PASS_RET_FAILURE;
}
⚠️ Potential issue | 🟡 Minor

Error messages reference wrong pass name.

The error messages at lines 75, 82, and 89 incorrectly reference LLM2QnnLoweringPass instead of SplitLLMGraphPass.

🔎 Proposed fix
   if (call_graph_op == nullptr) {
-    MLLM_ERROR("LLM2QnnLoweringPass: No CallGraphOp found in ModuleOp");
+    MLLM_ERROR("SplitLLMGraphPass: No CallGraphOp found in ModuleOp");
     return ir::PASS_RET_FAILURE;
   }

   // Check call graph op point to a subgraph named "model"
   auto symbol_attr = call_graph_op->getSymbolAttr();
   if (symbol_attr == nullptr || symbol_attr->str() != "model") {
-    MLLM_ERROR("LLM2QnnLoweringPass: CallGraphOp should point to a subgraph named 'model'");
+    MLLM_ERROR("SplitLLMGraphPass: CallGraphOp should point to a subgraph named 'model'");
     return ir::PASS_RET_FAILURE;
   }

   // Get the "model" subgraph
   auto model_subgraph = getCtx()->lookupSymbolTable("model")->cast_<ir::graph::SubGraphOp>();
   if (model_subgraph == nullptr) {
-    MLLM_ERROR("LLM2QnnLoweringPass: Cannot find 'model' subgraph in symbol table");
+    MLLM_ERROR("SplitLLMGraphPass: Cannot find 'model' subgraph in symbol table");
     return ir::PASS_RET_FAILURE;
   }
🤖 Prompt for AI Agents
mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp around lines 74-91: the
logged error messages currently reference the wrong pass name
`LLM2QnnLoweringPass`; update the three MLLM_ERROR messages to reference
`SplitLLMGraphPass` instead (i.e., change the literal strings so they mention
SplitLLMGraphPass and keep the rest of the messages intact) to ensure accurate
error reporting.

Comment on lines +104 to +111
auto cfg = AOTCompileContext::getInstance().getConfig();
int32_t __global_total_layers = cfg["quant_recipe"]["layers"];
int32_t __global_split_graphs = cfg["split_graph"];

// Check seq length
// FIXME: We suppose the first input to LLM is tokend_ids! Whose shape is [Batch, Sequence]
int32_t __global_seq_len =
model_subgraph->getTopRegion()->inputs().front()->cast_<ir::tensor::TensorValue>()->tensor_.size(1);
⚠️ Potential issue | 🟠 Major

Add validation for required config fields.

Direct access to cfg["quant_recipe"]["layers"] and cfg["split_graph"] will throw if keys are missing. Add validation before use.

🔎 Proposed fix
   // Count how many layers we have.
   auto cfg = AOTCompileContext::getInstance().getConfig();
+  if (!cfg.contains("quant_recipe") || !cfg["quant_recipe"].contains("layers")) {
+    MLLM_ERROR("SplitLLMGraphPass: Config missing 'quant_recipe.layers'");
+    return ir::PASS_RET_FAILURE;
+  }
+  if (!cfg.contains("split_graph")) {
+    MLLM_ERROR("SplitLLMGraphPass: Config missing 'split_graph'");
+    return ir::PASS_RET_FAILURE;
+  }
   int32_t __global_total_layers = cfg["quant_recipe"]["layers"];
   int32_t __global_split_graphs = cfg["split_graph"];
🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp around lines 104-111, the
code directly indexes cfg["quant_recipe"]["layers"] and cfg["split_graph"] and
assumes model_subgraph->getTopRegion()->inputs() contains a TensorValue; add
explicit validation: verify cfg contains "quant_recipe" and within it "layers",
and contains "split_graph" (or use cfg.find/contains) and that those entries are
the expected numeric type before reading them; if missing or wrong type,
return/throw a clear error message. Also validate
model_subgraph->getTopRegion()->inputs() is non-empty and that the first input
can be cast to ir::tensor::TensorValue and that its tensor_.size() has at least
2 dimensions before accessing size(1); if not, return/throw with a descriptive
error. Ensure any change uses consistent logging or exceptions used elsewhere in
this file.

Comment on lines +108 to +111
// Check seq length
// FIXME: We suppose the first input to LLM is tokend_ids! Whose shape is [Batch, Sequence]
int32_t __global_seq_len =
model_subgraph->getTopRegion()->inputs().front()->cast_<ir::tensor::TensorValue>()->tensor_.size(1);
⚠️ Potential issue | 🟠 Major

Add bounds check for tensor shape access.

Accessing tensor_.size(1) assumes the tensor has at least 2 dimensions. Add validation to prevent out-of-bounds access.

🔎 Proposed fix
   // Check seq length
   // FIXME: We suppose the first input to LLM is tokend_ids! Whose shape is [Batch, Sequence]
-  int32_t __global_seq_len =
-      model_subgraph->getTopRegion()->inputs().front()->cast_<ir::tensor::TensorValue>()->tensor_.size(1);
+  auto first_input = model_subgraph->getTopRegion()->inputs().front()->cast_<ir::tensor::TensorValue>();
+  if (first_input->tensor_.dims() < 2) {
+    MLLM_ERROR("SplitLLMGraphPass: Expected first input to have at least 2 dimensions, got {}", first_input->tensor_.dims());
+    return ir::PASS_RET_FAILURE;
+  }
+  int32_t __global_seq_len = first_input->tensor_.size(1);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Check seq length
// FIXME: We suppose the first input to LLM is tokend_ids! Whose shape is [Batch, Sequence]
int32_t __global_seq_len =
model_subgraph->getTopRegion()->inputs().front()->cast_<ir::tensor::TensorValue>()->tensor_.size(1);
// Check seq length
// FIXME: We suppose the first input to LLM is tokend_ids! Whose shape is [Batch, Sequence]
auto first_input = model_subgraph->getTopRegion()->inputs().front()->cast_<ir::tensor::TensorValue>();
if (first_input->tensor_.dims() < 2) {
MLLM_ERROR("SplitLLMGraphPass: Expected first input to have at least 2 dimensions, got {}", first_input->tensor_.dims());
return ir::PASS_RET_FAILURE;
}
int32_t __global_seq_len = first_input->tensor_.size(1);

Comment on lines +135 to +139
size_t last_dot_pos = g_w_g_name.find_last_of('.');
int layer_num = 0;
if (last_dot_pos != std::string::npos && last_dot_pos + 1 < g_w_g_name.length()) {
layer_num = std::stoi(g_w_g_name.substr(last_dot_pos + 1));
}
⚠️ Potential issue | 🟡 Minor

Handle potential exception from std::stoi.

If the subgraph name doesn't end with a valid integer (e.g., "model.foo"), std::stoi will throw. Consider using std::strtol or wrapping in try-catch.

🔎 Proposed fix
         size_t last_dot_pos = g_w_g_name.find_last_of('.');
         int layer_num = 0;
         if (last_dot_pos != std::string::npos && last_dot_pos + 1 < g_w_g_name.length()) {
-          layer_num = std::stoi(g_w_g_name.substr(last_dot_pos + 1));
+          try {
+            layer_num = std::stoi(g_w_g_name.substr(last_dot_pos + 1));
+          } catch (const std::exception& e) {
+            MLLM_WARN("SplitLLMGraphPass: Failed to parse layer number from '{}': {}", g_w_g_name, e.what());
+            continue;
+          }
         }
🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp around lines 135-139, the
call to std::stoi on the substring after the last '.' can throw if the suffix is
not a valid integer; replace this with a safe parse and fallback: either wrap
std::stoi in a try-catch and only set layer_num on success (leaving default 0 on
failure), or use std::strtol on the c_str() of the substring and check the
endptr to ensure the entire substring parsed as digits before assigning
layer_num; also ensure empty substrings are skipped and document the fallback
behavior.

Comment on lines +174 to +177
// FIXME: currently only support one graph!
MLLM_RT_ASSERT_EQ(__global_split_graphs, 1);
auto one_graph =
getCtx()->lookupSymbolTable("model.0.s" + std::to_string(__global_seq_len))->cast_<ir::graph::SubGraphOp>();
⚠️ Potential issue | 🟠 Major

Convert runtime assertion to early validation.

The assertion at line 175 (and line 217) will crash if split_graph > 1. Since only single-graph splitting is supported, validate this early and return PASS_RET_FAILURE with a clear error message.

🔎 Proposed fix (add after line 106)
   int32_t __global_split_graphs = cfg["split_graph"];
+  if (__global_split_graphs != 1) {
+    MLLM_ERROR("SplitLLMGraphPass: Currently only supports split_graph=1, got {}", __global_split_graphs);
+    return ir::PASS_RET_FAILURE;
+  }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp around lines 174-177 (and
also address the similar assertion at line 217), replace the runtime assertions
that crash when __global_split_graphs > 1 with an early validation: after line
106 add a check if (__global_split_graphs != 1) {
getCtx()->logger().error("SplitLLMGraphPass only supports single graph splitting
(split_graphs=%d)", __global_split_graphs); return PASS_RET_FAILURE; } and
remove or replace the MLLM_RT_ASSERT_EQ calls at 174-177 and 217 so the pass
fails gracefully with the clear error message instead of crashing.

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 4

🧹 Nitpick comments (1)
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp (1)

72-80: Consider adding a comment explaining the subgraph naming convention.

The regex pattern and validation logic are correct, but a brief comment explaining what "model.x.sN" represents (e.g., layer splits) would improve maintainability.

Example:

  // Validate subgraph names follow the convention:
  // - "model": base graph
  // - "model.x.sN": split subgraphs where x is the split index and N is the sequence number
  std::regex model_pattern(R"(^model(\.\d+\.s\d+)?$)");
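For reference, the naming convention accepted by that pattern can be exercised in isolation. A small sketch (the helper name is illustrative; only the regex literal comes from the review above):

```cpp
#include <regex>
#include <string>

// True for the base graph "model" or a split subgraph "model.<x>.s<N>",
// where <x> is the split index and <N> is the sequence length.
bool isValidModelGraphName(const std::string& name) {
  // static: compile the regex once rather than on every call.
  static const std::regex kModelPattern(R"(^model(\.\d+\.s\d+)?$)");
  return std::regex_match(name, kModelPattern);
}
```

Names like "model" and "model.0.s32" match; "model.0" (missing the sN part) and "model.foo" do not.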
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cdb7387 and 1f345cd.

📒 Files selected for processing (2)
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-android
  • GitHub Check: build-x86
  • GitHub Check: build-macos
🔇 Additional comments (1)
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp (1)

156-156: LGTM!

The factory function implementation is straightforward and correct.

Comment on lines +55 to +60
// Get the "model" subgraph
auto model_subgraph = getCtx()->lookupSymbolTable("model")->cast_<ir::graph::SubGraphOp>();
if (model_subgraph == nullptr) {
MLLM_ERROR("LLM2QnnLoweringPass: Cannot find 'model' subgraph in symbol table");
return ir::PASS_RET_FAILURE;
}

⚠️ Potential issue | 🟠 Major

Validate lookup result before casting.

lookupSymbolTable() may return nullptr, but the code casts before checking. If the symbol doesn't exist, this could cause undefined behavior.

🔎 Proposed fix
-  // Get the "model" subgraph
-  auto model_subgraph = getCtx()->lookupSymbolTable("model")->cast_<ir::graph::SubGraphOp>();
-  if (model_subgraph == nullptr) {
+  // Get the "model" subgraph
+  auto symbol_node = getCtx()->lookupSymbolTable("model");
+  if (symbol_node == nullptr) {
     MLLM_ERROR("LLM2QnnLoweringPass: Cannot find 'model' subgraph in symbol table");
     return ir::PASS_RET_FAILURE;
   }
+  auto model_subgraph = symbol_node->cast_<ir::graph::SubGraphOp>();
+  if (model_subgraph == nullptr) {
+    MLLM_ERROR("LLM2QnnLoweringPass: Symbol 'model' is not a SubGraphOp");
+    return ir::PASS_RET_FAILURE;
+  }
🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp around lines 55 to 60,
the result of getCtx()->lookupSymbolTable("model") is cast before being checked
for nullptr; change the logic to first store the lookup result in a local
variable, check if that pointer is nullptr and handle the error/return, and only
then perform the cast to ir::graph::SubGraphOp so you avoid casting a null
pointer and potential undefined behavior.

Comment on lines +126 to +150
subgraph_writer.walk<ir::linalg::LinalgIROp>(
[&](ir::IRWriter& this_tough_writer, const ir::linalg::LinalgIROp::ptr_t& linalg_op) -> ir::IRWriter::WalkResult {
if (!linalg_op->belongsTo()->getAttr("use_qnn")) {
MLLM_WARN("Found none qnn op: {} in graph: {}", linalg_op->getAOp()->getName(), subgraph_name);
return ir::IRWriter::WalkResult::WALK_BREAK;
}
bool processed = false;
for (auto& [op_type, pass] : named_pattern_) {
if (pass->isMatch(linalg_op)) {
if (!pass->rewrite(this_tough_writer, linalg_op)) {
MLLM_ERROR_EXIT(ExitCode::kCoreError, "Failed when processing op {} with pass {}",
linalg_op->getAOp()->getName(), optype2Str(op_type));
} else {
processed = true;
break;
}
}
}

if (!processed) {
MLLM_ERROR_EXIT(ExitCode::kCoreError, "Failed processing op {} on all passes", linalg_op->getAOp()->getName());
}

return ir::IRWriter::WalkResult::WALK_CONTINUE;
});

⚠️ Potential issue | 🟠 Major

Add null checks for chained method calls.

Lines 128-129 have chained method calls (belongsTo()->getAttr() and getAOp()->getName()) that could dereference null pointers if intermediate methods return nullptr.

🔎 Proposed fix
     subgraph_writer.walk<ir::linalg::LinalgIROp>(
         [&](ir::IRWriter& this_tough_writer, const ir::linalg::LinalgIROp::ptr_t& linalg_op) -> ir::IRWriter::WalkResult {
-          if (!linalg_op->belongsTo()->getAttr("use_qnn")) {
-            MLLM_WARN("Found none qnn op: {} in graph: {}", linalg_op->getAOp()->getName(), subgraph_name);
+          auto belongs_to = linalg_op->belongsTo();
+          if (!belongs_to || !belongs_to->getAttr("use_qnn")) {
+            auto aop = linalg_op->getAOp();
+            auto op_name = aop ? aop->getName() : "unknown";
+            MLLM_WARN("Found none qnn op: {} in graph: {}", op_name, subgraph_name);
             return ir::IRWriter::WalkResult::WALK_BREAK;
           }
           bool processed = false;
           for (auto& [op_type, pass] : named_pattern_) {
             if (pass->isMatch(linalg_op)) {
               if (!pass->rewrite(this_tough_writer, linalg_op)) {
+                auto aop = linalg_op->getAOp();
+                auto op_name = aop ? aop->getName() : "unknown";
                 MLLM_ERROR_EXIT(ExitCode::kCoreError, "Failed when processing op {} with pass {}",
-                                linalg_op->getAOp()->getName(), optype2Str(op_type));
+                                op_name, optype2Str(op_type));
               } else {
                 processed = true;
                 break;
               }
             }
           }
 
           if (!processed) {
-            MLLM_ERROR_EXIT(ExitCode::kCoreError, "Failed processing op {} on all passes", linalg_op->getAOp()->getName());
+            auto aop = linalg_op->getAOp();
+            auto op_name = aop ? aop->getName() : "unknown";
+            MLLM_ERROR_EXIT(ExitCode::kCoreError, "Failed processing op {} on all passes", op_name);
           }
 
           return ir::IRWriter::WalkResult::WALK_CONTINUE;
         });

…2QnnLoweringPass with AOT context handling, and create new Embedding files

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (3)
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp (3)

19-26: Fix TODO comment formatting and eliminate duplication.

The coding guidelines require TODO comments to follow the format TODO: (with colon), but these use TODO without a colon. Additionally, six identical TODO comments violate DRY principles.

Based on coding guidelines for TODO format.


56-61: Validate lookup result before casting.

lookupSymbolTable() may return nullptr, but the code casts before checking. If the symbol doesn't exist, this causes undefined behavior.


133-157: Add null checks for chained method calls.

Lines 135-136 have chained method calls (belongsTo()->getAttr() and getAOp()->getName()) that could dereference null pointers if intermediate methods return nullptr.

🧹 Nitpick comments (5)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (1)

202-216: Consider initializing qnn_ctx_handle_ to avoid undefined behavior.

The static analysis tool flagged that QnnDeviceAndContext has uninitialized members. While name_, graphs_, and static_tensor_ have default constructors, qnn_ctx_handle_ (of type Qnn_ContextHandle_t) may contain garbage if not explicitly initialized before use.

🔎 Proposed fix
 struct QnnDeviceAndContext {
   using ptr_t = std::shared_ptr<QnnDeviceAndContext>;

   std::string name_;
   Qnn_LogHandle_t log_ = nullptr;
   Qnn_BackendHandle_t bk_handle_ = nullptr;
   Qnn_DeviceHandle_t device_handle_ = nullptr;
   QnnBackend_Config_t** bk_cfg_ = nullptr;
   QnnContext_Config_t** qnn_context_config_ = nullptr;
   Qnn_ProfileHandle_t profile_bk_handle_ = nullptr;
-  Qnn_ContextHandle_t qnn_ctx_handle_;
+  Qnn_ContextHandle_t qnn_ctx_handle_ = nullptr;

   std::unordered_map<std::string, QnnAOTGraph::ptr_t> graphs_;              //< for persistence keep graphs.
   std::unordered_map<std::string, QnnAOTNodeTensor::ptr_t> static_tensor_;  //< for weight sharing.
 };
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (1)

506-513: Use consistent types for loop indices to avoid sign comparison warnings.

The loop variables are declared as int but compared against .size() which returns size_t. This can cause signed/unsigned comparison warnings and potential issues with very large collections.

🔎 Proposed fix
   // Inputs
   Qnn_Tensor_t* qnn_inputs_array = (Qnn_Tensor_t*)malloc(qnn_op->inputs.size() * sizeof(Qnn_Tensor_t));
   qnn_op->unreachable_handle_.emplace_back(qnn_inputs_array);
-  for (int i = 0; i < qnn_op->inputs.size(); ++i) { qnn_inputs_array[i] = *qnn_op->inputs[i]->getQnnTensor(); }
+  for (size_t i = 0; i < qnn_op->inputs.size(); ++i) { qnn_inputs_array[i] = *qnn_op->inputs[i]->getQnnTensor(); }

   // Outputs
   Qnn_Tensor_t* qnn_outputs_array = (Qnn_Tensor_t*)malloc(qnn_op->outputs.size() * sizeof(Qnn_Tensor_t));
   qnn_op->unreachable_handle_.emplace_back(qnn_outputs_array);
-  for (int i = 0; i < qnn_op->outputs.size(); ++i) { qnn_outputs_array[i] = *qnn_op->outputs[i]->getQnnTensor(); }
+  for (size_t i = 0; i < qnn_op->outputs.size(); ++i) { qnn_outputs_array[i] = *qnn_op->outputs[i]->getQnnTensor(); }
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp (3)

91-97: Avoid compiling regex inside a loop.

The regex R"(^model\.\d+\.s\d+$)" is compiled on every iteration of the loop. Since the pattern is constant, compile it once outside the loop for better performance.

🔎 Proposed fix
+  // Pre-compile regex patterns
+  static const std::regex model_subgraph_pattern(R"(^model\.\d+\.s\d+$)");
+
   // Validate that at least one model.x.sN subgraph exists (required for the lowering)
   // We don't require specifically model.0.s32, but any model.x.sN pattern
   bool has_valid_subgraph = false;
   for (const auto& [name, _] : subgraph_map_) {
-    if (std::regex_match(name, std::regex(R"(^model\.\d+\.s\d+$)"))) {
+    if (std::regex_match(name, model_subgraph_pattern)) {
       has_valid_subgraph = true;
       break;
     }
   }

73-81: Consider making the regex pattern a class-level constant.

The model_pattern regex is used for validation but could be reused. Making it a static const at class or namespace level improves clarity and avoids recompilation if this function is called multiple times.

🔎 Proposed refactor

At namespace level (around line 16):

namespace {
const std::regex kModelPattern(R"(^model(\.\d+\.s\d+)?$)");
const std::regex kModelSubgraphPattern(R"(^model\.\d+\.s\d+$)");
}  // namespace

Then use kModelPattern at line 75 and kModelSubgraphPattern at line 93.


159-161: Consider more informative error handling on compile failure.

If aot_graph->compile() returns false, the assertion fails without context about which subgraph failed. Adding the subgraph name to the error message would aid debugging.

🔎 Proposed fix
     // Compile
-    MLLM_RT_ASSERT(aot_graph->compile());
+    if (!aot_graph->compile()) {
+      MLLM_ERROR("LLM2QnnLoweringPass: Failed to compile AOT graph for subgraph '{}'", subgraph_name);
+      return ir::PASS_RET_FAILURE;
+    }
   }
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1f345cd and 65c7818.

📒 Files selected for processing (5)
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.hpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
  • mllm/backends/qnn/aot/visitor/Embedding.cpp
  • mllm/backends/qnn/aot/visitor/Embedding.hpp
🧰 Additional context used
📓 Path-based instructions (4)

📄 CodeRabbit inference engine (.github/copilot-instructions.md) — the same four path-glob guideline sets quoted in the first review above. They apply to all three changed files (mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp, mllm/backends/qnn/aot/QnnWrappersAPI.cpp, mllm/backends/qnn/aot/QnnWrappersAPI.hpp), except that the production-quality set's glob excludes the header QnnWrappersAPI.hpp.
🧠 Learnings (3)
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh} : TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.

Applied to files:

  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Add comments for complex algorithms or non-obvious logic.

Applied to files:

  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh} : Encourage consistent coding style and patterns with the existing codebase.

Applied to files:

  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
🧬 Code graph analysis (2)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (2)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (16)
  • v (103-105)
  • v (103-103)
  • v (112-112)
  • v (114-114)
  • v (116-116)
  • v (118-118)
  • QnnAOTGraph (181-181)
  • g_name (183-186)
  • g_name (183-183)
  • create (48-50)
  • create (48-48)
  • name (158-158)
  • name (273-273)
  • name (275-275)
  • name (277-277)
  • name (298-298)
mllm/backends/qnn/aot/passes/AOTCompileContext.hpp (2)
  • env (26-26)
  • env (26-26)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (1)
mllm/backends/qnn/QNNUtils.hpp (2)
  • name (191-192)
  • name (195-196)
🪛 Clang (14.0.6)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp

[error] 183-183: 2 adjacent parameters of 'create' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 202-202: constructor does not initialize these fields: name_, qnn_ctx_handle_, graphs_, static_tensor_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-x86
  • GitHub Check: build-android
  • GitHub Check: build-macos
🔇 Additional comments (6)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (3)

109-110: LGTM!

Clean inline accessor following the existing pattern used in QnnAOTParamTensor::getQnnTensor().


181-186: LGTM!

The constructor and factory method follow the established patterns in this file (e.g., QnnAOTNodeTensor::create). The factory correctly delegates to the constructor.


298-299: getContext() should validate the context name exists or document its behavior.

The implementation uses operator[] on contexts_ which inserts and returns a default-constructed (null) shared_ptr if the name doesn't exist. While current callers (at lines 522 and 535) use belongs_context_name_ which is validated upstream via MLLM_RT_ASSERT(contexts_.count(qnn_context_name) == 1) before being stored, the public method itself lacks defensive bounds checking. Consider either adding a bounds check and returning nullptr explicitly, or document that callers must ensure the context exists.

mllm/backends/qnn/aot/QnnWrappersAPI.cpp (2)

468-477: LGTM with a note on error handling.

The constructor correctly initializes members, links to context, and calls graphCreate. The assertion on QNN_SUCCESS ensures failures are caught during development. For production robustness, consider whether a recoverable error path would be appropriate.


229-229: LGTM!

Good addition to handle the "constant" attribute for marking tensors as static. The priority ordering (attribute overrides tensor type) is consistent with the existing pattern for qnn_graph_outputs and qnn_graph_inputs.

mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp (1)

1-15: LGTM on file structure and includes.

The file follows the project's copyright header format and includes the necessary headers for IR operations, passes, and QNN AOT infrastructure.

Comment on lines +114 to +119
// FIXME: Only support one context right now.
{
int split_graph = aot_cfg["split_graph"];
MLLM_RT_ASSERT_EQ(split_graph, 1);
aot_env->createContext("context.0", true);
}

⚠️ Potential issue | 🟡 Minor

Add validation for JSON config access and avoid hardcoded context name.

Line 116 accesses aot_cfg["split_graph"] without checking if the key exists or if the value is the expected type. This could throw or cause undefined behavior with malformed config. Additionally, "context.0" is hardcoded, reducing flexibility.

🔎 Proposed fix with validation
   // FIXME: Only support one context right now.
   {
-    int split_graph = aot_cfg["split_graph"];
-    MLLM_RT_ASSERT_EQ(split_graph, 1);
-    aot_env->createContext("context.0", true);
+    if (!aot_cfg.contains("split_graph") || !aot_cfg["split_graph"].is_number_integer()) {
+      MLLM_ERROR("LLM2QnnLoweringPass: 'split_graph' config key missing or invalid");
+      return ir::PASS_RET_FAILURE;
+    }
+    int split_graph = aot_cfg["split_graph"].get<int>();
+    MLLM_RT_ASSERT_EQ(split_graph, 1);
+    aot_env->createContext("context.0", true);
   }

Comment on lines +868 to +873
qnn_htp_func_symbols_.qnn_interface_.tensorCreateContextTensor(contexts_[qnn_context_name]->qnn_ctx_handle_,
ret->getQnnTensor());
} else {
contexts_[qnn_context_name]->graphs_[graph_name]->all_tensors_.insert({__qnn_tensor_name, ret});
qnn_htp_func_symbols_.qnn_interface_.tensorCreateGraphTensor(
contexts_[qnn_context_name]->graphs_[graph_name]->qnn_graph_handle_, ret->getQnnTensor());

⚠️ Potential issue | 🟠 Major

Missing error handling for tensor creation API calls.

The tensorCreateContextTensor and tensorCreateGraphTensor return values are not checked. Other QNN API calls in this file consistently check for QNN_SUCCESS. If tensor creation fails silently, it could cause hard-to-debug issues downstream.

🔎 Proposed fix
   if (__qnn_enable_static_weight) {
     contexts_[qnn_context_name]->static_tensor_.insert({__qnn_tensor_name, ret});
-    qnn_htp_func_symbols_.qnn_interface_.tensorCreateContextTensor(contexts_[qnn_context_name]->qnn_ctx_handle_,
-                                                                   ret->getQnnTensor());
+    auto status = qnn_htp_func_symbols_.qnn_interface_.tensorCreateContextTensor(
+        contexts_[qnn_context_name]->qnn_ctx_handle_, ret->getQnnTensor());
+    MLLM_RT_ASSERT_EQ(status, QNN_SUCCESS);
   } else {
     contexts_[qnn_context_name]->graphs_[graph_name]->all_tensors_.insert({__qnn_tensor_name, ret});
-    qnn_htp_func_symbols_.qnn_interface_.tensorCreateGraphTensor(
+    auto status = qnn_htp_func_symbols_.qnn_interface_.tensorCreateGraphTensor(
         contexts_[qnn_context_name]->graphs_[graph_name]->qnn_graph_handle_, ret->getQnnTensor());
+    MLLM_RT_ASSERT_EQ(status, QNN_SUCCESS);
   }
🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/QnnWrappersAPI.cpp around lines 868 to 873, the calls
to qnn_interface_.tensorCreateContextTensor and tensorCreateGraphTensor are not
checking return values; capture each call's return value, compare it to
QNN_SUCCESS, and on failure log an error including the tensor name and the
context/graph name, perform any necessary cleanup (remove inserted map entry or
release allocated objects), and propagate the failure (return an error/nullptr
or throw) consistent with other QNN API error handling in this file.

Comment on lines +879 to +880
std::shared_ptr<QnnDeviceAndContext> QnnAOTEnv::getContext(const std::string& name) { return contexts_[name]; }


⚠️ Potential issue | 🟡 Minor

getContext returns null shared_ptr for missing keys without indication.

Using operator[] on an unordered_map inserts a default-constructed value (null shared_ptr) if the key doesn't exist. This could lead to null pointer dereferences if callers don't check. Consider using at() (throws on missing key) or find() with explicit null handling.

🔎 Proposed fix using at() for fail-fast behavior
-std::shared_ptr<QnnDeviceAndContext> QnnAOTEnv::getContext(const std::string& name) { return contexts_[name]; }
+std::shared_ptr<QnnDeviceAndContext> QnnAOTEnv::getContext(const std::string& name) {
+  auto it = contexts_.find(name);
+  if (it == contexts_.end()) {
+    MLLM_ERROR("QnnAOTEnv::getContext: context '{}' not found", name);
+    return nullptr;
+  }
+  return it->second;
+}
🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/QnnWrappersAPI.cpp around lines 879-880, replace the
use of operator[] which inserts a default null shared_ptr for missing keys with
a fail-fast access: use contexts_.at(name) so a missing key throws
std::out_of_range (or, if you prefer explicit handling, use contexts_.find(name)
and return nullptr or throw a descriptive exception). Ensure the function either
allows the exception to propagate or throws a clear std::runtime_error with
context about the missing device name so callers are not left with a null
shared_ptr.


@UbiquitousLearning UbiquitousLearning left a comment

LGTM

@chenghuaWang chenghuaWang merged commit 6fbccd0 into UbiquitousLearning:main Dec 30, 2025
4 checks passed