Conversation

@chenghuaWang (Collaborator) commented Dec 30, 2025

Summary by CodeRabbit

  • New Features

    • Config-driven layer-based model splitting with sequence-length-aware split graphs for QNN AOT.
    • New graph assembly and capture in the QNN environment, plus context access and broader static-weight recognition.
  • Refactor

    • Augmented AOT compilation pipeline with additional graph-splitting and lowering passes for deterministic, pattern-driven processing.
  • Chores

    • Minor IR convenience alias added for value pointers.


@coderabbitai bot (Contributor) commented Dec 30, 2025

📝 Walkthrough

Walkthrough

Adds configuration-driven LLM graph splitting and a new LLM-to-QNN lowering pass to the QNN AOT pipeline; updates pipeline ordering to run SplitLLMGraphPass, MarkTensorIOPass, then LLM2QnnLoweringPass; adds a Val ptr alias and a layers field in the example config.

Changes

  • Configuration (examples/qwen3_qnn_aot/qnn_aot_cfg.json): Added "layers": 28 under quant_recipe.llm_recipe.
  • AOT Pipeline (mllm/backends/qnn/aot/passes/AOTPipeline.cpp): Added includes and inserted SplitLLMGraphPass, MarkTensorIOPass, and LLM2QnnLoweringPass into the llm_recipe pipeline, adjusting pass ordering.
  • New Lowering Pass (mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp, .../LLM2QnnLoweringPass.cpp): New LLM2QnnLoweringPass that discovers and validates model subgraphs, sorts them, matches and rewrites LinalgIROp via registered QNN AOT patterns, compiles per-subgraph AOT graphs, and exposes createLLM2QnnLoweringPass().
  • Graph Splitting Pass (mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp): Reworked to read layers and split_graphs, compute seq_len, create split-graph placeholders, attach qnn_context_name/qnn_graph_name, relocate ops into named split subgraphs, rewire CallGraphOp/IO, and prune obsolete subgraphs.
  • QNN Wrappers & Env (mllm/backends/qnn/aot/QnnWrappersAPI.cpp, .../QnnWrappersAPI.hpp): Added a QnnAOTGraph constructor/factory, graph capture in the environment, a context accessor, graph/tensor creation and registration, and static-weight handling for the "constant" attribute; exposed context storage on QnnAOTGraph.
  • IR Convenience Alias (mllm/compile/ir/Node.hpp): Added the public type alias using ptr_t = val_ptr_t; inside class Val.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Pipeline as AOT Pipeline
    participant Config as AOTCompileContext
    participant Split as SplitLLMGraphPass
    participant MarkIO as MarkTensorIOPass
    participant Lower as LLM2QnnLoweringPass
    participant Graph as Model/Subgraphs

    Pipeline->>Split: run(module)
    Split->>Config: read quant_recipe.layers\nand split_graphs
    Split->>Graph: locate "model" subgraph\ncompute seq_len from inputs
    Split->>Split: create split placeholders\nassign qnn_context_name/qnn_graph_name
    Split->>Graph: move LinalgIROp into\ncorresponding split subgraphs
    Split->>Graph: rewire CallGraphOp and IO\nprune obsolete subgraphs
    Split->>Pipeline: return success
    Pipeline->>MarkIO: run(on updated graph)
    MarkIO->>Graph: mark tensor IO attributes
    MarkIO->>Pipeline: return success
    Pipeline->>Lower: run(on marked graph)
    Lower->>Graph: iterate ordered subgraphs\nmatch & rewrite LinalgIROp with patterns
    Lower->>Pipeline: return success

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • liang1232018
  • oreomaker
  • yirongjie

Poem

🐇 I hopped through graphs both wide and deep,
I split the layers where the LLMs sleep,
I tagged each tensor, named each thread,
Then lowered ops so models tread —
A tiny rabbit’s hop—now builds won't weep!

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

  • Description check ⚠️ Warning: No pull request description was provided by the author; the template requires a clear description of changes, objectives, and contribution context. Resolution: provide a comprehensive description following the repository template, including the purpose of the changes, the problems they solve, and relevant implementation details.
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 4.76%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve coverage.

✅ Passed checks (1 passed)

  • Title check ✅ Passed: The title clearly and specifically describes the main changes: adding LLM2QnnLoweringPass and updating graph splitting logic, which aligns with the substantial modifications across multiple files.
✨ Finishing touches
  • 📝 Generate docstrings


@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 5

🧹 Nitpick comments (6)
mllm/backends/qnn/aot/passes/AOTPipeline.cpp (1)

3-3: Unused include while pass is disabled.

The LLM2QnnLoweringPass.hpp header is included but createLLM2QnnLoweringPass() is commented out at line 27. Consider either removing the include until the pass is enabled, or adding a comment explaining why it's staged for future activation.

mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp (1)

16-27: Consider explicitly deleting copy/move operations.

Static analysis flags that the class defines a destructor but not other special member functions. Since Pass objects typically shouldn't be copied during pipeline execution, consider explicitly deleting them:

🔎 Proposed fix
 class LLM2QnnLoweringPass final : public ir::Pass {
  public:
   LLM2QnnLoweringPass();
 
   ~LLM2QnnLoweringPass() override = default;
+
+  LLM2QnnLoweringPass(const LLM2QnnLoweringPass&) = delete;
+  LLM2QnnLoweringPass& operator=(const LLM2QnnLoweringPass&) = delete;
+  LLM2QnnLoweringPass(LLM2QnnLoweringPass&&) = delete;
+  LLM2QnnLoweringPass& operator=(LLM2QnnLoweringPass&&) = delete;
 
   uint8_t run(const ir::node_ptr_t& op) override;
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp (2)

17-25: Remove redundant TODO comments.

The repeated TODO comments add noise without additional value. A single TODO with a clear description is sufficient.

🔎 Proposed fix
 LLM2QnnLoweringPass::LLM2QnnLoweringPass() {
   named_pattern_.insert(QnnAOTAddPattern::create());
-  // TODO reg other patterns here.
-  // TODO reg other patterns here.
-  // TODO reg other patterns here.
-  // TODO reg other patterns here.
-  // TODO reg other patterns here.
-  // TODO reg other patterns here.
+  // TODO: Register additional patterns (e.g., Mul, MatMul, Softmax, etc.)
 }

90-96: Avoid creating regex inside loop; reuse the existing pattern.

The regex at line 92 is redundant—model_pattern (line 74) already matches this pattern. Reuse it to avoid repeated compilation.

🔎 Proposed fix
-  // Validate that at least one model.x.sN subgraph exists (required for the lowering)
-  // We don't require specifically model.0.s32, but any model.x.sN pattern
-  bool has_valid_subgraph = false;
-  for (const auto& [name, _] : subgraph_map_) {
-    if (std::regex_match(name, std::regex(R"(^model\.\d+\.s\d+$)"))) {
-      has_valid_subgraph = true;
-      break;
-    }
-  }
+  // Validate that at least one model.x.sN subgraph exists (required for the lowering)
+  // subgraph_map_ excludes "model", so all entries must match model.x.sN
+  bool has_valid_subgraph = !subgraph_map_.empty();
mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp (2)

20-22: Use more descriptive parameter and variable names.

Per static analysis and readability best practices, rename g to subgraph and _ to writer for clarity.

🔎 Proposed fix
 void recursiveAttachGraphNameAndContextName(const ir::IRContext::ptr_t& ctx, const std::string& qnn_context_name,
-                                            const std::string& qnn_graph_name, ir::graph::SubGraphOp::ptr_t& g) {
-  auto _ = ir::IRWriter(ctx, g->getTopRegion());
-  _.walk<ir::Op>([&](ir::IRWriter& w /*writer*/, const ir::Op::ptr_t& owo) -> ir::IRWriter::WalkResult {
+                                            const std::string& qnn_graph_name, ir::graph::SubGraphOp::ptr_t& subgraph) {
+  auto writer = ir::IRWriter(ctx, subgraph->getTopRegion());
+  writer.walk<ir::Op>([&](ir::IRWriter& /* unused */, const ir::Op::ptr_t& op) -> ir::IRWriter::WalkResult {

186-188: Use a more descriptive variable name.

The variable wwww is unclear. Consider renaming to graph_writer or subgraph_writer for clarity.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7c7c10c and cdb7387.

📒 Files selected for processing (7)
  • examples/qwen3_qnn_aot/qnn_aot_cfg.json
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp
  • mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp
  • mllm/compile/ir/Node.hpp
  • pymllm/backends/qualcomm/transformers/static_qwen3_quantizer.py
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/compile/ir/Node.hpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
  • mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/compile/ir/Node.hpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
  • mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/compile/ir/Node.hpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
  • mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
  • mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
🧬 Code graph analysis (4)
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp (3)
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp (1)
  • LLM2QnnLoweringPass (17-25)
mllm/compile/ir/Node.hpp (2)
  • op (470-470)
  • op (472-472)
mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.hpp (1)
  • MergeLLMHeadIntoMainGraphPass (9-22)
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp (3)
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp (3)
  • LLM2QnnLoweringPass (18-18)
  • LLM2QnnLoweringPass (20-20)
  • op (22-22)
mllm/compile/ir/Node.hpp (12)
  • create (267-302)
  • create (267-267)
  • create (371-403)
  • create (371-371)
  • op (470-470)
  • op (472-472)
  • name (304-304)
  • name (308-308)
  • _ (65-65)
  • _ (65-65)
  • region (231-231)
  • region (324-324)
mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp (2)
  • run (58-231)
  • run (58-58)
mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp (5)
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp (2)
  • run (27-147)
  • run (27-27)
mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp (2)
  • run (16-109)
  • run (16-16)
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp (1)
  • op (22-22)
mllm/backends/qnn/aot/passes/SplitLLMGraphPass.hpp (1)
  • op (17-17)
mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.hpp (1)
  • op (17-17)
mllm/backends/qnn/aot/passes/AOTPipeline.cpp (4)
mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp (2)
  • createSplitLLMGraphPass (233-233)
  • createSplitLLMGraphPass (233-233)
mllm/backends/qnn/aot/passes/SplitLLMGraphPass.hpp (1)
  • createSplitLLMGraphPass (20-20)
mllm/backends/qnn/aot/passes/MarkTensorIO.cpp (2)
  • createMarkTensorIOPass (120-120)
  • createMarkTensorIOPass (120-120)
mllm/backends/qnn/aot/passes/MarkTensorIO.hpp (1)
  • createMarkTensorIOPass (20-20)
🪛 Clang (14.0.6)
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp

[error] 16-16: class 'LLM2QnnLoweringPass' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)

mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp

[error] 20-20: 3 adjacent parameters of 'recursiveAttachGraphNameAndContextName' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 21-21: parameter name 'g' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 22-22: variable name '_' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 37-37: parameter name 'g' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 38-38: variable name '_' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-x86
  • GitHub Check: build-macos
  • GitHub Check: build-android
🔇 Additional comments (5)
examples/qwen3_qnn_aot/qnn_aot_cfg.json (1)

18-18: LGTM!

The new layers field correctly specifies the total number of transformer layers for the Qwen3 model, which is consumed by SplitLLMGraphPass for configuration-driven graph splitting.

mllm/compile/ir/Node.hpp (1)

194-195: LGTM!

The ptr_t alias follows the established pattern used by Region (line 119) and Op (line 155), providing consistency across IR node types.

mllm/backends/qnn/aot/passes/AOTPipeline.cpp (1)

25-27: Pass ordering looks correct.

The pipeline flow is logical: graph marking → op naming → head merging → quantization → graph splitting → tensor I/O marking. The commented-out createLLM2QnnLoweringPass() indicates future work is pending.

mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp (1)

119-143: Verify behavior: WALK_BREAK on non-QNN op skips remaining ops in subgraph.

When encountering an op without use_qnn attribute, the code logs a warning and breaks. This leaves remaining ops in that subgraph unprocessed. If this is intentional fail-fast behavior, consider using MLLM_ERROR instead of MLLM_WARN for clarity. Otherwise, consider WALK_CONTINUE to skip and proceed.

mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp (1)

37-54: The iterator safety mechanism is properly implemented. IRWriter::removeOp() correctly sets is_iterator_modified_ = true (line 310 in Node.cpp), and the walk loop respects this flag by skipping the increment when the iterator has already been advanced by erase(). This pattern is safe and correctly uses the framework's iterator modification tracking.
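The flag-guarded loop described above is an instance of the standard erase-during-iteration idiom. A minimal, generic sketch with std::list (the helper name and container are illustrative, not the repository's IRWriter API):

```cpp
#include <list>

// When an element is removed, erase() already advances the iterator, so the
// loop must skip its own increment for that step. This mirrors the
// is_iterator_modified_ flag that IRWriter's walk loop checks.
int removeEvens(std::list<int>& values) {
  int removed = 0;
  for (auto it = values.begin(); it != values.end();) {
    if (*it % 2 == 0) {
      it = values.erase(it);  // erase() returns the next iterator; do not ++it
      ++removed;
    } else {
      ++it;  // iterator untouched; advance normally
    }
  }
  return removed;
}
```

Incrementing after an erase() would skip an element or dereference an invalidated iterator, which is exactly the hazard the pass's flag avoids.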

Comment on lines +74 to +91
if (call_graph_op == nullptr) {
MLLM_ERROR("LLM2QnnLoweringPass: No CallGraphOp found in ModuleOp");
return ir::PASS_RET_FAILURE;
}

// Check call graph op point to a subgraph named "model"
auto symbol_attr = call_graph_op->getSymbolAttr();
if (symbol_attr == nullptr || symbol_attr->str() != "model") {
MLLM_ERROR("LLM2QnnLoweringPass: CallGraphOp should point to a subgraph named 'model'");
return ir::PASS_RET_FAILURE;
}

// Get the "model" subgraph
auto model_subgraph = getCtx()->lookupSymbolTable("model")->cast_<ir::graph::SubGraphOp>();
if (model_subgraph == nullptr) {
MLLM_ERROR("LLM2QnnLoweringPass: Cannot find 'model' subgraph in symbol table");
return ir::PASS_RET_FAILURE;
}
⚠️ Potential issue | 🟡 Minor

Error messages reference wrong pass name.

The error messages at lines 75, 82, and 89 incorrectly reference LLM2QnnLoweringPass instead of SplitLLMGraphPass.

🔎 Proposed fix
   if (call_graph_op == nullptr) {
-    MLLM_ERROR("LLM2QnnLoweringPass: No CallGraphOp found in ModuleOp");
+    MLLM_ERROR("SplitLLMGraphPass: No CallGraphOp found in ModuleOp");
     return ir::PASS_RET_FAILURE;
   }

   // Check call graph op point to a subgraph named "model"
   auto symbol_attr = call_graph_op->getSymbolAttr();
   if (symbol_attr == nullptr || symbol_attr->str() != "model") {
-    MLLM_ERROR("LLM2QnnLoweringPass: CallGraphOp should point to a subgraph named 'model'");
+    MLLM_ERROR("SplitLLMGraphPass: CallGraphOp should point to a subgraph named 'model'");
     return ir::PASS_RET_FAILURE;
   }

   // Get the "model" subgraph
   auto model_subgraph = getCtx()->lookupSymbolTable("model")->cast_<ir::graph::SubGraphOp>();
   if (model_subgraph == nullptr) {
-    MLLM_ERROR("LLM2QnnLoweringPass: Cannot find 'model' subgraph in symbol table");
+    MLLM_ERROR("SplitLLMGraphPass: Cannot find 'model' subgraph in symbol table");
     return ir::PASS_RET_FAILURE;
   }
🤖 Prompt for AI Agents
mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp around lines 74-91: the
logged error messages currently reference the wrong pass name
`LLM2QnnLoweringPass`; update the three MLLM_ERROR messages to reference
`SplitLLMGraphPass` instead (i.e., change the literal strings so they mention
SplitLLMGraphPass and keep the rest of the messages intact) to ensure accurate
error reporting.

Comment on lines +104 to +111
auto cfg = AOTCompileContext::getInstance().getConfig();
int32_t __global_total_layers = cfg["quant_recipe"]["layers"];
int32_t __global_split_graphs = cfg["split_graph"];

// Check seq length
// FIXME: We suppose the first input to LLM is tokend_ids! Whose shape is [Batch, Sequence]
int32_t __global_seq_len =
model_subgraph->getTopRegion()->inputs().front()->cast_<ir::tensor::TensorValue>()->tensor_.size(1);
⚠️ Potential issue | 🟠 Major

Add validation for required config fields.

Direct access to cfg["quant_recipe"]["layers"] and cfg["split_graph"] will throw if keys are missing. Add validation before use.

🔎 Proposed fix
   // Count how many layers we have.
   auto cfg = AOTCompileContext::getInstance().getConfig();
+  if (!cfg.contains("quant_recipe") || !cfg["quant_recipe"].contains("layers")) {
+    MLLM_ERROR("SplitLLMGraphPass: Config missing 'quant_recipe.layers'");
+    return ir::PASS_RET_FAILURE;
+  }
+  if (!cfg.contains("split_graph")) {
+    MLLM_ERROR("SplitLLMGraphPass: Config missing 'split_graph'");
+    return ir::PASS_RET_FAILURE;
+  }
   int32_t __global_total_layers = cfg["quant_recipe"]["layers"];
   int32_t __global_split_graphs = cfg["split_graph"];
🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp around lines 104-111, the
code directly indexes cfg["quant_recipe"]["layers"] and cfg["split_graph"] and
assumes model_subgraph->getTopRegion()->inputs() contains a TensorValue; add
explicit validation: verify cfg contains "quant_recipe" and within it "layers",
and contains "split_graph" (or use cfg.find/contains) and that those entries are
the expected numeric type before reading them; if missing or wrong type,
return/throw a clear error message. Also validate
model_subgraph->getTopRegion()->inputs() is non-empty and that the first input
can be cast to ir::tensor::TensorValue and that its tensor_.size() has at least
2 dimensions before accessing size(1); if not, return/throw with a descriptive
error. Ensure any change uses consistent logging or exceptions used elsewhere in
this file.

Comment on lines +108 to +111
// Check seq length
// FIXME: We suppose the first input to LLM is tokend_ids! Whose shape is [Batch, Sequence]
int32_t __global_seq_len =
model_subgraph->getTopRegion()->inputs().front()->cast_<ir::tensor::TensorValue>()->tensor_.size(1);
⚠️ Potential issue | 🟠 Major

Add bounds check for tensor shape access.

Accessing tensor_.size(1) assumes the tensor has at least 2 dimensions. Add validation to prevent out-of-bounds access.

🔎 Proposed fix
   // Check seq length
   // FIXME: We suppose the first input to LLM is tokend_ids! Whose shape is [Batch, Sequence]
-  int32_t __global_seq_len =
-      model_subgraph->getTopRegion()->inputs().front()->cast_<ir::tensor::TensorValue>()->tensor_.size(1);
+  auto first_input = model_subgraph->getTopRegion()->inputs().front()->cast_<ir::tensor::TensorValue>();
+  if (first_input->tensor_.dims() < 2) {
+    MLLM_ERROR("SplitLLMGraphPass: Expected first input to have at least 2 dimensions, got {}", first_input->tensor_.dims());
+    return ir::PASS_RET_FAILURE;
+  }
+  int32_t __global_seq_len = first_input->tensor_.size(1);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Check seq length
// FIXME: We suppose the first input to LLM is tokend_ids! Whose shape is [Batch, Sequence]
int32_t __global_seq_len =
model_subgraph->getTopRegion()->inputs().front()->cast_<ir::tensor::TensorValue>()->tensor_.size(1);
// Check seq length
// FIXME: We suppose the first input to LLM is tokend_ids! Whose shape is [Batch, Sequence]
auto first_input = model_subgraph->getTopRegion()->inputs().front()->cast_<ir::tensor::TensorValue>();
if (first_input->tensor_.dims() < 2) {
MLLM_ERROR("SplitLLMGraphPass: Expected first input to have at least 2 dimensions, got {}", first_input->tensor_.dims());
return ir::PASS_RET_FAILURE;
}
int32_t __global_seq_len = first_input->tensor_.size(1);

Comment on lines +135 to +139
size_t last_dot_pos = g_w_g_name.find_last_of('.');
int layer_num = 0;
if (last_dot_pos != std::string::npos && last_dot_pos + 1 < g_w_g_name.length()) {
layer_num = std::stoi(g_w_g_name.substr(last_dot_pos + 1));
}
⚠️ Potential issue | 🟡 Minor

Handle potential exception from std::stoi.

If the subgraph name doesn't end with a valid integer (e.g., "model.foo"), std::stoi will throw. Consider using std::strtol or wrapping in try-catch.

🔎 Proposed fix
         size_t last_dot_pos = g_w_g_name.find_last_of('.');
         int layer_num = 0;
         if (last_dot_pos != std::string::npos && last_dot_pos + 1 < g_w_g_name.length()) {
-          layer_num = std::stoi(g_w_g_name.substr(last_dot_pos + 1));
+          try {
+            layer_num = std::stoi(g_w_g_name.substr(last_dot_pos + 1));
+          } catch (const std::exception& e) {
+            MLLM_WARN("SplitLLMGraphPass: Failed to parse layer number from '{}': {}", g_w_g_name, e.what());
+            continue;
+          }
         }
🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp around lines 135-139, the
call to std::stoi on the substring after the last '.' can throw if the suffix is
not a valid integer; replace this with a safe parse and fallback: either wrap
std::stoi in a try-catch and only set layer_num on success (leaving default 0 on
failure), or use std::strtol on the c_str() of the substring and check the
endptr to ensure the entire substring parsed as digits before assigning
layer_num; also ensure empty substrings are skipped and document the fallback
behavior.

Comment on lines +174 to +177
// FIXME: currently only support one graph!
MLLM_RT_ASSERT_EQ(__global_split_graphs, 1);
auto one_graph =
getCtx()->lookupSymbolTable("model.0.s" + std::to_string(__global_seq_len))->cast_<ir::graph::SubGraphOp>();
⚠️ Potential issue | 🟠 Major

Convert runtime assertion to early validation.

The assertion at line 175 (and line 217) will crash if split_graph > 1. Since only single-graph splitting is supported, validate this early and return PASS_RET_FAILURE with a clear error message.

🔎 Proposed fix (add after line 106)
   int32_t __global_split_graphs = cfg["split_graph"];
+  if (__global_split_graphs != 1) {
+    MLLM_ERROR("SplitLLMGraphPass: Currently only supports split_graph=1, got {}", __global_split_graphs);
+    return ir::PASS_RET_FAILURE;
+  }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp around lines 174-177 (and
also address the similar assertion at line 217), replace the runtime assertions
that crash when __global_split_graphs > 1 with an early validation: after line
106 add a check if (__global_split_graphs != 1) {
getCtx()->logger().error("SplitLLMGraphPass only supports single graph splitting
(split_graphs=%d)", __global_split_graphs); return PASS_RET_FAILURE; } and
remove or replace the MLLM_RT_ASSERT_EQ calls at 174-177 and 217 so the pass
fails gracefully with the clear error message instead of crashing.

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 4

🧹 Nitpick comments (1)
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp (1)

72-80: Consider adding a comment explaining the subgraph naming convention.

The regex pattern and validation logic are correct, but a brief comment explaining what "model.x.sN" represents (e.g., layer splits) would improve maintainability.

Example:

  // Validate subgraph names follow the convention:
  // - "model": base graph
  // - "model.x.sN": split subgraphs where x is the split index and N is the sequence number
  std::regex model_pattern(R"(^model(\.\d+\.s\d+)?$)");
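For reference, the naming convention accepted by that pattern can be exercised in isolation. A small sketch (the helper name is illustrative; only the regex literal comes from the review above):

```cpp
#include <regex>
#include <string>

// True for the base graph "model" or a split subgraph "model.<x>.s<N>",
// where <x> is the split index and <N> is the sequence length.
bool isValidModelGraphName(const std::string& name) {
  // static: compile the regex once rather than on every call.
  static const std::regex kModelPattern(R"(^model(\.\d+\.s\d+)?$)");
  return std::regex_match(name, kModelPattern);
}
```

Names like "model" and "model.0.s32" match; "model.0" (missing the sN part) and "model.foo" do not.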
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cdb7387 and 1f345cd.

📒 Files selected for processing (2)
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-android
  • GitHub Check: build-x86
  • GitHub Check: build-macos
🔇 Additional comments (1)
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp (1)

156-156: LGTM!

The factory function implementation is straightforward and correct.

Comment on lines +55 to +60
// Get the "model" subgraph
auto model_subgraph = getCtx()->lookupSymbolTable("model")->cast_<ir::graph::SubGraphOp>();
if (model_subgraph == nullptr) {
MLLM_ERROR("LLM2QnnLoweringPass: Cannot find 'model' subgraph in symbol table");
return ir::PASS_RET_FAILURE;
}

⚠️ Potential issue | 🟠 Major

Validate lookup result before casting.

lookupSymbolTable() may return nullptr, but the code casts before checking. If the symbol doesn't exist, this could cause undefined behavior.

🔎 Proposed fix
-  // Get the "model" subgraph
-  auto model_subgraph = getCtx()->lookupSymbolTable("model")->cast_<ir::graph::SubGraphOp>();
-  if (model_subgraph == nullptr) {
+  // Get the "model" subgraph
+  auto symbol_node = getCtx()->lookupSymbolTable("model");
+  if (symbol_node == nullptr) {
     MLLM_ERROR("LLM2QnnLoweringPass: Cannot find 'model' subgraph in symbol table");
     return ir::PASS_RET_FAILURE;
   }
+  auto model_subgraph = symbol_node->cast_<ir::graph::SubGraphOp>();
+  if (model_subgraph == nullptr) {
+    MLLM_ERROR("LLM2QnnLoweringPass: Symbol 'model' is not a SubGraphOp");
+    return ir::PASS_RET_FAILURE;
+  }
🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp around lines 55 to 60,
the result of getCtx()->lookupSymbolTable("model") is cast before being checked
for nullptr; change the logic to first store the lookup result in a local
variable, check if that pointer is nullptr and handle the error/return, and only
then perform the cast to ir::graph::SubGraphOp so you avoid casting a null
pointer and potential undefined behavior.

Comment on lines +126 to +150
subgraph_writer.walk<ir::linalg::LinalgIROp>(
[&](ir::IRWriter& this_tough_writer, const ir::linalg::LinalgIROp::ptr_t& linalg_op) -> ir::IRWriter::WalkResult {
if (!linalg_op->belongsTo()->getAttr("use_qnn")) {
MLLM_WARN("Found none qnn op: {} in graph: {}", linalg_op->getAOp()->getName(), subgraph_name);
return ir::IRWriter::WalkResult::WALK_BREAK;
}
bool processed = false;
for (auto& [op_type, pass] : named_pattern_) {
if (pass->isMatch(linalg_op)) {
if (!pass->rewrite(this_tough_writer, linalg_op)) {
MLLM_ERROR_EXIT(ExitCode::kCoreError, "Failed when processing op {} with pass {}",
linalg_op->getAOp()->getName(), optype2Str(op_type));
} else {
processed = true;
break;
}
}
}

if (!processed) {
MLLM_ERROR_EXIT(ExitCode::kCoreError, "Failed processing op {} on all passes", linalg_op->getAOp()->getName());
}

return ir::IRWriter::WalkResult::WALK_CONTINUE;
});

⚠️ Potential issue | 🟠 Major

Add null checks for chained method calls.

Lines 128-129 have chained method calls (belongsTo()->getAttr() and getAOp()->getName()) that could dereference null pointers if intermediate methods return nullptr.

🔎 Proposed fix
     subgraph_writer.walk<ir::linalg::LinalgIROp>(
         [&](ir::IRWriter& this_tough_writer, const ir::linalg::LinalgIROp::ptr_t& linalg_op) -> ir::IRWriter::WalkResult {
-          if (!linalg_op->belongsTo()->getAttr("use_qnn")) {
-            MLLM_WARN("Found none qnn op: {} in graph: {}", linalg_op->getAOp()->getName(), subgraph_name);
+          auto belongs_to = linalg_op->belongsTo();
+          if (!belongs_to || !belongs_to->getAttr("use_qnn")) {
+            auto aop = linalg_op->getAOp();
+            auto op_name = aop ? aop->getName() : "unknown";
+            MLLM_WARN("Found none qnn op: {} in graph: {}", op_name, subgraph_name);
             return ir::IRWriter::WalkResult::WALK_BREAK;
           }
           bool processed = false;
           for (auto& [op_type, pass] : named_pattern_) {
             if (pass->isMatch(linalg_op)) {
               if (!pass->rewrite(this_tough_writer, linalg_op)) {
+                auto aop = linalg_op->getAOp();
+                auto op_name = aop ? aop->getName() : "unknown";
                 MLLM_ERROR_EXIT(ExitCode::kCoreError, "Failed when processing op {} with pass {}",
-                                linalg_op->getAOp()->getName(), optype2Str(op_type));
+                                op_name, optype2Str(op_type));
               } else {
                 processed = true;
                 break;
               }
             }
           }
 
           if (!processed) {
-            MLLM_ERROR_EXIT(ExitCode::kCoreError, "Failed processing op {} on all passes", linalg_op->getAOp()->getName());
+            auto aop = linalg_op->getAOp();
+            auto op_name = aop ? aop->getName() : "unknown";
+            MLLM_ERROR_EXIT(ExitCode::kCoreError, "Failed processing op {} on all passes", op_name);
           }
 
           return ir::IRWriter::WalkResult::WALK_CONTINUE;
         });

…2QnnLoweringPass with AOT context handling, and create new Embedding files

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (3)
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp (3)

19-26: Fix TODO comment formatting and eliminate duplication.

The coding guidelines require TODO comments to follow the format TODO: (with colon), but these use TODO without a colon. Additionally, six identical TODO comments violate DRY principles.

Based on coding guidelines for TODO format.


56-61: Validate lookup result before casting.

lookupSymbolTable() may return nullptr, but the code casts before checking. If the symbol doesn't exist, this causes undefined behavior.


133-157: Add null checks for chained method calls.

Lines 135-136 have chained method calls (belongsTo()->getAttr() and getAOp()->getName()) that could dereference null pointers if intermediate methods return nullptr.

🧹 Nitpick comments (5)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (1)

202-216: Consider initializing qnn_ctx_handle_ to avoid undefined behavior.

The static analysis tool flagged that QnnDeviceAndContext has uninitialized members. While name_, graphs_, and static_tensor_ have default constructors, qnn_ctx_handle_ (of type Qnn_ContextHandle_t) may contain garbage if not explicitly initialized before use.

🔎 Proposed fix
 struct QnnDeviceAndContext {
   using ptr_t = std::shared_ptr<QnnDeviceAndContext>;

   std::string name_;
   Qnn_LogHandle_t log_ = nullptr;
   Qnn_BackendHandle_t bk_handle_ = nullptr;
   Qnn_DeviceHandle_t device_handle_ = nullptr;
   QnnBackend_Config_t** bk_cfg_ = nullptr;
   QnnContext_Config_t** qnn_context_config_ = nullptr;
   Qnn_ProfileHandle_t profile_bk_handle_ = nullptr;
-  Qnn_ContextHandle_t qnn_ctx_handle_;
+  Qnn_ContextHandle_t qnn_ctx_handle_ = nullptr;

   std::unordered_map<std::string, QnnAOTGraph::ptr_t> graphs_;              //< for persistence keep graphs.
   std::unordered_map<std::string, QnnAOTNodeTensor::ptr_t> static_tensor_;  //< for weight sharing.
 };
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (1)

506-513: Use consistent types for loop indices to avoid sign comparison warnings.

The loop variables are declared as int but compared against .size() which returns size_t. This can cause signed/unsigned comparison warnings and potential issues with very large collections.

🔎 Proposed fix
   // Inputs
   Qnn_Tensor_t* qnn_inputs_array = (Qnn_Tensor_t*)malloc(qnn_op->inputs.size() * sizeof(Qnn_Tensor_t));
   qnn_op->unreachable_handle_.emplace_back(qnn_inputs_array);
-  for (int i = 0; i < qnn_op->inputs.size(); ++i) { qnn_inputs_array[i] = *qnn_op->inputs[i]->getQnnTensor(); }
+  for (size_t i = 0; i < qnn_op->inputs.size(); ++i) { qnn_inputs_array[i] = *qnn_op->inputs[i]->getQnnTensor(); }

   // Outputs
   Qnn_Tensor_t* qnn_outputs_array = (Qnn_Tensor_t*)malloc(qnn_op->outputs.size() * sizeof(Qnn_Tensor_t));
   qnn_op->unreachable_handle_.emplace_back(qnn_outputs_array);
-  for (int i = 0; i < qnn_op->outputs.size(); ++i) { qnn_outputs_array[i] = *qnn_op->outputs[i]->getQnnTensor(); }
+  for (size_t i = 0; i < qnn_op->outputs.size(); ++i) { qnn_outputs_array[i] = *qnn_op->outputs[i]->getQnnTensor(); }
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp (3)

91-97: Avoid compiling regex inside a loop.

The regex R"(^model\.\d+\.s\d+$)" is compiled on every iteration of the loop. Since the pattern is constant, compile it once outside the loop for better performance.

🔎 Proposed fix
+  // Pre-compile regex patterns
+  static const std::regex model_subgraph_pattern(R"(^model\.\d+\.s\d+$)");
+
   // Validate that at least one model.x.sN subgraph exists (required for the lowering)
   // We don't require specifically model.0.s32, but any model.x.sN pattern
   bool has_valid_subgraph = false;
   for (const auto& [name, _] : subgraph_map_) {
-    if (std::regex_match(name, std::regex(R"(^model\.\d+\.s\d+$)"))) {
+    if (std::regex_match(name, model_subgraph_pattern)) {
       has_valid_subgraph = true;
       break;
     }
   }

73-81: Consider making the regex pattern a class-level constant.

The model_pattern regex is used for validation but could be reused. Making it a static const at class or namespace level improves clarity and avoids recompilation if this function is called multiple times.

🔎 Proposed refactor

At namespace level (around line 16):

namespace {
const std::regex kModelPattern(R"(^model(\.\d+\.s\d+)?$)");
const std::regex kModelSubgraphPattern(R"(^model\.\d+\.s\d+$)");
}  // namespace

Then use kModelPattern at line 75 and kModelSubgraphPattern at line 93.


159-161: Consider more informative error handling on compile failure.

If aot_graph->compile() returns false, the assertion fails without context about which subgraph failed. Adding the subgraph name to the error message would aid debugging.

🔎 Proposed fix
     // Compile
-    MLLM_RT_ASSERT(aot_graph->compile());
+    if (!aot_graph->compile()) {
+      MLLM_ERROR("LLM2QnnLoweringPass: Failed to compile AOT graph for subgraph '{}'", subgraph_name);
+      return ir::PASS_RET_FAILURE;
+    }
   }
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1f345cd and 65c7818.

📒 Files selected for processing (5)
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.hpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
  • mllm/backends/qnn/aot/visitor/Embedding.cpp
  • mllm/backends/qnn/aot/visitor/Embedding.hpp
🧰 Additional context used
📓 Path-based instructions (4)

📄 CodeRabbit inference engine (.github/copilot-instructions.md) — the same four path-glob guideline sets quoted in the first review above. They apply to all three changed files (mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp, mllm/backends/qnn/aot/QnnWrappersAPI.cpp, mllm/backends/qnn/aot/QnnWrappersAPI.hpp), except that the production-quality set's glob excludes the header QnnWrappersAPI.hpp.
🧠 Learnings (3)
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh} : TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.

Applied to files:

  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Add comments for complex algorithms or non-obvious logic.

Applied to files:

  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh} : Encourage consistent coding style and patterns with the existing codebase.

Applied to files:

  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
🧬 Code graph analysis (2)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (2)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (16)
  • v (103-105)
  • v (103-103)
  • v (112-112)
  • v (114-114)
  • v (116-116)
  • v (118-118)
  • QnnAOTGraph (181-181)
  • g_name (183-186)
  • g_name (183-183)
  • create (48-50)
  • create (48-48)
  • name (158-158)
  • name (273-273)
  • name (275-275)
  • name (277-277)
  • name (298-298)
mllm/backends/qnn/aot/passes/AOTCompileContext.hpp (2)
  • env (26-26)
  • env (26-26)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (1)
mllm/backends/qnn/QNNUtils.hpp (2)
  • name (191-192)
  • name (195-196)
🪛 Clang (14.0.6)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp

[error] 183-183: 2 adjacent parameters of 'create' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 202-202: constructor does not initialize these fields: name_, qnn_ctx_handle_, graphs_, static_tensor_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-x86
  • GitHub Check: build-android
  • GitHub Check: build-macos
🔇 Additional comments (6)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (3)

109-110: LGTM!

Clean inline accessor following the existing pattern used in QnnAOTParamTensor::getQnnTensor().


181-186: LGTM!

The constructor and factory method follow the established patterns in this file (e.g., QnnAOTNodeTensor::create). The factory correctly delegates to the constructor.


298-299: getContext() should validate the context name exists or document its behavior.

The implementation uses operator[] on contexts_ which inserts and returns a default-constructed (null) shared_ptr if the name doesn't exist. While current callers (at lines 522 and 535) use belongs_context_name_ which is validated upstream via MLLM_RT_ASSERT(contexts_.count(qnn_context_name) == 1) before being stored, the public method itself lacks defensive bounds checking. Consider either adding a bounds check and returning nullptr explicitly, or document that callers must ensure the context exists.

mllm/backends/qnn/aot/QnnWrappersAPI.cpp (2)

468-477: LGTM with a note on error handling.

The constructor correctly initializes members, links to context, and calls graphCreate. The assertion on QNN_SUCCESS ensures failures are caught during development. For production robustness, consider whether a recoverable error path would be appropriate.


229-229: LGTM!

Good addition to handle the "constant" attribute for marking tensors as static. The priority ordering (attribute overrides tensor type) is consistent with the existing pattern for qnn_graph_outputs and qnn_graph_inputs.

mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp (1)

1-15: LGTM on file structure and includes.

The file follows the project's copyright header format and includes the necessary headers for IR operations, passes, and QNN AOT infrastructure.

Comment on lines +114 to +119
// FIXME: Only support one context right now.
{
int split_graph = aot_cfg["split_graph"];
MLLM_RT_ASSERT_EQ(split_graph, 1);
aot_env->createContext("context.0", true);
}

⚠️ Potential issue | 🟡 Minor

Add validation for JSON config access and avoid hardcoded context name.

Line 116 accesses aot_cfg["split_graph"] without checking if the key exists or if the value is the expected type. This could throw or cause undefined behavior with malformed config. Additionally, "context.0" is hardcoded, reducing flexibility.

🔎 Proposed fix with validation
   // FIXME: Only support one context right now.
   {
-    int split_graph = aot_cfg["split_graph"];
-    MLLM_RT_ASSERT_EQ(split_graph, 1);
-    aot_env->createContext("context.0", true);
+    if (!aot_cfg.contains("split_graph") || !aot_cfg["split_graph"].is_number_integer()) {
+      MLLM_ERROR("LLM2QnnLoweringPass: 'split_graph' config key missing or invalid");
+      return ir::PASS_RET_FAILURE;
+    }
+    int split_graph = aot_cfg["split_graph"].get<int>();
+    MLLM_RT_ASSERT_EQ(split_graph, 1);
+    aot_env->createContext("context.0", true);
   }

Comment on lines +868 to +873
qnn_htp_func_symbols_.qnn_interface_.tensorCreateContextTensor(contexts_[qnn_context_name]->qnn_ctx_handle_,
ret->getQnnTensor());
} else {
contexts_[qnn_context_name]->graphs_[graph_name]->all_tensors_.insert({__qnn_tensor_name, ret});
qnn_htp_func_symbols_.qnn_interface_.tensorCreateGraphTensor(
contexts_[qnn_context_name]->graphs_[graph_name]->qnn_graph_handle_, ret->getQnnTensor());

⚠️ Potential issue | 🟠 Major

Missing error handling for tensor creation API calls.

The tensorCreateContextTensor and tensorCreateGraphTensor return values are not checked. Other QNN API calls in this file consistently check for QNN_SUCCESS. If tensor creation fails silently, it could cause hard-to-debug issues downstream.

🔎 Proposed fix
   if (__qnn_enable_static_weight) {
     contexts_[qnn_context_name]->static_tensor_.insert({__qnn_tensor_name, ret});
-    qnn_htp_func_symbols_.qnn_interface_.tensorCreateContextTensor(contexts_[qnn_context_name]->qnn_ctx_handle_,
-                                                                   ret->getQnnTensor());
+    auto status = qnn_htp_func_symbols_.qnn_interface_.tensorCreateContextTensor(
+        contexts_[qnn_context_name]->qnn_ctx_handle_, ret->getQnnTensor());
+    MLLM_RT_ASSERT_EQ(status, QNN_SUCCESS);
   } else {
     contexts_[qnn_context_name]->graphs_[graph_name]->all_tensors_.insert({__qnn_tensor_name, ret});
-    qnn_htp_func_symbols_.qnn_interface_.tensorCreateGraphTensor(
+    auto status = qnn_htp_func_symbols_.qnn_interface_.tensorCreateGraphTensor(
         contexts_[qnn_context_name]->graphs_[graph_name]->qnn_graph_handle_, ret->getQnnTensor());
+    MLLM_RT_ASSERT_EQ(status, QNN_SUCCESS);
   }
🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/QnnWrappersAPI.cpp around lines 868 to 873, the calls
to qnn_interface_.tensorCreateContextTensor and tensorCreateGraphTensor are not
checking return values; capture each call's return value, compare it to
QNN_SUCCESS, and on failure log an error including the tensor name and the
context/graph name, perform any necessary cleanup (remove inserted map entry or
release allocated objects), and propagate the failure (return an error/nullptr
or throw) consistent with other QNN API error handling in this file.

Comment on lines +879 to +880
std::shared_ptr<QnnDeviceAndContext> QnnAOTEnv::getContext(const std::string& name) { return contexts_[name]; }


⚠️ Potential issue | 🟡 Minor

getContext returns null shared_ptr for missing keys without indication.

Using operator[] on an unordered_map inserts a default-constructed value (null shared_ptr) if the key doesn't exist. This could lead to null pointer dereferences if callers don't check. Consider using at() (throws on missing key) or find() with explicit null handling.

🔎 Proposed fix using at() for fail-fast behavior
-std::shared_ptr<QnnDeviceAndContext> QnnAOTEnv::getContext(const std::string& name) { return contexts_[name]; }
+std::shared_ptr<QnnDeviceAndContext> QnnAOTEnv::getContext(const std::string& name) {
+  auto it = contexts_.find(name);
+  if (it == contexts_.end()) {
+    MLLM_ERROR("QnnAOTEnv::getContext: context '{}' not found", name);
+    return nullptr;
+  }
+  return it->second;
+}
🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/QnnWrappersAPI.cpp around lines 879-880, replace the
use of operator[] which inserts a default null shared_ptr for missing keys with
a fail-fast access: use contexts_.at(name) so a missing key throws
std::out_of_range (or, if you prefer explicit handling, use contexts_.find(name)
and return nullptr or throw a descriptive exception). Ensure the function either
allows the exception to propagate or throws a clear std::runtime_error with
context about the missing device name so callers are not left with a null
shared_ptr.


@UbiquitousLearning UbiquitousLearning left a comment

LGTM

@chenghuaWang chenghuaWang merged commit 6fbccd0 into UbiquitousLearning:main Dec 30, 2025
4 checks passed