feat: add LLM2QnnLoweringPass and update graph splitting logic #577
📝 Walkthrough

Adds configuration-driven LLM graph splitting and a new LLM-to-QNN lowering pass to the QNN AOT pipeline; updates the pipeline ordering to run SplitLLMGraphPass, MarkTensorIOPass, then LLM2QnnLoweringPass; adds a `ptr_t` alias for `Val` and a `layers` field in the example config.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Pipeline as AOT Pipeline
    participant Config as AOTCompileContext
    participant Split as SplitLLMGraphPass
    participant MarkIO as MarkTensorIOPass
    participant Lower as LLM2QnnLoweringPass
    participant Graph as Model/Subgraphs
    Pipeline->>Split: run(module)
    Split->>Config: read quant_recipe.layers\nand split_graphs
    Split->>Graph: locate "model" subgraph\ncompute seq_len from inputs
    Split->>Split: create split placeholders\nassign qnn_context_name/qnn_graph_name
    Split->>Graph: move LinalgIROp into\ncorresponding split subgraphs
    Split->>Graph: rewire CallGraphOp and IO\nprune obsolete subgraphs
    Split->>Pipeline: return success
    Pipeline->>MarkIO: run(on updated graph)
    MarkIO->>Graph: mark tensor IO attributes
    MarkIO->>Pipeline: return success
    Pipeline->>Lower: run(on marked graph)
    Lower->>Graph: iterate ordered subgraphs\nmatch & rewrite LinalgIROp with patterns
    Lower->>Pipeline: return success
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks and finishing touches: ❌ 2 warning checks failed, ✅ 1 check passed.
… check in LLM2QnnLoweringPass
Actionable comments posted: 5
🧹 Nitpick comments (6)
mllm/backends/qnn/aot/passes/AOTPipeline.cpp (1)
3-3: Unused include while pass is disabled. The `LLM2QnnLoweringPass.hpp` header is included but `createLLM2QnnLoweringPass()` is commented out at line 27. Consider either removing the include until the pass is enabled, or adding a comment explaining why it is staged for future activation.

mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp (1)
16-27: Consider explicitly deleting copy/move operations. Static analysis flags that the class defines a destructor but not the other special member functions. Since `Pass` objects typically shouldn't be copied during pipeline execution, consider explicitly deleting them:

🔎 Proposed fix

```diff
 class LLM2QnnLoweringPass final : public ir::Pass {
  public:
   LLM2QnnLoweringPass();
   ~LLM2QnnLoweringPass() override = default;
+
+  LLM2QnnLoweringPass(const LLM2QnnLoweringPass&) = delete;
+  LLM2QnnLoweringPass& operator=(const LLM2QnnLoweringPass&) = delete;
+  LLM2QnnLoweringPass(LLM2QnnLoweringPass&&) = delete;
+  LLM2QnnLoweringPass& operator=(LLM2QnnLoweringPass&&) = delete;

   uint8_t run(const ir::node_ptr_t& op) override;
```

mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp (2)
17-25: Remove redundant TODO comments. The repeated TODO comments add noise without additional value; a single TODO with a clear description is sufficient.

🔎 Proposed fix

```diff
 LLM2QnnLoweringPass::LLM2QnnLoweringPass() {
   named_pattern_.insert(QnnAOTAddPattern::create());
-  // TODO reg other patterns here.
-  // TODO reg other patterns here.
-  // TODO reg other patterns here.
-  // TODO reg other patterns here.
-  // TODO reg other patterns here.
-  // TODO reg other patterns here.
+  // TODO: Register additional patterns (e.g., Mul, MatMul, Softmax, etc.)
 }
```
90-96: Avoid creating a regex inside the loop; reuse the existing pattern. The regex at line 92 is redundant: `model_pattern` (line 74) already matches this pattern. Reuse it to avoid repeated compilation.

🔎 Proposed fix

```diff
-  // Validate that at least one model.x.sN subgraph exists (required for the lowering)
-  // We don't require specifically model.0.s32, but any model.x.sN pattern
-  bool has_valid_subgraph = false;
-  for (const auto& [name, _] : subgraph_map_) {
-    if (std::regex_match(name, std::regex(R"(^model\.\d+\.s\d+$)"))) {
-      has_valid_subgraph = true;
-      break;
-    }
-  }
+  // Validate that at least one model.x.sN subgraph exists (required for the lowering)
+  // subgraph_map_ excludes "model", so all entries must match model.x.sN
+  bool has_valid_subgraph = !subgraph_map_.empty();
```

mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp (2)
20-22: Use more descriptive parameter and variable names. Per static analysis and readability best practices, rename `g` to `subgraph` and `_` to `writer` for clarity.

🔎 Proposed fix

```diff
 void recursiveAttachGraphNameAndContextName(const ir::IRContext::ptr_t& ctx, const std::string& qnn_context_name,
-                                            const std::string& qnn_graph_name, ir::graph::SubGraphOp::ptr_t& g) {
-  auto _ = ir::IRWriter(ctx, g->getTopRegion());
-  _.walk<ir::Op>([&](ir::IRWriter& w /*writer*/, const ir::Op::ptr_t& owo) -> ir::IRWriter::WalkResult {
+                                            const std::string& qnn_graph_name, ir::graph::SubGraphOp::ptr_t& subgraph) {
+  auto writer = ir::IRWriter(ctx, subgraph->getTopRegion());
+  writer.walk<ir::Op>([&](ir::IRWriter& /* unused */, const ir::Op::ptr_t& op) -> ir::IRWriter::WalkResult {
```
186-188: Use a more descriptive variable name. The variable `wwww` is unclear; consider renaming it to `graph_writer` or `subgraph_writer` for clarity.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- examples/qwen3_qnn_aot/qnn_aot_cfg.json
- mllm/backends/qnn/aot/passes/AOTPipeline.cpp
- mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
- mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp
- mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp
- mllm/compile/ir/Node.hpp
- pymllm/backends/qualcomm/transformers/static_qwen3_quantizer.py
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).
Files:
- mllm/compile/ir/Node.hpp
- mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp
- mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
- mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp
- mllm/backends/qnn/aot/passes/AOTPipeline.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.
Files:
- mllm/compile/ir/Node.hpp
- mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp
- mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
- mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp
- mllm/backends/qnn/aot/passes/AOTPipeline.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).
Files:
- mllm/compile/ir/Node.hpp
- mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp
- mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
- mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp
- mllm/backends/qnn/aot/passes/AOTPipeline.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.
Files:
- mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
- mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp
- mllm/backends/qnn/aot/passes/AOTPipeline.cpp
🧬 Code graph analysis (4)
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp (3)
- mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp (1): LLM2QnnLoweringPass (17-25)
- mllm/compile/ir/Node.hpp (2): op (470-470), op (472-472)
- mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.hpp (1): MergeLLMHeadIntoMainGraphPass (9-22)

mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp (3)
- mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp (3): LLM2QnnLoweringPass (18-18), LLM2QnnLoweringPass (20-20), op (22-22)
- mllm/compile/ir/Node.hpp (12): create (267-302), create (267-267), create (371-403), create (371-371), op (470-470), op (472-472), name (304-304), name (308-308), _ (65-65), _ (65-65), region (231-231), region (324-324)
- mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp (2): run (58-231), run (58-58)

mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp (5)
- mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp (2): run (27-147), run (27-27)
- mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp (2): run (16-109), run (16-16)
- mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp (1): op (22-22)
- mllm/backends/qnn/aot/passes/SplitLLMGraphPass.hpp (1): op (17-17)
- mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.hpp (1): op (17-17)

mllm/backends/qnn/aot/passes/AOTPipeline.cpp (4)
- mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp (2): createSplitLLMGraphPass (233-233), createSplitLLMGraphPass (233-233)
- mllm/backends/qnn/aot/passes/SplitLLMGraphPass.hpp (1): createSplitLLMGraphPass (20-20)
- mllm/backends/qnn/aot/passes/MarkTensorIO.cpp (2): createMarkTensorIOPass (120-120), createMarkTensorIOPass (120-120)
- mllm/backends/qnn/aot/passes/MarkTensorIO.hpp (1): createMarkTensorIOPass (20-20)
🪛 Clang (14.0.6)
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp
[error] 16-16: class 'LLM2QnnLoweringPass' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator
(cppcoreguidelines-special-member-functions,-warnings-as-errors)
mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp
[error] 20-20: 3 adjacent parameters of 'recursiveAttachGraphNameAndContextName' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 21-21: parameter name 'g' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 22-22: variable name '_' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 37-37: parameter name 'g' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 38-38: variable name '_' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-x86
- GitHub Check: build-macos
- GitHub Check: build-android
🔇 Additional comments (5)
examples/qwen3_qnn_aot/qnn_aot_cfg.json (1)

18-18: LGTM! The new `layers` field correctly specifies the total number of transformer layers for the Qwen3 model, and is consumed by `SplitLLMGraphPass` for configuration-driven graph splitting.

mllm/compile/ir/Node.hpp (1)

194-195: LGTM! The `ptr_t` alias follows the established pattern used by `Region` (line 119) and `Op` (line 155), providing consistency across IR node types.

mllm/backends/qnn/aot/passes/AOTPipeline.cpp (1)

25-27: Pass ordering looks correct. The pipeline flow is logical: graph marking → op naming → head merging → quantization → graph splitting → tensor I/O marking. The commented-out `createLLM2QnnLoweringPass()` indicates future work is pending.

mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp (1)

119-143: Verify behavior: `WALK_BREAK` on a non-QNN op skips the remaining ops in the subgraph. When encountering an op without the `use_qnn` attribute, the code logs a warning and breaks, leaving the remaining ops in that subgraph unprocessed. If this is intentional fail-fast behavior, consider using `MLLM_ERROR` instead of `MLLM_WARN` for clarity; otherwise, return `WALK_CONTINUE` to skip the op and proceed.

mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp (1)

37-54: The iterator safety mechanism is properly implemented. `IRWriter::removeOp()` correctly sets `is_iterator_modified_ = true` (line 310 in Node.cpp), and the walk loop respects this flag by skipping the increment when the iterator has already been advanced by `erase()`. This pattern is safe and correctly uses the framework's iterator modification tracking.
```cpp
if (call_graph_op == nullptr) {
  MLLM_ERROR("LLM2QnnLoweringPass: No CallGraphOp found in ModuleOp");
  return ir::PASS_RET_FAILURE;
}

// Check call graph op point to a subgraph named "model"
auto symbol_attr = call_graph_op->getSymbolAttr();
if (symbol_attr == nullptr || symbol_attr->str() != "model") {
  MLLM_ERROR("LLM2QnnLoweringPass: CallGraphOp should point to a subgraph named 'model'");
  return ir::PASS_RET_FAILURE;
}

// Get the "model" subgraph
auto model_subgraph = getCtx()->lookupSymbolTable("model")->cast_<ir::graph::SubGraphOp>();
if (model_subgraph == nullptr) {
  MLLM_ERROR("LLM2QnnLoweringPass: Cannot find 'model' subgraph in symbol table");
  return ir::PASS_RET_FAILURE;
}
```
Error messages reference wrong pass name.
The error messages at lines 75, 82, and 89 incorrectly reference LLM2QnnLoweringPass instead of SplitLLMGraphPass.
🔎 Proposed fix
```diff
   if (call_graph_op == nullptr) {
-    MLLM_ERROR("LLM2QnnLoweringPass: No CallGraphOp found in ModuleOp");
+    MLLM_ERROR("SplitLLMGraphPass: No CallGraphOp found in ModuleOp");
     return ir::PASS_RET_FAILURE;
   }

   // Check call graph op point to a subgraph named "model"
   auto symbol_attr = call_graph_op->getSymbolAttr();
   if (symbol_attr == nullptr || symbol_attr->str() != "model") {
-    MLLM_ERROR("LLM2QnnLoweringPass: CallGraphOp should point to a subgraph named 'model'");
+    MLLM_ERROR("SplitLLMGraphPass: CallGraphOp should point to a subgraph named 'model'");
     return ir::PASS_RET_FAILURE;
   }

   // Get the "model" subgraph
   auto model_subgraph = getCtx()->lookupSymbolTable("model")->cast_<ir::graph::SubGraphOp>();
   if (model_subgraph == nullptr) {
-    MLLM_ERROR("LLM2QnnLoweringPass: Cannot find 'model' subgraph in symbol table");
+    MLLM_ERROR("SplitLLMGraphPass: Cannot find 'model' subgraph in symbol table");
     return ir::PASS_RET_FAILURE;
   }
```
```cpp
auto cfg = AOTCompileContext::getInstance().getConfig();
int32_t __global_total_layers = cfg["quant_recipe"]["layers"];
int32_t __global_split_graphs = cfg["split_graph"];

// Check seq length
// FIXME: We suppose the first input to LLM is tokend_ids! Whose shape is [Batch, Sequence]
int32_t __global_seq_len =
    model_subgraph->getTopRegion()->inputs().front()->cast_<ir::tensor::TensorValue>()->tensor_.size(1);
```
Add validation for required config fields.
Direct access to cfg["quant_recipe"]["layers"] and cfg["split_graph"] will throw if keys are missing. Add validation before use.
🔎 Proposed fix
```diff
   // Count how many layers we have.
   auto cfg = AOTCompileContext::getInstance().getConfig();
+  if (!cfg.contains("quant_recipe") || !cfg["quant_recipe"].contains("layers")) {
+    MLLM_ERROR("SplitLLMGraphPass: Config missing 'quant_recipe.layers'");
+    return ir::PASS_RET_FAILURE;
+  }
+  if (!cfg.contains("split_graph")) {
+    MLLM_ERROR("SplitLLMGraphPass: Config missing 'split_graph'");
+    return ir::PASS_RET_FAILURE;
+  }
   int32_t __global_total_layers = cfg["quant_recipe"]["layers"];
   int32_t __global_split_graphs = cfg["split_graph"];
```
```cpp
// Check seq length
// FIXME: We suppose the first input to LLM is tokend_ids! Whose shape is [Batch, Sequence]
int32_t __global_seq_len =
    model_subgraph->getTopRegion()->inputs().front()->cast_<ir::tensor::TensorValue>()->tensor_.size(1);
```
Add bounds check for tensor shape access.
Accessing tensor_.size(1) assumes the tensor has at least 2 dimensions. Add validation to prevent out-of-bounds access.
🔎 Proposed fix
```diff
   // Check seq length
   // FIXME: We suppose the first input to LLM is tokend_ids! Whose shape is [Batch, Sequence]
-  int32_t __global_seq_len =
-      model_subgraph->getTopRegion()->inputs().front()->cast_<ir::tensor::TensorValue>()->tensor_.size(1);
+  auto first_input = model_subgraph->getTopRegion()->inputs().front()->cast_<ir::tensor::TensorValue>();
+  if (first_input->tensor_.dims() < 2) {
+    MLLM_ERROR("SplitLLMGraphPass: Expected first input to have at least 2 dimensions, got {}", first_input->tensor_.dims());
+    return ir::PASS_RET_FAILURE;
+  }
+  int32_t __global_seq_len = first_input->tensor_.size(1);
```
```cpp
size_t last_dot_pos = g_w_g_name.find_last_of('.');
int layer_num = 0;
if (last_dot_pos != std::string::npos && last_dot_pos + 1 < g_w_g_name.length()) {
  layer_num = std::stoi(g_w_g_name.substr(last_dot_pos + 1));
}
```
Handle potential exception from std::stoi.
If the subgraph name doesn't end with a valid integer (e.g., "model.foo"), std::stoi will throw. Consider using std::strtol or wrapping in try-catch.
🔎 Proposed fix
```diff
   size_t last_dot_pos = g_w_g_name.find_last_of('.');
   int layer_num = 0;
   if (last_dot_pos != std::string::npos && last_dot_pos + 1 < g_w_g_name.length()) {
-    layer_num = std::stoi(g_w_g_name.substr(last_dot_pos + 1));
+    try {
+      layer_num = std::stoi(g_w_g_name.substr(last_dot_pos + 1));
+    } catch (const std::exception& e) {
+      MLLM_WARN("SplitLLMGraphPass: Failed to parse layer number from '{}': {}", g_w_g_name, e.what());
+      continue;
+    }
   }
```
```cpp
// FIXME: currently only support one graph!
MLLM_RT_ASSERT_EQ(__global_split_graphs, 1);
auto one_graph =
    getCtx()->lookupSymbolTable("model.0.s" + std::to_string(__global_seq_len))->cast_<ir::graph::SubGraphOp>();
```
Convert runtime assertion to early validation.
The assertion at line 175 (and line 217) will crash if split_graph > 1. Since only single-graph splitting is supported, validate this early and return PASS_RET_FAILURE with a clear error message.
🔎 Proposed fix (add after line 106)
```diff
   int32_t __global_split_graphs = cfg["split_graph"];
+  if (__global_split_graphs != 1) {
+    MLLM_ERROR("SplitLLMGraphPass: Currently only supports split_graph=1, got {}", __global_split_graphs);
+    return ir::PASS_RET_FAILURE;
+  }
```
Actionable comments posted: 4
🧹 Nitpick comments (1)
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp (1)

72-80: Consider adding a comment explaining the subgraph naming convention. The regex pattern and validation logic are correct, but a brief comment explaining what "model.x.sN" represents (e.g., layer splits) would improve maintainability.

Example:

```cpp
// Validate subgraph names follow the convention:
//   - "model": base graph
//   - "model.x.sN": split subgraphs where x is the split index and N is the sequence number
std::regex model_pattern(R"(^model(\.\d+\.s\d+)?$)");
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- mllm/backends/qnn/aot/passes/AOTPipeline.cpp
- mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
- mllm/backends/qnn/aot/passes/AOTPipeline.cpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-android
- GitHub Check: build-x86
- GitHub Check: build-macos
🔇 Additional comments (1)
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp (1)
156-156: LGTM! The factory function implementation is straightforward and correct.
```cpp
// Get the "model" subgraph
auto model_subgraph = getCtx()->lookupSymbolTable("model")->cast_<ir::graph::SubGraphOp>();
if (model_subgraph == nullptr) {
  MLLM_ERROR("LLM2QnnLoweringPass: Cannot find 'model' subgraph in symbol table");
  return ir::PASS_RET_FAILURE;
}
```
Validate lookup result before casting.
lookupSymbolTable() may return nullptr, but the code casts before checking. If the symbol doesn't exist, this could cause undefined behavior.
🔎 Proposed fix
```diff
   // Get the "model" subgraph
-  auto model_subgraph = getCtx()->lookupSymbolTable("model")->cast_<ir::graph::SubGraphOp>();
-  if (model_subgraph == nullptr) {
+  auto symbol_node = getCtx()->lookupSymbolTable("model");
+  if (symbol_node == nullptr) {
     MLLM_ERROR("LLM2QnnLoweringPass: Cannot find 'model' subgraph in symbol table");
     return ir::PASS_RET_FAILURE;
   }
+  auto model_subgraph = symbol_node->cast_<ir::graph::SubGraphOp>();
+  if (model_subgraph == nullptr) {
+    MLLM_ERROR("LLM2QnnLoweringPass: Symbol 'model' is not a SubGraphOp");
+    return ir::PASS_RET_FAILURE;
+  }
```
```cpp
subgraph_writer.walk<ir::linalg::LinalgIROp>(
    [&](ir::IRWriter& this_tough_writer, const ir::linalg::LinalgIROp::ptr_t& linalg_op) -> ir::IRWriter::WalkResult {
      if (!linalg_op->belongsTo()->getAttr("use_qnn")) {
        MLLM_WARN("Found none qnn op: {} in graph: {}", linalg_op->getAOp()->getName(), subgraph_name);
        return ir::IRWriter::WalkResult::WALK_BREAK;
      }
      bool processed = false;
      for (auto& [op_type, pass] : named_pattern_) {
        if (pass->isMatch(linalg_op)) {
          if (!pass->rewrite(this_tough_writer, linalg_op)) {
            MLLM_ERROR_EXIT(ExitCode::kCoreError, "Failed when processing op {} with pass {}",
                            linalg_op->getAOp()->getName(), optype2Str(op_type));
          } else {
            processed = true;
            break;
          }
        }
      }

      if (!processed) {
        MLLM_ERROR_EXIT(ExitCode::kCoreError, "Failed processing op {} on all passes", linalg_op->getAOp()->getName());
      }

      return ir::IRWriter::WalkResult::WALK_CONTINUE;
    });
```
Add null checks for chained method calls.
Lines 128-129 have chained method calls (belongsTo()->getAttr() and getAOp()->getName()) that could dereference null pointers if intermediate methods return nullptr.
🔎 Proposed fix
subgraph_writer.walk<ir::linalg::LinalgIROp>(
[&](ir::IRWriter& this_tough_writer, const ir::linalg::LinalgIROp::ptr_t& linalg_op) -> ir::IRWriter::WalkResult {
- if (!linalg_op->belongsTo()->getAttr("use_qnn")) {
- MLLM_WARN("Found none qnn op: {} in graph: {}", linalg_op->getAOp()->getName(), subgraph_name);
+ auto belongs_to = linalg_op->belongsTo();
+ if (!belongs_to || !belongs_to->getAttr("use_qnn")) {
+ auto aop = linalg_op->getAOp();
+ auto op_name = aop ? aop->getName() : "unknown";
+ MLLM_WARN("Found none qnn op: {} in graph: {}", op_name, subgraph_name);
return ir::IRWriter::WalkResult::WALK_BREAK;
}
bool processed = false;
for (auto& [op_type, pass] : named_pattern_) {
if (pass->isMatch(linalg_op)) {
if (!pass->rewrite(this_tough_writer, linalg_op)) {
+ auto aop = linalg_op->getAOp();
+ auto op_name = aop ? aop->getName() : "unknown";
MLLM_ERROR_EXIT(ExitCode::kCoreError, "Failed when processing op {} with pass {}",
- linalg_op->getAOp()->getName(), optype2Str(op_type));
+ op_name, optype2Str(op_type));
} else {
processed = true;
break;
}
}
}
if (!processed) {
- MLLM_ERROR_EXIT(ExitCode::kCoreError, "Failed processing op {} on all passes", linalg_op->getAOp()->getName());
+ auto aop = linalg_op->getAOp();
+ auto op_name = aop ? aop->getName() : "unknown";
+ MLLM_ERROR_EXIT(ExitCode::kCoreError, "Failed processing op {} on all passes", op_name);
}
return ir::IRWriter::WalkResult::WALK_CONTINUE;
});
…2QnnLoweringPass with AOT context handling, and create new Embedding files
…e tensor handling in LLM2QnnLoweringPass
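The null-safe name lookup used in the fix above boils down to breaking a chained call into checked steps. A standalone sketch with invented stand-in types (`AOp`, `LinalgOp`, and `safe_op_name` are not the real mllm classes):

```cpp
#include <cassert>
#include <memory>
#include <string>

struct AOp {
  std::string name;
};

struct LinalgOp {
  std::shared_ptr<AOp> aop;  // may be null
  std::shared_ptr<AOp> getAOp() const { return aop; }
};

// Resolve a display name through one checked step at a time, falling back
// to a placeholder so logging never dereferences a null intermediate.
std::string safe_op_name(const LinalgOp& op) {
  auto aop = op.getAOp();
  return aop ? aop->name : "unknown";
}
```

The "unknown" fallback keeps the diagnostic path itself from crashing, which matters most precisely when the IR is already in an unexpected state.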
Actionable comments posted: 3
♻️ Duplicate comments (3)
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp (3)
19-26: Fix TODO comment formatting and eliminate duplication. The coding guidelines require TODO comments to follow the format `TODO:` (with colon), but these use `TODO` without a colon. Additionally, six identical TODO comments violate DRY principles. Based on coding guidelines for TODO format.
56-61: Validate lookup result before casting.
`lookupSymbolTable()` may return `nullptr`, but the code casts before checking. If the symbol doesn't exist, this causes undefined behavior.
133-157: Add null checks for chained method calls. Lines 135-136 have chained method calls (`belongsTo()->getAttr()` and `getAOp()->getName()`) that could dereference null pointers if intermediate methods return `nullptr`.
🧹 Nitpick comments (5)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (1)
202-216: Consider initializing `qnn_ctx_handle_` to avoid undefined behavior. The static analysis tool flagged that `QnnDeviceAndContext` has uninitialized members. While `name_`, `graphs_`, and `static_tensor_` have default constructors, `qnn_ctx_handle_` (of type `Qnn_ContextHandle_t`) may contain garbage if not explicitly initialized before use.
🔎 Proposed fix
struct QnnDeviceAndContext {
  using ptr_t = std::shared_ptr<QnnDeviceAndContext>;
  std::string name_;
  Qnn_LogHandle_t log_ = nullptr;
  Qnn_BackendHandle_t bk_handle_ = nullptr;
  Qnn_DeviceHandle_t device_handle_ = nullptr;
  QnnBackend_Config_t** bk_cfg_ = nullptr;
  QnnContext_Config_t** qnn_context_config_ = nullptr;
  Qnn_ProfileHandle_t profile_bk_handle_ = nullptr;
- Qnn_ContextHandle_t qnn_ctx_handle_;
+ Qnn_ContextHandle_t qnn_ctx_handle_ = nullptr;
  std::unordered_map<std::string, QnnAOTGraph::ptr_t> graphs_;  //< for persistence keep graphs.
  std::unordered_map<std::string, QnnAOTNodeTensor::ptr_t> static_tensor_;  //< for weight sharing.
};
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (1)
506-513: Use consistent types for loop indices to avoid sign comparison warnings. The loop variables are declared as `int` but compared against `.size()`, which returns `size_t`. This can cause signed/unsigned comparison warnings and potential issues with very large collections.
🔎 Proposed fix
// Inputs
Qnn_Tensor_t* qnn_inputs_array = (Qnn_Tensor_t*)malloc(qnn_op->inputs.size() * sizeof(Qnn_Tensor_t));
qnn_op->unreachable_handle_.emplace_back(qnn_inputs_array);
- for (int i = 0; i < qnn_op->inputs.size(); ++i) { qnn_inputs_array[i] = *qnn_op->inputs[i]->getQnnTensor(); }
+ for (size_t i = 0; i < qnn_op->inputs.size(); ++i) { qnn_inputs_array[i] = *qnn_op->inputs[i]->getQnnTensor(); }
// Outputs
Qnn_Tensor_t* qnn_outputs_array = (Qnn_Tensor_t*)malloc(qnn_op->outputs.size() * sizeof(Qnn_Tensor_t));
qnn_op->unreachable_handle_.emplace_back(qnn_outputs_array);
- for (int i = 0; i < qnn_op->outputs.size(); ++i) { qnn_outputs_array[i] = *qnn_op->outputs[i]->getQnnTensor(); }
+ for (size_t i = 0; i < qnn_op->outputs.size(); ++i) { qnn_outputs_array[i] = *qnn_op->outputs[i]->getQnnTensor(); }
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp (3)
91-97: Avoid compiling regex inside a loop. The regex `R"(^model\.\d+\.s\d+$)"` is compiled on every iteration of the loop. Since the pattern is constant, compile it once outside the loop for better performance.
🔎 Proposed fix
+ // Pre-compile regex patterns
+ static const std::regex model_subgraph_pattern(R"(^model\.\d+\.s\d+$)");
+
  // Validate that at least one model.x.sN subgraph exists (required for the lowering)
  // We don't require specifically model.0.s32, but any model.x.sN pattern
  bool has_valid_subgraph = false;
  for (const auto& [name, _] : subgraph_map_) {
-   if (std::regex_match(name, std::regex(R"(^model\.\d+\.s\d+$)"))) {
+   if (std::regex_match(name, model_subgraph_pattern)) {
      has_valid_subgraph = true;
      break;
    }
  }
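Hoisting a constant pattern out of the loop can be shown in isolation with plain `std::regex`; the helper name below is invented for the sketch:

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// The pattern is constant, so it is compiled exactly once (on first call)
// instead of once per loop iteration.
bool any_model_subgraph(const std::vector<std::string>& names) {
  static const std::regex pattern(R"(^model\.\d+\.s\d+$)");
  for (const auto& name : names) {
    if (std::regex_match(name, pattern)) return true;
  }
  return false;
}
```

`std::regex` construction parses and compiles the pattern, which is the expensive step; matching against an already-built regex is comparatively cheap.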
73-81: Consider making the regex pattern a class-level constant. The `model_pattern` regex is used for validation but could be reused. Making it a `static const` at class or namespace level improves clarity and avoids recompilation if this function is called multiple times.
🔎 Proposed refactor
At namespace level (around line 16):
namespace {
const std::regex kModelPattern(R"(^model(\.\d+\.s\d+)?$)");
const std::regex kModelSubgraphPattern(R"(^model\.\d+\.s\d+$)");
}  // namespace
Then use `kModelPattern` at line 75 and `kModelSubgraphPattern` at line 93.
159-161: Consider more informative error handling on compile failure. If `aot_graph->compile()` returns false, the assertion fails without context about which subgraph failed. Adding the subgraph name to the error message would aid debugging.
🔎 Proposed fix
  // Compile
- MLLM_RT_ASSERT(aot_graph->compile());
+ if (!aot_graph->compile()) {
+   MLLM_ERROR("LLM2QnnLoweringPass: Failed to compile AOT graph for subgraph '{}'", subgraph_name);
+   return ir::PASS_RET_FAILURE;
+ }
}
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp
mllm/backends/qnn/aot/QnnWrappersAPI.hpp
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
mllm/backends/qnn/aot/visitor/Embedding.cpp
mllm/backends/qnn/aot/visitor/Embedding.hpp
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).
Files:
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
mllm/backends/qnn/aot/QnnWrappersAPI.cpp
mllm/backends/qnn/aot/QnnWrappersAPI.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.
Files:
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
mllm/backends/qnn/aot/QnnWrappersAPI.cpp
mllm/backends/qnn/aot/QnnWrappersAPI.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.
Files:
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
mllm/backends/qnn/aot/QnnWrappersAPI.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).
Files:
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
mllm/backends/qnn/aot/QnnWrappersAPI.cpp
mllm/backends/qnn/aot/QnnWrappersAPI.hpp
🧠 Learnings (3)
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh} : TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Applied to files:
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Add comments for complex algorithms or non-obvious logic.
Applied to files:
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh} : Encourage consistent coding style and patterns with the existing codebase.
Applied to files:
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
🧬 Code graph analysis (2)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (2)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (16)
v (103-105), v (103-103), v (112-112), v (114-114), v (116-116), v (118-118), QnnAOTGraph (181-181), g_name (183-186), g_name (183-183), create (48-50), create (48-48), name (158-158), name (273-273), name (275-275), name (277-277), name (298-298)
mllm/backends/qnn/aot/passes/AOTCompileContext.hpp (2)
env (26-26), env (26-26)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (1)
mllm/backends/qnn/QNNUtils.hpp (2)
name (191-192), name (195-196)
🪛 Clang (14.0.6)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp
[error] 183-183: 2 adjacent parameters of 'create' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 202-202: constructor does not initialize these fields: name_, qnn_ctx_handle_, graphs_, static_tensor_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-x86
- GitHub Check: build-android
- GitHub Check: build-macos
🔇 Additional comments (6)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (3)
109-110: LGTM! Clean inline accessor following the existing pattern used in `QnnAOTParamTensor::getQnnTensor()`.
181-186: LGTM! The constructor and factory method follow the established patterns in this file (e.g., `QnnAOTNodeTensor::create`). The factory correctly delegates to the constructor.
298-299: `getContext()` should validate the context name exists or document its behavior. The implementation uses `operator[]` on `contexts_`, which inserts and returns a default-constructed (null) shared_ptr if the name doesn't exist. While current callers (at lines 522 and 535) use `belongs_context_name_`, which is validated upstream via `MLLM_RT_ASSERT(contexts_.count(qnn_context_name) == 1)` before being stored, the public method itself lacks defensive bounds checking. Consider either adding a bounds check and returning nullptr explicitly, or document that callers must ensure the context exists.
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (2)
468-477: LGTM with a note on error handling. The constructor correctly initializes members, links to context, and calls `graphCreate`. The assertion on `QNN_SUCCESS` ensures failures are caught during development. For production robustness, consider whether a recoverable error path would be appropriate.
229-229: LGTM! Good addition to handle the "constant" attribute for marking tensors as static. The priority ordering (attribute overrides tensor type) is consistent with the existing pattern for `qnn_graph_outputs` and `qnn_graph_inputs`.
1-15: LGTM on file structure and includes. The file follows the project's copyright header format and includes the necessary headers for IR operations, passes, and QNN AOT infrastructure.
// FIXME: Only support one context right now.
{
  int split_graph = aot_cfg["split_graph"];
  MLLM_RT_ASSERT_EQ(split_graph, 1);
  aot_env->createContext("context.0", true);
}
Add validation for JSON config access and avoid hardcoded context name.
Line 116 accesses aot_cfg["split_graph"] without checking if the key exists or if the value is the expected type. This could throw or cause undefined behavior with malformed config. Additionally, "context.0" is hardcoded, reducing flexibility.
🔎 Proposed fix with validation
// FIXME: Only support one context right now.
{
- int split_graph = aot_cfg["split_graph"];
- MLLM_RT_ASSERT_EQ(split_graph, 1);
- aot_env->createContext("context.0", true);
+ if (!aot_cfg.contains("split_graph") || !aot_cfg["split_graph"].is_number_integer()) {
+ MLLM_ERROR("LLM2QnnLoweringPass: 'split_graph' config key missing or invalid");
+ return ir::PASS_RET_FAILURE;
+ }
+ int split_graph = aot_cfg["split_graph"].get<int>();
+ MLLM_RT_ASSERT_EQ(split_graph, 1);
+ aot_env->createContext("context.0", true);
qnn_htp_func_symbols_.qnn_interface_.tensorCreateContextTensor(contexts_[qnn_context_name]->qnn_ctx_handle_,
                                                               ret->getQnnTensor());
} else {
  contexts_[qnn_context_name]->graphs_[graph_name]->all_tensors_.insert({__qnn_tensor_name, ret});
  qnn_htp_func_symbols_.qnn_interface_.tensorCreateGraphTensor(
      contexts_[qnn_context_name]->graphs_[graph_name]->qnn_graph_handle_, ret->getQnnTensor());
Missing error handling for tensor creation API calls.
The tensorCreateContextTensor and tensorCreateGraphTensor return values are not checked. Other QNN API calls in this file consistently check for QNN_SUCCESS. If tensor creation fails silently, it could cause hard-to-debug issues downstream.
🔎 Proposed fix
if (__qnn_enable_static_weight) {
contexts_[qnn_context_name]->static_tensor_.insert({__qnn_tensor_name, ret});
- qnn_htp_func_symbols_.qnn_interface_.tensorCreateContextTensor(contexts_[qnn_context_name]->qnn_ctx_handle_,
- ret->getQnnTensor());
+ auto status = qnn_htp_func_symbols_.qnn_interface_.tensorCreateContextTensor(
+ contexts_[qnn_context_name]->qnn_ctx_handle_, ret->getQnnTensor());
+ MLLM_RT_ASSERT_EQ(status, QNN_SUCCESS);
} else {
contexts_[qnn_context_name]->graphs_[graph_name]->all_tensors_.insert({__qnn_tensor_name, ret});
- qnn_htp_func_symbols_.qnn_interface_.tensorCreateGraphTensor(
+ auto status = qnn_htp_func_symbols_.qnn_interface_.tensorCreateGraphTensor(
contexts_[qnn_context_name]->graphs_[graph_name]->qnn_graph_handle_, ret->getQnnTensor());
+ MLLM_RT_ASSERT_EQ(status, QNN_SUCCESS);
}🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/QnnWrappersAPI.cpp around lines 868 to 873, the calls
to qnn_interface_.tensorCreateContextTensor and tensorCreateGraphTensor are not
checking return values; capture each call's return value, compare it to
QNN_SUCCESS, and on failure log an error including the tensor name and the
context/graph name, perform any necessary cleanup (remove inserted map entry or
release allocated objects), and propagate the failure (return an error/nullptr
or throw) consistent with other QNN API error handling in this file.
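The pattern behind this fix — check every fallible call and attach context to the failure — can be sketched without the QNN SDK; `Status`, `create_tensor_api`, and `create_tensor_checked` below are invented stand-ins:

```cpp
#include <cassert>
#include <string>

enum class Status { kOk = 0, kFail = 1 };

// Stand-in for a C-style API that reports failure via a return code.
Status create_tensor_api(bool simulate_failure) {
  return simulate_failure ? Status::kFail : Status::kOk;
}

// Check the return code and build an error message carrying the tensor
// name, instead of silently discarding the status.
bool create_tensor_checked(bool simulate_failure, const std::string& tensor_name,
                           std::string& error_out) {
  Status status = create_tensor_api(simulate_failure);
  if (status != Status::kOk) {
    error_out = "tensor creation failed for '" + tensor_name + "'";
    return false;
  }
  return true;
}
```

Capturing the tensor and context names at the failure site is what makes the difference between a one-line diagnosis and a downstream crash with no trail back to the bad tensor.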
std::shared_ptr<QnnDeviceAndContext> QnnAOTEnv::getContext(const std::string& name) { return contexts_[name]; }
getContext returns null shared_ptr for missing keys without indication.
Using operator[] on an unordered_map inserts a default-constructed value (null shared_ptr) if the key doesn't exist. This could lead to null pointer dereferences if callers don't check. Consider using at() (throws on missing key) or find() with explicit null handling.
🔎 Proposed fix using at() for fail-fast behavior
-std::shared_ptr<QnnDeviceAndContext> QnnAOTEnv::getContext(const std::string& name) { return contexts_[name]; }
+std::shared_ptr<QnnDeviceAndContext> QnnAOTEnv::getContext(const std::string& name) {
+ auto it = contexts_.find(name);
+ if (it == contexts_.end()) {
+ MLLM_ERROR("QnnAOTEnv::getContext: context '{}' not found", name);
+ return nullptr;
+ }
+ return it->second;
+}🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/QnnWrappersAPI.cpp around lines 879-880, replace the
use of operator[] which inserts a default null shared_ptr for missing keys with
a fail-fast access: use contexts_.at(name) so a missing key throws
std::out_of_range (or, if you prefer explicit handling, use contexts_.find(name)
and return nullptr or throw a descriptive exception). Ensure the function either
allows the exception to propagate or throws a clear std::runtime_error with
context about the missing device name so callers are not left with a null
shared_ptr.
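The `find()`-based lookup suggested above can be demonstrated with plain standard-library types; `Context`, `ContextMap`, and `get_context` are illustrative stand-ins, not the real `QnnAOTEnv` API:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

struct Context {};
using ContextMap = std::unordered_map<std::string, std::shared_ptr<Context>>;

// find() keeps the map const and makes the missing-key case explicit;
// operator[] would silently insert a default-constructed (null) shared_ptr.
std::shared_ptr<Context> get_context(const ContextMap& contexts, const std::string& name) {
  auto it = contexts.find(name);
  return it == contexts.end() ? nullptr : it->second;
}
```

A side benefit of `find()` is that the accessor can take the map by const reference, so a read-only lookup can never mutate the container — `operator[]` rules that out because it may insert.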
UbiquitousLearning
left a comment
LGTM
Summary by CodeRabbit
New Features
Refactor
Chores