Summary
PyTorchSim's MLIR codegen for the expert-mask step of MoE routing emits invalid IR: the same SSA value is defined as vector<Nxindex> but consumed by arith.cmpi as vector<Nxi64>. mlir-opt rejects the IR, and the build then fails at the assert(0) in extension_codecache.py:194.
Repro
Minimal LLM that triggers it: a 1-layer DeepSeek-V3 forward.
python scripts/op_coverage.py --models deepseek_v3
(Uses transformers 4.51.3, batch=1, seq_len=32, fp32, num_hidden_layers=1, with n_routed_experts=8 / n_group=2 / topk_group=1.)
The graph fragment that triggers it is the boolean indicator (j == expert_idx) used to mask scores, produced by Inductor as aten._to_copy + aten.bitwise_not + aten.masked_fill + aten.topk. Affected kernel signature:
func.func @kernel(%in_ptr0: memref<32xi64>,
%in_ptr1: memref<256xf32>,
%out_ptr0: memref<256xf32>)
%in_ptr0 is the top-k expert-id buffer (i64), used both as i64 (compared) and as index (in addressing arithmetic).
Error
.../kernel.mlir:68:48: error: use of value '%tmp19' expects different type than prior uses:
'vector<16xi64>' vs 'vector<16xindex>'
%tmp20 = arith.cmpi eq, %tmp1, %tmp19 : vector<16xi64>
^
.../kernel.mlir:67:17: note: prior use here
%tmp19 = arith.addi %tmp18, %const16 : vector<16xindex>
Surrounding IR snippet:
%tmp17 = vector.broadcast %index1 : index to vector<16xindex>
%tmp18 = arith.addi %tmp17, %tmp16 : vector<16xindex>
%tmp19 = arith.addi %tmp18, %const16 : vector<16xindex>
%tmp20 = arith.cmpi eq, %tmp1, %tmp19 : vector<16xi64>  // <- type mismatch: %tmp19 is vector<16xindex>
%tmp25 = arith.select %tmp20, %ones, %zeros : vector<16xi1>, vector<16xf32>
%tmp26 = arith.fptosi %tmp25 : vector<16xf32> to vector<16xi8>
Suggested fix
An arith.index_cast is missing between the vector<16xindex> lane id and the i64 compare operand (or both should be normalized to one type before arith.cmpi). Likely in the MLIR template that lowers the (arange % N == expert_idx) indicator pattern.
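To make the proposed fix concrete, here is a minimal sketch of the normalization step, written as a standalone Python helper that builds MLIR text. The function name `emit_cmpi_eq`, its parameters, and the `%tmp19_i64` cast name are hypothetical and are not taken from PyTorchSim's codegen; the real fix would need to live wherever the template emits the compare.

```python
import re

# Matches one-dimensional vector types such as "vector<16xi64>".
_VEC_RE = re.compile(r"^vector<(\d+)x(\w+)>$")


def emit_cmpi_eq(lhs, lhs_type, rhs, rhs_type, result, cast_name):
    """Return MLIR lines computing `result = (lhs == rhs)`.

    Hypothetical helper, not PyTorchSim API. If the operand types differ,
    rhs is first converted to lhs_type with arith.index_cast, so that
    arith.cmpi sees a single operand type.
    """
    lines = []
    if lhs_type != rhs_type:
        l_match = _VEC_RE.match(lhs_type)
        r_match = _VEC_RE.match(rhs_type)
        if not (l_match and r_match) or l_match.group(1) != r_match.group(1):
            raise ValueError(f"incompatible shapes: {lhs_type} vs {rhs_type}")
        # arith.index_cast only converts between index and integer types.
        if "index" not in (l_match.group(2), r_match.group(2)):
            raise ValueError("arith.index_cast needs one index-typed operand")
        lines.append(
            f"{cast_name} = arith.index_cast {rhs} : {rhs_type} to {lhs_type}"
        )
        rhs = cast_name
    lines.append(f"{result} = arith.cmpi eq, {lhs}, {rhs} : {lhs_type}")
    return lines


if __name__ == "__main__":
    for line in emit_cmpi_eq(
        "%tmp1", "vector<16xi64>",
        "%tmp19", "vector<16xindex>",
        "%tmp20", "%tmp19_i64",
    ):
        print(line)
```

Applied to the failing compare, this would emit an `arith.index_cast` of `%tmp19` to `vector<16xi64>` followed by an `arith.cmpi` whose operands are both `vector<16xi64>`. Casting the index side to i64 (rather than the reverse) is one reasonable choice here because `%tmp1` is loaded from an i64 buffer; normalizing both to index would also type-check.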
Environment
- transformers 4.51.3, torch 2.8.0+cu126, python 3.11
- mlir-opt from /riscv-llvm/bin (PSAL-POSTECH/llvm-project v1.0.8)
- TOGSim build on develop @ feature/build-pins-and-op-coverage
Why this matters
Blocks any MoE model that goes through topk -> bitwise_not -> masked_fill (currently observed for deepseek_v3; likely affects other group-topk routing patterns).