[Bug][MLIR] Type mismatch index vs i64 in MoE expert-mask codegen #228

@YWHyuk

Summary

PyTorchSim's MLIR codegen for the expert-mask step of MoE routing emits invalid IR: an SSA value is defined as vector<Nxindex> but then consumed by arith.cmpi as vector<Nxi64>. mlir-opt rejects the IR, and the build fails at the assert(0) in extension_codecache.py:194.

Repro

Smallest model found so far that triggers it: a 1-layer DeepSeek-V3 forward pass.

python scripts/op_coverage.py --models deepseek_v3

(Uses transformers 4.51.3, batch=1, seq_len=32, fp32, num_hidden_layers=1, with n_routed_experts=8 / n_group=2 / topk_group=1.)

The graph fragment that triggers it is the boolean indicator (j == expert_idx) used to mask scores, produced by Inductor as aten._to_copy + aten.bitwise_not + aten.masked_fill + aten.topk. Affected kernel signature:

func.func @kernel(%in_ptr0: memref<32xi64>,
                  %in_ptr1: memref<256xf32>,
                  %out_ptr0: memref<256xf32>)

%in_ptr0 is the top-k expert-id buffer (i64), used both as i64 (compared) and as index (in addressing arithmetic).

Error

.../kernel.mlir:68:48: error: use of value '%tmp19' expects different type than prior uses:
                       'vector<16xi64>' vs 'vector<16xindex>'
    %tmp20 = arith.cmpi eq, %tmp1, %tmp19 : vector<16xi64>
                                   ^
.../kernel.mlir:67:17: note: prior use here
    %tmp19 = arith.addi %tmp18, %const16 : vector<16xindex>

Surrounding IR snippet:

%tmp17 = vector.broadcast %index1 : index to vector<16xindex>
%tmp18 = arith.addi %tmp17, %tmp16  : vector<16xindex>
%tmp19 = arith.addi %tmp18, %const16 : vector<16xindex>
%tmp20 = arith.cmpi eq, %tmp1, %tmp19 : vector<16xi64>   // <- type mismatch: %tmp19 is vector<16xindex>
%tmp25 = arith.select %tmp20, %ones, %zeros : vector<16xi1>, vector<16xf32>
%tmp26 = arith.fptosi %tmp25 : vector<16xf32> to vector<16xi8>

Suggested fix

An arith.index_cast is missing between the vector<16xindex> lane id and the i64 compare operand (or both should be normalized to one type before arith.cmpi). Likely in the MLIR template that lowers the (arange % N == expert_idx) indicator pattern.
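
To make the suggestion concrete, here is a minimal Python sketch of how a codegen helper could normalize operand types before emitting arith.cmpi. The names emit_cmpi and parse_vec are hypothetical and are not PyTorchSim's actual API; the sketch assumes 1-D vector types written as vector<NxT>, and it always casts the index operand to the other operand's integer type.

```python
import re


def parse_vec(ty: str) -> tuple[int, str]:
    """Split a 1-D vector type such as 'vector<16xi64>' into (16, 'i64')."""
    m = re.fullmatch(r"vector<(\d+)x(\w+)>", ty)
    if m is None:
        raise ValueError(f"unsupported type: {ty}")
    return int(m.group(1)), m.group(2)


def emit_cmpi(pred, result, lhs, lhs_ty, rhs, rhs_ty):
    """Return MLIR lines for arith.cmpi (hypothetical helper, not PyTorchSim code).

    If exactly one operand has element type 'index', an arith.index_cast
    to the other operand's type is emitted first.
    """
    lines = []
    if lhs_ty != rhs_ty:
        (l_len, l_el), (r_len, r_el) = parse_vec(lhs_ty), parse_vec(rhs_ty)
        if l_len != r_len:
            raise ValueError(f"vector length mismatch: {lhs_ty} vs {rhs_ty}")
        if r_el == "index" and l_el != "index":
            cast = f"{rhs}_cast"
            lines.append(f"{cast} = arith.index_cast {rhs} : {rhs_ty} to {lhs_ty}")
            rhs, rhs_ty = cast, lhs_ty
        elif l_el == "index" and r_el != "index":
            cast = f"{lhs}_cast"
            lines.append(f"{cast} = arith.index_cast {lhs} : {lhs_ty} to {rhs_ty}")
            lhs, lhs_ty = cast, rhs_ty
        else:
            raise ValueError(f"cannot reconcile {lhs_ty} and {rhs_ty}")
    lines.append(f"{result} = arith.cmpi {pred}, {lhs}, {rhs} : {lhs_ty}")
    return lines


# The failing compare from the snippet above, with operand types made explicit.
for line in emit_cmpi("eq", "%tmp20", "%tmp1", "vector<16xi64>",
                      "%tmp19", "vector<16xindex>"):
    print(line)
```

For the failing compare this produces an arith.index_cast of %tmp19 from vector<16xindex> to vector<16xi64>, followed by an arith.cmpi whose operands are both vector<16xi64>. Whether the cast belongs at the compare or earlier, where the lane id is built, depends on the actual template.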

Environment

  • transformers 4.51.3, torch 2.8.0+cu126, python 3.11
  • mlir-opt from /riscv-llvm/bin (PSAL-POSTECH/llvm-project v1.0.8)
  • TOGSim build on develop @ feature/build-pins-and-op-coverage

Why this matters

Blocks any MoE model that goes through topk -> bitwise_not -> masked_fill (currently observed for deepseek_v3; likely affects other group-topk routing patterns).
