[Bug][MLIR] Type mismatch index vs i64 in MoE expert-mask codegen #228

## Summary
PyTorchSim's MLIR codegen for the expert-mask step of MoE routing emits invalid IR: the same SSA value is defined as `vector<Nxindex>` but consumed by `arith.cmpi` as `vector<Nxi64>`. `mlir-opt` rejects the IR, and the build fails at `extension_codecache.py:194` (`assert(0)`).

## Repro
Minimal model that triggers it: a 1-layer DeepSeek-V3 forward pass.

```
python scripts/op_coverage.py --models deepseek_v3
```

(Uses transformers 4.51.3, batch=1, seq_len=32, fp32, num_hidden_layers=1, with n_routed_experts=8 / n_group=2 / topk_group=1.)

The graph fragment that triggers it is the boolean indicator `(j == expert_idx)` used to mask scores, produced by Inductor as `aten._to_copy + aten.bitwise_not + aten.masked_fill + aten.topk`. Affected kernel signature:

```
func.func @kernel(%in_ptr0: memref<32xi64>,
                  %in_ptr1: memref<256xf32>,
                  %out_ptr0: memref<256xf32>)
```

`%in_ptr0` is the top-k expert-id buffer (i64), used both as i64 (compared) and as index (in addressing arithmetic).

## Error
```
.../kernel.mlir:68:48: error: use of value '%tmp19' expects different type than prior uses:
                       'vector<16xi64>' vs 'vector<16xindex>'
    %tmp20 = arith.cmpi eq, %tmp1, %tmp19 : vector<16xi64>
                                   ^
.../kernel.mlir:67:17: note: prior use here
    %tmp19 = arith.addi %tmp18, %const16 : vector<16xindex>
```

Surrounding IR snippet:
```
%tmp17 = vector.broadcast %index1 : index to vector<16xindex>
%tmp18 = arith.addi %tmp17, %tmp16  : vector<16xindex>
%tmp19 = arith.addi %tmp18, %const16 : vector<16xindex>
%tmp20 = arith.cmpi eq, %tmp1, %tmp19 : vector<16xi64>   // <- type mismatch
%tmp25 = arith.select %tmp20, %ones, %zeros : vector<16xi1>, vector<16xf32>
%tmp26 = arith.fptosi %tmp25 : vector<16xf32> to vector<16xi8>
```
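To check whether other generated kernels hit the same problem, a small text scan can flag `arith.cmpi` operands whose recorded definition type differs from the compare's annotated type. The sketch below is a hypothetical triage helper, not part of PyTorchSim: it is regex-based, only understands the few op forms that appear in the snippet above (`vector.broadcast`, `arith.addi`/`subi`/`muli`, `arith.cmpi`), and silently skips operands whose definition it has not seen.

```python
import re

# SSA names such as %tmp19 or %const16.
_NAME = r"%[\w.$-]+"

# Ops whose result type equals the trailing type annotation.
SAME_TYPE_DEF = re.compile(
    rf"^\s*({_NAME})\s*=\s*arith\.(?:addi|subi|muli)\b.*:\s*(\S+)\s*$"
)
# vector.broadcast: the result type is whatever follows `to`.
BROADCAST_DEF = re.compile(
    rf"^\s*({_NAME})\s*=\s*vector\.broadcast\b.*\bto\s+(\S+)\s*$"
)
# arith.cmpi: the trailing type is the type of both operands.
CMPI_USE = re.compile(
    rf"arith\.cmpi\s+\w+\s*,\s*({_NAME})\s*,\s*({_NAME})\s*:\s*(\S+)"
)


def find_cmpi_type_conflicts(ir_text):
    """Return (line_no, value, defined_type, expected_type) for each mismatch."""
    defined = {}
    conflicts = []
    for line_no, line in enumerate(ir_text.splitlines(), start=1):
        definition = SAME_TYPE_DEF.match(line) or BROADCAST_DEF.match(line)
        if definition:
            defined[definition.group(1)] = definition.group(2)
            continue
        use = CMPI_USE.search(line)
        if use:
            *operands, expected = use.groups()
            for name in operands:
                actual = defined.get(name)
                # Operands with no recorded definition are skipped, not flagged.
                if actual is not None and actual != expected:
                    conflicts.append((line_no, name, actual, expected))
    return conflicts


# The four relevant lines from the failing kernel in this issue.
SAMPLE_IR = "\n".join([
    "%tmp17 = vector.broadcast %index1 : index to vector<16xindex>",
    "%tmp18 = arith.addi %tmp17, %tmp16  : vector<16xindex>",
    "%tmp19 = arith.addi %tmp18, %const16 : vector<16xindex>",
    "%tmp20 = arith.cmpi eq, %tmp1, %tmp19 : vector<16xi64>",
])

for line_no, name, actual, expected in find_cmpi_type_conflicts(SAMPLE_IR):
    print(f"line {line_no}: {name} defined as {actual}, used as {expected}")
```

On the sample it reports `%tmp19` only; `%tmp1` is not defined in the excerpt, so it is not checked. `mlir-opt` remains the authoritative verifier.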

## Suggested fix
An `arith.index_cast` is missing between the `vector<16xindex>` lane id (`%tmp19`) and the `vector<16xi64>` compare in `arith.cmpi`; alternatively, both operands should be normalized to one type before the compare. The bug is likely in the MLIR template that lowers the `(arange % N == expert_idx)` indicator pattern.
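A minimal sketch of what the normalization could look like on the codegen side, assuming the emitter knows each operand's MLIR type as a string. `emit_cmpi_eq`, `is_index_type`, and the `_cast` name suffix are hypothetical and do not correspond to existing PyTorchSim functions. Casting the index side to the integer side is one option; casting in the other direction would work equally well for the verifier.

```python
def is_index_type(mlir_type):
    """True for `index` and for vectors of index such as `vector<16xindex>`."""
    return mlir_type == "index" or mlir_type.endswith("xindex>")


def emit_cmpi_eq(result, lhs, lhs_type, rhs, rhs_type):
    """Emit IR lines for `arith.cmpi eq`, inserting arith.index_cast if needed.

    Hypothetical helper: when exactly one operand is index-typed, it is cast
    to the other operand's type so both sides of the compare agree.
    """
    lines = []
    if lhs_type != rhs_type:
        if is_index_type(rhs_type) and not is_index_type(lhs_type):
            cast = f"{rhs}_cast"
            lines.append(
                f"{cast} = arith.index_cast {rhs} : {rhs_type} to {lhs_type}"
            )
            rhs, rhs_type = cast, lhs_type
        elif is_index_type(lhs_type) and not is_index_type(rhs_type):
            cast = f"{lhs}_cast"
            lines.append(
                f"{cast} = arith.index_cast {lhs} : {lhs_type} to {rhs_type}"
            )
            lhs, lhs_type = cast, rhs_type
        else:
            # Other mismatches (e.g. i32 vs i64) need a different cast op.
            raise TypeError(f"cannot reconcile {lhs_type} and {rhs_type}")
    lines.append(f"{result} = arith.cmpi eq, {lhs}, {rhs} : {lhs_type}")
    return lines


# Operand names and types taken from the failing kernel in this issue.
for ir_line in emit_cmpi_eq(
    "%tmp20", "%tmp1", "vector<16xi64>", "%tmp19", "vector<16xindex>"
):
    print(ir_line)
```

For the failing compare this yields an `arith.index_cast` of `%tmp19` to `vector<16xi64>` followed by the `arith.cmpi` on the cast value, so `%tmp19` keeps a single type across all of its uses.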

## Environment
- transformers 4.51.3, torch 2.8.0+cu126, python 3.11
- mlir-opt from /riscv-llvm/bin (PSAL-POSTECH/llvm-project v1.0.8)
- TOGSim build on develop @ feature/build-pins-and-op-coverage

## Why this matters
Blocks any MoE model that goes through `topk -> bitwise_not -> masked_fill` (currently observed for deepseek_v3; likely affects other group-topk routing patterns).
