Describe the issue
The WebNN developer preview provides a text-generation demo with several LLM models (Phi-3 Mini 4K Instruct, DeepSeek R1 Distill Qwen, TinyLlama, Qwen2) that were generated with ONNX Runtime GenAI.
These models share a similar architecture (GQA, MatMulNBits, RotaryEmbedding, ...). When testing them with the WebGPU EP, only Phi-3 Mini 4K Instruct produced unexpected output; the other models worked fine.
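For context, this is a minimal sketch of the session setup the demo's `?provider=webgpu` path corresponds to, using the standard onnxruntime-web API. The model URL and file layout are assumptions (the demo streams its models from Hugging Face) and are not taken from the demo source.

```typescript
// Minimal sketch, assuming the model location below; only the onnxruntime-web
// calls themselves (InferenceSession.create, executionProviders) are standard API.
import * as ort from 'onnxruntime-web/webgpu'; // webgpu-enabled bundle

// Assumed location of the q4f16 Phi-3 Mini 4K Instruct model used by the demo.
const MODEL_URL =
  'https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx-web/resolve/main/onnx/model_q4f16.onnx';

async function createPhi3Session(): Promise<ort.InferenceSession> {
  ort.env.logLevel = 'verbose'; // surface EP / kernel assignment details in the console

  const session = await ort.InferenceSession.create(MODEL_URL, {
    executionProviders: ['webgpu'],
    graphOptimizationLevel: 'all',
    // externalData would also be needed if the weights live in a separate .onnx.data file.
  });

  console.log('inputs:', session.inputNames);
  console.log('outputs:', session.outputNames);
  return session;
}

createPhi3Session().catch(console.error);
```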
To reproduce
Test Phi-3 Mini 4K Instruct:
- Access https://microsoft.github.io/webnn-developer-preview/demos/text-generation/?provider=webgpu&model=phi3mini&ort=latest from either Edge or Chrome browser
- Wait a moment for the model to load
- Type a question in the input box and check the generated response
Test others:
- TinyLlama: https://microsoft.github.io/webnn-developer-preview/demos/text-generation/?provider=webgpu&model=tinyllama&ort=latest
- Qwen2: https://microsoft.github.io/webnn-developer-preview/demos/text-generation/?provider=webgpu&model=qwen2&ort=latest
- DeepSeek R1 Distill Qwen: https://microsoft.github.io/webnn-developer-preview/demos/text-generation/?provider=webgpu&model=deepseekr1&ort=latest
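For isolating the issue outside the demo page, a hedged sketch is to create the same session under both the WebGPU and the WASM EP and compare their outputs for one prompt; a divergence only under `webgpu` would confirm the problem is EP-specific rather than a demo bug. The model URL is an assumption, and the demo's tokenizer, KV-cache handling, and decoding loop are intentionally omitted.

```typescript
// Sketch: create the same Phi-3 session under two execution providers so their
// outputs can be diffed for a single prompt. URL is an assumption; the decoding
// loop from the demo is omitted.
import * as ort from 'onnxruntime-web';

const PHI3_URL =
  'https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx-web/resolve/main/onnx/model_q4f16.onnx';

async function createSession(ep: 'webgpu' | 'wasm'): Promise<ort.InferenceSession> {
  return ort.InferenceSession.create(PHI3_URL, { executionProviders: [ep] });
}

async function main(): Promise<void> {
  const webgpuSession = await createSession('webgpu'); // EP showing the unexpected output
  const wasmSession = await createSession('wasm');     // CPU/WASM EP for comparison
  // Feeding the same input_ids / KV-cache tensors to both sessions and diffing the
  // logits would pinpoint whether the divergence comes from the WebGPU kernels.
  console.log(webgpuSession.inputNames, wasmSession.inputNames);
}

main().catch(console.error);
```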
Urgency
No response
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.23.0-dev.20250612-70f14d7670
Execution Provider
'webgpu' (WebGPU)