Description
I built onnxruntime_genai from source with the CUDA execution provider and then installed the Python wheel.
I tried to run the microsoft/phi-2 model, but it seems there is a problem with the GroupQueryAttention node.
Here is the command to build the phi-2 model from Hugging Face:
python -m onnxruntime_genai.models.builder -m microsoft/phi-2 -e cuda -p int4 -o ./example-models/phi2-int4-cuda
Here is the Python code to reproduce the error:
import onnxruntime_genai as og
import time
prompt = '''def is_prime(n):
"""
Determine if n is prime or not
"""'''
model = og.Model('example-models/phi2-int4-cuda')
tokenizer = og.Tokenizer(model)
tokens = tokenizer.encode(prompt)
params = og.GeneratorParams(model)
params.set_search_options(max_length=100)
params.input_ids = tokens
start_time = time.time()
output_tokens = model.generate(params)[0]
end_time = time.time()
text = tokenizer.decode(output_tokens)
print(text)
Here is the error that I get:
onnxruntime_genai.onnxruntime_genai.OrtException: Non-zero status code returned while running GroupQueryAttention node. Name:'/model/layers.0/attn/GroupQueryAttention' Status Message: cos_cache dimension 1 must be <= head_size / 2 and a multiple of 8.
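For reference, here is a small sketch I used to check which rotary cache shapes actually ended up in the exported model, to compare against the constraint in the error message. It assumes the builder wrote model.onnx into the output folder and that the cos/sin caches are stored as graph initializers whose names contain cos_cache / sin_cache; adjust if the naming differs:
import onnx

# Load only the graph structure; external weight data is not needed for shape inspection.
m = onnx.load('example-models/phi2-int4-cuda/model.onnx', load_external_data=False)
for init in m.graph.initializer:
    # Print the dimensions of anything that looks like a rotary cos/sin cache.
    if 'cos_cache' in init.name or 'sin_cache' in init.name:
        print(init.name, list(init.dims))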
OS: Windows 10
Architecture: x64
Language: Python
ONNX Runtime version: 1.17.1
onnxruntime-genai version: 0.3.0-dev
CUDA version: 12.3