-
Notifications
You must be signed in to change notification settings - Fork 474
Description
Description
Using a model without a matching template, use LlamaTemplate passing in the model to pull the template from the model. Calling Apply results in an index out of range exception. The reason is that llama_chat_apply_template returns a -1 instead of the length when there isn't a matching model, and resulting in this line failing
output.AsSpan(0, outputLength).CopyTo(_result);Reproduction Steps
Using Ministral-8B-Instruct-2410-Q6_K_L.gguf, use LlamaTemplate passing in the model to pull the template from the model. The StatelessModeExecute is an easy way to reproduce now that it is tries to apply the system template, but anything using LlamaTemplate will get an exception.
Environment & Configuration
- Operating system: Windows 11
- .NET runtime version: .net 9
- LLamaSharp version: main branch
- CUDA version (if you are using cuda backend): 12
Known Workarounds
Following the llama.cpp lead (https://github.com/ggerganov/llama.cpp/blob/master/src/llama.cpp#L23348), if we can't find a template then apply Chatml. Funny enough this doesn't work at all for the model I'm testing, but it feels like the proper behavior until llama.cpp gets updated to support it, I suppose. llama.cpp has llama_chat_detect_template they can call to complete circumvent even trying to apply the template which would probably be the move, but that's no exposed. in the meantime, this seems to work
var outputLength = ApplyInternal(_nativeChatMessages.AsSpan(0, Count), output);
if (outputLength == -1)
{
// worst case: there is no information about template, we will use chatml by default
outputLength = ApplyChatmlInternal(_nativeChatMessages.AsSpan(0, Count), output);
}
// snip
unsafe int ApplyChatmlInternal(Span<LLamaChatMessage> messages, byte[] output)
{
fixed (byte* customTemplatePtr = Encoding.GetBytes("chatml\0"))
fixed (byte* outputPtr = output)
fixed (LLamaChatMessage* messagesPtr = messages)
{
return NativeApi.llama_chat_apply_template(_model, customTemplatePtr, messagesPtr, (nuint)messages.Length, AddAssistant, outputPtr, output.Length);
}
}