Closed
Labels: question (Further information is requested)
Description
The README contains the following warning:
"Note that the Anthropic API, llama-server (and ollama) currently does not support sampling multiple responses from a model, which limits the available approaches"
I have two questions regarding this warning:
- Is this accurate? llama-server supports the -np switch, which enables decoding of parallel requests. Shouldn't this allow the MOA approach to work, for example?
- Would it be possible to make these requests sequentially instead of in parallel? I understand this won't be ideal for speed, but it's better than not working at all (a rough sketch of what I mean is below).
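
To illustrate the sequential fallback I have in mind, here is a minimal sketch that collects several candidate completions one request at a time from llama-server's OpenAI-compatible endpoint. The base URL, model name, prompt, and helper name are placeholder assumptions, not taken from this repository.

```python
# Minimal sketch: emulate n>1 sampling by issuing sequential requests to an
# OpenAI-compatible endpoint (e.g. a locally running llama-server).
# base_url, api_key, and the model name are placeholder assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-no-key-required")

def sample_sequentially(prompt: str, n: int = 3, temperature: float = 0.8) -> list[str]:
    """Collect n candidate responses, one request at a time."""
    candidates = []
    for _ in range(n):
        response = client.chat.completions.create(
            model="local-model",  # placeholder; single-model servers typically ignore this
            messages=[{"role": "user", "content": prompt}],
            temperature=temperature,  # nonzero so repeated calls can differ
        )
        candidates.append(response.choices[0].message.content)
    return candidates

if __name__ == "__main__":
    for i, text in enumerate(sample_sequentially("Summarize MOA in one sentence."), 1):
        print(f"--- candidate {i} ---\n{text}")
```

The candidates gathered this way could then be fed into whatever aggregation step the approach (e.g. MOA) expects, at the cost of n times the latency of a single request.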