Closed
Labels: question (Further information is requested)
Description
The README contains the following warning:
"Note that the Anthropic API, llama-server (and ollama) currently does not support sampling multiple responses from a model, which limits the available approaches"
I have two questions regarding this warning:
- Is this accurate? llama-server supports the -np switch, which enables decoding of parallel requests. Shouldn't this allow the MOA approach to work, for example?
- Would it be possible to make these requests sequentially instead of in parallel? I understand this won't be ideal for speed, but it's better than not working at all (a rough sketch of what I mean is below).
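
To illustrate the sequential fallback I have in mind, here is a minimal sketch that collects several candidate completions one request at a time from llama-server's OpenAI-compatible endpoint. The base URL, model name, prompt, and helper name are placeholder assumptions, not taken from this repository.

```python
# Minimal sketch: emulate n>1 sampling by issuing sequential requests to an
# OpenAI-compatible endpoint (e.g. a locally running llama-server).
# base_url, api_key, and the model name are placeholder assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-no-key-required")

def sample_sequentially(prompt: str, n: int = 3, temperature: float = 0.8) -> list[str]:
    """Collect n candidate responses, one request at a time."""
    candidates = []
    for _ in range(n):
        response = client.chat.completions.create(
            model="local-model",  # placeholder; single-model servers typically ignore this
            messages=[{"role": "user", "content": prompt}],
            temperature=temperature,  # nonzero so repeated calls can differ
        )
        candidates.append(response.choices[0].message.content)
    return candidates

if __name__ == "__main__":
    for i, text in enumerate(sample_sequentially("Summarize MOA in one sentence."), 1):
        print(f"--- candidate {i} ---\n{text}")
```

The candidates gathered this way could then be fed into whatever aggregation step the approach (e.g. MOA) expects, at the cost of n times the latency of a single request.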