Warning: Do not support sampling multiple responses #99

@Mushoz

The README contains the following warning:

"Note that the Anthropic API, llama-server (and ollama) currently does not support sampling multiple responses from a model, which limits the available approaches"

I have two questions regarding this warning:

  1. Is this accurate? llama-server has the `-np` switch, which enables decoding of parallel requests. Shouldn't that allow the MOA approach to work, for example?
  2. Would it be possible to issue these requests sequentially instead of in parallel? I understand this won't be ideal for speed, but it would be better than not working at all (see the sketch after this list).
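For illustration, here is a minimal sketch (not project code) of what such a fallback could look like against llama-server's OpenAI-compatible endpoint: collect the n samples one request at a time, or fire them concurrently when `-np` provides parallel slots. The base URL, model name, and helper names below are assumptions, not anything taken from this repository.

```python
# Minimal sketch, not project code: approximate "sample n responses" against
# llama-server's OpenAI-compatible endpoint when the backend rejects n > 1.
# Assumptions: llama-server listening on http://localhost:8080/v1; the model
# name and helper names are placeholders.
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

def sample_n_sequential(messages, n=3, temperature=0.8, model="local-model"):
    """Collect n completions one request at a time instead of passing n > 1."""
    completions = []
    for _ in range(n):
        resp = client.chat.completions.create(
            model=model,
            messages=messages,
            temperature=temperature,  # non-zero so repeated calls actually differ
        )
        completions.append(resp.choices[0].message.content)
    return completions

def sample_n_parallel(messages, n=3, temperature=0.8, model="local-model"):
    """Same idea, but issue the n requests concurrently; with `llama-server -np 3`
    they should be decoded in parallel slots rather than queued."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [
            pool.submit(
                client.chat.completions.create,
                model=model,
                messages=messages,
                temperature=temperature,
            )
            for _ in range(n)
        ]
        return [f.result().choices[0].message.content for f in futures]
```

Either variant would give an approach like MOA the multiple candidate responses it needs; the sequential one just pays for it in latency.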
