Add support for llama.cpp completions #164

Closed
ShelbyJenkins opened this issue Dec 6, 2023 · 2 comments
Labels
out of scope (Requests which are not related to OpenAI API) · wontfix (This will not be worked on)

Comments

@ShelbyJenkins

Happy to have this crate!

I have it working with llama.cpp in server mode documented here: https://github.com/ggerganov/llama.cpp/tree/master/examples/server.
Just create the client like:

use async_openai::{config::OpenAIConfig, Client};

fn setup_client() -> Client<OpenAIConfig> {
    // Retry transient failures with exponential backoff for up to 60 seconds.
    let backoff = backoff::ExponentialBackoffBuilder::new()
        .with_max_elapsed_time(Some(std::time::Duration::from_secs(60)))
        .build();

    // Point the client at the llama.cpp server's OpenAI-compatible API.
    // No API key is needed; server::HOST and server::PORT are my own local constants.
    let config = OpenAIConfig::new().with_api_key("").with_api_base(format!(
        "http://{}:{}/v1",
        server::HOST,
        server::PORT
    ));

    Client::with_config(config).with_backoff(backoff)
}

However, it only works with llama.cpp's /v1/chat/completions endpoint, and that endpoint lacks some features (notably logit bias). The /completion endpoint, which exposes all the extra features, does not work.
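
For reference, here's roughly how a chat completion goes through that client against the /v1/chat/completions endpoint. This is an untested sketch assuming a recent async-openai release; the ask helper and the "local" model name are just placeholders, and as far as I can tell the llama.cpp server serves whichever model it was launched with regardless of the model field.

use async_openai::types::{ChatCompletionRequestUserMessageArgs, CreateChatCompletionRequestArgs};

// Hypothetical helper built on setup_client(); "local" is a placeholder model name.
async fn ask(prompt: &str) -> Result<String, Box<dyn std::error::Error>> {
    let client = setup_client();

    let request = CreateChatCompletionRequestArgs::default()
        .model("local")
        .messages([ChatCompletionRequestUserMessageArgs::default()
            .content(prompt)
            .build()?
            .into()])
        .build()?;

    // This request goes to http://HOST:PORT/v1/chat/completions on the llama.cpp server.
    let response = client.chat().create(request).await?;

    Ok(response
        .choices
        .first()
        .and_then(|choice| choice.message.content.clone())
        .unwrap_or_default())
}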

I don't know if this is a tenable long-term solution, but since the Rust llama.cpp crates haven't been updated in months and the llama.cpp library is moving very quickly, I was reluctant to rely on crates that would need an overhaul for every upstream change. Going through the local server seemed like it would be stable long term, since its API probably won't change as often.

I think this crate has the potential to be a good base for building projects that rely on multiple APIs as the industry moves towards a standard. I'm interested to hear your thoughts, and if it's viable, I'm happy to contribute anything I create.

@ShelbyJenkins
Author

FYI, I wrote a backend to do this, and ended up doing enough to make it usable for others.

Just published it here: https://github.com/ShelbyJenkins/llm_client/

IMO it makes more sense to integrate these features into this library rather than duplicating the code.

@64bit
Owner

64bit commented Jan 5, 2024

Hello, thank you for your appreciation.

The challenge here is maintenance; I'd rather stick with just one provider, which is OpenAI. If multiple providers do converge on the same APIs, then this crate should work with the changes in open PR #125.

@64bit added the out of scope (Requests which are not related to OpenAI API) and wontfix (This will not be worked on) labels on Jan 9, 2024
@64bit closed this as completed on May 7, 2024