Closed
Description
Let me first congratulate everyone working on this for:
- Python bindings for llama.cpp
- Making them compatible with openai's api
- Superb documentation!
I was wondering if anyone could help me get this working with BLAS. Right now, when the model loads, I see `BLAS = 0`.
I've been using kobold.cpp, and they have a BLAS flag at compile time which enables BLAS. It cuts down the prompt loading time by 3-4X. This is a major factor in handling longer prompts and chat-style messages.
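A sketch of one way this is commonly done: llama.cpp exposes BLAS support through CMake options, and llama-cpp-python lets you pass those through the `CMAKE_ARGS` environment variable at install time. The flag names below assume an OpenBLAS backend; adjust the vendor (or the option names, which have changed across llama.cpp versions) for your setup.

```shell
# Reinstall llama-cpp-python from source with OpenBLAS enabled.
# -DLLAMA_BLAS=ON / -DLLAMA_BLAS_VENDOR=OpenBLAS are llama.cpp CMake options;
# --force-reinstall and --no-cache-dir force a rebuild instead of reusing a wheel.
CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" \
  pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
```

If the backend is picked up, the model-load log should then report `BLAS = 1` instead of `BLAS = 0`.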
P.S. - I was also wondering what the difference is between `create_embedding(input)` and `embed(input)`?