feat: add local evaluation support using LlamaCppChatGenerator #7745
Conversation
It seems like DynamicChatPromptBuilder is being deprecated. I will need to update the generator and this implementation as well.
Thanks for working on this! Could you fix the code so that the CI tests pass?
The previous CI failures were due to llama-cpp-python and llama-cpp-haystack not being in the hatch environment used for testing. I added them, which should fix that, but on my end this test is still failing: This is one of the things I brought up on Discord about this maybe not being the right approach due to coupling (and also because the main haystack repository would now rely on a package from haystack-core-components).
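For reference, adding the two missing packages to the hatch test environment might look like the following `pyproject.toml` fragment (the environment name and table layout here are assumptions based on the comment above, not the PR's actual diff):

```toml
# Hypothetical fragment: make llama-cpp-python and llama-cpp-haystack
# available in the hatch environment that runs the test suite.
[tool.hatch.envs.test]
extra-dependencies = [
  "llama-cpp-python",
  "llama-cpp-haystack",
]
```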
I'm going to close this as I don't think this is the proper way to integrate local evaluation support with the current implementation of LLMEvaluator. If there ever exists a way to connect a generator to an evaluator without coupling, I can implement support for llama.cpp for it. |
Related Issues
Proposed Changes:
This implements a WIP version of local evaluation support, incorporating the feedback received in the related issue.
During my work on this, there were some changes to the files, so I fixed the conflicts and merged. I will need to go over those changes more in depth and see if it makes sense to implement some of the changes for the llama.cpp version.
How did you test it?
Added tests (copies of the OpenAI tests, slightly modified for the llama.cpp implementation).
Notes for the reviewer
The implementation will require anyone using llm_evaluator to install llama-cpp-python, which might not be ideal, but I don't see a way around it (is a lazy import the best solution?).
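The lazy-import idea mentioned above could look something like this sketch: llama-cpp-python is only imported when a local evaluator is actually constructed, so users who stay on the OpenAI backend never need it installed. The class name and constructor parameters are illustrative, not the PR's actual API:

```python
class LocalLLMEvaluator:
    """Illustrative wrapper that defers the llama_cpp import to construction time."""

    def __init__(self, model_path: str):
        try:
            # Deferred import: only evaluated when local evaluation is requested,
            # so llama-cpp-python stays an optional dependency.
            from llama_cpp import Llama
        except ImportError as err:
            raise ImportError(
                "Local evaluation requires llama-cpp-python. "
                "Install it with: pip install llama-cpp-python"
            ) from err
        self._llm = Llama(model_path=model_path)
```

Importing the module that defines this class never fails; the `ImportError` is only raised (with an actionable message) when someone opts into the local backend.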
There is a TODO in the code that I will need to figure out. Ideally, I would like to be able to extract whether the specific model supports the system prompt from Llama, but that doesn't seem easy. My current thought is to allow an optional flag and default to supporting system prompt when no flag is provided.
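The optional-flag idea from the TODO above could be sketched as follows; a three-state flag defaults to "supported", and when a model is flagged as not supporting a system role, the instructions are folded into the user turn. Function and parameter names are hypothetical:

```python
from typing import Optional


def build_messages(
    user_prompt: str,
    system_prompt: str,
    supports_system_prompt: Optional[bool] = None,
) -> list[dict]:
    """Build chat messages for the evaluator.

    When no flag is provided (None), assume the model supports a system
    prompt, per the default proposed above. If it is explicitly disabled,
    prepend the system instructions to the user message instead.
    """
    if supports_system_prompt is None:
        supports_system_prompt = True  # default assumption when no flag is given
    if supports_system_prompt:
        return [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ]
    # Fallback for models without a system role: merge into a single user turn.
    return [{"role": "user", "content": f"{system_prompt}\n\n{user_prompt}"}]
```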
The link to LlamaCppChatGenerator does not work as it is not live yet.
Installing a model is necessary to run the tests. I added logic to handle that the same way we handle it in the llama cpp generator.
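The test-setup logic mentioned above might look like this sketch, which caches a GGUF model file and only downloads it when missing (the helper name and URL are illustrative, not the ones used in the llama.cpp generator's tests):

```python
import os
import urllib.request


def ensure_model(url: str, target_path: str) -> str:
    """Download the GGUF model needed by the tests unless it is already cached.

    Mirrors the idea of the download step in the llama.cpp generator's
    test suite: an existing file short-circuits the network call.
    """
    if not os.path.exists(target_path):
        os.makedirs(os.path.dirname(target_path) or ".", exist_ok=True)
        urllib.request.urlretrieve(url, target_path)
    return target_path
```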
Checklist
- PR title follows the conventional commit types: `fix:`, `feat:`, `build:`, `chore:`, `ci:`, `docs:`, `style:`, `refactor:`, `perf:`, `test:`