Implement an AnswerCorrectness Metric #653
Comments
@AndresPrez Agreed with the embedding + similarity bit. Have you tried using GEval for this with
@penguine-ip G-Eval is actually pretty nice! Thanks for the advice ❤️
@AndresPrez No problem! I'm actually going to push out more tutorials on the documentation page soon; I feel like the full capabilities of deepeval mostly go undiscovered 😅 I'll be including Answer Correctness for sure.
@penguine-ip - Thanks for the explanation on this. I too got a bit fooled, because the docs state that ragas metrics are supported and AnswerCorrectness is one of the ragas metrics. There's a bit of internal translation to do to get there.
@gtmtech Yes, but we eventually deleted most of them except for the flagship ragas metrics, since they weren't very good or well maintained.
Is your feature request related to a problem? Please describe.
When dealing with evolving Q&A applications, it is important to have a way of evaluating whether and how the answers to a golden set of questions change over time. For this, an Answer Correctness metric that lets an LLM compare an expected answer with an actual answer can be a good solution for detecting whether the actual answers to the golden-set questions are changing or breaking.
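To make the request concrete, here is a minimal sketch of the kind of golden-set regression check I have in mind. It is not deepeval code: `GoldenCase`, `Verdict`, `evaluate_golden_set`, and `exact_match_judge` are hypothetical names, and the judge is a trivial placeholder standing in for an LLM that compares the expected and actual answers and returns a score plus a reason.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class GoldenCase:
    question: str
    expected_answer: str


@dataclass
class Verdict:
    question: str
    score: float
    reason: str
    passed: bool


# A judge takes (question, expected, actual) and returns (score in [0, 1], reason).
# Hypothetical interface: in practice this would wrap an LLM call.
Judge = Callable[[str, str, str], Tuple[float, str]]


def evaluate_golden_set(
    cases: List[GoldenCase],
    answer_fn: Callable[[str], str],
    judge: Judge,
    threshold: float = 0.5,
) -> List[Verdict]:
    """Run every golden question through the app and judge the answer."""
    verdicts = []
    for case in cases:
        actual = answer_fn(case.question)
        score, reason = judge(case.question, case.expected_answer, actual)
        verdicts.append(Verdict(case.question, score, reason, score >= threshold))
    return verdicts


def exact_match_judge(question: str, expected: str, actual: str) -> Tuple[float, str]:
    """Placeholder judge (case-insensitive exact match); swap in an LLM comparison."""
    if expected.strip().lower() == actual.strip().lower():
        return 1.0, "answers match"
    return 0.0, f"expected {expected!r}, got {actual!r}"
```

Running this on every release and diffing the failing verdicts would show which golden answers changed and why, which is the part a bare similarity score does not give you.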
Describe the solution you'd like
Add support for another deepeval metric that takes the above evaluation into account.
Describe alternatives you've considered
Ragas implements an AnswerCorrectness metric that uses both LLMs and embedding similarity. But it does not give you a reason, and the fact that it includes embedding similarity ruins the final score. Embedding similarity is not a reliable way to check that two differently worded answers say the same thing.
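A toy illustration of the similarity problem, under a loud assumption: bag-of-words cosine similarity is used here as a crude stand-in for embedding similarity, so the numbers are not what Ragas or any real embedding model would produce. Real embeddings handle paraphrases much better than this, but they can still rate a negated answer as very close to the original, which is the failure mode described above. `bow_cosine` is a hypothetical helper written for this example.

```python
import math
import re
from collections import Counter


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words count vectors of two strings."""
    va = Counter(re.findall(r"[a-z0-9]+", a.lower()))
    vb = Counter(re.findall(r"[a-z0-9]+", b.lower()))
    dot = sum(va[token] * vb[token] for token in va)
    norm_a = math.sqrt(sum(count * count for count in va.values()))
    norm_b = math.sqrt(sum(count * count for count in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


expected = "The refund is available within 30 days"
contradiction = "The refund is not available within 30 days"  # opposite meaning
paraphrase = "You can get your money back for one month"      # same meaning

print(f"contradiction: {bow_cosine(expected, contradiction):.3f}")
print(f"paraphrase:    {bow_cosine(expected, paraphrase):.3f}")
```

In this toy setup the contradicting answer scores far higher than the correct paraphrase, so any final score that averages in a signal like this can be pulled in the wrong direction.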