
Implement an AnswerCorrectness Metric #653

Closed
AndresPrez opened this issue Apr 2, 2024 · 5 comments

Comments

AndresPrez (Contributor) commented Apr 2, 2024

Is your feature request related to a problem? Please describe.
When dealing with evolving Q&A applications, it is important to have a way of evaluating whether, and how, the answers to a golden set of questions change over time. For this, an Answer Correctness metric that lets an LLM compare an expected answer with an actual answer can be a good solution for detecting whether the actual answers to the golden-set questions are changing or breaking.

Describe the solution you'd like
Add support for a deepeval metric that performs the evaluation described above.

Describe alternatives you've considered
Ragas implements an AnswerCorrectness metric that uses both LLMs and embedding similarity, but it does not give you a reason, and the fact that it includes embedding similarity skews the final score. Embedding similarity is not a reliable way to verify that two differently worded answers say the same thing.

penguine-ip (Contributor) commented

@AndresPrez Agreed with the embedding + similarity bit. Have you tried using GEval for this with strict=true? I'm sure it will give good results.
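
Something along these lines should work as a sketch (untested; the criteria wording is just an example, and the exact flag name, e.g. `strict_mode` vs `strict`, may vary between deepeval versions):

```python
from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCase, LLMTestCaseParams

# Answer correctness as a custom GEval metric: the LLM judge compares the
# actual output against the expected (golden) output only.
answer_correctness = GEval(
    name="Answer Correctness",
    criteria=(
        "Determine whether the actual output is factually consistent with, "
        "and conveys the same meaning as, the expected output."
    ),
    evaluation_params=[
        LLMTestCaseParams.ACTUAL_OUTPUT,
        LLMTestCaseParams.EXPECTED_OUTPUT,
    ],
    strict_mode=True,  # assumed name for the "strict" flag mentioned above
)

# Example golden-set entry (hypothetical data).
test_case = LLMTestCase(
    input="What is the capital of France?",
    actual_output="Paris is the capital of France.",
    expected_output="The capital of France is Paris.",
)

answer_correctness.measure(test_case)
print(answer_correctness.score)   # 0 or 1 in strict mode
print(answer_correctness.reason)  # judge's explanation, unlike Ragas
```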

AndresPrez (Contributor, Author) commented

@penguine-ip GEval is actually pretty nice! Thanks for the advice ❤️

penguine-ip (Contributor) commented

@AndresPrez No problem! I'm actually going to push out more tutorials on the documentation page soon; I feel like the full capabilities of deepeval mostly go undiscovered 😅 I'll be including answer correctness for sure.

gtmtech commented May 20, 2024

@penguine-ip thanks for the explanation on this. I too got a bit fooled, because the docs state that Ragas metrics are supported and AnswerCorrectness is one of the Ragas metrics, so there was a bit of internal translation to do to get there.

penguine-ip (Contributor) commented

@gtmtech Yes, but we eventually deleted most of them except for the flagship Ragas metrics, since they weren't very good or well maintained.
