Implement an AnswerCorrectness Metric #653

AndresPrez · 2024-04-02T15:34:20Z

Is your feature request related to a problem? Please describe.
When dealing with evolving Q&A applications, it is important to have a way of evaluating if and how the answers to a golden set of questions change over time. For this, an Answer Correctness metric that lets an LLM compare an expected answer with an actual answer can be a good solution for detecting whether the actual answers to the golden set questions are changing or breaking.

Describe the solution you'd like
Add support for another deepeval metric that takes the above evaluation into account.

Describe alternatives you've considered
Ragas implements an AnswerCorrectness metric that uses both LLMs and embedding similarity. But it does not give you a reason, and the fact that it includes embedding similarity ruins the final score. Embedding similarity is not a reliable way to check that two differently worded answers say the same thing.

penguine-ip · 2024-04-02T18:19:30Z

@AndresPrez Agreed with the embedding + similarity bit. Have you tried using GEval for this with strict=true? I'm sure it will give good results.
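For readers following along, here is a minimal sketch of the idea discussed above: an LLM judge compares the expected answer with the actual answer and returns a score plus a reason, with an optional strict mode. This is not deepeval's or Ragas's implementation. The prompt wording, the `answer_correctness` function, the `Verdict` type, and the strict-mode semantics (only a perfect score passes) are all assumptions made for illustration, and `fake_judge` is a stand-in for a real LLM call so the example runs offline.

```python
import json
from dataclasses import dataclass
from typing import Callable

# Hypothetical prompt; the wording is an assumption, not taken from any library.
PROMPT_TEMPLATE = (
    "Compare the actual answer with the expected answer for the question.\n"
    'Reply with JSON only: {{"score": <number from 0 to 1>, "reason": "<one sentence>"}}.\n'
    "Question: {question}\n"
    "Expected answer: {expected}\n"
    "Actual answer: {actual}\n"
)


@dataclass
class Verdict:
    score: float
    reason: str
    passed: bool


def answer_correctness(
    question: str,
    expected: str,
    actual: str,
    judge: Callable[[str], str],
    threshold: float = 0.5,
    strict: bool = False,
) -> Verdict:
    """Ask a judge (any callable mapping a prompt to a JSON string) to grade an answer."""
    prompt = PROMPT_TEMPLATE.format(question=question, expected=expected, actual=actual)
    raw = json.loads(judge(prompt))
    # Clamp the judge's score into [0, 1] in case it drifts out of range.
    score = min(1.0, max(0.0, float(raw["score"])))
    if strict:
        # Assumed strict semantics: binary score, only a perfect match passes.
        score = 1.0 if score == 1.0 else 0.0
        threshold = 1.0
    return Verdict(score=score, reason=str(raw.get("reason", "")), passed=score >= threshold)


def fake_judge(prompt: str) -> str:
    # Stand-in for a real LLM call; replace with your own model client.
    if "Actual answer: Paris" in prompt:
        return json.dumps({"score": 1.0, "reason": "Matches the expected answer."})
    return json.dumps({"score": 0.2, "reason": "Names a different city."})


# A tiny golden set of (question, expected answer, actual answer) triples.
golden_set = [
    ("What is the capital of France?", "Paris", "Paris"),
    ("What is the capital of France?", "Paris", "Lyon"),
]

verdicts = [answer_correctness(q, e, a, fake_judge, strict=True) for q, e, a in golden_set]
for v in verdicts:
    print(v.passed, v.score, v.reason)
```

Because each verdict carries a reason, a failing golden-set entry explains itself instead of only reporting a lower number, which is the gap noted above for embedding-based scoring.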

AndresPrez · 2024-04-03T20:32:15Z

@penguine-ip G-Eval is actually pretty nice! Thanks for the advice ❤️

penguine-ip · 2024-04-04T02:53:19Z

@AndresPrez No problem! I'm actually going to push out more tutorials on the documentation page soon. I feel like the full capabilities of deepeval mostly go undiscovered 😅 I'll be including Answer Correctness for sure.

gtmtech · 2024-05-20T08:24:06Z

@penguine-ip - thanks for the explanation on this. I too got a bit fooled, just because the docs state that ragas metrics are supported and AnswerCorrectness is one of the ragas metrics, so there's a bit of internal translation to do to get there.

penguine-ip · 2024-05-20T13:21:30Z

@gtmtech Yes, but we eventually deleted most of them except for the flagship ragas metrics, since they weren't very good or well maintained.

AndresPrez closed this as completed Apr 3, 2024
