
Add new binary metrics #69

Merged: 11 commits merged into main on Mar 7, 2024
Conversation

@ethan-tonic (Collaborator) commented Feb 23, 2024

This pull request adds new binary metrics. The added metrics include:

  • BinaryMetric: Lets you pass in an arbitrary function that evaluates to true or false (see the sketch after this list)

  • RegexMetric: Checks whether the response matches a given regex

  • AnswerMatchMetric: Checks whether the response exactly matches a given answer

  • DuplicationMetric: Checks whether there is duplicate information in the response

  • ResponseLengthMetric: Checks that the response is within a certain number of characters

  • ContextLengthMetric: Checks that the context length is within a certain range

  • ContainsTextMetric: Checks whether the response contains the given text
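For illustration, here is a minimal sketch of how a custom BinaryMetric might be set up. The import paths, the callback signature, and the llm_answer attribute are assumptions inferred from this diff rather than confirmed API:

# Sketch only: the import paths, callback signature, and llm_answer attribute
# below are assumptions inferred from this PR's diff, not confirmed API.
from tonic_validate.classes.llm_response import LLMResponse        # assumed path
from tonic_validate.metrics.binary_metric import BinaryMetric      # assumed path
from tonic_validate.services.openai_service import OpenAIService   # assumed path

def always_refuses(llm_response: LLMResponse, openai_service: OpenAIService) -> bool:
    # Arbitrary true/false check; the callback is assumed to take the same
    # arguments as Metric.score() in this diff.
    answer = llm_response.llm_answer  # assumed attribute holding the generated answer
    return answer.strip().lower().startswith("sorry, i can't share that")

# BinaryMetric is assumed to take a name plus the callback; score() would then
# map the boolean result to 1.0 or 0.0.
refusal_metric = BinaryMetric("always_refuses", always_refuses)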

@ethan-tonic ethan-tonic marked this pull request as draft February 26, 2024 22:29
@ethan-tonic ethan-tonic marked this pull request as ready for review March 4, 2024 22:48
@joeferraratonic (Collaborator):

How do you envision these metrics being used with ValidateScorer and logged to the UI? Or do you not envision that?

I ask because a lot of the metrics would depend on the question, and so couldn't be used on a batch of questions and answers the way ValidateScorer uses the existing metrics. For the regex, answer match, and contains text metrics, I'd imagine you'd want the metric to change for each question.

The following review thread is on these lines of the diff:

self._name = name
self.callback = callback

def score(self, llm_response: LLMResponse, openai_service: OpenAIService) -> float:
Collaborator:

We need some way to use the metrics that don't require an OpenAIService without passing one in. My suggestion would be to make this parameter optional and default it to None.
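For concreteness, a sketch of what that suggestion could look like; the import paths and class body are illustrative assumptions, not the code in this PR:

# Sketch of the reviewer's suggestion, not the merged implementation: make
# openai_service optional so metrics that never call an LLM can be scored
# without instantiating an OpenAIService.
from typing import Optional

from tonic_validate.classes.llm_response import LLMResponse        # assumed path
from tonic_validate.services.openai_service import OpenAIService   # assumed path

class ContainsTextExample:  # illustrative stand-in, not the PR's ContainsTextMetric
    def __init__(self, text: str):
        self._name = "contains_text_example"
        self.text = text

    def score(self, llm_response: LLMResponse, openai_service: Optional[OpenAIService] = None) -> float:
        # No LLM evaluator needed here: a plain substring check maps to 1.0 or 0.0,
        # so openai_service can stay None when the metric is used on its own.
        # (llm_answer is an assumed attribute name.)
        return 1.0 if self.text in llm_response.llm_answer else 0.0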

Collaborator Author:

All the metrics are called by our ValidateScorer. If we allow it to be None, then we'd need to change the logic in the scorer to not pass it for metrics that don't need it. That becomes harder to do when people are writing their own metrics (e.g. a custom binary metric), since we don't know whether they're using OpenAI or not. Sure, we could have them set something like self.uses_openai = True, but that's a bad pattern since it relies on the user remembering to change it. It's easier to just pass it in regardless, in which case OpenAIService will never be None anyway. Plus, if we make it optional, we'll have to rewrite all of our existing code to check that it's not None in order to satisfy typechecking.

Collaborator:

The case I'm thinking of is when a user is using one of the metrics outside of ValidateScorer. In that case, they shouldn't need to instantiate and pass an OpenAIService when it's not needed.

If it's optional, then ValidateScorer can still pass all the metrics its OpenAIService without any issues.

Collaborator Author:

I don't think that's worth it: it's a really niche use case, and it would lead to us having to add checks for it everywhere in the code (as well as in metrics that people write themselves).

Collaborator:

I think whether a metric actually needs an LLM evaluator is an important distinction to make, and it should be reflected in the metric. I can imagine users thinking, "I want to create a bunch of tests that use metrics that don't need an LLM evaluator, because those tests are way cheaper." In that case they don't need an OpenAIService, and passing a model evaluator to ValidateScorer is unnecessary.

What do you think about putting the logic into the classes, so we would have one metric base class that uses LLM-assisted evaluation and one that does not? The metrics that don't use LLM-assisted evaluation would then not need an OpenAIService.
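A rough sketch of what such a split could look like follows; the class names are purely illustrative, since the actual design was deferred to a follow-up issue:

# Illustrative sketch of the proposed split between LLM-assisted and non-LLM
# metrics; the class names here are hypothetical, not the project's API.
from abc import ABC, abstractmethod
from typing import Any

class Metric(ABC):
    """Common interface: every metric exposes a name."""

    @property
    @abstractmethod
    def name(self) -> str: ...

class LLMAssistedMetric(Metric):
    """Metrics that need an LLM evaluator receive an OpenAIService-like object."""

    @abstractmethod
    def score(self, llm_response: Any, openai_service: Any) -> float: ...

class NonLLMMetric(Metric):
    """Metrics that can be scored without any LLM calls (and so cost nothing extra)."""

    @abstractmethod
    def score(self, llm_response: Any) -> float: ...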

Collaborator Author:

Yeah, I can add a subclass for this. That would make sense, but I'll probably do it in a separate PR since it's a bit out of scope here.

@joeferraratonic (Collaborator) left a comment:

These metrics look great! I have three main comments:

  1. The main question in my mind is, "How is the user going to use these metrics and log them to the Validate UI?" Are they going to use ValidateScorer?
  2. An example Jupyter notebook using these metrics would be valuable.
  3. There are some small changes I requested in other comments.

1 and 2 can be left for a separate PR, but I'd like to hear your thoughts on 1.

@ethan-tonic (Collaborator Author):

> How do you envision these metrics being used with ValidateScorer and logged to the UI? Or do you not envision that?
>
> I ask because a lot of the metrics would depend on the question, and so couldn't be used on a batch of questions and answers the way ValidateScorer uses the existing metrics. For the regex, answer match, and contains text metrics, I'd imagine you'd want the metric to change for each question.

Yeah, they should all be compatible with the UI. I didn't really think much about it changing per question, though, simply because the way I imagined it, you have some format you want your LLM responses to follow, and the text-matching metrics let you check that the response meets that format. The answer match metric is probably the least useful for checking format, but it could still be used to make sure that, given a certain question (e.g. asking the RAG system for sensitive info), the RAG system always returns the same refusal answer. Anyhow, I can definitely see the usefulness of changing it question by question (which we should allow in the future).

@joeferraratonic (Collaborator):


In this case, we need clear documentation in the docs, and perhaps in docstrings, on the intended use of these metrics. It's not clear at the moment, because I imagined the regex, answer match, and contains text metrics would change what they're looking for from question to question.

@ethan-tonic (Collaborator Author):

Re docs: Adam mentioned that, given the number of metrics, we should just add them to the GitBook docs rather than the README itself. So when Janice does the docs for that, I think she can mention it there.

@joeferraratonic (Collaborator) left a comment:

Looks good, thanks for making the changes from the PR feedback. I made issue #86 about the metric subclasses.

@ethan-tonic merged commit afb9f82 into main on Mar 7, 2024