
LLM Rater improvements#70

Merged
viditchopra1500 merged 1 commit into main from
llm-rater-improvements
Oct 21, 2024

Conversation

@viditchopra1500
Collaborator

LLM Rater can now handle the following cases:

  1. Dealing with missing DISTINCT cases.
  2. Being more lenient with calculated values.
  3. And paying less attention to column names.
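Case 2 ("more lenient with calculated values") is enforced through the rater's prompt rather than in code, but the intended behavior can be illustrated with a small programmatic sketch. The helper name and tolerance below are hypothetical and not part of this PR:

```python
import math

def values_match(expected, actual, rel_tol=1e-3):
    """Hypothetical helper: lenient comparison of calculated values.

    Numeric cells are compared with a relative tolerance instead of exact
    equality, so small rounding differences (e.g. 0.3333 vs 1/3) still match.
    """
    try:
        return math.isclose(float(expected), float(actual), rel_tol=rel_tol)
    except (TypeError, ValueError):
        # Non-numeric cells fall back to exact string comparison.
        return str(expected) == str(actual)

print(values_match(0.3333, 1 / 3))    # rounding difference within tolerance
print(values_match("Paris", "Paris"))
```

The tolerance value is a design choice; too tight and legitimate rounding in computed aggregates gets penalized, too loose and genuinely wrong answers pass.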

Colab Experiments Link: Notebook

@viditchopra1500 force-pushed the llm-rater-improvements branch 5 times, most recently from 6613794 to fdc45b2 on October 21, 2024 12:20
import vertexai
from vertexai.preview.generative_models import GenerationConfig, GenerativeModel

from ratelimit import limits
Collaborator Author

@viditchopra1500 Oct 21, 2024


Rearranged the imports and removed the unused imports.

self.model = GenerativeModel(self.config["model"])

@staticmethod
def remove_duplicates(output_list: list) -> list:
Collaborator Author


Removes duplicate rows; this is part of preprocessing before feeding the execution outputs into the prompt.
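A minimal sketch of what such a deduplication step might look like, assuming rows can themselves be (unhashable) lists. The body below is illustrative, not the PR's actual implementation:

```python
@staticmethod
def remove_duplicates(output_list: list) -> list:
    """Order-preserving deduplication of execution-output rows.

    Rows may be lists (unhashable), so each row is keyed by its repr()
    rather than inserted into a set directly.
    """
    seen = set()
    result = []
    for row in output_list:
        key = repr(row)
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result
```

Order preservation matters here: the deduplicated rows are rendered into the prompt, so keeping the first occurrence of each row avoids reshuffling the output the LLM sees.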

3. Compare the data in the mapped columns between OUTPUT #1 and OUTPUT #2. The data should be
an exact match; there should be no extra or missing data in OUTPUT #2.
3. Compare the data within each mapped column pair between OUTPUT #1 and OUTPUT #2.
Ensure that OUTPUT #2 contains all the data from OUTPUT #1, with no missing or extra rows.
Collaborator Author


Removed the exact-match requirement from here; the comparison logic is now part of the rules section instead.


1. Map all columns in OUTPUT #1 to the same columns in OUTPUT #2. Column names might differ as
long as the data they contain represents the same information.
1. Ensure that every column in OUTPUT #1 has a corresponding column in OUTPUT #2 that represents
Collaborator Author


Rephrased this a bit to emphasize that columns should be mapped even when their names differ.
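The mapping itself is performed by the LLM following this prompt rule, but a programmatic analogue can illustrate the idea of matching columns by their data rather than their names. The function and column names below are made up for illustration:

```python
from collections import Counter

def map_columns(out1: dict, out2: dict) -> dict:
    """Illustrative sketch: map each column of OUTPUT #1 to a column of
    OUTPUT #2 whose values match as a multiset, ignoring column names."""
    mapping = {}
    used = set()
    for name1, vals1 in out1.items():
        for name2, vals2 in out2.items():
            # Counter comparison ignores row order but respects duplicates.
            if name2 not in used and Counter(vals1) == Counter(vals2):
                mapping[name1] = name2
                used.add(name2)
                break
    return mapping

print(map_columns(
    {"country": ["FR", "DE"], "population": [68, 84]},
    {"nation": ["FR", "DE"], "pop_millions": [68, 84]},
))
```

A real rater would also have to break ties when two columns contain identical data, which is exactly the kind of ambiguity the LLM-based comparison is better suited to resolve.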

@hardikgu23
Collaborator

Here are a few results from local bird runs:

Just curious: are these local runs using the 1.5-002 model? If yes, were we able to get a similar level of improvement with the 001 model, or did the 002 model clearly outperform 001?

@viditchopra1500
Collaborator Author

Hi @hardikgu23, yes, these local run results use the 1.5-002 model.

Based on my experiments (for this bug: b/373040746): the old prompt for the LLM rater already contained the instruction to allow mapping of columns with different names.
I gave the same prompt to the 1.5-002 model, and it followed that instruction on all of the test cases I was using, while the 1.5-001 model failed on half of them. Based on this experiment, and seeing no regressions on the new model, I chose to upgrade to it.

@viditchopra1500 merged commit 6a66b09 into main on Oct 21, 2024