6613794 to fdc45b2
1) Handle missing DISTINCT cases. 2) Be more lenient with calculated values. 3) Pay less attention to column names.
fdc45b2 to b785f0a
import vertexai
from vertexai.preview.generative_models import GenerationConfig, GenerativeModel

from ratelimit import limits
Rearranged the imports and removed the unused imports.
self.model = GenerativeModel(self.config["model"])

@staticmethod
def remove_duplicates(output_list: list) -> list:
Removes duplicate rows as part of preprocessing, before the execution outputs are fed into the prompt.
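For readers without the full diff, here is a minimal sketch of what order-preserving row deduplication could look like. It is not the PR's implementation: the name `dedupe_rows`, the helper `row_key`, and the assumption that each row is either a dict of column name to value or a plain sequence are all hypothetical.

```python
def row_key(row):
    """Build a hashable key for a row (hypothetical helper, not from the PR)."""
    if isinstance(row, dict):
        # Assumption: rows are dicts of column name -> value. Sorting makes the
        # key independent of column order; repr() copes with unhashable values.
        return tuple(sorted((str(name), repr(value)) for name, value in row.items()))
    return repr(row)


def dedupe_rows(output_list: list) -> list:
    """Return the rows of output_list without repeats, keeping first-seen order."""
    seen = set()
    unique_rows = []
    for row in output_list:
        key = row_key(row)
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)
    return unique_rows
```

Deduplicating both execution outputs before they reach the prompt is one way a query that omits DISTINCT could still be compared on its set of rows rather than on row multiplicity.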
- 3. Compare the data in the mapped columns between OUTPUT #1 and OUTPUT #2. The data should be
-    an exact match; there should be no extra or missing data in OUTPUT #2.
+ 3. Compare the data within each mapped column pair between OUTPUT #1 and OUTPUT #2.
+    Ensure that OUTPUT #2 contains all the data from OUTPUT #1, with no missing or extra rows.
Removed the exact-match requirement from here; the comparison logic is now part of the rules section.
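The rules section itself is not shown in this excerpt, and in this PR the comparison is carried out by the LLM prompt rather than by code. Purely as an illustration of what "more lenient with calculated values" could mean, the sketch below compares numbers within a tolerance and everything else exactly. The function name `values_match` and the tolerance defaults are assumptions, not values taken from the PR.

```python
import math
from numbers import Real


def values_match(expected, actual, rel_tol=1e-6, abs_tol=1e-9):
    """Compare two cell values, tolerating small numeric differences.

    Hypothetical helper: the tolerances are illustrative defaults only.
    """
    both_numeric = (
        isinstance(expected, Real)
        and isinstance(actual, Real)
        # bool is a subclass of int; keep True/False out of the numeric path.
        and not isinstance(expected, bool)
        and not isinstance(actual, bool)
    )
    if both_numeric:
        return math.isclose(expected, actual, rel_tol=rel_tol, abs_tol=abs_tol)
    # Non-numeric values (strings, None, dates, ...) still need to be equal.
    return expected == actual
```

A check like this accepts rounding noise from calculated columns (for example, a sum computed in a different order) while still rejecting genuinely different values.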
- 1. Map all columns in OUTPUT #1 to the same columns in OUTPUT #2. Column names might differ as
-    long as the data they contain represents the same information.
+ 1. Ensure that every column in OUTPUT #1 has a corresponding column in OUTPUT #2 that represents
Rephrased this to emphasize that columns are mapped to each other even when their names differ.
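In this PR the column mapping is done by the LLM following the prompt. As a deterministic illustration of the same idea, the sketch below pairs columns by the values they contain and ignores their names. The function names and the assumption that each output is a list of dicts are hypothetical.

```python
from collections import Counter


def column_values(rows, column):
    """Multiset of the values in one column, using repr() so any value type works."""
    return Counter(repr(row.get(column)) for row in rows)


def map_columns_by_content(rows1, rows2):
    """Map each column of rows1 to an unused column of rows2 with the same values.

    Assumption: rows are dicts of column name -> value. Columns with no
    content match are mapped to None.
    """
    cols1 = list(rows1[0].keys()) if rows1 else []
    cols2 = list(rows2[0].keys()) if rows2 else []
    mapping = {}
    used = set()
    for col1 in cols1:
        target = column_values(rows1, col1)
        mapping[col1] = None
        for col2 in cols2:
            if col2 not in used and column_values(rows2, col2) == target:
                mapping[col1] = col2
                used.add(col2)
                break
    return mapping
```

For example, a `total` column in OUTPUT #1 would be paired with a `sum_amount` column in OUTPUT #2 when both hold the same values, and any extra column in OUTPUT #2 is simply left unmapped.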
Just curious: are these local runs using the 1.5 002 model? If so, did we get a similar level of improvement with the 001 model, or did the 002 model clearly outperform it?
Hi @hardikgu23, yes, these local run results use the 1.5-002 model. Based on my experiments (for this bug:
LLM Rater can now handle the following cases:
Colab Experiments Link: Notebook