Feature/enhance harness report to include detailed score counts and grouped results #1132
Merged
chakravarthik27 merged 11 commits into release/2.5.0 on Nov 18, 2024
Conversation
- …ounts like rating 1 to 5,
- …unnecessary comments.
- …ed grading functionality
- …transformer_prompt_eval function for flexible grade handling
This pull request introduces several changes to the `langtest` package, focusing on enhancing the evaluation framework and improving code structure. The key changes include the addition of the `EvalTemplate` class, modifications to the `is_pass_llm_eval` function, and updates to the `model_report` function.

**Enhancements to Evaluation Framework:**
- **Addition of `EvalTemplate` Class:** Introduced the `EvalTemplate` class in `langtest/metrics/llm_eval.py` to build a prompt for evaluating student answers against a given rubric. The class includes a `build_prompt` method that constructs the grading prompt. (`langtest/metrics/llm_eval.py`)
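The shape of such a class might look like the following. This is a minimal sketch: only the class name `EvalTemplate` and the method name `build_prompt` come from the PR description; the rubric contents, the 1–5 grading scale (suggested by the commit messages), and the prompt wording are assumptions.

```python
from typing import Mapping


class EvalTemplate:
    """Builds a grading prompt for evaluating a student answer against a rubric.

    Sketch only -- the actual rubric text and prompt wording in
    langtest/metrics/llm_eval.py may differ.
    """

    # Hypothetical default rubric mapping grades (1-5) to criteria.
    rubric: Mapping[int, str] = {
        1: "The answer is incorrect or irrelevant.",
        3: "The answer is partially correct but incomplete.",
        5: "The answer is fully correct and complete.",
    }

    @classmethod
    def build_prompt(cls, question: str, answer: str, student_answer: str) -> str:
        """Construct the grading prompt from the rubric and the Q/A triple."""
        rubric_lines = "\n".join(f"{g}: {desc}" for g, desc in cls.rubric.items())
        return (
            "You are a teacher grading a quiz.\n"
            f"Grade the student's answer from 1 to 5 using this rubric:\n"
            f"{rubric_lines}\n\n"
            f"QUESTION: {question}\n"
            f"TRUE ANSWER: {answer}\n"
            f"STUDENT ANSWER: {student_answer}\n"
            "GRADE:"
        )
```

Keeping the rubric as class data means a caller can subclass or override it without touching the prompt-assembly logic.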
- **Updates to `is_pass_llm_eval` Function:** Modified the `is_pass_llm_eval` function in `langtest/utils/custom_types/helpers.py` to accept an `eval_template` parameter. This allows for customizable evaluation templates, improving the flexibility of the evaluation process. (`langtest/utils/custom_types/helpers.py`) [1] [2]

**Code Structure and Typing Improvements:**
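A simplified sketch of how an optional `eval_template` parameter can make such a check pluggable. Only the function name `is_pass_llm_eval` and the parameter name `eval_template` come from the PR; the signature, the placeholder names, and the pass/fail heuristic here are illustrative assumptions, not the library's actual API.

```python
from typing import Callable, Optional


def is_pass_llm_eval(
    eval_model: Callable[[str], str],  # hypothetical: callable returning the grader's reply
    question: str,
    answer: str,
    prediction: str,
    eval_template: Optional[str] = None,  # new: custom prompt with {question}/{answer}/{prediction}
) -> bool:
    """Return True when the LLM grader rates the prediction as passing."""
    template = eval_template or (
        "QUESTION: {question}\n"
        "TRUE ANSWER: {answer}\n"
        "STUDENT ANSWER: {prediction}\n"
        "GRADE (1-5):"
    )
    prompt = template.format(question=question, answer=answer, prediction=prediction)
    verdict = eval_model(prompt).strip()
    # Assumed heuristic: an explicit "CORRECT" or a first grade digit >= 4 counts as a pass.
    digits = [int(ch) for ch in verdict if ch.isdigit()]
    return "CORRECT" in verdict.upper() or (bool(digits) and digits[0] >= 4)
```

Defaulting `eval_template` to `None` keeps existing call sites working unchanged while letting callers swap in a rubric-specific prompt.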
- **Typing Enhancements:** Updated type annotations to include `Mapping` and `Union` for better type safety and clarity. (`langtest/metrics/llm_eval.py`, `langtest/utils/custom_types/helpers.py`) [1] [2]
- **Changes in `BaseQASample` Class:** Modified the `config` attribute in the `BaseQASample` class to use a `Mapping` type for better structure and clarity. (`langtest/utils/custom_types/sample.py`)

**Reporting Improvements:**
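The typing change can be illustrated as follows. Only the class name `BaseQASample` and the attribute name `config` come from the PR; the constructor shape and the value types inside the `Mapping` are assumptions for illustration.

```python
from typing import Mapping, Union


class BaseQASample:
    """Sketch of the typing change: `config` is annotated as a read-only
    Mapping rather than a concrete dict (field contents are hypothetical)."""

    def __init__(self, config: Mapping[str, Union[str, int, float]]):
        # Mapping signals that this class only reads the config; callers
        # may pass a plain dict, a MappingProxyType, or any mapping-like object.
        self.config = config
```

Annotating with `Mapping` instead of `Dict` documents intent (no mutation) and accepts a wider range of inputs, which is why it is the conventional choice for read-only parameters.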
- **`model_report` Function:** Improved the `model_report` function to handle multiple keys in the summary dictionary, calculate pass rates more accurately, and rearrange the columns in the final report for better readability. (`langtest/utils/report_utils.py`)

These changes collectively enhance the flexibility, readability, and maintainability of the codebase.
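A sketch of what handling multiple summary keys could look like. Only the function name `model_report` comes from the PR; the summary layout (per-category grade counts 1–5, suggested by the commit messages), the pass threshold, and the output column names are assumptions.

```python
def model_report(summary: dict) -> list:
    """Build report rows from a summary that maps each test category to
    per-grade counts (multiple keys), instead of a single pass count.

    Sketch only -- column names and the pass threshold are assumed.
    """
    report = []
    for category, counts in summary.items():
        total = sum(counts.values())
        # Assumed rule: grades of 4 or 5 count as passing.
        passed = sum(
            n for grade, n in counts.items()
            if str(grade).isdigit() and int(grade) >= 4
        )
        report.append({
            # Headline columns first for readability, detail columns after.
            "category": category,
            "pass_count": passed,
            "pass_rate": f"{passed / total:.0%}" if total else "0%",
            **{f"grade_{g}": n for g, n in counts.items()},
        })
    return report
```

Computing the pass rate from the full grade distribution, rather than from a single pass/fail counter, is what lets the report surface the detailed score counts mentioned in the PR title.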