Feature/enhance harness report to include detailed score counts and grouped results #1132

Merged: chakravarthik27 merged 11 commits into release/2.5.0 from feature/enhance-harness-report-to-include-detailed-score-counts-and-grouped-results on Nov 18, 2024

@chakravarthik27
Collaborator

This pull request introduces several changes to the langtest package, focusing on enhancing the evaluation framework and improving code structure. The key changes include the addition of the EvalTemplate class, modifications to the is_pass_llm_eval function, and updates to the model_report function.

Enhancements to Evaluation Framework:

  • Addition of EvalTemplate Class: Introduced the EvalTemplate class in langtest/metrics/llm_eval.py to build a prompt for evaluating student answers based on a given rubric. This class includes a method build_prompt that constructs a grading prompt. (langtest/metrics/llm_eval.py)

  • Updates to is_pass_llm_eval Function: Modified the is_pass_llm_eval function in langtest/utils/custom_types/helpers.py to accept an eval_template parameter. This allows for customizable evaluation templates, improving the flexibility of the evaluation process. (langtest/utils/custom_types/helpers.py)
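
To illustrate the idea behind the two items above, here is a minimal sketch of a rubric-based prompt builder and a pass check that accepts an optional template. It is not the code from this PR: `RubricTemplate`, `is_pass`, `DEFAULT_TEMPLATE`, the rubric labels, and the `grader` callable are all hypothetical names chosen for illustration, and the real `EvalTemplate.build_prompt` and `is_pass_llm_eval` signatures may differ.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass
class RubricTemplate:
    """Hypothetical stand-in for a rubric-based grading template."""

    rubric: Mapping[str, str]  # grade label -> description of that grade

    def build_prompt(self, question: str, expected: str, answer: str) -> str:
        # Render each rubric entry as one bullet line.
        rubric_lines = "\n".join(
            f"- {label}: {desc}" for label, desc in self.rubric.items()
        )
        return (
            "You are grading a student answer.\n"
            f"Rubric:\n{rubric_lines}\n"
            f"Question: {question}\n"
            f"Reference answer: {expected}\n"
            f"Student answer: {answer}\n"
            "Reply with exactly one rubric label."
        )


# Assumed default rubric, used when the caller passes no template.
DEFAULT_TEMPLATE = RubricTemplate(
    {
        "CORRECT": "the student answer agrees with the reference answer",
        "INCORRECT": "the student answer contradicts or omits the reference answer",
    }
)


def is_pass(
    grader: Callable[[str], str],
    question: str,
    expected: str,
    answer: str,
    template: Optional[RubricTemplate] = None,
) -> bool:
    """Return True when the grader replies with the CORRECT label."""
    template = template or DEFAULT_TEMPLATE
    prompt = template.build_prompt(question, expected, answer)
    return grader(prompt).strip().upper() == "CORRECT"
```

In this sketch `grader` is any callable that maps a prompt string to the model's reply, so a test can pass a stub in place of a real LLM call, and a caller can swap in a custom rubric without touching the pass check.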

Code Structure and Typing Improvements:

  • Typing Enhancements: Updated type annotations to include Mapping and Union for better type safety and clarity. (langtest/metrics/llm_eval.py, langtest/utils/custom_types/helpers.py)

  • Changes in BaseQASample Class: Modified the config attribute in the BaseQASample class to use a Mapping type for better structure and clarity. (langtest/utils/custom_types/sample.py)
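
As a rough illustration of why a `Mapping` annotation can be preferable to `dict` for a config attribute, the sketch below accepts any read-only mapping and lets a value be either a plain string or a nested mapping via `Union`. The class `QASampleSketch`, the alias `ConfigValue`, the helper `model_name`, and the `"model"` / `"name"` keys are assumptions made for this example, not the actual `BaseQASample` definition.

```python
from dataclasses import dataclass, field
from collections.abc import Mapping as MappingABC
from typing import Mapping, Union

# Hypothetical value type: either a plain string or a nested string mapping.
ConfigValue = Union[str, Mapping[str, str]]


@dataclass
class QASampleSketch:
    """Illustrative sample type; not the real BaseQASample."""

    question: str
    # Mapping signals that the sample only reads the config and never mutates it.
    config: Mapping[str, ConfigValue] = field(default_factory=dict)


def model_name(config: Mapping[str, ConfigValue]) -> str:
    """Resolve a model name from either a string or a nested mapping."""
    model = config.get("model", "unknown")
    if isinstance(model, MappingABC):
        return model.get("name", "unknown")
    return model
```

Because the annotation is `Mapping`, callers can pass a `dict`, a `types.MappingProxyType`, or any other mapping implementation, and a type checker will flag accidental writes to the config.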

Reporting Improvements:

  • Enhanced model_report Function: Improved the model_report function to handle multiple keys in the summary dictionary, calculate pass rates more accurately, and rearrange the columns in the final report for better readability. (langtest/utils/report_utils.py)
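
The reporting change can be pictured with a small, dependency-free sketch. It assumes a summary that maps each test type to counts per outcome label, treats a configurable set of labels as passing, and emits rows whose columns follow a fixed order. The function name `build_report`, the summary layout, and the column names are hypothetical; the real `model_report` in langtest/utils/report_utils.py may be organized differently.

```python
from typing import Dict, FrozenSet, List, Mapping, Union

Row = Dict[str, Union[str, int]]


def build_report(
    summary: Mapping[str, Mapping[str, int]],
    passing_labels: FrozenSet[str] = frozenset({"pass"}),
) -> List[Row]:
    """Build one report row per test type from per-label outcome counts."""
    # Collect every label seen in any test so all rows share the same columns.
    labels = sorted({label for counts in summary.values() for label in counts})
    rows: List[Row] = []
    for test_type, counts in summary.items():
        total = sum(counts.values())
        passed = sum(n for label, n in counts.items() if label in passing_labels)
        # Insertion order fixes the column order: name, label counts, totals.
        row: Row = {"test_type": test_type}
        for label in labels:
            row[label] = counts.get(label, 0)
        row["total"] = total
        # Guard against empty tests to avoid dividing by zero.
        row["pass_rate"] = f"{round(100 * passed / total)}%" if total else "0%"
        rows.append(row)
    return rows
```

Labels missing from a given test are filled with zero, so rows with different outcome keys still line up in one table.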

These changes collectively enhance the flexibility, readability, and maintainability of the codebase.

chakravarthik27 self-assigned this on Oct 26, 2024
chakravarthik27 merged commit 70a7d3a into release/2.5.0 on Nov 18, 2024

Development

Successfully merging this pull request may close these issues.

Enhance Harness Report to Include Detailed Score Counts and Grouped Results