
Two layer evaluation #918

Merged

Conversation

Contributor

@Prikshit7766 Prikshit7766 commented Dec 6, 2023

Description

Robustness testing aims to evaluate the ability of a model to maintain consistent performance when faced with various perturbations or modifications in the input data. For LLMs, this involves understanding how changes in capitalization, punctuation, typos, contractions, and contextual information affect their prediction performance.

This PR introduces a two-layer method for conducting the comparison between the expected_result and the actual_result.

[two_layer_evaluation diagram]

  • Layer 1: The expected_result and actual_result are compared directly for equality.
    This approach runs into trouble when weaker LLMs fail to phrase their answers in alignment with the given prompt, producing mismatches even for substantively correct answers.

  • Layer 2: If the direct comparison in Layer 1 proves inadequate, we fall back to one of three alternative evaluation options: string distance, embedding distance, or using an LLM as the evaluator.

This dual-layered approach makes the evaluation metric more robust, allowing it to adapt in scenarios where direct comparison falls short.
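The two-layer flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual API: the function names, the normalization, and the 0.8 threshold are assumptions, and only the string-distance option from Layer 2 is shown (embedding distance or an LLM evaluator could be swapped in at the same point).

```python
# Minimal sketch of a two-layer evaluation (illustrative, not the real API).
from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    """Layer-2 fallback: normalized similarity ratio between two strings."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def two_layer_evaluate(expected: str, actual: str, threshold: float = 0.8) -> bool:
    # Layer 1: direct comparison (after light normalization).
    if expected.strip().lower() == actual.strip().lower():
        return True
    # Layer 2: soften the comparison with a distance metric; embedding
    # distance or an LLM-as-evaluator are the other options described above.
    return string_similarity(expected, actual) >= threshold
```

For example, a Layer-1 miss such as `"Paris"` vs. `"paris "` still passes after normalization, while a genuinely wrong answer falls below the Layer-2 threshold.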


Fixes # (issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Usage

Checklist:

  • I've added Google style docstrings to my code.
  • I've used pydantic for typing when/where necessary.
  • I have linted my code.
  • I have added tests to cover my changes.

Screenshots (if appropriate):

@Prikshit7766 Prikshit7766 changed the base branch from release/1.10.0 to fix-qa-default-config-transformer-llm December 14, 2023 14:00
@Prikshit7766 Prikshit7766 merged commit fe5393b into fix-qa-default-config-transformer-llm Dec 15, 2023
@ArshaanNazir ArshaanNazir deleted the two_layer_evaluation branch December 22, 2023 05:40