
Two layer evaluation #918

Merged

Conversation

Contributor

@Prikshit7766 Prikshit7766 commented Dec 6, 2023

Description

Robustness testing aims to evaluate the ability of a model to maintain consistent performance when faced with various perturbations or modifications in the input data. For LLMs, this involves understanding how changes in capitalization, punctuation, typos, contractions, and contextual information affect their prediction performance.

This PR introduces a two-layer method for conducting the comparison between the expected_result and the actual_result.

[two_layer_evaluation diagram]

  • Layer 1: The expected_result and actual_result are compared directly for equality.
    This approach runs into trouble when weaker LLMs fail to phrase their answers in alignment with the given prompt, producing mismatches even for substantively correct answers.

  • Layer 2: If the direct comparison in Layer 1 proves inadequate, we fall back to one of three alternative evaluation options: string distance, embedding distance, or using an LLM as the evaluator.

This dual-layered approach makes the evaluation metric more robust, allowing it to adapt in scenarios where direct comparison falls short.
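The two-layer flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual API: the function names, the normalization, and the 0.8 threshold are assumptions, and only the string-distance option from Layer 2 is shown (embedding distance or an LLM evaluator could be swapped in at the same point).

```python
# Minimal sketch of a two-layer evaluation (illustrative, not the real API).
from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    """Layer-2 fallback: normalized similarity ratio between two strings."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def two_layer_evaluate(expected: str, actual: str, threshold: float = 0.8) -> bool:
    # Layer 1: direct comparison (after light normalization).
    if expected.strip().lower() == actual.strip().lower():
        return True
    # Layer 2: soften the comparison with a distance metric; embedding
    # distance or an LLM-as-evaluator are the other options described above.
    return string_similarity(expected, actual) >= threshold
```

For example, a Layer-1 miss such as `"Paris"` vs. `"paris "` still passes after normalization, while a genuinely wrong answer falls below the Layer-2 threshold.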


Fixes # (issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Usage

Checklist:

  • I've added Google style docstrings to my code.
  • I've used pydantic for typing when/where necessary.
  • I have linted my code.
  • I have added tests to cover my changes.

Screenshots (if appropriate):

@Prikshit7766 Prikshit7766 changed the base branch from release/1.10.0 to fix-qa-default-config-transformer-llm December 14, 2023 14:00
@Prikshit7766 Prikshit7766 merged commit fe5393b into fix-qa-default-config-transformer-llm Dec 15, 2023
@ArshaanNazir ArshaanNazir deleted the two_layer_evaluation branch December 22, 2023 05:40