Merged
Description
Title: Factuality Test Implementation for Language Models
Overview
This pull request introduces an implementation of the Factuality Test for large language models (LLMs). The Factuality Test evaluates how well LLMs determine the factuality of statements within summaries, focusing on the accuracy of LLM-generated summaries and on potential biases in the models' judgments.
Test Objective
The primary goal of the Factuality Test is to assess how well LLMs can identify the factual accuracy of summary sentences. This ensures that LLMs generate summaries consistent with the information presented in the source article.
Data Source
For this test, we use the Factual-Summary-Pairs dataset, sourced from the Factual-Summary-Pairs GitHub repository.
Methodology
Our test methodology draws inspiration from a reference article titled "LLAMA-2 is about as factually accurate as GPT-4 for summaries and is 30x cheaper".
Bias Identification
We identify bias in the model's responses based on specific response patterns.
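The description does not enumerate the patterns themselves, but following the A/B comparison setup of the referenced article, a minimal sketch of one such check (positional bias) might look like this. The `verdicts` list and the 80% threshold are illustrative assumptions, not part of this PR:

```python
from collections import Counter

# Hypothetical model verdicts from A/B factuality comparisons.
# A strong skew toward one position can indicate positional bias
# rather than a genuine factuality judgment.
verdicts = ["A", "B", "A", "A", "B", "A", "A", "A", "B", "A"]

counts = Counter(verdicts)
total = len(verdicts)
for option, n in counts.items():
    print(f"{option}: {n / total:.0%}")  # A: 70%, B: 30%

# Flag a potential positional bias if either option dominates
# (the 0.8 cutoff here is an arbitrary illustrative threshold).
biased = max(counts.values()) / total > 0.8
print("Possible positional bias:", biased)  # -> False
```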
Accuracy Assessment
Accuracy is assessed from the "pass" column: a value of True indicates a correct response, and False indicates an incorrect one.
Notebook
Type of change
Usage
Checklist:
- Use pydantic for typing when/where necessary.

Screenshots (if appropriate):
These results were obtained after running the model on a set of 50 records.