Merged
31 commits
b1b098d  website: Added Factuality Test to tutorials (RakshitKhajuria, Sep 18, 2023)
ba6ebf7  website: added factuality test (RakshitKhajuria, Sep 18, 2023)
5796523  website: added sensitivity to navigation (RakshitKhajuria, Sep 18, 2023)
d7fe2f3  website: added sensitivity.md (RakshitKhajuria, Sep 18, 2023)
2878583  website: added Sensitivity Test to tutorials (RakshitKhajuria, Sep 18, 2023)
b5550b5  website: added sensitivity test (RakshitKhajuria, Sep 18, 2023)
ce330c3  website: updated title (RakshitKhajuria, Sep 18, 2023)
94571f4  website: added new tasks (RakshitKhajuria, Sep 18, 2023)
c216cf4  website: added boolq-bias and xsum-bias (RakshitKhajuria, Sep 18, 2023)
79fc890  website: added boolq-bias and xsum-bias (RakshitKhajuria, Sep 18, 2023)
8639746  update gender classifier disclaimer (alytarik, Sep 18, 2023)
abb8ab6  updated data.md (Prikshit7766, Sep 18, 2023)
3d6ed32  updated one_liner.md (Prikshit7766, Sep 18, 2023)
9679242  Merge branch 'docs/website-changes' of https://github.com/JohnSnowLab… (Prikshit7766, Sep 18, 2023)
19fed76  Merge branch 'release/1.5.0' into docs/website-changes (alytarik, Sep 18, 2023)
f7f8f03  updated order_bias.md and test.md (Prikshit7766, Sep 18, 2023)
5d260df  update fairness & representation notebook (alytarik, Sep 18, 2023)
b150bf3  Merge branch 'docs/website-changes' of https://github.com/JohnSnowLab… (alytarik, Sep 18, 2023)
3836c1f  updated tutorials.md (Prikshit7766, Sep 18, 2023)
d6da1b9  data.md : Sensitivity Test (Prikshit7766, Sep 18, 2023)
48345ce  add tutorials for wino-bias and legal-support (ArshaanNazir, Sep 19, 2023)
e317349  add wini-bias test to website (ArshaanNazir, Sep 19, 2023)
83cf418  Add Legal-Support Test and Update NB (ArshaanNazir, Sep 19, 2023)
bcdf976  update test.md (ArshaanNazir, Sep 19, 2023)
0df9729  update task/one-liners/NB (ArshaanNazir, Sep 19, 2023)
dad8449  update one-liners (ArshaanNazir, Sep 19, 2023)
ef2bc4e  remove unnecessary transformers (alytarik, Sep 19, 2023)
8a799c7  Chore(notebook): Removed langchain from pip install (RakshitKhajuria, Sep 19, 2023)
6d1fac3  Chore(website): Removed langchain -> model/landing page (RakshitKhajuria, Sep 19, 2023)
2166fd9  Chore(website): Removed langchain -> OneLiners (RakshitKhajuria, Sep 19, 2023)
4510eba  Merge branch 'release/1.5.0' of https://github.com/JohnSnowLabs/langt… (RakshitKhajuria, Sep 19, 2023)
@@ -48,7 +48,7 @@
"metadata": {},
"outputs": [],
"source": [
-"!pip install \"langtest[evaluate,langchain,openai,transformers]\" "
+"!pip install \"langtest[evaluate,openai,transformers]\" "
]
},
{
2 changes: 1 addition & 1 deletion demo/tutorials/llm_notebooks/Clinical_Tests.ipynb
@@ -46,7 +46,7 @@
},
"outputs": [],
"source": [
-"!pip install \"langtest[langchain,openai,transformers]\""
+"!pip install \"langtest[openai,transformers]\""
]
},
{
4 changes: 2 additions & 2 deletions demo/tutorials/llm_notebooks/Factuality_Test.ipynb
@@ -36,7 +36,7 @@
"metadata": {},
"outputs": [],
"source": [
-"!pip install \"langtest[langchain,openai,transformers]\" "
+"!pip install \"langtest[openai,transformers]\" "
]
},
{
@@ -1391,7 +1391,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.11.4"
+"version": "3.9.13"
},
"orig_nbformat": 4
},
4 changes: 2 additions & 2 deletions demo/tutorials/llm_notebooks/Legal_Support.ipynb
@@ -46,7 +46,7 @@
},
"outputs": [],
"source": [
-"!pip install \"langtest[langchain,openai,transformers]\""
+"!pip install \"langtest[openai]\""
]
},
{
@@ -175,7 +175,7 @@
"id": "jWPAw9q0PwD1"
},
"source": [
-"We have specified task as `wino-bias` , hub as `huggingface` and model as `bert-base-uncased`\n",
+"We have specified task as `legal-tests` , hub as `openai` and model as `text-davinci-002`\n",
"\n"
]
},
@@ -48,7 +48,7 @@
"metadata": {},
"outputs": [],
"source": [
-"!pip install \"langtest[evaluate,langchain,openai,transformers]\" "
+"!pip install \"langtest[evaluate,openai,transformers]\" "
]
},
{
2 changes: 1 addition & 1 deletion demo/tutorials/llm_notebooks/Sensitivity_Test.ipynb
@@ -36,7 +36,7 @@
"metadata": {},
"outputs": [],
"source": [
-"!pip install \"langtest[evaluate,langchain,openai,transformers]\""
+"!pip install \"langtest[evaluate,openai,transformers]\" "
]
},
{
2 changes: 1 addition & 1 deletion demo/tutorials/llm_notebooks/Toxicity_NB.ipynb
@@ -48,7 +48,7 @@
"metadata": {},
"outputs": [],
"source": [
-"!pip install \"langtest[evaluate,langchain,openai,transformers]\" "
+"!pip install \"langtest[evaluate,openai,transformers]\" "
]
},
{


@@ -46,7 +46,7 @@
},
"outputs": [],
"source": [
-"!pip install \"langtest[langchain,openai,transformers,evaluate]\""
+"!pip install \"langtest[openai,transformers,evaluate]\""
]
},
{
@@ -46,7 +46,7 @@
},
"outputs": [],
"source": [
-"!pip install \"langtest[langchain,openai,transformers,evaluate]\""
+"!pip install \"langtest[openai,transformers,evaluate]\""
]
},
{


2 changes: 1 addition & 1 deletion demo/tutorials/misc/Different_Report_formats.ipynb


2 changes: 1 addition & 1 deletion demo/tutorials/misc/HuggingFace_Dataset_Notebook.ipynb
@@ -3929,7 +3929,7 @@
},
"outputs": [],
"source": [
-"!pip install \"langtest[evaluate,langchain,openai,transformers]\""
+"!pip install \"langtest[evaluate,openai,transformers]\""
]
},
{
2 changes: 1 addition & 1 deletion demo/tutorials/misc/Loading_Data_with_Custom_Columns.ipynb
@@ -39,7 +39,7 @@
},
"outputs": [],
"source": [
-"!pip install \"langtest[langchain,openai,transformers,evaluate]\""
+"!pip install \"langtest[openai,transformers,evaluate]\""
]
},
{
2 changes: 1 addition & 1 deletion demo/tutorials/test-specific-notebooks/Fairness_Demo.ipynb


@@ -46,7 +46,7 @@
},
"outputs": [],
"source": [
-"!pip install langtest[evaluate,langchain,openai,transformers]"
+"!pip install langtest[evaluate,openai,transformers]"
]
},
{
8 changes: 8 additions & 0 deletions docs/_data/navigation.yml
@@ -98,3 +98,11 @@ tests:
url: /docs/pages/tests/security
- title: Disinformation
url: /docs/pages/tests/disinformation
- title: Sensitivity
url: /docs/pages/tests/sensitivity
- title: Factuality
url: /docs/pages/tests/factuality
- title: Wino Bias
url: /docs/pages/tests/wino-bias
- title: Legal
url: /docs/pages/tests/legal
2 changes: 1 addition & 1 deletion docs/_layouts/landing.html
@@ -125,7 +125,7 @@ <h3 class="grey h3_title">{{ _section.title }}</h3>
<div class="highlight-box">
{% highlight python %}

-!pip install "langtest[langchain,openai,transformers]"
+!pip install "langtest[openai,transformers]"

from langtest import Harness

72 changes: 71 additions & 1 deletion docs/pages/docs/data.md
@@ -47,6 +47,8 @@ Supported `data_source` formats are task-dependent. The following table provides
| **clinical-tests** | Select list of curated datasets |
| **disinformation-test** | Select list of curated datasets |
| **political** | Select list of curated datasets |
| **factuality test** | Select list of curated datasets |
| **sensitivity test** | Select list of curated datasets |

</div><div class="h3-box" markdown="1">

@@ -188,6 +190,7 @@ To test Question Answering models, the user is meant to select a benchmark datas
| **BoolQ-dev-tiny** | [BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions](https://aclanthology.org/N19-1300/) | Truncated version of the dev set from the BoolQ dataset, containing 50 labeled examples |
| **BoolQ-test** | [BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions](https://aclanthology.org/N19-1300/) | Test set from the BoolQ dataset, containing 3,245 labeled examples. This dataset does not contain labels and accuracy & fairness tests cannot be run with it. |
| **BoolQ-test-tiny** | [BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions](https://aclanthology.org/N19-1300/) | Truncated version of the test set from the BoolQ dataset, containing 50 labeled examples. This dataset does not contain labels and accuracy & fairness tests cannot be run with it. |
| **BoolQ-bias** | [BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions](https://aclanthology.org/N19-1300/) | Manually annotated bias version of BoolQ dataset, containing 136 labeled examples
| **NQ-open** | [Natural Questions: A Benchmark for Question Answering Research](https://aclanthology.org/Q19-1026/) | Training & development set from the NaturalQuestions dataset, containing 3,569 labeled examples |
| **NQ-open-test** | [Natural Questions: A Benchmark for Question Answering Research](https://aclanthology.org/Q19-1026/) | Development set from the NaturalQuestions dataset, containing 1,769 labeled examples |
| **NQ-open-test-tiny** | [Natural Questions: A Benchmark for Question Answering Research](https://aclanthology.org/Q19-1026/) | Training, development & test set from the NaturalQuestions dataset, containing 50 labeled examples |
@@ -276,6 +279,7 @@ To test Summarization models, the user is meant to select a benchmark dataset fr
| **XSum** | [Don’t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization](https://aclanthology.org/D18-1206/) | Training & development set from the Extreme Summarization (XSum) Dataset, containing 226,711 labeled examples |
| **XSum-test** | [Don’t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization](https://aclanthology.org/D18-1206/) | Test set from the Xsum dataset, containing 1,000 labeled examples |
| **XSum-test-tiny** | [Don’t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization](https://aclanthology.org/D18-1206/) | Truncated version of the test set from the Xsum dataset, containing 50 labeled examples |
| **XSum-bias** | [Don’t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization](https://aclanthology.org/D18-1206/) | Manually annotated bias version of the Xsum dataset, containing 382 labeled examples

</div><div class="h3-box" markdown="1">
#### Summarization Benchmarks: Use Cases and Evaluations
@@ -391,7 +395,7 @@ harness = Harness(task='disinformation-test',
model={"model": "j2-jumbo-instruct", "hub":"ai21"},
data={"data_source": "Narrative-Wedging"})
```

</div><div class="h3-box" markdown="1">

### Political Test

@@ -418,5 +422,71 @@ harness = Harness(task='political',
model={"model": "j2-jumbo-instruct", "hub":"ai21"})
```

</div><div class="h3-box" markdown="1">

### Factuality Test

The Factuality Test is designed to evaluate the ability of LLMs to determine the factuality of statements within summaries, particularly focusing on the accuracy of LLM-generated summaries and potential biases in their judgments. Users should choose a benchmark dataset from the provided list.

#### Datasets

{:.table2}
| Dataset | Source | Description |
| --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------- |
| **Factual-Summary-Pairs** | [LLAMA-2 is about as factually accurate as GPT-4 for summaries and is 30x cheaper](https://www.anyscale.com/blog/llama-2-is-about-as-factually-accurate-as-gpt-4-for-summaries-and-is-30x-cheaper) | Pairs of factual and incorrect summaries, containing 371 labeled examples. |

</div><div class="h3-box" markdown="1">

#### Factuality Test Dataset: Use Cases and Evaluations

{:.table2}
| Dataset | Use Case | Notebook |
| --------------------- | -------- | -------- |
| **Factual-Summary-Pairs** | The Factuality Test is designed to evaluate the ability of LLMs to determine the factuality of statements within summaries, particularly focusing on the accuracy of LLM-generated summaries and potential biases in their judgments. | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/llm_notebooks/Factuality_Test.ipynb) |

</div><div class="h3-box" markdown="1">

#### Passing a Factuality Test Dataset to the Harness

In the Harness, we specify the data input in the following way:

```python
# Import Harness from the LangTest library
from langtest import Harness

harness = Harness(task='factuality-test',
model={"model": "text-davinci-003", "hub":"openai"},
data={"data_source": "Factual-Summary-Pairs"})
```
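
The "potential biases in their judgments" mentioned above comes down to order robustness: the judging model sees each factual/incorrect summary pair in both presentation orders, and a verdict that flips with position signals positional bias rather than a factuality judgment. A minimal sketch of that idea, independent of langtest (the `judge` callables and helper below are hypothetical illustrations, not the library's API):

```python
def is_order_robust(judge, summary_a, summary_b):
    """Check that a judge picks the same summary regardless of the
    order in which the two candidates are presented.

    `judge(first, second)` returns 0 or 1: the index of the summary
    it considers factual.
    """
    verdict_ab = judge(summary_a, summary_b)
    verdict_ba = judge(summary_b, summary_a)
    # Map the swapped verdict back to the original ordering: picking
    # index i in the swapped order means index 1 - i originally.
    return verdict_ab == 1 - verdict_ba

# A caricature of positional bias: always prefer the first candidate.
position_biased = lambda first, second: 0
# An order-robust toy judge: prefer the summary tagged "factual".
content_based = lambda first, second: 0 if first == "factual" else 1

print(is_order_robust(position_biased, "factual", "hallucinated"))  # False
print(is_order_robust(content_based, "factual", "hallucinated"))    # True
```

Langtest's actual pass/fail criteria are documented on the Factuality Test page; the sketch only illustrates the order-swap idea behind the bias check.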
</div><div class="h3-box" markdown="1">

### Sensitivity Test

The Sensitivity Test ("Evaluating Model's Sensitivity to Negation") assesses a model's responsiveness to negations introduced into its input text: the objective is to determine whether the model detects the negation and changes its answer accordingly. Users should choose a benchmark dataset from the provided list.

#### Datasets

{:.table2}
| Dataset | Source | Description |
| --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------- |
| **NQ-open** | [Natural Questions: A Benchmark for Question Answering Research](https://aclanthology.org/Q19-1026/) | Training & development set from the NaturalQuestions dataset, containing 3,569 labeled examples |
| **NQ-open-test** | [Natural Questions: A Benchmark for Question Answering Research](https://aclanthology.org/Q19-1026/) | Development set from the NaturalQuestions dataset, containing 1,769 labeled examples |
| **NQ-open-test-tiny** | [Natural Questions: A Benchmark for Question Answering Research](https://aclanthology.org/Q19-1026/) | Training, development & test set from the NaturalQuestions dataset, containing 50 labeled examples
| **OpenBookQA-test** | [OpenBookQA Dataset](https://allenai.org/data/open-book-qa) | Testing set from the OpenBookQA dataset, containing 500 multiple-choice elementary-level science questions |
| **OpenBookQA-test-tiny** | [OpenBookQA Dataset](https://allenai.org/data/open-book-qa) | Truncated version of the test set from the OpenBookQA dataset, containing 50 multiple-choice examples.

</div><div class="h3-box" markdown="1">

#### Passing a Sensitivity Test Dataset to the Harness

In the Harness, we specify the data input in the following way:

```python
# Import Harness from the LangTest library
from langtest import Harness

harness = Harness(task='sensitivity-test',
model={"model": "text-davinci-003", "hub":"openai"},
data = {"data_source": "NQ-open-test-tiny"})
```
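
The perturbation behind this test can be pictured without the library: insert a negation into the question and check whether the model's answer actually changes. A toy sketch under that assumption (`naive_negate` is a hypothetical helper for illustration, not langtest's actual transformation):

```python
import re

def naive_negate(question: str) -> str:
    """Naively insert 'not' after the first auxiliary verb, e.g.
    'is the sky blue' -> 'is not the sky blue'."""
    return re.sub(r"\b(is|are|was|were|do|does|did)\b",
                  r"\1 not", question, count=1)

def shows_sensitivity(model, question: str) -> bool:
    """A model passes the probe if negating the question changes its answer."""
    return model(question) != model(naive_negate(question))

print(naive_negate("is the sky blue"))  # is not the sky blue

# A model that ignores the wording entirely fails the probe:
constant_model = lambda q: "yes"
print(shows_sensitivity(constant_model, "is the sky blue"))  # False
```

In the real test the `model` call is the LLM under evaluation, and langtest additionally scores how far the two responses diverge rather than a simple equality check.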
</div></div>
10 changes: 5 additions & 5 deletions docs/pages/docs/model.md
@@ -117,7 +117,7 @@ h.generate().run().report()
Using any large language model from the [OpenAI API](https://platform.openai.com/docs/models/overview):

```bash
-!pip install "langtest[langchain,openai,transformers]"
+!pip install "langtest[openai]"
```

```python
@@ -170,7 +170,7 @@ h.generate().run().report()
#### Pretrained Models

```bash
-!pip install "langtest[transformers,langchain,cohere]"
+!pip install "langtest[langchain,cohere]"
```

```python
@@ -193,7 +193,7 @@ h.generate().run().report()
#### Pretrained Models

```bash
-!pip install "langtest[transformers,langchain,ai21]"
+!pip install "langtest[langchain,ai21]"
```

```python
@@ -215,7 +215,7 @@ h.generate().run().report()
#### Pretrained Models

```bash
-!pip install "langtest[transformers,langchain,openai]"
+!pip install "langtest[openai]"
```

```python
@@ -243,7 +243,7 @@ h.generate().run().report()
#### Pretrained Models

```bash
-!pip install "langtest[transformers,langchain,huggingface-hub]"
+!pip install "langtest[langchain,huggingface-hub]"
```

```python