GitHub - davidheineman/thresh: 🌾 Universal, customizable and deployable fine-grained evaluation for text generation.

Build an Interface | Video Tutorial | Paper

thresh.tools is a platform that makes it easy to create and share fine-grained annotation interfaces. It was written specifically for complex annotation of text generation (such as Scarecrow or SALSA) and built to be universal across annotation tasks, quickly customizable and easily deployable.

Quick Start

Visit thresh.tools/demo for an explanation of how our interface creation works!

Getting Started with `thresh.tools`

Overview

thresh.tools can be used to customize a fine-grained typology; deploy an interface with co-authors, annotators or the research community; and manage fine-grained annotations using Python. We support each step of the fine-grained annotation lifecycle.

Interface Customization Tutorials

These tutorials show how to customize an interface for annotation on thresh.tools.

feature	tutorial	documentation
Edit Types	🔗	Adding Edits
Recursive Question Trees	🔗	Annotating with Recursive Structure
Custom Instructions	🔗	Add Instructions
Paragraph-level Annotation	🔗	Paragraph-level Annotation
Adjudication	🔗	Multi-interface Adjudication
Disable Features	🔗	--
Sub-word Selection	🔗	Sub-word Selection
Multi-lingual Annotation	🔗	Multi-lingual Annotation
Crowdsource Deployment	🔗	Deploy to Crowdsource Platforms

Deploy & Manage Annotation Tutorials

These notebook tutorials show broader usage of thresh for deploying to annotation platforms and managing annotations in Python:

description	tutorial
Load data using the `thresh` library	load_data.ipynb
Deploy an interface to the Prolific platform	deploy_to_prolific.ipynb
Use `tokenizers` to pre-process your dataset	subword_annotation.ipynb

Customize an Interface

All interfaces consist of two elements: the typology and the data. The typology defines your interface and the data defines the examples to be annotated.

<typology>.yml:

template_name: my_template
template_label: My First thresh.tools Template!
edits:
    ...

<data>.json:

{
    "source": "...",
    "target": "..."
}
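
As a minimal sketch, a data file of this shape can be produced with Python's standard json module. The example sentences are placeholders, and wrapping the entries in a list is an assumption based on the list-style data example in the paragraph-level section of this README.

```python
import json

# Placeholder source/target pairs; replace these with your own examples.
pairs = [
    ("The cat sat on the mat.", "A cat was sitting on the mat."),
    ("He left early.", "He departed ahead of schedule."),
]

# Each entry carries the two keys shown above: "source" and "target".
entries = [{"source": src, "target": tgt} for src, tgt in pairs]

# ensure_ascii=False keeps non-English text readable in the output.
payload = json.dumps(entries, indent=4, ensure_ascii=False)
print(payload)
```

The resulting string can be saved as your <data>.json file.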

Adding Edits ↗️

The edits command defines a list of edits. Each edit can be one of these types:


`type: single_span`	`type: multi_span`	`type: composite`

Additionally, the enable_input and enable_output commands are used to enable selecting the span on the source or target sentences respectively.


`enable_input: true`	`enable_output: true`	`enable_input: true` & `enable_output: true`

To style your edits, set icon to any Font Awesome icon and color to the color associated with the edit.

edits:
  - name: edit_with_annotation
    label: "Custom Annotation"
    icon: fa-<icon>
    color: <red|orange|yellow|green|teal|blue>
    type: <single_span|multi_span|composite>
    enable_input: <true|false>
    enable_output: <true|false>
    annotation: ...
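
Before loading a typology, it can help to sanity-check each edit definition. The sketch below is a hypothetical helper (check_edit is not part of thresh), and it only knows the type and color values listed in the placeholders above; whether thresh.tools accepts other values is not stated here.

```python
# Allowed values copied from the placeholders in the YAML above.
ALLOWED_TYPES = {"single_span", "multi_span", "composite"}
ALLOWED_COLORS = {"red", "orange", "yellow", "green", "teal", "blue"}


def check_edit(edit: dict) -> list:
    """Return a list of human-readable problems found in one edit definition."""
    problems = []
    if not edit.get("name"):
        problems.append("missing 'name'")
    if edit.get("type") not in ALLOWED_TYPES:
        problems.append(f"unknown type: {edit.get('type')!r}")
    if edit.get("color") not in ALLOWED_COLORS:
        problems.append(f"unknown color: {edit.get('color')!r}")
    if not str(edit.get("icon", "")).startswith("fa-"):
        problems.append("icon should look like 'fa-<icon>'")
    return problems


# A well-formed edit (values are illustrative, not taken from a real template).
deletion = {
    "name": "deletion",
    "label": "Deletion",
    "icon": "fa-xmark",
    "color": "red",
    "type": "single_span",
    "enable_input": True,
    "enable_output": False,
}
# The same edit with two mistakes: an unlisted color and a mistyped type.
mistyped = dict(deletion, color="purple", type="single-span")

print(check_edit(deletion))
print(check_edit(mistyped))
```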

Annotating with Recursive Structure ↗️

Within each edit, the annotation command is used to specify the annotation questions for each edit. Using the options command, you can specify the question type:


`options: binary`	`options: likert-3`	`options: textbox` & `options: textarea`

Questions are structured as a tree, so if you list sub-questions under the options field, they will appear after the user has selected a certain annotation.


List of children in `options`	Multiple questions in `options`	Nested sub-children in `options`

edits:
  - name: edit_with_annotation
    ...
    annotation:
    - name: simple_question
      question: "Can you answer this question?"
      options: <likert-3|binary|textbox|textarea>
    - name: grandparent_question
      question: "Which subtype question is important"
      options:
      - name: parent_question_1
        label: "Custom Parent Question"
        question: "Which subchild would you like to select"
        options:
        - name: child_1
          label: "Custom Child Option 1"
        - name: child_2
          label: "Custom Child Option 2"
        ...
      - name: parent_question_2
        label: "Pre-defined Parent Question"
        question: "Can you rate the span on a scale of 1-3?"
        options: <likert-3|binary|textbox|textarea>
      ...
  ...
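
To make the tree structure concrete, the sketch below writes the annotation block above as plain Python data and walks it recursively. iter_paths is a hypothetical helper for illustration, not a thresh function; it treats a list under options as sub-questions and a string as a pre-defined question type.

```python
from typing import Iterator

# The annotation tree from the YAML above, written as plain Python data.
annotation = [
    {"name": "simple_question", "options": "binary"},
    {
        "name": "grandparent_question",
        "options": [
            {
                "name": "parent_question_1",
                "options": [{"name": "child_1"}, {"name": "child_2"}],
            },
            {"name": "parent_question_2", "options": "likert-3"},
        ],
    },
]


def iter_paths(nodes: list, prefix: tuple = ()) -> Iterator[tuple]:
    """Yield (path, leaf_kind) for every leaf reachable in the question tree."""
    for node in nodes:
        path = prefix + (node["name"],)
        options = node.get("options")
        if isinstance(options, list):
            # A list of options means sub-questions: recurse into the children.
            yield from iter_paths(options, path)
        else:
            # A string is a pre-defined question type; None is a plain option.
            yield path, options or "option"


for path, kind in iter_paths(annotation):
    print(" > ".join(path), f"[{kind}]")
```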

Add Instructions ↗️

Using the instructions flag, you can add an instructions modal, or prepend the text above the interface using the prepend_instructions flag. Instructions are formatted with Markdown.

prepend_instructions: <true|false>
instructions: |
  Your instruction text in markdown format.

Paragraph-level Annotation ↗️

To add text before or after the annotation, add the context, _context_before and _context_after entries to your data JSON (the latter two are prefixed with source or target). The context field is formatted in Markdown, allowing for titles, subsections or code in your annotation context.

[
  {
    "context": "<context written in markdown>",
    "source_context_before": "...",
    "source": "<selectable text with context>",
    "source_context_after": "...",
    "target_context_before": "...",
    "target": "<selectable text with context>",
    "target_context_after": "..."
  }
]
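
One way to produce such entries is to split a document into paragraphs and mark a single paragraph as selectable. This is a minimal sketch: make_entry is a hypothetical helper, the example text is a placeholder, and it assumes the context fields can be omitted on the target side.

```python
import json


def make_entry(paragraphs: list, index: int, target: str) -> dict:
    """Build one data entry whose selectable source is paragraphs[index]."""
    return {
        # Surrounding paragraphs become read-only context around the source.
        "source_context_before": "\n\n".join(paragraphs[:index]),
        "source": paragraphs[index],
        "source_context_after": "\n\n".join(paragraphs[index + 1:]),
        "target": target,
    }


document = ["Intro paragraph.", "Paragraph to annotate.", "Closing paragraph."]
entry = make_entry(document, 1, "Rewritten paragraph.")

# The shared context is Markdown, so headings are allowed.
entry["context"] = "## Article title\nShown with the example, as Markdown."

print(json.dumps([entry], indent=2))
```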

Additionally, we have utilities under the display command to help with side-by-side or long-context annotations:

display:
 - side-by-side         # shows text and editor next to each other
 - text-side-by-side    # shows source and target next to each other
 - disable-lines        # disables lines between annotations which can be distracting
 - hide-context         # hides the context by default, adding a "show context" button

Multi-interface Adjudication ↗️

To display multiple interfaces simultaneously, use the adjudication flag with the number of interfaces you want to show, and use highlight_first_interface to add a "Your Annotations" label on the first interface.

adjudication: 2
highlight_first_interface: <true|false>

Unlike the traditional data loader (which uses the d parameter), you can specify multiple datasources with the dX parameter as such:

thresh.tools/?d1=<DATASET_1>&d2=<DATASET_2>
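
A link of this form can be assembled programmatically. The sketch below is an illustration only: adjudication_url is a hypothetical helper, the example.com addresses are placeholders, and it assumes each data source is a URL that thresh.tools accepts in percent-encoded form. Any other query parameters your deployment needs are not covered.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def adjudication_url(datasets: list, base: str = "https://thresh.tools/") -> str:
    """Build a link with one numbered d<X> parameter per data source."""
    params = {f"d{i}": data for i, data in enumerate(datasets, start=1)}
    # urlencode percent-escapes characters such as ':' and '/' in the values.
    return f"{base}?{urlencode(params)}"


url = adjudication_url([
    "https://example.com/annotator_a.json",
    "https://example.com/annotator_b.json",
])
print(url)

# Round-trip check: the query string decodes back to the two sources.
print(parse_qs(urlsplit(url).query))
```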

Sub-word Selection ↗️

To allow a smoother annotation experience, the span selection will "snap" to the closest word boundary. The boundary is word by default, but can be changed with the tokenization command:

tokenization: <word|char|tokenized>

For a guide on pre-processing your dataset, please see notebook_tutorials/subword_annotation.ipynb.
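
To illustrate what snapping means in the word setting, here is a simplified sketch that expands a character span outward to the enclosing whitespace boundaries. It is not the interface's actual implementation (which snaps to the closest boundary), and snap_to_words is a hypothetical name.

```python
def snap_to_words(text: str, start: int, end: int) -> tuple:
    """Expand the half-open span [start, end) outward to whitespace boundaries."""
    # Move the start left until it sits just after whitespace (or at 0).
    while start > 0 and not text[start - 1].isspace():
        start -= 1
    # Move the end right until it sits just before whitespace (or at the end).
    while end < len(text) and not text[end].isspace():
        end += 1
    return start, end


sentence = "Fine-grained annotation is hard."
start, end = snap_to_words(sentence, 7, 15)  # raw selection: "ained an"
print(sentence[start:end])  # prints "Fine-grained annotation"
```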

Multi-lingual Annotation ↗️

Any text in our interface can be overridden by specifying its source using the interface_text flag. We create templates for different languages, which can be selected with the language flag.

For a full list of interface text overrides, please reference a language template.

language: <zh|en|es|hi|pt|bn|ru|ja|vi|tr|ko|fr|ur>
interface_text:
  typology:
    source_label: "莎士比亚"
    target_label: "现代英语"
  ...
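
Conceptually, an override replaces individual strings while leaving the rest of the language template in place. The sketch below shows that idea as a recursive dictionary merge. It is an assumption about the behavior, not thresh.tools code, and the template slice is hypothetical (real templates contain more keys).

```python
def apply_overrides(template: dict, overrides: dict) -> dict:
    """Return a copy of template with overrides merged in, key by key."""
    merged = dict(template)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are sections: merge them instead of replacing.
            merged[key] = apply_overrides(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical slice of a language template.
template = {"typology": {"source_label": "Source", "target_label": "Target"}}
overrides = {"typology": {"source_label": "莎士比亚"}}

result = apply_overrides(template, overrides)
print(result)
```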

Looking to expand our language support? See our section on contributing.

Deploy an Interface

Please reference the "Deploy" modal within the interface builder for more detail!

Deploy with a Database ↗️

Use the database command to specify a public database to save annotations after users click a "Submit" button. We currently support Firebase for any deployment method (in-house or crowdsourcing). Please see notebook_tutorials/deploy_database_with_firebase.md for a full tutorial on connecting a Firebase database to Thresh.

crowdsource: "custom"
database: 
    type: firebase
    project_id: [your-project-id]
    url: https://[your-project-id].firebaseio.com/
    # collection: thresh     # (default: thresh) The database to use
    # document: annotation   # (default: annotation) The document to use
    field: annotation_set_1  # The document field to store annotations

Deploy to Crowdsource Platforms ↗️

Use the crowdsource command to specify a "Submit" button at the end of annotation. Please see notebook_tutorials/deploy_to_prolific.ipynb for a full guide on deploying an interface programmatically.

crowdsource: <prolific>
prolific_completion_code: "XXXXXXX"

Manage Data with the `thresh` Library

pip install thresh

Loading Annotations

To load annotations, load your typology with load_interface and pass your JSON data file to load_annotations:

from thresh import load_interface

# Serialize your typology into a class
YourInterface = load_interface(
    "<path_to_typology>.yml"
)

# Load & serialize data from <file_name>.json
thresh_data = YourInterface.load_annotations(
    "<file_name>.json"
)

For example, using the SALSA demo data:

from thresh import load_interface

# Load SALSA data using the SALSA typology
Salsa = load_interface("salsa.yml")
salsa_data = Salsa.load_annotations("salsa.json")

print(salsa_data[0])
>> SalsaEntry(
>>   annotator: annotator_1, 
>>   system: new-wiki-1/Human-2-written, 
>>   source: "Further important aspects of Fungi ...", 
>>   target: "An important aspect of Fungi in Art is ...", 
>>   edits: [
>>     DeletionEdit(
>>       input_idx: [[259, 397]], 
>>       annotation: DeletionAnnotation(
>>         deletion_type: GoodDeletion(
>>           val: 3
>>         ), 
>>         coreference: False, 
>>         grammar_error: False
>>       ),
>>     ), 
>>     ...
>>   ]
>> )

To prepare a dataset for annotation, pass your List[Annotation] object to export_data:

# Export data to <file_name>.json for annotation
YourInterface.export_data(
    data=thresh_data,
    output_filename="<file_name>.json"
)

For a full tutorial with examples and advanced usage, please see /notebook_tutorials/load_data.ipynb.
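
The input_idx field in the printed SalsaEntry above holds pairs of indices. Assuming each pair is a [start, end) character offset into the source text (this README does not state the convention, so check it against your own data), the selected text can be recovered with plain slicing. span_texts is a hypothetical helper and the sentence is a placeholder.

```python
def span_texts(text: str, spans: list) -> list:
    """Return the substring of text for each [start, end) index pair."""
    return [text[start:end] for start, end in spans]


source = "Further important aspects of Fungi"
selected = span_texts(source, [[0, 7], [29, 34]])
print(selected)  # prints ['Further', 'Fungi']
```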

Internal Data Classes

Our data loading code is backed by custom internal classes which are created based on your typology. You can access these classes directly:

from thresh import load_interface

# Get the custom data class for the SALSA typology
Salsa = load_interface("salsa.yml")
SalsaEntry = Salsa.get_entry_class()

# Create a new entry (field values are strings, so they must be quoted)
custom_entry = SalsaEntry(
    annotator="annotator_1",
    system="new-wiki-1/GPT-3-zero-shot",
    target="The film has made more than $552 million at the box office and is currently the eighth most successful movie of 2022.",
    source="The film has grossed over $552 million worldwide, becoming the eighth highest-grossing film of 2022."
)

print(custom_entry.system)
>> new-wiki-1/GPT-3-zero-shot

Data Conversion

Our thresh data format is meant to be universal across fine-grained annotation tasks. To show this, we have created conversion scripts from existing fine-grained typologies. Use the thresh library to convert from existing data formats:

pip install thresh

To convert to our standardized data format, our library includes bi-directional conversion from existing fine-grained annotation typologies:

from thresh import convert_dataset

# To convert to the thresh.tools standardized format:
thresh_data = convert_dataset(
    data_path="<path_to_original_data>", 
    output_path="<path_to_output_data>.json", # (Optional) Will save data locally
    dataset_name="<dataset_name>"
)

# To convert back to the original format:
original_data = convert_dataset(
    data_path="<path_to_original_data>.json", 
    output_path="<path_to_output_data>",
    dataset_name="<dataset_name>", 
    reverse=True
)

We support conversion for the following datasets:

frank, scarecrow, mqm, snac, fg-rlhf, propaganda, arxivedits
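
Since dataset_name takes one of the lowercase names above, a small guard can catch unsupported names before any conversion is attempted. This is a sketch only: to_dataset_name is a hypothetical helper, and lowercasing a display label such as arXivEdits is an assumption about how the labels map to names.

```python
SUPPORTED_DATASETS = (
    "frank", "scarecrow", "mqm", "snac", "fg-rlhf", "propaganda", "arxivedits",
)


def to_dataset_name(label: str) -> str:
    """Map a display label such as 'arXivEdits' to a supported dataset_name."""
    key = label.strip().lower()
    if key not in SUPPORTED_DATASETS:
        raise ValueError(
            f"{label!r} is not one of: {', '.join(SUPPORTED_DATASETS)}"
        )
    return key


print(to_dataset_name("arXivEdits"))
print(to_dataset_name(" SNaC "))
```

The returned string can then be passed as dataset_name to convert_dataset.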

Demo Data Sources

In the table below you can find all the original data for each interface. For our demo data, we randomly selected 50 annotations from each dataset. We include the file names of the specific datasets we use below, selecting from the test set when applicable:

interface	data	implementation	file name
FRANK	🔗	thresh.tools/frank	`human_annotations.json`
Scarecrow	🔗	thresh.tools/scarecrow	`grouped_data.csv`
MQM	🔗	thresh.tools/mqm	`mqm_newstest2020_ende.tsv`
SALSA	🔗	thresh.tools/salsa	`salsa_test.json`
SNaC	🔗	thresh.tools/snac	`SNaC_data.json`
arXivEdits	🔗	thresh.tools/arxivedits	`test.json`
Wu et al., 2023	🔗	thresh.tools/fg-rlhf	`dev_feedback.json`
Da San Martino et al., 2019	🔗	thresh.tools/propaganda	`test/article<X>.labels.tsv`

We do not create dataloaders for the following interfaces:

interface	reason
MultiPIT	This is an inspection interface, examples are taken from Table 7 of the MultiPIT paper.
CWZCC	The example is taken from App. B of the CWZCC paper. Full dataset is not publicly available due to copyright and privacy concerns.
ERRANT	Our example data is taken from the annotations from the W&I+LOCNESS corpus collected by Bryant et al., 2019 from original excerpts from Yannakoudakis et al., 2018 and Granger, 1998. The dataset was released as part of the Building Educational Applications 2019 Shared Task.

Contributing

Set Up `thresh.tools` Locally

Clone this repo:

git clone https://github.com/davidheineman/thresh.git

Set up Vue:

npm install
npm run dev     # To run a dev environment
npm run build   # To build a prod environment in ./build
npm run deploy  # Push to gh-pages

Deployment will create a gh-pages branch. You will need to go into GitHub Pages settings and set the source branch to gh-pages.

Submit a New Typology

You do not need to do this if you only want to use your interface (please see Deploy an Interface). Submitting a typology adds your interface to the thresh.tools homepage!

To make your interface available in the thresh.tools builder, please clone this repo and submit a pull request with the following:

Add your typology YML file to public/templates/.
Add your demo data JSON file to public/data/. We encourage authors to submit a sample of 50 examples for their full dataset, but this is not required.

Modify src/main.js to link to your dataset, by adding a line to templates:

const templates = [
    { name: "SALSA", path: "salsa", task: "Simplification", hosted: true },
    { name: "Scarecrow", path: "scarecrow", task: "Open-ended Generation", hosted: true },
    ...
    { name: "<display_name>", path: "<your_interface>", task: "<your_task>", hosted: true }
]

In this case <your_task> will correspond to the task you are grouped with. Note: You can preview your changes by setting up thresh.tools locally!

Submit a pull request with your changes! Then we will merge with the thresh.tools main branch. Please reach out if you have any questions.

Add Language Support

Multi-lingual deployment is core to thresh.tools, and we are actively working to add support for more languages. If you would like to add support for a new language (or revise our existing support), our language templates are located in public/lang/.

To add support for a new language, simply create a new .yml using the structure of an existing language template.
To revise an existing template, simply make changes within the template.

When you are finished, please submit a pull request with your changes.

Set Up the `thresh` Python Library

Clone this repo:

git clone https://github.com/davidheineman/thresh.git
cd thresh/data_tools

Make any changes to the library and push to PyPI:

rm -r dist 
python -m build
python -m twine upload --repository pypi dist/*

Cite `thresh.tools`

If you find our library helpful, please consider citing our work:

@article{heineman2023thresh,
  title={Thresh: A Unified, Customizable and Deployable Platform for Fine-Grained Text Evaluation},
  author={Heineman, David and Dou, Yao and Xu, Wei},
  journal={arXiv preprint arXiv:2308.06953},
  year={2023}
}
