carsondahlberg/context-eval

This repository contains tools for evaluating the performance of unsupervised Word Sense Disambiguation (WSD) systems.

The scripts are based on the SemEval 2013 Task 13 dataset and the TWSI 2.0 dataset.

Setup

Prerequisites

  • Python 2.7 (Python 3 is not supported)
  • Java 1.7 or higher
  • Either Linux or macOS (for Windows users, Cygwin might be an option, though it has not been tested)

Installation

  1. Clone this repository.
  2. Install dependencies: pip install -r requirements.txt.

TL;DR

TWSI: provide your sense inventory, fill the predict_sense_ids column of the dataset, and run:

python twsi_eval.py data/Inventory-TWSI-2.csv data/Dataset-TWSI-2-GOLD.csv.gz

SemEval 2013: fill the predict_sense_ids column of the dataset and run:

./semeval_eval.sh semeval_2013_13/keys/gold/all.key data/Dataset-SemEval-2013-13-adagram-ukwac-wacky-raw.csv

Run Evaluation

Two evaluation tools exist:

  • one for the SemEval 2013 Task 13 dataset
  • and the other for the TWSI 2.0 dataset.

The two tools differ in how they align the sense inventory of your system with the senses of the provided dataset. See the section on each tool for details.

The dataset schema is, however, the same for both evaluation tools: a tab-separated CSV file with no quoting, no escape character, and the following headers:

context_id	target	target_pos	target_position	gold_sense_ids	predict_sense_ids	golden_related	predict_related	context
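
For orientation, this is one way such a dataset file might be loaded (a sketch using pandas, not part of the evaluation scripts; the file name is a placeholder):

import csv
import pandas as pd

COLUMNS = ["context_id", "target", "target_pos", "target_position",
           "gold_sense_ids", "predict_sense_ids",
           "golden_related", "predict_related", "context"]

# sep="\t" and QUOTE_NONE mirror the "tab-separated, no quoting,
# no escape character" convention described above.
df = pd.read_csv("your_dataset.csv", sep="\t",
                 quoting=csv.QUOTE_NONE, dtype=str)
assert list(df.columns) == COLUMNS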

General invocation procedure

The general procedure for both tools is to fill in the predict_sense_ids column of the respective dataset "by hand" with the sense identifiers predicted by your system, and then to provide this modified dataset to the tool. Multiple IDs can be inserted; their order is irrelevant.

Also note that if your system cannot confidently classify some contexts, you can implement a "reject option" by setting predict_sense_ids to -1.
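
As a hypothetical illustration of these two points (the predict function and file names are placeholders, and the comma separator for multiple IDs is an assumption; check the provided gold files for the exact convention), the snippet below writes predictions into predict_sense_ids and uses -1 as the reject option:

import csv
import pandas as pd

def predict(target, context):
    # Placeholder for your WSD system: return a list of sense IDs,
    # or an empty list if the context cannot be classified.
    return []

df = pd.read_csv("your_dataset.csv", sep="\t",
                 quoting=csv.QUOTE_NONE, dtype=str)

def to_ids(row):
    ids = predict(row["target"], row["context"])
    return ",".join(str(i) for i in ids) if ids else "-1"  # -1 = reject option

df["predict_sense_ids"] = df.apply(to_ids, axis=1)
df.to_csv("your_dataset_filled.csv", sep="\t",
          quoting=csv.QUOTE_NONE, index=False)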

SemEval 2013 Evaluation Tool

This tool is based on the SemEval 2013 Task 13 dataset, which is located at data/Dataset-SemEval-2013-13.csv.

Sense alignment is performed via cluster grouping between your sense IDs and the SemEval dataset senses, so the tool does not need any additional information about your senses.

For a more detailed introduction to the SemEval 2013 Task 13 evaluation and dataset refer to: http://www.aclweb.org/website/old_anthology/S/S13/S13-2.pdf#page=326

See the original cluster comparison tools for SemEval 2013, which are used as-is in this repository.

Invocation procedure

  1. Fill the dataset with your sense IDs (see General invocation procedure). Note: you must delete the header line (a small sketch for stripping it follows this list)!
  2. Provide the dataset and the SemEval sense keys to the script semeval_eval.sh (see the example below).
  3. Results are printed to stdout
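
If your filled dataset still contains the header line, one way to strip it before step 2 is (file names are placeholders):

# Drop the first (header) line so semeval_eval.sh only sees data rows.
with open("your_dataset_filled.csv") as src, \
     open("your_dataset_noheader.csv", "w") as dst:
    dst.writelines(list(src)[1:])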

Example invocation

Note: The files used here actually exist in the repository and you can use them for reference.

./semeval_eval.sh semeval_2013_13/keys/gold/all.key data/Dataset-SemEval-2013-13-adagram-ukwac-wacky-raw.csv

The first argument, semeval_2013_13/keys/gold/all.key, is the list of all gold senses and must always be exactly this file. The second argument, data/Dataset-SemEval-2013-13-adagram-ukwac-wacky-raw.csv, is a dataset already filled with sense IDs; replace it with your modified dataset.

TWSI 2.0 Evaluation Tool

This tool is based on the TWSI 2.0 dataset; a gold version filled with sense IDs is included at data/Dataset-TWSI-2-GOLD.csv.gz.

For sense alignment it requires you to provide a word sense inventory with related terms. The alignment is calculated as the maximum overlap between the related terms from your word sense inventory and the sense substitutions provided by TWSI.
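
As a rough illustration of the overlap idea (not the tool's exact scoring; the words and sense labels below are invented), each of your senses is mapped to the TWSI sense whose substitutions share the most terms with its related words:

def overlap(related_words, twsi_substitutions):
    # Number of terms shared between your sense and a TWSI sense.
    return len(set(related_words) & set(twsi_substitutions))

my_related = ["desk", "board", "furniture"]       # your sense of "table"
twsi_senses = {
    "table@1": ["desk", "furniture", "counter"],  # TWSI substitutions
    "table@2": ["chart", "grid", "list"],
}

# Align to the TWSI sense with the maximum overlap.
best = max(twsi_senses, key=lambda s: overlap(my_related, twsi_senses[s]))
print(best)  # -> table@1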

For an in-depth introduction to the TWSI word senses and their contextualized word substitutions, refer to https://www.lt.informatik.tu-darmstadt.de/de/data/twsi-turk-bootstrap-word-sense-inventory/

Invocation procedure

  1. Fill the dataset with your sense IDs (see General invocation procedure) and gzip it.
  2. Create a file containing the whole sense inventory your system uses, in the following format: a tab-separated CSV file with no quoting, no escape character, and the columns word, sense_id, related_words. The related_words column is a comma-separated word list, where each word can carry a colon-separated weight, e.g. list:5,of:3,related:1,words:1, and the sense_id must be numeric. Note: the file must be given without a header (see the sketch after this list)!
  3. Provide both files from the previous two steps to the Python script twsi_eval.py. Note: if your dataset file has no header, you can provide the --no-header flag!
  4. Results of the evaluations are printed to stdout. Most essential metrics are also printed to stderr.
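
A minimal sketch of writing such an inventory file (the words, sense IDs and weights below are invented; a real file must cover your system's whole inventory):

# Each entry: (word, numeric sense_id, [(related_word, weight), ...])
inventory = [
    ("table", 0, [("desk", 5), ("board", 3), ("furniture", 1)]),
    ("table", 1, [("chart", 4), ("grid", 2)]),
]

with open("my_inventory.csv", "w") as out:
    for word, sense_id, related in inventory:
        related_str = ",".join("%s:%d" % (w, wt) for w, wt in related)
        # word<TAB>sense_id<TAB>related_words -- note: no header line
        out.write("%s\t%d\t%s\n" % (word, sense_id, related_str))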

Example invocation

Note: The files used here actually exist in the repository and you can use them for reference.

python twsi_eval.py data/Inventory-TWSI-2.csv data/Dataset-TWSI-2-GOLD.csv.gz

The first argument, data/Inventory-TWSI-2.csv, should contain the sense inventory of your system as described in step 2 above; replace it with your own inventory file. The second argument, data/Dataset-TWSI-2-GOLD.csv.gz, is the gzipped dataset filled with sense IDs; likewise replace it with your modified dataset.

License

TWSI 2.0 is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported license (https://creativecommons.org/licenses/by-sa/3.0/). The combined TWSI 2.0 sentences are shared under the same license.

The code of the evaluation script is under the Apache Software License (ASL) 2.0 (http://www.apache.org/licenses/LICENSE-2.0).

References

  • [Biemann and Nygaard, 2010] C. Biemann and V. Nygaard (2010): Crowdsourcing WordNet. In Proceedings of the 5th Global WordNet conference, Mumbai, India.
  • [Biemann, 2012] C. Biemann (2012): Turk Bootstrap Word Sense Inventory 2.0: A Large-Scale Resource for Lexical Substitution. Proceedings of LREC 2012, Istanbul, Turkey.
