
Data and code for the experiments in: Thomas Bott, Comparative Evaluation of Unsupervised Hypernymy Measures on the Directionality Task for English and German, B.Sc. Thesis, Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart, 2020; Supervisors: Apl. Prof. Dr. Sabine Schulte im Walde, Dominik Schlechtweg


Thommy96/hypernymy_directionality


Hypernymy Directionality

Data and code for the experiments in: Thomas Bott, Comparative Evaluation of Unsupervised Hypernymy Measures on the Directionality Task for English and German, B.Sc. Thesis, Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart, 2020; Supervisors: Apl. Prof. Dr. Sabine Schulte im Walde, Dominik Schlechtweg

Usage notes

The scripts should be run with Python 3. Each script contains a usage pattern that shows how to call it. Further information about the arguments and options can be obtained with the -h (--help) option. For successful usage, it is recommended to execute the scripts in the following order:

  1. Create distributional semantic space(s) (scripts/dsm_creation/)
  • dsm_creation.py
  • dsm_combine.py
  2. Calculate row sums (scripts/dsm_creation/)
  • rowSums.py
  3. Read data set(s) (scripts/dataset_processing/)
  • read_dataset.py
  4. Filter data set(s) (scripts/dataset_processing/)
  • filter_dataset.py
  5. Optionally create subsets of data set(s) (scripts/dataset_processing/)
  • freqDiff_freqBias.py
  6. For SLQS, calculate Positive Local Mutual Information scores (scripts/dsm_creation/)
  • plmi.py
  7. Calculate unsupervised hypernymy measures (scripts/measures/)
  • weedsPrec.py
  • invCL.py
  • slqsRow.py
  • slqs.py
  8. Evaluate measures, unsupervised and supervised (scripts/evaluation/)
  • evaluation.py
  • classification.py
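
To illustrate what the measure step computes for the directionality task, here is a minimal sketch of WeedsPrec on toy data. It is not the code from weedsPrec.py: the function names `weeds_prec` and `predict_hypernym`, the dictionary representation of context vectors, and the toy weights are all assumptions made for this example. It relies on the usual definition of WeedsPrec, the share of a word's context weight that falls on contexts it shares with the other word, and on the assumption that the word whose contexts are more fully included in the other's is the hyponym.

```python
def weeds_prec(u, v):
    """Share of u's total context weight that lies on contexts also seen with v.

    u and v are hypothetical sparse vectors: dicts mapping context -> weight.
    """
    total = sum(u.values())
    if total == 0:
        return 0.0
    shared = sum(w for ctx, w in u.items() if v.get(ctx, 0) > 0)
    return shared / total


def predict_hypernym(word_a, vec_a, word_b, vec_b):
    """Return the word predicted to be the hypernym (the more general term).

    Assumption: if a's contexts are more fully included in b's than the
    other way round, a is taken to be the hyponym and b the hypernym.
    """
    if weeds_prec(vec_a, vec_b) >= weeds_prec(vec_b, vec_a):
        return word_b
    return word_a


# Toy co-occurrence weights, invented for illustration only.
dog = {"bark": 4.0, "eat": 2.0, "sleep": 2.0}
animal = {"eat": 5.0, "sleep": 3.0, "bark": 1.0, "fly": 2.0, "swim": 2.0}

if __name__ == "__main__":
    print(weeds_prec(dog, animal))   # all of dog's contexts occur with animal
    print(weeds_prec(animal, dog))   # only part of animal's weight is shared
    print(predict_hypernym("dog", dog, "animal", animal))
```

In the actual pipeline the vectors come from the semantic spaces built in step 1 and the word pairs from the filtered data sets in step 4; the sketch only shows the comparison of the two directions.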

Test corpora

For testing purposes, the two test corpora in materials/corpora/ can be used. They are smaller parts of the original SdeWaC and PukWaC corpora and make it possible to test the code with a short runtime.

Corpora

File paths for SdeWaC and PukWaC can be found in materials/corpora/corpora_paths.txt. The corpora are located on the server of the IMS (Institut für Maschinelle Sprachverarbeitung), Stuttgart.

Data sets

File paths for all data sets used in the thesis can be found in materials/datasets/. All data sets are located on the server of the IMS (Institut für Maschinelle Sprachverarbeitung), Stuttgart.

Required packages

  • pickle (to save and load files; part of the Python standard library)
  • docopt (for command line interface)
  • tqdm (for progress bars)
  • sklearn (for classification; provided by the scikit-learn package)
  • graphviz (for drawing trees)
  • numpy
  • tabulate (for creating tables)
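
Before running the scripts, it can help to check that the modules listed above are importable. The following is a small sketch using only the standard library; the helper name `missing_modules` and the `REQUIRED` list are assumptions for this example and are not part of the repository.

```python
import importlib.util

# Import names taken from the list above.
REQUIRED = ["pickle", "docopt", "tqdm", "sklearn", "graphviz", "numpy", "tabulate"]


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules are available.")
```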
