
Derivational Morphology

This is the code repository for the ACL 2018 paper "A Distributional and Orthographic Aggregation Model for English Derivational Morphology." If you use this code in your work, please cite:

@InProceedings{P18-1180,
  author = 	"Deutsch, Daniel
		and Hewitt, John
		and Roth, Dan",
  title = 	"A Distributional and Orthographic Aggregation Model for English Derivational Morphology",
  booktitle = 	"Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
  year = 	"2018",
  publisher = 	"Association for Computational Linguistics",
  pages = 	"1938--1947",
  location = 	"Melbourne, Australia",
  url = 	"http://aclweb.org/anthology/P18-1180"
}

Setup

To clone the repository containing the dataset, shard the training data, and download the unigram-counts.txt data file, run:

sh setup.sh

A copy of data/unigram-counts.txt is hosted online and downloaded by setup.sh, but it can optionally be reproduced with the following command:

sh scripts/ngram_counts.sh download data/unigram-counts.txt

The distributional model uses the pretrained word2vec vectors from https://code.google.com/archive/p/word2vec/. Download them from that page and save the file as data/GoogleNews-vectors-negative300.bin.gz.
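After downloading, a quick way to check that the archive is intact is to read its first few entries. The sketch below assumes the standard word2vec binary layout (a text header line `vocab_size vector_size`, then each word followed by a space and `vector_size` little-endian float32 values). The function names are hypothetical, and this is not how the repository itself loads the vectors.

```python
import gzip
import struct
from typing import BinaryIO, Iterator, Tuple


def read_header(f: BinaryIO) -> Tuple[int, int]:
    """Parse the 'vocab_size vector_size' line at the start of a word2vec .bin file."""
    vocab_size, dim = f.readline().split()
    return int(vocab_size), int(dim)


def iter_vectors(path: str, limit: int) -> Iterator[Tuple[str, Tuple[float, ...]]]:
    """Yield up to `limit` (word, vector) pairs from a gzipped word2vec binary file."""
    with gzip.open(path, "rb") as f:
        vocab_size, dim = read_header(f)
        for _ in range(min(limit, vocab_size)):
            # A word is a run of bytes terminated by a single space.
            word = bytearray()
            while True:
                ch = f.read(1)
                if not ch:
                    raise EOFError("file ended in the middle of a word")
                if ch == b" ":
                    break
                if ch != b"\n":  # skip the newline written after the previous vector
                    word += ch
            # The vector is `dim` little-endian float32 values (4 bytes each).
            vec = struct.unpack(f"<{dim}f", f.read(4 * dim))
            yield word.decode("utf-8", errors="replace"), vec
```

For example, `next(iter_vectors("data/GoogleNews-vectors-negative300.bin.gz", 1))` returns the first word and its 300-dimensional vector. For anything beyond a sanity check, gensim's `KeyedVectors.load_word2vec_format(path, binary=True, limit=...)` reads the same format.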

Reproducing experiments

Our experiments run on Sun Grid Engine (SGE) and use the qsub command to run several random-seed restarts in parallel.
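Purely as an illustration of this pattern (this is not the code used by scripts/run-experiment.sh), the sketch below builds one qsub command per random seed for a single-model training script, passing the script its config file, output directory, and seed. The job names, log paths, and seed values are hypothetical; `-N`, `-cwd`, `-o`, and `-e` are standard SGE qsub options.

```python
from typing import List, Sequence


def qsub_commands(train_script: str, config: str, out_root: str,
                  seeds: Sequence[int]) -> List[List[str]]:
    """Build one qsub submission per random seed; nothing is submitted here."""
    commands = []
    for seed in seeds:
        commands.append([
            "qsub",
            "-N", f"morph-seed-{seed}",           # job name (hypothetical naming scheme)
            "-cwd",                               # run the job from the current directory
            "-o", f"{out_root}/seed-{seed}.out",  # stdout; out_root must already exist
            "-e", f"{out_root}/seed-{seed}.err",  # stderr
            train_script,                         # job script, followed by its arguments
            config, f"{out_root}/seed-{seed}", str(seed),
        ])
    return commands
```

On an SGE submit host, each list can be passed to `subprocess.run(cmd, check=True)`; `range(1, 31)` yields 30 jobs, one per random restart.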

To reproduce the accuracy and edit-distance results, run:

sh scripts/run-experiment.sh unconstrained experiments/default.yaml
sh scripts/run-experiment.sh constrained experiments/constrained.yaml

python scripts/summarize_results.py unconstrained unconstrained-metrics.txt
python scripts/summarize_results.py constrained constrained-metrics.txt
python scripts/make_acc_tables.py unconstrained-metrics.txt constrained-metrics.txt
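The two reported metrics can be illustrated with a small sketch: exact-match accuracy and character-level edit distance between each predicted derived word and its gold form. This assumes standard Levenshtein distance averaged over test pairs; scripts/summarize_results.py may compute or aggregate the metrics differently (for example, across the random restarts).

```python
from typing import Sequence, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def summarize(predictions: Sequence[str], gold: Sequence[str]) -> Tuple[float, float]:
    """Return (accuracy, mean edit distance) over aligned prediction/gold lists."""
    if not gold or len(predictions) != len(gold):
        raise ValueError("need non-empty, equally long prediction and gold lists")
    pairs = list(zip(predictions, gold))
    accuracy = sum(p == g for p, g in pairs) / len(pairs)
    mean_ed = sum(edit_distance(p, g) for p, g in pairs) / len(pairs)
    return accuracy, mean_ed
```

For example, `summarize(["happiness", "runer"], ["happiness", "runner"])` returns `(0.5, 0.5)`: one exact match, and one prediction that is a single insertion away from the gold word.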

To reproduce the search results table, run:

sh scripts/run-all-search-experiments.sh search experiments/search.yaml

python scripts/summarize_search_results.py search
python scripts/make_search_table.py search/summary.txt
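The three search-table steps can also be chained from Python so that a failing step stops the run. This is a hypothetical convenience wrapper, not part of the repository. One caveat: if scripts/run-all-search-experiments.sh only submits jobs with qsub and then returns, the summary steps must not start until those jobs have finished.

```python
import shlex
import subprocess
from typing import Sequence

# The commands listed above for the search results table, in order.
SEARCH_PIPELINE = [
    ["sh", "scripts/run-all-search-experiments.sh", "search", "experiments/search.yaml"],
    ["python", "scripts/summarize_search_results.py", "search"],
    ["python", "scripts/make_search_table.py", "search/summary.txt"],
]


def run_pipeline(steps: Sequence[Sequence[str]]) -> None:
    """Run each step in order, stopping at the first non-zero exit status."""
    for step in steps:
        print("+", shlex.join(step), flush=True)  # echo the command, like `sh -x`
        subprocess.run(step, check=True)  # raises CalledProcessError on failure
```

Call `run_pipeline(SEARCH_PIPELINE)` from the repository root.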

To train an individual model instead of the full set of 30 random restarts, run one of the following scripts:

scripts/train-dist.sh
scripts/train-seq2seq.sh
scripts/train-seq2seq-reranker.sh
scripts/train-dist-seq2seq.sh
scripts/train-dist-seq2seq-reranker.sh

Each script takes three arguments: the YAML config file (e.g., experiments/default.yaml or experiments/constrained.yaml), an output directory, and a random seed.
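Without access to Sun Grid Engine, a few restarts can be run one after another on a single machine. The sketch below assumes the training scripts are invoked with sh, like the other scripts in this README, and uses a hypothetical `seed-N` subdirectory per seed; with the default `dry_run=True` it only prints the commands.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def seed_commands(script: str, config: str, out_root: str,
                  seeds: Sequence[int]) -> List[List[str]]:
    """One `sh <script> <config> <output dir> <seed>` command per seed."""
    # Hypothetical layout: each seed writes to its own subdirectory of out_root.
    return [["sh", script, config, str(Path(out_root) / f"seed-{seed}"), str(seed)]
            for seed in seeds]


def run_seeds(script: str, config: str, out_root: str,
              seeds: Sequence[int], dry_run: bool = True) -> List[List[str]]:
    """Run the seeds sequentially on the local machine (no SGE); return the commands."""
    commands = seed_commands(script, config, out_root, seeds)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
            continue
        Path(cmd[3]).mkdir(parents=True, exist_ok=True)  # in case the script does not create it
        subprocess.run(cmd, check=True)  # stop at the first failing seed
    return commands
```

For example, `run_seeds("scripts/train-dist.sh", "experiments/default.yaml", "runs/dist", range(1, 4), dry_run=False)` would train three distributional models with seeds 1 to 3.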
