This repo contains code for learning reduplication (and, more generally, phonological transformations) with encoder-decoder networks. Results from experiments run with this code are reported in Nelson, Dolatian, Rawski, and Prickett (2020).
src contains all necessary code.
data contains sample reduplication data files, created by transducers generated from a typology of natural language reduplication patterns (Dolatian and Heinz, 2019).
You can create your own data files by following the format shown in data. Each line represents a single mapping and is structured "input(\tab)-->(\tab)output".
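As a minimal sketch of that file format, the snippet below writes and re-reads a toy data file. The filename, the example strings, and the use of space-separated phonemes are illustrative assumptions; only the tab-delimited "input --> output" layout comes from the description above.

```python
# Sketch: writing and reading a data file in the
# "input(\tab)-->(\tab)output" format described above.
# The total-reduplication pairs and space-separated phonemes
# are illustrative assumptions, not repo data.

pairs = [
    ("p a t a", "p a t a p a t a"),
    ("b u d i", "b u d i b u d i"),
]

# Write one mapping per line, tab-delimited around "-->".
with open("toy_data.txt", "w") as f:
    for inp, out in pairs:
        f.write(f"{inp}\t-->\t{out}\n")

# Read the file back into (input, output) pairs.
with open("toy_data.txt") as f:
    loaded = [tuple(line.rstrip("\n").split("\t-->\t")) for line in f]

print(loaded[0])  # ('p a t a', 'p a t a p a t a')
```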
hyperparameters contains a sample hyperparameter file,
example.txt. You can create your own hyperparameter file by following the format in the sample file.
input_file- path to the file containing the training data
teacher_force- boolean, whether or not to use teacher forcing, default
attention- boolean, whether or not to use a global attention mechanism, default
attention_type- string indicating what type of attention to use, options are
'weighted', ignored if attention is False
recurrent_type- string indicating type of recurrent network to use for encoder and decoder, options are
embedding_size- integer size of phoneme representations, default
hidden_size- integer size of encoder and decoder hidden states, default
num_epochs- integer number of epochs to train the model, default
print_freq- integer frequency with which to print loss during training, default
batch_size- integer batch size used during training, default
learning_rate- float initial learning rate (used with Adam optimization), default
dropout_prob- float dropout probability during training, default
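Purely as a hypothetical sketch of such a file (the authoritative syntax is whatever hyperparameters/example.txt uses; the key names come from the list above, but the one-setting-per-line layout and every value shown are placeholders, not documented defaults):

```
input_file	data/example.txt
teacher_force	True
attention	True
attention_type	weighted
recurrent_type	GRU
embedding_size	32
hidden_size	64
num_epochs	100
print_freq	10
batch_size	16
learning_rate	0.001
dropout_prob	0.0
```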
Running the model
Requirements: Python 3.6+ with NumPy and PyTorch (1.0 or later)
A sample version of the model can be run from the command line with:
python src/main.py hyperparameters/example.txt