Generating Sentences from Disentangled Syntactic and Semantic Spaces

baoy-nlp/DSS-VAE

Implementation of DSS-VAE, the model from the ACL 2019 paper "Generating Sentences from Disentangled Syntactic and Semantic Spaces".

Environment requirements

  • PyTorch 0.4+
  • nltk
  • tensorboardX
  • NumPy
  • PyYAML
  • pickle (part of the Python standard library)
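
Before running anything, it can help to confirm that the packages above are importable. The following is a minimal sketch, not part of the repository. The import names (`torch` for PyTorch, `yaml` for PyYAML) and the helper `find_missing` are assumptions made for illustration, and the check does not verify version numbers.

```python
import importlib.util

# Import names assumed for the packages listed above
# (PyTorch imports as "torch", PyYAML as "yaml").
REQUIRED_MODULES = ["torch", "nltk", "tensorboardX", "numpy", "yaml", "pickle"]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```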

Data Preparation

Prerequisite: you may need the constituency parser ZPar to obtain the constituency parse tree of a sentence.

Preprocessing takes three steps:

  1. Tokenization
python dss_vae/preprocess/my_tokenize.py --raw_file [raw_file_path] --token_file [token_out_path] --for_parse
  2. Parsing
Please refer to [ZPar](https://sourceforge.net/projects/zpar/files/0.7.5/zpar-0.7.5.tar.gz/download), an easy-to-use constituency parser, to obtain the constituency parse tree of each sentence.
  3. Building the dataset
  • Convert to <Sentence, Linearized Tree> pairs:
python dss_vae/preprocess/tree_linearization.py --tree_file [tree_file_path] --out_file [tree_out_path] --mode s2b
  • Generate the dataset and vocabulary:
python dss_vae/structs/generate_dataset.py --train_file [<Sentence,LinearTree> file] --dev_file [<Sentence,LinearTree> file] --test_file [<Sentence,LinearTree> file] --tgt_dir [output_dir] --max_src_vocab 30000 --max_src_len 30 --max_tgt_len 90 --train_size 100000
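
The scripted steps above can be collected in a small driver. This is only a sketch: the helper names `preprocessing_commands` and `run` are hypothetical, the flags and numeric values are copied from the commands shown above, and the ZPar step is left out because its invocation is not described here.

```python
import subprocess

PYTHON = "python"  # interpreter name, as used in the commands above


def preprocessing_commands(raw_file, token_file, tree_file, linear_file,
                           train_file, dev_file, test_file, tgt_dir):
    """Return argv lists for tokenization, linearization, and dataset generation.

    ZPar parsing (step 2) is run separately between the first and second
    command; its invocation is not covered here.
    """
    tokenize = [
        PYTHON, "dss_vae/preprocess/my_tokenize.py",
        "--raw_file", raw_file,
        "--token_file", token_file,
        "--for_parse",
    ]
    linearize = [
        PYTHON, "dss_vae/preprocess/tree_linearization.py",
        "--tree_file", tree_file,
        "--out_file", linear_file,
        "--mode", "s2b",
    ]
    generate = [
        PYTHON, "dss_vae/structs/generate_dataset.py",
        "--train_file", train_file,
        "--dev_file", dev_file,
        "--test_file", test_file,
        "--tgt_dir", tgt_dir,
        "--max_src_vocab", "30000",
        "--max_src_len", "30",
        "--max_tgt_len", "90",
        "--train_size", "100000",
    ]
    return [tokenize, linearize, generate]


def run(command):
    """Run one command from the repository root, raising on a non-zero exit."""
    subprocess.run(command, check=True)
```

A typical use would be to call `run` on the first command, parse the tokenized file with ZPar, and then call `run` on the remaining two commands.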

After preprocessing, the target directory has the following structure:

+-- Target Dir
|   +-- train.bin
|   +-- test.bin
|   +-- dev.bin
|   +-- vocab.bin
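
As a quick sanity check before training, the layout above can be verified programmatically. The sketch below only tests that the four files exist. It makes no assumption about their internal format, and the helper name `missing_data_files` is hypothetical.

```python
from pathlib import Path

# File names taken from the directory listing above.
EXPECTED_FILES = ("train.bin", "dev.bin", "test.bin", "vocab.bin")


def missing_data_files(tgt_dir):
    """Return the expected file names that are absent from tgt_dir."""
    root = Path(tgt_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list means all four files are present; otherwise the returned names show which preprocessing output is missing.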

Training

All hyperparameters are set in a config.yaml file. Train the model or its variants with the following command:

python main.py --config_files [config.yaml] --mode train_vae --exp_name [exp_name]

Example config.yaml files are provided in the CONFIGS directory.
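
To train several variants in a row, the training command can be generated once per configuration file. This is a rough sketch under two assumptions that the repository does not confirm: that the example configurations end in `.yaml`, and that using the file stem as the experiment name is acceptable. The helper `training_commands` is hypothetical.

```python
from pathlib import Path


def training_commands(config_dir, pattern="*.yaml"):
    """Build one training command per config file found in config_dir.

    The ".yaml" pattern and the use of the file stem as --exp_name are
    assumptions made for this sketch.
    """
    commands = []
    for config in sorted(Path(config_dir).glob(pattern)):
        commands.append([
            "python", "main.py",
            "--config_files", str(config),
            "--mode", "train_vae",
            "--exp_name", config.stem,
        ])
    return commands
```

Each returned list can be passed to `subprocess.run(command, check=True)` from the repository root.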

Citations

Please consider citing our paper if this project helps your research. The BibTeX reference is as follows.

@inproceedings{bao-etal-2019-generating,
    title = "Generating Sentences from Disentangled Syntactic and Semantic Spaces",
    author = "Bao, Yu  and
      Zhou, Hao  and
      Huang, Shujian  and
      Li, Lei  and
      Mou, Lili  and
      Vechtomova, Olga  and
      Dai, Xin-yu  and
      Chen, Jiajun",
    booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/P19-1602",
    doi = "10.18653/v1/P19-1602",
    pages = "6008--6019",
}
