Code for <CGMH: Constrained Sentence Generation by Metropolis-Hastings Sampling>

Constrained Sentence Generation via Metropolis-Hastings Sampling

Introduction

CGMH is a sampling-based model for constrained sentence generation. It can be used for keyword-to-sentence generation, paraphrasing, sentence correction, and many other tasks.

Examples

  • Running example for paraphrase (all rejected proposals are omitted):
    what movie do you like most . ->
    which movie do you like most . (replace what with which) ->
    which movie do you like . (delete most) ->
    which movie do you like best . (insert best) ->
    which movie do you think best . (replace like with think) ->
    which movie do you think the best . (insert the) ->
    which movie do you think is the best . (insert is)

  • Running example for sentence correction:
    in the word oil price very high right now . ->
    in the word , oil price very high right now . (insert ,) ->
    in the word , oil prices very high right now . (replace price with prices) ->
    in the word , oil prices are very high right now . (insert are)

  • Extra examples for sentence correction:
    origin: even if we are failed , we have to try to get a new things . ->
    generated: even if we are failing , we have to try to get some new things .

    origin: in the word oil price very high right now . ->
    generated: in the word , oil prices are very high right now .

    origin: the reason these problem occurs is also becayse of the exam . ->
    generated: the reason these problems occur is also because of the exam .
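
The running paraphrase example above can be replayed with a few lines of Python. This is only an illustrative sketch, not the repository's implementation: `apply_edit` and `acceptance_probability` are hypothetical names, the edit positions are read off the example by hand, and no language model is involved. `acceptance_probability` is the standard Metropolis-Hastings acceptance ratio, shown to indicate where a proposal would be accepted or rejected.

```python
def apply_edit(tokens, op, pos, word=None):
    """Return a new token list with one word-level edit applied at index pos."""
    if op == 'replace':
        return tokens[:pos] + [word] + tokens[pos + 1:]
    if op == 'insert':
        return tokens[:pos] + [word] + tokens[pos:]
    if op == 'delete':
        return tokens[:pos] + tokens[pos + 1:]
    raise ValueError('unknown op: %s' % op)


def acceptance_probability(p_new, p_old, q_backward, q_forward):
    """Standard Metropolis-Hastings acceptance ratio, capped at 1.

    p_new / p_old: target probabilities of the proposed / current sentence.
    q_forward / q_backward: proposal probabilities of the move and its reverse.
    """
    return min(1.0, (p_new * q_backward) / (p_old * q_forward))


# Accepted proposals from the paraphrase example (positions are 0-based).
sentence = 'what movie do you like most .'.split()
accepted = [
    ('replace', 0, 'which'),
    ('delete', 5, None),
    ('insert', 5, 'best'),
    ('replace', 4, 'think'),
    ('insert', 5, 'the'),
    ('insert', 5, 'is'),
]

trace = [sentence]
for op, pos, word in accepted:
    sentence = apply_edit(sentence, op, pos, word)
    trace.append(sentence)
    print(' '.join(sentence))
```

Each printed line corresponds to one accepted step in the example; in an actual sampler, every proposed edit would first be scored and then kept or discarded according to the acceptance probability.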

Requirements

  • python

    • ==2.7
  • python packages

  • word embedding

    • To use word embeddings for paraphrasing, first download or train a word embedding, place it at config.emb_path, and set config.emb_path='word_max'.

Language model download

Word embedding download

Running

  • Training language models

    • For each task, first train a forward and a backward language model:
      set mode='forward' and then mode='backward' in config.py,
      running python correction.py / paraphrase.py / key-gen.py after each change to train the corresponding model.
  • Generation

    • To generate new samples for a task:
      set mode='use' and choose suitable parameters in config.py.
      Put the inputs in 'input/input.txt', then run python correction.py / paraphrase.py / key-gen.py to generate.
      The outputs are in output.
  • Details

    • Make sure that the package and data paths are set correctly in 'config.py'.
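
The order of the steps above can be summarized in a small sketch. The helper `run_plan` and the names `SCRIPTS` and `MODES` are hypothetical and only restate this README. The sketch does not edit config.py or launch anything; it just lists which mode to set before each run.

```python
# Script names exactly as written in this README.
SCRIPTS = {
    'correction': 'correction.py',
    'paraphrase': 'paraphrase.py',
    'key-gen': 'key-gen.py',
}

# Train the forward model, then the backward model, then generate.
MODES = ['forward', 'backward', 'use']


def run_plan(task):
    """Return (config setting, command) pairs in the order they should be run."""
    script = SCRIPTS[task]
    return [("mode='%s'" % mode, 'python %s' % script) for mode in MODES]


for setting, command in run_plan('paraphrase'):
    print('set %s in config.py, then run: %s' % (setting, command))
```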