HyperGrex

A hypergraph-based syntactic translation grammar extractor for use with cdec and similar translation systems.

Supported functionality

- extract various kinds of tree transduction rules (tree-to-string, tree-to-tree, string-to-tree) from aligned parallel corpora with parses on one or both sides
- transduction rules can be minimal or composed, with limits on the size and complexity of the rules
- extract rules from parse forests or from (k-best) lists of parses
- score extracted rules with a variety of standard features
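A composed rule is built by substituting one rule into a frontier variable of another. The hypothetical `compose` function below (not part of HyperGrex) illustrates that substitution on the `source ||| target` notation used in the example output of this README. It assumes that a target-side index `[k]` refers to the k-th bracketed variable of the source fragment, read left to right. It deliberately ignores word alignments, features and the size limits a real extractor would apply.

```python
import re

# A frontier nonterminal on the source side, e.g. "[NN]".
SRC_VAR = re.compile(r"\[([^\]\s]+)\]")
# A target-side variable index, e.g. "[2]".
TGT_VAR = re.compile(r"^\[(\d+)\]$")


def root_label(tree: str) -> str:
    """Label of the outermost node: '(DT le)' -> 'DT'."""
    return tree[1:].split(None, 1)[0]


def compose(parent_src, parent_tgt, k, child_src, child_tgt):
    """Plug a child rule into the k-th (1-based) source variable of a parent.

    Rules are (tree fragment, space-separated target) pairs. Assumes target
    index [k] names the k-th source variable, counted left to right.
    """
    slot = list(SRC_VAR.finditer(parent_src))[k - 1]
    if slot.group(1) != root_label(child_src):
        raise ValueError(
            f"child root {root_label(child_src)} does not match [{slot.group(1)}]")
    # Source side: replace the variable "[X]" with the child's tree fragment.
    src = parent_src[:slot.start()] + child_src + parent_src[slot.end():]

    m = len(SRC_VAR.findall(child_src))  # variables contributed by the child
    tgt = []
    for tok in parent_tgt.split():
        match = TGT_VAR.match(tok)
        idx = int(match.group(1)) if match else None
        if idx == k:
            # The child's variables follow the parent's first k - 1 variables.
            for ctok in child_tgt.split():
                cm = TGT_VAR.match(ctok)
                tgt.append(f"[{int(cm.group(1)) + k - 1}]" if cm else ctok)
        elif idx is not None and idx > k:
            # Later parent variables shift by the child's net gain of m - 1.
            tgt.append(f"[{idx + m - 1}]")
        else:
            tgt.append(tok)
    return src, " ".join(tgt)
```

For example, plugging `(DT le) ||| the` into the first variable of `(NP [DT] [JJ] [NN]) ||| [1] [2] [3]` yields `(NP (DT le) [JJ] [NN]) ||| the [1] [2]`.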

Example tree-to-string extraction

python hg_rule_extractor.py test_data/test.fr test_data/test.en test_data/test.al --t2s -m -s 1000 > rules.t2s

The arguments are:

- test_data/test.fr is the source side of the bitext, parsed, one tree per line
- test_data/test.en is the target side of the bitext, one sentence per line (not parsed)
- test_data/test.al contains the word alignments between the two sides
- --t2s indicates that tree-to-string (xRs) rules should be extracted
- -m indicates that minimal (non-composed) rules should be extracted
- -s 1000 indicates that rules may have up to 1000 symbols (effectively, this disables any size-based filtering)

The above command writes the following rules to the file rules.t2s:

(PP [P] (NP (DT l') [NN])) ||| [1] [2] ||| 0-0 2-1 ||| count=1.0 sent_count=1
(VP (VB a) [VBN] [PP]) ||| [1] [2] ||| 1-0 2-1 ||| count=1.0 sent_count=1
(P à) ||| to ||| 0-0 ||| count=1.0 sent_count=1
(S [NP] [VP] [PUNC]) ||| [1] [2] [3] ||| 0-0 1-1 2-2 ||| count=1.0 sent_count=1
(PUNC .) ||| . ||| 0-0 ||| count=1.0 sent_count=1
(VBN marché) ||| walked ||| 0-0 ||| count=1.0 sent_count=1
(JJ petit) ||| young ||| 0-0 ||| count=1.0 sent_count=1
(DT le) ||| the ||| 0-0 ||| count=1.0 sent_count=1
(NP [DT] [JJ] [NN]) ||| [1] [2] [3] ||| 0-0 1-1 2-2 ||| count=1.0 sent_count=1
(NN école) ||| school ||| 0-0 ||| count=1.0 sent_count=1
(NN garçon) ||| boy ||| 0-0 ||| count=1.0 sent_count=1
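Each line has four `|||`-separated fields: the source tree fragment, the target string, the word alignment and the feature values. The sketch below uses hypothetical helper names (`parse_rule`, `frontier`), not HyperGrex's own API, to read a rule line and list the leaves of its source fragment. It assumes the alignment indices point at the fragment's leaves and at the target tokens, both counted from 0. That reading fits every rule above (for instance, `0-0 2-1` links `[P]` to `[1]` and `[NN]` to `[2]`), but it is an inference rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Rule:
    source: str                        # tree fragment, e.g. "(P à)"
    target: List[str]                  # target tokens, variables as "[1]"
    alignment: List[Tuple[int, int]]   # (source leaf index, target index)
    features: Dict[str, float]


def parse_rule(line: str) -> Rule:
    """Split a 'src ||| tgt ||| align ||| feats' line into its fields."""
    source, target, align, feats = (f.strip() for f in line.split("|||"))
    alignment = []
    for link in align.split():
        s, t = link.split("-")
        alignment.append((int(s), int(t)))
    features = {}
    for item in feats.split():
        name, value = item.split("=", 1)
        features[name] = float(value)
    return Rule(source, target.split(), alignment, features)


def frontier(tree: str) -> List[str]:
    """Leaves of a bracketed fragment: words and [X] variables, left to right."""
    tokens = tree.replace("(", " ( ").replace(")", " ) ").split()
    leaves = []
    for prev, tok in zip([None] + tokens, tokens):
        # A token right after "(" is a node label, not a leaf.
        if tok not in ("(", ")") and prev != "(":
            leaves.append(tok)
    return leaves
```

With the first rule above, `frontier(rule.source)` gives `['[P]', "l'", '[NN]']`, and pairing it with `rule.alignment` links `[P]` to `[1]` and `[NN]` to `[2]`.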

Adding features to rules

./t2s_score/score.sh rules.t2s test_data/sgt-params.txt test_data/tgs-params.txt

The second and third arguments (test_data/sgt-params.txt and test_data/tgs-params.txt) are files of lexical translation probabilities.
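This README does not say which features score.sh computes. One standard feature built from lexical translation probabilities is the lexical weighting of Koehn, Och and Marcu (2003): each target word contributes the average probability of the source words it is aligned to (or of NULL if it is unaligned), and the rule score is the product over target words. The sketch below is a generic illustration, not the script's implementation. The probability table is modeled as a hypothetical dict keyed by (target word, source word), because the format of the parameter files is not described here, and alignment pairs are taken as (source index, target index).

```python
import math
from typing import Dict, List, Tuple

NULL = "NULL"  # pseudo source word for unaligned target words


def lexical_weight(src: List[str], tgt: List[str],
                   alignment: List[Tuple[int, int]],
                   table: Dict[Tuple[str, str], float],
                   floor: float = 1e-7) -> float:
    """Log lexical weight of tgt given src under the given word alignment."""
    logp = 0.0
    for i, e in enumerate(tgt):
        if e.startswith("[") and e.endswith("]"):
            continue  # nonterminal variable such as "[1]", not a word
        linked = [src[j] for (j, k) in alignment if k == i] or [NULL]
        avg = sum(table.get((e, f), 0.0) for f in linked) / len(linked)
        logp += math.log(max(avg, floor))  # floor avoids log(0)
    return logp
```

For the rule `(P à) ||| to ||| 0-0`, this reduces to log t(to | à). Working in log space keeps the product of many small probabilities numerically stable.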

For further information

For information on tree-to-string (xRs) translation rules, see
For more information on the supported tree-to-tree formalism, see
- this paper

This software is a rewrite of the Grex grammar extractor.

cmu-mtlab/hypergrex

Folders and files

Latest commit

History

Repository files navigation

HyperGrex

Supported functionality

Example tree-to-string extraction

Adding features to rules

For further information

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages