Switching Generation from LKB to ACE #662

rosypen · 2022-08-12T21:28:40Z

TL;DR: These changes officially switch test by generation to use ACE instead of the LKB.

There are 3 commits included in this change. The titles are fairly self-explanatory, and the comments on the original commits go into further detail about what was changed.

Pending Issues

Parse trees are still not included at this time. They will be added in the next commit.
Cosmetically, the sentences don't look 'clickable', so it would be nice to fix that.
I also want to add a way to specify the maximum number of sentences generated.
A major flaw that I found while testing with other grammars is that the verb and noun selected for generation are arbitrarily the first verb and noun in the choices file. This may result in no sentences being produced if they are not compatible with each other. It may help to be able to specify the verbs and nouns used in the test by generation options.
Lastly, test by generation using test sentences is also coming soon, following the addition of the parse trees (:
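
On the pending item about capping the number of generated sentences, here is a minimal sketch of one possible approach. It assumes ACE's command-line flags `-g` (compiled grammar), `-e` (generation mode), and `-n` (limit on the number of results); the helper name `ace_generation_args` and its parameters are hypothetical and do not come from this pull request.

```python
def ace_generation_args(grammar_path, max_results=None, ace_bin="ace"):
    """Build an argument list for running ACE in generation mode.

    Hypothetical helper: the flags used here (-g, -e, -n) are assumed
    from ACE's command-line options, not taken from this change.
    """
    args = [ace_bin, "-g", grammar_path, "-e"]
    if max_results is not None:
        if max_results < 1:
            raise ValueError("max_results must be a positive integer")
        # Ask ACE to stop after this many realizations.
        args += ["-n", str(max_results)]
    return args


print(ace_generation_args("grm.dat", max_results=5))
```

Building the command as a list keeps the option easy to pass to `subprocess.run()` without any string splitting.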

goodmami

Hi, this is great to see, but it's maybe unfortunate that you've already put in so much effort in understanding ACE's stdout protocols given that PyDelphin does this already. May I suggest using PyDelphin for the system calls to ACE?

Compiling (ace.compile()):

>>> from delphin import ace
>>> ace.compile('path/to/config.tdl', 'grm.dat')

Generating (ace.generate(); example using the ERG):

>>> grm = '../erg-2018.dat'
>>> mrs = '[ TOP: h0 INDEX: e2 [ e TENSE: pres ] RELS: < [ _rain_v_1 LBL: h1 ARG0: e2 ] > HCONS: < h0 qeq h1 > ]'
>>> response = ace.generate(grm, mrs)
>>> [result['surface'] for result in response.results()]
['It rains.']

The PyDelphin documentation has a guide and API docs to explain further.

Also, PyDelphin is already a dependency of the Grammar Matrix, so it should be already installed in your testing environment. Let me know if you have any questions.

I also have some specific comments in the code, in case you don't use PyDelphin, but I only looked at the part where it calls out to ACE and didn't review the rest.

goodmami · 2022-08-13T01:24:58Z

gmcs/generate.py

+    subprocess.run(compile_grammar_cmd.split(),
+                    stdout=output, stderr=ace_error, env=os.environ)


Using str.split() on the compile_grammar_cmd can fail if there are ever spaces in the grammar_dir directory name. I suggest doing one of the following:

1. Put quotes around your arguments when constructing the command string, then use shlex.split().

2. Just make a list instead of building and then splitting a string:

    compile_cmd_args = [
        '/usr/local/bin/ace',
        '-G', f'{grammar_dir}/{iso}.dat',
        '-g', f'{grammar_dir}/ace/config.tdl',
    ]

3. Use PyDelphin's ace.compile().

I think (3) is the best option because it would simplify this code a lot.
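
To illustrate the problem and the first two options, here is a small sketch using only the standard library. The directory name and the `iso` value are made up for the example, and nothing is executed; the snippet only shows how each approach tokenizes the command.

```python
import shlex

# Hypothetical values: a grammar directory whose name contains a space.
grammar_dir = "/tmp/my grammars/abc"
iso = "abc"

# Naive approach: str.split() breaks each path at the space.
cmd = f"ace -G {grammar_dir}/{iso}.dat -g {grammar_dir}/ace/config.tdl"
naive = cmd.split()

# Quote the arguments when building the string, then use shlex.split().
quoted_cmd = (
    f"ace -G {shlex.quote(f'{grammar_dir}/{iso}.dat')} "
    f"-g {shlex.quote(f'{grammar_dir}/ace/config.tdl')}"
)
safe = shlex.split(quoted_cmd)

# Or build the argument list directly, with no splitting at all.
args = [
    "ace",
    "-G", f"{grammar_dir}/{iso}.dat",
    "-g", f"{grammar_dir}/ace/config.tdl",
]

print(len(naive), len(safe), len(args))
print(safe == args)
```

Either `safe` or `args` can be passed as the first argument to `subprocess.run()`; the list form avoids quoting concerns entirely.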

rosypen

Hi Michael!

Thanks so much for your help and advice! I've made a new commit that now uses PyDelphin to call ACE for both compiling and generating!

The commit is here - rosypen@be5b865

The good news is it did clean up the code considerably and I got the parse trees, yay!

The bad news is I no longer have nicely formatted MRS because I don't think the tsdb output format does that for me. Is there a library I could use in PyDelphin for formatting the MRS?

goodmami

> Is there a library I could use in PyDelphin for formatting the mrs?

Yes, there are options. See the list of codecs. If you just want SimpleMRS with indentation, that's easy:

>>> from delphin.codecs import simplemrs
>>> m = simplemrs.decode('[ TOP: h0 INDEX: e2 [ e TENSE: pres ] RELS: < [ _rain_v_1 LBL: h1 ARG0: e2 ] > HCONS: < h0 qeq h1 > ]')
>>> print(simplemrs.encode(m, indent=True))
[ TOP: h0
  INDEX: e2 [ e TENSE: pres ]
  RELS: < [ _rain_v_1 LBL: h1 ARG0: e2 ] >
  HCONS: < h0 qeq h1 > ]

If you want to display DMRS or EDS instead, you'll first need to convert the MRS.
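
As a rough sketch of that conversion step, assuming PyDelphin 1.x with its `delphin.dmrs.from_mrs()` and `delphin.eds.from_mrs()` functions and the `simpledmrs` and `eds` codecs: the function name `render_semantics` is made up for this example, and the imports are kept inside the function so the snippet can be loaded even where PyDelphin is not installed.

```python
def render_semantics(mrs_string):
    """Return SimpleMRS, DMRS, and EDS renderings of one MRS string.

    Hypothetical helper; requires PyDelphin (assumed 1.x API).
    """
    from delphin import dmrs, eds
    from delphin.codecs import simplemrs, simpledmrs
    from delphin.codecs import eds as eds_codec  # avoid clashing with delphin.eds

    m = simplemrs.decode(mrs_string)
    d = dmrs.from_mrs(m)  # convert MRS -> DMRS
    e = eds.from_mrs(m)   # convert MRS -> EDS
    return {
        "mrs": simplemrs.encode(m, indent=True),
        "dmrs": simpledmrs.encode(d, indent=True),
        "eds": eds_codec.encode(e, indent=True),
    }


if __name__ == "__main__":
    example = ('[ TOP: h0 INDEX: e2 [ e TENSE: pres ] '
               'RELS: < [ _rain_v_1 LBL: h1 ARG0: e2 ] > '
               'HCONS: < h0 qeq h1 > ]')
    for name, text in render_semantics(example).items():
        print(f"--- {name} ---")
        print(text)
```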

And if you want a graphical web display, you can try delphin-viz.

rosypen

Thank you again!

DMRS and EDS are pending TODOs that I may get to in the future. For now, this has helped me get the SimpleMRS formatting!

rosypen added 3 commits August 11, 2022 18:29

switched test by gen to ace

5697d13

Improves preprocessing steps for generation.

f8978ec

Removes Capitalized lexical entries sentence.

3515435

emilymbender merged commit 32aca3d into delph-in:trunk Aug 12, 2022

goodmami reviewed Aug 13, 2022
