-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switching Generation from LKB to ACE #662
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, this is great to see, but it's maybe unfortunate that you've already put in so much effort in understanding ACE's stdout protocols given that PyDelphin does this already. May I suggest using PyDelphin for the system calls to ACE?
Compiling (ace.compile()):
>>> from delphin import ace
>>> ace.compile('path/to/config.tdl', 'grm.dat')
Generating (ace.generate(); example using the ERG):
>>> grm = '../erg-2018.dat'
>>> mrs = '[ TOP: h0 INDEX: e2 [ e TENSE: pres ] RELS: < [ _rain_v_1 LBL: h1 ARG0: e2 ] > HCONS: < h0 qeq h1 > ]'
>>> response = ace.generate(grm, mrs)
>>> [result['surface'] for result in response.results()]
['It rains.']
The PyDelphin documentation has a guide and API docs to explain further.
Also, PyDelphin is already a dependency of the Grammar Matrix, so it should be already installed in your testing environment. Let me know if you have any questions.
I also have some specific comments in the code, in case you don't use PyDelphin, but I only looked at the part where it calls out to ACE and didn't review the rest.
subprocess.run(compile_grammar_cmd.split(), | ||
stdout=output, stderr=ace_error, env=os.environ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using str.split()
on the compile_grammar_cmd
can fail if there are ever spaces in the grammar_dir
directory name. I suggest doing one of the following:
- Put quotes around your arguments when constructing the command string, then use shlex.split()
- Just make a list instead of making then splitting a string:
compile_cmd_args = [ '/usr/local/bin/ace', '-G', f'{grammar_dir}/{iso}.dat', '-g', f'{grammar_dir}/ace/config.tdl', ]
- Use PyDelphin's
ace.compile()
I think (3) is the best option because it would simplify this code a lot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Michael!
Thanks so much for your help and advice! I've made a new commit that now uses PyDelphin to call ace for both compiling and generating!
The commit is here - rosypen@be5b865
The good news is it did clean up the code considerably and I got the parse trees, yay!
The bad news is I no longer have nicely formatted mrs because I don't think the tsdb output format does that for me. Is there a library I could use in PyDelphin for formatting the mrs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a library I could use in PyDelphin for formatting the mrs?
Yes, there are options. See the list of codecs. If you just want SimpleMRS with indentation, that's easy:
>>> from delphin.codecs import simplemrs
>>> m = simplemrs.decode('[ TOP: h0 INDEX: e2 [ e TENSE: pres ] RELS: < [ _rain_v_1 LBL: h1 ARG0: e2 ] > HCONS: < h0 qeq h1 > ]')
>>> print(simplemrs.encode(m, indent=True))
[ TOP: h0
INDEX: e2 [ e TENSE: pres ]
RELS: < [ _rain_v_1 LBL: h1 ARG0: e2 ] >
HCONS: < h0 qeq h1 > ]
If you want to display DMRS or EDS instead, you'll first need to convert the MRS.
And if you want a graphical web display, you can try delphin-viz.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TLDR; These changes officially switch test by generation to use ACE instead of LKB
There are 3 commits included in this change and the titles are fairly self explanatory but there are also comments on the original commits that go into further details about what was changed.
Pending Issues