Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switching Generation from LKB to ACE #662

Merged
merged 3 commits into from
Aug 12, 2022

Conversation

rosypen
Copy link
Contributor

@rosypen rosypen commented Aug 12, 2022

TLDR; These changes officially switch test by generation to use ACE instead of LKB

Screen Shot 2022-08-12 at 2 18 53 PM

There are 3 commits included in this change and the titles are fairly self explanatory but there are also comments on the original commits that go into further details about what was changed.

Pending Issues

  • Parse trees are still not included at this time. They will be added in the next commit
  • Cosmetically, the sentences don't look 'clickable' so it would be nice to add that
  • Also want to add a way to specify max sentences generated.
  • A major flaw that i found while testing with other grammars is the verbs and noun selected for generation is arbitrarily the first verb/noun in the choices files. This may result in no sentences being produced if they are not compatible with each other. It may help to be able to specify the verb/nouns used in test by generation options.
  • Lastly, test by generation using test sentences is also coming soon. Following adding the parse trees (:

@emilymbender emilymbender merged commit 32aca3d into delph-in:trunk Aug 12, 2022
Copy link
Member

@goodmami goodmami left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, this is great to see, but it's maybe unfortunate that you've already put in so much effort in understanding ACE's stdout protocols given that PyDelphin does this already. May I suggest using PyDelphin for the system calls to ACE?

Compiling (ace.compile()):

>>> from delphin import ace
>>> ace.compile('path/to/config.tdl', 'grm.dat')

Generating (ace.generate(); example using the ERG):

>>> grm = '../erg-2018.dat'
>>> mrs = '[ TOP: h0 INDEX: e2 [ e TENSE: pres ] RELS: < [ _rain_v_1 LBL: h1 ARG0: e2 ] > HCONS: < h0 qeq h1 > ]'
>>> response = ace.generate(grm, mrs)
>>> [result['surface'] for result in response.results()]
['It rains.']

The PyDelphin documentation has a guide and API docs to explain further.

Also, PyDelphin is already a dependency of the Grammar Matrix, so it should be already installed in your testing environment. Let me know if you have any questions.

I also have some specific comments in the code, in case you don't use PyDelphin, but I only looked at the part where it calls out to ACE and didn't review the rest.

Comment on lines +174 to +175
subprocess.run(compile_grammar_cmd.split(),
stdout=output, stderr=ace_error, env=os.environ)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using str.split() on the compile_grammar_cmd can fail if there are ever spaces in the grammar_dir directory name. I suggest doing one of the following:

  1. Put quotes around your arguments when constructing the command string, then use shlex.split()
  2. Just make a list instead of making then splitting a string:
    compile_cmd_args = [
        '/usr/local/bin/ace',
        '-G', f'{grammar_dir}/{iso}.dat',
        '-g', f'{grammar_dir}/ace/config.tdl',
    ]
  3. Use PyDelphin's ace.compile()

I think (3) is the best option because it would simplify this code a lot.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Michael!

Thanks so much for your help and advice! I've made a new commit that now uses PyDelphin to call ace for both compiling and generating!

The commit is here - rosypen@be5b865

The good news is it did clean up the code considerably and I got the parse trees, yay!

The bad news is I no longer have nicely formatted mrs because I don't think the tsdb output format does that for me. Is there a library I could use in PyDelphin for formatting the mrs?

Screen Shot 2022-08-13 at 1 55 20 AM

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a library I could use in PyDelphin for formatting the mrs?

Yes, there are options. See the list of codecs. If you just want SimpleMRS with indentation, that's easy:

>>> from delphin.codecs import simplemrs
>>> m = simplemrs.decode('[ TOP: h0 INDEX: e2 [ e TENSE: pres ] RELS: < [ _rain_v_1 LBL: h1 ARG0: e2 ] > HCONS: < h0 qeq h1 > ]')
>>> print(simplemrs.encode(m, indent=True))
[ TOP: h0
  INDEX: e2 [ e TENSE: pres ]
  RELS: < [ _rain_v_1 LBL: h1 ARG0: e2 ] >
  HCONS: < h0 qeq h1 > ]

If you want to display DMRS or EDS instead, you'll first need to convert the MRS.

And if you want a graphical web display, you can try delphin-viz.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you again!

DMRS and EDS are pending TODOs that I may get to in the future. For now this has helped me get the simple mrs formatting!

Screen Shot 2022-08-13 at 3 06 36 PM

gmcs/generate.py Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants