
GaliciaLLM

John Carroll edited this page Jul 2, 2023 · 1 revision

Large language models and DELPH-IN technologies

Presenter: Alexandre
Scribe: John

Alexandre: gives an introductory presentation.

  • Recommends Stanford Webinar - GPT-3 & Beyond by Chris Potts
  • Some questions: can LLMs be used to extract intermediate representations (e.g. syntactic analyses, semantic representations, word senses)? Can they be used to improve Delph-in tools, e.g. to rank parses, label text with supertags, etc.?

Guy: Getting LLMs to parse - for example to UD - does not seem practical, since the LLM training data would contain few if any UD representations. A more suitable task would be pruning a parser's search space via supertagging.
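
The pruning idea Guy describes can be sketched as follows. This is a minimal toy, not a Delph-in implementation: it assumes a supertagger has already produced a probability distribution over supertags for each token, and simply drops lexical entries whose supertag falls below a threshold before parsing begins.

```python
def prune_lexicon(tokens, tag_probs, lexicon, threshold=0.01):
    """For each token, keep only lexical entries whose supertag
    receives probability >= threshold from the tagger, shrinking
    the parser's search space. Falls back to the full set of
    entries rather than leaving a token with no analysis."""
    pruned = []
    for tok in tokens:
        probs = tag_probs[tok]                    # supertag -> probability
        keep = {t for t, p in probs.items() if p >= threshold}
        entries = [e for e in lexicon.get(tok, []) if e in keep]
        pruned.append(entries or lexicon.get(tok, []))
    return pruned
```

The threshold trades speed against coverage: a higher cutoff prunes more aggressively but risks discarding the entry the correct analysis needs.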

Emily: LLMs contain a lot of knowledge about word co-occurrence, which can be used to create word embeddings e.g. for parse selection. Even better would be embeddings computed from DMRS, but there is only a comparatively tiny amount of sembanked data.

Alexandre: ...information from WordNet for parse selection [didn't catch this]

Guy: The BLOOM transformer-based LLM could be fine-tuned on an MRS bank to obtain a more customised LM.

Hei: There are neural net methods for text generation.

Emily: co-authored a NAACL 2019 paper with Jan Buys and others 'Neural Text Generation from Rich Semantic Representations', mapping DMRS to text.

Eric: A couple of ways to generate complex sentences: (1) generate a sentence which might be ungrammatical, improve it with an LLM, then parse with the ERG to tame any hallucinations; (2) use common-sense knowledge from an LLM in a low-stakes decision scenario to fill in gaps in a KB, e.g. "are there normally people in a building?".
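
Eric's first pipeline can be sketched in a few lines. Everything here is an injected stand-in: `smooth` represents the LLM rewrite step and `parse` represents a precision grammar such as the ERG (e.g. via ACE); a candidate survives only if the grammar licenses the smoothed string.

```python
def generate_and_tame(rough_sentences, smooth, parse):
    """Pipeline sketch: rough generation -> LLM smoothing -> grammar
    filter. `parse` returns a (possibly empty) list of analyses; an
    empty list means the grammar rejects the sentence, so any
    hallucinated or ungrammatical output is discarded."""
    results = []
    for rough in rough_sentences:
        candidate = smooth(rough)
        analyses = parse(candidate)
        if analyses:  # grammar accepts it
            results.append((candidate, analyses[0]))
    return results
```

The grammar acts as a hard gate here, which matches the "precision, not recall" stance discussed later: a rejected sentence is simply dropped rather than emitted with a wrong analysis.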

Alexandre: We should be wary of relying on LLMs to provide common-sense knowledge because they are only trained on text, and have no world knowledge or logical inference capability.

Guy: Agreed for textual entailment - because it is not really low-stakes. However, there is less risk in Eric's game scenario; moreover, in a game a degree of variability in LLM output might be beneficial.

Alexandre: Would it be possible to avoid the LLM hallucination issue by combining them with KBs? Parse ranking is a suitable application: a grammar gives possibilities and then an LLM is used to rank mostly valid analyses.
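
The ranking scheme Alexandre proposes (grammar proposes, LLM disposes) can be sketched with an injected scoring function. `logprob` here is a hypothetical stand-in for an LLM that returns the total log-probability of a linearised analysis; length normalisation keeps short strings from being unfairly favoured.

```python
def rank_analyses(analyses, logprob):
    """Rank grammar-licensed analyses by a language-model score,
    normalised by token count. The grammar guarantees validity;
    the LM only orders the candidates, so hallucination is not an
    issue."""
    def score(a):
        return logprob(a) / max(len(a.split()), 1)
    return sorted(analyses, key=score, reverse=True)
```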

Hei: Weiwei Sun has investigated neural parsing with DMRS output via embedding of predicates, and has achieved good accuracy. Another approach: generation from EDS then a final pass with an LM.

Guy: Linguistic and engineering perspectives often lead to distinct uses for MRS. In the Delph-in community we accommodate both perspectives, with common tools and data.

Emily: A practical approach constrains linguistic thinking, and a linguistic perspective gives practical benefits. Most of our use cases are ones where precision is essential. We build 'precision' rather than 'recall' grammars, on the assumption that it's better to give no answer than a wrong one.

Carlos: Explainability can be important, for example in sentiment analysis. Having a grammar-based analysis makes this possible.

Eric: Precision is important in his work on NL interfaces to software. He needs both explainability and debuggability.

Alexandre: Adaptability is also important. AMR gives structure but with less deep information. If a predicate is missing, then in a grammar-based approach one knows how to make the changes to add it.

Emily: In the Delph-in framework, if something isn't working then it's much easier to fix than in other frameworks. Her paper at IWCS 2014 co-authored with Dan and others argues that one gets a more consistent sembank by creating it with a grammar than by attempting to follow annotation guidelines.

Alexandre: How can we attract people to take advantage of our tools? Via Hugging Face and similar? Or an online list of interesting projects?

Carlos: There is a high entry barrier to Delph-in technologies. UD parsing is much easier to get into - just read the published standards and install an off-the-shelf parser. Would it be possible to lower the barrier?

Guy: Stephan Oepen's semantic dependencies shared tasks gave a way into Delph-in.

Hei: found his way in through this.

Eric: There are two populations of potential new users: newcomers completely from the outside (need to entice them in), vs. someone who is already studying in a Delph-in research group (so already has a connection).

Alexandre: teaches an Introduction to NLP course to mathematics students, which can lead on to interesting student projects. We want these projects also to be useful for the Delph-in community. They provide a way of getting proof of concept experiments done.

Olga: For the newcomer population, the online demo is good for impressing people. They often don't quite understand what's going on, but it looks consistent and principled. On the other hand the interface looks a bit old and less shiny than ChatGPT.

Eric: was impressed with the online demo.

Emily: The demo would be more friendly with category labels in trees rather than rule names. The default should be tree + DMRS. It needs a more recent release of the ERG - 2020?

Eric: A Google documentation grant would provide good on-ramps for two kinds of new users: system developers and people who want to build grammars. A spec for documentation exists; need to secure funding (around $15K) and then engage a writer.

Alexandre: would like to add PorGram to the online demo. Need to prominently link from the wiki site to the demo http://delph-in.github.io/delphin-viz/

Olga: How possible is it to fine-tune LLMs to help grammarians? Could we use a code completion tool such as Copilot to auto-complete TDL by fine-tuning on grammar files?
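
As a point of comparison for Olga's Copilot idea, even a trivial statistical baseline can complete TDL: a bigram model over whitespace tokens of existing grammar files. This is a toy sketch, not a substitute for a fine-tuned code model, but it illustrates what the training signal looks like.

```python
from collections import Counter, defaultdict

def train_completer(lines):
    """Count, over TDL source lines, which token follows which
    (a bigram model over whitespace-separated tokens)."""
    model = defaultdict(Counter)
    for line in lines:
        toks = line.split()
        for prev, nxt in zip(toks, toks[1:]):
            model[prev][nxt] += 1
    return model

def complete(model, prefix_token):
    """Suggest the most frequent continuation seen after prefix_token,
    or None if the token was never observed."""
    if prefix_token not in model:
        return None
    return model[prefix_token].most_common(1)[0][0]
```

A fine-tuned LLM would replace the bigram table with learned context-sensitive completions, but the train/suggest interface would look much the same.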

Emily: Grammar developers often spend less time writing TDL than Python. Could we use LLMs to speed up treebanking by getting them to rank discriminants?

Luis: Yes, we could use LLMs to get constituent boundaries, e.g. by "Give me a constituency parse for this sentence: ..."

Guy: There is also recent research into probing embeddings for constituent structure.

Luis: Could use RoBERTa, and ask it whether a particular word span is a constituent or not. Ideally, we'd get semantic discriminants from LLMs, but syntactic seems feasible now. Another idea: robust parsing in Delph-in could use a neural parser instead of the PCFG used to date.
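
Luis's idea can be framed as a harness that enumerates word spans and queries a yes/no classifier. The classifier is injected here as `is_constituent` (in practice it might be a fine-tuned RoBERTa span classifier - an assumption, not an existing Delph-in component); the output is a list of candidate syntactic discriminants for treebanking.

```python
def constituent_discriminants(tokens, is_constituent):
    """Enumerate all multi-word spans of a sentence and label each
    with the classifier's constituent/non-constituent decision.
    Returns (span_indices, span_text, label) triples."""
    spans = []
    n = len(tokens)
    for i in range(n):
        for j in range(i + 2, n + 1):  # spans of length >= 2
            label = is_constituent(tokens, i, j)
            spans.append(((i, j), " ".join(tokens[i:j]), label))
    return spans
```

The labelled spans could then be matched against the discriminants the treebanking tool presents, pre-selecting the ones the classifier is confident about.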

Alexandre: Regarding semantic discriminants, for the sentence "most house cats are eager for dogs to chase", ChatGPT gives a poor explanation of the meaning, containing conflicting paraphrases. Copilot suggests "to please" as a completion after "most cats are easy".

Olga: Another possible task is generating regression test sentences.

Alexandre: Many math students are easy to teach to program, but their code needs to be cleaned up afterwards. Some are particularly interested in algorithms. A possible project would be to scale up maximum entropy training. Nowadays most people want to download a dataset, apply an ML toolkit and graph results. Delph-in technologies are much more complex and multi-layered.
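
For reference, the core of maximum entropy parse selection is small: a conditional log-linear model trained by gradient ascent on the log-likelihood of the gold parse among the grammar's candidates. This is a minimal single-machine sketch (plain dicts, no regularisation); the scaling work Alexandre suggests would be in sharding the data and vectorising this loop.

```python
import math

def maxent_epoch(data, w, lr=0.1):
    """One epoch of stochastic gradient ascent for a log-linear
    parse-selection model. `data` is a list of
    (candidate_feature_vectors, gold_index) pairs, where each
    candidate is a dict {feature: count}; `w` is the weight dict."""
    for candidates, gold in data:
        scores = [sum(w.get(f, 0.0) * v for f, v in c.items())
                  for c in candidates]
        z = max(scores)  # subtract max for numerical stability
        exps = [math.exp(s - z) for s in scores]
        total = sum(exps)
        probs = [e / total for e in exps]
        # gradient = observed features (gold) - expected features (model)
        for f, v in candidates[gold].items():
            w[f] = w.get(f, 0.0) + lr * v
        for c, p in zip(candidates, probs):
            for f, v in c.items():
                w[f] = w.get(f, 0.0) - lr * p * v
    return w
```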

Olga: Some stuff that's fundamental to what we're doing is no longer in NLP textbooks, e.g. unification grammar.

Alexandre: Also, some of the more recent Delph-in technology is not even published.

Emily: There might be a new edition of the HPSG Handbook. The book contains grammar fragments. It might be possible to add a chapter on implementing HPSG processors.

Alexandre: The Delph-in wiki contains a lot of information, but it's difficult to navigate. An interesting related initiative is the Open Logic Project and Free Logic Textbooks.

Francis: Is there someone who can take over maintenance of http://delph-in.github.io/delphin-viz/ ?

Luis: To maintain demos it's a problem getting access to servers. It would be great to get access to VU Amsterdam or University of Washington servers.

Alexandre: If it's possible to put a demo in a Docker container then it could easily be run in the IBM cloud.
