Bobcat parser crashes on English contractions #60

pmarcis · 2023-02-09T15:04:01Z

Hi!

When passing tokenised data containing English contractions, the parser crashes. Passing non-tokenised data seems wrong as the parser does not perform tokenisation internally (all punctuation gets attached to words, contractions are attached to the verb).

E.g.:

from lambeq import BobcatParser
bobcat_parser = BobcatParser()
diagram = bobcat_parser.sentence2diagram("Baby didn 't like it")
diagram.draw()

results in:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
File .../site-packages/lambeq/text2diagram/bobcat_parser.py:382, in BobcatParser.sentences2trees(self, sentences, tokenised, suppress_exceptions, verbose)
    381     result = self.parser(sentence_input)
--> 382     trees.append(self._build_ccgtree(result[0]))
    383 except Exception:
File .../site-packages/lambeq/bobcat/parser.py:258, in ParseResult.__getitem__(self, index)
    256 def __getitem__(self, index: Union[int, slice]) -> Union[ParseTree,
    257                                                          list[ParseTree]]:
--> 258     return self.root[index]

IndexError: list index out of range

During handling of the above exception, another exception occurred:

BobcatParseError                          Traceback (most recent call last)
Cell In[2], line 1
----> 1 diagram = bobcat_parser.sentence2diagram("Baby didn 't like it")
      2 diagram.draw()

File .../site-packages/lambeq/text2diagram/ccg_parser.py:231, in CCGParser.sentence2diagram(self, sentence, tokenised, planar, suppress_exceptions)
    228 if not isinstance(sentence, str):
    229     raise ValueError('`tokenised` set to `False`, but variable '
    230                      '`sentence` does not have type `str`.')
--> 231 return self.sentences2diagrams(
    232                 [sentence],
    233                 planar=planar,
    234                 suppress_exceptions=suppress_exceptions,
    235                 tokenised=tokenised,
    236                 verbose=VerbosityLevel.SUPPRESS.value)[0]

File .../site-packages/lambeq/text2diagram/ccg_parser.py:161, in CCGParser.sentences2diagrams(self, sentences, tokenised, planar, suppress_exceptions, verbose)
    125 def sentences2diagrams(
    126         self,
    127         sentences: SentenceBatchType,
   (...)
    130         suppress_exceptions: bool = False,
    131         verbose: Optional[str] = None) -> list[Optional[Diagram]]:
    132     """Parse multiple sentences into a list of discopy diagrams.
    133 
    134     Parameters
   (...)
    159 
    160     """
--> 161     trees = self.sentences2trees(sentences,
    162                                  suppress_exceptions=suppress_exceptions,
    163                                  tokenised=tokenised,
    164                                  verbose=verbose)
    165     diagrams = []
    166     if verbose is None:

File .../site-packages/lambeq/text2diagram/bobcat_parser.py:387, in BobcatParser.sentences2trees(self, sentences, tokenised, suppress_exceptions, verbose)
    385                 trees.append(None)
    386             else:
--> 387                 raise BobcatParseError(' '.join(sent.words))
    389 for i in empty_indices:
    390     trees.insert(i, None)

BobcatParseError: Bobcat failed to parse "Baby didn 't like it".

dimkart · 2023-02-09T15:25:59Z

Hi, you can use lambeq's SpacyTokeniser class to tokenise your sentences before feeding them to the parser. From the command line interface, you can just use the -t option. If you want to provide the sentence already tokenised, be sure to separate the words correctly, i.e. "did" and "n't" (so "Baby did n't like it"), otherwise the model will not recognise "didn" as a proper word.
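
A minimal sketch of both routes, assuming lambeq and its spaCy dependency are installed. `SpacyTokeniser.tokenise_sentence` and the `tokenised=True` argument are lambeq's own; `fix_negative_contractions` and `parse_with_lambeq` are hypothetical names used only for illustration, and the helper covers only the "didn 't" style split from the report above:

```python
def fix_negative_contractions(tokens):
    """Hypothetical helper, not part of lambeq: ["didn", "'t"] -> ["did", "n't"]."""
    fixed = []
    for tok in tokens:
        prev = fixed[-1] if fixed else ""
        if tok == "'t" and len(prev) > 1 and prev.lower().endswith("n"):
            fixed[-1] = prev[:-1]   # "didn" -> "did"
            fixed.append("n't")     # the negation becomes its own token
        else:
            fixed.append(tok)
    return fixed


def parse_with_lambeq(sentence):
    """Tokenise a raw string with SpacyTokeniser, then parse the token list."""
    # Imported lazily so the helper above can be used without loading the parser model.
    from lambeq import BobcatParser, SpacyTokeniser

    tokens = SpacyTokeniser().tokenise_sentence(sentence)
    # tokenised=True tells the parser that the input is a list of tokens, not a string.
    return BobcatParser().sentence2diagram(tokens, tokenised=True)


tokens = fix_negative_contractions("Baby didn 't like it".split())
print(tokens)  # ['Baby', 'did', "n't", 'like', 'it']
```

With lambeq available, `parse_with_lambeq("Baby didn't like it").draw()` would take the raw, untokenised sentence, while the repaired token list can be passed straight to `sentence2diagram(tokens, tokenised=True)`.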

Hope that helps.

pmarcis · 2023-02-10T13:09:13Z

Thanks! That solves this problem!

pmarcis closed this as completed Feb 10, 2023
