When passing tokenised data containing English contractions, the parser crashes. Passing non-tokenised data seems wrong as the parser does not perform tokenisation internally (all punctuation gets attached to words, contractions are attached to the verb).
E.g.:
from lambeq import BobcatParser
bobcat_parser = BobcatParser()
diagram = bobcat_parser.sentence2diagram("Baby didn 't like it")
diagram.draw()
results in:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
File .../site-packages/lambeq/text2diagram/bobcat_parser.py:382, in BobcatParser.sentences2trees(self, sentences, tokenised, suppress_exceptions, verbose)
381 result = self.parser(sentence_input)
--> 382 trees.append(self._build_ccgtree(result[0]))
383 except Exception:
File .../site-packages/lambeq/bobcat/parser.py:258, in ParseResult.__getitem__(self, index)
256 def __getitem__(self, index: Union[int, slice]) -> Union[ParseTree,
257 list[ParseTree]]:
--> 258 return self.root[index]
IndexError: list index out of range
During handling of the above exception, another exception occurred:
BobcatParseError Traceback (most recent call last)
Cell In[2], line 1
----> 1 diagram = bobcat_parser.sentence2diagram("Baby didn 't like it")
2 diagram.draw()
File .../site-packages/lambeq/text2diagram/ccg_parser.py:231, in CCGParser.sentence2diagram(self, sentence, tokenised, planar, suppress_exceptions)
228 if not isinstance(sentence, str):
229 raise ValueError('`tokenised` set to `False`, but variable '
230 '`sentence` does not have type `str`.')
--> 231 return self.sentences2diagrams(
232 [sentence],
233 planar=planar,
234 suppress_exceptions=suppress_exceptions,
235 tokenised=tokenised,
236 verbose=VerbosityLevel.SUPPRESS.value)[0]
File .../site-packages/lambeq/text2diagram/ccg_parser.py:161, in CCGParser.sentences2diagrams(self, sentences, tokenised, planar, suppress_exceptions, verbose)
125 def sentences2diagrams(
126 self,
127 sentences: SentenceBatchType,
(...)
130 suppress_exceptions: bool = False,
131 verbose: Optional[str] = None) -> list[Optional[Diagram]]:
132 """Parse multiple sentences into a list of discopy diagrams.
133
134 Parameters
(...)
159
160 """
--> 161 trees = self.sentences2trees(sentences,
162 suppress_exceptions=suppress_exceptions,
163 tokenised=tokenised,
164 verbose=verbose)
165 diagrams = []
166 if verbose is None:
File .../site-packages/lambeq/text2diagram/bobcat_parser.py:387, in BobcatParser.sentences2trees(self, sentences, tokenised, suppress_exceptions, verbose)
385 trees.append(None)
386 else:
--> 387 raise BobcatParseError(' '.join(sent.words))
389 for i in empty_indices:
390 trees.insert(i, None)
BobcatParseError: Bobcat failed to parse "Baby didn 't like it".
Hi, you can use lambeq's SpacyTokeniser class to tokenise your sentences before feeding them to the parser. From the command line interface, you can just use the -t option. If you want to provide the sentence already tokenised, be sure to separate the words correctly, i.e. "did" and "n't" (for example, "Baby did n't like it"); otherwise the model will not recognise "didn" as a proper word.
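A minimal sketch of both suggestions follows. The helper names `fix_negative_contractions` and `parse_raw_sentence` are hypothetical and not part of lambeq. The first repairs data that was already split as "didn" + "'t". The second assumes lambeq, spaCy and the Bobcat model are installed, and passes the token list with `tokenised=True`.

```python
def fix_negative_contractions(tokens):
    """Re-split pairs such as ["didn", "'t"] into ["did", "n't"].

    Hypothetical helper: it only handles the negative contraction "'t"
    and leaves every other token untouched.
    """
    fixed = []
    for token in tokens:
        previous = fixed[-1] if fixed else ""
        if token == "'t" and len(previous) > 1 and previous.lower().endswith("n"):
            # Move the trailing "n" from the verb onto the contraction.
            fixed[-1] = previous[:-1]
            fixed.append("n't")
        else:
            fixed.append(token)
    return fixed


def parse_raw_sentence(sentence):
    """Tokenise a raw sentence with SpacyTokeniser, then parse it with Bobcat.

    lambeq is imported lazily so the helper above can be used without it.
    """
    from lambeq import BobcatParser, SpacyTokeniser

    tokens = SpacyTokeniser().tokenise_sentence(sentence)
    parser = BobcatParser()
    # The input is a list of tokens, so the parser must be told so.
    return parser.sentence2diagram(tokens, tokenised=True)


if __name__ == "__main__":
    print(fix_negative_contractions("Baby didn 't like it".split()))
```

With raw text, `parse_raw_sentence("Baby didn't like it")` lets the tokeniser decide where to split the contraction, which is probably the safer route when the tokenisation convention of the input data is not known.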