Bobcat fails with extra space tokens #120

toumix · 2023-09-20T08:39:45Z

This happens only when passing pre-tokenised input (`tokenised=True`), not when parsing raw strings:

sentence = "Alice  sleeps"

from lambeq import SpacyTokeniser, BobcatParser
tokeniser, parser = SpacyTokeniser(), BobcatParser()
tokens = tokeniser.tokenise_sentence(sentence)
tree = parser.sentence2tree(tokens, tokenised=True)

I get the following error:

ValueError                                Traceback (most recent call last)
File ~/.pyenv/versions/3.10.9/lib/python3.10/site-packages/lambeq/text2diagram/bobcat_parser.py:291, in BobcatParser.sentences2trees(self, sentences, tokenised, suppress_exceptions, verbose)
    290 try:
--> 291     sentence_input = self._prepare_sentence(sent, tags)
    292     result = self.parser(sentence_input)

File ~/.pyenv/versions/3.10.9/lib/python3.10/site-packages/lambeq/text2diagram/bobcat_parser.py:214, in BobcatParser._prepare_sentence(sent, tags)
    212 spans = {(start, end): {id: score for id, score in scores}
    213          for start, end, scores in sent.spans}
--> 214 return Sentence(sent.words, sent_tags, spans)

File <string>:6, in __init__(self, words, input_supertags, span_scores)

File ~/.pyenv/versions/3.10.9/lib/python3.10/site-packages/lambeq/bobcat/parser.py:62, in Sentence.__post_init__(self)
     61 if len(self.words) != len(self.input_supertags):
---> 62     raise ValueError(
     63             '`words` must be the same length as `input_supertags`')

ValueError: `words` must be the same length as `input_supertags`

The above exception was the direct cause of the following exception:

BobcatParseError                          Traceback (most recent call last)
Cell In[17], line 1
----> 1 tree = parser.sentence2tree(tokens, tokenised=True)

File ~/.pyenv/versions/3.10.9/lib/python3.10/site-packages/lambeq/text2diagram/ccg_parser.py:108, in CCGParser.sentence2tree(self, sentence, tokenised, suppress_exceptions)
    104         raise ValueError('`tokenised` set to `True`, but variable '
    105                          '`sentence` does not have type '
    106                          '`list[str]`.')
    107     sent: list[str] = [str(token) for token in sentence]
--> 108     return self.sentences2trees(
    109                     [sent],
    110                     suppress_exceptions=suppress_exceptions,
    111                     tokenised=tokenised,
    112                     verbose=VerbosityLevel.SUPPRESS.value)[0]
    113 else:
    114     if not isinstance(sentence, str):

File ~/.pyenv/versions/3.10.9/lib/python3.10/site-packages/lambeq/text2diagram/bobcat_parser.py:298, in BobcatParser.sentences2trees(self, sentences, tokenised, suppress_exceptions, verbose)
    296                 trees.append(None)
    297             else:
--> 298                 raise BobcatParseError(' '.join(sent.words)) from e
    300 for i in empty_indices:
    301     trees.insert(i, None)

BobcatParseError: Bobcat failed to parse 'Alice   sleeps'.

toumix · 2023-09-20T08:56:20Z

Also fails with just a space:

parser.sentence2tree(tokeniser.tokenise_sentence(' '), tokenised=True)

This time the error is different, and comes from the model rather than the parser:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x0 and 2048x968)
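
A possible workaround until this is fixed, assuming the failures come from the whitespace-only tokens that the tokeniser emits for repeated or lone spaces: filter those tokens out before calling the parser. This is only a sketch. `drop_whitespace_tokens` is a hypothetical helper, not part of lambeq, and the token list below is hand-written to stand in for tokeniser output.

```python
def drop_whitespace_tokens(tokens):
    """Return the tokens with empty and whitespace-only entries removed."""
    # str.strip() yields '' for whitespace-only strings, which is falsy.
    return [tok for tok in tokens if tok.strip()]


# Hand-written stand-in for what a tokeniser might return for a
# sentence containing a double space (assumption, for illustration only).
tokens = ["Alice", " ", "sleeps"]
clean_tokens = drop_whitespace_tokens(tokens)
print(clean_tokens)
```

The cleaned list could then be passed to `parser.sentence2tree(clean_tokens, tokenised=True)`. For an input such as `' '` the cleaned list is empty, so the caller would still need to skip parsing in that case.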

dimkart · 2023-09-20T11:41:14Z

Thanks for spotting this, will be fixed in the next release.

dimkart · 2024-01-16T13:37:00Z

This is now fixed in version 0.4. The issue will be closed.

dimkart closed this as completed Jan 16, 2024
