Bobcat fails with extra space tokens #120

Closed
toumix opened this issue Sep 20, 2023 · 3 comments

Comments

Contributor

toumix commented Sep 20, 2023

This happens only when tokenising, not with raw strings:

sentence = "Alice  sleeps"

from lambeq import SpacyTokeniser, BobcatParser
tokeniser, parser = SpacyTokeniser(), BobcatParser()
tokens = tokeniser.tokenise_sentence(sentence)
tree = parser.sentence2tree(tokens, tokenised=True)

I get the following error:

ValueError                                Traceback (most recent call last)
File ~/.pyenv/versions/3.10.9/lib/python3.10/site-packages/lambeq/text2diagram/bobcat_parser.py:291, in BobcatParser.sentences2trees(self, sentences, tokenised, suppress_exceptions, verbose)
    290 try:
--> 291     sentence_input = self._prepare_sentence(sent, tags)
    292     result = self.parser(sentence_input)

File ~/.pyenv/versions/3.10.9/lib/python3.10/site-packages/lambeq/text2diagram/bobcat_parser.py:214, in BobcatParser._prepare_sentence(sent, tags)
    212 spans = {(start, end): {id: score for id, score in scores}
    213          for start, end, scores in sent.spans}
--> 214 return Sentence(sent.words, sent_tags, spans)

File <string>:6, in __init__(self, words, input_supertags, span_scores)

File ~/.pyenv/versions/3.10.9/lib/python3.10/site-packages/lambeq/bobcat/parser.py:62, in Sentence.__post_init__(self)
     61 if len(self.words) != len(self.input_supertags):
---> 62     raise ValueError(
     63             '`words` must be the same length as `input_supertags`')

ValueError: `words` must be the same length as `input_supertags`

The above exception was the direct cause of the following exception:

BobcatParseError                          Traceback (most recent call last)
Cell In[17], line 1
----> 1 tree = parser.sentence2tree(tokens, tokenised=True)

File ~/.pyenv/versions/3.10.9/lib/python3.10/site-packages/lambeq/text2diagram/ccg_parser.py:108, in CCGParser.sentence2tree(self, sentence, tokenised, suppress_exceptions)
    104         raise ValueError('`tokenised` set to `True`, but variable '
    105                          '`sentence` does not have type '
    106                          '`list[str]`.')
    107     sent: list[str] = [str(token) for token in sentence]
--> 108     return self.sentences2trees(
    109                     [sent],
    110                     suppress_exceptions=suppress_exceptions,
    111                     tokenised=tokenised,
    112                     verbose=VerbosityLevel.SUPPRESS.value)[0]
    113 else:
    114     if not isinstance(sentence, str):

File ~/.pyenv/versions/3.10.9/lib/python3.10/site-packages/lambeq/text2diagram/bobcat_parser.py:298, in BobcatParser.sentences2trees(self, sentences, tokenised, suppress_exceptions, verbose)
    296                 trees.append(None)
    297             else:
--> 298                 raise BobcatParseError(' '.join(sent.words)) from e
    300 for i in empty_indices:
    301     trees.insert(i, None)

BobcatParseError: Bobcat failed to parse 'Alice   sleeps'.
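
A possible workaround until a fix is released, offered as a minimal sketch. The error message shows three spaces between the two words, which suggests the tokeniser emitted a whitespace-only token (assumed here to be `['Alice', ' ', 'sleeps']`), so dropping such tokens before parsing should avoid the length mismatch. `drop_whitespace_tokens` is a hypothetical helper, not part of lambeq.

```python
def drop_whitespace_tokens(tokens):
    """Return the tokens as strings, without empty or whitespace-only entries."""
    # str(token) mirrors the conversion the parser applies to tokenised input.
    return [str(token) for token in tokens if str(token).strip()]


# Assumed tokeniser output for "Alice  sleeps" (two spaces); not taken from a real run.
tokens = ['Alice', ' ', 'sleeps']
clean_tokens = drop_whitespace_tokens(tokens)
print(clean_tokens)  # ['Alice', 'sleeps']
```

The cleaned list can then be passed on as before, e.g. `parser.sentence2tree(clean_tokens, tokenised=True)`.
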
Contributor Author

toumix commented Sep 20, 2023

Also fails with just a space:

parser.sentence2tree(tokeniser.tokenise_sentence(' '), tokenised=True)

I get some weird error:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x0 and 2048x968)
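
The `1x0` shape hints that the model ends up with no usable input. A hedged sketch of a guard, assuming whitespace-only tokens are the cause: skip the parser call when nothing is left after filtering and return `None`, which is also what `sentences2trees` inserts for empty sentences in the traceback above. `parse_if_nonempty` and its `parse` argument are hypothetical names used for illustration only.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar('T')


def parse_if_nonempty(parse: Callable[[List[str]], T],
                      tokens: Sequence[str]) -> Optional[T]:
    """Call `parse` on the non-blank tokens, or return None if none remain."""
    kept = [token for token in tokens if token.strip()]
    if not kept:
        # Nothing to parse: avoid handing an empty sentence to the model.
        return None
    return parse(kept)


# Stand-in for the real parser call, so the sketch runs without lambeq installed.
def fake_parse(toks: List[str]) -> str:
    return ' '.join(toks)


print(parse_if_nonempty(fake_parse, [' ']))                     # None
print(parse_if_nonempty(fake_parse, ['Alice', ' ', 'sleeps']))  # Alice sleeps
```

With lambeq, `parse` could be something like `lambda toks: parser.sentence2tree(toks, tokenised=True)`.
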

Contributor

dimkart commented Sep 20, 2023

Thanks for spotting this. It will be fixed in the next release.

Contributor

dimkart commented Jan 16, 2024

This is now fixed in version 0.4. The issue will be closed.

dimkart closed this as completed Jan 16, 2024