textEmbed, model = 'distilroberta-base' fails based on length of text #36

sebsilas · 2022-12-20T10:55:16Z

The textEmbed function can fail when the model is set to 'distilroberta-base', seemingly depending on the length of the given text.

The following does not fail using the default model argument ("bert-base-uncased"):

t1 <- text::textEmbed(texts = "Voice sounds a little too English and a bit boring. The tone could be a bit more upbeat and happy. The pace is a little slow. I think the speed could be a little quicker. I wouldn't want to meet this person as they speak a bit too slowly and I may get bored of them")
# Success

But the same text fails with model = 'distilroberta-base' (note, I am using layers = 5, as per #35)...

t2 <- text::textEmbed(texts = "Voice sounds a little too English and a bit boring. The tone could be a bit more upbeat and happy. The pace is a little slow. I think the speed could be a little quicker. I wouldn't want to meet this person as they speak a bit too slowly and I may get bored of them",
                      model = 'distilroberta-base',
                      layers = 5)
# Fail

t3 <- text::textEmbed(texts = "Voice sounds a little too English and a bit boring. The tone could be a bit more upbeat and happy. The pace is a little slow. I think the speed could be a little quicker. ",
                      model = 'distilroberta-base',
                      layers = 5)

# Fail

t4 <- text::textEmbed(texts = "Voice sounds a little too English and a bit boring. The tone could be a bit more upbeat and happy. The pace is a little slow.",
                      model = 'distilroberta-base',
                      layers = 5)

# Fail

t5 <- text::textEmbed(texts = "Voice sounds a little too English and a bit boring. The tone could be a bit more upbeat and happy.",
                      model = 'distilroberta-base',
                      layers = 5)

# Success

... until I shorten the text to the length used in t5.

The error on failure is:

Error in dplyr::bind_cols(tokens_layer_number, layers_4_token) :
  Can't recycle `..1` (size 120) to match `..2` (size 71).
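
For context, this looks like the generic error dplyr::bind_cols raises when its inputs have different numbers of rows and neither has exactly one row. Below is a minimal sketch that triggers the same kind of error outside of text::textEmbed. The two data frames are hypothetical stand-ins that only borrow the names and sizes from the message above; their columns (token_id, Dim1) are made up, and this says nothing about how the package actually builds them or why the sizes diverge.

```r
library(dplyr)

# Hypothetical stand-ins: only the names and row counts (120 and 71)
# come from the error message; the columns are invented for illustration.
tokens_layer_number <- data.frame(token_id = seq_len(120))
layers_4_token <- data.frame(Dim1 = rnorm(71))

# bind_cols() needs inputs with the same number of rows (or exactly one row),
# so this call errors; tryCatch() captures the message instead of stopping.
result <- tryCatch(
  dplyr::bind_cols(tokens_layer_number, layers_4_token),
  error = function(e) conditionMessage(e)
)

print(result)
```

If this is the same failure mode, the two objects inside textEmbed end up with different row counts for the longer texts, which may help narrow down where to look.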
