
Consider using state of the art semantic similarity algorithms #5

Closed
LifeIsStrange opened this issue Nov 15, 2021 · 1 comment

@LifeIsStrange

The state of the art can be found here:
https://paperswithcode.com/sota/semantic-textual-similarity-on-sts-benchmark

Other benchmarks:
https://paperswithcode.com/task/semantic-textual-similarity

@paulbricman (Owner)

Thanks for the suggestion! The thing is, I feel the current SOTA encoders that can handle multi-modal data (i.e., both text and images) lag a bit behind text-only encoders, especially when working mainly with text.

One way to get the stronger text performance while still supporting images is to store two embeddings for each text: one from a text-only encoder and one from a multi-modal encoder, as sketched below. When working with text, the tool would use the cleaner text-only embedding; when working with images, it would fall back to the CLIP one.
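
A minimal sketch of that dual-embedding scheme, assuming the sentence-transformers library; the model names (`all-MiniLM-L6-v2`, `clip-ViT-B-32`) and the dict-based storage are illustrative choices, not the project's actual configuration:

```python
from PIL import Image
from sentence_transformers import SentenceTransformer, util

# Text-only encoder: typically stronger on STS-style benchmarks.
text_encoder = SentenceTransformer("all-MiniLM-L6-v2")
# Multi-modal encoder: maps both text and images into a shared space.
clip_encoder = SentenceTransformer("clip-ViT-B-32")

def embed_text(passage):
    # Each text entry stores two embeddings, one per encoder.
    return {
        "text": text_encoder.encode(passage),
        "clip": clip_encoder.encode(passage),
    }

def embed_image(path):
    # Images only get the CLIP embedding.
    return {"clip": clip_encoder.encode(Image.open(path))}

def score(query, entry):
    # Prefer the cleaner text-only embedding when both sides have one;
    # fall back to the shared CLIP space otherwise.
    if "text" in entry:
        return util.cos_sim(text_encoder.encode(query), entry["text"]).item()
    return util.cos_sim(clip_encoder.encode(query), entry["clip"]).item()
```

The complexity being weighed is roughly one extra vector stored per text entry, plus a second encoder kept in memory.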

In your view, would the slightly better performance on text justify the added complexity, @LifeIsStrange?
