The token restriction only applies to the embedding technique that you use. In your case, that depends on the SentenceTransformer model you are using. Likewise, if you were to use Flair or Hugging Face Transformer embeddings, you might run into token limits if you are not careful.
In general, it should not be an issue as long as you are not going too far over the limit. The problem with going far over the limit is that the document representation becomes vague, since anything beyond the limit is typically truncated and ignored. In those cases, I would advise splitting your documents into sentences.
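For what it's worth, a minimal way to split documents into sentences before passing them to BERTopic could look like the sketch below. This uses a naive regex split (the function name and the example documents are just illustrative); in practice, something like `nltk.sent_tokenize` is more robust:

```python
import re

def split_into_sentences(doc: str) -> list[str]:
    # Naive split on sentence-ending punctuation followed by whitespace.
    # A proper sentence tokenizer handles abbreviations, quotes, etc.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]

# Example: one long document becomes several short ones,
# each comfortably under the 512-token limit.
docs = [
    "BERTopic embeds each document as a whole. Very long documents "
    "can exceed the embedding model's token limit. Splitting them "
    "into sentences keeps each representation focused."
]
sentence_docs = [s for doc in docs for s in split_into_sentences(doc)]
```

You would then pass `sentence_docs` to `BERTopic().fit_transform(...)` instead of the original long documents.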
In SentenceTransformer (and most transformer-based models), the maximum input length is 512 tokens.
I was wondering if there is a similar document length restriction in BERTopic?
Thanks!