-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make encode() in wrapped model compatible with super encode() #337
Conversation
markstur
commented
Mar 14, 2024
- Adding missing params
- Don't return unexpected tuple (with token count) unless asked
- Adding check to not use our params if given an unwrapped model
- Fixing some param position things
* Adding missing params * Don't return unexpected tuple (with token count) unless asked * Adding check to not use our params if given an unwrapped model * Fixing some param position things Signed-off-by: Mark Sturdevant <markstur@Marks-MacBook-Pro.local>
|
||
# Else... | ||
# It's possible to init with a model that doesn't have the added kwargs. | ||
# E.g. a SentenceTransformer or other transormer model. Remove those kwargs! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# E.g. a SentenceTransformer or other transormer model. Remove those kwargs! | |
# E.g. a SentenceTransformer or other transformer model. Remove those kwargs! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, done
:param truncate_input_tokens: Truncation length for input tokens. | ||
Truncation length for input tokens. | ||
If less than zero, this truncation is left up to the tokenizer default (model max). | ||
If zero or greater than the model's maximum, then this is used as a test | ||
to see if truncation is needed. If needed is needed, an exception is thrown. | ||
Otherwise, we take this usable truncation limit to truncate the input tokens. | ||
:param return_token_count: If true, a tuple is returned to add the input token count. | ||
|
||
:return: | ||
A tuple of the embedding, as a numpy matrix, and the input_token_count int. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could this return
be updated to reflect the return_token_count
update?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
almost missed this again, but done!
# This is not the normal use case but at least don't pass invalid kwargs, to encode() | ||
# and don't return the unexpected tuple (adding token count). | ||
del kwargs["truncate_input_tokens"] | ||
del kwargs["return_token_count"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be good to have a small test for this case, in case an update could break it (again)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes and I came here to make the PR a draft because this isn't catching keyerrors (noticed while reviewing with Flavia)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep and done. Definitely was a silly miss to not check AND not test. :(
…tion behavior. * First draft could KeyError when deleting kwargs that don't exist. Tests added. * Adding a config option so the desired default behavior can be either: - Throw an error if truncation is happening implicitly, or - Nah. Just let it go. First was requested, so trunction does not happen quietly. This is common behavoir for some models. The second is more aligned with SentenceTransformers and is probably necessary for standard tests to run without errors. Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
b97e34f
to
c6ed5b2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for the test update!