[QST] Summary Type Arg Meaning (BCE Task Head) #466
Comments
Hello @jacobdineen, thanks for your question.

Q1: *If we select `first` as our `summary_type` with an encoder-only model, does that token have additional context over `last` or `mean`?* Yes, it will have context over the other tokens due to the self-attention mechanism, which computes all the pairwise scores between the items in the current session.

Q2: *This question stems from literature, but in encoder-only models there is a special CLS (and/or end) token that is said to provide a sentence-level representation over all input tokens. The intermediate output appears to provide an nn_dim-sized embedding for each token (middle axis) for each element of the batch, but does the above theory still hold if we don't have a tokenization framework that includes special start/stop tokens?* This theory holds for NLP, as you already explained above, but for the session-based or sequential recommendation case we do not need a special CLS separator, since for us every session (which is the input to the model, not multiple sessions) is treated like a single sentence. Note that we might need this in case of
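To illustrate the point about self-attention above, here is a minimal numpy sketch (not code from t4rec; `attention_weights` is a hypothetical helper) showing that in a bidirectional encoder every position, including the first, receives nonzero attention weight from every item in the session:

```python
import numpy as np

def attention_weights(x):
    """Scaled dot-product self-attention weights for a single session.

    x: (seq_len, d) array of item embeddings. Returns a (seq_len, seq_len)
    row-stochastic matrix: row i holds position i's weights over all items.
    """
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)              # pairwise scores between items
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))  # stable softmax
    return w / w.sum(axis=-1, keepdims=True)

x = np.random.default_rng(0).normal(size=(4, 8))  # toy session of 4 items
w = attention_weights(x)
# Every weight is strictly positive: the first position mixes in information
# from all other positions, so `first` is not context-free in an encoder.
print((w > 0).all())  # True
```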
Thanks @rnyak!
@jacobdineen, for this task, please let us know how it goes.
Hey @rnyak, I'm going to respond separately in the other thread, as that answer is a bit more verbose re: what solution we have tried. For this one: leveraging a t4rec model as a feature extractor, we still need to use the BCE head, which requires us to use.

Next-item prediction in our context reduces down to predicting the next item that a user will *not* convert on, due to the natural sparsity of our data and the infrequency of user action; semi-supervised learning may not be applicable to our problem for that reason. Intra-session recommendation would be a good use case for this, but the full customer journey is out of scope for our team.
❓ Questions & Help
Details
Hello!
My team and I are in the process of leveraging a trained t4rec model as a feature extractor. We are currently using Albert, but this question applies to any encoder/decoder model offered. When instantiating a model, a user has the option of selecting the model summary type (`first`/`last`/`mean`/etc.). This slices the hidden representation of shape (batch, sequence_length, nn_dim) along the middle axis, and the result is the representation passed into the final nn layer in the BCE task head. I sourced the dependency back to huggingface's `SequenceSummary` here, which is a straightforward implementation.

Q: If we select `first` as our `summary_type` with an encoder-only model, does that token have additional context over `last` or `mean`? Or should there be additional tokenization built into the preprocessing stage?

This question stems from literature, but in encoder-only models there is a special CLS (and/or end) token that is said to provide a sentence-level representation over all input tokens. The intermediate output appears to provide an nn_dim-sized embedding for each token (middle axis) for each element of the batch, but does the above theory still hold if we don't have a tokenization framework that includes special start/stop tokens?