
[QST] Summary Type Arg Meaning (BCE Task Head) #466

Closed
jacobdineen opened this issue Aug 1, 2022 · 4 comments

Comments

@jacobdineen

jacobdineen commented Aug 1, 2022

❓ Questions & Help

Details

Hello!

My team and I are in the process of leveraging a trained t4rec model as a feature extractor. We are currently using Albert, but this question applies to any encoder/decoder model offered. When instantiating a model, a user can select the model summary type (first/last/mean/etc.). This slices the hidden representation of shape (batch, sequence_length, nn_dim) along the middle axis, and the resulting tensor is what gets passed into the final nn layer of the BCE task head. I traced the dependency back to Hugging Face's SequenceSummary here, which is a straightforward implementation.
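For concreteness, here is a minimal sketch of how I understand the three summary types to reduce the middle axis (the tensors and shapes below are illustrative, not t4rec code; the real SequenceSummary also covers modes and padding handling that I am glossing over):

```python
import torch

# Illustrative shapes: (batch, sequence_length, nn_dim)
batch, seq_len, nn_dim = 4, 20, 64
hidden = torch.randn(batch, seq_len, nn_dim)

first = hidden[:, 0]       # "first": hidden state at position 0 -> (batch, nn_dim)
last = hidden[:, -1]       # "last": hidden state at the final position -> (batch, nn_dim)
mean = hidden.mean(dim=1)  # "mean": average over the sequence axis -> (batch, nn_dim)
```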

Q: If we select first as our summary_type with an encoder-only model, does that token have additional context relative to last or mean? Or should additional tokenization be built into the preprocessing stage?

This question stems from the literature: in encoder-only models, there is a special CLS (and/or end) token that is said to provide a sentence-level representation over all input tokens. The intermediate output appears to provide an nn_dim-sized embedding for each token (the middle axis) for each element of the batch, but does the above theory still hold if we don't have a tokenization framework that includes special start/stop tokens?

@rnyak
Contributor

rnyak commented Aug 2, 2022

Hello @jacobdineen. Thanks for your question.

Q1: If we select first as our summary_type with an encoder-only model, does that token have additional context relative to last or mean?

Yes, it will have context over the other tokens due to the self-attention mechanism, which computes all pairwise attention scores among the items in the current session.
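A toy illustration (plain scaled dot-product attention with illustrative names, not our internal code): in an encoder there is no causal mask, so the output at the first position is already a weighted mix of every item in the session.

```python
import torch
import torch.nn.functional as F

seq_len, dim = 5, 8
x = torch.randn(1, seq_len, dim)  # one session of 5 item embeddings

# Scaled dot-product self-attention: every position attends to every position
scores = x @ x.transpose(-2, -1) / dim ** 0.5  # (1, seq_len, seq_len) pairwise scores
weights = F.softmax(scores, dim=-1)
out = weights @ x                              # (1, seq_len, dim)

# out[:, 0] mixes all five inputs, so the "first" summary already
# aggregates context from the whole session.
print(out[:, 0].shape)  # torch.Size([1, 8])
```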

Q2: This question stems from the literature: in encoder-only models, there is a special CLS (and/or end) token that is said to provide a sentence-level representation over all input tokens. The intermediate output appears to provide an nn_dim-sized embedding for each token (the middle axis) for each element of the batch, but does the above theory still hold if we don't have a tokenization framework that includes special start/stop tokens?

This theory holds for NLP, as you explained above, but for the session-based or sequential recommendation case we do not need a special CLS separator, since for us every session is treated like a single sentence (the input to the model is a single session, not multiple sessions). Note that we might need this for the session-aware recommendation task, which we do not support yet.

@jacobdineen
Author

Thanks @rnyak!

@rnyak
Contributor

rnyak commented Aug 3, 2022

@jacobdineen for this task, "leveraging a trained t4rec model as a feature extractor," you do not need a custom model.fit() and the BC head, right? Basically, you can train the model with the HF trainer class on the next-item prediction task, and then extract the embeddings from the layer you want. In that case, you should be able to use torch.nn.parallel.DistributedDataParallel as described in #456.
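As a generic PyTorch sketch of the extraction step (not a t4rec-specific API; model and target_layer below are placeholder stand-ins for your trained model and the layer you want features from), a forward hook can capture the embeddings:

```python
import torch

features = {}

def hook(module, inputs, output):
    # Save the layer's output; detach so it is a plain feature tensor
    features["embeddings"] = output.detach()

model = torch.nn.Sequential(  # placeholder for the trained model
    torch.nn.Linear(16, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2)
)
target_layer = model[1]  # placeholder for the layer whose output you want
handle = target_layer.register_forward_hook(hook)

with torch.no_grad():
    model(torch.randn(8, 16))  # a dummy batch; use your real dataloader

handle.remove()
print(features["embeddings"].shape)  # torch.Size([8, 64])
```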

Please let us know how it goes.

@jacobdineen
Author

Hey @rnyak - going to respond separately on the other thread, as that answer is a bit more verbose re: what solution we have tried.

For this one, leveraging a t4rec model as a feature extractor, we still need to use the BC head, which requires us to use model.fit(). Essentially, we are looking for general-purpose features (embeddings) for downstream tasks, generated by training on an explicit target feature (conversion).

Next-item prediction in our context reduces to predicting the next item that a user will not convert on, due to the natural sparsity of our data and the infrequency of user actions. Semi-supervised learning may not be applicable to our problem for that reason. Intra-session recommendation would be a good use case for this, but the full customer journey is out of scope for our team.
