
[Task] - Incorporate embeddings (pre-trained and encoder-based) into Merlin Training and Inference #471

Open
MarkMoTrin opened this issue Jul 18, 2022 · 6 comments


MarkMoTrin commented Jul 18, 2022

Problem:

Two or more customer teams would like to leverage encoder-based embeddings as additional features in Transformers4Rec. Currently, there is no clear example or documented path for passing in extra embeddings. This feature gap prevents teams from adopting Transformers4Rec in production.

Goal:

  • Produce a comprehensive example of how to incorporate embeddings as extra features in Transformers4Rec. The embeddings may be stored in parquet format. The example should generalize where possible, especially around defining the schema and other input parameters.

Constraints:

  • Embeddings may be stored in parquet files (a sketch follows this list)
  • The workflow should be deployable in Triton
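
As a rough illustration of the kind of example being requested, here is a minimal sketch in Python of storing item embeddings in parquet and joining them onto interaction data as item-level features. All file names, column names, and dimensions here are hypothetical, not part of any Merlin API:

```python
import numpy as np
import pandas as pd

# Hypothetical pre-trained item embeddings: one row per item_id,
# with the vector stored as a list-valued column (requires pyarrow).
item_embeddings = pd.DataFrame({
    "item_id": np.arange(1000),
    "embedding": list(np.random.rand(1000, 64).astype(np.float32)),
})
item_embeddings.to_parquet("item_embeddings.parquet")

# At training/inference time, join the vectors onto each interaction
# by item_id so they become item-level features.
interactions = pd.read_parquet("interactions.parquet")  # hypothetical file
interactions = interactions.merge(item_embeddings, on="item_id", how="left")
```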

Starting Point:


@viswa-nvidia

Removing the 22.09 milestone; this is not in the POR (plan of record) yet.

@viswa-nvidia viswa-nvidia removed this from the Merlin 22.09 milestone Jul 27, 2022
@EvenOldridge EvenOldridge added this to the Merlin 22.10 milestone Aug 3, 2022
@EvenOldridge EvenOldridge changed the title [RMP] - Incorporate embeddings (pre-trained and encoder-based) into Transformers4Rec Training and Inference [RMP] - Incorporate embeddings (pre-trained and encoder-based) into Merlin Training and Inference Aug 3, 2022
@viswa-nvidia

@benfred, @jperez999:
  • Review this ticket and add planning details; it is still in "needs definition".
  • Convert the checkboxes to tickets in the specific repos (please remember to update the initiative field).
  • Once you are satisfied with the planning, change the status to "RMP review".
  • Refer to this doc for the workflow; there is a link to a video that explains the workflow.
Let me know if you have any questions.

@oliverholworthy (Member) commented Aug 17, 2022

Edit: sorry, this comment is unrelated to this issue (Transformers4Rec); it relates to Merlin Models (TensorFlow).

For small embedding tables (those that fit in GPU memory), using a tensor initializer is the currently supported functionality in Merlin Models for pre-trained embedding tables. There is an example in this notebook.
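
Independent of that notebook, the tensor-initializer idea looks roughly like this in plain Keras, which Merlin Models builds on; a minimal sketch, where the weight file and dimensions are hypothetical:

```python
import numpy as np
import tensorflow as tf

# Hypothetical pre-trained embedding matrix of shape (vocab_size, embedding_dim).
pretrained = np.load("item_embeddings.npy")

# Seed a standard Keras embedding layer with the pre-trained weights.
# trainable=False keeps the vectors frozen during training.
embedding_layer = tf.keras.layers.Embedding(
    input_dim=pretrained.shape[0],
    output_dim=pretrained.shape[1],
    embeddings_initializer=tf.keras.initializers.Constant(pretrained),
    trainable=False,
)
```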

@EvenOldridge EvenOldridge changed the title [RMP] - Incorporate embeddings (pre-trained and encoder-based) into Merlin Training and Inference [Task] - Incorporate embeddings (pre-trained and encoder-based) into Merlin Training and Inference Aug 17, 2022
@karlhigley (Contributor)

@MarkMoTrin When you say "encoder-based embeddings as additional features," does that imply encoding an item attribute (like "text description") using an external model (like BERT) and supplying the resulting embeddings as features for each item in a session? We're not talking about adding a single embedding as a session-level feature, we're talking about adding a list of embeddings as item-level features, yeah?

@gabrielspmoreira (Member)

When the embedding tables are not huge and fit in GPU memory, the new PretrainedEmbeddingsInitializer (NVIDIA-Merlin/Transformers4Rec#572) can be used to initialize the embedding matrix with pre-trained embeddings and to set them as trainable or not.
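
For reference, the plain-PyTorch equivalent of that pattern is shown below; a minimal sketch with a hypothetical weight tensor (see NVIDIA-Merlin/Transformers4Rec#572 for the actual Transformers4Rec initializer):

```python
import torch
import torch.nn as nn

# Hypothetical pre-trained weights of shape (vocab_size, embedding_dim).
weights = torch.randn(1000, 64)

# Build the embedding table from the pre-trained matrix.
# freeze=True keeps it fixed; freeze=False leaves it trainable.
embedding = nn.Embedding.from_pretrained(weights, freeze=True)
```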
