# embLayerNormPlugin

## Table Of Contents

- [Description](#description)
- [Structure](#structure)
- [Parameters](#parameters)
- [Additional resources](#additional-resources)
- [License](#license)
- [Changelog](#changelog)
- [Known issues](#known-issues)

## Description

The plugin performs the following two tasks:

1. Embeds an input sequence consisting of token ids and segment ids. This consists of a token embedding lookup, a segment embedding lookup, the addition of positional embeddings, and, finally, layer normalization.

2. Preprocesses the input masks that mark valid input tokens in sequences padded to the target sequence length. Assuming contiguous input masks, the plugin encodes each mask as a single number denoting the number of valid elements, e.g.:

```
111100 => 4
110000 => 2
110100: invalid mask, because it is not contiguous
```
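
The encoding rule can be made concrete with a few lines of Python. This is an illustrative sketch only, not code from the plugin; the function name `encode_mask` is hypothetical.

```python
import numpy as np

def encode_mask(mask: np.ndarray) -> int:
    """Encode a contiguous 0/1 mask as the number of valid (leading) elements."""
    n_valid = int(mask.sum())
    # A contiguous mask has all of its ones at the front.
    if not np.all(mask[:n_valid] == 1):
        raise ValueError("invalid mask: not contiguous")
    return n_valid

print(encode_mask(np.array([1, 1, 1, 1, 0, 0])))  # 4
print(encode_mask(np.array([1, 1, 0, 0, 0, 0])))  # 2
# encode_mask(np.array([1, 1, 0, 1, 0, 0])) raises ValueError
```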

## Structure

The embLayerNormPlugin takes three inputs: `token_id`, `segment_id`, and `input_mask`.

`token_id` is an input sequence containing token ids. It is an int32 tensor with shape `[S, B]`, where `S` is the sequence length and `B` is the batch size. Tokens typically identify words or word pieces that were obtained by preprocessing the input text.

`segment_id` is an input sequence containing segment ids. It is an int32 tensor with shape `[S, B]`, where `S` is the sequence length and `B` is the batch size. The segment id is used to distinguish between different parts of the input sequence that serve different purposes. For example, in a SQuAD task, the input sequence might consist of one segment representing the knowledge base (i.e., a paragraph of text) and one segment representing the question.

`input_mask` is an int32 tensor with shape `[S, B]`, where `S` is the sequence length and `B` is the batch size. The input mask denotes the valid elements in a sequence that was padded to the sequence length `S`.
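
To make the `[S, B]` (sequence-major) layout concrete, the sketch below builds a hypothetical batch of two sequences with `S = 6`; the token id values are invented for illustration and do not come from the original document.

```python
import numpy as np

S, B = 6, 2  # sequence length, batch size

# Each column is one sequence; made-up token ids, zero-padded to length S.
token_id = np.array([[7592,  101],
                     [2088, 2054],
                     [ 102, 2003],
                     [   0, 2009],
                     [   0,  102],
                     [   0,    0]], dtype=np.int32)

segment_id = np.zeros((S, B), dtype=np.int32)  # every token in segment 0

# Contiguous masks: the first sequence has 3 valid tokens, the second has 5.
input_mask = np.array([[1, 1],
                       [1, 1],
                       [1, 1],
                       [0, 1],
                       [0, 1],
                       [0, 0]], dtype=np.int32)
```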

The embLayerNormPlugin generates the following two outputs:

`embedded_input` is a floating-point tensor with shape `[S, B, E]`, where `S` is the sequence length, `B` is the batch size, and `E` is the hidden size. The final output embedding is the layer-normalized sum of the embeddings for the token, the segment, and the position in the sequence.

`maskIdx` is a more compact representation of the input mask, consisting of the number of valid elements, assuming that the original mask was contiguous. Its type and shape depend on the plugin version:

- Version 1 (fixed sequence length): `maskIdx` is an int32 tensor with shape `[B, packSize]`, where `B` is the batch size and `packSize` is the packed mask size, which depends on the sequence length.
- Version 2 (HuggingFace-style variable sequence length): `maskIdx` is an empty int32 tensor.
- Version 3 (Megatron-style variable sequence length): `maskIdx` is a half-precision tensor with shape `[B, S, 1, 1]`, where `B` is the batch size and `S` is the sequence length.
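
As a reference for the embedding path, the following numpy sketch computes `embedded_input` as described above: the sum of the three embedding lookups followed by layer normalization. It is an illustration under the shape conventions used in this README, not the plugin's actual CUDA kernel, and the epsilon value is an assumption.

```python
import numpy as np

def embed_layer_norm(token_id, segment_id, word_emb, tok_type_emb, pos_emb,
                     gamma, beta, eps=1e-12):  # eps value is an assumption
    S, B = token_id.shape
    # Sum of token, segment, and position embeddings; positions are the
    # first S rows of pos_emb, broadcast over the batch dimension.
    x = (word_emb[token_id]            # [S, B, E]
         + tok_type_emb[segment_id]    # [S, B, E]
         + pos_emb[:S][:, None, :])    # [S, 1, E] -> broadcast to [S, B, E]
    # Layer normalization over the hidden dimension E.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta  # [S, B, E]
```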

## Parameters

The embLayerNormPlugin has the plugin creator class `EmbLayerNormPluginDynamicCreator` and the plugin class `CustomEmbLayerNormPluginDynamic`.

The parameters are defined below and consist of the following attributes:

| Type | Parameter | Version | Description |
|------|-----------|---------|-------------|
| `int` | `output_fp16` | 1, 2 | Integer encoding the DataType (0: FP32, 1: FP16). Set to 0 when building an FP32 network and to 1 when building an FP16/INT8 network. |
| `int` | `full_mask` | 1 | Whether to output the full mask that works with the specialized multi-head attention plugin kernels (deprecated; use `mha_type_id` instead). |
| `int` | `mha_type_id` | 1 | Integer encoding the multi-head attention plugin DataType (0: FP32, 1: FP16, 2: INT8). |
| `Weights` | `bert_embeddings_layernorm_beta` | 1, 2 | Beta parameter for layer norm. Shape: `[E,]`, where `E` is the hidden size. |
| `Weights` | `bert_embeddings_layernorm_gamma` | 1, 2 | Gamma parameter for layer norm. Shape: `[E,]`, where `E` is the hidden size. |
| `Weights` | `bert_embeddings_word_embeddings` | 1, 2 | Token embedding matrix. Shape: `[word_vocab_size, E]`, where `E` is the hidden size. |
| `Weights` | `bert_embeddings_token_type_embeddings` | 1, 2 | Token type embedding matrix. Shape: `[type_vocab_size, E]`, where `E` is the hidden size. |
| `Weights` | `bert_embeddings_position_embeddings` | 1, 2 | Positional embedding matrix. Shape: `[S, E]`, where `S` is the maximum sequence length and `E` is the hidden size. |
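
The sketch below shows one way these attributes might be passed when instantiating version 1 of the plugin through the TensorRT Python API. It is a hedged example, not code from this repository: the hidden size, vocabulary sizes, and weight values are placeholder assumptions, while the field names and the creator name `CustomEmbLayerNormPluginDynamic` come from this README.

```python
import numpy as np
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
trt.init_libnvinfer_plugins(TRT_LOGGER, "")  # register the bundled plugins

E = 768  # hidden size (assumption)
beta     = np.zeros(E, dtype=np.float32)           # layer-norm beta (placeholder)
gamma    = np.ones(E, dtype=np.float32)            # layer-norm gamma (placeholder)
word_emb = np.zeros((30522, E), dtype=np.float32)  # [word_vocab_size, E] (placeholder)
type_emb = np.zeros((2, E), dtype=np.float32)      # [type_vocab_size, E] (placeholder)
pos_emb  = np.zeros((512, E), dtype=np.float32)    # [S_max, E] (placeholder)

fields = trt.PluginFieldCollection([
    trt.PluginField("output_fp16", np.array([0], np.int32), trt.PluginFieldType.INT32),
    trt.PluginField("mha_type_id", np.array([0], np.int32), trt.PluginFieldType.INT32),
    trt.PluginField("bert_embeddings_layernorm_beta", beta, trt.PluginFieldType.FLOAT32),
    trt.PluginField("bert_embeddings_layernorm_gamma", gamma, trt.PluginFieldType.FLOAT32),
    trt.PluginField("bert_embeddings_word_embeddings", word_emb, trt.PluginFieldType.FLOAT32),
    trt.PluginField("bert_embeddings_token_type_embeddings", type_emb, trt.PluginFieldType.FLOAT32),
    trt.PluginField("bert_embeddings_position_embeddings", pos_emb, trt.PluginFieldType.FLOAT32),
])

creator = trt.get_plugin_registry().get_plugin_creator(
    "CustomEmbLayerNormPluginDynamic", "1")
plugin = creator.create_plugin("embeddings", fields)
# The plugin layer is then added to a network definition with, e.g.,
# network.add_plugin_v2(inputs=[token_id, segment_id, input_mask], plugin=plugin)
```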

## Additional resources

The following resources provide a deeper understanding of the embLayerNormPlugin:


## License

For terms and conditions for use, reproduction, and distribution, see the TensorRT Software License Agreement documentation.

## Changelog

October 2020: Added the V2 plugin, which supports variable sequence lengths.

November 2019: This is the first release of this README.md file.

## Known issues

This plugin only supports GPUs with compute capability >= 7.0. For more information, see the CUDA GPU Compute Capability Support Matrix.