A Transformer Encoder where the embedding size can be down-sized.

Mascerade/scale-transformer-encoder

Scaling Transformer Encoder

Implementation (kind of) of a Transformer Encoder. It can re-scale d_model, down or up, so that later parts of a model can work with different dimensions. Pretty simple.

Example

import torch
from scale_transformer_encoder import ScalingLayer

# (batch, sequence length, d_model)
x = torch.randn(16, 40, 256)

scale = ScalingLayer(in_features=256,           # incoming d_model
                     out_features=512,          # re-scaled d_model
                     pwff_inner_features=1028,  # position-wise feed-forward hidden size
                     heads=8,
                     multihead_scale=False,
                     head_scale=False,
                     return_attn=True)          # also return the attention weights

out, attn = scale(x)
print("Input size: {}".format(x.size()))
print("Output size: {}".format(out.size()))
print("Attention size: {}".format(attn.size()))

Output

Input size: torch.Size([16, 40, 256])
Output size: torch.Size([16, 40, 512])
Attention size: torch.Size([16, 8, 40, 40])
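To illustrate how a layer like this can change d_model, here is a minimal sketch in plain PyTorch. This is an assumption about the general idea, not the repo's actual implementation: a standard encoder layer (multi-head self-attention plus a position-wise feed-forward block) where a linear projection after the attention residual moves the representation from in_features to out_features. The class name SketchScalingLayer and its internals are hypothetical.

```python
import torch
import torch.nn as nn


class SketchScalingLayer(nn.Module):
    """Hypothetical sketch: an encoder layer whose output width differs
    from its input width, analogous to ScalingLayer's in/out_features."""

    def __init__(self, in_features, out_features, pwff_inner_features, heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(in_features, heads, batch_first=True)
        # The projection that actually re-scales d_model.
        self.scale_proj = nn.Linear(in_features, out_features)
        self.norm1 = nn.LayerNorm(out_features)
        # Position-wise feed-forward block at the new width.
        self.pwff = nn.Sequential(
            nn.Linear(out_features, pwff_inner_features),
            nn.ReLU(),
            nn.Linear(pwff_inner_features, out_features),
        )
        self.norm2 = nn.LayerNorm(out_features)

    def forward(self, x):
        # Per-head attention weights: (batch, heads, seq, seq)
        a, weights = self.attn(x, x, x, need_weights=True,
                               average_attn_weights=False)
        # Residual at the old width, then project to the new width.
        y = self.norm1(self.scale_proj(x + a))
        y = self.norm2(y + self.pwff(y))
        return y, weights


x = torch.randn(16, 40, 256)
layer = SketchScalingLayer(256, 512, 1028, 8)
out, attn = layer(x)
print(out.shape)   # torch.Size([16, 40, 512])
print(attn.shape)  # torch.Size([16, 8, 40, 40])
```

With these shapes the sketch reproduces the sizes shown in the output above: the sequence dimension is untouched while d_model goes from 256 to 512, and the attention weights keep one (seq, seq) map per head.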