We will support different attention approaches. Candle provides us with a broad variety of existing implementations.

| Type | Status / Description |
| --- | --- |
| SelfAttention | Integrated. Attends within one input sequence; depending on the implementation it is also called dot-product attention or global attention (see the sketch after this table). |
| CrossAttention (aka Co-Attention) | Not integrated so far. Attends across multiple input sequences. |
| CausalSelfAttention | Not integrated so far. Attends over parts of one or multiple input sequences, e.g., only the tokens before the present one; depending on the implementation also called local attention. |
| MultiHeadAttention | Not integrated so far. Asks multiple concerns/questions in parallel. |
| MultiQueryAttention | Not integrated so far. Asks multiple concerns/questions while knowing the other concerns/questions. |
| GroupQueryAttention | Not integrated so far. Builds logical groups among the questions. |
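To illustrate the integrated SelfAttention, below is a minimal sketch of scaled dot-product attention on Candle tensors. The function name, the example shapes, and the use of `candle_nn::ops::softmax_last_dim` are assumptions for this example, not the exact API of our layer.

```rust
// A minimal sketch of scaled dot-product self-attention with candle.
// Shapes and helper names are illustrative assumptions.
use candle_core::{Device, Result, Tensor};

/// softmax(Q K^T / sqrt(d_k)) V over a single input sequence.
fn self_attention(q: &Tensor, k: &Tensor, v: &Tensor) -> Result<Tensor> {
    let d_k = q.dim(q.rank() - 1)? as f64;
    let scores = (q.matmul(&k.t()?)? / d_k.sqrt())?;
    let weights = candle_nn::ops::softmax_last_dim(&scores)?;
    weights.matmul(v)
}

fn main() -> Result<()> {
    let device = Device::Cpu;
    // One sequence of 4 tokens with an embedding size of 8 (illustrative only).
    let x = Tensor::randn(0f32, 1.0, (4, 8), &device)?;
    let out = self_attention(&x, &x, &x)?;
    println!("{:?}", out.dims()); // [4, 8]
    Ok(())
}
```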

Terms:

  • Heads: number of parallel questions asked of a given stream (see the shape sketch after this list).
  • Contexts: number of parallel streams.
  • Temporal: time.
  • Spatial: dimensionality.
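To make the Heads and Contexts terms concrete, here is a sketch (all shapes are assumptions for illustration) of how they map onto tensor dimensions when an embedding is split into attention heads:

```rust
// Sketch: "contexts" = parallel streams (batch), "heads" = parallel questions
// asked of each stream. All shapes below are illustrative assumptions.
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let device = Device::Cpu;
    let (contexts, seq_len, heads, head_dim) = (2, 4, 3, 8);
    // One embedding of size heads * head_dim per token, per context.
    let x = Tensor::randn(0f32, 1.0, (contexts, seq_len, heads * head_dim), &device)?;
    // Split the embedding into heads and move the head axis forward:
    // (contexts, seq_len, heads * head_dim) -> (contexts, heads, seq_len, head_dim)
    let x = x
        .reshape((contexts, seq_len, heads, head_dim))?
        .transpose(1, 2)?;
    println!("{:?}", x.dims()); // [2, 3, 4, 8]
    Ok(())
}
```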

Note: All attention variants should be available for multiple dimensions. This includes the spatial transformer, which acts in >= 2D (spatial) space, as required for CNN applications.
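One common way to handle the spatial case, shown here only as an assumed sketch with plain dot-product attention and NCHW feature maps, is to flatten the spatial positions of a CNN feature map into a token sequence and attend over them:

```rust
// Sketch (assumptions: NCHW feature maps, plain dot-product attention) of
// attending over the spatial positions of a CNN feature map.
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let device = Device::Cpu;
    let (batch, channels, h, w) = (1, 16, 8, 8);
    let feat = Tensor::randn(0f32, 1.0, (batch, channels, h, w), &device)?;
    // (batch, channels, h, w) -> (batch, h*w, channels): every spatial position
    // becomes one token with a `channels`-sized embedding.
    let tokens = feat
        .reshape((batch, channels, h * w))?
        .transpose(1, 2)?
        .contiguous()?;
    // Dot-product attention over the spatial tokens.
    let scale = (channels as f64).sqrt();
    let scores = (tokens.matmul(&tokens.t()?)? / scale)?;
    let weights = candle_nn::ops::softmax_last_dim(&scores)?;
    let out = weights.matmul(&tokens)?;
    // Back to the feature-map layout.
    let out = out.transpose(1, 2)?.contiguous()?.reshape((batch, channels, h, w))?;
    println!("{:?}", out.dims()); // [1, 16, 8, 8]
    Ok(())
}
```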

More complex models are mapped as their own layers: