Both embedding (e.g., Word2Vec) and encoding (e.g., bag of words) are about representing data in a different space. Embedding usually refers to continuous vector spaces, typically capturing semantic relationships, whereas encoding also covers compression and dimensionality reduction.
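To make the distinction concrete, here is a minimal, dependency-free Rust sketch contrasting a bag-of-words encoding with an embedding lookup. The vocabulary and the lookup table are made up for illustration; in practice the table would be trained.

```rust
use std::collections::HashMap;

/// Bag-of-words encoding: a sparse, order-free count vector over a fixed vocabulary.
fn bag_of_words(tokens: &[&str], vocab: &HashMap<&str, usize>) -> Vec<u32> {
    let mut counts = vec![0u32; vocab.len()];
    for t in tokens {
        if let Some(&idx) = vocab.get(t) {
            counts[idx] += 1;
        }
    }
    counts
}

/// Embedding lookup: each token id maps to a dense, continuous vector.
/// `table` has shape [vocab_size][dim]; the values would normally be learned.
fn embed(ids: &[usize], table: &[Vec<f32>]) -> Vec<Vec<f32>> {
    ids.iter().map(|&i| table[i].clone()).collect()
}

fn main() {
    let vocab: HashMap<&str, usize> = [("the", 0), ("cat", 1), ("sat", 2)].into_iter().collect();
    let tokens = ["the", "cat", "sat", "the"];

    // Sparse count representation.
    println!("{:?}", bag_of_words(&tokens, &vocab)); // [2, 1, 1]

    // Dense representation (toy, untrained 4-dimensional table).
    let table = vec![vec![0.1, 0.2, 0.3, 0.4]; vocab.len()];
    println!("{:?}", embed(&[0, 1], &table));
}
```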

The following embeddings shall be integrated from Candle:

| Type | Status |
| --- | --- |
| Embedding | Integrated - standard layer |
| Timestep Embedding | Not integrated so far - for relative (aka local) positional encoding |
| Positional Embedding | Not integrated so far - for absolute positional encoding |
| Falcon Rotary Positional Embedding | Not integrated so far - for absolute and relative positional encoding |
| Sinusoidal Positional Embedding | Not integrated so far - for absolute and relative positional encoding (see the sketch below the table) |
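As a reference for the last row, the following is a minimal standalone sketch of the sinusoidal positional embedding formula from "Attention Is All You Need" (PE(pos, 2i) = sin(pos / 10000^(2i/dim)), PE(pos, 2i+1) = cos(pos / 10000^(2i/dim))). It is not Candle's implementation; an integrated version would operate on Candle `Tensor`s rather than nested `Vec`s.

```rust
/// Build a [seq_len][dim] sinusoidal positional embedding table:
/// PE(pos, 2i)   = sin(pos / 10000^(2i / dim))
/// PE(pos, 2i+1) = cos(pos / 10000^(2i / dim))
fn sinusoidal_positional_embedding(seq_len: usize, dim: usize) -> Vec<Vec<f32>> {
    (0..seq_len)
        .map(|pos| {
            (0..dim)
                .map(|j| {
                    let i = (j / 2) as f32;
                    let angle = pos as f32 / 10000f32.powf(2.0 * i / dim as f32);
                    if j % 2 == 0 { angle.sin() } else { angle.cos() }
                })
                .collect()
        })
        .collect()
}

fn main() {
    let pe = sinusoidal_positional_embedding(4, 8);
    // Row 0 alternates 0/1; later rows vary smoothly with position,
    // which is what lets the model infer both absolute and relative positions.
    for row in &pe {
        println!("{:?}", row);
    }
}
```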

Notes:

- Dimensionality reduction such as PCA tends to perform poorly. Check e.g., here.
- Next steps will be:
  - here, by just using n randomly chosen dimensions.
  - Quantization (see the sketch after this list) via
    - Converting to binary quantization (aka Hamming distance). Speedup is roughly a factor of 3 (compare here).
    - Converting to scalar quantization (aka integer mapping). Speedup is roughly a factor of 24 (compare here).
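The sketch below only illustrates the mechanics of the two quantization schemes in dependency-free Rust: binary quantization keeps one bit per dimension (sign) and compares vectors via Hamming distance (XOR + popcount), while scalar quantization maps each dimension to an `i8` using a simple per-vector max-abs scale. A real implementation would calibrate the threshold and scale on the data distribution; the speedup factors quoted above come from the linked comparisons, not from this sketch.

```rust
/// Binary quantization: keep the sign bit of each dimension, packed into u64 words.
fn binary_quantize(v: &[f32]) -> Vec<u64> {
    v.chunks(64)
        .map(|chunk| {
            chunk.iter().enumerate().fold(0u64, |bits, (i, &x)| {
                if x > 0.0 { bits | (1 << i) } else { bits }
            })
        })
        .collect()
}

/// Hamming distance between two packed bit vectors: XOR, then count differing bits.
fn hamming_distance(a: &[u64], b: &[u64]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

/// Scalar quantization: map each f32 dimension to an i8 using a per-vector scale.
fn scalar_quantize(v: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = v.iter().fold(0.0f32, |m, &x| m.max(x.abs())).max(f32::EPSILON);
    let scale = max_abs / 127.0;
    (v.iter().map(|&x| (x / scale).round() as i8).collect(), scale)
}

fn main() {
    let a = [0.9, -0.3, 0.1, -0.7];
    let b = [0.8, 0.2, -0.1, -0.6];

    let (qa, qb) = (binary_quantize(&a), binary_quantize(&b));
    println!("hamming = {}", hamming_distance(&qa, &qb)); // 2 dimensions flip sign

    let (sa, scale) = scalar_quantize(&a);
    println!("scalar = {:?}, scale = {}", sa, scale);
}
```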