Both embedding (e.g., Word2Vec) and encoding (e.g., bag of words) are about representing data in a different space. Embedding usually refers to continuous vector spaces, typically capturing semantic relationships, whereas encoding also covers compression and dimensionality reduction.
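To make the distinction concrete, here is a minimal, dependency-free Rust sketch contrasting a bag-of-words encoding with an embedding lookup. The vocabulary and the lookup table are made up for illustration; in practice the table would be trained.

```rust
use std::collections::HashMap;

/// Bag-of-words encoding: a sparse, order-free count vector over a fixed vocabulary.
fn bag_of_words(tokens: &[&str], vocab: &HashMap<&str, usize>) -> Vec<u32> {
    let mut counts = vec![0u32; vocab.len()];
    for t in tokens {
        if let Some(&idx) = vocab.get(t) {
            counts[idx] += 1;
        }
    }
    counts
}

/// Embedding lookup: each token id maps to a dense, continuous vector.
/// `table` has shape [vocab_size][dim]; the values would normally be learned.
fn embed(ids: &[usize], table: &[Vec<f32>]) -> Vec<Vec<f32>> {
    ids.iter().map(|&i| table[i].clone()).collect()
}

fn main() {
    let vocab: HashMap<&str, usize> = [("the", 0), ("cat", 1), ("sat", 2)].into_iter().collect();
    let tokens = ["the", "cat", "sat", "the"];

    // Sparse count representation.
    println!("{:?}", bag_of_words(&tokens, &vocab)); // [2, 1, 1]

    // Dense representation (toy, untrained 4-dimensional table).
    let table = vec![vec![0.1, 0.2, 0.3, 0.4]; vocab.len()];
    println!("{:?}", embed(&[0, 1], &table));
}
```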

The following embeddings shall be integrated from Candle:

| Type | Status |
| --- | --- |
| Embedding | Integrated - standard layer |
| Timestep Embedding | Not integrated so far - for relative (aka local) positional encoding |
| Positional Embedding | Not integrated so far - for absolute positional encoding |
| Falcon Rotary Positional Embedding | Not integrated so far - for absolute and relative positional encoding |
| Sinusoidal Positional Embedding | Not integrated so far - for absolute and relative positional encoding (see the sketch below the table) |
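As a reference for the last row, the following is a minimal standalone sketch of the sinusoidal positional embedding formula from "Attention Is All You Need" (PE(pos, 2i) = sin(pos / 10000^(2i/dim)), PE(pos, 2i+1) = cos(pos / 10000^(2i/dim))). It is not Candle's implementation; an integrated version would operate on Candle `Tensor`s rather than nested `Vec`s.

```rust
/// Build a [seq_len][dim] sinusoidal positional embedding table:
/// PE(pos, 2i)   = sin(pos / 10000^(2i / dim))
/// PE(pos, 2i+1) = cos(pos / 10000^(2i / dim))
fn sinusoidal_positional_embedding(seq_len: usize, dim: usize) -> Vec<Vec<f32>> {
    (0..seq_len)
        .map(|pos| {
            (0..dim)
                .map(|j| {
                    let i = (j / 2) as f32;
                    let angle = pos as f32 / 10000f32.powf(2.0 * i / dim as f32);
                    if j % 2 == 0 { angle.sin() } else { angle.cos() }
                })
                .collect()
        })
        .collect()
}

fn main() {
    let pe = sinusoidal_positional_embedding(4, 8);
    // Row 0 alternates 0/1; later rows vary smoothly with position,
    // which is what lets the model infer both absolute and relative positions.
    for row in &pe {
        println!("{:?}", row);
    }
}
```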

Notes:

- Dimensionality reduction such as PCA tends to perform poorly. Check e.g., here.
- Next steps will be:
  - here, by just using n randomly chosen dimensions.
  - Quantization (see the sketch after this list) via
    - Converting to binary quantization (aka Hamming distance). Speedup is roughly a factor of 3 (compare here).
    - Converting to scalar quantization (aka integer mapping). Speedup is roughly a factor of 24 (compare here).
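The sketch below only illustrates the mechanics of the two quantization schemes in dependency-free Rust: binary quantization keeps one bit per dimension (sign) and compares vectors via Hamming distance (XOR + popcount), while scalar quantization maps each dimension to an `i8` using a simple per-vector max-abs scale. A real implementation would calibrate the threshold and scale on the data distribution; the speedup factors quoted above come from the linked comparisons, not from this sketch.

```rust
/// Binary quantization: keep the sign bit of each dimension, packed into u64 words.
fn binary_quantize(v: &[f32]) -> Vec<u64> {
    v.chunks(64)
        .map(|chunk| {
            chunk.iter().enumerate().fold(0u64, |bits, (i, &x)| {
                if x > 0.0 { bits | (1 << i) } else { bits }
            })
        })
        .collect()
}

/// Hamming distance between two packed bit vectors: XOR, then count differing bits.
fn hamming_distance(a: &[u64], b: &[u64]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

/// Scalar quantization: map each f32 dimension to an i8 using a per-vector scale.
fn scalar_quantize(v: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = v.iter().fold(0.0f32, |m, &x| m.max(x.abs())).max(f32::EPSILON);
    let scale = max_abs / 127.0;
    (v.iter().map(|&x| (x / scale).round() as i8).collect(), scale)
}

fn main() {
    let a = [0.9, -0.3, 0.1, -0.7];
    let b = [0.8, 0.2, -0.1, -0.6];

    let (qa, qb) = (binary_quantize(&a), binary_quantize(&b));
    println!("hamming = {}", hamming_distance(&qa, &qb)); // 2 dimensions flip sign

    let (sa, scale) = scalar_quantize(&a);
    println!("scalar = {:?}, scale = {}", sa, scale);
}
```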