
Thinker

The trained computer

We want to train a model that performs numeric computation such as 1 + 2 = 3. What do we need for computation?

  1. input
  2. a reusable compute unit -> a repeated transformer block
  3. memory -> concatenated embeddings accessed via cross attention
  4. an algorithm that gives the desired output

In our case, the computer and the algorithm are merged into the model; the memory is the concatenation of intermediate latent states (see the sketch below).
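To make this concrete, here is a minimal sketch of the compute loop, assuming a weight-shared callable `block(latent, input_emb, past)` (a hypothetical name, not necessarily the repo's actual API):

```python
import torch

# Minimal sketch: one weight-shared block applied repeatedly, with all past latent
# states concatenated into a memory that the block can cross-attend to.
def think(block, latent, input_emb, n_steps):
    memory = [latent]                    # memory = concatenated intermediate latents
    for _ in range(n_steps):             # reuse the same block -> "trained computer"
        past = torch.cat(memory, dim=1)  # (batch, steps * n_latents, dim)
        latent = block(latent, input_emb, past)
        memory.append(latent)
    return latent
```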

The algorithms we decided to learn are (from easy to difficult):

  1. copy input to output, with some variations
  2. addition
  3. multiplication
  4. number factorisation

Task 3 in particular will help test how this method performs under variable complexity. Since the last task relies heavily on memory to reduce computation, we will observe how the model uses the memory it is given. A hypothetical task generator is sketched below.
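For illustration, a task generator could look like the following; the actual tokenisation and sampling in the repo may differ, and all names here are hypothetical.

```python
import random

# Hypothetical generator: returns an (input string, target string) pair per task.
def make_example(task, max_val=999):
    a, b = random.randint(2, max_val), random.randint(2, max_val)
    if task == "copy":
        return str(a), str(a)                      # copy input to output
    if task == "add":
        return f"{a}+{b}", str(a + b)
    if task == "mul":
        return f"{a}*{b}", str(a * b)
    if task == "factor":
        n = a * b                                  # guaranteed composite number
        return str(n), f"{min(a, b)}*{max(a, b)}"  # one valid factorisation of n
    raise ValueError(f"unknown task: {task}")
```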

Check out:

Based on the observed results, we could reuse the same approach on a language-modeling task, following the original idea.

About the model
The model is a cross-attention latent-based transformer (like Perceiver):

  1. layer weight sharing to allow a reusable compute block
  2. hidden latent vectors for information passing
  3. cross attention on the input
  4. cross attention on past latents (wider information passing)

Here's a visual:

Here's a draft of the initial idea:
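For reference, here is a rough PyTorch sketch of such a model, expanding the loop above into a weight-shared block with the two cross attentions; all names and dimensions are illustrative assumptions, not the repository's actual code.

```python
import torch
import torch.nn as nn

class ThinkerBlock(nn.Module):
    """One compute step: cross-attend to the input, cross-attend to past latents, then MLP."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.read_input = nn.MultiheadAttention(dim, heads, batch_first=True)  # 3. cross attention on input
        self.read_past = nn.MultiheadAttention(dim, heads, batch_first=True)   # 4. cross attention on past latents
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm1, self.norm2, self.norm3 = nn.LayerNorm(dim), nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, latent, input_emb, past_latents):
        latent = latent + self.read_input(self.norm1(latent), input_emb, input_emb)[0]
        latent = latent + self.read_past(self.norm2(latent), past_latents, past_latents)[0]
        return latent + self.mlp(self.norm3(latent))

class Thinker(nn.Module):
    """Applies a single shared block for n_steps (1. layer weight sharing, 2. hidden latents)."""
    def __init__(self, dim=256, n_latents=16, n_steps=6):
        super().__init__()
        self.latent = nn.Parameter(torch.randn(1, n_latents, dim))  # learned hidden latent vectors
        self.block = ThinkerBlock(dim)                              # one block, reused every step
        self.n_steps = n_steps

    def forward(self, input_emb):  # input_emb: (batch, seq_len, dim)
        latent = self.latent.expand(input_emb.size(0), -1, -1)
        memory = [latent]
        for _ in range(self.n_steps):
            latent = self.block(latent, input_emb, torch.cat(memory, dim=1))
            memory.append(latent)
        return latent  # an output head would decode the answer tokens from this
```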

Similar ideas:

  1. Looped Transformers - paper - x_post - code
