
Energy Transformer

A novel architecture that is simultaneously a Transformer, an Energy-Based Model, and an Associative Memory. See our paper, and the barebones project homepage for important links.

ET Details

Structure

This repository has been cleaned and rewritten for clear communication rather than for feature completeness (as in the paper's experiments). The architecture is built with equinox, an excellent, barebones JAX library that looks a lot like PyTorch. All pseudocode examples in this README use equinox.

For legacy purposes, we include the flax code used in the original paper in the og_implementation folder.

Introduction

Energy Transformer (ET) is a continuous dynamical system with a tractable energy -- this means the forward pass through the model can be done using autograd! It also comes with additional benefits, such as being highly parameter-efficient and interpretable (see the paper, Table 15 & Figs. 4, 5, 6). Pseudocode for the energy, defined on the LayerNormed token representations g, is below:

import equinox as eqx
import jax
import jax.numpy as jnp
import jax.random as jr
class EnergyTransformer(eqx.Module):
    # Define all parameters
    Wq: jax.Array  # n_heads, head_dim, token_dim
    Wk: jax.Array  # n_heads, head_dim, token_dim
    Xi: jax.Array  # n_memories, token_dim

    def __init__(self, token_dim, n_heads, head_dim, n_memories):
        ...

    def attn_energy(self, g):
        # Energy of multi-head attention; g has shape (n_tokens, token_dim)
        Q = jnp.einsum("qd,hzd->qhz", g, self.Wq)
        K = jnp.einsum("kd,hzd->khz", g, self.Wk)

        head_dim = self.Wq.shape[1]
        beta = 1 / jnp.sqrt(head_dim)
        return -1 / beta * jax.nn.logsumexp(beta * jnp.einsum("qhz,khz->hqk", Q, K), -1).sum()
    
    def hn_energy(self, g):
        # Energy of the Hopfield Network module (ReLU activation, whose integral is 1/2 relu(.)^2)
        return -1 / 2 * (jax.nn.relu(jnp.einsum("nd,md->nm", g, self.Xi)) ** 2).sum()

    def energy(self, g):
        return self.attn_energy(g) + self.hn_energy(g)

et = EnergyTransformer(...)

key = jr.PRNGKey(0)
x = jr.normal(key, (n_tokens, token_dim))

for i in range(n_steps):
    g = lnorm(x)                                 # LayerNorm the tokens
    E, dEdg = jax.value_and_grad(et.energy)(g)   # forward pass = energy and its gradient
    x = x - alpha * dEdg                         # descend the energy

There is also an energy on the LayerNorm that we cannot ignore, but the above is an excellent starting point for the architecture.
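For intuition, here is a minimal sketch of that LayerNorm term, following the Lagrangian formulation described in the paper; the names ln_lagrangian, lnorm, gamma, and delta are illustrative rather than this repository's API:

import jax.numpy as jnp

def ln_lagrangian(x, gamma, delta, eps=1e-5):
    # Lagrangian of the LayerNorm; its gradient w.r.t. x is the LayerNorm output g
    D = x.shape[-1]
    var = jnp.var(x, axis=-1)  # per-token variance
    return (D * gamma * jnp.sqrt(var + eps)).sum() + (x @ delta).sum()

def lnorm(x, gamma, delta, eps=1e-5):
    # g = dL/dx: the familiar LayerNorm with scalar scale gamma and bias vector delta
    mu = x.mean(-1, keepdims=True)
    var = jnp.var(x, axis=-1, keepdims=True)
    return gamma * (x - mu) / jnp.sqrt(var + eps) + delta

Roughly, the LayerNorm contributes a term of the form sum(x * g) - L(x) to the total energy; see the paper for the exact formulation.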

See working code in tutorial.py (using random weights); the architecture itself is defined in architecture.py, and the trained weights from the paper are loaded in image_core.py.
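If you want to iterate the descent loop many times in your own code, the update can be compiled into a single step. A minimal sketch using equinox's filter_jit, assuming the EnergyTransformer class above and the illustrative lnorm from the previous sketch:

import equinox as eqx
import jax

@eqx.filter_jit
def descent_step(et, x, gamma, delta, alpha):
    # One gradient-descent step on the energy, taken w.r.t. the LayerNormed tokens g
    g = lnorm(x, gamma, delta)
    E, dEdg = jax.value_and_grad(et.energy)(g)
    return x - alpha * dEdg, E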

Quick start

We are still in the process of cleaning up the environment setup for this repository. For the main tutorial code, you can run:

conda env create -f environment.yml
conda activate et-jax
pip install -r requirements.txt

The demo code (randomized weights) and the environment work on a CPU. To observe the energy behavior:

python tutorial.py

The demo with trained weights also works on a CPU; it shows how ET can be applied to MASKed images:

python image_core.py

Testing

Testing is currently very limited:

pytest tests
