Pure Go BERT-family transformer inference. No CGO. No ONNX. No native dependencies.
```go
import "github.com/MichaelAyles/goformer"

model, err := goformer.Load("./bge-small-en-v1.5")
if err != nil {
    log.Fatal(err)
}
embedding, err := model.Embed("DMA channel configuration")
// embedding is a []float32 of length model.Dims()
```

A Go library that loads BERT-family model weights directly from the HuggingFace safetensors format and runs inference to produce embeddings. Point it at a model directory downloaded from HuggingFace, call Embed(), get a float32 slice. No Python export step, no ONNX conversion, no native libraries.
Reference model: BGE-small-en-v1.5 (384-dim, 6 layers, 33M params). Any BERT-family model in safetensors format with a compatible tokeniser should work.
Every existing pure Go option for running transformer models requires ONNX format. To get ONNX, you need a Python environment with transformers, optimum, torch, and onnx — a non-trivial dependency chain just to produce the artifact your Go binary needs. If the point of writing in Go is to escape Python in production, requiring Python in your build pipeline undermines the argument.
goformer loads directly from the canonical safetensors weights published by model authors on HuggingFace. Same files, same format, same config. No intermediate conversion.
```go
// Load reads model weights from a HuggingFace model directory
// (config.json, tokenizer.json, model.safetensors).
func Load(path string) (*Model, error)

// Embed produces a normalised embedding vector for the input text.
func (m *Model) Embed(text string) ([]float32, error)

// EmbedBatch produces embeddings for multiple texts, padded to the
// longest sequence and processed together.
func (m *Model) EmbedBatch(texts []string) ([][]float32, error)

// Dims returns the embedding dimensionality (e.g. 384 for BGE-small).
func (m *Model) Dims() int

// MaxSeqLen returns the maximum sequence length the model supports.
func (m *Model) MaxSeqLen() int
```

That is the entire public surface.
Download a model from HuggingFace:

```shell
# Using git (requires git-lfs)
git clone https://huggingface.co/BAAI/bge-small-en-v1.5

# Or download files manually: config.json, tokenizer.json, model.safetensors
```
Load and embed:

```go
model, err := goformer.Load("./bge-small-en-v1.5")
if err != nil {
    log.Fatal(err)
}

// Single embedding
vec, err := model.Embed("What is a DMA controller?")

// Batch embedding
vecs, err := model.EmbedBatch([]string{
    "What is a DMA controller?",
    "How does SPI communication work?",
    "Configure the UART baud rate",
})
```
Compute similarity:

```go
func cosineSimilarity(a, b []float32) float32 {
    var dot float32
    for i := range a {
        dot += a[i] * b[i]
    }
    return dot // vectors are already L2-normalised
}
```
Benchmarked with BGE-small-en-v1.5 on Apple M1:
| Input | goformer (pure Go) | PyTorch (CPU) | ONNX Runtime (CPU) |
|---|---|---|---|
| Short (~5 tokens) | 154ms | 12.9ms | 3.0ms |
| Medium (~11 tokens) | 287ms | 13.4ms | 4.2ms |
| Long (~40 tokens) | 1.1s | 13.8ms | 8.8ms |
| Batch of 8 | 2.4s | 22.5ms | 17.6ms |
goformer is ~10-50x slower than optimised native runtimes. The trade-off is zero native dependencies — no CGO, no ONNX conversion pipeline, no Python in your build. For applications where embedding latency is not the bottleneck (offline indexing, RAG pipelines, document processing), this is an acceptable cost.
| Operation | Time | Allocated |
|---|---|---|
| Model load | 91ms | 261MB |
| MatMul 384×384 | 31ms | 0.6MB |
| MatMul 384×1536 | 158ms | 2.3MB |
| LayerNorm (128×384) | 139µs | 0 |
| Softmax (12×128×128) | 1.0ms | 0 |
| GELU (128×1536) | 436µs | 0 |
| Tokenise | 1.9µs | 1KB |
MatMul dominates inference time. There is headroom for tiling optimisation and SIMD.
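goformer's actual kernel isn't reproduced here, but the loop-tiling idea mentioned above can be sketched in a few lines. This is an illustration only; the function name and tile size are made up for the example:

```go
package main

import "fmt"

// matMulTiled multiplies an m×k matrix a by a k×n matrix b (both
// row-major float32 slices) into c, walking fixed-size tiles so the
// working set of each inner loop stays in cache. Tile size 64 is
// illustrative; the best value depends on the target CPU.
func matMulTiled(a, b, c []float32, m, k, n int) {
	const tile = 64
	for i0 := 0; i0 < m; i0 += tile {
		for p0 := 0; p0 < k; p0 += tile {
			for j0 := 0; j0 < n; j0 += tile {
				iMax, pMax, jMax := minInt(i0+tile, m), minInt(p0+tile, k), minInt(j0+tile, n)
				for i := i0; i < iMax; i++ {
					for p := p0; p < pMax; p++ {
						aip := a[i*k+p]
						for j := j0; j < jMax; j++ {
							c[i*n+j] += aip * b[p*n+j]
						}
					}
				}
			}
		}
	}
}

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

func main() {
	a := []float32{1, 2, 3, 4} // 2×2: [[1 2] [3 4]]
	b := []float32{5, 6, 7, 8} // 2×2: [[5 6] [7 8]]
	c := make([]float32, 4)
	matMulTiled(a, b, c, 2, 2, 2)
	fmt.Println(c) // [19 22 43 50]
}
```

Hoisting `a[i*k+p]` out of the innermost loop keeps that loop a pure streaming multiply-add over contiguous rows of b and c, which is also the shape a future SIMD version would want.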
- Safetensors parser reads the binary weight file directly — 8-byte header length, JSON metadata, raw float32 data at specified offsets.
- WordPiece tokeniser parses the HuggingFace `tokenizer.json` and produces token IDs + attention masks.
- BERT forward pass: embedding lookup → N transformer layers (self-attention + FFN + layer norm with residual connections) → mean pooling → L2 normalisation.
- All math is pure Go float32 operations. Matrix multiplication uses loop tiling for cache locality.
All outputs validated against the HuggingFace Python transformers library:
| Test case | Cosine similarity | Max element-wise diff |
|---|---|---|
| `DMA channel configuration` | 1.000000 | 0.000292 |
| `The quick brown fox jumps over the lazy dog` | 0.999999 | 0.000389 |
| `Hello` | 0.999999 | 0.000213 |
| Long paragraph (40 tokens) | 0.999999 | 0.000241 |
| `café résumé naïve` (unicode) | 0.999999 | 0.000211 |
| `Hello, world! How's it going?` | 1.000000 | 0.000199 |
- Token IDs: exact match against Python for all test cases
- Batch embeddings match single embeddings
- CPU only. No GPU acceleration.
- Inference only. No training or fine-tuning.
- BERT-family only. Encoder models with safetensors weights. Not GPT, T5, or other architectures.
- F32 and F16 weights only. Float16 weights are converted to float32 at load time.
MIT
