A character-level decoder-only Transformer (GPT) implemented in BASH.
Yes, BASH.
The script trains on a small list of names and generates new ones. All model state, intermediate values, and gradients live inside the shell script. Floating-point arithmetic is handled through bc -l, which is exactly as sensible as it sounds.
This is either educational or a poor life choice, depending on your mood.
> [!IMPORTANT]
> **Performance Notice:** This original version (v1) is hand-written for maximum educational transparency but is extremely slow (~7 hours for a training run) due to BASH process-forking overhead.
>
> If you actually want to see the model train in a reasonable amount of time (~11 minutes), please see the v2 README for the optimised version using bc coprocesses and AI-assisted refactoring.
bashGPT is a minimal GPT-style language model written as a single BASH program.
It includes:
- character-level tokenisation
- token embeddings
- positional embeddings
- causal multi-head self-attention
- RMSNorm
- an MLP with ReLU
- softmax
- negative log-likelihood loss
- Adam optimiser
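As a flavour of the first ingredient, character-level tokenisation in pure shell might look something like this. This is a hedged sketch: the function and variable names here are illustrative, not the script's actual identifiers.

```bash
# A hedged sketch of character-level tokenisation in shell.
names=(emma olivia ava)

# Collect the sorted set of unique characters across all names.
chars=$(printf '%s\n' "${names[@]}" | grep -o . | LC_ALL=C sort -u | tr -d '\n')

declare -A stoi   # character  -> integer id
declare -a itos   # integer id -> character
for ((i = 0; i < ${#chars}; i++)); do
  c=${chars:i:1}
  stoi[$c]=$i
  itos[$i]=$c
done

# Encode a word as space-separated token ids.
encode() {
  local word=$1 j
  local -a out=()
  for ((j = 0; j < ${#word}; j++)); do
    out+=("${stoi[${word:j:1}]}")
  done
  printf '%s\n' "${out[*]}"
}

encode ava
```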
The current model is intentionally tiny.
- 1 layer
- 2 attention heads
- 4-dimensional embeddings
- context window of 8 tokens
That keeps the runtime barely civilised.
I wanted to understand the mechanism, not just the tooling. BASH gave me the lowest possible cognitive load, which meant I could focus on the raw math. It also introduced several new and unnecessary problems, which turned out to be part of the lesson.
Most Transformer code is hidden behind tensor libraries, kernels, and enough abstraction to make the mechanism harder to see. This script leaves the mechanism exposed.
Each value is tracked directly. Each gradient is stored explicitly. The forward pass is assembled step by step. The backward pass is walked by hand. If you want to inspect the moving parts of a GPT without leaving the terminal, this is one way to do it.
It is also a slightly unreasonable way to spend time, which is part of the appeal.
Each scalar in the computation graph gets an ID. Data, gradients, parent nodes, and local gradients are stored in BASH associative arrays.
There are no tensors here. Everything is scalar math.
The backward pass uses an iterative topological traversal. No recursive autograd. No hidden engine. Just explicit graph bookkeeping in shell.
Non-integer math is delegated to bc -l with fixed precision. This works well enough for a proof of concept and slowly enough to build character.
On first run, the script downloads the names dataset automatically. It tokenises the data at character level, trains for 100 steps, and then samples 20 outputs.
NOTE: Ideally, you want 1000+ steps before the samples become at all useful. 100 was chosen because, well... BASH. If you want a more performant version, try v2.
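Sampling can also be done in pure shell. Here is a hedged sketch of drawing a token index from a probability list using $RANDOM; it is not the script's actual sampler, just the idea.

```bash
# Multinomial sampling in shell: walk the cumulative distribution until
# a uniform draw falls below it. Names here are illustrative.
sample() {
  local r p cum=0 i=0
  r=$(echo "scale=8; $RANDOM / 32768" | bc -l)   # uniform in [0, 1)
  for p in "$@"; do
    cum=$(echo "$cum + $p" | bc -l)
    if [ "$(echo "$r < $cum" | bc -l)" = "1" ]; then
      echo "$i"
      return
    fi
    i=$((i + 1))
  done
  echo "$((i - 1))"   # guard against probabilities summing to < 1
}
sample 0.1 0.2 0.7   # prints 0, 1, or 2
```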
Expect this to be slow.
BASH is many things. A high-performance numerical runtime is not one of them. Training happens one scalar operation at a time, with a lot of process overhead and very little mercy shown to the CPU.
Expect the outputs to be limited too. The model is small, the training run is short, and the numerical behaviour is less stable than you would get from standard ML tooling.
Still, it trains. It samples. It behaves like a language model. In BASH.
That is the whole point.
This project has very clear constraints.
- no batching
- no tensors
- no useful scaling story
- no competition with standard frameworks on speed, stability, or ergonomics
Some implementation choices are simplified so the whole thing can remain understandable inside a shell script.
```bash
chmod +x bashGPT.bash
./bashGPT.bash
```

Requirements:

- BASH 4+
- bc
- curl
On the first run, the script downloads the training data automatically.
- This repository is for studying mechanism, not for training serious models.
- If you want performance, use literally almost anything other than BASH.
- If you want to inspect the gradient state directly, it is sitting in BASH associative arrays:
```bash
for id in $(printf '%s\n' "${!values_grad[@]}" | sort -n | head -20); do
  printf '%s\t%s\n' "$id" "${values_grad[$id]}"
done
```

- If you want to watch a tiny GPT crawl out of a shell script and somehow function, this could be useful.
Inspired by Andrej Karpathy's makemore and his lectures on neural networks. His teaching made this project possible. The conceptual debt to Karpathy's work is real, and gratefully acknowledged.
The included dataset (input.txt) contains, as an example, the 32K most common names taken from ssa.gov for the year 2018, slightly neatened.
The implementation is an original rewrite in BASH. No Python code or GPU was harmed in the making of this script. The CPU is a different matter altogether.
The philosophical implications of this project are explored in A Philosophical Approach to IT Architecture.