
bashGPT

A character-level decoder-only Transformer (GPT) implemented in BASH.

Yes, BASH.

The script trains on a small list of names and generates new ones. All model state, intermediate values, and gradients live inside the shell script. Floating-point arithmetic is handled through bc -l, which is exactly as sensible as it sounds.

This is either educational or a poor life choice, depending on your mood.

Important

Performance Notice: This original version (v1) is hand-written for maximum educational transparency but is extremely slow (~7 hours for a training run) due to BASH process forking overhead. If you actually want to see the model train in a reasonable amount of time (~11 minutes), please see the v2 README for the optimised version using bc coprocesses and AI-assisted refactoring.

What this is

bashGPT is a minimal GPT-style language model written as a single BASH program.

It includes:

  • character-level tokenisation
  • token embeddings
  • positional embeddings
  • causal multi-head self-attention
  • RMSNorm
  • an MLP with ReLU
  • softmax
  • negative log-likelihood loss
  • Adam optimiser

The current model is intentionally tiny.

  • 1 layer
  • 2 attention heads
  • 4-dimensional embeddings
  • context window of 8 tokens

That keeps the runtime barely civilised.
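To make the character-level tokenisation concrete, here is a minimal standalone sketch. The variable and function names (stoi, itos, encode) are illustrative, not the ones the script uses, and reserving id 0 for an end-of-name terminator mirrors makemore's convention rather than anything confirmed about this script.

```shell
# Build a character vocabulary from a few names and encode one of them.
declare -A stoi     # char -> token id
declare -a itos     # token id -> char

vocab=""
while IFS= read -r name; do
  for ((i = 0; i < ${#name}; i++)); do
    c=${name:i:1}
    [[ $vocab == *"$c"* ]] || vocab+=$c
  done
done <<< $'emma\nolivia\nava'

# Sorted characters get ids 1..N; id 0 is the end-of-name terminator.
sorted=$(printf '%s' "$vocab" | grep -o . | sort | tr -d '\n')
nl=$'\n'
itos[0]=$nl; stoi[$nl]=0
for ((i = 0; i < ${#sorted}; i++)); do
  c=${sorted:i:1}
  stoi[$c]=$((i + 1))
  itos[$((i + 1))]=$c
done

encode() {          # print a name as a space-separated id sequence
  local s=$1 out="" i
  for ((i = 0; i < ${#s}; i++)); do out+="${stoi[${s:i:1}]} "; done
  printf '%s\n' "${out% }"
}
encode emma         # -> 2 5 5 1
```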

Why this exists

I wanted to understand the mechanism, not just the tooling. BASH gave me the lowest possible cognitive load, which meant I could focus on the raw math. It also introduced several new and unnecessary problems, which turned out to be part of the lesson.

Most Transformer code is hidden behind tensor libraries, kernels, and enough abstraction to make the mechanism harder to see. This script leaves the mechanism exposed.

Each value is tracked directly. Each gradient is stored explicitly. The forward pass is assembled step by step. The backward pass is walked by hand. If you want to inspect the moving parts of a GPT without leaving the terminal, this is one way to do it.

It is also a slightly unreasonable way to spend time, which is part of the appeal.

How it works

Scalar autograd

Each scalar in the computation graph gets an ID. Data, gradients, parent nodes, and local gradients are stored in BASH associative arrays.

There are no tensors here. Everything is scalar math.
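The shape of one such node can be sketched like this, with illustrative array and function names (val, grad, op, parents, new_value) rather than the script's actual ones. Note the result comes back through a global: `$( )` would fork a subshell and lose the array writes.

```shell
# One scalar node per ID, spread across BASH associative arrays.
declare -A val grad op parents
next_id=0

new_value() {   # usage: new_value DATA [OP] [PARENT_IDS]; sets NEW_ID
  local id=$next_id
  val[$id]=$1; op[$id]=${2:-leaf}; parents[$id]=${3:-}; grad[$id]=0
  next_id=$((next_id + 1))
  NEW_ID=$id    # returned via a global so the array writes survive
}

new_value 2.0;  a=$NEW_ID
new_value 3.0;  b=$NEW_ID
# c = a * b: bc does the float math, the arrays record the graph edge
new_value "$(echo "${val[$a]} * ${val[$b]}" | bc -l)" mul "$a $b"; c=$NEW_ID
echo "id=$c val=${val[$c]} parents=${parents[$c]}"
```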

Backpropagation

The backward pass uses an iterative topological traversal. No recursive autograd. No hidden engine. Just explicit graph bookkeeping in shell.
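A minimal sketch of that traversal, on a three-node graph (node 2 = node 0 * node 1) and with illustrative array names: an explicit stack yields a post-order in which every node appears after its parents, and backprop then walks it in reverse.

```shell
declare -A parents visited
parents[2]="0 1"; parents[1]=""; parents[0]=""

topo=()      # ends up with every node after its parents
stack=(2)    # start the walk from the output/loss node
while ((${#stack[@]})); do
  top=$((${#stack[@]} - 1))
  id=${stack[$top]}
  if [[ -z ${visited[$id]:-} ]]; then
    visited[$id]=1                                 # first visit: descend
    for p in ${parents[$id]}; do stack+=("$p"); done
  else
    unset "stack[$top]"                            # second visit: emit
    [[ " ${topo[*]} " == *" $id "* ]] || topo+=("$id")
  fi
done
echo "${topo[*]}"   # -> 1 0 2; walk it in reverse to backprop
```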

Arithmetic

Non-integer math is delegated to bc -l with fixed precision. This works well enough for a proof of concept and slowly enough to build character.

Training

On first run, the script downloads the names dataset automatically. It tokenises the data at character level, trains for 100 steps, and then samples 20 outputs.

NOTE: Ideally, you want 1000+ steps before the output is any use. 100 was chosen because, well... BASH. If you want a more performant version, try v2.

What to expect

Expect this to be slow.

BASH is many things. A high-performance numerical runtime is not one of them. Training happens one scalar operation at a time, with a lot of process overhead and very little mercy shown to the CPU.

Expect the outputs to be limited too. The model is small, the training run is short, and the numerical behaviour is less stable than you would get from standard ML tooling.

Still, it trains. It samples. It behaves like a language model. In BASH.

That is the whole point.

Limits

This project has very clear constraints.

  • no batching
  • no tensors
  • no useful scaling story
  • no competition with standard frameworks on speed, stability, or ergonomics

Some implementation choices are simplified so the whole thing can remain understandable inside a shell script.

Running it

chmod +x bashGPT.bash
./bashGPT.bash

Requirements

  • BASH 4+
  • bc
  • curl

On the first run, the script downloads the training data automatically.

Notes

  • This repository is for studying mechanism, not for training serious models.
  • If you want performance, use literally almost anything other than BASH.
  • If you want to inspect the gradient state directly, it is sitting in BASH associative arrays:
    for id in $(printf '%s\n' "${!values_grad[@]}" | sort -n | head -20); do
      printf '%s\t%s\n' "$id" "${values_grad[$id]}"
    done
  • If you want to watch a tiny GPT crawl out of a shell script and somehow function, this could be useful.

Acknowledgments

Inspired by Andrej Karpathy's makemore and his lectures on neural networks. His teaching made this project possible. The conceptual debt to Karpathy's work is real, and gratefully acknowledged.

The included example dataset (input.txt) contains the 32K most common names taken from ssa.gov for the year 2018, slightly neatened.

The implementation is an original rewrite in BASH. No Python code or GPU was harmed in the making of this script. The CPU is a different matter altogether.

Further reading

The philosophical implications of this project are explored in A Philosophical Approach to IT Architecture.
