A few-file, all-in-one C engine for Transformer AI models: inference, training, tokenizer creation, chat, and vision.
Built from scratch over 18 months (from March 2024 to August 2025) during my lunch breaks and weekend nights, TRiP exists just because I wanted to truly understand the transformer internals - from the matrix multiplications up.
TRiP's purpose is purely educational, for me and for anyone willing to learn about transformers. It supports Gemma 1, Llama 2, PaliGemma, and GPT-2, with full inference and training. It does not aim to track the latest model releases, and is not trying to compete with llama.cpp.
NOTE: since people are asking, here's what's AI-generated in the code:
- the JSON parser (with some fixes)
- the safetensors checkpoint save function
- the JPEG/X11 handling functions (I had no interest in writing them)
- the final file split (I initially wrote everything as main.c :D )
- some revisions of the comments before I made the commit
- this readme, for the most part :D
That's all, I think; the rest is all hand-coded by me. It would have made no sense otherwise, since the whole point of doing this was to get as close as possible to a full-stack understanding of the transformer internals.
- Architectures: Llama 2, Gemma 1.0/1.1, PaliGemma 1 (vision+language), GPT-2
- Checkpoint formats: SafeTensors (HuggingFace), Karpathy's llama2.c and gpt2 formats
- Weight types: bf16, float16, float32
- Training: full backpropagation with AdamW, cosine annealing LR, gradient clipping
- Tokenizer: BPE (SentencePiece-compatible), with vocabulary creation from scratch
- Inference: greedy, top-k, and nucleus (top-p) sampling
- Chat: interactive chat with Llama, Gemma, and TinyLlama chat templates
- Vision: multimodal inference with PaliGemma (JPEG input, X11 display)
- Memory: RAM-optimized mode via mmap for large models on limited hardware
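About that last point: the memory-optimized mode boils down to mapping the checkpoint instead of copying it. Here is a minimal sketch of the idea in plain POSIX, not TRiP's actual loader:

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Map a checkpoint read-only so the kernel pages weights in on demand,
// instead of copying every tensor into heap allocations up front.
void *map_checkpoint(const char *path, size_t *len_out) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return NULL; }
    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); close(fd); return NULL; }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping outlives the descriptor
    if (p == MAP_FAILED) { perror("mmap"); return NULL; }
    *len_out = (size_t)st.st_size;
    return p;   // tensor data sits at format-specific offsets inside the file
}
```

Pages that are never touched never occupy RAM, and the kernel can evict clean pages under memory pressure, which is what makes large models usable on small machines, at the cost of slower, demand-paged access.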
- gcc (version 13 or higher recommended, for bfloat16 support; with OpenMP)
- libjpeg-dev (or libjpeg62-turbo-dev)
- libx11-dev
WARNING: do NOT expect higher performance from bfloat16 or float16 on CPUs. Today's CPUs have little or no native arithmetic for these formats, so every value has to be widened to float32 anyway, and float32 always performs best. That surprised me a lot, too.
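The reason is that bfloat16 is just the top 16 bits of an IEEE-754 float32: every operand must be widened before the CPU can multiply, and the result truncated back (gcc 13's native bfloat16 type hides the mechanics, but the work remains). A minimal sketch of that round trip, not TRiP's actual conversion code:

```c
#include <stdint.h>
#include <string.h>

// bfloat16 keeps only the sign, exponent, and top 7 mantissa bits of a
// float32, so conversion is a 16-bit shift. This per-element round trip on
// every operand is why bf16 weights aren't faster than float32 on most CPUs.
static inline float bf16_to_f32(uint16_t b) {
    uint32_t u = (uint32_t)b << 16;   // low mantissa bits come back as zero
    float f;
    memcpy(&f, &u, sizeof f);         // safe type-punning
    return f;
}

static inline uint16_t f32_to_bf16(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return (uint16_t)(u >> 16);       // truncation; real code may round-to-nearest
}
```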
On Debian:

```
sudo apt install build-essential libomp-dev libjpeg62-turbo-dev libx11-dev
```

On Ubuntu:

```
sudo apt install build-essential libomp-dev libjpeg-dev libx11-dev
```

TRiP runs natively under WSL (Windows Subsystem for Linux). To enable the X11 display features (vision mode, image display), install an X server on the Windows side (e.g., VcXsrv). Then, in your WSL terminal, before running TRiP:

```
export DISPLAY=:0
```

If using WSL2 (most setups), use instead:

```
export DISPLAY=$(cat /etc/resolv.conf | grep nameserver | awk '{print $2}'):0
```

X11 is only needed for vision mode. Chat, inference, and training work without it.

To build:

```
make
```

That's it. No cmake, no external frameworks, no Python. Just `make`.
Download a Gemma-2B-IT model from HuggingFace (safetensors format), then:

```
./trip --chat \
    --checkpoint gemma-2b-it/model.safetensors \
    --tokenizer gemma-2b-it/tokenizer.json \
    --chat_scheme GEMMA
```

To run one-shot inference on a prompt:

```
./trip --decode \
    --input_text "The capital of Italy is" \
    --checkpoint gemma-2b-it/model.safetensors \
    --tokenizer gemma-2b-it/tokenizer.json
```

Or from a text file:

```
./trip --decode prompt.txt \
    --checkpoint gemma-2b-it/model.safetensors \
    --tokenizer gemma-2b-it/tokenizer.json
```

To train a model:

```
./trip --train \
    --checkpoint my_model/model.safetensors \
    --tokenizer my_model/tokenizer.json \
    --train_data my_dataset.txt \
    --train_config training_args.json
```

To run multimodal (vision) inference:

```
./trip --vision photo.jpg \
    --checkpoint paligemma/model.safetensors \
    --tokenizer paligemma/tokenizer.json \
    --input_text "Describe this image"
```

To build a tokenizer vocabulary from scratch:

```
./trip --build_vocab corpus.txt --vocab_size 32000 --tokenizer my_tokenizer.json
```

USAGE:

```
./trip <ACTION> [OPTIONS...]
```
Actions:

| Flag | Description |
|---|---|
| `--decode [file]` | Run inference on a prompt (from file, `--input_text`, or stdin) |
| `--chat` | Interactive chat session |
| `--vision [image.jpg]` | Multimodal inference with an image |
| `--train` | Train the model |
| `--create` | Create a new model from a configuration file |
| `--build_vocab <data.txt>` | Build a new tokenizer vocabulary from a text corpus |
| `--utest` | Run unit tests |
| `--help` | Show help |
Model and tokenizer options:

| Flag | Default | Description |
|---|---|---|
| `--checkpoint <path>` | `default.model` | Path to model checkpoint file(s) |
| `--checkpoint_type <type>` | `SAFETENSORS` | Format: `SAFETENSORS`, `LLAMA2_AK`, `GPT2_AK` |
| `--configuration <path>` | (auto) | Path to `config.json` (for SafeTensors) |
| `--tokenizer <path>` | `default.tokenizer` | Path to tokenizer file |
| `--tokenizer_format <type>` | `JSON_HUGGINGFACE` | Format: `JSON_HUGGINGFACE`, `LLAMA2_AK`, `GPT2_AK` |
| `--tokenizer_type <type>` | `SENTENCEPIECE` | Algorithm: `SENTENCEPIECE`, `TRIP` |
Generation and chat options:

| Flag | Default | Description |
|---|---|---|
| `--input_text "<prompt>"` | — | Provide prompt text directly on the command line |
| `--system_prompt "<text>"` | — | System prompt for chat mode |
| `--chat_scheme <scheme>` | (none) | Chat template: `LLAMA`, `TINY_LLAMA`, `GEMMA` |
| `--chat_save_context <file>` | — | Pre-process and save chat context for faster startup |
| `--chat_load_context <file>` | — | Load a previously saved chat context |
| `--temperature <value>` | `1.0` | Sampling temperature; `0.0` = greedy (always pick the most probable token) |
| `--top_p <value>` | `0.9` | Nucleus sampling: sample from the smallest set of tokens whose cumulative probability exceeds this value (see the sketch below) |
| `--top_k <value>` | (disabled) | Top-k sampling: sample from the k most probable tokens |
| `--ram` | (off) | Memory-map weights instead of loading them (slower, uses less RAM) |
Training options:

| Flag | Default | Description |
|---|---|---|
| `--train_config <path>` | `training_args.json` | Path to training configuration JSON |
| `--train_data <path>` | `training_data.txt` | Path to training data (plain text) |
TRiP is organized into seven files. Open `trip.h` for the complete map.

| File | Lines | What it contains |
|---|---|---|
| `trip.h` | ~900 | The map: every type, struct, global, and declaration |
| `math.c` | ~3000 | Tensor ops, each forward+backward pair side by side: matmul, softmax, layernorm, RMSNorm, RoPE, attention, FFN activations, vector arithmetic |
| `forward.c` | ~1500 | Forward-pass orchestration + token sampling |
| `backward.c` | ~1500 | Backward pass + AdamW optimizer + gradient management |
| `model.c` | ~5500 | Checkpoint I/O, model init, memory management, tokenizer, vision preprocessing |
| `utils.c` | ~1000 | Logging, JSON parser, terminal I/O, JPEG/X11 image handling |
| `main.c` | ~1900 | CLI argument parsing, chat loop, training loop, inference loop |
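As a taste of what `backward.c` implements, here is a minimal, self-contained AdamW update step. The signature is illustrative, not TRiP's actual function:

```c
#include <math.h>

// One AdamW step over n parameters. m and v are the per-parameter first and
// second moment buffers (zero-initialized); t is the 1-based step count.
void adamw_step(float *w, const float *g, float *m, float *v, int n, int t,
                float lr, float beta1, float beta2, float eps, float wd) {
    float bc1 = 1.0f - powf(beta1, (float)t);   // bias correction terms
    float bc2 = 1.0f - powf(beta2, (float)t);
    for (int i = 0; i < n; i++) {
        m[i] = beta1 * m[i] + (1.0f - beta1) * g[i];          // momentum of g
        v[i] = beta2 * v[i] + (1.0f - beta2) * g[i] * g[i];   // momentum of g^2
        float mhat = m[i] / bc1;
        float vhat = v[i] / bc2;
        // Decoupled weight decay: applied to w directly, not folded into g.
        w[i] -= lr * (mhat / (sqrtf(vhat) + eps) + wd * w[i]);
    }
}
```

The cosine-annealed learning rate from the feature list would be computed per step, e.g. `lr = lr_min + 0.5f * (lr_max - lr_min) * (1.0f + cosf(M_PI * t / T))`, and passed in as `lr`.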
TRiP implements a transformer from first principles in C. No PyTorch, no TensorFlow, no ONNX — just linear algebra on arrays of floats.
The residual stream is the central concept: a vector that flows through the model like data on a bus. Each layer reads from it, processes it through attention and a feed-forward network, and writes back to it. The forward pass walks the layers top to bottom; the backward pass walks them bottom to top, computing gradients via the chain rule.
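In code, the whole forward pass reduces to a loop over layers with two read-process-write rounds each. This skeleton uses hypothetical helper names, not TRiP's actual API, just to show the shape:

```c
// Hypothetical helpers standing in for TRiP's real routines in math.c:
void attn_norm(float *out, const float *x, int layer, int dim);
void attention(float *out, const float *x, int layer, int dim);
void ffn_norm(float *out, const float *x, int layer, int dim);
void ffn(float *out, const float *x, int layer, int dim);

// One token's forward pass: x is the residual stream. Every layer does two
// read-process-write rounds against it; nothing else carries state forward.
void forward_layers(float *x, float *xb, float *out, int dim, int n_layers) {
    for (int l = 0; l < n_layers; l++) {
        attn_norm(xb, x, l, dim);                      // read: normalized view of the stream
        attention(out, xb, l, dim);                    // process: self-attention
        for (int i = 0; i < dim; i++) x[i] += out[i];  // write: residual add

        ffn_norm(xb, x, l, dim);                       // read again
        ffn(out, xb, l, dim);                          // process: feed-forward network
        for (int i = 0; i < dim; i++) x[i] += out[i];  // write: residual add
    }
}
```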
Every math operation (`math.c`) is implemented as a forward+backward pair: you can read `rmsnorm()` and, immediately below it, `rmsnorm_backward()`, and see exactly how the gradient flows through the same computation in reverse.
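For example, here is what such a pair looks like for RMSNorm. This is an independent minimal sketch of the same math, not a copy of TRiP's code; note how the backward pass applies the chain rule through `1/rms`, which couples every input element to every output element:

```c
#include <math.h>

// Forward: y = (x / rms(x)) * w, where rms(x) = sqrt(mean(x^2) + eps).
void rmsnorm_fwd(float *y, const float *x, const float *w, int n, float eps) {
    float ss = 0.0f;
    for (int i = 0; i < n; i++) ss += x[i] * x[i];
    float s = 1.0f / sqrtf(ss / n + eps);                // s = 1 / rms(x)
    for (int i = 0; i < n; i++) y[i] = x[i] * s * w[i];
}

// Backward: given dL/dy, accumulate dL/dx and dL/dw.
// dx_j = s*dy_j*w_j - (s^3/n) * x_j * sum_i(dy_i * w_i * x_i)
void rmsnorm_bwd(float *dx, float *dw, const float *dy,
                 const float *x, const float *w, int n, float eps) {
    float ss = 0.0f, dot = 0.0f;
    for (int i = 0; i < n; i++) ss += x[i] * x[i];
    float s = 1.0f / sqrtf(ss / n + eps);
    for (int i = 0; i < n; i++) dot += dy[i] * w[i] * x[i];
    for (int i = 0; i < n; i++) {
        dw[i] += dy[i] * x[i] * s;                                // grad of the scale
        dx[i] += s * dy[i] * w[i] - (s * s * s / n) * x[i] * dot; // chain rule through rms
    }
}
```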
I put a lot of comments in the code, both as reminders to myself and to make TRiP read like an annotated textbook on transformers.
For a deeper understanding of backpropagation, see Andrej Karpathy's lecture; TRiP would never have existed without his work.
CC BY-NC 4.0 — free to use, study, modify, and share for non-commercial purposes, with attribution. For commercial licensing, contact the author.