ddavidgao/rlm-from-scratch
RLM — Recursive Language Model

This is a learning project where I built an agentic document search system from scratch. The idea is simple: instead of letting an LLM answer questions from memory (which it loves to do), you give it a Python REPL and force it to actually search through the document using find(), slicing, and print(). It writes code, sees the output, writes more code, and keeps going until it finds what it needs. Then it gives a final answer backed by evidence it actually found.
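The loop described above can be sketched in a few lines. This is a minimal illustration, not the repo's actual code: `llm_generate` is a hypothetical stand-in for the model call, and the termination convention (a reply starting with `FINAL:`) is an assumption.

```python
import io
import contextlib

def run_repl_loop(llm_generate, document, question, max_turns=8):
    """Drive an LLM through a code-writing search loop over a document.

    llm_generate(transcript) -> str is a stand-in for the model call;
    it returns either Python code to run, or a line starting with FINAL:.
    """
    transcript = (f"Question: {question}\n"
                  "Search the variable `context` using find(), slicing, and print().\n")
    namespace = {"context": document}
    for _ in range(max_turns):
        reply = llm_generate(transcript)
        if reply.startswith("FINAL:"):
            return reply[len("FINAL:"):].strip()
        # Execute the model's code and capture whatever it prints
        buf = io.StringIO()
        try:
            with contextlib.redirect_stdout(buf):
                exec(reply, namespace)
            output = buf.getvalue()
        except Exception as e:
            output = f"Error: {e}"
        # Feed the real output back so the next turn is grounded in evidence
        transcript += f"\n>>> {reply}\n{output}"
    return None
```

The key property is that the model never sees the document directly, only the printed output of code it wrote, so every claim in its final answer traces back to something it actually retrieved.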

The system runs on local models through Ollama — DeepSeek R1 14B as the main "root" model that drives the search loop, and Qwen3 Coder 30B as the sub-model for answering questions about specific chunks. There's also a grading system that scores how well the RLM found formatting errors in a GPO style manual, so I could track improvement across runs.
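A wrapper like llm.py plausibly boils down to a POST against Ollama's `/api/generate` endpoint. The sketch below is an assumption about the wrapper's shape, not its actual code, and the model tag strings are illustrative:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_payload(model, prompt):
    # stream=False asks Ollama for a single JSON object instead of a token stream
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model, prompt):
    data = json.dumps(build_payload(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        # The completed text comes back in the "response" field
        return json.loads(resp.read())["response"]
```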

I built this to get a deeper understanding of how agentic LLM systems work — the REPL loop, prompt engineering to keep models on track, and the challenge of preventing LLMs from just making things up.

Project Structure

  • src/ — the core RLM runtime. rlm.py handles the REPL loop (LLM writes code, we exec it, feed output back) and llm.py wraps the Ollama API
  • main.py — runs the full pipeline: asks a question about the GPO manual, scores the answer, and plots performance over time
  • ml-training/ — fine-tuning Llama 3.2 1B to learn the RLM search workflow
  • karpathy-gpt/ — building a transformer from scratch following Karpathy's tutorial

ml-training

This is where I fine-tuned Llama 3.2 1B Instruct to learn the RLM search behavior instead of relying on huge models like DeepSeek R1 to follow the workflow. The goal was to take a tiny 1B-parameter model and teach it to write Python search code when given a document, rather than answering from memory.

The training data is 155 synthetic multi-turn conversations generated by generate_training_data.py. Each example simulates the full RLM loop — the model gets a question, writes context.find() code, sees the real exec() output, and either searches more or gives a final answer. The documents are a mix of Project Gutenberg texts and synthetic docs to keep things diverse. Categories cover clean searches, keyword retries, dead ends, error recovery, and more.
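One such example plausibly looks like a chat transcript with real execution output spliced in as a user turn. The field names and content below are a hypothetical sketch, not the actual schema emitted by generate_training_data.py:

```python
# One synthetic multi-turn example: the assistant searches, sees the real
# exec() output fed back as a user turn, then answers with evidence.
example = {
    "category": "clean_search",  # or keyword_retry, dead_end, error_recovery, ...
    "messages": [
        {"role": "user",
         "content": "Document loaded as `context`. Q: Who signed the letter?"},
        {"role": "assistant",
         "content": "idx = context.find('letter')\nprint(context[idx-100:idx+200])"},
        {"role": "user",
         "content": "...the letter was signed by Captain Walton..."},
        {"role": "assistant",
         "content": "FINAL: Captain Walton signed the letter."},
    ],
}
```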

Fine-tuning uses QLoRA through Unsloth — the base model is loaded in 4-bit, and only LoRA adapters on the attention layers get trained (3.4M params out of 1.24B, about 0.28%). It trains in under a minute on an RTX 5080 Laptop GPU with 17GB VRAM. Loss went from 3.54 down to around 1.0 without overfitting.
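The quoted trainable fraction is a quick division away (rounded figures, so the result lands near the 0.28% stated above):

```python
trainable = 3.4e6   # LoRA adapter parameters on the attention layers
total = 1.24e9      # Llama 3.2 1B base parameters
fraction = trainable / total  # ~0.0027, i.e. roughly the 0.28% quoted
print(f"{fraction:.2%}")
```

Training that sliver of the network, with the frozen base weights held in 4-bit, is what makes a sub-minute fine-tune on a laptop GPU feasible.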

karpathy-gpt

This is a separate section where I worked through Andrej Karpathy's "Let's build GPT from scratch" series to understand how transformers actually work under the hood. It starts from a basic bigram model that predicts the next character using nothing but a lookup table, and incrementally adds the pieces that make transformers work — positional embeddings, self-attention, multi-head attention, and feed-forward networks.
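The bigram starting point needs no ML library at all: it is literally a table of next-character counts. A toy version (not Karpathy's exact code, which uses an embedding table and gradient descent) makes the idea concrete:

```python
from collections import Counter, defaultdict
import random

def train_bigram(text):
    # counts[c] maps each character to a Counter of the characters that follow it
    counts = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    return counts

def generate(counts, start, n, seed=0):
    # Sample each next character in proportion to how often it followed the last one
    rng = random.Random(seed)
    out = [start]
    for _ in range(n):
        followers = counts.get(out[-1])
        if not followers:
            break
        chars, weights = zip(*followers.items())
        out.append(rng.choices(chars, weights=weights)[0])
    return "".join(out)
```

Everything after this in the series replaces the raw count table with learned parameters and widens the context beyond one character.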

The dataset is Tiny Shakespeare (~1.1M characters). The model learns at the character level, so it's predicting one character at a time. Early outputs are complete gibberish, but as each component gets added the generated text starts looking more and more like English. Concepts like attention masks, key-query-value, softmax over logits, and cross-entropy loss make way more sense after implementing them by hand.
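Two of those ingredients, softmax over logits and cross-entropy loss, fit in a few lines of plain Python. This is a didactic version for clarity, not the training code:

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability;
    # the result is a probability distribution that sums to 1
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_index):
    # Negative log-probability the model assigns to the correct next character;
    # confident-and-right gives a loss near 0, confident-and-wrong blows it up
    probs = softmax(logits)
    return -math.log(probs[target_index])
```

A model that knows nothing about a 65-character vocabulary scores about -ln(1/65) ≈ 4.17 per character, which is why untrained loss values hover near that number before dropping as the components are added.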
