ddavidgao/rlm-from-scratch
RLM — Recursive Language Model

This is a learning project where I built an agentic document search system from scratch. The idea is simple: instead of letting an LLM answer questions from memory (which it loves to do), you give it a Python REPL and force it to actually search through the document using find(), slicing, and print(). It writes code, sees the output, writes more code, and keeps going until it finds what it needs. Then it gives a final answer backed by evidence it actually found.
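The loop described above can be sketched in a few lines. This is a minimal illustration, not the repo's actual code: `llm_generate` is a hypothetical stand-in for the model call, and the termination convention (a reply starting with `FINAL:`) is an assumption.

```python
import io
import contextlib

def run_repl_loop(llm_generate, document, question, max_turns=8):
    """Drive an LLM through a code-writing search loop over a document.

    llm_generate(transcript) -> str is a stand-in for the model call;
    it returns either Python code to run, or a line starting with FINAL:.
    """
    transcript = (f"Question: {question}\n"
                  "Search the variable `context` using find(), slicing, and print().\n")
    namespace = {"context": document}
    for _ in range(max_turns):
        reply = llm_generate(transcript)
        if reply.startswith("FINAL:"):
            return reply[len("FINAL:"):].strip()
        # Execute the model's code and capture whatever it prints
        buf = io.StringIO()
        try:
            with contextlib.redirect_stdout(buf):
                exec(reply, namespace)
            output = buf.getvalue()
        except Exception as e:
            output = f"Error: {e}"
        # Feed the real output back so the next turn is grounded in evidence
        transcript += f"\n>>> {reply}\n{output}"
    return None
```

The key property is that the model never sees the document directly, only the printed output of code it wrote, so every claim in its final answer traces back to something it actually retrieved.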

The system runs on local models through Ollama — DeepSeek R1 14B as the main "root" model that drives the search loop, and Qwen3 Coder 30B as the sub-model for answering questions about specific chunks. There's also a grading system that scores how well the RLM found formatting errors in a GPO style manual, so I could track improvement across runs.
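A wrapper like llm.py plausibly boils down to a POST against Ollama's `/api/generate` endpoint. The sketch below is an assumption about the wrapper's shape, not its actual code, and the model tag strings are illustrative:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_payload(model, prompt):
    # stream=False asks Ollama for a single JSON object instead of a token stream
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model, prompt):
    data = json.dumps(build_payload(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        # The completed text comes back in the "response" field
        return json.loads(resp.read())["response"]
```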

I built this to get a deeper understanding of how agentic LLM systems work — the REPL loop, prompt engineering to keep models on track, and the challenge of preventing LLMs from just making things up.

Project Structure

  • src/ — the core RLM runtime. rlm.py handles the REPL loop (LLM writes code, we exec it, feed output back) and llm.py wraps the Ollama API
  • main.py — runs the full pipeline: asks a question about the GPO manual, scores the answer, and plots performance over time
  • ml-training/ — fine-tuning Llama 3.2 1B to learn the RLM search workflow
  • karpathy-gpt/ — building a transformer from scratch following Karpathy's tutorial

ml-training

This is where I fine-tuned Llama 3.2 1B Instruct to learn the RLM search behavior instead of relying on huge models like DeepSeek R1 to follow the workflow. The goal was to take a tiny 1B-parameter model and teach it to write Python search code when given a document, rather than answering from memory.

The training data is 155 synthetic multi-turn conversations generated by generate_training_data.py. Each example simulates the full RLM loop — the model gets a question, writes context.find() code, sees the real exec() output, and either searches more or gives a final answer. The documents are a mix of Project Gutenberg texts and synthetic docs to keep things diverse. Categories cover clean searches, keyword retries, dead ends, error recovery, and more.
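One such example plausibly looks like a chat transcript with real execution output spliced in as a user turn. The field names and content below are a hypothetical sketch, not the actual schema emitted by generate_training_data.py:

```python
# One synthetic multi-turn example: the assistant searches, sees the real
# exec() output fed back as a user turn, then answers with evidence.
example = {
    "category": "clean_search",  # or keyword_retry, dead_end, error_recovery, ...
    "messages": [
        {"role": "user",
         "content": "Document loaded as `context`. Q: Who signed the letter?"},
        {"role": "assistant",
         "content": "idx = context.find('letter')\nprint(context[idx-100:idx+200])"},
        {"role": "user",
         "content": "...the letter was signed by Captain Walton..."},
        {"role": "assistant",
         "content": "FINAL: Captain Walton signed the letter."},
    ],
}
```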

Fine-tuning uses QLoRA through Unsloth — the base model is loaded in 4-bit, and only LoRA adapters on the attention layers get trained (3.4M params out of 1.24B, about 0.28%). It trains in under a minute on an RTX 5080 Laptop GPU with 17GB VRAM. Loss went from 3.54 down to around 1.0 without overfitting.
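The quoted trainable fraction is a quick division away (rounded figures, so the result lands near the 0.28% stated above):

```python
trainable = 3.4e6   # LoRA adapter parameters on the attention layers
total = 1.24e9      # Llama 3.2 1B base parameters
fraction = trainable / total  # ~0.0027, i.e. roughly the 0.28% quoted
print(f"{fraction:.2%}")
```

Training that sliver of the network, with the frozen base weights held in 4-bit, is what makes a sub-minute fine-tune on a laptop GPU feasible.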

karpathy-gpt

This is a separate section where I worked through Andrej Karpathy's "Let's build GPT from scratch" series to understand how transformers actually work under the hood. It starts from a basic bigram model that predicts the next character using nothing but a lookup table, and incrementally adds the pieces that make transformers work — positional embeddings, self-attention, multi-head attention, and feed-forward networks.
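The bigram starting point needs no ML library at all: it is literally a table of next-character counts. A toy version (not Karpathy's exact code, which uses an embedding table and gradient descent) makes the idea concrete:

```python
from collections import Counter, defaultdict
import random

def train_bigram(text):
    # counts[c] maps each character to a Counter of the characters that follow it
    counts = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    return counts

def generate(counts, start, n, seed=0):
    # Sample each next character in proportion to how often it followed the last one
    rng = random.Random(seed)
    out = [start]
    for _ in range(n):
        followers = counts.get(out[-1])
        if not followers:
            break
        chars, weights = zip(*followers.items())
        out.append(rng.choices(chars, weights=weights)[0])
    return "".join(out)
```

Everything after this in the series replaces the raw count table with learned parameters and widens the context beyond one character.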

The dataset is Tiny Shakespeare (~1.1M characters). The model learns at the character level, so it's predicting one character at a time. Early outputs are complete gibberish, but as each component gets added the generated text starts looking more and more like English. Concepts like attention masks, key-query-value, softmax over logits, and cross-entropy loss make way more sense after implementing them by hand.
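Two of those ingredients, softmax over logits and cross-entropy loss, fit in a few lines of plain Python. This is a didactic version for clarity, not the training code:

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability;
    # the result is a probability distribution that sums to 1
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_index):
    # Negative log-probability the model assigns to the correct next character;
    # confident-and-right gives a loss near 0, confident-and-wrong blows it up
    probs = softmax(logits)
    return -math.log(probs[target_index])
```

A model that knows nothing about a 65-character vocabulary scores about -ln(1/65) ≈ 4.17 per character, which is why untrained loss values hover near that number before dropping as the components are added.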
