VPanjeta/LLM-MRI
LLM-MRI: Inside the Mind of an LLM

An interactive, visual explainability lab for transformer models. Type a prompt and explore how GPT-2 processes it — from tokenization to attention to probability distributions.

Features

| Module | What it shows |
| --- | --- |
| Tokenization | BPE / WordPiece / BPE-RoBERTa token splits, side by side |
| 3D Embeddings | Token vectors projected via PCA / t-SNE / UMAP, interactive 3D orbit |
| Attention Heatmap | All 12 layers × 12 heads, with a head-ablation toggle |
| Layer Explorer | Animated forward pass, activation norms, layer contributions |
| Probability | Top-30 next-token distribution with live temperature / top-k / top-p sliders |
| Hallucination | Three scenarios showing attention collapse and entropy spikes |
| Prediction Game | 10-round game: guess the next token before the model does |
| Metrics Bar | Persistent sparklines for attention entropy, activation norm, and perplexity |
| Math Toggle | KaTeX-rendered formulas for every visualization |
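The temperature / top-k / top-p sliders in the Probability module can be sketched as a plain-Python sampler. This is an illustrative sketch of the standard technique, not the app's actual implementation; the function name, the `{token: logit}` input shape, and the filter ordering (temperature, then top-k, then top-p) are assumptions.

```python
import math
import random

def sample_next(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Sample one token from {token: logit}; temperature must be > 0."""
    # Temperature: scale logits, then softmax (max-subtracted for stability).
    scaled = {t: l / temperature for t, l in logits.items()}
    m = max(scaled.values())
    exps = {t: math.exp(l - m) for t, l in scaled.items()}
    z = sum(exps.values())
    probs = sorted(((t, e / z) for t, e in exps.items()), key=lambda x: -x[1])
    # Top-k: keep only the k most likely tokens (0 = disabled).
    if top_k:
        probs = probs[:top_k]
    # Top-p (nucleus): keep the smallest prefix whose mass reaches top_p.
    kept, total = [], 0.0
    for t, p in probs:
        kept.append((t, p))
        total += p
        if total >= top_p:
            break
    # Renormalize over the kept tokens and draw one.
    z = sum(p for _, p in kept)
    r = random.random() * z
    for t, p in kept:
        r -= p
        if r <= 0:
            return t
    return kept[-1][0]
```

Lower temperature sharpens the distribution toward the argmax; `top_k=1` makes sampling fully greedy regardless of the other sliders.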

Quick Start

Docker (recommended)

docker-compose up --build

Then open http://localhost.

The first build downloads GPT-2 and DistilGPT-2 (~800MB); subsequent builds reuse the Docker cache.

Local development

Backend:

cd backend
pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install -r requirements.txt
uvicorn app.main:app --reload --port 8000

Frontend:

cd frontend
npm install
npm run dev

Then open http://localhost:5173.

Architecture

Browser (React + Three.js + D3) → Nginx → FastAPI → GPT-2 (CPU)
                                        ↕
                                      Redis cache
  • Frontend: React 18 + TypeScript + Vite, Three.js for 3D, D3.js for heatmaps
  • Backend: FastAPI + PyTorch CPU, HuggingFace Transformers
  • Models: GPT-2 small (124M) + DistilGPT-2 (82M), both CPU inference
  • Cache: Redis, 1-hour TTL keyed by SHA256(prompt+model+projection)
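The cache-key scheme above can be sketched as follows. Only the SHA256 over prompt+model+projection and the 1-hour TTL come from this README; the `llm-mri:` namespace prefix and the `|` field separator are illustrative assumptions.

```python
import hashlib

def cache_key(prompt: str, model: str, projection: str) -> str:
    # SHA256 over prompt+model+projection, per the README; the "llm-mri:"
    # prefix and "|" separator are illustrative choices, not the app's own.
    payload = f"{prompt}|{model}|{projection}".encode("utf-8")
    return "llm-mri:" + hashlib.sha256(payload).hexdigest()

# Usage with redis-py (not executed here): SETEX stores the value with a TTL.
#   r.setex(cache_key(prompt, "gpt2", "pca"), 3600, json_result)  # 1-hour TTL
```

Hashing the full prompt keeps keys fixed-length, and including the model and projection in the digest ensures a cached PCA result is never served for a t-SNE request.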

Memory Requirements

  • Backend container: ~3-4GB RAM (both models in memory)
  • Redis: 512MB max
  • Frontend: ~30MB image

API Endpoints

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/analyze | Full inference: tokens + attention + hidden states + logits + metrics |
| POST | /api/tokenize | Multi-tokenizer comparison |
| POST | /api/generate | Next token with sampling parameters |
| POST | /api/compare | Side-by-side model comparison |
| GET | /health | Model load status |
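A minimal client for the analyze endpoint might look like this, assuming the local-development port 8000 from Quick Start. The request-body field names (`prompt`, `model`) are assumptions; check the FastAPI schema at `/docs` for the actual contract.

```python
import json
from urllib import request

API_BASE = "http://localhost:8000"  # FastAPI dev port from Quick Start

def build_payload(prompt: str, model: str = "gpt2") -> bytes:
    # Field names "prompt" and "model" are assumed, not confirmed by the README.
    return json.dumps({"prompt": prompt, "model": model}).encode("utf-8")

def analyze(prompt: str, model: str = "gpt2") -> dict:
    """POST to /api/analyze and return the parsed JSON response."""
    req = request.Request(
        f"{API_BASE}/api/analyze",
        data=build_payload(prompt, model),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

When running behind the Docker/Nginx setup instead, the same paths are served from `http://localhost` without the `:8000` port.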
