A minimal, self-hosted chat UI for llama.cpp.
No Python. No Docker. No npm. No framework. Just nginx + three files.
Built with the help of Claude (Anthropic).
- Multi-session sidebar — create, rename, and delete conversations; auto-named from your first message
- Streaming responses — real-time token streaming with blinking cursor and wait-time spinner
- Token stats panel — toggle browser-side estimates and llama.cpp server-side stats (tokens/s, eval time, etc.)
- System prompt — global system prompt that persists across all sessions via localStorage
- Language toggle — switch UI language and AI reply language between English / 简体中文 / 廣東話 at any time
- File attachments — attach plain text, PDF (extracted via PDF.js), images, audio; modality support is auto-detected from the model
- Save chat — download any session as `.md` or `.json`
- Mobile responsive — sidebar overlay, iOS viewport fix, works on Android and iPhone
- Zero build step — edit and deploy directly, no compilation needed
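
On the token stats: the browser cannot run the model's tokenizer, so the browser-side numbers are necessarily estimates. A common heuristic, shown here as a sketch (not the exact formula chat.js uses), is roughly four characters per token for English text:

```js
// Rough token estimate: ~4 characters per token is a common rule of thumb
// for English text. This is a sketch, not the exact heuristic in chat.js.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}
// estimateTokens('Hello, world!') → 4  (13 chars / 4, rounded up)
```

The server-side stats reported by llama.cpp are exact, so prefer those when the panel shows both.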
```
Browser ──→ nginx ──→ llama.cpp (192.168.x.x:8080 or 127.0.0.1:8080)
            (static files +
             reverse proxy)
```
nginx does two jobs:
- Serves the three static files (`chat.html`, `chat.css`, `chat.js`) and the `fonts/` directory
- Reverse-proxies `/ai1/v1/` (the LLM API calls) to your llama.cpp server
That's the entire stack.
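
For reference, llama.cpp's OpenAI-compatible `/v1/chat/completions` endpoint streams responses as server-sent events: each chunk arrives as a `data: {...}` line, and the stream ends with `data: [DONE]`. A minimal sketch of pulling the token text out of one SSE line (assuming the OpenAI-style delta format; this is not the exact parser in chat.js):

```js
// Extract the streamed token text from one SSE line, or null if the line
// carries no content (an empty keep-alive line or the final [DONE] marker).
// Assumes llama.cpp's OpenAI-compatible delta format.
function tokenFromSseLine(line) {
  if (!line.startsWith('data: ')) return null;
  const payload = line.slice('data: '.length);
  if (payload === '[DONE]') return null;
  const delta = JSON.parse(payload).choices?.[0]?.delta;
  return delta?.content ?? null;
}
// tokenFromSseLine('data: {"choices":[{"delta":{"content":"Hi"}}]}') → 'Hi'
```

This is also why the sample nginx config sets `proxy_buffering off`: chunks reach the browser as they are generated rather than in one buffered burst.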
```
tiny-llama-chat/
├── chat.html
├── chat.css
├── chat.js
├── chat.nginx_sample.conf
├── fonts/
│   ├── ai01/              # UI font
│   │   ├── ai01.css
│   │   └── *.ttf
│   └── pdf_4.3.136/       # PDF.js (for PDF attachment support)
│       ├── pdf.min.mjs
│       └── pdf.worker.min.mjs
├── README.md
└── gpl-2.0.txt
```
The included `chat.nginx_sample.conf` is a working example you can adapt.
It uses placeholder paths — you must replace them before use (see below).
```nginx
server {
    location = /ai1/          { return 302 /ai1/chat.html; }
    location = /ai1           { return 302 /ai1/chat.html; }
    location = /ai1/chat.html { alias /path_to_dir1/ai1/v2/chat.html; }
    location = /ai1/chat.css  { alias /path_to_dir1/ai1/v2/chat.css; }
    location = /ai1/chat.js   { alias /path_to_dir1/ai1/v2/chat.js; }

    # API proxy — LLM API calls, password protected
    location ^~ /ai1/v1/ {
        proxy_pass http://192.168.99.152:8080/v1/;
        proxy_buffering off;
        proxy_cache off;
        proxy_read_timeout 300s;
        proxy_connect_timeout 10s;
        proxy_send_timeout 300s;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_hide_header Keep-Alive;
        add_header X-Accel-Buffering no;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        auth_basic "Restricted Access";
        auth_basic_user_file /etc/nginx/sites-enabled/htpasswd_for_ai01;
    }

    # General /ai1/ proxy (llama.cpp built-in UI and other endpoints)
    location ^~ /ai1/ {
        proxy_pass http://192.168.99.152:8080/;
        proxy_buffering off;
        proxy_cache off;
        proxy_read_timeout 300s;
        proxy_connect_timeout 10s;
        proxy_send_timeout 300s;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_hide_header Keep-Alive;
        add_header X-Accel-Buffering no;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }

    # Fonts and PDF.js
    location ^~ /fonts/ {
        alias /path_to_dir2/fonts/;
        types { application/javascript mjs; }
    }
}
```

The sample config contains two placeholder paths:
| Placeholder | Replace with |
|---|---|
| `/path_to_dir1/ai1/v2/` | The actual directory where `chat.html`, `chat.css`, `chat.js` live |
| `/path_to_dir2/fonts/` | The actual directory where the `fonts/` folder lives |
Example — if you clone the repo to /var/www/tiny-llama-chat:

```nginx
location = /ai1/chat.html { alias /var/www/tiny-llama-chat/chat.html; }
location = /ai1/chat.css  { alias /var/www/tiny-llama-chat/chat.css; }
location = /ai1/chat.js   { alias /var/www/tiny-llama-chat/chat.js; }
location ^~ /fonts/       { alias /var/www/tiny-llama-chat/fonts/; types { application/javascript mjs; } }
```

The sample config points to 192.168.99.152:8080 (the author's LAN machine).
Change it to match where your llama.cpp is running:
```nginx
# llama.cpp on the same machine
proxy_pass http://127.0.0.1:8080/v1/;

# llama.cpp on another machine in your LAN
proxy_pass http://192.168.1.50:8080/v1/;
```

There are two `proxy_pass` lines in the config — change both.
`chat.js` loads PDF.js from a hardcoded URL path:

```js
// Inside chat.js — find this line:
const pdfjsLib = await import('/fonts/pdf_4.3.136/pdf.min.mjs');
```

This matches the `location ^~ /fonts/` block in the nginx config.
If you serve the `fonts/` directory at a different URL path, update this line to match.
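
If you do move the files, one hypothetical way to keep the path in a single place is a prefix constant; `PDFJS_BASE` and `pdfjsUrl` are assumed names, not identifiers from chat.js:

```js
// Hypothetical refactor: keep the PDF.js URL prefix in one constant so a
// relocated fonts/ directory only requires one edit. Names are assumptions.
const PDFJS_BASE = '/fonts/pdf_4.3.136';
const pdfjsUrl = (file) => `${PDFJS_BASE}/${file}`;
// pdfjsUrl('pdf.min.mjs') → '/fonts/pdf_4.3.136/pdf.min.mjs'
```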
In the sample config, `auth_basic` is applied only to the API proxy block (`/ai1/v1/`),
meaning static files are public but actual LLM queries require a password.
You can move `auth_basic` to cover the static files too, remove it entirely, or replace it
with IP-based access control:
```nginx
# Remove password — just delete these two lines from the proxy block:
auth_basic "Restricted Access";
auth_basic_user_file /etc/nginx/sites-enabled/htpasswd_for_ai01;

# Or restrict by IP instead:
allow 192.168.1.0/24;
deny all;
```

To create the password file:

```sh
sudo apt install apache2-utils
sudo htpasswd -c /etc/nginx/sites-enabled/htpasswd_for_ai01 yourusername
```

The path `/ai1/` is hardcoded in two places. You must change both together or it will break.
| File | What to change |
|---|---|
| `chat.js` line 4 | `const API_BASE = '/ai1/v1/chat/completions';` |
| `nginx.conf` | All location blocks containing `/ai1/` |
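
As an alternative to editing both places, the prefix could in principle be derived from the page's own URL, so chat.js would follow whatever path nginx serves it from. A sketch with an assumed helper name (not what the shipped chat.js does):

```js
// Sketch: derive the API base from the page URL instead of hardcoding '/ai1'.
// Assumes the page is served at <prefix>/chat.html, as in the sample config.
// In the browser this would be called as apiBaseFrom(location.pathname).
function apiBaseFrom(pathname) {
  const prefix = pathname.replace(/\/[^/]*$/, ''); // strip '/chat.html'
  return `${prefix}/v1/chat/completions`;
}
// apiBaseFrom('/ai1/chat.html') → '/ai1/v1/chat/completions'
```

With this, only the nginx `location` blocks would need renaming.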
In `chat.js`:

```js
// Before
const API_BASE = '/ai1/v1/chat/completions';

// After
const API_BASE = '/ai2/v1/chat/completions';
```

In `nginx.conf`, rename every occurrence of `/ai1/` to `/ai2/`, then test and reload:

```sh
sudo nginx -t && sudo nginx -s reload
```

Requirements:

- nginx
- llama.cpp running with its built-in HTTP server (`--port 8080` or your chosen port)
- A modern browser (Firefox, Chrome, Safari, Android WebView)
That's it.
- UI built with the help of Claude by Anthropic
- PDF extraction via PDF.js by Mozilla
- LLM inference via llama.cpp by Georgi Gerganov


