LLM Forge 🔥

A self-hosted platform for fine-tuning, RAG, and inference with local LLMs via Ollama.

llm-forge/
├── backend/              FastAPI API (Python)
│   ├── main.py
│   ├── routers/
│   │   ├── data.py       Upload & ingest documents
│   │   ├── rag.py        RAG query / chat
│   │   ├── finetune.py   LoRA fine-tuning jobs
│   │   └── models.py     Ollama model management
│   ├── services/
│   │   ├── vector_store.py    ChromaDB + embeddings
│   │   ├── ollama_service.py  Ollama HTTP client
│   │   ├── rag_pipeline.py    RAG pipeline
│   │   └── finetune_service.py  LoRA trainer
│   ├── requirements.txt
│   └── Dockerfile
├── frontend/
│   └── index.html        Single-page dashboard
├── docker/
│   └── nginx.conf
├── data/
│   ├── uploads/          Uploaded files land here
│   ├── adapters/         LoRA adapter weights
│   └── example_dataset.jsonl
└── docker-compose.yml

Quick Start

# 1. Clone / copy this project
cd llm-forge

# 2. Start all services
docker compose up --build -d

# 3. Open the UI
open http://localhost:3000

# 4. Pull your first model (one-time)
#    Either through the UI → Models → Pull
#    Or via CLI:
docker exec llmforge-ollama ollama pull llama3.2:3b

Architecture

Browser (localhost:3000)
    │
    ▼
Nginx (frontend)  ──/api/──►  FastAPI Backend (:8000)
                                  │          │
                               Ollama     ChromaDB
                             (:11434)      (:8001)
                          (LLM models)  (vectors)

Data Flow

RAG:

User question
  → embed question (sentence-transformers)
  → query ChromaDB for top-K chunks
  → build prompt: system=context + user=question
  → Ollama generate
  → return answer + source chunks

Fine-tune:

Upload JSONL dataset (instruction/output pairs)
  → load HuggingFace base model
  → attach LoRA adapters (PEFT)
  → SFTTrainer (TRL)
  → save adapter weights to /data/adapters/<job_id>/

API Endpoints

Method	Path	Description
GET	`/api/health`	Health check
POST	`/api/data/ingest/text`	Ingest raw text
POST	`/api/data/ingest/file`	Upload & ingest file
POST	`/api/data/ingest/jsonl-dataset`	Upload fine-tune dataset
GET	`/api/data/stats`	Vector DB stats
DELETE	`/api/data/clear`	Clear vector DB
POST	`/api/rag/query`	RAG single query
POST	`/api/rag/chat`	RAG multi-turn chat
GET	`/api/models/`	List Ollama models
POST	`/api/models/pull`	Pull Ollama model
POST	`/api/models/generate`	Direct generate
POST	`/api/finetune/start`	Start LoRA job
GET	`/api/finetune/jobs`	List all jobs
GET	`/api/finetune/jobs/{id}`	Job status + logs

Fine-tuning Dataset Format

Create a .jsonl file with one JSON object per line:

{"instruction": "What is the capital of France?", "output": "Paris."}
{"instruction": "Write a Python hello world.", "output": "print('Hello, world!')"}

Upload it via Ingest → Upload Fine-tune Dataset, then use the filename in Fine-tune → Dataset File.

GPU Support

Uncomment the deploy.resources block in docker-compose.yml for NVIDIA GPU:

deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]

Requires NVIDIA Container Toolkit.

Environment Variables

Variable	Default	Description
`OLLAMA_URL`	`http://ollama:11434`	Ollama API endpoint
`CHROMA_HOST`	`chromadb`	ChromaDB host
`DEFAULT_MODEL`	`llama3.2:3b`	Default Ollama model
`EMBED_MODEL`	`all-MiniLM-L6-v2`	Sentence-transformer model
`RAG_TOP_K`	`5`	Default RAG chunks to retrieve
`CHUNK_SIZE`	`512`	Words per chunk
`CHUNK_OVERLAP`	`64`	Overlap between chunks

Extending

Add PDF support: Install pypdf and extend _extract_text() in routers/data.py
Add evaluation: Create services/eval_service.py with accuracy metrics
Scheduled retraining: Add a cron job or APScheduler that calls start_finetune() with new data
Export to Ollama: After fine-tuning, use llama.cpp to convert the merged model to GGUF and load it into Ollama via ollama create
Swap embedding model: Change EMBED_MODEL env var to any sentence-transformers model

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LLM Forge 🔥

Quick Start

Architecture

Data Flow

API Endpoints

Fine-tuning Dataset Format

GPU Support

Environment Variables

Extending

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
backend		backend
data		data
docker		docker
frontend		frontend
.env.example		.env.example
README.md		README.md
docker-compose.yml		docker-compose.yml

Folders and files

Latest commit

History

Repository files navigation

LLM Forge 🔥

Quick Start

Architecture

Data Flow

API Endpoints

Fine-tuning Dataset Format

GPU Support

Environment Variables

Extending

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages