MythWho/LLM-Forge

LLM Forge 🔥

A self-hosted platform for fine-tuning, RAG, and inference with local LLMs via Ollama.

llm-forge/
├── backend/              FastAPI API (Python)
│   ├── main.py
│   ├── routers/
│   │   ├── data.py       Upload & ingest documents
│   │   ├── rag.py        RAG query / chat
│   │   ├── finetune.py   LoRA fine-tuning jobs
│   │   └── models.py     Ollama model management
│   ├── services/
│   │   ├── vector_store.py    ChromaDB + embeddings
│   │   ├── ollama_service.py  Ollama HTTP client
│   │   ├── rag_pipeline.py    RAG pipeline
│   │   └── finetune_service.py  LoRA trainer
│   ├── requirements.txt
│   └── Dockerfile
├── frontend/
│   └── index.html        Single-page dashboard
├── docker/
│   └── nginx.conf
├── data/
│   ├── uploads/          Uploaded files land here
│   ├── adapters/         LoRA adapter weights
│   └── example_dataset.jsonl
└── docker-compose.yml

Quick Start

# 1. Clone / copy this project
cd llm-forge

# 2. Start all services
docker compose up --build -d

# 3. Open the UI (macOS; on Linux use xdg-open, or paste the URL into a browser)
open http://localhost:3000

# 4. Pull your first model (one-time)
#    Either through the UI → Models → Pull
#    Or via CLI:
docker exec llmforge-ollama ollama pull llama3.2:3b

Architecture

Browser (localhost:3000)
    │
    ▼
Nginx (frontend)  ──/api/──►  FastAPI Backend (:8000)
                                  │          │
                               Ollama     ChromaDB
                             (:11434)      (:8001)
                          (LLM models)  (vectors)

Data Flow

RAG:

User question
  → embed question (sentence-transformers)
  → query ChromaDB for top-K chunks
  → build prompt: system=context + user=question
  → Ollama generate
  → return answer + source chunks
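
The prompt-building step above can be sketched as follows. This is a minimal illustration, not the code in services/rag_pipeline.py: the helper name build_generate_payload and the system-prompt wording are assumptions, and the retrieved chunks are passed in as plain strings. The payload fields (model, system, prompt, stream) follow Ollama's /api/generate request format.

```python
from typing import Dict, List


def build_generate_payload(question: str, chunks: List[str],
                           model: str = "llama3.2:3b") -> Dict[str, object]:
    """Build an Ollama /api/generate payload from retrieved chunks.

    Hypothetical helper: the real pipeline may format its context differently.
    """
    # Number the chunks so an answer can refer back to its sources.
    context = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, start=1))
    # Assumed system-prompt wording: context goes into the system message.
    system = (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\n" + context
    )
    # The user's question becomes the prompt; streaming is disabled.
    return {"model": model, "system": system, "prompt": question, "stream": False}


payload = build_generate_payload(
    "What is LoRA?",
    ["LoRA adds low-rank adapter matrices.", "PEFT implements LoRA."],
)
print(payload["system"])
```

Keeping the context in the system message and the question in the prompt mirrors the "system=context + user=question" split described in the flow.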

Fine-tune:

Upload JSONL dataset (instruction/output pairs)
  → load HuggingFace base model
  → attach LoRA adapters (PEFT)
  → SFTTrainer (TRL)
  → save adapter weights to /data/adapters/<job_id>/

API Endpoints

Method  Path                            Description
GET     /api/health                     Health check
POST    /api/data/ingest/text           Ingest raw text
POST    /api/data/ingest/file           Upload & ingest file
POST    /api/data/ingest/jsonl-dataset  Upload fine-tune dataset
GET     /api/data/stats                 Vector DB stats
DELETE  /api/data/clear                 Clear vector DB
POST    /api/rag/query                  RAG single query
POST    /api/rag/chat                   RAG multi-turn chat
GET     /api/models/                    List Ollama models
POST    /api/models/pull                Pull Ollama model
POST    /api/models/generate            Direct generate
POST    /api/finetune/start             Start LoRA job
GET     /api/finetune/jobs              List all jobs
GET     /api/finetune/jobs/{id}         Job status + logs
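
As a rough sketch of how a client might call these endpoints, the snippet below builds requests with Python's standard library. It assumes the stack is reached through Nginx at http://localhost:3000 as shown in the architecture diagram. The request body fields question and top_k are assumptions for illustration; check routers/rag.py for the actual schema. The helper build_request is hypothetical.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:3000"  # Nginx proxies /api/ to the backend


def build_request(method: str, path: str,
                  body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON request for one of the API endpoints."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


health = build_request("GET", "/api/health")
# Assumed body fields; the real endpoint may expect different names.
query = build_request(
    "POST", "/api/rag/query", {"question": "What is LoRA?", "top_k": 5}
)

# To send a request once the stack is running:
# with urllib.request.urlopen(health, timeout=10) as resp:
#     print(resp.status, resp.read().decode())
```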

Fine-tuning Dataset Format

Create a .jsonl file with one JSON object per line:

{"instruction": "What is the capital of France?", "output": "Paris."}
{"instruction": "Write a Python hello world.", "output": "print('Hello, world!')"}

Upload it via Ingest → Upload Fine-tune Dataset, then use the filename in Fine-tune → Dataset File.
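
Before uploading, it can help to check the file locally. The following is a small sketch of such a check. The function validate_jsonl is hypothetical and not part of the project, and the rule that both fields must be non-empty strings is an assumption about what the trainer expects.

```python
import json
from typing import List, Tuple

REQUIRED_KEYS = ("instruction", "output")


def validate_jsonl(text: str) -> Tuple[int, List[str]]:
    """Return (valid_record_count, error_messages) for JSONL content."""
    valid = 0
    errors: List[str] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict):
            errors.append(f"line {lineno}: expected a JSON object")
            continue
        # Assumed rule: each required key must hold a non-empty string.
        missing = [
            key for key in REQUIRED_KEYS
            if not isinstance(record.get(key), str) or not record[key].strip()
        ]
        if missing:
            errors.append(f"line {lineno}: missing or empty {', '.join(missing)}")
        else:
            valid += 1
    return valid, errors


sample = "\n".join([
    '{"instruction": "What is the capital of France?", "output": "Paris."}',
    '{"instruction": "No answer here"}',
    "not json",
])
count, problems = validate_jsonl(sample)
print(count, problems)
```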


GPU Support

To enable NVIDIA GPU support, uncomment the deploy.resources block in docker-compose.yml:

deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]

Requires the NVIDIA Container Toolkit on the host.


Environment Variables

Variable       Default              Description
OLLAMA_URL     http://ollama:11434  Ollama API endpoint
CHROMA_HOST    chromadb             ChromaDB host
DEFAULT_MODEL  llama3.2:3b          Default Ollama model
EMBED_MODEL    all-MiniLM-L6-v2     Sentence-transformer model
RAG_TOP_K      5                    Default RAG chunks to retrieve
CHUNK_SIZE     512                  Words per chunk
CHUNK_OVERLAP  64                   Overlap between chunks
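
To illustrate how these variables might be consumed, here is a minimal sketch that reads them with the documented defaults and applies CHUNK_SIZE and CHUNK_OVERLAP in a word-based splitter. The names Settings, load_settings, and chunk_words are hypothetical. The actual backend may load its configuration and chunk text differently, and treating the overlap as a word count is an assumption.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class Settings:
    ollama_url: str
    chroma_host: str
    default_model: str
    embed_model: str
    rag_top_k: int
    chunk_size: int
    chunk_overlap: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from a mapping, falling back to the documented defaults."""
    return Settings(
        ollama_url=env.get("OLLAMA_URL", "http://ollama:11434"),
        chroma_host=env.get("CHROMA_HOST", "chromadb"),
        default_model=env.get("DEFAULT_MODEL", "llama3.2:3b"),
        embed_model=env.get("EMBED_MODEL", "all-MiniLM-L6-v2"),
        rag_top_k=int(env.get("RAG_TOP_K", "5")),
        chunk_size=int(env.get("CHUNK_SIZE", "512")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "64")),
    )


def chunk_words(text: str, size: int, overlap: int) -> List[str]:
    """Split text into chunks of `size` words, each sharing `overlap` words
    with the previous chunk (assumed interpretation of CHUNK_OVERLAP)."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


settings = load_settings({})  # empty mapping, so every default applies
demo = chunk_words("0 1 2 3 4 5 6 7 8 9", size=4, overlap=1)
print(settings.chunk_size, settings.chunk_overlap, demo)
```

With the defaults, each chunk of 512 words would start 448 words after the previous one.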

Extending

  • Add PDF support: Install pypdf and extend _extract_text() in routers/data.py
  • Add evaluation: Create services/eval_service.py with accuracy metrics
  • Scheduled retraining: Add a cron job or APScheduler that calls start_finetune() with new data
  • Export to Ollama: After fine-tuning, merge the LoRA adapter into the base model, convert the merged model to GGUF with llama.cpp, and load it into Ollama via ollama create
  • Swap embedding model: Change EMBED_MODEL env var to any sentence-transformers model
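
For the evaluation idea above, one possible starting point is an exact-match accuracy metric over (prediction, reference) pairs, for example model outputs compared against the output field of a held-out JSONL dataset. This is only a sketch of what a services/eval_service.py could contain. The function names and the normalization rule (lowercasing and collapsing whitespace) are assumptions.

```python
from typing import Iterable, Tuple


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(text.lower().split())


def exact_match_accuracy(pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (prediction, reference) pairs that match after normalization."""
    pairs = list(pairs)
    if not pairs:
        return 0.0  # no examples: report zero rather than divide by zero
    hits = sum(normalize(pred) == normalize(ref) for pred, ref in pairs)
    return hits / len(pairs)


score = exact_match_accuracy([
    ("Paris.", "paris."),               # match after lowercasing
    ("Berlin", "Paris."),               # mismatch
    ("  print('hi') ", "print('hi')"),  # match after whitespace cleanup
])
print(f"{score:.2f}")
```

Exact match is strict for free-form answers, so a softer metric may suit generative outputs better.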

About

A simple project for fine-tuning offline models (here, any model from Ollama), combining RAG, LoRA, and Docker.
