-
Notifications
You must be signed in to change notification settings - Fork 17
Install Ollama Local LLM Linux
Running a local LLM keeps all data private and offline. There are no subscription fees. Hardware and electricity costs apply.
It requires Ollama and a capable GPU.
To run Elite Dangerous and the LLM on the same machine, a minimum of an NVIDIA RTX 3060 with 12 GB VRAM is required. Performance headroom is limited at this specification.
Tip: Elite Intel can be pointed at an Ollama instance running on a separate PC on your network. If a second machine with a capable GPU is available, the game PC carries no inference load in this configuration.
| Model | VRAM Required | Notes |
|---|---|---|
Tulu-3.1-8B-SuperNova-Q4_K_M |
~5 GB | β Recommended. Reliable for commands and queries. |
qwen3 8B |
~8 GB | Experimental. Expect occasional missed commands and hallucinations. |
Note: For the fastest local inference, consider LM Studio with
matrixportalx/tulu-3.1-8b-supernova. In testing, it is noticeably faster than Ollama on the same hardware with the same model.
curl -fsSL https://ollama.com/install.sh | shOllama installs as a systemd service and starts automatically.
ollama pull hf.co/matrixportalx/Tulu-3.1-8B-SuperNova-Q4_K_M-GGUFOr experimental alternatives:
ollama pull qwen3:8bOllama works without tuning. The following configuration improves VRAM management when running alongside Elite Dangerous.
sudo nano /etc/systemd/system/ollama.service.d/override.confPaste this in:
[Service]
Environment="OLLAMA_MAX_VRAM=14000000000"
Environment="OLLAMA_DEBUG=0"
Environment="OLLAMA_NUM_PARALLEL=3"
Environment="OLLAMA_MAX_LOADED_MODELS=1"
Environment="OLLAMA_FLASH_ATTENTION=1"
Environment="OLLAMA_KEEP_ALIVE=-1"
Nice=10
IOSchedulingClass=best-effort
IOSchedulingPriority=5Then reload and restart:
sudo systemctl daemon-reload
sudo systemctl restart ollama.serviceOLLAMA_MAX_VRAM: Hard cap on VRAM Ollama can use, in bytes. 14000000000 = 14 GB. Leaves the remainder for Elite Dangerous. Adjust based on your GPU and game requirements.
OLLAMA_NUM_PARALLEL: Number of requests Ollama handles simultaneously. Elite Intel makes async calls, so setting this too low causes failures. 3 covers the typical command and query overlap without over-allocating.
OLLAMA_MAX_LOADED_MODELS: Keeps only one model in VRAM at a time.
OLLAMA_FLASH_ATTENTION: Enables Flash Attention, which reduces memory bandwidth usage during inference. Generally faster, especially for repeated requests.
OLLAMA_KEEP_ALIVE=-1: Keeps the model loaded in VRAM indefinitely. Without this, Ollama may unload the model after a period of inactivity, incurring a reload penalty on the next request.
Open the Settings tab in Elite Intel:
- Leave the LLM Key field blank (local Ollama does not require one).
-
LLM Address defaults to
http://localhost:11434/api/chat. If Ollama is on another machine, replacelocalhostwith that machine's IP. -
Command LLM: set to
hf.co/matrixportalx/Tulu-3.1-8B-SuperNova-Q4_K_M-GGUF:latest(or the name shown byollama ls). -
Query LLM: set to
hf.co/matrixportalx/Tulu-3.1-8B-SuperNova-Q4_K_M-GGUF:latest(or the name shown byollama ls). - Click Stop then Start on the AI tab to apply changes.
Community πMatrixπ