This project runs local LLMs on your DGX Spark and exposes a nice web UI over the internet via Cloudflare Tunnel at https://nated.ai.
It includes:
- Ollama – runs local language models with GPU acceleration on port 11434.
- AnythingLLM – a full RAG/chat UI that talks to Ollama (and optionally OpenAI) on port 3001.
- Cloudflare Tunnel (cloudflared) – securely publishes AnythingLLM at nated.ai using your Cloudflare account.
- ollama
  - Docker image: ollama/ollama:latest
  - Host port: 11434
  - Data volume: ./ollama_data:/root/.ollama
  - Uses the NVIDIA runtime with NVIDIA_VISIBLE_DEVICES=all.
- anythingllm
  - Docker image: mintplexlabs/anythingllm:latest
  - Host port: 3001
  - Data volume: ./anythingllm_storage:/app/server/storage
  - Uses Ollama as the default LLM and embedding engine.
  - Can optionally use OpenAI via your OPENAI_API_KEY.
- tunnel-spark
  - Docker image: cloudflare/cloudflared:latest
  - Runs cloudflared tunnel run with your TUNNEL_TOKEN.
  - Exposes AnythingLLM at https://nated.ai through Cloudflare's network.
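
Once the stack is running (see the start-up steps below), a quick hedged check from the DGX itself confirms both host ports are actually serving. The sketch below assumes the default ports above and uses Ollama's /api/version endpoint plus a plain HTTP HEAD request.

```bash
# Run on the DGX after `docker compose up -d`.
curl http://localhost:11434/api/version   # Ollama should return a small JSON version payload
curl -I http://localhost:3001             # AnythingLLM should answer with HTTP response headers
```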
In the same folder as docker-compose.yml, create a .env file:
```
OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
TUNNEL_TOKEN=eyJhIjoi...paste-from-cloudflare...
```

- OPENAI_API_KEY – your real OpenAI API key.
- TUNNEL_TOKEN – the token Cloudflare gives you when you create the tunnel for nated.ai.
Docker Compose will automatically substitute ${OPENAI_API_KEY} and ${TUNNEL_TOKEN} in docker-compose.yml.
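
If you want to confirm the substitution before starting anything, docker compose config renders the effective configuration with the .env values already interpolated. A minimal sketch, assuming both variables are referenced in docker-compose.yml as described above:

```bash
# Print the merged compose file with ${...} variables resolved,
# showing only the lines populated from .env.
docker compose config | grep -E "OPENAI_API_KEY|TUNNEL_TOKEN"
```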
From the DGX Spark, in the directory with docker-compose.yml:
```bash
docker compose pull
docker compose up -d
```

Check containers:

```bash
docker ps
```

You should see ollama, anythingllm, and tunnel-spark running.
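
Because the ollama service uses the NVIDIA runtime, it is also worth confirming the GPU is visible from inside the container. This assumes the NVIDIA Container Toolkit mounts nvidia-smi into the container, which is its usual behaviour:

```bash
# Should list the DGX GPU(s) as seen by the ollama container.
docker exec -it ollama nvidia-smi
```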
- On the DGX, get its IP address:

  ```bash
  hostname -I
  ```

- From your Mac (or any machine on the same network), open in a browser:

  http://<DGX_IP>:3001      # example: http://192.168.1.50:3001

- First-time AnythingLLM setup:
  - Create an admin account.
  - Open Settings → LLM Preference.
  - Choose Ollama as the LLM provider.
  - Confirm:
    - Base URL: http://ollama:11434
    - Default model: llama3.1:8b-instruct-q4_K_M (or any other installed model).
  - Save.
You now have a local, GPU-accelerated assistant powered by Ollama, managed through AnythingLLM.
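
For a quick sanity check outside the web UI, you can call Ollama's REST API directly on the DGX. A minimal sketch, assuming the llama3.1:8b-instruct-q4_K_M model is already pulled (see the model-management section below):

```bash
# One-off, non-streaming generation request against the local Ollama API.
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b-instruct-q4_K_M",
  "prompt": "Say hello in one short sentence.",
  "stream": false
}'
```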
- Log into the Cloudflare dashboard for nated.ai.
- Go to Zero Trust → Networks → Tunnels.
- Create a new tunnel (for example, name it spark-tunnel).
- Choose the Cloudflared connector option.
- Cloudflare will show a command containing a --token value, e.g.:

  ```bash
  cloudflared tunnel run --token <TUNNEL_TOKEN>
  ```

- Copy the token portion (<TUNNEL_TOKEN>) into your .env file as TUNNEL_TOKEN.
- Restart the tunnel container:

  ```bash
  docker compose up -d tunnel-spark
  ```
The tunnel-spark service now connects your DGX to Cloudflare.
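
To confirm the connector actually established its outbound connections to Cloudflare, the container logs are the quickest place to look. The command is standard Docker; the exact log wording is cloudflared's own and may differ between versions:

```bash
# Tail the connector's recent output; a healthy tunnel reports its
# registered connections to Cloudflare shortly after start-up.
docker logs --tail 50 tunnel-spark
```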
In the Cloudflare dashboard:
- Edit your tunnel configuration.
- Add a Public Hostname:
  - Hostname: nated.ai
  - Type: HTTP
  - URL / Service: http://<DGX_LOCAL_IP>:3001 (use the same IP from hostname -I).
- Make sure nated.ai’s DNS record is proxied by Cloudflare (orange cloud icon).
Once the tunnel is healthy, you can reach AnythingLLM from anywhere via:
https://nated.ai
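
As a final end-to-end check, hit the public hostname from any machine (it does not need to be on your network):

```bash
# Expect HTTP response headers served through Cloudflare,
# not a connection error.
curl -I https://nated.ai
```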
Ollama runs as a server inside the ollama container. You add or manage models using the ollama CLI inside that container.
On the DGX:
```bash
docker exec -it ollama bash
```

Now you are inside the container.

```bash
ollama list
```

This shows all models currently downloaded and ready to use.
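
You do not have to keep an interactive shell open: any ollama subcommand can also be run one-off through docker exec, for example:

```bash
# Equivalent to running `ollama list` from inside the container.
docker exec ollama ollama list
```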
Use ollama pull with the model name you want. Examples:
```bash
# Smaller / faster models
ollama pull llama3.1:8b-instruct-q4_K_M
ollama pull mistral:7b-instruct

# Larger, more capable models (if you have the memory and patience)
ollama pull llama3.1:70b
ollama pull gemma2:27b
```

Ollama will download the model and store it in /root/.ollama (which is mapped to ./ollama_data on the host).
To test a model interactively:

```bash
ollama run llama3.1:8b-instruct-q4_K_M
```

Type a quick prompt to verify it works, then press Ctrl+C to exit.
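
Because models live under ./ollama_data on the host, it is easy to keep an eye on disk usage and drop models you no longer need. A small sketch, run from the project directory on the DGX; the model name is just an example:

```bash
# Disk space taken by downloaded models on the host
du -sh ./ollama_data

# Remove a model you no longer need
docker exec ollama ollama rm mistral:7b-instruct
```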
Once Ollama has a model, AnythingLLM can use it by referring to the model name.
You have two ways to set which model AnythingLLM uses:
In docker-compose.yml, update:
```yaml
environment:
  - OLLAMA_MODEL_PREF=llama3.1:8b-instruct-q4_K_M
```

to the new model name, e.g.:

```yaml
  - OLLAMA_MODEL_PREF=llama3.1:70b
```

Then restart AnythingLLM:

```bash
docker compose up -d anythingllm
```

AnythingLLM will now default to the new model for chats (assuming it exists in Ollama).
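
To double-check that the new preference actually reached the container after the restart, inspect its environment. This assumes the image ships the usual env utility, which most Linux-based images do:

```bash
# Should print the model name you set in docker-compose.yml.
docker exec anythingllm env | grep OLLAMA_MODEL_PREF
```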
Or, in the AnythingLLM UI:

- Go to the AnythingLLM web UI (http://<DGX_IP>:3001 or https://nated.ai).
- Open Settings → LLM Preference.
- Select Ollama as the provider if it isn’t already.
- Set the Model field to the new model name, e.g.:
  - llama3.1:8b-instruct-q4_K_M
  - llama3.1:70b
  - mistral:7b-instruct, etc.
- Save.
From now on, that workspace will send prompts to the selected Ollama model.
Because OPENAI_API_KEY is passed into the AnythingLLM container, you can configure OpenAI as an additional provider:
- In the AnythingLLM UI, open Settings → LLM Preference.
- Choose OpenAI (or the equivalent option).
- Set:
  - API key: it should read from the environment, or you can paste it manually.
  - Model: e.g. gpt-4.1, gpt-4.1-mini, etc.
- Save.
Now you can:
- Use Ollama for most local/private/general work.
- Switch specific workspaces or agents to OpenAI for more powerful or specialized tasks.
From the DGX project directory:
```bash
# Start or update the stack
docker compose pull
docker compose up -d

# View logs
docker logs -f ollama
docker logs -f anythingllm
docker logs -f tunnel-spark

# Shell into Ollama container
docker exec -it ollama bash

# Stop everything
docker compose down
```
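
A few extra commands that can be handy day to day; these assume the same service names as above:

```bash
# Restart a single service after editing docker-compose.yml or .env
docker compose restart anythingllm

# Follow logs for the whole stack at once
docker compose logs -f

# See how much disk Docker images and volumes are using
docker system df
```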