-
Notifications
You must be signed in to change notification settings - Fork 1
Installation
Scenarios covered:
- Mac One-Click β automatic installation (recommended, zero configuration)
- Mac β manual native installation (maximum LLM performance with llama.cpp)
- Linux native β direct installation with local llama.cpp
- Linux Docker β Docker with Ollama as a separate container (alternative)
- Windows β Docker with llama.cpp server as a separate container
- One-liner β guided installation from scratch with Docker (AI option included, recommended for non-technical users)
The LLM backend, server URL, API keys and model are configured from within the app on the βοΈ Settings page β not in the
.envfile.The
.envfile contains only the database path and the taxonomy path.
The configuration is saved in the database (user_settings) and persists across restarts. On first launch the app uses llama.cpp as the default backend β no external service needed.
No, with the one-click installation the model is downloaded automatically on first launch. The app detects your hardware (RAM, GPU, VRAM) and downloads the optimal model:
| Effective memory | Model | Size |
|---|---|---|
| 4 GB | Qwen2.5-1.5B | 1.1 GB |
| 8 GB | Qwen2.5-3B | 2.1 GB |
| 12 GB | Qwen2.5-7B | 4.7 GB |
| 16+ GB | Gemma-3-12B | 6.8 GB |
How is effective memory determined? On Mac (Apple Silicon) memory is unified, so it equals RAM. On Linux/Windows with a discrete GPU (NVIDIA or AMD ROCm), VRAM is the bottleneck: the app detects VRAM via
nvidia-smiorrocm-smiand usesmin(RAM, VRAM). If no GPU is detected, system RAM is used.
Note: the app works without an active LLM. Import, ledger, rules, analytics and reports are always available. If the LLM is unreachable, transactions receive the category "Other" and
to_review=True.
Prerequisites: macOS 12+, Python 3.11+, internet connection
- Download
install_spendifai.commandfrom the repository - Double-click the file in Finder
- The script:
- Checks Python and installs
uv(package manager) - Downloads Spendif.ai to
~/Applications/Spendif.ai/ - Installs all dependencies
- Detects your hardware (RAM, GPU, VRAM) and recommends the optimal LLM model
- Checks Python and installs
- On the first import, the model is downloaded automatically
To launch Spendif.ai every day: double-click Spendif.ai.command in ~/Applications/Spendif.ai/packaging/macos/
Minimum HW: 4 GB RAM, Apple Silicon or Intel with AVX2. Metal GPU acceleration is automatic.
llama.cpp on Mac uses Metal (Apple Silicon) acceleration automatically. Inside Docker this acceleration is not available β inference is 5-10x slower.
-
Python >= 3.13 β check with
python3 --version. On macOS:brew install python@3.13 - Git
git clone https://github.com/drake69/spendif-ai.git spendifai
cd spendifaicurl -LsSf https://astral.sh/uv/install.sh | sh
source ~/.zshrc # or reopen the terminaluv synccp .env.example .env
# The .env file requires no changes for a standard local installationThe model is loaded directly by llama.cpp (built into Spendif.ai) β no external service to install.
# Download a GGUF model (once)
# Choose based on your available RAM:
# RAM >= 16 GB (recommended):
uv run huggingface-cli download google/gemma-3-12b-it-GGUF gemma-3-12b-it-Q4_K_M.gguf \
--local-dir ~/.spendifai/models
# RAM 8 GB:
uv run huggingface-cli download Qwen/Qwen2.5-7B-Instruct-GGUF qwen2.5-7b-instruct-q4_k_m.gguf \
--local-dir ~/.spendifai/models
# RAM 4-6 GB β Gemma 4 E2B (latest architecture, great for Italian):
uv run huggingface-cli download unsloth/gemma-4-E2B-it-GGUF gemma-4-E2B-it-Q4_K_M.gguf \
--local-dir ~/.spendifai/models
# RAM 4 GB β lightweight alternative:
uv run huggingface-cli download Qwen/Qwen2.5-3B-Instruct-GGUF qwen2.5-3b-instruct-q4_k_m.gguf \
--local-dir ~/.spendifai/modelsThe model is downloaded only once into
~/.spendifai/models/and reused on every subsequent launch. Alternatively, you can download the model directly from the app in βοΈ Settings β Download model.
Gemma 4 E2B: requires an up-to-date
llama-cpp-python. If you getunknown model architecture: 'gemma4', run:uv pip install --upgrade llama-cpp-python.
Qwen 3.5 (hybrid SSM architecture): the standard PyPI wheel is not enough β a source build is required. If you get
missing tensor 'blk.X.ssm_conv1d.weight'orunknown model architecture: 'qwen3', run:bash scripts/setup_ssm_build.sh. The script detects the GPU backend, compiles with the rightCMAKE_ARGS(Metal/CUDA/ROCm/Vulkan) and registersllama_cpp_pythoninbenchmark/.custom_packagesso the build is preserved across future syncs.
# Startup script (recommended) β checks prerequisites, activates virtualenv and starts
./start.sh # UI only (default)
./start.sh api # REST API only
./start.sh all # UI + API
# Or manually
uv run streamlit run app.pyThe app is available at http://localhost:8501
Go to βοΈ Settings β LLM Backend section:
- Backend:
llama.cpp (local, zero-config)β already selected by default - Model path: the downloaded
.gguffile is detected automatically
Ollama alternative: if you prefer Ollama, install it (
brew install ollama), download a model (ollama pull gemma3:12b), and selectOllama (local)in settings.
./start.sh # no service to start β llama.cpp is built inSame procedure as Mac. llama.cpp automatically supports NVIDIA GPUs (CUDA) if drivers are installed, and CPUs with AVX2. Recommended if you want to avoid Docker.
-
Python >= 3.13 β check with
python3 --version. On Ubuntu/Debian:sudo add-apt-repository ppa:deadsnakes/ppa && sudo apt install python3.13 python3.13-venv -
Git β
sudo apt install git -
curl β
sudo apt install curl
git clone https://github.com/drake69/spendif-ai.git spendifai
cd spendifaicurl -LsSf https://astral.sh/uv/install.sh | sh
source ~/.bashrc # or reopen the terminaluv synccp .env.example .env
# The .env file requires no changes for a standard local installation# Download a GGUF model (once β choose based on your RAM):
# RAM >= 16 GB (recommended):
uv run huggingface-cli download google/gemma-3-12b-it-GGUF gemma-3-12b-it-Q4_K_M.gguf \
--local-dir ~/.spendifai/models
# RAM 8 GB:
uv run huggingface-cli download Qwen/Qwen2.5-7B-Instruct-GGUF qwen2.5-7b-instruct-q4_k_m.gguf \
--local-dir ~/.spendifai/models
# RAM 4-6 GB β Gemma 4 E2B (latest architecture, great for Italian):
uv run huggingface-cli download unsloth/gemma-4-E2B-it-GGUF gemma-4-E2B-it-Q4_K_M.gguf \
--local-dir ~/.spendifai/models
# RAM 4 GB β lightweight alternative:
uv run huggingface-cli download Qwen/Qwen2.5-3B-Instruct-GGUF qwen2.5-3b-instruct-q4_k_m.gguf \
--local-dir ~/.spendifai/modelsGPU on Linux: llama.cpp uses CUDA automatically if NVIDIA drivers are installed. For AMD GPUs (ROCm):
CMAKE_ARGS="-DGGML_HIPBLAS=on" uv pip install llama-cpp-python --upgrade(requiresrocm-devandhipblas-dev).
VRAM note: if downloading manually, choose the model based on your GPU VRAM, not system RAM. On automatic first launch, Spendif.ai detects VRAM via
nvidia-smi(NVIDIA) orrocm-smi(AMD) and downloads the appropriate model.
./start.sh # UI only (default)
./start.sh api # REST API only
./start.sh all # UI + APIThe app is available at http://localhost:8501
Go to βοΈ Settings β LLM Backend section:
- Backend:
llama.cpp (local, zero-config)β already selected by default - Model path: detected automatically from
~/.spendifai/models/
Ollama alternative: install with
curl -fsSL https://ollama.com/install.sh | sh, download a model (ollama pull gemma3:12b), and selectOllama (local)in settings.
./start.sh # no service to startThis configuration starts two containers:
-
spendifai_appβ the web application -
spendifai_ollamaβ the Ollama LLM server
git clone https://github.com/drake69/spendif-ai.git spendifai
cd spendifaicp .env.example .env
# No changes needed for the base installationInstall the toolkit and uncomment the GPU lines in docker-compose.yml under the ollama section:
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart dockerThen in docker-compose.yml uncomment:
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]docker compose --profile ollama up -dWith the one-liner installation (
install.sh) the model is downloaded automatically β this step is not required. The command above is for repository-based installations.
Note: if you used
install.shwith local AI, the download is already running in the background via theollama-initcontainer. Check with:docker compose --project-directory ~/spendifai logs -f ollama-init
# ~8 GB download, a few minutes of waiting
docker compose exec ollama ollama pull gemma3:12b
# Lighter version:
# docker compose exec ollama ollama pull gemma3:4bThe model is saved in the Docker volume
spendifai_ollama_modelsand persists across restarts. No need to download it again.
Go to βοΈ Settings β LLM Backend section:
- Backend:
Ollama (local) - URL:
http://ollama:11434β Docker service name, notlocalhost - Model:
gemma3:12b
docker compose --profile ollama down # stop
docker compose --profile ollama up -d # restart
docker compose logs -f # logs
docker compose exec ollama ollama list # downloaded modelsOn Windows we use llama.cpp server as the LLM backend because it works in Docker without complex GPU configuration and is compatible with the OpenAI API (already supported by Spendif.ai).
- Docker Desktop installed and running (WSL2 backend)
- Git for Windows: https://git-scm.com/download/win
Open PowerShell or Git Bash:
git clone https://github.com/drake69/spendif-ai.git spendifai
cd spendifaiDownload a GGUF file from HuggingFace. Recommended for CPU:
| Model | Size | Required RAM |
|---|---|---|
| gemma-3-4b-it-Q4_K_M.gguf | ~2.5 GB | 6 GB |
| gemma-3-12b-it-Q4_K_M.gguf | ~7.5 GB | 12 GB |
Create the models folder and place the downloaded file there:
mkdir models
# Move the .gguf file into the models\ foldercopy .env.example .envOpen .env and uncomment the LLAMA_MODEL line with the name of the downloaded file:
LLAMA_MODEL=gemma-3-4b-it-Q4_K_M.ggufdocker compose --profile llama-cpp up -dThe first launch takes a few minutes (Docker downloads the images).
Go to βοΈ Settings β LLM Backend section:
- Backend:
OpenAI-compatible - URL:
http://llama-cpp:8080/v1β Docker service name, notlocalhost - API Key:
none - Model: GGUF filename without
.gguf, e.g.gemma-3-4b-it-Q4_K_M
docker compose --profile llama-cpp down # stop
docker compose --profile llama-cpp up -d # restart
docker compose logs -f # logs
docker compose ps # container status- Download LM Studio: https://lmstudio.ai
- In the Discover tab search for and download
gemma-3-4b-it - Go to Local Server β press Start Server (port
1234) - Start Spendif.ai with standard Docker (without profile):
docker compose up -d - In the app βοΈ Settings β LLM Backend:
- Backend:
OpenAI-compatible - URL:
http://host.docker.internal:1234/v1 - API Key:
lm-studio - Model:
gemma-3-4b-it(or the name shown in LM Studio)
- Backend:
The database is saved in the Docker volume spendifai_data and persists across restarts and app updates.
For backup, restore, moving to another computer and direct inspection β Database guide.
| Scenario | Command |
|---|---|
| Mac native | ./start.sh |
| Linux native | ./start.sh |
| Linux + Ollama Docker | docker compose --profile ollama up -d |
| Windows + llama.cpp Docker | docker compose --profile llama-cpp up -d |
| Windows + external LM Studio | docker compose up -d |
Does the model need to be re-downloaded on every launch? No. It is saved only once:
- llama.cpp (native) β
~/.spendifai/models/ - Native Ollama β
~/.ollama/models/ - Ollama Docker β volume
spendifai_ollama_models - llama.cpp Docker β
./models/folder
Can I change the model after the first launch?
Yes. Go to βοΈ Settings and select a different model. For llama.cpp, download the new GGUF into ~/.spendifai/models/. For Ollama: ollama pull <model> (native) or docker compose exec ollama ollama pull <model> (Docker).
Can I use OpenAI or Anthropic instead of a local LLM?
Yes. In βοΈ Settings select OpenAI or Anthropic and enter the API key. No LLM container required.
Does the LLM server URL change between native and Docker installation? Yes:
- LLM running natively on the host β
http://localhost:11434(or1234for LM Studio) - LLM in a Docker container β use the service name (
http://ollama:11434orhttp://llama-cpp:8080) - LLM on the host, app in Docker β
http://host.docker.internal:11434
Can I make a backup with the one-liner installation too?
Yes. The one-liner installation uses the same Docker volume spendifai_data. β Database guide
Can I move my data to another computer? Yes. β Moving the database
How do I completely uninstall Spendif.ai? Use the interactive uninstall script:
curl -fsSL https://raw.githubusercontent.com/drake69/spendifai/main/installer/uninstall.sh | bash
# Windows:
# irm https://raw.githubusercontent.com/drake69/spendifai/main/installer/uninstall.ps1 | iexThe script asks separately whether to remove: database, GGUF models (~/.spendifai/models/), Ollama models, Docker images, installation folder, and shows instructions for uninstalling Docker Desktop.