The fastest way to install llama.cpp on Linux and macOS.
installama.sh is a simple shell script that downloads and sets up a prebuilt llama-server binary for your system.
It automatically detects your OS, architecture, and GPU capabilities, so you can start using llama.cpp in seconds.
- Automatic detection of CPU architecture (x86_64/aarch64) and OS (Linux/macOS).
- Support for GPU acceleration:
  - CUDA: 50, 61, 70, 75, 80, 86, 89.
  - ROCm: gfx803, gfx900, gfx906, gfx908, gfx90a, gfx942, gfx1010, gfx1011, gfx1030, gfx1032, gfx1100, gfx1101, gfx1102, gfx1200, gfx1201, gfx1151.
  - Metal: M1, M2, M3, M4.
- Fallback to CPU-optimized builds if no GPU is available.
- Lightweight and fast!
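The detection itself comes down to a few standard checks before the matching prebuilt binary is fetched. Here is a minimal, illustrative sketch of the idea (not the actual script; the probes and variable names are assumptions):

```sh
#!/bin/sh
# Illustrative sketch only -- installama.sh does more (picking the exact CUDA/ROCm target, etc.)
os=$(uname -s)        # Linux or Darwin (macOS)
arch=$(uname -m)      # x86_64 or aarch64/arm64

backend=cpu
if command -v nvidia-smi >/dev/null 2>&1; then
    backend=cuda      # an NVIDIA driver is installed
elif command -v rocminfo >/dev/null 2>&1; then
    backend=rocm      # an AMD ROCm stack is installed
elif [ "$os" = "Darwin" ]; then
    backend=metal     # Apple Silicon uses Metal
fi

echo "Would download a llama-server build for $os/$arch ($backend)"
```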
Install llama-server in one easy step:
curl angt.github.io/installama.sh | sh
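The binary lands in ~/.installama, as the examples below show. If you want to call llama-server without the full path, you can add that directory to your PATH (assuming a bash- or zsh-style shell):

export PATH="$HOME/.installama:$PATH"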
Then run the server, for example with the awesome new WebGUI:
~/.installama/llama-server -hf ggml-org/gpt-oss-20b-GGUF --jinja
And open your favorite browser to http://127.0.0.1:8080/.
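The WebGUI is not the only way in: llama-server also exposes an OpenAI-compatible HTTP API, so you can query the model from the command line too. A quick test, assuming the server started above is still running on the default port:

```sh
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "user", "content": "Say hello in one short sentence." }
    ]
  }'
```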
You can also run everything in a single command:
curl angt.github.io/installama.sh | MODEL=ggml-org/gpt-oss-20b-GGUF sh
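If you would rather not pipe straight into sh, you can also fetch the script first, inspect it, and run it yourself; passing the same MODEL variable this way should behave identically:

curl -o installama.sh angt.github.io/installama.sh
MODEL=ggml-org/gpt-oss-20b-GGUF sh installama.sh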
In some scenarios, you may want to skip the CUDA backend. You can do this with the following command:
curl angt.github.io/installama.sh | SKIP_CUDA=1 sh
Skipping ROCm is also possible by setting SKIP_ROCM=1.
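To force a CPU-only build, the two variables should be combinable in a single invocation:

curl angt.github.io/installama.sh | SKIP_CUDA=1 SKIP_ROCM=1 sh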