Resources for running Llama.cpp locally

🎥 Background

I have explored running language models locally using LM Studio and Ollama.

Under the hood, both of these tools use Llama.cpp runtimes.

I am now exploring using Llama.cpp directly to run local LLMs, primarily as a server for integration into applications and CLIs.

✅ Scope

  • Configure the Llama.cpp environment.
  • Create a script to execute Llama.cpp with predefined parameters.
  • Add support for multiple LLMs.
  • Support per-model parameters for tweaking settings based on each model's performance (see the sketch after this list).
  • Integrate into OpenCode.
  • Identify a method for verifying GPU offload.
  • Apply AI-generated optimal llama-server parameter values.
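As a rough sketch of the per-model parameters idea only: the model file names, context sizes and GPU layer counts below are illustrative assumptions, not the settings actually used by start-llama-cpp.bat.

```bat
@echo off
rem Hypothetical example: file names and values are illustrative only.
set "MODEL=%~1"

if "%MODEL%"=="small" (
    set "MODEL_PATH=models\example-small-q4_k_m.gguf"
    set "MODEL_ARGS=--ctx-size 8192 -ngl 99"
)
if "%MODEL%"=="large" (
    set "MODEL_PATH=models\example-large-q4_k_m.gguf"
    set "MODEL_ARGS=--ctx-size 4096 -ngl 24"
)

rem Launch llama-server with the parameters chosen for this model.
llama-server -m "%MODEL_PATH%" %MODEL_ARGS% --port 8080
```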

🔭 Future Gazing

  • Design a method of benchmarking performance and automate repeatable tests (a possible starting point is sketched below).
  • Apply improvements to each model's execution using the benchmarking results.
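One possible starting point for repeatable tests is the llama-bench tool that ships with Llama.cpp; the model path and values below are illustrative assumptions, not a finished benchmark design.

```bat
rem Benchmark prompt processing (-p tokens) and token generation (-n tokens).
rem Repeat each test (-r) to smooth out run-to-run variance; model path is hypothetical.
llama-bench -m models\example-model.gguf -p 512 -n 128 -ngl 99 -r 5
```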

🪲 Known defects

No known defects.

🔮 Use of AI

GitHub Copilot was used to assist in the development of this software.

🚀 Getting Started

💻 System Requirements

Software

  • Windows
  • Llama.cpp
  • VS Code Insiders
  • Windows Terminal
  • GPU-Z

Note

Other operating systems and versions will work; where versions are specified, treat them as minimums.

Note

TechPowerUp's GPU-Z is optional. This application provides a simple method of verifying GPU offload.

Hardware

A system capable of running Llama.cpp is required.

Details of my personal system are below.

APU

Note

The hardware in my PC includes an Accelerated Processing Unit (APU), which combines the CPU and GPU on a single chip. Llama.cpp is focused on supporting a wide range of hardware. Performance will depend upon your hardware, the use of CPU vs. GPU, the models you choose to run, and other operational factors.

💾 System Configuration

Llama.cpp is installed via Winget; no other configuration is needed.

Note

Works on my machine!
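For reference, a minimal sketch of the install and a later in-place update, assuming the Winget package identifier is llama.cpp:

```bat
rem Install Llama.cpp (includes llama-server, llama-cli and related tools).
winget install llama.cpp

rem Update an existing installation.
winget upgrade llama.cpp
```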

🔧 Development Setup

Clone the repository.

Download the supported models and place them within the models directory.

Note

The repository shows which models I am currently experimenting with. The script currently hardcodes their values.
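As one possible way to fetch a GGUF model into the models directory, assuming the Hugging Face CLI is installed and using a hypothetical repository and file name:

```bat
rem Download a single GGUF file into the local models directory (names are illustrative).
huggingface-cli download example-org/example-model-GGUF example-model-q4_k_m.gguf --local-dir models
```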

Scripts can be executed within the VS Code terminal window or via any other supported terminal, e.g. Windows Terminal.

Note

The scripts are opinionated: they are hardcoded to use Windows Terminal when launching new Llama.cpp servers.
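As a sketch of what such a hardcoded launch might look like, using the Windows Terminal wt command line (the model path and parameters are illustrative):

```bat
rem Launch llama-server in a new Windows Terminal tab; cmd /k keeps the window open.
wt new-tab --title "llama-server" cmd /k llama-server -m models\example-model.gguf -ngl 99 --port 8080
```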

⚡ Features

  • Asks the user on each execution whether they wish to update Llama.cpp.
  • Asks the user which model they wish to run.
  • Runs the selected model in a new Windows Terminal window (a rough sketch of this flow follows below).
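A rough sketch of how this interactive flow could look in a batch script; the prompts, model labels and package name are assumptions, not the actual contents of start-llama-cpp.bat.

```bat
@echo off
rem Offer to update Llama.cpp first (assumes the Winget package ID is llama.cpp).
choice /c YN /m "Update Llama.cpp before launching"
if %errorlevel%==1 winget upgrade llama.cpp

rem Let the user pick a model, then launch it as sketched in the previous section.
echo 1. Example model A
echo 2. Example model B
choice /c 12 /m "Select a model to run"
set "MODEL_CHOICE=%errorlevel%"
```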

📎 Usage

Run start-llama-cpp.bat in your preferred terminal.
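Once the server is running you can check it from another terminal. This assumes llama-server is listening on its default port 8080 and uses its OpenAI-compatible chat endpoint:

```bat
rem Quick health check.
curl http://localhost:8080/health

rem Send a simple chat completion request (JSON quotes escaped for cmd.exe).
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d "{\"messages\":[{\"role\":\"user\",\"content\":\"Hello\"}]}"
```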

Run GPU-Z to verify GPU offload:

[Screenshot: GPU-Z reporting memory load]

🙌 Thanks

Thanks to Nico Domino, who shared his GLM-4.7-Flash Strix Halo Docker setup; I used it as a basis for running my own local Llama.cpp server.

Thanks also to the open source contributors of Llama.cpp.

👋 Contributing

This repository was created primarily for my own exploration of the technologies involved.

🎁 License

I have selected an appropriate license using this tool.

This software is licensed under the MIT license.

📖 Further reading

More detailed information can be found in the documentation.
