English | 简体中文
llama.cpp is written in plain C/C++ with no external dependencies, delivering high performance and cross-platform portability: it runs smoothly on everything from embedded devices to high-performance servers.
Purpose-built for large language model inference, it supports quantization and hardware acceleration, significantly reducing memory usage and compute cost while largely preserving model quality.
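For illustration, quantization is typically performed with the llama-quantize tool that ships with llama.cpp; the model file names below are placeholders.

```bash
# Quantize a 16-bit GGUF model down to 4-bit Q4_K_M,
# roughly quartering its memory footprint.
# Paths are placeholders; point them at your own model files.
./llama-quantize ./models/model-f16.gguf ./models/model-Q4_K_M.gguf Q4_K_M
```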
It supports multiple hardware platforms, including CPU and GPU (CUDA, OpenCL, Metal), and runs on Windows, macOS, and Linux to meet a wide range of deployment needs.
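As a sketch, hardware backends are selected at build time via CMake options; the GGML_CUDA flag below matches recent llama.cpp releases, but option names have changed across versions, so check the documentation for your build.

```bash
# Default CPU-only build.
cmake -B build
cmake --build build --config Release

# Example: enable the CUDA backend on a machine with an NVIDIA GPU.
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
```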
This product is provided as a pre-installed image based on Huawei Cloud EulerOS 2.0 (HCE 2.0) on the Kunpeng architecture.
This project provides the open-source image product llama.cpp Inference Framework, which comes with the llama.cpp inference framework, its runtime environment, and deployment templates pre-installed. Follow the usage guide for an efficient, out-of-the-box experience; a quick-start sketch follows.
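As a hypothetical quick start (binary and model paths are assumptions; consult the image's usage guide for the actual layout), the bundled llama-server exposes an OpenAI-compatible HTTP API:

```bash
# Start the HTTP server with a local GGUF model (path is a placeholder).
llama-server -m ./models/model-Q4_K_M.gguf --host 0.0.0.0 --port 8080

# From another shell, query the OpenAI-compatible chat endpoint.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}]}'
```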
System Requirements:
- CPU: 2 vCPUs or higher
- RAM: 4 GB or more
- Disk: at least 40 GB
Register a Huawei Account and Activate Huawei Cloud
| Image Specification | Features | Notes |
| --- | --- | --- |
| llama.cpp-b5834-Kunpeng | Deployed on a Kunpeng cloud server + Huawei Cloud EulerOS 2.0 64bit | |
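To confirm which build is pre-installed on a freshly provisioned instance, a version check along these lines should work (the binary name and its presence on PATH are assumptions about the image layout):

```bash
# Print the bundled llama.cpp build version (expected to report b5834).
llama-cli --version
```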
- For further questions, contact us via GitHub Issues or Huawei Cloud Marketplace product support
- Other open-source images are available at open-source-image-repos
- Fork this repository and submit merge requests
- Keep README.md in sync with your open-source image information