This repo makes it easy to build binaries for common LLM/NLP libraries. Before cutting a release for any package, follow these key steps:
- Run `update-submodules.yaml` to update all submodules.
- Update the respective workflow to choose your preferred CUDA/Python/OS version.
- Create a new release, including a description with the latest source package version.
- Run the desired workflow file.
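The same flow can be driven from the command line with the GitHub CLI. A minimal sketch, assuming the workflows expose a `workflow_dispatch` trigger; the tag `v0.0.2` and the release notes are placeholders:

```bash
# Sketch of the release flow using the GitHub CLI (gh).
gh workflow run update-submodules.yaml                                        # refresh vendored submodules
gh release create v0.0.2 --notes "Built against llama-cpp-python <version>"   # placeholder tag and notes
gh workflow run llama-build-cuda.yaml                                         # or llama-build-cpu.yaml / negspacy-build.yaml
```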
Custom-built, CUDA-compatible llama-cpp-python wheels are provided; they override the default CMake args for your preferred choice of GPU.
References:
- Source: llama-cpp-python
- CMake documentation: llama.cpp build guide
- Pre-built wheels are generated using the `llama-build-cpu.yaml` or `llama-build-cuda.yaml` workflows
- CUDA Support & Architecture Compatibility for different GPUs: pass the matching architectures to CMake (see the build sketch after the table):

```bash
-DCMAKE_CUDA_ARCHITECTURES="75;80"
```
| Compute Capability | CUDA Architecture | GPUs | Azure Support |
|---|---|---|---|
| sm_50 | Maxwell | GTX 750, Tesla M40 | ❌ |
| sm_60 / sm_61 | Pascal | GTX 1080, Tesla P100 | ❌ |
| sm_70 / sm_72 | Volta | Tesla V100, Jetson AGX Xavier | ❌ |
| sm_75 | Turing | RTX 2080, T4 | ✅ |
| sm_80 | Ampere | A100 | ✅ |
| sm_86 | Ampere | RTX 3090 | ✅ |
| sm_89 / sm_90 | Ada / Hopper | RTX 4090, H100 | ✅ |
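When building the CUDA wheels yourself (or adjusting the workflow), llama-cpp-python forwards CMake options through the `CMAKE_ARGS` environment variable. A hedged sketch targeting the Azure-supported architectures from the table above; the exact flags depend on the vendored llama.cpp version:

```bash
# Build a CUDA-enabled wheel for Turing (75), A100 (80) and RTX 30xx (86).
# Current llama.cpp enables CUDA with -DGGML_CUDA=on; older releases used -DLLAMA_CUBLAS=on.
CMAKE_ARGS="-DGGML_CUDA=on -DCMAKE_CUDA_ARCHITECTURES=75;80;86" \
  pip wheel ./vendor/llama-cpp-python -w dist
```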
Negspacy is a spaCy pipeline component for negation detection. Wheels are built for easy use, as the latest GitHub release was not available on PyPI at the time of writing.
References:
- Source: negspacy repository
- Pre-built wheels are generated using the `negspacy-build.yaml` workflow
- The negspacy repository is included as a git submodule in the `vendor` folder
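To refresh the vendored copy locally instead of waiting for the workflow, a plain git submodule update does the same job; the `vendor/negspacy` path is an assumption based on the layout above:

```bash
# Pull the latest upstream negspacy commit into the vendored checkout
# (vendor/negspacy is the assumed submodule path).
git submodule update --init --remote vendor/negspacy
```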
All artifacts are uploaded as binaries to the release tagged `v0.0.1`:
- scispaCy model: `en_core_sci_md-0.5.4.tar.gz`
- Open MPI (for multi-GPU processing): `openmpi_4.1.6.orig.tar.xz`
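To consume these assets, download them from the release and install them locally. A sketch using the GitHub CLI; the negspacy wheel filename glob is illustrative:

```bash
# Fetch the pre-built assets attached to the v0.0.1 release and install them.
gh release download v0.0.1 --pattern "*.whl" --pattern "en_core_sci_md-0.5.4.tar.gz"
pip install negspacy-*.whl en_core_sci_md-0.5.4.tar.gz
```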
If you run into compatibility issues with the llama_cpp_python CPU release, we recommend updating the workflow to repair the wheel for the shared libraries listed in the commented-out repair command below (a verification sketch with auditwheel follows the workflow step). This tends to happen when the wheel is built on ubuntu-latest (25.04/24.04), whose newer C toolchain produces binaries that cannot run on older Linux machines (e.g., Ubuntu 20.04). By default, 32-bit builds are omitted, as they would require additional shared libraries to be repaired.
```yaml
- name: Build wheels
  uses: pypa/cibuildwheel@v2.21.1
  env:
    CIBW_SKIP: "*manylinux_i686* *musllinux* pp*"
    # Optional: repair with auditwheel while skipping the package's own llama/ggml shared objects.
    # CIBW_REPAIR_WHEEL_COMMAND_LINUX: "auditwheel repair --exclude libllama.so --exclude libggml.so --exclude libggml-base.so --exclude libggml-cpu.so {wheel} -w {dest_dir}"
    CIBW_REPAIR_WHEEL_COMMAND: ""   # empty string disables the default auditwheel repair step
    CIBW_BUILD: "cp312-*"
    CIBW_BUILD_FRONTEND: "build[uv]"
  with:
    package-dir: ./vendor/llama-cpp-python
    output-dir: ./dist

- uses: actions/upload-artifact@v4
  with:
    name: wheels
    path: ./dist/*.whl
```
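Before shipping a CPU wheel to older distros, it can help to check what the built wheel actually requires; `auditwheel show` reports the manylinux policy a wheel satisfies and the glibc symbol versions it depends on. A quick sketch; the wheel filename glob is illustrative:

```bash
# Inspect which manylinux policy the built wheel satisfies and which external
# shared libraries and glibc symbol versions it requires.
auditwheel show dist/llama_cpp_python-*-cp312-*.whl
```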