Merged: 1 (#30)
Changes from all commits (96 commits):
c2082d9 ggml : add `GGML_PAD_REFLECT_1D` operation (ggml/1034) (PABannier, Dec 3, 2024)
a8cbab2 ggml: add `GGML_SET` Metal kernel + i32 CPU kernel (ggml/1037) (PABannier, Dec 4, 2024)
0cd182e sync : ggml (ggerganov, Dec 5, 2024)
6fe6247 llama : add Minerva 7B model support (#10673) (Riccorl, Dec 5, 2024)
c9c6e01 vulkan: Add VK_NV_cooperative_matrix2 support for mul_mat and flash a… (jeffbolznv, Dec 5, 2024)
7736837 fix(server) : not show alert when DONE is received (#10674) (pminev, Dec 5, 2024)
6c5bc06 server : (refactoring) do not rely on JSON internally (#10643) (ngxson, Dec 6, 2024)
f162d45 common : bring back --no-warmup to server (#10686) (ngxson, Dec 6, 2024)
c5ede38 convert : add custom attention mapping (ggerganov, Dec 6, 2024)
784a14a convert : add support for Roberta embeddings (#10695) (Ssukriti, Dec 7, 2024)
86a1934 metal : Extend how Llama.cpp locates metal resources (#10676) (ormandi, Dec 7, 2024)
3df784b Vulkan: VK_KHR_cooperative_matrix support to speed up prompt processi… (0cc4m, Dec 7, 2024)
c2a16c0 server : fix free of spec context and batch (#10651) (ggerganov, Dec 7, 2024)
19d8762 ggml : refactor online repacking (#10446) (Djip007, Dec 7, 2024)
ce4a7b8 server : various fixes (#10704) (ggerganov, Dec 7, 2024)
d9c3ba2 ggml : disable iq4_nl interleave size 8 (#10709) (ggerganov, Dec 7, 2024)
3573fa8 server : (refactor) no more json in server_task input (#10691) (ngxson, Dec 7, 2024)
62e84d9 llama : add 128k yarn context for Qwen (#10698) (robbiemu, Dec 7, 2024)
ecc93d0 vulkan: compile a test shader in cmake to check for coopmat2 support … (jeffbolznv, Dec 8, 2024)
43ed389 llama : use cmake for swift build (#10525) (slaren, Dec 8, 2024)
06d7014 Vulkan: fix NaN in tanh.comp with AMD proprietary driver on Windows (… (stduhpf, Dec 8, 2024)
e52522b server : bring back info of final chunk in stream mode (#10722) (ngxson, Dec 8, 2024)
ce8784b server : fix format_infill (#10724) (ngxson, Dec 8, 2024)
1a05004 cmake : simplify msvc charsets (#10672) (iboB, Dec 9, 2024)
3d98b4c vulkan: fix compile warnings (#10731) (jeffbolznv, Dec 9, 2024)
c37fb4c Changes to CMakePresets.json to add ninja clang target on windows (#1… (Srihari-mcw, Dec 9, 2024)
26a8406 CUDA: fix shared memory access condition for mmv (#10740) (JohannesGaessler, Dec 9, 2024)
a05e2af vulkan: disable spirv-opt for coopmat shaders (#10763) (jeffbolznv, Dec 10, 2024)
a86ad84 server : add flag to disable the web-ui (#10762) (#10751) (eugeniosegala, Dec 10, 2024)
750cb3e CUDA: rename macros to avoid conflicts with WinAPI (#10736) (aendk, Dec 10, 2024)
ae4b922 imatrix : Add imatrix to --no-context-shift (#10766) (bartowski1182, Dec 10, 2024)
dafae66 vulkan: dynamic subgroup size for the remaining k quants (#10745) (netrunnereve, Dec 10, 2024)
b685daf vulkan: request round-to-even for fp16 in im2col/rope_head (#10767) (jeffbolznv, Dec 10, 2024)
43041d2 ggml: load all backends from a user-provided search path (#10699) (giladgd, Dec 11, 2024)
4b4d92b docs: fix server documentation formatting (#10776) (CentricStorm, Dec 11, 2024)
484d2f3 bug-fix: snprintf prints NULL in place of the last character (#10419) (kallewoof, Dec 11, 2024)
92f77a6 ci : pin nodejs to 22.11.0 (#10779) (ngxson, Dec 11, 2024)
1a31d0d Update README.md (#10772) (Dec 11, 2024)
235f6e1 server : (UI) add tok/s, get rid of completion.js (#10786) (ngxson, Dec 11, 2024)
fb18934 gguf-py : bump version to 0.11.0 (ggerganov, Dec 11, 2024)
973f328 Merge pull request #10788 from ggerganov/gg/gguf-py-0.11.0 (ggerganov, Dec 11, 2024)
5555c0c docs: update server streaming mode documentation (#9519) (CentricStorm, Dec 11, 2024)
9fdb124 common : add missing env var for speculative (#10801) (ngxson, Dec 12, 2024)
dc5301d Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgr… (0cc4m, Dec 12, 2024)
4064c0e Vulkan: Use improved q4_k and q5_k dequant code in dequant shaders (#… (0cc4m, Dec 12, 2024)
cb13ef8 remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS (#10797) (slaren, Dec 12, 2024)
8faa1d4 CUDA: faster non-contiguous concat (#10760) (A3shTnT, Dec 12, 2024)
274ec65 contrib : add ngxson as codeowner (#10804) (ngxson, Dec 12, 2024)
adffa6f common : improve -ctv -ctk CLI arguments (#10806) (ngxson, Dec 12, 2024)
d583cd0 ggml : Fix compilation issues on ARM platform when building without f… (kkontny, Dec 13, 2024)
83ed24a SYCL: Reduce most of the compiler warnings (#10748) (qnixsynapse, Dec 13, 2024)
64ae065 vulkan: small mul_mat_vec optimizations (#10665) (netrunnereve, Dec 13, 2024)
9f35e44 Fix crash caused by ggml_backend_load_all when launching on Android A… (sienaiwun, Dec 13, 2024)
4601a8b gguf-py : numpy 2 newbyteorder fix (#9772) (jettjaniak, Dec 13, 2024)
11e07fd fix: graceful shutdown for Docker images (#10815) (co42, Dec 13, 2024)
c27ac67 Opt class for positional argument handling (#10508) (ericcurtin, Dec 13, 2024)
a76c56f Introducing experimental OpenCL backend with support for Qualcomm Adr… (lhez, Dec 13, 2024)
56eea07 Removes spurious \r in output that causes logging in journalctl to tr… (cduk, Dec 13, 2024)
ba1cb19 llama : add Qwen2VL support + multimodal RoPE (#10361) (HimariO, Dec 14, 2024)
e52aba5 nix: allow to override rocm gpu targets (#10794) (kurnevsky, Dec 14, 2024)
89d604f server: Fix `has_next_line` in JSON response (#10818) (MichelleTanPY, Dec 14, 2024)
b5ae1dd gguf-py : bump to v0.13.0 (ggerganov, Dec 15, 2024)
5478bbc server: (UI) add syntax highlighting and latex math rendering (#10808) (VJHack, Dec 15, 2024)
87cf323 scripts : change build path to "build-bench" for compare-commits.sh (… (ggerganov, Dec 15, 2024)
a097415 llama : add Deepseek MoE v1 & GigaChat models (#10827) (Inf1delis, Dec 15, 2024)
4ddd199 llava : Allow locally downloaded models for QwenVL (#10833) (bartowski1182, Dec 15, 2024)
644fd71 sampling : refactor + optimize penalties sampler (#10803) (ggerganov, Dec 16, 2024)
08ea539 unicode : improve naming style (#10838) (ggerganov, Dec 16, 2024)
160bc03 rwkv6: add wkv6 support for Vulkan backend (#10829) (zhiyuan1i, Dec 16, 2024)
7b1ec53 vulkan: bugfixes for small subgroup size systems + llvmpipe test (#10… (netrunnereve, Dec 17, 2024)
227d7c5 server : (UI) fix missing async generator on safari (#10857) (ngxson, Dec 17, 2024)
4f51968 readme : update typos (#10863) (ruanych, Dec 17, 2024)
382bc7f llama : add Falcon3 support (#10864) (mokeddembillel, Dec 17, 2024)
05c3a44 server : fill usage info in embeddings and rerank responses (#10852) (krystiancha, Dec 17, 2024)
0006f5a ggml : update ggml_backend_cpu_device_supports_op (#10867) (ggerganov, Dec 17, 2024)
3919da8 ggml : add check for grad_accs (ggml/1046) (danbev, Dec 13, 2024)
130d0c9 ggml : remove return from ggml_gallocr_allocate_node (ggml/1048) (danbev, Dec 14, 2024)
8dd19a4 vulkan : fix soft_max.comp division by zero (whisper/2633) (gn64, Dec 16, 2024)
78f7667 cmake : fix "amd64" processor string (whisper/2638) (ggerganov, Dec 17, 2024)
5437d4a sync : ggml (ggerganov, Dec 17, 2024)
081b29b tests: add tests for GGUF (#10830) (JohannesGaessler, Dec 17, 2024)
d62b532 Use model->gguf_kv for loading the template instead of using the C AP… (dranger003, Dec 17, 2024)
4da69d1 Revert "llama : add Falcon3 support (#10864)" (#10876) (slaren, Dec 18, 2024)
6b064c9 docs: Fix HIP (née hipBLAS) in README (#10880) (brianredbeard, Dec 18, 2024)
4682887 server : (embeddings) using same format for "input" and "content" (#1… (ngxson, Dec 18, 2024)
0e70ba6 server : add "tokens" output (#10853) (ggerganov, Dec 18, 2024)
152610e server : output embeddings for all tokens when pooling = none (#10861) (ggerganov, Dec 18, 2024)
7bbb5ac server: avoid overwriting Authorization header (#10878) (vesath, Dec 18, 2024)
0bf2d10 tts : add OuteTTS support (#10784) (ggerganov, Dec 18, 2024)
9177484 ggml : fix arm build (#10890) (slaren, Dec 18, 2024)
7909e85 llama-run : improve progress bar (#10821) (ericcurtin, Dec 19, 2024)
cd920d0 tests: disable GGUF test for bad value size (#10886) (JohannesGaessler, Dec 19, 2024)
7585edb convert : Add support for Microsoft Phi-4 model (#10817) (fairydreaming, Dec 19, 2024)
2fffc52 llama : fix Roberta embeddings (#10856) (Ssukriti, Dec 19, 2024)
a3c33b1 ggml: fix arm build with gcc (#10895) (angt, Dec 19, 2024)
4ec10e6 Merge branch 'master' into 1 (apicalshark, Dec 19, 2024)
3 changes: 2 additions & 1 deletion .devops/nix/package.nix
@@ -31,6 +31,7 @@
# Increases the runtime closure size by ~700M
useMpi ? false,
useRocm ? config.rocmSupport,
rocmGpuTargets ? builtins.concatStringsSep ";" rocmPackages.clr.gpuTargets,
enableCurl ? true,
useVulkan ? false,
llamaVersion ? "0.0.0", # Arbitrary version, substituted by the flake
@@ -188,7 +189,7 @@ effectiveStdenv.mkDerivation (finalAttrs: {
]
++ optionals useRocm [
(cmakeFeature "CMAKE_HIP_COMPILER" "${rocmPackages.llvm.clang}/bin/clang")
(cmakeFeature "CMAKE_HIP_ARCHITECTURES" (builtins.concatStringsSep ";" rocmPackages.clr.gpuTargets))
(cmakeFeature "CMAKE_HIP_ARCHITECTURES" rocmGpuTargets)
]
++ optionals useMetalKit [
(lib.cmakeFeature "CMAKE_C_FLAGS" "-D__ARM_FEATURE_DOTPROD=1")
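Note: the new rocmGpuTargets argument makes the ROCm GPU architecture list overridable by downstream consumers instead of always building for every target in rocmPackages.clr.gpuTargets (commit e52aba5, #10794). A minimal sketch of a consumer-side override; the attribute name llama-cpp and the gfx1100 target are illustrative assumptions, not part of the PR:

    # hypothetical downstream override, narrowing the build to one RDNA3 target
    my-llama = llama-cpp.override {
      useRocm = true;
      rocmGpuTargets = "gfx1100";  # default: all of rocmPackages.clr.gpuTargets joined with ";"
    };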
10 changes: 5 additions & 5 deletions .devops/tools.sh
@@ -8,23 +8,23 @@ arg1="$1"
shift

if [[ "$arg1" == '--convert' || "$arg1" == '-c' ]]; then
python3 ./convert_hf_to_gguf.py "$@"
exec python3 ./convert_hf_to_gguf.py "$@"
elif [[ "$arg1" == '--quantize' || "$arg1" == '-q' ]]; then
./llama-quantize "$@"
exec ./llama-quantize "$@"
elif [[ "$arg1" == '--run' || "$arg1" == '-r' ]]; then
./llama-cli "$@"
exec ./llama-cli "$@"
elif [[ "$arg1" == '--all-in-one' || "$arg1" == '-a' ]]; then
echo "Converting PTH to GGML..."
for i in `ls $1/$2/ggml-model-f16.bin*`; do
if [ -f "${i/f16/q4_0}" ]; then
echo "Skip model quantization, it already exists: ${i/f16/q4_0}"
else
echo "Converting PTH to GGML: $i into ${i/f16/q4_0}..."
./llama-quantize "$i" "${i/f16/q4_0}" q4_0
exec ./llama-quantize "$i" "${i/f16/q4_0}" q4_0
fi
done
elif [[ "$arg1" == '--server' || "$arg1" == '-s' ]]; then
./llama-server "$@"
exec ./llama-server "$@"
else
echo "Unknown command: $arg1"
echo "Available commands: "
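Note: each command branch now uses exec, which replaces the wrapper shell with the target binary instead of running it as a child process. When this script is a container entrypoint, the tool then runs as PID 1 and receives SIGTERM directly, which appears to be what the graceful-shutdown fix for the Docker images (commit 11e07fd, #10815) relies on. A minimal sketch of the difference:

    #!/bin/sh
    # without exec: the shell remains PID 1 and may not forward SIGTERM
    ./llama-server "$@"

    # with exec: llama-server becomes PID 1 and handles SIGTERM itself
    exec ./llama-server "$@"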
141 changes: 93 additions & 48 deletions .github/workflows/build.yml
@@ -310,7 +310,7 @@ jobs:
wget -qO - https://packages.lunarg.com/lunarg-signing-key-pub.asc | sudo apt-key add -
sudo wget -qO /etc/apt/sources.list.d/lunarg-vulkan-jammy.list https://packages.lunarg.com/vulkan/lunarg-vulkan-jammy.list
sudo apt-get update -y
sudo apt-get install -y build-essential vulkan-sdk
sudo apt-get install -y build-essential mesa-vulkan-drivers vulkan-sdk

- name: Build
id: cmake_build
@@ -320,6 +320,12 @@
cmake -DGGML_VULKAN=ON ..
cmake --build . --config Release -j $(nproc)

- name: Test
id: cmake_test
run: |
cd build
ctest -L main --verbose --timeout 900

ubuntu-22-cmake-hip:
runs-on: ubuntu-22.04
container: rocm/dev-ubuntu-22.04:6.0.2
@@ -545,35 +551,44 @@ jobs:
-DCMAKE_XCODE_ATTRIBUTE_DEVELOPMENT_TEAM=ggml
cmake --build . --config Release -j $(sysctl -n hw.logicalcpu) -- CODE_SIGNING_ALLOWED=NO

# TODO: tmp disabled. see for possible re-enable:
# https://github.com/ggerganov/llama.cpp/pull/10525
# macOS-latest-swift:
# runs-on: macos-latest
#
# strategy:
# matrix:
# destination: ['generic/platform=macOS', 'generic/platform=iOS', 'generic/platform=tvOS']
#
# steps:
# - name: Clone
# id: checkout
# uses: actions/checkout@v4
#
# - name: Dependencies
# id: depends
# continue-on-error: true
# run: |
# brew update
#
# - name: xcodebuild for swift package
# id: xcodebuild
# run: |
# xcodebuild -scheme llama -destination "${{ matrix.destination }}"
#
# - name: Build Swift Example
# id: make_build_swift_example
# run: |
# make swift
macOS-latest-swift:
runs-on: macos-latest

strategy:
matrix:
destination: ['generic/platform=macOS', 'generic/platform=iOS', 'generic/platform=tvOS']

steps:
- name: Clone
id: checkout
uses: actions/checkout@v4

- name: Dependencies
id: depends
continue-on-error: true
run: |
brew update

- name: Build llama.cpp with CMake
id: cmake_build
run: |
sysctl -a
mkdir build
cd build
cmake -G Xcode .. \
-DGGML_METAL_USE_BF16=ON \
-DGGML_METAL_EMBED_LIBRARY=ON \
-DLLAMA_BUILD_EXAMPLES=OFF \
-DLLAMA_BUILD_TESTS=OFF \
-DLLAMA_BUILD_SERVER=OFF \
-DCMAKE_OSX_ARCHITECTURES="arm64;x86_64"
cmake --build . --config Release -j $(sysctl -n hw.logicalcpu)
sudo cmake --install . --config Release

- name: xcodebuild for swift package
id: xcodebuild
run: |
xcodebuild -scheme llama-Package -destination "${{ matrix.destination }}"

windows-msys2:
runs-on: windows-latest
@@ -646,6 +661,8 @@ jobs:
defines: '-G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/arm64-windows-llvm.cmake -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DBUILD_SHARED_LIBS=ON'
- build: 'msvc-arm64'
defines: '-G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/arm64-windows-msvc.cmake -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DBUILD_SHARED_LIBS=ON'
- build: 'llvm-arm64-opencl-adreno'
defines: '-G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/arm64-windows-llvm.cmake -DCMAKE_PREFIX_PATH="$env:RUNNER_TEMP/opencl-arm64-release" -DGGML_OPENCL=ON -DGGML_OPENCL_USE_ADRENO_KERNELS=ON'

steps:
- name: Clone
@@ -687,6 +704,28 @@
run: |
choco install ninja

- name: Install OpenCL Headers and Libs
id: install_opencl
if: ${{ matrix.build == 'llvm-arm64-opencl-adreno' }}
run: |
git clone https://github.com/KhronosGroup/OpenCL-Headers
cd OpenCL-Headers
mkdir build && cd build
cmake .. `
-DBUILD_TESTING=OFF `
-DOPENCL_HEADERS_BUILD_TESTING=OFF `
-DOPENCL_HEADERS_BUILD_CXX_TESTS=OFF `
-DCMAKE_INSTALL_PREFIX="$env:RUNNER_TEMP/opencl-arm64-release"
cmake --build . --target install
git clone https://github.com/KhronosGroup/OpenCL-ICD-Loader
cd OpenCL-ICD-Loader
mkdir build-arm64-release && cd build-arm64-release
cmake .. `
-A arm64 `
-DCMAKE_PREFIX_PATH="$env:RUNNER_TEMP/opencl-arm64-release" `
-DCMAKE_INSTALL_PREFIX="$env:RUNNER_TEMP/opencl-arm64-release"
cmake --build . --target install --config release

- name: Build
id: cmake_build
run: |
@@ -716,7 +755,7 @@ jobs:
- name: Test
id: cmake_test
# not all machines have native AVX-512
if: ${{ matrix.build != 'msvc-arm64' && matrix.build != 'llvm-arm64' && matrix.build != 'kompute-x64' && matrix.build != 'vulkan-x64' && (matrix.build != 'avx512-x64' || env.HAS_AVX512F == '1') }}
if: ${{ matrix.build != 'msvc-arm64' && matrix.build != 'llvm-arm64' && matrix.build != 'llvm-arm64-opencl-adreno' && matrix.build != 'kompute-x64' && matrix.build != 'vulkan-x64' && (matrix.build != 'avx512-x64' || env.HAS_AVX512F == '1') }}
run: |
cd build
ctest -L main -C Release --verbose --timeout 900
@@ -1097,6 +1136,29 @@ jobs:
- name: Checkout code
uses: actions/checkout@v4

- name: Build
id: cmake_build
run: |
sysctl -a
mkdir build
cd build
cmake -G Xcode .. \
-DGGML_METAL_USE_BF16=ON \
-DGGML_METAL_EMBED_LIBRARY=ON \
-DLLAMA_BUILD_EXAMPLES=OFF \
-DLLAMA_BUILD_TESTS=OFF \
-DLLAMA_BUILD_SERVER=OFF \
-DCMAKE_SYSTEM_NAME=iOS \
-DCMAKE_OSX_DEPLOYMENT_TARGET=14.0 \
-DCMAKE_XCODE_ATTRIBUTE_DEVELOPMENT_TEAM=ggml
cmake --build . --config Release -j $(sysctl -n hw.logicalcpu) -- CODE_SIGNING_ALLOWED=NO
sudo cmake --install . --config Release

- name: xcodebuild for swift package
id: xcodebuild
run: |
xcodebuild -scheme llama-Package -destination 'generic/platform=iOS'

- name: Build Xcode project
run: xcodebuild -project examples/llama.swiftui/llama.swiftui.xcodeproj -scheme llama.swiftui -sdk iphoneos CODE_SIGNING_REQUIRED=NO CODE_SIGN_IDENTITY= -destination 'generic/platform=iOS' build

@@ -1124,23 +1186,6 @@ jobs:

./gradlew build --no-daemon

# freeBSD-latest:
# runs-on: macos-12
# steps:
# - name: Clone
# uses: actions/checkout@v4
#
# - name: Build
# uses: cross-platform-actions/action@v0.19.0
# with:
# operating_system: freebsd
# version: '13.2'
# hypervisor: 'qemu'
# run: |
# sudo pkg update
# sudo pkg install -y gmake automake autoconf pkgconf llvm15 openblas
# gmake CC=/usr/local/bin/clang15 CXX=/usr/local/bin/clang++15 -j `sysctl -n hw.ncpu`

release:
if: ${{ ( github.event_name == 'push' && github.ref == 'refs/heads/master' ) || github.event.inputs.create_release == 'true' }}

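Note: the new llvm-arm64-opencl-adreno job can be reproduced locally with the same two steps as the workflow: install the Khronos OpenCL headers and ICD loader into a common prefix, then point CMAKE_PREFIX_PATH at it when configuring. A sketch in PowerShell, mirroring the matrix entry's defines; the prefix path is illustrative:

    # assumes OpenCL-Headers and OpenCL-ICD-Loader were installed to C:/opencl-arm64-release
    cmake -B build -G "Ninja Multi-Config" `
      -D CMAKE_TOOLCHAIN_FILE=cmake/arm64-windows-llvm.cmake `
      -D CMAKE_PREFIX_PATH="C:/opencl-arm64-release" `
      -D GGML_OPENCL=ON -D GGML_OPENCL_USE_ADRENO_KERNELS=ON
    cmake --build build --config Release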
2 changes: 1 addition & 1 deletion .github/workflows/server.yml
@@ -79,7 +79,7 @@ jobs:
# Setup nodejs (to be used for verifying bundled index.html)
- uses: actions/setup-node@v4
with:
node-version: 22
node-version: '22.11.0'

- name: Verify bundled index.html
id: verify_server_index_html
8 changes: 3 additions & 5 deletions CMakeLists.txt
@@ -61,11 +61,9 @@ if (WIN32)
add_compile_definitions(_CRT_SECURE_NO_WARNINGS)
endif()

if ("${CMAKE_CXX_COMPILER_ID}" STREQUAL "MSVC")
add_compile_options("$<$<COMPILE_LANGUAGE:C>:/source-charset:utf-8>")
add_compile_options("$<$<COMPILE_LANGUAGE:CXX>:/source-charset:utf-8>")
add_compile_options("$<$<COMPILE_LANGUAGE:C>:/execution-charset:utf-8>")
add_compile_options("$<$<COMPILE_LANGUAGE:CXX>:/execution-charset:utf-8>")
if (MSVC)
add_compile_options("$<$<COMPILE_LANGUAGE:C>:/utf-8>")
add_compile_options("$<$<COMPILE_LANGUAGE:CXX>:/utf-8>")
endif()

#
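Note: /utf-8 is MSVC shorthand for /source-charset:utf-8 plus /execution-charset:utf-8, so the four options collapse to two with identical effect, and if (MSVC) also matches compilers that emulate the cl command line, such as clang-cl. A further condensed sketch, not part of the PR (needs CMake >= 3.15 for the multi-language generator expression):

    if (MSVC)
        add_compile_options("$<$<COMPILE_LANGUAGE:C,CXX>:/utf-8>")
    endif()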
12 changes: 12 additions & 0 deletions CMakePresets.json
@@ -31,6 +31,13 @@
{ "name": "sycl_f16", "hidden": true, "cacheVariables": { "GGML_SYCL_F16": "ON" } },
{ "name": "vulkan", "hidden": true, "cacheVariables": { "GGML_VULKAN": "ON" } },

{
"name": "x64-windows-llvm", "hidden": true,
"cacheVariables": {
"CMAKE_TOOLCHAIN_FILE": "${sourceDir}/cmake/x64-windows-llvm.cmake"
}
},

{
"name": "arm64-windows-msvc", "hidden": true,
"architecture": { "value": "arm64", "strategy": "external" },
@@ -70,6 +77,11 @@
{ "name": "arm64-windows-msvc-release", "inherits": [ "base", "arm64-windows-msvc", "reldbg" ] },
{ "name": "arm64-windows-msvc+static-release", "inherits": [ "base", "arm64-windows-msvc", "reldbg", "static" ] },

{ "name": "x64-windows-llvm-debug", "inherits": [ "base", "x64-windows-llvm", "debug" ] },
{ "name": "x64-windows-llvm-release", "inherits": [ "base", "x64-windows-llvm", "release" ] },
{ "name": "x64-windows-llvm-reldbg", "inherits": [ "base", "x64-windows-llvm", "reldbg" ] },
{ "name": "x64-windows-llvm+static-release", "inherits": [ "base", "x64-windows-llvm", "reldbg", "static" ] },

{ "name": "x64-windows-msvc-debug", "inherits": [ "base", "debug" ] },
{ "name": "x64-windows-msvc-release", "inherits": [ "base", "reldbg" ] },
{ "name": "x64-windows-msvc+static-release", "inherits": [ "base", "reldbg", "static" ] },
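Note: the new configure presets are selected the standard way; the build directory comes from the inherited "base" preset, assumed here to follow a build-<presetName> convention since the base preset is not shown in this hunk:

    cmake --preset x64-windows-llvm-release
    cmake --build build-x64-windows-llvm-release --config Release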
4 changes: 3 additions & 1 deletion CODEOWNERS
@@ -1,3 +1,5 @@
# collaborators can optionally add themselves here to indicate their availability for reviewing related PRs

ci/ @ggerganov
/ci/ @ggerganov
/.devops/ @ngxson
/examples/server/ @ngxson
31 changes: 19 additions & 12 deletions Makefile
@@ -22,6 +22,7 @@ BUILD_TARGETS = \
llama-infill \
llama-llava-cli \
llama-minicpmv-cli\
llama-qwen2vl-cli\
llama-lookahead \
llama-lookup \
llama-lookup-create \
@@ -445,6 +446,10 @@ ifeq ($(UNAME_M),$(filter $(UNAME_M),x86_64 i686 amd64))
MK_CFLAGS += -march=native -mtune=native
HOST_CXXFLAGS += -march=native -mtune=native

# Usage AMX build test
#MK_CFLAGS += -march=graniterapids -mtune=graniterapids
#HOST_CXXFLAGS += -march=graniterapids -mtune=graniterapids

# Usage AVX-only
#MK_CFLAGS += -mfma -mf16c -mavx
#MK_CXXFLAGS += -mfma -mf16c -mavx
@@ -948,17 +953,18 @@ DIR_COMMON = common

OBJ_GGML = \
$(DIR_GGML)/src/ggml.o \
$(DIR_GGML)/src/ggml-aarch64.o \
$(DIR_GGML)/src/ggml-alloc.o \
$(DIR_GGML)/src/ggml-backend.o \
$(DIR_GGML)/src/ggml-backend-reg.o \
$(DIR_GGML)/src/ggml-opt.o \
$(DIR_GGML)/src/ggml-quants.o \
$(DIR_GGML)/src/ggml-threading.o \
$(DIR_GGML)/src/ggml-cpu/ggml-cpu.o \
$(DIR_GGML)/src/ggml-cpu/ggml-cpu-cpp.o \
$(DIR_GGML)/src/ggml-cpu/ggml-cpu_cpp.o \
$(DIR_GGML)/src/ggml-cpu/ggml-cpu-aarch64.o \
$(DIR_GGML)/src/ggml-cpu/ggml-cpu-hbm.o \
$(DIR_GGML)/src/ggml-cpu/ggml-cpu-quants.o \
$(DIR_GGML)/src/ggml-cpu/ggml-cpu-traits.o \
$(OBJ_GGML_EXT)

OBJ_LLAMA = \
@@ -1098,17 +1104,10 @@ DEP_FILES = $(OBJ_GGML:.o=.d) $(OBJ_LLAMA:.o=.d) $(OBJ_COMMON:.o=.d)
# Default target
all: $(BUILD_TARGETS)

# force c++ build for source file that have same name as c file
# Note: need this exception because `ggml-cpu.c` and `ggml-cpu.cpp` both produce the same obj/dep files
# g++ -M -I ./ggml/include/ -I ./ggml/src ggml/src/ggml-cpu/ggml-cpu.cpp | grep ggml
$(DIR_GGML)/src/ggml-cpu/ggml-cpu-cpp.o: \
ggml/src/ggml-cpu/ggml-cpu.cpp \
ggml/include/ggml-backend.h \
ggml/include/ggml.h \
ggml/include/ggml-alloc.h \
ggml/src/ggml-backend-impl.h \
ggml/include/ggml-cpu.h \
ggml/src/ggml-impl.h
$(CXX) $(CXXFLAGS) -c $< -o $@
$(DIR_GGML)/%_cpp.o: $(DIR_GGML)/%.cpp
$(CXX) $(CXXFLAGS) -MMD -c $< -o $@

# Rules for building object files
$(DIR_GGML)/%.o: $(DIR_GGML)/%.c
@@ -1406,6 +1405,14 @@ llama-minicpmv-cli: examples/llava/minicpmv-cli.cpp \
$(OBJ_ALL)
$(CXX) $(CXXFLAGS) $< $(filter-out %.h $<,$^) -o $@ $(LDFLAGS) -Wno-cast-qual

llama-qwen2vl-cli: examples/llava/qwen2vl-cli.cpp \
examples/llava/llava.cpp \
examples/llava/llava.h \
examples/llava/clip.cpp \
examples/llava/clip.h \
$(OBJ_ALL)
$(CXX) $(CXXFLAGS) $< $(filter-out %.h $<,$^) -o $@ $(LDFLAGS) -Wno-cast-qual

ifeq ($(UNAME_S),Darwin)
swift: examples/batched.swift
(cd examples/batched.swift; make build)
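Note: the hand-maintained rule and header list for ggml-cpu-cpp.o are replaced by a generic pattern rule: any .cpp under ggml/ whose name collides with a .c file now compiles to a *_cpp.o object, and -MMD generates the dependency (.d) files automatically instead of listing headers by hand. A minimal sketch of the idiom with hypothetical sources a.c and a.cpp in one directory (recipe lines tab-indented):

    a.o: a.c
    	$(CC) $(CFLAGS) -MMD -c $< -o $@
    %_cpp.o: %.cpp
    	$(CXX) $(CXXFLAGS) -MMD -c $< -o $@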