1 change: 1 addition & 0 deletions .gitignore
@@ -1,4 +1,5 @@
build/**
build_*/**
.build/**

models/**
28 changes: 27 additions & 1 deletion README.md
@@ -1,3 +1,29 @@
# llama.cpp

This repo is cloned from llama.cpp [commit 74d73dc85cc2057446bf63cc37ff649ae7cebd80](https://github.com/ggerganov/llama.cpp/tree/74d73dc85cc2057446bf63cc37ff649ae7cebd80). It is compatible with llama-cpp-python [commit 7ecdd944624cbd49e4af0a5ce1aa402607d58dcc](https://github.com/abetlen/llama-cpp-python/commit/7ecdd944624cbd49e4af0a5ce1aa402607d58dcc).

## Customize quantization group size at compilation (CPU inference only)

The only difference from upstream is adding the `-DQK4_0` flag when running cmake:

```bash
cmake -B build_cpu_g128 -DQK4_0=128
cmake --build build_cpu_g128
```
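
If desired, the standard CMake build-type and parallel-build flags combine with this as usual; a minimal sketch, assuming nothing beyond stock CMake behavior:

```bash
# Optional: request an optimized build and parallel compilation (stock CMake flags).
cmake -B build_cpu_g128 -DQK4_0=128 -DCMAKE_BUILD_TYPE=Release
cmake --build build_cpu_g128 -j
```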

To quantize a model with the customized group size, run:

```bash
./build_cpu_g128/bin/llama-quantize <model_path.gguf> <quantization_type>
```
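
For example, to produce a Q4_0 model with the group-128 build (the input path below is a placeholder for your own f16 GGUF):

```bash
# Quantize a hypothetical f16 model to Q4_0 using the group-128 binary.
./build_cpu_g128/bin/llama-quantize models/7B/ggml-model-f16.gguf Q4_0
```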

To run the quantized model:

```bash
./build_cpu_g128/bin/llama-cli -m <quantized_model_path.gguf>
```
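
For example, to generate a short completion (the model path is a placeholder; `-p` and `-n` are the usual llama-cli flags for the prompt and the number of tokens to generate):

```bash
# Generate up to 64 tokens from a prompt using the group-128 binary.
./build_cpu_g128/bin/llama-cli -m models/7B/ggml-model-Q4_0.gguf -p "Hello" -n 64
```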

### Note:

Make sure the model you run was quantized with the same group size that the binary was compiled with; otherwise you'll get a runtime error when loading the model. The group size is baked into the quantized block layout at compile time, so a GGUF file written with one group size will not have the tensor sizes a binary built for another group size expects.
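
One way to avoid mixing builds and models up is to encode the group size in both the build directory and the model filename; this is purely a naming convention, not something the tools enforce (llama-quantize accepts an optional output path before the quantization type):

```bash
# Keep group sizes paired by name: group-128 binaries with g128-tagged models.
./build_cpu_g128/bin/llama-quantize models/7B/ggml-model-f16.gguf models/7B/ggml-model-Q4_0-g128.gguf Q4_0
./build_cpu_g128/bin/llama-cli -m models/7B/ggml-model-Q4_0-g128.gguf -p "Hello"
```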