Commit 8602b76

lordgamez authored and fgerlits committed
MINIFICPP-2556 Create llama.cpp processor for LLM inference
Co-authored-by: Adam Debreceni <adebreceni@apache.org>
Signed-off-by: Ferenc Gerlits <fgerlits@gmail.com>
Closes #1903
1 parent 61d347d commit 8602b76

35 files changed

Lines changed: 1380 additions & 7 deletions

.github/workflows/ci.yml

Lines changed: 5 additions & 2 deletions
@@ -3,7 +3,7 @@ on: [push, pull_request, workflow_dispatch]
 env:
 DOCKER_CMAKE_FLAGS: -DDOCKER_VERIFY_THREAD=3 -DUSE_SHARED_LIBS= -DSTRICT_GSL_CHECKS=AUDIT -DCI_BUILD=ON -DENABLE_AWS=ON -DENABLE_KAFKA=ON -DENABLE_MQTT=ON -DENABLE_AZURE=ON -DENABLE_SQL=ON \
 -DENABLE_SPLUNK=ON -DENABLE_GCP=ON -DENABLE_OPC=ON -DENABLE_PYTHON_SCRIPTING=ON -DENABLE_LUA_SCRIPTING=ON -DENABLE_KUBERNETES=ON -DENABLE_TEST_PROCESSORS=ON -DENABLE_PROMETHEUS=ON \
--DENABLE_ELASTICSEARCH=ON -DENABLE_GRAFANA_LOKI=ON -DENABLE_COUCHBASE=ON -DDOCKER_BUILD_ONLY=ON -DMINIFI_PERFORMANCE_TESTS=ON
+-DENABLE_ELASTICSEARCH=ON -DENABLE_GRAFANA_LOKI=ON -DENABLE_COUCHBASE=ON -DENABLE_LLAMACPP=ON -DDOCKER_BUILD_ONLY=ON -DMINIFI_PERFORMANCE_TESTS=ON
 SCCACHE_GHA_ENABLE: true
 CCACHE_DIR: ${{ GITHUB.WORKSPACE }}/.ccache
 jobs:
@@ -33,6 +33,7 @@ jobs:
 -DENABLE_GCP=ON
 -DENABLE_KUBERNETES=ON
 -DENABLE_LIBARCHIVE=ON
+-DENABLE_LLAMACPP=ON
 -DENABLE_KAFKA=ON
 -DENABLE_LUA_SCRIPTING=ON
 -DENABLE_LZMA=ON
@@ -140,6 +141,7 @@ jobs:
 -DENABLE_KUBERNETES=ON
 -DENABLE_LIBARCHIVE=ON
 -DENABLE_KAFKA=ON
+-DENABLE_LLAMACPP=ON
 -DENABLE_LUA_SCRIPTING=ON
 -DENABLE_LZMA=ON
 -DENABLE_MQTT=ON
@@ -243,6 +245,7 @@ jobs:
 -DENABLE_GRAFANA_LOKI=ON
 -DENABLE_KUBERNETES=ON
 -DENABLE_LIBARCHIVE=ON
+-DENABLE_LLAMACPP=ON
 -DENABLE_KAFKA=ON
 -DENABLE_LUA_SCRIPTING=ON
 -DENABLE_LZMA=ON
@@ -394,7 +397,7 @@ jobs:
 mkdir build && cd build && cmake -DUSE_SHARED_LIBS=ON -DCI_BUILD=ON -DCMAKE_BUILD_TYPE=Release -DSTRICT_GSL_CHECKS=AUDIT -DMINIFI_FAIL_ON_WARNINGS=OFF -DENABLE_AWS=ON -DENABLE_AZURE=ON \
 -DENABLE_ENCRYPT_CONFIG=ON -DENABLE_KAFKA=ON -DENABLE_MQTT=ON -DENABLE_OPC=ON -DENABLE_OPENCV=ON -DENABLE_OPS=ON -DENABLE_SQL=ON -DENABLE_SYSTEMD=ON \
 -DENABLE_PYTHON_SCRIPTING=ON -DENABLE_LUA_SCRIPTING=ON -DENABLE_KUBERNETES=ON -DENABLE_GCP=ON -DENABLE_PROCFS=ON -DENABLE_PROMETHEUS=ON \
--DENABLE_ELASTICSEARCH=ON -DENABLE_GRAFANA_LOKI=ON -DDOCKER_SKIP_TESTS=OFF -DDOCKER_BUILD_ONLY=ON -DDOCKER_CCACHE_DUMP_LOCATION=${{ env.CCACHE_DIR }} .. && make rocky-test
+-DENABLE_ELASTICSEARCH=ON -DENABLE_GRAFANA_LOKI=ON -DENABLE_LLAMACPP=ON -DDOCKER_SKIP_TESTS=OFF -DDOCKER_BUILD_ONLY=ON -DDOCKER_CCACHE_DUMP_LOCATION=${{ env.CCACHE_DIR }} .. && make rocky-test
 - name: cache save
 uses: actions/cache/save@v4
 if: always()

LICENSE

Lines changed: 25 additions & 0 deletions
@@ -3460,3 +3460,28 @@ NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
 DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
 OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
 USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+
+This product bundles 'llama.cpp' which is available under The MIT License.
+
+MIT License
+
+Copyright (c) 2023-2024 The ggml authors
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

METRICS.md

Lines changed: 16 additions & 0 deletions
@@ -32,6 +32,7 @@ This readme defines the metrics published by Apache NiFi. All options defined ar
 - [Processor Metrics](#processor-metrics)
 - [General Metrics](#general-metrics)
 - [GetFileMetrics](#getfilemetrics)
+- [RunLlamaCppInferenceMetrics](#runllamacppinferencemetrics)

 ## Description

@@ -288,3 +289,18 @@ Processor level metric that reports metrics for the GetFile processor if defined
 | metric_class | Class name to filter for this metric, set to GetFileMetrics |
 | processor_name | Name of the processor |
 | processor_uuid | UUID of the processor |
+
+### RunLlamaCppInferenceMetrics
+
+Processor level metric that reports metrics for the RunLlamaCppInference processor if defined in the flow configuration.
+
+| Metric name | Labels | Description |
+|-----------------------|----------------------------------------------|----------------------------------------------------------------------------|
+| tokens_in | metric_class, processor_name, processor_uuid | Number of tokens parsed from the input prompts in the processor's lifetime |
+| tokens_out | metric_class, processor_name, processor_uuid | Number of tokens generated in the completion in the processor's lifetime |
+
+| Label | Description |
+|----------------|--------------------------------------------------------------------------|
+| metric_class | Class name to filter for this metric, set to RunLlamaCppInferenceMetrics |
+| processor_name | Name of the processor |
+| processor_uuid | UUID of the processor |
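
The two counters documented in the METRICS.md change are lifetime totals, so each inference run adds to them rather than replacing them. A minimal Python sketch of that accumulation follows. `TokenCounters`, `record`, `as_metrics`, and the sample processor name and UUID are hypothetical names chosen for illustration; they are not taken from the MiNiFi C++ sources, and the returned dictionary layout is an assumption, not the actual metric wire format.

```python
from dataclasses import dataclass


@dataclass
class TokenCounters:
    """Hypothetical stand-in for the processor's metric state (not MiNiFi code)."""
    tokens_in: int = 0   # prompt tokens seen over the processor's lifetime
    tokens_out: int = 0  # completion tokens generated over the processor's lifetime

    def record(self, prompt_tokens: int, generated_tokens: int) -> None:
        """Add the token counts of one inference run to the lifetime totals."""
        self.tokens_in += prompt_tokens
        self.tokens_out += generated_tokens

    def as_metrics(self, processor_name: str, processor_uuid: str) -> list[dict]:
        """Return both counters with the three labels listed in METRICS.md."""
        labels = {
            "metric_class": "RunLlamaCppInferenceMetrics",
            "processor_name": processor_name,
            "processor_uuid": processor_uuid,
        }
        return [
            {"name": "tokens_in", "value": self.tokens_in, "labels": labels},
            {"name": "tokens_out", "value": self.tokens_out, "labels": labels},
        ]


# Two inference runs; the totals keep growing across runs.
counters = TokenCounters()
counters.record(prompt_tokens=12, generated_tokens=30)
counters.record(prompt_tokens=5, generated_tokens=8)
metrics = counters.as_metrics("MyInference", "hypothetical-uuid")
```

Because the values only ever grow, a consumer that wants a rate would presumably difference two successive readings; that is a general property of cumulative counters rather than something this commit states.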

NOTICE

Lines changed: 1 addition & 0 deletions
@@ -77,6 +77,7 @@ This software includes third party software subject to the following copyrights:
 - snappy - Copyright 2011, Google Inc.
 - llhttp - Copyright Fedor Indutny, 2018.
 - benchmark - Copyright 2015 Google Inc.
+- llama.cpp - Copyright (c) 2023-2024 The ggml authors

 The licenses for these third party components are included in LICENSE.txt

PROCESSORS.md

Lines changed: 43 additions & 0 deletions
@@ -65,6 +65,7 @@ limitations under the License.
 - [ListS3](#ListS3)
 - [ListSFTP](#ListSFTP)
 - [ListSmb](#ListSmb)
+- [RunLlamaCppInference](#RunLlamaCppInference)
 - [LogAttribute](#LogAttribute)
 - [ManipulateArchive](#ManipulateArchive)
 - [MergeContent](#MergeContent)
@@ -1745,6 +1746,48 @@ In the list below, the names of required properties appear in bold. Any other pr
 | size | success | The size of the file in bytes. |


+## RunLlamaCppInference
+
+### Description
+
+LlamaCpp processor to use llama.cpp library for running language model inference. The inference will be based on the System Prompt and the Prompt property values, together with the content of the incoming flow file. In the Prompt, the content of the incoming flow file can be referred to as 'the input data' or 'the flow file content'.
+
+### Properties
+
+In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
+
+| Name | Default Value | Allowable Values | Description |
+|----------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------|-------------------------------------------------------------------------------------------------------------------------|
+| **Model Path** | | | The filesystem path of the model file in gguf format. |
+| Temperature | 0.8 | | The temperature to use for sampling. |
+| Top K | 40 | | Limit the next token selection to the K most probable tokens. Set <= 0 value to use vocab size. |
+| Top P | 0.9 | | Limit the next token selection to a subset of tokens with a cumulative probability above a threshold P. 1.0 = disabled. |
+| Min P | | | Sets a minimum base probability threshold for token selection. 0.0 = disabled. |
+| **Min Keep** | 0 | | If greater than 0, force samplers to return N possible tokens at minimum. |
+| **Text Context Size** | 4096 | | Size of the text context, use 0 to use size set in model. |
+| **Logical Maximum Batch Size** | 2048 | | Logical maximum batch size that can be submitted to the llama.cpp decode function. |
+| **Physical Maximum Batch Size** | 512 | | Physical maximum batch size. |
+| **Max Number Of Sequences** | 1 | | Maximum number of sequences (i.e. distinct states for recurrent models). |
+| **Threads For Generation** | 4 | | Number of threads to use for generation. |
+| **Threads For Batch Processing** | 4 | | Number of threads to use for batch processing. |
+| Prompt | | | The user prompt for the inference.<br/>**Supports Expression Language: true** |
+| System Prompt | You are a helpful assistant. You are given a question with some possible input data otherwise called flow file content. You are expected to generate a response based on the question and the input data. | | The system prompt for the inference. |
+
+### Relationships
+
+| Name | Description |
+|---------|----------------------------------|
+| success | Generated results from the model |
+| failure | Generation failed |
+
+### Output Attributes
+
+| Attribute | Relationship | Description |
+|------------------------------|--------------|------------------------------------------------|
+| llamacpp.time.to.first.token | success | Time to first token generated in milliseconds. |
+| llamacpp.tokens.per.second | success | Tokens generated per second. |
+
+
 ## LogAttribute

 ### Description
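
The sampling properties in the RunLlamaCppInference table (Top K, Top P, Min P, Min Keep) each narrow the set of candidate tokens before one is drawn. The Python sketch below illustrates the documented semantics on a plain list of probabilities. It is an illustration under stated assumptions, not llama.cpp's or the processor's implementation: the function name `filter_candidates` is hypothetical, the order in which the filters run is assumed, Min P is interpreted as a fraction of the most probable token's probability, and the `min_p=0.0` default is assumed because the table lists no default. Temperature is left out to keep the sketch short.

```python
def filter_candidates(probs, top_k=40, top_p=0.9, min_p=0.0, min_keep=0):
    """Return the token ids that survive top-k, top-p and min-p filtering.

    Hypothetical helper for illustration; filter order is an assumption.
    """
    # Rank token ids from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top K: keep the K most probable tokens; <= 0 keeps the whole vocabulary.
    if top_k > 0:
        order = order[:max(top_k, min_keep)]

    # Top P: keep the shortest prefix whose cumulative probability reaches P.
    # A value of 1.0 disables the filter.
    if top_p < 1.0:
        total = sum(probs[i] for i in order)
        kept, cumulative = [], 0.0
        for i in order:
            kept.append(i)
            cumulative += probs[i] / total
            if cumulative >= top_p and len(kept) >= min_keep:
                break
        order = kept

    # Min P: drop tokens less probable than min_p times the best token.
    # A value of 0.0 disables the filter.
    if min_p > 0.0:
        threshold = min_p * probs[order[0]]
        kept = [i for i in order if probs[i] >= threshold]
        order = kept if len(kept) >= min_keep else order[:min_keep]

    return order


# Toy four-token vocabulary, filtered with the table's default Top K and Top P.
probs = [0.5, 0.3, 0.15, 0.05]
survivors = filter_candidates(probs)
```

Min Keep acts as a floor in each step, so a very strict Top K or Min P setting still leaves at least that many candidates to sample from.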
