You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: PROCESSORS.md
+43Lines changed: 43 additions & 0 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -65,6 +65,7 @@ limitations under the License.
65
65
- [ListS3](#ListS3)
66
66
- [ListSFTP](#ListSFTP)
67
67
- [ListSmb](#ListSmb)
68
+
- [RunLlamaCppInference](#RunLlamaCppInference)
68
69
- [LogAttribute](#LogAttribute)
69
70
- [ManipulateArchive](#ManipulateArchive)
70
71
- [MergeContent](#MergeContent)
@@ -1745,6 +1746,48 @@ In the list below, the names of required properties appear in bold. Any other pr
1745
1746
| size | success | The size of the file in bytes. |
1746
1747
1747
1748
1749
+
## RunLlamaCppInference
1750
+
1751
+
### Description
1752
+
1753
+
LlamaCpp processor to use llama.cpp library for running language model inference. The inference will be based on the System Prompt and the Prompt property values, together with the content of the incoming flow file. In the Prompt, the content of the incoming flow file can be referred to as 'the input data' or 'the flow file content'.
1754
+
1755
+
### Properties
1756
+
1757
+
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
1758
+
1759
+
| Name | Default Value | Allowable Values | Description |
| **Model Path** | | | The filesystem path of the model file in gguf format. |
1762
+
| Temperature | 0.8 | | The temperature to use for sampling. |
1763
+
| Top K | 40 | | Limit the next token selection to the K most probable tokens. Set <= 0 value to use vocab size. |
1764
+
| Top P | 0.9 | | Limit the next token selection to a subset of tokens with a cumulative probability above a threshold P. 1.0 = disabled. |
1765
+
| Min P | | | Sets a minimum base probability threshold for token selection. 0.0 = disabled. |
1766
+
| **Min Keep** | 0 | | If greater than 0, force samplers to return N possible tokens at minimum. |
1767
+
| **Text Context Size** | 4096 | | Size of the text context, use 0 to use size set in model. |
1768
+
| **Logical Maximum Batch Size** | 2048 | | Logical maximum batch size that can be submitted to the llama.cpp decode function. |
1769
+
| **Physical Maximum Batch Size** | 512 | | Physical maximum batch size. |
1770
+
| **Max Number Of Sequences** | 1 | | Maximum number of sequences (i.e. distinct states for recurrent models). |
1771
+
| **Threads For Generation** | 4 | | Number of threads to use for generation. |
1772
+
| **Threads For Batch Processing** | 4 | | Number of threads to use for batch processing. |
1773
+
| Prompt | | | The user prompt for the inference.<br/>**Supports Expression Language: true** |
1774
+
| System Prompt | You are a helpful assistant. You are given a question with some possible input data otherwise called flow file content. You are expected to generate a response based on the question and the input data. | | The system prompt for the inference. |
0 commit comments