v1.4.0
Highlights
- you can now override a new recovery function, `recoverFromLengthy`, which is called when the input is too long to be handled. the default implementation just yields "tl;dr":

```swift
open func recoverFromLengthy(_ input: borrowing String, to output: borrowing AsyncStream<String>.Continuation) {
    output.yield("tl;dr")
}
```
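since the function is `open`, you can override it in a subclass to customize the behavior. a minimal sketch, assuming your model type subclasses the library's `LLM` class and keeps the same signature (the class name `MyBot` and the message are made up for illustration):

```swift
class MyBot: LLM {
    // hypothetical override: emit a custom notice instead of "tl;dr"
    override func recoverFromLengthy(_ input: borrowing String, to output: borrowing AsyncStream<String>.Continuation) {
        output.yield("input was too long; please shorten your message.")
    }
}
```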
- fixed a potential bug where inference could run when it shouldn't. it usually wouldn't cause any damage, because `maxTokenCount` is most likely set lower than the model's actual limit, but still. what used to be an `if` statement is now a `while` statement:
```swift
private func prepare(from input: borrowing String, to output: borrowing AsyncStream<String>.Continuation) -> Bool {
    ...
    if maxTokenCount <= currentCount {
        // keep dropping the oldest history entries until the prompt fits
        while !history.isEmpty && maxTokenCount <= currentCount {
            history.removeFirst(min(2, history.count))
            tokens = encode(preProcess(self.input, history))
            initialCount = tokens.count
            currentCount = Int32(initialCount)
        }
        // history alone couldn't free enough room: bail out and recover
        if maxTokenCount <= currentCount {
            isFull = true
            recoverFromLengthy(input, to: output)
            return false
        }
    }
    ...
    return true
}
```
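to see why a single trim isn't always enough, here is a self-contained toy version of the loop, where plain integers stand in for the token cost of each history entry (illustrative only, not the library's actual code):

```swift
// toy history: each entry "costs" some number of tokens
var history = [40, 40, 30, 30, 20, 20]
let maxTokenCount = 50
var currentCount = history.reduce(0, +)  // 180, far over budget

// a single pass (the old `if`) would only drop the first pair,
// leaving 100 tokens, still over the 50-token budget.
// the `while` keeps dropping the oldest pair until it fits:
while !history.isEmpty && maxTokenCount <= currentCount {
    history.removeFirst(min(2, history.count))
    currentCount = history.reduce(0, +)
}
print(currentCount)  // 40, now below maxTokenCount
```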
- i changed the order of a `HuggingFaceModel` initializer parameter and its label in 94bcc54. so now instead of:

```swift
HuggingFaceModel("TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF", template: .chatML(systemPrompt), with: .Q2_K)
```

you should do:

```swift
HuggingFaceModel("TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF", .Q2_K, template: .chatML(systemPrompt))
```

this just makes more sense, so i had to change it.
Full Changelog: v1.3.0...v1.4.0