
v1.4.0

@eastriverlee eastriverlee released this 30 Jan 10:08
· 15 commits to main since this release

Highlights

  1. you can now override a new recovery function, which is called when the input is too long to be handled: `func recoverFromLengthy(_ input: borrowing String, to output: borrowing AsyncStream<String>.Continuation)`.

```swift
open func recoverFromLengthy(_ input: borrowing String, to output: borrowing AsyncStream<String>.Continuation) {
    output.yield("tl;dr")
}
```
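as a sketch of how an override might look — the base class below is a self-contained stand-in so the example compiles on its own, not the library's actual class, and the subclass name and message are invented; only the `recoverFromLengthy` signature comes from this release:

```swift
// stand-in base class so this sketch is self-contained;
// in practice you would subclass the library's class instead
open class Chatbot {
    open func recoverFromLengthy(_ input: borrowing String, to output: borrowing AsyncStream<String>.Continuation) {
        output.yield("tl;dr")
    }
}

final class PoliteChatbot: Chatbot {
    // instead of the default "tl;dr", tell the user what went wrong
    override func recoverFromLengthy(_ input: borrowing String, to output: borrowing AsyncStream<String>.Continuation) {
        output.yield("that message was too long for me to process; please shorten it.")
    }
}
```

you never call this yourself; the continuation it receives belongs to the `AsyncStream` the library is already feeding, so whatever you yield shows up as the response to the over-long input.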
  2. fixed a potential bug where inference could run when it shouldn't. usually it won't cause any damage, because we are most likely going to set `maxTokenCount` lower than the actual limit of the model, but still. what used to be an `if` statement is now a `while` statement:
```swift
private func prepare(from input: borrowing String, to output: borrowing AsyncStream<String>.Continuation) -> Bool {
    ...
    if maxTokenCount <= currentCount {
        while !history.isEmpty && maxTokenCount <= currentCount {
            history.removeFirst(min(2, history.count))
            tokens = encode(preProcess(self.input, history))
            initialCount = tokens.count
            currentCount = Int32(initialCount)
        }
        if maxTokenCount <= currentCount {
            isFull = true
            recoverFromLengthy(input, to: output)
            return false
        }
    }
    ...
    return true
}
```
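to see why the change matters: trimming a single user/bot exchange may not free enough tokens, so an `if` could leave the prompt still over the limit. a toy sketch with made-up numbers (the 20-tokens-per-trim figure is invented, not from the library) shows the `while` converging:

```swift
// toy model of the trimming loop: each removed user/bot exchange is
// assumed to shrink the re-encoded prompt by 20 tokens (made-up number)
var history = ["q1", "a1", "q2", "a2", "q3", "a3", "q4", "a4"]
var currentCount = 120
let maxTokenCount = 50

while !history.isEmpty && maxTokenCount <= currentCount {
    history.removeFirst(min(2, history.count))
    currentCount -= 20  // stand-in for re-encoding the shortened prompt
}
// a plain `if` would have trimmed once (120 -> 100) and stopped;
// the `while` keeps trimming until the prompt actually fits (here, 40 tokens)
```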
  3. i changed the order of the `HuggingFaceModel` initializer's quantization parameter and its label in 94bcc54:

```swift
//so now instead of:
HuggingFaceModel("TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF", template: .chatML(systemPrompt), with: .Q2_K)

//you should do:
HuggingFaceModel("TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF", .Q2_K, template: .chatML(systemPrompt))
```

this just makes more sense, so i had to change it.

Full Changelog: v1.3.0...v1.4.0