v1.4.0
Highlights
- you can now override a new recovery function, `recoverFromLengthy`, which is called when the input is too long to be handled. the default implementation just yields "tl;dr":

```swift
open func recoverFromLengthy(_ input: borrowing String, to output: borrowing AsyncStream<String>.Continuation) {
    output.yield("tl;dr")
}
```
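since the function is `open`, you can override it in a subclass to customize the behavior. a minimal sketch, assuming your model type subclasses the library's `LLM` class and keeps the same signature (the class name `MyBot` and the message are made up for illustration):

```swift
class MyBot: LLM {
    // hypothetical override: emit a custom notice instead of "tl;dr"
    override func recoverFromLengthy(_ input: borrowing String, to output: borrowing AsyncStream<String>.Continuation) {
        output.yield("input was too long; please shorten your message.")
    }
}
```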
- fixed a potential bug where inference could run when it shouldn't. it usually wouldn't cause any damage, because `maxTokenCount` is most likely set lower than the model's actual limit, but still. what used to be an `if` statement is now a `while` statement:
```swift
private func prepare(from input: borrowing String, to output: borrowing AsyncStream<String>.Continuation) -> Bool {
    ...
    if maxTokenCount <= currentCount {
        // keep dropping the oldest history entries until the prompt fits
        while !history.isEmpty && maxTokenCount <= currentCount {
            history.removeFirst(min(2, history.count))
            tokens = encode(preProcess(self.input, history))
            initialCount = tokens.count
            currentCount = Int32(initialCount)
        }
        // history alone couldn't free enough room: bail out and recover
        if maxTokenCount <= currentCount {
            isFull = true
            recoverFromLengthy(input, to: output)
            return false
        }
    }
    ...
    return true
}
```
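to see why a single trim isn't always enough, here is a self-contained toy version of the loop, where plain integers stand in for the token cost of each history entry (illustrative only, not the library's actual code):

```swift
// toy history: each entry "costs" some number of tokens
var history = [40, 40, 30, 30, 20, 20]
let maxTokenCount = 50
var currentCount = history.reduce(0, +)  // 180, far over budget

// a single pass (the old `if`) would only drop the first pair,
// leaving 100 tokens, still over the 50-token budget.
// the `while` keeps dropping the oldest pair until it fits:
while !history.isEmpty && maxTokenCount <= currentCount {
    history.removeFirst(min(2, history.count))
    currentCount = history.reduce(0, +)
}
print(currentCount)  // 40, now below maxTokenCount
```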
- i changed the order of a `HuggingFaceModel` initializer parameter and its label in 94bcc54. so now instead of:

```swift
HuggingFaceModel("TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF", template: .chatML(systemPrompt), with: .Q2_K)
```

you should do:

```swift
HuggingFaceModel("TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF", .Q2_K, template: .chatML(systemPrompt))
```

this just makes more sense, so i had to change it.
Full Changelog: v1.3.0...v1.4.0