
Releases: gotzmann/llama.go

v1.4: Server Mode

28 Apr 11:50
bf2bddd

Introducing Server Mode: an easy-to-use REST API for the underlying GPT model. Let's go to production :)

Better Defaults

23 Apr 18:29

Nothing special: more stable inference and saner default parameters.

AVX2 and NEON

20 Apr 16:12
eea0850

Inference performance is boosted on CPUs that support vector math.

Please use:

--neon flag for Apple Silicon (M1-M3 processors) and ARM servers

--avx flag for Intel and AMD CPUs that support the AVX2 instruction set
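If you are unsure which flag applies, a small sketch like the one below can pick the right one from the machine architecture. The binary name `./llama-go` and the `--model` parameter are assumptions for illustration; only the `--neon` and `--avx` flags come from this release.

```shell
#!/bin/sh
# Choose the SIMD flag based on CPU architecture.
# arm64/aarch64 -> --neon (Apple Silicon, ARM servers)
# x86_64        -> --avx  (Intel/AMD with AVX2)
case "$(uname -m)" in
  arm64|aarch64) SIMD_FLAG="--neon" ;;
  x86_64)        SIMD_FLAG="--avx"  ;;
  *)             SIMD_FLAG="" ;;  # fall back to plain (unvectorized) inference
esac
echo "$SIMD_FLAG"

# Hypothetical invocation (binary and --model flag are assumptions):
# ./llama-go --model ./llama-7b.bin $SIMD_FLAG
```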

Big Models are OK

15 Apr 15:43

This version supports bigger, multipart LLaMA models (tested with 7B and 13B) converted into the latest GGMJ binary format with a custom Python script (see README).

April 12 - First Man in Space

12 Apr 16:10

The very first public release of LLaMA.go.