CHANGELOG

v1.0.1

Reduce memory usage during model conversion
Reduce memory usage during inference
Increase prefill speed
Reduce initialization time
Improve quantization accuracy
Add support for Gemma, ChatGLM3, MiniCPM, InternLM2, and Phi-3
Add support for server invocation
Add an interface for interrupting inference
Add logprob and token_id to the return value

v1.0.0

Supports conversion and deployment of LLMs on the RK3588 and RK3576 platforms
Compatible with Hugging Face model architectures
Currently supports the Llama, Qwen, Qwen2, and Phi-2 models
Supports quantization with w8a8 (8-bit weights, 8-bit activations) and w4a16 (4-bit weights, 16-bit activations) precision