Dev update (23.8.17.) #4

c0sogi · 2023-08-17T03:31:09Z

🚀 This PR introduces a series of improvements aimed at enhancing user experience and refining the codebase. Here's a breakdown of the changes:

🌟 1. Exllama Module - LoRA Integration

By placing adapter_config.json and adapter_model.bin in the ./models/gptq/YOUR_MODEL directory, the system will now seamlessly initialize LoRA.
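The check described above could look roughly like the sketch below. It is not the project's actual loader: the function name `find_lora_adapter` and the constant `LORA_FILES` are hypothetical, and only the two file names come from this PR.

```python
from pathlib import Path
from typing import Optional, Tuple

# File names taken from the PR description; everything else is illustrative.
LORA_FILES = ("adapter_config.json", "adapter_model.bin")


def find_lora_adapter(model_dir: Path) -> Optional[Tuple[Path, Path]]:
    """Return (config_path, weights_path) if both LoRA files exist, else None."""
    config_path, weights_path = (model_dir / name for name in LORA_FILES)
    if config_path.is_file() and weights_path.is_file():
        return config_path, weights_path
    # A missing file means the model is loaded without LoRA.
    return None
```

For example, `find_lora_adapter(Path("./models/gptq/YOUR_MODEL"))` returns `None` unless both files are present in that directory.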

🔗 2. OpenAI Logit Bias Support

For API requests to models listed in the openai_replacement_models dictionary, logit bias entries are automatically converted from OpenAI token IDs to Llama token IDs, courtesy of the Tiktoken tokenizer.
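One plausible way to do this conversion is to decode each OpenAI token ID to text and re-encode that text with the Llama tokenizer. The sketch below assumes that approach; it is not confirmed as the PR's implementation. `convert_logit_bias`, `decode_openai`, and `encode_llama` are hypothetical names, and the two tokenizers are passed in as plain callables so the example stays self-contained.

```python
from typing import Callable, Dict, List


def convert_logit_bias(
    openai_bias: Dict[str, float],
    decode_openai: Callable[[int], str],
    encode_llama: Callable[[str], List[int]],
) -> Dict[int, float]:
    """Map an OpenAI-style logit_bias dict onto Llama token IDs.

    Assumption: each OpenAI token is decoded to text, the text is re-encoded
    with the Llama tokenizer, and every resulting Llama token gets the bias.
    """
    llama_bias: Dict[int, float] = {}
    for token_id, bias in openai_bias.items():
        # OpenAI sends token IDs as JSON object keys, i.e. as strings.
        text = decode_openai(int(token_id))
        for llama_id in encode_llama(text):
            # If two OpenAI tokens map to the same Llama ID, the later one wins.
            llama_bias[llama_id] = bias
    return llama_bias
```

In practice `decode_openai` could wrap a Tiktoken encoding's `decode` method and `encode_llama` could wrap the loaded model's tokenizer. Note that one OpenAI token may expand to several Llama tokens, so the bias is only an approximation of the original request.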

⚖ 3. Optimized Worker Load Balancing

The load balancing algorithm for workers in the process pool has been reworked. Clients are now allocated according to each worker's computed worker_rank, and when ranks tie, a random worker is selected.
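The selection rule can be sketched as follows. This is a simplified illustration, not the PR's code: `pick_worker` is a hypothetical name, worker IDs are plain strings, and it assumes that a higher worker_rank means a better candidate (the description does not say which direction wins).

```python
import random
from typing import Dict, Optional


def pick_worker(
    worker_ranks: Dict[str, int],
    rng: Optional[random.Random] = None,
) -> str:
    """Pick the worker with the best rank, breaking ties at random.

    Assumption: a higher rank is better.
    """
    if not worker_ranks:
        raise ValueError("no workers available")
    rng = rng or random.Random()
    best_rank = max(worker_ranks.values())
    # Every worker sharing the best rank is an equally valid candidate.
    candidates = [w for w, rank in worker_ranks.items() if rank == best_rank]
    return rng.choice(candidates)
```

Passing a seeded `random.Random` makes the tie-break reproducible in tests, while the default stays random in production use.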

📜 4. Enhanced Logging Mechanism

Log messages are now clearer. Additionally, both user prompts and model responses from Chat Completion and Text Completion operations are saved to logs/chat.log.
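A minimal sketch of such a chat log, using only the standard `logging` module, is shown below. The helper names `build_chat_logger` and `log_exchange` and the log line format are assumptions for illustration; only the logs/chat.log path comes from this PR.

```python
import logging
from pathlib import Path


def build_chat_logger(log_path: Path = Path("logs/chat.log")) -> logging.Logger:
    """Create (or reuse) a logger that appends to the given chat log file."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(f"chat.{log_path}")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid stacking duplicate handlers on repeat calls
        handler = logging.FileHandler(log_path, encoding="utf-8")
        handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
        logger.addHandler(handler)
    logger.propagate = False  # keep chat records out of the root logger
    return logger


def log_exchange(logger: logging.Logger, prompt: str, response: str) -> None:
    """Record one prompt/response pair as a single log line."""
    logger.info("prompt=%r response=%r", prompt, response)
```

Calling `log_exchange(build_chat_logger(), "Hi", "Hello!")` would append one timestamped line to logs/chat.log.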

🔥 5. Docker Image Upgrades

The previous Docker image relied on the CPU version of llama.cpp, which cannot use CUDA acceleration. Because of constraints on using the CUDA compiler during the build phase, the new image relies on JIT compilation so that the build happens automatically.

c0sogi added 15 commits August 13, 2023 01:07

Added logit processors

668faeb

Added xformers

9a726b4

Test suite refactor

cfc18bf

Added persistent docker compose file

72d21f4

Support caching model path

05f6108

Fixed CUDA docker image build error

681bfae

Added chat logger

0775d11

Fixed bug: llama.cpp context tokens

778c0bd

Removed assertion: api key should start with "sk-"

fcb0d58

Improved worker load balancing

b85de0e

bump dependencies

6b2e37f

Implemented OpenAI compatible logit bias

0866583

Better error logger

fbb5d0a

lora support for exllama

7da7b60

update: docker image & readme

1f111ba

c0sogi merged commit 023fb40 into master Aug 17, 2023
14 checks passed
