This project provides a FastAPI-based server that acts as a proxy to dynamically download, load, and unload LoRA (Low-Rank Adaptation) adapters based on user requests.
- Dynamic LoRA Management: Load and unload LoRA adapters on demand.
- Proxy Server: Acts as middleware that routes requests to the appropriate LoRA adapter.
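
When started with `VLLM_ALLOW_RUNTIME_LORA_UPDATING=1` (as in the setup below), vLLM exposes `/v1/load_lora_adapter` and `/v1/unload_lora_adapter` endpoints that a proxy like this can call on demand. A minimal sketch of those calls, assuming vLLM is on port 8000; the adapter name and path are placeholders, not values from this repo:

```python
import json
import urllib.request

VLLM_URL = "http://localhost:8000"  # assumed vLLM port (matches the setup below)

def load_lora_payload(name: str, path: str) -> dict:
    """Build the JSON body for vLLM's /v1/load_lora_adapter endpoint."""
    return {"lora_name": name, "lora_path": path}

def unload_lora_payload(name: str) -> dict:
    """Build the JSON body for vLLM's /v1/unload_lora_adapter endpoint."""
    return {"lora_name": name}

def post(endpoint: str, payload: dict) -> None:
    """POST a JSON body to the vLLM server (requires a running server)."""
    req = urllib.request.Request(
        f"{VLLM_URL}{endpoint}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

# Example (placeholder adapter name/path; needs a live vLLM server to actually run):
# post("/v1/load_lora_adapter", load_lora_payload("my-adapter", "/path/to/adapter"))
# post("/v1/unload_lora_adapter", unload_lora_payload("my-adapter"))
```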
- Clone the repository:

  ```bash
  git clone https://github.com/VijayRavichander/nano-LoRAX
  ```
- Install uv, then create and activate a virtual environment:

  ```bash
  wget -qO- https://astral.sh/uv/install.sh | sh
  source $HOME/.local/bin/env
  uv venv
  source .venv/bin/activate
  ```
- Install dependencies:

  ```bash
  uv pip install -r requirements.txt
  ```
- Export environment variables (the first enables vLLM's runtime LoRA load/unload endpoints; the second speeds up Hugging Face downloads):

  ```bash
  export VLLM_ALLOW_RUNTIME_LORA_UPDATING=1
  export HF_HUB_ENABLE_HF_TRANSFER=1
  ```

- Add your HF_TOKEN to .env.example and rename it to .env.
- Run the vLLM server:

  ```bash
  nohup uv run vllm serve neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8 \
      --max-model-len 8192 \
      --enable-lora \
      --max-lora-rank 128 \
      --port 8000 > logs/vllm.log 2>&1 &
  ```

- After the vLLM server is up, run the proxy server:

  ```bash
  nohup uv run python -m server > proxy.log 2>&1 &
  ```
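
Once both servers are running, a request can target a specific LoRA adapter by name: vLLM's OpenAI-compatible API routes to a loaded adapter when the `model` field matches its `lora_name`. A sketch of such a request; the proxy port and adapter name here are assumptions, so check the server configuration for the actual values:

```python
import json
import urllib.request

PROXY_URL = "http://localhost:8001"  # hypothetical proxy port; check the server config

def chat_request(adapter: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completion body targeting a LoRA adapter by name."""
    return {
        "model": adapter,  # vLLM serves the loaded LoRA adapter with this lora_name
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }

body = chat_request("my-lora-adapter", "Hello!")
# Uncomment to send (requires both servers to be up):
# req = urllib.request.Request(
#     f"{PROXY_URL}/v1/chat/completions",
#     data=json.dumps(body).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```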