This Colab notebook sets up a Ray cluster for ServerlessLLM, following the [quickstart](https://serverlessllm.github.io/docs/stable/getting_started/quickstart).


# Installation
Run the following cell to install ServerlessLLM and its dependencies. It creates two virtual environments: `sllm` for the head node and `sllm-worker` for the worker node.

In [None]:
%%capture
# This will take a few minutes.
# In a production environment, Conda should be used instead.
!git clone https://github.com/ServerlessLLM/ServerlessLLM
%cd ServerlessLLM
!pip install virtualenv
!virtualenv sllm; source sllm/bin/activate; pip install -e .; pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ serverless_llm_store; pip install -U torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu121
!virtualenv sllm-worker; source sllm-worker/bin/activate; pip install -e ".[worker]"; pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ serverless_llm_store; pip install -U torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu121
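
Because `%%capture` hides the cell's output, a failed install can go unnoticed. The following is a minimal sketch of a sanity check, assuming the two virtual environments live in the current directory under the names used above; `missing_venvs` is a hypothetical helper, not part of ServerlessLLM, and it only confirms that the activation scripts exist, not that every package installed correctly.

```python
from pathlib import Path


def missing_venvs(root: str = ".", names: tuple = ("sllm", "sllm-worker")) -> list:
    """Return the environment names under `root` that lack a bin/activate script.

    Hypothetical helper: the names default to the two virtualenvs created above.
    """
    return [
        name
        for name in names
        if not (Path(root) / name / "bin" / "activate").is_file()
    ]
```

An empty list from `missing_venvs()` suggests both environments were created; any returned name points at the `virtualenv` step to rerun without `%%capture`.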

# Nodes
Now run the following cell to start a Ray cluster, the store server, and the main server. Colab restrictions limit the number of workers, so this cluster runs a single worker node:


In [None]:
!source sllm/bin/activate; nohup ray start --head --port=6379 --num-cpus=2 --num-gpus=0 --resources='{"control_node": 1}' --block &
!sleep 15
!source sllm-worker/bin/activate; nohup ray start --address=0.0.0.0:6379 --num-cpus=2 --num-gpus=1 --resources='{"worker_node": 1, "worker_id_0": 1}' --block &
!sleep 15
!source sllm-worker/bin/activate; nohup sllm-store-server &
!sleep 10
!source sllm/bin/activate; nohup sllm-serve start &
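
The fixed `sleep` calls above are a guess at how long each process needs to start. As an alternative for the last step, here is a minimal sketch that polls until something accepts TCP connections on the server port. It assumes the main server listens on `localhost:8343`, the address used by the request later in this notebook; `wait_for_port` is a hypothetical helper. An open port only shows that a process is listening, not that a model is ready to serve.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 120.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means some process is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or unreachable): wait and retry.
            time.sleep(interval)
    return False
```

Calling `wait_for_port("localhost", 8343)` before deploying a model returns `True` as soon as the port is open, and `False` if nothing is listening after two minutes.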

# Inference
Finally, deploy a model and send a chat completion request to its endpoint:

In [None]:
!source sllm/bin/activate; sllm-cli deploy --model facebook/opt-1.3b --backend transformers

In [None]:
import requests

# Send a chat completion request to the ServerlessLLM server started above.
r = requests.post(
    "http://localhost:8343/v1/chat/completions",
    headers={"Content-Type": "application/json"},
    json={
        "model": "facebook/opt-1.3b",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is your name?"},
        ],
    },
)
r.raise_for_status()  # Fail early on an HTTP error instead of a KeyError below.
print(r.json()["choices"][0]["message"]["content"])
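
The chained indexing in the last line raises a bare `KeyError` or `IndexError` if the response does not have the expected shape. A minimal sketch of a more defensive lookup follows; `extract_reply` is a hypothetical helper, and the only response layout it assumes is the `choices[0].message.content` path already used above.

```python
def extract_reply(payload: dict) -> str:
    """Return the first choice's message content from a chat completion payload.

    Raises ValueError (including the payload) if the expected keys are missing.
    """
    try:
        return payload["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError) as exc:
        raise ValueError(f"Unexpected response payload: {payload!r}") from exc
```

With this helper, `print(extract_reply(r.json()))` prints the same text as before, and a malformed response produces an error message that shows what the server actually returned.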