
[Feature]: Dynamic reloading config file #964

Closed · krrishdholakia opened this issue Nov 30, 2023 · 14 comments
Assignees: krrishdholakia
Labels: enhancement (New feature or request), proxy

Comments

@krrishdholakia (Contributor)

The Feature

"I'm searching for a way to perform dynamic model and endpoint discovery, as it's crucial for our ecosystem. It would be beneficial if LiteLLM allowed the dynamic reloading of the config file, for example. That would help a lot."

Motivation, pitch

user request

Twitter / LinkedIn details

No response

@krrishdholakia (Contributor, Author)

cc: @chanwit

@krrishdholakia (Contributor, Author)

"Our environment uses Kubernetes, and LLM instances are served by the LM-controller. When the LM-controller starts a new LLM instance, we want it to be automatically registered with a proxy. To achieve this, we plan to write a Kubernetes informer that monitors the creation and deletion of LLM instances, captures their endpoint information, and sends it to register the endpoint with the proxy, in this case, LiteLLM.

For LiteLLM to support this dynamic endpoint discovery, it needs to implement endpoint registration and deregistration, such as POST and DELETE at /endpoints, allowing for on-the-fly alias and endpoint registration without restarting the proxy process. From a security standpoint, we would limit this endpoint to local access only, or possibly use a Unix socket."
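
For illustration, here is a minimal sketch of that informer idea. The namespace, label selector, pod port, litellm_params, and proxy URL are placeholder assumptions, and it targets the /model/new and /model/delete routes discussed later in this thread rather than a generic /endpoints route:

# Sketch of a Kubernetes informer that registers/deregisters LLM endpoints with the proxy.
import requests
from kubernetes import client, config, watch

PROXY_URL = "http://localhost:8000"  # assumed LiteLLM proxy address

def run_informer():
    config.load_incluster_config()  # use config.load_kube_config() when run outside the cluster
    v1 = client.CoreV1Api()
    for event in watch.Watch().stream(
        v1.list_namespaced_pod, namespace="llm", label_selector="app=llm-instance"
    ):
        pod = event["object"]
        if event["type"] == "ADDED" and pod.status.pod_ip:
            # Register the new instance with the proxy.
            requests.post(f"{PROXY_URL}/model/new", json={
                "model_name": pod.metadata.name,
                "litellm_params": {
                    "model": "ollama/llama2",  # placeholder model
                    "api_base": f"http://{pod.status.pod_ip}:8080",  # assumed serving port
                },
                "model_info": {"id": pod.metadata.name},
            })
        elif event["type"] == "DELETED":
            # Remove it from the proxy when the pod goes away.
            requests.post(f"{PROXY_URL}/model/delete", json={"id": pod.metadata.name})

if __name__ == "__main__":
    run_informer()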

@krrishdholakia (Contributor, Author)

@chanwit just pushed a new /model/new endpoint - 72381c3

[Two screenshots attached, taken 2023-12-01 at 8:36 PM and 8:37 PM]
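
For reference, a registration call against the new endpoint might look like the following; the model name, parameters, and id here are placeholders, and the body shape mirrors the config entries and the /model/delete example further down this thread:

curl -X POST http://0.0.0.0:8000/model/new \
-H 'Content-Type: application/json' \
-d '{
  "model_name": "openai-model",
  "litellm_params": {"model": "gpt-3.5-turbo"},
  "model_info": {"id": "my-model-id"}
}'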

@krrishdholakia (Contributor, Author)

I'll update the thread once this is in prod.

@krrishdholakia krrishdholakia self-assigned this Dec 2, 2023
@krrishdholakia (Contributor, Author)

in prod in v1.10.0

@chanwit commented Dec 4, 2023

Thank you for the ping @krrishdholakia

Here's my take on the acceptance criteria for this feature.

  • There is the POST /model/new endpoint in LiteLLM for registering a new LLM instance. The new LLM instance can be used right away without restarting the current LiteLLM server. This AC has been addressed by 72381c3
  • There is the DELETE /models/{:id} endpoint in LiteLLM for removing an existing LLM instance. The instance is removed from the server without restarting.

@krrishdholakia (Contributor, Author)

@chanwit /model/delete endpoint added - 92b2cbc

Here's how to use it:

Step 1: Add a model with an id to your config.yaml

model_list:
  - model_name: "openai-model"
    litellm_params:
      model: "gpt-3.5-turbo"
    model_info:
      id: "my-model-id"

Step 2: Start your proxy

$ litellm --config /path/to/config.yaml

Step 3: Make a POST /model/delete call to remove this model from your proxy

curl -X POST http://0.0.0.0:8000/model/delete \
-H 'Content-Type: application/json' \
-d '{"id": "my-model-id"}'

@krrishdholakia (Contributor, Author)

Let me know if this looks good to you

Will update this ticket once the release is in prod.

@krrishdholakia (Contributor, Author)

closing as this is now out

@ericblue commented Dec 9, 2023

Hi @krrishdholakia, sorry for the delayed response; I'm just testing this out now.

Can you give a valid example of the POST request for a new model? Following the Swagger example, posting the following model:

{
  "model_name": "ollama-custom",
  "litellm_params": {
    "model": "ollama/llama2",
    "api_base": "http://localhost:11434",
    "host": "localhost",
    "port": 8000
  },
  "model_info": {
    "id": "my-test-model",
    "mode": "embedding",
    "input_cost_per_token": 0,
    "output_cost_per_token": 0,
    "max_tokens": 4096,
    "base_model": "gpt-3.5-turbo"
  }
}

Results in a server error:

{
  "detail": "Internal Server Error: could not determine a constructor for the tag 'tag:yaml.org,2002:python/object:litellm.proxy._types.ModelInfo'\n in \"config_1702158172.732068.yaml\", line 7, column 15"
}

It looks like a file is initially created, but model_info is serialized as a Python object:

model_list:
- litellm_params:
    api_base: http://localhost:11434
    host: localhost
    model: ollama/llama2
    port: 8000
  model_info: !!python/object:litellm.proxy._types.ModelInfo
    __dict__:
      base_model: gpt-3.5-turbo
      id: test
      input_cost_per_token: 0.0
      max_tokens: 4096
      mode: embedding
      output_cost_per_token: 0.0
    __pydantic_extra__:
      additionalProp1: {}
    __pydantic_fields_set__: !!set
      additionalProp1: null
      base_model: null
      id: null
      input_cost_per_token: null
      max_tokens: null
      mode: null
      output_cost_per_token: null
    __pydantic_private__: null
  model_name: ollama-custom

Interestingly enough, litellm_params is persisted OK, and both are passed into yaml.dump the same way in the config object.
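
For anyone hitting the same error, here is a minimal sketch (not LiteLLM's actual code or fix) of the failure mode: PyYAML's default Dumper serializes an arbitrary Python object, such as a pydantic model, with a python/object tag, which yaml.safe_load and the full loader then refuse to construct; dumping a plain dict round-trips fine. The ModelInfo class below is a stand-in, not the real litellm.proxy._types.ModelInfo.

from typing import Optional
import yaml
from pydantic import BaseModel

class ModelInfo(BaseModel):  # stand-in for litellm.proxy._types.ModelInfo
    id: Optional[str] = None
    base_model: Optional[str] = None

entry = {"model_name": "ollama-custom", "model_info": ModelInfo(id="test")}

# The default Dumper happily serializes the object, but tags it as a Python object,
# which safe loaders later reject with a ConstructorError.
print(yaml.dump({"model_list": [entry]}))

# Converting to a plain dict before dumping keeps the config loadable:
entry["model_info"] = entry["model_info"].dict()  # .model_dump() on pydantic v2
clean = yaml.dump({"model_list": [entry]})
yaml.safe_load(clean)  # round-trips without errors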

@krrishdholakia (Contributor, Author) commented Dec 9, 2023

That's concerning. Here's how I was testing it:

def test_add_new_model(client):

I had to skip it on CircleCI due to CI/FastAPI issues. Thanks for flagging this, I'll have it fixed today and add better CI testing.
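
For context, a rough sketch of what such a test could look like with FastAPI's TestClient; the app import path, fixture, and payload are assumptions for illustration, not LiteLLM's actual test suite:

import pytest
from fastapi.testclient import TestClient

from litellm.proxy.proxy_server import app  # assumed location of the proxy's FastAPI app

@pytest.fixture
def client():
    # Depending on proxy config, an Authorization header with the master key may also be needed.
    return TestClient(app)

def test_add_new_model(client):
    payload = {
        "model_name": "test-model",
        "litellm_params": {"model": "gpt-3.5-turbo"},
        "model_info": {"id": "test-model-id"},
    }
    response = client.post("/model/new", json=payload)
    assert response.status_code == 200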

@krrishdholakia krrishdholakia reopened this Dec 9, 2023
@krrishdholakia (Contributor, Author)

fix pushed + testing added @ericblue

@krrishdholakia (Contributor, Author)

will update this ticket once the fix is in prod.

@krrishdholakia (Contributor, Author)

Fix live from v1.12.3 onwards.
