# 📓 model_garden_litellm_inference.ipynb  
**Deploy an Open-Source Model on Vertex AI and Serve Inference via LiteLLM with OpenAI-Compatible APIs (Chat & Function Calling)**


## 🧭 1. Overview

This notebook demonstrates how to:
- Deploy an open-source LLM (e.g. DeepSeek, LLaMA, Gemma) via [Vertex AI Model Garden](https://cloud.google.com/vertex-ai/docs/generative-ai/model-garden).
- Serve the model with a public Vertex AI endpoint.
- Connect it to [LiteLLM](https://docs.litellm.ai/) using an OpenAI-compatible schema for chat completion and function calling.


## 🔧 2. Setup

In [None]:
!pip install -q google-cloud-aiplatform litellm


In [None]:
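# Colab-only authentication. Outside Colab, run
# `gcloud auth application-default login` instead.
from google.colab import auth
auth.authenticate_user()

import vertexai
from google.cloud import aiplatform

# Replace with your own project ID and a region where the model is available.
PROJECT_ID = "your-project-id"
REGION = "us-central1"

# Initialize both SDK entry points with the same project and region.
aiplatform.init(project=PROJECT_ID, location=REGION)
vertexai.init(project=PROJECT_ID, location=REGION)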


## 🚀 3. Deploy OSS Model from Model Garden

### 3.1 Choose Model

You can browse available models in the [Model Garden UI](https://console.cloud.google.com/vertex-ai/generative/models).

Examples: DeepSeek Coder 6.7B or Llama 3 8B Instruct.


### 3.2 Deploy to Endpoint (Auto or Manual)

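You can deploy a prebuilt model from the console, or automate the deployment with the SDK.

```python
# Display names used throughout this notebook; choose any values you like.
DEPLOYED_MODEL_NAME = "deepseek-7b-instruct"
ENDPOINT_NAME = f"{DEPLOYED_MODEL_NAME}-endpoint"
```

Add your deployment logic here. Once the endpoint exists, note its numeric ID: LiteLLM's Model Garden examples pass the endpoint ID after `vertex_ai/openai/`, so if the display name is rejected, set `ENDPOINT_NAME` to that ID.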
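
To connect a deployed endpoint to LiteLLM you need a model string that points at it. The sketch below shows one way to derive that string from a Vertex AI endpoint resource name of the form `projects/<project>/locations/<region>/endpoints/<id>`. The helper names `parse_endpoint_resource` and `litellm_model_string` are hypothetical, and the sample resource name is made up for illustration.

```python
import re

# Assumed shape of a Vertex AI endpoint resource name.
_ENDPOINT_RE = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<region>[^/]+)"
    r"/endpoints/(?P<endpoint_id>[^/]+)$"
)


def parse_endpoint_resource(resource_name: str) -> dict:
    """Split an endpoint resource name into project, region, and endpoint ID."""
    match = _ENDPOINT_RE.match(resource_name)
    if match is None:
        raise ValueError(f"Not an endpoint resource name: {resource_name!r}")
    return match.groupdict()


def litellm_model_string(resource_name: str) -> str:
    """Build the `vertex_ai/openai/<endpoint-id>` string used with LiteLLM."""
    endpoint_id = parse_endpoint_resource(resource_name)["endpoint_id"]
    return f"vertex_ai/openai/{endpoint_id}"


# Made-up resource name; a real one would come from your deployed endpoint.
example = "projects/your-project-id/locations/us-central1/endpoints/1234567890"
print(litellm_model_string(example))
```

If you deploy with the SDK, the resource name should be available on the endpoint object, so you can pass it straight into `litellm_model_string`.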


## 🔗 4. Set Up LiteLLM

In [None]:
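import os

# LiteLLM reads the Vertex AI project and location from these environment variables.
os.environ["VERTEXAI_PROJECT"] = PROJECT_ID
os.environ["VERTEXAI_LOCATION"] = REGION

# Model string for the deployed endpoint, in LiteLLM's `vertex_ai/openai/...` form.
LITELLM_MODEL = f"vertex_ai/openai/{ENDPOINT_NAME}"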


Optional: save a `litellm_config.yaml` file so you can serve the model through the LiteLLM proxy CLI.

```yaml
model_list:
  - model_name: openai-compatible-model
    litellm_params:
      model: vertex_ai/openai/<your-endpoint-id>
      vertex_project: <your-project-id>
      vertex_location: <your-region>
```
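
If you would rather generate the config file from the notebook than edit it by hand, something like the following could work. It is a sketch that assumes the `litellm_params` layout (`model`, `vertex_project`, `vertex_location`) from LiteLLM's proxy configuration; check the key names against the LiteLLM version you have installed. `render_litellm_config` and `write_litellm_config` are hypothetical helpers, and the example values are placeholders.

```python
import json
from pathlib import Path


def render_litellm_config(model_name: str, endpoint_id: str,
                          project: str, region: str) -> str:
    """Return YAML text for a single-model LiteLLM proxy config.

    Values are quoted with json.dumps, whose output is also a valid
    YAML double-quoted string.
    """
    model = f"vertex_ai/openai/{endpoint_id}"
    return (
        "model_list:\n"
        f"  - model_name: {json.dumps(model_name)}\n"
        "    litellm_params:\n"
        f"      model: {json.dumps(model)}\n"
        f"      vertex_project: {json.dumps(project)}\n"
        f"      vertex_location: {json.dumps(region)}\n"
    )


def write_litellm_config(path: str, **kwargs: str) -> Path:
    """Write the rendered config to `path` and return the path."""
    target = Path(path)
    target.write_text(render_litellm_config(**kwargs), encoding="utf-8")
    return target


# Placeholder values; substitute your own endpoint ID, project, and region.
print(render_litellm_config(
    model_name="openai-compatible-model",
    endpoint_id="1234567890",
    project="your-project-id",
    region="us-central1",
))
```

Calling `write_litellm_config("litellm_config.yaml", model_name=..., endpoint_id=..., project=..., region=...)` would save the same text to disk.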


## 💬 5. Chat Completion via LiteLLM

In [None]:
from litellm import completion

response = completion(
    model=f"vertex_ai/openai/{ENDPOINT_NAME}",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What's the capital of Japan?"}
    ]
)

print(response["choices"][0]["message"]["content"])
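
The response follows the OpenAI chat-completion shape, so the reply text sits under `choices[0]["message"]["content"]`. The sketch below is a slightly more defensive way to read it. `first_message_text` is a hypothetical helper, and `sample_response` is a hand-written dictionary standing in for a real response, so nothing here calls the endpoint.

```python
def first_message_text(response: dict) -> str:
    """Return the first choice's message text, or "" if the content is empty."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    message = choices[0].get("message") or {}
    # `content` can be None, for example when the model returns a tool call.
    return message.get("content") or ""


# Hand-written stand-in for a chat-completion response.
sample_response = {
    "choices": [
        {
            "index": 0,
            "finish_reason": "stop",
            "message": {
                "role": "assistant",
                "content": "The capital of Japan is Tokyo.",
            },
        }
    ]
}

print(first_message_text(sample_response))
```

The helper expects a plain dictionary. If your LiteLLM version returns a Pydantic model, converting it first (for example with `response.model_dump()`) should give you one.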


## ⚙️ 6. Function Calling via LiteLLM

In [None]:
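# `tools` is the current OpenAI-style parameter; the older `functions` parameter is deprecated.
response = completion(
    model=f"vertex_ai/openai/{ENDPOINT_NAME}",
    messages=[
        {"role": "user", "content": "What's the weather in New York today?"}
    ],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get current weather information for a location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string", "description": "City name, e.g. New York"}
                    },
                    "required": ["location"]
                }
            }
        }
    ]
)

# If the model decides to call the function, the call appears under message["tool_calls"].
print(response["choices"][0])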
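
The model does not run `get_weather` itself. It returns the function name and JSON-encoded arguments, and your code has to do the dispatch. A minimal sketch of that step follows, assuming the message is a plain dictionary in the OpenAI shape. It accepts both the `tool_calls` list and the legacy `function_call` field. `get_weather`, `LOCAL_FUNCTIONS`, `extract_calls`, and `run_calls` are hypothetical names, and `sample_message` is hand-written rather than real model output.

```python
import json


def get_weather(location: str) -> str:
    """Stand-in implementation; a real one would query a weather service."""
    return f"Weather lookup for {location} is not implemented in this sketch."


# Maps function names the model may request to local Python callables.
LOCAL_FUNCTIONS = {"get_weather": get_weather}


def extract_calls(message: dict) -> list:
    """Return (name, arguments) pairs from `tool_calls` or legacy `function_call`."""
    calls = []
    for tool_call in message.get("tool_calls") or []:
        function = tool_call["function"]
        calls.append((function["name"], json.loads(function["arguments"] or "{}")))
    legacy = message.get("function_call")
    if legacy:
        calls.append((legacy["name"], json.loads(legacy["arguments"] or "{}")))
    return calls


def run_calls(message: dict) -> list:
    """Execute each requested function locally and collect the results."""
    results = []
    for name, arguments in extract_calls(message):
        if name not in LOCAL_FUNCTIONS:
            raise KeyError(f"Model requested unknown function: {name}")
        results.append(LOCAL_FUNCTIONS[name](**arguments))
    return results


# Hand-written stand-in for response["choices"][0]["message"].
sample_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_0",
            "type": "function",
            "function": {
                "name": "get_weather",
                "arguments": "{\"location\": \"New York\"}",
            },
        }
    ],
}

print(run_calls(sample_message))
```

In a full loop you would append each result to `messages` as a tool-result message and call `completion` again so the model can compose its final answer.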


## ✅ 7. Summary

In this notebook, you:
- Deployed an open-source model using Vertex AI Model Garden
- Exposed it via a Vertex AI endpoint
- Used LiteLLM to route OpenAI-style chat and function calling traffic to your model
