# Build Custom Serving Containers That Do Not Contain Model Artifacts

This notebook demonstrates building a serving container that uses Vertex AI's `AIP_STORAGE_URI` environment variable: the container downloads the model directly from Cloud Storage at startup, which keeps the image lightweight and lets the same image serve different model artifacts.

The pipeline developed in [02-model_pipeline.ipynb](./02-model_pipeline.ipynb) uses these assets to register a model in Model Registry by specifying the custom serving image (`serving_container_image_uri`) along with the `artifact_uri` parameter.

![serving-container.png](./diagrams/serving-container.png)

Documentation: [Accessing Model Artifacts](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#artifacts)
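
As a minimal sketch of what the container does with `AIP_STORAGE_URI` at startup, the helper below splits a `gs://` directory URI into a bucket name and an object path for a given artifact filename. The function name `split_gcs_uri` and the example bucket are hypothetical; the serving application later in this notebook performs the equivalent steps inline.

```python
from typing import Tuple


def split_gcs_uri(storage_uri: str, filename: str) -> Tuple[str, str]:
    """Split a gs:// directory URI into (bucket_name, blob_name) for `filename`.

    Hypothetical helper for illustration; not part of the Vertex AI SDK.
    """
    if not storage_uri.startswith("gs://"):
        raise ValueError(f"Expected a gs:// URI, got: {storage_uri!r}")
    # Drop the scheme and any trailing slash, then separate bucket from prefix.
    bucket_name, _, prefix = storage_uri[len("gs://"):].rstrip("/").partition("/")
    if not bucket_name:
        raise ValueError("URI does not contain a bucket name")
    blob_name = f"{prefix}/{filename}" if prefix else filename
    return bucket_name, blob_name


# Example with a made-up bucket name.
print(split_gcs_uri("gs://example-bucket/housing_models/", "house_price_model.pkl"))
```

Stripping the trailing slash before joining avoids an object path with a doubled `/`, which would not match the stored object.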

## Build custom serving container

This section defines the image parameters for a custom serving container built from a Python FastAPI application (`main.py`) and a `Dockerfile`, then builds the container image and pushes it to Google Cloud Artifact Registry. Set the following parameters:

- `PROJECT_ID`: Your Google Cloud Project ID.
- `REGION`: The Google Cloud region where your resources will be deployed (e.g., Vertex AI, Artifact Registry).
- `REPO_NAME`: Name of the Artifact Registry repository to store the custom serving container image.
- `JOB_IMAGE_ID`: Name of the Docker image for the custom serving container.
- `VERSION`: Version or tag of the Docker image. Defaults to `latest`.
- `model_file`: The filename of the serialized model artifact (e.g., a `.pkl` file) to be loaded from Cloud Storage.
- `BUCKET_URI`: The Google Cloud Storage URI where model artifacts are located.
- `SERVICE_ACCOUNT`: The Google Cloud service account used for permissions during Vertex AI operations.
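
Besides `main.py` and the `Dockerfile`, the build context needs a `requirements.txt`: the Dockerfile in this section copies it into the image, but this notebook does not create it. The sketch below writes a plausible file from the imports in `main.py`, plus `uvicorn` for the Dockerfile `CMD`. The `scikit-learn` entry is an assumption about the library that produced `house_price_model.pkl`, and versions are left unpinned here; in practice they should match the training environment.

```python
from pathlib import Path

# Packages imported by main.py, plus uvicorn for the Dockerfile CMD.
# scikit-learn is an ASSUMPTION about the pickled model's library; replace it
# with whatever library actually produced house_price_model.pkl.
requirements = [
    "fastapi",
    "uvicorn",
    "pydantic",
    "joblib",
    "pandas",
    "google-cloud-storage",
    "scikit-learn",
]

# Write one package per line into the Docker build context (current directory).
Path("requirements.txt").write_text("\n".join(requirements) + "\n")
print(Path("requirements.txt").read_text())
```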


In [None]:
import os

In [None]:
# Image Parameters
PROJECT_ID = "sandbox-401718"  # @param {type:"string"}
REGION = "us-central1"  # @param {type:"string"}
VERSION = "latest"
REPO_NAME = "housing-poc"  # @param {type:"string"}
JOB_IMAGE_ID = "housing-poc-image"  # @param {type:"string"}

# Cloud Storage
model_file = "house_price_model.pkl"
BUCKET_URI = f"gs://{PROJECT_ID}-pred-benchmark/housing_models"  # e.g., where house_price_model.pkl is stored

# Vertex Custom Job parameters
SERVICE_ACCOUNT = "757654702990-compute@developer.gserviceaccount.com"  # @param {type:"string"}

In [None]:
from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)

In [None]:
%%writefile main.py

# Example serving application: loads the model from Cloud Storage at startup.

import logging
import os
from typing import Any, Dict, List

import joblib
import pandas as pd
from fastapi import FastAPI, HTTPException
from google.cloud import storage
from pydantic import BaseModel

# Set at startup if the artifact download succeeds; stays None otherwise.
model = None

app = FastAPI()

class PredictionPayload(BaseModel):
    instances: List[Dict[str, Any]]

######## LOAD MODEL FROM GCS (model artifact lives outside the container) #########
MODEL_FILENAME = "house_price_model.pkl"

logging.basicConfig(level=logging.INFO)

# Vertex AI sets AIP_STORAGE_URI to a Cloud Storage directory containing the model artifacts.
AIP_STORAGE_URI = os.environ.get("AIP_STORAGE_URI")
if AIP_STORAGE_URI:
    try:
        # "gs://bucket/path" splits into ["gs:", "", "bucket", "path", ...]
        uri_parts = AIP_STORAGE_URI.rstrip("/").split("/")
        bucket_name = uri_parts[2]
        blob_name = "/".join(uri_parts[3:] + [MODEL_FILENAME])
        client = storage.Client()
        bucket = client.bucket(bucket_name)
        blob = bucket.blob(blob_name)
        blob.download_to_filename(MODEL_FILENAME)
        with open(MODEL_FILENAME, "rb") as f:
            model = joblib.load(f)
        logging.info("Model loaded from gs://%s/%s", bucket_name, blob_name)
    except Exception as e:
        logging.error(f"Error loading model: {e}", exc_info=True)
        model = None
else:
    logging.warning("AIP_STORAGE_URI not set. Model will not be loaded from GCS.")


@app.get("/health")
def health() -> dict[str, str]:
    # Respond with a non-200 status so the health check fails while no model is loaded.
    if model is None:
        raise HTTPException(status_code=503, detail="Model not loaded")
    return {"STATUS": "OK"}

@app.post("/predict")  # must match AIP_PREDICT_ROUTE exactly (no trailing slash)
def predict(
    payload: PredictionPayload,
) -> Dict[str, List[Any]]:
    if model is None:
        raise HTTPException(
            status_code=503, detail="Model is not available or failed to load."
        )
        
    instances_data = payload.instances
    pandas_df = pd.DataFrame(instances_data)

    raw_predictions_numpy = model.predict(pandas_df)
    predictions_list = raw_predictions_numpy.tolist()

    return {"predictions": predictions_list}


In [None]:
%%writefile Dockerfile

FROM python:3.10-slim

COPY ./requirements.txt /app/requirements.txt
COPY ./main.py /app/main.py
WORKDIR /app

RUN apt-get update && apt-get install gcc libffi-dev -y

RUN pip install -r requirements.txt

EXPOSE 8080

CMD ["uvicorn", "--host", "0.0.0.0", "--port", "8080", "main:app"]

In [None]:
# Build the image and push it to Artifact Registry
! docker build -f ./Dockerfile -t {REGION}-docker.pkg.dev/{PROJECT_ID}/{REPO_NAME}/{JOB_IMAGE_ID}:{VERSION} .
! gcloud auth configure-docker {REGION}-docker.pkg.dev --quiet
! docker push {REGION}-docker.pkg.dev/{PROJECT_ID}/{REPO_NAME}/{JOB_IMAGE_ID}:{VERSION}
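
The tag passed to `docker build` and `docker push` above follows the Artifact Registry naming scheme `REGION-docker.pkg.dev/PROJECT_ID/REPO_NAME/IMAGE:TAG`. As a small sketch, the same string can be composed once in Python and reused, for example as `serving_container_image_uri` when registering the model. The helper name `build_image_uri` and the placeholder project ID are hypothetical.

```python
def build_image_uri(region: str, project_id: str, repo_name: str,
                    image_id: str, version: str = "latest") -> str:
    """Compose an Artifact Registry Docker image URI from its parts."""
    return f"{region}-docker.pkg.dev/{project_id}/{repo_name}/{image_id}:{version}"


# Placeholder values; substitute the notebook's own parameters.
image_uri = build_image_uri("us-central1", "my-project", "housing-poc", "housing-poc-image")
print(image_uri)
```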

### Test the serving container locally

Before deploying to Vertex AI, you can validate the custom serving container locally: run the Docker image of the FastAPI application, set environment variables to simulate the Vertex AI prediction environment, and test the `/predict` and `/health` endpoints with example inference requests.

To access the running application from your web browser, navigate to the external IP address of your Google Cloud Compute Engine VM on port 8181, the host port mapped by the `docker run` command in this section. <br>
Reference: https://cloud.google.com/compute/docs/ip-addresses#externaladdresses

Example payload: <br>
```{ "instances": [  {  "LotFrontage": 70.0,  "LotArea": 9.03777111,  "HouseAge": 61.0,  "SaleType": 8.0  } ] }```
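
A minimal sketch of exercising the two endpoints from Python once the container is running, using only the standard library. It assumes the container is reachable at `http://localhost:8181` (the host port mapped by the `docker run` command in this section) and that the application serves predictions at exactly `/predict`, the value of `AIP_PREDICT_ROUTE`. The helper names are hypothetical, and the network calls sit inside `smoke_test()`, which is not invoked automatically.

```python
import json
import urllib.request
from typing import Any, Dict, List

BASE_URL = "http://localhost:8181"  # assumption: host port from `docker run -p 8181:8080`


def build_predict_request(base_url: str, instances: List[Dict[str, Any]]) -> urllib.request.Request:
    """Build a POST request carrying the {"instances": [...]} payload."""
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def smoke_test(base_url: str = BASE_URL) -> None:
    """Call /health, then /predict; requires the container to be running."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=10) as resp:
        print("health:", resp.status, resp.read().decode())
    request = build_predict_request(
        base_url,
        [{"LotFrontage": 70.0, "LotArea": 9.03777111, "HouseAge": 61.0, "SaleType": 8.0}],
    )
    with urllib.request.urlopen(request, timeout=30) as resp:
        print("predict:", json.loads(resp.read()))


# smoke_test()  # uncomment once the container is up
```

`urllib` raises `HTTPError` on non-2xx responses, so a failed model load or a route mismatch surfaces as an exception rather than a silent pass.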

![inference.png](./diagrams/inference.png)



`image_id`: ID of the local Docker image (as listed by `docker images`).

In [None]:
image_id = "c13a5c977f3d" # @param {type:"string"}

In [None]:
gcloud_config_path = os.path.expanduser("~/.config/gcloud") # For GCS authentication

In [6]:
! docker run \
  -p 8181:8080 \
  -e AIP_HEALTH_ROUTE="/health" \
  -e AIP_PREDICT_ROUTE="/predict" \
  -e AIP_STORAGE_URI="{BUCKET_URI}" \
  -v {gcloud_config_path}:/root/.config/gcloud:ro \
  --rm \
  "$image_id"                  





## Register Model (optional)

The following example shows how to register the custom serving container, together with the separately stored model artifact, in the Vertex AI Model Registry.

In [None]:
# vertex_model = aiplatform.Model.upload(
#     display_name="housing_model",
#     artifact_uri=BUCKET_URI,
#     serving_container_image_uri=f"{REGION}-docker.pkg.dev/{PROJECT_ID}/{REPO_NAME}/{JOB_IMAGE_ID}:{VERSION}",
#     serving_container_predict_route="/predict",
#     serving_container_health_route="/health",
#     serving_container_ports=[8080],
# )