
# Private Endpoints

[Vertex AI Endpoints](https://cloud.google.com/vertex-ai/docs/general/deployment) are scalable resources for hosting models for online prediction. Models registered in the Vertex AI Model Registry can be deployed to an endpoint. Endpoints are created in a specific location (region), and each deployed model can use a wide variety of [compute configurations](https://cloud.google.com/vertex-ai/docs/predictions/configure-compute) (machine types and GPUs). When deploying a model to an endpoint, the baseline number of replicas is set with `minReplicaCount`, and [scaling behavior](https://cloud.google.com/vertex-ai/docs/general/deployment#scaling) is enabled by setting `maxReplicaCount` to a larger value.
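
The relationship between the two replica settings can be sketched as a small validation helper. This is a hypothetical helper, not part of the Vertex AI SDK; it only assembles the `min_replica_count` / `max_replica_count` keyword arguments that the Python SDK's `deploy()` call uses later in this notebook, and treats "max greater than min" as the condition that leaves room for autoscaling.

```python
# Hypothetical helper: assembles the replica arguments that the Python SDK's
# deploy() call accepts (min_replica_count / max_replica_count).
def replica_kwargs(min_replicas: int, max_replicas: int) -> dict:
    """Validate a replica range and return it as deploy() keyword arguments."""
    if min_replicas < 0:
        raise ValueError("min_replicas cannot be negative")
    if max_replicas < min_replicas:
        raise ValueError("max_replicas must be >= min_replicas")
    return {"min_replica_count": min_replicas, "max_replica_count": max_replicas}


def can_autoscale(kwargs: dict) -> bool:
    # Autoscaling only has room to add replicas when the ceiling exceeds the floor.
    return kwargs["max_replica_count"] > kwargs["min_replica_count"]


fixed = replica_kwargs(1, 1)    # the fixed size used later in this notebook
scaling = replica_kwargs(1, 3)  # baseline of 1 replica, can grow to 3
print(can_autoscale(fixed), can_autoscale(scaling))  # False True
```

The returned dict could be unpacked into a deploy call, e.g. `endpoint.deploy(model = model, machine_type = DEPLOY_COMPUTE, **scaling)`.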

Standard or private? Vertex AI offers both standard and private endpoints. The distinction is about how the endpoint is exposed on the network; both types require authentication by a user or service account with the appropriate IAM permissions. [Private endpoints](https://cloud.google.com/vertex-ai/docs/predictions/using-private-endpoints) directly peer a project's network with the Vertex AI Prediction service that hosts the underlying VMs. This removes extra hops in network traffic and allows use of the efficient gRPC protocol, which means lower latency. With either type of endpoint you can also use [customer-managed encryption keys with endpoints](https://cloud.google.com/vertex-ai/docs/general/cmek#resource-list) to encrypt the model files used on the VMs.

The [Vertex AI API](../Tips/aiplatform_notes.md) can be used to request predictions from any Vertex AI endpoint. The user or [service account](https://cloud.google.com/vertex-ai/docs/general/access-control#about_service_accounts_and_service_agents) that authenticates and requests the prediction needs the appropriate [IAM roles/permissions](https://cloud.google.com/vertex-ai/docs/general/access-control), in particular the permission `aiplatform.endpoints.predict`.

Choosing a private endpoint involves several considerations:
- Standard endpoints support traffic splits between multiple deployed models. Private endpoints do not support traffic splitting. To split traffic with private endpoints, use multiple private endpoints and distribute requests between them.
- Standard endpoints have a [1.5 MB limit per prediction request](https://cloud.google.com/vertex-ai/docs/general/deployment), while private endpoints do not have this limit.
- See the additional [differences for private endpoints](https://cloud.google.com/vertex-ai/docs/predictions/using-private-endpoints#limitations).
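
Splitting traffic across multiple private endpoints has to happen on the client side. A minimal sketch, assuming a simple weighted random choice per request: the endpoint names below are hypothetical stand-ins for `aiplatform.PrivateEndpoint` objects, and the 90/10 weights are only an example.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def pick_endpoint(endpoints: Sequence[T], weights: Sequence[float], rng: random.Random) -> T:
    """Choose one endpoint for a request, in proportion to its weight."""
    if not endpoints or len(endpoints) != len(weights):
        raise ValueError("need one weight per endpoint")
    return rng.choices(endpoints, weights=weights, k=1)[0]


# Hypothetical stand-ins for two aiplatform.PrivateEndpoint objects.
endpoints = ["private-endpoint-a", "private-endpoint-b"]
weights = [90, 10]  # e.g. current model vs. a candidate model

rng = random.Random(0)  # seeded so the simulated split is reproducible
counts = {name: 0 for name in endpoints}
for _ in range(10_000):
    counts[pick_endpoint(endpoints, weights, rng)] += 1
print(counts)  # roughly 9,000 vs. 1,000
```

With real endpoint objects in the list, the chosen one would then receive the request via its `.predict(instances = ...)` method. Per-request random choice keeps no state; if a caller must always hit the same model, hashing a stable key (such as a user ID) would be the alternative design.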


**Prerequisites:**
- This notebook uses the model trained and registered by the notebook [05a - Vertex AI Custom Model - TensorFlow - Custom Job With Python File.ipynb](../05%20-%20TensorFlow/05a%20-%20Vertex%20AI%20Custom%20Model%20-%20TensorFlow%20-%20Custom%20Job%20With%20Python%20File.ipynb)

---
## Setup

inputs:

In [5]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [46]:
REGION = 'us-central1'
SERIES = 'dev'

# source data
BQ_PROJECT = PROJECT_ID
BQ_DATASET = 'fraud'
BQ_TABLE = 'fraud_prepped'

# Resources
DEPLOY_COMPUTE = 'n1-standard-4'

# Model Training
VAR_TARGET = 'Class'
VAR_OMIT = 'transaction_id' # add more variables to the string with space delimiters

packages:

In [15]:
from google.cloud import aiplatform
from google.cloud import bigquery

clients:

In [16]:
aiplatform.init(project = PROJECT_ID, location = REGION)
bq = bigquery.Client(project = PROJECT_ID)

Enable the APIs (if not already enabled) for:
- Service Networking (needed for creating the reserved address range and peering connection)
- Cloud DNS (needed for deploying the model to the endpoint)

In [35]:
!gcloud services enable servicenetworking.googleapis.com
!gcloud services enable dns.googleapis.com

Operation "operations/acat.p2-1026793852137-862c3e2c-14c2-4a46-a053-d405df4d3291" finished successfully.


---
## Data Sample For Prediction Request

In [47]:
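n = 10
# VAR_OMIT may hold several space-delimited column names; SQL EXCEPT needs them comma-separated
omit_cols = ', '.join(VAR_OMIT.split())
pred = bq.query(
    query = f"""
        SELECT * EXCEPT({VAR_TARGET}, {omit_cols}, splits)
        FROM `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}`
        WHERE splits='TEST'
        LIMIT {n}
        """
).to_dataframe()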

In [48]:
newobs = pred.to_dict(orient = 'records')
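
To see how a request body compares with the standard-endpoint request limit, its JSON size can be estimated before sending. A sketch under stated assumptions: the body is modeled as `{"instances": [...]}`, "1.5 MB" is read conservatively as 1,500,000 bytes, and the sample record (`Time`, `V1`, `Amount`) is hypothetical rather than an actual row from `newobs`.

```python
import json

STANDARD_LIMIT_BYTES = 1_500_000  # conservative reading of "1.5 MB"


def request_size_bytes(instances: list) -> int:
    """Size in UTF-8 bytes of the JSON body {"instances": [...]}."""
    # default=str keeps values such as dates or Decimals from raising.
    return len(json.dumps({"instances": instances}, default=str).encode("utf-8"))


def fits_standard_limit(instances: list, limit: int = STANDARD_LIMIT_BYTES) -> bool:
    return request_size_bytes(instances) <= limit


# Hypothetical record shaped like one entry of newobs.
sample = [{"Time": 0.0, "V1": -1.36, "Amount": 149.62}] * 10
print(request_size_bytes(sample), fits_standard_limit(sample))
```

In the notebook, `request_size_bytes(newobs)` would give the size of the 10-row request; large batches that fail `fits_standard_limit` are where a private endpoint (no such limit) becomes attractive.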

---
## Setup VPC Network Peering For Private Endpoints

[Set up VPC network peering](https://cloud.google.com/vertex-ai/docs/general/vpc-peering) between the network named `default` and Vertex AI.

In [1]:
NETWORK_NAME = 'default'

List peering connections:

In [3]:
!gcloud compute networks peerings list --network $NETWORK_NAME

Listed 0 items.


Reserve an IP address range for the peering connection:

In [4]:
!gcloud compute addresses create vertex-ai-peering-range \
    --global \
    --prefix-length=16 \
    --description="peering range for Google service" \
    --network=$NETWORK_NAME \
    --purpose=VPC_PEERING

Created [https://www.googleapis.com/compute/v1/projects/statmike-mlops-349915/global/addresses/vertex-ai-peering-range].
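
The `--prefix-length=16` flag determines how many addresses are reserved: an IPv4 range with prefix length p leaves 32 - p host bits. The arithmetic, cross-checked with Python's `ipaddress` module on an example block (`10.8.0.0/16` is hypothetical; the actual start address is not shown in the output above):

```python
import ipaddress


def addresses_in_prefix(prefix_length: int) -> int:
    # An IPv4 range with prefix length p has 2 ** (32 - p) addresses.
    return 2 ** (32 - prefix_length)


print(addresses_in_prefix(16))  # 65536

# Cross-check on a hypothetical /16 block.
example = ipaddress.ip_network("10.8.0.0/16")
print(example.num_addresses)  # 65536
```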


Establish a peering connection between the VPC host project (this project) and Google's Service Networking:

In [8]:
!gcloud services vpc-peerings connect \
    --service=servicenetworking.googleapis.com \
    --network=$NETWORK_NAME \
    --ranges=vertex-ai-peering-range \
    --project=$PROJECT_ID

Operation "operations/pssn.p24-1026793852137-5fca5f21-7b10-472f-a2de-572bc616ae68" finished successfully.


List peering connections:

In [9]:
!gcloud compute networks peerings list --network $NETWORK_NAME

NAME                              NETWORK  PEER_PROJECT           PEER_NETWORK       STACK_TYPE  PEER_MTU  IMPORT_CUSTOM_ROUTES  EXPORT_CUSTOM_ROUTES  STATE   STATE_DETAILS
servicenetworking-googleapis-com  default  fae5c370cd2ce92a1p-tp  servicenetworking  IPV4_ONLY             False                 False                 ACTIVE  [2023-06-22T09:57:26.212-07:00]: Connected.
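
Before creating the endpoint it can help to confirm programmatically that the peering is `ACTIVE`. The listing is a space-aligned table, and some cells (such as `PEER_MTU`) can be empty, so splitting on whitespace misaligns columns. A minimal sketch that instead slices each row at the header's column offsets; it assumes the fixed-width layout shown above, and the sample is a trimmed copy of that output with fewer columns.

```python
def parse_fixed_width(table: str) -> list:
    """Parse a space-aligned CLI table into dicts, slicing rows at header column offsets."""
    lines = [line for line in table.splitlines() if line.strip()]
    header = lines[0]
    # A column starts wherever a non-space character follows a space (or line start).
    starts = [i for i, ch in enumerate(header)
              if ch != " " and (i == 0 or header[i - 1] == " ")]
    names = header.split()
    rows = []
    for line in lines[1:]:
        row = {}
        for j, name in enumerate(names):
            end = starts[j + 1] if j + 1 < len(starts) else None
            row[name] = line[starts[j]:end].strip()
        rows.append(row)
    return rows


# Trimmed copy of the peering listing above (columns reduced for brevity).
SAMPLE = (
    "NAME                              NETWORK  PEER_MTU  STATE\n"
    "servicenetworking-googleapis-com  default            ACTIVE\n"
)
rows = parse_fixed_width(SAMPLE)
active = [r["NAME"] for r in rows if r["STATE"] == "ACTIVE"]
print(rows)
print(active)
```

In a notebook, the table text could come from capturing the `!gcloud compute networks peerings list` output (for example `out = !gcloud ...` followed by `"\n".join(out)`).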


---
## Example Workflow: Deploy Model to Private Endpoint and Request Prediction

### Get Model From Vertex AI Model Registry

Get the model created by the notebook [05a - Vertex AI Custom Model - TensorFlow - Custom Job With Python File.ipynb](../05%20-%20TensorFlow/05a%20-%20Vertex%20AI%20Custom%20Model%20-%20TensorFlow%20-%20Custom%20Job%20With%20Python%20File.ipynb).

Reference:
- [aiplatform.Model()](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.Model)
- Note that `model_name` is either the model ID or the fully qualified model resource name, not the display name.

In [17]:
model = aiplatform.Model(model_name = 'model_05_05a')

In [26]:
model.resource_name

'projects/1026793852137/locations/us-central1/models/model_05_05a'

### Create Private Endpoint

Reference:
- [aiplatform.PrivateEndpoint.create()](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.PrivateEndpoint#google_cloud_aiplatform_PrivateEndpoint_create)
- Note that `network` is the full network resource name, with the project given as the project number rather than the project ID. The project number can be taken from the model's resource name.

In [38]:
project_number = model.resource_name.split('/')[1]
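
Indexing `split('/')[1]` works because Vertex AI resource names alternate collection names and IDs. A slightly more general sketch (the helper name is hypothetical) turns those pairs into a dict and builds the `network` string from it; the resource name below is copied from the output above.

```python
def parse_resource_name(resource_name: str) -> dict:
    """Split 'projects/P/locations/L/models/M' style names into {collection: id}."""
    parts = resource_name.split("/")
    if len(parts) % 2:
        raise ValueError(f"unexpected resource name: {resource_name}")
    return dict(zip(parts[0::2], parts[1::2]))


model_resource = "projects/1026793852137/locations/us-central1/models/model_05_05a"
fields = parse_resource_name(model_resource)
network = f"projects/{fields['projects']}/global/networks/default"
print(fields)
print(network)  # projects/1026793852137/global/networks/default
```

Naming the parts makes the intent clearer than a positional index and fails loudly on a malformed name.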

In [39]:
endpoint = aiplatform.PrivateEndpoint.create(
    display_name = f'{SERIES}',
    network = f'projects/{project_number}/global/networks/{NETWORK_NAME}'
)

Creating PrivateEndpoint
Create PrivateEndpoint backing LRO: projects/1026793852137/locations/us-central1/endpoints/4573084364199428096/operations/2394302224361586688
PrivateEndpoint created. Resource name: projects/1026793852137/locations/us-central1/endpoints/4573084364199428096
To use this PrivateEndpoint in another session:
endpoint = aiplatform.PrivateEndpoint('projects/1026793852137/locations/us-central1/endpoints/4573084364199428096')


In [40]:
endpoint.network

'projects/1026793852137/global/networks/default'

In [41]:
endpoint.name, endpoint.display_name

('4573084364199428096', 'dev')

### Deploy Model To Endpoint

**Note: This takes 15+ minutes to complete**

Reference:
- [aiplatform.PrivateEndpoint.deploy()](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.PrivateEndpoint#google_cloud_aiplatform_PrivateEndpoint_deploy)

In [42]:
endpoint.deploy(
    model = model,
    deployed_model_display_name = model.display_name,
    machine_type = DEPLOY_COMPUTE,
    min_replica_count = 1,
    max_replica_count = 1
)

Deploying Model projects/1026793852137/locations/us-central1/models/model_05_05a to PrivateEndpoint : projects/1026793852137/locations/us-central1/endpoints/4573084364199428096
Deploy PrivateEndpoint model backing LRO: projects/1026793852137/locations/us-central1/endpoints/4573084364199428096/operations/7384290611488096256
PrivateEndpoint model deployed. Resource name: projects/1026793852137/locations/us-central1/endpoints/4573084364199428096


### Get Prediction

Reference:
- [aiplatform.PrivateEndpoint.predict()](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.PrivateEndpoint#google_cloud_aiplatform_PrivateEndpoint_predict)

In [49]:
endpoint.predict(instances = newobs[0:1])

Prediction(predictions=[[0.999771655, 0.000228400182]], deployed_model_id='8093340165314969600', model_version_id=None, model_resource_name=None, explanations=None)
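
Each prediction is a pair of class probabilities. A sketch that turns these into 0/1 labels, assuming (not confirmed by the output itself) that the second value is the probability of `Class = 1` and using a 0.5 threshold; the input is copied from the output above.

```python
def to_labels(predictions, threshold=0.5):
    """Map each [p_class0, p_class1] pair to a 0/1 label from the class-1 probability."""
    return [int(probs[1] >= threshold) for probs in predictions]


predictions = [[0.999771655, 0.000228400182]]  # copied from the output above
print(to_labels(predictions))  # [0]
```

With a live endpoint this would be `to_labels(endpoint.predict(instances = newobs[0:1]).predictions)`. For an imbalanced target like fraud, the threshold is a tuning choice rather than a fixed 0.5.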

---
## Example Workflow: Request Prediction From Existing Endpoint

When the private endpoint already exists and has a deployed model, the workflow is:


### Get Private Endpoint

Reference:
- [aiplatform.PrivateEndpoint.list()](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.PrivateEndpoint#google_cloud_aiplatform_PrivateEndpoint_list)
- To load an endpoint directly, pass `endpoint_name` to `aiplatform.PrivateEndpoint` in the form `projects/<project number>/locations/<region>/endpoints/<endpoint_id>`, where `endpoint_id` is the ID assigned to the endpoint, not its display name.
- Rather than requiring the exact `endpoint_id`, the method below lists all private endpoints filtered by the known `display_name` and, if any are found, takes the first match.
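
When the endpoint ID is known, the full `endpoint_name` can be assembled directly. A small sketch (the helper name is hypothetical) using the values from this notebook's earlier output:

```python
def endpoint_resource_name(project_number: str, region: str, endpoint_id: str) -> str:
    """Build the full endpoint name, e.g. for aiplatform.PrivateEndpoint(endpoint_name=...)."""
    return f"projects/{project_number}/locations/{region}/endpoints/{endpoint_id}"


name = endpoint_resource_name("1026793852137", "us-central1", "4573084364199428096")
print(name)
```

The resulting string could then be passed as `aiplatform.PrivateEndpoint(endpoint_name = name)`. The display-name filter used below avoids hard-coding the ID, but if several endpoints share a display name, the first match may not be the intended one.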

In [54]:
endpoints = aiplatform.PrivateEndpoint.list(filter = f'display_name={SERIES}')

In [55]:
if endpoints: endpoint = endpoints[0]

### Get Prediction

Reference:
- [aiplatform.PrivateEndpoint.predict()](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.PrivateEndpoint#google_cloud_aiplatform_PrivateEndpoint_predict)

In [56]:
endpoint.predict(instances = newobs[0:1])

Prediction(predictions=[[0.999771655, 0.000228400182]], deployed_model_id='8093340165314969600', model_version_id=None, model_resource_name=None, explanations=None)

---
## Cleanup

In [57]:
# remove endpoint
endpoint.delete(force = True)

Undeploying PrivateEndpoint model: projects/1026793852137/locations/us-central1/endpoints/4573084364199428096
Undeploy PrivateEndpoint model backing LRO: projects/1026793852137/locations/us-central1/endpoints/4573084364199428096/operations/6806703959277830144
PrivateEndpoint model undeployed. Resource name: projects/1026793852137/locations/us-central1/endpoints/4573084364199428096
Deleting PrivateEndpoint : projects/1026793852137/locations/us-central1/endpoints/4573084364199428096
Delete PrivateEndpoint  backing LRO: projects/1026793852137/locations/us-central1/operations/8916640384700907520
PrivateEndpoint deleted. . Resource name: projects/1026793852137/locations/us-central1/endpoints/4573084364199428096
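
Deleting the endpoint leaves the VPC peering and reserved range in place. If nothing else in the project uses them, they could also be removed. A sketch that assembles the `gcloud` commands and only executes them when `dry_run=False`; the peering is deleted before the address range it uses. The project ID `my-project` is hypothetical (in this notebook it would be `PROJECT_ID`, with `NETWORK_NAME` for the network), and the flags should be checked against your installed `gcloud` version.

```python
import shlex
import subprocess


def cleanup_commands(network: str, range_name: str, project: str) -> list:
    """gcloud commands that undo the peering setup from earlier in this notebook."""
    return [
        # Remove the Service Networking peering first; the range is in use until then.
        ["gcloud", "services", "vpc-peerings", "delete",
         "--service=servicenetworking.googleapis.com",
         f"--network={network}", f"--project={project}", "--quiet"],
        # Then release the reserved global address range.
        ["gcloud", "compute", "addresses", "delete", range_name,
         "--global", f"--project={project}", "--quiet"],
    ]


def run_cleanup(commands: list, dry_run: bool = True) -> None:
    for cmd in commands:
        print(shlex.join(cmd))  # show the command either way
        if not dry_run:
            subprocess.run(cmd, check=True)


commands = cleanup_commands("default", "vertex-ai-peering-range", "my-project")
run_cleanup(commands)  # dry run: prints the commands only
```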
