Google Vertex Setup

Google Vertex AI Setup

Enterprise-grade AI with Google Cloud Vertex AI integration

Overview

Gatewayz integrates with Google Vertex AI to provide access to:

Gemini Models - Chat completions with advanced capabilities
Image Generation - Stable Diffusion v1.5 on Vertex AI endpoints
Multimodal Support - Text, image, audio, video processing
Enterprise Features - High availability, security, compliance

Authentication: Application Default Credentials (ADC) with OAuth2 JWT fallback

Prerequisites

Google Cloud Project with Vertex AI API enabled
Service Account with roles/aiplatform.user permission
Service Account Key (JSON format)
Python 3.10+ with required dependencies

Quick Setup

1. Install Dependencies

pip install -r requirements.txt

This installs:

google-cloud-aiplatform>=1.38.0
google-auth>=2.0

2. Enable Vertex AI API

# Enable Vertex AI API
gcloud services enable aiplatform.googleapis.com \
  --project=YOUR_PROJECT_ID

# Verify API is enabled
gcloud services list --enabled --filter="NAME:aiplatform"

3. Create Service Account

# Create service account
gcloud iam service-accounts create gatewayz-vertex \
  --display-name="Gatewayz Vertex AI Service Account" \
  --project=YOUR_PROJECT_ID

# Grant Vertex AI User role
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:gatewayz-vertex@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"

# Create and download key
gcloud iam service-accounts keys create gatewayz-vertex-key.json \
  --iam-account=gatewayz-vertex@YOUR_PROJECT_ID.iam.gserviceaccount.com

4. Configure Environment Variables

Add to your .env file:

# Required
GOOGLE_PROJECT_ID=your-project-id
GOOGLE_VERTEX_LOCATION=us-central1

# Authentication (choose one method)
# Option 1: File path (recommended for local development)
GOOGLE_APPLICATION_CREDENTIALS=/path/to/gatewayz-vertex-key.json

# Option 2: Raw JSON (recommended for serverless/Railway/Vercel)
GOOGLE_VERTEX_CREDENTIALS_JSON='{"type":"service_account","project_id":"...","private_key":"...","client_email":"..."}'

# Optional: Transport mode (sdk, rest, auto)
GOOGLE_VERTEX_TRANSPORT=rest

Authentication Methods

Method 1: File Path (Local Development)

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
export GOOGLE_PROJECT_ID="your-project-id"
export GOOGLE_VERTEX_LOCATION="us-central1"

Method 2: Raw JSON (Serverless/Production)

# Set raw JSON credentials
export GOOGLE_VERTEX_CREDENTIALS_JSON='{"type":"service_account",...}'
export GOOGLE_PROJECT_ID="your-project-id"
export GOOGLE_VERTEX_LOCATION="us-central1"

For Railway/Vercel: Paste the entire JSON content as a single-line environment variable.
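
A key file downloaded with gcloud is pretty-printed across many lines, so it cannot be pasted as-is into a single-line variable. One way to produce the compact form is to re-serialize the file with Python's json module. This is a minimal sketch; the helper name `to_single_line_json` is illustrative and not part of Gatewayz.

```python
import json
from pathlib import Path


def to_single_line_json(key_path: str) -> str:
    """Return the service account key at key_path as compact, single-line JSON."""
    data = json.loads(Path(key_path).read_text(encoding="utf-8"))
    # Compact separators drop the layout whitespace. json.dumps escapes the
    # newlines inside "private_key" as \n, so the result stays on one line.
    return json.dumps(data, separators=(",", ":"))
```

Calling `print(to_single_line_json("gatewayz-vertex-key.json"))` prints a value that can be pasted into the platform's environment variable field.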

Method 3: Application Default Credentials (ADC)

# Authenticate with gcloud
gcloud auth application-default login

# Or use workload identity on GKE/GCE
# No additional configuration needed
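
The three methods can be told apart by looking at which environment variables are set. The sketch below shows one possible resolution order. The precedence (raw JSON first, then file path, then ADC) and the labels `file` and `adc` are assumptions made for illustration; only `env_json` appears in the diagnostic output shown under Monitoring.

```python
import os
from typing import Mapping, Optional


def resolve_credential_source(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick a credential source label from environment variables.

    The precedence order is an assumption, not documented Gatewayz behavior.
    """
    env = os.environ if env is None else env
    if env.get("GOOGLE_VERTEX_CREDENTIALS_JSON", "").strip():
        return "env_json"  # Method 2: raw JSON in an environment variable
    if env.get("GOOGLE_APPLICATION_CREDENTIALS", "").strip():
        return "file"      # Method 1: path to a key file (hypothetical label)
    return "adc"           # Method 3: gcloud login or workload identity (hypothetical label)
```

Passing a plain dictionary instead of `os.environ` makes the logic easy to check without touching the real environment.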

Supported Models

Gemini Chat Models

| Model | Context | Features |
|-------|---------|----------|
| gemini-2.0-flash | 1M tokens | Latest, fastest, multimodal |
| gemini-1.5-pro | 2M tokens | Advanced reasoning |
| gemini-1.5-flash | 1M tokens | Fast, cost-effective |
| gemini-2.5-flash-lite | 128K tokens | Lightweight, ultra-fast |

Image Generation Models

| Model | Size | Backend |
|-------|------|---------|
| stable-diffusion-1.5 | 512x512, 768x768 | Custom Vertex endpoint |

Usage Examples

Chat Completions

import requests

url = "https://your-gateway.com/v1/chat/completions"
headers = {
    "Authorization": "Bearer YOUR_GATEWAYZ_API_KEY",
    "Content-Type": "application/json"
}

payload = {
    "model": "gemini-2.0-flash",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Explain quantum computing"}
    ],
    "temperature": 0.7,
    "max_tokens": 1000
}

response = requests.post(url, headers=headers, json=payload)
print(response.json())
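
The example above prints the raw JSON body. To get just the assistant text, a small helper can unpack it. This sketch assumes the gateway returns an OpenAI-style body (`choices[0].message.content`), which the OpenAI client usage in the streaming example suggests, and that failures carry a top-level `error` key; both are assumptions, and `extract_reply` is an illustrative name.

```python
from typing import Any, Dict


def extract_reply(body: Dict[str, Any]) -> str:
    """Return the assistant text from an OpenAI-style chat completion body."""
    if "error" in body:
        # Assumed error shape: a top-level "error" key
        raise RuntimeError(f"Gateway returned an error: {body['error']}")
    choices = body.get("choices") or []
    if not choices:
        raise ValueError("Response contained no choices")
    # content can be null (for example on tool calls), so fall back to ""
    return choices[0]["message"]["content"] or ""
```

In practice, call `response.raise_for_status()` first, then `extract_reply(response.json())`.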

Streaming Chat

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_GATEWAYZ_API_KEY",
    base_url="https://your-gateway.com/v1"
)

stream = client.chat.completions.create(
    model="gemini-1.5-flash",
    messages=[{"role": "user", "content": "Write a poem"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Image Generation

import base64
from pathlib import Path

import requests

url = "https://your-gateway.com/v1/images/generations"
headers = {
    "Authorization": "Bearer YOUR_GATEWAYZ_API_KEY",
    "Content-Type": "application/json"
}

payload = {
    "prompt": "A serene mountain landscape at sunset",
    "model": "stable-diffusion-1.5",
    "size": "512x512",
    "n": 1,
    "provider": "google-vertex",
    "google_project_id": "your-project-id",
    "google_location": "us-central1",
    "google_endpoint_id": "6072619212881264640"
}

response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()  # Stop here on HTTP errors instead of parsing an error body
result = response.json()

# Save the generated image if the response includes base64 data
b64_image = result["data"][0].get("b64_json")
if b64_image:
    Path("generated_image.png").write_bytes(base64.b64decode(b64_image))

Multimodal (Vision)

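import requests

# Vision requests go to the chat completions endpoint, not the images endpoint
url = "https://your-gateway.com/v1/chat/completions"
headers = {
    "Authorization": "Bearer YOUR_GATEWAYZ_API_KEY",
    "Content-Type": "application/json"
}

payload = {
    "model": "gemini-1.5-pro",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/image.jpg"
                    }
                }
            ]
        }
    ]
}

response = requests.post(url, headers=headers, json=payload)
print(response.json())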

Transport Modes

REST Transport (Default)

Uses Google Vertex AI REST API directly:

GOOGLE_VERTEX_TRANSPORT=rest

Pros:

Works everywhere (Vercel, Railway, serverless)
No C++ runtime dependencies
Lightweight, fast startup

Cons:

Slightly higher latency (~10-50ms)

SDK Transport

Uses Google Cloud AI Platform SDK:

GOOGLE_VERTEX_TRANSPORT=sdk

Pros:

Native Google SDK features
Automatic retries and error handling

Cons:

Requires libstdc++.so.6 (may fail on some serverless platforms)
Heavier memory footprint

Auto Transport (Recommended)

Tries SDK first, falls back to REST:

GOOGLE_VERTEX_TRANSPORT=auto
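
The fallback can be pictured as an import probe: if the SDK module loads, use it; otherwise use REST. The sketch below illustrates that idea only. The function name `choose_transport` and the choice of `google.cloud.aiplatform` as the probed module are assumptions, not the actual Gatewayz implementation.

```python
import importlib


def choose_transport(requested: str = "auto",
                     sdk_module: str = "google.cloud.aiplatform") -> str:
    """Resolve 'auto' to 'sdk' when the SDK imports cleanly, otherwise 'rest'."""
    if requested not in {"sdk", "rest", "auto"}:
        raise ValueError(f"Unknown transport: {requested!r}")
    if requested != "auto":
        return requested  # explicit choice, no probing
    try:
        importlib.import_module(sdk_module)
    except (ImportError, OSError):
        # ImportError covers both a missing package and a missing shared
        # library such as libstdc++.so.6
        return "rest"
    return "sdk"
```

Because the probe runs only for `auto`, setting `rest` explicitly avoids importing the SDK on platforms where it is known to fail.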

Configuration

Environment Variables

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `GOOGLE_PROJECT_ID` | Yes | - | GCP project ID |
| `GOOGLE_VERTEX_LOCATION` | Yes | `us-central1` | GCP region |
| `GOOGLE_APPLICATION_CREDENTIALS` | No* | - | Path to service account JSON |
| `GOOGLE_VERTEX_CREDENTIALS_JSON` | No* | - | Raw service account JSON |
| `GOOGLE_VERTEX_TRANSPORT` | No | `rest` | Transport mode (sdk/rest/auto) |
| `GOOGLE_VERTEX_TIMEOUT` | No | `120` | Request timeout (seconds) |
| `GOOGLE_VERTEX_ENDPOINT_ID` | No** | - | Custom endpoint ID for image gen |

* At least one authentication method is required.

** Required only for image generation.
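
The defaults in the table can be applied in one place when reading the environment. The following is a minimal sketch of such a loader; `VertexSettings` and `load_settings` are hypothetical names, and the validation rules are assumptions based on the table rather than Gatewayz's own code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class VertexSettings:
    project_id: str
    location: str = "us-central1"
    transport: str = "rest"
    timeout_seconds: int = 120
    endpoint_id: Optional[str] = None


def load_settings(env: Optional[Mapping[str, str]] = None) -> VertexSettings:
    """Read the documented variables, apply defaults, and reject bad values."""
    env = os.environ if env is None else env
    project_id = env.get("GOOGLE_PROJECT_ID", "").strip()
    if not project_id:
        raise ValueError("GOOGLE_PROJECT_ID is required")
    transport = env.get("GOOGLE_VERTEX_TRANSPORT", "rest").strip().lower()
    if transport not in {"sdk", "rest", "auto"}:
        raise ValueError(f"GOOGLE_VERTEX_TRANSPORT must be sdk, rest, or auto, got {transport!r}")
    return VertexSettings(
        project_id=project_id,
        location=env.get("GOOGLE_VERTEX_LOCATION", "us-central1"),
        transport=transport,
        timeout_seconds=int(env.get("GOOGLE_VERTEX_TIMEOUT", "120")),
        endpoint_id=env.get("GOOGLE_VERTEX_ENDPOINT_ID") or None,
    )
```

Failing fast on a missing project ID or an unknown transport surfaces configuration mistakes at startup rather than on the first request.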

Pricing

Vertex AI charges are separate from Gatewayz credits:

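| Model | Prompt ($/1M tokens) | Completion ($/1M tokens) |
|-------|----------------------|--------------------------|
| gemini-2.0-flash | $0.075 | $0.30 |
| gemini-1.5-pro | $1.25 | $5.00 |
| gemini-1.5-flash | $0.075 | $0.30 |
| stable-diffusion-1.5 | - | 100 tokens/image |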

Note: Google bills separately through Google Cloud billing.
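
To see how the per-million rates translate into a single request, the arithmetic can be sketched as below. The prices are copied from the table above and may be out of date; `estimate_cost_usd` is an illustrative helper, not a Gatewayz function.

```python
# USD per 1M tokens, copied from the pricing table: (prompt, completion)
PRICES = {
    "gemini-2.0-flash": (0.075, 0.30),
    "gemini-1.5-pro": (1.25, 5.00),
    "gemini-1.5-flash": (0.075, 0.30),
}


def estimate_cost_usd(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the Vertex AI charge for one request from token counts."""
    prompt_price, completion_price = PRICES[model]
    return (prompt_tokens * prompt_price + completion_tokens * completion_price) / 1_000_000
```

For example, a gemini-1.5-pro request with 10,000 prompt tokens and 2,000 completion tokens comes to (10,000 × 1.25 + 2,000 × 5.00) / 1,000,000 = $0.0225.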

Troubleshooting

Authentication Errors

Error: Authentication failed or Permission denied

Solution:

Verify service account has roles/aiplatform.user role:

gcloud projects get-iam-policy YOUR_PROJECT_ID \
  --flatten="bindings[].members" \
  --filter="bindings.members:serviceAccount:gatewayz-vertex@*"

Check credentials JSON is valid:

python3 -m json.tool service-account.json

Test authentication:

gcloud auth activate-service-account \
  --key-file=service-account.json

Endpoint Not Found

Error: Endpoint not found or Invalid endpoint ID

Solution:

List deployed endpoints:

gcloud ai endpoints list \
  --project=YOUR_PROJECT_ID \
  --region=us-central1

Verify endpoint status is "DEPLOYED"
Check endpoint ID matches configuration

SDK Import Errors

Error: ImportError: libstdc++.so.6: cannot open shared object file

Solution: Switch to REST transport:

export GOOGLE_VERTEX_TRANSPORT=rest

Or install required libraries:

# Debian/Ubuntu
sudo apt-get install libstdc++6

# Alpine (Docker)
apk add --no-cache libstdc++

Timeout Errors

Error: Request times out after 120 seconds

Solution:

Increase timeout:
```
export GOOGLE_VERTEX_TIMEOUT=300
```
Use a faster model (gemini-2.0-flash instead of gemini-1.5-pro)
Reduce max_tokens in request

Invalid Instance Format (Image Generation)

Error: Invalid instance format or Prediction failed

Solution:

Verify endpoint accepts the expected input format
Check model deployment configuration in Vertex AI console

Review endpoint prediction schema:

gcloud ai endpoints describe YOUR_ENDPOINT_ID \
  --project=YOUR_PROJECT_ID \
  --region=us-central1

Advanced Configuration

Custom Model Parameters

payload = {
    "model": "gemini-1.5-pro",
    "messages": [...],
    "temperature": 0.9,
    "top_p": 0.95,
    "top_k": 40,
    "max_tokens": 2048,
    "safety_settings": [
        {
            "category": "HARM_CATEGORY_HATE_SPEECH",
            "threshold": "BLOCK_MEDIUM_AND_ABOVE"
        }
    ]
}

Function Calling

payload = {
    "model": "gemini-1.5-pro",
    "messages": [{"role": "user", "content": "What's the weather in NYC?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get weather for a location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string"}
                    },
                    "required": ["location"]
                }
            }
        }
    ]
}

Note: Function calling format transformation is in progress. Some features may not work yet.

Multiple Images (Batch Generation)

payload = {
    "prompt": "Various mountain landscapes",
    "provider": "google-vertex",
    "n": 4,
    "size": "512x512"
}

Cost: 100 tokens per image (400 tokens for 4 images)
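
With `n` greater than 1, each image needs to be decoded and saved separately. The sketch below assumes the response keeps the shape used in the Image Generation example, with one `data` entry per image carrying a `b64_json` field; `save_images` and the file naming scheme are illustrative.

```python
import base64
from pathlib import Path
from typing import Any, Dict, List


def save_images(result: Dict[str, Any], out_dir: str = ".",
                prefix: str = "generated") -> List[Path]:
    """Decode each base64 image in an images response and write it to disk."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved: List[Path] = []
    for index, item in enumerate(result.get("data", [])):
        encoded = item.get("b64_json")
        if not encoded:
            continue  # skip entries without inline image data
        path = target / f"{prefix}_{index}.png"
        path.write_bytes(base64.b64decode(encoded))
        saved.append(path)
    return saved
```

Calling `save_images(response.json(), out_dir="out")` returns the list of paths written, so the caller can confirm how many of the requested images actually arrived.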

Deployment

Railway

# Set environment variables in Railway dashboard
GOOGLE_PROJECT_ID=your-project-id
GOOGLE_VERTEX_LOCATION=us-central1
GOOGLE_VERTEX_CREDENTIALS_JSON={"type":"service_account",...}
GOOGLE_VERTEX_TRANSPORT=rest

Vercel

# Add to Vercel environment variables
GOOGLE_PROJECT_ID=your-project-id
GOOGLE_VERTEX_LOCATION=us-central1
GOOGLE_VERTEX_CREDENTIALS_JSON={"type":"service_account",...}
GOOGLE_VERTEX_TRANSPORT=rest

Docker

# Dockerfile
ENV GOOGLE_PROJECT_ID=your-project-id
ENV GOOGLE_VERTEX_LOCATION=us-central1
ENV GOOGLE_VERTEX_TRANSPORT=rest

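# Copy service account key (local testing only: anyone with access to the image
# can read the key; in production, mount it as a secret at runtime instead)
COPY service-account.json /app/
ENV GOOGLE_APPLICATION_CREDENTIALS=/app/service-account.json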

Performance Optimization

Connection Pooling

The Vertex AI client maintains internal connection pools. No additional configuration needed.

Token Caching

Access tokens are cached for 59 minutes to reduce OAuth2 overhead:

# Automatic caching - no configuration needed
# Tokens refreshed automatically when expired
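
The caching behavior can be illustrated with a small time-based cache. Google access tokens typically last one hour, so a 59-minute lifetime leaves a margin before expiry. The class below is a sketch of the idea, not the Gatewayz implementation; `TokenCache` and its `fetch_token` callback are hypothetical names.

```python
import time
from typing import Callable, Optional

TOKEN_TTL_SECONDS = 59 * 60  # 59 minutes, as described above


class TokenCache:
    """Cache one access token and refetch it once the TTL has elapsed."""

    def __init__(self, fetch_token: Callable[[], str],
                 ttl_seconds: float = TOKEN_TTL_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch_token = fetch_token
        self._ttl = ttl_seconds
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            # First call or expired entry: fetch a fresh token
            self._token = self._fetch_token()
            self._expires_at = now + self._ttl
        return self._token
```

The clock is injectable so the expiry logic can be exercised without waiting an hour; `time.monotonic` is used by default because it is unaffected by system clock changes.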

Regional Endpoints

Use the closest region for lower latency:

# US
GOOGLE_VERTEX_LOCATION=us-central1

# Europe
GOOGLE_VERTEX_LOCATION=europe-west1

# Asia
GOOGLE_VERTEX_LOCATION=asia-northeast1
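
The location matters because Vertex AI serves each region from its own hostname, and the region also appears in the resource path. The sketch below builds a `generateContent` URL in the regional form used by the Vertex AI REST API. Whether Gatewayz constructs its URLs exactly this way is an assumption, and `build_generate_content_url` is an illustrative name.

```python
def build_generate_content_url(project_id: str, location: str, model: str) -> str:
    """Build the regional Vertex AI REST URL for a Gemini generateContent call."""
    host = f"{location}-aiplatform.googleapis.com"  # region-specific hostname
    return (
        f"https://{host}/v1/projects/{project_id}/locations/{location}"
        f"/publishers/google/models/{model}:generateContent"
    )
```

Changing `GOOGLE_VERTEX_LOCATION` from `us-central1` to `europe-west1` therefore changes both the host and the path of every request.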

Monitoring

Check Provider Status

from src.services.google_vertex_client import diagnose_google_vertex_credentials

status = diagnose_google_vertex_credentials()
print(status)

Output:

{
  "credentials_available": true,
  "credential_source": "env_json",
  "project_id": "your-project-id",
  "location": "us-central1",
  "initialization_successful": true,
  "health_status": "healthy",
  "error": null,
  "steps": [...]
}
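
For a startup check or health endpoint, the diagnostic dictionary can be reduced to a single boolean. This sketch relies only on the field names shown in the sample output above; `is_vertex_ready` is a hypothetical helper, and treating all four conditions as required is an assumption.

```python
from typing import Any, Dict


def is_vertex_ready(status: Dict[str, Any]) -> bool:
    """Return True only when every readiness signal in the status is positive."""
    return (
        bool(status.get("credentials_available"))
        and bool(status.get("initialization_successful"))
        and status.get("health_status") == "healthy"
        and not status.get("error")
    )
```

When the check fails, the `error` and `steps` fields of the status are the first places to look.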

Logs

Check logs for authentication and request details:

# Look for initialization messages
✓ Successfully initialized Vertex AI for project: your-project-id

# Request logs
Making Google Vertex request for model: gemini-2.0-flash
Using model name: gemini-2.0-flash

Security Best Practices

Never commit service account keys to version control
Use environment variables for credentials
Rotate service account keys regularly (every 90 days)
Grant minimum permissions (roles/aiplatform.user only)
Use separate service accounts for dev/staging/prod
Enable audit logging for Vertex AI API calls
Monitor usage in Google Cloud Console

Migration from SDK to ADC

The previous implementation used the Google SDK with manual credential loading. The new implementation uses Application Default Credentials (ADC) for better serverless compatibility.

Old:

credentials = get_google_vertex_credentials()
credentials.refresh(Request())

New:

# ADC automatically discovers credentials
initialize_vertex_ai()  # No explicit credentials needed

See: docs/integrations/GOOGLE_VERTEX_MIGRATION.md for full migration guide.

Last Updated: December 2024

Status: Production Ready
