
Google Vertex Setup

Armin RAD edited this page Dec 16, 2025 · 1 revision

Google Vertex AI Setup

Enterprise-grade AI with Google Cloud Vertex AI integration


Overview

Gatewayz integrates with Google Vertex AI to provide access to:

  • Gemini Models - Chat completions with advanced capabilities
  • Image Generation - Stable Diffusion v1.5 on Vertex AI endpoints
  • Multimodal Support - Text, image, audio, video processing
  • Enterprise Features - High availability, security, compliance

Authentication: Application Default Credentials (ADC) with OAuth2 JWT fallback


Prerequisites

  1. Google Cloud Project with Vertex AI API enabled
  2. Service Account with the roles/aiplatform.user role
  3. Service Account Key (JSON format)
  4. Python 3.10+ with required dependencies

Quick Setup

1. Install Dependencies

pip install -r requirements.txt

This installs:

  • google-cloud-aiplatform>=1.38.0
  • google-auth>=2.0

2. Enable Vertex AI API

# Enable Vertex AI API
gcloud services enable aiplatform.googleapis.com \
  --project=YOUR_PROJECT_ID

# Verify API is enabled
gcloud services list --enabled --filter="NAME:aiplatform"

3. Create Service Account

# Create service account
gcloud iam service-accounts create gatewayz-vertex \
  --display-name="Gatewayz Vertex AI Service Account" \
  --project=YOUR_PROJECT_ID

# Grant Vertex AI User role
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:gatewayz-vertex@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"

# Create and download key
gcloud iam service-accounts keys create gatewayz-vertex-key.json \
  --iam-account=gatewayz-vertex@YOUR_PROJECT_ID.iam.gserviceaccount.com

4. Configure Environment Variables

Add to your .env file:

# Required
GOOGLE_PROJECT_ID=your-project-id
GOOGLE_VERTEX_LOCATION=us-central1

# Authentication (choose one method)
# Option 1: File path (recommended for local development)
GOOGLE_APPLICATION_CREDENTIALS=/path/to/gatewayz-vertex-key.json

# Option 2: Raw JSON (recommended for serverless/Railway/Vercel)
GOOGLE_VERTEX_CREDENTIALS_JSON='{"type":"service_account","project_id":"...","private_key":"...","client_email":"..."}'

# Optional: Transport mode (sdk, rest, auto)
GOOGLE_VERTEX_TRANSPORT=rest

Authentication Methods

Method 1: File Path (Local Development)

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
export GOOGLE_PROJECT_ID="your-project-id"
export GOOGLE_VERTEX_LOCATION="us-central1"

Method 2: Raw JSON (Serverless/Production)

# Set raw JSON credentials
export GOOGLE_VERTEX_CREDENTIALS_JSON='{"type":"service_account",...}'
export GOOGLE_PROJECT_ID="your-project-id"
export GOOGLE_VERTEX_LOCATION="us-central1"

For Railway/Vercel: Paste the entire JSON content as a single-line environment variable.
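
One way to produce that single-line value from the downloaded key file is to re-serialize it with Python's json module. This is a minimal sketch; the helper name to_single_line_json is illustrative and not part of Gatewayz.

```python
import json


def to_single_line_json(raw: str) -> str:
    """Re-serialize a JSON document without newlines or extra whitespace."""
    # json.loads validates the input; json.dumps writes the line breaks inside
    # "private_key" as the two-character escape \n, so the result fits on one line.
    return json.dumps(json.loads(raw), separators=(",", ":"))


# Example (path is a placeholder):
# with open("gatewayz-vertex-key.json", encoding="utf-8") as fh:
#     print(to_single_line_json(fh.read()))
```

Paste the printed line as the value of GOOGLE_VERTEX_CREDENTIALS_JSON in the platform's dashboard.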

Method 3: Application Default Credentials (ADC)

# Authenticate with gcloud
gcloud auth application-default login

# Or use workload identity on GKE/GCE
# No additional configuration needed
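
The three methods can be thought of as a precedence chain. The sketch below shows one plausible order (raw JSON, then file path, then ADC); the order itself is an assumption, not confirmed Gatewayz behavior. Only the "env_json" label appears in the diagnostic output later on this page; the "file" and "adc" labels are hypothetical.

```python
import os
from typing import Mapping, Optional


def pick_credential_source(env: Optional[Mapping[str, str]] = None) -> str:
    """Return which credential source would be used under the assumed precedence."""
    env = os.environ if env is None else env
    if env.get("GOOGLE_VERTEX_CREDENTIALS_JSON"):
        return "env_json"  # Method 2: raw JSON in an environment variable
    if env.get("GOOGLE_APPLICATION_CREDENTIALS"):
        return "file"      # Method 1: path to a key file (hypothetical label)
    return "adc"           # Method 3: gcloud login or workload identity (hypothetical label)
```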

Supported Models

Gemini Chat Models

Model                  Context      Features
gemini-2.0-flash       1M tokens    Latest, fastest, multimodal
gemini-1.5-pro         2M tokens    Advanced reasoning
gemini-1.5-flash       1M tokens    Fast, cost-effective
gemini-2.5-flash-lite  128K tokens  Lightweight, ultra-fast

Image Generation Models

Model                 Size              Backend
stable-diffusion-1.5  512x512, 768x768  Custom Vertex endpoint

Usage Examples

Chat Completions

import requests

url = "https://your-gateway.com/v1/chat/completions"
headers = {
    "Authorization": "Bearer YOUR_GATEWAYZ_API_KEY",
    "Content-Type": "application/json"
}

payload = {
    "model": "gemini-2.0-flash",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Explain quantum computing"}
    ],
    "temperature": 0.7,
    "max_tokens": 1000
}

response = requests.post(url, headers=headers, json=payload)
print(response.json())
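
To pull just the assistant text out of the response, a small helper can be used. This sketch assumes the gateway returns an OpenAI-style chat completion body (choices[0].message.content), which is consistent with the OpenAI client being used against the same base URL in the streaming example; the helper name extract_reply and the shape of the error field are assumptions.

```python
from typing import Any, Dict


def extract_reply(body: Dict[str, Any]) -> str:
    """Return the assistant text from an OpenAI-style chat completion body."""
    if "error" in body:
        # Assumed error shape: a top-level "error" key in the JSON body
        raise RuntimeError(f"Gateway returned an error: {body['error']}")
    choices = body.get("choices") or []
    if not choices:
        raise ValueError("Response contains no choices")
    # content can be None (for example on tool calls), so fall back to ""
    return choices[0]["message"]["content"] or ""
```

With the example above, extract_reply(response.json()) would return the answer text instead of the full JSON body.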

Streaming Chat

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_GATEWAYZ_API_KEY",
    base_url="https://your-gateway.com/v1"
)

stream = client.chat.completions.create(
    model="gemini-1.5-flash",
    messages=[{"role": "user", "content": "Write a poem"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Image Generation

import base64
from pathlib import Path

import requests

url = "https://your-gateway.com/v1/images/generations"
headers = {
    "Authorization": "Bearer YOUR_GATEWAYZ_API_KEY",
    "Content-Type": "application/json"
}

payload = {
    "prompt": "A serene mountain landscape at sunset",
    "model": "stable-diffusion-1.5",
    "size": "512x512",
    "n": 1,
    "provider": "google-vertex",
    "google_project_id": "your-project-id",
    "google_location": "us-central1",
    "google_endpoint_id": "6072619212881264640"
}

response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()  # surface HTTP errors before parsing the body
result = response.json()

# Save the generated image if the response carries base64 data
images = result.get("data") or []
if images and images[0].get("b64_json"):
    image_data = base64.b64decode(images[0]["b64_json"])
    Path("generated_image.png").write_bytes(image_data)

Multimodal (Vision)

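import requests

# Vision requests go to the chat completions endpoint, not the images endpoint
url = "https://your-gateway.com/v1/chat/completions"
headers = {
    "Authorization": "Bearer YOUR_GATEWAYZ_API_KEY",
    "Content-Type": "application/json"
}

payload = {
    "model": "gemini-1.5-pro",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/image.jpg"
                    }
                }
            ]
        }
    ]
}

response = requests.post(url, headers=headers, json=payload)
print(response.json())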

Transport Modes

REST Transport (Default)

Uses Google Vertex AI REST API directly:

GOOGLE_VERTEX_TRANSPORT=rest

Pros:

  • Works everywhere (Vercel, Railway, serverless)
  • No C++ runtime dependencies
  • Lightweight, fast startup

Cons:

  • Slightly higher latency (~10-50ms)

SDK Transport

Uses Google Cloud AI Platform SDK:

GOOGLE_VERTEX_TRANSPORT=sdk

Pros:

  • Native Google SDK features
  • Automatic retries and error handling

Cons:

  • Requires libstdc++.so.6 (may fail on some serverless platforms)
  • Heavier memory footprint

Auto Transport (Recommended)

Tries SDK first, falls back to REST:

GOOGLE_VERTEX_TRANSPORT=auto
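
The fallback behavior can be sketched as an import probe: if the SDK module loads, use it; otherwise fall back to REST. This is an illustration of the idea only. The function name resolve_transport and the probed module path are assumptions, and the real Gatewayz check may differ.

```python
import importlib

VALID_MODES = {"sdk", "rest", "auto"}


def resolve_transport(mode: str = "auto",
                      module_name: str = "google.cloud.aiplatform") -> str:
    """Resolve 'auto' to 'sdk' or 'rest' by probing whether the SDK imports."""
    mode = mode.lower()
    if mode not in VALID_MODES:
        raise ValueError(f"Unsupported transport mode: {mode!r}")
    if mode != "auto":
        return mode  # explicit choice, no probing
    try:
        importlib.import_module(module_name)
    except (ImportError, OSError):
        # ImportError covers a missing package or a missing shared library
        # such as libstdc++.so.6; OSError covers loader-level failures.
        return "rest"
    return "sdk"
```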

Configuration

Environment Variables

Variable                        Required  Default      Description
GOOGLE_PROJECT_ID               Yes       -            GCP project ID
GOOGLE_VERTEX_LOCATION          Yes       us-central1  GCP region
GOOGLE_APPLICATION_CREDENTIALS  No*       -            Path to service account JSON
GOOGLE_VERTEX_CREDENTIALS_JSON  No*       -            Raw service account JSON
GOOGLE_VERTEX_TRANSPORT         No        rest         Transport mode (sdk/rest/auto)
GOOGLE_VERTEX_TIMEOUT           No        120          Request timeout (seconds)
GOOGLE_VERTEX_ENDPOINT_ID       No**      -            Custom endpoint ID for image generation

*  At least one authentication method required
** Required only for image generation
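
As an illustration of how these variables and defaults fit together, here is a minimal loader sketch. The VertexSettings class and load_vertex_settings function are hypothetical names; only the variable names and default values come from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class VertexSettings:
    project_id: str
    location: str = "us-central1"
    transport: str = "rest"
    timeout_seconds: int = 120
    endpoint_id: Optional[str] = None  # only needed for image generation


def load_vertex_settings(env: Optional[Mapping[str, str]] = None) -> VertexSettings:
    """Read the documented variables, applying the documented defaults."""
    env = os.environ if env is None else env
    project_id = env.get("GOOGLE_PROJECT_ID")
    if not project_id:
        raise ValueError("GOOGLE_PROJECT_ID is required")
    transport = env.get("GOOGLE_VERTEX_TRANSPORT", "rest").lower()
    if transport not in {"sdk", "rest", "auto"}:
        raise ValueError(f"Unsupported transport: {transport!r}")
    return VertexSettings(
        project_id=project_id,
        location=env.get("GOOGLE_VERTEX_LOCATION", "us-central1"),
        transport=transport,
        timeout_seconds=int(env.get("GOOGLE_VERTEX_TIMEOUT", "120")),
        endpoint_id=env.get("GOOGLE_VERTEX_ENDPOINT_ID") or None,
    )
```

Failing fast on a missing project ID or an unknown transport value makes misconfiguration visible at startup rather than on the first request.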

Pricing

Vertex AI charges are separate from Gatewayz credits:

Model                 Prompt ($/1M tokens)  Completion ($/1M tokens)
gemini-2.0-flash      $0.075                $0.30
gemini-1.5-pro        $1.25                 $5.00
gemini-1.5-flash      $0.075                $0.30
stable-diffusion-1.5  -                     100 tokens/image

Note: Google bills separately through Google Cloud billing.
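
For a rough per-request estimate, the chat rates above can be applied directly. This is a back-of-the-envelope sketch using only the figures listed on this page; actual Google Cloud billing may differ, and the function name is illustrative.

```python
# Rates copied from the pricing table above: (prompt, completion) in USD per 1M tokens
RATES_USD_PER_MILLION = {
    "gemini-2.0-flash": (0.075, 0.30),
    "gemini-1.5-pro": (1.25, 5.00),
    "gemini-1.5-flash": (0.075, 0.30),
}


def estimate_cost_usd(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the cost of one request from its token counts."""
    prompt_rate, completion_rate = RATES_USD_PER_MILLION[model]
    return (prompt_tokens * prompt_rate + completion_tokens * completion_rate) / 1_000_000
```

For example, 10,000 prompt tokens and 1,000 completion tokens on gemini-1.5-pro come to (10,000 x 1.25 + 1,000 x 5.00) / 1,000,000 = $0.0175.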


Troubleshooting

Authentication Errors

Error: Authentication failed or Permission denied

Solution:

  1. Verify the service account has the roles/aiplatform.user role:

    gcloud projects get-iam-policy YOUR_PROJECT_ID \
      --flatten="bindings[].members" \
      --filter="bindings.members:serviceAccount:gatewayz-vertex@*"
  2. Check credentials JSON is valid:

    cat service-account.json | python3 -m json.tool
  3. Test authentication:

    gcloud auth activate-service-account \
      --key-file=service-account.json
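
When credentials are supplied as raw JSON rather than a file, the same validity check can be done in Python. The sketch below looks only for the four fields shown in the GOOGLE_VERTEX_CREDENTIALS_JSON example earlier on this page; the function name key_problems is illustrative.

```python
import json
from typing import List

# Fields shown in the GOOGLE_VERTEX_CREDENTIALS_JSON example
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def key_problems(raw: str) -> List[str]:
    """Return a list of problems found in a service account key; empty means OK."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not data.get(name)]
    if data.get("type") and data["type"] != "service_account":
        problems.append(f"unexpected type: {data['type']}")
    return problems
```

A typical use is key_problems(os.environ["GOOGLE_VERTEX_CREDENTIALS_JSON"]) at startup, logging any returned problems before the first request is made.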

Endpoint Not Found

Error: Endpoint not found or Invalid endpoint ID

Solution:

  1. List deployed endpoints:

    gcloud ai endpoints list \
      --project=YOUR_PROJECT_ID \
      --region=us-central1
  2. Verify endpoint status is "DEPLOYED"

  3. Check endpoint ID matches configuration

SDK Import Errors

Error: ImportError: libstdc++.so.6: cannot open shared object file

Solution: Switch to REST transport:

export GOOGLE_VERTEX_TRANSPORT=rest

Or install required libraries:

# Debian/Ubuntu
sudo apt-get install libstdc++6

# Alpine (Docker)
apk add --no-cache libstdc++

Timeout Errors

Error: Request times out after 120 seconds

Solution:

  1. Increase timeout:

    export GOOGLE_VERTEX_TIMEOUT=300
  2. Use a faster model (gemini-2.0-flash instead of gemini-1.5-pro)

  3. Reduce max_tokens in the request

Invalid Instance Format (Image Generation)

Error: Invalid instance format or Prediction failed

Solution:

  1. Verify endpoint accepts the expected input format
  2. Check model deployment configuration in Vertex AI console
  3. Review endpoint prediction schema:
    gcloud ai endpoints describe YOUR_ENDPOINT_ID \
      --project=YOUR_PROJECT_ID \
      --region=us-central1

Advanced Configuration

Custom Model Parameters

payload = {
    "model": "gemini-1.5-pro",
    "messages": [...],
    "temperature": 0.9,
    "top_p": 0.95,
    "top_k": 40,
    "max_tokens": 2048,
    "safety_settings": [
        {
            "category": "HARM_CATEGORY_HATE_SPEECH",
            "threshold": "BLOCK_MEDIUM_AND_ABOVE"
        }
    ]
}

Function Calling

payload = {
    "model": "gemini-1.5-pro",
    "messages": [{"role": "user", "content": "What's the weather in NYC?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get weather for a location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string"}
                    },
                    "required": ["location"]
                }
            }
        }
    ]
}

Note: Function calling format transformation is in progress. Some features may not work yet.

Multiple Images (Batch Generation)

payload = {
    "prompt": "Various mountain landscapes",
    "provider": "google-vertex",
    "n": 4,
    "size": "512x512"
}

Cost: 100 tokens per image (400 tokens for 4 images)
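
A batch response can be written to disk one file per image. This sketch assumes each entry in the response's data list carries a b64_json field, as in the single-image example earlier; the helper names and file naming scheme are illustrative.

```python
import base64
from pathlib import Path
from typing import Any, Dict, List

TOKENS_PER_IMAGE = 100  # from the cost note above


def save_images(result: Dict[str, Any], out_dir: Path, stem: str = "image") -> List[Path]:
    """Decode every b64_json entry in the response and write it as a PNG file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    for index, item in enumerate(result.get("data", []), start=1):
        b64 = item.get("b64_json")
        if not b64:
            continue  # skip entries without inline image data
        path = out_dir / f"{stem}_{index}.png"
        path.write_bytes(base64.b64decode(b64))
        paths.append(path)
    return paths


def batch_cost_tokens(n: int) -> int:
    """Token cost of generating n images at the flat per-image rate."""
    return n * TOKENS_PER_IMAGE
```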


Deployment

Railway

# Set environment variables in Railway dashboard
GOOGLE_PROJECT_ID=your-project-id
GOOGLE_VERTEX_LOCATION=us-central1
GOOGLE_VERTEX_CREDENTIALS_JSON={"type":"service_account",...}
GOOGLE_VERTEX_TRANSPORT=rest

Vercel

# Add to Vercel environment variables
GOOGLE_PROJECT_ID=your-project-id
GOOGLE_VERTEX_LOCATION=us-central1
GOOGLE_VERTEX_CREDENTIALS_JSON={"type":"service_account",...}
GOOGLE_VERTEX_TRANSPORT=rest

Docker

# Dockerfile
ENV GOOGLE_PROJECT_ID=your-project-id
ENV GOOGLE_VERTEX_LOCATION=us-central1
ENV GOOGLE_VERTEX_TRANSPORT=rest

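# Copy service account key (local testing only; for shared or production images,
# mount the key or inject GOOGLE_VERTEX_CREDENTIALS_JSON at runtime instead,
# so the key is not baked into an image layer)
COPY service-account.json /app/
ENV GOOGLE_APPLICATION_CREDENTIALS=/app/service-account.json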

Performance Optimization

Connection Pooling

The Vertex AI client maintains internal connection pools. No additional configuration needed.

Token Caching

Access tokens are cached for 59 minutes to reduce OAuth2 overhead:

# Automatic caching - no configuration needed
# Tokens refreshed automatically when expired
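
To make the caching behavior concrete, here is a minimal time-based cache sketch. It is not the Gatewayz implementation; the TokenCache class and its fetch callback are hypothetical, and only the 59-minute lifetime comes from the text above.

```python
import time
from typing import Callable, Optional

TOKEN_TTL_SECONDS = 59 * 60  # cache lifetime stated above


class TokenCache:
    """Cache a token and refetch it once the TTL has elapsed."""

    def __init__(self, fetch: Callable[[], str],
                 ttl: float = TOKEN_TTL_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch      # callable that obtains a fresh access token
        self._ttl = ttl
        self._clock = clock      # injectable so the cache can be tested without waiting
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            self._token = self._fetch()
            self._expires_at = now + self._ttl
        return self._token
```

Caching for slightly less than an hour leaves a margin before a typical one-hour OAuth2 access token expires.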

Regional Endpoints

Use the closest region for lower latency:

# US
GOOGLE_VERTEX_LOCATION=us-central1

# Europe
GOOGLE_VERTEX_LOCATION=europe-west1

# Asia
GOOGLE_VERTEX_LOCATION=asia-northeast1
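
The location value ends up in the hostname of the Vertex AI REST endpoint, which is why the region choice affects latency. The sketch below builds the generateContent URL for a Gemini model in a regional location; whether Gatewayz constructs its URLs exactly this way is an assumption, and the function name is illustrative.

```python
def vertex_generate_url(project_id: str, location: str, model: str) -> str:
    """Build the regional Vertex AI generateContent URL for a Google-published model."""
    # Regional locations use a "<location>-aiplatform.googleapis.com" host
    host = f"{location}-aiplatform.googleapis.com"
    return (
        f"https://{host}/v1/projects/{project_id}/locations/{location}"
        f"/publishers/google/models/{model}:generateContent"
    )
```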

Monitoring

Check Provider Status

from src.services.google_vertex_client import diagnose_google_vertex_credentials

status = diagnose_google_vertex_credentials()
print(status)

Output:

{
  "credentials_available": true,
  "credential_source": "env_json",
  "project_id": "your-project-id",
  "location": "us-central1",
  "initialization_successful": true,
  "health_status": "healthy",
  "error": null,
  "steps": [...]
}
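
The diagnostic dictionary can be reduced to a single readiness flag, for example in a startup check. This sketch relies only on the fields shown in the sample output above; the function name vertex_ready is hypothetical.

```python
from typing import Any, Dict


def vertex_ready(status: Dict[str, Any]) -> bool:
    """Return True only if credentials, initialization, and health all look good."""
    return bool(
        status.get("credentials_available")
        and status.get("initialization_successful")
        and status.get("health_status") == "healthy"
        and not status.get("error")
    )
```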

Logs

Check logs for authentication and request details:

# Look for initialization messages
✓ Successfully initialized Vertex AI for project: your-project-id

# Request logs
Making Google Vertex request for model: gemini-2.0-flash
Using model name: gemini-2.0-flash

Security Best Practices

  1. Never commit service account keys to version control
  2. Use environment variables for credentials
  3. Rotate service account keys regularly (every 90 days)
  4. Grant minimum permissions (roles/aiplatform.user only)
  5. Use separate service accounts for dev/staging/prod
  6. Enable audit logging for Vertex AI API calls
  7. Monitor usage in Google Cloud Console

Migration from SDK to ADC

The previous implementation used the Google SDK with manual credential loading. The new implementation uses Application Default Credentials (ADC) for better serverless compatibility.

Old:

from google.auth.transport.requests import Request

# Manual credential loading and refresh
credentials = get_google_vertex_credentials()
credentials.refresh(Request())

New:

# ADC automatically discovers credentials
initialize_vertex_ai()  # No explicit credentials needed

See: docs/integrations/GOOGLE_VERTEX_MIGRATION.md for full migration guide.



Last Updated: December 2024

Status: Production Ready
