Google Vertex Setup
Enterprise-grade AI with Google Cloud Vertex AI integration
Gatewayz integrates with Google Vertex AI to provide access to:
- Gemini Models - Chat completions with advanced capabilities
- Image Generation - Stable Diffusion v1.5 on Vertex AI endpoints
- Multimodal Support - Text, image, audio, video processing
- Enterprise Features - High availability, security, compliance
Authentication: Application Default Credentials (ADC) with OAuth2 JWT fallback
- Google Cloud Project with Vertex AI API enabled
- Service Account with roles/aiplatform.user permission
- Service Account Key (JSON format)
- Python 3.10+ with required dependencies

pip install -r requirements.txt

This installs:
- google-cloud-aiplatform>=1.38.0
- google-auth>=2.0

# Enable Vertex AI API
gcloud services enable aiplatform.googleapis.com \
--project=YOUR_PROJECT_ID
# Verify API is enabled
gcloud services list --enabled --filter="NAME:aiplatform"

# Create service account
gcloud iam service-accounts create gatewayz-vertex \
--display-name="Gatewayz Vertex AI Service Account" \
--project=YOUR_PROJECT_ID
# Grant Vertex AI User role
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
--member="serviceAccount:gatewayz-vertex@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/aiplatform.user"
# Create and download key
gcloud iam service-accounts keys create gatewayz-vertex-key.json \
--iam-account=gatewayz-vertex@YOUR_PROJECT_ID.iam.gserviceaccount.com

Add to your .env file:
# Required
GOOGLE_PROJECT_ID=your-project-id
GOOGLE_VERTEX_LOCATION=us-central1
# Authentication (choose one method)
# Option 1: File path (recommended for local development)
GOOGLE_APPLICATION_CREDENTIALS=/path/to/gatewayz-vertex-key.json
# Option 2: Raw JSON (recommended for serverless/Railway/Vercel)
GOOGLE_VERTEX_CREDENTIALS_JSON='{"type":"service_account","project_id":"...","private_key":"...","client_email":"..."}'
# Optional: Transport mode (sdk, rest, auto)
GOOGLE_VERTEX_TRANSPORT=rest

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
export GOOGLE_PROJECT_ID="your-project-id"
export GOOGLE_VERTEX_LOCATION="us-central1"

# Set raw JSON credentials
export GOOGLE_VERTEX_CREDENTIALS_JSON='{"type":"service_account",...}'
export GOOGLE_PROJECT_ID="your-project-id"
export GOOGLE_VERTEX_LOCATION="us-central1"

For Railway/Vercel: Paste the entire JSON content as a single-line environment variable.
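
A downloaded key file is usually pretty-printed across many lines, so it has to be flattened before it can be pasted as a single-line variable. The sketch below shows one way to do that with the standard library. The helper name `to_single_line_json` is hypothetical and not part of Gatewayz.

```python
import json


def to_single_line_json(key_json_text: str) -> str:
    """Re-serialize a service account key as compact, single-line JSON.

    Hypothetical helper for illustration. Line breaks inside the private
    key remain escaped, so the result contains no raw newline characters.
    """
    parsed = json.loads(key_json_text)  # raises ValueError if the text is not valid JSON
    return json.dumps(parsed, separators=(",", ":"))


# Example usage with a key file on disk:
#   from pathlib import Path
#   print(to_single_line_json(Path("gatewayz-vertex-key.json").read_text()))
```

The printed value can then be pasted into the GOOGLE_VERTEX_CREDENTIALS_JSON field of the platform's environment settings.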
# Authenticate with gcloud
gcloud auth application-default login
# Or use workload identity on GKE/GCE
# No additional configuration needed

| Model | Context | Features |
|---|---|---|
| gemini-2.0-flash | 1M tokens | Latest, fastest, multimodal |
| gemini-1.5-pro | 2M tokens | Advanced reasoning |
| gemini-1.5-flash | 1M tokens | Fast, cost-effective |
| gemini-2.5-flash-lite | 128K tokens | Lightweight, ultra-fast |

| Model | Size | Backend |
|---|---|---|
| stable-diffusion-1.5 | 512x512, 768x768 | Custom Vertex endpoint |

import requests
url = "https://your-gateway.com/v1/chat/completions"
headers = {
"Authorization": "Bearer YOUR_GATEWAYZ_API_KEY",
"Content-Type": "application/json"
}
payload = {
"model": "gemini-2.0-flash",
"messages": [
{"role": "system", "content": "You are a helpful assistant"},
{"role": "user", "content": "Explain quantum computing"}
],
"temperature": 0.7,
"max_tokens": 1000
}
response = requests.post(url, headers=headers, json=payload)
print(response.json())

from openai import OpenAI
client = OpenAI(
api_key="YOUR_GATEWAYZ_API_KEY",
base_url="https://your-gateway.com/v1"
)
stream = client.chat.completions.create(
model="gemini-1.5-flash",
messages=[{"role": "user", "content": "Write a poem"}],
stream=True
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

url = "https://your-gateway.com/v1/images/generations"
headers = {
"Authorization": "Bearer YOUR_GATEWAYZ_API_KEY",
"Content-Type": "application/json"
}
payload = {
"prompt": "A serene mountain landscape at sunset",
"model": "stable-diffusion-1.5",
"size": "512x512",
"n": 1,
"provider": "google-vertex",
"google_project_id": "your-project-id",
"google_location": "us-central1",
"google_endpoint_id": "6072619212881264640"
}
response = requests.post(url, headers=headers, json=payload)
result = response.json()
# Save generated image
import base64
from pathlib import Path
first_image = result['data'][0].get('b64_json')
if first_image:
    image_data = base64.b64decode(first_image)
    Path("generated_image.png").write_bytes(image_data)

payload = {
"model": "gemini-1.5-pro",
"messages": [
{
"role": "user",
"content": [
{"type": "text", "text": "What's in this image?"},
{
"type": "image_url",
"image_url": {
"url": "https://example.com/image.jpg"
}
}
]
}
]
}
response = requests.post(url, headers=headers, json=payload)

Uses Google Vertex AI REST API directly:
GOOGLE_VERTEX_TRANSPORT=rest

Pros:
- Works everywhere (Vercel, Railway, serverless)
- No C++ runtime dependencies
- Lightweight, fast startup
Cons:
- Slightly higher latency (~10-50ms)
Uses Google Cloud AI Platform SDK:
GOOGLE_VERTEX_TRANSPORT=sdk

Pros:
- Native Google SDK features
- Automatic retries and error handling
Cons:
- Requires libstdc++.so.6 (may fail on some serverless platforms)
- Heavier memory footprint
Tries SDK first, falls back to REST:
GOOGLE_VERTEX_TRANSPORT=auto

| Variable | Required | Default | Description |
|---|---|---|---|
| GOOGLE_PROJECT_ID | Yes | - | GCP project ID |
| GOOGLE_VERTEX_LOCATION | Yes | us-central1 | GCP region |
| GOOGLE_APPLICATION_CREDENTIALS | No* | - | Path to service account JSON |
| GOOGLE_VERTEX_CREDENTIALS_JSON | No* | - | Raw service account JSON |
| GOOGLE_VERTEX_TRANSPORT | No | rest | Transport mode (sdk/rest/auto) |
| GOOGLE_VERTEX_TIMEOUT | No | 120 | Request timeout (seconds) |
| GOOGLE_VERTEX_ENDPOINT_ID | No** | - | Custom endpoint ID for image gen |

\* At least one authentication method required
\*\* Required only for image generation
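
To make the defaults and requirements in the table concrete, here is a minimal sketch of how an application might read these variables. It is not the Gatewayz implementation: `VertexSettings`, `load_vertex_settings`, and `resolve_transport` are hypothetical names, the precedence of raw JSON over the file path is an assumption, and so is the "file" label (only "env_json" appears in this guide's diagnostic output). The authentication check follows the table footnote; a deployment relying on gcloud ADC or workload identity would need to relax it.

```python
import importlib
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class VertexSettings:
    project_id: str
    location: str
    transport: str
    timeout_seconds: int
    endpoint_id: Optional[str]
    credential_source: str  # "env_json" or "file" (the "file" label is an assumption)


def load_vertex_settings(env: Mapping[str, str] = os.environ) -> VertexSettings:
    """Read the documented variables, applying the defaults from the table."""
    project_id = env.get("GOOGLE_PROJECT_ID")
    if not project_id:
        raise ValueError("GOOGLE_PROJECT_ID is required")

    # Footnote *: at least one authentication variable must be set.
    # Preferring the raw JSON over the file path is an assumption.
    if env.get("GOOGLE_VERTEX_CREDENTIALS_JSON"):
        credential_source = "env_json"
    elif env.get("GOOGLE_APPLICATION_CREDENTIALS"):
        credential_source = "file"
    else:
        raise ValueError("set GOOGLE_APPLICATION_CREDENTIALS or GOOGLE_VERTEX_CREDENTIALS_JSON")

    transport = env.get("GOOGLE_VERTEX_TRANSPORT", "rest").lower()
    if transport not in {"sdk", "rest", "auto"}:
        raise ValueError(f"unknown transport: {transport}")

    return VertexSettings(
        project_id=project_id,
        location=env.get("GOOGLE_VERTEX_LOCATION", "us-central1"),
        transport=transport,
        timeout_seconds=int(env.get("GOOGLE_VERTEX_TIMEOUT", "120")),
        endpoint_id=env.get("GOOGLE_VERTEX_ENDPOINT_ID") or None,
        credential_source=credential_source,
    )


def resolve_transport(requested: str) -> str:
    """Sketch of 'auto': use the SDK if it imports cleanly, otherwise REST."""
    if requested != "auto":
        return requested
    try:
        importlib.import_module("google.cloud.aiplatform")
        return "sdk"
    except (ImportError, OSError):  # e.g. a missing libstdc++.so.6
        return "rest"
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests without touching the real environment.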
Vertex AI charges are separate from Gatewayz credits:
| Model | Prompt ($/1M tokens) | Completion ($/1M tokens) |
|---|---|---|
| gemini-2.0-flash | $0.075 | $0.30 |
| gemini-1.5-pro | $1.25 | $5.00 |
| gemini-1.5-flash | $0.075 | $0.30 |
| stable-diffusion-1.5 | - | 100 tokens/image |
Note: Google bills separately through Google Cloud billing.
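
As a worked example of the per-token rates, the following sketch estimates the Google-side charge for a single request. The prices are copied from the table above; `estimate_cost_usd` is a hypothetical helper, and actual billing may differ from this simple linear estimate.

```python
# USD per 1M tokens as (prompt, completion), copied from the pricing table.
PRICES_PER_MILLION = {
    "gemini-2.0-flash": (0.075, 0.30),
    "gemini-1.5-pro": (1.25, 5.00),
    "gemini-1.5-flash": (0.075, 0.30),
}


def estimate_cost_usd(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Linear estimate: tokens times the per-million rate, prompt plus completion."""
    prompt_price, completion_price = PRICES_PER_MILLION[model]
    total = prompt_tokens * prompt_price + completion_tokens * completion_price
    return total / 1_000_000
```

For instance, a gemini-1.5-pro request with 10,000 prompt tokens and 2,000 completion tokens comes to roughly $0.0225 under these rates.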
Error: Authentication failed or Permission denied
Solution:
- Verify service account has roles/aiplatform.user role:
  gcloud projects get-iam-policy YOUR_PROJECT_ID \
  --flatten="bindings[].members" \
  --filter="bindings.members:serviceAccount:gatewayz-vertex@*"
- Check credentials JSON is valid:
  cat service-account.json | python3 -m json.tool
- Test authentication:
  gcloud auth activate-service-account \
  --key-file=service-account.json
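
The JSON validity check above only confirms the syntax. A slightly stricter check, sketched below, also looks for the fields shown in the .env example earlier in this guide. `check_service_account_json` is a hypothetical helper and the field list is limited to those four keys; it does not verify that the key is actually accepted by Google.

```python
import json

# Fields that appear in the .env example of this guide.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_json(raw: str) -> list[str]:
    """Return a list of problems found; an empty list means the basic checks passed."""
    try:
        info = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(info, dict):
        return ["top-level value is not a JSON object"]

    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not info.get(name)]
    if info.get("type") and info["type"] != "service_account":
        problems.append(f"unexpected type: {info['type']}")
    return problems
```

Running it against the value of GOOGLE_VERTEX_CREDENTIALS_JSON, or the contents of the key file, gives a quick first signal before testing authentication with gcloud.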
Error: Endpoint not found or Invalid endpoint ID
Solution:
- List deployed endpoints:
  gcloud ai endpoints list \
  --project=YOUR_PROJECT_ID \
  --region=us-central1
- Verify endpoint status is "DEPLOYED"
- Check endpoint ID matches configuration
Error: ImportError: libstdc++.so.6: cannot open shared object file
Solution: Switch to REST transport:
export GOOGLE_VERTEX_TRANSPORT=rest

Or install required libraries:
# Debian/Ubuntu
sudo apt-get install libstdc++6
# Alpine (Docker)
apk add --no-cache libstdc++

Error: Request times out after 120 seconds
Solution:
- Increase timeout:
  export GOOGLE_VERTEX_TIMEOUT=300
- Use faster model (gemini-2.0-flash instead of gemini-1.5-pro)
- Reduce max_tokens in request
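
The "use a faster model" remedy can also be applied on the client side. The sketch below retries the same payload with the next model in a list whenever a call times out. It is an illustration only: `complete_with_fallback` and the `send` callable are hypothetical, and `send` is assumed to raise the built-in `TimeoutError`, so a wrapper around an HTTP client would need to translate that client's own timeout exception.

```python
from typing import Callable, Optional, Sequence


def complete_with_fallback(
    send: Callable[[dict], dict],
    payload: dict,
    models: Sequence[str],
) -> dict:
    """Try each model in order, moving to the next one only after a timeout."""
    if not models:
        raise ValueError("at least one model is required")

    last_error: Optional[TimeoutError] = None
    for model in models:
        try:
            # Copy the payload so the caller's dictionary is never mutated.
            return send({**payload, "model": model})
        except TimeoutError as exc:
            last_error = exc
    raise TimeoutError(f"all models timed out: {list(models)}") from last_error
```

A call such as `complete_with_fallback(send, payload, ["gemini-1.5-pro", "gemini-2.0-flash"])` would only reach the second model if the first attempt timed out.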
Error: Invalid instance format or Prediction failed
Solution:
- Verify endpoint accepts the expected input format
- Check model deployment configuration in Vertex AI console
- Review endpoint prediction schema:
  gcloud ai endpoints describe YOUR_ENDPOINT_ID \
  --project=YOUR_PROJECT_ID \
  --region=us-central1

payload = {
"model": "gemini-1.5-pro",
"messages": [...],
"temperature": 0.9,
"top_p": 0.95,
"top_k": 40,
"max_tokens": 2048,
"safety_settings": [
{
"category": "HARM_CATEGORY_HATE_SPEECH",
"threshold": "BLOCK_MEDIUM_AND_ABOVE"
}
]
}

payload = {
"model": "gemini-1.5-pro",
"messages": [{"role": "user", "content": "What's the weather in NYC?"}],
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string"}
},
"required": ["location"]
}
}
}
]
}

Note: Function calling format transformation is in progress. Some features may not work yet.

payload = {
"prompt": "Various mountain landscapes",
"provider": "google-vertex",
"n": 4,
"size": "512x512"
}

Cost: 100 tokens per image (400 tokens for 4 images)
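
For a batch request, each returned image has to be decoded and saved separately. The sketch below assumes the batch response has the same shape as the single-image response shown earlier, with one `b64_json` entry per image under `data`; that shape is an assumption, and `save_images` is a hypothetical helper.

```python
import base64
from pathlib import Path


def save_images(result: dict, out_dir: str = ".") -> list[Path]:
    """Decode every base64 image in the response and write it as a PNG file."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)

    saved = []
    for index, item in enumerate(result.get("data", [])):
        encoded = item.get("b64_json")
        if not encoded:
            continue  # skip entries that carry no inline image data
        path = target / f"generated_image_{index}.png"
        path.write_bytes(base64.b64decode(encoded))
        saved.append(path)
    return saved
```

With `n` set to 4, a call like `save_images(response.json(), "out")` would be expected to write up to four files into the `out` directory.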
# Set environment variables in Railway dashboard
GOOGLE_PROJECT_ID=your-project-id
GOOGLE_VERTEX_LOCATION=us-central1
GOOGLE_VERTEX_CREDENTIALS_JSON={"type":"service_account",...}
GOOGLE_VERTEX_TRANSPORT=rest

# Add to Vercel environment variables
GOOGLE_PROJECT_ID=your-project-id
GOOGLE_VERTEX_LOCATION=us-central1
GOOGLE_VERTEX_CREDENTIALS_JSON={"type":"service_account",...}
GOOGLE_VERTEX_TRANSPORT=rest

# Dockerfile
ENV GOOGLE_PROJECT_ID=your-project-id
ENV GOOGLE_VERTEX_LOCATION=us-central1
ENV GOOGLE_VERTEX_TRANSPORT=rest
# Copy service account key (avoid this in shared or published images; prefer mounting the key as a runtime secret)
COPY service-account.json /app/
ENV GOOGLE_APPLICATION_CREDENTIALS=/app/service-account.json

The Vertex AI client maintains internal connection pools. No additional configuration is needed.
Access tokens are cached for 59 minutes to reduce OAuth2 overhead:
# Automatic caching - no configuration needed
# Tokens refreshed automatically when expired

Use the closest region for lower latency:
# US
GOOGLE_VERTEX_LOCATION=us-central1
# Europe
GOOGLE_VERTEX_LOCATION=europe-west1
# Asia
GOOGLE_VERTEX_LOCATION=asia-northeast1

from src.services.google_vertex_client import diagnose_google_vertex_credentials
status = diagnose_google_vertex_credentials()
print(status)

Output:
{
"credentials_available": true,
"credential_source": "env_json",
"project_id": "your-project-id",
"location": "us-central1",
"initialization_successful": true,
"health_status": "healthy",
"error": null,
"steps": [...]
}

Check logs for authentication and request details:
# Look for initialization messages
✓ Successfully initialized Vertex AI for project: your-project-id
# Request logs
Making Google Vertex request for model: gemini-2.0-flash
Using model name: gemini-2.0-flash

- Never commit service account keys to version control
- Use environment variables for credentials
- Rotate service account keys regularly (every 90 days)
- Grant minimum permissions (roles/aiplatform.user only)
- Use separate service accounts for dev/staging/prod
- Enable audit logging for Vertex AI API calls
- Monitor usage in Google Cloud Console
The previous implementation used the Google SDK with manual credential loading. The new implementation uses Application Default Credentials (ADC) for better serverless compatibility.
Old:
credentials = get_google_vertex_credentials()
credentials.refresh(Request())

New:
# ADC automatically discovers credentials
initialize_vertex_ai()  # No explicit credentials needed

See: docs/integrations/GOOGLE_VERTEX_MIGRATION.md for full migration guide.

- Integration Guide - Add new providers
- Provider Failover - Automatic failover
- Model Health System - Monitor model status
- Google Vertex AI Documentation
- Gemini API Reference
- Google Cloud AI Platform Python Client
- Service Account Best Practices

Last Updated: December 2024
Status: Production Ready