# Google Cloud Platform (GCP) Tutorial

This tutorial demonstrates how to use Clustrix with Google Cloud Platform (GCP) infrastructure for scalable distributed computing.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ContextLab/clustrix/blob/master/docs/source/notebooks/gcp_cloud_tutorial.ipynb)

## Overview

GCP provides several services that integrate well with Clustrix:

- **Compute Engine**: Virtual machines for compute clusters
- **Google Kubernetes Engine (GKE)**: Managed Kubernetes clusters
- **Batch**: Managed job scheduling service
- **Cloud Run**: Serverless container platform
- **Vertex AI**: Machine learning platform
- **Cloud Storage**: Object storage for data and results
- **VPC**: Network isolation and security
- **Preemptible (Spot) VMs**: Lower-cost compute instances that GCP can reclaim at any time

## Complete Setup Guide from Scratch

### Step 1: Google Cloud Account Setup

1. **Create Google Cloud Account**:
   - Go to [Google Cloud Console](https://console.cloud.google.com/)
   - Sign up with your Google account or create a new one
   - Accept the terms of service

2. **Enable Billing**:
   - Navigate to Billing in the Google Cloud Console
   - Create a billing account and add a payment method
   - **Important**: New users get $300 in free credits
   - Set up billing alerts to avoid unexpected charges

3. **Create a New Project**:
   - Go to the Project Selector in the console
   - Click "New Project"
   - Choose a unique project ID (e.g., `my-clustrix-project-123`)
   - Enable billing for this project
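
Project IDs follow a fixed format (6 to 30 characters; lowercase letters, digits, and hyphens; starting with a letter and not ending with a hyphen). The sketch below checks a candidate ID against that format before you paste it into later cells. The helper name `is_valid_project_id` is hypothetical, and the check covers only the format, not global uniqueness or reserved words.

```python
import re

# Format rules: 6-30 characters, lowercase letters, digits and hyphens,
# starting with a letter and not ending with a hyphen.
_PROJECT_ID_RE = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def is_valid_project_id(project_id: str) -> bool:
    """Return True if project_id matches the format rules above."""
    return _PROJECT_ID_RE.fullmatch(project_id) is not None


print(is_valid_project_id("my-clustrix-project-123"))  # example ID from this tutorial
print(is_valid_project_id("My_Project"))               # uppercase and underscore are rejected
```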

### Step 2: Install Google Cloud SDK (gcloud CLI)

**On macOS:**
```bash
# Using Homebrew (recommended)
brew install google-cloud-sdk

# Or download installer
curl https://sdk.cloud.google.com | bash
exec -l $SHELL
```

**On Linux:**
```bash
# Download and install
curl https://sdk.cloud.google.com | bash
exec -l $SHELL

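# Or use the apt package (Ubuntu/Debian). This requires adding Google's
# package repository first, as described in the install docs linked below.
sudo apt-get update && sudo apt-get install google-cloud-cli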
```

**On Windows:**
- Download the installer from [Google Cloud SDK page](https://cloud.google.com/sdk/docs/install)
- Run the installer and follow instructions

### Step 3: Enable Required APIs

Enable the necessary Google Cloud APIs for this tutorial:

```bash
# Set your project ID
export PROJECT_ID="your-project-id-here"
gcloud config set project $PROJECT_ID

# Enable required APIs
gcloud services enable compute.googleapis.com
gcloud services enable container.googleapis.com
gcloud services enable batch.googleapis.com
gcloud services enable aiplatform.googleapis.com
gcloud services enable storage.googleapis.com
```
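
If you prefer to drive this step from the notebook, the same APIs can be enabled in one `gcloud services enable` call, which accepts several service names at once. The sketch below only builds the argument list; `build_enable_command` is a hypothetical helper, and running it assumes `gcloud` is installed and authenticated.

```python
import shlex

# The five APIs enabled in the shell commands above.
REQUIRED_APIS = [
    "compute.googleapis.com",
    "container.googleapis.com",
    "batch.googleapis.com",
    "aiplatform.googleapis.com",
    "storage.googleapis.com",
]


def build_enable_command(project_id, apis=REQUIRED_APIS):
    """Return the gcloud argument list that enables all APIs in one call."""
    return ["gcloud", "services", "enable", *apis, "--project", project_id]


cmd = build_enable_command("your-project-id-here")
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```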

## Prerequisites Checklist

Before proceeding, ensure you have:

- [ ] Google Cloud account with billing enabled
- [ ] Google Cloud project created
- [ ] Google Cloud SDK (gcloud) installed locally
- [ ] Required APIs enabled (compute, container, batch, storage, aiplatform)
- [ ] SSH key pair for VM access (we'll create this below)
- [ ] Basic understanding of command line usage

## Installation and Setup

The steps below create an SSH key for VM access and then install Clustrix with its GCP dependencies.

### Step 4: SSH Key Setup

Create SSH keys for secure access to your GCP instances:

```bash
# Generate SSH key pair (if you don't have one)
ssh-keygen -t rsa -b 4096 -C "your-email@example.com" -f ~/.ssh/gcp_key

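# Add the public key to your OS Login profile
gcloud compute os-login ssh-keys add --key-file="$HOME/.ssh/gcp_key.pub"

# Or add to project metadata (alternative method).
# Metadata entries use the "USERNAME:PUBLIC_KEY" format, and this command
# replaces any ssh-keys value already stored in the project metadata.
# The username clustrix matches the one used later in this tutorial.
echo "clustrix:$(cat "$HOME/.ssh/gcp_key.pub")" > /tmp/gcp_ssh_keys.txt
gcloud compute project-info add-metadata --metadata-from-file ssh-keys=/tmp/gcp_ssh_keys.txt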
```

**Note**: If you're using Google Cloud Shell, SSH keys are automatically managed.
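
For the metadata method, each entry in the `ssh-keys` value takes the form `USERNAME:PUBLIC_KEY`. A minimal sketch of building such an entry in Python follows; `format_ssh_keys_entry` is a hypothetical helper, and the key string in the example is a placeholder rather than a real key.

```python
def format_ssh_keys_entry(username: str, public_key: str) -> str:
    """Build one 'USERNAME:PUBLIC_KEY' line for the ssh-keys metadata value."""
    fields = public_key.split()
    # A public key line looks like: <key-type> <base64-key> [comment]
    if len(fields) < 2 or not fields[0].startswith(("ssh-", "ecdsa-")):
        raise ValueError("expected '<key-type> <base64-key> [comment]'")
    return f"{username}:{' '.join(fields)}"


# Placeholder key material for illustration only.
print(format_ssh_keys_entry("clustrix", "ssh-rsa AAAAEXAMPLE your-email@example.com"))
```

In practice you would read the key text from `~/.ssh/gcp_key.pub` and write the resulting line to the file passed to `--metadata-from-file`.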

In [None]:
# Install Clustrix with GCP support
!pip install clustrix google-cloud-compute google-cloud-storage google-auth google-auth-oauthlib

# Import required libraries
import clustrix
from clustrix import cluster, configure
from google.cloud import compute_v1
from google.cloud import storage
from google.auth import default
import os
import pickle  # used by the Cloud Storage helper functions later in this notebook
import numpy as np
import time
import json

## GCP Authentication Setup

Configure your GCP credentials. Choose the method that best fits your environment:

### Option 1: gcloud CLI Authentication (Recommended for Local Development)

This method uses your personal Google account credentials:

In [None]:
# Initial authentication and project setup
!gcloud auth login
!gcloud auth application-default login

# Set your project ID (replace with your actual project ID)
PROJECT_ID = "your-project-id-here"  # Replace this!
!gcloud config set project {PROJECT_ID}

# Verify authentication and project setup
!gcloud auth list
!gcloud config get-value project
!gcloud projects describe {PROJECT_ID}

### Option 2: Service Account Authentication (Recommended for Production)

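For production environments, use a service account with narrowly scoped permissions; the setup commands appear after the following cell. The cell itself checks that Application Default Credentials resolve, whichever option you chose: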

In [None]:
# Test GCP connection
try:
    credentials, project_id = default()
    print(f"✓ Successfully authenticated with project: {project_id}")
    
    # Create a Compute Engine client (construction alone makes no API call)
    compute_client = compute_v1.InstancesClient()
    print("✓ Compute Engine client initialized")
    
    # Create a Cloud Storage client
    storage_client = storage.Client()
    print("✓ Cloud Storage client initialized")
    
except Exception as e:
    print(f"❌ GCP authentication failed: {e}")
    print("Please check your authentication setup and try again.")

**Service Account Setup (Production Environments)**

The commands below create the service account, grant it roles, and download a key file:

```bash
# Create service account
gcloud iam service-accounts create clustrix-service-account \
  --description="Service account for Clustrix operations" \
  --display-name="Clustrix Service Account"

# Grant necessary permissions
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:clustrix-service-account@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/compute.admin"

gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:clustrix-service-account@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/storage.admin"

# Create and download service account key
gcloud iam service-accounts keys create ~/clustrix-service-account-key.json \
  --iam-account=clustrix-service-account@YOUR_PROJECT_ID.iam.gserviceaccount.com

# Point the environment variable at the key file created above
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/clustrix-service-account-key.json"
```

**Important**: Make sure you have completed authentication setup and enabled all required APIs before proceeding. 

If authentication fails, double-check that:
- Your project ID is correct
- Billing is enabled for your project  
- Required APIs are enabled
- Your credentials are properly configured
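
When debugging the last item in that list, it can help to see which local credential file the client libraries are likely to pick up. The sketch below checks the `GOOGLE_APPLICATION_CREDENTIALS` variable first and then the file written by `gcloud auth application-default login`. The function name is hypothetical, and the path is the Linux/macOS default (an assumption; Windows and a custom `CLOUDSDK_CONFIG` use other locations).

```python
import os
from pathlib import Path


def describe_credential_source(env=None, home=None):
    """Report which local credential file, if any, is likely to be used."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    # An explicit credentials file takes precedence.
    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if key_path:
        status = "found" if Path(key_path).is_file() else "MISSING"
        return f"credentials file from environment ({status}): {key_path}"

    # Default location on Linux/macOS for application-default credentials.
    adc = home / ".config" / "gcloud" / "application_default_credentials.json"
    if adc.is_file():
        return f"gcloud application-default credentials: {adc}"

    return "no local credential file found"


print(describe_credential_source())
```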

## Method 1: Google Compute Engine Configuration

### Create Compute Engine Instance for Clustrix

In [None]:
def create_clustrix_compute_instance(project_id, zone='us-central1-a', machine_type='e2-standard-4'):
    """
    Create a GCP Compute Engine instance configured for Clustrix.
    
    Args:
        project_id: GCP project ID
        zone: GCP zone for the instance
        machine_type: Machine type (CPU/memory configuration)
    
    Returns:
        Instance configuration and gcloud commands
    """
    
    # Startup script for instance initialization
    startup_script = '''#!/bin/bash

# Update system
apt-get update
apt-get install -y python3 python3-pip git htop curl

# Install clustrix and common packages
pip3 install clustrix numpy scipy pandas scikit-learn matplotlib

# Install uv for faster package management
curl -LsSf https://astral.sh/uv/install.sh | sh
source ~/.cargo/env

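# Create clustrix user with passwordless sudo
useradd -m -s /bin/bash clustrix
usermod -aG sudo clustrix
echo "clustrix ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/clustrix
chmod 440 /etc/sudoers.d/clustrix

# Setup SSH for clustrix user
mkdir -p /home/clustrix/.ssh
# Copy SSH keys from the default user, if one can be determined.
# Startup scripts run as root without a login session, so logname may fail;
# in that case the copy is skipped and keys must be provisioned another way.
DEFAULT_USER="$(logname 2>/dev/null || true)"
if [ -n "$DEFAULT_USER" ] && [ -d "/home/$DEFAULT_USER/.ssh" ]; then
    cp -r /home/$DEFAULT_USER/.ssh/* /home/clustrix/.ssh/
    chown -R clustrix:clustrix /home/clustrix/.ssh
    chmod 700 /home/clustrix/.ssh
    chmod 600 /home/clustrix/.ssh/authorized_keys 2>/dev/null || true
fi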

# Create working directory
mkdir -p /tmp/clustrix
chown clustrix:clustrix /tmp/clustrix

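# Install Google Cloud SDK non-interactively.
# Do not call `exec -l $SHELL` here: it would replace the script's shell,
# and the remaining lines would never run.
curl https://sdk.cloud.google.com | bash -s -- --disable-prompts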

# Log completion
echo "Clustrix setup completed at $(date)" >> /var/log/clustrix-setup.log
'''
    
    # gcloud commands for instance creation
    gcloud_commands = rf"""
# Create firewall rule for SSH (if not exists)
gcloud compute firewall-rules create allow-ssh \
  --allow tcp:22 \
  --source-ranges 0.0.0.0/0 \
  --description "Allow SSH access" \
  --project {project_id} || echo "SSH rule already exists"

# Create the instance
gcloud compute instances create clustrix-instance \
  --project={project_id} \
  --zone={zone} \
  --machine-type={machine_type} \
  --network-interface=network-tier=PREMIUM,subnet=default \
  --maintenance-policy=MIGRATE \
  --provisioning-model=STANDARD \
  --service-account=default \
  --scopes=https://www.googleapis.com/auth/cloud-platform \
  --tags=clustrix,http-server,https-server \
  --create-disk=auto-delete=yes,boot=yes,device-name=clustrix-instance,image=projects/ubuntu-os-cloud/global/images/family/ubuntu-2204-lts,mode=rw,size=50,type=projects/{project_id}/zones/{zone}/diskTypes/pd-balanced \
  --no-shielded-secure-boot \
  --shielded-vtpm \
  --shielded-integrity-monitoring \
  --labels=purpose=clustrix,environment=tutorial \
  --reservation-affinity=any \
  --metadata-from-file startup-script=startup-script.sh

# Get the external IP
gcloud compute instances describe clustrix-instance \
  --project={project_id} \
  --zone={zone} \
  --format='get(networkInterfaces[0].accessConfigs[0].natIP)'

# SSH to the instance (after startup script completes)
gcloud compute ssh clustrix-instance \
  --project={project_id} \
  --zone={zone}
"""
    
    return {
        'project_id': project_id,
        'zone': zone,
        'machine_type': machine_type,
        'instance_name': 'clustrix-instance',
        'gcloud_commands': gcloud_commands,
        'startup_script': startup_script
    }

# Example usage - replace with your actual project ID
instance_config = create_clustrix_compute_instance(
    project_id=PROJECT_ID,  # Using the PROJECT_ID variable from above
    zone='us-central1-a',
    machine_type='e2-standard-4'  # 4 vCPUs, 16 GB RAM
)

# Display the configuration results
print("=== GCP Compute Engine Instance Configuration ===")
print(f"Project ID: {instance_config['project_id']}")
print(f"Zone: {instance_config['zone']}")
print(f"Machine Type: {instance_config['machine_type']}")
print(f"Instance Name: {instance_config['instance_name']}")
print("\n=== gcloud Commands ===")
print(instance_config['gcloud_commands'])
print("\n=== Next Steps ===")
print("1. Save the startup script to 'startup-script.sh'")
print("2. Execute the gcloud commands shown above")
print("3. Wait 3-5 minutes for instance initialization")
print("4. Get the external IP and configure Clustrix")

### GCP Compute Engine Instance Creation

The above code defines a function that prepares, but does not itself create, a GCP Compute Engine instance for Clustrix workloads. The function returns:

- **gcloud commands**: Complete CLI commands to create the instance
- **startup script**: Automated setup script that configures the instance

The configuration includes:
- Ubuntu 22.04 LTS base image
- Pre-installed Python packages and Clustrix
- Clustrix user account with sudo privileges  
- SSH key setup and working directories
- 50GB balanced persistent disk
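- An SSH firewall rule (open to all source addresses by default; restrict `--source-ranges` to your own IP where possible) and instance metadata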

**Next Steps**: 

1. **Save the startup script** to a file named `startup-script.sh` in your current directory
2. **Execute the gcloud commands** shown above to create your instance
3. **Wait for the instance to fully initialize** (startup script takes 3-5 minutes)
4. **Get the external IP** using the describe command shown above
5. **Test SSH access** to ensure the instance is ready for Clustrix
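
Step 1 above can be done from the notebook. The sketch below writes a startup script string to disk and makes sure the shebang ends up on the first line. `save_startup_script` is a hypothetical helper; in this tutorial you would pass it `instance_config['startup_script']`.

```python
from pathlib import Path


def save_startup_script(script: str, path="startup-script.sh") -> Path:
    """Write the script to disk with the shebang on the very first line."""
    text = script.lstrip()  # drop any leading blank lines before the shebang
    if not text.startswith("#!"):
        raise ValueError("startup script should begin with a shebang line")
    if not text.endswith("\n"):
        text += "\n"
    target = Path(path)
    target.write_text(text)
    return target


# Usage in this notebook (assumes instance_config from the cell above):
# save_startup_script(instance_config['startup_script'])
```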

### Configure Clustrix for Compute Engine

In [None]:
# Get the external IP of your created instance
# Replace with the actual external IP from your instance
INSTANCE_EXTERNAL_IP = "YOUR_INSTANCE_EXTERNAL_IP"  # Replace this!

# Configure Clustrix to use your Compute Engine instance
configure(
    cluster_type="ssh",
    cluster_host=INSTANCE_EXTERNAL_IP,
    username="clustrix",  # or your default user
    key_file="~/.ssh/gcp_key",  # path to your SSH private key
    remote_work_dir="/tmp/clustrix",
    package_manager="auto",  # Will use uv if available, pip otherwise
    default_cores=4,
    default_memory="8GB",
    default_time="01:00:00"
)

# Verify configuration
if INSTANCE_EXTERNAL_IP != "YOUR_INSTANCE_EXTERNAL_IP":
    print(f"✓ Clustrix configured for GCP Compute Engine")
    print(f"  Host: {INSTANCE_EXTERNAL_IP}")
    print(f"  SSH Key: ~/.ssh/gcp_key")
    print(f"  Remote Work Dir: /tmp/clustrix")
else:
    print("⚠️  Please replace INSTANCE_EXTERNAL_IP with your actual IP address")

**Important Configuration Notes**:

- Replace `YOUR_INSTANCE_EXTERNAL_IP` with the actual external IP address from your Compute Engine instance
- Use the SSH key path that corresponds to your setup (either `~/.ssh/gcp_key` if you created one following this tutorial, or `~/.ssh/google_compute_engine` for gcloud-generated keys)
- The `clustrix` user was created by the startup script with appropriate permissions
- If you encounter connection issues, ensure your firewall rules allow SSH access from your IP address
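
Before calling `configure(...)`, it can be useful to confirm that the SSH port is reachable at all. The following is a small sketch using only the standard library; `wait_for_port` is a hypothetical helper. An open port shows only that `sshd` is listening, not that the startup script has finished.

```python
import socket
import time


def wait_for_port(host, port=22, timeout=300.0, interval=5.0, connect_timeout=5.0):
    """Poll until a TCP connection to host:port succeeds or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=connect_timeout):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Usage (replace with your instance's external IP):
# if wait_for_port(INSTANCE_EXTERNAL_IP):
#     print("SSH port is reachable")
```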

### Example: Remote Computation on Compute Engine

In [None]:
# Example: GCP Data Analysis
@cluster(cores=2, memory="4GB")
def gcp_data_analysis(dataset_size=10000, analysis_type='regression'):
    """Perform data analysis on GCP Compute Engine."""
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
    from sklearn.metrics import mean_squared_error, accuracy_score
    from sklearn.datasets import make_regression, make_classification
    import time
    
    start_time = time.time()
    
    # Generate synthetic dataset
    if analysis_type == 'regression':
        X, y = make_regression(
            n_samples=dataset_size,
            n_features=20,
            noise=0.1,
            random_state=42
        )
        model = RandomForestRegressor(n_estimators=100, random_state=42, n_jobs=-1)
        metric_name = 'rmse'
    else:
        X, y = make_classification(
            n_samples=dataset_size,
            n_features=20,
            n_informative=10,  # required: n_classes * n_clusters_per_class <= 2**n_informative
            n_classes=3,
            random_state=42
        )
        model = RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1)
        metric_name = 'accuracy'
    
    # Split data
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    
    # Train model
    training_start = time.time()
    model.fit(X_train, y_train)
    training_time = time.time() - training_start
    
    # Evaluate
    y_pred = model.predict(X_test)
    
    if analysis_type == 'regression':
        metric_value = np.sqrt(mean_squared_error(y_test, y_pred))
    else:
        metric_value = accuracy_score(y_test, y_pred)
    
    total_time = time.time() - start_time
    
    return {
        'analysis_type': analysis_type,
        'dataset_size': dataset_size,
        'training_time': training_time,
        'total_time': total_time,
        metric_name: metric_value,
        'feature_importance': model.feature_importances_[:5].tolist(),  # First 5 features (not ranked)
        'training_samples': len(X_train),
        'test_samples': len(X_test)
    }

# Example: CPU-bound computation (the loop itself runs serially inside one remote job)
@cluster(cores=4, memory="8GB")
def gcp_parallel_computation(n_iterations=1000):
    """Monte Carlo estimate of pi, used as a simple CPU-bound example."""
    import numpy as np
    import time
    
    start_time = time.time()
    
    # Simulate CPU-intensive work
    results = []
    for i in range(n_iterations):
        # Monte Carlo pi estimation
        points = np.random.random((1000, 2))
        inside_circle = np.sum((points**2).sum(axis=1) <= 1)
        pi_estimate = 4 * inside_circle / 1000
        results.append(pi_estimate)
    
    computation_time = time.time() - start_time
    final_pi_estimate = np.mean(results)
    
    return {
        'iterations': n_iterations,
        'pi_estimate': final_pi_estimate,
        'computation_time': computation_time,
        'abs_error': abs(final_pi_estimate - np.pi)
    }

print("✓ GCP computation examples defined")
print("\n📝 Example usage:")
print("# Data analysis:")
print("# result = gcp_data_analysis(dataset_size=50000, analysis_type='classification')")
print("# print(f'Accuracy: {result[\"accuracy\"]:.4f}')")
print("#")
print("# Parallel computation:")
print("# result = gcp_parallel_computation(n_iterations=5000)")
print("# print(f'Pi estimate: {result[\"pi_estimate\"]:.6f}')")

# Example execution (commented out - uncomment after setup):
# result = gcp_data_analysis(dataset_size=5000, analysis_type='classification')
# print(f"✓ Analysis completed: {result['accuracy']:.4f} accuracy")
# print(f"⏱️  Training time: {result['training_time']:.2f} seconds")

## Method 2: Google Kubernetes Engine (GKE) Configuration

GKE provides managed Kubernetes clusters ideal for containerized Clustrix workloads:

In [None]:
def setup_gke_cluster_for_clustrix(project_id, cluster_name='clustrix-cluster', zone='us-central1-a'):
    """
    Setup GKE cluster optimized for Clustrix workloads.
    """
    
    gke_commands = rf"""
# Enable required APIs
gcloud services enable container.googleapis.com \
  --project {project_id}

# Create GKE cluster with auto-scaling
gcloud container clusters create {cluster_name} \
  --project {project_id} \
  --zone {zone} \
  --machine-type e2-standard-4 \
  --num-nodes 1 \
  --enable-autoscaling \
  --min-nodes 0 \
  --max-nodes 10 \
  --enable-autorepair \
  --enable-autoupgrade \
  --disk-size 50GB \
  --disk-type pd-ssd \
  --enable-network-policy \
  --enable-ip-alias \
  --labels purpose=clustrix,environment=tutorial

# Get cluster credentials
gcloud container clusters get-credentials {cluster_name} \
  --project {project_id} \
  --zone {zone}

# Verify cluster access
kubectl get nodes

# Create clustrix namespace
kubectl create namespace clustrix

# Set as default namespace
kubectl config set-context --current --namespace=clustrix
"""
    
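    # Clustrix job template for Kubernetes.
    # ${JOB_ID} is a placeholder to substitute before applying the manifest, and the
    # template assumes a PersistentVolumeClaim named clustrix-pvc already exists.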
    k8s_job_template = """
apiVersion: batch/v1
kind: Job
metadata:
  name: clustrix-job-${JOB_ID}
  namespace: clustrix
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: clustrix-worker
        image: python:3.11-slim
        command: ["bash", "-c"]
        args:
        - |
          pip install clustrix numpy scipy pandas scikit-learn
          python -c "
          import pickle
          import sys
          
          # Load and execute function
          with open('/data/function_data.pkl', 'rb') as f:
              data = pickle.load(f)
          
          func = pickle.loads(data['function'])
          args = pickle.loads(data['args'])
          kwargs = pickle.loads(data['kwargs'])
          
          try:
              result = func(*args, **kwargs)
              with open('/data/result.pkl', 'wb') as f:
                  pickle.dump(result, f)
          except Exception as e:
              with open('/data/error.pkl', 'wb') as f:
                  pickle.dump({'error': str(e)}, f)
              raise
          "
        resources:
          requests:
            memory: "2Gi"
            cpu: "1"
          limits:
            memory: "4Gi"
            cpu: "2"
        volumeMounts:
        - name: job-data
          mountPath: /data
      volumes:
      - name: job-data
        persistentVolumeClaim:
          claimName: clustrix-pvc
  backoffLimit: 3
"""
    
    return {
        'cluster_name': cluster_name,
        'project_id': project_id,
        'zone': zone,
        'setup_commands': gke_commands,
        'job_template': k8s_job_template
    }

def configure_clustrix_for_gke(cluster_endpoint, cluster_name):
    """Configure Clustrix to use GKE cluster."""
    configure(
        cluster_type="kubernetes",
        cluster_host=cluster_endpoint,
        # For GKE, authentication is handled via kubectl config
        remote_work_dir="/tmp/clustrix",
        package_manager="pip",  # Container-based, pip is fine
        default_cores=2,
        default_memory="4GB",
        default_time="01:00:00"
    )
    print(f"✓ Configured Clustrix for GKE cluster: {cluster_name}")

# Create GKE configuration
gke_config = setup_gke_cluster_for_clustrix(
    project_id=PROJECT_ID,
    cluster_name='clustrix-cluster'
)

print("=== GKE Cluster Setup Commands ===")
print(gke_config['setup_commands'])
print("\n=== Kubernetes Job Template ===")
print(gke_config['job_template'])
print("\n📝 Note: GKE integration requires additional implementation in Clustrix.")
print("Current Clustrix supports basic Kubernetes, but GKE-specific features need custom setup.")
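
The job template contains a `${JOB_ID}` placeholder that must be filled in before the manifest is applied. One way to do that, sketched below with the standard library's `string.Template`, is to substitute the ID and reject values that would produce an invalid object name. `render_job_manifest` is a hypothetical helper, and applying DNS-label rules (lowercase alphanumerics and hyphens, at most 63 characters) to the Job name is a conservative assumption.

```python
import re
from string import Template

# DNS-label style name: lowercase alphanumerics and '-', alphanumeric at both ends.
_DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def render_job_manifest(template_text: str, job_id: str) -> str:
    """Fill in ${JOB_ID} after checking the resulting Job name is valid."""
    name = f"clustrix-job-{job_id}"
    if len(name) > 63 or not _DNS_LABEL.fullmatch(name):
        raise ValueError(f"invalid Kubernetes object name: {name!r}")
    # safe_substitute leaves any other $-expressions in the manifest untouched.
    return Template(template_text).safe_substitute(JOB_ID=job_id)


sample = "metadata:\n  name: clustrix-job-${JOB_ID}\n"
print(render_job_manifest(sample, "a1b2c3"))
```

In this notebook the first argument would be `gke_config['job_template']`, and the rendered text could then be piped to `kubectl apply -f -`.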

## Method 3: Google Cloud Batch

Google Cloud Batch provides managed job scheduling for large-scale workloads:

In [None]:
def setup_gcp_batch_environment(project_id, region='us-central1'):
    """
    Setup Google Cloud Batch for Clustrix workloads.
    """
    
    batch_setup_commands = rf"""
# Enable Batch API
gcloud services enable batch.googleapis.com \
  --project {project_id}

# Create a service account for Batch jobs
gcloud iam service-accounts create clustrix-batch-sa \
  --project {project_id} \
  --description="Service account for Clustrix Batch jobs" \
  --display-name="Clustrix Batch Service Account"

# Grant necessary permissions
gcloud projects add-iam-policy-binding {project_id} \
  --member="serviceAccount:clustrix-batch-sa@{project_id}.iam.gserviceaccount.com" \
  --role="roles/batch.jobsEditor"

gcloud projects add-iam-policy-binding {project_id} \
  --member="serviceAccount:clustrix-batch-sa@{project_id}.iam.gserviceaccount.com" \
  --role="roles/storage.objectAdmin"

# Create Cloud Storage bucket for job data
gsutil mb -p {project_id} -l {region} gs://{project_id}-clustrix-batch
"""
    
    # Batch job configuration template
    batch_job_config = {
        "taskGroups": [
            {
                "taskSpec": {
                    "runnables": [
                        {
                            "script": {
                                "text": f"""#!/bin/bash
set -e

# Install required packages
pip3 install clustrix numpy scipy pandas scikit-learn

# Download job data from Cloud Storage
gsutil cp gs://{project_id}-clustrix-batch/jobs/${{BATCH_JOB_ID}}/function_data.pkl .

# Execute the function
python3 -c "
import pickle
import traceback

try:
    with open('function_data.pkl', 'rb') as f:
        data = pickle.load(f)
    
    func = pickle.loads(data['function'])
    args = pickle.loads(data['args'])
    kwargs = pickle.loads(data['kwargs'])
    
    result = func(*args, **kwargs)
    
    with open('result.pkl', 'wb') as f:
        pickle.dump(result, f)
        
except Exception as e:
    with open('error.pkl', 'wb') as f:
        pickle.dump({{
            'error': str(e),
            'traceback': traceback.format_exc()
        }}, f)
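    raise
" || status=$?

# Upload results to Cloud Storage (error.pkl if the function failed),
# then exit with the Python status so failed tasks are still reported.
# Without capturing the status, `set -e` would stop the script before the upload.
gsutil cp result.pkl gs://{project_id}-clustrix-batch/jobs/${{BATCH_JOB_ID}}/result.pkl || \
gsutil cp error.pkl gs://{project_id}-clustrix-batch/jobs/${{BATCH_JOB_ID}}/error.pkl
exit ${{status:-0}}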
"""
                            }
                        }
                    ],
                    "computeResource": {
                        "cpuMilli": 2000,  # 2 CPUs
                        "memoryMib": 4096  # 4 GB RAM
                    },
                    "maxRetryCount": 2,
                    "maxRunDuration": "3600s"  # 1 hour
                },
                "taskCount": 1
            }
        ],
        "allocationPolicy": {
            "instances": [
                {
                    "policy": {
                        "machineType": "e2-standard-2",
                        "provisioningModel": "STANDARD"
                    }
                }
            ]
        },
        "labels": {
            "purpose": "clustrix",
            "environment": "tutorial"
        },
        "logsPolicy": {
            "destination": "CLOUD_LOGGING"
        }
    }
    
    return {
        'project_id': project_id,
        'region': region,
        'bucket_name': f'{project_id}-clustrix-batch',
        'service_account': f'clustrix-batch-sa@{project_id}.iam.gserviceaccount.com',
        'job_config': batch_job_config,
        'setup_commands': batch_setup_commands
    }

# Create Batch configuration
batch_config = setup_gcp_batch_environment(PROJECT_ID)

print("=== Google Cloud Batch Setup Commands ===")
print(batch_config['setup_commands'])
print("\n=== Batch Job Configuration ===")
print(json.dumps(batch_config['job_config'], indent=2))
print("\n💡 This configuration is a template: stage function_data.pkl in the bucket, then submit the job with `gcloud batch jobs submit` or the Batch API.")
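
The `computeResource` block expresses CPU in thousandths of a vCPU (`cpuMilli`) and memory in MiB (`memoryMib`), whereas the `@cluster` decorator in this tutorial takes values such as `cores=2, memory="4GB"`. A small conversion sketch follows; `to_compute_resource` is a hypothetical helper, and treating "GB" as GiB (1024 MiB) is an assumption chosen to match the 4096 used in the configuration above.

```python
import re

# Multipliers to MiB; "GB" is treated as GiB here (assumption).
_UNITS_MIB = {"MB": 1, "GB": 1024, "TB": 1024 * 1024}


def to_compute_resource(cores, memory: str) -> dict:
    """Convert cores and a memory string like '4GB' to Batch computeResource fields."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(MB|GB|TB)\s*", memory.upper())
    if match is None:
        raise ValueError(f"unrecognized memory string: {memory!r}")
    value, unit = float(match.group(1)), match.group(2)
    return {
        "cpuMilli": int(round(cores * 1000)),
        "memoryMib": int(round(value * _UNITS_MIB[unit])),
    }


print(to_compute_resource(2, "4GB"))
```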

## Data Management with Google Cloud Storage

In [None]:
@cluster(cores=2, memory="4GB")
def process_gcs_data(bucket_name, input_blob, output_blob, project_id=None):
    """Process data from Google Cloud Storage and save results back."""
    from google.cloud import storage
    import numpy as np
    import pickle
    import io
    import time
    
    # Initialize Cloud Storage client
    storage_client = storage.Client(project=project_id)
    bucket = storage_client.bucket(bucket_name)
    
    # Download data from Cloud Storage
    input_blob_obj = bucket.blob(input_blob)
    data_bytes = input_blob_obj.download_as_bytes()
    data = pickle.loads(data_bytes)
    
    # Process the data
    processed_data = {
        'original_shape': data.shape if hasattr(data, 'shape') else len(data) if hasattr(data, '__len__') else 'scalar',
        'mean': float(np.mean(data)) if hasattr(data, '__iter__') else float(data),
        'std': float(np.std(data)) if hasattr(data, '__iter__') else 0.0,
        'max': float(np.max(data)) if hasattr(data, '__iter__') else float(data),
        'min': float(np.min(data)) if hasattr(data, '__iter__') else float(data),
        'processing_timestamp': time.time(),
        'processed_on': 'gcp-compute-engine',
        'data_type': str(type(data).__name__)
    }
    
    # Additional statistics for 2-D arrays (the Frobenius norm is only defined for matrices)
    if hasattr(data, 'shape') and len(data.shape) == 2:
        is_square = data.shape[0] == data.shape[1]
        processed_data.update({
            'matrix_rank': int(np.linalg.matrix_rank(data)),  # defined for any 2-D array
            'frobenius_norm': float(np.linalg.norm(data, 'fro')),
            'condition_number': float(np.linalg.cond(data)) if is_square else None
        })
    
    # Upload results to Cloud Storage
    output_bytes = pickle.dumps(processed_data)
    output_blob_obj = bucket.blob(output_blob)
    output_blob_obj.upload_from_string(output_bytes)
    
    return f"Processed data saved to gs://{bucket_name}/{output_blob}"

# Utility functions for Google Cloud Storage
def upload_to_gcs(data, bucket_name, blob_name, project_id=None):
    """Upload data to Google Cloud Storage."""
    storage_client = storage.Client(project=project_id)
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(blob_name)
    
    data_bytes = pickle.dumps(data)
    blob.upload_from_string(data_bytes)
    return f"gs://{bucket_name}/{blob_name}"

def download_from_gcs(bucket_name, blob_name, project_id=None):
    """Download data from Google Cloud Storage."""
    storage_client = storage.Client(project=project_id)
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(blob_name)
    
    data_bytes = blob.download_as_bytes()
    return pickle.loads(data_bytes)

def create_gcs_bucket_for_clustrix(project_id, bucket_name, location='us-central1'):
    """Create a Cloud Storage bucket for Clustrix data."""
    gcs_commands = f"""
# Create bucket with appropriate settings
gsutil mb -p {project_id} -l {location} gs://{bucket_name}

# Set lifecycle policy to delete temporary files after 7 days
echo '{{
  "lifecycle": {{
    "rule": [
      {{
        "action": {{"type": "Delete"}},
        "condition": {{
          "age": 7,
          "matchesPrefix": ["temp/"]
        }}
      }}
    ]
  }}
}}' > lifecycle.json

gsutil lifecycle set lifecycle.json gs://{bucket_name}

# Set up proper permissions (if using service account)
gsutil iam ch serviceAccount:clustrix-batch-sa@{project_id}.iam.gserviceaccount.com:objectAdmin gs://{bucket_name}
"""
    
    return gcs_commands

# Create bucket configuration
BUCKET_NAME = f"{PROJECT_ID}-clustrix-data"
bucket_commands = create_gcs_bucket_for_clustrix(PROJECT_ID, BUCKET_NAME)

print("=== Commands to create Cloud Storage bucket ===")
print(bucket_commands)

# Example usage (commented out - uncomment after creating bucket):
# sample_data = np.random.rand(1000, 100)
# upload_location = upload_to_gcs(sample_data, BUCKET_NAME, 'input/sample_data.pkl', PROJECT_ID)
# print(f"✓ Data uploaded to {upload_location}")
# 
# result = process_gcs_data(BUCKET_NAME, 'input/sample_data.pkl', 'output/results.pkl', PROJECT_ID)
# print(f"✓ Processing completed: {result}")

print("\n✓ Google Cloud Storage integration functions defined.")
print("Execute the bucket creation commands above, then uncomment the example usage.")
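
The lifecycle policy above is written as JSON inside an f-string, which is why every brace has to be doubled. An alternative sketch is to build the same structure as a Python dictionary and serialize it with `json`, which avoids the escaping. The function names below are hypothetical; the structure mirrors the policy shown in the commands.

```python
import json


def build_lifecycle_config(age_days=7, prefixes=("temp/",)):
    """Return a lifecycle policy that deletes matching objects after age_days."""
    return {
        "lifecycle": {
            "rule": [
                {
                    "action": {"type": "Delete"},
                    "condition": {"age": age_days, "matchesPrefix": list(prefixes)},
                }
            ]
        }
    }


def write_lifecycle_file(path="lifecycle.json", **kwargs):
    """Write the policy to disk for use with `gsutil lifecycle set`."""
    with open(path, "w") as f:
        json.dump(build_lifecycle_config(**kwargs), f, indent=2)
    return path


print(json.dumps(build_lifecycle_config(), indent=2))
```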

## Vertex AI Integration

In [None]:
def setup_vertex_ai_for_clustrix(project_id, region='us-central1'):
    """
    Setup Vertex AI for ML workloads with Clustrix.
    """
    
    vertex_commands = rf"""
# Enable Vertex AI API
gcloud services enable aiplatform.googleapis.com \
  --project {project_id}

# Create Vertex AI custom training job
gcloud ai custom-jobs create \
  --region={region} \
  --display-name=clustrix-training-job \
  --config=training_job_config.yaml

# Create Vertex AI endpoints for model serving
gcloud ai endpoints create \
  --region={region} \
  --display-name=clustrix-model-endpoint
"""
    
    # Vertex AI training job configuration
    training_config = f"""
# training_job_config.yaml
workerPoolSpecs:
- machineSpec:
    machineType: e2-standard-4
  replicaCount: 1
  containerSpec:
    imageUri: gcr.io/cloud-aiplatform/training/tf-cpu.2-8:latest
    command:
    - python3
    - -c
    args:
    - |
      import subprocess
      import sys
      
      # Install clustrix
      subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'clustrix', 'numpy', 'pandas', 'scikit-learn'])
      
      # Your training code here
      print("Clustrix training job completed on Vertex AI")
    env:
    - name: GOOGLE_CLOUD_PROJECT
      value: {project_id}
    - name: AIP_MODEL_DIR
      value: gs://{project_id}-vertex-models
"""
    
    return {
        'project_id': project_id,
        'region': region,
        'setup_commands': vertex_commands,
        'training_config': training_config
    }

@cluster(cores=4, memory="8GB")
def vertex_ai_ml_pipeline(dataset_config, model_config, project_id, bucket_name):
    """ML pipeline that could run on Vertex AI with Clustrix.

    Note: model_config is accepted but not currently read; the
    hyperparameter grid below is fixed.
    """
    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import cross_val_score, GridSearchCV
    from sklearn.datasets import make_classification
    from sklearn.metrics import classification_report
    from google.cloud import storage
    import pickle
    import time
    
    start_time = time.time()
    
    # Generate or load dataset
    X, y = make_classification(
        n_samples=dataset_config['n_samples'],
        n_features=dataset_config['n_features'],
        n_classes=dataset_config['n_classes'],
        n_informative=dataset_config.get('n_informative', dataset_config['n_features'] // 2),
        random_state=42
    )
    
    # Hyperparameter tuning
    param_grid = {
        'n_estimators': [50, 100, 200],
        'max_depth': [3, 5, 7],
        'learning_rate': [0.01, 0.1, 0.2]
    }
    
    # Grid search with cross-validation
    model = GradientBoostingClassifier(random_state=42)
    grid_search = GridSearchCV(
        model, param_grid, cv=5, scoring='accuracy', n_jobs=-1
    )
    
    grid_search.fit(X, y)
    
    # Get best model
    best_model = grid_search.best_estimator_
    
    # Evaluate with cross-validation
    cv_scores = cross_val_score(best_model, X, y, cv=5, scoring='accuracy')
    
    # Save model to Cloud Storage
    storage_client = storage.Client(project=project_id)
    bucket = storage_client.bucket(bucket_name)
    
    model_blob = bucket.blob('models/clustrix_model.pkl')
    model_bytes = pickle.dumps(best_model)
    model_blob.upload_from_string(model_bytes)
    
    total_time = time.time() - start_time
    
    return {
        'best_params': grid_search.best_params_,
        'best_score': grid_search.best_score_,
        'cv_mean_score': cv_scores.mean(),
        'cv_std_score': cv_scores.std(),
        'training_time': total_time,
        'model_location': f'gs://{bucket_name}/models/clustrix_model.pkl',
        'feature_importance': best_model.feature_importances_[:10].tolist(),  # First 10 features (not sorted by importance)
        'dataset_size': len(X)
    }

# Setup Vertex AI configuration
vertex_config = setup_vertex_ai_for_clustrix(PROJECT_ID)

print("=== Vertex AI Setup Commands ===")
print(vertex_config['setup_commands'])
print("\n=== Training Job Configuration ===")
print(vertex_config['training_config'])

# Example usage (commented out):
# dataset_params = {'n_samples': 10000, 'n_features': 20, 'n_classes': 3}
# model_params = {}
# result = vertex_ai_ml_pipeline(dataset_params, model_params, PROJECT_ID, BUCKET_NAME)
# print(f"✓ Best model score: {result['best_score']:.4f}")
# print(f"✓ Model saved to: {result['model_location']}")

print("\n✓ Vertex AI integration examples defined.")
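
Before submitting the pipeline, it can help to estimate how much work the grid search will do. The sketch below counts model fits for a parameter grid like the one in `vertex_ai_ml_pipeline`, assuming scikit-learn's default `refit=True` behavior (one extra fit of the best candidate on the full dataset). `count_grid_search_fits` is a hypothetical helper, not part of Clustrix or scikit-learn.

```python
from math import prod

def count_grid_search_fits(param_grid, cv, refit=True):
    """Return the number of model fits an exhaustive grid search performs."""
    # Every combination of parameter values is one candidate model.
    n_candidates = prod(len(values) for values in param_grid.values())
    # Each candidate is fitted once per cross-validation fold,
    # plus one final refit of the best candidate if refit is enabled.
    return n_candidates * cv + (1 if refit else 0)

param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [3, 5, 7],
    'learning_rate': [0.01, 0.1, 0.2],
}
total_fits = count_grid_search_fits(param_grid, cv=5)
print(f"Grid search will run {total_fits} fits")
```

The pipeline's later `cross_val_score` call with `cv=5` adds five more fits on top of this count, so the core and time settings passed to `@cluster` should leave room for both stages.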

## Security Best Practices

def setup_gcp_security_for_clustrix(project_id):
    """
    Security configuration for GCP + Clustrix deployment.
    """
    
    security_commands = f"""
# Create VPC with private subnets
gcloud compute networks create clustrix-vpc \
  --project {project_id} \
  --subnet-mode custom

gcloud compute networks subnets create clustrix-subnet \
  --project {project_id} \
  --network clustrix-vpc \
  --range 10.1.0.0/24 \
  --region us-central1 \
  --enable-private-ip-google-access

# Create firewall rules (restrictive)
gcloud compute firewall-rules create clustrix-allow-ssh \
  --project {project_id} \
  --network clustrix-vpc \
  --allow tcp:22 \
  --source-ranges YOUR_IP/32 \
  --target-tags clustrix

gcloud compute firewall-rules create clustrix-internal \
  --project {project_id} \
  --network clustrix-vpc \
  --allow tcp,udp,icmp \
  --source-ranges 10.1.0.0/24 \
  --target-tags clustrix

# Create service account with minimal permissions
gcloud iam service-accounts create clustrix-compute \
  --project {project_id} \
  --description="Service account for Clustrix compute instances" \
  --display-name="Clustrix Compute Service Account"

# Grant only necessary permissions
gcloud projects add-iam-policy-binding {project_id} \
  --member="serviceAccount:clustrix-compute@{project_id}.iam.gserviceaccount.com" \
  --role="roles/storage.objectAdmin"

gcloud projects add-iam-policy-binding {project_id} \
  --member="serviceAccount:clustrix-compute@{project_id}.iam.gserviceaccount.com" \
  --role="roles/logging.logWriter"

# Enable OS Login for better SSH key management
gcloud compute project-info add-metadata \
  --project {project_id} \
  --metadata enable-oslogin=TRUE

# Create Cloud KMS key for encryption
gcloud kms keyrings create clustrix-keyring \
  --project {project_id} \
  --location global

gcloud kms keys create clustrix-key \
  --project {project_id} \
  --keyring clustrix-keyring \
  --location global \
  --purpose encryption
"""
    
    return {
        'project_id': project_id,
        'vpc_name': 'clustrix-vpc',
        'subnet_name': 'clustrix-subnet',
        'service_account': f'clustrix-compute@{project_id}.iam.gserviceaccount.com',
        'security_commands': security_commands
    }

# Generate security configuration
security_config = setup_gcp_security_for_clustrix(PROJECT_ID)

print("=== GCP Security Setup Commands ===")
print(security_config['security_commands'])
print(f"\n✓ Security configuration templates generated for project: {PROJECT_ID}")
print(f"✓ VPC: {security_config['vpc_name']}")
print(f"✓ Service Account: {security_config['service_account']}")
print("\n⚠️  Remember to replace 'YOUR_IP' with your actual IP address in the firewall rules!")
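
The generated firewall rule contains the literal placeholder `YOUR_IP`. One way to fill it in safely is to validate the address first with the standard library `ipaddress` module. This is a minimal sketch: `fill_source_range` is a hypothetical helper, and `203.0.113.7` is a reserved documentation address used only for illustration.

```python
import ipaddress

def fill_source_range(commands, ip, placeholder="YOUR_IP"):
    """Replace the placeholder with a validated IPv4 address.

    Raises ValueError if `ip` is not a valid IPv4 address or if the
    placeholder does not appear in the command text.
    """
    # IPv4Address raises AddressValueError (a ValueError subclass) on bad input.
    address = ipaddress.IPv4Address(ip)
    if placeholder not in commands:
        raise ValueError(f"placeholder {placeholder!r} not found")
    return commands.replace(placeholder, str(address))

sample = "--source-ranges YOUR_IP/32"
filled = fill_source_range(sample, "203.0.113.7")
print(filled)
```

The same call could be applied to `security_config['security_commands']` before printing, so the SSH rule is never copied into a shell with the placeholder still in place.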

### GCP Security Checklist for Clustrix

✓ **Authentication and Access**
- Use IAM service accounts with minimal permissions
- Enable OS Login for centralized SSH key management
- Create custom VPC with private subnets
- Restrict firewall rules to specific IP ranges

✓ **Infrastructure Security**
- Enable private Google access for instances without external IPs
- Use Cloud KMS for encryption at rest
- Enable audit logging and Security Command Center
- Use Binary Authorization for container security

✓ **Network Security**
- Implement VPC Service Controls for data perimeter
- Enable DDoS protection and Cloud Armor
- Use Secret Manager for sensitive configuration
- Enable vulnerability scanning for container images

✓ **Governance and Compliance**
- Set up budget alerts and billing account security
- Use organization policies for governance
- Regular security reviews and access audits

## Resource Cleanup

def cleanup_gcp_resources(project_id, zone='us-central1-a', region='us-central1'):
    """
    Clean up GCP resources to avoid ongoing charges.
    
    Args:
        project_id: GCP project ID
        zone: Zone where resources were created
        region: Region where resources were created
    """
    
    cleanup_commands = f"""
# List all compute instances
gcloud compute instances list --project {project_id}

# Delete specific instances
gcloud compute instances delete clustrix-instance \
  --project {project_id} \
  --zone {zone} \
  --quiet

# Delete managed instance groups
gcloud compute instance-groups managed delete clustrix-preemptible-group \
  --project {project_id} \
  --zone {zone} \
  --quiet

# Delete instance templates
gcloud compute instance-templates delete clustrix-preemptible-template \
  --project {project_id} \
  --quiet

# Delete GKE clusters
gcloud container clusters delete clustrix-cluster \
  --project {project_id} \
  --zone {zone} \
  --quiet

# Delete Cloud Storage buckets (BE CAREFUL - THIS DELETES ALL DATA)
gsutil -m rm -r gs://{project_id}-clustrix-batch
gsutil -m rm -r gs://{project_id}-vertex-models
gsutil -m rm -r gs://{project_id}-clustrix-data

# Delete firewall rules
gcloud compute firewall-rules delete clustrix-allow-ssh clustrix-internal \
  --project {project_id} \
  --quiet

# Delete VPC network
gcloud compute networks subnets delete clustrix-subnet \
  --project {project_id} \
  --region {region} \
  --quiet

gcloud compute networks delete clustrix-vpc \
  --project {project_id} \
  --quiet

# Delete service accounts
gcloud iam service-accounts delete clustrix-compute@{project_id}.iam.gserviceaccount.com \
  --project {project_id} \
  --quiet

gcloud iam service-accounts delete clustrix-batch-sa@{project_id}.iam.gserviceaccount.com \
  --project {project_id} \
  --quiet

# List remaining billable resources
echo "=== Remaining billable resources ==="
gcloud compute instances list --project {project_id}
gcloud compute disks list --project {project_id}
gcloud compute addresses list --project {project_id}
gcloud container clusters list --project {project_id}
"""
    
    return {
        'project_id': project_id,
        'zone': zone,
        'region': region,
        'cleanup_commands': cleanup_commands
    }

# Generate cleanup commands
cleanup_info = cleanup_gcp_resources(PROJECT_ID)

print(f"=== GCP Resource Cleanup Commands for Project: {PROJECT_ID} ===")
print(cleanup_info['cleanup_commands'])
print("\n⚠️  WARNING: Some commands will permanently delete resources and data!")
print("Review each resource before deleting and ensure you have backups if needed.")
print("\n💡 TIP: Use 'gcloud compute instances stop' instead of 'delete' to keep an instance while pausing its vCPU and memory charges; attached persistent disks and reserved static IPs are still billed.")
print("\n✓ Cleanup commands generated. Always verify resources before deletion!")
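
Because the cleanup text mixes read-only and destructive commands, reviewing it one command at a time is safer than pasting the whole block into a shell. The sketch below splits a script string into individual commands and flags the ones that look destructive. Both helpers are hypothetical, the sample script uses a made-up project name `demo`, and the "destructive" check is only a keyword heuristic.

```python
def split_commands(script):
    """Split a shell script string into individual commands.

    Blank lines and comment lines are dropped; lines ending with a
    backslash are joined with the line that follows.
    """
    commands, pending = [], ""
    for raw_line in script.splitlines():
        line = raw_line.strip()
        # Skip blanks and comments unless we are in the middle of a command.
        if not pending and (not line or line.startswith("#")):
            continue
        if line.endswith("\\"):
            pending += line[:-1].rstrip() + " "
            continue
        commands.append((pending + line).strip())
        pending = ""
    if pending.strip():
        commands.append(pending.strip())
    return commands

def is_destructive(command):
    """Heuristic: flag gcloud 'delete' subcommands and 'gsutil rm'."""
    words = command.split()
    return ("delete" in words) or (words[:1] == ["gsutil"] and "rm" in words)

sample_script = r"""
# List all compute instances
gcloud compute instances list --project demo

gcloud compute instances delete clustrix-instance \
  --project demo \
  --quiet
gsutil -m rm -r gs://demo-clustrix-data
"""

for command in split_commands(sample_script):
    prefix = "!!" if is_destructive(command) else "  "
    print(f"{prefix} {command}")
```

Passing `cleanup_info['cleanup_commands']` to `split_commands` in place of `sample_script` would give a reviewable list for the real project.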

## Advanced Example: Distributed Scientific Computing

# Advanced Scientific Computing
@cluster(cores=4, memory="8GB", time="01:00:00")
def gcp_scientific_simulation(simulation_params, storage_config=None):
    """
    Distributed scientific simulation using GCP infrastructure.
    """
    import numpy as np
    from scipy.integrate import odeint
    from scipy.optimize import minimize
    import pickle
    import time
    import matplotlib
    matplotlib.use('Agg')  # Use non-interactive backend
    import matplotlib.pyplot as plt
    import io
    
    # Only import GCP storage if config provided
    if storage_config:
        from google.cloud import storage
    
    def lorenz_system(state, t, sigma, rho, beta):
        """Lorenz attractor differential equations."""
        x, y, z = state
        return [
            sigma * (y - x),
            x * (rho - z) - y,
            x * y - beta * z
        ]
    
    def simulate_lorenz(params, time_points):
        """Simulate Lorenz system with given parameters."""
        initial_state = [1.0, 1.0, 1.0]
        solution = odeint(
            lorenz_system, initial_state, time_points,
            args=(params['sigma'], params['rho'], params['beta'])
        )
        return solution
    
    start_time = time.time()
    
    # Parameter sweep
    parameter_sets = simulation_params['parameter_sets']
    time_points = np.linspace(0, simulation_params['max_time'], simulation_params['num_points'])
    
    results = []
    
    for i, params in enumerate(parameter_sets):
        # Run simulation
        solution = simulate_lorenz(params, time_points)
        
        # Analyze results
        x, y, z = solution[:, 0], solution[:, 1], solution[:, 2]
        
        analysis = {
            'params': params,
            'max_x': float(np.max(x)),
            'min_x': float(np.min(x)),
            'max_y': float(np.max(y)),
            'min_y': float(np.min(y)),
            'max_z': float(np.max(z)),
            'min_z': float(np.min(z)),
            'mean_energy': float(np.mean(x**2 + y**2 + z**2)),
            'final_state': [float(x[-1]), float(y[-1]), float(z[-1])],
            'std_x': float(np.std(x)),
            'std_y': float(np.std(y)),
            'std_z': float(np.std(z))
        }
        
        results.append(analysis)
        
        # Create visualization for first few parameter sets
        if i < 3:
            fig = plt.figure(figsize=(12, 4))
            
            # Time series
            plt.subplot(1, 3, 1)
            plt.plot(time_points, x, label='X', alpha=0.8)
            plt.plot(time_points, y, label='Y', alpha=0.8)
            plt.plot(time_points, z, label='Z', alpha=0.8)
            plt.xlabel('Time')
            plt.ylabel('State')
            plt.title(f'Lorenz System (σ={params["sigma"]}, ρ={params["rho"]}, β={params["beta"]})')
            plt.legend()
            plt.grid(True, alpha=0.3)
            
            # Phase space (X-Y)
            plt.subplot(1, 3, 2)
            plt.plot(x, y, alpha=0.7, linewidth=0.8)
            plt.xlabel('X')
            plt.ylabel('Y')
            plt.title('X-Y Phase Space')
            plt.grid(True, alpha=0.3)
            
            # Phase space (X-Z)
            plt.subplot(1, 3, 3)
            plt.plot(x, z, alpha=0.7, linewidth=0.8)
            plt.xlabel('X')
            plt.ylabel('Z')
            plt.title('X-Z Phase Space')
            plt.grid(True, alpha=0.3)
            
            plt.tight_layout()
            
            # Save plot to Cloud Storage if configured
            if storage_config:
                try:
                    img_buffer = io.BytesIO()
                    plt.savefig(img_buffer, format='png', dpi=150, bbox_inches='tight')
                    img_buffer.seek(0)
                    
                    storage_client = storage.Client(project=storage_config['project_id'])
                    bucket = storage_client.bucket(storage_config['bucket_name'])
                    
                    plot_blob = bucket.blob(f"plots/lorenz_simulation_{i}.png")
                    plot_blob.upload_from_string(img_buffer.getvalue(), content_type='image/png')
                except Exception as e:
                    print(f"Warning: Could not save plot to GCS: {e}")
            
            plt.close()
    
    computation_time = time.time() - start_time
    
    # Calculate summary statistics
    energies = [r['mean_energy'] for r in results]
    summary_stats = {
        'total_simulations': len(parameter_sets),
        'computation_time': computation_time,
        'average_energy': float(np.mean(energies)),
        'max_energy': max(energies),
        'min_energy': min(energies),
        'energy_std': float(np.std(energies)),
        'time_per_simulation': computation_time / len(parameter_sets)
    }
    
    # Save detailed results to Cloud Storage if configured
    if storage_config:
        try:
            storage_client = storage.Client(project=storage_config['project_id'])
            bucket = storage_client.bucket(storage_config['bucket_name'])
            
            results_blob = bucket.blob("results/simulation_results.pkl")
            results_data = {
                'simulation_params': simulation_params,
                'results': results,
                'summary_stats': summary_stats,
                'timestamp': time.time()
            }
            results_bytes = pickle.dumps(results_data)
            results_blob.upload_from_string(results_bytes)
        except Exception as e:
            print(f"Warning: Could not save results to GCS: {e}")
    
    return {
        'num_simulations': len(parameter_sets),
        'computation_time': computation_time,
        'summary_stats': summary_stats,
        'results_preview': results[:2],  # First 2 for brevity
        'storage_location': f"gs://{storage_config['bucket_name']}/results/" if storage_config else None,
        # Plots are only uploaded when storage_config is provided
        'plots_saved': min(3, len(parameter_sets)) if storage_config else 0
    }

# Monte Carlo simulation example
@cluster(cores=2, memory="4GB")
def gcp_monte_carlo_simulation(n_samples=1000000):
    """Monte Carlo simulation for option pricing."""
    import numpy as np
    import time
    
    start_time = time.time()
    
    # Black-Scholes parameters
    S0 = 100    # Initial stock price
    K = 105     # Strike price
    T = 1.0     # Time to expiration
    r = 0.05    # Risk-free rate
    sigma = 0.2 # Volatility
    
    # Generate random samples
    np.random.seed(42)
    Z = np.random.standard_normal(n_samples)
    
    # Simulate stock prices at expiration
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)
    
    # Calculate option payoffs
    call_payoffs = np.maximum(ST - K, 0)
    put_payoffs = np.maximum(K - ST, 0)
    
    # Discount to present value
    call_price = np.exp(-r * T) * np.mean(call_payoffs)
    put_price = np.exp(-r * T) * np.mean(put_payoffs)
    
    # Standard errors of the discounted price estimates (for 95% confidence intervals)
    discount = np.exp(-r * T)
    call_std = discount * np.std(call_payoffs) / np.sqrt(n_samples)
    put_std = discount * np.std(put_payoffs) / np.sqrt(n_samples)
    
    computation_time = time.time() - start_time
    
    return {
        'n_samples': n_samples,
        'computation_time': computation_time,
        'call_price': call_price,
        'put_price': put_price,
        'call_confidence_interval': [call_price - 1.96*call_std, call_price + 1.96*call_std],
        'put_confidence_interval': [put_price - 1.96*put_std, put_price + 1.96*put_std],
        'parameters': {'S0': S0, 'K': K, 'T': T, 'r': r, 'sigma': sigma}
    }

print("✓ Advanced scientific computing examples defined")

# Example simulation parameters
example_lorenz_params = {
    'parameter_sets': [
        {'sigma': 10.0, 'rho': 28.0, 'beta': 8.0/3.0},    # Classic chaotic
        {'sigma': 10.0, 'rho': 24.74, 'beta': 8.0/3.0},   # Near onset
        {'sigma': 10.0, 'rho': 99.65, 'beta': 8.0/3.0},   # High rho
        {'sigma': 16.0, 'rho': 45.92, 'beta': 4.0},       # Different params
    ],
    'max_time': 25.0,
    'num_points': 5000
}

print("\n📝 Example usage:")
print("# Lorenz simulation:")
print("# result = gcp_scientific_simulation(example_lorenz_params)")
print("# print(f'Completed {result[\"num_simulations\"]} simulations')")
print("# print(f'Computation time: {result[\"computation_time\"]:.2f} seconds')")
print("#")
print("# Monte Carlo simulation:")
print("# mc_result = gcp_monte_carlo_simulation(n_samples=5000000)")
print("# print(f'Call option price: ${mc_result[\"call_price\"]:.2f}')")

print("\n🧪 These examples demonstrate GCP's computational capabilities:")
print("  • Parallel differential equation solving")
print("  • Statistical simulations with confidence intervals")
print("  • Cloud Storage integration for results")
print("  • Visualization generation and storage")
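
Since `gcp_monte_carlo_simulation` prices European options under the Black-Scholes model, its estimates can be checked against the closed-form solution. The sketch below computes the analytic call and put prices for the same parameters using only the standard library. The helper names are illustrative and the formula assumes no dividends.

```python
from math import erf, exp, log, sqrt

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def black_scholes_prices(S0, K, T, r, sigma):
    """Closed-form European call and put prices (no dividends)."""
    d1 = (log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    call = S0 * normal_cdf(d1) - K * exp(-r * T) * normal_cdf(d2)
    put = K * exp(-r * T) * normal_cdf(-d2) - S0 * normal_cdf(-d1)
    return call, put

# Same parameters as the Monte Carlo example above
call, put = black_scholes_prices(S0=100, K=105, T=1.0, r=0.05, sigma=0.2)
print(f"Analytic call: {call:.4f}, analytic put: {put:.4f}")
```

With enough samples, the Monte Carlo `call_price` and `put_price` should land close to these values, and the analytic prices should usually fall inside the reported confidence intervals.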

## Summary

This tutorial covered:

1. **Setup**: GCP authentication and Clustrix installation
2. **Compute Engine**: Direct VM configuration and management
3. **GKE Integration**: Kubernetes clusters for containerized workloads
4. **Cloud Batch**: Managed job scheduling for large-scale processing
5. **Cloud Storage**: Data management and result storage
6. **Vertex AI**: Machine learning platform integration
7. **Security**: Best practices for secure deployment
8. **Resource Management**: Proper cleanup procedures

### Cost Monitoring

For comprehensive cost monitoring, optimization strategies, and multi-cloud cost comparisons, see the dedicated [Cost Monitoring Tutorial](cost_monitoring_tutorial.ipynb).

### Next Steps

- Set up your GCP credentials and test the basic configuration
- Start with a simple Compute Engine instance for initial testing
- Consider GKE for containerized workloads and auto-scaling
- Explore Cloud Batch for large-scale batch processing
- Implement proper monitoring and access controls
- Review the Cost Monitoring Tutorial for expense tracking

### GCP-Specific Advantages

- **Preemptible/Spot VMs**: Deep discounts relative to on-demand pricing (Google advertises 60-91% for Spot VMs), in exchange for possible preemption
- **Google Kubernetes Engine**: Industry-leading managed Kubernetes
- **Vertex AI**: Comprehensive ML platform with AutoML capabilities
- **Global Network**: Superior network performance and global reach
- **BigQuery Integration**: Seamless data analytics integration
- **Sustained Use Discounts**: Automatic discounts for sustained usage on eligible machine types
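
To make the Spot VM savings concrete, the arithmetic can be sketched as below. All numbers are hypothetical inputs rather than real GCP prices, `estimate_job_cost` is an illustrative helper, and the estimate ignores the extra runtime that preemptions and restarts may add.

```python
def estimate_job_cost(vm_hours, on_demand_rate, spot_discount=0.0):
    """Estimate compute cost in dollars.

    vm_hours: total VM-hours consumed by the job
    on_demand_rate: on-demand price per VM-hour (hypothetical input)
    spot_discount: fractional discount for Spot capacity, between 0 and 1
    """
    if not 0.0 <= spot_discount <= 1.0:
        raise ValueError("spot_discount must be between 0 and 1")
    return vm_hours * on_demand_rate * (1.0 - spot_discount)

# Hypothetical job: 4 VMs for 10 hours at $0.20 per VM-hour
on_demand = estimate_job_cost(vm_hours=40, on_demand_rate=0.20)
spot = estimate_job_cost(vm_hours=40, on_demand_rate=0.20, spot_discount=0.7)
print(f"On-demand: ${on_demand:.2f}, Spot: ${spot:.2f}")
```

Current rates for a specific machine type and region are available from the GCP Pricing Calculator listed under Resources.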

### Resources

- [Google Cloud Compute Engine Documentation](https://cloud.google.com/compute/docs)
- [Google Kubernetes Engine Documentation](https://cloud.google.com/kubernetes-engine/docs)
- [Google Cloud Batch Documentation](https://cloud.google.com/batch/docs)
- [Vertex AI Documentation](https://cloud.google.com/vertex-ai/docs)
- [Google Cloud Storage Documentation](https://cloud.google.com/storage/docs)
- [GCP Pricing Calculator](https://cloud.google.com/products/calculator)
- [Clustrix Documentation](https://clustrix.readthedocs.io/)
- [Clustrix Cost Monitoring Tutorial](cost_monitoring_tutorial.ipynb)

**Remember**: Always monitor your cloud costs and clean up resources when not in use!