# 🌀 Week 11-12 · Notebook 08 · Advanced Orchestration with Kubernetes (GKE)

This notebook explores deploying the **Manufacturing Copilot** to Google Kubernetes Engine (GKE) for maximum control, advanced deployment strategies, and complex networking scenarios. This represents the most advanced deployment option for our capstone project.


## 🎯 Learning Objectives

- **Package with Helm:** Use Helm, the Kubernetes package manager, to create a reusable and configurable chart for deploying the copilot application.
- **Configure Advanced Autoscaling:** Implement the Horizontal Pod Autoscaler (HPA) to automatically scale the number of application pods based on CPU utilization or custom metrics.
- **Implement Canary Rollouts:** Use a progressive delivery controller like Argo Rollouts to safely release new versions by gradually shifting traffic and running automated analysis.
- **Secure with a Service Mesh:** Introduce Anthos Service Mesh (ASM) or Istio to enforce mutual TLS (mTLS) for secure pod-to-pod communication and to gain deep traffic telemetry.


## 🧩 Scenario: High-Availability, Multi-Service Deployment

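As the Manufacturing Copilot becomes business-critical, the requirements for its deployment have become more stringent. Cloud Run is excellent for simple services, but the operations team now needs more control over pod composition, rollout strategy, and in-cluster networking than it offers.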

**New Requirements:**
1.  **Complex Service Topologies:** The application is no longer a single container. It now includes a sidecar container for real-time retrieval, which must be deployed alongside the main API container in the same pod.
2.  **Zero-Downtime, Risk-Managed Rollouts:** New versions must be rolled out to a small subset of users first (a "canary" release). The system must automatically monitor key metrics (like hallucination rate) and roll back if they degrade.
3.  **Strict Security:** All network traffic *between* services inside the Kubernetes cluster must be encrypted (mTLS).
4.  **Cost-Effective Scaling:** The application must scale based on real-time demand, but the team also needs to prevent service disruption during voluntary maintenance (e.g., node upgrades), which requires a `PodDisruptionBudget`.


## 🧱 Packaging with Helm

Helm allows us to package all our Kubernetes manifests (`Deployment`, `Service`, `HorizontalPodAutoscaler`, etc.) into a single, versioned, and configurable "chart."

**Our Helm Chart Structure:**
```
charts/
└── manufacturing-copilot/
    ├── Chart.yaml          # Metadata about the chart (name, version)
    ├── values.yaml         # Default configuration values
    ├── templates/          # Directory for the Kubernetes manifest templates
    │   ├── _helpers.tpl    # Helper templates and functions
    │   ├── deployment.yaml # Defines the Deployment resource
    │   ├── service.yaml    # Defines the Service resource
    │   ├── hpa.yaml        # Defines the HorizontalPodAutoscaler
    │   └── pdb.yaml        # Defines the PodDisruptionBudget
    └── ...
```
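
The structure above lists `Chart.yaml`, which Helm requires before a chart can be packaged or installed. A minimal sketch of that file follows. The description, `version`, and `appVersion` values are placeholders assumed for illustration, not values taken from the course repository.

```python
from pathlib import Path
from textwrap import dedent

# Hypothetical chart metadata; adjust the version numbers to your own release scheme.
chart_yaml_content = dedent("""\
    apiVersion: v2
    name: manufacturing-copilot
    description: Helm chart for the Manufacturing Copilot API (illustrative)
    type: application
    version: 0.1.0
    appVersion: "0.1.0"
""")

# Write the file next to values.yaml so `helm package` can find it.
chart_dir = Path("helm_example/manufacturing-copilot")
chart_dir.mkdir(parents=True, exist_ok=True)
(chart_dir / "Chart.yaml").write_text(chart_yaml_content)

print("--- Chart.yaml ---")
print(chart_yaml_content)
```

`version` is the chart's own version, while `appVersion` describes the application it deploys; the `deployment.yaml` template falls back to `appVersion` when no image tag is set.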


In [None]:
# --- charts/manufacturing-copilot/values.yaml ---
# This file defines the default configuration for our Helm chart.
# Users can override these values during installation.

from pathlib import Path
from textwrap import dedent

values_yaml_content = dedent("""
# Default values for manufacturing-copilot chart.
replicaCount: 2

image:
  repository: us-central1-docker.pkg.dev/my-gcp-project/copilot-repo/copilot-api
  pullPolicy: IfNotPresent
  # tag is overridden at deploy time, e.g., with the Git SHA
  tag: "latest"

service:
  type: ClusterIP
  port: 8080

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 75

# PodDisruptionBudget ensures high availability during voluntary disruptions.
podDisruptionBudget:
  enabled: true
  minAvailable: 1

# Application-specific configuration
config:
  plantId: "PUNE-IN"
  logLevel: "INFO"
  enableTranslation: false

# Resource requests and limits for the container
resources:
  requests:
    cpu: "500m"
    memory: "1Gi"
  limits:
    cpu: "1"
    memory: "2Gi"
""")

# Create the file for demonstration
helm_dir = Path("helm_example/manufacturing-copilot")
helm_dir.mkdir(parents=True, exist_ok=True)
(helm_dir / "values.yaml").write_text(values_yaml_content)

print("--- values.yaml ---")
print(values_yaml_content)
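
To get a feel for what the `autoscaling` values mean in practice, here is a small sketch of the replica calculation described in the Kubernetes HPA documentation: `desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)`, clamped to the configured bounds. The function name is hypothetical, and the sketch deliberately ignores the HPA's tolerance band and stabilization window, so treat it as an approximation of the controller's behavior.

```python
import math

def desired_replicas(current_replicas, current_cpu_pct,
                     target_cpu_pct=75, min_replicas=2, max_replicas=10):
    """Approximate the HPA scaling decision for a CPU utilization target.

    Defaults mirror the autoscaling block in values.yaml.
    """
    raw = math.ceil(current_replicas * current_cpu_pct / target_cpu_pct)
    # Clamp the result to the minReplicas/maxReplicas bounds.
    return max(min_replicas, min(max_replicas, raw))

# Two pods running at 150% of their CPU request: scale out.
print(desired_replicas(2, 150))
# Four pods running at 30%: scale in, but never below minReplicas.
print(desired_replicas(4, 30))
# Eight pods running at 120%: the raw result exceeds maxReplicas and is capped.
print(desired_replicas(8, 120))
```

Utilization is measured against the container's CPU *request* (`500m` in our values), which is why the `resources.requests` block must be set for CPU-based autoscaling to work.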


### 🧩 `deployment.yaml` Template with a Sidecar

This Helm template defines the main `Deployment` resource. It uses values from `values.yaml` (like `{{ .Values.image.repository }}`) to make the manifest configurable.

Note the two containers defined: `api` (our main FastAPI app) and `retriever-sidecar`.

```yaml
# --- charts/manufacturing-copilot/templates/deployment.yaml ---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-copilot
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: {{ .Release.Name }}-copilot
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}-copilot
    spec:
      containers:
        # Main application container
        - name: api
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          env:
            - name: LOG_LEVEL
              value: {{ .Values.config.logLevel | quote }}
            - name: PLANT_ID
              value: {{ .Values.config.plantId | quote }}
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          resources:
            {{- toYaml .Values.resources | nindent 12 }}

        # Sidecar container for retrieval
        - name: retriever-sidecar
          image: "us-docker.pkg.dev/vertex-ai/vector-search-sidecar:latest"
          args:
            - "--db-uri=$(CHROMA_DB_URI)"
            - "--port=8081"
          env:
            - name: CHROMA_DB_URI
              valueFrom:
                secretKeyRef:
                  name: chroma-db-secret
                  key: uri
          ports:
            - name: grpc
              containerPort: 8081
```
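
The chart structure also lists `hpa.yaml` and `pdb.yaml`. The cell below sketches what those two templates could look like, wired to the `autoscaling` and `podDisruptionBudget` values and to the `app` label used by the Deployment. This is an illustrative sketch rather than the course's reference chart. It assumes the HPA targets the `Deployment`; if the workload is managed by an Argo Rollouts `Rollout` instead, `scaleTargetRef` would need to point at that resource.

```python
from pathlib import Path
from textwrap import dedent

hpa_yaml = dedent("""\
{{- if .Values.autoscaling.enabled }}
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: {{ .Release.Name }}-copilot
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ .Release.Name }}-copilot
  minReplicas: {{ .Values.autoscaling.minReplicas }}
  maxReplicas: {{ .Values.autoscaling.maxReplicas }}
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: {{ .Values.autoscaling.targetCPUUtilizationPercentage }}
{{- end }}
""")

pdb_yaml = dedent("""\
{{- if .Values.podDisruptionBudget.enabled }}
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: {{ .Release.Name }}-copilot
spec:
  minAvailable: {{ .Values.podDisruptionBudget.minAvailable }}
  selector:
    matchLabels:
      app: {{ .Release.Name }}-copilot
{{- end }}
""")

# Write both templates into the demonstration chart.
templates_dir = Path("helm_example/manufacturing-copilot/templates")
templates_dir.mkdir(parents=True, exist_ok=True)
(templates_dir / "hpa.yaml").write_text(hpa_yaml)
(templates_dir / "pdb.yaml").write_text(pdb_yaml)

print(hpa_yaml)
print(pdb_yaml)
```

With `minAvailable: 1` and `minReplicas: 2`, a node drain can evict one pod at a time while at least one replica keeps serving traffic.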


## 🔄 Progressive Delivery with Argo Rollouts

A standard Kubernetes `Deployment` offers only the `RollingUpdate` and `Recreate` strategies, with no built-in traffic weighting or metric-driven rollback. We will use **Argo Rollouts**, a Kubernetes controller that provides advanced deployment strategies like canary and blue-green.

**Key Features of our Canary Strategy:**
-   **Phased Rollout:** The new version first receives roughly 10% of traffic (`setWeight: 10`). Without a traffic router such as Istio, Argo Rollouts approximates this weight through the ratio of canary to stable pods, so small replica counts give only a coarse split.
-   **Automated Analysis:** After a 15-minute pause, the system automatically queries Prometheus to check our `copilot_hallucination_rate` metric.
-   **Automatic Rollback:** If the analysis fails (for example, because the canary's hallucination rate is higher than the stable version's), Argo Rollouts automatically aborts the rollout and scales the new version down to zero.
-   **Manual Promotion:** If the analysis passes, traffic to the canary increases to 50% and the rollout pauses indefinitely until an operator promotes it to 100%.


In [None]:
# --- templates/rollout.yaml (for Argo Rollouts) ---

argo_rollout_yaml = dedent("""
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: copilot-rollout
spec:
  replicas: {{ .Values.replicaCount }}
  strategy:
    canary:
      # Defines the two services that manage traffic to stable and canary pods
      canaryService: copilot-canary
      stableService: copilot-stable
      steps:
      - setWeight: 10  # 1. Send 10% of traffic to the new version
      - pause: { duration: 15m } # 2. Wait 15 minutes for metrics to collect
      
      # 3. Run an automated analysis against Prometheus
      - analysis:
          templates:
            - templateName: hallucination-check
          args:
            - name: service-name
              value: copilot-canary
      
      - setWeight: 50 # 4. If analysis passes, increase traffic to 50%
      - pause: {} # 5. Pause indefinitely for manual promotion
      
  # ... the rest of the pod template (selector, spec, etc.) is the same as the Deployment ...
""")

(helm_dir / "templates").mkdir(exist_ok=True)
(helm_dir / "templates" / "rollout.yaml").write_text(argo_rollout_yaml)

print("--- Argo Rollout Snippet ---")
print(argo_rollout_yaml)
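
The rollout references an `AnalysisTemplate` named `hallucination-check` that is not defined in this notebook. The following is a hedged sketch of one possible definition. The Prometheus address, the `service` label, and the 5% threshold are assumptions made for illustration, and the sketch checks the canary against a fixed threshold rather than comparing it with the stable version. Because Argo's `{{args.service-name}}` placeholder would clash with Helm's own template syntax, the file is written outside the chart's `templates/` directory and would be applied separately with `kubectl apply -f`.

```python
from pathlib import Path
from textwrap import dedent

analysis_template_yaml = dedent("""\
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: hallucination-check
spec:
  args:
    - name: service-name
  metrics:
    - name: hallucination-rate
      interval: 1m          # take one measurement per minute
      count: 5              # five measurements in total
      failureLimit: 1       # tolerate at most one failed measurement
      # Assumed threshold: pass while the average rate stays below 5%.
      successCondition: result[0] < 0.05
      provider:
        prometheus:
          # Assumed in-cluster Prometheus address.
          address: http://prometheus.monitoring.svc.cluster.local:9090
          query: |
            avg(copilot_hallucination_rate{service="{{args.service-name}}"})
""")

# Keep Argo-templated manifests out of the Helm templates directory.
analysis_dir = Path("helm_example/analysis")
analysis_dir.mkdir(parents=True, exist_ok=True)
(analysis_dir / "hallucination-check.yaml").write_text(analysis_template_yaml)

print("--- AnalysisTemplate Sketch ---")
print(analysis_template_yaml)
```

If any measurement beyond the `failureLimit` violates `successCondition`, the analysis run fails and the rollout is aborted at the `analysis` step.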


## 🛡️ Zero-Trust Security with a Service Mesh

A service mesh like **Anthos Service Mesh (ASM)**, which Google has since rebranded as Cloud Service Mesh, or open-source **Istio** provides a dedicated infrastructure layer for making service-to-service communication safe, fast, and reliable.

**How we will use it:**
1.  **Automatic mTLS:** The service mesh automatically injects a proxy sidecar into each of our pods. This proxy encrypts and decrypts all incoming and outgoing traffic, ensuring mutual TLS (mTLS) without any changes to our application code.
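2.  **Strict mTLS Enforcement:** We will apply a `PeerAuthentication` policy with `STRICT` mTLS mode, meaning workloads in the mesh reject unencrypted traffic. Fine-grained access control (which service may call which) is a separate concern, handled by `AuthorizationPolicy` resources.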
3.  **Traffic Telemetry:** The proxies collect detailed metrics, logs, and traces for all traffic, giving us deep visibility into how our services are communicating. This data can be visualized in tools like Kiali or the GCP console.
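
As a concrete illustration of the policy mentioned in step 2, the cell below writes a minimal Istio `PeerAuthentication` manifest. The `copilot` namespace and the output path are assumptions. Applying the policy in a single namespace scopes `STRICT` mode to that namespace only; Istio treats a policy in the mesh's root namespace as mesh-wide.

```python
from pathlib import Path
from textwrap import dedent

peer_authentication_yaml = dedent("""\
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: copilot   # assumed namespace for the copilot workloads
spec:
  mtls:
    mode: STRICT       # reject any plaintext traffic to these workloads
""")

# Written outside the chart; apply with `kubectl apply -f` once the mesh is installed.
mesh_dir = Path("helm_example/mesh")
mesh_dir.mkdir(parents=True, exist_ok=True)
(mesh_dir / "peer-authentication.yaml").write_text(peer_authentication_yaml)

print("--- PeerAuthentication Sketch ---")
print(peer_authentication_yaml)
```

A common migration path is to start with `mode: PERMISSIVE`, confirm in the telemetry that all traffic is already using mTLS, and only then switch to `STRICT`.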


## 🧮 Final Capstone Readiness Review

This notebook concludes the core technical topics of the course. Before starting the final capstone project, let's review the production-readiness of our proposed system.

| Area                        | Technology Stack                               | Status & Confidence Level |
| --------------------------- | ---------------------------------------------- | ------------------------- |
| **1. MLOps Foundation**     | MLflow                                         | ✅ High                   |
| **2. API Service**          | FastAPI, Pydantic                              | ✅ High                   |
| **3. Containerization**     | Docker (Multi-stage), Trivy                    | ✅ High                   |
| **4. Monitoring**           | Prometheus, Grafana                            | ✅ High                   |
| **5. CI/CD Pipeline**       | GitHub Actions                                 | ✅ High                   |
| **6. IaC (Infrastructure)** | Terraform                                      | ✅ High                   |
| **7. Deployment (Simple)**  | GCP Cloud Run                                  | ✅ High                   |
| **8. Deployment (Advanced)**| GKE, Helm, Argo Rollouts, Service Mesh         | ⚠️ Medium (Complex)        |

**Conclusion:** We have covered all the necessary components to build and deploy a production-grade GenAI application. The choice between Cloud Run (simpler, serverless) and GKE (more control, more complex) for the final deployment will depend on the specific requirements of the capstone project.


## 🧪 Lab Assignment: Deploy to GKE with Helm

1.  **Create a GKE Cluster:**
    -   Use the `gcloud` CLI to create a regional GKE cluster in your GCP project. Enable Workload Identity and, if possible, Anthos Service Mesh.

2.  **Install Helm and Argo Rollouts:**
    -   Install the Helm CLI on your local machine.
    -   Follow the Argo Rollouts documentation to install the controller and its associated CRDs into your GKE cluster.

3.  **Package and Deploy the Chart:**
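    -   From your `helm_example` directory, run `helm package ./manufacturing-copilot` to create a versioned chart archive (`.tgz` file). `helm package` requires a `Chart.yaml` in the chart directory, so add one if it is not already there.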
    -   Use `helm install <release-name> <chart-archive>.tgz --set image.tag=<your-git-sha>` to deploy the application to your cluster.

4.  **Trigger and Observe a Rollout:**
    -   Update the image tag in your `values.yaml` or via the `--set` flag and run `helm upgrade ...`.
    -   Use the Argo Rollouts kubectl plugin (`kubectl argo rollouts get rollout <name>`) to watch the canary progression.
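    -   Simulate a failed release by aborting the rollout mid-canary (`kubectl argo rollouts abort <name>`) and confirm that traffic returns to the stable version. Then retry it (`kubectl argo rollouts retry rollout <name>`) and, once you are confident it is stable, promote it manually (`kubectl argo rollouts promote <name>`).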
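
For step 3 and step 4, it can help to script the Helm commands so the image tag is always passed explicitly. The helper functions below are hypothetical and only build the command line; they do not run it. The release name, namespace, archive name, and Git SHA in the example are placeholders.

```python
import shlex

def helm_package_cmd(chart_dir):
    """Build the command that packages a chart directory into a .tgz archive."""
    return ["helm", "package", chart_dir]

def helm_upgrade_cmd(release, chart, image_tag, namespace="copilot"):
    """Build an idempotent install-or-upgrade command with an explicit image tag."""
    return [
        "helm", "upgrade", "--install", release, chart,
        "--namespace", namespace, "--create-namespace",
        "--set", f"image.tag={image_tag}",
    ]

package_cmd = helm_package_cmd("./manufacturing-copilot")
upgrade_cmd = helm_upgrade_cmd(
    release="copilot",                          # placeholder release name
    chart="manufacturing-copilot-0.1.0.tgz",    # placeholder archive name
    image_tag="abc1234",                        # placeholder Git SHA
)

# Print the commands for review; pass the lists to subprocess.run(..., check=True) to execute.
print(shlex.join(package_cmd))
print(shlex.join(upgrade_cmd))
```

Using `helm upgrade --install` means the same command works for both the first deployment and every later rollout, which keeps the lab steps and a CI job identical.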


## ✅ Checklist for this Notebook

- [X] Helm chart created to package all Kubernetes manifests for the application.
- [X] Advanced deployment concepts like HPA and PodDisruptionBudgets are defined.
- [X] A canary release strategy using Argo Rollouts is designed for safe, progressive delivery.
- [X] The role of a service mesh for enforcing mTLS and providing telemetry is understood.
- [ ] **TODO:** Complete the Lab Assignment to get hands-on experience deploying a Helm chart to a GKE cluster.


## 📚 References and Further Reading

-   [Helm Documentation](https://helm.sh/docs/)
-   [Google Kubernetes Engine (GKE) Documentation](https://cloud.google.com/kubernetes-engine/docs)
-   [Argo Rollouts Documentation](https://argo-rollouts.readthedocs.io/en/stable/) - The official guide for advanced deployment strategies.
-   [Istio Documentation](https://istio.io/latest/docs/) (The open-source foundation for Anthos Service Mesh)
-   [Kubernetes PodDisruptionBudget](https://kubernetes.io/docs/tasks/run-application/configure-pdb/)
