<a href="https://colab.research.google.com/github/NeetishPathak/colab-notebooks/blob/main/final_project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Autonomous DevOps: Agentic Workflow for Cloud Infrastructure Deployment

#### Final Project Submission for TECH 16: LLMs for Business with Python, Summer 2025

#### Author: Neetish Pathak



## 🌐 Introduction

#### This notebook demonstrates a CrewAI-based agentic workflow for deploying a Redis data store on a Google Cloud Platform (GCP) virtual machine running Kubernetes. The system simulates autonomous, collaborative infrastructure automation by assigning CrewAI agents to assess capacity requirements, configure Redis, generate manifests, and execute each stage of deployment—triggered in response to a user prompt specifying the target workload.
<br/>

---

## 🎯 Key Objectives

1. Architect the Redis data service based on workload requirements.

2. Pre-ground the LLM using Redis deployment documentation via a lightweight Retrieval-Augmented Generation (RAG) pipeline to generate architecture-aware manifests.

3. Provision a virtual machine (VM) on Google Cloud Platform (GCP) using Terraform.

4. Validate VM readiness by testing SSH accessibility.

5. Install Docker on the VM (required by Kind).

6. Install Kind and kubectl for Kubernetes cluster setup.

7. Create a local Kubernetes cluster on the VM using Kind.

8. Deploy Redis using a minimal, resource-efficient Kubernetes manifest.  

<br/>

---

## 🔧 Pre-Deployment Phase
Before initiating the agentic workflow to provision infrastructure, we first prepare the Colab notebook environment with the necessary tools:

1. Install Google Cloud SDK (gcloud) to enable authentication and API access.

2. Install Terraform to provision GCP resources via infrastructure-as-code.

3. Download and register the Redis.pdf file, which serves as the foundational source for building a Retrieval-Augmented Generation (RAG) system. This document will be parsed and indexed by the PDFSearchTool, enabling semantic lookups during Redis architecture planning.


<br/>


## 🧠 LLM, RAG and Agentic Workflow

### 1. 📖 LLM Grounding and Lightweight RAG Construction

1. We leverage CrewAI's default gpt-4o-mini model to power agent reasoning and response generation throughout the workflow.
2. The uploaded `Redis.pdf` document serves as the authoritative source for Redis deployment patterns.
3. The document is embedded and indexed using the `PDFSearchTool` from the CrewAI framework.

```python
pdf_file = "Redis.pdf"
pdf_rag_search_tool = PDFSearchTool(pdf_file)
```

This tool enables agents to query Redis-specific deployment practices (developed by domain experts) dynamically during YAML generation.

A lightweight Retrieval-Augmented Generation (RAG) system is initialized at this step, enabling in-context retrieval for grounding the architecture decisions.

<br/>


### 2. 📂 Manifest Generation and Infra Deployment Agents
In addition to standard CrewAI tools, several custom tools are registered to handle different stages of the infrastructure lifecycle:

1.  **Provisioning a GCP VM:** Using Terraform to create a new virtual machine instance.
2.  **Validating VM Readiness:** Ensuring the VM is running and accessible via SSH.
3.  **Installing Docker:** Setting up Docker on the provisioned VM, as it's a prerequisite for Kind.
4.  **Installing Kind and Kubectl:** Installing the necessary tools for creating and managing Kubernetes clusters.
5.  **Creating a Kind Cluster:** Deploying a local Kubernetes cluster on the VM using Kind.
6.  **Deploying Redis:** Applying a Kubernetes manifest to deploy a Redis deployment and service within the Kind cluster.

The system utilizes a crew of specialized agents, each with specific roles and tools, to execute these tasks sequentially, providing a clear example of how autonomous agents can collaborate to achieve complex infrastructure deployment goals.


<br/>


### 3. 🤖 Crew AI
The full CrewAI orchestration follows a sequential process to achieve the key objectives

## Agentic Workflow Diagram

Here's a simplified flow diagram illustrating the sequence of tasks and the agents involved in this deployment workflow, using text-based boxes and arrows:

```bash
+----------------------------+       +----------------------------+
|          🌀 Start          | -->   | 📖 Redis RAG from PDF      |
+----------------------------+       | (Agent: redis_doc_loader)  |
                                     +----------------------------+
                                                 |
                  -------------------------------                         
                  |
                  V
+--------------------------------+               
| 🏢 Redis Architecture Plan     |     +--------------------------------+
| (Agent: infra_architect_agent) | --> | 🔢 Generate redis.yaml         |
+--------------------------------+     | (Agent: manifest_writer_agent) |
                                        +--------------------------------+
                                                      |
                  ------------------------------------                       
                  |
                  V
+----------------------------+       +-----------------------------------+
| 📀 Provision GCP VM        |       | 🚪 Validate VM Readiness          |
| (Agent: infra_vm_agent)    |---->  | (Agent: infra_vm_readiness_agent) |
+----------------------------+       +-----------------------------------+
                                                      |
              ----------------------------------------   
              |
              v                                   
+-------------------------------+     +-----------------------------------+
| 🛠️ Install Docker             | --> | 🧰 Install Kind & Kubectl         |
| (Agent: infra_docker_agent)   |     | (Agent: infra_kind_k8s_installer) |
+-------------------------------+     +-----------------------------------+
                                                      |
              ----------------------------------------                         
              |
              v                                      
+----------------------------------+      +--------------------------------+
| 🏢 Create Kind Kubernetes Cluster| -->  | 🚀 Deploy Redis.               |
| (Agent: infra_kind_k8s_creator)  |      | infra_redis_deployer)          |
+----------------------------------+.     +--------------------------------+   
                                                      |
                                                      v
                                          +--------------------------------+
                                          |           🚤 End               |
                                          +--------------------------------+
```


## Pre-Steps

In [75]:
%pip install crewai crewai[tools] openai termcolor



#### Required Secrets



In [76]:
# 🔐 Required Secrets Setup (via google.colab.userdata)
# These secrets are required for this notebook to function:
# - OPENAI_API_KEY: Used to authenticate with OpenAI's API for LLM-powered DevOps workflows.
# - MY_GCP_PROJECT: Your Google Cloud Project ID, required for deploying cloud resources.

from google.colab import userdata
import os

# Retrieve secrets securely from Colab's in-memory storage
openai_api_key = userdata.get('OPENAI_API_KEY')
gcp_project = userdata.get('MY_GCP_PROJECT')

# Ensure the secrets are set before continuing
if not openai_api_key or not gcp_project:
    raise ValueError("❌ Please set OPENAI_API_KEY and MY_GCP_PROJECT using `userdata.set` before running this cell.")

# Store them as environment variables for downstream libraries
os.environ['OPENAI_API_KEY'] = openai_api_key
os.environ['MY_GCP_PROJECT'] = gcp_project

print("✅ Secrets loaded successfully.")

✅ Secrets loaded successfully.


In [77]:
!wget https://raw.githubusercontent.com/NeetishPathak/colab-notebooks/main/Redis_Deployment_Guide.pdf -O Redis.pdf

--2025-07-31 16:48:11--  https://raw.githubusercontent.com/NeetishPathak/colab-notebooks/main/Redis_Deployment_Guide.pdf
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 36457 (36K) [application/octet-stream]
Saving to: ‘Redis.pdf’


2025-07-31 16:48:11 (4.99 MB/s) - ‘Redis.pdf’ saved [36457/36457]



In [78]:
import os
from google.colab import userdata
os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')

### GCP Setup for Agents to use later

In [79]:
## We use GCP to setup a vm . Some presteps to setup a GCP project resources

# Install the CLI
!apt-get install -y lsb-release
!curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
!echo "deb http://packages.cloud.google.com/apt cloud-sdk main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
!apt-get update -q
!apt-get install -y google-cloud-sdk

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
lsb-release is already the newest version (11.1.0ubuntu4).
0 upgraded, 0 newly installed, 0 to remove and 36 not upgraded.
W: Target Packages (main/binary-amd64/Packages) is configured multiple times in /etc/apt/sources.list.d/google-cloud-sdk.list:1 and /etc/apt/sources.list.d/google-cloud-sdk.list:2
W: Target Packages (main/binary-all/Packages) is configured multiple times in /etc/apt/sources.list.d/google-cloud-sdk.list:1 and /etc/apt/sources.list.d/google-cloud-sdk.list:2
OK
deb http://packages.cloud.google.com/apt cloud-sdk main
Hit:1 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease
Get:2 https://apt.releases.hashicorp.com jammy InRelease [12.9 kB]
Hit:3 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease
Hit:4 http://archive.ubuntu.com/ubuntu jammy InRelease
Get:5 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB]
Hi

In [80]:
## Setup terraform
!wget -O - https://apt.releases.hashicorp.com/gpg | gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
!echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(grep -oP '(?<=UBUNTU_CODENAME=).*' /etc/os-release || lsb_release -cs) main" | tee /etc/apt/sources.list.d/hashicorp.list
!apt update && apt install terraform
!terraform version

gpg: cannot open '/dev/tty': No such device or address
--2025-07-31 16:48:39--  https://apt.releases.hashicorp.com/gpg
Resolving apt.releases.hashicorp.com (apt.releases.hashicorp.com)... 3.165.160.125, 3.165.160.82, 3.165.160.119, ...
Connecting to apt.releases.hashicorp.com (apt.releases.hashicorp.com)|3.165.160.125|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3980 (3.9K) [binary/octet-stream]
Saving to: ‘STDOUT’

-                     0%[                    ]       0  --.-KB/s               -                     0%[                    ]       0  --.-KB/s    in 0s      


Cannot write to ‘-’ (Success).
deb [arch=amd64 signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com jammy main
Hit:1 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease
Hit:2 https://apt.releases.hashicorp.com jammy InRelease
Hit:3 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease
Hit:4 http

In [81]:
# Authenticate the user to use GCP
from google.colab import auth
auth.authenticate_user()

In [82]:
my_gcp_project_name = userdata.get('MY_GCP_PROJECT')
!gcloud config set project {my_gcp_project_name}

!gcloud services enable compute.googleapis.com

Updated property [core/project].


In [None]:
!gcloud iam service-accounts create terraform-agent --display-name="Terraform Agent"

for role in [
    "roles/compute.admin",
    "roles/iam.serviceAccountUser",
    "roles/serviceusage.serviceUsageAdmin"
]:
    !gcloud projects add-iam-policy-binding {my_gcp_project_name} \
        --member="serviceAccount:terraform-agent@{my_gcp_project_name}.iam.gserviceaccount.com" \
        --role="{role}" --quiet

!gcloud iam service-accounts keys create terraform-agent-key.json \
    --iam-account=terraform-agent@{my_gcp_project_name}.iam.gserviceaccount.com

In [84]:
import os

# Replace this with your actual GCP project name
my_gcp_project_name = userdata.get('MY_GCP_PROJECT')

# Define the service account email
sa_email = f"terraform-agent@{my_gcp_project_name}.iam.gserviceaccount.com"

# Step 1: Check if service account exists
!gcloud iam service-accounts list --filter="email:{sa_email}" --format="value(email)" | grep -q {sa_email} || \
    gcloud iam service-accounts create terraform-agent --display-name="Terraform Agent"

# Step 2: Bind roles if not already bound (no built-in deduplication, so it's safe to re-run)
for role in [
    "roles/compute.admin",
    "roles/iam.serviceAccountUser",
    "roles/serviceusage.serviceUsageAdmin"
]:
    os.system(f'''
    gcloud projects add-iam-policy-binding {my_gcp_project_name} \
        --member="serviceAccount:{sa_email}" \
        --role="{role}" --quiet
    ''')

# Step 3: Create service account key if it doesn't already exist
key_file = "terraform-agent-key.json"
if not os.path.exists(key_file):
    !gcloud iam service-accounts keys create {key_file} --iam-account={sa_email}
else:
    print(f"✅ Key file '{key_file}' already exists. Skipping creation.")


created key [eeae5a908e489f26395642ea783c2a6e8fec771c] of type [json] as [terraform-agent-key.json] for [terraform-agent@my-user-admin-project.iam.gserviceaccount.com]


## Sample Redis Yaml

## CrewAi Agentic Workflow

### Creating Custom Tools

In [85]:
# Terraform Config + CrewAI Tooling: Provision GCP VM

# --- Terraform Config (main.tf) ---
main_tf = """
provider "google" {
  project     = var.project
  region      = var.region
  zone        = var.zone
  credentials = file(var.credentials_file)
}

resource "google_compute_instance" "docker_vm" {
  name         = var.instance_name
  machine_type = var.machine_type
  zone         = var.zone

  boot_disk {
    initialize_params {
      image = "ubuntu-os-cloud/ubuntu-2204-lts"
      size  = 30
    }
  }

  network_interface {
    network = "default"
    access_config {}
  }
}
"""

# --- Terraform Variables (variables.tf) ---
variables_tf = """
variable "project" {}
variable "region" { default = "us-central1" }
variable "zone" { default = "us-central1-a" }
variable "credentials_file" { default = "account.json" }
variable "instance_name" { default = "docker-agent-vm" }
variable "machine_type" { default = "e2-micro" }
"""

# --- CrewAI Tool: Terraform VM Creator ---
import subprocess
import os, time, stat
from crewai.tools import BaseTool

class GCPVMTool(BaseTool):
    def __init__(self, name="GCPVMTool", description="Creates a GCP VM using Terraform if it doesn't already exist", **kwargs):
        super().__init__(name=name, description=description, **kwargs)

    def _run(self, input_type: dict) -> str:
        try:
            from google.colab import userdata  # Only available in Colab
            tf_path="terraform"

            def remove_readonly(func, path, _):
              # Change the file to be writable, then delete
              os.chmod(path, stat.S_IWRITE)
              func(path)

            if os.path.exists(tf_path):
              try:
                  shutil.rmtree(tf_path, onerror=remove_readonly)
              except Exception as e:
                  return f"❌ Failed to remove existing terraform directory: {e}"


            os.makedirs("terraform", exist_ok=True)

            # Write Terraform files
            with open(os.path.join(tf_path, "main.tf"), "w") as f:
                f.write(main_tf)
            with open(os.path.join(tf_path, "variables.tf"), "w") as f:
                f.write(variables_tf)

            project_id = userdata.get("MY_GCP_PROJECT")

            if not project_id:
                return "❌ Missing required secrets: MY_GCP_PROJECT"

            # Load service account secrets from
            key_file_path = "terraform-agent-key.json"

            if not os.path.exists(key_file_path):
                return f"❌ Key file '{key_file_path}' does not exist."

            with open(key_file_path, "r") as f:
                key_data = f.read().strip()

            if not key_data or not key_data.startswith('{'):
                return "❌ Key file is empty or not a valid JSON key."

            credentials_path = os.path.join(tf_path, "account.json")
            with open(credentials_path, "w") as f:
                f.write(key_data)

            # Create terraform.tfvars
            with open(os.path.join(tf_path, "terraform.tfvars"), "w") as f:
                f.write(f"""
project = "{project_id}"
region = "us-central1"
zone = "us-central1-a"
credentials_file = "account.json"
instance_name = "docker-agent-vm"
machine_type = "e2-medium"
""")
            apply_init = subprocess.run(["terraform", "init"], cwd=tf_path, check=True, capture_output=True, text=True)
            if apply_init.returncode != 0:
                return f"❌ Terraform init failed:\nSTDOUT:\n{apply_init.stdout}\nSTDERR:\n{apply_init.stderr}"

            # Plan with exit code check
            plan_proc = subprocess.run(["terraform", "plan", "-detailed-exitcode"], cwd=tf_path, capture_output=True, text=True)

            if plan_proc.returncode == 0:
                return "✅ VM already exists — Terraform reports no changes needed."
            elif plan_proc.returncode == 1:
                return f"❌ Terraform plan failed:\n{plan_proc.stderr}"
            elif plan_proc.returncode == 2:
                # Proceed to apply
                apply_proc = subprocess.run(["terraform", "apply", "-auto-approve"], cwd=tf_path, capture_output=True, text=True)
                if "already exists" in apply_proc.stderr.lower():
                    return "✅ VM already exists — Terraform reports no changes needed."
                if apply_proc.returncode != 0:
                    return f"❌ Terraform apply failed:\n{apply_proc.stderr}"
                return "✅ VM created successfully via Terraform."

            return "❌ Unknown terraform plan exit code — something went wrong."

        except subprocess.CalledProcessError as e:
            return f"❌ Terraform failed: {e}"
        except Exception as e:
            return f"❌ Unexpected error: {str(e)}"

In [86]:
import subprocess
import os, time
from crewai.tools import BaseTool

class DockerInstallTool(BaseTool):
    def __init__(self, name="DockerInstaller", description="Installs and validates Docker on an existing GCP VM.", **kwargs):
        super().__init__(name=name, description=description, **kwargs)

    def _run(self, input_type: dict) -> str:
        try:
            # Check if Docker is already installed
            check_cmd = [
                "gcloud", "compute", "ssh", "docker-agent-vm", "--zone", "us-central1-a",
                "--command", "docker --version"
            ]
            result = subprocess.run(check_cmd, capture_output=True, text=True)
            if result.returncode == 0:
                return f"✅ Docker already installed: {result.stdout.strip()}"

            # Install Docker
            install_cmd = [
                "gcloud", "compute", "ssh", "docker-agent-vm", "--zone", "us-central1-a",
                "--command", "curl -fsSL https://get.docker.com | sh", "--quiet"
            ]
            install_result = subprocess.run(install_cmd, capture_output=True, text=True)
            if install_result.returncode != 0:
                return f"❌ Docker installation failed:\n{install_result.stderr}"

            return "✅ Docker installed successfully."
        except Exception as e:
            return f"❌ Docker installation error: {str(e)}"

In [87]:
import subprocess
from crewai.tools import BaseTool

class VMReadinessValidator(BaseTool):
    def __init__(self, name="VMReadinessValidator", description="Checks if the GCP VM is running and SSH-ready.", **kwargs):
        super().__init__(name=name, description=description, **kwargs)

    def _run(self, input_type: dict) -> str:
        try:
            vm_name = "docker-agent-vm"
            zone = "us-central1-a"


            # Check VM status
            status_cmd = [
                "gcloud", "compute", "instances", "describe", vm_name,
                "--zone", zone,
                "--format=value(status)"
            ]
            status = subprocess.check_output(status_cmd).decode().strip()
            if status != "RUNNING":
                return f"❌ VM exists but is not ready (status: {status})"


            # Wait buffer before SSH attempt
            wait_time = 10  # seconds
            print(f"⏳ Waiting {wait_time} seconds before checking VM SSH readiness...")
            time.sleep(wait_time)


            # Try a lightweight SSH check
            ssh_check_cmd = [
                "gcloud", "compute", "ssh", vm_name,
                "--zone", zone,
                "--command", "echo READY",
                "--quiet"
            ]
            result = subprocess.run(ssh_check_cmd, capture_output=True, text=True)
            if result.returncode != 0 or "READY" not in result.stdout:
                return f"❌ VM is running but SSH failed:\n{result.stderr}"

            return "✅ VM is running and SSH-ready."

        except subprocess.CalledProcessError as e:
            return f"❌ Error checking VM readiness: {e.stderr}"
        except Exception as e:
            return f"❌ Unexpected error during readiness check: {str(e)}"

In [88]:
# Custom CrewAI Tools
import subprocess
from crewai.tools import BaseTool

class KindK8sInstallerTool(BaseTool):
    def __init__(self, name="KindK8sInstaller", description="Installs Kind akd Kubectl on a GCP VM via SSH", **kwargs):
        super().__init__(name=name, description=description, **kwargs)

    def _run(self, input_type: dict) -> str:
        try:
            vm_name = "docker-agent-vm"
            zone = "us-central1-a"

            # Step 1: Check if kind is already installed
            check_cmd = [
                "gcloud", "compute", "ssh", vm_name, "--zone", zone,
                "--command", "which kind && kind --version && which kubectl && kubectl version --client",
                "--quiet"
            ]
            check = subprocess.run(check_cmd, capture_output=True, text=True)
            if check.returncode == 0:
                return f"✅ kind and kubectl are already installed:\n{check.stdout.strip()}"

            install_and_create_script = f"""
echo '☸️ Installing kind...'
curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.29.0/kind-linux-amd64 && chmod +x ./kind && sudo mv ./kind /usr/local/bin/kind
echo '☸️ Installing kubectl...'
curl -LO https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl && chmod +x ./kubectl && sudo mv ./kubectl /usr/local/bin/kubectl
"""
            # Step 2: Download and install kind
            install_cmd = [
                "gcloud", "compute", "ssh", vm_name, "--zone", zone,
                "--command", install_and_create_script,
                "--quiet"
            ]
            install = subprocess.run(install_cmd, capture_output=True, text=True)
            if install.returncode != 0:
                return f"❌ Failed to install kind and kubectl:\n{install.stderr}"

            # Step 3: Verify installation
            verify_cmd = [
                "gcloud", "compute", "ssh", vm_name, "--zone", zone,
                "--command", "kind --version && kubectl version --client",
                "--quiet"
            ]
            verify = subprocess.run(verify_cmd, capture_output=True, text=True)
            if verify.returncode != 0:
                return f"⚠️ kind and kubectl install ran, but verification failed:\n{verify.stderr}"

            return f"✅ Kind and kubectl installed successfully:\n{verify.stdout.strip()}"

        except Exception as e:
            return f"❌ Kind and kubectl installation error: {str(e)}"

In [89]:
from crewai.tools import BaseTool
import subprocess

class KindClusterDeployerTool(BaseTool):
    def __init__(self, name="KindClusterDeployer", description="Creates a Kubernetes cluster using Kind on a GCP VM.", **kwargs):
        super().__init__(name=name, description=description, **kwargs)

    def _run(self, input_type: dict) -> str:
        try:
            with open("kind-config.yaml", "w") as f:
              f.write("""
          kind: Cluster
          apiVersion: kind.x-k8s.io/v1alpha4
          nodes:
            - role: control-plane
            - role: worker
            - role: worker
          """)

             # Step 1: Copy kind config to VM
            scp_config = subprocess.run([
                "gcloud", "compute", "scp", "kind-config.yaml",
                "docker-agent-vm:~/kind-config.yaml",
                "--zone", "us-central1-a",
                "--quiet"
            ], capture_output=True, text=True)

            cluster_name = "redis-cluster"
            create_cmd = [
                "gcloud", "compute", "ssh", "docker-agent-vm",
                "--zone", "us-central1-a",
                "--command", f"echo '☸️ Creating Kind cluster'; kind create cluster --name {cluster_name} --config kind-config.yaml",
                "--quiet"
            ]
            result = subprocess.run(create_cmd, capture_output=True, text=True)
            if "already exist" in result.stderr.lower():
                return f"✅ Kind cluster '{cluster_name}' already exists. Ok to Proceed"
            if result.returncode != 0:
                return f"❌ Failed to create Kind cluster:\n{result.stderr}"

            return f"✅ Kind cluster '{cluster_name}' created successfully:\n{result.stdout}"
        except Exception as e:
            return f"❌ Error during Kind cluster creation: {str(e)}"

In [90]:
from crewai.tools import BaseTool
import subprocess

class RedisDeployerTool(BaseTool):
    def __init__(self, name="RedisDeployer", description="Deploys a Redis pod/service using redis.yaml into Kind cluster.", **kwargs):
        super().__init__(name=name, description=description, **kwargs)

    def _run(self, input_type: dict) -> str:
        try:
            # Step 1: Copy redis.yaml to remote VM
            scp_cmd = [
                "gcloud", "compute", "scp", "redis.yaml",
                "docker-agent-vm:~/redis.yaml",
                "--zone", "us-central1-a",
                "--quiet"
            ]
            scp_result = subprocess.run(scp_cmd, capture_output=True, text=True)
            if scp_result.returncode != 0:
                return f"❌ Failed to copy redis.yaml to VM:\n{scp_result.stderr}"

            # Step 2: Apply it to Kind
            apply_cmd = [
                "gcloud", "compute", "ssh", "docker-agent-vm",
                "--zone", "us-central1-a",
                "--command", "echo '🔴 Deploying Redis to Kind'; kubectl apply -f redis.yaml",
                "--quiet"
            ]
            apply_result = subprocess.run(apply_cmd, capture_output=True, text=True)
            if apply_result.returncode != 0:
                return f"❌ Failed to apply redis.yaml:\n{apply_result.stderr}"

            return f"✅ Redis deployed successfully:\n{apply_result.stdout}"

        except Exception as e:
            return f"❌ Error during Redis deploy: {str(e)}"


### Tools List

In [91]:
# Tools List

# Crew AI Tools

from crewai_tools import FileWriterTool, PDFSearchTool
from pathlib import Path
import os
import platform
import subprocess
import shutil
import urllib.request


# PDF Search Tool
pdf_file = "Redis.pdf"
pdf_rag_search_tool = PDFSearchTool(pdf_file)

# File Write Tools
file_writer_tool = FileWriterTool()

# Custom Tools
gcp_vm_tool = GCPVMTool(description="Deploys a VM in GCP.")
gcp_vm_readiness_tool = VMReadinessValidator(description="Validates that VM is ready for deployment")
gcp_docker_tool = DockerInstallTool(description="Installs Docker on an existing GCP VM.")
kind_installer_tool = KindK8sInstallerTool(description="Installs Kind and kubectl Tool for Kubernetes deployments.")
kind_k8s_creator_tool = KindClusterDeployerTool(description="Creates a Kubernetes cluster using Kind on a GCP VM.")
redis_deployer_tool = RedisDeployerTool(description="Deploys a Redis deployment/service using redis.yaml into Kind cluster.")

  util.warn_deprecated(


### Agents

In [100]:
# CrewAI Agents

from crewai import Agent

## Agents
architect = Agent(
  role="{data_store} System Architect",
  goal="Design optimal {data_store} cluster configurations tailored to user workloads and environments.",
  backstory=("You are a principal system architect specializing in distributed data stores, particularly {data_store}. "
        "You are an expert in {data_store} internals and Kubernetes-ready deployments."
        "Use the tool to get the guidance on redis configuration. Do not use external knowledge beyond what is contained in the pdf file {pdf_file} "
        "Pass your question as a plain string to the 'query' parameter — for example: "
        "{ \"query\": \" Provide a kubernetes compatible Redis deployment and service in yaml format for 100 ops/sec workload to deploy in a kubernetes cluster. "
        "Access pattern is 80% reads , 20% writes . Be mindful of the cpu and memory budget on the vm. Deployment's cpu and memory requests/limits should be resonablly lower than vm's limits. It must be always 50% less than available cpu and memory budget on the vm\" }. "
        "Do not include 'description' or 'type' keys. Do not wrap the string inside another dictionary."
        "Just say, your capacity is limited to certain use cases only at the moment if the query is outside the scope of the pdf."),
  verbose=False,
  tools=[pdf_rag_search_tool],
  allow_delegation=False
)


manifest_writer_agent = Agent(
  role="{data_store} manifest Writer",
  goal="Vets and writes yaml configurations to the target directory.",
  backstory=("You are a dedicated writer agent specializing in {data_store} configurations. "
        "You leverage the File Writer Tool to create and update configuration files as needed."),
  verbose=False,
  tools=[file_writer_tool],
  allow_delegation=False,
)

infra_vm_agent = Agent(
    role="GCP VM Creator ",
    goal="Provision a GCP VM using Terraform.",
    backstory=("You are an experienced infra admin responsible for setting up compute vm on Google Cloud Platform using terraforms"
              "You ensure the vm is properly setup"
              ),
    verbose=False,
    tools=[gcp_vm_tool],
    allow_delegation=False,
    max_iterations=5
)

infra_vm_readiness_agent = Agent(
    role="GCP VM Readiness Agent ",
    goal="Checks the readiness of a GCP VM for ssh and future deployments",
    backstory=("You are an experienced infra admin responsible for validating the ssh readiness of a compute vm on Google Cloud Platform "
              "You ensure the vm is properly running and ready to ssh to install other tools"
              ),
    verbose=False,
    tools=[gcp_vm_readiness_tool],
    allow_delegation=False,
    max_iterations=5
)

infra_docker_agent = Agent(
    role="Docker Installer",
    goal="Provision Docker on a GCP VM.",
    backstory=("You are an experienced infra admin responsible for installing docker on a GCP VM"
              "You ensure docker is installed which is important for kind k8s cluster to be installed later"
              ),
    verbose=False,
    tools=[gcp_docker_tool],
    allow_delegation=False
)

infra_kind_k8s_installer = Agent(
  role="kind and kubectl cluster installer",
  goal = "Installs the kind and kubectl tool and ensure the environment is ready for K8s cluster creation deployment.",
  backstory=("You are an experienced infra admin responsible for setting up and managing the kind cluster. "
        "You ensure that the environment is properly configured and ready for deployment of {data_store} configurations."),
  verbose=False,
  tools=[kind_installer_tool],
  allow_delegation=False
)

infra_kind_k8s_creator = Agent(
  role="kind kubernetes cluster creator",
  goal = "Creates the kind kubernetes cluster setup and ensure the environment is ready for redis deployment.",
  backstory=("You are an experienced infra admin responsible for setting up kubernetes cluster using kind."
        "You ensure that the environment is properly configured and ready for deployment of {data_store} configurations."),
  verbose=False,
  tools=[kind_k8s_creator_tool],
  allow_delegation=False
)

infra_redis_deployer = Agent(
  role="Redis deployer",
  goal = "Creates the redis deployment and service in the kuberntes kind on the gcp vm",
  backstory=("You are an experienced infra admin responsible for deploying redis deployment and service on kind kubernetes cluster."
        "You ensure that the redis is properly deployed "),
  verbose=False,
  tools=[redis_deployer_tool],
  allow_delegation=False
)


### Tasks

In [104]:
from crewai import Task

## Tasks

# ----------------------------------------------------
# Define the architect's task to suggest a cluster design
# ----------------------------------------------------
design_cluster = Task(
    description=(
        "Design an optimal {data_store} architecture for this workload:\n"
        "{workload_description}\n\n"
        "Respond in this exact format:\n"
        "---\n"
        "Architecture Summary:\n<brief text>\n\n"
        "---\n"
        "Key Decisions:\n- <bullet1>\n- <bullet2>\n\n"
        "```yaml\n<valid Kubernetes YAML>\n```\n"
        "Do not add any extra commentary."
    ),
    expected_output="A summary, key decisions, and a valid Kubernetes YAML block.",
    agent=architect,
)

# ----------------------------------------------------
# Define the writer's task to create a manifest writer
# ----------------------------------------------------
write_manifest = Task(
    description=(
        "Validate the Kubernetes YAML and write it to `{data_store}.yaml`. Keep name of the file `{data_store}.yaml` in lower case. \n"
        "Use the File Writer Tool.\n"
        "If the file exists, overwrite it.\n"
        "**Important**: The YAML must be valid and ready for `kubectl apply`."
    ),
    expected_output="Confirmation that the YAML was successfully written.",
    agent=manifest_writer_agent,
    context=[design_cluster]
)

# ----------------------------------------------------
# Define the Infra's setup task to create a vm
# ----------------------------------------------------
infra_vm_setup = Task(
    description=(
        "Set up a vm on gcp \n"
        "Use the infra_vm_agent Tool. Pass a directory as a string to store terraform configurations\n"
        "If already set up, confirm readiness.\n"
    ),
    expected_output="Confirmation that the vm is created and ready",
    agent=infra_vm_agent,
    context=[write_manifest]
)

# ----------------------------------------------------
# Define the Infra's setup task to validate readiness of a vm
# ----------------------------------------------------
infra_vm_readiness_setup = Task(
    description=(
        "Check status of a vm on gcp as running and it should be able to ssh\n"
        "Use the VM readiness Tool. pass dictionary input with tool name\n"
        "If set up correctly, confirm readiness.\n"
    ),
    expected_output="Confirmation that the vm is running , can ssh and ready for docker installation",
    agent=infra_vm_readiness_agent,
    context=[infra_vm_setup]
)

# ----------------------------------------------------
# Define the Infra's setup task to install docker on vm
# ----------------------------------------------------
infra_docker_setup = Task(
    description=(
        "Install and validate docker on a gcp vm \n"
        "Use the docker tool. pass dictionary input with tool name\n"
        "If already set up, confirm readiness. Check docker is ready after installation is complete\n"
    ),
    expected_output="Confirmation that the vm with docker is ready for kind deployment.",
    agent=infra_docker_agent,
    context=[infra_vm_readiness_setup]
)


# ----------------------------------------------------
# Define the Infra's setup task to install kind tool
# ----------------------------------------------------
infra_kind_k8s_install = Task(
    description=(
        "Installs kind and k8s tool for deploying {data_store}.\n"
        "Use the Kind Installer Tool. pass dictionary input with tool name\n"
        "If already set up, confirm readiness.\n"
        "Provide access instructions if setup is successful."
    ),
    expected_output="Confirmation that the kind and kubectl is installed and ready for k8s deployment.",
    agent=infra_kind_k8s_installer,
    context=[infra_docker_setup]
)

# ----------------------------------------------------
# Define the Infra's setup task to create a kind k8s cluster
# ----------------------------------------------------
infra_kind_k8s_setup = Task(
    description=(
        "Create kind kubernetes tool for deploying {data_store}.\n"
        "Use the Kind Creator Tool. pass dictionary input with tool name\n"
        "If already set up, confirm readiness.\n"
        "Provide access instructions if setup is successful."
    ),
    expected_output="Confirmation that the kind kubernetes cluster is created and ready for {data_store} deployment.",
    agent=infra_kind_k8s_creator,
    context=[infra_kind_k8s_install]
)

# ----------------------------------------------------
# Define the Infra's setup task to deploy redis on kubernetes
# ----------------------------------------------------
infra_redis_deploy = Task(
    description=(
        "Deploy deployments and service for deploying {data_store}.\n"
        "Use the redis deployer tool. pass dictionary input with tool name\n"
        "If already set up, confirm readiness and success.\n"
    ),
    expected_output="Confirmation that the {data_store} yaml is deployed",
    agent=infra_redis_deployer,
    context=[infra_kind_k8s_setup]
)

### Crew

In [105]:
from crewai import Crew, Process

## Crew
crew = Crew(
  agents=[architect, manifest_writer_agent, infra_vm_agent, infra_vm_readiness_agent, infra_docker_agent, infra_kind_k8s_installer, infra_kind_k8s_creator, infra_redis_deployer],
  tasks=[design_cluster, write_manifest, infra_vm_setup, infra_vm_readiness_setup, infra_docker_setup, infra_kind_k8s_install, infra_kind_k8s_setup, infra_redis_deploy],
  process=Process.sequential,
  verbose=True
)

### Run Crew

In [106]:
import os
from pathlib import Path
from termcolor import colored
import warnings

warnings.filterwarnings("ignore", category=UserWarning, module="pydantic")

inputs = {
  "data_store": "redis",
  "workload_description": (
    "A low-volume website. "
    "It handles 100 ops/sec with 80% reads, 20% writes. "
  ),
  "target_file": "redis.yaml",
  "pdf_file": "Redis.pdf",
  "terraform_dir" : "terraform"
}


print("""
Kick off the crew process with the provided inputs.
"""
)
print(colored("🚀 Crew execution started...\n", "yellow"))


crew.kickoff(inputs=inputs)

print(colored("\n🧠 All Tasks Finished  ", "green"))


Kick off the crew process with the provided inputs.

🚀 Crew execution started...



Output()

Output()

Output()

Output()

Output()

Output()

Output()

Output()

Output()

Output()

Output()

Output()

Output()

Output()

Output()

Output()

Output()

Output()


🧠 All Tasks Finished  


## Validations

In [None]:
import subprocess

def run_test_step(step_name: str, ssh_command: str, icon: str = "🔹"):
    full_command = [
        "gcloud", "compute", "ssh", "docker-agent-vm",
        "--zone", "us-central1-a",
        "--command", f"echo '{icon} {step_name}'; {ssh_command}",
        "--quiet"
    ]
    print(f"\n{icon} Running: {step_name}")
    print("-" * 60)

    result = subprocess.run(full_command, capture_output=True, text=True)
    print(result.stdout)
    if result.returncode != 0:
        print(result.stderr)
        raise RuntimeError(f"❌ Step failed: {icon} {step_name}")
    else:
        print(f"✅ Step passed: {icon} {step_name}")
    print("-" * 60)
    print("\n")

try:
    # System & Docker validation
    run_test_step("Checking VM OS version", "uname -a", "💻")
    run_test_step("Checking Docker version", "docker --version", "🐳")
    run_test_step("Checking Docker status", "docker ps", "📦")

    # Kind and Kubernetes
    run_test_step("Checking Kind version", "kind --version", "☸️")
    run_test_step("Checking Kubernetes nodes in Kind", "kubectl get nodes -o wide", "🧩")

    # Redis (raw Docker test)
    run_test_step("Running Redis container", "docker run -d -p 6379:6379 --name redis redis", "🔴")
    run_test_step("Checking running containers (should include Redis)", "docker ps", "🔍")
    run_test_step("Cleaning up Redis container", "docker stop redis && docker rm redis", "🧹")

    # Redis in Kubernetes (Kind)
    run_test_step("Checking Redis deployment", "kubectl get deployment redis-deployment", "📦")
    run_test_step("Checking Redis service", "kubectl get service redis-service", "📡")
    run_test_step("Checking Redis pods status", "kubectl get pods -l app=redis", "🧬")

    print("\n🎉 All validation steps passed successfully! 🚀")
    print("You're ready to build with 🐳 Docker, ☸️ Kind, and 🔴 Redis on GCP 💻")

except RuntimeError as e:
    print(f"\n🚨 Validation failed:\n{e}")
    print("Please check the above output and rerun once resolved. ❗")


### Optional Clean up

In [99]:
# 🔄 Optional: Remove the vm
!gcloud compute instances delete docker-agent-vm --zone us-central1-a --quiet

In [98]:
# 🔄 Optional: Remove the generated redis.yaml
!rm redis.yaml

In [None]:
# Just a Sample redis file
# This will Not  be used in agent . Agent will genarate a valid yaml based on asked workload

with open("redis.yaml", "w") as f:
  f.write("""
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-deployment
  labels:
    app: redis
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:7.2
          ports:
            - containerPort: 6379
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "500m"
---
apiVersion: v1
kind: Service
metadata:
  name: redis-service
spec:
  type: NodePort
  selector:
    app: redis
  ports:
    - protocol: TCP
      port: 6379
      targetPort: 6379
      nodePort: 30079  # You can pick any port in the 30000–32767 range

  """)