A production-grade "Self-Healing CI/CD Pipeline" built in Go. The system acts as an autonomous, AI-driven DevOps engineer: it monitors CI/CD pipelines, uses LLMs to diagnose failures from contextual signals (commit diffs, logs, test reports), generates code and configuration fixes, and enforces a human-in-the-loop governance layer before raising a Pull Request that remediates the issue.
- Architecture & Workflow
- Multi-Agent System
- Technology Stack
- Prerequisites
- Installation & Setup
- Configuration
- Usage & Testing
- Project Structure
- Future Enhancements
A central orchestrator drives the workflow, passing each pipeline failure through a fixed sequence of discrete AI agents:
- Trigger: A GitHub webhook fires on a `workflow_run` failure.
- Detection: The Gin server catches the payload and alerts the Monitor Agent.
- Context Gathering: The system fetches pipeline logs, test reports, and the breaking commit diff.
- Analysis: The Root Cause Agent uses an LLM to interpret the failure in human terms.
- Remediation: The Repair Agent writes a unified git patch intended to fix the failure.
- Governance: The Governance Agent reviews the fix. If it's high-risk (e.g., core infrastructure), it flags for human approval. If low-risk (e.g., typo, missing dependency), it auto-approves.
- Resolution: The PR Agent creates an automatic branch and raises a Pull Request with the fix and detailed explanations.
This system implements five specialized agents, plus an orchestrator that coordinates them, inside a single monolithic service:
- Pipeline Monitoring Agent: Acts as the entry point, listening to GitHub webhooks, validating payloads, and extracting core pipeline metadata.
- Root Cause Analysis Agent: Consumes `Prompts + Context` to identify exact failure footprints (build error, dependency conflict, test failure).
- Auto Repair Agent: Engineered to generate minimal structural patches to existing repositories based on identified root causes.
- Governance Agent: Enforces security and risk policy rules, rating each proposed fix as LOW, MEDIUM, or HIGH risk to decide how strictly it must be reviewed.
- Pull Request Agent: Interfaces directly with the GitHub API to manage branches, commits, and PR descriptions for automated merge readiness.
- Orchestrator: The main brain that asynchronously chains the output of one agent into the input of the next.
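
To make the Governance Agent's LOW/MEDIUM/HIGH decision concrete, here is a small rule-based sketch. It is only an illustration: the project's agent is prompt-driven, so `Fix`, `ClassifyRisk`, `RequiresHumanApproval`, the path prefixes, and the 10-line threshold are all assumptions, as is the choice to send MEDIUM fixes to a human.

```go
package main

import (
	"fmt"
	"strings"
)

// RiskLevel mirrors the LOW/MEDIUM/HIGH scale used by the Governance Agent.
type RiskLevel int

const (
	Low RiskLevel = iota
	Medium
	High
)

func (r RiskLevel) String() string {
	switch r {
	case Low:
		return "LOW"
	case Medium:
		return "MEDIUM"
	default:
		return "HIGH"
	}
}

// Fix is a hypothetical summary of a proposed patch.
type Fix struct {
	ChangedFiles []string
	ChangedLines int
}

// highRiskPrefixes are assumed examples of "core infrastructure" paths.
var highRiskPrefixes = []string{".github/workflows/", "Dockerfile", "docker-compose.yml"}

// smallPatchLines is an assumed cutoff for "typo-sized" changes.
const smallPatchLines = 10

// ClassifyRisk rates a fix: infrastructure paths are HIGH, small patches are
// LOW, and everything else is MEDIUM.
func ClassifyRisk(fix Fix) RiskLevel {
	for _, f := range fix.ChangedFiles {
		for _, p := range highRiskPrefixes {
			if strings.HasPrefix(f, p) {
				return High
			}
		}
	}
	if fix.ChangedLines <= smallPatchLines {
		return Low
	}
	return Medium
}

// RequiresHumanApproval auto-approves only LOW-risk fixes (an assumption for
// MEDIUM; the documented behavior covers only the low and high cases).
func RequiresHumanApproval(level RiskLevel) bool {
	return level != Low
}

func main() {
	fix := Fix{ChangedFiles: []string{"go.mod"}, ChangedLines: 1}
	level := ClassifyRisk(fix)
	fmt.Println(level, RequiresHumanApproval(level))
}
```

A deterministic check like this could also run alongside an LLM verdict as a guardrail, so that a patch touching infrastructure files is never auto-approved on the model's say-so alone.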
- Go (Golang) 1.24: Core application language.
- Gin Framework: Lightning-fast webhook HTTP server.
- go-openai: Wrapper to interact natively with OpenAI GPT-4.
- go-github (v69): To fetch diffs, logs, and interact with repository trees.
- Uber Zap: High-performance structured logging.
- PostgreSQL: Provisioned as a Docker service and reserved as a future data store for pipeline metrics.
- Docker & Docker Compose: Containerization & Rootless security profiles.
- Go 1.24+ (If running locally instead of Docker)
- Docker & Docker Compose
- An OpenAI API Key with access to GPT-4 (or a compatible LLM).
- A GitHub Personal Access Token (PAT) with `repo` and `workflow` scopes.
- ngrok (Recommended for local webhook testing).
- Clone the repository:

      git clone https://github.com/adithya11sci/agentic_cicd.git
      cd agentic_cicd

- Environment Configuration: Copy the example environment variables file and insert your keys.

      cp .env.example .env

- Running via Docker (Recommended): We provide a fully dockerized setup using a secure, rootless profile.

      docker-compose up --build -d

  The application will safely spin up on `localhost:8080`.

- Running Locally (Go Native):

      go mod tidy
      go run cmd/server.go

The .env file exposes the following core configurations:
| Variable | Description | Example |
|---|---|---|
| `PORT` | Listening port for the Gin Webhook server | `8080` |
| `LLM_API_KEY` | Your OpenAI API key for LLM agents | `sk-...` |
| `GITHUB_TOKEN` | Token for reading diffs/logs & opening PRs | `ghp_...` |
| `DATABASE_URL` | PostgreSQL connection string | `postgres://user...` |
To securely expose your local instance to GitHub to receive live Webhooks:

    ngrok http 8080

In your GitHub Repository:
- Go to `Settings` -> `Webhooks` -> `Add webhook`.
- Payload URL: `https://<your-ngrok-url>.ngrok.io/webhook/github`
- Content type: `application/json`
- Events: Select Workflow runs.
- Save. GitHub will send a `ping` event. The Monitor agent gracefully acknowledges it.
Trigger a broken build manually in your repository, and observe the Go logs via `docker-compose logs -f` to see the agents in action!
agentic-cicd/
|
|-- cmd/
| \-- server.go # Application entrypoint & HTTP server
|
|-- internal/
| |-- agents/ # Business logic for all 5 Agents
| | |-- monitor.go
| | |-- rootcause.go
| | |-- repair.go
| | |-- governance.go
| | \-- pragent.go
| |
| |-- services/ # Core integrations (LLM API & GitHub SDK)
| | |-- github.go
| | \-- llm.go
| |
| |-- models/ # Structs to maintain uniform payload structures
| | \-- event.go
| |
| |-- config/ # Environment Configuration loader
| | \-- config.go
| |
| \-- orchestrator/ # The coordination engine
| \-- orchestrator.go
|
|-- prompts/ # System prompts feeding instructions to the LLM
| |-- root_cause_prompt.txt
| |-- repair_prompt.txt
| \-- governance_prompt.txt
|
|-- docker-compose.yml # Docker compose containing App + Postgres
|-- Dockerfile # Rootless secure multi-stage builder
|-- go.mod / go.sum # Dependency manager
\-- README.md # Project documentation
- Dashboard: Create a React/Next.js frontend to visualize AI confidence scores and approve fixes that governance has flagged for manual review.
- Persistent State: Log LLM outputs to PostgreSQL to retrain/fine-tune custom internal models.
- Slack Integration: Send direct actionable buttons via Slack for `HIGH` risk governance tasks.
- Predictive CI/CD: Predict flaky tests and pipeline failures before a workflow is even executed by running the Root Cause Analysis early in the git pre-commit hook.
Built for Agentic DevOps engineering.