AI-powered automation system that transforms natural-language requirements into production-ready Kubernetes manifests through three specialized CrewAI agents. The system can create and edit manifests, retrieve knowledge through RAG with late chunking, perform web searches, and run static and runtime validation of the configuration files.

KubernetesCrew


TFM - AI-Powered DevOps Automation System

This repository contains a DevOps automation assistant that leverages AI agents, vector databases, and comprehensive tooling to streamline infrastructure management and operations. It targets Kubernetes and was built as the final project of my Master's Thesis in Applied Artificial Intelligence. The repo is indexed with DeepWiki, so you can ask questions about it there.

Features

  • AI Agent Crews: Powered by CrewAI for task automation
  • Knowledge Management: Vector-based RAG system using Weaviate
  • File Operations: File editing with version control integration
  • Web Research: Web browsing and information extraction
  • Container Analysis: Docker image analysis and registry search
  • Kubernetes Integration: Safe kubectl operations
  • Security Scanning: Built-in security validation tools
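
For orientation, here is a minimal, generic CrewAI sketch of one agent and one task (illustrative only; the role, goal, and task text below are invented and are not the project's actual agent definitions):

from crewai import Agent, Task, Crew

# Hypothetical agent and task, only to show the CrewAI building blocks the crews are based on.
manifest_writer = Agent(
    role="Kubernetes engineer",
    goal="Turn plain-language requirements into Kubernetes manifests",
    backstory="An experienced DevOps engineer focused on safe, valid YAML.",
)
write_manifest = Task(
    description="Create a Deployment manifest for an nginx service with 2 replicas.",
    expected_output="A valid Kubernetes YAML manifest.",
    agent=manifest_writer,
)
crew = Crew(agents=[manifest_writer], tasks=[write_manifest])
print(crew.kickoff())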

Prerequisites

  • Python 3.10.x - 3.11.x (Python 3.12+ not supported)
  • Docker & Docker Compose
  • Kind
  • Poetry
  • Popeye
  • 8GB+ RAM (16GB+ recommended)
  • Internet access
  • Download the knowledge base (available here) and copy it to the knowledge folder, or create and configure your own

Quick Start

1. Environment Setup

Clone the repository and install dependencies:

git clone https://github.com/TheSOV/TFM
cd TFM
poetry install

2. Configuration

Copy the example environment file and configure your settings:

cp .env.example .env

Edit .env with your configuration:

AI/LLM Configuration

# OpenAI API (required for AI agents)
OPENAI_API_KEY=your_openai_api_key_here

# Alternative: OpenRouter API (on roadmap)
OPENROUTER_API_KEY=your_openrouter_key
OPENROUTER_BASE_URL="https://openrouter.ai/api/v1"

# Model Configuration
AGENT_MAIN_MODEL="openai/gpt-4.1-mini" # model that will be used by the agents during reasoning and to answer the user
AGENT_TOOL_CALL_MODEL="openai/gpt-4.1-mini" # model that will be used by the agents during tool calls
TOOL_MODEL="openai/gpt-4.1-mini" # model that will be used by the tools (in most cases, to summarize raw information gathered by the RAG and Web Research tools)
GUARDRAIL_MODEL="openai/gpt-4.1-nano" # model that will be used by the guardrails (to validate the agents' responses)
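
The provider/model identifiers above follow the LiteLLM-style naming that CrewAI accepts; a minimal sketch of reading them from the environment (illustrative, not the project's actual setup):

import os
from crewai import LLM

# Hypothetical wiring: build CrewAI LLM objects from the configured model names.
main_llm = LLM(model=os.environ["AGENT_MAIN_MODEL"])       # e.g. "openai/gpt-4.1-mini"
guardrail_llm = LLM(model=os.environ["GUARDRAIL_MODEL"])   # e.g. "openai/gpt-4.1-nano"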

Vector Database Configuration

# Weaviate Vector Database
WEAVIATE_API_KEY=your_secure_weaviate_key
WEAVIATE_HOST="127.0.0.1"
WEAVIATE_PORT="8080"
WEAVIATE_GRPC_PORT="50051"
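
For reference, a minimal connection check against these settings using the Weaviate v4 Python client (illustrative; it assumes the local Docker Compose Weaviate instance is running):

import weaviate
from weaviate.classes.init import Auth

# Connect to the local Weaviate instance with the host, ports, and API key configured above.
client = weaviate.connect_to_local(
    host="127.0.0.1",
    port=8080,
    grpc_port=50051,
    auth_credentials=Auth.api_key("your_secure_weaviate_key"),
)
print(client.is_ready())
client.close()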

Knowledge Management

# Embedding Model Configuration
LATE_CHUNKING_MODEL_NAME="jinaai/jina-embeddings-v3" # model that will be used to generate embeddings for the knowledge
LATE_CHUNKING_HEADERS_TO_SPLIT_ON=[("#", "h1"), ("##", "h2")] # headers to split on for markdown files
LATE_CHUNKING_MAX_CHUNK_CHARS=2048 # maximum chunk size in characters
LATE_CHUNKING_DEVICE="cuda"  # or "cpu", "cuda" is recommended for GPU acceleration

# Knowledge Ingestion
INGEST_KNOWLEDGE_SUMMARY_MODEL="gpt-4.1-nano" # model that will be used to generate summaries for the knowledge
INGEST_KNOWLEDGE_CONFIG_PATH="config/knowledge/knowledge.yaml" # path to the knowledge ingestion configuration file
INGEST_KNOWLEDGE_OVERRIDE_COLLECTION=True # override the collection if it already exists. When enabled, the ingestion process checks at the beginning whether the collection already exists and, if so, deletes and recreates it. Because this check happens only once, before ingestion starts, multiple knowledge sources configured with the same collection name will still be merged.
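
Late chunking embeds the whole document in one pass and only then pools per-chunk vectors, so every chunk keeps document-level context. A rough, illustrative sketch of the idea (not the project's actual implementation; it assumes the embedding model's forward pass exposes last_hidden_state like a standard Hugging Face encoder, and the chunk boundaries are hypothetical):

import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "jinaai/jina-embeddings-v3"  # value of LATE_CHUNKING_MODEL_NAME
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
model = AutoModel.from_pretrained(MODEL_NAME, trust_remote_code=True)

def late_chunk(text: str, boundaries: list[tuple[int, int]]) -> list[torch.Tensor]:
    """Embed `text` once, then mean-pool token embeddings for each (start, end) char span."""
    enc = tokenizer(text, return_tensors="pt", return_offsets_mapping=True, truncation=True)
    offsets = enc.pop("offset_mapping")[0]                  # (num_tokens, 2) char offsets
    with torch.no_grad():
        token_embs = model(**enc).last_hidden_state[0]      # (num_tokens, hidden_dim)
    chunk_vectors = []
    for start, end in boundaries:
        mask = (offsets[:, 0] >= start) & (offsets[:, 1] <= end)
        chunk_vectors.append(token_embs[mask].mean(dim=0))  # one vector per chunk
    return chunk_vectors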

File System Configuration

# Working Directories
TEMP_FILES_DIR="temp" # directory where the kubernetes YAML files will be stored during the assistant's execution
CONFIG_FILES_DIR="config" # directory where configuration files are stored
CREWAI_STORAGE_DIR="./memory" # directory where CrewAI will store its memory (on roadmap)

Kubernetes Configuration

# kubectl Setup
KUBECTL_PATH="kubectl"  # or full path on Windows
KUBECTL_ALLOWED_VERBS="get,describe,logs,apply,diff,delete,create,patch,exec,cp,rollout,scale" # verbs allowed to be used by the assistant
KUBECTL_SAFE_NAMESPACES=""  # comma-separated list of all safe namespaces, leave empty to allow all namespaces
KUBECTL_DENIED_NAMESPACES="kube-system,kube-public" # comma-separated list of all denied namespaces
KUBECTL_DENY_FLAGS="--raw,--kubeconfig,--context,-ojsonpath,--output" # comma-separated list of all denied flags
K8S_VERSION="v1.29.0" # kubernetes version targeted
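
A small, hypothetical sketch of how these settings could gate a kubectl invocation (for illustration only; the actual tool's checks may differ):

import shlex

# Policy values mirroring the environment variables above.
ALLOWED_VERBS = set("get,describe,logs,apply,diff,delete,create,patch,exec,cp,rollout,scale".split(","))
DENIED_NAMESPACES = {"kube-system", "kube-public"}
DENIED_FLAGS = {"--raw", "--kubeconfig", "--context", "-ojsonpath", "--output"}

def is_command_allowed(command: str) -> bool:
    """Return True only if the kubectl verb, flags, and namespace pass the policy."""
    args = shlex.split(command)
    if len(args) < 2 or args[0] != "kubectl" or args[1] not in ALLOWED_VERBS:
        return False
    if any(arg in DENIED_FLAGS for arg in args):
        return False
    for i, arg in enumerate(args):
        if arg in ("-n", "--namespace") and i + 1 < len(args) and args[i + 1] in DENIED_NAMESPACES:
            return False
    return True

print(is_command_allowed("kubectl get pods -n default"))          # True
print(is_command_allowed("kubectl delete pod x -n kube-system"))  # False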

External API Keys

# Web Research APIs
STACK_EXCHANGE_API_KEY=your_stack_exchange_key # stack exchange API key
BRAVE_API_KEY=your_brave_search_key # brave search API key

Security Tools

POPEYE_PATH="/path/to/popeye"  # Kubernetes cluster scanner

Configure knowledge sources and ingestion

The config/knowledge/knowledge.yaml file defines how the knowledge ingestion system processes documents for the RAG (Retrieval-Augmented Generation) system: it declares the collections of documents that will be ingested into the Weaviate vector database.

Collection Configuration

Each collection in the YAML file must have the following structure:

  • name: The Weaviate collection identifier. Collections that share a name are merged into one; collections with different names are created separately. At query time, collections isolate their information, so the RAG system queries one collection at a time.
  • description: Metadata describing the collection's content. This is especially useful when multiple collections are defined, since it lets the assistant know what information each collection contains.
  • dirs: List of directories to scan for documents. Directories are scanned recursively.
  • rules: Processing rules for file filtering and handling

Processing Rules

The rules section controls how files are processed during ingestion:

File Filtering Rules

  • include: Array of file extensions to process (e.g., ["md"], ["yaml", "yml"], ["adoc"])
  • exclude: Array of file extensions to skip (typically empty [])
  • min_length: Minimum file size in characters (-1 for no minimum)
  • max_length: Maximum file size in characters (-1 for no maximum)
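
A minimal sketch of how these filtering rules might be applied while scanning a collection's directories (a hypothetical helper, not the project's actual ingestion code):

from pathlib import Path
import yaml

def matching_files(collection: dict) -> list[Path]:
    """Return files under the collection's dirs that satisfy its include/exclude/length rules."""
    rules = collection["rules"]
    include, exclude = set(rules.get("include", [])), set(rules.get("exclude", []))
    min_len, max_len = rules.get("min_length", -1), rules.get("max_length", -1)
    selected = []
    for directory in collection["dirs"]:
        for path in Path(directory).rglob("*"):  # directories are scanned recursively
            if not path.is_file():
                continue
            ext = path.suffix.lstrip(".")
            if ext in exclude or (include and ext not in include):
                continue
            length = len(path.read_text(encoding="utf-8", errors="ignore"))
            if (min_len != -1 and length < min_len) or (max_len != -1 and length > max_len):
                continue
            selected.append(path)
    return selected

config = yaml.safe_load(Path("config/knowledge/knowledge.yaml").read_text(encoding="utf-8"))
for collection in config["collections"]:
    print(collection["name"], len(matching_files(collection)))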

Summary Generation

  • generate_summary: Boolean flag controlling whether to generate an LLM summary, which is added as a comment at the beginning of the file.
    • Set to true for code-only files to add context
    • Set to false for files that are self-documenting

File Type Processing

The ingestion system handles different file types with specialized chunking strategies:

Markdown Files

  • Uses header-aware chunking with MarkdownHeaderTextSplitter
  • Preserves document structure through header hierarchy
  • No summary generation needed (self-documenting)
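
A minimal sketch of this header-aware splitting with LangChain's MarkdownHeaderTextSplitter, using the headers configured in LATE_CHUNKING_HEADERS_TO_SPLIT_ON (illustrative; the sample text is invented):

from langchain_text_splitters import MarkdownHeaderTextSplitter

# Split on the same header levels configured above: ("#", "h1") and ("##", "h2").
splitter = MarkdownHeaderTextSplitter(headers_to_split_on=[("#", "h1"), ("##", "h2")])
sample = "# Pods\nA Pod is the smallest deployable unit.\n## Lifecycle\nPods are ephemeral."
for doc in splitter.split_text(sample):
    print(doc.metadata, "->", doc.page_content)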

YAML Files

  • Prepends generated summary as a comment
  • Treats entire file as single chunk
  • Requires generate_summary: true for context
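
For illustration, a tiny hypothetical helper showing this behaviour: the generated summary is prepended as a YAML comment and the whole file is kept as one chunk (not the project's actual code):

def yaml_to_single_chunk(yaml_text: str, summary: str) -> str:
    """Prepend the LLM-generated summary as a comment; the whole result is one chunk."""
    comment = "\n".join(f"# {line}" for line in summary.splitlines())
    return f"{comment}\n{yaml_text}"

print(yaml_to_single_chunk("apiVersion: v1\nkind: Namespace\nmetadata:\n  name: demo", "Creates the demo namespace."))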

Other Files

  • Uses generic recursive character splitting
  • Falls back to standard chunking strategy
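
A minimal sketch of that generic fallback using LangChain's RecursiveCharacterTextSplitter (illustrative; the chunk size simply mirrors LATE_CHUNKING_MAX_CHUNK_CHARS and the file name is hypothetical):

from pathlib import Path
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Generic character-based splitting for file types without a specialized strategy.
splitter = RecursiveCharacterTextSplitter(chunk_size=2048, chunk_overlap=0)
chunks = splitter.split_text(Path("some_file.txt").read_text(encoding="utf-8"))
print(len(chunks))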

Example Configuration

Here's how to configure a new knowledge source:

collections:
  - name: "knowledge"
    description: "Custom documentation collection about Kubernetes"
    dirs:
      - "knowledge\\custom\\docs"
      - "knowledge\\custom\\docs2"
    rules:
      include: ["md", "rst"]
      exclude: []
      min_length: 100
      max_length: -1
      generate_summary: false

The ingestion process will scan the specified directories, apply the filtering rules, and process matching files according to their type-specific chunking strategy before storing them in the Weaviate vector database. To begin the ingestion process, run the ingest_knowledge.py script.

CUDA Support (recommended)

For CUDA support (GPU acceleration):

poetry install --with cu118

3. Usage

To run the application, execute:

python main.py
