# Install Python Packages

In [2]:
%%capture --no-stderr
%pip install --quiet -U langgraph langchain-core langchain_openai python-dotenv langsmith langchain

In [3]:
%pip install --quiet -U pydantic

Note: you may need to restart the kernel to use updated packages.


In [4]:
%pip install --quiet -U jupyterlab-lsp
%pip install --quiet -U "python-lsp-server[all]"

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


# Check Environment Dependencies

In [34]:
import sys
print(sys.executable)
#!pip list

c:\Program Files\Python312\python.exe
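The commented-out `!pip list` prints every installed package. The sketch below is a narrower check that uses only the standard library's `importlib.metadata`. It reports the installed version of each package that the `%pip` cells above install, or `None` if a package is missing. The helper name `installed_versions` is illustrative and not part of any library.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return the installed version of each distribution, or None if it is not installed."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names taken from the %pip install cells
required = ["langgraph", "langchain-core", "langchain-openai", "python-dotenv",
            "langsmith", "langchain", "pydantic"]
for name, version in installed_versions(required).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```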


## Setup logging

In [5]:
import logging

logger = logging.getLogger(__name__)

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',  # Define the format
    handlers=[logging.StreamHandler()]  # Output to the console
)


# Setup OpenAI

In [6]:
from openai_config import OpenAIConfig

In [7]:
# We read the .env file inside the class
openai_config = OpenAIConfig()
openai_config.chat_model.invoke("Tell me a Joke about corporate life")

2024-09-11 07:31:14,520 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


AIMessage(content='Why did the employee bring a ladder to work?\n\nBecause they heard the job was a "climb" up the corporate ladder!', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 26, 'prompt_tokens': 14, 'total_tokens': 40}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_483d39d857', 'finish_reason': 'stop', 'logprobs': None}, id='run-653b6f7a-05b0-48a0-8f92-0d1881e916c3-0', usage_metadata={'input_tokens': 14, 'output_tokens': 26, 'total_tokens': 40})
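`openai_config` is a local module that this notebook does not include, so its internals are not shown. The cell comment says it reads the `.env` file, which projects commonly do with python-dotenv's `load_dotenv()`. The stdlib sketch below is a hypothetical, simplified stand-in for that step, not the author's implementation. It parses `KEY=VALUE` lines and exports them to `os.environ`, leaving variables that are already set untouched unless `override=True`, which mirrors `load_dotenv`'s default.

```python
import os
from pathlib import Path
from typing import Dict


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        # Accept shell-style 'export KEY=VALUE' lines
        if key.startswith("export "):
            key = key[len("export "):].strip()
        value = value.strip()
        # Drop one pair of matching surrounding quotes
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key] = value
    return values


def load_env_file(path: str = ".env", override: bool = False) -> Dict[str, str]:
    """Export the file's variables to os.environ; existing values win unless override=True."""
    env_path = Path(path)
    if not env_path.is_file():
        return {}
    values = parse_env_text(env_path.read_text(encoding="utf-8"))
    for key, value in values.items():
        if override or key not in os.environ:
            os.environ[key] = value
    return values
```

This sketch skips features that python-dotenv supports, such as variable expansion and multi-line values, so `load_dotenv()` remains the practical choice.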

In [8]:
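import os
# Avoid logging the raw API key, even at DEBUG level; record only whether it is set
logger.debug("OPENAI_API_KEY is set: %s", os.getenv("OPENAI_API_KEY") is not None)
logger.info(os.getenv('LANGCHAIN_PROJECT'))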

2024-09-11 07:31:16,029 - INFO - builder
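Writing a raw API key to a log is risky even at DEBUG level, because handlers and log levels change over time. If you need to confirm which key was loaded, one option is to log a masked form instead. `mask_secret` below is a hypothetical helper and not part of any library.

```python
import os
from typing import Optional


def mask_secret(value: Optional[str], visible: int = 4) -> str:
    """Return a log-safe form of a secret that shows only its last `visible` characters."""
    if not value:
        return "<unset>"
    if len(value) <= visible:
        # Too short to reveal anything safely
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


print(mask_secret(os.getenv("OPENAI_API_KEY")))
```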


# High Level Plan

In [39]:
high_level_query = """You are a DevSecOps expert. You are thorough and leave nothing to chance.

Create an end-to-end plan to debug Kubernetes DevOps incidents in an
environment with multiple clusters and namespaces. The plan must follow established
DevSecOps best practices.

The plan should be generic and not point to any specific issue.

The incident starts with a notification in a Slack room containing the PagerDuty incident ID.

Use the following methodology to create the plan. These steps can be repeated any number of times.

Strings between brackets are placeholders that you must fill in as you create the tasks and plan:


Thought: To solve this problem, I need to break it down into smaller, more manageable steps. I will think through the key components of the task.
Action: Identify the main goals and subgoals required to complete the task.
Observation: The key goals are [list high-level goals]. To achieve these, I will need to complete the following tasks: [list tasks].
Thought: Now that I have a plan and a list of tasks, I will start working through the steps systematically.
Action: Begin executing the first task by [describe first action].
Observation: Completing the first task revealed that [describe observation]. I will need to adjust my approach slightly.
Thought: Based on the results so far, I should modify my plan to be more effective. I will re-evaluate the next steps.
Action: Revise the plan by [describe revised action].
Observation: The revised approach is working better because [describe positive outcome]. I'm making good progress towards the overall goal.
Thought: I believe I now have enough information to provide a final solution to the original problem.
Action: Summarize the key steps taken and provide the final solution.
Observation: The solution is: [provide final solution with all necessary details for each step].

"""

In [40]:
high_level_response = openai_config.chat_model.invoke(high_level_query)

2024-09-10 18:52:53,538 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


In [41]:
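# str.strip() removes a *set* of characters, not a prefix/suffix, so it can eat real text;
# remove the Markdown fence explicitly instead (this response is Markdown, not JSON)
json_string = high_level_response.content.strip().removeprefix("```json").removesuffix("```").strip()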
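`str.strip()` treats its argument as a set of characters to trim from both ends, not as a prefix. Passing a fence string such as ```` ```json\n ```` can therefore also remove legitimate letters at either end of the content. The hypothetical helper below removes only a surrounding Markdown fence (Python 3.9+ for `removesuffix`), and the last lines show the difference.

```python
def strip_code_fence(text: str) -> str:
    """Remove a surrounding Markdown code fence such as ```json ... ``` if present."""
    body = text.strip()
    if body.startswith("```"):
        # Drop the whole opening fence line, including a language tag like 'json'
        first_newline = body.find("\n")
        body = body[first_newline + 1:] if first_newline != -1 else ""
        body = body.removesuffix("```")
    return body.strip()


# Character-set stripping also eats 's', 'o', 'n' from the word itself
broken = "```json\nsolution\n```".strip("```json\n")
print(repr(broken))  # prints 'luti'
print(repr(strip_code_fence("```json\nsolution\n```")))  # prints 'solution'
```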

In [42]:
print(json_string)

### End-to-End Plan for Debugging Kubernetes DevOps Incidents

#### Initial Notification
- **Incident Trigger**: Notification in Slack with PagerDuty incident ID.

---

### Step 1: Thought
To solve this problem, I need to break it down into smaller, more manageable steps. I will think through the key components of the task.

### Step 2: Action
Identify the main goals and subgoals required to complete the task.

#### Main Goals:
1. **Acknowledge the Incident**: Ensure the incident is recognized and assigned.
2. **Gather Context**: Collect relevant information about the incident.
3. **Diagnose the Issue**: Identify the root cause of the incident.
4. **Implement a Fix**: Apply a solution to resolve the incident.
5. **Verify Resolution**: Confirm that the issue is resolved and services are functioning as expected.
6. **Document the Incident**: Record the incident details and resolution steps for future reference.
7. **Conduct a Post-Mortem**: Analyze the incident to improve future response

# Define Schemas

### Tool Schema

In [15]:
from pydantic import BaseModel, Field
from typing import List, Union

class Tool(BaseModel):
    """
    The `Tool` class represents a tool used in DevOps tasks. Each tool should include relevant details that define how it is used within a task.

    Key Points:
    - **Name**: The name of the tool as known in the DevOps industry.
    - **Description**: A clear, human-readable description of the tool, outlining its purpose and functionality.
    - **Command**: A function call or a complete Unix command with relevant options should always be provided. This is crucial for tools that access on-prem resources or interact with external services via REST/cURL. For example, 'kubectl get pods --namespace default' should be provided for `kubectl`, even if it interacts with external resources.
    - **URL**: An optional URL where more information about the tool can be found. This is useful for users who need additional details or documentation.

    **Examples of Proper Usage:**
    - **`kubectl`**: Command might be 'kubectl get pods --namespace default'.
    - **`curl`**: Command might be 'curl -X GET https://api.example.com/resource'.

    **Examples of What Should Not Be Included:**
    - Omitting the command for a tool (e.g., only providing 'kubectl' without any command).
    - Providing incomplete or vague commands that do not specify how to use the tool (e.g., 'curl' without specifying the URL or method).

    Attributes:
    - `name` (str): The name known in the DevOps industry, e.g., 'kubectl'.
    - `description` (str): A human-readable description of the tool, e.g., 'A command-line tool for controlling Kubernetes clusters.'
    - `command` (Union[str, None]): A function call or a complete Unix command with relevant options should always be provided. This is crucial for tools that access on-prem resources or interact with external services via REST/cURL. For example, 'kubectl get pods --namespace default' should be provided for `kubectl`, even if it interacts with external resources.
    - `url` (str, Optional): URL for more information about the tool, e.g., 'https://kubernetes.io/docs/reference/kubectl/'. Optional but recommended for additional context.

    """

    name: str = Field(..., description="The name known in the DevOps industry, e.g., 'kubectl'")
    description: str = Field(..., description="A human-readable description of the tool, e.g., 'A command-line tool for controlling Kubernetes clusters.'")
    command: Union[str, None] = Field(None, description="A function call or a complete Unix command with relevant options should always be provided. This is crucial for tools that access on-prem resources or interact with external services via REST/cURL. For example, 'kubectl get pods --namespace default' should be provided for `kubectl`, even if it interacts with external resources.")
    # `url` defaults to None, so its type must allow None
    url: Union[str, None] = Field(None, description="URL for more information about the tool, e.g., 'https://kubernetes.io/docs/reference/kubectl/'")

    # Pydantic v2 configuration; replaces the deprecated inner `class Config`
    model_config = {
        "json_schema_extra": {
            "description": "The `Tool` class represents a tool used in DevOps tasks."
        }
    }
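The `Tool` docstring insists that `command` be more than a bare tool name such as `kubectl`, but the field itself is optional and unchecked. The sketch below is a hypothetical heuristic (not part of the original schema) that flags a missing command, a command that is only the tool's name, or one that cannot be parsed as a shell line.

```python
import shlex
from typing import Optional


def is_complete_command(tool_name: str, command: Optional[str]) -> bool:
    """Heuristic: a command must exist, parse as a shell line, and do more than name the tool."""
    if not command or not command.strip():
        return False
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes: not a usable shell command
        return False
    if not tokens:
        return False
    # A bare tool name such as 'kubectl' or 'curl' carries no usable instruction
    return not (len(tokens) == 1 and tokens[0].lower() == tool_name.lower())
```

A check like this could be called from a Pydantic `model_validator(mode="after")` on `Tool`, though rejecting model output outright may be harsher than simply flagging it for review.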

### Task Schema

In [16]:
class Task(BaseModel):
    """
    The `Task` class represents an atomic unit of work within a plan. Each task must be focused on a single, clear objective and should not encompass multiple, unrelated activities.

    Key Points:
    - A task should perform one specific function and should not combine different activities. 
    - Each task should have a clear, verifiable success criterion that is achievable through the task's activities alone.
    - Task names must be URL-friendly, with no spaces or special characters.
    
    **Examples of What a Task Should NOT Do:**
    - "Acknowledge the incident in PagerDuty and notify the team in the Slack room." (This task mixes incident management with team communication.)
    - "Update the application code and deploy it to production." (This task combines code changes with deployment, which should be separated.)
    - "Collect all relevant information about the incident from Slack and PagerDuty." (This task has two action items.)

    Attributes:
    - `name` (str): URL-friendly name for the task, e.g., 'deploy-application'. Must not contain spaces or special characters.
    - `description` (str): A detailed description of the task, e.g., 'Deploy the application to the Kubernetes cluster using Helm.'
    - `success_criteria` (str): Clear and verifiable criteria for task completion, e.g., 'Application is successfully deployed and running in the Kubernetes cluster, as verified by checking the deployment status with 'kubectl get deployments'.' Each task must be atomic and focus on one thing only.
    - `tools` (List[Tool]): List of `Tool` instances required for the task, specifying which tools are needed.

    """    
    name: str = Field(..., description="URL-friendly name for the task, e.g., 'deploy-application'. Must not contain spaces or special characters.")
    description: str = Field(..., description="A detailed description of the task, e.g., 'Deploy the application to the Kubernetes cluster using Helm.'")
    success_criteria: str = Field(..., description="Clear and verifiable criteria for task completion, e.g., 'Application is successfully deployed and running in the Kubernetes cluster, as verified by checking the deployment status with 'kubectl get deployments'.' Each task must be atomic and focus on one thing only.")
    tools: List[Tool] = Field(..., description="List of `Tool` instances required for the task, specifying which tools are needed.")

    # Pydantic v2 configuration; replaces the deprecated inner `class Config`
    model_config = {
        "json_schema_extra": {
            "description": "The `Task` class represents an atomic unit of work that involves using specific tools to achieve a clear goal."
        }
    }
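The `Task` docstring requires URL-friendly names, but the `name` field accepts any string. A minimal sketch of a post-parse check follows. It assumes lowercase kebab-case (matching the `deploy-application` example), and `slugify` is a hypothetical helper for repairing free-text names. Checking after the response is parsed leaves the schema sent to the model unchanged.

```python
import re

# Lowercase letters and digits in hyphen-separated groups, e.g. 'deploy-application'
URL_FRIENDLY_NAME = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_url_friendly(name: str) -> bool:
    """Return True if `name` is a kebab-case slug with no spaces or special characters."""
    return URL_FRIENDLY_NAME.fullmatch(name) is not None


def slugify(text: str) -> str:
    """Turn free text into a URL-friendly name, e.g. 'Deploy Application' -> 'deploy-application'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
```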

### Plan Schema

In [17]:
class Plan(BaseModel):
    tasks: List[Task] = Field(
        ...,
        description=(
            "A list of `Task` instances that constitute the plan. Each task must be atomic "
            "and focused on a specific goal, and the plan as a whole must be complete, "
            "verifiable, and reproducible. It should follow DevOps incident-response best "
            "practices so that it executes consistently and reliably across different "
            "environments."
        )
    )
    Reasoning: str = Field(
        ...,
        description=(
            "The complete reasoning sequence the model used to create the plan."
        )
    )


# From Query to Structured Plan
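Calling `with_structured_output(Plan, method="json_schema")` makes the LLM return JSON that conforms to the `Plan` schema, which LangChain then parses into a `Plan` instance.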
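For intuition, the sketch below shows roughly the shape of the `response_format` payload that OpenAI's Structured Outputs API accepts. The schema is hand-written and simplified, covering only part of `Task`. LangChain derives the real schema from the Pydantic models, so the exact payload (including whether `strict` is set) may differ by library version. In strict mode, every property must be listed in `required` and every object must set `additionalProperties` to false. Optional fields such as `Tool.url` are then expressed as nullable types.

```python
import json

# Simplified, hand-written schema for one task (omits the nested `tools` list)
task_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "description": {"type": "string"},
        "success_criteria": {"type": "string"},
    },
    # Strict mode expects every property to be required ...
    "required": ["name", "description", "success_criteria"],
    # ... and objects to forbid undeclared keys
    "additionalProperties": False,
}

response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "Plan",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "tasks": {"type": "array", "items": task_schema},
                "Reasoning": {"type": "string"},
            },
            "required": ["tasks", "Reasoning"],
            "additionalProperties": False,
        },
    },
}

print(json.dumps(response_format, indent=2))
```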


## References

https://api.python.langchain.com/en/latest/chat_models/langchain_openai.chat_models.base.ChatOpenAI.html#langchain_openai.chat_models.base.ChatOpenAI.with_structured_output

In [68]:
structured_system_query = """You are a DevSecOps expert. You are thorough and leave nothing to chance.
"""

In [19]:
structured_user_query = """
Create an end-to-end plan to debug Kubernetes DevOps incidents in an
environment with multiple clusters and namespaces. The plan must follow established
DevSecOps best practices.

The incident starts with a notification in a Slack room containing the PagerDuty incident ID.
"""

In [20]:
from langchain_core.prompts import ChatPromptTemplate

structured_template = ChatPromptTemplate.from_messages([
    ("system", "{system_prompt}"),
    ("user", "{input}")
])
structured_prompt = structured_template.invoke({"system_prompt": structured_system_query, "input": structured_user_query})

In [21]:
structured_llm = openai_config.chat_model.with_structured_output(Plan, method="json_schema")

In [22]:
structured_plan_response = structured_llm.invoke(structured_prompt)

2024-09-11 07:34:36,146 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


In [23]:
import json
# Pydantic v2: model_dump() replaces the deprecated .dict()
plan = json.dumps(structured_plan_response.model_dump(), indent=4)
print(plan)

{
    "tasks": [
        {
            "name": "acknowledge-incident",
            "description": "Acknowledge the incident in PagerDuty to prevent further notifications and inform the team that the issue is being addressed.",
            "success_criteria": "Incident is acknowledged in PagerDuty, and notifications are paused for the team.",
            "tools": [
                {
                    "name": "PagerDuty",
                    "description": "An incident management tool that helps teams respond to incidents quickly.",
                    "command": "curl -X POST 'https://api.pagerduty.com/incidents/{incident_id}/acknowledge' -H 'Authorization: Token token=YOUR_API_TOKEN' -H 'Content-Type: application/json' -d '{\"incident\": {\"type\": \"incident_reference\"}}'",
                    "url": "https://developer.pagerduty.com/docs/ZG9jOjExMjA5MjA-pagerduty-api-v2"
                }
            ]
        },
        {
            "name": "check-incident-details",
            "

# From Structured Plan to Functions

In [24]:
function_system_prompt = """You are an expert on OpenAI function calling and Python programming. 

You will create hypothetical functions that follow OpenAI's function calling schema to solve tasks. Each task must have a function. 

Each function specification should match OpenAI's function calling schema. Example: 


{
    "functions": [
        {
            "type": "function",
            "function": {
                "name": "acknowledge_incident",
                "description": "Acknowledge the incident in PagerDuty to prevent further notifications and inform the team that the issue is being addressed.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "incident_id": {
                            "type": "string",
                            "description": "The ID of the incident to acknowledge."
                        }
                    },
                    "required": ["incident_id"],
                    "additionalProperties": false
                }
            }
        },
        {
            "type": "function",
            "function": {
                "name": "check_incident_details",
                "description": "Retrieve detailed information about the incident from PagerDuty, including the affected services and any relevant logs.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "incident_id": {
                            "type": "string",
                            "description": "The ID of the incident to retrieve details for."
                        }
                    },
                    "required": ["incident_id"],
                    "additionalProperties": false
                }
            }
        }
    ]
}

"""

In [25]:
from langchain import hub
# Pull the standard ReAct prompt for reference; its format is adapted in function_user_query
prompt = hub.pull("hwchase17/react")
print(prompt)

input_variables=['agent_scratchpad', 'input', 'tool_names', 'tools'] metadata={'lc_hub_owner': 'hwchase17', 'lc_hub_repo': 'react', 'lc_hub_commit_hash': 'd15fe3c426f1c4b3f37c9198853e4a86e20c425ca7f4752ec0c9b0e97ca7ea4d'} template='Answer the following questions as best you can. You have access to the following tools:\n\n{tools}\n\nUse the following format:\n\nQuestion: the input question you must answer\nThought: you should always think about what to do\nAction: the action to take, should be one of [{tool_names}]\nAction Input: the input to the action\nObservation: the result of the action\n... (this Thought/Action/Action Input/Observation can repeat N times)\nThought: I now know the final answer\nFinal Answer: the final answer to the original input question\n\nBegin!\n\nQuestion: {input}\nThought:{agent_scratchpad}'
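ReAct-style prompts like the one printed above insist on exact labels such as `Final Answer` because their output is usually consumed by a parser that looks for those literal strings. The sketch below is a minimal, hypothetical parser for the Question/Thought/Action/Action Input/Observation/Final Answer format. It is not LangChain's own output parser. Note that this notebook later calls the model in JSON mode, so such a parser only matters if you switch back to plain-text output.

```python
import re
from typing import List, Optional, Tuple

# One labelled step per line; 'Action Input' is tried after 'Action' via backtracking
STEP = re.compile(r"^(Question|Thought|Action|Action Input|Observation|Final Answer):\s*(.*)$")


def parse_react(text: str) -> Tuple[List[Tuple[str, str]], Optional[str]]:
    """Split a ReAct transcript into (label, text) steps and return the last final answer, if any."""
    steps: List[Tuple[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = STEP.match(line)
        if match:
            steps.append((match.group(1), match.group(2)))
        elif steps:
            # Unlabelled continuation line: belongs to the most recent step
            label, body = steps[-1]
            steps[-1] = (label, f"{body}\n{line}" if body else line)
    final_answer = next((body for label, body in reversed(steps) if label == "Final Answer"), None)
    return steps, final_answer
```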


In [26]:
function_user_query = """
Based on the following Plan, create hypothetical functions that should be called to solve each task. 

ALWAYS use the following reasoning for each task:

Question: the input question you must answer
Thought: you should always think about what to do
Action: Gather all information from the task
Observation: the result of the action
... (this Thought/Action/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
Begin! Reminder to always use the exact characters `Final Answer` when responding.

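Respond with a single JSON object that has a top-level "functions" key, following the example schema. Do not add extra commentary before or after.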
"""

## Invoke LLM

In [45]:
from langchain_core.prompts import ChatPromptTemplate

function_template = ChatPromptTemplate.from_messages([
    ("system", "{function_system_prompt}"),
    ("user", "{function_user_query}"),
    ("user", "{plan}")
])
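`function_system_prompt` contains literal JSON braces. That is safe here only because it is injected through a `{...}` variable. `ChatPromptTemplate` uses f-string-style templates by default, which parse braces in the template text itself but not in substituted values. The stdlib `str.format` sketch below illustrates the same rule without calling LangChain.

```python
system_prompt = 'Example: {"type": "function"}'

# Passing JSON-heavy text as a variable value works: values are not re-parsed
rendered = "{function_system_prompt}".format(function_system_prompt=system_prompt)
print(rendered)

# Using the same text directly as a template fails: {"type": ...} looks like a field
try:
    system_prompt.format()
except (KeyError, ValueError, IndexError) as exc:
    print(f"template error: {exc!r}")

# Doubling the braces escapes them when the text must live inside the template
escaped = system_prompt.replace("{", "{{").replace("}", "}}")
print(escaped.format())
```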


In [47]:
json_mode_llm = openai_config.chat_model.with_structured_output(None, method="json_mode")

# chain = function_template | openai_config.chat_model

chain = function_template | json_mode_llm

functions = chain.invoke({"function_system_prompt": function_system_prompt, "function_user_query": function_user_query, "plan": plan})

2024-09-11 16:28:19,007 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
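JSON mode guarantees syntactically valid JSON, not that each entry follows the tools shape requested in `function_system_prompt`. The hypothetical `validate_function_spec` below performs a light structural check before the specs are used. Its name rule reflects OpenAI's documented constraint for function names (letters, digits, underscores, and dashes, at most 64 characters).

```python
import re
from typing import Any, Dict, List

FUNCTION_NAME = re.compile(r"[a-zA-Z0-9_-]{1,64}")


def validate_function_spec(spec: Dict[str, Any]) -> List[str]:
    """Return a list of problems with one {'type': 'function', 'function': {...}} entry."""
    problems: List[str] = []
    if spec.get("type") != "function":
        problems.append("type must be 'function'")
    fn = spec.get("function") or {}
    name = fn.get("name", "")
    if not FUNCTION_NAME.fullmatch(name):
        problems.append(f"invalid function name: {name!r}")
    if not fn.get("description"):
        problems.append(f"{name}: missing description")
    params = fn.get("parameters") or {}
    if params.get("type") != "object":
        problems.append(f"{name}: parameters.type must be 'object'")
    # Every required key must be declared under properties
    properties = params.get("properties", {})
    missing = [key for key in params.get("required", []) if key not in properties]
    if missing:
        problems.append(f"{name}: required keys not in properties: {missing}")
    return problems
```

Applied to this notebook's output, `[validate_function_spec(f) for f in functions.get("functions", [])]` should yield only empty lists when every spec is well-formed.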


In [54]:
print(json.dumps(functions, indent=4))

{
    "functions": [
        {
            "type": "function",
            "function": {
                "name": "acknowledge_incident",
                "description": "Acknowledge the incident in PagerDuty to prevent further notifications and inform the team that the issue is being addressed.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "incident_id": {
                            "type": "string",
                            "description": "The ID of the incident to acknowledge."
                        }
                    },
                    "required": [
                        "incident_id"
                    ],
                    "additionalProperties": false
                }
            }
        },
        {
            "type": "function",
            "function": {
                "name": "check_incident_details",
                "description": "Retrieve detailed information about the
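The system prompt requires a function for every task, but nothing enforces it. The sketch below checks coverage. It assumes the naming convention visible in the outputs above (kebab-case task `acknowledge-incident` maps to snake_case function `acknowledge_incident`), and both helpers are hypothetical. Because `plan` is a JSON string, pass `json.loads(plan)`.

```python
from typing import Any, Dict, List


def to_function_name(task_name: str) -> str:
    """Map a kebab-case task name to the snake_case name used for functions (assumed convention)."""
    return task_name.replace("-", "_")


def tasks_without_function(plan: Dict[str, Any], functions: Dict[str, Any]) -> List[str]:
    """List plan task names that have no function with the corresponding name."""
    function_names = {
        spec.get("function", {}).get("name")
        for spec in functions.get("functions", [])
    }
    return [
        task["name"]
        for task in plan.get("tasks", [])
        if to_function_name(task["name"]) not in function_names
    ]
```

For example, `tasks_without_function(json.loads(plan), functions)` returns an empty list when every task is covered.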

# From Functions to Agents

In [66]:
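import requests


funcs = functions.get("functions", [])
if funcs:
    # Query the local agent registry with the first function's description
    pager_func_description = funcs[0]["function"]["description"]
    response = requests.get(
        "http://localhost:8000/agents",
        params={"query": pager_func_description},
        headers={"Accept": "application/json"},
        timeout=30,  # avoid hanging indefinitely if the registry is down
    )
    response.raise_for_status()
    print(response.json())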



[{'metadata': {'name': 'Acknowledge PagerDuty', 'namespace': 'production', 'description': 'Acknowledge the incident in PagerDuty to prevent further notifications and inform the team that the issue is being addressed.', 'id': '5cdbc649-a707-4d3e-ba9b-88c72a5b02ea', 'ratings_id': '73e76c3a-084a-4135-bae1-10692ca82160'}, 'spec': {'type': 'agent', 'lifecycle': 'stable', 'owner': 'owner50@business.com', 'access_level': 'PUBLIC', 'category': 'DevOps', 'url': 'https://api.business.com/agent-50', 'parameters': {'type': 'object', 'properties': {'incident_id': {'type': 'string', 'description': 'The ID of the incident to acknowledge.'}}, 'required': ['incident_id'], 'additionalProperties': False}, 'output': {'type': 'object', 'description': 'Boolean flag indicating success or failure'}}, 'ratings': {'id': '73e76c3a-084a-4135-bae1-10692ca82160', 'agent_id': '5cdbc649-a707-4d3e-ba9b-88c72a5b02ea', 'data': {'score': 0, 'samples': 0}}}, {'metadata': {'name': 'agent-37', 'namespace': 'sandbox', 'descr
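The registry returns several candidates, including one in the `sandbox` namespace, and the notebook does not say how to choose among them. The sketch below is a hypothetical selection policy. It uses only fields visible in the response (`metadata.description`, `metadata.namespace`, `spec.lifecycle`, `ratings.data.score`) and keeps stable agents in the requested namespace whose description matches exactly, preferring the best-rated one.

```python
from typing import Any, Dict, List, Optional


def pick_agent(agents: List[Dict[str, Any]], description: str,
               namespace: str = "production") -> Optional[Dict[str, Any]]:
    """Pick a stable agent in `namespace` whose description matches exactly, best-rated first."""
    candidates = [
        agent for agent in agents
        if agent.get("metadata", {}).get("description") == description
        and agent.get("metadata", {}).get("namespace") == namespace
        and agent.get("spec", {}).get("lifecycle") == "stable"
    ]
    # Missing ratings count as a score of 0; returns None if nothing qualifies
    return max(
        candidates,
        key=lambda a: a.get("ratings", {}).get("data", {}).get("score", 0),
        default=None,
    )
```

With this notebook's variables, `pick_agent(response.json(), pager_func_description)` would return the production agent if it is the only stable exact match.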

In [67]:
print(json.dumps(response.json(), indent=4))


[
    {
        "metadata": {
            "name": "Acknowledge PagerDuty",
            "namespace": "production",
            "description": "Acknowledge the incident in PagerDuty to prevent further notifications and inform the team that the issue is being addressed.",
            "id": "5cdbc649-a707-4d3e-ba9b-88c72a5b02ea",
            "ratings_id": "73e76c3a-084a-4135-bae1-10692ca82160"
        },
        "spec": {
            "type": "agent",
            "lifecycle": "stable",
            "owner": "owner50@business.com",
            "access_level": "PUBLIC",
            "category": "DevOps",
            "url": "https://api.business.com/agent-50",
            "parameters": {
                "type": "object",
                "properties": {
                    "incident_id": {
                        "type": "string",
                        "description": "The ID of the incident to acknowledge."
                    }
                },
                "required": [
                 