
Prompt Engineering with Octopus


The Octopus AI interface uses a Large Language Model (LLM) to answer plain English questions about an Octopus space. LLMs are rapidly evolving and improving their ability to understand the relationships between complex objects, like those that make up an Octopus space.

However, today, the ability to extract useful information when querying an LLM is heavily influenced by the structure of the question. Building questions that generate the desired result is called Prompt Engineering.

Limitations of Octopus AI

Octopus AI is a read-only query interface. You cannot initiate deployments, delete variables, create projects, or modify the space in any way.

It is most useful for understanding the state of your space with plain English queries that would otherwise require a complex script to query the Octopus API. However, you must verify the answers provided by Octopus AI - see the section "Verifying the answers" for more details.
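
For comparison, here is a minimal sketch of the kind of script such a query replaces. It is illustrative only: the server URL and space name are placeholders, the API key is assumed to be in an OCTOPUS_API_KEY environment variable, and the endpoints should be checked against the REST API documentation for your Octopus version.

import os
import requests

# Placeholder values for illustration - replace with your own server and space.
OCTOPUS_URL = "https://your-octopus-instance.octopus.app"
API_KEY = os.environ["OCTOPUS_API_KEY"]
SPACE_NAME = "Default"

headers = {"X-Octopus-ApiKey": API_KEY}

# Resolve the space ID from its name.
spaces = requests.get(f"{OCTOPUS_URL}/api/spaces/all", headers=headers).json()
space_id = next(space["Id"] for space in spaces if space["Name"] == SPACE_NAME)

# List every project in the space with its description.
projects = requests.get(f"{OCTOPUS_URL}/api/{space_id}/projects/all", headers=headers).json()
for project in projects:
    print(project["Name"], "-", project.get("Description") or "(no description)")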

The query interface does not currently know about things like:

  • Licenses
  • Instance configuration
  • Users and teams
  • Logins
  • Audits
  • Database configuration

Asking specific questions

LLMs have a context window that holds the relevant information from an Octopus space, including details about resources like projects, accounts, feeds, variables, targets, deployments, etc.

The context window is too small to include all the information about an Octopus space, so only a subset of resources are placed into the context to allow the LLM to answer a question.

Naming specific resources allows the AI agent to narrowly select resources to be placed in the context. For example, the following query names a specific project, meaning only that project is placed in the context:

What is the description of the project "Deploy Web App"?

Broad questions that do not name a specific resource mean that the AI agent will place all resources of that type (up to a predefined limit) in the context. For example, this question relates to every project:

How many projects are there in this space?

Broad questions are very likely to produce inaccurate answers because:

  • The LLM context is not large enough to hold every resource in a large space, and so only a subset of resources (around 20, although that number may vary) will be placed into the context. This means some resources are ignored.
  • The answers provided by LLMs degrade when they have to answer questions about large data sets.

While the AI agent will attempt to accommodate broad questions, it is always better to limit questions to specific named resources.
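
The exact selection logic is internal to the AI agent, but conceptually it behaves something like the sketch below. The function, the fields, and the limit of 20 are purely illustrative assumptions, not the actual implementation:

def select_projects_for_context(query, all_projects, limit=20):
    """Illustrative only: choose which projects to place in the LLM context."""
    # If the query names a specific project, include just that project.
    named = [p for p in all_projects if p["Name"].lower() in query.lower()]
    if named:
        return named
    # Otherwise fall back to a broad selection capped at a predefined limit,
    # which is why broad questions can silently ignore some resources.
    return all_projects[:limit]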

Including resources in the context

The most important factor when querying Octopus AI is to include the correct keywords to identify the kinds of resources the question relates to. LLMs have a context limit, which means they can only be supplied a small subset of the configuration that makes up an Octopus space. The subset of resources included in the context is defined by a number of keywords you must include in a query.

The table below lists the keywords recognized by Octopus AI, the resources the keywords relate to, and an example query demonstrating how the keyword is used.

| Resource | Keyword | Good Example | Bad Example | Notes |
| --- | --- | --- | --- | --- |
| Deployment Projects | project | What does the "Deploy Web App" project do? | What does "Deploy Web App" do? | |
| Deployment targets or agents | target, machine, or agent | What URL is the "K8s" machine configured with? | What URL does "K8s" have? | Prefer to use the word machine over target, as target can sometimes trigger the LLM content filters. |
| Runbooks | runbook | Does the runbook "restart web server" in the project "web app" include a bash script? | Does "restart web server" include a bash script? | Include the project that the runbook belongs to. |
| Tenants | tenant | What is the description of the "US-EAST-1" tenant? | What is the description of "US-EAST-1"? | |
| Library Variable Sets | library variable set | What variables are included in the library variable set "AWS"? | What variables are included in "AWS"? | |
| Environments | environment | Does the "Production" environment allow dynamic infrastructure? | Does "Production" allow dynamic infrastructure? | |
| Feeds | feed | What URL does the "Docker" feed use? | What URL does "Docker" use? | |
| Accounts | account | What is the access key for the "AWS Production" account? | What is the access key for "AWS Production"? | |
| Certificates | certificate | What is the description of the "Test Web Server" certificate? | What is the description of "Test Web Server"? | |
| Lifecycles | lifecycle | What environments are part of the "Application" lifecycle? | What environments are part of "Application"? | |
| Worker Pools | worker pool | What is the description of the "Azure" worker pool? | What is the description of "Azure"? | |
| Machine Policies | policy | What does the health check script for the "K8s Worker" machine policy do? | What does the health check script for "K8s Worker" do? | |
| Tenant Tag Sets | tag | What tags are assigned to the tenant "Store 1"? | What groups does the tenant "Store 1" belong to? | |
| Project Groups | group | What projects belong to the "Default" project group? | What belongs to "Default"? | |
| Deployment Process or Runbook steps | step | What does the "Deploy to Azure" step in the "Web App" project do? | What does "Deploy to Azure" do? | |
| Variables | variable | Which steps in the "Web App" project is the "DatabasePassword" variable used in? | Where is "DatabasePassword" used? | |
| Releases | release | What is the latest release for the "Web App" project? | What is the latest version of the "Web App" project? | |

Verifying the answers

LLMs are non-deterministic. This means that you will often get a different result when asking the same question multiple times. More importantly, it also means LLMs do not always provide the correct answer, even when asked the same question that was previously answered correctly.

The currently available LLMs are also often unable to answer questions that require processing complex relationships. For example, the following question will often confuse today's LLMs:

Bob is in the living room.
He walks to the kitchen, carrying a cup.
He puts a ball in the cup and carries the cup to the bedroom.
He turns the cup upside down, then walks to the garden.
He puts the cup down in the garden, then walks to the garage.
Where is the ball?

In addition, the limited context that can be passed to an LLM means that it may not have all the information required to answer the question correctly.

This means you must always confirm the answers provided by Octopus AI before making any decisions.

In practice, this means Octopus AI can be useful for indicating the presence of something, such as an error in a deployment log, that can then be independently verified. Octopus AI is less useful for indicating the absence of something, such as a report of a dependency vulnerability, as verifying the absence would require inspecting all the relevant data manually, negating the point of using Octopus AI in the first place.
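
When the answer points to something concrete, such as an error in a deployment log, a short script against the Octopus API can confirm it. The sketch below is illustrative only: the connection details and IDs are placeholders, and the deployments filters and the tasks "raw" endpoint are assumptions to confirm against the API documentation for your Octopus version.

import os
import requests

# Placeholder connection details and IDs for illustration only.
OCTOPUS_URL = "https://your-octopus-instance.octopus.app"
headers = {"X-Octopus-ApiKey": os.environ["OCTOPUS_API_KEY"]}
space_id = "Spaces-1"
project_id = "Projects-1"
environment_id = "Environments-1"

# Fetch the most recent deployment of the project to the environment.
# The "projects"/"environments" filters and the "/raw" task log endpoint are
# assumptions - confirm them against the API documentation for your Octopus version.
deployments = requests.get(
    f"{OCTOPUS_URL}/api/{space_id}/deployments",
    params={"projects": project_id, "environments": environment_id, "take": 1},
    headers=headers,
).json()["Items"]

if deployments:
    task_id = deployments[0]["TaskId"]
    log = requests.get(f"{OCTOPUS_URL}/api/tasks/{task_id}/raw", headers=headers).text
    # Independently confirm that the error the LLM reported actually appears in the log.
    error_lines = [line for line in log.splitlines() if "error" in line.lower()]
    print("\n".join(error_lines) or "No error lines found")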

Small changes can have a big impact

Making small changes to your query can have a big impact on the answer. For example, this query has previously failed to find the list of URLs in a deployment log:

Find any URLs contained in the deployment logs for the "Octopus Copilot Function" project in the "Production" environment.

The following query replaces the word find with extract, and generates the desired answer:

Extract any URLs contained in the deployment logs for the "Octopus Copilot Function" project in the "Production" environment.

It is not obvious why such a small change can be the difference between getting the answer you need and not getting it. The opaque nature of LLMs is a well-known challenge.

So if you do not get the results you expect, try altering your query slightly to see if that affects the results.

Ask "what" questions to query the state of your Octopus instance

Prefer asking questions like:

What is the latest deployment of the project "Project Name" in the "Production" environment?

Asking what the answer is makes it more likely that you get the actual result.

Avoid statements like these:

Find the latest deployment of the project "Project Name" in the "Production" environment.
List the latest deployment of the project "Project Name" in the "Production" environment.
Display the latest deployment of the project "Project Name" in the "Production" environment.
Print the latest deployment of the project "Project Name" in the "Production" environment.

They often result in the LLM providing instructions on how to find the information rather than providing the answer.

Ask "how" questions to learn how to do something

Prefer questions that ask "how" to do something when seeking instructions on the general use of Octopus, for example:

How do I view the deployment frequency?
How do I create an Azure target?
How do I configure a project with config-as-code?

Avoid statements like these:

Where do I view the deployment frequency?
Where do I configure config-as-code?

Give the LLM permission to not provide an answer

LLMs really want to give you an answer, even if the correct answer is that there is no answer or there are no matches to your question.

For example, this question attempts to find any manual intervention steps in a project that does not have any:

What are the steps in the "EKS Octopub Audits" project with a step type of "Octopus.Manual"? Show the step name and type in a markdown table.

The LLM will helpfully (and incorrectly) report that every step is of type Octopus.Manual, and then list every step even though none of them has that step type:

Here is the list of steps in the "EKS Octopub Audits" project with a step type of "Octopus.Manual":

Step Name	        Step Type
Generate Variables	Octopus.Script
Smoke Test	        Octopus.KubernetesRunScript
Security Scan	        Octopus.Script
Feedback	        Octopus.Script

By giving the LLM permission to report if there are no matching steps, we get a more accurate answer:

What are the steps in the "EKS Octopub Audits" project with a step type of "Octopus.Manual"? If you can not find any matching steps, say so in the answer. Show the step name and type in a markdown table.

This is the result:

The project "EKS Octopub Audits" has the following steps with a step type of "Octopus.Manual":

Step Name	        Step Type
Generate Variables	Octopus.Script
Smoke Test	        Octopus.KubernetesRunScript
Security Scan	        Octopus.Script
Feedback	        Octopus.Script

Note: There are no steps with a step type of "Octopus.Manual" in the "EKS Octopub Audits" project.
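
As with deployment logs, a short script against the Octopus API can independently confirm the claim. The sketch below is illustrative only: the connection details are placeholders, and the deployment process fields shown are assumptions to check against your Octopus version.

import os
import requests

# Placeholder connection details for illustration only.
OCTOPUS_URL = "https://your-octopus-instance.octopus.app"
headers = {"X-Octopus-ApiKey": os.environ["OCTOPUS_API_KEY"]}
space_id = "Spaces-1"

# Look up the project by name, then inspect its deployment process.
projects = requests.get(f"{OCTOPUS_URL}/api/{space_id}/projects/all", headers=headers).json()
project = next(p for p in projects if p["Name"] == "EKS Octopub Audits")

process = requests.get(
    f"{OCTOPUS_URL}/api/{space_id}/deploymentprocesses/{project['DeploymentProcessId']}",
    headers=headers,
).json()

# Print each step with its action type and collect any manual intervention steps.
manual_steps = []
for step in process["Steps"]:
    for action in step["Actions"]:
        print(step["Name"], "-", action["ActionType"])
        if action["ActionType"] == "Octopus.Manual":
            manual_steps.append(step["Name"])

print("Manual intervention steps:", manual_steps or "none")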

Use timezone aware date formats

The LLM doesn't know what your local timezone is, so when entering dates, use a timezone aware format like ISO 8601.

The LLM also does not know which timezone to convert times into, so explicitly ask for output values to be converted to your timezone.

This is an example query that supplies timezone aware dates and asks for the output to include times formatted for a specific timezone:

Your answer will be displayed as a markdown table. 
The status "success" is represented with the 🟢 character. 
The status "In Progress" is represented by the 🔵 character.
Other statuses are represented with the 🔴 character. 
Find the deployments created after 2024-04-07T00:00:00+10:00 and before 2024-04-08T00:00:00+10:00.
List the release version, created time (in the Brisbane timezone), release notes, deployer, and status.
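
If you build these timestamps in a script rather than by hand, make sure they carry an explicit offset. Here is a minimal sketch, assuming the Brisbane timezone used in the example above:

from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

# Build timezone-aware timestamps for the Brisbane timezone (UTC+10).
brisbane = ZoneInfo("Australia/Brisbane")
start = datetime(2024, 4, 7, tzinfo=brisbane)
end = start + timedelta(days=1)

# isoformat() produces ISO 8601 strings with the offset included,
# e.g. 2024-04-07T00:00:00+10:00, ready to paste into a query.
print(start.isoformat())
print(end.isoformat())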

Avoid HTML in context

Avoid placing HTML, or anything that looks like HTML, into the query. In practice, this means not using the greater than (>) or less than (<) symbols in a question.

Responses that mention HCL or Terraform

The context passed to the LLM is a Terraform representation of the resources in a space, and HCL is the language used to define Terraform resources. The LLM may therefore reference Terraform or HCL in its answer even though the question did not mention them, because it was given your original question alongside a large blob of HCL describing the state of the space as the context from which to answer the question.