4 changes: 3 additions & 1 deletion .github/CODE_OF_CONDUCT.md
@@ -1,3 +1,5 @@
{/* vale off */}

# Contributor covenant code of conduct

## Our pledge
@@ -119,7 +121,7 @@ version 2.0, available at
[https://www.contributor-covenant.org/version/2/0/code_of_conduct.html][v2.0].

Community Impact Guidelines were inspired by
[Mozilla's code of conduct enforcement ladder][Mozilla CoC].
[Mozilla’s code of conduct enforcement ladder][Mozilla CoC].

For answers to common questions about this code of conduct, see the FAQ at
[https://www.contributor-covenant.org/faq][FAQ]. Translations are available
4 changes: 2 additions & 2 deletions .github/CONTRIBUTING.md
@@ -1,6 +1,6 @@
# Contribution guidelines

We encourage you to participate in this documentation project. We appreciate your help in making Axiom as easy to understand and work with as possible.
Axiom encourages you to participate in this documentation project. The community appreciates your help in making Axiom as easy to understand and work with as possible.

To contribute, fork this repo, and then clone it. For more information, see the [GitHub documentation](https://docs.github.com/en/get-started/exploring-projects-on-github/contributing-to-a-project).

@@ -32,7 +32,7 @@ If you want to contribute but don’t know where to start, browse the open issue
- When you review a PR, use GitHub suggestions for changes where discussion is necessary. For major changes or uncontroversial smaller fixes, commit directly to the branch.
- Let the original creator merge the PR. The reviewer only approves or asks for changes.
- In your comments, be kind, considerate, and constructive.
- If a comment does not apply to the review of the PR, post it on the related issue.
- If a comment doesn’t apply to the review of the PR, post it on the related issue.

## Commits

1 change: 1 addition & 0 deletions .prettierignore
@@ -0,0 +1 @@
/docs.json
2 changes: 2 additions & 0 deletions .vale.ini
@@ -26,6 +26,8 @@ Google.Headings = NO
Google.Parens = NO
Google.Colons = NO
Google.Ordinal = NO
Google.Will = NO
Google.EmDash = NO

# Ignore code surrounded by backticks or plus sign, parameters defaults, URLs, and angle brackets.
TokenIgnores = (<\/?[A-Z].+>), (\x60[^\n\x60]+\x60), ([^\n]+=[^\n]*), (\+[^\n]+\+), (http[^\n]+\[)
12 changes: 6 additions & 6 deletions ai-engineering/concepts.mdx
Expand Up @@ -4,7 +4,7 @@ description: "Learn about the core concepts in Rudder: Capabilities, Collections
keywords: ["ai engineering", "rudder", "concepts", "capability", "grader", "eval"]
---

import { definitions } from '/snippets/definitions.mdx';
import { definitions } from '/snippets/definitions.mdx'

This page defines the core terms used in the Rudder workflow. Understanding these concepts is the first step toward building robust and reliable generative AI capabilities.

@@ -20,7 +20,7 @@ The concepts in Rudder are best understood within the context of the development
The prototype is then tested against a <Tooltip tip={definitions.Collection}>collection</Tooltip> of reference examples (so called “<Tooltip tip={definitions.GroundTruth}>ground truth</Tooltip>”) to measure its quality and effectiveness using <Tooltip tip={definitions.Grader}>graders</Tooltip>. This process is known as an <Tooltip tip={definitions.Eval}>eval</Tooltip>.
</Step>
<Step title="Observe in production">
Once a capability meets quality benchmarks, it's deployed. In production, graders can be applied to live traffic (<Tooltip tip={definitions.OnlineEval}>online evals</Tooltip>) to monitor performance and cost in real-time.
Once a capability meets quality benchmarks, it’s deployed. In production, graders can be applied to live traffic (<Tooltip tip={definitions.OnlineEval}>online evals</Tooltip>) to monitor performance and cost in real-time.
</Step>
<Step title="Iterate with new insights">
Insights from production monitoring reveal edge cases and opportunities for improvement. These new examples are used to refine the capability, expand the ground truth collection, and begin the cycle anew.
@@ -33,7 +33,7 @@ The concepts in Rudder are best understood within the context of the development

A generative AI capability is a system that uses large language models to perform a specific task by transforming inputs into desired outputs.

Capabilities exist on a spectrum of complexity. They can be a simple, single-step function (for example, classifying a support ticket's intent) or evolve into a sophisticated, multi-step agent that uses reasoning and tools to achieve a goal (for example, orchestrating a complete customer support resolution).
Capabilities exist on a spectrum of complexity. They can be a simple, single-step function (for example, classifying a support ticket’s intent) or evolve into a sophisticated, multi-step agent that uses reasoning and tools to achieve a goal (for example, orchestrating a complete customer support resolution).

### Collection

@@ -57,16 +57,16 @@ Annotations are expert-provided labels, corrections, or outputs added to records

### Grader

A grader is a function that scores a capability's output. It programmatically assesses quality by comparing the generated output against ground truth or other criteria, returning a score or judgment. Graders are the reusable, atomic scoring logic used in all forms of evaluation.
A grader is a function that scores a capability’s output. It programmatically assesses quality by comparing the generated output against ground truth or other criteria, returning a score or judgment. Graders are the reusable, atomic scoring logic used in all forms of evaluation.
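
To make the definition concrete, here is a minimal sketch of what a grader can look like. The exact grader interface in the Axiom SDK may differ; the shape below (a plain function that compares output to ground truth and returns a score) is illustrative only.

```typescript
// Illustrative only — the official grader signature in the Axiom SDK may differ.
type GraderResult = { score: number; reason?: string };

// A simple exact-match grader: compares the generated output against ground truth.
function exactMatchGrader(output: string, expected: string): GraderResult {
  const pass = output.trim() === expected.trim();
  return {
    score: pass ? 1 : 0,
    reason: pass ? 'exact match' : 'output differs from ground truth',
  };
}
```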

### Evaluator (Eval)

An evaluator, or eval, is the process of testing a capability against a collection of ground truth data using one or more graders. An eval runs the capability on every record in the collection and reports metrics like accuracy, pass-rate, and cost. Evals are typically run before deployment to benchmark performance.

### Online Eval

An online eval is the process of applying a grader to a capability's live production traffic. This provides real-time feedback on performance degradation, cost, and quality drift, enabling continuous monitoring and improvement.
An online eval is the process of applying a grader to a capability’s live production traffic. This provides real-time feedback on performance degradation, cost, and quality drift, enabling continuous monitoring and improvement.

### What's next?
### What’s next?

Now that you understand the core concepts, see them in action in the Rudder [workflow](/ai-engineering/quickstart).
14 changes: 7 additions & 7 deletions ai-engineering/create.mdx
@@ -4,14 +4,14 @@ description: "Learn how to create and define AI capabilities using structured pr
keywords: ["ai engineering", "rudder", "create", "prompt", "template", "schema"]
---

import { Badge } from "/snippets/badge.jsx";
import { definitions } from '/snippets/definitions.mdx';
import { Badge } from "/snippets/badge.jsx"
import { definitions } from '/snippets/definitions.mdx'

The **Create** stage is about defining a new AI <Tooltip tip={definitions.Capability}>capability</Tooltip> as a structured, version-able asset in your codebase. The goal is to move away from scattered, hard-coded string prompts and toward a more disciplined and organized approach to prompt engineering.

### Defining a capability as a prompt object

In Rudder, every capability is represented by a `Prompt` object. This object serves as the single source of truth for the capability's logic, including its messages, metadata, and the schema for its arguments.
In Rudder, every capability is represented by a `Prompt` object. This object serves as the single source of truth for the capability’s logic, including its messages, metadata, and the schema for its arguments.

For now, these `Prompt` objects can be defined and managed as TypeScript files within your own project repository.
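
As a rough illustration, such a file might look like the sketch below. The field names and the `Type` helper usage are assumptions inferred from the description above and from the collapsed `emailSummarizerPrompt` example further down; treat the `Prompt` type exported by `@axiomhq/ai` as the source of truth.

```typescript
// Sketch only — field names, template syntax, and helper names are assumptions,
// not the official @axiomhq/ai API; see the full emailSummarizerPrompt example below.
import { type Prompt, Type } from '@axiomhq/ai';

export const ticketClassifierPrompt = {
  name: 'ticket-classifier',
  messages: [
    { role: 'system', content: 'Classify the intent of the support ticket.' },
    // The {{ ticketBody }} placeholder syntax is hypothetical.
    { role: 'user', content: 'Ticket: {{ ticketBody }}' },
  ],
  // Schema describing the arguments the prompt expects when rendered.
  arguments: {
    ticketBody: Type.String(),
  },
} satisfies Prompt;
```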

@@ -47,7 +47,7 @@ export const emailSummarizerPrompt = {

### Strongly-typed arguments with `Template`

To ensure that prompts are used correctly, the `@axiomhq/ai` package includes a `Template` type system (exported as `Type`) for defining the schema of a prompt's `arguments`. This provides type safety, autocompletion, and a clear, self-documenting definition of what data the prompt expects.
To ensure that prompts are used correctly, the `@axiomhq/ai` package includes a `Template` type system (exported as `Type`) for defining the schema of a prompt’s `arguments`. This provides type safety, autocompletion, and a clear, self-documenting definition of what data the prompt expects.

The `arguments` object uses `Template` helpers to define the shape of the context:

@@ -78,7 +78,7 @@ export const reportGeneratorPrompt = {
} satisfies Prompt;
```

You can even infer the exact TypeScript type for a prompt's context using the `InferContext` utility.
You can even infer the exact TypeScript type for a prompt’s context using the `InferContext` utility.
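
A minimal usage sketch follows, assuming `InferContext` accepts the prompt’s type as a generic parameter and that the prompt lives at the hypothetical path shown; the exact signature may differ.

```typescript
// Sketch only — assumes InferContext takes the prompt type as a generic parameter.
import { type InferContext } from '@axiomhq/ai';
// Hypothetical file path for the prompt defined earlier on this page.
import { reportGeneratorPrompt } from './prompts/report-generator';

// The derived type mirrors the prompt's `arguments` schema exactly.
type ReportContext = InferContext<typeof reportGeneratorPrompt>;

// Callers now get autocompletion and compile-time checks on the context they pass.
function buildReportInput(context: ReportContext) {
  return { prompt: reportGeneratorPrompt, context };
}
```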

### Prototyping and local testing

@@ -119,8 +119,8 @@ To enable more advanced workflows and collaboration, Axiom is building tools to
* <Badge>Coming soon</Badge> The `axiom` CLI will allow you to `push`, `pull`, and `list` prompt versions directly from your terminal, synchronizing your local files with the Axiom platform.
* <Badge>Coming soon</Badge> The SDK will include methods like `axiom.prompts.create()` and `axiom.prompts.load()` for programmatic access to your managed prompts. This will be the foundation for A/B testing, version comparison, and deploying new prompts without changing your application code.

### What's next?
### What’s next?

Now that you've created and structured your capability, the next step is to measure its quality against a set of known good examples.
Now that you’ve created and structured your capability, the next step is to measure its quality against a set of known good examples.

Learn more about this step of the Rudder workflow in the [Measure](/ai-engineering/measure) docs.
12 changes: 6 additions & 6 deletions ai-engineering/iterate.mdx
@@ -4,14 +4,14 @@ description: "Learn how to iterate on your AI capabilities by using production d
keywords: ["ai engineering", "rudder", "iterate", "improvement", "a/b testing", "champion challenger"]
---

import { Badge } from "/snippets/badge.jsx";
import { definitions } from '/snippets/definitions.mdx';
import { Badge } from "/snippets/badge.jsx"
import { definitions } from '/snippets/definitions.mdx'

<Warning>
The iteration workflow described here is in active development. Axiom is working with design partners to shape what’s built. [Contact Axiom](https://www.axiom.co/contact) to get early access and join a small group of teams shaping these tools.
</Warning>

The **Iterate** stage is where the Rudder workflow comes full circle. It's the process of taking the real-world performance data from the [Observe](/ai-engineering/observe) stage and the quality benchmarks from the [Measure](/ai-engineering/measure) stage, and using them to make concrete improvements to your AI <Tooltip tip={definitions.Capability}>capability</Tooltip>. This creates a cycle of continuous, data-driven enhancement.
The **Iterate** stage is where the Rudder workflow comes full circle. It’s the process of taking the real-world performance data from the [Observe](/ai-engineering/observe) stage and the quality benchmarks from the [Measure](/ai-engineering/measure) stage, and using them to make concrete improvements to your AI <Tooltip tip={definitions.Capability}>capability</Tooltip>. This creates a cycle of continuous, data-driven enhancement.

## Identifying opportunities for improvement

@@ -25,7 +25,7 @@ These examples can be used to create a new, more robust <Tooltip tip={definition

## Testing changes against ground truth

<Badge>Coming soon</Badge> Once you've created a new version of your `Prompt` object, you need to verify that it's actually an improvement. The best way to do this is to run an "offline evaluation"—testing your new version against the same ground truth collection you used in the **Measure** stage.
<Badge>Coming soon</Badge> Once you’ve created a new version of your `Prompt` object, you need to verify that it’s actually an improvement. The best way to do this is to run an "offline evaluation"—testing your new version against the same ground truth collection you used in the **Measure** stage.

The Axiom Console will provide views to compare these evaluation runs side-by-side:

@@ -38,7 +38,7 @@ This ensures you can validate changes with data before they ever reach your user

<Badge>Coming soon</Badge> After a new version of your capability has proven its superiority in offline tests, you can deploy it with confidence. The Rudder workflow will support a champion/challenger pattern, where you can deploy a new "challenger" version to run in shadow mode against a portion of production traffic. This allows for a final validation on real-world data without impacting the user experience.

Once you're satisfied with the challenger's performance, you can promote it to become the new "champion" using the SDK's `deploy` function.
Once you’re satisfied with the challenger’s performance, you can promote it to become the new "champion" using the SDK’s `deploy` function.

```typescript
import { axiom } from './axiom-client';
@@ -50,7 +50,7 @@ await axiom.prompts.deploy('prompt_123', {
});
```

## What's next?
## What’s next?

By completing the Iterate stage, you have closed the loop. Your improved capability is now in production, and you can return to the **Observe** stage to monitor its performance and identify the next opportunity for improvement.

12 changes: 6 additions & 6 deletions ai-engineering/measure.mdx
@@ -4,14 +4,14 @@ description: "Learn how to measure the quality of your AI capabilities by runnin
keywords: ["ai engineering", "rudder", "measure", "evals", "evaluation", "scoring", "graders"]
---

import { Badge } from "/snippets/badge.jsx";
import { definitions } from '/snippets/definitions.mdx';
import { Badge } from "/snippets/badge.jsx"
import { definitions } from '/snippets/definitions.mdx'

<Warning>
The evaluation framework described here is in active development. Axiom is working with design partners to shape what’s built. [Contact Axiom](https://www.axiom.co/contact) to get early access and join a small group of teams shaping these tools.
</Warning>

The **Measure** stage is where you quantify the quality and effectiveness of your AI <Tooltip tip={definitions.Capability}>capability</Tooltip>. Instead of relying on anecdotal checks, this stage uses a systematic process called an <Tooltip tip={definitions.Eval}>eval</Tooltip> to score your capability's performance against a known set of correct examples (<Tooltip tip={definitions.GroundTruth}>ground truth</Tooltip>). This provides a data-driven benchmark to ensure a capability is ready for production and to track its quality over time.
The **Measure** stage is where you quantify the quality and effectiveness of your AI <Tooltip tip={definitions.Capability}>capability</Tooltip>. Instead of relying on anecdotal checks, this stage uses a systematic process called an <Tooltip tip={definitions.Eval}>eval</Tooltip> to score your capability’s performance against a known set of correct examples (<Tooltip tip={definitions.GroundTruth}>ground truth</Tooltip>). This provides a data-driven benchmark to ensure a capability is ready for production and to track its quality over time.

## The `Eval` function

@@ -62,7 +62,7 @@ Eval('text-match-eval', {

## Grading with scorers

<Badge>Coming soon</Badge> A <Tooltip tip={definitions.Grader}>grader</Tooltip> is a function that scores a capability's output. Axiom will provide a library of built-in scorers for common tasks (e.g., checking for semantic similarity, factual correctness, or JSON validity). You can also provide your own custom functions to measure domain-specific logic. Each scorer receives the `input`, the generated `output`, and the `expected` value, and must return a score.
<Badge>Coming soon</Badge> A <Tooltip tip={definitions.Grader}>grader</Tooltip> is a function that scores a capability’s output. Axiom will provide a library of built-in scorers for common tasks (e.g., checking for semantic similarity, factual correctness, or JSON validity). You can also provide your own custom functions to measure domain-specific logic. Each scorer receives the `input`, the generated `output`, and the `expected` value, and must return a score.
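
As a rough sketch, a custom scorer might look like the following. The parameter names and return shape are assumptions; only the general contract (receive `input`, `output`, and `expected`, return a score) comes from the description above.

```typescript
// Sketch only — parameter names and return shape are assumptions, not the official scorer API.
interface ScorerArgs {
  input: string;
  output: string;
  expected: string;
}

// A domain-specific scorer: checks that the output mentions every required keyword.
function keywordCoverageScorer({ output, expected }: ScorerArgs): { score: number } {
  const keywords = expected.split(',').map((k) => k.trim().toLowerCase());
  const hits = keywords.filter((k) => output.toLowerCase().includes(k));
  return { score: keywords.length === 0 ? 1 : hits.length / keywords.length };
}
```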

## Running evaluations

@@ -80,8 +80,8 @@ This command will execute the specified test file using `vitest` in the background

The Console will feature leaderboards and comparison views to track score progression across different versions of a capability, helping you verify that your changes are leading to measurable improvements.

## What's next?
## What’s next?

Once your capability meets your quality benchmarks in the Measure stage, it's ready to be deployed. The next step is to monitor its performance with real-world traffic.
Once your capability meets your quality benchmarks in the Measure stage, it’s ready to be deployed. The next step is to monitor its performance with real-world traffic.

Learn more about this step of the Rudder workflow in the [Observe](/ai-engineering/observe) docs.