Merged
10 changes: 10 additions & 0 deletions .github/workflows/ci.yml
@@ -52,6 +52,16 @@ jobs:
        with:
          ruby-version: ${{ matrix.ruby }}
          bundler-cache: true
      - name: Setup database
        env:
          RAILS_ENV: test
          RAILS_MASTER_KEY: ${{ secrets.RAILS_MASTER_KEY }}
          BUNDLE_GEMFILE: ${{ github.workspace }}/${{ matrix.gemfile }}
        run: |
          cd test/dummy
          bundle exec rails db:create
          bundle exec rails db:migrate
          cd ../..
      - name: Run tests
        env:
          RAILS_ENV: test
1 change: 1 addition & 0 deletions .tool-versions
@@ -0,0 +1 @@
nodejs 24.7.0
36 changes: 36 additions & 0 deletions CLAUDE.md
@@ -195,9 +195,16 @@ This repository follows a strict documentation process to ensure all code exampl
### Key Principles

1. **No hardcoded code blocks** - All code must come from tested files
   - NEVER use ``` code blocks in docs/docs/ directory
   - ALL code examples must use `<<<` imports from tested files
   - Code blocks (```) should ONLY appear in deterministically generated docs/parts/ files from test helper
2. **Use `<<<` imports only** - Import code from actual tested implementation and test files
3. **Test everything** - If it's in docs, it must have a test
4. **Include outputs** - Use `doc_example_output` for response examples
5. **Configuration examples** - Must come from actual config files with proper regions
   - ALWAYS include the `service:` key in provider configurations
   - Use regions in config files (e.g., test/dummy/config/active_agent.yml)
   - Import config examples using VitePress snippets with regions

### Import Patterns

@@ -1300,3 +1307,32 @@ When updating documentation:
- VCR cassettes need to be removed and tests run again to record new cassettes when the request params change

- Do not hardcode examples; use VS Code regions and VitePress code snippet imports

- Use `bin/rubocop -a` to autofix linting issues
- Follow the testing procedures so that code examples are deterministic and tested; never hardcode code examples in docs; always use VitePress snippet imports together with the test helper for example outputs

## Critical Documentation Rules (MUST FOLLOW)

### NEVER Hardcode Examples
- ❌ NEVER write ```ruby, ```yaml, ```bash or any ``` code blocks in docs/docs/
- ✅ ALWAYS use <<< imports from tested files
- ✅ Use regions in test files for specific snippets
- ✅ Generated examples go in docs/parts/examples/ via doc_example_output

### Configuration Documentation
- ❌ NEVER hardcode config examples like:
  ```yaml
  openai:
    access_token: ...
  ```
- ✅ ALWAYS use actual config files with regions:
  - Add regions to test/dummy/config/active_agent.yml
  - Import with: `<<< @/../test/dummy/config/active_agent.yml#region_name{yaml}`
- ⚠️ REMEMBER: All provider configs MUST have `service:` key or they won't load

### Testing Before Documenting
1. Write the test first
2. Add regions for important snippets
3. Call doc_example_output for response examples
4. Import in docs using VitePress snippets
5. Verify with `npm run docs:build` - no hardcoded blocks should exist
4 changes: 2 additions & 2 deletions docs/.vitepress/config.mts
@@ -102,15 +102,15 @@ export default defineConfig({
},
{ text: 'Agents',
items: [
- { text: 'Browser User', link: '/docs/agents/browser-use-agent' },
+ { text: 'Browser Use', link: '/docs/agents/browser-use-agent' },
{ text: 'Data Extraction', link: '/docs/agents/data-extraction-agent' },
{ text: 'Translation', link: '/docs/agents/translation-agent' },
]
},
{ text: 'Active Agent',
items: [
// { text: 'Generative UI', link: '/docs/active-agent/generative-ui' },
- { text: 'Structured Output', link: '/docs/agents/data-extraction-agent#structured-output' },
+ { text: 'Structured Output', link: '/docs/active-agent/structured-output' },
{ text: 'Callbacks', link: '/docs/active-agent/callbacks' },
{ text: 'Generation', link: '/docs/active-agent/generation' },
{ text: 'Queued Generation', link: '/docs/active-agent/queued-generation' },
239 changes: 239 additions & 0 deletions docs/docs/active-agent/structured-output.md
@@ -0,0 +1,239 @@
# Structured Output

Structured output lets agents return responses in a predefined JSON format, so extracted data has a consistent, predictable shape. ActiveAgent supports it through JSON schemas, which you can write by hand or generate automatically from your models.

## Overview

Structured output ensures AI responses conform to a specific JSON schema, making it ideal for:
- Data extraction from unstructured text, images, and documents
- API integrations requiring consistent response formats
- Form processing and validation
- Database record creation from natural language

## Key Features

### Automatic JSON Parsing
When using structured output, responses are automatically:
- Tagged with `content_type: "application/json"`
- Parsed from JSON strings to Ruby hashes
- Validated against the provided schema
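
To make the first two steps concrete, here is a minimal plain-Ruby sketch of tagging and parsing. `SketchMessage` and `build_message` are hypothetical names invented for illustration; they are not ActiveAgent's classes, and the sketch does not attempt schema validation.

```ruby
require "json"

# Hypothetical stand-in for a response message (not ActiveAgent's actual class).
SketchMessage = Struct.new(:content, :content_type, :raw_content, keyword_init: true)

# Build a message from the provider's raw text. When structured output was
# requested, tag the message as JSON and parse the string into a Hash.
def build_message(raw, structured:)
  unless structured
    return SketchMessage.new(content: raw, content_type: "text/plain", raw_content: raw)
  end

  SketchMessage.new(
    content: JSON.parse(raw),
    content_type: "application/json",
    raw_content: raw
  )
end

message = build_message('{"name":"John","age":30}', structured: true)
puts message.content_type     # application/json
puts message.content["name"]  # John
```

The original string stays available in `raw_content`, which is useful for logging or for debugging a response that parsed into something unexpected.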

### Schema Generator
ActiveAgent includes a `SchemaGenerator` module that creates JSON schemas from:
- ActiveRecord models with database columns and validations
- ActiveModel classes with attributes and validations
- Custom Ruby classes with the module included
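
As a rough illustration of the idea (not the `SchemaGenerator` implementation), a schema can be derived from a list of attribute names and types. The `JSON_TYPES` mapping and the `sketch_schema` helper below are assumptions made up for this sketch.

```ruby
require "json"

# Assumed mapping from attribute types to JSON Schema types (illustrative only).
JSON_TYPES = {
  string: "string",
  integer: "integer",
  float: "number",
  boolean: "boolean"
}.freeze

# Build a JSON Schema object from a { attribute_name => type } Hash.
def sketch_schema(attributes)
  properties = attributes.to_h do |name, type|
    [name.to_s, { "type" => JSON_TYPES.fetch(type) }]
  end
  { "type" => "object", "properties" => properties }
end

attributes = { name: :string, age: :integer }
schema = sketch_schema(attributes)
puts JSON.pretty_generate(schema)
```

Using `fetch` means an unmapped type raises a `KeyError` instead of silently producing an invalid schema.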

## Quick Start

### Using Model Schema Generation

ActiveAgent can automatically generate schemas from your Rails models:

<<< @/../test/schema_generator_test.rb#agent_using_schema {ruby:line-numbers}

### Basic Structured Output Example

Define a schema and use it with the `output_schema` parameter:

<<< @/../test/integration/structured_output_json_parsing_test.rb#34-70{ruby:line-numbers}

The response will automatically have:
- `content_type` set to `"application/json"`
- `content` parsed as a Ruby Hash
- `raw_content` available as the original JSON string

## Schema Generation

### From ActiveModel

Create schemas from ActiveModel classes with validations:

<<< @/../test/schema_generator_test.rb#basic_user_model {ruby:line-numbers}

Generate the schema:

<<< @/../test/schema_generator_test.rb#basic_schema_generation {ruby:line-numbers}

### From ActiveRecord

Generate schemas from database-backed models:

<<< @/../test/schema_generator_test.rb#activerecord_schema_generation {ruby:line-numbers}

### Strict Schemas

For providers that require strict schemas (such as OpenAI), pass `strict: true`:

<<< @/../test/schema_generator_test.rb#strict_schema_generation {ruby:line-numbers}

In strict mode:
- All properties are marked as required
- `additionalProperties` is set to false
- The schema is wrapped with name and strict flags
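
The three rules above can be sketched in plain Ruby. `strictify` is a hypothetical helper, and the wrapper keys (`"name"`, `"strict"`, `"schema"`) are an assumption modeled on the list; check the generated output of `to_json_schema(strict: true)` for the exact shape.

```ruby
# Hypothetical helper: apply the strict-mode rules to a plain JSON Schema.
def strictify(schema, name:)
  strict_schema = schema.merge(
    "required" => schema.fetch("properties", {}).keys, # every property is required
    "additionalProperties" => false                    # no extra keys allowed
  )
  # Assumed wrapper shape: a name, a strict flag, and the schema itself.
  { "name" => name, "strict" => true, "schema" => strict_schema }
end

base = {
  "type" => "object",
  "properties" => {
    "title" => { "type" => "string" },
    "published" => { "type" => "boolean" }
  }
}

wrapped = strictify(base, name: "blog_post")
p wrapped["schema"]["required"]  # ["title", "published"]
```

`merge` returns a new Hash, so the original `base` schema is left untouched and can still be used in non-strict form.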

### Excluding Fields

Exclude sensitive or unnecessary fields from schemas:

<<< @/../test/schema_generator_test.rb#schema_with_exclusions {ruby:line-numbers}
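
Conceptually, excluding a field removes it from both `properties` and `required`. The following is a plain-Ruby sketch of that effect using a hypothetical `exclude_fields` helper; it is not the option the generator itself exposes.

```ruby
# Hypothetical helper: drop named properties and their `required` entries.
def exclude_fields(schema, *fields)
  names = fields.map(&:to_s)
  result = schema.merge(
    "properties" => schema.fetch("properties", {}).reject { |key, _| names.include?(key) }
  )
  result["required"] = schema["required"] - names if schema["required"]
  result
end

schema = {
  "type" => "object",
  "properties" => {
    "email" => { "type" => "string" },
    "password_digest" => { "type" => "string" }
  },
  "required" => ["email", "password_digest"]
}

public_schema = exclude_fields(schema, :password_digest)
p public_schema["properties"].keys  # ["email"]
p public_schema["required"]         # ["email"]
```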

## JSON Response Handling

### Automatic Parsing

With structured output, responses are automatically parsed:

```ruby
# Without structured output
response = agent.prompt(message: "Hello").generate_now
response.message.content # => "Hello! How can I help?"
response.message.content_type # => "text/plain"

# With structured output
response = agent.prompt(
  message: "Extract user data",
  output_schema: schema
).generate_now
response.message.content # => { "name" => "John", "age" => 30 }
response.message.content_type # => "application/json"
response.message.raw_content # => '{"name":"John","age":30}'
```

### Error Handling

Handle JSON parsing errors gracefully:

<<< @/../test/integration/structured_output_json_parsing_test.rb#155-169{ruby:line-numbers}
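
One possible fallback strategy, sketched in plain Ruby: try to parse, and if the provider returned malformed JSON, keep the raw string and report the error. `parse_structured` is a hypothetical helper, and this is not necessarily how ActiveAgent handles the failure internally.

```ruby
require "json"

# Parse a provider's raw text as JSON.
# Returns [content, error]: a Hash and nil on success,
# or the untouched raw string and an error message on failure.
def parse_structured(raw)
  [JSON.parse(raw), nil]
rescue JSON::ParserError => e
  [raw, e.message]
end

content, error = parse_structured('{"name": "John"')  # truncated JSON
if error
  warn "Could not parse structured output; keeping raw content"
else
  puts content["name"]
end
```

Rescuing `JSON::ParserError` specifically, rather than using a bare inline `rescue`, avoids hiding unrelated failures.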

## Provider Support

Different AI providers have varying levels of structured output support:

- **[OpenAI](/docs/generation-providers/openai-provider#structured-output)** - Native JSON mode with strict schema validation
- **[OpenRouter](/docs/generation-providers/open-router-provider#structured-output-support)** - Support through compatible models, ideal for multimodal tasks
- **[Anthropic](/docs/generation-providers/anthropic-provider#structured-output)** - Instruction-based JSON generation
- **[Ollama](/docs/generation-providers/ollama-provider#structured-output)** - Local model support with JSON mode

## Real-World Examples

### Data Extraction Agent

The [Data Extraction Agent](/docs/agents/data-extraction-agent#structured-output) demonstrates comprehensive structured output usage:

<<< @/../test/agents/data_extraction_agent_test.rb#data_extraction_agent_parse_chart_with_structured_output {ruby:line-numbers}

### Integration with Rails Models

Use your existing Rails models for schema generation:

<<< @/../test/integration/structured_output_json_parsing_test.rb#110-137{ruby:line-numbers}

## Best Practices

### 1. Use Model Schemas
Use your ActiveRecord/ActiveModel classes as the single source of truth:

```ruby
class User < ApplicationRecord
  include ActiveAgent::SchemaGenerator

  validates :email, presence: true, format: { with: URI::MailTo::EMAIL_REGEXP }
  validates :age, numericality: { greater_than: 18 }
end

# In your agent
schema = User.to_json_schema(strict: true, name: "user_data")
prompt(output_schema: schema)
```

### 2. Schema Design
- Keep schemas focused and minimal
- Use strict mode for critical data
- Include validation constraints
- Provide clear descriptions for complex fields

### 3. Testing
Always test structured output with real providers:

```ruby
test "extracts data with correct schema" do
  VCR.use_cassette("structured_extraction") do
    response = agent.extract_data.generate_now

    assert_equal "application/json", response.message.content_type
    assert_kind_of Hash, response.message.content
    # assert_valid_schema is assumed to be a project-defined helper; it is not part of Minitest
    assert_valid_schema response.message.content, expected_schema
  end
end
```

## Migration Guide

### From Manual JSON Parsing

Before:
```ruby
response = agent.prompt(message: "Extract data as JSON").generate_now
data = JSON.parse(response.message.content) rescue {}
```

After:
```ruby
response = agent.prompt(
  message: "Extract data",
  output_schema: MyModel.to_json_schema(strict: true)
).generate_now
data = response.message.content # Already parsed!
```

### From Custom Schemas

Before:
```ruby
schema = {
  type: "object",
  properties: {
    name: { type: "string" },
    age: { type: "integer" }
  }
}
```

After:
```ruby
class ExtractedUser
  include ActiveModel::Model
  include ActiveModel::Attributes # provides the `attribute` class method
  include ActiveAgent::SchemaGenerator

  attribute :name, :string
  attribute :age, :integer
end

schema = ExtractedUser.to_json_schema(strict: true)
```

## Troubleshooting

### Common Issues

**Invalid JSON Response**
- Ensure provider supports structured output
- Check model compatibility
- Verify schema is valid JSON Schema

**Missing Fields**
- Use strict mode to require all fields
- Add validation constraints to model
- Check provider documentation for limitations

**Type Mismatches**
- Ensure schema types match provider capabilities
- Use appropriate type coercion in models
- Test with actual provider responses
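
When debugging missing fields or type mismatches, a small check of the parsed Hash against the schema can narrow things down. This is a plain-Ruby sketch with a hypothetical `schema_problems` helper that covers only `required` and top-level `type`; it is not a full JSON Schema validator.

```ruby
# Ruby classes that each JSON Schema type is expected to parse into.
RUBY_CLASSES = {
  "string" => [String],
  "integer" => [Integer],
  "number" => [Integer, Float],
  "boolean" => [TrueClass, FalseClass],
  "object" => [Hash],
  "array" => [Array]
}.freeze

# Return a list of human-readable problems (empty when none are found).
def schema_problems(data, schema)
  problems = []
  Array(schema["required"]).each do |key|
    problems << "missing field: #{key}" unless data.key?(key)
  end
  schema.fetch("properties", {}).each do |key, spec|
    next unless data.key?(key)

    allowed = RUBY_CLASSES.fetch(spec["type"], [Object])
    next if allowed.any? { |klass| data[key].is_a?(klass) }

    problems << "type mismatch: #{key} should be #{spec["type"]}"
  end
  problems
end

schema = {
  "type" => "object",
  "properties" => {
    "name" => { "type" => "string" },
    "age" => { "type" => "integer" },
    "email" => { "type" => "string" }
  },
  "required" => ["name", "age", "email"]
}

problems = schema_problems({ "name" => "John", "age" => "30" }, schema)
puts problems
# missing field: email
# type mismatch: age should be integer
```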

## See Also

- [Data Extraction Agent](/docs/agents/data-extraction-agent) - Complete extraction examples
- [OpenAI Structured Output](/docs/generation-providers/openai-provider#structured-output) - OpenAI implementation details
- [OpenRouter Structured Output](/docs/generation-providers/open-router-provider#structured-output-support) - Multimodal extraction
36 changes: 36 additions & 0 deletions docs/docs/agents/data-extraction-agent.md
@@ -57,6 +57,42 @@ When using structured output:
- The response content will be valid JSON matching your schema
- Parse the response with `JSON.parse(response.message.content)`

#### Generating Schemas from Models

ActiveAgent provides a `SchemaGenerator` module that can automatically create JSON schemas from your ActiveRecord and ActiveModel classes. This makes it easy to ensure extracted data matches your application's data models.

##### Basic Usage

::: code-group
<<< @/../test/schema_generator_test.rb#basic_user_model {ruby:line-numbers}
<<< @/../test/schema_generator_test.rb#basic_schema_generation {ruby:line-numbers}
:::

The `to_json_schema` method generates a JSON schema from your model's attributes and validations.

##### Schema with Validations

Model validations are automatically included in the generated schema:

<<< @/../test/schema_generator_test.rb#schema_with_validations {ruby:line-numbers}
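
To give a feel for what "included in the schema" can mean, here is a plain-Ruby sketch that maps a few validation kinds to JSON Schema keywords. The `constraints_for` helper, the simplified validation Hash, and the specific mapping (presence to `required`, length to `minLength`/`maxLength`, `greater_than` to `exclusiveMinimum`) are all assumptions for illustration; the imported test shows what the generator actually emits.

```ruby
# Hypothetical, simplified validation description:
#   { attribute => { validation_kind => options } }
def constraints_for(validations)
  required = []
  constraints = Hash.new { |hash, key| hash[key] = {} }

  validations.each do |attribute, rules|
    name = attribute.to_s
    rules.each do |kind, options|
      case kind
      when :presence
        required << name
      when :length
        constraints[name]["minLength"] = options[:minimum] if options[:minimum]
        constraints[name]["maxLength"] = options[:maximum] if options[:maximum]
      when :numericality
        constraints[name]["exclusiveMinimum"] = options[:greater_than] if options[:greater_than]
      end
    end
  end

  { "required" => required, "constraints" => constraints }
end

validations = {
  email: { presence: true },
  name: { presence: true, length: { minimum: 2, maximum: 50 } },
  age: { numericality: { greater_than: 18 } }
}

result = constraints_for(validations)
p result["required"]  # ["email", "name"]
```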

##### Strict Schema for Structured Output

For use with AI providers that support structured output, generate a strict schema:

::: code-group
<<< @/../test/schema_generator_test.rb#blog_post_model {ruby:line-numbers}
<<< @/../test/schema_generator_test.rb#strict_schema_generation {ruby:line-numbers}
:::

##### Using Generated Schemas in Agents

Agents can use the schema generator to create structured output schemas dynamically:

<<< @/../test/schema_generator_test.rb#agent_using_schema {ruby:line-numbers}

This allows you to maintain a single source of truth for your data models and automatically generate schemas for AI extraction.

::: info Provider Support
Structured output requires a generation provider that supports JSON schemas. Currently supported providers include:
- **OpenAI** - GPT-4o, GPT-4o-mini, GPT-3.5-turbo variants