diff --git a/Gemfile.lock b/Gemfile.lock index db81b181..f36fa332 100644 --- a/Gemfile.lock +++ b/Gemfile.lock @@ -1,7 +1,7 @@ PATH remote: . specs: - activeagent (0.5.0) + activeagent (0.6.0rc1) actionpack (>= 7.2, < 9.0) actionview (>= 7.2, < 9.0) activejob (>= 7.2, < 9.0) diff --git a/docs/.vitepress/config.mts b/docs/.vitepress/config.mts index cbeed4ab..5b7673c1 100644 --- a/docs/.vitepress/config.mts +++ b/docs/.vitepress/config.mts @@ -82,6 +82,9 @@ export default defineConfig({ { text: 'Generation Providers', items: [ + { text: 'OpenAI', link: '/docs/generation-providers/openai-provider' }, + { text: 'Anthropic', link: '/docs/generation-providers/anthropic-provider' }, + { text: 'Ollama', link: '/docs/generation-providers/ollama-provider' }, { text: 'OpenRouter', link: '/docs/generation-providers/open-router-provider' }, ] }, diff --git a/docs/docs/action-prompt/actions.md b/docs/docs/action-prompt/actions.md index 3323376f..7d846c3b 100644 --- a/docs/docs/action-prompt/actions.md +++ b/docs/docs/action-prompt/actions.md @@ -5,15 +5,23 @@ Active Agent uses Action View to render Message content for [Prompt](./prompts.m The `prompt` method is used to render the action's content as a message in a prompt. The `prompt` method is similar to `mail` in Action Mailer or `render` in Action Controller, it allows you to specify the content type and view template for the action's response. ```ruby -ApplicationAgent.new.prompt( - content_type: :text, # or :json, :html, etc. - message: "Hello, world!", # The message content to be rendered - messages: [], # Additional messages to include in the prompt context - template_name: "action_template", # The name of the view template to be used - instructions: { template: "instructions" }, # Optional instructions for the prompt generation - actions: [], # Available actions for the agent to use - output_schema: :schema_name # Optional schema for structured output -) +# The prompt method is typically called within an action +class MyAgent < ApplicationAgent + def my_action + prompt( + content_type: :text, # or :json, :html, etc. + message: "Hello, world!", # The message content to be rendered + messages: [], # Additional messages to include in the prompt context + template_name: "action_template", # The name of the view template to be used + instructions: { template: "instructions" }, # Optional instructions for the prompt generation + actions: [], # Available actions for the agent to use + output_schema: :schema_name # Optional schema for structured output + ) + end +end + +# To use the agent with parameters: +MyAgent.with(param: value).my_action.generate_now ``` These Prompt objects contain the context Messages and available Actions. These actions are the interface that agents can use to interact with tools through text and JSON views or interact with users through text and HTML views. 
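The paragraph above describes actions doubling as tool interfaces (text/JSON views) and user interfaces (text/HTML views). As a rough sketch of that pattern — the agent name, action, and template filenames here are illustrative assumptions, not taken from the framework's test suite:

```ruby
# app/agents/travel_agent.rb -- hypothetical agent whose public action is exposed as a tool
class TravelAgent < ApplicationAgent
  def get_weather
    @city = params[:city]
    # Renders the message body from a view such as
    # app/views/travel_agent/get_weather.text.erb; a sibling JSON view
    # could describe the same action as a structured tool interface.
    prompt(content_type: :text)
  end
end

# Usage mirrors the example above:
# TravelAgent.with(city: "Lisbon").get_weather.generate_now
```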
diff --git a/docs/docs/framework/generation-provider.md b/docs/docs/framework/generation-provider.md index ff18855f..743165c6 100644 --- a/docs/docs/framework/generation-provider.md +++ b/docs/docs/framework/generation-provider.md @@ -125,5 +125,8 @@ For a complete example showing all three levels working together, see: For detailed documentation on specific providers and their features: +- [OpenAI Provider](/docs/generation-providers/openai-provider) - GPT-4, GPT-3.5, function calling, vision, and Azure OpenAI support +- [Anthropic Provider](/docs/generation-providers/anthropic-provider) - Claude 3.5 and Claude 3 models with extended context windows +- [Ollama Provider](/docs/generation-providers/ollama-provider) - Local LLM inference for privacy-sensitive applications - [OpenRouter Provider](/docs/generation-providers/open-router-provider) - Multi-model routing with fallbacks, PDF processing, and vision support diff --git a/docs/docs/framework/rails-integration.md b/docs/docs/framework/rails-integration.md index 80d5b1e2..bfefb5b3 100644 --- a/docs/docs/framework/rails-integration.md +++ b/docs/docs/framework/rails-integration.md @@ -15,8 +15,22 @@ You can pass messages to the agent from Action Controller, and the agent render ```ruby class MessagesController < ApplicationController def create - @agent = TravelAgent.with(messages: params[:messages]).generate_later - render json: @agent.response + # Use the class method with() to pass parameters, then call the action + generation = TravelAgent.with(message: params[:message]).prompt_context.generate_later + + # The generation object tracks the async job + render json: { job_id: generation.job_id } + end + + def show + # Check status of a generation + generation = ActiveAgent::Generation.find(params[:id]) + + if generation.finished? + render json: { response: generation.response.message.content } + else + render json: { status: "processing" } + end end end ``` diff --git a/docs/docs/generation-providers/anthropic-provider.md b/docs/docs/generation-providers/anthropic-provider.md new file mode 100644 index 00000000..3301145e --- /dev/null +++ b/docs/docs/generation-providers/anthropic-provider.md @@ -0,0 +1,324 @@ +# Anthropic Provider + +The Anthropic provider enables integration with Claude models including Claude 3.5 Sonnet, Claude 3 Opus, and Claude 3 Haiku. It offers advanced reasoning capabilities, extended context windows, and strong performance on complex tasks. 
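As a minimal sketch of wiring an agent to this provider (the agent name, model, and action are illustrative; the snippet referenced under Basic Setup below is the canonical example):

```ruby
# Hypothetical agent selecting the Anthropic provider
class ClaudeExplainerAgent < ApplicationAgent
  generate_with :anthropic,
    model: "claude-3-5-sonnet-latest",
    max_tokens: 1024 # Anthropic requires an explicit output limit

  def explain
    @topic = params[:topic]
    prompt(message: "Explain #{@topic} in two short paragraphs")
  end
end

# ClaudeExplainerAgent.with(topic: "Ruby blocks").explain.generate_now
```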
+ +## Configuration + +### Basic Setup + +Configure Anthropic in your agent: + +<<< @/../test/dummy/app/agents/anthropic_agent.rb{ruby:line-numbers} + +### Configuration File + +Set up Anthropic credentials in `config/active_agent.yml`: + +```yaml +development: + anthropic: + access_token: <%= Rails.application.credentials.dig(:anthropic, :api_key) %> + model: claude-3-5-sonnet-latest + max_tokens: 4096 + temperature: 0.7 + +production: + anthropic: + access_token: <%= Rails.application.credentials.dig(:anthropic, :api_key) %> + model: claude-3-5-sonnet-latest + max_tokens: 2048 + temperature: 0.3 +``` + +### Environment Variables + +Alternatively, use environment variables: + +```bash +ANTHROPIC_API_KEY=your-api-key +ANTHROPIC_VERSION=2023-06-01 # Optional API version +``` + +## Supported Models + +### Claude 3.5 Family +- **claude-3-5-sonnet-latest** - Most intelligent model with best performance +- **claude-3-5-sonnet-20241022** - Specific version for reproducibility + +### Claude 3 Family +- **claude-3-opus-latest** - Most capable Claude 3 model +- **claude-3-sonnet-20240229** - Balanced performance and cost +- **claude-3-haiku-20240307** - Fastest and most cost-effective + +## Features + +### Extended Context Window + +Claude models support up to 200K tokens of context: + +```ruby +class DocumentAnalyzer < ApplicationAgent + generate_with :anthropic, + model: "claude-3-5-sonnet-latest", + max_tokens: 4096 + + def analyze_document + @document = params[:document] # Can be very long + prompt instructions: "Analyze this document thoroughly" + end +end +``` + +### System Messages + +Anthropic models excel at following system instructions: + +```ruby +class SpecializedAgent < ApplicationAgent + generate_with :anthropic, + model: "claude-3-5-sonnet-latest", + system: "You are an expert Ruby developer specializing in Rails applications." 
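+    # The system: option above supplies the system prompt sent with every request from this agent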
+ + def review_code + @code = params[:code] + prompt + end +end +``` + +### Tool Use + +Claude supports function calling through tool use: + +```ruby +class ToolAgent < ApplicationAgent + generate_with :anthropic, model: "claude-3-5-sonnet-latest" + + def process_request + @request = params[:request] + prompt # Includes all public methods as tools + end + + def search_database(query:, table:) + # Tool that Claude can call + ActiveRecord::Base.connection.execute( + "SELECT * FROM #{table} WHERE #{query}" + ) + end + + def calculate(expression:) + # Another available tool + eval(expression) # In production, use a safe math parser + end +end +``` + +### Streaming Responses + +Enable streaming for real-time output: + +```ruby +class StreamingClaudeAgent < ApplicationAgent + generate_with :anthropic, + model: "claude-3-5-sonnet-latest", + stream: true + + on_message_chunk do |chunk| + # Handle streaming chunks + ActionCable.server.broadcast("chat_#{params[:session_id]}", chunk) + end + + def chat + prompt(message: params[:message]) + end +end +``` + +### Vision Capabilities + +Claude models support image analysis: + +```ruby +class VisionAgent < ApplicationAgent + generate_with :anthropic, model: "claude-3-5-sonnet-latest" + + def analyze_image + @image_path = params[:image_path] + @image_base64 = Base64.encode64(File.read(@image_path)) + + prompt content_type: :text + end +end + +# In your view (analyze_image.text.erb): +# Analyze this image: [base64 image data would be included] +``` + +## Provider-Specific Parameters + +### Model Parameters + +- **`model`** - Model identifier (e.g., "claude-3-5-sonnet-latest") +- **`max_tokens`** - Maximum tokens to generate (required) +- **`temperature`** - Controls randomness (0.0 to 1.0) +- **`top_p`** - Nucleus sampling parameter +- **`top_k`** - Top-k sampling parameter +- **`stop_sequences`** - Array of sequences to stop generation + +### Metadata + +- **`metadata`** - Custom metadata for request tracking + ```ruby + generate_with :anthropic, + metadata: { + user_id: -> { Current.user&.id }, + request_id: -> { SecureRandom.uuid } + } + ``` + +### Safety Settings + +- **`anthropic_version`** - API version for consistent behavior +- **`anthropic_beta`** - Enable beta features + +## Error Handling + +Handle Anthropic-specific errors: + +```ruby +class ResilientAgent < ApplicationAgent + generate_with :anthropic, + model: "claude-3-5-sonnet-latest", + max_retries: 3 + + rescue_from Anthropic::RateLimitError do |error| + Rails.logger.warn "Rate limited: #{error.message}" + sleep(error.retry_after || 60) + retry + end + + rescue_from Anthropic::APIError do |error| + Rails.logger.error "Anthropic error: #{error.message}" + fallback_to_cached_response + end +end +``` + +## Testing + +Example test setup with Anthropic: + +```ruby +class AnthropicAgentTest < ActiveSupport::TestCase + test "generates response with Claude" do + VCR.use_cassette("anthropic_claude_response") do + response = AnthropicAgent.with( + message: "Explain Ruby blocks" + ).prompt_context.generate_now + + assert_not_nil response.message.content + assert response.message.content.include?("block") + + doc_example_output(response) + end + end +end +``` + +## Cost Optimization + +### Model Selection + +- Use Claude 3 Haiku for simple tasks +- Use Claude 3.5 Sonnet for complex reasoning +- Reserve Claude 3 Opus for the most demanding tasks + +### Token Management + +```ruby +class EfficientClaudeAgent < ApplicationAgent + generate_with :anthropic, + model: "claude-3-haiku-20240307", + max_tokens: 
500 # Limit output length + + def quick_summary + @content = params[:content] + + # Truncate input if needed + if @content.length > 10_000 + @content = @content.truncate(10_000, omission: "... [truncated]") + end + + prompt instructions: "Provide a brief summary" + end +end +``` + +### Response Caching + +```ruby +class CachedClaudeAgent < ApplicationAgent + generate_with :anthropic, model: "claude-3-5-sonnet-latest" + + def answer_question + question = params[:question] + + cache_key = "claude_answer/#{Digest::SHA256.hexdigest(question)}" + + Rails.cache.fetch(cache_key, expires_in: 1.hour) do + prompt(message: question).generate_now + end + end +end +``` + +## Best Practices + +1. **Always specify max_tokens** - Required parameter for Anthropic +2. **Use appropriate models** - Balance cost and capability +3. **Leverage system messages** - Claude follows them very well +4. **Handle rate limits gracefully** - Implement exponential backoff +5. **Monitor token usage** - Track costs and optimize +6. **Use caching strategically** - Reduce API calls for repeated queries +7. **Validate outputs** - Especially for critical applications + +## Anthropic-Specific Considerations + +### Constitutional AI + +Claude is trained with Constitutional AI, making it particularly good at: +- Following ethical guidelines +- Refusing harmful requests +- Providing balanced perspectives +- Being helpful, harmless, and honest + +### Context Window Management + +```ruby +class LongContextAgent < ApplicationAgent + generate_with :anthropic, + model: "claude-3-5-sonnet-latest", + max_tokens: 4096 + + def analyze_codebase + # Claude can handle very large contexts effectively + @files = load_all_project_files # Up to 200K tokens + + prompt instructions: "Analyze this entire codebase" + end + + private + + def load_all_project_files + Dir.glob("app/**/*.rb").map do |file| + "// File: #{file}\n#{File.read(file)}" + end.join("\n\n") + end +end +``` + +## Related Documentation + +- [Generation Provider Overview](/docs/framework/generation-provider) +- [Configuration Guide](/docs/getting-started#configuration) +- [Anthropic API Documentation](https://docs.anthropic.com/claude/reference) \ No newline at end of file diff --git a/docs/docs/generation-providers/ollama-provider.md b/docs/docs/generation-providers/ollama-provider.md new file mode 100644 index 00000000..9c0d0119 --- /dev/null +++ b/docs/docs/generation-providers/ollama-provider.md @@ -0,0 +1,462 @@ +# Ollama Provider + +The Ollama provider enables local LLM inference using the Ollama platform. Run models like Llama 3, Mistral, and Gemma locally without sending data to external APIs, perfect for privacy-sensitive applications and development. 
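A minimal sketch of pointing an agent at a local Ollama instance (agent name, model, and action are illustrative; see Basic Setup below for the canonical snippet):

```ruby
# Hypothetical agent running entirely against a local Ollama server
class LocalSummaryAgent < ApplicationAgent
  generate_with :ollama, model: "llama3" # assumes `ollama pull llama3` has already been run

  def summarize
    @text = params[:text]
    prompt(message: "Summarize in three bullet points:\n#{@text}")
  end
end

# LocalSummaryAgent.with(text: File.read("notes.txt")).summarize.generate_now
```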
+ +## Configuration + +### Basic Setup + +Configure Ollama in your agent: + +<<< @/../test/dummy/app/agents/ollama_agent.rb#snippet{ruby:line-numbers} + +### Configuration File + +Set up Ollama in `config/active_agent.yml`: + +```yaml +development: + ollama: + host: http://localhost:11434 # Default Ollama host + model: llama3 + temperature: 0.7 + +production: + ollama: + host: <%= ENV['OLLAMA_HOST'] || 'http://localhost:11434' %> + model: llama3 + temperature: 0.3 +``` + +### Environment Variables + +Configure via environment: + +```bash +OLLAMA_HOST=http://localhost:11434 +OLLAMA_MODEL=llama3 +``` + +## Installing Ollama + +### macOS/Linux + +```bash +# Install Ollama +curl -fsSL https://ollama.ai/install.sh | sh + +# Start Ollama service +ollama serve + +# Pull a model +ollama pull llama3 +``` + +### Docker + +```bash +docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama +docker exec -it ollama ollama pull llama3 +``` + +## Supported Models + +### Popular Models + +- **llama3** - Meta's Llama 3 (8B, 70B) +- **mistral** - Mistral 7B +- **gemma** - Google's Gemma (2B, 7B) +- **codellama** - Code-specialized Llama +- **mixtral** - Mixture of experts model +- **phi** - Microsoft's Phi-2 +- **neural-chat** - Intel's fine-tuned model +- **qwen** - Alibaba's Qwen models + +### List Available Models + +```ruby +class OllamaAdmin < ApplicationAgent + generate_with :ollama + + def list_models + # Get list of installed models + response = HTTParty.get("#{ollama_host}/api/tags") + response["models"] + end + + private + + def ollama_host + Rails.configuration.active_agent.dig(:ollama, :host) || "http://localhost:11434" + end +end +``` + +## Features + +### Local Inference + +Run models completely offline: + +```ruby +class PrivateDataAgent < ApplicationAgent + generate_with :ollama, model: "llama3" + + def process_sensitive_data + @data = params[:sensitive_data] + # Data never leaves your infrastructure + prompt instructions: "Process this confidential information" + end +end +``` + +### Model Switching + +Easily switch between models: + +```ruby +class MultiModelAgent < ApplicationAgent + def code_review + # Use specialized code model + self.class.generate_with :ollama, model: "codellama" + @code = params[:code] + prompt + end + + def general_chat + # Use general purpose model + self.class.generate_with :ollama, model: "llama3" + @message = params[:message] + prompt + end +end +``` + +### Custom Models + +Use fine-tuned or custom models: + +```ruby +class CustomModelAgent < ApplicationAgent + generate_with :ollama, model: "my-custom-model:latest" + + before_action :ensure_model_exists + + private + + def ensure_model_exists + # Check if model is available + models = fetch_available_models + unless models.include?(generation_provider.model) + raise "Model #{generation_provider.model} not found. 
Run: ollama pull #{generation_provider.model}" + end + end +end +``` + +### Streaming Responses + +Stream responses for better UX: + +```ruby +class StreamingOllamaAgent < ApplicationAgent + generate_with :ollama, + model: "llama3", + stream: true + + on_message_chunk do |chunk| + # Handle streaming chunks + Rails.logger.info "Chunk: #{chunk}" + broadcast_to_client(chunk) + end + + def chat + prompt(message: params[:message]) + end +end +``` + +## Provider-Specific Parameters + +### Model Parameters + +- **`model`** - Model name (e.g., "llama3", "mistral") +- **`temperature`** - Controls randomness (0.0 to 1.0) +- **`top_p`** - Nucleus sampling +- **`top_k`** - Top-k sampling +- **`num_predict`** - Maximum tokens to generate +- **`stop`** - Stop sequences +- **`seed`** - For reproducible outputs + +### System Configuration + +- **`host`** - Ollama server URL (default: "http://localhost:11434") +- **`timeout`** - Request timeout in seconds +- **`keep_alive`** - Keep model loaded in memory + +### Advanced Options + +```ruby +class AdvancedOllamaAgent < ApplicationAgent + generate_with :ollama, + model: "llama3", + options: { + num_ctx: 4096, # Context window size + num_gpu: 1, # Number of GPUs to use + num_thread: 8, # Number of threads + repeat_penalty: 1.1, # Penalize repetition + mirostat: 2, # Mirostat sampling + mirostat_tau: 5.0, # Mirostat tau parameter + mirostat_eta: 0.1 # Mirostat learning rate + } +end +``` + +## Performance Optimization + +### Model Loading + +Keep models in memory for faster responses: + +```ruby +class FastOllamaAgent < ApplicationAgent + generate_with :ollama, + model: "llama3", + keep_alive: "5m" # Keep model loaded for 5 minutes + + def quick_response + @query = params[:query] + prompt + end +end +``` + +### Hardware Acceleration + +Configure GPU usage: + +```ruby +class GPUAgent < ApplicationAgent + generate_with :ollama, + model: "llama3", + options: { + num_gpu: -1, # Use all available GPUs + main_gpu: 0 # Primary GPU index + } +end +``` + +### Quantization + +Use quantized models for better performance: + +```bash +# Pull quantized versions +ollama pull llama3:8b-q4_0 # 4-bit quantization +ollama pull llama3:8b-q5_1 # 5-bit quantization +``` + +```ruby +class EfficientAgent < ApplicationAgent + # Use quantized model for faster inference + generate_with :ollama, model: "llama3:8b-q4_0" +end +``` + +## Error Handling + +Handle Ollama-specific errors: + +```ruby +class RobustOllamaAgent < ApplicationAgent + generate_with :ollama, model: "llama3" + + rescue_from Faraday::ConnectionFailed do |error| + Rails.logger.error "Ollama connection failed: #{error.message}" + render_ollama_setup_instructions + end + + rescue_from ActiveAgent::GenerationError do |error| + if error.message.include?("model not found") + pull_model_and_retry + else + raise + end + end + + private + + def pull_model_and_retry + system("ollama pull #{generation_provider.model}") + retry + end + + def render_ollama_setup_instructions + "Ollama is not running. Start it with: ollama serve" + end +end +``` + +## Testing + +Test with Ollama locally: + +```ruby +class OllamaAgentTest < ActiveSupport::TestCase + setup do + skip "Ollama not available" unless ollama_available? + end + + test "generates response with local model" do + response = OllamaAgent.with( + message: "Hello" + ).prompt_context.generate_now + + assert_not_nil response.message.content + doc_example_output(response) + end + + private + + def ollama_available? 
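+      # Probe the local Ollama API; connection errors are rescued below and treated as "not available"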
+ response = Net::HTTP.get_response(URI("http://localhost:11434/api/tags")) + response.code == "200" + rescue + false + end +end +``` + +## Development Workflow + +### Local Development Setup + +```ruby +# config/environments/development.rb +Rails.application.configure do + config.active_agent = { + ollama: { + host: ENV['OLLAMA_HOST'] || 'http://localhost:11434', + model: ENV['OLLAMA_MODEL'] || 'llama3', + options: { + num_ctx: 4096, + temperature: 0.7 + } + } + } +end +``` + +### Docker Compose Setup + +```yaml +# docker-compose.yml +version: '3.8' +services: + ollama: + image: ollama/ollama + ports: + - "11434:11434" + volumes: + - ollama_data:/root/.ollama + deploy: + resources: + reservations: + devices: + - driver: nvidia + count: all + capabilities: [gpu] + +volumes: + ollama_data: +``` + +## Best Practices + +1. **Pre-pull models** - Download models before first use +2. **Monitor memory usage** - Large models require significant RAM +3. **Use appropriate models** - Balance size and capability +4. **Keep models loaded** - Use keep_alive for frequently used models +5. **Implement fallbacks** - Handle connection failures gracefully +6. **Use quantization** - Reduce memory usage and increase speed +7. **Test locally** - Ensure models work before deployment + +## Ollama-Specific Considerations + +### Privacy First + +```ruby +class PrivacyFirstAgent < ApplicationAgent + generate_with :ollama, model: "llama3" + + def process_pii + @personal_data = params[:personal_data] + + # Data stays local - no external API calls + Rails.logger.info "Processing PII locally with Ollama" + + prompt instructions: "Process this data privately" + end +end +``` + +### Model Management + +```ruby +class ModelManager + def self.ensure_model(model_name) + models = list_models + unless models.include?(model_name) + pull_model(model_name) + end + end + + def self.list_models + response = HTTParty.get("http://localhost:11434/api/tags") + response["models"].map { |m| m["name"] } + end + + def self.pull_model(model_name) + system("ollama pull #{model_name}") + end + + def self.delete_model(model_name) + HTTParty.delete("http://localhost:11434/api/delete", + body: { name: model_name }.to_json, + headers: { 'Content-Type' => 'application/json' } + ) + end +end +``` + +### Deployment Considerations + +```ruby +# Ensure Ollama is available in production +class ApplicationAgent < ActiveAgent::Base + before_action :ensure_ollama_available, if: :using_ollama? + + private + + def using_ollama? + generation_provider.is_a?(ActiveAgent::GenerationProvider::OllamaProvider) + end + + def ensure_ollama_available + HTTParty.get("#{ollama_host}/api/tags") + rescue => e + raise "Ollama is not available: #{e.message}" + end + + def ollama_host + Rails.configuration.active_agent.dig(:ollama, :host) + end +end +``` + +## Related Documentation + +- [Generation Provider Overview](/docs/framework/generation-provider) +- [Configuration Guide](/docs/getting-started#configuration) +- [Ollama Documentation](https://ollama.ai/docs) +- [OpenRouter Provider](/docs/generation-providers/open-router-provider) - For cloud alternative \ No newline at end of file diff --git a/docs/docs/generation-providers/openai-provider.md b/docs/docs/generation-providers/openai-provider.md new file mode 100644 index 00000000..3d791da4 --- /dev/null +++ b/docs/docs/generation-providers/openai-provider.md @@ -0,0 +1,255 @@ +# OpenAI Provider + +The OpenAI provider enables integration with OpenAI's GPT models including GPT-4, GPT-4 Turbo, and GPT-3.5 Turbo. 
It supports advanced features like function calling, streaming responses, and structured outputs. + +## Configuration + +### Basic Setup + +Configure OpenAI in your agent: + +<<< @/../test/dummy/app/agents/open_ai_agent.rb#snippet{ruby:line-numbers} + +### Configuration File + +Set up OpenAI credentials in `config/active_agent.yml`: + +```yaml +development: + openai: + access_token: <%= Rails.application.credentials.dig(:openai, :api_key) %> + model: gpt-4o + temperature: 0.7 + max_tokens: 4096 + +production: + openai: + access_token: <%= Rails.application.credentials.dig(:openai, :api_key) %> + model: gpt-4o + temperature: 0.3 + max_tokens: 2048 +``` + +### Environment Variables + +Alternatively, use environment variables: + +```bash +OPENAI_ACCESS_TOKEN=your-api-key +OPENAI_ORGANIZATION_ID=your-org-id # Optional +``` + +## Supported Models + +- **GPT-4o** - Most capable model with vision capabilities +- **GPT-4o-mini** - Smaller, faster version of GPT-4o +- **GPT-4 Turbo** - Latest GPT-4 with 128k context +- **GPT-4** - Original GPT-4 model +- **GPT-3.5 Turbo** - Fast and cost-effective + +## Features + +### Function Calling + +OpenAI supports native function calling with automatic tool execution: + +```ruby +class DataAnalysisAgent < ApplicationAgent + generate_with :openai, model: "gpt-4o" + + def analyze_data + @data = params[:data] + prompt # Will include all public methods as available tools + end + + def calculate_average(numbers:) + numbers.sum.to_f / numbers.size + end + + def fetch_external_data(endpoint:) + # Tool that OpenAI can call + HTTParty.get(endpoint) + end +end +``` + +### Streaming Responses + +Enable real-time streaming for better user experience: + +```ruby +class StreamingAgent < ApplicationAgent + generate_with :openai, stream: true + + on_message_chunk do |chunk| + # Handle streaming chunks + broadcast_to_user(chunk) + end + + def chat + prompt(message: params[:message]) + end +end +``` + +### Vision Capabilities + +GPT-4o models support image analysis: + +```ruby +class VisionAgent < ApplicationAgent + generate_with :openai, model: "gpt-4o" + + def analyze_image + @image_url = params[:image_url] + prompt content_type: :text + end +end + +# In your view (analyze_image.text.erb): +# Analyze this image: <%= @image_url %> +``` + +### Structured Output + +Use JSON mode for structured responses: + +```ruby +class StructuredAgent < ApplicationAgent + generate_with :openai, + model: "gpt-4o", + response_format: { type: "json_object" } + + def extract_entities + @text = params[:text] + prompt( + output_schema: :entity_extraction, + instructions: "Extract entities and return as JSON" + ) + end +end +``` + +## Provider-Specific Parameters + +### Model Parameters + +- **`model`** - Model identifier (e.g., "gpt-4o", "gpt-3.5-turbo") +- **`temperature`** - Controls randomness (0.0 to 2.0) +- **`max_tokens`** - Maximum tokens in response +- **`top_p`** - Nucleus sampling parameter +- **`frequency_penalty`** - Penalize frequent tokens (-2.0 to 2.0) +- **`presence_penalty`** - Penalize new topics (-2.0 to 2.0) +- **`seed`** - For deterministic outputs +- **`response_format`** - Output format ({ type: "json_object" } or { type: "text" }) + +### Organization Settings + +- **`organization_id`** - OpenAI organization ID +- **`project_id`** - OpenAI project ID for usage tracking + +### Advanced Options + +- **`stream`** - Enable streaming responses (true/false) +- **`tools`** - Explicitly define available tools +- **`tool_choice`** - Control tool usage ("auto", "required", "none", or 
specific tool) +- **`parallel_tool_calls`** - Allow parallel tool execution (true/false) + +## Azure OpenAI + +For Azure OpenAI Service, configure a custom host: + +```ruby +class AzureAgent < ApplicationAgent + generate_with :openai, + access_token: Rails.application.credentials.dig(:azure, :api_key), + host: "https://your-resource.openai.azure.com", + api_version: "2024-02-01", + model: "your-deployment-name" +end +``` + +## Error Handling + +Handle OpenAI-specific errors: + +```ruby +class RobustAgent < ApplicationAgent + generate_with :openai, + max_retries: 3, + request_timeout: 30 + + rescue_from OpenAI::RateLimitError do |error| + Rails.logger.error "Rate limit hit: #{error.message}" + retry_with_backoff + end + + rescue_from OpenAI::APIError do |error| + Rails.logger.error "OpenAI API error: #{error.message}" + fallback_response + end +end +``` + +## Testing + +Use VCR for consistent tests: + +<<< @/../test/agents/open_ai_agent_test.rb#4-15{ruby:line-numbers} + +## Cost Optimization + +### Use Appropriate Models + +- Use GPT-3.5 Turbo for simple tasks +- Reserve GPT-4o for complex reasoning +- Consider GPT-4o-mini for a balance + +### Optimize Token Usage + +```ruby +class EfficientAgent < ApplicationAgent + generate_with :openai, + model: "gpt-3.5-turbo", + max_tokens: 500, # Limit response length + temperature: 0.3 # More focused responses + + def summarize + @content = params[:content] + # Truncate input if needed + @content = @content.truncate(3000) if @content.length > 3000 + prompt + end +end +``` + +### Cache Responses + +```ruby +class CachedAgent < ApplicationAgent + generate_with :openai + + def answer_faq + question = params[:question] + + Rails.cache.fetch("faq/#{question.parameterize}", expires_in: 1.day) do + prompt(message: question).generate_now + end + end +end +``` + +## Best Practices + +1. **Set appropriate temperature** - Lower for factual tasks, higher for creative +2. **Use system messages effectively** - Provide clear instructions +3. **Implement retry logic** - Handle transient failures +4. **Monitor usage** - Track token consumption and costs +5. **Use the latest models** - They're often more capable and cost-effective +6. **Validate outputs** - Especially for critical applications + +## Related Documentation + +- [Generation Provider Overview](/docs/framework/generation-provider) +- [Configuration Guide](/docs/getting-started#configuration) +- [OpenAI API Documentation](https://platform.openai.com/docs) \ No newline at end of file diff --git a/lib/active_agent/version.rb b/lib/active_agent/version.rb index 776899c7..a353cffa 100644 --- a/lib/active_agent/version.rb +++ b/lib/active_agent/version.rb @@ -1,3 +1,3 @@ module ActiveAgent - VERSION = "0.5.0" + VERSION = "0.6.0rc1" end