Feat: Cost optimizations via model configs and prompt improvements #247
Changes from all commits: 23fc62f, a890440, e65d615
```diff
@@ -7,18 +7,60 @@ import {
 	LiteModels,
 	RegularModels,
 } from "./config.types";
+import { env } from 'cloudflare:workers';
 
-export const AGENT_CONFIG: AgentConfig = {
+// Common configs - these are good defaults
+const COMMON_AGENT_CONFIGS = {
 	templateSelection: {
 		name: AIModels.GEMINI_2_5_FLASH_LITE,
 		max_tokens: 2000,
 		fallbackModel: AIModels.GEMINI_2_5_FLASH,
 		temperature: 0.6,
 	},
+	screenshotAnalysis: {
+		name: AIModels.DISABLED,
+		reasoning_effort: 'medium' as const,
+		max_tokens: 8000,
+		temperature: 1,
+		fallbackModel: AIModels.GEMINI_2_5_FLASH,
+	},
+	realtimeCodeFixer: {
+		name: AIModels.DISABLED,
+		reasoning_effort: 'low' as const,
+		max_tokens: 32000,
+		temperature: 1,
+		fallbackModel: AIModels.GEMINI_2_5_FLASH,
+	},
+	fastCodeFixer: {
+		name: AIModels.DISABLED,
+		reasoning_effort: undefined,
+		max_tokens: 64000,
+		temperature: 0.0,
+		fallbackModel: AIModels.GEMINI_2_5_PRO,
+	},
+} as const;
+
+const SHARED_IMPLEMENTATION_CONFIG = {
+	reasoning_effort: 'low' as const,
+	max_tokens: 48000,
+	temperature: 0.2,
+	fallbackModel: AIModels.GEMINI_2_5_PRO,
+};
+
+//======================================================================================
+// ATTENTION! Platform config requires specific API keys and Cloudflare AI Gateway setup.
+//======================================================================================
+/*
+	These are the configs used at build.cloudflare.dev
+	You may need to provide API keys for these models in your environment or use
+	Cloudflare AI Gateway unified billing for seamless model access without managing multiple keys.
+*/
+const PLATFORM_AGENT_CONFIG: AgentConfig = {
+	...COMMON_AGENT_CONFIGS,
 	blueprint: {
-		name: AIModels.GEMINI_3_PRO_PREVIEW,
+		name: AIModels.OPENAI_5_MINI,
 		reasoning_effort: 'medium',
-		max_tokens: 64000,
+		max_tokens: 32000,
 		fallbackModel: AIModels.GEMINI_2_5_FLASH,
 		temperature: 1.0,
 	},
```
```diff
@@ -37,18 +79,12 @@ export const AGENT_CONFIG: AgentConfig = {
 		fallbackModel: AIModels.GEMINI_2_5_FLASH,
 	},
 	firstPhaseImplementation: {
-		name: AIModels.GEMINI_3_PRO_PREVIEW,
-		reasoning_effort: 'low',
-		max_tokens: 48000,
-		temperature: 1,
-		fallbackModel: AIModels.GEMINI_2_5_PRO,
+		name: AIModels.GEMINI_2_5_PRO,
+		...SHARED_IMPLEMENTATION_CONFIG,
 	},
 	phaseImplementation: {
-		name: AIModels.GEMINI_3_PRO_PREVIEW,
-		reasoning_effort: 'low',
-		max_tokens: 48000,
-		temperature: 0.2,
-		fallbackModel: AIModels.GEMINI_2_5_PRO,
+		name: AIModels.GEMINI_2_5_PRO,
+		...SHARED_IMPLEMENTATION_CONFIG,
 	},
 	conversationalResponse: {
 		name: AIModels.GROK_4_FAST,
```
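The `{ name: ..., ...SHARED_IMPLEMENTATION_CONFIG }` pattern above relies on JavaScript's spread semantics: when keys collide, the last occurrence wins. A minimal sketch with stand-in objects (not the real config types) shows why this only works as long as the shared object carries no `name` key:

```typescript
// Stand-ins for SHARED_IMPLEMENTATION_CONFIG and a per-action entry; the real
// shapes are richer, but the merge semantics are identical.
const shared = { max_tokens: 48000, temperature: 0.2 };

// `name` is listed before the spread, so it survives only because `shared`
// has no `name` key of its own. If one were ever added, it would silently win.
const phaseImplementation = { name: 'gemini-2.5-pro', ...shared };

// Listing a field after the spread overrides the shared value.
const hotterVariant = { ...shared, temperature: 0.9 };

console.log(phaseImplementation.name);        // 'gemini-2.5-pro'
console.log(phaseImplementation.max_tokens);  // 48000
console.log(hotterVariant.temperature);       // 0.9
```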
```diff
@@ -71,31 +107,65 @@ export const AGENT_CONFIG: AgentConfig = {
 		temperature: 1,
 		fallbackModel: AIModels.GROK_CODE_FAST_1,
 	},
-	// Not used right now
-	screenshotAnalysis: {
+};
+
+//======================================================================================
+// Default Gemini-only config (most likely used in your deployment)
+//======================================================================================
+/* These are the default out-of-the box gemini-only models used when PLATFORM_MODEL_PROVIDERS is not set */
+const DEFAULT_AGENT_CONFIG: AgentConfig = {
+	...COMMON_AGENT_CONFIGS,
+	blueprint: {
+		name: AIModels.GEMINI_2_5_PRO,
+		reasoning_effort: 'medium',
+		max_tokens: 64000,
+		fallbackModel: AIModels.GEMINI_2_5_FLASH,
+		temperature: 0.7,
+	},
+	projectSetup: {
+		name: AIModels.GEMINI_2_5_PRO,
+		...SHARED_IMPLEMENTATION_CONFIG,
+	},
+	phaseGeneration: {
+		name: AIModels.GEMINI_2_5_PRO,
+		...SHARED_IMPLEMENTATION_CONFIG,
+	},
+	firstPhaseImplementation: {
+		name: AIModels.GEMINI_2_5_PRO,
+		...SHARED_IMPLEMENTATION_CONFIG,
+	},
+	phaseImplementation: {
+		name: AIModels.GEMINI_2_5_PRO,
+		...SHARED_IMPLEMENTATION_CONFIG,
+	},
+	conversationalResponse: {
+		name: AIModels.GEMINI_2_5_FLASH,
+		reasoning_effort: 'low',
+		max_tokens: 4000,
+		temperature: 0,
+		fallbackModel: AIModels.GEMINI_2_5_PRO,
+	},
+	deepDebugger: {
+		name: AIModels.GEMINI_2_5_PRO,
 		reasoning_effort: 'high',
 		max_tokens: 8000,
-		temperature: 1,
+		temperature: 0.5,
 		fallbackModel: AIModels.GEMINI_2_5_FLASH,
 	},
-	realtimeCodeFixer: {
-		name: AIModels.DISABLED,
+	fileRegeneration: {
+		name: AIModels.GEMINI_2_5_PRO,
 		reasoning_effort: 'low',
 		max_tokens: 32000,
-		temperature: 1,
+		temperature: 0,
 		fallbackModel: AIModels.GEMINI_2_5_FLASH,
 	},
-	// Not used right now
-	fastCodeFixer: {
-		name: AIModels.DISABLED,
-		reasoning_effort: undefined,
-		max_tokens: 64000,
-		temperature: 0.0,
-		fallbackModel: AIModels.GEMINI_2_5_PRO,
-	},
 };
 
+export const AGENT_CONFIG: AgentConfig = env.PLATFORM_MODEL_PROVIDERS
```
Contributor

**CRITICAL: Module-level `env` access will not work in Cloudflare Workers**

Why this fails: `env.PLATFORM_MODEL_PROVIDERS` is read at module scope, during module evaluation, before the Worker has a request context in which the binding is reliably populated, so the platform branch is never selected.

Example from feedback.ts (correct pattern):

```ts
const submitFeedbackImplementation = async (args: FeedbackArgs) => {
	const sentryDsn = env.SENTRY_DSN; // ✅ Inside function - works
	// ...
};
```

Required fix:

```ts
export function getAgentConfig(env: Env): AgentConfig {
	return env.PLATFORM_MODEL_PROVIDERS
		? PLATFORM_AGENT_CONFIG
		: DEFAULT_AGENT_CONFIG;
}
```

Then update all call sites.

Impact: Without this fix, production cost optimizations will NEVER activate.
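The request-scoped pattern this review asks for could look like the following sketch. The `Env` shape, the model-name strings, and the fetch handler are illustrative stand-ins, not code from the PR:

```typescript
// Simplified stand-ins for the PR's config objects and Env binding.
interface Env {
  PLATFORM_MODEL_PROVIDERS?: string;
}

type AgentConfig = { blueprint: { name: string } };

const PLATFORM_AGENT_CONFIG: AgentConfig = { blueprint: { name: 'openai-5-mini' } };
const DEFAULT_AGENT_CONFIG: AgentConfig = { blueprint: { name: 'gemini-2.5-pro' } };

// Selection happens per call, with an env passed in from a handler,
// rather than once at module-evaluation time.
export function getAgentConfig(env: Env): AgentConfig {
  return env.PLATFORM_MODEL_PROVIDERS ? PLATFORM_AGENT_CONFIG : DEFAULT_AGENT_CONFIG;
}

// Illustrative call site: the handler receives the populated env and forwards it.
export default {
  async fetch(_req: Request, env: Env): Promise<Response> {
    const config = getAgentConfig(env);
    return new Response(config.blueprint.name);
  },
};
```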
```diff
+	? PLATFORM_AGENT_CONFIG
+	: DEFAULT_AGENT_CONFIG;
 
 export const AGENT_CONSTRAINTS: Map<AgentActionKey, AgentConstraintConfig> = new Map([
 	['fastCodeFixer', {
 		allowedModels: new Set([AIModels.DISABLED]),
```
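A constraints map like the `AGENT_CONSTRAINTS` entry above can be enforced with a simple lookup. This is a hypothetical sketch: `AgentActionKey`, `AgentConstraintConfig`, and the model strings are simplified stand-ins for the types in `config.types`:

```typescript
// Simplified stand-ins for the real action-key and constraint types.
type AgentActionKey = 'fastCodeFixer' | 'blueprint';

interface AgentConstraintConfig {
  allowedModels: Set<string>;
}

const AGENT_CONSTRAINTS: Map<AgentActionKey, AgentConstraintConfig> = new Map([
  ['fastCodeFixer', { allowedModels: new Set(['disabled']) }],
]);

// Allowed when the action has no constraint entry, or the model is in its set.
function isModelAllowed(action: AgentActionKey, model: string): boolean {
  const constraint = AGENT_CONSTRAINTS.get(action);
  return constraint === undefined || constraint.allowedModels.has(model);
}
```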
**HIGH PRIORITY: DRY Principle Violation**

The two config branches (`PLATFORM_AGENT_CONFIG` and `DEFAULT_AGENT_CONFIG`) duplicate most of the configuration. This violates CLAUDE.md's core requirement: "Never copy-paste code - refactor into shared functions."

What actually differs? Only these fields change between production and default:

- `blueprint`: different model (OPENAI_5_MINI vs GEMINI_2_5_PRO), token limits, temperature
- `projectSetup`: GROK_4_FAST vs GEMINI_2_5_PRO
- `phaseGeneration`: GROK_4_FAST vs GEMINI_2_5_PRO
- `conversationalResponse`: different model and temperature
- `deepDebugger`: OPENAI_5_MINI vs GEMINI_2_5_PRO, different temperature
- `fileRegeneration`: different token limits

Recommended refactor: extract the shared entries into one base config and express each environment as a small set of overrides.
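That direction could be sketched as a base config plus per-environment override maps. This is a hypothetical illustration, not the reviewer's exact proposal: the `withOverrides` helper, the simplified `ModelConfig`/`AgentConfig` types, and the field values are all stand-ins:

```typescript
// Simplified stand-ins for the real types in config.types.
interface ModelConfig {
  name: string;
  max_tokens: number;
  temperature: number;
}

type AgentConfig = Record<string, ModelConfig>;

// Shallow-merge per-action patches onto the base; untouched actions pass through.
function withOverrides(
  base: AgentConfig,
  overrides: Record<string, Partial<ModelConfig>>,
): AgentConfig {
  const merged: AgentConfig = { ...base };
  for (const [action, patch] of Object.entries(overrides)) {
    merged[action] = { ...merged[action], ...patch };
  }
  return merged;
}

const BASE: AgentConfig = {
  blueprint: { name: 'gemini-2.5-pro', max_tokens: 64000, temperature: 0.7 },
  projectSetup: { name: 'gemini-2.5-pro', max_tokens: 48000, temperature: 0.2 },
};

// Only the fields that genuinely differ appear in the environment's override map.
const PLATFORM = withOverrides(BASE, {
  blueprint: { name: 'openai-5-mini', max_tokens: 32000, temperature: 1.0 },
  projectSetup: { name: 'grok-4-fast' },
});
```

With this shape, each environment becomes one call over a single shared base, and only the handful of differing fields is written twice.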