@@ -168,7 +168,6 @@ public Mono<StreamingResult> streamResponse(StructuredPrompt structuredPrompt, d
      * @return completion text from the first successful provider attempt
      */
     public Mono<String> complete(String prompt, double temperature) {
-        String truncatedPrompt = requestFactory.truncatePromptForCompletion(prompt);
         return Mono.<String>defer(() -> {
             List<OpenAiProviderCandidate> availableProviders =
                     providerRoutingService.selectAvailableProviderCandidates(clientPrimary, clientSecondary);
@@ -185,7 +184,7 @@ public Mono<String> complete(String prompt, double temperature) {
                 RateLimitService.ApiProvider activeProvider = providerCandidate.provider();

                 ResponseCreateParams requestParameters =
-                        requestFactory.buildCompletionRequest(truncatedPrompt, temperature, activeProvider);
+                        requestFactory.buildCompletionRequest(prompt, temperature, activeProvider);
                 try {
                     log.info("[LLM] Complete started (providerId={})", activeProvider.ordinal());
                     RequestOptions requestOptions = RequestOptions.builder()
@@ -114,7 +114,8 @@ public ResponseCreateParams buildCompletionRequest(
             String prompt, double temperature, RateLimitService.ApiProvider provider) {
         boolean useGitHubModels = provider == RateLimitService.ApiProvider.GITHUB_MODELS;
         String modelId = normalizedModelId(useGitHubModels);
-        return buildResponseParams(prompt, temperature, modelId);
+        String truncatedPrompt = truncatePromptForCompletion(prompt, modelId);
+        return buildResponseParams(truncatedPrompt, temperature, modelId);
     }

     /**
@@ -124,16 +125,29 @@ public ResponseCreateParams buildCompletionRequest(
      * @return original prompt when no truncation is required, otherwise a notice-prefixed prompt
      */
     public String truncatePromptForCompletion(String prompt) {
+        return truncatePromptForCompletion(prompt, RateLimitService.ApiProvider.OPENAI);
+    }
Comment on lines 127 to +129
P2 Dead public API after this refactor

The single-argument `truncatePromptForCompletion(String prompt)` no longer has any production caller: its only call site in `OpenAIStreamingService.complete()` was removed by this PR, and `buildCompletionRequest` now invokes the private `truncatePromptForCompletion(String, String)` directly. Keeping the method conflicts with [AB1d] ("Delete unused code instead of keeping it 'just in case'") and [RC1b] ("No compatibility shims that hide defects"). The same applies to the new public two-argument overload `truncatePromptForCompletion(String, RateLimitService.ApiProvider)` at line 138, which is also reachable only from tests, because `buildCompletionRequest` bypasses it and calls the private method itself.

Consider removing both public overloads and testing via buildCompletionRequest directly, which exercises the full provider-to-model-id path, or make the provider-arg overload the single public entry point.

Context Used: AGENTS.md (source: https://app.greptile.com/review/custom-context?memory=c73518f6-94f2-4eb4-a597-3be5ff49a896)

Path: src/main/java/com/williamcallahan/javachat/service/OpenAiRequestFactory.java
Line: 127-129

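A minimal sketch of the second option from the comment above (one public, provider-based entry point), using heavily simplified stand-ins. `SingleEntryPointSketch`, `CompletionRequest`, `modelIdFor`, `characterLimitFor`, and the character-count limits are all hypothetical and not taken from the project. The real code limits by tokens via `chunker.keepLastTokens` and returns a notice-prefixed prompt, and both details are omitted here. The only point is that `buildCompletionRequest` routes through the same public method the tests call, so no public overload is test-only.

```java
/** Hypothetical, simplified stand-in for OpenAiRequestFactory; not the project's code. */
final class SingleEntryPointSketch {
    /** Stand-in for RateLimitService.ApiProvider. */
    enum ApiProvider { OPENAI, GITHUB_MODELS }

    /** Minimal stand-in for ResponseCreateParams. */
    static final class CompletionRequest {
        final String modelId;
        final String prompt;

        CompletionRequest(String modelId, String prompt) {
            this.modelId = modelId;
            this.prompt = prompt;
        }
    }

    private final String openaiModelId;
    private final String githubModelId;

    SingleEntryPointSketch(String openaiModelId, String githubModelId) {
        this.openaiModelId = openaiModelId;
        this.githubModelId = githubModelId;
    }

    /** The only public truncation method: callers pass the provider, never a raw model id. */
    public String truncatePromptForCompletion(String prompt, ApiProvider provider) {
        if (prompt == null || prompt.isEmpty()) {
            return prompt;
        }
        int limit = characterLimitFor(modelIdFor(provider));
        // Keep the tail of the prompt; a character count stands in for real token counting.
        return prompt.length() <= limit ? prompt : prompt.substring(prompt.length() - limit);
    }

    /** The production path goes through the same public method that tests call. */
    public CompletionRequest buildCompletionRequest(String prompt, ApiProvider provider) {
        return new CompletionRequest(modelIdFor(provider), truncatePromptForCompletion(prompt, provider));
    }

    private String modelIdFor(ApiProvider provider) {
        return provider == ApiProvider.GITHUB_MODELS ? githubModelId : openaiModelId;
    }

    private static int characterLimitFor(String modelId) {
        // Placeholder limits chosen for this example only.
        return modelId.contains("gpt-5") ? 10 : 1_000;
    }
}
```

With this shape, a test can assert on `buildCompletionRequest(...)` alone and still cover the provider-to-model-id mapping, which is the coverage argument the comment makes.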


+    /**
+     * Truncates completion prompts to token limits for the selected provider's model.
+     *
+     * @param prompt full completion prompt
+     * @param provider provider chosen for this request attempt
+     * @return original prompt when no truncation is required, otherwise a notice-prefixed prompt
+     */
+    public String truncatePromptForCompletion(String prompt, RateLimitService.ApiProvider provider) {
+        boolean useGitHubModels = provider == RateLimitService.ApiProvider.GITHUB_MODELS;
+        String modelId = normalizedModelId(useGitHubModels);
+        return truncatePromptForCompletion(prompt, modelId);
+    }
+
+    private String truncatePromptForCompletion(String prompt, String modelId) {
         if (prompt == null || prompt.isEmpty()) {
             return prompt;
         }

-        String openaiModelId = normalizedModelId(false);
-        String githubModelId = normalizedModelId(true);
-        boolean gpt5Family = isGpt5Family(openaiModelId) || isGpt5Family(githubModelId);
-        boolean reasoningModel = gpt5Family
-                || canonicalModelName(openaiModelId).startsWith("o")
-                || canonicalModelName(githubModelId).startsWith("o");
+        boolean gpt5Family = isGpt5Family(modelId);
+        boolean reasoningModel = gpt5Family || canonicalModelName(modelId).startsWith("o");

         int tokenLimit = reasoningModel ? MAX_TOKENS_GPT5_INPUT : MAX_TOKENS_DEFAULT_INPUT;
Comment on lines +149 to 152
P2 o-series models receive GPT-5's 7K token limit regardless of context window

canonicalModelName(modelId).startsWith("o") captures o1, o3, o3-mini, etc. and routes them to MAX_TOKENS_GPT5_INPUT (7 000 tokens). Many o-series models expose far larger context windows and this mismatch silently truncates prompts that would fit. This was also true before the PR, but the refactor now makes this path the single authoritative one for both providers, so the blast radius is wider. At minimum the assumption should be documented; at best, o-series models should have their own named constant and explicit limit.

Path: src/main/java/com/williamcallahan/javachat/service/OpenAiRequestFactory.java
Line: 149-152

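A hedged sketch of what a separate, named limit for o-series models could look like. `InputTokenLimits`, `tokenLimitFor`, `MAX_TOKENS_O_SERIES_INPUT`, and this simplified `canonicalModelName` are hypothetical and not the project's implementations. The 7 000 figure is the one quoted in the comment above; the other two values are placeholders, not real context-window sizes. Matching `o` followed by a digit is also an assumption, chosen so that a name that merely starts with "o" is not treated as a reasoning model.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical helper that picks an input-token budget per model family. */
final class InputTokenLimits {
    static final int MAX_TOKENS_GPT5_INPUT = 7_000;       // figure quoted in the review comment
    static final int MAX_TOKENS_O_SERIES_INPUT = 32_000;  // placeholder, not a documented limit
    static final int MAX_TOKENS_DEFAULT_INPUT = 100_000;  // placeholder, not a documented limit

    // Assumption: o-series names look like "o1", "o3", "o3-mini" (letter o, then a digit).
    private static final Pattern O_SERIES = Pattern.compile("o\\d.*");

    private InputTokenLimits() {}

    /** Simplified stand-in: lowercases and drops any "vendor/" prefix. */
    static String canonicalModelName(String modelId) {
        String lower = modelId.trim().toLowerCase(Locale.ROOT);
        int slash = lower.lastIndexOf('/');
        return slash >= 0 ? lower.substring(slash + 1) : lower;
    }

    /** Each family gets its own constant, so the o-series assumption is explicit. */
    static int tokenLimitFor(String modelId) {
        String name = canonicalModelName(modelId);
        if (name.startsWith("gpt-5")) {
            return MAX_TOKENS_GPT5_INPUT;
        }
        if (O_SERIES.matcher(name).matches()) {
            return MAX_TOKENS_O_SERIES_INPUT;
        }
        return MAX_TOKENS_DEFAULT_INPUT;
    }
}
```

Keeping the three constants separate means a later change to the o-series budget touches one line and does not alter GPT-5 behavior.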

         String truncatedPrompt = chunker.keepLastTokens(prompt, tokenLimit);
@@ -48,4 +48,29 @@ void buildCompletionRequestRetainsQualifiedGitHubModelIdentifier() {
         assertTrue(responseCreateParams.maxOutputTokens().isEmpty());
         assertEquals(0.25, responseCreateParams.temperature().orElseThrow(), 0.000_001);
     }
+
+    @Test
+    void truncatePromptForCompletionUsesSelectedOpenAiModelLimit() {
+        OpenAiRequestFactory requestFactory =
+                new OpenAiRequestFactory(new Chunker(), new PromptTruncator(), "gpt-4o", "openai/gpt-5", "");
+        String prompt = "context ".repeat(8_000);
+
+        String truncatedPrompt =
+                requestFactory.truncatePromptForCompletion(prompt, RateLimitService.ApiProvider.OPENAI);
+
+        assertEquals(prompt, truncatedPrompt);
+    }
+
+    @Test
+    void truncatePromptForCompletionUsesSelectedGitHubModelsLimit() {
+        OpenAiRequestFactory requestFactory =
+                new OpenAiRequestFactory(new Chunker(), new PromptTruncator(), "gpt-4o", "gpt-5", "");
+        String prompt = "context ".repeat(8_000);
+
+        String truncatedPrompt =
+                requestFactory.truncatePromptForCompletion(prompt, RateLimitService.ApiProvider.GITHUB_MODELS);
+
+        assertTrue(truncatedPrompt.startsWith("[Context truncated due to GPT-5 8K input limit]"));
+        assertTrue(truncatedPrompt.length() < prompt.length());
+    }
 }