Multimodal: Providers Extended to handle image/pdf/multimodal flows#628
…, and Multimodal parameters; update validation to restrict audio input in multimodal processing.
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings.
📝 Walkthrough

Added multimodal support across the LLM stack: new Image/PDF content and input models, a multimodal input model and typing, input resolution and cleanup utilities, provider implementations (OpenAI/Google) updated to handle images/PDFs/multimodal, job/registry plumbing adjusted, and comprehensive multimodal tests.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client
    participant Resolver as resolve_input (utils)
    participant Jobs as jobs.resolved_input_context
    participant Provider as Provider (OpenAI/Google)
    participant External as External LLM API
    Client->>Resolver: submit QueryParams (text|audio|image|pdf|list)
    Resolver-->>Client: (resolved_input, temp_path?)
    Client->>Jobs: call resolved_input_context(resolved_input)
    Jobs-->>Provider: execute(completion_config, query, resolved_input)
    Provider->>External: send API request (text|multimodal)
    External-->>Provider: API response (raw, usage)
    Provider-->>Jobs: LLMCallResponse, provider_raw_response
    Jobs-->>Client: final LLMCallResponse
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~40 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 12
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/services/llm/providers/gai.py (1)
674-715: ⚠️ Potential issue | 🟠 Major: Missing return statement for unhandled completion types.

If `completion_type` doesn't match any of the handled cases (stt, tts, text, image, pdf, multimodal), the function falls through to the exception handlers without returning. This could lead to `None` being returned implicitly.

Suggested fix:

```diff
         elif completion_type == "multimodal":
             return self._execute_multimodal(
                 completion_config=completion_config,
                 resolved_input=resolved_input,
                 include_provider_raw_response=include_provider_raw_response,
             )
+
+        else:
+            return None, f"Unsupported completion type: {completion_type}"

     except TypeError as e:
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/llm/providers/gai.py` around lines 674 - 715, The dispatch on completion_type (variable completion_type) handles "stt","tts","text","image","pdf","multimodal" but has no default branch, so when an unknown type is passed the method can fall through and return None; update the dispatch to handle unknown values by adding a final else that either raises a clear exception (e.g., ValueError/UnsupportedCompletionTypeError with the offending completion_type included) or returns a well-formed error response object consistent with this service's API; ensure the change is applied in the method containing the calls to _execute_stt, _execute_tts, _execute_text, _execute_image, _execute_pdf, and _execute_multimodal so all paths return deterministically.
🧹 Nitpick comments (7)
backend/app/services/llm/providers/base.py (1)
24-28: Add return type hint to validator.

For better type safety, add a `Self` return type annotation to the validator method.

Suggested fix:

```diff
+from typing import Self
+
     @model_validator(mode="after")
-    def validate_parts(self):
+    def validate_parts(self) -> Self:
         if not self.parts:
             raise ValueError("MultiModalInput requires at least one content part")
         return self
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/llm/providers/base.py` around lines 24 - 28, The validator method validate_parts should declare a Self return type; change its signature to return "Self" (e.g., def validate_parts(self) -> Self:) and ensure Self is imported (from typing or typing_extensions depending on Python target). Keep the `@model_validator`(mode="after") decorator and the existing body intact, only adding the return type annotation so the method returns a typed instance of the surrounding model class.

backend/app/services/llm/mappers.py (1)
88-92: Update docstring to include reasoning parameter.

The docstring's "Supported Mapping" section should include the newly supported `reasoning` parameter to keep documentation in sync with the implementation.

Suggested fix:

```diff
     Supported Mapping:
     - model → model
     - instructions → instructions (for STT prompts, if available)
     - temperature -> temperature parameter (0-2)
+    - reasoning → reasoning (if provided)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/llm/mappers.py` around lines 88 - 92, Update the module/function docstring in mappers.py where the "Supported Mapping" list is defined (the module-level docstring or the docstring for the mapping function) to include the newly supported reasoning parameter by adding a line like "reasoning → reasoning" (or "reasoning → reasoning parameter") to the list so the documentation matches the implementation; ensure formatting matches the existing list style and placement among other entries (model, instructions, temperature).

backend/app/models/llm/request.py (2)
59-101: Consider extracting common fields to a base class.

`ImageLLMParams`, `PDFLLMParams`, and `MultimodalLLMParams` share identical fields. A base class could reduce duplication and ensure consistency.

```python
class BaseVisualLLMParams(SQLModel):
    model: str
    instructions: str | None = None
    response_format: Literal["text"] | None = Field(None, description="Currently supports text type")
    temperature: float | None = Field(default=0.2, ge=0.0, le=2.0)
    reasoning: Literal["low", "medium", "high"] | None = None


class ImageLLMParams(BaseVisualLLMParams):
    pass


class PDFLLMParams(BaseVisualLLMParams):
    pass


class MultimodalLLMParams(BaseVisualLLMParams):
    pass
```

This allows future differentiation while reducing current duplication.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/models/llm/request.py` around lines 59 - 101, Extract the duplicated fields into a shared base model and have ImageLLMParams, PDFLLMParams, and MultimodalLLMParams inherit from it: create a BaseVisualLLMParams SQLModel containing model, instructions, response_format, temperature, and reasoning, then redefine ImageLLMParams, PDFLLMParams, and MultimodalLLMParams to subclass BaseVisualLLMParams (no additional fields) so the shared schema lives in one place and future extensions can add specific fields to the subclasses.
2-2: Use modern type annotation syntax.

`List` from `typing` is deprecated in Python 3.9+. Use the built-in `list` instead.

Suggested fix:

```diff
-from typing import Annotated, Any, List, Literal, Union
+from typing import Annotated, Any, Literal
```

Note: `List` doesn't appear to be used in this file. If it's needed elsewhere, use the built-in `list` instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/models/llm/request.py` at line 2, The import statement currently pulls List from typing which is deprecated; update the typing import in the top-level import (the line importing Annotated, Any, List, Literal, Union) by removing List and, where any list annotations exist in this module, replace typing.List[...] with the built-in list[...] syntax (or if no list is used, simply remove List from the import). Ensure you update references that use List to use built-in list and clean up the unused List import to avoid linter warnings.

backend/app/services/llm/providers/gai.py (1)
380-665: Consider extracting shared logic to reduce code duplication.

The methods `_execute_image`, `_execute_pdf`, `_execute_text`, and `_execute_multimodal` share significant duplicate code for:
- Model validation
- Generation kwargs setup (instructions, temperature, thinking_config)
- Usage metadata extraction
- LLMCallResponse construction
Consider extracting common logic into helper methods:

```python
def _build_generation_config(self, params: dict) -> tuple[dict, list]:
    """Extract generation kwargs and contents from params."""
    # Shared logic for instructions, temperature, thinking_level
    ...

def _extract_usage(self, response, method_name: str) -> tuple[int, int, int, int]:
    """Extract usage metadata with fallback to zeros."""
    ...

def _build_response(self, response, model, provider, include_raw: bool) -> LLMCallResponse:
    """Build standardized LLMCallResponse."""
    ...
```

This would significantly reduce the ~300 lines of near-duplicate code.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/llm/providers/gai.py` around lines 380 - 665, The four methods (_execute_image, _execute_pdf, _execute_text, _execute_multimodal) duplicate model validation, building contents/generation_kwargs (instructions, temperature, thinking_config), usage extraction, and LLMCallResponse construction; extract that shared logic into helpers—e.g., add _build_generation_config(params) returning (generation_kwargs, contents), _extract_usage(response, method_name) returning (input_tokens, output_tokens, total_tokens, reasoning_tokens), and _build_response(response, model, provider, include_provider_raw_response) returning an LLMCallResponse—and call these from each _execute_* method after validating model (keep the existing model check in each method but delegate the rest), replacing repeated blocks with calls to those helpers to remove duplication.backend/app/services/llm/jobs.py (1)
107-110: Add return type hint to context manager.

As per coding guidelines, all functions should have type hints for return values. Context managers should annotate their return type.

Suggested fix:

```diff
+from typing import Any, Generator
+
 @contextmanager
 def resolved_input_context(
     query_input: TextInput | AudioInput | ImageInput | PDFInput | list,
-):
+) -> Generator[Any, None, None]:
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/llm/jobs.py` around lines 107 - 110, The context manager function resolved_input_context lacks a return type hint; update its signature to include a typed iterator return such as -> Iterator[Union[TextInput, AudioInput, ImageInput, PDFInput, list]] (or -> Generator[Union[...], None, None]) and add the necessary typing imports (Iterator/Generator and Union) so the `@contextmanager-decorated` function is properly annotated; reference the function name resolved_input_context and ensure the return union mirrors the existing parameter union.backend/app/services/llm/providers/registry.py (1)
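For illustration, the annotation pattern works like this in a self-contained sketch (the string-based resolver below is a stand-in for the real `resolve_input` logic):

```python
from contextlib import contextmanager
from typing import Generator

@contextmanager
def resolved_input_context(query_input: str) -> Generator[str, None, None]:
    # Stand-in resolution: yield the resolved value, then clean up.
    resolved = query_input.strip()
    try:
        yield resolved
    finally:
        pass  # temp-file cleanup would go here in the real implementation

with resolved_input_context("  hello ") as resolved:
    print(resolved)  # hello
```

`Generator[YieldType, SendType, ReturnType]` is the canonical annotation for a `@contextmanager`-decorated function; the yield type should mirror the input union the real function handles.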
74-74: Log message should use function name prefix.

As per coding guidelines, log messages should be prefixed with the function name in square brackets.

Suggested fix:

```diff
-        logger.error(f"Failed to initialize {provider_type} client: {e}", exc_info=True)
+        logger.error(f"[get_llm_provider] Failed to initialize {provider_type} client: {e}", exc_info=True)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/llm/providers/registry.py` at line 74, The logger.error call that logs "Failed to initialize {provider_type} client: {e}" must be prefixed with the current function name in square brackets; update the logger.error invocation in the function containing that line to compute the function name (e.g., via import inspect and using inspect.currentframe().f_code.co_name) and include it like f"[{inspect.currentframe().f_code.co_name}] Failed to initialize {provider_type} client: {e}", keeping exc_info=True; modify only the logger.error call that references provider_type and e.
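A minimal sketch of the `inspect`-based prefix the prompt above describes; the message body is a stand-in, only the prefix computation matters:

```python
import inspect

def get_llm_provider(provider_type: str) -> str:
    # The prefix is computed at runtime, so it never drifts if the function is renamed.
    fn = inspect.currentframe().f_code.co_name
    return f"[{fn}] Failed to initialize {provider_type} client"

print(get_llm_provider("google"))  # [get_llm_provider] Failed to initialize google client
```

Hardcoding the name (as in the suggested diff) is simpler and cheaper; the `inspect` variant trades a small runtime cost for rename-safety.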
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/services/llm/jobs.py`:
- Around line 117-118: Remove the debug print statement and replace it with a
proper logger call: delete print(f"Resolved input: {resolved_input}, error:
{error}") and instead call the module logger (e.g., logger.debug or
logging.getLogger(__name__).debug) to emit the same message and variables
(resolved_input and error); if no logger exists in this module, instantiate one
with logging.getLogger(__name__) at top-level before using it.
In `@backend/app/services/llm/providers/base.py`:
- Around line 71-75: The ternary that builds hint uses an unnecessary f-string
for the else branch; update the hint assignment so the else branch uses a plain
string (remove the leading `f`) where it currently says f" Please ensure the
input type matches the completion type." — locate the hint variable construction
that checks isinstance(resolved_input, MultiModalInput) and remove the `f`
prefix from the else string.
In `@backend/app/services/llm/providers/gai.py`:
- Around line 446-448: The success log in GoogleAIProvider._execute_image
incorrectly references "_execute_text"; update the logger.info call in the
GoogleAIProvider._execute_image method to use the correct function name (e.g.,
"[GoogleAIProvider._execute_image] Successfully generated image response:
{response.response_id}") so logs accurately reflect the operation; locate the
logger.info invocation inside _execute_image and change the message string
accordingly while preserving the response.response_id interpolation.
- Around line 565-568: The warning log in GoogleAIProvider currently references
the wrong function name and uses an unnecessary f-string; update the
logger.warning call in _execute_text to reference
"[GoogleAIProvider._execute_text] Response missing usage_metadata, using zeros"
(remove the f-prefix since there is no interpolation) so the message accurately
names the function and avoids the needless formatted string.
- Around line 491-494: The warning log in GoogleAIProvider incorrectly names the
function `_execute_stt` and uses an unnecessary f-string; update the
logger.warning call inside `_execute_pdf` to reference `_execute_pdf` instead of
`_execute_stt` and remove the f-prefix so the message is a plain string (e.g., "
[GoogleAIProvider._execute_pdf] Response missing usage_metadata, using zeros").
Ensure you modify the logger.warning invocation in the `_execute_pdf` method.
- Around line 421-423: Update the logger.warning call in
GoogleAIProvider._execute_image to reference the correct function name and
remove the unnecessary f-string; specifically change the message from
"[GoogleAIProvider._execute_stt] Response missing usage_metadata, using zeros"
to "[GoogleAIProvider._execute_image] Response missing usage_metadata, using
zeros" and make it a plain string literal (no f-prefix) in the logger.warning
invocation.
- Around line 662-664: The success log in GoogleAIProvider._execute_multimodal
incorrectly references "_execute_text"; update the logger.info call inside the
_execute_multimodal method to use the correct function name (e.g.,
"[GoogleAIProvider._execute_multimodal] Successfully generated text response:
{response.response_id}") so the message matches the method where
response.response_id is produced.
- Around line 517-519: The logger.info message in GoogleAIProvider._execute_pdf
incorrectly references "_execute_text"; update the log string in the logger.info
call within the GoogleAIProvider._execute_pdf method to reference "_execute_pdf"
(e.g., keep using response.response_id but change the label to indicate
_execute_pdf) so the success log correctly identifies the originating method.
- Around line 636-639: Update the logger call inside
GoogleAIProvider._execute_multimodal to reference the correct function name and
remove the unnecessary f-string: change the message from
f"[GoogleAIProvider._execute_stt] Response missing usage_metadata, using zeros"
to a plain string like "[GoogleAIProvider._execute_multimodal] Response missing
usage_metadata, using zeros" so the log correctly identifies _execute_multimodal
and doesn't use an f-prefix.
In `@backend/app/services/llm/providers/oai.py`:
- Around line 49-61: The data URL builder in
backend/app/services/llm/providers/oai.py (inside the branches handling
ImageContent and PDFContent where part.format == "base64") uses part.mime_type
which may be None, producing invalid URLs like "data:None;base64,..."; update
the logic that constructs url so it uses a safe fallback MIME type (e.g.,
"application/octet-stream") when part.mime_type is falsy before interpolating,
keeping the rest of the behavior and the same keys added to items
({"type":"input_image","image_url":...} and
{"type":"input_file","file_url":...}) so other code consuming items remains
unchanged.
In `@backend/app/utils.py`:
- Around line 486-488: The function resolve_input is missing a parameter type
for query_input and uses a quoted forward reference in its return annotation;
update the signature to explicitly type query_input (e.g. Union[str,
list[ImageContent], list[PDFContent], MultiModalInput]) and remove the quotes
around MultiModalInput in the return type so the annotation reads tuple[str |
list[ImageContent] | list[PDFContent] | MultiModalInput, str | None]; keep the
existing import names (ImageContent, PDFContent, MultiModalInput) and adjust any
imports if needed.
- Line 47: ContentPart is too broad because it includes AudioContent while
multimodal inputs explicitly exclude audio; update the type alias in ContentPart
to only include the allowed types (TextContent, ImageContent, PDFContent) so it
matches MultiModalInput.parts and the logic in providers/base.py, removing
AudioContent from the union to tighten type safety.
---
Outside diff comments:
In `@backend/app/services/llm/providers/gai.py`:
- Around line 674-715: The dispatch on completion_type (variable
completion_type) handles "stt","tts","text","image","pdf","multimodal" but has
no default branch, so when an unknown type is passed the method can fall through
and return None; update the dispatch to handle unknown values by adding a final
else that either raises a clear exception (e.g.,
ValueError/UnsupportedCompletionTypeError with the offending completion_type
included) or returns a well-formed error response object consistent with this
service's API; ensure the change is applied in the method containing the calls
to _execute_stt, _execute_tts, _execute_text, _execute_image, _execute_pdf, and
_execute_multimodal so all paths return deterministically.
---
Nitpick comments:
In `@backend/app/models/llm/request.py`:
- Around line 59-101: Extract the duplicated fields into a shared base model and
have ImageLLMParams, PDFLLMParams, and MultimodalLLMParams inherit from it:
create a BaseVisualLLMParams SQLModel containing model, instructions,
response_format, temperature, and reasoning, then redefine ImageLLMParams,
PDFLLMParams, and MultimodalLLMParams to subclass BaseVisualLLMParams (no
additional fields) so the shared schema lives in one place and future extensions
can add specific fields to the subclasses.
- Line 2: The import statement currently pulls List from typing which is
deprecated; update the typing import in the top-level import (the line importing
Annotated, Any, List, Literal, Union) by removing List and, where any list
annotations exist in this module, replace typing.List[...] with the built-in
list[...] syntax (or if no list is used, simply remove List from the import).
Ensure you update references that use List to use built-in list and clean up the
unused List import to avoid linter warnings.
In `@backend/app/services/llm/jobs.py`:
- Around line 107-110: The context manager function resolved_input_context lacks
a return type hint; update its signature to include a typed iterator return such
as -> Iterator[Union[TextInput, AudioInput, ImageInput, PDFInput, list]] (or ->
Generator[Union[...], None, None]) and add the necessary typing imports
(Iterator/Generator and Union) so the `@contextmanager-decorated` function is
properly annotated; reference the function name resolved_input_context and
ensure the return union mirrors the existing parameter union.
In `@backend/app/services/llm/mappers.py`:
- Around line 88-92: Update the module/function docstring in mappers.py where
the "Supported Mapping" list is defined (the module-level docstring or the
docstring for the mapping function) to include the newly supported reasoning
parameter by adding a line like "reasoning → reasoning" (or "reasoning →
reasoning parameter") to the list so the documentation matches the
implementation; ensure formatting matches the existing list style and placement
among other entries (model, instructions, temperature).
In `@backend/app/services/llm/providers/base.py`:
- Around line 24-28: The validator method validate_parts should declare a Self
return type; change its signature to return "Self" (e.g., def
validate_parts(self) -> Self:) and ensure Self is imported (from typing or
typing_extensions depending on Python target). Keep the
`@model_validator`(mode="after") decorator and the existing body intact, only
adding the return type annotation so the method returns a typed instance of the
surrounding model class.
In `@backend/app/services/llm/providers/gai.py`:
- Around line 380-665: The four methods (_execute_image, _execute_pdf,
_execute_text, _execute_multimodal) duplicate model validation, building
contents/generation_kwargs (instructions, temperature, thinking_config), usage
extraction, and LLMCallResponse construction; extract that shared logic into
helpers—e.g., add _build_generation_config(params) returning (generation_kwargs,
contents), _extract_usage(response, method_name) returning (input_tokens,
output_tokens, total_tokens, reasoning_tokens), and _build_response(response,
model, provider, include_provider_raw_response) returning an LLMCallResponse—and
call these from each _execute_* method after validating model (keep the existing
model check in each method but delegate the rest), replacing repeated blocks
with calls to those helpers to remove duplication.
In `@backend/app/services/llm/providers/registry.py`:
- Line 74: The logger.error call that logs "Failed to initialize {provider_type}
client: {e}" must be prefixed with the current function name in square brackets;
update the logger.error invocation in the function containing that line to
compute the function name (e.g., via import inspect and using
inspect.currentframe().f_code.co_name) and include it like
f"[{inspect.currentframe().f_code.co_name}] Failed to initialize {provider_type}
client: {e}", keeping exc_info=True; modify only the logger.error call that
references provider_type and e.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- backend/app/models/llm/__init__.py
- backend/app/models/llm/request.py
- backend/app/services/llm/jobs.py
- backend/app/services/llm/mappers.py
- backend/app/services/llm/providers/base.py
- backend/app/services/llm/providers/gai.py
- backend/app/services/llm/providers/oai.py
- backend/app/services/llm/providers/registry.py
- backend/app/utils.py
backend/app/services/llm/jobs.py
Outdated
```python
print(f"Resolved input: {resolved_input}, error: {error}")
```
Remove debug print statement.
Debug print() statements should not be committed to production code. Use the logger if debugging information is needed.
Suggested fix
```diff
     resolved_input, error = resolve_input(query_input)
-    print(f"Resolved input: {resolved_input}, error: {error}")
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/services/llm/jobs.py` around lines 117 - 118, Remove the debug
print statement and replace it with a proper logger call: delete
print(f"Resolved input: {resolved_input}, error: {error}") and instead call the
module logger (e.g., logger.debug or logging.getLogger(__name__).debug) to emit
the same message and variables (resolved_input and error); if no logger exists
in this module, instantiate one with logging.getLogger(__name__) at top-level
before using it.
```python
logger.warning(
    f"[GoogleAIProvider._execute_stt] Response missing usage_metadata, using zeros"
)
```
Fix incorrect function name in log message and remove unnecessary f-prefix.
The log message references _execute_stt but this is in _execute_image. Also, remove the unnecessary f prefix.
Suggested fix:

```diff
         else:
             logger.warning(
-                f"[GoogleAIProvider._execute_stt] Response missing usage_metadata, using zeros"
+                "[GoogleAIProvider._execute_image] Response missing usage_metadata, using zeros"
             )
```
🧰 Tools
🪛 Ruff (0.15.2)
[error] 422-422: f-string without any placeholders
Remove extraneous f prefix
(F541)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/services/llm/providers/gai.py` around lines 421 - 423, Update the
logger.warning call in GoogleAIProvider._execute_image to reference the correct
function name and remove the unnecessary f-string; specifically change the
message from "[GoogleAIProvider._execute_stt] Response missing usage_metadata,
using zeros" to "[GoogleAIProvider._execute_image] Response missing
usage_metadata, using zeros" and make it a plain string literal (no f-prefix) in
the logger.warning invocation.
```python
logger.info(
    f"[GoogleAIProvider._execute_text] Successfully generated text response: {response.response_id}"
)
```
Fix incorrect function name in log message.
The success log references _execute_text but this is in _execute_image.
Suggested fix:

```diff
         logger.info(
-            f"[GoogleAIProvider._execute_text] Successfully generated text response: {response.response_id}"
+            f"[GoogleAIProvider._execute_image] Successfully generated image response: {response.response_id}"
         )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/services/llm/providers/gai.py` around lines 446 - 448, The
success log in GoogleAIProvider._execute_image incorrectly references
"_execute_text"; update the logger.info call in the
GoogleAIProvider._execute_image method to use the correct function name (e.g.,
"[GoogleAIProvider._execute_image] Successfully generated image response:
{response.response_id}") so logs accurately reflect the operation; locate the
logger.info invocation inside _execute_image and change the message string
accordingly while preserving the response.response_id interpolation.
```python
else:
    logger.warning(
        f"[GoogleAIProvider._execute_stt] Response missing usage_metadata, using zeros"
    )
```
Fix incorrect function name in log message and remove unnecessary f-prefix.
The log message references _execute_stt but this is in _execute_multimodal.
Suggested fix:

```diff
         else:
             logger.warning(
-                f"[GoogleAIProvider._execute_stt] Response missing usage_metadata, using zeros"
+                "[GoogleAIProvider._execute_multimodal] Response missing usage_metadata, using zeros"
             )
```
🧰 Tools
🪛 Ruff (0.15.2)
[error] 638-638: f-string without any placeholders
Remove extraneous f prefix
(F541)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/services/llm/providers/gai.py` around lines 636 - 639, Update the
logger call inside GoogleAIProvider._execute_multimodal to reference the correct
function name and remove the unnecessary f-string: change the message from
f"[GoogleAIProvider._execute_stt] Response missing usage_metadata, using zeros"
to a plain string like "[GoogleAIProvider._execute_multimodal] Response missing
usage_metadata, using zeros" so the log correctly identifies _execute_multimodal
and doesn't use an f-prefix.
```python
elif isinstance(part, ImageContent):
    if part.format == "base64":
        url = f"data:{part.mime_type};base64,{part.value}"
    else:
        url = part.value
    items.append({"type": "input_image", "image_url": url})

elif isinstance(part, PDFContent):
    if part.format == "base64":
        url = f"data:{part.mime_type};base64,{part.value}"
    else:
        url = part.value
    items.append({"type": "input_file", "file_url": url})
```
Handle potential None mime_type when building data URLs.
When part.format == "base64", the code builds a data URL using part.mime_type, but mime_type is optional (can be None). This would result in data:None;base64,... which is invalid.
Suggested fix:

```diff
 elif isinstance(part, ImageContent):
     if part.format == "base64":
+        mime = part.mime_type or "image/png"
-        url = f"data:{part.mime_type};base64,{part.value}"
+        url = f"data:{mime};base64,{part.value}"
     else:
         url = part.value
     items.append({"type": "input_image", "image_url": url})
 elif isinstance(part, PDFContent):
     if part.format == "base64":
+        mime = part.mime_type or "application/pdf"
-        url = f"data:{part.mime_type};base64,{part.value}"
+        url = f"data:{mime};base64,{part.value}"
     else:
         url = part.value
     items.append({"type": "input_file", "file_url": url})
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/services/llm/providers/oai.py` around lines 49 - 61, The data URL
builder in backend/app/services/llm/providers/oai.py (inside the branches
handling ImageContent and PDFContent where part.format == "base64") uses
part.mime_type which may be None, producing invalid URLs like
"data:None;base64,..."; update the logic that constructs url so it uses a safe
fallback MIME type (e.g., "application/octet-stream") when part.mime_type is
falsy before interpolating, keeping the rest of the behavior and the same keys
added to items ({"type":"input_image","image_url":...} and
{"type":"input_file","file_url":...}) so other code consuming items remains
unchanged.
resolve_image_content and resolve_pdf_content already set the default mime type, so there's no need to resolve this at the provider level. False alarm.
🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
# Verify that resolve_image_content and resolve_pdf_content set default MIME types
cat backend/app/utils.py | grep -A 20 "def resolve_image_content"
echo "---"
cat backend/app/utils.py | grep -A 20 "def resolve_pdf_content"
```

Length of output: 1255
@vprashrex, you're absolutely right! I can confirm that resolve_image_content and resolve_pdf_content in backend/app/utils.py already ensure default MIME types are set:
```python
# In resolve_image_content:
if not c.mime_type:
    c.mime_type = "image/png"

# In resolve_pdf_content:
if not c.mime_type:
    c.mime_type = "application/pdf"
```

By the time the content reaches the provider level, mime_type is guaranteed to be non-None. My review comment was indeed a false alarm. Thank you for the clarification! 🙏
✏️ Learnings added
Learnt from: vprashrex
Repo: ProjectTech4DevAI/kaapi-backend PR: 628
File: backend/app/services/llm/providers/oai.py:49-61
Timestamp: 2026-02-26T03:16:20.731Z
Learning: In backend/app/utils.py, the functions resolve_image_content and resolve_pdf_content ensure that mime_type is always set with defaults ("image/png" for images, "application/pdf" for PDFs) before content reaches provider implementations. Provider code in backend/app/services/llm/providers/ can safely assume mime_type is never None.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
```python
def resolve_input(
    query_input,
) -> tuple[str | list[ImageContent] | list[PDFContent] | "MultiModalInput", str | None]:
```
🧩 Analysis chain

🏁 Script executed:

```shell
head -50 backend/app/utils.py | cat -n
```

Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 1603

🏁 Script executed:

```shell
sed -n '1,50p' backend/app/utils.py | cat -n
```

Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 1603

🏁 Script executed:

```shell
grep -n "from __future__" backend/app/utils.py
```

Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 111

🏁 Script executed:

```shell
sed -n '486,540p' backend/app/utils.py | cat -n
```

Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 2637

🏁 Script executed:

```shell
sed -n '1,100p' backend/app/utils.py | cat -n
```

Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 3417
Add explicit type annotation for query_input and remove the unnecessary quoted forward reference in the return type.
The query_input parameter lacks a type annotation, violating the coding guideline that requires type hints on all function parameters. Additionally, "MultiModalInput" does not need quotes in the return annotation since from __future__ import annotations is enabled.
Proposed fix

```diff
 def resolve_input(
-    query_input,
-) -> tuple[str | list[ImageContent] | list[PDFContent] | "MultiModalInput", str | None]:
+    query_input: TextInput | AudioInput | ImageInput | PDFInput | list[TextInput | AudioInput | ImageInput | PDFInput],
+) -> tuple[str | list[ImageContent] | list[PDFContent] | MultiModalInput, str | None]:
```

🧰 Tools
🪛 Ruff (0.15.2)
[warning] 488-488: Remove quotes from type annotation
Remove quotes
(UP037)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/utils.py` around lines 486 - 488, The function resolve_input is
missing a parameter type for query_input and uses a quoted forward reference in
its return annotation; update the signature to explicitly type query_input (e.g.
Union[str, list[ImageContent], list[PDFContent], MultiModalInput]) and remove
the quotes around MultiModalInput in the return type so the annotation reads
tuple[str | list[ImageContent] | list[PDFContent] | MultiModalInput, str |
None]; keep the existing import names (ImageContent, PDFContent,
MultiModalInput) and adjust any imports if needed.
It's just a type annotation; it's not going to have any effect functionality-wise.
@vprashrex, you're absolutely right that type hints have no effect on runtime performance — Python remains a dynamically typed language and CPython ignores type annotations during execution.
While the runtime impact is zero, type hints do provide value through:
- Static type checking with tools like mypy or pyright
- Enhanced IDE autocomplete and error detection
- Improved code readability and maintainability
- Better documentation for other developers
That said, I understand your decision here. Feel free to keep the current implementation as-is!
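A quick illustration of that point, using a toy resolve_input (not the real signature):

```python
def resolve_input(query_input: str) -> tuple[str, None]:
    # CPython stores annotations but never enforces them at call time.
    return query_input, None


# Passing the "wrong" type raises nothing at runtime; only a static checker
# such as mypy or pyright would flag this call.
result = resolve_input(123)  # type: ignore[arg-type]

# The annotations remain introspectable, which IDEs and checkers rely on.
hints = resolve_input.__annotations__
```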
…tter type management and update relevant classes to use it.
Actionable comments posted: 1
🧹 Nitpick comments (6)
backend/app/services/llm/providers/gai.py (3)
565-572: Remove unnecessary f-string prefix.

The warning message has no placeholders but uses an f-string prefix.

Suggested fix

```diff
 else:
     logger.warning(
-        f"[GoogleAIProvider._execute_text] Response missing usage_metadata, using zeros"
+        "[GoogleAIProvider._execute_text] Response missing usage_metadata, using zeros"
     )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/llm/providers/gai.py` around lines 565 - 572, The logger.warning call in GoogleAIProvider._execute_text uses an unnecessary f-string prefix for a static message; remove the f prefix so the call becomes logger.warning("[GoogleAIProvider._execute_text] Response missing usage_metadata, using zeros") to avoid misleading formatting usage while leaving the surrounding logic (setting input_tokens, output_tokens, total_tokens, reasoning_tokens to 0) unchanged.
420-427: Remove unnecessary f-string prefix.

The warning message has no placeholders but uses an f-string prefix.

Suggested fix

```diff
 else:
     logger.warning(
-        f"[GoogleAIProvider._execute_image] Response missing usage_metadata, using zeros"
+        "[GoogleAIProvider._execute_image] Response missing usage_metadata, using zeros"
     )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/llm/providers/gai.py` around lines 420 - 427, The logger.warning call in GoogleAIProvider._execute_image uses an unnecessary f-string for a static message; replace the f-string literal passed to logger.warning with a plain string literal (i.e., remove the leading "f") so the call becomes logger.warning("[GoogleAIProvider._execute_image] Response missing usage_metadata, using zeros"), leaving the surrounding logic that sets input_tokens, output_tokens, total_tokens, and reasoning_tokens unchanged.
636-643: Remove unnecessary f-string prefix.

The warning message has no placeholders but uses an f-string prefix.

Suggested fix

```diff
 else:
     logger.warning(
-        f"[GoogleAIProvider._execute_multimodal] Response missing usage_metadata, using zeros"
+        "[GoogleAIProvider._execute_multimodal] Response missing usage_metadata, using zeros"
     )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/llm/providers/gai.py` around lines 636 - 643, In GoogleAIProvider._execute_multimodal replace the logger.warning call that uses an unnecessary f-string (currently f"[GoogleAIProvider._execute_multimodal] Response missing usage_metadata, using zeros") with a plain string literal (remove the leading f) so the message is logged without the redundant f-string prefix; keep the rest of the block (setting input_tokens, output_tokens, total_tokens, reasoning_tokens to 0) unchanged.

backend/app/models/llm/request.py (3)
119-126: Consider using `X | Y` syntax for type unions.

Per Python 3.10+ best practices (and PEP 604), the `|` operator is preferred over `Union[]` for type annotations. Since `from __future__ import annotations` isn't present here, this is optional but improves readability.

Suggested fix

```diff
-KaapiLLMParams = Union[
-    TextLLMParams,
-    STTLLMParams,
-    TTSLLMParams,
-    ImageLLMParams,
-    PDFLLMParams,
-    MultimodalLLMParams,
-]
+KaapiLLMParams = (
+    TextLLMParams
+    | STTLLMParams
+    | TTSLLMParams
+    | ImageLLMParams
+    | PDFLLMParams
+    | MultimodalLLMParams
+)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/models/llm/request.py` around lines 119 - 126, Replace the Union[...] alias KaapiLLMParams with the PEP 604 union operator form using | (e.g., TextLLMParams | STTLLMParams | ...), and optionally add "from __future__ import annotations" at the top if you want to ensure forward-compatible evaluation of annotations; update the KaapiLLMParams definition accordingly so it references TextLLMParams, STTLLMParams, TTSLLMParams, ImageLLMParams, PDFLLMParams, and MultimodalLLMParams using the | operator.
59-116: Consider extracting common fields to reduce duplication.
`ImageLLMParams`, `PDFLLMParams`, and `MultimodalLLMParams` share identical fields (model, instructions, knowledge_base_ids, temperature, reasoning, max_num_results). Consider a base class to reduce duplication.

Example approach

```python
class BaseVisionLLMParams(SQLModel):
    model: str
    instructions: str | None = None
    knowledge_base_ids: list[str] | None = Field(
        default=None,
        description="List of vector store IDs to use for knowledge retrieval",
    )
    temperature: float | None = Field(default=0.2, ge=0.0, le=2.0)
    reasoning: Literal["low", "medium", "high"] | None = None
    max_num_results: int | None = Field(
        default=None,
        ge=1,
        description="Maximum number of candidate results to return",
    )


class ImageLLMParams(BaseVisionLLMParams):
    pass


class PDFLLMParams(BaseVisionLLMParams):
    pass


class MultimodalLLMParams(BaseVisionLLMParams):
    pass
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/models/llm/request.py` around lines 59 - 116, The three classes ImageLLMParams, PDFLLMParams, and MultimodalLLMParams duplicate the same fields; extract a shared base SQLModel (e.g., BaseVisionLLMParams) containing model, instructions, knowledge_base_ids, temperature, reasoning, and max_num_results and have ImageLLMParams, PDFLLMParams, and MultimodalLLMParams subclass that base (or alias it) so the duplicated field definitions are removed and maintained in one place.
2-2: Use built-in `list` instead of deprecated `typing.List`.

Static analysis correctly identifies that `typing.List` is deprecated in Python 3.9+. Use the built-in `list` type instead.

Suggested fix

```diff
-from typing import Annotated, Any, List, Literal, Union
+from typing import Annotated, Any, Literal, Union
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/models/llm/request.py` at line 2, Replace the deprecated typing.List import with the built-in list type: update the import line in request.py to remove List from typing and update any type annotations that use List[...] to use built-in list[...] (references: the import line containing Annotated, Any, List, Literal, Union and any annotations in this module that use List, e.g., List[str] -> list[str]); ensure no other typing.List usages remain and run type checks to confirm correctness.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/services/llm/providers/gai.py`:
- Around line 58-97: When building payloads in the ImageContent and PDFContent
branches, part.mime_type can be None which may cause Gemini API rejects; update
both branches to coalesce part.mime_type into a safe default (e.g., mime =
part.mime_type or "application/octet-stream" or use a more specific default like
"image/png" for images or "application/pdf" for PDFs) and then use mime for the
"mime_type" field in the inline_data and file_data dicts (affecting the items
list entries, the "inline_data"/"file_data" payloads, and the
ImageContent/PDFContent handling in format_parts).
---
Nitpick comments:
In `@backend/app/models/llm/request.py`:
- Around line 119-126: Replace the Union[...] alias KaapiLLMParams with the PEP
604 union operator form using | (e.g., TextLLMParams | STTLLMParams | ...), and
optionally add "from __future__ import annotations" at the top if you want to
ensure forward-compatible evaluation of annotations; update the KaapiLLMParams
definition accordingly so it references TextLLMParams, STTLLMParams,
TTSLLMParams, ImageLLMParams, PDFLLMParams, and MultimodalLLMParams using the |
operator.
- Around line 59-116: The three classes ImageLLMParams, PDFLLMParams, and
MultimodalLLMParams duplicate the same fields; extract a shared base SQLModel
(e.g., BaseVisionLLMParams) containing model, instructions, knowledge_base_ids,
temperature, reasoning, and max_num_results and have ImageLLMParams,
PDFLLMParams, and MultimodalLLMParams subclass that base (or alias it) so the
duplicated field definitions are removed and maintained in one place.
- Line 2: Replace the deprecated typing.List import with the built-in list type:
update the import line in request.py to remove List from typing and update any
type annotations that use List[...] to use built-in list[...] (references: the
import line containing Annotated, Any, List, Literal, Union and any annotations
in this module that use List, e.g., List[str] -> list[str]); ensure no other
typing.List usages remain and run type checks to confirm correctness.
In `@backend/app/services/llm/providers/gai.py`:
- Around line 565-572: The logger.warning call in GoogleAIProvider._execute_text
uses an unnecessary f-string prefix for a static message; remove the f prefix so
the call becomes logger.warning("[GoogleAIProvider._execute_text] Response
missing usage_metadata, using zeros") to avoid misleading formatting usage while
leaving the surrounding logic (setting input_tokens, output_tokens,
total_tokens, reasoning_tokens to 0) unchanged.
- Around line 420-427: The logger.warning call in
GoogleAIProvider._execute_image uses an unnecessary f-string for a static
message; replace the f-string literal passed to logger.warning with a plain
string literal (i.e., remove the leading "f") so the call becomes
logger.warning("[GoogleAIProvider._execute_image] Response missing
usage_metadata, using zeros"), leaving the surrounding logic that sets
input_tokens, output_tokens, total_tokens, and reasoning_tokens unchanged.
- Around line 636-643: In GoogleAIProvider._execute_multimodal replace the
logger.warning call that uses an unnecessary f-string (currently
f"[GoogleAIProvider._execute_multimodal] Response missing usage_metadata, using
zeros") with a plain string literal (remove the leading f) so the message is
logged without the redundant f-string prefix; keep the rest of the block
(setting input_tokens, output_tokens, total_tokens, reasoning_tokens to 0)
unchanged.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
backend/app/models/llm/request.py
backend/app/services/llm/jobs.py
backend/app/services/llm/providers/base.py
backend/app/services/llm/providers/gai.py
backend/app/services/llm/providers/oai.py
backend/app/utils.py
```python
elif isinstance(part, ImageContent):
    if part.format == "base64":
        items.append(
            {
                "inline_data": {
                    "data": part.value,
                    "mime_type": part.mime_type,
                }
            }
        )
    else:
        items.append(
            {
                "file_data": {
                    "file_uri": part.value,
                    "mime_type": part.mime_type,
                    "display_name": None,
                }
            }
        )
elif isinstance(part, PDFContent):
    if part.format == "base64":
        items.append(
            {
                "inline_data": {
                    "data": part.value,
                    "mime_type": part.mime_type,
                }
            }
        )
    else:
        items.append(
            {
                "file_data": {
                    "file_uri": part.value,
                    "mime_type": part.mime_type,
                    "display_name": None,
                }
            }
        )
```
Handle potential None mime_type in format_parts.
Similar to the OpenAI provider, when building inline_data or file_data payloads, part.mime_type can be None. The Gemini API may reject requests with null mime_type values.
Suggested fix

```diff
 elif isinstance(part, ImageContent):
     if part.format == "base64":
+        mime = part.mime_type or "image/png"
         items.append(
             {
                 "inline_data": {
                     "data": part.value,
-                    "mime_type": part.mime_type,
+                    "mime_type": mime,
                 }
             }
         )
     else:
+        mime = part.mime_type or "image/png"
         items.append(
             {
                 "file_data": {
                     "file_uri": part.value,
-                    "mime_type": part.mime_type,
+                    "mime_type": mime,
                     "display_name": None,
                 }
             }
         )
 elif isinstance(part, PDFContent):
     if part.format == "base64":
+        mime = part.mime_type or "application/pdf"
         items.append(
             {
                 "inline_data": {
                     "data": part.value,
-                    "mime_type": part.mime_type,
+                    "mime_type": mime,
                 }
             }
         )
     else:
+        mime = part.mime_type or "application/pdf"
         items.append(
             {
                 "file_data": {
                     "file_uri": part.value,
-                    "mime_type": part.mime_type,
+                    "mime_type": mime,
                     "display_name": None,
                 }
             }
         )
```
Verify each finding against the current code and only fix it if needed.
In `@backend/app/services/llm/providers/gai.py` around lines 58 - 97, When
building payloads in the ImageContent and PDFContent branches, part.mime_type
can be None which may cause Gemini API rejects; update both branches to coalesce
part.mime_type into a safe default (e.g., mime = part.mime_type or
"application/octet-stream" or use a more specific default like "image/png" for
images or "application/pdf" for PDFs) and then use mime for the "mime_type"
field in the inline_data and file_data dicts (affecting the items list entries,
the "inline_data"/"file_data" payloads, and the ImageContent/PDFContent handling
in format_parts).
Actionable comments posted: 1
🧹 Nitpick comments (1)
backend/app/tests/services/llm/test_multimodal.py (1)
28-50: Consider using factory pattern or fixtures for repeated test data.

Several tests create similar content objects (e.g., `ImageContent(format="base64", value="abc", mime_type="image/png")`). Using pytest fixtures or a factory function would reduce repetition and make tests easier to maintain.

As per coding guidelines: "Use factory pattern for test fixtures in backend/app/tests/"

💡 Example factory approach

```python
# Add at top of file or in conftest.py
import pytest


@pytest.fixture
def text_content():
    return TextContent(value="hello")


@pytest.fixture
def image_content_base64():
    return ImageContent(format="base64", value="abc", mime_type="image/png")


@pytest.fixture
def pdf_content_base64():
    return PDFContent(format="base64", value="abc", mime_type="application/pdf")


# Or use factory functions
def make_image_content(format="base64", value="abc", mime_type="image/png"):
    return ImageContent(format=format, value=value, mime_type=mime_type)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/tests/services/llm/test_multimodal.py` around lines 28 - 50, Several tests repeatedly construct similar objects (e.g., ImageContent(format="base64", value="abc", mime_type="image/png"), PDFContent(...), TextContent(...)) — refactor by adding pytest fixtures or small factory functions (e.g., image_content_base64, pdf_content_base64, text_content or make_image_content) and replace inline constructions in tests like TestValidateCompletionInput tests and the MultiModalInput setup; ensure tests still call validate_completion_input with the fixture/factory-produced objects (referencing ImageContent, PDFContent, TextContent, MultiModalInput, validate_completion_input).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/tests/services/llm/test_multimodal.py`:
- Around line 115-117: Replace the broad exception check in the test with the
specific Pydantic validation error: change the pytest.raises(Exception) in
test_empty_parts_raises to pytest.raises(pydantic.ValidationError) (or import
ValidationError from pydantic and use that) so the test asserts that
MultiModalInput(parts=[]) triggers the model_validator validation failure;
ensure the test imports pydantic.ValidationError if not already present.
---
Nitpick comments:
In `@backend/app/tests/services/llm/test_multimodal.py`:
- Around line 28-50: Several tests repeatedly construct similar objects (e.g.,
ImageContent(format="base64", value="abc", mime_type="image/png"),
PDFContent(...), TextContent(...)) — refactor by adding pytest fixtures or small
factory functions (e.g., image_content_base64, pdf_content_base64, text_content
or make_image_content) and replace inline constructions in tests like
TestValidateCompletionInput tests and the MultiModalInput setup; ensure tests
still call validate_completion_input with the fixture/factory-produced objects
(referencing ImageContent, PDFContent, TextContent, MultiModalInput,
validate_completion_input).
```python
def test_empty_parts_raises(self):
    with pytest.raises(Exception):
        MultiModalInput(parts=[])
```
Use specific exception type instead of broad Exception.
MultiModalInput uses a Pydantic model_validator, so it raises pydantic.ValidationError when validation fails. Using Exception could mask unrelated errors.
🛠️ Proposed fix

```diff
+from pydantic import ValidationError
+
 # ...
     def test_empty_parts_raises(self):
-        with pytest.raises(Exception):
+        with pytest.raises(ValidationError):
             MultiModalInput(parts=[])
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
def test_empty_parts_raises(self):
    with pytest.raises(ValidationError):
        MultiModalInput(parts=[])
```
🧰 Tools
🪛 Ruff (0.15.2)
[warning] 116-116: Do not assert blind exception: Exception
(B017)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/tests/services/llm/test_multimodal.py` around lines 115 - 117,
Replace the broad exception check in the test with the specific Pydantic
validation error: change the pytest.raises(Exception) in test_empty_parts_raises
to pytest.raises(pydantic.ValidationError) (or import ValidationError from
pydantic and use that) so the test asserts that MultiModalInput(parts=[])
triggers the model_validator validation failure; ensure the test imports
pydantic.ValidationError if not already present.
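The failure mode behind B017 can be shown without pydantic; here ValidationError is a stdlib stand-in for pydantic.ValidationError and MultiModalInput is a toy version of the model:

```python
class ValidationError(Exception):
    """Stand-in for pydantic.ValidationError (assumption: same catch semantics)."""


class MultiModalInput:
    # Toy model: the real class rejects empty parts via a pydantic model_validator.
    def __init__(self, parts: list):
        if not parts:
            raise ValidationError("parts must not be empty")
        self.parts = parts


# Catching the specific type means an unrelated bug (say, a TypeError from a
# typo) would still surface instead of silently passing the test.
try:
    MultiModalInput(parts=[])
    caught = False
except ValidationError:
    caught = True
```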
```python
)


KaapiLLMParams = Union[
```
let this be at the top of the file
It can’t be moved to the top because KaapiLLMParams depends on types that are defined above. Moving it up would cause unresolved reference issues.
```python
    None,
    f"Unsupported completion type '{completion_type}' for Google AI provider",
)

elif completion_type == "text":
```
I think completion_type should be text only - in text itself, we should be able to take any combination of
[text, list(image), list(pdf)] as input
-- separate completion types should not be there for all
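One way to read this suggestion: keep a single "text" completion type and normalize whatever combination arrives into a flat parts list. A hypothetical sketch (the dict shapes are illustrative, not the PR's actual API):

```python
def build_parts(query_input):
    """Normalize a str, or a mixed list of strings and typed part dicts."""
    if isinstance(query_input, str):
        return [{"type": "input_text", "text": query_input}]
    parts = []
    for item in query_input:
        if isinstance(item, str):
            parts.append({"type": "input_text", "text": item})
        else:
            parts.append(item)  # already a typed part, e.g. an image or pdf dict
    return parts


mixed = build_parts(
    ["describe this", {"type": "input_image", "image_url": "data:image/png;base64,abc"}]
)
```

With this shape, the provider dispatches on the content of the parts rather than on separate completion types.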
…and completion type
♻️ Duplicate comments (1)
backend/app/tests/services/llm/test_multimodal.py (1)
37-39: ⚠️ Potential issue | 🟡 Minor

Use specific `ValidationError` instead of broad `Exception`.

`MultiModalInput` uses a Pydantic `model_validator`, so validation failures raise `pydantic.ValidationError`. Using `Exception` could mask unrelated errors and is flagged by static analysis (B017).

Suggested fix

```diff
+from pydantic import ValidationError
+
 # ... at line 37-39 ...
     def test_empty_parts_raises(self):
-        with pytest.raises(Exception):
+        with pytest.raises(ValidationError):
             MultiModalInput(parts=[])
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/tests/services/llm/test_multimodal.py` around lines 37 - 39, Update the test_empty_parts_raises unit test to assert the specific pydantic.ValidationError instead of a broad Exception: change the pytest.raises target from Exception to pydantic.ValidationError when instantiating MultiModalInput(parts=[]). Locate the test method test_empty_parts_raises and import or reference pydantic.ValidationError so the test fails only on validation errors raised by the MultiModalInput model_validator.
🧹 Nitpick comments (5)
backend/app/models/llm/request.py (4)
2-2: Use built-in `list` instead of deprecated `typing.List`.

`typing.List` is deprecated in Python 3.9+. Since the codebase uses Python 3.11+, use the built-in `list` type directly.

Suggested fix

```diff
-from typing import Annotated, Any, List, Literal, Union
+from typing import Annotated, Any, Literal, Union
```

Note: `List` appears unused in this file since all list annotations already use the built-in `list` syntax.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/models/llm/request.py` at line 2, The import currently brings in typing.List which is deprecated and unused; update the import in backend.app.models.llm.request by removing List from the from typing import ... list so it only imports Annotated, Any, Literal, Union (or use built-in list annotations where present), and run a quick search for any remaining references to typing.List in this module to ensure no other changes are needed.
124-128: Use `X | Y` union syntax for `QueryInput`.

For consistency with Python 3.11+ conventions and other union types in this file.

Suggested fix

```diff
 # Discriminated union for query input types
 QueryInput = Annotated[
-    Union[TextInput, AudioInput, ImageInput, PDFInput],
+    TextInput | AudioInput | ImageInput | PDFInput,
     Field(discriminator="type"),
 ]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/models/llm/request.py` around lines 124 - 128, Update the QueryInput typing to use Python 3.11+ pipe union syntax instead of Union[...]: replace the Annotated[Union[TextInput, AudioInput, ImageInput, PDFInput], Field(discriminator="type")] expression for QueryInput with Annotated[TextInput | AudioInput | ImageInput | PDFInput, Field(discriminator="type")] so the discriminated union uses the X | Y style while keeping the Annotated and Field(discriminator="type") wrapper intact; ensure the symbol QueryInput is updated accordingly and imports remain valid.
59-63: Use `X | Y` union syntax for type annotations.

Per Python 3.10+ and the codebase's Python 3.11+ requirement, prefer `X | Y` over `Union[X, Y]` for consistency with the rest of the file (e.g., `str | None` patterns used elsewhere).

Suggested fix

```diff
-KaapiLLMParams = Union[
-    TextLLMParams,
-    STTLLMParams,
-    TTSLLMParams,
-]
+KaapiLLMParams = TextLLMParams | STTLLMParams | TTSLLMParams
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/models/llm/request.py` around lines 59 - 63, Replace the Union[...] declaration for KaapiLLMParams with the modern pipe union syntax: change the type alias KaapiLLMParams = Union[TextLLMParams, STTLLMParams, TTSLLMParams] to use KaapiLLMParams = TextLLMParams | STTLLMParams | TTSLLMParams so it matches the file's Python 3.11 style (consistent with other annotations like str | None) and update any imports/comments if necessary.
87-91: Incomplete comments on `mime_type` fields.

Lines 87 and 97 have truncated comments (`# keeping the mime_type`) that don't convey meaningful information. Either complete the thought or remove the comments.

Suggested fix

```diff
-    # keeping the mime_type
     mime_type: str | None = Field(
         None,
         description="MIME type of the image (e.g., image/png, image/jpeg)",
     )
```

```diff
-    # keeping the mime_type
     mime_type: str | None = Field(
         None,
         description="MIME type of the PDF (e.g., application/pdf)",
     )
```

Also applies to: 97-101
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/models/llm/request.py` around lines 87 - 91, The inline comment "# keeping the mime_type" is incomplete for the mime_type Field(s) in the LLM request model; update or remove those comments so they are meaningful: replace the truncated comment with a short, clear description of why mime_type exists (e.g., "MIME type of the image to preserve content type for downstream processing/serialization") or simply delete the comment if redundant, and ensure you update both occurrences associated with the mime_type Field definitions so the intent is clear when inspecting the Field declarations.

backend/app/services/llm/providers/base.py (1)
72-76: Update docstring to reflect multimodal input support.

The docstring still describes `resolved_input` as "text string or file path for audio" but the signature now accepts `str | list[ContentPart]`, which includes multimodal content parts.

Suggested fix

```diff
 Args:
     completion_config: LLM completion configuration, pass params as-is to provider API
     query: Query parameters including input and conversation_id
-    resolved_input: The resolved input content (text string or file path for audio)
+    resolved_input: The resolved input content (text string for text/audio, or list of content parts for multimodal)
     include_provider_raw_response: Whether to include the raw LLM provider response in the output
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/llm/providers/base.py` around lines 72 - 76, Update the docstring for the method in backend/app/services/llm/providers/base.py to reflect that resolved_input can be either a plain string or a list of multimodal ContentPart objects (signature: resolved_input: str | list[ContentPart]), and describe what a ContentPart represents (e.g., text, image, audio chunk, metadata or file path) and how the provider should handle each part; keep the other params (completion_config, query, include_provider_raw_response) descriptions intact but clarify that when resolved_input is a list, the implementation should iterate content parts and pass them to the provider API as appropriate.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@backend/app/tests/services/llm/test_multimodal.py`:
- Around line 37-39: Update the test_empty_parts_raises unit test to assert the
specific pydantic.ValidationError instead of a broad Exception: change the
pytest.raises target from Exception to pydantic.ValidationError when
instantiating MultiModalInput(parts=[]). Locate the test method
test_empty_parts_raises and import or reference pydantic.ValidationError so the
test fails only on validation errors raised by the MultiModalInput
model_validator.
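The intent behind narrowing the assertion can be sketched without pydantic or pytest. `MultiModalInputSketch` below is a hypothetical stand-in whose constructor mimics the model validator by raising `ValueError` (pydantic's `ValidationError` is itself a `ValueError` subclass):

```python
class MultiModalInputSketch:
    """Hypothetical stand-in for MultiModalInput's empty-parts validator."""

    def __init__(self, parts: list):
        if not parts:
            raise ValueError("parts must not be empty")
        self.parts = parts


def raises_exactly(exc_type, fn) -> bool:
    """True only when fn raises exc_type (or a subclass) - the point of
    narrowing pytest.raises(Exception) down to ValidationError."""
    try:
        fn()
    except exc_type:
        return True
    except Exception:
        return False
    return False


# A broad `except Exception` check would also pass on a TypeError from a
# typo'd constructor call; the narrow check correctly fails there.
print(raises_exactly(ValueError, lambda: MultiModalInputSketch([])))
```

This is why the duplicate comment asks for `pytest.raises(pydantic.ValidationError)` rather than `pytest.raises(Exception)`: the broad form can mask unrelated failures.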
---
Nitpick comments:
In `@backend/app/models/llm/request.py`:
- Line 2: The import currently brings in typing.List which is deprecated and
unused; update the import in backend.app.models.llm.request by removing List
from the from typing import ... list so it only imports Annotated, Any, Literal,
Union (or use built-in list annotations where present), and run a quick search
for any remaining references to typing.List in this module to ensure no other
changes are needed.
- Around line 124-128: Update the QueryInput typing to use Python 3.11+ pipe
union syntax instead of Union[...]: replace the Annotated[Union[TextInput,
AudioInput, ImageInput, PDFInput], Field(discriminator="type")] expression for
QueryInput with Annotated[TextInput | AudioInput | ImageInput | PDFInput,
Field(discriminator="type")] so the discriminated union uses the X | Y style
while keeping the Annotated and Field(discriminator="type") wrapper intact;
ensure the symbol QueryInput is updated accordingly and imports remain valid.
- Around line 59-63: Replace the Union[...] declaration for KaapiLLMParams with
the modern pipe union syntax: change the type alias KaapiLLMParams =
Union[TextLLMParams, STTLLMParams, TTSLLMParams] to use KaapiLLMParams =
TextLLMParams | STTLLMParams | TTSLLMParams so it matches the file's Python 3.11
style (consistent with other annotations like str | None) and update any
imports/comments if necessary.
- Around line 87-91: The inline comment "# keeping the mime_type" is incomplete
for the mime_type Field(s) in the LLM request model; update or remove those
comments so they are meaningful: replace the truncated comment with a short,
clear description of why mime_type exists (e.g., "MIME type of the image to
preserve content type for downstream processing/serialization") or simply delete
the comment if redundant, and ensure you update both occurrences associated with
the mime_type Field definitions so the intent is clear when inspecting the Field
declarations.
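The `type` discriminator those nitpicks revolve around can be illustrated with a small stdlib dispatch. This is only a sketch of what pydantic's `Field(discriminator="type")` does for `QueryInput`; the dataclasses and parser names here are hypothetical stand-ins:

```python
from dataclasses import dataclass


@dataclass
class TextInputSketch:
    text: str


@dataclass
class ImageInputSketch:
    value: str
    mime_type: str


_PARSERS = {
    "text": lambda d: TextInputSketch(text=d["text"]),
    "image": lambda d: ImageInputSketch(value=d["value"], mime_type=d["mime_type"]),
}


def parse_query_input(payload: dict):
    # The "type" tag selects the concrete model, mirroring a discriminated
    # union over TextInput | AudioInput | ImageInput | PDFInput.
    kind = payload.get("type")
    if kind not in _PARSERS:
        raise ValueError(f"unknown input type: {kind!r}")
    return _PARSERS[kind](payload)


print(type(parse_query_input({"type": "text", "text": "hi"})).__name__)
```

Whether the union is spelled `Union[...]` or `A | B | C`, the discriminator behavior is the same; the nitpicks are purely about the more concise 3.10+ syntax.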
In `@backend/app/services/llm/providers/base.py`:
- Around line 72-76: Update the docstring for the method in
backend/app/services/llm/providers/base.py to reflect that resolved_input can be
either a plain string or a list of multimodal ContentPart objects (signature:
resolved_input: str | list[ContentPart]), and describe what a ContentPart
represents (e.g., text, image, audio chunk, metadata or file path) and how the
provider should handle each part; keep the other params (completion_config,
query, include_provider_raw_response) descriptions intact but clarify that when
resolved_input is a list, the implementation should iterate content parts and
pass them to the provider API as appropriate.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- backend/app/models/llm/request.py
- backend/app/services/llm/jobs.py
- backend/app/services/llm/providers/base.py
- backend/app/tests/services/llm/test_multimodal.py
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/app/services/llm/jobs.py
```python
    )
    return llm_response, None


def _execute_text(
```
Let's check if `_execute_text`, `_execute_pdf`, `_execute_image` can be combined
kartpop
left a comment
approved with comments - please see if simplifications can be made
…CompletionConfig and related methods in GoogleAIProvider
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/crud/llm.py (1)
21-43: ⚠️ Potential issue | 🟡 Minor

`serialize_input` does not handle new input types.

The `serialize_input` function handles `TextInput` and `AudioInput` explicitly but falls back to `str(query_input)` for the new `ImageInput`, `PDFInput`, and `list` types. This would serialize them as their string representation, losing structured metadata that should be stored (like `mime_type`, `size_bytes`, etc.).

Consider extending `serialize_input` to handle these new types consistently with `AudioInput`:

🔧 Proposed fix
```diff
     elif isinstance(query_input, AudioInput):
         return json.dumps(
             {
                 "type": "audio",
                 "format": query_input.content.format,
                 "mime_type": query_input.content.mime_type,
                 # approximate byte size from b64encoded value
                 "size_bytes": len(query_input.content.value) * 3 // 4,
             }
         )
+    elif isinstance(query_input, ImageInput):
+        contents = query_input.content if isinstance(query_input.content, list) else [query_input.content]
+        return json.dumps(
+            {
+                "type": "image",
+                "items": [
+                    {
+                        "format": c.format,
+                        "mime_type": c.mime_type,
+                        "size_bytes": len(c.value) * 3 // 4 if c.format == "base64" else None,
+                    }
+                    for c in contents
+                ],
+            }
+        )
+    elif isinstance(query_input, PDFInput):
+        contents = query_input.content if isinstance(query_input.content, list) else [query_input.content]
+        return json.dumps(
+            {
+                "type": "pdf",
+                "items": [
+                    {
+                        "format": c.format,
+                        "mime_type": c.mime_type,
+                        "size_bytes": len(c.value) * 3 // 4 if c.format == "base64" else None,
+                    }
+                    for c in contents
+                ],
+            }
+        )
+    elif isinstance(query_input, list):
+        return json.dumps({"type": "multimodal", "count": len(query_input)})
     else:
         return str(query_input)
```
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/llm.py` around lines 21 - 43, serialize_input currently only handles TextInput and AudioInput and falls back to str(...) for ImageInput, PDFInput and list inputs; update serialize_input to explicitly handle ImageInput and PDFInput by returning a JSON object like the AudioInput branch containing type ("image" or "pdf"), format, mime_type and an approximate size_bytes (compute from base64 length same as audio), and handle list inputs (e.g., List[QueryInput] or similar) by serializing each element via serialize_input and returning a JSON array string; update branches to reference the concrete types ImageInput and PDFInput and ensure you reuse the same size_bytes calculation used for AudioInput and call serialize_input recursively for list elements to preserve structured metadata.
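The `size_bytes` approximation used in the proposed branches can be sanity-checked in isolation:

```python
import base64

# The proposed serialize_input branches estimate decoded size as
# len(b64_value) * 3 // 4: exact when the payload length is a multiple of 3,
# and at most one or two bytes high when '=' padding was needed.
exact = base64.b64encode(b"abc" * 100).decode()      # 300 bytes -> 400 chars
print(len(exact) * 3 // 4)                           # 300, exact

padded = base64.b64encode(b"12345").decode()         # 5 bytes -> "MTIzNDU="
print(len(padded) * 3 // 4)                          # 6, one byte high
```

For audit-log metadata an approximation within two bytes is fine; stripping trailing `=` characters before the arithmetic would make it exact if that ever matters.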
🧹 Nitpick comments (4)
backend/app/services/llm/providers/gai.py (2)
427-429: Remove unnecessary f-string prefix.

The log message has no placeholders, so the `f` prefix is unnecessary.

🧹 Proposed fix

```diff
     else:
         logger.warning(
-            f"[GoogleAIProvider._execute_text] Response missing usage_metadata, using zeros"
+            "[GoogleAIProvider._execute_text] Response missing usage_metadata, using zeros"
         )
```
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/llm/providers/gai.py` around lines 427 - 429, In GoogleAIProvider._execute_text replace the unnecessary f-string on the logger.warning call by removing the leading "f" so the static message is a plain string; update the logger.warning invocation that currently logs "[GoogleAIProvider._execute_text] Response missing usage_metadata, using zeros" to use a normal string literal without interpolation.
457-463: The `query` parameter is unused.

The `query: QueryParams` parameter was added to the `execute` method signature but is not used in the method body. If it's needed for API consistency with other providers, consider adding an underscore prefix (`_query`) to indicate it's intentionally unused, or remove it if not needed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/llm/providers/gai.py` around lines 457 - 463, the execute method signature for execute(self, completion_config: NativeCompletionConfig, query: QueryParams, ...) declares query but never uses it; rename the parameter to _query (i.e., execute(self, completion_config: NativeCompletionConfig, _query: QueryParams, ...)) to signal it's intentionally unused (or remove it if your provider interface doesn't require it), and update any overriding implementations or callers to match the new name so type hints (QueryParams) remain intact and no unused-variable linter warnings occur; keep the rest of the signature and behavior unchanged.

backend/app/models/llm/request.py (2)

2-2: Remove unused `List` import.

`List` from `typing` is imported but not used in the file. The code correctly uses the built-in `list[...]` syntax throughout.

🧹 Proposed fix
```diff
-from typing import Annotated, Any, List, Literal, Union
+from typing import Annotated, Any, Literal, Union
```
Verify each finding against the current code and only fix it if needed. In `@backend/app/models/llm/request.py` at line 2, Remove the unused typing import "List" from the import statement that currently reads "from typing import Annotated, Any, List, Literal, Union" in request.py; keep the other imports (Annotated, Any, Literal, Union) intact so the file uses the built-in list[...] syntax without the unused List symbol.
59-63: Consider using `|` union syntax for modern Python.

The `Union[]` syntax can be replaced with the more concise `|` operator in Python 3.10+.

🧹 Proposed fix

```diff
-KaapiLLMParams = Union[
-    TextLLMParams,
-    STTLLMParams,
-    TTSLLMParams,
-]
+KaapiLLMParams = TextLLMParams | STTLLMParams | TTSLLMParams
```
Verify each finding against the current code and only fix it if needed. In `@backend/app/models/llm/request.py` around lines 59 - 63, Replace the legacy typing.Union usage with Python 3.10+ union operator for KaapiLLMParams: change the definition using Union[TextLLMParams, STTLLMParams, TTSLLMParams] to use the | operator (TextLLMParams | STTLLMParams | TTSLLMParams) and remove the now-unneeded Union import if present; ensure the identifier KaapiLLMParams and the referenced types TextLLMParams, STTLLMParams, TTSLLMParams are updated accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/services/llm/providers/gai.py`:
- Around line 415-440: The _execute_text path must validate the GoogleAI
response like _execute_stt does: check that response.response_id is present and
response.text is not None before constructing TextContent/LLMResponse. Add a
guard in _execute_text that logs an error (or raises/returns a failed
LLMCallResponse) when response.response_id or response.text is missing,
mirroring the checks in _execute_stt, and only build
LLMCallResponse/LLMResponse/TextContent when those fields are valid to avoid
passing None into TextContent(value=...).
- Around line 403-405: In _execute_text, remove the contents.append call that
inserts {"role": "system", "parts": [{"text": instructions}]} and instead set
generation_kwargs["system_instruction"] = instructions so the system instruction
is passed via GenerateContentConfig; update the code paths that build
generation_kwargs and keep the rest of contents for user/assistant messages
unchanged to match the Gemini API behavior implemented in _execute_tts.
---
Outside diff comments:
In `@backend/app/crud/llm.py`:
- Around line 21-43: serialize_input currently only handles TextInput and
AudioInput and falls back to str(...) for ImageInput, PDFInput and list inputs;
update serialize_input to explicitly handle ImageInput and PDFInput by returning
a JSON object like the AudioInput branch containing type ("image" or "pdf"),
format, mime_type and an approximate size_bytes (compute from base64 length same
as audio), and handle list inputs (e.g., List[QueryInput] or similar) by
serializing each element via serialize_input and returning a JSON array string;
update branches to reference the concrete types ImageInput and PDFInput and
ensure you reuse the same size_bytes calculation used for AudioInput and call
serialize_input recursively for list elements to preserve structured metadata.
---
Nitpick comments:
In `@backend/app/models/llm/request.py`:
- Line 2: Remove the unused typing import "List" from the import statement that
currently reads "from typing import Annotated, Any, List, Literal, Union" in
request.py; keep the other imports (Annotated, Any, Literal, Union) intact so
the file uses the built-in list[...] syntax without the unused List symbol.
- Around line 59-63: Replace the legacy typing.Union usage with Python 3.10+
union operator for KaapiLLMParams: change the definition using
Union[TextLLMParams, STTLLMParams, TTSLLMParams] to use the | operator
(TextLLMParams | STTLLMParams | TTSLLMParams) and remove the now-unneeded Union
import if present; ensure the identifier KaapiLLMParams and the referenced types
TextLLMParams, STTLLMParams, TTSLLMParams are updated accordingly.
In `@backend/app/services/llm/providers/gai.py`:
- Around line 427-429: In GoogleAIProvider._execute_text replace the unnecessary
f-string on the logger.warning call by removing the leading "f" so the static
message is a plain string; update the logger.warning invocation that currently
logs "[GoogleAIProvider._execute_text] Response missing usage_metadata, using
zeros" to use a normal string literal without interpolation.
- Around line 457-463: The execute method signature for execute(self,
completion_config: NativeCompletionConfig, query: QueryParams, ...) declares
query but never uses it; rename the parameter to _query (i.e., execute(self,
completion_config: NativeCompletionConfig, _query: QueryParams, ...)) to signal
it's intentionally unused (or remove it if your provider interface doesn't
require it), and update any overriding implementations or callers to match the
new name so type hints (QueryParams) remain intact and no unused-variable linter
warnings occur; keep the rest of the signature and behavior unchanged.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- backend/app/crud/llm.py
- backend/app/models/llm/request.py
- backend/app/services/llm/providers/gai.py
```python
generation_kwargs = {}
if instructions:
    contents.append({"role": "system", "parts": [{"text": instructions}]})
```
🧩 Analysis chain
🌐 Web query:
Google Gemini API system_instruction parameter role system content
💡 Result:
In the Gemini API, you generally don’t put a {role: "system"} message inside contents (OpenAI-style). Instead, you pass your “system prompt” via the separate system_instruction / systemInstruction field (part of GenerateContentConfig in the SDKs, or top-level in REST). (ai.google.dev)
REST shape
```json
{
  "system_instruction": {
    "parts": [{ "text": "You are a cat. Your name is Neko." }]
  },
  "contents": [
    { "parts": [{ "text": "Hello there" }] }
  ]
}
```

SDK notes (roles)
- In the Gen AI SDKs, you typically set `config.systemInstruction` (JS) / `config.system_instruction` (Python) to a string. (ai.google.dev)
- If you construct a Content object for the system instruction (e.g., Go), the docs' sample uses role = `user` for that content (not `"system"`). (ai.google.dev)
- Many integrations that accept OpenAI-style `{role: "system"}` messages simply extract that message and map it to Gemini's `system_instruction`. (reference-server.pipecat.ai)
Citations:
- 1: https://ai.google.dev/gemini-api/docs/system-instructions
- 2: https://ai.google.dev/gemini-api/docs/system-instructions
- 3: https://ai.google.dev/gemini-api/docs/system-instructions
- 4: https://ai.google.dev/gemini-api/docs/system-instructions
- 5: https://reference-server.pipecat.ai/en/stable/_modules/pipecat/adapters/services/gemini_adapter.html?utm_source=openai
🏁 Script executed:
```shell
cd backend/app/services/llm/providers && sed -n '400,410p' gai.py && echo "---" && sed -n '150,160p' gai.py && echo "---" && sed -n '270,280p' gai.py
```

Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 1180
🏁 Script executed:
```shell
cd backend/app/services/llm/providers && sed -n '380,430p' gai.py | head -60
```

Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 2285
🏁 Script executed:
```shell
cd backend/app/services/llm/providers && sed -n '380,440p' gai.py
```

Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 2696
🏁 Script executed:
```shell
cd backend/app/services/llm/providers && grep -n "system_instruction" gai.py | head -20
```

Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 143
🏁 Script executed:
```shell
cd backend/app/services/llm/providers && sed -n '380,395p' gai.py && echo "---" && grep -n "logger\." gai.py | grep "_execute_text"
```

Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 836
Move system instruction to GenerateContentConfig parameter instead of contents array.
The Gemini API does not support `role: "system"` in the `contents` array. According to the Gemini API documentation, system instructions must be passed via the `system_instruction` parameter in `GenerateContentConfig`, not as a content message. This is already correctly implemented in `_execute_tts` (line 275) but not in `_execute_text` (line 405).
Update line 405 to pass instructions through config instead:
```python
generation_kwargs["system_instruction"] = instructions
```

Remove the incorrect `contents.append()` line that adds `role: "system"`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/services/llm/providers/gai.py` around lines 403 - 405, In
_execute_text, remove the contents.append call that inserts {"role": "system",
"parts": [{"text": instructions}]} and instead set
generation_kwargs["system_instruction"] = instructions so the system instruction
is passed via GenerateContentConfig; update the code paths that build
generation_kwargs and keep the rest of contents for user/assistant messages
unchanged to match the Gemini API behavior implemented in _execute_tts.
as per the gemini docs they do support the "role": "system"
🧩 Analysis chain
🌐 Web query:
Gemini API contents array role system support 2026
💡 Result:
As of March 1, 2026, the Gemini API contents array does not support a "system" role. In REST, each contents[] item is a conversation turn and the documented roles are "user" and "model" (they should typically alternate for multi‑turn chat). [1]
To provide “system” guidance, you set it outside the contents array using system_instruction (REST) / systemInstruction (SDK config). [2] In Vertex AI’s Gemini interface, the role inside systemInstruction is ignored. [3]
REST example (system instruction + user message):
```json
{
  "system_instruction": {
    "parts": [{ "text": "You are a helpful assistant." }]
  },
  "contents": [
    { "role": "user", "parts": [{ "text": "Hello!" }] }
  ]
}
```

[2]
Sources: [1] [2] [3]
@vprashrex, I've verified the latest Gemini API documentation, and as of March 2026, the Gemini API contents array does not support a "system" role—the documented roles are "user" and "model".
System guidance must be set outside the contents array using the system_instruction parameter in the REST API or systemInstruction in SDK config.
Correct approach (matching `_execute_tts` at line 275):

```python
generation_kwargs = {}
if instructions:
    generation_kwargs["system_instruction"] = instructions
if temperature is not None:
    generation_kwargs["temperature"] = temperature
if thinking_level is not None:
    generation_kwargs["thinking_config"] = ThinkingConfig(
        include_thoughts=False, thinking_level=thinking_level
    )
```

Then remove line 405 that incorrectly appends `{"role": "system"}` to the contents array.
CodeRabbit's original finding is correct—the current implementation at line 405 uses an unsupported pattern that could cause API errors.
🧠 Learnings used
Learnt from: vprashrex
Repo: ProjectTech4DevAI/kaapi-backend PR: 628
File: backend/app/services/llm/providers/oai.py:49-61
Timestamp: 2026-02-26T03:16:20.731Z
Learning: Enforce that mime_type is always defaulted in resolve_image_content and resolve_pdf_content (in backend/app/utils.py) before content reaches provider implementations. Provider modules under backend/app/services/llm/providers/ (e.g., oai.py) can assume mime_type is non-None and should not rely on runtime None checks for mime_type. This reduces null-related errors and defines a clear contract between utils and providers. Consider adding explicit type hints or runtime guards if downstream changes could reintroduce None mime_type.
```python
response = self.client.models.generate_content(
    model=model,
    contents=contents,
    config=GenerateContentConfig(**generation_kwargs),
)

if response.usage_metadata:
    input_tokens = response.usage_metadata.prompt_token_count or 0
    output_tokens = response.usage_metadata.candidates_token_count or 0
    total_tokens = response.usage_metadata.total_token_count or 0
    reasoning_tokens = response.usage_metadata.thoughts_token_count or 0
else:
    logger.warning(
        f"[GoogleAIProvider._execute_text] Response missing usage_metadata, using zeros"
    )
    input_tokens = 0
    output_tokens = 0
    total_tokens = 0
    reasoning_tokens = 0

llm_response = LLMCallResponse(
    response=LLMResponse(
        provider_response_id=response.response_id,
        model=response.model_version or model,
        provider=completion_config.provider,
        output=TextOutput(content=TextContent(value=response.text)),
```
Add response validation for consistency with other execute methods.
Unlike `_execute_stt` (lines 170-174), this method does not validate that `response.response_id` and `response.text` are present before using them. If `response.text` is `None`, creating `TextContent(value=response.text)` may fail or cause unexpected behavior.
🛡️ Proposed fix
```diff
         response = self.client.models.generate_content(
             model=model,
             contents=contents,
             config=GenerateContentConfig(**generation_kwargs),
         )

+        if not response.response_id:
+            return None, "Google AI response missing response_id"
+
+        if not response.text:
+            return None, "Google AI response missing text content"
+
         if response.usage_metadata:
```

🧰 Tools
🪛 Ruff (0.15.2)
[error] 428-428: f-string without any placeholders
Remove extraneous f prefix
(F541)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/services/llm/providers/gai.py` around lines 415 - 440, The
_execute_text path must validate the GoogleAI response like _execute_stt does:
check that response.response_id is present and response.text is not None before
constructing TextContent/LLMResponse. Add a guard in _execute_text that logs an
error (or raises/returns a failed LLMCallResponse) when response.response_id or
response.text is missing, mirroring the checks in _execute_stt, and only build
LLMCallResponse/LLMResponse/TextContent when those fields are valid to avoid
passing None into TextContent(value=...).
…ng parameter handling in Google params mapping tests
Codecov Report

❌ Patch coverage is
…nAI/Google AI execution routing
…ig and update assertions to validate system_instruction in Google AI execution routing
…larify configuration parameters
Summary
Target issue is #636
Explain the motivation for making this change. What existing problem does the pull request solve?
Add multimodal support to the `llm/call` endpoint by extending OpenAI and Google providers to handle images and PDFs (including multiple files). The `input` field should accept a single dict or a list of dicts to enable mixed-content requests (text + image/PDF) in a single call.

Checklist
Before submitting a pull request, please ensure that you mark these tasks.
`fastapi run --reload app/main.py` or `docker compose up` in the repository root and test.

Summary by CodeRabbit
New Features
Improvements