
Multimodal: Providers Extended to handle image/pdf/multimodal flows #628

Merged
vprashrex merged 17 commits into main from feature/multimodal-integration
Mar 2, 2026

Conversation

@vprashrex
Collaborator

@vprashrex vprashrex commented Feb 26, 2026

Summary

Target issue is #636


Add multimodal support to the llm/call endpoint by extending OpenAI and Google providers to handle images and PDFs (including multiple files). The input field should accept a single dict or a list of dicts to enable mixed-content requests (text + image/PDF) in a single call.
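For illustration, a mixed-content call might look like the following sketch (the exact field names — `input`, `type`, `format`, `mime_type` — are assumptions based on this summary, not the actual schema):

```python
# Hypothetical request body for the llm/call endpoint: a list of content
# dicts mixing text with a base64-encoded image and a PDF.
payload = {
    "input": [
        {"type": "text", "text": "Summarize the attached files."},
        {
            "type": "image",
            "format": "base64",
            "data": "<base64-image-bytes>",
            "mime_type": "image/png",
        },
        {
            "type": "pdf",
            "format": "base64",
            "data": "<base64-pdf-bytes>",
            "mime_type": "application/pdf",
        },
    ]
}

# A single dict (rather than a list) is also accepted for one-part requests.
single = {"input": {"type": "text", "text": "Hello"}}
```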

Checklist

Before submitting a pull request, please ensure that you complete these tasks.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and tested.
  • If you've fixed a bug or added code, ensure it is covered by test cases.

Summary by CodeRabbit

  • New Features

    • Multimodal support: send images, PDFs, and mixed text+media to LLMs.
    • Email utilities: templates and send functions for test/new-account/reset messages.
    • API response wrapper for consistent success/failure payloads.
    • Password reset: token generation and verification.
  • Improvements

    • Providers extended to handle image/pdf/multimodal flows and better parameter propagation.
    • Stronger input resolution, validation, and temporary file cleanup.
    • Expanded multimodal test coverage.

@coderabbitai

coderabbitai bot commented Feb 26, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Added multimodal support across the LLM stack: new Image/PDF content and input models, multimodal input model and typing, input resolution and cleanup utilities, provider implementations updated (OpenAI/Google) to handle images/PDFs/multimodal, job/registry plumbing adjusted, and comprehensive multimodal tests.

Changes

Cohort / File(s) Summary
Models / Public API
backend/app/models/llm/__init__.py, backend/app/models/llm/request.py
Re-exported and defined multimodal types: ImageContent, PDFContent, ImageInput, PDFInput; extended QueryInput and LLM param/completion typing to include image, pdf, and multimodal.
Provider base & typing
backend/app/services/llm/providers/base.py
Added MultiModalInput, ContentPart alias (Text/Image/PDF), validator to require non-empty parts; broadened BaseProvider.execute to accept str, MultiModalInput, or lists of image/PDF content parts.
OpenAI provider
backend/app/services/llm/providers/oai.py
Added format_parts helper and multimodal dispatch; execute accepts MultiModalInput or lists of image/pdf parts and wraps parts into API content blocks.
Google provider
backend/app/services/llm/providers/gai.py
Added format_parts, _execute_text multimodal path, and integrated multimodal handling into execute; response/usage handling updated for text/multimodal flows.
Jobs & registry
backend/app/services/llm/jobs.py, backend/app/services/llm/providers/registry.py
resolved_input_context accepts ImageInput/PDFInput and lists; registry lazy-loads credential retrieval import.
Utilities & resolution
backend/app/utils.py
Added content resolvers (resolve_image_content, resolve_pdf_content), resolve_input extended for multimodal lists, temp-file cleanup, and supporting helpers. Many utility additions included.
CRUD / request handling
backend/app/crud/llm.py
Create LLM call now recognizes ImageInput, PDFInput, and list-based multimodal inputs; input_type expanded to include pdf and multimodal.
Mappers
backend/app/services/llm/mappers.py
Now propagates reasoning into Google params instead of warning; retained knowledge_base_ids warning.
Tests
backend/app/tests/services/llm/test_multimodal.py
New comprehensive tests for multimodal model creation, resolution utilities, provider formatting, and end-to-end multimodal assembly.

Sequence Diagram(s)

sequenceDiagram
  participant Client as Client
  participant Resolver as resolve_input (utils)
  participant Jobs as jobs.resolved_input_context
  participant Provider as Provider (OpenAI/Google)
  participant External as External LLM API

  Client->>Resolver: submit QueryParams (text|audio|image|pdf|list)
  Resolver-->>Client: (resolved_input, temp_path?)
  Client->>Jobs: call resolved_input_context(resolved_input)
  Jobs-->>Provider: execute(completion_config, query, resolved_input)
  Provider->>External: send API request (text|multimodal)
  External-->>Provider: API response (raw, usage)
  Provider-->>Jobs: LLMCallResponse, provider_raw_response
  Jobs-->>Client: final LLMCallResponse

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Suggested labels

enhancement, ready-for-review

Suggested reviewers

  • Prajna1999
  • kartpop
  • avirajsingh7

Poem

"I hopped through code with ears aflutter,
stitched images, PDFs and text like butter.
Parts combined in multimodal cheer,
now models see pictures far and near.
Hooray — inputs joined, the rabbit claps a mutter 🐇"

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 14.29%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Title check — ✅ Passed: the title accurately describes the main change: extending LLM providers to handle multimodal inputs (image, PDF, and multimodal content flows) across the codebase.
  • Description check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@vprashrex vprashrex self-assigned this Feb 26, 2026
@vprashrex vprashrex requested a review from kartpop February 26, 2026 02:25

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 12

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/services/llm/providers/gai.py (1)

674-715: ⚠️ Potential issue | 🟠 Major

Missing return statement for unhandled completion types.

If completion_type doesn't match any of the handled cases (stt, tts, text, image, pdf, multimodal), the function falls through to the exception handlers without returning, so the caller implicitly receives None.

Suggested fix
             elif completion_type == "multimodal":
                 return self._execute_multimodal(
                     completion_config=completion_config,
                     resolved_input=resolved_input,
                     include_provider_raw_response=include_provider_raw_response,
                 )
+
+            else:
+                return None, f"Unsupported completion type: {completion_type}"

         except TypeError as e:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/gai.py` around lines 674 - 715, The
dispatch on completion_type (variable completion_type) handles
"stt","tts","text","image","pdf","multimodal" but has no default branch, so when
an unknown type is passed the method can fall through and return None; update
the dispatch to handle unknown values by adding a final else that either raises
a clear exception (e.g., ValueError/UnsupportedCompletionTypeError with the
offending completion_type included) or returns a well-formed error response
object consistent with this service's API; ensure the change is applied in the
method containing the calls to _execute_stt, _execute_tts, _execute_text,
_execute_image, _execute_pdf, and _execute_multimodal so all paths return
deterministically.
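As a minimal illustration of the suggested default branch (hypothetical handler bodies; the real method dispatches to _execute_stt, _execute_tts, and friends):

```python
def execute_for(completion_type: str) -> str:
    # Every branch returns, and unknown completion types raise instead of
    # silently falling through and returning None.
    if completion_type == "text":
        return "text-result"
    elif completion_type == "multimodal":
        return "multimodal-result"
    else:
        raise ValueError(f"Unsupported completion type: {completion_type}")
```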
🧹 Nitpick comments (7)
backend/app/services/llm/providers/base.py (1)

24-28: Add return type hint to validator.

For better type safety, add Self return type annotation to the validator method.

Suggested fix
+from typing import Self
+
     @model_validator(mode="after")
-    def validate_parts(self):
+    def validate_parts(self) -> Self:
         if not self.parts:
             raise ValueError("MultiModalInput requires at least one content part")
         return self
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/base.py` around lines 24 - 28, The
validator method validate_parts should declare a Self return type; change its
signature to return "Self" (e.g., def validate_parts(self) -> Self:) and ensure
Self is imported (from typing or typing_extensions depending on Python target).
Keep the `@model_validator`(mode="after") decorator and the existing body intact,
only adding the return type annotation so the method returns a typed instance of
the surrounding model class.
backend/app/services/llm/mappers.py (1)

88-92: Update docstring to include reasoning parameter.

The docstring's "Supported Mapping" section should include the newly supported reasoning parameter to keep documentation in sync with the implementation.

Suggested fix
     Supported Mapping:
         - model → model
         - instructions → instructions (for STT prompts, if available)
         - temperature -> temperature parameter (0-2)
+        - reasoning → reasoning (if provided)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/mappers.py` around lines 88 - 92, Update the
module/function docstring in mappers.py where the "Supported Mapping" list is
defined (the module-level docstring or the docstring for the mapping function)
to include the newly supported reasoning parameter by adding a line like
"reasoning → reasoning" (or "reasoning → reasoning parameter") to the list so
the documentation matches the implementation; ensure formatting matches the
existing list style and placement among other entries (model, instructions,
temperature).
backend/app/models/llm/request.py (2)

59-101: Consider extracting common fields to a base class.

ImageLLMParams, PDFLLMParams, and MultimodalLLMParams share identical fields. A base class could reduce duplication and ensure consistency.

class BaseVisualLLMParams(SQLModel):
    model: str
    instructions: str | None = None
    response_format: Literal["text"] | None = Field(None, description="Currently supports text type")
    temperature: float | None = Field(default=0.2, ge=0.0, le=2.0)
    reasoning: Literal["low", "medium", "high"] | None = None

class ImageLLMParams(BaseVisualLLMParams):
    pass

class PDFLLMParams(BaseVisualLLMParams):
    pass

class MultimodalLLMParams(BaseVisualLLMParams):
    pass

This allows future differentiation while reducing current duplication.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/models/llm/request.py` around lines 59 - 101, Extract the
duplicated fields into a shared base model and have ImageLLMParams,
PDFLLMParams, and MultimodalLLMParams inherit from it: create a
BaseVisualLLMParams SQLModel containing model, instructions, response_format,
temperature, and reasoning, then redefine ImageLLMParams, PDFLLMParams, and
MultimodalLLMParams to subclass BaseVisualLLMParams (no additional fields) so
the shared schema lives in one place and future extensions can add specific
fields to the subclasses.

2-2: Use modern type annotation syntax.

List from typing is deprecated in Python 3.9+. Use the built-in list instead.

Suggested fix
-from typing import Annotated, Any, List, Literal, Union
+from typing import Annotated, Any, Literal

Note: List doesn't appear to be used in this file. If it's needed elsewhere, use built-in list instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/models/llm/request.py` at line 2, The import statement currently
pulls List from typing which is deprecated; update the typing import in the
top-level import (the line importing Annotated, Any, List, Literal, Union) by
removing List and, where any list annotations exist in this module, replace
typing.List[...] with the built-in list[...] syntax (or if no list is used,
simply remove List from the import). Ensure you update references that use List
to use built-in list and clean up the unused List import to avoid linter
warnings.
backend/app/services/llm/providers/gai.py (1)

380-665: Consider extracting shared logic to reduce code duplication.

The methods _execute_image, _execute_pdf, _execute_text, and _execute_multimodal share significant duplicate code for:

  • Model validation
  • Generation kwargs setup (instructions, temperature, thinking_config)
  • Usage metadata extraction
  • LLMCallResponse construction

Consider extracting common logic into helper methods:

def _build_generation_config(self, params: dict) -> tuple[dict, list]:
    """Extract generation kwargs and contents from params."""
    # Shared logic for instructions, temperature, thinking_level
    ...

def _extract_usage(self, response, method_name: str) -> tuple[int, int, int, int]:
    """Extract usage metadata with fallback to zeros."""
    ...

def _build_response(self, response, model, provider, include_raw: bool) -> LLMCallResponse:
    """Build standardized LLMCallResponse."""
    ...

This would significantly reduce the ~300 lines of near-duplicate code.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/gai.py` around lines 380 - 665, The four
methods (_execute_image, _execute_pdf, _execute_text, _execute_multimodal)
duplicate model validation, building contents/generation_kwargs (instructions,
temperature, thinking_config), usage extraction, and LLMCallResponse
construction; extract that shared logic into helpers—e.g., add
_build_generation_config(params) returning (generation_kwargs, contents),
_extract_usage(response, method_name) returning (input_tokens, output_tokens,
total_tokens, reasoning_tokens), and _build_response(response, model, provider,
include_provider_raw_response) returning an LLMCallResponse—and call these from
each _execute_* method after validating model (keep the existing model check in
each method but delegate the rest), replacing repeated blocks with calls to
those helpers to remove duplication.
backend/app/services/llm/jobs.py (1)

107-110: Add return type hint to context manager.

As per coding guidelines, all functions should have type hints for return values. Context managers should annotate their return type.

Suggested fix
+from typing import Any, Generator
+
 @contextmanager
 def resolved_input_context(
     query_input: TextInput | AudioInput | ImageInput | PDFInput | list,
-):
+) -> Generator[Any, None, None]:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/jobs.py` around lines 107 - 110, The context manager
function resolved_input_context lacks a return type hint; update its signature
to include a typed iterator return such as -> Iterator[Union[TextInput,
AudioInput, ImageInput, PDFInput, list]] (or -> Generator[Union[...], None,
None]) and add the necessary typing imports (Iterator/Generator and Union) so
the `@contextmanager-decorated` function is properly annotated; reference the
function name resolved_input_context and ensure the return union mirrors the
existing parameter union.
backend/app/services/llm/providers/registry.py (1)

74-74: Log message should use function name prefix.

As per coding guidelines, log messages should be prefixed with the function name in square brackets.

Suggested fix
-        logger.error(f"Failed to initialize {provider_type} client: {e}", exc_info=True)
+        logger.error(f"[get_llm_provider] Failed to initialize {provider_type} client: {e}", exc_info=True)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/registry.py` at line 74, The logger.error
call that logs "Failed to initialize {provider_type} client: {e}" must be
prefixed with the current function name in square brackets; update the
logger.error invocation in the function containing that line to compute the
function name (e.g., via import inspect and using
inspect.currentframe().f_code.co_name) and include it like
f"[{inspect.currentframe().f_code.co_name}] Failed to initialize {provider_type}
client: {e}", keeping exc_info=True; modify only the logger.error call that
references provider_type and e.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/services/llm/jobs.py`:
- Around line 117-118: Remove the debug print statement and replace it with a
proper logger call: delete print(f"Resolved input: {resolved_input}, error:
{error}") and instead call the module logger (e.g., logger.debug or
logging.getLogger(__name__).debug) to emit the same message and variables
(resolved_input and error); if no logger exists in this module, instantiate one
with logging.getLogger(__name__) at top-level before using it.

In `@backend/app/services/llm/providers/base.py`:
- Around line 71-75: The ternary that builds hint uses an unnecessary f-string
for the else branch; update the hint assignment so the else branch uses a plain
string (remove the leading `f`) where it currently says f" Please ensure the
input type matches the completion type." — locate the hint variable construction
that checks isinstance(resolved_input, MultiModalInput) and remove the `f`
prefix from the else string.

In `@backend/app/services/llm/providers/gai.py`:
- Around line 446-448: The success log in GoogleAIProvider._execute_image
incorrectly references "_execute_text"; update the logger.info call in the
GoogleAIProvider._execute_image method to use the correct function name (e.g.,
"[GoogleAIProvider._execute_image] Successfully generated image response:
{response.response_id}") so logs accurately reflect the operation; locate the
logger.info invocation inside _execute_image and change the message string
accordingly while preserving the response.response_id interpolation.
- Around line 565-568: The warning log in GoogleAIProvider currently references
the wrong function name and uses an unnecessary f-string; update the
logger.warning call in _execute_text to reference
"[GoogleAIProvider._execute_text] Response missing usage_metadata, using zeros"
(remove the f-prefix since there is no interpolation) so the message accurately
names the function and avoids the needless formatted string.
- Around line 491-494: The warning log in GoogleAIProvider incorrectly names the
function `_execute_stt` and uses an unnecessary f-string; update the
logger.warning call inside `_execute_pdf` to reference `_execute_pdf` instead of
`_execute_stt` and remove the f-prefix so the message is a plain string (e.g., "
[GoogleAIProvider._execute_pdf] Response missing usage_metadata, using zeros").
Ensure you modify the logger.warning invocation in the `_execute_pdf` method.
- Around line 421-423: Update the logger.warning call in
GoogleAIProvider._execute_image to reference the correct function name and
remove the unnecessary f-string; specifically change the message from
"[GoogleAIProvider._execute_stt] Response missing usage_metadata, using zeros"
to "[GoogleAIProvider._execute_image] Response missing usage_metadata, using
zeros" and make it a plain string literal (no f-prefix) in the logger.warning
invocation.
- Around line 662-664: The success log in GoogleAIProvider._execute_multimodal
incorrectly references "_execute_text"; update the logger.info call inside the
_execute_multimodal method to use the correct function name (e.g.,
"[GoogleAIProvider._execute_multimodal] Successfully generated text response:
{response.response_id}") so the message matches the method where
response.response_id is produced.
- Around line 517-519: The logger.info message in GoogleAIProvider._execute_pdf
incorrectly references "_execute_text"; update the log string in the logger.info
call within the GoogleAIProvider._execute_pdf method to reference "_execute_pdf"
(e.g., keep using response.response_id but change the label to indicate
_execute_pdf) so the success log correctly identifies the originating method.
- Around line 636-639: Update the logger call inside
GoogleAIProvider._execute_multimodal to reference the correct function name and
remove the unnecessary f-string: change the message from
f"[GoogleAIProvider._execute_stt] Response missing usage_metadata, using zeros"
to a plain string like "[GoogleAIProvider._execute_multimodal] Response missing
usage_metadata, using zeros" so the log correctly identifies _execute_multimodal
and doesn't use an f-prefix.

In `@backend/app/services/llm/providers/oai.py`:
- Around line 49-61: The data URL builder in
backend/app/services/llm/providers/oai.py (inside the branches handling
ImageContent and PDFContent where part.format == "base64") uses part.mime_type
which may be None, producing invalid URLs like "data:None;base64,..."; update
the logic that constructs url so it uses a safe fallback MIME type (e.g.,
"application/octet-stream") when part.mime_type is falsy before interpolating,
keeping the rest of the behavior and the same keys added to items
({"type":"input_image","image_url":...} and
{"type":"input_file","file_url":...}) so other code consuming items remains
unchanged.

In `@backend/app/utils.py`:
- Around line 486-488: The function resolve_input is missing a parameter type
for query_input and uses a quoted forward reference in its return annotation;
update the signature to explicitly type query_input (e.g. Union[str,
list[ImageContent], list[PDFContent], MultiModalInput]) and remove the quotes
around MultiModalInput in the return type so the annotation reads tuple[str |
list[ImageContent] | list[PDFContent] | MultiModalInput, str | None]; keep the
existing import names (ImageContent, PDFContent, MultiModalInput) and adjust any
imports if needed.
- Line 47: ContentPart is too broad because it includes AudioContent while
multimodal inputs explicitly exclude audio; update the type alias in ContentPart
to only include the allowed types (TextContent, ImageContent, PDFContent) so it
matches MultiModalInput.parts and the logic in providers/base.py, removing
AudioContent from the union to tighten type safety.


ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ecbae15 and 74e328e.

📒 Files selected for processing (9)
  • backend/app/models/llm/__init__.py
  • backend/app/models/llm/request.py
  • backend/app/services/llm/jobs.py
  • backend/app/services/llm/mappers.py
  • backend/app/services/llm/providers/base.py
  • backend/app/services/llm/providers/gai.py
  • backend/app/services/llm/providers/oai.py
  • backend/app/services/llm/providers/registry.py
  • backend/app/utils.py

Comment on lines +117 to +118
print(f"Resolved input: {resolved_input}, error: {error}")


⚠️ Potential issue | 🟠 Major

Remove debug print statement.

Debug print() statements should not be committed to production code. Use the logger if debugging information is needed.

Suggested fix
     resolved_input, error = resolve_input(query_input)
-    print(f"Resolved input: {resolved_input}, error: {error}")
-

Comment on lines +421 to +423
logger.warning(
f"[GoogleAIProvider._execute_stt] Response missing usage_metadata, using zeros"
)

⚠️ Potential issue | 🟡 Minor

Fix incorrect function name in log message and remove unnecessary f-prefix.

The log message references _execute_stt but this is in _execute_image. Also, remove the unnecessary f prefix.

Suggested fix
         else:
             logger.warning(
-                f"[GoogleAIProvider._execute_stt] Response missing usage_metadata, using zeros"
+                "[GoogleAIProvider._execute_image] Response missing usage_metadata, using zeros"
             )
🧰 Tools
🪛 Ruff (0.15.2)

[error] 422-422: f-string without any placeholders

Remove extraneous f prefix

(F541)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/gai.py` around lines 421 - 423, Update the
logger.warning call in GoogleAIProvider._execute_image to reference the correct
function name and remove the unnecessary f-string; specifically change the
message from "[GoogleAIProvider._execute_stt] Response missing usage_metadata,
using zeros" to "[GoogleAIProvider._execute_image] Response missing
usage_metadata, using zeros" and make it a plain string literal (no f-prefix) in
the logger.warning invocation.

Comment on lines +446 to +448
logger.info(
f"[GoogleAIProvider._execute_text] Successfully generated text response: {response.response_id}"
)

⚠️ Potential issue | 🟡 Minor

Fix incorrect function name in log message.

The success log references _execute_text but this is in _execute_image.

Suggested fix
         logger.info(
-            f"[GoogleAIProvider._execute_text] Successfully generated text response: {response.response_id}"
+            f"[GoogleAIProvider._execute_image] Successfully generated image response: {response.response_id}"
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/gai.py` around lines 446 - 448, The
success log in GoogleAIProvider._execute_image incorrectly references
"_execute_text"; update the logger.info call in the
GoogleAIProvider._execute_image method to use the correct function name (e.g.,
"[GoogleAIProvider._execute_image] Successfully generated image response:
{response.response_id}") so logs accurately reflect the operation; locate the
logger.info invocation inside _execute_image and change the message string
accordingly while preserving the response.response_id interpolation.

Comment on lines +636 to +639
else:
logger.warning(
f"[GoogleAIProvider._execute_stt] Response missing usage_metadata, using zeros"
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix incorrect function name in log message and remove unnecessary f-prefix.

The log message references _execute_stt but this is in _execute_multimodal.

Suggested fix
         else:
             logger.warning(
-                f"[GoogleAIProvider._execute_stt] Response missing usage_metadata, using zeros"
+                "[GoogleAIProvider._execute_multimodal] Response missing usage_metadata, using zeros"
             )
🧰 Tools
🪛 Ruff (0.15.2)

[error] 638-638: f-string without any placeholders

Remove extraneous f prefix

(F541)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/gai.py` around lines 636 - 639, Update the
logger call inside GoogleAIProvider._execute_multimodal to reference the correct
function name and remove the unnecessary f-string: change the message from
f"[GoogleAIProvider._execute_stt] Response missing usage_metadata, using zeros"
to a plain string like "[GoogleAIProvider._execute_multimodal] Response missing
usage_metadata, using zeros" so the log correctly identifies _execute_multimodal
and doesn't use an f-prefix.

Comment on lines +49 to +61
elif isinstance(part, ImageContent):
if part.format == "base64":
url = f"data:{part.mime_type};base64,{part.value}"
else:
url = part.value
items.append({"type": "input_image", "image_url": url})

elif isinstance(part, PDFContent):
if part.format == "base64":
url = f"data:{part.mime_type};base64,{part.value}"
else:
url = part.value
items.append({"type": "input_file", "file_url": url})
@coderabbitai coderabbitai bot Feb 26, 2026

⚠️ Potential issue | 🟠 Major

Handle potential None mime_type when building data URLs.

When part.format == "base64", the code builds a data URL using part.mime_type, but mime_type is optional (can be None). This would result in data:None;base64,... which is invalid.

Suggested fix
             elif isinstance(part, ImageContent):
                 if part.format == "base64":
+                    mime = part.mime_type or "image/png"
-                    url = f"data:{part.mime_type};base64,{part.value}"
+                    url = f"data:{mime};base64,{part.value}"
                 else:
                     url = part.value
                 items.append({"type": "input_image", "image_url": url})

             elif isinstance(part, PDFContent):
                 if part.format == "base64":
+                    mime = part.mime_type or "application/pdf"
-                    url = f"data:{part.mime_type};base64,{part.value}"
+                    url = f"data:{mime};base64,{part.value}"
                 else:
                     url = part.value
                 items.append({"type": "input_file", "file_url": url})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/oai.py` around lines 49 - 61, The data URL
builder in backend/app/services/llm/providers/oai.py (inside the branches
handling ImageContent and PDFContent where part.format == "base64") uses
part.mime_type which may be None, producing invalid URLs like
"data:None;base64,..."; update the logic that constructs url so it uses a safe
fallback MIME type (e.g., "application/octet-stream") when part.mime_type is
falsy before interpolating, keeping the rest of the behavior and the same keys
added to items ({"type":"input_image","image_url":...} and
{"type":"input_file","file_url":...}) so other code consuming items remains
unchanged.

Collaborator Author

resolve_image_content and resolve_pdf_content already set a default MIME type, so there is no need to handle this at the provider level. False alarm.


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify that resolve_image_content and resolve_pdf_content set default MIME types
cat backend/app/utils.py | grep -A 20 "def resolve_image_content"
echo "---"
cat backend/app/utils.py | grep -A 20 "def resolve_pdf_content"

Length of output: 1255


@vprashrex, you're absolutely right! I can confirm that resolve_image_content and resolve_pdf_content in backend/app/utils.py already ensure default MIME types are set:

# In resolve_image_content:
if not c.mime_type:
    c.mime_type = "image/png"

# In resolve_pdf_content:
if not c.mime_type:
    c.mime_type = "application/pdf"

By the time the content reaches the provider level, mime_type is guaranteed to be non-None. My review comment was indeed a false alarm. Thank you for the clarification! 🙏


✏️ Learnings added
Learnt from: vprashrex
Repo: ProjectTech4DevAI/kaapi-backend PR: 628
File: backend/app/services/llm/providers/oai.py:49-61
Timestamp: 2026-02-26T03:16:20.731Z
Learning: In backend/app/utils.py, the functions resolve_image_content and resolve_pdf_content ensure that mime_type is always set with defaults ("image/png" for images, "application/pdf" for PDFs) before content reaches provider implementations. Provider code in backend/app/services/llm/providers/ can safely assume mime_type is never None.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Comment on lines +486 to +488
def resolve_input(
query_input,
) -> tuple[str | list[ImageContent] | list[PDFContent] | "MultiModalInput", str | None]:
@coderabbitai coderabbitai bot Feb 26, 2026

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

head -50 backend/app/utils.py | cat -n

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 1603


🏁 Script executed:

sed -n '1,50p' backend/app/utils.py | cat -n

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 1603


🏁 Script executed:

grep -n "from __future__" backend/app/utils.py

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 111


🏁 Script executed:

sed -n '486,540p' backend/app/utils.py | cat -n

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 2637


🏁 Script executed:

sed -n '1,100p' backend/app/utils.py | cat -n

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 3417


Add explicit type annotation for query_input and remove the unnecessary quoted forward reference in the return type.

The query_input parameter lacks a type annotation, violating the coding guideline that requires type hints on all function parameters. Additionally, "MultiModalInput" does not need quotes in the return annotation since from __future__ import annotations is enabled.

Proposed fix
 def resolve_input(
-    query_input,
-) -> tuple[str | list[ImageContent] | list[PDFContent] | "MultiModalInput", str | None]:
+    query_input: TextInput | AudioInput | ImageInput | PDFInput | list[TextInput | AudioInput | ImageInput | PDFInput],
+) -> tuple[str | list[ImageContent] | list[PDFContent] | MultiModalInput, str | None]:
🧰 Tools
🪛 Ruff (0.15.2)

[warning] 488-488: Remove quotes from type annotation

Remove quotes

(UP037)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/utils.py` around lines 486 - 488, The function resolve_input is
missing a parameter type for query_input and uses a quoted forward reference in
its return annotation; update the signature to explicitly type query_input (e.g.
Union[str, list[ImageContent], list[PDFContent], MultiModalInput]) and remove
the quotes around MultiModalInput in the return type so the annotation reads
tuple[str | list[ImageContent] | list[PDFContent] | MultiModalInput, str |
None]; keep the existing import names (ImageContent, PDFContent,
MultiModalInput) and adjust any imports if needed.

Collaborator Author

It's just a type annotation; it won't have any effect functionality-wise.


@vprashrex, you're absolutely right that type hints have no effect on runtime performance — Python remains a dynamically typed language and CPython ignores type annotations during execution.

While the runtime impact is zero, type hints do provide value through:

  • Static type checking with tools like mypy or pyright
  • Enhanced IDE autocomplete and error detection
  • Improved code readability and maintainability
  • Better documentation for other developers

That said, I understand your decision here. Feel free to keep the current implementation as-is!
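The annotation style under discussion can be illustrated with a simplified, hypothetical version of `resolve_input` (the real function accepts the Kaapi input models; this sketch only shows PEP 604 unions with `from __future__ import annotations`, so no quoted forward references are needed):

```python
from __future__ import annotations


def resolve_input(query_input: str | list[str]) -> tuple[str | list[str], str | None]:
    # Return (resolved_input, error): a str passes through, an empty list
    # yields an error message, any other list passes through unchanged.
    if isinstance(query_input, str):
        return query_input, None
    if not query_input:
        return "", "empty input"
    return query_input, None
```

With this signature, a checker like mypy or pyright flags call sites that pass an unsupported type, even though runtime behavior is unchanged.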

…tter type management and update relevant classes to use it.
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (6)
backend/app/services/llm/providers/gai.py (3)

565-572: Remove unnecessary f-string prefix.

The warning message has no placeholders but uses an f-string prefix.

Suggested fix
         else:
             logger.warning(
-                f"[GoogleAIProvider._execute_text] Response missing usage_metadata, using zeros"
+                "[GoogleAIProvider._execute_text] Response missing usage_metadata, using zeros"
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/gai.py` around lines 565 - 572, The
logger.warning call in GoogleAIProvider._execute_text uses an unnecessary
f-string prefix for a static message; remove the f prefix so the call becomes
logger.warning("[GoogleAIProvider._execute_text] Response missing
usage_metadata, using zeros") to avoid misleading formatting usage while leaving
the surrounding logic (setting input_tokens, output_tokens, total_tokens,
reasoning_tokens to 0) unchanged.

420-427: Remove unnecessary f-string prefix.

The warning message has no placeholders but uses an f-string prefix.

Suggested fix
         else:
             logger.warning(
-                f"[GoogleAIProvider._execute_image] Response missing usage_metadata, using zeros"
+                "[GoogleAIProvider._execute_image] Response missing usage_metadata, using zeros"
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/gai.py` around lines 420 - 427, The
logger.warning call in GoogleAIProvider._execute_image uses an unnecessary
f-string for a static message; replace the f-string literal passed to
logger.warning with a plain string literal (i.e., remove the leading "f") so the
call becomes logger.warning("[GoogleAIProvider._execute_image] Response missing
usage_metadata, using zeros"), leaving the surrounding logic that sets
input_tokens, output_tokens, total_tokens, and reasoning_tokens unchanged.

636-643: Remove unnecessary f-string prefix.

The warning message has no placeholders but uses an f-string prefix.

Suggested fix
         else:
             logger.warning(
-                f"[GoogleAIProvider._execute_multimodal] Response missing usage_metadata, using zeros"
+                "[GoogleAIProvider._execute_multimodal] Response missing usage_metadata, using zeros"
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/gai.py` around lines 636 - 643, In
GoogleAIProvider._execute_multimodal replace the logger.warning call that uses
an unnecessary f-string (currently f"[GoogleAIProvider._execute_multimodal]
Response missing usage_metadata, using zeros") with a plain string literal
(remove the leading f) so the message is logged without the redundant f-string
prefix; keep the rest of the block (setting input_tokens, output_tokens,
total_tokens, reasoning_tokens to 0) unchanged.
backend/app/models/llm/request.py (3)

119-126: Consider using X | Y syntax for type unions.

Per Python 3.10+ best practices (and PEP 604), the | operator is preferred over Union[] for type annotations. Since from __future__ import annotations isn't present here, this is optional but improves readability.

Suggested fix
-KaapiLLMParams = Union[
-    TextLLMParams,
-    STTLLMParams,
-    TTSLLMParams,
-    ImageLLMParams,
-    PDFLLMParams,
-    MultimodalLLMParams,
-]
+KaapiLLMParams = (
+    TextLLMParams
+    | STTLLMParams
+    | TTSLLMParams
+    | ImageLLMParams
+    | PDFLLMParams
+    | MultimodalLLMParams
+)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/models/llm/request.py` around lines 119 - 126, Replace the
Union[...] alias KaapiLLMParams with the PEP 604 union operator form using |
(e.g., TextLLMParams | STTLLMParams | ...), and optionally add "from __future__
import annotations" at the top if you want to ensure forward-compatible
evaluation of annotations; update the KaapiLLMParams definition accordingly so
it references TextLLMParams, STTLLMParams, TTSLLMParams, ImageLLMParams,
PDFLLMParams, and MultimodalLLMParams using the | operator.

59-116: Consider extracting common fields to reduce duplication.

ImageLLMParams, PDFLLMParams, and MultimodalLLMParams share identical fields (model, instructions, knowledge_base_ids, temperature, reasoning, max_num_results). Consider a base class to reduce duplication.

Example approach
class BaseVisionLLMParams(SQLModel):
    model: str
    instructions: str | None = None
    knowledge_base_ids: list[str] | None = Field(
        default=None,
        description="List of vector store IDs to use for knowledge retrieval",
    )
    temperature: float | None = Field(default=0.2, ge=0.0, le=2.0)
    reasoning: Literal["low", "medium", "high"] | None = None
    max_num_results: int | None = Field(
        default=None, ge=1,
        description="Maximum number of candidate results to return",
    )

class ImageLLMParams(BaseVisionLLMParams):
    pass

class PDFLLMParams(BaseVisionLLMParams):
    pass

class MultimodalLLMParams(BaseVisionLLMParams):
    pass
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/models/llm/request.py` around lines 59 - 116, The three classes
ImageLLMParams, PDFLLMParams, and MultimodalLLMParams duplicate the same fields;
extract a shared base SQLModel (e.g., BaseVisionLLMParams) containing model,
instructions, knowledge_base_ids, temperature, reasoning, and max_num_results
and have ImageLLMParams, PDFLLMParams, and MultimodalLLMParams subclass that
base (or alias it) so the duplicated field definitions are removed and
maintained in one place.

2-2: Use built-in list instead of deprecated typing.List.

Static analysis correctly identifies that typing.List is deprecated in Python 3.9+. Use the built-in list type instead.

Suggested fix
-from typing import Annotated, Any, List, Literal, Union
+from typing import Annotated, Any, Literal, Union
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/models/llm/request.py` at line 2, Replace the deprecated
typing.List import with the built-in list type: update the import line in
request.py to remove List from typing and update any type annotations that use
List[...] to use built-in list[...] (references: the import line containing
Annotated, Any, List, Literal, Union and any annotations in this module that use
List, e.g., List[str] -> list[str]); ensure no other typing.List usages remain
and run type checks to confirm correctness.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 74e328e and 42a001d.

📒 Files selected for processing (6)
  • backend/app/models/llm/request.py
  • backend/app/services/llm/jobs.py
  • backend/app/services/llm/providers/base.py
  • backend/app/services/llm/providers/gai.py
  • backend/app/services/llm/providers/oai.py
  • backend/app/utils.py

Comment on lines +58 to +97
elif isinstance(part, ImageContent):
if part.format == "base64":
items.append(
{
"inline_data": {
"data": part.value,
"mime_type": part.mime_type,
}
}
)
else:
items.append(
{
"file_data": {
"file_uri": part.value,
"mime_type": part.mime_type,
"display_name": None,
}
}
)
elif isinstance(part, PDFContent):
if part.format == "base64":
items.append(
{
"inline_data": {
"data": part.value,
"mime_type": part.mime_type,
}
}
)
else:
items.append(
{
"file_data": {
"file_uri": part.value,
"mime_type": part.mime_type,
"display_name": None,
}
}
)

⚠️ Potential issue | 🟡 Minor

Handle potential None mime_type in format_parts.

Similar to the OpenAI provider, when building inline_data or file_data payloads, part.mime_type can be None. The Gemini API may reject requests with null mime_type values.

Suggested fix
             elif isinstance(part, ImageContent):
                 if part.format == "base64":
+                    mime = part.mime_type or "image/png"
                     items.append(
                         {
                             "inline_data": {
                                 "data": part.value,
-                                "mime_type": part.mime_type,
+                                "mime_type": mime,
                             }
                         }
                     )
                 else:
+                    mime = part.mime_type or "image/png"
                     items.append(
                         {
                             "file_data": {
                                 "file_uri": part.value,
-                                "mime_type": part.mime_type,
+                                "mime_type": mime,
                                 "display_name": None,
                             }
                         }
                     )
             elif isinstance(part, PDFContent):
                 if part.format == "base64":
+                    mime = part.mime_type or "application/pdf"
                     items.append(
                         {
                             "inline_data": {
                                 "data": part.value,
-                                "mime_type": part.mime_type,
+                                "mime_type": mime,
                             }
                         }
                     )
                 else:
+                    mime = part.mime_type or "application/pdf"
                     items.append(
                         {
                             "file_data": {
                                 "file_uri": part.value,
-                                "mime_type": part.mime_type,
+                                "mime_type": mime,
                                 "display_name": None,
                             }
                         }
                     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/gai.py` around lines 58 - 97, When
building payloads in the ImageContent and PDFContent branches, part.mime_type
can be None which may cause Gemini API rejects; update both branches to coalesce
part.mime_type into a safe default (e.g., mime = part.mime_type or
"application/octet-stream" or use a more specific default like "image/png" for
images or "application/pdf" for PDFs) and then use mime for the "mime_type"
field in the inline_data and file_data dicts (affecting the items list entries,
the "inline_data"/"file_data" payloads, and the ImageContent/PDFContent handling
in format_parts).

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
backend/app/tests/services/llm/test_multimodal.py (1)

28-50: Consider using factory pattern or fixtures for repeated test data.

Several tests create similar content objects (e.g., ImageContent(format="base64", value="abc", mime_type="image/png")). Using pytest fixtures or a factory function would reduce repetition and make tests easier to maintain.

As per coding guidelines: "Use factory pattern for test fixtures in backend/app/tests/"

💡 Example factory approach
# Add at top of file or in conftest.py
import pytest

`@pytest.fixture`
def text_content():
    return TextContent(value="hello")

`@pytest.fixture`  
def image_content_base64():
    return ImageContent(format="base64", value="abc", mime_type="image/png")

`@pytest.fixture`
def pdf_content_base64():
    return PDFContent(format="base64", value="abc", mime_type="application/pdf")

# Or use factory functions
def make_image_content(format="base64", value="abc", mime_type="image/png"):
    return ImageContent(format=format, value=value, mime_type=mime_type)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/tests/services/llm/test_multimodal.py` around lines 28 - 50,
Several tests repeatedly construct similar objects (e.g.,
ImageContent(format="base64", value="abc", mime_type="image/png"),
PDFContent(...), TextContent(...)) — refactor by adding pytest fixtures or small
factory functions (e.g., image_content_base64, pdf_content_base64, text_content
or make_image_content) and replace inline constructions in tests like
TestValidateCompletionInput tests and the MultiModalInput setup; ensure tests
still call validate_completion_input with the fixture/factory-produced objects
(referencing ImageContent, PDFContent, TextContent, MultiModalInput,
validate_completion_input).

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 42a001d and 335e59b.

📒 Files selected for processing (1)
  • backend/app/tests/services/llm/test_multimodal.py

Comment on lines +115 to +117
def test_empty_parts_raises(self):
    with pytest.raises(Exception):
        MultiModalInput(parts=[])

⚠️ Potential issue | 🟡 Minor

Use specific exception type instead of broad Exception.

MultiModalInput uses a Pydantic model_validator, so it raises pydantic.ValidationError when validation fails. Using Exception could mask unrelated errors.

🛠️ Proposed fix
+from pydantic import ValidationError
+
 # ...
 
     def test_empty_parts_raises(self):
-        with pytest.raises(Exception):
+        with pytest.raises(ValidationError):
             MultiModalInput(parts=[])
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
     def test_empty_parts_raises(self):
-        with pytest.raises(Exception):
+        with pytest.raises(ValidationError):
             MultiModalInput(parts=[])
🧰 Tools
🪛 Ruff (0.15.2)

[warning] 116-116: Do not assert blind exception: Exception

(B017)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/tests/services/llm/test_multimodal.py` around lines 115 - 117,
Replace the broad exception check in the test with the specific Pydantic
validation error: change the pytest.raises(Exception) in test_empty_parts_raises
to pytest.raises(pydantic.ValidationError) (or import ValidationError from
pydantic and use that) so the test asserts that MultiModalInput(parts=[])
triggers the model_validator validation failure; ensure the test imports
pydantic.ValidationError if not already present.
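The reasoning behind the fix can be demonstrated without pydantic itself; the ValidationError class below is a stand-in for pydantic.ValidationError, and build_multimodal_input is a hypothetical mimic of the model_validator.

```python
class ValidationError(ValueError):
    """Stand-in for pydantic.ValidationError (the real test imports it from pydantic)."""

def build_multimodal_input(parts):
    # Mimics the model_validator: an empty parts list must fail validation.
    if not parts:
        raise ValidationError("parts must not be empty")
    return {"parts": parts}

# Asserting the specific type means an unrelated NameError or TypeError in
# the model would still fail the test, whereas pytest.raises(Exception)
# would silently accept any failure.
try:
    build_multimodal_input([])
    caught = None
except ValidationError:
    caught = "ValidationError"
```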

@vprashrex vprashrex requested a review from Prajna1999 February 26, 2026 04:44
)


KaapiLLMParams = Union[
Collaborator

let this be at the top of the file

Collaborator Author

It can’t be moved to the top because KaapiLLMParams depends on types that are defined above. Moving it up would cause unresolved reference issues.

None,
f"Unsupported completion type '{completion_type}' for Google AI provider",

elif completion_type == "text":
Collaborator

I think completion_type should be text only - in text itself, we should be able to take any combination of
[text, list(image), list(pdf)] as input
-- separate completion types should not be there for all

@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (1)
backend/app/tests/services/llm/test_multimodal.py (1)

37-39: ⚠️ Potential issue | 🟡 Minor

Use specific ValidationError instead of broad Exception.

MultiModalInput uses a Pydantic model_validator, so validation failures raise pydantic.ValidationError. Using Exception could mask unrelated errors and is flagged by static analysis (B017).

Suggested fix
+from pydantic import ValidationError
+
 # ... at line 37-39 ...
     def test_empty_parts_raises(self):
-        with pytest.raises(Exception):
+        with pytest.raises(ValidationError):
             MultiModalInput(parts=[])
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/tests/services/llm/test_multimodal.py` around lines 37 - 39,
Update the test_empty_parts_raises unit test to assert the specific
pydantic.ValidationError instead of a broad Exception: change the pytest.raises
target from Exception to pydantic.ValidationError when instantiating
MultiModalInput(parts=[]). Locate the test method test_empty_parts_raises and
import or reference pydantic.ValidationError so the test fails only on
validation errors raised by the MultiModalInput model_validator.
🧹 Nitpick comments (5)
backend/app/models/llm/request.py (4)

2-2: Use built-in list instead of deprecated typing.List.

typing.List is deprecated in Python 3.9+. Since the codebase uses Python 3.11+, use the built-in list type directly.

Suggested fix
-from typing import Annotated, Any, List, Literal, Union
+from typing import Annotated, Any, Literal, Union

Note: List appears unused in this file since all list annotations already use the built-in list syntax.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/models/llm/request.py` at line 2, The import currently brings in
typing.List which is deprecated and unused; update the import in
backend.app.models.llm.request by removing List from the from typing import ...
list so it only imports Annotated, Any, Literal, Union (or use built-in list
annotations where present), and run a quick search for any remaining references
to typing.List in this module to ensure no other changes are needed.

124-128: Use X | Y union syntax for QueryInput.

For consistency with Python 3.11+ conventions and other union types in this file.

Suggested fix
 # Discriminated union for query input types
 QueryInput = Annotated[
-    Union[TextInput, AudioInput, ImageInput, PDFInput],
+    TextInput | AudioInput | ImageInput | PDFInput,
     Field(discriminator="type"),
 ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/models/llm/request.py` around lines 124 - 128, Update the
QueryInput typing to use Python 3.11+ pipe union syntax instead of Union[...]:
replace the Annotated[Union[TextInput, AudioInput, ImageInput, PDFInput],
Field(discriminator="type")] expression for QueryInput with Annotated[TextInput
| AudioInput | ImageInput | PDFInput, Field(discriminator="type")] so the
discriminated union uses the X | Y style while keeping the Annotated and
Field(discriminator="type") wrapper intact; ensure the symbol QueryInput is
updated accordingly and imports remain valid.

59-63: Use X | Y union syntax for type annotations.

Per Python 3.10+ and the codebase's Python 3.11+ requirement, prefer X | Y over Union[X, Y] for consistency with the rest of the file (e.g., str | None patterns used elsewhere).

Suggested fix
-KaapiLLMParams = Union[
-    TextLLMParams,
-    STTLLMParams,
-    TTSLLMParams,
-]
+KaapiLLMParams = TextLLMParams | STTLLMParams | TTSLLMParams
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/models/llm/request.py` around lines 59 - 63, Replace the
Union[...] declaration for KaapiLLMParams with the modern pipe union syntax:
change the type alias KaapiLLMParams = Union[TextLLMParams, STTLLMParams,
TTSLLMParams] to use KaapiLLMParams = TextLLMParams | STTLLMParams |
TTSLLMParams so it matches the file's Python 3.11 style (consistent with other
annotations like str | None) and update any imports/comments if necessary.

87-91: Incomplete comments on mime_type fields.

Lines 87 and 97 have truncated comments (# keeping the mime_type) that don't convey meaningful information. Either complete the thought or remove the comments.

Suggested fix
-    # keeping the mime_type
     mime_type: str | None = Field(
         None,
         description="MIME type of the image (e.g., image/png, image/jpeg)",
     )
-    # keeping the mime_type
     mime_type: str | None = Field(
         None,
         description="MIME type of the PDF (e.g., application/pdf)",
     )

Also applies to: 97-101

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/models/llm/request.py` around lines 87 - 91, The inline comment
"# keeping the mime_type" is incomplete for the mime_type Field(s) in the LLM
request model; update or remove those comments so they are meaningful: replace
the truncated comment with a short, clear description of why mime_type exists
(e.g., "MIME type of the image to preserve content type for downstream
processing/serialization") or simply delete the comment if redundant, and ensure
you update both occurrences associated with the mime_type Field definitions so
the intent is clear when inspecting the Field declarations.
backend/app/services/llm/providers/base.py (1)

72-76: Update docstring to reflect multimodal input support.

The docstring still describes resolved_input as "text string or file path for audio" but the signature now accepts str | list[ContentPart], which includes multimodal content parts.

Suggested fix
         Args:
             completion_config: LLM completion configuration, pass params as-is to provider API
             query: Query parameters including input and conversation_id
-            resolved_input: The resolved input content (text string or file path for audio)
+            resolved_input: The resolved input content (text string for text/audio, or list of content parts for multimodal)
             include_provider_raw_response: Whether to include the raw LLM provider response in the output
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/base.py` around lines 72 - 76, Update the
docstring for the method in backend/app/services/llm/providers/base.py to
reflect that resolved_input can be either a plain string or a list of multimodal
ContentPart objects (signature: resolved_input: str | list[ContentPart]), and
describe what a ContentPart represents (e.g., text, image, audio chunk, metadata
or file path) and how the provider should handle each part; keep the other
params (completion_config, query, include_provider_raw_response) descriptions
intact but clarify that when resolved_input is a list, the implementation should
iterate content parts and pass them to the provider API as appropriate.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@backend/app/tests/services/llm/test_multimodal.py`:
- Around line 37-39: Update the test_empty_parts_raises unit test to assert the
specific pydantic.ValidationError instead of a broad Exception: change the
pytest.raises target from Exception to pydantic.ValidationError when
instantiating MultiModalInput(parts=[]). Locate the test method
test_empty_parts_raises and import or reference pydantic.ValidationError so the
test fails only on validation errors raised by the MultiModalInput
model_validator.

---

Nitpick comments:
In `@backend/app/models/llm/request.py`:
- Line 2: The import currently brings in typing.List which is deprecated and
unused; update the import in backend.app.models.llm.request by removing List
from the from typing import ... list so it only imports Annotated, Any, Literal,
Union (or use built-in list annotations where present), and run a quick search
for any remaining references to typing.List in this module to ensure no other
changes are needed.
- Around line 124-128: Update the QueryInput typing to use Python 3.11+ pipe
union syntax instead of Union[...]: replace the Annotated[Union[TextInput,
AudioInput, ImageInput, PDFInput], Field(discriminator="type")] expression for
QueryInput with Annotated[TextInput | AudioInput | ImageInput | PDFInput,
Field(discriminator="type")] so the discriminated union uses the X | Y style
while keeping the Annotated and Field(discriminator="type") wrapper intact;
ensure the symbol QueryInput is updated accordingly and imports remain valid.
- Around line 59-63: Replace the Union[...] declaration for KaapiLLMParams with
the modern pipe union syntax: change the type alias KaapiLLMParams =
Union[TextLLMParams, STTLLMParams, TTSLLMParams] to use KaapiLLMParams =
TextLLMParams | STTLLMParams | TTSLLMParams so it matches the file's Python 3.11
style (consistent with other annotations like str | None) and update any
imports/comments if necessary.
- Around line 87-91: The inline comment "# keeping the mime_type" is incomplete
for the mime_type Field(s) in the LLM request model; update or remove those
comments so they are meaningful: replace the truncated comment with a short,
clear description of why mime_type exists (e.g., "MIME type of the image to
preserve content type for downstream processing/serialization") or simply delete
the comment if redundant, and ensure you update both occurrences associated with
the mime_type Field definitions so the intent is clear when inspecting the Field
declarations.

In `@backend/app/services/llm/providers/base.py`:
- Around line 72-76: Update the docstring for the method in
backend/app/services/llm/providers/base.py to reflect that resolved_input can be
either a plain string or a list of multimodal ContentPart objects (signature:
resolved_input: str | list[ContentPart]), and describe what a ContentPart
represents (e.g., text, image, audio chunk, metadata or file path) and how the
provider should handle each part; keep the other params (completion_config,
query, include_provider_raw_response) descriptions intact but clarify that when
resolved_input is a list, the implementation should iterate content parts and
pass them to the provider API as appropriate.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 335e59b and 56c7a44.

📒 Files selected for processing (4)
  • backend/app/models/llm/request.py
  • backend/app/services/llm/jobs.py
  • backend/app/services/llm/providers/base.py
  • backend/app/tests/services/llm/test_multimodal.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/app/services/llm/jobs.py

@vprashrex vprashrex requested a review from kartpop February 27, 2026 05:27
)
return llm_response, None

def _execute_text(
Collaborator

let's check if _execute_text, _execute_pdf, _execute_image can be combined

Collaborator

@kartpop kartpop left a comment

approved with comments - please see if simplifications can be made

@Prajna1999 Prajna1999 changed the title Feature/multimodal integration Multimodal: Providers Extended to handle image/pdf/multimodal flows Feb 28, 2026
…CompletionConfig and related methods in GoogleAIProvider
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/crud/llm.py (1)

21-43: ⚠️ Potential issue | 🟡 Minor

serialize_input does not handle new input types.

The serialize_input function handles TextInput and AudioInput explicitly but falls back to str(query_input) for the new ImageInput, PDFInput, and list types. This would serialize them as their string representation, losing structured metadata that should be stored (like mime_type, size_bytes, etc.).

Consider extending serialize_input to handle these new types consistently with AudioInput:

🔧 Proposed fix
     elif isinstance(query_input, AudioInput):
         return json.dumps(
             {
                 "type": "audio",
                 "format": query_input.content.format,
                 "mime_type": query_input.content.mime_type,
                 # approximate byte size from b64encoded value
                 "size_bytes": len(query_input.content.value) * 3 // 4,
             }
         )
+    elif isinstance(query_input, ImageInput):
+        contents = query_input.content if isinstance(query_input.content, list) else [query_input.content]
+        return json.dumps(
+            {
+                "type": "image",
+                "items": [
+                    {
+                        "format": c.format,
+                        "mime_type": c.mime_type,
+                        "size_bytes": len(c.value) * 3 // 4 if c.format == "base64" else None,
+                    }
+                    for c in contents
+                ],
+            }
+        )
+    elif isinstance(query_input, PDFInput):
+        contents = query_input.content if isinstance(query_input.content, list) else [query_input.content]
+        return json.dumps(
+            {
+                "type": "pdf",
+                "items": [
+                    {
+                        "format": c.format,
+                        "mime_type": c.mime_type,
+                        "size_bytes": len(c.value) * 3 // 4 if c.format == "base64" else None,
+                    }
+                    for c in contents
+                ],
+            }
+        )
+    elif isinstance(query_input, list):
+        return json.dumps({"type": "multimodal", "count": len(query_input)})
     else:
         return str(query_input)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/crud/llm.py` around lines 21 - 43, serialize_input currently only
handles TextInput and AudioInput and falls back to str(...) for ImageInput,
PDFInput and list inputs; update serialize_input to explicitly handle ImageInput
and PDFInput by returning a JSON object like the AudioInput branch containing
type ("image" or "pdf"), format, mime_type and an approximate size_bytes
(compute from base64 length same as audio), and handle list inputs (e.g.,
List[QueryInput] or similar) by serializing each element via serialize_input and
returning a JSON array string; update branches to reference the concrete types
ImageInput and PDFInput and ensure you reuse the same size_bytes calculation
used for AudioInput and call serialize_input recursively for list elements to
preserve structured metadata.
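The size_bytes approximation the fix reuses comes from base64's 3-bytes-to-4-characters ratio; a quick check of how close the estimate gets:

```python
import base64

raw = b"\x00" * 1000
encoded = base64.b64encode(raw).decode()

# Base64 packs 3 raw bytes into 4 characters, so multiplying the encoded
# length by 3 // 4 roughly inverts it; '=' padding makes the estimate off
# by at most a couple of bytes.
approx_size = len(encoded) * 3 // 4
```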
🧹 Nitpick comments (4)
backend/app/services/llm/providers/gai.py (2)

427-429: Remove unnecessary f-string prefix.

The log message has no placeholders, so the f prefix is unnecessary.

🧹 Proposed fix
         else:
             logger.warning(
-                f"[GoogleAIProvider._execute_text] Response missing usage_metadata, using zeros"
+                "[GoogleAIProvider._execute_text] Response missing usage_metadata, using zeros"
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/gai.py` around lines 427 - 429, In
GoogleAIProvider._execute_text replace the unnecessary f-string on the
logger.warning call by removing the leading "f" so the static message is a plain
string; update the logger.warning invocation that currently logs
"[GoogleAIProvider._execute_text] Response missing usage_metadata, using zeros"
to use a normal string literal without interpolation.

457-463: The query parameter is unused.

The query: QueryParams parameter was added to the execute method signature but is not used in the method body. If it's needed for API consistency with other providers, consider adding an underscore prefix (_query) to indicate it's intentionally unused, or remove it if not needed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/gai.py` around lines 457 - 463, The
execute method signature for execute(self, completion_config:
NativeCompletionConfig, query: QueryParams, ...) declares query but never uses
it; rename the parameter to _query (i.e., execute(self, completion_config:
NativeCompletionConfig, _query: QueryParams, ...)) to signal it's intentionally
unused (or remove it if your provider interface doesn't require it), and update
any overriding implementations or callers to match the new name so type hints
(QueryParams) remain intact and no unused-variable linter warnings occur; keep
the rest of the signature and behavior unchanged.
backend/app/models/llm/request.py (2)

2-2: Remove unused List import.

List from typing is imported but not used in the file. The code correctly uses the built-in list[...] syntax throughout.

🧹 Proposed fix
-from typing import Annotated, Any, List, Literal, Union
+from typing import Annotated, Any, Literal, Union
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/models/llm/request.py` at line 2, Remove the unused typing import
"List" from the import statement that currently reads "from typing import
Annotated, Any, List, Literal, Union" in request.py; keep the other imports
(Annotated, Any, Literal, Union) intact so the file uses the built-in list[...]
syntax without the unused List symbol.

59-63: Consider using | union syntax for modern Python.

The Union[] syntax can be replaced with the more concise | operator in Python 3.10+.

🧹 Proposed fix
-KaapiLLMParams = Union[
-    TextLLMParams,
-    STTLLMParams,
-    TTSLLMParams,
-]
+KaapiLLMParams = TextLLMParams | STTLLMParams | TTSLLMParams
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/models/llm/request.py` around lines 59 - 63, Replace the legacy
typing.Union usage with Python 3.10+ union operator for KaapiLLMParams: change
the definition using Union[TextLLMParams, STTLLMParams, TTSLLMParams] to use the
| operator (TextLLMParams | STTLLMParams | TTSLLMParams) and remove the
now-unneeded Union import if present; ensure the identifier KaapiLLMParams and
the referenced types TextLLMParams, STTLLMParams, TTSLLMParams are updated
accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/services/llm/providers/gai.py`:
- Around line 415-440: The _execute_text path must validate the GoogleAI
response like _execute_stt does: check that response.response_id is present and
response.text is not None before constructing TextContent/LLMResponse. Add a
guard in _execute_text that logs an error (or raises/returns a failed
LLMCallResponse) when response.response_id or response.text is missing,
mirroring the checks in _execute_stt, and only build
LLMCallResponse/LLMResponse/TextContent when those fields are valid to avoid
passing None into TextContent(value=...).
- Around line 403-405: In _execute_text, remove the contents.append call that
inserts {"role": "system", "parts": [{"text": instructions}]} and instead set
generation_kwargs["system_instruction"] = instructions so the system instruction
is passed via GenerateContentConfig; update the code paths that build
generation_kwargs and keep the rest of contents for user/assistant messages
unchanged to match the Gemini API behavior implemented in _execute_tts.

---

Outside diff comments:
In `@backend/app/crud/llm.py`:
- Around line 21-43: serialize_input currently only handles TextInput and
AudioInput and falls back to str(...) for ImageInput, PDFInput and list inputs;
update serialize_input to explicitly handle ImageInput and PDFInput by returning
a JSON object like the AudioInput branch containing type ("image" or "pdf"),
format, mime_type and an approximate size_bytes (compute from base64 length same
as audio), and handle list inputs (e.g., List[QueryInput] or similar) by
serializing each element via serialize_input and returning a JSON array string;
update branches to reference the concrete types ImageInput and PDFInput and
ensure you reuse the same size_bytes calculation used for AudioInput and call
serialize_input recursively for list elements to preserve structured metadata.

---

Nitpick comments:
In `@backend/app/models/llm/request.py`:
- Line 2: Remove the unused typing import "List" from the import statement that
currently reads "from typing import Annotated, Any, List, Literal, Union" in
request.py; keep the other imports (Annotated, Any, Literal, Union) intact so
the file uses the built-in list[...] syntax without the unused List symbol.
- Around line 59-63: Replace the legacy typing.Union usage with Python 3.10+
union operator for KaapiLLMParams: change the definition using
Union[TextLLMParams, STTLLMParams, TTSLLMParams] to use the | operator
(TextLLMParams | STTLLMParams | TTSLLMParams) and remove the now-unneeded Union
import if present; ensure the identifier KaapiLLMParams and the referenced types
TextLLMParams, STTLLMParams, TTSLLMParams are updated accordingly.

In `@backend/app/services/llm/providers/gai.py`:
- Around line 427-429: In GoogleAIProvider._execute_text replace the unnecessary
f-string on the logger.warning call by removing the leading "f" so the static
message is a plain string; update the logger.warning invocation that currently
logs "[GoogleAIProvider._execute_text] Response missing usage_metadata, using
zeros" to use a normal string literal without interpolation.
- Around line 457-463: The execute method signature for execute(self,
completion_config: NativeCompletionConfig, query: QueryParams, ...) declares
query but never uses it; rename the parameter to _query (i.e., execute(self,
completion_config: NativeCompletionConfig, _query: QueryParams, ...)) to signal
it's intentionally unused (or remove it if your provider interface doesn't
require it), and update any overriding implementations or callers to match the
new name so type hints (QueryParams) remain intact and no unused-variable linter
warnings occur; keep the rest of the signature and behavior unchanged.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 56c7a44 and 2c76a4a.

📒 Files selected for processing (3)
  • backend/app/crud/llm.py
  • backend/app/models/llm/request.py
  • backend/app/services/llm/providers/gai.py

Comment on lines +403 to +405
generation_kwargs = {}
if instructions:
    contents.append({"role": "system", "parts": [{"text": instructions}]})
@coderabbitai coderabbitai bot Feb 28, 2026

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

Google Gemini API system_instruction parameter role system content

💡 Result:

In the Gemini API, you generally don’t put a {role: "system"} message inside contents (OpenAI-style). Instead, you pass your “system prompt” via the separate system_instruction / systemInstruction field (part of GenerateContentConfig in the SDKs, or top-level in REST). (ai.google.dev)

REST shape

{
  "system_instruction": {
    "parts": [{ "text": "You are a cat. Your name is Neko." }]
  },
  "contents": [
    { "parts": [{ "text": "Hello there" }] }
  ]
}

(ai.google.dev)

SDK notes (roles)

  • In the Gen AI SDKs, you typically set config.systemInstruction (JS) / config.system_instruction (Python) to a string. (ai.google.dev)
  • If you construct a Content object for the system instruction (e.g., Go), the docs’ sample uses role = user for that content (not "system"). (ai.google.dev)
  • Many integrations that accept OpenAI-style {role:"system"} messages simply extract that message and map it to Gemini’s system_instruction. (reference-server.pipecat.ai)

Citations:


🏁 Script executed:

cd backend/app/services/llm/providers && sed -n '400,410p' gai.py && echo "---" && sed -n '150,160p' gai.py && echo "---" && sed -n '270,280p' gai.py

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 1180


🏁 Script executed:

cd backend/app/services/llm/providers && sed -n '380,430p' gai.py | head -60

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 2285


🏁 Script executed:

cd backend/app/services/llm/providers && sed -n '380,440p' gai.py

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 2696


🏁 Script executed:

cd backend/app/services/llm/providers && grep -n "system_instruction" gai.py | head -20

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 143


🏁 Script executed:

cd backend/app/services/llm/providers && sed -n '380,395p' gai.py && echo "---" && grep -n "logger\." gai.py | grep "_execute_text"

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 836


Move system instruction to GenerateContentConfig parameter instead of contents array.

The Gemini API does not support role: "system" in the contents array. According to the Gemini API documentation, system instructions must be passed via the system_instruction parameter in GenerateContentConfig, not as a content message. This is already correctly implemented in _execute_tts (line 275) but not in _execute_text (line 405).

Update line 405 to pass instructions through config instead:

generation_kwargs["system_instruction"] = instructions

Remove the incorrect contents.append() line that adds role: "system".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/gai.py` around lines 403 - 405, In
_execute_text, remove the contents.append call that inserts {"role": "system",
"parts": [{"text": instructions}]} and instead set
generation_kwargs["system_instruction"] = instructions so the system instruction
is passed via GenerateContentConfig; update the code paths that build
generation_kwargs and keep the rest of contents for user/assistant messages
unchanged to match the Gemini API behavior implemented in _execute_tts.

Collaborator Author

as per the gemini docs they do support the "role": "system"


🧩 Analysis chain

🌐 Web query:

Gemini API contents array role system support 2026

💡 Result:

As of March 1, 2026, the Gemini API contents array does not support a "system" role. In REST, each contents[] item is a conversation turn and the documented roles are "user" and "model" (they should typically alternate for multi‑turn chat). [1]

To provide “system” guidance, you set it outside the contents array using system_instruction (REST) / systemInstruction (SDK config). [2] In Vertex AI’s Gemini interface, the role inside systemInstruction is ignored. [3]

REST example (system instruction + user message):

{
  "system_instruction": {
    "parts": [{ "text": "You are a helpful assistant." }]
  },
  "contents": [
    { "role": "user", "parts": [{ "text": "Hello!" }] }
  ]
}

[2]

Sources: [1] [2] [3]


@vprashrex, I've verified the latest Gemini API documentation, and as of March 2026, the Gemini API contents array does not support a "system" role—the documented roles are "user" and "model".

System guidance must be set outside the contents array using the system_instruction parameter in the REST API or systemInstruction in SDK config.

Correct approach (matching _execute_tts at line 275):

generation_kwargs = {}
if instructions:
    generation_kwargs["system_instruction"] = instructions

if temperature is not None:
    generation_kwargs["temperature"] = temperature

if thinking_level is not None:
    generation_kwargs["thinking_config"] = ThinkingConfig(
        include_thoughts=False, thinking_level=thinking_level
    )

Then remove line 405 that incorrectly appends {"role": "system"} to the contents array.

CodeRabbit's original finding is correct—the current implementation at line 405 uses an unsupported pattern that could cause API errors.


🧠 Learnings used
Learnt from: vprashrex
Repo: ProjectTech4DevAI/kaapi-backend PR: 628
File: backend/app/services/llm/providers/oai.py:49-61
Timestamp: 2026-02-26T03:16:20.731Z
Learning: Enforce that mime_type is always defaulted in resolve_image_content and resolve_pdf_content (in backend/app/utils.py) before content reaches provider implementations. Provider modules under backend/app/services/llm/providers/ (e.g., oai.py) can assume mime_type is non-None and should not rely on runtime None checks for mime_type. This reduces null-related errors and defines a clear contract between utils and providers. Consider adding explicit type hints or runtime guards if downstream changes could reintroduce None mime_type.

Comment on lines +415 to +440
        response = self.client.models.generate_content(
            model=model,
            contents=contents,
            config=GenerateContentConfig(**generation_kwargs),
        )

        if response.usage_metadata:
            input_tokens = response.usage_metadata.prompt_token_count or 0
            output_tokens = response.usage_metadata.candidates_token_count or 0
            total_tokens = response.usage_metadata.total_token_count or 0
            reasoning_tokens = response.usage_metadata.thoughts_token_count or 0
        else:
            logger.warning(
                f"[GoogleAIProvider._execute_text] Response missing usage_metadata, using zeros"
            )
            input_tokens = 0
            output_tokens = 0
            total_tokens = 0
            reasoning_tokens = 0

        llm_response = LLMCallResponse(
            response=LLMResponse(
                provider_response_id=response.response_id,
                model=response.model_version or model,
                provider=completion_config.provider,
                output=TextOutput(content=TextContent(value=response.text)),

⚠️ Potential issue | 🟡 Minor

Add response validation for consistency with other execute methods.

Unlike _execute_stt (lines 170-174), this method does not validate that response.response_id and response.text are present before using them. If response.text is None, creating TextContent(value=response.text) may fail or cause unexpected behavior.

🛡️ Proposed fix
         response = self.client.models.generate_content(
             model=model,
             contents=contents,
             config=GenerateContentConfig(**generation_kwargs),
         )

+        if not response.response_id:
+            return None, "Google AI response missing response_id"
+
+        if not response.text:
+            return None, "Google AI response missing text content"
+
         if response.usage_metadata:
🧰 Tools
🪛 Ruff (0.15.2)

[error] 428-428: f-string without any placeholders

Remove extraneous f prefix

(F541)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/gai.py` around lines 415 - 440, The
_execute_text path must validate the GoogleAI response like _execute_stt does:
check that response.response_id is present and response.text is not None before
constructing TextContent/LLMResponse. Add a guard in _execute_text that logs an
error (or raises/returns a failed LLMCallResponse) when response.response_id or
response.text is missing, mirroring the checks in _execute_stt, and only build
LLMCallResponse/LLMResponse/TextContent when those fields are valid to avoid
passing None into TextContent(value=...).
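The guard suggested above can be isolated as a small validation helper. This is a hedged sketch under stated assumptions: `validate_text_response` and `FakeResponse` are hypothetical names, and `response` is any object exposing `response_id` and `text` attributes like the SDK's response object.

```python
from typing import Any, Optional, Tuple


def validate_text_response(response: Any) -> Tuple[Optional[Any], Optional[str]]:
    """Return (response, None) on success or (None, error) on failure.

    Mirrors the proposed fix: check response_id and text before
    building TextContent, instead of passing None downstream.
    """
    if not getattr(response, "response_id", None):
        return None, "Google AI response missing response_id"
    if not getattr(response, "text", None):
        return None, "Google AI response missing text content"
    return response, None


class FakeResponse:
    # Minimal stand-in for the SDK response in this sketch.
    response_id = "r1"
    text = ""  # empty text should be rejected


_, err = validate_text_response(FakeResponse())
print(err)
# → Google AI response missing text content
```

A guard that returns an error tuple keeps the happy path flat and matches the `(None, "…")` error convention the `_execute_stt` comparison implies.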

…ng parameter handling in Google params mapping tests
@codecov
codecov bot commented Mar 1, 2026

Codecov Report

❌ Patch coverage is 97.97297% with 9 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
backend/app/crud/llm.py 40.00% 6 Missing ⚠️
backend/app/services/llm/providers/gai.py 94.54% 3 Missing ⚠️


…ig and update assertions to validate system_instruction in Google AI execution routing
@vprashrex vprashrex added enhancement New feature or request ready-for-review labels Mar 2, 2026
@vprashrex vprashrex moved this to In Review in Kaapi-dev Mar 2, 2026
@vprashrex vprashrex merged commit 2dc5949 into main Mar 2, 2026
3 checks passed
@vprashrex vprashrex deleted the feature/multimodal-integration branch March 2, 2026 08:20
@github-project-automation github-project-automation bot moved this from In Review to Closed in Kaapi-dev Mar 2, 2026
Labels

enhancement New feature or request ready-for-review

Projects

Status: Closed

Development

Successfully merging this pull request may close these issues.

Multimodal: Add Support for Image, PDF & Mixed Content for AI Assessment

3 participants