
Multimodal: Providers Extended to handle image/pdf/multimodal flows #628

Merged
vprashrex merged 17 commits into main from feature/multimodal-integration
Mar 2, 2026

Conversation

@vprashrex
Collaborator

@vprashrex vprashrex commented Feb 26, 2026

Summary

Target issue is #636


Add multimodal support to the llm/call endpoint by extending OpenAI and Google providers to handle images and PDFs (including multiple files). The input field should accept a single dict or a list of dicts to enable mixed-content requests (text + image/PDF) in a single call.
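For illustration, a mixed-content call might look like the following sketch (the exact field names — `input`, `type`, `format`, `mime_type` — are assumptions based on this summary, not the actual schema):

```python
# Hypothetical request body for the llm/call endpoint: a list of content
# dicts mixing text with a base64-encoded image and a PDF.
payload = {
    "input": [
        {"type": "text", "text": "Summarize the attached files."},
        {
            "type": "image",
            "format": "base64",
            "data": "<base64-image-bytes>",
            "mime_type": "image/png",
        },
        {
            "type": "pdf",
            "format": "base64",
            "data": "<base64-pdf-bytes>",
            "mime_type": "application/pdf",
        },
    ]
}

# A single dict (rather than a list) is also accepted for one-part requests.
single = {"input": {"type": "text", "text": "Hello"}}
```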

Checklist

Before submitting a pull request, please ensure that you complete these tasks.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and tested.
  • If you've fixed a bug or added code, ensure it is covered by test cases.

Summary by CodeRabbit

  • New Features

    • Multimodal support: send images, PDFs, and mixed text+media to LLMs.
    • Email utilities: templates and send functions for test/new-account/reset messages.
    • API response wrapper for consistent success/failure payloads.
    • Password reset: token generation and verification.
  • Improvements

    • Providers extended to handle image/pdf/multimodal flows and better parameter propagation.
    • Stronger input resolution, validation, and temporary file cleanup.
    • Expanded multimodal test coverage.

@coderabbitai

coderabbitai bot commented Feb 26, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Added multimodal support across the LLM stack: new Image/PDF content and input models, multimodal input model and typing, input resolution and cleanup utilities, provider implementations updated (OpenAI/Google) to handle images/PDFs/multimodal, job/registry plumbing adjusted, and comprehensive multimodal tests.

Changes

Cohort / File(s) Summary
Models / Public API
backend/app/models/llm/__init__.py, backend/app/models/llm/request.py
Re-exported and defined multimodal types: ImageContent, PDFContent, ImageInput, PDFInput; extended QueryInput and LLM param/completion typing to include image, pdf, and multimodal.
Provider base & typing
backend/app/services/llm/providers/base.py
Added MultiModalInput, ContentPart alias (Text/Image/PDF), validator to require non-empty parts; broadened BaseProvider.execute to accept str, MultiModalInput, or lists of image/PDF content parts.
OpenAI provider
backend/app/services/llm/providers/oai.py
Added format_parts helper and multimodal dispatch; execute accepts MultiModalInput or lists of image/pdf parts and wraps parts into API content blocks.
Google provider
backend/app/services/llm/providers/gai.py
Added format_parts, _execute_text multimodal path, and integrated multimodal handling into execute; response/usage handling updated for text/multimodal flows.
Jobs & registry
backend/app/services/llm/jobs.py, backend/app/services/llm/providers/registry.py
resolved_input_context accepts ImageInput/PDFInput and lists; registry lazy-loads credential retrieval import.
Utilities & resolution
backend/app/utils.py
Added content resolvers (resolve_image_content, resolve_pdf_content), resolve_input extended for multimodal lists, temp-file cleanup, and supporting helpers. Many utility additions included.
CRUD / request handling
backend/app/crud/llm.py
Create LLM call now recognizes ImageInput, PDFInput, and list-based multimodal inputs; input_type expanded to include pdf and multimodal.
Mappers
backend/app/services/llm/mappers.py
Now propagates reasoning into Google params instead of warning; retained knowledge_base_ids warning.
Tests
backend/app/tests/services/llm/test_multimodal.py
New comprehensive tests for multimodal model creation, resolution utilities, provider formatting, and end-to-end multimodal assembly.

Sequence Diagram(s)

sequenceDiagram
  participant Client as Client
  participant Resolver as resolve_input (utils)
  participant Jobs as jobs.resolved_input_context
  participant Provider as Provider (OpenAI/Google)
  participant External as External LLM API

  Client->>Resolver: submit QueryParams (text|audio|image|pdf|list)
  Resolver-->>Client: (resolved_input, temp_path?)
  Client->>Jobs: call resolved_input_context(resolved_input)
  Jobs-->>Provider: execute(completion_config, query, resolved_input)
  Provider->>External: send API request (text|multimodal)
  External-->>Provider: API response (raw, usage)
  Provider-->>Jobs: LLMCallResponse, provider_raw_response
  Jobs-->>Client: final LLMCallResponse

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Suggested labels

enhancement, ready-for-review

Suggested reviewers

  • Prajna1999
  • kartpop
  • avirajsingh7

Poem

"I hopped through code with ears aflutter,
stitched images, PDFs and text like butter.
Parts combined in multimodal cheer,
now models see pictures far and near.
Hooray — inputs joined, the rabbit claps a mutter 🐇"

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 14.29%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Title check — ✅ Passed: the title accurately describes the main change: extending LLM providers to handle multimodal inputs (image, PDF, and multimodal content flows) across the codebase.
  • Description check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@vprashrex vprashrex self-assigned this Feb 26, 2026
@vprashrex vprashrex requested a review from kartpop February 26, 2026 02:25

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 12

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/services/llm/providers/gai.py (1)

674-715: ⚠️ Potential issue | 🟠 Major

Missing return statement for unhandled completion types.

If completion_type doesn't match any of the handled cases (stt, tts, text, image, pdf, multimodal), the function falls through to the exception handlers without returning, so the caller implicitly receives None.

Suggested fix
             elif completion_type == "multimodal":
                 return self._execute_multimodal(
                     completion_config=completion_config,
                     resolved_input=resolved_input,
                     include_provider_raw_response=include_provider_raw_response,
                 )
+
+            else:
+                return None, f"Unsupported completion type: {completion_type}"

         except TypeError as e:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/gai.py` around lines 674 - 715, The
dispatch on completion_type (variable completion_type) handles
"stt","tts","text","image","pdf","multimodal" but has no default branch, so when
an unknown type is passed the method can fall through and return None; update
the dispatch to handle unknown values by adding a final else that either raises
a clear exception (e.g., ValueError/UnsupportedCompletionTypeError with the
offending completion_type included) or returns a well-formed error response
object consistent with this service's API; ensure the change is applied in the
method containing the calls to _execute_stt, _execute_tts, _execute_text,
_execute_image, _execute_pdf, and _execute_multimodal so all paths return
deterministically.
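As a minimal illustration of the suggested default branch (hypothetical handler bodies; the real method dispatches to _execute_stt, _execute_tts, and friends):

```python
def execute_for(completion_type: str) -> str:
    # Every branch returns, and unknown completion types raise instead of
    # silently falling through and returning None.
    if completion_type == "text":
        return "text-result"
    elif completion_type == "multimodal":
        return "multimodal-result"
    else:
        raise ValueError(f"Unsupported completion type: {completion_type}")
```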
🧹 Nitpick comments (7)
backend/app/services/llm/providers/base.py (1)

24-28: Add return type hint to validator.

For better type safety, add Self return type annotation to the validator method.

Suggested fix
+from typing import Self
+
     @model_validator(mode="after")
-    def validate_parts(self):
+    def validate_parts(self) -> Self:
         if not self.parts:
             raise ValueError("MultiModalInput requires at least one content part")
         return self
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/base.py` around lines 24 - 28, The
validator method validate_parts should declare a Self return type; change its
signature to return "Self" (e.g., def validate_parts(self) -> Self:) and ensure
Self is imported (from typing or typing_extensions depending on Python target).
Keep the `@model_validator`(mode="after") decorator and the existing body intact,
only adding the return type annotation so the method returns a typed instance of
the surrounding model class.
backend/app/services/llm/mappers.py (1)

88-92: Update docstring to include reasoning parameter.

The docstring's "Supported Mapping" section should include the newly supported reasoning parameter to keep documentation in sync with the implementation.

Suggested fix
     Supported Mapping:
         - model → model
         - instructions → instructions (for STT prompts, if available)
         - temperature -> temperature parameter (0-2)
+        - reasoning → reasoning (if provided)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/mappers.py` around lines 88 - 92, Update the
module/function docstring in mappers.py where the "Supported Mapping" list is
defined (the module-level docstring or the docstring for the mapping function)
to include the newly supported reasoning parameter by adding a line like
"reasoning → reasoning" (or "reasoning → reasoning parameter") to the list so
the documentation matches the implementation; ensure formatting matches the
existing list style and placement among other entries (model, instructions,
temperature).
backend/app/models/llm/request.py (2)

59-101: Consider extracting common fields to a base class.

ImageLLMParams, PDFLLMParams, and MultimodalLLMParams share identical fields. A base class could reduce duplication and ensure consistency.

class BaseVisualLLMParams(SQLModel):
    model: str
    instructions: str | None = None
    response_format: Literal["text"] | None = Field(None, description="Currently supports text type")
    temperature: float | None = Field(default=0.2, ge=0.0, le=2.0)
    reasoning: Literal["low", "medium", "high"] | None = None

class ImageLLMParams(BaseVisualLLMParams):
    pass

class PDFLLMParams(BaseVisualLLMParams):
    pass

class MultimodalLLMParams(BaseVisualLLMParams):
    pass

This allows future differentiation while reducing current duplication.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/models/llm/request.py` around lines 59 - 101, Extract the
duplicated fields into a shared base model and have ImageLLMParams,
PDFLLMParams, and MultimodalLLMParams inherit from it: create a
BaseVisualLLMParams SQLModel containing model, instructions, response_format,
temperature, and reasoning, then redefine ImageLLMParams, PDFLLMParams, and
MultimodalLLMParams to subclass BaseVisualLLMParams (no additional fields) so
the shared schema lives in one place and future extensions can add specific
fields to the subclasses.

2-2: Use modern type annotation syntax.

List from typing is deprecated in Python 3.9+. Use the built-in list instead.

Suggested fix
-from typing import Annotated, Any, List, Literal, Union
+from typing import Annotated, Any, Literal

Note: List doesn't appear to be used in this file. If it's needed elsewhere, use built-in list instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/models/llm/request.py` at line 2, The import statement currently
pulls List from typing which is deprecated; update the typing import in the
top-level import (the line importing Annotated, Any, List, Literal, Union) by
removing List and, where any list annotations exist in this module, replace
typing.List[...] with the built-in list[...] syntax (or if no list is used,
simply remove List from the import). Ensure you update references that use List
to use built-in list and clean up the unused List import to avoid linter
warnings.
backend/app/services/llm/providers/gai.py (1)

380-665: Consider extracting shared logic to reduce code duplication.

The methods _execute_image, _execute_pdf, _execute_text, and _execute_multimodal share significant duplicate code for:

  • Model validation
  • Generation kwargs setup (instructions, temperature, thinking_config)
  • Usage metadata extraction
  • LLMCallResponse construction

Consider extracting common logic into helper methods:

def _build_generation_config(self, params: dict) -> tuple[dict, list]:
    """Extract generation kwargs and contents from params."""
    # Shared logic for instructions, temperature, thinking_level
    ...

def _extract_usage(self, response, method_name: str) -> tuple[int, int, int, int]:
    """Extract usage metadata with fallback to zeros."""
    ...

def _build_response(self, response, model, provider, include_raw: bool) -> LLMCallResponse:
    """Build standardized LLMCallResponse."""
    ...

This would significantly reduce the ~300 lines of near-duplicate code.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/gai.py` around lines 380 - 665, The four
methods (_execute_image, _execute_pdf, _execute_text, _execute_multimodal)
duplicate model validation, building contents/generation_kwargs (instructions,
temperature, thinking_config), usage extraction, and LLMCallResponse
construction; extract that shared logic into helpers—e.g., add
_build_generation_config(params) returning (generation_kwargs, contents),
_extract_usage(response, method_name) returning (input_tokens, output_tokens,
total_tokens, reasoning_tokens), and _build_response(response, model, provider,
include_provider_raw_response) returning an LLMCallResponse—and call these from
each _execute_* method after validating model (keep the existing model check in
each method but delegate the rest), replacing repeated blocks with calls to
those helpers to remove duplication.
backend/app/services/llm/jobs.py (1)

107-110: Add return type hint to context manager.

As per coding guidelines, all functions should have type hints for return values. Context managers should annotate their return type.

Suggested fix
+from typing import Any, Generator
+
 @contextmanager
 def resolved_input_context(
     query_input: TextInput | AudioInput | ImageInput | PDFInput | list,
-):
+) -> Generator[Any, None, None]:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/jobs.py` around lines 107 - 110, The context manager
function resolved_input_context lacks a return type hint; update its signature
to include a typed iterator return such as -> Iterator[Union[TextInput,
AudioInput, ImageInput, PDFInput, list]] (or -> Generator[Union[...], None,
None]) and add the necessary typing imports (Iterator/Generator and Union) so
the `@contextmanager-decorated` function is properly annotated; reference the
function name resolved_input_context and ensure the return union mirrors the
existing parameter union.
backend/app/services/llm/providers/registry.py (1)

74-74: Log message should use function name prefix.

As per coding guidelines, log messages should be prefixed with the function name in square brackets.

Suggested fix
-        logger.error(f"Failed to initialize {provider_type} client: {e}", exc_info=True)
+        logger.error(f"[get_llm_provider] Failed to initialize {provider_type} client: {e}", exc_info=True)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/registry.py` at line 74, The logger.error
call that logs "Failed to initialize {provider_type} client: {e}" must be
prefixed with the current function name in square brackets; update the
logger.error invocation in the function containing that line to compute the
function name (e.g., via import inspect and using
inspect.currentframe().f_code.co_name) and include it like
f"[{inspect.currentframe().f_code.co_name}] Failed to initialize {provider_type}
client: {e}", keeping exc_info=True; modify only the logger.error call that
references provider_type and e.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/services/llm/jobs.py`:
- Around line 117-118: Remove the debug print statement and replace it with a
proper logger call: delete print(f"Resolved input: {resolved_input}, error:
{error}") and instead call the module logger (e.g., logger.debug or
logging.getLogger(__name__).debug) to emit the same message and variables
(resolved_input and error); if no logger exists in this module, instantiate one
with logging.getLogger(__name__) at top-level before using it.

In `@backend/app/services/llm/providers/base.py`:
- Around line 71-75: The ternary that builds hint uses an unnecessary f-string
for the else branch; update the hint assignment so the else branch uses a plain
string (remove the leading `f`) where it currently says f" Please ensure the
input type matches the completion type." — locate the hint variable construction
that checks isinstance(resolved_input, MultiModalInput) and remove the `f`
prefix from the else string.

In `@backend/app/services/llm/providers/gai.py`:
- Around line 446-448: The success log in GoogleAIProvider._execute_image
incorrectly references "_execute_text"; update the logger.info call in the
GoogleAIProvider._execute_image method to use the correct function name (e.g.,
"[GoogleAIProvider._execute_image] Successfully generated image response:
{response.response_id}") so logs accurately reflect the operation; locate the
logger.info invocation inside _execute_image and change the message string
accordingly while preserving the response.response_id interpolation.
- Around line 565-568: The warning log in GoogleAIProvider currently references
the wrong function name and uses an unnecessary f-string; update the
logger.warning call in _execute_text to reference
"[GoogleAIProvider._execute_text] Response missing usage_metadata, using zeros"
(remove the f-prefix since there is no interpolation) so the message accurately
names the function and avoids the needless formatted string.
- Around line 491-494: The warning log in GoogleAIProvider incorrectly names the
function `_execute_stt` and uses an unnecessary f-string; update the
logger.warning call inside `_execute_pdf` to reference `_execute_pdf` instead of
`_execute_stt` and remove the f-prefix so the message is a plain string (e.g., "
[GoogleAIProvider._execute_pdf] Response missing usage_metadata, using zeros").
Ensure you modify the logger.warning invocation in the `_execute_pdf` method.
- Around line 421-423: Update the logger.warning call in
GoogleAIProvider._execute_image to reference the correct function name and
remove the unnecessary f-string; specifically change the message from
"[GoogleAIProvider._execute_stt] Response missing usage_metadata, using zeros"
to "[GoogleAIProvider._execute_image] Response missing usage_metadata, using
zeros" and make it a plain string literal (no f-prefix) in the logger.warning
invocation.
- Around line 662-664: The success log in GoogleAIProvider._execute_multimodal
incorrectly references "_execute_text"; update the logger.info call inside the
_execute_multimodal method to use the correct function name (e.g.,
"[GoogleAIProvider._execute_multimodal] Successfully generated text response:
{response.response_id}") so the message matches the method where
response.response_id is produced.
- Around line 517-519: The logger.info message in GoogleAIProvider._execute_pdf
incorrectly references "_execute_text"; update the log string in the logger.info
call within the GoogleAIProvider._execute_pdf method to reference "_execute_pdf"
(e.g., keep using response.response_id but change the label to indicate
_execute_pdf) so the success log correctly identifies the originating method.
- Around line 636-639: Update the logger call inside
GoogleAIProvider._execute_multimodal to reference the correct function name and
remove the unnecessary f-string: change the message from
f"[GoogleAIProvider._execute_stt] Response missing usage_metadata, using zeros"
to a plain string like "[GoogleAIProvider._execute_multimodal] Response missing
usage_metadata, using zeros" so the log correctly identifies _execute_multimodal
and doesn't use an f-prefix.

In `@backend/app/services/llm/providers/oai.py`:
- Around line 49-61: The data URL builder in
backend/app/services/llm/providers/oai.py (inside the branches handling
ImageContent and PDFContent where part.format == "base64") uses part.mime_type
which may be None, producing invalid URLs like "data:None;base64,..."; update
the logic that constructs url so it uses a safe fallback MIME type (e.g.,
"application/octet-stream") when part.mime_type is falsy before interpolating,
keeping the rest of the behavior and the same keys added to items
({"type":"input_image","image_url":...} and
{"type":"input_file","file_url":...}) so other code consuming items remains
unchanged.

In `@backend/app/utils.py`:
- Around line 486-488: The function resolve_input is missing a parameter type
for query_input and uses a quoted forward reference in its return annotation;
update the signature to explicitly type query_input (e.g. Union[str,
list[ImageContent], list[PDFContent], MultiModalInput]) and remove the quotes
around MultiModalInput in the return type so the annotation reads tuple[str |
list[ImageContent] | list[PDFContent] | MultiModalInput, str | None]; keep the
existing import names (ImageContent, PDFContent, MultiModalInput) and adjust any
imports if needed.
- Line 47: ContentPart is too broad because it includes AudioContent while
multimodal inputs explicitly exclude audio; update the type alias in ContentPart
to only include the allowed types (TextContent, ImageContent, PDFContent) so it
matches MultiModalInput.parts and the logic in providers/base.py, removing
AudioContent from the union to tighten type safety.


ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ecbae15 and 74e328e.

📒 Files selected for processing (9)
  • backend/app/models/llm/__init__.py
  • backend/app/models/llm/request.py
  • backend/app/services/llm/jobs.py
  • backend/app/services/llm/mappers.py
  • backend/app/services/llm/providers/base.py
  • backend/app/services/llm/providers/gai.py
  • backend/app/services/llm/providers/oai.py
  • backend/app/services/llm/providers/registry.py
  • backend/app/utils.py

Comment on lines +117 to +118
print(f"Resolved input: {resolved_input}, error: {error}")


⚠️ Potential issue | 🟠 Major

Remove debug print statement.

Debug print() statements should not be committed to production code. Use the logger if debugging information is needed.

Suggested fix
     resolved_input, error = resolve_input(query_input)
-    print(f"Resolved input: {resolved_input}, error: {error}")
-

Comment on lines +421 to +423
logger.warning(
f"[GoogleAIProvider._execute_stt] Response missing usage_metadata, using zeros"
)

⚠️ Potential issue | 🟡 Minor

Fix incorrect function name in log message and remove unnecessary f-prefix.

The log message references _execute_stt but this is in _execute_image. Also, remove the unnecessary f prefix.

Suggested fix
         else:
             logger.warning(
-                f"[GoogleAIProvider._execute_stt] Response missing usage_metadata, using zeros"
+                "[GoogleAIProvider._execute_image] Response missing usage_metadata, using zeros"
             )
🧰 Tools
🪛 Ruff (0.15.2)

[error] 422-422: f-string without any placeholders

Remove extraneous f prefix

(F541)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/gai.py` around lines 421 - 423, Update the
logger.warning call in GoogleAIProvider._execute_image to reference the correct
function name and remove the unnecessary f-string; specifically change the
message from "[GoogleAIProvider._execute_stt] Response missing usage_metadata,
using zeros" to "[GoogleAIProvider._execute_image] Response missing
usage_metadata, using zeros" and make it a plain string literal (no f-prefix) in
the logger.warning invocation.

Comment on lines +446 to +448
logger.info(
f"[GoogleAIProvider._execute_text] Successfully generated text response: {response.response_id}"
)

⚠️ Potential issue | 🟡 Minor

Fix incorrect function name in log message.

The success log references _execute_text but this is in _execute_image.

Suggested fix
         logger.info(
-            f"[GoogleAIProvider._execute_text] Successfully generated text response: {response.response_id}"
+            f"[GoogleAIProvider._execute_image] Successfully generated image response: {response.response_id}"
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/gai.py` around lines 446 - 448, The
success log in GoogleAIProvider._execute_image incorrectly references
"_execute_text"; update the logger.info call in the
GoogleAIProvider._execute_image method to use the correct function name (e.g.,
"[GoogleAIProvider._execute_image] Successfully generated image response:
{response.response_id}") so logs accurately reflect the operation; locate the
logger.info invocation inside _execute_image and change the message string
accordingly while preserving the response.response_id interpolation.

Comment on lines +636 to +639
else:
logger.warning(
f"[GoogleAIProvider._execute_stt] Response missing usage_metadata, using zeros"
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix incorrect function name in log message and remove unnecessary f-prefix.

The log message references _execute_stt but this is in _execute_multimodal.

Suggested fix
         else:
             logger.warning(
-                f"[GoogleAIProvider._execute_stt] Response missing usage_metadata, using zeros"
+                "[GoogleAIProvider._execute_multimodal] Response missing usage_metadata, using zeros"
             )
🧰 Tools
🪛 Ruff (0.15.2)

[error] 638-638: f-string without any placeholders

Remove extraneous f prefix

(F541)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/gai.py` around lines 636 - 639, Update the
logger call inside GoogleAIProvider._execute_multimodal to reference the correct
function name and remove the unnecessary f-string: change the message from
f"[GoogleAIProvider._execute_stt] Response missing usage_metadata, using zeros"
to a plain string like "[GoogleAIProvider._execute_multimodal] Response missing
usage_metadata, using zeros" so the log correctly identifies _execute_multimodal
and doesn't use an f-prefix.

Comment on lines +49 to +61
elif isinstance(part, ImageContent):
if part.format == "base64":
url = f"data:{part.mime_type};base64,{part.value}"
else:
url = part.value
items.append({"type": "input_image", "image_url": url})

elif isinstance(part, PDFContent):
if part.format == "base64":
url = f"data:{part.mime_type};base64,{part.value}"
else:
url = part.value
items.append({"type": "input_file", "file_url": url})
@coderabbitai coderabbitai bot Feb 26, 2026

⚠️ Potential issue | 🟠 Major

Handle potential None mime_type when building data URLs.

When part.format == "base64", the code builds a data URL using part.mime_type, but mime_type is optional (can be None). This would result in data:None;base64,... which is invalid.

Suggested fix
             elif isinstance(part, ImageContent):
                 if part.format == "base64":
+                    mime = part.mime_type or "image/png"
-                    url = f"data:{part.mime_type};base64,{part.value}"
+                    url = f"data:{mime};base64,{part.value}"
                 else:
                     url = part.value
                 items.append({"type": "input_image", "image_url": url})

             elif isinstance(part, PDFContent):
                 if part.format == "base64":
+                    mime = part.mime_type or "application/pdf"
-                    url = f"data:{part.mime_type};base64,{part.value}"
+                    url = f"data:{mime};base64,{part.value}"
                 else:
                     url = part.value
                 items.append({"type": "input_file", "file_url": url})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/oai.py` around lines 49 - 61, The data URL
builder in backend/app/services/llm/providers/oai.py (inside the branches
handling ImageContent and PDFContent where part.format == "base64") uses
part.mime_type which may be None, producing invalid URLs like
"data:None;base64,..."; update the logic that constructs url so it uses a safe
fallback MIME type (e.g., "application/octet-stream") when part.mime_type is
falsy before interpolating, keeping the rest of the behavior and the same keys
added to items ({"type":"input_image","image_url":...} and
{"type":"input_file","file_url":...}) so other code consuming items remains
unchanged.

Collaborator Author

resolve_image_content and resolve_pdf_content already set a default MIME type, so there is no need to handle this at the provider level. False alarm.


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify that resolve_image_content and resolve_pdf_content set default MIME types
cat backend/app/utils.py | grep -A 20 "def resolve_image_content"
echo "---"
cat backend/app/utils.py | grep -A 20 "def resolve_pdf_content"

Length of output: 1255


@vprashrex, you're absolutely right! I can confirm that resolve_image_content and resolve_pdf_content in backend/app/utils.py already ensure default MIME types are set:

# In resolve_image_content:
if not c.mime_type:
    c.mime_type = "image/png"

# In resolve_pdf_content:
if not c.mime_type:
    c.mime_type = "application/pdf"

By the time the content reaches the provider level, mime_type is guaranteed to be non-None. My review comment was indeed a false alarm. Thank you for the clarification! 🙏


✏️ Learnings added
Learnt from: vprashrex
Repo: ProjectTech4DevAI/kaapi-backend PR: 628
File: backend/app/services/llm/providers/oai.py:49-61
Timestamp: 2026-02-26T03:16:20.731Z
Learning: In backend/app/utils.py, the functions resolve_image_content and resolve_pdf_content ensure that mime_type is always set with defaults ("image/png" for images, "application/pdf" for PDFs) before content reaches provider implementations. Provider code in backend/app/services/llm/providers/ can safely assume mime_type is never None.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Comment on lines +486 to +488
def resolve_input(
query_input,
) -> tuple[str | list[ImageContent] | list[PDFContent] | "MultiModalInput", str | None]:
@coderabbitai coderabbitai bot Feb 26, 2026

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

head -50 backend/app/utils.py | cat -n

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 1603


🏁 Script executed:

sed -n '1,50p' backend/app/utils.py | cat -n

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 1603


🏁 Script executed:

grep -n "from __future__" backend/app/utils.py

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 111


🏁 Script executed:

sed -n '486,540p' backend/app/utils.py | cat -n

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 2637


🏁 Script executed:

sed -n '1,100p' backend/app/utils.py | cat -n

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 3417


Add explicit type annotation for query_input and remove the unnecessary quoted forward reference in the return type.

The query_input parameter lacks a type annotation, violating the coding guideline that requires type hints on all function parameters. Additionally, "MultiModalInput" does not need quotes in the return annotation since from __future__ import annotations is enabled.

Proposed fix
 def resolve_input(
-    query_input,
-) -> tuple[str | list[ImageContent] | list[PDFContent] | "MultiModalInput", str | None]:
+    query_input: TextInput | AudioInput | ImageInput | PDFInput | list[TextInput | AudioInput | ImageInput | PDFInput],
+) -> tuple[str | list[ImageContent] | list[PDFContent] | MultiModalInput, str | None]:
🧰 Tools
🪛 Ruff (0.15.2)

[warning] 488-488: Remove quotes from type annotation

Remove quotes

(UP037)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/utils.py` around lines 486 - 488, The function resolve_input is
missing a parameter type for query_input and uses a quoted forward reference in
its return annotation; update the signature to explicitly type query_input (e.g.
Union[str, list[ImageContent], list[PDFContent], MultiModalInput]) and remove
the quotes around MultiModalInput in the return type so the annotation reads
tuple[str | list[ImageContent] | list[PDFContent] | MultiModalInput, str |
None]; keep the existing import names (ImageContent, PDFContent,
MultiModalInput) and adjust any imports if needed.

Collaborator Author

It's just a type annotation; it won't have any effect functionality-wise.


@vprashrex, you're absolutely right that type hints have no effect on runtime performance — Python remains a dynamically typed language and CPython ignores type annotations during execution.

While the runtime impact is zero, type hints do provide value through:

  • Static type checking with tools like mypy or pyright
  • Enhanced IDE autocomplete and error detection
  • Improved code readability and maintainability
  • Better documentation for other developers

That said, I understand your decision here. Feel free to keep the current implementation as-is!
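The annotation style under discussion can be illustrated with a simplified, hypothetical version of `resolve_input` (the real function accepts the Kaapi input models; this sketch only shows PEP 604 unions with `from __future__ import annotations`, so no quoted forward references are needed):

```python
from __future__ import annotations


def resolve_input(query_input: str | list[str]) -> tuple[str | list[str], str | None]:
    # Return (resolved_input, error): a str passes through, an empty list
    # yields an error message, any other list passes through unchanged.
    if isinstance(query_input, str):
        return query_input, None
    if not query_input:
        return "", "empty input"
    return query_input, None
```

With this signature, a checker like mypy or pyright flags call sites that pass an unsupported type, even though runtime behavior is unchanged.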

…tter type management and update relevant classes to use it.
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (6)
backend/app/services/llm/providers/gai.py (3)

565-572: Remove unnecessary f-string prefix.

The warning message has no placeholders but uses an f-string prefix.

Suggested fix
         else:
             logger.warning(
-                f"[GoogleAIProvider._execute_text] Response missing usage_metadata, using zeros"
+                "[GoogleAIProvider._execute_text] Response missing usage_metadata, using zeros"
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/gai.py` around lines 565 - 572, The
logger.warning call in GoogleAIProvider._execute_text uses an unnecessary
f-string prefix for a static message; remove the f prefix so the call becomes
logger.warning("[GoogleAIProvider._execute_text] Response missing
usage_metadata, using zeros") to avoid misleading formatting usage while leaving
the surrounding logic (setting input_tokens, output_tokens, total_tokens,
reasoning_tokens to 0) unchanged.

420-427: Remove unnecessary f-string prefix.

The warning message has no placeholders but uses an f-string prefix.

Suggested fix
         else:
             logger.warning(
-                f"[GoogleAIProvider._execute_image] Response missing usage_metadata, using zeros"
+                "[GoogleAIProvider._execute_image] Response missing usage_metadata, using zeros"
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/gai.py` around lines 420 - 427, The
logger.warning call in GoogleAIProvider._execute_image uses an unnecessary
f-string for a static message; replace the f-string literal passed to
logger.warning with a plain string literal (i.e., remove the leading "f") so the
call becomes logger.warning("[GoogleAIProvider._execute_image] Response missing
usage_metadata, using zeros"), leaving the surrounding logic that sets
input_tokens, output_tokens, total_tokens, and reasoning_tokens unchanged.

636-643: Remove unnecessary f-string prefix.

The warning message has no placeholders but uses an f-string prefix.

Suggested fix
         else:
             logger.warning(
-                f"[GoogleAIProvider._execute_multimodal] Response missing usage_metadata, using zeros"
+                "[GoogleAIProvider._execute_multimodal] Response missing usage_metadata, using zeros"
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/gai.py` around lines 636 - 643, In
GoogleAIProvider._execute_multimodal replace the logger.warning call that uses
an unnecessary f-string (currently f"[GoogleAIProvider._execute_multimodal]
Response missing usage_metadata, using zeros") with a plain string literal
(remove the leading f) so the message is logged without the redundant f-string
prefix; keep the rest of the block (setting input_tokens, output_tokens,
total_tokens, reasoning_tokens to 0) unchanged.
backend/app/models/llm/request.py (3)

119-126: Consider using X | Y syntax for type unions.

Per Python 3.10+ best practices (and PEP 604), the | operator is preferred over Union[] for type annotations. Since from __future__ import annotations isn't present here, this is optional but improves readability.

Suggested fix
-KaapiLLMParams = Union[
-    TextLLMParams,
-    STTLLMParams,
-    TTSLLMParams,
-    ImageLLMParams,
-    PDFLLMParams,
-    MultimodalLLMParams,
-]
+KaapiLLMParams = (
+    TextLLMParams
+    | STTLLMParams
+    | TTSLLMParams
+    | ImageLLMParams
+    | PDFLLMParams
+    | MultimodalLLMParams
+)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/models/llm/request.py` around lines 119 - 126, Replace the
Union[...] alias KaapiLLMParams with the PEP 604 union operator form using |
(e.g., TextLLMParams | STTLLMParams | ...), and optionally add "from __future__
import annotations" at the top if you want to ensure forward-compatible
evaluation of annotations; update the KaapiLLMParams definition accordingly so
it references TextLLMParams, STTLLMParams, TTSLLMParams, ImageLLMParams,
PDFLLMParams, and MultimodalLLMParams using the | operator.

59-116: Consider extracting common fields to reduce duplication.

ImageLLMParams, PDFLLMParams, and MultimodalLLMParams share identical fields (model, instructions, knowledge_base_ids, temperature, reasoning, max_num_results). Consider a base class to reduce duplication.

Example approach
class BaseVisionLLMParams(SQLModel):
    model: str
    instructions: str | None = None
    knowledge_base_ids: list[str] | None = Field(
        default=None,
        description="List of vector store IDs to use for knowledge retrieval",
    )
    temperature: float | None = Field(default=0.2, ge=0.0, le=2.0)
    reasoning: Literal["low", "medium", "high"] | None = None
    max_num_results: int | None = Field(
        default=None, ge=1,
        description="Maximum number of candidate results to return",
    )

class ImageLLMParams(BaseVisionLLMParams):
    pass

class PDFLLMParams(BaseVisionLLMParams):
    pass

class MultimodalLLMParams(BaseVisionLLMParams):
    pass
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/models/llm/request.py` around lines 59 - 116, The three classes
ImageLLMParams, PDFLLMParams, and MultimodalLLMParams duplicate the same fields;
extract a shared base SQLModel (e.g., BaseVisionLLMParams) containing model,
instructions, knowledge_base_ids, temperature, reasoning, and max_num_results
and have ImageLLMParams, PDFLLMParams, and MultimodalLLMParams subclass that
base (or alias it) so the duplicated field definitions are removed and
maintained in one place.

2-2: Use built-in list instead of deprecated typing.List.

Static analysis correctly identifies that typing.List is deprecated in Python 3.9+. Use the built-in list type instead.

Suggested fix
-from typing import Annotated, Any, List, Literal, Union
+from typing import Annotated, Any, Literal, Union
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/models/llm/request.py` at line 2, Replace the deprecated
typing.List import with the built-in list type: update the import line in
request.py to remove List from typing and update any type annotations that use
List[...] to use built-in list[...] (references: the import line containing
Annotated, Any, List, Literal, Union and any annotations in this module that use
List, e.g., List[str] -> list[str]); ensure no other typing.List usages remain
and run type checks to confirm correctness.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 74e328e and 42a001d.

📒 Files selected for processing (6)
  • backend/app/models/llm/request.py
  • backend/app/services/llm/jobs.py
  • backend/app/services/llm/providers/base.py
  • backend/app/services/llm/providers/gai.py
  • backend/app/services/llm/providers/oai.py
  • backend/app/utils.py

Comment on lines +58 to +97
elif isinstance(part, ImageContent):
if part.format == "base64":
items.append(
{
"inline_data": {
"data": part.value,
"mime_type": part.mime_type,
}
}
)
else:
items.append(
{
"file_data": {
"file_uri": part.value,
"mime_type": part.mime_type,
"display_name": None,
}
}
)
elif isinstance(part, PDFContent):
if part.format == "base64":
items.append(
{
"inline_data": {
"data": part.value,
"mime_type": part.mime_type,
}
}
)
else:
items.append(
{
"file_data": {
"file_uri": part.value,
"mime_type": part.mime_type,
"display_name": None,
}
}
)

⚠️ Potential issue | 🟡 Minor

Handle potential None mime_type in format_parts.

Similar to the OpenAI provider, when building inline_data or file_data payloads, part.mime_type can be None. The Gemini API may reject requests with null mime_type values.

Suggested fix
             elif isinstance(part, ImageContent):
                 if part.format == "base64":
+                    mime = part.mime_type or "image/png"
                     items.append(
                         {
                             "inline_data": {
                                 "data": part.value,
-                                "mime_type": part.mime_type,
+                                "mime_type": mime,
                             }
                         }
                     )
                 else:
+                    mime = part.mime_type or "image/png"
                     items.append(
                         {
                             "file_data": {
                                 "file_uri": part.value,
-                                "mime_type": part.mime_type,
+                                "mime_type": mime,
                                 "display_name": None,
                             }
                         }
                     )
             elif isinstance(part, PDFContent):
                 if part.format == "base64":
+                    mime = part.mime_type or "application/pdf"
                     items.append(
                         {
                             "inline_data": {
                                 "data": part.value,
-                                "mime_type": part.mime_type,
+                                "mime_type": mime,
                             }
                         }
                     )
                 else:
+                    mime = part.mime_type or "application/pdf"
                     items.append(
                         {
                             "file_data": {
                                 "file_uri": part.value,
-                                "mime_type": part.mime_type,
+                                "mime_type": mime,
                                 "display_name": None,
                             }
                         }
                     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/gai.py` around lines 58 - 97, When
building payloads in the ImageContent and PDFContent branches, part.mime_type
can be None which may cause Gemini API rejects; update both branches to coalesce
part.mime_type into a safe default (e.g., mime = part.mime_type or
"application/octet-stream" or use a more specific default like "image/png" for
images or "application/pdf" for PDFs) and then use mime for the "mime_type"
field in the inline_data and file_data dicts (affecting the items list entries,
the "inline_data"/"file_data" payloads, and the ImageContent/PDFContent handling
in format_parts).

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
backend/app/tests/services/llm/test_multimodal.py (1)

28-50: Consider using factory pattern or fixtures for repeated test data.

Several tests create similar content objects (e.g., ImageContent(format="base64", value="abc", mime_type="image/png")). Using pytest fixtures or a factory function would reduce repetition and make tests easier to maintain.

As per coding guidelines: "Use factory pattern for test fixtures in backend/app/tests/"

💡 Example factory approach
# Add at top of file or in conftest.py
import pytest

`@pytest.fixture`
def text_content():
    return TextContent(value="hello")

`@pytest.fixture`  
def image_content_base64():
    return ImageContent(format="base64", value="abc", mime_type="image/png")

`@pytest.fixture`
def pdf_content_base64():
    return PDFContent(format="base64", value="abc", mime_type="application/pdf")

# Or use factory functions
def make_image_content(format="base64", value="abc", mime_type="image/png"):
    return ImageContent(format=format, value=value, mime_type=mime_type)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/tests/services/llm/test_multimodal.py` around lines 28 - 50,
Several tests repeatedly construct similar objects (e.g.,
ImageContent(format="base64", value="abc", mime_type="image/png"),
PDFContent(...), TextContent(...)) — refactor by adding pytest fixtures or small
factory functions (e.g., image_content_base64, pdf_content_base64, text_content
or make_image_content) and replace inline constructions in tests like
TestValidateCompletionInput tests and the MultiModalInput setup; ensure tests
still call validate_completion_input with the fixture/factory-produced objects
(referencing ImageContent, PDFContent, TextContent, MultiModalInput,
validate_completion_input).

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 42a001d and 335e59b.

📒 Files selected for processing (1)
  • backend/app/tests/services/llm/test_multimodal.py

Comment on lines +115 to +117
def test_empty_parts_raises(self):
    with pytest.raises(Exception):
        MultiModalInput(parts=[])

⚠️ Potential issue | 🟡 Minor

Use specific exception type instead of broad Exception.

MultiModalInput uses a Pydantic model_validator, so it raises pydantic.ValidationError when validation fails. Using Exception could mask unrelated errors.

🛠️ Proposed fix
+from pydantic import ValidationError
+
 # ...
 
     def test_empty_parts_raises(self):
-        with pytest.raises(Exception):
+        with pytest.raises(ValidationError):
             MultiModalInput(parts=[])
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
     def test_empty_parts_raises(self):
-        with pytest.raises(Exception):
+        with pytest.raises(ValidationError):
             MultiModalInput(parts=[])
🧰 Tools
🪛 Ruff (0.15.2)

[warning] 116-116: Do not assert blind exception: Exception

(B017)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/tests/services/llm/test_multimodal.py` around lines 115 - 117,
Replace the broad exception check in the test with the specific Pydantic
validation error: change the pytest.raises(Exception) in test_empty_parts_raises
to pytest.raises(pydantic.ValidationError) (or import ValidationError from
pydantic and use that) so the test asserts that MultiModalInput(parts=[])
triggers the model_validator validation failure; ensure the test imports
pydantic.ValidationError if not already present.
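The reasoning behind the fix can be demonstrated without pydantic itself; the ValidationError class below is a stand-in for pydantic.ValidationError, and build_multimodal_input is a hypothetical mimic of the model_validator.

```python
class ValidationError(ValueError):
    """Stand-in for pydantic.ValidationError (the real test imports it from pydantic)."""

def build_multimodal_input(parts):
    # Mimics the model_validator: an empty parts list must fail validation.
    if not parts:
        raise ValidationError("parts must not be empty")
    return {"parts": parts}

# Asserting the specific type means an unrelated NameError or TypeError in
# the model would still fail the test, whereas pytest.raises(Exception)
# would silently accept any failure.
try:
    build_multimodal_input([])
    caught = None
except ValidationError:
    caught = "ValidationError"
```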

@vprashrex vprashrex requested a review from Prajna1999 February 26, 2026 04:44
)


KaapiLLMParams = Union[
Collaborator

let this be at the top of the file

Collaborator Author

It can’t be moved to the top because KaapiLLMParams depends on types that are defined above. Moving it up would cause unresolved reference issues.

None,
f"Unsupported completion type '{completion_type}' for Google AI provider",

elif completion_type == "text":
Collaborator

I think completion_type should be text only - in text itself, we should be able to take any combination of
[text, list(image), list(pdf)] as input
-- separate completion types should not be there for all

@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (1)
backend/app/tests/services/llm/test_multimodal.py (1)

37-39: ⚠️ Potential issue | 🟡 Minor

Use specific ValidationError instead of broad Exception.

MultiModalInput uses a Pydantic model_validator, so validation failures raise pydantic.ValidationError. Using Exception could mask unrelated errors and is flagged by static analysis (B017).

Suggested fix
+from pydantic import ValidationError
+
 # ... at line 37-39 ...
     def test_empty_parts_raises(self):
-        with pytest.raises(Exception):
+        with pytest.raises(ValidationError):
             MultiModalInput(parts=[])
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/tests/services/llm/test_multimodal.py` around lines 37 - 39,
Update the test_empty_parts_raises unit test to assert the specific
pydantic.ValidationError instead of a broad Exception: change the pytest.raises
target from Exception to pydantic.ValidationError when instantiating
MultiModalInput(parts=[]). Locate the test method test_empty_parts_raises and
import or reference pydantic.ValidationError so the test fails only on
validation errors raised by the MultiModalInput model_validator.
🧹 Nitpick comments (5)
backend/app/models/llm/request.py (4)

2-2: Use built-in list instead of deprecated typing.List.

typing.List is deprecated in Python 3.9+. Since the codebase uses Python 3.11+, use the built-in list type directly.

Suggested fix
-from typing import Annotated, Any, List, Literal, Union
+from typing import Annotated, Any, Literal, Union

Note: List appears unused in this file since all list annotations already use the built-in list syntax.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/models/llm/request.py` at line 2, The import currently brings in
typing.List which is deprecated and unused; update the import in
backend.app.models.llm.request by removing List from the from typing import ...
list so it only imports Annotated, Any, Literal, Union (or use built-in list
annotations where present), and run a quick search for any remaining references
to typing.List in this module to ensure no other changes are needed.

124-128: Use X | Y union syntax for QueryInput.

For consistency with Python 3.11+ conventions and other union types in this file.

Suggested fix
 # Discriminated union for query input types
 QueryInput = Annotated[
-    Union[TextInput, AudioInput, ImageInput, PDFInput],
+    TextInput | AudioInput | ImageInput | PDFInput,
     Field(discriminator="type"),
 ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/models/llm/request.py` around lines 124 - 128, Update the
QueryInput typing to use Python 3.11+ pipe union syntax instead of Union[...]:
replace the Annotated[Union[TextInput, AudioInput, ImageInput, PDFInput],
Field(discriminator="type")] expression for QueryInput with Annotated[TextInput
| AudioInput | ImageInput | PDFInput, Field(discriminator="type")] so the
discriminated union uses the X | Y style while keeping the Annotated and
Field(discriminator="type") wrapper intact; ensure the symbol QueryInput is
updated accordingly and imports remain valid.

59-63: Use X | Y union syntax for type annotations.

Per Python 3.10+ and the codebase's Python 3.11+ requirement, prefer X | Y over Union[X, Y] for consistency with the rest of the file (e.g., str | None patterns used elsewhere).

Suggested fix
-KaapiLLMParams = Union[
-    TextLLMParams,
-    STTLLMParams,
-    TTSLLMParams,
-]
+KaapiLLMParams = TextLLMParams | STTLLMParams | TTSLLMParams
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/models/llm/request.py` around lines 59 - 63, Replace the
Union[...] declaration for KaapiLLMParams with the modern pipe union syntax:
change the type alias KaapiLLMParams = Union[TextLLMParams, STTLLMParams,
TTSLLMParams] to use KaapiLLMParams = TextLLMParams | STTLLMParams |
TTSLLMParams so it matches the file's Python 3.11 style (consistent with other
annotations like str | None) and update any imports/comments if necessary.

87-91: Incomplete comments on mime_type fields.

Lines 87 and 97 have truncated comments (# keeping the mime_type) that don't convey meaningful information. Either complete the thought or remove the comments.

Suggested fix
-    # keeping the mime_type
     mime_type: str | None = Field(
         None,
         description="MIME type of the image (e.g., image/png, image/jpeg)",
     )
-    # keeping the mime_type
     mime_type: str | None = Field(
         None,
         description="MIME type of the PDF (e.g., application/pdf)",
     )

Also applies to: 97-101

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/models/llm/request.py` around lines 87 - 91, The inline comment
"# keeping the mime_type" is incomplete for the mime_type Field(s) in the LLM
request model; update or remove those comments so they are meaningful: replace
the truncated comment with a short, clear description of why mime_type exists
(e.g., "MIME type of the image to preserve content type for downstream
processing/serialization") or simply delete the comment if redundant, and ensure
you update both occurrences associated with the mime_type Field definitions so
the intent is clear when inspecting the Field declarations.
backend/app/services/llm/providers/base.py (1)

72-76: Update docstring to reflect multimodal input support.

The docstring still describes resolved_input as "text string or file path for audio" but the signature now accepts str | list[ContentPart], which includes multimodal content parts.

Suggested fix
         Args:
             completion_config: LLM completion configuration, pass params as-is to provider API
             query: Query parameters including input and conversation_id
-            resolved_input: The resolved input content (text string or file path for audio)
+            resolved_input: The resolved input content (text string for text/audio, or list of content parts for multimodal)
             include_provider_raw_response: Whether to include the raw LLM provider response in the output
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/base.py` around lines 72 - 76, Update the
docstring for the method in backend/app/services/llm/providers/base.py to
reflect that resolved_input can be either a plain string or a list of multimodal
ContentPart objects (signature: resolved_input: str | list[ContentPart]), and
describe what a ContentPart represents (e.g., text, image, audio chunk, metadata
or file path) and how the provider should handle each part; keep the other
params (completion_config, query, include_provider_raw_response) descriptions
intact but clarify that when resolved_input is a list, the implementation should
iterate content parts and pass them to the provider API as appropriate.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@backend/app/tests/services/llm/test_multimodal.py`:
- Around line 37-39: Update the test_empty_parts_raises unit test to assert the
specific pydantic.ValidationError instead of a broad Exception: change the
pytest.raises target from Exception to pydantic.ValidationError when
instantiating MultiModalInput(parts=[]). Locate the test method
test_empty_parts_raises and import or reference pydantic.ValidationError so the
test fails only on validation errors raised by the MultiModalInput
model_validator.

---

Nitpick comments:
In `@backend/app/models/llm/request.py`:
- Line 2: The import currently brings in typing.List which is deprecated and
unused; update the import in backend.app.models.llm.request by removing List
from the from typing import ... list so it only imports Annotated, Any, Literal,
Union (or use built-in list annotations where present), and run a quick search
for any remaining references to typing.List in this module to ensure no other
changes are needed.
- Around line 124-128: Update the QueryInput typing to use Python 3.11+ pipe
union syntax instead of Union[...]: replace the Annotated[Union[TextInput,
AudioInput, ImageInput, PDFInput], Field(discriminator="type")] expression for
QueryInput with Annotated[TextInput | AudioInput | ImageInput | PDFInput,
Field(discriminator="type")] so the discriminated union uses the X | Y style
while keeping the Annotated and Field(discriminator="type") wrapper intact;
ensure the symbol QueryInput is updated accordingly and imports remain valid.
- Around line 59-63: Replace the Union[...] declaration for KaapiLLMParams with
the modern pipe union syntax: change the type alias KaapiLLMParams =
Union[TextLLMParams, STTLLMParams, TTSLLMParams] to use KaapiLLMParams =
TextLLMParams | STTLLMParams | TTSLLMParams so it matches the file's Python 3.11
style (consistent with other annotations like str | None) and update any
imports/comments if necessary.
- Around line 87-91: The inline comment "# keeping the mime_type" is incomplete
for the mime_type Field(s) in the LLM request model; update or remove those
comments so they are meaningful: replace the truncated comment with a short,
clear description of why mime_type exists (e.g., "MIME type of the image to
preserve content type for downstream processing/serialization") or simply delete
the comment if redundant, and ensure you update both occurrences associated with
the mime_type Field definitions so the intent is clear when inspecting the Field
declarations.

In `@backend/app/services/llm/providers/base.py`:
- Around line 72-76: Update the docstring for the method in
backend/app/services/llm/providers/base.py to reflect that resolved_input can be
either a plain string or a list of multimodal ContentPart objects (signature:
resolved_input: str | list[ContentPart]), and describe what a ContentPart
represents (e.g., text, image, audio chunk, metadata or file path) and how the
provider should handle each part; keep the other params (completion_config,
query, include_provider_raw_response) descriptions intact but clarify that when
resolved_input is a list, the implementation should iterate content parts and
pass them to the provider API as appropriate.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 335e59b and 56c7a44.

📒 Files selected for processing (4)
  • backend/app/models/llm/request.py
  • backend/app/services/llm/jobs.py
  • backend/app/services/llm/providers/base.py
  • backend/app/tests/services/llm/test_multimodal.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/app/services/llm/jobs.py

@vprashrex vprashrex requested a review from kartpop February 27, 2026 05:27
)
return llm_response, None

def _execute_text(
Collaborator

let's check if _execute_text, _execute_pdf, _execute_image can be combined

Collaborator

@kartpop kartpop left a comment

approved with comments - please see if simplifications can be made

@Prajna1999 Prajna1999 changed the title Feature/multimodal integration Multimodal: Providers Extended to handle image/pdf/multimodal flows Feb 28, 2026
…CompletionConfig and related methods in GoogleAIProvider
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/crud/llm.py (1)

21-43: ⚠️ Potential issue | 🟡 Minor

serialize_input does not handle new input types.

The serialize_input function handles TextInput and AudioInput explicitly but falls back to str(query_input) for the new ImageInput, PDFInput, and list types. This would serialize them as their string representation, losing structured metadata that should be stored (like mime_type, size_bytes, etc.).

Consider extending serialize_input to handle these new types consistently with AudioInput:

🔧 Proposed fix
     elif isinstance(query_input, AudioInput):
         return json.dumps(
             {
                 "type": "audio",
                 "format": query_input.content.format,
                 "mime_type": query_input.content.mime_type,
                 # approximate byte size from b64encoded value
                 "size_bytes": len(query_input.content.value) * 3 // 4,
             }
         )
+    elif isinstance(query_input, ImageInput):
+        contents = query_input.content if isinstance(query_input.content, list) else [query_input.content]
+        return json.dumps(
+            {
+                "type": "image",
+                "items": [
+                    {
+                        "format": c.format,
+                        "mime_type": c.mime_type,
+                        "size_bytes": len(c.value) * 3 // 4 if c.format == "base64" else None,
+                    }
+                    for c in contents
+                ],
+            }
+        )
+    elif isinstance(query_input, PDFInput):
+        contents = query_input.content if isinstance(query_input.content, list) else [query_input.content]
+        return json.dumps(
+            {
+                "type": "pdf",
+                "items": [
+                    {
+                        "format": c.format,
+                        "mime_type": c.mime_type,
+                        "size_bytes": len(c.value) * 3 // 4 if c.format == "base64" else None,
+                    }
+                    for c in contents
+                ],
+            }
+        )
+    elif isinstance(query_input, list):
+        return json.dumps({"type": "multimodal", "count": len(query_input)})
     else:
         return str(query_input)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/crud/llm.py` around lines 21 - 43, serialize_input currently only
handles TextInput and AudioInput and falls back to str(...) for ImageInput,
PDFInput and list inputs; update serialize_input to explicitly handle ImageInput
and PDFInput by returning a JSON object like the AudioInput branch containing
type ("image" or "pdf"), format, mime_type and an approximate size_bytes
(compute from base64 length same as audio), and handle list inputs (e.g.,
List[QueryInput] or similar) by serializing each element via serialize_input and
returning a JSON array string; update branches to reference the concrete types
ImageInput and PDFInput and ensure you reuse the same size_bytes calculation
used for AudioInput and call serialize_input recursively for list elements to
preserve structured metadata.
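The size_bytes approximation the fix reuses comes from base64's 3-bytes-to-4-characters ratio; a quick check of how close the estimate gets:

```python
import base64

raw = b"\x00" * 1000
encoded = base64.b64encode(raw).decode()

# Base64 packs 3 raw bytes into 4 characters, so multiplying the encoded
# length by 3 // 4 roughly inverts it; '=' padding makes the estimate off
# by at most a couple of bytes.
approx_size = len(encoded) * 3 // 4
```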
🧹 Nitpick comments (4)
backend/app/services/llm/providers/gai.py (2)

427-429: Remove unnecessary f-string prefix.

The log message has no placeholders, so the f prefix is unnecessary.

🧹 Proposed fix
         else:
             logger.warning(
-                f"[GoogleAIProvider._execute_text] Response missing usage_metadata, using zeros"
+                "[GoogleAIProvider._execute_text] Response missing usage_metadata, using zeros"
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/gai.py` around lines 427 - 429, In
GoogleAIProvider._execute_text replace the unnecessary f-string on the
logger.warning call by removing the leading "f" so the static message is a plain
string; update the logger.warning invocation that currently logs
"[GoogleAIProvider._execute_text] Response missing usage_metadata, using zeros"
to use a normal string literal without interpolation.

457-463: The query parameter is unused.

The query: QueryParams parameter was added to the execute method signature but is not used in the method body. If it's needed for API consistency with other providers, consider adding an underscore prefix (_query) to indicate it's intentionally unused, or remove it if not needed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/gai.py` around lines 457 - 463, The
execute method signature for execute(self, completion_config:
NativeCompletionConfig, query: QueryParams, ...) declares query but never uses
it; rename the parameter to _query (i.e., execute(self, completion_config:
NativeCompletionConfig, _query: QueryParams, ...)) to signal it's intentionally
unused (or remove it if your provider interface doesn't require it), and update
any overriding implementations or callers to match the new name so type hints
(QueryParams) remain intact and no unused-variable linter warnings occur; keep
the rest of the signature and behavior unchanged.
backend/app/models/llm/request.py (2)

2-2: Remove unused List import.

List from typing is imported but not used in the file. The code correctly uses the built-in list[...] syntax throughout.

🧹 Proposed fix
-from typing import Annotated, Any, List, Literal, Union
+from typing import Annotated, Any, Literal, Union
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/models/llm/request.py` at line 2, Remove the unused typing import
"List" from the import statement that currently reads "from typing import
Annotated, Any, List, Literal, Union" in request.py; keep the other imports
(Annotated, Any, Literal, Union) intact so the file uses the built-in list[...]
syntax without the unused List symbol.

59-63: Consider using | union syntax for modern Python.

The Union[] syntax can be replaced with the more concise | operator in Python 3.10+.

🧹 Proposed fix
-KaapiLLMParams = Union[
-    TextLLMParams,
-    STTLLMParams,
-    TTSLLMParams,
-]
+KaapiLLMParams = TextLLMParams | STTLLMParams | TTSLLMParams
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/models/llm/request.py` around lines 59 - 63, Replace the legacy
typing.Union usage with Python 3.10+ union operator for KaapiLLMParams: change
the definition using Union[TextLLMParams, STTLLMParams, TTSLLMParams] to use the
| operator (TextLLMParams | STTLLMParams | TTSLLMParams) and remove the
now-unneeded Union import if present; ensure the identifier KaapiLLMParams and
the referenced types TextLLMParams, STTLLMParams, TTSLLMParams are updated
accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/services/llm/providers/gai.py`:
- Around line 415-440: The _execute_text path must validate the GoogleAI
response like _execute_stt does: check that response.response_id is present and
response.text is not None before constructing TextContent/LLMResponse. Add a
guard in _execute_text that logs an error (or raises/returns a failed
LLMCallResponse) when response.response_id or response.text is missing,
mirroring the checks in _execute_stt, and only build
LLMCallResponse/LLMResponse/TextContent when those fields are valid to avoid
passing None into TextContent(value=...).
- Around line 403-405: In _execute_text, remove the contents.append call that
inserts {"role": "system", "parts": [{"text": instructions}]} and instead set
generation_kwargs["system_instruction"] = instructions so the system instruction
is passed via GenerateContentConfig; update the code paths that build
generation_kwargs and keep the rest of contents for user/assistant messages
unchanged to match the Gemini API behavior implemented in _execute_tts.

---

Outside diff comments:
In `@backend/app/crud/llm.py`:
- Around line 21-43: serialize_input currently only handles TextInput and
AudioInput and falls back to str(...) for ImageInput, PDFInput and list inputs;
update serialize_input to explicitly handle ImageInput and PDFInput by returning
a JSON object like the AudioInput branch containing type ("image" or "pdf"),
format, mime_type and an approximate size_bytes (compute from base64 length same
as audio), and handle list inputs (e.g., List[QueryInput] or similar) by
serializing each element via serialize_input and returning a JSON array string;
update branches to reference the concrete types ImageInput and PDFInput and
ensure you reuse the same size_bytes calculation used for AudioInput and call
serialize_input recursively for list elements to preserve structured metadata.

---

Nitpick comments:
In `@backend/app/models/llm/request.py`:
- Line 2: Remove the unused typing import "List" from the import statement that
currently reads "from typing import Annotated, Any, List, Literal, Union" in
request.py; keep the other imports (Annotated, Any, Literal, Union) intact so
the file uses the built-in list[...] syntax without the unused List symbol.
- Around line 59-63: Replace the legacy typing.Union usage with Python 3.10+
union operator for KaapiLLMParams: change the definition using
Union[TextLLMParams, STTLLMParams, TTSLLMParams] to use the | operator
(TextLLMParams | STTLLMParams | TTSLLMParams) and remove the now-unneeded Union
import if present; ensure the identifier KaapiLLMParams and the referenced types
TextLLMParams, STTLLMParams, TTSLLMParams are updated accordingly.

In `@backend/app/services/llm/providers/gai.py`:
- Around line 427-429: In GoogleAIProvider._execute_text replace the unnecessary
f-string on the logger.warning call by removing the leading "f" so the static
message is a plain string; update the logger.warning invocation that currently
logs "[GoogleAIProvider._execute_text] Response missing usage_metadata, using
zeros" to use a normal string literal without interpolation.
- Around line 457-463: The execute method signature for execute(self,
completion_config: NativeCompletionConfig, query: QueryParams, ...) declares
query but never uses it; rename the parameter to _query (i.e., execute(self,
completion_config: NativeCompletionConfig, _query: QueryParams, ...)) to signal
it's intentionally unused (or remove it if your provider interface doesn't
require it), and update any overriding implementations or callers to match the
new name so type hints (QueryParams) remain intact and no unused-variable linter
warnings occur; keep the rest of the signature and behavior unchanged.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 56c7a44 and 2c76a4a.

📒 Files selected for processing (3)
  • backend/app/crud/llm.py
  • backend/app/models/llm/request.py
  • backend/app/services/llm/providers/gai.py

Comment on lines +403 to +405
generation_kwargs = {}
if instructions:
    contents.append({"role": "system", "parts": [{"text": instructions}]})
@coderabbitai coderabbitai bot Feb 28, 2026

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

Google Gemini API system_instruction parameter role system content

💡 Result:

In the Gemini API, you generally don’t put a {role: "system"} message inside contents (OpenAI-style). Instead, you pass your “system prompt” via the separate system_instruction / systemInstruction field (part of GenerateContentConfig in the SDKs, or top-level in REST). (ai.google.dev)

REST shape

{
  "system_instruction": {
    "parts": [{ "text": "You are a cat. Your name is Neko." }]
  },
  "contents": [
    { "parts": [{ "text": "Hello there" }] }
  ]
}

(ai.google.dev)

SDK notes (roles)

  • In the Gen AI SDKs, you typically set config.systemInstruction (JS) / config.system_instruction (Python) to a string. (ai.google.dev)
  • If you construct a Content object for the system instruction (e.g., Go), the docs’ sample uses role = user for that content (not "system"). (ai.google.dev)
  • Many integrations that accept OpenAI-style {role:"system"} messages simply extract that message and map it to Gemini’s system_instruction. (reference-server.pipecat.ai)

Citations:


🏁 Script executed:

cd backend/app/services/llm/providers && sed -n '400,410p' gai.py && echo "---" && sed -n '150,160p' gai.py && echo "---" && sed -n '270,280p' gai.py

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 1180


🏁 Script executed:

cd backend/app/services/llm/providers && sed -n '380,430p' gai.py | head -60

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 2285


🏁 Script executed:

cd backend/app/services/llm/providers && sed -n '380,440p' gai.py

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 2696


🏁 Script executed:

cd backend/app/services/llm/providers && grep -n "system_instruction" gai.py | head -20

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 143


🏁 Script executed:

cd backend/app/services/llm/providers && sed -n '380,395p' gai.py && echo "---" && grep -n "logger\." gai.py | grep "_execute_text"

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 836


Move system instruction to GenerateContentConfig parameter instead of contents array.

The Gemini API does not support role: "system" in the contents array. According to the Gemini API documentation, system instructions must be passed via the system_instruction parameter in GenerateContentConfig, not as a content message. This is already correctly implemented in _execute_tts (line 275) but not in _execute_text (line 405).

Update line 405 to pass instructions through config instead:

generation_kwargs["system_instruction"] = instructions

Remove the incorrect contents.append() line that adds role: "system".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/gai.py` around lines 403 - 405, In
_execute_text, remove the contents.append call that inserts {"role": "system",
"parts": [{"text": instructions}]} and instead set
generation_kwargs["system_instruction"] = instructions so the system instruction
is passed via GenerateContentConfig; update the code paths that build
generation_kwargs and keep the rest of contents for user/assistant messages
unchanged to match the Gemini API behavior implemented in _execute_tts.

Collaborator Author

as per the gemini docs they do support the "role": "system"


🧩 Analysis chain

🌐 Web query:

Gemini API contents array role system support 2026

💡 Result:

As of March 1, 2026, the Gemini API contents array does not support a "system" role. In REST, each contents[] item is a conversation turn and the documented roles are "user" and "model" (they should typically alternate for multi‑turn chat). [1]

To provide “system” guidance, you set it outside the contents array using system_instruction (REST) / systemInstruction (SDK config). [2] In Vertex AI’s Gemini interface, the role inside systemInstruction is ignored. [3]

REST example (system instruction + user message):

{
  "system_instruction": {
    "parts": [{ "text": "You are a helpful assistant." }]
  },
  "contents": [
    { "role": "user", "parts": [{ "text": "Hello!" }] }
  ]
}

[2]

Sources: [1] [2] [3]


@vprashrex, I've verified the latest Gemini API documentation, and as of March 2026, the Gemini API contents array does not support a "system" role—the documented roles are "user" and "model".

System guidance must be set outside the contents array using the system_instruction parameter in the REST API or systemInstruction in SDK config.

Correct approach (matching _execute_tts at line 275):

generation_kwargs = {}
if instructions:
    generation_kwargs["system_instruction"] = instructions

if temperature is not None:
    generation_kwargs["temperature"] = temperature

if thinking_level is not None:
    generation_kwargs["thinking_config"] = ThinkingConfig(
        include_thoughts=False, thinking_level=thinking_level
    )

Then remove line 405 that incorrectly appends {"role": "system"} to the contents array.

CodeRabbit's original finding is correct—the current implementation at line 405 uses an unsupported pattern that could cause API errors.


🧠 Learnings used
Learnt from: vprashrex
Repo: ProjectTech4DevAI/kaapi-backend PR: 628
File: backend/app/services/llm/providers/oai.py:49-61
Timestamp: 2026-02-26T03:16:20.731Z
Learning: Enforce that mime_type is always defaulted in resolve_image_content and resolve_pdf_content (in backend/app/utils.py) before content reaches provider implementations. Provider modules under backend/app/services/llm/providers/ (e.g., oai.py) can assume mime_type is non-None and should not rely on runtime None checks for mime_type. This reduces null-related errors and defines a clear contract between utils and providers. Consider adding explicit type hints or runtime guards if downstream changes could reintroduce None mime_type.

Comment on lines +415 to +440
        response = self.client.models.generate_content(
            model=model,
            contents=contents,
            config=GenerateContentConfig(**generation_kwargs),
        )

        if response.usage_metadata:
            input_tokens = response.usage_metadata.prompt_token_count or 0
            output_tokens = response.usage_metadata.candidates_token_count or 0
            total_tokens = response.usage_metadata.total_token_count or 0
            reasoning_tokens = response.usage_metadata.thoughts_token_count or 0
        else:
            logger.warning(
                f"[GoogleAIProvider._execute_text] Response missing usage_metadata, using zeros"
            )
            input_tokens = 0
            output_tokens = 0
            total_tokens = 0
            reasoning_tokens = 0

        llm_response = LLMCallResponse(
            response=LLMResponse(
                provider_response_id=response.response_id,
                model=response.model_version or model,
                provider=completion_config.provider,
                output=TextOutput(content=TextContent(value=response.text)),

⚠️ Potential issue | 🟡 Minor

Add response validation for consistency with other execute methods.

Unlike _execute_stt (lines 170-174), this method does not validate that response.response_id and response.text are present before using them. If response.text is None, creating TextContent(value=response.text) may fail or cause unexpected behavior.

🛡️ Proposed fix
         response = self.client.models.generate_content(
             model=model,
             contents=contents,
             config=GenerateContentConfig(**generation_kwargs),
         )

+        if not response.response_id:
+            return None, "Google AI response missing response_id"
+
+        if not response.text:
+            return None, "Google AI response missing text content"
+
         if response.usage_metadata:
🧰 Tools
🪛 Ruff (0.15.2)

[error] 428-428: f-string without any placeholders

Remove extraneous f prefix

(F541)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/gai.py` around lines 415 - 440, The
_execute_text path must validate the GoogleAI response like _execute_stt does:
check that response.response_id is present and response.text is not None before
constructing TextContent/LLMResponse. Add a guard in _execute_text that logs an
error (or raises/returns a failed LLMCallResponse) when response.response_id or
response.text is missing, mirroring the checks in _execute_stt, and only build
LLMCallResponse/LLMResponse/TextContent when those fields are valid to avoid
passing None into TextContent(value=...).
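The guard suggested above can be isolated as a small validation helper. This is a hedged sketch under stated assumptions: `validate_text_response` and `FakeResponse` are hypothetical names, and `response` is any object exposing `response_id` and `text` attributes like the SDK's response object.

```python
from typing import Any, Optional, Tuple


def validate_text_response(response: Any) -> Tuple[Optional[Any], Optional[str]]:
    """Return (response, None) on success or (None, error) on failure.

    Mirrors the proposed fix: check response_id and text before
    building TextContent, instead of passing None downstream.
    """
    if not getattr(response, "response_id", None):
        return None, "Google AI response missing response_id"
    if not getattr(response, "text", None):
        return None, "Google AI response missing text content"
    return response, None


class FakeResponse:
    # Minimal stand-in for the SDK response in this sketch.
    response_id = "r1"
    text = ""  # empty text should be rejected


_, err = validate_text_response(FakeResponse())
print(err)
# → Google AI response missing text content
```

A guard that returns an error tuple keeps the happy path flat and matches the `(None, "…")` error convention the `_execute_stt` comparison implies.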

…ng parameter handling in Google params mapping tests
@codecov
codecov bot commented Mar 1, 2026

Codecov Report

❌ Patch coverage is 97.97297% with 9 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
backend/app/crud/llm.py 40.00% 6 Missing ⚠️
backend/app/services/llm/providers/gai.py 94.54% 3 Missing ⚠️


…ig and update assertions to validate system_instruction in Google AI execution routing
@vprashrex vprashrex added enhancement New feature or request ready-for-review labels Mar 2, 2026
@vprashrex vprashrex moved this to In Review in Kaapi-dev Mar 2, 2026
@vprashrex vprashrex merged commit 2dc5949 into main Mar 2, 2026
3 checks passed
@vprashrex vprashrex deleted the feature/multimodal-integration branch March 2, 2026 08:20
@github-project-automation github-project-automation bot moved this from In Review to Closed in Kaapi-dev Mar 2, 2026
Labels

enhancement New feature or request ready-for-review

Projects

Status: Closed

Development

Successfully merging this pull request may close these issues.

Multimodal: Add Support for Image, PDF & Mixed Content for AI Assessment

3 participants