
Fix: deepseek nvidia multimodal#24857

Closed
ezzy1630 wants to merge 3 commits into anomalyco:dev from ezzy1630:fix/deepseek-nvidia-multimodal

Conversation

@ezzy1630

Issue for this PR

Closes #

Type of change

  • Bug fix
  • New feature
  • Refactor / code improvement
  • Documentation

What does this PR do?

Please provide a description of the issue, the changes you made to fix it, and why they work. You are expected to understand why your changes work; if you do not, say so explicitly so a maintainer knows how much weight to give the PR.

When a user attaches images to a message and the selected model is text-only (e.g. DeepSeek V4 Pro on NVIDIA NIM), unsupportedParts() correctly replaces each image with an error-text part. The problem is that the user message then has two text parts: the original text and the "ERROR: Cannot read..." replacement. Because there's more than one part, the @ai-sdk/openai-compatible SDK serialises content as a JSON array instead of a plain string. NVIDIA's Python backend does a str.join() over the content and throws: Internal server error: sequence item 4: expected str instance, list found.

The fix adds mergeTextParts() in transform.ts, which runs after unsupportedParts(). For text-only models (no image/audio/video/pdf input capability), it collapses any all-text content array into a single text part so the SDK emits "content": "..." (scalar) instead of an array. Multimodal models are untouched.

This PR also adds DeepSeek V4 Flash and V4 Pro to the NVIDIA section of the test fixture; both were released 2026-04-24 and were previously missing.
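A minimal sketch of what mergeTextParts() might look like (the type shapes, the capability flags, and the newline join are assumptions for illustration, not the repository's actual code):

```typescript
type TextPart = { type: "text"; text: string }
type MediaPart = { type: "image" | "audio" | "video" | "pdf"; url: string }
type Part = TextPart | MediaPart

// Capability flags as assumed here; the real model metadata may differ.
interface ModelInput {
  image?: boolean
  audio?: boolean
  video?: boolean
  pdf?: boolean
}

// A model is text-only when it advertises no media input capability.
function isTextOnly(input: ModelInput): boolean {
  return !input.image && !input.audio && !input.video && !input.pdf
}

// Collapse an all-text content array into a single text part so the SDK
// serialises `content` as a plain string rather than a JSON array.
function mergeTextParts(parts: Part[], input: ModelInput): Part[] {
  if (!isTextOnly(input) || parts.length <= 1) return parts
  const texts: string[] = []
  for (const part of parts) {
    if (part.type !== "text") return parts // mixed content: leave as-is
    texts.push(part.text)
  }
  return [{ type: "text", text: texts.join("\n") }]
}
```

For a multimodal model (any media flag set) the array passes through untouched, so image parts still reach providers that support them.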

If you paste a large clearly AI generated description here your PR may be IGNORED or CLOSED!

How did you verify your code works?

Added two unit tests in test/provider/transform.test.ts — one confirming the merge happens for text-only models with an attached image, one confirming multimodal models are unaffected. All 141 existing transform tests pass and the full typecheck passes.
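The two described tests could look roughly like this self-contained sketch (the real tests live in test/provider/transform.test.ts and use the project's own helpers and fixtures; the inline mergeTextParts here is a minimal stand-in, and all names are assumptions):

```typescript
type Part = { type: "text"; text: string } | { type: "image"; url: string }

// Minimal stand-in for the function under test; the real implementation
// lives in transform.ts and consults model capability metadata.
function mergeTextParts(parts: Part[], textOnly: boolean): Part[] {
  if (!textOnly || parts.length <= 1) return parts
  const texts: string[] = []
  for (const part of parts) {
    if (part.type !== "text") return parts // any non-text part: leave untouched
    texts.push(part.text)
  }
  return [{ type: "text", text: texts.join("\n") }]
}

// Test 1: text-only model. The image was already replaced with an
// error-text part upstream, so the two text parts collapse into one.
const merged = mergeTextParts(
  [
    { type: "text", text: "describe this image" },
    { type: "text", text: "ERROR: Cannot read image" },
  ],
  true,
)
console.assert(merged.length === 1, "text-only content should be a single part")

// Test 2: multimodal model. Content must pass through unchanged.
const untouched = mergeTextParts(
  [
    { type: "text", text: "describe this image" },
    { type: "image", url: "https://example.com/a.png" },
  ],
  false,
)
console.assert(untouched.length === 2, "multimodal content should be unchanged")
```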

Screenshots / recordings

N/A — no UI changes.

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

If you do not follow this template your PR will be automatically rejected.

…backends

When a user attaches images to a message sent to a text-only model (e.g.
DeepSeek V4 Pro on NVIDIA NIM), unsupportedParts() converts each image to
an error-text part. The resulting user message content is then an array of
two or more text objects, which the @ai-sdk/openai-compatible SDK serialises
as a JSON array. NVIDIA's Python backend passes that array through a str.join()
call and raises "sequence item N: expected str instance, list found".

Fix: after unsupportedParts runs, mergeTextParts() collapses all-text-part
arrays into a single text part for text-only models. The SDK then emits
"content": "..." (scalar) instead of the array form, which every backend
handles correctly.

Also adds DeepSeek V4 Flash and V4 Pro to the NVIDIA section of the test
fixture so the models are discoverable in unit tests.
@github-actions bot added the needs:compliance label ("This means the issue will auto-close after 2 hours.") Apr 28, 2026
@github-actions
Contributor

This PR doesn't fully meet our contributing guidelines and PR template.

What needs to be fixed:

  • No issue referenced. Please add Closes #<number> linking to the relevant issue.

Please edit this PR description to address the above within 2 hours, or it will be automatically closed.

If you believe this was flagged incorrectly, please let a maintainer know.

@github-actions
Contributor

Hey! Your PR title "Fix/deepseek nvidia multimodal" doesn't follow conventional commit format.

Please update it to start with one of:

  • feat: or feat(scope): new feature
  • fix: or fix(scope): bug fix
  • docs: or docs(scope): documentation changes
  • chore: or chore(scope): maintenance tasks
  • refactor: or refactor(scope): code refactoring
  • test: or test(scope): adding or updating tests

Where scope is the package name (e.g., app, desktop, opencode).

See CONTRIBUTING.md for details.

@github-actions
Contributor

The following comment was made by an LLM, it may be inaccurate:

Based on my search results, I found one potentially related PR:

Related PR:

This PR is related to the same DeepSeek V4 models on NVIDIA NIM, though it appears to address a different aspect (chat_template_kwargs injection). It's worth reviewing to ensure these changes don't conflict or duplicate efforts around DeepSeek V4 NVIDIA support.

However, PR #24857 (the current PR) appears to be the only one specifically addressing the text-part merging issue for text-only models with attached images on NVIDIA NIM.

No direct duplicate PRs found for the core fix (mergeTextParts for content serialization).

@ezzy1630 ezzy1630 changed the title Fix/deepseek nvidia multimodal Fix: deepseek nvidia multimodal Apr 28, 2026
@github-actions
Contributor

This pull request has been automatically closed because it was not updated to meet our contributing guidelines within the 2-hour window.

Feel free to open a new pull request that follows our guidelines.

@github-actions bot removed the needs:compliance label Apr 29, 2026
@github-actions github-actions Bot closed this Apr 29, 2026