
novita: fix Wan 2.7 R2V media item types to match upstream enum #9

Merged
duanbing merged 1 commit into main from novita/wan-r2v-media-types on May 20, 2026
Conversation

@duanbing

Summary

  • R2V was sending {type:"image"|"video"} in each media[] item; Novita's enum is reference_image | reference_video | first_frame. Upstream rejected every R2V request with "failed to exec task".
  • Repack now emits reference_image / reference_video from the legacy flat image_urls+video_urls shape.
  • media added to the R2V allowed-fields whitelist so direct API callers can submit the rich shape (including first_frame and per-item reference_voice) verbatim. The repack block is skipped when media is already present.
  • Synthesised media array truncated at 5 to match Novita's combined-items cap.

Test plan

  • Re-run a Wan 2.7 R2V generation from the playground (legacy image_urls+video_urls flow) and confirm Novita accepts the call (the status moves past "failed to exec task").
  • Curl POST with a rich media: [{type:"first_frame", url}, ...] body and confirm pass-through.

The Wan 2.7 R2V (`/v3/async/wan2.7-r2v`) endpoint requires each item
in the `media` array to carry a `type` value from the enum:
  - `reference_image`
  - `reference_video`
  - `first_frame`
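
As a minimal sketch of the enum above, the three accepted values could be modelled as a Rust type so that the old `image` / `video` strings cannot be produced by mistake. The names `MediaType`, `as_str`, and `parse` are hypothetical and are not taken from the PR's code.

```rust
/// The three `type` values this PR lists for Wan 2.7 R2V media items.
/// `MediaType` is an illustrative name, not the project's actual type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MediaType {
    ReferenceImage,
    ReferenceVideo,
    FirstFrame,
}

impl MediaType {
    /// Wire string placed in the `type` field of a media item.
    fn as_str(self) -> &'static str {
        match self {
            MediaType::ReferenceImage => "reference_image",
            MediaType::ReferenceVideo => "reference_video",
            MediaType::FirstFrame => "first_frame",
        }
    }

    /// Parse a wire string. Anything outside the enum, including the
    /// previously sent `image` and `video`, yields `None`.
    fn parse(s: &str) -> Option<MediaType> {
        match s {
            "reference_image" => Some(MediaType::ReferenceImage),
            "reference_video" => Some(MediaType::ReferenceVideo),
            "first_frame" => Some(MediaType::FirstFrame),
            _ => None,
        }
    }
}
```

With a type like this, `MediaType::parse("image")` returns `None`, which would surface the mismatch locally instead of as an upstream error.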

We were sending `image` and `video`, which Novita rejects with the
generic "failed to exec task" 500. Every R2V submission via the
playground / legacy `image_urls`+`video_urls` shape was failing
for that reason, with nothing in the error pointing at the cause.

Two changes in `build_body`:

1. Repack each `image_urls[]` URL as `{type: "reference_image", url}`
   and each `video_urls[]` URL as `{type: "reference_video", url}`.
   The legacy flat shape has no way to express `first_frame` or
   per-item `reference_voice`; callers who want those use the new
   pass-through path below.

2. Pass `media` through the allowed-fields whitelist for the R2V
   shape so direct API callers / a future media-editor UI can
   submit the rich shape (`[{type, url, reference_voice?}, ...]`)
   verbatim. The `!body.contains_key("media")` guard in the repack
   block ensures the pass-through wins when both shapes are present.
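
The two changes can be sketched roughly as follows. This is not the actual `build_body` code: `Value`, `Body`, `take_urls`, `media_item`, and `repack_legacy_media` are hypothetical stand-ins (the real code presumably works on a JSON library's map type). The sketch also assumes images are emitted before videos and that the legacy keys are dropped after repacking, neither of which is stated in the PR.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a JSON value, to keep the sketch dependency-free.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Str(String),
    List(Vec<Value>),
    Object(Vec<(String, Value)>),
}

type Body = HashMap<String, Value>;

/// Remove a list-valued field and return its string entries (non-strings are ignored).
fn take_urls(body: &mut Body, key: &str) -> Vec<String> {
    match body.remove(key) {
        Some(Value::List(items)) => items
            .into_iter()
            .filter_map(|v| match v {
                Value::Str(s) => Some(s),
                _ => None,
            })
            .collect(),
        _ => Vec::new(),
    }
}

/// Build one `{type, url}` media item.
fn media_item(kind: &str, url: String) -> Value {
    Value::Object(vec![
        ("type".to_string(), Value::Str(kind.to_string())),
        ("url".to_string(), Value::Str(url)),
    ])
}

/// Repack the legacy flat shape into `media`, unless `media` is already present.
fn repack_legacy_media(body: &mut Body) {
    // Guard from the PR description: a caller-supplied `media` wins untouched.
    if body.contains_key("media") {
        return;
    }
    let images = take_urls(body, "image_urls");
    let videos = take_urls(body, "video_urls");
    if images.is_empty() && videos.is_empty() {
        return;
    }
    // Assumed ordering: all images first, then all videos.
    let mut media = Vec::new();
    for url in images {
        media.push(media_item("reference_image", url));
    }
    for url in videos {
        media.push(media_item("reference_video", url));
    }
    body.insert("media".to_string(), Value::List(media));
}
```

Checking `contains_key("media")` before touching the legacy fields keeps the pass-through path free of side effects: a rich `media` array, including `first_frame` items, reaches the upstream request exactly as submitted.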

Also cap the synthesised `media` array at 5 items to match Novita's
documented ceiling (combined images+videos ≤ 5), so users who upload
more get a deterministic truncate-from-front rather than a 422.
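
A minimal sketch of the cap, assuming that "truncate-from-front" means the first five items are kept and later ones dropped, which is what `Vec::truncate` does. The names `MAX_MEDIA_ITEMS` and `cap_media` are hypothetical.

```rust
/// Combined images+videos ceiling cited in the PR description.
const MAX_MEDIA_ITEMS: usize = 5;

/// Keep at most `MAX_MEDIA_ITEMS` entries.
/// Assumption: the earliest entries survive and the overflow is discarded.
fn cap_media<T>(mut media: Vec<T>) -> Vec<T> {
    // `Vec::truncate` is a no-op when the vector is already short enough.
    media.truncate(MAX_MEDIA_ITEMS);
    media
}
```

Because the synthesised array is built in a fixed order, the same upload always yields the same five items.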
@duanbing duanbing merged commit 048539e into main May 20, 2026
6 of 7 checks passed
@duanbing duanbing deleted the novita/wan-r2v-media-types branch May 20, 2026 05:47