Skip to content

Provider Grok

hovhannes@picsart.com edited this page Jun 19, 2026 · 4 revisions

Grok

Modes: video · image · audio · Models: 7

Vendor: xAI · Official API docs: docs.x.ai

Grok Imagine (by xAI) is a multi-mode family on a single API: a fast image-to-video model with start-frame and native audio, image generation in standard and a higher-fidelity Quality tier, and the Grok TTS voice model. Video covers text-to-video, image-to-video, edit, and extend; image generation is synchronous while video is async.

Models

id Name Input type
grok-imagine-video Grok t2v
grok-edit-video Grok Edit Video v2v
grok-extend-video Grok Extend Video v2v
grok-imagine-image Grok Imagine t2i
grok-imagine-image-quality Grok Imagine Quality t2i
grok-tts Grok TTS tts
grok-imagine-video-1.5 Grok Imagine 1.5 i2v

CLI

# text-to-video with native audio
gen-ai generate -m grok-imagine-video \
  -p "a neon hovercar drifting through a rainy cyberpunk alley, cinematic" \
  --ar 16:9 -r 720p -d 6

# image-to-video from a start image
gen-ai generate -m grok-imagine-video -p "slow push-in, drifting fog" -i ./still.jpg

# restyle / continue an existing clip
gen-ai generate -m grok-edit-video -p "anime style repaint" --video ./clip.mp4
gen-ai generate -m grok-extend-video -p "the camera keeps flying forward" --video ./clip.mp4

# high-fidelity image generation
gen-ai generate -m grok-imagine-image-quality \
  -p "a porcelain teapot on a marble counter, soft window light" --ar 1:1 -r 2k -n 4

# text-to-speech
gen-ai generate -m grok-tts -p "Welcome to the Picsart AI Playground." --voice eve

MCP

{ "name": "picsart_generate",
  "arguments": {
    "model": "grok-imagine-video",
    "prompt": "a neon hovercar drifting through a rainy cyberpunk alley",
    "aspectRatio": "16:9",
    "resolution": "720p",
    "duration": 6
  } }
{ "name": "picsart_generate",
  "arguments": {
    "model": "grok-imagine-image-quality",
    "prompt": "a porcelain teapot on a marble counter, soft window light",
    "aspectRatio": "1:1",
    "resolution": "2k",
    "count": 4
  } }
{ "name": "picsart_generate",
  "arguments": {
    "model": "grok-tts",
    "prompt": "Welcome to the Picsart AI Playground.",
    "voiceId": "eve"
  } }

Parameters

Full parameter surface for every model, sourced from gen-ai models info <id> --json. CLI flags show the primary short form; the canonical --kebab-case long form always works too.

grok-imagine-video — Grok

Try grok-imagine-video in Playground ↗

Input type: t2v

Param CLI flag Type Values
prompt -p text required
aspectRatio --ar enum 16:9 · 9:16 · 1:1 · 4:3 · 3:4 · 3:2 · 2:3 (default 16:9)
resolution -r enum 480p · 720p (default 720p)
duration -d enum 3 · 5 · 6 · 8 · 10 · 12 · 15 (default 6)
imageUrls -i file image (up to 1)

grok-edit-video — Grok Edit Video

Try grok-edit-video in Playground ↗

Input type: v2v

Param CLI flag Type Values
prompt -p text required
videoUrl --video file required video

grok-extend-video — Grok Extend Video

Try grok-extend-video in Playground ↗

Input type: v2v

Param CLI flag Type Values
prompt -p text required
duration -d enum 3 · 5 · 6 · 8 · 10 (default 6)
videoUrl --video file required video

grok-imagine-image — Grok Imagine

Try grok-imagine-image in Playground ↗

Input type: t2i

Param CLI flag Type Values
prompt -p text required
aspectRatio --ar enum 1:1 · 16:9 · 9:16 · 4:3 · 3:4 · 3:2 · 2:3 · 2:1 · 1:2 · 19.5:9 · 9:19.5 · 20:9 · 9:20 (default 1:1)
resolution -r enum 1k · 2k (default 1k)
count -n enum 1 · 2 · 4 (default 1)
imageUrls -i file image (up to 1)

grok-imagine-image-quality — Grok Imagine Quality

Try grok-imagine-image-quality in Playground ↗

Input type: t2i

Param CLI flag Type Values
prompt -p text required
aspectRatio --ar enum 1:1 · 16:9 · 9:16 · 4:3 · 3:4 · 3:2 · 2:3 · 2:1 · 1:2 · 19.5:9 · 9:19.5 · 20:9 · 9:20 (default 1:1)
resolution -r enum 1k · 2k (default 2k)
count -n enum 1 · 2 · 4 (default 1)
imageUrls -i file image (up to 1)

grok-tts — Grok TTS

Try grok-tts in Playground ↗

Input type: tts

Param CLI flag Type Values
language --language text free text
accent --accent text free text
prompt -p text required (≤15000 chars)
voiceId --voice enum eve (Eve) · ara (Ara) · rex (Rex) · sal (Sal) · leo (Leo) (default eve)

grok-imagine-video-1.5 — Grok Imagine 1.5

Try grok-imagine-video-1.5 in Playground ↗

Input type: i2v

Param CLI flag Type Values
prompt -p text required
aspectRatio --ar enum 16:9 · 9:16 · 1:1 · 4:3 · 3:4 · 3:2 · 2:3 (default 16:9)
resolution -r enum 480p · 720p (default 720p)
duration -d enum 3 · 5 · 6 · 8 · 10 · 12 · 15 (default 8)
imageUrls -i file required image (up to 1)

Notes: grok-imagine-video also backs image-to-video (pass -i); the edit / extend variants take a --video input and retain the source duration.

Pricing

gen-ai pricing grok-imagine-video -d 6 -r 720p

Video cost scales with duration (priced per second); image cost scales with the Quality vs. standard tier and count; TTS is priced per character.

Clone this wiki locally