
[Usage]: Vllm whisper model response_format verbose_json not working #14818

Open
deepakkumar07-debug opened this issue Mar 14, 2025 · 2 comments
Labels
usage How to use vllm

Comments


deepakkumar07-debug commented Mar 14, 2025

My current environment

I'm building Dockerfile.cpu and added the following installation step at line 44 of the Dockerfile, since I'm using a Whisper model:

# install optional dependencies like librosa
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install librosa && \
    pip install vllm[audio,video]==0.7.3

and I'm serving vLLM with the following docker command:

docker run -d --restart=unless-stopped --name vllm-whisper-api  \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HUGGING_FACE_HUB_TOKEN=<MY_TOKEN>" \
    -p 4001:8000 \
    --ipc=host \
    vllm-cpu-inference \
    --model openai/whisper-small \
    --task transcription \
    --host 0.0.0.0 --port 8000
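
For reference, a minimal sanity check that the container is up and serving the OpenAI-compatible API (assuming the host port mapping 4001 from the command above):

import requests

# List the models the server has loaded; should include openai/whisper-small
# if the container started correctly. Port 4001 is an assumption from the
# docker run command above.
resp = requests.get("http://localhost:4001/v1/models")
print(resp.status_code, resp.json())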

I'm testing the Whisper model with an audio file. With response_format "text" it works, and "json" gives the same output, but with "verbose_json" I get an error.

import requests

with open("audio-samples/audio.wav", "rb") as audio_file:
    response = requests.post("http://localhost:4001/v1/audio/transcriptions",
                             files={"file": audio_file},
                             data={"model": "openai/whisper-small",
                                   "language": "en",
                                   # "response_format": "json",
                                   # "response_format": "text",
                                   # "stream": True
                                   "response_format": "verbose_json",
                                   "timestamp_granularities[]": ["word", "segment"]
                                   # "timestamp_granularities[]": ["segment"]
                                   }
                             )
print("Transcription:", response.text)

Output

Transcription: {"object":"error","message":"Currently only support response_format `text` or `json`","type":"BadRequestError","param":null,"code":400}
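
As a stopgap, a sketch of the same request using the supported "json" response format (same server, model, and audio file assumed as above); this drops the timestamp granularities, which only apply to verbose_json:

import requests

# Workaround sketch: stick to "text" or "json" until verbose_json is supported.
with open("audio-samples/audio.wav", "rb") as audio_file:
    response = requests.post(
        "http://localhost:4001/v1/audio/transcriptions",
        files={"file": audio_file},
        data={
            "model": "openai/whisper-small",
            "language": "en",
            "response_format": "json",  # accepted; verbose_json is rejected with a 400
        },
    )
print("Transcription:", response.json())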
@deepakkumar07-debug deepakkumar07-debug added the usage How to use vllm label Mar 14, 2025
@NickLucche
Contributor

verbose_json output format has not yet been implemented

@deepakkumar07-debug
Author

deepakkumar07-debug commented Mar 14, 2025

I'm going to use this on my server with a GPU. May I know when it will be implemented? With Transformers + the Whisper model I get token-level timestamps, and I'm going to deploy this for production workloads.
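
For comparison, a minimal sketch of getting word-level timestamps with the Hugging Face Transformers pipeline (not vLLM); the model name and audio path are assumed from the earlier snippets:

from transformers import pipeline

# Build an ASR pipeline around the same Whisper checkpoint used above.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

# return_timestamps="word" asks the pipeline for per-word start/end times,
# returned under result["chunks"].
result = asr("audio-samples/audio.wav", return_timestamps="word")
print(result["text"])
print(result["chunks"])  # e.g. [{"text": " Hello", "timestamp": (0.0, 0.4)}, ...]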
