
[Usage]: Vllm whisper model response_format verbose_json not working #14818

Open
deepakkumar07-debug opened this issue Mar 14, 2025 · 2 comments
Labels
usage How to use vllm

Comments


deepakkumar07-debug commented Mar 14, 2025

My current environment

I'm building Dockerfile.cpu and added the following installation step at line 44 of the Dockerfile, since I'm using a Whisper model:

# install optional dependencies like librosa
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install librosa && \
    pip install vllm[audio,video]==0.7.3

and I'm serving vLLM with the following docker command:

docker run -d --restart=unless-stopped --name vllm-whisper-api  \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HUGGING_FACE_HUB_TOKEN=<MY_TOKEN>" \
    -p 4001:8000 \
    --ipc=host \
    vllm-cpu-inference \
    --model openai/whisper-small \
    --task transcription \
    --host 0.0.0.0 --port 8000
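
For reference, a minimal sanity check that the container is up and serving the OpenAI-compatible API (assuming the host port mapping 4001 from the command above):

import requests

# List the models the server has loaded; should include openai/whisper-small
# if the container started correctly. Port 4001 is an assumption from the
# docker run command above.
resp = requests.get("http://localhost:4001/v1/models")
print(resp.status_code, resp.json())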

I'm testing the Whisper model with an audio file. With response_format "text" it works, and "json" gives the same output, but with "verbose_json" I get an error.

import requests

with open("audio-samples/audio.wav", "rb") as audio_file:
    response = requests.post("http://localhost:4001/v1/audio/transcriptions",
                             files={"file": audio_file},
                             data={"model": "openai/whisper-small",
                                   "language": "en",
                                   # "response_format": "json",
                                   # "response_format": "text",
                                   # "stream": True
                                   "response_format": "verbose_json",
                                   "timestamp_granularities[]": ["word", "segment"]
                                   # "timestamp_granularities[]": ["segment"]
                                   }
                             )
print("Transcription:", response.text)

Output

Transcription: {"object":"error","message":"Currently only support response_format `text` or `json`","type":"BadRequestError","param":null,"code":400}
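
As a stopgap, a sketch of the same request using the supported "json" response format (same server, model, and audio file assumed as above); this drops the timestamp granularities, which only apply to verbose_json:

import requests

# Workaround sketch: stick to "text" or "json" until verbose_json is supported.
with open("audio-samples/audio.wav", "rb") as audio_file:
    response = requests.post(
        "http://localhost:4001/v1/audio/transcriptions",
        files={"file": audio_file},
        data={
            "model": "openai/whisper-small",
            "language": "en",
            "response_format": "json",  # accepted; verbose_json is rejected with a 400
        },
    )
print("Transcription:", response.json())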
@deepakkumar07-debug deepakkumar07-debug added the usage How to use vllm label Mar 14, 2025
@NickLucche
Contributor

verbose_json output format has not yet been implemented

@deepakkumar07-debug
Author

deepakkumar07-debug commented Mar 14, 2025

I'm going to use this on my server with a GPU. May I know when it will be implemented? With Transformers + the Whisper model I get token-level timestamps, and I'm going to deploy this for production workloads.
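
For comparison, a minimal sketch of getting word-level timestamps with the Hugging Face Transformers pipeline (not vLLM); the model name and audio path are assumed from the earlier snippets:

from transformers import pipeline

# Build an ASR pipeline around the same Whisper checkpoint used above.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

# return_timestamps="word" asks the pipeline for per-word start/end times,
# returned under result["chunks"].
result = asr("audio-samples/audio.wav", return_timestamps="word")
print(result["text"])
print(result["chunks"])  # e.g. [{"text": " Hello", "timestamp": (0.0, 0.4)}, ...]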
