# Qwen3-ASR
Download the Qwen3-ASR-0.6B model and the `csgo.wav` test audio file, transcribe the file locally with the `qwen-asr` package, and then transcribe it again through a local vLLM OpenAI-compatible API server.

## Install Dependencies

Install the `qwen-asr` package, which provides the Qwen3-ASR inference API, together with `flash-attn`. vLLM is installed later, in the section that starts the API server.


In [None]:
!pip install qwen-asr flash-attn
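After installation, you can confirm that the packages are importable before loading any model. This is a minimal sketch; it assumes the import names `qwen_asr` and `flash_attn` for the pip packages `qwen-asr` and `flash-attn`, and the helper name `missing_packages` is hypothetical.

```python
import importlib.util


def missing_packages(module_names):
    """Return the module names that cannot be found in the current environment."""
    # find_spec returns None for a top-level module that is not installed
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names assumed for the packages installed in the cell above
    missing = missing_packages(["qwen_asr", "flash_attn"])
    print("Missing packages:", missing or "none")
```

`find_spec` locates a module without importing it, so this check stays fast even for heavy packages.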

## Download Test Audio File

Download the test audio clip `csgo.wav`, which is transcribed in the rest of this notebook.



In [1]:
# Download the audio file
!wget -O csgo.wav https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR/csgo.wav

--2026-02-18 11:49:21--  https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR/csgo.wav
Resolving qianwen-res.oss-cn-beijing.aliyuncs.com (qianwen-res.oss-cn-beijing.aliyuncs.com)... 8.141.181.139
Connecting to qianwen-res.oss-cn-beijing.aliyuncs.com (qianwen-res.oss-cn-beijing.aliyuncs.com)|8.141.181.139|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1214508 (1.2M) [audio/wav]
Saving to: ‘csgo.wav’


2026-02-18 11:49:23 (1.13 MB/s) - ‘csgo.wav’ saved [1214508/1214508]



In [2]:
from IPython.display import Audio

Audio('csgo.wav')
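Besides listening to the clip, it can help to check its channel count, sample rate, and duration before transcribing. The sketch below uses the standard-library `wave` module and assumes `csgo.wav` is a plain PCM WAV file (`wave` rejects some other WAV encodings); `describe_wav` is a hypothetical helper name.

```python
import os
import wave


def describe_wav(path):
    """Return (channels, sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        frames = wav.getnframes()
    # Duration is the number of frames divided by frames per second
    return channels, rate, frames / rate


if __name__ == "__main__" and os.path.exists("csgo.wav"):
    channels, rate, duration = describe_wav("csgo.wav")
    print(f"{channels} channel(s), {rate} Hz, {duration:.2f} s")
```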

## Download the Model

Download the Qwen/Qwen3-ASR-0.6B model to a local directory named `Qwen3-ASR-0.6B` with the Hugging Face CLI (`hf`).

In [None]:
!hf download Qwen/Qwen3-ASR-0.6B --local-dir Qwen3-ASR-0.6B
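To confirm the download finished, you can inspect the local directory. This sketch assumes the model snapshot contains a `config.json` and one or more `*.safetensors` weight files, which is typical for Hugging Face model repositories but not verified here; `summarize_model_dir` is a hypothetical helper.

```python
from pathlib import Path


def summarize_model_dir(model_dir):
    """Report whether config.json exists and list the *.safetensors files found."""
    root = Path(model_dir)
    has_config = (root / "config.json").is_file()
    # glob on a missing directory simply yields nothing
    weights = sorted(p.name for p in root.glob("*.safetensors"))
    return has_config, weights


if __name__ == "__main__":
    has_config, weights = summarize_model_dir("Qwen3-ASR-0.6B")
    print("config.json present:", has_config)
    print("weight files:", weights or "none found")
```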

## Transcribe

Load the model with the Transformers backend (a commented-out vLLM backend alternative is included), transcribe the audio file, and print the detected language and the transcription. Passing `language=None` lets the model detect the language automatically.



In [None]:
import torch
from qwen_asr import Qwen3ASRModel

# Define the path to the downloaded Qwen3-ASR-0.6B model
model_path = 'Qwen3-ASR-0.6B'

# Alternative: initialize the ASR model with the vLLM backend
# asr_model = Qwen3ASRModel.LLM(
#     model_path,
#     max_inference_batch_size=128
# )

# Initialize the ASR model with the Transformers backend
asr_model = Qwen3ASRModel.from_pretrained(
    model_path,
    dtype=torch.bfloat16,
    device_map="cuda:0",
    max_inference_batch_size=128,
)

print("ASR model loaded successfully.")

# Define the path to the downloaded audio file
audio_file_path = 'csgo.wav'

# Transcribe the audio file; language=None enables automatic language detection
transcription = asr_model.transcribe(
    audio_file_path,
    language=None,
)

# Print the results
print("Transcription successful:")
result = transcription[0]
print(f"Language: {result.language}")
print(f"Text: {result.text}")

Setting `pad_token_id` to `eos_token_id`:151645 for open-end generation.


ASR model loaded successfully.
Transcription successful:
Language: English
Text: A historic moment for both these teams. The history of Kerrigan is one of the greatest in-game leaders. The history of Simple is one of the greatest players. Oh, look at how they're doing it. Robs and Twist, they're pushing out. They're ready to fight. They're actually so deep. They're not ready for it. Bit getting caught at the bottom of the stairs. Imagine making that call in the thirtieth round. How's your hand not shaking when you make that push to take that fight out in the open rain and Kerrigan, the only two people at the B bomb site. They're going to be coming for it. Only twenty seconds left. Navy, they don't have enough time for this. There's a huge defense all of a sudden. The window is closed. Broke is here to shut it down. He wants the overtime.


## Start the vLLM OpenAI API server

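Install a vLLM nightly build with audio support. `nest-asyncio` is used to run the service inside Colab's asynchronous environment, and `pyngrok` is used to map the local server to a public address.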

In [None]:
!pip install -U vllm --pre \
    --extra-index-url https://wheels.vllm.ai/nightly/cu129

!pip install "vllm[audio]" nest-asyncio pyngrok

In [None]:
# Uninstall and reinstall a compatible protobuf version (below 4.0.0)
!pip install "protobuf<4.0.0" --force-reinstall

In [1]:
import os


# Make only GPU 0 visible
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

# Suppress TensorFlow's C++ log output
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
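These settings reach the vLLM server because shell commands launched with `!` inherit the notebook process's environment. The sketch below demonstrates that inheritance with `subprocess`: a variable set through `os.environ` is visible to a child Python process. The helper `child_sees_env` and the variable name used in the usage note are hypothetical.

```python
import os
import subprocess
import sys


def child_sees_env(name, value):
    """Set an environment variable here, then read it back from a child process."""
    os.environ[name] = value  # also updates the real process environment
    result = subprocess.run(
        [sys.executable, "-c", f"import os; print(os.environ.get({name!r}, ''))"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

For example, `child_sees_env("QWEN_ASR_DEMO_VAR", "hello")` should return `"hello"`. This is also why the variables must be set before the server is launched: a process that is already running does not see later changes.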


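Start the server directly with vLLM's command-line entry point. Run it in the background so the notebook cell does not block while the server is running.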

In [1]:
# Allow nested event loops in the notebook
import nest_asyncio
nest_asyncio.apply()

# Stop any running vLLM server, then start a new one with nohup and log all output to vllm.log
!pkill -f vllm
!nohup vllm serve ./Qwen3-ASR-0.6B \
    --trust-remote-code \
    --dtype float16 \
    --gpu-memory-utilization 0.8 \
    --host 0.0.0.0 \
    --port 8000 \
    > vllm.log 2>&1 &


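Alternatively, start the vLLM server with the `qwen-asr-serve` command, which is a wrapper around `vllm serve`. Run only one of the two launch cells, since both bind to port 8000.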

In [10]:
!qwen-asr-serve Qwen3-ASR-0.6B \
    --gpu-memory-utilization 0.8 \
    --host 0.0.0.0 \
    --port 8000 \
    > vllm.log 2>&1 &
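Whichever command you use, the server starts in the background and needs time to load the model, so requests sent immediately may fail. A minimal sketch for waiting is to poll vLLM's `/health` endpoint until it returns HTTP 200; the helper name `wait_for_server` and its default timeout and polling interval are assumptions chosen for illustration.

```python
import http.client
import time
import urllib.request


def wait_for_server(url="http://localhost:8000/health", timeout=300.0, interval=2.0):
    """Poll `url` until it answers HTTP 200 or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, non-2xx status (HTTPError), or a dropped
            # connection: the server is not ready yet, so keep polling
            pass
        time.sleep(interval)
    return False
```

Calling `wait_for_server()` before sending transcription requests returns `True` once the server is ready. If it returns `False`, check `vllm.log` for startup errors.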

## Transcribe Audio via the OpenAI API (Python Client)

Send the `csgo.wav` audio file to the local vLLM OpenAI-compatible chat completions endpoint as a base64-encoded data URL, then parse and print the transcription.

In [None]:
import base64

from openai import OpenAI
from qwen_asr import parse_asr_output

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Use the model name registered by the local server (depends on how it was launched)
model_name = client.models.list().data[0].id

# Read the local audio file and base64-encode it
with open("csgo.wav", "rb") as f:
    audio_base64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model=model_name,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Please transcribe this audio."},
            {"type": "audio_url", "audio_url": {"url": f"data:audio/wav;base64,{audio_base64}"}}
        ]
    }]
)

# Extract and parse the transcription content (the v1 client returns objects, not dicts)
content = response.choices[0].message.content
language, text = parse_asr_output(content)

# Print the results
print("Transcription successful:")
print(f"Language: {language}")
print(f"Text: {text}")
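The OpenAI client call above is a thin wrapper around an HTTP `POST` to `/v1/chat/completions`. As a sketch of that raw request, the hypothetical helpers below build and send a similar payload using only the standard library. Unlike the client cell, this sketch sends only the audio part without a text prompt, and the `model` argument must match the name the server reports (for example via `GET /v1/models`).

```python
import base64
import json
import urllib.request


def build_transcription_request(audio_bytes, model, base_url="http://localhost:8000/v1"):
    """Build (but do not send) a chat-completions POST carrying base64 WAV audio."""
    audio_b64 = base64.b64encode(audio_bytes).decode("utf-8")
    payload = {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [{
                "type": "audio_url",
                "audio_url": {"url": f"data:audio/wav;base64,{audio_b64}"},
            }],
        }],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_transcription_request(request):
    """Send the request and return the raw assistant message content."""
    with urllib.request.urlopen(request) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `send_transcription_request(build_transcription_request(open("csgo.wav", "rb").read(), model_name))` returns the same raw content string that `parse_asr_output` consumes in the cell above.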