Alibaba Integration
Access 100+ Qwen models through Alibaba Cloud DashScope
Alibaba Cloud integration provides access to Qwen language models via DashScope:
- 25+ Qwen Models - Flagship (Max, Plus, Flash), specialized (Coder, Math, VL, Omni)
- OpenAI-Compatible API - Direct compatibility with OpenAI Python SDK
- Multi-Region Support - Singapore (International) and Beijing (China) endpoints
- Automatic Failover - Integrated into provider failover chain
- Cost-Effective - Competitive pricing on quality models
- Sign up: https://www.alibabacloud.com/
- Navigate to Model Studio: https://dashscope.aliyuncs.com/
- Create or retrieve API key from dashboard
- Copy the key
Add to your .env file:
# Required: Main API key
ALIBABA_CLOUD_API_KEY=your-dashscope-api-key-here
# Optional: Region-specific keys (if you have separate keys)
ALIBABA_CLOUD_API_KEY_INTERNATIONAL=dashscope-intl-key
ALIBABA_CLOUD_API_KEY_CHINA=dashscope-cn-key

# Check logs for successful provider loading
✓ Loaded alibaba_cloud provider client

| Model | Context | Use Case | Pricing, input / output ($/1M tokens) |
|---|---|---|---|
| qwen-flash | 1M | Fast, cost-effective | $0.001 / $0.003 |
| qwen-plus | 1M | Balanced performance | $0.005 / $0.015 |
| qwen-max | 262K | Most powerful | $0.012 / $0.036 |
| qwen-coder | 262K | Code generation | $0.008 / $0.024 |
| qwen-long | 10M | Document processing | $0.001 / $0.003 |

| Model | Context | Specialty |
|---|---|---|
| qwq-plus | 262K | Advanced reasoning for math and code |
| qwq-32b-preview | 262K | 32B parameter reasoning model |

| Model | Context | Features |
|---|---|---|
| qwen-omni | - | Multimodal (text, image, audio, video) |
| qwen-vl | - | Vision and language understanding |
| qwen-math | - | Mathematics problem-solving |
| qwen-mt | - | Translation (92 languages) |

- qwen-3-30b-a3b-instruct - 30B instruction model
- qwen-3-80b-a3b-instruct - 80B instruction model
- qwen-3-30b-a3b-thinking - 30B with thinking mode
- qwen-3-80b-a3b-thinking - 80B with thinking mode

- qwen-2.5-72b-instruct - Enhanced 72B model
- qwen-2.5-7b-instruct - Efficient 7B model

- qwen-2-72b-instruct - Stable 72B baseline
- qwen-2-7b-instruct - Stable 7B baseline

- qwen-1.5-72b-chat - Legacy 72B
- qwen-1.5-14b-chat - Legacy 14B

import requests

url = "http://localhost:8000/v1/chat/completions"
headers = {"Authorization": "Bearer YOUR_GATEWAYZ_API_KEY"}
payload = {
    "model": "qwen-plus",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Explain quantum computing"}
    ],
    "temperature": 0.7,
    "max_tokens": 1000
}
response = requests.post(url, headers=headers, json=payload)
print(response.json())

from openai import OpenAI
# The gateway accepts both URL formats; /v1 is included here because
# the OpenAI SDK appends only the endpoint path, not /v1
client = OpenAI(
    api_key="YOUR_GATEWAYZ_API_KEY",
    base_url="http://localhost:8000/v1"
)
response = client.chat.completions.create(
    model="qwen-plus",
    messages=[
        {"role": "user", "content": "Write a Python function to sort a list"}
    ]
)
print(response.choices[0].message.content)

response = client.chat.completions.create(
    model="qwen-flash",
    messages=[{"role": "user", "content": "Write a haiku about AI"}],
    stream=True
)
for chunk in response:
    # Some chunks may carry no choices or an empty delta; skip those
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

# Using org/model format
payload = {
    "model": "qwen/qwen-max",  # Automatically routes to Alibaba Cloud
    "messages": [{"role": "user", "content": "Hello"}]
}

response = client.chat.completions.create(
    model="qwen-coder",
    messages=[
        {
            "role": "user",
            "content": "Write a FastAPI endpoint for user registration with email validation"
        }
    ],
    temperature=0.3  # Lower temperature for code generation
)

response = client.chat.completions.create(
    model="qwq-plus",
    messages=[
        {
            "role": "user",
            "content": "Solve: If x^2 + 5x + 6 = 0, what are the values of x?"
        }
    ]
)

The gateway automatically detects Qwen models based on:
- Pattern Matching: Models starting with qwen/ or alibaba-cloud/
- Model Name Mapping: Direct lookups in transformation table
- Failover Support: Falls back to alternative providers if needed

User Input → Provider → DashScope Model ID
qwen-plus → alibaba-cloud → qwen-plus
qwen/qwen-max → alibaba-cloud → qwen-max
alibaba-cloud/qwen-coder → alibaba-cloud → qwen-coder
qwen-3-30b → alibaba-cloud → qwen-3-30b-a3b-instruct
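The routing rules above can be sketched roughly as follows. This is a minimal illustration, not the code in model_transformations.py: the names `resolve_model`, `MODEL_ALIASES` and `PROVIDER_PREFIXES` are hypothetical, only the `qwen-3-30b` alias comes from the mapping shown above, and treating any bare name that starts with `qwen` or `qwq` as an Alibaba Cloud model is an assumption.

```python
from typing import Optional, Tuple

PROVIDER = "alibaba-cloud"

# Hypothetical alias table; only this row appears in the mapping above.
MODEL_ALIASES = {
    "qwen-3-30b": "qwen-3-30b-a3b-instruct",
}

# Prefixes that mark a request as belonging to Alibaba Cloud.
PROVIDER_PREFIXES = ("qwen/", "alibaba-cloud/")


def resolve_model(user_input: str) -> Optional[Tuple[str, str]]:
    """Return (provider, dashscope_model_id), or None for non-Qwen models."""
    model = user_input.strip().lower()

    # 1. Pattern matching: strip a recognised org/provider prefix.
    for prefix in PROVIDER_PREFIXES:
        if model.startswith(prefix):
            model = model[len(prefix):]
            return PROVIDER, MODEL_ALIASES.get(model, model)

    # 2. Name mapping: direct alias lookup.
    if model in MODEL_ALIASES:
        return PROVIDER, MODEL_ALIASES[model]

    # 3. Assumption: bare Qwen-family names pass through unchanged.
    if model.startswith(("qwen", "qwq")):
        return PROVIDER, model

    return None
```

Returning `None` for unrecognised names leaves the decision to whatever provider detection runs next, which matches the idea of falling back to alternative providers.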

| Variable | Required | Description |
|---|---|---|
| ALIBABA_CLOUD_API_KEY | Yes | Main DashScope API key |
| ALIBABA_CLOUD_API_KEY_INTERNATIONAL | No | Singapore endpoint key (optional) |
| ALIBABA_CLOUD_API_KEY_CHINA | No | Beijing endpoint key (optional) |

Singapore (International):
https://dashscope-intl.aliyuncs.com/compatible-mode/v1
Beijing (Mainland China):
https://dashscope.aliyuncs.com/compatible-mode/v1
Default: Singapore endpoint is used unless configured otherwise in alibaba_cloud_client.py
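One way the region endpoints and the optional region-specific keys could fit together is sketched below. The function `select_endpoint` and the fallback from a region key to the main key are assumptions made for illustration; the source does not describe how alibaba_cloud_client.py actually chooses between them. The error message mirrors the missing-key log line quoted elsewhere on this page.

```python
import os
from typing import Mapping, Tuple

ENDPOINTS = {
    "international": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    "china": "https://dashscope.aliyuncs.com/compatible-mode/v1",
}

REGION_KEY_VARS = {
    "international": "ALIBABA_CLOUD_API_KEY_INTERNATIONAL",
    "china": "ALIBABA_CLOUD_API_KEY_CHINA",
}


def select_endpoint(
    region: str = "international",
    env: Mapping[str, str] = os.environ,
) -> Tuple[str, str]:
    """Return (base_url, api_key) for a region.

    Assumed behaviour: prefer the region-specific key and fall back
    to the main ALIBABA_CLOUD_API_KEY when it is not set.
    """
    if region not in ENDPOINTS:
        raise ValueError(f"Unknown region: {region!r}")
    api_key = env.get(REGION_KEY_VARS[region]) or env.get("ALIBABA_CLOUD_API_KEY")
    if not api_key:
        raise ValueError("Alibaba Cloud API key not configured")
    return ENDPOINTS[region], api_key
```

Passing `env` explicitly keeps the helper easy to test without touching the real process environment.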
Pricing per 1M tokens:
| Model | Input ($/1M) | Output ($/1M) | Context |
|---|---|---|---|
| qwen-flash | 0.001 | 0.003 | 1M |
| qwen-plus | 0.005 | 0.015 | 1M |
| qwen-max | 0.012 | 0.036 | 262K |
| qwen-coder | 0.008 | 0.024 | 262K |
| qwq-plus | 0.020 | 0.060 | 262K |
| qwen-long | 0.001 | 0.003 | 10M |
Pricing is defined in src/data/manual_pricing.json and automatically applied.
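As a worked example of how per-token pricing translates into request cost, the sketch below hard-codes the rates from the table above. The function name `estimate_cost` and the in-code dictionary are illustrative only; the gateway reads its prices from manual_pricing.json, whose exact layout is not shown here.

```python
# (input, output) prices in USD per 1M tokens, copied from the table above.
PRICING = {
    "qwen-flash": (0.001, 0.003),
    "qwen-plus": (0.005, 0.015),
    "qwen-max": (0.012, 0.036),
    "qwen-coder": (0.008, 0.024),
    "qwq-plus": (0.020, 0.060),
    "qwen-long": (0.001, 0.003),
}


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the USD cost of one request from its token usage."""
    input_price, output_price = PRICING[model]
    return (prompt_tokens * input_price + completion_tokens * output_price) / 1_000_000


# A qwen-plus request with 20 prompt tokens and 100 completion tokens:
# (20 * 0.005 + 100 * 0.015) / 1,000,000 = 0.0000016 USD
print(f"{estimate_cost('qwen-plus', 20, 100):.7f}")
```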
Alibaba Cloud is integrated into the failover chain:
Priority Order:
1. huggingface
2. featherless
3. vercel-ai-gateway
4. aihubmix
5. anannas
6. alibaba-cloud ← Your provider
7. fireworks
8. together
9. google-vertex
10. openrouter
Auto-Retry: If Alibaba Cloud returns 502/503/504, the gateway automatically tries the next provider in the chain.
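The auto-retry behaviour can be pictured with a small loop over the priority chain. This is a sketch under stated assumptions, not the contents of provider_failover.py: `ProviderError`, `call_with_failover` and the `senders` mapping are hypothetical names, and re-raising non-retryable errors immediately is an assumed policy.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Status codes that trigger a move to the next provider (from the text above).
RETRYABLE_STATUS = {502, 503, 504}


class ProviderError(Exception):
    """Hypothetical error carrying the upstream HTTP status code."""

    def __init__(self, status_code: int, message: str = ""):
        super().__init__(message or f"provider returned {status_code}")
        self.status_code = status_code


def call_with_failover(
    chain: List[str],
    senders: Dict[str, Callable[[dict], dict]],
    payload: dict,
) -> Tuple[str, dict]:
    """Try each provider in priority order; return (provider, response)."""
    last_error: Optional[ProviderError] = None
    for provider in chain:
        send = senders.get(provider)
        if send is None:
            continue  # provider not loaded, skip it
        try:
            return provider, send(payload)
        except ProviderError as exc:
            if exc.status_code not in RETRYABLE_STATUS:
                raise  # e.g. 401: another provider will not fix a bad request
            last_error = exc
    raise RuntimeError("all providers in the chain failed") from last_error
```

With a chain of `["alibaba-cloud", "fireworks"]`, a 503 from the first sender results in the second sender being called with the same payload.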
Alibaba Cloud requests are subject to:
- Per-user limits
- Per-API-key limits
- System-wide limits

Configure these via the rate_limits table in the database.
Issue: "Authorization failed" or "Invalid API key"
Solution:
- Verify ALIBABA_CLOUD_API_KEY is set correctly
- Check the API key hasn't expired
- Confirm you're using a valid DashScope API key
- Test the API key directly:

curl https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen-plus","messages":[{"role":"user","content":"test"}]}'

Issue: Provider returns 503 errors
Behavior: Gateway automatically falls back to next provider in chain
Debug:
- Check Alibaba Cloud service status: https://www.alibabacloud.com/
- Monitor logs for failover events
- Verify region endpoint is accessible
Issue: "Model xyz not found"
Solution:
- Verify model is available in your region (Singapore vs Beijing)
- Check model ID is correctly mapped in model_transformations.py
- Try using a different region endpoint in alibaba_cloud_client.py
- List available models:

import requests

# Check model catalog
response = requests.get("http://localhost:8000/models")
models = [m for m in response.json()["data"] if "qwen" in m["id"].lower()]

Issue: Request times out
Solution:
- Default timeout is 30 seconds
- Increase timeout for Alibaba Cloud:
request_timeout = PROVIDER_TIMEOUTS.get("alibaba-cloud", 30)
- Use faster models (qwen-flash instead of qwen-max)
- Reduce max_tokens in request
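The lookup shown above only reads a timeout, so raising it means adding an entry to the table it reads from. A minimal sketch, assuming `PROVIDER_TIMEOUTS` is a plain dictionary keyed by provider name (where it is defined in the gateway is not stated here) and using 60 seconds purely as an example value:

```python
DEFAULT_TIMEOUT_SECONDS = 30  # default stated above

# Hypothetical override table; the 60-second value is only an example.
PROVIDER_TIMEOUTS = {
    "alibaba-cloud": 60,
}


def get_request_timeout(provider: str) -> int:
    """Return the per-provider timeout, falling back to the default."""
    return PROVIDER_TIMEOUTS.get(provider, DEFAULT_TIMEOUT_SECONDS)
```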
Issue: Provider not loaded on startup
Solution: Check logs for:
⚠ Failed to load alibaba_cloud provider client: ImportError: ...

Verify dependencies:
pip install openai httpx

Edit src/services/alibaba_cloud_client.py:
# Singapore (International)
base_url = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
# Beijing (Mainland China): use this value instead to switch regions
# base_url = "https://dashscope.aliyuncs.com/compatible-mode/v1"

Add custom headers if needed:
def get_alibaba_cloud_client():
    return OpenAI(
        base_url=base_url,
        api_key=Config.ALIBABA_CLOUD_API_KEY,
        default_headers={
            "X-DashScope-SSE": "enable",  # Enable streaming
            "X-Custom-Header": "value"
        }
    )

response = client.chat.completions.create(
    model="qwen-plus",
    messages=[...],
    temperature=0.7,
    top_p=0.9,
    max_tokens=2000,
    frequency_penalty=0.5,
    presence_penalty=0.5
)

# Look for successful provider loading
✓ Loaded alibaba_cloud provider client

# Request routing logs
Transformed model ID from 'qwen-plus' to 'qwen-plus' for provider alibaba-cloud

# Missing key error
ValueError: Alibaba Cloud API key not configured

- src/services/alibaba_cloud_client.py (NEW)
  - Core provider integration
  - Functions: get_alibaba_cloud_client(), make_alibaba_cloud_request_openai(), process_alibaba_cloud_response(), make_alibaba_cloud_request_openai_stream()
- src/config/config.py (MODIFIED)
  - Added ALIBABA_CLOUD_API_KEY configuration
  - Added region-specific key support
- src/routes/chat.py (MODIFIED)
  - Provider imports and registration
  - Request routing for streaming/non-streaming
- src/services/model_transformations.py (MODIFIED)
  - Alibaba Cloud model ID mappings
  - Provider detection for Qwen patterns
- src/services/provider_failover.py (MODIFIED)
  - Added to fallback priority chain
- src/data/manual_pricing.json (MODIFIED)
  - Qwen model pricing data

Request:
{
  "model": "qwen-plus",
  "messages": [
    {"role": "system", "content": "You are helpful"},
    {"role": "user", "content": "Hello"}
  ],
  "temperature": 0.7,
  "max_tokens": 1000
}

Response:
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "qwen-plus",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm here to help..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 100,
    "total_tokens": 120
  }
}

Model responses are automatically cached by Redis when available, reducing latency for repeated requests.
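Caching repeated requests requires that identical requests map to the same key. The source does not say how the gateway derives its Redis keys, so the following is only one plausible approach: hash a canonical JSON form of the request. The function name `cache_key` and the `chat:` prefix are hypothetical.

```python
import hashlib
import json


def cache_key(payload: dict) -> str:
    """Derive a deterministic key from a chat completion request."""
    # sort_keys makes the encoding independent of dict insertion order,
    # so logically identical requests hash to the same value.
    canonical = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"chat:{payload.get('model', 'unknown')}:{digest}"
```

Any parameter that changes the output, such as temperature or max_tokens, is part of the hashed payload, so changing it produces a different key.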
The OpenAI client maintains connection pools internally, so no additional configuration is needed.
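For high-volume workloads, consider sending multiple requests concurrently:
import asyncio
from openai import AsyncOpenAI

# The async client is required here: calls on the synchronous client cannot be awaited
async_client = AsyncOpenAI(
    api_key="YOUR_GATEWAYZ_API_KEY",
    base_url="http://localhost:8000/v1"
)

async def batch_requests(prompts):
    tasks = [
        async_client.chat.completions.create(
            model="qwen-flash",
            messages=[{"role": "user", "content": p}]
        )
        for p in prompts
    ]
    return await asyncio.gather(*tasks)

- Use qwen-flash for high-throughput, low-cost workloads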
- Use qwen-plus for balanced performance and cost
- Use qwen-max for complex reasoning tasks
- Use qwen-coder for code generation
- Use qwq-plus for mathematical reasoning
- Set appropriate temperature (0.3 for code, 0.7-0.9 for creative)
- Monitor usage via pricing audit system
- Handle rate limits with exponential backoff
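The exponential backoff mentioned in the last point could look like the sketch below. `RateLimited` is a hypothetical stand-in for whatever error signals a rate limit in your client (with the OpenAI Python SDK, HTTP 429 responses surface as `openai.RateLimitError`), and the delay values are example defaults, not gateway settings.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for a rate-limit (HTTP 429) error."""


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on RateLimited with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            # Delay doubles each attempt: 0.5s, 1s, 2s, ... capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter spreads out retries from concurrent clients.
            sleep(delay + random.uniform(0, delay / 2))
    raise ValueError("max_attempts must be at least 1")
```

A call would then be wrapped as `with_backoff(lambda: client.chat.completions.create(...))`, after adapting the `except` clause to the error type your client raises.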
- Integration Guide - Add new providers
- Provider Failover - Automatic failover
- Pricing System - Token-based pricing
- Model Health System - Monitor model status
Last Updated: December 2024
Status: Production Ready