Alibaba Integration

Alibaba Cloud Integration

Access Qwen models through Alibaba Cloud DashScope

Overview

Alibaba Cloud integration provides access to Qwen language models via DashScope:

25+ Qwen Models - Flagship (Max, Plus, Flash), specialized (Coder, Math, VL, Omni)
OpenAI-Compatible API - Direct compatibility with OpenAI Python SDK
Multi-Region Support - Singapore (International) and Beijing (China) endpoints
Automatic Failover - Integrated into provider failover chain
Cost-Effective - Competitive pricing on quality models

Quick Setup

1. Get API Key

Sign up: https://www.alibabacloud.com/
Navigate to Model Studio: https://dashscope.aliyuncs.com/
Create or retrieve API key from dashboard
Copy the key

2. Configure Environment

Add to your .env file:

# Required: Main API key
ALIBABA_CLOUD_API_KEY=your-dashscope-api-key-here

# Optional: Region-specific keys (if you have separate keys)
ALIBABA_CLOUD_API_KEY_INTERNATIONAL=dashscope-intl-key
ALIBABA_CLOUD_API_KEY_CHINA=dashscope-cn-key
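The gateway's own key handling lives in src/config/config.py and is not reproduced here. As a rough sketch of how a region-specific key could take precedence over the main key, assuming a hypothetical helper named resolve_dashscope_key and an assumed precedence order (region key first, then the main key):

```python
import os

# Hypothetical mapping from a region name to its optional region-specific variable.
REGION_KEY_VARS = {
    "international": "ALIBABA_CLOUD_API_KEY_INTERNATIONAL",
    "china": "ALIBABA_CLOUD_API_KEY_CHINA",
}


def resolve_dashscope_key(region, env=None):
    """Return the region-specific key if set, otherwise the main key.

    Illustrative only: the precedence order is an assumption, not the
    gateway's documented behavior.
    """
    env = os.environ if env is None else env
    region_var = REGION_KEY_VARS.get(region)
    region_key = env.get(region_var) if region_var else None
    key = region_key or env.get("ALIBABA_CLOUD_API_KEY")
    if not key:
        raise ValueError("Alibaba Cloud API key not configured")
    return key
```

Passing env explicitly makes the lookup easy to check without touching the real environment.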

3. Verify Configuration

# Check logs for successful provider loading
✓ Loaded alibaba_cloud provider client

Supported Models

Commercial Models (Recommended)

Model	Context	Use Case	Pricing ($/1M tokens)
qwen-flash	1M	Fast, cost-effective	$0.001 / $0.003
qwen-plus	1M	Balanced performance	$0.005 / $0.015
qwen-max	262K	Most powerful	$0.012 / $0.036
qwen-coder	262K	Code generation	$0.008 / $0.024
qwen-long	10M	Document processing	$0.001 / $0.003

Reasoning Models

Model	Context	Specialty
qwq-plus	262K	Advanced reasoning for math and code
qwq-32b-preview	262K	32B parameter reasoning model

Specialized Models

Model	Context	Features
qwen-omni	-	Multimodal (text, image, audio, video)
qwen-vl	-	Vision and language understanding
qwen-math	-	Mathematics problem-solving
qwen-mt	-	Translation (92 languages)

Series Models

Qwen 3 Series (Latest)

qwen-3-30b-a3b-instruct - 30B instruction model
qwen-3-80b-a3b-instruct - 80B instruction model
qwen-3-30b-a3b-thinking - 30B with thinking mode
qwen-3-80b-a3b-thinking - 80B with thinking mode

Qwen 2.5 Series

qwen-2.5-72b-instruct - Enhanced 72B model
qwen-2.5-7b-instruct - Efficient 7B model

Qwen 2 Series

qwen-2-72b-instruct - Stable 72B baseline
qwen-2-7b-instruct - Stable 7B baseline

Qwen 1.5 Series (Legacy)

qwen-1.5-72b-chat - Legacy 72B
qwen-1.5-14b-chat - Legacy 14B

Usage Examples

Basic Chat Completion

import requests

url = "http://localhost:8000/v1/chat/completions"
headers = {"Authorization": "Bearer YOUR_GATEWAYZ_API_KEY"}

payload = {
    "model": "qwen-plus",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Explain quantum computing"}
    ],
    "temperature": 0.7,
    "max_tokens": 1000
}

response = requests.post(url, headers=headers, json=payload)
print(response.json())

Using OpenAI SDK

from openai import OpenAI

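# The SDK appends /chat/completions to base_url; it does not add /v1 itself.
# Both URL formats work against the gateway; /v1 is shown to match the example above.
client = OpenAI(
    api_key="YOUR_GATEWAYZ_API_KEY",
    base_url="http://localhost:8000/v1"
)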

response = client.chat.completions.create(
    model="qwen-plus",
    messages=[
        {"role": "user", "content": "Write a Python function to sort a list"}
    ]
)

print(response.choices[0].message.content)

Streaming Responses

response = client.chat.completions.create(
    model="qwen-flash",
    messages=[{"role": "user", "content": "Write a haiku about AI"}],
    stream=True
)

for chunk in response:
    # Some chunks carry no choices or an empty delta; skip them
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

With Organization Prefix

# Using org/model format
payload = {
    "model": "qwen/qwen-max",  # Automatically routes to Alibaba Cloud
    "messages": [{"role": "user", "content": "Hello"}]
}

Code Generation with Qwen Coder

response = client.chat.completions.create(
    model="qwen-coder",
    messages=[
        {
            "role": "user",
            "content": "Write a FastAPI endpoint for user registration with email validation"
        }
    ],
    temperature=0.3  # Lower temperature for code generation
)

Math Problem Solving

response = client.chat.completions.create(
    model="qwq-plus",
    messages=[
        {
            "role": "user",
            "content": "Solve: If x^2 + 5x + 6 = 0, what are the values of x?"
        }
    ]
)

Model Selection and Routing

The gateway automatically detects Qwen models based on:

Pattern Matching: Models starting with qwen/ or alibaba-cloud/
Model Name Mapping: Direct lookups in transformation table
Failover Support: Falls back to alternative providers if needed

Automatic Detection

User Input                  → Provider      → DashScope Model ID
qwen-plus                  → alibaba-cloud → qwen-plus
qwen/qwen-max              → alibaba-cloud → qwen-max
alibaba-cloud/qwen-coder   → alibaba-cloud → qwen-coder
qwen-3-30b                 → alibaba-cloud → qwen-3-30b-a3b-instruct
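The mapping above can be approximated in a few lines. This is a minimal sketch, not the code in model_transformations.py: the function name route_model, the alias table, and the bare-name check for IDs starting with qwen or qwq are all assumptions made for illustration.

```python
# Hypothetical alias table; the real mappings live in model_transformations.py.
MODEL_ALIASES = {
    "qwen-3-30b": "qwen-3-30b-a3b-instruct",
}
PROVIDER_PREFIXES = ("qwen/", "alibaba-cloud/")


def route_model(user_input):
    """Return (provider, dashscope_model_id), or (None, user_input) if unmatched."""
    model = user_input.strip()
    for prefix in PROVIDER_PREFIXES:
        if model.lower().startswith(prefix):
            # Pattern match: drop the org prefix and keep the bare model ID.
            model = model[len(prefix):]
            break
    else:
        # No prefix matched: fall back to a name check for bare Qwen model IDs.
        if not model.lower().startswith(("qwen", "qwq")):
            return None, user_input
    return "alibaba-cloud", MODEL_ALIASES.get(model, model)
```

Each row of the table corresponds to one call, for example route_model("qwen/qwen-max") yields the alibaba-cloud provider with model ID qwen-max.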

Configuration

Environment Variables

Variable	Required	Description
`ALIBABA_CLOUD_API_KEY`	Yes	Main DashScope API key
`ALIBABA_CLOUD_API_KEY_INTERNATIONAL`	No	Singapore endpoint key (optional)
`ALIBABA_CLOUD_API_KEY_CHINA`	No	Beijing endpoint key (optional)

Region Endpoints

Singapore (International):

https://dashscope-intl.aliyuncs.com/compatible-mode/v1

Beijing (Mainland China):

https://dashscope.aliyuncs.com/compatible-mode/v1

Default: Singapore endpoint is used unless configured otherwise in alibaba_cloud_client.py
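For reference, region selection can be expressed as a simple lookup. The helper name endpoint_for and the region labels "international" and "china" are hypothetical; only the two URLs and the Singapore default come from this page.

```python
ENDPOINTS = {
    "international": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    "china": "https://dashscope.aliyuncs.com/compatible-mode/v1",
}
DEFAULT_REGION = "international"  # Singapore, per the default described above


def endpoint_for(region=None):
    """Return the OpenAI-compatible base URL for a region (hypothetical helper)."""
    region = (region or DEFAULT_REGION).lower()
    try:
        return ENDPOINTS[region]
    except KeyError:
        raise ValueError(f"Unknown DashScope region: {region!r}") from None
```

Failing loudly on an unknown region avoids silently sending traffic to the wrong endpoint.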

Pricing

Pricing per 1M tokens:

Model	Input ($/1M)	Output ($/1M)	Context
qwen-flash	0.001	0.003	1M
qwen-plus	0.005	0.015	1M
qwen-max	0.012	0.036	262K
qwen-coder	0.008	0.024	262K
qwq-plus	0.020	0.060	262K
qwen-long	0.001	0.003	10M

Pricing is defined in src/data/manual_pricing.json and automatically applied.
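To make the table concrete, here is a small cost estimator. The prices are copied from the table above; the function name estimate_cost is illustrative, and the gateway's actual billing logic may differ (for example in rounding).

```python
from decimal import Decimal

# USD per 1M tokens as (input, output), copied from the pricing table above.
PRICING = {
    "qwen-flash": (Decimal("0.001"), Decimal("0.003")),
    "qwen-plus": (Decimal("0.005"), Decimal("0.015")),
    "qwen-max": (Decimal("0.012"), Decimal("0.036")),
    "qwen-coder": (Decimal("0.008"), Decimal("0.024")),
    "qwq-plus": (Decimal("0.020"), Decimal("0.060")),
    "qwen-long": (Decimal("0.001"), Decimal("0.003")),
}
TOKENS_PER_UNIT = Decimal(1_000_000)


def estimate_cost(model, prompt_tokens, completion_tokens):
    """Return the estimated USD cost of one request as a Decimal."""
    input_price, output_price = PRICING[model]
    total = prompt_tokens * input_price + completion_tokens * output_price
    return total / TOKENS_PER_UNIT
```

For example, 1M input tokens plus 1M output tokens on qwen-plus come to $0.005 + $0.015 = $0.020. Decimal is used instead of float so that very small per-token amounts do not pick up rounding noise.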

Failover Behavior

Alibaba Cloud is integrated into the failover chain:

Priority Order:
1. huggingface
2. featherless
3. vercel-ai-gateway
4. aihubmix
5. anannas
6. alibaba-cloud      ← Your provider
7. fireworks
8. together
9. google-vertex
10. openrouter

Auto-Retry: If Alibaba Cloud returns 502, 503, or 504, the gateway automatically tries the next provider in the chain.
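The retry behavior can be pictured as a loop over the chain. This sketch is not the code in provider_failover.py: ProviderError, call_with_failover, and the send callback are hypothetical names, and it assumes failover only moves to providers later in the priority order.

```python
FAILOVER_CHAIN = [
    "huggingface", "featherless", "vercel-ai-gateway", "aihubmix", "anannas",
    "alibaba-cloud", "fireworks", "together", "google-vertex", "openrouter",
]
RETRYABLE_STATUSES = {502, 503, 504}


class ProviderError(Exception):
    """Hypothetical error type carrying the upstream HTTP status."""

    def __init__(self, status):
        super().__init__(f"upstream returned {status}")
        self.status = status


def call_with_failover(send, start="alibaba-cloud", chain=FAILOVER_CHAIN):
    """Try `start`, then each later provider in the chain on 502/503/504."""
    last_error = None
    for provider in chain[chain.index(start):]:
        try:
            return provider, send(provider)
        except ProviderError as err:
            if err.status not in RETRYABLE_STATUSES:
                raise  # e.g. 401: retrying elsewhere would hide a config problem
            last_error = err
    raise last_error
```

Non-retryable statuses are re-raised immediately, which matches the note above that only 502/503/504 trigger a fallback.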

Rate Limiting

Alibaba Cloud requests are subject to:

Per-user limits
Per-API-key limits
System-wide limits

Configure these limits via the rate_limits table in the database.

Troubleshooting

401 Unauthorized

Issue: "Authorization failed" or "Invalid API key"

Solution:

Verify ALIBABA_CLOUD_API_KEY is set correctly
Check API key hasn't expired
Confirm you're using a valid DashScope API key

Test API key directly:

curl https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen-plus","messages":[{"role":"user","content":"test"}]}'

503 Service Unavailable

Issue: Provider returns 503 errors

Behavior: The gateway automatically falls back to the next provider in the chain

Debug:

Check Alibaba Cloud service status: https://www.alibabacloud.com/
Monitor logs for failover events
Verify region endpoint is accessible

Model Not Found

Issue: "Model xyz not found"

Solution:

Verify model is available in your region (Singapore vs Beijing)
Check model ID is correctly mapped in model_transformations.py
Try using a different region endpoint in alibaba_cloud_client.py

List available models:

import requests

# Check the model catalog for Qwen entries
response = requests.get("http://localhost:8000/models")
models = [m for m in response.json()["data"] if "qwen" in m["id"].lower()]

Timeout Issues

Issue: Request times out

Solution:

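The default timeout is 30 seconds.

The timeout is looked up per provider, with 30 seconds as the fallback. To increase it for Alibaba Cloud, raise the alibaba-cloud entry in PROVIDER_TIMEOUTS:

request_timeout = PROVIDER_TIMEOUTS.get("alibaba-cloud", 30)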

Use faster models (qwen-flash instead of qwen-max)
Reduce max_tokens in request

Import/Loading Errors

Issue: Provider not loaded on startup

Solution: Check logs for:

⚠ Failed to load alibaba_cloud provider client: ImportError: ...

Verify dependencies:

pip install openai httpx

Advanced Configuration

Switching Regions

Edit src/services/alibaba_cloud_client.py:

# Singapore (International)
base_url = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"

# Beijing (Mainland China)
base_url = "https://dashscope.aliyuncs.com/compatible-mode/v1"

Custom Headers

Add custom headers if needed:

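from openai import OpenAI

# Config is the settings object from src/config/config.py;
# base_url is one of the region endpoints shown under Switching Regions.
def get_alibaba_cloud_client():
    return OpenAI(
        base_url=base_url,
        api_key=Config.ALIBABA_CLOUD_API_KEY,
        default_headers={
            "X-DashScope-SSE": "enable",  # Enable streaming
            "X-Custom-Header": "value"
        }
    )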

Model-Specific Parameters

response = client.chat.completions.create(
    model="qwen-plus",
    messages=[...],
    temperature=0.7,
    top_p=0.9,
    max_tokens=2000,
    frequency_penalty=0.5,
    presence_penalty=0.5
)

Monitoring

Check Provider Status

# Look for successful provider loading
✓ Loaded alibaba_cloud provider client

Model Transformation Logs

# Request routing logs
Transformed model ID from 'qwen-plus' to 'qwen-plus' for provider alibaba-cloud

API Key Configuration

# Missing key error
ValueError: Alibaba Cloud API key not configured

Integration Points

Files Modified/Added

src/services/alibaba_cloud_client.py (NEW)
- Core provider integration
- Functions: get_alibaba_cloud_client(), make_alibaba_cloud_request_openai(), process_alibaba_cloud_response(), make_alibaba_cloud_request_openai_stream()
src/config/config.py (MODIFIED)
- Added ALIBABA_CLOUD_API_KEY configuration
- Added region-specific key support
src/routes/chat.py (MODIFIED)
- Provider imports and registration
- Request routing for streaming/non-streaming
src/services/model_transformations.py (MODIFIED)
- Alibaba Cloud model ID mappings
- Provider detection for Qwen patterns
src/services/provider_failover.py (MODIFIED)
- Added to fallback priority chain
src/data/manual_pricing.json (MODIFIED)
- Qwen model pricing data

API Response Format

Request:

{
  "model": "qwen-plus",
  "messages": [
    {"role": "system", "content": "You are helpful"},
    {"role": "user", "content": "Hello"}
  ],
  "temperature": 0.7,
  "max_tokens": 1000
}

Response:

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "qwen-plus",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm here to help..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 100,
    "total_tokens": 120
  }
}
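When calling the gateway with requests rather than the SDK, the response arrives as a plain dictionary in the shape shown above. A minimal sketch of pulling out the useful fields follows; summarize_completion is an illustrative helper, not part of the gateway.

```python
def summarize_completion(body):
    """Extract the first choice's text plus token usage from a response dict."""
    choice = body["choices"][0]
    usage = body.get("usage", {})
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice.get("finish_reason"),
        "prompt_tokens": usage.get("prompt_tokens", 0),
        "completion_tokens": usage.get("completion_tokens", 0),
        "total_tokens": usage.get("total_tokens", 0),
    }
```

The usage fields default to 0 so the helper still works if a response omits the usage block.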

Performance Optimization

Caching

Model responses are automatically cached in Redis when available, reducing latency for repeated requests.

Connection Pooling

The OpenAI client maintains connection pools internally; no additional configuration is needed.

Batch Processing

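For high-volume workloads, consider sending multiple requests concurrently:

import asyncio
from openai import AsyncOpenAI

# The async client is required: awaiting calls on the sync client would fail
async_client = AsyncOpenAI(
    api_key="YOUR_GATEWAYZ_API_KEY",
    base_url="http://localhost:8000/v1"
)

async def batch_requests(prompts):
    # Start one request per prompt and wait for all of them concurrently
    tasks = [
        async_client.chat.completions.create(
            model="qwen-flash",
            messages=[{"role": "user", "content": p}]
        )
        for p in prompts
    ]
    return await asyncio.gather(*tasks)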
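Launching every request at once can run into the rate limits described earlier when the prompt list is large. One common way to cap concurrency is a semaphore. The sketch below assumes a hypothetical make_request coroutine function that takes a prompt and returns a response; the default limit of 5 is arbitrary.

```python
import asyncio


async def gather_bounded(make_request, prompts, limit=5):
    """Run make_request(prompt) for each prompt, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(prompt):
        # Each task waits here until one of the `limit` slots is free.
        async with semaphore:
            return await make_request(prompt)

    # gather preserves input order, so results line up with prompts.
    return await asyncio.gather(*(run_one(p) for p in prompts))
```

Results come back in the same order as the prompts, regardless of which request finishes first.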

Best Practices

Use qwen-flash for high-throughput, low-cost workloads
Use qwen-plus for balanced performance and cost
Use qwen-max for complex reasoning tasks
Use qwen-coder for code generation
Use qwq-plus for mathematical reasoning
Set appropriate temperature (0.3 for code, 0.7-0.9 for creative)
Monitor usage via pricing audit system
Handle rate limits with exponential backoff
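As an illustration of the last point, exponential backoff with jitter can be wrapped around any call. RateLimited is a hypothetical stand-in for an HTTP 429 from the gateway, and the attempt count and delays are assumed values, not gateway defaults.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical stand-in for an HTTP 429 response from the gateway."""


def with_backoff(call, max_attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Retry `call` on RateLimited, doubling the delay cap after each failure."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))  # full jitter spreads out retries
```

With the OpenAI SDK, the exception to catch would likely be openai.RateLimitError rather than the placeholder class used here.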

Reference Links

Last Updated: December 2024
Status: Production Ready
