# Deploy a model to a SageMaker endpoint with the SGLang container using the AWS SDK for Python (boto3)


This notebook deploys the `Qwen/Qwen3-VL-30B-A3B-Thinking` model to an Amazon SageMaker AI endpoint using the SGLang container.

## Qwen3-VL-30B-A3B-Thinking
- **Parameters**: 31B (Mixture of Experts)
- **Instance Type**: ml.g6e.12xlarge
- **GPUs**: 4 NVIDIA L40S Tensor Core GPUs with 192 GB of total GPU memory (48 GB of memory per GPU)
- **Highlights**: High accuracy on complex visual reasoning; suited to medical imaging, scientific research, and advanced OCR.

## Model Capabilities 
- Advanced spatial perception (2D/3D reasoning)
- Multi-language OCR (32 languages)
- Visual agent functionality
- Video understanding with timestamps
- Code generation from visual input
- Context Length: 256K tokens (expandable to 1M)

In [None]:
%pip install sagemaker --upgrade --quiet --no-warn-conflicts

In [22]:
import json
import boto3
import sagemaker
import time
from IPython.display import display, Markdown

role = sagemaker.get_execution_role()  # execution role for the endpoint
sess = sagemaker.session.Session()  # sagemaker session for interacting with different AWS APIs
bucket = sess.default_bucket()  # bucket to house artifacts
region = sess.boto_region_name  # AWS region of the current SageMaker session
account_id = sess.account_id()

sm_client = boto3.client("sagemaker")  # client to interact with SageMaker
smr_client = boto3.client("sagemaker-runtime")  # client to interact with SageMaker endpoints
s3_client = boto3.client("s3")

print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {bucket}")
print(f"sagemaker session region: {sess.boto_region_name}")
print(f"boto3 version: {boto3.__version__}")
print(f"sagemaker version: {sagemaker.__version__}")

sagemaker role arn: arn:aws:iam::736221153822:role/SageMaker-ServiceRole-Default
sagemaker bucket: sagemaker-us-east-1-736221153822
sagemaker session region: us-east-1
boto3 version: 1.40.64
sagemaker version: 2.254.1


## Container

In [None]:
# SGLang image in your own ECR repository, in the same region as the session
inference_image = f"{account_id}.dkr.ecr.{region}.amazonaws.com/sglang:v0.5.4"

instance = {"type": "ml.g6e.12xlarge", "num_gpu": 4}

model_id = "Qwen/Qwen3-VL-30B-A3B-Thinking"
model_name = sagemaker.utils.name_from_base("model-sgl", short=True)
endpoint_config_name = model_name
endpoint_name = model_name

health_check_timeout = 600

env = {
    "OPTION_MODEL": model_id,
    "OPTION_CONTEXT_LENGTH": "32768",
    "OPTION_TENSOR_PARALLEL_SIZE": json.dumps(instance["num_gpu"]),
    "OPTION_TOOL_CALL_PARSER": "qwen",
    "OPTION_REASONING_PARSER": "qwen3",
}
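
As a rough sanity check on the instance choice, the sketch below estimates how much GPU memory the model weights occupy per GPU when split with tensor parallelism. It uses the 31B parameter count and 48 GB per GPU listed at the top of this notebook, and it assumes 2 bytes per parameter (bf16/fp16 weights) and an even split across GPUs. Both are assumptions, not values reported by the container, and `weights_gb_per_gpu` is a hypothetical helper name.

```python
def weights_gb_per_gpu(num_params: float, num_gpu: int, bytes_per_param: int = 2) -> float:
    """Approximate weight memory per GPU in GB (1 GB = 1e9 bytes).

    Assumes the weights are split evenly across GPUs, which is a simplification.
    """
    total_bytes = num_params * bytes_per_param
    return total_bytes / num_gpu / 1e9


# 31B parameters across 4 GPUs, assuming 2-byte weights
per_gpu = weights_gb_per_gpu(num_params=31e9, num_gpu=4)

# 48 GB per L40S GPU; what remains is available for KV cache and activations
headroom = 48 - per_gpu

print(f"weights per GPU: {per_gpu:.1f} GB, headroom: {headroom:.1f} GB")
```

Under these assumptions the weights take roughly a third of each GPU, which leaves room for the 32K context length configured above.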

### Model -> Endpoint Config -> Endpoint

In [None]:
create_model_response = sm_client.create_model(
    ModelName = model_name,
    ExecutionRoleArn = role,
    PrimaryContainer = {
        "Image": inference_image,
        "Environment": env,
    }
)
model_arn = create_model_response["ModelArn"]
print(f"Created Model: {model_arn}")

In [None]:
endpoint_config_response = sm_client.create_endpoint_config(
    EndpointConfigName = endpoint_config_name,
    ProductionVariants = [
        {
            "VariantName": "alltraffic",
            "ModelName": model_name,
            "InstanceType": instance["type"],
            "InitialInstanceCount": 1,
            "ContainerStartupHealthCheckTimeoutInSeconds": health_check_timeout,
            "RoutingConfig": {
                'RoutingStrategy': 'LEAST_OUTSTANDING_REQUESTS'
            },
        },
    ],
)

In [None]:
create_endpoint_response = sm_client.create_endpoint(
    EndpointName = endpoint_name, EndpointConfigName = endpoint_config_name
)
sess.wait_for_endpoint(endpoint_name)
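
`sess.wait_for_endpoint` blocks until the endpoint has finished creating. If you would rather see the status as it changes and surface the failure reason yourself, polling `describe_endpoint` is one alternative. The following is a minimal sketch: the helper name `wait_until_in_service`, the poll interval, and the timeout are illustrative assumptions, and only `Creating` and `Updating` are treated as transient states.

```python
import time


def wait_until_in_service(sm_client, endpoint_name, poll_seconds=30, timeout_seconds=1800):
    """Poll describe_endpoint until the endpoint is InService, fails, or times out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        desc = sm_client.describe_endpoint(EndpointName=endpoint_name)
        status = desc["EndpointStatus"]
        print(f"endpoint status: {status}")

        if status == "InService":
            return desc
        if status not in ("Creating", "Updating"):
            # For example "Failed"; FailureReason is only present on failures
            reason = desc.get("FailureReason", "no reason given")
            raise RuntimeError(f"endpoint {endpoint_name} is {status}: {reason}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"endpoint {endpoint_name} still {status} after {timeout_seconds}s")

        time.sleep(poll_seconds)


# Usage in this notebook (requires AWS credentials, so it is left commented out):
# wait_until_in_service(sm_client, endpoint_name)
```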

## Inference Test

### Text inference

In [23]:
payload = {
    "model": model_id,
    "messages": [
        {"role": "user", "content": "Name popular places to visit in London?"}
    ],
}

start_time = time.time()
res = smr_client.invoke_endpoint(EndpointName = endpoint_name,
                                 Body = json.dumps(payload),
                                 ContentType = "application/json")
response = json.loads(res["Body"].read().decode("utf8")) 
end_time = time.time()

print(f"✅ Response time: {end_time-start_time:.2f}s")
display(Markdown(response["choices"][0]["message"]["content"]))

✅ Response time: 11.49s


Here's a curated list of **must-see popular places in London**, blending iconic landmarks, rich history, culture, and unique experiences. These are consistently ranked among the top attractions for first-time visitors:

### 🏰 1. **The Tower of London**  
   *Why visit?* A UNESCO World Heritage site housing the **Crown Jewels**, a 1,000-year-old fortress, and grim history (including the Tower's role as a prison). Don’t miss the **Beefeater tours** and the **Crown Jewels display**.  
   *Tip:* Book tickets online to skip queues; allow 3–4 hours.

### 🏛️ 2. **Buckingham Palace**  
   *Why visit?* The monarch’s official London residence. Witness the **Changing of the Guard** (daily, 11 AM; check for updates). See the State Rooms (April–Oct only).  
   *Tip:* Visit on a **free day** (Mon–Fri, 2:30 PM) for entry to the State Rooms.

### 🌉 3. **Tower Bridge**  
   *Why visit?* An engineering marvel! Walk across the **glass walkway** for panoramic city views, or explore the **Bridge Exhibition** on the history of the bridge.  
   *Tip:* Visit at sunset for stunning photos with the city lights.

### 🗺️ 4. **The British Museum**  
   *Why visit?* Home to **8 million artifacts**, including the Rosetta Stone, Parthenon sculptures, and Egyptian mummies. **Free entry** (exhibitions may charge).  
   *Tip:* Focus on key exhibits (e.g., Room 60 for the Rosetta Stone) to avoid overwhelm.

### 🏯 5. **Westminster Abbey**  
   *Why visit?* The coronation church for British monarchs. A masterpiece of Gothic architecture with royal burials (Newton, Darwin) and stunning stained glass.  
   *Tip:* Book tickets in advance; wear comfy shoes (it’s a large, ancient building).

### 🎭 6. **West End Theatres**  
   *Why visit?* The global hub of **Broadway-style musicals and plays**. See *The Lion King*, *Wicked*, or *Hamilton* at venues like the **Palace Theatre** or **His Majesty’s Theatre**.  
   *Tip:* Book tickets early (use **TodayTix** for deals); matinee shows are cheaper.

### 🌆 7. **The London Eye**  
   *Why visit?* Iconic 135m Ferris wheel on the South Bank. Offers **breathtaking 360° views** of the city (including Big Ben, St. Paul’s, and the Thames).  
   *Tip:* Book "early bird" tickets for shorter lines; avoid weekends if possible.

### 🖼️ 8. **National Gallery (Trafalgar Square)**  
   *Why visit?* One of the world’s greatest art collections, with works by Van Gogh, da Vinci, and Turner. **Free entry** (exhibitions may charge).  
   *Tip:* Join a free **15-minute tour** at 11 AM daily.

### 🌿 9. **Hyde Park & Kensington Gardens**  
   *Why visit?* London’s largest royal park. Explore **Kensington Palace** (royal residence), **The Serpentine Lake**, or relax by the **Nelson Monument**.  
   *Tip:* Rent a rowboat on the Serpentine or visit **Peter Pan statue** in Kensington Gardens.

### 🌉 10. **South Bank (Thames Walk)**  
   *Why visit?* A vibrant riverside strip with **street performers**, art galleries, and **Shakespeare’s Globe Theatre**. Walk from the London Eye to the Tate Modern.  
   *Tip:* Grab a pie at **Pret a Manger** or enjoy sunset views at **Borough Market**.

---

### 💡 **Pro Tips for Your Visit**  
- **Transport:** Use the **Oyster card** or contactless payment for the Tube (London Underground).  
- **Time Management:** Prioritize 2–3 major attractions per day; London’s sights are spread out!  
- **Free Attractions:** Many museums (British Museum, National Gallery) are free; explore neighborhoods like **Camden Market** or **Notting Hill** without spending.  
- **Avoid Crowds:** Visit popular sites **early morning** (e.g., Tower of London at 9 AM) or on weekdays.  
- **Hidden Gems:** Add **Camden Market** (alternative culture) or **St. Paul’s Cathedral** (dome views) for deeper local flavor.

> London’s magic lies in its mix of **history, diversity, and energy**. Whether you’re exploring royal palaces, soaking in street art, or sipping tea in a traditional pub, every corner has a story. 🌟  
> *Need a tailored itinerary? Let me know your interests (history, food, shopping) for a personalized plan!*
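
Because the endpoint is configured with `OPTION_REASONING_PARSER`, a Thinking model's reasoning is typically returned apart from the final answer. The sketch below assumes the OpenAI-style response shape used above and that the reasoning, when present, arrives in a `reasoning_content` field on the message. That field name is an assumption about the container's output, so the hypothetical helper `split_reasoning` falls back to an empty string when it is absent. The `sample` dict is made-up data for illustration, not output from the endpoint.

```python
def split_reasoning(response: dict) -> tuple:
    """Return (reasoning, answer) from an OpenAI-style chat completion dict."""
    message = response["choices"][0]["message"]
    # `reasoning_content` is assumed; it may be missing or None
    reasoning = message.get("reasoning_content") or ""
    answer = message.get("content") or ""
    return reasoning, answer


# Made-up response used only to exercise the helper
sample = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "reasoning_content": "The user wants well-known sights.",
                "content": "Tower of London, British Museum, London Eye.",
            }
        }
    ]
}

reasoning, answer = split_reasoning(sample)
print("reasoning:", reasoning)
print("answer:", answer)
```

In the notebook you would call `split_reasoning(response)` on the parsed endpoint response and render only `answer` with `Markdown`.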

### Image Understanding

Let's ask the model to describe the image below.

In [24]:
from IPython.display import Image as IPyImage
IPyImage(url="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg", height=400, width=400)

In [26]:
# 🧪 Test 2: Image Understanding
print("🖼️ Testing Image Understanding...")

image_request = {
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What do you see in this image? Describe it in detail."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
                    }
                }
            ]
        }
    ],
    "temperature": 0.7
}

# Measure inference time
start_time = time.time()
res = smr_client.invoke_endpoint(EndpointName = endpoint_name,
                                 Body = json.dumps(image_request),
                                 ContentType = "application/json")
response = json.loads(res["Body"].read().decode("utf8")) 
end_time = time.time()

# Display results
print("✅ Image Understanding Test Completed")
print(f"   Response Time: {end_time - start_time:.2f} seconds")
print()

# Render the response
display(Markdown("**Image Analysis:**"))
display(Markdown(response["choices"][0]["message"]["content"]))

🖼️ Testing Image Understanding...
✅ Image Understanding Test Completed
   Response Time: 7.54 seconds



**Image Analysis:**

The image showcases a **Pallas’s cat** (also known as the *manul*), a small wild feline species native to Central Asia, set against a wintry backdrop. Here’s a detailed breakdown:  

### 1. The Animal (Pallas’s Cat)  
- **Physical Appearance**: The cat has a *stout, stocky body* with short legs and a dense, thick coat of fur, adapted for cold climates. Its fur is a mix of **tawny brown, gray, and black**, with distinct dark stripes running along its cheeks and body. Snowflakes are dusted across its back, indicating recent contact with the snow.  
- **Facial Features**: It has a rounded head, small, tufted ears, and a short muzzle. Dark stripes frame its cheeks, and its eyes are partially squinted (likely due to cold or falling snow), giving it a calm, almost contemplative expression.  
- **Movement**: The cat is captured mid-stride, with one paw lifted, suggesting it is walking through the snow.  


### 2. The Environment  
- **Snow-Covered Ground**: The foreground is blanketed in *white snow*, with subtle texture variations (e.g., faint tracks, patches of exposed ground).  
- **Birch Trees**: In the background, there are **white birch tree trunks** with dark, irregular markings, typical of birch forests in cold, northern regions. These trees create a natural, wintry backdrop.  
- **Chain-Link Fence**: Behind the birch trees, a *metal chain-link fence* is visible, indicating this scene likely occurs in a **zoo or wildlife sanctuary** (not a completely wild setting).  
- **Weather**: *Falling snowflakes* are visible in the air, adding dynamism and emphasizing the cold, wintry atmosphere. The lighting is soft and diffused (overcast sky), enhancing the serene, chilly mood.  


### 3. Overall Mood and Context  
The image conveys a sense of *cold resilience*: the Pallas’s cat’s thick fur and sturdy build are perfectly adapted to this harsh environment. The combination of the cat’s gentle movement, the falling snow, and the natural backdrop (birch trees) creates a tranquil yet stark portrait of wildlife in a wintry habitat. The fence hints at human intervention, suggesting this is a controlled environment where the cat’s survival and behavior are observed, while still honoring its natural adaptations.  


This scene beautifully highlights the Pallas’s cat’s unique physical traits and its relationship with a cold, snowy ecosystem.
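
The request above passes a public image URL. For images that are not publicly reachable, OpenAI-compatible servers commonly accept a base64 `data:` URL in the same `image_url` field. The sketch below builds such a message from raw bytes. Whether this SGLang container accepts data URLs is an assumption to verify against your deployment, and `image_message` is a hypothetical helper name.

```python
import base64


def image_message(prompt: str, image_bytes: bytes, mime_type: str = "image/jpeg") -> dict:
    """Build a user message that embeds the image as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}},
        ],
    }


# Placeholder bytes stand in for a real file read, e.g. open(path, "rb").read()
message = image_message("What do you see in this image?", b"\xff\xd8\xff")
request = {"messages": [message], "temperature": 0.7}

print(request["messages"][0]["content"][1]["image_url"]["url"])
```

The resulting `request` has the same shape as `image_request` and could be sent with `smr_client.invoke_endpoint` in the same way. Keep in mind that base64 encoding inflates the payload by about a third.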

## Cleanup

In [27]:
sess.delete_endpoint(endpoint_name)
sess.delete_endpoint_config(endpoint_config_name)
sess.delete_model(model_name)
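
If one of these deletions raises (for example because the endpoint never finished creating), the calls after it do not run and resources may be left behind. A more forgiving variant, sketched below with the boto3 client, attempts every deletion and reports which ones failed. The helper name `cleanup` is hypothetical, and catching a broad `Exception` is a simplification; in practice you would likely catch `botocore.exceptions.ClientError`.

```python
def cleanup(sm_client, endpoint_name, endpoint_config_name, model_name):
    """Try to delete endpoint, endpoint config, and model; return labels that failed."""
    steps = [
        ("endpoint", lambda: sm_client.delete_endpoint(EndpointName=endpoint_name)),
        (
            "endpoint config",
            lambda: sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name),
        ),
        ("model", lambda: sm_client.delete_model(ModelName=model_name)),
    ]
    failed = []
    for label, delete in steps:
        try:
            delete()
            print(f"deleted {label}")
        except Exception as err:  # simplification, see the note above
            failed.append(label)
            print(f"could not delete {label}: {err}")
    return failed


# Usage in this notebook (requires AWS credentials, so it is left commented out):
# cleanup(sm_client, endpoint_name, endpoint_config_name, model_name)
```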